├── README.md ├── datasets.py ├── detection ├── README.md ├── configs │ ├── _base_ │ │ ├── datasets │ │ │ ├── cityscapes_detection.py │ │ │ ├── cityscapes_instance.py │ │ │ ├── coco_detection.py │ │ │ ├── coco_instance.py │ │ │ ├── coco_instance_semantic.py │ │ │ ├── deepfashion.py │ │ │ ├── lvis_v0.5_instance.py │ │ │ ├── lvis_v1_instance.py │ │ │ ├── voc0712.py │ │ │ └── wider_face.py │ │ ├── default_runtime.py │ │ ├── models │ │ │ ├── cascade_mask_rcnn_r50_fpn.py │ │ │ ├── cascade_rcnn_r50_fpn.py │ │ │ ├── fast_rcnn_r50_fpn.py │ │ │ ├── faster_rcnn_r50_caffe_c4.py │ │ │ ├── faster_rcnn_r50_caffe_dc5.py │ │ │ ├── faster_rcnn_r50_fpn.py │ │ │ ├── mask_rcnn_r50_caffe_c4.py │ │ │ ├── mask_rcnn_r50_fpn.py │ │ │ ├── retinanet_r50_fpn.py │ │ │ ├── rpn_r50_caffe_c4.py │ │ │ ├── rpn_r50_fpn.py │ │ │ └── ssd300.py │ │ └── schedules │ │ │ ├── schedule_1x.py │ │ │ ├── schedule_20e.py │ │ │ └── schedule_2x.py │ ├── mask_rcnn_p2t_b_fpn_1x_coco.py │ ├── mask_rcnn_p2t_l_fpn_1x_coco.py │ ├── mask_rcnn_p2t_s_fpn_1x_coco.py │ ├── mask_rcnn_p2t_t_fpn_1x_coco.py │ ├── retinanet_p2t_b_fpn_1x_coco.py │ ├── retinanet_p2t_l_fpn_1x_coco.py │ ├── retinanet_p2t_s_fpn_1x_coco.py │ └── retinanet_p2t_t_fpn_1x_coco.py ├── dist_test.sh ├── dist_train.sh ├── p2t.py ├── test.py └── train.py ├── engine.py ├── figures └── p2t-arch.jpg ├── hubconf.py ├── main.py ├── mcloader ├── __init__.py ├── classification.py ├── data_prefetcher.py ├── image_list.py ├── imagenet.py └── mcloader.py ├── p2t.py ├── samplers.py ├── segmentation ├── README.md ├── align_resize.py ├── configs │ ├── _base_ │ │ ├── datasets │ │ │ ├── ade20k.py │ │ │ ├── chase_db1.py │ │ │ ├── cityscapes.py │ │ │ ├── drive.py │ │ │ ├── hrf.py │ │ │ ├── pascal_context.py │ │ │ ├── pascal_voc12.py │ │ │ ├── pascal_voc12_aug.py │ │ │ └── stare.py │ │ ├── default_runtime.py │ │ ├── models │ │ │ ├── fpn_r50.py │ │ │ └── upernet_r50.py │ │ └── schedules │ │ │ ├── schedule_160k.py │ │ │ ├── schedule_20k.py │ │ │ ├── schedule_40k.py │ │ │ └── schedule_80k.py │ ├── sem_fpn_p2t_b_ade20k_80k.py │ ├── sem_fpn_p2t_l_ade20k_80k.py │ ├── sem_fpn_p2t_s_ade20k_80k.py │ └── sem_fpn_p2t_t_ade20k_80k.py ├── dist_test.sh ├── dist_train.sh ├── p2t.py ├── test.py └── train.py └── utils.py /README.md: -------------------------------------------------------------------------------- 1 | ## [TPAMI22] Pyramid Pooling Transformer for Scene Understanding 2 | 3 | This is the official repository for Pyramid Pooling Transformer (P2T). This repository contains: 4 | 5 | * [x] Full code for training/testing 6 | * [x] Pretrained models for image classification, object detection, and semantic segmentation. 7 | 8 | Related links: 9 | [[Official PDF Download]](https://mmcheng.net/wp-content/uploads/2022/09/22TPAMI-P2T.pdf) 10 | [[Full Chinese translation]](https://mmcheng.net/wp-content/uploads/2022/08/22PAMI_P2T_CN.pdf) 11 | [[5-minute explanation in Chinese]](https://mp.weixin.qq.com/s/7qXtyFaIiYny0eUqBbPraQ) 12 | 13 | ### Requirements 14 | 15 | * torch>=1.7 16 | * torchvision>=0.7.0 17 | * timm>=0.3.2 18 | 19 | Validated with PyTorch 1.6/1.7/1.8 and timm 0.3.2/0.4.12 20 | 21 | ### Introduction 22 | 23 | Pyramid Pooling Transformer (P2T) is a new-generation backbone network, benefiting many fundamental downstream vision tasks such as object detection, semantic segmentation, and instance segmentation.
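In each P2T attention layer, the flattened feature map is pyramid-pooled into a short multi-scale token sequence that serves as the keys and values for multi-head self-attention. Below is a minimal, illustrative PyTorch sketch of this pooling-based attention; the module name, pooling ratios, and dimensions are assumptions for illustration only, and the actual implementation (which includes further details omitted here) lives in `p2t.py`:

````python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPoolingAttention(nn.Module):
    """Minimal sketch: self-attention whose keys/values come from
    pyramid-pooled (multi-scale) versions of the input feature map."""

    def __init__(self, dim, num_heads=2, pool_ratios=(12, 16, 20, 24)):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.pool_ratios = pool_ratios
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        # x: (B, N, C) tokens of a feature map with spatial size H x W (N = H * W)
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Pyramid pooling: average-pool the feature map at several output sizes,
        # flatten each pooled map, and concatenate into a short multi-scale
        # sequence (length M << N) that provides the keys and values.
        feat = x.transpose(1, 2).reshape(B, C, H, W)
        pooled = []
        for ratio in self.pool_ratios:
            out_size = (max(1, round(H / ratio)), max(1, round(W / ratio)))
            pooled.append(F.adaptive_avg_pool2d(feat, out_size).flatten(2))
        pooled = self.norm(torch.cat(pooled, dim=2).transpose(1, 2))  # (B, M, C)

        kv = self.kv(pooled).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4).unbind(0)   # each: (B, heads, M, head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, heads, N, M)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
````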
24 | 25 | Although pyramid pooling has demonstrated its power on many downstream tasks such as object detection (SPP) and semantic segmentation (PSPNet), it had not been explored in backbone networks, which serve as the cornerstone of many downstream vision tasks. 26 | P2T is the first to bridge this gap between pyramid pooling and backbone networks. 27 | The core idea of P2T is to adapt pyramid pooling to downsample the flattened token sequence when computing self-attention, 28 | simultaneously reducing the sequence length and capturing powerful multi-scale contextual features. 29 | Pyramid pooling is also very efficient, introducing only negligible computational cost. 30 | 31 | In the experiments, P2T outperforms CNN and transformer competitors such as ResNet, ResNeXt, Res2Net, PVT, Swin, Twins, and PVTv2 on image classification, semantic segmentation, object detection, and instance segmentation. 32 | 33 | 34 | 35 | ### Image Classification 36 | 37 | | Variants | Input Size | Top-1 Acc | Top-5 Acc | #Params (M) | GFLOPs | Google Drive | 38 | |:---------------:|:---------:|:-----:|:-----:|:-----------:|:-----------------:|-----------------| 39 | | P2T-Tiny | 224 x 224 | 79.8 | 94.9 | 11.6 | 1.8 | [[weights]](https://drive.google.com/file/d/1x9EweWx77pXrHOCc7RJF3sYK2rht0_4m/view?usp=sharing)\|[[log]](https://drive.google.com/file/d/1CDofCg9pi0Cyiha_dIimggF228M5mOeH/view?usp=sharing) | 40 | | P2T-Small | 224 x 224 | 82.4 | 96.0 | 24.1 | 3.7 | [[weights]](https://drive.google.com/file/d/1FlwhyVKw0zqj2mux248gIQFQ8DGPi8rS/view?usp=sharing)\|[[log]](https://drive.google.com/file/d/1bCZz7y0I0EEw74KaVg5iAr3hBYtSIEii/view?usp=sharing) | 41 | | P2T-Base | 224 x 224 | 83.5 | 96.6 | 36.2 | 6.5 | [[weights]](https://drive.google.com/file/d/1iZoWexUTPUDSIZiJHNRt2zZl2kFj68F4/view?usp=sharing)\|[[log]](https://drive.google.com/file/d/13_XaX0XtYSzPatVl54ihFbEwflHLVvsl/view?usp=sharing) | 42 | | P2T-Large | 224 x 224 | 83.9 | 96.7 | 54.5 | 9.8 | [[weights]](https://drive.google.com/file/d/13jBJ7ShDJd1juViC-zPtfLXYPRwkNfya/view?usp=sharing)\|[[log]](https://drive.google.com/file/d/1-RLjGzez-_O2_8obbXvUYGhWacPnqK1U/view?usp=sharing) | 43 | 44 | All models are trained on the ImageNet-1K dataset.
You can find all weights and logs at this link: [[Google Drive]](https://drive.google.com/drive/folders/1Osweqc1OphwtWONXIgD20q9_I2arT9yz?usp=sharing) 45 | [BaiduPan, extraction code: yhwu](https://pan.baidu.com/s/1JkE62CS9EoSTLW1M1Ajmxw?pwd=yhwu) 46 | 47 | 48 | 49 | ### Semantic Segmentation 50 | 51 | #### ADE20K (val set) 52 | 53 | | Base Model | Variants | mIoU | aAcc | mAcc | #Params (M) | GFLOPs | Google Drive | 54 | | :--: | :-------: | :--: | :--: | :---------: | :------: | :----------------------------------------------------------: | :----------------------------------------------------------: | 55 | | Semantic FPN | P2T-Tiny | 43.4 | 80.8 | 54.5 | 15.4 | 31.6 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 56 | | Semantic FPN | P2T-Small | 46.7 | 82.0 | 58.4 | 27.8 | 42.7 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 57 | | Semantic FPN | P2T-Base | 48.7 | 82.9 | 60.7 | 39.8 | 58.5 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 58 | | Semantic FPN | P2T-Large | 49.4 | 83.3 | 61.9 | 58.1 | 77.7 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 59 | 60 | For the training and validation scripts, please refer to the `segmentation` folder. 61 | 62 | BaiduPan download link: [BaiduPan, extraction code: yhwu](https://pan.baidu.com/s/1JkE62CS9EoSTLW1M1Ajmxw?pwd=yhwu) 63 | 64 | ### Object Detection 65 | 66 | Evaluated on the COCO validation set. 67 | 68 | 69 | | Base Model | Variants | AP | AP@0.5 | AP@0.75 | #Params (M) | GFLOPs | 70 | | :--------: | :-------: | :--: | :----: | :-----: | :---------: | :------: | 71 | | RetinaNet | P2T-Tiny | 41.3 | 62.0 | 44.1 | 21.1 | 206 | 72 | | RetinaNet | P2T-Small | 44.4 | 65.3 | 47.6 | 33.8 | 260 | 73 | | RetinaNet | P2T-Base | 46.1 | 67.5 | 49.6 | 45.8 | 344 | 74 | | RetinaNet | P2T-Large | 47.2 | 68.4 | 50.9 | 64.4 | 449 | 75 | 76 | Use this address to access all pretrained weights and logs: [[Google Drive]](https://drive.google.com/drive/folders/1fcg7n3Ga8cYoT-3Ar0PeQXjAC3AnQYyY?usp=sharing) 77 | 78 | BaiduPan download link: [BaiduPan, extraction code: yhwu](https://pan.baidu.com/s/1JkE62CS9EoSTLW1M1Ajmxw?pwd=yhwu) 79 | 80 | 81 | ### Instance Segmentation 82 | 83 | Evaluated on the COCO validation set. 84 | 85 | 86 | | Base Model | Variants | APb | APb@0.5 | APm | APm@0.5 | #Params (M) | GFLOPs | 87 | | :--------: | :-------: | :--: | :-----: | :--: | :-----: | :---------: | :------: | 88 | | Mask R-CNN | P2T-Tiny | 43.3 | 65.7 | 39.6 | 62.5 | 31.3 | 225 | 89 | | Mask R-CNN | P2T-Small | 45.5 | 67.7 | 41.4 | 64.6 | 43.7 | 279 | 90 | | Mask R-CNN | P2T-Base | 47.2 | 69.3 | 42.7 | 66.1 | 55.7 | 363 | 91 | | Mask R-CNN | P2T-Large | 48.3 | 70.2 | 43.5 | 67.3 | 74.0 | 467 | 92 | 93 | `APb` denotes the box AP metric, and `APm` denotes the mask AP metric.
94 | 95 | Use this address to access all pretrained weights and logs: [[Google Drive]](https://drive.google.com/drive/folders/1fcg7n3Ga8cYoT-3Ar0PeQXjAC3AnQYyY?usp=sharing) 96 | 97 | ### Train 98 | 99 | Use the following command to train `P2T-Small` with distributed training on 8 GPUs: 100 | 101 | ````bash 102 | python -m torch.distributed.launch --nproc_per_node=8 \ 103 | --master_port=$((RANDOM+10000)) --use_env main.py --data-path ${YOUR_DATA_PATH} --batch-size 128 --model p2t_small --drop-path 0.1 104 | # model names: --model p2t_tiny/p2t_small/p2t_base/p2t_large 105 | # with --drop-path 0.1/0.1/0.3/0.3, respectively 106 | # replace ${YOUR_DATA_PATH} with the path to your dataset, which should contain the train/ and val/ directories 107 | ```` 108 | 109 | ### Validate the performance 110 | 111 | Download the pretrained weights to the `pretrained` directory first. Then use the following command to validate the performance: 112 | 113 | ````bash 114 | python main.py --eval --resume pretrained/p2t_small.pth --model p2t_small 115 | ```` 116 | 117 | ### Citation 118 | 119 | If you are using the code/model/data provided here in a publication, please consider citing our work: 120 | 121 | ```` 122 | @ARTICLE{wu2022p2t, 123 | author={Wu, Yu-Huan and Liu, Yun and Zhan, Xin and Cheng, Ming-Ming}, 124 | journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 125 | title={{P2T}: Pyramid Pooling Transformer for Scene Understanding}, 126 | year={2022}, 127 | doi = {10.1109/tpami.2022.3202765}, 128 | } 129 | ```` 130 | 131 | ### Other Notes 132 | 133 | If you encounter any problems, please do not hesitate to contact us. 134 | Issues and discussions are welcome in the repository! 135 | You can also reach us by email: wuyuhuan@mail.nankai.edu.cn 136 | 137 | 138 | ### License 139 | 140 | This code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for non-commercial use only. Any commercial use requires formal permission first. 141 | 142 | -------------------------------------------------------------------------------- /datasets.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-present, Facebook, Inc. 2 | # All rights reserved.
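# This module builds the classification datasets and data transforms used by
# main.py: build_dataset() supports CIFAR-100, ImageNet-1K (optionally through
# the memcached-backed loader in mcloader/), and the iNaturalist 2018/2019
# annotation format, while build_transform() wraps timm's create_transform()
# for training and a standard resize + center-crop pipeline for evaluation.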
3 | import os 4 | import json 5 | 6 | from torchvision import datasets, transforms 7 | from torchvision.datasets.folder import ImageFolder, default_loader 8 | 9 | from timm.data.constants import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD 10 | from timm.data import create_transform 11 | from mcloader import ClassificationDataset 12 | 13 | 14 | class INatDataset(ImageFolder): 15 | def __init__(self, root, train=True, year=2018, transform=None, target_transform=None, 16 | category='name', loader=default_loader): 17 | self.transform = transform 18 | self.loader = loader 19 | self.target_transform = target_transform 20 | self.year = year 21 | # assert category in ['kingdom','phylum','class','order','supercategory','family','genus','name'] 22 | path_json = os.path.join(root, f'{"train" if train else "val"}{year}.json') 23 | with open(path_json) as json_file: 24 | data = json.load(json_file) 25 | 26 | with open(os.path.join(root, 'categories.json')) as json_file: 27 | data_catg = json.load(json_file) 28 | 29 | path_json_for_targeter = os.path.join(root, f"train{year}.json") 30 | 31 | with open(path_json_for_targeter) as json_file: 32 | data_for_targeter = json.load(json_file) 33 | 34 | targeter = {} 35 | indexer = 0 36 | for elem in data_for_targeter['annotations']: 37 | king = [] 38 | king.append(data_catg[int(elem['category_id'])][category]) 39 | if king[0] not in targeter.keys(): 40 | targeter[king[0]] = indexer 41 | indexer += 1 42 | self.nb_classes = len(targeter) 43 | 44 | self.samples = [] 45 | for elem in data['images']: 46 | cut = elem['file_name'].split('/') 47 | target_current = int(cut[2]) 48 | path_current = os.path.join(root, cut[0], cut[2], cut[3]) 49 | 50 | categors = data_catg[target_current] 51 | target_current_true = targeter[categors[category]] 52 | self.samples.append((path_current, target_current_true)) 53 | 54 | # __getitem__ and __len__ inherited from ImageFolder 55 | 56 | 57 | def build_dataset(is_train, args): 58 | transform = build_transform(is_train, args) 59 | 60 | if args.data_set == 'CIFAR': 61 | dataset = datasets.CIFAR100(args.data_path, train=is_train, transform=transform) 62 | nb_classes = 100 63 | elif args.data_set == 'IMNET': 64 | if not args.use_mcloader: 65 | root = os.path.join(args.data_path, 'train' if is_train else 'val') 66 | dataset = datasets.ImageFolder(root, transform=transform) 67 | else: 68 | dataset = ClassificationDataset( 69 | 'train' if is_train else 'val', 70 | pipeline=transform 71 | ) 72 | nb_classes = 1000 73 | elif args.data_set == 'INAT': 74 | dataset = INatDataset(args.data_path, train=is_train, year=2018, 75 | category=args.inat_category, transform=transform) 76 | nb_classes = dataset.nb_classes 77 | elif args.data_set == 'INAT19': 78 | dataset = INatDataset(args.data_path, train=is_train, year=2019, 79 | category=args.inat_category, transform=transform) 80 | nb_classes = dataset.nb_classes 81 | 82 | return dataset, nb_classes 83 | 84 | 85 | def build_transform(is_train, args): 86 | resize_im = args.input_size > 32 87 | if is_train: 88 | # this should always dispatch to transforms_imagenet_train 89 | transform = create_transform( 90 | input_size=args.input_size, 91 | is_training=True, 92 | color_jitter=args.color_jitter, 93 | auto_augment=args.aa, 94 | interpolation=args.train_interpolation, 95 | re_prob=args.reprob, 96 | re_mode=args.remode, 97 | re_count=args.recount, 98 | ) 99 | if not resize_im: 100 | # replace RandomResizedCropAndInterpolation with 101 | # RandomCrop 102 | transform.transforms[0] = transforms.RandomCrop( 103 | 
args.input_size, padding=4) 104 | return transform 105 | 106 | t = [] 107 | if resize_im: 108 | size = int((256 / 224) * args.input_size) 109 | t.append( 110 | transforms.Resize(size, interpolation=3), # to maintain same ratio w.r.t. 224 images 111 | ) 112 | t.append(transforms.CenterCrop(args.input_size)) 113 | 114 | t.append(transforms.ToTensor()) 115 | t.append(transforms.Normalize(IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD)) 116 | return transforms.Compose(t) 117 | -------------------------------------------------------------------------------- /detection/README.md: -------------------------------------------------------------------------------- 1 | ## [TPAMI22] Pyramid Pooling Transformer for Scene Understanding 2 | 3 | This folder contains the full training and test code for object detection and instance segmentation. 4 | 5 | ### Requirements 6 | 7 | * mmdetection == 2.14 8 | 9 | We originally trained each model with `mmdetection==2.8.0`. 10 | Since newer GPUs (e.g., the RTX 3000 series) would require compiling mmcv from source to support that early version, 11 | we have reorganized the configs to support a newer mmdetection version. 12 | Therefore, you can easily reproduce the results on newer GPUs. 13 | 14 | ### Data Preparation 15 | 16 | Put the MS COCO dataset files into `data/coco/`. 17 | 18 | ### Object Detection 19 | 20 | Evaluated on the COCO validation set. 21 | 22 | 23 | | Base Model | Variants | AP | AP@0.5 | AP@0.75 | #Params (M) | GFLOPs | 24 | | :--: | :-------: | :--: | :--: | :---------: | :------: | :----------------------------------------------------------: | 25 | | RetinaNet | P2T-Tiny | 41.3 | 62.0 | 44.1 | 21.1 | 206 | 26 | | RetinaNet | P2T-Small | 44.4 | 65.3 | 47.6 | 33.8 | 260 | 27 | | RetinaNet | P2T-Base | 46.1 | 67.5 | 49.6 | 45.8 | 344 | 28 | | RetinaNet | P2T-Large | 47.2 | 68.4 | 50.9 | 64.4 | 449 | 29 | 30 | Use this address to access all pretrained weights and logs: [[Google Drive]](https://drive.google.com/drive/folders/1fcg7n3Ga8cYoT-3Ar0PeQXjAC3AnQYyY?usp=sharing) 31 | 32 | ### Instance Segmentation 33 | 34 | Evaluated on the COCO validation set. 35 | 36 | 37 | | Base Model | Variants | APb | APb@0.5 | APm | APm@0.5 | #Params (M) | GFLOPs | 38 | | :--: | :-------: | :--: | :--: | :---------: | :------: | :----------------------------------------------------------: | :----------------------------------------------------------: | 39 | | Mask R-CNN | P2T-Tiny | 43.3 | 65.7 | 39.6 | 62.5 | 31.3 | 225 | 40 | | Mask R-CNN | P2T-Small | 45.5 | 67.7 | 41.4 | 64.6 | 43.7 | 279 | 41 | | Mask R-CNN | P2T-Base | 47.2 | 69.3 | 42.7 | 66.1 | 55.7 | 363 | 42 | | Mask R-CNN | P2T-Large | 48.3 | 70.2 | 43.5 | 67.3 | 74.0 | 467 | 43 | 44 | `APb` denotes the box AP metric, and `APm` denotes the mask AP metric. 45 | 46 | Use this address to access all pretrained weights and logs: [[Google Drive]](https://drive.google.com/drive/folders/1fcg7n3Ga8cYoT-3Ar0PeQXjAC3AnQYyY?usp=sharing) 47 | 48 | 49 | ### Train 50 | 51 | Before training, please make sure you have `mmdetection==2.14` installed and have downloaded the ImageNet-pretrained P2T weights from [[Google Drive]](https://drive.google.com/drive/folders/1Osweqc1OphwtWONXIgD20q9_I2arT9yz?usp=sharing) or 52 | [[BaiduPan, extraction code: yhwu]](https://pan.baidu.com/s/1JkE62CS9EoSTLW1M1Ajmxw?pwd=yhwu). 53 | Put them into the `pretrained/` folder. 54 | 55 | Use the following command to train `Mask R-CNN` with the `P2T-Tiny` backbone using distributed training on 8 GPUs: 56 | 57 | ```` 58 | bash dist_train.sh configs/mask_rcnn_p2t_t_fpn_1x_coco.py 8 59 | ```` 60 | 61 | Other configs are in the `configs` directory.
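The default schedules assume the total batch size implied above (8 GPUs with `samples_per_gpu=2`, as set in the `_base_` dataset configs). If you train with a different number of GPUs, a common mmdetection practice is to scale `optimizer.lr` linearly with the total batch size. Assuming this repository's `dist_train.sh` forwards extra arguments to `train.py` as the standard mmdetection launch script does, the override can be passed on the command line; the value below is purely illustrative and should be replaced with the appropriately scaled learning rate for your chosen config:

````
# e.g., 4 GPUs -> half the default total batch size -> halve the config's learning rate
bash dist_train.sh configs/mask_rcnn_p2t_t_fpn_1x_coco.py 4 --cfg-options optimizer.lr=1e-4
````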
62 | 63 | ### Validate 64 | 65 | Please download the pretrained models from [[Google Drive]](https://drive.google.com/drive/folders/1fcg7n3Ga8cYoT-3Ar0PeQXjAC3AnQYyY?usp=sharing) or [[BaiduPan, extraction code: yhwu]](https://pan.baidu.com/s/1JkE62CS9EoSTLW1M1Ajmxw?pwd=yhwu). Put them into the `pretrained` folder. 66 | Then, use the following command to validate `Mask R-CNN` with the `P2T-Tiny` backbone on a single GPU: 67 | 68 | ```` 69 | bash dist_test.sh configs/mask_rcnn_p2t_t_fpn_1x_coco.py pretrained/mask_rcnn_p2t_t_fpn_1x_coco-d875fa68.pth 1 70 | ```` 71 | 72 | 73 | ### Other Notes 74 | 75 | If you encounter any problems, please do not hesitate to contact us. 76 | Issues and discussions are welcome in the repository! 77 | You can also reach us by email: wuyuhuan@mail.nankai.edu.cn 78 | 79 | 80 | 81 | ### Citation 82 | 83 | If you are using the code/model/data provided here in a publication, please consider citing our work: 84 | 85 | ```` 86 | @ARTICLE{wu2022p2t, 87 | author={Wu, Yu-Huan and Liu, Yun and Zhan, Xin and Cheng, Ming-Ming}, 88 | journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 89 | title={{P2T}: Pyramid Pooling Transformer for Scene Understanding}, 90 | year={2022}, 91 | doi = {10.1109/tpami.2022.3202765}, 92 | } 93 | ```` 94 | 95 | ### License 96 | 97 | This code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for non-commercial use only. Any commercial use requires formal permission first. 98 | 99 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/cityscapes_detection.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CityscapesDataset' 3 | data_root = 'data/cityscapes/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True), 9 | dict( 10 | type='Resize', img_scale=[(2048, 800), (2048, 1024)], keep_ratio=True), 11 | dict(type='RandomFlip', flip_ratio=0.5), 12 | dict(type='Normalize', **img_norm_cfg), 13 | dict(type='Pad', size_divisor=32), 14 | dict(type='DefaultFormatBundle'), 15 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 16 | ] 17 | test_pipeline = [ 18 | dict(type='LoadImageFromFile'), 19 | dict( 20 | type='MultiScaleFlipAug', 21 | img_scale=(2048, 1024), 22 | flip=False, 23 | transforms=[ 24 | dict(type='Resize', keep_ratio=True), 25 | dict(type='RandomFlip'), 26 | dict(type='Normalize', **img_norm_cfg), 27 | dict(type='Pad', size_divisor=32), 28 | dict(type='ImageToTensor', keys=['img']), 29 | dict(type='Collect', keys=['img']), 30 | ]) 31 | ] 32 | data = dict( 33 | samples_per_gpu=1, 34 | workers_per_gpu=2, 35 | train=dict( 36 | type='RepeatDataset', 37 | times=8, 38 | dataset=dict( 39 | type=dataset_type, 40 | ann_file=data_root + 41 | 'annotations/instancesonly_filtered_gtFine_train.json', 42 | img_prefix=data_root + 'leftImg8bit/train/', 43 | pipeline=train_pipeline)), 44 | val=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 47 | 'annotations/instancesonly_filtered_gtFine_val.json', 48 | img_prefix=data_root + 'leftImg8bit/val/', 49 | pipeline=test_pipeline), 50 | test=dict( 51 | type=dataset_type, 52 | ann_file=data_root + 53 | 'annotations/instancesonly_filtered_gtFine_test.json', 54 | img_prefix=data_root + 'leftImg8bit/test/', 55 |
pipeline=test_pipeline)) 56 | evaluation = dict(interval=1, metric='bbox') 57 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/cityscapes_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CityscapesDataset' 3 | data_root = 'data/cityscapes/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 9 | dict( 10 | type='Resize', img_scale=[(2048, 800), (2048, 1024)], keep_ratio=True), 11 | dict(type='RandomFlip', flip_ratio=0.5), 12 | dict(type='Normalize', **img_norm_cfg), 13 | dict(type='Pad', size_divisor=32), 14 | dict(type='DefaultFormatBundle'), 15 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 16 | ] 17 | test_pipeline = [ 18 | dict(type='LoadImageFromFile'), 19 | dict( 20 | type='MultiScaleFlipAug', 21 | img_scale=(2048, 1024), 22 | flip=False, 23 | transforms=[ 24 | dict(type='Resize', keep_ratio=True), 25 | dict(type='RandomFlip'), 26 | dict(type='Normalize', **img_norm_cfg), 27 | dict(type='Pad', size_divisor=32), 28 | dict(type='ImageToTensor', keys=['img']), 29 | dict(type='Collect', keys=['img']), 30 | ]) 31 | ] 32 | data = dict( 33 | samples_per_gpu=1, 34 | workers_per_gpu=2, 35 | train=dict( 36 | type='RepeatDataset', 37 | times=8, 38 | dataset=dict( 39 | type=dataset_type, 40 | ann_file=data_root + 41 | 'annotations/instancesonly_filtered_gtFine_train.json', 42 | img_prefix=data_root + 'leftImg8bit/train/', 43 | pipeline=train_pipeline)), 44 | val=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 47 | 'annotations/instancesonly_filtered_gtFine_val.json', 48 | img_prefix=data_root + 'leftImg8bit/val/', 49 | pipeline=test_pipeline), 50 | test=dict( 51 | type=dataset_type, 52 | ann_file=data_root + 53 | 'annotations/instancesonly_filtered_gtFine_test.json', 54 | img_prefix=data_root + 'leftImg8bit/test/', 55 | pipeline=test_pipeline)) 56 | evaluation = dict(metric=['bbox', 'segm']) 57 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/coco_detection.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CocoDataset' 3 | data_root = 'data/coco/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True), 9 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(1333, 800), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | samples_per_gpu=2, 33 | workers_per_gpu=2, 34 | train=dict( 35 | type=dataset_type, 36 | 
ann_file=data_root + 'annotations/instances_train2017.json', 37 | img_prefix=data_root + 'train2017/', 38 | pipeline=train_pipeline), 39 | val=dict( 40 | type=dataset_type, 41 | ann_file=data_root + 'annotations/instances_val2017.json', 42 | img_prefix=data_root + 'val2017/', 43 | pipeline=test_pipeline), 44 | test=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 'annotations/instances_val2017.json', 47 | img_prefix=data_root + 'val2017/', 48 | pipeline=test_pipeline)) 49 | evaluation = dict(interval=1, metric='bbox') 50 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/coco_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CocoDataset' 3 | data_root = 'data/coco/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 9 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(1333, 800), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | samples_per_gpu=2, 33 | workers_per_gpu=2, 34 | train=dict( 35 | type=dataset_type, 36 | ann_file=data_root + 'annotations/instances_train2017.json', 37 | img_prefix=data_root + 'train2017/', 38 | pipeline=train_pipeline), 39 | val=dict( 40 | type=dataset_type, 41 | ann_file=data_root + 'annotations/instances_val2017.json', 42 | img_prefix=data_root + 'val2017/', 43 | pipeline=test_pipeline), 44 | test=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 'annotations/instances_val2017.json', 47 | img_prefix=data_root + 'val2017/', 48 | pipeline=test_pipeline)) 49 | evaluation = dict(metric=['bbox', 'segm']) 50 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/coco_instance_semantic.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CocoDataset' 3 | data_root = 'data/coco/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict( 9 | type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True), 10 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 11 | dict(type='RandomFlip', flip_ratio=0.5), 12 | dict(type='Normalize', **img_norm_cfg), 13 | dict(type='Pad', size_divisor=32), 14 | dict(type='SegRescale', scale_factor=1 / 8), 15 | dict(type='DefaultFormatBundle'), 16 | dict( 17 | type='Collect', 18 | keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']), 19 | ] 20 | test_pipeline = [ 21 | dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=(1333, 800), 25 | 
flip=False, 26 | transforms=[ 27 | dict(type='Resize', keep_ratio=True), 28 | dict(type='RandomFlip', flip_ratio=0.5), 29 | dict(type='Normalize', **img_norm_cfg), 30 | dict(type='Pad', size_divisor=32), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']), 33 | ]) 34 | ] 35 | data = dict( 36 | samples_per_gpu=2, 37 | workers_per_gpu=2, 38 | train=dict( 39 | type=dataset_type, 40 | ann_file=data_root + 'annotations/instances_train2017.json', 41 | img_prefix=data_root + 'train2017/', 42 | seg_prefix=data_root + 'stuffthingmaps/train2017/', 43 | pipeline=train_pipeline), 44 | val=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 'annotations/instances_val2017.json', 47 | img_prefix=data_root + 'val2017/', 48 | pipeline=test_pipeline), 49 | test=dict( 50 | type=dataset_type, 51 | ann_file=data_root + 'annotations/instances_val2017.json', 52 | img_prefix=data_root + 'val2017/', 53 | pipeline=test_pipeline)) 54 | evaluation = dict(metric=['bbox', 'segm']) 55 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/deepfashion.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'DeepFashionDataset' 3 | data_root = 'data/DeepFashion/In-shop/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 9 | dict(type='Resize', img_scale=(750, 1101), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(750, 1101), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | imgs_per_gpu=2, 33 | workers_per_gpu=1, 34 | train=dict( 35 | type=dataset_type, 36 | ann_file=data_root + 'annotations/DeepFashion_segmentation_query.json', 37 | img_prefix=data_root + 'Img/', 38 | pipeline=train_pipeline, 39 | data_root=data_root), 40 | val=dict( 41 | type=dataset_type, 42 | ann_file=data_root + 'annotations/DeepFashion_segmentation_query.json', 43 | img_prefix=data_root + 'Img/', 44 | pipeline=test_pipeline, 45 | data_root=data_root), 46 | test=dict( 47 | type=dataset_type, 48 | ann_file=data_root + 49 | 'annotations/DeepFashion_segmentation_gallery.json', 50 | img_prefix=data_root + 'Img/', 51 | pipeline=test_pipeline, 52 | data_root=data_root)) 53 | evaluation = dict(interval=5, metric=['bbox', 'segm']) 54 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/lvis_v0.5_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | _base_ = 'coco_instance.py' 3 | dataset_type = 'LVISV05Dataset' 4 | data_root = 'data/lvis_v0.5/' 5 | data = dict( 6 | samples_per_gpu=2, 7 | workers_per_gpu=2, 8 | train=dict( 9 | _delete_=True, 10 | type='ClassBalancedDataset', 11 | 
oversample_thr=1e-3, 12 | dataset=dict( 13 | type=dataset_type, 14 | ann_file=data_root + 'annotations/lvis_v0.5_train.json', 15 | img_prefix=data_root + 'train2017/')), 16 | val=dict( 17 | type=dataset_type, 18 | ann_file=data_root + 'annotations/lvis_v0.5_val.json', 19 | img_prefix=data_root + 'val2017/'), 20 | test=dict( 21 | type=dataset_type, 22 | ann_file=data_root + 'annotations/lvis_v0.5_val.json', 23 | img_prefix=data_root + 'val2017/')) 24 | evaluation = dict(metric=['bbox', 'segm']) 25 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/lvis_v1_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | _base_ = 'coco_instance.py' 3 | dataset_type = 'LVISV1Dataset' 4 | data_root = 'data/lvis_v1/' 5 | data = dict( 6 | samples_per_gpu=2, 7 | workers_per_gpu=2, 8 | train=dict( 9 | _delete_=True, 10 | type='ClassBalancedDataset', 11 | oversample_thr=1e-3, 12 | dataset=dict( 13 | type=dataset_type, 14 | ann_file=data_root + 'annotations/lvis_v1_train.json', 15 | img_prefix=data_root)), 16 | val=dict( 17 | type=dataset_type, 18 | ann_file=data_root + 'annotations/lvis_v1_val.json', 19 | img_prefix=data_root), 20 | test=dict( 21 | type=dataset_type, 22 | ann_file=data_root + 'annotations/lvis_v1_val.json', 23 | img_prefix=data_root)) 24 | evaluation = dict(metric=['bbox', 'segm']) 25 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/voc0712.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'VOCDataset' 3 | data_root = 'data/VOCdevkit/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True), 9 | dict(type='Resize', img_scale=(1000, 600), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(1000, 600), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | samples_per_gpu=2, 33 | workers_per_gpu=2, 34 | train=dict( 35 | type='RepeatDataset', 36 | times=3, 37 | dataset=dict( 38 | type=dataset_type, 39 | ann_file=[ 40 | data_root + 'VOC2007/ImageSets/Main/trainval.txt', 41 | data_root + 'VOC2012/ImageSets/Main/trainval.txt' 42 | ], 43 | img_prefix=[data_root + 'VOC2007/', data_root + 'VOC2012/'], 44 | pipeline=train_pipeline)), 45 | val=dict( 46 | type=dataset_type, 47 | ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt', 48 | img_prefix=data_root + 'VOC2007/', 49 | pipeline=test_pipeline), 50 | test=dict( 51 | type=dataset_type, 52 | ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt', 53 | img_prefix=data_root + 'VOC2007/', 54 | pipeline=test_pipeline)) 55 | evaluation = dict(interval=1, metric='mAP') 56 | 
-------------------------------------------------------------------------------- /detection/configs/_base_/datasets/wider_face.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'WIDERFaceDataset' 3 | data_root = 'data/WIDERFace/' 4 | img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True) 5 | train_pipeline = [ 6 | dict(type='LoadImageFromFile', to_float32=True), 7 | dict(type='LoadAnnotations', with_bbox=True), 8 | dict( 9 | type='PhotoMetricDistortion', 10 | brightness_delta=32, 11 | contrast_range=(0.5, 1.5), 12 | saturation_range=(0.5, 1.5), 13 | hue_delta=18), 14 | dict( 15 | type='Expand', 16 | mean=img_norm_cfg['mean'], 17 | to_rgb=img_norm_cfg['to_rgb'], 18 | ratio_range=(1, 4)), 19 | dict( 20 | type='MinIoURandomCrop', 21 | min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), 22 | min_crop_size=0.3), 23 | dict(type='Resize', img_scale=(300, 300), keep_ratio=False), 24 | dict(type='Normalize', **img_norm_cfg), 25 | dict(type='RandomFlip', flip_ratio=0.5), 26 | dict(type='DefaultFormatBundle'), 27 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 28 | ] 29 | test_pipeline = [ 30 | dict(type='LoadImageFromFile'), 31 | dict( 32 | type='MultiScaleFlipAug', 33 | img_scale=(300, 300), 34 | flip=False, 35 | transforms=[ 36 | dict(type='Resize', keep_ratio=False), 37 | dict(type='Normalize', **img_norm_cfg), 38 | dict(type='ImageToTensor', keys=['img']), 39 | dict(type='Collect', keys=['img']), 40 | ]) 41 | ] 42 | data = dict( 43 | samples_per_gpu=60, 44 | workers_per_gpu=2, 45 | train=dict( 46 | type='RepeatDataset', 47 | times=2, 48 | dataset=dict( 49 | type=dataset_type, 50 | ann_file=data_root + 'train.txt', 51 | img_prefix=data_root + 'WIDER_train/', 52 | min_size=17, 53 | pipeline=train_pipeline)), 54 | val=dict( 55 | type=dataset_type, 56 | ann_file=data_root + 'val.txt', 57 | img_prefix=data_root + 'WIDER_val/', 58 | pipeline=test_pipeline), 59 | test=dict( 60 | type=dataset_type, 61 | ann_file=data_root + 'val.txt', 62 | img_prefix=data_root + 'WIDER_val/', 63 | pipeline=test_pipeline)) 64 | -------------------------------------------------------------------------------- /detection/configs/_base_/default_runtime.py: -------------------------------------------------------------------------------- 1 | checkpoint_config = dict(interval=1) 2 | # yapf:disable 3 | log_config = dict( 4 | interval=50, 5 | hooks=[ 6 | dict(type='TextLoggerHook'), 7 | # dict(type='TensorboardLoggerHook') 8 | ]) 9 | # yapf:enable 10 | custom_hooks = [dict(type='NumClassCheckHook')] 11 | 12 | dist_params = dict(backend='nccl') 13 | log_level = 'INFO' 14 | load_from = None 15 | resume_from = None 16 | workflow = [('train', 1)] 17 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/cascade_mask_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='CascadeRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | 
anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), 35 | roi_head=dict( 36 | type='CascadeRoIHead', 37 | num_stages=3, 38 | stage_loss_weights=[1, 0.5, 0.25], 39 | bbox_roi_extractor=dict( 40 | type='SingleRoIExtractor', 41 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 42 | out_channels=256, 43 | featmap_strides=[4, 8, 16, 32]), 44 | bbox_head=[ 45 | dict( 46 | type='Shared2FCBBoxHead', 47 | in_channels=256, 48 | fc_out_channels=1024, 49 | roi_feat_size=7, 50 | num_classes=80, 51 | bbox_coder=dict( 52 | type='DeltaXYWHBBoxCoder', 53 | target_means=[0., 0., 0., 0.], 54 | target_stds=[0.1, 0.1, 0.2, 0.2]), 55 | reg_class_agnostic=True, 56 | loss_cls=dict( 57 | type='CrossEntropyLoss', 58 | use_sigmoid=False, 59 | loss_weight=1.0), 60 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, 61 | loss_weight=1.0)), 62 | dict( 63 | type='Shared2FCBBoxHead', 64 | in_channels=256, 65 | fc_out_channels=1024, 66 | roi_feat_size=7, 67 | num_classes=80, 68 | bbox_coder=dict( 69 | type='DeltaXYWHBBoxCoder', 70 | target_means=[0., 0., 0., 0.], 71 | target_stds=[0.05, 0.05, 0.1, 0.1]), 72 | reg_class_agnostic=True, 73 | loss_cls=dict( 74 | type='CrossEntropyLoss', 75 | use_sigmoid=False, 76 | loss_weight=1.0), 77 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, 78 | loss_weight=1.0)), 79 | dict( 80 | type='Shared2FCBBoxHead', 81 | in_channels=256, 82 | fc_out_channels=1024, 83 | roi_feat_size=7, 84 | num_classes=80, 85 | bbox_coder=dict( 86 | type='DeltaXYWHBBoxCoder', 87 | target_means=[0., 0., 0., 0.], 88 | target_stds=[0.033, 0.033, 0.067, 0.067]), 89 | reg_class_agnostic=True, 90 | loss_cls=dict( 91 | type='CrossEntropyLoss', 92 | use_sigmoid=False, 93 | loss_weight=1.0), 94 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) 95 | ], 96 | mask_roi_extractor=dict( 97 | type='SingleRoIExtractor', 98 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 99 | out_channels=256, 100 | featmap_strides=[4, 8, 16, 32]), 101 | mask_head=dict( 102 | type='FCNMaskHead', 103 | num_convs=4, 104 | in_channels=256, 105 | conv_out_channels=256, 106 | num_classes=80, 107 | loss_mask=dict( 108 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), 109 | # model training and testing settings 110 | train_cfg=dict( 111 | rpn=dict( 112 | assigner=dict( 113 | type='MaxIoUAssigner', 114 | pos_iou_thr=0.7, 115 | neg_iou_thr=0.3, 116 | min_pos_iou=0.3, 117 | match_low_quality=True, 118 | ignore_iof_thr=-1), 119 | sampler=dict( 120 | type='RandomSampler', 121 | num=256, 122 | pos_fraction=0.5, 123 | neg_pos_ub=-1, 124 | add_gt_as_proposals=False), 125 | allowed_border=0, 126 | pos_weight=-1, 127 | debug=False), 128 | rpn_proposal=dict( 129 | nms_pre=2000, 130 | max_per_img=2000, 131 | nms=dict(type='nms', iou_threshold=0.7), 132 | min_bbox_size=0), 133 | rcnn=[ 134 | dict( 135 | assigner=dict( 136 | type='MaxIoUAssigner', 137 | pos_iou_thr=0.5, 138 | neg_iou_thr=0.5, 139 | min_pos_iou=0.5, 140 | match_low_quality=False, 141 | ignore_iof_thr=-1), 142 | sampler=dict( 143 | type='RandomSampler', 144 | num=512, 145 | pos_fraction=0.25, 146 | neg_pos_ub=-1, 147 | add_gt_as_proposals=True), 148 | mask_size=28, 149 | 
pos_weight=-1, 150 | debug=False), 151 | dict( 152 | assigner=dict( 153 | type='MaxIoUAssigner', 154 | pos_iou_thr=0.6, 155 | neg_iou_thr=0.6, 156 | min_pos_iou=0.6, 157 | match_low_quality=False, 158 | ignore_iof_thr=-1), 159 | sampler=dict( 160 | type='RandomSampler', 161 | num=512, 162 | pos_fraction=0.25, 163 | neg_pos_ub=-1, 164 | add_gt_as_proposals=True), 165 | mask_size=28, 166 | pos_weight=-1, 167 | debug=False), 168 | dict( 169 | assigner=dict( 170 | type='MaxIoUAssigner', 171 | pos_iou_thr=0.7, 172 | neg_iou_thr=0.7, 173 | min_pos_iou=0.7, 174 | match_low_quality=False, 175 | ignore_iof_thr=-1), 176 | sampler=dict( 177 | type='RandomSampler', 178 | num=512, 179 | pos_fraction=0.25, 180 | neg_pos_ub=-1, 181 | add_gt_as_proposals=True), 182 | mask_size=28, 183 | pos_weight=-1, 184 | debug=False) 185 | ]), 186 | test_cfg=dict( 187 | rpn=dict( 188 | nms_pre=1000, 189 | max_per_img=1000, 190 | nms=dict(type='nms', iou_threshold=0.7), 191 | min_bbox_size=0), 192 | rcnn=dict( 193 | score_thr=0.05, 194 | nms=dict(type='nms', iou_threshold=0.5), 195 | max_per_img=100, 196 | mask_thr_binary=0.5))) 197 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/cascade_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='CascadeRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), 35 | roi_head=dict( 36 | type='CascadeRoIHead', 37 | num_stages=3, 38 | stage_loss_weights=[1, 0.5, 0.25], 39 | bbox_roi_extractor=dict( 40 | type='SingleRoIExtractor', 41 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 42 | out_channels=256, 43 | featmap_strides=[4, 8, 16, 32]), 44 | bbox_head=[ 45 | dict( 46 | type='Shared2FCBBoxHead', 47 | in_channels=256, 48 | fc_out_channels=1024, 49 | roi_feat_size=7, 50 | num_classes=80, 51 | bbox_coder=dict( 52 | type='DeltaXYWHBBoxCoder', 53 | target_means=[0., 0., 0., 0.], 54 | target_stds=[0.1, 0.1, 0.2, 0.2]), 55 | reg_class_agnostic=True, 56 | loss_cls=dict( 57 | type='CrossEntropyLoss', 58 | use_sigmoid=False, 59 | loss_weight=1.0), 60 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, 61 | loss_weight=1.0)), 62 | dict( 63 | type='Shared2FCBBoxHead', 64 | in_channels=256, 65 | fc_out_channels=1024, 66 | roi_feat_size=7, 67 | num_classes=80, 68 | bbox_coder=dict( 69 | type='DeltaXYWHBBoxCoder', 70 | target_means=[0., 0., 0., 0.], 71 | target_stds=[0.05, 0.05, 0.1, 0.1]), 72 | reg_class_agnostic=True, 73 | loss_cls=dict( 74 | type='CrossEntropyLoss', 75 | use_sigmoid=False, 76 | loss_weight=1.0), 77 | loss_bbox=dict(type='SmoothL1Loss', 
beta=1.0, 78 | loss_weight=1.0)), 79 | dict( 80 | type='Shared2FCBBoxHead', 81 | in_channels=256, 82 | fc_out_channels=1024, 83 | roi_feat_size=7, 84 | num_classes=80, 85 | bbox_coder=dict( 86 | type='DeltaXYWHBBoxCoder', 87 | target_means=[0., 0., 0., 0.], 88 | target_stds=[0.033, 0.033, 0.067, 0.067]), 89 | reg_class_agnostic=True, 90 | loss_cls=dict( 91 | type='CrossEntropyLoss', 92 | use_sigmoid=False, 93 | loss_weight=1.0), 94 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) 95 | ]), 96 | # model training and testing settings 97 | train_cfg=dict( 98 | rpn=dict( 99 | assigner=dict( 100 | type='MaxIoUAssigner', 101 | pos_iou_thr=0.7, 102 | neg_iou_thr=0.3, 103 | min_pos_iou=0.3, 104 | match_low_quality=True, 105 | ignore_iof_thr=-1), 106 | sampler=dict( 107 | type='RandomSampler', 108 | num=256, 109 | pos_fraction=0.5, 110 | neg_pos_ub=-1, 111 | add_gt_as_proposals=False), 112 | allowed_border=0, 113 | pos_weight=-1, 114 | debug=False), 115 | rpn_proposal=dict( 116 | nms_pre=2000, 117 | max_per_img=2000, 118 | nms=dict(type='nms', iou_threshold=0.7), 119 | min_bbox_size=0), 120 | rcnn=[ 121 | dict( 122 | assigner=dict( 123 | type='MaxIoUAssigner', 124 | pos_iou_thr=0.5, 125 | neg_iou_thr=0.5, 126 | min_pos_iou=0.5, 127 | match_low_quality=False, 128 | ignore_iof_thr=-1), 129 | sampler=dict( 130 | type='RandomSampler', 131 | num=512, 132 | pos_fraction=0.25, 133 | neg_pos_ub=-1, 134 | add_gt_as_proposals=True), 135 | pos_weight=-1, 136 | debug=False), 137 | dict( 138 | assigner=dict( 139 | type='MaxIoUAssigner', 140 | pos_iou_thr=0.6, 141 | neg_iou_thr=0.6, 142 | min_pos_iou=0.6, 143 | match_low_quality=False, 144 | ignore_iof_thr=-1), 145 | sampler=dict( 146 | type='RandomSampler', 147 | num=512, 148 | pos_fraction=0.25, 149 | neg_pos_ub=-1, 150 | add_gt_as_proposals=True), 151 | pos_weight=-1, 152 | debug=False), 153 | dict( 154 | assigner=dict( 155 | type='MaxIoUAssigner', 156 | pos_iou_thr=0.7, 157 | neg_iou_thr=0.7, 158 | min_pos_iou=0.7, 159 | match_low_quality=False, 160 | ignore_iof_thr=-1), 161 | sampler=dict( 162 | type='RandomSampler', 163 | num=512, 164 | pos_fraction=0.25, 165 | neg_pos_ub=-1, 166 | add_gt_as_proposals=True), 167 | pos_weight=-1, 168 | debug=False) 169 | ]), 170 | test_cfg=dict( 171 | rpn=dict( 172 | nms_pre=1000, 173 | max_per_img=1000, 174 | nms=dict(type='nms', iou_threshold=0.7), 175 | min_bbox_size=0), 176 | rcnn=dict( 177 | score_thr=0.05, 178 | nms=dict(type='nms', iou_threshold=0.5), 179 | max_per_img=100))) 180 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/fast_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='FastRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | roi_head=dict( 20 | type='StandardRoIHead', 21 | bbox_roi_extractor=dict( 22 | type='SingleRoIExtractor', 23 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 24 | out_channels=256, 25 | featmap_strides=[4, 8, 16, 32]), 26 | bbox_head=dict( 27 | type='Shared2FCBBoxHead', 28 | in_channels=256, 29 | fc_out_channels=1024, 30 | 
roi_feat_size=7, 31 | num_classes=80, 32 | bbox_coder=dict( 33 | type='DeltaXYWHBBoxCoder', 34 | target_means=[0., 0., 0., 0.], 35 | target_stds=[0.1, 0.1, 0.2, 0.2]), 36 | reg_class_agnostic=False, 37 | loss_cls=dict( 38 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 39 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 40 | # model training and testing settings 41 | train_cfg=dict( 42 | rcnn=dict( 43 | assigner=dict( 44 | type='MaxIoUAssigner', 45 | pos_iou_thr=0.5, 46 | neg_iou_thr=0.5, 47 | min_pos_iou=0.5, 48 | match_low_quality=False, 49 | ignore_iof_thr=-1), 50 | sampler=dict( 51 | type='RandomSampler', 52 | num=512, 53 | pos_fraction=0.25, 54 | neg_pos_ub=-1, 55 | add_gt_as_proposals=True), 56 | pos_weight=-1, 57 | debug=False)), 58 | test_cfg=dict( 59 | rcnn=dict( 60 | score_thr=0.05, 61 | nms=dict(type='nms', iou_threshold=0.5), 62 | max_per_img=100))) 63 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/faster_rcnn_r50_caffe_c4.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='BN', requires_grad=False) 3 | model = dict( 4 | type='FasterRCNN', 5 | backbone=dict( 6 | type='ResNet', 7 | depth=50, 8 | num_stages=3, 9 | strides=(1, 2, 2), 10 | dilations=(1, 1, 1), 11 | out_indices=(2, ), 12 | frozen_stages=1, 13 | norm_cfg=norm_cfg, 14 | norm_eval=True, 15 | style='caffe', 16 | init_cfg=dict( 17 | type='Pretrained', 18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=1024, 22 | feat_channels=1024, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | shared_head=dict( 38 | type='ResLayer', 39 | depth=50, 40 | stage=3, 41 | stride=2, 42 | dilation=1, 43 | style='caffe', 44 | norm_cfg=norm_cfg, 45 | norm_eval=True), 46 | bbox_roi_extractor=dict( 47 | type='SingleRoIExtractor', 48 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 49 | out_channels=1024, 50 | featmap_strides=[16]), 51 | bbox_head=dict( 52 | type='BBoxHead', 53 | with_avg_pool=True, 54 | roi_feat_size=7, 55 | in_channels=2048, 56 | num_classes=80, 57 | bbox_coder=dict( 58 | type='DeltaXYWHBBoxCoder', 59 | target_means=[0., 0., 0., 0.], 60 | target_stds=[0.1, 0.1, 0.2, 0.2]), 61 | reg_class_agnostic=False, 62 | loss_cls=dict( 63 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 64 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 65 | # model training and testing settings 66 | train_cfg=dict( 67 | rpn=dict( 68 | assigner=dict( 69 | type='MaxIoUAssigner', 70 | pos_iou_thr=0.7, 71 | neg_iou_thr=0.3, 72 | min_pos_iou=0.3, 73 | match_low_quality=True, 74 | ignore_iof_thr=-1), 75 | sampler=dict( 76 | type='RandomSampler', 77 | num=256, 78 | pos_fraction=0.5, 79 | neg_pos_ub=-1, 80 | add_gt_as_proposals=False), 81 | allowed_border=0, 82 | pos_weight=-1, 83 | debug=False), 84 | rpn_proposal=dict( 85 | nms_pre=12000, 86 | max_per_img=2000, 87 | nms=dict(type='nms', iou_threshold=0.7), 88 | min_bbox_size=0), 89 | rcnn=dict( 90 | assigner=dict( 91 | type='MaxIoUAssigner', 
92 | pos_iou_thr=0.5, 93 | neg_iou_thr=0.5, 94 | min_pos_iou=0.5, 95 | match_low_quality=False, 96 | ignore_iof_thr=-1), 97 | sampler=dict( 98 | type='RandomSampler', 99 | num=512, 100 | pos_fraction=0.25, 101 | neg_pos_ub=-1, 102 | add_gt_as_proposals=True), 103 | pos_weight=-1, 104 | debug=False)), 105 | test_cfg=dict( 106 | rpn=dict( 107 | nms_pre=6000, 108 | max_per_img=1000, 109 | nms=dict(type='nms', iou_threshold=0.7), 110 | min_bbox_size=0), 111 | rcnn=dict( 112 | score_thr=0.05, 113 | nms=dict(type='nms', iou_threshold=0.5), 114 | max_per_img=100))) 115 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/faster_rcnn_r50_caffe_dc5.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='BN', requires_grad=False) 3 | model = dict( 4 | type='FasterRCNN', 5 | backbone=dict( 6 | type='ResNet', 7 | depth=50, 8 | num_stages=4, 9 | strides=(1, 2, 2, 1), 10 | dilations=(1, 1, 1, 2), 11 | out_indices=(3, ), 12 | frozen_stages=1, 13 | norm_cfg=norm_cfg, 14 | norm_eval=True, 15 | style='caffe', 16 | init_cfg=dict( 17 | type='Pretrained', 18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=2048, 22 | feat_channels=2048, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | bbox_roi_extractor=dict( 38 | type='SingleRoIExtractor', 39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 40 | out_channels=2048, 41 | featmap_strides=[16]), 42 | bbox_head=dict( 43 | type='Shared2FCBBoxHead', 44 | in_channels=2048, 45 | fc_out_channels=1024, 46 | roi_feat_size=7, 47 | num_classes=80, 48 | bbox_coder=dict( 49 | type='DeltaXYWHBBoxCoder', 50 | target_means=[0., 0., 0., 0.], 51 | target_stds=[0.1, 0.1, 0.2, 0.2]), 52 | reg_class_agnostic=False, 53 | loss_cls=dict( 54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 56 | # model training and testing settings 57 | train_cfg=dict( 58 | rpn=dict( 59 | assigner=dict( 60 | type='MaxIoUAssigner', 61 | pos_iou_thr=0.7, 62 | neg_iou_thr=0.3, 63 | min_pos_iou=0.3, 64 | match_low_quality=True, 65 | ignore_iof_thr=-1), 66 | sampler=dict( 67 | type='RandomSampler', 68 | num=256, 69 | pos_fraction=0.5, 70 | neg_pos_ub=-1, 71 | add_gt_as_proposals=False), 72 | allowed_border=0, 73 | pos_weight=-1, 74 | debug=False), 75 | rpn_proposal=dict( 76 | nms_pre=12000, 77 | max_per_img=2000, 78 | nms=dict(type='nms', iou_threshold=0.7), 79 | min_bbox_size=0), 80 | rcnn=dict( 81 | assigner=dict( 82 | type='MaxIoUAssigner', 83 | pos_iou_thr=0.5, 84 | neg_iou_thr=0.5, 85 | min_pos_iou=0.5, 86 | match_low_quality=False, 87 | ignore_iof_thr=-1), 88 | sampler=dict( 89 | type='RandomSampler', 90 | num=512, 91 | pos_fraction=0.25, 92 | neg_pos_ub=-1, 93 | add_gt_as_proposals=True), 94 | pos_weight=-1, 95 | debug=False)), 96 | test_cfg=dict( 97 | rpn=dict( 98 | nms=dict(type='nms', iou_threshold=0.7), 99 | nms_pre=6000, 100 | max_per_img=1000, 101 | min_bbox_size=0), 102 | rcnn=dict( 103 | 
score_thr=0.05, 104 | nms=dict(type='nms', iou_threshold=0.5), 105 | max_per_img=100))) 106 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/faster_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='FasterRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | bbox_roi_extractor=dict( 38 | type='SingleRoIExtractor', 39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 40 | out_channels=256, 41 | featmap_strides=[4, 8, 16, 32]), 42 | bbox_head=dict( 43 | type='Shared2FCBBoxHead', 44 | in_channels=256, 45 | fc_out_channels=1024, 46 | roi_feat_size=7, 47 | num_classes=80, 48 | bbox_coder=dict( 49 | type='DeltaXYWHBBoxCoder', 50 | target_means=[0., 0., 0., 0.], 51 | target_stds=[0.1, 0.1, 0.2, 0.2]), 52 | reg_class_agnostic=False, 53 | loss_cls=dict( 54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 56 | # model training and testing settings 57 | train_cfg=dict( 58 | rpn=dict( 59 | assigner=dict( 60 | type='MaxIoUAssigner', 61 | pos_iou_thr=0.7, 62 | neg_iou_thr=0.3, 63 | min_pos_iou=0.3, 64 | match_low_quality=True, 65 | ignore_iof_thr=-1), 66 | sampler=dict( 67 | type='RandomSampler', 68 | num=256, 69 | pos_fraction=0.5, 70 | neg_pos_ub=-1, 71 | add_gt_as_proposals=False), 72 | allowed_border=-1, 73 | pos_weight=-1, 74 | debug=False), 75 | rpn_proposal=dict( 76 | nms_pre=2000, 77 | max_per_img=1000, 78 | nms=dict(type='nms', iou_threshold=0.7), 79 | min_bbox_size=0), 80 | rcnn=dict( 81 | assigner=dict( 82 | type='MaxIoUAssigner', 83 | pos_iou_thr=0.5, 84 | neg_iou_thr=0.5, 85 | min_pos_iou=0.5, 86 | match_low_quality=False, 87 | ignore_iof_thr=-1), 88 | sampler=dict( 89 | type='RandomSampler', 90 | num=512, 91 | pos_fraction=0.25, 92 | neg_pos_ub=-1, 93 | add_gt_as_proposals=True), 94 | pos_weight=-1, 95 | debug=False)), 96 | test_cfg=dict( 97 | rpn=dict( 98 | nms_pre=1000, 99 | max_per_img=1000, 100 | nms=dict(type='nms', iou_threshold=0.7), 101 | min_bbox_size=0), 102 | rcnn=dict( 103 | score_thr=0.05, 104 | nms=dict(type='nms', iou_threshold=0.5), 105 | max_per_img=100) 106 | # soft-nms is also supported for rcnn testing 107 | # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05) 108 | )) 109 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/mask_rcnn_r50_caffe_c4.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | 
norm_cfg = dict(type='BN', requires_grad=False) 3 | model = dict( 4 | type='MaskRCNN', 5 | backbone=dict( 6 | type='ResNet', 7 | depth=50, 8 | num_stages=3, 9 | strides=(1, 2, 2), 10 | dilations=(1, 1, 1), 11 | out_indices=(2, ), 12 | frozen_stages=1, 13 | norm_cfg=norm_cfg, 14 | norm_eval=True, 15 | style='caffe', 16 | init_cfg=dict( 17 | type='Pretrained', 18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=1024, 22 | feat_channels=1024, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | shared_head=dict( 38 | type='ResLayer', 39 | depth=50, 40 | stage=3, 41 | stride=2, 42 | dilation=1, 43 | style='caffe', 44 | norm_cfg=norm_cfg, 45 | norm_eval=True), 46 | bbox_roi_extractor=dict( 47 | type='SingleRoIExtractor', 48 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 49 | out_channels=1024, 50 | featmap_strides=[16]), 51 | bbox_head=dict( 52 | type='BBoxHead', 53 | with_avg_pool=True, 54 | roi_feat_size=7, 55 | in_channels=2048, 56 | num_classes=80, 57 | bbox_coder=dict( 58 | type='DeltaXYWHBBoxCoder', 59 | target_means=[0., 0., 0., 0.], 60 | target_stds=[0.1, 0.1, 0.2, 0.2]), 61 | reg_class_agnostic=False, 62 | loss_cls=dict( 63 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 64 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 65 | mask_roi_extractor=None, 66 | mask_head=dict( 67 | type='FCNMaskHead', 68 | num_convs=0, 69 | in_channels=2048, 70 | conv_out_channels=256, 71 | num_classes=80, 72 | loss_mask=dict( 73 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), 74 | # model training and testing settings 75 | train_cfg=dict( 76 | rpn=dict( 77 | assigner=dict( 78 | type='MaxIoUAssigner', 79 | pos_iou_thr=0.7, 80 | neg_iou_thr=0.3, 81 | min_pos_iou=0.3, 82 | match_low_quality=True, 83 | ignore_iof_thr=-1), 84 | sampler=dict( 85 | type='RandomSampler', 86 | num=256, 87 | pos_fraction=0.5, 88 | neg_pos_ub=-1, 89 | add_gt_as_proposals=False), 90 | allowed_border=0, 91 | pos_weight=-1, 92 | debug=False), 93 | rpn_proposal=dict( 94 | nms_pre=12000, 95 | max_per_img=2000, 96 | nms=dict(type='nms', iou_threshold=0.7), 97 | min_bbox_size=0), 98 | rcnn=dict( 99 | assigner=dict( 100 | type='MaxIoUAssigner', 101 | pos_iou_thr=0.5, 102 | neg_iou_thr=0.5, 103 | min_pos_iou=0.5, 104 | match_low_quality=False, 105 | ignore_iof_thr=-1), 106 | sampler=dict( 107 | type='RandomSampler', 108 | num=512, 109 | pos_fraction=0.25, 110 | neg_pos_ub=-1, 111 | add_gt_as_proposals=True), 112 | mask_size=14, 113 | pos_weight=-1, 114 | debug=False)), 115 | test_cfg=dict( 116 | rpn=dict( 117 | nms_pre=6000, 118 | nms=dict(type='nms', iou_threshold=0.7), 119 | max_per_img=1000, 120 | min_bbox_size=0), 121 | rcnn=dict( 122 | score_thr=0.05, 123 | nms=dict(type='nms', iou_threshold=0.5), 124 | max_per_img=100, 125 | mask_thr_binary=0.5))) 126 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/mask_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | 
type='MaskRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | bbox_roi_extractor=dict( 38 | type='SingleRoIExtractor', 39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 40 | out_channels=256, 41 | featmap_strides=[4, 8, 16, 32]), 42 | bbox_head=dict( 43 | type='Shared2FCBBoxHead', 44 | in_channels=256, 45 | fc_out_channels=1024, 46 | roi_feat_size=7, 47 | num_classes=80, 48 | bbox_coder=dict( 49 | type='DeltaXYWHBBoxCoder', 50 | target_means=[0., 0., 0., 0.], 51 | target_stds=[0.1, 0.1, 0.2, 0.2]), 52 | reg_class_agnostic=False, 53 | loss_cls=dict( 54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 56 | mask_roi_extractor=dict( 57 | type='SingleRoIExtractor', 58 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 59 | out_channels=256, 60 | featmap_strides=[4, 8, 16, 32]), 61 | mask_head=dict( 62 | type='FCNMaskHead', 63 | num_convs=4, 64 | in_channels=256, 65 | conv_out_channels=256, 66 | num_classes=80, 67 | loss_mask=dict( 68 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), 69 | # model training and testing settings 70 | train_cfg=dict( 71 | rpn=dict( 72 | assigner=dict( 73 | type='MaxIoUAssigner', 74 | pos_iou_thr=0.7, 75 | neg_iou_thr=0.3, 76 | min_pos_iou=0.3, 77 | match_low_quality=True, 78 | ignore_iof_thr=-1), 79 | sampler=dict( 80 | type='RandomSampler', 81 | num=256, 82 | pos_fraction=0.5, 83 | neg_pos_ub=-1, 84 | add_gt_as_proposals=False), 85 | allowed_border=-1, 86 | pos_weight=-1, 87 | debug=False), 88 | rpn_proposal=dict( 89 | nms_pre=2000, 90 | max_per_img=1000, 91 | nms=dict(type='nms', iou_threshold=0.7), 92 | min_bbox_size=0), 93 | rcnn=dict( 94 | assigner=dict( 95 | type='MaxIoUAssigner', 96 | pos_iou_thr=0.5, 97 | neg_iou_thr=0.5, 98 | min_pos_iou=0.5, 99 | match_low_quality=True, 100 | ignore_iof_thr=-1), 101 | sampler=dict( 102 | type='RandomSampler', 103 | num=512, 104 | pos_fraction=0.25, 105 | neg_pos_ub=-1, 106 | add_gt_as_proposals=True), 107 | mask_size=28, 108 | pos_weight=-1, 109 | debug=False)), 110 | test_cfg=dict( 111 | rpn=dict( 112 | nms_pre=1000, 113 | max_per_img=1000, 114 | nms=dict(type='nms', iou_threshold=0.7), 115 | min_bbox_size=0), 116 | rcnn=dict( 117 | score_thr=0.05, 118 | nms=dict(type='nms', iou_threshold=0.5), 119 | max_per_img=100, 120 | mask_thr_binary=0.5))) 121 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/retinanet_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | 
type='RetinaNet', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | start_level=1, 19 | add_extra_convs='on_input', 20 | num_outs=5), 21 | bbox_head=dict( 22 | type='RetinaHead', 23 | num_classes=80, 24 | in_channels=256, 25 | stacked_convs=4, 26 | feat_channels=256, 27 | anchor_generator=dict( 28 | type='AnchorGenerator', 29 | octave_base_scale=4, 30 | scales_per_octave=3, 31 | ratios=[0.5, 1.0, 2.0], 32 | strides=[8, 16, 32, 64, 128]), 33 | bbox_coder=dict( 34 | type='DeltaXYWHBBoxCoder', 35 | target_means=[.0, .0, .0, .0], 36 | target_stds=[1.0, 1.0, 1.0, 1.0]), 37 | loss_cls=dict( 38 | type='FocalLoss', 39 | use_sigmoid=True, 40 | gamma=2.0, 41 | alpha=0.25, 42 | loss_weight=1.0), 43 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 44 | # model training and testing settings 45 | train_cfg=dict( 46 | assigner=dict( 47 | type='MaxIoUAssigner', 48 | pos_iou_thr=0.5, 49 | neg_iou_thr=0.4, 50 | min_pos_iou=0, 51 | ignore_iof_thr=-1), 52 | allowed_border=-1, 53 | pos_weight=-1, 54 | debug=False), 55 | test_cfg=dict( 56 | nms_pre=1000, 57 | min_bbox_size=0, 58 | score_thr=0.05, 59 | nms=dict(type='nms', iou_threshold=0.5), 60 | max_per_img=100)) 61 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/rpn_r50_caffe_c4.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='RPN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=3, 8 | strides=(1, 2, 2), 9 | dilations=(1, 1, 1), 10 | out_indices=(2, ), 11 | frozen_stages=1, 12 | norm_cfg=dict(type='BN', requires_grad=False), 13 | norm_eval=True, 14 | style='caffe', 15 | init_cfg=dict( 16 | type='Pretrained', 17 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 18 | neck=None, 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=1024, 22 | feat_channels=1024, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | # model training and testing settings 36 | train_cfg=dict( 37 | rpn=dict( 38 | assigner=dict( 39 | type='MaxIoUAssigner', 40 | pos_iou_thr=0.7, 41 | neg_iou_thr=0.3, 42 | min_pos_iou=0.3, 43 | ignore_iof_thr=-1), 44 | sampler=dict( 45 | type='RandomSampler', 46 | num=256, 47 | pos_fraction=0.5, 48 | neg_pos_ub=-1, 49 | add_gt_as_proposals=False), 50 | allowed_border=0, 51 | pos_weight=-1, 52 | debug=False)), 53 | test_cfg=dict( 54 | rpn=dict( 55 | nms_pre=12000, 56 | max_per_img=2000, 57 | nms=dict(type='nms', iou_threshold=0.7), 58 | min_bbox_size=0))) 59 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/rpn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='RPN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | 
num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | # model training and testing settings 36 | train_cfg=dict( 37 | rpn=dict( 38 | assigner=dict( 39 | type='MaxIoUAssigner', 40 | pos_iou_thr=0.7, 41 | neg_iou_thr=0.3, 42 | min_pos_iou=0.3, 43 | ignore_iof_thr=-1), 44 | sampler=dict( 45 | type='RandomSampler', 46 | num=256, 47 | pos_fraction=0.5, 48 | neg_pos_ub=-1, 49 | add_gt_as_proposals=False), 50 | allowed_border=0, 51 | pos_weight=-1, 52 | debug=False)), 53 | test_cfg=dict( 54 | rpn=dict( 55 | nms_pre=2000, 56 | max_per_img=1000, 57 | nms=dict(type='nms', iou_threshold=0.7), 58 | min_bbox_size=0))) 59 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/ssd300.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | input_size = 300 3 | model = dict( 4 | type='SingleStageDetector', 5 | backbone=dict( 6 | type='SSDVGG', 7 | depth=16, 8 | with_last_pool=False, 9 | ceil_mode=True, 10 | out_indices=(3, 4), 11 | out_feature_indices=(22, 34), 12 | init_cfg=dict( 13 | type='Pretrained', checkpoint='open-mmlab://vgg16_caffe')), 14 | neck=dict( 15 | type='SSDNeck', 16 | in_channels=(512, 1024), 17 | out_channels=(512, 1024, 512, 256, 256, 256), 18 | level_strides=(2, 2, 1, 1), 19 | level_paddings=(1, 1, 0, 0), 20 | l2_norm_scale=20), 21 | bbox_head=dict( 22 | type='SSDHead', 23 | in_channels=(512, 1024, 512, 256, 256, 256), 24 | num_classes=80, 25 | anchor_generator=dict( 26 | type='SSDAnchorGenerator', 27 | scale_major=False, 28 | input_size=input_size, 29 | basesize_ratio_range=(0.15, 0.9), 30 | strides=[8, 16, 32, 64, 100, 300], 31 | ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]]), 32 | bbox_coder=dict( 33 | type='DeltaXYWHBBoxCoder', 34 | target_means=[.0, .0, .0, .0], 35 | target_stds=[0.1, 0.1, 0.2, 0.2])), 36 | # model training and testing settings 37 | train_cfg=dict( 38 | assigner=dict( 39 | type='MaxIoUAssigner', 40 | pos_iou_thr=0.5, 41 | neg_iou_thr=0.5, 42 | min_pos_iou=0., 43 | ignore_iof_thr=-1, 44 | gt_max_assign_all=False), 45 | smoothl1_beta=1., 46 | allowed_border=-1, 47 | pos_weight=-1, 48 | neg_pos_ratio=3, 49 | debug=False), 50 | test_cfg=dict( 51 | nms_pre=1000, 52 | nms=dict(type='nms', iou_threshold=0.45), 53 | min_bbox_size=0, 54 | score_thr=0.02, 55 | max_per_img=200)) 56 | cudnn_benchmark = True 57 | -------------------------------------------------------------------------------- /detection/configs/_base_/schedules/schedule_1x.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) 3 | optimizer_config = dict(grad_clip=None) 4 | # learning policy 5 | 
lr_config = dict( 6 | policy='step', 7 | warmup='linear', 8 | warmup_iters=500, 9 | warmup_ratio=0.001, 10 | step=[8, 11]) 11 | runner = dict(type='EpochBasedRunner', max_epochs=12) 12 | -------------------------------------------------------------------------------- /detection/configs/_base_/schedules/schedule_20e.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) 3 | optimizer_config = dict(grad_clip=None) 4 | # learning policy 5 | lr_config = dict( 6 | policy='step', 7 | warmup='linear', 8 | warmup_iters=500, 9 | warmup_ratio=0.001, 10 | step=[16, 19]) 11 | runner = dict(type='EpochBasedRunner', max_epochs=20) 12 | -------------------------------------------------------------------------------- /detection/configs/_base_/schedules/schedule_2x.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) 3 | optimizer_config = dict(grad_clip=None) 4 | # learning policy 5 | lr_config = dict( 6 | policy='step', 7 | warmup='linear', 8 | warmup_iters=500, 9 | warmup_ratio=0.001, 10 | step=[16, 22]) 11 | runner = dict(type='EpochBasedRunner', max_epochs=24) 12 | -------------------------------------------------------------------------------- /detection/configs/mask_rcnn_p2t_b_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/mask_rcnn_r50_fpn.py', 3 | '_base_/datasets/coco_instance.py', 4 | # '../configs/_base_/schedules/schedule_1x.py', 5 | '_base_/default_runtime.py' 6 | ] 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_base', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_base.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 512], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2)) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/mask_rcnn_p2t_l_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/mask_rcnn_r50_fpn.py', 3 | '_base_/datasets/coco_instance.py', 4 | # '../configs/_base_/schedules/schedule_1x.py', 5 | '_base_/default_runtime.py' 6 | ] 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_large', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_large.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 640], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2)) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- 
/detection/configs/mask_rcnn_p2t_s_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/mask_rcnn_r50_fpn.py', 3 | '_base_/datasets/coco_instance.py', 4 | # '../configs/_base_/schedules/schedule_1x.py', 5 | '_base_/default_runtime.py' 6 | ] 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_small', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_small.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 512], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=None) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/mask_rcnn_p2t_t_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/mask_rcnn_r50_fpn.py', 3 | '_base_/datasets/coco_instance.py', 4 | # '../configs/_base_/schedules/schedule_1x.py', 5 | '_base_/default_runtime.py' 6 | ] 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_tiny', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_tiny.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[48, 96, 240, 384], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=None) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/retinanet_p2t_b_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/retinanet_r50_fpn.py', 3 | '_base_/datasets/coco_detection.py', 4 | '_base_/schedules/schedule_1x.py', '_base_/default_runtime.py' 5 | ] 6 | 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_base', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_base.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 512], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=None) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/retinanet_p2t_l_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/retinanet_r50_fpn.py', 3 | '_base_/datasets/coco_detection.py', 4 | '_base_/schedules/schedule_1x.py', '_base_/default_runtime.py' 5 | ] 6 | 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_large', 10 | style='pytorch', 11 | init_cfg=dict( 12 | 
type='Pretrained', 13 | checkpoint='pretrained/p2t_large.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 640], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=None) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/retinanet_p2t_s_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/retinanet_r50_fpn.py', 3 | '_base_/datasets/coco_detection.py', 4 | '_base_/schedules/schedule_1x.py', '_base_/default_runtime.py' 5 | ] 6 | 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_small', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_small.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 512], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=None) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/retinanet_p2t_t_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/retinanet_r50_fpn.py', 3 | '_base_/datasets/coco_detection.py', 4 | '_base_/schedules/schedule_1x.py', '_base_/default_runtime.py' 5 | ] 6 | 7 | 8 | model = dict( 9 | backbone=dict( 10 | type='p2t_tiny', 11 | style='pytorch', 12 | init_cfg=dict( 13 | type='Pretrained', 14 | checkpoint='pretrained/p2t_tiny.pth'), 15 | ), 16 | neck=dict( 17 | type='FPN', 18 | in_channels=[48, 96, 240, 384], 19 | out_channels=256, 20 | num_outs=5) 21 | ) 22 | optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, weight_decay=0.0001) 23 | optimizer_config = dict(grad_clip=None) 24 | 25 | # learning policy 26 | lr_config = dict( 27 | policy='step', 28 | warmup='linear', 29 | warmup_iters=500, 30 | warmup_ratio=0.001, 31 | step=[8, 11]) 32 | 33 | total_epochs = 12 34 | fp16 = None 35 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/dist_test.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | CONFIG=$1 4 | CHECKPOINT=$2 5 | GPUS=$3 6 | PORT=${PORT:-29400} 7 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 8 | python $(dirname "$0")/test.py $CONFIG $CHECKPOINT ${@:4} 9 | 10 | ## example command: 11 | ## bash dist_test.sh configs/mask_rcnn_p2t_t_fpn_1x_coco.py pretrained/mask_rcnn_p2t_t_fpn_1x_coco-d875fa68.pth 1 12 | -------------------------------------------------------------------------------- /detection/dist_train.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | export OMP_NUM_THREADS=1 4 | 5 | CONFIG=$1 6 | N_GPUS=$2 7 | PORT=${PORT:-29500} 8 | 9 | 10 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 11 | python -m 
torch.distributed.launch --nproc_per_node=${N_GPUS} \ 12 | --master_port=${PORT} \ 13 | --use_env $(dirname "$0")/train.py ${CONFIG} --launcher pytorch ${@:3} 14 | 15 | ## bash dist_train.sh configs/mask_rcnn_p2t_t_fpn_1x_coco.py 8 16 | 17 | -------------------------------------------------------------------------------- /detection/p2t.py: -------------------------------------------------------------------------------- 1 | from os import sep 2 | from pickle import TRUE 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import torch.jit as jit 7 | from functools import partial 8 | 9 | from timm.models.layers import DropPath, to_2tuple, trunc_normal_ 10 | from timm.models.registry import register_model 11 | from timm.models.vision_transformer import _cfg 12 | 13 | from mmdet.models.builder import BACKBONES 14 | from mmcv.runner import load_checkpoint 15 | from mmdet.utils import get_root_logger 16 | 17 | 18 | import numpy as np 19 | from time import time 20 | 21 | __all__ = [ 22 | 'p2t_tiny', 'p2t_small', 'p2t_base', 'p2t_large' 23 | ] 24 | 25 | 26 | 27 | class IRB(nn.Module): 28 | def __init__(self, in_features, hidden_features=None, out_features=None, ksize=3, act_layer=nn.Hardswish, drop=0.): 29 | super().__init__() 30 | out_features = out_features or in_features 31 | hidden_features = hidden_features or in_features 32 | self.fc1 = nn.Conv2d(in_features, hidden_features, 1, 1, 0) 33 | self.act = act_layer() 34 | self.conv = nn.Conv2d(hidden_features, hidden_features, kernel_size=ksize, padding=ksize//2, stride=1, groups=hidden_features) 35 | self.fc2 = nn.Conv2d(hidden_features, out_features, 1, 1, 0) 36 | self.drop = nn.Dropout(drop) 37 | 38 | def forward(self, x, H, W): 39 | B, N, C = x.shape 40 | x = x.permute(0,2,1).reshape(B, C, H, W) 41 | x = self.fc1(x) 42 | x = self.act(x) 43 | x = self.conv(x) 44 | x = self.act(x) 45 | x = self.fc2(x) 46 | return x.reshape(B, C, -1).permute(0,2,1) 47 | 48 | 49 | class PoolingAttention(nn.Module): 50 | def __init__(self, dim, num_heads=2, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., 51 | pool_ratios=[1,2,3,6]): 52 | 53 | super().__init__() 54 | assert dim % num_heads == 0, f"dim {dim} should be divided by num_heads {num_heads}." 
55 | 56 | self.dim = dim 57 | self.num_heads = num_heads 58 | self.num_elements = np.array([t*t for t in pool_ratios]).sum() 59 | head_dim = dim // num_heads 60 | self.scale = qk_scale or head_dim ** -0.5 61 | 62 | self.q = nn.Sequential(nn.Linear(dim, dim, bias=qkv_bias)) 63 | self.kv = nn.Sequential(nn.Linear(dim, dim * 2, bias=qkv_bias)) 64 | 65 | self.attn_drop = nn.Dropout(attn_drop) 66 | self.proj = nn.Linear(dim, dim) 67 | self.proj_drop = nn.Dropout(proj_drop) 68 | 69 | self.pool_ratios = pool_ratios 70 | self.pools = nn.ModuleList() 71 | 72 | self.norm = nn.LayerNorm(dim) 73 | 74 | def forward(self, x, H, W, d_convs=None): 75 | B, N, C = x.shape 76 | 77 | q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3) 78 | pools = [] 79 | x_ = x.permute(0, 2, 1).reshape(B, C, H, W) 80 | for (pool_ratio, l) in zip(self.pool_ratios, d_convs): 81 | pool = F.adaptive_avg_pool2d(x_, (round(H/pool_ratio), round(W/pool_ratio))) 82 | pool = pool + l(pool) 83 | pools.append(pool.view(B, C, -1)) 84 | 85 | pools = torch.cat(pools, dim=2) 86 | pools = self.norm(pools.permute(0,2,1)) 87 | 88 | kv = self.kv(pools).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) 89 | k, v = kv[0], kv[1] 90 | 91 | attn = (q @ k.transpose(-2, -1)) * self.scale 92 | attn = attn.softmax(dim=-1) 93 | x = (attn @ v) 94 | x = x.transpose(1,2).contiguous().reshape(B, N, C) 95 | 96 | x = self.proj(x) 97 | 98 | return x 99 | 100 | 101 | class Block(nn.Module): 102 | 103 | def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 104 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, pool_ratios=[12,16,20,24]): 105 | super().__init__() 106 | self.norm1 = norm_layer(dim) 107 | self.attn = PoolingAttention( 108 | dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, 109 | attn_drop=attn_drop, proj_drop=drop, pool_ratios=pool_ratios) 110 | 111 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 112 | 113 | self.norm2 = norm_layer(dim) 114 | self.mlp = IRB(in_features=dim, hidden_features=int(dim * mlp_ratio), act_layer=nn.Hardswish, drop=drop, ksize=3) 115 | 116 | def forward(self, x, H, W, d_convs=None): 117 | x = x + self.drop_path(self.attn(self.norm1(x), H, W, d_convs=d_convs)) 118 | x = x + self.drop_path(self.mlp(self.norm2(x), H, W)) 119 | 120 | return x 121 | 122 | class PatchEmbed(nn.Module): 123 | """ Image to Patch Embedding 124 | """ 125 | 126 | def __init__(self, img_size=224, patch_size=16, kernel_size=3, in_chans=3, embed_dim=768, overlap=True): 127 | super().__init__() 128 | img_size = to_2tuple(img_size) 129 | patch_size = to_2tuple(patch_size) 130 | 131 | self.img_size = img_size 132 | self.patch_size = patch_size 133 | assert img_size[0] % patch_size[0] == 0 and img_size[1] % patch_size[1] == 0, \ 134 | f"img_size {img_size} should be divided by patch_size {patch_size}." 
135 | self.H, self.W = img_size[0] // patch_size[0], img_size[1] // patch_size[1] 136 | self.num_patches = self.H * self.W 137 | if not overlap: 138 | self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size) 139 | else: 140 | self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=kernel_size, stride=patch_size, padding=kernel_size//2) 141 | 142 | self.norm = nn.LayerNorm(embed_dim) 143 | 144 | def forward(self, x): 145 | B, C, H, W = x.shape 146 | x = self.proj(x).flatten(2).transpose(1, 2) 147 | x = self.norm(x) 148 | H, W = H // self.patch_size[0], W // self.patch_size[1] 149 | 150 | return x, (H, W) 151 | 152 | 153 | 154 | class PyramidPoolingTransformer(nn.Module): 155 | def __init__(self, img_size=224, patch_size=4, in_chans=3, embed_dims=[64, 128, 320, 512], 156 | num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True, qk_scale=None, drop_rate=0., 157 | attn_drop_rate=0., drop_path_rate=0.1, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 9, 3], **kwargs): # 158 | super().__init__() 159 | print("loading p2t") 160 | self.depths = depths 161 | 162 | self.embed_dims = embed_dims 163 | 164 | # pyramid pooling ratios for each stage 165 | pool_ratios = [[12,16,20,24], [6,8,10,12], [3,4,5,6], [1,2,3,4]] 166 | 167 | self.patch_embed1 = PatchEmbed(img_size=img_size, patch_size=4, kernel_size=7, in_chans=in_chans, 168 | embed_dim=embed_dims[0], overlap=True) 169 | 170 | self.patch_embed2 = PatchEmbed(img_size=img_size // 4, patch_size=2, in_chans=embed_dims[0], 171 | embed_dim=embed_dims[1], overlap=True) 172 | self.patch_embed3 = PatchEmbed(img_size=img_size // 8, patch_size=2, in_chans=embed_dims[1], 173 | embed_dim=embed_dims[2], overlap=True) 174 | self.patch_embed4 = PatchEmbed(img_size=img_size // 16, patch_size=2, in_chans=embed_dims[2], 175 | embed_dim=embed_dims[3], overlap=True) 176 | 177 | self.d_convs1 = nn.ModuleList([nn.Conv2d(embed_dims[0], embed_dims[0], kernel_size=3, stride=1, padding=1, groups=embed_dims[0]) for temp in pool_ratios[0]]) 178 | self.d_convs2 = nn.ModuleList([nn.Conv2d(embed_dims[1], embed_dims[1], kernel_size=3, stride=1, padding=1, groups=embed_dims[1]) for temp in pool_ratios[1]]) 179 | self.d_convs3 = nn.ModuleList([nn.Conv2d(embed_dims[2], embed_dims[2], kernel_size=3, stride=1, padding=1, groups=embed_dims[2]) for temp in pool_ratios[2]]) 180 | self.d_convs4 = nn.ModuleList([nn.Conv2d(embed_dims[3], embed_dims[3], kernel_size=3, stride=1, padding=1, groups=embed_dims[3]) for temp in pool_ratios[3]]) 181 | 182 | # transformer encoder 183 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth decay rule 184 | cur = 0 185 | 186 | 187 | ksize = 3 188 | 189 | self.block1 = nn.ModuleList([Block( 190 | dim=embed_dims[0], num_heads=num_heads[0], mlp_ratio=mlp_ratios[0], qkv_bias=qkv_bias, qk_scale=qk_scale, 191 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[0]) 192 | for i in range(depths[0])]) 193 | 194 | 195 | cur += depths[0] 196 | self.block2 = nn.ModuleList([Block( 197 | dim=embed_dims[1], num_heads=num_heads[1], mlp_ratio=mlp_ratios[1], qkv_bias=qkv_bias, qk_scale=qk_scale, 198 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[1]) 199 | for i in range(depths[1])]) 200 | 201 | cur += depths[1] 202 | 203 | 204 | self.block3 = nn.ModuleList([Block( 205 | dim=embed_dims[2], num_heads=num_heads[2], mlp_ratio=mlp_ratios[2], qkv_bias=qkv_bias, 
qk_scale=qk_scale, 206 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[2]) 207 | for i in range(depths[2])]) 208 | 209 | cur += depths[2] 210 | 211 | self.block4 = nn.ModuleList([Block( 212 | dim=embed_dims[3], num_heads=num_heads[3], mlp_ratio=mlp_ratios[3], qkv_bias=qkv_bias, qk_scale=qk_scale, 213 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[3]) 214 | for i in range(depths[3])]) 215 | 216 | # classification head, usually not used in dense prediction tasks 217 | 218 | self.apply(self._init_weights) 219 | 220 | 221 | def init_weights(self, pretrained=None): 222 | if isinstance(pretrained, str): 223 | logger = get_root_logger() 224 | load_checkpoint(self, pretrained, map_location='cpu', strict=False, logger=logger) 225 | 226 | 227 | def reset_drop_path(self, drop_path_rate): 228 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(self.depths))] 229 | cur = 0 230 | for i in range(self.depths[0]): 231 | self.block1[i].drop_path.drop_prob = dpr[cur + i] 232 | 233 | cur += self.depths[0] 234 | for i in range(self.depths[1]): 235 | self.block2[i].drop_path.drop_prob = dpr[cur + i] 236 | 237 | cur += self.depths[1] 238 | for i in range(self.depths[2]): 239 | self.block3[i].drop_path.drop_prob = dpr[cur + i] 240 | 241 | cur += self.depths[2] 242 | for i in range(self.depths[3]): 243 | self.block4[i].drop_path.drop_prob = dpr[cur + i] 244 | 245 | def _init_weights(self, m): 246 | if isinstance(m, nn.Linear): 247 | trunc_normal_(m.weight, std=.02) 248 | if isinstance(m, nn.Linear) and m.bias is not None: 249 | nn.init.constant_(m.bias, 0) 250 | elif isinstance(m, nn.LayerNorm): 251 | nn.init.constant_(m.bias, 0) 252 | nn.init.constant_(m.weight, 1.0) 253 | 254 | 255 | @torch.jit.ignore 256 | def no_weight_decay(self): 257 | # return {'pos_embed', 'cls_token'} # has pos_embed may be better 258 | return {'cls_token'} 259 | 260 | def get_classifier(self): 261 | return self.head 262 | 263 | def forward_features(self, x): 264 | outs = [] 265 | 266 | B = x.shape[0] 267 | 268 | # stage 1 269 | x, (H, W) = self.patch_embed1(x) 270 | 271 | for idx, blk in enumerate(self.block1): 272 | x = blk(x, H, W, self.d_convs1) 273 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() 274 | outs.append(x) 275 | 276 | # stage 2 277 | x, (H, W) = self.patch_embed2(x) 278 | 279 | for idx, blk in enumerate(self.block2): 280 | x = blk(x, H, W, self.d_convs2) 281 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() 282 | outs.append(x) 283 | 284 | x, (H, W) = self.patch_embed3(x) 285 | 286 | for idx, blk in enumerate(self.block3): 287 | x = blk(x, H, W, self.d_convs3) 288 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() 289 | outs.append(x) 290 | 291 | # stage 4 292 | x, (H, W) = self.patch_embed4(x) 293 | 294 | for idx, blk in enumerate(self.block4): 295 | x = blk(x, H, W, self.d_convs4) 296 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() 297 | outs.append(x) 298 | 299 | return outs 300 | 301 | def forward(self, x): 302 | x = self.forward_features(x) 303 | 304 | return x 305 | 306 | 307 | def _conv_filter(state_dict, patch_size=16): 308 | """ convert patch embedding weight from manual patchify + linear proj to conv""" 309 | out_dict = {} 310 | for k, v in state_dict.items(): 311 | if 'patch_embed.proj.weight' in k: 312 | v = v.reshape((v.shape[0], 3, patch_size, patch_size)) 313 | out_dict[k] = v 314 | 315 | return out_dict 316 | 317 
| 318 | @BACKBONES.register_module() 319 | class p2t_tiny(PyramidPoolingTransformer): 320 | def __init__(self, **kwargs): 321 | super().__init__( 322 | patch_size=4, embed_dims=[48, 96, 240, 384], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], 323 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 6, 3], 324 | drop_rate=0.0, drop_path_rate=0.1, **kwargs) 325 | 326 | 327 | @BACKBONES.register_module() 328 | class p2t_small(PyramidPoolingTransformer): 329 | def __init__(self, **kwargs): 330 | super().__init__( 331 | patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], 332 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2,2,9,3], mlp_ratios=[8,8,4,4], 333 | drop_rate=0.0, drop_path_rate=0.1, **kwargs) 334 | 335 | 336 | @BACKBONES.register_module() 337 | class p2t_base(PyramidPoolingTransformer): 338 | def __init__(self, **kwargs): 339 | super().__init__( 340 | patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], 341 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3,4,18,3], mlp_ratios=[8,8,4,4], 342 | drop_rate=0.0, drop_path_rate=0.3, **kwargs) 343 | 344 | 345 | @BACKBONES.register_module() 346 | class p2t_large(PyramidPoolingTransformer): 347 | def __init__(self, **kwargs): 348 | super().__init__( 349 | patch_size=4, embed_dims=[64, 128, 320, 640], num_heads=[1, 2, 5, 8], 350 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3,8,27,3], mlp_ratios=[8,8,4,4], 351 | drop_rate=0.0, drop_path_rate=0.3, **kwargs) 352 | -------------------------------------------------------------------------------- /detection/test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import os.path as osp 4 | import time 5 | import warnings 6 | 7 | import mmcv 8 | import torch 9 | from mmcv import Config, DictAction 10 | from mmcv.cnn import fuse_conv_bn 11 | from mmcv.parallel import MMDataParallel, MMDistributedDataParallel 12 | from mmcv.runner import (get_dist_info, init_dist, load_checkpoint, 13 | wrap_fp16_model) 14 | 15 | from mmdet.apis import multi_gpu_test, single_gpu_test 16 | from mmdet.datasets import (build_dataloader, build_dataset, 17 | replace_ImageToTensor) 18 | from mmdet.models import build_detector 19 | 20 | import p2t 21 | 22 | def parse_args(): 23 | parser = argparse.ArgumentParser( 24 | description='MMDet test (and eval) a model') 25 | parser.add_argument('config', help='test config file path') 26 | parser.add_argument('checkpoint', help='checkpoint file') 27 | parser.add_argument( 28 | '--work-dir', 29 | help='the directory to save the file containing evaluation metrics') 30 | parser.add_argument('--out', help='output result file in pickle format') 31 | parser.add_argument( 32 | '--fuse-conv-bn', 33 | action='store_true', 34 | help='Whether to fuse conv and bn, this will slightly increase' 35 | 'the inference speed') 36 | parser.add_argument( 37 | '--format-only', 38 | action='store_true', 39 | help='Format the output results without perform evaluation. 
It is' 40 | 'useful when you want to format the result to a specific format and ' 41 | 'submit it to the test server') 42 | parser.add_argument( 43 | '--eval', 44 | type=str, 45 | nargs='+', 46 | help='evaluation metrics, which depends on the dataset, e.g., "bbox",' 47 | ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC') 48 | parser.add_argument('--show', action='store_true', help='show results') 49 | parser.add_argument( 50 | '--show-dir', help='directory where painted images will be saved') 51 | parser.add_argument( 52 | '--show-score-thr', 53 | type=float, 54 | default=0.3, 55 | help='score threshold (default: 0.3)') 56 | parser.add_argument( 57 | '--gpu-collect', 58 | action='store_true', 59 | help='whether to use gpu to collect results.') 60 | parser.add_argument( 61 | '--tmpdir', 62 | help='tmp directory used for collecting results from multiple ' 63 | 'workers, available when gpu-collect is not specified') 64 | parser.add_argument( 65 | '--cfg-options', 66 | nargs='+', 67 | action=DictAction, 68 | help='override some settings in the used config, the key-value pair ' 69 | 'in xxx=yyy format will be merged into config file. If the value to ' 70 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 71 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" ' 72 | 'Note that the quotation marks are necessary and that no white space ' 73 | 'is allowed.') 74 | parser.add_argument( 75 | '--options', 76 | nargs='+', 77 | action=DictAction, 78 | help='custom options for evaluation, the key-value pair in xxx=yyy ' 79 | 'format will be kwargs for dataset.evaluate() function (deprecate), ' 80 | 'change to --eval-options instead.') 81 | parser.add_argument( 82 | '--eval-options', 83 | nargs='+', 84 | action=DictAction, 85 | help='custom options for evaluation, the key-value pair in xxx=yyy ' 86 | 'format will be kwargs for dataset.evaluate() function') 87 | parser.add_argument( 88 | '--launcher', 89 | choices=['none', 'pytorch', 'slurm', 'mpi'], 90 | default='none', 91 | help='job launcher') 92 | parser.add_argument('--local_rank', type=int, default=0) 93 | args = parser.parse_args() 94 | if 'LOCAL_RANK' not in os.environ: 95 | os.environ['LOCAL_RANK'] = str(args.local_rank) 96 | 97 | if args.options and args.eval_options: 98 | raise ValueError( 99 | '--options and --eval-options cannot be both ' 100 | 'specified, --options is deprecated in favor of --eval-options') 101 | if args.options: 102 | warnings.warn('--options is deprecated in favor of --eval-options') 103 | args.eval_options = args.options 104 | return args 105 | 106 | 107 | def main(): 108 | args = parse_args() 109 | 110 | assert args.out or args.eval or args.format_only or args.show \ 111 | or args.show_dir, \ 112 | ('Please specify at least one operation (save/eval/format/show the ' 113 | 'results / save the results) with the argument "--out", "--eval"' 114 | ', "--format-only", "--show" or "--show-dir"') 115 | 116 | if args.eval and args.format_only: 117 | raise ValueError('--eval and --format_only cannot be both specified') 118 | 119 | if args.out is not None and not args.out.endswith(('.pkl', '.pickle')): 120 | raise ValueError('The output file must be a pkl file.') 121 | 122 | cfg = Config.fromfile(args.config) 123 | if args.cfg_options is not None: 124 | cfg.merge_from_dict(args.cfg_options) 125 | # import modules from string list. 
126 | if cfg.get('custom_imports', None): 127 | from mmcv.utils import import_modules_from_strings 128 | import_modules_from_strings(**cfg['custom_imports']) 129 | # set cudnn_benchmark 130 | if cfg.get('cudnn_benchmark', False): 131 | torch.backends.cudnn.benchmark = True 132 | 133 | cfg.model.pretrained = None 134 | if cfg.model.get('neck'): 135 | if isinstance(cfg.model.neck, list): 136 | for neck_cfg in cfg.model.neck: 137 | if neck_cfg.get('rfp_backbone'): 138 | if neck_cfg.rfp_backbone.get('pretrained'): 139 | neck_cfg.rfp_backbone.pretrained = None 140 | elif cfg.model.neck.get('rfp_backbone'): 141 | if cfg.model.neck.rfp_backbone.get('pretrained'): 142 | cfg.model.neck.rfp_backbone.pretrained = None 143 | 144 | # in case the test dataset is concatenated 145 | samples_per_gpu = 1 146 | if isinstance(cfg.data.test, dict): 147 | cfg.data.test.test_mode = True 148 | samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1) 149 | if samples_per_gpu > 1: 150 | # Replace 'ImageToTensor' with 'DefaultFormatBundle' 151 | cfg.data.test.pipeline = replace_ImageToTensor( 152 | cfg.data.test.pipeline) 153 | elif isinstance(cfg.data.test, list): 154 | for ds_cfg in cfg.data.test: 155 | ds_cfg.test_mode = True 156 | samples_per_gpu = max( 157 | [ds_cfg.pop('samples_per_gpu', 1) for ds_cfg in cfg.data.test]) 158 | if samples_per_gpu > 1: 159 | for ds_cfg in cfg.data.test: 160 | ds_cfg.pipeline = replace_ImageToTensor(ds_cfg.pipeline) 161 | 162 | # init distributed env first, since logger depends on the dist info. 163 | if args.launcher == 'none': 164 | distributed = False 165 | else: 166 | distributed = True 167 | init_dist(args.launcher, **cfg.dist_params) 168 | 169 | rank, _ = get_dist_info() 170 | # create the work_dir only when it is specified, and only on rank 0 171 | if args.work_dir is not None and rank == 0: 172 | mmcv.mkdir_or_exist(osp.abspath(args.work_dir)) 173 | timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) 174 | json_file = osp.join(args.work_dir, f'eval_{timestamp}.json') 175 | 176 | # build the dataloader 177 | dataset = build_dataset(cfg.data.test) 178 | data_loader = build_dataloader( 179 | dataset, 180 | samples_per_gpu=samples_per_gpu, 181 | workers_per_gpu=cfg.data.workers_per_gpu, 182 | dist=distributed, 183 | shuffle=False) 184 | 185 | # build the model and load checkpoint 186 | cfg.model.train_cfg = None 187 | model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg')) 188 | fp16_cfg = cfg.get('fp16', None) 189 | if fp16_cfg is not None: 190 | wrap_fp16_model(model) 191 | checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu') 192 | if args.fuse_conv_bn: 193 | model = fuse_conv_bn(model) 194 | # old versions did not save class info in checkpoints, this workaround is 195 | # for backward compatibility 196 | if 'CLASSES' in checkpoint.get('meta', {}): 197 | model.CLASSES = checkpoint['meta']['CLASSES'] 198 | else: 199 | model.CLASSES = dataset.CLASSES 200 | 201 | if not distributed: 202 | model = MMDataParallel(model, device_ids=[0]) 203 | outputs = single_gpu_test(model, data_loader, args.show, args.show_dir, 204 | args.show_score_thr) 205 | else: 206 | model = MMDistributedDataParallel( 207 | model.cuda(), 208 | device_ids=[torch.cuda.current_device()], 209 | broadcast_buffers=False) 210 | outputs = multi_gpu_test(model, data_loader, args.tmpdir, 211 | args.gpu_collect) 212 | 213 | rank, _ = get_dist_info() 214 | if rank == 0: 215 | if args.out: 216 | print(f'\nwriting results to {args.out}') 217 | mmcv.dump(outputs, args.out) 218 | kwargs = {} if args.eval_options is None else 
args.eval_options 219 | if args.format_only: 220 | dataset.format_results(outputs, **kwargs) 221 | if args.eval: 222 | eval_kwargs = cfg.get('evaluation', {}).copy() 223 | # hard-code way to remove EvalHook args 224 | for key in [ 225 | 'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best', 226 | 'rule' 227 | ]: 228 | eval_kwargs.pop(key, None) 229 | eval_kwargs.update(dict(metric=args.eval, **kwargs)) 230 | metric = dataset.evaluate(outputs, **eval_kwargs) 231 | print(metric) 232 | metric_dict = dict(config=args.config, metric=metric) 233 | if args.work_dir is not None and rank == 0: 234 | mmcv.dump(metric_dict, json_file) 235 | 236 | 237 | if __name__ == '__main__': 238 | main() -------------------------------------------------------------------------------- /detection/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import copy 3 | import os 4 | import os.path as osp 5 | import time 6 | import warnings 7 | 8 | import mmcv 9 | import torch 10 | from mmcv import Config, DictAction 11 | from mmcv.runner import get_dist_info, init_dist 12 | from mmcv.utils import get_git_hash 13 | 14 | from mmdet import __version__ 15 | from mmdet.apis import set_random_seed, train_detector 16 | from mmdet.datasets import build_dataset 17 | from mmdet.models import build_detector 18 | from mmdet.utils import collect_env, get_root_logger 19 | import p2t 20 | 21 | def parse_args(): 22 | parser = argparse.ArgumentParser(description='Train a detector') 23 | parser.add_argument('config', help='train config file path') 24 | parser.add_argument('--work-dir', help='the dir to save logs and models') 25 | parser.add_argument( 26 | '--resume-from', help='the checkpoint file to resume from') 27 | parser.add_argument( 28 | '--no-validate', 29 | action='store_true', 30 | help='whether not to evaluate the checkpoint during training') 31 | group_gpus = parser.add_mutually_exclusive_group() 32 | group_gpus.add_argument( 33 | '--gpus', 34 | type=int, 35 | help='number of gpus to use ' 36 | '(only applicable to non-distributed training)') 37 | group_gpus.add_argument( 38 | '--gpu-ids', 39 | type=int, 40 | nargs='+', 41 | help='ids of gpus to use ' 42 | '(only applicable to non-distributed training)') 43 | parser.add_argument('--seed', type=int, default=None, help='random seed') 44 | parser.add_argument( 45 | '--deterministic', 46 | action='store_true', 47 | help='whether to set deterministic options for CUDNN backend.') 48 | parser.add_argument( 49 | '--options', 50 | nargs='+', 51 | action=DictAction, 52 | help='override some settings in the used config, the key-value pair ' 53 | 'in xxx=yyy format will be merged into config file (deprecate), ' 54 | 'change to --cfg-options instead.') 55 | parser.add_argument( 56 | '--cfg-options', 57 | nargs='+', 58 | action=DictAction, 59 | help='override some settings in the used config, the key-value pair ' 60 | 'in xxx=yyy format will be merged into config file. If the value to ' 61 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 62 | 'It also allows nested list/tuple values, e.g. 
key="[(a,b),(c,d)]" ' 63 | 'Note that the quotation marks are necessary and that no white space ' 64 | 'is allowed.') 65 | parser.add_argument( 66 | '--launcher', 67 | choices=['none', 'pytorch', 'slurm', 'mpi'], 68 | default='none', 69 | help='job launcher') 70 | parser.add_argument('--local_rank', type=int, default=0) 71 | args = parser.parse_args() 72 | if 'LOCAL_RANK' not in os.environ: 73 | os.environ['LOCAL_RANK'] = str(args.local_rank) 74 | 75 | if args.options and args.cfg_options: 76 | raise ValueError( 77 | '--options and --cfg-options cannot be both ' 78 | 'specified, --options is deprecated in favor of --cfg-options') 79 | if args.options: 80 | warnings.warn('--options is deprecated in favor of --cfg-options') 81 | args.cfg_options = args.options 82 | 83 | return args 84 | 85 | 86 | def main(): 87 | args = parse_args() 88 | 89 | cfg = Config.fromfile(args.config) 90 | if args.cfg_options is not None: 91 | cfg.merge_from_dict(args.cfg_options) 92 | # import modules from string list. 93 | if cfg.get('custom_imports', None): 94 | from mmcv.utils import import_modules_from_strings 95 | import_modules_from_strings(**cfg['custom_imports']) 96 | # set cudnn_benchmark 97 | if cfg.get('cudnn_benchmark', False): 98 | torch.backends.cudnn.benchmark = True 99 | 100 | # work_dir is determined in this priority: CLI > segment in file > filename 101 | if args.work_dir is not None: 102 | # update configs according to CLI args if args.work_dir is not None 103 | cfg.work_dir = args.work_dir 104 | elif cfg.get('work_dir', None) is None: 105 | # use config filename as default work_dir if cfg.work_dir is None 106 | cfg.work_dir = osp.join('./work_dirs', 107 | osp.splitext(osp.basename(args.config))[0]) 108 | if args.resume_from is not None: 109 | cfg.resume_from = args.resume_from 110 | if args.gpu_ids is not None: 111 | cfg.gpu_ids = args.gpu_ids 112 | else: 113 | cfg.gpu_ids = range(1) if args.gpus is None else range(args.gpus) 114 | 115 | # init distributed env first, since logger depends on the dist info. 
116 | if args.launcher == 'none': 117 | distributed = False 118 | else: 119 | distributed = True 120 | init_dist(args.launcher, **cfg.dist_params) 121 | # re-set gpu_ids with distributed training mode 122 | _, world_size = get_dist_info() 123 | cfg.gpu_ids = range(world_size) 124 | 125 | # create work_dir 126 | mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir)) 127 | # dump config 128 | cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config))) 129 | # init the logger before other steps 130 | timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) 131 | log_file = osp.join(cfg.work_dir, f'{timestamp}.log') 132 | logger = get_root_logger(log_file=log_file, log_level=cfg.log_level) 133 | 134 | # init the meta dict to record some important information such as 135 | # environment info and seed, which will be logged 136 | meta = dict() 137 | # log env info 138 | env_info_dict = collect_env() 139 | env_info = '\n'.join([(f'{k}: {v}') for k, v in env_info_dict.items()]) 140 | dash_line = '-' * 60 + '\n' 141 | logger.info('Environment info:\n' + dash_line + env_info + '\n' + 142 | dash_line) 143 | meta['env_info'] = env_info 144 | meta['config'] = cfg.pretty_text 145 | # log some basic info 146 | logger.info(f'Distributed training: {distributed}') 147 | logger.info(f'Config:\n{cfg.pretty_text}') 148 | 149 | # set random seeds 150 | if args.seed is not None: 151 | logger.info(f'Set random seed to {args.seed}, ' 152 | f'deterministic: {args.deterministic}') 153 | set_random_seed(args.seed, deterministic=args.deterministic) 154 | cfg.seed = args.seed 155 | meta['seed'] = args.seed 156 | meta['exp_name'] = osp.basename(args.config) 157 | 158 | model = build_detector( cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg')) 159 | 160 | datasets = [build_dataset(cfg.data.train)] 161 | if len(cfg.workflow) == 2: 162 | val_dataset = copy.deepcopy(cfg.data.val) 163 | val_dataset.pipeline = cfg.data.train.pipeline 164 | datasets.append(build_dataset(val_dataset)) 165 | if cfg.checkpoint_config is not None: 166 | # save mmdet version, config file content and class names in 167 | # checkpoints as meta data 168 | cfg.checkpoint_config.meta = dict( 169 | mmdet_version=__version__ + get_git_hash()[:7], 170 | CLASSES=datasets[0].CLASSES) 171 | # add an attribute for visualization convenience 172 | model.CLASSES = datasets[0].CLASSES 173 | train_detector( 174 | model, 175 | datasets, 176 | cfg, 177 | distributed=distributed, 178 | validate=(not args.no_validate), 179 | timestamp=timestamp, 180 | meta=meta) 181 | 182 | 183 | if __name__ == '__main__': 184 | main() -------------------------------------------------------------------------------- /engine.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-present, Facebook, Inc. 2 | # All rights reserved. 
3 | """ 4 | Train and eval functions used in main.py 5 | """ 6 | import math 7 | import sys 8 | from typing import Iterable, Optional 9 | 10 | import torch 11 | 12 | from timm.data import Mixup 13 | from timm.utils import accuracy, ModelEma 14 | 15 | import utils 16 | 17 | #from apex import amp, optimizers, parallel 18 | 19 | def train_one_epoch(model: torch.nn.Module, criterion: None, 20 | data_loader: Iterable, optimizer: torch.optim.Optimizer, 21 | device: torch.device, epoch: int, loss_scaler, max_norm: float = 0, 22 | model_ema: Optional[ModelEma] = None, mixup_fn: Optional[Mixup] = None, 23 | set_training_mode=True, 24 | fp32=False): 25 | model.train(set_training_mode) 26 | metric_logger = utils.MetricLogger(delimiter=" ") 27 | metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value:.6f}')) 28 | header = 'Epoch: [{}]'.format(epoch) 29 | print_freq = 10 30 | 31 | for samples, targets in metric_logger.log_every(data_loader, print_freq, header): 32 | samples = samples.to(device, non_blocking=True) 33 | targets = targets.to(device, non_blocking=True) 34 | 35 | if mixup_fn is not None: 36 | samples, targets = mixup_fn(samples, targets) 37 | 38 | # with torch.cuda.amp.autocast(): 39 | # outputs = model(samples) 40 | # loss = criterion(samples, outputs, targets) 41 | with torch.cuda.amp.autocast(enabled=not fp32): 42 | outputs = model(samples) 43 | loss = criterion(outputs, targets) 44 | 45 | 46 | loss_value = loss.item() 47 | 48 | if not math.isfinite(loss_value): 49 | print("Loss is {}, stopping training".format(loss_value)) 50 | sys.exit(1) 51 | 52 | optimizer.zero_grad() 53 | 54 | # this attribute is added by timm on one optimizer (adahessian) 55 | 56 | if loss_scaler is not None: 57 | is_second_order = hasattr(optimizer, 'is_second_order') and optimizer.is_second_order 58 | loss_scaler(loss, optimizer, clip_grad=max_norm, 59 | parameters=model.parameters(), create_graph=is_second_order) 60 | 61 | torch.cuda.synchronize() 62 | if model_ema is not None: 63 | model_ema.update(model) 64 | 65 | metric_logger.update(loss=loss_value) 66 | metric_logger.update(lr=optimizer.param_groups[0]["lr"]) 67 | # gather the stats from all processes 68 | metric_logger.synchronize_between_processes() 69 | print("Averaged stats:", metric_logger) 70 | return {k: meter.global_avg for k, meter in metric_logger.meters.items()} 71 | 72 | 73 | @torch.no_grad() 74 | def evaluate(data_loader, model, device): 75 | criterion = torch.nn.CrossEntropyLoss() 76 | 77 | metric_logger = utils.MetricLogger(delimiter=" ") 78 | header = 'Test:' 79 | 80 | # switch to evaluation mode 81 | model.eval() 82 | 83 | for images, target in metric_logger.log_every(data_loader, 10, header): 84 | images = images.to(device, non_blocking=True) 85 | target = target.to(device, non_blocking=True) 86 | 87 | # compute output 88 | with torch.cuda.amp.autocast(): 89 | output = model(images) 90 | loss = criterion(output, target) 91 | 92 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 93 | 94 | batch_size = images.shape[0] 95 | metric_logger.update(loss=loss.item()) 96 | metric_logger.meters['acc1'].update(acc1.item(), n=batch_size) 97 | metric_logger.meters['acc5'].update(acc5.item(), n=batch_size) 98 | # gather the stats from all processes 99 | metric_logger.synchronize_between_processes() 100 | print('* Acc@1 {top1.global_avg:.3f} Acc@5 {top5.global_avg:.3f} loss {losses.global_avg:.3f}' 101 | .format(top1=metric_logger.acc1, top5=metric_logger.acc5, losses=metric_logger.loss)) 102 | 103 | return {k: meter.global_avg for 
k, meter in metric_logger.meters.items()} 104 | -------------------------------------------------------------------------------- /figures/p2t-arch.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yuhuan-wu/P2T/8811157e77bcca6aecf0206998de29373eaa872d/figures/p2t-arch.jpg -------------------------------------------------------------------------------- /hubconf.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-present, Facebook, Inc. 2 | # All rights reserved. 3 | from models import * 4 | 5 | dependencies = ["torch", "torchvision", "timm"] 6 | -------------------------------------------------------------------------------- /mcloader/__init__.py: -------------------------------------------------------------------------------- 1 | from .classification import ClassificationDataset 2 | from .data_prefetcher import DataPrefetcher -------------------------------------------------------------------------------- /mcloader/classification.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.utils.data import Dataset 3 | from .imagenet import ImageNet 4 | 5 | 6 | class ClassificationDataset(Dataset): 7 | """Dataset for classification. 8 | """ 9 | 10 | def __init__(self, split='train', pipeline=None): 11 | if split == 'train': 12 | self.data_source = ImageNet(root='data/imagenet/train', 13 | list_file='data/imagenet/meta/train.txt', 14 | memcached=True, 15 | mclient_path='/mnt/lustre/share/memcached_client') 16 | else: 17 | self.data_source = ImageNet(root='data/imagenet/val', 18 | list_file='data/imagenet/meta/val.txt', 19 | memcached=True, 20 | mclient_path='/mnt/lustre/share/memcached_client') 21 | self.pipeline = pipeline 22 | 23 | def __len__(self): 24 | return self.data_source.get_length() 25 | 26 | def __getitem__(self, idx): 27 | img, target = self.data_source.get_sample(idx) 28 | if self.pipeline is not None: 29 | img = self.pipeline(img) 30 | 31 | return img, target 32 | -------------------------------------------------------------------------------- /mcloader/data_prefetcher.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | class DataPrefetcher: 5 | def __init__(self, loader): 6 | self.loader = iter(loader) 7 | self.stream = torch.cuda.Stream() 8 | self.preload() 9 | 10 | def preload(self): 11 | try: 12 | self.next_input, self.next_target = next(self.loader) 13 | except StopIteration: 14 | self.next_input = None 15 | self.next_target = None 16 | return 17 | 18 | with torch.cuda.stream(self.stream): 19 | self.next_input = self.next_input.cuda(non_blocking=True) 20 | self.next_target = self.next_target.cuda(non_blocking=True) 21 | 22 | def next(self): 23 | torch.cuda.current_stream().wait_stream(self.stream) 24 | input = self.next_input 25 | target = self.next_target 26 | if input is not None: 27 | self.preload() 28 | return input, target 29 | -------------------------------------------------------------------------------- /mcloader/image_list.py: -------------------------------------------------------------------------------- 1 | import os 2 | from PIL import Image 3 | 4 | from .mcloader import McLoader 5 | 6 | 7 | class ImageList(object): 8 | 9 | def __init__(self, root, list_file, memcached=False, mclient_path=None): 10 | with open(list_file, 'r') as f: 11 | lines = f.readlines() 12 | self.has_labels = len(lines[0].split()) == 2 13 | if 
self.has_labels: 14 | self.fns, self.labels = zip(*[l.strip().split() for l in lines]) 15 | self.labels = [int(l) for l in self.labels] 16 | else: 17 | self.fns = [l.strip() for l in lines] 18 | self.fns = [os.path.join(root, fn) for fn in self.fns] 19 | self.memcached = memcached 20 | self.mclient_path = mclient_path 21 | self.initialized = False 22 | 23 | def _init_memcached(self): 24 | if not self.initialized: 25 | assert self.mclient_path is not None 26 | self.mc_loader = McLoader(self.mclient_path) 27 | self.initialized = True 28 | 29 | def get_length(self): 30 | return len(self.fns) 31 | 32 | def get_sample(self, idx): 33 | if self.memcached: 34 | self._init_memcached() 35 | if self.memcached: 36 | img = self.mc_loader(self.fns[idx]) 37 | else: 38 | img = Image.open(self.fns[idx]) 39 | img = img.convert('RGB') 40 | if self.has_labels: 41 | target = self.labels[idx] 42 | return img, target 43 | else: 44 | return img 45 | -------------------------------------------------------------------------------- /mcloader/imagenet.py: -------------------------------------------------------------------------------- 1 | from .image_list import ImageList 2 | 3 | 4 | class ImageNet(ImageList): 5 | 6 | def __init__(self, root, list_file, memcached, mclient_path): 7 | super(ImageNet, self).__init__( 8 | root, list_file, memcached, mclient_path) 9 | -------------------------------------------------------------------------------- /mcloader/mcloader.py: -------------------------------------------------------------------------------- 1 | import io 2 | from PIL import Image 3 | try: 4 | import mc 5 | except ImportError as E: 6 | pass 7 | 8 | 9 | def pil_loader(img_str): 10 | buff = io.BytesIO(img_str) 11 | return Image.open(buff) 12 | 13 | 14 | class McLoader(object): 15 | 16 | def __init__(self, mclient_path): 17 | assert mclient_path is not None, \ 18 | "Please specify 'data_mclient_path' in the config." 19 | self.mclient_path = mclient_path 20 | server_list_config_file = "{}/server_list.conf".format( 21 | self.mclient_path) 22 | client_config_file = "{}/client.conf".format(self.mclient_path) 23 | self.mclient = mc.MemcachedClient.GetInstance(server_list_config_file, 24 | client_config_file) 25 | 26 | def __call__(self, fn): 27 | try: 28 | img_value = mc.pyvector() 29 | self.mclient.Get(fn, img_value) 30 | img_value_str = mc.ConvertBuffer(img_value) 31 | img = pil_loader(img_value_str) 32 | except: 33 | print('Read image failed ({})'.format(fn)) 34 | return None 35 | else: 36 | return img -------------------------------------------------------------------------------- /samplers.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-present, Facebook, Inc. 2 | # All rights reserved. 3 | import torch 4 | import torch.distributed as dist 5 | import math 6 | 7 | 8 | class RASampler(torch.utils.data.Sampler): 9 | """Sampler that restricts data loading to a subset of the dataset for distributed, 10 | with repeated augmentation. 
11 | It ensures that different each augmented version of a sample will be visible to a 12 | different process (GPU) 13 | Heavily based on torch.utils.data.DistributedSampler 14 | """ 15 | 16 | def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True): 17 | if num_replicas is None: 18 | if not dist.is_available(): 19 | raise RuntimeError("Requires distributed package to be available") 20 | num_replicas = dist.get_world_size() 21 | if rank is None: 22 | if not dist.is_available(): 23 | raise RuntimeError("Requires distributed package to be available") 24 | rank = dist.get_rank() 25 | self.dataset = dataset 26 | self.num_replicas = num_replicas 27 | self.rank = rank 28 | self.epoch = 0 29 | self.num_samples = int(math.ceil(len(self.dataset) * 3.0 / self.num_replicas)) 30 | self.total_size = self.num_samples * self.num_replicas 31 | # self.num_selected_samples = int(math.ceil(len(self.dataset) / self.num_replicas)) 32 | self.num_selected_samples = int(math.floor(len(self.dataset) // 256 * 256 / self.num_replicas)) 33 | self.shuffle = shuffle 34 | 35 | def __iter__(self): 36 | # deterministically shuffle based on epoch 37 | g = torch.Generator() 38 | g.manual_seed(self.epoch) 39 | if self.shuffle: 40 | indices = torch.randperm(len(self.dataset), generator=g).tolist() 41 | else: 42 | indices = list(range(len(self.dataset))) 43 | 44 | # add extra samples to make it evenly divisible 45 | indices = [ele for ele in indices for i in range(3)] 46 | indices += indices[:(self.total_size - len(indices))] 47 | assert len(indices) == self.total_size 48 | 49 | # subsample 50 | indices = indices[self.rank:self.total_size:self.num_replicas] 51 | assert len(indices) == self.num_samples 52 | 53 | return iter(indices[:self.num_selected_samples]) 54 | 55 | def __len__(self): 56 | return self.num_selected_samples 57 | 58 | def set_epoch(self, epoch): 59 | self.epoch = epoch 60 | -------------------------------------------------------------------------------- /segmentation/README.md: -------------------------------------------------------------------------------- 1 | ## [TPAMI22] Pyramid Pooling Transformer for Scene Understanding 2 | 3 | This folder contains full training and test code for semantic segmentation. 4 | 5 | ### Requirements 6 | 7 | * mmsegmentation >= 0.12+ 8 | 9 | ### Results (val set) & Pretrained Models) 10 | 11 | 12 | | Base Model | Variants | mIoU | aAcc | mAcc | #Params (M) | # GFLOPS | Google Drive | 13 | | :--: | :-------: | :--: | :--: | :---------: | :------: | :----------------------------------------------------------: | :----------------------------------------------------------: | 14 | | Semantic FPN | P2T-Tiny | 43.4 | 80.8 | 54.5 | 15.4 | 31.6 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 15 | | Semantic FPN | P2T-Small | 46.7 | 82.0 | 58.4 | 27.8 | 42.7 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 16 | | Semantic FPN | P2T-Base | 48.7 | 82.9 | 60.7 | 39.8 | 58.5 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 17 | | Semantic FPN | P2T-Large | 49.4 | 83.3 | 61.9 | 58.1 | 77.7 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 18 | 19 | ### Data Preparation 20 | 21 | Put data folder of ADE20K dataset to `data/ade/ADEChallengeData2016`. 
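For reference, the `ade20k.py` dataset config under `configs/_base_/datasets/` reads images from `images/training` / `images/validation` and labels from `annotations/training` / `annotations/validation`, so the extracted dataset should look roughly like:

````
data/ade/ADEChallengeData2016
├── images
│   ├── training
│   └── validation
└── annotations
    ├── training
    └── validation
````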
22 | 23 | ### Train 24 | 25 | Use the following commands to train `Semantic FPN` with `P2T-Small` backbone for distributed learning with 8 GPUs: 26 | 27 | ```` 28 | bash dist_train.sh configs/sem_fpn_p2t_s_ade20k_80k.py 8 29 | ```` 30 | 31 | ### Validate 32 | 33 | Please download the pretrained model from the above table. Put them to `pretrained` folder. 34 | Then, use the following commands to validate `Semantic FPN` with `P2T-Small` backbone in a single GPU: 35 | 36 | ```` 37 | bash dist_test.sh configs/sem_fpn_p2t_s_ade20k_80k.py pretrained/sem_fpn_p2t_s_ade20k_80k.pth 1 38 | ```` 39 | 40 | 41 | ### Other Notes 42 | 43 | If you meet any problems, please do not hesitate to contact us. 44 | Issues and discussions are welcome in the repository! 45 | You can also contact us via sending messages to this email: wuyuhuan@mail.nankai.edu.cn 46 | 47 | 48 | 49 | ### Citation 50 | 51 | If you are using the code/model/data provided here in a publication, please consider citing our works: 52 | 53 | ```` 54 | @ARTICLE{wu2022p2t, 55 | author={Wu, Yu-Huan and Liu, Yun and Zhan, Xin and Cheng, Ming-Ming}, 56 | journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 57 | title={{P2T}: Pyramid Pooling Transformer for Scene Understanding}, 58 | year={2022}, 59 | doi = {10.1109/tpami.2022.3202765}, 60 | } 61 | ```` 62 | 63 | ### License 64 | 65 | This code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for Non-Commercial use only. Any commercial use should get formal permission first. 66 | 67 | -------------------------------------------------------------------------------- /segmentation/align_resize.py: -------------------------------------------------------------------------------- 1 | import mmcv 2 | import numpy as np 3 | from mmcv.utils import deprecated_api_warning, is_tuple_of 4 | from numpy import random 5 | 6 | from mmseg.datasets.builder import PIPELINES 7 | #from IPython import embed 8 | 9 | @PIPELINES.register_module() 10 | class AlignResize(object): 11 | """Resize images & seg. Align 12 | """ 13 | 14 | def __init__(self, 15 | img_scale=None, 16 | multiscale_mode='range', 17 | ratio_range=None, 18 | keep_ratio=True, 19 | size_divisor=32): 20 | if img_scale is None: 21 | self.img_scale = None 22 | else: 23 | if isinstance(img_scale, list): 24 | self.img_scale = img_scale 25 | else: 26 | self.img_scale = [img_scale] 27 | assert mmcv.is_list_of(self.img_scale, tuple) 28 | 29 | if ratio_range is not None: 30 | # mode 1: given img_scale=None and a range of image ratio 31 | # mode 2: given a scale and a range of image ratio 32 | assert self.img_scale is None or len(self.img_scale) == 1 33 | else: 34 | # mode 3 and 4: given multiple scales or a range of scales 35 | assert multiscale_mode in ['value', 'range'] 36 | 37 | self.multiscale_mode = multiscale_mode 38 | self.ratio_range = ratio_range 39 | self.keep_ratio = keep_ratio 40 | self.size_divisor = size_divisor 41 | 42 | @staticmethod 43 | def random_select(img_scales): 44 | """Randomly select an img_scale from given candidates. 45 | 46 | Args: 47 | img_scales (list[tuple]): Images scales for selection. 48 | 49 | Returns: 50 | (tuple, int): Returns a tuple ``(img_scale, scale_dix)``, 51 | where ``img_scale`` is the selected image scale and 52 | ``scale_idx`` is the selected index in the given candidates. 
53 | """ 54 | 55 | assert mmcv.is_list_of(img_scales, tuple) 56 | scale_idx = np.random.randint(len(img_scales)) 57 | img_scale = img_scales[scale_idx] 58 | return img_scale, scale_idx 59 | 60 | @staticmethod 61 | def random_sample(img_scales): 62 | """Randomly sample an img_scale when ``multiscale_mode=='range'``. 63 | 64 | Args: 65 | img_scales (list[tuple]): Images scale range for sampling. 66 | There must be two tuples in img_scales, which specify the lower 67 | and uper bound of image scales. 68 | 69 | Returns: 70 | (tuple, None): Returns a tuple ``(img_scale, None)``, where 71 | ``img_scale`` is sampled scale and None is just a placeholder 72 | to be consistent with :func:`random_select`. 73 | """ 74 | 75 | assert mmcv.is_list_of(img_scales, tuple) and len(img_scales) == 2 76 | img_scale_long = [max(s) for s in img_scales] 77 | img_scale_short = [min(s) for s in img_scales] 78 | long_edge = np.random.randint( 79 | min(img_scale_long), 80 | max(img_scale_long) + 1) 81 | short_edge = np.random.randint( 82 | min(img_scale_short), 83 | max(img_scale_short) + 1) 84 | img_scale = (long_edge, short_edge) 85 | return img_scale, None 86 | 87 | @staticmethod 88 | def random_sample_ratio(img_scale, ratio_range): 89 | """Randomly sample an img_scale when ``ratio_range`` is specified. 90 | 91 | A ratio will be randomly sampled from the range specified by 92 | ``ratio_range``. Then it would be multiplied with ``img_scale`` to 93 | generate sampled scale. 94 | 95 | Args: 96 | img_scale (tuple): Images scale base to multiply with ratio. 97 | ratio_range (tuple[float]): The minimum and maximum ratio to scale 98 | the ``img_scale``. 99 | 100 | Returns: 101 | (tuple, None): Returns a tuple ``(scale, None)``, where 102 | ``scale`` is sampled ratio multiplied with ``img_scale`` and 103 | None is just a placeholder to be consistent with 104 | :func:`random_select`. 105 | """ 106 | 107 | assert isinstance(img_scale, tuple) and len(img_scale) == 2 108 | min_ratio, max_ratio = ratio_range 109 | assert min_ratio <= max_ratio 110 | ratio = np.random.random_sample() * (max_ratio - min_ratio) + min_ratio 111 | scale = int(img_scale[0] * ratio), int(img_scale[1] * ratio) 112 | return scale, None 113 | 114 | def _random_scale(self, results): 115 | """Randomly sample an img_scale according to ``ratio_range`` and 116 | ``multiscale_mode``. 117 | 118 | If ``ratio_range`` is specified, a ratio will be sampled and be 119 | multiplied with ``img_scale``. 120 | If multiple scales are specified by ``img_scale``, a scale will be 121 | sampled according to ``multiscale_mode``. 122 | Otherwise, single scale will be used. 123 | 124 | Args: 125 | results (dict): Result dict from :obj:`dataset`. 126 | 127 | Returns: 128 | dict: Two new keys 'scale` and 'scale_idx` are added into 129 | ``results``, which would be used by subsequent pipelines. 
130 | """ 131 | 132 | if self.ratio_range is not None: 133 | if self.img_scale is None: 134 | h, w = results['img'].shape[:2] 135 | scale, scale_idx = self.random_sample_ratio((w, h), 136 | self.ratio_range) 137 | else: 138 | scale, scale_idx = self.random_sample_ratio( 139 | self.img_scale[0], self.ratio_range) 140 | elif len(self.img_scale) == 1: 141 | scale, scale_idx = self.img_scale[0], 0 142 | elif self.multiscale_mode == 'range': 143 | scale, scale_idx = self.random_sample(self.img_scale) 144 | elif self.multiscale_mode == 'value': 145 | scale, scale_idx = self.random_select(self.img_scale) 146 | else: 147 | raise NotImplementedError 148 | 149 | results['scale'] = scale 150 | results['scale_idx'] = scale_idx 151 | 152 | def _align(self, img, size_divisor, interpolation=None): 153 | align_h = int(np.ceil(img.shape[0] / size_divisor)) * size_divisor 154 | align_w = int(np.ceil(img.shape[1] / size_divisor)) * size_divisor 155 | if interpolation == None: 156 | img = mmcv.imresize(img, (align_w, align_h)) 157 | else: 158 | img = mmcv.imresize(img, (align_w, align_h), interpolation=interpolation) 159 | return img 160 | 161 | def _resize_img(self, results): 162 | """Resize images with ``results['scale']``.""" 163 | if self.keep_ratio: 164 | img, scale_factor = mmcv.imrescale( 165 | results['img'], results['scale'], return_scale=True) 166 | #### align #### 167 | img = self._align(img, self.size_divisor) 168 | # the w_scale and h_scale has minor difference 169 | # a real fix should be done in the mmcv.imrescale in the future 170 | new_h, new_w = img.shape[:2] 171 | h, w = results['img'].shape[:2] 172 | w_scale = new_w / w 173 | h_scale = new_h / h 174 | else: 175 | img, w_scale, h_scale = mmcv.imresize( 176 | results['img'], results['scale'], return_scale=True) 177 | 178 | h, w = img.shape[:2] 179 | assert int(np.ceil(h / self.size_divisor)) * self.size_divisor == h and \ 180 | int(np.ceil(w / self.size_divisor)) * self.size_divisor == w, \ 181 | "img size not align. h:{} w:{}".format(h,w) 182 | scale_factor = np.array([w_scale, h_scale, w_scale, h_scale], 183 | dtype=np.float32) 184 | results['img'] = img 185 | results['img_shape'] = img.shape 186 | results['pad_shape'] = img.shape # in case that there is no padding 187 | results['scale_factor'] = scale_factor 188 | results['keep_ratio'] = self.keep_ratio 189 | 190 | def _resize_seg(self, results): 191 | """Resize semantic segmentation map with ``results['scale']``.""" 192 | for key in results.get('seg_fields', []): 193 | if self.keep_ratio: 194 | gt_seg = mmcv.imrescale( 195 | results[key], results['scale'], interpolation='nearest') 196 | gt_seg = self._align(gt_seg, self.size_divisor, interpolation='nearest') 197 | else: 198 | gt_seg = mmcv.imresize( 199 | results[key], results['scale'], interpolation='nearest') 200 | h, w = gt_seg.shape[:2] 201 | assert int(np.ceil(h / self.size_divisor)) * self.size_divisor == h and \ 202 | int(np.ceil(w / self.size_divisor)) * self.size_divisor == w, \ 203 | "gt_seg size not align. h:{} w:{}".format(h, w) 204 | results[key] = gt_seg 205 | 206 | def __call__(self, results): 207 | """Call function to resize images, bounding boxes, masks, semantic 208 | segmentation map. 209 | 210 | Args: 211 | results (dict): Result dict from loading pipeline. 212 | 213 | Returns: 214 | dict: Resized results, 'img_shape', 'pad_shape', 'scale_factor', 215 | 'keep_ratio' keys are added into result dict. 
216 | """ 217 | 218 | if 'scale' not in results: 219 | self._random_scale(results) 220 | self._resize_img(results) 221 | self._resize_seg(results) 222 | return results 223 | 224 | def __repr__(self): 225 | repr_str = self.__class__.__name__ 226 | repr_str += (f'(img_scale={self.img_scale}, ' 227 | f'multiscale_mode={self.multiscale_mode}, ' 228 | f'ratio_range={self.ratio_range}, ' 229 | f'keep_ratio={self.keep_ratio})') 230 | return repr_str -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/ade20k.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'ADE20KDataset' 3 | data_root = 'data/ade/ADEChallengeData2016' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | crop_size = (512, 512) 7 | train_pipeline = [ 8 | dict(type='LoadImageFromFile'), 9 | dict(type='LoadAnnotations', reduce_zero_label=True), 10 | dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)), 11 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 12 | dict(type='RandomFlip', prob=0.5), 13 | dict(type='PhotoMetricDistortion'), 14 | dict(type='Normalize', **img_norm_cfg), 15 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 16 | dict(type='DefaultFormatBundle'), 17 | dict(type='Collect', keys=['img', 'gt_semantic_seg']), 18 | ] 19 | test_pipeline = [ 20 | dict(type='LoadImageFromFile'), 21 | dict( 22 | type='MultiScaleFlipAug', 23 | img_scale=(2048, 512), 24 | #img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], 25 | flip=False, 26 | transforms=[ 27 | dict(type='AlignResize', keep_ratio=True, size_divisor=32), 28 | dict(type='RandomFlip'), 29 | dict(type='Normalize', **img_norm_cfg), 30 | dict(type='ImageToTensor', keys=['img']), 31 | dict(type='Collect', keys=['img']), 32 | ]) 33 | ] 34 | print(test_pipeline) 35 | 36 | data = dict( 37 | samples_per_gpu=4, 38 | workers_per_gpu=4, 39 | train=dict( 40 | type=dataset_type, 41 | data_root=data_root, 42 | img_dir='images/training', 43 | ann_dir='annotations/training', 44 | pipeline=train_pipeline), 45 | val=dict( 46 | type=dataset_type, 47 | data_root=data_root, 48 | img_dir='images/validation', 49 | ann_dir='annotations/validation', 50 | pipeline=test_pipeline), 51 | test=dict( 52 | type=dataset_type, 53 | data_root=data_root, 54 | img_dir='images/validation', 55 | ann_dir='annotations/validation', 56 | pipeline=test_pipeline)) 57 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/chase_db1.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'ChaseDB1Dataset' 3 | data_root = 'data/CHASE_DB1' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | img_scale = (960, 999) 7 | crop_size = (128, 128) 8 | train_pipeline = [ 9 | dict(type='LoadImageFromFile'), 10 | dict(type='LoadAnnotations'), 11 | dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), 12 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 13 | dict(type='RandomFlip', prob=0.5), 14 | dict(type='PhotoMetricDistortion'), 15 | dict(type='Normalize', **img_norm_cfg), 16 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 17 | dict(type='DefaultFormatBundle'), 18 | dict(type='Collect', keys=['img', 'gt_semantic_seg']) 19 | ] 20 | test_pipeline = [ 21 | 
dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=img_scale, 25 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], 26 | flip=False, 27 | transforms=[ 28 | dict(type='Resize', keep_ratio=True), 29 | dict(type='RandomFlip'), 30 | dict(type='Normalize', **img_norm_cfg), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']) 33 | ]) 34 | ] 35 | 36 | data = dict( 37 | samples_per_gpu=4, 38 | workers_per_gpu=4, 39 | train=dict( 40 | type='RepeatDataset', 41 | times=40000, 42 | dataset=dict( 43 | type=dataset_type, 44 | data_root=data_root, 45 | img_dir='images/training', 46 | ann_dir='annotations/training', 47 | pipeline=train_pipeline)), 48 | val=dict( 49 | type=dataset_type, 50 | data_root=data_root, 51 | img_dir='images/validation', 52 | ann_dir='annotations/validation', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | data_root=data_root, 57 | img_dir='images/validation', 58 | ann_dir='annotations/validation', 59 | pipeline=test_pipeline)) 60 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/cityscapes.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CityscapesDataset' 3 | data_root = 'data/cityscapes/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | crop_size = (512, 1024) 7 | train_pipeline = [ 8 | dict(type='LoadImageFromFile'), 9 | dict(type='LoadAnnotations'), 10 | dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)), 11 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 12 | dict(type='RandomFlip', prob=0.5), 13 | dict(type='PhotoMetricDistortion'), 14 | dict(type='Normalize', **img_norm_cfg), 15 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 16 | dict(type='DefaultFormatBundle'), 17 | dict(type='Collect', keys=['img', 'gt_semantic_seg']), 18 | ] 19 | test_pipeline = [ 20 | dict(type='LoadImageFromFile'), 21 | dict( 22 | type='MultiScaleFlipAug', 23 | img_scale=(2048, 1024), 24 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], 25 | flip=False, 26 | transforms=[ 27 | dict(type='AlignResize', keep_ratio=True, size_divisor=32), 28 | dict(type='RandomFlip'), 29 | dict(type='Normalize', **img_norm_cfg), 30 | dict(type='ImageToTensor', keys=['img']), 31 | dict(type='Collect', keys=['img']), 32 | ]) 33 | ] 34 | data = dict( 35 | samples_per_gpu=2, 36 | workers_per_gpu=2, 37 | train=dict( 38 | type=dataset_type, 39 | data_root=data_root, 40 | img_dir='leftImg8bit/train', 41 | ann_dir='gtFine/train', 42 | pipeline=train_pipeline), 43 | val=dict( 44 | type=dataset_type, 45 | data_root=data_root, 46 | img_dir='leftImg8bit/val', 47 | ann_dir='gtFine/val', 48 | pipeline=test_pipeline), 49 | test=dict( 50 | type=dataset_type, 51 | data_root=data_root, 52 | img_dir='leftImg8bit/val', 53 | ann_dir='gtFine/val', 54 | pipeline=test_pipeline)) 55 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/drive.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'DRIVEDataset' 3 | data_root = 'data/DRIVE' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | img_scale = (584, 565) 7 | crop_size = (64, 64) 8 | train_pipeline = [ 9 | dict(type='LoadImageFromFile'), 10 
| dict(type='LoadAnnotations'), 11 | dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), 12 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 13 | dict(type='RandomFlip', prob=0.5), 14 | dict(type='PhotoMetricDistortion'), 15 | dict(type='Normalize', **img_norm_cfg), 16 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 17 | dict(type='DefaultFormatBundle'), 18 | dict(type='Collect', keys=['img', 'gt_semantic_seg']) 19 | ] 20 | test_pipeline = [ 21 | dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=img_scale, 25 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], 26 | flip=False, 27 | transforms=[ 28 | dict(type='Resize', keep_ratio=True), 29 | dict(type='RandomFlip'), 30 | dict(type='Normalize', **img_norm_cfg), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']) 33 | ]) 34 | ] 35 | 36 | data = dict( 37 | samples_per_gpu=4, 38 | workers_per_gpu=4, 39 | train=dict( 40 | type='RepeatDataset', 41 | times=40000, 42 | dataset=dict( 43 | type=dataset_type, 44 | data_root=data_root, 45 | img_dir='images/training', 46 | ann_dir='annotations/training', 47 | pipeline=train_pipeline)), 48 | val=dict( 49 | type=dataset_type, 50 | data_root=data_root, 51 | img_dir='images/validation', 52 | ann_dir='annotations/validation', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | data_root=data_root, 57 | img_dir='images/validation', 58 | ann_dir='annotations/validation', 59 | pipeline=test_pipeline)) 60 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/hrf.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'HRFDataset' 3 | data_root = 'data/HRF' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | img_scale = (2336, 3504) 7 | crop_size = (256, 256) 8 | train_pipeline = [ 9 | dict(type='LoadImageFromFile'), 10 | dict(type='LoadAnnotations'), 11 | dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), 12 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 13 | dict(type='RandomFlip', prob=0.5), 14 | dict(type='PhotoMetricDistortion'), 15 | dict(type='Normalize', **img_norm_cfg), 16 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 17 | dict(type='DefaultFormatBundle'), 18 | dict(type='Collect', keys=['img', 'gt_semantic_seg']) 19 | ] 20 | test_pipeline = [ 21 | dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=img_scale, 25 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], 26 | flip=False, 27 | transforms=[ 28 | dict(type='Resize', keep_ratio=True), 29 | dict(type='RandomFlip'), 30 | dict(type='Normalize', **img_norm_cfg), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']) 33 | ]) 34 | ] 35 | 36 | data = dict( 37 | samples_per_gpu=4, 38 | workers_per_gpu=4, 39 | train=dict( 40 | type='RepeatDataset', 41 | times=40000, 42 | dataset=dict( 43 | type=dataset_type, 44 | data_root=data_root, 45 | img_dir='images/training', 46 | ann_dir='annotations/training', 47 | pipeline=train_pipeline)), 48 | val=dict( 49 | type=dataset_type, 50 | data_root=data_root, 51 | img_dir='images/validation', 52 | ann_dir='annotations/validation', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | data_root=data_root, 57 | img_dir='images/validation', 
58 | ann_dir='annotations/validation', 59 | pipeline=test_pipeline)) 60 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/pascal_context.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'PascalContextDataset' 3 | data_root = 'data/VOCdevkit/VOC2010/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | 7 | img_scale = (520, 520) 8 | crop_size = (480, 480) 9 | 10 | train_pipeline = [ 11 | dict(type='LoadImageFromFile'), 12 | dict(type='LoadAnnotations'), 13 | dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), 14 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 15 | dict(type='RandomFlip', prob=0.5), 16 | dict(type='PhotoMetricDistortion'), 17 | dict(type='Normalize', **img_norm_cfg), 18 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 19 | dict(type='DefaultFormatBundle'), 20 | dict(type='Collect', keys=['img', 'gt_semantic_seg']), 21 | ] 22 | test_pipeline = [ 23 | dict(type='LoadImageFromFile'), 24 | dict( 25 | type='MultiScaleFlipAug', 26 | img_scale=img_scale, 27 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], 28 | flip=False, 29 | transforms=[ 30 | dict(type='Resize', keep_ratio=True), 31 | dict(type='RandomFlip'), 32 | dict(type='Normalize', **img_norm_cfg), 33 | dict(type='ImageToTensor', keys=['img']), 34 | dict(type='Collect', keys=['img']), 35 | ]) 36 | ] 37 | data = dict( 38 | samples_per_gpu=4, 39 | workers_per_gpu=4, 40 | train=dict( 41 | type=dataset_type, 42 | data_root=data_root, 43 | img_dir='JPEGImages', 44 | ann_dir='SegmentationClassContext', 45 | split='ImageSets/SegmentationContext/train.txt', 46 | pipeline=train_pipeline), 47 | val=dict( 48 | type=dataset_type, 49 | data_root=data_root, 50 | img_dir='JPEGImages', 51 | ann_dir='SegmentationClassContext', 52 | split='ImageSets/SegmentationContext/val.txt', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | data_root=data_root, 57 | img_dir='JPEGImages', 58 | ann_dir='SegmentationClassContext', 59 | split='ImageSets/SegmentationContext/val.txt', 60 | pipeline=test_pipeline)) 61 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/pascal_voc12.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'PascalVOCDataset' 3 | data_root = 'data/VOCdevkit/VOC2012' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | crop_size = (512, 512) 7 | train_pipeline = [ 8 | dict(type='LoadImageFromFile'), 9 | dict(type='LoadAnnotations'), 10 | dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)), 11 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 12 | dict(type='RandomFlip', prob=0.5), 13 | dict(type='PhotoMetricDistortion'), 14 | dict(type='Normalize', **img_norm_cfg), 15 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 16 | dict(type='DefaultFormatBundle'), 17 | dict(type='Collect', keys=['img', 'gt_semantic_seg']), 18 | ] 19 | test_pipeline = [ 20 | dict(type='LoadImageFromFile'), 21 | dict( 22 | type='MultiScaleFlipAug', 23 | img_scale=(2048, 512), 24 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], 25 | flip=False, 26 | transforms=[ 27 | dict(type='Resize', keep_ratio=True), 28 | dict(type='RandomFlip'), 29 | 
dict(type='Normalize', **img_norm_cfg), 30 | dict(type='ImageToTensor', keys=['img']), 31 | dict(type='Collect', keys=['img']), 32 | ]) 33 | ] 34 | data = dict( 35 | samples_per_gpu=4, 36 | workers_per_gpu=4, 37 | train=dict( 38 | type=dataset_type, 39 | data_root=data_root, 40 | img_dir='JPEGImages', 41 | ann_dir='SegmentationClass', 42 | split='ImageSets/Segmentation/train.txt', 43 | pipeline=train_pipeline), 44 | val=dict( 45 | type=dataset_type, 46 | data_root=data_root, 47 | img_dir='JPEGImages', 48 | ann_dir='SegmentationClass', 49 | split='ImageSets/Segmentation/val.txt', 50 | pipeline=test_pipeline), 51 | test=dict( 52 | type=dataset_type, 53 | data_root=data_root, 54 | img_dir='JPEGImages', 55 | ann_dir='SegmentationClass', 56 | split='ImageSets/Segmentation/val.txt', 57 | pipeline=test_pipeline)) 58 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/pascal_voc12_aug.py: -------------------------------------------------------------------------------- 1 | _base_ = './pascal_voc12.py' 2 | # dataset settings 3 | data = dict( 4 | train=dict( 5 | ann_dir=['SegmentationClass', 'SegmentationClassAug'], 6 | split=[ 7 | 'ImageSets/Segmentation/train.txt', 8 | 'ImageSets/Segmentation/aug.txt' 9 | ])) 10 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/stare.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'STAREDataset' 3 | data_root = 'data/STARE' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | img_scale = (605, 700) 7 | crop_size = (128, 128) 8 | train_pipeline = [ 9 | dict(type='LoadImageFromFile'), 10 | dict(type='LoadAnnotations'), 11 | dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), 12 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 13 | dict(type='RandomFlip', prob=0.5), 14 | dict(type='PhotoMetricDistortion'), 15 | dict(type='Normalize', **img_norm_cfg), 16 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 17 | dict(type='DefaultFormatBundle'), 18 | dict(type='Collect', keys=['img', 'gt_semantic_seg']) 19 | ] 20 | test_pipeline = [ 21 | dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=img_scale, 25 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], 26 | flip=False, 27 | transforms=[ 28 | dict(type='Resize', keep_ratio=True), 29 | dict(type='RandomFlip'), 30 | dict(type='Normalize', **img_norm_cfg), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']) 33 | ]) 34 | ] 35 | 36 | data = dict( 37 | samples_per_gpu=4, 38 | workers_per_gpu=4, 39 | train=dict( 40 | type='RepeatDataset', 41 | times=40000, 42 | dataset=dict( 43 | type=dataset_type, 44 | data_root=data_root, 45 | img_dir='images/training', 46 | ann_dir='annotations/training', 47 | pipeline=train_pipeline)), 48 | val=dict( 49 | type=dataset_type, 50 | data_root=data_root, 51 | img_dir='images/validation', 52 | ann_dir='annotations/validation', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | data_root=data_root, 57 | img_dir='images/validation', 58 | ann_dir='annotations/validation', 59 | pipeline=test_pipeline)) 60 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/default_runtime.py: 
-------------------------------------------------------------------------------- 1 | # yapf:disable 2 | log_config = dict( 3 | interval=50, 4 | hooks=[ 5 | dict(type='TextLoggerHook', by_epoch=False), 6 | # dict(type='TensorboardLoggerHook') 7 | ]) 8 | # yapf:enable 9 | dist_params = dict(backend='nccl') 10 | log_level = 'INFO' 11 | load_from = None 12 | resume_from = None 13 | workflow = [('train', 1)] 14 | cudnn_benchmark = False 15 | 16 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/models/fpn_r50.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='SyncBN', requires_grad=True) 3 | model = dict( 4 | type='EncoderDecoder', 5 | pretrained='open-mmlab://resnet50_v1c', 6 | backbone=dict( 7 | type='ResNetV1c', 8 | depth=50, 9 | num_stages=4, 10 | out_indices=(0, 1, 2, 3), 11 | dilations=(1, 1, 1, 1), 12 | strides=(1, 2, 2, 2), 13 | norm_cfg=norm_cfg, 14 | norm_eval=False, 15 | style='pytorch', 16 | contract_dilation=True), 17 | neck=dict( 18 | type='FPN', 19 | in_channels=[256, 512, 1024, 2048], 20 | out_channels=256, 21 | num_outs=4), 22 | decode_head=dict( 23 | type='FPNHead', 24 | in_channels=[256, 256, 256, 256], 25 | in_index=[0, 1, 2, 3], 26 | feature_strides=[4, 8, 16, 32], 27 | channels=128, 28 | dropout_ratio=0.1, 29 | num_classes=19, 30 | norm_cfg=norm_cfg, 31 | align_corners=False, 32 | loss_decode=dict( 33 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), 34 | # model training and testing settings 35 | train_cfg=dict(), 36 | test_cfg=dict(mode='whole')) 37 | find_unused_parameters = True -------------------------------------------------------------------------------- /segmentation/configs/_base_/models/upernet_r50.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='SyncBN', requires_grad=True) 3 | model = dict( 4 | type='EncoderDecoder', 5 | pretrained='open-mmlab://resnet50_v1c', 6 | backbone=dict( 7 | type='ResNetV1c', 8 | depth=50, 9 | num_stages=4, 10 | out_indices=(0, 1, 2, 3), 11 | dilations=(1, 1, 1, 1), 12 | strides=(1, 2, 2, 2), 13 | norm_cfg=norm_cfg, 14 | norm_eval=False, 15 | style='pytorch', 16 | contract_dilation=True), 17 | decode_head=dict( 18 | type='UPerHead', 19 | in_channels=[256, 512, 1024, 2048], 20 | in_index=[0, 1, 2, 3], 21 | pool_scales=(1, 2, 3, 6), 22 | channels=512, 23 | dropout_ratio=0.1, 24 | num_classes=19, 25 | norm_cfg=norm_cfg, 26 | align_corners=False, 27 | loss_decode=dict( 28 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), 29 | auxiliary_head=dict( 30 | type='FCNHead', 31 | in_channels=1024, 32 | in_index=2, 33 | channels=256, 34 | num_convs=1, 35 | concat_input=False, 36 | dropout_ratio=0.1, 37 | num_classes=19, 38 | norm_cfg=norm_cfg, 39 | align_corners=False, 40 | loss_decode=dict( 41 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), 42 | # model training and testing settings 43 | train_cfg=dict(), 44 | test_cfg=dict(mode='whole') 45 | # test_cfg=dict(mode='slide', crop_size=(1024, 1024), stride=(768, 768)) 46 | ) 47 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/schedules/schedule_160k.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.01, weight_decay=0.0005) 3 | optimizer_config = dict() 4 | # learning policy 5 | 
lr_config = dict(policy='poly', power=0.9, min_lr=1e-5, by_epoch=False) 6 | # runtime settings 7 | runner = dict(type='IterBasedRunner', max_iters=160000) 8 | checkpoint_config = dict(by_epoch=False, interval=16000) 9 | evaluation = dict(interval=16000, metric='mIoU') 10 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/schedules/schedule_20k.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.01, weight_decay=0.0005) 3 | optimizer_config = dict() 4 | # learning policy 5 | lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) 6 | # runtime settings 7 | runner = dict(type='IterBasedRunner', max_iters=20000) 8 | checkpoint_config = dict(by_epoch=False, interval=2000) 9 | evaluation = dict(interval=2000, metric='mIoU') 10 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/schedules/schedule_40k.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.01, weight_decay=0.0005) 3 | optimizer_config = dict() 4 | # learning policy 5 | lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) 6 | # runtime settings 7 | runner = dict(type='IterBasedRunner', max_iters=40000) 8 | checkpoint_config = dict(by_epoch=False, interval=4000) 9 | evaluation = dict(interval=4000, metric='mIoU') 10 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/schedules/schedule_80k.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.01, weight_decay=0.0005) 3 | optimizer_config = dict() 4 | # learning policy 5 | lr_config = dict(policy='poly', power=0.9, min_lr=1e-6, by_epoch=False) 6 | # runtime settings 7 | runner = dict(type='IterBasedRunner', max_iters=80000) 8 | checkpoint_config = dict(by_epoch=False, interval=8000) 9 | evaluation = dict(interval=8000, metric='mIoU') 10 | -------------------------------------------------------------------------------- /segmentation/configs/sem_fpn_p2t_b_ade20k_80k.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/fpn_r50.py', '_base_/datasets/ade20k.py', 3 | '_base_/default_runtime.py', '_base_/schedules/schedule_80k.py' 4 | ] 5 | 6 | model = dict( 7 | type='EncoderDecoder', 8 | pretrained='pretrained/p2t_base.pth', 9 | backbone=dict( 10 | type='p2t_base', 11 | style='pytorch'), 12 | neck=dict( 13 | type='FPN', 14 | in_channels=[64, 128, 320, 512], 15 | out_channels=256, 16 | num_outs=4), 17 | decode_head=dict(num_classes=150), 18 | ) 19 | cudnn_benchmark = False 20 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 21 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) 22 | data = dict(samples_per_gpu=2) 23 | find_unused_parameters = True 24 | -------------------------------------------------------------------------------- /segmentation/configs/sem_fpn_p2t_l_ade20k_80k.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/fpn_r50.py', '_base_/datasets/ade20k.py', 3 | '_base_/default_runtime.py', '_base_/schedules/schedule_80k.py' 4 | ] 5 | 6 | model = dict( 7 | type='EncoderDecoder', 8 | pretrained='pretrained/p2t_large.pth', 9 | backbone=dict( 10 | 
type='p2t_large', 11 | style='pytorch'), 12 | neck=dict( 13 | type='FPN', 14 | in_channels=[64, 128, 320, 640], 15 | out_channels=256, 16 | num_outs=4), 17 | decode_head=dict(num_classes=150), 18 | ) 19 | cudnn_benchmark = False 20 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 21 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) 22 | data = dict(samples_per_gpu=2) 23 | find_unused_parameters = True 24 | -------------------------------------------------------------------------------- /segmentation/configs/sem_fpn_p2t_s_ade20k_80k.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/fpn_r50.py', '_base_/datasets/ade20k.py', 3 | '_base_/default_runtime.py', '_base_/schedules/schedule_80k.py' 4 | ] 5 | 6 | model = dict( 7 | type='EncoderDecoder', 8 | pretrained='pretrained/p2t_small.pth', 9 | backbone=dict( 10 | type='p2t_small', 11 | style='pytorch'), 12 | neck=dict( 13 | type='FPN', 14 | in_channels=[64, 128, 320, 512], 15 | out_channels=256, 16 | num_outs=4), 17 | decode_head=dict(num_classes=150), 18 | ) 19 | cudnn_benchmark = False 20 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 21 | optimizer_config = dict() 22 | data = dict(samples_per_gpu=2) 23 | find_unused_parameters = True 24 | -------------------------------------------------------------------------------- /segmentation/configs/sem_fpn_p2t_t_ade20k_80k.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/fpn_r50.py', '_base_/datasets/ade20k.py', 3 | '_base_/default_runtime.py', '_base_/schedules/schedule_80k.py' 4 | ] 5 | 6 | model = dict( 7 | type='EncoderDecoder', 8 | pretrained='pretrained/p2t_tiny.pth', 9 | backbone=dict( 10 | type='p2t_tiny', 11 | style='pytorch'), 12 | neck=dict( 13 | type='FPN', 14 | in_channels=[48, 96, 240, 384], 15 | out_channels=256, 16 | num_outs=4), 17 | decode_head=dict(num_classes=150), 18 | ) 19 | cudnn_benchmark = False 20 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 21 | optimizer_config = dict() 22 | data = dict(samples_per_gpu=2) 23 | find_unused_parameters = True 24 | -------------------------------------------------------------------------------- /segmentation/dist_test.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | CONFIG=$1 4 | CHECKPOINT=$2 5 | GPUS=$3 6 | PORT=${PORT:-29400} 7 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 8 | python $(dirname "$0")/test.py $CONFIG $CHECKPOINT ${@:4} 9 | 10 | ## example command: 11 | ## bash dist_test.sh configs/sem_fpn_p2t_s_ade20k_80k.py pretrained/sem_fpn_p2t_s_ade20k_80k.pth 1 --eval mIoU 12 | -------------------------------------------------------------------------------- /segmentation/dist_train.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | export OMP_NUM_THREADS=1 4 | 5 | CONFIG=$1 6 | N_GPUS=$2 7 | PORT=${PORT:-29500} 8 | 9 | 10 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 11 | python -m torch.distributed.launch --nproc_per_node=${N_GPUS} \ 12 | --master_port=${PORT} \ 13 | --use_env $(dirname "$0")/train.py ${CONFIG} --launcher pytorch ${@:3} 14 | 15 | ## bash dist_train.sh configs/sem_fpn_p2t_s_ade20k_80k.py 8 16 | ## training [p2t_small + semantic fpn] costs ~4GB GPU memory for each GPU (2 images/gpu). 
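## The master port defaults to 29500 (PORT=${PORT:-29500} above); if that port is
## occupied, any free port can be exported instead, e.g. (port number is only an example):
## PORT=29501 bash dist_train.sh configs/sem_fpn_p2t_s_ade20k_80k.py 8
## Any arguments after the GPU count are forwarded verbatim to train.py via ${@:3}.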
17 | -------------------------------------------------------------------------------- /segmentation/p2t.py: -------------------------------------------------------------------------------- 1 | from os import sep 2 | from pickle import TRUE 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import torch.jit as jit 7 | from functools import partial 8 | 9 | from timm.models.layers import DropPath, to_2tuple, trunc_normal_ 10 | from timm.models.registry import register_model 11 | from timm.models.vision_transformer import _cfg 12 | 13 | from mmseg.models.builder import BACKBONES 14 | from mmcv.runner import load_checkpoint 15 | from mmseg.utils import get_root_logger 16 | 17 | 18 | import numpy as np 19 | from time import time 20 | 21 | __all__ = [ 22 | 'p2t_tiny', 'p2t_small', 'p2t_base', 'p2t_large' 23 | ] 24 | 25 | 26 | 27 | class IRB(nn.Module): 28 | def __init__(self, in_features, hidden_features=None, out_features=None, ksize=3, act_layer=nn.Hardswish, drop=0.): 29 | super().__init__() 30 | out_features = out_features or in_features 31 | hidden_features = hidden_features or in_features 32 | self.fc1 = nn.Conv2d(in_features, hidden_features, 1, 1, 0) 33 | self.act = act_layer() 34 | self.conv = nn.Conv2d(hidden_features, hidden_features, kernel_size=ksize, padding=ksize//2, stride=1, groups=hidden_features) 35 | self.fc2 = nn.Conv2d(hidden_features, out_features, 1, 1, 0) 36 | self.drop = nn.Dropout(drop) 37 | 38 | def forward(self, x, H, W): 39 | B, N, C = x.shape 40 | x = x.permute(0,2,1).reshape(B, C, H, W) 41 | x = self.fc1(x) 42 | x = self.act(x) 43 | x = self.conv(x) 44 | x = self.act(x) 45 | x = self.fc2(x) 46 | return x.reshape(B, C, -1).permute(0,2,1) 47 | 48 | 49 | class PoolingAttention(nn.Module): 50 | def __init__(self, dim, num_heads=2, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., 51 | pool_ratios=[1,2,3,6]): 52 | 53 | super().__init__() 54 | assert dim % num_heads == 0, f"dim {dim} should be divided by num_heads {num_heads}." 
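        # Summary of forward() below: queries are taken from the full token
        # sequence, while keys/values are built from pyramid-pooled tokens. The
        # input is reshaped to (B, C, H, W), adaptive-average-pooled at every
        # ratio in pool_ratios, refined by the depthwise 3x3 convs passed in as
        # d_convs (pool + conv(pool)), then concatenated and layer-normalized.
        # The key/value length therefore shrinks to roughly sum((H/r) * (W/r))
        # while still carrying multi-scale context.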
55 | 56 | self.dim = dim 57 | self.num_heads = num_heads 58 | self.num_elements = np.array([t*t for t in pool_ratios]).sum() 59 | head_dim = dim // num_heads 60 | self.scale = qk_scale or head_dim ** -0.5 61 | 62 | self.q = nn.Sequential(nn.Linear(dim, dim, bias=qkv_bias)) 63 | self.kv = nn.Sequential(nn.Linear(dim, dim * 2, bias=qkv_bias)) 64 | 65 | self.attn_drop = nn.Dropout(attn_drop) 66 | self.proj = nn.Linear(dim, dim) 67 | self.proj_drop = nn.Dropout(proj_drop) 68 | 69 | self.pool_ratios = pool_ratios 70 | self.pools = nn.ModuleList() 71 | 72 | self.norm = nn.LayerNorm(dim) 73 | 74 | def forward(self, x, H, W, d_convs=None): 75 | B, N, C = x.shape 76 | 77 | q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3) 78 | pools = [] 79 | x_ = x.permute(0, 2, 1).reshape(B, C, H, W) 80 | for (pool_ratio, l) in zip(self.pool_ratios, d_convs): 81 | pool = F.adaptive_avg_pool2d(x_, (round(H/pool_ratio), round(W/pool_ratio))) 82 | pool = pool + l(pool) 83 | pools.append(pool.view(B, C, -1)) 84 | 85 | pools = torch.cat(pools, dim=2) 86 | pools = self.norm(pools.permute(0,2,1)) 87 | 88 | kv = self.kv(pools).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) 89 | k, v = kv[0], kv[1] 90 | 91 | attn = (q @ k.transpose(-2, -1)) * self.scale 92 | attn = attn.softmax(dim=-1) 93 | x = (attn @ v) 94 | x = x.transpose(1,2).contiguous().reshape(B, N, C) 95 | 96 | x = self.proj(x) 97 | 98 | return x 99 | 100 | 101 | class Block(nn.Module): 102 | 103 | def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 104 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, pool_ratios=[12,16,20,24]): 105 | super().__init__() 106 | self.norm1 = norm_layer(dim) 107 | self.attn = PoolingAttention( 108 | dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, 109 | attn_drop=attn_drop, proj_drop=drop, pool_ratios=pool_ratios) 110 | 111 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 112 | 113 | self.norm2 = norm_layer(dim) 114 | self.mlp = IRB(in_features=dim, hidden_features=int(dim * mlp_ratio), act_layer=nn.Hardswish, drop=drop, ksize=3) 115 | 116 | def forward(self, x, H, W, d_convs=None): 117 | x = x + self.drop_path(self.attn(self.norm1(x), H, W, d_convs=d_convs)) 118 | x = x + self.drop_path(self.mlp(self.norm2(x), H, W)) 119 | 120 | return x 121 | 122 | class PatchEmbed(nn.Module): 123 | """ Image to Patch Embedding 124 | """ 125 | 126 | def __init__(self, img_size=224, patch_size=16, kernel_size=3, in_chans=3, embed_dim=768, overlap=True): 127 | super().__init__() 128 | img_size = to_2tuple(img_size) 129 | patch_size = to_2tuple(patch_size) 130 | 131 | self.img_size = img_size 132 | self.patch_size = patch_size 133 | assert img_size[0] % patch_size[0] == 0 and img_size[1] % patch_size[1] == 0, \ 134 | f"img_size {img_size} should be divided by patch_size {patch_size}." 
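        # Note: forward() recomputes H and W from the actual input, so this
        # assert only constrains the nominal img_size given at construction.
        # For segmentation, the AlignResize transform (size_divisor=32) keeps
        # inputs divisible by the backbone's overall stride of 32 (4 x 2 x 2 x 2),
        # so each stage's stride-2/stride-4 patch embedding divides evenly.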
135 | self.H, self.W = img_size[0] // patch_size[0], img_size[1] // patch_size[1] 136 | self.num_patches = self.H * self.W 137 | if not overlap: 138 | self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size) 139 | else: 140 | self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=kernel_size, stride=patch_size, padding=kernel_size//2) 141 | 142 | self.norm = nn.LayerNorm(embed_dim) 143 | 144 | def forward(self, x): 145 | B, C, H, W = x.shape 146 | x = self.proj(x).flatten(2).transpose(1, 2) 147 | x = self.norm(x) 148 | H, W = H // self.patch_size[0], W // self.patch_size[1] 149 | 150 | return x, (H, W) 151 | 152 | 153 | 154 | class PyramidPoolingTransformer(nn.Module): 155 | def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes=1000, embed_dims=[64, 128, 320, 512], 156 | num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True, qk_scale=None, drop_rate=0., 157 | attn_drop_rate=0., drop_path_rate=0.1, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 9, 3], **kwargs): # 158 | super().__init__() 159 | print("loading p2t") 160 | self.num_classes = num_classes 161 | self.depths = depths 162 | 163 | self.embed_dims = embed_dims 164 | 165 | # pyramid pooling ratios for each stage 166 | pool_ratios = [[12,16,20,24], [6,8,10,12], [3,4,5,6], [1,2,3,4]] 167 | 168 | self.patch_embed1 = PatchEmbed(img_size=img_size, patch_size=4, kernel_size=7, in_chans=in_chans, 169 | embed_dim=embed_dims[0], overlap=True) 170 | 171 | self.patch_embed2 = PatchEmbed(img_size=img_size // 4, patch_size=2, in_chans=embed_dims[0], 172 | embed_dim=embed_dims[1], overlap=True) 173 | self.patch_embed3 = PatchEmbed(img_size=img_size // 8, patch_size=2, in_chans=embed_dims[1], 174 | embed_dim=embed_dims[2], overlap=True) 175 | self.patch_embed4 = PatchEmbed(img_size=img_size // 16, patch_size=2, in_chans=embed_dims[2], 176 | embed_dim=embed_dims[3], overlap=True) 177 | 178 | self.d_convs1 = nn.ModuleList([nn.Conv2d(embed_dims[0], embed_dims[0], kernel_size=3, stride=1, padding=1, groups=embed_dims[0]) for temp in pool_ratios[0]]) 179 | self.d_convs2 = nn.ModuleList([nn.Conv2d(embed_dims[1], embed_dims[1], kernel_size=3, stride=1, padding=1, groups=embed_dims[1]) for temp in pool_ratios[1]]) 180 | self.d_convs3 = nn.ModuleList([nn.Conv2d(embed_dims[2], embed_dims[2], kernel_size=3, stride=1, padding=1, groups=embed_dims[2]) for temp in pool_ratios[2]]) 181 | self.d_convs4 = nn.ModuleList([nn.Conv2d(embed_dims[3], embed_dims[3], kernel_size=3, stride=1, padding=1, groups=embed_dims[3]) for temp in pool_ratios[3]]) 182 | 183 | # transformer encoder 184 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth decay rule 185 | cur = 0 186 | 187 | 188 | ksize = 3 189 | 190 | self.block1 = nn.ModuleList([Block( 191 | dim=embed_dims[0], num_heads=num_heads[0], mlp_ratio=mlp_ratios[0], qkv_bias=qkv_bias, qk_scale=qk_scale, 192 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[0]) 193 | for i in range(depths[0])]) 194 | 195 | 196 | cur += depths[0] 197 | self.block2 = nn.ModuleList([Block( 198 | dim=embed_dims[1], num_heads=num_heads[1], mlp_ratio=mlp_ratios[1], qkv_bias=qkv_bias, qk_scale=qk_scale, 199 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[1]) 200 | for i in range(depths[1])]) 201 | 202 | cur += depths[1] 203 | 204 | 205 | self.block3 = nn.ModuleList([Block( 206 | dim=embed_dims[2], 
num_heads=num_heads[2], mlp_ratio=mlp_ratios[2], qkv_bias=qkv_bias, qk_scale=qk_scale, 207 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[2]) 208 | for i in range(depths[2])]) 209 | 210 | cur += depths[2] 211 | 212 | self.block4 = nn.ModuleList([Block( 213 | dim=embed_dims[3], num_heads=num_heads[3], mlp_ratio=mlp_ratios[3], qkv_bias=qkv_bias, qk_scale=qk_scale, 214 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[3]) 215 | for i in range(depths[3])]) 216 | 217 | # classification head, usually not used in dense prediction tasks 218 | self.head = nn.Linear(embed_dims[3], num_classes) if num_classes > 0 else nn.Identity() 219 | self.gap = nn.AdaptiveAvgPool1d(1) 220 | 221 | self.apply(self._init_weights) 222 | 223 | 224 | def init_weights(self, pretrained=None): 225 | if isinstance(pretrained, str): 226 | logger = get_root_logger() 227 | load_checkpoint(self, pretrained, map_location='cpu', strict=False, logger=logger) 228 | 229 | 230 | def reset_drop_path(self, drop_path_rate): 231 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(self.depths))] 232 | cur = 0 233 | for i in range(self.depths[0]): 234 | self.block1[i].drop_path.drop_prob = dpr[cur + i] 235 | 236 | cur += self.depths[0] 237 | for i in range(self.depths[1]): 238 | self.block2[i].drop_path.drop_prob = dpr[cur + i] 239 | 240 | cur += self.depths[1] 241 | for i in range(self.depths[2]): 242 | self.block3[i].drop_path.drop_prob = dpr[cur + i] 243 | 244 | cur += self.depths[2] 245 | for i in range(self.depths[3]): 246 | self.block4[i].drop_path.drop_prob = dpr[cur + i] 247 | 248 | def _init_weights(self, m): 249 | if isinstance(m, nn.Linear): 250 | trunc_normal_(m.weight, std=.02) 251 | if isinstance(m, nn.Linear) and m.bias is not None: 252 | nn.init.constant_(m.bias, 0) 253 | elif isinstance(m, nn.LayerNorm): 254 | nn.init.constant_(m.bias, 0) 255 | nn.init.constant_(m.weight, 1.0) 256 | 257 | 258 | @torch.jit.ignore 259 | def no_weight_decay(self): 260 | # return {'pos_embed', 'cls_token'} # has pos_embed may be better 261 | return {'cls_token'} 262 | 263 | def get_classifier(self): 264 | return self.head 265 | 266 | def reset_classifier(self, num_classes, global_pool=''): 267 | self.num_classes = num_classes 268 | self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity() 269 | 270 | def forward_features(self, x): 271 | outs = [] 272 | 273 | B = x.shape[0] 274 | 275 | # stage 1 276 | x, (H, W) = self.patch_embed1(x) 277 | 278 | for idx, blk in enumerate(self.block1): 279 | x = blk(x, H, W, self.d_convs1) 280 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2) 281 | outs.append(x) 282 | 283 | # stage 2 284 | x, (H, W) = self.patch_embed2(x) 285 | 286 | for idx, blk in enumerate(self.block2): 287 | x = blk(x, H, W, self.d_convs2) 288 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2) 289 | outs.append(x) 290 | 291 | x, (H, W) = self.patch_embed3(x) 292 | 293 | for idx, blk in enumerate(self.block3): 294 | x = blk(x, H, W, self.d_convs3) 295 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2) 296 | outs.append(x) 297 | 298 | # stage 4 299 | x, (H, W) = self.patch_embed4(x) 300 | 301 | for idx, blk in enumerate(self.block4): 302 | x = blk(x, H, W, self.d_convs4) 303 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2) 304 | outs.append(x) 305 | 306 | return outs 307 | 308 | def forward(self, x): 309 | x = self.forward_features(x) 310 | 311 | return x 312 | 313 | 314 | 
def _conv_filter(state_dict, patch_size=16): 315 | """ convert patch embedding weight from manual patchify + linear proj to conv""" 316 | out_dict = {} 317 | for k, v in state_dict.items(): 318 | if 'patch_embed.proj.weight' in k: 319 | v = v.reshape((v.shape[0], 3, patch_size, patch_size)) 320 | out_dict[k] = v 321 | 322 | return out_dict 323 | 324 | 325 | @BACKBONES.register_module() 326 | class p2t_tiny(PyramidPoolingTransformer): 327 | def __init__(self, **kwargs): 328 | super().__init__( 329 | patch_size=4, embed_dims=[48, 96, 240, 384], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], 330 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 6, 3], 331 | drop_rate=0.0, drop_path_rate=0.1, **kwargs) 332 | 333 | 334 | @BACKBONES.register_module() 335 | class p2t_small(PyramidPoolingTransformer): 336 | def __init__(self, **kwargs): 337 | super().__init__( 338 | patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], 339 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2,2,9,3], mlp_ratios=[8,8,4,4], 340 | drop_rate=0.0, drop_path_rate=0.1, **kwargs) 341 | 342 | 343 | @BACKBONES.register_module() 344 | class p2t_base(PyramidPoolingTransformer): 345 | def __init__(self, **kwargs): 346 | super().__init__( 347 | patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], 348 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3,4,18,3], mlp_ratios=[8,8,4,4], 349 | drop_rate=0.0, drop_path_rate=0.3, **kwargs) 350 | 351 | 352 | @BACKBONES.register_module() 353 | class p2t_large(PyramidPoolingTransformer): 354 | def __init__(self, **kwargs): 355 | super().__init__( 356 | patch_size=4, embed_dims=[64, 128, 320, 640], num_heads=[1, 2, 5, 8], 357 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3,8,27,3], mlp_ratios=[8,8,4,4], 358 | drop_rate=0.0, drop_path_rate=0.3, **kwargs) 359 | 360 | 361 | -------------------------------------------------------------------------------- /segmentation/test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import mmcv 5 | import torch 6 | from mmcv.parallel import MMDataParallel, MMDistributedDataParallel 7 | from mmcv.runner import get_dist_info, init_dist, load_checkpoint 8 | from mmcv.utils import DictAction 9 | from mmseg.apis import multi_gpu_test, single_gpu_test 10 | from mmseg.datasets import build_dataloader, build_dataset 11 | from mmseg.models import build_segmentor 12 | 13 | import p2t 14 | from align_resize import AlignResize 15 | 16 | def parse_args(): 17 | parser = argparse.ArgumentParser( 18 | description='mmseg test (and eval) a model') 19 | parser.add_argument('config', help='test config file path') 20 | parser.add_argument('checkpoint', help='checkpoint file') 21 | parser.add_argument( 22 | '--aug-test', action='store_true', help='Use Flip and Multi scale aug') 23 | parser.add_argument('--out', help='output result file in pickle format') 24 | parser.add_argument( 25 | '--format-only', 26 | action='store_true', 27 | help='Format the output results without perform evaluation. 
It is'
28 |         'useful when you want to format the result to a specific format and '
29 |         'submit it to the test server')
30 |     parser.add_argument(
31 |         '--eval',
32 |         type=str,
33 |         nargs='+',
34 |         help='evaluation metrics, which depend on the dataset, e.g., "mIoU"'
35 |         ' for generic datasets, and "cityscapes" for Cityscapes')
36 |     parser.add_argument('--show', action='store_true', help='show results')
37 |     parser.add_argument(
38 |         '--show-dir', help='directory where painted images will be saved')
39 |     parser.add_argument(
40 |         '--gpu-collect',
41 |         action='store_true',
42 |         help='whether to use gpu to collect results.')
43 |     parser.add_argument(
44 |         '--tmpdir',
45 |         help='tmp directory used for collecting results from multiple '
46 |         'workers, available when gpu_collect is not specified')
47 |     parser.add_argument(
48 |         '--options', nargs='+', action=DictAction, help='custom options')
49 |     parser.add_argument(
50 |         '--eval-options',
51 |         nargs='+',
52 |         action=DictAction,
53 |         help='custom options for evaluation')
54 |     parser.add_argument(
55 |         '--launcher',
56 |         choices=['none', 'pytorch', 'slurm', 'mpi'],
57 |         default='none',
58 |         help='job launcher')
59 |     parser.add_argument(
60 |         '--opacity',
61 |         type=float,
62 |         default=1,
63 |         help='Opacity of painted segmentation map. In (0, 1] range.')
64 |     parser.add_argument('--local_rank', type=int, default=0)
65 |     args = parser.parse_args()
66 |     if 'LOCAL_RANK' not in os.environ:
67 |         os.environ['LOCAL_RANK'] = str(args.local_rank)
68 |     return args
69 | 
70 | 
71 | def main():
72 |     args = parse_args()
73 | 
74 |     assert args.out or args.eval or args.format_only or args.show \
75 |         or args.show_dir, \
76 |         ('Please specify at least one operation (save/eval/format/show the '
77 |          'results / save the results) with the argument "--out", "--eval"'
78 |          ', "--format-only", "--show" or "--show-dir"')
79 | 
80 |     if args.eval and args.format_only:
81 |         raise ValueError('--eval and --format_only cannot be both specified')
82 | 
83 |     if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
84 |         raise ValueError('The output file must be a pkl file.')
85 | 
86 |     cfg = mmcv.Config.fromfile(args.config)
87 |     if args.options is not None:
88 |         cfg.merge_from_dict(args.options)
89 |     # set cudnn_benchmark
90 |     if cfg.get('cudnn_benchmark', False):
91 |         torch.backends.cudnn.benchmark = True
92 |     if args.aug_test:
93 |         # hard code index
94 |         cfg.data.test.pipeline[1].img_ratios = [
95 |             0.5, 0.75, 1.0, 1.25, 1.5, 1.75
96 |         ]
97 |         cfg.data.test.pipeline[1].flip = True
98 |     cfg.model.pretrained = None
99 |     cfg.data.test.test_mode = True
100 | 
101 |     # init distributed env first, since logger depends on the dist info.
102 | if args.launcher == 'none': 103 | distributed = False 104 | else: 105 | distributed = True 106 | init_dist(args.launcher, **cfg.dist_params) 107 | 108 | # build the dataloader 109 | # TODO: support multiple images per gpu (only minor changes are needed) 110 | dataset = build_dataset(cfg.data.test) 111 | data_loader = build_dataloader( 112 | dataset, 113 | samples_per_gpu=1, 114 | workers_per_gpu=cfg.data.workers_per_gpu, 115 | dist=distributed, 116 | shuffle=False) 117 | 118 | # build the model and load checkpoint 119 | cfg.model.train_cfg = None 120 | model = build_segmentor(cfg.model, test_cfg=cfg.get('test_cfg')) 121 | if os.path.exists(args.checkpoint): 122 | checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu') 123 | if 'CLASSES' in checkpoint.get('meta', {}): 124 | model.CLASSES = checkpoint['meta']['CLASSES'] 125 | else: 126 | print('"CLASSES" not found in meta, use dataset.CLASSES instead') 127 | model.CLASSES = dataset.CLASSES 128 | if 'PALETTE' in checkpoint.get('meta', {}): 129 | model.PALETTE = checkpoint['meta']['PALETTE'] 130 | else: 131 | print('"PALETTE" not found in meta, use dataset.PALETTE instead') 132 | model.PALETTE = dataset.PALETTE 133 | 134 | 135 | efficient_test = True 136 | if args.eval_options is not None: 137 | efficient_test = args.eval_options.get('efficient_test', False) 138 | print(model) 139 | 140 | if not distributed: 141 | model = MMDataParallel(model, device_ids=[0]) 142 | outputs = single_gpu_test(model, data_loader, args.show, args.show_dir, 143 | efficient_test, args.opacity) 144 | else: 145 | model = MMDistributedDataParallel( 146 | model.cuda(), 147 | device_ids=[torch.cuda.current_device()], 148 | broadcast_buffers=False) 149 | outputs = multi_gpu_test(model, data_loader, args.tmpdir, 150 | args.gpu_collect, efficient_test) 151 | 152 | rank, _ = get_dist_info() 153 | if rank == 0: 154 | if args.out: 155 | print(f'\nwriting results to {args.out}') 156 | mmcv.dump(outputs, args.out) 157 | kwargs = {} if args.eval_options is None else args.eval_options 158 | 159 | if args.format_only: 160 | dataset.format_results(outputs, **kwargs) 161 | if args.eval: 162 | dataset.evaluate(outputs, args.eval, **kwargs) 163 | 164 | 165 | if __name__ == '__main__': 166 | main() 167 | -------------------------------------------------------------------------------- /segmentation/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import copy 3 | import os 4 | import os.path as osp 5 | import time 6 | import mmcv 7 | import torch 8 | from mmcv.runner import init_dist 9 | from mmcv.utils import Config, DictAction, get_git_hash 10 | from mmseg import __version__ 11 | from mmseg.apis import set_random_seed, train_segmentor 12 | from mmseg.datasets import build_dataset 13 | from mmseg.models import build_segmentor 14 | from mmseg.utils import collect_env, get_root_logger 15 | 16 | import p2t 17 | from align_resize import AlignResize 18 | 19 | def parse_args(): 20 | parser = argparse.ArgumentParser(description='Train a segmentor') 21 | parser.add_argument('config', help='train config file path') 22 | parser.add_argument('--work-dir', help='the dir to save logs and models') 23 | parser.add_argument( 24 | '--load-from', help='the checkpoint file to load weights from') 25 | parser.add_argument( 26 | '--resume-from', help='the checkpoint file to resume from') 27 | parser.add_argument( 28 | '--no-validate', 29 | action='store_true', 30 | help='whether not to evaluate the checkpoint during 
training') 31 | group_gpus = parser.add_mutually_exclusive_group() 32 | group_gpus.add_argument( 33 | '--gpus', 34 | type=int, 35 | help='number of gpus to use ' 36 | '(only applicable to non-distributed training)') 37 | group_gpus.add_argument( 38 | '--gpu-ids', 39 | type=int, 40 | nargs='+', 41 | help='ids of gpus to use ' 42 | '(only applicable to non-distributed training)') 43 | parser.add_argument('--seed', type=int, default=None, help='random seed') 44 | parser.add_argument( 45 | '--deterministic', 46 | action='store_true', 47 | help='whether to set deterministic options for CUDNN backend.') 48 | parser.add_argument( 49 | '--options', nargs='+', action=DictAction, help='custom options') 50 | parser.add_argument( 51 | '--launcher', 52 | choices=['none', 'pytorch', 'slurm', 'mpi'], 53 | default='none', 54 | help='job launcher') 55 | parser.add_argument('--local_rank', type=int, default=0) 56 | args = parser.parse_args() 57 | if 'LOCAL_RANK' not in os.environ: 58 | os.environ['LOCAL_RANK'] = str(args.local_rank) 59 | 60 | return args 61 | 62 | 63 | def main(): 64 | args = parse_args() 65 | 66 | cfg = Config.fromfile(args.config) 67 | if args.options is not None: 68 | cfg.merge_from_dict(args.options) 69 | # set cudnn_benchmark 70 | if cfg.get('cudnn_benchmark', False): 71 | torch.backends.cudnn.benchmark = True 72 | 73 | # work_dir is determined in this priority: CLI > segment in file > filename 74 | if args.work_dir is not None: 75 | # update configs according to CLI args if args.work_dir is not None 76 | cfg.work_dir = args.work_dir 77 | elif cfg.get('work_dir', None) is None: 78 | # use config filename as default work_dir if cfg.work_dir is None 79 | cfg.work_dir = osp.join('./work_dirs', 80 | osp.splitext(osp.basename(args.config))[0]) 81 | if args.load_from is not None: 82 | cfg.load_from = args.load_from 83 | if args.resume_from is not None: 84 | cfg.resume_from = args.resume_from 85 | if args.gpu_ids is not None: 86 | cfg.gpu_ids = args.gpu_ids 87 | else: 88 | cfg.gpu_ids = range(1) if args.gpus is None else range(args.gpus) 89 | 90 | # init distributed env first, since logger depends on the dist info. 
91 | if args.launcher == 'none': 92 | distributed = False 93 | else: 94 | distributed = True 95 | init_dist(args.launcher, **cfg.dist_params) 96 | 97 | # create work_dir 98 | mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir)) 99 | # dump config 100 | cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config))) 101 | # init the logger before other steps 102 | timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) 103 | log_file = osp.join(cfg.work_dir, f'{timestamp}.log') 104 | logger = get_root_logger(log_file=log_file, log_level=cfg.log_level) 105 | 106 | # init the meta dict to record some important information such as 107 | # environment info and seed, which will be logged 108 | meta = dict() 109 | # log env info 110 | env_info_dict = collect_env() 111 | env_info = '\n'.join([f'{k}: {v}' for k, v in env_info_dict.items()]) 112 | dash_line = '-' * 60 + '\n' 113 | logger.info('Environment info:\n' + dash_line + env_info + '\n' + 114 | dash_line) 115 | meta['env_info'] = env_info 116 | 117 | # log some basic info 118 | logger.info(f'Distributed training: {distributed}') 119 | logger.info(f'Config:\n{cfg.pretty_text}') 120 | 121 | # set random seeds 122 | if args.seed is not None: 123 | logger.info(f'Set random seed to {args.seed}, deterministic: ' 124 | f'{args.deterministic}') 125 | set_random_seed(args.seed, deterministic=args.deterministic) 126 | cfg.seed = args.seed 127 | meta['seed'] = args.seed 128 | meta['exp_name'] = osp.basename(args.config) 129 | 130 | model = build_segmentor( 131 | cfg.model, 132 | train_cfg=cfg.get('train_cfg'), 133 | test_cfg=cfg.get('test_cfg')) 134 | 135 | logger.info(model) 136 | 137 | datasets = [build_dataset(cfg.data.train)] 138 | if len(cfg.workflow) == 2: 139 | val_dataset = copy.deepcopy(cfg.data.val) 140 | val_dataset.pipeline = cfg.data.train.pipeline 141 | datasets.append(build_dataset(val_dataset)) 142 | if cfg.checkpoint_config is not None: 143 | # save mmseg version, config file content and class names in 144 | # checkpoints as meta data 145 | cfg.checkpoint_config.meta = dict( 146 | mmseg_version=f'{__version__}+{get_git_hash()[:7]}', 147 | config=cfg.pretty_text, 148 | CLASSES=datasets[0].CLASSES, 149 | PALETTE=datasets[0].PALETTE) 150 | # add an attribute for visualization convenience 151 | model.CLASSES = datasets[0].CLASSES 152 | train_segmentor( 153 | model, 154 | datasets, 155 | cfg, 156 | distributed=distributed, 157 | validate=(not args.no_validate), 158 | timestamp=timestamp, 159 | meta=meta) 160 | 161 | 162 | if __name__ == '__main__': 163 | main() 164 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-present, Facebook, Inc. 2 | # All rights reserved. 3 | """ 4 | Misc functions, including distributed helpers. 5 | 6 | Mostly copy-paste from torchvision references. 7 | """ 8 | import io 9 | import os 10 | import time 11 | from collections import defaultdict, deque 12 | import datetime 13 | 14 | import torch 15 | import torch.distributed as dist 16 | 17 | 18 | class SmoothedValue(object): 19 | """Track a series of values and provide access to smoothed values over a 20 | window or the global series average. 
21 | """ 22 | 23 | def __init__(self, window_size=20, fmt=None): 24 | if fmt is None: 25 | fmt = "{median:.4f} ({global_avg:.4f})" 26 | self.deque = deque(maxlen=window_size) 27 | self.total = 0.0 28 | self.count = 0 29 | self.fmt = fmt 30 | 31 | def update(self, value, n=1): 32 | self.deque.append(value) 33 | self.count += n 34 | self.total += value * n 35 | 36 | def synchronize_between_processes(self): 37 | """ 38 | Warning: does not synchronize the deque! 39 | """ 40 | if not is_dist_avail_and_initialized(): 41 | return 42 | t = torch.tensor([self.count, self.total], dtype=torch.float64, device='cuda') 43 | dist.barrier() 44 | dist.all_reduce(t) 45 | t = t.tolist() 46 | self.count = int(t[0]) 47 | self.total = t[1] 48 | 49 | @property 50 | def median(self): 51 | d = torch.tensor(list(self.deque)) 52 | return d.median().item() 53 | 54 | @property 55 | def avg(self): 56 | d = torch.tensor(list(self.deque), dtype=torch.float32) 57 | return d.mean().item() 58 | 59 | @property 60 | def global_avg(self): 61 | return self.total / self.count 62 | 63 | @property 64 | def max(self): 65 | return max(self.deque) 66 | 67 | @property 68 | def value(self): 69 | return self.deque[-1] 70 | 71 | def __str__(self): 72 | return self.fmt.format( 73 | median=self.median, 74 | avg=self.avg, 75 | global_avg=self.global_avg, 76 | max=self.max, 77 | value=self.value) 78 | 79 | 80 | class MetricLogger(object): 81 | def __init__(self, delimiter="\t"): 82 | self.meters = defaultdict(SmoothedValue) 83 | self.delimiter = delimiter 84 | 85 | def update(self, **kwargs): 86 | for k, v in kwargs.items(): 87 | if isinstance(v, torch.Tensor): 88 | v = v.item() 89 | assert isinstance(v, (float, int)) 90 | self.meters[k].update(v) 91 | 92 | def __getattr__(self, attr): 93 | if attr in self.meters: 94 | return self.meters[attr] 95 | if attr in self.__dict__: 96 | return self.__dict__[attr] 97 | raise AttributeError("'{}' object has no attribute '{}'".format( 98 | type(self).__name__, attr)) 99 | 100 | def __str__(self): 101 | loss_str = [] 102 | for name, meter in self.meters.items(): 103 | loss_str.append( 104 | "{}: {}".format(name, str(meter)) 105 | ) 106 | return self.delimiter.join(loss_str) 107 | 108 | def synchronize_between_processes(self): 109 | for meter in self.meters.values(): 110 | meter.synchronize_between_processes() 111 | 112 | def add_meter(self, name, meter): 113 | self.meters[name] = meter 114 | 115 | def log_every(self, iterable, print_freq, header=None): 116 | i = 0 117 | if not header: 118 | header = '' 119 | start_time = time.time() 120 | end = time.time() 121 | iter_time = SmoothedValue(fmt='{avg:.4f}') 122 | data_time = SmoothedValue(fmt='{avg:.4f}') 123 | space_fmt = ':' + str(len(str(len(iterable)))) + 'd' 124 | log_msg = [ 125 | header, 126 | '[{0' + space_fmt + '}/{1}]', 127 | 'eta: {eta}', 128 | '{meters}', 129 | 'time: {time}', 130 | 'data: {data}' 131 | ] 132 | if torch.cuda.is_available(): 133 | log_msg.append('max mem: {memory:.0f}') 134 | log_msg = self.delimiter.join(log_msg) 135 | MB = 1024.0 * 1024.0 136 | for obj in iterable: 137 | data_time.update(time.time() - end) 138 | yield obj 139 | iter_time.update(time.time() - end) 140 | if i % print_freq == 0 or i == len(iterable) - 1: 141 | eta_seconds = iter_time.global_avg * (len(iterable) - i) 142 | eta_string = str(datetime.timedelta(seconds=int(eta_seconds))) 143 | if torch.cuda.is_available(): 144 | print(log_msg.format( 145 | i, len(iterable), eta=eta_string, 146 | meters=str(self), 147 | time=str(iter_time), data=str(data_time), 148 | 
memory=torch.cuda.max_memory_allocated() / MB)) 149 | else: 150 | print(log_msg.format( 151 | i, len(iterable), eta=eta_string, 152 | meters=str(self), 153 | time=str(iter_time), data=str(data_time))) 154 | i += 1 155 | end = time.time() 156 | total_time = time.time() - start_time 157 | total_time_str = str(datetime.timedelta(seconds=int(total_time))) 158 | print('{} Total time: {} ({:.4f} s / it)'.format( 159 | header, total_time_str, total_time / len(iterable))) 160 | 161 | 162 | def _load_checkpoint_for_ema(model_ema, checkpoint): 163 | """ 164 | Workaround for ModelEma._load_checkpoint to accept an already-loaded object 165 | """ 166 | mem_file = io.BytesIO() 167 | torch.save(checkpoint, mem_file) 168 | mem_file.seek(0) 169 | model_ema._load_checkpoint(mem_file) 170 | 171 | 172 | def setup_for_distributed(is_master): 173 | """ 174 | This function disables printing when not in master process 175 | """ 176 | import builtins as __builtin__ 177 | builtin_print = __builtin__.print 178 | 179 | def print(*args, **kwargs): 180 | force = kwargs.pop('force', False) 181 | if is_master or force: 182 | builtin_print(*args, **kwargs) 183 | 184 | __builtin__.print = print 185 | 186 | 187 | def is_dist_avail_and_initialized(): 188 | if not dist.is_available(): 189 | return False 190 | if not dist.is_initialized(): 191 | return False 192 | return True 193 | 194 | 195 | def get_world_size(): 196 | if not is_dist_avail_and_initialized(): 197 | return 1 198 | return dist.get_world_size() 199 | 200 | 201 | def get_rank(): 202 | if not is_dist_avail_and_initialized(): 203 | return 0 204 | return dist.get_rank() 205 | 206 | 207 | def is_main_process(): 208 | return get_rank() == 0 209 | 210 | 211 | def save_on_master(*args, **kwargs): 212 | if is_main_process(): 213 | torch.save(*args, **kwargs) 214 | 215 | 216 | def init_distributed_mode(args): 217 | #args.world_size = 8 * args.world_size 218 | if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ: 219 | args.rank = int(os.environ["RANK"]) 220 | args.world_size = int(os.environ['WORLD_SIZE']) 221 | args.gpu = int(os.environ['LOCAL_RANK']) 222 | elif 'SLURM_PROCID' in os.environ: 223 | args.rank = int(os.environ['SLURM_PROCID']) 224 | args.gpu = args.rank % torch.cuda.device_count() 225 | else: 226 | print('Not using distributed mode') 227 | args.distributed = False 228 | return 229 | 230 | args.distributed = True 231 | 232 | torch.cuda.set_device(args.gpu) 233 | args.dist_backend = 'nccl' 234 | print('| distributed init (rank {}): {}'.format( 235 | args.rank, args.dist_url), flush=True) 236 | torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url, 237 | world_size=args.world_size, rank=args.rank) 238 | torch.distributed.barrier() 239 | setup_for_distributed(args.rank == 0) 240 | --------------------------------------------------------------------------------
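For reference, a minimal usage sketch of the distributed helpers in `utils.py` above. It is illustrative only and not part of the repository: the `--dist-url` flag name is an assumption, while `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` are the environment variables that `init_distributed_mode` actually reads (normally exported by `torch.distributed.launch` or `torchrun`).

```python
# Hypothetical driver script (assumption, not shipped with the repo): exercises the
# helpers defined in utils.py. Run it directly for single-process mode, or via
# `torchrun --nproc_per_node=<N> this_script.py` so RANK/WORLD_SIZE/LOCAL_RANK are set.
import argparse

import torch

import utils


def run():
    parser = argparse.ArgumentParser()
    # init_distributed_mode() reads args.dist_url when a launcher provides the env vars
    parser.add_argument('--dist-url', default='env://', help='url used to set up distributed training')
    args = parser.parse_args()

    utils.init_distributed_mode(args)  # falls back to non-distributed mode if no launcher env is found
    print(f'world size: {utils.get_world_size()}, rank: {utils.get_rank()}')

    # MetricLogger/SmoothedValue aggregate scalar stats and can sync counts across processes
    logger = utils.MetricLogger(delimiter='  ')
    for step in range(3):
        logger.update(loss=1.0 / (step + 1))
    logger.synchronize_between_processes()
    print(logger)

    if utils.is_main_process():
        # save_on_master() guards torch.save so only rank 0 writes checkpoints
        utils.save_on_master({'step': torch.tensor(3)}, 'checkpoint_example.pth')


if __name__ == '__main__':
    run()
```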