├── .gitignore ├── LICENSE ├── README.md ├── all.sh ├── configs └── fsod │ ├── Base-FSOD-C4.yaml │ ├── R_50_C4_1x.yaml │ └── finetune_R_50_C4_1x.yaml ├── datasets ├── coco │ ├── 1_split_filter.py │ ├── 2_balance.py │ ├── 3_gen_support_pool.py │ ├── 4_gen_support_pool_10_shot.py │ ├── 5_voc_part.py │ ├── 6_voc_few_shot.py │ └── new_annotations │ │ └── final_split_voc_10_shot_instances_train2017.json └── generate_support_data.sh ├── fewx ├── __init__.py ├── config │ ├── __init__.py │ ├── config.py │ └── defaults.py ├── data │ ├── __init__.py │ ├── build.py │ ├── dataset_mapper.py │ └── datasets │ │ ├── __init__.py │ │ ├── builtin.py │ │ └── register_coco.py ├── evaluation │ ├── __init__.py │ └── coco_evaluation.py ├── layers │ ├── __init__.py │ ├── boundary.py │ ├── conv_with_kaiming_uniform.py │ ├── deform_conv.py │ ├── iou_loss.py │ ├── misc.py │ ├── ml_nms.py │ └── naive_group_norm.py ├── modeling │ ├── __init__.py │ └── fsod │ │ ├── __init__.py │ │ ├── fsod_fast_rcnn.py │ │ ├── fsod_rcnn.py │ │ ├── fsod_roi_heads.py │ │ └── fsod_rpn.py ├── solver │ ├── __init__.py │ └── build.py └── utils │ ├── __init__.py │ ├── comm.py │ └── measures.py ├── fsod_train_net.py └── log ├── fsod_finetune_test_log.txt ├── fsod_finetune_train_log.txt ├── fsod_train_log.txt └── metric.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # output dir 2 | output 3 | instant_test_output 4 | inference_test_output 5 | 6 | 7 | *.jpg 8 | *.png 9 | #*.txt 10 | #*.json 11 | *.diff 12 | 13 | # compilation and distribution 14 | __pycache__ 15 | _ext 16 | *.pyc 17 | *.so 18 | detectron2.egg-info/ 19 | build/ 20 | dist/ 21 | wheels/ 22 | 23 | # pytorch/python/numpy formats 24 | *.pth 25 | *.pkl 26 | *.npy 27 | 28 | # ipython/jupyter notebooks 29 | *.ipynb 30 | **/.ipynb_checkpoints/ 31 | 32 | # Editor temporaries 33 | *.swn 34 | *.swo 35 | *.swp 36 | *~ 37 | 38 | # editor settings 39 | .idea 40 | .vscode 41 | 42 | # project dirs 43 | /detectron2/model_zoo/configs 44 | #/datasets 45 | /projects/*/datasets 46 | /models 47 | #/datasets/coco/new_annotations 48 | /datasets/coco/support 49 | /datasets/coco/10_shot_support 50 | 51 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Qi Fan 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # FewX 2 | 3 | **FewX** is an open source toolbox on top of Detectron2 for data-limited instance-level recognition tasks, e.g., few-shot object detection, few-shot instance segmentation, partially supervised instance segmentation and so on. 4 | 5 | All data-limited instance-level recognition works from **Qi Fan** (HKUST, fanqics@gmail.com) are open-sourced here. 6 | 7 | To date, FewX implements the following algorithms: 8 | 9 | - [FSOD](https://arxiv.org/abs/1908.01998): few-shot object detection with [FSOD dataset](https://github.com/fanq15/Few-Shot-Object-Detection-Dataset). 10 | - [CPMask](https://arxiv.org/abs/2007.12387): partially supervised/fully supervised/few-shot instance segmentation. (**20220725 working on it**) 11 | - [FSVOD](https://arxiv.org/abs/2104.14805): few-shot video object detection with [FSVOD-500 dataset](https://drive.google.com/drive/folders/1DDQ81A8yVj7D8vLUS01657ATr2sK1zgC?usp=sharing) and [FSYTV-40 dataset](https://drive.google.com/drive/folders/1a1PpfAxeYL7AbxYViDDnx7ACFtRohVL5?usp=sharing). (**20220725 working on it**) 12 | 13 | ## Highlights 14 | - **State-of-the-art performance.** 15 | - FSOD is the best few-shot object detection model. (This model can be directly applied to novel classes without finetuning. And finetuning can bring better performance.) 16 | - CPMask is the best partially supervised/few-shot instance segmentation model. 17 | - **Easy to use.** You only need to run 3 code lines to conduct the entire experiment. 18 | - Install Pre-Built Detectron2 in one code line. 19 | - Prepare dataset in one code line. (You need to first download the dataset and change the **data path** in the script.) 20 | - Training and evaluation in one code line. 21 | 22 | ## Updates 23 | - FewX has been released. (09/08/2020) 24 | 25 | ## Results on MS COCO 26 | 27 | ### Few Shot Object Detection 28 | 29 | |Method|Training Dataset|Evaluation way&shot|box AP|download| 30 | |:--------:|:--------:|:--------:|:--------:|:--:| 31 | |FSOD (paper)|COCO (non-voc)|full-way 10-shot|11.1|-| 32 | |FSOD (this implementation)|COCO (non-voc)|full-way 10-shot|**12.0**|model \| metrics| 33 | 34 | The results are reported on the COCO voc subset with **ResNet-50** backbone. 35 | 36 | The model only trained on base classes is base model \. 37 | 38 | You can reference the [original FSOD implementation](https://github.com/fanq15/FSOD-code) on the [Few-Shot-Object-Detection-Dataset](https://github.com/fanq15/Few-Shot-Object-Detection-Dataset). 39 | 40 | ## Step 1: Installation 41 | You only need to install [detectron2](https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md). We recommend the Pre-Built Detectron2 (Linux only) version with pytorch 1.7. I use the Pre-Built Detectron2 with CUDA 10.1 and pytorch 1.7 and you can run this code to install it. 42 | 43 | ``` 44 | python -m pip install detectron2 -f \ 45 | https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html 46 | ``` 47 | 48 | ## Step 2: Prepare dataset 49 | - Prepare for coco dataset following [this instruction](https://github.com/facebookresearch/detectron2/tree/master/datasets). 50 | 51 | - `cd datasets`, change the `DATA_ROOT` in the `generate_support_data.sh` to your data path and run `sh generate_support_data.sh`. 
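Running the script symlinks your COCO `train2017`, `val2017` and `annotations` folders into `datasets/coco`, writes the non-voc training split to `datasets/coco/new_annotations/final_split_non_voc_instances_train2017.json`, and crops the support patches into `datasets/coco/support` and `datasets/coco/10_shot_support`, together with the `train_support_df.pkl` and `10_shot_support_df.pkl` index files read by the data loader during training: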
52 | 53 | ``` 54 | cd FewX/datasets 55 | sh generate_support_data.sh 56 | ``` 57 | 58 | ## Step 3: Training and Evaluation 59 | 60 | Run `sh all.sh` in the root dir. (This script uses `4 GPUs`. You can change the GPU number. If you use 2 GPUs with unchanged batch size (8), please [halve the learning rate](https://github.com/fanq15/FewX/issues/6#issuecomment-674367388).) 61 | 62 | ``` 63 | cd FewX 64 | sh all.sh 65 | ``` 66 | 67 | 68 | ## TODO 69 | - [ ] Add FSVOD and CPMask codes to this repo. 70 | - [ ] Add other dataset results to FSOD. 71 | - [ ] Add [CPMask](https://arxiv.org/abs/2007.12387) code with partially supervised instance segmentation, fully supervised instance segmentation and few-shot instance segmentation. 72 | 73 | ## Citing FewX 74 | If you use this toolbox in your research or wish to refer to the baseline results, please use the following BibTeX entries. 75 | 76 | ``` 77 | @inproceedings{fan2021fsvod, 78 | title={Few-Shot Video Object Detection}, 79 | author={Fan, Qi and Tang, Chi-Keung and Tai, Yu-Wing}, 80 | booktitle={ECCV}, 81 | year={2022} 82 | } 83 | @inproceedings{fan2020cpmask, 84 | title={Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation}, 85 | author={Fan, Qi and Ke, Lei and Pei, Wenjie and Tang, Chi-Keung and Tai, Yu-Wing}, 86 | booktitle={ECCV}, 87 | year={2020} 88 | } 89 | @inproceedings{fan2020fsod, 90 | title={Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector}, 91 | author={Fan, Qi and Zhuo, Wei and Tang, Chi-Keung and Tai, Yu-Wing}, 92 | booktitle={CVPR}, 93 | year={2020} 94 | } 95 | ``` 96 | 97 | ## Special Thanks 98 | [Detectron2](https://github.com/facebookresearch/detectron2), [AdelaiDet](https://github.com/aim-uofa/AdelaiDet), [centermask2](https://github.com/youngwanLEE/centermask2) 99 | -------------------------------------------------------------------------------- /all.sh: -------------------------------------------------------------------------------- 1 | rm support_dir/support_feature.pkl 2 | CUDA_VISIBLE_DEVICES=0,1,2,3 python3 fsod_train_net.py --num-gpus 4 \ 3 | --config-file configs/fsod/R_50_C4_1x.yaml 2>&1 | tee log/fsod_train_log.txt 4 | 5 | #CUDA_VISIBLE_DEVICES=0,1,2,3 python3 tools/train_net.py --num-gpus 4 \ 6 | # --config-file configs/fsod/R_50_C4_1x.yaml \ 7 | # --eval-only MODEL.WEIGHTS ./output/fsod/R_50_C4_1x/model_final.pth 2>&1 | tee log/fsod_test_log.txt 8 | 9 | rm support_dir/support_feature.pkl 10 | CUDA_VISIBLE_DEVICES=0,1,2,3 python3 fsod_train_net.py --num-gpus 4 \ 11 | --config-file configs/fsod/finetune_R_50_C4_1x.yaml 2>&1 | tee log/fsod_finetune_train_log.txt 12 | CUDA_VISIBLE_DEVICES=0,1,2,3 python3 fsod_train_net.py --num-gpus 4 \ 13 | --config-file configs/fsod/finetune_R_50_C4_1x.yaml \ 14 | --eval-only MODEL.WEIGHTS ./output/fsod/finetune_dir/R_50_C4_1x/model_final.pth 2>&1 | tee log/fsod_finetune_test_log.txt 15 | 16 | #CUDA_VISIBLE_DEVICES=0,1,2,3 python3 fsod_train_net.py --num-gpus 4 \ 17 | # --config-file configs/fsod/finetune_R_50_C4_1x.yaml \ 18 | # --eval-only MODEL.WEIGHTS ./output/fsod/finetune_dir/R_50_C4_1x/model_final.pth 2>&1 | tee log/fsod_finetune_test_log.txt 19 | 20 | -------------------------------------------------------------------------------- /configs/fsod/Base-FSOD-C4.yaml: -------------------------------------------------------------------------------- 1 | MODEL: 2 | META_ARCHITECTURE: "FsodRCNN" 3 | PROPOSAL_GENERATOR: 4 | NAME: "FsodRPN" 5 | RPN: 6 | PRE_NMS_TOPK_TEST: 6000 7 | POST_NMS_TOPK_TEST: 100 8 | 
ROI_HEADS: 9 | NAME: "FsodRes5ROIHeads" 10 | BATCH_SIZE_PER_IMAGE: 128 11 | POSITIVE_FRACTION: 0.5 12 | NUM_CLASSES: 1 13 | BACKBONE: 14 | FREEZE_AT: 3 15 | #PIXEL_MEAN: [102.9801, 115.9465, 122.7717] 16 | DATASETS: 17 | TRAIN: ("coco_2017_train_nonvoc",) #("coco_2017_train",) 18 | TEST: ("coco_2017_val",) 19 | DATALOADER: 20 | NUM_WORKERS: 8 21 | SOLVER: 22 | IMS_PER_BATCH: 8 #16 23 | BASE_LR: 0.004 #0.02 24 | STEPS: (112000, 120000) #(30000, 40000) #(56000,) #(60000, 80000) 25 | MAX_ITER: 120000 #300000 #45000 #60000 #90000 26 | WARMUP_ITERS: 1000 #500 27 | WARMUP_FACTOR: 0.1 28 | CHECKPOINT_PERIOD: 30000 29 | HEAD_LR_FACTOR: 2.0 30 | #WEIGHT_DECAY_BIAS: 0.0 31 | INPUT: 32 | FS: 33 | SUPPORT_WAY: 2 34 | SUPPORT_SHOT: 10 35 | MIN_SIZE_TRAIN: (440, 472, 504, 536, 568, 600) #(600,) #(640, 672, 704, 736, 768, 800) 36 | MAX_SIZE_TRAIN: 1000 37 | MIN_SIZE_TEST: 600 38 | MAX_SIZE_TEST: 1000 39 | VERSION: 2 40 | -------------------------------------------------------------------------------- /configs/fsod/R_50_C4_1x.yaml: -------------------------------------------------------------------------------- 1 | _BASE_: "Base-FSOD-C4.yaml" 2 | MODEL: 3 | WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl" 4 | MASK_ON: False 5 | RESNETS: 6 | DEPTH: 50 7 | OUTPUT_DIR: './output/fsod/R_50_C4_1x' 8 | -------------------------------------------------------------------------------- /configs/fsod/finetune_R_50_C4_1x.yaml: -------------------------------------------------------------------------------- 1 | _BASE_: "Base-FSOD-C4.yaml" 2 | MODEL: 3 | WEIGHTS: "./output/fsod/R_50_C4_1x/model_final.pth" 4 | MASK_ON: False 5 | RESNETS: 6 | DEPTH: 50 7 | BACKBONE: 8 | FREEZE_AT: 5 9 | DATASETS: 10 | TRAIN: ("coco_2017_train_voc_10_shot",) 11 | TEST: ("coco_2017_val",) 12 | SOLVER: 13 | IMS_PER_BATCH: 4 14 | BASE_LR: 0.001 15 | STEPS: (2000, 3000) 16 | MAX_ITER: 3000 17 | WARMUP_ITERS: 200 18 | INPUT: 19 | FS: 20 | FEW_SHOT: True 21 | SUPPORT_WAY: 2 22 | SUPPORT_SHOT: 9 23 | MIN_SIZE_TRAIN: (440, 472, 504, 536, 568, 600) 24 | MAX_SIZE_TRAIN: 1000 25 | MIN_SIZE_TEST: 600 26 | MAX_SIZE_TEST: 1000 27 | OUTPUT_DIR: './output/fsod/finetune_dir/R_50_C4_1x' 28 | 29 | -------------------------------------------------------------------------------- /datasets/coco/1_split_filter.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Fri Jun 5 15:27:52 2020 5 | 6 | @author: fanq15 7 | """ 8 | 9 | from pycocotools.coco import COCO 10 | import cv2 11 | import numpy as np 12 | from os.path import join, isdir 13 | from os import mkdir, makedirs 14 | from concurrent import futures 15 | import sys 16 | import time 17 | import math 18 | import matplotlib.pyplot as plt 19 | import os 20 | import pandas as pd 21 | import json 22 | import sys 23 | 24 | 25 | def filter_coco(coco, cls_split): 26 | new_anns = [] 27 | all_cls_dict = {} 28 | for img_id, id in enumerate(coco.imgs): 29 | img = coco.loadImgs(id)[0] 30 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None)) 31 | skip_flag = False 32 | img_cls_dict = {} 33 | if len(anns) == 0: 34 | continue 35 | for ann in anns: 36 | segmentation = ann['segmentation'] 37 | area = ann['area'] 38 | iscrowd = ann['iscrowd'] 39 | image_id = ann['image_id'] 40 | bbox = ann['bbox'] 41 | category_id = ann['category_id'] 42 | id = ann['id'] 43 | bbox_area = bbox[2] * bbox[3] 44 | 45 | # filter images with small boxes 46 | if category_id in cls_split: 47 | if bbox_area < 32 * 32: 48 | 
skip_flag = True 49 | 50 | if skip_flag: 51 | continue 52 | else: 53 | for ann in anns: 54 | category_id = ann['category_id'] 55 | if category_id in cls_split: 56 | new_anns.append(ann) 57 | 58 | if category_id in all_cls_dict.keys(): 59 | all_cls_dict[category_id] += 1 60 | else: 61 | all_cls_dict[category_id] = 1 62 | 63 | print(len(new_anns)) 64 | print(sorted(all_cls_dict.items(), key = lambda kv:(kv[1], kv[0]))) 65 | return new_anns 66 | 67 | 68 | root_path = sys.argv[1] 69 | print(root_path) 70 | #root_path = '/home/fanqi/data/COCO' 71 | dataDir = './annotations' 72 | support_dict = {} 73 | 74 | support_dict['support_box'] = [] 75 | support_dict['category_id'] = [] 76 | support_dict['image_id'] = [] 77 | support_dict['id'] = [] 78 | support_dict['file_path'] = [] 79 | 80 | voc_inds = (0, 1, 2, 3, 4, 5, 6, 8, 14, 15, 16, 17, 18, 19, 39, 56, 57, 58, 60, 62) 81 | 82 | 83 | for dataType in ['instances_train2017.json']: #, 'split_voc_instances_train2017.json']: 84 | annFile = join(dataDir, dataType) 85 | 86 | with open(annFile,'r') as load_f: 87 | dataset = json.load(load_f) 88 | print(dataset.keys()) 89 | save_info = dataset['info'] 90 | save_licenses = dataset['licenses'] 91 | save_images = dataset['images'] 92 | save_categories = dataset['categories'] 93 | save_annotations = dataset['annotations'] 94 | 95 | 96 | inds_split2 = [i for i in range(len(save_categories)) if i not in voc_inds] 97 | 98 | # split annotations according to categories 99 | categories_split1 = [save_categories[i] for i in voc_inds] 100 | categories_split2 = [save_categories[i] for i in inds_split2] 101 | cids_split1 = [c['id'] for c in categories_split1] 102 | cids_split2 = [c['id'] for c in categories_split2] 103 | print('Split 1: {} classes'.format(len(categories_split1))) 104 | for c in categories_split1: 105 | print('\t', c['name']) 106 | print('Split 2: {} classes'.format(len(categories_split2))) 107 | for c in categories_split2: 108 | print('\t', c['name']) 109 | 110 | coco = COCO(annFile) 111 | 112 | # for non-voc, there can be non_voc images 113 | annotations_split2 = filter_coco(coco, cids_split2) 114 | 115 | # for voc, there can be non_voc images 116 | annotations = dataset['annotations'] 117 | annotations_split1 = [] 118 | 119 | for ann in annotations: 120 | if ann['category_id'] in cids_split1: # voc 20 121 | annotations_split1.append(ann) 122 | 123 | dataset_split1 = { 124 | 'info': save_info, 125 | 'licenses': save_licenses, 126 | 'images': save_images, 127 | 'annotations': annotations_split1, 128 | 'categories': save_categories} 129 | dataset_split2 = { 130 | 'info': save_info, 131 | 'licenses': save_licenses, 132 | 'images': save_images, 133 | 'annotations': annotations_split2, 134 | 'categories': save_categories} 135 | new_annotations_path = os.path.join(root_path, 'new_annotations') 136 | if not os.path.exists(new_annotations_path): 137 | os.makedirs(new_annotations_path) 138 | split2_file = os.path.join(root_path, 'new_annotations/final_split_non_voc_instances_train2017.json') 139 | 140 | with open(split2_file, 'w') as f: 141 | json.dump(dataset_split2, f) 142 | -------------------------------------------------------------------------------- /datasets/coco/2_balance.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Fri Jun 5 16:04:06 2020 5 | 6 | @author: fanq15 7 | """ 8 | 9 | from pycocotools.coco import COCO 10 | import cv2 11 | import numpy as np 12 | from os.path import join, 
isdir 13 | from os import mkdir, makedirs 14 | from concurrent import futures 15 | import sys 16 | import time 17 | import math 18 | import matplotlib.pyplot as plt 19 | import os 20 | import pandas as pd 21 | import json 22 | import sys 23 | 24 | def balance_coco(coco): 25 | all_cls_dict = {} 26 | for img_id, id in enumerate(coco.imgs): 27 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None)) 28 | img_cls_dict = {} 29 | if len(anns) == 0: 30 | continue 31 | for ann in anns: 32 | category_id = ann['category_id'] 33 | id = ann['id'] 34 | 35 | if category_id in img_cls_dict.keys(): 36 | img_cls_dict[category_id] += 1 37 | else: 38 | img_cls_dict[category_id] = 1 39 | 40 | for category_id, num in img_cls_dict.items(): 41 | if category_id in all_cls_dict.keys(): 42 | all_cls_dict[category_id] += 1 # count the image number containing the target category 43 | else: 44 | all_cls_dict[category_id] = 1 45 | print('Image number of non-voc classes before class balancing: ', sorted(all_cls_dict.items(), key = lambda kv:(kv[1], kv[0]))) 46 | 47 | new_anns = [] 48 | for img_id, id in enumerate(coco.imgs): 49 | save_flag = False 50 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None)) 51 | if len(anns) == 0: 52 | continue 53 | # in this image, if there is at least one category with less than 2000 images, save this image. 54 | # otherwise, remove this image. 55 | for ann in anns: 56 | category_id = ann['category_id'] 57 | 58 | if all_cls_dict[category_id] <= 2000: 59 | save_flag = True 60 | 61 | if save_flag: 62 | for ann in anns: 63 | new_anns.append(ann) 64 | else: 65 | for ann in anns: 66 | category_id = ann['category_id'] 67 | all_cls_dict[category_id] -= 1 68 | print('Instance number of non-voc classes after class balancing: ', len(new_anns)) 69 | print('Image number of non-voc classes after class balancing: ', sorted(all_cls_dict.items(), key = lambda kv:(kv[1], kv[0]))) 70 | 71 | return new_anns 72 | 73 | root_path = sys.argv[1] 74 | #root_path = '/home/fanqi/data/COCO' 75 | dataDir = os.path.join(root_path, 'new_annotations') 76 | support_dict = {} 77 | 78 | support_dict['support_box'] = [] 79 | support_dict['category_id'] = [] 80 | support_dict['image_id'] = [] 81 | support_dict['id'] = [] 82 | support_dict['file_path'] = [] 83 | 84 | 85 | for dataType in ['split_non_voc_instances_train2017.json']: 86 | annFile = join(dataDir, dataType) 87 | 88 | with open(annFile,'r') as load_f: 89 | dataset = json.load(load_f) 90 | print(dataset.keys()) 91 | save_info = dataset['info'] 92 | save_licenses = dataset['licenses'] 93 | save_images = dataset['images'] 94 | save_categories = dataset['categories'] 95 | 96 | print(annFile) 97 | shot_num = 10 98 | coco = COCO(annFile) 99 | print(coco) 100 | 101 | annotations = balance_coco(coco) 102 | dataset_split = { 103 | 'info': save_info, 104 | 'licenses': save_licenses, 105 | 'images': save_images, 106 | 'annotations': annotations, 107 | 'categories': save_categories} 108 | split_file = os.path.join(root_path, 'new_annotations/final_split_non_voc_instances_train2017.json') 109 | 110 | with open(split_file, 'w') as f: 111 | json.dump(dataset_split, f) 112 | -------------------------------------------------------------------------------- /datasets/coco/3_gen_support_pool.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Thu Jun 4 15:30:24 2020 5 | 6 | @author: fanq15 7 | """ 8 | 9 | from pycocotools.coco import COCO 10 | import cv2 11 | 
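# This script builds the support-image pool used for base training: every ground-truth
# box in the non-voc split is cropped as a square patch with 16 px of context, resized
# to 320x320, saved under <root>/support/train2017/<image_name>/, and indexed in
# train_support_df.pkl (support_box, category_id, image_id, id, file_path).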
import numpy as np 12 | from os.path import join, isdir 13 | from os import mkdir, makedirs 14 | from concurrent import futures 15 | import sys 16 | import time 17 | import math 18 | import matplotlib.pyplot as plt 19 | import os 20 | import pandas as pd 21 | import json 22 | import shutil 23 | import sys 24 | 25 | def vis_image(im, bboxs, im_name): 26 | dpi = 300 27 | fig, ax = plt.subplots() 28 | ax.imshow(im, aspect='equal') 29 | plt.axis('off') 30 | height, width, channels = im.shape 31 | fig.set_size_inches(width/100.0/3.0, height/100.0/3.0) 32 | plt.gca().xaxis.set_major_locator(plt.NullLocator()) 33 | plt.gca().yaxis.set_major_locator(plt.NullLocator()) 34 | plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0) 35 | plt.margins(0,0) 36 | # Show box (off by default, box_alpha=0.0) 37 | for bbox in bboxs: 38 | ax.add_patch( 39 | plt.Rectangle((bbox[0], bbox[1]), 40 | bbox[2] - bbox[0], 41 | bbox[3] - bbox[1], 42 | fill=False, edgecolor='r', 43 | linewidth=0.5, alpha=1)) 44 | output_name = os.path.basename(im_name) 45 | plt.savefig(im_name, dpi=dpi, bbox_inches='tight', pad_inches=0) 46 | plt.close('all') 47 | 48 | 49 | def crop_support(img, bbox): 50 | image_shape = img.shape[:2] # h, w 51 | data_height, data_width = image_shape 52 | 53 | img = img.transpose(2, 0, 1) 54 | 55 | x1 = int(bbox[0]) 56 | y1 = int(bbox[1]) 57 | x2 = int(bbox[2]) 58 | y2 = int(bbox[3]) 59 | 60 | width = x2 - x1 61 | height = y2 - y1 62 | context_pixel = 16 #int(16 * im_scale) 63 | 64 | new_x1 = 0 65 | new_y1 = 0 66 | new_x2 = width 67 | new_y2 = height 68 | target_size = (320, 320) #(384, 384) 69 | 70 | if width >= height: 71 | crop_x1 = x1 - context_pixel 72 | crop_x2 = x2 + context_pixel 73 | 74 | # New_x1 and new_x2 will change when crop context or overflow 75 | new_x1 = new_x1 + context_pixel 76 | new_x2 = new_x1 + width 77 | if crop_x1 < 0: 78 | new_x1 = new_x1 + crop_x1 79 | new_x2 = new_x1 + width 80 | crop_x1 = 0 81 | if crop_x2 > data_width: 82 | crop_x2 = data_width 83 | 84 | short_size = height 85 | long_size = crop_x2 - crop_x1 86 | y_center = int((y2+y1) / 2) #math.ceil((y2 + y1) / 2) 87 | crop_y1 = int(y_center - (long_size / 2)) #int(y_center - math.ceil(long_size / 2)) 88 | crop_y2 = int(y_center + (long_size / 2)) #int(y_center + math.floor(long_size / 2)) 89 | 90 | # New_y1 and new_y2 will change when crop context or overflow 91 | new_y1 = new_y1 + math.ceil((long_size - short_size) / 2) 92 | new_y2 = new_y1 + height 93 | if crop_y1 < 0: 94 | new_y1 = new_y1 + crop_y1 95 | new_y2 = new_y1 + height 96 | crop_y1 = 0 97 | if crop_y2 > data_height: 98 | crop_y2 = data_height 99 | 100 | crop_short_size = crop_y2 - crop_y1 101 | crop_long_size = crop_x2 - crop_x1 102 | square = np.zeros((3, crop_long_size, crop_long_size), dtype = np.uint8) 103 | delta = int((crop_long_size - crop_short_size) / 2) #int(math.ceil((crop_long_size - crop_short_size) / 2)) 104 | square_y1 = delta 105 | square_y2 = delta + crop_short_size 106 | 107 | new_y1 = new_y1 + delta 108 | new_y2 = new_y2 + delta 109 | 110 | crop_box = img[:, crop_y1:crop_y2, crop_x1:crop_x2] 111 | square[:, square_y1:square_y2, :] = crop_box 112 | 113 | #show_square = np.zeros((crop_long_size, crop_long_size, 3))#, dtype=np.int16) 114 | #show_crop_box = original_img[crop_y1:crop_y2, crop_x1:crop_x2, :] 115 | #show_square[square_y1:square_y2, :, :] = show_crop_box 116 | #show_square = show_square.astype(np.int16) 117 | else: 118 | crop_y1 = y1 - context_pixel 119 | crop_y2 = y2 + context_pixel 120 | 121 | # New_y1 and new_y2 will 
change when crop context or overflow 122 | new_y1 = new_y1 + context_pixel 123 | new_y2 = new_y1 + height 124 | if crop_y1 < 0: 125 | new_y1 = new_y1 + crop_y1 126 | new_y2 = new_y1 + height 127 | crop_y1 = 0 128 | if crop_y2 > data_height: 129 | crop_y2 = data_height 130 | 131 | short_size = width 132 | long_size = crop_y2 - crop_y1 133 | x_center = int((x2 + x1) / 2) #math.ceil((x2 + x1) / 2) 134 | crop_x1 = int(x_center - (long_size / 2)) #int(x_center - math.ceil(long_size / 2)) 135 | crop_x2 = int(x_center + (long_size / 2)) #int(x_center + math.floor(long_size / 2)) 136 | 137 | # New_x1 and new_x2 will change when crop context or overflow 138 | new_x1 = new_x1 + math.ceil((long_size - short_size) / 2) 139 | new_x2 = new_x1 + width 140 | if crop_x1 < 0: 141 | new_x1 = new_x1 + crop_x1 142 | new_x2 = new_x1 + width 143 | crop_x1 = 0 144 | if crop_x2 > data_width: 145 | crop_x2 = data_width 146 | 147 | crop_short_size = crop_x2 - crop_x1 148 | crop_long_size = crop_y2 - crop_y1 149 | square = np.zeros((3, crop_long_size, crop_long_size), dtype = np.uint8) 150 | delta = int((crop_long_size - crop_short_size) / 2) #int(math.ceil((crop_long_size - crop_short_size) / 2)) 151 | square_x1 = delta 152 | square_x2 = delta + crop_short_size 153 | 154 | new_x1 = new_x1 + delta 155 | new_x2 = new_x2 + delta 156 | crop_box = img[:, crop_y1:crop_y2, crop_x1:crop_x2] 157 | square[:, :, square_x1:square_x2] = crop_box 158 | 159 | #show_square = np.zeros((crop_long_size, crop_long_size, 3)) #, dtype=np.int16) 160 | #show_crop_box = original_img[crop_y1:crop_y2, crop_x1:crop_x2, :] 161 | #show_square[:, square_x1:square_x2, :] = show_crop_box 162 | #show_square = show_square.astype(np.int16) 163 | #print(crop_y2 - crop_y1, crop_x2 - crop_x1, bbox, data_height, data_width) 164 | 165 | square = square.astype(np.float32, copy=False) 166 | square_scale = float(target_size[0]) / long_size 167 | square = square.transpose(1,2,0) 168 | square = cv2.resize(square, target_size, interpolation=cv2.INTER_LINEAR) # None, None, fx=square_scale, fy=square_scale, interpolation=cv2.INTER_LINEAR) 169 | #square = square.transpose(2,0,1) 170 | square = square.astype(np.uint8) 171 | 172 | new_x1 = int(new_x1 * square_scale) 173 | new_y1 = int(new_y1 * square_scale) 174 | new_x2 = int(new_x2 * square_scale) 175 | new_y2 = int(new_y2 * square_scale) 176 | 177 | # For test 178 | #show_square = cv2.resize(show_square, target_size, interpolation=cv2.INTER_LINEAR) # None, None, fx=square_scale, fy=square_scale, interpolation=cv2.INTER_LINEAR) 179 | #self.vis_image(show_square, [new_x1, new_y1, new_x2, new_y2], img_path.split('/')[-1][:-4]+'_crop.jpg', './test') 180 | 181 | support_data = square 182 | support_box = np.array([new_x1, new_y1, new_x2, new_y2]).astype(np.float32) 183 | return support_data, support_box 184 | 185 | 186 | def main(): 187 | dataDir = '.' 
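    # dataDir '.' assumes the script is launched from datasets/coco, where
    # generate_support_data.sh has symlinked train2017/ and annotations/.
    # crop_support() above can also be exercised on its own, e.g. (illustrative
    # image path and box, not part of the actual pipeline):
    #   im = cv2.imread('train2017/000000000009.jpg')
    #   patch, box = crop_support(im, [10., 20., 110., 220.])  # box is [x1, y1, x2, y2]
    #   assert patch.shape == (320, 320, 3) and patch.dtype == np.uint8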
188 | #root_path = '/home/fanqi/data/COCO' 189 | root_path = sys.argv[1] 190 | support_path = os.path.join(root_path, 'support') 191 | if not isdir(support_path): 192 | mkdir(support_path) 193 | #else: 194 | # shutil.rmtree(support_path) 195 | 196 | support_dict = {} 197 | 198 | support_dict['support_box'] = [] 199 | support_dict['category_id'] = [] 200 | support_dict['image_id'] = [] 201 | support_dict['id'] = [] 202 | support_dict['file_path'] = [] 203 | 204 | for dataType in ['train2017']: #, 'train2017']: 205 | set_crop_base_path = join(support_path, dataType) 206 | set_img_base_path = join(dataDir, dataType) 207 | 208 | annFile = os.path.join(root_path, 'new_annotations/final_split_non_voc_instances_train2017.json') 209 | with open(annFile,'r') as load_f: 210 | dataset = json.load(load_f) 211 | print(dataset.keys()) 212 | save_info = dataset['info'] 213 | save_licenses = dataset['licenses'] 214 | save_images = dataset['images'] 215 | save_categories = dataset['categories'] 216 | 217 | coco = COCO(annFile) 218 | 219 | for img_id, id in enumerate(coco.imgs): 220 | if img_id % 100 == 0: 221 | print(img_id) 222 | img = coco.loadImgs(id)[0] 223 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None)) 224 | 225 | if len(anns) == 0: 226 | continue 227 | 228 | frame_crop_base_path = join(set_crop_base_path, img['file_name'].split('/')[-1].split('.')[0]) 229 | if not isdir(frame_crop_base_path): makedirs(frame_crop_base_path) 230 | im = cv2.imread('{}/{}'.format(set_img_base_path, img['file_name'])) 231 | #print('{}/{}'.format(set_img_base_path, img['file_name'])) 232 | for item_id, ann in enumerate(anns): 233 | rect = ann['bbox'] 234 | bbox = [rect[0], rect[1], rect[0] + rect[2], rect[1] + rect[3]] 235 | support_img, support_box = crop_support(im, bbox) 236 | #im_name = img['file_name'].split('.')[0] + '_' + str(item_id) + '.jpg' 237 | #output_dir = './fig' 238 | #vis_image(support_img[:, :, ::-1], support_box, join(frame_crop_base_path, '{:04d}.jpg'.format(item_id))) 239 | if rect[2] <= 0 or rect[3] <=0: 240 | print(rect) 241 | continue 242 | file_path = join(frame_crop_base_path, '{:04d}.jpg'.format(item_id)) 243 | cv2.imwrite(file_path, support_img) 244 | #print(file_path) 245 | support_dict['support_box'].append(support_box.tolist()) 246 | support_dict['category_id'].append(ann['category_id']) 247 | support_dict['image_id'].append(ann['image_id']) 248 | support_dict['id'].append(ann['id']) 249 | support_dict['file_path'].append(file_path) 250 | 251 | support_df = pd.DataFrame.from_dict(support_dict) 252 | 253 | return support_df 254 | 255 | 256 | if __name__ == '__main__': 257 | since = time.time() 258 | support_df = main() 259 | support_df.to_pickle("./train_support_df.pkl") 260 | 261 | time_elapsed = time.time() - since 262 | print('Total complete in {:.0f}m {:.0f}s'.format( 263 | time_elapsed // 60, time_elapsed % 60)) 264 | 265 | -------------------------------------------------------------------------------- /datasets/coco/4_gen_support_pool_10_shot.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Thu Jun 4 15:30:24 2020 5 | 6 | @author: fanq15 7 | """ 8 | 9 | from pycocotools.coco import COCO 10 | import cv2 11 | import numpy as np 12 | from os.path import join, isdir 13 | from os import mkdir, makedirs 14 | from concurrent import futures 15 | import sys 16 | import time 17 | import math 18 | import matplotlib.pyplot as plt 19 | import os 20 | import pandas as pd 
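# Same cropping logic as 3_gen_support_pool.py, but applied to the 10-shot split in
# new_annotations/final_split_voc_10_shot_instances_train2017.json: the patches go to
# <root>/10_shot_support/ and the index to 10_shot_support_df.pkl, which the data
# loader uses when INPUT.FS.FEW_SHOT is True (the finetuning stage).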
21 | import json 22 | import shutil 23 | import sys 24 | 25 | def vis_image(im, bboxs, im_name): 26 | dpi = 300 27 | fig, ax = plt.subplots() 28 | ax.imshow(im, aspect='equal') 29 | plt.axis('off') 30 | height, width, channels = im.shape 31 | fig.set_size_inches(width/100.0/3.0, height/100.0/3.0) 32 | plt.gca().xaxis.set_major_locator(plt.NullLocator()) 33 | plt.gca().yaxis.set_major_locator(plt.NullLocator()) 34 | plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0) 35 | plt.margins(0,0) 36 | # Show box (off by default, box_alpha=0.0) 37 | for bbox in bboxs: 38 | ax.add_patch( 39 | plt.Rectangle((bbox[0], bbox[1]), 40 | bbox[2] - bbox[0], 41 | bbox[3] - bbox[1], 42 | fill=False, edgecolor='r', 43 | linewidth=0.5, alpha=1)) 44 | output_name = os.path.basename(im_name) 45 | plt.savefig(im_name, dpi=dpi, bbox_inches='tight', pad_inches=0) 46 | plt.close('all') 47 | 48 | 49 | def crop_support(img, bbox): 50 | image_shape = img.shape[:2] # h, w 51 | data_height, data_width = image_shape 52 | 53 | img = img.transpose(2, 0, 1) 54 | 55 | x1 = int(bbox[0]) 56 | y1 = int(bbox[1]) 57 | x2 = int(bbox[2]) 58 | y2 = int(bbox[3]) 59 | 60 | width = x2 - x1 61 | height = y2 - y1 62 | context_pixel = 16 #int(16 * im_scale) 63 | 64 | new_x1 = 0 65 | new_y1 = 0 66 | new_x2 = width 67 | new_y2 = height 68 | target_size = (320, 320) #(384, 384) 69 | 70 | if width >= height: 71 | crop_x1 = x1 - context_pixel 72 | crop_x2 = x2 + context_pixel 73 | 74 | # New_x1 and new_x2 will change when crop context or overflow 75 | new_x1 = new_x1 + context_pixel 76 | new_x2 = new_x1 + width 77 | if crop_x1 < 0: 78 | new_x1 = new_x1 + crop_x1 79 | new_x2 = new_x1 + width 80 | crop_x1 = 0 81 | if crop_x2 > data_width: 82 | crop_x2 = data_width 83 | 84 | short_size = height 85 | long_size = crop_x2 - crop_x1 86 | y_center = int((y2+y1) / 2) #math.ceil((y2 + y1) / 2) 87 | crop_y1 = int(y_center - (long_size / 2)) #int(y_center - math.ceil(long_size / 2)) 88 | crop_y2 = int(y_center + (long_size / 2)) #int(y_center + math.floor(long_size / 2)) 89 | 90 | # New_y1 and new_y2 will change when crop context or overflow 91 | new_y1 = new_y1 + math.ceil((long_size - short_size) / 2) 92 | new_y2 = new_y1 + height 93 | if crop_y1 < 0: 94 | new_y1 = new_y1 + crop_y1 95 | new_y2 = new_y1 + height 96 | crop_y1 = 0 97 | if crop_y2 > data_height: 98 | crop_y2 = data_height 99 | 100 | crop_short_size = crop_y2 - crop_y1 101 | crop_long_size = crop_x2 - crop_x1 102 | square = np.zeros((3, crop_long_size, crop_long_size), dtype = np.uint8) 103 | delta = int((crop_long_size - crop_short_size) / 2) #int(math.ceil((crop_long_size - crop_short_size) / 2)) 104 | square_y1 = delta 105 | square_y2 = delta + crop_short_size 106 | 107 | new_y1 = new_y1 + delta 108 | new_y2 = new_y2 + delta 109 | 110 | crop_box = img[:, crop_y1:crop_y2, crop_x1:crop_x2] 111 | square[:, square_y1:square_y2, :] = crop_box 112 | 113 | #show_square = np.zeros((crop_long_size, crop_long_size, 3))#, dtype=np.int16) 114 | #show_crop_box = original_img[crop_y1:crop_y2, crop_x1:crop_x2, :] 115 | #show_square[square_y1:square_y2, :, :] = show_crop_box 116 | #show_square = show_square.astype(np.int16) 117 | else: 118 | crop_y1 = y1 - context_pixel 119 | crop_y2 = y2 + context_pixel 120 | 121 | # New_y1 and new_y2 will change when crop context or overflow 122 | new_y1 = new_y1 + context_pixel 123 | new_y2 = new_y1 + height 124 | if crop_y1 < 0: 125 | new_y1 = new_y1 + crop_y1 126 | new_y2 = new_y1 + height 127 | crop_y1 = 0 128 | if crop_y2 > data_height: 129 | crop_y2 
= data_height 130 | 131 | short_size = width 132 | long_size = crop_y2 - crop_y1 133 | x_center = int((x2 + x1) / 2) #math.ceil((x2 + x1) / 2) 134 | crop_x1 = int(x_center - (long_size / 2)) #int(x_center - math.ceil(long_size / 2)) 135 | crop_x2 = int(x_center + (long_size / 2)) #int(x_center + math.floor(long_size / 2)) 136 | 137 | # New_x1 and new_x2 will change when crop context or overflow 138 | new_x1 = new_x1 + math.ceil((long_size - short_size) / 2) 139 | new_x2 = new_x1 + width 140 | if crop_x1 < 0: 141 | new_x1 = new_x1 + crop_x1 142 | new_x2 = new_x1 + width 143 | crop_x1 = 0 144 | if crop_x2 > data_width: 145 | crop_x2 = data_width 146 | 147 | crop_short_size = crop_x2 - crop_x1 148 | crop_long_size = crop_y2 - crop_y1 149 | square = np.zeros((3, crop_long_size, crop_long_size), dtype = np.uint8) 150 | delta = int((crop_long_size - crop_short_size) / 2) #int(math.ceil((crop_long_size - crop_short_size) / 2)) 151 | square_x1 = delta 152 | square_x2 = delta + crop_short_size 153 | 154 | new_x1 = new_x1 + delta 155 | new_x2 = new_x2 + delta 156 | crop_box = img[:, crop_y1:crop_y2, crop_x1:crop_x2] 157 | square[:, :, square_x1:square_x2] = crop_box 158 | 159 | #show_square = np.zeros((crop_long_size, crop_long_size, 3)) #, dtype=np.int16) 160 | #show_crop_box = original_img[crop_y1:crop_y2, crop_x1:crop_x2, :] 161 | #show_square[:, square_x1:square_x2, :] = show_crop_box 162 | #show_square = show_square.astype(np.int16) 163 | #print(crop_y2 - crop_y1, crop_x2 - crop_x1, bbox, data_height, data_width) 164 | 165 | square = square.astype(np.float32, copy=False) 166 | square_scale = float(target_size[0]) / long_size 167 | square = square.transpose(1,2,0) 168 | square = cv2.resize(square, target_size, interpolation=cv2.INTER_LINEAR) # None, None, fx=square_scale, fy=square_scale, interpolation=cv2.INTER_LINEAR) 169 | #square = square.transpose(2,0,1) 170 | square = square.astype(np.uint8) 171 | 172 | new_x1 = int(new_x1 * square_scale) 173 | new_y1 = int(new_y1 * square_scale) 174 | new_x2 = int(new_x2 * square_scale) 175 | new_y2 = int(new_y2 * square_scale) 176 | 177 | # For test 178 | #show_square = cv2.resize(show_square, target_size, interpolation=cv2.INTER_LINEAR) # None, None, fx=square_scale, fy=square_scale, interpolation=cv2.INTER_LINEAR) 179 | #self.vis_image(show_square, [new_x1, new_y1, new_x2, new_y2], img_path.split('/')[-1][:-4]+'_crop.jpg', './test') 180 | 181 | support_data = square 182 | support_box = np.array([new_x1, new_y1, new_x2, new_y2]).astype(np.float32) 183 | return support_data, support_box 184 | 185 | 186 | def main(): 187 | dataDir = '.' 
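    # Quick sanity check once the pool has been generated (a sketch, run separately):
    #   df = pd.read_pickle('./10_shot_support_df.pkl')
    #   print(df.groupby('category_id').size())  # roughly 10 support instances per class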
188 | 189 | #root_path = '/home/fanqi/data/COCO' 190 | root_path = sys.argv[1] 191 | support_path = os.path.join(root_path, '10_shot_support') 192 | #support_path = '10_shot_support' 193 | if not isdir(support_path): 194 | mkdir(support_path) 195 | #else: 196 | # shutil.rmtree(support_path) 197 | 198 | support_dict = {} 199 | 200 | support_dict['support_box'] = [] 201 | support_dict['category_id'] = [] 202 | support_dict['image_id'] = [] 203 | support_dict['id'] = [] 204 | support_dict['file_path'] = [] 205 | 206 | for dataType in ['train2017']: #, 'train2017']: 207 | set_crop_base_path = join(support_path, dataType) 208 | set_img_base_path = join(dataDir, dataType) 209 | 210 | # other information 211 | #annFile = '{}/annotations/instances_{}.json'.format(dataDir,dataType) 212 | #annFile = './new_annotations/final_split_voc_10_shot_instances_train2017.json' 213 | 214 | annFile = os.path.join(root_path, 'new_annotations/final_split_voc_10_shot_instances_train2017.json') 215 | 216 | with open(annFile,'r') as load_f: 217 | dataset = json.load(load_f) 218 | print(dataset.keys()) 219 | save_info = dataset['info'] 220 | save_licenses = dataset['licenses'] 221 | save_images = dataset['images'] 222 | save_categories = dataset['categories'] 223 | 224 | coco = COCO(annFile) 225 | 226 | for img_id, id in enumerate(coco.imgs): 227 | if img_id % 100 == 0: 228 | print(img_id) 229 | img = coco.loadImgs(id)[0] 230 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None)) 231 | 232 | if len(anns) == 0: 233 | continue 234 | 235 | #print(img['file_name']) 236 | frame_crop_base_path = join(set_crop_base_path, img['file_name'].split('/')[-1].split('.')[0]) 237 | if not isdir(frame_crop_base_path): makedirs(frame_crop_base_path) 238 | im = cv2.imread('{}/{}'.format(set_img_base_path, img['file_name'])) 239 | for item_id, ann in enumerate(anns): 240 | #print(ann) 241 | rect = ann['bbox'] 242 | bbox = [rect[0], rect[1], rect[0] + rect[2], rect[1] + rect[3]] 243 | support_img, support_box = crop_support(im, bbox) 244 | #im_name = img['file_name'].split('.')[0] + '_' + str(item_id) + '.jpg' 245 | #output_dir = './fig' 246 | #vis_image(support_img[:, :, ::-1], support_box, join(frame_crop_base_path, '{:04d}.jpg'.format(item_id))) 247 | if rect[2] <= 0 or rect[3] <=0: 248 | print(rect) 249 | continue 250 | file_path = join(frame_crop_base_path, '{:04d}.jpg'.format(item_id)) 251 | cv2.imwrite(file_path, support_img) 252 | #print(file_path) 253 | support_dict['support_box'].append(support_box.tolist()) 254 | support_dict['category_id'].append(ann['category_id']) 255 | support_dict['image_id'].append(ann['image_id']) 256 | support_dict['id'].append(ann['id']) 257 | support_dict['file_path'].append(file_path) 258 | 259 | support_df = pd.DataFrame.from_dict(support_dict) 260 | 261 | return support_df 262 | 263 | 264 | if __name__ == '__main__': 265 | since = time.time() 266 | support_df = main() 267 | support_df.to_pickle("./10_shot_support_df.pkl") 268 | 269 | time_elapsed = time.time() - since 270 | print('Total complete in {:.0f}m {:.0f}s'.format( 271 | time_elapsed // 60, time_elapsed % 60)) 272 | 273 | -------------------------------------------------------------------------------- /datasets/coco/5_voc_part.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Fri Jun 5 15:27:52 2020 5 | 6 | @author: fanq15 7 | """ 8 | 9 | from pycocotools.coco import COCO 10 | import cv2 11 | import numpy as np 12 | 
from os.path import join, isdir 13 | from os import mkdir, makedirs 14 | from concurrent import futures 15 | import sys 16 | import time 17 | import math 18 | import matplotlib.pyplot as plt 19 | import os 20 | import pandas as pd 21 | import json 22 | import sys 23 | 24 | root_path = './' 25 | print(root_path) 26 | #root_path = '/home/fanqi/data/COCO' 27 | dataDir = './annotations' 28 | support_dict = {} 29 | 30 | support_dict['support_box'] = [] 31 | support_dict['category_id'] = [] 32 | support_dict['image_id'] = [] 33 | support_dict['id'] = [] 34 | support_dict['file_path'] = [] 35 | 36 | voc_inds = (0, 1, 2, 3, 4, 5, 6, 8, 14, 15, 16, 17, 18, 19, 39, 56, 57, 58, 60, 62) 37 | 38 | 39 | for dataType in ['instances_train2017.json']: #, 'split_voc_instances_train2017.json']: 40 | annFile = join(dataDir, dataType) 41 | 42 | with open(annFile,'r') as load_f: 43 | dataset = json.load(load_f) 44 | print(dataset.keys()) 45 | save_info = dataset['info'] 46 | save_licenses = dataset['licenses'] 47 | save_images = dataset['images'] 48 | save_categories = dataset['categories'] 49 | save_annotations = dataset['annotations'] 50 | 51 | 52 | inds_split2 = [i for i in range(len(save_categories)) if i not in voc_inds] 53 | 54 | # split annotations according to categories 55 | categories_split1 = [save_categories[i] for i in voc_inds] 56 | categories_split2 = [save_categories[i] for i in inds_split2] 57 | cids_split1 = [c['id'] for c in categories_split1] 58 | cids_split2 = [c['id'] for c in categories_split2] 59 | print('Split 1: {} classes'.format(len(categories_split1))) 60 | for c in categories_split1: 61 | print('\t', c['name']) 62 | print('Split 2: {} classes'.format(len(categories_split2))) 63 | for c in categories_split2: 64 | print('\t', c['name']) 65 | 66 | coco = COCO(annFile) 67 | 68 | # for voc, there can be non_voc images 69 | annotations = dataset['annotations'] 70 | annotations_split1 = [] 71 | 72 | for ann in annotations: 73 | if ann['category_id'] in cids_split1: # voc 20 74 | annotations_split1.append(ann) 75 | 76 | dataset_split1 = { 77 | 'info': save_info, 78 | 'licenses': save_licenses, 79 | 'images': save_images, 80 | 'annotations': annotations_split1, 81 | 'categories': save_categories} 82 | 83 | new_annotations_path = os.path.join(root_path, 'new_annotations') 84 | if not os.path.exists(new_annotations_path): 85 | os.makedirs(new_annotations_path) 86 | split1_file = os.path.join(root_path, 'new_annotations/split_voc_instances_train2017.json') 87 | 88 | with open(split1_file, 'w') as f: 89 | json.dump(dataset_split1, f) 90 | 91 | 92 | 93 | -------------------------------------------------------------------------------- /datasets/coco/6_voc_few_shot.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Created on Fri Jun 5 16:04:06 2020 5 | 6 | @author: fanq15 7 | """ 8 | 9 | from pycocotools.coco import COCO 10 | import cv2 11 | import numpy as np 12 | from os.path import join, isdir 13 | from os import mkdir, makedirs 14 | from concurrent import futures 15 | import sys 16 | import time 17 | import math 18 | import matplotlib.pyplot as plt 19 | import os 20 | import pandas as pd 21 | import json 22 | import sys 23 | 24 | def few_shot(coco, shot_num): 25 | new_anns = [] 26 | all_cls_dict = {} 27 | for img_id, id in enumerate(coco.imgs): 28 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None)) 29 | skip_flag = False 30 | img_cls_dict = {} 31 | if len(anns) != 1: 32 | continue 
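        # Selection rule applied below: only single-annotation images are kept,
        # boxes with area outside [64*64, 224*224] are rejected, and a class stops
        # accepting instances once it has collected shot_num (10) of them.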
33 | for ann in anns: 34 | area = ann['area'] 35 | category_id = ann['category_id'] 36 | id = ann['id'] 37 | 38 | if category_id in img_cls_dict.keys(): 39 | img_cls_dict[category_id] += 1 40 | else: 41 | img_cls_dict[category_id] = 1 42 | 43 | # filter images with small boxes 44 | if area < 64 * 64 or area > 224 * 224: 45 | skip_flag = True 46 | 47 | if category_id in all_cls_dict.keys(): 48 | if all_cls_dict[category_id] == shot_num: 49 | skip_flag = True 50 | 51 | if skip_flag: 52 | continue 53 | else: 54 | for ann in anns: 55 | new_anns.append(ann) 56 | for category_id, num in img_cls_dict.items(): 57 | if category_id in all_cls_dict.keys(): 58 | all_cls_dict[category_id] += num 59 | else: 60 | all_cls_dict[category_id] = num 61 | print(len(new_anns)) 62 | print(sorted(all_cls_dict.items(), key = lambda kv:(kv[1], kv[0]))) 63 | return new_anns 64 | 65 | 66 | root_path = './' 67 | #root_path = '/home/fanqi/data/COCO' 68 | dataDir = os.path.join(root_path, 'new_annotations') 69 | support_dict = {} 70 | 71 | support_dict['support_box'] = [] 72 | support_dict['category_id'] = [] 73 | support_dict['image_id'] = [] 74 | support_dict['id'] = [] 75 | support_dict['file_path'] = [] 76 | 77 | 78 | for dataType in ['split_voc_instances_train2017.json']: 79 | annFile = join(dataDir, dataType) 80 | 81 | with open(annFile,'r') as load_f: 82 | dataset = json.load(load_f) 83 | print(dataset.keys()) 84 | save_info = dataset['info'] 85 | save_licenses = dataset['licenses'] 86 | save_images = dataset['images'] 87 | save_categories = dataset['categories'] 88 | 89 | print(annFile) 90 | shot_num = 10 91 | coco = COCO(annFile) 92 | print(coco) 93 | 94 | annotations = few_shot(coco, shot_num) 95 | dataset_split = { 96 | 'info': save_info, 97 | 'licenses': save_licenses, 98 | 'images': save_images, 99 | 'annotations': annotations, 100 | 'categories': save_categories} 101 | #split_file = os.path.join(root_path, 'new_annotations/final_split_voc_10_shot_instances_train2017.json') 102 | split_file = './new_annotations/final_split_voc_10_shot_instances_train2017.json' 103 | 104 | with open(split_file, 'w') as f: 105 | json.dump(dataset_split, f) 106 | 107 | 108 | 109 | 110 | -------------------------------------------------------------------------------- /datasets/generate_support_data.sh: -------------------------------------------------------------------------------- 1 | DATA_ROOT=/home/fanqi/data/COCO 2 | 3 | cd coco 4 | 5 | ln -s $DATA_ROOT/train2017 ./ 6 | ln -s $DATA_ROOT/val2017 ./ 7 | ln -s $DATA_ROOT/annotations ./ 8 | 9 | python3 1_split_filter.py ./ 10 | #python3 2_balance.py ./ 11 | python3 3_gen_support_pool.py ./ 12 | python3 4_gen_support_pool_10_shot.py ./ 13 | 14 | -------------------------------------------------------------------------------- /fewx/__init__.py: -------------------------------------------------------------------------------- 1 | from fewx import modeling 2 | from fewx import config 3 | from fewx import layers 4 | 5 | __version__ = "0.1" 6 | -------------------------------------------------------------------------------- /fewx/config/__init__.py: -------------------------------------------------------------------------------- 1 | from .config import get_cfg 2 | 3 | __all__ = [ 4 | "get_cfg", 5 | ] 6 | -------------------------------------------------------------------------------- /fewx/config/config.py: -------------------------------------------------------------------------------- 1 | from detectron2.config import CfgNode 2 | 3 | 4 | def get_cfg() -> CfgNode: 5 | """ 6 | Get a 
copy of the default config. 7 | Returns: 8 | a detectron2 CfgNode instance. 9 | """ 10 | from .defaults import _C 11 | 12 | return _C.clone() 13 | -------------------------------------------------------------------------------- /fewx/config/defaults.py: -------------------------------------------------------------------------------- 1 | from detectron2.config.defaults import _C 2 | from detectron2.config import CfgNode as CN 3 | 4 | 5 | # ---------------------------------------------------------------------------- # 6 | # Additional Configs 7 | # ---------------------------------------------------------------------------- # 8 | _C.SOLVER.HEAD_LR_FACTOR = 1.0 9 | 10 | # ---------------------------------------------------------------------------- # 11 | # Few shot setting 12 | # ---------------------------------------------------------------------------- # 13 | _C.INPUT.FS = CN() 14 | _C.INPUT.FS.FEW_SHOT = False 15 | _C.INPUT.FS.SUPPORT_WAY = 2 16 | _C.INPUT.FS.SUPPORT_SHOT = 10 17 | -------------------------------------------------------------------------------- /fewx/data/__init__.py: -------------------------------------------------------------------------------- 1 | from .dataset_mapper import DatasetMapperWithSupport 2 | # ensure the builtin datasets are registered 3 | from . import datasets # isort:skip 4 | 5 | __all__ = [k for k in globals().keys() if not k.startswith("_")] 6 | -------------------------------------------------------------------------------- /fewx/data/build.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | import bisect 3 | import copy 4 | import itertools 5 | import logging 6 | import numpy as np 7 | import operator 8 | import pickle 9 | import torch.utils.data 10 | from fvcore.common.file_io import PathManager 11 | from tabulate import tabulate 12 | from termcolor import colored 13 | 14 | from detectron2.structures import BoxMode 15 | from detectron2.utils.comm import get_world_size 16 | from detectron2.utils.env import seed_all_rng 17 | from detectron2.utils.logger import log_first_n 18 | 19 | from detectron2.data.catalog import DatasetCatalog, MetadataCatalog 20 | from detectron2.data.common import AspectRatioGroupedDataset, DatasetFromList, MapDataset 21 | from detectron2.data.dataset_mapper import DatasetMapper 22 | from detectron2.data.detection_utils import check_metadata_consistency 23 | from detectron2.data.samplers import InferenceSampler, RepeatFactorTrainingSampler, TrainingSampler 24 | 25 | from detectron2.data.build import build_batch_data_loader, filter_images_with_only_crowd_annotations, load_proposals_into_dataset, filter_images_with_few_keypoints, print_instances_class_histogram, trivial_batch_collator, get_detection_dataset_dicts 26 | 27 | def fsod_get_detection_dataset_dicts( 28 | dataset_names, filter_empty=True, min_keypoints=0, proposal_files=None 29 | ): 30 | """ 31 | Load and prepare dataset dicts for instance detection/segmentation and semantic segmentation. 32 | Args: 33 | dataset_names (list[str]): a list of dataset names 34 | filter_empty (bool): whether to filter out images without instance annotations 35 | min_keypoints (int): filter out images with fewer keypoints than 36 | `min_keypoints`. Set to 0 to do nothing. 37 | proposal_files (list[str]): if given, a list of object proposal files 38 | that match each dataset in `dataset_names`. 
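    Compared to detectron2's `get_detection_dataset_dicts`, training dicts are additionally
    split per category: each returned record keeps one image's annotations of a single
    category, so that a few-shot training episode can be built around that category.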
39 | """ 40 | assert len(dataset_names) 41 | dataset_dicts_original = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names] 42 | for dataset_name, dicts in zip(dataset_names, dataset_dicts_original): 43 | assert len(dicts), "Dataset '{}' is empty!".format(dataset_name) 44 | 45 | if proposal_files is not None: 46 | assert len(dataset_names) == len(proposal_files) 47 | # load precomputed proposals from proposal files 48 | dataset_dicts_original = [ 49 | load_proposals_into_dataset(dataset_i_dicts, proposal_file) 50 | for dataset_i_dicts, proposal_file in zip(dataset_dicts_original, proposal_files) 51 | ] 52 | 53 | if 'train' not in dataset_names[0]: 54 | dataset_dicts = list(itertools.chain.from_iterable(dataset_dicts_original)) 55 | else: 56 | dataset_dicts_original = list(itertools.chain.from_iterable(dataset_dicts_original)) 57 | dataset_dicts_original = filter_images_with_only_crowd_annotations(dataset_dicts_original) 58 | ################################################################################### 59 | # split image-based annotations to instance-based annotations for few-shot learning 60 | dataset_dicts = [] 61 | index_dicts = [] 62 | split_flag = True 63 | if split_flag: 64 | for record in dataset_dicts_original: 65 | file_name = record['file_name'] 66 | height = record['height'] 67 | width = record['width'] 68 | image_id = record['image_id'] 69 | annotations = record['annotations'] 70 | category_dict = {} 71 | for ann_id, ann in enumerate(annotations): 72 | 73 | ann.pop("segmentation", None) 74 | ann.pop("keypoints", None) 75 | 76 | category_id = ann['category_id'] 77 | if category_id not in category_dict.keys(): 78 | category_dict[category_id] = [ann] 79 | else: 80 | category_dict[category_id].append(ann) 81 | 82 | for key, item in category_dict.items(): 83 | instance_ann = {} 84 | instance_ann['file_name'] = file_name 85 | instance_ann['height'] = height 86 | instance_ann['width'] = width 87 | 88 | instance_ann['annotations'] = item 89 | 90 | dataset_dicts.append(instance_ann) 91 | 92 | 93 | has_instances = "annotations" in dataset_dicts[0] 94 | # Keep images without instance-level GT if the dataset has semantic labels. 95 | if filter_empty and has_instances and "sem_seg_file_name" not in dataset_dicts[0]: 96 | dataset_dicts = filter_images_with_only_crowd_annotations(dataset_dicts) 97 | 98 | if min_keypoints > 0 and has_instances: 99 | dataset_dicts = filter_images_with_few_keypoints(dataset_dicts, min_keypoints) 100 | 101 | if has_instances: 102 | try: 103 | class_names = MetadataCatalog.get(dataset_names[0]).thing_classes 104 | check_metadata_consistency("thing_classes", dataset_names) 105 | print_instances_class_histogram(dataset_dicts, class_names) 106 | except AttributeError: # class names are not available for this dataset 107 | pass 108 | return dataset_dicts 109 | 110 | def build_detection_train_loader(cfg, mapper=None): 111 | """ 112 | A data loader is created by the following steps: 113 | 1. Use the dataset names in config to query :class:`DatasetCatalog`, and obtain a list of dicts. 114 | 2. Coordinate a random shuffle order shared among all processes (all GPUs) 115 | 3. Each process spawn another few workers to process the dicts. Each worker will: 116 | * Map each metadata dict into another format to be consumed by the model. 117 | * Batch them by simply putting dicts into a list. 118 | The batched ``list[mapped_dict]`` is what this dataloader will yield. 
119 | Args: 120 | cfg (CfgNode): the config 121 | mapper (callable): a callable which takes a sample (dict) from dataset and 122 | returns the format to be consumed by the model. 123 | By default it will be `DatasetMapper(cfg, True)`. 124 | Returns: 125 | an infinite iterator of training data 126 | """ 127 | dataset_dicts = fsod_get_detection_dataset_dicts( 128 | cfg.DATASETS.TRAIN, 129 | filter_empty=cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS, 130 | min_keypoints=cfg.MODEL.ROI_KEYPOINT_HEAD.MIN_KEYPOINTS_PER_IMAGE 131 | if cfg.MODEL.KEYPOINT_ON 132 | else 0, 133 | proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None, 134 | ) 135 | dataset = DatasetFromList(dataset_dicts, copy=False) 136 | 137 | if mapper is None: 138 | mapper = DatasetMapper(cfg, True) 139 | dataset = MapDataset(dataset, mapper) 140 | 141 | sampler_name = cfg.DATALOADER.SAMPLER_TRAIN 142 | logger = logging.getLogger(__name__) 143 | logger.info("Using training sampler {}".format(sampler_name)) 144 | # TODO avoid if-else? 145 | if sampler_name == "TrainingSampler": 146 | sampler = TrainingSampler(len(dataset)) 147 | elif sampler_name == "RepeatFactorTrainingSampler": 148 | repeat_factors = RepeatFactorTrainingSampler.repeat_factors_from_category_frequency( 149 | dataset_dicts, cfg.DATALOADER.REPEAT_THRESHOLD 150 | ) 151 | sampler = RepeatFactorTrainingSampler(repeat_factors) 152 | else: 153 | raise ValueError("Unknown training sampler: {}".format(sampler_name)) 154 | return build_batch_data_loader( 155 | dataset, 156 | sampler, 157 | cfg.SOLVER.IMS_PER_BATCH, 158 | aspect_ratio_grouping=cfg.DATALOADER.ASPECT_RATIO_GROUPING, 159 | num_workers=cfg.DATALOADER.NUM_WORKERS, 160 | ) 161 | 162 | def build_detection_test_loader(cfg, dataset_name, mapper=None): 163 | """ 164 | Similar to `build_detection_train_loader`. 165 | But this function uses the given `dataset_name` argument (instead of the names in cfg), 166 | and uses batch size 1. 167 | Args: 168 | cfg: a detectron2 CfgNode 169 | dataset_name (str): a name of the dataset that's available in the DatasetCatalog 170 | mapper (callable): a callable which takes a sample (dict) from dataset 171 | and returns the format to be consumed by the model. 172 | By default it will be `DatasetMapper(cfg, False)`. 173 | Returns: 174 | DataLoader: a torch DataLoader, that loads the given detection 175 | dataset, with test-time transformation and batching. 176 | """ 177 | dataset_dicts = get_detection_dataset_dicts( 178 | [dataset_name], 179 | filter_empty=False, # True, 180 | proposal_files=[ 181 | cfg.DATASETS.PROPOSAL_FILES_TEST[list(cfg.DATASETS.TEST).index(dataset_name)] 182 | ] 183 | if cfg.MODEL.LOAD_PROPOSALS 184 | else None, 185 | ) 186 | 187 | dataset = DatasetFromList(dataset_dicts) 188 | if mapper is None: 189 | mapper = DatasetMapper(cfg, False) # True) 190 | dataset = MapDataset(dataset, mapper) 191 | 192 | sampler = InferenceSampler(len(dataset)) 193 | # Always use 1 image per worker during inference since this is the 194 | # standard when reporting inference time in papers. 
195 | batch_sampler = torch.utils.data.sampler.BatchSampler(sampler, 1, drop_last=False) 196 | 197 | data_loader = torch.utils.data.DataLoader( 198 | dataset, 199 | num_workers=cfg.DATALOADER.NUM_WORKERS, 200 | batch_sampler=batch_sampler, 201 | collate_fn=trivial_batch_collator, 202 | ) 203 | return data_loader 204 | -------------------------------------------------------------------------------- /fewx/data/dataset_mapper.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | import copy 3 | import logging 4 | import numpy as np 5 | import torch 6 | from fvcore.common.file_io import PathManager 7 | from PIL import Image 8 | 9 | from detectron2.data import detection_utils as utils 10 | from detectron2.data import transforms as T 11 | 12 | import pandas as pd 13 | from detectron2.data.catalog import MetadataCatalog 14 | 15 | """ 16 | This file contains the default mapping that's applied to "dataset dicts". 17 | """ 18 | 19 | __all__ = ["DatasetMapperWithSupport"] 20 | 21 | 22 | class DatasetMapperWithSupport: 23 | """ 24 | A callable which takes a dataset dict in Detectron2 Dataset format, 25 | and map it into a format used by the model. 26 | 27 | This is the default callable to be used to map your dataset dict into training data. 28 | You may need to follow it to implement your own one for customized logic, 29 | such as a different way to read or transform images. 30 | See :doc:`/tutorials/data_loading` for details. 31 | 32 | The callable currently does the following: 33 | 34 | 1. Read the image from "file_name" 35 | 2. Applies cropping/geometric transforms to the image and annotations 36 | 3. Prepare data and annotations to Tensor and :class:`Instances` 37 | """ 38 | 39 | def __init__(self, cfg, is_train=True): 40 | if cfg.INPUT.CROP.ENABLED and is_train: 41 | self.crop_gen = T.RandomCrop(cfg.INPUT.CROP.TYPE, cfg.INPUT.CROP.SIZE) 42 | logging.getLogger(__name__).info("CropGen used in training: " + str(self.crop_gen)) 43 | else: 44 | self.crop_gen = None 45 | 46 | self.tfm_gens = utils.build_transform_gen(cfg, is_train) 47 | 48 | # fmt: off 49 | self.img_format = cfg.INPUT.FORMAT 50 | self.mask_on = cfg.MODEL.MASK_ON 51 | self.mask_format = cfg.INPUT.MASK_FORMAT 52 | self.keypoint_on = cfg.MODEL.KEYPOINT_ON 53 | self.load_proposals = cfg.MODEL.LOAD_PROPOSALS 54 | 55 | self.few_shot = cfg.INPUT.FS.FEW_SHOT 56 | self.support_way = cfg.INPUT.FS.SUPPORT_WAY 57 | self.support_shot = cfg.INPUT.FS.SUPPORT_SHOT 58 | # fmt: on 59 | if self.keypoint_on and is_train: 60 | # Flip only makes sense in training 61 | self.keypoint_hflip_indices = utils.create_keypoint_hflip_indices(cfg.DATASETS.TRAIN) 62 | else: 63 | self.keypoint_hflip_indices = None 64 | 65 | if self.load_proposals: 66 | self.proposal_min_box_size = cfg.MODEL.PROPOSAL_GENERATOR.MIN_SIZE 67 | self.proposal_topk = ( 68 | cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TRAIN 69 | if is_train 70 | else cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TEST 71 | ) 72 | self.is_train = is_train 73 | 74 | if self.is_train: 75 | # support_df 76 | self.support_on = True 77 | if self.few_shot: 78 | self.support_df = pd.read_pickle("./datasets/coco/10_shot_support_df.pkl") 79 | else: 80 | self.support_df = pd.read_pickle("./datasets/coco/train_support_df.pkl") 81 | 82 | metadata = MetadataCatalog.get('coco_2017_train') 83 | # unmap the category mapping ids for COCO 84 | reverse_id_mapper = lambda dataset_id: 
metadata.thing_dataset_id_to_contiguous_id[dataset_id] # noqa 85 | self.support_df['category_id'] = self.support_df['category_id'].map(reverse_id_mapper) 86 | 87 | 88 | def __call__(self, dataset_dict): 89 | """ 90 | Args: 91 | dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format. 92 | 93 | Returns: 94 | dict: a format that builtin models in detectron2 accept 95 | """ 96 | dataset_dict = copy.deepcopy(dataset_dict) # it will be modified by code below 97 | # USER: Write your own image loading if it's not from a file 98 | image = utils.read_image(dataset_dict["file_name"], format=self.img_format) 99 | utils.check_image_size(dataset_dict, image) 100 | if self.is_train: 101 | # support 102 | if self.support_on: 103 | if "annotations" in dataset_dict: 104 | # USER: Modify this if you want to keep them for some reason. 105 | for anno in dataset_dict["annotations"]: 106 | if not self.mask_on: 107 | anno.pop("segmentation", None) 108 | if not self.keypoint_on: 109 | anno.pop("keypoints", None) 110 | support_images, support_bboxes, support_cls = self.generate_support(dataset_dict) 111 | dataset_dict['support_images'] = torch.as_tensor(np.ascontiguousarray(support_images)) 112 | dataset_dict['support_bboxes'] = support_bboxes 113 | dataset_dict['support_cls'] = support_cls 114 | 115 | if "annotations" not in dataset_dict: 116 | image, transforms = T.apply_transform_gens( 117 | ([self.crop_gen] if self.crop_gen else []) + self.tfm_gens, image 118 | ) 119 | else: 120 | # Crop around an instance if there are instances in the image. 121 | # USER: Remove if you don't use cropping 122 | if self.crop_gen: 123 | crop_tfm = utils.gen_crop_transform_with_instance( 124 | self.crop_gen.get_crop_size(image.shape[:2]), 125 | image.shape[:2], 126 | np.random.choice(dataset_dict["annotations"]), 127 | ) 128 | image = crop_tfm.apply_image(image) 129 | image, transforms = T.apply_transform_gens(self.tfm_gens, image) 130 | if self.crop_gen: 131 | transforms = crop_tfm + transforms 132 | 133 | image_shape = image.shape[:2] # h, w 134 | 135 | # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory, 136 | # but not efficient on large generic data structures due to the use of pickle & mp.Queue. 137 | # Therefore it's important to use torch.Tensor. 138 | dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1))) 139 | 140 | # USER: Remove if you don't use pre-computed proposals. 141 | # Most users would not need this feature. 142 | if self.load_proposals: 143 | utils.transform_proposals( 144 | dataset_dict, 145 | image_shape, 146 | transforms, 147 | self.proposal_min_box_size, 148 | self.proposal_topk, 149 | ) 150 | 151 | if not self.is_train: 152 | # USER: Modify this if you want to keep them for some reason. 153 | dataset_dict.pop("annotations", None) 154 | dataset_dict.pop("sem_seg_file_name", None) 155 | return dataset_dict 156 | 157 | if "annotations" in dataset_dict: 158 | # USER: Modify this if you want to keep them for some reason. 
159 | for anno in dataset_dict["annotations"]: 160 | if not self.mask_on: 161 | anno.pop("segmentation", None) 162 | if not self.keypoint_on: 163 | anno.pop("keypoints", None) 164 | 165 | # USER: Implement additional transformations if you have other types of data 166 | annos = [ 167 | utils.transform_instance_annotations( 168 | obj, transforms, image_shape, keypoint_hflip_indices=self.keypoint_hflip_indices 169 | ) 170 | for obj in dataset_dict.pop("annotations") 171 | if obj.get("iscrowd", 0) == 0 172 | ] 173 | instances = utils.annotations_to_instances( 174 | annos, image_shape, mask_format=self.mask_format 175 | ) 176 | # Create a tight bounding box from masks, useful when image is cropped 177 | if self.crop_gen and instances.has("gt_masks"): 178 | instances.gt_boxes = instances.gt_masks.get_bounding_boxes() 179 | dataset_dict["instances"] = utils.filter_empty_instances(instances) 180 | 181 | # USER: Remove if you don't do semantic/panoptic segmentation. 182 | if "sem_seg_file_name" in dataset_dict: 183 | with PathManager.open(dataset_dict.pop("sem_seg_file_name"), "rb") as f: 184 | sem_seg_gt = Image.open(f) 185 | sem_seg_gt = np.asarray(sem_seg_gt, dtype="uint8") 186 | sem_seg_gt = transforms.apply_segmentation(sem_seg_gt) 187 | sem_seg_gt = torch.as_tensor(sem_seg_gt.astype("long")) 188 | dataset_dict["sem_seg"] = sem_seg_gt 189 | return dataset_dict 190 | 191 | def generate_support(self, dataset_dict): 192 | support_way = self.support_way #2 193 | support_shot = self.support_shot #5 194 | 195 | id = dataset_dict['annotations'][0]['id'] 196 | query_cls = self.support_df.loc[self.support_df['id']==id, 'category_id'].tolist()[0] # they share the same category_id and image_id 197 | query_img = self.support_df.loc[self.support_df['id']==id, 'image_id'].tolist()[0] 198 | all_cls = self.support_df.loc[self.support_df['image_id']==query_img, 'category_id'].tolist() 199 | 200 | # Crop support data and get new support box in the support data 201 | support_data_all = np.zeros((support_way * support_shot, 3, 320, 320), dtype = np.float32) 202 | support_box_all = np.zeros((support_way * support_shot, 4), dtype = np.float32) 203 | used_image_id = [query_img] 204 | 205 | used_id_ls = [] 206 | for item in dataset_dict['annotations']: 207 | used_id_ls.append(item['id']) 208 | #used_category_id = [query_cls] 209 | used_category_id = list(set(all_cls)) 210 | support_category_id = [] 211 | mixup_i = 0 212 | 213 | for shot in range(support_shot): 214 | # Support image and box 215 | support_id = self.support_df.loc[(self.support_df['category_id'] == query_cls) & (~self.support_df['image_id'].isin(used_image_id)) & (~self.support_df['id'].isin(used_id_ls)), 'id'].sample(random_state=id).tolist()[0] 216 | support_cls = self.support_df.loc[self.support_df['id'] == support_id, 'category_id'].tolist()[0] 217 | support_img = self.support_df.loc[self.support_df['id'] == support_id, 'image_id'].tolist()[0] 218 | used_id_ls.append(support_id) 219 | used_image_id.append(support_img) 220 | 221 | support_db = self.support_df.loc[self.support_df['id'] == support_id, :] 222 | assert support_db['id'].values[0] == support_id 223 | 224 | support_data = utils.read_image('./datasets/coco/' + support_db["file_path"].tolist()[0], format=self.img_format) 225 | support_data = torch.as_tensor(np.ascontiguousarray(support_data.transpose(2, 0, 1))) 226 | support_box = support_db['support_box'].tolist()[0] 227 | #print(support_data) 228 | support_data_all[mixup_i] = support_data 229 | support_box_all[mixup_i] = support_box 230 | 
support_category_id.append(0) #support_cls) 231 | mixup_i += 1 232 | 233 | if support_way == 1: 234 | pass 235 | else: 236 | for way in range(support_way-1): 237 | other_cls = self.support_df.loc[(~self.support_df['category_id'].isin(used_category_id)), 'category_id'].drop_duplicates().sample(random_state=id).tolist()[0] 238 | used_category_id.append(other_cls) 239 | for shot in range(support_shot): 240 | # Support image and box 241 | 242 | support_id = self.support_df.loc[(self.support_df['category_id'] == other_cls) & (~self.support_df['image_id'].isin(used_image_id)) & (~self.support_df['id'].isin(used_id_ls)), 'id'].sample(random_state=id).tolist()[0] 243 | 244 | support_cls = self.support_df.loc[self.support_df['id'] == support_id, 'category_id'].tolist()[0] 245 | support_img = self.support_df.loc[self.support_df['id'] == support_id, 'image_id'].tolist()[0] 246 | 247 | used_id_ls.append(support_id) 248 | used_image_id.append(support_img) 249 | 250 | support_db = self.support_df.loc[self.support_df['id'] == support_id, :] 251 | assert support_db['id'].values[0] == support_id 252 | 253 | support_data = utils.read_image('./datasets/coco/' + support_db["file_path"].tolist()[0], format=self.img_format) 254 | support_data = torch.as_tensor(np.ascontiguousarray(support_data.transpose(2, 0, 1))) 255 | support_box = support_db['support_box'].tolist()[0] 256 | support_data_all[mixup_i] = support_data 257 | support_box_all[mixup_i] = support_box 258 | support_category_id.append(1) #support_cls) 259 | mixup_i += 1 260 | 261 | return support_data_all, support_box_all, support_category_id 262 | -------------------------------------------------------------------------------- /fewx/data/datasets/__init__.py: -------------------------------------------------------------------------------- 1 | from . import builtin # ensure the builtin datasets are registered 2 | from .register_coco import register_coco_instances 3 | 4 | __all__ = [k for k in globals().keys() if "builtin" not in k and not k.startswith("_")] 5 | -------------------------------------------------------------------------------- /fewx/data/datasets/builtin.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from .register_coco import register_coco_instances 4 | from detectron2.data.datasets.builtin_meta import _get_builtin_metadata 5 | 6 | # ==== Predefined datasets and splits for COCO ========== 7 | 8 | _PREDEFINED_SPLITS_COCO = {} 9 | _PREDEFINED_SPLITS_COCO["coco"] = { 10 | "coco_2017_train_nonvoc": ("coco/train2017", "coco/new_annotations/final_split_non_voc_instances_train2017.json"), 11 | "coco_2017_train_voc_10_shot": ("coco/train2017", "coco/new_annotations/final_split_voc_10_shot_instances_train2017.json"), 12 | } 13 | 14 | def register_all_coco(root): 15 | for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_COCO.items(): 16 | for key, (image_root, json_file) in splits_per_dataset.items(): 17 | # Assume pre-defined datasets live in `./datasets`. 
18 | register_coco_instances( 19 | key, 20 | _get_builtin_metadata(dataset_name), 21 | os.path.join(root, json_file) if "://" not in json_file else json_file, 22 | os.path.join(root, image_root), 23 | ) 24 | 25 | # Register them all under "./datasets" 26 | _root = os.getenv("DETECTRON2_DATASETS", "datasets") 27 | register_all_coco(_root) 28 | -------------------------------------------------------------------------------- /fewx/data/datasets/register_coco.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | import copy 3 | import os 4 | 5 | from detectron2.data import DatasetCatalog, MetadataCatalog 6 | 7 | from detectron2.data.datasets.coco import load_coco_json, load_sem_seg 8 | 9 | """ 10 | This file contains functions to register a COCO-format dataset to the DatasetCatalog. 11 | """ 12 | 13 | __all__ = ["register_coco_instances"]  # only this function is defined in this file 14 | 15 | 16 | def register_coco_instances(name, metadata, json_file, image_root): 17 | """ 18 | Register a dataset in COCO's json annotation format for 19 | instance detection, instance segmentation and keypoint detection. 20 | (i.e., Type 1 and 2 in http://cocodataset.org/#format-data. 21 | `instances*.json` and `person_keypoints*.json` in the dataset). 22 | This is an example of how to register a new dataset. 23 | You can do something similar to this function, to register new datasets. 24 | Args: 25 | name (str): the name that identifies a dataset, e.g. "coco_2014_train". 26 | metadata (dict): extra metadata associated with this dataset. You can 27 | leave it as an empty dict. 28 | json_file (str): path to the json instance annotation file. 29 | image_root (str or path-like): directory which contains all the images. 30 | """ 31 | assert isinstance(name, str), name 32 | assert isinstance(json_file, (str, os.PathLike)), json_file 33 | assert isinstance(image_root, (str, os.PathLike)), image_root 34 | # 1. register a function which returns dicts 35 | DatasetCatalog.register(name, lambda: load_coco_json(json_file, image_root, name, extra_annotation_keys=['id'])) 36 | 37 | # 2. Optionally, add metadata about this dataset, 38 | # since they might be useful in evaluation, visualization or logging 39 | MetadataCatalog.get(name).set( 40 | json_file=json_file, image_root=image_root, evaluator_type="coco", **metadata 41 | ) 42 | -------------------------------------------------------------------------------- /fewx/evaluation/__init__.py: -------------------------------------------------------------------------------- 1 | from .coco_evaluation import COCOEvaluator 2 | 3 | __all__ = [k for k in globals().keys() if not k.startswith("_")] 4 | -------------------------------------------------------------------------------- /fewx/evaluation/coco_evaluation.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved 2 | import contextlib 3 | import copy 4 | import io 5 | import itertools 6 | import json 7 | import logging 8 | import numpy as np 9 | import os 10 | import pickle 11 | from collections import OrderedDict 12 | import pycocotools.mask as mask_util 13 | import torch 14 | from fvcore.common.file_io import PathManager 15 | from pycocotools.coco import COCO 16 | from tabulate import tabulate 17 | 18 | import detectron2.utils.comm as comm 19 | from detectron2.data import MetadataCatalog 20 | from detectron2.data.datasets.coco import convert_to_coco_json 21 | from detectron2.evaluation.fast_eval_api import COCOeval_opt as COCOeval 22 | from detectron2.structures import Boxes, BoxMode, pairwise_iou 23 | from detectron2.utils.logger import create_small_table 24 | 25 | from detectron2.evaluation.evaluator import DatasetEvaluator 26 | 27 | CLASS_NAMES = [ 28 | "airplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", 29 | "chair", "cow", "dining table", "dog", "horse", "motorcycle", "person", 30 | "potted plant", "sheep", "couch", "train", "tv", 31 | ] 32 | 33 | class COCOEvaluator(DatasetEvaluator): 34 | """ 35 | Evaluate AR for object proposals, AP for instance detection/segmentation, AP 36 | for keypoint detection outputs using COCO's metrics. 37 | See http://cocodataset.org/#detection-eval and 38 | http://cocodataset.org/#keypoints-eval to understand its metrics. 39 | 40 | In addition to COCO, this evaluator is able to support any bounding box detection, 41 | instance segmentation, or keypoint detection dataset. 42 | """ 43 | 44 | def __init__(self, dataset_name, cfg, distributed, output_dir=None): 45 | """ 46 | Args: 47 | dataset_name (str): name of the dataset to be evaluated. 48 | It must have either the following corresponding metadata: 49 | 50 | "json_file": the path to the COCO format annotation 51 | 52 | Or it must be in detectron2's standard dataset format 53 | so it can be converted to COCO format automatically. 54 | cfg (CfgNode): config instance 55 | distributed (True): if True, will collect results from all ranks and run evaluation 56 | in the main process. 57 | Otherwise, will evaluate the results in the current process. 58 | output_dir (str): optional, an output directory to dump all 59 | results predicted on the dataset. The dump contains two files: 60 | 61 | 1. "instance_predictions.pth" a file in torch serialization 62 | format that contains all the raw original predictions. 63 | 2. "coco_instances_results.json" a json file in COCO's result 64 | format. 65 | """ 66 | self._tasks = self._tasks_from_config(cfg) 67 | self._distributed = distributed 68 | self._output_dir = output_dir 69 | 70 | self._cpu_device = torch.device("cpu") 71 | 72 | # replace fewx with d2 73 | self._logger = logging.getLogger(__name__) 74 | self._metadata = MetadataCatalog.get(dataset_name) 75 | if not hasattr(self._metadata, "json_file"): 76 | self._logger.info( 77 | f"'{dataset_name}' is not registered by `register_coco_instances`." 78 | " Therefore trying to convert it to COCO format ..." 
79 | ) 80 | 81 | cache_path = os.path.join(output_dir, f"{dataset_name}_coco_format.json") 82 | self._metadata.json_file = cache_path 83 | convert_to_coco_json(dataset_name, cache_path) 84 | 85 | json_file = PathManager.get_local_path(self._metadata.json_file) 86 | with contextlib.redirect_stdout(io.StringIO()): 87 | self._coco_api = COCO(json_file) 88 | 89 | self._kpt_oks_sigmas = cfg.TEST.KEYPOINT_OKS_SIGMAS 90 | # Test set json files do not contain annotations (evaluation must be 91 | # performed using the COCO evaluation server). 92 | self._do_evaluation = "annotations" in self._coco_api.dataset 93 | 94 | def reset(self): 95 | self._predictions = [] 96 | 97 | def _tasks_from_config(self, cfg): 98 | """ 99 | Returns: 100 | tuple[str]: tasks that can be evaluated under the given configuration. 101 | """ 102 | tasks = ("bbox",) 103 | if cfg.MODEL.MASK_ON: 104 | tasks = tasks + ("segm",) 105 | if cfg.MODEL.KEYPOINT_ON: 106 | tasks = tasks + ("keypoints",) 107 | return tasks 108 | 109 | def process(self, inputs, outputs): 110 | """ 111 | Args: 112 | inputs: the inputs to a COCO model (e.g., GeneralizedRCNN). 113 | It is a list of dict. Each dict corresponds to an image and 114 | contains keys like "height", "width", "file_name", "image_id". 115 | outputs: the outputs of a COCO model. It is a list of dicts with key 116 | "instances" that contains :class:`Instances`. 117 | """ 118 | for input, output in zip(inputs, outputs): 119 | prediction = {"image_id": input["image_id"]} 120 | 121 | # TODO this is ugly 122 | if "instances" in output: 123 | instances = output["instances"].to(self._cpu_device) 124 | prediction["instances"] = instances_to_coco_json(instances, input["image_id"]) 125 | if "proposals" in output: 126 | prediction["proposals"] = output["proposals"].to(self._cpu_device) 127 | self._predictions.append(prediction) 128 | 129 | def evaluate(self): 130 | if self._distributed: 131 | comm.synchronize() 132 | predictions = comm.gather(self._predictions, dst=0) 133 | predictions = list(itertools.chain(*predictions)) 134 | 135 | if not comm.is_main_process(): 136 | return {} 137 | else: 138 | predictions = self._predictions 139 | 140 | if len(predictions) == 0: 141 | self._logger.warning("[COCOEvaluator] Did not receive valid predictions.") 142 | return {} 143 | 144 | if self._output_dir: 145 | PathManager.mkdirs(self._output_dir) 146 | file_path = os.path.join(self._output_dir, "instances_predictions.pth") 147 | with PathManager.open(file_path, "wb") as f: 148 | torch.save(predictions, f) 149 | 150 | self._results = OrderedDict() 151 | if "proposals" in predictions[0]: 152 | self._eval_box_proposals(predictions) 153 | if "instances" in predictions[0]: 154 | self._eval_predictions(set(self._tasks), predictions) 155 | # Copy so the caller can do whatever with results 156 | return copy.deepcopy(self._results) 157 | 158 | def _eval_predictions(self, tasks, predictions): 159 | """ 160 | Evaluate predictions on the given tasks. 161 | Fill self._results with the metrics of the tasks. 
162 | """ 163 | self._logger.info("Preparing results for COCO format ...") 164 | coco_results = list(itertools.chain(*[x["instances"] for x in predictions])) 165 | 166 | # unmap the category ids for COCO 167 | if hasattr(self._metadata, "thing_dataset_id_to_contiguous_id"): 168 | reverse_id_mapping = { 169 | v: k for k, v in self._metadata.thing_dataset_id_to_contiguous_id.items() 170 | } 171 | for result in coco_results: 172 | category_id = result["category_id"] 173 | assert ( 174 | category_id in reverse_id_mapping 175 | ), "A prediction has category_id={}, which is not available in the dataset.".format( 176 | category_id 177 | ) 178 | result["category_id"] = reverse_id_mapping[category_id] 179 | 180 | if self._output_dir: 181 | file_path = os.path.join(self._output_dir, "coco_instances_results.json") 182 | self._logger.info("Saving results to {}".format(file_path)) 183 | with PathManager.open(file_path, "w") as f: 184 | f.write(json.dumps(coco_results)) 185 | f.flush() 186 | 187 | if not self._do_evaluation: 188 | self._logger.info("Annotations are not available for evaluation.") 189 | return 190 | 191 | self._logger.info("Evaluating predictions ...") 192 | for task in sorted(tasks): 193 | coco_eval = ( 194 | _evaluate_predictions_on_coco( 195 | self._coco_api, coco_results, task, kpt_oks_sigmas=self._kpt_oks_sigmas 196 | ) 197 | if len(coco_results) > 0 198 | else None # cocoapi does not handle empty results very well 199 | ) 200 | 201 | res = self._derive_coco_results( 202 | coco_eval, task, class_names=self._metadata.get("thing_classes") 203 | ) 204 | self._results[task] = res 205 | 206 | def _eval_box_proposals(self, predictions): 207 | """ 208 | Evaluate the box proposals in predictions. 209 | Fill self._results with the metrics for "box_proposals" task. 210 | """ 211 | if self._output_dir: 212 | # Saving generated box proposals to file. 213 | # Predicted box_proposals are in XYXY_ABS mode. 
214 | bbox_mode = BoxMode.XYXY_ABS.value 215 | ids, boxes, objectness_logits = [], [], [] 216 | for prediction in predictions: 217 | ids.append(prediction["image_id"]) 218 | boxes.append(prediction["proposals"].proposal_boxes.tensor.numpy()) 219 | objectness_logits.append(prediction["proposals"].objectness_logits.numpy()) 220 | 221 | proposal_data = { 222 | "boxes": boxes, 223 | "objectness_logits": objectness_logits, 224 | "ids": ids, 225 | "bbox_mode": bbox_mode, 226 | } 227 | with PathManager.open(os.path.join(self._output_dir, "box_proposals.pkl"), "wb") as f: 228 | pickle.dump(proposal_data, f) 229 | 230 | if not self._do_evaluation: 231 | self._logger.info("Annotations are not available for evaluation.") 232 | return 233 | 234 | self._logger.info("Evaluating bbox proposals ...") 235 | res = {} 236 | areas = {"all": "", "small": "s", "medium": "m", "large": "l"} 237 | for limit in [100, 1000]: 238 | for area, suffix in areas.items(): 239 | stats = _evaluate_box_proposals(predictions, self._coco_api, area=area, limit=limit) 240 | key = "AR{}@{:d}".format(suffix, limit) 241 | res[key] = float(stats["ar"].item() * 100) 242 | self._logger.info("Proposal metrics: \n" + create_small_table(res)) 243 | self._results["box_proposals"] = res 244 | 245 | def _calculate_ap(self, class_names, precisions, T=None, A=None): 246 | ################## ap ##################### 247 | voc_ls = [] 248 | non_voc_ls = [] 249 | 250 | for idx, name in enumerate(class_names): 251 | # area range index 0: all area ranges 252 | # max dets index -1: typically 100 per image 253 | if T is not None and A is None: 254 | precision = precisions[T, :, idx, 0, -1] 255 | elif A is not None and T is None: 256 | precision = precisions[:, :, idx, A, -1] 257 | elif T is None and A is None: 258 | precision = precisions[:, :, idx, 0, -1] 259 | else: 260 | assert False 261 | 262 | precision = precision[precision > -1] 263 | ap = np.mean(precision) if precision.size else float("nan") 264 | 265 | # calculate voc ap and non-voc ap 266 | if name in CLASS_NAMES: 267 | voc_ls.append(ap * 100) 268 | else: 269 | non_voc_ls.append(ap * 100) 270 | if len(voc_ls) > 0: 271 | voc_ap = sum(voc_ls) * 1.0 / len(voc_ls) 272 | else: 273 | voc_ap = 0.0 274 | if len(non_voc_ls) > 0: 275 | non_voc_ap = sum(non_voc_ls) * 1.0 / len(non_voc_ls) 276 | else: 277 | non_voc_ap = 0.0 278 | 279 | return voc_ap, non_voc_ap 280 | 281 | def _derive_coco_results(self, coco_eval, iou_type, class_names=None): 282 | """ 283 | Derive the desired score numbers from summarized COCOeval. 284 | 285 | Args: 286 | coco_eval (None or COCOEval): None represents no predictions from model. 287 | iou_type (str): 288 | class_names (None or list[str]): if provided, will use it to predict 289 | per-category AP. 
290 | 291 | Returns: 292 | a dict of {metric name: score} 293 | """ 294 | 295 | metrics = { 296 | "bbox": ["AP", "AP50", "AP75", "APs", "APm", "APl"], 297 | "segm": ["AP", "AP50", "AP75", "APs", "APm", "APl"], 298 | "keypoints": ["AP", "AP50", "AP75", "APm", "APl"], 299 | }[iou_type] 300 | 301 | if coco_eval is None: 302 | self._logger.warn("No predictions from the model!") 303 | return {metric: float("nan") for metric in metrics} 304 | 305 | # the standard metrics 306 | results = { 307 | metric: float(coco_eval.stats[idx] * 100 if coco_eval.stats[idx] >= 0 else "nan") 308 | for idx, metric in enumerate(metrics) 309 | } 310 | self._logger.info( 311 | "Evaluation results for {}: \n".format(iou_type) + create_small_table(results) 312 | ) 313 | if not np.isfinite(sum(results.values())): 314 | self._logger.info("Some metrics cannot be computed and is shown as NaN.") 315 | 316 | if class_names is None or len(class_names) <= 1: 317 | return results 318 | # Compute per-category AP 319 | # from https://github.com/facebookresearch/Detectron/blob/a6a835f5b8208c45d0dce217ce9bbda915f44df7/detectron/datasets/json_dataset_evaluator.py#L222-L252 # noqa 320 | precisions = coco_eval.eval["precision"] 321 | # precision has dims (iou, recall, cls, area range, max dets) 322 | assert len(class_names) == precisions.shape[2] 323 | 324 | results_per_category = [] 325 | voc_ls = [] 326 | non_voc_ls = [] 327 | 328 | ################## ap ##################### 329 | for idx, name in enumerate(class_names): 330 | # area range index 0: all area ranges 331 | # max dets index -1: typically 100 per image 332 | precision = precisions[:, :, idx, 0, -1] 333 | precision = precision[precision > -1] 334 | ap = np.mean(precision) if precision.size else float("nan") 335 | results_per_category.append(("{}".format(name), float(ap * 100))) 336 | # calculate voc and non voc metrics 337 | voc_ap, non_voc_ap = self._calculate_ap(class_names, precisions) 338 | voc_ap_50, non_voc_ap_50 = self._calculate_ap(class_names, precisions, T=0) # T=0, iou_thresh = 0.5 339 | voc_ap_75, non_voc_ap_75 = self._calculate_ap(class_names, precisions, T=5) # T=5, iou_thresh = 0.75 340 | 341 | voc_ap_small, non_voc_ap_small = self._calculate_ap(class_names, precisions, A=1) # A=1, small 342 | voc_ap_medium, non_voc_ap_medium = self._calculate_ap(class_names, precisions, A=2) # A=2, medium 343 | voc_ap_large, non_voc_ap_large = self._calculate_ap(class_names, precisions, A=3) # A=3, large 344 | 345 | # print voc ap 346 | self._logger.info("Evaluation results for VOC 20 categories =======> AP : " + str('%.2f' % voc_ap)) 347 | self._logger.info("Evaluation results for VOC 20 categories =======> AP50: " + str('%.2f' % voc_ap_50)) 348 | self._logger.info("Evaluation results for VOC 20 categories =======> AP75: " + str('%.2f' % voc_ap_75)) 349 | self._logger.info("Evaluation results for VOC 20 categories =======> APs : " + str('%.2f' % voc_ap_small)) 350 | self._logger.info("Evaluation results for VOC 20 categories =======> APm : " + str('%.2f' % voc_ap_medium)) 351 | self._logger.info("Evaluation results for VOC 20 categories =======> APl : " + str('%.2f' % voc_ap_large)) 352 | 353 | # print voc ap 354 | self._logger.info("Evaluation results for Non VOC 60 categories =======> AP : " + str('%.2f' % non_voc_ap)) 355 | self._logger.info("Evaluation results for Non VOC 60 categories =======> AP50: " + str('%.2f' % non_voc_ap_50)) 356 | self._logger.info("Evaluation results for Non VOC 60 categories =======> AP75: " + str('%.2f' % non_voc_ap_75)) 357 | 
self._logger.info("Evaluation results for Non VOC 60 categories =======> APs : " + str('%.2f' % non_voc_ap_small)) 358 | self._logger.info("Evaluation results for Non VOC 60 categories =======> APm : " + str('%.2f' % non_voc_ap_medium)) 359 | self._logger.info("Evaluation results for Non VOC 60 categories =======> APl : " + str('%.2f' % non_voc_ap_large)) 360 | 361 | ''' 362 | # log evaluation results in csv 363 | # type, AP, AP50, AP75, APs, APm, APl 364 | eval_log = iou_type + ',' + 'AP,AP50,AP75,APs,APm,APl' + '\n' 365 | eval_log += 'all' + ',' + str('%.2f' % results['AP']) + ',' + str('%.2f' % results['AP50']) + ',' + str('%.2f' % results['AP75']) + ',' + str('%.2f' % results['APs']) + ',' + str('%.2f' % results['APm']) + ',' + str('%.2f' % results['APl']) + '\n' 366 | eval_log += 'VOC' + ',' + str('%.2f' % voc_ap) + ',' + str('%.2f' % voc_ap_50) + ',' + str('%.2f' % voc_ap_75) + ',' + str('%.2f' % voc_ap_small) + ',' + str('%.2f' % voc_ap_medium) + ',' + str('%.2f' % voc_ap_large) + '\n' 367 | eval_log += 'Non-VOC' + ',' + str('%.2f' % non_voc_ap) + ',' + str('%.2f' % non_voc_ap_50) + ',' + str('%.2f' % non_voc_ap_75) + ',' + str('%.2f' % non_voc_ap_small) + ',' + str('%.2f' % non_voc_ap_medium) + ',' + str('%.2f' % non_voc_ap_large) + '\n' 368 | 369 | with open('evaluation_result.csv', 'a') as f: 370 | f.write(eval_log) 371 | ''' 372 | # tabulate it 373 | N_COLS = min(6, len(results_per_category) * 2) 374 | results_flatten = list(itertools.chain(*results_per_category)) 375 | results_2d = itertools.zip_longest(*[results_flatten[i::N_COLS] for i in range(N_COLS)]) 376 | table = tabulate( 377 | results_2d, 378 | tablefmt="pipe", 379 | floatfmt=".3f", 380 | headers=["category", "AP"] * (N_COLS // 2), 381 | numalign="left", 382 | ) 383 | self._logger.info("Per-category {} AP: \n".format(iou_type) + table) 384 | 385 | results.update({"AP-" + name: ap for name, ap in results_per_category}) 386 | return results 387 | 388 | def instances_to_coco_json(instances, img_id): 389 | """ 390 | Dump an "Instances" object to a COCO-format json that's used for evaluation. 391 | 392 | Args: 393 | instances (Instances): 394 | img_id (int): the image id 395 | 396 | Returns: 397 | list[dict]: list of json annotations in COCO format. 398 | """ 399 | num_instance = len(instances) 400 | if num_instance == 0: 401 | return [] 402 | 403 | boxes = instances.pred_boxes.tensor.numpy() 404 | boxes = BoxMode.convert(boxes, BoxMode.XYXY_ABS, BoxMode.XYWH_ABS) 405 | boxes = boxes.tolist() 406 | scores = instances.scores.tolist() 407 | classes = instances.pred_classes.tolist() 408 | 409 | has_mask = instances.has("pred_masks") 410 | if has_mask: 411 | # use RLE to encode the masks, because they are too large and takes memory 412 | # since this evaluator stores outputs of the entire dataset 413 | rles = [ 414 | mask_util.encode(np.array(mask[:, :, None], order="F", dtype="uint8"))[0] 415 | for mask in instances.pred_masks 416 | ] 417 | for rle in rles: 418 | # "counts" is an array encoded by mask_util as a byte-stream. Python3's 419 | # json writer which always produces strings cannot serialize a bytestream 420 | # unless you decode it. Thankfully, utf-8 works out (which is also what 421 | # the pycocotools/_mask.pyx does). 
422 | rle["counts"] = rle["counts"].decode("utf-8") 423 | 424 | has_keypoints = instances.has("pred_keypoints") 425 | if has_keypoints: 426 | keypoints = instances.pred_keypoints 427 | 428 | results = [] 429 | for k in range(num_instance): 430 | result = { 431 | "image_id": img_id, 432 | "category_id": classes[k], 433 | "bbox": boxes[k], 434 | "score": scores[k], 435 | } 436 | if has_mask: 437 | result["segmentation"] = rles[k] 438 | if has_keypoints: 439 | # In COCO annotations, 440 | # keypoints coordinates are pixel indices. 441 | # However our predictions are floating point coordinates. 442 | # Therefore we subtract 0.5 to be consistent with the annotation format. 443 | # This is the inverse of data loading logic in `datasets/coco.py`. 444 | keypoints[k][:, :2] -= 0.5 445 | result["keypoints"] = keypoints[k].flatten().tolist() 446 | results.append(result) 447 | return results 448 | 449 | 450 | # inspired from Detectron: 451 | # https://github.com/facebookresearch/Detectron/blob/a6a835f5b8208c45d0dce217ce9bbda915f44df7/detectron/datasets/json_dataset_evaluator.py#L255 # noqa 452 | def _evaluate_box_proposals(dataset_predictions, coco_api, thresholds=None, area="all", limit=None): 453 | """ 454 | Evaluate detection proposal recall metrics. This function is a much 455 | faster alternative to the official COCO API recall evaluation code. However, 456 | it produces slightly different results. 457 | """ 458 | # Record max overlap value for each gt box 459 | # Return vector of overlap values 460 | areas = { 461 | "all": 0, 462 | "small": 1, 463 | "medium": 2, 464 | "large": 3, 465 | "96-128": 4, 466 | "128-256": 5, 467 | "256-512": 6, 468 | "512-inf": 7, 469 | } 470 | area_ranges = [ 471 | [0 ** 2, 1e5 ** 2], # all 472 | [0 ** 2, 32 ** 2], # small 473 | [32 ** 2, 96 ** 2], # medium 474 | [96 ** 2, 1e5 ** 2], # large 475 | [96 ** 2, 128 ** 2], # 96-128 476 | [128 ** 2, 256 ** 2], # 128-256 477 | [256 ** 2, 512 ** 2], # 256-512 478 | [512 ** 2, 1e5 ** 2], 479 | ] # 512-inf 480 | assert area in areas, "Unknown area range: {}".format(area) 481 | area_range = area_ranges[areas[area]] 482 | gt_overlaps = [] 483 | num_pos = 0 484 | 485 | for prediction_dict in dataset_predictions: 486 | predictions = prediction_dict["proposals"] 487 | 488 | # sort predictions in descending order 489 | # TODO maybe remove this and make it explicit in the documentation 490 | inds = predictions.objectness_logits.sort(descending=True)[1] 491 | predictions = predictions[inds] 492 | 493 | ann_ids = coco_api.getAnnIds(imgIds=prediction_dict["image_id"]) 494 | anno = coco_api.loadAnns(ann_ids) 495 | gt_boxes = [ 496 | BoxMode.convert(obj["bbox"], BoxMode.XYWH_ABS, BoxMode.XYXY_ABS) 497 | for obj in anno 498 | if obj["iscrowd"] == 0 499 | ] 500 | gt_boxes = torch.as_tensor(gt_boxes).reshape(-1, 4) # guard against no boxes 501 | gt_boxes = Boxes(gt_boxes) 502 | gt_areas = torch.as_tensor([obj["area"] for obj in anno if obj["iscrowd"] == 0]) 503 | 504 | if len(gt_boxes) == 0 or len(predictions) == 0: 505 | continue 506 | 507 | valid_gt_inds = (gt_areas >= area_range[0]) & (gt_areas <= area_range[1]) 508 | gt_boxes = gt_boxes[valid_gt_inds] 509 | 510 | num_pos += len(gt_boxes) 511 | 512 | if len(gt_boxes) == 0: 513 | continue 514 | 515 | if limit is not None and len(predictions) > limit: 516 | predictions = predictions[:limit] 517 | 518 | overlaps = pairwise_iou(predictions.proposal_boxes, gt_boxes) 519 | 520 | _gt_overlaps = torch.zeros(len(gt_boxes)) 521 | for j in range(min(len(predictions), len(gt_boxes))): 522 | # find 
which proposal box maximally covers each gt box 523 | # and get the iou amount of coverage for each gt box 524 | max_overlaps, argmax_overlaps = overlaps.max(dim=0) 525 | 526 | # find which gt box is 'best' covered (i.e. 'best' = most iou) 527 | gt_ovr, gt_ind = max_overlaps.max(dim=0) 528 | assert gt_ovr >= 0 529 | # find the proposal box that covers the best covered gt box 530 | box_ind = argmax_overlaps[gt_ind] 531 | # record the iou coverage of this gt box 532 | _gt_overlaps[j] = overlaps[box_ind, gt_ind] 533 | assert _gt_overlaps[j] == gt_ovr 534 | # mark the proposal box and the gt box as used 535 | overlaps[box_ind, :] = -1 536 | overlaps[:, gt_ind] = -1 537 | 538 | # append recorded iou coverage level 539 | gt_overlaps.append(_gt_overlaps) 540 | gt_overlaps = ( 541 | torch.cat(gt_overlaps, dim=0) if len(gt_overlaps) else torch.zeros(0, dtype=torch.float32) 542 | ) 543 | gt_overlaps, _ = torch.sort(gt_overlaps) 544 | 545 | if thresholds is None: 546 | step = 0.05 547 | thresholds = torch.arange(0.5, 0.95 + 1e-5, step, dtype=torch.float32) 548 | recalls = torch.zeros_like(thresholds) 549 | # compute recall for each iou threshold 550 | for i, t in enumerate(thresholds): 551 | recalls[i] = (gt_overlaps >= t).float().sum() / float(num_pos) 552 | # ar = 2 * np.trapz(recalls, thresholds) 553 | ar = recalls.mean() 554 | return { 555 | "ar": ar, 556 | "recalls": recalls, 557 | "thresholds": thresholds, 558 | "gt_overlaps": gt_overlaps, 559 | "num_pos": num_pos, 560 | } 561 | 562 | 563 | def _evaluate_predictions_on_coco(coco_gt, coco_results, iou_type, kpt_oks_sigmas=None): 564 | """ 565 | Evaluate the coco results using COCOEval API. 566 | """ 567 | assert len(coco_results) > 0 568 | 569 | if iou_type == "segm": 570 | coco_results = copy.deepcopy(coco_results) 571 | # When evaluating mask AP, if the results contain bbox, cocoapi will 572 | # use the box area as the area of the instance, instead of the mask area. 573 | # This leads to a different definition of small/medium/large. 574 | # We remove the bbox field to let mask AP use mask area. 575 | for c in coco_results: 576 | c.pop("bbox", None) 577 | 578 | coco_dt = coco_gt.loadRes(coco_results) 579 | coco_eval = COCOeval(coco_gt, coco_dt, iou_type) 580 | 581 | if iou_type == "keypoints": 582 | # Use the COCO default keypoint OKS sigmas unless overrides are specified 583 | if kpt_oks_sigmas: 584 | assert hasattr(coco_eval.params, "kpt_oks_sigmas"), "pycocotools is too old!" 585 | coco_eval.params.kpt_oks_sigmas = np.array(kpt_oks_sigmas) 586 | # COCOAPI requires every detection and every gt to have keypoints, so 587 | # we just take the first entry from both 588 | num_keypoints_dt = len(coco_results[0]["keypoints"]) // 3 589 | num_keypoints_gt = len(next(iter(coco_gt.anns.values()))["keypoints"]) // 3 590 | num_keypoints_oks = len(coco_eval.params.kpt_oks_sigmas) 591 | assert num_keypoints_oks == num_keypoints_dt == num_keypoints_gt, ( 592 | f"[COCOEvaluator] Prediction contain {num_keypoints_dt} keypoints. " 593 | f"Ground truth contains {num_keypoints_gt} keypoints. " 594 | f"The length of cfg.TEST.KEYPOINT_OKS_SIGMAS is {num_keypoints_oks}. " 595 | "They have to agree with each other. For meaning of OKS, please refer to " 596 | "http://cocodataset.org/#keypoints-eval." 
597 | ) 598 | 599 | coco_eval.evaluate() 600 | coco_eval.accumulate() 601 | coco_eval.summarize() 602 | 603 | return coco_eval 604 | -------------------------------------------------------------------------------- /fewx/layers/__init__.py: -------------------------------------------------------------------------------- 1 | from .deform_conv import DFConv2d 2 | from .ml_nms import ml_nms 3 | from .iou_loss import IOULoss 4 | from .conv_with_kaiming_uniform import conv_with_kaiming_uniform 5 | from .naive_group_norm import NaiveGroupNorm 6 | from .boundary import get_instances_contour_interior 7 | from .misc import interpolate 8 | 9 | from PIL import Image, ImageDraw  # needed by get_center() below 10 | from skimage import filters, img_as_ubyte 11 | from skimage.morphology import remove_small_objects, dilation, erosion, binary_dilation, binary_erosion, square 12 | #from scipy.ndimage.interpolation import map_coordinates 13 | #from scipy.ndimage.morphology import binary_fill_holes 14 | from scipy.ndimage.filters import gaussian_filter 15 | from scipy.ndimage.measurements import center_of_mass 16 | 17 | def get_contour_interior(mask, bold=False): 18 | if True: #'camunet' == config['param']['model']: 19 | # 2-pixel contour (1out+1in), 2-pixel shrunk interior 20 | outer = binary_dilation(mask) #, square(9)) 21 | if bold: 22 | outer = binary_dilation(outer) #, square(9)) 23 | inner = binary_erosion(mask) #, square(9)) 24 | contour = ((outer != inner) > 0).astype(np.uint8) 25 | interior = (erosion(inner) > 0).astype(np.uint8) 26 | else: 27 | contour = filters.scharr(mask) 28 | scharr_threshold = np.amax(abs(contour)) / 2. 
29 | contour = (np.abs(contour) > scharr_threshold).astype(np.uint8) 30 | interior = (mask - contour > 0).astype(np.uint8) 31 | return contour, interior 32 | 33 | def get_center(mask): 34 | r = 2 35 | y, x = center_of_mass(mask) 36 | center_img = Image.fromarray(np.zeros_like(mask).astype(np.uint8)) 37 | if not np.isnan(x) and not np.isnan(y): 38 | draw = ImageDraw.Draw(center_img) 39 | draw.ellipse([x-r, y-r, x+r, y+r], fill='White') 40 | center = np.asarray(center_img) 41 | return center 42 | 43 | def get_instances_contour_interior(instances_mask): 44 | adjacent_boundary_only = False #False #config['contour'].getboolean('adjacent_boundary_only') 45 | instances_mask = instances_mask.data 46 | result_c = np.zeros_like(instances_mask, dtype=np.uint8) 47 | result_i = np.zeros_like(instances_mask, dtype=np.uint8) 48 | weight = np.ones_like(instances_mask, dtype=np.float32) 49 | #masks = decompose_mask(instances_mask) 50 | #for m in masks: 51 | contour, interior = get_contour_interior(instances_mask, bold=adjacent_boundary_only) 52 | #center = get_center(m) 53 | if adjacent_boundary_only: 54 | result_c += contour 55 | else: 56 | result_c = np.maximum(result_c, contour) 57 | result_i = np.maximum(result_i, interior) 58 | #contour += center 59 | contour = np.where(contour > 0, 1, 0) 60 | # magic number 50 make weight distributed to [1, 5) roughly 61 | weight *= (1 + gaussian_filter(contour, sigma=1) / 50) 62 | if adjacent_boundary_only: 63 | result_c = (result_c > 1).astype(np.uint8) 64 | return result_c, result_i, weight 65 | -------------------------------------------------------------------------------- /fewx/layers/conv_with_kaiming_uniform.py: -------------------------------------------------------------------------------- 1 | from torch import nn 2 | 3 | from detectron2.layers import Conv2d 4 | from .deform_conv import DFConv2d 5 | from detectron2.layers.batch_norm import get_norm 6 | 7 | 8 | def conv_with_kaiming_uniform( 9 | norm=None, activation=None, 10 | use_deformable=False, use_sep=False): 11 | def make_conv( 12 | in_channels, out_channels, kernel_size, stride=1, dilation=1 13 | ): 14 | if use_deformable: 15 | conv_func = DFConv2d 16 | else: 17 | conv_func = Conv2d 18 | if use_sep: 19 | assert in_channels == out_channels 20 | groups = in_channels 21 | else: 22 | groups = 1 23 | conv = conv_func( 24 | in_channels, 25 | out_channels, 26 | kernel_size=kernel_size, 27 | stride=stride, 28 | padding=dilation * (kernel_size - 1) // 2, 29 | dilation=dilation, 30 | groups=groups, 31 | bias=(norm is None) 32 | ) 33 | if not use_deformable: 34 | # Caffe2 implementation uses XavierFill, which in fact 35 | # corresponds to kaiming_uniform_ in PyTorch 36 | nn.init.kaiming_uniform_(conv.weight, a=1) 37 | if norm is None: 38 | nn.init.constant_(conv.bias, 0) 39 | module = [conv,] 40 | if norm is not None and len(norm) > 0: 41 | if norm == "GN": 42 | norm_module = nn.GroupNorm(32, out_channels) 43 | else: 44 | norm_module = get_norm(norm, out_channels) 45 | module.append(norm_module) 46 | if activation is not None: 47 | module.append(nn.ReLU(inplace=True)) 48 | if len(module) > 1: 49 | return nn.Sequential(*module) 50 | return conv 51 | 52 | return make_conv 53 | -------------------------------------------------------------------------------- /fewx/layers/deform_conv.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | 4 | from detectron2.layers import Conv2d 5 | 6 | 7 | class _NewEmptyTensorOp(torch.autograd.Function): 8 
| @staticmethod 9 | def forward(ctx, x, new_shape): 10 | ctx.shape = x.shape 11 | return x.new_empty(new_shape) 12 | 13 | @staticmethod 14 | def backward(ctx, grad): 15 | shape = ctx.shape 16 | return _NewEmptyTensorOp.apply(grad, shape), None 17 | 18 | 19 | class DFConv2d(nn.Module): 20 | """ 21 | Deformable convolutional layer with configurable 22 | deformable groups, dilations and groups. 23 | 24 | Code is from: 25 | https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/maskrcnn_benchmark/layers/misc.py 26 | 27 | 28 | """ 29 | def __init__( 30 | self, 31 | in_channels, 32 | out_channels, 33 | with_modulated_dcn=True, 34 | kernel_size=3, 35 | stride=1, 36 | groups=1, 37 | dilation=1, 38 | deformable_groups=1, 39 | bias=False, 40 | padding=None 41 | ): 42 | super(DFConv2d, self).__init__() 43 | if isinstance(kernel_size, (list, tuple)): 44 | assert isinstance(stride, (list, tuple)) 45 | assert isinstance(dilation, (list, tuple)) 46 | assert len(kernel_size) == 2 47 | assert len(stride) == 2 48 | assert len(dilation) == 2 49 | padding = ( 50 | dilation[0] * (kernel_size[0] - 1) // 2, 51 | dilation[1] * (kernel_size[1] - 1) // 2 52 | ) 53 | offset_base_channels = kernel_size[0] * kernel_size[1] 54 | else: 55 | padding = dilation * (kernel_size - 1) // 2 56 | offset_base_channels = kernel_size * kernel_size 57 | if with_modulated_dcn: 58 | from detectron2.layers.deform_conv import ModulatedDeformConv 59 | offset_channels = offset_base_channels * 3 # default: 27 60 | conv_block = ModulatedDeformConv 61 | else: 62 | from detectron2.layers.deform_conv import DeformConv 63 | offset_channels = offset_base_channels * 2 # default: 18 64 | conv_block = DeformConv 65 | self.offset = Conv2d( 66 | in_channels, 67 | deformable_groups * offset_channels, 68 | kernel_size=kernel_size, 69 | stride=stride, 70 | padding=padding, 71 | groups=1, 72 | dilation=dilation 73 | ) 74 | for l in [self.offset, ]: 75 | nn.init.kaiming_uniform_(l.weight, a=1) 76 | torch.nn.init.constant_(l.bias, 0.) 
77 | self.conv = conv_block( 78 | in_channels, 79 | out_channels, 80 | kernel_size=kernel_size, 81 | stride=stride, 82 | padding=padding, 83 | dilation=dilation, 84 | groups=groups, 85 | deformable_groups=deformable_groups, 86 | bias=bias 87 | ) 88 | self.with_modulated_dcn = with_modulated_dcn 89 | self.kernel_size = kernel_size 90 | self.stride = stride 91 | self.padding = padding 92 | self.dilation = dilation 93 | self.offset_split = offset_base_channels * deformable_groups * 2 94 | 95 | def forward(self, x, return_offset=False): 96 | if x.numel() > 0: 97 | if not self.with_modulated_dcn: 98 | offset_mask = self.offset(x) 99 | x = self.conv(x, offset_mask) 100 | else: 101 | offset_mask = self.offset(x) 102 | offset = offset_mask[:, :self.offset_split, :, :] 103 | mask = offset_mask[:, self.offset_split:, :, :].sigmoid() 104 | x = self.conv(x, offset, mask) 105 | if return_offset: 106 | return x, offset_mask 107 | return x 108 | # get output shape 109 | output_shape = [ 110 | (i + 2 * p - (di * (k - 1) + 1)) // d + 1 111 | for i, p, di, k, d in zip( 112 | x.shape[-2:], 113 | self.padding, 114 | self.dilation, 115 | self.kernel_size, 116 | self.stride 117 | ) 118 | ] 119 | output_shape = [x.shape[0], self.conv.weight.shape[0]] + output_shape 120 | return _NewEmptyTensorOp.apply(x, output_shape) 121 | -------------------------------------------------------------------------------- /fewx/layers/iou_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | 4 | 5 | class IOULoss(nn.Module): 6 | """ 7 | Intersetion Over Union (IoU) loss which supports three 8 | different IoU computations: 9 | * IoU 10 | * Linear IoU 11 | * gIoU 12 | """ 13 | def __init__(self, loc_loss_type='iou'): 14 | super(IOULoss, self).__init__() 15 | self.loc_loss_type = loc_loss_type 16 | 17 | def forward(self, pred, target, weight=None): 18 | """ 19 | Args: 20 | pred: Nx4 predicted bounding boxes 21 | target: Nx4 target bounding boxes 22 | weight: N loss weight for each instance 23 | """ 24 | pred_left = pred[:, 0] 25 | pred_top = pred[:, 1] 26 | pred_right = pred[:, 2] 27 | pred_bottom = pred[:, 3] 28 | 29 | target_left = target[:, 0] 30 | target_top = target[:, 1] 31 | target_right = target[:, 2] 32 | target_bottom = target[:, 3] 33 | 34 | target_aera = (target_left + target_right) * \ 35 | (target_top + target_bottom) 36 | pred_aera = (pred_left + pred_right) * \ 37 | (pred_top + pred_bottom) 38 | 39 | w_intersect = torch.min(pred_left, target_left) + \ 40 | torch.min(pred_right, target_right) 41 | h_intersect = torch.min(pred_bottom, target_bottom) + \ 42 | torch.min(pred_top, target_top) 43 | 44 | g_w_intersect = torch.max(pred_left, target_left) + \ 45 | torch.max(pred_right, target_right) 46 | g_h_intersect = torch.max(pred_bottom, target_bottom) + \ 47 | torch.max(pred_top, target_top) 48 | ac_uion = g_w_intersect * g_h_intersect 49 | 50 | area_intersect = w_intersect * h_intersect 51 | area_union = target_aera + pred_aera - area_intersect 52 | 53 | ious = (area_intersect + 1.0) / (area_union + 1.0) 54 | gious = ious - (ac_uion - area_union) / ac_uion 55 | if self.loc_loss_type == 'iou': 56 | losses = -torch.log(ious) 57 | elif self.loc_loss_type == 'linear_iou': 58 | losses = 1 - ious 59 | elif self.loc_loss_type == 'giou': 60 | losses = 1 - gious 61 | else: 62 | raise NotImplementedError 63 | 64 | if weight is not None: 65 | return (losses * weight).sum() 66 | else: 67 | return losses.sum() 68 | 
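A minimal usage sketch for `IOULoss` (editor annotation, not part of the repository files). As the area computation `(left + right) * (top + bottom)` in `forward` shows, `pred` and `target` are not `(x1, y1, x2, y2)` corner boxes but FCOS-style non-negative distances from a sample location to the left/top/right/bottom box edges; the tensor values below are illustrative only.

```python
# Hypothetical usage sketch for fewx.layers.IOULoss -- inputs are made-up values.
import torch
from fewx.layers import IOULoss

criterion = IOULoss(loc_loss_type='giou')  # also supports 'iou' and 'linear_iou'

# 8 instances, each encoded as (left, top, right, bottom) distances >= 0
pred = torch.rand(8, 4) * 50
target = torch.rand(8, 4) * 50
weight = torch.ones(8)  # optional per-instance weight

loss = criterion(pred, target, weight)  # scalar: sum of weighted per-instance losses
```

Note that the loss is returned as a sum rather than a mean, so a caller would typically normalize it, e.g. by the number of positive samples.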
-------------------------------------------------------------------------------- /fewx/layers/misc.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | """ 3 | helper class that supports empty tensors on some nn functions. 4 | Ideally, add support directly in PyTorch to empty tensors in 5 | those functions. 6 | This can be removed once https://github.com/pytorch/pytorch/issues/12013 7 | is implemented 8 | """ 9 | 10 | import math 11 | import torch 12 | from torch.nn.modules.utils import _ntuple 13 | from torch import nn 14 | 15 | class _NewEmptyTensorOp(torch.autograd.Function): 16 | @staticmethod 17 | def forward(ctx, x, new_shape): 18 | ctx.shape = x.shape 19 | return x.new_empty(new_shape) 20 | 21 | @staticmethod 22 | def backward(ctx, grad): 23 | shape = ctx.shape 24 | return _NewEmptyTensorOp.apply(grad, shape), None 25 | 26 | 27 | def interpolate( 28 | input, size=None, scale_factor=None, mode="nearest", align_corners=None 29 | ): 30 | if input.numel() > 0: 31 | return torch.nn.functional.interpolate( 32 | input, size, scale_factor, mode, align_corners 33 | ) 34 | 35 | def _check_size_scale_factor(dim): 36 | if size is None and scale_factor is None: 37 | raise ValueError("either size or scale_factor should be defined") 38 | if size is not None and scale_factor is not None: 39 | raise ValueError("only one of size or scale_factor should be defined") 40 | if ( 41 | scale_factor is not None 42 | and isinstance(scale_factor, tuple) 43 | and len(scale_factor) != dim 44 | ): 45 | raise ValueError( 46 | "scale_factor shape must match input shape. " 47 | "Input is {}D, scale_factor size is {}".format(dim, len(scale_factor)) 48 | ) 49 | 50 | def _output_size(dim): 51 | _check_size_scale_factor(dim) 52 | if size is not None: 53 | return size 54 | scale_factors = _ntuple(dim)(scale_factor) 55 | # math.floor might return float in py2.7 56 | return [ 57 | int(math.floor(input.size(i + 2) * scale_factors[i])) for i in range(dim) 58 | ] 59 | 60 | output_shape = tuple(_output_size(2)) 61 | output_shape = input.shape[:-2] + output_shape 62 | return _NewEmptyTensorOp.apply(input, output_shape) 63 | -------------------------------------------------------------------------------- /fewx/layers/ml_nms.py: -------------------------------------------------------------------------------- 1 | from detectron2.layers import batched_nms 2 | 3 | 4 | def ml_nms(boxlist, nms_thresh, max_proposals=-1, 5 | score_field="scores", label_field="labels"): 6 | """ 7 | Performs non-maximum suppression on a boxlist, with scores specified 8 | in a boxlist field via score_field. 
9 | 10 | Args: 11 | boxlist (detectron2.structures.Boxes): 12 | nms_thresh (float): 13 | max_proposals (int): if > 0, then only the top max_proposals are kept 14 | after non-maximum suppression 15 | score_field (str): 16 | """ 17 | if nms_thresh <= 0: 18 | return boxlist 19 | boxes = boxlist.pred_boxes.tensor 20 | scores = boxlist.scores 21 | labels = boxlist.pred_classes 22 | keep = batched_nms(boxes, scores, labels, nms_thresh) 23 | if max_proposals > 0: 24 | keep = keep[: max_proposals] 25 | boxlist = boxlist[keep] 26 | return boxlist 27 | -------------------------------------------------------------------------------- /fewx/layers/naive_group_norm.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.nn import Module, Parameter 3 | from torch.nn import init 4 | 5 | 6 | class NaiveGroupNorm(Module): 7 | r"""NaiveGroupNorm implements Group Normalization with the high-level matrix operations in PyTorch. 8 | It is a temporary solution to export GN by ONNX before the official GN can be exported by ONNX. 9 | The usage of NaiveGroupNorm is exactly the same as the official :class:`torch.nn.GroupNorm`. 10 | Args: 11 | num_groups (int): number of groups to separate the channels into 12 | num_channels (int): number of channels expected in input 13 | eps: a value added to the denominator for numerical stability. Default: 1e-5 14 | affine: a boolean value that when set to ``True``, this module 15 | has learnable per-channel affine parameters initialized to ones (for weights) 16 | and zeros (for biases). Default: ``True``. 17 | Shape: 18 | - Input: :math:`(N, C, *)` where :math:`C=\text{num\_channels}` 19 | - Output: :math:`(N, C, *)` (same shape as input) 20 | Examples:: 21 | >>> input = torch.randn(20, 6, 10, 10) 22 | >>> # Separate 6 channels into 3 groups 23 | >>> m = NaiveGroupNorm(3, 6) 24 | >>> # Separate 6 channels into 6 groups (equivalent with InstanceNorm) 25 | >>> m = NaiveGroupNorm(6, 6) 26 | >>> # Put all 6 channels into a single group (equivalent with LayerNorm) 27 | >>> m = NaiveGroupNorm(1, 6) 28 | >>> # Activating the module 29 | >>> output = m(input) 30 | .. 
_`Group Normalization`: https://arxiv.org/abs/1803.08494 31 | """ 32 | __constants__ = ['num_groups', 'num_channels', 'eps', 'affine', 'weight', 33 | 'bias'] 34 | 35 | def __init__(self, num_groups, num_channels, eps=1e-5, affine=True): 36 | super(NaiveGroupNorm, self).__init__() 37 | self.num_groups = num_groups 38 | self.num_channels = num_channels 39 | self.eps = eps 40 | self.affine = affine 41 | if self.affine: 42 | self.weight = Parameter(torch.Tensor(num_channels)) 43 | self.bias = Parameter(torch.Tensor(num_channels)) 44 | else: 45 | self.register_parameter('weight', None) 46 | self.register_parameter('bias', None) 47 | self.reset_parameters() 48 | 49 | def reset_parameters(self): 50 | if self.affine: 51 | init.ones_(self.weight) 52 | init.zeros_(self.bias) 53 | 54 | def forward(self, input): 55 | N, C, H, W = input.size() 56 | assert C % self.num_groups == 0 57 | input = input.reshape(N, self.num_groups, -1) 58 | mean = input.mean(dim=-1, keepdim=True) 59 | var = (input ** 2).mean(dim=-1, keepdim=True) - mean ** 2 60 | std = torch.sqrt(var + self.eps) 61 | 62 | input = (input - mean) / std 63 | input = input.reshape(N, C, H, W) 64 | if self.affine: 65 | input = input * self.weight.reshape(1, C, 1, 1) + self.bias.reshape(1, C, 1, 1) 66 | return input 67 | 68 | def extra_repr(self): 69 | return '{num_groups}, {num_channels}, eps={eps}, ' \ 70 | 'affine={affine}'.format(**self.__dict__) 71 | -------------------------------------------------------------------------------- /fewx/modeling/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | from .fsod import FsodRCNN, FsodRes5ROIHeads, FsodFastRCNNOutputLayers, FsodRPN 3 | 4 | _EXCLUDE = {"torch", "ShapeSpec"} 5 | __all__ = [k for k in globals().keys() if k not in _EXCLUDE and not k.startswith("_")] 6 | -------------------------------------------------------------------------------- /fewx/modeling/fsod/__init__.py: -------------------------------------------------------------------------------- 1 | from .fsod_rcnn import FsodRCNN 2 | from .fsod_roi_heads import FsodRes5ROIHeads 3 | from .fsod_fast_rcnn import FsodFastRCNNOutputLayers 4 | from .fsod_rpn import FsodRPN 5 | -------------------------------------------------------------------------------- /fewx/modeling/fsod/fsod_rcnn.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved 2 | import logging 3 | import numpy as np 4 | import torch 5 | from torch import nn 6 | 7 | from detectron2.data.detection_utils import convert_image_to_rgb 8 | from detectron2.structures import ImageList, Boxes, Instances 9 | from detectron2.utils.events import get_event_storage 10 | from detectron2.utils.logger import log_first_n 11 | 12 | from detectron2.modeling.backbone import build_backbone 13 | from detectron2.modeling.postprocessing import detector_postprocess 14 | from detectron2.modeling.proposal_generator import build_proposal_generator 15 | from .fsod_roi_heads import build_roi_heads 16 | from detectron2.modeling.meta_arch.build import META_ARCH_REGISTRY 17 | 18 | from detectron2.modeling.poolers import ROIPooler 19 | import torch.nn.functional as F 20 | 21 | from .fsod_fast_rcnn import FsodFastRCNNOutputs 22 | 23 | import os 24 | 25 | import matplotlib.pyplot as plt 26 | import pandas as pd 27 | 28 | from detectron2.data.catalog import MetadataCatalog 29 | import detectron2.data.detection_utils as utils 30 | import pickle 31 | import sys 32 | 33 | __all__ = ["FsodRCNN"] 34 | 35 | 36 | @META_ARCH_REGISTRY.register() 37 | class FsodRCNN(nn.Module): 38 | """ 39 | Generalized R-CNN. Any models that contains the following three components: 40 | 1. Per-image feature extraction (aka backbone) 41 | 2. Region proposal generation 42 | 3. Per-region feature extraction and prediction 43 | """ 44 | 45 | def __init__(self, cfg): 46 | super().__init__() 47 | 48 | self.backbone = build_backbone(cfg) 49 | self.proposal_generator = build_proposal_generator(cfg, self.backbone.output_shape()) 50 | self.roi_heads = build_roi_heads(cfg, self.backbone.output_shape()) 51 | self.vis_period = cfg.VIS_PERIOD 52 | self.input_format = cfg.INPUT.FORMAT 53 | 54 | assert len(cfg.MODEL.PIXEL_MEAN) == len(cfg.MODEL.PIXEL_STD) 55 | self.register_buffer("pixel_mean", torch.Tensor(cfg.MODEL.PIXEL_MEAN).view(-1, 1, 1)) 56 | self.register_buffer("pixel_std", torch.Tensor(cfg.MODEL.PIXEL_STD).view(-1, 1, 1)) 57 | 58 | self.in_features = cfg.MODEL.ROI_HEADS.IN_FEATURES 59 | 60 | self.support_way = cfg.INPUT.FS.SUPPORT_WAY 61 | self.support_shot = cfg.INPUT.FS.SUPPORT_SHOT 62 | self.logger = logging.getLogger(__name__) 63 | 64 | @property 65 | def device(self): 66 | return self.pixel_mean.device 67 | 68 | def visualize_training(self, batched_inputs, proposals): 69 | """ 70 | A function used to visualize images and proposals. It shows ground truth 71 | bounding boxes on the original image and up to 20 predicted object 72 | proposals on the original image. Users can implement different 73 | visualization functions for different models. 74 | 75 | Args: 76 | batched_inputs (list): a list that contains input to the model. 77 | proposals (list): a list that contains predicted proposals. Both 78 | batched_inputs and proposals should have the same length. 
79 | """ 80 | from detectron2.utils.visualizer import Visualizer 81 | 82 | storage = get_event_storage() 83 | max_vis_prop = 20 84 | 85 | for input, prop in zip(batched_inputs, proposals): 86 | img = input["image"] 87 | img = convert_image_to_rgb(img.permute(1, 2, 0), self.input_format) 88 | v_gt = Visualizer(img, None) 89 | v_gt = v_gt.overlay_instances(boxes=input["instances"].gt_boxes) 90 | anno_img = v_gt.get_image() 91 | box_size = min(len(prop.proposal_boxes), max_vis_prop) 92 | v_pred = Visualizer(img, None) 93 | v_pred = v_pred.overlay_instances( 94 | boxes=prop.proposal_boxes[0:box_size].tensor.cpu().numpy() 95 | ) 96 | prop_img = v_pred.get_image() 97 | vis_img = np.concatenate((anno_img, prop_img), axis=1) 98 | vis_img = vis_img.transpose(2, 0, 1) 99 | vis_name = "Left: GT bounding boxes; Right: Predicted proposals" 100 | storage.put_image(vis_name, vis_img) 101 | break # only visualize one image in a batch 102 | 103 | def forward(self, batched_inputs): 104 | """ 105 | Args: 106 | batched_inputs: a list, batched outputs of :class:`DatasetMapper` . 107 | Each item in the list contains the inputs for one image. 108 | For now, each item in the list is a dict that contains: 109 | 110 | * image: Tensor, image in (C, H, W) format. 111 | * instances (optional): groundtruth :class:`Instances` 112 | * proposals (optional): :class:`Instances`, precomputed proposals. 113 | 114 | Other information that's included in the original dicts, such as: 115 | 116 | * "height", "width" (int): the output resolution of the model, used in inference. 117 | See :meth:`postprocess` for details. 118 | 119 | Returns: 120 | list[dict]: 121 | Each dict is the output for one input image. 122 | The dict contains one key "instances" whose value is a :class:`Instances`. 123 | The :class:`Instances` object has the following keys: 124 | "pred_boxes", "pred_classes", "scores", "pred_masks", "pred_keypoints" 125 | """ 126 | if not self.training: 127 | self.init_model() 128 | return self.inference(batched_inputs) 129 | 130 | images, support_images = self.preprocess_image(batched_inputs) 131 | if "instances" in batched_inputs[0]: 132 | for x in batched_inputs: 133 | x['instances'].set('gt_classes', torch.full_like(x['instances'].get('gt_classes'), 0)) 134 | 135 | gt_instances = [x["instances"].to(self.device) for x in batched_inputs] 136 | else: 137 | gt_instances = None 138 | 139 | features = self.backbone(images.tensor) 140 | 141 | # support branches 142 | support_bboxes_ls = [] 143 | for item in batched_inputs: 144 | bboxes = item['support_bboxes'] 145 | for box in bboxes: 146 | box = Boxes(box[np.newaxis, :]) 147 | support_bboxes_ls.append(box.to(self.device)) 148 | 149 | B, N, C, H, W = support_images.tensor.shape 150 | assert N == self.support_way * self.support_shot 151 | 152 | support_images = support_images.tensor.reshape(B*N, C, H, W) 153 | support_features = self.backbone(support_images) 154 | 155 | # support feature roi pooling 156 | feature_pooled = self.roi_heads.roi_pooling(support_features, support_bboxes_ls) 157 | 158 | support_box_features = self.roi_heads._shared_roi_transform([support_features[f] for f in self.in_features], support_bboxes_ls) 159 | assert self.support_way == 2 # now only 2 way support 160 | 161 | detector_loss_cls = [] 162 | detector_loss_box_reg = [] 163 | rpn_loss_rpn_cls = [] 164 | rpn_loss_rpn_loc = [] 165 | for i in range(B): # batch 166 | # query 167 | query_gt_instances = [gt_instances[i]] # one query gt instances 168 | query_images = ImageList.from_tensors([images[i]]) # one 
query image 169 | 170 | query_feature_res4 = features['res4'][i].unsqueeze(0) # one query feature for attention rpn 171 | query_features = {'res4': query_feature_res4} # one query feature for rcnn 172 | 173 | # positive support branch ################################## 174 | pos_begin = i * self.support_shot * self.support_way 175 | pos_end = pos_begin + self.support_shot 176 | pos_support_features = feature_pooled[pos_begin:pos_end].mean(0, True) # pos support features from res4, average all supports, for rcnn 177 | pos_support_features_pool = pos_support_features.mean(dim=[2, 3], keepdim=True) # average pooling support feature for attention rpn 178 | pos_correlation = F.conv2d(query_feature_res4, pos_support_features_pool.permute(1,0,2,3), groups=1024) # attention map 179 | 180 | pos_features = {'res4': pos_correlation} # attention map for attention rpn 181 | pos_support_box_features = support_box_features[pos_begin:pos_end].mean(0, True) 182 | pos_proposals, pos_anchors, pos_pred_objectness_logits, pos_gt_labels, pos_pred_anchor_deltas, pos_gt_boxes = self.proposal_generator(query_images, pos_features, query_gt_instances) # attention rpn 183 | pos_pred_class_logits, pos_pred_proposal_deltas, pos_detector_proposals = self.roi_heads(query_images, query_features, pos_support_box_features, pos_proposals, query_gt_instances) # pos rcnn 184 | 185 | # negative support branch ################################## 186 | neg_begin = pos_end 187 | neg_end = neg_begin + self.support_shot 188 | 189 | neg_support_features = feature_pooled[neg_begin:neg_end].mean(0, True) 190 | neg_support_features_pool = neg_support_features.mean(dim=[2, 3], keepdim=True) 191 | neg_correlation = F.conv2d(query_feature_res4, neg_support_features_pool.permute(1,0,2,3), groups=1024) 192 | 193 | neg_features = {'res4': neg_correlation} 194 | 195 | neg_support_box_features = support_box_features[neg_begin:neg_end].mean(0, True) 196 | neg_proposals, neg_anchors, neg_pred_objectness_logits, neg_gt_labels, neg_pred_anchor_deltas, neg_gt_boxes = self.proposal_generator(query_images, neg_features, query_gt_instances) 197 | neg_pred_class_logits, neg_pred_proposal_deltas, neg_detector_proposals = self.roi_heads(query_images, query_features, neg_support_box_features, neg_proposals, query_gt_instances) 198 | 199 | # rpn loss 200 | outputs_images = ImageList.from_tensors([images[i], images[i]]) 201 | 202 | outputs_pred_objectness_logits = [torch.cat(pos_pred_objectness_logits + neg_pred_objectness_logits, dim=0)] 203 | outputs_pred_anchor_deltas = [torch.cat(pos_pred_anchor_deltas + neg_pred_anchor_deltas, dim=0)] 204 | 205 | outputs_anchors = pos_anchors # + neg_anchors 206 | 207 | # convert 1 in neg_gt_labels to 0 208 | for item in neg_gt_labels: 209 | item[item == 1] = 0 210 | 211 | outputs_gt_boxes = pos_gt_boxes + neg_gt_boxes #[None] 212 | outputs_gt_labels = pos_gt_labels + neg_gt_labels 213 | 214 | if self.training: 215 | proposal_losses = self.proposal_generator.losses( 216 | outputs_anchors, outputs_pred_objectness_logits, outputs_gt_labels, outputs_pred_anchor_deltas, outputs_gt_boxes) 217 | proposal_losses = {k: v * self.proposal_generator.loss_weight for k, v in proposal_losses.items()} 218 | else: 219 | proposal_losses = {} 220 | 221 | # detector loss 222 | detector_pred_class_logits = torch.cat([pos_pred_class_logits, neg_pred_class_logits], dim=0) 223 | detector_pred_proposal_deltas = torch.cat([pos_pred_proposal_deltas, neg_pred_proposal_deltas], dim=0) 224 | for item in neg_detector_proposals: 225 | item.gt_classes 
= torch.full_like(item.gt_classes, 1) 226 | 227 | #detector_proposals = pos_detector_proposals + neg_detector_proposals 228 | detector_proposals = [Instances.cat(pos_detector_proposals + neg_detector_proposals)] 229 | if self.training: 230 | predictions = detector_pred_class_logits, detector_pred_proposal_deltas 231 | detector_losses = self.roi_heads.box_predictor.losses(predictions, detector_proposals) 232 | 233 | rpn_loss_rpn_cls.append(proposal_losses['loss_rpn_cls']) 234 | rpn_loss_rpn_loc.append(proposal_losses['loss_rpn_loc']) 235 | detector_loss_cls.append(detector_losses['loss_cls']) 236 | detector_loss_box_reg.append(detector_losses['loss_box_reg']) 237 | 238 | proposal_losses = {} 239 | detector_losses = {} 240 | 241 | proposal_losses['loss_rpn_cls'] = torch.stack(rpn_loss_rpn_cls).mean() 242 | proposal_losses['loss_rpn_loc'] = torch.stack(rpn_loss_rpn_loc).mean() 243 | detector_losses['loss_cls'] = torch.stack(detector_loss_cls).mean() 244 | detector_losses['loss_box_reg'] = torch.stack(detector_loss_box_reg).mean() 245 | 246 | 247 | losses = {} 248 | losses.update(detector_losses) 249 | losses.update(proposal_losses) 250 | return losses 251 | 252 | def init_model(self): 253 | self.support_on = True #False 254 | 255 | support_dir = './support_dir' 256 | if not os.path.exists(support_dir): 257 | os.makedirs(support_dir) 258 | 259 | support_file_name = os.path.join(support_dir, 'support_feature.pkl') 260 | if not os.path.exists(support_file_name): 261 | support_path = './datasets/coco/10_shot_support_df.pkl' 262 | support_df = pd.read_pickle(support_path) 263 | 264 | metadata = MetadataCatalog.get('coco_2017_train') 265 | # unmap the category mapping ids for COCO 266 | reverse_id_mapper = lambda dataset_id: metadata.thing_dataset_id_to_contiguous_id[dataset_id] # noqa 267 | support_df['category_id'] = support_df['category_id'].map(reverse_id_mapper) 268 | 269 | support_dict = {'res4_avg': {}, 'res5_avg': {}} 270 | for cls in support_df['category_id'].unique(): 271 | support_cls_df = support_df.loc[support_df['category_id'] == cls, :].reset_index() 272 | support_data_all = [] 273 | support_box_all = [] 274 | 275 | for index, support_img_df in support_cls_df.iterrows(): 276 | img_path = os.path.join('./datasets/coco', support_img_df['file_path']) 277 | support_data = utils.read_image(img_path, format='BGR') 278 | support_data = torch.as_tensor(np.ascontiguousarray(support_data.transpose(2, 0, 1))) 279 | support_data_all.append(support_data) 280 | 281 | support_box = support_img_df['support_box'] 282 | support_box_all.append(Boxes([support_box]).to(self.device)) 283 | 284 | # support images 285 | support_images = [x.to(self.device) for x in support_data_all] 286 | support_images = [(x - self.pixel_mean) / self.pixel_std for x in support_images] 287 | support_images = ImageList.from_tensors(support_images, self.backbone.size_divisibility) 288 | support_features = self.backbone(support_images.tensor) 289 | 290 | res4_pooled = self.roi_heads.roi_pooling(support_features, support_box_all) 291 | res4_avg = res4_pooled.mean(0, True) 292 | res4_avg = res4_avg.mean(dim=[2,3], keepdim=True) 293 | support_dict['res4_avg'][cls] = res4_avg.detach().cpu().data 294 | 295 | res5_feature = self.roi_heads._shared_roi_transform([support_features[f] for f in self.in_features], support_box_all) 296 | res5_avg = res5_feature.mean(0, True) 297 | support_dict['res5_avg'][cls] = res5_avg.detach().cpu().data 298 | 299 | del res4_avg 300 | del res4_pooled 301 | del support_features 302 | del res5_feature 303 
| del res5_avg 304 | 305 | with open(support_file_name, 'wb') as f: 306 | pickle.dump(support_dict, f) 307 | self.logger.info("=========== Offline support features are generated. ===========") 308 | self.logger.info("============ Few-shot object detection will start. =============") 309 | sys.exit(0) 310 | 311 | else: 312 | with open(support_file_name, "rb") as hFile: 313 | self.support_dict = pickle.load(hFile, encoding="latin1") 314 | for res_key, res_dict in self.support_dict.items(): 315 | for cls_key, feature in res_dict.items(): 316 | self.support_dict[res_key][cls_key] = feature.cuda() 317 | 318 | def inference(self, batched_inputs, detected_instances=None, do_postprocess=True): 319 | """ 320 | Run inference on the given inputs. 321 | Args: 322 | batched_inputs (list[dict]): same as in :meth:`forward` 323 | detected_instances (None or list[Instances]): if not None, it 324 | contains an `Instances` object per image. The `Instances` 325 | object contains "pred_boxes" and "pred_classes" which are 326 | known boxes in the image. 327 | The inference will then skip the detection of bounding boxes, 328 | and only predict other per-ROI outputs. 329 | do_postprocess (bool): whether to apply post-processing on the outputs. 330 | Returns: 331 | same as in :meth:`forward`. 332 | """ 333 | assert not self.training 334 | 335 | images = self.preprocess_image(batched_inputs) 336 | features = self.backbone(images.tensor) 337 | 338 | B, _, _, _ = features['res4'].shape 339 | assert B == 1 # only support 1 query image in test 340 | assert len(images) == 1 341 | support_proposals_dict = {} 342 | support_box_features_dict = {} 343 | proposal_num_dict = {} 344 | 345 | for cls_id, res4_avg in self.support_dict['res4_avg'].items(): 346 | query_images = ImageList.from_tensors([images[0]]) # one query image 347 | 348 | query_features_res4 = features['res4'] # one query feature for attention rpn 349 | query_features = {'res4': query_features_res4} # one query feature for rcnn 350 | 351 | # support branch ################################## 352 | support_box_features = self.support_dict['res5_avg'][cls_id] 353 | 354 | correlation = F.conv2d(query_features_res4, res4_avg.permute(1,0,2,3), groups=1024) # attention map 355 | 356 | support_correlation = {'res4': correlation} # attention map for attention rpn 357 | 358 | proposals, _ = self.proposal_generator(query_images, support_correlation, None) 359 | support_proposals_dict[cls_id] = proposals 360 | support_box_features_dict[cls_id] = support_box_features 361 | 362 | if cls_id not in proposal_num_dict.keys(): 363 | proposal_num_dict[cls_id] = [] 364 | proposal_num_dict[cls_id].append(len(proposals[0])) 365 | 366 | del support_box_features 367 | del correlation 368 | del res4_avg 369 | del query_features_res4 370 | 371 | results, _ = self.roi_heads.eval_with_support(query_images, query_features, support_proposals_dict, support_box_features_dict) 372 | 373 | if do_postprocess: 374 | return FsodRCNN._postprocess(results, batched_inputs, images.image_sizes) 375 | else: 376 | return results 377 | 378 | def preprocess_image(self, batched_inputs): 379 | """ 380 | Normalize, pad and batch the input images.
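During training this method also normalizes and batches the support crops stored under the ``support_images`` key of each input dict and returns the pair ``(images, support_images)``; at test time only the batched query ``images`` are returned.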
381 | """ 382 | images = [x["image"].to(self.device) for x in batched_inputs] 383 | images = [(x - self.pixel_mean) / self.pixel_std for x in images] 384 | images = ImageList.from_tensors(images, self.backbone.size_divisibility) 385 | if self.training: 386 | # support images 387 | support_images = [x['support_images'].to(self.device) for x in batched_inputs] 388 | support_images = [(x - self.pixel_mean) / self.pixel_std for x in support_images] 389 | support_images = ImageList.from_tensors(support_images, self.backbone.size_divisibility) 390 | 391 | return images, support_images 392 | else: 393 | return images 394 | 395 | @staticmethod 396 | def _postprocess(instances, batched_inputs, image_sizes): 397 | """ 398 | Rescale the output instances to the target size. 399 | """ 400 | # note: private function; subject to changes 401 | processed_results = [] 402 | for results_per_image, input_per_image, image_size in zip( 403 | instances, batched_inputs, image_sizes 404 | ): 405 | height = input_per_image.get("height", image_size[0]) 406 | width = input_per_image.get("width", image_size[1]) 407 | r = detector_postprocess(results_per_image, height, width) 408 | processed_results.append({"instances": r}) 409 | return processed_results 410 | -------------------------------------------------------------------------------- /fewx/modeling/fsod/fsod_roi_heads.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | import inspect 3 | import logging 4 | import numpy as np 5 | from typing import Dict, List, Optional, Tuple, Union 6 | import torch 7 | from torch import nn 8 | 9 | from detectron2.config import configurable 10 | from detectron2.layers import ShapeSpec, nonzero_tuple 11 | from detectron2.structures import Boxes, ImageList, Instances, pairwise_iou 12 | from detectron2.utils.events import get_event_storage 13 | from detectron2.utils.registry import Registry 14 | 15 | from detectron2.modeling.backbone.resnet import BottleneckBlock, make_stage 16 | from detectron2.modeling.matcher import Matcher 17 | from detectron2.modeling.poolers import ROIPooler 18 | from detectron2.modeling.proposal_generator.proposal_utils import add_ground_truth_to_proposals 19 | from detectron2.modeling.sampling import subsample_labels 20 | from detectron2.modeling.roi_heads.box_head import build_box_head 21 | from detectron2.modeling.roi_heads.roi_heads import ROIHeads 22 | from .fsod_fast_rcnn import FsodFastRCNNOutputLayers 23 | 24 | import time 25 | from detectron2.structures import Boxes, Instances 26 | 27 | ROI_HEADS_REGISTRY = Registry("ROI_HEADS") 28 | ROI_HEADS_REGISTRY.__doc__ = """ 29 | Registry for ROI heads in a generalized R-CNN model. 30 | ROIHeads take feature maps and region proposals, and 31 | perform per-region computation. 32 | 33 | The registered object will be called with `obj(cfg, input_shape)`. 34 | The call is expected to return an :class:`ROIHeads`. 35 | """ 36 | 37 | logger = logging.getLogger(__name__) 38 | 39 | def build_roi_heads(cfg, input_shape): 40 | """ 41 | Build ROIHeads defined by `cfg.MODEL.ROI_HEADS.NAME`. 42 | """ 43 | name = cfg.MODEL.ROI_HEADS.NAME 44 | return ROI_HEADS_REGISTRY.get(name)(cfg, input_shape) 45 | 46 | @ROI_HEADS_REGISTRY.register() 47 | class FsodRes5ROIHeads(ROIHeads): 48 | """ 49 | The ROIHeads in a typical "C4" R-CNN model, where 50 | the box and mask head share the cropping and 51 | the per-region feature computation by a Res5 block. 
52 | """ 53 | 54 | def __init__(self, cfg, input_shape): 55 | super().__init__(cfg) 56 | 57 | # fmt: off 58 | self.in_features = cfg.MODEL.ROI_HEADS.IN_FEATURES 59 | pooler_resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION 60 | pooler_type = cfg.MODEL.ROI_BOX_HEAD.POOLER_TYPE 61 | pooler_scales = (1.0 / input_shape[self.in_features[0]].stride, ) 62 | sampling_ratio = cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO 63 | self.mask_on = cfg.MODEL.MASK_ON 64 | # fmt: on 65 | assert not cfg.MODEL.KEYPOINT_ON 66 | assert len(self.in_features) == 1 67 | 68 | self.pooler = ROIPooler( 69 | output_size=pooler_resolution, 70 | scales=pooler_scales, 71 | sampling_ratio=sampling_ratio, 72 | pooler_type=pooler_type, 73 | ) 74 | 75 | self.res5, out_channels = self._build_res5_block(cfg) 76 | self.box_predictor = FsodFastRCNNOutputLayers( 77 | cfg, ShapeSpec(channels=out_channels, height=1, width=1) 78 | ) 79 | 80 | def _build_res5_block(self, cfg): 81 | # fmt: off 82 | stage_channel_factor = 2 ** 3 # res5 is 8x res2 83 | num_groups = cfg.MODEL.RESNETS.NUM_GROUPS 84 | width_per_group = cfg.MODEL.RESNETS.WIDTH_PER_GROUP 85 | bottleneck_channels = num_groups * width_per_group * stage_channel_factor 86 | out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS * stage_channel_factor 87 | stride_in_1x1 = cfg.MODEL.RESNETS.STRIDE_IN_1X1 88 | norm = cfg.MODEL.RESNETS.NORM 89 | assert not cfg.MODEL.RESNETS.DEFORM_ON_PER_STAGE[-1], \ 90 | "Deformable conv is not yet supported in res5 head." 91 | # fmt: on 92 | 93 | blocks = make_stage( 94 | BottleneckBlock, 95 | 3, 96 | first_stride=2, 97 | in_channels=out_channels // 2, 98 | bottleneck_channels=bottleneck_channels, 99 | out_channels=out_channels, 100 | num_groups=num_groups, 101 | norm=norm, 102 | stride_in_1x1=stride_in_1x1, 103 | ) 104 | return nn.Sequential(*blocks), out_channels 105 | 106 | def _shared_roi_transform(self, features, boxes): 107 | x = self.pooler(features, boxes) 108 | return self.res5(x) 109 | 110 | def roi_pooling(self, features, boxes): 111 | box_features = self.pooler( 112 | [features[f] for f in self.in_features], boxes 113 | ) 114 | #feature_pooled = box_features.mean(dim=[2, 3], keepdim=True) # pooled to 1x1 115 | 116 | return box_features #feature_pooled 117 | 118 | def forward(self, images, features, support_box_features, proposals, targets=None): 119 | """ 120 | See :meth:`ROIHeads.forward`. 121 | """ 122 | del images 123 | 124 | if self.training: 125 | assert targets 126 | proposals = self.label_and_sample_proposals(proposals, targets) 127 | del targets 128 | proposal_boxes = [x.proposal_boxes for x in proposals] 129 | box_features = self._shared_roi_transform( 130 | [features[f] for f in self.in_features], proposal_boxes 131 | ) 132 | 133 | #support_features = self.res5(support_features) 134 | pred_class_logits, pred_proposal_deltas = self.box_predictor(box_features, support_box_features) 135 | 136 | return pred_class_logits, pred_proposal_deltas, proposals 137 | 138 | @torch.no_grad() 139 | def eval_with_support(self, images, features, support_proposals_dict, support_box_features_dict): 140 | """ 141 | See :meth:`ROIHeads.forward`. 
142 | """ 143 | del images 144 | 145 | full_proposals_ls = [] 146 | cls_ls = [] 147 | for cls_id, proposals in support_proposals_dict.items(): 148 | full_proposals_ls.append(proposals[0]) 149 | cls_ls.append(cls_id) 150 | 151 | full_proposals_ls = [Instances.cat(full_proposals_ls)] 152 | 153 | proposal_boxes = [x.proposal_boxes for x in full_proposals_ls] 154 | assert len(proposal_boxes[0]) == 2000 155 | 156 | box_features = self._shared_roi_transform( 157 | [features[f] for f in self.in_features], proposal_boxes 158 | ) 159 | 160 | full_scores_ls = [] 161 | full_bboxes_ls = [] 162 | full_cls_ls = [] 163 | cnt = 0 164 | #for cls_id, support_box_features in support_box_features_dict.items(): 165 | for cls_id in cls_ls: 166 | support_box_features = support_box_features_dict[cls_id] 167 | query_features = box_features[cnt*100:(cnt+1)*100] 168 | pred_class_logits, pred_proposal_deltas = self.box_predictor(query_features, support_box_features) 169 | full_scores_ls.append(pred_class_logits) 170 | full_bboxes_ls.append(pred_proposal_deltas) 171 | full_cls_ls.append(torch.full_like(pred_class_logits[:, 0].unsqueeze(-1), cls_id).to(torch.int8)) 172 | del query_features 173 | del support_box_features 174 | 175 | cnt += 1 176 | 177 | class_logits = torch.cat(full_scores_ls, dim=0) 178 | proposal_deltas = torch.cat(full_bboxes_ls, dim=0) 179 | pred_cls = torch.cat(full_cls_ls, dim=0) #.unsqueeze(-1) 180 | 181 | predictions = class_logits, proposal_deltas 182 | proposals = full_proposals_ls 183 | pred_instances, _ = self.box_predictor.inference(pred_cls, predictions, proposals) 184 | pred_instances = self.forward_with_given_boxes(features, pred_instances) 185 | 186 | return pred_instances, {} 187 | 188 | def forward_with_given_boxes(self, features, instances): 189 | """ 190 | Use the given boxes in `instances` to produce other (non-box) per-ROI outputs. 191 | 192 | Args: 193 | features: same as in `forward()` 194 | instances (list[Instances]): instances to predict other outputs. Expect the keys 195 | "pred_boxes" and "pred_classes" to exist. 196 | 197 | Returns: 198 | instances (Instances): 199 | the same `Instances` object, with extra 200 | fields such as `pred_masks` or `pred_keypoints`. 201 | """ 202 | assert not self.training 203 | assert instances[0].has("pred_boxes") and instances[0].has("pred_classes") 204 | 205 | if self.mask_on: 206 | features = [features[f] for f in self.in_features] 207 | x = self._shared_roi_transform(features, [x.pred_boxes for x in instances]) 208 | return self.mask_head(x, instances) 209 | else: 210 | return instances 211 | -------------------------------------------------------------------------------- /fewx/modeling/fsod/fsod_rpn.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved 2 | from typing import Dict, List, Optional, Tuple 3 | import torch 4 | import torch.nn.functional as F 5 | from fvcore.nn import smooth_l1_loss 6 | from torch import nn 7 | 8 | from detectron2.config import configurable 9 | from detectron2.layers import ShapeSpec, cat 10 | from detectron2.structures import Boxes, ImageList, Instances, pairwise_iou 11 | from detectron2.utils.events import get_event_storage 12 | from detectron2.utils.memory import retry_if_cuda_oom 13 | from detectron2.utils.registry import Registry 14 | 15 | from detectron2.modeling.anchor_generator import build_anchor_generator 16 | from detectron2.modeling.box_regression import Box2BoxTransform 17 | from detectron2.modeling.matcher import Matcher 18 | from detectron2.modeling.sampling import subsample_labels 19 | from detectron2.modeling.proposal_generator.build import PROPOSAL_GENERATOR_REGISTRY 20 | from detectron2.modeling.proposal_generator.proposal_utils import find_top_rpn_proposals 21 | 22 | RPN_HEAD_REGISTRY = Registry("RPN_HEAD") 23 | RPN_HEAD_REGISTRY.__doc__ = """ 24 | Registry for RPN heads, which take feature maps and perform 25 | objectness classification and bounding box regression for anchors. 26 | 27 | The registered object will be called with `obj(cfg, input_shape)`. 28 | The call should return a `nn.Module` object. 29 | """ 30 | 31 | 32 | """ 33 | Shape shorthand in this module: 34 | 35 | N: number of images in the minibatch 36 | L: number of feature maps per image on which RPN is run 37 | A: number of cell anchors (must be the same for all feature maps) 38 | Hi, Wi: height and width of the i-th feature map 39 | B: size of the box parameterization 40 | 41 | Naming convention: 42 | 43 | objectness: refers to the binary classification of an anchor as object vs. not object. 44 | 45 | deltas: refers to the 4-d (dx, dy, dw, dh) deltas that parameterize the box2box 46 | transform (see :class:`box_regression.Box2BoxTransform`), or 5d for rotated boxes. 47 | 48 | pred_objectness_logits: predicted objectness scores in [-inf, +inf]; use 49 | sigmoid(pred_objectness_logits) to estimate P(object). 50 | 51 | gt_labels: ground-truth binary classification labels for objectness 52 | 53 | pred_anchor_deltas: predicted box2box transform deltas 54 | 55 | gt_anchor_deltas: ground-truth box2box transform deltas 56 | """ 57 | 58 | 59 | def build_rpn_head(cfg, input_shape): 60 | """ 61 | Build an RPN head defined by `cfg.MODEL.RPN.HEAD_NAME`. 62 | """ 63 | name = cfg.MODEL.RPN.HEAD_NAME 64 | return RPN_HEAD_REGISTRY.get(name)(cfg, input_shape) 65 | 66 | 67 | @RPN_HEAD_REGISTRY.register() 68 | class StandardRPNHead(nn.Module): 69 | """ 70 | Standard RPN classification and regression heads described in :paper:`Faster R-CNN`. 71 | Uses a 3x3 conv to produce a shared hidden state from which one 1x1 conv predicts 72 | objectness logits for each anchor and a second 1x1 conv predicts bounding-box deltas 73 | specifying how to deform each anchor into an object proposal. 74 | """ 75 | 76 | @configurable 77 | def __init__(self, *, in_channels: int, num_anchors: int, box_dim: int = 4): 78 | """ 79 | NOTE: this interface is experimental. 80 | 81 | Args: 82 | in_channels (int): number of input feature channels. When using multiple 83 | input features, they must have the same number of channels. 84 | num_anchors (int): number of anchors to predict for *each spatial position* 85 | on the feature map. The total number of anchors for each 86 | feature map will be `num_anchors * H * W`. 
87 | box_dim (int): dimension of a box, which is also the number of box regression 88 | predictions to make for each anchor. An axis aligned box has 89 | box_dim=4, while a rotated box has box_dim=5. 90 | """ 91 | super().__init__() 92 | # 3x3 conv for the hidden representation 93 | self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1) 94 | # 1x1 conv for predicting objectness logits 95 | self.objectness_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1) 96 | # 1x1 conv for predicting box2box transform deltas 97 | self.anchor_deltas = nn.Conv2d(in_channels, num_anchors * box_dim, kernel_size=1, stride=1) 98 | 99 | for l in [self.conv, self.objectness_logits, self.anchor_deltas]: 100 | nn.init.normal_(l.weight, std=0.01) 101 | nn.init.constant_(l.bias, 0) 102 | 103 | @classmethod 104 | def from_config(cls, cfg, input_shape): 105 | # Standard RPN is shared across levels: 106 | in_channels = [s.channels for s in input_shape] 107 | assert len(set(in_channels)) == 1, "Each level must have the same channel!" 108 | in_channels = in_channels[0] 109 | 110 | # RPNHead should take the same input as anchor generator 111 | # NOTE: it assumes that creating an anchor generator does not have unwanted side effect. 112 | anchor_generator = build_anchor_generator(cfg, input_shape) 113 | num_anchors = anchor_generator.num_anchors 114 | box_dim = anchor_generator.box_dim 115 | assert ( 116 | len(set(num_anchors)) == 1 117 | ), "Each level must have the same number of anchors per spatial position" 118 | return {"in_channels": in_channels, "num_anchors": num_anchors[0], "box_dim": box_dim} 119 | 120 | def forward(self, features: List[torch.Tensor]): 121 | """ 122 | Args: 123 | features (list[Tensor]): list of feature maps 124 | 125 | Returns: 126 | list[Tensor]: A list of L elements. 127 | Element i is a tensor of shape (N, A, Hi, Wi) representing 128 | the predicted objectness logits for all anchors. A is the number of cell anchors. 129 | list[Tensor]: A list of L elements. Element i is a tensor of shape 130 | (N, A*box_dim, Hi, Wi) representing the predicted "deltas" used to transform anchors 131 | to proposals. 132 | """ 133 | pred_objectness_logits = [] 134 | pred_anchor_deltas = [] 135 | for x in features: 136 | t = F.relu(self.conv(x)) 137 | pred_objectness_logits.append(self.objectness_logits(t)) 138 | pred_anchor_deltas.append(self.anchor_deltas(t)) 139 | return pred_objectness_logits, pred_anchor_deltas 140 | 141 | 142 | @PROPOSAL_GENERATOR_REGISTRY.register() 143 | class FsodRPN(nn.Module): 144 | """ 145 | Region Proposal Network, introduced by :paper:`Faster R-CNN`. 146 | """ 147 | 148 | @configurable 149 | def __init__( 150 | self, 151 | *, 152 | in_features: List[str], 153 | head: nn.Module, 154 | anchor_generator: nn.Module, 155 | anchor_matcher: Matcher, 156 | box2box_transform: Box2BoxTransform, 157 | batch_size_per_image: int, 158 | positive_fraction: float, 159 | pre_nms_topk: Tuple[float, float], 160 | post_nms_topk: Tuple[float, float], 161 | nms_thresh: float = 0.7, 162 | min_box_size: float = 0.0, 163 | anchor_boundary_thresh: float = -1.0, 164 | loss_weight: float = 1.0, 165 | smooth_l1_beta: float = 0.0 166 | ): 167 | """ 168 | NOTE: this interface is experimental. 
169 | 170 | Args: 171 | in_features (list[str]): list of names of input features to use 172 | head (nn.Module): a module that predicts logits and regression deltas 173 | for each level from a list of per-level features 174 | anchor_generator (nn.Module): a module that creates anchors from a 175 | list of features. Usually an instance of :class:`AnchorGenerator` 176 | anchor_matcher (Matcher): label the anchors by matching them with ground truth. 177 | box2box_transform (Box2BoxTransform): defines the transform from anchors boxes to 178 | instance boxes 179 | batch_size_per_image (int): number of anchors per image to sample for training 180 | positive_fraction (float): fraction of foreground anchors to sample for training 181 | pre_nms_topk (tuple[float]): (train, test) that represents the 182 | number of top k proposals to select before NMS, in 183 | training and testing. 184 | post_nms_topk (tuple[float]): (train, test) that represents the 185 | number of top k proposals to select after NMS, in 186 | training and testing. 187 | nms_thresh (float): NMS threshold used to de-duplicate the predicted proposals 188 | min_box_size (float): remove proposal boxes with any side smaller than this threshold, 189 | in the unit of input image pixels 190 | anchor_boundary_thresh (float): legacy option 191 | loss_weight (float): weight to be multiplied to the loss 192 | smooth_l1_beta (float): beta parameter for the smooth L1 193 | regression loss. Default to use L1 loss. 194 | """ 195 | super().__init__() 196 | self.in_features = in_features 197 | self.rpn_head = head 198 | self.anchor_generator = anchor_generator 199 | self.anchor_matcher = anchor_matcher 200 | self.box2box_transform = box2box_transform 201 | self.batch_size_per_image = batch_size_per_image 202 | self.positive_fraction = positive_fraction 203 | # Map from self.training state to train/test settings 204 | self.pre_nms_topk = {True: pre_nms_topk[0], False: pre_nms_topk[1]} 205 | self.post_nms_topk = {True: post_nms_topk[0], False: post_nms_topk[1]} 206 | self.nms_thresh = nms_thresh 207 | self.min_box_size = min_box_size 208 | self.anchor_boundary_thresh = anchor_boundary_thresh 209 | self.loss_weight = loss_weight 210 | self.smooth_l1_beta = smooth_l1_beta 211 | 212 | @classmethod 213 | def from_config(cls, cfg, input_shape: Dict[str, ShapeSpec]): 214 | in_features = cfg.MODEL.RPN.IN_FEATURES 215 | ret = { 216 | "in_features": in_features, 217 | "min_box_size": cfg.MODEL.PROPOSAL_GENERATOR.MIN_SIZE, 218 | "nms_thresh": cfg.MODEL.RPN.NMS_THRESH, 219 | "batch_size_per_image": cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE, 220 | "positive_fraction": cfg.MODEL.RPN.POSITIVE_FRACTION, 221 | "smooth_l1_beta": cfg.MODEL.RPN.SMOOTH_L1_BETA, 222 | "loss_weight": cfg.MODEL.RPN.LOSS_WEIGHT, 223 | "anchor_boundary_thresh": cfg.MODEL.RPN.BOUNDARY_THRESH, 224 | "box2box_transform": Box2BoxTransform(weights=cfg.MODEL.RPN.BBOX_REG_WEIGHTS), 225 | } 226 | 227 | ret["pre_nms_topk"] = (cfg.MODEL.RPN.PRE_NMS_TOPK_TRAIN, cfg.MODEL.RPN.PRE_NMS_TOPK_TEST) 228 | ret["post_nms_topk"] = (cfg.MODEL.RPN.POST_NMS_TOPK_TRAIN, cfg.MODEL.RPN.POST_NMS_TOPK_TEST) 229 | 230 | ret["anchor_generator"] = build_anchor_generator(cfg, [input_shape[f] for f in in_features]) 231 | ret["anchor_matcher"] = Matcher( 232 | cfg.MODEL.RPN.IOU_THRESHOLDS, cfg.MODEL.RPN.IOU_LABELS, allow_low_quality_matches=True 233 | ) 234 | ret["head"] = build_rpn_head(cfg, [input_shape[f] for f in in_features]) 235 | return ret 236 | 237 | def _subsample_labels(self, label): 238 | """ 239 | Randomly sample a 
subset of positive and negative examples, and overwrite 240 | the label vector to the ignore value (-1) for all elements that are not 241 | included in the sample. 242 | 243 | Args: 244 | labels (Tensor): a vector of -1, 0, 1. Will be modified in-place and returned. 245 | """ 246 | pos_idx, neg_idx = subsample_labels( 247 | label, self.batch_size_per_image, self.positive_fraction, 0 248 | ) 249 | # Fill with the ignore label (-1), then set positive and negative labels 250 | label.fill_(-1) 251 | label.scatter_(0, pos_idx, 1) 252 | label.scatter_(0, neg_idx, 0) 253 | return label 254 | 255 | @torch.no_grad() 256 | def label_and_sample_anchors(self, anchors: List[Boxes], gt_instances: List[Instances]): 257 | """ 258 | Args: 259 | anchors (list[Boxes]): anchors for each feature map. 260 | gt_instances: the ground-truth instances for each image. 261 | 262 | Returns: 263 | list[Tensor]: 264 | List of #img tensors. i-th element is a vector of labels whose length is 265 | the total number of anchors across all feature maps R = sum(Hi * Wi * A). 266 | Label values are in {-1, 0, 1}, with meanings: -1 = ignore; 0 = negative 267 | class; 1 = positive class. 268 | list[Tensor]: 269 | i-th element is a Rx4 tensor. The values are the matched gt boxes for each 270 | anchor. Values are undefined for those anchors not labeled as 1. 271 | """ 272 | anchors = Boxes.cat(anchors) 273 | 274 | gt_boxes = [x.gt_boxes for x in gt_instances] 275 | image_sizes = [x.image_size for x in gt_instances] 276 | del gt_instances 277 | 278 | gt_labels = [] 279 | matched_gt_boxes = [] 280 | for image_size_i, gt_boxes_i in zip(image_sizes, gt_boxes): 281 | """ 282 | image_size_i: (h, w) for the i-th image 283 | gt_boxes_i: ground-truth boxes for i-th image 284 | """ 285 | 286 | match_quality_matrix = retry_if_cuda_oom(pairwise_iou)(gt_boxes_i, anchors) 287 | matched_idxs, gt_labels_i = retry_if_cuda_oom(self.anchor_matcher)(match_quality_matrix) 288 | # Matching is memory-expensive and may result in CPU tensors. But the result is small 289 | gt_labels_i = gt_labels_i.to(device=gt_boxes_i.device) 290 | del match_quality_matrix 291 | 292 | if self.anchor_boundary_thresh >= 0: 293 | # Discard anchors that go out of the boundaries of the image 294 | # NOTE: This is legacy functionality that is turned off by default in Detectron2 295 | anchors_inside_image = anchors.inside_box(image_size_i, self.anchor_boundary_thresh) 296 | gt_labels_i[~anchors_inside_image] = -1 297 | 298 | # A vector of labels (-1, 0, 1) for each anchor 299 | gt_labels_i = self._subsample_labels(gt_labels_i) 300 | 301 | if len(gt_boxes_i) == 0: 302 | # These values won't be used anyway since the anchor is labeled as background 303 | matched_gt_boxes_i = torch.zeros_like(anchors.tensor) 304 | else: 305 | # TODO wasted indexing computation for ignored boxes 306 | matched_gt_boxes_i = gt_boxes_i[matched_idxs].tensor 307 | 308 | gt_labels.append(gt_labels_i) # N,AHW 309 | matched_gt_boxes.append(matched_gt_boxes_i) 310 | return gt_labels, matched_gt_boxes 311 | 312 | def losses( 313 | self, 314 | anchors, 315 | pred_objectness_logits: List[torch.Tensor], 316 | gt_labels: List[torch.Tensor], 317 | pred_anchor_deltas: List[torch.Tensor], 318 | gt_boxes, 319 | ): 320 | """ 321 | Return the losses from a set of RPN predictions and their associated ground-truth. 322 | 323 | Args: 324 | anchors (list[Boxes or RotatedBoxes]): anchors for each feature map, each 325 | has shape (Hi*Wi*A, B), where B is box dimension (4 or 5). 
326 | pred_objectness_logits (list[Tensor]): A list of L elements. 327 | Element i is a tensor of shape (N, Hi*Wi*A) representing 328 | the predicted objectness logits for all anchors. 329 | gt_labels (list[Tensor]): Output of :meth:`label_and_sample_anchors`. 330 | pred_anchor_deltas (list[Tensor]): A list of L elements. Element i is a tensor of shape 331 | (N, Hi*Wi*A, 4 or 5) representing the predicted "deltas" used to transform anchors 332 | to proposals. 333 | gt_boxes (list[Boxes or RotatedBoxes]): Output of :meth:`label_and_sample_anchors`. 334 | 335 | Returns: 336 | dict[loss name -> loss value]: A dict mapping from loss name to loss value. 337 | Loss names are: `loss_rpn_cls` for objectness classification and 338 | `loss_rpn_loc` for proposal localization. 339 | """ 340 | num_images = len(gt_labels) 341 | gt_labels = torch.stack(gt_labels) # (N, sum(Hi*Wi*Ai)) 342 | anchors = type(anchors[0]).cat(anchors).tensor # Ax(4 or 5) 343 | #print(anchors.shape, gt_boxes[0].shape, len(gt_boxes)) 344 | gt_anchor_deltas = [self.box2box_transform.get_deltas(anchors, k) for k in gt_boxes] 345 | gt_anchor_deltas = torch.stack(gt_anchor_deltas) # (N, sum(Hi*Wi*Ai), 4 or 5) 346 | 347 | # Log the number of positive/negative anchors per-image that's used in training 348 | pos_mask = gt_labels == 1 349 | num_pos_anchors = pos_mask.sum().item() 350 | num_neg_anchors = (gt_labels == 0).sum().item() 351 | storage = get_event_storage() 352 | storage.put_scalar("rpn/num_pos_anchors", num_pos_anchors / num_images) 353 | storage.put_scalar("rpn/num_neg_anchors", num_neg_anchors / num_images) 354 | 355 | localization_loss = smooth_l1_loss( 356 | cat(pred_anchor_deltas, dim=1)[pos_mask], 357 | gt_anchor_deltas[pos_mask], 358 | self.smooth_l1_beta, 359 | reduction="sum", 360 | ) 361 | valid_mask = gt_labels >= 0 362 | objectness_loss = F.binary_cross_entropy_with_logits( 363 | cat(pred_objectness_logits, dim=1)[valid_mask], 364 | gt_labels[valid_mask].to(torch.float32), 365 | reduction="sum", 366 | ) 367 | normalizer = self.batch_size_per_image * num_images 368 | return { 369 | "loss_rpn_cls": objectness_loss / normalizer, 370 | "loss_rpn_loc": localization_loss / normalizer, 371 | } 372 | 373 | def forward( 374 | self, 375 | images: ImageList, 376 | features: Dict[str, torch.Tensor], 377 | gt_instances: Optional[Instances] = None, 378 | ): 379 | """ 380 | Args: 381 | images (ImageList): input images of length `N` 382 | features (dict[str, Tensor]): input data as a mapping from feature 383 | map name to tensor. Axis 0 represents the number of images `N` in 384 | the input data; axes 1-3 are channels, height, and width, which may 385 | vary between feature maps (e.g., if a feature pyramid is used). 386 | gt_instances (list[Instances], optional): a length `N` list of `Instances`s. 387 | Each `Instances` stores ground-truth instances for the corresponding image. 
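            Note that, unlike the standard detectron2 RPN, during training this method does not compute the RPN losses itself: it returns the decoded proposals together with the anchors, predicted objectness logits, sampled ground-truth labels, predicted anchor deltas and matched ground-truth boxes, and :class:`FsodRCNN` merges the positive- and negative-support outputs before calling :meth:`losses`.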
388 | 389 | Returns: 390 | proposals: list[Instances]: contains fields "proposal_boxes", "objectness_logits" 391 | loss: dict[Tensor] or None 392 | """ 393 | features = [features[f] for f in self.in_features] 394 | anchors = self.anchor_generator(features) 395 | 396 | pred_objectness_logits, pred_anchor_deltas = self.rpn_head(features) 397 | # Transpose the Hi*Wi*A dimension to the middle: 398 | pred_objectness_logits = [ 399 | # (N, A, Hi, Wi) -> (N, Hi, Wi, A) -> (N, Hi*Wi*A) 400 | score.permute(0, 2, 3, 1).flatten(1) 401 | for score in pred_objectness_logits 402 | ] 403 | pred_anchor_deltas = [ 404 | # (N, A*B, Hi, Wi) -> (N, A, B, Hi, Wi) -> (N, Hi, Wi, A, B) -> (N, Hi*Wi*A, B) 405 | x.view(x.shape[0], -1, self.anchor_generator.box_dim, x.shape[-2], x.shape[-1]) 406 | .permute(0, 3, 4, 1, 2) 407 | .flatten(1, -2) 408 | for x in pred_anchor_deltas 409 | ] 410 | 411 | if self.training: 412 | gt_labels, gt_boxes = self.label_and_sample_anchors(anchors, gt_instances) 413 | #losses = self.losses( 414 | # anchors, pred_objectness_logits, gt_labels, pred_anchor_deltas, gt_boxes 415 | #) 416 | #losses = {k: v * self.loss_weight for k, v in losses.items()} 417 | proposals = self.predict_proposals( 418 | anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes 419 | ) 420 | 421 | return proposals, anchors, pred_objectness_logits, gt_labels, pred_anchor_deltas, gt_boxes #, losses 422 | else: 423 | losses = {} 424 | proposals = self.predict_proposals( 425 | anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes 426 | ) 427 | return proposals, losses 428 | 429 | @torch.no_grad() 430 | def predict_proposals( 431 | self, 432 | anchors, 433 | pred_objectness_logits: List[torch.Tensor], 434 | pred_anchor_deltas: List[torch.Tensor], 435 | image_sizes: List[Tuple[int, int]], 436 | ): 437 | """ 438 | Decode all the predicted box regression deltas to proposals. Find the top proposals 439 | by applying NMS and removing boxes that are too small. 440 | 441 | Returns: 442 | proposals (list[Instances]): list of N Instances. The i-th Instances 443 | stores post_nms_topk object proposals for image i, sorted by their 444 | objectness score in descending order. 445 | """ 446 | # The proposals are treated as fixed for approximate joint training with roi heads. 447 | # This approach ignores the derivative w.r.t. the proposal boxes’ coordinates that 448 | # are also network responses, so is approximate. 449 | pred_proposals = self._decode_proposals(anchors, pred_anchor_deltas) 450 | return find_top_rpn_proposals( 451 | pred_proposals, 452 | pred_objectness_logits, 453 | image_sizes, 454 | self.nms_thresh, 455 | self.pre_nms_topk[self.training], 456 | self.post_nms_topk[self.training], 457 | self.min_box_size, 458 | self.training, 459 | ) 460 | 461 | def _decode_proposals(self, anchors, pred_anchor_deltas: List[torch.Tensor]): 462 | """ 463 | Transform anchors into proposals by applying the predicted anchor deltas. 464 | 465 | Returns: 466 | proposals (list[Tensor]): A list of L tensors. 
Tensor i has shape 467 | (N, Hi*Wi*A, B) 468 | """ 469 | N = pred_anchor_deltas[0].shape[0] 470 | proposals = [] 471 | # For each feature map 472 | for anchors_i, pred_anchor_deltas_i in zip(anchors, pred_anchor_deltas): 473 | B = anchors_i.tensor.size(1) 474 | pred_anchor_deltas_i = pred_anchor_deltas_i.reshape(-1, B) 475 | # Expand anchors to shape (N*Hi*Wi*A, B) 476 | anchors_i = anchors_i.tensor.unsqueeze(0).expand(N, -1, -1).reshape(-1, B) 477 | proposals_i = self.box2box_transform.apply_deltas(pred_anchor_deltas_i, anchors_i) 478 | # Append feature map proposals with shape (N, Hi*Wi*A, B) 479 | proposals.append(proposals_i.view(N, -1, B)) 480 | return proposals 481 | -------------------------------------------------------------------------------- /fewx/solver/__init__.py: -------------------------------------------------------------------------------- 1 | from .build import build_lr_scheduler, build_optimizer 2 | 3 | __all__ = [k for k in globals().keys() if not k.startswith("_")] 4 | -------------------------------------------------------------------------------- /fewx/solver/build.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | from enum import Enum 3 | from typing import Any, Callable, Dict, Iterable, List, Set, Type, Union 4 | import torch 5 | 6 | from detectron2.config import CfgNode 7 | 8 | from detectron2.solver.lr_scheduler import WarmupCosineLR, WarmupMultiStepLR 9 | 10 | _GradientClipperInput = Union[torch.Tensor, Iterable[torch.Tensor]] 11 | _GradientClipper = Callable[[_GradientClipperInput], None] 12 | 13 | 14 | class GradientClipType(Enum): 15 | VALUE = "value" 16 | NORM = "norm" 17 | 18 | 19 | def _create_gradient_clipper(cfg: CfgNode) -> _GradientClipper: 20 | """ 21 | Creates gradient clipping closure to clip by value or by norm, 22 | according to the provided config. 
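    The function receives the ``SOLVER.CLIP_GRADIENTS`` sub-config. A minimal sketch of the fields it reads (the values below are illustrative, not defaults):

        CLIP_TYPE: "norm"    # or "value"
        CLIP_VALUE: 1.0      # max norm (for "norm") or max absolute value (for "value")
        NORM_TYPE: 2.0       # p-norm degree, used only when CLIP_TYPE is "norm"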
23 | """ 24 | cfg = cfg.clone() 25 | 26 | def clip_grad_norm(p: _GradientClipperInput): 27 | torch.nn.utils.clip_grad_norm_(p, cfg.CLIP_VALUE, cfg.NORM_TYPE) 28 | 29 | def clip_grad_value(p: _GradientClipperInput): 30 | torch.nn.utils.clip_grad_value_(p, cfg.CLIP_VALUE) 31 | 32 | _GRADIENT_CLIP_TYPE_TO_CLIPPER = { 33 | GradientClipType.VALUE: clip_grad_value, 34 | GradientClipType.NORM: clip_grad_norm, 35 | } 36 | return _GRADIENT_CLIP_TYPE_TO_CLIPPER[GradientClipType(cfg.CLIP_TYPE)] 37 | 38 | 39 | def _generate_optimizer_class_with_gradient_clipping( 40 | optimizer_type: Type[torch.optim.Optimizer], gradient_clipper: _GradientClipper 41 | ) -> Type[torch.optim.Optimizer]: 42 | """ 43 | Dynamically creates a new type that inherits the type of a given instance 44 | and overrides the `step` method to add gradient clipping 45 | """ 46 | 47 | def optimizer_wgc_step(self, closure=None): 48 | for group in self.param_groups: 49 | for p in group["params"]: 50 | gradient_clipper(p) 51 | super(type(self), self).step(closure) 52 | 53 | OptimizerWithGradientClip = type( 54 | optimizer_type.__name__ + "WithGradientClip", 55 | (optimizer_type,), 56 | {"step": optimizer_wgc_step}, 57 | ) 58 | return OptimizerWithGradientClip 59 | 60 | 61 | def maybe_add_gradient_clipping( 62 | cfg: CfgNode, optimizer: torch.optim.Optimizer 63 | ) -> torch.optim.Optimizer: 64 | """ 65 | If gradient clipping is enabled through config options, wraps the existing 66 | optimizer instance of some type OptimizerType to become an instance 67 | of the new dynamically created class OptimizerTypeWithGradientClip 68 | that inherits OptimizerType and overrides the `step` method to 69 | include gradient clipping. 70 | 71 | Args: 72 | cfg: CfgNode 73 | configuration options 74 | optimizer: torch.optim.Optimizer 75 | existing optimizer instance 76 | 77 | Return: 78 | optimizer: torch.optim.Optimizer 79 | either the unmodified optimizer instance (if gradient clipping is 80 | disabled), or the same instance with adjusted __class__ to override 81 | the `step` method and include gradient clipping 82 | """ 83 | if not cfg.SOLVER.CLIP_GRADIENTS.ENABLED: 84 | return optimizer 85 | grad_clipper = _create_gradient_clipper(cfg.SOLVER.CLIP_GRADIENTS) 86 | OptimizerWithGradientClip = _generate_optimizer_class_with_gradient_clipping( 87 | type(optimizer), grad_clipper 88 | ) 89 | optimizer.__class__ = OptimizerWithGradientClip 90 | return optimizer 91 | 92 | 93 | def build_optimizer(cfg: CfgNode, model: torch.nn.Module) -> torch.optim.Optimizer: 94 | """ 95 | Build an optimizer from config. 
96 | """ 97 | norm_module_types = ( 98 | torch.nn.BatchNorm1d, 99 | torch.nn.BatchNorm2d, 100 | torch.nn.BatchNorm3d, 101 | torch.nn.SyncBatchNorm, 102 | # NaiveSyncBatchNorm inherits from BatchNorm2d 103 | torch.nn.GroupNorm, 104 | torch.nn.InstanceNorm1d, 105 | torch.nn.InstanceNorm2d, 106 | torch.nn.InstanceNorm3d, 107 | torch.nn.LayerNorm, 108 | torch.nn.LocalResponseNorm, 109 | ) 110 | params: List[Dict[str, Any]] = [] 111 | memo: Set[torch.nn.parameter.Parameter] = set() 112 | for module in model.modules(): 113 | for key, value in module.named_parameters(): #recurse=False): 114 | if not value.requires_grad: 115 | continue 116 | # Avoid duplicating parameters 117 | if value in memo: 118 | continue 119 | memo.add(value) 120 | lr = cfg.SOLVER.BASE_LR 121 | weight_decay = cfg.SOLVER.WEIGHT_DECAY 122 | if isinstance(module, norm_module_types): 123 | weight_decay = cfg.SOLVER.WEIGHT_DECAY_NORM 124 | elif "bias" in key: #key == "bias": 125 | # NOTE: unlike Detectron v1, we now default BIAS_LR_FACTOR to 1.0 126 | # and WEIGHT_DECAY_BIAS to WEIGHT_DECAY so that bias optimizer 127 | # hyperparameters are by default exactly the same as for regular 128 | # weights. 129 | lr = cfg.SOLVER.BASE_LR * cfg.SOLVER.BIAS_LR_FACTOR 130 | weight_decay = cfg.SOLVER.WEIGHT_DECAY_BIAS 131 | 132 | if 'box_predictor' in key: 133 | lr = cfg.SOLVER.BASE_LR * cfg.SOLVER.HEAD_LR_FACTOR 134 | params += [{"params": [value], "lr": lr, "weight_decay": weight_decay}] 135 | optimizer = torch.optim.SGD( 136 | params, cfg.SOLVER.BASE_LR, momentum=cfg.SOLVER.MOMENTUM, nesterov=cfg.SOLVER.NESTEROV 137 | ) 138 | optimizer = maybe_add_gradient_clipping(cfg, optimizer) 139 | return optimizer 140 | 141 | 142 | def build_lr_scheduler( 143 | cfg: CfgNode, optimizer: torch.optim.Optimizer 144 | ) -> torch.optim.lr_scheduler._LRScheduler: 145 | """ 146 | Build a LR scheduler from config. 
147 | """ 148 | name = cfg.SOLVER.LR_SCHEDULER_NAME 149 | if name == "WarmupMultiStepLR": 150 | return WarmupMultiStepLR( 151 | optimizer, 152 | cfg.SOLVER.STEPS, 153 | cfg.SOLVER.GAMMA, 154 | warmup_factor=cfg.SOLVER.WARMUP_FACTOR, 155 | warmup_iters=cfg.SOLVER.WARMUP_ITERS, 156 | warmup_method=cfg.SOLVER.WARMUP_METHOD, 157 | ) 158 | elif name == "WarmupCosineLR": 159 | return WarmupCosineLR( 160 | optimizer, 161 | cfg.SOLVER.MAX_ITER, 162 | warmup_factor=cfg.SOLVER.WARMUP_FACTOR, 163 | warmup_iters=cfg.SOLVER.WARMUP_ITERS, 164 | warmup_method=cfg.SOLVER.WARMUP_METHOD, 165 | ) 166 | else: 167 | raise ValueError("Unknown LR scheduler: {}".format(name)) 168 | -------------------------------------------------------------------------------- /fewx/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fanq15/FewX/6347165aa24ba20d03ef06321f82bd818e3a8021/fewx/utils/__init__.py -------------------------------------------------------------------------------- /fewx/utils/comm.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import torch.distributed as dist 4 | 5 | from detectron2.utils.comm import get_world_size 6 | 7 | 8 | def reduce_sum(tensor): 9 | world_size = get_world_size() 10 | if world_size < 2: 11 | return tensor 12 | tensor = tensor.clone() 13 | dist.all_reduce(tensor, op=dist.ReduceOp.SUM) 14 | return tensor 15 | 16 | 17 | def aligned_bilinear(tensor, factor): 18 | assert tensor.dim() == 4 19 | assert factor >= 1 20 | assert int(factor) == factor 21 | 22 | if factor == 1: 23 | return tensor 24 | 25 | h, w = tensor.size()[2:] 26 | tensor = F.pad(tensor, pad=(0, 1, 0, 1), mode="replicate") 27 | oh = factor * h + 1 28 | ow = factor * w + 1 29 | tensor = F.interpolate( 30 | tensor, size=(oh, ow), 31 | mode='bilinear', 32 | align_corners=True 33 | ) 34 | tensor = F.pad( 35 | tensor, pad=(factor // 2, 0, factor // 2, 0), 36 | mode="replicate" 37 | ) 38 | 39 | return tensor[:, :, :oh - 1, :ow - 1] 40 | 41 | 42 | def compute_locations(h, w, stride, device): 43 | shifts_x = torch.arange( 44 | 0, w * stride, step=stride, 45 | dtype=torch.float32, device=device 46 | ) 47 | shifts_y = torch.arange( 48 | 0, h * stride, step=stride, 49 | dtype=torch.float32, device=device 50 | ) 51 | shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x) 52 | shift_x = shift_x.reshape(-1) 53 | shift_y = shift_y.reshape(-1) 54 | locations = torch.stack((shift_x, shift_y), dim=1) + stride // 2 55 | return locations 56 | -------------------------------------------------------------------------------- /fewx/utils/measures.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | # Adapted from https://github.com/ShichenLiu/CondenseNet/blob/master/utils.py 3 | from __future__ import absolute_import 4 | from __future__ import unicode_literals 5 | from __future__ import print_function 6 | from __future__ import division 7 | 8 | import operator 9 | 10 | from functools import reduce 11 | 12 | 13 | def get_num_gen(gen): 14 | return sum(1 for x in gen) 15 | 16 | 17 | def is_pruned(layer): 18 | try: 19 | layer.mask 20 | return True 21 | except AttributeError: 22 | return False 23 | 24 | 25 | def is_leaf(model): 26 | return get_num_gen(model.children()) == 0 27 | 28 | 29 | def get_layer_info(layer): 30 | layer_str = str(layer) 31 | type_name = layer_str[:layer_str.find('(')].strip() 32 | return type_name 33 | 
34 | 35 | def get_layer_param(model): 36 | return sum([reduce(operator.mul, i.size(), 1) for i in model.parameters()]) 37 | 38 | 39 | ### The input batch size should be 1 to call this function 40 | def measure_layer(layer, *args): 41 | global count_ops, count_params 42 | 43 | for x in args: 44 | delta_ops = 0 45 | delta_params = 0 46 | multi_add = 1 47 | type_name = get_layer_info(layer) 48 | 49 | ### ops_conv 50 | if type_name in ['Conv2d']: 51 | out_h = int((x.size()[2] + 2 * layer.padding[0] / layer.dilation[0] - layer.kernel_size[0]) / 52 | layer.stride[0] + 1) 53 | out_w = int((x.size()[3] + 2 * layer.padding[1] / layer.dilation[1] - layer.kernel_size[1]) / 54 | layer.stride[1] + 1) 55 | delta_ops = layer.in_channels * layer.out_channels * layer.kernel_size[0] * layer.kernel_size[1] * out_h * out_w / layer.groups * multi_add 56 | delta_params = get_layer_param(layer) 57 | 58 | elif type_name in ['ConvTranspose2d']: 59 | _, _, in_h, in_w = x.size() 60 | out_h = int((in_h-1)*layer.stride[0] - 2 * layer.padding[0] + layer.kernel_size[0] + layer.output_padding[0]) 61 | out_w = int((in_w-1)*layer.stride[1] - 2 * layer.padding[1] + layer.kernel_size[1] + layer.output_padding[1]) 62 | delta_ops = layer.in_channels * layer.out_channels * layer.kernel_size[0] * \ 63 | layer.kernel_size[1] * out_h * out_w / layer.groups * multi_add 64 | delta_params = get_layer_param(layer) 65 | 66 | ### ops_learned_conv 67 | elif type_name in ['LearnedGroupConv']: 68 | measure_layer(layer.relu, x) 69 | measure_layer(layer.norm, x) 70 | conv = layer.conv 71 | out_h = int((x.size()[2] + 2 * conv.padding[0] - conv.kernel_size[0]) / 72 | conv.stride[0] + 1) 73 | out_w = int((x.size()[3] + 2 * conv.padding[1] - conv.kernel_size[1]) / 74 | conv.stride[1] + 1) 75 | delta_ops = conv.in_channels * conv.out_channels * conv.kernel_size[0] * conv.kernel_size[1] * out_h * out_w / layer.condense_factor * multi_add 76 | delta_params = get_layer_param(conv) / layer.condense_factor 77 | 78 | ### ops_nonlinearity 79 | elif type_name in ['ReLU', 'ReLU6']: 80 | delta_ops = x.numel() 81 | delta_params = get_layer_param(layer) 82 | 83 | ### ops_pooling 84 | elif type_name in ['AvgPool2d', 'MaxPool2d']: 85 | in_w = x.size()[2] 86 | kernel_ops = layer.kernel_size * layer.kernel_size 87 | out_w = int((in_w + 2 * layer.padding - layer.kernel_size) / layer.stride + 1) 88 | out_h = int((in_w + 2 * layer.padding - layer.kernel_size) / layer.stride + 1) 89 | delta_ops = x.size()[0] * x.size()[1] * out_w * out_h * kernel_ops 90 | delta_params = get_layer_param(layer) 91 | 92 | elif type_name in ['LastLevelMaxPool']: 93 | pass 94 | 95 | elif type_name in ['AdaptiveAvgPool2d']: 96 | delta_ops = x.size()[0] * x.size()[1] * x.size()[2] * x.size()[3] 97 | delta_params = get_layer_param(layer) 98 | 99 | elif type_name in ['ZeroPad2d', 'RetinaNetPostProcessor']: 100 | pass 101 | #delta_ops = x.size()[0] * x.size()[1] * x.size()[2] * x.size()[3] 102 | #delta_params = get_layer_param(layer) 103 | 104 | ### ops_linear 105 | elif type_name in ['Linear']: 106 | weight_ops = layer.weight.numel() * multi_add 107 | bias_ops = layer.bias.numel() 108 | delta_ops = x.size()[0] * (weight_ops + bias_ops) 109 | delta_params = get_layer_param(layer) 110 | 111 | ### ops_nothing 112 | elif type_name in ['BatchNorm2d', 'Dropout2d', 'DropChannel', 'Dropout', 'FrozenBatchNorm2d', 'GroupNorm']: 113 | delta_params = get_layer_param(layer) 114 | 115 | elif type_name in ['SumTwo']: 116 | delta_ops = x.numel() 117 | 118 | elif type_name in ['AggregateCell']: 119 | if not 
layer.pre_transform: 120 | delta_ops = 2 * x.numel() # twice for each input 121 | else: 122 | measure_layer(layer.branch_1, x) 123 | measure_layer(layer.branch_2, x) 124 | delta_params = get_layer_param(layer) 125 | 126 | elif type_name in ['Identity', 'Zero']: 127 | pass 128 | 129 | elif type_name in ['Scale']: 130 | delta_params = get_layer_param(layer) 131 | delta_ops = x.numel() 132 | 133 | elif type_name in ['FCOSPostProcessor', 'RPNPostProcessor', 'KeypointPostProcessor', 134 | 'ROIAlign', 'PostProcessor', 'KeypointRCNNPredictor', 135 | 'NaiveSyncBatchNorm', 'Upsample', 'Sequential']: 136 | pass 137 | 138 | elif type_name in ['DeformConv']: 139 | # don't count bilinear 140 | offset_conv = list(layer.parameters())[0] 141 | delta_ops = reduce(operator.mul, offset_conv.size(), x.size()[2] * x.size()[3]) 142 | out_h = int((x.size()[2] + 2 * layer.padding[0] / layer.dilation[0] 143 | - layer.kernel_size[0]) / layer.stride[0] + 1) 144 | out_w = int((x.size()[3] + 2 * layer.padding[1] / layer.dilation[1] 145 | - layer.kernel_size[1]) / layer.stride[1] + 1) 146 | delta_ops += layer.in_channels * layer.out_channels * layer.kernel_size[0] * layer.kernel_size[1] * out_h * out_w / layer.groups * multi_add 147 | delta_params = get_layer_param(layer) 148 | 149 | ### unknown layer type 150 | else: 151 | raise TypeError('unknown layer type: %s' % type_name) 152 | 153 | count_ops += delta_ops 154 | count_params += delta_params 155 | return 156 | 157 | 158 | def measure_model(model, x): 159 | global count_ops, count_params 160 | count_ops = 0 161 | count_params = 0 162 | 163 | def should_measure(x): 164 | return is_leaf(x) or is_pruned(x) 165 | 166 | def modify_forward(model): 167 | for child in model.children(): 168 | if should_measure(child): 169 | def new_forward(m): 170 | def lambda_forward(*args): 171 | measure_layer(m, *args) 172 | return m.old_forward(*args) 173 | return lambda_forward 174 | child.old_forward = child.forward 175 | child.forward = new_forward(child) 176 | else: 177 | modify_forward(child) 178 | 179 | def restore_forward(model): 180 | for child in model.children(): 181 | # leaf node 182 | if is_leaf(child) and hasattr(child, 'old_forward'): 183 | child.forward = child.old_forward 184 | child.old_forward = None 185 | else: 186 | restore_forward(child) 187 | 188 | modify_forward(model) 189 | out = model.forward(x) 190 | restore_forward(model) 191 | 192 | return out, count_ops, count_params 193 | -------------------------------------------------------------------------------- /fsod_train_net.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 3 | 4 | """ 5 | FSOD Training Script. 6 | 7 | This script is a simplified version of the training script in detectron2/tools.
8 | """ 9 | 10 | import os 11 | 12 | from detectron2.checkpoint import DetectionCheckpointer 13 | from detectron2.config import get_cfg 14 | from detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, launch 15 | #from detectron2.evaluation import COCOEvaluator 16 | #from detectron2.data import build_detection_train_loader 17 | from detectron2.data import build_batch_data_loader 18 | 19 | from fewx.config import get_cfg 20 | from fewx.data.dataset_mapper import DatasetMapperWithSupport 21 | from fewx.data.build import build_detection_train_loader, build_detection_test_loader 22 | from fewx.solver import build_optimizer 23 | from fewx.evaluation import COCOEvaluator 24 | 25 | import bisect 26 | import copy 27 | import itertools 28 | import logging 29 | import numpy as np 30 | import operator 31 | import pickle 32 | import torch.utils.data 33 | 34 | import detectron2.utils.comm as comm 35 | from detectron2.utils.logger import setup_logger 36 | 37 | class Trainer(DefaultTrainer): 38 | 39 | @classmethod 40 | def build_train_loader(cls, cfg): 41 | """ 42 | Returns: 43 | iterable 44 | It calls :func:`detectron2.data.build_detection_train_loader` with a customized 45 | DatasetMapper, which adds categorical labels as a semantic mask. 46 | """ 47 | mapper = DatasetMapperWithSupport(cfg) 48 | return build_detection_train_loader(cfg, mapper) 49 | 50 | @classmethod 51 | def build_test_loader(cls, cfg, dataset_name): 52 | """ 53 | Returns: 54 | iterable 55 | It now calls :func:`detectron2.data.build_detection_test_loader`. 56 | Overwrite it if you'd like a different data loader. 57 | """ 58 | return build_detection_test_loader(cfg, dataset_name) 59 | 60 | @classmethod 61 | def build_optimizer(cls, cfg, model): 62 | """ 63 | Returns: 64 | torch.optim.Optimizer: 65 | It now calls :func:`detectron2.solver.build_optimizer`. 66 | Overwrite it if you'd like a different optimizer. 67 | """ 68 | return build_optimizer(cfg, model) 69 | 70 | @classmethod 71 | def build_evaluator(cls, cfg, dataset_name, output_folder=None): 72 | if output_folder is None: 73 | output_folder = os.path.join(cfg.OUTPUT_DIR, "inference") 74 | return COCOEvaluator(dataset_name, cfg, True, output_folder) 75 | 76 | 77 | def setup(args): 78 | """ 79 | Create configs and perform basic setups. 80 | """ 81 | cfg = get_cfg() 82 | cfg.merge_from_file(args.config_file) 83 | cfg.merge_from_list(args.opts) 84 | cfg.freeze() 85 | default_setup(cfg, args) 86 | 87 | rank = comm.get_rank() 88 | setup_logger(cfg.OUTPUT_DIR, distributed_rank=rank, name="fewx") 89 | 90 | return cfg 91 | 92 | 93 | def main(args): 94 | cfg = setup(args) 95 | 96 | if args.eval_only: 97 | model = Trainer.build_model(cfg) 98 | DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load( 99 | cfg.MODEL.WEIGHTS, resume=args.resume 100 | ) 101 | res = Trainer.test(cfg, model) 102 | return res 103 | 104 | trainer = Trainer(cfg) 105 | trainer.resume_or_load(resume=args.resume) 106 | return trainer.train() 107 | 108 | 109 | if __name__ == "__main__": 110 | args = default_argument_parser().parse_args() 111 | print("Command Line Args:", args) 112 | launch( 113 | main, 114 | args.num_gpus, 115 | num_machines=args.num_machines, 116 | machine_rank=args.machine_rank, 117 | dist_url=args.dist_url, 118 | args=(args,), 119 | ) 120 | -------------------------------------------------------------------------------- /log/metric.txt: -------------------------------------------------------------------------------- 1 | Running per image evaluation... 
2 | Evaluate annotation type *bbox* 3 | COCOeval_opt.evaluate() finished in 7.92 seconds. 4 | Accumulating evaluation results... 5 | COCOeval_opt.accumulate() finished in 0.96 seconds. 6 | Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.030 7 | Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.056 8 | Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.029 9 | Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.007 10 | Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.031 11 | Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.052 12 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.047 13 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.066 14 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.066 15 | Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.009 16 | Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.059 17 | Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.114 18 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for bbox: 19 | | AP | AP50 | AP75 | APs | APm | APl | 20 | |:-----:|:------:|:------:|:-----:|:-----:|:-----:| 21 | | 2.989 | 5.592 | 2.948 | 0.724 | 3.057 | 5.165 | 22 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> AP : 11.95 23 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> AP50: 22.37 24 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> AP75: 11.79 25 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> APs : 2.89 26 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> APm : 12.23 27 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> APl : 20.66 28 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> AP : 0.00 29 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> AP50: 0.00 30 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> AP75: 0.00 31 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> APs : 0.00 32 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> APm : 0.00 33 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> APl : 0.00 34 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Per-category bbox AP: 35 | | category | AP | category | AP | category | AP | 36 | |:--------------|:-------|:-------------|:-------|:---------------|:-------| 37 | | person | 10.446 | bicycle | 5.604 | car | 11.692 | 38 | | motorcycle | 10.597 | airplane | 23.452 | bus | 20.706 | 39 | | train | 16.856 | truck | 0.000 | boat | 1.730 | 40 | | traffic light | 0.000 | fire hydrant | 0.000 | stop sign | 0.000 | 41 | | parking meter | 0.000 | bench | 0.000 | bird | 8.258 | 42 | | cat | 25.047 | dog | 17.904 | horse | 11.009 | 43 | | sheep | 6.285 | cow | 9.031 | elephant | 0.000 | 44 | | bear | 0.000 | zebra | 0.000 | giraffe | 0.000 | 45 | | backpack | 0.000 | umbrella | 0.000 | handbag | 0.000 | 46 
| | tie | 0.000 | suitcase | 0.000 | frisbee | 0.000 | 47 | | skis | 0.000 | snowboard | 0.000 | sports ball | 0.000 | 48 | | kite | 0.000 | baseball bat | 0.000 | baseball glove | 0.000 | 49 | | skateboard | 0.000 | surfboard | 0.000 | tennis racket | 0.000 | 50 | | bottle | 7.444 | wine glass | 0.000 | cup | 0.000 | 51 | | fork | 0.000 | knife | 0.000 | spoon | 0.000 | 52 | | bowl | 0.000 | banana | 0.000 | apple | 0.000 | 53 | | sandwich | 0.000 | orange | 0.000 | broccoli | 0.000 | 54 | | carrot | 0.000 | hot dog | 0.000 | pizza | 0.000 | 55 | | donut | 0.000 | cake | 0.000 | chair | 2.958 | 56 | | couch | 11.579 | potted plant | 4.004 | bed | 0.000 | 57 | | dining table | 8.955 | toilet | 0.000 | tv | 25.533 | 58 | | laptop | 0.000 | mouse | 0.000 | remote | 0.000 | 59 | | keyboard | 0.000 | cell phone | 0.000 | microwave | 0.000 | 60 | | oven | 0.000 | toaster | 0.000 | sink | 0.000 | 61 | | refrigerator | 0.000 | book | 0.000 | clock | 0.000 | 62 | | vase | 0.000 | scissors | 0.000 | teddy bear | 0.000 | 63 | | hair drier | 0.000 | toothbrush | 0.000 | | | 64 | [08/08 18:10:48 d2.engine.defaults]: Evaluation results for coco_2017_val in csv format: 65 | [08/08 18:10:48 d2.evaluation.testing]: copypaste: Task: bbox 66 | [08/08 18:10:48 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl 67 | [08/08 18:10:48 d2.evaluation.testing]: copypaste: 2.9886,5.5922,2.9480,0.7236,3.0575,5.1649 68 | --------------------------------------------------------------------------------
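The `log/metric.txt` report above is the kind of output an evaluation-only run of `fsod_train_net.py` produces. Because the script is driven by detectron2's `default_argument_parser`, a command along the lines of `python3 fsod_train_net.py --num-gpus 4 --config-file <your FSOD config> --eval-only MODEL.WEIGHTS <path/to/checkpoint.pth>` (GPU count, config path, and weight path are placeholders, not values taken from this repository) should yield a similar summary, with the per-category table and the VOC / non-VOC breakdown produced by `fewx.evaluation.COCOEvaluator`.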
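`fewx/utils/measures.py` (shown above) estimates op and parameter counts by temporarily monkey-patching the `forward()` of every leaf module, letting `measure_layer()` accumulate into the module-level `count_ops` / `count_params` globals, and then restoring the original `forward()` methods. The repository itself does not show a caller for it, so the snippet below is only a minimal usage sketch: it assumes the module is importable as `fewx.utils.measures`, and the toy model and input shape are illustrative rather than taken from the codebase.

```
# Minimal usage sketch (assumed import path; toy model and input are illustrative).
import torch
import torch.nn as nn

from fewx.utils.measures import measure_model  # assumed location of measures.py

# Every leaf module here is a type that measure_layer() handles
# (Conv2d, BatchNorm2d, ReLU, AvgPool2d), so no TypeError is raised.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2, padding=0),
)
model.eval()

# measure_layer() assumes the input batch size is 1.
dummy = torch.randn(1, 3, 224, 224)

out, ops, params = measure_model(model, dummy)
print('estimated ops: %.2fM, params: %.2fM' % (ops / 1e6, params / 1e6))
```

Note that `measure_layer()` raises a `TypeError` for any module type it does not recognize, so measuring the full FSOD network would likely require extending the handled type lists.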