├── .gitignore
├── LICENSE
├── README.md
├── all.sh
├── configs
│   └── fsod
│       ├── Base-FSOD-C4.yaml
│       ├── R_50_C4_1x.yaml
│       └── finetune_R_50_C4_1x.yaml
├── datasets
│   ├── coco
│   │   ├── 1_split_filter.py
│   │   ├── 2_balance.py
│   │   ├── 3_gen_support_pool.py
│   │   ├── 4_gen_support_pool_10_shot.py
│   │   ├── 5_voc_part.py
│   │   ├── 6_voc_few_shot.py
│   │   └── new_annotations
│   │       └── final_split_voc_10_shot_instances_train2017.json
│   └── generate_support_data.sh
├── fewx
│   ├── __init__.py
│   ├── config
│   │   ├── __init__.py
│   │   ├── config.py
│   │   └── defaults.py
│   ├── data
│   │   ├── __init__.py
│   │   ├── build.py
│   │   ├── dataset_mapper.py
│   │   └── datasets
│   │       ├── __init__.py
│   │       ├── builtin.py
│   │       └── register_coco.py
│   ├── evaluation
│   │   ├── __init__.py
│   │   └── coco_evaluation.py
│   ├── layers
│   │   ├── __init__.py
│   │   ├── boundary.py
│   │   ├── conv_with_kaiming_uniform.py
│   │   ├── deform_conv.py
│   │   ├── iou_loss.py
│   │   ├── misc.py
│   │   ├── ml_nms.py
│   │   └── naive_group_norm.py
│   ├── modeling
│   │   ├── __init__.py
│   │   └── fsod
│   │       ├── __init__.py
│   │       ├── fsod_fast_rcnn.py
│   │       ├── fsod_rcnn.py
│   │       ├── fsod_roi_heads.py
│   │       └── fsod_rpn.py
│   ├── solver
│   │   ├── __init__.py
│   │   └── build.py
│   └── utils
│       ├── __init__.py
│       ├── comm.py
│       └── measures.py
├── fsod_train_net.py
└── log
    ├── fsod_finetune_test_log.txt
    ├── fsod_finetune_train_log.txt
    ├── fsod_train_log.txt
    └── metric.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | # output dir
2 | output
3 | instant_test_output
4 | inference_test_output
5 |
6 |
7 | *.jpg
8 | *.png
9 | #*.txt
10 | #*.json
11 | *.diff
12 |
13 | # compilation and distribution
14 | __pycache__
15 | _ext
16 | *.pyc
17 | *.so
18 | detectron2.egg-info/
19 | build/
20 | dist/
21 | wheels/
22 |
23 | # pytorch/python/numpy formats
24 | *.pth
25 | *.pkl
26 | *.npy
27 |
28 | # ipython/jupyter notebooks
29 | *.ipynb
30 | **/.ipynb_checkpoints/
31 |
32 | # Editor temporaries
33 | *.swn
34 | *.swo
35 | *.swp
36 | *~
37 |
38 | # editor settings
39 | .idea
40 | .vscode
41 |
42 | # project dirs
43 | /detectron2/model_zoo/configs
44 | #/datasets
45 | /projects/*/datasets
46 | /models
47 | #/datasets/coco/new_annotations
48 | /datasets/coco/support
49 | /datasets/coco/10_shot_support
50 |
51 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2024 Qi Fan
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # FewX
2 |
3 | **FewX** is an open-source toolbox built on top of Detectron2 for data-limited instance-level recognition tasks, such as few-shot object detection, few-shot instance segmentation, and partially supervised instance segmentation.
4 |
5 | All data-limited instance-level recognition works from **Qi Fan** (HKUST, fanqics@gmail.com) are open-sourced here.
6 |
7 | To date, FewX implements the following algorithms:
8 |
9 | - [FSOD](https://arxiv.org/abs/1908.01998): few-shot object detection with [FSOD dataset](https://github.com/fanq15/Few-Shot-Object-Detection-Dataset).
10 | - [CPMask](https://arxiv.org/abs/2007.12387): partially supervised/fully supervised/few-shot instance segmentation. (**20220725 working on it**)
11 | - [FSVOD](https://arxiv.org/abs/2104.14805): few-shot video object detection with [FSVOD-500 dataset](https://drive.google.com/drive/folders/1DDQ81A8yVj7D8vLUS01657ATr2sK1zgC?usp=sharing) and [FSYTV-40 dataset](https://drive.google.com/drive/folders/1a1PpfAxeYL7AbxYViDDnx7ACFtRohVL5?usp=sharing). (**20220725 working on it**)
12 |
13 | ## Highlights
14 | - **State-of-the-art performance.**
15 |   - FSOD is the best few-shot object detection model. (It can be applied directly to novel classes without finetuning, and finetuning brings further gains.)
16 |   - CPMask is the best partially supervised/few-shot instance segmentation model.
17 | - **Easy to use.** The entire experiment takes only three commands.
18 |   - Install Pre-Built Detectron2 with a single command.
19 |   - Prepare the dataset with a single command. (You need to download the dataset first and change the **data path** in the script.)
20 |   - Run training and evaluation with a single command.
21 |
22 | ## Updates
23 | - FewX has been released. (09/08/2020)
24 |
25 | ## Results on MS COCO
26 |
27 | ### Few Shot Object Detection
28 |
29 | |Method|Training Dataset|Evaluation way&shot|box AP|download|
30 | |:--------:|:--------:|:--------:|:--------:|:--:|
31 | |FSOD (paper)|COCO (non-voc)|full-way 10-shot|11.1|-|
32 | |FSOD (this implementation)|COCO (non-voc)|full-way 10-shot|**12.0**|model \| metrics|
33 |
34 | The results are reported on the COCO voc subset with a **ResNet-50** backbone.
35 |
36 | The model trained only on the base classes is the base model.
37 |
38 | You can refer to the [original FSOD implementation](https://github.com/fanq15/FSOD-code) on the [Few-Shot-Object-Detection-Dataset](https://github.com/fanq15/Few-Shot-Object-Detection-Dataset).
39 |
40 | ## Step 1: Installation
41 | You only need to install [detectron2](https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md). We recommend the Pre-Built Detectron2 (Linux only) with PyTorch 1.7. This code was developed with the Pre-Built Detectron2 for CUDA 10.1 and PyTorch 1.7, which you can install with the following command.
42 |
43 | ```
44 | python -m pip install detectron2 -f \
45 | https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
46 | ```
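
As an optional sanity check (a minimal sketch, assuming the install above succeeded and you run it in the same Python environment), you can confirm that Detectron2 and the GPU are visible:

```
# optional sanity check: confirm detectron2 and CUDA are visible (minimal sketch)
import torch
import detectron2

print(detectron2.__version__, torch.__version__, torch.cuda.is_available())
```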
47 |
48 | ## Step 2: Prepare dataset
49 | - Prepare the COCO dataset following [these instructions](https://github.com/facebookresearch/detectron2/tree/master/datasets).
50 |
51 | - `cd datasets`, change `DATA_ROOT` in `generate_support_data.sh` to your data path, and run `sh generate_support_data.sh`.
52 |
53 | ```
54 | cd FewX/datasets
55 | sh generate_support_data.sh
56 | ```
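
When the script finishes, the cropped support images and two support-pool DataFrames (`train_support_df.pkl` and `10_shot_support_df.pkl`) are written under `datasets/coco`. A minimal sketch for inspecting them (assuming pandas is installed and you run it from the repo root):

```
# optional: inspect the generated 10-shot support pool (run from the repo root)
import pandas as pd

df = pd.read_pickle("datasets/coco/10_shot_support_df.pkl")
print(df.columns.tolist())  # ['support_box', 'category_id', 'image_id', 'id', 'file_path']
print(len(df), df["category_id"].nunique())  # number of support crops / classes covered
```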
57 |
58 | ## Step 3: Training and Evaluation
59 |
60 | Run `sh all.sh` in the root directory. (This script uses `4 GPUs`; you can change the GPU count. If you use 2 GPUs with the batch size unchanged (8), please [halve the learning rate](https://github.com/fanq15/FewX/issues/6#issuecomment-674367388).)
61 |
62 | ```
63 | cd FewX
64 | sh all.sh
65 | ```
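
For reference, the batch size and learning rate are set in `configs/fsod/Base-FSOD-C4.yaml`. A minimal sketch of the adjustment suggested above, assuming you move from 4 to 2 GPUs and keep `IMS_PER_BATCH` at 8:

```
# learning-rate adjustment sketch (assumes IMS_PER_BATCH stays at 8)
base_lr_4gpu = 0.004             # SOLVER.BASE_LR in configs/fsod/Base-FSOD-C4.yaml
base_lr_2gpu = base_lr_4gpu / 2  # halve it when training on 2 GPUs, per the linked issue
print(base_lr_2gpu)              # 0.002
```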
66 |
67 |
68 | ## TODO
69 | - [ ] Add FSVOD and CPMask codes to this repo.
70 | - [ ] Add other dataset results to FSOD.
71 | - [ ] Add [CPMask](https://arxiv.org/abs/2007.12387) code with partially supervised instance segmentation, fully supervised instance segmentation and few-shot instance segmentation.
72 |
73 | ## Citing FewX
74 | If you use this toolbox in your research or wish to refer to the baseline results, please use the following BibTeX entries.
75 |
76 | ```
77 | @inproceedings{fan2021fsvod,
78 | title={Few-Shot Video Object Detection},
79 | author={Fan, Qi and Tang, Chi-Keung and Tai, Yu-Wing},
80 | booktitle={ECCV},
81 | year={2022}
82 | }
83 | @inproceedings{fan2020cpmask,
84 | title={Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation},
85 | author={Fan, Qi and Ke, Lei and Pei, Wenjie and Tang, Chi-Keung and Tai, Yu-Wing},
86 | booktitle={ECCV},
87 | year={2020}
88 | }
89 | @inproceedings{fan2020fsod,
90 | title={Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector},
91 | author={Fan, Qi and Zhuo, Wei and Tang, Chi-Keung and Tai, Yu-Wing},
92 | booktitle={CVPR},
93 | year={2020}
94 | }
95 | ```
96 |
97 | ## Special Thanks
98 | [Detectron2](https://github.com/facebookresearch/detectron2), [AdelaiDet](https://github.com/aim-uofa/AdelaiDet), [centermask2](https://github.com/youngwanLEE/centermask2)
99 |
--------------------------------------------------------------------------------
/all.sh:
--------------------------------------------------------------------------------
1 | rm support_dir/support_feature.pkl
2 | CUDA_VISIBLE_DEVICES=0,1,2,3 python3 fsod_train_net.py --num-gpus 4 \
3 | --config-file configs/fsod/R_50_C4_1x.yaml 2>&1 | tee log/fsod_train_log.txt
4 |
5 | #CUDA_VISIBLE_DEVICES=0,1,2,3 python3 tools/train_net.py --num-gpus 4 \
6 | # --config-file configs/fsod/R_50_C4_1x.yaml \
7 | # --eval-only MODEL.WEIGHTS ./output/fsod/R_50_C4_1x/model_final.pth 2>&1 | tee log/fsod_test_log.txt
8 |
9 | rm support_dir/support_feature.pkl
10 | CUDA_VISIBLE_DEVICES=0,1,2,3 python3 fsod_train_net.py --num-gpus 4 \
11 | --config-file configs/fsod/finetune_R_50_C4_1x.yaml 2>&1 | tee log/fsod_finetune_train_log.txt
12 | CUDA_VISIBLE_DEVICES=0,1,2,3 python3 fsod_train_net.py --num-gpus 4 \
13 | --config-file configs/fsod/finetune_R_50_C4_1x.yaml \
14 | --eval-only MODEL.WEIGHTS ./output/fsod/finetune_dir/R_50_C4_1x/model_final.pth 2>&1 | tee log/fsod_finetune_test_log.txt
15 |
16 | #CUDA_VISIBLE_DEVICES=0,1,2,3 python3 fsod_train_net.py --num-gpus 4 \
17 | # --config-file configs/fsod/finetune_R_50_C4_1x.yaml \
18 | # --eval-only MODEL.WEIGHTS ./output/fsod/finetune_dir/R_50_C4_1x/model_final.pth 2>&1 | tee log/fsod_finetune_test_log.txt
19 |
20 |
--------------------------------------------------------------------------------
/configs/fsod/Base-FSOD-C4.yaml:
--------------------------------------------------------------------------------
1 | MODEL:
2 | META_ARCHITECTURE: "FsodRCNN"
3 | PROPOSAL_GENERATOR:
4 | NAME: "FsodRPN"
5 | RPN:
6 | PRE_NMS_TOPK_TEST: 6000
7 | POST_NMS_TOPK_TEST: 100
8 | ROI_HEADS:
9 | NAME: "FsodRes5ROIHeads"
10 | BATCH_SIZE_PER_IMAGE: 128
11 | POSITIVE_FRACTION: 0.5
12 | NUM_CLASSES: 1
13 | BACKBONE:
14 | FREEZE_AT: 3
15 | #PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
16 | DATASETS:
17 | TRAIN: ("coco_2017_train_nonvoc",) #("coco_2017_train",)
18 | TEST: ("coco_2017_val",)
19 | DATALOADER:
20 | NUM_WORKERS: 8
21 | SOLVER:
22 | IMS_PER_BATCH: 8 #16
23 | BASE_LR: 0.004 #0.02
24 | STEPS: (112000, 120000) #(30000, 40000) #(56000,) #(60000, 80000)
25 | MAX_ITER: 120000 #300000 #45000 #60000 #90000
26 | WARMUP_ITERS: 1000 #500
27 | WARMUP_FACTOR: 0.1
28 | CHECKPOINT_PERIOD: 30000
29 | HEAD_LR_FACTOR: 2.0
30 | #WEIGHT_DECAY_BIAS: 0.0
31 | INPUT:
32 | FS:
33 | SUPPORT_WAY: 2
34 | SUPPORT_SHOT: 10
35 | MIN_SIZE_TRAIN: (440, 472, 504, 536, 568, 600) #(600,) #(640, 672, 704, 736, 768, 800)
36 | MAX_SIZE_TRAIN: 1000
37 | MIN_SIZE_TEST: 600
38 | MAX_SIZE_TEST: 1000
39 | VERSION: 2
40 |
--------------------------------------------------------------------------------
/configs/fsod/R_50_C4_1x.yaml:
--------------------------------------------------------------------------------
1 | _BASE_: "Base-FSOD-C4.yaml"
2 | MODEL:
3 | WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
4 | MASK_ON: False
5 | RESNETS:
6 | DEPTH: 50
7 | OUTPUT_DIR: './output/fsod/R_50_C4_1x'
8 |
--------------------------------------------------------------------------------
/configs/fsod/finetune_R_50_C4_1x.yaml:
--------------------------------------------------------------------------------
1 | _BASE_: "Base-FSOD-C4.yaml"
2 | MODEL:
3 | WEIGHTS: "./output/fsod/R_50_C4_1x/model_final.pth"
4 | MASK_ON: False
5 | RESNETS:
6 | DEPTH: 50
7 | BACKBONE:
8 | FREEZE_AT: 5
9 | DATASETS:
10 | TRAIN: ("coco_2017_train_voc_10_shot",)
11 | TEST: ("coco_2017_val",)
12 | SOLVER:
13 | IMS_PER_BATCH: 4
14 | BASE_LR: 0.001
15 | STEPS: (2000, 3000)
16 | MAX_ITER: 3000
17 | WARMUP_ITERS: 200
18 | INPUT:
19 | FS:
20 | FEW_SHOT: True
21 | SUPPORT_WAY: 2
22 | SUPPORT_SHOT: 9
23 | MIN_SIZE_TRAIN: (440, 472, 504, 536, 568, 600)
24 | MAX_SIZE_TRAIN: 1000
25 | MIN_SIZE_TEST: 600
26 | MAX_SIZE_TEST: 1000
27 | OUTPUT_DIR: './output/fsod/finetune_dir/R_50_C4_1x'
28 |
29 |
--------------------------------------------------------------------------------
/datasets/coco/1_split_filter.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | Created on Fri Jun 5 15:27:52 2020
5 |
6 | @author: fanq15
7 | """
8 |
9 | from pycocotools.coco import COCO
10 | import cv2
11 | import numpy as np
12 | from os.path import join, isdir
13 | from os import mkdir, makedirs
14 | from concurrent import futures
15 | import sys
16 | import time
17 | import math
18 | import matplotlib.pyplot as plt
19 | import os
20 | import pandas as pd
21 | import json
22 | import sys
23 |
24 |
25 | def filter_coco(coco, cls_split):
26 | new_anns = []
27 | all_cls_dict = {}
28 | for img_id, id in enumerate(coco.imgs):
29 | img = coco.loadImgs(id)[0]
30 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None))
31 | skip_flag = False
32 | img_cls_dict = {}
33 | if len(anns) == 0:
34 | continue
35 | for ann in anns:
36 | segmentation = ann['segmentation']
37 | area = ann['area']
38 | iscrowd = ann['iscrowd']
39 | image_id = ann['image_id']
40 | bbox = ann['bbox']
41 | category_id = ann['category_id']
42 | id = ann['id']
43 | bbox_area = bbox[2] * bbox[3]
44 |
45 |             # skip this image if any box of the target split's classes is smaller than 32x32
46 | if category_id in cls_split:
47 | if bbox_area < 32 * 32:
48 | skip_flag = True
49 |
50 | if skip_flag:
51 | continue
52 | else:
53 | for ann in anns:
54 | category_id = ann['category_id']
55 | if category_id in cls_split:
56 | new_anns.append(ann)
57 |
58 | if category_id in all_cls_dict.keys():
59 | all_cls_dict[category_id] += 1
60 | else:
61 | all_cls_dict[category_id] = 1
62 |
63 | print(len(new_anns))
64 | print(sorted(all_cls_dict.items(), key = lambda kv:(kv[1], kv[0])))
65 | return new_anns
66 |
67 |
68 | root_path = sys.argv[1]
69 | print(root_path)
70 | #root_path = '/home/fanqi/data/COCO'
71 | dataDir = './annotations'
72 | support_dict = {}
73 |
74 | support_dict['support_box'] = []
75 | support_dict['category_id'] = []
76 | support_dict['image_id'] = []
77 | support_dict['id'] = []
78 | support_dict['file_path'] = []
79 |
80 | voc_inds = (0, 1, 2, 3, 4, 5, 6, 8, 14, 15, 16, 17, 18, 19, 39, 56, 57, 58, 60, 62)
81 |
82 |
83 | for dataType in ['instances_train2017.json']: #, 'split_voc_instances_train2017.json']:
84 | annFile = join(dataDir, dataType)
85 |
86 | with open(annFile,'r') as load_f:
87 | dataset = json.load(load_f)
88 | print(dataset.keys())
89 | save_info = dataset['info']
90 | save_licenses = dataset['licenses']
91 | save_images = dataset['images']
92 | save_categories = dataset['categories']
93 | save_annotations = dataset['annotations']
94 |
95 |
96 | inds_split2 = [i for i in range(len(save_categories)) if i not in voc_inds]
97 |
98 | # split annotations according to categories
99 | categories_split1 = [save_categories[i] for i in voc_inds]
100 | categories_split2 = [save_categories[i] for i in inds_split2]
101 | cids_split1 = [c['id'] for c in categories_split1]
102 | cids_split2 = [c['id'] for c in categories_split2]
103 | print('Split 1: {} classes'.format(len(categories_split1)))
104 | for c in categories_split1:
105 | print('\t', c['name'])
106 | print('Split 2: {} classes'.format(len(categories_split2)))
107 | for c in categories_split2:
108 | print('\t', c['name'])
109 |
110 | coco = COCO(annFile)
111 |
112 | # non-voc split: keep only non-voc annotations, filtering out images with small non-voc boxes
113 | annotations_split2 = filter_coco(coco, cids_split2)
114 |
115 | # voc split: keep every image, but only its voc annotations
116 | annotations = dataset['annotations']
117 | annotations_split1 = []
118 |
119 | for ann in annotations:
120 | if ann['category_id'] in cids_split1: # voc 20
121 | annotations_split1.append(ann)
122 |
123 | dataset_split1 = {
124 | 'info': save_info,
125 | 'licenses': save_licenses,
126 | 'images': save_images,
127 | 'annotations': annotations_split1,
128 | 'categories': save_categories}
129 | dataset_split2 = {
130 | 'info': save_info,
131 | 'licenses': save_licenses,
132 | 'images': save_images,
133 | 'annotations': annotations_split2,
134 | 'categories': save_categories}
135 | new_annotations_path = os.path.join(root_path, 'new_annotations')
136 | if not os.path.exists(new_annotations_path):
137 | os.makedirs(new_annotations_path)
138 | split2_file = os.path.join(root_path, 'new_annotations/final_split_non_voc_instances_train2017.json')
139 |
140 | with open(split2_file, 'w') as f:
141 | json.dump(dataset_split2, f)
142 |
--------------------------------------------------------------------------------
/datasets/coco/2_balance.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | Created on Fri Jun 5 16:04:06 2020
5 |
6 | @author: fanq15
7 | """
8 |
9 | from pycocotools.coco import COCO
10 | import cv2
11 | import numpy as np
12 | from os.path import join, isdir
13 | from os import mkdir, makedirs
14 | from concurrent import futures
15 | import sys
16 | import time
17 | import math
18 | import matplotlib.pyplot as plt
19 | import os
20 | import pandas as pd
21 | import json
22 | import sys
23 |
24 | def balance_coco(coco):
25 | all_cls_dict = {}
26 | for img_id, id in enumerate(coco.imgs):
27 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None))
28 | img_cls_dict = {}
29 | if len(anns) == 0:
30 | continue
31 | for ann in anns:
32 | category_id = ann['category_id']
33 | id = ann['id']
34 |
35 | if category_id in img_cls_dict.keys():
36 | img_cls_dict[category_id] += 1
37 | else:
38 | img_cls_dict[category_id] = 1
39 |
40 | for category_id, num in img_cls_dict.items():
41 | if category_id in all_cls_dict.keys():
42 | all_cls_dict[category_id] += 1 # count the image number containing the target category
43 | else:
44 | all_cls_dict[category_id] = 1
45 | print('Image number of non-voc classes before class balancing: ', sorted(all_cls_dict.items(), key = lambda kv:(kv[1], kv[0])))
46 |
47 | new_anns = []
48 | for img_id, id in enumerate(coco.imgs):
49 | save_flag = False
50 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None))
51 | if len(anns) == 0:
52 | continue
53 |         # keep this image if at least one of its categories currently appears in at most 2000 images;
54 |         # otherwise drop it and decrement the image counts of its categories.
55 | for ann in anns:
56 | category_id = ann['category_id']
57 |
58 | if all_cls_dict[category_id] <= 2000:
59 | save_flag = True
60 |
61 | if save_flag:
62 | for ann in anns:
63 | new_anns.append(ann)
64 | else:
65 | for ann in anns:
66 | category_id = ann['category_id']
67 | all_cls_dict[category_id] -= 1
68 | print('Instance number of non-voc classes after class balancing: ', len(new_anns))
69 | print('Image number of non-voc classes after class balancing: ', sorted(all_cls_dict.items(), key = lambda kv:(kv[1], kv[0])))
70 |
71 | return new_anns
72 |
73 | root_path = sys.argv[1]
74 | #root_path = '/home/fanqi/data/COCO'
75 | dataDir = os.path.join(root_path, 'new_annotations')
76 | support_dict = {}
77 |
78 | support_dict['support_box'] = []
79 | support_dict['category_id'] = []
80 | support_dict['image_id'] = []
81 | support_dict['id'] = []
82 | support_dict['file_path'] = []
83 |
84 |
85 | for dataType in ['split_non_voc_instances_train2017.json']:
86 | annFile = join(dataDir, dataType)
87 |
88 | with open(annFile,'r') as load_f:
89 | dataset = json.load(load_f)
90 | print(dataset.keys())
91 | save_info = dataset['info']
92 | save_licenses = dataset['licenses']
93 | save_images = dataset['images']
94 | save_categories = dataset['categories']
95 |
96 | print(annFile)
97 | shot_num = 10
98 | coco = COCO(annFile)
99 | print(coco)
100 |
101 | annotations = balance_coco(coco)
102 | dataset_split = {
103 | 'info': save_info,
104 | 'licenses': save_licenses,
105 | 'images': save_images,
106 | 'annotations': annotations,
107 | 'categories': save_categories}
108 | split_file = os.path.join(root_path, 'new_annotations/final_split_non_voc_instances_train2017.json')
109 |
110 | with open(split_file, 'w') as f:
111 | json.dump(dataset_split, f)
112 |
--------------------------------------------------------------------------------
/datasets/coco/3_gen_support_pool.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | Created on Thu Jun 4 15:30:24 2020
5 |
6 | @author: fanq15
7 | """
8 |
9 | from pycocotools.coco import COCO
10 | import cv2
11 | import numpy as np
12 | from os.path import join, isdir
13 | from os import mkdir, makedirs
14 | from concurrent import futures
15 | import sys
16 | import time
17 | import math
18 | import matplotlib.pyplot as plt
19 | import os
20 | import pandas as pd
21 | import json
22 | import shutil
23 | import sys
24 |
25 | def vis_image(im, bboxs, im_name):
26 | dpi = 300
27 | fig, ax = plt.subplots()
28 | ax.imshow(im, aspect='equal')
29 | plt.axis('off')
30 | height, width, channels = im.shape
31 | fig.set_size_inches(width/100.0/3.0, height/100.0/3.0)
32 | plt.gca().xaxis.set_major_locator(plt.NullLocator())
33 | plt.gca().yaxis.set_major_locator(plt.NullLocator())
34 | plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0)
35 | plt.margins(0,0)
36 | # Show box (off by default, box_alpha=0.0)
37 | for bbox in bboxs:
38 | ax.add_patch(
39 | plt.Rectangle((bbox[0], bbox[1]),
40 | bbox[2] - bbox[0],
41 | bbox[3] - bbox[1],
42 | fill=False, edgecolor='r',
43 | linewidth=0.5, alpha=1))
44 | output_name = os.path.basename(im_name)
45 | plt.savefig(im_name, dpi=dpi, bbox_inches='tight', pad_inches=0)
46 | plt.close('all')
47 |
48 |
49 | def crop_support(img, bbox):
50 | image_shape = img.shape[:2] # h, w
51 | data_height, data_width = image_shape
52 |
53 | img = img.transpose(2, 0, 1)
54 |
55 | x1 = int(bbox[0])
56 | y1 = int(bbox[1])
57 | x2 = int(bbox[2])
58 | y2 = int(bbox[3])
59 |
60 | width = x2 - x1
61 | height = y2 - y1
62 | context_pixel = 16 #int(16 * im_scale)
63 |
64 | new_x1 = 0
65 | new_y1 = 0
66 | new_x2 = width
67 | new_y2 = height
68 | target_size = (320, 320) #(384, 384)
69 |
70 | if width >= height:
71 | crop_x1 = x1 - context_pixel
72 | crop_x2 = x2 + context_pixel
73 |
74 | # New_x1 and new_x2 will change when crop context or overflow
75 | new_x1 = new_x1 + context_pixel
76 | new_x2 = new_x1 + width
77 | if crop_x1 < 0:
78 | new_x1 = new_x1 + crop_x1
79 | new_x2 = new_x1 + width
80 | crop_x1 = 0
81 | if crop_x2 > data_width:
82 | crop_x2 = data_width
83 |
84 | short_size = height
85 | long_size = crop_x2 - crop_x1
86 | y_center = int((y2+y1) / 2) #math.ceil((y2 + y1) / 2)
87 | crop_y1 = int(y_center - (long_size / 2)) #int(y_center - math.ceil(long_size / 2))
88 | crop_y2 = int(y_center + (long_size / 2)) #int(y_center + math.floor(long_size / 2))
89 |
90 | # New_y1 and new_y2 will change when crop context or overflow
91 | new_y1 = new_y1 + math.ceil((long_size - short_size) / 2)
92 | new_y2 = new_y1 + height
93 | if crop_y1 < 0:
94 | new_y1 = new_y1 + crop_y1
95 | new_y2 = new_y1 + height
96 | crop_y1 = 0
97 | if crop_y2 > data_height:
98 | crop_y2 = data_height
99 |
100 | crop_short_size = crop_y2 - crop_y1
101 | crop_long_size = crop_x2 - crop_x1
102 | square = np.zeros((3, crop_long_size, crop_long_size), dtype = np.uint8)
103 | delta = int((crop_long_size - crop_short_size) / 2) #int(math.ceil((crop_long_size - crop_short_size) / 2))
104 | square_y1 = delta
105 | square_y2 = delta + crop_short_size
106 |
107 | new_y1 = new_y1 + delta
108 | new_y2 = new_y2 + delta
109 |
110 | crop_box = img[:, crop_y1:crop_y2, crop_x1:crop_x2]
111 | square[:, square_y1:square_y2, :] = crop_box
112 |
113 | #show_square = np.zeros((crop_long_size, crop_long_size, 3))#, dtype=np.int16)
114 | #show_crop_box = original_img[crop_y1:crop_y2, crop_x1:crop_x2, :]
115 | #show_square[square_y1:square_y2, :, :] = show_crop_box
116 | #show_square = show_square.astype(np.int16)
117 | else:
118 | crop_y1 = y1 - context_pixel
119 | crop_y2 = y2 + context_pixel
120 |
121 | # New_y1 and new_y2 will change when crop context or overflow
122 | new_y1 = new_y1 + context_pixel
123 | new_y2 = new_y1 + height
124 | if crop_y1 < 0:
125 | new_y1 = new_y1 + crop_y1
126 | new_y2 = new_y1 + height
127 | crop_y1 = 0
128 | if crop_y2 > data_height:
129 | crop_y2 = data_height
130 |
131 | short_size = width
132 | long_size = crop_y2 - crop_y1
133 | x_center = int((x2 + x1) / 2) #math.ceil((x2 + x1) / 2)
134 | crop_x1 = int(x_center - (long_size / 2)) #int(x_center - math.ceil(long_size / 2))
135 | crop_x2 = int(x_center + (long_size / 2)) #int(x_center + math.floor(long_size / 2))
136 |
137 | # New_x1 and new_x2 will change when crop context or overflow
138 | new_x1 = new_x1 + math.ceil((long_size - short_size) / 2)
139 | new_x2 = new_x1 + width
140 | if crop_x1 < 0:
141 | new_x1 = new_x1 + crop_x1
142 | new_x2 = new_x1 + width
143 | crop_x1 = 0
144 | if crop_x2 > data_width:
145 | crop_x2 = data_width
146 |
147 | crop_short_size = crop_x2 - crop_x1
148 | crop_long_size = crop_y2 - crop_y1
149 | square = np.zeros((3, crop_long_size, crop_long_size), dtype = np.uint8)
150 | delta = int((crop_long_size - crop_short_size) / 2) #int(math.ceil((crop_long_size - crop_short_size) / 2))
151 | square_x1 = delta
152 | square_x2 = delta + crop_short_size
153 |
154 | new_x1 = new_x1 + delta
155 | new_x2 = new_x2 + delta
156 | crop_box = img[:, crop_y1:crop_y2, crop_x1:crop_x2]
157 | square[:, :, square_x1:square_x2] = crop_box
158 |
159 | #show_square = np.zeros((crop_long_size, crop_long_size, 3)) #, dtype=np.int16)
160 | #show_crop_box = original_img[crop_y1:crop_y2, crop_x1:crop_x2, :]
161 | #show_square[:, square_x1:square_x2, :] = show_crop_box
162 | #show_square = show_square.astype(np.int16)
163 | #print(crop_y2 - crop_y1, crop_x2 - crop_x1, bbox, data_height, data_width)
164 |
165 | square = square.astype(np.float32, copy=False)
166 | square_scale = float(target_size[0]) / long_size
167 | square = square.transpose(1,2,0)
168 | square = cv2.resize(square, target_size, interpolation=cv2.INTER_LINEAR) # None, None, fx=square_scale, fy=square_scale, interpolation=cv2.INTER_LINEAR)
169 | #square = square.transpose(2,0,1)
170 | square = square.astype(np.uint8)
171 |
172 | new_x1 = int(new_x1 * square_scale)
173 | new_y1 = int(new_y1 * square_scale)
174 | new_x2 = int(new_x2 * square_scale)
175 | new_y2 = int(new_y2 * square_scale)
176 |
177 | # For test
178 | #show_square = cv2.resize(show_square, target_size, interpolation=cv2.INTER_LINEAR) # None, None, fx=square_scale, fy=square_scale, interpolation=cv2.INTER_LINEAR)
179 | #self.vis_image(show_square, [new_x1, new_y1, new_x2, new_y2], img_path.split('/')[-1][:-4]+'_crop.jpg', './test')
180 |
181 | support_data = square
182 | support_box = np.array([new_x1, new_y1, new_x2, new_y2]).astype(np.float32)
183 | return support_data, support_box
184 |
185 |
186 | def main():
187 | dataDir = '.'
188 | #root_path = '/home/fanqi/data/COCO'
189 | root_path = sys.argv[1]
190 | support_path = os.path.join(root_path, 'support')
191 | if not isdir(support_path):
192 | mkdir(support_path)
193 | #else:
194 | # shutil.rmtree(support_path)
195 |
196 | support_dict = {}
197 |
198 | support_dict['support_box'] = []
199 | support_dict['category_id'] = []
200 | support_dict['image_id'] = []
201 | support_dict['id'] = []
202 | support_dict['file_path'] = []
203 |
204 | for dataType in ['train2017']: #, 'train2017']:
205 | set_crop_base_path = join(support_path, dataType)
206 | set_img_base_path = join(dataDir, dataType)
207 |
208 | annFile = os.path.join(root_path, 'new_annotations/final_split_non_voc_instances_train2017.json')
209 | with open(annFile,'r') as load_f:
210 | dataset = json.load(load_f)
211 | print(dataset.keys())
212 | save_info = dataset['info']
213 | save_licenses = dataset['licenses']
214 | save_images = dataset['images']
215 | save_categories = dataset['categories']
216 |
217 | coco = COCO(annFile)
218 |
219 | for img_id, id in enumerate(coco.imgs):
220 | if img_id % 100 == 0:
221 | print(img_id)
222 | img = coco.loadImgs(id)[0]
223 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None))
224 |
225 | if len(anns) == 0:
226 | continue
227 |
228 | frame_crop_base_path = join(set_crop_base_path, img['file_name'].split('/')[-1].split('.')[0])
229 | if not isdir(frame_crop_base_path): makedirs(frame_crop_base_path)
230 | im = cv2.imread('{}/{}'.format(set_img_base_path, img['file_name']))
231 | #print('{}/{}'.format(set_img_base_path, img['file_name']))
232 | for item_id, ann in enumerate(anns):
233 | rect = ann['bbox']
234 | bbox = [rect[0], rect[1], rect[0] + rect[2], rect[1] + rect[3]]
235 | support_img, support_box = crop_support(im, bbox)
236 | #im_name = img['file_name'].split('.')[0] + '_' + str(item_id) + '.jpg'
237 | #output_dir = './fig'
238 | #vis_image(support_img[:, :, ::-1], support_box, join(frame_crop_base_path, '{:04d}.jpg'.format(item_id)))
239 | if rect[2] <= 0 or rect[3] <=0:
240 | print(rect)
241 | continue
242 | file_path = join(frame_crop_base_path, '{:04d}.jpg'.format(item_id))
243 | cv2.imwrite(file_path, support_img)
244 | #print(file_path)
245 | support_dict['support_box'].append(support_box.tolist())
246 | support_dict['category_id'].append(ann['category_id'])
247 | support_dict['image_id'].append(ann['image_id'])
248 | support_dict['id'].append(ann['id'])
249 | support_dict['file_path'].append(file_path)
250 |
251 | support_df = pd.DataFrame.from_dict(support_dict)
252 |
253 | return support_df
254 |
255 |
256 | if __name__ == '__main__':
257 | since = time.time()
258 | support_df = main()
259 | support_df.to_pickle("./train_support_df.pkl")
260 |
261 | time_elapsed = time.time() - since
262 | print('Total complete in {:.0f}m {:.0f}s'.format(
263 | time_elapsed // 60, time_elapsed % 60))
264 |
265 |
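
`crop_support` above pads the ground-truth box with 16 context pixels along its longer side, embeds the crop in a square canvas, and resizes it to 320x320, returning the crop together with the adjusted box. A minimal standalone sketch of calling it (the image path and box below are hypothetical; it assumes the script's own dependencies are installed and that it is run from `datasets/coco`):

```
# load crop_support from the numbered script (its name starts with a digit,
# so a plain "import" will not work)
import importlib.util
import cv2

spec = importlib.util.spec_from_file_location("gen_support_pool", "3_gen_support_pool.py")
gsp = importlib.util.module_from_spec(spec)
spec.loader.exec_module(gsp)  # defines the functions; main() stays behind the __main__ guard

im = cv2.imread("train2017/000000000009.jpg")                        # hypothetical example image
support_img, support_box = gsp.crop_support(im, [10, 20, 110, 200])  # box in (x1, y1, x2, y2)
print(support_img.shape, support_box)                                # (320, 320, 3) and the box inside the crop
```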
--------------------------------------------------------------------------------
/datasets/coco/4_gen_support_pool_10_shot.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | Created on Thu Jun 4 15:30:24 2020
5 |
6 | @author: fanq15
7 | """
8 |
9 | from pycocotools.coco import COCO
10 | import cv2
11 | import numpy as np
12 | from os.path import join, isdir
13 | from os import mkdir, makedirs
14 | from concurrent import futures
15 | import sys
16 | import time
17 | import math
18 | import matplotlib.pyplot as plt
19 | import os
20 | import pandas as pd
21 | import json
22 | import shutil
23 | import sys
24 |
25 | def vis_image(im, bboxs, im_name):
26 | dpi = 300
27 | fig, ax = plt.subplots()
28 | ax.imshow(im, aspect='equal')
29 | plt.axis('off')
30 | height, width, channels = im.shape
31 | fig.set_size_inches(width/100.0/3.0, height/100.0/3.0)
32 | plt.gca().xaxis.set_major_locator(plt.NullLocator())
33 | plt.gca().yaxis.set_major_locator(plt.NullLocator())
34 | plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0)
35 | plt.margins(0,0)
36 | # Show box (off by default, box_alpha=0.0)
37 | for bbox in bboxs:
38 | ax.add_patch(
39 | plt.Rectangle((bbox[0], bbox[1]),
40 | bbox[2] - bbox[0],
41 | bbox[3] - bbox[1],
42 | fill=False, edgecolor='r',
43 | linewidth=0.5, alpha=1))
44 | output_name = os.path.basename(im_name)
45 | plt.savefig(im_name, dpi=dpi, bbox_inches='tight', pad_inches=0)
46 | plt.close('all')
47 |
48 |
49 | def crop_support(img, bbox):
50 | image_shape = img.shape[:2] # h, w
51 | data_height, data_width = image_shape
52 |
53 | img = img.transpose(2, 0, 1)
54 |
55 | x1 = int(bbox[0])
56 | y1 = int(bbox[1])
57 | x2 = int(bbox[2])
58 | y2 = int(bbox[3])
59 |
60 | width = x2 - x1
61 | height = y2 - y1
62 | context_pixel = 16 #int(16 * im_scale)
63 |
64 | new_x1 = 0
65 | new_y1 = 0
66 | new_x2 = width
67 | new_y2 = height
68 | target_size = (320, 320) #(384, 384)
69 |
70 | if width >= height:
71 | crop_x1 = x1 - context_pixel
72 | crop_x2 = x2 + context_pixel
73 |
74 | # New_x1 and new_x2 will change when crop context or overflow
75 | new_x1 = new_x1 + context_pixel
76 | new_x2 = new_x1 + width
77 | if crop_x1 < 0:
78 | new_x1 = new_x1 + crop_x1
79 | new_x2 = new_x1 + width
80 | crop_x1 = 0
81 | if crop_x2 > data_width:
82 | crop_x2 = data_width
83 |
84 | short_size = height
85 | long_size = crop_x2 - crop_x1
86 | y_center = int((y2+y1) / 2) #math.ceil((y2 + y1) / 2)
87 | crop_y1 = int(y_center - (long_size / 2)) #int(y_center - math.ceil(long_size / 2))
88 | crop_y2 = int(y_center + (long_size / 2)) #int(y_center + math.floor(long_size / 2))
89 |
90 | # New_y1 and new_y2 will change when crop context or overflow
91 | new_y1 = new_y1 + math.ceil((long_size - short_size) / 2)
92 | new_y2 = new_y1 + height
93 | if crop_y1 < 0:
94 | new_y1 = new_y1 + crop_y1
95 | new_y2 = new_y1 + height
96 | crop_y1 = 0
97 | if crop_y2 > data_height:
98 | crop_y2 = data_height
99 |
100 | crop_short_size = crop_y2 - crop_y1
101 | crop_long_size = crop_x2 - crop_x1
102 | square = np.zeros((3, crop_long_size, crop_long_size), dtype = np.uint8)
103 | delta = int((crop_long_size - crop_short_size) / 2) #int(math.ceil((crop_long_size - crop_short_size) / 2))
104 | square_y1 = delta
105 | square_y2 = delta + crop_short_size
106 |
107 | new_y1 = new_y1 + delta
108 | new_y2 = new_y2 + delta
109 |
110 | crop_box = img[:, crop_y1:crop_y2, crop_x1:crop_x2]
111 | square[:, square_y1:square_y2, :] = crop_box
112 |
113 | #show_square = np.zeros((crop_long_size, crop_long_size, 3))#, dtype=np.int16)
114 | #show_crop_box = original_img[crop_y1:crop_y2, crop_x1:crop_x2, :]
115 | #show_square[square_y1:square_y2, :, :] = show_crop_box
116 | #show_square = show_square.astype(np.int16)
117 | else:
118 | crop_y1 = y1 - context_pixel
119 | crop_y2 = y2 + context_pixel
120 |
121 | # New_y1 and new_y2 will change when crop context or overflow
122 | new_y1 = new_y1 + context_pixel
123 | new_y2 = new_y1 + height
124 | if crop_y1 < 0:
125 | new_y1 = new_y1 + crop_y1
126 | new_y2 = new_y1 + height
127 | crop_y1 = 0
128 | if crop_y2 > data_height:
129 | crop_y2 = data_height
130 |
131 | short_size = width
132 | long_size = crop_y2 - crop_y1
133 | x_center = int((x2 + x1) / 2) #math.ceil((x2 + x1) / 2)
134 | crop_x1 = int(x_center - (long_size / 2)) #int(x_center - math.ceil(long_size / 2))
135 | crop_x2 = int(x_center + (long_size / 2)) #int(x_center + math.floor(long_size / 2))
136 |
137 | # New_x1 and new_x2 will change when crop context or overflow
138 | new_x1 = new_x1 + math.ceil((long_size - short_size) / 2)
139 | new_x2 = new_x1 + width
140 | if crop_x1 < 0:
141 | new_x1 = new_x1 + crop_x1
142 | new_x2 = new_x1 + width
143 | crop_x1 = 0
144 | if crop_x2 > data_width:
145 | crop_x2 = data_width
146 |
147 | crop_short_size = crop_x2 - crop_x1
148 | crop_long_size = crop_y2 - crop_y1
149 | square = np.zeros((3, crop_long_size, crop_long_size), dtype = np.uint8)
150 | delta = int((crop_long_size - crop_short_size) / 2) #int(math.ceil((crop_long_size - crop_short_size) / 2))
151 | square_x1 = delta
152 | square_x2 = delta + crop_short_size
153 |
154 | new_x1 = new_x1 + delta
155 | new_x2 = new_x2 + delta
156 | crop_box = img[:, crop_y1:crop_y2, crop_x1:crop_x2]
157 | square[:, :, square_x1:square_x2] = crop_box
158 |
159 | #show_square = np.zeros((crop_long_size, crop_long_size, 3)) #, dtype=np.int16)
160 | #show_crop_box = original_img[crop_y1:crop_y2, crop_x1:crop_x2, :]
161 | #show_square[:, square_x1:square_x2, :] = show_crop_box
162 | #show_square = show_square.astype(np.int16)
163 | #print(crop_y2 - crop_y1, crop_x2 - crop_x1, bbox, data_height, data_width)
164 |
165 | square = square.astype(np.float32, copy=False)
166 | square_scale = float(target_size[0]) / long_size
167 | square = square.transpose(1,2,0)
168 | square = cv2.resize(square, target_size, interpolation=cv2.INTER_LINEAR) # None, None, fx=square_scale, fy=square_scale, interpolation=cv2.INTER_LINEAR)
169 | #square = square.transpose(2,0,1)
170 | square = square.astype(np.uint8)
171 |
172 | new_x1 = int(new_x1 * square_scale)
173 | new_y1 = int(new_y1 * square_scale)
174 | new_x2 = int(new_x2 * square_scale)
175 | new_y2 = int(new_y2 * square_scale)
176 |
177 | # For test
178 | #show_square = cv2.resize(show_square, target_size, interpolation=cv2.INTER_LINEAR) # None, None, fx=square_scale, fy=square_scale, interpolation=cv2.INTER_LINEAR)
179 | #self.vis_image(show_square, [new_x1, new_y1, new_x2, new_y2], img_path.split('/')[-1][:-4]+'_crop.jpg', './test')
180 |
181 | support_data = square
182 | support_box = np.array([new_x1, new_y1, new_x2, new_y2]).astype(np.float32)
183 | return support_data, support_box
184 |
185 |
186 | def main():
187 | dataDir = '.'
188 |
189 | #root_path = '/home/fanqi/data/COCO'
190 | root_path = sys.argv[1]
191 | support_path = os.path.join(root_path, '10_shot_support')
192 | #support_path = '10_shot_support'
193 | if not isdir(support_path):
194 | mkdir(support_path)
195 | #else:
196 | # shutil.rmtree(support_path)
197 |
198 | support_dict = {}
199 |
200 | support_dict['support_box'] = []
201 | support_dict['category_id'] = []
202 | support_dict['image_id'] = []
203 | support_dict['id'] = []
204 | support_dict['file_path'] = []
205 |
206 | for dataType in ['train2017']: #, 'train2017']:
207 | set_crop_base_path = join(support_path, dataType)
208 | set_img_base_path = join(dataDir, dataType)
209 |
210 | # other information
211 | #annFile = '{}/annotations/instances_{}.json'.format(dataDir,dataType)
212 | #annFile = './new_annotations/final_split_voc_10_shot_instances_train2017.json'
213 |
214 | annFile = os.path.join(root_path, 'new_annotations/final_split_voc_10_shot_instances_train2017.json')
215 |
216 | with open(annFile,'r') as load_f:
217 | dataset = json.load(load_f)
218 | print(dataset.keys())
219 | save_info = dataset['info']
220 | save_licenses = dataset['licenses']
221 | save_images = dataset['images']
222 | save_categories = dataset['categories']
223 |
224 | coco = COCO(annFile)
225 |
226 | for img_id, id in enumerate(coco.imgs):
227 | if img_id % 100 == 0:
228 | print(img_id)
229 | img = coco.loadImgs(id)[0]
230 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None))
231 |
232 | if len(anns) == 0:
233 | continue
234 |
235 | #print(img['file_name'])
236 | frame_crop_base_path = join(set_crop_base_path, img['file_name'].split('/')[-1].split('.')[0])
237 | if not isdir(frame_crop_base_path): makedirs(frame_crop_base_path)
238 | im = cv2.imread('{}/{}'.format(set_img_base_path, img['file_name']))
239 | for item_id, ann in enumerate(anns):
240 | #print(ann)
241 | rect = ann['bbox']
242 | bbox = [rect[0], rect[1], rect[0] + rect[2], rect[1] + rect[3]]
243 | support_img, support_box = crop_support(im, bbox)
244 | #im_name = img['file_name'].split('.')[0] + '_' + str(item_id) + '.jpg'
245 | #output_dir = './fig'
246 | #vis_image(support_img[:, :, ::-1], support_box, join(frame_crop_base_path, '{:04d}.jpg'.format(item_id)))
247 | if rect[2] <= 0 or rect[3] <=0:
248 | print(rect)
249 | continue
250 | file_path = join(frame_crop_base_path, '{:04d}.jpg'.format(item_id))
251 | cv2.imwrite(file_path, support_img)
252 | #print(file_path)
253 | support_dict['support_box'].append(support_box.tolist())
254 | support_dict['category_id'].append(ann['category_id'])
255 | support_dict['image_id'].append(ann['image_id'])
256 | support_dict['id'].append(ann['id'])
257 | support_dict['file_path'].append(file_path)
258 |
259 | support_df = pd.DataFrame.from_dict(support_dict)
260 |
261 | return support_df
262 |
263 |
264 | if __name__ == '__main__':
265 | since = time.time()
266 | support_df = main()
267 | support_df.to_pickle("./10_shot_support_df.pkl")
268 |
269 | time_elapsed = time.time() - since
270 | print('Total complete in {:.0f}m {:.0f}s'.format(
271 | time_elapsed // 60, time_elapsed % 60))
272 |
273 |
--------------------------------------------------------------------------------
/datasets/coco/5_voc_part.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | Created on Fri Jun 5 15:27:52 2020
5 |
6 | @author: fanq15
7 | """
8 |
9 | from pycocotools.coco import COCO
10 | import cv2
11 | import numpy as np
12 | from os.path import join, isdir
13 | from os import mkdir, makedirs
14 | from concurrent import futures
15 | import sys
16 | import time
17 | import math
18 | import matplotlib.pyplot as plt
19 | import os
20 | import pandas as pd
21 | import json
22 | import sys
23 |
24 | root_path = './'
25 | print(root_path)
26 | #root_path = '/home/fanqi/data/COCO'
27 | dataDir = './annotations'
28 | support_dict = {}
29 |
30 | support_dict['support_box'] = []
31 | support_dict['category_id'] = []
32 | support_dict['image_id'] = []
33 | support_dict['id'] = []
34 | support_dict['file_path'] = []
35 |
36 | voc_inds = (0, 1, 2, 3, 4, 5, 6, 8, 14, 15, 16, 17, 18, 19, 39, 56, 57, 58, 60, 62)
37 |
38 |
39 | for dataType in ['instances_train2017.json']: #, 'split_voc_instances_train2017.json']:
40 | annFile = join(dataDir, dataType)
41 |
42 | with open(annFile,'r') as load_f:
43 | dataset = json.load(load_f)
44 | print(dataset.keys())
45 | save_info = dataset['info']
46 | save_licenses = dataset['licenses']
47 | save_images = dataset['images']
48 | save_categories = dataset['categories']
49 | save_annotations = dataset['annotations']
50 |
51 |
52 | inds_split2 = [i for i in range(len(save_categories)) if i not in voc_inds]
53 |
54 | # split annotations according to categories
55 | categories_split1 = [save_categories[i] for i in voc_inds]
56 | categories_split2 = [save_categories[i] for i in inds_split2]
57 | cids_split1 = [c['id'] for c in categories_split1]
58 | cids_split2 = [c['id'] for c in categories_split2]
59 | print('Split 1: {} classes'.format(len(categories_split1)))
60 | for c in categories_split1:
61 | print('\t', c['name'])
62 | print('Split 2: {} classes'.format(len(categories_split2)))
63 | for c in categories_split2:
64 | print('\t', c['name'])
65 |
66 | coco = COCO(annFile)
67 |
68 | # voc split: keep every image, but only its voc annotations
69 | annotations = dataset['annotations']
70 | annotations_split1 = []
71 |
72 | for ann in annotations:
73 | if ann['category_id'] in cids_split1: # voc 20
74 | annotations_split1.append(ann)
75 |
76 | dataset_split1 = {
77 | 'info': save_info,
78 | 'licenses': save_licenses,
79 | 'images': save_images,
80 | 'annotations': annotations_split1,
81 | 'categories': save_categories}
82 |
83 | new_annotations_path = os.path.join(root_path, 'new_annotations')
84 | if not os.path.exists(new_annotations_path):
85 | os.makedirs(new_annotations_path)
86 | split1_file = os.path.join(root_path, 'new_annotations/split_voc_instances_train2017.json')
87 |
88 | with open(split1_file, 'w') as f:
89 | json.dump(dataset_split1, f)
90 |
91 |
92 |
93 |
--------------------------------------------------------------------------------
/datasets/coco/6_voc_few_shot.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | Created on Fri Jun 5 16:04:06 2020
5 |
6 | @author: fanq15
7 | """
8 |
9 | from pycocotools.coco import COCO
10 | import cv2
11 | import numpy as np
12 | from os.path import join, isdir
13 | from os import mkdir, makedirs
14 | from concurrent import futures
15 | import sys
16 | import time
17 | import math
18 | import matplotlib.pyplot as plt
19 | import os
20 | import pandas as pd
21 | import json
22 | import sys
23 |
24 | def few_shot(coco, shot_num):
25 | new_anns = []
26 | all_cls_dict = {}
27 | for img_id, id in enumerate(coco.imgs):
28 | anns = coco.loadAnns(coco.getAnnIds(imgIds=id, iscrowd=None))
29 | skip_flag = False
30 | img_cls_dict = {}
31 | if len(anns) != 1:
32 | continue
33 | for ann in anns:
34 | area = ann['area']
35 | category_id = ann['category_id']
36 | id = ann['id']
37 |
38 | if category_id in img_cls_dict.keys():
39 | img_cls_dict[category_id] += 1
40 | else:
41 | img_cls_dict[category_id] = 1
42 |
43 |         # keep only medium-sized objects (area between 64x64 and 224x224)
44 | if area < 64 * 64 or area > 224 * 224:
45 | skip_flag = True
46 |
47 | if category_id in all_cls_dict.keys():
48 | if all_cls_dict[category_id] == shot_num:
49 | skip_flag = True
50 |
51 | if skip_flag:
52 | continue
53 | else:
54 | for ann in anns:
55 | new_anns.append(ann)
56 | for category_id, num in img_cls_dict.items():
57 | if category_id in all_cls_dict.keys():
58 | all_cls_dict[category_id] += num
59 | else:
60 | all_cls_dict[category_id] = num
61 | print(len(new_anns))
62 | print(sorted(all_cls_dict.items(), key = lambda kv:(kv[1], kv[0])))
63 | return new_anns
64 |
65 |
66 | root_path = './'
67 | #root_path = '/home/fanqi/data/COCO'
68 | dataDir = os.path.join(root_path, 'new_annotations')
69 | support_dict = {}
70 |
71 | support_dict['support_box'] = []
72 | support_dict['category_id'] = []
73 | support_dict['image_id'] = []
74 | support_dict['id'] = []
75 | support_dict['file_path'] = []
76 |
77 |
78 | for dataType in ['split_voc_instances_train2017.json']:
79 | annFile = join(dataDir, dataType)
80 |
81 | with open(annFile,'r') as load_f:
82 | dataset = json.load(load_f)
83 | print(dataset.keys())
84 | save_info = dataset['info']
85 | save_licenses = dataset['licenses']
86 | save_images = dataset['images']
87 | save_categories = dataset['categories']
88 |
89 | print(annFile)
90 | shot_num = 10
91 | coco = COCO(annFile)
92 | print(coco)
93 |
94 | annotations = few_shot(coco, shot_num)
95 | dataset_split = {
96 | 'info': save_info,
97 | 'licenses': save_licenses,
98 | 'images': save_images,
99 | 'annotations': annotations,
100 | 'categories': save_categories}
101 | #split_file = os.path.join(root_path, 'new_annotations/final_split_voc_10_shot_instances_train2017.json')
102 | split_file = './new_annotations/final_split_voc_10_shot_instances_train2017.json'
103 |
104 | with open(split_file, 'w') as f:
105 | json.dump(dataset_split, f)
106 |
107 |
108 |
109 |
110 |
--------------------------------------------------------------------------------
/datasets/generate_support_data.sh:
--------------------------------------------------------------------------------
1 | DATA_ROOT=/home/fanqi/data/COCO
2 |
3 | cd coco
4 |
5 | ln -s $DATA_ROOT/train2017 ./
6 | ln -s $DATA_ROOT/val2017 ./
7 | ln -s $DATA_ROOT/annotations ./
8 |
9 | python3 1_split_filter.py ./
10 | #python3 2_balance.py ./
11 | python3 3_gen_support_pool.py ./
12 | python3 4_gen_support_pool_10_shot.py ./
13 |
14 |
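
A minimal sketch (not part of the repo) for checking the files this pipeline is expected to produce, assuming it is run from `datasets/coco` after the script finishes:

```
# optional: check the files this pipeline is expected to produce (run from datasets/coco)
import os

expected = [
    "new_annotations/final_split_non_voc_instances_train2017.json",  # from 1_split_filter.py
    "train_support_df.pkl",                                          # from 3_gen_support_pool.py
    "10_shot_support_df.pkl",                                        # from 4_gen_support_pool_10_shot.py
]
for path in expected:
    print(path, os.path.exists(path))
```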
--------------------------------------------------------------------------------
/fewx/__init__.py:
--------------------------------------------------------------------------------
1 | from fewx import modeling
2 | from fewx import config
3 | from fewx import layers
4 |
5 | __version__ = "0.1"
6 |
--------------------------------------------------------------------------------
/fewx/config/__init__.py:
--------------------------------------------------------------------------------
1 | from .config import get_cfg
2 |
3 | __all__ = [
4 | "get_cfg",
5 | ]
6 |
--------------------------------------------------------------------------------
/fewx/config/config.py:
--------------------------------------------------------------------------------
1 | from detectron2.config import CfgNode
2 |
3 |
4 | def get_cfg() -> CfgNode:
5 | """
6 | Get a copy of the default config.
7 | Returns:
8 | a detectron2 CfgNode instance.
9 | """
10 | from .defaults import _C
11 |
12 | return _C.clone()
13 |
--------------------------------------------------------------------------------
/fewx/config/defaults.py:
--------------------------------------------------------------------------------
1 | from detectron2.config.defaults import _C
2 | from detectron2.config import CfgNode as CN
3 |
4 |
5 | # ---------------------------------------------------------------------------- #
6 | # Additional Configs
7 | # ---------------------------------------------------------------------------- #
8 | _C.SOLVER.HEAD_LR_FACTOR = 1.0
9 |
10 | # ---------------------------------------------------------------------------- #
11 | # Few shot setting
12 | # ---------------------------------------------------------------------------- #
13 | _C.INPUT.FS = CN()
14 | _C.INPUT.FS.FEW_SHOT = False
15 | _C.INPUT.FS.SUPPORT_WAY = 2
16 | _C.INPUT.FS.SUPPORT_SHOT = 10
17 |
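
These keys extend the stock detectron2 config. A minimal sketch of reading them back (assuming detectron2 is installed, the repo root is on `PYTHONPATH`, and the command is run from the repo root):

```
# load a FewX config and read the few-shot keys defined above (run from the repo root)
from fewx.config import get_cfg

cfg = get_cfg()  # detectron2 defaults plus the additions in this file
cfg.merge_from_file("configs/fsod/finetune_R_50_C4_1x.yaml")
print(cfg.INPUT.FS.FEW_SHOT, cfg.INPUT.FS.SUPPORT_WAY, cfg.INPUT.FS.SUPPORT_SHOT)  # True 2 9
print(cfg.SOLVER.HEAD_LR_FACTOR)  # 2.0, inherited from Base-FSOD-C4.yaml
```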
--------------------------------------------------------------------------------
/fewx/data/__init__.py:
--------------------------------------------------------------------------------
1 | from .dataset_mapper import DatasetMapperWithSupport
2 | # ensure the builtin datasets are registered
3 | from . import datasets # isort:skip
4 |
5 | __all__ = [k for k in globals().keys() if not k.startswith("_")]
6 |
--------------------------------------------------------------------------------
/fewx/data/build.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
2 | import bisect
3 | import copy
4 | import itertools
5 | import logging
6 | import numpy as np
7 | import operator
8 | import pickle
9 | import torch.utils.data
10 | from fvcore.common.file_io import PathManager
11 | from tabulate import tabulate
12 | from termcolor import colored
13 |
14 | from detectron2.structures import BoxMode
15 | from detectron2.utils.comm import get_world_size
16 | from detectron2.utils.env import seed_all_rng
17 | from detectron2.utils.logger import log_first_n
18 |
19 | from detectron2.data.catalog import DatasetCatalog, MetadataCatalog
20 | from detectron2.data.common import AspectRatioGroupedDataset, DatasetFromList, MapDataset
21 | from detectron2.data.dataset_mapper import DatasetMapper
22 | from detectron2.data.detection_utils import check_metadata_consistency
23 | from detectron2.data.samplers import InferenceSampler, RepeatFactorTrainingSampler, TrainingSampler
24 |
25 | from detectron2.data.build import build_batch_data_loader, filter_images_with_only_crowd_annotations, load_proposals_into_dataset, filter_images_with_few_keypoints, print_instances_class_histogram, trivial_batch_collator, get_detection_dataset_dicts
26 |
27 | def fsod_get_detection_dataset_dicts(
28 | dataset_names, filter_empty=True, min_keypoints=0, proposal_files=None
29 | ):
30 | """
31 | Load and prepare dataset dicts for instance detection/segmentation and semantic segmentation.
32 | Args:
33 | dataset_names (list[str]): a list of dataset names
34 | filter_empty (bool): whether to filter out images without instance annotations
35 | min_keypoints (int): filter out images with fewer keypoints than
36 | `min_keypoints`. Set to 0 to do nothing.
37 | proposal_files (list[str]): if given, a list of object proposal files
38 | that match each dataset in `dataset_names`.
39 | """
40 | assert len(dataset_names)
41 | dataset_dicts_original = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
42 | for dataset_name, dicts in zip(dataset_names, dataset_dicts_original):
43 | assert len(dicts), "Dataset '{}' is empty!".format(dataset_name)
44 |
45 | if proposal_files is not None:
46 | assert len(dataset_names) == len(proposal_files)
47 | # load precomputed proposals from proposal files
48 | dataset_dicts_original = [
49 | load_proposals_into_dataset(dataset_i_dicts, proposal_file)
50 | for dataset_i_dicts, proposal_file in zip(dataset_dicts_original, proposal_files)
51 | ]
52 |
53 | if 'train' not in dataset_names[0]:
54 | dataset_dicts = list(itertools.chain.from_iterable(dataset_dicts_original))
55 | else:
56 | dataset_dicts_original = list(itertools.chain.from_iterable(dataset_dicts_original))
57 | dataset_dicts_original = filter_images_with_only_crowd_annotations(dataset_dicts_original)
58 | ###################################################################################
59 | # split image-based annotations to instance-based annotations for few-shot learning
60 | dataset_dicts = []
61 | index_dicts = []
62 | split_flag = True
63 | if split_flag:
64 | for record in dataset_dicts_original:
65 | file_name = record['file_name']
66 | height = record['height']
67 | width = record['width']
68 | image_id = record['image_id']
69 | annotations = record['annotations']
70 | category_dict = {}
71 | for ann_id, ann in enumerate(annotations):
72 |
73 | ann.pop("segmentation", None)
74 | ann.pop("keypoints", None)
75 |
76 | category_id = ann['category_id']
77 | if category_id not in category_dict.keys():
78 | category_dict[category_id] = [ann]
79 | else:
80 | category_dict[category_id].append(ann)
81 |
82 | for key, item in category_dict.items():
83 | instance_ann = {}
84 | instance_ann['file_name'] = file_name
85 | instance_ann['height'] = height
86 | instance_ann['width'] = width
87 |
88 | instance_ann['annotations'] = item
89 |
90 | dataset_dicts.append(instance_ann)
91 |
92 |
93 | has_instances = "annotations" in dataset_dicts[0]
94 | # Keep images without instance-level GT if the dataset has semantic labels.
95 | if filter_empty and has_instances and "sem_seg_file_name" not in dataset_dicts[0]:
96 | dataset_dicts = filter_images_with_only_crowd_annotations(dataset_dicts)
97 |
98 | if min_keypoints > 0 and has_instances:
99 | dataset_dicts = filter_images_with_few_keypoints(dataset_dicts, min_keypoints)
100 |
101 | if has_instances:
102 | try:
103 | class_names = MetadataCatalog.get(dataset_names[0]).thing_classes
104 | check_metadata_consistency("thing_classes", dataset_names)
105 | print_instances_class_histogram(dataset_dicts, class_names)
106 | except AttributeError: # class names are not available for this dataset
107 | pass
108 | return dataset_dicts
109 |
110 | def build_detection_train_loader(cfg, mapper=None):
111 | """
112 | A data loader is created by the following steps:
113 | 1. Use the dataset names in config to query :class:`DatasetCatalog`, and obtain a list of dicts.
114 | 2. Coordinate a random shuffle order shared among all processes (all GPUs)
115 | 3. Each process spawns a few workers to process the dicts. Each worker will:
116 | * Map each metadata dict into another format to be consumed by the model.
117 | * Batch them by simply putting dicts into a list.
118 | The batched ``list[mapped_dict]`` is what this dataloader will yield.
119 | Args:
120 | cfg (CfgNode): the config
121 | mapper (callable): a callable which takes a sample (dict) from dataset and
122 | returns the format to be consumed by the model.
123 | By default it will be `DatasetMapper(cfg, True)`.
124 | Returns:
125 | an infinite iterator of training data
126 | """
127 | dataset_dicts = fsod_get_detection_dataset_dicts(
128 | cfg.DATASETS.TRAIN,
129 | filter_empty=cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS,
130 | min_keypoints=cfg.MODEL.ROI_KEYPOINT_HEAD.MIN_KEYPOINTS_PER_IMAGE
131 | if cfg.MODEL.KEYPOINT_ON
132 | else 0,
133 | proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,
134 | )
135 | dataset = DatasetFromList(dataset_dicts, copy=False)
136 |
137 | if mapper is None:
138 | mapper = DatasetMapper(cfg, True)
139 | dataset = MapDataset(dataset, mapper)
140 |
141 | sampler_name = cfg.DATALOADER.SAMPLER_TRAIN
142 | logger = logging.getLogger(__name__)
143 | logger.info("Using training sampler {}".format(sampler_name))
144 | # TODO avoid if-else?
145 | if sampler_name == "TrainingSampler":
146 | sampler = TrainingSampler(len(dataset))
147 | elif sampler_name == "RepeatFactorTrainingSampler":
148 | repeat_factors = RepeatFactorTrainingSampler.repeat_factors_from_category_frequency(
149 | dataset_dicts, cfg.DATALOADER.REPEAT_THRESHOLD
150 | )
151 | sampler = RepeatFactorTrainingSampler(repeat_factors)
152 | else:
153 | raise ValueError("Unknown training sampler: {}".format(sampler_name))
154 | return build_batch_data_loader(
155 | dataset,
156 | sampler,
157 | cfg.SOLVER.IMS_PER_BATCH,
158 | aspect_ratio_grouping=cfg.DATALOADER.ASPECT_RATIO_GROUPING,
159 | num_workers=cfg.DATALOADER.NUM_WORKERS,
160 | )
161 |
162 | def build_detection_test_loader(cfg, dataset_name, mapper=None):
163 | """
164 | Similar to `build_detection_train_loader`.
165 | But this function uses the given `dataset_name` argument (instead of the names in cfg),
166 | and uses batch size 1.
167 | Args:
168 | cfg: a detectron2 CfgNode
169 | dataset_name (str): a name of the dataset that's available in the DatasetCatalog
170 | mapper (callable): a callable which takes a sample (dict) from dataset
171 | and returns the format to be consumed by the model.
172 | By default it will be `DatasetMapper(cfg, False)`.
173 | Returns:
174 | DataLoader: a torch DataLoader, that loads the given detection
175 | dataset, with test-time transformation and batching.
176 | """
177 | dataset_dicts = get_detection_dataset_dicts(
178 | [dataset_name],
179 | filter_empty=False, # True,
180 | proposal_files=[
181 | cfg.DATASETS.PROPOSAL_FILES_TEST[list(cfg.DATASETS.TEST).index(dataset_name)]
182 | ]
183 | if cfg.MODEL.LOAD_PROPOSALS
184 | else None,
185 | )
186 |
187 | dataset = DatasetFromList(dataset_dicts)
188 | if mapper is None:
189 | mapper = DatasetMapper(cfg, False) # True)
190 | dataset = MapDataset(dataset, mapper)
191 |
192 | sampler = InferenceSampler(len(dataset))
193 | # Always use 1 image per worker during inference since this is the
194 | # standard when reporting inference time in papers.
195 | batch_sampler = torch.utils.data.sampler.BatchSampler(sampler, 1, drop_last=False)
196 |
197 | data_loader = torch.utils.data.DataLoader(
198 | dataset,
199 | num_workers=cfg.DATALOADER.NUM_WORKERS,
200 | batch_sampler=batch_sampler,
201 | collate_fn=trivial_batch_collator,
202 | )
203 | return data_loader
204 |
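A minimal usage sketch of the two loaders defined above. It is not runnable on its own: `cfg` is assumed to be a detectron2 CfgNode that already carries the FewX-specific keys (e.g. `INPUT.FS.*`), and the datasets under `./datasets` are assumed to be prepared; the dataset name below is one of the splits registered in `fewx/data/datasets/builtin.py`.

# Sketch only: `cfg` must already contain the FewX config keys and the
# registered datasets must exist on disk.
train_loader = build_detection_train_loader(cfg)      # infinite iterator of batches
batch = next(iter(train_loader))                      # list[dict], one dict per image; with the
                                                      # support-aware mapper each dict also holds
                                                      # "support_images", "support_bboxes", "support_cls"

test_loader = build_detection_test_loader(cfg, "coco_2017_train_voc_10_shot")
for inputs in test_loader:                            # batch size 1 at inference time
    break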
--------------------------------------------------------------------------------
/fewx/data/dataset_mapper.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
2 | import copy
3 | import logging
4 | import numpy as np
5 | import torch
6 | from fvcore.common.file_io import PathManager
7 | from PIL import Image
8 |
9 | from detectron2.data import detection_utils as utils
10 | from detectron2.data import transforms as T
11 |
12 | import pandas as pd
13 | from detectron2.data.catalog import MetadataCatalog
14 |
15 | """
16 | This file contains the default mapping that's applied to "dataset dicts".
17 | """
18 |
19 | __all__ = ["DatasetMapperWithSupport"]
20 |
21 |
22 | class DatasetMapperWithSupport:
23 | """
24 | A callable which takes a dataset dict in Detectron2 Dataset format,
25 |     and maps it into a format used by the model.
26 |
27 | This is the default callable to be used to map your dataset dict into training data.
28 |     You can use it as a reference to implement your own mapper for customized logic,
29 | such as a different way to read or transform images.
30 | See :doc:`/tutorials/data_loading` for details.
31 |
32 | The callable currently does the following:
33 |
34 |     1. Reads the image from "file_name"
35 |     2. Applies cropping/geometric transforms to the image and annotations
36 |     3. Converts the image and annotations to Tensor and :class:`Instances`
37 | """
38 |
39 | def __init__(self, cfg, is_train=True):
40 | if cfg.INPUT.CROP.ENABLED and is_train:
41 | self.crop_gen = T.RandomCrop(cfg.INPUT.CROP.TYPE, cfg.INPUT.CROP.SIZE)
42 | logging.getLogger(__name__).info("CropGen used in training: " + str(self.crop_gen))
43 | else:
44 | self.crop_gen = None
45 |
46 | self.tfm_gens = utils.build_transform_gen(cfg, is_train)
47 |
48 | # fmt: off
49 | self.img_format = cfg.INPUT.FORMAT
50 | self.mask_on = cfg.MODEL.MASK_ON
51 | self.mask_format = cfg.INPUT.MASK_FORMAT
52 | self.keypoint_on = cfg.MODEL.KEYPOINT_ON
53 | self.load_proposals = cfg.MODEL.LOAD_PROPOSALS
54 |
55 | self.few_shot = cfg.INPUT.FS.FEW_SHOT
56 | self.support_way = cfg.INPUT.FS.SUPPORT_WAY
57 | self.support_shot = cfg.INPUT.FS.SUPPORT_SHOT
58 | # fmt: on
59 | if self.keypoint_on and is_train:
60 | # Flip only makes sense in training
61 | self.keypoint_hflip_indices = utils.create_keypoint_hflip_indices(cfg.DATASETS.TRAIN)
62 | else:
63 | self.keypoint_hflip_indices = None
64 |
65 | if self.load_proposals:
66 | self.proposal_min_box_size = cfg.MODEL.PROPOSAL_GENERATOR.MIN_SIZE
67 | self.proposal_topk = (
68 | cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TRAIN
69 | if is_train
70 | else cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TEST
71 | )
72 | self.is_train = is_train
73 |
74 | if self.is_train:
75 | # support_df
76 | self.support_on = True
77 | if self.few_shot:
78 | self.support_df = pd.read_pickle("./datasets/coco/10_shot_support_df.pkl")
79 | else:
80 | self.support_df = pd.read_pickle("./datasets/coco/train_support_df.pkl")
81 |
82 | metadata = MetadataCatalog.get('coco_2017_train')
83 |         # map the original COCO dataset category ids to contiguous training ids
84 | reverse_id_mapper = lambda dataset_id: metadata.thing_dataset_id_to_contiguous_id[dataset_id] # noqa
85 | self.support_df['category_id'] = self.support_df['category_id'].map(reverse_id_mapper)
86 |
87 |
88 | def __call__(self, dataset_dict):
89 | """
90 | Args:
91 | dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.
92 |
93 | Returns:
94 | dict: a format that builtin models in detectron2 accept
95 | """
96 | dataset_dict = copy.deepcopy(dataset_dict) # it will be modified by code below
97 | # USER: Write your own image loading if it's not from a file
98 | image = utils.read_image(dataset_dict["file_name"], format=self.img_format)
99 | utils.check_image_size(dataset_dict, image)
100 | if self.is_train:
101 | # support
102 | if self.support_on:
103 | if "annotations" in dataset_dict:
104 | # USER: Modify this if you want to keep them for some reason.
105 | for anno in dataset_dict["annotations"]:
106 | if not self.mask_on:
107 | anno.pop("segmentation", None)
108 | if not self.keypoint_on:
109 | anno.pop("keypoints", None)
110 | support_images, support_bboxes, support_cls = self.generate_support(dataset_dict)
111 | dataset_dict['support_images'] = torch.as_tensor(np.ascontiguousarray(support_images))
112 | dataset_dict['support_bboxes'] = support_bboxes
113 | dataset_dict['support_cls'] = support_cls
114 |
115 | if "annotations" not in dataset_dict:
116 | image, transforms = T.apply_transform_gens(
117 | ([self.crop_gen] if self.crop_gen else []) + self.tfm_gens, image
118 | )
119 | else:
120 | # Crop around an instance if there are instances in the image.
121 | # USER: Remove if you don't use cropping
122 | if self.crop_gen:
123 | crop_tfm = utils.gen_crop_transform_with_instance(
124 | self.crop_gen.get_crop_size(image.shape[:2]),
125 | image.shape[:2],
126 | np.random.choice(dataset_dict["annotations"]),
127 | )
128 | image = crop_tfm.apply_image(image)
129 | image, transforms = T.apply_transform_gens(self.tfm_gens, image)
130 | if self.crop_gen:
131 | transforms = crop_tfm + transforms
132 |
133 | image_shape = image.shape[:2] # h, w
134 |
135 | # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
136 | # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
137 | # Therefore it's important to use torch.Tensor.
138 | dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
139 |
140 | # USER: Remove if you don't use pre-computed proposals.
141 | # Most users would not need this feature.
142 | if self.load_proposals:
143 | utils.transform_proposals(
144 | dataset_dict,
145 | image_shape,
146 | transforms,
147 | self.proposal_min_box_size,
148 | self.proposal_topk,
149 | )
150 |
151 | if not self.is_train:
152 | # USER: Modify this if you want to keep them for some reason.
153 | dataset_dict.pop("annotations", None)
154 | dataset_dict.pop("sem_seg_file_name", None)
155 | return dataset_dict
156 |
157 | if "annotations" in dataset_dict:
158 | # USER: Modify this if you want to keep them for some reason.
159 | for anno in dataset_dict["annotations"]:
160 | if not self.mask_on:
161 | anno.pop("segmentation", None)
162 | if not self.keypoint_on:
163 | anno.pop("keypoints", None)
164 |
165 | # USER: Implement additional transformations if you have other types of data
166 | annos = [
167 | utils.transform_instance_annotations(
168 | obj, transforms, image_shape, keypoint_hflip_indices=self.keypoint_hflip_indices
169 | )
170 | for obj in dataset_dict.pop("annotations")
171 | if obj.get("iscrowd", 0) == 0
172 | ]
173 | instances = utils.annotations_to_instances(
174 | annos, image_shape, mask_format=self.mask_format
175 | )
176 | # Create a tight bounding box from masks, useful when image is cropped
177 | if self.crop_gen and instances.has("gt_masks"):
178 | instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
179 | dataset_dict["instances"] = utils.filter_empty_instances(instances)
180 |
181 | # USER: Remove if you don't do semantic/panoptic segmentation.
182 | if "sem_seg_file_name" in dataset_dict:
183 | with PathManager.open(dataset_dict.pop("sem_seg_file_name"), "rb") as f:
184 | sem_seg_gt = Image.open(f)
185 | sem_seg_gt = np.asarray(sem_seg_gt, dtype="uint8")
186 | sem_seg_gt = transforms.apply_segmentation(sem_seg_gt)
187 | sem_seg_gt = torch.as_tensor(sem_seg_gt.astype("long"))
188 | dataset_dict["sem_seg"] = sem_seg_gt
189 | return dataset_dict
190 |
191 | def generate_support(self, dataset_dict):
192 | support_way = self.support_way #2
193 | support_shot = self.support_shot #5
194 |
195 | id = dataset_dict['annotations'][0]['id']
196 | query_cls = self.support_df.loc[self.support_df['id']==id, 'category_id'].tolist()[0] # they share the same category_id and image_id
197 | query_img = self.support_df.loc[self.support_df['id']==id, 'image_id'].tolist()[0]
198 | all_cls = self.support_df.loc[self.support_df['image_id']==query_img, 'category_id'].tolist()
199 |
200 | # Crop support data and get new support box in the support data
201 | support_data_all = np.zeros((support_way * support_shot, 3, 320, 320), dtype = np.float32)
202 | support_box_all = np.zeros((support_way * support_shot, 4), dtype = np.float32)
203 | used_image_id = [query_img]
204 |
205 | used_id_ls = []
206 | for item in dataset_dict['annotations']:
207 | used_id_ls.append(item['id'])
208 | #used_category_id = [query_cls]
209 | used_category_id = list(set(all_cls))
210 | support_category_id = []
211 | mixup_i = 0
212 |
213 | for shot in range(support_shot):
214 | # Support image and box
215 | support_id = self.support_df.loc[(self.support_df['category_id'] == query_cls) & (~self.support_df['image_id'].isin(used_image_id)) & (~self.support_df['id'].isin(used_id_ls)), 'id'].sample(random_state=id).tolist()[0]
216 | support_cls = self.support_df.loc[self.support_df['id'] == support_id, 'category_id'].tolist()[0]
217 | support_img = self.support_df.loc[self.support_df['id'] == support_id, 'image_id'].tolist()[0]
218 | used_id_ls.append(support_id)
219 | used_image_id.append(support_img)
220 |
221 | support_db = self.support_df.loc[self.support_df['id'] == support_id, :]
222 | assert support_db['id'].values[0] == support_id
223 |
224 | support_data = utils.read_image('./datasets/coco/' + support_db["file_path"].tolist()[0], format=self.img_format)
225 | support_data = torch.as_tensor(np.ascontiguousarray(support_data.transpose(2, 0, 1)))
226 | support_box = support_db['support_box'].tolist()[0]
227 | #print(support_data)
228 | support_data_all[mixup_i] = support_data
229 | support_box_all[mixup_i] = support_box
230 | support_category_id.append(0) #support_cls)
231 | mixup_i += 1
232 |
233 | if support_way == 1:
234 | pass
235 | else:
236 | for way in range(support_way-1):
237 | other_cls = self.support_df.loc[(~self.support_df['category_id'].isin(used_category_id)), 'category_id'].drop_duplicates().sample(random_state=id).tolist()[0]
238 | used_category_id.append(other_cls)
239 | for shot in range(support_shot):
240 | # Support image and box
241 |
242 | support_id = self.support_df.loc[(self.support_df['category_id'] == other_cls) & (~self.support_df['image_id'].isin(used_image_id)) & (~self.support_df['id'].isin(used_id_ls)), 'id'].sample(random_state=id).tolist()[0]
243 |
244 | support_cls = self.support_df.loc[self.support_df['id'] == support_id, 'category_id'].tolist()[0]
245 | support_img = self.support_df.loc[self.support_df['id'] == support_id, 'image_id'].tolist()[0]
246 |
247 | used_id_ls.append(support_id)
248 | used_image_id.append(support_img)
249 |
250 | support_db = self.support_df.loc[self.support_df['id'] == support_id, :]
251 | assert support_db['id'].values[0] == support_id
252 |
253 | support_data = utils.read_image('./datasets/coco/' + support_db["file_path"].tolist()[0], format=self.img_format)
254 | support_data = torch.as_tensor(np.ascontiguousarray(support_data.transpose(2, 0, 1)))
255 | support_box = support_db['support_box'].tolist()[0]
256 | support_data_all[mixup_i] = support_data
257 | support_box_all[mixup_i] = support_box
258 | support_category_id.append(1) #support_cls)
259 | mixup_i += 1
260 |
261 | return support_data_all, support_box_all, support_category_id
262 |
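A hedged sketch of how `DatasetMapperWithSupport` is exercised during training. It assumes the support pickles referenced in `__init__` (`./datasets/coco/train_support_df.pkl` or the 10-shot variant) have been generated and that `cfg` carries the `INPUT.FS.*` keys; `record` stands for one dataset dict obtained from `DatasetCatalog`.

# Sketch only: requires the pre-generated support dataframes and a FewX cfg.
mapper = DatasetMapperWithSupport(cfg, is_train=True)
mapped = mapper(record)                      # record: one dict from DatasetCatalog.get(...)
# In training, `mapped` carries the usual detectron2 keys ("image", "instances")
# plus the support tensors added above: "support_images", "support_bboxes", "support_cls".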
--------------------------------------------------------------------------------
/fewx/data/datasets/__init__.py:
--------------------------------------------------------------------------------
1 | from . import builtin # ensure the builtin datasets are registered
2 | from .register_coco import register_coco_instances
3 |
4 | __all__ = [k for k in globals().keys() if "builtin" not in k and not k.startswith("_")]
5 |
--------------------------------------------------------------------------------
/fewx/data/datasets/builtin.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | from .register_coco import register_coco_instances
4 | from detectron2.data.datasets.builtin_meta import _get_builtin_metadata
5 |
6 | # ==== Predefined datasets and splits for COCO ==========
7 |
8 | _PREDEFINED_SPLITS_COCO = {}
9 | _PREDEFINED_SPLITS_COCO["coco"] = {
10 | "coco_2017_train_nonvoc": ("coco/train2017", "coco/new_annotations/final_split_non_voc_instances_train2017.json"),
11 | "coco_2017_train_voc_10_shot": ("coco/train2017", "coco/new_annotations/final_split_voc_10_shot_instances_train2017.json"),
12 | }
13 |
14 | def register_all_coco(root):
15 | for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_COCO.items():
16 | for key, (image_root, json_file) in splits_per_dataset.items():
17 | # Assume pre-defined datasets live in `./datasets`.
18 | register_coco_instances(
19 | key,
20 | _get_builtin_metadata(dataset_name),
21 | os.path.join(root, json_file) if "://" not in json_file else json_file,
22 | os.path.join(root, image_root),
23 | )
24 |
25 | # Register them all under "./datasets"
26 | _root = os.getenv("DETECTRON2_DATASETS", "datasets")
27 | register_all_coco(_root)
28 |
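Because registration happens as an import side effect, a quick sanity check is to import the package and list the catalog; listing is lazy, so the json files do not need to exist yet.

# Sanity check for the registrations above (listing the catalog is lazy).
import fewx.data.datasets  # noqa: F401  # registers the splits on import
from detectron2.data import DatasetCatalog, MetadataCatalog

assert "coco_2017_train_nonvoc" in DatasetCatalog.list()
assert "coco_2017_train_voc_10_shot" in DatasetCatalog.list()
print(MetadataCatalog.get("coco_2017_train_nonvoc").json_file)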
--------------------------------------------------------------------------------
/fewx/data/datasets/register_coco.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
2 | import copy
3 | import os
4 |
5 | from detectron2.data import DatasetCatalog, MetadataCatalog
6 |
7 | from detectron2.data.datasets.coco import load_coco_json, load_sem_seg
8 |
9 | """
10 | This file contains functions to register a COCO-format dataset to the DatasetCatalog.
11 | """
12 |
13 | __all__ = ["register_coco_instances"]
14 |
15 |
16 | def register_coco_instances(name, metadata, json_file, image_root):
17 | """
18 | Register a dataset in COCO's json annotation format for
19 | instance detection, instance segmentation and keypoint detection.
20 | (i.e., Type 1 and 2 in http://cocodataset.org/#format-data.
21 | `instances*.json` and `person_keypoints*.json` in the dataset).
22 | This is an example of how to register a new dataset.
23 | You can do something similar to this function, to register new datasets.
24 | Args:
25 | name (str): the name that identifies a dataset, e.g. "coco_2014_train".
26 | metadata (dict): extra metadata associated with this dataset. You can
27 | leave it as an empty dict.
28 | json_file (str): path to the json instance annotation file.
29 | image_root (str or path-like): directory which contains all the images.
30 | """
31 | assert isinstance(name, str), name
32 | assert isinstance(json_file, (str, os.PathLike)), json_file
33 | assert isinstance(image_root, (str, os.PathLike)), image_root
34 | # 1. register a function which returns dicts
35 | DatasetCatalog.register(name, lambda: load_coco_json(json_file, image_root, name, extra_annotation_keys=['id']))
36 |
37 | # 2. Optionally, add metadata about this dataset,
38 | # since they might be useful in evaluation, visualization or logging
39 | MetadataCatalog.get(name).set(
40 | json_file=json_file, image_root=image_root, evaluator_type="coco", **metadata
41 | )
42 |
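The helper above can also be called directly to register a custom split; the name and paths below are placeholders, not files shipped with this repository. Registration itself is lazy, so the json is only read on the first `DatasetCatalog.get` call.

# Illustrative registration of a custom COCO-format split (placeholder paths).
register_coco_instances(
    name="my_custom_coco_split",
    metadata={},
    json_file="datasets/coco/new_annotations/my_split.json",
    image_root="datasets/coco/train2017",
)
# dicts = DatasetCatalog.get("my_custom_coco_split")  # first call loads the json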
--------------------------------------------------------------------------------
/fewx/evaluation/__init__.py:
--------------------------------------------------------------------------------
1 | from .coco_evaluation import COCOEvaluator
2 |
3 | __all__ = [k for k in globals().keys() if not k.startswith("_")]
4 |
--------------------------------------------------------------------------------
/fewx/evaluation/coco_evaluation.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
2 | import contextlib
3 | import copy
4 | import io
5 | import itertools
6 | import json
7 | import logging
8 | import numpy as np
9 | import os
10 | import pickle
11 | from collections import OrderedDict
12 | import pycocotools.mask as mask_util
13 | import torch
14 | from fvcore.common.file_io import PathManager
15 | from pycocotools.coco import COCO
16 | from tabulate import tabulate
17 |
18 | import detectron2.utils.comm as comm
19 | from detectron2.data import MetadataCatalog
20 | from detectron2.data.datasets.coco import convert_to_coco_json
21 | from detectron2.evaluation.fast_eval_api import COCOeval_opt as COCOeval
22 | from detectron2.structures import Boxes, BoxMode, pairwise_iou
23 | from detectron2.utils.logger import create_small_table
24 |
25 | from detectron2.evaluation.evaluator import DatasetEvaluator
26 |
27 | CLASS_NAMES = [
28 | "airplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
29 | "chair", "cow", "dining table", "dog", "horse", "motorcycle", "person",
30 | "potted plant", "sheep", "couch", "train", "tv",
31 | ]
32 |
33 | class COCOEvaluator(DatasetEvaluator):
34 | """
35 | Evaluate AR for object proposals, AP for instance detection/segmentation, AP
36 | for keypoint detection outputs using COCO's metrics.
37 | See http://cocodataset.org/#detection-eval and
38 | http://cocodataset.org/#keypoints-eval to understand its metrics.
39 |
40 | In addition to COCO, this evaluator is able to support any bounding box detection,
41 | instance segmentation, or keypoint detection dataset.
42 | """
43 |
44 | def __init__(self, dataset_name, cfg, distributed, output_dir=None):
45 | """
46 | Args:
47 | dataset_name (str): name of the dataset to be evaluated.
48 | It must have either the following corresponding metadata:
49 |
50 | "json_file": the path to the COCO format annotation
51 |
52 | Or it must be in detectron2's standard dataset format
53 | so it can be converted to COCO format automatically.
54 | cfg (CfgNode): config instance
55 |             distributed (bool): if True, will collect results from all ranks and run evaluation
56 | in the main process.
57 | Otherwise, will evaluate the results in the current process.
58 | output_dir (str): optional, an output directory to dump all
59 | results predicted on the dataset. The dump contains two files:
60 |
61 |                 1. "instances_predictions.pth", a file in torch serialization
62 | format that contains all the raw original predictions.
63 | 2. "coco_instances_results.json" a json file in COCO's result
64 | format.
65 | """
66 | self._tasks = self._tasks_from_config(cfg)
67 | self._distributed = distributed
68 | self._output_dir = output_dir
69 |
70 | self._cpu_device = torch.device("cpu")
71 |
72 | # replace fewx with d2
73 | self._logger = logging.getLogger(__name__)
74 | self._metadata = MetadataCatalog.get(dataset_name)
75 | if not hasattr(self._metadata, "json_file"):
76 | self._logger.info(
77 | f"'{dataset_name}' is not registered by `register_coco_instances`."
78 | " Therefore trying to convert it to COCO format ..."
79 | )
80 |
81 | cache_path = os.path.join(output_dir, f"{dataset_name}_coco_format.json")
82 | self._metadata.json_file = cache_path
83 | convert_to_coco_json(dataset_name, cache_path)
84 |
85 | json_file = PathManager.get_local_path(self._metadata.json_file)
86 | with contextlib.redirect_stdout(io.StringIO()):
87 | self._coco_api = COCO(json_file)
88 |
89 | self._kpt_oks_sigmas = cfg.TEST.KEYPOINT_OKS_SIGMAS
90 | # Test set json files do not contain annotations (evaluation must be
91 | # performed using the COCO evaluation server).
92 | self._do_evaluation = "annotations" in self._coco_api.dataset
93 |
94 | def reset(self):
95 | self._predictions = []
96 |
97 | def _tasks_from_config(self, cfg):
98 | """
99 | Returns:
100 | tuple[str]: tasks that can be evaluated under the given configuration.
101 | """
102 | tasks = ("bbox",)
103 | if cfg.MODEL.MASK_ON:
104 | tasks = tasks + ("segm",)
105 | if cfg.MODEL.KEYPOINT_ON:
106 | tasks = tasks + ("keypoints",)
107 | return tasks
108 |
109 | def process(self, inputs, outputs):
110 | """
111 | Args:
112 | inputs: the inputs to a COCO model (e.g., GeneralizedRCNN).
113 | It is a list of dict. Each dict corresponds to an image and
114 | contains keys like "height", "width", "file_name", "image_id".
115 | outputs: the outputs of a COCO model. It is a list of dicts with key
116 | "instances" that contains :class:`Instances`.
117 | """
118 | for input, output in zip(inputs, outputs):
119 | prediction = {"image_id": input["image_id"]}
120 |
121 | # TODO this is ugly
122 | if "instances" in output:
123 | instances = output["instances"].to(self._cpu_device)
124 | prediction["instances"] = instances_to_coco_json(instances, input["image_id"])
125 | if "proposals" in output:
126 | prediction["proposals"] = output["proposals"].to(self._cpu_device)
127 | self._predictions.append(prediction)
128 |
129 | def evaluate(self):
130 | if self._distributed:
131 | comm.synchronize()
132 | predictions = comm.gather(self._predictions, dst=0)
133 | predictions = list(itertools.chain(*predictions))
134 |
135 | if not comm.is_main_process():
136 | return {}
137 | else:
138 | predictions = self._predictions
139 |
140 | if len(predictions) == 0:
141 | self._logger.warning("[COCOEvaluator] Did not receive valid predictions.")
142 | return {}
143 |
144 | if self._output_dir:
145 | PathManager.mkdirs(self._output_dir)
146 | file_path = os.path.join(self._output_dir, "instances_predictions.pth")
147 | with PathManager.open(file_path, "wb") as f:
148 | torch.save(predictions, f)
149 |
150 | self._results = OrderedDict()
151 | if "proposals" in predictions[0]:
152 | self._eval_box_proposals(predictions)
153 | if "instances" in predictions[0]:
154 | self._eval_predictions(set(self._tasks), predictions)
155 | # Copy so the caller can do whatever with results
156 | return copy.deepcopy(self._results)
157 |
158 | def _eval_predictions(self, tasks, predictions):
159 | """
160 | Evaluate predictions on the given tasks.
161 | Fill self._results with the metrics of the tasks.
162 | """
163 | self._logger.info("Preparing results for COCO format ...")
164 | coco_results = list(itertools.chain(*[x["instances"] for x in predictions]))
165 |
166 | # unmap the category ids for COCO
167 | if hasattr(self._metadata, "thing_dataset_id_to_contiguous_id"):
168 | reverse_id_mapping = {
169 | v: k for k, v in self._metadata.thing_dataset_id_to_contiguous_id.items()
170 | }
171 | for result in coco_results:
172 | category_id = result["category_id"]
173 | assert (
174 | category_id in reverse_id_mapping
175 | ), "A prediction has category_id={}, which is not available in the dataset.".format(
176 | category_id
177 | )
178 | result["category_id"] = reverse_id_mapping[category_id]
179 |
180 | if self._output_dir:
181 | file_path = os.path.join(self._output_dir, "coco_instances_results.json")
182 | self._logger.info("Saving results to {}".format(file_path))
183 | with PathManager.open(file_path, "w") as f:
184 | f.write(json.dumps(coco_results))
185 | f.flush()
186 |
187 | if not self._do_evaluation:
188 | self._logger.info("Annotations are not available for evaluation.")
189 | return
190 |
191 | self._logger.info("Evaluating predictions ...")
192 | for task in sorted(tasks):
193 | coco_eval = (
194 | _evaluate_predictions_on_coco(
195 | self._coco_api, coco_results, task, kpt_oks_sigmas=self._kpt_oks_sigmas
196 | )
197 | if len(coco_results) > 0
198 | else None # cocoapi does not handle empty results very well
199 | )
200 |
201 | res = self._derive_coco_results(
202 | coco_eval, task, class_names=self._metadata.get("thing_classes")
203 | )
204 | self._results[task] = res
205 |
206 | def _eval_box_proposals(self, predictions):
207 | """
208 | Evaluate the box proposals in predictions.
209 | Fill self._results with the metrics for "box_proposals" task.
210 | """
211 | if self._output_dir:
212 | # Saving generated box proposals to file.
213 | # Predicted box_proposals are in XYXY_ABS mode.
214 | bbox_mode = BoxMode.XYXY_ABS.value
215 | ids, boxes, objectness_logits = [], [], []
216 | for prediction in predictions:
217 | ids.append(prediction["image_id"])
218 | boxes.append(prediction["proposals"].proposal_boxes.tensor.numpy())
219 | objectness_logits.append(prediction["proposals"].objectness_logits.numpy())
220 |
221 | proposal_data = {
222 | "boxes": boxes,
223 | "objectness_logits": objectness_logits,
224 | "ids": ids,
225 | "bbox_mode": bbox_mode,
226 | }
227 | with PathManager.open(os.path.join(self._output_dir, "box_proposals.pkl"), "wb") as f:
228 | pickle.dump(proposal_data, f)
229 |
230 | if not self._do_evaluation:
231 | self._logger.info("Annotations are not available for evaluation.")
232 | return
233 |
234 | self._logger.info("Evaluating bbox proposals ...")
235 | res = {}
236 | areas = {"all": "", "small": "s", "medium": "m", "large": "l"}
237 | for limit in [100, 1000]:
238 | for area, suffix in areas.items():
239 | stats = _evaluate_box_proposals(predictions, self._coco_api, area=area, limit=limit)
240 | key = "AR{}@{:d}".format(suffix, limit)
241 | res[key] = float(stats["ar"].item() * 100)
242 | self._logger.info("Proposal metrics: \n" + create_small_table(res))
243 | self._results["box_proposals"] = res
244 |
245 | def _calculate_ap(self, class_names, precisions, T=None, A=None):
246 | ################## ap #####################
247 | voc_ls = []
248 | non_voc_ls = []
249 |
250 | for idx, name in enumerate(class_names):
251 | # area range index 0: all area ranges
252 | # max dets index -1: typically 100 per image
253 | if T is not None and A is None:
254 | precision = precisions[T, :, idx, 0, -1]
255 | elif A is not None and T is None:
256 | precision = precisions[:, :, idx, A, -1]
257 | elif T is None and A is None:
258 | precision = precisions[:, :, idx, 0, -1]
259 | else:
260 | assert False
261 |
262 | precision = precision[precision > -1]
263 | ap = np.mean(precision) if precision.size else float("nan")
264 |
265 | # calculate voc ap and non-voc ap
266 | if name in CLASS_NAMES:
267 | voc_ls.append(ap * 100)
268 | else:
269 | non_voc_ls.append(ap * 100)
270 | if len(voc_ls) > 0:
271 | voc_ap = sum(voc_ls) * 1.0 / len(voc_ls)
272 | else:
273 | voc_ap = 0.0
274 | if len(non_voc_ls) > 0:
275 | non_voc_ap = sum(non_voc_ls) * 1.0 / len(non_voc_ls)
276 | else:
277 | non_voc_ap = 0.0
278 |
279 | return voc_ap, non_voc_ap
280 |
281 | def _derive_coco_results(self, coco_eval, iou_type, class_names=None):
282 | """
283 | Derive the desired score numbers from summarized COCOeval.
284 |
285 | Args:
286 | coco_eval (None or COCOEval): None represents no predictions from model.
287 | iou_type (str):
288 |             class_names (None or list[str]): if provided, will use it to compute
289 | per-category AP.
290 |
291 | Returns:
292 | a dict of {metric name: score}
293 | """
294 |
295 | metrics = {
296 | "bbox": ["AP", "AP50", "AP75", "APs", "APm", "APl"],
297 | "segm": ["AP", "AP50", "AP75", "APs", "APm", "APl"],
298 | "keypoints": ["AP", "AP50", "AP75", "APm", "APl"],
299 | }[iou_type]
300 |
301 | if coco_eval is None:
302 | self._logger.warn("No predictions from the model!")
303 | return {metric: float("nan") for metric in metrics}
304 |
305 | # the standard metrics
306 | results = {
307 | metric: float(coco_eval.stats[idx] * 100 if coco_eval.stats[idx] >= 0 else "nan")
308 | for idx, metric in enumerate(metrics)
309 | }
310 | self._logger.info(
311 | "Evaluation results for {}: \n".format(iou_type) + create_small_table(results)
312 | )
313 | if not np.isfinite(sum(results.values())):
314 |             self._logger.info("Some metrics cannot be computed and are shown as NaN.")
315 |
316 | if class_names is None or len(class_names) <= 1:
317 | return results
318 | # Compute per-category AP
319 | # from https://github.com/facebookresearch/Detectron/blob/a6a835f5b8208c45d0dce217ce9bbda915f44df7/detectron/datasets/json_dataset_evaluator.py#L222-L252 # noqa
320 | precisions = coco_eval.eval["precision"]
321 | # precision has dims (iou, recall, cls, area range, max dets)
322 | assert len(class_names) == precisions.shape[2]
323 |
324 | results_per_category = []
325 | voc_ls = []
326 | non_voc_ls = []
327 |
328 | ################## ap #####################
329 | for idx, name in enumerate(class_names):
330 | # area range index 0: all area ranges
331 | # max dets index -1: typically 100 per image
332 | precision = precisions[:, :, idx, 0, -1]
333 | precision = precision[precision > -1]
334 | ap = np.mean(precision) if precision.size else float("nan")
335 | results_per_category.append(("{}".format(name), float(ap * 100)))
336 | # calculate voc and non voc metrics
337 | voc_ap, non_voc_ap = self._calculate_ap(class_names, precisions)
338 | voc_ap_50, non_voc_ap_50 = self._calculate_ap(class_names, precisions, T=0) # T=0, iou_thresh = 0.5
339 | voc_ap_75, non_voc_ap_75 = self._calculate_ap(class_names, precisions, T=5) # T=5, iou_thresh = 0.75
340 |
341 | voc_ap_small, non_voc_ap_small = self._calculate_ap(class_names, precisions, A=1) # A=1, small
342 | voc_ap_medium, non_voc_ap_medium = self._calculate_ap(class_names, precisions, A=2) # A=2, medium
343 | voc_ap_large, non_voc_ap_large = self._calculate_ap(class_names, precisions, A=3) # A=3, large
344 |
345 | # print voc ap
346 | self._logger.info("Evaluation results for VOC 20 categories =======> AP : " + str('%.2f' % voc_ap))
347 | self._logger.info("Evaluation results for VOC 20 categories =======> AP50: " + str('%.2f' % voc_ap_50))
348 | self._logger.info("Evaluation results for VOC 20 categories =======> AP75: " + str('%.2f' % voc_ap_75))
349 | self._logger.info("Evaluation results for VOC 20 categories =======> APs : " + str('%.2f' % voc_ap_small))
350 | self._logger.info("Evaluation results for VOC 20 categories =======> APm : " + str('%.2f' % voc_ap_medium))
351 | self._logger.info("Evaluation results for VOC 20 categories =======> APl : " + str('%.2f' % voc_ap_large))
352 |
353 |         # print non-voc ap
354 | self._logger.info("Evaluation results for Non VOC 60 categories =======> AP : " + str('%.2f' % non_voc_ap))
355 | self._logger.info("Evaluation results for Non VOC 60 categories =======> AP50: " + str('%.2f' % non_voc_ap_50))
356 | self._logger.info("Evaluation results for Non VOC 60 categories =======> AP75: " + str('%.2f' % non_voc_ap_75))
357 | self._logger.info("Evaluation results for Non VOC 60 categories =======> APs : " + str('%.2f' % non_voc_ap_small))
358 | self._logger.info("Evaluation results for Non VOC 60 categories =======> APm : " + str('%.2f' % non_voc_ap_medium))
359 | self._logger.info("Evaluation results for Non VOC 60 categories =======> APl : " + str('%.2f' % non_voc_ap_large))
360 |
361 | '''
362 | # log evaluation results in csv
363 | # type, AP, AP50, AP75, APs, APm, APl
364 | eval_log = iou_type + ',' + 'AP,AP50,AP75,APs,APm,APl' + '\n'
365 | eval_log += 'all' + ',' + str('%.2f' % results['AP']) + ',' + str('%.2f' % results['AP50']) + ',' + str('%.2f' % results['AP75']) + ',' + str('%.2f' % results['APs']) + ',' + str('%.2f' % results['APm']) + ',' + str('%.2f' % results['APl']) + '\n'
366 | eval_log += 'VOC' + ',' + str('%.2f' % voc_ap) + ',' + str('%.2f' % voc_ap_50) + ',' + str('%.2f' % voc_ap_75) + ',' + str('%.2f' % voc_ap_small) + ',' + str('%.2f' % voc_ap_medium) + ',' + str('%.2f' % voc_ap_large) + '\n'
367 | eval_log += 'Non-VOC' + ',' + str('%.2f' % non_voc_ap) + ',' + str('%.2f' % non_voc_ap_50) + ',' + str('%.2f' % non_voc_ap_75) + ',' + str('%.2f' % non_voc_ap_small) + ',' + str('%.2f' % non_voc_ap_medium) + ',' + str('%.2f' % non_voc_ap_large) + '\n'
368 |
369 | with open('evaluation_result.csv', 'a') as f:
370 | f.write(eval_log)
371 | '''
372 | # tabulate it
373 | N_COLS = min(6, len(results_per_category) * 2)
374 | results_flatten = list(itertools.chain(*results_per_category))
375 | results_2d = itertools.zip_longest(*[results_flatten[i::N_COLS] for i in range(N_COLS)])
376 | table = tabulate(
377 | results_2d,
378 | tablefmt="pipe",
379 | floatfmt=".3f",
380 | headers=["category", "AP"] * (N_COLS // 2),
381 | numalign="left",
382 | )
383 | self._logger.info("Per-category {} AP: \n".format(iou_type) + table)
384 |
385 | results.update({"AP-" + name: ap for name, ap in results_per_category})
386 | return results
387 |
388 | def instances_to_coco_json(instances, img_id):
389 | """
390 | Dump an "Instances" object to a COCO-format json that's used for evaluation.
391 |
392 | Args:
393 | instances (Instances):
394 | img_id (int): the image id
395 |
396 | Returns:
397 | list[dict]: list of json annotations in COCO format.
398 | """
399 | num_instance = len(instances)
400 | if num_instance == 0:
401 | return []
402 |
403 | boxes = instances.pred_boxes.tensor.numpy()
404 | boxes = BoxMode.convert(boxes, BoxMode.XYXY_ABS, BoxMode.XYWH_ABS)
405 | boxes = boxes.tolist()
406 | scores = instances.scores.tolist()
407 | classes = instances.pred_classes.tolist()
408 |
409 | has_mask = instances.has("pred_masks")
410 | if has_mask:
411 |         # use RLE to encode the masks, because they are too large and take too much memory
412 | # since this evaluator stores outputs of the entire dataset
413 | rles = [
414 | mask_util.encode(np.array(mask[:, :, None], order="F", dtype="uint8"))[0]
415 | for mask in instances.pred_masks
416 | ]
417 | for rle in rles:
418 | # "counts" is an array encoded by mask_util as a byte-stream. Python3's
419 | # json writer which always produces strings cannot serialize a bytestream
420 | # unless you decode it. Thankfully, utf-8 works out (which is also what
421 | # the pycocotools/_mask.pyx does).
422 | rle["counts"] = rle["counts"].decode("utf-8")
423 |
424 | has_keypoints = instances.has("pred_keypoints")
425 | if has_keypoints:
426 | keypoints = instances.pred_keypoints
427 |
428 | results = []
429 | for k in range(num_instance):
430 | result = {
431 | "image_id": img_id,
432 | "category_id": classes[k],
433 | "bbox": boxes[k],
434 | "score": scores[k],
435 | }
436 | if has_mask:
437 | result["segmentation"] = rles[k]
438 | if has_keypoints:
439 | # In COCO annotations,
440 | # keypoints coordinates are pixel indices.
441 | # However our predictions are floating point coordinates.
442 | # Therefore we subtract 0.5 to be consistent with the annotation format.
443 | # This is the inverse of data loading logic in `datasets/coco.py`.
444 | keypoints[k][:, :2] -= 0.5
445 | result["keypoints"] = keypoints[k].flatten().tolist()
446 | results.append(result)
447 | return results
448 |
449 |
450 | # inspired by Detectron:
451 | # https://github.com/facebookresearch/Detectron/blob/a6a835f5b8208c45d0dce217ce9bbda915f44df7/detectron/datasets/json_dataset_evaluator.py#L255 # noqa
452 | def _evaluate_box_proposals(dataset_predictions, coco_api, thresholds=None, area="all", limit=None):
453 | """
454 | Evaluate detection proposal recall metrics. This function is a much
455 | faster alternative to the official COCO API recall evaluation code. However,
456 | it produces slightly different results.
457 | """
458 | # Record max overlap value for each gt box
459 | # Return vector of overlap values
460 | areas = {
461 | "all": 0,
462 | "small": 1,
463 | "medium": 2,
464 | "large": 3,
465 | "96-128": 4,
466 | "128-256": 5,
467 | "256-512": 6,
468 | "512-inf": 7,
469 | }
470 | area_ranges = [
471 | [0 ** 2, 1e5 ** 2], # all
472 | [0 ** 2, 32 ** 2], # small
473 | [32 ** 2, 96 ** 2], # medium
474 | [96 ** 2, 1e5 ** 2], # large
475 | [96 ** 2, 128 ** 2], # 96-128
476 | [128 ** 2, 256 ** 2], # 128-256
477 | [256 ** 2, 512 ** 2], # 256-512
478 | [512 ** 2, 1e5 ** 2],
479 | ] # 512-inf
480 | assert area in areas, "Unknown area range: {}".format(area)
481 | area_range = area_ranges[areas[area]]
482 | gt_overlaps = []
483 | num_pos = 0
484 |
485 | for prediction_dict in dataset_predictions:
486 | predictions = prediction_dict["proposals"]
487 |
488 | # sort predictions in descending order
489 | # TODO maybe remove this and make it explicit in the documentation
490 | inds = predictions.objectness_logits.sort(descending=True)[1]
491 | predictions = predictions[inds]
492 |
493 | ann_ids = coco_api.getAnnIds(imgIds=prediction_dict["image_id"])
494 | anno = coco_api.loadAnns(ann_ids)
495 | gt_boxes = [
496 | BoxMode.convert(obj["bbox"], BoxMode.XYWH_ABS, BoxMode.XYXY_ABS)
497 | for obj in anno
498 | if obj["iscrowd"] == 0
499 | ]
500 | gt_boxes = torch.as_tensor(gt_boxes).reshape(-1, 4) # guard against no boxes
501 | gt_boxes = Boxes(gt_boxes)
502 | gt_areas = torch.as_tensor([obj["area"] for obj in anno if obj["iscrowd"] == 0])
503 |
504 | if len(gt_boxes) == 0 or len(predictions) == 0:
505 | continue
506 |
507 | valid_gt_inds = (gt_areas >= area_range[0]) & (gt_areas <= area_range[1])
508 | gt_boxes = gt_boxes[valid_gt_inds]
509 |
510 | num_pos += len(gt_boxes)
511 |
512 | if len(gt_boxes) == 0:
513 | continue
514 |
515 | if limit is not None and len(predictions) > limit:
516 | predictions = predictions[:limit]
517 |
518 | overlaps = pairwise_iou(predictions.proposal_boxes, gt_boxes)
519 |
520 | _gt_overlaps = torch.zeros(len(gt_boxes))
521 | for j in range(min(len(predictions), len(gt_boxes))):
522 | # find which proposal box maximally covers each gt box
523 | # and get the iou amount of coverage for each gt box
524 | max_overlaps, argmax_overlaps = overlaps.max(dim=0)
525 |
526 | # find which gt box is 'best' covered (i.e. 'best' = most iou)
527 | gt_ovr, gt_ind = max_overlaps.max(dim=0)
528 | assert gt_ovr >= 0
529 | # find the proposal box that covers the best covered gt box
530 | box_ind = argmax_overlaps[gt_ind]
531 | # record the iou coverage of this gt box
532 | _gt_overlaps[j] = overlaps[box_ind, gt_ind]
533 | assert _gt_overlaps[j] == gt_ovr
534 | # mark the proposal box and the gt box as used
535 | overlaps[box_ind, :] = -1
536 | overlaps[:, gt_ind] = -1
537 |
538 | # append recorded iou coverage level
539 | gt_overlaps.append(_gt_overlaps)
540 | gt_overlaps = (
541 | torch.cat(gt_overlaps, dim=0) if len(gt_overlaps) else torch.zeros(0, dtype=torch.float32)
542 | )
543 | gt_overlaps, _ = torch.sort(gt_overlaps)
544 |
545 | if thresholds is None:
546 | step = 0.05
547 | thresholds = torch.arange(0.5, 0.95 + 1e-5, step, dtype=torch.float32)
548 | recalls = torch.zeros_like(thresholds)
549 | # compute recall for each iou threshold
550 | for i, t in enumerate(thresholds):
551 | recalls[i] = (gt_overlaps >= t).float().sum() / float(num_pos)
552 | # ar = 2 * np.trapz(recalls, thresholds)
553 | ar = recalls.mean()
554 | return {
555 | "ar": ar,
556 | "recalls": recalls,
557 | "thresholds": thresholds,
558 | "gt_overlaps": gt_overlaps,
559 | "num_pos": num_pos,
560 | }
561 |
562 |
563 | def _evaluate_predictions_on_coco(coco_gt, coco_results, iou_type, kpt_oks_sigmas=None):
564 | """
565 | Evaluate the coco results using COCOEval API.
566 | """
567 | assert len(coco_results) > 0
568 |
569 | if iou_type == "segm":
570 | coco_results = copy.deepcopy(coco_results)
571 | # When evaluating mask AP, if the results contain bbox, cocoapi will
572 | # use the box area as the area of the instance, instead of the mask area.
573 | # This leads to a different definition of small/medium/large.
574 | # We remove the bbox field to let mask AP use mask area.
575 | for c in coco_results:
576 | c.pop("bbox", None)
577 |
578 | coco_dt = coco_gt.loadRes(coco_results)
579 | coco_eval = COCOeval(coco_gt, coco_dt, iou_type)
580 |
581 | if iou_type == "keypoints":
582 | # Use the COCO default keypoint OKS sigmas unless overrides are specified
583 | if kpt_oks_sigmas:
584 | assert hasattr(coco_eval.params, "kpt_oks_sigmas"), "pycocotools is too old!"
585 | coco_eval.params.kpt_oks_sigmas = np.array(kpt_oks_sigmas)
586 | # COCOAPI requires every detection and every gt to have keypoints, so
587 | # we just take the first entry from both
588 | num_keypoints_dt = len(coco_results[0]["keypoints"]) // 3
589 | num_keypoints_gt = len(next(iter(coco_gt.anns.values()))["keypoints"]) // 3
590 | num_keypoints_oks = len(coco_eval.params.kpt_oks_sigmas)
591 | assert num_keypoints_oks == num_keypoints_dt == num_keypoints_gt, (
592 | f"[COCOEvaluator] Prediction contain {num_keypoints_dt} keypoints. "
593 | f"Ground truth contains {num_keypoints_gt} keypoints. "
594 | f"The length of cfg.TEST.KEYPOINT_OKS_SIGMAS is {num_keypoints_oks}. "
595 | "They have to agree with each other. For meaning of OKS, please refer to "
596 | "http://cocodataset.org/#keypoints-eval."
597 | )
598 |
599 | coco_eval.evaluate()
600 | coco_eval.accumulate()
601 | coco_eval.summarize()
602 |
603 | return coco_eval
604 |
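A hedged sketch of the usual way this evaluator is wired into evaluation, following the standard detectron2 pattern; `cfg`, `model`, and `val_loader` are assumed to be built elsewhere (e.g. with `build_detection_test_loader` from `fewx/data/build.py`).

# Evaluation sketch (cfg, model and val_loader are assumed to exist).
from detectron2.evaluation import inference_on_dataset

dataset_name = cfg.DATASETS.TEST[0]
evaluator = COCOEvaluator(dataset_name, cfg, distributed=True, output_dir="./output")
results = inference_on_dataset(model, val_loader, evaluator)
# results["bbox"] contains AP/AP50/... plus per-category "AP-<name>" entries;
# the VOC / non-VOC breakdown is written to the log by _derive_coco_results.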
--------------------------------------------------------------------------------
/fewx/layers/__init__.py:
--------------------------------------------------------------------------------
1 | from .deform_conv import DFConv2d
2 | from .ml_nms import ml_nms
3 | from .iou_loss import IOULoss
4 | from .conv_with_kaiming_uniform import conv_with_kaiming_uniform
5 | from .naive_group_norm import NaiveGroupNorm
6 | from .boundary import get_instances_contour_interior
7 | from .misc import interpolate
8 |
9 | __all__ = [k for k in globals().keys() if not k.startswith("_")]
10 |
--------------------------------------------------------------------------------
/fewx/layers/boundary.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | Created on Sat Jan 11 00:22:15 2020
5 |
6 | @author: fanq15
7 | """
8 | import numpy as np
9 | from PIL import Image, ImageDraw  # needed by get_center() below
10 | from skimage import filters, img_as_ubyte
11 | from skimage.morphology import remove_small_objects, dilation, erosion, binary_dilation, binary_erosion, square
12 | #from scipy.ndimage.interpolation import map_coordinates
13 | #from scipy.ndimage.morphology import binary_fill_holes
14 | from scipy.ndimage.filters import gaussian_filter
15 | from scipy.ndimage.measurements import center_of_mass
16 |
17 | def get_contour_interior(mask, bold=False):
18 | if True: #'camunet' == config['param']['model']:
19 | # 2-pixel contour (1out+1in), 2-pixel shrinked interior
20 | outer = binary_dilation(mask) #, square(9))
21 | if bold:
22 | outer = binary_dilation(outer) #, square(9))
23 | inner = binary_erosion(mask) #, square(9))
24 | contour = ((outer != inner) > 0).astype(np.uint8)
25 | interior = (erosion(inner) > 0).astype(np.uint8)
26 | else:
27 | contour = filters.scharr(mask)
28 | scharr_threshold = np.amax(abs(contour)) / 2.
29 | contour = (np.abs(contour) > scharr_threshold).astype(np.uint8)
30 | interior = (mask - contour > 0).astype(np.uint8)
31 | return contour, interior
32 |
33 | def get_center(mask):
34 | r = 2
35 | y, x = center_of_mass(mask)
36 | center_img = Image.fromarray(np.zeros_like(mask).astype(np.uint8))
37 | if not np.isnan(x) and not np.isnan(y):
38 | draw = ImageDraw.Draw(center_img)
39 | draw.ellipse([x-r, y-r, x+r, y+r], fill='White')
40 | center = np.asarray(center_img)
41 | return center
42 |
43 | def get_instances_contour_interior(instances_mask):
44 | adjacent_boundary_only = False #False #config['contour'].getboolean('adjacent_boundary_only')
45 | instances_mask = instances_mask.data
46 | result_c = np.zeros_like(instances_mask, dtype=np.uint8)
47 | result_i = np.zeros_like(instances_mask, dtype=np.uint8)
48 | weight = np.ones_like(instances_mask, dtype=np.float32)
49 | #masks = decompose_mask(instances_mask)
50 | #for m in masks:
51 | contour, interior = get_contour_interior(instances_mask, bold=adjacent_boundary_only)
52 | #center = get_center(m)
53 | if adjacent_boundary_only:
54 | result_c += contour
55 | else:
56 | result_c = np.maximum(result_c, contour)
57 | result_i = np.maximum(result_i, interior)
58 | #contour += center
59 | contour = np.where(contour > 0, 1, 0)
60 | # magic number 50 make weight distributed to [1, 5) roughly
61 | weight *= (1 + gaussian_filter(contour, sigma=1) / 50)
62 | if adjacent_boundary_only:
63 | result_c = (result_c > 1).astype(np.uint8)
64 | return result_c, result_i, weight
65 |
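A small self-contained check of `get_contour_interior` on a toy binary mask; the mask size and the square are arbitrary.

# Toy check of get_contour_interior on a synthetic binary mask.
import numpy as np

mask = np.zeros((32, 32), dtype=np.uint8)
mask[8:24, 8:24] = 1                          # a filled square

contour, interior = get_contour_interior(mask)
print(contour.shape, interior.shape)          # both (32, 32), dtype uint8
assert not np.any(contour & interior)         # the contour band and the interior are disjoint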
--------------------------------------------------------------------------------
/fewx/layers/conv_with_kaiming_uniform.py:
--------------------------------------------------------------------------------
1 | from torch import nn
2 |
3 | from detectron2.layers import Conv2d
4 | from .deform_conv import DFConv2d
5 | from detectron2.layers.batch_norm import get_norm
6 |
7 |
8 | def conv_with_kaiming_uniform(
9 | norm=None, activation=None,
10 | use_deformable=False, use_sep=False):
11 | def make_conv(
12 | in_channels, out_channels, kernel_size, stride=1, dilation=1
13 | ):
14 | if use_deformable:
15 | conv_func = DFConv2d
16 | else:
17 | conv_func = Conv2d
18 | if use_sep:
19 | assert in_channels == out_channels
20 | groups = in_channels
21 | else:
22 | groups = 1
23 | conv = conv_func(
24 | in_channels,
25 | out_channels,
26 | kernel_size=kernel_size,
27 | stride=stride,
28 | padding=dilation * (kernel_size - 1) // 2,
29 | dilation=dilation,
30 | groups=groups,
31 | bias=(norm is None)
32 | )
33 | if not use_deformable:
34 | # Caffe2 implementation uses XavierFill, which in fact
35 | # corresponds to kaiming_uniform_ in PyTorch
36 | nn.init.kaiming_uniform_(conv.weight, a=1)
37 | if norm is None:
38 | nn.init.constant_(conv.bias, 0)
39 | module = [conv,]
40 | if norm is not None and len(norm) > 0:
41 | if norm == "GN":
42 | norm_module = nn.GroupNorm(32, out_channels)
43 | else:
44 | norm_module = get_norm(norm, out_channels)
45 | module.append(norm_module)
46 | if activation is not None:
47 | module.append(nn.ReLU(inplace=True))
48 | if len(module) > 1:
49 | return nn.Sequential(*module)
50 | return conv
51 |
52 | return make_conv
53 |
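A short example of how the factory above is used: bind the normalization and activation choices once, then stamp out conv blocks. The channel sizes below are arbitrary (the "GN" branch uses a hard-coded 32 groups, so out_channels must be divisible by 32).

# Example: 3x3 conv + GroupNorm + ReLU block built with the factory above.
import torch

conv_builder = conv_with_kaiming_uniform(norm="GN", activation=True)
block = conv_builder(64, 64, kernel_size=3)   # returns an nn.Sequential

x = torch.randn(2, 64, 32, 32)
print(block(x).shape)                         # torch.Size([2, 64, 32, 32]); padding keeps the size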
--------------------------------------------------------------------------------
/fewx/layers/deform_conv.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch import nn
3 |
4 | from detectron2.layers import Conv2d
5 |
6 |
7 | class _NewEmptyTensorOp(torch.autograd.Function):
8 | @staticmethod
9 | def forward(ctx, x, new_shape):
10 | ctx.shape = x.shape
11 | return x.new_empty(new_shape)
12 |
13 | @staticmethod
14 | def backward(ctx, grad):
15 | shape = ctx.shape
16 | return _NewEmptyTensorOp.apply(grad, shape), None
17 |
18 |
19 | class DFConv2d(nn.Module):
20 | """
21 | Deformable convolutional layer with configurable
22 | deformable groups, dilations and groups.
23 |
24 | Code is from:
25 | https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/maskrcnn_benchmark/layers/misc.py
26 |
27 |
28 | """
29 | def __init__(
30 | self,
31 | in_channels,
32 | out_channels,
33 | with_modulated_dcn=True,
34 | kernel_size=3,
35 | stride=1,
36 | groups=1,
37 | dilation=1,
38 | deformable_groups=1,
39 | bias=False,
40 | padding=None
41 | ):
42 | super(DFConv2d, self).__init__()
43 | if isinstance(kernel_size, (list, tuple)):
44 | assert isinstance(stride, (list, tuple))
45 | assert isinstance(dilation, (list, tuple))
46 | assert len(kernel_size) == 2
47 | assert len(stride) == 2
48 | assert len(dilation) == 2
49 | padding = (
50 | dilation[0] * (kernel_size[0] - 1) // 2,
51 | dilation[1] * (kernel_size[1] - 1) // 2
52 | )
53 | offset_base_channels = kernel_size[0] * kernel_size[1]
54 | else:
55 | padding = dilation * (kernel_size - 1) // 2
56 | offset_base_channels = kernel_size * kernel_size
57 | if with_modulated_dcn:
58 | from detectron2.layers.deform_conv import ModulatedDeformConv
59 | offset_channels = offset_base_channels * 3 # default: 27
60 | conv_block = ModulatedDeformConv
61 | else:
62 | from detectron2.layers.deform_conv import DeformConv
63 | offset_channels = offset_base_channels * 2 # default: 18
64 | conv_block = DeformConv
65 | self.offset = Conv2d(
66 | in_channels,
67 | deformable_groups * offset_channels,
68 | kernel_size=kernel_size,
69 | stride=stride,
70 | padding=padding,
71 | groups=1,
72 | dilation=dilation
73 | )
74 | for l in [self.offset, ]:
75 | nn.init.kaiming_uniform_(l.weight, a=1)
76 | torch.nn.init.constant_(l.bias, 0.)
77 | self.conv = conv_block(
78 | in_channels,
79 | out_channels,
80 | kernel_size=kernel_size,
81 | stride=stride,
82 | padding=padding,
83 | dilation=dilation,
84 | groups=groups,
85 | deformable_groups=deformable_groups,
86 | bias=bias
87 | )
88 | self.with_modulated_dcn = with_modulated_dcn
89 | self.kernel_size = kernel_size
90 | self.stride = stride
91 | self.padding = padding
92 | self.dilation = dilation
93 | self.offset_split = offset_base_channels * deformable_groups * 2
94 |
95 | def forward(self, x, return_offset=False):
96 | if x.numel() > 0:
97 | if not self.with_modulated_dcn:
98 | offset_mask = self.offset(x)
99 | x = self.conv(x, offset_mask)
100 | else:
101 | offset_mask = self.offset(x)
102 | offset = offset_mask[:, :self.offset_split, :, :]
103 | mask = offset_mask[:, self.offset_split:, :, :].sigmoid()
104 | x = self.conv(x, offset, mask)
105 | if return_offset:
106 | return x, offset_mask
107 | return x
108 | # get output shape
109 | output_shape = [
110 | (i + 2 * p - (di * (k - 1) + 1)) // d + 1
111 | for i, p, di, k, d in zip(
112 | x.shape[-2:],
113 | self.padding,
114 | self.dilation,
115 | self.kernel_size,
116 | self.stride
117 | )
118 | ]
119 | output_shape = [x.shape[0], self.conv.weight.shape[0]] + output_shape
120 | return _NewEmptyTensorOp.apply(x, output_shape)
121 |
--------------------------------------------------------------------------------
/fewx/layers/iou_loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch import nn
3 |
4 |
5 | class IOULoss(nn.Module):
6 | """
7 |     Intersection Over Union (IoU) loss which supports three
8 | different IoU computations:
9 | * IoU
10 | * Linear IoU
11 | * gIoU
12 | """
13 | def __init__(self, loc_loss_type='iou'):
14 | super(IOULoss, self).__init__()
15 | self.loc_loss_type = loc_loss_type
16 |
17 | def forward(self, pred, target, weight=None):
18 | """
19 | Args:
20 | pred: Nx4 predicted bounding boxes
21 | target: Nx4 target bounding boxes
22 | weight: N loss weight for each instance
23 | """
24 | pred_left = pred[:, 0]
25 | pred_top = pred[:, 1]
26 | pred_right = pred[:, 2]
27 | pred_bottom = pred[:, 3]
28 |
29 | target_left = target[:, 0]
30 | target_top = target[:, 1]
31 | target_right = target[:, 2]
32 | target_bottom = target[:, 3]
33 |
34 |         target_area = (target_left + target_right) * \
35 |                       (target_top + target_bottom)
36 |         pred_area = (pred_left + pred_right) * \
37 |                       (pred_top + pred_bottom)
38 |
39 | w_intersect = torch.min(pred_left, target_left) + \
40 | torch.min(pred_right, target_right)
41 | h_intersect = torch.min(pred_bottom, target_bottom) + \
42 | torch.min(pred_top, target_top)
43 |
44 | g_w_intersect = torch.max(pred_left, target_left) + \
45 | torch.max(pred_right, target_right)
46 | g_h_intersect = torch.max(pred_bottom, target_bottom) + \
47 | torch.max(pred_top, target_top)
48 |         ac_union = g_w_intersect * g_h_intersect
49 |
50 |         area_intersect = w_intersect * h_intersect
51 |         area_union = target_area + pred_area - area_intersect
52 |
53 |         ious = (area_intersect + 1.0) / (area_union + 1.0)
54 |         gious = ious - (ac_union - area_union) / ac_union
55 | if self.loc_loss_type == 'iou':
56 | losses = -torch.log(ious)
57 | elif self.loc_loss_type == 'linear_iou':
58 | losses = 1 - ious
59 | elif self.loc_loss_type == 'giou':
60 | losses = 1 - gious
61 | else:
62 | raise NotImplementedError
63 |
64 | if weight is not None:
65 | return (losses * weight).sum()
66 | else:
67 | return losses.sum()
68 |
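Despite the "Nx4 bounding boxes" wording in the docstring, the arithmetic above treats each row as (left, top, right, bottom) distances from a shared location, as in FCOS-style box regression. A toy example with the GIoU variant:

# Toy example; each row is a set of (left, top, right, bottom) distances.
import torch

pred = torch.tensor([[10., 10., 10., 10.],
                     [ 5., 20.,  5., 20.]])
target = torch.tensor([[12., 10., 10.,  8.],
                       [ 5., 20.,  5., 20.]])     # second row matches exactly, contributes zero loss

criterion = IOULoss(loc_loss_type='giou')
print(criterion(pred, target))                                    # summed GIoU loss
print(criterion(pred, target, weight=torch.tensor([1.0, 0.0])))   # only the first row counts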
--------------------------------------------------------------------------------
/fewx/layers/misc.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | """
3 | Helper functions that support empty tensors in some nn functions.
4 | Ideally, add support directly in PyTorch to empty tensors in
5 | those functions.
6 | This can be removed once https://github.com/pytorch/pytorch/issues/12013
7 | is implemented
8 | """
9 |
10 | import math
11 | import torch
12 | from torch.nn.modules.utils import _ntuple
13 | from torch import nn
14 |
15 | class _NewEmptyTensorOp(torch.autograd.Function):
16 | @staticmethod
17 | def forward(ctx, x, new_shape):
18 | ctx.shape = x.shape
19 | return x.new_empty(new_shape)
20 |
21 | @staticmethod
22 | def backward(ctx, grad):
23 | shape = ctx.shape
24 | return _NewEmptyTensorOp.apply(grad, shape), None
25 |
26 |
27 | def interpolate(
28 | input, size=None, scale_factor=None, mode="nearest", align_corners=None
29 | ):
30 | if input.numel() > 0:
31 | return torch.nn.functional.interpolate(
32 | input, size, scale_factor, mode, align_corners
33 | )
34 |
35 | def _check_size_scale_factor(dim):
36 | if size is None and scale_factor is None:
37 | raise ValueError("either size or scale_factor should be defined")
38 | if size is not None and scale_factor is not None:
39 | raise ValueError("only one of size or scale_factor should be defined")
40 | if (
41 | scale_factor is not None
42 | and isinstance(scale_factor, tuple)
43 | and len(scale_factor) != dim
44 | ):
45 | raise ValueError(
46 | "scale_factor shape must match input shape. "
47 | "Input is {}D, scale_factor size is {}".format(dim, len(scale_factor))
48 | )
49 |
50 | def _output_size(dim):
51 | _check_size_scale_factor(dim)
52 | if size is not None:
53 | return size
54 | scale_factors = _ntuple(dim)(scale_factor)
55 | # math.floor might return float in py2.7
56 | return [
57 | int(math.floor(input.size(i + 2) * scale_factors[i])) for i in range(dim)
58 | ]
59 |
60 | output_shape = tuple(_output_size(2))
61 | output_shape = input.shape[:-2] + output_shape
62 | return _NewEmptyTensorOp.apply(input, output_shape)
63 |
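A quick demonstration of the fallback above: non-empty inputs are forwarded to `torch.nn.functional.interpolate`, while empty inputs get a correctly shaped empty result instead of an error.

# Demonstration of the empty-tensor fallback.
import torch

x = torch.randn(2, 3, 8, 8)
print(interpolate(x, scale_factor=2, mode="nearest").shape)       # torch.Size([2, 3, 16, 16])

empty = torch.randn(0, 3, 8, 8)
print(interpolate(empty, scale_factor=2, mode="nearest").shape)   # torch.Size([0, 3, 16, 16])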
--------------------------------------------------------------------------------
/fewx/layers/ml_nms.py:
--------------------------------------------------------------------------------
1 | from detectron2.layers import batched_nms
2 |
3 |
4 | def ml_nms(boxlist, nms_thresh, max_proposals=-1,
5 | score_field="scores", label_field="labels"):
6 | """
7 | Performs non-maximum suppression on a boxlist, with scores specified
8 | in a boxlist field via score_field.
9 |
10 | Args:
11 |         boxlist (detectron2.structures.Instances): must have the fields `pred_boxes`, `scores` and `pred_classes`
12 | nms_thresh (float):
13 | max_proposals (int): if > 0, then only the top max_proposals are kept
14 | after non-maximum suppression
15 | score_field (str):
16 | """
17 | if nms_thresh <= 0:
18 | return boxlist
19 | boxes = boxlist.pred_boxes.tensor
20 | scores = boxlist.scores
21 | labels = boxlist.pred_classes
22 | keep = batched_nms(boxes, scores, labels, nms_thresh)
23 | if max_proposals > 0:
24 | keep = keep[: max_proposals]
25 | boxlist = boxlist[keep]
26 | return boxlist
27 |
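A self-contained example on a tiny `Instances` object: the first two boxes overlap and share a class, so the lower-scoring one is suppressed, while the third box survives because it belongs to a different class.

# Tiny multi-class NMS example on detectron2 Instances.
import torch
from detectron2.structures import Boxes, Instances

boxlist = Instances((100, 100))
boxlist.pred_boxes = Boxes(torch.tensor([[10., 10., 50., 50.],
                                         [12., 12., 52., 52.],     # IoU ~0.82 with box 0, same class
                                         [60., 60., 90., 90.]]))
boxlist.scores = torch.tensor([0.9, 0.8, 0.7])
boxlist.pred_classes = torch.tensor([0, 0, 1])

kept = ml_nms(boxlist, nms_thresh=0.5)
print(len(kept))                      # 2: the second box is suppressed
print(kept.pred_boxes.tensor)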
--------------------------------------------------------------------------------
/fewx/layers/naive_group_norm.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.nn import Module, Parameter
3 | from torch.nn import init
4 |
5 |
6 | class NaiveGroupNorm(Module):
7 | r"""NaiveGroupNorm implements Group Normalization with the high-level matrix operations in PyTorch.
8 |     It is a temporary workaround for exporting GroupNorm to ONNX until the official GroupNorm supports ONNX export.
9 | The usage of NaiveGroupNorm is exactly the same as the official :class:`torch.nn.GroupNorm`.
10 | Args:
11 | num_groups (int): number of groups to separate the channels into
12 | num_channels (int): number of channels expected in input
13 | eps: a value added to the denominator for numerical stability. Default: 1e-5
14 | affine: a boolean value that when set to ``True``, this module
15 | has learnable per-channel affine parameters initialized to ones (for weights)
16 | and zeros (for biases). Default: ``True``.
17 | Shape:
18 | - Input: :math:`(N, C, *)` where :math:`C=\text{num\_channels}`
19 | - Output: :math:`(N, C, *)` (same shape as input)
20 | Examples::
21 | >>> input = torch.randn(20, 6, 10, 10)
22 | >>> # Separate 6 channels into 3 groups
23 | >>> m = NaiveGroupNorm(3, 6)
24 | >>> # Separate 6 channels into 6 groups (equivalent with InstanceNorm)
25 | >>> m = NaiveGroupNorm(6, 6)
26 | >>> # Put all 6 channels into a single group (equivalent with LayerNorm)
27 | >>> m = NaiveGroupNorm(1, 6)
28 | >>> # Activating the module
29 | >>> output = m(input)
30 | .. _`Group Normalization`: https://arxiv.org/abs/1803.08494
31 | """
32 | __constants__ = ['num_groups', 'num_channels', 'eps', 'affine', 'weight',
33 | 'bias']
34 |
35 | def __init__(self, num_groups, num_channels, eps=1e-5, affine=True):
36 | super(NaiveGroupNorm, self).__init__()
37 | self.num_groups = num_groups
38 | self.num_channels = num_channels
39 | self.eps = eps
40 | self.affine = affine
41 | if self.affine:
42 | self.weight = Parameter(torch.Tensor(num_channels))
43 | self.bias = Parameter(torch.Tensor(num_channels))
44 | else:
45 | self.register_parameter('weight', None)
46 | self.register_parameter('bias', None)
47 | self.reset_parameters()
48 |
49 | def reset_parameters(self):
50 | if self.affine:
51 | init.ones_(self.weight)
52 | init.zeros_(self.bias)
53 |
54 | def forward(self, input):
55 | N, C, H, W = input.size()
56 | assert C % self.num_groups == 0
57 | input = input.reshape(N, self.num_groups, -1)
58 | mean = input.mean(dim=-1, keepdim=True)
59 | var = (input ** 2).mean(dim=-1, keepdim=True) - mean ** 2
60 | std = torch.sqrt(var + self.eps)
61 |
62 | input = (input - mean) / std
63 | input = input.reshape(N, C, H, W)
64 | if self.affine:
65 | input = input * self.weight.reshape(1, C, 1, 1) + self.bias.reshape(1, C, 1, 1)
66 | return input
67 |
68 | def extra_repr(self):
69 | return '{num_groups}, {num_channels}, eps={eps}, ' \
70 | 'affine={affine}'.format(**self.__dict__)
71 |
--------------------------------------------------------------------------------
/fewx/modeling/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
2 | from .fsod import FsodRCNN, FsodRes5ROIHeads, FsodFastRCNNOutputLayers, FsodRPN
3 |
4 | _EXCLUDE = {"torch", "ShapeSpec"}
5 | __all__ = [k for k in globals().keys() if k not in _EXCLUDE and not k.startswith("_")]
6 |
--------------------------------------------------------------------------------
/fewx/modeling/fsod/__init__.py:
--------------------------------------------------------------------------------
1 | from .fsod_rcnn import FsodRCNN
2 | from .fsod_roi_heads import FsodRes5ROIHeads
3 | from .fsod_fast_rcnn import FsodFastRCNNOutputLayers
4 | from .fsod_rpn import FsodRPN
5 |
--------------------------------------------------------------------------------
/fewx/modeling/fsod/fsod_rcnn.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
2 | import logging
3 | import numpy as np
4 | import torch
5 | from torch import nn
6 |
7 | from detectron2.data.detection_utils import convert_image_to_rgb
8 | from detectron2.structures import ImageList, Boxes, Instances
9 | from detectron2.utils.events import get_event_storage
10 | from detectron2.utils.logger import log_first_n
11 |
12 | from detectron2.modeling.backbone import build_backbone
13 | from detectron2.modeling.postprocessing import detector_postprocess
14 | from detectron2.modeling.proposal_generator import build_proposal_generator
15 | from .fsod_roi_heads import build_roi_heads
16 | from detectron2.modeling.meta_arch.build import META_ARCH_REGISTRY
17 |
18 | from detectron2.modeling.poolers import ROIPooler
19 | import torch.nn.functional as F
20 |
21 | from .fsod_fast_rcnn import FsodFastRCNNOutputs
22 |
23 | import os
24 |
25 | import matplotlib.pyplot as plt
26 | import pandas as pd
27 |
28 | from detectron2.data.catalog import MetadataCatalog
29 | import detectron2.data.detection_utils as utils
30 | import pickle
31 | import sys
32 |
33 | __all__ = ["FsodRCNN"]
34 |
35 |
36 | @META_ARCH_REGISTRY.register()
37 | class FsodRCNN(nn.Module):
38 | """
39 |     Generalized R-CNN. Any model that contains the following three components:
40 | 1. Per-image feature extraction (aka backbone)
41 | 2. Region proposal generation
42 | 3. Per-region feature extraction and prediction
43 | """
44 |
45 | def __init__(self, cfg):
46 | super().__init__()
47 |
48 | self.backbone = build_backbone(cfg)
49 | self.proposal_generator = build_proposal_generator(cfg, self.backbone.output_shape())
50 | self.roi_heads = build_roi_heads(cfg, self.backbone.output_shape())
51 | self.vis_period = cfg.VIS_PERIOD
52 | self.input_format = cfg.INPUT.FORMAT
53 |
54 | assert len(cfg.MODEL.PIXEL_MEAN) == len(cfg.MODEL.PIXEL_STD)
55 | self.register_buffer("pixel_mean", torch.Tensor(cfg.MODEL.PIXEL_MEAN).view(-1, 1, 1))
56 | self.register_buffer("pixel_std", torch.Tensor(cfg.MODEL.PIXEL_STD).view(-1, 1, 1))
57 |
58 | self.in_features = cfg.MODEL.ROI_HEADS.IN_FEATURES
59 |
60 | self.support_way = cfg.INPUT.FS.SUPPORT_WAY
61 | self.support_shot = cfg.INPUT.FS.SUPPORT_SHOT
62 | self.logger = logging.getLogger(__name__)
63 |
64 | @property
65 | def device(self):
66 | return self.pixel_mean.device
67 |
68 | def visualize_training(self, batched_inputs, proposals):
69 | """
70 | A function used to visualize images and proposals. It shows ground truth
71 | bounding boxes on the original image and up to 20 predicted object
72 | proposals on the original image. Users can implement different
73 | visualization functions for different models.
74 |
75 | Args:
76 | batched_inputs (list): a list that contains input to the model.
77 | proposals (list): a list that contains predicted proposals. Both
78 | batched_inputs and proposals should have the same length.
79 | """
80 | from detectron2.utils.visualizer import Visualizer
81 |
82 | storage = get_event_storage()
83 | max_vis_prop = 20
84 |
85 | for input, prop in zip(batched_inputs, proposals):
86 | img = input["image"]
87 | img = convert_image_to_rgb(img.permute(1, 2, 0), self.input_format)
88 | v_gt = Visualizer(img, None)
89 | v_gt = v_gt.overlay_instances(boxes=input["instances"].gt_boxes)
90 | anno_img = v_gt.get_image()
91 | box_size = min(len(prop.proposal_boxes), max_vis_prop)
92 | v_pred = Visualizer(img, None)
93 | v_pred = v_pred.overlay_instances(
94 | boxes=prop.proposal_boxes[0:box_size].tensor.cpu().numpy()
95 | )
96 | prop_img = v_pred.get_image()
97 | vis_img = np.concatenate((anno_img, prop_img), axis=1)
98 | vis_img = vis_img.transpose(2, 0, 1)
99 | vis_name = "Left: GT bounding boxes; Right: Predicted proposals"
100 | storage.put_image(vis_name, vis_img)
101 | break # only visualize one image in a batch
102 |
103 | def forward(self, batched_inputs):
104 | """
105 | Args:
106 | batched_inputs: a list, batched outputs of :class:`DatasetMapper` .
107 | Each item in the list contains the inputs for one image.
108 | For now, each item in the list is a dict that contains:
109 |
110 | * image: Tensor, image in (C, H, W) format.
111 | * instances (optional): groundtruth :class:`Instances`
112 | * proposals (optional): :class:`Instances`, precomputed proposals.
113 |
114 | Other information that's included in the original dicts, such as:
115 |
116 | * "height", "width" (int): the output resolution of the model, used in inference.
117 | See :meth:`postprocess` for details.
118 |
119 | Returns:
120 | list[dict]:
121 | Each dict is the output for one input image.
122 | The dict contains one key "instances" whose value is a :class:`Instances`.
123 | The :class:`Instances` object has the following keys:
124 | "pred_boxes", "pred_classes", "scores", "pred_masks", "pred_keypoints"
125 | """
126 | if not self.training:
127 | self.init_model()
128 | return self.inference(batched_inputs)
129 |
130 | images, support_images = self.preprocess_image(batched_inputs)
131 | if "instances" in batched_inputs[0]:
132 | for x in batched_inputs:
133 | x['instances'].set('gt_classes', torch.full_like(x['instances'].get('gt_classes'), 0))
134 |
135 | gt_instances = [x["instances"].to(self.device) for x in batched_inputs]
136 | else:
137 | gt_instances = None
138 |
139 | features = self.backbone(images.tensor)
140 |
141 | # support branches
142 | support_bboxes_ls = []
143 | for item in batched_inputs:
144 | bboxes = item['support_bboxes']
145 | for box in bboxes:
146 | box = Boxes(box[np.newaxis, :])
147 | support_bboxes_ls.append(box.to(self.device))
148 |
149 | B, N, C, H, W = support_images.tensor.shape
150 | assert N == self.support_way * self.support_shot
151 |
152 | support_images = support_images.tensor.reshape(B*N, C, H, W)
153 | support_features = self.backbone(support_images)
154 |
155 | # support feature roi pooling
156 | feature_pooled = self.roi_heads.roi_pooling(support_features, support_bboxes_ls)
157 |
158 | support_box_features = self.roi_heads._shared_roi_transform([support_features[f] for f in self.in_features], support_bboxes_ls)
159 |         assert self.support_way == 2  # currently only 2-way (one positive + one negative class) support is implemented
160 |
161 | detector_loss_cls = []
162 | detector_loss_box_reg = []
163 | rpn_loss_rpn_cls = []
164 | rpn_loss_rpn_loc = []
165 | for i in range(B): # batch
166 | # query
167 | query_gt_instances = [gt_instances[i]] # one query gt instances
168 | query_images = ImageList.from_tensors([images[i]]) # one query image
169 |
170 | query_feature_res4 = features['res4'][i].unsqueeze(0) # one query feature for attention rpn
171 | query_features = {'res4': query_feature_res4} # one query feature for rcnn
172 |
173 | # positive support branch ##################################
174 | pos_begin = i * self.support_shot * self.support_way
175 | pos_end = pos_begin + self.support_shot
176 | pos_support_features = feature_pooled[pos_begin:pos_end].mean(0, True) # pos support features from res4, average all supports, for rcnn
177 | pos_support_features_pool = pos_support_features.mean(dim=[2, 3], keepdim=True) # average pooling support feature for attention rpn
178 | pos_correlation = F.conv2d(query_feature_res4, pos_support_features_pool.permute(1,0,2,3), groups=1024) # attention map
179 |
180 | pos_features = {'res4': pos_correlation} # attention map for attention rpn
181 | pos_support_box_features = support_box_features[pos_begin:pos_end].mean(0, True)
182 | pos_proposals, pos_anchors, pos_pred_objectness_logits, pos_gt_labels, pos_pred_anchor_deltas, pos_gt_boxes = self.proposal_generator(query_images, pos_features, query_gt_instances) # attention rpn
183 | pos_pred_class_logits, pos_pred_proposal_deltas, pos_detector_proposals = self.roi_heads(query_images, query_features, pos_support_box_features, pos_proposals, query_gt_instances) # pos rcnn
184 |
185 | # negative support branch ##################################
186 | neg_begin = pos_end
187 | neg_end = neg_begin + self.support_shot
188 |
189 | neg_support_features = feature_pooled[neg_begin:neg_end].mean(0, True)
190 | neg_support_features_pool = neg_support_features.mean(dim=[2, 3], keepdim=True)
191 | neg_correlation = F.conv2d(query_feature_res4, neg_support_features_pool.permute(1,0,2,3), groups=1024)
192 |
193 | neg_features = {'res4': neg_correlation}
194 |
195 | neg_support_box_features = support_box_features[neg_begin:neg_end].mean(0, True)
196 | neg_proposals, neg_anchors, neg_pred_objectness_logits, neg_gt_labels, neg_pred_anchor_deltas, neg_gt_boxes = self.proposal_generator(query_images, neg_features, query_gt_instances)
197 | neg_pred_class_logits, neg_pred_proposal_deltas, neg_detector_proposals = self.roi_heads(query_images, query_features, neg_support_box_features, neg_proposals, query_gt_instances)
198 |
199 | # rpn loss
200 | outputs_images = ImageList.from_tensors([images[i], images[i]])
201 |
202 | outputs_pred_objectness_logits = [torch.cat(pos_pred_objectness_logits + neg_pred_objectness_logits, dim=0)]
203 | outputs_pred_anchor_deltas = [torch.cat(pos_pred_anchor_deltas + neg_pred_anchor_deltas, dim=0)]
204 |
205 | outputs_anchors = pos_anchors # + neg_anchors
206 |
207 | # convert 1 in neg_gt_labels to 0
208 | for item in neg_gt_labels:
209 | item[item == 1] = 0
210 |
211 | outputs_gt_boxes = pos_gt_boxes + neg_gt_boxes #[None]
212 | outputs_gt_labels = pos_gt_labels + neg_gt_labels
213 |
214 | if self.training:
215 | proposal_losses = self.proposal_generator.losses(
216 | outputs_anchors, outputs_pred_objectness_logits, outputs_gt_labels, outputs_pred_anchor_deltas, outputs_gt_boxes)
217 | proposal_losses = {k: v * self.proposal_generator.loss_weight for k, v in proposal_losses.items()}
218 | else:
219 | proposal_losses = {}
220 |
221 | # detector loss
222 | detector_pred_class_logits = torch.cat([pos_pred_class_logits, neg_pred_class_logits], dim=0)
223 | detector_pred_proposal_deltas = torch.cat([pos_pred_proposal_deltas, neg_pred_proposal_deltas], dim=0)
224 | for item in neg_detector_proposals:
225 | item.gt_classes = torch.full_like(item.gt_classes, 1)
226 |
227 | #detector_proposals = pos_detector_proposals + neg_detector_proposals
228 | detector_proposals = [Instances.cat(pos_detector_proposals + neg_detector_proposals)]
229 | if self.training:
230 | predictions = detector_pred_class_logits, detector_pred_proposal_deltas
231 | detector_losses = self.roi_heads.box_predictor.losses(predictions, detector_proposals)
232 |
233 | rpn_loss_rpn_cls.append(proposal_losses['loss_rpn_cls'])
234 | rpn_loss_rpn_loc.append(proposal_losses['loss_rpn_loc'])
235 | detector_loss_cls.append(detector_losses['loss_cls'])
236 | detector_loss_box_reg.append(detector_losses['loss_box_reg'])
237 |
238 | proposal_losses = {}
239 | detector_losses = {}
240 |
241 | proposal_losses['loss_rpn_cls'] = torch.stack(rpn_loss_rpn_cls).mean()
242 | proposal_losses['loss_rpn_loc'] = torch.stack(rpn_loss_rpn_loc).mean()
243 | detector_losses['loss_cls'] = torch.stack(detector_loss_cls).mean()
244 | detector_losses['loss_box_reg'] = torch.stack(detector_loss_box_reg).mean()
245 |
246 |
247 | losses = {}
248 | losses.update(detector_losses)
249 | losses.update(proposal_losses)
250 | return losses
251 |
252 | def init_model(self):
253 | self.support_on = True #False
254 |
255 | support_dir = './support_dir'
256 | if not os.path.exists(support_dir):
257 | os.makedirs(support_dir)
258 |
259 | support_file_name = os.path.join(support_dir, 'support_feature.pkl')
260 | if not os.path.exists(support_file_name):
261 | support_path = './datasets/coco/10_shot_support_df.pkl'
262 | support_df = pd.read_pickle(support_path)
263 |
264 | metadata = MetadataCatalog.get('coco_2017_train')
265 |             # map COCO dataset category ids to the contiguous ids used by the model
266 | reverse_id_mapper = lambda dataset_id: metadata.thing_dataset_id_to_contiguous_id[dataset_id] # noqa
267 | support_df['category_id'] = support_df['category_id'].map(reverse_id_mapper)
268 |
269 | support_dict = {'res4_avg': {}, 'res5_avg': {}}
270 | for cls in support_df['category_id'].unique():
271 | support_cls_df = support_df.loc[support_df['category_id'] == cls, :].reset_index()
272 | support_data_all = []
273 | support_box_all = []
274 |
275 | for index, support_img_df in support_cls_df.iterrows():
276 | img_path = os.path.join('./datasets/coco', support_img_df['file_path'])
277 | support_data = utils.read_image(img_path, format='BGR')
278 | support_data = torch.as_tensor(np.ascontiguousarray(support_data.transpose(2, 0, 1)))
279 | support_data_all.append(support_data)
280 |
281 | support_box = support_img_df['support_box']
282 | support_box_all.append(Boxes([support_box]).to(self.device))
283 |
284 | # support images
285 | support_images = [x.to(self.device) for x in support_data_all]
286 | support_images = [(x - self.pixel_mean) / self.pixel_std for x in support_images]
287 | support_images = ImageList.from_tensors(support_images, self.backbone.size_divisibility)
288 | support_features = self.backbone(support_images.tensor)
289 |
290 | res4_pooled = self.roi_heads.roi_pooling(support_features, support_box_all)
291 | res4_avg = res4_pooled.mean(0, True)
292 | res4_avg = res4_avg.mean(dim=[2,3], keepdim=True)
293 | support_dict['res4_avg'][cls] = res4_avg.detach().cpu().data
294 |
295 | res5_feature = self.roi_heads._shared_roi_transform([support_features[f] for f in self.in_features], support_box_all)
296 | res5_avg = res5_feature.mean(0, True)
297 | support_dict['res5_avg'][cls] = res5_avg.detach().cpu().data
298 |
299 | del res4_avg
300 | del res4_pooled
301 | del support_features
302 | del res5_feature
303 | del res5_avg
304 |
305 | with open(support_file_name, 'wb') as f:
306 | pickle.dump(support_dict, f)
307 | self.logger.info("=========== Offline support features are generated. ===========")
308 | self.logger.info("============ Few-shot object detetion will start. =============")
309 | sys.exit(0)
310 |
311 | else:
312 | with open(support_file_name, "rb") as hFile:
313 | self.support_dict = pickle.load(hFile, encoding="latin1")
314 | for res_key, res_dict in self.support_dict.items():
315 | for cls_key, feature in res_dict.items():
316 | self.support_dict[res_key][cls_key] = feature.cuda()
317 |
318 | def inference(self, batched_inputs, detected_instances=None, do_postprocess=True):
319 | """
320 | Run inference on the given inputs.
321 | Args:
322 | batched_inputs (list[dict]): same as in :meth:`forward`
323 | detected_instances (None or list[Instances]): if not None, it
324 | contains an `Instances` object per image. The `Instances`
325 | object contains "pred_boxes" and "pred_classes" which are
326 | known boxes in the image.
327 | The inference will then skip the detection of bounding boxes,
328 | and only predict other per-ROI outputs.
329 | do_postprocess (bool): whether to apply post-processing on the outputs.
330 | Returns:
331 | same as in :meth:`forward`.
332 | """
333 | assert not self.training
334 |
335 | images = self.preprocess_image(batched_inputs)
336 | features = self.backbone(images.tensor)
337 |
338 | B, _, _, _ = features['res4'].shape
339 | assert B == 1 # only support 1 query image in test
340 | assert len(images) == 1
341 | support_proposals_dict = {}
342 | support_box_features_dict = {}
343 | proposal_num_dict = {}
344 |
345 | for cls_id, res4_avg in self.support_dict['res4_avg'].items():
346 | query_images = ImageList.from_tensors([images[0]]) # one query image
347 |
348 | query_features_res4 = features['res4'] # one query feature for attention rpn
349 | query_features = {'res4': query_features_res4} # one query feature for rcnn
350 |
351 | # support branch ##################################
352 | support_box_features = self.support_dict['res5_avg'][cls_id]
353 |
354 | correlation = F.conv2d(query_features_res4, res4_avg.permute(1,0,2,3), groups=1024) # attention map
355 |
356 | support_correlation = {'res4': correlation} # attention map for attention rpn
357 |
358 | proposals, _ = self.proposal_generator(query_images, support_correlation, None)
359 | support_proposals_dict[cls_id] = proposals
360 | support_box_features_dict[cls_id] = support_box_features
361 |
362 | if cls_id not in proposal_num_dict.keys():
363 | proposal_num_dict[cls_id] = []
364 | proposal_num_dict[cls_id].append(len(proposals[0]))
365 |
366 | del support_box_features
367 | del correlation
368 | del res4_avg
369 | del query_features_res4
370 |
371 | results, _ = self.roi_heads.eval_with_support(query_images, query_features, support_proposals_dict, support_box_features_dict)
372 |
373 | if do_postprocess:
374 | return FsodRCNN._postprocess(results, batched_inputs, images.image_sizes)
375 | else:
376 | return results
377 |
378 | def preprocess_image(self, batched_inputs):
379 | """
380 | Normalize, pad and batch the input images.
381 | """
382 | images = [x["image"].to(self.device) for x in batched_inputs]
383 | images = [(x - self.pixel_mean) / self.pixel_std for x in images]
384 | images = ImageList.from_tensors(images, self.backbone.size_divisibility)
385 | if self.training:
386 | # support images
387 | support_images = [x['support_images'].to(self.device) for x in batched_inputs]
388 | support_images = [(x - self.pixel_mean) / self.pixel_std for x in support_images]
389 | support_images = ImageList.from_tensors(support_images, self.backbone.size_divisibility)
390 |
391 | return images, support_images
392 | else:
393 | return images
394 |
395 | @staticmethod
396 | def _postprocess(instances, batched_inputs, image_sizes):
397 | """
398 | Rescale the output instances to the target size.
399 | """
400 | # note: private function; subject to changes
401 | processed_results = []
402 | for results_per_image, input_per_image, image_size in zip(
403 | instances, batched_inputs, image_sizes
404 | ):
405 | height = input_per_image.get("height", image_size[0])
406 | width = input_per_image.get("width", image_size[1])
407 | r = detector_postprocess(results_per_image, height, width)
408 | processed_results.append({"instances": r})
409 | return processed_results
410 |
--------------------------------------------------------------------------------
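
The core step in FsodRCNN.forward above is the attention-RPN correlation: the pooled support features of one class are averaged into a single 1024-d vector (shape 1x1024x1x1) and applied as a depthwise (groups=C) convolution over the query res4 map, producing a channel-wise attention map that replaces the raw feature as RPN input. A stand-alone sketch of just that step, with illustrative shapes:

    import torch
    import torch.nn.functional as F

    C = 1024                                   # res4 channels of the R-50-C4 backbone
    query_res4 = torch.randn(1, C, 38, 50)     # one query feature map
    support_res4 = torch.randn(10, C, 14, 14)  # pooled features of 10 support boxes (shots)

    # average over shots, then over spatial dims -> (1, C, 1, 1)
    support_vec = support_res4.mean(0, True).mean(dim=[2, 3], keepdim=True)
    # depthwise correlation: each query channel is weighted by its matching support channel
    attention = F.conv2d(query_res4, support_vec.permute(1, 0, 2, 3), groups=C)
    print(attention.shape)  # torch.Size([1, 1024, 38, 50])
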
/fewx/modeling/fsod/fsod_roi_heads.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
2 | import inspect
3 | import logging
4 | import numpy as np
5 | from typing import Dict, List, Optional, Tuple, Union
6 | import torch
7 | from torch import nn
8 |
9 | from detectron2.config import configurable
10 | from detectron2.layers import ShapeSpec, nonzero_tuple
11 | from detectron2.structures import Boxes, ImageList, Instances, pairwise_iou
12 | from detectron2.utils.events import get_event_storage
13 | from detectron2.utils.registry import Registry
14 |
15 | from detectron2.modeling.backbone.resnet import BottleneckBlock, make_stage
16 | from detectron2.modeling.matcher import Matcher
17 | from detectron2.modeling.poolers import ROIPooler
18 | from detectron2.modeling.proposal_generator.proposal_utils import add_ground_truth_to_proposals
19 | from detectron2.modeling.sampling import subsample_labels
20 | from detectron2.modeling.roi_heads.box_head import build_box_head
21 | from detectron2.modeling.roi_heads.roi_heads import ROIHeads
22 | from .fsod_fast_rcnn import FsodFastRCNNOutputLayers
23 |
24 | import time
25 | from detectron2.structures import Boxes, Instances
26 |
27 | ROI_HEADS_REGISTRY = Registry("ROI_HEADS")
28 | ROI_HEADS_REGISTRY.__doc__ = """
29 | Registry for ROI heads in a generalized R-CNN model.
30 | ROIHeads take feature maps and region proposals, and
31 | perform per-region computation.
32 |
33 | The registered object will be called with `obj(cfg, input_shape)`.
34 | The call is expected to return an :class:`ROIHeads`.
35 | """
36 |
37 | logger = logging.getLogger(__name__)
38 |
39 | def build_roi_heads(cfg, input_shape):
40 | """
41 | Build ROIHeads defined by `cfg.MODEL.ROI_HEADS.NAME`.
42 | """
43 | name = cfg.MODEL.ROI_HEADS.NAME
44 | return ROI_HEADS_REGISTRY.get(name)(cfg, input_shape)
45 |
46 | @ROI_HEADS_REGISTRY.register()
47 | class FsodRes5ROIHeads(ROIHeads):
48 | """
49 | The ROIHeads in a typical "C4" R-CNN model, where
50 | the box and mask head share the cropping and
51 | the per-region feature computation by a Res5 block.
52 | """
53 |
54 | def __init__(self, cfg, input_shape):
55 | super().__init__(cfg)
56 |
57 | # fmt: off
58 | self.in_features = cfg.MODEL.ROI_HEADS.IN_FEATURES
59 | pooler_resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
60 | pooler_type = cfg.MODEL.ROI_BOX_HEAD.POOLER_TYPE
61 | pooler_scales = (1.0 / input_shape[self.in_features[0]].stride, )
62 | sampling_ratio = cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO
63 | self.mask_on = cfg.MODEL.MASK_ON
64 | # fmt: on
65 | assert not cfg.MODEL.KEYPOINT_ON
66 | assert len(self.in_features) == 1
67 |
68 | self.pooler = ROIPooler(
69 | output_size=pooler_resolution,
70 | scales=pooler_scales,
71 | sampling_ratio=sampling_ratio,
72 | pooler_type=pooler_type,
73 | )
74 |
75 | self.res5, out_channels = self._build_res5_block(cfg)
76 | self.box_predictor = FsodFastRCNNOutputLayers(
77 | cfg, ShapeSpec(channels=out_channels, height=1, width=1)
78 | )
79 |
80 | def _build_res5_block(self, cfg):
81 | # fmt: off
82 | stage_channel_factor = 2 ** 3 # res5 is 8x res2
83 | num_groups = cfg.MODEL.RESNETS.NUM_GROUPS
84 | width_per_group = cfg.MODEL.RESNETS.WIDTH_PER_GROUP
85 | bottleneck_channels = num_groups * width_per_group * stage_channel_factor
86 | out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS * stage_channel_factor
87 | stride_in_1x1 = cfg.MODEL.RESNETS.STRIDE_IN_1X1
88 | norm = cfg.MODEL.RESNETS.NORM
89 | assert not cfg.MODEL.RESNETS.DEFORM_ON_PER_STAGE[-1], \
90 | "Deformable conv is not yet supported in res5 head."
91 | # fmt: on
92 |
93 | blocks = make_stage(
94 | BottleneckBlock,
95 | 3,
96 | first_stride=2,
97 | in_channels=out_channels // 2,
98 | bottleneck_channels=bottleneck_channels,
99 | out_channels=out_channels,
100 | num_groups=num_groups,
101 | norm=norm,
102 | stride_in_1x1=stride_in_1x1,
103 | )
104 | return nn.Sequential(*blocks), out_channels
105 |
106 | def _shared_roi_transform(self, features, boxes):
107 | x = self.pooler(features, boxes)
108 | return self.res5(x)
109 |
110 | def roi_pooling(self, features, boxes):
111 | box_features = self.pooler(
112 | [features[f] for f in self.in_features], boxes
113 | )
114 | #feature_pooled = box_features.mean(dim=[2, 3], keepdim=True) # pooled to 1x1
115 |
116 | return box_features #feature_pooled
117 |
118 | def forward(self, images, features, support_box_features, proposals, targets=None):
119 | """
120 | See :meth:`ROIHeads.forward`.
121 | """
122 | del images
123 |
124 | if self.training:
125 | assert targets
126 | proposals = self.label_and_sample_proposals(proposals, targets)
127 | del targets
128 | proposal_boxes = [x.proposal_boxes for x in proposals]
129 | box_features = self._shared_roi_transform(
130 | [features[f] for f in self.in_features], proposal_boxes
131 | )
132 |
133 | #support_features = self.res5(support_features)
134 | pred_class_logits, pred_proposal_deltas = self.box_predictor(box_features, support_box_features)
135 |
136 | return pred_class_logits, pred_proposal_deltas, proposals
137 |
138 | @torch.no_grad()
139 | def eval_with_support(self, images, features, support_proposals_dict, support_box_features_dict):
140 | """
141 | See :meth:`ROIHeads.forward`.
142 | """
143 | del images
144 |
145 | full_proposals_ls = []
146 | cls_ls = []
147 | for cls_id, proposals in support_proposals_dict.items():
148 | full_proposals_ls.append(proposals[0])
149 | cls_ls.append(cls_id)
150 |
151 | full_proposals_ls = [Instances.cat(full_proposals_ls)]
152 |
153 | proposal_boxes = [x.proposal_boxes for x in full_proposals_ls]
154 | assert len(proposal_boxes[0]) == 2000
155 |
156 | box_features = self._shared_roi_transform(
157 | [features[f] for f in self.in_features], proposal_boxes
158 | )
159 |
160 | full_scores_ls = []
161 | full_bboxes_ls = []
162 | full_cls_ls = []
163 | cnt = 0
164 | #for cls_id, support_box_features in support_box_features_dict.items():
165 | for cls_id in cls_ls:
166 | support_box_features = support_box_features_dict[cls_id]
167 | query_features = box_features[cnt*100:(cnt+1)*100]
168 | pred_class_logits, pred_proposal_deltas = self.box_predictor(query_features, support_box_features)
169 | full_scores_ls.append(pred_class_logits)
170 | full_bboxes_ls.append(pred_proposal_deltas)
171 | full_cls_ls.append(torch.full_like(pred_class_logits[:, 0].unsqueeze(-1), cls_id).to(torch.int8))
172 | del query_features
173 | del support_box_features
174 |
175 | cnt += 1
176 |
177 | class_logits = torch.cat(full_scores_ls, dim=0)
178 | proposal_deltas = torch.cat(full_bboxes_ls, dim=0)
179 | pred_cls = torch.cat(full_cls_ls, dim=0) #.unsqueeze(-1)
180 |
181 | predictions = class_logits, proposal_deltas
182 | proposals = full_proposals_ls
183 | pred_instances, _ = self.box_predictor.inference(pred_cls, predictions, proposals)
184 | pred_instances = self.forward_with_given_boxes(features, pred_instances)
185 |
186 | return pred_instances, {}
187 |
188 | def forward_with_given_boxes(self, features, instances):
189 | """
190 | Use the given boxes in `instances` to produce other (non-box) per-ROI outputs.
191 |
192 | Args:
193 | features: same as in `forward()`
194 | instances (list[Instances]): instances to predict other outputs. Expect the keys
195 | "pred_boxes" and "pred_classes" to exist.
196 |
197 | Returns:
198 | instances (Instances):
199 | the same `Instances` object, with extra
200 | fields such as `pred_masks` or `pred_keypoints`.
201 | """
202 | assert not self.training
203 | assert instances[0].has("pred_boxes") and instances[0].has("pred_classes")
204 |
205 | if self.mask_on:
206 | features = [features[f] for f in self.in_features]
207 | x = self._shared_roi_transform(features, [x.pred_boxes for x in instances])
208 | return self.mask_head(x, instances)
209 | else:
210 | return instances
211 |
--------------------------------------------------------------------------------
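
For reference, a minimal sketch of the pooling path behind roi_pooling and _shared_roi_transform in FsodRes5ROIHeads above, written directly against detectron2's ROIPooler; the 1/16 scale and 14x14 resolution correspond to the res4 feature of a C4 backbone, but the actual values come from the config.

    import torch
    from detectron2.modeling.poolers import ROIPooler
    from detectron2.structures import Boxes

    pooler = ROIPooler(output_size=14, scales=(1.0 / 16,), sampling_ratio=0, pooler_type="ROIAlignV2")
    res4 = torch.randn(1, 1024, 38, 50)                      # query res4 feature map
    boxes = [Boxes(torch.tensor([[32., 32., 256., 256.]]))]  # one proposal in input-image coordinates
    pooled = pooler([res4], boxes)
    print(pooled.shape)  # torch.Size([1, 1024, 14, 14]) -- then passed through the res5 block
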
/fewx/modeling/fsod/fsod_rpn.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
2 | from typing import Dict, List, Optional, Tuple
3 | import torch
4 | import torch.nn.functional as F
5 | from fvcore.nn import smooth_l1_loss
6 | from torch import nn
7 |
8 | from detectron2.config import configurable
9 | from detectron2.layers import ShapeSpec, cat
10 | from detectron2.structures import Boxes, ImageList, Instances, pairwise_iou
11 | from detectron2.utils.events import get_event_storage
12 | from detectron2.utils.memory import retry_if_cuda_oom
13 | from detectron2.utils.registry import Registry
14 |
15 | from detectron2.modeling.anchor_generator import build_anchor_generator
16 | from detectron2.modeling.box_regression import Box2BoxTransform
17 | from detectron2.modeling.matcher import Matcher
18 | from detectron2.modeling.sampling import subsample_labels
19 | from detectron2.modeling.proposal_generator.build import PROPOSAL_GENERATOR_REGISTRY
20 | from detectron2.modeling.proposal_generator.proposal_utils import find_top_rpn_proposals
21 |
22 | RPN_HEAD_REGISTRY = Registry("RPN_HEAD")
23 | RPN_HEAD_REGISTRY.__doc__ = """
24 | Registry for RPN heads, which take feature maps and perform
25 | objectness classification and bounding box regression for anchors.
26 |
27 | The registered object will be called with `obj(cfg, input_shape)`.
28 | The call should return a `nn.Module` object.
29 | """
30 |
31 |
32 | """
33 | Shape shorthand in this module:
34 |
35 | N: number of images in the minibatch
36 | L: number of feature maps per image on which RPN is run
37 | A: number of cell anchors (must be the same for all feature maps)
38 | Hi, Wi: height and width of the i-th feature map
39 | B: size of the box parameterization
40 |
41 | Naming convention:
42 |
43 | objectness: refers to the binary classification of an anchor as object vs. not object.
44 |
45 | deltas: refers to the 4-d (dx, dy, dw, dh) deltas that parameterize the box2box
46 | transform (see :class:`box_regression.Box2BoxTransform`), or 5d for rotated boxes.
47 |
48 | pred_objectness_logits: predicted objectness scores in [-inf, +inf]; use
49 | sigmoid(pred_objectness_logits) to estimate P(object).
50 |
51 | gt_labels: ground-truth binary classification labels for objectness
52 |
53 | pred_anchor_deltas: predicted box2box transform deltas
54 |
55 | gt_anchor_deltas: ground-truth box2box transform deltas
56 | """
57 |
58 |
59 | def build_rpn_head(cfg, input_shape):
60 | """
61 | Build an RPN head defined by `cfg.MODEL.RPN.HEAD_NAME`.
62 | """
63 | name = cfg.MODEL.RPN.HEAD_NAME
64 | return RPN_HEAD_REGISTRY.get(name)(cfg, input_shape)
65 |
66 |
67 | @RPN_HEAD_REGISTRY.register()
68 | class StandardRPNHead(nn.Module):
69 | """
70 | Standard RPN classification and regression heads described in :paper:`Faster R-CNN`.
71 | Uses a 3x3 conv to produce a shared hidden state from which one 1x1 conv predicts
72 | objectness logits for each anchor and a second 1x1 conv predicts bounding-box deltas
73 | specifying how to deform each anchor into an object proposal.
74 | """
75 |
76 | @configurable
77 | def __init__(self, *, in_channels: int, num_anchors: int, box_dim: int = 4):
78 | """
79 | NOTE: this interface is experimental.
80 |
81 | Args:
82 | in_channels (int): number of input feature channels. When using multiple
83 | input features, they must have the same number of channels.
84 | num_anchors (int): number of anchors to predict for *each spatial position*
85 | on the feature map. The total number of anchors for each
86 | feature map will be `num_anchors * H * W`.
87 | box_dim (int): dimension of a box, which is also the number of box regression
88 | predictions to make for each anchor. An axis aligned box has
89 | box_dim=4, while a rotated box has box_dim=5.
90 | """
91 | super().__init__()
92 | # 3x3 conv for the hidden representation
93 | self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1)
94 | # 1x1 conv for predicting objectness logits
95 | self.objectness_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
96 | # 1x1 conv for predicting box2box transform deltas
97 | self.anchor_deltas = nn.Conv2d(in_channels, num_anchors * box_dim, kernel_size=1, stride=1)
98 |
99 | for l in [self.conv, self.objectness_logits, self.anchor_deltas]:
100 | nn.init.normal_(l.weight, std=0.01)
101 | nn.init.constant_(l.bias, 0)
102 |
103 | @classmethod
104 | def from_config(cls, cfg, input_shape):
105 | # Standard RPN is shared across levels:
106 | in_channels = [s.channels for s in input_shape]
107 | assert len(set(in_channels)) == 1, "Each level must have the same channel!"
108 | in_channels = in_channels[0]
109 |
110 | # RPNHead should take the same input as anchor generator
111 | # NOTE: it assumes that creating an anchor generator does not have unwanted side effect.
112 | anchor_generator = build_anchor_generator(cfg, input_shape)
113 | num_anchors = anchor_generator.num_anchors
114 | box_dim = anchor_generator.box_dim
115 | assert (
116 | len(set(num_anchors)) == 1
117 | ), "Each level must have the same number of anchors per spatial position"
118 | return {"in_channels": in_channels, "num_anchors": num_anchors[0], "box_dim": box_dim}
119 |
120 | def forward(self, features: List[torch.Tensor]):
121 | """
122 | Args:
123 | features (list[Tensor]): list of feature maps
124 |
125 | Returns:
126 | list[Tensor]: A list of L elements.
127 | Element i is a tensor of shape (N, A, Hi, Wi) representing
128 | the predicted objectness logits for all anchors. A is the number of cell anchors.
129 | list[Tensor]: A list of L elements. Element i is a tensor of shape
130 | (N, A*box_dim, Hi, Wi) representing the predicted "deltas" used to transform anchors
131 | to proposals.
132 | """
133 | pred_objectness_logits = []
134 | pred_anchor_deltas = []
135 | for x in features:
136 | t = F.relu(self.conv(x))
137 | pred_objectness_logits.append(self.objectness_logits(t))
138 | pred_anchor_deltas.append(self.anchor_deltas(t))
139 | return pred_objectness_logits, pred_anchor_deltas
140 |
141 |
142 | @PROPOSAL_GENERATOR_REGISTRY.register()
143 | class FsodRPN(nn.Module):
144 | """
145 | Region Proposal Network, introduced by :paper:`Faster R-CNN`.
146 | """
147 |
148 | @configurable
149 | def __init__(
150 | self,
151 | *,
152 | in_features: List[str],
153 | head: nn.Module,
154 | anchor_generator: nn.Module,
155 | anchor_matcher: Matcher,
156 | box2box_transform: Box2BoxTransform,
157 | batch_size_per_image: int,
158 | positive_fraction: float,
159 | pre_nms_topk: Tuple[float, float],
160 | post_nms_topk: Tuple[float, float],
161 | nms_thresh: float = 0.7,
162 | min_box_size: float = 0.0,
163 | anchor_boundary_thresh: float = -1.0,
164 | loss_weight: float = 1.0,
165 | smooth_l1_beta: float = 0.0
166 | ):
167 | """
168 | NOTE: this interface is experimental.
169 |
170 | Args:
171 | in_features (list[str]): list of names of input features to use
172 | head (nn.Module): a module that predicts logits and regression deltas
173 | for each level from a list of per-level features
174 | anchor_generator (nn.Module): a module that creates anchors from a
175 | list of features. Usually an instance of :class:`AnchorGenerator`
176 | anchor_matcher (Matcher): label the anchors by matching them with ground truth.
177 | box2box_transform (Box2BoxTransform): defines the transform from anchors boxes to
178 | instance boxes
179 | batch_size_per_image (int): number of anchors per image to sample for training
180 | positive_fraction (float): fraction of foreground anchors to sample for training
181 | pre_nms_topk (tuple[float]): (train, test) that represents the
182 | number of top k proposals to select before NMS, in
183 | training and testing.
184 | post_nms_topk (tuple[float]): (train, test) that represents the
185 | number of top k proposals to select after NMS, in
186 | training and testing.
187 | nms_thresh (float): NMS threshold used to de-duplicate the predicted proposals
188 | min_box_size (float): remove proposal boxes with any side smaller than this threshold,
189 | in the unit of input image pixels
190 | anchor_boundary_thresh (float): legacy option
191 | loss_weight (float): weight to be multiplied to the loss
192 | smooth_l1_beta (float): beta parameter for the smooth L1
193 | regression loss. Default to use L1 loss.
194 | """
195 | super().__init__()
196 | self.in_features = in_features
197 | self.rpn_head = head
198 | self.anchor_generator = anchor_generator
199 | self.anchor_matcher = anchor_matcher
200 | self.box2box_transform = box2box_transform
201 | self.batch_size_per_image = batch_size_per_image
202 | self.positive_fraction = positive_fraction
203 | # Map from self.training state to train/test settings
204 | self.pre_nms_topk = {True: pre_nms_topk[0], False: pre_nms_topk[1]}
205 | self.post_nms_topk = {True: post_nms_topk[0], False: post_nms_topk[1]}
206 | self.nms_thresh = nms_thresh
207 | self.min_box_size = min_box_size
208 | self.anchor_boundary_thresh = anchor_boundary_thresh
209 | self.loss_weight = loss_weight
210 | self.smooth_l1_beta = smooth_l1_beta
211 |
212 | @classmethod
213 | def from_config(cls, cfg, input_shape: Dict[str, ShapeSpec]):
214 | in_features = cfg.MODEL.RPN.IN_FEATURES
215 | ret = {
216 | "in_features": in_features,
217 | "min_box_size": cfg.MODEL.PROPOSAL_GENERATOR.MIN_SIZE,
218 | "nms_thresh": cfg.MODEL.RPN.NMS_THRESH,
219 | "batch_size_per_image": cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE,
220 | "positive_fraction": cfg.MODEL.RPN.POSITIVE_FRACTION,
221 | "smooth_l1_beta": cfg.MODEL.RPN.SMOOTH_L1_BETA,
222 | "loss_weight": cfg.MODEL.RPN.LOSS_WEIGHT,
223 | "anchor_boundary_thresh": cfg.MODEL.RPN.BOUNDARY_THRESH,
224 | "box2box_transform": Box2BoxTransform(weights=cfg.MODEL.RPN.BBOX_REG_WEIGHTS),
225 | }
226 |
227 | ret["pre_nms_topk"] = (cfg.MODEL.RPN.PRE_NMS_TOPK_TRAIN, cfg.MODEL.RPN.PRE_NMS_TOPK_TEST)
228 | ret["post_nms_topk"] = (cfg.MODEL.RPN.POST_NMS_TOPK_TRAIN, cfg.MODEL.RPN.POST_NMS_TOPK_TEST)
229 |
230 | ret["anchor_generator"] = build_anchor_generator(cfg, [input_shape[f] for f in in_features])
231 | ret["anchor_matcher"] = Matcher(
232 | cfg.MODEL.RPN.IOU_THRESHOLDS, cfg.MODEL.RPN.IOU_LABELS, allow_low_quality_matches=True
233 | )
234 | ret["head"] = build_rpn_head(cfg, [input_shape[f] for f in in_features])
235 | return ret
236 |
237 | def _subsample_labels(self, label):
238 | """
239 | Randomly sample a subset of positive and negative examples, and overwrite
240 | the label vector to the ignore value (-1) for all elements that are not
241 | included in the sample.
242 |
243 | Args:
244 | labels (Tensor): a vector of -1, 0, 1. Will be modified in-place and returned.
245 | """
246 | pos_idx, neg_idx = subsample_labels(
247 | label, self.batch_size_per_image, self.positive_fraction, 0
248 | )
249 | # Fill with the ignore label (-1), then set positive and negative labels
250 | label.fill_(-1)
251 | label.scatter_(0, pos_idx, 1)
252 | label.scatter_(0, neg_idx, 0)
253 | return label
254 |
255 | @torch.no_grad()
256 | def label_and_sample_anchors(self, anchors: List[Boxes], gt_instances: List[Instances]):
257 | """
258 | Args:
259 | anchors (list[Boxes]): anchors for each feature map.
260 | gt_instances: the ground-truth instances for each image.
261 |
262 | Returns:
263 | list[Tensor]:
264 | List of #img tensors. i-th element is a vector of labels whose length is
265 | the total number of anchors across all feature maps R = sum(Hi * Wi * A).
266 | Label values are in {-1, 0, 1}, with meanings: -1 = ignore; 0 = negative
267 | class; 1 = positive class.
268 | list[Tensor]:
269 | i-th element is a Rx4 tensor. The values are the matched gt boxes for each
270 | anchor. Values are undefined for those anchors not labeled as 1.
271 | """
272 | anchors = Boxes.cat(anchors)
273 |
274 | gt_boxes = [x.gt_boxes for x in gt_instances]
275 | image_sizes = [x.image_size for x in gt_instances]
276 | del gt_instances
277 |
278 | gt_labels = []
279 | matched_gt_boxes = []
280 | for image_size_i, gt_boxes_i in zip(image_sizes, gt_boxes):
281 | """
282 | image_size_i: (h, w) for the i-th image
283 | gt_boxes_i: ground-truth boxes for i-th image
284 | """
285 |
286 | match_quality_matrix = retry_if_cuda_oom(pairwise_iou)(gt_boxes_i, anchors)
287 | matched_idxs, gt_labels_i = retry_if_cuda_oom(self.anchor_matcher)(match_quality_matrix)
288 | # Matching is memory-expensive and may result in CPU tensors. But the result is small
289 | gt_labels_i = gt_labels_i.to(device=gt_boxes_i.device)
290 | del match_quality_matrix
291 |
292 | if self.anchor_boundary_thresh >= 0:
293 | # Discard anchors that go out of the boundaries of the image
294 | # NOTE: This is legacy functionality that is turned off by default in Detectron2
295 | anchors_inside_image = anchors.inside_box(image_size_i, self.anchor_boundary_thresh)
296 | gt_labels_i[~anchors_inside_image] = -1
297 |
298 | # A vector of labels (-1, 0, 1) for each anchor
299 | gt_labels_i = self._subsample_labels(gt_labels_i)
300 |
301 | if len(gt_boxes_i) == 0:
302 | # These values won't be used anyway since the anchor is labeled as background
303 | matched_gt_boxes_i = torch.zeros_like(anchors.tensor)
304 | else:
305 | # TODO wasted indexing computation for ignored boxes
306 | matched_gt_boxes_i = gt_boxes_i[matched_idxs].tensor
307 |
308 | gt_labels.append(gt_labels_i) # N,AHW
309 | matched_gt_boxes.append(matched_gt_boxes_i)
310 | return gt_labels, matched_gt_boxes
311 |
312 | def losses(
313 | self,
314 | anchors,
315 | pred_objectness_logits: List[torch.Tensor],
316 | gt_labels: List[torch.Tensor],
317 | pred_anchor_deltas: List[torch.Tensor],
318 | gt_boxes,
319 | ):
320 | """
321 | Return the losses from a set of RPN predictions and their associated ground-truth.
322 |
323 | Args:
324 | anchors (list[Boxes or RotatedBoxes]): anchors for each feature map, each
325 | has shape (Hi*Wi*A, B), where B is box dimension (4 or 5).
326 | pred_objectness_logits (list[Tensor]): A list of L elements.
327 | Element i is a tensor of shape (N, Hi*Wi*A) representing
328 | the predicted objectness logits for all anchors.
329 | gt_labels (list[Tensor]): Output of :meth:`label_and_sample_anchors`.
330 | pred_anchor_deltas (list[Tensor]): A list of L elements. Element i is a tensor of shape
331 | (N, Hi*Wi*A, 4 or 5) representing the predicted "deltas" used to transform anchors
332 | to proposals.
333 | gt_boxes (list[Boxes or RotatedBoxes]): Output of :meth:`label_and_sample_anchors`.
334 |
335 | Returns:
336 | dict[loss name -> loss value]: A dict mapping from loss name to loss value.
337 | Loss names are: `loss_rpn_cls` for objectness classification and
338 | `loss_rpn_loc` for proposal localization.
339 | """
340 | num_images = len(gt_labels)
341 | gt_labels = torch.stack(gt_labels) # (N, sum(Hi*Wi*Ai))
342 | anchors = type(anchors[0]).cat(anchors).tensor # Ax(4 or 5)
343 | #print(anchors.shape, gt_boxes[0].shape, len(gt_boxes))
344 | gt_anchor_deltas = [self.box2box_transform.get_deltas(anchors, k) for k in gt_boxes]
345 | gt_anchor_deltas = torch.stack(gt_anchor_deltas) # (N, sum(Hi*Wi*Ai), 4 or 5)
346 |
347 | # Log the number of positive/negative anchors per-image that's used in training
348 | pos_mask = gt_labels == 1
349 | num_pos_anchors = pos_mask.sum().item()
350 | num_neg_anchors = (gt_labels == 0).sum().item()
351 | storage = get_event_storage()
352 | storage.put_scalar("rpn/num_pos_anchors", num_pos_anchors / num_images)
353 | storage.put_scalar("rpn/num_neg_anchors", num_neg_anchors / num_images)
354 |
355 | localization_loss = smooth_l1_loss(
356 | cat(pred_anchor_deltas, dim=1)[pos_mask],
357 | gt_anchor_deltas[pos_mask],
358 | self.smooth_l1_beta,
359 | reduction="sum",
360 | )
361 | valid_mask = gt_labels >= 0
362 | objectness_loss = F.binary_cross_entropy_with_logits(
363 | cat(pred_objectness_logits, dim=1)[valid_mask],
364 | gt_labels[valid_mask].to(torch.float32),
365 | reduction="sum",
366 | )
367 | normalizer = self.batch_size_per_image * num_images
368 | return {
369 | "loss_rpn_cls": objectness_loss / normalizer,
370 | "loss_rpn_loc": localization_loss / normalizer,
371 | }
372 |
373 | def forward(
374 | self,
375 | images: ImageList,
376 | features: Dict[str, torch.Tensor],
377 | gt_instances: Optional[Instances] = None,
378 | ):
379 | """
380 | Args:
381 | images (ImageList): input images of length `N`
382 | features (dict[str, Tensor]): input data as a mapping from feature
383 | map name to tensor. Axis 0 represents the number of images `N` in
384 | the input data; axes 1-3 are channels, height, and width, which may
385 | vary between feature maps (e.g., if a feature pyramid is used).
386 | gt_instances (list[Instances], optional): a length `N` list of `Instances`s.
387 | Each `Instances` stores ground-truth instances for the corresponding image.
388 |
389 | Returns:
390 | proposals: list[Instances]: contains fields "proposal_boxes", "objectness_logits"
391 | loss: dict[Tensor] or None
392 | """
393 | features = [features[f] for f in self.in_features]
394 | anchors = self.anchor_generator(features)
395 |
396 | pred_objectness_logits, pred_anchor_deltas = self.rpn_head(features)
397 | # Transpose the Hi*Wi*A dimension to the middle:
398 | pred_objectness_logits = [
399 | # (N, A, Hi, Wi) -> (N, Hi, Wi, A) -> (N, Hi*Wi*A)
400 | score.permute(0, 2, 3, 1).flatten(1)
401 | for score in pred_objectness_logits
402 | ]
403 | pred_anchor_deltas = [
404 | # (N, A*B, Hi, Wi) -> (N, A, B, Hi, Wi) -> (N, Hi, Wi, A, B) -> (N, Hi*Wi*A, B)
405 | x.view(x.shape[0], -1, self.anchor_generator.box_dim, x.shape[-2], x.shape[-1])
406 | .permute(0, 3, 4, 1, 2)
407 | .flatten(1, -2)
408 | for x in pred_anchor_deltas
409 | ]
410 |
411 | if self.training:
412 | gt_labels, gt_boxes = self.label_and_sample_anchors(anchors, gt_instances)
413 | #losses = self.losses(
414 | # anchors, pred_objectness_logits, gt_labels, pred_anchor_deltas, gt_boxes
415 | #)
416 | #losses = {k: v * self.loss_weight for k, v in losses.items()}
417 | proposals = self.predict_proposals(
418 | anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
419 | )
420 |
421 | return proposals, anchors, pred_objectness_logits, gt_labels, pred_anchor_deltas, gt_boxes #, losses
422 | else:
423 | losses = {}
424 | proposals = self.predict_proposals(
425 | anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
426 | )
427 | return proposals, losses
428 |
429 | @torch.no_grad()
430 | def predict_proposals(
431 | self,
432 | anchors,
433 | pred_objectness_logits: List[torch.Tensor],
434 | pred_anchor_deltas: List[torch.Tensor],
435 | image_sizes: List[Tuple[int, int]],
436 | ):
437 | """
438 | Decode all the predicted box regression deltas to proposals. Find the top proposals
439 | by applying NMS and removing boxes that are too small.
440 |
441 | Returns:
442 | proposals (list[Instances]): list of N Instances. The i-th Instances
443 | stores post_nms_topk object proposals for image i, sorted by their
444 | objectness score in descending order.
445 | """
446 | # The proposals are treated as fixed for approximate joint training with roi heads.
447 | # This approach ignores the derivative w.r.t. the proposal boxes’ coordinates that
448 | # are also network responses, so is approximate.
449 | pred_proposals = self._decode_proposals(anchors, pred_anchor_deltas)
450 | return find_top_rpn_proposals(
451 | pred_proposals,
452 | pred_objectness_logits,
453 | image_sizes,
454 | self.nms_thresh,
455 | self.pre_nms_topk[self.training],
456 | self.post_nms_topk[self.training],
457 | self.min_box_size,
458 | self.training,
459 | )
460 |
461 | def _decode_proposals(self, anchors, pred_anchor_deltas: List[torch.Tensor]):
462 | """
463 | Transform anchors into proposals by applying the predicted anchor deltas.
464 |
465 | Returns:
466 | proposals (list[Tensor]): A list of L tensors. Tensor i has shape
467 | (N, Hi*Wi*A, B)
468 | """
469 | N = pred_anchor_deltas[0].shape[0]
470 | proposals = []
471 | # For each feature map
472 | for anchors_i, pred_anchor_deltas_i in zip(anchors, pred_anchor_deltas):
473 | B = anchors_i.tensor.size(1)
474 | pred_anchor_deltas_i = pred_anchor_deltas_i.reshape(-1, B)
475 | # Expand anchors to shape (N*Hi*Wi*A, B)
476 | anchors_i = anchors_i.tensor.unsqueeze(0).expand(N, -1, -1).reshape(-1, B)
477 | proposals_i = self.box2box_transform.apply_deltas(pred_anchor_deltas_i, anchors_i)
478 | # Append feature map proposals with shape (N, Hi*Wi*A, B)
479 | proposals.append(proposals_i.view(N, -1, B))
480 | return proposals
481 |
--------------------------------------------------------------------------------
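
Both losses() and _decode_proposals() in FsodRPN above lean on detectron2's Box2BoxTransform. A small round-trip sketch showing that the deltas regressed against in loss_rpn_loc are exactly what apply_deltas undoes when proposals are decoded:

    import torch
    from detectron2.modeling.box_regression import Box2BoxTransform

    transform = Box2BoxTransform(weights=(1.0, 1.0, 1.0, 1.0))
    anchors = torch.tensor([[10., 10., 50., 50.]])
    gt_boxes = torch.tensor([[12.,  8., 60., 52.]])

    deltas = transform.get_deltas(anchors, gt_boxes)     # regression targets used for loss_rpn_loc
    decoded = transform.apply_deltas(deltas, anchors)    # proposals recovered from predicted deltas
    print(torch.allclose(decoded, gt_boxes, atol=1e-4))  # True
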
/fewx/solver/__init__.py:
--------------------------------------------------------------------------------
1 | from .build import build_lr_scheduler, build_optimizer
2 |
3 | __all__ = [k for k in globals().keys() if not k.startswith("_")]
4 |
--------------------------------------------------------------------------------
/fewx/solver/build.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
2 | from enum import Enum
3 | from typing import Any, Callable, Dict, Iterable, List, Set, Type, Union
4 | import torch
5 |
6 | from detectron2.config import CfgNode
7 |
8 | from detectron2.solver.lr_scheduler import WarmupCosineLR, WarmupMultiStepLR
9 |
10 | _GradientClipperInput = Union[torch.Tensor, Iterable[torch.Tensor]]
11 | _GradientClipper = Callable[[_GradientClipperInput], None]
12 |
13 |
14 | class GradientClipType(Enum):
15 | VALUE = "value"
16 | NORM = "norm"
17 |
18 |
19 | def _create_gradient_clipper(cfg: CfgNode) -> _GradientClipper:
20 | """
21 | Creates gradient clipping closure to clip by value or by norm,
22 | according to the provided config.
23 | """
24 | cfg = cfg.clone()
25 |
26 | def clip_grad_norm(p: _GradientClipperInput):
27 | torch.nn.utils.clip_grad_norm_(p, cfg.CLIP_VALUE, cfg.NORM_TYPE)
28 |
29 | def clip_grad_value(p: _GradientClipperInput):
30 | torch.nn.utils.clip_grad_value_(p, cfg.CLIP_VALUE)
31 |
32 | _GRADIENT_CLIP_TYPE_TO_CLIPPER = {
33 | GradientClipType.VALUE: clip_grad_value,
34 | GradientClipType.NORM: clip_grad_norm,
35 | }
36 | return _GRADIENT_CLIP_TYPE_TO_CLIPPER[GradientClipType(cfg.CLIP_TYPE)]
37 |
38 |
39 | def _generate_optimizer_class_with_gradient_clipping(
40 | optimizer_type: Type[torch.optim.Optimizer], gradient_clipper: _GradientClipper
41 | ) -> Type[torch.optim.Optimizer]:
42 | """
43 | Dynamically creates a new type that inherits the type of a given instance
44 | and overrides the `step` method to add gradient clipping
45 | """
46 |
47 | def optimizer_wgc_step(self, closure=None):
48 | for group in self.param_groups:
49 | for p in group["params"]:
50 | gradient_clipper(p)
51 | super(type(self), self).step(closure)
52 |
53 | OptimizerWithGradientClip = type(
54 | optimizer_type.__name__ + "WithGradientClip",
55 | (optimizer_type,),
56 | {"step": optimizer_wgc_step},
57 | )
58 | return OptimizerWithGradientClip
59 |
60 |
61 | def maybe_add_gradient_clipping(
62 | cfg: CfgNode, optimizer: torch.optim.Optimizer
63 | ) -> torch.optim.Optimizer:
64 | """
65 | If gradient clipping is enabled through config options, wraps the existing
66 | optimizer instance of some type OptimizerType to become an instance
67 | of the new dynamically created class OptimizerTypeWithGradientClip
68 | that inherits OptimizerType and overrides the `step` method to
69 | include gradient clipping.
70 |
71 | Args:
72 | cfg: CfgNode
73 | configuration options
74 | optimizer: torch.optim.Optimizer
75 | existing optimizer instance
76 |
77 | Return:
78 | optimizer: torch.optim.Optimizer
79 | either the unmodified optimizer instance (if gradient clipping is
80 | disabled), or the same instance with adjusted __class__ to override
81 | the `step` method and include gradient clipping
82 | """
83 | if not cfg.SOLVER.CLIP_GRADIENTS.ENABLED:
84 | return optimizer
85 | grad_clipper = _create_gradient_clipper(cfg.SOLVER.CLIP_GRADIENTS)
86 | OptimizerWithGradientClip = _generate_optimizer_class_with_gradient_clipping(
87 | type(optimizer), grad_clipper
88 | )
89 | optimizer.__class__ = OptimizerWithGradientClip
90 | return optimizer
91 |
92 |
93 | def build_optimizer(cfg: CfgNode, model: torch.nn.Module) -> torch.optim.Optimizer:
94 | """
95 | Build an optimizer from config.
96 | """
97 | norm_module_types = (
98 | torch.nn.BatchNorm1d,
99 | torch.nn.BatchNorm2d,
100 | torch.nn.BatchNorm3d,
101 | torch.nn.SyncBatchNorm,
102 | # NaiveSyncBatchNorm inherits from BatchNorm2d
103 | torch.nn.GroupNorm,
104 | torch.nn.InstanceNorm1d,
105 | torch.nn.InstanceNorm2d,
106 | torch.nn.InstanceNorm3d,
107 | torch.nn.LayerNorm,
108 | torch.nn.LocalResponseNorm,
109 | )
110 | params: List[Dict[str, Any]] = []
111 | memo: Set[torch.nn.parameter.Parameter] = set()
112 | for module in model.modules():
113 | for key, value in module.named_parameters(): #recurse=False):
114 | if not value.requires_grad:
115 | continue
116 | # Avoid duplicating parameters
117 | if value in memo:
118 | continue
119 | memo.add(value)
120 | lr = cfg.SOLVER.BASE_LR
121 | weight_decay = cfg.SOLVER.WEIGHT_DECAY
122 | if isinstance(module, norm_module_types):
123 | weight_decay = cfg.SOLVER.WEIGHT_DECAY_NORM
124 | elif "bias" in key: #key == "bias":
125 | # NOTE: unlike Detectron v1, we now default BIAS_LR_FACTOR to 1.0
126 | # and WEIGHT_DECAY_BIAS to WEIGHT_DECAY so that bias optimizer
127 | # hyperparameters are by default exactly the same as for regular
128 | # weights.
129 | lr = cfg.SOLVER.BASE_LR * cfg.SOLVER.BIAS_LR_FACTOR
130 | weight_decay = cfg.SOLVER.WEIGHT_DECAY_BIAS
131 |
132 | if 'box_predictor' in key:
133 | lr = cfg.SOLVER.BASE_LR * cfg.SOLVER.HEAD_LR_FACTOR
134 | params += [{"params": [value], "lr": lr, "weight_decay": weight_decay}]
135 | optimizer = torch.optim.SGD(
136 | params, cfg.SOLVER.BASE_LR, momentum=cfg.SOLVER.MOMENTUM, nesterov=cfg.SOLVER.NESTEROV
137 | )
138 | optimizer = maybe_add_gradient_clipping(cfg, optimizer)
139 | return optimizer
140 |
141 |
142 | def build_lr_scheduler(
143 | cfg: CfgNode, optimizer: torch.optim.Optimizer
144 | ) -> torch.optim.lr_scheduler._LRScheduler:
145 | """
146 | Build a LR scheduler from config.
147 | """
148 | name = cfg.SOLVER.LR_SCHEDULER_NAME
149 | if name == "WarmupMultiStepLR":
150 | return WarmupMultiStepLR(
151 | optimizer,
152 | cfg.SOLVER.STEPS,
153 | cfg.SOLVER.GAMMA,
154 | warmup_factor=cfg.SOLVER.WARMUP_FACTOR,
155 | warmup_iters=cfg.SOLVER.WARMUP_ITERS,
156 | warmup_method=cfg.SOLVER.WARMUP_METHOD,
157 | )
158 | elif name == "WarmupCosineLR":
159 | return WarmupCosineLR(
160 | optimizer,
161 | cfg.SOLVER.MAX_ITER,
162 | warmup_factor=cfg.SOLVER.WARMUP_FACTOR,
163 | warmup_iters=cfg.SOLVER.WARMUP_ITERS,
164 | warmup_method=cfg.SOLVER.WARMUP_METHOD,
165 | )
166 | else:
167 | raise ValueError("Unknown LR scheduler: {}".format(name))
168 |
--------------------------------------------------------------------------------
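
maybe_add_gradient_clipping above works by swapping the optimizer's __class__ for a dynamically created subclass whose step() clips gradients first. A self-contained sketch of the same pattern in plain PyTorch, with norm clipping hard-coded for brevity:

    import torch

    p = torch.nn.Parameter(torch.randn(3))
    opt = torch.optim.SGD([p], lr=0.1)

    def step_with_clip(self, closure=None):
        for group in self.param_groups:
            torch.nn.utils.clip_grad_norm_(group["params"], max_norm=1.0)
        super(type(self), self).step(closure)

    SGDWithGradientClip = type("SGDWithGradientClip", (torch.optim.SGD,), {"step": step_with_clip})
    opt.__class__ = SGDWithGradientClip  # same instance, clipping step() from now on

    (p * 100.0).sum().backward()  # produce a deliberately large gradient
    opt.step()                    # the gradient norm is clipped to 1.0 before the SGD update
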
/fewx/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fanq15/FewX/6347165aa24ba20d03ef06321f82bd818e3a8021/fewx/utils/__init__.py
--------------------------------------------------------------------------------
/fewx/utils/comm.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn.functional as F
3 | import torch.distributed as dist
4 |
5 | from detectron2.utils.comm import get_world_size
6 |
7 |
8 | def reduce_sum(tensor):
9 | world_size = get_world_size()
10 | if world_size < 2:
11 | return tensor
12 | tensor = tensor.clone()
13 | dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
14 | return tensor
15 |
16 |
17 | def aligned_bilinear(tensor, factor):
18 | assert tensor.dim() == 4
19 | assert factor >= 1
20 | assert int(factor) == factor
21 |
22 | if factor == 1:
23 | return tensor
24 |
25 | h, w = tensor.size()[2:]
26 | tensor = F.pad(tensor, pad=(0, 1, 0, 1), mode="replicate")
27 | oh = factor * h + 1
28 | ow = factor * w + 1
29 | tensor = F.interpolate(
30 | tensor, size=(oh, ow),
31 | mode='bilinear',
32 | align_corners=True
33 | )
34 | tensor = F.pad(
35 | tensor, pad=(factor // 2, 0, factor // 2, 0),
36 | mode="replicate"
37 | )
38 |
39 | return tensor[:, :, :oh - 1, :ow - 1]
40 |
41 |
42 | def compute_locations(h, w, stride, device):
43 | shifts_x = torch.arange(
44 | 0, w * stride, step=stride,
45 | dtype=torch.float32, device=device
46 | )
47 | shifts_y = torch.arange(
48 | 0, h * stride, step=stride,
49 | dtype=torch.float32, device=device
50 | )
51 | shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
52 | shift_x = shift_x.reshape(-1)
53 | shift_y = shift_y.reshape(-1)
54 | locations = torch.stack((shift_x, shift_y), dim=1) + stride // 2
55 | return locations
56 |
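
A short sketch (not a repository file) of the two geometry helpers above; the shapes follow from the pad/crop in aligned_bilinear and the stride grid in compute_locations. It assumes the module is importable as fewx.utils.comm.

import torch

from fewx.utils.comm import aligned_bilinear, compute_locations

feat = torch.randn(2, 8, 16, 16)                  # NCHW feature map
up = aligned_bilinear(feat, factor=4)             # 4x upsampling with aligned corners
assert up.shape == (2, 8, 64, 64)

locs = compute_locations(h=16, w=16, stride=8, device=feat.device)
assert locs.shape == (256, 2)                     # (x, y) center of every stride-8 cell
assert locs[0].tolist() == [4.0, 4.0]             # offset by stride // 2
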
--------------------------------------------------------------------------------
/fewx/utils/measures.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 | # Adapted from https://github.com/ShichenLiu/CondenseNet/blob/master/utils.py
3 | from __future__ import absolute_import
4 | from __future__ import unicode_literals
5 | from __future__ import print_function
6 | from __future__ import division
7 |
8 | import operator
9 |
10 | from functools import reduce
11 |
12 |
13 | def get_num_gen(gen):
14 | return sum(1 for x in gen)
15 |
16 |
17 | def is_pruned(layer):
18 | try:
19 | layer.mask
20 | return True
21 | except AttributeError:
22 | return False
23 |
24 |
25 | def is_leaf(model):
26 | return get_num_gen(model.children()) == 0
27 |
28 |
29 | def get_layer_info(layer):
30 | layer_str = str(layer)
31 | type_name = layer_str[:layer_str.find('(')].strip()
32 | return type_name
33 |
34 |
35 | def get_layer_param(model):
36 | return sum([reduce(operator.mul, i.size(), 1) for i in model.parameters()])
37 |
38 |
39 | ### The input batch size should be 1 to call this function
40 | def measure_layer(layer, *args):
41 | global count_ops, count_params
42 |
43 | for x in args:
44 | delta_ops = 0
45 | delta_params = 0
46 | multi_add = 1
47 | type_name = get_layer_info(layer)
48 |
49 | ### ops_conv
50 | if type_name in ['Conv2d']:
51 | out_h = int((x.size()[2] + 2 * layer.padding[0] / layer.dilation[0] - layer.kernel_size[0]) /
52 | layer.stride[0] + 1)
53 | out_w = int((x.size()[3] + 2 * layer.padding[1] / layer.dilation[1] - layer.kernel_size[1]) /
54 | layer.stride[1] + 1)
55 | delta_ops = layer.in_channels * layer.out_channels * layer.kernel_size[0] * layer.kernel_size[1] * out_h * out_w / layer.groups * multi_add
56 | delta_params = get_layer_param(layer)
57 |
58 | elif type_name in ['ConvTranspose2d']:
59 | _, _, in_h, in_w = x.size()
60 | out_h = int((in_h-1)*layer.stride[0] - 2 * layer.padding[0] + layer.kernel_size[0] + layer.output_padding[0])
61 | out_w = int((in_w-1)*layer.stride[1] - 2 * layer.padding[1] + layer.kernel_size[1] + layer.output_padding[1])
62 | delta_ops = layer.in_channels * layer.out_channels * layer.kernel_size[0] * \
63 | layer.kernel_size[1] * out_h * out_w / layer.groups * multi_add
64 | delta_params = get_layer_param(layer)
65 |
66 | ### ops_learned_conv
67 | elif type_name in ['LearnedGroupConv']:
68 | measure_layer(layer.relu, x)
69 | measure_layer(layer.norm, x)
70 | conv = layer.conv
71 | out_h = int((x.size()[2] + 2 * conv.padding[0] - conv.kernel_size[0]) /
72 | conv.stride[0] + 1)
73 | out_w = int((x.size()[3] + 2 * conv.padding[1] - conv.kernel_size[1]) /
74 | conv.stride[1] + 1)
75 | delta_ops = conv.in_channels * conv.out_channels * conv.kernel_size[0] * conv.kernel_size[1] * out_h * out_w / layer.condense_factor * multi_add
76 | delta_params = get_layer_param(conv) / layer.condense_factor
77 |
78 | ### ops_nonlinearity
79 | elif type_name in ['ReLU', 'ReLU6']:
80 | delta_ops = x.numel()
81 | delta_params = get_layer_param(layer)
82 |
83 | ### ops_pooling
84 | elif type_name in ['AvgPool2d', 'MaxPool2d']:
85 | in_w = x.size()[2]
86 | kernel_ops = layer.kernel_size * layer.kernel_size
87 | out_w = int((in_w + 2 * layer.padding - layer.kernel_size) / layer.stride + 1)
88 | out_h = int((in_w + 2 * layer.padding - layer.kernel_size) / layer.stride + 1)
89 | delta_ops = x.size()[0] * x.size()[1] * out_w * out_h * kernel_ops
90 | delta_params = get_layer_param(layer)
91 |
92 | elif type_name in ['LastLevelMaxPool']:
93 | pass
94 |
95 | elif type_name in ['AdaptiveAvgPool2d']:
96 | delta_ops = x.size()[0] * x.size()[1] * x.size()[2] * x.size()[3]
97 | delta_params = get_layer_param(layer)
98 |
99 | elif type_name in ['ZeroPad2d', 'RetinaNetPostProcessor']:
100 | pass
101 | #delta_ops = x.size()[0] * x.size()[1] * x.size()[2] * x.size()[3]
102 | #delta_params = get_layer_param(layer)
103 |
104 | ### ops_linear
105 | elif type_name in ['Linear']:
106 | weight_ops = layer.weight.numel() * multi_add
107 | bias_ops = layer.bias.numel()
108 | delta_ops = x.size()[0] * (weight_ops + bias_ops)
109 | delta_params = get_layer_param(layer)
110 |
111 | ### ops_nothing
112 | elif type_name in ['BatchNorm2d', 'Dropout2d', 'DropChannel', 'Dropout', 'FrozenBatchNorm2d', 'GroupNorm']:
113 | delta_params = get_layer_param(layer)
114 |
115 | elif type_name in ['SumTwo']:
116 | delta_ops = x.numel()
117 |
118 | elif type_name in ['AggregateCell']:
119 | if not layer.pre_transform:
120 | delta_ops = 2 * x.numel() # twice for each input
121 | else:
122 | measure_layer(layer.branch_1, x)
123 | measure_layer(layer.branch_2, x)
124 | delta_params = get_layer_param(layer)
125 |
126 | elif type_name in ['Identity', 'Zero']:
127 | pass
128 |
129 | elif type_name in ['Scale']:
130 | delta_params = get_layer_param(layer)
131 | delta_ops = x.numel()
132 |
133 | elif type_name in ['FCOSPostProcessor', 'RPNPostProcessor', 'KeypointPostProcessor',
134 | 'ROIAlign', 'PostProcessor', 'KeypointRCNNPredictor',
135 | 'NaiveSyncBatchNorm', 'Upsample', 'Sequential']:
136 | pass
137 |
138 | elif type_name in ['DeformConv']:
139 | # don't count bilinear
140 | offset_conv = list(layer.parameters())[0]
141 | delta_ops = reduce(operator.mul, offset_conv.size(), x.size()[2] * x.size()[3])
142 | out_h = int((x.size()[2] + 2 * layer.padding[0] / layer.dilation[0]
143 | - layer.kernel_size[0]) / layer.stride[0] + 1)
144 | out_w = int((x.size()[3] + 2 * layer.padding[1] / layer.dilation[1]
145 | - layer.kernel_size[1]) / layer.stride[1] + 1)
146 | delta_ops += layer.in_channels * layer.out_channels * layer.kernel_size[0] * layer.kernel_size[1] * out_h * out_w / layer.groups * multi_add
147 | delta_params = get_layer_param(layer)
148 |
149 | ### unknown layer type
150 | else:
151 | raise TypeError('unknown layer type: %s' % type_name)
152 |
153 | count_ops += delta_ops
154 | count_params += delta_params
155 | return
156 |
157 |
158 | def measure_model(model, x):
159 | global count_ops, count_params
160 | count_ops = 0
161 | count_params = 0
162 |
163 | def should_measure(x):
164 | return is_leaf(x) or is_pruned(x)
165 |
166 | def modify_forward(model):
167 | for child in model.children():
168 | if should_measure(child):
169 | def new_forward(m):
170 | def lambda_forward(*args):
171 | measure_layer(m, *args)
172 | return m.old_forward(*args)
173 | return lambda_forward
174 | child.old_forward = child.forward
175 | child.forward = new_forward(child)
176 | else:
177 | modify_forward(child)
178 |
179 | def restore_forward(model):
180 | for child in model.children():
181 | # leaf node
182 | if is_leaf(child) and hasattr(child, 'old_forward'):
183 | child.forward = child.old_forward
184 | child.old_forward = None
185 | else:
186 | restore_forward(child)
187 |
188 | modify_forward(model)
189 | out = model.forward(x)
190 | restore_forward(model)
191 |
192 | return out, count_ops, count_params
193 |
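
A minimal sketch (not a repository file) of measure_model applied to a hypothetical torchvision backbone; torchvision is an assumption here and is not used elsewhere in the repo. The batch size is 1, as required by the note above measure_layer, and the resulting counts are rough approximations.

import torch
import torchvision  # hypothetical model source, only for this sketch

from fewx.utils.measures import measure_model

model = torchvision.models.resnet18()
x = torch.randn(1, 3, 224, 224)          # batch size must be 1

out, ops, params = measure_model(model, x)
print(f"approx. ops: {ops / 1e9:.2f} G, params: {params / 1e6:.2f} M")
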
--------------------------------------------------------------------------------
/fsod_train_net.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
3 |
4 | """
5 | FSOD Training Script.
6 |
7 | This script is a simplified version of the training script in detectron2/tools.
8 | """
9 |
10 | import os
11 |
12 | from detectron2.checkpoint import DetectionCheckpointer
13 | from detectron2.config import get_cfg
14 | from detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, launch
15 | #from detectron2.evaluation import COCOEvaluator
16 | #from detectron2.data import build_detection_train_loader
17 | from detectron2.data import build_batch_data_loader
18 |
19 | from fewx.config import get_cfg
20 | from fewx.data.dataset_mapper import DatasetMapperWithSupport
21 | from fewx.data.build import build_detection_train_loader, build_detection_test_loader
22 | from fewx.solver import build_optimizer
23 | from fewx.evaluation import COCOEvaluator
24 |
25 | import bisect
26 | import copy
27 | import itertools
28 | import logging
29 | import numpy as np
30 | import operator
31 | import pickle
32 | import torch.utils.data
33 |
34 | import detectron2.utils.comm as comm
35 | from detectron2.utils.logger import setup_logger
36 |
37 | class Trainer(DefaultTrainer):
38 |
39 | @classmethod
40 | def build_train_loader(cls, cfg):
41 | """
42 | Returns:
43 | iterable
44 |         It calls :func:`fewx.data.build.build_detection_train_loader` with a customized
45 |         DatasetMapperWithSupport, which additionally loads support data for few-shot detection.
46 | """
47 | mapper = DatasetMapperWithSupport(cfg)
48 | return build_detection_train_loader(cfg, mapper)
49 |
50 | @classmethod
51 | def build_test_loader(cls, cfg, dataset_name):
52 | """
53 | Returns:
54 | iterable
55 |             It now calls :func:`fewx.data.build.build_detection_test_loader`.
56 | Overwrite it if you'd like a different data loader.
57 | """
58 | return build_detection_test_loader(cfg, dataset_name)
59 |
60 | @classmethod
61 | def build_optimizer(cls, cfg, model):
62 | """
63 | Returns:
64 | torch.optim.Optimizer:
65 |             It now calls :func:`fewx.solver.build_optimizer`.
66 | Overwrite it if you'd like a different optimizer.
67 | """
68 | return build_optimizer(cfg, model)
69 |
70 | @classmethod
71 | def build_evaluator(cls, cfg, dataset_name, output_folder=None):
72 | if output_folder is None:
73 | output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
74 | return COCOEvaluator(dataset_name, cfg, True, output_folder)
75 |
76 |
77 | def setup(args):
78 | """
79 | Create configs and perform basic setups.
80 | """
81 | cfg = get_cfg()
82 | cfg.merge_from_file(args.config_file)
83 | cfg.merge_from_list(args.opts)
84 | cfg.freeze()
85 | default_setup(cfg, args)
86 |
87 | rank = comm.get_rank()
88 | setup_logger(cfg.OUTPUT_DIR, distributed_rank=rank, name="fewx")
89 |
90 | return cfg
91 |
92 |
93 | def main(args):
94 | cfg = setup(args)
95 |
96 | if args.eval_only:
97 | model = Trainer.build_model(cfg)
98 | DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
99 | cfg.MODEL.WEIGHTS, resume=args.resume
100 | )
101 | res = Trainer.test(cfg, model)
102 | return res
103 |
104 | trainer = Trainer(cfg)
105 | trainer.resume_or_load(resume=args.resume)
106 | return trainer.train()
107 |
108 |
109 | if __name__ == "__main__":
110 | args = default_argument_parser().parse_args()
111 | print("Command Line Args:", args)
112 | launch(
113 | main,
114 | args.num_gpus,
115 | num_machines=args.num_machines,
116 | machine_rank=args.machine_rank,
117 | dist_url=args.dist_url,
118 | args=(args,),
119 | )
120 |
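
For reference, a sketch (not a repository file) of what the --eval-only branch of main() reduces to when run from the repository root. The checkpoint path is hypothetical, and it assumes the datasets referenced by the config are registered and present on disk.

from detectron2.checkpoint import DetectionCheckpointer

from fewx.config import get_cfg
from fsod_train_net import Trainer   # the Trainer class defined above

cfg = get_cfg()
cfg.merge_from_file("configs/fsod/finetune_R_50_C4_1x.yaml")
cfg.MODEL.WEIGHTS = "output/fsod/model_final.pth"   # hypothetical checkpoint path
cfg.freeze()

model = Trainer.build_model(cfg)
DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
    cfg.MODEL.WEIGHTS, resume=False
)
results = Trainer.test(cfg, model)
print(results)
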
--------------------------------------------------------------------------------
/log/metric.txt:
--------------------------------------------------------------------------------
1 | Running per image evaluation...
2 | Evaluate annotation type *bbox*
3 | COCOeval_opt.evaluate() finished in 7.92 seconds.
4 | Accumulating evaluation results...
5 | COCOeval_opt.accumulate() finished in 0.96 seconds.
6 | Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.030
7 | Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.056
8 | Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.029
9 | Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.007
10 | Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.031
11 | Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.052
12 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.047
13 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.066
14 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.066
15 | Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.009
16 | Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.059
17 | Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.114
18 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for bbox:
19 | | AP | AP50 | AP75 | APs | APm | APl |
20 | |:-----:|:------:|:------:|:-----:|:-----:|:-----:|
21 | | 2.989 | 5.592 | 2.948 | 0.724 | 3.057 | 5.165 |
22 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> AP : 11.95
23 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> AP50: 22.37
24 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> AP75: 11.79
25 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> APs : 2.89
26 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> APm : 12.23
27 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for VOC 20 categories =======> APl : 20.66
28 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> AP : 0.00
29 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> AP50: 0.00
30 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> AP75: 0.00
31 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> APs : 0.00
32 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> APm : 0.00
33 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Evaluation results for Non VOC 60 categories =======> APl : 0.00
34 | [08/08 18:10:48 fewx.evaluation.coco_evaluation]: Per-category bbox AP:
35 | | category | AP | category | AP | category | AP |
36 | |:--------------|:-------|:-------------|:-------|:---------------|:-------|
37 | | person | 10.446 | bicycle | 5.604 | car | 11.692 |
38 | | motorcycle | 10.597 | airplane | 23.452 | bus | 20.706 |
39 | | train | 16.856 | truck | 0.000 | boat | 1.730 |
40 | | traffic light | 0.000 | fire hydrant | 0.000 | stop sign | 0.000 |
41 | | parking meter | 0.000 | bench | 0.000 | bird | 8.258 |
42 | | cat | 25.047 | dog | 17.904 | horse | 11.009 |
43 | | sheep | 6.285 | cow | 9.031 | elephant | 0.000 |
44 | | bear | 0.000 | zebra | 0.000 | giraffe | 0.000 |
45 | | backpack | 0.000 | umbrella | 0.000 | handbag | 0.000 |
46 | | tie | 0.000 | suitcase | 0.000 | frisbee | 0.000 |
47 | | skis | 0.000 | snowboard | 0.000 | sports ball | 0.000 |
48 | | kite | 0.000 | baseball bat | 0.000 | baseball glove | 0.000 |
49 | | skateboard | 0.000 | surfboard | 0.000 | tennis racket | 0.000 |
50 | | bottle | 7.444 | wine glass | 0.000 | cup | 0.000 |
51 | | fork | 0.000 | knife | 0.000 | spoon | 0.000 |
52 | | bowl | 0.000 | banana | 0.000 | apple | 0.000 |
53 | | sandwich | 0.000 | orange | 0.000 | broccoli | 0.000 |
54 | | carrot | 0.000 | hot dog | 0.000 | pizza | 0.000 |
55 | | donut | 0.000 | cake | 0.000 | chair | 2.958 |
56 | | couch | 11.579 | potted plant | 4.004 | bed | 0.000 |
57 | | dining table | 8.955 | toilet | 0.000 | tv | 25.533 |
58 | | laptop | 0.000 | mouse | 0.000 | remote | 0.000 |
59 | | keyboard | 0.000 | cell phone | 0.000 | microwave | 0.000 |
60 | | oven | 0.000 | toaster | 0.000 | sink | 0.000 |
61 | | refrigerator | 0.000 | book | 0.000 | clock | 0.000 |
62 | | vase | 0.000 | scissors | 0.000 | teddy bear | 0.000 |
63 | | hair drier | 0.000 | toothbrush | 0.000 | | |
64 | [08/08 18:10:48 d2.engine.defaults]: Evaluation results for coco_2017_val in csv format:
65 | [08/08 18:10:48 d2.evaluation.testing]: copypaste: Task: bbox
66 | [08/08 18:10:48 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
67 | [08/08 18:10:48 d2.evaluation.testing]: copypaste: 2.9886,5.5922,2.9480,0.7236,3.0575,5.1649
68 |
--------------------------------------------------------------------------------