├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── configs ├── r50_ut_detr_coco_additional.sh ├── r50_ut_detr_omni_bees.sh ├── r50_ut_detr_omni_coco.sh ├── r50_ut_detr_omni_crowdhuman.sh ├── r50_ut_detr_omni_objects.sh ├── r50_ut_detr_omni_voc.sh ├── r50_ut_detr_point_ufo.sh ├── r50_ut_detr_tagsU_ufo.sh └── r50_ut_detr_voc07to12_semi.sh ├── datasets ├── __init__.py ├── coco.py ├── coco_eval.py ├── coco_panoptic.py ├── data_prefetcher.py ├── panoptic_eval.py ├── samplers.py ├── torchvision_datasets │ ├── __init__.py │ └── coco.py ├── transforms.py └── transforms_with_record.py ├── engine.py ├── main.py ├── models ├── __init__.py ├── backbone.py ├── deformable_detr.py ├── deformable_transformer.py ├── matcher.py ├── ops │ ├── functions │ │ ├── __init__.py │ │ └── ms_deform_attn_func.py │ ├── make.sh │ ├── modules │ │ ├── __init__.py │ │ └── ms_deform_attn.py │ ├── setup.py │ ├── src │ │ ├── cpu │ │ │ ├── ms_deform_attn_cpu.cpp │ │ │ └── ms_deform_attn_cpu.h │ │ ├── cuda │ │ │ ├── ms_deform_attn_cuda.cu │ │ │ ├── ms_deform_attn_cuda.h │ │ │ └── ms_deform_im2col_cuda.cuh │ │ ├── ms_deform_attn.h │ │ └── vision.cpp │ └── test.py ├── position_encoding.py └── segmentation.py ├── requirements.txt ├── scripts ├── Bees2COCO.py ├── IoU_extreme.npy ├── VOC2COCO.py ├── add_indicator_to_coco2014.py ├── add_indicator_to_coco2017_val.py ├── add_indicator_to_objects365val.py ├── build_crowdhuman_dataset.py ├── combine_voc_trainval20072012.py ├── convert_crowdhuman_to_coco.py ├── file_list_to_remove.txt ├── prepare_objects365_for_omni.py ├── prepare_voc_dataset.py ├── split_bees_train_val.py ├── split_dataset_bees_omni.py ├── split_dataset_coco_omni.py ├── split_dataset_crowdhuman_omni.py ├── split_dataset_objects365_omni.py └── split_dataset_voc_omni.py ├── tools ├── launch.py ├── run_dist_launch.sh └── run_dist_slurm.sh └── util ├── __init__.py ├── box_ops.py ├── filtering.py ├── misc.py └── plot_utils.py /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. 
Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Omni-DETR: Omni-Supervised Object Detection with Transformers 2 | 3 | This is the PyTorch implementation of the [Omni-DETR](https://assets.amazon.science/91/3c/ac87e7dd44789a62e03b2230e0ed/omni-detr-omni-supervised-object-detection-with-transformers.pdf) paper. It is a unified framework to use different types of weak annotations for object detection. 
4 | 5 | If you use the code/model/results of this repository, please cite: 6 | ``` 7 | @inproceedings{wang2022omni, 8 | author = {Pei Wang and Zhaowei Cai and Hao Yang and Gurumurthy Swaminathan and Nuno Vasconcelos and Bernt Schiele and Stefano Soatto}, 9 | title = {Omni-DETR: Omni-Supervised Object Detection with Transformers}, 10 | booktitle = {CVPR}, 11 | year = {2022} 12 | } 13 | ``` 14 | 15 | ## Installation 16 | 17 | First, install PyTorch and torchvision. We have tested with version 1.8.1, but other versions (no earlier than 1.5.1) should also work. 18 | 19 | Our implementation is partially based on [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR/). Please follow its [instructions](https://github.com/fundamentalvision/Deformable-DETR/blob/main/README.md) for the other requirements. 20 | 21 | ## Usage 22 | 23 | ### Dataset organization 24 | 25 | Please organize each dataset as follows: 26 | 27 | ``` 28 | code_root/ 29 | └── coco/ 30 | ├── train2017/ 31 | ├── val2017/ 32 | ├── train2014/ 33 | ├── val2014/ 34 | └── annotations/ 35 | ├── instances_train2017.json 36 | ├── instances_val2017.json 37 | ├── instances_valminusminival2014.json 38 | └── instances_train2014.json 39 | └── voc/ 40 | └── VOCdevkit/ 41 | └── VOC2007trainval/ 42 | ├── Annotations/ 43 | ├── JPEGImages/ 44 | └── VOC2012trainval/ 45 | ├── Annotations/ 46 | ├── JPEGImages/ 47 | └── VOC2007test/ 48 | ├── Annotations/ 49 | ├── JPEGImages/ 50 | └── VOC20072012trainval/ 51 | ├── Annotations/ 52 | ├── JPEGImages/ 53 | └── objects365/ 54 | ├── train_objects365/ 55 | ├── objects365_v1_00000000.jpg 56 | ├── ... 57 | ├── val_objects365/ 58 | ├── objects365_v1_00000016.jpg 59 | ├── ... 60 | └── annotations/ 61 | ├── objects365_train.json 62 | └── objects365_val.json 63 | └── bees/ 64 | └── ML-Data/ 65 | └── crowdhuman/ 66 | ├── Images/ 67 | ├── 273271,1a0d6000b9e1f5b7.jpg 68 | ├── ...
69 | ├── annotation_train.odgt 70 | └── annotation_val.odgt 71 | 72 | ``` 73 | 74 | ### Dataset preparation 75 | First, go to the ``scripts`` folder: 76 | 77 | ``` 78 | cd scripts 79 | ``` 80 | 81 | #### COCO 82 | To generate the split labeled and omni-labeled datasets: 83 | ``` 84 | python split_dataset_coco_omni.py 85 | ``` 86 | Add the indicator to the COCO val set: 87 | ``` 88 | python add_indicator_to_coco2017_val.py 89 | ``` 90 | For the experiments compared with UFO, prepare the COCO 2014 set: 91 | ``` 92 | python add_indicator_to_coco2014.py 93 | ``` 94 | #### VOC 95 | First, convert the annotation format to COCO style: 96 | ``` 97 | python VOC2COCO.py --xml_dir ../voc/VOCdevkit/VOC2007trainval/Annotations --json_file ../voc/VOCdevkit/VOC2007trainval/instances_VOC_trainval2007.json 98 | python VOC2COCO.py --xml_dir ../voc/VOCdevkit/VOC2007test/Annotations --json_file ../voc/VOCdevkit/VOC2007test/instances_VOC_test2007.json 99 | python VOC2COCO.py --xml_dir ../voc/VOCdevkit/VOC2012trainval/Annotations --json_file ../voc/VOCdevkit/VOC2012trainval/instances_VOC_trainval2012.json 100 | ``` 101 | Combine the annotations of VOC07 and VOC12: 102 | ``` 103 | python combine_voc_trainval20072012.py 104 | ``` 105 | Add the indicator to VOC07 and VOC12: 106 | ``` 107 | python prepare_voc_dataset.py 108 | ``` 109 | To generate the split labeled and omni-labeled datasets: 110 | ``` 111 | python split_dataset_voc_omni.py 112 | ``` 113 | 114 | 115 | #### Objects365 116 | First, sample a subset from the original training set: 117 | ``` 118 | python prepare_objects365_for_omni.py 119 | ``` 120 | Add the indicator to the val set: 121 | ``` 122 | python add_indicator_to_objects365val.py 123 | ``` 124 | To generate the split labeled and omni-labeled datasets: 125 | ``` 126 | python split_dataset_objects365_omni.py 127 | ``` 128 | 129 | #### Bees 130 | Because the official training set contains some broken images (with names from ``Erlen_Erlen_Hive_04_1264.jpg`` to ``Erlen_Erlen_Hive_04_1842.jpg``), we first need to 131 | delete them manually, or run 132 | ``` 133 | xargs rm -r < file_list_to_remove.txt 134 | ``` 135 | After this, 3596 samples are kept. Next, convert the annotation format to COCO style: 136 | ``` 137 | python Bees2COCO.py 138 | ``` 139 | Split the training and validation sets 8:2: 140 | ``` 141 | python split_bees_train_val.py 142 | ``` 143 | To generate the split labeled and omni-labeled datasets: 144 | ``` 145 | python split_dataset_bees_omni.py 146 | ``` 147 | 148 | #### CrowdHuman 149 | Please follow [this repo](https://github.com/xingyizhou/CenterTrack/blob/master/readme/DATA.md) to convert the annotations from odgt format to COCO format, or run 150 | ``` 151 | python convert_crowdhuman_to_coco.py 152 | ``` 153 | Because we only focus on full-body detection for CrowdHuman, extract those annotations: 154 | ``` 155 | python build_crowdhuman_dataset.py 156 | ``` 157 | To generate the split labeled and omni-labeled datasets: 158 | ``` 159 | python split_dataset_crowdhuman_omni.py 160 | ``` 161 | 162 | ### Training Omni-DETR 163 | After preparing the datasets, change the arguments in the config files, such as ``annotation_json_label`` and ``annotation_json_unlabel``, according to the names of the json files generated above. The ``BURN_IN_STEP`` argument sometimes also needs to be changed (please refer to our supplementary materials). In our experiments, this hyperparameter does not have a huge impact on the results.
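For example, in ``configs/r50_ut_detr_omni_coco.sh`` these are the lines that typically need editing (the json names below are placeholders following the split scripts' naming scheme; substitute the files you actually generated):

```
--BURN_IN_STEP 20 \
--annotation_json_label 'instances_train2017_coco_omni_label_<your_split>.json' \
--annotation_json_unlabel 'instances_train2017_coco_omni_unlabel_<your_split>.json' \
--data_path './coco' \
```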
164 | 165 | Because semi-supervised learning is just a special case of omni-supervised learning, semi-supervised results can be generated by adjusting the ratio of ``fully_labeled`` to ``Unsup`` and setting all other ratios to 0 when splitting the dataset. 166 | 167 | Train Omni-DETR on each dataset (from the repo's main folder): 168 | 169 | #### Training from scratch 170 | 171 | ``` 172 | GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_omni_coco.sh 173 | GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_omni_voc.sh 174 | GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_omni_objects.sh 175 | GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_omni_bees.sh 176 | GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_omni_crowdhuman.sh 177 | ``` 178 | 179 | #### Training from Deformable DETR 180 | Because our burn-in stage is exactly the same as Deformable DETR training, you can start from a Deformable DETR checkpoint and skip the burn-in stage. Just modify the ``resume`` argument in the config files above (see the example at the end of this README). 181 | 182 | 183 | Before running the above scripts, you may first have to change their access permissions: 184 | ``` 185 | chmod u+x ./tools/run_dist_launch.sh 186 | chmod u+x ./configs/r50_ut_detr_omni_coco.sh 187 | ``` 188 | 189 | ### Training under the setting of COCO35to80 190 | ``` 191 | GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_tagsU_ufo.sh 192 | GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_point_ufo.sh 193 | ``` 194 | 195 | ### Training under the setting of VOC07to12 196 | ``` 197 | GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_voc07to12_semi.sh 198 | ``` 199 | 200 | ### Note 201 | 1. Some of our experiments run on 800-pixel images with 8 GPUs of 32G memory each. If that much memory is not available, please change the ``pixels`` argument to 600; the experiments then fit on 8 GPUs with 16G memory. 202 | 2. This code may show minor accuracy differences from the paper due to implementation changes made after the paper submission. 203 | 204 | ## License 205 | 206 | This project is under the Apache-2.0 license. See [LICENSE](LICENSE) for details.
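As a closing example for the "Training from Deformable DETR" option above: skipping the burn-in stage only requires pointing ``resume`` in the chosen config at a pretrained Deformable DETR checkpoint. The path below is a placeholder for wherever your checkpoint lives:

```
--resume './checkpoints/r50_deformable_detr_checkpoint.pth' \
```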
207 | -------------------------------------------------------------------------------- /configs/r50_ut_detr_coco_additional.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | EXP_DIR=results/r50_coco_additional_ep500_burnin10 6 | PY_ARGS=${@:1} 7 | 8 | python -u main.py \ 9 | --output_dir ${EXP_DIR} \ 10 | ${PY_ARGS} \ 11 | --BURN_IN_STEP 10 \ 12 | --TEACHER_UPDATE_ITER 1 \ 13 | --EMA_KEEP_RATE 0.9996 \ 14 | --annotation_json_label 'instances_train2017_w_indicator.json' \ 15 | --annotation_json_unlabel 'image_info_unlabeled2017_w_indicator.json' \ 16 | --CONFIDENCE_THRESHOLD 0.7 \ 17 | --data_path './coco' \ 18 | --lr 2e-4 \ 19 | --epochs 1000 \ 20 | --lr_drop 1000 \ 21 | --pixels 800 \ 22 | --dataset_file 'coco_add_semi' \ 23 | --resume '' \ 24 | 25 | -------------------------------------------------------------------------------- /configs/r50_ut_detr_omni_bees.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | EXP_DIR=results/r50_omni_bees_20fully0Unsup0tagsU34tagsK0pointsU0pointsK46boxesEC0boxesU_ep1k_burnin200 6 | PY_ARGS=${@:1} 7 | 8 | python -u main.py \ 9 | --output_dir ${EXP_DIR} \ 10 | ${PY_ARGS} \ 11 | --BURN_IN_STEP 200 \ 12 | --TEACHER_UPDATE_ITER 1 \ 13 | --EMA_KEEP_RATE 0.9996 \ 14 | --annotation_json_label 'instances_bees_train_bees_omni_label_seed1709_20fully0Unsup0tagsU34tagsK0pointsU0pointsK46boxesEC0boxesU.json' \ 15 | --annotation_json_unlabel 'instances_bees_train_bees_omni_unlabel_seed1709_20fully0Unsup0tagsU34tagsK0pointsU0pointsK46boxesEC0boxesU.json' \ 16 | --CONFIDENCE_THRESHOLD 0.7 \ 17 | --data_path './bees' \ 18 | --lr 2e-4 \ 19 | --epochs 1000 \ 20 | --lr_drop 1000 \ 21 | --pixels 600 \ 22 | --save_freq 100 \ 23 | --eval_freq 20 \ 24 | --dataset_file 'bees_omni' \ 25 | --resume '' \ 26 | -------------------------------------------------------------------------------- /configs/r50_ut_detr_omni_coco.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | EXP_DIR=results/r50_omni_coco_20fully16Unsup0tagsU0tagsK0pointsU0pointsK64boxesEC0boxesU_ep150_burnin20 6 | PY_ARGS=${@:1} 7 | 8 | python -u main.py \ 9 | --output_dir ${EXP_DIR} \ 10 | ${PY_ARGS} \ 11 | --BURN_IN_STEP 20 \ 12 | --TEACHER_UPDATE_ITER 1 \ 13 | --EMA_KEEP_RATE 0.9996 \ 14 | --annotation_json_label 'instances_train2017_coco_omni_label_seed1709_20fully16Unsup0tagsU0tagsK0pointsU0pointsK64boxesEC0boxesU.json' \ 15 | --annotation_json_unlabel 'instances_train2017_coco_omni_unlabel_seed1709_20fully16Unsup0tagsU0tagsK0pointsU0pointsK64boxesEC0boxesU.json' \ 16 | --CONFIDENCE_THRESHOLD 0.7 \ 17 | --data_path './coco' \ 18 | --lr 2e-4 \ 19 | --epochs 150 \ 20 | --lr_drop 150 \ 21 | --pixels 600 \ 22 | --save_freq 20 \ 23 | --dataset_file 'coco_omni' \ 24 | --resume '' \ 25 | -------------------------------------------------------------------------------- /configs/r50_ut_detr_omni_crowdhuman.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | EXP_DIR=results/r50_omni_crowdhuman_20fully0Unsup0tagsU0tagsK34pointsU0pointsK46boxesEC0boxesU_ep500_burnin100 6 | PY_ARGS=${@:1} 7 | 8 | python -u main.py \ 9 | --output_dir ${EXP_DIR} \ 10 | ${PY_ARGS} \ 11 | --BURN_IN_STEP 100 \ 12 | --TEACHER_UPDATE_ITER 1 \ 13 | --EMA_KEEP_RATE 0.9996 \ 14 | --annotation_json_label 
'train_fullbody_crowdhuman_omni_label_seed1709_20fully0Unsup0tagsU0tagsK34pointsU0pointsK46boxesEC0boxesU.json' \ 15 | --annotation_json_unlabel 'train_fullbody_crowdhuman_omni_unlabel_seed1709_20fully0Unsup0tagsU0tagsK34pointsU0pointsK46boxesEC0boxesU.json' \ 16 | --CONFIDENCE_THRESHOLD 0.7 \ 17 | --data_path './crowdhuman' \ 18 | --lr 2e-4 \ 19 | --epochs 500 \ 20 | --lr_drop 500 \ 21 | --pixels 600 \ 22 | --save_freq 100 \ 23 | --eval_freq 5 \ 24 | --dataset_file 'crowdhuman_omni' \ 25 | --resume '' \ 26 | -------------------------------------------------------------------------------- /configs/r50_ut_detr_omni_objects.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | EXP_DIR=results/r50_omni_objects_25fully0Unsup0tagsU0tagsK34pointsU0pointsK41boxesEC0boxesU_ep200_burnin40 6 | PY_ARGS=${@:1} 7 | 8 | python -u main.py \ 9 | --output_dir ${EXP_DIR} \ 10 | ${PY_ARGS} \ 11 | --BURN_IN_STEP 40 \ 12 | --TEACHER_UPDATE_ITER 1 \ 13 | --EMA_KEEP_RATE 0.9996 \ 14 | --annotation_json_label 'objects365_train_sampled_objects_omni_label_seed1709_25fully0Unsup0tagsU0tagsK34pointsU0pointsK41boxesEC0boxesU.json' \ 15 | --annotation_json_unlabel 'objects365_train_sampled_objects_omni_unlabel_seed1709_25fully0Unsup0tagsU0tagsK34pointsU0pointsK41boxesEC0boxesU.json' \ 16 | --CONFIDENCE_THRESHOLD 0.7 \ 17 | --data_path './objects365' \ 18 | --lr 2e-4 \ 19 | --epochs 200 \ 20 | --lr_drop 200 \ 21 | --pixels 600 \ 22 | --save_freq 20 \ 23 | --eval_freq 5 \ 24 | --dataset_file 'objects_omni' \ 25 | --resume '' \ 26 | 27 | -------------------------------------------------------------------------------- /configs/r50_ut_detr_omni_voc.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | EXP_DIR=results/r50_omni_voc_20fully19Unsup0tagsU0tagsK0pointsU0pointsK61boxesEC0boxesU_ep500_burnin100 6 | PY_ARGS=${@:1} 7 | 8 | python -u main.py \ 9 | --output_dir ${EXP_DIR} \ 10 | ${PY_ARGS} \ 11 | --BURN_IN_STEP 100 \ 12 | --TEACHER_UPDATE_ITER 1 \ 13 | --EMA_KEEP_RATE 0.9996 \ 14 | --annotation_json_label 'instances_VOC_trainval20072012_voc_omni_label_seed1709_20fully19Unsup0tagsU0tagsK0pointsU0pointsK61boxesEC0boxesU.json' \ 15 | --annotation_json_unlabel 'instances_VOC_trainval20072012_voc_omni_unlabel_seed1709_20fully19Unsup0tagsU0tagsK0pointsU0pointsK61boxesEC0boxesU.json' \ 16 | --CONFIDENCE_THRESHOLD 0.7 \ 17 | --data_path './voc' \ 18 | --lr 2e-4 \ 19 | --epochs 500 \ 20 | --lr_drop 500 \ 21 | --pixels 600 \ 22 | --save_freq 100 \ 23 | --eval_freq 5 \ 24 | --dataset_file 'voc_omni' \ 25 | --resume '' \ 26 | -------------------------------------------------------------------------------- /configs/r50_ut_detr_point_ufo.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | EXP_DIR=results/r50_coco35to80_points_pixels800_ufo_ep500_burnin10 6 | PY_ARGS=${@:1} 7 | 8 | python -u main.py \ 9 | --output_dir ${EXP_DIR} \ 10 | ${PY_ARGS} \ 11 | --BURN_IN_STEP 10 \ 12 | --TEACHER_UPDATE_ITER 1 \ 13 | --EMA_KEEP_RATE 0.9996 \ 14 | --annotation_json_label 'instances_valminusminival2014_w_indicator.json' \ 15 | --annotation_json_unlabel 'instances_train2014_w_indicator_pointsK.json' \ 16 | --CONFIDENCE_THRESHOLD 0.7 \ 17 | --data_path './coco' \ 18 | --lr 2e-4 \ 19 | --epochs 500 \ 20 | --lr_drop 500 \ 21 | --pixels 800 \ 22 | --dataset_file 'coco_35to80_point' \ 23 | --resume '' \ 24 | 25 | 
-------------------------------------------------------------------------------- /configs/r50_ut_detr_tagsU_ufo.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | EXP_DIR=results/r50_coco35to80_tags_pixels800_ufo_ep500_burnin10 6 | PY_ARGS=${@:1} 7 | 8 | python -u main.py \ 9 | --output_dir ${EXP_DIR} \ 10 | ${PY_ARGS} \ 11 | --BURN_IN_STEP 10 \ 12 | --TEACHER_UPDATE_ITER 1 \ 13 | --EMA_KEEP_RATE 0.9996 \ 14 | --annotation_json_label 'instances_valminusminival2014_w_indicator.json' \ 15 | --annotation_json_unlabel 'instances_train2014_w_indicator_tagsU.json' \ 16 | --CONFIDENCE_THRESHOLD 0.7 \ 17 | --data_path './coco' \ 18 | --lr 2e-4 \ 19 | --epochs 500 \ 20 | --lr_drop 500 \ 21 | --pixels 800 \ 22 | --dataset_file 'coco_35to80_tagsU' \ 23 | --resume '' \ 24 | 25 | -------------------------------------------------------------------------------- /configs/r50_ut_detr_voc07to12_semi.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | EXP_DIR=results/r50_voc07to12_ep500_burnin80 6 | PY_ARGS=${@:1} 7 | 8 | python -u main.py \ 9 | --output_dir ${EXP_DIR} \ 10 | ${PY_ARGS} \ 11 | --BURN_IN_STEP 80 \ 12 | --TEACHER_UPDATE_ITER 1 \ 13 | --EMA_KEEP_RATE 0.9996 \ 14 | --annotation_json_label 'instances_VOC_trainval2007_semi_label.json' \ 15 | --annotation_json_unlabel 'instances_VOC_trainval2012_semi_unlabel.json' \ 16 | --CONFIDENCE_THRESHOLD 0.7 \ 17 | --data_path './voc' \ 18 | --lr 2e-4 \ 19 | --epochs 500 \ 20 | --lr_drop 500 \ 21 | --pixels 800 \ 22 | --dataset_file 'voc_semi' \ 23 | --resume '' \ 24 | 25 | -------------------------------------------------------------------------------- /datasets/__init__.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------ 6 | # Modified from DETR (https://github.com/facebookresearch/detr) 7 | # Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved 8 | # ------------------------------------------------------------------------ 9 | 10 | import torch.utils.data 11 | from .torchvision_datasets import CocoDetection 12 | from .coco import build as build_coco 13 | from .coco import build_semi_label as build_coco_semi_label 14 | from .coco import build_semi_unlabel as build_coco_semi_unlabel 15 | 16 | def get_coco_api_from_dataset(dataset): 17 | for _ in range(10): 18 | # if isinstance(dataset, torchvision.datasets.CocoDetection): 19 | # break 20 | if isinstance(dataset, torch.utils.data.Subset): 21 | dataset = dataset.dataset 22 | if isinstance(dataset, CocoDetection): 23 | return dataset.coco 24 | 25 | def build_dataset(image_set, label, args): 26 | if args.dataset_file == 'coco' and label == True: 27 | return build_coco(image_set, args) 28 | if args.dataset_file == 'coco_panoptic': 29 | # to avoid making panopticapi required for coco 30 | from .coco_panoptic import build as build_coco_panoptic 31 | return build_coco_panoptic(image_set, args) 32 | if args.dataset_file == 'coco_omni' and image_set == 'train' and label == True: 33 | return build_coco_semi_label(image_set, args) 34 | if args.dataset_file == 'coco_omni' and image_set == 'train' and label == False: 35 | return build_coco_semi_unlabel(image_set, args) 36 | if args.dataset_file == 'coco_omni' and image_set == 'val' and label == True: 37 | return build_coco(image_set, args) 38 | if args.dataset_file == 'coco_omni' and image_set == 'burnin' and label == True: 39 | return build_coco('train', args) 40 | if args.dataset_file == 'coco_add_semi' and image_set == 'train' and label == True: 41 | return build_coco_semi_label(image_set, args) 42 | if args.dataset_file == 'coco_add_semi' and image_set == 'train' and label == False: 43 | return build_coco_semi_unlabel(image_set, args) 44 | if args.dataset_file == 'coco_add_semi' and image_set == 'val' and label == True: 45 | return build_coco(image_set, args) 46 | if args.dataset_file == 'coco_add_semi' and image_set == 'burnin' and label == True: 47 | return build_coco('train', args) 48 | if args.dataset_file == 'coco_35to80_tagsU' and image_set == 'train' and label == True: 49 | return build_coco_semi_label(image_set, args) 50 | if args.dataset_file == 'coco_35to80_tagsU' and image_set == 'train' and label == False: 51 | return build_coco_semi_unlabel(image_set, args) 52 | if args.dataset_file == 'coco_35to80_tagsU' and image_set == 'val' and label == True: 53 | return build_coco(image_set, args) 54 | if args.dataset_file == 'coco_35to80_tagsU' and image_set == 'burnin' and label == True: 55 | return build_coco('train', args) 56 | if args.dataset_file == 'coco_35to80_point' and image_set == 'train' and label == True: 57 | return build_coco_semi_label(image_set, args) 58 | if args.dataset_file == 'coco_35to80_point' and image_set == 'train' and label == False: 59 | return build_coco_semi_unlabel(image_set, args) 60 | if args.dataset_file == 'coco_35to80_point' and image_set == 'val' and label == True: 61 | return build_coco(image_set, args) 62 | if args.dataset_file == 'coco_35to80_point' and image_set == 'burnin' and label == True: 63 | return build_coco('train', args) 64 | if args.dataset_file == 'coco_objects_tagsU' and image_set == 'train' and label == True: 65 | return build_coco_semi_label(image_set, args) 66 | if args.dataset_file == 'coco_objects_tagsU' and image_set == 'train' and label == False: 67 | return build_coco_semi_unlabel(image_set, args) 68 | if args.dataset_file == 'coco_objects_tagsU' and image_set 
== 'val' and label == True: 69 | return build_coco(image_set, args) 70 | if args.dataset_file == 'coco_objects_tagsU' and image_set == 'burnin' and label == True: 71 | return build_coco('train', args) 72 | if args.dataset_file == 'coco_objects_points' and image_set == 'train' and label == True: 73 | return build_coco_semi_label(image_set, args) 74 | if args.dataset_file == 'coco_objects_points' and image_set == 'train' and label == False: 75 | return build_coco_semi_unlabel(image_set, args) 76 | if args.dataset_file == 'coco_objects_points' and image_set == 'val' and label == True: 77 | return build_coco(image_set, args) 78 | if args.dataset_file == 'coco_objects_points' and image_set == 'burnin' and label == True: 79 | return build_coco('train', args) 80 | if args.dataset_file == 'bees_omni' and image_set == 'train' and label == True: 81 | return build_coco_semi_label(image_set, args) 82 | if args.dataset_file == 'bees_omni' and image_set == 'train' and label == False: 83 | return build_coco_semi_unlabel(image_set, args) 84 | if args.dataset_file == 'bees_omni' and image_set == 'val' and label == True: 85 | return build_coco(image_set, args) 86 | if args.dataset_file == 'bees_omni' and image_set == 'burnin' and label == True: 87 | return build_coco('train', args) 88 | if args.dataset_file == 'voc_semi' and image_set == 'train' and label == True: 89 | return build_coco_semi_label(image_set, args) 90 | if args.dataset_file == 'voc_semi' and image_set == 'train' and label == False: 91 | return build_coco_semi_unlabel(image_set, args) 92 | if args.dataset_file == 'voc_semi' and image_set == 'val' and label == True: 93 | return build_coco(image_set, args) 94 | if args.dataset_file == 'voc_semi' and image_set == 'burnin' and label == True: 95 | return build_coco('train', args) 96 | if args.dataset_file == 'voc_omni' and image_set == 'train' and label == True: 97 | return build_coco_semi_label(image_set, args) 98 | if args.dataset_file == 'voc_omni' and image_set == 'train' and label == False: 99 | return build_coco_semi_unlabel(image_set, args) 100 | if args.dataset_file == 'voc_omni' and image_set == 'val' and label == True: 101 | return build_coco(image_set, args) 102 | if args.dataset_file == 'voc_omni' and image_set == 'burnin' and label == True: 103 | return build_coco('train', args) 104 | if args.dataset_file == 'objects_omni' and image_set == 'train' and label == True: 105 | return build_coco_semi_label(image_set, args) 106 | if args.dataset_file == 'objects_omni' and image_set == 'train' and label == False: 107 | return build_coco_semi_unlabel(image_set, args) 108 | if args.dataset_file == 'objects_omni' and image_set == 'val' and label == True: 109 | return build_coco(image_set, args) 110 | if args.dataset_file == 'objects_omni' and image_set == 'burnin' and label == True: 111 | return build_coco('train', args) 112 | if args.dataset_file == 'crowdhuman_omni' and image_set == 'train' and label == True: 113 | return build_coco_semi_label(image_set, args) 114 | if args.dataset_file == 'crowdhuman_omni' and image_set == 'train' and label == False: 115 | return build_coco_semi_unlabel(image_set, args) 116 | if args.dataset_file == 'crowdhuman_omni' and image_set == 'val' and label == True: 117 | return build_coco(image_set, args) 118 | if args.dataset_file == 'crowdhuman_omni' and image_set == 'burnin' and label == True: 119 | return build_coco('train', args) 120 | raise ValueError(f'dataset {args.dataset_file} not supported') 121 | 
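# Illustrative usage (not part of the original file; variable names are placeholders):
# the training entry point is expected to build three datasets for an omni/semi run,
# with `label` selecting the labeled vs. unlabeled training split and the val set
# always being fully labeled, e.g.
#   dataset_label   = build_dataset(image_set='train', label=True,  args=args)
#   dataset_unlabel = build_dataset(image_set='train', label=False, args=args)
#   dataset_val     = build_dataset(image_set='val',   label=True,  args=args)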
-------------------------------------------------------------------------------- /datasets/coco_eval.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Modified from Deformable DETR (https://github.com/fundamentalvision/Deformable-DETR) 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache-2.0 License. 5 | # ------------------------------------------------------------------------ 6 | # Modified from DETR (https://github.com/facebookresearch/detr) 7 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 8 | # Licensed under the Apache-2.0 License. 9 | # ------------------------------------------------------------------------ 10 | # Modifications Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 11 | # SPDX-License-Identifier: Apache-2.0 12 | 13 | """ 14 | COCO evaluator that works in distributed mode. 15 | 16 | Mostly copy-paste from https://github.com/pytorch/vision/blob/edfd5a7/references/detection/coco_eval.py 17 | The difference is that there is less copy-pasting from pycocotools 18 | in the end of the file, as python3 can suppress prints with contextlib 19 | """ 20 | import os 21 | import contextlib 22 | import copy 23 | import numpy as np 24 | import torch 25 | 26 | from pycocotools.cocoeval import COCOeval 27 | from pycocotools.coco import COCO 28 | import pycocotools.mask as mask_util 29 | 30 | from util.misc import all_gather 31 | 32 | 33 | class CocoEvaluator(object): 34 | def __init__(self, coco_gt, iou_types): 35 | assert isinstance(iou_types, (list, tuple)) 36 | coco_gt = copy.deepcopy(coco_gt) 37 | self.coco_gt = coco_gt 38 | 39 | self.iou_types = iou_types 40 | self.coco_eval = {} 41 | for iou_type in iou_types: 42 | self.coco_eval[iou_type] = COCOeval(coco_gt, iouType=iou_type) 43 | 44 | self.img_ids = [] 45 | self.eval_imgs = {k: [] for k in iou_types} 46 | 47 | def update(self, predictions): 48 | img_ids = list(np.unique(list(predictions.keys()))) 49 | self.img_ids.extend(img_ids) 50 | 51 | for iou_type in self.iou_types: 52 | results = self.prepare(predictions, iou_type) 53 | 54 | # suppress pycocotools prints 55 | with open(os.devnull, 'w') as devnull: 56 | with contextlib.redirect_stdout(devnull): 57 | coco_dt = COCO.loadRes(self.coco_gt, results) if results else COCO() 58 | coco_eval = self.coco_eval[iou_type] 59 | 60 | coco_eval.cocoDt = coco_dt 61 | coco_eval.params.imgIds = list(img_ids) 62 | img_ids, eval_imgs = evaluate(coco_eval) 63 | 64 | self.eval_imgs[iou_type].append(eval_imgs) 65 | 66 | def synchronize_between_processes(self): 67 | for iou_type in self.iou_types: 68 | self.eval_imgs[iou_type] = np.concatenate(self.eval_imgs[iou_type], 2) 69 | create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type]) 70 | 71 | def accumulate(self): 72 | for coco_eval in self.coco_eval.values(): 73 | coco_eval.accumulate() 74 | 75 | def summarize(self): 76 | for iou_type, coco_eval in self.coco_eval.items(): 77 | print("IoU metric: {}".format(iou_type)) 78 | coco_eval.summarize() 79 | 80 | def prepare(self, predictions, iou_type): 81 | if iou_type == "bbox": 82 | return self.prepare_for_coco_detection(predictions) 83 | elif iou_type == "segm": 84 | return self.prepare_for_coco_segmentation(predictions) 85 | elif iou_type == "keypoints": 86 | return self.prepare_for_coco_keypoint(predictions) 87 | else: 88 | raise ValueError("Unknown iou type 
{}".format(iou_type)) 89 | 90 | def prepare_for_coco_detection(self, predictions): 91 | coco_results = [] 92 | for original_id, prediction in predictions.items(): 93 | if len(prediction) == 0: 94 | continue 95 | 96 | boxes = prediction["boxes"] 97 | boxes = convert_to_xywh(boxes).tolist() 98 | scores = prediction["scores"].tolist() 99 | labels = prediction["labels"].tolist() 100 | 101 | coco_results.extend( 102 | [ 103 | { 104 | "image_id": original_id, 105 | "category_id": labels[k], 106 | "bbox": box, 107 | "score": scores[k], 108 | } 109 | for k, box in enumerate(boxes) 110 | ] 111 | ) 112 | return coco_results 113 | 114 | def prepare_for_coco_segmentation(self, predictions): 115 | coco_results = [] 116 | for original_id, prediction in predictions.items(): 117 | if len(prediction) == 0: 118 | continue 119 | 120 | scores = prediction["scores"] 121 | labels = prediction["labels"] 122 | masks = prediction["masks"] 123 | 124 | masks = masks > 0.5 125 | 126 | scores = prediction["scores"].tolist() 127 | labels = prediction["labels"].tolist() 128 | 129 | rles = [ 130 | mask_util.encode(np.array(mask[0, :, :, np.newaxis], dtype=np.uint8, order="F"))[0] 131 | for mask in masks 132 | ] 133 | for rle in rles: 134 | rle["counts"] = rle["counts"].decode("utf-8") 135 | 136 | coco_results.extend( 137 | [ 138 | { 139 | "image_id": original_id, 140 | "category_id": labels[k], 141 | "segmentation": rle, 142 | "score": scores[k], 143 | } 144 | for k, rle in enumerate(rles) 145 | ] 146 | ) 147 | return coco_results 148 | 149 | def prepare_for_coco_keypoint(self, predictions): 150 | coco_results = [] 151 | for original_id, prediction in predictions.items(): 152 | if len(prediction) == 0: 153 | continue 154 | 155 | boxes = prediction["boxes"] 156 | boxes = convert_to_xywh(boxes).tolist() 157 | scores = prediction["scores"].tolist() 158 | labels = prediction["labels"].tolist() 159 | keypoints = prediction["keypoints"] 160 | keypoints = keypoints.flatten(start_dim=1).tolist() 161 | 162 | coco_results.extend( 163 | [ 164 | { 165 | "image_id": original_id, 166 | "category_id": labels[k], 167 | 'keypoints': keypoint, 168 | "score": scores[k], 169 | } 170 | for k, keypoint in enumerate(keypoints) 171 | ] 172 | ) 173 | return coco_results 174 | 175 | 176 | def convert_to_xywh(boxes): 177 | xmin, ymin, xmax, ymax = boxes.unbind(1) 178 | return torch.stack((xmin, ymin, xmax - xmin, ymax - ymin), dim=1) 179 | 180 | 181 | def merge(img_ids, eval_imgs): 182 | all_img_ids = all_gather(img_ids) 183 | all_eval_imgs = all_gather(eval_imgs) 184 | 185 | merged_img_ids = [] 186 | for p in all_img_ids: 187 | merged_img_ids.extend(p) 188 | 189 | merged_eval_imgs = [] 190 | for p in all_eval_imgs: 191 | merged_eval_imgs.append(p) 192 | 193 | merged_img_ids = np.array(merged_img_ids) 194 | merged_eval_imgs = np.concatenate(merged_eval_imgs, 2) 195 | 196 | # keep only unique (and in sorted order) images 197 | merged_img_ids, idx = np.unique(merged_img_ids, return_index=True) 198 | merged_eval_imgs = merged_eval_imgs[..., idx] 199 | 200 | return merged_img_ids, merged_eval_imgs 201 | 202 | 203 | def create_common_coco_eval(coco_eval, img_ids, eval_imgs): 204 | img_ids, eval_imgs = merge(img_ids, eval_imgs) 205 | img_ids = list(img_ids) 206 | eval_imgs = list(eval_imgs.flatten()) 207 | 208 | coco_eval.evalImgs = eval_imgs 209 | coco_eval.params.imgIds = img_ids 210 | coco_eval._paramsEval = copy.deepcopy(coco_eval.params) 211 | 212 | 213 | ################################################################# 214 | # From pycocotools, 
just removed the prints and fixed 215 | # a Python3 bug about unicode not defined 216 | ################################################################# 217 | 218 | 219 | def evaluate(self): 220 | ''' 221 | Run per image evaluation on given images and store results (a list of dict) in self.evalImgs 222 | :return: None 223 | ''' 224 | # tic = time.time() 225 | # print('Running per image evaluation...') 226 | p = self.params 227 | # add backward compatibility if useSegm is specified in params 228 | if p.useSegm is not None: 229 | p.iouType = 'segm' if p.useSegm == 1 else 'bbox' 230 | print('useSegm (deprecated) is not None. Running {} evaluation'.format(p.iouType)) 231 | # print('Evaluate annotation type *{}*'.format(p.iouType)) 232 | p.imgIds = list(np.unique(p.imgIds)) 233 | if p.useCats: 234 | p.catIds = list(np.unique(p.catIds)) 235 | p.maxDets = sorted(p.maxDets) 236 | self.params = p 237 | 238 | self._prepare() 239 | # loop through images, area range, max detection number 240 | catIds = p.catIds if p.useCats else [-1] 241 | 242 | if p.iouType == 'segm' or p.iouType == 'bbox': 243 | computeIoU = self.computeIoU 244 | elif p.iouType == 'keypoints': 245 | computeIoU = self.computeOks 246 | self.ious = { 247 | (imgId, catId): computeIoU(imgId, catId) 248 | for imgId in p.imgIds 249 | for catId in catIds} 250 | 251 | evaluateImg = self.evaluateImg 252 | maxDet = p.maxDets[-1] 253 | evalImgs = [ 254 | evaluateImg(imgId, catId, areaRng, maxDet) 255 | for catId in catIds 256 | for areaRng in p.areaRng 257 | for imgId in p.imgIds 258 | ] 259 | # this is NOT in the pycocotools code, but could be done outside 260 | evalImgs = np.asarray(evalImgs).reshape(len(catIds), len(p.areaRng), len(p.imgIds)) 261 | self._paramsEval = copy.deepcopy(self.params) 262 | # toc = time.time() 263 | # print('DONE (t={:0.2f}s).'.format(toc-tic)) 264 | return p.imgIds, evalImgs 265 | 266 | ################################################################# 267 | # end of straight copy from pycocotools, just removing the prints 268 | ################################################################# 269 | -------------------------------------------------------------------------------- /datasets/coco_panoptic.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Modified from Deformable DETR (https://github.com/fundamentalvision/Deformable-DETR) 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache-2.0 License. 5 | # ------------------------------------------------------------------------ 6 | # Modified from DETR (https://github.com/facebookresearch/detr) 7 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 8 | # Licensed under the Apache-2.0 License. 9 | # ------------------------------------------------------------------------ 10 | # Modifications Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
11 | # SPDX-License-Identifier: Apache-2.0 12 | 13 | import json 14 | from pathlib import Path 15 | 16 | import numpy as np 17 | import torch 18 | from PIL import Image 19 | 20 | from panopticapi.utils import rgb2id 21 | from util.box_ops import masks_to_boxes 22 | 23 | from .coco import make_coco_transforms 24 | 25 | 26 | class CocoPanoptic: 27 | def __init__(self, img_folder, ann_folder, ann_file, transforms=None, return_masks=True): 28 | with open(ann_file, 'r') as f: 29 | self.coco = json.load(f) 30 | 31 | # sort 'images' field so that they are aligned with 'annotations' 32 | # i.e., in alphabetical order 33 | self.coco['images'] = sorted(self.coco['images'], key=lambda x: x['id']) 34 | # sanity check 35 | if "annotations" in self.coco: 36 | for img, ann in zip(self.coco['images'], self.coco['annotations']): 37 | assert img['file_name'][:-4] == ann['file_name'][:-4] 38 | 39 | self.img_folder = img_folder 40 | self.ann_folder = ann_folder 41 | self.ann_file = ann_file 42 | self.transforms = transforms 43 | self.return_masks = return_masks 44 | 45 | def __getitem__(self, idx): 46 | ann_info = self.coco['annotations'][idx] if "annotations" in self.coco else self.coco['images'][idx] 47 | img_path = Path(self.img_folder) / ann_info['file_name'].replace('.png', '.jpg') 48 | ann_path = Path(self.ann_folder) / ann_info['file_name'] 49 | 50 | img = Image.open(img_path).convert('RGB') 51 | w, h = img.size 52 | if "segments_info" in ann_info: 53 | masks = np.asarray(Image.open(ann_path), dtype=np.uint32) 54 | masks = rgb2id(masks) 55 | 56 | ids = np.array([ann['id'] for ann in ann_info['segments_info']]) 57 | masks = masks == ids[:, None, None] 58 | 59 | masks = torch.as_tensor(masks, dtype=torch.uint8) 60 | labels = torch.tensor([ann['category_id'] for ann in ann_info['segments_info']], dtype=torch.int64) 61 | 62 | target = {} 63 | target['image_id'] = torch.tensor([ann_info['image_id'] if "image_id" in ann_info else ann_info["id"]]) 64 | if self.return_masks: 65 | target['masks'] = masks 66 | target['labels'] = labels 67 | 68 | target["boxes"] = masks_to_boxes(masks) 69 | 70 | target['size'] = torch.as_tensor([int(h), int(w)]) 71 | target['orig_size'] = torch.as_tensor([int(h), int(w)]) 72 | if "segments_info" in ann_info: 73 | for name in ['iscrowd', 'area']: 74 | target[name] = torch.tensor([ann[name] for ann in ann_info['segments_info']]) 75 | 76 | if self.transforms is not None: 77 | img, target = self.transforms(img, target) 78 | 79 | return img, target 80 | 81 | def __len__(self): 82 | return len(self.coco['images']) 83 | 84 | def get_height_and_width(self, idx): 85 | img_info = self.coco['images'][idx] 86 | height = img_info['height'] 87 | width = img_info['width'] 88 | return height, width 89 | 90 | 91 | def build(image_set, args): 92 | img_folder_root = Path(args.coco_path) 93 | ann_folder_root = Path(args.coco_panoptic_path) 94 | assert img_folder_root.exists(), f'provided COCO path {img_folder_root} does not exist' 95 | assert ann_folder_root.exists(), f'provided COCO path {ann_folder_root} does not exist' 96 | mode = 'panoptic' 97 | PATHS = { 98 | "train": ("train2017", Path("annotations") / f'{mode}_train2017.json'), 99 | "val": ("val2017", Path("annotations") / f'{mode}_val2017.json'), 100 | } 101 | 102 | img_folder, ann_file = PATHS[image_set] 103 | img_folder_path = img_folder_root / img_folder 104 | ann_folder = ann_folder_root / f'{mode}_{img_folder}' 105 | ann_file = ann_folder_root / ann_file 106 | 107 | dataset = CocoPanoptic(img_folder_path, ann_folder, ann_file, 108 | 
transforms=make_coco_transforms(image_set), return_masks=args.masks) 109 | 110 | return dataset 111 | -------------------------------------------------------------------------------- /datasets/data_prefetcher.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------ 6 | # Modifications Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 7 | # SPDX-License-Identifier: Apache-2.0 8 | 9 | import torch 10 | 11 | def to_cuda(samples, targets, device): 12 | samples = samples.to(device, non_blocking=True) 13 | targets = [{k: v.to(device, non_blocking=True) for k, v in t.items()} for t in targets] 14 | return samples, targets 15 | 16 | 17 | def to_cuda_semi(samples_q, targets_q, records_q, samples_k, targets_k, records_k, indicators, labeltypes, device): 18 | samples_q = samples_q.to(device, non_blocking=True) 19 | samples_k = samples_k.to(device, non_blocking=True) 20 | targets_q = [{k: v.to(device, non_blocking=True) for k, v in t.items()} for t in targets_q] 21 | targets_k = [{k: v.to(device, non_blocking=True) for k, v in t.items()} for t in targets_k] 22 | return samples_q, targets_q, records_q, samples_k, targets_k, records_k, indicators, labeltypes 23 | 24 | class data_prefetcher(): 25 | def __init__(self, loader, device, prefetch=True): 26 | self.loader = iter(loader) 27 | self.prefetch = prefetch 28 | self.device = device 29 | if prefetch: 30 | self.stream = torch.cuda.Stream() 31 | self.preload() 32 | 33 | def preload(self): 34 | try: 35 | self.next_samples, self.next_targets = next(self.loader) 36 | except StopIteration: 37 | self.next_samples = None 38 | self.next_targets = None 39 | return 40 | # if record_stream() doesn't work, another option is to make sure device inputs are created 41 | # on the main stream. 42 | # self.next_input_gpu = torch.empty_like(self.next_input, device='cuda') 43 | # self.next_target_gpu = torch.empty_like(self.next_target, device='cuda') 44 | # Need to make sure the memory allocated for next_* is not still in use by the main stream 45 | # at the time we start copying to next_*: 46 | # self.stream.wait_stream(torch.cuda.current_stream()) 47 | with torch.cuda.stream(self.stream): 48 | self.next_samples, self.next_targets = to_cuda(self.next_samples, self.next_targets, self.device) 49 | # more code for the alternative if record_stream() doesn't work: 50 | # copy_ will record the use of the pinned source tensor in this side stream. 51 | # self.next_input_gpu.copy_(self.next_input, non_blocking=True) 52 | # self.next_target_gpu.copy_(self.next_target, non_blocking=True) 53 | # self.next_input = self.next_input_gpu 54 | # self.next_target = self.next_target_gpu 55 | 56 | # With Amp, it isn't necessary to manually convert data to half. 
57 | # if args.fp16: 58 | # self.next_input = self.next_input.half() 59 | # else: 60 | 61 | def next(self): 62 | if self.prefetch: 63 | torch.cuda.current_stream().wait_stream(self.stream) 64 | samples = self.next_samples 65 | targets = self.next_targets 66 | if samples is not None: 67 | samples.record_stream(torch.cuda.current_stream()) 68 | if targets is not None: 69 | for t in targets: 70 | for k, v in t.items(): 71 | v.record_stream(torch.cuda.current_stream()) 72 | self.preload() 73 | else: 74 | try: 75 | samples, targets = next(self.loader) 76 | samples, targets = to_cuda(samples, targets, self.device) 77 | except StopIteration: 78 | samples = None 79 | targets = None 80 | return samples, targets 81 | 82 | 83 | 84 | class data_prefetcher_semi(): 85 | def __init__(self, loader, device, prefetch=True): 86 | self.loader = iter(loader) 87 | self.prefetch = prefetch 88 | self.device = device 89 | if prefetch: 90 | self.stream = torch.cuda.Stream() 91 | self.preload() 92 | 93 | def preload(self): 94 | try: 95 | self.next_samples_q, self.next_targets_q, self.next_records_q, self.next_samples_k, self.next_targets_k, self.next_records_k, self.next_indicators, self.next_labeltypes = next(self.loader) 96 | except StopIteration: 97 | self.next_samples_q = None 98 | self.next_targets_q = None 99 | self.next_samples_k = None 100 | self.next_targets_k = None 101 | self.next_records_q = None 102 | self.next_records_k = None 103 | self.next_indicators = None 104 | self.next_labeltypes = None 105 | return 106 | # if record_stream() doesn't work, another option is to make sure device inputs are created 107 | # on the main stream. 108 | # self.next_input_gpu = torch.empty_like(self.next_input, device='cuda') 109 | # self.next_target_gpu = torch.empty_like(self.next_target, device='cuda') 110 | # Need to make sure the memory allocated for next_* is not still in use by the main stream 111 | # at the time we start copying to next_*: 112 | # self.stream.wait_stream(torch.cuda.current_stream()) 113 | with torch.cuda.stream(self.stream): 114 | self.next_samples_q, self.next_targets_q, self.next_records_q, self.next_samples_k, self.next_targets_k, self.next_records_k, self.next_indicators, self.next_labeltypes = to_cuda_semi(self.next_samples_q, self.next_targets_q, self.next_records_q, self.next_samples_k, self.next_targets_k, self.next_records_k, self.next_indicators, self.next_labeltypes, self.device) 115 | # more code for the alternative if record_stream() doesn't work: 116 | # copy_ will record the use of the pinned source tensor in this side stream. 117 | # self.next_input_gpu.copy_(self.next_input, non_blocking=True) 118 | # self.next_target_gpu.copy_(self.next_target, non_blocking=True) 119 | # self.next_input = self.next_input_gpu 120 | # self.next_target = self.next_target_gpu 121 | 122 | # With Amp, it isn't necessary to manually convert data to half. 
123 | # if args.fp16: 124 | # self.next_input = self.next_input.half() 125 | # else: 126 | 127 | def next(self): 128 | if self.prefetch: 129 | torch.cuda.current_stream().wait_stream(self.stream) 130 | samples_q = self.next_samples_q 131 | targets_q = self.next_targets_q 132 | records_q = self.next_records_q 133 | samples_k = self.next_samples_k 134 | targets_k = self.next_targets_k 135 | records_k = self.next_records_k 136 | indicators = self.next_indicators 137 | labeltypes = self.next_labeltypes 138 | if samples_q is not None: 139 | samples_q.record_stream(torch.cuda.current_stream()) 140 | if samples_k is not None: 141 | samples_k.record_stream(torch.cuda.current_stream()) 142 | if targets_q is not None: 143 | for t in targets_q: 144 | for k, v in t.items(): 145 | v.record_stream(torch.cuda.current_stream()) 146 | if targets_k is not None: 147 | for t in targets_k: 148 | for k, v in t.items(): 149 | v.record_stream(torch.cuda.current_stream()) 150 | self.preload() 151 | else: 152 | try: 153 | samples_q, targets_q, records_q, samples_k, targets_k, records_k, indicators, labeltypes = next(self.loader) 154 | samples_q, targets_q, records_q, samples_k, targets_k, records_k, indicators, labeltypes = to_cuda_semi(samples_q, targets_q, records_q, samples_k, targets_k, records_k, indicators, labeltypes, self.device) 155 | except StopIteration: 156 | samples_q = None 157 | targets_q = None 158 | samples_k = None 159 | targets_k = None 160 | indicators = None 161 | labeltypes = None 162 | return samples_q, targets_q, records_q, samples_k, targets_k, records_k, indicators, labeltypes -------------------------------------------------------------------------------- /datasets/panoptic_eval.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------ 6 | # Modified from DETR (https://github.com/facebookresearch/detr) 7 | # Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved 8 | # ------------------------------------------------------------------------ 9 | 10 | import json 11 | import os 12 | 13 | import util.misc as utils 14 | 15 | try: 16 | from panopticapi.evaluation import pq_compute 17 | except ImportError: 18 | pass 19 | 20 | 21 | class PanopticEvaluator(object): 22 | def __init__(self, ann_file, ann_folder, output_dir="panoptic_eval"): 23 | self.gt_json = ann_file 24 | self.gt_folder = ann_folder 25 | if utils.is_main_process(): 26 | if not os.path.exists(output_dir): 27 | os.mkdir(output_dir) 28 | self.output_dir = output_dir 29 | self.predictions = [] 30 | 31 | def update(self, predictions): 32 | for p in predictions: 33 | with open(os.path.join(self.output_dir, p["file_name"]), "wb") as f: 34 | f.write(p.pop("png_string")) 35 | 36 | self.predictions += predictions 37 | 38 | def synchronize_between_processes(self): 39 | all_predictions = utils.all_gather(self.predictions) 40 | merged_predictions = [] 41 | for p in all_predictions: 42 | merged_predictions += p 43 | self.predictions = merged_predictions 44 | 45 | def summarize(self): 46 | if utils.is_main_process(): 47 | json_data = {"annotations": self.predictions} 48 | predictions_json = os.path.join(self.output_dir, "predictions.json") 49 | with open(predictions_json, "w") as f: 50 | f.write(json.dumps(json_data)) 51 | return pq_compute(self.gt_json, predictions_json, gt_folder=self.gt_folder, pred_folder=self.output_dir) 52 | return None 53 | -------------------------------------------------------------------------------- /datasets/samplers.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------ 6 | # Modified from codes in torch.utils.data.distributed 7 | # ------------------------------------------------------------------------ 8 | 9 | import os 10 | import math 11 | import torch 12 | import torch.distributed as dist 13 | from torch.utils.data.sampler import Sampler 14 | 15 | 16 | class DistributedSampler(Sampler): 17 | """Sampler that restricts data loading to a subset of the dataset. 18 | It is especially useful in conjunction with 19 | :class:`torch.nn.parallel.DistributedDataParallel`. In such case, each 20 | process can pass a DistributedSampler instance as a DataLoader sampler, 21 | and load a subset of the original dataset that is exclusive to it. 22 | .. note:: 23 | Dataset is assumed to be of constant size. 24 | Arguments: 25 | dataset: Dataset used for sampling. 26 | num_replicas (optional): Number of processes participating in 27 | distributed training. 28 | rank (optional): Rank of the current process within num_replicas. 
29 | """ 30 | 31 | def __init__(self, dataset, num_replicas=None, rank=None, local_rank=None, local_size=None, shuffle=True): 32 | if num_replicas is None: 33 | if not dist.is_available(): 34 | raise RuntimeError("Requires distributed package to be available") 35 | num_replicas = dist.get_world_size() 36 | if rank is None: 37 | if not dist.is_available(): 38 | raise RuntimeError("Requires distributed package to be available") 39 | rank = dist.get_rank() 40 | self.dataset = dataset 41 | self.num_replicas = num_replicas 42 | self.rank = rank 43 | self.epoch = 0 44 | self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / self.num_replicas)) 45 | self.total_size = self.num_samples * self.num_replicas 46 | self.shuffle = shuffle 47 | 48 | def __iter__(self): 49 | if self.shuffle: 50 | # deterministically shuffle based on epoch 51 | g = torch.Generator() 52 | g.manual_seed(self.epoch) 53 | indices = torch.randperm(len(self.dataset), generator=g).tolist() 54 | else: 55 | indices = torch.arange(len(self.dataset)).tolist() 56 | 57 | # add extra samples to make it evenly divisible 58 | indices += indices[: (self.total_size - len(indices))] 59 | assert len(indices) == self.total_size 60 | 61 | # subsample 62 | offset = self.num_samples * self.rank 63 | indices = indices[offset : offset + self.num_samples] 64 | assert len(indices) == self.num_samples 65 | 66 | return iter(indices) 67 | 68 | def __len__(self): 69 | return self.num_samples 70 | 71 | def set_epoch(self, epoch): 72 | self.epoch = epoch 73 | 74 | 75 | class NodeDistributedSampler(Sampler): 76 | """Sampler that restricts data loading to a subset of the dataset. 77 | It is especially useful in conjunction with 78 | :class:`torch.nn.parallel.DistributedDataParallel`. In such case, each 79 | process can pass a DistributedSampler instance as a DataLoader sampler, 80 | and load a subset of the original dataset that is exclusive to it. 81 | .. note:: 82 | Dataset is assumed to be of constant size. 83 | Arguments: 84 | dataset: Dataset used for sampling. 85 | num_replicas (optional): Number of processes participating in 86 | distributed training. 87 | rank (optional): Rank of the current process within num_replicas. 
88 | """ 89 | 90 | def __init__(self, dataset, num_replicas=None, rank=None, local_rank=None, local_size=None, shuffle=True): 91 | if num_replicas is None: 92 | if not dist.is_available(): 93 | raise RuntimeError("Requires distributed package to be available") 94 | num_replicas = dist.get_world_size() 95 | if rank is None: 96 | if not dist.is_available(): 97 | raise RuntimeError("Requires distributed package to be available") 98 | rank = dist.get_rank() 99 | if local_rank is None: 100 | local_rank = int(os.environ.get('LOCAL_RANK', 0)) 101 | if local_size is None: 102 | local_size = int(os.environ.get('LOCAL_SIZE', 1)) 103 | self.dataset = dataset 104 | self.shuffle = shuffle 105 | self.num_replicas = num_replicas 106 | self.num_parts = local_size 107 | self.rank = rank 108 | self.local_rank = local_rank 109 | self.epoch = 0 110 | self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / self.num_replicas)) 111 | self.total_size = self.num_samples * self.num_replicas 112 | 113 | self.total_size_parts = self.num_samples * self.num_replicas // self.num_parts 114 | 115 | def __iter__(self): 116 | if self.shuffle: 117 | # deterministically shuffle based on epoch 118 | g = torch.Generator() 119 | g.manual_seed(self.epoch) 120 | indices = torch.randperm(len(self.dataset), generator=g).tolist() 121 | else: 122 | indices = torch.arange(len(self.dataset)).tolist() 123 | indices = [i for i in indices if i % self.num_parts == self.local_rank] 124 | 125 | # add extra samples to make it evenly divisible 126 | indices += indices[:(self.total_size_parts - len(indices))] 127 | assert len(indices) == self.total_size_parts 128 | 129 | # subsample 130 | indices = indices[self.rank // self.num_parts:self.total_size_parts:self.num_replicas // self.num_parts] 131 | assert len(indices) == self.num_samples 132 | 133 | return iter(indices) 134 | 135 | def __len__(self): 136 | return self.num_samples 137 | 138 | def set_epoch(self, epoch): 139 | self.epoch = epoch 140 | -------------------------------------------------------------------------------- /datasets/torchvision_datasets/__init__.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------ 6 | 7 | from .coco import CocoDetection 8 | from .coco import CocoDetection_semi 9 | -------------------------------------------------------------------------------- /datasets/torchvision_datasets/coco.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 
4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------ 6 | # Modified from torchvision 7 | # ------------------------------------------------------------------------ 8 | 9 | """ 10 | Copy-Paste from torchvision, but add utility of caching images on memory 11 | """ 12 | from torchvision.datasets.vision import VisionDataset 13 | from PIL import Image 14 | import os 15 | import os.path 16 | import tqdm 17 | from io import BytesIO 18 | 19 | 20 | class CocoDetection(VisionDataset): 21 | """`MS Coco Detection `_ Dataset. 22 | Args: 23 | root (string): Root directory where images are downloaded to. 24 | annFile (string): Path to json annotation file. 25 | transform (callable, optional): A function/transform that takes in an PIL image 26 | and returns a transformed version. E.g, ``transforms.ToTensor`` 27 | target_transform (callable, optional): A function/transform that takes in the 28 | target and transforms it. 29 | transforms (callable, optional): A function/transform that takes input sample and its target as entry 30 | and returns a transformed version. 31 | """ 32 | 33 | def __init__(self, root, annFile, transform=None, target_transform=None, transforms=None, 34 | cache_mode=False, local_rank=0, local_size=1): 35 | super(CocoDetection, self).__init__(root, transforms, transform, target_transform) 36 | from pycocotools.coco import COCO 37 | self.coco = COCO(annFile) 38 | self.ids = list(sorted(self.coco.imgs.keys())) 39 | self.cache_mode = cache_mode 40 | self.local_rank = local_rank 41 | self.local_size = local_size 42 | if cache_mode: 43 | self.cache = {} 44 | self.cache_images() 45 | 46 | def cache_images(self): 47 | self.cache = {} 48 | for index, img_id in zip(tqdm.trange(len(self.ids)), self.ids): 49 | if index % self.local_size != self.local_rank: 50 | continue 51 | path = self.coco.loadImgs(img_id)[0]['file_name'] 52 | with open(os.path.join(self.root, path), 'rb') as f: 53 | self.cache[path] = f.read() 54 | 55 | def get_image(self, path): 56 | if self.cache_mode: 57 | if path not in self.cache.keys(): 58 | with open(os.path.join(self.root, path), 'rb') as f: 59 | self.cache[path] = f.read() 60 | return Image.open(BytesIO(self.cache[path])).convert('RGB') 61 | return Image.open(os.path.join(self.root, path)).convert('RGB') 62 | 63 | def __getitem__(self, index): 64 | """ 65 | Args: 66 | index (int): Index 67 | Returns: 68 | tuple: Tuple (image, target). target is the object returned by ``coco.loadAnns``. 69 | """ 70 | coco = self.coco 71 | img_id = self.ids[index] 72 | ann_ids = coco.getAnnIds(imgIds=img_id) 73 | target = coco.loadAnns(ann_ids) 74 | 75 | path = coco.loadImgs(img_id)[0]['file_name'] 76 | 77 | img = self.get_image(path) 78 | if self.transforms is not None: 79 | img, target = self.transforms(img, target) 80 | 81 | return img, target 82 | 83 | def __len__(self): 84 | return len(self.ids) 85 | 86 | 87 | class CocoDetection_semi(VisionDataset): 88 | """`MS Coco Detection `_ Dataset. 89 | Args: 90 | root (string): Root directory where images are downloaded to. 91 | annFile (string): Path to json annotation file. 92 | transform (callable, optional): A function/transform that takes in an PIL image 93 | and returns a transformed version. E.g, ``transforms.ToTensor`` 94 | target_transform (callable, optional): A function/transform that takes in the 95 | target and transforms it. 
96 | transforms (callable, optional): A function/transform that takes input sample and its target as entry 97 | and returns a transformed version. 98 | """ 99 | 100 | def __init__(self, root, annFile, transform=None, target_transform=None, transforms=None, 101 | cache_mode=False, local_rank=0, local_size=1): 102 | super(CocoDetection_semi, self).__init__(root, transforms, transform, target_transform) 103 | from pycocotools.coco import COCO 104 | self.coco = COCO(annFile) 105 | self.ids = list(sorted(self.coco.imgs.keys())) 106 | self.cache_mode = cache_mode 107 | self.local_rank = local_rank 108 | self.local_size = local_size 109 | if cache_mode: 110 | self.cache = {} 111 | self.cache_images() 112 | 113 | def cache_images(self): 114 | self.cache = {} 115 | for index, img_id in zip(tqdm.trange(len(self.ids)), self.ids): 116 | if index % self.local_size != self.local_rank: 117 | continue 118 | path = self.coco.loadImgs(img_id)[0]['file_name'] 119 | with open(os.path.join(self.root, path), 'rb') as f: 120 | self.cache[path] = f.read() 121 | 122 | def get_image(self, path): 123 | if self.cache_mode: 124 | if path not in self.cache.keys(): 125 | with open(os.path.join(self.root, path), 'rb') as f: 126 | self.cache[path] = f.read() 127 | return Image.open(BytesIO(self.cache[path])).convert('RGB') 128 | return Image.open(os.path.join(self.root, path)).convert('RGB') 129 | 130 | def __getitem__(self, index): 131 | """ 132 | Args: 133 | index (int): Index 134 | Returns: 135 | tuple: Tuple (image, target). target is the object returned by ``coco.loadAnns``. 136 | """ 137 | coco = self.coco 138 | img_id = self.ids[index] 139 | ann_ids = coco.getAnnIds(imgIds=img_id) 140 | target = coco.loadAnns(ann_ids) 141 | 142 | path = coco.loadImgs(img_id)[0]['file_name'] 143 | indicator = coco.loadImgs(img_id)[0]['indicator'] 144 | labeltype = coco.loadImgs(img_id)[0]['label_type'] 145 | 146 | img = self.get_image(path) 147 | if self.transforms is not None: 148 | img, target = self.transforms(img, target) 149 | 150 | return img, target, indicator, labeltype 151 | 152 | def __len__(self): 153 | return len(self.ids) -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Modified from Deformable DETR (https://github.com/fundamentalvision/Deformable-DETR) 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache-2.0 License. 5 | # ------------------------------------------------------------------------ 6 | # Modified from DETR (https://github.com/facebookresearch/detr) 7 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 8 | # Licensed under the Apache-2.0 License. 9 | # ------------------------------------------------------------------------ 10 | # Modifications Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
11 | # SPDX-License-Identifier: Apache-2.0 12 | 13 | from .deformable_detr import build 14 | from .deformable_detr import build_semi 15 | 16 | def build_model(args): 17 | return build(args) 18 | 19 | def build_model_semi(args): 20 | return build_semi(args) -------------------------------------------------------------------------------- /models/backbone.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------ 6 | # Modified from DETR (https://github.com/facebookresearch/detr) 7 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 8 | # ------------------------------------------------------------------------ 9 | 10 | """ 11 | Backbone modules. 12 | """ 13 | from collections import OrderedDict 14 | 15 | import torch 16 | import torch.nn.functional as F 17 | import torchvision 18 | from torch import nn 19 | from torchvision.models._utils import IntermediateLayerGetter 20 | from typing import Dict, List 21 | 22 | from util.misc import NestedTensor, is_main_process 23 | 24 | from .position_encoding import build_position_encoding 25 | 26 | 27 | class FrozenBatchNorm2d(torch.nn.Module): 28 | """ 29 | BatchNorm2d where the batch statistics and the affine parameters are fixed. 30 | 31 | Copy-paste from torchvision.misc.ops with added eps before rqsrt, 32 | without which any other models than torchvision.models.resnet[18,34,50,101] 33 | produce nans. 34 | """ 35 | 36 | def __init__(self, n, eps=1e-5): 37 | super(FrozenBatchNorm2d, self).__init__() 38 | self.register_buffer("weight", torch.ones(n)) 39 | self.register_buffer("bias", torch.zeros(n)) 40 | self.register_buffer("running_mean", torch.zeros(n)) 41 | self.register_buffer("running_var", torch.ones(n)) 42 | self.eps = eps 43 | 44 | def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict, 45 | missing_keys, unexpected_keys, error_msgs): 46 | num_batches_tracked_key = prefix + 'num_batches_tracked' 47 | if num_batches_tracked_key in state_dict: 48 | del state_dict[num_batches_tracked_key] 49 | 50 | super(FrozenBatchNorm2d, self)._load_from_state_dict( 51 | state_dict, prefix, local_metadata, strict, 52 | missing_keys, unexpected_keys, error_msgs) 53 | 54 | def forward(self, x): 55 | # move reshapes to the beginning 56 | # to make it fuser-friendly 57 | w = self.weight.reshape(1, -1, 1, 1) 58 | b = self.bias.reshape(1, -1, 1, 1) 59 | rv = self.running_var.reshape(1, -1, 1, 1) 60 | rm = self.running_mean.reshape(1, -1, 1, 1) 61 | eps = self.eps 62 | scale = w * (rv + eps).rsqrt() 63 | bias = b - rm * scale 64 | return x * scale + bias 65 | 66 | 67 | class BackboneBase(nn.Module): 68 | 69 | def __init__(self, backbone: nn.Module, train_backbone: bool, return_interm_layers: bool): 70 | super().__init__() 71 | for name, parameter in backbone.named_parameters(): 72 | if not train_backbone or 'layer2' not in name and 'layer3' not in name and 'layer4' not in name: 73 | parameter.requires_grad_(False) 74 | if return_interm_layers: 75 | # return_layers = {"layer1": "0", "layer2": "1", "layer3": "2", "layer4": "3"} 76 | return_layers = {"layer2": "0", "layer3": "1", "layer4": "2"} 77 | self.strides = [8, 16, 32] 78 | self.num_channels = [512, 1024, 2048] 79 | else: 80 | 
return_layers = {'layer4': "0"} 81 | self.strides = [32] 82 | self.num_channels = [2048] 83 | self.body = IntermediateLayerGetter(backbone, return_layers=return_layers) 84 | 85 | def forward(self, tensor_list: NestedTensor): 86 | xs = self.body(tensor_list.tensors) 87 | out: Dict[str, NestedTensor] = {} 88 | for name, x in xs.items(): 89 | m = tensor_list.mask 90 | assert m is not None 91 | mask = F.interpolate(m[None].float(), size=x.shape[-2:]).to(torch.bool)[0] 92 | out[name] = NestedTensor(x, mask) 93 | return out 94 | 95 | 96 | class Backbone(BackboneBase): 97 | """ResNet backbone with frozen BatchNorm.""" 98 | def __init__(self, name: str, 99 | train_backbone: bool, 100 | return_interm_layers: bool, 101 | dilation: bool): 102 | norm_layer = FrozenBatchNorm2d 103 | backbone = getattr(torchvision.models, name)( 104 | replace_stride_with_dilation=[False, False, dilation], 105 | pretrained=is_main_process(), norm_layer=norm_layer) 106 | assert name not in ('resnet18', 'resnet34'), "number of channels are hard coded" 107 | 108 | super().__init__(backbone, train_backbone, return_interm_layers) 109 | if dilation: 110 | self.strides[-1] = self.strides[-1] // 2 111 | 112 | 113 | class Joiner(nn.Sequential): 114 | def __init__(self, backbone, position_embedding): 115 | super().__init__(backbone, position_embedding) 116 | self.strides = backbone.strides 117 | self.num_channels = backbone.num_channels 118 | 119 | def forward(self, tensor_list: NestedTensor): 120 | xs = self[0](tensor_list) 121 | out: List[NestedTensor] = [] 122 | pos = [] 123 | for name, x in sorted(xs.items()): 124 | out.append(x) 125 | 126 | # position encoding 127 | for x in out: 128 | pos.append(self[1](x).to(x.tensors.dtype)) 129 | 130 | return out, pos 131 | 132 | 133 | def build_backbone(args): 134 | position_embedding = build_position_encoding(args) 135 | train_backbone = args.lr_backbone > 0 136 | return_interm_layers = args.masks or (args.num_feature_levels > 1) 137 | backbone = Backbone(args.backbone, train_backbone, return_interm_layers, args.dilation) 138 | model = Joiner(backbone, position_embedding) 139 | return model 140 | -------------------------------------------------------------------------------- /models/ops/functions/__init__.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------------------------------ 6 | # Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 7 | # ------------------------------------------------------------------------------------------------ 8 | 9 | from .ms_deform_attn_func import MSDeformAttnFunction 10 | 11 | -------------------------------------------------------------------------------- /models/ops/functions/ms_deform_attn_func.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 
4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------------------------------ 6 | # Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 7 | # ------------------------------------------------------------------------------------------------ 8 | 9 | from __future__ import absolute_import 10 | from __future__ import print_function 11 | from __future__ import division 12 | 13 | import torch 14 | import torch.nn.functional as F 15 | from torch.autograd import Function 16 | from torch.autograd.function import once_differentiable 17 | 18 | import MultiScaleDeformableAttention as MSDA 19 | 20 | 21 | class MSDeformAttnFunction(Function): 22 | @staticmethod 23 | def forward(ctx, value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, im2col_step): 24 | ctx.im2col_step = im2col_step 25 | output = MSDA.ms_deform_attn_forward( 26 | value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, ctx.im2col_step) 27 | ctx.save_for_backward(value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights) 28 | return output 29 | 30 | @staticmethod 31 | @once_differentiable 32 | def backward(ctx, grad_output): 33 | value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights = ctx.saved_tensors 34 | grad_value, grad_sampling_loc, grad_attn_weight = \ 35 | MSDA.ms_deform_attn_backward( 36 | value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, grad_output, ctx.im2col_step) 37 | 38 | return grad_value, None, None, grad_sampling_loc, grad_attn_weight, None 39 | 40 | 41 | def ms_deform_attn_core_pytorch(value, value_spatial_shapes, sampling_locations, attention_weights): 42 | # for debug and test only, 43 | # need to use cuda version instead 44 | N_, S_, M_, D_ = value.shape 45 | _, Lq_, M_, L_, P_, _ = sampling_locations.shape 46 | value_list = value.split([H_ * W_ for H_, W_ in value_spatial_shapes], dim=1) 47 | sampling_grids = 2 * sampling_locations - 1 48 | sampling_value_list = [] 49 | for lid_, (H_, W_) in enumerate(value_spatial_shapes): 50 | # N_, H_*W_, M_, D_ -> N_, H_*W_, M_*D_ -> N_, M_*D_, H_*W_ -> N_*M_, D_, H_, W_ 51 | value_l_ = value_list[lid_].flatten(2).transpose(1, 2).reshape(N_*M_, D_, H_, W_) 52 | # N_, Lq_, M_, P_, 2 -> N_, M_, Lq_, P_, 2 -> N_*M_, Lq_, P_, 2 53 | sampling_grid_l_ = sampling_grids[:, :, :, lid_].transpose(1, 2).flatten(0, 1) 54 | # N_*M_, D_, Lq_, P_ 55 | sampling_value_l_ = F.grid_sample(value_l_, sampling_grid_l_, 56 | mode='bilinear', padding_mode='zeros', align_corners=False) 57 | sampling_value_list.append(sampling_value_l_) 58 | # (N_, Lq_, M_, L_, P_) -> (N_, M_, Lq_, L_, P_) -> (N_, M_, 1, Lq_, L_*P_) 59 | attention_weights = attention_weights.transpose(1, 2).reshape(N_*M_, 1, Lq_, L_*P_) 60 | output = (torch.stack(sampling_value_list, dim=-2).flatten(-2) * attention_weights).sum(-1).view(N_, M_*D_, Lq_) 61 | return output.transpose(1, 2).contiguous() 62 | -------------------------------------------------------------------------------- /models/ops/make.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | # ------------------------------------------------------------------------------------------------ 3 | # Deformable DETR 4 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 
5 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 6 | # ------------------------------------------------------------------------------------------------ 7 | # Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 8 | # ------------------------------------------------------------------------------------------------ 9 | 10 | python setup.py build install 11 | -------------------------------------------------------------------------------- /models/ops/modules/__init__.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------------------------------ 6 | # Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 7 | # ------------------------------------------------------------------------------------------------ 8 | 9 | from .ms_deform_attn import MSDeformAttn 10 | -------------------------------------------------------------------------------- /models/ops/modules/ms_deform_attn.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------------------------------ 6 | # Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 7 | # ------------------------------------------------------------------------------------------------ 8 | 9 | from __future__ import absolute_import 10 | from __future__ import print_function 11 | from __future__ import division 12 | 13 | import warnings 14 | import math 15 | 16 | import torch 17 | from torch import nn 18 | import torch.nn.functional as F 19 | from torch.nn.init import xavier_uniform_, constant_ 20 | 21 | from ..functions import MSDeformAttnFunction 22 | 23 | 24 | def _is_power_of_2(n): 25 | if (not isinstance(n, int)) or (n < 0): 26 | raise ValueError("invalid input for _is_power_of_2: {} (type: {})".format(n, type(n))) 27 | return (n & (n-1) == 0) and n != 0 28 | 29 | 30 | class MSDeformAttn(nn.Module): 31 | def __init__(self, d_model=256, n_levels=4, n_heads=8, n_points=4): 32 | """ 33 | Multi-Scale Deformable Attention Module 34 | :param d_model hidden dimension 35 | :param n_levels number of feature levels 36 | :param n_heads number of attention heads 37 | :param n_points number of sampling points per attention head per feature level 38 | """ 39 | super().__init__() 40 | if d_model % n_heads != 0: 41 | raise ValueError('d_model must be divisible by n_heads, but got {} and {}'.format(d_model, n_heads)) 42 | _d_per_head = d_model // n_heads 43 | # you'd better set _d_per_head to a power of 2 which is more efficient in our CUDA implementation 44 | if not _is_power_of_2(_d_per_head): 45 | warnings.warn("You'd better set d_model in MSDeformAttn to make the dimension of each attention head a power of 2 " 46 | "which is more efficient in our CUDA implementation.") 47 | 48 | self.im2col_step = 
64 49 | 50 | self.d_model = d_model 51 | self.n_levels = n_levels 52 | self.n_heads = n_heads 53 | self.n_points = n_points 54 | 55 | self.sampling_offsets = nn.Linear(d_model, n_heads * n_levels * n_points * 2) 56 | self.attention_weights = nn.Linear(d_model, n_heads * n_levels * n_points) 57 | self.value_proj = nn.Linear(d_model, d_model) 58 | self.output_proj = nn.Linear(d_model, d_model) 59 | 60 | self._reset_parameters() 61 | 62 | def _reset_parameters(self): 63 | constant_(self.sampling_offsets.weight.data, 0.) 64 | thetas = torch.arange(self.n_heads, dtype=torch.float32) * (2.0 * math.pi / self.n_heads) 65 | grid_init = torch.stack([thetas.cos(), thetas.sin()], -1) 66 | grid_init = (grid_init / grid_init.abs().max(-1, keepdim=True)[0]).view(self.n_heads, 1, 1, 2).repeat(1, self.n_levels, self.n_points, 1) 67 | for i in range(self.n_points): 68 | grid_init[:, :, i, :] *= i + 1 69 | with torch.no_grad(): 70 | self.sampling_offsets.bias = nn.Parameter(grid_init.view(-1)) 71 | constant_(self.attention_weights.weight.data, 0.) 72 | constant_(self.attention_weights.bias.data, 0.) 73 | xavier_uniform_(self.value_proj.weight.data) 74 | constant_(self.value_proj.bias.data, 0.) 75 | xavier_uniform_(self.output_proj.weight.data) 76 | constant_(self.output_proj.bias.data, 0.) 77 | 78 | def forward(self, query, reference_points, input_flatten, input_spatial_shapes, input_level_start_index, input_padding_mask=None): 79 | """ 80 | :param query (N, Length_{query}, C) 81 | :param reference_points (N, Length_{query}, n_levels, 2), range in [0, 1], top-left (0,0), bottom-right (1, 1), including padding area 82 | or (N, Length_{query}, n_levels, 4), add additional (w, h) to form reference boxes 83 | :param input_flatten (N, \sum_{l=0}^{L-1} H_l \cdot W_l, C) 84 | :param input_spatial_shapes (n_levels, 2), [(H_0, W_0), (H_1, W_1), ..., (H_{L-1}, W_{L-1})] 85 | :param input_level_start_index (n_levels, ), [0, H_0*W_0, H_0*W_0+H_1*W_1, H_0*W_0+H_1*W_1+H_2*W_2, ..., H_0*W_0+H_1*W_1+...+H_{L-1}*W_{L-1}] 86 | :param input_padding_mask (N, \sum_{l=0}^{L-1} H_l \cdot W_l), True for padding elements, False for non-padding elements 87 | 88 | :return output (N, Length_{query}, C) 89 | """ 90 | N, Len_q, _ = query.shape 91 | N, Len_in, _ = input_flatten.shape 92 | assert (input_spatial_shapes[:, 0] * input_spatial_shapes[:, 1]).sum() == Len_in 93 | 94 | value = self.value_proj(input_flatten) 95 | if input_padding_mask is not None: 96 | value = value.masked_fill(input_padding_mask[..., None], float(0)) 97 | value = value.view(N, Len_in, self.n_heads, self.d_model // self.n_heads) 98 | sampling_offsets = self.sampling_offsets(query).view(N, Len_q, self.n_heads, self.n_levels, self.n_points, 2) 99 | attention_weights = self.attention_weights(query).view(N, Len_q, self.n_heads, self.n_levels * self.n_points) 100 | attention_weights = F.softmax(attention_weights, -1).view(N, Len_q, self.n_heads, self.n_levels, self.n_points) 101 | # N, Len_q, n_heads, n_levels, n_points, 2 102 | if reference_points.shape[-1] == 2: 103 | offset_normalizer = torch.stack([input_spatial_shapes[..., 1], input_spatial_shapes[..., 0]], -1) 104 | sampling_locations = reference_points[:, :, None, :, None, :] \ 105 | + sampling_offsets / offset_normalizer[None, None, None, :, None, :] 106 | elif reference_points.shape[-1] == 4: 107 | sampling_locations = reference_points[:, :, None, :, None, :2] \ 108 | + sampling_offsets / self.n_points * reference_points[:, :, None, :, None, 2:] * 0.5 109 | else: 110 | raise ValueError( 111 | 'Last dim of 
reference_points must be 2 or 4, but get {} instead.'.format(reference_points.shape[-1])) 112 | output = MSDeformAttnFunction.apply( 113 | value, input_spatial_shapes, input_level_start_index, sampling_locations, attention_weights, self.im2col_step) 114 | output = self.output_proj(output) 115 | return output 116 | -------------------------------------------------------------------------------- /models/ops/setup.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------------------------------ 6 | # Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 7 | # ------------------------------------------------------------------------------------------------ 8 | 9 | import os 10 | import glob 11 | 12 | import torch 13 | 14 | from torch.utils.cpp_extension import CUDA_HOME 15 | from torch.utils.cpp_extension import CppExtension 16 | from torch.utils.cpp_extension import CUDAExtension 17 | 18 | from setuptools import find_packages 19 | from setuptools import setup 20 | 21 | requirements = ["torch", "torchvision"] 22 | 23 | def get_extensions(): 24 | this_dir = os.path.dirname(os.path.abspath(__file__)) 25 | extensions_dir = os.path.join(this_dir, "src") 26 | 27 | main_file = glob.glob(os.path.join(extensions_dir, "*.cpp")) 28 | source_cpu = glob.glob(os.path.join(extensions_dir, "cpu", "*.cpp")) 29 | source_cuda = glob.glob(os.path.join(extensions_dir, "cuda", "*.cu")) 30 | 31 | sources = main_file + source_cpu 32 | extension = CppExtension 33 | extra_compile_args = {"cxx": []} 34 | define_macros = [] 35 | 36 | if torch.cuda.is_available() and CUDA_HOME is not None: 37 | extension = CUDAExtension 38 | sources += source_cuda 39 | define_macros += [("WITH_CUDA", None)] 40 | extra_compile_args["nvcc"] = [ 41 | "-DCUDA_HAS_FP16=1", 42 | "-D__CUDA_NO_HALF_OPERATORS__", 43 | "-D__CUDA_NO_HALF_CONVERSIONS__", 44 | "-D__CUDA_NO_HALF2_OPERATORS__", 45 | ] 46 | else: 47 | raise NotImplementedError('Cuda is not availabel') 48 | 49 | sources = [os.path.join(extensions_dir, s) for s in sources] 50 | include_dirs = [extensions_dir] 51 | ext_modules = [ 52 | extension( 53 | "MultiScaleDeformableAttention", 54 | sources, 55 | include_dirs=include_dirs, 56 | define_macros=define_macros, 57 | extra_compile_args=extra_compile_args, 58 | ) 59 | ] 60 | return ext_modules 61 | 62 | setup( 63 | name="MultiScaleDeformableAttention", 64 | version="1.0", 65 | author="Weijie Su", 66 | url="https://github.com/fundamentalvision/Deformable-DETR", 67 | description="PyTorch Wrapper for CUDA Functions of Multi-Scale Deformable Attention", 68 | packages=find_packages(exclude=("configs", "tests",)), 69 | ext_modules=get_extensions(), 70 | cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension}, 71 | ) 72 | -------------------------------------------------------------------------------- /models/ops/src/cpu/ms_deform_attn_cpu.cpp: -------------------------------------------------------------------------------- 1 | /*! 2 | ************************************************************************************************** 3 | * Deformable DETR 4 | * Copyright (c) 2020 SenseTime. All Rights Reserved. 
5 | * Licensed under the Apache License, Version 2.0 [see LICENSE for details] 6 | ************************************************************************************************** 7 | * Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 8 | ************************************************************************************************** 9 | */ 10 | 11 | #include 12 | 13 | #include 14 | #include 15 | 16 | 17 | at::Tensor 18 | ms_deform_attn_cpu_forward( 19 | const at::Tensor &value, 20 | const at::Tensor &spatial_shapes, 21 | const at::Tensor &level_start_index, 22 | const at::Tensor &sampling_loc, 23 | const at::Tensor &attn_weight, 24 | const int im2col_step) 25 | { 26 | AT_ERROR("Not implement on cpu"); 27 | } 28 | 29 | std::vector 30 | ms_deform_attn_cpu_backward( 31 | const at::Tensor &value, 32 | const at::Tensor &spatial_shapes, 33 | const at::Tensor &level_start_index, 34 | const at::Tensor &sampling_loc, 35 | const at::Tensor &attn_weight, 36 | const at::Tensor &grad_output, 37 | const int im2col_step) 38 | { 39 | AT_ERROR("Not implement on cpu"); 40 | } 41 | 42 | -------------------------------------------------------------------------------- /models/ops/src/cpu/ms_deform_attn_cpu.h: -------------------------------------------------------------------------------- 1 | /*! 2 | ************************************************************************************************** 3 | * Deformable DETR 4 | * Copyright (c) 2020 SenseTime. All Rights Reserved. 5 | * Licensed under the Apache License, Version 2.0 [see LICENSE for details] 6 | ************************************************************************************************** 7 | * Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 8 | ************************************************************************************************** 9 | */ 10 | 11 | #pragma once 12 | #include 13 | 14 | at::Tensor 15 | ms_deform_attn_cpu_forward( 16 | const at::Tensor &value, 17 | const at::Tensor &spatial_shapes, 18 | const at::Tensor &level_start_index, 19 | const at::Tensor &sampling_loc, 20 | const at::Tensor &attn_weight, 21 | const int im2col_step); 22 | 23 | std::vector 24 | ms_deform_attn_cpu_backward( 25 | const at::Tensor &value, 26 | const at::Tensor &spatial_shapes, 27 | const at::Tensor &level_start_index, 28 | const at::Tensor &sampling_loc, 29 | const at::Tensor &attn_weight, 30 | const at::Tensor &grad_output, 31 | const int im2col_step); 32 | 33 | 34 | -------------------------------------------------------------------------------- /models/ops/src/cuda/ms_deform_attn_cuda.cu: -------------------------------------------------------------------------------- 1 | /*! 2 | ************************************************************************************************** 3 | * Deformable DETR 4 | * Copyright (c) 2020 SenseTime. All Rights Reserved. 
5 | * Licensed under the Apache License, Version 2.0 [see LICENSE for details] 6 | ************************************************************************************************** 7 | * Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 8 | ************************************************************************************************** 9 | */ 10 | 11 | #include 12 | #include "cuda/ms_deform_im2col_cuda.cuh" 13 | 14 | #include 15 | #include 16 | #include 17 | #include 18 | 19 | 20 | at::Tensor ms_deform_attn_cuda_forward( 21 | const at::Tensor &value, 22 | const at::Tensor &spatial_shapes, 23 | const at::Tensor &level_start_index, 24 | const at::Tensor &sampling_loc, 25 | const at::Tensor &attn_weight, 26 | const int im2col_step) 27 | { 28 | AT_ASSERTM(value.is_contiguous(), "value tensor has to be contiguous"); 29 | AT_ASSERTM(spatial_shapes.is_contiguous(), "spatial_shapes tensor has to be contiguous"); 30 | AT_ASSERTM(level_start_index.is_contiguous(), "level_start_index tensor has to be contiguous"); 31 | AT_ASSERTM(sampling_loc.is_contiguous(), "sampling_loc tensor has to be contiguous"); 32 | AT_ASSERTM(attn_weight.is_contiguous(), "attn_weight tensor has to be contiguous"); 33 | 34 | AT_ASSERTM(value.type().is_cuda(), "value must be a CUDA tensor"); 35 | AT_ASSERTM(spatial_shapes.type().is_cuda(), "spatial_shapes must be a CUDA tensor"); 36 | AT_ASSERTM(level_start_index.type().is_cuda(), "level_start_index must be a CUDA tensor"); 37 | AT_ASSERTM(sampling_loc.type().is_cuda(), "sampling_loc must be a CUDA tensor"); 38 | AT_ASSERTM(attn_weight.type().is_cuda(), "attn_weight must be a CUDA tensor"); 39 | 40 | const int batch = value.size(0); 41 | const int spatial_size = value.size(1); 42 | const int num_heads = value.size(2); 43 | const int channels = value.size(3); 44 | 45 | const int num_levels = spatial_shapes.size(0); 46 | 47 | const int num_query = sampling_loc.size(1); 48 | const int num_point = sampling_loc.size(4); 49 | 50 | const int im2col_step_ = std::min(batch, im2col_step); 51 | 52 | AT_ASSERTM(batch % im2col_step_ == 0, "batch(%d) must divide im2col_step(%d)", batch, im2col_step_); 53 | 54 | auto output = at::zeros({batch, num_query, num_heads, channels}, value.options()); 55 | 56 | const int batch_n = im2col_step_; 57 | auto output_n = output.view({batch/im2col_step_, batch_n, num_query, num_heads, channels}); 58 | auto per_value_size = spatial_size * num_heads * channels; 59 | auto per_sample_loc_size = num_query * num_heads * num_levels * num_point * 2; 60 | auto per_attn_weight_size = num_query * num_heads * num_levels * num_point; 61 | for (int n = 0; n < batch/im2col_step_; ++n) 62 | { 63 | auto columns = output_n.select(0, n); 64 | AT_DISPATCH_FLOATING_TYPES(value.type(), "ms_deform_attn_forward_cuda", ([&] { 65 | ms_deformable_im2col_cuda(at::cuda::getCurrentCUDAStream(), 66 | value.data() + n * im2col_step_ * per_value_size, 67 | spatial_shapes.data(), 68 | level_start_index.data(), 69 | sampling_loc.data() + n * im2col_step_ * per_sample_loc_size, 70 | attn_weight.data() + n * im2col_step_ * per_attn_weight_size, 71 | batch_n, spatial_size, num_heads, channels, num_levels, num_query, num_point, 72 | columns.data()); 73 | 74 | })); 75 | } 76 | 77 | output = output.view({batch, num_query, num_heads*channels}); 78 | 79 | return output; 80 | } 81 | 82 | 83 | std::vector ms_deform_attn_cuda_backward( 84 | const at::Tensor &value, 85 | const at::Tensor &spatial_shapes, 86 | const at::Tensor &level_start_index, 87 | 
const at::Tensor &sampling_loc, 88 | const at::Tensor &attn_weight, 89 | const at::Tensor &grad_output, 90 | const int im2col_step) 91 | { 92 | 93 | AT_ASSERTM(value.is_contiguous(), "value tensor has to be contiguous"); 94 | AT_ASSERTM(spatial_shapes.is_contiguous(), "spatial_shapes tensor has to be contiguous"); 95 | AT_ASSERTM(level_start_index.is_contiguous(), "level_start_index tensor has to be contiguous"); 96 | AT_ASSERTM(sampling_loc.is_contiguous(), "sampling_loc tensor has to be contiguous"); 97 | AT_ASSERTM(attn_weight.is_contiguous(), "attn_weight tensor has to be contiguous"); 98 | AT_ASSERTM(grad_output.is_contiguous(), "grad_output tensor has to be contiguous"); 99 | 100 | AT_ASSERTM(value.type().is_cuda(), "value must be a CUDA tensor"); 101 | AT_ASSERTM(spatial_shapes.type().is_cuda(), "spatial_shapes must be a CUDA tensor"); 102 | AT_ASSERTM(level_start_index.type().is_cuda(), "level_start_index must be a CUDA tensor"); 103 | AT_ASSERTM(sampling_loc.type().is_cuda(), "sampling_loc must be a CUDA tensor"); 104 | AT_ASSERTM(attn_weight.type().is_cuda(), "attn_weight must be a CUDA tensor"); 105 | AT_ASSERTM(grad_output.type().is_cuda(), "grad_output must be a CUDA tensor"); 106 | 107 | const int batch = value.size(0); 108 | const int spatial_size = value.size(1); 109 | const int num_heads = value.size(2); 110 | const int channels = value.size(3); 111 | 112 | const int num_levels = spatial_shapes.size(0); 113 | 114 | const int num_query = sampling_loc.size(1); 115 | const int num_point = sampling_loc.size(4); 116 | 117 | const int im2col_step_ = std::min(batch, im2col_step); 118 | 119 | AT_ASSERTM(batch % im2col_step_ == 0, "batch(%d) must divide im2col_step(%d)", batch, im2col_step_); 120 | 121 | auto grad_value = at::zeros_like(value); 122 | auto grad_sampling_loc = at::zeros_like(sampling_loc); 123 | auto grad_attn_weight = at::zeros_like(attn_weight); 124 | 125 | const int batch_n = im2col_step_; 126 | auto per_value_size = spatial_size * num_heads * channels; 127 | auto per_sample_loc_size = num_query * num_heads * num_levels * num_point * 2; 128 | auto per_attn_weight_size = num_query * num_heads * num_levels * num_point; 129 | auto grad_output_n = grad_output.view({batch/im2col_step_, batch_n, num_query, num_heads, channels}); 130 | 131 | for (int n = 0; n < batch/im2col_step_; ++n) 132 | { 133 | auto grad_output_g = grad_output_n.select(0, n); 134 | AT_DISPATCH_FLOATING_TYPES(value.type(), "ms_deform_attn_backward_cuda", ([&] { 135 | ms_deformable_col2im_cuda(at::cuda::getCurrentCUDAStream(), 136 | grad_output_g.data(), 137 | value.data() + n * im2col_step_ * per_value_size, 138 | spatial_shapes.data(), 139 | level_start_index.data(), 140 | sampling_loc.data() + n * im2col_step_ * per_sample_loc_size, 141 | attn_weight.data() + n * im2col_step_ * per_attn_weight_size, 142 | batch_n, spatial_size, num_heads, channels, num_levels, num_query, num_point, 143 | grad_value.data() + n * im2col_step_ * per_value_size, 144 | grad_sampling_loc.data() + n * im2col_step_ * per_sample_loc_size, 145 | grad_attn_weight.data() + n * im2col_step_ * per_attn_weight_size); 146 | 147 | })); 148 | } 149 | 150 | return { 151 | grad_value, grad_sampling_loc, grad_attn_weight 152 | }; 153 | } -------------------------------------------------------------------------------- /models/ops/src/cuda/ms_deform_attn_cuda.h: -------------------------------------------------------------------------------- 1 | /*! 
2 | ************************************************************************************************** 3 | * Deformable DETR 4 | * Copyright (c) 2020 SenseTime. All Rights Reserved. 5 | * Licensed under the Apache License, Version 2.0 [see LICENSE for details] 6 | ************************************************************************************************** 7 | * Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 8 | ************************************************************************************************** 9 | */ 10 | 11 | #pragma once 12 | #include 13 | 14 | at::Tensor ms_deform_attn_cuda_forward( 15 | const at::Tensor &value, 16 | const at::Tensor &spatial_shapes, 17 | const at::Tensor &level_start_index, 18 | const at::Tensor &sampling_loc, 19 | const at::Tensor &attn_weight, 20 | const int im2col_step); 21 | 22 | std::vector ms_deform_attn_cuda_backward( 23 | const at::Tensor &value, 24 | const at::Tensor &spatial_shapes, 25 | const at::Tensor &level_start_index, 26 | const at::Tensor &sampling_loc, 27 | const at::Tensor &attn_weight, 28 | const at::Tensor &grad_output, 29 | const int im2col_step); 30 | 31 | -------------------------------------------------------------------------------- /models/ops/src/ms_deform_attn.h: -------------------------------------------------------------------------------- 1 | /*! 2 | ************************************************************************************************** 3 | * Deformable DETR 4 | * Copyright (c) 2020 SenseTime. All Rights Reserved. 5 | * Licensed under the Apache License, Version 2.0 [see LICENSE for details] 6 | ************************************************************************************************** 7 | * Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 8 | ************************************************************************************************** 9 | */ 10 | 11 | #pragma once 12 | 13 | #include "cpu/ms_deform_attn_cpu.h" 14 | 15 | #ifdef WITH_CUDA 16 | #include "cuda/ms_deform_attn_cuda.h" 17 | #endif 18 | 19 | 20 | at::Tensor 21 | ms_deform_attn_forward( 22 | const at::Tensor &value, 23 | const at::Tensor &spatial_shapes, 24 | const at::Tensor &level_start_index, 25 | const at::Tensor &sampling_loc, 26 | const at::Tensor &attn_weight, 27 | const int im2col_step) 28 | { 29 | if (value.type().is_cuda()) 30 | { 31 | #ifdef WITH_CUDA 32 | return ms_deform_attn_cuda_forward( 33 | value, spatial_shapes, level_start_index, sampling_loc, attn_weight, im2col_step); 34 | #else 35 | AT_ERROR("Not compiled with GPU support"); 36 | #endif 37 | } 38 | AT_ERROR("Not implemented on the CPU"); 39 | } 40 | 41 | std::vector 42 | ms_deform_attn_backward( 43 | const at::Tensor &value, 44 | const at::Tensor &spatial_shapes, 45 | const at::Tensor &level_start_index, 46 | const at::Tensor &sampling_loc, 47 | const at::Tensor &attn_weight, 48 | const at::Tensor &grad_output, 49 | const int im2col_step) 50 | { 51 | if (value.type().is_cuda()) 52 | { 53 | #ifdef WITH_CUDA 54 | return ms_deform_attn_cuda_backward( 55 | value, spatial_shapes, level_start_index, sampling_loc, attn_weight, grad_output, im2col_step); 56 | #else 57 | AT_ERROR("Not compiled with GPU support"); 58 | #endif 59 | } 60 | AT_ERROR("Not implemented on the CPU"); 61 | } 62 | 63 | -------------------------------------------------------------------------------- /models/ops/src/vision.cpp: 
-------------------------------------------------------------------------------- 1 | /*! 2 | ************************************************************************************************** 3 | * Deformable DETR 4 | * Copyright (c) 2020 SenseTime. All Rights Reserved. 5 | * Licensed under the Apache License, Version 2.0 [see LICENSE for details] 6 | ************************************************************************************************** 7 | * Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 8 | ************************************************************************************************** 9 | */ 10 | 11 | #include "ms_deform_attn.h" 12 | 13 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 14 | m.def("ms_deform_attn_forward", &ms_deform_attn_forward, "ms_deform_attn_forward"); 15 | m.def("ms_deform_attn_backward", &ms_deform_attn_backward, "ms_deform_attn_backward"); 16 | } 17 | -------------------------------------------------------------------------------- /models/ops/test.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------------------------------ 6 | # Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 7 | # ------------------------------------------------------------------------------------------------ 8 | 9 | from __future__ import absolute_import 10 | from __future__ import print_function 11 | from __future__ import division 12 | 13 | import time 14 | import torch 15 | import torch.nn as nn 16 | from torch.autograd import gradcheck 17 | 18 | from functions.ms_deform_attn_func import MSDeformAttnFunction, ms_deform_attn_core_pytorch 19 | 20 | 21 | N, M, D = 1, 2, 2 22 | Lq, L, P = 2, 2, 2 23 | shapes = torch.as_tensor([(6, 4), (3, 2)], dtype=torch.long).cuda() 24 | level_start_index = torch.cat((shapes.new_zeros((1, )), shapes.prod(1).cumsum(0)[:-1])) 25 | S = sum([(H*W).item() for H, W in shapes]) 26 | 27 | 28 | torch.manual_seed(3) 29 | 30 | 31 | @torch.no_grad() 32 | def check_forward_equal_with_pytorch_double(): 33 | value = torch.rand(N, S, M, D).cuda() * 0.01 34 | sampling_locations = torch.rand(N, Lq, M, L, P, 2).cuda() 35 | attention_weights = torch.rand(N, Lq, M, L, P).cuda() + 1e-5 36 | attention_weights /= attention_weights.sum(-1, keepdim=True).sum(-2, keepdim=True) 37 | im2col_step = 2 38 | output_pytorch = ms_deform_attn_core_pytorch(value.double(), shapes, sampling_locations.double(), attention_weights.double()).detach().cpu() 39 | output_cuda = MSDeformAttnFunction.apply(value.double(), shapes, level_start_index, sampling_locations.double(), attention_weights.double(), im2col_step).detach().cpu() 40 | fwdok = torch.allclose(output_cuda, output_pytorch) 41 | max_abs_err = (output_cuda - output_pytorch).abs().max() 42 | max_rel_err = ((output_cuda - output_pytorch).abs() / output_pytorch.abs()).max() 43 | 44 | print(f'* {fwdok} check_forward_equal_with_pytorch_double: max_abs_err {max_abs_err:.2e} max_rel_err {max_rel_err:.2e}') 45 | 46 | 47 | @torch.no_grad() 48 | def check_forward_equal_with_pytorch_float(): 49 | value = torch.rand(N, S, M, D).cuda() * 0.01 50 | sampling_locations = torch.rand(N, Lq, 
M, L, P, 2).cuda() 51 | attention_weights = torch.rand(N, Lq, M, L, P).cuda() + 1e-5 52 | attention_weights /= attention_weights.sum(-1, keepdim=True).sum(-2, keepdim=True) 53 | im2col_step = 2 54 | output_pytorch = ms_deform_attn_core_pytorch(value, shapes, sampling_locations, attention_weights).detach().cpu() 55 | output_cuda = MSDeformAttnFunction.apply(value, shapes, level_start_index, sampling_locations, attention_weights, im2col_step).detach().cpu() 56 | fwdok = torch.allclose(output_cuda, output_pytorch, rtol=1e-2, atol=1e-3) 57 | max_abs_err = (output_cuda - output_pytorch).abs().max() 58 | max_rel_err = ((output_cuda - output_pytorch).abs() / output_pytorch.abs()).max() 59 | 60 | print(f'* {fwdok} check_forward_equal_with_pytorch_float: max_abs_err {max_abs_err:.2e} max_rel_err {max_rel_err:.2e}') 61 | 62 | 63 | def check_gradient_numerical(channels=4, grad_value=True, grad_sampling_loc=True, grad_attn_weight=True): 64 | 65 | value = torch.rand(N, S, M, channels).cuda() * 0.01 66 | sampling_locations = torch.rand(N, Lq, M, L, P, 2).cuda() 67 | attention_weights = torch.rand(N, Lq, M, L, P).cuda() + 1e-5 68 | attention_weights /= attention_weights.sum(-1, keepdim=True).sum(-2, keepdim=True) 69 | im2col_step = 2 70 | func = MSDeformAttnFunction.apply 71 | 72 | value.requires_grad = grad_value 73 | sampling_locations.requires_grad = grad_sampling_loc 74 | attention_weights.requires_grad = grad_attn_weight 75 | 76 | gradok = gradcheck(func, (value.double(), shapes, level_start_index, sampling_locations.double(), attention_weights.double(), im2col_step)) 77 | 78 | print(f'* {gradok} check_gradient_numerical(D={channels})') 79 | 80 | 81 | if __name__ == '__main__': 82 | check_forward_equal_with_pytorch_double() 83 | check_forward_equal_with_pytorch_float() 84 | 85 | for channels in [30, 32, 64, 71, 1025, 2048, 3096]: 86 | check_gradient_numerical(channels, True, True, True) 87 | 88 | 89 | 90 | -------------------------------------------------------------------------------- /models/position_encoding.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------ 6 | # Modified from DETR (https://github.com/facebookresearch/detr) 7 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 8 | # ------------------------------------------------------------------------ 9 | 10 | """ 11 | Various positional encodings for the transformer. 12 | """ 13 | import math 14 | import torch 15 | from torch import nn 16 | 17 | from util.misc import NestedTensor 18 | 19 | 20 | class PositionEmbeddingSine(nn.Module): 21 | """ 22 | This is a more standard version of the position embedding, very similar to the one 23 | used by the Attention is all you need paper, generalized to work on images. 
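    Concretely (matching the forward pass below): with ``normalize=True`` the
    row/column cumulative positions are rescaled to roughly ``[0, scale]``
    (``scale = 2 * pi`` by default), and channel ``i`` of each per-axis embedding
    is ``sin(pos / T ** (2 * (i // 2) / d))`` for even ``i`` and
    ``cos(pos / T ** (2 * (i // 2) / d))`` for odd ``i``, where ``T`` is the
    temperature (10000 by default) and ``d`` is ``num_pos_feats``; the y- and
    x-axis embeddings are then concatenated along the channel dimension.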
24 | """ 25 | def __init__(self, num_pos_feats=64, temperature=10000, normalize=False, scale=None): 26 | super().__init__() 27 | self.num_pos_feats = num_pos_feats 28 | self.temperature = temperature 29 | self.normalize = normalize 30 | if scale is not None and normalize is False: 31 | raise ValueError("normalize should be True if scale is passed") 32 | if scale is None: 33 | scale = 2 * math.pi 34 | self.scale = scale 35 | 36 | def forward(self, tensor_list: NestedTensor): 37 | x = tensor_list.tensors 38 | mask = tensor_list.mask 39 | assert mask is not None 40 | not_mask = ~mask 41 | y_embed = not_mask.cumsum(1, dtype=torch.float32) 42 | x_embed = not_mask.cumsum(2, dtype=torch.float32) 43 | if self.normalize: 44 | eps = 1e-6 45 | y_embed = (y_embed - 0.5) / (y_embed[:, -1:, :] + eps) * self.scale 46 | x_embed = (x_embed - 0.5) / (x_embed[:, :, -1:] + eps) * self.scale 47 | 48 | dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device) 49 | dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats) 50 | 51 | pos_x = x_embed[:, :, :, None] / dim_t 52 | pos_y = y_embed[:, :, :, None] / dim_t 53 | pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3) 54 | pos_y = torch.stack((pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4).flatten(3) 55 | pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2) 56 | return pos 57 | 58 | 59 | class PositionEmbeddingLearned(nn.Module): 60 | """ 61 | Absolute pos embedding, learned. 62 | """ 63 | def __init__(self, num_pos_feats=256): 64 | super().__init__() 65 | self.row_embed = nn.Embedding(50, num_pos_feats) 66 | self.col_embed = nn.Embedding(50, num_pos_feats) 67 | self.reset_parameters() 68 | 69 | def reset_parameters(self): 70 | nn.init.uniform_(self.row_embed.weight) 71 | nn.init.uniform_(self.col_embed.weight) 72 | 73 | def forward(self, tensor_list: NestedTensor): 74 | x = tensor_list.tensors 75 | h, w = x.shape[-2:] 76 | i = torch.arange(w, device=x.device) 77 | j = torch.arange(h, device=x.device) 78 | x_emb = self.col_embed(i) 79 | y_emb = self.row_embed(j) 80 | pos = torch.cat([ 81 | x_emb.unsqueeze(0).repeat(h, 1, 1), 82 | y_emb.unsqueeze(1).repeat(1, w, 1), 83 | ], dim=-1).permute(2, 0, 1).unsqueeze(0).repeat(x.shape[0], 1, 1, 1) 84 | return pos 85 | 86 | 87 | def build_position_encoding(args): 88 | N_steps = args.hidden_dim // 2 89 | if args.position_embedding in ('v2', 'sine'): 90 | # TODO find a better way of exposing other arguments 91 | position_embedding = PositionEmbeddingSine(N_steps, normalize=True) 92 | elif args.position_embedding in ('v3', 'learned'): 93 | position_embedding = PositionEmbeddingLearned(N_steps) 94 | else: 95 | raise ValueError(f"not supported {args.position_embedding}") 96 | 97 | return position_embedding 98 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | pycocotools 2 | tqdm 3 | cython 4 | scipy 5 | -------------------------------------------------------------------------------- /scripts/Bees2COCO.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import os 5 | import json 6 | import xml.etree.ElementTree as ET 7 | import glob 8 | import cv2 9 | 10 | START_BOUNDING_BOX_ID = 1 11 | PRE_DEFINE_CATEGORIES = None 12 | 13 | def get(root, name): 14 | vars = root.findall(name) 15 | return vars 16 | 17 | 18 | def get_and_check(root, name, length): 19 | vars = root.findall(name) 20 | if len(vars) == 0: 21 | raise ValueError("Can not find %s in %s." % (name, root.tag)) 22 | if length > 0 and len(vars) != length: 23 | raise ValueError( 24 | "The size of %s is supposed to be %d, but is %d." 25 | % (name, length, len(vars)) 26 | ) 27 | if length == 1: 28 | vars = vars[0] 29 | return vars 30 | 31 | 32 | def get_filename_as_int(filename): 33 | try: 34 | filename = filename.replace("\\", "/") 35 | filename = os.path.splitext(os.path.basename(filename))[0] 36 | return int(filename) 37 | except: 38 | raise ValueError("Filename %s is supposed to be an integer." % (filename)) 39 | 40 | 41 | def get_categories(xml_files): 42 | """Generate category name to id mapping from a list of xml files. 43 | 44 | Arguments: 45 | xml_files {list} -- A list of xml file paths. 46 | 47 | Returns: 48 | dict -- category name to id mapping. 49 | """ 50 | classes_names = [] 51 | for xml_file in xml_files: 52 | tree = ET.parse(xml_file) 53 | root = tree.getroot() 54 | for member in root.findall("object"): 55 | classes_names.append(member[0].text) 56 | classes_names = list(set(classes_names)) 57 | classes_names.sort() 58 | return {name: i for i, name in enumerate(classes_names)} 59 | 60 | 61 | def convert(xml_files, json_file): 62 | json_dict = {"images": [], "type": "instances", "annotations": [], "categories": []} 63 | if PRE_DEFINE_CATEGORIES is not None: 64 | categories = PRE_DEFINE_CATEGORIES 65 | else: 66 | categories = {'bees': 0} 67 | bnd_id = START_BOUNDING_BOX_ID 68 | i_th = 0 69 | for xml_file in xml_files: 70 | tree = ET.parse(xml_file) 71 | root = tree.getroot() 72 | path = get(root, "path") 73 | if len(path) == 1: 74 | filename_original = os.path.basename(path[0].text) 75 | elif len(path) == 0: 76 | filename_original = get_and_check(root, "filename", 1).text 77 | else: 78 | raise ValueError("%d paths found in %s" % (len(path), xml_file)) 79 | ## The filename must be a number 80 | # image_id = get_filename_as_int(filename) 81 | filename = filename_original.split('_') 82 | filename = filename[-1] 83 | filename = filename.split('.') 84 | image_id = int(filename[0]) 85 | size = get_and_check(root, "size", 1) 86 | width = int(get_and_check(size, "width", 1).text) 87 | height = int(get_and_check(size, "height", 1).text) 88 | 89 | i_th = i_th + 1 90 | if i_th % 10 == 0: 91 | print(i_th) 92 | 93 | # we rescale the image if its size greater than 800, because in this dataset, the image is too big, for weak aug, we can't accept such big image because of memory issue 94 | if width > 600 or height > 600: 95 | if (width <= height and width == 600) or (height <= width and height == 600): 96 | oh = height 97 | ow = width 98 | if width < height: 99 | ow = 600 100 | oh = int(600 * height / width) 101 | else: 102 | oh = 600 103 | ow = int(600 * width / height) 104 | 105 | original_img = cv2.imread('../bees/ML-Data/' + filename_original) 106 | resized_img = cv2.resize(original_img, (ow, oh)) 107 | ratios = [float(ow)/float(width), float(oh)/float(height)] 108 | ratio_width, ratio_height = ratios 109 | cv2.imwrite('../bees/ML-Data/' + filename_original, resized_img) 110 | 111 | image = { 112 | "file_name": filename_original, 113 | 
"height": oh, 114 | "width": ow, 115 | "id": image_id, 116 | } 117 | json_dict["images"].append(image) 118 | ## Currently we do not support segmentation. 119 | # segmented = get_and_check(root, 'segmented', 1).text 120 | # assert segmented == '0' 121 | for obj in get(root, "object"): 122 | category_id = 0 123 | bndbox = get_and_check(obj, "bndbox", 1) 124 | xmin = int(get_and_check(bndbox, "xmin", 1).text) - 1 125 | # ymin = int(get_and_check(bndbox, "ymin", 1).text) - 1 126 | ymin = int(float(get_and_check(bndbox, "ymin", 1).text)) - 1 127 | xmax = int(get_and_check(bndbox, "xmax", 1).text) 128 | ymax = int(get_and_check(bndbox, "ymax", 1).text) 129 | assert xmax > xmin 130 | assert ymax > ymin 131 | xmin = xmin * ratio_width 132 | xmax = xmax * ratio_width 133 | ymin = ymin * ratio_height 134 | ymax = ymax * ratio_height 135 | o_width = abs(xmax - xmin) 136 | o_height = abs(ymax - ymin) 137 | 138 | ann = { 139 | "area": o_width * o_height, 140 | "iscrowd": 0, 141 | "image_id": image_id, 142 | "bbox": [xmin, ymin, o_width, o_height], 143 | "category_id": category_id, 144 | "id": bnd_id, 145 | "ignore": 0, 146 | "segmentation": [], 147 | } 148 | json_dict["annotations"].append(ann) 149 | bnd_id = bnd_id + 1 150 | else: 151 | image = { 152 | "file_name": filename_original, 153 | "height": height, 154 | "width": width, 155 | "id": image_id, 156 | } 157 | json_dict["images"].append(image) 158 | ## Currently we do not support segmentation. 159 | # segmented = get_and_check(root, 'segmented', 1).text 160 | # assert segmented == '0' 161 | for obj in get(root, "object"): 162 | category_id = 0 163 | bndbox = get_and_check(obj, "bndbox", 1) 164 | xmin = int(get_and_check(bndbox, "xmin", 1).text) - 1 165 | # ymin = int(get_and_check(bndbox, "ymin", 1).text) - 1 166 | ymin = int(float(get_and_check(bndbox, "ymin", 1).text)) - 1 167 | xmax = int(get_and_check(bndbox, "xmax", 1).text) 168 | ymax = int(get_and_check(bndbox, "ymax", 1).text) 169 | assert xmax > xmin 170 | assert ymax > ymin 171 | o_width = abs(xmax - xmin) 172 | o_height = abs(ymax - ymin) 173 | ann = { 174 | "area": o_width * o_height, 175 | "iscrowd": 0, 176 | "image_id": image_id, 177 | "bbox": [xmin, ymin, o_width, o_height], 178 | "category_id": category_id, 179 | "id": bnd_id, 180 | "ignore": 0, 181 | "segmentation": [], 182 | } 183 | json_dict["annotations"].append(ann) 184 | bnd_id = bnd_id + 1 185 | 186 | cat = {"supercategory": "none", "id": 0, "name": 'bees'} 187 | json_dict["categories"].append(cat) 188 | os.makedirs(os.path.dirname(json_file), exist_ok=True) 189 | json_fp = open(json_file, "w") 190 | json_str = json.dumps(json_dict) 191 | json_fp.write(json_str) 192 | json_fp.close() 193 | 194 | 195 | if __name__ == "__main__": 196 | import argparse 197 | 198 | parser = argparse.ArgumentParser( 199 | description="Convert Pascal VOC annotation to COCO format." 200 | ) 201 | 202 | parser.add_argument('--xml_dir', default="../bees/ML-Data/", 203 | help="Directory path to xml files.", type=str) 204 | parser.add_argument('--json_file', 205 | default="../bees/instances_bees.json", 206 | help="Output COCO format json file.", type=str) 207 | 208 | args = parser.parse_args() 209 | xml_files = glob.glob(os.path.join(args.xml_dir, "*.xml")) 210 | 211 | # If you want to do train/test split, you can pass a subset of xml files to convert function. 
212 | print("Number of xml files: {}".format(len(xml_files))) 213 | convert(xml_files, args.json_file) 214 | print("Success: {}".format(args.json_file)) 215 | -------------------------------------------------------------------------------- /scripts/IoU_extreme.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/amazon-science/omni-detr/4a2a8bac62778fa8ad5e05bb0627fedccd1464ea/scripts/IoU_extreme.npy -------------------------------------------------------------------------------- /scripts/VOC2COCO.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | #!/usr/bin/python 5 | 6 | # pip install lxml 7 | 8 | import os 9 | import json 10 | import xml.etree.ElementTree as ET 11 | import glob 12 | 13 | START_BOUNDING_BOX_ID = 1 14 | PRE_DEFINE_CATEGORIES = None 15 | 16 | 17 | # If necessary, pre-define category and its id 18 | # PRE_DEFINE_CATEGORIES = {"aeroplane": 1, "bicycle": 2, "bird": 3, "boat": 4, 19 | # "bottle":5, "bus": 6, "car": 7, "cat": 8, "chair": 9, 20 | # "cow": 10, "diningtable": 11, "dog": 12, "horse": 13, 21 | # "motorbike": 14, "person": 15, "pottedplant": 16, 22 | # "sheep": 17, "sofa": 18, "train": 19, "tvmonitor": 20} 23 | 24 | 25 | def get(root, name): 26 | vars = root.findall(name) 27 | return vars 28 | 29 | 30 | def get_and_check(root, name, length): 31 | vars = root.findall(name) 32 | if len(vars) == 0: 33 | raise ValueError("Can not find %s in %s." % (name, root.tag)) 34 | if length > 0 and len(vars) != length: 35 | raise ValueError( 36 | "The size of %s is supposed to be %d, but is %d." 37 | % (name, length, len(vars)) 38 | ) 39 | if length == 1: 40 | vars = vars[0] 41 | return vars 42 | 43 | 44 | def get_filename_as_int(filename): 45 | try: 46 | filename = filename.replace("\\", "/") 47 | filename = os.path.splitext(os.path.basename(filename))[0] 48 | return int(filename) 49 | except: 50 | raise ValueError("Filename %s is supposed to be an integer." % (filename)) 51 | 52 | 53 | def get_categories(xml_files): 54 | """Generate category name to id mapping from a list of xml files. 55 | 56 | Arguments: 57 | xml_files {list} -- A list of xml file paths. 58 | 59 | Returns: 60 | dict -- category name to id mapping. 
61 | """ 62 | classes_names = [] 63 | for xml_file in xml_files: 64 | tree = ET.parse(xml_file) 65 | root = tree.getroot() 66 | for member in root.findall("object"): 67 | classes_names.append(member[0].text) 68 | classes_names = list(set(classes_names)) 69 | classes_names.sort() 70 | return {name: i for i, name in enumerate(classes_names)} 71 | 72 | 73 | def convert(xml_files, json_file): 74 | json_dict = {"images": [], "type": "instances", "annotations": [], "categories": []} 75 | if PRE_DEFINE_CATEGORIES is not None: 76 | categories = PRE_DEFINE_CATEGORIES 77 | else: 78 | categories = get_categories(xml_files) 79 | bnd_id = START_BOUNDING_BOX_ID 80 | for xml_file in xml_files: 81 | tree = ET.parse(xml_file) 82 | root = tree.getroot() 83 | path = get(root, "path") 84 | if len(path) == 1: 85 | filename = os.path.basename(path[0].text) 86 | elif len(path) == 0: 87 | filename = get_and_check(root, "filename", 1).text 88 | else: 89 | raise ValueError("%d paths found in %s" % (len(path), xml_file)) 90 | ## The filename must be a number 91 | image_id = get_filename_as_int(filename) 92 | size = get_and_check(root, "size", 1) 93 | width = int(get_and_check(size, "width", 1).text) 94 | height = int(get_and_check(size, "height", 1).text) 95 | image = { 96 | "file_name": filename, 97 | "height": height, 98 | "width": width, 99 | "id": image_id, 100 | } 101 | json_dict["images"].append(image) 102 | ## Currently we do not support segmentation. 103 | # segmented = get_and_check(root, 'segmented', 1).text 104 | # assert segmented == '0' 105 | for obj in get(root, "object"): 106 | category = get_and_check(obj, "name", 1).text 107 | if category not in categories: 108 | new_id = len(categories) 109 | categories[category] = new_id 110 | category_id = categories[category] 111 | bndbox = get_and_check(obj, "bndbox", 1) 112 | xmin = int(get_and_check(bndbox, "xmin", 1).text) - 1 113 | # ymin = int(get_and_check(bndbox, "ymin", 1).text) - 1 114 | ymin = int(float(get_and_check(bndbox, "ymin", 1).text)) - 1 115 | xmax = int(get_and_check(bndbox, "xmax", 1).text) 116 | ymax = int(get_and_check(bndbox, "ymax", 1).text) 117 | assert xmax > xmin 118 | assert ymax > ymin 119 | o_width = abs(xmax - xmin) 120 | o_height = abs(ymax - ymin) 121 | ann = { 122 | "area": o_width * o_height, 123 | "iscrowd": 0, 124 | "image_id": image_id, 125 | "bbox": [xmin, ymin, o_width, o_height], 126 | "category_id": category_id, 127 | "id": bnd_id, 128 | "ignore": 0, 129 | "segmentation": [], 130 | } 131 | json_dict["annotations"].append(ann) 132 | bnd_id = bnd_id + 1 133 | 134 | for cate, cid in categories.items(): 135 | cat = {"supercategory": "none", "id": cid, "name": cate} 136 | json_dict["categories"].append(cat) 137 | 138 | os.makedirs(os.path.dirname(json_file), exist_ok=True) 139 | json_fp = open(json_file, "w") 140 | json_str = json.dumps(json_dict) 141 | json_fp.write(json_str) 142 | json_fp.close() 143 | 144 | 145 | if __name__ == "__main__": 146 | import argparse 147 | 148 | parser = argparse.ArgumentParser( 149 | description="Convert Pascal VOC annotation to COCO format." 
150 | ) 151 | parser.add_argument('--xml_dir', default="../voc/VOCdevkit/VOC2007trainval/Annotations", help="Directory path to xml files.", type=str) 152 | parser.add_argument('--json_file', default="../voc/VOCdevkit/VOC2007trainval/instances_VOC_trainval2007.json", help="Output COCO format json file.", type=str) 153 | args = parser.parse_args() 154 | xml_files = glob.glob(os.path.join(args.xml_dir, "*.xml")) 155 | 156 | # If you want to do train/test split, you can pass a subset of xml files to convert function. 157 | print("Number of xml files: {}".format(len(xml_files))) 158 | convert(xml_files, args.json_file) 159 | print("Success: {}".format(args.json_file)) 160 | -------------------------------------------------------------------------------- /scripts/add_indicator_to_coco2014.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import numpy as np 6 | from pycocotools.coco import COCO 7 | 8 | def main(): 9 | 10 | root_dir = '../coco/annotations/' 11 | json_file = root_dir + 'instances_valminusminival2014.json' 12 | coco_api = COCO(json_file) 13 | img_ids = sorted(coco_api.imgs.keys()) 14 | imgs = coco_api.loadImgs(img_ids) 15 | 16 | # add indicator 17 | for i_img in imgs: 18 | i_img['indicator'] = 1 19 | i_img['label_type'] = 'fully' 20 | dataset_anns = [coco_api.imgToAnns[i] for i in img_ids] 21 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 22 | 23 | ith = 0 24 | for i_ann in anns: 25 | mask_i = coco_api.annToMask(i_ann) 26 | # sample a point 27 | valid_idx = np.where(mask_i == 1) 28 | if np.sum(mask_i) > 0: 29 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 30 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 31 | sampled_point_i = [float(item) for item in sampled_point_i] 32 | i_ann['point'] = sampled_point_i 33 | else: 34 | if len(i_ann['bbox']) > 0: 35 | boxes = i_ann['bbox'] 36 | boxes = np.array(boxes) 37 | mask_i[int(boxes[1]):(int(boxes[1]) + int(boxes[3])), 38 | int(boxes[0]):(int(boxes[0]) + int(boxes[2]))] = 1 39 | valid_idx = np.where(mask_i == 1) 40 | if np.sum(mask_i) > 0: 41 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 42 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 43 | sampled_point_i = [float(item) for item in sampled_point_i] 44 | i_ann['point'] = sampled_point_i 45 | else: # at least one of the box size less than 1 pixel 46 | if int(boxes[2]) < 1: 47 | boxes[2] = boxes[2] + 1 48 | if int(boxes[3]) < 1: 49 | boxes[3] = boxes[3] + 1 50 | mask_i[int(boxes[1]):(int(boxes[1]) + int(boxes[3])), 51 | int(boxes[0]):(int(boxes[0]) + int(boxes[2]))] = 1 52 | valid_idx = np.where(mask_i == 1) 53 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 54 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 55 | sampled_point_i = [float(item) for item in sampled_point_i] 56 | i_ann['point'] = sampled_point_i 57 | else: 58 | i_ann['point'] = [] 59 | print(i_ann['bbox']) 60 | ith = ith + 1 61 | if ith % 10000 == 0: 62 | print(ith) 63 | 64 | data = {} 65 | data['images'] = imgs 66 | data['annotations'] = anns 67 | data['categories'] = list(coco_api.cats.values()) 68 | data['info'] = coco_api.dataset['info'] 69 | data['licenses'] = coco_api.dataset['licenses'] 70 | output_file_label = root_dir + 
'instances_valminusminival2014_w_indicator.json' 71 | 72 | ## save to json 73 | with open(output_file_label, 'w') as f: 74 | print('writing to json output:', output_file_label) 75 | json.dump(data, f, sort_keys=True) 76 | 77 | # unlabel part 78 | json_file = root_dir + 'instances_train2014.json' 79 | coco_api = COCO(json_file) 80 | img_ids = sorted(coco_api.imgs.keys()) 81 | imgs = coco_api.loadImgs(img_ids) 82 | 83 | # add indicator 84 | for i_img in imgs: 85 | i_img['indicator'] = 0 86 | i_img['label_type'] = 'tagsU' # need to assign correct string for different types of annotations, tagsU, pointsK, Unsup 87 | dataset_anns = [coco_api.imgToAnns[i] for i in img_ids] 88 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 89 | 90 | ith = 0 91 | for i_ann in anns: 92 | mask_i = coco_api.annToMask(i_ann) 93 | # sample a point 94 | valid_idx = np.where(mask_i == 1) 95 | if np.sum(mask_i) > 0: 96 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 97 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 98 | sampled_point_i = [float(item) for item in sampled_point_i] 99 | i_ann['point'] = sampled_point_i 100 | else: 101 | if len(i_ann['bbox']) > 0: 102 | boxes = i_ann['bbox'] 103 | boxes = np.array(boxes) 104 | mask_i[int(boxes[1]):(int(boxes[1]) + int(boxes[3])), 105 | int(boxes[0]):(int(boxes[0]) + int(boxes[2]))] = 1 106 | valid_idx = np.where(mask_i == 1) 107 | if np.sum(mask_i) > 0: 108 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 109 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 110 | sampled_point_i = [float(item) for item in sampled_point_i] 111 | i_ann['point'] = sampled_point_i 112 | else: # at least one of the box size less than 1 pixel 113 | if int(boxes[2]) < 1: 114 | boxes[2] = boxes[2] + 1 115 | if int(boxes[3]) < 1: 116 | boxes[3] = boxes[3] + 1 117 | mask_i[int(boxes[1]):(int(boxes[1]) + int(boxes[3])), 118 | int(boxes[0]):(int(boxes[0]) + int(boxes[2]))] = 1 119 | valid_idx = np.where(mask_i == 1) 120 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 121 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 122 | sampled_point_i = [float(item) for item in sampled_point_i] 123 | i_ann['point'] = sampled_point_i 124 | else: 125 | i_ann['point'] = [] 126 | print(i_ann['bbox']) 127 | ith = ith + 1 128 | if ith % 10000 == 0: 129 | print(ith) 130 | 131 | data = {} 132 | data['images'] = imgs 133 | data['annotations'] = anns 134 | data['categories'] = list(coco_api.cats.values()) 135 | data['info'] = coco_api.dataset['info'] 136 | data['licenses'] = coco_api.dataset['licenses'] 137 | output_file_label = root_dir + 'instances_train2014_w_indicator_tagsU.json' 138 | ## save to json 139 | with open(output_file_label, 'w') as f: 140 | print('writing to json output:', output_file_label) 141 | json.dump(data, f, sort_keys=True) 142 | 143 | # unlabel part 144 | json_file = root_dir + 'instances_train2014.json' 145 | coco_api = COCO(json_file) 146 | img_ids = sorted(coco_api.imgs.keys()) 147 | imgs = coco_api.loadImgs(img_ids) 148 | 149 | # add indicator 150 | for i_img in imgs: 151 | i_img['indicator'] = 0 152 | i_img['label_type'] = 'pointsK' # need to assign correct string for different types of annotations, tagsU, pointsK, Unsup 153 | dataset_anns = [coco_api.imgToAnns[i] for i in img_ids] 154 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 155 | 156 | ith = 0 157 | for i_ann in anns: 158 | mask_i = 
coco_api.annToMask(i_ann) 159 | # sample a point 160 | valid_idx = np.where(mask_i == 1) 161 | if np.sum(mask_i) > 0: 162 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 163 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 164 | sampled_point_i = [float(item) for item in sampled_point_i] 165 | i_ann['point'] = sampled_point_i 166 | else: 167 | if len(i_ann['bbox']) > 0: 168 | boxes = i_ann['bbox'] 169 | boxes = np.array(boxes) 170 | mask_i[int(boxes[1]):(int(boxes[1]) + int(boxes[3])), 171 | int(boxes[0]):(int(boxes[0]) + int(boxes[2]))] = 1 172 | valid_idx = np.where(mask_i == 1) 173 | if np.sum(mask_i) > 0: 174 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 175 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 176 | sampled_point_i = [float(item) for item in sampled_point_i] 177 | i_ann['point'] = sampled_point_i 178 | else: # at least one of the box size less than 1 pixel 179 | if int(boxes[2]) < 1: 180 | boxes[2] = boxes[2] + 1 181 | if int(boxes[3]) < 1: 182 | boxes[3] = boxes[3] + 1 183 | mask_i[int(boxes[1]):(int(boxes[1]) + int(boxes[3])), 184 | int(boxes[0]):(int(boxes[0]) + int(boxes[2]))] = 1 185 | valid_idx = np.where(mask_i == 1) 186 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 187 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 188 | sampled_point_i = [float(item) for item in sampled_point_i] 189 | i_ann['point'] = sampled_point_i 190 | else: 191 | i_ann['point'] = [] 192 | print(i_ann['bbox']) 193 | ith = ith + 1 194 | if ith % 10000 == 0: 195 | print(ith) 196 | 197 | data = {} 198 | data['images'] = imgs 199 | data['annotations'] = anns 200 | data['categories'] = list(coco_api.cats.values()) 201 | data['info'] = coco_api.dataset['info'] 202 | data['licenses'] = coco_api.dataset['licenses'] 203 | output_file_label = root_dir + 'instances_train2014_w_indicator_pointsK.json' 204 | ## save to json 205 | with open(output_file_label, 'w') as f: 206 | print('writing to json output:', output_file_label) 207 | json.dump(data, f, sort_keys=True) 208 | 209 | 210 | if __name__ == '__main__': 211 | main() 212 | print("finished!") -------------------------------------------------------------------------------- /scripts/add_indicator_to_coco2017_val.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
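# Note: the script below repeats the point-sampling scheme of add_indicator_to_coco2014.py
# above: a point is drawn uniformly from the instance mask, falling back to the (padded)
# bounding box when the mask is empty. A minimal standalone sketch of that fallback logic,
# assuming a binary mask array and a COCO [x, y, w, h] box (hypothetical helper, not part
# of this repo):

import numpy as np

def sample_point(mask, bbox):
    """Return one [x, y] drawn from the instance mask, else from the box region."""
    mask = np.asarray(mask)
    if mask.sum() == 0 and len(bbox) == 4:
        x, y, w, h = (int(v) for v in bbox)
        mask = mask.copy()
        mask[y:y + max(h, 1), x:x + max(w, 1)] = 1  # pad degenerate boxes to at least 1 px
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return []                                   # no usable box and an empty mask
    k = np.random.randint(ys.size)
    return [float(xs[k]), float(ys[k])]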
2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import numpy as np 6 | from pycocotools.coco import COCO 7 | 8 | def main(): 9 | 10 | # we also add point annotation to the validation set, although not used in the code 11 | root_dir = '../coco/annotations/' 12 | data_set = 'instances_val2017' 13 | json_file = root_dir + '{}.json'.format(data_set) 14 | coco_api = COCO(json_file) 15 | img_ids = sorted(coco_api.imgs.keys()) 16 | imgs = coco_api.loadImgs(img_ids) 17 | 18 | # add indicator 19 | for i_img in imgs: 20 | i_img['indicator'] = 0 21 | i_img['label_type'] = 'fully' 22 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in img_ids] 23 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 24 | 25 | ith = 0 26 | for i_ann in anns: 27 | mask_i = coco_api.annToMask(i_ann) 28 | # sample a point 29 | valid_idx = np.where(mask_i == 1) 30 | if np.sum(mask_i) > 0: 31 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 32 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 33 | sampled_point_i = [float(item) for item in sampled_point_i] 34 | i_ann['point'] = sampled_point_i 35 | else: 36 | if len(i_ann['bbox']) > 0: 37 | boxes = i_ann['bbox'] 38 | boxes = np.array(boxes) 39 | mask_i[int(boxes[1]):(int(boxes[1]) + int(boxes[3])), 40 | int(boxes[0]):(int(boxes[0]) + int(boxes[2]))] = 1 41 | valid_idx = np.where(mask_i == 1) 42 | if np.sum(mask_i) > 0: 43 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 44 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 45 | sampled_point_i = [float(item) for item in sampled_point_i] 46 | i_ann['point'] = sampled_point_i 47 | else: # at least one of the box size less than 1 pixel 48 | if int(boxes[2]) < 1: 49 | boxes[2] = boxes[2] + 1 50 | if int(boxes[3]) < 1: 51 | boxes[3] = boxes[3] + 1 52 | mask_i[int(boxes[1]):(int(boxes[1]) + int(boxes[3])), 53 | int(boxes[0]):(int(boxes[0]) + int(boxes[2]))] = 1 54 | valid_idx = np.where(mask_i == 1) 55 | sampled_idx = np.random.choice(np.arange(np.size(valid_idx[0])), 1) 56 | sampled_point_i = [valid_idx[1][sampled_idx][0], valid_idx[0][sampled_idx][0]] 57 | sampled_point_i = [float(item) for item in sampled_point_i] 58 | i_ann['point'] = sampled_point_i 59 | else: 60 | i_ann['point'] = [] 61 | print(i_ann['bbox']) 62 | ith = ith + 1 63 | if ith % 10000 == 0: 64 | print(ith) 65 | 66 | sample_data = {} 67 | sample_data['images'] = imgs 68 | sample_data['annotations'] = anns 69 | sample_data['categories'] = list(coco_api.cats.values()) 70 | sample_data['info'] = coco_api.dataset['info'] 71 | sample_data['licenses'] = coco_api.dataset['licenses'] 72 | 73 | output_file_label = root_dir + 'instances_w_indicator_val2017.json' 74 | ## save to json 75 | with open(output_file_label, 'w') as f: 76 | print('writing to json output:', output_file_label) 77 | json.dump(sample_data, f, sort_keys=True) 78 | 79 | if __name__ == '__main__': 80 | main() 81 | print("finished!") 82 | -------------------------------------------------------------------------------- /scripts/add_indicator_to_objects365val.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
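# Unlike the COCO scripts above, the script below has no segmentation masks to sample from,
# so the weak "point" label is drawn uniformly inside the bounding box with random.randint
# (both endpoints inclusive). A minimal sketch, assuming a COCO [x, y, w, h] box
# (hypothetical helper, not part of this repo):

import random

def point_in_box(bbox):
    """Uniformly sample an [x, y] point inside a COCO [x, y, w, h] box."""
    x0, y0 = int(bbox[0]), int(bbox[1])
    x1, y1 = x0 + int(bbox[2]), y0 + int(bbox[3])
    return [float(random.randint(x0, x1)), float(random.randint(y0, y1))]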
2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import numpy as np 6 | import random 7 | from pycocotools.coco import COCO 8 | 9 | def main(): 10 | 11 | root_dir = '../objects365/annotations/' 12 | json_file = root_dir + 'objects365_val.json' 13 | coco_api = COCO(json_file) 14 | img_ids = sorted(coco_api.imgs.keys()) 15 | imgs = coco_api.loadImgs(img_ids) 16 | 17 | # add indicator 18 | for i_img in imgs: 19 | i_img['indicator'] = 1 20 | i_img['label_type'] = 'fully' 21 | dataset_anns = [coco_api.imgToAnns[i] for i in img_ids] 22 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 23 | 24 | ith = 0 25 | for i_ann in anns: 26 | if len(i_ann['bbox']) > 0: 27 | boxes = i_ann['bbox'] 28 | boxes = np.array(boxes) 29 | x0 = int(boxes[0]) 30 | y0 = int(boxes[1]) 31 | x1 = int(boxes[0]) + int(boxes[2]) 32 | y1 = int(boxes[1]) + int(boxes[3]) 33 | point_x = random.randint(x0, x1) 34 | point_y = random.randint(y0, y1) 35 | i_ann['point'] = [float(point_x), float(point_y)] 36 | else: 37 | i_ann['point'] = [] 38 | print(i_ann['bbox']) 39 | ith = ith + 1 40 | if ith % 10000 == 0: 41 | print(ith) 42 | 43 | data = {} 44 | data['images'] = imgs 45 | data['annotations'] = anns 46 | data['categories'] = list(coco_api.cats.values()) 47 | output_file_label = root_dir + 'objects365_val_w_indicator.json' 48 | ## save to json 49 | with open(output_file_label, 'w') as f: 50 | print('writing to json output:', output_file_label) 51 | json.dump(data, f, sort_keys=True) 52 | 53 | if __name__ == '__main__': 54 | main() 55 | print("finished!") 56 | -------------------------------------------------------------------------------- /scripts/build_crowdhuman_dataset.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
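# Like Bees2COCO.py above, the script below resizes any image whose width or height exceeds
# 600 px so that the shorter side becomes 600 (aspect ratio kept), and then rescales every box
# by the same per-axis ratios. A minimal sketch of the size computation only, assuming integer
# pixel sizes (hypothetical helper, not part of this repo):

def target_size(width, height, shorter=600):
    """Return (new_w, new_h) with the shorter side scaled to `shorter`, aspect ratio kept."""
    if width <= shorter and height <= shorter:
        return width, height                      # already small enough, leave untouched
    if width < height:
        return shorter, int(shorter * height / width)
    return int(shorter * width / height), shorter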
2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import numpy as np 6 | import random 7 | import cv2 8 | from pycocotools.coco import COCO 9 | 10 | def main(): 11 | 12 | root_dir = '../crowdhuman/' 13 | data_set = 'train' 14 | json_file = root_dir + '{}.json'.format(data_set) 15 | coco_api = COCO(json_file) 16 | catIds = coco_api.getCatIds(catNms=['person']) 17 | imgIds = coco_api.getImgIds(catIds=catIds) 18 | imgIds.sort() 19 | 20 | imgs = coco_api.loadImgs(imgIds) 21 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in imgIds] 22 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 23 | 24 | anns_new = [] 25 | for i_ann in anns: 26 | if i_ann['category_id'] == 1: 27 | anns_new.append(i_ann) 28 | anns = anns_new 29 | 30 | id_to_downscale = [] 31 | ratio_to_downscale = [] 32 | ith = 0 33 | for i_img in imgs: 34 | if i_img['width'] > 600 or i_img['height'] > 600: 35 | width = i_img['width'] 36 | height = i_img['height'] 37 | if (width <= height and width == 600) or (height <= width and height == 600): 38 | oh = height 39 | ow = width 40 | if width < height: 41 | ow = 600 42 | oh = int(600 * height / width) 43 | else: 44 | oh = 600 45 | ow = int(600 * width / height) 46 | 47 | id_to_downscale.append(i_img['id']) 48 | original_img = cv2.imread(root_dir + 'Images/' + i_img['file_name']) 49 | resized_img = cv2.resize(original_img, (ow, oh)) 50 | ratios = [float(ow) / float(width), float(oh) / float(height)] 51 | ratio_width, ratio_height = ratios 52 | cv2.imwrite(root_dir + 'Images/' + i_img['file_name'], resized_img) 53 | ratio_to_downscale.append([ratio_width, ratio_height]) 54 | ith = ith + 1 55 | if ith % 10 == 0: 56 | print(ith) 57 | jth = 0 58 | for i_ann in anns: 59 | if i_ann['image_id'] in id_to_downscale: 60 | ith = id_to_downscale.index(i_ann['image_id']) 61 | ratio_width, ratio_height = ratio_to_downscale[ith] 62 | box_i = i_ann['bbox'] 63 | boxes = np.array(box_i) 64 | boxes[2:] += boxes[:2] 65 | xmin = boxes[0] 66 | ymin = boxes[1] 67 | xmax = boxes[2] 68 | ymax = boxes[3] 69 | xmin = xmin * ratio_width 70 | xmax = xmax * ratio_width 71 | ymin = ymin * ratio_height 72 | ymax = ymax * ratio_height 73 | o_width = abs(xmax - xmin) 74 | o_height = abs(ymax - ymin) 75 | i_ann['bbox'] = [xmin, ymin, o_width, o_height] 76 | i_ann['area'] = o_width * o_height 77 | jth = jth + 1 78 | if jth % 1000 == 0: 79 | print(jth) 80 | 81 | sample_data = {} 82 | sample_data['images'] = imgs 83 | sample_data['annotations'] = anns 84 | sample_data['categories'] = [list(coco_api.cats.values())[0]] 85 | 86 | output_file_label = '{}{}_fullbody.json'.format(root_dir, data_set) 87 | ## save to json 88 | with open(output_file_label, 'w') as f: 89 | print('writing to json output:', output_file_label) 90 | json.dump(sample_data, f, sort_keys=True) 91 | 92 | data_set = 'val' 93 | json_file = root_dir + '{}.json'.format(data_set) 94 | coco_api = COCO(json_file) 95 | 96 | catIds = coco_api.getCatIds(catNms=['person']) 97 | imgIds = coco_api.getImgIds(catIds=catIds) 98 | imgIds.sort() 99 | imgs = coco_api.loadImgs(imgIds) 100 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in imgIds] 101 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 102 | 103 | # add indicator 104 | for i_img in imgs: 105 | i_img['indicator'] = 1 106 | i_img['label_type'] = 'fully' 107 | 108 | anns_new = [] 109 | for i_ann in anns: 110 | if i_ann['category_id'] == 1: 111 | anns_new.append(i_ann) 112 | anns = anns_new 113 | 114 | ith = 0 115 | for i_ann in anns: 116 | if len(i_ann['bbox']) 
> 0: 117 | boxes = i_ann['bbox'] 118 | boxes = np.array(boxes) 119 | x0 = int(boxes[0]) 120 | y0 = int(boxes[1]) 121 | x1 = int(boxes[0]) + int(boxes[2]) 122 | y1 = int(boxes[1]) + int(boxes[3]) 123 | point_x = random.randint(x0, x1) 124 | point_y = random.randint(y0, y1) 125 | i_ann['point'] = [float(point_x), float(point_y)] 126 | else: 127 | i_ann['point'] = [] 128 | print(i_ann['bbox']) 129 | ith = ith + 1 130 | if ith % 1000 == 0: 131 | print(ith) 132 | 133 | sample_data = {} 134 | sample_data['images'] = imgs 135 | sample_data['annotations'] = anns 136 | sample_data['categories'] = [list(coco_api.cats.values())[0]] 137 | output_file_label = '{}{}_fullbody.json'.format(root_dir, 'test') 138 | ## save to json 139 | with open(output_file_label, 'w') as f: 140 | print('writing to json output:', output_file_label) 141 | json.dump(sample_data, f, sort_keys=True) 142 | 143 | if __name__ == '__main__': 144 | main() 145 | print("finished!") 146 | -------------------------------------------------------------------------------- /scripts/combine_voc_trainval20072012.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import random 6 | from pycocotools.coco import COCO 7 | 8 | def main(): 9 | root_dir = '../voc/VOCdevkit/VOC2007trainval/' 10 | data_set = 'instances_VOC_trainval2007' 11 | json_file = root_dir + '{}.json'.format(data_set) 12 | coco_api = COCO(json_file) 13 | 14 | img_ids = sorted(coco_api.imgs.keys()) 15 | num_imgs = len(img_ids) 16 | num_samples = num_imgs 17 | sample_ids = random.sample(img_ids, num_samples) 18 | sample_ids = sorted(sample_ids) 19 | imgs_2007 = coco_api.loadImgs(sample_ids) 20 | 21 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 22 | anns_2007 = [ann for img_anns in dataset_anns for ann in img_anns] 23 | 24 | root_dir = '../voc/VOCdevkit/VOC2012trainval/' 25 | data_set = 'instances_VOC_trainval2012' 26 | json_file = root_dir + '{}.json'.format(data_set) 27 | coco_api = COCO(json_file) 28 | 29 | img_ids = sorted(coco_api.imgs.keys()) 30 | num_imgs = len(img_ids) 31 | num_samples = num_imgs 32 | sample_ids = random.sample(img_ids, num_samples) 33 | sample_ids = sorted(sample_ids) 34 | imgs_2012 = coco_api.loadImgs(sample_ids) 35 | 36 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 37 | anns_2012 = [ann for img_anns in dataset_anns for ann in img_anns] 38 | 39 | imgs = [] 40 | imgs.extend(imgs_2007) 41 | imgs.extend(imgs_2012) 42 | 43 | anns = [] 44 | anns.extend(anns_2007) 45 | anns.extend(anns_2012) 46 | 47 | sample_data = {} 48 | sample_data['images'] = imgs 49 | sample_data['annotations'] = anns 50 | sample_data['categories'] = list(coco_api.cats.values()) 51 | 52 | root_dir_new = '../voc/VOCdevkit/VOC20072012trainval/' 53 | data_set_name = 'instances_VOC_trainval20072012' 54 | output_file_label = '{}{}.json'.format(root_dir_new, data_set_name) 55 | ## save to json 56 | with open(output_file_label, 'w') as f: 57 | print('writing to json output:', output_file_label) 58 | json.dump(sample_data, f, sort_keys=True) 59 | 60 | 61 | if __name__ == '__main__': 62 | main() 63 | print("finished!") 64 | -------------------------------------------------------------------------------- /scripts/convert_crowdhuman_to_coco.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. 
All Rights Reserved. 2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import os 5 | import numpy as np 6 | import json 7 | import cv2 8 | import shutil 9 | 10 | DATA_PATH = '../crowdhuman/' 11 | OUT_PATH = DATA_PATH 12 | SPLITS = ['val', 'train'] 13 | DEBUG = False 14 | 15 | 16 | def load_func(fpath): 17 | print('fpath', fpath) 18 | assert os.path.exists(fpath) 19 | with open(fpath,'r') as fid: 20 | lines = fid.readlines() 21 | records =[json.loads(line.strip('\n')) for line in lines] 22 | return records 23 | 24 | 25 | if __name__ == '__main__': 26 | if not os.path.exists(OUT_PATH): 27 | os.mkdir(OUT_PATH) 28 | for split in SPLITS: 29 | data_path = DATA_PATH + split 30 | out_path = OUT_PATH + '{}.json'.format(split) 31 | out = {'images': [], 'annotations': [], 32 | 'categories': [{'id': 1, 'name': 'person'}]} 33 | ann_path = DATA_PATH + 'annotation_{}.odgt'.format(split) 34 | anns_data = load_func(ann_path) 35 | image_cnt = 0 36 | ann_cnt = 0 37 | video_cnt = 0 38 | for ann_data in anns_data: 39 | image_cnt += 1 40 | 41 | file_name_org = '{}.jpg'.format(ann_data['ID']) 42 | file_name = file_name_org.replace(',', '_') 43 | img_path = DATA_PATH + 'Images/' + file_name 44 | if not os.path.exists(img_path): 45 | img_path_org = DATA_PATH + 'Images/' + file_name_org 46 | shutil.move(img_path_org, img_path) 47 | print(img_path) 48 | img = cv2.imread(img_path) 49 | dimensions = img.shape 50 | 51 | image_info = {'file_name': file_name, 'height': dimensions[0], 'width': dimensions[1], 'id': image_cnt} 52 | 53 | out['images'].append(image_info) 54 | if split != 'test': 55 | anns = ann_data['gtboxes'] 56 | for i in range(len(anns)): 57 | ann_cnt += 1 58 | fbox = anns[i]['fbox'] 59 | ann = {'id': ann_cnt, 'category_id': 1, 'image_id': image_cnt, 'bbox_vis': anns[i]['vbox'], 60 | 'bbox': fbox, 'area': fbox[2] * fbox[3], 61 | 'iscrowd': 1 if 'extra' in anns[i] and 62 | 'ignore' in anns[i]['extra'] and 63 | anns[i]['extra']['ignore'] == 1 else 0} 64 | out['annotations'].append(ann) 65 | print('loaded {} for {} images and {} samples'.format( 66 | split, len(out['images']), len(out['annotations']))) 67 | json.dump(out, open(out_path, 'w')) 68 | -------------------------------------------------------------------------------- /scripts/prepare_objects365_for_omni.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
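# The script below builds a reduced Objects365 training split: it walks the category list and
# keeps at most maximum_example_per_category images per class, skipping images that an earlier
# class already contributed, so each image is counted once. A minimal sketch of that selection
# loop as a standalone helper (hypothetical function; the real script inlines this logic):

import random

def sample_per_category(coco_api, category_names, cap):
    """Pick at most `cap` image ids per category, never re-using an already chosen image."""
    selected = []
    for name in category_names:
        cat_ids = coco_api.getCatIds(catNms=[name])
        candidates = list(set(coco_api.getImgIds(catIds=cat_ids)) - set(selected))
        if len(candidates) > cap:
            candidates = random.sample(candidates, cap)
        selected.extend(candidates)
    return selected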
2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import numpy as np 6 | import random 7 | from pycocotools.coco import COCO 8 | 9 | def main(): 10 | 11 | root_dir = '../objects365/annotations/' 12 | data_set = 'objects365_train' 13 | json_file = root_dir + '{}.json'.format(data_set) 14 | coco_api = COCO(json_file) 15 | random_seed = 1709 16 | maximum_example_per_category = 300 17 | 18 | random.seed(random_seed) 19 | 20 | class_index_list = [] # 1->365 21 | category = coco_api.cats 22 | category_list = [] 23 | for key, value in category.items(): 24 | class_index_list.append(key) 25 | category_list.append(value['name']) 26 | 27 | # for each category we sample some examples 28 | category_count = 0 29 | all_satisfied_imgID = [] 30 | for i_category in category_list: 31 | catIds = coco_api.getCatIds(catNms=[i_category]) 32 | imgIds_i = coco_api.getImgIds(catIds=catIds) 33 | augmented_list = list(set(imgIds_i) - set(all_satisfied_imgID)) 34 | if len(augmented_list) > maximum_example_per_category: 35 | augmented_list = random.sample(augmented_list, maximum_example_per_category) 36 | all_satisfied_imgID.extend(augmented_list) 37 | category_count += 1 38 | 39 | print(category_count) 40 | 41 | # original statistics 42 | num_classes = len(coco_api.cats) 43 | histogram = np.zeros((num_classes,), dtype=np.int) 44 | random.seed(random_seed) 45 | 46 | sample_ids = sorted(all_satisfied_imgID) 47 | imgs = coco_api.loadImgs(sample_ids) 48 | 49 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 50 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 51 | 52 | # sampled statistics 53 | classes = [x["category_id"] for x in anns] 54 | for i in classes: 55 | index = class_index_list.index(int(i)) 56 | histogram[index] = histogram[index] + 1 57 | class_ratios = histogram / np.sum(histogram) 58 | print("sampled class ratios: {}".format(class_ratios)) 59 | print("each class has at least one example ", np.min(histogram)>0) 60 | 61 | sample_data = {} 62 | sample_data['images'] = imgs 63 | sample_data['annotations'] = anns 64 | sample_data['categories'] = list(coco_api.cats.values()) 65 | 66 | output_file_label = '{}{}_sampled.json'.format(root_dir, data_set) 67 | ## save to json 68 | with open(output_file_label, 'w') as f: 69 | print('writing to json output:', output_file_label) 70 | json.dump(sample_data, f, sort_keys=True) 71 | 72 | 73 | if __name__ == '__main__': 74 | main() 75 | print("finished!") 76 | -------------------------------------------------------------------------------- /scripts/prepare_voc_dataset.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
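# For VOC, the script below simulates point annotations by drawing one point per box from a
# Gaussian centred on the box centre with covariance diag(h/2, w/2); note the (y, x) ordering
# passed to np.random.multivariate_normal, and that the draw is not clipped, so a point can
# occasionally fall outside its box. A minimal sketch, assuming a COCO [x, y, w, h] box
# (hypothetical helper, not part of this repo):

import numpy as np

def gaussian_point(bbox):
    """Sample an [x, y] point near the centre of a COCO [x, y, w, h] box."""
    x0, y0, w, h = bbox
    cx, cy = x0 + w / 2.0, y0 + h / 2.0
    y, x = np.random.multivariate_normal((cy, cx), [[h / 2.0, 0], [0, w / 2.0]], 1)[0]
    return [float(x), float(y)]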
2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import numpy as np 6 | import random 7 | from pycocotools.coco import COCO 8 | 9 | def main(): 10 | 11 | VOC2007trainval = True 12 | VOC2007test = True 13 | VOC2012trainval = True 14 | 15 | if VOC2007trainval: 16 | root_dir = '../voc/VOCdevkit/VOC2007trainval/' 17 | data_set = 'instances_VOC_trainval2007' 18 | json_file = root_dir + '{}.json'.format(data_set) 19 | coco_api = COCO(json_file) 20 | 21 | # original statistics 22 | num_classes = len(coco_api.cats) 23 | 24 | class_index_list = [] 25 | category = coco_api.cats 26 | for key, value in category.items(): 27 | class_index_list.append(key) 28 | 29 | histogram = np.zeros((num_classes,), dtype=np.int) 30 | 31 | img_ids = sorted(coco_api.imgs.keys()) 32 | num_imgs = len(img_ids) 33 | num_samples = num_imgs 34 | sample_ids = random.sample(img_ids, num_samples) 35 | sample_ids = sorted(sample_ids) 36 | imgs = coco_api.loadImgs(sample_ids) 37 | 38 | # add indicator 39 | for i_img in imgs: 40 | i_img['indicator'] = 1 41 | i_img['label_type'] = 'fully' 42 | 43 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 44 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 45 | 46 | ith = 0 47 | for i_ann in anns: 48 | if len(i_ann['bbox']) > 0: 49 | boxes = i_ann['bbox'] 50 | boxes = np.array(boxes) 51 | boxes[2:] += boxes[:2] 52 | x0 = boxes[0] 53 | y0 = boxes[1] 54 | x1 = boxes[2] 55 | y1 = boxes[3] 56 | cx, cy, w, h = (x0 + x1) / 2, (y0 + y1) / 2, (x1 - x0), (y1 - y0) 57 | mean = (cy, cx) 58 | cov = [[h / 2, 0], [0, w / 2]] 59 | sampled_point_i = np.random.multivariate_normal(mean, cov, 1) 60 | i_ann['point'] = [sampled_point_i[0, 1], sampled_point_i[0, 0]] 61 | else: 62 | i_ann['point'] = [] 63 | print(i_ann['bbox']) 64 | ith = ith + 1 65 | if ith % 10000 == 0: 66 | print(ith) 67 | 68 | # sampled statistics 69 | classes = [x["category_id"] for x in anns] 70 | for i in classes: 71 | index = class_index_list.index(int(i)) 72 | histogram[index] = histogram[index] + 1 73 | class_ratios = histogram / np.sum(histogram) 74 | print("sampled class ratios: {}".format(class_ratios)) 75 | print("each class has at least one example ", np.min(histogram)>0) 76 | 77 | sample_data = {} 78 | sample_data['images'] = imgs 79 | sample_data['annotations'] = anns 80 | sample_data['categories'] = list(coco_api.cats.values()) 81 | 82 | output_file_label = '{}{}_semi_label.json'.format(root_dir, data_set) 83 | ## save to json 84 | with open(output_file_label, 'w') as f: 85 | print('writing to json output:', output_file_label) 86 | json.dump(sample_data, f, sort_keys=True) 87 | 88 | if VOC2007test: 89 | root_dir = '../voc/VOCdevkit/VOC2007test/' 90 | data_set = 'instances_VOC_test2007' 91 | json_file = root_dir + '{}.json'.format(data_set) 92 | coco_api = COCO(json_file) 93 | 94 | # original statistics 95 | num_classes = len(coco_api.cats) 96 | 97 | class_index_list = [] 98 | category = coco_api.cats 99 | for key, value in category.items(): 100 | class_index_list.append(key) 101 | 102 | histogram = np.zeros((num_classes,), dtype=np.int) 103 | 104 | img_ids = sorted(coco_api.imgs.keys()) 105 | num_imgs = len(img_ids) 106 | num_samples = num_imgs 107 | sample_ids = random.sample(img_ids, num_samples) 108 | sample_ids = sorted(sample_ids) 109 | imgs = coco_api.loadImgs(sample_ids) 110 | 111 | # add indicator 112 | for i_img in imgs: 113 | i_img['indicator'] = 1 114 | i_img['label_type'] = 'fully' 115 | 116 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 117 
| anns = [ann for img_anns in dataset_anns for ann in img_anns] 118 | 119 | ith = 0 120 | for i_ann in anns: 121 | if len(i_ann['bbox']) > 0: 122 | boxes = i_ann['bbox'] 123 | boxes = np.array(boxes) 124 | boxes[2:] += boxes[:2] 125 | x0 = boxes[0] 126 | y0 = boxes[1] 127 | x1 = boxes[2] 128 | y1 = boxes[3] 129 | cx, cy, w, h = (x0 + x1) / 2, (y0 + y1) / 2, (x1 - x0), (y1 - y0) 130 | mean = (cy, cx) 131 | cov = [[h / 2, 0], [0, w / 2]] 132 | sampled_point_i = np.random.multivariate_normal(mean, cov, 1) 133 | i_ann['point'] = [sampled_point_i[0, 1], sampled_point_i[0, 0]] 134 | else: 135 | i_ann['point'] = [] 136 | print(i_ann['bbox']) 137 | ith = ith + 1 138 | if ith % 10000 == 0: 139 | print(ith) 140 | 141 | # sampled statistics 142 | classes = [x["category_id"] for x in anns] 143 | for i in classes: 144 | index = class_index_list.index(int(i)) 145 | histogram[index] = histogram[index] + 1 146 | class_ratios = histogram / np.sum(histogram) 147 | print("sampled class ratios: {}".format(class_ratios)) 148 | print("each class has at least one example ", np.min(histogram) > 0) 149 | 150 | sample_data = {} 151 | sample_data['images'] = imgs 152 | sample_data['annotations'] = anns 153 | sample_data['categories'] = list(coco_api.cats.values()) 154 | 155 | output_file_label = '{}{}.json'.format(root_dir, data_set) 156 | ## save to json 157 | with open(output_file_label, 'w') as f: 158 | print('writing to json output:', output_file_label) 159 | json.dump(sample_data, f, sort_keys=True) 160 | 161 | if VOC2012trainval: 162 | root_dir = '../voc/VOCdevkit/VOC2012trainval/' 163 | data_set = 'instances_VOC_trainval2012' 164 | json_file = root_dir + '{}.json'.format(data_set) 165 | coco_api = COCO(json_file) 166 | 167 | # original statistics 168 | num_classes = len(coco_api.cats) 169 | 170 | class_index_list = [] 171 | category = coco_api.cats 172 | for key, value in category.items(): 173 | class_index_list.append(key) 174 | 175 | histogram = np.zeros((num_classes,), dtype=np.int) 176 | 177 | img_ids = sorted(coco_api.imgs.keys()) 178 | num_imgs = len(img_ids) 179 | num_samples = num_imgs 180 | sample_ids = random.sample(img_ids, num_samples) 181 | sample_ids = sorted(sample_ids) 182 | imgs = coco_api.loadImgs(sample_ids) 183 | 184 | # add indicator 185 | for i_img in imgs: 186 | i_img['indicator'] = 0 187 | i_img['label_type'] = 'Unsup' 188 | 189 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 190 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 191 | 192 | ith = 0 193 | for i_ann in anns: 194 | if len(i_ann['bbox']) > 0: 195 | boxes = i_ann['bbox'] 196 | boxes = np.array(boxes) 197 | boxes[2:] += boxes[:2] 198 | x0 = boxes[0] 199 | y0 = boxes[1] 200 | x1 = boxes[2] 201 | y1 = boxes[3] 202 | cx, cy, w, h = (x0 + x1) / 2, (y0 + y1) / 2, (x1 - x0), (y1 - y0) 203 | mean = (cy, cx) 204 | cov = [[h / 2, 0], [0, w / 2]] 205 | sampled_point_i = np.random.multivariate_normal(mean, cov, 1) 206 | i_ann['point'] = [sampled_point_i[0, 1], sampled_point_i[0, 0]] 207 | else: 208 | i_ann['point'] = [] 209 | print(i_ann['bbox']) 210 | ith = ith + 1 211 | if ith % 10000 == 0: 212 | print(ith) 213 | 214 | # sampled statistics 215 | classes = [x["category_id"] for x in anns] 216 | for i in classes: 217 | index = class_index_list.index(int(i)) 218 | histogram[index] = histogram[index] + 1 219 | class_ratios = histogram / np.sum(histogram) 220 | print("sampled class ratios: {}".format(class_ratios)) 221 | print("each class has at least one example ", np.min(histogram) > 0) 222 | 223 | 
sample_data = {} 224 | sample_data['images'] = imgs 225 | sample_data['annotations'] = anns 226 | sample_data['categories'] = list(coco_api.cats.values()) 227 | 228 | output_file_label = '{}{}_semi_unlabel.json'.format(root_dir, data_set) 229 | ## save to json 230 | with open(output_file_label, 'w') as f: 231 | print('writing to json output:', output_file_label) 232 | json.dump(sample_data, f, sort_keys=True) 233 | 234 | if __name__ == '__main__': 235 | main() 236 | print("finished!") 237 | -------------------------------------------------------------------------------- /scripts/split_bees_train_val.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import numpy as np 6 | import random 7 | from pycocotools.coco import COCO 8 | 9 | def main(): 10 | 11 | root_dir = '../bees/' 12 | data_set = 'instances_bees' 13 | json_file = root_dir + '{}.json'.format(data_set) 14 | coco_api = COCO(json_file) 15 | random_seed = 1709 16 | 17 | ratio_label = 0.08 18 | ratio_unlabel = 0.72 19 | ratio_val = 0.2 20 | 21 | random.seed(random_seed) 22 | 23 | img_ids = sorted(coco_api.imgs.keys()) 24 | num_imgs = len(img_ids) 25 | num_samples = round(num_imgs*ratio_label) 26 | sample_ids = random.sample(img_ids, num_samples) 27 | sample_ids = sorted(sample_ids) 28 | imgs_label = coco_api.loadImgs(sample_ids) 29 | 30 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 31 | anns_label = [ann for img_anns in dataset_anns for ann in img_anns] 32 | 33 | # sample unlabel data 34 | unsampled_ids = list(set(img_ids) - set(sample_ids)) 35 | unsampled_ids = sorted(unsampled_ids) 36 | num_samples = round(num_imgs * ratio_unlabel) 37 | 38 | sample_ids = random.sample(unsampled_ids, num_samples) 39 | sample_ids = sorted(sample_ids) 40 | imgs_unlabel = coco_api.loadImgs(sample_ids) 41 | 42 | dataset_anns_u = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 43 | anns_unlabel = [ann for img_anns in dataset_anns_u for ann in img_anns] 44 | 45 | imgs = [] 46 | imgs.extend(imgs_label) 47 | imgs.extend(imgs_unlabel) 48 | 49 | anns = [] 50 | anns.extend(anns_label) 51 | anns.extend(anns_unlabel) 52 | 53 | sample_data = {} 54 | sample_data['images'] = imgs 55 | sample_data['annotations'] = anns 56 | sample_data['categories'] = list(coco_api.cats.values()) 57 | 58 | output_file_label = '{}{}_train.json'.format(root_dir, data_set) 59 | ## save to json 60 | with open(output_file_label, 'w') as f: 61 | print('writing to json output:', output_file_label) 62 | json.dump(sample_data, f, sort_keys=True) 63 | 64 | # update the rest sample pool 65 | unsampled_ids = list(set(unsampled_ids) - set(sample_ids)) 66 | unsampled_ids = sorted(unsampled_ids) 67 | 68 | num_samples = round(num_imgs * ratio_val) 69 | 70 | if num_samples > len(unsampled_ids): # 71 | num_samples = len(unsampled_ids) 72 | 73 | sample_ids = random.sample(unsampled_ids, num_samples) 74 | sample_ids = sorted(sample_ids) 75 | imgs = coco_api.loadImgs(sample_ids) 76 | 77 | # add indicator 78 | for i_img in imgs: 79 | i_img['indicator'] = 0 80 | i_img['label_type'] = 'fully' 81 | 82 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 83 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 84 | 85 | ith = 0 86 | for i_ann in anns: 87 | if len(i_ann['bbox']) > 0: 88 | boxes = i_ann['bbox'] 89 | boxes = np.array(boxes) 90 | x0 = int(boxes[0]) 91 | y0 = 
int(boxes[1]) 92 | x1 = int(boxes[0]) + int(boxes[2]) 93 | y1 = int(boxes[1]) + int(boxes[3]) 94 | point_x = random.randint(x0, x1) 95 | point_y = random.randint(y0, y1) 96 | i_ann['point'] = [float(point_x), float(point_y)] 97 | else: 98 | i_ann['point'] = [] 99 | print(i_ann['bbox']) 100 | ith = ith + 1 101 | if ith % 1000 == 0: 102 | print(ith) 103 | 104 | sample_data = {} 105 | sample_data['images'] = imgs 106 | sample_data['annotations'] = anns 107 | sample_data['categories'] = list(coco_api.cats.values()) 108 | 109 | output_file_label = '{}{}_val.json'.format(root_dir, data_set) 110 | ## save to json 111 | with open(output_file_label, 'w') as f: 112 | print('writing to json output:', output_file_label) 113 | json.dump(sample_data, f, sort_keys=True) 114 | 115 | 116 | if __name__ == '__main__': 117 | main() 118 | print("finished!") 119 | -------------------------------------------------------------------------------- /scripts/split_dataset_bees_omni.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import numpy as np 6 | import random 7 | from pycocotools.coco import COCO 8 | 9 | def main(): 10 | 11 | root_dir = '../bees/' 12 | data_set = 'instances_bees_train' 13 | json_file = root_dir + '{}.json'.format(data_set) 14 | coco_api = COCO(json_file) 15 | random_seed = 1709 16 | 17 | # assign the ratio, the sum should be 1 18 | fully_labeled = 0.2 19 | Unsup = 0.0 20 | tagsU = 0.0 21 | tagsK = 0.34 22 | pointsU = 0.0 23 | pointsK = 0.0 24 | boxesEC = 0.46 25 | boxesU = 0.0 26 | assert sum([fully_labeled, Unsup, tagsU, tagsK, pointsU, pointsK, boxesEC, boxesU]) == 1.0 27 | 28 | # we first sample the fully label data 29 | # original statistics 30 | num_classes = len(coco_api.cats) 31 | class_index_list = [] 32 | category = coco_api.cats 33 | for key, value in category.items(): 34 | class_index_list.append(key) 35 | histogram = np.zeros((num_classes,), dtype=np.int) 36 | 37 | img_ids = sorted(coco_api.imgs.keys()) 38 | num_imgs = len(img_ids) 39 | 40 | random.seed(random_seed) 41 | 42 | According_to_num = False # if it is False, we split by percentage 43 | if According_to_num: 44 | num_samples = 578 45 | else: 46 | ratio = fully_labeled 47 | num_samples = round(num_imgs * ratio) 48 | 49 | sample_ids = random.sample(img_ids, num_samples) 50 | sample_ids = sorted(sample_ids) 51 | imgs = coco_api.loadImgs(sample_ids) 52 | 53 | # add indicator 54 | for i_img in imgs: 55 | i_img['indicator'] = 1 56 | i_img['label_type'] = 'fully' 57 | 58 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 59 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 60 | 61 | ith = 0 62 | for i_ann in anns: 63 | if len(i_ann['bbox']) > 0: 64 | boxes = i_ann['bbox'] 65 | boxes = np.array(boxes) 66 | x0 = int(boxes[0]) 67 | y0 = int(boxes[1]) 68 | x1 = int(boxes[0]) + int(boxes[2]) 69 | y1 = int(boxes[1]) + int(boxes[3]) 70 | point_x = random.randint(x0, x1) 71 | point_y = random.randint(y0, y1) 72 | i_ann['point'] = [float(point_x), float(point_y)] 73 | else: 74 | i_ann['point'] = [] 75 | print(i_ann['bbox']) 76 | ith = ith + 1 77 | if ith % 1000 == 0: 78 | print(ith) 79 | 80 | # sampled statistics 81 | classes = [x["category_id"] for x in anns] 82 | for i in classes: 83 | index = class_index_list.index(int(i)) 84 | histogram[index] = histogram[index] + 1 85 | class_ratios = histogram / np.sum(histogram) 86 | 
print("sampled class ratios: {}".format(class_ratios)) 87 | print("each class has at least one example ", np.min(histogram) > 0) 88 | 89 | sample_data = {} 90 | sample_data['images'] = imgs 91 | sample_data['annotations'] = anns 92 | sample_data['categories'] = list(coco_api.cats.values()) 93 | 94 | output_file_label = '{}{}_bees_omni_label_seed{}_{}fully{}Unsup{}tagsU{}tagsK{}pointsU{}pointsK{}boxesEC{}boxesU.json'.format(root_dir, data_set, random_seed, round(100*fully_labeled), round(100*Unsup), round(100*tagsU), round(100*tagsK), round(100*pointsU), round(100*pointsK), round(100*boxesEC), round(100*boxesU)) 95 | ## save to json 96 | with open(output_file_label, 'w') as f: 97 | print('writing to json output:', output_file_label) 98 | json.dump(sample_data, f, sort_keys=True) 99 | 100 | # next deal with unlabel (weakly label) data 101 | unsampled_ids = list(set(img_ids) - set(sample_ids)) 102 | unsampled_ids = sorted(unsampled_ids) 103 | 104 | splitting = {} 105 | imgs_all = [] 106 | anns_all = [] 107 | if According_to_num: 108 | splitting['tagsU'] = 2500 109 | splitting['tagsK'] = 1250 110 | splitting['points'] = 2255 111 | splitting['Unsup'] = 5417 112 | splitting['pointsOnly'] = 1111 113 | splitting['boxesN'] = 1111 114 | splitting['boxes'] = 1111 115 | else: 116 | splitting['tagsU'] = round(num_imgs * tagsU) 117 | splitting['tagsK'] = round(num_imgs * tagsK) 118 | splitting['pointsU'] = round(num_imgs * pointsU) 119 | splitting['pointsK'] = round(num_imgs * pointsK) 120 | splitting['Unsup'] = round(num_imgs * Unsup) 121 | splitting['boxesEC'] = round(num_imgs * boxesEC) 122 | splitting['boxesU'] = round(num_imgs * boxesU) 123 | 124 | for key, value in splitting.items(): 125 | histogram = np.zeros((num_classes,), dtype=np.int) 126 | num_samples = value 127 | if num_samples > len(unsampled_ids): # 128 | num_samples = len(unsampled_ids) 129 | 130 | sample_ids = random.sample(unsampled_ids, num_samples) 131 | sample_ids = sorted(sample_ids) 132 | imgs = coco_api.loadImgs(sample_ids) 133 | 134 | # add indicator 135 | for i_img in imgs: 136 | i_img['indicator'] = 0 137 | i_img['label_type'] = key 138 | 139 | dataset_anns_u = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 140 | anns = [ann for img_anns in dataset_anns_u for ann in img_anns] 141 | 142 | ith = 0 143 | for i_ann in anns: 144 | if len(i_ann['bbox']) > 0: 145 | boxes = i_ann['bbox'] 146 | boxes = np.array(boxes) 147 | x0 = int(boxes[0]) 148 | y0 = int(boxes[1]) 149 | x1 = int(boxes[0]) + int(boxes[2]) 150 | y1 = int(boxes[1]) + int(boxes[3]) 151 | point_x = random.randint(x0, x1) 152 | point_y = random.randint(y0, y1) 153 | i_ann['point'] = [float(point_x), float(point_y)] 154 | else: 155 | i_ann['point'] = [] 156 | print(i_ann['bbox']) 157 | ith = ith + 1 158 | if ith % 1000 == 0: 159 | print(ith) 160 | 161 | if key == 'boxesEC': 162 | # -----the corresponding between delta and mIoU, I got this by emperical experiments 163 | delta = [25, 9, 3, 1, 0.5, 0.3, 0.15, 0.06, 0.03, 0.01] 164 | # mIoU = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05] 165 | 166 | IoU = np.load('IoU_extreme.npy') 167 | 168 | # compute the distribution, bin is 0.1 169 | IoU_list = list(IoU) 170 | distribution_extreme = [] 171 | for i in np.arange(0.1, 1.1, 0.1): 172 | bin_i = [x for x in IoU_list if x >= i - 0.1 and x < i] 173 | distribution_extreme.append(len(bin_i) / len(IoU_list)) 174 | distribution_extreme = distribution_extreme[::-1] 175 | 176 | random.shuffle(anns) 177 | ith_bin = 0 178 | cur_delta = delta[ith_bin] 179 | 
skip_to_next = distribution_extreme[ith_bin] * len(anns) 180 | ith = 0 181 | for i_ann in anns: 182 | if ith > skip_to_next: 183 | ith_bin += 1 184 | cur_delta = delta[ith_bin] 185 | skip_to_next = skip_to_next + distribution_extreme[ith_bin] * len(anns) 186 | print('to next') 187 | if 'iscrowd' not in i_ann or i_ann['iscrowd'] == 0: 188 | box_i = i_ann['bbox'] 189 | boxes = np.array(box_i) 190 | boxes[2:] += boxes[:2] 191 | x0 = boxes[0] 192 | y0 = boxes[1] 193 | x1 = boxes[2] 194 | y1 = boxes[3] 195 | 196 | # add noise to each of the two nodes 197 | cx, cy, w, h = (x0 + x1) / 2, (y0 + y1) / 2, (x1 - x0), (y1 - y0) 198 | mean = (y0, x0) 199 | cov = [[h / cur_delta, 0], [0, w / cur_delta]] 200 | sampled_point_i = np.random.multivariate_normal(mean, cov, 1) 201 | x0_new, y0_new = sampled_point_i[0, 1], sampled_point_i[0, 0] 202 | 203 | mean = (y1, x1) 204 | sampled_point_i = np.random.multivariate_normal(mean, cov, 1) 205 | x1_new, y1_new = sampled_point_i[0, 1], sampled_point_i[0, 0] 206 | 207 | x0 = min(x0_new, x1_new) 208 | x1 = max(x0_new, x1_new) 209 | y0 = min(y0_new, y1_new) 210 | y1 = max(y0_new, y1_new) 211 | w = x1 - x0 212 | h = y1 - y0 213 | 214 | x0 = float(x0) 215 | y0 = float(y0) 216 | w = float(w) 217 | h = float(h) 218 | i_ann['bbox'] = [x0, y0, w, h] 219 | ith = ith + 1 220 | if ith % 10000 == 0: 221 | print(ith) 222 | 223 | # sampled statistics 224 | classes = [x["category_id"] for x in anns] 225 | for i in classes: 226 | index = class_index_list.index(int(i)) 227 | histogram[index] = histogram[index] + 1 228 | class_ratios = histogram / np.sum(histogram) 229 | print("sampled class ratios: {}".format(class_ratios)) 230 | print("each class has at least one example ", np.min(histogram) > 0) 231 | 232 | imgs_all.extend(imgs) 233 | anns_all.extend(anns) 234 | 235 | # update the rest sample pool 236 | unsampled_ids = list(set(unsampled_ids) - set(sample_ids)) 237 | unsampled_ids = sorted(unsampled_ids) 238 | 239 | unsample_data = {} 240 | unsample_data['images'] = imgs_all 241 | unsample_data['annotations'] = anns_all 242 | unsample_data['categories'] = list(coco_api.cats.values()) 243 | 244 | output_file_unlabel = '{}{}_bees_omni_unlabel_seed{}_{}fully{}Unsup{}tagsU{}tagsK{}pointsU{}pointsK{}boxesEC{}boxesU.json'.format(root_dir, data_set, random_seed, round(100*fully_labeled), round(100*Unsup), round(100*tagsU), round(100*tagsK), round(100*pointsU), round(100*pointsK), round(100*boxesEC), round(100*boxesU)) 245 | ## save to json 246 | with open(output_file_unlabel, 'w') as f: 247 | print('writing to json output:', output_file_unlabel) 248 | json.dump(unsample_data, f, sort_keys=True) 249 | 250 | 251 | if __name__ == '__main__': 252 | main() 253 | print("finished!") 254 | -------------------------------------------------------------------------------- /scripts/split_dataset_crowdhuman_omni.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
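# In the "boxesEC" split built below (and in split_dataset_bees_omni.py above), each ground-truth
# box is replaced by a corrupted one: both corners receive zero-mean Gaussian jitter with
# covariance diag(h/delta, w/delta), and delta is switched per IoU bin so that the corrupted-vs-
# original IoU distribution follows the empirical histogram stored in IoU_extreme.npy (per the
# commented mapping, delta = 25 corresponds to roughly IoU 0.95 and delta = 0.01 to roughly 0.05).
# A minimal sketch of the single-box jitter, assuming a COCO [x, y, w, h] box and a given delta
# (hypothetical helper, not part of this repo):

import numpy as np

def jitter_box(bbox, delta):
    """Corrupt a COCO [x, y, w, h] box with Gaussian noise on both corners."""
    x0, y0, w, h = bbox
    x1, y1 = x0 + w, y0 + h
    cov = [[h / delta, 0], [0, w / delta]]
    y0n, x0n = np.random.multivariate_normal((y0, x0), cov)
    y1n, x1n = np.random.multivariate_normal((y1, x1), cov)
    xa, xb = sorted((x0n, x1n))
    ya, yb = sorted((y0n, y1n))
    return [float(xa), float(ya), float(xb - xa), float(yb - ya)]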
2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import numpy as np 6 | import random 7 | from pycocotools.coco import COCO 8 | 9 | def main(): 10 | 11 | root_dir = '../crowdhuman/' 12 | data_set = 'train_fullbody' 13 | json_file = root_dir + '{}.json'.format(data_set) 14 | coco_api = COCO(json_file) 15 | random_seed = 1709 16 | 17 | # assign the ratio, the sum should be 1 18 | fully_labeled = 0.2 19 | Unsup = 0.0 20 | tagsU = 0.0 21 | tagsK = 0.0 22 | pointsU = 0.34 23 | pointsK = 0.0 24 | boxesEC = 0.46 25 | boxesU = 0.0 26 | assert sum([fully_labeled, Unsup, tagsU, tagsK, pointsU, pointsK, boxesEC, boxesU]) == 1.0 27 | 28 | # we first sample the fully label data 29 | # original statistics 30 | num_classes = len(coco_api.cats) 31 | class_index_list = [] 32 | category = coco_api.cats 33 | for key, value in category.items(): 34 | class_index_list.append(key) 35 | histogram = np.zeros((num_classes,), dtype=np.int) 36 | 37 | img_ids = sorted(coco_api.imgs.keys()) 38 | num_imgs = len(img_ids) 39 | 40 | random.seed(random_seed) # also need to set seed for numpy random? 41 | 42 | According_to_num = False # if it is False, we split by percentage 43 | if According_to_num: 44 | num_samples = 578 45 | else: 46 | ratio = fully_labeled 47 | num_samples = round(num_imgs * ratio) 48 | 49 | sample_ids = random.sample(img_ids, num_samples) 50 | sample_ids = sorted(sample_ids) 51 | imgs = coco_api.loadImgs(sample_ids) 52 | 53 | # add indicator 54 | for i_img in imgs: 55 | i_img['indicator'] = 1 56 | i_img['label_type'] = 'fully' 57 | 58 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 59 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 60 | 61 | # sampled statistics 62 | classes = [x["category_id"] for x in anns] 63 | for i in classes: 64 | index = class_index_list.index(int(i)) 65 | histogram[index] = histogram[index] + 1 66 | class_ratios = histogram / np.sum(histogram) 67 | print("sampled class ratios: {}".format(class_ratios)) 68 | print("each class has at least one example ", np.min(histogram) > 0) 69 | 70 | ith = 0 71 | for i_ann in anns: 72 | if len(i_ann['bbox']) > 0: 73 | boxes = i_ann['bbox'] 74 | boxes = np.array(boxes) 75 | x0 = int(boxes[0]) 76 | y0 = int(boxes[1]) 77 | x1 = int(boxes[0]) + int(boxes[2]) 78 | y1 = int(boxes[1]) + int(boxes[3]) 79 | point_x = random.randint(x0, x1) 80 | point_y = random.randint(y0, y1) 81 | i_ann['point'] = [float(point_x), float(point_y)] 82 | else: 83 | i_ann['point'] = [] 84 | print(i_ann['bbox']) 85 | ith = ith + 1 86 | if ith % 1000 == 0: 87 | print(ith) 88 | 89 | # sampled statistics 90 | classes = [x["category_id"] for x in anns] 91 | for i in classes: 92 | index = class_index_list.index(int(i)) 93 | histogram[index] = histogram[index] + 1 94 | class_ratios = histogram / np.sum(histogram) 95 | print("sampled class ratios: {}".format(class_ratios)) 96 | print("each class has at least one example ", np.min(histogram) > 0) 97 | 98 | sample_data = {} 99 | sample_data['images'] = imgs 100 | sample_data['annotations'] = anns 101 | sample_data['categories'] = list(coco_api.cats.values()) 102 | 103 | output_file_label = '{}{}_crowdhuman_omni_label_seed{}_{}fully{}Unsup{}tagsU{}tagsK{}pointsU{}pointsK{}boxesEC{}boxesU.json'.format(root_dir, data_set, random_seed, round(100*fully_labeled), round(100*Unsup), round(100*tagsU), round(100*tagsK), round(100*pointsU), round(100*pointsK), round(100*boxesEC), round(100*boxesU)) 104 | ## save to json 105 | with open(output_file_label, 'w') as f: 106 | print('writing to 
json output:', output_file_label) 107 | json.dump(sample_data, f, sort_keys=True) 108 | 109 | # next deal with unlabel (weakly label) data 110 | unsampled_ids = list(set(img_ids) - set(sample_ids)) 111 | unsampled_ids = sorted(unsampled_ids) 112 | 113 | splitting = {} 114 | imgs_all = [] 115 | anns_all = [] 116 | if According_to_num: 117 | splitting['tagsU'] = 2500 118 | splitting['tagsK'] = 1250 119 | splitting['points'] = 2255 120 | splitting['Unsup'] = 5417 121 | splitting['pointsOnly'] = 1111 122 | splitting['boxesN'] = 1111 123 | splitting['boxes'] = 1111 124 | else: 125 | splitting['tagsU'] = round(num_imgs * tagsU) 126 | splitting['tagsK'] = round(num_imgs * tagsK) 127 | splitting['pointsU'] = round(num_imgs * pointsU) 128 | splitting['pointsK'] = round(num_imgs * pointsK) 129 | splitting['Unsup'] = round(num_imgs * Unsup) 130 | splitting['boxesEC'] = round(num_imgs * boxesEC) 131 | splitting['boxesU'] = round(num_imgs * boxesU) 132 | 133 | for key, value in splitting.items(): 134 | histogram = np.zeros((num_classes,), dtype=np.int) 135 | num_samples = value 136 | if num_samples > len(unsampled_ids): # 137 | num_samples = len(unsampled_ids) 138 | 139 | sample_ids = random.sample(unsampled_ids, num_samples) 140 | sample_ids = sorted(sample_ids) 141 | imgs = coco_api.loadImgs(sample_ids) 142 | 143 | # add indicator 144 | for i_img in imgs: 145 | i_img['indicator'] = 0 146 | i_img['label_type'] = key 147 | 148 | dataset_anns_u = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 149 | anns = [ann for img_anns in dataset_anns_u for ann in img_anns] 150 | 151 | ith = 0 152 | for i_ann in anns: 153 | if len(i_ann['bbox']) > 0: 154 | boxes = i_ann['bbox'] 155 | boxes = np.array(boxes) 156 | x0 = int(boxes[0]) 157 | y0 = int(boxes[1]) 158 | x1 = int(boxes[0]) + int(boxes[2]) 159 | y1 = int(boxes[1]) + int(boxes[3]) 160 | point_x = random.randint(x0, x1) 161 | point_y = random.randint(y0, y1) 162 | i_ann['point'] = [float(point_x), float(point_y)] # why get points for all sampled images? are points used? 
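# note (added): the 'point' field is presumably filled for every weak split only so all annotations share the same schema; downstream, it is likely consumed by the points-based label types alone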
163 | else: 164 | i_ann['point'] = [] 165 | print(i_ann['bbox']) 166 | ith = ith + 1 167 | if ith % 1000 == 0: 168 | print(ith) 169 | 170 | if key == 'boxesEC': 171 | # -----the corresponding between delta and mIoU, I got this by emperical experiments 172 | delta = [25, 9, 3, 1, 0.5, 0.3, 0.15, 0.06, 0.03, 0.01] 173 | # mIoU = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05] 174 | 175 | IoU = np.load('IoU_extreme.npy') 176 | 177 | # compute the distribution, bin is 0.1 178 | IoU_list = list(IoU) 179 | distribution_extreme = [] 180 | for i in np.arange(0.1, 1.1, 0.1): 181 | bin_i = [x for x in IoU_list if x >= i - 0.1 and x < i] 182 | distribution_extreme.append(len(bin_i) / len(IoU_list)) 183 | distribution_extreme = distribution_extreme[::-1] 184 | 185 | random.shuffle(anns) 186 | ith_bin = 0 187 | cur_delta = delta[ith_bin] 188 | skip_to_next = distribution_extreme[ith_bin] * len(anns) 189 | ith = 0 190 | for i_ann in anns: 191 | if ith > skip_to_next: 192 | ith_bin += 1 193 | cur_delta = delta[ith_bin] 194 | skip_to_next = skip_to_next + distribution_extreme[ith_bin] * len(anns) 195 | print('to next') 196 | if 'iscrowd' not in i_ann or i_ann['iscrowd'] == 0: 197 | box_i = i_ann['bbox'] 198 | boxes = np.array(box_i) 199 | boxes[2:] += boxes[:2] 200 | x0 = boxes[0] 201 | y0 = boxes[1] 202 | x1 = boxes[2] 203 | y1 = boxes[3] 204 | 205 | # add noise to each of the two nodes 206 | cx, cy, w, h = (x0 + x1) / 2, (y0 + y1) / 2, (x1 - x0), (y1 - y0) 207 | mean = (y0, x0) 208 | cov = [[h / cur_delta, 0], [0, w / cur_delta]] 209 | sampled_point_i = np.random.multivariate_normal(mean, cov, 1) 210 | x0_new, y0_new = sampled_point_i[0, 1], sampled_point_i[0, 0] 211 | 212 | mean = (y1, x1) 213 | sampled_point_i = np.random.multivariate_normal(mean, cov, 1) 214 | x1_new, y1_new = sampled_point_i[0, 1], sampled_point_i[0, 0] 215 | 216 | x0 = min(x0_new, x1_new) 217 | x1 = max(x0_new, x1_new) 218 | y0 = min(y0_new, y1_new) 219 | y1 = max(y0_new, y1_new) 220 | w = x1 - x0 221 | h = y1 - y0 222 | 223 | x0 = float(x0) 224 | y0 = float(y0) 225 | w = float(w) 226 | h = float(h) 227 | i_ann['bbox'] = [x0, y0, w, h] 228 | ith = ith + 1 229 | if ith % 10000 == 0: 230 | print(ith) 231 | 232 | # sampled statistics 233 | classes = [x["category_id"] for x in anns] 234 | for i in classes: 235 | index = class_index_list.index(int(i)) 236 | histogram[index] = histogram[index] + 1 237 | class_ratios = histogram / np.sum(histogram) 238 | print("sampled class ratios: {}".format(class_ratios)) 239 | print("each class has at least one example ", np.min(histogram) > 0) 240 | 241 | imgs_all.extend(imgs) 242 | anns_all.extend(anns) 243 | 244 | # update the rest sample pool 245 | unsampled_ids = list(set(unsampled_ids) - set(sample_ids)) 246 | unsampled_ids = sorted(unsampled_ids) 247 | 248 | unsample_data = {} 249 | unsample_data['images'] = imgs_all 250 | unsample_data['annotations'] = anns_all 251 | unsample_data['categories'] = list(coco_api.cats.values()) 252 | 253 | output_file_unlabel = '{}{}_crowdhuman_omni_unlabel_seed{}_{}fully{}Unsup{}tagsU{}tagsK{}pointsU{}pointsK{}boxesEC{}boxesU.json'.format(root_dir, data_set, random_seed, round(100*fully_labeled), round(100*Unsup), round(100*tagsU), round(100*tagsK), round(100*pointsU), round(100*pointsK), round(100*boxesEC), round(100*boxesU)) 254 | ## save to json 255 | with open(output_file_unlabel, 'w') as f: 256 | print('writing to json output:', output_file_unlabel) 257 | json.dump(unsample_data, f, sort_keys=True) 258 | 259 | 260 | if __name__ == '__main__': 
261 | main() 262 | print("finished!") 263 | -------------------------------------------------------------------------------- /scripts/split_dataset_voc_omni.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: Apache-2.0 3 | 4 | import json 5 | import numpy as np 6 | import random 7 | from pycocotools.coco import COCO 8 | 9 | def main(): 10 | 11 | root_dir = '../voc/VOCdevkit/VOC20072012trainval/' 12 | data_set = 'instances_VOC_trainval20072012' 13 | json_file = root_dir + '{}.json'.format(data_set) 14 | coco_api = COCO(json_file) 15 | random_seed = 1709 16 | 17 | # assign the ratio, the sum should be 1 18 | fully_labeled = 0.2 19 | Unsup = 0.19 20 | tagsU = 0.0 21 | tagsK = 0.0 22 | pointsU = 0.0 23 | pointsK = 0.0 24 | boxesEC = 0.61 25 | boxesU = 0.0 26 | assert sum([fully_labeled, Unsup, tagsU, tagsK, pointsU, pointsK, boxesEC, boxesU]) == 1.0 27 | 28 | # we first sample the fully label data 29 | # original statistics 30 | num_classes = len(coco_api.cats) 31 | class_index_list = [] 32 | category = coco_api.cats 33 | for key, value in category.items(): 34 | class_index_list.append(key) 35 | histogram = np.zeros((num_classes,), dtype=np.int) 36 | 37 | img_ids = sorted(coco_api.imgs.keys()) 38 | num_imgs = len(img_ids) 39 | 40 | random.seed(random_seed) 41 | 42 | According_to_num = False # if it is False, we split by percentage 43 | if According_to_num: 44 | num_samples = 578 45 | else: 46 | ratio = fully_labeled 47 | num_samples = round(num_imgs * ratio) 48 | 49 | sample_ids = random.sample(img_ids, num_samples) 50 | sample_ids = sorted(sample_ids) 51 | imgs = coco_api.loadImgs(sample_ids) 52 | 53 | # add indicator 54 | for i_img in imgs: 55 | i_img['indicator'] = 1 56 | i_img['label_type'] = 'fully' 57 | 58 | dataset_anns = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 59 | anns = [ann for img_anns in dataset_anns for ann in img_anns] 60 | 61 | ith = 0 62 | for i_ann in anns: 63 | if len(i_ann['bbox']) > 0: 64 | boxes = i_ann['bbox'] 65 | boxes = np.array(boxes) 66 | x0 = int(boxes[0]) 67 | y0 = int(boxes[1]) 68 | x1 = int(boxes[0]) + int(boxes[2]) 69 | y1 = int(boxes[1]) + int(boxes[3]) 70 | point_x = random.randint(x0, x1) 71 | point_y = random.randint(y0, y1) 72 | i_ann['point'] = [float(point_x), float(point_y)] 73 | else: 74 | i_ann['point'] = [] 75 | print(i_ann['bbox']) 76 | ith = ith + 1 77 | if ith % 1000 == 0: 78 | print(ith) 79 | 80 | 81 | # sampled statistics 82 | classes = [x["category_id"] for x in anns] 83 | for i in classes: 84 | index = class_index_list.index(int(i)) 85 | histogram[index] = histogram[index] + 1 86 | class_ratios = histogram / np.sum(histogram) 87 | print("sampled class ratios: {}".format(class_ratios)) 88 | print("each class has at least one example ", np.min(histogram) > 0) 89 | 90 | sample_data = {} 91 | sample_data['images'] = imgs 92 | sample_data['annotations'] = anns 93 | sample_data['categories'] = list(coco_api.cats.values()) 94 | 95 | output_file_label = '{}{}_voc_omni_label_seed{}_{}fully{}Unsup{}tagsU{}tagsK{}pointsU{}pointsK{}boxesEC{}boxesU.json'.format(root_dir, data_set, random_seed, round(100*fully_labeled), round(100*Unsup), round(100*tagsU), round(100*tagsK), round(100*pointsU), round(100*pointsK), round(100*boxesEC), round(100*boxesU)) 96 | ## save to json 97 | with open(output_file_label, 'w') as f: 98 | print('writing to json output:', output_file_label) 99 | json.dump(sample_data, f, 
sort_keys=True) 100 | 101 | # next deal with unlabel (weakly label) data 102 | unsampled_ids = list(set(img_ids) - set(sample_ids)) 103 | unsampled_ids = sorted(unsampled_ids) 104 | 105 | splitting = {} 106 | imgs_all = [] 107 | anns_all = [] 108 | if According_to_num: 109 | splitting['tagsU'] = 2500 110 | splitting['tagsK'] = 1250 111 | splitting['points'] = 2255 112 | splitting['Unsup'] = 5417 113 | splitting['pointsOnly'] = 1111 114 | splitting['boxesN'] = 1111 115 | splitting['boxes'] = 1111 116 | else: 117 | splitting['tagsU'] = round(num_imgs * tagsU) 118 | splitting['tagsK'] = round(num_imgs * tagsK) 119 | splitting['pointsU'] = round(num_imgs * pointsU) 120 | splitting['pointsK'] = round(num_imgs * pointsK) 121 | splitting['Unsup'] = round(num_imgs * Unsup) 122 | splitting['boxesEC'] = round(num_imgs * boxesEC) 123 | splitting['boxesU'] = round(num_imgs * boxesU) 124 | 125 | for key, value in splitting.items(): 126 | histogram = np.zeros((num_classes,), dtype=np.int) 127 | num_samples = value 128 | if num_samples > len(unsampled_ids): # 129 | num_samples = len(unsampled_ids) 130 | 131 | sample_ids = random.sample(unsampled_ids, num_samples) 132 | sample_ids = sorted(sample_ids) 133 | imgs = coco_api.loadImgs(sample_ids) 134 | 135 | # add indicator 136 | for i_img in imgs: 137 | i_img['indicator'] = 0 138 | i_img['label_type'] = key 139 | 140 | dataset_anns_u = [coco_api.imgToAnns[img_id] for img_id in sample_ids] 141 | anns = [ann for img_anns in dataset_anns_u for ann in img_anns] 142 | 143 | ith = 0 144 | for i_ann in anns: 145 | if len(i_ann['bbox']) > 0: 146 | boxes = i_ann['bbox'] 147 | boxes = np.array(boxes) 148 | x0 = int(boxes[0]) 149 | y0 = int(boxes[1]) 150 | x1 = int(boxes[0]) + int(boxes[2]) 151 | y1 = int(boxes[1]) + int(boxes[3]) 152 | point_x = random.randint(x0, x1) 153 | point_y = random.randint(y0, y1) 154 | i_ann['point'] = [float(point_x), float(point_y)] 155 | else: 156 | i_ann['point'] = [] 157 | print(i_ann['bbox']) 158 | ith = ith + 1 159 | if ith % 1000 == 0: 160 | print(ith) 161 | 162 | if key == 'boxesEC': 163 | # -----the corresponding between delta and mIoU, I got this by emperical experiments 164 | delta = [25, 9, 3, 1, 0.5, 0.3, 0.15, 0.06, 0.03, 0.01] 165 | # mIoU = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05] 166 | 167 | IoU = np.load('IoU_extreme.npy') 168 | 169 | # compute the distribution, bin is 0.1 170 | IoU_list = list(IoU) 171 | distribution_extreme = [] 172 | for i in np.arange(0.1, 1.1, 0.1): 173 | bin_i = [x for x in IoU_list if x >= i - 0.1 and x < i] 174 | distribution_extreme.append(len(bin_i) / len(IoU_list)) 175 | distribution_extreme = distribution_extreme[::-1] 176 | 177 | random.shuffle(anns) 178 | ith_bin = 0 179 | cur_delta = delta[ith_bin] 180 | skip_to_next = distribution_extreme[ith_bin] * len(anns) 181 | ith = 0 182 | for i_ann in anns: 183 | if ith > skip_to_next: 184 | ith_bin += 1 185 | cur_delta = delta[ith_bin] 186 | skip_to_next = skip_to_next + distribution_extreme[ith_bin] * len(anns) 187 | print('to next') 188 | 189 | if 'iscrowd' not in i_ann or i_ann['iscrowd'] == 0: 190 | box_i = i_ann['bbox'] 191 | boxes = np.array(box_i) 192 | boxes[2:] += boxes[:2] 193 | x0 = boxes[0] 194 | y0 = boxes[1] 195 | x1 = boxes[2] 196 | y1 = boxes[3] 197 | 198 | # add noise to each of the two nodes 199 | cx, cy, w, h = (x0 + x1) / 2, (y0 + y1) / 2, (x1 - x0), (y1 - y0) 200 | mean = (y0, x0) 201 | cov = [[h / cur_delta, 0], [0, w / cur_delta]] 202 | sampled_point_i = np.random.multivariate_normal(mean, cov, 1) 203 | 
x0_new, y0_new = sampled_point_i[0, 1], sampled_point_i[0, 0] 204 | 205 | mean = (y1, x1) 206 | sampled_point_i = np.random.multivariate_normal(mean, cov, 1) 207 | x1_new, y1_new = sampled_point_i[0, 1], sampled_point_i[0, 0] 208 | 209 | x0 = min(x0_new, x1_new) 210 | x1 = max(x0_new, x1_new) 211 | y0 = min(y0_new, y1_new) 212 | y1 = max(y0_new, y1_new) 213 | w = x1 - x0 214 | h = y1 - y0 215 | 216 | 217 | x0 = float(x0) 218 | y0 = float(y0) 219 | w = float(w) 220 | h = float(h) 221 | i_ann['bbox'] = [x0, y0, w, h] 222 | ith = ith + 1 223 | if ith % 10000 == 0: 224 | print(ith) 225 | 226 | # sampled statistics 227 | classes = [x["category_id"] for x in anns] 228 | for i in classes: 229 | index = class_index_list.index(int(i)) 230 | histogram[index] = histogram[index] + 1 231 | class_ratios = histogram / np.sum(histogram) 232 | print("sampled class ratios: {}".format(class_ratios)) 233 | print("each class has at least one example ", np.min(histogram) > 0) 234 | 235 | imgs_all.extend(imgs) 236 | anns_all.extend(anns) 237 | 238 | # update the rest sample pool 239 | unsampled_ids = list(set(unsampled_ids) - set(sample_ids)) 240 | unsampled_ids = sorted(unsampled_ids) 241 | 242 | unsample_data = {} 243 | unsample_data['images'] = imgs_all 244 | unsample_data['annotations'] = anns_all 245 | unsample_data['categories'] = list(coco_api.cats.values()) 246 | 247 | output_file_unlabel = '{}{}_voc_omni_unlabel_seed{}_{}fully{}Unsup{}tagsU{}tagsK{}pointsU{}pointsK{}boxesEC{}boxesU.json'.format(root_dir, data_set, random_seed, round(100*fully_labeled), round(100*Unsup), round(100*tagsU), round(100*tagsK), round(100*pointsU), round(100*pointsK), round(100*boxesEC), round(100*boxesU)) 248 | ## save to json 249 | with open(output_file_unlabel, 'w') as f: 250 | print('writing to json output:', output_file_unlabel) 251 | json.dump(unsample_data, f, sort_keys=True) 252 | 253 | 254 | if __name__ == '__main__': 255 | main() 256 | print("finished!") 257 | -------------------------------------------------------------------------------- /tools/launch.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------------------------------------------------- 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # -------------------------------------------------------------------------------------------------------------------------- 6 | # Modified from https://github.com/pytorch/pytorch/blob/173f224570017b4b1a3a1a13d0bff280a54d9cd9/torch/distributed/launch.py 7 | # -------------------------------------------------------------------------------------------------------------------------- 8 | 9 | r""" 10 | `torch.distributed.launch` is a module that spawns up multiple distributed 11 | training processes on each of the training nodes. 12 | The utility can be used for single-node distributed training, in which one or 13 | more processes per node will be spawned. The utility can be used for either 14 | CPU training or GPU training. If the utility is used for GPU training, 15 | each distributed process will be operating on a single GPU. This can achieve 16 | well-improved single-node training performance. It can also be used in 17 | multi-node distributed training, by spawning up multiple processes on each node 18 | for well-improved multi-node distributed training performance as well. 
19 | This will especially be beneficial for systems with multiple Infiniband
20 | interfaces that have direct-GPU support, since all of them can be utilized for
21 | aggregated communication bandwidth.
22 | In both cases of single-node distributed training or multi-node distributed
23 | training, this utility will launch the given number of processes per node
24 | (``--nproc_per_node``). If used for GPU training, this number needs to be less
25 | than or equal to the number of GPUs on the current system (``nproc_per_node``),
26 | and each process will be operating on a single GPU from *GPU 0 to
27 | GPU (nproc_per_node - 1)*.
28 | **How to use this module:**
29 | 1. Single-Node multi-process distributed training
30 | ::
31 | >>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE
32 | YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other
33 | arguments of your training script)
34 | 2. Multi-Node multi-process distributed training: (e.g. two nodes)
35 | Node 1: *(IP: 192.168.1.1, and has a free port: 1234)*
36 | ::
37 | >>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE
38 | --nnodes=2 --node_rank=0 --master_addr="192.168.1.1"
39 | --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
40 | and all other arguments of your training script)
41 | Node 2:
42 | ::
43 | >>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE
44 | --nnodes=2 --node_rank=1 --master_addr="192.168.1.1"
45 | --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
46 | and all other arguments of your training script)
47 | 3. To look up what optional arguments this module offers:
48 | ::
49 | >>> python -m torch.distributed.launch --help
50 | **Important Notices:**
51 | 1. This utility and multi-process distributed (single-node or
52 | multi-node) GPU training currently only achieves the best performance using
53 | the NCCL distributed backend. Thus NCCL backend is the recommended backend to
54 | use for GPU training.
55 | 2. In your training program, you must parse the command-line argument:
56 | ``--local_rank=LOCAL_PROCESS_RANK``, which will be provided by this module.
57 | If your training program uses GPUs, you should ensure that your code only
58 | runs on the GPU device of LOCAL_PROCESS_RANK. This can be done by:
59 | Parsing the local_rank argument
60 | ::
61 | >>> import argparse
62 | >>> parser = argparse.ArgumentParser()
63 | >>> parser.add_argument("--local_rank", type=int)
64 | >>> args = parser.parse_args()
65 | Set your device to local rank using either
66 | ::
67 | >>> torch.cuda.set_device(args.local_rank) # before your code runs
68 | or
69 | ::
70 | >>> with torch.cuda.device(args.local_rank):
71 | >>> # your code to run
72 | 3. In your training program, you are supposed to call the following function
73 | at the beginning to start the distributed backend. You need to make sure that
74 | the init_method uses ``env://``, which is the only supported ``init_method``
75 | by this module.
76 | ::
77 | torch.distributed.init_process_group(backend='YOUR BACKEND',
78 | init_method='env://')
79 | 4. In your training program, you can either use regular distributed functions
80 | or use :func:`torch.nn.parallel.DistributedDataParallel` module. If your
81 | training program uses GPUs for training and you would like to use
82 | :func:`torch.nn.parallel.DistributedDataParallel` module,
83 | here is how to configure it.
84 | ::
85 | model = torch.nn.parallel.DistributedDataParallel(model,
86 | device_ids=[args.local_rank],
87 | output_device=args.local_rank)
88 | Please ensure that ``device_ids`` argument is set to be the only GPU device id
89 | that your code will be operating on. This is generally the local rank of the
90 | process. In other words, the ``device_ids`` needs to be ``[args.local_rank]``,
91 | and ``output_device`` needs to be ``args.local_rank`` in order to use this
92 | utility.
93 | 5. Another way to pass ``local_rank`` to the subprocesses is via the environment variable
94 | ``LOCAL_RANK``. This behavior is enabled when you launch the script with
95 | ``--use_env=True``. You must adjust the subprocess example above to replace
96 | ``args.local_rank`` with ``os.environ['LOCAL_RANK']``; the launcher
97 | will not pass ``--local_rank`` when you specify this flag.
98 | .. warning::
99 | ``local_rank`` is NOT globally unique: it is only unique per process
100 | on a machine. Thus, don't use it to decide if you should, e.g.,
101 | write to a networked filesystem. See
102 | https://github.com/pytorch/pytorch/issues/12042 for an example of
103 | how things can go wrong if you don't do this correctly.
104 | """
105 | 
106 | 
107 | import sys
108 | import subprocess
109 | import os
110 | import socket
111 | from argparse import ArgumentParser, REMAINDER
112 | 
113 | import torch
114 | 
115 | 
116 | def parse_args():
117 | """
118 | Helper function parsing the command line options
119 | @retval ArgumentParser
120 | """
121 | parser = ArgumentParser(description="PyTorch distributed training launch "
122 | "helper utility that will spawn up "
123 | "multiple distributed processes")
124 | 
125 | # Optional arguments for the launch helper
126 | parser.add_argument("--nnodes", type=int, default=1,
127 | help="The number of nodes to use for distributed "
128 | "training")
129 | parser.add_argument("--node_rank", type=int, default=0,
130 | help="The rank of the node for multi-node distributed "
131 | "training")
132 | parser.add_argument("--nproc_per_node", type=int, default=1,
133 | help="The number of processes to launch on each node, "
134 | "for GPU training, this is recommended to be set "
135 | "to the number of GPUs in your system so that "
136 | "each process can be bound to a single GPU.")
137 | parser.add_argument("--master_addr", default="127.0.0.1", type=str,
138 | help="Master node (rank 0)'s address, should be either "
139 | "the IP address or the hostname of node 0, for "
140 | "single node multi-proc training, the "
141 | "--master_addr can simply be 127.0.0.1")
142 | parser.add_argument("--master_port", default=29500, type=int,
143 | help="Master node (rank 0)'s free port that needs to "
144 | "be used for communication during distributed "
145 | "training")
146 | 
147 | # positional
148 | parser.add_argument("training_script", type=str,
149 | help="The full path to the single GPU training "
150 | "program/script to be launched in parallel, "
151 | "followed by all the arguments for the "
152 | "training script")
153 | 
154 | # rest from the training program
155 | parser.add_argument('training_script_args', nargs=REMAINDER)
156 | return parser.parse_args()
157 | 
158 | 
159 | def main():
160 | args = parse_args()
161 | 
162 | # world size in terms of number of processes
163 | dist_world_size = args.nproc_per_node * args.nnodes
164 | 
165 | # set PyTorch distributed related environmental variables
166 | current_env = os.environ.copy()
167 | current_env["MASTER_ADDR"] = args.master_addr
168 | 
current_env["MASTER_PORT"] = str(args.master_port) 169 | current_env["WORLD_SIZE"] = str(dist_world_size) 170 | 171 | processes = [] 172 | 173 | for local_rank in range(0, args.nproc_per_node): 174 | # each process's rank 175 | dist_rank = args.nproc_per_node * args.node_rank + local_rank 176 | current_env["RANK"] = str(dist_rank) 177 | current_env["LOCAL_RANK"] = str(local_rank) 178 | 179 | cmd = [args.training_script] + args.training_script_args 180 | 181 | process = subprocess.Popen(cmd, env=current_env) 182 | processes.append(process) 183 | 184 | for process in processes: 185 | process.wait() 186 | if process.returncode != 0: 187 | raise subprocess.CalledProcessError(returncode=process.returncode, 188 | cmd=process.args) 189 | 190 | 191 | if __name__ == "__main__": 192 | main() -------------------------------------------------------------------------------- /tools/run_dist_launch.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | # ------------------------------------------------------------------------ 3 | # Deformable DETR 4 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 5 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 6 | # ------------------------------------------------------------------------ 7 | 8 | set -x 9 | 10 | GPUS=$1 11 | RUN_COMMAND=${@:2} 12 | if [ $GPUS -lt 8 ]; then 13 | GPUS_PER_NODE=${GPUS_PER_NODE:-$GPUS} 14 | else 15 | GPUS_PER_NODE=${GPUS_PER_NODE:-8} 16 | fi 17 | MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} 18 | MASTER_PORT=${MASTER_PORT:-"29500"} 19 | NODE_RANK=${NODE_RANK:-0} 20 | 21 | let "NNODES=GPUS/GPUS_PER_NODE" 22 | 23 | python ./tools/launch.py \ 24 | --nnodes ${NNODES} \ 25 | --node_rank ${NODE_RANK} \ 26 | --master_addr ${MASTER_ADDR} \ 27 | --master_port ${MASTER_PORT} \ 28 | --nproc_per_node ${GPUS_PER_NODE} \ 29 | ${RUN_COMMAND} -------------------------------------------------------------------------------- /tools/run_dist_slurm.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | # -------------------------------------------------------------------------------------------------------------------------- 3 | # Deformable DETR 4 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 
5 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 6 | # -------------------------------------------------------------------------------------------------------------------------- 7 | # Modified from https://github.com/open-mmlab/mmdetection/blob/3b53fe15d87860c6941f3dda63c0f27422da6266/tools/slurm_train.sh 8 | # -------------------------------------------------------------------------------------------------------------------------- 9 | 10 | set -x 11 | 12 | PARTITION=$1 13 | JOB_NAME=$2 14 | GPUS=$3 15 | RUN_COMMAND=${@:4} 16 | if [ $GPUS -lt 8 ]; then 17 | GPUS_PER_NODE=${GPUS_PER_NODE:-$GPUS} 18 | else 19 | GPUS_PER_NODE=${GPUS_PER_NODE:-8} 20 | fi 21 | CPUS_PER_TASK=${CPUS_PER_TASK:-4} 22 | SRUN_ARGS=${SRUN_ARGS:-""} 23 | 24 | srun -p ${PARTITION} \ 25 | --job-name=${JOB_NAME} \ 26 | --gres=gpu:${GPUS_PER_NODE} \ 27 | --ntasks=${GPUS} \ 28 | --ntasks-per-node=${GPUS_PER_NODE} \ 29 | --cpus-per-task=${CPUS_PER_TASK} \ 30 | --kill-on-bad-exit=1 \ 31 | ${SRUN_ARGS} \ 32 | ${RUN_COMMAND} 33 | 34 | -------------------------------------------------------------------------------- /util/__init__.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Modified from Deformable DETR (https://github.com/fundamentalvision/Deformable-DETR) 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache-2.0 License. 5 | # ------------------------------------------------------------------------ 6 | # Modified from DETR (https://github.com/facebookresearch/detr) 7 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 8 | # Licensed under the Apache-2.0 License. 9 | # ------------------------------------------------------------------------ 10 | # Modifications Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 11 | # SPDX-License-Identifier: Apache-2.0 12 | -------------------------------------------------------------------------------- /util/box_ops.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------ 6 | # Modified from DETR (https://github.com/facebookresearch/detr) 7 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 8 | # ------------------------------------------------------------------------ 9 | 10 | """ 11 | Utilities for bounding box manipulation and GIoU. 
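Following https://giou.stanford.edu/ (and the generalized_box_iou implementation below), GIoU(A, B) = IoU(A, B) - |C \ (A U B)| / |C|, where C is the smallest axis-aligned box enclosing both A and B.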
12 | """ 13 | import torch 14 | from torchvision.ops.boxes import box_area 15 | 16 | 17 | def box_cxcywh_to_xyxy(x): 18 | x_c, y_c, w, h = x.unbind(-1) 19 | b = [(x_c - 0.5 * w), (y_c - 0.5 * h), 20 | (x_c + 0.5 * w), (y_c + 0.5 * h)] 21 | return torch.stack(b, dim=-1) 22 | 23 | 24 | def box_xyxy_to_cxcywh(x): 25 | x0, y0, x1, y1 = x.unbind(-1) 26 | b = [(x0 + x1) / 2, (y0 + y1) / 2, 27 | (x1 - x0), (y1 - y0)] 28 | return torch.stack(b, dim=-1) 29 | 30 | 31 | # modified from torchvision to also return the union 32 | def box_iou(boxes1, boxes2): 33 | area1 = box_area(boxes1) 34 | area2 = box_area(boxes2) 35 | 36 | lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # [N,M,2] 37 | rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # [N,M,2] 38 | 39 | wh = (rb - lt).clamp(min=0) # [N,M,2] 40 | inter = wh[:, :, 0] * wh[:, :, 1] # [N,M] 41 | 42 | union = area1[:, None] + area2 - inter 43 | 44 | iou = inter / union 45 | return iou, union 46 | 47 | 48 | def generalized_box_iou(boxes1, boxes2): 49 | """ 50 | Generalized IoU from https://giou.stanford.edu/ 51 | 52 | The boxes should be in [x0, y0, x1, y1] format 53 | 54 | Returns a [N, M] pairwise matrix, where N = len(boxes1) 55 | and M = len(boxes2) 56 | """ 57 | # degenerate boxes gives inf / nan results 58 | # so do an early check 59 | assert (boxes1[:, 2:] >= boxes1[:, :2]).all() 60 | assert (boxes2[:, 2:] >= boxes2[:, :2]).all() 61 | iou, union = box_iou(boxes1, boxes2) 62 | 63 | lt = torch.min(boxes1[:, None, :2], boxes2[:, :2]) 64 | rb = torch.max(boxes1[:, None, 2:], boxes2[:, 2:]) 65 | 66 | wh = (rb - lt).clamp(min=0) # [N,M,2] 67 | area = wh[:, :, 0] * wh[:, :, 1] 68 | 69 | return iou - (area - union) / area 70 | 71 | 72 | def masks_to_boxes(masks): 73 | """Compute the bounding boxes around the provided masks 74 | 75 | The masks should be in format [N, H, W] where N is the number of masks, (H, W) are the spatial dimensions. 76 | 77 | Returns a [N, 4] tensors, with the boxes in xyxy format 78 | """ 79 | if masks.numel() == 0: 80 | return torch.zeros((0, 4), device=masks.device) 81 | 82 | h, w = masks.shape[-2:] 83 | 84 | y = torch.arange(0, h, dtype=torch.float) 85 | x = torch.arange(0, w, dtype=torch.float) 86 | y, x = torch.meshgrid(y, x) 87 | 88 | x_mask = (masks * x.unsqueeze(0)) 89 | x_max = x_mask.flatten(1).max(-1)[0] 90 | x_min = x_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0] 91 | 92 | y_mask = (masks * y.unsqueeze(0)) 93 | y_max = y_mask.flatten(1).max(-1)[0] 94 | y_min = y_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0] 95 | 96 | return torch.stack([x_min, y_min, x_max, y_max], 1) 97 | -------------------------------------------------------------------------------- /util/plot_utils.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Deformable DETR 3 | # Copyright (c) 2020 SenseTime. All Rights Reserved. 4 | # Licensed under the Apache License, Version 2.0 [see LICENSE for details] 5 | # ------------------------------------------------------------------------ 6 | # Modified from DETR (https://github.com/facebookresearch/detr) 7 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 8 | # ------------------------------------------------------------------------ 9 | 10 | """ 11 | Plotting utilities to visualize training logs. 
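The helpers below expect the JSON-lines ``log.txt`` files written during training (see plot_logs) and torch-saved COCO evaluation results containing 'precision', 'recall' and 'scores' arrays (see plot_precision_recall).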
12 | """
13 | import torch
14 | import numpy as np
15 | import pandas as pd
16 | import seaborn as sns
17 | import matplotlib.pyplot as plt
18 | 
19 | from pathlib import Path, PurePath
20 | 
21 | def plot_logs(logs, fields=('class_error', 'loss_bbox_unscaled', 'mAP'), ewm_col=0, log_name='log.txt'):
22 | '''
23 | Function to plot specific fields from training log(s). Plots both training and test results.
24 | 
25 | :: Inputs - logs = list containing Path objects, each pointing to individual dir with a log file
26 | - fields = which results to plot from each log file - plots both training and test for each field.
27 | - ewm_col = optional, which column to use as the exponential weighted smoothing of the plots
28 | - log_name = optional, name of log file if different than default 'log.txt'.
29 | 
30 | :: Outputs - matplotlib plots of results in fields, color coded for each log file.
31 | - solid lines are training results, dashed lines are test results.
32 | 
33 | '''
34 | func_name = "plot_utils.py::plot_logs"
35 | 
36 | # verify logs is a list of Paths (list[Path]) or a single pathlib Path object;
37 | # convert a single Path to a list to avoid a 'not iterable' error
38 | 
39 | if not isinstance(logs, list):
40 | if isinstance(logs, PurePath):
41 | logs = [logs]
42 | print(f"{func_name} info: logs param expects a list argument, converted to list[Path].")
43 | else:
44 | raise ValueError(f"{func_name} - invalid argument for logs parameter.\n \
45 | Expect list[Path] or single Path obj, received {type(logs)}")
46 | 
47 | # verify valid dir(s) and that every item in list is Path object
48 | for i, dir in enumerate(logs):
49 | if not isinstance(dir, PurePath):
50 | raise ValueError(f"{func_name} - non-Path object in logs argument of {type(dir)}: \n{dir}")
51 | if dir.exists():
52 | continue
53 | raise ValueError(f"{func_name} - invalid directory in logs argument:\n{dir}")
54 | 
55 | # load log file(s) and plot
56 | dfs = [pd.read_json(Path(p) / log_name, lines=True) for p in logs]
57 | 
58 | fig, axs = plt.subplots(ncols=len(fields), figsize=(16, 5))
59 | 
60 | for df, color in zip(dfs, sns.color_palette(n_colors=len(logs))):
61 | for j, field in enumerate(fields):
62 | if field == 'mAP':
63 | coco_eval = pd.DataFrame(np.stack(df.test_coco_eval.dropna().values)[:, 1]).ewm(com=ewm_col).mean()
64 | axs[j].plot(coco_eval, c=color)
65 | else:
66 | df.interpolate().ewm(com=ewm_col).mean().plot(
67 | y=[f'train_{field}', f'test_{field}'],
68 | ax=axs[j],
69 | color=[color] * 2,
70 | style=['-', '--']
71 | )
72 | for ax, field in zip(axs, fields):
73 | ax.legend([Path(p).name for p in logs])
74 | ax.set_title(field)
75 | 
76 | 
77 | def plot_precision_recall(files, naming_scheme='iter'):
78 | if naming_scheme == 'exp_id':
79 | # name becomes exp_id
80 | names = [f.parts[-3] for f in files]
81 | elif naming_scheme == 'iter':
82 | names = [f.stem for f in files]
83 | else:
84 | raise ValueError(f'not supported {naming_scheme}')
85 | fig, axs = plt.subplots(ncols=2, figsize=(16, 5))
86 | for f, color, name in zip(files, sns.color_palette("Blues", n_colors=len(files)), names):
87 | data = torch.load(f)
88 | # precision is n_iou, n_points, n_cat, n_area, max_det
89 | precision = data['precision']
90 | recall = data['params'].recThrs
91 | scores = data['scores']
92 | # take precision for all classes, all areas and 100 detections
93 | precision = precision[0, :, :, 0, -1].mean(1)
94 | scores = scores[0, :, :, 0, -1].mean(1)
95 | prec = precision.mean()
96 | rec = data['recall'][0, :, 0, -1].mean()
97 | print(f'{naming_scheme} {name}: 
mAP@50={prec * 100: 05.1f}, ' + 98 | f'score={scores.mean():0.3f}, ' + 99 | f'f1={2 * prec * rec / (prec + rec + 1e-8):0.3f}' 100 | ) 101 | axs[0].plot(recall, precision, c=color) 102 | axs[1].plot(recall, scores, c=color) 103 | 104 | axs[0].set_title('Precision / Recall') 105 | axs[0].legend(names) 106 | axs[1].set_title('Scores / Recall') 107 | axs[1].legend(names) 108 | return fig, axs 109 | 110 | 111 | 112 | --------------------------------------------------------------------------------