├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── egoobjects_api ├── __init__.py ├── egoobjects.py ├── eval.py └── results.py ├── example.py └── images ├── ICCV2023_poster_EgoObjects.jpg ├── intro.png ├── intro_teaser.png ├── logo.png ├── sample_images.png └── taxonomy.png /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as 6 | contributors and maintainers pledge to make participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, sex characteristics, gender identity and expression, 9 | level of experience, education, socio-economic status, nationality, personal 10 | appearance, race, religion, or sexual identity and orientation. 11 | 12 | ## Our Standards 13 | 14 | Examples of behavior that contributes to creating a positive environment 15 | include: 16 | 17 | * Using welcoming and inclusive language 18 | * Being respectful of differing viewpoints and experiences 19 | * Gracefully accepting constructive criticism 20 | * Focusing on what is best for the community 21 | * Showing empathy towards other community members 22 | 23 | Examples of unacceptable behavior by participants include: 24 | 25 | * The use of sexualized language or imagery and unwelcome sexual attention or 26 | advances 27 | * Trolling, insulting/derogatory comments, and personal or political attacks 28 | * Public or private harassment 29 | * Publishing others' private information, such as a physical or electronic 30 | address, without explicit permission 31 | * Other conduct which could reasonably be considered inappropriate in a 32 | professional setting 33 | 34 | ## Our Responsibilities 35 | 36 | Project maintainers are responsible for clarifying the standards of acceptable 37 | behavior and are expected to take appropriate and fair corrective action in 38 | response to any instances of unacceptable behavior. 39 | 40 | Project maintainers have the right and responsibility to remove, edit, or 41 | reject comments, commits, code, wiki edits, issues, and other contributions 42 | that are not aligned to this Code of Conduct, or to ban temporarily or 43 | permanently any contributor for other behaviors that they deem inappropriate, 44 | threatening, offensive, or harmful. 45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies within all project spaces, and it also applies when 49 | an individual is representing the project or its community in public spaces. 50 | Examples of representing a project or community include using an official 51 | project e-mail address, posting via an official social media account, or acting 52 | as an appointed representative at an online or offline event. Representation of 53 | a project may be further defined and clarified by project maintainers. 54 | 55 | This Code of Conduct also applies outside the project spaces when there is a 56 | reasonable belief that an individual's behavior may have a negative impact on 57 | the project or its community. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported by contacting the project team at . All 63 | complaints will be reviewed and investigated and will result in a response that 64 | is deemed necessary and appropriate to the circumstances. 
The project team is 65 | obligated to maintain confidentiality with regard to the reporter of an incident. 66 | Further details of specific enforcement policies may be posted separately. 67 | 68 | Project maintainers who do not follow or enforce the Code of Conduct in good 69 | faith may face temporary or permanent repercussions as determined by other 70 | members of the project's leadership. 71 | 72 | ## Attribution 73 | 74 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, 75 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html 76 | 77 | [homepage]: https://www.contributor-covenant.org 78 | 79 | For answers to common questions about this code of conduct, see 80 | https://www.contributor-covenant.org/faq -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to EgoObjects 2 | We want to make contributing to this project as easy and transparent as 3 | possible. 4 | 5 | ## Our Development Process 6 | Minor changes and improvements will be released on an ongoing basis. Larger changes (e.g., changesets implementing a new paper) will be released on a more periodic basis. 7 | 8 | ## Pull Requests 9 | We actively welcome your pull requests. 10 | 11 | 1. Fork the repo and create your branch from `main`. 12 | 2. If you've added code that should be tested, add tests. 13 | 3. If you've changed APIs, update the documentation. 14 | 4. Ensure the test suite passes. 15 | 5. Make sure your code lints. 16 | 6. If you haven't already, complete the Contributor License Agreement ("CLA"). 17 | 18 | ## Contributor License Agreement ("CLA") 19 | In order to accept your pull request, we need you to submit a CLA. You only need 20 | to do this once to work on any of Facebook's open source projects. 21 | 22 | Complete your CLA here: 23 | 24 | ## Issues 25 | We use GitHub issues to track public bugs. Please ensure your description is 26 | clear and has sufficient instructions to be able to reproduce the issue. 27 | 28 | Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe 29 | disclosure of security bugs. In those cases, please go through the process 30 | outlined on that page and do not file a public issue. 31 | 32 | ## Coding Style 33 | * 4 spaces for indentation rather than tabs 34 | * 80 character line length 35 | * PEP8 formatting following [Black](https://black.readthedocs.io/en/stable/) 36 | 37 | ## License 38 | By contributing to EgoObjects, you agree that your contributions will be licensed 39 | under the LICENSE file in the root directory of this source tree. -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) Meta Platforms, Inc. and affiliates. 
4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

2 | 3 |

4 | 5 | # EgoObjects 6 | 7 | Official PyTorch implementation of the ICCV'23 paper 8 | 9 | **[EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding](https://arxiv.org/abs/2309.08816)** 10 | 11 | [Chenchen Zhu](https://sites.google.com/andrew.cmu.edu/zcckernel), [Fanyi Xiao](https://fanyix.cs.ucdavis.edu/), [Andres Alvarado](https://www.linkedin.com/in/josecarlos12/), [Yasmine Babaei](https://www.linkedin.com/in/yasminebabaei/), [Jiabo Hu](https://www.linkedin.com/in/jiabo-hu-1321b1121/), [Hichem El-Mohri](https://www.linkedin.com/in/hichem-elmohri/), [Sean Chang Culatana](https://ai.meta.com/people/sean-chang-culatana/), [Roshan Sumbaly](https://www.linkedin.com/in/rsumbaly/), [Zhicheng Yan](https://sites.google.com/view/zhicheng-yan) 12 | 13 | **Meta AI** 14 | 15 | 16 | 17 | EgoObjects is a large-scale egocentric dataset for fine-grained object understanding, which features videos captured by various wearable devices at worldwide locations, objects from a diverse set of categories commonly seen in indoor environments, and videos of the same object instance captured under diverse conditions. The dataset supports both the conventional category-level as well as the novel instance-level object detection task. 18 | 19 |

20 | 21 |

22 | 23 |

24 | 25 |

26 | 27 | 28 | ## EgoObjects v1.0 29 | 30 | For this release, we have annotated 114K frames (79K train, 5.7K val, 29.5K test) sampled from 9K+ videos collected by 250 participants across the world. A total of 14.4K unique object instances from 368 categories are annotated. Among them, there are 1.3K main object instances from 206 categories and 13.1K secondary object instances (i.e., objects accompanying the main object) from 353 categories. On average, each image is annotated with 5.6 instances from 4.8 categories, and each object instance appears in 44.8 images, which ensures diverse viewing directions for the object. 31 | 32 | ## Dataset downloading 33 | 34 | Release v1.0 is publicly available from this [page](https://ai.meta.com/datasets/egoobjects-downloads). Images (~40 GB) can be downloaded from the file `EgoObjectsV1_images.zip`. Unified annotations for category- and instance-level object detection can be downloaded from files including `EgoObjectsV1_unified_train.json`, `EgoObjectsV1_unified_eval.json`, and `EgoObjectsV1_unified_metadata.json`. They can be placed under `$EgoObjects_ROOT/data/`. We follow the same data format as [LVIS](https://www.lvisdataset.org/dataset), with EgoObjects-specific changes. A minimal annotation-loading sketch is shown below, after the poster section. 35 | 36 | ## Setup 37 | 38 | ### Requirements 39 | - Linux with Python ≥ 3.8 40 | - PyTorch ≥ 1.8. 41 | Install them together at [pytorch.org](https://pytorch.org) to make sure of this. Note: please check that the 42 | installed PyTorch version matches the one required by Detectron2. 43 | - Detectron2: follow [Detectron2 installation instructions](https://detectron2.readthedocs.io/tutorials/install.html). 44 | 45 | ### Example conda environment setup 46 | ```bash 47 | conda create --name egoobjects python=3.9 48 | conda activate egoobjects 49 | conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia 50 | python -m pip install 'git+https://github.com/facebookresearch/detectron2.git' 51 | 52 | # under your working directory 53 | git clone https://github.com/facebookresearch/EgoObjects.git 54 | cd EgoObjects 55 | ``` 56 | 57 | If set up correctly, run our evaluation example code to get mock results for the category- and instance-level detection tasks: 58 | ```bash 59 | python example.py 60 | ``` 61 | 62 | ## Timeline 63 | - '23 Sep 6, EgoObjects v1.0, including both data and annotations, is open sourced. 64 | - '23 March, EgoObjects v1.0 is covered by Meta AI. 65 | - '22 June, an earlier version of the EgoObjects dataset is adopted by the Continual Learning Challenge in the 3rd CLVISION Workshop at CVPR. 66 | - '22 March, EgoObjects is first introduced by the Meta AI Blog. 67 | 68 | ## EgoObjects ICCV'23 Poster 69 | 70 |

71 | 72 |

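## Loading the annotations

After downloading, the unified annotation files can be sanity-checked directly. Below is a minimal, illustrative sketch; it assumes the LVIS-style top-level keys (`images`, `annotations`) consumed by `egoobjects_api` and the `$EgoObjects_ROOT/data/` layout described above.

```python
import json
from collections import Counter

# assumes the annotation files were placed under $EgoObjects_ROOT/data/
with open("data/EgoObjectsV1_unified_train.json", "r") as f:
    data = json.load(f)

print(f"num images: {len(data['images'])}")
print(f"num annotations: {len(data['annotations'])}")

# Each annotation may carry "category_id" (category-level task) and/or
# "instance_id" (instance-level task); count each combination.
kinds = Counter(
    ("category_id" in ann, "instance_id" in ann) for ann in data["annotations"]
)
print(kinds)
```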
73 | 74 | ## Citing EgoObjects 75 | Paper link: [[`arXiv`](https://arxiv.org/abs/2309.08816)] 76 | 77 | If you find this code/data useful in your research, please cite our paper: 78 | ``` 79 | @inproceedings{zhu2023egoobjects, 80 | title={EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding}, 81 | author={Zhu, Chenchen and Xiao, Fanyi and Alvarado, Andrés and Babaei, Yasmine and Hu, Jiabo and El-Mohri, Hichem and Chang, Sean and Sumbaly, Roshan and Yan, Zhicheng}, 82 | booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, 83 | year={2023} 84 | } 85 | ``` 86 | 87 | ## Credit 88 | The code is a rewrite of the Python API for [LVIS](https://github.com/lvis-dataset/lvis-api). 89 | The core functionality is the same, with EgoObjects-specific changes. 90 | 91 | ## License 92 | EgoObjects is licensed under the [MIT License](LICENSE). 93 | -------------------------------------------------------------------------------- /egoobjects_api/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Meta Platforms, Inc. and affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. -------------------------------------------------------------------------------- /egoobjects_api/egoobjects.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright (c) Meta Platforms, Inc. and affiliates. 3 | # All rights reserved. 4 | 5 | # This source code is licensed under the license found in the 6 | # LICENSE file in the root directory of this source tree. 7 | 8 | import json 9 | import logging 10 | from collections import defaultdict 11 | from copy import deepcopy 12 | from typing import List, Optional, Dict, Any 13 | 14 | from detectron2.data.catalog import Metadata 15 | 16 | logger = logging.getLogger(__name__) 17 | 18 | 19 | FILTER_OPTS = { 20 | # instance detection 21 | "egoobjects_unified_det_train": {}, 22 | "egoobjects_unified_det_test_query": { 23 | "subset": "test", 24 | }, 25 | "egoobjects_unified_det_val_query": { 26 | "subset": "val", 27 | }, 28 | # category detection 29 | "egoobjects_cat_det_train": { 30 | "remove_non_category": True, 31 | }, 32 | "egoobjects_cat_det_test": { 33 | "subset": "test", 34 | "remove_non_category": True, 35 | }, 36 | "egoobjects_cat_det_val": { 37 | "subset": "val", 38 | "remove_non_category": True, 39 | }, 40 | # instance detection with seen/unseen category 41 | "egoobjects_instdet_train": {}, 42 | "egoobjects_instdet_test_query": { 43 | "subset": "test", 44 | }, 45 | "egoobjects_instdet_val_query": { 46 | "subset": "val", 47 | }, 48 | } 49 | 50 | 51 | def filter_annot( 52 | data, 53 | metadata, 54 | filter_opts, 55 | ): 56 | if filter_opts is None: 57 | filter_opts = {} 58 | 59 | valid_image_set = set([x["id"] for x in data["images"]]) 60 | # filter according to easy/hard splits 61 | if "difficulty" in filter_opts and filter_opts["difficulty"]: 62 | selected_image_set = set( 63 | [ 64 | x["id"] 65 | for x in data["images"] 66 | if x["difficulty"] == filter_opts["difficulty"] 67 | ] 68 | ) 69 | valid_image_set = valid_image_set & selected_image_set 70 | 71 | # filter according to minival splits 72 | if "subset" in filter_opts and filter_opts["subset"]: 73 | selected_image_set = set( 74 | [x["id"] for x in data["images"] if x["subset"] == filter_opts["subset"]] 75 | ) 76 | 
valid_image_set = valid_image_set & selected_image_set 77 | 78 | # filter out annotations/images without category_id field 79 | if "remove_non_category" in filter_opts and filter_opts["remove_non_category"]: 80 | if isinstance(metadata, Dict): 81 | cat_det_cats = metadata["cat_det_cats"] 82 | else: 83 | cat_det_cats = metadata.cat_det_cats 84 | kept_annot = [] 85 | kept_image_id = set() 86 | kept_cat_ids = set([x["id"] for x in cat_det_cats]) 87 | for anno in data["annotations"]: 88 | if ( 89 | "category_id" in anno 90 | and anno["category_id"] in kept_cat_ids 91 | and anno["image_id"] in valid_image_set 92 | ): 93 | kept_annot.append(anno) 94 | kept_image_id.add(anno["image_id"]) 95 | data["annotations"] = kept_annot 96 | data["images"] = [x for x in data["images"] if x["id"] in kept_image_id] 97 | else: 98 | kept_annot = [] 99 | kept_image_id = set() 100 | for anno in data["annotations"]: 101 | if anno["image_id"] in valid_image_set: 102 | kept_annot.append(anno) 103 | kept_image_id.add(anno["image_id"]) 104 | data["annotations"] = kept_annot 105 | data["images"] = [x for x in data["images"] if x["id"] in kept_image_id] 106 | 107 | return data 108 | 109 | 110 | class EgoObjects: 111 | def __init__( 112 | self, 113 | annotation_path: str, 114 | metadata: Metadata, 115 | filter_opts: Any = None, 116 | ): 117 | """ 118 | Args: 119 | annotation_path: location of annotation file 120 | """ 121 | logger.info(f"annotation_path {annotation_path}") 122 | self.metadata = metadata 123 | 124 | with open(annotation_path, "r") as f: 125 | self.dataset = json.load(f) 126 | 127 | # is_valid_video_id = self._valid_video_ids() 128 | # if not is_valid_video_id: 129 | # self._replace_video_ids() 130 | 131 | # filter the dataset accordingly 132 | self.dataset = filter_annot( 133 | self.dataset, 134 | metadata, 135 | filter_opts, 136 | ) 137 | 138 | assert ( 139 | type(self.dataset) == dict 140 | ), f"Annotation file format {type(self.dataset)} not supported." 141 | 142 | self.annotations = { 143 | "cat_det": [ 144 | deepcopy(ann) 145 | for ann in self.dataset["annotations"] 146 | if "category_id" in ann 147 | ], 148 | "inst_det": [ 149 | deepcopy(ann) 150 | for ann in self.dataset["annotations"] 151 | if "instance_id" in ann 152 | ], 153 | } 154 | 155 | logger.info(f"num cat det instances {len(self.annotations['cat_det'])}") 156 | logger.info(f"num inst det instances {len(self.annotations['inst_det'])}") 157 | 158 | self._create_index(metadata) 159 | 160 | def _valid_video_ids(self): 161 | """dummy check on whether video ids lie in existing video_id """ 162 | video_ids_in_setting = {"01", "02", "03", "04", "05", "06", "07", "08", "09", "10"} 163 | video_ids_in_dataset = set([img['video_id'] for img in self.dataset['images']]) 164 | return video_ids_in_dataset.issubset(video_ids_in_setting) 165 | 166 | def _replace_video_ids(self): 167 | """ 168 | To align the `video_id` for *ego-object dataset towards existing dataset. 
169 | Rules: 170 | {'1', '2', '3'} maps into {'01', '02', '03'}, 171 | while others {'V1', 'V2', 'V26', 'V28', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8'} map into {'04'} 172 | """ 173 | star_to_existing_vid_mapping = {'1': '01', '2': '02', '3': '03'} # hard-coded replacement for easy videos 174 | mapping_key_set = set(star_to_existing_vid_mapping.keys()) 175 | for img in self.dataset['images']: 176 | if img['video_id'] in mapping_key_set: 177 | img.update({'video_id': star_to_existing_vid_mapping[img['video_id']]}) 178 | else: 179 | img.update({'video_id':'04'}) # hard-coded replacement for complex videos 180 | 181 | logger.info("VideoID in *ego-object updated to match existing dataset.") 182 | return 183 | 184 | def _prepare_neg_instance_ids(self): 185 | img_id_2_instance_id = { 186 | img_id: {self.anns["inst_det"][ann_id]["instance_id"] for ann_id in ann_ids} 187 | for img_id, ann_ids in self.img_ann_map["inst_det"].items() 188 | } 189 | for img_id, img in self.imgs.items(): 190 | if img_id in img_id_2_instance_id: 191 | img["neg_instance_ids"] = self.instance_ids.difference( 192 | img_id_2_instance_id[img_id] 193 | ) 194 | else: 195 | img["neg_instance_ids"] = self.instance_ids 196 | 197 | def _prepare_neg_cat_ids(self): 198 | img_id_2_category_id = { 199 | img_id: {self.anns["cat_det"][ann_id]["category_id"] for ann_id in ann_ids} 200 | for img_id, ann_ids in self.img_ann_map["cat_det"].items() 201 | } 202 | for img_id, img in self.imgs.items(): 203 | if "neg_category_ids" in img: 204 | continue 205 | elif img_id in img_id_2_category_id: 206 | img["neg_category_ids"] = self.category_ids.difference( 207 | img_id_2_category_id[img_id] 208 | ) 209 | else: 210 | img["neg_category_ids"] = self.category_ids 211 | 212 | def _create_index(self, metadata): 213 | logger.info("Creating index.") 214 | 215 | self.cat_id_2_cat = {c["id"]: c for c in metadata.categories} 216 | self.imgs = {img["id"]: img for img in self.dataset["images"]} 217 | self.anns = defaultdict(dict) 218 | self.img_ann_map = {"cat_det": defaultdict(list), "inst_det": defaultdict(list)} 219 | for det_type, anns in self.annotations.items(): 220 | for ann in anns: 221 | self.anns[det_type][ann["id"]] = ann 222 | self.img_ann_map[det_type][ann["image_id"]].append(ann["id"]) 223 | 224 | logger.info( 225 | f"{det_type}, len img_ann_map {len(self.img_ann_map[det_type])}" 226 | ) 227 | 228 | # self.category_ids = {x["id"] for x in metadata.cat_det_cats} 229 | self.category_ids = {ann["category_id"] for ann in self.annotations["cat_det"]} 230 | self.instance_ids = {ann["instance_id"] for ann in self.annotations["inst_det"]} 231 | 232 | self._prepare_neg_instance_ids() 233 | self._prepare_neg_cat_ids() 234 | 235 | self.cats = { 236 | "cat_det": {c["id"]: c for c in metadata.cat_det_cats}, 237 | "inst_det": {c["id"]: c for c in metadata.inst_det_cats}, 238 | } 239 | 240 | self.classes = { 241 | "cat_det": {c["id"]: c for c in metadata.cat_det_cats}, 242 | "inst_det": {}, 243 | } 244 | for _i, ann in enumerate(self.annotations["inst_det"]): 245 | inst_id = ann["instance_id"] 246 | cat_id = ann["category_id"] if "category_id" in ann else None 247 | 248 | if inst_id not in self.classes["inst_det"]: 249 | inst_dict = {"id": inst_id} 250 | if cat_id is not None: 251 | if "frequency" in self.cat_id_2_cat[cat_id]: 252 | frequency = self.cat_id_2_cat[cat_id]["frequency"] 253 | else: 254 | # assign all sample to frequent group if not specified 255 | frequency = "frequent" 256 | inst_dict.update( 257 | { 258 | "category_id": cat_id, 259 | "frequency": 
frequency, 260 | } 261 | ) 262 | 263 | self.classes["inst_det"][inst_id] = inst_dict 264 | 265 | else: 266 | if cat_id is not None: 267 | assert self.classes["inst_det"][inst_id]["category_id"] == cat_id 268 | 269 | logger.info(f"num total images: {len(self.imgs)}") 270 | for det_type in ["cat_det", "inst_det"]: 271 | logger.info(f"num images for {det_type}: {len(self.img_ann_map[det_type])}") 272 | logger.info(f"num annotations for {det_type} {len(self.anns[det_type])}") 273 | logger.info( 274 | f"num object categories of {det_type} {len(self.cats[det_type])}" 275 | ) 276 | logger.info(f"num classes of {det_type} {len(self.classes[det_type])}") 277 | 278 | logger.info("Index created.") 279 | 280 | def get_img_ids(self): 281 | """Get all img ids. 282 | 283 | Returns: 284 | ids (int array): integer array of image ids 285 | """ 286 | return list(self.imgs.keys()) 287 | 288 | def get_class_ids(self, det_type: str): 289 | """Get all class ids for the given detection type. 290 | Args: 291 | det_type: detection type. Choices {"cat_det", "inst_det"} 292 | Returns: 293 | ids: integer array of class ids 294 | """ 295 | return list(self.classes[det_type].keys()) 296 | 297 | def get_ann_ids( 298 | self, 299 | det_type: str, 300 | img_ids: Optional[List[int]] = None, 301 | class_ids: Optional[List[int]] = None, 302 | ) -> List[int]: 303 | """Get ann ids that satisfy given filter conditions. 304 | 305 | Args: 306 | det_type: detection type. Choices {"cat_det", "inst_det"} 307 | img_ids: get anns for given imgs 308 | class_ids: get anns for given class ids, which are category ids for "cat_det" 309 | and instance ids for "inst_det" 310 | Returns: 311 | ids: integer array of ann ids 312 | """ 313 | assert det_type in self.img_ann_map 314 | anns = [] 315 | if img_ids is not None: 316 | for img_id in img_ids: 317 | if img_id in self.img_ann_map[det_type]: 318 | anns.extend( 319 | [ 320 | self.anns[det_type][ann_id] 321 | for ann_id in self.img_ann_map[det_type][img_id] 322 | ] 323 | ) 324 | else: 325 | anns = self.annotations[det_type] 326 | # return early if no more filtering required 327 | if class_ids is None: 328 | return [ann["id"] for ann in anns] 329 | 330 | class_ids = set(class_ids) 331 | 332 | ann_ids = [ 333 | _ann["id"] 334 | for _ann in anns 335 | if _ann["category_id" if det_type == "cat_det" else "instance_id"] 336 | in class_ids 337 | ] 338 | 339 | return ann_ids 340 | 341 | def _load_helper(self, _dict, ids): 342 | if ids is None: 343 | return list(_dict.values()) 344 | else: 345 | return [_dict[id] for id in ids] 346 | 347 | def load_anns(self, det_type: str, ids: Optional[List[int]] = None): 348 | """Load anns with the specified ids. If ids=None load all anns. 349 | 350 | Args: 351 | det_type: detection type. Choices {"cat_det", "inst_det"} 352 | ids: integer array of annotation ids 353 | 354 | Returns: 355 | anns: loaded annotation objects 356 | """ 357 | return self._load_helper(self.anns[det_type], ids) 358 | 359 | def load_classes(self, det_type: str, ids: Optional[List[int]] = None): 360 | """Load classes with the specified ids. 361 | If ids=None load all classes. 362 | 363 | Args: 364 | det_type: detection type. Choices {"cat_det", "inst_det"} 365 | ids: integer array of class ids 366 | 367 | Returns: 368 | classes: loaded class dicts 369 | """ 370 | return self._load_helper(self.classes[det_type], ids) 371 | 372 | def load_imgs(self, ids: Optional[List[int]] = None): 373 | """Load images with the specified ids. If ids=None load all images. 
374 | 375 | Args: 376 | ids: integer array of image ids 377 | 378 | Returns: 379 | imgs: loaded image dicts 380 | """ 381 | return self._load_helper(self.imgs, ids) 382 | 383 | 384 | class EgoObjectsMetaInfo: 385 | def __init__(self): 386 | self.video_id_to_setting = { 387 | "01": { 388 | "distance": "near", 389 | "camera motion": "horizontal", 390 | "background": "simple", 391 | "lighting": "bright", 392 | }, 393 | "02": { 394 | "distance": "medium", 395 | "camera motion": "horizontal", 396 | "background": "simple", 397 | "lighting": "bright", 398 | }, 399 | "03": { 400 | "distance": "near", 401 | "camera motion": "horizontal", 402 | "background": "simple", 403 | "lighting": "dim", 404 | }, 405 | "04": { 406 | "distance": "medium", 407 | "camera motion": "horizontal", 408 | "background": "busy", 409 | "lighting": "bright", 410 | }, 411 | "05": { 412 | "distance": "far", 413 | "camera motion": "horizontal", 414 | "background": "busy", 415 | "lighting": "bright", 416 | }, 417 | "06": { 418 | "distance": "medium", 419 | "camera motion": "vertical", 420 | "background": "busy", 421 | "lighting": "bright", 422 | }, 423 | "07": { 424 | "distance": "medium", 425 | "camera motion": "diagonal", 426 | "background": "busy", 427 | "lighting": "bright", 428 | }, 429 | "08": { 430 | "distance": "near", 431 | "camera motion": "horizontal", 432 | "background": "busy", 433 | "lighting": "dim", 434 | }, 435 | "09": { 436 | "distance": "medium", 437 | "camera motion": "horizontal", 438 | "background": "busy", 439 | "lighting": "dim", 440 | }, 441 | "10": { 442 | "distance": "far", 443 | "camera motion": "horizontal", 444 | "background": "busy", 445 | "lighting": "dim", 446 | }, 447 | } 448 | 449 | self.background = ["all", "simple", "busy"] 450 | self.lighting = ["all", "bright", "dim"] 451 | -------------------------------------------------------------------------------- /egoobjects_api/eval.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright (c) Meta Platforms, Inc. and affiliates. 3 | # All rights reserved. 4 | 5 | # This source code is licensed under the license found in the 6 | # LICENSE file in the root directory of this source tree. 7 | 8 | import datetime 9 | import logging 10 | import math 11 | 12 | import os 13 | import time 14 | from collections import defaultdict, OrderedDict 15 | from multiprocessing import Pool 16 | from typing import Dict, List, Optional, Set, Tuple 17 | 18 | import numpy as np 19 | import pycocotools.mask as mask_utils 20 | import torch 21 | # from egodet.metric.metric import RecallAtPrecision 22 | 23 | # from iopath.common.file_io import PathManager 24 | # from iopath.fb.manifold import ManifoldPathHandler 25 | 26 | from .egoobjects import EgoObjects, EgoObjectsMetaInfo 27 | 28 | from .results import EgoObjectsResults 29 | 30 | 31 | # pathmgr = PathManager() 32 | # pathmgr.register_handler(ManifoldPathHandler()) 33 | 34 | logger = logging.getLogger(__name__) 35 | 36 | 37 | def evaluate_img( 38 | det_type, 39 | img_id, 40 | class_id, 41 | area_ratio, 42 | background, 43 | lighting, 44 | difficulty, 45 | gt, 46 | dt, 47 | ious, 48 | params, 49 | img_nel, 50 | ): 51 | """Perform evaluation for a single class and image.""" 52 | if len(gt) == 0 and len(dt) == 0: 53 | return None 54 | 55 | # Add another field _ignore to only consider anns satisfying the constraints. 
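    # A gt ann is marked _ignore when it falls outside the slice being
    # evaluated (area_ratio / background / lighting / difficulty). Ignored
    # gt anns are sorted last below, and detections matched to them are
    # excluded from both the TP and FP counts, so each metric is computed
    # purely over its own slice of the data.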
56 | for g in gt: 57 | ignore = g["ignore"] 58 | if ignore == 0 and (area_ratio != "all" and area_ratio != g["area_ratio"]): 59 | ignore = 1 60 | if ignore == 0 and (background != "all" and background != g["background"]): 61 | ignore = 1 62 | if ignore == 0 and (lighting != "all" and lighting != g["lighting"]): 63 | ignore = 1 64 | if ignore == 0 and ( 65 | difficulty != "all" and "difficulty" in g and difficulty != g["difficulty"] 66 | ): 67 | ignore = 1 68 | g["_ignore"] = ignore 69 | 70 | # Sort gt ignore last 71 | gt_idx = np.argsort([g["_ignore"] for g in gt], kind="mergesort") 72 | gt = [gt[i] for i in gt_idx] 73 | # Sort dt highest score first 74 | dt_idx = np.argsort([-d["score"] for d in dt], kind="mergesort") 75 | dt = [dt[i] for i in dt_idx] 76 | # load computed ious 77 | ious = ious[:, gt_idx] if len(ious) > 0 else ious 78 | 79 | num_thrs = len(params.iou_thrs) 80 | num_gt = len(gt) 81 | num_dt = len(dt) 82 | # Array to store the "id" of the matched dt/gt 83 | gt_m = np.zeros((num_thrs, num_gt)) 84 | dt_m = np.zeros((num_thrs, num_dt)) 85 | 86 | gt_ig = np.array([g["_ignore"] for g in gt]) 87 | dt_ig = np.zeros((num_thrs, num_dt)) 88 | 89 | for iou_thr_idx, iou_thr in enumerate(params.iou_thrs): 90 | if len(ious) == 0: 91 | break 92 | 93 | for dt_idx, _dt in enumerate(dt): 94 | iou = min([iou_thr, 1 - 1e-10]) 95 | # information about best match so far (m=-1 -> unmatched) 96 | # store the gt_idx which matched for _dt 97 | m = -1 98 | for gt_idx, _ in enumerate(gt): 99 | # if this gt already matched continue 100 | if gt_m[iou_thr_idx, gt_idx] > 0: 101 | continue 102 | # if _dt matched to reg gt, and on ignore gt, stop 103 | if m > -1 and gt_ig[m] == 0 and gt_ig[gt_idx] == 1: 104 | break 105 | # continue to next gt unless better match made 106 | if ious[dt_idx, gt_idx] < iou: 107 | continue 108 | # if match successful and best so far, store appropriately 109 | iou = ious[dt_idx, gt_idx] 110 | m = gt_idx 111 | 112 | # No match found for _dt, go to next _dt 113 | if m == -1: 114 | continue 115 | 116 | # if gt to ignore for some reason update dt_ig. 117 | # Should not be used in evaluation. 118 | dt_ig[iou_thr_idx, dt_idx] = gt_ig[m] 119 | # _dt match found, update gt_m, and dt_m with "id" 120 | dt_m[iou_thr_idx, dt_idx] = gt[m]["id"] 121 | gt_m[iou_thr_idx, m] = _dt["id"] 122 | 123 | # We will ignore any unmatched detection if that category was 124 | # not exhaustively annotated in gt. 
125 | class_id_key = "category_id" if det_type == "cat_det" else "instance_id" 126 | # dt_ig_mask = [ 127 | # d["area"] < area_rng[0] 128 | # or d["area"] > area_rng[1] 129 | # or d[class_id_key] in img_nel[d["image_id"]] 130 | # for d in dt 131 | # ] 132 | dt_ig_mask = [ 133 | d[class_id_key] in img_nel[d["image_id"]] 134 | or (area_ratio != "all" and d["area_ratio"] != area_ratio) 135 | or (background != "all" and d["background"] != background) 136 | or (lighting != "all" and d["lighting"] != lighting) 137 | or (difficulty != "all" and "difficulty" in d and d["difficulty"] != difficulty) 138 | for d in dt 139 | ] 140 | dt_ig_mask = np.array(dt_ig_mask).reshape((1, num_dt)) # 1 X num_dt 141 | dt_ig_mask = np.repeat(dt_ig_mask, num_thrs, 0) # num_thrs X num_dt 142 | # Based on dt_ig_mask ignore any unmatched detection by updating dt_ig 143 | dt_ig = np.logical_or(dt_ig, np.logical_and(dt_m == 0, dt_ig_mask)) 144 | 145 | # store results for given image and category 146 | return { 147 | "dt_ids": [d["id"] for d in dt], 148 | "dt_matches": dt_m, 149 | "dt_scores": [d["score"] for d in dt], 150 | "gt_ignore": gt_ig, 151 | "dt_ignore": dt_ig, 152 | "config": (img_id, class_id, area_ratio, background, lighting, difficulty), 153 | } 154 | 155 | 156 | class EgoObjectsEval: 157 | def __init__( 158 | self, 159 | gt: EgoObjects, 160 | dt: EgoObjectsResults, 161 | num_processes: int = 1, 162 | max_dets: int = 100, 163 | eval_type: Tuple[str] = ("cat_det", "inst_det"), 164 | ): 165 | """ 166 | Args: 167 | gt: ground truth 168 | dt: detection results 169 | num_processes: If 0, use single main process. If >0, use multiprocessing.Pool to 170 | do evaluate() using threads 171 | max_dets: maximal detections per image 172 | """ 173 | assert num_processes >= 0 174 | 175 | self.gt = gt 176 | self.dt = dt 177 | self.num_processes = num_processes 178 | self.eval_type = eval_type 179 | 180 | self.eval_imgs = {} 181 | self.eval = {} 182 | self.gts = {} 183 | self.dts = {} 184 | self.results = {} 185 | self.ious = {} 186 | self.params = { 187 | "cat_det": CategoryDetectionParams(max_dets=max_dets), 188 | "inst_det": InstanceDetectionParams(max_dets=max_dets), 189 | } 190 | self.meta = EgoObjectsMetaInfo() 191 | self.freq_groups = {} 192 | self.img_nel = {} 193 | for det_type in self.eval_type: 194 | # per-image per-category evaluation results 195 | self.eval_imgs[det_type] = None 196 | self.eval[det_type] = {} # accumulated evaluation results 197 | self.gts[det_type] = defaultdict(list) # gt for evaluation 198 | self.dts[det_type] = defaultdict(list) # dt for evaluation 199 | self.results[det_type] = OrderedDict() 200 | self.ious[det_type] = {} # ious between all gts and dts 201 | 202 | self.params[det_type].img_ids = sorted(self.gt.get_img_ids()) 203 | self.params[det_type].class_ids = sorted(self.gt.get_class_ids(det_type)) 204 | 205 | logger.info( 206 | f"{det_type}, num class ids {len(self.params[det_type].class_ids)}" 207 | ) 208 | 209 | def run(self, metric_filter): 210 | unique_metrics = {} 211 | for det_type in self.eval_type: 212 | unique_metrics = set( 213 | [ 214 | f"ar{x['ar']}-bg{x['bg']}-lt{x['lt']}-df{x['df']}" 215 | for x in metric_filter[det_type] 216 | ] 217 | ) 218 | self.evaluate(det_type, unique_metrics) 219 | self.accumulate(det_type, unique_metrics) 220 | self.summarize(det_type, metric_filter[det_type]) 221 | 222 | def evaluate( 223 | self, det_type: str, unique_metrics: Set[str], multiprocessing: bool = False 224 | ): 225 | logger.info(f"Running per image evaluation for {det_type}.") 226 | 
227 | start_time = time.time() 228 | 229 | class_ids = self.params[det_type].class_ids 230 | self._prepare(det_type) 231 | 232 | self.ious[det_type] = { 233 | (img_id, class_id): self.compute_iou(det_type, img_id, class_id) 234 | for img_id in self.params[det_type].img_ids 235 | for class_id in class_ids 236 | } 237 | 238 | logger.info(f"num_processes {self.num_processes}") 239 | 240 | # loop through images, area range, max detection number 241 | arg_tuples = [] 242 | for class_id in class_ids: 243 | for area_ratio in self.params[det_type].area_rng_lbl: 244 | for bg in self.meta.background: 245 | for lt in self.meta.lighting: 246 | for df in self.params[det_type].difficulty: 247 | for img_id in self.params[det_type].img_ids: 248 | metric_tag = f"ar{area_ratio}-bg{bg}-lt{lt}-df{df}" 249 | if metric_tag in unique_metrics: 250 | arg_tuples.append( 251 | ( 252 | det_type, 253 | img_id, 254 | class_id, 255 | area_ratio, 256 | bg, 257 | lt, 258 | df, 259 | self.gts[det_type][img_id, class_id], 260 | self.dts[det_type][img_id, class_id], 261 | self.ious[det_type][img_id, class_id], 262 | self.params[det_type], 263 | self.img_nel[det_type], 264 | ) 265 | ) 266 | 267 | if self.num_processes > 1: 268 | with Pool(self.num_processes) as pool: 269 | self.eval_imgs[det_type] = pool.starmap(evaluate_img, arg_tuples) 270 | else: 271 | self.eval_imgs[det_type] = [evaluate_img(*x) for x in arg_tuples] 272 | 273 | elapsed_time = time.time() - start_time 274 | logger.info(f"Elapsed time of {det_type} evaluate(): {elapsed_time:.2f} sec") 275 | 276 | def accumulate(self, det_type: str, unique_metrics: Set[str]): 277 | """Accumulate per image evaluation results and store the result in 278 | self.eval[det_type]. 279 | """ 280 | logger.info(f"Accumulating evaluation results for {det_type}.") 281 | if self.eval_imgs[det_type] is None: 282 | logger.warning(f"Please run evaluate('{det_type}') first.") 283 | 284 | class_ids = self.params[det_type].class_ids 285 | cls_id_2_idx = {x: i for i, x in enumerate(class_ids)} 286 | 287 | num_thrs = len(self.params[det_type].iou_thrs) 288 | num_recalls = len(self.params[det_type].rec_thrs) 289 | num_classes = len(class_ids) 290 | num_area_rngs = len(self.params[det_type].area_rng) 291 | num_backgrounds = len(self.meta.background) 292 | num_lightings = len(self.meta.lighting) 293 | num_difficulties = len(self.params[det_type].difficulty) 294 | 295 | # -1 for absent classes 296 | precision = -np.ones( 297 | ( 298 | num_thrs, 299 | num_recalls, 300 | num_classes, 301 | num_area_rngs, 302 | num_backgrounds, 303 | num_lightings, 304 | num_difficulties, 305 | ) 306 | ) 307 | recall = -np.ones( 308 | ( 309 | num_thrs, 310 | num_classes, 311 | num_area_rngs, 312 | num_backgrounds, 313 | num_lightings, 314 | num_difficulties, 315 | ) 316 | ) 317 | # recall_at_precision = -np.ones( 318 | # ( 319 | # num_thrs, 320 | # num_classes, 321 | # num_area_rngs, 322 | # num_backgrounds, 323 | # num_lightings, 324 | # num_difficulties, 325 | # ) 326 | # ) 327 | # recall_at_precision_metric = RecallAtPrecision( 328 | # self.params[det_type].recall_at_precision_k 329 | # ) 330 | 331 | # Initialize dt_pointers 332 | dt_pointers = {} 333 | for cls_idx in range(num_classes): 334 | dt_pointers[cls_idx] = {} 335 | for area_idx in range(num_area_rngs): 336 | dt_pointers[cls_idx][area_idx] = {} 337 | for bg_idx in range(num_backgrounds): 338 | dt_pointers[cls_idx][area_idx][bg_idx] = {} 339 | for lt_idx in range(num_lightings): 340 | dt_pointers[cls_idx][area_idx][bg_idx][lt_idx] = {} 341 | for df_idx in 
range(num_difficulties): 342 | dt_pointers[cls_idx][area_idx][bg_idx][lt_idx][df_idx] = {} 343 | 344 | results = defaultdict(list) 345 | for res in self.eval_imgs[det_type]: 346 | if res is not None: 347 | img_id, class_id, area, background, lighting, difficulty = res["config"] 348 | cls_idx = cls_id_2_idx[class_id] 349 | area_idx = self.params[det_type].area_rng_lbl.index(area) 350 | bg_idx = self.meta.background.index(background) 351 | lt_idx = self.meta.lighting.index(lighting) 352 | df_idx = self.params[det_type].difficulty.index(difficulty) 353 | results[(cls_idx, area_idx, bg_idx, lt_idx, df_idx)].append( 354 | (img_id, res) 355 | ) 356 | 357 | for config, E in results.items(): 358 | cls_idx, area_idx, bg_idx, lt_idx, df_idx = config 359 | E = [x[1] for x in E] 360 | 361 | # Append all scores: shape (N,) 362 | dt_scores = np.concatenate([e["dt_scores"] for e in E], axis=0) 363 | dt_ids = np.concatenate([e["dt_ids"] for e in E], axis=0) 364 | 365 | dt_idx = np.argsort(-dt_scores, kind="mergesort") 366 | dt_scores = dt_scores[dt_idx] 367 | dt_ids = dt_ids[dt_idx] 368 | 369 | dt_m = np.concatenate([e["dt_matches"] for e in E], axis=1)[:, dt_idx] 370 | dt_ig = np.concatenate([e["dt_ignore"] for e in E], axis=1)[:, dt_idx] 371 | 372 | gt_ig = np.concatenate([e["gt_ignore"] for e in E]) 373 | # num gt anns to consider 374 | num_gt = np.count_nonzero(gt_ig == 0) 375 | 376 | if num_gt == 0: 377 | continue 378 | 379 | tps = np.logical_and(dt_m, np.logical_not(dt_ig)) 380 | fps = np.logical_and(np.logical_not(dt_m), np.logical_not(dt_ig)) 381 | 382 | tp_sum = np.cumsum(tps, axis=1).astype(dtype=float) 383 | fp_sum = np.cumsum(fps, axis=1).astype(dtype=float) 384 | 385 | dt_pointers[cls_idx][area_idx][bg_idx][lt_idx][df_idx] = { 386 | "dt_ids": dt_ids, 387 | "tps": tps, 388 | "fps": fps, 389 | } 390 | 391 | for iou_thr_idx, (tp, fp) in enumerate(zip(tp_sum, fp_sum)): 392 | tp = np.array(tp) 393 | fp = np.array(fp) 394 | num_tp = len(tp) 395 | rc = tp / num_gt 396 | if num_tp: 397 | recall[iou_thr_idx, cls_idx, area_idx, bg_idx, lt_idx, df_idx] = rc[ 398 | -1 399 | ] 400 | else: 401 | recall[iou_thr_idx, cls_idx, area_idx, bg_idx, lt_idx, df_idx] = 0 402 | 403 | # np.spacing(1) ~= eps 404 | pr = tp / (fp + tp + np.spacing(1)) 405 | pr = pr.tolist() 406 | 407 | # Replace each precision value with the maximum precision 408 | # value to the right of that recall level. This ensures 409 | # that the calculated AP value will be less susceptible 410 | 
411 | for i in range(num_tp - 1, 0, -1): 412 | if pr[i] > pr[i - 1]: 413 | pr[i - 1] = pr[i] 414 | 415 | rec_thrs_insert_idx = np.searchsorted( 416 | rc, self.params[det_type].rec_thrs, side="left" 417 | ) 418 | 419 | pr_at_recall = [0.0] * num_recalls 420 | 421 | # we need to use "try-except" clause because for some high recall threshold, 422 | # the "pr_idx" == len(pr) 423 | try: 424 | for _idx, pr_idx in enumerate(rec_thrs_insert_idx): 425 | pr_at_recall[_idx] = pr[pr_idx] 426 | except BaseException: 427 | pass 428 | precision[ 429 | iou_thr_idx, :, cls_idx, area_idx, bg_idx, lt_idx, df_idx 430 | ] = np.array(pr_at_recall) 431 | # Compute recall_at_precision below 432 | dt_ig_i = dt_ig[iou_thr_idx] 433 | dt_scores_i = dt_scores[np.logical_not(dt_ig_i)] 434 | dt_m_i = dt_m[iou_thr_idx, np.logical_not(dt_ig_i)] 435 | dt_true_i = np.greater(dt_m_i, 0) 436 | 437 | # recall_at_precision_metric.reset_state() 438 | # recall_at_precision_metric.update_state( 439 | # torch.from_numpy(dt_true_i).to(torch.float32), 440 | # torch.from_numpy(dt_scores_i).to(torch.float32), 441 | # num_gt, 442 | # ) 443 | # res = recall_at_precision_metric.result() 444 | # recall_at_precision[ 445 | # iou_thr_idx, cls_idx, area_idx, bg_idx, lt_idx, df_idx 446 | # ] = res 447 | 448 | self.eval[det_type] = { 449 | "params": self.params[det_type], 450 | "counts": [ 451 | num_thrs, 452 | num_recalls, 453 | num_classes, 454 | num_area_rngs, 455 | num_backgrounds, 456 | num_lightings, 457 | ], 458 | "date": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), 459 | "precision": precision, 460 | "recall": recall, 461 | "dt_pointers": dt_pointers, 462 | # "recall_at_precision": recall_at_precision, 463 | } 464 | 465 | def _summarize( 466 | self, 467 | det_type: str, 468 | summary_type: str, 469 | iou_thr: Optional[List[float]] = None, 470 | area_rng: str = "all", 471 | background: str = "all", 472 | lighting: str = "all", 473 | difficulty: str = "all", 474 | group: Optional[str] = None, 475 | topk=None, 476 | ): 477 | group_idx = None 478 | if group is not None and group != "all": 479 | if det_type == "cat_det": 480 | freq_group_idx = self.params[det_type].img_count_lbl.index(group) 481 | group_idx = self.freq_groups[det_type][freq_group_idx] 482 | else: 483 | assert group in {"seen", "unseen"} 484 | if group == "seen": 485 | group_idx = self.inst_det_seen_unseen_cat_groups[0] 486 | else: 487 | group_idx = self.inst_det_seen_unseen_cat_groups[1] 488 | 489 | aidx = np.array( 490 | [ 491 | idx 492 | for idx, _area_rng in enumerate(self.params[det_type].area_rng_lbl) 493 | if _area_rng == area_rng 494 | ] 495 | ) 496 | bidx = np.array( 497 | [ 498 | idx 499 | for idx, _background in enumerate(self.meta.background) 500 | if _background == background 501 | ] 502 | ) 503 | lidx = np.array( 504 | [ 505 | idx 506 | for idx, _lighting in enumerate(self.meta.lighting) 507 | if _lighting == lighting 508 | ] 509 | ) 510 | didx = np.array( 511 | [ 512 | idx 513 | for idx, _difficulty in enumerate(self.params[det_type].difficulty) 514 | if _difficulty == difficulty 515 | ] 516 | ) 517 | 518 | for idx in [aidx, bidx, lidx, didx]: 519 | if idx.size <= 0: 520 | return -1 521 | 522 | tidx = None 523 | if iou_thr is not None: 524 | iou_thr_to_idx = { 525 | x: i for i, x in enumerate(self.params[det_type].iou_thrs) 526 | } 527 | tidx = np.array([iou_thr_to_idx[x] for x in iou_thr]).astype(np.int64) 528 | 529 | if summary_type == "ap": 530 | s = self.eval[det_type]["precision"] 531 | if tidx is not None: 532 | s = s[tidx] 533 | if group_idx is 
not None: 534 | s = s[:, :, group_idx, aidx, bidx, lidx, didx] 535 | else: 536 | s = s[:, :, :, aidx, bidx, lidx, didx] 537 | elif summary_type == "ar": 538 | s = self.eval[det_type]["recall"] 539 | if tidx is not None: 540 | s = s[tidx] 541 | s = s[:, :, aidx, bidx, lidx, didx] 542 | elif summary_type == "r@p": 543 | s = self.eval[det_type]["recall_at_precision"] 544 | if tidx is not None: 545 | s = s[tidx] 546 | if group_idx is not None: 547 | s = s[:, group_idx, aidx, bidx, lidx, didx] 548 | else: 549 | s = s[:, :, aidx, bidx, lidx, didx] 550 | if topk is not None: 551 | sorted_s = -np.sort(-s, axis=1) 552 | s = sorted_s[:, :topk] 553 | else: 554 | raise ValueError(f"unknown summary type {summary_type}") 555 | 556 | if len(s[s > -1]) == 0: 557 | mean_s = -1 558 | else: 559 | mean_s = np.mean(s[s > -1]) 560 | return mean_s 561 | 562 | def summarize(self, det_type: str, metric_filter: List[Dict]): 563 | if not self.eval[det_type]: 564 | raise RuntimeError("Please run accumulate() first.") 565 | 566 | logger.info(f"Summarize detection results for {det_type}.") 567 | 568 | max_dets = self.params[det_type].max_dets 569 | coco_iou_thres = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95] 570 | coco_iou_thres_str = f"{coco_iou_thres[0]:0.2f}:{coco_iou_thres[-1]:0.2f}" 571 | 572 | results = self.results[det_type] 573 | 574 | for metric in metric_filter: 575 | key = "AP" 576 | for k in ["iou", "gr", "ar", "bg", "lt", "df"]: 577 | if k == "iou": 578 | if metric[k] != "coco": 579 | key = key + metric[k] 580 | elif metric[k] != "all": 581 | key = key + "-" + f"{k}{metric[k]}" 582 | 583 | results[key] = { 584 | "title": "AP", 585 | "iou_thres": coco_iou_thres_str 586 | if metric["iou"] == "coco" 587 | else int(metric["iou"]) / 100.0, 588 | "area_rng": metric["ar"], 589 | "background": metric["bg"], 590 | "lighting": metric["lt"], 591 | "difficulty": metric["df"], 592 | "cat_group_name": metric["gr"], 593 | "value": self._summarize( 594 | det_type, 595 | "ap", 596 | iou_thr=coco_iou_thres 597 | if metric["iou"] == "coco" 598 | else [int(metric["iou"]) / 100.0], 599 | group=metric["gr"], 600 | area_rng=metric["ar"], 601 | background=metric["bg"], 602 | lighting=metric["lt"], 603 | difficulty=metric["df"], 604 | ), 605 | } 606 | 607 | # for metric in metric_filter: 608 | # key = f"R@P{int(self.params[det_type].recall_at_precision_k * 100):02d}" 609 | # for k in ["iou", "gr", "ar", "bg", "lt", "df"]: 610 | # if k == "iou": 611 | # if metric[k] != "coco": 612 | # key = key + "-" + metric[k] 613 | # elif metric[k] != "all": 614 | # key = key + "-" + f"{k}{metric[k]}" 615 | 616 | # results[key] = { 617 | # "title": "R@P", 618 | # "iou_thres": coco_iou_thres_str 619 | # if metric["iou"] == "coco" 620 | # else int(metric["iou"]) / 100.0, 621 | # "area_rng": metric["ar"], 622 | # "background": metric["bg"], 623 | # "lighting": metric["lt"], 624 | # "difficulty": metric["df"], 625 | # "cat_group_name": metric["gr"], 626 | # "value": self._summarize( 627 | # det_type, 628 | # "r@p", 629 | # iou_thr=coco_iou_thres 630 | # if metric["iou"] == "coco" 631 | # else [int(metric["iou"]) / 100.0], 632 | # group=metric["gr"], 633 | # area_rng=metric["ar"], 634 | # background=metric["bg"], 635 | # lighting=metric["lt"], 636 | # difficulty=metric["df"], 637 | # ), 638 | # } 639 | 640 | key = f"AR50@{max_dets}" 641 | results[key] = { 642 | "title": "AR", 643 | "iou_thres": 0.5, 644 | "area_rng": "all", 645 | "background": "all", 646 | "lighting": "all", 647 | "cat_group_name": "all", 648 | "value": 
self._summarize(det_type, "ar", iou_thr=[0.5]), 649 | } 650 | 651 | def print_results(self): 652 | for det_type in self.eval_type: 653 | logger.info(f"print results for {det_type}") 654 | self._print_results(det_type) 655 | 656 | def _print_results(self, det_type: str): 657 | template = " {:<12} @[ IoU={:<9} | area={:>6s} | background={:>6s} | lighting={:>6s} | maxDets={:>3d} catIds={:>3s}] = {:0.3f}" 658 | 659 | for _key, value in self.results[det_type].items(): 660 | max_dets = self.params[det_type].max_dets 661 | 662 | logger.info( 663 | template.format( 664 | value["title"], 665 | value["iou_thres"], 666 | value["area_rng"], 667 | value["background"], 668 | value["lighting"], 669 | max_dets, 670 | value["cat_group_name"] + f"-top{value.get('topk', '')}", 671 | value["value"], 672 | ) 673 | ) 674 | 675 | def _get_gt_dt(self, det_type, img_id, class_id): 676 | """Create gt, dt which are list of anns/dets.""" 677 | gt = self.gts[det_type][img_id, class_id] 678 | dt = self.dts[det_type][img_id, class_id] 679 | return gt, dt 680 | 681 | def compute_iou(self, det_type: str, img_id: int, class_id: int): 682 | gt, dt = self._get_gt_dt(det_type, img_id, class_id) 683 | 684 | if len(gt) == 0 and len(dt) == 0: 685 | return [] 686 | 687 | # Sort detections in decreasing order of score. 688 | idx = np.argsort([-d["score"] for d in dt], kind="mergesort") 689 | dt = [dt[i] for i in idx] 690 | 691 | ann_type = "bbox" 692 | gt = [g[ann_type] for g in gt] 693 | dt = [d[ann_type] for d in dt] 694 | 695 | # compute iou between each dt and gt region 696 | # will return array of shape len(dt), len(gt) 697 | iscrowd = [int(False)] * len(gt) 698 | ious = mask_utils.iou(dt, gt, iscrowd) 699 | return ious 700 | 701 | def _prepare(self, det_type: str): 702 | img_ids = self.params[det_type].img_ids 703 | class_ids = self.params[det_type].class_ids 704 | 705 | logger.info(f"{det_type}, len params img_ids {len(img_ids)}") 706 | logger.info(f"{det_type}, len params class_ids {len(class_ids)}") 707 | 708 | gts = self.gt.load_anns( 709 | det_type, 710 | self.gt.get_ann_ids(det_type, img_ids=img_ids, class_ids=class_ids), 711 | ) 712 | dts = self.dt.load_anns( 713 | det_type, 714 | self.dt.get_ann_ids(det_type, img_ids=img_ids, class_ids=class_ids), 715 | ) 716 | 717 | logger.info(f"{det_type}, len gts {len(gts)}") 718 | logger.info(f"{det_type}, len dts {len(dts)}") 719 | 720 | # set ignore flag 721 | for gt in gts: 722 | if "ignore" not in gt: 723 | gt["ignore"] = 0 724 | 725 | for gt in gts: 726 | class_key = "category_id" if det_type == "cat_det" else "instance_id" 727 | self.gts[det_type][gt["image_id"], gt[class_key]].append(gt) 728 | 729 | # associate image meta info with each gt 730 | img = self.gt.imgs[gt["image_id"]] 731 | img_meta = self.meta.video_id_to_setting[img["video_id"]] 732 | for key in ["background", "lighting"]: 733 | gt[key] = img_meta[key] 734 | 735 | area_ratio = math.sqrt(gt["area"] / (img["height"] * img["width"])) 736 | if area_ratio < 0.1: 737 | gt["area_ratio"] = "small" 738 | elif area_ratio < 0.2: 739 | gt["area_ratio"] = "medium" 740 | else: 741 | gt["area_ratio"] = "large" 742 | 743 | if det_type == "inst_det": 744 | # [Easy] -- register and detect on simple 745 | # [Medium] -- register on busy, detect on simple; register and detect on busy 746 | # [Hard] -- register on simple, detect on busy 747 | instance_id = gt["instance_id"] 748 | # the background for the instance that's used for registration 749 | if ( 750 | hasattr(self.gt.metadata, "instance_register_bg") 751 | and instance_id 
in self.gt.metadata.instance_register_bg 752 | ): 753 | register_bg = self.gt.metadata.instance_register_bg[instance_id] 754 | query_bg = self.meta.video_id_to_setting[img["video_id"]][ 755 | "background" 756 | ] 757 | if register_bg == "simple" and query_bg == "simple": 758 | gt["difficulty"] = "easy" 759 | elif register_bg == "simple" and query_bg == "busy": 760 | gt["difficulty"] = "hard" 761 | else: 762 | gt["difficulty"] = "medium" 763 | else: 764 | logger.warning(f"instance_id={instance_id} is not registered!") 765 | 766 | # For federated dataset evaluation we will filter out all dt for an 767 | # image which belong to classes not present in gt and not present in 768 | # the negative list for an image. In other words, the detector is not 769 | # penalized for classes for which we have no gt information about 770 | # their presence or absence in an image. 771 | img_data = self.gt.load_imgs(ids=self.params[det_type].img_ids) 772 | # per image map of classes not present in image 773 | neg_class_ids_key = ( 774 | "neg_category_ids" if det_type == "cat_det" else "neg_instance_ids" 775 | ) 776 | img_nl = {d["id"]: d[neg_class_ids_key] for d in img_data} 777 | # per image list of classes present in image 778 | img_pl = defaultdict(set) 779 | class_id_key = "category_id" if det_type == "cat_det" else "instance_id" 780 | for ann in gts: 781 | img_pl[ann["image_id"]].add(ann[class_id_key]) 782 | # per image map of classes which have missing gt. For these 783 | # classes we don't penalize the detector for false positives. 784 | self.img_nel[det_type] = { 785 | d["id"]: d["not_exhaustive_category_ids"] 786 | if det_type == "cat_det" and "not_exhaustive_category_ids" in d 787 | else [] 788 | for d in img_data 789 | } 790 | 791 | for dt in dts: 792 | img_id, class_id = dt["image_id"], dt[class_id_key] 793 | 794 | if class_id not in img_nl[img_id] and class_id not in img_pl[img_id]: 795 | continue 796 | 797 | # associate image meta info with each dt 798 | img = self.gt.imgs[dt["image_id"]] 799 | img_meta = self.meta.video_id_to_setting[img["video_id"]] 800 | for key in ["background", "lighting"]: 801 | dt[key] = img_meta[key] 802 | 803 | area_ratio = math.sqrt(dt["area"] / (img["height"] * img["width"])) 804 | if area_ratio < 0.1: 805 | dt["area_ratio"] = "small" 806 | elif area_ratio < 0.2: 807 | dt["area_ratio"] = "medium" 808 | else: 809 | dt["area_ratio"] = "large" 810 | 811 | if det_type == "inst_det": 812 | # [Easy] -- register and detect on simple 813 | # [Medium] -- register on busy, detect on simple; register and detect on busy 814 | # [Hard] -- register on simple, detect on busy 815 | instance_id = dt["instance_id"] 816 | # the background for the instance that's used for registration 817 | if ( 818 | hasattr(self.gt.metadata, "instance_register_bg") 819 | and instance_id in self.gt.metadata.instance_register_bg 820 | ): 821 | register_bg = self.gt.metadata.instance_register_bg[instance_id] 822 | query_bg = self.meta.video_id_to_setting[img["video_id"]][ 823 | "background" 824 | ] 825 | if register_bg == "simple" and query_bg == "simple": 826 | dt["difficulty"] = "easy" 827 | elif register_bg == "simple" and query_bg == "busy": 828 | dt["difficulty"] = "hard" 829 | else: 830 | dt["difficulty"] = "medium" 831 | else: 832 | logger.warning(f"instance_id={instance_id} is not registered!") 833 | self.dts[det_type][img_id, class_id].append(dt) 834 | 835 | self.freq_groups[det_type] = self._prepare_freq_group(det_type) 836 | 837 | if det_type == "inst_det": 838 | 
self.inst_det_seen_unseen_cat_groups = ( 839 | self._prepare_seen_unseen_cat_groups() 840 | ) 841 | 842 | def _prepare_freq_group(self, det_type: str): 843 | freq_groups = [[] for _ in self.params[det_type].img_count_lbl] 844 | class_data = self.gt.load_classes(det_type, self.params[det_type].class_ids) 845 | for idx, _class_data in enumerate(class_data): 846 | if "frequency" in _class_data: 847 | frequency = _class_data["frequency"] 848 | else: 849 | # assign all samples to the frequent group if not specified 850 | frequency = "frequent" 851 | freq_groups[self.params[det_type].img_count_lbl.index(frequency)].append( 852 | idx 853 | ) 854 | 855 | return freq_groups 856 | 857 | def _prepare_seen_unseen_cat_groups(self): 858 | det_type = "inst_det" 859 | # 2 groups in total, including "seen" and "unseen" groups 860 | seen_unseen_groups = [[], []] 861 | class_data = self.gt.load_classes(det_type, self.params[det_type].class_ids) 862 | 863 | logger.info(f"num cat det categories {len(self.gt.classes['cat_det'])}") 864 | 865 | for idx, _class_data in enumerate(class_data): 866 | # Object categories considered by category detection are the categories 867 | # common to the train and val splits. 868 | group_id = ( 869 | 0 870 | if "category_id" in _class_data 871 | and _class_data["category_id"] in self.gt.classes["cat_det"] 872 | else 1 873 | ) 874 | seen_unseen_groups[group_id].append(idx) 875 | 876 | for group_id, group in enumerate(seen_unseen_groups): 877 | logger.info(f"seen/unseen group_id {group_id}, group size {len(group)}") 878 | 879 | return seen_unseen_groups 880 | 881 | def get_results(self): 882 | return {det_type: self._get_results(det_type) for det_type in self.eval_type} 883 | 884 | def _get_results(self, det_type: str): 885 | if len(self.results[det_type]) == 0: 886 | logger.warning(f"{det_type} results is empty. 
Call run().") 887 | return self.results[det_type] 888 | 889 | def log_per_class_results(self, output_dir): 890 | if output_dir: 891 | det_type = "cat_det" 892 | iou_thres = 0.5 893 | rec_at_prec_k = int(self.params[det_type].recall_at_precision_k * 100) 894 | recall_at_prec = self.eval[det_type]["recall_at_precision"] 895 | precision = self.eval[det_type]["precision"] 896 | 897 | for area_rng in self.params[det_type].area_rng_lbl: 898 | aidx = [ 899 | idx 900 | for idx, _area_rng in enumerate(self.params[det_type].area_rng_lbl) 901 | if _area_rng == area_rng 902 | ][0] 903 | tidx = np.where(iou_thres == self.params[det_type].iou_thrs)[0].item() 904 | bidx = [ 905 | idx 906 | for idx, _background in enumerate(self.meta.background) 907 | if _background == "all" 908 | ][0] 909 | lidx = [ 910 | idx 911 | for idx, _lighting in enumerate(self.meta.lighting) 912 | if _lighting == "all" 913 | ][0] 914 | didx = [ 915 | idx 916 | for idx, _difficulty in enumerate(self.params[det_type].difficulty) 917 | if _difficulty == "all" 918 | ][0] 919 | 920 | # print per-category R@P stats 921 | cur_recall_at_prec = recall_at_prec[ 922 | tidx, :, aidx, bidx, lidx, didx 923 | ].reshape(-1) 924 | sort_idx = np.argsort(-cur_recall_at_prec) 925 | 926 | lines = [] 927 | for idx in sort_idx.tolist(): 928 | r_at_p = cur_recall_at_prec[idx] 929 | cat = self.gt.cats[det_type][self.params[det_type].class_ids[idx]] 930 | lines.append( 931 | "{},{},{},{:0.2f}".format( 932 | cat["name"], 933 | cat["image_count"] if "image_count" in cat else 0, 934 | cat["instance_count"] if "instance_count" in cat else 0, 935 | r_at_p, 936 | ) 937 | ) 938 | 939 | key = "R@P{}-{}-{}.csv".format( 940 | rec_at_prec_k, int(iou_thres * 100), area_rng 941 | ) 942 | with open(os.path.join(output_dir, key), "w") as h: 943 | h.write("\n".join(lines)) 944 | 945 | # print per-category AP50 stats 946 | cur_precision = precision[tidx, :, :, aidx, bidx, lidx, didx] 947 | cur_precision = np.mean(cur_precision, axis=0) 948 | sort_idx = np.argsort(-cur_precision) 949 | 950 | lines = [] 951 | for idx in sort_idx.tolist(): 952 | ap50 = cur_precision[idx] 953 | cat = self.gt.cats[det_type][self.params[det_type].class_ids[idx]] 954 | lines.append( 955 | "{},{},{},{:0.2f}".format( 956 | cat["name"], 957 | cat["image_count"] if "image_count" in cat else 0, 958 | cat["instance_count"] if "instance_count" in cat else 0, 959 | ap50, 960 | ) 961 | ) 962 | 963 | key = "AP{}-{}.csv".format(int(iou_thres * 100), area_rng) 964 | with open(os.path.join(output_dir, key), "w") as h: 965 | h.write("\n".join(lines)) 966 | 967 | 968 | class CategoryDetectionParams: 969 | def __init__(self, max_dets: int = 100): 970 | """CategoryDetectionParams for EgoObjects evaluation API.""" 971 | self.img_ids = [] 972 | self.class_ids = [] 973 | # np.arange causes trouble. 
981 | self.rec_thrs = np.linspace( 982 | 0.0, 1.00, int(np.round((1.00 - 0.0) / 0.01)) + 1, endpoint=True 983 | ) 984 | self.max_dets = max_dets 985 | self.area_rng = [ 986 | [0**2, 1e5**2], 987 | [0**2, 32**2], 988 | [32**2, 96**2], 989 | [96**2, 1e5**2], 990 | ] 991 | self.area_rng_lbl = ["all", "small", "medium", "large"] 992 | # self.use_cats = 1 993 | # We bin classes into three bins based on how many images of the training 994 | # set the category is present in. 995 | # r: Rare : < 10 996 | # c: Common : >= 10 and < 100 997 | # f: Frequent: >= 100 998 | self.img_count_lbl = ["rare", "common", "frequent"] 999 | self.difficulty = ["all"] 1000 | 1001 | self.topk_classes = [100, 200] 1002 | self.recall_at_precision_k = 0.9 1003 | 1004 | 1005 | class InstanceDetectionParams: 1006 | def __init__(self, max_dets: int = 100): 1007 | """InstanceDetectionParams for EgoObjects evaluation API.""" 1008 | self.img_ids = [] 1009 | self.class_ids = [] 1010 | # np.arange causes trouble: the data points it generates can be slightly 1011 | # larger than the true values 1012 | # self.iou_thrs = np.linspace( 1013 | # 0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True 1014 | # ) 1015 | self.iou_thrs = np.array( 1016 | [0.1, 0.25, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95] 1017 | ) 1018 | self.rec_thrs = np.linspace( 1019 | 0.0, 1.00, int(np.round((1.00 - 0.0) / 0.01)) + 1, endpoint=True 1020 | ) 1021 | self.max_dets = max_dets 1022 | self.area_rng = [ 1023 | [0**2, 1e5**2], 1024 | [0**2, 32**2], 1025 | [32**2, 96**2], 1026 | [96**2, 1e5**2], 1027 | ] 1028 | self.area_rng_lbl = ["all", "small", "medium", "large"] 1029 | # self.use_cats = 1 1030 | # We bin classes into three bins based on how many images of the training 1031 | # set the category is present in. 1032 | # r: Rare : < 10 1033 | # c: Common : >= 10 and < 100 1034 | # f: Frequent: >= 100 1035 | self.img_count_lbl = ["rare", "common", "frequent"] 1036 | self.difficulty = ["all", "easy", "medium", "hard"] 1037 | 1038 | self.topk_classes = [400, 800, 1200] 1039 | self.recall_at_precision_k = 0.9 1040 | -------------------------------------------------------------------------------- /egoobjects_api/results.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright (c) Meta Platforms, Inc. and affiliates. 3 | # All rights reserved. 4 | 5 | # This source code is licensed under the license found in the 6 | # LICENSE file in the root directory of this source tree. 7 | 8 | import logging 9 | from collections import defaultdict 10 | from copy import deepcopy 11 | from typing import List, Dict, Any 12 | 13 | from .egoobjects import EgoObjects 14 | 15 | logger = logging.getLogger(__name__) 16 | 17 | 18 | class EgoObjectsResults(EgoObjects): 19 | def __init__( 20 | self, 21 | gt: EgoObjects, 22 | cat_det_dt_anns: List[Dict[str, Any]], 23 | inst_det_dt_anns: List[Dict[str, Any]], 24 | max_dets: int = 300, 25 | ): 26 | """Constructor for EgoObjects results. 27 | Args: 28 | gt: EgoObjects class instance 29 | cat_det_dt_anns: detected bounding boxes for category detection 30 | inst_det_dt_anns: detected bounding boxes for instance detection 31 | max_dets: max number of detections per image 32 | """ 33 | logger.info(f"num category detections {len(cat_det_dt_anns)}") 34 | logger.info(f"num instance detections {len(inst_det_dt_anns)}") 35 | 36 | self.dataset = deepcopy(gt.dataset) 37 | 38 | dt_anns = {} 39 | 40 | dt_anns["cat_det"] = ( 41 | self.limit_detections_per_image(cat_det_dt_anns, max_dets) 42 | if max_dets >= 0 43 | else cat_det_dt_anns 44 | ) 45 | dt_anns["inst_det"] = ( 46 | self.limit_detections_per_image(inst_det_dt_anns, max_dets) 47 | if max_dets >= 0 48 | else inst_det_dt_anns 49 | ) 50 | 51 | logger.info( 52 | f"after limit detections per image, len inst_det {len(dt_anns['inst_det'])}" 53 | ) 54 | 55 | for _k, anns in dt_anns.items(): 56 | if len(anns) > 0: 57 | assert "bbox" in anns[0] 58 | for ann_id, ann in enumerate(anns): 59 | _x1, _y1, w, h = ann["bbox"] 60 | ann["area"] = w * h 61 | ann["id"] = ann_id + 1 62 | 63 | self.annotations = dt_anns 64 | self._create_index(gt.metadata) 65 | 66 | # cat_det_dt_anns can be empty when the model does not perform category detection. 67 | if len(cat_det_dt_anns) > 0: 68 | cat_det_img_ids_in_result = [ann["image_id"] for ann in cat_det_dt_anns] 69 | 70 | assert set(cat_det_img_ids_in_result) == ( 71 | set(cat_det_img_ids_in_result) & set(self.get_img_ids()) 72 | ), "Results do not correspond to current EgoObjects dataset." 73 | 74 | def limit_detections_per_image(self, anns, max_dets): 75 | img_ann = defaultdict(list) 76 | for ann in anns: 77 | img_ann[ann["image_id"]].append(ann) 78 | 79 | for img_id, _anns in img_ann.items(): 80 | if len(_anns) <= max_dets: 81 | continue 82 | _anns = sorted(_anns, key=lambda ann: ann["score"], reverse=True) 83 | img_ann[img_id] = _anns[:max_dets] 84 | 85 | return [ann for anns in img_ann.values() for ann in anns] 86 |
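# Illustration of limit_detections_per_image (hypothetical values): with
# max_dets=2 and three detections for image 7 scored 0.9, 0.4 and 0.7, only the
# two highest-scoring boxes are kept, i.e. those scored 0.9 and 0.7.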
-------------------------------------------------------------------------------- /example.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright (c) Meta Platforms, Inc. and affiliates. 3 | # All rights reserved. 4 | 5 | # This source code is licensed under the license found in the 6 | # LICENSE file in the root directory of this source tree. 7 | 8 | import json 9 | import logging 10 | import unittest 11 | from copy import deepcopy 12 | 13 | import numpy as np 14 | from detectron2.utils.logger import create_small_table 15 | from detectron2.data import MetadataCatalog 16 | from egoobjects_api.eval import EgoObjectsEval 17 | from egoobjects_api.results import EgoObjectsResults 18 | from egoobjects_api.egoobjects import EgoObjects, FILTER_OPTS 19 | 20 | logging.basicConfig(level=logging.INFO) 21 | logger = logging.getLogger(__name__) 22 | 23 | gt_json_file = "./data/EgoObjectsV1_unified_eval.json" 24 | metadata_json_file = "./data/EgoObjectsV1_unified_metadata.json" 25 | 26 | metric_filter = {} 27 | 28 | # NOTE on legends 29 | # iou -- the IOU threshold for computing metrics; "coco" refers to averaging over IOU = [0.5, 0.55, ..., 0.95] 30 | # gr -- the grouping for categories; for category detection it can be in ["all", "frequent", "common", "rare"], 31 | #       for instance detection it can be in ["all", "seen", "unseen"] 32 | # ar -- area range of the gt box, can be in ["all", "small", "medium", "large"] 33 | # bg -- background, choice in ["all", "busy", "simple"] 34 | # lt -- lighting condition, choice in ["all", "bright", "dim"] 35 | # df -- the difficulty of the test sample, only used for instance detection; since we already explicitly split 36 | #       the validation set into an easy one and a hard one, it is filled with "all" everywhere below 37 | metric_filter["cat_det"] = [ 38 | {"iou": "coco", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 39 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 40 | {"iou": "75", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 41 | 42 | {"iou": "50", "gr": "frequent", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 43 | {"iou": "50", "gr": "common", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 44 | {"iou": "50", "gr": "rare", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 45 | 46 | {"iou": "50", "gr": "all", "ar": "large", "bg": "all", "lt": "all", "df": "all"}, 47 | {"iou": "50", "gr": "all", "ar": "medium", "bg": "all", "lt": "all", "df": "all"}, 48 | {"iou": "50", "gr": "all", "ar": "small", "bg": "all", "lt": "all", "df": "all"}, 49 | 50 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "bright", "df": "all"}, 51 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "dim", "df": "all"}, 52 | {"iou": "50", "gr": "all", "ar": "all", "bg": "simple", "lt": "all", "df": "all"}, 53 | {"iou": "50", "gr": "all", "ar": "all", "bg": "busy", "lt": "all", "df": "all"}, 54 | ] 55 | 56 | metric_filter["inst_det"] = [ 57 | {"iou": "coco", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 58 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 59 | {"iou": "75", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 60 | 61 | {"iou": "50", "gr": "seen", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 62 | {"iou": "50", "gr": "unseen", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 63 | 64 | {"iou": "50", "gr": "all", "ar": "large", "bg": "all", "lt": "all", "df": "all"}, 65 | {"iou": "50", "gr": "all", "ar": "medium", "bg": "all", "lt": "all", "df": "all"}, 66 | {"iou": "50", "gr": "all", "ar": "small", "bg": "all", "lt": "all", "df": "all"}, 67 | 68 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "bright", "df": "all"}, 69 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "dim", "df": "all"}, 70 | {"iou": "50", "gr": "all", "ar": "all", "bg": "simple", "lt": "all", "df": "all"}, 71 | {"iou": "50", "gr": "all", "ar": "all", "bg": "busy", "lt": "all", "df": "all"}, 72 | ] 73 | 74 | 75 | def get_egoobjects_meta(metadata_path: str): 76 | """ 77 | Return the metadata dictionary, which includes 4 keys: 78 | cat_det_cats 79 | inst_det_cats 80 | cat_det_cat_id_2_cont_id 81 | cat_det_cat_names 82 | """ 83 | with open(metadata_path, "r") as fp: 84 | metadata = json.load(fp) 85 | 86 | cat_det_cat_id_2_name = {cat["id"]: cat["name"] for cat in metadata["cat_det_cats"]} 87 | cat_det_cat_ids = sorted([cat["id"] for cat in metadata["cat_det_cats"]]) 88 | cat_det_cat_id_2_cont_id = {cat_id: i for i, cat_id in enumerate(cat_det_cat_ids)} 89 | cat_det_cat_names = [cat_det_cat_id_2_name[cat_id] for cat_id in cat_det_cat_ids] 90 | 91 | metadata["cat_det_cat_id_2_cont_id"] = cat_det_cat_id_2_cont_id 92 | metadata["cat_det_cat_names"] = cat_det_cat_names 93 | return metadata 94 | 95 | def main(): 96 | dataset_name = "EgoObjects" 97 | metadata = get_egoobjects_meta(metadata_json_file) 98 | MetadataCatalog.get(dataset_name).set(**metadata) 99 | metadata = MetadataCatalog.get(dataset_name) 100 | 101 | split = "egoobjects_unified_det_val_query" 102 | gt = EgoObjects(gt_json_file, metadata, filter_opts=FILTER_OPTS[split]) 103 |
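# Each prediction consumed by EgoObjectsResults is a plain dict; a minimal
# sketch (hypothetical values) of what a real detector would emit per box:
#   {"image_id": 42, "category_id": 3, "bbox": [x, y, w, h], "score": 0.87}
# for category detection, and the same with "instance_id" in place of
# "category_id" for instance detection. The dummy predictions below are instead
# built by perturbing the ground truth annotations.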
104 | # dummy category detection predictions 105 | dt_cat = [ 106 | deepcopy(ann) 107 | for ann in gt.dataset["annotations"] 108 | if "category_id" in ann and np.random.uniform() > 0.1 109 | ] 110 | 111 | logger.info(f"len dt_cat {len(dt_cat)}") 112 | 113 | for dt_box in dt_cat: 114 | x1, y1, w, h = dt_box["bbox"] 115 | w = np.random.randint(int(w * 0.5), w) 116 | h = np.random.randint(int(h * 0.5), h) 117 | image_id = dt_box["image_id"] 118 | category_id = dt_box["category_id"] 119 | 120 | dt_box["bbox"] = [x1, y1, w, h] 121 | dt_box["area"] = w * h 122 | dt_box["image_id"] = image_id 123 | dt_box["category_id"] = category_id 124 | dt_box["score"] = np.random.rand(1)[0] 125 | 126 | # dummy instance detection predictions 127 | dt_inst = [ 128 | deepcopy(ann) 129 | for ann in gt.dataset["annotations"] 130 | if "instance_id" in ann and np.random.uniform() > 0.2 131 | ] 132 | 133 | logger.info(f"len dt_inst {len(dt_inst)}") 134 | 135 | for dt_box in dt_inst: 136 | x1, y1, w, h = dt_box["bbox"] 137 | w = np.random.randint(int(w * 0.5), w) 138 | h = np.random.randint(int(h * 0.5), h) 139 | image_id = dt_box["image_id"] 140 | instance_id = dt_box["instance_id"] 141 | 142 | dt_box["bbox"] = [x1, y1, w, h] 143 | dt_box["area"] = w * h 144 | dt_box["image_id"] = image_id 145 | dt_box["instance_id"] = instance_id 146 | dt_box["score"] = np.random.rand(1)[0] 147 | 148 | dt = EgoObjectsResults(gt, dt_cat, dt_inst) 149 | evaluator = EgoObjectsEval(gt, dt, num_processes=32) 150 | evaluator.run(metric_filter) 151 | evaluator.print_results() 152 |
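# get_results() returns a nested dict of the form
#   {det_type: {metric_name: {"value": ...}}}
# (structure inferred from the accesses below; the exact metric names follow
# from the metric_filter entries above). The loop below rescales each value to
# a percentage before logging it as a table.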
153 | results = evaluator.get_results() 154 | for det_type in ["cat_det", "inst_det"]: 155 | logger.info(f"{det_type} results") 156 | one_result = results[det_type] 157 | one_result = {metric: float(res["value"] * 100) for metric, res in one_result.items()} 158 | for _name, value in one_result.items(): 159 | assert value < 100.0 160 | logger.info(create_small_table(one_result)) 161 | 162 | if __name__ == "__main__": 163 | main() -------------------------------------------------------------------------------- /images/ICCV2023_poster_EgoObjects.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/ICCV2023_poster_EgoObjects.jpg -------------------------------------------------------------------------------- /images/intro.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/intro.png -------------------------------------------------------------------------------- /images/intro_teaser.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/intro_teaser.png -------------------------------------------------------------------------------- /images/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/logo.png -------------------------------------------------------------------------------- /images/sample_images.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/sample_images.png
-------------------------------------------------------------------------------- /images/taxonomy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/taxonomy.png --------------------------------------------------------------------------------