├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── egoobjects_api ├── __init__.py ├── egoobjects.py ├── eval.py └── results.py ├── example.py └── images ├── ICCV2023_poster_EgoObjects.jpg ├── intro.png ├── intro_teaser.png ├── logo.png ├── sample_images.png └── taxonomy.png /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as 6 | contributors and maintainers pledge to make participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, sex characteristics, gender identity and expression, 9 | level of experience, education, socio-economic status, nationality, personal 10 | appearance, race, religion, or sexual identity and orientation. 11 | 12 | ## Our Standards 13 | 14 | Examples of behavior that contributes to creating a positive environment 15 | include: 16 | 17 | * Using welcoming and inclusive language 18 | * Being respectful of differing viewpoints and experiences 19 | * Gracefully accepting constructive criticism 20 | * Focusing on what is best for the community 21 | * Showing empathy towards other community members 22 | 23 | Examples of unacceptable behavior by participants include: 24 | 25 | * The use of sexualized language or imagery and unwelcome sexual attention or 26 | advances 27 | * Trolling, insulting/derogatory comments, and personal or political attacks 28 | * Public or private harassment 29 | * Publishing others' private information, such as a physical or electronic 30 | address, without explicit permission 31 | * Other conduct which could reasonably be considered inappropriate in a 32 | professional setting 33 | 34 | ## Our Responsibilities 35 | 36 | Project maintainers are responsible for clarifying the standards of acceptable 37 | behavior and are expected to take appropriate and fair corrective action in 38 | response to any instances of unacceptable behavior. 39 | 40 | Project maintainers have the right and responsibility to remove, edit, or 41 | reject comments, commits, code, wiki edits, issues, and other contributions 42 | that are not aligned to this Code of Conduct, or to ban temporarily or 43 | permanently any contributor for other behaviors that they deem inappropriate, 44 | threatening, offensive, or harmful. 45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies within all project spaces, and it also applies when 49 | an individual is representing the project or its community in public spaces. 50 | Examples of representing a project or community include using an official 51 | project e-mail address, posting via an official social media account, or acting 52 | as an appointed representative at an online or offline event. Representation of 53 | a project may be further defined and clarified by project maintainers. 54 | 55 | This Code of Conduct also applies outside the project spaces when there is a 56 | reasonable belief that an individual's behavior may have a negative impact on 57 | the project or its community. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported by contacting the project team at . All 63 | complaints will be reviewed and investigated and will result in a response that 64 | is deemed necessary and appropriate to the circumstances. 
The project team is 65 | obligated to maintain confidentiality with regard to the reporter of an incident. 66 | Further details of specific enforcement policies may be posted separately. 67 | 68 | Project maintainers who do not follow or enforce the Code of Conduct in good 69 | faith may face temporary or permanent repercussions as determined by other 70 | members of the project's leadership. 71 | 72 | ## Attribution 73 | 74 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, 75 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html 76 | 77 | [homepage]: https://www.contributor-covenant.org 78 | 79 | For answers to common questions about this code of conduct, see 80 | https://www.contributor-covenant.org/faq -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to EgoObjects 2 | We want to make contributing to this project as easy and transparent as 3 | possible. 4 | 5 | ## Our Development Process 6 | Minor changes and improvements will be released on an ongoing basis. Larger changes (e.g., changesets implementing a new paper) will be released on a more periodic basis. 7 | 8 | ## Pull Requests 9 | We actively welcome your pull requests. 10 | 11 | 1. Fork the repo and create your branch from `main`. 12 | 2. If you've added code that should be tested, add tests. 13 | 3. If you've changed APIs, update the documentation. 14 | 4. Ensure the test suite passes. 15 | 5. Make sure your code lints. 16 | 6. If you haven't already, complete the Contributor License Agreement ("CLA"). 17 | 18 | ## Contributor License Agreement ("CLA") 19 | In order to accept your pull request, we need you to submit a CLA. You only need 20 | to do this once to work on any of Facebook's open source projects. 21 | 22 | Complete your CLA here: 23 | 24 | ## Issues 25 | We use GitHub issues to track public bugs. Please ensure your description is 26 | clear and has sufficient instructions to be able to reproduce the issue. 27 | 28 | Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe 29 | disclosure of security bugs. In those cases, please go through the process 30 | outlined on that page and do not file a public issue. 31 | 32 | ## Coding Style 33 | * 4 spaces for indentation rather than tabs 34 | * 80 character line length 35 | * PEP8 formatting following [Black](https://black.readthedocs.io/en/stable/) 36 | 37 | ## License 38 | By contributing to EgoObjects, you agree that your contributions will be licensed 39 | under the LICENSE file in the root directory of this source tree. -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) Meta Platforms, Inc. and affiliates. 
4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

2 | 3 |

4 | 5 | # EgoObjects 6 | 7 | Official PyTorch implementation of the ICCV'23 paper 8 | 9 | **[EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding](https://arxiv.org/abs/2309.08816)** 10 | 11 | [Chenchen Zhu](https://sites.google.com/andrew.cmu.edu/zcckernel), [Fanyi Xiao](https://fanyix.cs.ucdavis.edu/), [Andres Alvarado](https://www.linkedin.com/in/josecarlos12/), [Yasmine Babaei](https://www.linkedin.com/in/yasminebabaei/), [Jiabo Hu](https://www.linkedin.com/in/jiabo-hu-1321b1121/), [Hichem El-Mohri](https://www.linkedin.com/in/hichem-elmohri/), [Sean Chang Culatana](https://ai.meta.com/people/sean-chang-culatana/), [Roshan Sumbaly](https://www.linkedin.com/in/rsumbaly/), [Zhicheng Yan](https://sites.google.com/view/zhicheng-yan) 12 | 13 | **Meta AI** 14 | 15 | 16 | 17 | EgoObjects is a large-scale egocentric dataset for fine-grained object understanding, which features videos captured by various wearable devices at worldwide locations, objects from a diverse set of categories commonly seen in indoor environments, and videos of the same object instance captured under diverse conditions. The dataset supports both the conventional category-level as well as the novel instance-level object detection task. 18 | 19 |

20 | 21 |

22 | 23 |

24 | 25 |

26 | 27 | 28 | ## EgoObjects v1.0 29 | 30 | For this release, we have annotated 114K frames (79K train, 5.7K val, 29.5K test) sampled from 9K+ videos collected by 250 participants across the world. A total of 14.4K unique object instances from 368 categories are annotated. Among them, there are 1.3K main object instances from 206 categories and 13.1K secondary object instances (i.e., objects accompanying the main object) from 353 categories. On average, each image is annotated with 5.6 instances from 4.8 categories, and each object instance appears in 44.8 images, which ensures diverse viewing directions for the object. 31 | 32 | ## Dataset downloading 33 | 34 | Release v1.0 is publicly available from this [page](https://ai.meta.com/datasets/egoobjects-downloads). Images (~40 GB) can be downloaded from the file `EgoObjectsV1_images.zip`. Unified annotations for category- and instance-level object detection can be downloaded from files including `EgoObjectsV1_unified_train.json`, `EgoObjectsV1_unified_eval.json`, and `EgoObjectsV1_unified_metadata.json`. They can be placed under `$EgoObjects_ROOT/data/`. We follow the same data format as [LVIS](https://www.lvisdataset.org/dataset), with EgoObjects-specific changes. A minimal annotation-loading sketch is shown below, after the poster section. 35 | 36 | ## Setup 37 | 38 | ### Requirements 39 | - Linux with Python ≥ 3.8 40 | - PyTorch ≥ 1.8. 41 | Install them together at [pytorch.org](https://pytorch.org) to make sure of this. Note: please check that the 42 | installed PyTorch version matches the one required by Detectron2. 43 | - Detectron2: follow [Detectron2 installation instructions](https://detectron2.readthedocs.io/tutorials/install.html). 44 | 45 | ### Example conda environment setup 46 | ```bash 47 | conda create --name egoobjects python=3.9 48 | conda activate egoobjects 49 | conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia 50 | python -m pip install 'git+https://github.com/facebookresearch/detectron2.git' 51 | 52 | # under your working directory 53 | git clone https://github.com/facebookresearch/EgoObjects.git 54 | cd EgoObjects 55 | ``` 56 | 57 | If set up correctly, run our evaluation example code to get mock results for the category- and instance-level detection tasks: 58 | ```bash 59 | python example.py 60 | ``` 61 | 62 | ## Timeline 63 | - '23 Sep 6, EgoObjects v1.0, including both data and annotations, is open sourced. 64 | - '23 March, EgoObjects v1.0 is covered by Meta AI. 65 | - '22 June, an earlier version of the EgoObjects dataset is adopted by the Continual Learning Challenge in the 3rd CLVISION Workshop at CVPR. 66 | - '22 March, EgoObjects is first introduced by the Meta AI Blog. 67 | 68 | ## EgoObjects ICCV'23 Poster 69 | 70 |

71 | 72 |

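## Loading the annotations

After downloading, the unified annotation files can be sanity-checked directly. Below is a minimal, illustrative sketch; it assumes the LVIS-style top-level keys (`images`, `annotations`) consumed by `egoobjects_api` and the `$EgoObjects_ROOT/data/` layout described above.

```python
import json
from collections import Counter

# assumes the annotation files were placed under $EgoObjects_ROOT/data/
with open("data/EgoObjectsV1_unified_train.json", "r") as f:
    data = json.load(f)

print(f"num images: {len(data['images'])}")
print(f"num annotations: {len(data['annotations'])}")

# Each annotation may carry "category_id" (category-level task) and/or
# "instance_id" (instance-level task); count each combination.
kinds = Counter(
    ("category_id" in ann, "instance_id" in ann) for ann in data["annotations"]
)
print(kinds)
```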
73 | 74 | ## Citing EgoObjects 75 | Paper link: [[`arXiv`](https://arxiv.org/abs/2309.08816)] 76 | 77 | If you find this code/data useful in your research, please cite our paper: 78 | ``` 79 | @inproceedings{zhu2023egoobjects, 80 | title={EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding}, 81 | author={Zhu, Chenchen and Xiao, Fanyi and Alvarado, Andrés and Babaei, Yasmine and Hu, Jiabo and El-Mohri, Hichem and Chang, Sean and Sumbaly, Roshan and Yan, Zhicheng}, 82 | booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, 83 | year={2023} 84 | } 85 | ``` 86 | 87 | ## Credit 88 | The code is a rewrite of the Python API for [LVIS](https://github.com/lvis-dataset/lvis-api). 89 | The core functionality is the same, with EgoObjects-specific changes. 90 | 91 | ## License 92 | EgoObjects is licensed under the [MIT License](LICENSE). 93 | -------------------------------------------------------------------------------- /egoobjects_api/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Meta Platforms, Inc. and affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. -------------------------------------------------------------------------------- /egoobjects_api/egoobjects.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright (c) Meta Platforms, Inc. and affiliates. 3 | # All rights reserved. 4 | 5 | # This source code is licensed under the license found in the 6 | # LICENSE file in the root directory of this source tree. 7 | 8 | import json 9 | import logging 10 | from collections import defaultdict 11 | from copy import deepcopy 12 | from typing import List, Optional, Dict, Any 13 | 14 | from detectron2.data.catalog import Metadata 15 | 16 | logger = logging.getLogger(__name__) 17 | 18 | 19 | FILTER_OPTS = { 20 | # instance detection 21 | "egoobjects_unified_det_train": {}, 22 | "egoobjects_unified_det_test_query": { 23 | "subset": "test", 24 | }, 25 | "egoobjects_unified_det_val_query": { 26 | "subset": "val", 27 | }, 28 | # category detection 29 | "egoobjects_cat_det_train": { 30 | "remove_non_category": True, 31 | }, 32 | "egoobjects_cat_det_test": { 33 | "subset": "test", 34 | "remove_non_category": True, 35 | }, 36 | "egoobjects_cat_det_val": { 37 | "subset": "val", 38 | "remove_non_category": True, 39 | }, 40 | # instance detection with seen/unseen category 41 | "egoobjects_instdet_train": {}, 42 | "egoobjects_instdet_test_query": { 43 | "subset": "test", 44 | }, 45 | "egoobjects_instdet_val_query": { 46 | "subset": "val", 47 | }, 48 | } 49 | 50 | 51 | def filter_annot( 52 | data, 53 | metadata, 54 | filter_opts, 55 | ): 56 | if filter_opts is None: 57 | filter_opts = {} 58 | 59 | valid_image_set = set([x["id"] for x in data["images"]]) 60 | # filter according to easy/hard splits 61 | if "difficulty" in filter_opts and filter_opts["difficulty"]: 62 | selected_image_set = set( 63 | [ 64 | x["id"] 65 | for x in data["images"] 66 | if x["difficulty"] == filter_opts["difficulty"] 67 | ] 68 | ) 69 | valid_image_set = valid_image_set & selected_image_set 70 | 71 | # filter according to minival splits 72 | if "subset" in filter_opts and filter_opts["subset"]: 73 | selected_image_set = set( 74 | [x["id"] for x in data["images"] if x["subset"] == filter_opts["subset"]] 75 | ) 76 | 
valid_image_set = valid_image_set & selected_image_set 77 | 78 | # filter out annotations/images without category_id field 79 | if "remove_non_category" in filter_opts and filter_opts["remove_non_category"]: 80 | if isinstance(metadata, Dict): 81 | cat_det_cats = metadata["cat_det_cats"] 82 | else: 83 | cat_det_cats = metadata.cat_det_cats 84 | kept_annot = [] 85 | kept_image_id = set() 86 | kept_cat_ids = set([x["id"] for x in cat_det_cats]) 87 | for anno in data["annotations"]: 88 | if ( 89 | "category_id" in anno 90 | and anno["category_id"] in kept_cat_ids 91 | and anno["image_id"] in valid_image_set 92 | ): 93 | kept_annot.append(anno) 94 | kept_image_id.add(anno["image_id"]) 95 | data["annotations"] = kept_annot 96 | data["images"] = [x for x in data["images"] if x["id"] in kept_image_id] 97 | else: 98 | kept_annot = [] 99 | kept_image_id = set() 100 | for anno in data["annotations"]: 101 | if anno["image_id"] in valid_image_set: 102 | kept_annot.append(anno) 103 | kept_image_id.add(anno["image_id"]) 104 | data["annotations"] = kept_annot 105 | data["images"] = [x for x in data["images"] if x["id"] in kept_image_id] 106 | 107 | return data 108 | 109 | 110 | class EgoObjects: 111 | def __init__( 112 | self, 113 | annotation_path: str, 114 | metadata: Metadata, 115 | filter_opts: Any = None, 116 | ): 117 | """ 118 | Args: 119 | annotation_path: location of annotation file 120 | """ 121 | logger.info(f"annotation_path {annotation_path}") 122 | self.metadata = metadata 123 | 124 | with open(annotation_path, "r") as f: 125 | self.dataset = json.load(f) 126 | 127 | # is_valid_video_id = self._valid_video_ids() 128 | # if not is_valid_video_id: 129 | # self._replace_video_ids() 130 | 131 | # filter the dataset accordingly 132 | self.dataset = filter_annot( 133 | self.dataset, 134 | metadata, 135 | filter_opts, 136 | ) 137 | 138 | assert ( 139 | type(self.dataset) == dict 140 | ), f"Annotation file format {type(self.dataset)} not supported." 141 | 142 | self.annotations = { 143 | "cat_det": [ 144 | deepcopy(ann) 145 | for ann in self.dataset["annotations"] 146 | if "category_id" in ann 147 | ], 148 | "inst_det": [ 149 | deepcopy(ann) 150 | for ann in self.dataset["annotations"] 151 | if "instance_id" in ann 152 | ], 153 | } 154 | 155 | logger.info(f"num cat det instances {len(self.annotations['cat_det'])}") 156 | logger.info(f"num inst det instances {len(self.annotations['inst_det'])}") 157 | 158 | self._create_index(metadata) 159 | 160 | def _valid_video_ids(self): 161 | """dummy check on whether video ids lie in existing video_id """ 162 | video_ids_in_setting = {"01", "02", "03", "04", "05", "06", "07", "08", "09", "10"} 163 | video_ids_in_dataset = set([img['video_id'] for img in self.dataset['images']]) 164 | return video_ids_in_dataset.issubset(video_ids_in_setting) 165 | 166 | def _replace_video_ids(self): 167 | """ 168 | To align the `video_id` for *ego-object dataset towards existing dataset. 
169 | Rules: 170 | {'1', '2', '3'} maps into {'01', '02', '03'}, 171 | while others {'V1', 'V2', 'V26', 'V28', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8'} map into {'04'} 172 | """ 173 | star_to_existing_vid_mapping = {'1': '01', '2': '02', '3': '03'} # hard-coded replacement for easy videos 174 | mapping_key_set = set(star_to_existing_vid_mapping.keys()) 175 | for img in self.dataset['images']: 176 | if img['video_id'] in mapping_key_set: 177 | img.update({'video_id': star_to_existing_vid_mapping[img['video_id']]}) 178 | else: 179 | img.update({'video_id':'04'}) # hard-coded replacement for complex videos 180 | 181 | logger.info("VideoID in *ego-object updated to match existing dataset.") 182 | return 183 | 184 | def _prepare_neg_instance_ids(self): 185 | img_id_2_instance_id = { 186 | img_id: {self.anns["inst_det"][ann_id]["instance_id"] for ann_id in ann_ids} 187 | for img_id, ann_ids in self.img_ann_map["inst_det"].items() 188 | } 189 | for img_id, img in self.imgs.items(): 190 | if img_id in img_id_2_instance_id: 191 | img["neg_instance_ids"] = self.instance_ids.difference( 192 | img_id_2_instance_id[img_id] 193 | ) 194 | else: 195 | img["neg_instance_ids"] = self.instance_ids 196 | 197 | def _prepare_neg_cat_ids(self): 198 | img_id_2_category_id = { 199 | img_id: {self.anns["cat_det"][ann_id]["category_id"] for ann_id in ann_ids} 200 | for img_id, ann_ids in self.img_ann_map["cat_det"].items() 201 | } 202 | for img_id, img in self.imgs.items(): 203 | if "neg_category_ids" in img: 204 | continue 205 | elif img_id in img_id_2_category_id: 206 | img["neg_category_ids"] = self.category_ids.difference( 207 | img_id_2_category_id[img_id] 208 | ) 209 | else: 210 | img["neg_category_ids"] = self.category_ids 211 | 212 | def _create_index(self, metadata): 213 | logger.info("Creating index.") 214 | 215 | self.cat_id_2_cat = {c["id"]: c for c in metadata.categories} 216 | self.imgs = {img["id"]: img for img in self.dataset["images"]} 217 | self.anns = defaultdict(dict) 218 | self.img_ann_map = {"cat_det": defaultdict(list), "inst_det": defaultdict(list)} 219 | for det_type, anns in self.annotations.items(): 220 | for ann in anns: 221 | self.anns[det_type][ann["id"]] = ann 222 | self.img_ann_map[det_type][ann["image_id"]].append(ann["id"]) 223 | 224 | logger.info( 225 | f"{det_type}, len img_ann_map {len(self.img_ann_map[det_type])}" 226 | ) 227 | 228 | # self.category_ids = {x["id"] for x in metadata.cat_det_cats} 229 | self.category_ids = {ann["category_id"] for ann in self.annotations["cat_det"]} 230 | self.instance_ids = {ann["instance_id"] for ann in self.annotations["inst_det"]} 231 | 232 | self._prepare_neg_instance_ids() 233 | self._prepare_neg_cat_ids() 234 | 235 | self.cats = { 236 | "cat_det": {c["id"]: c for c in metadata.cat_det_cats}, 237 | "inst_det": {c["id"]: c for c in metadata.inst_det_cats}, 238 | } 239 | 240 | self.classes = { 241 | "cat_det": {c["id"]: c for c in metadata.cat_det_cats}, 242 | "inst_det": {}, 243 | } 244 | for _i, ann in enumerate(self.annotations["inst_det"]): 245 | inst_id = ann["instance_id"] 246 | cat_id = ann["category_id"] if "category_id" in ann else None 247 | 248 | if inst_id not in self.classes["inst_det"]: 249 | inst_dict = {"id": inst_id} 250 | if cat_id is not None: 251 | if "frequency" in self.cat_id_2_cat[cat_id]: 252 | frequency = self.cat_id_2_cat[cat_id]["frequency"] 253 | else: 254 | # assign all sample to frequent group if not specified 255 | frequency = "frequent" 256 | inst_dict.update( 257 | { 258 | "category_id": cat_id, 259 | "frequency": 
frequency, 260 | } 261 | ) 262 | 263 | self.classes["inst_det"][inst_id] = inst_dict 264 | 265 | else: 266 | if cat_id is not None: 267 | assert self.classes["inst_det"][inst_id]["category_id"] == cat_id 268 | 269 | logger.info(f"num total images: {len(self.imgs)}") 270 | for det_type in ["cat_det", "inst_det"]: 271 | logger.info(f"num images for {det_type}: {len(self.img_ann_map[det_type])}") 272 | logger.info(f"num annotations for {det_type} {len(self.anns[det_type])}") 273 | logger.info( 274 | f"num object categories of {det_type} {len(self.cats[det_type])}" 275 | ) 276 | logger.info(f"num classes of {det_type} {len(self.classes[det_type])}") 277 | 278 | logger.info("Index created.") 279 | 280 | def get_img_ids(self): 281 | """Get all img ids. 282 | 283 | Returns: 284 | ids (int array): integer array of image ids 285 | """ 286 | return list(self.imgs.keys()) 287 | 288 | def get_class_ids(self, det_type: str): 289 | """Get all class ids for the given detection type. 290 | Args: 291 | det_type: detection type. Choices {"cat_det", "inst_det"} 292 | Returns: 293 | ids: integer array of class ids 294 | """ 295 | return list(self.classes[det_type].keys()) 296 | 297 | def get_ann_ids( 298 | self, 299 | det_type: str, 300 | img_ids: Optional[List[int]] = None, 301 | class_ids: Optional[List[int]] = None, 302 | ) -> List[int]: 303 | """Get ann ids that satisfy given filter conditions. 304 | 305 | Args: 306 | det_type: detection type. Choices {"cat_det", "inst_det"} 307 | img_ids: get anns for given imgs 308 | class_ids: get anns for given class ids, which are category ids for "cat_det" 309 | and instance ids for "inst_det" 310 | Returns: 311 | ids: integer array of ann ids 312 | """ 313 | assert det_type in self.img_ann_map 314 | anns = [] 315 | if img_ids is not None: 316 | for img_id in img_ids: 317 | if img_id in self.img_ann_map[det_type]: 318 | anns.extend( 319 | [ 320 | self.anns[det_type][ann_id] 321 | for ann_id in self.img_ann_map[det_type][img_id] 322 | ] 323 | ) 324 | else: 325 | anns = self.annotations[det_type] 326 | # return early if no more filtering required 327 | if class_ids is None: 328 | return [ann["id"] for ann in anns] 329 | 330 | class_ids = set(class_ids) 331 | 332 | ann_ids = [ 333 | _ann["id"] 334 | for _ann in anns 335 | if _ann["category_id" if det_type == "cat_det" else "instance_id"] 336 | in class_ids 337 | ] 338 | 339 | return ann_ids 340 | 341 | def _load_helper(self, _dict, ids): 342 | if ids is None: 343 | return list(_dict.values()) 344 | else: 345 | return [_dict[id] for id in ids] 346 | 347 | def load_anns(self, det_type: str, ids: Optional[List[int]] = None): 348 | """Load anns with the specified ids. If ids=None load all anns. 349 | 350 | Args: 351 | det_type: detection type. Choices {"cat_det", "inst_det"} 352 | ids: integer array of annotation ids 353 | 354 | Returns: 355 | anns: loaded annotation objects 356 | """ 357 | return self._load_helper(self.anns[det_type], ids) 358 | 359 | def load_classes(self, det_type: str, ids: Optional[List[int]] = None): 360 | """Load classes with the specified ids. 361 | If ids=None load all classes. 362 | 363 | Args: 364 | det_type: detection type. Choices {"cat_det", "inst_det"} 365 | ids: integer array of class ids 366 | 367 | Returns: 368 | classes: loaded class dicts 369 | """ 370 | return self._load_helper(self.classes[det_type], ids) 371 | 372 | def load_imgs(self, ids: Optional[List[int]] = None): 373 | """Load images with the specified ids. If ids=None load all images. 
374 | 375 | Args: 376 | ids: integer array of image ids 377 | 378 | Returns: 379 | imgs: loaded image dicts 380 | """ 381 | return self._load_helper(self.imgs, ids) 382 | 383 | 384 | class EgoObjectsMetaInfo: 385 | def __init__(self): 386 | self.video_id_to_setting = { 387 | "01": { 388 | "distance": "near", 389 | "camera motion": "horizontal", 390 | "background": "simple", 391 | "lighting": "bright", 392 | }, 393 | "02": { 394 | "distance": "medium", 395 | "camera motion": "horizontal", 396 | "background": "simple", 397 | "lighting": "bright", 398 | }, 399 | "03": { 400 | "distance": "near", 401 | "camera motion": "horizontal", 402 | "background": "simple", 403 | "lighting": "dim", 404 | }, 405 | "04": { 406 | "distance": "medium", 407 | "camera motion": "horizontal", 408 | "background": "busy", 409 | "lighting": "bright", 410 | }, 411 | "05": { 412 | "distance": "far", 413 | "camera motion": "horizontal", 414 | "background": "busy", 415 | "lighting": "bright", 416 | }, 417 | "06": { 418 | "distance": "medium", 419 | "camera motion": "vertical", 420 | "background": "busy", 421 | "lighting": "bright", 422 | }, 423 | "07": { 424 | "distance": "medium", 425 | "camera motion": "diagonal", 426 | "background": "busy", 427 | "lighting": "bright", 428 | }, 429 | "08": { 430 | "distance": "near", 431 | "camera motion": "horizontal", 432 | "background": "busy", 433 | "lighting": "dim", 434 | }, 435 | "09": { 436 | "distance": "medium", 437 | "camera motion": "horizontal", 438 | "background": "busy", 439 | "lighting": "dim", 440 | }, 441 | "10": { 442 | "distance": "far", 443 | "camera motion": "horizontal", 444 | "background": "busy", 445 | "lighting": "dim", 446 | }, 447 | } 448 | 449 | self.background = ["all", "simple", "busy"] 450 | self.lighting = ["all", "bright", "dim"] 451 | -------------------------------------------------------------------------------- /egoobjects_api/eval.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright (c) Meta Platforms, Inc. and affiliates. 3 | # All rights reserved. 4 | 5 | # This source code is licensed under the license found in the 6 | # LICENSE file in the root directory of this source tree. 7 | 8 | import datetime 9 | import logging 10 | import math 11 | 12 | import os 13 | import time 14 | from collections import defaultdict, OrderedDict 15 | from multiprocessing import Pool 16 | from typing import Dict, List, Optional, Set, Tuple 17 | 18 | import numpy as np 19 | import pycocotools.mask as mask_utils 20 | import torch 21 | # from egodet.metric.metric import RecallAtPrecision 22 | 23 | # from iopath.common.file_io import PathManager 24 | # from iopath.fb.manifold import ManifoldPathHandler 25 | 26 | from .egoobjects import EgoObjects, EgoObjectsMetaInfo 27 | 28 | from .results import EgoObjectsResults 29 | 30 | 31 | # pathmgr = PathManager() 32 | # pathmgr.register_handler(ManifoldPathHandler()) 33 | 34 | logger = logging.getLogger(__name__) 35 | 36 | 37 | def evaluate_img( 38 | det_type, 39 | img_id, 40 | class_id, 41 | area_ratio, 42 | background, 43 | lighting, 44 | difficulty, 45 | gt, 46 | dt, 47 | ious, 48 | params, 49 | img_nel, 50 | ): 51 | """Perform evaluation for a single class and image.""" 52 | if len(gt) == 0 and len(dt) == 0: 53 | return None 54 | 55 | # Add another field _ignore to only consider anns satisfying the constraints. 
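    # A gt ann is marked _ignore when it falls outside the slice being
    # evaluated (area_ratio / background / lighting / difficulty). Ignored
    # gt anns are sorted last below, and detections matched to them are
    # excluded from both the TP and FP counts, so each metric is computed
    # purely over its own slice of the data.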
56 | for g in gt: 57 | ignore = g["ignore"] 58 | if ignore == 0 and (area_ratio != "all" and area_ratio != g["area_ratio"]): 59 | ignore = 1 60 | if ignore == 0 and (background != "all" and background != g["background"]): 61 | ignore = 1 62 | if ignore == 0 and (lighting != "all" and lighting != g["lighting"]): 63 | ignore = 1 64 | if ignore == 0 and ( 65 | difficulty != "all" and "difficulty" in g and difficulty != g["difficulty"] 66 | ): 67 | ignore = 1 68 | g["_ignore"] = ignore 69 | 70 | # Sort gt ignore last 71 | gt_idx = np.argsort([g["_ignore"] for g in gt], kind="mergesort") 72 | gt = [gt[i] for i in gt_idx] 73 | # Sort dt highest score first 74 | dt_idx = np.argsort([-d["score"] for d in dt], kind="mergesort") 75 | dt = [dt[i] for i in dt_idx] 76 | # load computed ious 77 | ious = ious[:, gt_idx] if len(ious) > 0 else ious 78 | 79 | num_thrs = len(params.iou_thrs) 80 | num_gt = len(gt) 81 | num_dt = len(dt) 82 | # Array to store the "id" of the matched dt/gt 83 | gt_m = np.zeros((num_thrs, num_gt)) 84 | dt_m = np.zeros((num_thrs, num_dt)) 85 | 86 | gt_ig = np.array([g["_ignore"] for g in gt]) 87 | dt_ig = np.zeros((num_thrs, num_dt)) 88 | 89 | for iou_thr_idx, iou_thr in enumerate(params.iou_thrs): 90 | if len(ious) == 0: 91 | break 92 | 93 | for dt_idx, _dt in enumerate(dt): 94 | iou = min([iou_thr, 1 - 1e-10]) 95 | # information about best match so far (m=-1 -> unmatched) 96 | # store the gt_idx which matched for _dt 97 | m = -1 98 | for gt_idx, _ in enumerate(gt): 99 | # if this gt already matched continue 100 | if gt_m[iou_thr_idx, gt_idx] > 0: 101 | continue 102 | # if _dt matched to reg gt, and on ignore gt, stop 103 | if m > -1 and gt_ig[m] == 0 and gt_ig[gt_idx] == 1: 104 | break 105 | # continue to next gt unless better match made 106 | if ious[dt_idx, gt_idx] < iou: 107 | continue 108 | # if match successful and best so far, store appropriately 109 | iou = ious[dt_idx, gt_idx] 110 | m = gt_idx 111 | 112 | # No match found for _dt, go to next _dt 113 | if m == -1: 114 | continue 115 | 116 | # if gt to ignore for some reason update dt_ig. 117 | # Should not be used in evaluation. 118 | dt_ig[iou_thr_idx, dt_idx] = gt_ig[m] 119 | # _dt match found, update gt_m, and dt_m with "id" 120 | dt_m[iou_thr_idx, dt_idx] = gt[m]["id"] 121 | gt_m[iou_thr_idx, m] = _dt["id"] 122 | 123 | # We will ignore any unmatched detection if that category was 124 | # not exhaustively annotated in gt. 
125 | class_id_key = "category_id" if det_type == "cat_det" else "instance_id" 126 | # dt_ig_mask = [ 127 | # d["area"] < area_rng[0] 128 | # or d["area"] > area_rng[1] 129 | # or d[class_id_key] in img_nel[d["image_id"]] 130 | # for d in dt 131 | # ] 132 | dt_ig_mask = [ 133 | d[class_id_key] in img_nel[d["image_id"]] 134 | or (area_ratio != "all" and d["area_ratio"] != area_ratio) 135 | or (background != "all" and d["background"] != background) 136 | or (lighting != "all" and d["lighting"] != lighting) 137 | or (difficulty != "all" and "difficulty" in d and d["difficulty"] != difficulty) 138 | for d in dt 139 | ] 140 | dt_ig_mask = np.array(dt_ig_mask).reshape((1, num_dt)) # 1 X num_dt 141 | dt_ig_mask = np.repeat(dt_ig_mask, num_thrs, 0) # num_thrs X num_dt 142 | # Based on dt_ig_mask ignore any unmatched detection by updating dt_ig 143 | dt_ig = np.logical_or(dt_ig, np.logical_and(dt_m == 0, dt_ig_mask)) 144 | 145 | # store results for given image and category 146 | return { 147 | "dt_ids": [d["id"] for d in dt], 148 | "dt_matches": dt_m, 149 | "dt_scores": [d["score"] for d in dt], 150 | "gt_ignore": gt_ig, 151 | "dt_ignore": dt_ig, 152 | "config": (img_id, class_id, area_ratio, background, lighting, difficulty), 153 | } 154 | 155 | 156 | class EgoObjectsEval: 157 | def __init__( 158 | self, 159 | gt: EgoObjects, 160 | dt: EgoObjectsResults, 161 | num_processes: int = 1, 162 | max_dets: int = 100, 163 | eval_type: Tuple[str] = ("cat_det", "inst_det"), 164 | ): 165 | """ 166 | Args: 167 | gt: ground truth 168 | dt: detection results 169 | num_processes: If 0, use single main process. If >0, use multiprocessing.Pool to 170 | do evaluate() using threads 171 | max_dets: maximal detections per image 172 | """ 173 | assert num_processes >= 0 174 | 175 | self.gt = gt 176 | self.dt = dt 177 | self.num_processes = num_processes 178 | self.eval_type = eval_type 179 | 180 | self.eval_imgs = {} 181 | self.eval = {} 182 | self.gts = {} 183 | self.dts = {} 184 | self.results = {} 185 | self.ious = {} 186 | self.params = { 187 | "cat_det": CategoryDetectionParams(max_dets=max_dets), 188 | "inst_det": InstanceDetectionParams(max_dets=max_dets), 189 | } 190 | self.meta = EgoObjectsMetaInfo() 191 | self.freq_groups = {} 192 | self.img_nel = {} 193 | for det_type in self.eval_type: 194 | # per-image per-category evaluation results 195 | self.eval_imgs[det_type] = None 196 | self.eval[det_type] = {} # accumulated evaluation results 197 | self.gts[det_type] = defaultdict(list) # gt for evaluation 198 | self.dts[det_type] = defaultdict(list) # dt for evaluation 199 | self.results[det_type] = OrderedDict() 200 | self.ious[det_type] = {} # ious between all gts and dts 201 | 202 | self.params[det_type].img_ids = sorted(self.gt.get_img_ids()) 203 | self.params[det_type].class_ids = sorted(self.gt.get_class_ids(det_type)) 204 | 205 | logger.info( 206 | f"{det_type}, num class ids {len(self.params[det_type].class_ids)}" 207 | ) 208 | 209 | def run(self, metric_filter): 210 | unique_metrics = {} 211 | for det_type in self.eval_type: 212 | unique_metrics = set( 213 | [ 214 | f"ar{x['ar']}-bg{x['bg']}-lt{x['lt']}-df{x['df']}" 215 | for x in metric_filter[det_type] 216 | ] 217 | ) 218 | self.evaluate(det_type, unique_metrics) 219 | self.accumulate(det_type, unique_metrics) 220 | self.summarize(det_type, metric_filter[det_type]) 221 | 222 | def evaluate( 223 | self, det_type: str, unique_metrics: Set[str], multiprocessing: bool = False 224 | ): 225 | logger.info(f"Running per image evaluation for {det_type}.") 226 | 
227 | start_time = time.time() 228 | 229 | class_ids = self.params[det_type].class_ids 230 | self._prepare(det_type) 231 | 232 | self.ious[det_type] = { 233 | (img_id, class_id): self.compute_iou(det_type, img_id, class_id) 234 | for img_id in self.params[det_type].img_ids 235 | for class_id in class_ids 236 | } 237 | 238 | logger.info(f"num_processes {self.num_processes}") 239 | 240 | # loop through images, area range, max detection number 241 | arg_tuples = [] 242 | for class_id in class_ids: 243 | for area_ratio in self.params[det_type].area_rng_lbl: 244 | for bg in self.meta.background: 245 | for lt in self.meta.lighting: 246 | for df in self.params[det_type].difficulty: 247 | for img_id in self.params[det_type].img_ids: 248 | metric_tag = f"ar{area_ratio}-bg{bg}-lt{lt}-df{df}" 249 | if metric_tag in unique_metrics: 250 | arg_tuples.append( 251 | ( 252 | det_type, 253 | img_id, 254 | class_id, 255 | area_ratio, 256 | bg, 257 | lt, 258 | df, 259 | self.gts[det_type][img_id, class_id], 260 | self.dts[det_type][img_id, class_id], 261 | self.ious[det_type][img_id, class_id], 262 | self.params[det_type], 263 | self.img_nel[det_type], 264 | ) 265 | ) 266 | 267 | if self.num_processes > 1: 268 | with Pool(self.num_processes) as pool: 269 | self.eval_imgs[det_type] = pool.starmap(evaluate_img, arg_tuples) 270 | else: 271 | self.eval_imgs[det_type] = [evaluate_img(*x) for x in arg_tuples] 272 | 273 | elapsed_time = time.time() - start_time 274 | logger.info(f"Elapsed time of {det_type} evaluate(): {elapsed_time:.2f} sec") 275 | 276 | def accumulate(self, det_type: str, unique_metrics: Set[str]): 277 | """Accumulate per image evaluation results and store the result in 278 | self.eval[det_type]. 279 | """ 280 | logger.info(f"Accumulating evaluation results for {det_type}.") 281 | if self.eval_imgs[det_type] is None: 282 | logger.warning(f"Please run evaluate('{det_type}') first.") 283 | 284 | class_ids = self.params[det_type].class_ids 285 | cls_id_2_idx = {x: i for i, x in enumerate(class_ids)} 286 | 287 | num_thrs = len(self.params[det_type].iou_thrs) 288 | num_recalls = len(self.params[det_type].rec_thrs) 289 | num_classes = len(class_ids) 290 | num_area_rngs = len(self.params[det_type].area_rng) 291 | num_backgrounds = len(self.meta.background) 292 | num_lightings = len(self.meta.lighting) 293 | num_difficulties = len(self.params[det_type].difficulty) 294 | 295 | # -1 for absent classes 296 | precision = -np.ones( 297 | ( 298 | num_thrs, 299 | num_recalls, 300 | num_classes, 301 | num_area_rngs, 302 | num_backgrounds, 303 | num_lightings, 304 | num_difficulties, 305 | ) 306 | ) 307 | recall = -np.ones( 308 | ( 309 | num_thrs, 310 | num_classes, 311 | num_area_rngs, 312 | num_backgrounds, 313 | num_lightings, 314 | num_difficulties, 315 | ) 316 | ) 317 | # recall_at_precision = -np.ones( 318 | # ( 319 | # num_thrs, 320 | # num_classes, 321 | # num_area_rngs, 322 | # num_backgrounds, 323 | # num_lightings, 324 | # num_difficulties, 325 | # ) 326 | # ) 327 | # recall_at_precision_metric = RecallAtPrecision( 328 | # self.params[det_type].recall_at_precision_k 329 | # ) 330 | 331 | # Initialize dt_pointers 332 | dt_pointers = {} 333 | for cls_idx in range(num_classes): 334 | dt_pointers[cls_idx] = {} 335 | for area_idx in range(num_area_rngs): 336 | dt_pointers[cls_idx][area_idx] = {} 337 | for bg_idx in range(num_backgrounds): 338 | dt_pointers[cls_idx][area_idx][bg_idx] = {} 339 | for lt_idx in range(num_lightings): 340 | dt_pointers[cls_idx][area_idx][bg_idx][lt_idx] = {} 341 | for df_idx in 
range(num_difficulties): 342 | dt_pointers[cls_idx][area_idx][bg_idx][lt_idx][df_idx] = {} 343 | 344 | results = defaultdict(list) 345 | for res in self.eval_imgs[det_type]: 346 | if res is not None: 347 | img_id, class_id, area, background, lighting, difficulty = res["config"] 348 | cls_idx = cls_id_2_idx[class_id] 349 | area_idx = self.params[det_type].area_rng_lbl.index(area) 350 | bg_idx = self.meta.background.index(background) 351 | lt_idx = self.meta.lighting.index(lighting) 352 | df_idx = self.params[det_type].difficulty.index(difficulty) 353 | results[(cls_idx, area_idx, bg_idx, lt_idx, df_idx)].append( 354 | (img_id, res) 355 | ) 356 | 357 | for config, E in results.items(): 358 | cls_idx, area_idx, bg_idx, lt_idx, df_idx = config 359 | E = [x[1] for x in E] 360 | 361 | # Append all scores: shape (N,) 362 | dt_scores = np.concatenate([e["dt_scores"] for e in E], axis=0) 363 | dt_ids = np.concatenate([e["dt_ids"] for e in E], axis=0) 364 | 365 | dt_idx = np.argsort(-dt_scores, kind="mergesort") 366 | dt_scores = dt_scores[dt_idx] 367 | dt_ids = dt_ids[dt_idx] 368 | 369 | dt_m = np.concatenate([e["dt_matches"] for e in E], axis=1)[:, dt_idx] 370 | dt_ig = np.concatenate([e["dt_ignore"] for e in E], axis=1)[:, dt_idx] 371 | 372 | gt_ig = np.concatenate([e["gt_ignore"] for e in E]) 373 | # num gt anns to consider 374 | num_gt = np.count_nonzero(gt_ig == 0) 375 | 376 | if num_gt == 0: 377 | continue 378 | 379 | tps = np.logical_and(dt_m, np.logical_not(dt_ig)) 380 | fps = np.logical_and(np.logical_not(dt_m), np.logical_not(dt_ig)) 381 | 382 | tp_sum = np.cumsum(tps, axis=1).astype(dtype=float) 383 | fp_sum = np.cumsum(fps, axis=1).astype(dtype=float) 384 | 385 | dt_pointers[cls_idx][area_idx][bg_idx][lt_idx][df_idx] = { 386 | "dt_ids": dt_ids, 387 | "tps": tps, 388 | "fps": fps, 389 | } 390 | 391 | for iou_thr_idx, (tp, fp) in enumerate(zip(tp_sum, fp_sum)): 392 | tp = np.array(tp) 393 | fp = np.array(fp) 394 | num_tp = len(tp) 395 | rc = tp / num_gt 396 | if num_tp: 397 | recall[iou_thr_idx, cls_idx, area_idx, bg_idx, lt_idx, df_idx] = rc[ 398 | -1 399 | ] 400 | else: 401 | recall[iou_thr_idx, cls_idx, area_idx, bg_idx, lt_idx, df_idx] = 0 402 | 403 | # np.spacing(1) ~= eps 404 | pr = tp / (fp + tp + np.spacing(1)) 405 | pr = pr.tolist() 406 | 407 | # Replace each precision value with the maximum precision 408 | # value to the right of that recall level. This ensures 409 | # that the calculated AP value will be less susceptible 410 | 
411 | for i in range(num_tp - 1, 0, -1): 412 | if pr[i] > pr[i - 1]: 413 | pr[i - 1] = pr[i] 414 | 415 | rec_thrs_insert_idx = np.searchsorted( 416 | rc, self.params[det_type].rec_thrs, side="left" 417 | ) 418 | 419 | pr_at_recall = [0.0] * num_recalls 420 | 421 | # we need to use "try-except" clause because for some high recall threshold, 422 | # the "pr_idx" == len(pr) 423 | try: 424 | for _idx, pr_idx in enumerate(rec_thrs_insert_idx): 425 | pr_at_recall[_idx] = pr[pr_idx] 426 | except BaseException: 427 | pass 428 | precision[ 429 | iou_thr_idx, :, cls_idx, area_idx, bg_idx, lt_idx, df_idx 430 | ] = np.array(pr_at_recall) 431 | # Compute recall_at_precision below 432 | dt_ig_i = dt_ig[iou_thr_idx] 433 | dt_scores_i = dt_scores[np.logical_not(dt_ig_i)] 434 | dt_m_i = dt_m[iou_thr_idx, np.logical_not(dt_ig_i)] 435 | dt_true_i = np.greater(dt_m_i, 0) 436 | 437 | # recall_at_precision_metric.reset_state() 438 | # recall_at_precision_metric.update_state( 439 | # torch.from_numpy(dt_true_i).to(torch.float32), 440 | # torch.from_numpy(dt_scores_i).to(torch.float32), 441 | # num_gt, 442 | # ) 443 | # res = recall_at_precision_metric.result() 444 | # recall_at_precision[ 445 | # iou_thr_idx, cls_idx, area_idx, bg_idx, lt_idx, df_idx 446 | # ] = res 447 | 448 | self.eval[det_type] = { 449 | "params": self.params[det_type], 450 | "counts": [ 451 | num_thrs, 452 | num_recalls, 453 | num_classes, 454 | num_area_rngs, 455 | num_backgrounds, 456 | num_lightings, 457 | ], 458 | "date": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), 459 | "precision": precision, 460 | "recall": recall, 461 | "dt_pointers": dt_pointers, 462 | # "recall_at_precision": recall_at_precision, 463 | } 464 | 465 | def _summarize( 466 | self, 467 | det_type: str, 468 | summary_type: str, 469 | iou_thr: Optional[List[float]] = None, 470 | area_rng: str = "all", 471 | background: str = "all", 472 | lighting: str = "all", 473 | difficulty: str = "all", 474 | group: Optional[str] = None, 475 | topk=None, 476 | ): 477 | group_idx = None 478 | if group is not None and group != "all": 479 | if det_type == "cat_det": 480 | freq_group_idx = self.params[det_type].img_count_lbl.index(group) 481 | group_idx = self.freq_groups[det_type][freq_group_idx] 482 | else: 483 | assert group in {"seen", "unseen"} 484 | if group == "seen": 485 | group_idx = self.inst_det_seen_unseen_cat_groups[0] 486 | else: 487 | group_idx = self.inst_det_seen_unseen_cat_groups[1] 488 | 489 | aidx = np.array( 490 | [ 491 | idx 492 | for idx, _area_rng in enumerate(self.params[det_type].area_rng_lbl) 493 | if _area_rng == area_rng 494 | ] 495 | ) 496 | bidx = np.array( 497 | [ 498 | idx 499 | for idx, _background in enumerate(self.meta.background) 500 | if _background == background 501 | ] 502 | ) 503 | lidx = np.array( 504 | [ 505 | idx 506 | for idx, _lighting in enumerate(self.meta.lighting) 507 | if _lighting == lighting 508 | ] 509 | ) 510 | didx = np.array( 511 | [ 512 | idx 513 | for idx, _difficulty in enumerate(self.params[det_type].difficulty) 514 | if _difficulty == difficulty 515 | ] 516 | ) 517 | 518 | for idx in [aidx, bidx, lidx, didx]: 519 | if idx.size <= 0: 520 | return -1 521 | 522 | tidx = None 523 | if iou_thr is not None: 524 | iou_thr_to_idx = { 525 | x: i for i, x in enumerate(self.params[det_type].iou_thrs) 526 | } 527 | tidx = np.array([iou_thr_to_idx[x] for x in iou_thr]).astype(np.int64) 528 | 529 | if summary_type == "ap": 530 | s = self.eval[det_type]["precision"] 531 | if tidx is not None: 532 | s = s[tidx] 533 | if group_idx is 
not None: 534 | s = s[:, :, group_idx, aidx, bidx, lidx, didx] 535 | else: 536 | s = s[:, :, :, aidx, bidx, lidx, didx] 537 | elif summary_type == "ar": 538 | s = self.eval[det_type]["recall"] 539 | if tidx is not None: 540 | s = s[tidx] 541 | s = s[:, :, aidx, bidx, lidx, didx] 542 | elif summary_type == "r@p": 543 | s = self.eval[det_type]["recall_at_precision"] 544 | if tidx is not None: 545 | s = s[tidx] 546 | if group_idx is not None: 547 | s = s[:, group_idx, aidx, bidx, lidx, didx] 548 | else: 549 | s = s[:, :, aidx, bidx, lidx, didx] 550 | if topk is not None: 551 | sorted_s = -np.sort(-s, axis=1) 552 | s = sorted_s[:, :topk] 553 | else: 554 | raise ValueError(f"unknown summary type {summary_type}") 555 | 556 | if len(s[s > -1]) == 0: 557 | mean_s = -1 558 | else: 559 | mean_s = np.mean(s[s > -1]) 560 | return mean_s 561 | 562 | def summarize(self, det_type: str, metric_filter: List[Dict]): 563 | if not self.eval[det_type]: 564 | raise RuntimeError("Please run accumulate() first.") 565 | 566 | logger.info(f"Summarize detection results for {det_type}.") 567 | 568 | max_dets = self.params[det_type].max_dets 569 | coco_iou_thres = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95] 570 | coco_iou_thres_str = f"{coco_iou_thres[0]:0.2f}:{coco_iou_thres[-1]:0.2f}" 571 | 572 | results = self.results[det_type] 573 | 574 | for metric in metric_filter: 575 | key = "AP" 576 | for k in ["iou", "gr", "ar", "bg", "lt", "df"]: 577 | if k == "iou": 578 | if metric[k] != "coco": 579 | key = key + metric[k] 580 | elif metric[k] != "all": 581 | key = key + "-" + f"{k}{metric[k]}" 582 | 583 | results[key] = { 584 | "title": "AP", 585 | "iou_thres": coco_iou_thres_str 586 | if metric["iou"] == "coco" 587 | else int(metric["iou"]) / 100.0, 588 | "area_rng": metric["ar"], 589 | "background": metric["bg"], 590 | "lighting": metric["lt"], 591 | "difficulty": metric["df"], 592 | "cat_group_name": metric["gr"], 593 | "value": self._summarize( 594 | det_type, 595 | "ap", 596 | iou_thr=coco_iou_thres 597 | if metric["iou"] == "coco" 598 | else [int(metric["iou"]) / 100.0], 599 | group=metric["gr"], 600 | area_rng=metric["ar"], 601 | background=metric["bg"], 602 | lighting=metric["lt"], 603 | difficulty=metric["df"], 604 | ), 605 | } 606 | 607 | # for metric in metric_filter: 608 | # key = f"R@P{int(self.params[det_type].recall_at_precision_k * 100):02d}" 609 | # for k in ["iou", "gr", "ar", "bg", "lt", "df"]: 610 | # if k == "iou": 611 | # if metric[k] != "coco": 612 | # key = key + "-" + metric[k] 613 | # elif metric[k] != "all": 614 | # key = key + "-" + f"{k}{metric[k]}" 615 | 616 | # results[key] = { 617 | # "title": "R@P", 618 | # "iou_thres": coco_iou_thres_str 619 | # if metric["iou"] == "coco" 620 | # else int(metric["iou"]) / 100.0, 621 | # "area_rng": metric["ar"], 622 | # "background": metric["bg"], 623 | # "lighting": metric["lt"], 624 | # "difficulty": metric["df"], 625 | # "cat_group_name": metric["gr"], 626 | # "value": self._summarize( 627 | # det_type, 628 | # "r@p", 629 | # iou_thr=coco_iou_thres 630 | # if metric["iou"] == "coco" 631 | # else [int(metric["iou"]) / 100.0], 632 | # group=metric["gr"], 633 | # area_rng=metric["ar"], 634 | # background=metric["bg"], 635 | # lighting=metric["lt"], 636 | # difficulty=metric["df"], 637 | # ), 638 | # } 639 | 640 | key = f"AR50@{max_dets}" 641 | results[key] = { 642 | "title": "AR", 643 | "iou_thres": 0.5, 644 | "area_rng": "all", 645 | "background": "all", 646 | "lighting": "all", 647 | "cat_group_name": "all", 648 | "value": 
self._summarize(det_type, "ar", iou_thr=[0.5]), 649 | } 650 | 651 | def print_results(self): 652 | for det_type in self.eval_type: 653 | logger.info(f"print results for {det_type}") 654 | self._print_results(det_type) 655 | 656 | def _print_results(self, det_type: str): 657 | template = " {:<12} @[ IoU={:<9} | area={:>6s} | background={:>6s} | lighting={:>6s} | maxDets={:>3d} catIds={:>3s}] = {:0.3f}" 658 | 659 | for _key, value in self.results[det_type].items(): 660 | max_dets = self.params[det_type].max_dets 661 | 662 | logger.info( 663 | template.format( 664 | value["title"], 665 | value["iou_thres"], 666 | value["area_rng"], 667 | value["background"], 668 | value["lighting"], 669 | max_dets, 670 | value["cat_group_name"] + f"-top{value.get('topk', '')}", 671 | value["value"], 672 | ) 673 | ) 674 | 675 | def _get_gt_dt(self, det_type, img_id, class_id): 676 | """Create gt, dt which are list of anns/dets.""" 677 | gt = self.gts[det_type][img_id, class_id] 678 | dt = self.dts[det_type][img_id, class_id] 679 | return gt, dt 680 | 681 | def compute_iou(self, det_type: str, img_id: int, class_id: int): 682 | gt, dt = self._get_gt_dt(det_type, img_id, class_id) 683 | 684 | if len(gt) == 0 and len(dt) == 0: 685 | return [] 686 | 687 | # Sort detections in decreasing order of score. 688 | idx = np.argsort([-d["score"] for d in dt], kind="mergesort") 689 | dt = [dt[i] for i in idx] 690 | 691 | ann_type = "bbox" 692 | gt = [g[ann_type] for g in gt] 693 | dt = [d[ann_type] for d in dt] 694 | 695 | # compute iou between each dt and gt region 696 | # will return array of shape len(dt), len(gt) 697 | iscrowd = [int(False)] * len(gt) 698 | ious = mask_utils.iou(dt, gt, iscrowd) 699 | return ious 700 | 701 | def _prepare(self, det_type: str): 702 | img_ids = self.params[det_type].img_ids 703 | class_ids = self.params[det_type].class_ids 704 | 705 | logger.info(f"{det_type}, len params img_ids {len(img_ids)}") 706 | logger.info(f"{det_type}, len params class_ids {len(class_ids)}") 707 | 708 | gts = self.gt.load_anns( 709 | det_type, 710 | self.gt.get_ann_ids(det_type, img_ids=img_ids, class_ids=class_ids), 711 | ) 712 | dts = self.dt.load_anns( 713 | det_type, 714 | self.dt.get_ann_ids(det_type, img_ids=img_ids, class_ids=class_ids), 715 | ) 716 | 717 | logger.info(f"{det_type}, len gts {len(gts)}") 718 | logger.info(f"{det_type}, len dts {len(dts)}") 719 | 720 | # set ignore flag 721 | for gt in gts: 722 | if "ignore" not in gt: 723 | gt["ignore"] = 0 724 | 725 | for gt in gts: 726 | class_key = "category_id" if det_type == "cat_det" else "instance_id" 727 | self.gts[det_type][gt["image_id"], gt[class_key]].append(gt) 728 | 729 | # associate image meta info with each gt 730 | img = self.gt.imgs[gt["image_id"]] 731 | img_meta = self.meta.video_id_to_setting[img["video_id"]] 732 | for key in ["background", "lighting"]: 733 | gt[key] = img_meta[key] 734 | 735 | area_ratio = math.sqrt(gt["area"] / (img["height"] * img["width"])) 736 | if area_ratio < 0.1: 737 | gt["area_ratio"] = "small" 738 | elif area_ratio < 0.2: 739 | gt["area_ratio"] = "medium" 740 | else: 741 | gt["area_ratio"] = "large" 742 | 743 | if det_type == "inst_det": 744 | # [Easy] -- register and detect on simple 745 | # [Medium] -- register on busy, detect on simple; register and detect on busy 746 | # [Hard] -- register on simple, detect on busy 747 | instance_id = gt["instance_id"] 748 | # the background for the instance that's used for registration 749 | if ( 750 | hasattr(self.gt.metadata, "instance_register_bg") 751 | and instance_id 
in self.gt.metadata.instance_register_bg 752 | ): 753 | register_bg = self.gt.metadata.instance_register_bg[instance_id] 754 | query_bg = self.meta.video_id_to_setting[img["video_id"]][ 755 | "background" 756 | ] 757 | if register_bg == "simple" and query_bg == "simple": 758 | gt["difficulty"] = "easy" 759 | elif register_bg == "simple" and query_bg == "busy": 760 | gt["difficulty"] = "hard" 761 | else: 762 | gt["difficulty"] = "medium" 763 | else: 764 | logger.warning(f"instance_id={instance_id} is not registered!") 765 | 766 | # For federated dataset evaluation we will filter out all dt for an 767 | # image which belong to classes not present in gt and not present in 768 | # the negative list for an image. In other words, the detector is not 769 | # penalized for classes for which we have no gt information about 770 | # their presence or absence in an image. 771 | img_data = self.gt.load_imgs(ids=self.params[det_type].img_ids) 772 | # per image map of classes not present in image 773 | neg_class_ids_key = ( 774 | "neg_category_ids" if det_type == "cat_det" else "neg_instance_ids" 775 | ) 776 | img_nl = {d["id"]: d[neg_class_ids_key] for d in img_data} 777 | # per image list of classes present in image 778 | img_pl = defaultdict(set) 779 | class_id_key = "category_id" if det_type == "cat_det" else "instance_id" 780 | for ann in gts: 781 | img_pl[ann["image_id"]].add(ann[class_id_key]) 782 | # per image map of classes which have missing gt. For these 783 | # classes we don't penalize the detector for false positives. 784 | self.img_nel[det_type] = { 785 | d["id"]: d["not_exhaustive_category_ids"] 786 | if det_type == "cat_det" and "not_exhaustive_category_ids" in d 787 | else [] 788 | for d in img_data 789 | } 790 | 791 | for dt in dts: 792 | img_id, class_id = dt["image_id"], dt[class_id_key] 793 | 794 | if class_id not in img_nl[img_id] and class_id not in img_pl[img_id]: 795 | continue 796 | 797 | # associate image meta info with each dt 798 | img = self.gt.imgs[dt["image_id"]] 799 | img_meta = self.meta.video_id_to_setting[img["video_id"]] 800 | for key in ["background", "lighting"]: 801 | dt[key] = img_meta[key] 802 | 803 | area_ratio = math.sqrt(dt["area"] / (img["height"] * img["width"])) 804 | if area_ratio < 0.1: 805 | dt["area_ratio"] = "small" 806 | elif area_ratio < 0.2: 807 | dt["area_ratio"] = "medium" 808 | else: 809 | dt["area_ratio"] = "large" 810 | 811 | if det_type == "inst_det": 812 | # [Easy] -- register and detect on simple 813 | # [Medium] -- register on busy, detect on simple; register and detect on busy 814 | # [Hard] -- register on simple, detect on busy 815 | instance_id = dt["instance_id"] 816 | # the background for the instance that's used for registration 817 | if ( 818 | hasattr(self.gt.metadata, "instance_register_bg") 819 | and instance_id in self.gt.metadata.instance_register_bg 820 | ): 821 | register_bg = self.gt.metadata.instance_register_bg[instance_id] 822 | query_bg = self.meta.video_id_to_setting[img["video_id"]][ 823 | "background" 824 | ] 825 | if register_bg == "simple" and query_bg == "simple": 826 | dt["difficulty"] = "easy" 827 | elif register_bg == "simple" and query_bg == "busy": 828 | dt["difficulty"] = "hard" 829 | else: 830 | dt["difficulty"] = "medium" 831 | else: 832 | logger.warning(f"instance_id={instance_id} is not registered!") 833 | self.dts[det_type][img_id, class_id].append(dt) 834 | 835 | self.freq_groups[det_type] = self._prepare_freq_group(det_type) 836 | 837 | if det_type == "inst_det": 838 | 
self.inst_det_seen_unseen_cat_groups = ( 839 | self._prepare_seen_unseen_cat_groups() 840 | ) 841 | 842 | def _prepare_freq_group(self, det_type: str): 843 | freq_groups = [[] for _ in self.params[det_type].img_count_lbl] 844 | class_data = self.gt.load_classes(det_type, self.params[det_type].class_ids) 845 | for idx, _class_data in enumerate(class_data): 846 | if "frequency" in _class_data: 847 | frequency = _class_data["frequency"] 848 | else: 849 | # assign all samples to the frequent group if not specified 850 | frequency = "frequent" 851 | freq_groups[self.params[det_type].img_count_lbl.index(frequency)].append( 852 | idx 853 | ) 854 | 855 | return freq_groups 856 | 857 | def _prepare_seen_unseen_cat_groups(self): 858 | det_type = "inst_det" 859 | # 2 groups in total, including "seen" and "unseen" groups 860 | seen_unseen_groups = [[], []] 861 | class_data = self.gt.load_classes(det_type, self.params[det_type].class_ids) 862 | 863 | logger.info(f"num cat det categories {len(self.gt.classes['cat_det'])}") 864 | 865 | for idx, _class_data in enumerate(class_data): 866 | # Object categories considered by category detection are the categories 867 | # common to the train and val splits. 868 | group_id = ( 869 | 0 870 | if "category_id" in _class_data 871 | and _class_data["category_id"] in self.gt.classes["cat_det"] 872 | else 1 873 | ) 874 | seen_unseen_groups[group_id].append(idx) 875 | 876 | for group_id, group in enumerate(seen_unseen_groups): 877 | logger.info(f"seen/unseen group_id {group_id}, group size {len(group)}") 878 | 879 | return seen_unseen_groups 880 | 881 | def get_results(self): 882 | return {det_type: self._get_results(det_type) for det_type in self.eval_type} 883 | 884 | def _get_results(self, det_type: str): 885 | if len(self.results[det_type]) == 0: 886 | logger.warning(f"{det_type} results is empty. 
Call run().") 887 | return self.results[det_type] 888 | 889 | def log_per_class_results(self, output_dir): 890 | if output_dir: 891 | det_type = "cat_det" 892 | iou_thres = 0.5 893 | rec_at_prec_k = int(self.params[det_type].recall_at_precision_k * 100) 894 | recall_at_prec = self.eval[det_type]["recall_at_precision"] 895 | precision = self.eval[det_type]["precision"] 896 | 897 | for area_rng in self.params[det_type].area_rng_lbl: 898 | aidx = [ 899 | idx 900 | for idx, _area_rng in enumerate(self.params[det_type].area_rng_lbl) 901 | if _area_rng == area_rng 902 | ][0] 903 | tidx = np.where(iou_thres == self.params[det_type].iou_thrs)[0].item() 904 | bidx = [ 905 | idx 906 | for idx, _background in enumerate(self.meta.background) 907 | if _background == "all" 908 | ][0] 909 | lidx = [ 910 | idx 911 | for idx, _lighting in enumerate(self.meta.lighting) 912 | if _lighting == "all" 913 | ][0] 914 | didx = [ 915 | idx 916 | for idx, _difficulty in enumerate(self.params[det_type].difficulty) 917 | if _difficulty == "all" 918 | ][0] 919 | 920 | # print per-category R@P stats 921 | cur_recall_at_prec = recall_at_prec[ 922 | tidx, :, aidx, bidx, lidx, didx 923 | ].reshape(-1) 924 | sort_idx = np.argsort(-cur_recall_at_prec) 925 | 926 | lines = [] 927 | for idx in sort_idx.tolist(): 928 | r_at_p = cur_recall_at_prec[idx] 929 | cat = self.gt.cats[det_type][self.params[det_type].class_ids[idx]] 930 | lines.append( 931 | "{},{},{},{:0.2f}".format( 932 | cat["name"], 933 | cat["image_count"] if "image_count" in cat else 0, 934 | cat["instance_count"] if "instance_count" in cat else 0, 935 | r_at_p, 936 | ) 937 | ) 938 | 939 | key = "R@P{}-{}-{}.csv".format( 940 | rec_at_prec_k, int(iou_thres * 100), area_rng 941 | ) 942 | with open(os.path.join(output_dir, key), "w") as h: 943 | h.write("\n".join(lines)) 944 | 945 | # print per-category AP50 stats 946 | cur_precision = precision[tidx, :, :, aidx, bidx, lidx, didx] 947 | cur_precision = np.mean(cur_precision, axis=0) 948 | sort_idx = np.argsort(-cur_precision) 949 | 950 | lines = [] 951 | for idx in sort_idx.tolist(): 952 | ap50 = cur_precision[idx] 953 | cat = self.gt.cats[det_type][self.params[det_type].class_ids[idx]] 954 | lines.append( 955 | "{},{},{},{:0.2f}".format( 956 | cat["name"], 957 | cat["image_count"] if "image_count" in cat else 0, 958 | cat["instance_count"] if "instance_count" in cat else 0, 959 | ap50, 960 | ) 961 | ) 962 | 963 | key = "AP{}-{}.csv".format(int(iou_thres * 100), area_rng) 964 | with open(os.path.join(output_dir, key), "w") as h: 965 | h.write("\n".join(lines)) 966 | 967 | 968 | class CategoryDetectionParams: 969 | def __init__(self, max_dets: int = 100): 970 | """CategoryDetectionParams for EgoObjects evaluation API.""" 971 | self.img_ids = [] 972 | self.class_ids = [] 973 | # np.arange causes trouble. 
981 | self.rec_thrs = np.linspace( 982 | 0.0, 1.00, int(np.round((1.00 - 0.0) / 0.01)) + 1, endpoint=True 983 | ) 984 | self.max_dets = max_dets 985 | self.area_rng = [ 986 | [0**2, 1e5**2], 987 | [0**2, 32**2], 988 | [32**2, 96**2], 989 | [96**2, 1e5**2], 990 | ] 991 | self.area_rng_lbl = ["all", "small", "medium", "large"] 992 | # self.use_cats = 1 993 | # We bin classes into three bins based on how many images of the training 994 | # set the category is present in. 995 | # r: Rare : < 10 996 | # c: Common : >= 10 and < 100 997 | # f: Frequent: >= 100 998 | self.img_count_lbl = ["rare", "common", "frequent"] 999 | self.difficulty = ["all"] 1000 | 1001 | self.topk_classes = [100, 200] 1002 | self.recall_at_precision_k = 0.9 1003 | 1004 | 1005 | class InstanceDetectionParams: 1006 | def __init__(self, max_dets: int = 100): 1007 | """InstanceDetectionParams for EgoObjects evaluation API.""" 1008 | self.img_ids = [] 1009 | self.class_ids = [] 1010 | # np.arange causes trouble: the data points it generates can be slightly 1011 | # larger than the true values 1012 | # self.iou_thrs = np.linspace( 1013 | # 0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True 1014 | # ) 1015 | self.iou_thrs = np.array( 1016 | [0.1, 0.25, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95] 1017 | ) 1018 | self.rec_thrs = np.linspace( 1019 | 0.0, 1.00, int(np.round((1.00 - 0.0) / 0.01)) + 1, endpoint=True 1020 | ) 1021 | self.max_dets = max_dets 1022 | self.area_rng = [ 1023 | [0**2, 1e5**2], 1024 | [0**2, 32**2], 1025 | [32**2, 96**2], 1026 | [96**2, 1e5**2], 1027 | ] 1028 | self.area_rng_lbl = ["all", "small", "medium", "large"] 1029 | # self.use_cats = 1 1030 | # We bin classes into three bins based on how many images of the training 1031 | # set the category is present in. 1032 | # r: Rare : < 10 1033 | # c: Common : >= 10 and < 100 1034 | # f: Frequent: >= 100 1035 | self.img_count_lbl = ["rare", "common", "frequent"] 1036 | self.difficulty = ["all", "easy", "medium", "hard"] 1037 | 1038 | self.topk_classes = [400, 800, 1200] 1039 | self.recall_at_precision_k = 0.9 1040 | -------------------------------------------------------------------------------- /egoobjects_api/results.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright (c) Meta Platforms, Inc. and affiliates. 3 | # All rights reserved. 4 | 5 | # This source code is licensed under the license found in the 6 | # LICENSE file in the root directory of this source tree. 7 | 8 | import logging 9 | from collections import defaultdict 10 | from copy import deepcopy 11 | from typing import List, Dict, Any 12 | 13 | from .egoobjects import EgoObjects 14 | 15 | logger = logging.getLogger(__name__) 16 | 17 | 18 | class EgoObjectsResults(EgoObjects): 19 | def __init__( 20 | self, 21 | gt: EgoObjects, 22 | cat_det_dt_anns: List[Dict[str, Any]], 23 | inst_det_dt_anns: List[Dict[str, Any]], 24 | max_dets: int = 300, 25 | ): 26 | """Constructor for EgoObjects results. 27 | Args: 28 | gt: EgoObjects class instance 29 | cat_det_dt_anns: detected bounding boxes for category detection 30 | inst_det_dt_anns: detected bounding boxes for instance detection 31 | max_dets: max number of detections per image 32 | """ 33 | logger.info(f"num category detections {len(cat_det_dt_anns)}") 34 | logger.info(f"num instance detections {len(inst_det_dt_anns)}") 35 | 36 | self.dataset = deepcopy(gt.dataset) 37 | 38 | dt_anns = {} 39 | 40 | dt_anns["cat_det"] = ( 41 | self.limit_detections_per_image(cat_det_dt_anns, max_dets) 42 | if max_dets >= 0 43 | else cat_det_dt_anns 44 | ) 45 | dt_anns["inst_det"] = ( 46 | self.limit_detections_per_image(inst_det_dt_anns, max_dets) 47 | if max_dets >= 0 48 | else inst_det_dt_anns 49 | ) 50 | 51 | logger.info( 52 | f"after limit detections per image, len inst_det {len(dt_anns['inst_det'])}" 53 | ) 54 | 55 | for _k, anns in dt_anns.items(): 56 | if len(anns) > 0: 57 | assert "bbox" in anns[0] 58 | for ann_id, ann in enumerate(anns): 59 | _x1, _y1, w, h = ann["bbox"] 60 | ann["area"] = w * h 61 | ann["id"] = ann_id + 1 62 | 63 | self.annotations = dt_anns 64 | self._create_index(gt.metadata) 65 | 66 | # cat_det_dt_anns can be empty when the model does not perform category detection. 67 | if len(cat_det_dt_anns) > 0: 68 | cat_det_img_ids_in_result = [ann["image_id"] for ann in cat_det_dt_anns] 69 | 70 | assert set(cat_det_img_ids_in_result) == ( 71 | set(cat_det_img_ids_in_result) & set(self.get_img_ids()) 72 | ), "Results do not correspond to current EgoObjects dataset." 73 | 74 | def limit_detections_per_image(self, anns, max_dets): 75 | img_ann = defaultdict(list) 76 | for ann in anns: 77 | img_ann[ann["image_id"]].append(ann) 78 | 79 | for img_id, _anns in img_ann.items(): 80 | if len(_anns) <= max_dets: 81 | continue 82 | _anns = sorted(_anns, key=lambda ann: ann["score"], reverse=True) 83 | img_ann[img_id] = _anns[:max_dets] 84 | 85 | return [ann for anns in img_ann.values() for ann in anns] 86 |
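# Illustration of limit_detections_per_image (hypothetical values): with
# max_dets=2 and three detections for image 7 scored 0.9, 0.4 and 0.7, only the
# two highest-scoring boxes are kept, i.e. those scored 0.9 and 0.7.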
-------------------------------------------------------------------------------- /example.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright (c) Meta Platforms, Inc. and affiliates. 3 | # All rights reserved. 4 | 5 | # This source code is licensed under the license found in the 6 | # LICENSE file in the root directory of this source tree. 7 | 8 | import json 9 | import logging 10 | import unittest 11 | from copy import deepcopy 12 | 13 | import numpy as np 14 | from detectron2.utils.logger import create_small_table 15 | from detectron2.data import MetadataCatalog 16 | from egoobjects_api.eval import EgoObjectsEval 17 | from egoobjects_api.results import EgoObjectsResults 18 | from egoobjects_api.egoobjects import EgoObjects, FILTER_OPTS 19 | 20 | logging.basicConfig(level=logging.INFO) 21 | logger = logging.getLogger(__name__) 22 | 23 | gt_json_file = "./data/EgoObjectsV1_unified_eval.json" 24 | metadata_json_file = "./data/EgoObjectsV1_unified_metadata.json" 25 | 26 | metric_filter = {} 27 | 28 | # NOTE on legends 29 | # iou -- the IOU threshold for computing metrics; "coco" refers to averaging over IOU = [0.5, 0.55, ..., 0.95] 30 | # gr -- the grouping for categories; for category detection it can be in ["all", "frequent", "common", "rare"], 31 | #       for instance detection it can be in ["all", "seen", "unseen"] 32 | # ar -- area range of the gt box, can be in ["all", "small", "medium", "large"] 33 | # bg -- background, choice in ["all", "busy", "simple"] 34 | # lt -- lighting condition, choice in ["all", "bright", "dim"] 35 | # df -- the difficulty of the test sample, only used for instance detection; since we already explicitly split 36 | #       the validation set into an easy one and a hard one, it is filled with "all" everywhere below 37 | metric_filter["cat_det"] = [ 38 | {"iou": "coco", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 39 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 40 | {"iou": "75", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 41 | 42 | {"iou": "50", "gr": "frequent", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 43 | {"iou": "50", "gr": "common", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 44 | {"iou": "50", "gr": "rare", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 45 | 46 | {"iou": "50", "gr": "all", "ar": "large", "bg": "all", "lt": "all", "df": "all"}, 47 | {"iou": "50", "gr": "all", "ar": "medium", "bg": "all", "lt": "all", "df": "all"}, 48 | {"iou": "50", "gr": "all", "ar": "small", "bg": "all", "lt": "all", "df": "all"}, 49 | 50 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "bright", "df": "all"}, 51 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "dim", "df": "all"}, 52 | {"iou": "50", "gr": "all", "ar": "all", "bg": "simple", "lt": "all", "df": "all"}, 53 | {"iou": "50", "gr": "all", "ar": "all", "bg": "busy", "lt": "all", "df": "all"}, 54 | ] 55 | 56 | metric_filter["inst_det"] = [ 57 | {"iou": "coco", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 58 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 59 | {"iou": "75", "gr": "all", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 60 | 61 | {"iou": "50", "gr": "seen", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 62 | {"iou": "50", "gr": "unseen", "ar": "all", "bg": "all", "lt": "all", "df": "all"}, 63 | 64 | {"iou": "50", "gr": "all", "ar": "large", "bg": "all", "lt": "all", "df": "all"}, 65 | {"iou": "50", "gr": "all", "ar": "medium", "bg": "all", "lt": "all", "df": "all"}, 66 | {"iou": "50", "gr": "all", "ar": "small", "bg": "all", "lt": "all", "df": "all"}, 67 | 68 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "bright", "df": "all"}, 69 | {"iou": "50", "gr": "all", "ar": "all", "bg": "all", "lt": "dim", "df": "all"}, 70 | {"iou": "50", "gr": "all", "ar": "all", "bg": "simple", "lt": "all", "df": "all"}, 71 | {"iou": "50", "gr": "all", "ar": "all", "bg": "busy", "lt": "all", "df": "all"}, 72 | ] 73 | 74 | 75 | def get_egoobjects_meta(metadata_path: str): 76 | """ 77 | Return the metadata dictionary, which includes 4 keys: 78 | cat_det_cats 79 | inst_det_cats 80 | cat_det_cat_id_2_cont_id 81 | cat_det_cat_names 82 | """ 83 | with open(metadata_path, "r") as fp: 84 | metadata = json.load(fp) 85 | 86 | cat_det_cat_id_2_name = {cat["id"]: cat["name"] for cat in metadata["cat_det_cats"]} 87 | cat_det_cat_ids = sorted([cat["id"] for cat in metadata["cat_det_cats"]]) 88 | cat_det_cat_id_2_cont_id = {cat_id: i for i, cat_id in enumerate(cat_det_cat_ids)} 89 | cat_det_cat_names = [cat_det_cat_id_2_name[cat_id] for cat_id in cat_det_cat_ids] 90 | 91 | metadata["cat_det_cat_id_2_cont_id"] = cat_det_cat_id_2_cont_id 92 | metadata["cat_det_cat_names"] = cat_det_cat_names 93 | return metadata 94 | 95 | def main(): 96 | dataset_name = "EgoObjects" 97 | metadata = get_egoobjects_meta(metadata_json_file) 98 | MetadataCatalog.get(dataset_name).set(**metadata) 99 | metadata = MetadataCatalog.get(dataset_name) 100 | 101 | split = "egoobjects_unified_det_val_query" 102 | gt = EgoObjects(gt_json_file, metadata, filter_opts=FILTER_OPTS[split]) 103 |
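# Each prediction consumed by EgoObjectsResults is a plain dict; a minimal
# sketch (hypothetical values) of what a real detector would emit per box:
#   {"image_id": 42, "category_id": 3, "bbox": [x, y, w, h], "score": 0.87}
# for category detection, and the same with "instance_id" in place of
# "category_id" for instance detection. The dummy predictions below are instead
# built by perturbing the ground truth annotations.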
104 | # dummy category detection predictions 105 | dt_cat = [ 106 | deepcopy(ann) 107 | for ann in gt.dataset["annotations"] 108 | if "category_id" in ann and np.random.uniform() > 0.1 109 | ] 110 | 111 | logger.info(f"len dt_cat {len(dt_cat)}") 112 | 113 | for dt_box in dt_cat: 114 | x1, y1, w, h = dt_box["bbox"] 115 | w = np.random.randint(int(w * 0.5), w) 116 | h = np.random.randint(int(h * 0.5), h) 117 | image_id = dt_box["image_id"] 118 | category_id = dt_box["category_id"] 119 | 120 | dt_box["bbox"] = [x1, y1, w, h] 121 | dt_box["area"] = w * h 122 | dt_box["image_id"] = image_id 123 | dt_box["category_id"] = category_id 124 | dt_box["score"] = np.random.rand(1)[0] 125 | 126 | # dummy instance detection predictions 127 | dt_inst = [ 128 | deepcopy(ann) 129 | for ann in gt.dataset["annotations"] 130 | if "instance_id" in ann and np.random.uniform() > 0.2 131 | ] 132 | 133 | logger.info(f"len dt_inst {len(dt_inst)}") 134 | 135 | for dt_box in dt_inst: 136 | x1, y1, w, h = dt_box["bbox"] 137 | w = np.random.randint(int(w * 0.5), w) 138 | h = np.random.randint(int(h * 0.5), h) 139 | image_id = dt_box["image_id"] 140 | instance_id = dt_box["instance_id"] 141 | 142 | dt_box["bbox"] = [x1, y1, w, h] 143 | dt_box["area"] = w * h 144 | dt_box["image_id"] = image_id 145 | dt_box["instance_id"] = instance_id 146 | dt_box["score"] = np.random.rand(1)[0] 147 | 148 | dt = EgoObjectsResults(gt, dt_cat, dt_inst) 149 | evaluator = EgoObjectsEval(gt, dt, num_processes=32) 150 | evaluator.run(metric_filter) 151 | evaluator.print_results() 152 |
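# get_results() returns a nested dict of the form
#   {det_type: {metric_name: {"value": ...}}}
# (structure inferred from the accesses below; the exact metric names follow
# from the metric_filter entries above). The loop below rescales each value to
# a percentage before logging it as a table.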
153 | results = evaluator.get_results() 154 | for det_type in ["cat_det", "inst_det"]: 155 | logger.info(f"{det_type} results") 156 | one_result = results[det_type] 157 | one_result = {metric: float(res["value"] * 100) for metric, res in one_result.items()} 158 | for _name, value in one_result.items(): 159 | assert value < 100.0 160 | logger.info(create_small_table(one_result)) 161 | 162 | if __name__ == "__main__": 163 | main() -------------------------------------------------------------------------------- /images/ICCV2023_poster_EgoObjects.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/ICCV2023_poster_EgoObjects.jpg -------------------------------------------------------------------------------- /images/intro.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/intro.png -------------------------------------------------------------------------------- /images/intro_teaser.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/intro_teaser.png -------------------------------------------------------------------------------- /images/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/logo.png -------------------------------------------------------------------------------- /images/sample_images.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/sample_images.png
-------------------------------------------------------------------------------- /images/taxonomy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/EgoObjects/88e683b53962637136fdd497cfb3067caf831012/images/taxonomy.png --------------------------------------------------------------------------------