├── .gitignore
├── AUTHORS
├── CHANGELIST-V1-TO-V2.md
├── CONTRIBUTING.md
├── CONTRIBUTORS
├── LICENSE
├── README.md
├── READMEV1.md
├── READMEV2.md
├── READMEV3.md
├── assets
│   ├── bbox_hierarchy.json
│   ├── label-frequencies-total.png
│   ├── label-frequencies-training-set.png
│   ├── oid_bbox_examples.png
│   ├── share-of-correct-annotations-vs-frequency.png
│   ├── v2-bbox_labels_vis_screenshot.png
│   ├── v2-human-label-frequencies-bbox-train.png
│   ├── v2-human-label-frequencies-bbox-val-test.png
│   ├── v2-human-label-frequencies-train.png
│   ├── v2-human-label-frequencies-val-test.png
│   ├── v3-human-bbox-frequencies-test.png
│   ├── v3-human-bbox-frequencies-train.png
│   ├── v3-human-bbox-frequencies-validation.png
│   ├── v3-human-label-frequencies-test.png
│   ├── v3-human-label-frequencies-train.png
│   └── v3-human-label-frequencies-validation.png
├── bbox_labels_vis.html
├── dict.csv
├── downloader.py
└── tools
    ├── classify.py
    ├── classify_oidv2.py
    ├── compute_bottleneck.py
    └── download_data.sh

/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 | data/
3 | .idea/
4 | oid_env/
--------------------------------------------------------------------------------
/AUTHORS:
--------------------------------------------------------------------------------
1 | # This is the official list of Open Images authors for copyright purposes.
2 | # This file is distinct from the CONTRIBUTORS files.
3 | # See the latter for an explanation.
4 | 
5 | # Names should be added to this file as:
6 | # Name or Organization <email address>
7 | # The email address is not required for organizations.
8 | 
9 | Google Inc.
--------------------------------------------------------------------------------
/CHANGELIST-V1-TO-V2.md:
--------------------------------------------------------------------------------
1 | # Changes between v1 and v2 release
2 | 
3 | - Bounding box annotations: 2.07M (1.24M in train, 830K in validation+test)
4 | - Human-verified image-level labels in training set: 9.38M (4.30M positive examples, 5.07M negative examples)
5 | - Cleanup of human-verifications in the validation+test sets
6 | - The V1 validation set is now partitioned into validation and test sets.
7 | 
8 | [Original README for V1](READMEV1.md)
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing guidelines
2 | 
3 | ## How to become a contributor and submit your own code
4 | 
5 | ### Contributor License Agreements
6 | 
7 | We'd love to accept your patches! Before we can take them, we have to jump a couple of legal hurdles.
8 | 
9 | Please fill out either the individual or corporate Contributor License Agreement (CLA).
10 | 
11 | * If you are an individual writing original source code and you're sure you own the intellectual property, then you'll need to sign an [individual CLA](http://code.google.com/legal/individual-cla-v1.0.html).
12 | 
13 | * If you work for a company that wants to allow you to contribute your work, then you'll need to sign a [corporate CLA](http://code.google.com/legal/corporate-cla-v1.0.html).
14 | 
15 | Follow either of the two links above to access the appropriate CLA and instructions for how to sign and return it. Once we receive it, we'll be able to accept your pull requests.
16 | 
17 | ***NOTE***: Only original source code from you and other people that have signed the CLA can be accepted into the main repository.
18 | 19 | ### Contributing code and annotations 20 | 21 | If you have improvements to Open Images, send us your pull requests! For those 22 | just getting started, Github has a [howto](https://help.github.com/articles/using-pull-requests/). 23 | 24 | We're interested in fixing the mistakes in train and validation sets based on human ground truth or machine-generated annotations, 25 | new annotations layers, examples on how to use the dataset, and many other things. 26 | 27 | -------------------------------------------------------------------------------- /CONTRIBUTORS: -------------------------------------------------------------------------------- 1 | # This is the official list of people who have contributed 2 | # to the OpenImages dataset. 3 | 4 | # Please keep the list sorted. 5 | 6 | Abhinav Gupta 7 | Alina Kuznetsova 8 | Andreas Veit 9 | Chen Sun 10 | David Cai 11 | Dhyanesh Narayanan 12 | Gal Chechik 13 | Hassan Rom 14 | Ivan Krasin 15 | Jasper Uijlings 16 | Kevin Murphy 17 | Neil Alldrin 18 | Sami Abu-El-Haija 19 | Serge Belongie 20 | Shahab Kamali 21 | Stefan Popov 22 | Tom Duerig 23 | Victor Gomes 24 | Vittorio Ferrari 25 | Zheyun Feng 26 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2016 The Open Images Authors. All rights reserved. 2 | 3 | Apache License 4 | Version 2.0, January 2004 5 | http://www.apache.org/licenses/ 6 | 7 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 8 | 9 | 1. Definitions. 10 | 11 | "License" shall mean the terms and conditions for use, reproduction, 12 | and distribution as defined by Sections 1 through 9 of this document. 13 | 14 | "Licensor" shall mean the copyright owner or entity authorized by 15 | the copyright owner that is granting the License. 16 | 17 | "Legal Entity" shall mean the union of the acting entity and all 18 | other entities that control, are controlled by, or are under common 19 | control with that entity. For the purposes of this definition, 20 | "control" means (i) the power, direct or indirect, to cause the 21 | direction or management of such entity, whether by contract or 22 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 23 | outstanding shares, or (iii) beneficial ownership of such entity. 24 | 25 | "You" (or "Your") shall mean an individual or Legal Entity 26 | exercising permissions granted by this License. 27 | 28 | "Source" form shall mean the preferred form for making modifications, 29 | including but not limited to software source code, documentation 30 | source, and configuration files. 31 | 32 | "Object" form shall mean any form resulting from mechanical 33 | transformation or translation of a Source form, including but 34 | not limited to compiled object code, generated documentation, 35 | and conversions to other media types. 36 | 37 | "Work" shall mean the work of authorship, whether in Source or 38 | Object form, made available under the License, as indicated by a 39 | copyright notice that is included in or attached to the work 40 | (an example is provided in the Appendix below). 41 | 42 | "Derivative Works" shall mean any work, whether in Source or Object 43 | form, that is based on (or derived from) the Work and for which the 44 | editorial revisions, annotations, elaborations, or other modifications 45 | represent, as a whole, an original work of authorship. 
For the purposes 46 | of this License, Derivative Works shall not include works that remain 47 | separable from, or merely link (or bind by name) to the interfaces of, 48 | the Work and Derivative Works thereof. 49 | 50 | "Contribution" shall mean any work of authorship, including 51 | the original version of the Work and any modifications or additions 52 | to that Work or Derivative Works thereof, that is intentionally 53 | submitted to Licensor for inclusion in the Work by the copyright owner 54 | or by an individual or Legal Entity authorized to submit on behalf of 55 | the copyright owner. For the purposes of this definition, "submitted" 56 | means any form of electronic, verbal, or written communication sent 57 | to the Licensor or its representatives, including but not limited to 58 | communication on electronic mailing lists, source code control systems, 59 | and issue tracking systems that are managed by, or on behalf of, the 60 | Licensor for the purpose of discussing and improving the Work, but 61 | excluding communication that is conspicuously marked or otherwise 62 | designated in writing by the copyright owner as "Not a Contribution." 63 | 64 | "Contributor" shall mean Licensor and any individual or Legal Entity 65 | on behalf of whom a Contribution has been received by Licensor and 66 | subsequently incorporated within the Work. 67 | 68 | 2. Grant of Copyright License. Subject to the terms and conditions of 69 | this License, each Contributor hereby grants to You a perpetual, 70 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 71 | copyright license to reproduce, prepare Derivative Works of, 72 | publicly display, publicly perform, sublicense, and distribute the 73 | Work and such Derivative Works in Source or Object form. 74 | 75 | 3. Grant of Patent License. Subject to the terms and conditions of 76 | this License, each Contributor hereby grants to You a perpetual, 77 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 78 | (except as stated in this section) patent license to make, have made, 79 | use, offer to sell, sell, import, and otherwise transfer the Work, 80 | where such license applies only to those patent claims licensable 81 | by such Contributor that are necessarily infringed by their 82 | Contribution(s) alone or by combination of their Contribution(s) 83 | with the Work to which such Contribution(s) was submitted. If You 84 | institute patent litigation against any entity (including a 85 | cross-claim or counterclaim in a lawsuit) alleging that the Work 86 | or a Contribution incorporated within the Work constitutes direct 87 | or contributory patent infringement, then any patent licenses 88 | granted to You under this License for that Work shall terminate 89 | as of the date such litigation is filed. 90 | 91 | 4. Redistribution. 
You may reproduce and distribute copies of the 92 | Work or Derivative Works thereof in any medium, with or without 93 | modifications, and in Source or Object form, provided that You 94 | meet the following conditions: 95 | 96 | (a) You must give any other recipients of the Work or 97 | Derivative Works a copy of this License; and 98 | 99 | (b) You must cause any modified files to carry prominent notices 100 | stating that You changed the files; and 101 | 102 | (c) You must retain, in the Source form of any Derivative Works 103 | that You distribute, all copyright, patent, trademark, and 104 | attribution notices from the Source form of the Work, 105 | excluding those notices that do not pertain to any part of 106 | the Derivative Works; and 107 | 108 | (d) If the Work includes a "NOTICE" text file as part of its 109 | distribution, then any Derivative Works that You distribute must 110 | include a readable copy of the attribution notices contained 111 | within such NOTICE file, excluding those notices that do not 112 | pertain to any part of the Derivative Works, in at least one 113 | of the following places: within a NOTICE text file distributed 114 | as part of the Derivative Works; within the Source form or 115 | documentation, if provided along with the Derivative Works; or, 116 | within a display generated by the Derivative Works, if and 117 | wherever such third-party notices normally appear. The contents 118 | of the NOTICE file are for informational purposes only and 119 | do not modify the License. You may add Your own attribution 120 | notices within Derivative Works that You distribute, alongside 121 | or as an addendum to the NOTICE text from the Work, provided 122 | that such additional attribution notices cannot be construed 123 | as modifying the License. 124 | 125 | You may add Your own copyright statement to Your modifications and 126 | may provide additional or different license terms and conditions 127 | for use, reproduction, or distribution of Your modifications, or 128 | for any such Derivative Works as a whole, provided Your use, 129 | reproduction, and distribution of the Work otherwise complies with 130 | the conditions stated in this License. 131 | 132 | 5. Submission of Contributions. Unless You explicitly state otherwise, 133 | any Contribution intentionally submitted for inclusion in the Work 134 | by You to the Licensor shall be under the terms and conditions of 135 | this License, without any additional terms or conditions. 136 | Notwithstanding the above, nothing herein shall supersede or modify 137 | the terms of any separate license agreement you may have executed 138 | with Licensor regarding such Contributions. 139 | 140 | 6. Trademarks. This License does not grant permission to use the trade 141 | names, trademarks, service marks, or product names of the Licensor, 142 | except as required for reasonable and customary use in describing the 143 | origin of the Work and reproducing the content of the NOTICE file. 144 | 145 | 7. Disclaimer of Warranty. Unless required by applicable law or 146 | agreed to in writing, Licensor provides the Work (and each 147 | Contributor provides its Contributions) on an "AS IS" BASIS, 148 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 149 | implied, including, without limitation, any warranties or conditions 150 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 151 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 152 | appropriateness of using or redistributing the Work and assume any 153 | risks associated with Your exercise of permissions under this License. 154 | 155 | 8. Limitation of Liability. In no event and under no legal theory, 156 | whether in tort (including negligence), contract, or otherwise, 157 | unless required by applicable law (such as deliberate and grossly 158 | negligent acts) or agreed to in writing, shall any Contributor be 159 | liable to You for damages, including any direct, indirect, special, 160 | incidental, or consequential damages of any character arising as a 161 | result of this License or out of the use or inability to use the 162 | Work (including but not limited to damages for loss of goodwill, 163 | work stoppage, computer failure or malfunction, or any and all 164 | other commercial damages or losses), even if such Contributor 165 | has been advised of the possibility of such damages. 166 | 167 | 9. Accepting Warranty or Additional Liability. While redistributing 168 | the Work or Derivative Works thereof, You may choose to offer, 169 | and charge a fee for, acceptance of support, warranty, indemnity, 170 | or other liability obligations and/or rights consistent with this 171 | License. However, in accepting such obligations, You may act only 172 | on Your own behalf and on Your sole responsibility, not on behalf 173 | of any other Contributor, and only if You agree to indemnify, 174 | defend, and hold each Contributor harmless for any liability 175 | incurred by, or claims asserted against, such Contributor by reason 176 | of your accepting any such warranty or additional liability. 177 | 178 | END OF TERMS AND CONDITIONS 179 | 180 | APPENDIX: How to apply the Apache License to your work. 181 | 182 | To apply the Apache License to your work, attach the following 183 | boilerplate notice, with the fields enclosed by brackets "[]" 184 | replaced with your own identifying information. (Don't include 185 | the brackets!) The text should be enclosed in the appropriate 186 | comment syntax for the file format. We also recommend that a 187 | file or class name and description of purpose be included on the 188 | same "printed page" as the copyright notice for easier 189 | identification within third-party archives. 190 | 191 | Copyright 2016, The Open Images Authors. 192 | 193 | Licensed under the Apache License, Version 2.0 (the "License"); 194 | you may not use this file except in compliance with the License. 195 | You may obtain a copy of the License at 196 | 197 | http://www.apache.org/licenses/LICENSE-2.0 198 | 199 | Unless required by applicable law or agreed to in writing, software 200 | distributed under the License is distributed on an "AS IS" BASIS, 201 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 202 | See the License for the specific language governing permissions and 203 | limitations under the License. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Open Images Dataset 2 | 3 | As of V4, [the Open Images Dataset moved to a new site](https://storage.googleapis.com/openimages/web/index.html). 
4 | 
--------------------------------------------------------------------------------
/READMEV1.md:
--------------------------------------------------------------------------------
1 | # Open Images dataset
2 | 
3 | Open Images is a dataset of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories.
4 | 
5 | The annotations are licensed by Google Inc. under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license. The contents of this repository are released under an [Apache 2](LICENSE) license.
6 | 
7 | The images are listed as having a [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/) license. **Note:** while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.
8 | 
9 | ## Goodies
10 | - **New** [Pretrained Inception v3 model](https://github.com/openimages/dataset/wiki/Running-a-pretrained-classifier) is released.
11 | - **New** [OpenImages annotations on BigQuery](https://cloud.google.com/bigquery/public-data/openimages)
12 | 
13 | ## Download the data
14 | 
15 | * [Image URLs and metadata](https://storage.googleapis.com/openimages/2016_08/images_2016_08_v5.tar.gz) (990 MB) -- **updated**: added OriginalSize, OriginalMD5, and Thumbnail300KURL columns.
16 | * [Machine image-level annotations (train and validation sets)](https://storage.googleapis.com/openimages/2016_08/machine_ann_2016_08_v3.tar.gz) (450 MB)
17 | * [Human image-level annotations (validation set)](https://storage.googleapis.com/openimages/2016_08/human_ann_2016_08_v3.tar.gz) (9 MB)
18 | 
19 | See also how to [import the annotations into PostgreSQL](https://github.com/openimages/dataset/wiki/Importing-into-PostgreSQL).
20 | 
21 | ## Data organization
22 | 
23 | Each image has a unique 64-bit ID assigned. In the CSV files they appear as zero-padded hex integers, such as 000060e3121c7305. The dataset is split into a training set (9011219 images) and a validation set (167057 images). Each image has zero, one, or more image-level labels assigned. Both sets have machine-populated annotations, while the validation set also has human annotations. The raters have been asked to validate the machine annotations, which made it possible to practically eliminate false positives from the validation set (but not false negatives).
24 | 
25 | Labels are so-called MIDs, as found in [Freebase](https://en.wikipedia.org/wiki/Freebase) or the [Google Knowledge Graph API](https://developers.google.com/knowledge-graph/). A short description of each label is available in [dict.csv](dict.csv). There are 7844 distinct labels attached to at least one image, but only around 6000 labels are considered 'trainable', with at least 50 images in the validation set and at least 20 images in the training set.
26 | 
27 | Each annotation has a confidence number from 0.0 to 1.0 assigned. The human annotations are definite (either positive, 1.0, or negative, 0.0), while machine annotations have fractional confidences, generally >= 0.5. The higher the confidence, the smaller the chance that the label is a false positive.
28 | 
29 | The data tarballs contain CSV files of two types:
30 | 
31 | ### images.csv
32 | 
33 | There's one such file for each subset inside the train and validation subdirectories. It has image URLs, their OpenImages IDs, titles, authors, and license information:
34 | 
35 | ```
36 | ImageID,Subset,OriginalURL,OriginalLandingURL,License,AuthorProfileURL,Author,Title,\
37 | OriginalSize,OriginalMD5,Thumbnail300KURL
38 | ...
39 | 
40 | 000060e3121c7305,train,https://c1.staticflickr.com/5/4129/5215831864_46f356962f_o.jpg,\
41 | https://www.flickr.com/photos/brokentaco/5215831864,\
42 | https://creativecommons.org/licenses/by/2.0/,\
43 | "https://www.flickr.com/people/brokentaco/","David","28 Nov 2010 Our new house."\
44 | 211079,0Sad+xMj2ttXM1U8meEJ0A==,https://c1.staticflickr.com/5/4129/5215831864_ee4e8c6535_z.jpg
45 | ```
46 | 
47 | The data is as it appears on the destination websites.
48 | 
49 | * OriginalSize is the download size of the original image.
50 | * OriginalMD5 is base64-encoded binary MD5, as described [here](https://cloud.google.com/storage/transfer/create-url-list#md5).
51 | * Thumbnail300KURL is an optional URL to a thumbnail with ~300K pixels (~640x480). It's provided as a convenience for downloading the data, in the absence of more direct ways to get the images. If missing, the OriginalURL must be used (and then resized to the same size, if needed). **Beware:** these thumbnails are generated on the fly and their contents and even resolution might be different every day.
52 | 
53 | ### labels.csv
54 | 
55 | The CSVs of this type attach labels to image IDs:
56 | 
57 | ```
58 | ImageID,Source,LabelName,Confidence
59 | ...
60 | 000060e3121c7305,machine,/m/06ht1,0.9
61 | 000060e3121c7305,machine,/m/05wrt,0.9
62 | 000060e3121c7305,machine,/m/01l0mw,0.8
63 | 000060e3121c7305,machine,/m/03d2wd,0.7
64 | 000060e3121c7305,machine,/m/03nxtz,0.7
65 | 000060e3121c7305,machine,/m/023907r,0.7
66 | 000060e3121c7305,machine,/m/020g49,0.7
67 | 000060e3121c7305,machine,/m/0l7_8,0.6
68 | 000060e3121c7305,machine,/m/02rfdq,0.6
69 | 000060e3121c7305,machine,/m/038t8_,0.6
70 | 000060e3121c7305,machine,/m/03f6tq,0.6
71 | 000060e3121c7305,machine,/m/01s105,0.6
72 | 000060e3121c7305,machine,/m/01nblt,0.5
73 | ...
74 | ```
75 | 
76 | These can be converted to their short descriptions by looking into dict.csv:
77 | 
78 | ```
79 | "/m/05wrt","property"
80 | "/m/06ht1","room"
81 | "/m/01l0mw","home"
82 | "/m/03d2wd","dining room"
83 | "/m/03nxtz","cottage"
84 | "/m/020g49","hardwood"
85 | "/m/023907r","real estate"
86 | "/m/038t8_","estate"
87 | "/m/03f6tq","living room"
88 | "/m/0l7_8","floor"
89 | "/m/01nblt","apartment"
90 | "/m/01s105","cabinetry"
91 | ```
92 | 
93 | ## Stats and data quality
94 | 
95 | The distribution of labels across the images is highly uneven, with some labels attached to more than a million images and others to fewer than 100:
96 | 
97 | ![Label frequencies - Total](assets/label-frequencies-total.png)
98 | 
99 | ![Label frequencies - Training set](assets/label-frequencies-training-set.png)
100 | 
101 | While the machine annotations are somewhat noisy, in general, the labels with more images are more accurate:
102 | 
103 | ![Share of correct annotations vs frequency](assets/share-of-correct-annotations-vs-frequency.png)
104 | 
105 | 
106 | We have trained an Inception v3 model based on Open Images annotations alone, and the model is good enough to be used for fine-tuning applications as well as for other things, like [DeepDream](https://research.googleblog.com/2015/07/deepdream-code-example-for-visualizing.html) or [artistic style transfer](https://arxiv.org/abs/1508.06576), which require a well-developed hierarchy of filters. We hope to improve the quality of the annotations in Open Images over the coming months, and with them the quality of the models that can be trained.
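As a concrete example of working with these files, the following sketch tallies how often each label occurs in a labels.csv file. It is an illustration rather than part of the released tooling: the local path is an assumption, and the 0.5 cutoff simply mirrors the confidence convention described above.

```python
import csv
from collections import Counter

LABELS_CSV = "data/train/labels.csv"  # assumed path to an extracted labels.csv

counts = Counter()
with open(LABELS_CSV, newline="") as f:
    for row in csv.DictReader(f):
        # Human labels are exactly 0.0 or 1.0; machine labels are generally >= 0.5.
        if float(row["Confidence"]) >= 0.5:
            counts[row["LabelName"]] += 1

# The 20 most frequent labels, as MIDs; join with dict.csv for readable names.
for mid, n in counts.most_common(20):
    print(mid, n)
```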
107 | 
108 | ## Citations
109 | 
110 | If you use the OpenImages dataset in your work, please cite it as:
111 | 
112 | APA-style citation: "Krasin I., Duerig T., Alldrin N., Veit A., Abu-El-Haija S., Belongie S., Cai D., Feng Z., Ferrari V., Gomes V., Gupta A., Narayanan D., Sun C., Chechik G., Murphy K. OpenImages: A public dataset for large-scale multi-label and multi-class image classification, 2016. Available from https://github.com/openimages".
113 | 
114 | BibTeX
115 | ```
116 | @article{openimages,
117 |   title={OpenImages: A public dataset for large-scale multi-label and multi-class image classification.},
118 |   author={Krasin, Ivan and Duerig, Tom and Alldrin, Neil and Veit, Andreas and Abu-El-Haija, Sami
119 |   and Belongie, Serge and Cai, David and Feng, Zheyun and Ferrari, Vittorio and Gomes, Victor
120 |   and Gupta, Abhinav and Narayanan, Dhyanesh and Sun, Chen and Chechik, Gal and Murphy, Kevin},
121 |   journal={Dataset available from https://github.com/openimages},
122 |   year={2016}
123 | }
124 | ```
125 | 
--------------------------------------------------------------------------------
/READMEV2.md:
--------------------------------------------------------------------------------
1 | 
2 | # Open Images dataset
3 | 
4 | Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.
5 | 
6 | The annotations are licensed by Google Inc. under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license. The contents of this repository are released under an [Apache 2](LICENSE) license.
7 | 
8 | The images are listed as having a [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/) license. **Note:** while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.
9 | 
10 | [![Bounding-Box Examples](assets/oid_bbox_examples.png)](http://www.cvdfoundation.org/datasets/open-images-dataset/vis)
11 | *Annotated images from the Open Images dataset. Left: [FAMILY MAKING A SNOWMAN](https://www.flickr.com/photos/mwvchamber/5433788065) by [mwvchamber](https://www.flickr.com/photos/mwvchamber/). Right: [STANZA STUDENTI.S.S. ANNUNZIATA](https://www.flickr.com/photos/ersupalermo/5759830290) by [ersupalermo](https://www.flickr.com/photos/ersupalermo/). Both images used under [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/) license.
Browse the validation set [here](http://www.cvdfoundation.org/datasets/open-images-dataset/vis) and the train set [here](http://www.cvdfoundation.org/datasets/open-images-dataset/vis?set=train), courtesy of Common Visual Data Foundation ([CVDF](http://www.cvdfoundation.org)).
*
12 | 
13 | ## Announcements
14 | 
15 | * **10-20-2017** Resnet 101 image classification model released (trained on V2 data). [Model checkpoint](https://storage.googleapis.com/openimages/2017_07/oidv2-resnet_v1_101.ckpt.tar.gz), [Checkpoint readme](https://storage.googleapis.com/openimages/2017_07/oidv2-resnet_v1_101.readme.txt), [Inference code](tools/classify_oidv2.py).
16 | * **07-20-2017** V2 data is released! The dataset now includes 2M bounding boxes spanning 600 object classes (1.24M in train, 830K in validation+test), and 4.3M human-verified positive image-level labels on the training set. [Changelist](CHANGELIST-V1-TO-V2.md). Coming soon: trained models (both image-level classifiers and object detectors).
17 | * **07-20-2017** [V2 data visualizer](http://www.cvdfoundation.org/datasets/open-images-dataset/vis), courtesy of Common Visual Data Foundation ([CVDF](http://www.cvdfoundation.org)).
18 | 
19 | ## Download the data
20 | 
21 | * [Image URLs and metadata](https://storage.googleapis.com/openimages/2017_07/images_2017_07.tar.gz) (990 MB)
22 | * [Bounding box annotations (train, validation, and test sets)](https://storage.googleapis.com/openimages/2017_07/annotations_human_bbox_2017_07.tar.gz) (37 MB)
23 | * [Human-verified image-level annotations (train, validation, and test sets)](https://storage.googleapis.com/openimages/2017_07/annotations_human_2017_07.tar.gz) (66 MB)
24 | * [Machine-generated image-level annotations (train, validation, and test sets)](https://storage.googleapis.com/openimages/2017_07/annotations_machine_2017_07.tar.gz) (447 MB)
25 | * [Classes and class descriptions](https://storage.googleapis.com/openimages/2017_07/classes_2017_07.tar.gz) (293 KB)
26 | 
27 | See also how to [import the annotations into PostgreSQL](https://github.com/openimages/dataset/wiki/Importing-into-PostgreSQL).
28 | 
29 | ## Data organization
30 | 
31 | The [dataset](#download-the-data) is split into a training set (9,011,219 images), a validation set (41,620 images), and a test set (125,436 images). The V1 validation set was partitioned into validation and test sets in the V2 release, to make evaluations more tractable. The images are annotated with image-level labels and bounding boxes as described below.
32 | 
33 | ### Image-level labels
34 | 
35 | Table 1 shows an overview of the image-level labels in all splits of the dataset. All images have image-level labels automatically generated by a computer vision model similar to [Google Cloud Vision API](https://cloud.google.com/vision/).
36 | 
37 | 

*Table 1: Image-level labels.*

38 | 
39 | | | Train | Validation | Test | # Classes | # Trainable Classes |
40 | |---:|---:|---:|---:|---:|---:|
41 | |Images|9,011,219|41,620|125,436|-|-|
42 | |Machine Generated Labels|78,977,695|512,093|1,545,835|7,870|4,966|
43 | |Human Verified Labels|9,376,588<br>pos: 4,303,594<br>neg: 5,072,994|559,873<br>pos: 374,778<br>neg: 185,095|1,693,466<br>pos: 1,132,567<br>neg: 560,899|19,661|5,000|
44 | 
45 | Moreover, all images in the validation and test sets as well as part of the training set have human-verified image-level labels. Most of the human verifications have been done by in-house annotators at Google. A smaller part has been done through crowd-sourced verification from Image Labeler: [Crowdsource app](http://play.google.com/store/apps/details?id=com.google.android.apps.village.boond), [g.co/imagelabeler](http://g.co/imagelabeler). This human verification process allows us to practically eliminate false positives (but not false negatives, so some labels might be missing from an image). A variety of computer vision models were used to generate the samples (not just the one used for the machine-generated labels above), which is why the vocabulary is significantly expanded (the # Classes column in Table 1).
46 | 
47 | Overall, there are 19868 distinct [classes with image-level labels](https://storage.googleapis.com/openimages/2017_07/classes.txt) (19661 have at least one human-verified sample and 7870 have a sample in the machine-generated pool). Of these, [5000 classes are considered trainable](https://storage.googleapis.com/openimages/2017_07/classes-trainable.txt), as they have at least 30 human-verified samples in the training set and 5 in the validation or test sets. Classes are identified by MIDs (machine-generated IDs), as found in [Freebase](https://en.wikipedia.org/wiki/Freebase) or the [Google Knowledge Graph API](https://developers.google.com/knowledge-graph/). A short description of each class is available in [class-descriptions.csv](https://storage.googleapis.com/openimages/2017_07/class-descriptions.csv).
48 | 
49 | Each annotation has a confidence number from 0.0 to 1.0 assigned. Confidences for the human-verified labels are binary (either positive, 1.0, or negative, 0.0). Machine-generated labels have fractional confidences, generally >= 0.5. The higher the confidence, the smaller the chance that the label is a false positive.
50 | 
51 | ### Bounding boxes
52 | 
53 | Table 2 shows an overview of the bounding box annotations in all splits of the dataset, which span 600 object classes. These offer a broader range than the classes in the ILSVRC and COCO detection challenges, and include new objects such as fedora hat and snowman.
54 | 
55 | 

*Table 2: Boxes.*

56 | 
57 | | | Train | Validation | Test | # Classes | # Trainable Classes |
58 | |---:|---:|---:|---:|---:|---:|
59 | |Images|669,124|41,620|125,436|-|-|
60 | |Boxes|1,240,316|204,621|625,282|600|545|
61 | 
62 | We provide complete bounding box annotations for all object instances on the validation and test sets, all manually drawn by human annotators at Google. This was done for all image-level labels that have been positively verified by a human (see Table 1). Moreover, annotators also marked a set of attributes for each box, e.g. indicating whether that object is occluded. On average, there are about 5 boxes per image in the validation and test sets.
63 | 
64 | We produced bounding boxes in the training set semi-automatically, using an enhanced version of the method described in ["We don't need no bounding-boxes: Training object class detectors using only human verification", Papadopolous et al., CVPR 2016](https://arxiv.org/abs/1602.08405). This process iterates between detecting objects, asking humans to verify the current detections (boxes), and updating the detector. This makes it possible to localize difficult objects that were missed in early iterations. We ran it for 3-4 iterations, which produced boxes for about 70% of all image-level labels (i.e. those that have been positively verified by a human in Table 1). If there are multiple objects of the same class in one image, only one has been boxed. All boxes have been human-verified to have IoU > 0.7 with the perfect box, and in practice they are very accurate (mean IoU around 0.82). We deliberately did not annotate human body parts for 80% of the training set due to the overwhelming number of instances. On average, there are about 2 boxes per image in the training set.
65 | 
66 | Overall, there are [600 distinct classes with a bounding box](https://storage.googleapis.com/openimages/2017_07/classes-bbox.txt) attached to at least one image. Of these, [545 classes are considered trainable](https://storage.googleapis.com/openimages/2017_07/classes-bbox-trainable.txt) (the intersection of the 600 boxable classes with the 5000 image-level trainable classes).
67 | 
68 | ### Data Formats
69 | 
70 | The data tarballs contain the following files:
71 | 
72 | ### images.csv
73 | 
74 | There's one such file for each subset, inside the train, validation, and test subdirectories respectively. It has image URLs, their OpenImages IDs, titles, authors, and license information:
75 | 
76 | ```
77 | ImageID,Subset,OriginalURL,OriginalLandingURL,License,AuthorProfileURL,Author,Title,\
78 | OriginalSize,OriginalMD5,Thumbnail300KURL
79 | ...
80 | 000060e3121c7305,train,https://c1.staticflickr.com/5/4129/5215831864_46f356962f_o.jpg,\
81 | https://www.flickr.com/photos/brokentaco/5215831864,\
82 | https://creativecommons.org/licenses/by/2.0/,\
83 | "https://www.flickr.com/people/brokentaco/","David","28 Nov 2010 Our new house."\
84 | 211079,0Sad+xMj2ttXM1U8meEJ0A==,https://c1.staticflickr.com/5/4129/5215831864_ee4e8c6535_z.jpg
85 | ...
86 | ```
87 | 
88 | Each image has a unique 64-bit ID assigned. In the CSV files they appear as zero-padded hex integers, such as 000060e3121c7305.
89 | 
90 | The data is as it appears on the destination websites.
91 | 
92 | * OriginalSize is the download size of the original image.
93 | * OriginalMD5 is base64-encoded binary MD5, as described [here](https://cloud.google.com/storage/transfer/create-url-list#md5).
94 | * Thumbnail300KURL is an optional URL to a thumbnail with ~300K pixels (~640x480). It's provided as a convenience for downloading the data, in the absence of more direct ways to get the images. If missing, the OriginalURL must be used (and then resized to the same size, if needed). **Beware:** these thumbnails are generated on the fly and their contents and even resolution might be different every day.
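Because Thumbnail300KURL can be missing, any download logic needs a fallback to OriginalURL. The sketch below illustrates that pattern; the CSV path and output directory are assumptions for illustration (see also downloader.py in this repository):

```python
import csv
import os
import urllib.request

IMAGES_CSV = "data/train/images.csv"  # assumed path to an extracted images.csv
OUT_DIR = "data/train/jpg"

os.makedirs(OUT_DIR, exist_ok=True)
with open(IMAGES_CSV, newline="") as f:
    for row in csv.DictReader(f):
        # Prefer the ~300K-pixel thumbnail; fall back to the original image.
        url = row["Thumbnail300KURL"] or row["OriginalURL"]
        dest = os.path.join(OUT_DIR, row["ImageID"] + ".jpg")
        if os.path.exists(dest):
            continue
        try:
            urllib.request.urlretrieve(url, dest)
        except OSError as err:
            print("failed:", row["ImageID"], err)
```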
95 | 
96 | ### annotations-machine.csv
97 | 
98 | Machine-generated image-level labels (one file each for train, validation, and test):
99 | 
100 | ```
101 | ImageID,Source,LabelName,Confidence
102 | 000002b66c9c498e,machine,/m/05_4_,0.7
103 | 000002b66c9c498e,machine,/m/0krfg,0.7
104 | 000002b66c9c498e,machine,/m/01kcnl,0.5
105 | 000002b97e5471a0,machine,/m/05_5t0l,0.9
106 | 000002b97e5471a0,machine,/m/0cgh4,0.8
107 | 000002b97e5471a0,machine,/m/0dx1j,0.8
108 | 000002b97e5471a0,machine,/m/039jbq,0.8
109 | 000002b97e5471a0,machine,/m/03nfmq,0.8
110 | 000002b97e5471a0,machine,/m/03jm5,0.7
111 | ...
112 | ```
113 | 
114 | These were generated by a computer vision model similar to
115 | [Google Cloud Vision API](https://cloud.google.com/vision/).
116 | 
117 | ### annotations-human.csv
118 | 
119 | Human-verified image-level labels (one file each for train, validation, and test):
120 | 
121 | ```
122 | ImageID,Source,LabelName,Confidence
123 | 000002b66c9c498e,human,/m/014l8n,0
124 | 000002b66c9c498e,human,/m/017ycn,0
125 | 000002b66c9c498e,human,/m/018tkd,0
126 | 000002b66c9c498e,human,/m/019_nn,1
127 | 000002b66c9c498e,human,/m/01_qs1,0
128 | 000002b66c9c498e,human,/m/01b7zv,0
129 | 000002b66c9c498e,human,/m/01bsxb,1
130 | 000002b66c9c498e,human,/m/01c5nlx,1
131 | 000002b66c9c498e,human,/m/01d07t,0
132 | ...
133 | ```
134 | 
135 | ### annotations-human-bbox.csv
136 | 
137 | Human-provided labels with bounding box coordinates (one file each for train,
138 | validation, and test).
139 | 
140 | For the train set, labels and box coordinates are provided:
141 | 
142 | ```
143 | ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax
144 | 000002b66c9c498e,human,/m/01g317,1,0.602,0.766,0.309,0.489
145 | 000002b66c9c498e,human,/m/01g317,1,0.611,0.750,0.287,0.469
146 | 000002b66c9c498e,human,/m/01mzpv,1,0.009,0.119,0.750,0.920
147 | 000002b66c9c498e,human,/m/01mzpv,1,0.019,0.098,0.767,0.892
148 | 000002b66c9c498e,human,/m/0270h,1,0.522,0.917,0.675,0.966
149 | 000002b66c9c498e,human,/m/0284d,1,0.522,0.917,0.675,0.966
150 | 000002b66c9c498e,human,/m/0284d,1,0.560,0.951,0.696,1.000
151 | 000002b66c9c498e,human,/m/02p0tk3,1,0.031,0.448,0.697,0.967
152 | 000002b66c9c498e,human,/m/02wbm,1,0.522,0.917,0.675,0.966
153 | ...
154 | ``` 155 | 156 | For the validation and test sets additional attributes are also given: 157 | 158 | ``` 159 | ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside 160 | 000e4e7ed48c932d,human,/m/05s2s,1,0.000563,1.000000,0.000000,0.999128,0,1,1,0,0 161 | 000e4e7ed48c932d,human,/m/0c9ph5,1,0.822343,1.000000,0.000000,0.222423,1,1,0,0,0 162 | 000e4e7ed48c932d,human,/m/0c9ph5,1,0.466559,0.761525,0.124262,0.560279,1,0,0,0,0 163 | 000e4e7ed48c932d,human,/m/0c9ph5,1,0.000000,0.082646,0.249817,0.386786,1,1,0,0,0 164 | 000e4e7ed48c932d,human,/m/0c9ph5,1,0.000000,0.214165,0.321725,0.683551,0,0,0,0,0 165 | 000e4e7ed48c932d,human,/m/0c9ph5,1,0.200481,0.316795,0.615067,0.728066,0,0,0,0,0 166 | 000e4e7ed48c932d,human,/m/0c9ph5,1,0.052237,0.229369,0.853620,1.000000,1,1,0,0,0 167 | 000e4e7ed48c932d,human,/m/0c9ph5,1,0.553224,0.780531,0.865034,1.000000,1,1,0,0,0 168 | 000e4e7ed48c932d,human,/m/0c9ph5,1,0.498488,0.583633,0.483805,0.542017,1,0,0,0,0 169 | ... 170 | ``` 171 | 172 | The attributes have the following definitions: 173 | 174 | - IsOccluded: Indicates that the object is occluded by another object in the image. 175 | - IsTruncated: Indicates that the object extends beyond the boundary of the image. 176 | - IsGroupOf: Indicates that the box spans a group of objects (e.g., a bed of flowers or a crowd of people). We asked annotators to use this tag for cases with more than 5 instances which are heavily occluding each other and are physically touching. 177 | - IsDepiction: Indicates that the object is a depiction (e.g., a cartoon or drawing of the object, not a real physical instance). 178 | - IsInside: Indicates a picture taken from the inside of the object (e.g., a car interior or inside of a building). 179 | 180 | ### class-descriptions.csv 181 | 182 | The label MIDs can be converted to their short descriptions by looking into class-descriptions.csv: 183 | 184 | ``` 185 | ... 186 | /m/025dyy,Box 187 | /m/025f_6,Dussehra 188 | /m/025fh,Professor x 189 | /m/025fnn,Savannah Sparrow 190 | /m/025fsf,Stapler 191 | /m/025gg7,Jaguar x-type 192 | /m/02_5h,Figure skating 193 | /m/025_h00,Solid-state drive 194 | /m/025_h88,White tailed prairie dog 195 | /m/025_hbp,Mercury monterey 196 | /m/025h_m,Yellow rumped Warbler 197 | /m/025khl,Spätzle 198 | ... 199 | ``` 200 | 201 | Note the presence of characters like commas and quotes. The file follows 202 | standard csv escaping rules. E.g., 203 | 204 | ``` 205 | /m/02wvth,"Fiat 500 ""topolino""" 206 | /m/03gtp5,Lamb's quarters 207 | /m/03hgsf0,"Lemon, lime and bitters" 208 | ``` 209 | 210 | ### classes.txt 211 | 212 | The list of 19868 image-level classes: 213 | 214 | ``` 215 | /m/0100nhbf 216 | /m/0104x9kv 217 | /m/0105jzwx 218 | /m/0105ld7g 219 | /m/0105lxy5 220 | /m/0105n86x 221 | /m/0105ts35 222 | /m/0108_09c 223 | /m/01_097 224 | /m/010dmf 225 | ... 226 | ``` 227 | 228 | ### classes-trainable.txt 229 | 230 | The list of 5000 trainable image-level classes. 231 | 232 | ### classes-bbox.txt 233 | 234 | The list of 600 box-level classes. 235 | 236 | ### classes-bbox-trainable.txt 237 | 238 | The list of 545 trainable box-level classes. 239 | 240 | ## Statistics and data analysis 241 | 242 | ### Browse the data 243 | 244 | - Browse the bounding box groundtruth [here](http://www.cvdfoundation.org/datasets/open-images-dataset/vis), courtesy of [CVDF](http://www.cvdfoundation.org). 
245 | 246 | - View the set of boxable labels [here](https://storage.googleapis.com/openimages/2017_07/bbox_labels_vis/bbox_labels_vis.html): 247 | 248 | [![Hierarchy Visualizer](assets/v2-bbox_labels_vis_screenshot.png)](https://storage.googleapis.com/openimages/2017_07/bbox_labels_vis/bbox_labels_vis.html) 249 | 250 | ### Label distribution 251 | The following figures show the distribution of annotations across the dataset. Notice that the class distribution is heavily skewed (note: the y-axis is on a log-scale). Labels are ordered by number of positive examples, then by number of negative examples. Green indicates positive examples while red indicates negatives. 252 | 253 | ![Label frequencies - Training set](assets/v2-human-label-frequencies-train.png) 254 | ![Label frequencies - Val+Test sets](assets/v2-human-label-frequencies-val-test.png) 255 | ![Bounding box frequencies - Training set](assets/v2-human-label-frequencies-bbox-train.png) 256 | ![Bounding box frequencies - Val+Test sets](assets/v2-human-label-frequencies-bbox-val-test.png) 257 | 258 | ## Trained models 259 | 260 | - Resnet 101 image classification model (trained on V2 data): [Model checkpoint](https://storage.googleapis.com/openimages/2017_07/oidv2-resnet_v1_101.ckpt.tar.gz), [Checkpoint readme](https://storage.googleapis.com/openimages/2017_07/oidv2-resnet_v1_101.readme.txt), [Inference code](tools/classify_oidv2.py). 261 | 262 | ## Citations 263 | 264 | If you use the OpenImages dataset in your work, please cite it as: 265 | 266 | APA-style citation: "Krasin I., Duerig T., Alldrin N., Ferrari V., Abu-El-Haija S., Kuznetsova A., Rom H., Uijlings J., Popov S., Veit A., Belongie S., Gomes V., Gupta A., Sun C., Chechik G., Cai D., Feng Z., Narayanan D., Murphy K. OpenImages: A public dataset for large-scale multi-label and multi-class image classification, 2017. Available from https://github.com/openimages". 267 | 268 | BibTeX 269 | ``` 270 | @article{openimages, 271 | title={OpenImages: A public dataset for large-scale multi-label and multi-class image classification.}, 272 | author={Krasin, Ivan and Duerig, Tom and Alldrin, Neil and Ferrari, Vittorio and Abu-El-Haija, Sami and Kuznetsova, Alina and Rom, Hassan and Uijlings, Jasper and Popov, Stefan and Veit, Andreas and Belongie, Serge and Gomes, Victor and Gupta, Abhinav and Sun, Chen and Chechik, Gal and Cai, David and Feng, Zheyun and Narayanan, Dhyanesh and Murphy, Kevin}, 273 | journal={Dataset available from https://github.com/openimages}, 274 | year={2017} 275 | } 276 | ``` 277 | -------------------------------------------------------------------------------- /READMEV3.md: -------------------------------------------------------------------------------- 1 | 2 | # Open Images Dataset V3 3 | 4 | Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. 5 | 6 | The annotations are licensed by Google Inc. under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license. The contents of this repository are released under an [Apache 2](LICENSE) license. 7 | 8 | The images are listed as having a [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/) license. **Note:** while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself. 
9 | 10 | [![Bounding-Box Examples](assets/oid_bbox_examples.png)](http://www.cvdfoundation.org/datasets/open-images-dataset/vis) 11 | *Annotated images from the Open Images dataset. Left: [FAMILY MAKING A SNOWMAN](https://www.flickr.com/photos/mwvchamber/5433788065) by [mwvchamber](https://www.flickr.com/photos/mwvchamber/). Right: [STANZA STUDENTI.S.S. ANNUNZIATA](https://www.flickr.com/photos/ersupalermo/5759830290) by [ersupalermo](https://www.flickr.com/photos/ersupalermo/). Both images used under [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/) license.
Browse the validation set [here](http://www.cvdfoundation.org/datasets/open-images-dataset/vis) and the train set [here](http://www.cvdfoundation.org/datasets/open-images-dataset/vis?set=train), courtesy of Common Visual Data Foundation ([CVDF](http://www.cvdfoundation.org)).
*
12 | 
13 | ## Announcements
14 | 
15 | * **11-20-2017** Inception resnet v2 object detection model released (trained on V2 data). [Model checkpoint](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md), [evaluation protocol](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/evaluation_protocols.md), and [inference and evaluation tools](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/oid_inference_and_evaluation.md) are available as part of the [Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection).
16 | * **11-16-2017** All images can now be easily [downloaded](https://github.com/cvdfoundation/open-images-dataset) from the Common Visual Data Foundation!
17 | * **11-16-2017** V3 data released! The dataset now includes 3.7M bounding boxes and 9.7M positive image-level labels on the training set.
18 | * **10-20-2017** Resnet 101 image classification model released (trained on V2 data). [Model checkpoint](https://storage.googleapis.com/openimages/2017_07/oidv2-resnet_v1_101.ckpt.tar.gz), [Checkpoint readme](https://storage.googleapis.com/openimages/2017_07/oidv2-resnet_v1_101.readme.txt), [Inference code](tools/classify_oidv2.py).
19 | * **07-20-2017** V2 data is released! The dataset now includes 2M bounding boxes spanning 600 object classes (1.24M in train, 830K in validation+test), and 4.3M human-verified positive image-level labels on the training set. [Changelist](CHANGELIST-V1-TO-V2.md).
20 | * **07-20-2017** [V2 data visualizer](http://www.cvdfoundation.org/datasets/open-images-dataset/vis), courtesy of Common Visual Data Foundation ([CVDF](http://www.cvdfoundation.org)).
21 | 
22 | ## Download the data
23 | 
24 | * [Images](https://github.com/cvdfoundation/open-images-dataset): packaged for easy download from the Common Visual Data Foundation
25 | * [Image URLs and metadata](https://storage.googleapis.com/openimages/2017_11/images_2017_11.tar.gz) (990 MB)
26 | * [Bounding box annotations (train, validation, and test sets)](https://storage.googleapis.com/openimages/2017_11/annotations_human_bbox_2017_11.tar.gz) (97 MB)
27 | * [Human-verified image-level annotations (train, validation, and test sets)](https://storage.googleapis.com/openimages/2017_11/annotations_human_2017_11.tar.gz) (137 MB)
28 | * [Machine-generated image-level annotations (train, validation, and test sets)](https://storage.googleapis.com/openimages/2017_11/annotations_machine_2017_11.tar.gz) (447 MB)
29 | * [Classes and class descriptions](https://storage.googleapis.com/openimages/2017_11/classes_2017_11.tar.gz) (295 KB)
30 | 
31 | See also how to [import the annotations into PostgreSQL](https://github.com/openimages/dataset/wiki/Importing-into-PostgreSQL).
32 | 
33 | ## Data organization
34 | 
35 | The [dataset](#download-the-data) is split into a training set (9,011,219 images), a validation set (41,620 images), and a test set (125,436 images). The V1 validation set was partitioned into validation and test sets in the V2 release, to make evaluations more tractable. The images are annotated with image-level labels and bounding boxes as described below.
36 | 
37 | ### Image-level labels
38 | 
39 | Table 1 shows an overview of the image-level labels in all splits of the dataset.
All images have image-level labels automatically generated by a computer vision model similar to [Google Cloud Vision API](https://cloud.google.com/vision/). These automatically generated labels have a substantial false positive rate. 40 | 41 |

*Table 1: Image-level labels.*

42 | 
43 | | | Train | Validation | Test | # Classes | # Trainable Classes |
44 | |---:|---:|---:|---:|---:|---:|
45 | |Images|9,011,219|41,620|125,436|-|-|
46 | |Machine Generated Labels|78,977,695|512,093|1,545,835|7,870|4,966|
47 | |Human Verified Labels|20,868,755<br>pos: 9,741,876<br>neg: 11,126,879|551,390<br>pos: 365,772<br>neg: 185,618|1,667,399<br>pos: 1,105,052<br>neg: 562,347|19,693|5,000|
48 | 
49 | Moreover, all images in the validation and test sets as well as part of the training set have human-verified image-level labels. Most of the human verifications have been done by in-house annotators at Google. A smaller part has been done through crowd-sourced verification from Image Labeler: [Crowdsource app](http://play.google.com/store/apps/details?id=com.google.android.apps.village.boond), [g.co/imagelabeler](http://g.co/imagelabeler). This human verification process allows us to practically eliminate false positives (but not false negatives, so some labels might be missing from an image). The resulting verified positive labels are largely correct, and we recommend using them for training computer vision models. A variety of computer vision models were used to generate the samples (not just the one used for the machine-generated labels above), which is why the vocabulary is significantly expanded (the # Classes column in Table 1).
50 | 
51 | Overall, there are 19,995 distinct [classes with image-level labels](https://storage.googleapis.com/openimages/2017_11/classes.txt) (19,693 have at least one human-verified sample and 7870 have a sample in the machine-generated pool; note that verifications come from both the released machine-generated labels and from internal-only models). Of these, [5000 classes are considered trainable](https://storage.googleapis.com/openimages/2017_11/classes-trainable.txt). The trainable classes are unchanged from V2 (in V2 they were defined to have at least 30 human-verified samples in the training set and 5 in the validation or test sets). Classes are identified by MIDs (machine-generated IDs), as found in [Freebase](https://en.wikipedia.org/wiki/Freebase) or the [Google Knowledge Graph API](https://developers.google.com/knowledge-graph/). A short description of each class is available in [class-descriptions.csv](https://storage.googleapis.com/openimages/2017_11/class-descriptions.csv).
52 | 
53 | Each annotation has a confidence number from 0.0 to 1.0 assigned. Confidences for the human-verified labels are binary (either positive, 1.0, or negative, 0.0). Machine-generated labels have fractional confidences, generally >= 0.5. The higher the confidence, the smaller the chance that the label is a false positive.
54 | 
55 | ### Bounding boxes
56 | 
57 | Table 2 shows an overview of the bounding box annotations in all splits of the dataset, which span 600 object classes. These offer a broader range than the classes in the ILSVRC and COCO detection challenges, and include new objects such as fedora hat and snowman.
58 | 
59 | 

*Table 2: Boxes.*

60 | 
61 | | | Train | Validation | Test | # Classes | # Trainable Classes |
62 | |---:|---:|---:|---:|---:|---:|
63 | |Images|1,593,853|41,620|125,436|-|-|
64 | |Boxes|3,709,509|204,621|625,282|600|545|
65 | 
66 | We provide complete bounding box annotations for all object instances on the validation and test sets, all manually drawn by human annotators at Google. This was done for all image-level labels that have been positively verified by a human (see Table 1). We deliberately tried to annotate boxes at the most specific level possible in our [semantic hierarchy](https://storage.googleapis.com/openimages/2017_07/bbox_labels_vis/bbox_labels_vis.html). For example, car has two children, limousine and van. All limousines and all vans have been annotated as such, while all other kinds of cars have been annotated as car. Moreover, annotators also marked a set of attributes for each box, e.g. indicating whether that object is occluded. On average, there are about 5 boxes per image in the validation and test sets.
67 | 
68 | For the training set, we considered annotating boxes in 1.5M images, focusing on the most specific available positive image-level labels. For example, if an image has labels {car, limousine, screwdriver}, then we consider annotating boxes for limousine and screwdriver. We provide a total of 3.68M bounding boxes: 1.22M of them were drawn manually using the efficient interface described in ["Extreme clicking for efficient object annotation", Papadopolous et al., ICCV 2017](https://arxiv.org/abs/1708.02750). We produced the other 2.46M boxes semi-automatically, using an enhanced version of the method described in ["We don't need no bounding-boxes: Training object class detectors using only human verification", Papadopolous et al., CVPR 2016](https://arxiv.org/abs/1602.08405). This process iterates between detecting objects, asking humans to verify the current detections (boxes), and updating the detector. This makes it possible to localize difficult objects that were missed in early iterations. We ran it for 3-4 iterations. All boxes have been human-verified to have IoU > 0.7 with the perfect box, and in practice they are very accurate (mean IoU around 0.82). We deliberately did not annotate human body parts for 80% of the training set due to the overwhelming number of instances. On average, there are about 2 boxes per image in the training set. In general, over the whole training set, if there are multiple objects of the same class in one image, only one has been boxed.
69 | 
70 | Overall, there are [600 distinct classes with a bounding box](https://storage.googleapis.com/openimages/2017_11/classes-bbox.txt) attached to at least one image. Of these, [545 classes are considered trainable](https://storage.googleapis.com/openimages/2017_11/classes-bbox-trainable.txt) (the intersection of the 600 boxable classes with the 5000 image-level trainable classes).
71 | 
72 | ### Data Formats
73 | 
74 | The data tarballs contain the following files:
75 | 
76 | ### images.csv
77 | 
78 | There's one such file for each subset, inside the train, validation, and test subdirectories respectively. It has image URLs, their OpenImages IDs, titles, authors, and license information:
79 | 
80 | ```
81 | ImageID,Subset,OriginalURL,OriginalLandingURL,License,AuthorProfileURL,Author,Title,\
82 | OriginalSize,OriginalMD5,Thumbnail300KURL
83 | ...
84 | 000060e3121c7305,train,https://c1.staticflickr.com/5/4129/5215831864_46f356962f_o.jpg,\
85 | https://www.flickr.com/photos/brokentaco/5215831864,\
86 | https://creativecommons.org/licenses/by/2.0/,\
87 | "https://www.flickr.com/people/brokentaco/","David","28 Nov 2010 Our new house."\
88 | 211079,0Sad+xMj2ttXM1U8meEJ0A==,https://c1.staticflickr.com/5/4129/5215831864_ee4e8c6535_z.jpg
89 | ...
90 | ```
91 | 
92 | Each image has a unique 64-bit ID assigned. In the CSV files they appear as zero-padded hex integers, such as 000060e3121c7305.
93 | 
94 | The data is as it appears on the destination websites.
95 | 
96 | * OriginalSize is the download size of the original image.
97 | * OriginalMD5 is base64-encoded binary MD5, as described [here](https://cloud.google.com/storage/transfer/create-url-list#md5).
98 | * Thumbnail300KURL is an optional URL to a thumbnail with ~300K pixels (~640x480). It's provided as a convenience for downloading the data, in the absence of more direct ways to get the images. If missing, the OriginalURL must be used (and then resized to the same size, if needed). **Beware:** these thumbnails are generated on the fly and their contents and even resolution might be different every day.
99 | 
100 | ### annotations-machine.csv
101 | 
102 | Machine-generated image-level labels (one file each for train, validation, and test):
103 | 
104 | ```
105 | ImageID,Source,LabelName,Confidence
106 | 000002b66c9c498e,machine,/m/05_4_,0.7
107 | 000002b66c9c498e,machine,/m/0krfg,0.7
108 | 000002b66c9c498e,machine,/m/01kcnl,0.5
109 | 000002b97e5471a0,machine,/m/05_5t0l,0.9
110 | 000002b97e5471a0,machine,/m/0cgh4,0.8
111 | 000002b97e5471a0,machine,/m/0dx1j,0.8
112 | 000002b97e5471a0,machine,/m/039jbq,0.8
113 | 000002b97e5471a0,machine,/m/03nfmq,0.8
114 | 000002b97e5471a0,machine,/m/03jm5,0.7
115 | ...
116 | ```
117 | 
118 | These were generated by a computer vision model similar to
119 | [Google Cloud Vision API](https://cloud.google.com/vision/).
120 | 
121 | ### annotations-human.csv
122 | 
123 | Human-verified image-level labels (one file each for train, validation, and test):
124 | 
125 | ```
126 | ImageID,Source,LabelName,Confidence
127 | 000026e7ee790996,verification,/m/04hgtk,0
128 | 000026e7ee790996,verification,/m/07j7r,1
129 | 000026e7ee790996,crowdsource-verification,/m/01bqvp,1
130 | 000026e7ee790996,crowdsource-verification,/m/0csby,1
131 | 000026e7ee790996,verification,/m/01_m7,0
132 | 000026e7ee790996,verification,/m/01cbzq,1
133 | 000026e7ee790996,verification,/m/01czv3,0
134 | 000026e7ee790996,verification,/m/01v4jb,0
135 | 000026e7ee790996,verification,/m/03d1rd,0
136 | ...
137 | ```
138 | 
139 | The Source column indicates how the annotation was created:
140 | 
141 | - "verification" are human-verified image-level labels.
142 | - "crowdsource-verification" are human-verified labels from the Crowdsource app.
143 | 
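For training, one typically keeps only the positive verifications (Confidence of 1) and groups them per image; rows from both Source values are human judgments. A minimal sketch, with an assumed local path:

```python
import csv
from collections import defaultdict

ANNOTATIONS_CSV = "data/train/annotations-human.csv"  # assumed extracted path

positives = defaultdict(set)  # ImageID -> set of positively verified MIDs
with open(ANNOTATIONS_CSV, newline="") as f:
    for row in csv.DictReader(f):
        # Confidence is 1 for positive verifications, 0 for negative ones.
        if row["Confidence"] == "1":
            positives[row["ImageID"]].add(row["LabelName"])

print(len(positives), "images with at least one positive label")
```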
148 | 149 | For the train set labels and box coordinates are provided: 150 | 151 | ``` 152 | ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax 153 | 000002b66c9c498e,activemil,/m/0284d,1,0.560250,0.951487,0.696401,1.000000 154 | 000002b66c9c498e,activemil,/m/052lwg6,1,0.543036,0.907668,0.699531,0.995305 155 | 000002b66c9c498e,activemil,/m/0fszt,1,0.510172,0.979656,0.641628,0.987480 156 | 000002b66c9c498e,verification,/m/01mzpv,1,0.018750,0.098438,0.767187,0.892187 157 | 000002b66c9c498e,xclick,/m/01g317,1,0.012520,0.195618,0.148670,0.588419 158 | 000002b66c9c498e,xclick,/m/0284d,1,0.528951,0.924883,0.676056,0.965571 159 | 000002b66c9c498e,xclick,/m/02wbm,1,0.530516,0.923318,0.668232,0.976526 160 | 000002b66c9c498e,xclick,/m/052lwg6,1,0.516432,0.928012,0.651017,0.985915 161 | 000002b66c9c498e,xclick,/m/0fszt,1,0.525822,0.920188,0.669797,0.971831 162 | ... 163 | ``` 164 | 165 | For the validation and test sets additional attributes are also given: 166 | 167 | ``` 168 | ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside 169 | 000026e7ee790996,freeform,/m/07j7r,1,0.071905,0.145346,0.206591,0.391306,0,1,1,0,0 170 | 000026e7ee790996,freeform,/m/07j7r,1,0.439756,0.572466,0.264153,0.435122,0,1,1,0,0 171 | 000026e7ee790996,freeform,/m/07j7r,1,0.668455,1.000000,0.000000,0.552825,0,1,1,0,0 172 | 000062a39995e348,freeform,/m/015p6,1,0.205719,0.849912,0.154144,1.000000,0,0,0,0,0 173 | 000062a39995e348,freeform,/m/05s2s,1,0.137133,0.377634,0.000000,0.884185,1,1,0,0,0 174 | 0000c64e1253d68f,freeform,/m/07yv9,1,0.000000,0.973850,0.000000,0.043342,0,1,1,0,0 175 | 0000c64e1253d68f,freeform,/m/0k4j,1,0.000000,0.513534,0.321356,0.689661,0,1,0,0,0 176 | 0000c64e1253d68f,freeform,/m/0k4j,1,0.016515,0.268228,0.299368,0.462906,1,0,0,0,0 177 | 0000c64e1253d68f,freeform,/m/0k4j,1,0.481498,0.904376,0.232029,0.489017,1,0,0,0,0 178 | ... 179 | ``` 180 | 181 | The attributes have the following definitions: 182 | 183 | - IsOccluded: Indicates that the object is occluded by another object in the image. 184 | - IsTruncated: Indicates that the object extends beyond the boundary of the image. 185 | - IsGroupOf: Indicates that the box spans a group of objects (e.g., a bed of flowers or a crowd of people). We asked annotators to use this tag for cases with more than 5 instances which are heavily occluding each other and are physically touching. 186 | - IsDepiction: Indicates that the object is a depiction (e.g., a cartoon or drawing of the object, not a real physical instance). 187 | - IsInside: Indicates a picture taken from the inside of the object (e.g., a car interior or inside of a building). 188 | 189 | The Source column indicates how the box was created: 190 | 191 | - "freeform" and "xclick" are human drawn boxes. 192 | - "activemil" are boxes human verified to be tight (IoU > 0.7) through an Active MIL process. 193 | - "verification" are boxes human verified to be tight (IoU > 0.7) from an object detection model internal to Google. 194 | 195 | 196 | ### class-descriptions.csv 197 | 198 | The label MIDs can be converted to their short descriptions by looking into class-descriptions.csv: 199 | 200 | ``` 201 | ... 
202 | /m/025dyy,Box
203 | /m/025f_6,Dussehra
204 | /m/025fh,Professor x
205 | /m/025fnn,Savannah Sparrow
206 | /m/025fsf,Stapler
207 | /m/025gg7,Jaguar x-type
208 | /m/02_5h,Figure skating
209 | /m/025_h00,Solid-state drive
210 | /m/025_h88,White tailed prairie dog
211 | /m/025_hbp,Mercury monterey
212 | /m/025h_m,Yellow rumped Warbler
213 | /m/025khl,Spätzle
214 | ...
215 | ```
216 |
217 | Note the presence of characters like commas and quotes. The file follows
218 | standard CSV escaping rules, e.g.:
219 |
220 | ```
221 | /m/02wvth,"Fiat 500 ""topolino"""
222 | /m/03gtp5,Lamb's quarters
223 | /m/03hgsf0,"Lemon, lime and bitters"
224 | ```
225 |
226 | ### classes.txt
227 |
228 | The list of 19,995 image-level classes:
229 |
230 | ```
231 | /m/0100nhbf
232 | /m/0104x9kv
233 | /m/0105jzwx
234 | /m/0105ld7g
235 | /m/0105lxy5
236 | /m/0105n86x
237 | /m/0105ts35
238 | /m/0108_09c
239 | /m/01_097
240 | /m/010dmf
241 | ...
242 | ```
243 |
244 | ### classes-trainable.txt
245 |
246 | The list of 5,000 trainable image-level classes.
247 |
248 | ### classes-bbox.txt
249 |
250 | The list of 600 box-level classes.
251 |
252 | ### classes-bbox-trainable.txt
253 |
254 | The list of 545 trainable box-level classes.
255 |
256 | ## Statistics and data analysis
257 |
258 | ### Browse the data
259 |
260 | - Browse the bounding box ground truth [here](http://www.cvdfoundation.org/datasets/open-images-dataset/vis), courtesy of [CVDF](http://www.cvdfoundation.org).
261 |
262 | - View the set of boxable labels [here](https://storage.googleapis.com/openimages/2017_07/bbox_labels_vis/bbox_labels_vis.html):
263 |
264 | [![Hierarchy Visualizer](assets/v2-bbox_labels_vis_screenshot.png)](https://storage.googleapis.com/openimages/2017_07/bbox_labels_vis/bbox_labels_vis.html)
265 |
266 | ### Label distribution
267 | The following figures show the distribution of annotations across the dataset. Note that the class distribution is heavily skewed (the y-axis is log-scale). Labels are ordered by number of positive examples, then by number of negative examples. Green indicates positive examples while red indicates negatives.
268 |
269 | ![Label frequencies - Training set](assets/v3-human-label-frequencies-train.png)
270 | ![Label frequencies - Validation set](assets/v3-human-label-frequencies-validation.png)
271 | ![Label frequencies - Test set](assets/v3-human-label-frequencies-test.png)
272 | ![Bounding box frequencies - Training set](assets/v3-human-bbox-frequencies-train.png)
273 | ![Bounding box frequencies - Validation set](assets/v3-human-bbox-frequencies-validation.png)
274 | ![Bounding box frequencies - Test set](assets/v3-human-bbox-frequencies-test.png)
275 |
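Plots of this kind can be approximated directly from the released CSVs. A hedged sketch (assuming pandas and matplotlib; this is not the exact script used to produce the figures above):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Count negatives (Confidence == 0) and positives (Confidence == 1) per label.
labels = pd.read_csv('annotations-human.csv')
counts = (labels.groupby(['LabelName', 'Confidence']).size()
          .unstack(fill_value=0))

# Order labels by positives, then negatives, as in the figures above.
counts = counts.sort_values(by=[1, 0], ascending=False)

plt.plot(counts[1].values, color='green', label='positive examples')
plt.plot(counts[0].values, color='red', label='negative examples')
plt.yscale('log')
plt.xlabel('labels, ordered by frequency')
plt.ylabel('number of human-verified annotations')
plt.legend()
plt.show()
```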
276 | ## Trained models
277 |
278 | - Inception resnet v2 object detection model (trained on V2 data). [Model checkpoint](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md), [evaluation protocol](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/evaluation_protocols.md), and [inference and evaluation tools](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/oid_inference_and_evaluation.md) are available as part of the [Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection).
279 | - Resnet 101 image classification model (trained on V2 data): [Model checkpoint](https://storage.googleapis.com/openimages/2017_07/oidv2-resnet_v1_101.ckpt.tar.gz), [Checkpoint readme](https://storage.googleapis.com/openimages/2017_07/oidv2-resnet_v1_101.readme.txt), [Inference code](tools/classify_oidv2.py).
280 |
281 | ## Community Contributions
282 |
283 | * The team at [Algorithmia](https://algorithmia.com/) created an [in-depth object detection tutorial](https://blog.algorithmia.com/deep-dive-into-object-detection-with-open-images-using-tensorflow/) that walks through how to use the provided bounding box annotations to create a useful object detection model with Tensorflow.
284 |
285 | * Dan Nuffer offers helper code to retrieve the images at [Open Images dataset downloader](https://github.com/dnuffer/open_images_downloader). It downloads, verifies, and resizes the images and metadata, and is designed to run as fast as the available hardware and bandwidth allow by using asynchronous I/O and parallelism.
286 |
287 | NOTE: While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image, and you should verify the license for each image yourself.
288 |
289 | ## Citations
290 |
291 | If you use the OpenImages dataset in your work, please cite it as:
292 |
293 | APA-style citation: "Krasin I., Duerig T., Alldrin N., Ferrari V., Abu-El-Haija S., Kuznetsova A., Rom H., Uijlings J., Popov S., Veit A., Belongie S., Gomes V., Gupta A., Sun C., Chechik G., Cai D., Feng Z., Narayanan D., Murphy K. OpenImages: A public dataset for large-scale multi-label and multi-class image classification, 2017. Available from https://github.com/openimages".
294 | 295 | BibTeX 296 | ``` 297 | @article{openimages, 298 | title={OpenImages: A public dataset for large-scale multi-label and multi-class image classification.}, 299 | author={Krasin, Ivan and Duerig, Tom and Alldrin, Neil and Ferrari, Vittorio and Abu-El-Haija, Sami and Kuznetsova, Alina and Rom, Hassan and Uijlings, Jasper and Popov, Stefan and Veit, Andreas and Belongie, Serge and Gomes, Victor and Gupta, Abhinav and Sun, Chen and Chechik, Gal and Cai, David and Feng, Zheyun and Narayanan, Dhyanesh and Murphy, Kevin}, 300 | journal={Dataset available from https://github.com/openimages}, 301 | year={2017} 302 | } 303 | ``` 304 | -------------------------------------------------------------------------------- /assets/bbox_hierarchy.json: -------------------------------------------------------------------------------- 1 | {"name": "Entity", "children": [{"name": "Miscellaneous object", "children": [{"name": "Coin", "size": 1000}, {"name": "Flag", "size": 1000}, {"name": "Light bulb", "size": 1000}]}, {"name": "Indoor", "children": [{"name": "Toy", "children": [{"name": "Doll", "size": 1000}, {"name": "Balloon", "size": 1000}, {"name": "Dice", "size": 1000}, {"name": "Flying disc", "size": 1000}, {"name": "Kite", "size": 1000}, {"name": "Teddy bear", "size": 1000}]}, {"name": "Home appliance", "children": [{"name": "Washing machine", "size": 1000}, {"name": "Toaster", "size": 1000}, {"name": "Oven", "size": 1000}, {"name": "Blender", "size": 1000}, {"name": "Gas stove", "size": 1000}, {"name": "Mechanical fan", "size": 1000}, {"name": "Heater", "size": 1000}, {"name": "Kettle", "size": 1000}, {"name": "Hair dryer", "size": 1000}, {"name": "Refrigerator", "size": 1000}, {"name": "Wood-burning stove", "size": 1000}, {"name": "Humidifier", "size": 1000}, {"name": "Mixer", "size": 1000}, {"name": "Coffeemaker", "size": 1000}, {"name": "Vacuum", "size": 1000}, {"name": "Microwave oven", "size": 1000}, {"name": "Dishwasher", "size": 1000}, {"name": "Sewing machine", "size": 1000}, {"name": "Hand dryer", "size": 1000}, {"name": "Ceiling fan", "size": 1000}]}, {"name": "Plumbing fixture", "children": [{"name": "Sink", "size": 1000}, {"name": "Bidet", "size": 1000}, {"name": "Shower", "size": 1000}, {"name": "Tap", "size": 1000}, {"name": "Bathtub", "size": 1000}, {"name": "Toilet", "size": 1000}]}, {"name": "Office supplies", "children": [{"name": "Scissors", "size": 1000}, {"name": "Poster", "size": 1000}, {"name": "Calculator", "size": 1000}, {"name": "Box", "size": 1000}, {"name": "Stapler", "size": 1000}, {"name": "Whiteboard", "size": 1000}, {"name": "Pencil sharpener", "size": 1000}, {"name": "Eraser", "size": 1000}, {"name": "Fax", "size": 1000}, {"name": "Adhesive tape", "size": 1000}, {"name": "Ring binder", "size": 1000}, {"name": "Pencil case", "size": 1000}, {"name": "Plastic bag", "size": 1000}, {"name": "Paper cutter", "size": 1000}, {"name": "Toilet paper", "size": 1000}, {"name": "Envelope", "size": 1000}, {"name": "Pen", "size": 1000}]}, {"name": "Paper towel", "size": 1000}, {"name": "Pillow", "size": 1000}, {"name": "Kitchenware", "children": [{"name": "Kitchen utensil", "children": [{"name": "Chopsticks", "size": 1000}, {"name": "Ladle", "size": 1000}, {"name": "Spatula", "size": 1000}, {"name": "Can opener", "size": 1000}, {"name": "Cutting board", "size": 1000}, {"name": "Whisk", "size": 1000}, {"name": "Drinking straw", "size": 1000}, {"name": "Knife", "size": 1000}, {"name": "Bottle opener", "size": 1000}, {"name": "Measuring cup", "size": 1000}, {"name": "Pizza cutter", 
"size": 1000}, {"name": "Spoon", "size": 1000}, {"name": "Fork", "size": 1000}]}, {"name": "Tableware", "children": [{"name": "Chopsticks", "size": 1000}, {"name": "Teapot", "size": 1000}, {"name": "Mug", "size": 1000}, {"name": "Coffee cup", "size": 1000}, {"name": "Salt and pepper shakers", "size": 1000}, {"name": "Mixing bowl", "size": 1000}, {"name": "Saucer", "size": 1000}, {"name": "Cocktail shaker", "size": 1000}, {"name": "Bottle", "size": 1000}, {"name": "Bowl", "size": 1000}, {"name": "Plate", "size": 1000}, {"name": "Pitcher", "size": 1000}, {"name": "Kitchen knife", "size": 1000}, {"name": "Jug", "size": 1000}, {"name": "Platter", "size": 1000}, {"name": "Wine glass", "size": 1000}, {"name": "Spoon", "size": 1000}, {"name": "Fork", "size": 1000}, {"name": "Serving tray", "size": 1000}, {"name": "Cake stand", "size": 1000}]}, {"name": "Frying pan", "size": 1000}, {"name": "Wok", "size": 1000}, {"name": "Spice rack", "size": 1000}, {"name": "Kitchen appliance", "children": [{"name": "Oven", "size": 1000}, {"name": "Blender", "size": 1000}, {"name": "Slow cooker", "size": 1000}, {"name": "Food processor", "size": 1000}, {"name": "Refrigerator", "size": 1000}, {"name": "Waffle iron", "size": 1000}, {"name": "Mixer", "size": 1000}, {"name": "Coffeemaker", "size": 1000}, {"name": "Microwave oven", "size": 1000}, {"name": "Pressure cooker", "size": 1000}, {"name": "Dishwasher", "size": 1000}]}]}, {"name": "Fireplace", "size": 1000}, {"name": "Countertop", "size": 1000}, {"name": "Book", "size": 1000}, {"name": "Furniture", "children": [{"name": "Chair", "size": 1000}, {"name": "Cabinetry", "size": 1000}, {"name": "Desk", "size": 1000}, {"name": "Wine rack", "size": 1000}, {"name": "Couch", "children": [{"name": "Sofa bed", "size": 1000}, {"name": "Loveseat", "size": 1000}]}, {"name": "Wardrobe", "size": 1000}, {"name": "Nightstand", "size": 1000}, {"name": "Bookcase", "size": 1000}, {"name": "Bed", "children": [{"name": "Infant bed", "size": 1000}, {"name": "studio couch", "size": 1000}]}, {"name": "Filing cabinet", "size": 1000}, {"name": "Table", "children": [{"name": "Coffee table", "size": 1000}, {"name": "Kitchen & dining room table", "size": 1000}]}, {"name": "Chest of drawers", "size": 1000}, {"name": "Cupboard", "size": 1000}, {"name": "Bench", "size": 1000}, {"name": "Drawer", "size": 1000}, {"name": "Stool", "size": 1000}, {"name": "Shelf", "size": 1000}, {"name": "Wall clock", "size": 1000}, {"name": "Bathroom cabinet", "size": 1000}, {"name": "Closet", "size": 1000}]}, {"name": "Dog bed", "size": 1000}, {"name": "Cat furniture", "size": 1000}, {"name": "Interior design", "children": [{"name": "Lantern", "size": 1000}, {"name": "Poster", "size": 1000}, {"name": "Cabinetry", "size": 1000}, {"name": "Clock", "children": [{"name": "Alarm clock", "size": 1000}, {"name": "Digital clock", "size": 1000}, {"name": "Wall clock", "size": 1000}]}, {"name": "Christmas tree", "size": 1000}, {"name": "Vase", "size": 1000}, {"name": "Window blind", "size": 1000}, {"name": "Curtain", "size": 1000}, {"name": "Mirror", "size": 1000}, {"name": "Sculpture", "children": [{"name": "Snowman", "size": 1000}, {"name": "Bust", "size": 1000}, {"name": "Bronze sculpture", "size": 1000}]}, {"name": "Picture frame", "size": 1000}, {"name": "Candle", "size": 1000}, {"name": "Lamp", "size": 1000}, {"name": "Flowerpot", "size": 1000}, {"name": "Bathroom accessory", "children": [{"name": "Towel", "size": 1000}, {"name": "Toilet paper", "size": 1000}, {"name": "Soap dispenser", "size": 1000}, {"name": 
"Facial tissue holder", "size": 1000}]}]}]}, {"name": "Outdoor", "children": [{"name": "Snowman", "size": 1000}, {"name": "Beehive", "size": 1000}, {"name": "Tent", "size": 1000}, {"name": "Street items", "children": [{"name": "Parking meter", "size": 1000}, {"name": "Traffic light", "size": 1000}, {"name": "Billboard", "size": 1000}, {"name": "Traffic sign", "children": [{"name": "Stop sign", "size": 1000}]}, {"name": "Fire hydrant", "size": 1000}, {"name": "Fountain", "size": 1000}, {"name": "Street light", "size": 1000}]}, {"name": "Jacuzzi", "size": 1000}, {"name": "Building", "children": [{"name": "Tree house", "size": 1000}, {"name": "Lighthouse", "size": 1000}, {"name": "Skyscraper", "size": 1000}, {"name": "Castle", "size": 1000}, {"name": "Tower", "size": 1000}, {"name": "Buiding part", "children": [{"name": "Door", "children": [{"name": "Door handle", "size": 1000}]}, {"name": "Window", "size": 1000}, {"name": "Stairs", "size": 1000}, {"name": "Porch", "size": 1000}]}, {"name": "House", "size": 1000}, {"name": "Office building", "size": 1000}, {"name": "Convenience store", "size": 1000}]}, {"name": "Swimming pool", "size": 1000}]}, {"name": "Person", "children": [{"name": "Body part", "children": [{"name": "Eye", "size": 1000}, {"name": "Skull", "size": 1000}, {"name": "Head", "size": 1000}, {"name": "Face", "size": 1000}, {"name": "Mouth", "size": 1000}, {"name": "Ear", "size": 1000}, {"name": "Nose", "size": 1000}, {"name": "Hair", "size": 1000}, {"name": "Hand", "size": 1000}, {"name": "Foot", "size": 1000}, {"name": "Arm", "size": 1000}, {"name": "Leg", "size": 1000}, {"name": "Beard", "size": 1000}]}, {"name": "Man", "size": 1000}, {"name": "Woman", "size": 1000}, {"name": "Boy", "size": 1000}, {"name": "Girl", "size": 1000}]}, {"name": "Food", "children": [{"name": "Fast food", "children": [{"name": "Hot dog", "size": 1000}, {"name": "French fries", "size": 1000}]}, {"name": "Waffle", "size": 1000}, {"name": "Pancake", "size": 1000}, {"name": "Burrito", "size": 1000}, {"name": "Snack", "children": [{"name": "Pretzel", "size": 1000}, {"name": "Popcorn", "size": 1000}, {"name": "Cookie", "size": 1000}]}, {"name": "Dessert", "children": [{"name": "Muffin", "size": 1000}, {"name": "Cookie", "size": 1000}, {"name": "Ice cream", "size": 1000}, {"name": "Cake", "size": 1000}, {"name": "Candy", "size": 1000}]}, {"name": "Guacamole", "size": 1000}, {"name": "Fruit", "children": [{"name": "Apple", "size": 1000}, {"name": "Grape", "size": 1000}, {"name": "Common fig", "size": 1000}, {"name": "Pear", "size": 1000}, {"name": "Strawberry", "size": 1000}, {"name": "Tomato", "size": 1000}, {"name": "Lemon", "size": 1000}, {"name": "Banana", "size": 1000}, {"name": "Orange", "size": 1000}, {"name": "Peach", "size": 1000}, {"name": "Coconut", "size": 1000}, {"name": "Mango", "size": 1000}, {"name": "Pineapple", "size": 1000}, {"name": "Grapefruit", "size": 1000}, {"name": "Pomegranate", "size": 1000}, {"name": "Watermelon", "size": 1000}, {"name": "Cantaloupe", "size": 1000}]}, {"name": "Egg", "size": 1000}, {"name": "Baked goods", "children": [{"name": "Pretzel", "size": 1000}, {"name": "Bagel", "size": 1000}, {"name": "Muffin", "size": 1000}, {"name": "Cookie", "size": 1000}, {"name": "Bread", "size": 1000}, {"name": "Pastry", "children": [{"name": "Doughnut", "size": 1000}, {"name": "Croissant", "size": 1000}, {"name": "Tart", "size": 1000}]}]}, {"name": "Mushroom", "size": 1000}, {"name": "Pasta", "size": 1000}, {"name": "Pizza", "size": 1000}, {"name": "Seafood", "children": [{"name": 
"Squid", "size": 1000}, {"name": "Shellfish", "children": [{"name": "Oyster", "size": 1000}, {"name": "Lobster", "size": 1000}, {"name": "Shrimp", "size": 1000}, {"name": "Crab", "size": 1000}]}]}, {"name": "Taco", "size": 1000}, {"name": "Cooking spray", "size": 1000}, {"name": "Vegetable", "children": [{"name": "Cucumber", "size": 1000}, {"name": "Radish", "size": 1000}, {"name": "Artichoke", "size": 1000}, {"name": "Potato", "size": 1000}, {"name": "Tomato", "size": 1000}, {"name": "Asparagus", "size": 1000}, {"name": "Squash", "children": [{"name": "Pumpkin", "size": 1000}, {"name": "Zucchini", "size": 1000}]}, {"name": "Cabbage", "size": 1000}, {"name": "Carrot", "size": 1000}, {"name": "Salad", "size": 1000}, {"name": "Broccoli", "size": 1000}, {"name": "Bell pepper", "size": 1000}, {"name": "Winter melon", "size": 1000}]}, {"name": "Honeycomb", "size": 1000}, {"name": "Sandwich", "children": [{"name": "Hamburger", "size": 1000}, {"name": "Submarine sandwich", "size": 1000}]}, {"name": "Dairy", "children": [{"name": "Cheese", "size": 1000}, {"name": "Milk", "size": 1000}]}, {"name": "Sushi", "size": 1000}]}, {"name": "Plant", "children": [{"name": "Houseplant", "size": 1000}, {"name": "Tree", "children": [{"name": "Christmas tree", "size": 1000}, {"name": "Tree house", "size": 1000}, {"name": "Palm tree", "size": 1000}, {"name": "Maple", "size": 1000}, {"name": "Coconut", "size": 1000}, {"name": "Willow", "size": 1000}]}, {"name": "Flower", "children": [{"name": "Lavender", "size": 1000}, {"name": "Rose", "size": 1000}, {"name": "Sunflower", "size": 1000}, {"name": "Lily", "size": 1000}]}]}, {"name": "Vehicle", "children": [{"name": "Land vehicle", "children": [{"name": "Ambulance", "size": 1000}, {"name": "Cart", "size": 1000}, {"name": "Bicycle", "children": [{"name": "Bicycle wheel", "size": 1000}]}, {"name": "Bus", "size": 1000}, {"name": "Snowmobile", "size": 1000}, {"name": "Golf cart", "size": 1000}, {"name": "Motorcycle", "size": 1000}, {"name": "Segway", "size": 1000}, {"name": "Tank", "size": 1000}, {"name": "Train", "size": 1000}, {"name": "Truck", "size": 1000}, {"name": "Auto part", "children": [{"name": "Vehicle registration plate", "size": 1000}, {"name": "Wheel", "size": 1000}, {"name": "Seat belt", "size": 1000}, {"name": "Tire", "size": 1000}]}, {"name": "Unicycle", "size": 1000}, {"name": "Car", "children": [{"name": "Limousine", "size": 1000}, {"name": "Van", "size": 1000}]}, {"name": "Taxi", "size": 1000}, {"name": "Wheelchair", "size": 1000}]}, {"name": "Watercraft", "children": [{"name": "Boat", "children": [{"name": "Barge", "size": 1000}, {"name": "Gondola", "size": 1000}, {"name": "Canoe", "size": 1000}]}, {"name": "Jet ski", "size": 1000}, {"name": "Submarine", "size": 1000}]}, {"name": "Aerial vehicle", "children": [{"name": "Helicopter", "size": 1000}, {"name": "Airplane", "size": 1000}, {"name": "Rocket", "size": 1000}]}]}, {"name": "Clothing", "children": [{"name": "Shorts", "size": 1000}, {"name": "Dress", "size": 1000}, {"name": "Swimwear", "size": 1000}, {"name": "Brassiere", "size": 1000}, {"name": "Tiara", "size": 1000}, {"name": "Shirt", "size": 1000}, {"name": "Coat", "size": 1000}, {"name": "Suit", "size": 1000}, {"name": "Hat", "children": [{"name": "Cowboy hat", "size": 1000}, {"name": "Fedora", "size": 1000}, {"name": "Sombrero", "size": 1000}, {"name": "Sun hat", "size": 1000}]}, {"name": "Scarf", "size": 1000}, {"name": "Skirt", "children": [{"name": "Miniskirt", "size": 1000}]}, {"name": "Jacket", "size": 1000}, {"name": "Fashion 
accessory", "children": [{"name": "Glove", "children": [{"name": "Baseball glove", "size": 1000}]}, {"name": "Belt", "size": 1000}, {"name": "Sunglasses", "size": 1000}, {"name": "Tiara", "size": 1000}, {"name": "Necklace", "size": 1000}, {"name": "Sock", "size": 1000}, {"name": "Earrings", "size": 1000}, {"name": "Tie", "size": 1000}, {"name": "Goggles", "size": 1000}, {"name": "Hat", "children": [{"name": "Cowboy hat", "size": 1000}, {"name": "Fedora", "size": 1000}, {"name": "Sombrero", "size": 1000}, {"name": "Sun hat", "size": 1000}]}, {"name": "Scarf", "size": 1000}, {"name": "Handbag", "size": 1000}, {"name": "Watch", "size": 1000}, {"name": "Umbrella", "size": 1000}, {"name": "Glasses", "size": 1000}, {"name": "Crown", "size": 1000}]}, {"name": "Swim cap", "size": 1000}, {"name": "Trousers", "children": [{"name": "Jeans", "size": 1000}]}, {"name": "Footwear", "children": [{"name": "Roller skates", "size": 1000}, {"name": "Boot", "size": 1000}, {"name": "High heels", "size": 1000}, {"name": "Sandal", "size": 1000}]}, {"name": "Sports uniform", "size": 1000}, {"name": "Luggage & bags", "children": [{"name": "Backpack", "size": 1000}, {"name": "Suitcase", "size": 1000}, {"name": "Briefcase", "size": 1000}, {"name": "Handbag", "size": 1000}]}, {"name": "Helmet", "children": [{"name": "Bicycle helmet", "size": 1000}, {"name": "Football helmet", "size": 1000}]}]}, {"name": "Animal", "children": [{"name": "Bird", "children": [{"name": "Magpie", "size": 1000}, {"name": "Woodpecker", "size": 1000}, {"name": "Blue jay", "size": 1000}, {"name": "Ostrich", "size": 1000}, {"name": "Penguin", "size": 1000}, {"name": "Raven", "size": 1000}, {"name": "Chicken", "size": 1000}, {"name": "Eagle", "size": 1000}, {"name": "Owl", "size": 1000}, {"name": "Duck", "size": 1000}, {"name": "Canary", "size": 1000}, {"name": "Goose", "size": 1000}, {"name": "Swan", "size": 1000}, {"name": "Falcon", "size": 1000}, {"name": "Parrot", "size": 1000}, {"name": "Sparrow", "size": 1000}, {"name": "Turkey", "size": 1000}]}, {"name": "Invertebrate", "children": [{"name": "Tick", "size": 1000}, {"name": "Centipede", "size": 1000}, {"name": "Marine invertebrates", "children": [{"name": "Starfish", "size": 1000}, {"name": "Isopod", "size": 1000}, {"name": "Squid", "size": 1000}, {"name": "Lobster", "size": 1000}, {"name": "Jellyfish", "size": 1000}, {"name": "Shrimp", "size": 1000}, {"name": "Crab", "size": 1000}]}, {"name": "Insect", "children": [{"name": "Bee", "children": [{"name": "Beehive", "size": 1000}]}, {"name": "Beetle", "children": [{"name": "Lady bug", "size": 1000}]}, {"name": "Ant", "size": 1000}, {"name": "Moths and butterflies", "children": [{"name": "Caterpillar", "size": 1000}, {"name": "Butterfly", "size": 1000}]}, {"name": "Dragonfly", "size": 1000}]}, {"name": "Scorpion", "size": 1000}, {"name": "Worm", "size": 1000}, {"name": "Spider", "size": 1000}, {"name": "Oyster", "size": 1000}, {"name": "Snail", "size": 1000}]}, {"name": "Mammal", "children": [{"name": "Bat", "size": 1000}, {"name": "Carnivore", "children": [{"name": "Bear", "children": [{"name": "Brown bear", "size": 1000}, {"name": "Panda", "size": 1000}, {"name": "Polar bear", "size": 1000}, {"name": "Teddy bear", "size": 1000}]}, {"name": "Cat", "size": 1000}, {"name": "Fox", "size": 1000}, {"name": "Jaguar", "size": 1000}, {"name": "Lynx", "size": 1000}, {"name": "Red panda", "size": 1000}, {"name": "Tiger", "size": 1000}, {"name": "Lion", "size": 1000}, {"name": "Dog", "size": 1000}, {"name": "Leopard", "size": 1000}, {"name": "Cheetah", 
"size": 1000}, {"name": "Otter", "size": 1000}, {"name": "Raccoon", "size": 1000}]}, {"name": "Camel", "size": 1000}, {"name": "Cattle", "size": 1000}, {"name": "Giraffe", "size": 1000}, {"name": "Rhinoceros", "size": 1000}, {"name": "Goat", "size": 1000}, {"name": "Horse", "size": 1000}, {"name": "Hamster", "size": 1000}, {"name": "Kangaroo", "size": 1000}, {"name": "Koala", "size": 1000}, {"name": "Mouse", "size": 1000}, {"name": "Pig", "size": 1000}, {"name": "Rabbit", "size": 1000}, {"name": "Squirrel", "size": 1000}, {"name": "Sheep", "size": 1000}, {"name": "Zebra", "size": 1000}, {"name": "Monkey", "size": 1000}, {"name": "Hippopotamus", "size": 1000}, {"name": "Deer", "size": 1000}, {"name": "Elephant", "size": 1000}, {"name": "Porcupine", "size": 1000}, {"name": "Hedgehog", "size": 1000}, {"name": "Bull", "size": 1000}, {"name": "Antelope", "size": 1000}, {"name": "Mule", "size": 1000}, {"name": "Marine mammal", "children": [{"name": "Dolphin", "size": 1000}, {"name": "Whale", "size": 1000}, {"name": "Sea lion", "size": 1000}, {"name": "Harbor seal", "size": 1000}]}, {"name": "Skunk", "size": 1000}, {"name": "Alpaca", "size": 1000}, {"name": "Armadillo", "size": 1000}]}, {"name": "Reptile & Amphibian", "children": [{"name": "Dinosaur", "size": 1000}, {"name": "Lizard", "size": 1000}, {"name": "Snake", "size": 1000}, {"name": "Turtle", "children": [{"name": "Tortoise", "size": 1000}, {"name": "Sea turtle", "size": 1000}]}, {"name": "Crocodile", "size": 1000}, {"name": "Frog", "size": 1000}]}, {"name": "Fish", "children": [{"name": "Goldfish", "size": 1000}, {"name": "Shark", "size": 1000}, {"name": "Rays and skates", "size": 1000}, {"name": "Seahorse", "size": 1000}]}, {"name": "Shellfish", "children": [{"name": "Oyster", "size": 1000}, {"name": "Lobster", "size": 1000}, {"name": "Shrimp", "size": 1000}, {"name": "Crab", "size": 1000}]}]}, {"name": "Health and beauty", "children": [{"name": "Cosmetics", "children": [{"name": "Face powder", "size": 1000}, {"name": "Hair spray", "size": 1000}, {"name": "Lipstick", "size": 1000}, {"name": "Perfume", "size": 1000}]}, {"name": "Personal care", "children": [{"name": "Toothbrush", "size": 1000}, {"name": "Sunglasses", "size": 1000}, {"name": "Goggles", "size": 1000}, {"name": "Crutch", "size": 1000}, {"name": "Cream", "size": 1000}, {"name": "Diaper", "size": 1000}, {"name": "Glasses", "size": 1000}, {"name": "Wheelchair", "size": 1000}]}]}, {"name": "Equipment", "children": [{"name": "Medical equipment", "children": [{"name": "Syringe", "size": 1000}, {"name": "Stretcher", "size": 1000}, {"name": "Stethoscope", "size": 1000}, {"name": "Band-aid", "size": 1000}]}, {"name": "Musical instrument", "children": [{"name": "Organ", "size": 1000}, {"name": "Banjo", "size": 1000}, {"name": "Cello", "size": 1000}, {"name": "Drum", "size": 1000}, {"name": "Horn", "size": 1000}, {"name": "Guitar", "size": 1000}, {"name": "Harp", "size": 1000}, {"name": "Harpsichord", "size": 1000}, {"name": "Harmonica", "size": 1000}, {"name": "Musical keyboard", "size": 1000}, {"name": "Oboe", "size": 1000}, {"name": "Piano", "size": 1000}, {"name": "Saxophone", "size": 1000}, {"name": "Trombone", "size": 1000}, {"name": "Trumpet", "size": 1000}, {"name": "Violin", "size": 1000}, {"name": "Chime", "size": 1000}, {"name": "Flute", "size": 1000}, {"name": "Accordion", "size": 1000}, {"name": "Maracas", "size": 1000}]}, {"name": "Sports equipment", "children": [{"name": "Paddle", "size": 1000}, {"name": "Ball", "children": [{"name": "Football", "size": 1000}, {"name": 
"Cricket ball", "size": 1000}, {"name": "Volleyball", "size": 1000}, {"name": "Tennis ball", "size": 1000}, {"name": "Rugby ball", "size": 1000}]}, {"name": "Bicycle", "children": [{"name": "Bicycle wheel", "size": 1000}]}, {"name": "Surfboard", "size": 1000}, {"name": "Bow and arrow", "size": 1000}, {"name": "Hiking equipment", "size": 1000}, {"name": "Roller skates", "size": 1000}, {"name": "Flying disc", "size": 1000}, {"name": "Baseball bat", "size": 1000}, {"name": "Baseball glove", "size": 1000}, {"name": "Punching bag", "size": 1000}, {"name": "Golf ball", "size": 1000}, {"name": "Lifejacket", "size": 1000}, {"name": "Scoreboard", "size": 1000}, {"name": "Snowboard", "size": 1000}, {"name": "Skateboard", "size": 1000}, {"name": "Ski", "size": 1000}, {"name": "Bowling equipment", "size": 1000}, {"name": "Boxing equipment", "size": 1000}, {"name": "Exercise equipment", "children": [{"name": "Dumbbell", "size": 1000}, {"name": "Stationary bicycle", "size": 1000}, {"name": "Treadmill", "size": 1000}, {"name": "Bench", "size": 1000}, {"name": "Indoor rower", "size": 1000}]}, {"name": "Horizontal bar", "size": 1000}, {"name": "Parachute", "size": 1000}, {"name": "Racket", "children": [{"name": "Tennis racket", "size": 1000}, {"name": "Table tennis racket", "size": 1000}]}, {"name": "Balance beam", "size": 1000}, {"name": "Helmet", "children": [{"name": "Bicycle helmet", "size": 1000}, {"name": "Football helmet", "size": 1000}]}, {"name": "Billiard table", "size": 1000}]}, {"name": "Tool", "children": [{"name": "Container", "children": [{"name": "Tin can", "size": 1000}, {"name": "Barrel", "size": 1000}, {"name": "Bottle", "size": 1000}, {"name": "Picnic basket", "size": 1000}, {"name": "Jug", "size": 1000}, {"name": "Waste container", "size": 1000}, {"name": "Beaker", "size": 1000}, {"name": "Flowerpot", "size": 1000}]}, {"name": "Ladder", "size": 1000}, {"name": "Toothbrush", "size": 1000}, {"name": "Screwdriver", "size": 1000}, {"name": "Drill", "size": 1000}, {"name": "Chainsaw", "size": 1000}, {"name": "Wrench", "size": 1000}, {"name": "Flashlight", "size": 1000}, {"name": "Scissors", "size": 1000}, {"name": "Ratchet", "size": 1000}, {"name": "Kitchen utensil", "children": [{"name": "Chopsticks", "size": 1000}, {"name": "Ladle", "size": 1000}, {"name": "Spatula", "size": 1000}, {"name": "Can opener", "size": 1000}, {"name": "Cutting board", "size": 1000}, {"name": "Whisk", "size": 1000}, {"name": "Drinking straw", "size": 1000}, {"name": "Knife", "size": 1000}, {"name": "Bottle opener", "size": 1000}, {"name": "Measuring cup", "size": 1000}, {"name": "Pizza cutter", "size": 1000}, {"name": "Spoon", "size": 1000}, {"name": "Fork", "size": 1000}]}, {"name": "Hammer", "size": 1000}, {"name": "Scale", "size": 1000}, {"name": "Snowplow", "size": 1000}, {"name": "Nail", "size": 1000}, {"name": "Tripod", "size": 1000}, {"name": "Torch", "size": 1000}, {"name": "Chisel", "size": 1000}, {"name": "Axe", "size": 1000}, {"name": "Camera", "size": 1000}, {"name": "Grinder", "size": 1000}, {"name": "Ruler", "size": 1000}, {"name": "Binoculars", "size": 1000}]}, {"name": "Weapon", "children": [{"name": "Bow and arrow", "size": 1000}, {"name": "Cannon", "size": 1000}, {"name": "Dagger", "size": 1000}, {"name": "Knife", "size": 1000}, {"name": "Rifle", "size": 1000}, {"name": "Shotgun", "size": 1000}, {"name": "Tank", "size": 1000}, {"name": "Axe", "size": 1000}, {"name": "Handgun", "size": 1000}, {"name": "Sword", "size": 1000}, {"name": "Missile", "size": 1000}, {"name": "Bomb", "size": 1000}]}, 
{"name": "Electronic device", "children": [{"name": "Cassette deck", "size": 1000}, {"name": "Headphones", "size": 1000}, {"name": "Laptop", "size": 1000}, {"name": "Computer keyboard", "size": 1000}, {"name": "Printer", "size": 1000}, {"name": "Mouse", "size": 1000}, {"name": "Computer monitor", "size": 1000}, {"name": "Ac power plugs and socket-outlets", "size": 1000}, {"name": "Light switch", "size": 1000}, {"name": "Musical keyboard", "size": 1000}, {"name": "Television", "size": 1000}, {"name": "Telephone", "children": [{"name": "Mobile phone", "size": 1000}, {"name": "Corded phone", "size": 1000}]}, {"name": "Tablet computer", "size": 1000}, {"name": "Microphone", "size": 1000}, {"name": "Ipod", "size": 1000}, {"name": "Remote control", "size": 1000}]}]}, {"name": "Drink", "children": [{"name": "Beer", "size": 1000}, {"name": "Cocktail", "size": 1000}, {"name": "Coffee", "size": 1000}, {"name": "Juice", "size": 1000}, {"name": "Tea", "size": 1000}, {"name": "Wine", "size": 1000}]}]} -------------------------------------------------------------------------------- /assets/label-frequencies-total.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/label-frequencies-total.png -------------------------------------------------------------------------------- /assets/label-frequencies-training-set.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/label-frequencies-training-set.png -------------------------------------------------------------------------------- /assets/oid_bbox_examples.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/oid_bbox_examples.png -------------------------------------------------------------------------------- /assets/share-of-correct-annotations-vs-frequency.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/share-of-correct-annotations-vs-frequency.png -------------------------------------------------------------------------------- /assets/v2-bbox_labels_vis_screenshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v2-bbox_labels_vis_screenshot.png -------------------------------------------------------------------------------- /assets/v2-human-label-frequencies-bbox-train.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v2-human-label-frequencies-bbox-train.png -------------------------------------------------------------------------------- /assets/v2-human-label-frequencies-bbox-val-test.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v2-human-label-frequencies-bbox-val-test.png -------------------------------------------------------------------------------- /assets/v2-human-label-frequencies-train.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v2-human-label-frequencies-train.png -------------------------------------------------------------------------------- /assets/v2-human-label-frequencies-val-test.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v2-human-label-frequencies-val-test.png -------------------------------------------------------------------------------- /assets/v3-human-bbox-frequencies-test.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v3-human-bbox-frequencies-test.png -------------------------------------------------------------------------------- /assets/v3-human-bbox-frequencies-train.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v3-human-bbox-frequencies-train.png -------------------------------------------------------------------------------- /assets/v3-human-bbox-frequencies-validation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v3-human-bbox-frequencies-validation.png -------------------------------------------------------------------------------- /assets/v3-human-label-frequencies-test.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v3-human-label-frequencies-test.png -------------------------------------------------------------------------------- /assets/v3-human-label-frequencies-train.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v3-human-label-frequencies-train.png -------------------------------------------------------------------------------- /assets/v3-human-label-frequencies-validation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openimages/dataset/077282972acd0ad8628f1526760ad239a38a8a97/assets/v3-human-label-frequencies-validation.png -------------------------------------------------------------------------------- /bbox_labels_vis.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Flare Dendrogram 5 | 24 | 25 | 26 | 72 | -------------------------------------------------------------------------------- /downloader.py: -------------------------------------------------------------------------------- 1 | # python3 2 | # coding=utf-8 3 | # Copyright 2020 The Google Research Authors. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 
7 | # You may obtain a copy of the License at
8 | #
9 | #     http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 | """Open Images image downloader.
17 |
18 | This script downloads a subset of Open Images images, given a list of image ids.
19 | Typical uses of this tool might be downloading images:
20 | - That contain a certain category.
21 | - That have been annotated with certain types of annotations (e.g. Localized
22 | Narratives, Exhaustively annotated people, etc.)
23 |
24 | The input file IMAGE_LIST should be a text file containing one image per line
25 | with the format <SPLIT>/<IMAGE_ID>, where <SPLIT> is either "train", "test",
26 | "validation", or "challenge2018"; and <IMAGE_ID> is the image ID that uniquely
27 | identifies the image in Open Images. A sample file could be:
28 |   train/f9e0434389a1d4dd
29 |   train/1a007563ebc18664
30 |   test/ea8bfd4e765304db
31 |
32 | """
33 |
34 | import argparse
35 | from concurrent import futures
36 | import os
37 | import re
38 | import sys
39 |
40 | import boto3
41 | import botocore
42 | import tqdm
43 |
44 | BUCKET_NAME = 'open-images-dataset'
45 | REGEX = r'(test|train|validation|challenge2018)/([a-fA-F0-9]*)'
46 |
47 |
48 | def check_and_homogenize_one_image(image):
49 |   split, image_id = re.match(REGEX, image).groups()
50 |   yield split, image_id
51 |
52 |
53 | def check_and_homogenize_image_list(image_list):
54 |   for line_number, image in enumerate(image_list):
55 |     try:
56 |       yield from check_and_homogenize_one_image(image)
57 |     except (ValueError, AttributeError):
58 |       raise ValueError(
59 |           f'ERROR in line {line_number} of the image list.
The following image ' 60 | f'string is not recognized: "{image}".') 61 | 62 | 63 | def read_image_list_file(image_list_file): 64 | with open(image_list_file, 'r') as f: 65 | for line in f: 66 | yield line.strip().replace('.jpg', '') 67 | 68 | 69 | def download_one_image(bucket, split, image_id, download_folder): 70 | try: 71 | bucket.download_file(f'{split}/{image_id}.jpg', 72 | os.path.join(download_folder, f'{image_id}.jpg')) 73 | except botocore.exceptions.ClientError as exception: 74 | sys.exit( 75 | f'ERROR when downloading image `{split}/{image_id}`: {str(exception)}') 76 | 77 | 78 | def download_all_images(args): 79 | """Downloads all images specified in the input file.""" 80 | bucket = boto3.resource( 81 | 's3', config=botocore.config.Config( 82 | signature_version=botocore.UNSIGNED)).Bucket(BUCKET_NAME) 83 | 84 | download_folder = args['download_folder'] or os.getcwd() 85 | 86 | if not os.path.exists(download_folder): 87 | os.makedirs(download_folder) 88 | 89 | try: 90 | image_list = list( 91 | check_and_homogenize_image_list( 92 | read_image_list_file(args['image_list']))) 93 | except ValueError as exception: 94 | sys.exit(exception) 95 | 96 | progress_bar = tqdm.tqdm( 97 | total=len(image_list), desc='Downloading images', leave=True) 98 | with futures.ThreadPoolExecutor( 99 | max_workers=args['num_processes']) as executor: 100 | all_futures = [ 101 | executor.submit(download_one_image, bucket, split, image_id, 102 | download_folder) for (split, image_id) in image_list 103 | ] 104 | for future in futures.as_completed(all_futures): 105 | future.result() 106 | progress_bar.update(1) 107 | progress_bar.close() 108 | 109 | 110 | if __name__ == '__main__': 111 | parser = argparse.ArgumentParser( 112 | description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) 113 | parser.add_argument( 114 | 'image_list', 115 | type=str, 116 | default=None, 117 | help=('Filename that contains the split + image IDs of the images to ' 118 | 'download. Check the document')) 119 | parser.add_argument( 120 | '--num_processes', 121 | type=int, 122 | default=5, 123 | help='Number of parallel processes to use (default is 5).') 124 | parser.add_argument( 125 | '--download_folder', 126 | type=str, 127 | default=None, 128 | help='Folder where to download the images.') 129 | download_all_images(vars(parser.parse_args())) 130 | -------------------------------------------------------------------------------- /tools/classify.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # Copyright 2016 The Open Images Authors. All Rights Reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # ============================================================================== 17 | # 18 | # This script takes an Inception v3 checkpoint, runs the classifier 19 | # on the image and prints top(n) predictions in the human-readable form. 
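#
# Note: this script targets TensorFlow 1.x (it relies on tf.contrib.slim,
# which is not available in TensorFlow 2.x).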
20 | # Example:
21 | # $ wget -O /tmp/cat.jpg https://farm6.staticflickr.com/5470/9372235876_d7d69f1790_b.jpg
22 | # $ ./tools/classify.py /tmp/cat.jpg
23 | # 5723: /m/0jbk - animal (score = 0.94)
24 | # 3473: /m/04rky - mammal (score = 0.93)
25 | # 4605: /m/09686 - vertebrate (score = 0.91)
26 | # 1261: /m/01yrx - cat (score = 0.90)
27 | # 3981: /m/068hy - pet (score = 0.87)
28 | # 841: /m/01l7qd - whiskers (score = 0.83)
29 | # 2430: /m/0307l - cat-like mammal (score = 0.78)
30 | # 4349: /m/07k6w8 - small to medium-sized cats (score = 0.75)
31 | # 2537: /m/035qhg - fauna (score = 0.47)
32 | # 1776: /m/02cqfm - close-up (score = 0.45)
33 | #
34 | # Make sure to download the ANN weights and support data with:
35 | # $ ./tools/download_data.sh
36 |
37 | from __future__ import absolute_import
38 | from __future__ import division
39 | from __future__ import print_function
40 |
41 | import argparse
42 | import math
43 | import sys
44 | import os.path
45 |
46 | import numpy as np
47 | import tensorflow as tf
48 |
49 | from tensorflow.contrib.slim.python.slim.nets import inception
50 | from tensorflow.python.framework import ops
51 | from tensorflow.python.training import saver as tf_saver
52 | from tensorflow.python.training import supervisor
53 |
54 | slim = tf.contrib.slim
55 | FLAGS = None
56 |
57 | def PreprocessImage(image, central_fraction=0.875):
58 |   """Load and preprocess an image.
59 |
60 |   Args:
61 |     image: a tf.string tensor with a JPEG-encoded image.
62 |     central_fraction: do a central crop with the specified
63 |       fraction of image covered.
64 |   Returns:
65 |     An ops.Tensor that produces the preprocessed image.
66 |   """
67 |
68 |   # Decode JPEG data and convert to float.
69 |   image = tf.cast(tf.image.decode_jpeg(image, channels=3), tf.float32)
70 |
71 |   image = tf.image.central_crop(image, central_fraction=central_fraction)
72 |   # Make into a 4D tensor by setting a 'batch size' of 1.
73 |   image = tf.expand_dims(image, [0])
74 |   image = tf.image.resize_bilinear(image,
75 |                                    [FLAGS.image_size, FLAGS.image_size],
76 |                                    align_corners=False)
77 |
78 |   # Center the image about 128.0 (which is done during training) and normalize.
79 |   image = tf.multiply(image, 1.0/127.5)
80 |   return tf.subtract(image, 1.0)
81 |
82 |
83 | def LoadLabelMaps(num_classes, labelmap_path, dict_path):
84 |   """Load index->mid and mid->display name maps.
85 |
86 |   Args:
87 |     labelmap_path: path to the file with the list of mids, describing predictions.
88 |     dict_path: path to the dict.csv that translates from mids to display names.
89 |   Returns:
90 |     labelmap: an index to mid list
91 |     label_dict: mid to display name dictionary
92 |   """
93 |   labelmap = [line.rstrip() for line in tf.gfile.GFile(labelmap_path).readlines()]
94 |   if len(labelmap) != num_classes:
95 |     tf.logging.fatal(
96 |         "Label map loaded from {} contains {} lines while the number of classes is {}".format(
97 |             labelmap_path, len(labelmap), num_classes))
98 |     sys.exit(1)
99 |
100 |   label_dict = {}
101 |   for line in tf.gfile.GFile(dict_path).readlines():
102 |     words = [word.strip(' "\n') for word in line.split(',', 1)]
103 |     label_dict[words[0]] = words[1]
104 |
105 |   return labelmap, label_dict
106 |
107 |
108 | def main(args):
109 |   if not os.path.exists(FLAGS.checkpoint):
110 |     tf.logging.fatal(
111 |         'Checkpoint %s does not exist. Have you downloaded it?
See tools/download_data.sh', 112 | FLAGS.checkpoint) 113 | g = tf.Graph() 114 | with g.as_default(): 115 | input_image = tf.placeholder(tf.string) 116 | processed_image = PreprocessImage(input_image) 117 | 118 | with slim.arg_scope(inception.inception_v3_arg_scope()): 119 | logits, end_points = inception.inception_v3( 120 | processed_image, num_classes=FLAGS.num_classes, is_training=False) 121 | 122 | predictions = end_points['multi_predictions'] = tf.nn.sigmoid( 123 | logits, name='multi_predictions') 124 | saver = tf_saver.Saver() 125 | sess = tf.Session() 126 | saver.restore(sess, FLAGS.checkpoint) 127 | 128 | # Run the evaluation on the images 129 | for image_path in FLAGS.image_path: 130 | if not os.path.exists(image_path): 131 | tf.logging.fatal('Input image does not exist %s', FLAGS.image_path[0]) 132 | img_data = tf.gfile.FastGFile(image_path, "rb").read() 133 | print(image_path) 134 | predictions_eval = np.squeeze(sess.run(predictions, 135 | {input_image: img_data})) 136 | 137 | # Print top(n) results 138 | labelmap, label_dict = LoadLabelMaps(FLAGS.num_classes, FLAGS.labelmap, FLAGS.dict) 139 | 140 | top_k = predictions_eval.argsort()[-FLAGS.n:][::-1] 141 | for idx in top_k: 142 | mid = labelmap[idx] 143 | display_name = label_dict.get(mid, 'unknown') 144 | score = predictions_eval[idx] 145 | print('{}: {} - {} (score = {:.2f})'.format(idx, mid, display_name, score)) 146 | print() 147 | 148 | 149 | if __name__ == '__main__': 150 | parser = argparse.ArgumentParser() 151 | parser.add_argument('--checkpoint', type=str, default='data/2016_08/model.ckpt', 152 | help='Checkpoint to run inference on.') 153 | parser.add_argument('--labelmap', type=str, default='data/2016_08/labelmap.txt', 154 | help='Label map that translates from index to mid.') 155 | parser.add_argument('--dict', type=str, default='dict.csv', 156 | help='Path to a dict.csv that translates from mid to a display name.') 157 | parser.add_argument('--image_size', type=int, default=299, 158 | help='Image size to run inference on.') 159 | parser.add_argument('--num_classes', type=int, default=6012, 160 | help='Number of output classes.') 161 | parser.add_argument('--n', type=int, default=10, 162 | help='Number of top predictions to print.') 163 | parser.add_argument('image_path', nargs='+', default='') 164 | FLAGS = parser.parse_args() 165 | tf.app.run() 166 | -------------------------------------------------------------------------------- /tools/classify_oidv2.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # Copyright 2017 The Open Images Authors. All Rights Reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # ============================================================================== 17 | r"""Classifier inference utility. 18 | 19 | This code takes a resnet_v1_101 checkpoint, runs the classifier on the image and 20 | prints predictions in human-readable form. 
21 |
22 | -------------------------------
23 | Example command:
24 | -------------------------------
25 |
26 | # 0. Create directory for model/data
27 | WORK_PATH="/tmp/oidv2"
28 | mkdir -p "${WORK_PATH}"
29 | cd "${WORK_PATH}"
30 |
31 | # 1. Download the model, inference code, and sample image
32 | wget https://storage.googleapis.com/openimages/2017_07/classes-trainable.txt
33 | wget https://storage.googleapis.com/openimages/2017_07/class-descriptions.csv
34 | wget https://storage.googleapis.com/openimages/2017_07/oidv2-resnet_v1_101.ckpt.tar.gz
35 | wget https://raw.githubusercontent.com/openimages/dataset/master/tools/classify_oidv2.py
36 | tar -xzf oidv2-resnet_v1_101.ckpt.tar.gz
37 |
38 | wget -O cat.jpg https://farm6.staticflickr.com/5470/9372235876_d7d69f1790_b.jpg
39 |
40 | # 2. Run inference
41 | python classify_oidv2.py \
42 | --checkpoint_path='oidv2-resnet_v1_101.ckpt' \
43 | --labelmap='classes-trainable.txt' \
44 | --dict='class-descriptions.csv' \
45 | --image="cat.jpg" \
46 | --top_k=10 \
47 | --score_threshold=0.3
48 |
49 | # Sample output:
50 | Image: "cat.jpg"
51 |
52 | 3272: /m/068hy - Pet (score = 0.96)
53 | 1076: /m/01yrx - Cat (score = 0.95)
54 | 0708: /m/01l7qd - Whiskers (score = 0.90)
55 | 4755: /m/0jbk - Animal (score = 0.90)
56 | 2847: /m/04rky - Mammal (score = 0.89)
57 | 2036: /m/0307l - Felidae (score = 0.79)
58 | 3574: /m/07k6w8 - Small to medium-sized cats (score = 0.77)
59 | 4799: /m/0k0pj - Nose (score = 0.70)
60 | 1495: /m/02cqfm - Close-up (score = 0.55)
61 | 0036: /m/012c9l - Domestic short-haired cat (score = 0.40)
62 |
63 | -------------------------------
64 | Note on image preprocessing:
65 | -------------------------------
66 |
67 | This is the code used to perform preprocessing:
68 | --------
69 | from preprocessing import preprocessing_factory
70 |
71 | def PreprocessImage(image, network='resnet_v1_101', image_size=299):
72 |   # If resolution is larger than 224 we need to adjust some internal resizing
73 |   # parameters for vgg preprocessing.
74 |   if any(network.startswith(x) for x in ['resnet', 'vgg']):
75 |     preprocessing_kwargs = {
76 |         'resize_side_min': int(256 * image_size / 224),
77 |         'resize_side_max': int(512 * image_size / 224)
78 |     }
79 |   else:
80 |     preprocessing_kwargs = {}
81 |   preprocessing_fn = preprocessing_factory.get_preprocessing(
82 |       name=network, is_training=False)
83 |
84 |   height = image_size
85 |   width = image_size
86 |   image = preprocessing_fn(image, height, width, **preprocessing_kwargs)
87 |   image.set_shape([height, width, 3])
88 |   return image
89 | --------
90 |
91 | Note that there appears to be a small difference between the public version
92 | of the slim image preprocessing library and the internal version (on which the
93 | meta graph is based); this yields results that are very close, but not exactly identical, to those of the metagraph.
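(For reference, the `preprocessing` package imported in the snippet above is
the TF-Slim image preprocessing library from the tensorflow/models repository,
under research/slim.)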
95 | """ 96 | 97 | from __future__ import absolute_import 98 | from __future__ import division 99 | from __future__ import print_function 100 | 101 | import tensorflow as tf 102 | 103 | flags = tf.app.flags 104 | FLAGS = flags.FLAGS 105 | 106 | flags.DEFINE_string('labelmap', 'classes-trainable.txt', 107 | 'Labels, one per line.') 108 | 109 | flags.DEFINE_string('dict', 'class-descriptions.csv', 110 | 'Descriptive string for each label.') 111 | 112 | flags.DEFINE_string('checkpoint_path', 'oidv2-resnet_v1_101.ckpt', 113 | 'Path to checkpoint file.') 114 | 115 | flags.DEFINE_string('image', '', 116 | 'Comma separated paths to image files on which to perform ' 117 | 'inference.') 118 | 119 | flags.DEFINE_integer('top_k', 10, 'Maximum number of results to show.') 120 | 121 | flags.DEFINE_float('score_threshold', None, 'Score threshold.') 122 | 123 | 124 | def LoadLabelMap(labelmap_path, dict_path): 125 | """Load index->mid and mid->display name maps. 126 | 127 | Args: 128 | labelmap_path: path to the file with the list of mids, describing 129 | predictions. 130 | dict_path: path to the dict.csv that translates from mids to display names. 131 | Returns: 132 | labelmap: an index to mid list 133 | label_dict: mid to display name dictionary 134 | """ 135 | labelmap = [line.rstrip() for line in tf.gfile.GFile(labelmap_path)] 136 | 137 | label_dict = {} 138 | for line in tf.gfile.GFile(dict_path): 139 | words = [word.strip(' "\n') for word in line.split(',', 1)] 140 | label_dict[words[0]] = words[1] 141 | 142 | return labelmap, label_dict 143 | 144 | 145 | def main(_): 146 | # Load labelmap and dictionary from disk. 147 | labelmap, label_dict = LoadLabelMap(FLAGS.labelmap, FLAGS.dict) 148 | 149 | g = tf.Graph() 150 | with g.as_default(): 151 | with tf.Session() as sess: 152 | saver = tf.train.import_meta_graph(FLAGS.checkpoint_path + '.meta') 153 | saver.restore(sess, FLAGS.checkpoint_path) 154 | 155 | input_values = g.get_tensor_by_name('input_values:0') 156 | predictions = g.get_tensor_by_name('multi_predictions:0') 157 | 158 | for image_filename in FLAGS.image.split(','): 159 | compressed_image = tf.gfile.FastGFile(image_filename, 'rb').read() 160 | predictions_eval = sess.run( 161 | predictions, feed_dict={ 162 | input_values: [compressed_image] 163 | }) 164 | top_k = predictions_eval.argsort()[::-1] # indices sorted by score 165 | if FLAGS.top_k > 0: 166 | top_k = top_k[:FLAGS.top_k] 167 | if FLAGS.score_threshold is not None: 168 | top_k = [i for i in top_k 169 | if predictions_eval[i] >= FLAGS.score_threshold] 170 | print('Image: "%s"\n' % image_filename) 171 | for idx in top_k: 172 | mid = labelmap[idx] 173 | display_name = label_dict[mid] 174 | score = predictions_eval[idx] 175 | print('{:04d}: {} - {} (score = {:.2f})'.format( 176 | idx, mid, display_name, score)) 177 | 178 | 179 | if __name__ == '__main__': 180 | tf.app.run() 181 | -------------------------------------------------------------------------------- /tools/compute_bottleneck.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # Copyright 2016 The Open Images Authors. All Rights Reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 
7 | # You may obtain a copy of the License at
8 | #
9 | #     http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 | # ==============================================================================
17 | #
18 | # This script takes an Inception v3 checkpoint, runs the classifier
19 | # on an image, and prints the values from the bottleneck layer.
20 | # Example:
21 | # $ wget -O /tmp/cat.jpg https://farm6.staticflickr.com/5470/9372235876_d7d69f1790_b.jpg
22 | # $ ./tools/compute_bottleneck.py /tmp/cat.jpg
23 | #
24 | # Make sure to download the network weights and supporting data first with:
25 | # $ ./tools/download_data.sh
26 | 
27 | from __future__ import absolute_import
28 | from __future__ import division
29 | from __future__ import print_function
30 | 
31 | import argparse
32 | import math
33 | import sys
34 | import os.path
35 | 
36 | import numpy as np
37 | import tensorflow as tf
38 | 
39 | from tensorflow.contrib.slim.python.slim.nets import inception
40 | from tensorflow.python.framework import ops
41 | from tensorflow.python.training import saver as tf_saver
42 | from tensorflow.python.training import supervisor
43 | 
44 | slim = tf.contrib.slim
45 | FLAGS = None
46 | 
47 | def PreprocessImage(image_path, central_fraction=0.875):
48 |   """Load and preprocess an image.
49 | 
50 |   Args:
51 |     image_path: path to an image.
52 |     central_fraction: do a central crop covering the specified
53 |       fraction of the image.
54 |   Returns:
55 |     An ops.Tensor that produces the preprocessed image.
56 |   """
57 |   if not os.path.exists(image_path):
58 |     tf.logging.fatal('Input image does not exist: %s', image_path)
59 |   img_data = tf.gfile.FastGFile(image_path, 'rb').read()  # raw JPEG bytes
60 | 
61 |   # Decode JPEG data and convert to float.
62 |   img = tf.cast(tf.image.decode_jpeg(img_data, channels=3), tf.float32)
63 | 
64 |   img = tf.image.central_crop(img, central_fraction=central_fraction)
65 |   # Make into a 4D tensor by setting a 'batch size' of 1.
66 |   img = tf.expand_dims(img, 0)
67 |   img = tf.image.resize_bilinear(img,
68 |                                  [FLAGS.image_size, FLAGS.image_size],
69 |                                  align_corners=False)
70 | 
71 |   # Map pixel values from [0, 255] to [-1, 1], matching the preprocessing
72 |   # used during training.
73 |   img = tf.multiply(img, 1.0 / 127.5)
74 |   return tf.subtract(img, 1.0)
75 | 
76 | 
77 | def main(args):
78 |   if not os.path.exists(FLAGS.checkpoint):
79 |     tf.logging.fatal(
80 |         'Checkpoint %s does not exist. Have you downloaded it? '
81 |         'See tools/download_data.sh',
82 |         FLAGS.checkpoint)
83 |   g = tf.Graph()
84 |   with g.as_default():
85 |     input_image = PreprocessImage(FLAGS.image_path[0])
86 | 
87 |     with slim.arg_scope(inception.inception_v3_arg_scope()):
88 |       logits, end_points = inception.inception_v3(
89 |           input_image, num_classes=FLAGS.num_classes, is_training=False)
90 | 
91 |     bottleneck = end_points['PreLogits']
92 |     init_op = tf.group(tf.global_variables_initializer(),
93 |                        tf.local_variables_initializer(),
94 |                        tf.tables_initializer())
95 |     saver = tf_saver.Saver()
96 |     sess = tf.Session()
97 |     saver.restore(sess, FLAGS.checkpoint)
98 | 
99 |     # Run the network on the image and extract the bottleneck values.
100 |     bottleneck_eval = np.squeeze(sess.run(bottleneck))
101 | 
102 |     # Print the bottleneck values as a single comma-separated line.
103 |     print(','.join('{:.3f}'.format(val) for val in bottleneck_eval))
104 | 
105 | 
106 | if __name__ == '__main__':
107 |   parser = argparse.ArgumentParser()
108 |   parser.add_argument('--checkpoint', type=str, default='data/2016_08/model.ckpt',
109 |                       help='Checkpoint to run inference on.')
110 |   parser.add_argument('--image_size', type=int, default=299,
111 |                       help='Image size to run inference on.')
112 |   parser.add_argument('--num_classes', type=int, default=6012,
113 |                       help='Number of output classes.')
114 |   parser.add_argument('image_path', nargs=1, default='')
115 |   # Pass any arguments argparse does not recognize through to tf.app.run().
116 |   FLAGS, unparsed = parser.parse_known_args()
117 |   tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
118 | 
--------------------------------------------------------------------------------
/tools/download_data.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | #
3 | # Copyright 2016 The Open Images Authors. All Rights Reserved.
4 | #
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | #     http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 | # ==============================================================================
17 | #
18 | # Download and extract the pretrained Inception v3 model.
19 | 
20 | set -ue
21 | 
22 | # Resolve the script's own directory so data/ lands at the repository root.
23 | cd "$(cd -P -- "$(dirname -- "$0")" && pwd -P)"
24 | mkdir -p ../data
25 | cd ../data
26 | 
27 | wget https://storage.googleapis.com/openimages/2016_08/model_2016_08.tar.gz
28 | tar -xzf model_2016_08.tar.gz
--------------------------------------------------------------------------------
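A minimal end-to-end sketch tying the two tools above together, assuming the
archive fetched by download_data.sh extracts to data/2016_08/model.ckpt (the
default --checkpoint path expected by tools/compute_bottleneck.py):

$ ./tools/download_data.sh
$ wget -O /tmp/cat.jpg https://farm6.staticflickr.com/5470/9372235876_d7d69f1790_b.jpg
$ ./tools/compute_bottleneck.py /tmp/cat.jpg
# Prints the 2048 Inception v3 PreLogits values as one comma-separated line.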