├── .gitignore
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── demo.py
├── images
│   ├── loc-narr.gif
│   ├── paper_thumb_1.jpeg
│   ├── paper_thumb_10.jpeg
│   ├── paper_thumb_11.jpeg
│   ├── paper_thumb_12.jpeg
│   ├── paper_thumb_13.jpeg
│   ├── paper_thumb_14.jpeg
│   ├── paper_thumb_2.jpeg
│   ├── paper_thumb_3.jpeg
│   ├── paper_thumb_4.jpeg
│   ├── paper_thumb_5.jpeg
│   ├── paper_thumb_6.jpeg
│   ├── paper_thumb_7.jpeg
│   ├── paper_thumb_8.jpeg
│   └── paper_thumb_9.jpeg
├── index.html
├── localized_narratives.py
├── transcription_example.py
└── web.js
/.gitignore:
--------------------------------------------------------------------------------
1 | speech_api_env
2 | .idea/
3 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # How to Contribute
2 |
3 | We'd love to accept your patches and contributions to this project. There are
4 | just a few small guidelines you need to follow.
5 |
6 | ## Contributor License Agreement
7 |
8 | Contributions to this project must be accompanied by a Contributor License
9 | Agreement. You (or your employer) retain the copyright to your contribution;
10 | this simply gives us permission to use and redistribute your contributions as
11 | part of the project. Head over to <https://cla.developers.google.com/> to see
12 | your current agreements on file or to sign a new one.
13 |
14 | You generally only need to submit a CLA once, so if you've already submitted one
15 | (even if it was for a different project), you probably don't need to do it
16 | again.
17 |
18 | ## Code reviews
19 |
20 | All submissions, including submissions by project members, require review. We
21 | use GitHub pull requests for this purpose. Consult
22 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
23 | information on using pull requests.
24 |
25 | ## Community Guidelines
26 |
27 | This project follows
28 | [Google's Open Source Community Guidelines](https://opensource.google.com/conduct/).
29 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 |
2 | Apache License
3 | Version 2.0, January 2004
4 | http://www.apache.org/licenses/
5 |
6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7 |
8 | 1. Definitions.
9 |
10 | "License" shall mean the terms and conditions for use, reproduction,
11 | and distribution as defined by Sections 1 through 9 of this document.
12 |
13 | "Licensor" shall mean the copyright owner or entity authorized by
14 | the copyright owner that is granting the License.
15 |
16 | "Legal Entity" shall mean the union of the acting entity and all
17 | other entities that control, are controlled by, or are under common
18 | control with that entity. For the purposes of this definition,
19 | "control" means (i) the power, direct or indirect, to cause the
20 | direction or management of such entity, whether by contract or
21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
22 | outstanding shares, or (iii) beneficial ownership of such entity.
23 |
24 | "You" (or "Your") shall mean an individual or Legal Entity
25 | exercising permissions granted by this License.
26 |
27 | "Source" form shall mean the preferred form for making modifications,
28 | including but not limited to software source code, documentation
29 | source, and configuration files.
30 |
31 | "Object" form shall mean any form resulting from mechanical
32 | transformation or translation of a Source form, including but
33 | not limited to compiled object code, generated documentation,
34 | and conversions to other media types.
35 |
36 | "Work" shall mean the work of authorship, whether in Source or
37 | Object form, made available under the License, as indicated by a
38 | copyright notice that is included in or attached to the work
39 | (an example is provided in the Appendix below).
40 |
41 | "Derivative Works" shall mean any work, whether in Source or Object
42 | form, that is based on (or derived from) the Work and for which the
43 | editorial revisions, annotations, elaborations, or other modifications
44 | represent, as a whole, an original work of authorship. For the purposes
45 | of this License, Derivative Works shall not include works that remain
46 | separable from, or merely link (or bind by name) to the interfaces of,
47 | the Work and Derivative Works thereof.
48 |
49 | "Contribution" shall mean any work of authorship, including
50 | the original version of the Work and any modifications or additions
51 | to that Work or Derivative Works thereof, that is intentionally
52 | submitted to Licensor for inclusion in the Work by the copyright owner
53 | or by an individual or Legal Entity authorized to submit on behalf of
54 | the copyright owner. For the purposes of this definition, "submitted"
55 | means any form of electronic, verbal, or written communication sent
56 | to the Licensor or its representatives, including but not limited to
57 | communication on electronic mailing lists, source code control systems,
58 | and issue tracking systems that are managed by, or on behalf of, the
59 | Licensor for the purpose of discussing and improving the Work, but
60 | excluding communication that is conspicuously marked or otherwise
61 | designated in writing by the copyright owner as "Not a Contribution."
62 |
63 | "Contributor" shall mean Licensor and any individual or Legal Entity
64 | on behalf of whom a Contribution has been received by Licensor and
65 | subsequently incorporated within the Work.
66 |
67 | 2. Grant of Copyright License. Subject to the terms and conditions of
68 | this License, each Contributor hereby grants to You a perpetual,
69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70 | copyright license to reproduce, prepare Derivative Works of,
71 | publicly display, publicly perform, sublicense, and distribute the
72 | Work and such Derivative Works in Source or Object form.
73 |
74 | 3. Grant of Patent License. Subject to the terms and conditions of
75 | this License, each Contributor hereby grants to You a perpetual,
76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77 | (except as stated in this section) patent license to make, have made,
78 | use, offer to sell, sell, import, and otherwise transfer the Work,
79 | where such license applies only to those patent claims licensable
80 | by such Contributor that are necessarily infringed by their
81 | Contribution(s) alone or by combination of their Contribution(s)
82 | with the Work to which such Contribution(s) was submitted. If You
83 | institute patent litigation against any entity (including a
84 | cross-claim or counterclaim in a lawsuit) alleging that the Work
85 | or a Contribution incorporated within the Work constitutes direct
86 | or contributory patent infringement, then any patent licenses
87 | granted to You under this License for that Work shall terminate
88 | as of the date such litigation is filed.
89 |
90 | 4. Redistribution. You may reproduce and distribute copies of the
91 | Work or Derivative Works thereof in any medium, with or without
92 | modifications, and in Source or Object form, provided that You
93 | meet the following conditions:
94 |
95 | (a) You must give any other recipients of the Work or
96 | Derivative Works a copy of this License; and
97 |
98 | (b) You must cause any modified files to carry prominent notices
99 | stating that You changed the files; and
100 |
101 | (c) You must retain, in the Source form of any Derivative Works
102 | that You distribute, all copyright, patent, trademark, and
103 | attribution notices from the Source form of the Work,
104 | excluding those notices that do not pertain to any part of
105 | the Derivative Works; and
106 |
107 | (d) If the Work includes a "NOTICE" text file as part of its
108 | distribution, then any Derivative Works that You distribute must
109 | include a readable copy of the attribution notices contained
110 | within such NOTICE file, excluding those notices that do not
111 | pertain to any part of the Derivative Works, in at least one
112 | of the following places: within a NOTICE text file distributed
113 | as part of the Derivative Works; within the Source form or
114 | documentation, if provided along with the Derivative Works; or,
115 | within a display generated by the Derivative Works, if and
116 | wherever such third-party notices normally appear. The contents
117 | of the NOTICE file are for informational purposes only and
118 | do not modify the License. You may add Your own attribution
119 | notices within Derivative Works that You distribute, alongside
120 | or as an addendum to the NOTICE text from the Work, provided
121 | that such additional attribution notices cannot be construed
122 | as modifying the License.
123 |
124 | You may add Your own copyright statement to Your modifications and
125 | may provide additional or different license terms and conditions
126 | for use, reproduction, or distribution of Your modifications, or
127 | for any such Derivative Works as a whole, provided Your use,
128 | reproduction, and distribution of the Work otherwise complies with
129 | the conditions stated in this License.
130 |
131 | 5. Submission of Contributions. Unless You explicitly state otherwise,
132 | any Contribution intentionally submitted for inclusion in the Work
133 | by You to the Licensor shall be under the terms and conditions of
134 | this License, without any additional terms or conditions.
135 | Notwithstanding the above, nothing herein shall supersede or modify
136 | the terms of any separate license agreement you may have executed
137 | with Licensor regarding such Contributions.
138 |
139 | 6. Trademarks. This License does not grant permission to use the trade
140 | names, trademarks, service marks, or product names of the Licensor,
141 | except as required for reasonable and customary use in describing the
142 | origin of the Work and reproducing the content of the NOTICE file.
143 |
144 | 7. Disclaimer of Warranty. Unless required by applicable law or
145 | agreed to in writing, Licensor provides the Work (and each
146 | Contributor provides its Contributions) on an "AS IS" BASIS,
147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148 | implied, including, without limitation, any warranties or conditions
149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150 | PARTICULAR PURPOSE. You are solely responsible for determining the
151 | appropriateness of using or redistributing the Work and assume any
152 | risks associated with Your exercise of permissions under this License.
153 |
154 | 8. Limitation of Liability. In no event and under no legal theory,
155 | whether in tort (including negligence), contract, or otherwise,
156 | unless required by applicable law (such as deliberate and grossly
157 | negligent acts) or agreed to in writing, shall any Contributor be
158 | liable to You for damages, including any direct, indirect, special,
159 | incidental, or consequential damages of any character arising as a
160 | result of this License or out of the use or inability to use the
161 | Work (including but not limited to damages for loss of goodwill,
162 | work stoppage, computer failure or malfunction, or any and all
163 | other commercial damages or losses), even if such Contributor
164 | has been advised of the possibility of such damages.
165 |
166 | 9. Accepting Warranty or Additional Liability. While redistributing
167 | the Work or Derivative Works thereof, You may choose to offer,
168 | and charge a fee for, acceptance of support, warranty, indemnity,
169 | or other liability obligations and/or rights consistent with this
170 | License. However, in accepting such obligations, You may act only
171 | on Your own behalf and on Your sole responsibility, not on behalf
172 | of any other Contributor, and only if You agree to indemnify,
173 | defend, and hold each Contributor harmless for any liability
174 | incurred by, or claims asserted against, such Contributor by reason
175 | of your accepting any such warranty or additional liability.
176 |
177 | END OF TERMS AND CONDITIONS
178 |
179 | APPENDIX: How to apply the Apache License to your work.
180 |
181 | To apply the Apache License to your work, attach the following
182 | boilerplate notice, with the fields enclosed by brackets "[]"
183 | replaced with your own identifying information. (Don't include
184 | the brackets!) The text should be enclosed in the appropriate
185 | comment syntax for the file format. We also recommend that a
186 | file or class name and description of purpose be included on the
187 | same "printed page" as the copyright notice for easier
188 | identification within third-party archives.
189 |
190 | Copyright [yyyy] [name of copyright owner]
191 |
192 | Licensed under the Apache License, Version 2.0 (the "License");
193 | you may not use this file except in compliance with the License.
194 | You may obtain a copy of the License at
195 |
196 | http://www.apache.org/licenses/LICENSE-2.0
197 |
198 | Unless required by applicable law or agreed to in writing, software
199 | distributed under the License is distributed on an "AS IS" BASIS,
200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 | See the License for the specific language governing permissions and
202 | limitations under the License.
203 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Localized Narratives
2 | Visit the [project page](https://google.github.io/localized-narratives) for all the information about Localized Narratives, data downloads, visualizations, and much more.
3 |
--------------------------------------------------------------------------------
/demo.py:
--------------------------------------------------------------------------------
1 | # python3
2 | # coding=utf-8
3 | # Copyright 2020 The Google Research Authors.
4 | #
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | # http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 | """Demo usage of the Localized Narratives data loader."""
17 | import localized_narratives
18 |
19 | # This folder is where you would like to download the annotation files to and
20 | # where to read them from.
21 | local_dir = '/path/to/downloaded/data'
22 |
23 | # The DataLoader class allows us to download the data and read it from file.
24 | data_loader = localized_narratives.DataLoader(local_dir)
25 |
26 | # Downloads the annotation files (it first checks if they are not downloaded).
27 | data_loader.download_annotations('coco_val')
28 |
29 | # Iterates through all or a limited number of (e.g. 1 in this case) annotations
30 | # for all files found in the local folder for a given dataset and split. E.g.
31 | # for `open_images_train` it will read only one shard if only one file was
32 | # downloaded manually.
33 | loc_narr = next(data_loader.load_annotations('coco_val', 1))
34 |
35 | print(f'\nLocalized Narrative sample:\n{loc_narr}')
36 |
37 | print(f'\nVoice recording URL:\n {loc_narr.voice_recording_url}\n')
38 |
--------------------------------------------------------------------------------
/images/loc-narr.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/loc-narr.gif
--------------------------------------------------------------------------------
/images/paper_thumb_1.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_1.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_10.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_10.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_11.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_11.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_12.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_12.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_13.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_13.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_14.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_14.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_2.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_2.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_3.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_3.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_4.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_4.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_5.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_5.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_6.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_6.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_7.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_7.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_8.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_8.jpeg
--------------------------------------------------------------------------------
/images/paper_thumb_9.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/localized-narratives/5c5b3031bc6feb1b453410b8cedece4541cf6e7c/images/paper_thumb_9.jpeg
--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
1 |
Connecting Vision and Language with
598 | Localized Narratives
599 | Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, and Vittorio Ferrari
600 | ECCV (Spotlight), 2020
601 | [PDF] [BibTeX] [1'30'' video] [10' video]
602 |
603 |
@inproceedings{PontTuset_eccv2020,
604 | author = {Jordi Pont-Tuset and Jasper Uijlings and Soravit Changpinyo and Radu Soricut and Vittorio Ferrari},
605 | title = {Connecting Vision and Language with Localized Narratives},
606 | booktitle = {ECCV},
607 | year = {2020}
608 | }
609 |
610 |
611 |
Abstract
612 |
613 | We propose Localized Narratives, a new form of multimodal image annotations connecting vision and
614 | language. We ask annotators to describe an image with their voice while simultaneously hovering
615 | their mouse over the region they are describing. Since the voice and the mouse pointer are
616 | synchronized, we can localize every single word in the description. This dense visual grounding
617 | takes the form of a mouse trace segment per word and is unique to our data. We annotated 849k
618 | images with Localized Narratives: the whole COCO, Flickr30k, and ADE20K datasets, and 671k
619 | images of Open Images, all of which we make publicly available. We provide an extensive
620 | analysis of these annotations showing they are diverse, accurate, and efficient to produce.
621 | We also demonstrate their utility on the application of controlled image captioning.
622 |
623 |
624 |
Explore Localized Narratives
625 |
626 | Explore some images and play the Localized Narrative annotation: synchronized voice, caption,
627 | and mouse trace. Don't forget to turn the sound on!
628 |
629 |
630 |
631 |
632 |
637 |
638 |
License
639 |
640 | All the annotations available through this website are released under a CC BY 4.0 license.
641 | You are free to redistribute and modify the annotations, but we ask you to please keep the original attribution to our paper.
642 |
643 |
644 |
Code
645 |
646 |
647 |
Python Data Loader and Helpers
648 |
649 | Visit the GitHub repository
650 | to view the code to download and work with Localized
651 | Narratives. Here is the documentation
652 | about the file formats used.
653 | Alternatively, you can manually download the data below.
654 |
655 |
From Traces to Boxes
656 |
657 | This colab
658 | demonstrates how we get from a trace segment to a bounding box.
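The colab's exact procedure is not reproduced here. As a rough illustration only, one simple way to turn trace points into a box is to take the tight axis-aligned box around them, clipped to the image. The function name `tight_box` and the clipping choice are assumptions of this sketch; the points are assumed to be `{x, y, t}` dictionaries in normalized image coordinates, as in the file format description.

```python
from typing import Dict, List, Tuple


def tight_box(points: List[Dict[str, float]]) -> Tuple[float, float, float, float]:
    """Returns (x_min, y_min, x_max, y_max) in normalized image coordinates.

    Hypothetical helper: a tight axis-aligned box, not necessarily the
    procedure used in the colab.
    """
    if not points:
        raise ValueError('A box needs at least one trace point.')

    def clip(value: float) -> float:
        # Trace coordinates may fall slightly outside the image, so clamp
        # them to the [0, 1] range before taking the extremes.
        return min(1.0, max(0.0, value))

    xs = [clip(p['x']) for p in points]
    ys = [clip(p['y']) for p in points]
    return min(xs), min(ys), max(xs), max(ys)
```

Multiply the result by the image width and height to get pixel coordinates.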
659 |
660 |
661 |
662 |
663 |
Downloads
664 |
665 |
666 |
Full Localized Narratives
667 |
668 | Here you can download the full set of Localized Narratives (format description).
669 | Large files are split into shards (a list of them will appear when you click
670 | below).
671 | In parentheses, the number of Localized Narratives in each split. Please note that some images have more than one Localized Narrative annotation, e.g. 5k images in COCO are annotated 5 times.
672 |
673 |
674 |
675 |
676 |
File formats
677 |
The annotations are in JSON Lines format,
678 | that is, each line of the file is an independent valid JSON-encoded object. The largest
679 | files are split into smaller sub-files (shards) for ease of download. Since each
680 | line of the file is independent, the whole file can be reconstructed by simply
681 | concatenating the contents of the shards.
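Because every line is an independent JSON object, the shards can also be read one after another as if they were a single file, without concatenating them on disk. A minimal sketch, assuming the shards have already been downloaded locally; the function name `iter_jsonl_shards` is hypothetical and not part of the released loader:

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_jsonl_shards(shard_paths: Iterable[Path]) -> Iterator[dict]:
    """Yields one decoded annotation per line, across all shards in order."""
    for path in shard_paths:
        with open(path, encoding='utf-8') as shard:
            for line in shard:
                line = line.strip()
                if line:  # Skip blank lines, e.g. a trailing newline.
                    yield json.loads(line)
```

Passing the shard paths in sorted order gives the same sequence of annotations as reading the concatenated file.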
682 |
Each line represents one Localized Narrative annotation on one image by one annotator
683 | and has the following fields:
684 |
685 |
dataset_id String identifying the dataset and split where
686 | the image belongs, e.g. mscoco_val2017.
687 |
image_id String identifier of the image, as specified on
688 | each dataset.
689 |
annotator_id Integer number uniquely identifying each annotator.
690 |
caption Image caption as a string of characters.
691 |
timed_caption List of timed utterances, i.e. {utterance, start_time, end_time}
692 | where utterance is a word (or group of words) and (start_time, end_time) is
693 | the time during which it was spoken, with respect to the start of the recording.
694 |
traces List of trace segments, one between each time the mouse
695 | pointer enters the image and goes away from it. Each trace segment is represented as a list
696 | of timed points, i.e. {x, y, t}, where x and y are the normalized
697 | image coordinates (with origin at the top-left corner of the image) and t is
698 | the time in seconds since the start of the recording.
699 | Please note that the coordinates can go a bit beyond the image, i.e. <0
700 | or >1, as we recorded the mouse traces including a small band around the image.
701 |
voice_recording Relative URL path with respect to
702 | https://storage.googleapis.com/localized-narratives/voice-recordings
703 | where to find the voice recording (in
704 | OGG format) for that particular image.
705 |
706 |
Below is a sample of one Localized Narrative in this format:
707 |
{
708 | dataset_id: 'mscoco_val2017',
709 | image_id: '137576',
710 | annotator_id: 93,
711 | caption: 'In this image there are group of cows standing and eating th...',
712 | timed_caption: [{'utterance': 'In this', 'start_time': 0.0, 'end_time': 0.4}, ...],
713 | traces: [[{'x': 0.2086, 'y': -0.0533, 't': 0.022}, ...], ...],
714 | voice_recording: 'coco_val/coco_val_137576_93.ogg'
715 | }
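As a sketch of how these fields fit together, the snippet below builds the absolute voice recording URL and collects the trace points recorded while one utterance was spoken. The helper names and the toy record are illustrative assumptions, not part of the released code, and the plain time-window filter may differ from the authors' own processing.

```python
VOICE_BASE_URL = (
    'https://storage.googleapis.com/localized-narratives/voice-recordings')


def voice_recording_url(annotation: dict) -> str:
    """Joins the relative `voice_recording` path to the base URL."""
    return f"{VOICE_BASE_URL}/{annotation['voice_recording']}"


def points_during(annotation: dict, utterance_index: int) -> list:
    """Returns the trace points whose time `t` falls inside one utterance."""
    utterance = annotation['timed_caption'][utterance_index]
    start, end = utterance['start_time'], utterance['end_time']
    return [
        point
        for segment in annotation['traces']
        for point in segment
        if start <= point['t'] <= end
    ]


# Toy record with made-up values, following the field layout described above.
example = {
    'timed_caption': [
        {'utterance': 'In this', 'start_time': 0.0, 'end_time': 0.4},
        {'utterance': 'image', 'start_time': 0.4, 'end_time': 0.9},
    ],
    'traces': [[
        {'x': 0.21, 'y': -0.05, 't': 0.02},
        {'x': 0.25, 'y': 0.10, 't': 0.30},
        {'x': 0.40, 'y': 0.20, 't': 0.70},
    ]],
    'voice_recording': 'coco_val/coco_val_137576_93.ogg',
}

print(voice_recording_url(example))
print(points_during(example, 0))
```

Both times are measured in seconds from the start of the recording, so utterances and trace points can be compared directly.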
892 | To facilitate download, below are the annotations on the same images as above but containing
893 | only the textual caption, in case you are only interested in this part of Localized
894 | Narratives.
895 |
Below you can download the automatic speech-to-text
983 | transcriptions from the voice recordings.
984 | The format is a list of text chunks, each of which is a list of ten alternatives, each with its own confidence.
985 | Please note: the final caption text of Localized Narratives is given manually by the annotators.
986 | The automatic transcriptions below are only used to temporally align the manual transcription to the mouse traces.
987 | The timestamps used for this, though, were not stored, so the alignment process cannot be reproduced.
988 | To have some timestamps, you'd need to re-run Google's speech-to-text transcription
989 | (here is the code we used).
990 | Given that the API is constantly evolving, though, the transcription will likely not match the one stored below.
991 |