├── LICENSE.md
├── README.md
├── evaluation
│   ├── main.py
│   └── semantic_evaluation.py
├── media
│   └── figs
│       ├── ann_viz_rgb.jpg
│       ├── ddad_sensors.png
│       ├── ddad_viz.gif
│       ├── hq_viz_rgb.jpg
│       ├── notebook.png
│       ├── odaiba_viz_rgb.jpg
│       ├── pano1.png
│       ├── pano2.png
│       ├── pano3.png
│       └── tri-logo.png
└── notebooks
    └── DDAD.ipynb
/LICENSE.md:
--------------------------------------------------------------------------------
1 | # Copyright 2020 Toyota Research Institute. All rights reserved. https://github.com/TRI-ML/DDAD
2 |
3 | This work is licensed under a
4 | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
5 |
6 | You should have received a copy of the license along with this
7 | work. If not, see <http://creativecommons.org/licenses/by-nc-sa/4.0/>.
8 |
9 | =======================================================================
10 |
11 | Attribution-NonCommercial-ShareAlike 4.0 International
12 |
13 | =======================================================================
14 |
15 | Creative Commons Corporation ("Creative Commons") is not a law firm and
16 | does not provide legal services or legal advice. Distribution of
17 | Creative Commons public licenses does not create a lawyer-client or
18 | other relationship. Creative Commons makes its licenses and related
19 | information available on an "as-is" basis. Creative Commons gives no
20 | warranties regarding its licenses, any material licensed under their
21 | terms and conditions, or any related information. Creative Commons
22 | disclaims all liability for damages resulting from their use to the
23 | fullest extent possible.
24 |
25 | Using Creative Commons Public Licenses
26 |
27 | Creative Commons public licenses provide a standard set of terms and
28 | conditions that creators and other rights holders may use to share
29 | original works of authorship and other material subject to copyright
30 | and certain other rights specified in the public license below. The
31 | following considerations are for informational purposes only, are not
32 | exhaustive, and do not form part of our licenses.
33 |
34 | Considerations for licensors: Our public licenses are
35 | intended for use by those authorized to give the public
36 | permission to use material in ways otherwise restricted by
37 | copyright and certain other rights. Our licenses are
38 | irrevocable. Licensors should read and understand the terms
39 | and conditions of the license they choose before applying it.
40 | Licensors should also secure all rights necessary before
41 | applying our licenses so that the public can reuse the
42 | material as expected. Licensors should clearly mark any
43 | material not subject to the license. This includes other CC-
44 | licensed material, or material used under an exception or
45 | limitation to copyright. More considerations for licensors:
46 | wiki.creativecommons.org/Considerations_for_licensors
47 |
48 | Considerations for the public: By using one of our public
49 | licenses, a licensor grants the public permission to use the
50 | licensed material under specified terms and conditions. If
51 | the licensor's permission is not necessary for any reason--for
52 | example, because of any applicable exception or limitation to
53 | copyright--then that use is not regulated by the license. Our
54 | licenses grant only permissions under copyright and certain
55 | other rights that a licensor has authority to grant. Use of
56 | the licensed material may still be restricted for other
57 | reasons, including because others have copyright or other
58 | rights in the material. A licensor may make special requests,
59 | such as asking that all changes be marked or described.
60 | Although not required by our licenses, you are encouraged to
61 | respect those requests where reasonable. More considerations
62 | for the public:
63 | wiki.creativecommons.org/Considerations_for_licensees
64 |
65 | =======================================================================
66 |
67 | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
68 | Public License
69 |
70 | By exercising the Licensed Rights (defined below), You accept and agree
71 | to be bound by the terms and conditions of this Creative Commons
72 | Attribution-NonCommercial-ShareAlike 4.0 International Public License
73 | ("Public License"). To the extent this Public License may be
74 | interpreted as a contract, You are granted the Licensed Rights in
75 | consideration of Your acceptance of these terms and conditions, and the
76 | Licensor grants You such rights in consideration of benefits the
77 | Licensor receives from making the Licensed Material available under
78 | these terms and conditions.
79 |
80 |
81 | Section 1 -- Definitions.
82 |
83 | a. Adapted Material means material subject to Copyright and Similar
84 | Rights that is derived from or based upon the Licensed Material
85 | and in which the Licensed Material is translated, altered,
86 | arranged, transformed, or otherwise modified in a manner requiring
87 | permission under the Copyright and Similar Rights held by the
88 | Licensor. For purposes of this Public License, where the Licensed
89 | Material is a musical work, performance, or sound recording,
90 | Adapted Material is always produced where the Licensed Material is
91 | synched in timed relation with a moving image.
92 |
93 | b. Adapter's License means the license You apply to Your Copyright
94 | and Similar Rights in Your contributions to Adapted Material in
95 | accordance with the terms and conditions of this Public License.
96 |
97 | c. BY-NC-SA Compatible License means a license listed at
98 | creativecommons.org/compatiblelicenses, approved by Creative
99 | Commons as essentially the equivalent of this Public License.
100 |
101 | d. Copyright and Similar Rights means copyright and/or similar rights
102 | closely related to copyright including, without limitation,
103 | performance, broadcast, sound recording, and Sui Generis Database
104 | Rights, without regard to how the rights are labeled or
105 | categorized. For purposes of this Public License, the rights
106 | specified in Section 2(b)(1)-(2) are not Copyright and Similar
107 | Rights.
108 |
109 | e. Effective Technological Measures means those measures that, in the
110 | absence of proper authority, may not be circumvented under laws
111 | fulfilling obligations under Article 11 of the WIPO Copyright
112 | Treaty adopted on December 20, 1996, and/or similar international
113 | agreements.
114 |
115 | f. Exceptions and Limitations means fair use, fair dealing, and/or
116 | any other exception or limitation to Copyright and Similar Rights
117 | that applies to Your use of the Licensed Material.
118 |
119 | g. License Elements means the license attributes listed in the name
120 | of a Creative Commons Public License. The License Elements of this
121 | Public License are Attribution, NonCommercial, and ShareAlike.
122 |
123 | h. Licensed Material means the artistic or literary work, database,
124 | or other material to which the Licensor applied this Public
125 | License.
126 |
127 | i. Licensed Rights means the rights granted to You subject to the
128 | terms and conditions of this Public License, which are limited to
129 | all Copyright and Similar Rights that apply to Your use of the
130 | Licensed Material and that the Licensor has authority to license.
131 |
132 | j. Licensor means the individual(s) or entity(ies) granting rights
133 | under this Public License.
134 |
135 | k. NonCommercial means not primarily intended for or directed towards
136 | commercial advantage or monetary compensation. For purposes of
137 | this Public License, the exchange of the Licensed Material for
138 | other material subject to Copyright and Similar Rights by digital
139 | file-sharing or similar means is NonCommercial provided there is
140 | no payment of monetary compensation in connection with the
141 | exchange.
142 |
143 | l. Share means to provide material to the public by any means or
144 | process that requires permission under the Licensed Rights, such
145 | as reproduction, public display, public performance, distribution,
146 | dissemination, communication, or importation, and to make material
147 | available to the public including in ways that members of the
148 | public may access the material from a place and at a time
149 | individually chosen by them.
150 |
151 | m. Sui Generis Database Rights means rights other than copyright
152 | resulting from Directive 96/9/EC of the European Parliament and of
153 | the Council of 11 March 1996 on the legal protection of databases,
154 | as amended and/or succeeded, as well as other essentially
155 | equivalent rights anywhere in the world.
156 |
157 | n. You means the individual or entity exercising the Licensed Rights
158 | under this Public License. Your has a corresponding meaning.
159 |
160 |
161 | Section 2 -- Scope.
162 |
163 | a. License grant.
164 |
165 | 1. Subject to the terms and conditions of this Public License,
166 | the Licensor hereby grants You a worldwide, royalty-free,
167 | non-sublicensable, non-exclusive, irrevocable license to
168 | exercise the Licensed Rights in the Licensed Material to:
169 |
170 | a. reproduce and Share the Licensed Material, in whole or
171 | in part, for NonCommercial purposes only; and
172 |
173 | b. produce, reproduce, and Share Adapted Material for
174 | NonCommercial purposes only.
175 |
176 | 2. Exceptions and Limitations. For the avoidance of doubt, where
177 | Exceptions and Limitations apply to Your use, this Public
178 | License does not apply, and You do not need to comply with
179 | its terms and conditions.
180 |
181 | 3. Term. The term of this Public License is specified in Section
182 | 6(a).
183 |
184 | 4. Media and formats; technical modifications allowed. The
185 | Licensor authorizes You to exercise the Licensed Rights in
186 | all media and formats whether now known or hereafter created,
187 | and to make technical modifications necessary to do so. The
188 | Licensor waives and/or agrees not to assert any right or
189 | authority to forbid You from making technical modifications
190 | necessary to exercise the Licensed Rights, including
191 | technical modifications necessary to circumvent Effective
192 | Technological Measures. For purposes of this Public License,
193 | simply making modifications authorized by this Section 2(a)
194 | (4) never produces Adapted Material.
195 |
196 | 5. Downstream recipients.
197 |
198 | a. Offer from the Licensor -- Licensed Material. Every
199 | recipient of the Licensed Material automatically
200 | receives an offer from the Licensor to exercise the
201 | Licensed Rights under the terms and conditions of this
202 | Public License.
203 |
204 | b. Additional offer from the Licensor -- Adapted Material.
205 | Every recipient of Adapted Material from You
206 | automatically receives an offer from the Licensor to
207 | exercise the Licensed Rights in the Adapted Material
208 | under the conditions of the Adapter's License You apply.
209 |
210 | c. No downstream restrictions. You may not offer or impose
211 | any additional or different terms or conditions on, or
212 | apply any Effective Technological Measures to, the
213 | Licensed Material if doing so restricts exercise of the
214 | Licensed Rights by any recipient of the Licensed
215 | Material.
216 |
217 | 6. No endorsement. Nothing in this Public License constitutes or
218 | may be construed as permission to assert or imply that You
219 | are, or that Your use of the Licensed Material is, connected
220 | with, or sponsored, endorsed, or granted official status by,
221 | the Licensor or others designated to receive attribution as
222 | provided in Section 3(a)(1)(A)(i).
223 |
224 | b. Other rights.
225 |
226 | 1. Moral rights, such as the right of integrity, are not
227 | licensed under this Public License, nor are publicity,
228 | privacy, and/or other similar personality rights; however, to
229 | the extent possible, the Licensor waives and/or agrees not to
230 | assert any such rights held by the Licensor to the limited
231 | extent necessary to allow You to exercise the Licensed
232 | Rights, but not otherwise.
233 |
234 | 2. Patent and trademark rights are not licensed under this
235 | Public License.
236 |
237 | 3. To the extent possible, the Licensor waives any right to
238 | collect royalties from You for the exercise of the Licensed
239 | Rights, whether directly or through a collecting society
240 | under any voluntary or waivable statutory or compulsory
241 | licensing scheme. In all other cases the Licensor expressly
242 | reserves any right to collect such royalties, including when
243 | the Licensed Material is used other than for NonCommercial
244 | purposes.
245 |
246 |
247 | Section 3 -- License Conditions.
248 |
249 | Your exercise of the Licensed Rights is expressly made subject to the
250 | following conditions.
251 |
252 | a. Attribution.
253 |
254 | 1. If You Share the Licensed Material (including in modified
255 | form), You must:
256 |
257 | a. retain the following if it is supplied by the Licensor
258 | with the Licensed Material:
259 |
260 | i. identification of the creator(s) of the Licensed
261 | Material and any others designated to receive
262 | attribution, in any reasonable manner requested by
263 | the Licensor (including by pseudonym if
264 | designated);
265 |
266 | ii. a copyright notice;
267 |
268 | iii. a notice that refers to this Public License;
269 |
270 | iv. a notice that refers to the disclaimer of
271 | warranties;
272 |
273 | v. a URI or hyperlink to the Licensed Material to the
274 | extent reasonably practicable;
275 |
276 | b. indicate if You modified the Licensed Material and
277 | retain an indication of any previous modifications; and
278 |
279 | c. indicate the Licensed Material is licensed under this
280 | Public License, and include the text of, or the URI or
281 | hyperlink to, this Public License.
282 |
283 | 2. You may satisfy the conditions in Section 3(a)(1) in any
284 | reasonable manner based on the medium, means, and context in
285 | which You Share the Licensed Material. For example, it may be
286 | reasonable to satisfy the conditions by providing a URI or
287 | hyperlink to a resource that includes the required
288 | information.
289 | 3. If requested by the Licensor, You must remove any of the
290 | information required by Section 3(a)(1)(A) to the extent
291 | reasonably practicable.
292 |
293 | b. ShareAlike.
294 |
295 | In addition to the conditions in Section 3(a), if You Share
296 | Adapted Material You produce, the following conditions also apply.
297 |
298 | 1. The Adapter's License You apply must be a Creative Commons
299 | license with the same License Elements, this version or
300 | later, or a BY-NC-SA Compatible License.
301 |
302 | 2. You must include the text of, or the URI or hyperlink to, the
303 | Adapter's License You apply. You may satisfy this condition
304 | in any reasonable manner based on the medium, means, and
305 | context in which You Share Adapted Material.
306 |
307 | 3. You may not offer or impose any additional or different terms
308 | or conditions on, or apply any Effective Technological
309 | Measures to, Adapted Material that restrict exercise of the
310 | rights granted under the Adapter's License You apply.
311 |
312 |
313 | Section 4 -- Sui Generis Database Rights.
314 |
315 | Where the Licensed Rights include Sui Generis Database Rights that
316 | apply to Your use of the Licensed Material:
317 |
318 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right
319 | to extract, reuse, reproduce, and Share all or a substantial
320 | portion of the contents of the database for NonCommercial purposes
321 | only;
322 |
323 | b. if You include all or a substantial portion of the database
324 | contents in a database in which You have Sui Generis Database
325 | Rights, then the database in which You have Sui Generis Database
326 | Rights (but not its individual contents) is Adapted Material,
327 | including for purposes of Section 3(b); and
328 |
329 | c. You must comply with the conditions in Section 3(a) if You Share
330 | all or a substantial portion of the contents of the database.
331 |
332 | For the avoidance of doubt, this Section 4 supplements and does not
333 | replace Your obligations under this Public License where the Licensed
334 | Rights include other Copyright and Similar Rights.
335 |
336 |
337 | Section 5 -- Disclaimer of Warranties and Limitation of Liability.
338 |
339 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
340 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
341 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
342 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
343 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
344 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
345 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
346 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
347 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
348 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
349 |
350 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
351 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
352 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
353 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
354 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
355 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
356 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
357 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
358 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
359 |
360 | c. The disclaimer of warranties and limitation of liability provided
361 | above shall be interpreted in a manner that, to the extent
362 | possible, most closely approximates an absolute disclaimer and
363 | waiver of all liability.
364 |
365 |
366 | Section 6 -- Term and Termination.
367 |
368 | a. This Public License applies for the term of the Copyright and
369 | Similar Rights licensed here. However, if You fail to comply with
370 | this Public License, then Your rights under this Public License
371 | terminate automatically.
372 |
373 | b. Where Your right to use the Licensed Material has terminated under
374 | Section 6(a), it reinstates:
375 |
376 | 1. automatically as of the date the violation is cured, provided
377 | it is cured within 30 days of Your discovery of the
378 | violation; or
379 |
380 | 2. upon express reinstatement by the Licensor.
381 |
382 | For the avoidance of doubt, this Section 6(b) does not affect any
383 | right the Licensor may have to seek remedies for Your violations
384 | of this Public License.
385 |
386 | c. For the avoidance of doubt, the Licensor may also offer the
387 | Licensed Material under separate terms or conditions or stop
388 | distributing the Licensed Material at any time; however, doing so
389 | will not terminate this Public License.
390 |
391 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
392 | License.
393 |
394 |
395 | Section 7 -- Other Terms and Conditions.
396 |
397 | a. The Licensor shall not be bound by any additional or different
398 | terms or conditions communicated by You unless expressly agreed.
399 |
400 | b. Any arrangements, understandings, or agreements regarding the
401 | Licensed Material not stated herein are separate from and
402 | independent of the terms and conditions of this Public License.
403 |
404 |
405 | Section 8 -- Interpretation.
406 |
407 | a. For the avoidance of doubt, this Public License does not, and
408 | shall not be interpreted to, reduce, limit, restrict, or impose
409 | conditions on any use of the Licensed Material that could lawfully
410 | be made without permission under this Public License.
411 |
412 | b. To the extent possible, if any provision of this Public License is
413 | deemed unenforceable, it shall be automatically reformed to the
414 | minimum extent necessary to make it enforceable. If the provision
415 | cannot be reformed, it shall be severed from this Public License
416 | without affecting the enforceability of the remaining terms and
417 | conditions.
418 |
419 | c. No term or condition of this Public License will be waived and no
420 | failure to comply consented to unless expressly agreed to by the
421 | Licensor.
422 |
423 | d. Nothing in this Public License constitutes or may be interpreted
424 | as a limitation upon, or waiver of, any privileges and immunities
425 | that apply to the Licensor or You, including from the legal
426 | processes of any jurisdiction or authority.
427 |
428 | =======================================================================
429 |
430 | Creative Commons is not a party to its public
431 | licenses. Notwithstanding, Creative Commons may elect to apply one of
432 | its public licenses to material it publishes and in those instances
433 | will be considered the “Licensor.” The text of the Creative Commons
434 | public licenses is dedicated to the public domain under the CC0 Public
435 | Domain Dedication. Except for the limited purpose of indicating that
436 | material is shared under a Creative Commons public license or as
437 | otherwise permitted by the Creative Commons policies published at
438 | creativecommons.org/policies, Creative Commons does not authorize the
439 | use of the trademark "Creative Commons" or any other trademark or logo
440 | of Creative Commons without its prior written consent including,
441 | without limitation, in connection with any unauthorized modifications
442 | to any of its public licenses or any other arrangements,
443 | understandings, or agreements concerning use of licensed material. For
444 | the avoidance of doubt, this paragraph does not form part of the
445 | public licenses.
446 |
447 | Creative Commons may be contacted at creativecommons.org.
448 |
449 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # DDAD - Dense Depth for Autonomous Driving
2 |
3 |
4 |
5 |
6 |
7 | - [DDAD depth challenge](#ddad-depth-challenge)
8 | - [How to Use](#how-to-use)
9 | - [Dataset details](#dataset-details)
10 | - [Dataset stats](#dataset-stats)
11 | - [Sensor placement](#sensor-placement)
12 | - [Evaluation metrics](#evaluation-metrics)
13 | - [IPython notebook](#ipython-notebook)
14 | - [References](#references)
15 | - [Privacy](#privacy)
16 | - [License](#license)
17 |
18 | DDAD is a new autonomous driving benchmark from TRI (Toyota Research Institute) for long range (up to 250m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth (across a full 360 degree field of view) generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting. DDAD contains scenes from urban settings in the United States (San Francisco, Bay Area, Cambridge, Detroit, Ann Arbor) and Japan (Tokyo, Odaiba).
19 |
20 | ![](media/figs/ddad_viz.gif)
21 |
22 | ## DDAD depth challenge
23 |
24 | The [DDAD depth challenge](https://eval.ai/web/challenges/challenge-page/902/overview) consists of two tracks: self-supervised and semi-supervised monocular depth estimation. We will evaluate all methods against the ground-truth LiDAR depth, and we will also compute and report depth metrics per semantic class. The winner will be chosen based on the abs_rel metric. The winners of the challenge will receive cash prizes and will present their work at the CVPR 2021 Workshop [“Frontiers of Monocular 3D Perception”](https://sites.google.com/view/mono3d-workshop). Please check below for details on the [DDAD dataset](#dataset-details), the [notebook](#ipython-notebook) for loading the data, and a description of the [evaluation metrics](#evaluation-metrics).
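
For reference, `abs_rel` is the mean absolute relative error between predicted and ground-truth depth over valid pixels. Below is a minimal NumPy sketch of the per-image computation, following `evaluation/semantic_evaluation.py` (the function name and arguments here are illustrative, not part of the released code):

```python
import numpy as np

def abs_rel(gt, pred, min_depth=0.0, max_depth=200.0, median_scale=False):
    """Mean absolute relative error over valid ground-truth pixels (single image)."""
    gt, pred = np.asarray(gt, dtype=np.float64), np.asarray(pred, dtype=np.float64)
    # Only pixels with ground-truth depth inside the evaluation range are scored.
    valid = (gt > min_depth) & (gt < max_depth)
    gt, pred = gt[valid], pred[valid]
    if median_scale:  # ground-truth median scaling, used for the self-supervised track
        pred = pred * np.median(gt) / np.median(pred)
    # Clamp predictions to the evaluation range, as the evaluation script does.
    pred = np.clip(pred, min_depth, max_depth)
    return float(np.mean(np.abs(gt - pred) / gt))
```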
25 |
26 | ## How to Use
27 |
28 | The data can be downloaded here: [train+val](https://tri-ml-public.s3.amazonaws.com/github/DDAD/datasets/DDAD.tar) (257 GB, md5 checksum: `c0da97967f76da80f86d6f97d0d98904`) and [test](https://tri-ml-public.s3.amazonaws.com/github/DDAD/datasets/DDAD_test.tar) (see the [Test split](#test-split) section for details). To load the dataset, please use the [TRI Dataset Governance Policy (DGP) codebase](https://github.com/TRI-ML/dgp). The following snippet will instantiate the dataset:
29 |
30 | ```python
31 | from dgp.datasets import SynchronizedSceneDataset
32 |
33 | # Load synchronized pairs of camera and lidar frames.
34 | dataset = SynchronizedSceneDataset(
35 |     '<path_to_dataset>/ddad.json',
36 |     datum_names=('lidar', 'CAMERA_01', 'CAMERA_05'),
37 |     generate_depth_from_datum='lidar',
38 |     split='train'
39 | )
40 |
41 | # Iterate through the dataset.
42 | for sample in dataset:
43 |     # Each sample contains a list of the requested datums.
44 |     lidar, camera_01, camera_05 = sample[0:3]
45 |     point_cloud = lidar['point_cloud']  # Nx3 numpy.ndarray
46 |     image_01 = camera_01['rgb']         # PIL.Image
47 |     depth_01 = camera_01['depth']       # (H,W) numpy.ndarray, generated from 'lidar'
48 | ```
49 |
50 | The [DGP](https://github.com/TRI-ML/dgp) codebase provides a number of functions for loading one or multiple camera images, projecting the lidar point cloud into the camera images, accessing intrinsics and extrinsics, etc. Additionally, please refer to the [PackNet-SfM](https://github.com/TRI-ML/packnet-sfm) codebase (in PyTorch) for more details on how to integrate and use DDAD for depth estimation training/inference/evaluation and for state-of-the-art pretrained models.
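
To illustrate what these utilities do under the hood, here is a minimal NumPy sketch of projecting a lidar point cloud into a camera image using per-datum intrinsics and extrinsics. It assumes the extrinsics are available as 4x4 sensor-to-vehicle transformation matrices; this is a conceptual sketch only, and in practice you should use the projection helpers provided by DGP.

```python
import numpy as np

def project_lidar_to_camera(point_cloud, K, lidar_to_vehicle, camera_to_vehicle):
    """Project an (N, 3) lidar point cloud into a camera image.

    Assumes `K` is the 3x3 camera intrinsics matrix and the two extrinsics are
    4x4 sensor-to-vehicle transforms (an assumed representation for this sketch).
    """
    # Lift points to homogeneous coordinates: (N, 4)
    pts_h = np.hstack([point_cloud, np.ones((point_cloud.shape[0], 1))])
    # Lidar frame -> vehicle frame -> camera frame
    vehicle_to_camera = np.linalg.inv(camera_to_vehicle)
    pts_cam = (vehicle_to_camera @ lidar_to_vehicle @ pts_h.T)[:3]
    # Keep points in front of the camera and apply the pinhole projection
    in_front = pts_cam[2] > 0
    pts_cam = pts_cam[:, in_front]
    uv = (K @ pts_cam)[:2] / pts_cam[2]
    return uv.T, pts_cam[2]  # (M, 2) pixel coordinates and per-point depth (M,)
```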
51 |
52 | ## Dataset details
53 |
54 | DDAD uses high-resolution, long-range [Luminar-H2](https://www.luminartech.com/technology) LiDAR sensors to generate the ground-truth point clouds, with a maximum range of 250m and sub-1cm range precision. Additionally, it contains six calibrated cameras time-synchronized at 10 Hz that together provide 360 degree coverage around the vehicle. The six cameras are 2.4MP (1936 x 1216), global-shutter, and oriented at 60 degree intervals. They are synchronized with the 10 Hz scans of the Luminar-H2 sensors, which are oriented at 90 degree intervals (camera datum names: `camera_01`, `camera_05`, `camera_06`, `camera_07`, `camera_08` and `camera_09`); the camera intrinsics can be accessed with `datum['intrinsics']`. The data from the Luminar sensors is aggregated into a single 360 degree point cloud covering the scene (datum name: `lidar`). Each sensor has associated extrinsics mapping it to a common vehicle frame of reference (`datum['extrinsics']`).
55 |
56 | The training and validation scenes are 5 or 10 seconds long and consist of 50 or 100 samples, each with a corresponding Luminar-H2 point cloud and six camera images, including intrinsic and extrinsic calibration. The training set contains 150 scenes with a total of 12650 individual samples (75900 RGB images), and the validation set contains 50 scenes with a total of 3950 samples (23700 RGB images).
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 |
65 |
66 |
67 |
68 | ## Dataset stats
69 |
70 | ### Training split
71 |
72 | | Location | Num Scenes (50 frames) | Num Scenes (100 frames) | Total frames |
73 | | ------------- |:-------------:|:-------------:|:-------------:|
74 | | SF | 0 | 19 | 1900 |
75 | | ANN | 23 | 53 | 6450 |
76 | | DET | 8 | 0 | 400 |
77 | | Japan | 16 | 31 | 3900 |
78 |
79 | Total: `150 scenes` and `12650 frames`.
80 |
81 | ### Validation split
82 |
83 | | Location | Num Scenes (50 frames) | Num Scenes (100 frames) | Total frames |
84 | | ------------- |:-------------:|:-------------:|:-------------:|
85 | | SF | 1 | 10 | 1050 |
86 | | ANN | 11 | 14 | 1950 |
87 | | Japan | 9 | 5 | 950 |
88 |
89 | Total: `50 scenes` and `3950 frames`.
90 |
91 | USA locations: ANN - Ann Arbor, MI; SF - San Francisco Bay Area, CA; DET - Detroit, MI; CAM - Cambridge, MA. Japan locations: Tokyo and Odaiba.
92 |
93 | ### Test split
94 |
95 | The test split consists of 3080 images with associated intrinsic calibration. The data can be downloaded from [here](https://tri-ml-public.s3.amazonaws.com/github/DDAD/datasets/DDAD_test.tar). 200 images from the test split have associated panoptic labels, similar to the DDAD validation split. The ground-truth depth and panoptic labels will not be made public. To evaluate your method on the DDAD test split, please submit your results to the [DDAD depth challenge](https://eval.ai/web/challenges/challenge-page/902/overview) as a single zip file following the same file naming convention as the test split (i.e. 000000.png ... 003079.png). Each entry in the zip file should correspond to the DDAD test split image with the same name and should be a 16-bit single-channel PNG image. Each prediction can be at full image resolution or downsampled. If the resolution of the predicted depth differs from that of the input image, the evaluation script will upsample the predicted depth to the input image resolution using nearest-neighbor interpolation.
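
For reference, the evaluation code in `evaluation/semantic_evaluation.py` decodes depth PNGs by dividing the 16-bit pixel values by 256 to recover meters. A minimal sketch for encoding and packaging predictions accordingly (the function name and the `predicted_depths` iterable are placeholders) could look like this:

```python
import zipfile

import cv2
import numpy as np

def save_submission(predicted_depths, zip_path='submission.zip'):
    """Write per-image depth predictions (in meters) as 16-bit PNGs into a zip file.

    `predicted_depths` is assumed to be an iterable of (H, W) float arrays ordered
    exactly like the DDAD test split (000000.png ... 003079.png).
    """
    with zipfile.ZipFile(zip_path, 'w') as zf:
        for idx, depth in enumerate(predicted_depths):
            name = '{:06d}.png'.format(idx)
            # The evaluation script divides PNG values by 256, so multiply by 256 here.
            depth_png = (np.asarray(depth) * 256.).astype(np.uint16)
            ok, buffer = cv2.imencode('.png', depth_png)
            assert ok, 'PNG encoding failed for {}'.format(name)
            zf.writestr(name, buffer.tobytes())
```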
96 |
97 | ## Sensor placement
98 |
99 | The figure below shows the placement of the DDAD LiDARs and cameras. Please note that both LiDAR and camera sensors are positioned so as to provide 360 degree coverage around the vehicle. The data from all sensors is time-synchronized and reported at a frequency of 10 Hz. The data from the Luminar sensors is reported as a single point cloud in the vehicle frame of reference, with origin on the ground below the center of the vehicle rear axle, as shown below. For instructions on visualizing the camera images and the point clouds, please refer to this [IPython notebook](notebooks/DDAD.ipynb).
100 |
101 | ![](media/figs/ddad_sensors.png)
102 |
103 | ## Evaluation metrics
104 |
105 | Please refer to the [PackNet-SfM](https://github.com/TRI-ML/packnet-sfm) codebase for instructions on how to compute detailed depth evaluation metrics.
106 |
107 | We also provide an evaluation script compatible with our [Eval.AI challenge](https://eval.ai/web/challenges/challenge-page/902/overview), which can be used to test your submission on the front camera images of the DDAD validation split. Ground-truth depth maps for evaluation (obtained by iterating over the validation dataset in order) can be found [here](https://tri-ml-public.s3.amazonaws.com/github/DDAD/challenge/gt_val.zip), and an example submission file can be found [here](https://tri-ml-public.s3.amazonaws.com/github/DDAD/challenge/pred_val_sup.zip). To evaluate, you can run:
108 |
109 | ```
110 | cd evaluation
111 | python3 main.py gt_val.zip pred_val_sup.zip semi
112 | ```
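
When no `output_folder` is given (the default in `main.py`), the script prints a dictionary of aggregate metrics computed over the 200m range. Its keys, taken from `evaluation/main.py`, are shown below (the values are placeholders):

```python
# Shape of the dictionary printed by evaluation/main.py (values are placeholders).
results = {
    'AbsRel': 0.0,         # abs_rel over all valid pixels up to 200m
    'RMSE': 0.0,           # root mean squared error
    'SILog': 0.0,          # scale-invariant logarithmic error
    'a1': 0.0,             # fraction of pixels with max(gt/pred, pred/gt) < 1.25
    'Car_AbsRel': 0.0,     # abs_rel restricted to 'Car' pixels
    'Person_AbsRel': 0.0,  # abs_rel restricted to 'Person' pixels
    'mAbsRel': 0.0,        # mean of the per-class abs_rel values (excluding 'All')
}
```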
113 |
114 | ## IPython notebook
115 |
116 | The associated [IPython notebook](notebooks/DDAD.ipynb) provides a detailed description of how to instantiate the dataset with various options, including loading frames with temporal context, visualizing RGB and depth images for the various cameras, and displaying the lidar point cloud.
117 |
118 | [![](media/figs/notebook.png)](notebooks/DDAD.ipynb)
119 |
120 | ## References
121 |
122 | Please use the following citation when referencing DDAD:
123 |
124 | #### 3D Packing for Self-Supervised Monocular Depth Estimation (CVPR 2020 oral)
125 | *Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos and Adrien Gaidon*, [**[paper]**](https://arxiv.org/abs/1905.02693), [**[video]**](https://www.youtube.com/watch?v=b62iDkLgGSI)
126 | ```
127 | @inproceedings{packnet,
128 | author = {Vitor Guizilini and Rares Ambrus and Sudeep Pillai and Allan Raventos and Adrien Gaidon},
129 | title = {3D Packing for Self-Supervised Monocular Depth Estimation},
130 | booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
131 | primaryClass = {cs.CV},
132 | year = {2020},
133 | }
134 | ```
135 |
136 |
137 | ## Privacy
138 |
139 | To ensure privacy, the DDAD dataset has been anonymized (license plate and face blurring) using state-of-the-art object detectors.
140 |
141 |
142 | ## License
143 |
144 | This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/).
145 |
--------------------------------------------------------------------------------
/evaluation/main.py:
--------------------------------------------------------------------------------
1 | # Copyright 2021 Toyota Research Institute. All rights reserved.
2 | #
3 | # Validation groundtruth:
4 | # https://tri-ml-public.s3.amazonaws.com/github/DDAD/challenge/gt_val.zip
5 | #
6 | # Example of validation predictions (semi-supervised):
7 | # https://tri-ml-public.s3.amazonaws.com/github/DDAD/challenge/pred_val_sup.zip
8 | #
9 | # Predictions are stored as .png files in the same order as provided by the corresponding split (validation or test)
10 | # For more information, please check our depth estimation repository: https://github.com/tri-ml/packnet-sfm
11 | #
12 | # How to run:
13 | # python3 main.py gt_val.zip pred_val_sup.zip semi
14 |
15 |
16 | import sys
17 | import shutil
18 |
19 | from argparse import Namespace
20 | from zipfile import ZipFile
21 |
22 | from semantic_evaluation import main as SemanticEval
23 |
24 |
25 | def evaluate(gt_zip, pred_zip, phase):
26 |
27 | assert phase in ['semi', 'self'], 'Invalid phase name'
28 |
29 | use_gt_scale = phase == 'self'
30 |
31 | gt_folder = 'data/gt'
32 | print('gt_zip:', gt_zip)
33 | print('gt_folder:', gt_folder)
34 | with ZipFile(gt_zip, 'r') as zip:
35 | shutil.rmtree(gt_folder, ignore_errors=True)
36 | zip.extractall(path=gt_folder)
37 | pred_folder = 'data/pred'
38 | print('pred_zip:', pred_zip)
39 | print('pred_folder:', pred_folder)
40 | with ZipFile(pred_zip, 'r') as zip:
41 | shutil.rmtree(pred_folder, ignore_errors=True)
42 | zip.extractall(path=pred_folder)
43 |
44 | ranges = [200]
45 | metric = 'abs_rel'
46 |
47 | classes = [
48 | "All",
49 | "Road",
50 | "Sidewalk",
51 | "Wall",
52 | "Fence",
53 | "Building",
54 | "Pole",
55 | "T.Light",
56 | "T.Sign",
57 | "Vegetation",
58 | "Terrain",
59 | "Person",
60 | "Rider",
61 | "Car",
62 | "Truck",
63 | "Bus",
64 | "Bicycle",
65 | ]
66 |
67 | args = Namespace(**{
68 | 'gt_folder': gt_folder,
69 | 'pred_folder': pred_folder,
70 | 'ranges': ranges, 'classes': classes,
71 | 'metric': metric,
72 | 'output_folder': None,
73 | 'min_num_valid_pixels': 1,
74 | 'use_gt_scale': use_gt_scale,
75 | })
76 |
77 | dict_output = SemanticEval(args)
78 | print(dict_output)
79 |
80 | if __name__ == "__main__":
81 | gt_zip = sys.argv[1] # Groundtruth .zip folder
82 | pred_zip = sys.argv[2] # Predicted .zip folder
83 | phase = sys.argv[3] # Which phase will be used ('semi' or 'self')
84 | evaluate(gt_zip, pred_zip, phase)
85 |
86 |
--------------------------------------------------------------------------------
/evaluation/semantic_evaluation.py:
--------------------------------------------------------------------------------
1 | # Copyright 2021 Toyota Research Institute. All rights reserved.
2 |
3 | import argparse
4 | import os
5 | from argparse import Namespace
6 | from collections import OrderedDict
7 | from glob import glob
8 |
9 | import cv2
10 | import matplotlib.pyplot as plt
11 | import numpy as np
12 | import torch
13 | from tqdm import tqdm
14 |
15 | ddad_to_cityscapes = {
16 | # ROAD
17 | 7: 7, # Crosswalk
18 | 10: 7, # LaneMarking
19 | 11: 7, # LimitLine
20 | 13: 7, # OtherDriveableSurface
21 | 21: 7, # Road
22 | 24: 7, # RoadMarking
23 | 27: 7, # TemporaryConstructionObject
24 | # SIDEWALK
25 | 25: 8, # SideWalk
26 | 23: 8, # RoadBoundary (Curb)
27 | 14: 8, # OtherFixedStructure
28 | 15: 8, # OtherMovable
29 | # WALL
30 | 16: 12, # Overpass/Bridge/Tunnel
31 | 22: 12, # RoadBarriers
32 | # FENCE
33 | 8: 13, # Fence
34 | # BUILDING
35 | 2: 11, # Building
36 | # POLE
37 | 9: 17, # HorizontalPole
38 | 35: 17, # VerticalPole
39 | # TRAFFIC LIGHT
40 | 30: 19, # TrafficLight
41 | # TRAFFIC SIGN
42 | 31: 20, # TrafficSign
43 | # VEGETATION
44 | 34: 21, # Vegetation
45 | # TERRAIN
46 | 28: 22, # Terrain
47 | # SKY
48 | 26: 23, # Sky
49 | # PERSON
50 | 18: 24, # Pedestrian
51 | # RIDER
52 | 20: 25, # Rider
53 | # CAR
54 | 4: 26, # Car
55 | # TRUCK
56 | 33: 27, # Truck
57 | 5: 27, # Caravan/RV
58 | 6: 27, # ConstructionVehicle
59 | # BUS
60 | 3: 28, # Bus
61 | # TRAIN
62 | 32: 31, # Train
63 | # MOTORCYCLE
64 | 12: 32, # Motorcycle
65 | # BICYCLE
66 | 1: 33, # Bicycle
67 | # IGNORE
68 | 0: 255, # Animal
69 | 17: 255, # OwnCar (EgoCar)
70 | 19: 255, # Railway
71 | 29: 255, # TowedObject
72 | 36: 255, # WheeledSlow
73 | 37: 255, # Void
74 | }
75 |
76 | map_classes = {
77 | "Road": 7,
78 | "Sidewalk": 8,
79 | "Wall": 12,
80 | "Fence": 13,
81 | "Building": 11,
82 | "Pole": 17,
83 | "T.Light": 19,
84 | "T.Sign": 20,
85 | "Vegetation": 21,
86 | "Terrain": 22,
87 | "Sky": 23,
88 | "Person": 24,
89 | "Rider": 25,
90 | "Car": 26,
91 | "Truck": 27,
92 | "Bus": 28,
93 | "Train": 31,
94 | "Motorcycle": 32,
95 | "Bicycle": 33,
96 | "Ignore": 255,
97 | }
98 |
99 |
100 | def convert_ontology(semantic_id, ontology_convert):
101 | """Convert from one ontology to another"""
102 | if ontology_convert is None:
103 | return semantic_id
104 | else:
105 | semantic_id_convert = semantic_id.clone()
106 | for key, val in ontology_convert.items():
107 | semantic_id_convert[semantic_id == key] = val
108 | return semantic_id_convert
109 |
110 |
111 | def parse_args():
112 | """Parse arguments for benchmark script"""
113 | parser = argparse.ArgumentParser(description='PackNet-SfM benchmark script')
114 | parser.add_argument('--gt_folder', type=str,
115 |                     help='Folder containing ground-truth depth maps (*_gt.png)')
116 | parser.add_argument('--pred_folder', type=str,
117 |                     help='Folder containing predicted depth maps (.png)')
118 | parser.add_argument('--output_folder', type=str,
119 | help='Output folder where information will be stored')
120 | parser.add_argument('--use_gt_scale', action='store_true',
121 | help='Use ground-truth median scaling on predicted depth maps')
122 | parser.add_argument('--ranges', type=float, nargs='+', default=[200],
123 | help='Depth ranges to consider during evaluation')
124 | parser.add_argument('--classes', type=str, nargs='+', default=['All', 'Car', 'Pedestrian'],
125 | help='Semantic classes to consider during evaluation')
126 | parser.add_argument('--metric', type=str, default='rmse', choices=['abs_rel', 'rmse', 'silog', 'a1'],
127 | help='Which metric will be used for evaluation')
128 | parser.add_argument('--min_num_valid_pixels', type=int, default=1,
129 | help='Minimum number of valid pixels to consider')
130 | args = parser.parse_args()
131 | return args
132 |
133 |
134 | def create_summary_table(ranges, classes, matrix, folder, metric):
135 |
136 | # Prepare variables
137 | title = "Semantic/Range Depth Evaluation (%s) -- {}" % metric.upper()
138 | ranges = ['{}m'.format(r) for r in ranges]
139 | result = matrix.mean().round(decimals=3)
140 | matrix = matrix.round(decimals=2)
141 |
142 | # Create figure and axes
143 | fig, ax = plt.subplots()
144 | ax.imshow(matrix)
145 |
146 | # Show ticks
147 | ax.set_xticks(np.arange(len(ranges)))
148 | ax.set_yticks(np.arange(len(classes)))
149 |
150 | # Label ticks
151 | ax.set_xticklabels(ranges)
152 | ax.set_yticklabels(classes)
153 |
154 | # Rotate tick labels and set alignment
155 | plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
156 | rotation_mode="anchor")
157 |
158 | # Loop over data to create annotations.
159 | for i in range(len(ranges)):
160 | for j in range(len(classes)):
161 | ax.text(i, j, matrix[j, i],
162 | ha="center", va="center", color="w")
163 |
164 | # Plot figure
165 | ax.set_title(title.format(result))
166 | fig.tight_layout()
167 |
168 | # Save and show
169 | plt.savefig('{}/summary_table.png'.format(folder))
170 | plt.close()
171 |
172 |
173 | def create_bar_plot(key_range, key_class, matrix, name, idx, folder):
174 |
175 | # Prepare title and start plot
176 | title = 'Per-frame depth evaluation of **{} at {}m**'.format(key_class, key_range)
177 | fig, ax = plt.subplots(figsize=(10, 8))
178 |
179 | # Get x ticks and values
180 | x_ticks = [int(m[0]) for m in matrix]
181 | x_values = range(len(matrix))
182 | # Get y values
183 | y_values = [m[2 + idx] for m in matrix]
184 |
185 | # Prepare titles, ticks and labels
186 | ax.set_title(title)
187 | ax.set_xticks(x_values)
188 | ax.set_xticklabels(x_ticks)
189 | ax.set_xlabel('Image frame')
190 | ax.set_ylabel('{}'.format(name.upper()))
191 |
192 | # Rotate tick labels and set alignment
193 | plt.setp(ax.get_xticklabels(), rotation=70, ha="right",
194 | rotation_mode="anchor")
195 |
196 | # Show and save
197 | ax.bar(x_values, y_values)
198 | plt.savefig('{}/{}-{}m-{}.png'.format(folder, key_class, key_range, name))
199 |
200 |
201 | def load_sem_ins(file):
202 | """Load GT semantic and instance maps"""
203 | sem = file.replace('_gt', '_sem')
204 | if os.path.isfile(sem):
205 | ins = file.replace('_gt', '_ins')
206 | sem = cv2.imread(sem, cv2.IMREAD_ANYDEPTH) / 256.
207 | ins = cv2.imread(ins, cv2.IMREAD_ANYDEPTH) / 256.
208 | else:
209 | sem = ins = None
210 | return sem, ins
211 |
212 |
213 | def load_depth(depth):
214 | """Load a depth map"""
215 | depth = cv2.imread(depth, cv2.IMREAD_ANYDEPTH) / 256.
216 | depth = torch.tensor(depth).unsqueeze(0).unsqueeze(0)
217 | return depth
218 |
219 |
220 | def compute_depth_metrics(config, gt, pred, use_gt_scale=True,
221 | extra_mask=None, min_num_valid_pixels=1):
222 | """
223 | Compute depth metrics from predicted and ground-truth depth maps
224 |
225 | Parameters
226 | ----------
227 | config : CfgNode
228 | Metrics parameters
229 | gt : torch.Tensor
230 | Ground-truth depth map [B,1,H,W]
231 | pred : torch.Tensor
232 | Predicted depth map [B,1,H,W]
233 | use_gt_scale : bool
234 | True if ground-truth median-scaling is to be used
235 | extra_mask : torch.Tensor
236 | Extra mask to be used for calculation (e.g. semantic mask)
237 | min_num_valid_pixels : int
238 | Minimum number of valid pixels for the image to be considered
239 |
240 | Returns
241 | -------
242 | metrics : torch.Tensor [7]
243 | Depth metrics (abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3)
244 | """
245 | # Initialize variables
246 | batch_size, _, gt_height, gt_width = gt.shape
247 | abs_diff = abs_rel = sq_rel = rmse = rmse_log = silog = a1 = a2 = a3 = 0.0
248 | # For each depth map
249 | for pred_i, gt_i in zip(pred, gt):
250 | gt_i, pred_i = torch.squeeze(gt_i), torch.squeeze(pred_i)
251 |
252 | # Keep valid pixels (min/max depth and crop)
253 | valid = (gt_i > config.min_depth) & (gt_i < config.max_depth)
254 | valid = valid & torch.squeeze(extra_mask) if extra_mask is not None else valid
255 |
256 | # Stop if there are no remaining valid pixels
257 | if valid.sum() < min_num_valid_pixels:
258 | return None, None
259 |
260 | # Keep only valid pixels
261 | gt_i, pred_i = gt_i[valid], pred_i[valid]
262 |
263 | # Ground-truth median scaling if needed
264 | if use_gt_scale:
265 | pred_i = pred_i * torch.median(gt_i) / torch.median(pred_i)
266 |
267 | # Clamp predicted depth values to min/max values
268 | pred_i = pred_i.clamp(config.min_depth, config.max_depth)
269 |
270 | # Calculate depth metrics
271 |
272 | thresh = torch.max((gt_i / pred_i), (pred_i / gt_i))
273 | a1 += (thresh < 1.25 ).float().mean()
274 | a2 += (thresh < 1.25 ** 2).float().mean()
275 | a3 += (thresh < 1.25 ** 3).float().mean()
276 |
277 | diff_i = gt_i - pred_i
278 | abs_diff += torch.mean(torch.abs(diff_i))
279 | abs_rel += torch.mean(torch.abs(diff_i) / gt_i)
280 | sq_rel += torch.mean(diff_i ** 2 / gt_i)
281 | rmse += torch.sqrt(torch.mean(diff_i ** 2))
282 | rmse_log += torch.sqrt(torch.mean((torch.log(gt_i) -
283 | torch.log(pred_i)) ** 2))
284 |
285 | err = torch.log(pred_i) - torch.log(gt_i)
286 | silog += torch.sqrt(torch.mean(err ** 2) - torch.mean(err) ** 2) * 100
287 |
288 | # Return average values for each metric
289 | return torch.tensor([metric / batch_size for metric in
290 | [abs_rel, sq_rel, rmse, rmse_log, silog, a1, a2, a3]]).type_as(gt), valid.sum()
291 |
292 |
293 | def main(args):
294 |
295 | # Get and sort ground-truth and predicted files
296 | pred_files = glob(os.path.join(args.pred_folder, '*.png'))
297 | pred_files.sort()
298 |
299 | gt_files = glob(os.path.join(args.gt_folder, '*_gt.png'))
300 | gt_files.sort()
301 |
302 | depth_ranges = args.ranges
303 | depth_classes = args.classes
304 |
305 | print('#### Depth ranges to evaluate:', depth_ranges)
306 | print('#### Depth classes to evaluate:', depth_classes)
307 | print('#### Number of predicted and groundtruth files:', len(pred_files), len(gt_files))
308 |
309 | # Metrics name
310 | metric_names = ['abs_rel', 'sqr_rel', 'rmse', 'rmse_log', 'silog', 'a1', 'a2', 'a3']
311 | matrix_metric = 'rmse'
312 |
313 | # Prepare matrix information
314 | matrix_idx = metric_names.index(matrix_metric)
315 | matrix = np.zeros((len(depth_classes), len(depth_ranges)))
316 |
317 | # Create metrics dictionary
318 | all_metrics = OrderedDict()
319 | for depth in depth_ranges:
320 | all_metrics[depth] = OrderedDict()
321 | for classes in depth_classes:
322 | all_metrics[depth][classes] = []
323 |
324 | assert len(pred_files) == len(gt_files), 'Wrong number of files'
325 |
326 | # Loop over all files
327 | progress_bar = tqdm(zip(pred_files, gt_files), total=len(pred_files))
328 | for i, (pred_file, gt_file) in enumerate(progress_bar):
329 | # Get and prepare ground-truth and predictions
330 | pred = load_depth(pred_file)
331 | gt = load_depth(gt_file)
332 | pred = torch.nn.functional.interpolate(pred, gt.shape[2:], mode='nearest')
333 | # Check for semantics
334 | sem = gt_file.replace('_gt.png', '_sem.png')
335 | with_semantic = os.path.exists(sem)
336 | if with_semantic:
337 | sem = torch.tensor(load_sem_ins(sem)[0]).unsqueeze(0).unsqueeze(0)
338 | if sem.max() < 1.0:
339 | sem = sem * 256
340 | sem = torch.nn.functional.interpolate(sem, gt.shape[2:], mode='nearest')
341 | sem = convert_ontology(sem, ddad_to_cityscapes)
342 | else:
343 | pass
344 | # Calculate metrics
345 | for key_depth in all_metrics.keys():
346 | for key_class in all_metrics[key_depth].keys():
347 | # Prepare config dictionary
348 | args_key = Namespace(**{
349 | 'min_depth': 0,
350 | 'max_depth': key_depth,
351 | })
352 | # Initialize metrics as None
353 | metrics, num = None, None
354 | # Considering all pixels
355 | if key_class == 'All':
356 | metrics, num = compute_depth_metrics(
357 | args_key, gt, pred, use_gt_scale=args.use_gt_scale)
358 | # Considering semantic classes
359 | elif with_semantic:
360 | metrics, num = compute_depth_metrics(
361 | args_key, gt, pred, use_gt_scale=args.use_gt_scale,
362 | extra_mask=sem == map_classes[key_class],
363 | min_num_valid_pixels=args.min_num_valid_pixels)
364 | # Store metrics if available
365 | if metrics is not None:
366 | metrics = metrics.detach().cpu().numpy()
367 | metrics = np.array([i, num] + list(metrics))
368 | all_metrics[key_depth][key_class].append(metrics)
369 |
370 | if args.output_folder is None:
371 | out_dict = {}
372 | # Loop over range values
373 | for key1, val1 in all_metrics.items():
374 | # Loop over depth metrics
375 | for key2, val2 in val1.items():
376 | key = '{}_{}m'.format(key2, key1)
377 | if len(val2) > 0:
378 | out_dict[key] = {}
379 | for i in range(len(metric_names)):
380 | idx = [val2[j][0] for j in range(len(val2))]
381 | nums = [val2[j][1] for j in range(len(val2))]
382 | vals = [val2[j][i+2] for j in range(len(val2))]
383 | out_dict[key]['{}'.format(metric_names[i])] = sum(
384 | [n * v for n, v in zip(nums, vals)]) / sum(nums)
385 | vals = [val2[j][i+2] for j in range(len(val2))]
386 | out_dict[key]['{}'.format(metric_names[i])] = sum(vals) / len(vals)
387 | else:
388 | out_dict[key] = None
389 |
390 | m_abs_rel = {}
391 | for key, val in out_dict.items():
392 | if 'All' not in key:
393 | m_abs_rel[key] = val['abs_rel'] if val is not None else None
394 | m_abs_rel = sum([val for val in m_abs_rel.values()]) / len(m_abs_rel.values())
395 |
396 | filtered_dict = {
397 | 'AbsRel': out_dict['All_200m']['abs_rel'],
398 | 'RMSE': out_dict['All_200m']['rmse'],
399 | 'SILog': out_dict['All_200m']['silog'],
400 | 'a1': out_dict['All_200m']['a1'],
401 | 'Car_AbsRel': out_dict['Car_200m']['abs_rel'],
402 | 'Person_AbsRel': out_dict['Person_200m']['abs_rel'],
403 | 'mAbsRel': m_abs_rel,
404 | }
405 |
406 | return filtered_dict
407 |
408 | # Terminal lines
409 | met_line = '| {:>11} | {:^5} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} |'
410 | hor_line = '|{:<}|'.format('-' * 109)
411 | num_line = '| {:>10}m | {:>5} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} |'
412 | # File lines
413 | hor_line_file = '|{:<}|'.format('-' * 106)
414 | met_line_file = '| {:>8} | {:^5} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} |'
415 | num_line_file = '| {:>8} | {:>5} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} |'
416 | # Create output folder
417 | os.makedirs(args.output_folder, exist_ok=True)
418 |
419 | # Loop over the dataset
420 | for i, key_class in enumerate(depth_classes):
421 | # Create file and write header
422 | file = open('{}/{}.txt'.format(args.output_folder, key_class), 'w')
423 | file.write(hor_line_file + '\n')
424 | file.write('| ***** {} *****\n'.format(key_class.upper()))
425 | # Print header
426 | print(hor_line)
427 | print(met_line.format(*((key_class.upper()), '#') + tuple(metric_names)))
428 | print(hor_line)
429 | # Loop over each depth range and semantic class
430 | for j, key_depth in enumerate(depth_ranges):
431 | metrics = all_metrics[key_depth][key_class]
432 | if len(metrics) > 0:
433 | # How many metrics were generated for that combination
434 | length = len(metrics)
435 | # Update file
436 | file.write(hor_line_file + '\n')
437 | file.write(met_line_file.format(*('{}m'.format(key_depth), '#') + tuple(metric_names)) + '\n')
438 | file.write(hor_line_file + '\n')
439 | # Create bar plot
440 | create_bar_plot(key_depth, key_class, metrics, matrix_metric, matrix_idx, args.output_folder)
441 | # Save individual metric to file
442 | for metric in metrics:
443 | idx, qty, metric = int(metric[0]), int(metric[1]), metric[2:]
444 | file.write(num_line_file.format(*(idx, qty) + tuple(metric)) + '\n')
445 | # Average metrics and update matrix
446 | metrics = (sum(metrics) / len(metrics))
447 | matrix[i, j] = metrics[2 + matrix_idx]
448 | # Print to terminal
449 | print(num_line.format(*((key_depth, length) + tuple(metrics[2:]))))
450 | # Update file
451 | file.write(hor_line_file + '\n')
452 | file.write(num_line_file.format(*('TOTAL', length) + tuple(metrics[2:])) + '\n')
453 | file.write(hor_line_file + '\n')
454 | # Finish file
455 | file.write(hor_line_file + '\n')
456 | file.close()
457 | # Finish terminal printing
458 | print(hor_line)
459 | # Create final results
460 | create_summary_table(depth_ranges, depth_classes, matrix, args.output_folder, args.metric)
461 |
462 |
463 | if __name__ == '__main__':
464 | args = parse_args()
465 | main(args)
466 |
--------------------------------------------------------------------------------
/media/figs/ann_viz_rgb.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/ann_viz_rgb.jpg
--------------------------------------------------------------------------------
/media/figs/ddad_sensors.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/ddad_sensors.png
--------------------------------------------------------------------------------
/media/figs/ddad_viz.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/ddad_viz.gif
--------------------------------------------------------------------------------
/media/figs/hq_viz_rgb.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/hq_viz_rgb.jpg
--------------------------------------------------------------------------------
/media/figs/notebook.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/notebook.png
--------------------------------------------------------------------------------
/media/figs/odaiba_viz_rgb.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/odaiba_viz_rgb.jpg
--------------------------------------------------------------------------------
/media/figs/pano1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/pano1.png
--------------------------------------------------------------------------------
/media/figs/pano2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/pano2.png
--------------------------------------------------------------------------------
/media/figs/pano3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/pano3.png
--------------------------------------------------------------------------------
/media/figs/tri-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/tri-logo.png
--------------------------------------------------------------------------------
/notebooks/DDAD.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# DDAD - Dense Depth for Autonomous Driving\n",
8 | "\n",
9 | "DDAD is a new autonomous driving benchmark from TRI (Toyota Research Institute) for long range (up to 250m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth (across a full 360 degree field of view) generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting. DDAD contains scenes from urban settings in the United States (San Francisco, Bay Area, Cambridge, Detroit, Ann Arbor) and Japan (Tokyo, Odaiba). This notebook will demonstrate a number of simple steps that will allow you to load and visualize the DDAD dataset.\n"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": null,
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import cv2\n",
19 | "import numpy as np\n",
20 | "import PIL\n",
21 | "from IPython import display\n",
22 | "from matplotlib.cm import get_cmap\n",
23 | "\n",
24 | "from dgp.datasets.synchronized_dataset import SynchronizedSceneDataset\n",
25 | "from dgp.proto.ontology_pb2 import Ontology\n",
26 | "from dgp.utils.protobuf import open_pbobject\n",
27 | "from dgp.utils.visualization import visualize_semantic_segmentation_2d\n",
28 | "\n",
29 | "plasma_color_map = get_cmap('plasma')"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": null,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Define high level variables\n",
39 | "DDAD_TRAIN_VAL_JSON_PATH = '/data/datasets/ddad_train_val/ddad.json'\n",
40 | "DDAD_TEST_JSON_PATH = '/data/datasets/ddad_test/ddad_test.json'\n",
41 | "DATUMS = ['lidar'] + ['CAMERA_%02d' % idx for idx in [1, 5, 6, 7, 8, 9]] "
42 | ]
43 | },
44 | {
45 | "cell_type": "markdown",
46 | "metadata": {},
47 | "source": [
48 | "## DDAD Train split\n",
49 | "\n",
50 | "The training set contains 150 scenes with a total of 12650 individual samples (75900 RGB images)."
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": null,
56 | "metadata": {},
57 | "outputs": [],
58 | "source": [
59 | "ddad_train = SynchronizedSceneDataset(\n",
60 | " DDAD_TRAIN_VAL_JSON_PATH,\n",
61 | " split='train',\n",
62 | " datum_names=DATUMS,\n",
63 | " generate_depth_from_datum='lidar'\n",
64 | ")\n",
65 | "print('Loaded DDAD train split containing {} samples'.format(len(ddad_train)))"
66 | ]
67 | },
68 | {
69 | "cell_type": "markdown",
70 | "metadata": {},
71 | "source": [
72 | "### Load a random sample"
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "execution_count": null,
78 | "metadata": {},
79 | "outputs": [],
80 | "source": [
81 | "random_sample_idx = np.random.randint(len(ddad_train))\n",
82 | "sample = ddad_train[random_sample_idx] # sample[0] - lidar, sample[1:] - camera datums\n",
83 | "sample_datum_names = [datum['datum_name'] for datum in sample]\n",
84 | "print('Loaded sample {} with datums {}'.format(random_sample_idx, sample_datum_names))"
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "### Visualize camera images"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": null,
97 | "metadata": {},
98 | "outputs": [],
99 | "source": [
100 | "# Concat images and visualize\n",
101 | "images = [cam['rgb'].resize((192,120), PIL.Image.BILINEAR) for cam in sample[1:]]\n",
102 | "images = np.concatenate(images, axis=1)\n",
103 | "display.display(PIL.Image.fromarray(images))"
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "metadata": {},
109 | "source": [
110 | "### Visualize corresponding depths"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": null,
116 | "metadata": {},
117 | "outputs": [],
118 | "source": [
119 | "# Visualize corresponding depths, if the depth has been projected into the camera images\n",
120 | "if 'depth' in sample[1].keys():\n",
121 | " # Load and resize depth images\n",
122 | " depths = [cv2.resize(cam['depth'], dsize=(192,120), interpolation=cv2.INTER_NEAREST) \\\n",
123 | " for cam in sample[1:]]\n",
124 | " # Convert to RGB for visualization\n",
125 | " depths = [plasma_color_map(d)[:,:,:3] for d in depths]\n",
126 | " depths = np.concatenate(depths, axis=1)\n",
127 | " display.display(PIL.Image.fromarray((depths*255).astype(np.uint8)))"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "### Visualize Lidar"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": null,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": [
143 | "# Note: this requires open3d\n",
144 | "import open3d as o3d\n",
145 | "\n",
146 | "# Get lidar point cloud from sample\n",
147 | "lidar_cloud = sample[0]['point_cloud']\n",
148 | "# Create open3d visualization objects\n",
149 | "o3d_colors = np.tile(np.array([0., 0., 0.]), (len(lidar_cloud), 1))\n",
150 | "o3d_cloud = o3d.geometry.PointCloud()\n",
151 | "o3d_cloud.points = o3d.utility.Vector3dVector(lidar_cloud)\n",
152 | "o3d_cloud.colors = o3d.utility.Vector3dVector(o3d_colors)\n",
153 | "# Visualize (Note: needs open3d, OpenGL and X server support)\n",
154 | "o3d.visualization.draw_geometries([o3d_cloud])"
155 | ]
156 | },
157 | {
158 | "cell_type": "markdown",
159 | "metadata": {},
160 | "source": [
161 | "## DDAD train with temporal context\n",
162 | "\n",
163 | "To also return temporally adjacent frames for each sample, use `forward_context` and `backward_context`."
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": null,
169 | "metadata": {},
170 | "outputs": [],
171 | "source": [
172 | "# Instantiate dataset with forward and backward context\n",
173 | "\n",
174 | "ddad_train_with_context = SynchronizedSceneDataset(\n",
175 | " DDAD_TRAIN_VAL_JSON_PATH,\n",
176 | " split='train',\n",
177 | " datum_names=('CAMERA_01',),\n",
178 | " generate_depth_from_datum='lidar',\n",
179 | " forward_context=1, \n",
180 | " backward_context=1\n",
181 | ")"
182 | ]
183 | },
184 | {
185 | "cell_type": "markdown",
186 | "metadata": {},
187 | "source": [
188 | "### Visualize front camera images"
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": null,
194 | "metadata": {},
195 | "outputs": [],
196 | "source": [
197 | "# Load random sample\n",
198 | "# Note that when forward_context or backward_context is used, the loader returns a list of samples\n",
199 | "samples = ddad_train_with_context[np.random.randint(len(ddad_train_with_context))]\n",
200 | "front_cam_images = []\n",
201 | "for sample in samples:\n",
202 | " front_cam_images.append(sample[0]['rgb'])\n",
203 | "# Resize images and visualize\n",
204 | "front_cam_images = [img.resize((192,120), PIL.Image.BILINEAR) for img in front_cam_images]\n",
205 | "front_cam_images = np.concatenate(front_cam_images, axis=1)\n",
206 | "display.display(PIL.Image.fromarray(front_cam_images))"
207 | ]
208 | },
209 | {
210 | "cell_type": "markdown",
211 | "metadata": {},
212 | "source": [
213 | "## DDAD Val split\n",
214 | "\n",
215 | "The validation set contains 50 scenes with a total of 3950 individual samples."
216 | ]
217 | },
218 | {
219 | "cell_type": "code",
220 | "execution_count": null,
221 | "metadata": {},
222 | "outputs": [],
223 | "source": [
224 | "# Load the val set\n",
225 | "ddad_val = SynchronizedSceneDataset(\n",
226 | " DDAD_TRAIN_VAL_JSON_PATH,\n",
227 | " split='val',\n",
228 | " datum_names=DATUMS,\n",
229 | " generate_depth_from_datum='lidar'\n",
230 | ")\n",
231 | "print('Loaded DDAD val split containing {} samples'.format(len(ddad_val)))"
232 | ]
233 | },
234 | {
235 | "cell_type": "markdown",
236 | "metadata": {},
237 | "source": [
238 | "### Load the panoptic segmentation labels from the val set\n",
239 | "\n",
240 | "50 of the DDAD validation samples have panoptic segmentation annotations for the front camera images. These annotations can be used for detailed, per-class evaluation."
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": null,
246 | "metadata": {
247 | "scrolled": false
248 | },
249 | "outputs": [],
250 | "source": [
251 | "ddad_val = SynchronizedSceneDataset(\n",
252 | " DDAD_TRAIN_VAL_JSON_PATH,\n",
253 | " split='val',\n",
254 | " datum_names=('CAMERA_01',),\n",
255 | " requested_annotations=('semantic_segmentation_2d', 'instance_segmentation_2d'),\n",
256 | " only_annotated_datums=True\n",
257 | ")\n",
258 | "print('Loaded annotated samples from DDAD val split. Total samples: {}.'.format(len(ddad_val)))"
259 | ]
260 | },
261 | {
262 | "cell_type": "markdown",
263 | "metadata": {},
264 | "source": [
265 | "### Visualize the semantic segmentation labels"
266 | ]
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": null,
271 | "metadata": {},
272 | "outputs": [],
273 | "source": [
274 | "# Load instance and semantic segmentation ontologies\n",
275 | "semseg_ontology = open_pbobject(ddad_val.scenes[0].ontology_files['semantic_segmentation_2d'], Ontology)\n",
276 | "instance_ontology = open_pbobject(ddad_val.scenes[0].ontology_files['instance_segmentation_2d'], Ontology)\n",
277 | "\n",
278 | "# Load random sample \n",
279 | "random_sample_idx = np.random.randint(len(ddad_val))\n",
280 | "sample = ddad_val[random_sample_idx]\n",
281 | "\n",
282 | "# Get the image and semantic segmentation annotation from the sample\n",
283 | "image = np.array(sample[0]['rgb'])\n",
284 | "semantic_segmentation_2d_annotation = sample[0]['semantic_segmentation_2d']\n",
285 | "sem_seg_image = visualize_semantic_segmentation_2d(\n",
286 | " semantic_segmentation_2d_annotation, semseg_ontology, image=image, debug=False\n",
287 | ")\n",
288 | "\n",
289 | "# Visualize\n",
290 | "image = cv2.resize(image, dsize=(320,240), interpolation=cv2.INTER_NEAREST)\n",
291 | "sem_seg_image = cv2.resize(sem_seg_image, dsize=(320,240), interpolation=cv2.INTER_NEAREST)\n",
292 | "display.display(PIL.Image.fromarray(np.concatenate([image, sem_seg_image], axis=1)))"
293 | ]
294 | },
295 | {
296 | "cell_type": "markdown",
297 | "metadata": {},
298 | "source": [
299 | "## DDAD Test split"
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": null,
305 | "metadata": {},
306 | "outputs": [],
307 | "source": [
308 | "ddad_test = SynchronizedSceneDataset(\n",
309 | " DDAD_TEST_JSON_PATH,\n",
310 | " split='test',\n",
311 | " datum_names=DATUMS\n",
312 | ")\n",
313 | "print('Loaded DDAD test split containing {} samples'.format(len(ddad_test)))"
314 | ]
315 | }
316 | ],
317 | "metadata": {
318 | "kernelspec": {
319 | "display_name": "Python 3",
320 | "language": "python",
321 | "name": "python3"
322 | },
323 | "language_info": {
324 | "codemirror_mode": {
325 | "name": "ipython",
326 | "version": 3
327 | },
328 | "file_extension": ".py",
329 | "mimetype": "text/x-python",
330 | "name": "python",
331 | "nbconvert_exporter": "python",
332 | "pygments_lexer": "ipython3",
333 | "version": "3.6.9"
334 | }
335 | },
336 | "nbformat": 4,
337 | "nbformat_minor": 4
338 | }
--------------------------------------------------------------------------------
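
As a supplement to the notebook above, here is a minimal standalone sketch of how the same `SynchronizedSceneDataset` loader could be used outside Jupyter, e.g. to sanity-check the projected depth maps. The dataset path, the choice of a single front camera, and the convention that zero-valued pixels mark locations without a LiDAR return are assumptions made for illustration; only the constructor arguments and per-datum fields (`rgb`, `depth`, `datum_name`) already demonstrated in the notebook are relied upon.

```python
# Minimal sketch (assumptions noted above): load the DDAD train split and print
# basic statistics of the LiDAR-projected depth map for the front camera.
import numpy as np

from dgp.datasets.synchronized_dataset import SynchronizedSceneDataset

# Assumed local path; adjust to wherever ddad_train_val was extracted.
DDAD_TRAIN_VAL_JSON_PATH = '/data/datasets/ddad_train_val/ddad.json'

dataset = SynchronizedSceneDataset(
    DDAD_TRAIN_VAL_JSON_PATH,
    split='train',
    datum_names=('lidar', 'CAMERA_01'),
    generate_depth_from_datum='lidar',
)

for idx in range(min(5, len(dataset))):
    sample = dataset[idx]
    # Only camera datums carry the projected 'depth' map, so select by key
    # rather than relying on datum ordering.
    cam = next(datum for datum in sample if 'depth' in datum)
    depth = cam['depth']            # HxW array of projected LiDAR depths (meters)
    valid = depth > 0               # assumption: zero means "no LiDAR return"
    if valid.any():
        print('sample {:4d} | rgb size {} | valid depth: {:.2%} | range {:.1f}-{:.1f} m'.format(
            idx, cam['rgb'].size, valid.mean(), depth[valid].min(), depth[valid].max()))
    else:
        print('sample {:4d} | no projected LiDAR returns for CAMERA_01'.format(idx))
```

Because the loader exposes `__len__` and integer indexing (as used throughout the notebook), a loop like this is also the natural place to wrap DDAD in a PyTorch-style data loader if batched training data is needed.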