├── LICENSE.md ├── README.md ├── evaluation ├── main.py └── semantic_evaluation.py ├── media └── figs │ ├── ann_viz_rgb.jpg │ ├── ddad_sensors.png │ ├── ddad_viz.gif │ ├── hq_viz_rgb.jpg │ ├── notebook.png │ ├── odaiba_viz_rgb.jpg │ ├── pano1.png │ ├── pano2.png │ ├── pano3.png │ └── tri-logo.png └── notebooks └── DDAD.ipynb /LICENSE.md: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Toyota Research Institute. All rights reserved. https://github.com/TRI-ML/DDAD 2 | 3 | This work is licensed under a 4 | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 5 | 6 | You should have received a copy of the license along with this 7 | work. If not, see . 8 | 9 | ======================================================================= 10 | 11 | Attribution-NonCommercial-ShareAlike 4.0 International 12 | 13 | ======================================================================= 14 | 15 | Creative Commons Corporation ("Creative Commons") is not a law firm and 16 | does not provide legal services or legal advice. Distribution of 17 | Creative Commons public licenses does not create a lawyer-client or 18 | other relationship. Creative Commons makes its licenses and related 19 | information available on an "as-is" basis. Creative Commons gives no 20 | warranties regarding its licenses, any material licensed under their 21 | terms and conditions, or any related information. Creative Commons 22 | disclaims all liability for damages resulting from their use to the 23 | fullest extent possible. 24 | 25 | Using Creative Commons Public Licenses 26 | 27 | Creative Commons public licenses provide a standard set of terms and 28 | conditions that creators and other rights holders may use to share 29 | original works of authorship and other material subject to copyright 30 | and certain other rights specified in the public license below. The 31 | following considerations are for informational purposes only, are not 32 | exhaustive, and do not form part of our licenses. 33 | 34 | Considerations for licensors: Our public licenses are 35 | intended for use by those authorized to give the public 36 | permission to use material in ways otherwise restricted by 37 | copyright and certain other rights. Our licenses are 38 | irrevocable. Licensors should read and understand the terms 39 | and conditions of the license they choose before applying it. 40 | Licensors should also secure all rights necessary before 41 | applying our licenses so that the public can reuse the 42 | material as expected. Licensors should clearly mark any 43 | material not subject to the license. This includes other CC- 44 | licensed material, or material used under an exception or 45 | limitation to copyright. More considerations for licensors: 46 | wiki.creativecommons.org/Considerations_for_licensors 47 | 48 | Considerations for the public: By using one of our public 49 | licenses, a licensor grants the public permission to use the 50 | licensed material under specified terms and conditions. If 51 | the licensor's permission is not necessary for any reason--for 52 | example, because of any applicable exception or limitation to 53 | copyright--then that use is not regulated by the license. Our 54 | licenses grant only permissions under copyright and certain 55 | other rights that a licensor has authority to grant. 
Use of 56 | the licensed material may still be restricted for other 57 | reasons, including because others have copyright or other 58 | rights in the material. A licensor may make special requests, 59 | such as asking that all changes be marked or described. 60 | Although not required by our licenses, you are encouraged to 61 | respect those requests where reasonable. More considerations 62 | for the public: 63 | wiki.creativecommons.org/Considerations_for_licensees 64 | 65 | ======================================================================= 66 | 67 | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International 68 | Public License 69 | 70 | By exercising the Licensed Rights (defined below), You accept and agree 71 | to be bound by the terms and conditions of this Creative Commons 72 | Attribution-NonCommercial-ShareAlike 4.0 International Public License 73 | ("Public License"). To the extent this Public License may be 74 | interpreted as a contract, You are granted the Licensed Rights in 75 | consideration of Your acceptance of these terms and conditions, and the 76 | Licensor grants You such rights in consideration of benefits the 77 | Licensor receives from making the Licensed Material available under 78 | these terms and conditions. 79 | 80 | 81 | Section 1 -- Definitions. 82 | 83 | a. Adapted Material means material subject to Copyright and Similar 84 | Rights that is derived from or based upon the Licensed Material 85 | and in which the Licensed Material is translated, altered, 86 | arranged, transformed, or otherwise modified in a manner requiring 87 | permission under the Copyright and Similar Rights held by the 88 | Licensor. For purposes of this Public License, where the Licensed 89 | Material is a musical work, performance, or sound recording, 90 | Adapted Material is always produced where the Licensed Material is 91 | synched in timed relation with a moving image. 92 | 93 | b. Adapter's License means the license You apply to Your Copyright 94 | and Similar Rights in Your contributions to Adapted Material in 95 | accordance with the terms and conditions of this Public License. 96 | 97 | c. BY-NC-SA Compatible License means a license listed at 98 | creativecommons.org/compatiblelicenses, approved by Creative 99 | Commons as essentially the equivalent of this Public License. 100 | 101 | d. Copyright and Similar Rights means copyright and/or similar rights 102 | closely related to copyright including, without limitation, 103 | performance, broadcast, sound recording, and Sui Generis Database 104 | Rights, without regard to how the rights are labeled or 105 | categorized. For purposes of this Public License, the rights 106 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 107 | Rights. 108 | 109 | e. Effective Technological Measures means those measures that, in the 110 | absence of proper authority, may not be circumvented under laws 111 | fulfilling obligations under Article 11 of the WIPO Copyright 112 | Treaty adopted on December 20, 1996, and/or similar international 113 | agreements. 114 | 115 | f. Exceptions and Limitations means fair use, fair dealing, and/or 116 | any other exception or limitation to Copyright and Similar Rights 117 | that applies to Your use of the Licensed Material. 118 | 119 | g. License Elements means the license attributes listed in the name 120 | of a Creative Commons Public License. The License Elements of this 121 | Public License are Attribution, NonCommercial, and ShareAlike. 122 | 123 | h. 
Licensed Material means the artistic or literary work, database, 124 | or other material to which the Licensor applied this Public 125 | License. 126 | 127 | i. Licensed Rights means the rights granted to You subject to the 128 | terms and conditions of this Public License, which are limited to 129 | all Copyright and Similar Rights that apply to Your use of the 130 | Licensed Material and that the Licensor has authority to license. 131 | 132 | j. Licensor means the individual(s) or entity(ies) granting rights 133 | under this Public License. 134 | 135 | k. NonCommercial means not primarily intended for or directed towards 136 | commercial advantage or monetary compensation. For purposes of 137 | this Public License, the exchange of the Licensed Material for 138 | other material subject to Copyright and Similar Rights by digital 139 | file-sharing or similar means is NonCommercial provided there is 140 | no payment of monetary compensation in connection with the 141 | exchange. 142 | 143 | l. Share means to provide material to the public by any means or 144 | process that requires permission under the Licensed Rights, such 145 | as reproduction, public display, public performance, distribution, 146 | dissemination, communication, or importation, and to make material 147 | available to the public including in ways that members of the 148 | public may access the material from a place and at a time 149 | individually chosen by them. 150 | 151 | m. Sui Generis Database Rights means rights other than copyright 152 | resulting from Directive 96/9/EC of the European Parliament and of 153 | the Council of 11 March 1996 on the legal protection of databases, 154 | as amended and/or succeeded, as well as other essentially 155 | equivalent rights anywhere in the world. 156 | 157 | n. You means the individual or entity exercising the Licensed Rights 158 | under this Public License. Your has a corresponding meaning. 159 | 160 | 161 | Section 2 -- Scope. 162 | 163 | a. License grant. 164 | 165 | 1. Subject to the terms and conditions of this Public License, 166 | the Licensor hereby grants You a worldwide, royalty-free, 167 | non-sublicensable, non-exclusive, irrevocable license to 168 | exercise the Licensed Rights in the Licensed Material to: 169 | 170 | a. reproduce and Share the Licensed Material, in whole or 171 | in part, for NonCommercial purposes only; and 172 | 173 | b. produce, reproduce, and Share Adapted Material for 174 | NonCommercial purposes only. 175 | 176 | 2. Exceptions and Limitations. For the avoidance of doubt, where 177 | Exceptions and Limitations apply to Your use, this Public 178 | License does not apply, and You do not need to comply with 179 | its terms and conditions. 180 | 181 | 3. Term. The term of this Public License is specified in Section 182 | 6(a). 183 | 184 | 4. Media and formats; technical modifications allowed. The 185 | Licensor authorizes You to exercise the Licensed Rights in 186 | all media and formats whether now known or hereafter created, 187 | and to make technical modifications necessary to do so. The 188 | Licensor waives and/or agrees not to assert any right or 189 | authority to forbid You from making technical modifications 190 | necessary to exercise the Licensed Rights, including 191 | technical modifications necessary to circumvent Effective 192 | Technological Measures. For purposes of this Public License, 193 | simply making modifications authorized by this Section 2(a) 194 | (4) never produces Adapted Material. 195 | 196 | 5. 
Downstream recipients. 197 | 198 | a. Offer from the Licensor -- Licensed Material. Every 199 | recipient of the Licensed Material automatically 200 | receives an offer from the Licensor to exercise the 201 | Licensed Rights under the terms and conditions of this 202 | Public License. 203 | 204 | b. Additional offer from the Licensor -- Adapted Material. 205 | Every recipient of Adapted Material from You 206 | automatically receives an offer from the Licensor to 207 | exercise the Licensed Rights in the Adapted Material 208 | under the conditions of the Adapter's License You apply. 209 | 210 | c. No downstream restrictions. You may not offer or impose 211 | any additional or different terms or conditions on, or 212 | apply any Effective Technological Measures to, the 213 | Licensed Material if doing so restricts exercise of the 214 | Licensed Rights by any recipient of the Licensed 215 | Material. 216 | 217 | 6. No endorsement. Nothing in this Public License constitutes or 218 | may be construed as permission to assert or imply that You 219 | are, or that Your use of the Licensed Material is, connected 220 | with, or sponsored, endorsed, or granted official status by, 221 | the Licensor or others designated to receive attribution as 222 | provided in Section 3(a)(1)(A)(i). 223 | 224 | b. Other rights. 225 | 226 | 1. Moral rights, such as the right of integrity, are not 227 | licensed under this Public License, nor are publicity, 228 | privacy, and/or other similar personality rights; however, to 229 | the extent possible, the Licensor waives and/or agrees not to 230 | assert any such rights held by the Licensor to the limited 231 | extent necessary to allow You to exercise the Licensed 232 | Rights, but not otherwise. 233 | 234 | 2. Patent and trademark rights are not licensed under this 235 | Public License. 236 | 237 | 3. To the extent possible, the Licensor waives any right to 238 | collect royalties from You for the exercise of the Licensed 239 | Rights, whether directly or through a collecting society 240 | under any voluntary or waivable statutory or compulsory 241 | licensing scheme. In all other cases the Licensor expressly 242 | reserves any right to collect such royalties, including when 243 | the Licensed Material is used other than for NonCommercial 244 | purposes. 245 | 246 | 247 | Section 3 -- License Conditions. 248 | 249 | Your exercise of the Licensed Rights is expressly made subject to the 250 | following conditions. 251 | 252 | a. Attribution. 253 | 254 | 1. If You Share the Licensed Material (including in modified 255 | form), You must: 256 | 257 | a. retain the following if it is supplied by the Licensor 258 | with the Licensed Material: 259 | 260 | i. identification of the creator(s) of the Licensed 261 | Material and any others designated to receive 262 | attribution, in any reasonable manner requested by 263 | the Licensor (including by pseudonym if 264 | designated); 265 | 266 | ii. a copyright notice; 267 | 268 | iii. a notice that refers to this Public License; 269 | 270 | iv. a notice that refers to the disclaimer of 271 | warranties; 272 | 273 | v. a URI or hyperlink to the Licensed Material to the 274 | extent reasonably practicable; 275 | 276 | b. indicate if You modified the Licensed Material and 277 | retain an indication of any previous modifications; and 278 | 279 | c. indicate the Licensed Material is licensed under this 280 | Public License, and include the text of, or the URI or 281 | hyperlink to, this Public License. 282 | 283 | 2. 
You may satisfy the conditions in Section 3(a)(1) in any 284 | reasonable manner based on the medium, means, and context in 285 | which You Share the Licensed Material. For example, it may be 286 | reasonable to satisfy the conditions by providing a URI or 287 | hyperlink to a resource that includes the required 288 | information. 289 | 3. If requested by the Licensor, You must remove any of the 290 | information required by Section 3(a)(1)(A) to the extent 291 | reasonably practicable. 292 | 293 | b. ShareAlike. 294 | 295 | In addition to the conditions in Section 3(a), if You Share 296 | Adapted Material You produce, the following conditions also apply. 297 | 298 | 1. The Adapter's License You apply must be a Creative Commons 299 | license with the same License Elements, this version or 300 | later, or a BY-NC-SA Compatible License. 301 | 302 | 2. You must include the text of, or the URI or hyperlink to, the 303 | Adapter's License You apply. You may satisfy this condition 304 | in any reasonable manner based on the medium, means, and 305 | context in which You Share Adapted Material. 306 | 307 | 3. You may not offer or impose any additional or different terms 308 | or conditions on, or apply any Effective Technological 309 | Measures to, Adapted Material that restrict exercise of the 310 | rights granted under the Adapter's License You apply. 311 | 312 | 313 | Section 4 -- Sui Generis Database Rights. 314 | 315 | Where the Licensed Rights include Sui Generis Database Rights that 316 | apply to Your use of the Licensed Material: 317 | 318 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 319 | to extract, reuse, reproduce, and Share all or a substantial 320 | portion of the contents of the database for NonCommercial purposes 321 | only; 322 | 323 | b. if You include all or a substantial portion of the database 324 | contents in a database in which You have Sui Generis Database 325 | Rights, then the database in which You have Sui Generis Database 326 | Rights (but not its individual contents) is Adapted Material, 327 | including for purposes of Section 3(b); and 328 | 329 | c. You must comply with the conditions in Section 3(a) if You Share 330 | all or a substantial portion of the contents of the database. 331 | 332 | For the avoidance of doubt, this Section 4 supplements and does not 333 | replace Your obligations under this Public License where the Licensed 334 | Rights include other Copyright and Similar Rights. 335 | 336 | 337 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 338 | 339 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 340 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 341 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 342 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 343 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 344 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 345 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 346 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 347 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 348 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 349 | 350 | b. 
TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 351 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 352 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 353 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 354 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 355 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 356 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 357 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 358 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 359 | 360 | c. The disclaimer of warranties and limitation of liability provided 361 | above shall be interpreted in a manner that, to the extent 362 | possible, most closely approximates an absolute disclaimer and 363 | waiver of all liability. 364 | 365 | 366 | Section 6 -- Term and Termination. 367 | 368 | a. This Public License applies for the term of the Copyright and 369 | Similar Rights licensed here. However, if You fail to comply with 370 | this Public License, then Your rights under this Public License 371 | terminate automatically. 372 | 373 | b. Where Your right to use the Licensed Material has terminated under 374 | Section 6(a), it reinstates: 375 | 376 | 1. automatically as of the date the violation is cured, provided 377 | it is cured within 30 days of Your discovery of the 378 | violation; or 379 | 380 | 2. upon express reinstatement by the Licensor. 381 | 382 | For the avoidance of doubt, this Section 6(b) does not affect any 383 | right the Licensor may have to seek remedies for Your violations 384 | of this Public License. 385 | 386 | c. For the avoidance of doubt, the Licensor may also offer the 387 | Licensed Material under separate terms or conditions or stop 388 | distributing the Licensed Material at any time; however, doing so 389 | will not terminate this Public License. 390 | 391 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 392 | License. 393 | 394 | 395 | Section 7 -- Other Terms and Conditions. 396 | 397 | a. The Licensor shall not be bound by any additional or different 398 | terms or conditions communicated by You unless expressly agreed. 399 | 400 | b. Any arrangements, understandings, or agreements regarding the 401 | Licensed Material not stated herein are separate from and 402 | independent of the terms and conditions of this Public License. 403 | 404 | 405 | Section 8 -- Interpretation. 406 | 407 | a. For the avoidance of doubt, this Public License does not, and 408 | shall not be interpreted to, reduce, limit, restrict, or impose 409 | conditions on any use of the Licensed Material that could lawfully 410 | be made without permission under this Public License. 411 | 412 | b. To the extent possible, if any provision of this Public License is 413 | deemed unenforceable, it shall be automatically reformed to the 414 | minimum extent necessary to make it enforceable. If the provision 415 | cannot be reformed, it shall be severed from this Public License 416 | without affecting the enforceability of the remaining terms and 417 | conditions. 418 | 419 | c. No term or condition of this Public License will be waived and no 420 | failure to comply consented to unless expressly agreed to by the 421 | Licensor. 422 | 423 | d. 
Nothing in this Public License constitutes or may be interpreted 424 | as a limitation upon, or waiver of, any privileges and immunities 425 | that apply to the Licensor or You, including from the legal 426 | processes of any jurisdiction or authority. 427 | 428 | ======================================================================= 429 | 430 | Creative Commons is not a party to its public 431 | licenses. Notwithstanding, Creative Commons may elect to apply one of 432 | its public licenses to material it publishes and in those instances 433 | will be considered the “Licensor.” The text of the Creative Commons 434 | public licenses is dedicated to the public domain under the CC0 Public 435 | Domain Dedication. Except for the limited purpose of indicating that 436 | material is shared under a Creative Commons public license or as 437 | otherwise permitted by the Creative Commons policies published at 438 | creativecommons.org/policies, Creative Commons does not authorize the 439 | use of the trademark "Creative Commons" or any other trademark or logo 440 | of Creative Commons without its prior written consent including, 441 | without limitation, in connection with any unauthorized modifications 442 | to any of its public licenses or any other arrangements, 443 | understandings, or agreements concerning use of licensed material. For 444 | the avoidance of doubt, this paragraph does not form part of the 445 | public licenses. 446 | 447 | Creative Commons may be contacted at creativecommons.org. 448 | 449 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DDAD - Dense Depth for Autonomous Driving 2 | 3 | 4 | 5 | 6 | 7 | - [DDAD depth challenge](#ddad-depth-challenge) 8 | - [How to Use](#how-to-use) 9 | - [Dataset details](#dataset-details) 10 | - [Dataset stats](#dataset-stats) 11 | - [Sensor placement](#sensor-placement) 12 | - [Evaluation metrics](#evaluation-metrics) 13 | - [IPython notebook](#ipython-notebook) 14 | - [References](#references) 15 | - [Privacy](#privacy) 16 | - [License](#license) 17 | 18 | DDAD is a new autonomous driving benchmark from TRI (Toyota Research Institute) for long range (up to 250m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth (across a full 360 degree field of view) generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting. DDAD contains scenes from urban settings in the United States (San Francisco, Bay Area, Cambridge, Detroit, Ann Arbor) and Japan (Tokyo, Odaiba). 19 | 20 | ![](media/figs/ddad_viz.gif) 21 | 22 | ## DDAD depth challenge 23 | 24 | The [DDAD depth challenge](https://eval.ai/web/challenges/challenge-page/902/overview) consists of two tracks: self-supervised and semi-supervised monocular depth estimation. We will evaluate all methods against the ground truth Lidar depth, and we will also compute and report depth metric per semantic class. The winner will be chosen based on the abs_rel metric. The winners of the challenge will receive cash prizes and will present their work at the CVPR 2021 Workshop [“Frontiers of Monocular 3D Perception”](https://sites.google.com/view/mono3d-workshop). Please check below for details on the [DDAD dataset](#dataset-details), [notebook](ipython-notebook) for loading the data and a description of the [evaluation metrics](#evaluation-metrics). 
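For reference, `abs_rel` is the standard mean absolute relative depth error. A minimal NumPy sketch of the ranking metric is shown below; the official implementation in [`evaluation/semantic_evaluation.py`](evaluation/semantic_evaluation.py) additionally applies per-class masks, a 200m evaluation range, and (for the self-supervised track) ground-truth median scaling.

```python
import numpy as np

def abs_rel(gt: np.ndarray, pred: np.ndarray, max_depth: float = 200.0) -> float:
    # Mean absolute relative error over valid ground-truth depth pixels.
    valid = (gt > 0) & (gt < max_depth)
    return float(np.mean(np.abs(gt[valid] - pred[valid]) / gt[valid]))
```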
25 | 26 | ## How to Use 27 | 28 | The data can be downloaded here: [train+val](https://tri-ml-public.s3.amazonaws.com/github/DDAD/datasets/DDAD.tar) (257 GB, md5 checksum: `c0da97967f76da80f86d6f97d0d98904`) and [test](https://tri-ml-public.s3.amazonaws.com/github/DDAD/datasets/DDAD_test.tar) (see the [Test split](#test-split) section below). To load the dataset, please use the [TRI Dataset Governance Policy (DGP) codebase](https://github.com/TRI-ML/dgp). The following snippet will instantiate the dataset (replace `<path_to_dataset>` with the folder where DDAD was extracted): 29 | 30 | ```python 31 | from dgp.datasets import SynchronizedSceneDataset 32 | 33 | # Load synchronized pairs of camera and lidar frames. 34 | dataset = SynchronizedSceneDataset( 35 | '<path_to_dataset>/ddad.json', 36 | datum_names=('lidar', 'CAMERA_01', 'CAMERA_05'), 37 | generate_depth_from_datum='lidar', 38 | split='train' 39 | ) 40 | 41 | # Iterate through the dataset. 42 | for sample in dataset: 43 | # Each sample contains a list of the requested datums. 44 | lidar, camera_01, camera_05 = sample[0:3] 45 | point_cloud = lidar['point_cloud'] # Nx3 numpy.ndarray 46 | image_01 = camera_01['rgb'] # PIL.Image 47 | depth_01 = camera_01['depth'] # (H,W) numpy.ndarray, generated from 'lidar' 48 | ``` 49 | 50 | The [DGP](https://github.com/TRI-ML/dgp) codebase provides a number of functions for loading one or multiple camera images, projecting the lidar point cloud into the camera images, and working with intrinsics and extrinsics. Additionally, please refer to the [Packnet-SfM](https://github.com/TRI-ML/packnet-sfm) codebase (in PyTorch) for details on how to integrate and use DDAD for depth estimation training/inference/evaluation, as well as state-of-the-art pretrained models. 51 | 52 | ## Dataset details 53 | 54 | DDAD uses high-resolution, long-range [Luminar-H2](https://www.luminartech.com/technology) LiDAR sensors to generate the ground-truth pointclouds, with a maximum range of 250m and sub-1cm range precision. It also contains six calibrated cameras, time-synchronized at 10 Hz, that together provide full 360 degree coverage around the vehicle. The six cameras (datum names: `camera_01`, `camera_05`, `camera_06`, `camera_07`, `camera_08` and `camera_09`) are 2.4MP (1936 x 1216), global-shutter, and oriented at 60 degree intervals; the camera intrinsics can be accessed with `datum['intrinsics']`. The cameras are synchronized with the 10 Hz scans of the Luminar-H2 sensors, which are oriented at 90 degree intervals. The data from the Luminar sensors is aggregated into a single 360 degree point cloud covering the scene (datum name: `lidar`). Each sensor has associated extrinsics mapping it to a common vehicle frame of reference (`datum['extrinsics']`). 55 | 56 | The training and validation scenes are 5 or 10 seconds long and consist of 50 or 100 samples with corresponding Luminar-H2 pointclouds and six image frames, including intrinsic and extrinsic calibration. The training set contains 150 scenes with a total of 12650 individual samples (75900 RGB images), and the validation set contains 50 scenes with a total of 3950 samples (23700 RGB images). 57 | 58 |
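For completeness, here is a minimal sketch of reading the calibration entries described above, using the `dataset` instantiated in the [How to Use](#how-to-use) snippet. The dictionary keys (`datum_name`, `intrinsics`, `extrinsics`) follow this section; the exact value types returned for intrinsics and extrinsics depend on the DGP version.

```python
# Sketch only: inspect the calibration of every datum in the first sample.
sample = dataset[0]
for datum in sample:
    name = datum['datum_name']            # e.g. 'lidar' or 'CAMERA_01'
    extrinsics = datum['extrinsics']      # sensor-to-vehicle frame of reference
    intrinsics = datum.get('intrinsics')  # camera intrinsics (cameras only)
    print(name, extrinsics, intrinsics is not None)
```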


64 | 65 | 66 | 67 | 68 | ## Dataset stats 69 | 70 | ### Training split 71 | 72 | | Location | Num Scenes (50 frames) | Num Scenes (100 frames) | Total frames | 73 | | ------------- |:-------------:|:-------------:|:-------------:| 74 | | SF | 0 | 19 | 1900 | 75 | | ANN | 23 | 53 | 6450 | 76 | | DET | 8 | 0 | 400 | 77 | | Japan | 16 | 31 | 3900 | 78 | 79 | Total: `150 scenes` and `12650 frames`. 80 | 81 | ### Validation split 82 | 83 | | Location | Num Scenes (50 frames) | Num Scenes (100 frames) | Total frames | 84 | | ------------- |:-------------:|:-------------:|:-------------:| 85 | | SF | 1 | 10 | 1050 | 86 | | ANN | 11 | 14 | 1950 | 87 | | Japan | 9 | 5 | 950 | 88 | 89 | Total: `50 scenes` and `3950 frames`. 90 | 91 | USA locations: ANN - Ann Arbor, MI; SF - San Francisco Bay Area, CA; DET - Detroit, MI; CAM - Cambridge, MA. Japan locations: Tokyo and Odaiba. 92 | 93 | ### Test split 94 | 95 | The test split consists of 3080 images with associated intrinsic calibration. The data can be downloaded from [here](https://tri-ml-public.s3.amazonaws.com/github/DDAD/datasets/DDAD_test.tar). 200 images from the test split have associated panoptic labels, similar to the DDAD validation split. The ground-truth depth and panoptic labels will not be made public. To evaluate your method on the DDAD test split, please submit your results to the [DDAD depth challenge](https://eval.ai/web/challenges/challenge-page/902/overview) as a single zip file following the same file name convention as the test split (i.e. 000000.png ... 003079.png). Each entry in the zip file should correspond to the DDAD test split image with the same name, and it should be a 16-bit single-channel PNG image. Each prediction can be either at full image resolution or downsampled. If the resolution of the predicted depth differs from that of the input image, the evaluation script will upsample the predicted depth to the input image resolution using nearest-neighbor interpolation. 96 | 97 | ## Sensor placement 98 | 99 | The figure below shows the placement of the DDAD LiDARs and cameras. Please note that both the LiDAR and camera sensors are positioned so as to provide 360 degree coverage around the vehicle. The data from all sensors is time-synchronized and reported at a frequency of 10 Hz. The data from the Luminar sensors is reported as a single point cloud in the vehicle frame of reference, with origin on the ground below the center of the vehicle rear axle, as shown below. For instructions on visualizing the camera images and the point clouds, please refer to this [IPython notebook](notebooks/DDAD.ipynb). 100 | 101 | ![](media/figs/ddad_sensors.png) 102 | 103 | ## Evaluation metrics 104 | 105 | Please refer to the [Packnet-SfM](https://github.com/TRI-ML/packnet-sfm) codebase for instructions on how to compute detailed depth evaluation metrics. 106 | 107 | We also provide an evaluation script compatible with our [Eval.AI challenge](https://eval.ai/web/challenges/challenge-page/902/overview) that can be used to test your submission on the front camera images of the DDAD validation split. Ground-truth depth maps for evaluation (obtained by iterating over the validation dataset in order) can be found [here](https://tri-ml-public.s3.amazonaws.com/github/DDAD/challenge/gt_val.zip), and an example submission file can be found [here](https://tri-ml-public.s3.amazonaws.com/github/DDAD/challenge/pred_val_sup.zip).
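Predictions are expected in the same encoding the evaluation script uses when loading depth maps: 16-bit single-channel PNGs storing depth in meters multiplied by 256 (see `load_depth` in `evaluation/semantic_evaluation.py`, which divides by 256 on load). A minimal sketch for writing one prediction file, assuming `depth` is a NumPy array of depths in meters:

```python
import cv2
import numpy as np

def save_depth_png(path: str, depth: np.ndarray) -> None:
    # Depth in meters -> 16-bit PNG scaled by 256, matching the evaluation loader.
    cv2.imwrite(path, np.clip(depth * 256.0, 0, 65535).astype(np.uint16))
```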
To evaluate, you can run: 108 | 109 | ``` 110 | cd evaluation 111 | python3 main.py gt_val.zip pred_val_sup.zip semi 112 | ``` 113 | 114 | ## IPython notebook 115 | 116 | The associated [IPython notebook](notebooks/DDAD.ipynb) provides a detailed description of how to instantiate the dataset with various options, including loading frames with context, visualizing RGB and depth images for the various cameras, and displaying the lidar point cloud. 117 | 118 | [![](media/figs/notebook.png)](notebooks/DDAD.ipynb) 119 | 120 | ## References 121 | 122 | Please use the following citation when referencing DDAD: 123 | 124 | #### 3D Packing for Self-Supervised Monocular Depth Estimation (CVPR 2020 oral) 125 | *Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos and Adrien Gaidon*, [**[paper]**](https://arxiv.org/abs/1905.02693), [**[video]**](https://www.youtube.com/watch?v=b62iDkLgGSI) 126 | ``` 127 | @inproceedings{packnet, 128 | author = {Vitor Guizilini and Rares Ambrus and Sudeep Pillai and Allan Raventos and Adrien Gaidon}, 129 | title = {3D Packing for Self-Supervised Monocular Depth Estimation}, 130 | booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 131 | primaryClass = {cs.CV}, 132 | year = {2020}, 133 | } 134 | ``` 135 | 136 | 137 | ## Privacy 138 | 139 | To ensure privacy, the DDAD dataset has been anonymized (license plate and face blurring) using state-of-the-art object detectors. 140 | 141 | 142 | ## License 143 | 144 | Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 145 | -------------------------------------------------------------------------------- /evaluation/main.py: -------------------------------------------------------------------------------- 1 | # Copyright 2021 Toyota Research Institute. All rights reserved. 2 | # 3 | # Validation groundtruth: 4 | # https://tri-ml-public.s3.amazonaws.com/github/DDAD/challenge/gt_val.zip 5 | # 6 | # Example of validation predictions (semi-supervised): 7 | # https://tri-ml-public.s3.amazonaws.com/github/DDAD/challenge/pred_val_sup.zip 8 | # 9 | # Predictions are stored as .png files in the same order as provided by the corresponding split (validation or test) 10 | # For more information, please check our depth estimation repository: https://github.com/tri-ml/packnet-sfm 11 | # 12 | # How to run: 13 | # python3 main.py gt_val.zip pred_val_sup.zip semi 14 | 15 | 16 | import sys 17 | import shutil 18 | 19 | from argparse import Namespace 20 | from zipfile import ZipFile 21 | 22 | from semantic_evaluation import main as SemanticEval 23 | 24 | 25 | def evaluate(gt_zip, pred_zip, phase): 26 | 27 | assert phase in ['semi', 'self'], 'Invalid phase name' 28 | 29 | use_gt_scale = phase == 'self' 30 | 31 | gt_folder = 'data/gt' 32 | print('gt_zip:', gt_zip) 33 | print('gt_folder:', gt_folder) 34 | with ZipFile(gt_zip, 'r') as zip: 35 | shutil.rmtree(gt_folder, ignore_errors=True) 36 | zip.extractall(path=gt_folder) 37 | pred_folder = 'data/pred' 38 | print('pred_zip:', pred_zip) 39 | print('pred_folder:', pred_folder) 40 | with ZipFile(pred_zip, 'r') as zip: 41 | shutil.rmtree(pred_folder, ignore_errors=True) 42 | zip.extractall(path=pred_folder) 43 | 44 | ranges = [200] 45 | metric = 'abs_rel' 46 | 47 | classes = [ 48 | "All", 49 | "Road", 50 | "Sidewalk", 51 | "Wall", 52 | "Fence", 53 | "Building", 54 | "Pole", 55 | "T.Light", 56 | "T.Sign", 57 | "Vegetation", 58 | "Terrain", 59 | "Person", 60 | "Rider", 61 | "Car", 62 | "Truck", 63 | "Bus", 64 | "Bicycle", 65 | ] 66 | 67 | args = Namespace(**{ 68 | 'gt_folder': gt_folder, 69 | 'pred_folder': pred_folder, 70 | 'ranges': ranges, 'classes': classes, 71 | 'metric': metric, 72 | 'output_folder': None, 73 | 'min_num_valid_pixels': 1, 74 | 'use_gt_scale': use_gt_scale, 75 | }) 76 | 77 | dict_output = SemanticEval(args) 78 | print(dict_output) 79 | 80 | if __name__ == "__main__": 81 | gt_zip = sys.argv[1] # Groundtruth .zip folder 82 | pred_zip = sys.argv[2] # Predicted .zip folder 83 | phase = sys.argv[3] # Which phase will be used ('semi' or 'self') 84 | evaluate(gt_zip, pred_zip, phase) 85 | 86 | -------------------------------------------------------------------------------- /evaluation/semantic_evaluation.py: -------------------------------------------------------------------------------- 1 | # Copyright 2021 Toyota Research Institute. All rights reserved. 
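#
# Per-class / per-range depth evaluation used by main.py and the Eval.AI challenge:
# - loads predicted depth maps ('*.png') and ground-truth depth maps ('*_gt.png') as
#   16-bit PNGs scaled by 256 (see load_depth), resizing predictions to the
#   ground-truth resolution with nearest-neighbor interpolation;
# - optionally masks each metric by semantic class using '*_sem.png' labels, converted
#   from the DDAD ontology to Cityscapes IDs via ddad_to_cityscapes;
# - reports abs_rel, sqr_rel, rmse, rmse_log, silog, a1, a2, a3 per depth range and
#   per class, with optional ground-truth median scaling (--use_gt_scale).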
2 | 3 | import argparse 4 | import os 5 | from argparse import Namespace 6 | from collections import OrderedDict 7 | from glob import glob 8 | 9 | import cv2 10 | import matplotlib.pyplot as plt 11 | import numpy as np 12 | import torch 13 | from tqdm import tqdm 14 | 15 | ddad_to_cityscapes = { 16 | # ROAD 17 | 7: 7, # Crosswalk 18 | 10: 7, # LaneMarking 19 | 11: 7, # LimitLine 20 | 13: 7, # OtherDriveableSurface 21 | 21: 7, # Road 22 | 24: 7, # RoadMarking 23 | 27: 7, # TemporaryConstructionObject 24 | # SIDEWALK 25 | 25: 8, # SideWalk 26 | 23: 8, # RoadBoundary (Curb) 27 | 14: 8, # OtherFixedStructure 28 | 15: 8, # OtherMovable 29 | # WALL 30 | 16: 12, # Overpass/Bridge/Tunnel 31 | 22: 12, # RoadBarriers 32 | # FENCE 33 | 8: 13, # Fence 34 | # BUILDING 35 | 2: 11, # Building 36 | # POLE 37 | 9: 17, # HorizontalPole 38 | 35: 17, # VerticalPole 39 | # TRAFFIC LIGHT 40 | 30: 19, # TrafficLight 41 | # TRAFFIC SIGN 42 | 31: 20, # TrafficSign 43 | # VEGETATION 44 | 34: 21, # Vegetation 45 | # TERRAIN 46 | 28: 22, # Terrain 47 | # SKY 48 | 26: 23, # Sky 49 | # PERSON 50 | 18: 24, # Pedestrian 51 | # RIDER 52 | 20: 25, # Rider 53 | # CAR 54 | 4: 26, # Car 55 | # TRUCK 56 | 33: 27, # Truck 57 | 5: 27, # Caravan/RV 58 | 6: 27, # ConstructionVehicle 59 | # BUS 60 | 3: 28, # Bus 61 | # TRAIN 62 | 32: 31, # Train 63 | # MOTORCYCLE 64 | 12: 32, # Motorcycle 65 | # BICYCLE 66 | 1: 33, # Bicycle 67 | # IGNORE 68 | 0: 255, # Animal 69 | 17: 255, # OwnCar (EgoCar) 70 | 19: 255, # Railway 71 | 29: 255, # TowedObject 72 | 36: 255, # WheeledSlow 73 | 37: 255, # Void 74 | } 75 | 76 | map_classes = { 77 | "Road": 7, 78 | "Sidewalk": 8, 79 | "Wall": 12, 80 | "Fence": 13, 81 | "Building": 11, 82 | "Pole": 17, 83 | "T.Light": 19, 84 | "T.Sign": 20, 85 | "Vegetation": 21, 86 | "Terrain": 22, 87 | "Sky": 23, 88 | "Person": 24, 89 | "Rider": 25, 90 | "Car": 26, 91 | "Truck": 27, 92 | "Bus": 28, 93 | "Train": 31, 94 | "Motorcycle": 32, 95 | "Bicycle": 33, 96 | "Ignore": 255, 97 | } 98 | 99 | 100 | def convert_ontology(semantic_id, ontology_convert): 101 | """Convert from one ontology to another""" 102 | if ontology_convert is None: 103 | return semantic_id 104 | else: 105 | semantic_id_convert = semantic_id.clone() 106 | for key, val in ontology_convert.items(): 107 | semantic_id_convert[semantic_id == key] = val 108 | return semantic_id_convert 109 | 110 | 111 | def parse_args(): 112 | """Parse arguments for benchmark script""" 113 | parser = argparse.ArgumentParser(description='PackNet-SfM benchmark script') 114 | parser.add_argument('--gt_folder', type=str, 115 | help='Folder containing predicted depth maps (.npz with key "depth")') 116 | parser.add_argument('--pred_folder', type=str, 117 | help='Folder containing predicted depth maps (.npz with key "depth")') 118 | parser.add_argument('--output_folder', type=str, 119 | help='Output folder where information will be stored') 120 | parser.add_argument('--use_gt_scale', action='store_true', 121 | help='Use ground-truth median scaling on predicted depth maps') 122 | parser.add_argument('--ranges', type=float, nargs='+', default=[200], 123 | help='Depth ranges to consider during evaluation') 124 | parser.add_argument('--classes', type=str, nargs='+', default=['All', 'Car', 'Pedestrian'], 125 | help='Semantic classes to consider during evaluation') 126 | parser.add_argument('--metric', type=str, default='rmse', choices=['abs_rel', 'rmse', 'silog', 'a1'], 127 | help='Which metric will be used for evaluation') 128 | parser.add_argument('--min_num_valid_pixels', type=int, 
default=1, 129 | help='Minimum number of valid pixels to consider') 130 | args = parser.parse_args() 131 | return args 132 | 133 | 134 | def create_summary_table(ranges, classes, matrix, folder, metric): 135 | 136 | # Prepare variables 137 | title = "Semantic/Range Depth Evaluation (%s) -- {}" % metric.upper() 138 | ranges = ['{}m'.format(r) for r in ranges] 139 | result = matrix.mean().round(decimals=3) 140 | matrix = matrix.round(decimals=2) 141 | 142 | # Create figure and axes 143 | fig, ax = plt.subplots() 144 | ax.imshow(matrix) 145 | 146 | # Show ticks 147 | ax.set_xticks(np.arange(len(ranges))) 148 | ax.set_yticks(np.arange(len(classes))) 149 | 150 | # Label ticks 151 | ax.set_xticklabels(ranges) 152 | ax.set_yticklabels(classes) 153 | 154 | # Rotate tick labels and set alignment 155 | plt.setp(ax.get_xticklabels(), rotation=45, ha="right", 156 | rotation_mode="anchor") 157 | 158 | # Loop over data to create annotations. 159 | for i in range(len(ranges)): 160 | for j in range(len(classes)): 161 | ax.text(i, j, matrix[j, i], 162 | ha="center", va="center", color="w") 163 | 164 | # Plot figure 165 | ax.set_title(title.format(result)) 166 | fig.tight_layout() 167 | 168 | # Save and show 169 | plt.savefig('{}/summary_table.png'.format(folder)) 170 | plt.close() 171 | 172 | 173 | def create_bar_plot(key_range, key_class, matrix, name, idx, folder): 174 | 175 | # Prepare title and start plot 176 | title = 'Per-frame depth evaluation of **{} at {}m**'.format(key_class, key_range) 177 | fig, ax = plt.subplots(figsize=(10, 8)) 178 | 179 | # Get x ticks and values 180 | x_ticks = [int(m[0]) for m in matrix] 181 | x_values = range(len(matrix)) 182 | # Get y values 183 | y_values = [m[2 + idx] for m in matrix] 184 | 185 | # Prepare titles, ticks and labels 186 | ax.set_title(title) 187 | ax.set_xticks(x_values) 188 | ax.set_xticklabels(x_ticks) 189 | ax.set_xlabel('Image frame') 190 | ax.set_ylabel('{}'.format(name.upper())) 191 | 192 | # Rotate tick labels and set alignment 193 | plt.setp(ax.get_xticklabels(), rotation=70, ha="right", 194 | rotation_mode="anchor") 195 | 196 | # Show and save 197 | ax.bar(x_values, y_values) 198 | plt.savefig('{}/{}-{}m-{}.png'.format(folder, key_class, key_range, name)) 199 | 200 | 201 | def load_sem_ins(file): 202 | """Load GT semantic and instance maps""" 203 | sem = file.replace('_gt', '_sem') 204 | if os.path.isfile(sem): 205 | ins = file.replace('_gt', '_ins') 206 | sem = cv2.imread(sem, cv2.IMREAD_ANYDEPTH) / 256. 207 | ins = cv2.imread(ins, cv2.IMREAD_ANYDEPTH) / 256. 208 | else: 209 | sem = ins = None 210 | return sem, ins 211 | 212 | 213 | def load_depth(depth): 214 | """Load a depth map""" 215 | depth = cv2.imread(depth, cv2.IMREAD_ANYDEPTH) / 256. 216 | depth = torch.tensor(depth).unsqueeze(0).unsqueeze(0) 217 | return depth 218 | 219 | 220 | def compute_depth_metrics(config, gt, pred, use_gt_scale=True, 221 | extra_mask=None, min_num_valid_pixels=1): 222 | """ 223 | Compute depth metrics from predicted and ground-truth depth maps 224 | 225 | Parameters 226 | ---------- 227 | config : CfgNode 228 | Metrics parameters 229 | gt : torch.Tensor 230 | Ground-truth depth map [B,1,H,W] 231 | pred : torch.Tensor 232 | Predicted depth map [B,1,H,W] 233 | use_gt_scale : bool 234 | True if ground-truth median-scaling is to be used 235 | extra_mask : torch.Tensor 236 | Extra mask to be used for calculation (e.g. 
semantic mask) 237 | min_num_valid_pixels : int 238 | Minimum number of valid pixels for the image to be considered 239 | 240 | Returns 241 | ------- 242 | metrics : torch.Tensor [7] 243 | Depth metrics (abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3) 244 | """ 245 | # Initialize variables 246 | batch_size, _, gt_height, gt_width = gt.shape 247 | abs_diff = abs_rel = sq_rel = rmse = rmse_log = silog = a1 = a2 = a3 = 0.0 248 | # For each depth map 249 | for pred_i, gt_i in zip(pred, gt): 250 | gt_i, pred_i = torch.squeeze(gt_i), torch.squeeze(pred_i) 251 | 252 | # Keep valid pixels (min/max depth and crop) 253 | valid = (gt_i > config.min_depth) & (gt_i < config.max_depth) 254 | valid = valid & torch.squeeze(extra_mask) if extra_mask is not None else valid 255 | 256 | # Stop if there are no remaining valid pixels 257 | if valid.sum() < min_num_valid_pixels: 258 | return None, None 259 | 260 | # Keep only valid pixels 261 | gt_i, pred_i = gt_i[valid], pred_i[valid] 262 | 263 | # Ground-truth median scaling if needed 264 | if use_gt_scale: 265 | pred_i = pred_i * torch.median(gt_i) / torch.median(pred_i) 266 | 267 | # Clamp predicted depth values to min/max values 268 | pred_i = pred_i.clamp(config.min_depth, config.max_depth) 269 | 270 | # Calculate depth metrics 271 | 272 | thresh = torch.max((gt_i / pred_i), (pred_i / gt_i)) 273 | a1 += (thresh < 1.25 ).float().mean() 274 | a2 += (thresh < 1.25 ** 2).float().mean() 275 | a3 += (thresh < 1.25 ** 3).float().mean() 276 | 277 | diff_i = gt_i - pred_i 278 | abs_diff += torch.mean(torch.abs(diff_i)) 279 | abs_rel += torch.mean(torch.abs(diff_i) / gt_i) 280 | sq_rel += torch.mean(diff_i ** 2 / gt_i) 281 | rmse += torch.sqrt(torch.mean(diff_i ** 2)) 282 | rmse_log += torch.sqrt(torch.mean((torch.log(gt_i) - 283 | torch.log(pred_i)) ** 2)) 284 | 285 | err = torch.log(pred_i) - torch.log(gt_i) 286 | silog += torch.sqrt(torch.mean(err ** 2) - torch.mean(err) ** 2) * 100 287 | 288 | # Return average values for each metric 289 | return torch.tensor([metric / batch_size for metric in 290 | [abs_rel, sq_rel, rmse, rmse_log, silog, a1, a2, a3]]).type_as(gt), valid.sum() 291 | 292 | 293 | def main(args): 294 | 295 | # Get and sort ground-truth and predicted files 296 | pred_files = glob(os.path.join(args.pred_folder, '*.png')) 297 | pred_files.sort() 298 | 299 | gt_files = glob(os.path.join(args.gt_folder, '*_gt.png')) 300 | gt_files.sort() 301 | 302 | depth_ranges = args.ranges 303 | depth_classes = args.classes 304 | 305 | print('#### Depth ranges to evaluate:', depth_ranges) 306 | print('#### Depth classes to evaluate:', depth_classes) 307 | print('#### Number of predicted and groundtruth files:', len(pred_files), len(gt_files)) 308 | 309 | # Metrics name 310 | metric_names = ['abs_rel', 'sqr_rel', 'rmse', 'rmse_log', 'silog', 'a1', 'a2', 'a3'] 311 | matrix_metric = 'rmse' 312 | 313 | # Prepare matrix information 314 | matrix_idx = metric_names.index(matrix_metric) 315 | matrix = np.zeros((len(depth_classes), len(depth_ranges))) 316 | 317 | # Create metrics dictionary 318 | all_metrics = OrderedDict() 319 | for depth in depth_ranges: 320 | all_metrics[depth] = OrderedDict() 321 | for classes in depth_classes: 322 | all_metrics[depth][classes] = [] 323 | 324 | assert len(pred_files) == len(gt_files), 'Wrong number of files' 325 | 326 | # Loop over all files 327 | progress_bar = tqdm(zip(pred_files, gt_files), total=len(pred_files)) 328 | for i, (pred_file, gt_file) in enumerate(progress_bar): 329 | # Get and prepare ground-truth and predictions 330 | pred = 
load_depth(pred_file) 331 | gt = load_depth(gt_file) 332 | pred = torch.nn.functional.interpolate(pred, gt.shape[2:], mode='nearest') 333 | # Check for semantics 334 | sem = gt_file.replace('_gt.png', '_sem.png') 335 | with_semantic = os.path.exists(sem) 336 | if with_semantic: 337 | sem = torch.tensor(load_sem_ins(sem)[0]).unsqueeze(0).unsqueeze(0) 338 | if sem.max() < 1.0: 339 | sem = sem * 256 340 | sem = torch.nn.functional.interpolate(sem, gt.shape[2:], mode='nearest') 341 | sem = convert_ontology(sem, ddad_to_cityscapes) 342 | else: 343 | pass 344 | # Calculate metrics 345 | for key_depth in all_metrics.keys(): 346 | for key_class in all_metrics[key_depth].keys(): 347 | # Prepare config dictionary 348 | args_key = Namespace(**{ 349 | 'min_depth': 0, 350 | 'max_depth': key_depth, 351 | }) 352 | # Initialize metrics as None 353 | metrics, num = None, None 354 | # Considering all pixels 355 | if key_class == 'All': 356 | metrics, num = compute_depth_metrics( 357 | args_key, gt, pred, use_gt_scale=args.use_gt_scale) 358 | # Considering semantic classes 359 | elif with_semantic: 360 | metrics, num = compute_depth_metrics( 361 | args_key, gt, pred, use_gt_scale=args.use_gt_scale, 362 | extra_mask=sem == map_classes[key_class], 363 | min_num_valid_pixels=args.min_num_valid_pixels) 364 | # Store metrics if available 365 | if metrics is not None: 366 | metrics = metrics.detach().cpu().numpy() 367 | metrics = np.array([i, num] + list(metrics)) 368 | all_metrics[key_depth][key_class].append(metrics) 369 | 370 | if args.output_folder is None: 371 | out_dict = {} 372 | # Loop over range values 373 | for key1, val1 in all_metrics.items(): 374 | # Loop over depth metrics 375 | for key2, val2 in val1.items(): 376 | key = '{}_{}m'.format(key2, key1) 377 | if len(val2) > 0: 378 | out_dict[key] = {} 379 | for i in range(len(metric_names)): 380 | idx = [val2[j][0] for j in range(len(val2))] 381 | nums = [val2[j][1] for j in range(len(val2))] 382 | vals = [val2[j][i+2] for j in range(len(val2))] 383 | out_dict[key]['{}'.format(metric_names[i])] = sum( 384 | [n * v for n, v in zip(nums, vals)]) / sum(nums) 385 | vals = [val2[j][i+2] for j in range(len(val2))] 386 | out_dict[key]['{}'.format(metric_names[i])] = sum(vals) / len(vals) 387 | else: 388 | out_dict[key] = None 389 | 390 | m_abs_rel = {} 391 | for key, val in out_dict.items(): 392 | if 'All' not in key: 393 | m_abs_rel[key] = val['abs_rel'] if val is not None else None 394 | m_abs_rel = sum([val for val in m_abs_rel.values()]) / len(m_abs_rel.values()) 395 | 396 | filtered_dict = { 397 | 'AbsRel': out_dict['All_200m']['abs_rel'], 398 | 'RMSE': out_dict['All_200m']['rmse'], 399 | 'SILog': out_dict['All_200m']['silog'], 400 | 'a1': out_dict['All_200m']['a1'], 401 | 'Car_AbsRel': out_dict['Car_200m']['abs_rel'], 402 | 'Person_AbsRel': out_dict['Person_200m']['abs_rel'], 403 | 'mAbsRel': m_abs_rel, 404 | } 405 | 406 | return filtered_dict 407 | 408 | # Terminal lines 409 | met_line = '| {:>11} | {:^5} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} |' 410 | hor_line = '|{:<}|'.format('-' * 109) 411 | num_line = '| {:>10}m | {:>5} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} |' 412 | # File lines 413 | hor_line_file = '|{:<}|'.format('-' * 106) 414 | met_line_file = '| {:>8} | {:^5} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} | {:^8} |' 415 | num_line_file = '| {:>8} | {:>5} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} | {:^8.3f} |' 416 | # Create 
output folder 417 | os.makedirs(args.output_folder, exist_ok=True) 418 | 419 | # Loop over the dataset 420 | for i, key_class in enumerate(depth_classes): 421 | # Create file and write header 422 | file = open('{}/{}.txt'.format(args.output_folder, key_class), 'w') 423 | file.write(hor_line_file + '\n') 424 | file.write('| ***** {} *****\n'.format(key_class.upper())) 425 | # Print header 426 | print(hor_line) 427 | print(met_line.format(*((key_class.upper()), '#') + tuple(metric_names))) 428 | print(hor_line) 429 | # Loop over each depth range and semantic class 430 | for j, key_depth in enumerate(depth_ranges): 431 | metrics = all_metrics[key_depth][key_class] 432 | if len(metrics) > 0: 433 | # How many metrics were generated for that combination 434 | length = len(metrics) 435 | # Update file 436 | file.write(hor_line_file + '\n') 437 | file.write(met_line_file.format(*('{}m'.format(key_depth), '#') + tuple(metric_names)) + '\n') 438 | file.write(hor_line_file + '\n') 439 | # Create bar plot 440 | create_bar_plot(key_depth, key_class, metrics, matrix_metric, matrix_idx, args.output_folder) 441 | # Save individual metric to file 442 | for metric in metrics: 443 | idx, qty, metric = int(metric[0]), int(metric[1]), metric[2:] 444 | file.write(num_line_file.format(*(idx, qty) + tuple(metric)) + '\n') 445 | # Average metrics and update matrix 446 | metrics = (sum(metrics) / len(metrics)) 447 | matrix[i, j] = metrics[2 + matrix_idx] 448 | # Print to terminal 449 | print(num_line.format(*((key_depth, length) + tuple(metrics[2:])))) 450 | # Update file 451 | file.write(hor_line_file + '\n') 452 | file.write(num_line_file.format(*('TOTAL', length) + tuple(metrics[2:])) + '\n') 453 | file.write(hor_line_file + '\n') 454 | # Finish file 455 | file.write(hor_line_file + '\n') 456 | file.close() 457 | # Finish terminal printing 458 | print(hor_line) 459 | # Create final results 460 | create_summary_table(depth_ranges, depth_classes, matrix, args.output_folder, args.metric) 461 | 462 | 463 | if __name__ == '__main__': 464 | args = parse_args() 465 | main(args) 466 | -------------------------------------------------------------------------------- /media/figs/ann_viz_rgb.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/ann_viz_rgb.jpg -------------------------------------------------------------------------------- /media/figs/ddad_sensors.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/ddad_sensors.png -------------------------------------------------------------------------------- /media/figs/ddad_viz.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/ddad_viz.gif -------------------------------------------------------------------------------- /media/figs/hq_viz_rgb.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/hq_viz_rgb.jpg -------------------------------------------------------------------------------- /media/figs/notebook.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/notebook.png -------------------------------------------------------------------------------- /media/figs/odaiba_viz_rgb.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/odaiba_viz_rgb.jpg -------------------------------------------------------------------------------- /media/figs/pano1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/pano1.png -------------------------------------------------------------------------------- /media/figs/pano2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/pano2.png -------------------------------------------------------------------------------- /media/figs/pano3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/pano3.png -------------------------------------------------------------------------------- /media/figs/tri-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TRI-ML/DDAD/0c3f814d9cf58988ac679b8fd65fadf2ad523fb0/media/figs/tri-logo.png -------------------------------------------------------------------------------- /notebooks/DDAD.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# DDAD - Dense Depth for Autonomous Driving\n", 8 | "\n", 9 | "DDAD is a new autonomous driving benchmark from TRI (Toyota Research Institute) for long range (up to 250m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth (across a full 360 degree field of view) generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting. DDAD contains scenes from urban settings in the United States (San Francisco, Bay Area, Cambridge, Detroit, Ann Arbor) and Japan (Tokyo, Odaiba). 
This notebook will demonstrate a number of simple steps that will allow you to load and visualize the DDAD dataset.\n" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import cv2\n", 19 | "import numpy as np\n", 20 | "import PIL\n", 21 | "from IPython import display\n", 22 | "from matplotlib.cm import get_cmap\n", 23 | "\n", 24 | "from dgp.datasets.synchronized_dataset import SynchronizedSceneDataset\n", 25 | "from dgp.proto.ontology_pb2 import Ontology\n", 26 | "from dgp.utils.protobuf import open_pbobject\n", 27 | "from dgp.utils.visualization import visualize_semantic_segmentation_2d\n", 28 | "\n", 29 | "plasma_color_map = get_cmap('plasma')" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# Define high level variables\n", 39 | "DDAD_TRAIN_VAL_JSON_PATH = '/data/datasets/ddad_train_val/ddad.json'\n", 40 | "DDAD_TEST_JSON_PATH = '/data/datasets/ddad_test/ddad_test.json'\n", 41 | "DATUMS = ['lidar'] + ['CAMERA_%02d' % idx for idx in [1, 5, 6, 7, 8, 9]] " 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "## DDAD Train split\n", 49 | "\n", 50 | "The training set contains 150 scenes with a total of 12650 individual samples (75900 RGB images)." 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "ddad_train = SynchronizedSceneDataset(\n", 60 | " DDAD_TRAIN_VAL_JSON_PATH,\n", 61 | " split='train',\n", 62 | " datum_names=DATUMS,\n", 63 | " generate_depth_from_datum='lidar'\n", 64 | ")\n", 65 | "print('Loaded DDAD train split containing {} samples'.format(len(ddad_train)))" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "### Load a random sample" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": null, 78 | "metadata": {}, 79 | "outputs": [], 80 | "source": [ 81 | "random_sample_idx = np.random.randint(len(ddad_train))\n", 82 | "sample = ddad_train[random_sample_idx] # scene[0] - lidar, scene[1:] - camera datums\n", 83 | "sample_datum_names = [datum['datum_name'] for datum in sample]\n", 84 | "print('Loaded sample {} with datums {}'.format(random_sample_idx, sample_datum_names))" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "### Visualize camera images" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [ 100 | "# Concat images and visualize\n", 101 | "images = [cam['rgb'].resize((192,120), PIL.Image.BILINEAR) for cam in sample[1:]]\n", 102 | "images = np.concatenate(images, axis=1)\n", 103 | "display.display(PIL.Image.fromarray(images))" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "### Visualize corresponding depths" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": null, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "# Visualize corresponding depths, if the depth has been projected into the camera images\n", 120 | "if 'depth' in sample[1].keys():\n", 121 | " # Load and resize depth images\n", 122 | " depths = [cv2.resize(cam['depth'], dsize=(192,120), interpolation=cv2.INTER_NEAREST) \\\n", 123 | " for cam in sample[1:]]\n", 124 | " # Convert to RGB for visualization\n", 125 
| " depths = [plasma_color_map(d)[:,:,:3] for d in depths]\n", 126 | " depths = np.concatenate(depths, axis=1)\n", 127 | " display.display(PIL.Image.fromarray((depths*255).astype(np.uint8)))" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "### Visualize Lidar" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": null, 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [ 143 | "# Note: this requires open3d\n", 144 | "import open3d as o3d\n", 145 | "\n", 146 | "# Get lidar point cloud from sample\n", 147 | "lidar_cloud = sample[0]['point_cloud']\n", 148 | "# Create open3d visualization objects\n", 149 | "o3d_colors = np.tile(np.array([0., 0., 0.]), (len(lidar_cloud), 1))\n", 150 | "o3d_cloud = o3d.geometry.PointCloud()\n", 151 | "o3d_cloud.points = o3d.utility.Vector3dVector(lidar_cloud)\n", 152 | "o3d_cloud.colors = o3d.utility.Vector3dVector(o3d_colors)\n", 153 | "# Visualize (Note: needs open3d, openGL and X server support)\n", 154 | "o3d.visualization.draw_geometries([o3d_cloud])" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "## DDAD train with temporal context\n", 162 | "\n", 163 | "To also return temporally adjacent scenes, use `forward_context` and `backward_context`." 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": null, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "# Intantiate dataset with forward and backward context\n", 173 | "\n", 174 | "ddad_train_with_context = SynchronizedSceneDataset(\n", 175 | " DDAD_TRAIN_VAL_JSON_PATH,\n", 176 | " split='train',\n", 177 | " datum_names=('CAMERA_01',),\n", 178 | " generate_depth_from_datum='lidar',\n", 179 | " forward_context=1, \n", 180 | " backward_context=1\n", 181 | ")" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "### Visualize front camera images" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": {}, 195 | "outputs": [], 196 | "source": [ 197 | "# Load random sample\n", 198 | "# Note that when forward_context or backward_context is used, the loader returns a list of samples\n", 199 | "samples = ddad_train_with_context[np.random.randint(len(ddad_train))] \n", 200 | "front_cam_images = []\n", 201 | "for sample in samples:\n", 202 | " front_cam_images.append(sample[0]['rgb'])\n", 203 | "# Resize images and visualize\n", 204 | "front_cam_images = [img.resize((192,120), PIL.Image.BILINEAR) for img in front_cam_images]\n", 205 | "front_cam_images = np.concatenate(front_cam_images, axis=1)\n", 206 | "display.display(PIL.Image.fromarray(front_cam_images))" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "## DDAD Val split\n", 214 | "\n", 215 | "The validation set contains 50 scenes with a total of 3950 individual samples." 
209 | {
210 | "cell_type": "markdown",
211 | "metadata": {},
212 | "source": [
213 | "## DDAD Val split\n",
214 | "\n",
215 | "The validation set contains 50 scenes with a total of 3950 individual samples."
216 | ]
217 | },
218 | {
219 | "cell_type": "code",
220 | "execution_count": null,
221 | "metadata": {},
222 | "outputs": [],
223 | "source": [
224 | "# Load the val set\n",
225 | "ddad_val = SynchronizedSceneDataset(\n",
226 | "    DDAD_TRAIN_VAL_JSON_PATH,\n",
227 | "    split='val',\n",
228 | "    datum_names=DATUMS,\n",
229 | "    generate_depth_from_datum='lidar'\n",
230 | ")\n",
231 | "print('Loaded DDAD val split containing {} samples'.format(len(ddad_val)))"
232 | ]
233 | },
234 | {
235 | "cell_type": "markdown",
236 | "metadata": {},
237 | "source": [
238 | "### Load the panoptic segmentation labels from the val set\n",
239 | "\n",
240 | "50 of the DDAD validation samples have panoptic segmentation annotations for the front camera images. These annotations can be used for detailed, per-class evaluation."
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": null,
246 | "metadata": {
247 | "scrolled": false
248 | },
249 | "outputs": [],
250 | "source": [
251 | "ddad_val = SynchronizedSceneDataset(\n",
252 | "    DDAD_TRAIN_VAL_JSON_PATH,\n",
253 | "    split='val',\n",
254 | "    datum_names=('CAMERA_01',),\n",
255 | "    requested_annotations=('semantic_segmentation_2d', 'instance_segmentation_2d'),\n",
256 | "    only_annotated_datums=True\n",
257 | ")\n",
258 | "print('Loaded annotated samples from DDAD val split. Total samples: {}.'.format(len(ddad_val)))"
259 | ]
260 | },
261 | {
262 | "cell_type": "markdown",
263 | "metadata": {},
264 | "source": [
265 | "### Visualize the semantic segmentation labels"
266 | ]
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": null,
271 | "metadata": {},
272 | "outputs": [],
273 | "source": [
274 | "# Load instance and semantic segmentation ontologies\n",
275 | "semseg_ontology = open_pbobject(ddad_val.scenes[0].ontology_files['semantic_segmentation_2d'], Ontology)\n",
276 | "instance_ontology = open_pbobject(ddad_val.scenes[0].ontology_files['instance_segmentation_2d'], Ontology)\n",
277 | "\n",
278 | "# Load random sample\n",
279 | "random_sample_idx = np.random.randint(len(ddad_val))\n",
280 | "sample = ddad_val[random_sample_idx]\n",
281 | "\n",
282 | "# Get the image and its semantic segmentation annotation from the sample\n",
283 | "image = np.array(sample[0]['rgb'])\n",
284 | "semantic_segmentation_2d_annotation = sample[0]['semantic_segmentation_2d']\n",
285 | "sem_seg_image = visualize_semantic_segmentation_2d(\n",
286 | "    semantic_segmentation_2d_annotation, semseg_ontology, image=image, debug=False\n",
287 | ")\n",
288 | "\n",
289 | "# Visualize\n",
290 | "image = cv2.resize(image, dsize=(320,240), interpolation=cv2.INTER_NEAREST)\n",
291 | "sem_seg_image = cv2.resize(sem_seg_image, dsize=(320,240), interpolation=cv2.INTER_NEAREST)\n",
292 | "display.display(PIL.Image.fromarray(np.concatenate([image, sem_seg_image], axis=1)))"
293 | ]
294 | },
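  | {
  | "cell_type": "markdown",
  | "metadata": {},
  | "source": [
  | "### Optional: per-class pixel statistics\n",
  | "\n",
  | "A small optional example of the kind of per-class breakdown these annotations enable. It assumes the semantic segmentation annotation is an integer class-id map and that the ontology exposes class names under `items`; if not, raw class ids are printed instead."
  | ]
  | },
  | {
  | "cell_type": "code",
  | "execution_count": null,
  | "metadata": {},
  | "outputs": [],
  | "source": [
  | "# Optional example: per-class pixel fractions for the annotation loaded above.\n",
  | "# Assumptions: the annotation is an integer class-id map; the ontology may expose\n",
  | "# an `items` list with `id`/`name` fields (fall back to raw ids otherwise).\n",
  | "label_map = np.asarray(semantic_segmentation_2d_annotation)\n",
  | "class_ids, counts = np.unique(label_map, return_counts=True)\n",
  | "try:\n",
  | "    id_to_name = {item.id: item.name for item in semseg_ontology.items}\n",
  | "except AttributeError:\n",
  | "    id_to_name = {}\n",
  | "for class_id, count in zip(class_ids, counts):\n",
  | "    name = id_to_name.get(class_id, 'class_{}'.format(class_id))\n",
  | "    print('{}: {:.2f}% of pixels'.format(name, 100.0 * count / label_map.size))"
  | ]
  | },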
325 | "name": "ipython", 326 | "version": 3 327 | }, 328 | "file_extension": ".py", 329 | "mimetype": "text/x-python", 330 | "name": "python", 331 | "nbconvert_exporter": "python", 332 | "pygments_lexer": "ipython3", 333 | "version": "3.6.9" 334 | } 335 | }, 336 | "nbformat": 4, 337 | "nbformat_minor": 4 338 | } --------------------------------------------------------------------------------