├── .gitignore
├── assets
    ├── teaser.jpg
    ├── table_occ.png
    └── table_video.png
├── LICENSE
└── README.md


/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 |


--------------------------------------------------------------------------------
/assets/teaser.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huster-YZY/GenieDrive/HEAD/assets/teaser.jpg


--------------------------------------------------------------------------------
/assets/table_occ.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huster-YZY/GenieDrive/HEAD/assets/table_occ.png


--------------------------------------------------------------------------------
/assets/table_video.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huster-YZY/GenieDrive/HEAD/assets/table_video.png


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 |
 3 | Copyright (c) 2025 杨振亚
 4 |
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 |
 2 |
 3 | ### **GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation**
 4 |
 5 | [Zhenya Yang](https://scholar.google.com/citations?user=4nk3hAgAAAAJ&hl=zh-CN)1,
 6 | [Zhe Liu](https://happinesslz.github.io)1,†,
 7 | [Yuxiang Lu](https://innovator-zero.github.io)1,
 8 | [Liping Hou](#)2,
 9 | [Chenxuan Miao](https://scholar.google.com/citations?user=184t8cAAAAAJ&hl=en)1,
10 | [Siyi Peng](#)2,
11 | [Bailan Feng](#)2,
12 | [Xiang Bai](https://xbai.vlrlab.net)3,
13 | [Hengshuang Zhao](https://i.cs.hku.hk/~hszhao/)1,✉
14 |
15 |
16 | 1 The University of Hong Kong,
17 | 2 Huawei Noah's Ark Lab,
18 | 3 Huazhong University of Science and Technology
19 |
20 | † Project leader, ✉ Corresponding author.
21 |
22 |
23 | > 📑 [[arXiv](https://arxiv.org/abs/2512.12751)], ⚙️ [[project page](https://huster-yzy.github.io/geniedrive_project_page/)], 🤗 [[model weights](#)]
24 |
25 |
26 |
27 | 28 |

Overview of our GenieDrive

29 |
30 | 31 |
32 |
33 | ## 📢 News
34 |
35 | - **[2025/12/15]** We released the GenieDrive paper on arXiv. 🔥
36 | - **[2025/12/15]** The [DrivePI](https://github.com/happinesslz/DrivePI) paper has been released! DrivePI is a novel spatial-aware 4D MLLM that serves as a unified Vision-Language-Action (VLA) framework and is also compatible with vision-action (VA) models. 🔥
37 | - **[2025/11/04]** Our previous work [UniLION](https://github.com/happinesslz/UniLION) has been released. Check out the [codebase](https://github.com/happinesslz/UniLION) for a unified autonomous driving model with Linear Group RNNs. 🚀
38 | - **[2024/09/26]** Our work [LION](https://github.com/happinesslz/LION) has been accepted by NeurIPS 2024. Visit the [codebase](https://github.com/happinesslz/LION) for Linear Group RNNs for 3D object detection. 🚀
39 |
40 | ## 📋 TODO List
41 |
42 | - [ ] Release 4D occupancy forecasting code and model weights.
43 | - [ ] Release multi-view video generator code and weights.
44 |
45 | ## 📈 Results
46 |
47 | Our method substantially improves 4D occupancy forecasting performance, with a 7.2\% increase in mIoU and a 4\% increase in IoU.
48 | Moreover, our tri-plane VAE compresses occupancy into a latent tri-plane that is only 58\% of the size used in previous methods, while still achieving superior reconstruction performance.
49 | This compact latent representation also contributes to fast inference (41 FPS) and a small parameter count of only 3.47M (including the VAE and the prediction module).
50 |
51 |
52 | 53 |

Performance of 4D Occupancy Forecasting

54 |
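For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how IoU and mIoU are commonly computed for semantic occupancy grids. It is not the evaluation code of this repository (which is not yet released). It assumes that IoU is measured on the binary occupied-vs-free split and that mIoU averages per-class IoU over semantic classes; the label id `FREE` and both function names are hypothetical.

```python
from typing import Sequence

FREE = 0  # hypothetical label id for empty voxels (assumption, not from the paper)


def occupancy_iou(pred: Sequence[int], target: Sequence[int],
                  free_label: int = FREE) -> float:
    """Binary IoU: a voxel counts as occupied if its label is not `free_label`."""
    pairs = list(zip(pred, target))
    inter = sum(1 for p, t in pairs if p != free_label and t != free_label)
    union = sum(1 for p, t in pairs if p != free_label or t != free_label)
    return inter / union if union else 0.0


def mean_iou(pred: Sequence[int], target: Sequence[int],
             classes: Sequence[int]) -> float:
    """Mean of per-class IoU over the given semantic classes."""
    pairs = list(zip(pred, target))
    scores = []
    for c in classes:
        inter = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            scores.append(inter / union)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: six voxels flattened into a list, labels 1 and 2 are semantic classes.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 1]
geometry_iou = occupancy_iou(pred, target)      # 4 shared occupied voxels / 5 in the union
semantic_miou = mean_iou(pred, target, [1, 2])  # mean of 1/3 and 2/3
```

In this toy grid a prediction can score well on geometry (IoU 0.8) while scoring lower on semantics (mIoU 0.5), which is why the two numbers are reported separately.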
55 | 56 | We train three driving video generation models that differ only in video length: S (8 frames, ~0.7 s), M (37 frames, ~3 s), and L (81 frames, ~7 s). Through rollout, the L model can further generate long multi-view driving videos of up to 241 frames (~20 s). 57 | GenieDrive consistently outperforms previous occupancy-based methods across all metrics, while also enabling much longer video generation. 58 | 59 |
60 | 61 |

Performance of Multi-View Video Generation

62 |
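The frame counts and durations quoted for the S, M, and L models can be checked with a little arithmetic. The sketch below rests on two assumptions that the README does not state: a frame rate of about 12 Hz (inferred from the frame/duration pairs) and a rollout scheme in which each additional pass reuses the last frame of the previous clip as conditioning. `FPS`, `duration_s`, and `rollout_frames` are hypothetical names.

```python
FPS = 12  # assumption: ~12 Hz, inferred from the frame counts and durations above


def duration_s(frames: int, fps: int = FPS) -> float:
    """Clip duration in seconds for a given number of frames."""
    return frames / fps


def rollout_frames(clip_frames: int, passes: int, overlap: int = 1) -> int:
    """Total frames after `passes` generation passes.

    Assumes every pass after the first reuses `overlap` frames of the
    previous clip as conditioning, so it adds `clip_frames - overlap` new frames.
    """
    return clip_frames + (passes - 1) * (clip_frames - overlap)


# Durations of the three model sizes under the 12 Hz assumption.
durations = {name: duration_s(n) for name, n in {"S": 8, "M": 37, "L": 81}.items()}

# Three passes of the 81-frame L model with a one-frame overlap.
long_video_frames = rollout_frames(81, passes=3)
long_video_seconds = duration_s(long_video_frames)
```

Under these assumptions, three passes of the L model give 81 + 2 × 80 = 241 frames, which matches the longest video length reported above.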
63 |
64 |
65 |
66 |
67 | ## 📝 Citation
68 |
69 | ```bibtex
70 | @article{yang2025geniedrive,
71 |   author = {Yang, Zhenya and Liu, Zhe and Lu, Yuxiang and Hou, Liping and Miao, Chenxuan and Peng, Siyi and Feng, Bailan and Bai, Xiang and Zhao, Hengshuang},
72 |   title = {GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation},
73 |   journal = {arXiv:2512.12751},
74 |   year = {2025},
75 | }
76 | ```
77 |
78 | ## Acknowledgements
79 | We thank the authors of these great works and open-source repositories: [I2-World](https://github.com/lzzzzzm/II-World), [UniScene](https://github.com/Arlo0o/UniScene-Unified-Occupancy-centric-Driving-Scene-Generation), [DynamicCity](https://github.com/3DTopia/DynamicCity), [MMDetection3D](https://github.com/open-mmlab/mmdetection3d), and [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun).


--------------------------------------------------------------------------------