├── .gitignore
├── assets
│   ├── teaser.jpg
│   ├── table_occ.png
│   └── table_video.png
├── LICENSE
└── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 |
--------------------------------------------------------------------------------
/assets/teaser.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huster-YZY/GenieDrive/HEAD/assets/teaser.jpg
--------------------------------------------------------------------------------
/assets/table_occ.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huster-YZY/GenieDrive/HEAD/assets/table_occ.png
--------------------------------------------------------------------------------
/assets/table_video.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huster-YZY/GenieDrive/HEAD/assets/table_video.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2025 杨振亚
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | ### **GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation**
4 |
5 | [Zhenya Yang](https://scholar.google.com/citations?user=4nk3hAgAAAAJ&hl=zh-CN)<sup>1</sup>,
6 | [Zhe Liu](https://happinesslz.github.io)<sup>1,†</sup>,
7 | [Yuxiang Lu](https://innovator-zero.github.io)<sup>1</sup>,
8 | [Liping Hou](#)<sup>2</sup>,
9 | [Chenxuan Miao](https://scholar.google.com/citations?user=184t8cAAAAAJ&hl=en)<sup>1</sup>,
10 | [Siyi Peng](#)<sup>2</sup>,
11 | [Bailan Feng](#)<sup>2</sup>,
12 | [Xiang Bai](https://xbai.vlrlab.net)<sup>3</sup>,
13 | [Hengshuang Zhao](https://i.cs.hku.hk/~hszhao/)<sup>1,✉</sup>
14 |
15 |
16 | <sup>1</sup> The University of Hong Kong,
17 | <sup>2</sup> Huawei Noah's Ark Lab,
18 | <sup>3</sup> Huazhong University of Science and Technology
19 |
20 | † Project leader, ✉ Corresponding author.
21 |
22 |
23 | > 📑 [[arXiv](https://arxiv.org/abs/2512.12751)], ⚙️ [[project page](https://huster-yzy.github.io/geniedrive_project_page/)], 🤗 [[model weights](#)]
24 |
25 |
26 |
27 | ![Overview of our GenieDrive](assets/teaser.jpg)
28 |
29 |
30 |
31 |
32 |
33 | ## 📢 News
34 |
35 | * **2025.12.15**: We release the GenieDrive paper on arXiv. 🔥
36 | * **2025.12.15**: [DrivePI](https://github.com/happinesslz/DrivePI) paper released! A novel spatial-aware 4D MLLM that serves as a unified Vision-Language-Action (VLA) framework and is also compatible with vision-action (VA) models. 🔥
37 | * **2025.11.04**: Our previous work [UniLION](https://github.com/happinesslz/UniLION) has been released. Check out the [codebase](https://github.com/happinesslz/UniLION) for a unified autonomous driving model built on Linear Group RNNs. 🚀
38 | * **2024.09.26**: Our work [LION](https://github.com/happinesslz/LION) has been accepted by NeurIPS 2024. Visit the [codebase](https://github.com/happinesslz/LION) for Linear Group RNNs for 3D object detection. 🚀
39 |
40 | ## 📋 TODO List
41 |
42 | - [ ] Release 4D occupancy forecasting code and model weights.
43 | - [ ] Release multi-view video generator code and weights.
44 |
45 | ## 📈 Results
46 |
47 | Our method achieves a marked improvement in 4D occupancy forecasting performance, with a 7.2% gain in mIoU and a 4% gain in IoU.
48 | Moreover, our tri-plane VAE compresses occupancy into a latent tri-plane that is only 58% of the latent size used in previous methods, while still maintaining superior reconstruction performance.
49 | This compact latent representation also enables fast inference (41 FPS) with a minimal parameter count of only 3.47M (covering both the VAE and the prediction module).
50 |
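
The 4D occupancy forecasting code has not been released yet (see the TODO list above). As a rough intuition for why a tri-plane latent is compact, here is a minimal NumPy sketch contrasting a dense volumetric latent with three axis-aligned feature planes. Every grid size, channel count, and downsampling factor in it is a made-up placeholder rather than a GenieDrive setting, and the printed ratio is unrelated to the 58% figure above (which compares against the latents of previous methods).

```python
import numpy as np

# Hypothetical occupancy grid of X x Y x Z voxels, downsampled by `down`,
# with `feat` latent channels. All numbers are illustrative placeholders,
# not the actual GenieDrive configuration.
X, Y, Z = 200, 200, 16
down, feat = 4, 32

# A dense volumetric latent stores a feature vector per downsampled voxel.
dense_latent = np.zeros((X // down, Y // down, Z // down, feat), dtype=np.float32)

# A tri-plane latent instead stores three axis-aligned feature planes
# (XY, XZ, YZ); a voxel feature is assembled by sampling all three planes.
plane_xy = np.zeros((X // down, Y // down, feat), dtype=np.float32)
plane_xz = np.zeros((X // down, Z // down, feat), dtype=np.float32)
plane_yz = np.zeros((Y // down, Z // down, feat), dtype=np.float32)

dense_size = dense_latent.size
triplane_size = plane_xy.size + plane_xz.size + plane_yz.size
print(f"dense latent:     {dense_size:,} floats")
print(f"tri-plane latent: {triplane_size:,} floats "
      f"({100 * triplane_size / dense_size:.1f}% of the dense latent)")
```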
51 |
52 | ![Performance of 4D Occupancy Forecasting](assets/table_occ.png)
53 |
54 |
55 |
56 | We train three driving video generation models that differ only in video length: S (8 frames, ~0.7 s), M (37 frames, ~3 s), and L (81 frames, ~7 s). Through rollout, the L model can further generate long multi-view driving videos of up to 241 frames (~20 s).
57 | GenieDrive consistently outperforms previous occupancy-based methods across all metrics, while also enabling much longer video generation.
58 |
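
The multi-view video generator is likewise unreleased, so the sketch below only illustrates the general idea of chunked rollout: a fixed-length (81-frame) model is repeatedly re-conditioned on the tail of the video generated so far. The `generate_chunk` interface, the one-frame overlap, and the toy stand-in model are assumptions for illustration, not GenieDrive's actual API; under these assumptions, three 81-frame chunks with one overlapping frame give exactly 81 + 2 × 80 = 241 frames.

```python
def rollout(generate_chunk, condition, chunk_len=81, overlap=1, target_len=241):
    """Extend a fixed-length video model to a longer clip by re-conditioning
    each new chunk on the last `overlap` frames generated so far.

    `generate_chunk(context_frames, condition)` is a hypothetical callable
    standing in for the (unreleased) L model; it is assumed to return
    `chunk_len` frames whose first `overlap` frames continue the context.
    """
    frames = generate_chunk([], condition)           # first 81-frame chunk
    while len(frames) < target_len:
        context = frames[-overlap:]                  # tail of the current clip
        new_chunk = generate_chunk(context, condition)
        frames.extend(new_chunk[overlap:])           # keep only the new frames
    return frames[:target_len]

# Toy stand-in model: each "frame" is just a string, so we can check lengths.
fake_model = lambda ctx, cond: [f"frame_{i}" for i in range(81)]
print(len(rollout(fake_model, condition=None)))      # 241 frames, i.e. ~20 s
```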
59 |
60 | ![Performance of Multi-View Video Generation](assets/table_video.png)
61 |
62 |
63 |
64 |
65 |
66 |
67 | ## 📝 Citation
68 |
69 | ```bibtex
70 | @article{yang2025geniedrive,
71 | author = {Yang, Zhenya and Liu, Zhe and Lu, Yuxiang and Hou, Liping and Miao, Chenxuan and Peng, Siyi and Feng, Bailan and Bai, Xiang and Zhao, Hengshuang},
72 | title = {GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation},
73 | journal = {arXiv:2512.12751},
74 | year = {2025},
75 | }
76 | ```
77 |
78 | ## Acknowledgements
79 | We thank these great works and open-source repositories: [I2-World](https://github.com/lzzzzzm/II-World), [UniScene](https://github.com/Arlo0o/UniScene-Unified-Occupancy-centric-Driving-Scene-Generation), [DynamicCity](https://github.com/3DTopia/DynamicCity), [MMDetection3D](https://github.com/open-mmlab/mmdetection3d) and [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun).
--------------------------------------------------------------------------------