# GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data
[![arXiv](https://img.shields.io/badge/arXiv-2505.03233-df2a2a.svg)](https://arxiv.org/pdf/2505.03233)
[![Static Badge](https://img.shields.io/badge/Project-Page-a)](https://pku-epic.github.io/GraspVLA-web/)

We present a cost-effective pretraining paradigm for VLA models using only synthetic data, achieving direct sim-to-real transfer and strong zero-shot generalizability for robotic grasping. Key contributions include:

- **SynGrasp-1B**: a billion-frame synthetic grasping dataset spanning 240 object categories and 10,000+ objects.

- **GraspVLA**: a VLA model pretrained on SynGrasp-1B that achieves zero-shot generalization to real-world grasping without fine-tuning.

- **Unified CoT Framework**: GraspVLA integrates autoregressive perception and flow-matching-based action generation into a single reasoning process, enabling joint training on synthetic action data and internet-scale semantic data for open-vocabulary grasping (an illustrative sketch of this pattern appears at the end of this README).

![teaser](./figs/teaser.jpg)

TODO List:
- [ ] Release the supplementary material
- [ ] Release model weights
- [ ] Release SynGrasp-1B dataset

[![License](https://licensebuttons.net/l/by-nc/4.0/88x31.png)](LICENSE)
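Below is a minimal, illustrative PyTorch sketch of the unified CoT pattern described above: an autoregressive decoder produces perception tokens, and a flow-matching head generates an action chunk conditioned on the same hidden states. This is **not** the GraspVLA implementation (the model code and weights are not yet released, see the TODO list); every class name, dimension, and interface here is an assumption made purely for illustration.

```python
# Minimal sketch of "autoregressive perception + flow-matching actions in one pass".
# All modules, sizes, and interfaces are assumptions, not the official GraspVLA code.
import torch
import torch.nn as nn


class PerceptionDecoder(nn.Module):
    """Toy autoregressive decoder standing in for the VLM backbone's perception CoT."""

    def __init__(self, vocab_size=1024, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        h = self.backbone(self.embed(tokens))   # (B, T, dim) hidden states
        return self.head(h), h                  # next-token logits, hidden states


class FlowMatchingActionHead(nn.Module):
    """Toy flow-matching head: predicts a velocity field over the action chunk."""

    def __init__(self, action_dim=7, cond_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, noisy_action, t, cond):
        # t in [0, 1]; cond is a pooled summary of the perception CoT states.
        return self.net(torch.cat([noisy_action, cond, t], dim=-1))

    @torch.no_grad()
    def sample(self, cond, action_dim=7, steps=10):
        # Euler integration of the learned velocity field from noise (t=0) to action (t=1).
        x = torch.randn(cond.shape[0], action_dim)
        dt = 1.0 / steps
        for i in range(steps):
            t = torch.full((cond.shape[0], 1), i * dt)
            x = x + dt * self.forward(x, t, cond)
        return x


if __name__ == "__main__":
    perception = PerceptionDecoder()
    action_head = FlowMatchingActionHead()
    tokens = torch.randint(0, 1024, (2, 16))    # placeholder perception CoT tokens
    _, hidden = perception(tokens)
    cond = hidden.mean(dim=1)                   # pool CoT states as conditioning
    actions = action_head.sample(cond)          # (2, 7) sampled action chunk
    print(actions.shape)
```

The point the sketch tries to convey is only the high-level wiring: perception reasoning and action generation share one forward pass, so the action head is conditioned on the same intermediate states that carry the perception chain of thought.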