├── README.md
└── images
    ├── 1af4535a-acc3-4417-ae33-675f4301f560.png
    ├── 2d94d985-d47e-4899-9760-c1cb8f19cd89.png
    ├── 5b923b31-dbbf-470f-af09-5125f5b91ab0.png
    ├── 80237155-9b7b-4f70-9c2e-8ee38029becd.png
    ├── a608d080-665a-4ab1-bd8f-d5bd121454da.png
    ├── a7817c0b-04b1-4a7c-9535-f9ff7801a689.png
    ├── data-demo.jpg
    ├── ee709e8b-6f05-428d-abff-2578914aeb0d.png
    ├── evaluation_affordance.png
    ├── evaluation_planning.png
    └── evaluation_trajectory.png

/README.md:
--------------------------------------------------------------------------------
# ShareRobot Dataset

**ShareRobot** is a high-quality, heterogeneous dataset labeled with multi-dimensional information, including task planning, object affordance, and end-effector trajectory, effectively enhancing a range of robotic capabilities.

- **Project Website**: [[CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete](https://superrobobrain.github.io/)
- **Download Link**: [ShareRobot Dataset](https://huggingface.co/datasets/BAAI/ShareRobot)

## Overview of ShareRobot

![ee709e8b-6f05-428d-abff-2578914aeb0d](./images/ee709e8b-6f05-428d-abff-2578914aeb0d.png)

For **planning**, we provide 51,403 episodes, each with 30 frames. During data generation, we design 5 different templates for each of the 10 question types in RoboVQA [1], and for every instance we randomly select 2 templates per question type to generate question-answer pairs. This expands the 51,403 instances into 1,027,990 question-answer pairs, with annotators monitoring the generation process to maintain the dataset's integrity (an illustrative sketch of this sampling scheme is given below, after the Data Sources overview).

For **affordance**, we provide 6,522 images, each with an affordance area aligned with an instruction.

For **trajectory**, we provide 6,870 images, each with at least 3 {x, y} coordinates aligned with an instruction.



## Data Sources🌍

![a608d080-665a-4ab1-bd8f-d5bd121454da](./images/a608d080-665a-4ab1-bd8f-d5bd121454da.png)

The **ShareRobot** dataset draws on 23 original datasets from the Open X-Embodiment dataset [2], covering 12 embodiments and 107 types of atomic tasks.
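
As described in the overview, each planning episode is expanded into question-answer pairs by sampling 2 of the 5 templates written for each of the 10 RoboVQA-style question types. The snippet below is a minimal, illustrative sketch of that expansion; the question-type name and template wording are hypothetical, and only the sampling mechanism follows the description above.

```python
import random

# Hypothetical templates: the real dataset uses 10 question types with
# 5 hand-written templates each; a single made-up type is shown here.
TEMPLATES = {
    "planning": [
        "What steps are needed to {task}?",
        "List the sub-tasks required to {task}.",
        "How would a robot {task}?",
        "Break '{task}' into executable steps.",
        "Describe a plan to {task}.",
    ],
    # ... 9 more question types, 5 templates each
}

def make_question_pairs(episode_task, rng):
    """Sample 2 of the 5 templates for every question type of one episode."""
    pairs = []
    for qtype, templates in TEMPLATES.items():
        for template in rng.sample(templates, k=2):
            pairs.append({"type": qtype, "question": template.format(task=episode_task)})
    return pairs

# With all 10 question types present, each episode yields 10 x 2 = 20 questions,
# which over 51,403 episodes is on the order of the 1,027,990 pairs reported.
print(make_question_pairs("open the drawer", random.Random(0)))
```
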



### Raw Dataset for Planning

| Raw Dataset | Number of Episodes |
|:--------------------------------------------------------------:| ------------------:|
| nyu_door_opening_surprising_effectiveness | 421 |
| bridge | 15738 |
| dlr_edan_shared_control_converted_externally_to_rlds | 63 |
| utokyo_xarm_pick_and_place_converted_externally_to_rlds | 92 |
| cmu_stretch | 10 |
| asu_table_top_converted_externally_to_rlds | 109 |
| dlr_sara_pour_converted_externally_to_rlds | 51 |
| utokyo_xarm_bimanual_converted_externally_to_rlds | 27 |
| robo_set | 18164 |
| dobbe | 5200 |
| berkeley_autolab_ur5 | 882 |
| qut_dexterous_manpulation | 192 |
| aloha_mobile | 264 |
| dlr_sara_grid_clamp_converted_externally_to_rlds | 40 |
| ucsd_pick_and_place_dataset_converted_externally_to_rlds | 569 |
| ucsd_kitchen_dataset_converted_externally_to_rlds | 39 |
| jaco_play | 956 |
| utokyo_pr2_opening_fridge_converted_externally_to_rlds | 64 |
| conq_hose_manipulation | 56 |
| fmb | 7836 |
| plex_robosuite | 398 |
| utokyo_pr2_tabletop_manipulation_converted_externally_to_rlds | 189 |
| viola | 44 |



### Raw Dataset for Affordance

| Raw Dataset | Number of Images |
|:--------------------------------------------------------------:| ----------------:|
| utokyo_pr2_tabletop_manipulation_converted_externally_to_rlds | 24 |
| utokyo_xarm_pick_and_place_converted_externally_to_rlds | 23 |
| ucsd_kitchen_dataset_converted_externally_to_rlds | 10 |
| ucsd_pick_and_place_dataset_converted_externally_to_rlds | 112 |
| nyu_door_opening_surprising_effectiveness | 85 |
| jaco_play | 171 |
| bridge | 2610 |
| utokyo_pr2_opening_fridge_converted_externally_to_rlds | 12 |
| asu_table_top_converted_externally_to_rlds | 24 |
| viola | 1 |
| berkeley_autolab_ur5 | 122 |
| aloha_mobile | 23 |
| conq_hose_manipulation | 1 |
| dobbe | 717 |
| fmb | 561 |
| plex_robosuite | 13 |
| qut_dexterous_manpulation | 16 |
| robo_set | 1979 |
| dlr_edan_shared_control_converted_externally_to_rlds | 18 |
| **Summary** | 6522 |



### Raw Dataset for Trajectory

| Raw Dataset | Number of Images |
|:--------------------------------------------------------------:| ----------------:|
| utokyo_pr2_tabletop_manipulation_converted_externally_to_rlds | 35 |
| utokyo_xarm_pick_and_place_converted_externally_to_rlds | 36 |
| ucsd_kitchen_dataset_converted_externally_to_rlds | 19 |
| dlr_sara_grid_clamp_converted_externally_to_rlds | 1 |
| ucsd_pick_and_place_dataset_converted_externally_to_rlds | 109 |
| nyu_door_opening_surprising_effectiveness | 74 |
| jaco_play | 175 |
| utokyo_xarm_bimanual_converted_externally_to_rlds | 7 |
| bridge | 2986 |
| utokyo_pr2_opening_fridge_converted_externally_to_rlds | 12 |
| asu_table_top_converted_externally_to_rlds | 22 |
| berkeley_autolab_ur5 | 164 |
| dobbe | 759 |
| fmb | 48 |
| qut_dexterous_manpulation | 29 |
| robo_set | 2374 |
| dlr_sara_pour_converted_externally_to_rlds | 3 |
| dlr_edan_shared_control_converted_externally_to_rlds | 17 |
| **Summary** | 6870 |



## Data Format

### Planning

![data-demo](./images/data-demo.jpg)

```json
{
    "id": {
"/mnt/hpfs/baaiei/jyShi/rt_frames_success/rtx_frames_success_42/62_robo_set#episode_1570", 121 | "task": "Future_Prediction_Task", 122 | "selected_step": 3, 123 | "conversations": [ 124 | { 125 | "from": "human", 126 | "value": " After , what's the most probable next event?" 127 | }, 128 | { 129 | "from": "gpt", 130 | "value": "" 131 | } 132 | ], 133 | "image": [ 134 | "/path/to/image_0-25" 135 | ] 136 | } 137 | } 138 | ``` 139 | 140 |       141 | 142 | 143 | 144 | ### Affordance 145 | 146 | 147 |
```json
{
    "id": 2486,
    "meta_data": {
        "original_dataset": "bridge",
        "original_width": 640,
        "original_height": 480
    },
    "instruction": "place the red fork to the left of the left burner",
    "affordance": {
        "x": 352.87425387858815,
        "y": 186.47871614766484,
        "width": 19.296008229513156,
        "height": 14.472006172134865
    }
}
```



#### Visualize Code

```python
import json
import os

import cv2
import numpy as np

img_dir = '/path/to/your/original/images/dir'
affordance_json = '/path/to/your/affordances/json'
output_img_dir = '/path/to/your/visualized/images/dir'

with open(affordance_json, 'r') as f:
    data = json.load(f)
    for item in data:
        # 'id' is used as the image path relative to img_dir
        filepath = os.path.join(img_dir, item['id'])

        image = cv2.imread(filepath)
        color = (255, 0, 0)  # box color (BGR)
        thickness = 2

        x_min, y_min = item['affordance']['x'], item['affordance']['y']
        x_max = item['affordance']['x'] + item['affordance']['width']
        y_max = item['affordance']['y'] + item['affordance']['height']

        # Four corners of the affordance box
        pts = np.array([
            [x_min, y_min],  # top-left
            [x_max, y_min],  # top-right
            [x_max, y_max],  # bottom-right
            [x_min, y_max]   # bottom-left
        ], dtype=np.float32)

        # Draw the box outline
        cv2.polylines(image, [pts.astype(int)], isClosed=True, color=color, thickness=thickness)

        # Mirror the input directory structure under output_img_dir
        relative_path = os.path.relpath(filepath, img_dir)
        output_img_path = os.path.join(output_img_dir, relative_path)

        # Create the output directory if needed
        output_directory = os.path.dirname(output_img_path)
        if not os.path.exists(output_directory):
            os.makedirs(output_directory)

        # Debug output
        print(f"Input filepath: {filepath}")
        print(f"Output image path: {output_img_path}")
        print(f"Output directory: {output_directory}")

        # Save the visualization
        cv2.imwrite(output_img_path, image)
```



### Trajectory

```json
{
    "id": 456,
    "meta_data": {
        "original_dataset": "bridge",
        "original_width": 640,
        "original_height": 480
    },
    "instruction": "reach for the carrot",
    "points": [
        [
            265.45454545454544,
            120.0
        ],
        [
            275.1515151515152,
            162.42424242424244
        ],
        [
            280.0,
            213.33333333333331
        ],
        [
            280.0,
            259.3939393939394
        ]
    ]
}
```

#### Visualize Code

```python
import json
import os

from PIL import Image, ImageDraw

trajectory_final = '/path/to/your/trajectory_json'
img_dir = '/path/to/your/original/images/dir'
output_img_dir = '/path/to/your/visualized/images/dir'

with open(trajectory_final, 'r') as f:
    data = json.load(f)
    for item in data:
        # 'id' is used as the image path relative to img_dir
        filepath = os.path.join(img_dir, item['id'])
        points = item['points']

        image = Image.open(filepath).convert("RGB")  # make sure the image is in RGB mode
        draw = ImageDraw.Draw(image)  # drawing handle
        # Line color and width
        color = (255, 0, 0)  # red (RGB)
        thickness = 2

        # Annotated points are already in the image's pixel space; convert them to tuples for PIL
        scaled_points = [
            (point[0], point[1])
            for point in points
        ]
        # Connect consecutive points in order
        for i in range(len(scaled_points) - 1):
            draw.line([scaled_points[i], scaled_points[i + 1]], fill=color, width=thickness)

        # Mirror the input directory structure under output_img_dir
        relative_path = os.path.relpath(filepath, img_dir)
        output_img_path = os.path.join(output_img_dir, relative_path)

        # Create the output directory if needed
        output_directory = os.path.dirname(output_img_path)
        if not os.path.exists(output_directory):
            os.makedirs(output_directory)

        # Debug output
        print(f"Input filepath: {filepath}")
        print(f"Output image path: {output_img_path}")
        print(f"Output directory: {output_directory}")

        # Save the visualization
        image.save(output_img_path)
```



## Evaluation🚀

Powered by the ShareRobot dataset, the RoboBrain model achieves stunning results.🌟

**Task planning capability**: The RoboBrain model trained on ShareRobot achieves a 30.2% improvement in task decomposition accuracy (BLEU-4 reaches 55.05%), significantly better than existing methods (an illustrative BLEU-4 computation is sketched below the planning figure).

**Affordance perception capability**: The average precision (AP) of object affordance area recognition is 27.1%, which is 14.6% higher than the baseline model.

**Trajectory prediction capability**: End-effector trajectory prediction error is reduced by 42.9% (DFD decreases from 0.191 to 0.109); a sketch of this metric follows the figures below.

**General capability**: On the OpenEQA benchmark, the scene understanding score surpasses general multimodal models such as GPT-4V, showing that training RoboBrain with ShareRobot does not sacrifice its general ability.

![evaluation_planning](./images/evaluation_planning.png)

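
The task-decomposition numbers above are reported as BLEU-4. The README does not specify the exact implementation, so the snippet below is only a minimal illustration of how a sentence-level BLEU-4 score can be computed with NLTK; the prediction/reference pair is made up.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical predicted step sequence and reference annotation, tokenized by whitespace.
reference = "pick up the red fork and place it to the left of the left burner".split()
candidate = "pick up the red fork and put it left of the left burner".split()

# BLEU-4: uniform weights over 1- to 4-gram precisions, with smoothing so that
# short outputs missing some higher-order n-grams do not score exactly zero.
score = sentence_bleu(
    [reference],
    candidate,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {score:.4f}")
```
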
![evaluation_affordance](./images/evaluation_affordance.png)

![evaluation_trajectory](./images/evaluation_trajectory.png)
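
The trajectory numbers above are reported as DFD, which presumably refers to the discrete Fréchet distance between a predicted end-effector trajectory and the annotated one (the README does not define the metric or its normalization). The following is a minimal sketch of that distance for two point lists in the Trajectory format shown earlier.

```python
import numpy as np

def discrete_frechet_distance(pred, gt):
    """Discrete Fréchet distance between two polylines given as [[x, y], ...]."""
    p, q = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    ca = np.full((len(p), len(q)), -1.0)  # memo table of partial coupling costs

    def c(i, j):
        if ca[i, j] >= 0:
            return ca[i, j]
        d = np.linalg.norm(p[i] - q[j])
        if i == 0 and j == 0:
            ca[i, j] = d
        elif i == 0:
            ca[i, j] = max(c(0, j - 1), d)
        elif j == 0:
            ca[i, j] = max(c(i - 1, 0), d)
        else:
            ca[i, j] = max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
        return ca[i, j]

    return c(len(p) - 1, len(q) - 1)

# Toy example with points in the same format as the Trajectory annotations.
gt = [[265.5, 120.0], [275.2, 162.4], [280.0, 213.3], [280.0, 259.4]]
pred = [[260.0, 118.0], [270.0, 160.0], [283.0, 215.0], [281.0, 255.0]]
print(discrete_frechet_distance(pred, gt))
```
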

## References

[1] Pierre Sermanet, Tianli Ding, Jeffrey Zhao, Fei Xia, Debidatta Dwibedi, Keerthana Gopalakrishnan, Christine Chan, Gabriel Dulac-Arnold, Sharath Maddineni, Nikhil J Joshi, et al. Robovqa: Multimodal long-horizon reasoning for robotics. In ICRA, pages 645–652, 2024.

[2] Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, et al. Open x-embodiment: Robotic learning datasets and rt-x models. arXiv preprint arXiv:2310.08864, 2023.



## Citation
```
@article{ji2025robobrain,
  title={RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete},
  author={Ji, Yuheng and Tan, Huajie and Shi, Jiayu and Hao, Xiaoshuai and Zhang, Yuan and Zhang, Hengyuan and Wang, Pengwei and Zhao, Mengdi and Mu, Yao and An, Pengju and others},
  journal={arXiv preprint arXiv:2502.21257},
  year={2025}
}
```
--------------------------------------------------------------------------------
/images/1af4535a-acc3-4417-ae33-675f4301f560.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/1af4535a-acc3-4417-ae33-675f4301f560.png
--------------------------------------------------------------------------------
/images/2d94d985-d47e-4899-9760-c1cb8f19cd89.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/2d94d985-d47e-4899-9760-c1cb8f19cd89.png
--------------------------------------------------------------------------------
/images/5b923b31-dbbf-470f-af09-5125f5b91ab0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/5b923b31-dbbf-470f-af09-5125f5b91ab0.png
--------------------------------------------------------------------------------
/images/80237155-9b7b-4f70-9c2e-8ee38029becd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/80237155-9b7b-4f70-9c2e-8ee38029becd.png
--------------------------------------------------------------------------------
/images/a608d080-665a-4ab1-bd8f-d5bd121454da.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/a608d080-665a-4ab1-bd8f-d5bd121454da.png
--------------------------------------------------------------------------------
/images/a7817c0b-04b1-4a7c-9535-f9ff7801a689.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/a7817c0b-04b1-4a7c-9535-f9ff7801a689.png
--------------------------------------------------------------------------------
/images/data-demo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/data-demo.jpg
--------------------------------------------------------------------------------
/images/ee709e8b-6f05-428d-abff-2578914aeb0d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/ee709e8b-6f05-428d-abff-2578914aeb0d.png
--------------------------------------------------------------------------------
/images/evaluation_affordance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/evaluation_affordance.png
--------------------------------------------------------------------------------
/images/evaluation_planning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/evaluation_planning.png
--------------------------------------------------------------------------------
/images/evaluation_trajectory.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlagOpen/ShareRobot/1ff23d64d07834a47b7d428380ceb308a2062d65/images/evaluation_trajectory.png
--------------------------------------------------------------------------------