├── .gitignore
├── README.md
├── configs
│   ├── config.yaml
│   └── og_scene_file_pen.json
├── constraint_generation.py
├── environment.py
├── ik_solver.py
├── keypoint_proposal.py
├── main.py
├── media
│   ├── pen-in-holder-disturbances.gif
│   └── pen-in-holder.gif
├── og_utils.py
├── path_solver.py
├── requirements.txt
├── subgoal_solver.py
├── transform_utils.py
├── utils.py
├── visualizer.py
└── vlm_query
    ├── pen
    │   ├── metadata.json
    │   ├── query_img.png
    │   ├── stage1_subgoal_constraints.txt
    │   ├── stage2_path_constraints.txt
    │   ├── stage2_subgoal_constraints.txt
    │   ├── stage3_path_constraints.txt
    │   └── stage3_subgoal_constraints.txt
    └── prompt_template.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | *cache*
2 | **/__pycache__/**
3 | **/*.egg-info/
4 | *.egg
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
2 |
3 | #### [[Project Page]](https://rekep-robot.github.io/) [[Paper]](https://rekep-robot.github.io/rekep.pdf) [[Video]](https://youtu.be/2S8YhBdLdww)
4 |
5 | [Wenlong Huang](https://wenlong.page)<sup>1</sup>, [Chen Wang](https://www.chenwangjeremy.net/)<sup>1</sup>*, [Yunzhu Li](https://yunzhuli.github.io/)<sup>2</sup>*, [Ruohan Zhang](https://ai.stanford.edu/~zharu/)<sup>1</sup>, [Li Fei-Fei](https://profiles.stanford.edu/fei-fei-li)<sup>1</sup> (\* indicates equal contributions)
6 |
7 | <sup>1</sup>Stanford University, <sup>2</sup>Columbia University
8 |
9 |
10 |
11 | This is the official demo code for [ReKep](https://rekep-robot.github.io/) implemented in [OmniGibson](https://behavior.stanford.edu/omnigibson/index.html). ReKep is a method that uses large vision models and vision-language models in a hierarchical optimization framework to generate closed-loop trajectories for manipulation tasks.
12 |
13 | **Note: This codebase currently does not contain the perception pipeline used in our real-world experiments, including keypoint tracking, mask tracking, and SDF reconstruction. Instead, it directly accesses this information from the simulation. If you are interested in deploying the code on a real robot, please refer to [this section](#real-world-deployment) below and the paper [Appendix](https://rekep-robot.github.io/rekep.pdf#page=20.09), where further details are provided.**
14 |
15 | ## Setup Instructions
16 |
17 | Note that this codebase is best run with a display. For running in headless mode, refer to the [instructions in OmniGibson](https://behavior.stanford.edu/omnigibson/getting_started/installation.html).
18 |
19 | - Install [OmniGibson](https://behavior.stanford.edu/omnigibson/getting_started/installation.html). This code is tested on [this commit](https://github.com/StanfordVL/OmniGibson/tree/cc0316a0574018a3cb2956fcbff3be75c07cdf0f).
20 |
21 | NOTE: If you encounter the warning `We did not find Isaac Sim under ~/.local/share/ov/pkg.` when running `./scripts/setup.sh` for OmniGibson, first ensure that Isaac Sim is installed. If it is installed in the default location, provide the following path when prompted: `/home/[USERNAME]/.local/share/ov/pkg/isaac-sim-2023.1.1` (replace `[USERNAME]` with your username).
22 |
23 | - Install ReKep in the same conda environment:
24 | ```Shell
25 | conda activate omnigibson
26 | cd ..
27 | git clone https://github.com/huangwl18/ReKep.git
28 | cd ReKep
29 | pip install -r requirements.txt
30 | ```
31 |
32 | - Obtain an [OpenAI API](https://openai.com/blog/openai-api) key and set it up as an environment variable:
33 | ```Shell
34 | export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
35 | ```
36 |
37 | ## Running Demo
38 |
39 | We provide a demo "pen-in-holder" task that illustrates the core idea in ReKep. Below we provide several options to run the demo.
40 |
41 | Notes:
42 | - An additional `--visualize` flag may be added to visualize every solution from the optimization. However, since the pipeline repeatedly solves optimization problems, the visualization is blocking and must be closed each time (by pressing "ESC") in order to continue.
43 | - Because the physics simulator is not fully deterministic, different runs with the same random seed may produce different results. It is possible that the robot may fail at the provided task, especially when external disturbances are applied. In this case, we recommend running the demo again.
44 |
45 | ### Demo with Cached Query
46 |
47 | We recommend starting with the cached VLM query.
48 |
49 | ```Shell
50 | python main.py --use_cached_query [--visualize]
51 | ```
52 |
53 | A video will be saved to `./videos` by default.
54 |
55 |
56 |
57 | ### Demo with External Disturbances
58 |
59 | Since ReKep acts as a closed-loop policy, it is robust to disturbances with automatic failure recovery both within stages and across stages. To demonstrate this in simulation, we apply the following disturbances for the "pen-in-holder" task:
60 |
61 | - Move the pen when the robot is trying to grasp the pen
62 |
63 | - Take the pen out of the gripper when the robot is trying to reorient the pen
64 |
65 | - Move the holder when the robot is trying to drop the pen into the holder
66 |
67 |
68 |
69 | Note that since the disturbances are pre-defined, we recommend running with the cached query.
70 |
71 | ```Shell
72 | python main.py --use_cached_query --apply_disturbance [--visualize]
73 | ```
74 | ### Demo with Live Query
75 |
76 | The following script queries the VLM for a new sequence of ReKep constraints and executes them on the robot:
77 |
78 | ```Shell
79 | python main.py [--visualize]
80 | ```
81 |
82 | ## Setup New Environments
83 | Leveraging the diverse objects and scenes provided by [BEHAVIOR-1K](https://behavior.stanford.edu/) in [OmniGibson](https://behavior.stanford.edu/omnigibson/index.html), new tasks and scenes can be easily configured. To change the objects, you may check out the available objects as part of the BEHAVIOR assets on this [page](https://behavior.stanford.edu/knowledgebase/objects/index.html) (click on each object instance to view its visualization). After identifying the objects, we recommend making a copy of the JSON scene file `./configs/og_scene_file_pen.json` and editing the `state` and `objects_info` entries accordingly (see the sketch below). Remember that the scene file needs to be supplied to the `Main` class at initialization. Additional [scenes](https://behavior.stanford.edu/knowledgebase/scenes/index.html) and [robots](https://behavior.stanford.edu/omnigibson/getting_started/examples.html#robots) provided by BEHAVIOR-1K may also work, but they are currently untested.
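
The following is a minimal sketch of how a new scene file could be derived from the provided one by editing the JSON programmatically. The output filename and the removed object are illustrative only; the exact fields your new objects require should mirror the existing entries in `og_scene_file_pen.json`.

```python
# Sketch: derive a new scene file from the provided pen-in-holder scene.
# The output path below is an example, not part of the codebase.
import json

with open('./configs/og_scene_file_pen.json', 'r') as f:
    scene = json.load(f)

# Task objects appear in two places: their pose under "state" -> "object_registry"
# and their construction arguments under "objects_info" -> "init_info".
scene['state']['object_registry'].pop('pen_1')
scene['objects_info']['init_info'].pop('pen_1')
# ... add your own object entries here, mirroring the removed ones ...

with open('./configs/og_scene_file_my_task.json', 'w') as f:
    json.dump(scene, f, indent=4)
```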
84 |
85 | ## Real-World Deployment
86 | To deploy ReKep in the real world, most changes should be confined to `environment.py`. Specifically, all of the "exposed functions" need to be re-implemented for the real-world environment. The following components need to be implemented:
87 |
88 | - **Robot Controller**: Our real-world implementation uses the joint impedance controller from [Deoxys](https://github.com/UT-Austin-RPL/deoxys_control) for our Franka setup. Specifically, when `execute_action` in `environment.py` receives a target end-effector pose, we first calculate IK to obtain the target joint positions and then send the command to the low-level controller (a schematic sketch is given after this list).
89 | - **Keypoint Tracker**: Keypoints need to be tracked in order to perform closed-loop replanning, and this typically needs to be achieved using RGB-D cameras. Our real-world implementation uses similarity matching of [DINOv2](https://github.com/facebookresearch/dinov2) features calculated from multiple RGB-D cameras to track the keypoints (details may be found in the [paper](https://rekep-robot.github.io/rekep.pdf) appendix). Alternatively, we also recommend trying out specialized point trackers, such as [\[1\]](https://github.com/henry123-boy/SpaTracker), [\[2\]](https://github.com/google-deepmind/tapnet), [\[3\]](https://github.com/facebookresearch/co-tracker), and [\[4\]](https://github.com/aharley/pips2).
90 | - **SDF Reconstruction**: In order to avoid collisions with irrelevant objects or the table, an SDF voxel grid of the environment needs to be provided to the solvers. Additionally, the SDF should ignore the robot arm and any grasped objects. Our real-world implementation uses [nvblox_torch](https://github.com/NVlabs/nvblox_torch) for ESDF reconstruction, [cuRobo](https://github.com/NVlabs/curobo) for segmenting out the robot arm, and [Cutie](https://github.com/hkchengrex/Cutie) for object mask tracking.
91 | - **(Optional) Consistency Cost**: If closed-loop replanning is desired, we find it helpful to include a consistency cost in the solver to encourage the new solution to be close to the previous one (more details can be found in the [paper](https://rekep-robot.github.io/rekep.pdf) appendix).
92 | - **(Optional) Grasp Metric or Grasp Detector**: In this codebase, we include a cost that encourages a top-down grasp pose, in addition to the collision avoidance cost and the ReKep constraint (for identifying the grasp keypoint), which collectively identify the 6-DoF grasp pose. Alternatively, other grasp metrics, such as force closure, can be included. Our real-world implementation instead uses the grasp detector [AnyGrasp](https://github.com/graspnet/anygrasp_sdk), which is implemented as a special routine because it is too slow to be used as an optimizable cost.
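
Below is a schematic sketch of the real-world `execute_action` flow described in the **Robot Controller** item. The `ik_solver.solve` and `controller.*` calls are hypothetical placeholders standing in for your IK library and low-level controller interface (e.g., Deoxys); they are not part of this codebase.

```python
# Schematic only: target EE pose -> IK -> joint command to the low-level controller.
# `ik_solver` and `controller` are placeholders for your own real-world interfaces.
import numpy as np

def execute_action(action, ik_solver, controller, current_joint_pos):
    # action layout follows environment.py: (x, y, z, qx, qy, qz, qw, gripper_action)
    action = np.asarray(action)
    target_ee_pose, gripper_action = action[:7], action[7]
    # 1) task space -> joint space, seeded at the current configuration
    target_joint_pos = ik_solver.solve(target_ee_pose, seed=current_joint_pos)
    # 2) stream the joint target to the joint impedance controller
    controller.send_joint_command(target_joint_pos)
    # 3) forward the gripper command separately
    controller.set_gripper(gripper_action)
```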
93 |
94 | Since there are several components in the pipeline, running them sequentially in the real world may be too slow. As a result, we recommend running the following compute-intensive components in separate processes in addition to the main process that runs `main.py`: `subgoal_solver`, `path_solver`, `keypoint_tracker`, `sdf_reconstruction`, `mask_tracker`, and `grasp_detector` (if used).
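
The sketch below shows one possible way to move a compute-heavy component into its own process using Python's `multiprocessing`; the `solve` function and queue-based interface are illustrative assumptions, not the actual ReKep APIs.

```python
# Sketch: run a solver in a separate process and communicate via queues.
import multiprocessing as mp

def solve(problem):
    # placeholder for a call into e.g. subgoal_solver or path_solver
    return problem

def solver_worker(request_queue, result_queue):
    while True:
        problem = request_queue.get()
        if problem is None:               # sentinel to shut the worker down
            break
        result_queue.put(solve(problem))

if __name__ == '__main__':
    requests, results = mp.Queue(), mp.Queue()
    worker = mp.Process(target=solver_worker, args=(requests, results), daemon=True)
    worker.start()
    # The main process (running main.py) submits problems with requests.put(...)
    # and retrieves solutions with results.get(), overlapping solving with execution.
    requests.put({'example': 'problem'})
    print(results.get())
    requests.put(None)                    # shut down the worker
```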
95 |
96 | ## Known Limitations
97 | - **Prompt Tuning**: Since ReKep relies on VLMs to generate code-based constraints to solve for the behaviors of the robot, it is sensitive to the specific VLM used and the prompts given to the VLM. Although visual prompting is used, we typically find that the prompts do not necessarily need to contain image-text or code examples, and pure-text high-level instructions can go a long way with the latest VLMs such as `GPT-4o`. As a result, when starting with a new domain, if you observe that the default prompt is failing, we recommend the following steps: 1) pick a few representative tasks in the domain for validation purposes, 2) iteratively update the prompt with high-level text examples and instructions, and 3) test the prompt by checking the text output, returning to step 2 if needed.
98 |
99 | - **Performance Tuning**: For clarity, the entire pipeline is run sequentially, so the latency introduced by the simulator and the solvers is compounded. If this is a concern, we recommend running compute-intensive components, such as the simulator, the `subgoal_solver`, and the `path_solver`, in separate processes, but concurrency needs to be handled with care. More discussion can be found in the "Real-World Deployment" section. To tune the solvers, note that the `objective` function typically takes the majority of the time, and among the different costs, the reachability cost computed by the IK solver is typically the most expensive. Depending on the task, you may reduce `sampling_maxfun` and `maxiter` in `configs/config.yaml` (see the sketch after this list) or disable the reachability cost.
100 |
101 | - **Task-Space Planning**: Since the current pipeline performs planning in the task space (i.e., solving for end-effector poses) instead of the joint space, it may occasionally produce actions that are kinematically challenging for the robot to achieve, especially for tasks that require 6-DoF motions.
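
As referenced in the **Performance Tuning** item above, here is a minimal sketch of overriding the relevant solver settings after loading `configs/config.yaml` (assuming the config is loaded with PyYAML; the chosen values are illustrative):

```python
# Sketch: load the global config and reduce solver budgets to trade accuracy for speed.
import yaml

with open('./configs/config.yaml', 'r') as f:
    config = yaml.safe_load(f)

for solver in ('subgoal_solver', 'path_solver'):
    config[solver]['sampling_maxfun'] = 2000               # default: 5000
    config[solver]['minimizer_options']['maxiter'] = 100   # default: 200
```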
102 |
103 | ## Troubleshooting
104 |
105 | For issues related to OmniGibson, please raise an issue [here](https://github.com/StanfordVL/OmniGibson/issues). You are also welcome to join the [Discord](https://discord.com/invite/bccR5vGFEx) channel for timely support.
106 |
107 | For other issues related to the code in this repo, feel free to raise an issue here, and we will try to address it as time permits.
108 |
--------------------------------------------------------------------------------
/configs/config.yaml:
--------------------------------------------------------------------------------
1 | device: &device 'cuda'
2 | seed: &seed 0
3 |
4 | main:
5 | interpolate_pos_step_size: &interpolate_pos_step_size 0.05 # controls the density of the final returned path
6 | interpolate_rot_step_size: &interpolate_rot_step_size 0.34 # about 20 degrees
7 | grasp_depth: 0.10
8 | constraint_tolerance: 0.10 # for backtracking
9 | bounds_min: &bounds_min [-0.45, -0.75, 0.698]
10 | bounds_max: &bounds_max [0.10, 0.60, 1.2]
11 | sdf_voxel_size: 0.01
12 | vlm_camera: 0
13 | action_steps_per_iter: 5
14 | seed: *seed
15 |
16 | env:
17 | video_cache_size: 2000
18 | og_sim:
19 | physics_frequency: 100
20 | action_frequency: 15
21 |
22 | scene:
23 | name: Rs_int
24 | type: InteractiveTraversableScene
25 | scene_model: Rs_int
26 |
27 | bounds_min: *bounds_min
28 | bounds_max: *bounds_max
29 | interpolate_pos_step_size: *interpolate_pos_step_size
30 | interpolate_rot_step_size: *interpolate_rot_step_size
31 |
32 | robot:
33 | robot_config:
34 | name: Fetch
35 | type: Fetch
36 | obs_modalities: [rgb, depth]
37 | action_modalities: continuous
38 | action_normalize: False
39 | position: [-0.8, 0.0, 0.]
40 | grasping_mode: assisted
41 |
42 | controller_config:
43 | base:
44 | name: DifferentialDriveController
45 | arm_0:
46 | name: OperationalSpaceController
47 | kp: 250
48 | kp_limits: [50, 400]
49 | damping_ratio: 0.6
50 | gripper_0:
51 | name: MultiFingerGripperController
52 | command_input_limits: [0.0, 1.0]
53 | mode: smooth
54 | camera:
55 | name: JointController
56 |
57 | camera:
58 | # recorder
59 | 1:
60 | name: cam_1
61 | position: [ 0.6137, 0.4764, 1.4565]
62 | orientation: [ 0.3212, 0.4682, 0.6788, 0.4656]
63 | resolution: 480
64 |
65 | # vlm camera
66 | 0:
67 | name: cam_0
68 | position: [-0.1655, 0.0167, 1.3664]
69 | orientation: [ 0.0550, 0.0544, 0.7010, 0.7090]
70 | resolution: 480
71 |
72 | path_solver:
73 | opt_pos_step_size: 0.20 # controls the density of control points in the path
74 | opt_rot_step_size: 0.78 # controls the density of control points in the path
75 | opt_interpolate_pos_step_size: 0.02 # controls the density of collision checking inside optimization
76 | opt_interpolate_rot_step_size: 0.10
77 | max_collision_points: 60
78 | sampling_maxfun: 5000
79 | bounds_min: *bounds_min
80 | bounds_max: *bounds_max
81 | constraint_tolerance: 0.0001
82 | minimizer_options:
83 | maxiter: 200
84 |
85 | subgoal_solver:
86 | bounds_min: *bounds_min
87 | bounds_max: *bounds_max
88 | sampling_maxfun: 5000
89 | max_collision_points: 60
90 | constraint_tolerance: 0.0001
91 | minimizer_options:
92 | maxiter: 200
93 |
94 | keypoint_proposer:
95 | num_candidates_per_mask: 5
96 | min_dist_bt_keypoints: 0.06
97 | max_mask_ratio: 0.5
98 | device: *device
99 | bounds_min: *bounds_min
100 | bounds_max: *bounds_max
101 | seed: *seed
102 |
103 | constraint_generator:
104 | model: chatgpt-4o-latest
105 | temperature: 0.0
106 | max_tokens: 2048
107 |
108 | visualizer:
109 | bounds_min: *bounds_min
110 | bounds_max: *bounds_max
--------------------------------------------------------------------------------
/configs/og_scene_file_pen.json:
--------------------------------------------------------------------------------
1 | {
2 | "metadata": {},
3 | "state": {
4 | "system_registry": {},
5 | "object_registry": {
6 | "ceilings_jfsbfp_0": {
7 | "root_link": {
8 | "pos": [
9 | -3.0498788356781006,
10 | 2.527890920639038,
11 | 2.549999713897705
12 | ],
13 | "ori": [
14 | 0.0,
15 | 0.0,
16 | 0.0,
17 | 1.0
18 | ],
19 | "lin_vel": [
20 | 0.0,
21 | 0.0,
22 | 0.0
23 | ],
24 | "ang_vel": [
25 | 0.0,
26 | 0.0,
27 | 0.0
28 | ]
29 | },
30 | "joints": {},
31 | "non_kin": {}
32 | },
33 | "ceilings_phrrml_0": {
34 | "root_link": {
35 | "pos": [
36 | -0.06987879425287247,
37 | -1.287109375,
38 | 2.549999713897705
39 | ],
40 | "ori": [
41 | 0.0,
42 | 0.0,
43 | 0.0,
44 | 1.0
45 | ],
46 | "lin_vel": [
47 | 0.0,
48 | 0.0,
49 | 0.0
50 | ],
51 | "ang_vel": [
52 | 0.0,
53 | 0.0,
54 | 0.0
55 | ]
56 | },
57 | "joints": {},
58 | "non_kin": {}
59 | },
60 | "ceilings_rhfker_0": {
61 | "root_link": {
62 | "pos": [
63 | 1.2158854007720947,
64 | 2.5133121013641357,
65 | 2.549999952316284
66 | ],
67 | "ori": [
68 | 0.0,
69 | 0.0,
70 | 0.0,
71 | 1.0
72 | ],
73 | "lin_vel": [
74 | 0.0,
75 | 0.0,
76 | 0.0
77 | ],
78 | "ang_vel": [
79 | 0.0,
80 | 0.0,
81 | 0.0
82 | ]
83 | },
84 | "joints": {},
85 | "non_kin": {}
86 | },
87 | "ceilings_yimfml_0": {
88 | "root_link": {
89 | "pos": [
90 | -0.793107271194458,
91 | 2.536679267883301,
92 | 2.549999952316284
93 | ],
94 | "ori": [
95 | 0.0,
96 | 0.0,
97 | 0.0,
98 | 1.0
99 | ],
100 | "lin_vel": [
101 | 0.0,
102 | 0.0,
103 | 0.0
104 | ],
105 | "ang_vel": [
106 | 0.0,
107 | 0.0,
108 | 0.0
109 | ]
110 | },
111 | "joints": {},
112 | "non_kin": {}
113 | },
114 | "ceilings_zlhppj_0": {
115 | "root_link": {
116 | "pos": [
117 | -3.0498788356781006,
118 | -0.24460941553115845,
119 | 2.549999713897705
120 | ],
121 | "ori": [
122 | 0.0,
123 | 0.0,
124 | 0.0,
125 | 1.0
126 | ],
127 | "lin_vel": [
128 | 0.0,
129 | 0.0,
130 | 0.0
131 | ],
132 | "ang_vel": [
133 | 0.0,
134 | 0.0,
135 | 0.0
136 | ]
137 | },
138 | "joints": {},
139 | "non_kin": {}
140 | },
141 | "floors_avvnuv_0": {
142 | "root_link": {
143 | "pos": [
144 | -3.0498788356781006,
145 | 2.527891159057617,
146 | -0.15000027418136597
147 | ],
148 | "ori": [
149 | 0.0,
150 | 0.0,
151 | 0.0,
152 | 1.0
153 | ],
154 | "lin_vel": [
155 | 0.0,
156 | 0.0,
157 | 0.0
158 | ],
159 | "ang_vel": [
160 | 0.0,
161 | 0.0,
162 | 0.0
163 | ]
164 | },
165 | "joints": {},
166 | "non_kin": {}
167 | },
168 | "floors_gjemfr_0": {
169 | "root_link": {
170 | "pos": [
171 | 1.2158854007720947,
172 | 2.5133121013641357,
173 | -0.15000024437904358
174 | ],
175 | "ori": [
176 | 0.0,
177 | 0.0,
178 | 0.0,
179 | 1.0
180 | ],
181 | "lin_vel": [
182 | 0.0,
183 | 0.0,
184 | 0.0
185 | ],
186 | "ang_vel": [
187 | 0.0,
188 | 0.0,
189 | 0.0
190 | ]
191 | },
192 | "joints": {},
193 | "non_kin": {}
194 | },
195 | "floors_ifmioj_0": {
196 | "root_link": {
197 | "pos": [
198 | -0.793107271194458,
199 | 2.536679267883301,
200 | -0.15000024437904358
201 | ],
202 | "ori": [
203 | 0.0,
204 | 0.0,
205 | 0.0,
206 | 1.0
207 | ],
208 | "lin_vel": [
209 | 0.0,
210 | 0.0,
211 | 0.0
212 | ],
213 | "ang_vel": [
214 | 0.0,
215 | 0.0,
216 | 0.0
217 | ]
218 | },
219 | "joints": {},
220 | "non_kin": {}
221 | },
222 | "floors_ptwlei_0": {
223 | "root_link": {
224 | "pos": [
225 | -0.06987883150577545,
226 | -1.287109136581421,
227 | -0.15000025928020477
228 | ],
229 | "ori": [
230 | 0.0,
231 | 0.0,
232 | 0.0,
233 | 1.0
234 | ],
235 | "lin_vel": [
236 | 0.0,
237 | 0.0,
238 | 0.0
239 | ],
240 | "ang_vel": [
241 | 0.0,
242 | 0.0,
243 | 0.0
244 | ]
245 | },
246 | "joints": {},
247 | "non_kin": {}
248 | },
249 | "floors_ucjdgj_0": {
250 | "root_link": {
251 | "pos": [
252 | -3.0498788356781006,
253 | -0.24460916221141815,
254 | -0.15000034868717194
255 | ],
256 | "ori": [
257 | 0.0,
258 | 0.0,
259 | 0.0,
260 | 1.0
261 | ],
262 | "lin_vel": [
263 | 0.0,
264 | 0.0,
265 | 0.0
266 | ],
267 | "ang_vel": [
268 | 0.0,
269 | 0.0,
270 | 0.0
271 | ]
272 | },
273 | "joints": {},
274 | "non_kin": {}
275 | },
276 | "walls_axhynp_0": {
277 | "root_link": {
278 | "pos": [
279 | -1.233049750328064,
280 | 3.5148379802703857,
281 | 1.1139445304870605
282 | ],
283 | "ori": [
284 | 0.0,
285 | 0.0,
286 | 0.0,
287 | 1.0
288 | ],
289 | "lin_vel": [
290 | 0.0,
291 | 0.0,
292 | 0.0
293 | ],
294 | "ang_vel": [
295 | 0.0,
296 | 0.0,
297 | 0.0
298 | ]
299 | },
300 | "joints": {},
301 | "non_kin": {}
302 | },
303 | "walls_bcjmdz_0": {
304 | "root_link": {
305 | "pos": [
306 | 1.940032720565796,
307 | -0.25174951553344727,
308 | 1.0881959199905396
309 | ],
310 | "ori": [
311 | 0.0,
312 | 0.0,
313 | 0.0,
314 | 1.0
315 | ],
316 | "lin_vel": [
317 | 0.0,
318 | 0.0,
319 | 0.0
320 | ],
321 | "ang_vel": [
322 | 0.0,
323 | 0.0,
324 | 0.0
325 | ]
326 | },
327 | "joints": {},
328 | "non_kin": {}
329 | },
330 | "walls_dekben_0": {
331 | "root_link": {
332 | "pos": [
333 | -0.04535660520195961,
334 | -4.115162372589111,
335 | 1.0688008069992065
336 | ],
337 | "ori": [
338 | 0.0,
339 | 0.0,
340 | 0.0,
341 | 1.0
342 | ],
343 | "lin_vel": [
344 | 0.0,
345 | 0.0,
346 | 0.0
347 | ],
348 | "ang_vel": [
349 | 0.0,
350 | 0.0,
351 | 0.0
352 | ]
353 | },
354 | "joints": {},
355 | "non_kin": {}
356 | },
357 | "walls_exiopd_0": {
358 | "root_link": {
359 | "pos": [
360 | -1.1345733404159546,
361 | 1.54094398021698,
362 | 1.143774390220642
363 | ],
364 | "ori": [
365 | 0.0,
366 | 0.0,
367 | 0.0,
368 | 1.0
369 | ],
370 | "lin_vel": [
371 | 0.0,
372 | 0.0,
373 | 0.0
374 | ],
375 | "ang_vel": [
376 | 0.0,
377 | 0.0,
378 | 0.0
379 | ]
380 | },
381 | "joints": {},
382 | "non_kin": {}
383 | },
384 | "walls_gxvfxw_0": {
385 | "root_link": {
386 | "pos": [
387 | -2.0799880027770996,
388 | -0.2523879408836365,
389 | 1.130429983139038
390 | ],
391 | "ori": [
392 | 0.0,
393 | 0.0,
394 | 0.0,
395 | 1.0
396 | ],
397 | "lin_vel": [
398 | 0.0,
399 | 0.0,
400 | 0.0
401 | ],
402 | "ang_vel": [
403 | 0.0,
404 | 0.0,
405 | 0.0
406 | ]
407 | },
408 | "joints": {},
409 | "non_kin": {}
410 | },
411 | "walls_gxynpx_0": {
412 | "root_link": {
413 | "pos": [
414 | -3.062925100326538,
415 | -2.0301620960235596,
416 | 1.1158838272094727
417 | ],
418 | "ori": [
419 | 0.0,
420 | 0.0,
421 | 0.0,
422 | 1.0
423 | ],
424 | "lin_vel": [
425 | 0.0,
426 | 0.0,
427 | 0.0
428 | ],
429 | "ang_vel": [
430 | 0.0,
431 | 0.0,
432 | 0.0
433 | ]
434 | },
435 | "joints": {},
436 | "non_kin": {}
437 | },
438 | "walls_xlgydu_0": {
439 | "root_link": {
440 | "pos": [
441 | -4.019967555999756,
442 | 0.7578846216201782,
443 | 1.0950665473937988
444 | ],
445 | "ori": [
446 | 0.0,
447 | 0.0,
448 | 0.0,
449 | 1.0
450 | ],
451 | "lin_vel": [
452 | 0.0,
453 | 0.0,
454 | 0.0
455 | ],
456 | "ang_vel": [
457 | 0.0,
458 | 0.0,
459 | 0.0
460 | ]
461 | },
462 | "joints": {},
463 | "non_kin": {}
464 | },
465 | "table_1": {
466 | "root_link": {
467 | "pos": [
468 | 0.13507460057735443,
469 | -0.020923776552081108,
470 | 0.5174875855445862
471 | ],
472 | "ori": [
473 | 5.044171302870382e-09,
474 | -1.635271473787725e-09,
475 | 0.002193866763263941,
476 | 0.9999977946281433
477 | ],
478 | "lin_vel": [
479 | 0.0,
480 | 0.0,
481 | 0.0
482 | ],
483 | "ang_vel": [
484 | 0.0,
485 | 0.0,
486 | 0.0
487 | ]
488 | },
489 | "joints": {},
490 | "non_kin": {}
491 | },
492 | "pen_1": {
493 | "root_link": {
494 | "pos": [
495 | -0.30,
496 | -0.15,
497 | 0.7780028676986694
498 | ],
499 | "ori": [ -0.7071068, 0, 0, 0.7071068 ],
500 | "lin_vel": [
501 | 0.0,
502 | 0.0,
503 | 0.0
504 | ],
505 | "ang_vel": [
506 | 0.0,
507 | 0.0,
508 | 0.0
509 | ]
510 | },
511 | "joints": {},
512 | "non_kin": {}
513 | },
514 | "pencil_holder_1": {
515 | "root_link": {
516 | "pos": [
517 | -0.30,
518 | 0.15,
519 | 0.7163046002388
520 | ],
521 | "ori": [
522 | 0.00011037533113267273,
523 | -0.00012373141362331808,
524 | -1.8041100702248514e-06,
525 | 1.0000001192092896
526 | ],
527 | "lin_vel": [
528 | 0.0,
529 | 0.0,
530 | 0.0
531 | ],
532 | "ang_vel": [
533 | 0.0,
534 | 0.0,
535 | 0.0
536 | ]
537 | },
538 | "joints": {},
539 | "non_kin": {}
540 | }
541 | }
542 | },
543 | "objects_info": {
544 | "init_info": {
545 | "ceilings_jfsbfp_0": {
546 | "class_module": "omnigibson.objects.dataset_object",
547 | "class_name": "DatasetObject",
548 | "args": {
549 | "name": "ceilings_jfsbfp_0",
550 | "prim_path": "/World/ceilings_jfsbfp_0",
551 | "category": "ceilings",
552 | "model": "jfsbfp",
553 | "uuid": 48693673,
554 | "scale": [
555 | 1.0000121190165758,
556 | 1.0000027376842497,
557 | 0.9999999602635717
558 | ],
559 | "fixed_base": true,
560 | "in_rooms": [
561 | "bathroom_0"
562 | ],
563 | "bddl_object_scope": null
564 | }
565 | },
566 | "ceilings_phrrml_0": {
567 | "class_module": "omnigibson.objects.dataset_object",
568 | "class_name": "DatasetObject",
569 | "args": {
570 | "name": "ceilings_phrrml_0",
571 | "prim_path": "/World/ceilings_phrrml_0",
572 | "category": "ceilings",
573 | "model": "phrrml",
574 | "uuid": 74173089,
575 | "scale": [
576 | 0.9999942598615925,
577 | 0.9999986871377896,
578 | 0.9999999602635717
579 | ],
580 | "fixed_base": true,
581 | "in_rooms": [
582 | "living_room_0"
583 | ],
584 | "bddl_object_scope": null
585 | }
586 | },
587 | "ceilings_rhfker_0": {
588 | "class_module": "omnigibson.objects.dataset_object",
589 | "class_name": "DatasetObject",
590 | "args": {
591 | "name": "ceilings_rhfker_0",
592 | "prim_path": "/World/ceilings_rhfker_0",
593 | "category": "ceilings",
594 | "model": "rhfker",
595 | "uuid": 42059608,
596 | "scale": [
597 | 1.0000056252610265,
598 | 1.000002918863586,
599 | 0.9999999602635717
600 | ],
601 | "fixed_base": true,
602 | "in_rooms": [
603 | "entryway_0"
604 | ],
605 | "bddl_object_scope": null
606 | }
607 | },
608 | "ceilings_yimfml_0": {
609 | "class_module": "omnigibson.objects.dataset_object",
610 | "class_name": "DatasetObject",
611 | "args": {
612 | "name": "ceilings_yimfml_0",
613 | "prim_path": "/World/ceilings_yimfml_0",
614 | "category": "ceilings",
615 | "model": "yimfml",
616 | "uuid": 33802283,
617 | "scale": [
618 | 0.999999934791504,
619 | 1.0000027376842497,
620 | 0.9999999602635717
621 | ],
622 | "fixed_base": true,
623 | "in_rooms": [
624 | "kitchen_0"
625 | ],
626 | "bddl_object_scope": null
627 | }
628 | },
629 | "ceilings_zlhppj_0": {
630 | "class_module": "omnigibson.objects.dataset_object",
631 | "class_name": "DatasetObject",
632 | "args": {
633 | "name": "ceilings_zlhppj_0",
634 | "prim_path": "/World/ceilings_zlhppj_0",
635 | "category": "ceilings",
636 | "model": "zlhppj",
637 | "uuid": 97287247,
638 | "scale": [
639 | 1.0000119961301062,
640 | 0.9999980648281501,
641 | 1.0000008543339716
642 | ],
643 | "fixed_base": true,
644 | "in_rooms": [
645 | "bedroom_0"
646 | ],
647 | "bddl_object_scope": null
648 | }
649 | },
650 | "floors_avvnuv_0": {
651 | "class_module": "omnigibson.objects.dataset_object",
652 | "class_name": "DatasetObject",
653 | "args": {
654 | "name": "floors_avvnuv_0",
655 | "prim_path": "/World/floors_avvnuv_0",
656 | "category": "floors",
657 | "model": "avvnuv",
658 | "uuid": 12717065,
659 | "scale": [
660 | 1.0000121190165758,
661 | 1.0000027376842497,
662 | 1.0000001589457446
663 | ],
664 | "fixed_base": true,
665 | "in_rooms": [
666 | "bathroom_0"
667 | ],
668 | "bddl_object_scope": null
669 | }
670 | },
671 | "floors_gjemfr_0": {
672 | "class_module": "omnigibson.objects.dataset_object",
673 | "class_name": "DatasetObject",
674 | "args": {
675 | "name": "floors_gjemfr_0",
676 | "prim_path": "/World/floors_gjemfr_0",
677 | "category": "floors",
678 | "model": "gjemfr",
679 | "uuid": 94669642,
680 | "scale": [
681 | 1.0000056252610265,
682 | 1.000002918863586,
683 | 0.9999999602635717
684 | ],
685 | "fixed_base": true,
686 | "in_rooms": [
687 | "entryway_0"
688 | ],
689 | "bddl_object_scope": null
690 | }
691 | },
692 | "floors_ifmioj_0": {
693 | "class_module": "omnigibson.objects.dataset_object",
694 | "class_name": "DatasetObject",
695 | "args": {
696 | "name": "floors_ifmioj_0",
697 | "prim_path": "/World/floors_ifmioj_0",
698 | "category": "floors",
699 | "model": "ifmioj",
700 | "uuid": 62654673,
701 | "scale": [
702 | 0.999999934791504,
703 | 1.0000027376842497,
704 | 0.9999999602635717
705 | ],
706 | "fixed_base": true,
707 | "in_rooms": [
708 | "kitchen_0"
709 | ],
710 | "bddl_object_scope": null
711 | }
712 | },
713 | "floors_ptwlei_0": {
714 | "class_module": "omnigibson.objects.dataset_object",
715 | "class_name": "DatasetObject",
716 | "args": {
717 | "name": "floors_ptwlei_0",
718 | "prim_path": "/World/floors_ptwlei_0",
719 | "category": "floors",
720 | "model": "ptwlei",
721 | "uuid": 64419761,
722 | "scale": [
723 | 0.9999942598615925,
724 | 0.9999987714425131,
725 | 1.0000000596046483
726 | ],
727 | "fixed_base": true,
728 | "in_rooms": [
729 | "living_room_0"
730 | ],
731 | "bddl_object_scope": null
732 | }
733 | },
734 | "floors_ucjdgj_0": {
735 | "class_module": "omnigibson.objects.dataset_object",
736 | "class_name": "DatasetObject",
737 | "args": {
738 | "name": "floors_ucjdgj_0",
739 | "prim_path": "/World/floors_ucjdgj_0",
740 | "category": "floors",
741 | "model": "ucjdgj",
742 | "uuid": 82484688,
743 | "scale": [
744 | 1.0000119961301062,
745 | 0.9999980648281501,
746 | 1.0000007549927372
747 | ],
748 | "fixed_base": true,
749 | "in_rooms": [
750 | "bedroom_0"
751 | ],
752 | "bddl_object_scope": null
753 | }
754 | },
755 | "walls_axhynp_0": {
756 | "class_module": "omnigibson.objects.dataset_object",
757 | "class_name": "DatasetObject",
758 | "args": {
759 | "name": "walls_axhynp_0",
760 | "prim_path": "/World/walls_axhynp_0",
761 | "category": "walls",
762 | "model": "axhynp",
763 | "uuid": 67186365,
764 | "scale": [
765 | 1.000001272100645,
766 | 1.0000595362588338,
767 | 0.9999999602635717
768 | ],
769 | "fixed_base": true,
770 | "in_rooms": "",
771 | "bddl_object_scope": null
772 | }
773 | },
774 | "walls_bcjmdz_0": {
775 | "class_module": "omnigibson.objects.dataset_object",
776 | "class_name": "DatasetObject",
777 | "args": {
778 | "name": "walls_bcjmdz_0",
779 | "prim_path": "/World/walls_bcjmdz_0",
780 | "category": "walls",
781 | "model": "bcjmdz",
782 | "uuid": 52223757,
783 | "scale": [
784 | 1.000050671594632,
785 | 1.00000099325917,
786 | 0.9999999602635717
787 | ],
788 | "fixed_base": true,
789 | "in_rooms": "",
790 | "bddl_object_scope": null
791 | }
792 | },
793 | "walls_dekben_0": {
794 | "class_module": "omnigibson.objects.dataset_object",
795 | "class_name": "DatasetObject",
796 | "args": {
797 | "name": "walls_dekben_0",
798 | "prim_path": "/World/walls_dekben_0",
799 | "category": "walls",
800 | "model": "dekben",
801 | "uuid": 34758690,
802 | "scale": [
803 | 1.0000019097954302,
804 | 1.0000432761023288,
805 | 0.9999999602635717
806 | ],
807 | "fixed_base": true,
808 | "in_rooms": "",
809 | "bddl_object_scope": null
810 | }
811 | },
812 | "walls_exiopd_0": {
813 | "class_module": "omnigibson.objects.dataset_object",
814 | "class_name": "DatasetObject",
815 | "args": {
816 | "name": "walls_exiopd_0",
817 | "prim_path": "/World/walls_exiopd_0",
818 | "category": "walls",
819 | "model": "exiopd",
820 | "uuid": 62109373,
821 | "scale": [
822 | 1.000001345225585,
823 | 1.000042018513113,
824 | 0.9999999602635717
825 | ],
826 | "fixed_base": true,
827 | "in_rooms": "",
828 | "bddl_object_scope": null
829 | }
830 | },
831 | "walls_gxvfxw_0": {
832 | "class_module": "omnigibson.objects.dataset_object",
833 | "class_name": "DatasetObject",
834 | "args": {
835 | "name": "walls_gxvfxw_0",
836 | "prim_path": "/World/walls_gxvfxw_0",
837 | "category": "walls",
838 | "model": "gxvfxw",
839 | "uuid": 39069885,
840 | "scale": [
841 | 1.0000561936004457,
842 | 1.0000009917765513,
843 | 0.9999999602635717
844 | ],
845 | "fixed_base": true,
846 | "in_rooms": "",
847 | "bddl_object_scope": null
848 | }
849 | },
850 | "walls_gxynpx_0": {
851 | "class_module": "omnigibson.objects.dataset_object",
852 | "class_name": "DatasetObject",
853 | "args": {
854 | "name": "walls_gxynpx_0",
855 | "prim_path": "/World/walls_gxynpx_0",
856 | "category": "walls",
857 | "model": "gxynpx",
858 | "uuid": 60001904,
859 | "scale": [
860 | 1.0000041226080834,
861 | 1.0000436412659373,
862 | 0.9999999602635717
863 | ],
864 | "fixed_base": true,
865 | "in_rooms": "",
866 | "bddl_object_scope": null
867 | }
868 | },
869 | "walls_xlgydu_0": {
870 | "class_module": "omnigibson.objects.dataset_object",
871 | "class_name": "DatasetObject",
872 | "args": {
873 | "name": "walls_xlgydu_0",
874 | "prim_path": "/World/walls_xlgydu_0",
875 | "category": "walls",
876 | "model": "xlgydu",
877 | "uuid": 99082996,
878 | "scale": [
879 | 1.0000498312191335,
880 | 1.0000013621939055,
881 | 0.9999999602635717
882 | ],
883 | "fixed_base": true,
884 | "in_rooms": "",
885 | "bddl_object_scope": null
886 | }
887 | },
888 | "table_1": {
889 | "class_module": "omnigibson.objects.dataset_object",
890 | "class_name": "DatasetObject",
891 | "args": {
892 | "name": "table_1",
893 | "prim_path": "/World/table_1",
894 | "category": "breakfast_table",
895 | "model": "dnsjnv",
896 | "uuid": 88058023,
897 | "scale": 2
898 | }
899 | },
900 | "pen_1": {
901 | "class_module": "omnigibson.objects.dataset_object",
902 | "class_name": "DatasetObject",
903 | "args": {
904 | "name": "pen_1",
905 | "prim_path": "/World/pen_1",
906 | "category": "pen",
907 | "model": "gqwdor",
908 | "scale": [
909 | 3.0,
910 | 3.0,
911 | 1.3
912 | ]
913 | }
914 | },
915 | "pencil_holder_1": {
916 | "class_module": "omnigibson.objects.dataset_object",
917 | "class_name": "DatasetObject",
918 | "args": {
919 | "name": "pencil_holder_1",
920 | "prim_path": "/World/pencil_holder_1",
921 | "category": "pencil_holder",
922 | "model": "muqeud",
923 | "scale": [
924 | 1.5,
925 | 1.5,
926 | 1.0
927 | ]
928 | }
929 | }
930 | }
931 | }
932 | }
--------------------------------------------------------------------------------
/constraint_generation.py:
--------------------------------------------------------------------------------
1 | import base64
2 | from openai import OpenAI
3 | import os
4 | import cv2
5 | import json
6 | import parse
7 | import numpy as np
8 | import time
9 | from datetime import datetime
10 |
11 | # Function to encode the image
12 | def encode_image(image_path):
13 | with open(image_path, "rb") as image_file:
14 | return base64.b64encode(image_file.read()).decode('utf-8')
15 |
16 | class ConstraintGenerator:
17 | def __init__(self, config):
18 | self.config = config
19 | self.client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
20 | self.base_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), './vlm_query')
21 | with open(os.path.join(self.base_dir, 'prompt_template.txt'), 'r') as f:
22 | self.prompt_template = f.read()
23 |
24 | def _build_prompt(self, image_path, instruction):
25 | img_base64 = encode_image(image_path)
26 | prompt_text = self.prompt_template.format(instruction=instruction)
27 | # save prompt
28 | with open(os.path.join(self.task_dir, 'prompt.txt'), 'w') as f:
29 | f.write(prompt_text)
30 | messages = [
31 | {
32 | "role": "user",
33 | "content": [
34 | {
35 | "type": "text",
36 |                         "text": prompt_text
37 | },
38 | {
39 | "type": "image_url",
40 | "image_url": {
41 | "url": f"data:image/png;base64,{img_base64}"
42 | }
43 | },
44 | ]
45 | }
46 | ]
47 | return messages
48 |
49 | def _parse_and_save_constraints(self, output, save_dir):
50 | # parse into function blocks
51 | lines = output.split("\n")
52 | functions = dict()
53 | for i, line in enumerate(lines):
54 | if line.startswith("def "):
55 | start = i
56 | name = line.split("(")[0].split("def ")[1]
57 | if line.startswith(" return "):
58 | end = i
59 | functions[name] = lines[start:end+1]
60 | # organize them based on hierarchy in function names
61 | groupings = dict()
62 | for name in functions:
63 | parts = name.split("_")[:-1] # last one is the constraint idx
64 | key = "_".join(parts)
65 | if key not in groupings:
66 | groupings[key] = []
67 | groupings[key].append(name)
68 | # save them into files
69 | for key in groupings:
70 | with open(os.path.join(save_dir, f"{key}_constraints.txt"), "w") as f:
71 | for name in groupings[key]:
72 | f.write("\n".join(functions[name]) + "\n\n")
73 | print(f"Constraints saved to {save_dir}")
74 |
75 | def _parse_other_metadata(self, output):
76 | data_dict = dict()
77 | # find num_stages
78 | num_stages_template = "num_stages = {num_stages}"
79 | for line in output.split("\n"):
80 | num_stages = parse.parse(num_stages_template, line)
81 | if num_stages is not None:
82 | break
83 | if num_stages is None:
84 | raise ValueError("num_stages not found in output")
85 | data_dict['num_stages'] = int(num_stages['num_stages'])
86 | # find grasp_keypoints
87 | grasp_keypoints_template = "grasp_keypoints = {grasp_keypoints}"
88 | for line in output.split("\n"):
89 | grasp_keypoints = parse.parse(grasp_keypoints_template, line)
90 | if grasp_keypoints is not None:
91 | break
92 | if grasp_keypoints is None:
93 | raise ValueError("grasp_keypoints not found in output")
94 | # convert into list of ints
95 | grasp_keypoints = grasp_keypoints['grasp_keypoints'].replace("[", "").replace("]", "").split(",")
96 | grasp_keypoints = [int(x.strip()) for x in grasp_keypoints]
97 | data_dict['grasp_keypoints'] = grasp_keypoints
98 | # find release_keypoints
99 | release_keypoints_template = "release_keypoints = {release_keypoints}"
100 | for line in output.split("\n"):
101 | release_keypoints = parse.parse(release_keypoints_template, line)
102 | if release_keypoints is not None:
103 | break
104 | if release_keypoints is None:
105 | raise ValueError("release_keypoints not found in output")
106 | # convert into list of ints
107 | release_keypoints = release_keypoints['release_keypoints'].replace("[", "").replace("]", "").split(",")
108 | release_keypoints = [int(x.strip()) for x in release_keypoints]
109 | data_dict['release_keypoints'] = release_keypoints
110 | return data_dict
111 |
112 | def _save_metadata(self, metadata):
113 | for k, v in metadata.items():
114 | if isinstance(v, np.ndarray):
115 | metadata[k] = v.tolist()
116 | with open(os.path.join(self.task_dir, 'metadata.json'), 'w') as f:
117 | json.dump(metadata, f)
118 | print(f"Metadata saved to {os.path.join(self.task_dir, 'metadata.json')}")
119 |
120 | def generate(self, img, instruction, metadata):
121 | """
122 | Args:
123 | img (np.ndarray): image of the scene (H, W, 3) uint8
124 | instruction (str): instruction for the query
125 | Returns:
126 |             save_dir (str): directory where the constraints and metadata are saved
127 | """
128 | # create a directory for the task
129 | fname = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + "_" + instruction.lower().replace(" ", "_")
130 | self.task_dir = os.path.join(self.base_dir, fname)
131 | os.makedirs(self.task_dir, exist_ok=True)
132 | # save query image
133 | image_path = os.path.join(self.task_dir, 'query_img.png')
134 | cv2.imwrite(image_path, img[..., ::-1])
135 | # build prompt
136 | messages = self._build_prompt(image_path, instruction)
137 | # stream back the response
138 | stream = self.client.chat.completions.create(model=self.config['model'],
139 | messages=messages,
140 | temperature=self.config['temperature'],
141 | max_tokens=self.config['max_tokens'],
142 | stream=True)
143 | output = ""
144 | start = time.time()
145 | for chunk in stream:
146 | print(f'[{time.time()-start:.2f}s] Querying OpenAI API...', end='\r')
147 | if chunk.choices[0].delta.content is not None:
148 | output += chunk.choices[0].delta.content
149 | print(f'[{time.time()-start:.2f}s] Querying OpenAI API...Done')
150 | # save raw output
151 | with open(os.path.join(self.task_dir, 'output_raw.txt'), 'w') as f:
152 | f.write(output)
153 | # parse and save constraints
154 | self._parse_and_save_constraints(output, self.task_dir)
155 | # save metadata
156 | metadata.update(self._parse_other_metadata(output))
157 | self._save_metadata(metadata)
158 | return self.task_dir
159 |
--------------------------------------------------------------------------------
/environment.py:
--------------------------------------------------------------------------------
1 | import time
2 | import numpy as np
3 | import os
4 | import datetime
5 | import transform_utils as T
6 | import trimesh
7 | import open3d as o3d
8 | import imageio
9 | import omnigibson as og
10 | from omnigibson.macros import gm
11 | from omnigibson.utils.usd_utils import PoseAPI, mesh_prim_mesh_to_trimesh_mesh, mesh_prim_shape_to_trimesh_mesh
12 | from omnigibson.robots.fetch import Fetch
13 | from omnigibson.controllers import IsGraspingState
14 | from og_utils import OGCamera
15 | from utils import (
16 | bcolors,
17 | get_clock_time,
18 | angle_between_rotmat,
19 | angle_between_quats,
20 | get_linear_interpolation_steps,
21 | linear_interpolate_poses,
22 | )
23 | from omnigibson.robots.manipulation_robot import ManipulationRobot
24 | from omnigibson.controllers.controller_base import ControlType, BaseController
25 |
26 | # Use GPU dynamics and disable flatcache
27 | gm.USE_GPU_DYNAMICS = True
28 | gm.ENABLE_FLATCACHE = False
29 |
30 | # some customization to the OG functions
31 | def custom_clip_control(self, control):
32 | """
33 | Clips the inputted @control signal based on @control_limits.
34 |
35 | Args:
36 | control (Array[float]): control signal to clip
37 |
38 | Returns:
39 | Array[float]: Clipped control signal
40 | """
41 | clipped_control = control.clip(
42 | self._control_limits[self.control_type][0][self.dof_idx],
43 | self._control_limits[self.control_type][1][self.dof_idx],
44 | )
45 | idx = (
46 | self._dof_has_limits[self.dof_idx]
47 | if self.control_type == ControlType.POSITION
48 | else [True] * self.control_dim
49 | )
50 | if len(control) > 1:
51 | control[idx] = clipped_control[idx]
52 | return control
53 |
54 | Fetch._initialize = ManipulationRobot._initialize
55 | BaseController.clip_control = custom_clip_control
56 |
57 | class ReKepOGEnv:
58 | def __init__(self, config, scene_file, verbose=False):
59 | self.video_cache = []
60 | self.config = config
61 | self.verbose = verbose
62 | self.config['scene']['scene_file'] = scene_file
63 | self.bounds_min = np.array(self.config['bounds_min'])
64 | self.bounds_max = np.array(self.config['bounds_max'])
65 | self.interpolate_pos_step_size = self.config['interpolate_pos_step_size']
66 | self.interpolate_rot_step_size = self.config['interpolate_rot_step_size']
67 | # create omnigibson environment
68 | self.step_counter = 0
69 | self.og_env = og.Environment(dict(scene=self.config['scene'], robots=[self.config['robot']['robot_config']], env=self.config['og_sim']))
70 | self.og_env.scene.update_initial_state()
71 | for _ in range(10): og.sim.step()
72 | # robot vars
73 | self.robot = self.og_env.robots[0]
74 | dof_idx = np.concatenate([self.robot.trunk_control_idx,
75 | self.robot.arm_control_idx[self.robot.default_arm]])
76 | self.reset_joint_pos = self.robot.reset_joint_pos[dof_idx]
77 | self.world2robot_homo = T.pose_inv(T.pose2mat(self.robot.get_position_orientation()))
78 | # initialize cameras
79 | self._initialize_cameras(self.config['camera'])
80 | self.last_og_gripper_action = 1.0
81 |
82 | # ======================================
83 | # = exposed functions
84 | # ======================================
85 | def get_sdf_voxels(self, resolution, exclude_robot=True, exclude_obj_in_hand=True):
86 | """
87 | open3d-based SDF computation
88 | 1. recursively get all usd prim and get their vertices and faces
89 | 2. compute SDF using open3d
90 | """
91 | start = time.time()
92 | exclude_names = ['wall', 'floor', 'ceiling']
93 | if exclude_robot:
94 | exclude_names += ['fetch', 'robot']
95 | if exclude_obj_in_hand:
96 | assert self.config['robot']['robot_config']['grasping_mode'] in ['assisted', 'sticky'], "Currently only supported for assisted or sticky grasping"
97 | in_hand_obj = self.robot._ag_obj_in_hand[self.robot.default_arm]
98 | if in_hand_obj is not None:
99 | exclude_names.append(in_hand_obj.name.lower())
100 | trimesh_objects = []
101 | for obj in self.og_env.scene.objects:
102 | if any([name in obj.name.lower() for name in exclude_names]):
103 | continue
104 | for link in obj.links.values():
105 | for mesh in link.collision_meshes.values():
106 | mesh_type = mesh.prim.GetPrimTypeInfo().GetTypeName()
107 | if mesh_type == 'Mesh':
108 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(mesh.prim)
109 | else:
110 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(mesh.prim)
111 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh.prim_path)
112 | trimesh_object.apply_transform(world_pose_w_scale)
113 | trimesh_objects.append(trimesh_object)
114 | # chain trimesh objects
115 | scene_mesh = trimesh.util.concatenate(trimesh_objects)
116 | # Create a scene and add the triangle mesh
117 | scene = o3d.t.geometry.RaycastingScene()
118 | vertex_positions = scene_mesh.vertices
119 | triangle_indices = scene_mesh.faces
120 | vertex_positions = o3d.core.Tensor(vertex_positions, dtype=o3d.core.Dtype.Float32)
121 | triangle_indices = o3d.core.Tensor(triangle_indices, dtype=o3d.core.Dtype.UInt32)
122 | _ = scene.add_triangles(vertex_positions, triangle_indices) # we do not need the geometry ID for mesh
123 | # create a grid
124 | shape = np.ceil((self.bounds_max - self.bounds_min) / resolution).astype(int)
125 | steps = (self.bounds_max - self.bounds_min) / shape
126 | grid = np.mgrid[self.bounds_min[0]:self.bounds_max[0]:steps[0],
127 | self.bounds_min[1]:self.bounds_max[1]:steps[1],
128 | self.bounds_min[2]:self.bounds_max[2]:steps[2]]
129 | grid = grid.reshape(3, -1).T
130 | # compute SDF
131 | sdf_voxels = scene.compute_signed_distance(grid.astype(np.float32))
132 | # convert back to np array
133 | sdf_voxels = sdf_voxels.cpu().numpy()
134 | # open3d has flipped sign from our convention
135 | sdf_voxels = -sdf_voxels
136 | sdf_voxels = sdf_voxels.reshape(shape)
137 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] SDF voxels computed in {time.time() - start:.4f} seconds{bcolors.ENDC}')
138 | return sdf_voxels
139 |
140 | def get_cam_obs(self):
141 | self.last_cam_obs = dict()
142 | for cam_id in self.cams:
143 | self.last_cam_obs[cam_id] = self.cams[cam_id].get_obs() # each containing rgb, depth, points, seg
144 | return self.last_cam_obs
145 |
146 | def register_keypoints(self, keypoints):
147 | """
148 | Args:
149 | keypoints (np.ndarray): keypoints in the world frame of shape (N, 3)
150 | Returns:
151 | None
152 | Given a set of keypoints in the world frame, this function registers them so that their newest positions can be accessed later.
153 | """
154 | if not isinstance(keypoints, np.ndarray):
155 | keypoints = np.array(keypoints)
156 | self.keypoints = keypoints
157 | self._keypoint_registry = dict()
158 | self._keypoint2object = dict()
159 | exclude_names = ['wall', 'floor', 'ceiling', 'table', 'fetch', 'robot']
160 | for idx, keypoint in enumerate(keypoints):
161 | closest_distance = np.inf
162 | for obj in self.og_env.scene.objects:
163 | if any([name in obj.name.lower() for name in exclude_names]):
164 | continue
165 | for link in obj.links.values():
166 | for mesh in link.visual_meshes.values():
167 | mesh_prim_path = mesh.prim_path
168 | mesh_type = mesh.prim.GetPrimTypeInfo().GetTypeName()
169 | if mesh_type == 'Mesh':
170 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(mesh.prim)
171 | else:
172 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(mesh.prim)
173 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh.prim_path)
174 | trimesh_object.apply_transform(world_pose_w_scale)
175 | points_transformed = trimesh_object.sample(1000)
176 |
177 | # find closest point
178 | dists = np.linalg.norm(points_transformed - keypoint, axis=1)
179 | point = points_transformed[np.argmin(dists)]
180 | distance = np.linalg.norm(point - keypoint)
181 | if distance < closest_distance:
182 | closest_distance = distance
183 | closest_prim_path = mesh_prim_path
184 | closest_point = point
185 | closest_obj = obj
186 | self._keypoint_registry[idx] = (closest_prim_path, PoseAPI.get_world_pose(closest_prim_path))
187 | self._keypoint2object[idx] = closest_obj
188 | # overwrite the keypoint with the closest point
189 | self.keypoints[idx] = closest_point
190 |
191 | def get_keypoint_positions(self):
192 | """
193 | Args:
194 | None
195 | Returns:
196 | np.ndarray: keypoints in the world frame of shape (N, 3)
197 | Given the registered keypoints, this function returns their current positions in the world frame.
198 | """
199 | assert hasattr(self, '_keypoint_registry') and self._keypoint_registry is not None, "Keypoints have not been registered yet."
200 | keypoint_positions = []
201 | for idx, (prim_path, init_pose) in self._keypoint_registry.items():
202 | init_pose = T.pose2mat(init_pose)
203 | centering_transform = T.pose_inv(init_pose)
204 | keypoint_centered = np.dot(centering_transform, np.append(self.keypoints[idx], 1))[:3]
205 | curr_pose = T.pose2mat(PoseAPI.get_world_pose(prim_path))
206 | keypoint = np.dot(curr_pose, np.append(keypoint_centered, 1))[:3]
207 | keypoint_positions.append(keypoint)
208 | return np.array(keypoint_positions)
209 |
210 | def get_object_by_keypoint(self, keypoint_idx):
211 | """
212 | Args:
213 | keypoint_idx (int): the index of the keypoint
214 | Returns:
215 | pointer: the object that the keypoint is associated with
216 |         Given the keypoint index, this function returns the object that the keypoint is associated with.
217 | """
218 | assert hasattr(self, '_keypoint2object') and self._keypoint2object is not None, "Keypoints have not been registered yet."
219 | return self._keypoint2object[keypoint_idx]
220 |
221 | def get_collision_points(self, noise=True):
222 | """
223 | Get the points of the gripper and any object in hand.
224 | """
225 | # add gripper collision points
226 | collision_points = []
227 | for obj in self.og_env.scene.objects:
228 | if 'fetch' in obj.name.lower():
229 | for name, link in obj.links.items():
230 | if 'gripper' in name.lower() or 'wrist' in name.lower(): # wrist_roll and wrist_flex
231 | for collision_mesh in link.collision_meshes.values():
232 | mesh_prim_path = collision_mesh.prim_path
233 | mesh_type = collision_mesh.prim.GetPrimTypeInfo().GetTypeName()
234 | if mesh_type == 'Mesh':
235 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(collision_mesh.prim)
236 | else:
237 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(collision_mesh.prim)
238 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh_prim_path)
239 | trimesh_object.apply_transform(world_pose_w_scale)
240 | points_transformed = trimesh_object.sample(1000)
241 | # add to collision points
242 | collision_points.append(points_transformed)
243 | # add object in hand collision points
244 | in_hand_obj = self.robot._ag_obj_in_hand[self.robot.default_arm]
245 | if in_hand_obj is not None:
246 | for link in in_hand_obj.links.values():
247 | for collision_mesh in link.collision_meshes.values():
248 | mesh_type = collision_mesh.prim.GetPrimTypeInfo().GetTypeName()
249 | if mesh_type == 'Mesh':
250 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(collision_mesh.prim)
251 | else:
252 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(collision_mesh.prim)
253 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(collision_mesh.prim_path)
254 | trimesh_object.apply_transform(world_pose_w_scale)
255 | points_transformed = trimesh_object.sample(1000)
256 | # add to collision points
257 | collision_points.append(points_transformed)
258 | collision_points = np.concatenate(collision_points, axis=0)
259 | return collision_points
260 |
261 | def reset(self):
262 | self.og_env.reset()
263 | self.robot.reset()
264 | for _ in range(5): self._step()
265 | self.open_gripper()
266 | # moving arm to the side to unblock view
267 | ee_pose = self.get_ee_pose()
268 | ee_pose[:3] += np.array([0.0, -0.2, -0.1])
269 | action = np.concatenate([ee_pose, [self.get_gripper_null_action()]])
270 | self.execute_action(action, precise=True)
271 | self.video_cache = []
272 | print(f'{bcolors.HEADER}Reset done.{bcolors.ENDC}')
273 |
274 | def is_grasping(self, candidate_obj=None):
275 | return self.robot.is_grasping(candidate_obj=candidate_obj) == IsGraspingState.TRUE
276 |
277 | def get_ee_pose(self):
278 | ee_pos, ee_xyzw = (self.robot.get_eef_position(), self.robot.get_eef_orientation())
279 | ee_pose = np.concatenate([ee_pos, ee_xyzw]) # [7]
280 | return ee_pose
281 |
282 | def get_ee_pos(self):
283 | return self.get_ee_pose()[:3]
284 |
285 | def get_ee_quat(self):
286 | return self.get_ee_pose()[3:]
287 |
288 | def get_arm_joint_postions(self):
289 | assert isinstance(self.robot, Fetch), "The IK solver assumes the robot is a Fetch robot"
290 | arm = self.robot.default_arm
291 | dof_idx = np.concatenate([self.robot.trunk_control_idx, self.robot.arm_control_idx[arm]])
292 | arm_joint_pos = self.robot.get_joint_positions()[dof_idx]
293 | return arm_joint_pos
294 |
295 | def close_gripper(self):
296 | """
297 | Exposed interface: 1.0 for closed, -1.0 for open, 0.0 for no change
298 | Internal OG interface: 1.0 for open, 0.0 for closed
299 | """
300 | if self.last_og_gripper_action == 0.0:
301 | return
302 | action = np.zeros(12)
303 | action[10:] = [0, 0] # gripper: float. 0. for closed, 1. for open.
304 | for _ in range(30):
305 | self._step(action)
306 | self.last_og_gripper_action = 0.0
307 |
308 | def open_gripper(self):
309 | if self.last_og_gripper_action == 1.0:
310 | return
311 | action = np.zeros(12)
312 | action[10:] = [1, 1] # gripper: float. 0. for closed, 1. for open.
313 | for _ in range(30):
314 | self._step(action)
315 | self.last_og_gripper_action = 1.0
316 |
317 | def get_last_og_gripper_action(self):
318 | return self.last_og_gripper_action
319 |
320 | def get_gripper_open_action(self):
321 | return -1.0
322 |
323 | def get_gripper_close_action(self):
324 | return 1.0
325 |
326 | def get_gripper_null_action(self):
327 | return 0.0
328 |
329 | def compute_target_delta_ee(self, target_pose):
330 | target_pos, target_xyzw = target_pose[:3], target_pose[3:]
331 | ee_pose = self.get_ee_pose()
332 | ee_pos, ee_xyzw = ee_pose[:3], ee_pose[3:]
333 | pos_diff = np.linalg.norm(ee_pos - target_pos)
334 | rot_diff = angle_between_quats(ee_xyzw, target_xyzw)
335 | return pos_diff, rot_diff
336 |
337 | def execute_action(
338 | self,
339 | action,
340 | precise=True,
341 | ):
342 | """
343 | Moves the robot gripper to a target pose by specifying the absolute pose in the world frame and executes gripper action.
344 |
345 | Args:
346 | action (x, y, z, qx, qy, qz, qw, gripper_action): absolute target pose in the world frame + gripper action.
347 | precise (bool): whether to use small position and rotation thresholds for precise movement (robot would move slower).
348 | Returns:
349 | tuple: A tuple containing the position and rotation errors after reaching the target pose.
350 | """
351 | if precise:
352 | pos_threshold = 0.03
353 | rot_threshold = 3.0
354 | else:
355 | pos_threshold = 0.10
356 | rot_threshold = 5.0
357 | action = np.array(action).copy()
358 | assert action.shape == (8,)
359 | target_pose = action[:7]
360 | gripper_action = action[7]
361 |
362 | # ======================================
363 | # = status and safety check
364 | # ======================================
365 | if np.any(target_pose[:3] < self.bounds_min) \
366 | or np.any(target_pose[:3] > self.bounds_max):
367 | print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Target position is out of bounds, clipping to workspace bounds{bcolors.ENDC}')
368 | target_pose[:3] = np.clip(target_pose[:3], self.bounds_min, self.bounds_max)
369 |
370 | # ======================================
371 | # = interpolation
372 | # ======================================
373 | current_pose = self.get_ee_pose()
374 | pos_diff = np.linalg.norm(current_pose[:3] - target_pose[:3])
375 | rot_diff = angle_between_quats(current_pose[3:7], target_pose[3:7])
376 | pos_is_close = pos_diff < self.interpolate_pos_step_size
377 | rot_is_close = rot_diff < self.interpolate_rot_step_size
378 | if pos_is_close and rot_is_close:
379 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Skipping interpolation{bcolors.ENDC}')
380 | pose_seq = np.array([target_pose])
381 | else:
382 | num_steps = get_linear_interpolation_steps(current_pose, target_pose, self.interpolate_pos_step_size, self.interpolate_rot_step_size)
383 | pose_seq = linear_interpolate_poses(current_pose, target_pose, num_steps)
384 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Interpolating for {num_steps} steps{bcolors.ENDC}')
385 |
386 | # ======================================
387 | # = move to target pose
388 | # ======================================
389 | # move faster for intermediate poses
390 | intermediate_pos_threshold = 0.10
391 | intermediate_rot_threshold = 5.0
392 | for pose in pose_seq[:-1]:
393 | self._move_to_waypoint(pose, intermediate_pos_threshold, intermediate_rot_threshold)
394 | # move to the final pose with required precision
395 | pose = pose_seq[-1]
396 | self._move_to_waypoint(pose, pos_threshold, rot_threshold, max_steps=20 if not precise else 40)
397 | # compute error
398 | pos_error, rot_error = self.compute_target_delta_ee(target_pose)
399 | self.verbose and print(f'\n{bcolors.BOLD}[environment.py | {get_clock_time()}] Move to pose completed (pos_error: {pos_error}, rot_error: {np.rad2deg(rot_error)}){bcolors.ENDC}\n')
400 |
401 | # ======================================
402 | # = apply gripper action
403 | # ======================================
404 | if gripper_action == self.get_gripper_open_action():
405 | self.open_gripper()
406 | elif gripper_action == self.get_gripper_close_action():
407 | self.close_gripper()
408 | elif gripper_action == self.get_gripper_null_action():
409 | pass
410 | else:
411 | raise ValueError(f"Invalid gripper action: {gripper_action}")
412 |
413 | return pos_error, rot_error
414 |
415 | def sleep(self, seconds):
416 | start = time.time()
417 | while time.time() - start < seconds:
418 | self._step()
419 |
420 | def save_video(self, save_path=None):
421 | save_dir = os.path.join(os.path.dirname(__file__), 'videos')
422 | os.makedirs(save_dir, exist_ok=True)
423 | if save_path is None:
424 | save_path = os.path.join(save_dir, f'{datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")}.mp4')
425 | video_writer = imageio.get_writer(save_path, fps=30)
426 | for rgb in self.video_cache:
427 | video_writer.append_data(rgb)
428 | video_writer.close()
429 | return save_path
430 |
431 | # ======================================
432 | # = internal functions
433 | # ======================================
434 | def _check_reached_ee(self, target_pos, target_xyzw, pos_threshold, rot_threshold):
435 | """
436 |         check whether the true EE pose (the hand link, in the world frame) has reached the target within the given thresholds
437 | """
438 | current_pos = self.robot.get_eef_position()
439 | current_xyzw = self.robot.get_eef_orientation()
440 | current_rotmat = T.quat2mat(current_xyzw)
441 | target_rotmat = T.quat2mat(target_xyzw)
442 | # calculate position delta
443 | pos_diff = (target_pos - current_pos).flatten()
444 | pos_error = np.linalg.norm(pos_diff)
445 | # calculate rotation delta
446 | rot_error = angle_between_rotmat(current_rotmat, target_rotmat)
447 | # print status
448 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Curr pose: {current_pos}, {current_xyzw} (pos_error: {pos_error.round(4)}, rot_error: {np.rad2deg(rot_error).round(4)}){bcolors.ENDC}')
449 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Goal pose: {target_pos}, {target_xyzw} (pos_thres: {pos_threshold}, rot_thres: {rot_threshold}){bcolors.ENDC}')
450 | if pos_error < pos_threshold and rot_error < np.deg2rad(rot_threshold):
451 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] OSC pose reached (pos_error: {pos_error.round(4)}, rot_error: {np.rad2deg(rot_error).round(4)}){bcolors.ENDC}')
452 | return True, pos_error, rot_error
453 | return False, pos_error, rot_error
454 |
455 | def _move_to_waypoint(self, target_pose_world, pos_threshold=0.02, rot_threshold=3.0, max_steps=10):
456 | pos_errors = []
457 | rot_errors = []
458 | count = 0
459 | while count < max_steps:
460 | reached, pos_error, rot_error = self._check_reached_ee(target_pose_world[:3], target_pose_world[3:7], pos_threshold, rot_threshold)
461 | pos_errors.append(pos_error)
462 | rot_errors.append(rot_error)
463 | if reached:
464 | break
465 | # convert world pose to robot pose
466 | target_pose_robot = np.dot(self.world2robot_homo, T.convert_pose_quat2mat(target_pose_world))
467 | # convert to relative pose to be used with the underlying controller
468 | relative_position = target_pose_robot[:3, 3] - self.robot.get_relative_eef_position()
469 | relative_quat = T.quat_distance(T.mat2quat(target_pose_robot[:3, :3]), self.robot.get_relative_eef_orientation())
470 | assert isinstance(self.robot, Fetch), "this action space is only for fetch"
471 | action = np.zeros(12) # first 3 are base, which we don't use
472 | action[4:7] = relative_position
473 | action[7:10] = T.quat2axisangle(relative_quat)
474 | action[10:] = [self.last_og_gripper_action, self.last_og_gripper_action]
475 | # step the action
476 | _ = self._step(action=action)
477 | count += 1
478 | if count == max_steps:
479 | print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] OSC pose not reached after {max_steps} steps (pos_error: {pos_errors[-1].round(4)}, rot_error: {np.rad2deg(rot_errors[-1]).round(4)}){bcolors.ENDC}')
480 |
481 | def _step(self, action=None):
482 | if hasattr(self, 'disturbance_seq') and self.disturbance_seq is not None:
483 | next(self.disturbance_seq)
484 | if action is not None:
485 | self.og_env.step(action)
486 | else:
487 | og.sim.step()
488 | cam_obs = self.get_cam_obs()
489 | rgb = cam_obs[1]['rgb']
490 | if len(self.video_cache) < self.config['video_cache_size']:
491 | self.video_cache.append(rgb)
492 | else:
493 | self.video_cache.pop(0)
494 | self.video_cache.append(rgb)
495 | self.step_counter += 1
496 |
497 | def _initialize_cameras(self, cam_config):
498 | """
499 |         ::param cam_config: dict mapping camera id to its config (name, resolution, position, orientation)
500 | """
501 | self.cams = dict()
502 | for cam_id in cam_config:
503 | cam_id = int(cam_id)
504 | self.cams[cam_id] = OGCamera(self.og_env, cam_config[cam_id])
505 | for _ in range(10): og.sim.render()
--------------------------------------------------------------------------------
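`ReKepOGEnv.execute_action` above clips out-of-bounds targets to the workspace, linearly interpolates the end-effector pose, tracks the intermediate waypoints with looser thresholds, and only then applies the gripper command. A minimal usage sketch, assuming an already-constructed `ReKepOGEnv` instance named `env` (as created in `main.py`):

```python
import numpy as np

# An action is an 8-vector: [x, y, z, qx, qy, qz, qw, gripper_action].
target_pose = np.array(env.get_ee_pose())   # [7] current EE pose in the world frame
target_pose[2] += 0.05                      # move the gripper 5 cm up
action = np.concatenate([target_pose, [env.get_gripper_null_action()]])

# Out-of-bounds positions are clipped to the workspace bounds; the pose is then
# interpolated and tracked waypoint-by-waypoint before the gripper action is applied.
pos_error, rot_error = env.execute_action(action, precise=True)
print(f'residual errors: {pos_error:.4f} m, {np.rad2deg(rot_error):.2f} deg')
```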
/ik_solver.py:
--------------------------------------------------------------------------------
1 | """
2 | Adapted from OmniGibson and the Lula IK solver
3 | """
4 | import omnigibson.lazy as lazy
5 | import numpy as np
6 |
7 | class IKSolver:
8 | """
9 | Class for thinly wrapping Lula IK solver
10 | """
11 |
12 | def __init__(
13 | self,
14 | robot_description_path,
15 | robot_urdf_path,
16 | eef_name,
17 | reset_joint_pos,
18 | world2robot_homo,
19 | ):
20 | # Create robot description, kinematics, and config
21 | self.robot_description = lazy.lula.load_robot(robot_description_path, robot_urdf_path)
22 | self.kinematics = self.robot_description.kinematics()
23 | self.config = lazy.lula.CyclicCoordDescentIkConfig()
24 | self.eef_name = eef_name
25 | self.reset_joint_pos = reset_joint_pos
26 | self.world2robot_homo = world2robot_homo
27 |
28 | def solve(
29 | self,
30 | target_pose_homo,
31 | position_tolerance=0.01,
32 | orientation_tolerance=0.05,
33 | position_weight=1.0,
34 | orientation_weight=0.05,
35 | max_iterations=150,
36 | initial_joint_pos=None,
37 | ):
38 | """
39 |         Backs out joint positions to achieve the desired end-effector pose given by @target_pose_homo
40 |
41 | Args:
42 | target_pose_homo (np.ndarray): [4, 4] homogeneous transformation matrix of the target pose in world frame
43 | position_tolerance (float): Maximum position error (L2-norm) for a successful IK solution
44 | orientation_tolerance (float): Maximum orientation error (per-axis L2-norm) for a successful IK solution
45 | position_weight (float): Weight for the relative importance of position error during CCD
46 |             orientation_weight (float): Weight for the relative importance of orientation error during CCD
47 | max_iterations (int): Number of iterations used for each cyclic coordinate descent.
48 | initial_joint_pos (None or n-array): If specified, will set the initial cspace seed when solving for joint
49 | positions. Otherwise, will use self.reset_joint_pos
50 |
51 | Returns:
52 | ik_results (lazy.lula.CyclicCoordDescentIkResult): IK result object containing the joint positions and other information.
53 | """
54 | # convert target pose to robot base frame
55 | target_pose_robot = np.dot(self.world2robot_homo, target_pose_homo)
56 | target_pose_pos = target_pose_robot[:3, 3]
57 | target_pose_rot = target_pose_robot[:3, :3]
58 | ik_target_pose = lazy.lula.Pose3(lazy.lula.Rotation3(target_pose_rot), target_pose_pos)
59 | # Set the cspace seed and tolerance
60 | initial_joint_pos = self.reset_joint_pos if initial_joint_pos is None else np.array(initial_joint_pos)
61 | self.config.cspace_seeds = [initial_joint_pos]
62 | self.config.position_tolerance = position_tolerance
63 | self.config.orientation_tolerance = orientation_tolerance
64 | self.config.ccd_position_weight = position_weight
65 | self.config.ccd_orientation_weight = orientation_weight
66 | self.config.max_num_descents = max_iterations
67 | # Compute target joint positions
68 | ik_results = lazy.lula.compute_ik_ccd(self.kinematics, ik_target_pose, self.eef_name, self.config)
69 | return ik_results
--------------------------------------------------------------------------------
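`IKSolver` is a thin wrapper around the Lula cyclic coordinate descent solver; in this codebase it is used only as a cheap reachability signal inside the subgoal and path optimizations. A minimal sketch of a standalone call, assuming `env` is a constructed `ReKepOGEnv` with a Fetch robot (mirroring how `main.py` builds it):

```python
import transform_utils as T
from ik_solver import IKSolver

ik_solver = IKSolver(
    robot_description_path=env.robot.robot_arm_descriptor_yamls[env.robot.default_arm],
    robot_urdf_path=env.robot.urdf_path,
    eef_name=env.robot.eef_link_names[env.robot.default_arm],
    reset_joint_pos=env.reset_joint_pos,
    world2robot_homo=env.world2robot_homo,
)

# solve() expects a [4, 4] homogeneous target pose in the world frame.
target_pose_homo = T.convert_pose_quat2mat(env.get_ee_pose())
ik_result = ik_solver.solve(target_pose_homo, max_iterations=50)
# The Lula result exposes success, position_error, num_descents, and cspace_position,
# which path_solver.py and subgoal_solver.py turn into reachability costs.
print(ik_result.success, ik_result.position_error)
```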
/keypoint_proposal.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import cv2
4 | from torch.nn.functional import interpolate
5 | from kmeans_pytorch import kmeans
6 | from utils import filter_points_by_bounds
7 | from sklearn.cluster import MeanShift
8 |
9 | class KeypointProposer:
10 | def __init__(self, config):
11 | self.config = config
12 | self.device = torch.device(self.config['device'])
13 | self.dinov2 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval().to(self.device)
14 | self.bounds_min = np.array(self.config['bounds_min'])
15 | self.bounds_max = np.array(self.config['bounds_max'])
16 | self.mean_shift = MeanShift(bandwidth=self.config['min_dist_bt_keypoints'], bin_seeding=True, n_jobs=32)
17 | self.patch_size = 14 # dinov2
18 | np.random.seed(self.config['seed'])
19 | torch.manual_seed(self.config['seed'])
20 | torch.cuda.manual_seed(self.config['seed'])
21 |
22 | def get_keypoints(self, rgb, points, masks):
23 | # preprocessing
24 | transformed_rgb, rgb, points, masks, shape_info = self._preprocess(rgb, points, masks)
25 | # get features
26 | features_flat = self._get_features(transformed_rgb, shape_info)
27 |         # for each mask, cluster in feature space to get meaningful regions, and use their centers as keypoint candidates
28 | candidate_keypoints, candidate_pixels, candidate_rigid_group_ids = self._cluster_features(points, features_flat, masks)
29 | # exclude keypoints that are outside of the workspace
30 | within_space = filter_points_by_bounds(candidate_keypoints, self.bounds_min, self.bounds_max, strict=True)
31 | candidate_keypoints = candidate_keypoints[within_space]
32 | candidate_pixels = candidate_pixels[within_space]
33 | candidate_rigid_group_ids = candidate_rigid_group_ids[within_space]
34 | # merge close points by clustering in cartesian space
35 | merged_indices = self._merge_clusters(candidate_keypoints)
36 | candidate_keypoints = candidate_keypoints[merged_indices]
37 | candidate_pixels = candidate_pixels[merged_indices]
38 | candidate_rigid_group_ids = candidate_rigid_group_ids[merged_indices]
39 | # sort candidates by locations
40 | sort_idx = np.lexsort((candidate_pixels[:, 0], candidate_pixels[:, 1]))
41 | candidate_keypoints = candidate_keypoints[sort_idx]
42 | candidate_pixels = candidate_pixels[sort_idx]
43 | candidate_rigid_group_ids = candidate_rigid_group_ids[sort_idx]
44 | # project keypoints to image space
45 | projected = self._project_keypoints_to_img(rgb, candidate_pixels, candidate_rigid_group_ids, masks, features_flat)
46 | return candidate_keypoints, projected
47 |
48 | def _preprocess(self, rgb, points, masks):
49 | # convert masks to binary masks
50 | masks = [masks == uid for uid in np.unique(masks)]
51 | # ensure input shape is compatible with dinov2
52 | H, W, _ = rgb.shape
53 | patch_h = int(H // self.patch_size)
54 | patch_w = int(W // self.patch_size)
55 | new_H = patch_h * self.patch_size
56 | new_W = patch_w * self.patch_size
57 | transformed_rgb = cv2.resize(rgb, (new_W, new_H))
58 | transformed_rgb = transformed_rgb.astype(np.float32) / 255.0 # float32 [H, W, 3]
59 | # shape info
60 | shape_info = {
61 | 'img_h': H,
62 | 'img_w': W,
63 | 'patch_h': patch_h,
64 | 'patch_w': patch_w,
65 | }
66 | return transformed_rgb, rgb, points, masks, shape_info
67 |
68 | def _project_keypoints_to_img(self, rgb, candidate_pixels, candidate_rigid_group_ids, masks, features_flat):
69 | projected = rgb.copy()
70 | # overlay keypoints on the image
71 | for keypoint_count, pixel in enumerate(candidate_pixels):
72 | displayed_text = f"{keypoint_count}"
73 | text_length = len(displayed_text)
74 | # draw a box
75 | box_width = 30 + 10 * (text_length - 1)
76 | box_height = 30
77 | cv2.rectangle(projected, (pixel[1] - box_width // 2, pixel[0] - box_height // 2), (pixel[1] + box_width // 2, pixel[0] + box_height // 2), (255, 255, 255), -1)
78 | cv2.rectangle(projected, (pixel[1] - box_width // 2, pixel[0] - box_height // 2), (pixel[1] + box_width // 2, pixel[0] + box_height // 2), (0, 0, 0), 2)
79 | # draw text
80 | org = (pixel[1] - 7 * (text_length), pixel[0] + 7)
81 | color = (255, 0, 0)
82 | cv2.putText(projected, str(keypoint_count), org, cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
83 | keypoint_count += 1
84 | return projected
85 |
86 | @torch.inference_mode()
87 | @torch.amp.autocast('cuda')
88 | def _get_features(self, transformed_rgb, shape_info):
89 | img_h = shape_info['img_h']
90 | img_w = shape_info['img_w']
91 | patch_h = shape_info['patch_h']
92 | patch_w = shape_info['patch_w']
93 | # get features
94 | img_tensors = torch.from_numpy(transformed_rgb).permute(2, 0, 1).unsqueeze(0).to(self.device) # float32 [1, 3, H, W]
95 | assert img_tensors.shape[1] == 3, "unexpected image shape"
96 | features_dict = self.dinov2.forward_features(img_tensors)
97 | raw_feature_grid = features_dict['x_norm_patchtokens'] # float32 [num_cams, patch_h*patch_w, feature_dim]
98 | raw_feature_grid = raw_feature_grid.reshape(1, patch_h, patch_w, -1) # float32 [num_cams, patch_h, patch_w, feature_dim]
99 | # compute per-point feature using bilinear interpolation
100 | interpolated_feature_grid = interpolate(raw_feature_grid.permute(0, 3, 1, 2), # float32 [num_cams, feature_dim, patch_h, patch_w]
101 | size=(img_h, img_w),
102 | mode='bilinear').permute(0, 2, 3, 1).squeeze(0) # float32 [H, W, feature_dim]
103 | features_flat = interpolated_feature_grid.reshape(-1, interpolated_feature_grid.shape[-1]) # float32 [H*W, feature_dim]
104 | return features_flat
105 |
106 | def _cluster_features(self, points, features_flat, masks):
107 | candidate_keypoints = []
108 | candidate_pixels = []
109 | candidate_rigid_group_ids = []
110 | for rigid_group_id, binary_mask in enumerate(masks):
111 | # ignore mask that is too large
112 | if np.mean(binary_mask) > self.config['max_mask_ratio']:
113 | continue
114 | # consider only foreground features
115 | obj_features_flat = features_flat[binary_mask.reshape(-1)]
116 | feature_pixels = np.argwhere(binary_mask)
117 | feature_points = points[binary_mask]
118 | # reduce dimensionality to be less sensitive to noise and texture
119 | obj_features_flat = obj_features_flat.double()
120 | (u, s, v) = torch.pca_lowrank(obj_features_flat, center=False)
121 | features_pca = torch.mm(obj_features_flat, v[:, :3])
122 | features_pca = (features_pca - features_pca.min(0)[0]) / (features_pca.max(0)[0] - features_pca.min(0)[0])
123 | X = features_pca
124 | # add feature_pixels as extra dimensions
125 | feature_points_torch = torch.tensor(feature_points, dtype=features_pca.dtype, device=features_pca.device)
126 | feature_points_torch = (feature_points_torch - feature_points_torch.min(0)[0]) / (feature_points_torch.max(0)[0] - feature_points_torch.min(0)[0])
127 | X = torch.cat([X, feature_points_torch], dim=-1)
128 | # cluster features to get meaningful regions
129 | cluster_ids_x, cluster_centers = kmeans(
130 | X=X,
131 | num_clusters=self.config['num_candidates_per_mask'],
132 | distance='euclidean',
133 | device=self.device,
134 | )
135 | cluster_centers = cluster_centers.to(self.device)
136 | for cluster_id in range(self.config['num_candidates_per_mask']):
137 | cluster_center = cluster_centers[cluster_id][:3]
138 | member_idx = cluster_ids_x == cluster_id
139 | member_points = feature_points[member_idx]
140 | member_pixels = feature_pixels[member_idx]
141 | member_features = features_pca[member_idx]
142 | dist = torch.norm(member_features - cluster_center, dim=-1)
143 | closest_idx = torch.argmin(dist)
144 | candidate_keypoints.append(member_points[closest_idx])
145 | candidate_pixels.append(member_pixels[closest_idx])
146 | candidate_rigid_group_ids.append(rigid_group_id)
147 |
148 | candidate_keypoints = np.array(candidate_keypoints)
149 | candidate_pixels = np.array(candidate_pixels)
150 | candidate_rigid_group_ids = np.array(candidate_rigid_group_ids)
151 |
152 | return candidate_keypoints, candidate_pixels, candidate_rigid_group_ids
153 |
154 | def _merge_clusters(self, candidate_keypoints):
155 | self.mean_shift.fit(candidate_keypoints)
156 | cluster_centers = self.mean_shift.cluster_centers_
157 | merged_indices = []
158 | for center in cluster_centers:
159 | dist = np.linalg.norm(candidate_keypoints - center, axis=-1)
160 | merged_indices.append(np.argmin(dist))
161 | return merged_indices
--------------------------------------------------------------------------------
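`KeypointProposer` turns a single RGB-D observation into candidate keypoints: DINOv2 patch features are clustered per object mask, out-of-workspace candidates are filtered, and nearby candidates are merged with mean shift. A minimal sketch of the call, mirroring how `main.py` wires it up (assumes `env` is a constructed `ReKepOGEnv`):

```python
from keypoint_proposal import KeypointProposer
from utils import get_config

global_config = get_config(config_path='./configs/config.yaml')
proposer = KeypointProposer(global_config['keypoint_proposer'])

cam_obs = env.get_cam_obs()
cam_id = global_config['main']['vlm_camera']
keypoints, projected_img = proposer.get_keypoints(
    cam_obs[cam_id]['rgb'],     # [H, W, 3] image
    cam_obs[cam_id]['points'],  # [H, W, 3] world-frame point cloud
    cam_obs[cam_id]['seg'],     # [H, W] semantic segmentation
)
# `keypoints` are 3D world-frame candidates; `projected_img` is the annotated image
# (numbered boxes) that is passed to the VLM for constraint generation.
```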
/main.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import numpy as np
3 | import json
4 | import os
5 | import argparse
6 | from environment import ReKepOGEnv
7 | from keypoint_proposal import KeypointProposer
8 | from constraint_generation import ConstraintGenerator
9 | from ik_solver import IKSolver
10 | from subgoal_solver import SubgoalSolver
11 | from path_solver import PathSolver
12 | from visualizer import Visualizer
13 | import transform_utils as T
14 | from omnigibson.robots.fetch import Fetch
15 | from utils import (
16 | bcolors,
17 | get_config,
18 | load_functions_from_txt,
19 | get_linear_interpolation_steps,
20 | spline_interpolate_poses,
21 | get_callable_grasping_cost_fn,
22 | print_opt_debug_dict,
23 | )
24 |
25 | class Main:
26 | def __init__(self, scene_file, visualize=False):
27 | global_config = get_config(config_path="./configs/config.yaml")
28 | self.config = global_config['main']
29 | self.bounds_min = np.array(self.config['bounds_min'])
30 | self.bounds_max = np.array(self.config['bounds_max'])
31 | self.visualize = visualize
32 | # set random seed
33 | np.random.seed(self.config['seed'])
34 | torch.manual_seed(self.config['seed'])
35 | torch.cuda.manual_seed(self.config['seed'])
36 | # initialize keypoint proposer and constraint generator
37 | self.keypoint_proposer = KeypointProposer(global_config['keypoint_proposer'])
38 | self.constraint_generator = ConstraintGenerator(global_config['constraint_generator'])
39 | # initialize environment
40 | self.env = ReKepOGEnv(global_config['env'], scene_file, verbose=False)
41 | # setup ik solver (for reachability cost)
42 | assert isinstance(self.env.robot, Fetch), "The IK solver assumes the robot is a Fetch robot"
43 | ik_solver = IKSolver(
44 | robot_description_path=self.env.robot.robot_arm_descriptor_yamls[self.env.robot.default_arm],
45 | robot_urdf_path=self.env.robot.urdf_path,
46 | eef_name=self.env.robot.eef_link_names[self.env.robot.default_arm],
47 | reset_joint_pos=self.env.reset_joint_pos,
48 | world2robot_homo=self.env.world2robot_homo,
49 | )
50 | # initialize solvers
51 | self.subgoal_solver = SubgoalSolver(global_config['subgoal_solver'], ik_solver, self.env.reset_joint_pos)
52 | self.path_solver = PathSolver(global_config['path_solver'], ik_solver, self.env.reset_joint_pos)
53 | # initialize visualizer
54 | if self.visualize:
55 | self.visualizer = Visualizer(global_config['visualizer'], self.env)
56 |
57 | def perform_task(self, instruction, rekep_program_dir=None, disturbance_seq=None):
58 | self.env.reset()
59 | cam_obs = self.env.get_cam_obs()
60 | rgb = cam_obs[self.config['vlm_camera']]['rgb']
61 | points = cam_obs[self.config['vlm_camera']]['points']
62 | mask = cam_obs[self.config['vlm_camera']]['seg']
63 | # ====================================
64 | # = keypoint proposal and constraint generation
65 | # ====================================
66 | if rekep_program_dir is None:
67 | keypoints, projected_img = self.keypoint_proposer.get_keypoints(rgb, points, mask)
68 | print(f'{bcolors.HEADER}Got {len(keypoints)} proposed keypoints{bcolors.ENDC}')
69 | if self.visualize:
70 | self.visualizer.show_img(projected_img)
71 | metadata = {'init_keypoint_positions': keypoints, 'num_keypoints': len(keypoints)}
72 | rekep_program_dir = self.constraint_generator.generate(projected_img, instruction, metadata)
73 | print(f'{bcolors.HEADER}Constraints generated{bcolors.ENDC}')
74 | # ====================================
75 | # = execute
76 | # ====================================
77 | self._execute(rekep_program_dir, disturbance_seq)
78 |
79 | def _update_disturbance_seq(self, stage, disturbance_seq):
80 | if disturbance_seq is not None:
81 | if stage in disturbance_seq and not self.applied_disturbance[stage]:
82 |                 # set the disturbance sequence; the generator yields one disturbance call per env step until it is exhausted
83 | self.env.disturbance_seq = disturbance_seq[stage](self.env)
84 | self.applied_disturbance[stage] = True
85 |
86 | def _execute(self, rekep_program_dir, disturbance_seq=None):
87 | # load metadata
88 | with open(os.path.join(rekep_program_dir, 'metadata.json'), 'r') as f:
89 | self.program_info = json.load(f)
90 | self.applied_disturbance = {stage: False for stage in range(1, self.program_info['num_stages'] + 1)}
91 | # register keypoints to be tracked
92 | self.env.register_keypoints(self.program_info['init_keypoint_positions'])
93 | # load constraints
94 | self.constraint_fns = dict()
95 | for stage in range(1, self.program_info['num_stages'] + 1): # stage starts with 1
96 | stage_dict = dict()
97 | for constraint_type in ['subgoal', 'path']:
98 | load_path = os.path.join(rekep_program_dir, f'stage{stage}_{constraint_type}_constraints.txt')
99 | get_grasping_cost_fn = get_callable_grasping_cost_fn(self.env) # special grasping function for VLM to call
100 | stage_dict[constraint_type] = load_functions_from_txt(load_path, get_grasping_cost_fn) if os.path.exists(load_path) else []
101 | self.constraint_fns[stage] = stage_dict
102 |
103 | # bookkeeping of which keypoints can be moved in the optimization
104 | self.keypoint_movable_mask = np.zeros(self.program_info['num_keypoints'] + 1, dtype=bool)
105 | self.keypoint_movable_mask[0] = True # first keypoint is always the ee, so it's movable
106 |
107 | # main loop
108 | self.last_sim_step_counter = -np.inf
109 | self._update_stage(1)
110 | while True:
111 | scene_keypoints = self.env.get_keypoint_positions()
112 | self.keypoints = np.concatenate([[self.env.get_ee_pos()], scene_keypoints], axis=0) # first keypoint is always the ee
113 | self.curr_ee_pose = self.env.get_ee_pose()
114 | self.curr_joint_pos = self.env.get_arm_joint_postions()
115 | self.sdf_voxels = self.env.get_sdf_voxels(self.config['sdf_voxel_size'])
116 | self.collision_points = self.env.get_collision_points()
117 | # ====================================
118 | # = decide whether to backtrack
119 | # ====================================
120 | backtrack = False
121 | if self.stage > 1:
122 | path_constraints = self.constraint_fns[self.stage]['path']
123 | for constraints in path_constraints:
124 | violation = constraints(self.keypoints[0], self.keypoints[1:])
125 | if violation > self.config['constraint_tolerance']:
126 | backtrack = True
127 | break
128 | if backtrack:
129 | # determine which stage to backtrack to based on constraints
130 | for new_stage in range(self.stage - 1, 0, -1):
131 | path_constraints = self.constraint_fns[new_stage]['path']
132 | # if no constraints, we can safely backtrack
133 | if len(path_constraints) == 0:
134 | break
135 | # otherwise, check if all constraints are satisfied
136 | all_constraints_satisfied = True
137 | for constraints in path_constraints:
138 | violation = constraints(self.keypoints[0], self.keypoints[1:])
139 | if violation > self.config['constraint_tolerance']:
140 | all_constraints_satisfied = False
141 | break
142 | if all_constraints_satisfied:
143 | break
144 | print(f"{bcolors.HEADER}[stage={self.stage}] backtrack to stage {new_stage}{bcolors.ENDC}")
145 | self._update_stage(new_stage)
146 | else:
147 | # apply disturbance
148 | self._update_disturbance_seq(self.stage, disturbance_seq)
149 | # ====================================
150 | # = get optimized plan
151 | # ====================================
152 | if self.last_sim_step_counter == self.env.step_counter:
153 | print(f"{bcolors.WARNING}sim did not step forward within last iteration (HINT: adjust action_steps_per_iter to be larger or the pos_threshold to be smaller){bcolors.ENDC}")
154 | next_subgoal = self._get_next_subgoal(from_scratch=self.first_iter)
155 | next_path = self._get_next_path(next_subgoal, from_scratch=self.first_iter)
156 | self.first_iter = False
157 | self.action_queue = next_path.tolist()
158 | self.last_sim_step_counter = self.env.step_counter
159 |
160 | # ====================================
161 | # = execute
162 | # ====================================
163 | # determine if we proceed to the next stage
164 | count = 0
165 | while len(self.action_queue) > 0 and count < self.config['action_steps_per_iter']:
166 | next_action = self.action_queue.pop(0)
167 | precise = len(self.action_queue) == 0
168 | self.env.execute_action(next_action, precise=precise)
169 | count += 1
170 | if len(self.action_queue) == 0:
171 | if self.is_grasp_stage:
172 | self._execute_grasp_action()
173 | elif self.is_release_stage:
174 | self._execute_release_action()
175 | # if completed, save video and return
176 | if self.stage == self.program_info['num_stages']:
177 | self.env.sleep(2.0)
178 | save_path = self.env.save_video()
179 | print(f"{bcolors.OKGREEN}Video saved to {save_path}\n\n{bcolors.ENDC}")
180 | return
181 | # progress to next stage
182 | self._update_stage(self.stage + 1)
183 |
184 | def _get_next_subgoal(self, from_scratch):
185 | subgoal_constraints = self.constraint_fns[self.stage]['subgoal']
186 | path_constraints = self.constraint_fns[self.stage]['path']
187 | subgoal_pose, debug_dict = self.subgoal_solver.solve(self.curr_ee_pose,
188 | self.keypoints,
189 | self.keypoint_movable_mask,
190 | subgoal_constraints,
191 | path_constraints,
192 | self.sdf_voxels,
193 | self.collision_points,
194 | self.is_grasp_stage,
195 | self.curr_joint_pos,
196 | from_scratch=from_scratch)
197 | subgoal_pose_homo = T.convert_pose_quat2mat(subgoal_pose)
198 | # if grasp stage, back up a bit to leave room for grasping
199 | if self.is_grasp_stage:
200 | subgoal_pose[:3] += subgoal_pose_homo[:3, :3] @ np.array([-self.config['grasp_depth'] / 2.0, 0, 0])
201 | debug_dict['stage'] = self.stage
202 | print_opt_debug_dict(debug_dict)
203 | if self.visualize:
204 | self.visualizer.visualize_subgoal(subgoal_pose)
205 | return subgoal_pose
206 |
207 | def _get_next_path(self, next_subgoal, from_scratch):
208 | path_constraints = self.constraint_fns[self.stage]['path']
209 | path, debug_dict = self.path_solver.solve(self.curr_ee_pose,
210 | next_subgoal,
211 | self.keypoints,
212 | self.keypoint_movable_mask,
213 | path_constraints,
214 | self.sdf_voxels,
215 | self.collision_points,
216 | self.curr_joint_pos,
217 | from_scratch=from_scratch)
218 | print_opt_debug_dict(debug_dict)
219 | processed_path = self._process_path(path)
220 | if self.visualize:
221 | self.visualizer.visualize_path(processed_path)
222 | return processed_path
223 |
224 | def _process_path(self, path):
225 | # spline interpolate the path from the current ee pose
226 | full_control_points = np.concatenate([
227 | self.curr_ee_pose.reshape(1, -1),
228 | path,
229 | ], axis=0)
230 | num_steps = get_linear_interpolation_steps(full_control_points[0], full_control_points[-1],
231 | self.config['interpolate_pos_step_size'],
232 | self.config['interpolate_rot_step_size'])
233 | dense_path = spline_interpolate_poses(full_control_points, num_steps)
234 | # add gripper action
235 | ee_action_seq = np.zeros((dense_path.shape[0], 8))
236 | ee_action_seq[:, :7] = dense_path
237 | ee_action_seq[:, 7] = self.env.get_gripper_null_action()
238 | return ee_action_seq
239 |
240 | def _update_stage(self, stage):
241 | # update stage
242 | self.stage = stage
243 | self.is_grasp_stage = self.program_info['grasp_keypoints'][self.stage - 1] != -1
244 | self.is_release_stage = self.program_info['release_keypoints'][self.stage - 1] != -1
245 | # can only be grasp stage or release stage or none
246 | assert self.is_grasp_stage + self.is_release_stage <= 1, "Cannot be both grasp and release stage"
247 | if self.is_grasp_stage: # ensure gripper is open for grasping stage
248 | self.env.open_gripper()
249 | # clear action queue
250 | self.action_queue = []
251 | # update keypoint movable mask
252 | self._update_keypoint_movable_mask()
253 | self.first_iter = True
254 |
255 | def _update_keypoint_movable_mask(self):
256 | for i in range(1, len(self.keypoint_movable_mask)): # first keypoint is ee so always movable
257 | keypoint_object = self.env.get_object_by_keypoint(i - 1)
258 | self.keypoint_movable_mask[i] = self.env.is_grasping(keypoint_object)
259 |
260 | def _execute_grasp_action(self):
261 | pregrasp_pose = self.env.get_ee_pose()
262 | grasp_pose = pregrasp_pose.copy()
263 | grasp_pose[:3] += T.quat2mat(pregrasp_pose[3:]) @ np.array([self.config['grasp_depth'], 0, 0])
264 | grasp_action = np.concatenate([grasp_pose, [self.env.get_gripper_close_action()]])
265 | self.env.execute_action(grasp_action, precise=True)
266 |
267 | def _execute_release_action(self):
268 | self.env.open_gripper()
269 |
270 | if __name__ == "__main__":
271 | parser = argparse.ArgumentParser()
272 | parser.add_argument('--task', type=str, default='pen', help='task to perform')
273 | parser.add_argument('--use_cached_query', action='store_true', help='instead of querying the VLM, use the cached query')
274 | parser.add_argument('--apply_disturbance', action='store_true', help='apply disturbance to test the robustness')
275 | parser.add_argument('--visualize', action='store_true', help='visualize each solution before executing (NOTE: this is blocking and needs to press "ESC" to continue)')
276 | args = parser.parse_args()
277 |
278 | if args.apply_disturbance:
279 | assert args.task == 'pen' and args.use_cached_query, 'disturbance sequence is only defined for cached scenario'
280 |
281 | # ====================================
282 | # = pen task disturbance sequence
283 | # ====================================
284 | def stage1_disturbance_seq(env):
285 | """
286 |         Move the pen in stage 1, when the robot is trying to grasp the pen
287 | """
288 | pen = env.og_env.scene.object_registry("name", "pen_1")
289 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1")
290 | # disturbance sequence
291 | pos0, orn0 = pen.get_position_orientation()
292 | pose0 = np.concatenate([pos0, orn0])
293 | pos1 = pos0 + np.array([-0.08, 0.0, 0.0])
294 | orn1 = T.quat_multiply(T.euler2quat(np.array([0, 0, np.pi/4])), orn0)
295 | pose1 = np.concatenate([pos1, orn1])
296 | pos2 = pos1 + np.array([0.10, 0.0, 0.0])
297 | orn2 = T.quat_multiply(T.euler2quat(np.array([0, 0, -np.pi/2])), orn1)
298 | pose2 = np.concatenate([pos2, orn2])
299 | control_points = np.array([pose0, pose1, pose2])
300 | pose_seq = spline_interpolate_poses(control_points, num_steps=25)
301 | def disturbance(counter):
302 | if counter < len(pose_seq):
303 | pose = pose_seq[counter]
304 | pos, orn = pose[:3], pose[3:]
305 | pen.set_position_orientation(pos, orn)
306 | counter += 1
307 | counter = 0
308 | while True:
309 | yield disturbance(counter)
310 | counter += 1
311 |
312 | def stage2_disturbance_seq(env):
313 | """
314 |         Take the pen out of the gripper in stage 2, when the robot is trying to reorient the pen
315 | """
316 | apply_disturbance = env.is_grasping()
317 | pen = env.og_env.scene.object_registry("name", "pen_1")
318 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1")
319 | # disturbance sequence
320 | pos0, orn0 = pen.get_position_orientation()
321 | pose0 = np.concatenate([pos0, orn0])
322 | pose1 = np.array([-0.30, -0.15, 0.71, -0.7071068, 0, 0, 0.7071068])
323 | control_points = np.array([pose0, pose1])
324 | pose_seq = spline_interpolate_poses(control_points, num_steps=25)
325 | def disturbance(counter):
326 | if apply_disturbance:
327 | if counter < 20:
328 | if counter > 15:
329 | env.robot.release_grasp_immediately() # force robot to release the pen
330 | else:
331 | pass # do nothing for the other steps
332 | elif counter < len(pose_seq) + 20:
333 | env.robot.release_grasp_immediately() # force robot to release the pen
334 | pose = pose_seq[counter - 20]
335 | pos, orn = pose[:3], pose[3:]
336 | pen.set_position_orientation(pos, orn)
337 | counter += 1
338 | counter = 0
339 | while True:
340 | yield disturbance(counter)
341 | counter += 1
342 |
343 | def stage3_disturbance_seq(env):
344 | """
345 |         Move the holder in stage 3, when the robot is trying to drop the pen into the holder
346 | """
347 | pen = env.og_env.scene.object_registry("name", "pen_1")
348 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1")
349 | # disturbance sequence
350 | pos0, orn0 = holder.get_position_orientation()
351 | pose0 = np.concatenate([pos0, orn0])
352 | pos1 = pos0 + np.array([-0.02, -0.15, 0.0])
353 | orn1 = orn0
354 | pose1 = np.concatenate([pos1, orn1])
355 | control_points = np.array([pose0, pose1])
356 | pose_seq = spline_interpolate_poses(control_points, num_steps=5)
357 | def disturbance(counter):
358 | if counter < len(pose_seq):
359 | pose = pose_seq[counter]
360 | pos, orn = pose[:3], pose[3:]
361 | holder.set_position_orientation(pos, orn)
362 | counter += 1
363 | counter = 0
364 | while True:
365 | yield disturbance(counter)
366 | counter += 1
367 |
368 | task_list = {
369 | 'pen': {
370 | 'scene_file': './configs/og_scene_file_pen.json',
371 | 'instruction': 'reorient the white pen and drop it upright into the black pen holder',
372 | 'rekep_program_dir': './vlm_query/pen',
373 | 'disturbance_seq': {1: stage1_disturbance_seq, 2: stage2_disturbance_seq, 3: stage3_disturbance_seq},
374 | },
375 | }
376 | task = task_list['pen']
377 | scene_file = task['scene_file']
378 | instruction = task['instruction']
379 | main = Main(scene_file, visualize=args.visualize)
380 | main.perform_task(instruction,
381 | rekep_program_dir=task['rekep_program_dir'] if args.use_cached_query else None,
382 | disturbance_seq=task.get('disturbance_seq', None) if args.apply_disturbance else None)
--------------------------------------------------------------------------------
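Each disturbance sequence above is a generator that `ReKepOGEnv._step` advances once per simulation step, so new disturbances can be added by following the same pattern. A minimal sketch of a hypothetical `my_disturbance_seq` (not part of the repo); the object name `pen_1` follows the cached pen scene:

```python
import numpy as np

def my_disturbance_seq(env):
    pen = env.og_env.scene.object_registry("name", "pen_1")
    counter = 0
    while True:
        if counter == 30:
            # nudge the pen 2 cm along x, 30 simulation steps into the stage
            pos, orn = pen.get_position_orientation()
            pen.set_position_orientation(pos + np.array([0.02, 0.0, 0.0]), orn)
        counter += 1
        yield
```

It would be registered per stage, e.g. `'disturbance_seq': {1: my_disturbance_seq}` in `task_list`, and is only applied when running with `--apply_disturbance --use_cached_query`.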
/media/pen-in-holder-disturbances.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huangwl18/ReKep/63c43fdba60354980258beaeb8a7d48e088e1e3e/media/pen-in-holder-disturbances.gif
--------------------------------------------------------------------------------
/media/pen-in-holder.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huangwl18/ReKep/63c43fdba60354980258beaeb8a7d48e088e1e3e/media/pen-in-holder.gif
--------------------------------------------------------------------------------
/og_utils.py:
--------------------------------------------------------------------------------
1 | from omnigibson.sensors.vision_sensor import VisionSensor
2 | import transform_utils as T
3 | import numpy as np
4 |
5 | class OGCamera:
6 | """
7 | Defines the camera class
8 | """
9 | def __init__(self, og_env, config) -> None:
10 | self.cam = insert_camera(name=config['name'], og_env=og_env, width=config['resolution'], height=config['resolution'])
11 | self.cam.set_position_orientation(config['position'], config['orientation'])
12 | self.intrinsics = get_cam_intrinsics(self.cam)
13 | self.extrinsics = get_cam_extrinsics(self.cam)
14 |
15 | def get_params(self):
16 | """
17 | Get the intrinsic and extrinsic parameters of the camera
18 | """
19 | return {"intrinsics": self.intrinsics, "extrinsics": self.extrinsics}
20 |
21 | def get_obs(self):
22 | """
23 | Gets the image observation from the camera.
24 |         Assumes the scene has been rendered before calling this function.
25 | No semantic handling here for now.
26 | """
27 | obs = self.cam.get_obs()
28 | ret = {}
29 | ret["rgb"] = obs[0]["rgb"][:,:,:3] # H, W, 3
30 | ret["depth"] = obs[0]["depth_linear"] # H, W
31 | ret["points"] = pixel_to_3d_points(ret["depth"], self.intrinsics, self.extrinsics) # H, W, 3
32 | ret["seg"] = obs[0]["seg_semantic"] # H, W
33 | ret["intrinsic"] = self.intrinsics
34 | ret["extrinsic"] = self.extrinsics
35 | return ret
36 |
37 | def insert_camera(name, og_env, width=480, height=480):
38 | try:
39 | cam = VisionSensor(
40 | prim_path=f"/World/{name}",
41 | name=name,
42 | image_width=width,
43 | image_height=height,
44 | modalities=["rgb", "depth_linear", "seg_semantic"]
45 | )
46 | except TypeError:
47 | cam = VisionSensor(
48 | relative_prim_path=f"/{name}",
49 | name=name,
50 | image_width=width,
51 | image_height=height,
52 | modalities=["rgb", "depth_linear", "seg_semantic"]
53 | )
54 |
55 | try:
56 | cam.load()
57 | except TypeError:
58 | cam.load(og_env.scene)
59 | cam.initialize()
60 | return cam
61 |
62 | def get_cam_intrinsics(cam):
63 | """
64 | Get the intrinsics matrix for a VisionSensor object
65 | ::param cam: VisionSensor object
66 | ::return intrinsics: 3x3 numpy array
67 | """
68 | img_width = cam.image_width
69 | img_height = cam.image_height
70 |
71 | if img_width != img_height:
72 | raise ValueError("Only square images are supported")
73 |
74 | apert = cam.prim.GetAttribute("horizontalAperture").Get()
75 | focal_len_in_pixel = cam.focal_length * img_width / apert
76 |
77 | intrinsics = np.eye(3)
78 | intrinsics[0,0] = focal_len_in_pixel
79 | intrinsics[1,1] = focal_len_in_pixel
80 | intrinsics[0,2] = img_width / 2
81 | intrinsics[1,2] = img_height / 2
82 |
83 | return intrinsics
84 |
85 | def get_cam_extrinsics(cam):
86 | return T.pose_inv(T.pose2mat(cam.get_position_orientation()))
87 |
88 | def pixel_to_3d_points(depth_image, intrinsics, extrinsics):
89 | # Get the shape of the depth image
90 | H, W = depth_image.shape
91 |
92 | # Create a grid of (x, y) coordinates corresponding to each pixel in the image
93 | i, j = np.meshgrid(np.arange(W), np.arange(H), indexing='xy')
94 |
95 | # Unpack the intrinsic parameters
96 | fx, fy = intrinsics[0, 0], intrinsics[1, 1]
97 | cx, cy = intrinsics[0, 2], intrinsics[1, 2]
98 |
99 | # Convert pixel coordinates to normalized camera coordinates
100 | z = depth_image
101 | x = (i - cx) * z / fx
102 | y = (j - cy) * z / fy
103 |
104 | # Stack the coordinates to form (H, W, 3)
105 | camera_coordinates = np.stack((x, y, z), axis=-1)
106 |
107 | # Reshape to (H*W, 3) for matrix multiplication
108 | camera_coordinates = camera_coordinates.reshape(-1, 3)
109 |
110 | # Convert to homogeneous coordinates (H*W, 4)
111 | camera_coordinates_homogeneous = np.hstack((camera_coordinates, np.ones((camera_coordinates.shape[0], 1))))
112 |
113 | # additional conversion to og convention
114 | T_mod = np.array([[1., 0., 0., 0., ],
115 | [0., -1., 0., 0.,],
116 | [0., 0., -1., 0.,],
117 | [0., 0., 0., 1.,]])
118 | camera_coordinates_homogeneous = camera_coordinates_homogeneous @ T_mod
119 |
120 | # Apply extrinsics to get world coordinates
121 | # world_coordinates_homogeneous = camera_coordinates_homogeneous @ extrinsics.T
122 | world_coordinates_homogeneous = T.pose_inv(extrinsics) @ (camera_coordinates_homogeneous.T)
123 | world_coordinates_homogeneous = world_coordinates_homogeneous.T
124 |
125 | # Convert back to non-homogeneous coordinates
126 | world_coordinates = world_coordinates_homogeneous[:, :3] / world_coordinates_homogeneous[:, 3, np.newaxis]
127 |
128 | # Reshape back to (H, W, 3)
129 | world_coordinates = world_coordinates.reshape(H, W, 3)
130 |
131 | return world_coordinates
132 |
133 | def point_to_pixel(pt, intrinsics, extrinsics):
134 | """
135 | pt -- (N, 3) 3d points in world frame
136 | intrinsics -- (3, 3) intrinsics matrix
137 | extrinsics -- (4, 4) extrinsics matrix
138 | """
139 | pt_in_cam = extrinsics @ np.hstack((pt, np.ones((pt.shape[0], 1)))).T # (4, N)
140 | # multiply y, z by -1
141 | pt_in_cam[1, :] *= -1
142 | pt_in_cam[2, :] *= -1
143 | pt_in_cam = pt_in_cam[:3, :]
144 | pt_in_cam = intrinsics @ pt_in_cam
145 | pt_in_cam /= pt_in_cam[2, :]
146 |
147 | return pt_in_cam[:2, :].T
--------------------------------------------------------------------------------
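`pixel_to_3d_points` and `point_to_pixel` are (approximately) inverse operations under the same intrinsics and extrinsics, including the axis flip into the OmniGibson camera convention. A minimal round-trip sketch with synthetic camera parameters (no `VisionSensor` is instantiated, although importing `og_utils` still requires OmniGibson):

```python
import numpy as np
from og_utils import pixel_to_3d_points, point_to_pixel

H = W = 4
intrinsics = np.array([[200.0,   0.0, W / 2],
                       [  0.0, 200.0, H / 2],
                       [  0.0,   0.0,   1.0]])
extrinsics = np.eye(4)            # camera frame coincides with the world frame
depth = np.full((H, W), 2.0)      # constant 2 m linear depth

points = pixel_to_3d_points(depth, intrinsics, extrinsics)              # (H, W, 3) world points
pixels = point_to_pixel(points.reshape(-1, 3), intrinsics, extrinsics)  # (H*W, 2)
# `pixels` should recover the original (column, row) pixel grid up to floating point error.
```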
/path_solver.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.optimize import dual_annealing, minimize
3 | from scipy.interpolate import RegularGridInterpolator
4 | import copy
5 | import time
6 | import transform_utils as T
7 | from utils import (
8 | farthest_point_sampling,
9 | get_linear_interpolation_steps,
10 | linear_interpolate_poses,
11 | normalize_vars,
12 | unnormalize_vars,
13 | get_samples_jitted,
14 | calculate_collision_cost,
15 | path_length,
16 | transform_keypoints,
17 | )
18 |
19 | # ====================================
20 | # = objective function
21 | # ====================================
22 | def objective(opt_vars,
23 | og_bounds,
24 | start_pose,
25 | end_pose,
26 | keypoints_centered,
27 | keypoint_movable_mask,
28 | path_constraints,
29 | sdf_func,
30 | collision_points_centered,
31 | opt_interpolate_pos_step_size,
32 | opt_interpolate_rot_step_size,
33 | ik_solver,
34 | initial_joint_pos,
35 | reset_joint_pos,
36 | return_debug_dict=False):
37 |
38 | debug_dict = {}
39 | debug_dict['num_control_points'] = len(opt_vars) // 6
40 |
41 | # unnormalize variables and do conversion
42 | unnormalized_opt_vars = unnormalize_vars(opt_vars, og_bounds)
43 | control_points_euler = np.concatenate([start_pose[None], unnormalized_opt_vars.reshape(-1, 6), end_pose[None]], axis=0) # [num_control_points, 6]
44 | control_points_homo = T.convert_pose_euler2mat(control_points_euler) # [num_control_points, 4, 4]
45 | control_points_quat = T.convert_pose_mat2quat(control_points_homo) # [num_control_points, 7]
46 | # get dense samples
47 | poses_quat, num_poses = get_samples_jitted(control_points_homo, control_points_quat, opt_interpolate_pos_step_size, opt_interpolate_rot_step_size)
48 | poses_homo = T.convert_pose_quat2mat(poses_quat)
49 | debug_dict['num_poses'] = num_poses
50 | start_idx, end_idx = 1, num_poses - 1 # exclude start and goal
51 |
52 | cost = 0
53 | # collision cost
54 | if collision_points_centered is not None:
55 | collision_cost = 0.5 * calculate_collision_cost(poses_homo[start_idx:end_idx], sdf_func, collision_points_centered, 0.20)
56 | debug_dict['collision_cost'] = collision_cost
57 | cost += collision_cost
58 |
59 | # penalize path length
60 | pos_length, rot_length = path_length(poses_homo)
61 | approx_length = pos_length + rot_length * 1.0
62 | path_length_cost = 4.0 * approx_length
63 | debug_dict['path_length_cost'] = path_length_cost
64 | cost += path_length_cost
65 |
66 | # reachability cost
67 | ik_cost = 0
68 | reset_reg_cost = 0
69 | debug_dict['ik_pos_error'] = []
70 | debug_dict['ik_feasible'] = []
71 | max_iterations = 20
72 | for control_point_homo in control_points_homo:
73 | ik_result = ik_solver.solve(
74 | control_point_homo,
75 | max_iterations=max_iterations,
76 | initial_joint_pos=initial_joint_pos,
77 | )
78 | debug_dict['ik_pos_error'].append(ik_result.position_error)
79 | debug_dict['ik_feasible'].append(ik_result.success)
80 | ik_cost += 20.0 * (ik_result.num_descents / max_iterations)
81 | if ik_result.success:
82 | reset_reg = np.linalg.norm(ik_result.cspace_position[:-1] - reset_joint_pos[:-1])
83 | reset_reg = np.clip(reset_reg, 0.0, 3.0)
84 | else:
85 | reset_reg = 3.0
86 | reset_reg_cost += 0.2 * reset_reg
87 | debug_dict['ik_pos_error'] = np.array(debug_dict['ik_pos_error'])
88 | debug_dict['ik_feasible'] = np.array(debug_dict['ik_feasible'])
89 | debug_dict['ik_cost'] = ik_cost
90 | debug_dict['reset_reg_cost'] = reset_reg_cost
91 | cost += ik_cost
92 |
93 |     # path constraint violation cost
94 | debug_dict['path_violation'] = None
95 | if path_constraints is not None and len(path_constraints) > 0:
96 | path_constraint_cost = 0
97 | path_violation = []
98 | for pose in poses_homo[start_idx:end_idx]:
99 | transformed_keypoints = transform_keypoints(pose, keypoints_centered, keypoint_movable_mask)
100 | for constraint in path_constraints:
101 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:])
102 | path_violation.append(violation)
103 | path_constraint_cost += np.clip(violation, 0, np.inf)
104 | path_constraint_cost = 200.0*path_constraint_cost
105 | debug_dict['path_constraint_cost'] = path_constraint_cost
106 | debug_dict['path_violation'] = path_violation
107 | cost += path_constraint_cost
108 |
109 | debug_dict['total_cost'] = cost
110 |
111 | if return_debug_dict:
112 | return cost, debug_dict
113 |
114 | return cost
115 |
116 |
117 | class PathSolver:
118 | """
119 | Given a goal pose and a start pose, solve for a sequence of intermediate poses for the end effector to follow.
120 |
121 | Optimization variables:
122 | - sequence of intermediate control points
123 | """
124 |
125 | def __init__(self, config, ik_solver, reset_joint_pos):
126 | self.config = config
127 | self.ik_solver = ik_solver
128 | self.reset_joint_pos = reset_joint_pos
129 | self.last_opt_result = None
130 | # warmup
131 | self._warmup()
132 |
133 | def _warmup(self):
134 | start_pose = np.array([0.0, 0.0, 0.3, 0, 0, 0, 1])
135 | end_pose = np.array([0.0, 0.0, 0.0, 0, 0, 0, 1])
136 | keypoints = np.random.rand(10, 3)
137 | keypoint_movable_mask = np.random.rand(10) > 0.5
138 | path_constraints = []
139 | sdf_voxels = np.zeros((10, 10, 10))
140 | collision_points = np.random.rand(100, 3)
141 | self.solve(start_pose, end_pose, keypoints, keypoint_movable_mask, path_constraints, sdf_voxels, collision_points, None, from_scratch=True)
142 | self.last_opt_result = None
143 |
144 | def _setup_sdf(self, sdf_voxels):
145 | # create callable sdf function with interpolation
146 | x = np.linspace(self.config['bounds_min'][0], self.config['bounds_max'][0], sdf_voxels.shape[0])
147 | y = np.linspace(self.config['bounds_min'][1], self.config['bounds_max'][1], sdf_voxels.shape[1])
148 | z = np.linspace(self.config['bounds_min'][2], self.config['bounds_max'][2], sdf_voxels.shape[2])
149 | sdf_func = RegularGridInterpolator((x, y, z), sdf_voxels, bounds_error=False, fill_value=0)
150 | return sdf_func
151 |
152 | def _check_opt_result(self, opt_result, path_quat, debug_dict, og_bounds):
153 |         # accept the opt_result if it was terminated only due to the iteration limit
154 | if (not opt_result.success and ('maximum' in opt_result.message.lower() or 'iteration' in opt_result.message.lower() or 'not necessarily' in opt_result.message.lower())):
155 | opt_result.success = True
156 | elif not opt_result.success:
157 | opt_result.message += '; invalid solution'
158 | # check whether path constraints are satisfied
159 | if debug_dict['path_violation'] is not None:
160 | path_violation = np.array(debug_dict['path_violation'])
161 | opt_result.message += f'; path_violation: {path_violation}'
162 | path_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in path_violation])
163 | if not path_constraints_satisfied:
164 | opt_result.success = False
165 | opt_result.message += f'; path constraint not satisfied'
166 | return opt_result
167 |
168 | def _center_collision_points_and_keypoints(self, ee_pose, collision_points, keypoints, keypoint_movable_mask):
169 | ee_pose_homo = T.pose2mat([ee_pose[:3], T.euler2quat(ee_pose[3:])])
170 | centering_transform = np.linalg.inv(ee_pose_homo)
171 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
172 | keypoints_centered = transform_keypoints(centering_transform, keypoints, keypoint_movable_mask)
173 | return collision_points_centered, keypoints_centered
174 |
175 | def solve(self,
176 | start_pose,
177 | end_pose,
178 | keypoints,
179 | keypoint_movable_mask,
180 | path_constraints,
181 | sdf_voxels,
182 | collision_points,
183 | initial_joint_pos,
184 | from_scratch=False):
185 | """
186 | Args:
187 | - start_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw]
188 | - end_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw]
189 | - keypoints (np.ndarray): [num_keypoints, 3]
190 |         - keypoint_movable_mask (np.ndarray): [num_keypoints] boolean mask indicating whether each keypoint is on the object being grasped
191 | - path_constraints (List[Callable]): path constraints
192 | - sdf_voxels (np.ndarray): [H, W, D]
193 | - collision_points (np.ndarray): [num_points, 3], point cloud of the object being grasped
194 | - initial_joint_pos (np.ndarray): [N] initial joint positions of the robot.
195 | - from_scratch (bool): whether to start from scratch
196 |
197 | Returns:
198 |         - poses_quat (np.ndarray): [num_control_points - 1, 7] optimized control points followed by end_pose (each [x, y, z, qx, qy, qz, qw])
199 | - debug_dict (dict): debug information
200 | """
201 | # downsample collision points
202 | if collision_points is not None and collision_points.shape[0] > self.config['max_collision_points']:
203 | collision_points = farthest_point_sampling(collision_points, self.config['max_collision_points'])
204 | sdf_func = self._setup_sdf(sdf_voxels)
205 |
206 | # ====================================
207 | # = setup bounds
208 | # ====================================
209 | # calculate an appropriate number of control points, including start and goal
210 | num_control_points = get_linear_interpolation_steps(start_pose, end_pose, self.config['opt_pos_step_size'], self.config['opt_rot_step_size'])
211 | num_control_points = np.clip(num_control_points, 3, 6)
212 | # transform to euler representation
213 | start_pose = np.concatenate([start_pose[:3], T.quat2euler(start_pose[3:])])
214 | end_pose = np.concatenate([end_pose[:3], T.quat2euler(end_pose[3:])])
215 |
216 | # bounds for decision variables
217 | og_bounds = [(b_min, b_max) for b_min, b_max in zip(self.config['bounds_min'], self.config['bounds_max'])] + \
218 | [(-np.pi, np.pi) for _ in range(3)]
219 | og_bounds *= (num_control_points - 2)
220 | og_bounds = np.array(og_bounds, dtype=np.float64)
221 | bounds = [(-1, 1)] * len(og_bounds)
222 | num_vars = len(bounds)
223 |
224 | # ====================================
225 | # = setup initial guess
226 | # ====================================
227 | # use previous opt_result as initial guess if available
228 | if not from_scratch and self.last_opt_result is not None:
229 | init_sol = self.last_opt_result.x
230 | # if there are more control points in this iter, fill the rest with the last value + small noise
231 | if len(init_sol) < num_vars:
232 | new_x0 = np.empty(num_vars)
233 | new_x0[:len(init_sol)] = init_sol
234 | for i in range(len(init_sol), num_vars, 6):
235 | new_x0[i:i+6] = init_sol[-6:] + np.random.randn(6) * 0.01
236 | init_sol = new_x0
237 | # otherwise, use the last num_vars values
238 | else:
239 | init_sol = init_sol[-num_vars:]
240 | # initial guess as linear interpolation
241 | else:
242 | from_scratch = True
243 | interp_poses = linear_interpolate_poses(start_pose, end_pose, num_control_points) # [num_control_points, 6]
244 | init_sol = interp_poses[1:-1].flatten() # [num_control_points-2, 6]
245 | init_sol = normalize_vars(init_sol, og_bounds)
246 |
247 | # clip the initial guess to be within bounds
248 | for i, (b_min, b_max) in enumerate(bounds):
249 | init_sol[i] = np.clip(init_sol[i], b_min, b_max)
250 |
251 | # ====================================
252 | # = other setup
253 | # ====================================
254 | collision_points_centered, keypoints_centered = self._center_collision_points_and_keypoints(start_pose, collision_points, keypoints, keypoint_movable_mask)
255 | aux_args = (og_bounds,
256 | start_pose,
257 | end_pose,
258 | keypoints_centered,
259 | keypoint_movable_mask,
260 | path_constraints,
261 | sdf_func,
262 | collision_points_centered,
263 | self.config['opt_interpolate_pos_step_size'],
264 | self.config['opt_interpolate_rot_step_size'],
265 | self.ik_solver,
266 | initial_joint_pos,
267 | self.reset_joint_pos)
268 |
269 | # ====================================
270 | # = solve optimization
271 | # ====================================
272 | start = time.time()
273 | # use global optimization for the first iteration
274 | if from_scratch:
275 | opt_result = dual_annealing(
276 | func=objective,
277 | bounds=bounds,
278 | args=aux_args,
279 | maxfun=self.config['sampling_maxfun'],
280 | x0=init_sol,
281 | no_local_search=True,
282 | minimizer_kwargs={
283 | 'method': 'SLSQP',
284 | 'options': self.config['minimizer_options'],
285 | },
286 | )
287 | # use gradient-based local optimization for the following iterations
288 | else:
289 | opt_result = minimize(
290 | fun=objective,
291 | x0=init_sol,
292 | args=aux_args,
293 | bounds=bounds,
294 | method='SLSQP',
295 | options=self.config['minimizer_options'],
296 | )
297 | solve_time = time.time() - start
298 |
299 | # ====================================
300 | # = post-process opt_result
301 | # ====================================
302 | if isinstance(opt_result.message, list):
303 | opt_result.message = opt_result.message[0]
304 | # rerun to get debug info
305 | _, debug_dict = objective(opt_result.x, *aux_args, return_debug_dict=True)
306 | debug_dict['sol'] = opt_result.x.reshape(-1, 6)
307 | debug_dict['msg'] = opt_result.message
308 | debug_dict['solve_time'] = solve_time
309 | debug_dict['from_scratch'] = from_scratch
310 | debug_dict['type'] = 'path_solver'
311 |         # unnormalize the solution
312 | sol = unnormalize_vars(opt_result.x, og_bounds)
313 | # add end pose
314 | poses_euler = np.concatenate([sol.reshape(-1, 6), end_pose[None]], axis=0)
315 | poses_quat = T.convert_pose_euler2quat(poses_euler) # [num_control_points, 7]
316 | opt_result = self._check_opt_result(opt_result, poses_quat, debug_dict, og_bounds)
317 | # cache opt_result for future use if successful
318 | if opt_result.success:
319 | self.last_opt_result = copy.deepcopy(opt_result)
320 | return poses_quat, debug_dict
--------------------------------------------------------------------------------
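`PathSolver.solve` optimizes only the intermediate control points (the start and end poses are fixed), using dual annealing on the first call and warm-started SLSQP afterwards. A minimal sketch of a call, mirroring `Main._get_next_path`; `global_config`, `ik_solver`, `env`, `next_subgoal`, and the per-iteration quantities (`keypoints`, `movable_mask`, `constraint_fns`, `sdf_voxels`, `collision_points`) are assumed to be set up as in `main.py`:

```python
from path_solver import PathSolver

path_solver = PathSolver(global_config['path_solver'], ik_solver, env.reset_joint_pos)

path, debug_dict = path_solver.solve(
    start_pose=env.get_ee_pose(),        # [7] current EE pose (xyz + xyzw quaternion)
    end_pose=next_subgoal,               # [7] subgoal returned by the SubgoalSolver
    keypoints=keypoints,                 # [num_keypoints + 1, 3], EE keypoint first
    keypoint_movable_mask=movable_mask,  # [num_keypoints + 1] bool
    path_constraints=constraint_fns,     # callables of (ee_keypoint, other_keypoints)
    sdf_voxels=sdf_voxels,
    collision_points=collision_points,
    initial_joint_pos=env.get_arm_joint_postions(),
    from_scratch=True,                   # global search on the first call
)
# `path` is a sequence of [7] quaternion poses ending at the subgoal; `debug_dict`
# carries the per-term costs that print_opt_debug_dict reports.
```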
/requirements.txt:
--------------------------------------------------------------------------------
1 | open3d==0.16.0
2 | omegaconf
3 | opencv-python
4 | hydra-core
5 | openai
6 | torch
7 | kmeans_pytorch
8 | parse
--------------------------------------------------------------------------------
/subgoal_solver.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import time
3 | import copy
4 | from scipy.optimize import dual_annealing, minimize
5 | from scipy.interpolate import RegularGridInterpolator
6 | import transform_utils as T
7 | from utils import (
8 | transform_keypoints,
9 | calculate_collision_cost,
10 | normalize_vars,
11 | unnormalize_vars,
12 | farthest_point_sampling,
13 | consistency,
14 | )
15 | def objective(opt_vars,
16 | og_bounds,
17 | keypoints_centered,
18 | keypoint_movable_mask,
19 | goal_constraints,
20 | path_constraints,
21 | sdf_func,
22 | collision_points_centered,
23 | init_pose_homo,
24 | ik_solver,
25 | initial_joint_pos,
26 | reset_joint_pos,
27 | is_grasp_stage,
28 | return_debug_dict=False):
29 |
30 | debug_dict = {}
31 | # unnormalize variables and do conversion
32 | opt_pose = unnormalize_vars(opt_vars, og_bounds)
33 | opt_pose_homo = T.pose2mat([opt_pose[:3], T.euler2quat(opt_pose[3:])])
34 |
35 | cost = 0
36 | # collision cost
37 | if collision_points_centered is not None:
38 | collision_cost = 0.8 * calculate_collision_cost(opt_pose_homo[None], sdf_func, collision_points_centered, 0.10)
39 | debug_dict['collision_cost'] = collision_cost
40 | cost += collision_cost
41 |
42 | # stay close to initial pose
43 | init_pose_cost = 1.0 * consistency(opt_pose_homo[None], init_pose_homo[None], rot_weight=1.5)
44 | debug_dict['init_pose_cost'] = init_pose_cost
45 | cost += init_pose_cost
46 |
47 | # reachability cost (approximated by number of IK iterations + regularization from reset joint pos)
48 | max_iterations = 20
49 | ik_result = ik_solver.solve(
50 | opt_pose_homo,
51 | max_iterations=max_iterations,
52 | initial_joint_pos=initial_joint_pos,
53 | )
54 | ik_cost = 20.0 * (ik_result.num_descents / max_iterations)
55 | debug_dict['ik_feasible'] = ik_result.success
56 | debug_dict['ik_pos_error'] = ik_result.position_error
57 | debug_dict['ik_cost'] = ik_cost
58 | cost += ik_cost
59 | if ik_result.success:
60 | reset_reg = np.linalg.norm(ik_result.cspace_position[:-1] - reset_joint_pos[:-1])
61 | reset_reg = np.clip(reset_reg, 0.0, 3.0)
62 | else:
63 | reset_reg = 3.0
64 | reset_reg_cost = 0.2 * reset_reg
65 | debug_dict['reset_reg_cost'] = reset_reg_cost
66 | cost += reset_reg_cost
67 |
68 | # grasp metric (better performance if using anygrasp or force-based grasp metrics)
69 | if is_grasp_stage:
70 | preferred_dir = np.array([0, 0, -1])
71 |         grasp_cost = -np.dot(opt_pose_homo[:3, 0], preferred_dir) + 1 # in [0, 2]
72 | grasp_cost = 10.0 * grasp_cost
73 | debug_dict['grasp_cost'] = grasp_cost
74 | cost += grasp_cost
75 |
76 | # goal constraint violation cost
77 | debug_dict['subgoal_constraint_cost'] = None
78 | debug_dict['subgoal_violation'] = None
79 | if goal_constraints is not None and len(goal_constraints) > 0:
80 | subgoal_constraint_cost = 0
81 | transformed_keypoints = transform_keypoints(opt_pose_homo, keypoints_centered, keypoint_movable_mask)
82 | subgoal_violation = []
83 | for constraint in goal_constraints:
84 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:])
85 | subgoal_violation.append(violation)
86 | subgoal_constraint_cost += np.clip(violation, 0, np.inf)
87 | subgoal_constraint_cost = 200.0*subgoal_constraint_cost
88 | debug_dict['subgoal_constraint_cost'] = subgoal_constraint_cost
89 | debug_dict['subgoal_violation'] = subgoal_violation
90 | cost += subgoal_constraint_cost
91 |
92 | # path constraint violation cost
93 | debug_dict['path_violation'] = None
94 | if path_constraints is not None and len(path_constraints) > 0:
95 | path_constraint_cost = 0
96 | transformed_keypoints = transform_keypoints(opt_pose_homo, keypoints_centered, keypoint_movable_mask)
97 | path_violation = []
98 | for constraint in path_constraints:
99 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:])
100 | path_violation.append(violation)
101 | path_constraint_cost += np.clip(violation, 0, np.inf)
102 | path_constraint_cost = 200.0*path_constraint_cost
103 | debug_dict['path_constraint_cost'] = path_constraint_cost
104 | debug_dict['path_violation'] = path_violation
105 | cost += path_constraint_cost
106 |
107 | debug_dict['total_cost'] = cost
108 |
109 | if return_debug_dict:
110 | return cost, debug_dict
111 |
112 | return cost
113 |
114 |
115 | class SubgoalSolver:
116 | def __init__(self, config, ik_solver, reset_joint_pos):
117 | self.config = config
118 | self.ik_solver = ik_solver
119 | self.reset_joint_pos = reset_joint_pos
120 | self.last_opt_result = None
121 | # warmup
122 | self._warmup()
123 |
124 | def _warmup(self):
125 | ee_pose = np.array([0.0, 0.0, 0.0, 0, 0, 0, 1])
126 | keypoints = np.random.rand(10, 3)
127 | keypoint_movable_mask = np.random.rand(10) > 0.5
128 | goal_constraints = []
129 | path_constraints = []
130 | sdf_voxels = np.zeros((10, 10, 10))
131 | collision_points = np.random.rand(100, 3)
132 | self.solve(ee_pose, keypoints, keypoint_movable_mask, goal_constraints, path_constraints, sdf_voxels, collision_points, True, None, from_scratch=True)
133 | self.last_opt_result = None
134 |
135 | def _setup_sdf(self, sdf_voxels):
136 | # create callable sdf function with interpolation
137 | x = np.linspace(self.config['bounds_min'][0], self.config['bounds_max'][0], sdf_voxels.shape[0])
138 | y = np.linspace(self.config['bounds_min'][1], self.config['bounds_max'][1], sdf_voxels.shape[1])
139 | z = np.linspace(self.config['bounds_min'][2], self.config['bounds_max'][2], sdf_voxels.shape[2])
140 | sdf_func = RegularGridInterpolator((x, y, z), sdf_voxels, bounds_error=False, fill_value=0)
141 | return sdf_func
142 |
143 | def _check_opt_result(self, opt_result, debug_dict):
144 | # accept the opt_result if it's only terminated due to iteration limit
145 | if (not opt_result.success and ('maximum' in opt_result.message.lower() or 'iteration' in opt_result.message.lower() or 'not necessarily' in opt_result.message.lower())):
146 | opt_result.success = True
147 | elif not opt_result.success:
148 | opt_result.message += '; invalid solution'
149 | # check whether goal constraints are satisfied
150 | if debug_dict['subgoal_violation'] is not None:
151 | goal_constraints_results = np.array(debug_dict['subgoal_violation'])
152 | opt_result.message += f'; goal_constraints_results: {goal_constraints_results} (higher is worse)'
153 | goal_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in goal_constraints_results])
154 | if not goal_constraints_satisfied:
155 | opt_result.success = False
156 | opt_result.message += f'; goal not satisfied'
157 | # check whether path constraints are satisfied
158 | if debug_dict['path_violation'] is not None:
159 | path_constraints_results = np.array(debug_dict['path_violation'])
160 | opt_result.message += f'; path_constraints_results: {path_constraints_results}'
161 | path_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in path_constraints_results])
162 | if not path_constraints_satisfied:
163 | opt_result.success = False
164 | opt_result.message += f'; path not satisfied'
165 | # check whether ik is feasible
166 | if 'ik_feasible' in debug_dict and not debug_dict['ik_feasible']:
167 | opt_result.success = False
168 | opt_result.message += f'; ik not feasible'
169 | return opt_result
170 |
171 | def _center_collision_points_and_keypoints(self, ee_pose_homo, collision_points, keypoints, keypoint_movable_mask):
172 | centering_transform = np.linalg.inv(ee_pose_homo)
173 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
174 | keypoints_centered = transform_keypoints(centering_transform, keypoints, keypoint_movable_mask)
175 | return collision_points_centered, keypoints_centered
176 |
177 | def solve(self,
178 | ee_pose,
179 | keypoints,
180 | keypoint_movable_mask,
181 | goal_constraints,
182 | path_constraints,
183 | sdf_voxels,
184 | collision_points,
185 | is_grasp_stage,
186 | initial_joint_pos,
187 | from_scratch=False,
188 | ):
189 | """
190 | Args:
191 | - ee_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw] end effector pose.
192 | - keypoints (np.ndarray): [M, 3] keypoint positions.
193 | - keypoint_movable_mask (np.ndarray): [M] boolean array indicating whether each keypoint is on the grasped object.
194 | - goal_constraints (List[Callable]): subgoal constraint functions.
195 | - path_constraints (List[Callable]): path constraint functions.
196 | - sdf_voxels (np.ndarray): [X, Y, Z] signed distance field of the environment.
197 | - collision_points (np.ndarray): [N, 3] point cloud of the object.
198 | - is_grasp_stage (bool): whether the current stage is a grasp stage.
199 | - initial_joint_pos (np.ndarray): [N] initial joint positions of the robot.
200 | - from_scratch (bool): whether to start from scratch.
201 | Returns:
202 | - sol (np.ndarray): [7] optimized subgoal pose [x, y, z, qx, qy, qz, qw].
203 | - debug_dict (dict): debug information.
204 | """
205 | # downsample collision points
206 | if collision_points is not None and collision_points.shape[0] > self.config['max_collision_points']:
207 | collision_points = farthest_point_sampling(collision_points, self.config['max_collision_points'])
208 | sdf_func = self._setup_sdf(sdf_voxels)
209 | # ====================================
210 | # = setup bounds and initial guess
211 | # ====================================
212 | ee_pose = ee_pose.astype(np.float64)
213 | ee_pose_homo = T.pose2mat([ee_pose[:3], ee_pose[3:]])
214 | ee_pose_euler = np.concatenate([ee_pose[:3], T.quat2euler(ee_pose[3:])])
215 | # normalize opt variables to [0, 1]
216 | pos_bounds_min = self.config['bounds_min']
217 | pos_bounds_max = self.config['bounds_max']
218 | rot_bounds_min = np.array([-np.pi, -np.pi, -np.pi]) # euler angles
219 | rot_bounds_max = np.array([np.pi, np.pi, np.pi]) # euler angles
220 | og_bounds = [(b_min, b_max) for b_min, b_max in zip(np.concatenate([pos_bounds_min, rot_bounds_min]), np.concatenate([pos_bounds_max, rot_bounds_max]))]
221 | bounds = [(-1, 1)] * len(og_bounds)
222 | if not from_scratch and self.last_opt_result is not None:
223 | init_sol = self.last_opt_result.x
224 | else:
225 | init_sol = normalize_vars(ee_pose_euler, og_bounds) # start from the current pose
226 | from_scratch = True
227 |
228 | # ====================================
229 | # = other setup
230 | # ====================================
231 | collision_points_centered, keypoints_centered = self._center_collision_points_and_keypoints(ee_pose_homo, collision_points, keypoints, keypoint_movable_mask)
232 | aux_args = (og_bounds,
233 | keypoints_centered,
234 | keypoint_movable_mask,
235 | goal_constraints,
236 | path_constraints,
237 | sdf_func,
238 | collision_points_centered,
239 | ee_pose_homo,
240 | self.ik_solver,
241 | initial_joint_pos,
242 | self.reset_joint_pos,
243 | is_grasp_stage)
244 |
245 | # ====================================
246 | # = solve optimization
247 | # ====================================
248 | start = time.time()
249 | # use global optimization for the first iteration
250 | if from_scratch:
251 | opt_result = dual_annealing(
252 | func=objective,
253 | bounds=bounds,
254 | args=aux_args,
255 | maxfun=self.config['sampling_maxfun'],
256 | x0=init_sol,
257 | no_local_search=False,
258 | minimizer_kwargs={
259 | 'method': 'SLSQP',
260 | 'options': self.config['minimizer_options'],
261 | },
262 | )
263 | # use gradient-based local optimization for the following iterations
264 | else:
265 | opt_result = minimize(
266 | fun=objective,
267 | x0=init_sol,
268 | args=aux_args,
269 | bounds=bounds,
270 | method='SLSQP',
271 | options=self.config['minimizer_options'],
272 | )
273 | solve_time = time.time() - start
274 |
275 | # ====================================
276 | # = post-process opt_result
277 | # ====================================
278 | if isinstance(opt_result.message, list):
279 | opt_result.message = opt_result.message[0]
280 | # rerun to get debug info
281 | _, debug_dict = objective(opt_result.x, *aux_args, return_debug_dict=True)
282 | debug_dict['sol'] = opt_result.x
283 | debug_dict['msg'] = opt_result.message
284 | debug_dict['solve_time'] = solve_time
285 | debug_dict['from_scratch'] = from_scratch
286 | debug_dict['type'] = 'subgoal_solver'
287 | # unnormalize
288 | sol = unnormalize_vars(opt_result.x, og_bounds)
289 | sol = np.concatenate([sol[:3], T.euler2quat(sol[3:])])
290 | opt_result = self._check_opt_result(opt_result, debug_dict)
291 | # cache opt_result for future use if successful
292 | if opt_result.success:
293 | self.last_opt_result = copy.deepcopy(opt_result)
294 | return sol, debug_dict
--------------------------------------------------------------------------------
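A minimal usage sketch (not part of the repository) showing how `SubgoalSolver.solve` can be driven outside the full OmniGibson pipeline, mirroring the inputs used in `_warmup`. The config values, the stub IK solver, and its return fields are illustrative placeholders standing in for `configs/config.yaml` and `ik_solver.IKSolver`; it assumes the script is run from the repo root with the dependencies from `requirements.txt` installed.

```python
import numpy as np
from types import SimpleNamespace
from subgoal_solver import SubgoalSolver

class DummyIKSolver:
    """Placeholder exposing the interface the objective expects (see ik_solver.py)."""
    def solve(self, target_pose_homo, max_iterations=20, initial_joint_pos=None):
        # pretend IK always exhausts its iteration budget without converging
        return SimpleNamespace(success=False,
                               num_descents=max_iterations,
                               position_error=np.inf,
                               cspace_position=np.zeros(8))

# illustrative stand-in for the relevant entries of configs/config.yaml
config = {
    'bounds_min': [-0.5, -0.5, 0.0],
    'bounds_max': [0.5, 0.5, 1.0],
    'sampling_maxfun': 100,                # dual_annealing budget for the first solve
    'minimizer_options': {'maxiter': 20},  # SLSQP options for local refinement
    'constraint_tolerance': 1e-2,
    'max_collision_points': 50,
}
solver = SubgoalSolver(config, DummyIKSolver(), reset_joint_pos=np.zeros(8))

# inputs shaped like those used in SubgoalSolver._warmup()
ee_pose = np.array([0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 1.0])  # [x, y, z, qx, qy, qz, qw]
keypoints = np.random.rand(5, 3)
keypoint_movable_mask = np.array([True, False, False, False, False])
sdf_voxels = np.zeros((10, 10, 10))
collision_points = np.random.rand(100, 3)

subgoal_pose, debug = solver.solve(ee_pose, keypoints, keypoint_movable_mask,
                                   goal_constraints=[], path_constraints=[],
                                   sdf_voxels=sdf_voxels,
                                   collision_points=collision_points,
                                   is_grasp_stage=True, initial_joint_pos=None,
                                   from_scratch=True)
print(subgoal_pose.shape, debug['total_cost'])  # (7,) and the final objective value
```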
/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | from numba import njit
4 | import open3d as o3d
5 | import datetime
6 | import scipy.interpolate as interpolate
7 | from scipy.spatial.transform import Slerp
8 | from scipy.spatial.transform import Rotation as R
9 | from scipy.spatial.transform import RotationSpline
10 | import transform_utils as T
11 | import yaml
12 |
13 | # ===============================================
14 | # = optimization utils
15 | # ===============================================
16 | def normalize_vars(vars, og_bounds):
17 | """
18 | Given 1D variables and bounds, normalize the variables to the [-1, 1] range.
19 | """
20 | normalized_vars = np.empty_like(vars)
21 | for i, (b_min, b_max) in enumerate(og_bounds):
22 | normalized_vars[i] = (vars[i] - b_min) / (b_max - b_min) * 2 - 1
23 | return normalized_vars
24 |
25 | def unnormalize_vars(normalized_vars, og_bounds):
26 | """
27 | Given 1D variables in [-1, 1] and original bounds, denormalize the variables to the original range.
28 | """
29 | vars = np.empty_like(normalized_vars)
30 | for i, (b_min, b_max) in enumerate(og_bounds):
31 | vars[i] = (normalized_vars[i] + 1) / 2 * (b_max - b_min) + b_min
32 | return vars
33 |
34 | def calculate_collision_cost(poses, sdf_func, collision_points, threshold):
35 | assert poses.shape[1:] == (4, 4)
36 | transformed_pcs = batch_transform_points(collision_points, poses)
37 | transformed_pcs_flatten = transformed_pcs.reshape(-1, 3) # [num_poses * num_points, 3]
38 | signed_distance = sdf_func(transformed_pcs_flatten) + threshold # [num_poses * num_points]
39 | signed_distance = signed_distance.reshape(-1, collision_points.shape[0]) # [num_poses, num_points]
40 | non_zero_mask = signed_distance > 0
41 | collision_cost = np.sum(signed_distance[non_zero_mask])
42 | return collision_cost
43 |
44 | @njit(cache=True, fastmath=True)
45 | def consistency(poses_a, poses_b, rot_weight=0.5):
46 | assert poses_a.shape[1:] == (4, 4) and poses_b.shape[1:] == (4, 4), 'poses must be of shape (N, 4, 4)'
47 | min_distances = np.zeros(len(poses_a), dtype=np.float64)
48 | for i in range(len(poses_a)):
49 | min_distance = 9999999
50 | a = poses_a[i]
51 | for j in range(len(poses_b)):
52 | b = poses_b[j]
53 | pos_distance = np.linalg.norm(a[:3, 3] - b[:3, 3])
54 | rot_distance = angle_between_rotmat(a[:3, :3], b[:3, :3])
55 | distance = pos_distance + rot_distance * rot_weight
56 | min_distance = min(min_distance, distance)
57 | min_distances[i] = min_distance
58 | return np.mean(min_distances)
59 |
60 | def transform_keypoints(transform, keypoints, movable_mask):
61 | assert transform.shape == (4, 4)
62 | transformed_keypoints = keypoints.copy()
63 | if movable_mask.sum() > 0:
64 | transformed_keypoints[movable_mask] = np.dot(keypoints[movable_mask], transform[:3, :3].T) + transform[:3, 3]
65 | return transformed_keypoints
66 |
67 | @njit(cache=True, fastmath=True)
68 | def batch_transform_points(points, transforms):
69 | """
70 | Apply multiple transformations to a point cloud and return the result of each transformation.
71 | Args:
72 | points: point cloud (N, 3).
73 | transforms: M 4x4 transformations (M, 4, 4).
74 | Returns:
75 | np.array: point clouds (M, N, 3).
76 | """
77 | assert transforms.shape[1:] == (4, 4), 'transforms must be of shape (M, 4, 4)'
78 | transformed_points = np.zeros((transforms.shape[0], points.shape[0], 3))
79 | for i in range(transforms.shape[0]):
80 | pos, R = transforms[i, :3, 3], transforms[i, :3, :3]
81 | transformed_points[i] = np.dot(points, R.T) + pos
82 | return transformed_points
83 |
84 | @njit(cache=True, fastmath=True)
85 | def get_samples_jitted(control_points_homo, control_points_quat, opt_interpolate_pos_step_size, opt_interpolate_rot_step_size):
86 | assert control_points_homo.shape[1:] == (4, 4)
87 | # calculate number of samples per segment
88 | num_samples_per_segment = np.empty(len(control_points_homo) - 1, dtype=np.int64)
89 | for i in range(len(control_points_homo) - 1):
90 | start_pos = control_points_homo[i, :3, 3]
91 | start_rotmat = control_points_homo[i, :3, :3]
92 | end_pos = control_points_homo[i+1, :3, 3]
93 | end_rotmat = control_points_homo[i+1, :3, :3]
94 | pos_diff = np.linalg.norm(start_pos - end_pos)
95 | rot_diff = angle_between_rotmat(start_rotmat, end_rotmat)
96 | pos_num_steps = np.ceil(pos_diff / opt_interpolate_pos_step_size)
97 | rot_num_steps = np.ceil(rot_diff / opt_interpolate_rot_step_size)
98 | num_path_poses = int(max(pos_num_steps, rot_num_steps))
99 | num_path_poses = max(num_path_poses, 2) # at least 2 poses, start and end
100 | num_samples_per_segment[i] = num_path_poses
101 | # fill in samples
102 | num_samples = num_samples_per_segment.sum()
103 | samples_7 = np.empty((num_samples, 7))
104 | sample_idx = 0
105 | for i in range(len(control_points_quat) - 1):
106 | start_pos, start_xyzw = control_points_quat[i, :3], control_points_quat[i, 3:]
107 | end_pos, end_xyzw = control_points_quat[i+1, :3], control_points_quat[i+1, 3:]
108 | # using proper quaternion slerp interpolation
109 | poses_7 = np.empty((num_samples_per_segment[i], 7))
110 | for j in range(num_samples_per_segment[i]):
111 | alpha = j / (num_samples_per_segment[i] - 1)
112 | pos = start_pos * (1 - alpha) + end_pos * alpha
113 | blended_xyzw = T.quat_slerp_jitted(start_xyzw, end_xyzw, alpha)
114 | pose_7 = np.empty(7)
115 | pose_7[:3] = pos
116 | pose_7[3:] = blended_xyzw
117 | poses_7[j] = pose_7
118 | samples_7[sample_idx:sample_idx+num_samples_per_segment[i]] = poses_7
119 | sample_idx += num_samples_per_segment[i]
120 | assert num_samples >= 2, f'num_samples: {num_samples}'
121 | return samples_7, num_samples
122 |
123 | @njit(cache=True, fastmath=True)
124 | def path_length(samples_homo):
125 | assert samples_homo.shape[1:] == (4, 4), 'samples_homo must be of shape (N, 4, 4)'
126 | pos_length = 0
127 | rot_length = 0
128 | for i in range(len(samples_homo) - 1):
129 | pos_length += np.linalg.norm(samples_homo[i, :3, 3] - samples_homo[i+1, :3, 3])
130 | rot_length += angle_between_rotmat(samples_homo[i, :3, :3], samples_homo[i+1, :3, :3])
131 | return pos_length, rot_length
132 |
133 | # ===============================================
134 | # = others
135 | # ===============================================
136 | def get_callable_grasping_cost_fn(env):
137 | def get_grasping_cost(keypoint_idx):
138 | keypoint_object = env.get_object_by_keypoint(keypoint_idx)
139 | return -env.is_grasping(candidate_obj=keypoint_object) + 1 # return 0 if grasping an object, 1 if not grasping any object
140 | return get_grasping_cost
141 |
142 | def get_config(config_path=None):
143 | if config_path is None:
144 | this_file_dir = os.path.dirname(os.path.abspath(__file__))
145 | config_path = os.path.join(this_file_dir, 'configs/config.yaml')
146 | assert config_path and os.path.exists(config_path), f'config file does not exist ({config_path})'
147 | with open(config_path, 'r') as f:
148 | config = yaml.load(f, Loader=yaml.FullLoader)
149 | return config
150 |
151 | class bcolors:
152 | HEADER = '\033[95m'
153 | OKBLUE = '\033[94m'
154 | OKCYAN = '\033[96m'
155 | OKGREEN = '\033[92m'
156 | WARNING = '\033[93m'
157 | FAIL = '\033[91m'
158 | ENDC = '\033[0m'
159 | BOLD = '\033[1m'
160 | UNDERLINE = '\033[4m'
161 |
162 | def get_clock_time(milliseconds=False):
163 | curr_time = datetime.datetime.now()
164 | if milliseconds:
165 | return f'{curr_time.hour}:{curr_time.minute}:{curr_time.second}.{curr_time.microsecond // 1000}'
166 | else:
167 | return f'{curr_time.hour}:{curr_time.minute}:{curr_time.second}'
168 |
169 | def angle_between_quats(q1, q2):
170 | """Angle between two quaternions"""
171 | return 2 * np.arccos(np.clip(np.abs(np.dot(q1, q2)), -1, 1))
172 |
173 | def filter_points_by_bounds(points, bounds_min, bounds_max, strict=True):
174 | """
175 | Filter points by taking only points within workspace bounds.
176 | """
177 | assert points.shape[1] == 3, "points must be (N, 3)"
178 | bounds_min = bounds_min.copy()
179 | bounds_max = bounds_max.copy()
180 | if not strict:
181 | bounds_min[:2] = bounds_min[:2] - 0.1 * (bounds_max[:2] - bounds_min[:2])
182 | bounds_max[:2] = bounds_max[:2] + 0.1 * (bounds_max[:2] - bounds_min[:2])
183 | bounds_min[2] = bounds_min[2] - 0.1 * (bounds_max[2] - bounds_min[2])
184 | within_bounds_mask = (
185 | (points[:, 0] >= bounds_min[0])
186 | & (points[:, 0] <= bounds_max[0])
187 | & (points[:, 1] >= bounds_min[1])
188 | & (points[:, 1] <= bounds_max[1])
189 | & (points[:, 2] >= bounds_min[2])
190 | & (points[:, 2] <= bounds_max[2])
191 | )
192 | return within_bounds_mask
193 |
194 | def print_opt_debug_dict(debug_dict):
195 | print('\n' + '#' * 40)
196 | print(f'# Optimization debug info:')
197 | max_key_length = max(len(str(k)) for k in debug_dict.keys())
198 | for k, v in debug_dict.items():
199 | if isinstance(v, int) or isinstance(v, float):
200 | print(f'# {k:<{max_key_length}}: {v:.05f}')
201 | elif isinstance(v, list) and all(isinstance(x, int) or isinstance(x, float) for x in v):
202 | print(f'# {k:<{max_key_length}}: {np.array(v).round(5)}')
203 | else:
204 | print(f'# {k:<{max_key_length}}: {v}')
205 | print('#' * 40 + '\n')
206 |
207 | def merge_dicts(dicts):
208 | return {
209 | k : v
210 | for d in dicts
211 | for k, v in d.items()
212 | }
213 |
214 | def exec_safe(code_str, gvars=None, lvars=None):
215 | banned_phrases = ['import', '__']
216 | for phrase in banned_phrases:
217 | assert phrase not in code_str
218 |
219 | if gvars is None:
220 | gvars = {}
221 | if lvars is None:
222 | lvars = {}
223 | empty_fn = lambda *args, **kwargs: None
224 | custom_gvars = merge_dicts([
225 | gvars,
226 | {'exec': empty_fn, 'eval': empty_fn}
227 | ])
228 | try:
229 | exec(code_str, custom_gvars, lvars)
230 | except Exception as e:
231 | print(f'Error executing code:\n{code_str}')
232 | raise e
233 |
234 | def load_functions_from_txt(txt_path, get_grasping_cost_fn):
235 | if txt_path is None:
236 | return []
237 | # load txt file
238 | with open(txt_path, 'r') as f:
239 | functions_text = f.read()
240 | # execute functions
241 | gvars_dict = {
242 | 'np': np,
243 | 'get_grasping_cost_by_keypoint_idx': get_grasping_cost_fn,
244 | } # external library APIs
245 | lvars_dict = dict()
246 | exec_safe(functions_text, gvars=gvars_dict, lvars=lvars_dict)
247 | return list(lvars_dict.values())
248 |
249 | @njit(cache=True, fastmath=True)
250 | def angle_between_rotmat(P, Q):
251 | R = np.dot(P, Q.T)
252 | cos_theta = (np.trace(R)-1)/2
253 | if cos_theta > 1:
254 | cos_theta = 1
255 | elif cos_theta < -1:
256 | cos_theta = -1
257 | return np.arccos(cos_theta)
258 |
259 | def fit_b_spline(control_points):
260 | # determine appropriate k
261 | k = min(3, control_points.shape[0]-1)
262 | spline = interpolate.splprep(control_points.T, s=0, k=k)
263 | return spline
264 |
265 | def sample_from_spline(spline, num_samples):
266 | sample_points = np.linspace(0, 1, num_samples)
267 | if isinstance(spline, RotationSpline):
268 | samples = spline(sample_points).as_matrix() # [num_samples, 3, 3]
269 | else:
270 | assert isinstance(spline, tuple) and len(spline) == 2, 'spline must be a tuple of (tck, u)'
271 | tck, u = spline
272 | samples = interpolate.splev(sample_points, tck) # [spline_dim, num_samples]
273 | samples = np.array(samples).T # [num_samples, spline_dim]
274 | return samples
275 |
276 | def linear_interpolate_poses(start_pose, end_pose, num_poses):
277 | """
278 | Interpolate between start and end pose.
279 | """
280 | assert num_poses >= 2, 'num_poses must be at least 2'
281 | if start_pose.shape == (6,) and end_pose.shape == (6,):
282 | start_pos, start_euler = start_pose[:3], start_pose[3:]
283 | end_pos, end_euler = end_pose[:3], end_pose[3:]
284 | start_rotmat = T.euler2mat(start_euler)
285 | end_rotmat = T.euler2mat(end_euler)
286 | elif start_pose.shape == (4, 4) and end_pose.shape == (4, 4):
287 | start_pos = start_pose[:3, 3]
288 | start_rotmat = start_pose[:3, :3]
289 | end_pos = end_pose[:3, 3]
290 | end_rotmat = end_pose[:3, :3]
291 | elif start_pose.shape == (7,) and end_pose.shape == (7,):
292 | start_pos, start_quat = start_pose[:3], start_pose[3:]
293 | start_rotmat = T.quat2mat(start_quat)
294 | end_pos, end_quat = end_pose[:3], end_pose[3:]
295 | end_rotmat = T.quat2mat(end_quat)
296 | else:
297 | raise ValueError('start_pose and end_pose not recognized')
298 | slerp = Slerp([0, 1], R.from_matrix([start_rotmat, end_rotmat]))
299 | poses = []
300 | for i in range(num_poses):
301 | alpha = i / (num_poses - 1)
302 | pos = start_pos * (1 - alpha) + end_pos * alpha
303 | rotmat = slerp(alpha).as_matrix()
304 | if start_pose.shape == (6,):
305 | euler = T.mat2euler(rotmat)
306 | poses.append(np.concatenate([pos, euler]))
307 | elif start_pose.shape == (4, 4):
308 | pose = np.eye(4)
309 | pose[:3, :3] = rotmat
310 | pose[:3, 3] = pos
311 | poses.append(pose)
312 | elif start_pose.shape == (7,):
313 | quat = T.mat2quat(rotmat)
314 | pose = np.concatenate([pos, quat])
315 | poses.append(pose)
316 | return np.array(poses)
317 |
318 | def spline_interpolate_poses(control_points, num_steps):
319 | """
320 | Interpolate through the control points using spline interpolation.
321 | 1. Fit a b-spline through the positional terms of the control points.
322 | 2. Fit a RotationSpline through the rotational terms of the control points.
323 | 3. Sample the b-spline and RotationSpline at num_steps.
324 |
325 | Args:
326 | control_points: [N, 6] position + euler or [N, 4, 4] pose or [N, 7] position + quat
327 | num_steps: number of poses to interpolate
328 | Returns:
329 | poses: [num_steps, 6] position + euler or [num_steps, 4, 4] pose or [num_steps, 7] position + quat
330 | """
331 | assert num_steps >= 2, 'num_steps must be at least 2'
332 | if isinstance(control_points, list):
333 | control_points = np.array(control_points)
334 | if control_points.shape[1] == 6:
335 | control_points_pos = control_points[:, :3] # [N, 3]
336 | control_points_euler = control_points[:, 3:] # [N, 3]
337 | control_points_rotmat = []
338 | for control_point_euler in control_points_euler:
339 | control_points_rotmat.append(T.euler2mat(control_point_euler))
340 | control_points_rotmat = np.array(control_points_rotmat) # [N, 3, 3]
341 | elif control_points.shape[1] == 4 and control_points.shape[2] == 4:
342 | control_points_pos = control_points[:, :3, 3] # [N, 3]
343 | control_points_rotmat = control_points[:, :3, :3] # [N, 3, 3]
344 | elif control_points.shape[1] == 7:
345 | control_points_pos = control_points[:, :3]
346 | control_points_rotmat = []
347 | for control_point_quat in control_points[:, 3:]:
348 | control_points_rotmat.append(T.quat2mat(control_point_quat))
349 | control_points_rotmat = np.array(control_points_rotmat)
350 | else:
351 | raise ValueError('control_points not recognized')
352 | # remove the duplicate points (threshold 1e-3)
353 | diff = np.linalg.norm(np.diff(control_points_pos, axis=0), axis=1)
354 | mask = diff > 1e-3
355 | # always keep the first and last points
356 | mask = np.concatenate([[True], mask[:-1], [True]])
357 | control_points_pos = control_points_pos[mask]
358 | control_points_rotmat = control_points_rotmat[mask]
359 | # fit b-spline through positional terms control points
360 | pos_spline = fit_b_spline(control_points_pos)
361 | # fit RotationSpline through rotational terms control points
362 | times = pos_spline[1]
363 | rotations = R.from_matrix(control_points_rotmat)
364 | rot_spline = RotationSpline(times, rotations)
365 | # sample from the splines
366 | pos_samples = sample_from_spline(pos_spline, num_steps) # [num_steps, 3]
367 | rot_samples = sample_from_spline(rot_spline, num_steps) # [num_steps, 3, 3]
368 | if control_points.shape[1] == 6:
369 | poses = []
370 | for i in range(num_steps):
371 | pose = np.concatenate([pos_samples[i], T.mat2euler(rot_samples[i])])
372 | poses.append(pose)
373 | poses = np.array(poses)
374 | elif control_points.shape[1] == 4 and control_points.shape[2] == 4:
375 | poses = np.zeros((num_steps, 4, 4))
376 | poses[:, :3, :3] = rot_samples
377 | poses[:, :3, 3] = pos_samples
378 | poses[:, 3, 3] = 1
379 | elif control_points.shape[1] == 7:
380 | poses = np.empty((num_steps, 7))
381 | for i in range(num_steps):
382 | quat = T.mat2quat(rot_samples[i])
383 | pose = np.concatenate([pos_samples[i], quat])
384 | poses[i] = pose
385 | return poses
386 |
387 | def get_linear_interpolation_steps(start_pose, end_pose, pos_step_size, rot_step_size):
388 | """
389 | Given start and end pose, calculate the number of steps to interpolate between them.
390 | Args:
391 | start_pose: [6] position + euler or [4, 4] pose or [7] position + quat
392 | end_pose: [6] position + euler or [4, 4] pose or [7] position + quat
393 | pos_step_size: position step size
394 | rot_step_size: rotation step size
395 | Returns:
396 | num_path_poses: number of poses to interpolate
397 | """
398 | if start_pose.shape == (6,) and end_pose.shape == (6,):
399 | start_pos, start_euler = start_pose[:3], start_pose[3:]
400 | end_pos, end_euler = end_pose[:3], end_pose[3:]
401 | start_rotmat = T.euler2mat(start_euler)
402 | end_rotmat = T.euler2mat(end_euler)
403 | elif start_pose.shape == (4, 4) and end_pose.shape == (4, 4):
404 | start_pos = start_pose[:3, 3]
405 | start_rotmat = start_pose[:3, :3]
406 | end_pos = end_pose[:3, 3]
407 | end_rotmat = end_pose[:3, :3]
408 | elif start_pose.shape == (7,) and end_pose.shape == (7,):
409 | start_pos, start_quat = start_pose[:3], start_pose[3:]
410 | start_rotmat = T.quat2mat(start_quat)
411 | end_pos, end_quat = end_pose[:3], end_pose[3:]
412 | end_rotmat = T.quat2mat(end_quat)
413 | else:
414 | raise ValueError('start_pose and end_pose not recognized')
415 | pos_diff = np.linalg.norm(start_pos - end_pos)
416 | rot_diff = angle_between_rotmat(start_rotmat, end_rotmat)
417 | pos_num_steps = np.ceil(pos_diff / pos_step_size)
418 | rot_num_steps = np.ceil(rot_diff / rot_step_size)
419 | num_path_poses = int(max(pos_num_steps, rot_num_steps))
420 | num_path_poses = max(num_path_poses, 2) # at least start and end poses
421 | return num_path_poses
422 |
423 | def farthest_point_sampling(pc, num_points):
424 | """
425 | Given a point cloud, sample num_points points that are the farthest apart.
426 | Use o3d farthest point sampling.
427 | """
428 | assert pc.ndim == 2 and pc.shape[1] == 3, "pc must be a (N, 3) numpy array"
429 | pcd = o3d.geometry.PointCloud()
430 | pcd.points = o3d.utility.Vector3dVector(pc)
431 | downpcd_farthest = pcd.farthest_point_down_sample(num_points)
432 | return np.asarray(downpcd_farthest.points)
--------------------------------------------------------------------------------
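A short sketch (not part of the repository) exercising a few of the helpers above; the bounds, poses, and step sizes are arbitrary illustrative values, and it assumes the script is run from the repo root so that `utils` and `transform_utils` are importable.

```python
import numpy as np
from utils import (normalize_vars, unnormalize_vars,
                   get_linear_interpolation_steps, linear_interpolate_poses,
                   farthest_point_sampling)

# normalize_vars / unnormalize_vars are exact inverses over the given bounds
og_bounds = [(-0.5, 0.5), (-0.5, 0.5), (0.0, 1.0)]
x = np.array([0.1, -0.2, 0.7])
assert np.allclose(unnormalize_vars(normalize_vars(x, og_bounds), og_bounds), x)

# interpolate between two [x, y, z, qx, qy, qz, qw] poses, with slerp on the rotation;
# the step-size heuristic decides how many intermediate poses are needed
start_pose = np.array([0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 1.0])
end_pose = np.array([0.2, 0.1, 0.6, 0.0, 0.0, 0.7071068, 0.7071068])
num_poses = get_linear_interpolation_steps(start_pose, end_pose,
                                           pos_step_size=0.02, rot_step_size=0.2)
path = linear_interpolate_poses(start_pose, end_pose, num_poses)
print(path.shape)       # (num_poses, 7)

# downsample a point cloud to 64 well-spread points via Open3D farthest point sampling
pc_down = farthest_point_sampling(np.random.rand(1000, 3), 64)
print(pc_down.shape)    # (64, 3)
```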
/visualizer.py:
--------------------------------------------------------------------------------
1 | import open3d as o3d
2 | import numpy as np
3 | import matplotlib
4 | import cv2
5 | import transform_utils as T
6 | from utils import filter_points_by_bounds, batch_transform_points
7 |
8 | def add_to_visualize_buffer(visualize_buffer, visualize_points, visualize_colors):
9 | assert visualize_points.shape[0] == visualize_colors.shape[0], f'got {visualize_points.shape[0]} for points and {visualize_colors.shape[0]} for colors'
10 | if len(visualize_points) == 0:
11 | return
12 | assert visualize_points.shape[1] == 3
13 | assert visualize_colors.shape[1] == 3
14 | # assert visualize_colors.max() <= 1.0 and visualize_colors.min() >= 0.0
15 | visualize_buffer["points"].append(visualize_points)
16 | visualize_buffer["colors"].append(visualize_colors)
17 |
18 | def generate_nearby_points(point, num_points_per_side=5, half_range=0.005):
19 | if point.ndim == 1:
20 | offsets = np.linspace(-1, 1, num_points_per_side)
21 | offsets_meshgrid = np.meshgrid(offsets, offsets, offsets)
22 | offsets_array = np.stack(offsets_meshgrid, axis=-1).reshape(-1, 3)
23 | nearby_points = point + offsets_array * half_range
24 | return nearby_points.reshape(-1, 3)
25 | else:
26 | assert point.shape[1] == 3, "point must be (N, 3)"
27 | assert point.ndim == 2, "point must be (N, 3)"
28 | # vectorized version
29 | offsets = np.linspace(-1, 1, num_points_per_side)
30 | offsets_meshgrid = np.meshgrid(offsets, offsets, offsets)
31 | offsets_array = np.stack(offsets_meshgrid, axis=-1).reshape(-1, 3)
32 | nearby_points = point[:, None, :] + offsets_array * half_range
33 | return nearby_points
34 |
35 | class Visualizer:
36 | def __init__(self, config, env):
37 | self.config = config
38 | self.env = env
39 | self.bounds_min = np.array(self.config['bounds_min'])
40 | self.bounds_max = np.array(self.config['bounds_max'])
41 | self.color = np.array([0.05, 0.55, 0.26])
42 | self.world2viewer = np.array([
43 | [0.3788, 0.3569, -0.8539, 0.0],
44 | [0.9198, -0.0429, 0.3901, 0.0],
45 | [-0.1026, 0.9332, 0.3445, 0.0],
46 | [0.0, 0.0, 0.0, 1.0]
47 | ]).T
48 |
49 | def show_img(self, rgb):
50 | print('showing image, click on the window and press "ESC" to close and continue')
51 | cv2.imshow('img', rgb[..., ::-1])
52 | cv2.waitKey(0)
53 | cv2.destroyAllWindows()
54 |
55 | def show_pointcloud(self, points, colors):
56 | # transform to viewer frame
57 | points = np.dot(points, self.world2viewer[:3, :3].T) + self.world2viewer[:3, 3]
58 | # clip color to [0, 1]
59 | colors = np.clip(colors, 0.0, 1.0)
60 | pcd = o3d.geometry.PointCloud()
61 | pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64)) # float64 is a lot faster than float32 when added to o3d later
62 | pcd.colors = o3d.utility.Vector3dVector(colors.astype(np.float64))
63 | print('visualizing pointcloud, click on the window and press "ESC" to close and continue')
64 | o3d.visualization.draw_geometries([pcd])
65 |
66 | def _get_scene_points_and_colors(self):
67 | # scene
68 | cam_obs = self.env.get_cam_obs()
69 | scene_points = []
70 | scene_colors = []
71 | for cam_id in range(len(cam_obs)):
72 | cam_points = cam_obs[cam_id]['points'].reshape(-1, 3)
73 | cam_colors = cam_obs[cam_id]['rgb'].reshape(-1, 3) / 255.0
74 | # clip to workspace
75 | within_workspace_mask = filter_points_by_bounds(cam_points, self.bounds_min, self.bounds_max, strict=False)
76 | cam_points = cam_points[within_workspace_mask]
77 | cam_colors = cam_colors[within_workspace_mask]
78 | scene_points.append(cam_points)
79 | scene_colors.append(cam_colors)
80 | scene_points = np.concatenate(scene_points, axis=0)
81 | scene_colors = np.concatenate(scene_colors, axis=0)
82 | return scene_points, scene_colors
83 |
84 | def visualize_subgoal(self, subgoal_pose):
85 | visualize_buffer = {
86 | "points": [],
87 | "colors": []
88 | }
89 | # scene
90 | scene_points, scene_colors = self._get_scene_points_and_colors()
91 | add_to_visualize_buffer(visualize_buffer, scene_points, scene_colors)
92 | subgoal_pose_homo = T.convert_pose_quat2mat(subgoal_pose)
93 | # subgoal
94 | collision_points = self.env.get_collision_points(noise=False)
95 | # transform collision points to the subgoal frame
96 | ee_pose = self.env.get_ee_pose()
97 | ee_pose_homo = T.convert_pose_quat2mat(ee_pose)
98 | centering_transform = np.linalg.inv(ee_pose_homo)
99 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
100 | transformed_collision_points = batch_transform_points(collision_points_centered, subgoal_pose_homo[None]).reshape(-1, 3)
101 | collision_points_colors = np.array([self.color] * len(collision_points))
102 | add_to_visualize_buffer(visualize_buffer, transformed_collision_points, collision_points_colors)
103 | # add keypoints
104 | keypoints = self.env.get_keypoint_positions()
105 | num_keypoints = keypoints.shape[0]
106 | color_map = matplotlib.colormaps["gist_rainbow"]
107 | keypoints_colors = [color_map(i / num_keypoints)[:3] for i in range(num_keypoints)]
108 | for i in range(num_keypoints):
109 | nearby_points = generate_nearby_points(keypoints[i], num_points_per_side=6, half_range=0.009)
110 | nearby_colors = np.tile(keypoints_colors[i], (nearby_points.shape[0], 1))
111 | nearby_colors = 0.5 * nearby_colors + 0.5 * np.array([1, 1, 1])
112 | add_to_visualize_buffer(visualize_buffer, nearby_points, nearby_colors)
113 | # visualize
114 | visualize_points = np.concatenate(visualize_buffer["points"], axis=0)
115 | visualize_colors = np.concatenate(visualize_buffer["colors"], axis=0)
116 | self.show_pointcloud(visualize_points, visualize_colors)
117 |
118 | def visualize_path(self, path):
119 | visualize_buffer = {
120 | "points": [],
121 | "colors": []
122 | }
123 | # scene
124 | scene_points, scene_colors = self._get_scene_points_and_colors()
125 | add_to_visualize_buffer(visualize_buffer, scene_points, scene_colors)
126 | # draw curve based on poses
127 | for t in range(len(path) - 1):
128 | start = path[t][:3]
129 | end = path[t + 1][:3]
130 | num_interp_points = int(np.linalg.norm(start - end) / 0.0002)
131 | interp_points = np.linspace(start, end, num_interp_points)
132 | interp_colors = np.tile([0.0, 0.0, 0.0], (num_interp_points, 1))
133 | # add a tint of white (the later along the path, the whiter)
134 | whitening_coef = 0.3 + 0.5 * (t / len(path))
135 | interp_colors = (1 - whitening_coef) * interp_colors + whitening_coef * np.array([1, 1, 1])
136 | add_to_visualize_buffer(visualize_buffer, interp_points, interp_colors)
137 | # subsample path with a fixed step size
138 | step_size = 0.05
139 | subpath = [path[0]]
140 | for i in range(1, len(path) - 1):
141 | dist = np.linalg.norm(np.array(path[i][:3]) - np.array(subpath[-1][:3]))
142 | if dist > step_size:
143 | subpath.append(path[i])
144 | subpath.append(path[-1])
145 | path = np.array(subpath)
146 | path_length = path.shape[0]
147 | # path points
148 | collision_points = self.env.get_collision_points(noise=False)
149 | num_points = collision_points.shape[0]
150 | start_pose = self.env.get_ee_pose()
151 | centering_transform = np.linalg.inv(T.convert_pose_quat2mat(start_pose))
152 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
153 | poses_homo = T.convert_pose_quat2mat(path[:, :7]) # the last number is gripper action
154 | transformed_collision_points = batch_transform_points(collision_points_centered, poses_homo).reshape(-1, 3) # [num_poses * num_points, 3]
155 | # calculate color based on the timestep
156 | collision_points_colors = np.ones([path_length, num_points, 3]) * self.color[None, None]
157 | for t in range(path_length):
158 | whitening_coef = 0.3 + 0.5 * (t / path_length)
159 | collision_points_colors[t] = (1 - whitening_coef) * collision_points_colors[t] + whitening_coef * np.array([1, 1, 1])
160 | collision_points_colors = collision_points_colors.reshape(-1, 3)
161 | add_to_visualize_buffer(visualize_buffer, transformed_collision_points, collision_points_colors)
162 | # keypoints
163 | keypoints = self.env.get_keypoint_positions()
164 | num_keypoints = keypoints.shape[0]
165 | color_map = matplotlib.colormaps["gist_rainbow"]
166 | keypoints_colors = [color_map(i / num_keypoints)[:3] for i in range(num_keypoints)]
167 | for i in range(num_keypoints):
168 | nearby_points = generate_nearby_points(keypoints[i], num_points_per_side=6, half_range=0.009)
169 | nearby_colors = np.tile(keypoints_colors[i], (nearby_points.shape[0], 1))
170 | nearby_colors = 0.5 * nearby_colors + 0.5 * np.array([1, 1, 1])
171 | add_to_visualize_buffer(visualize_buffer, nearby_points, nearby_colors)
172 | # visualize
173 | visualize_points = np.concatenate(visualize_buffer["points"], axis=0)
174 | visualize_colors = np.concatenate(visualize_buffer["colors"], axis=0)
175 | self.show_pointcloud(visualize_points, visualize_colors)
--------------------------------------------------------------------------------
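A small sketch (not part of the repository) of using the point-cloud viewer on its own: the constructor only reads `bounds_min`/`bounds_max` from the config and `show_pointcloud` never touches the environment, so `env` can be left as `None` here; the bounds and the random cloud are purely illustrative.

```python
import numpy as np
from visualizer import Visualizer

config = {'bounds_min': [-0.5, -0.5, 0.0], 'bounds_max': [0.5, 0.5, 1.0]}
viz = Visualizer(config, env=None)  # env is only needed by visualize_subgoal / visualize_path

points = np.random.uniform(-0.3, 0.3, size=(2000, 3))
colors = np.random.uniform(0.0, 1.0, size=(2000, 3))
viz.show_pointcloud(points, colors)  # opens an Open3D window; press ESC to close
```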
/vlm_query/pen/metadata.json:
--------------------------------------------------------------------------------
1 | {"init_keypoint_positions": [[-0.25794319243703623, -0.2360844662773828, 0.6905476882082092], [-0.2671639177617655, -0.09433527815132495, 0.7122930549896219], [-0.27151234338495983, 0.08395197554366488, 0.7154432798183405], [-0.266379743453929, 0.08175199572534772, 0.8097122583174985], [-0.3607951132392543, 0.18610731832223598, 0.7178293666264162], [-0.2624312374613168, 0.195909440226552, 0.6867906392498943], [-0.31782969861483357, 0.21783651249007482, 0.7905845242493158]], "num_keypoints": 7, "num_stages": 3, "grasp_keypoints": [1, -1, -1], "release_keypoints": [-1, -1, 1]}
--------------------------------------------------------------------------------
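For reference, a hypothetical snippet (not part of the repository) that inspects this cached VLM metadata; the field names match the JSON above, and the path assumes the repo root as the working directory.

```python
import json

with open('vlm_query/pen/metadata.json') as f:
    metadata = json.load(f)

print(metadata['num_stages'])                     # 3
print(metadata['grasp_keypoints'])                # [1, -1, -1]: grasp keypoint 1 in stage 1
print(metadata['release_keypoints'])              # [-1, -1, 1]: release keypoint 1 after stage 3
print(len(metadata['init_keypoint_positions']))   # 7, matching num_keypoints
```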
/vlm_query/pen/query_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huangwl18/ReKep/63c43fdba60354980258beaeb8a7d48e088e1e3e/vlm_query/pen/query_img.png
--------------------------------------------------------------------------------
/vlm_query/pen/stage1_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage1_subgoal_constraint1(end_effector, keypoints):
2 | """Align the end-effector with the white pen's grasping point (keypoint 1)."""
3 | grasp_point = keypoints[1]
4 | cost = np.linalg.norm(end_effector - grasp_point)
5 | return cost
6 |
7 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage2_path_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage2_path_constraint1(end_effector, keypoints):
2 | """The robot must still be grasping the white pen (keypoint 1)."""
3 | return get_grasping_cost_by_keypoint_idx(1)
4 |
5 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage2_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage2_subgoal_constraint1(end_effector, keypoints):
2 | """Ensure the white pen is upright by aligning the vector from keypoint 0 to keypoint 1 with the z-axis."""
3 | pen_vector = keypoints[1] - keypoints[0]
4 | z_axis = np.array([0, 0, 1])
5 | cost = np.linalg.norm(np.cross(pen_vector, z_axis))
6 | return cost
7 |
8 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage3_path_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage3_path_constraint1(end_effector, keypoints):
2 | """The robot must still be grasping the white pen (keypoint 1)."""
3 | return get_grasping_cost_by_keypoint_idx(1)
4 |
5 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage3_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage3_subgoal_constraint1(end_effector, keypoints):
2 | """Ensure the white pen is above the black pen holder opening (mean of keypoints 3, 4, 5, 6) and aligned with the z-axis."""
3 | holder_opening = np.mean(keypoints[3:7], axis=0)
4 | above_holder = holder_opening + np.array([0, 0, 0.2]) # 20cm above the holder opening
5 | cost = np.linalg.norm(keypoints[1] - above_holder)
6 | return cost
7 |
8 | def stage3_subgoal_constraint2(end_effector, keypoints):
9 | """Ensure the white pen is upright by aligning the vector from keypoint 0 to keypoint 1 with the z-axis."""
10 | pen_vector = keypoints[1] - keypoints[0]
11 | z_axis = np.array([0, 0, 1])
12 | cost = np.linalg.norm(np.cross(pen_vector, z_axis))
13 | return cost
14 |
15 |
--------------------------------------------------------------------------------
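The stage constraint files above are plain Python source that the pipeline turns into callables with `utils.load_functions_from_txt`. Below is a sketch (not part of the repository) of loading and evaluating them by hand; the dummy grasping-cost lambda stands in for `utils.get_callable_grasping_cost_fn(env)`, which needs a live environment, and the keypoints are taken from `metadata.json` above.

```python
import json
import numpy as np
from utils import load_functions_from_txt

# pretend the keypoint is already grasped (cost 0); the real helper queries the env
dummy_grasping_cost_fn = lambda keypoint_idx: 0

constraints = load_functions_from_txt('vlm_query/pen/stage3_subgoal_constraints.txt',
                                      dummy_grasping_cost_fn)
print([fn.__name__ for fn in constraints])
# ['stage3_subgoal_constraint1', 'stage3_subgoal_constraint2']

with open('vlm_query/pen/metadata.json') as f:
    keypoints = np.array(json.load(f)['init_keypoint_positions'])
end_effector = keypoints[1]  # arbitrary query point for the dummy end-effector argument
for fn in constraints:
    print(fn.__name__, fn(end_effector, keypoints))  # cost <= 0 means satisfied
```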
/vlm_query/prompt_template.txt:
--------------------------------------------------------------------------------
1 | ## Instructions
2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps:
3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples:
4 | - "pouring tea from teapot":
5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid"
6 | - "put red block on top of blue block":
7 | - 2 stages: "grasp red block", "drop the red block on top of blue block"
8 | - "reorient bouquet and drop it upright into vase":
9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase"
10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples:
11 | - "pouring liquid from teapot":
12 | - "grasp teapot" stage:
13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle"
14 | - 0 path constraints
15 | - "align teapot with cup opening" stage:
16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening"
17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling"
18 | - "pour liquid" stage:
19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid"
20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening"
21 | - "put red block on top of blue block":
22 | - "grasp red block" stage:
23 | - 1 sub-goal constraints: "align the end-effector with the red block"
24 | - 0 path constraints
25 | - "drop the red block on top of blue block" stage:
26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block"
27 | - 1 path constraints: "the robot must still be grasping the red block"
28 | - "eorient bouquet and drop it upright into vase":
29 | - "grasp bouquet" stage:
30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem"
31 | - 0 path constraints
32 | - "reorient bouquet" stage:
33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)"
34 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
35 | - "keep upright and drop into vase" stage:
36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening"
37 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable.
39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable.
40 |
41 | **Note:**
42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero.
43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints.
44 | - Avoid using "if" statements in your constraints.
45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels).
46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task.
47 | - Inputs to the constraints are as follows:
48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position.
49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions.
50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint.
51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function.
52 | - For grasping stage, you should only write one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed.
53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages.
54 | - The robot can only grasp one object at a time.
55 | - Grasping must be an independent stage from other stages.
56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis).
57 | - You may use multiple keypoints to specify a surface or volume.
58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array.
59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j`, and then calculate the norm of the difference between keypoint `i` and the `offsetted_point`.
60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints).
61 |
62 | **Structure your output in a single python code block as follows:**
63 | ```python
64 |
65 | # Your explanation of how many stages are involved in the task and what each stage is about.
66 | # ...
67 |
68 | num_stages = ?
69 |
70 | ### stage 1 sub-goal constraints (if any)
71 | def stage1_subgoal_constraint1(end_effector, keypoints):
72 | """Put your explanation here."""
73 | ...
74 | return cost
75 | # Add more sub-goal constraints if needed
76 | ...
77 |
78 | ### stage 1 path constraints (if any)
79 | def stage1_path_constraint1(end_effector, keypoints):
80 | """Put your explanation here."""
81 | ...
82 | return cost
83 | # Add more path constraints if needed
84 | ...
85 |
86 | # repeat for more stages
87 | ...
88 |
89 | """
90 | Summarize keypoints to be grasped in all grasping stages.
91 | The length of the list should be equal to the number of stages.
92 | For a grasping stage, write the keypoint index. For a non-grasping stage, write -1.
93 | """
94 | grasp_keypoints = [?, ..., ?]
95 |
96 | """
97 | Summarize at **the end of which stage** the robot should release the keypoints.
98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released if it has been grasped previously).
99 | Only release an object when it's necessary to complete the task, e.g., dropping the bouquet into the vase.
100 | The length of the list should be equal to the number of stages.
101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1.
102 | """
103 | release_keypoints = [?, ..., ?]
104 |
105 | ```
106 |
107 | ## Query
108 | Query Task: "{instruction}"
109 | Query Image:
--------------------------------------------------------------------------------
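For illustration, a hypothetical completed response in the format the template asks for, assembled from the cached pen-in-holder files above (the stage names are paraphrased; the constraint bodies, keypoint indices, and the `grasp_keypoints`/`release_keypoints` values come from the `stage*_constraints.txt` files and `metadata.json`). As in the pipeline, `np` and `get_grasping_cost_by_keypoint_idx` are injected when this code is executed by `utils.load_functions_from_txt`.

```python
# 3 stages: "grasp the white pen", "reorient the pen upright",
# and "move the pen above the holder opening and release".

num_stages = 3

### stage 1 sub-goal constraints
def stage1_subgoal_constraint1(end_effector, keypoints):
    """Align the end-effector with the white pen's grasping point (keypoint 1)."""
    return np.linalg.norm(end_effector - keypoints[1])

### stage 2 sub-goal constraints
def stage2_subgoal_constraint1(end_effector, keypoints):
    """Ensure the white pen is upright by aligning the keypoint 0 -> keypoint 1 vector with the z-axis."""
    pen_vector = keypoints[1] - keypoints[0]
    return np.linalg.norm(np.cross(pen_vector, np.array([0, 0, 1])))

### stage 2 path constraints
def stage2_path_constraint1(end_effector, keypoints):
    """The robot must still be grasping the white pen (keypoint 1)."""
    return get_grasping_cost_by_keypoint_idx(1)

### stage 3 sub-goal constraints
def stage3_subgoal_constraint1(end_effector, keypoints):
    """The white pen is 20cm above the holder opening (mean of keypoints 3, 4, 5, 6)."""
    above_holder = np.mean(keypoints[3:7], axis=0) + np.array([0, 0, 0.2])
    return np.linalg.norm(keypoints[1] - above_holder)

def stage3_subgoal_constraint2(end_effector, keypoints):
    """Keep the white pen upright (keypoint 0 -> keypoint 1 vector parallel to the z-axis)."""
    pen_vector = keypoints[1] - keypoints[0]
    return np.linalg.norm(np.cross(pen_vector, np.array([0, 0, 1])))

### stage 3 path constraints
def stage3_path_constraint1(end_effector, keypoints):
    """The robot must still be grasping the white pen (keypoint 1)."""
    return get_grasping_cost_by_keypoint_idx(1)

grasp_keypoints = [1, -1, -1]     # grasp keypoint 1 in stage 1
release_keypoints = [-1, -1, 1]   # release keypoint 1 at the end of stage 3
```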