├── .gitignore
├── README.md
├── configs
│   ├── config.yaml
│   └── og_scene_file_pen.json
├── constraint_generation.py
├── environment.py
├── ik_solver.py
├── keypoint_proposal.py
├── main.py
├── media
│   ├── pen-in-holder-disturbances.gif
│   └── pen-in-holder.gif
├── og_utils.py
├── path_solver.py
├── requirements.txt
├── subgoal_solver.py
├── transform_utils.py
├── utils.py
├── visualizer.py
└── vlm_query
    ├── pen
    │   ├── metadata.json
    │   ├── query_img.png
    │   ├── stage1_subgoal_constraints.txt
    │   ├── stage2_path_constraints.txt
    │   ├── stage2_subgoal_constraints.txt
    │   ├── stage3_path_constraints.txt
    │   └── stage3_subgoal_constraints.txt
    └── prompt_template.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | *cache*
2 | **/__pycache__/**
3 | **/*.egg-info/
4 | *.egg
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
2 |
3 | #### [[Project Page]](https://rekep-robot.github.io/) [[Paper]](https://rekep-robot.github.io/rekep.pdf) [[Video]](https://youtu.be/2S8YhBdLdww)
4 |
5 | [Wenlong Huang](https://wenlong.page)<sup>1</sup>, [Chen Wang](https://www.chenwangjeremy.net/)<sup>1</sup>*, [Yunzhu Li](https://yunzhuli.github.io/)<sup>2</sup>*, [Ruohan Zhang](https://ai.stanford.edu/~zharu/)<sup>1</sup>, [Li Fei-Fei](https://profiles.stanford.edu/fei-fei-li)<sup>1</sup> (\* indicates equal contributions)
6 |
7 | <sup>1</sup>Stanford University, <sup>2</sup>Columbia University
8 |
9 |
10 |
11 | This is the official demo code for [ReKep](https://rekep-robot.github.io/) implemented in [OmniGibson](https://behavior.stanford.edu/omnigibson/index.html). ReKep is a method that uses large vision models and vision-language models in a hierarchical optimization framework to generate closed-loop trajectories for manipulation tasks.
12 |
13 | **Note: This codebase currently does not contain the perception pipeline used in our real-world experiments, including keypoint tracking, mask tracking, and SDF reconstruction. Instead, it directly accesses this information from the simulation. If you are interested in deploying the code on a real robot, please refer to [this section](#real-world-deployment) below and the paper [Appendix](https://rekep-robot.github.io/rekep.pdf#page=20.09), where further details are provided.**
14 |
15 | ## Setup Instructions
16 |
17 | Note that this codebase is best run with a display. For running in headless mode, refer to the [instructions in OmniGibson](https://behavior.stanford.edu/omnigibson/getting_started/installation.html).
18 |
19 | - Install [OmniGibson](https://behavior.stanford.edu/omnigibson/getting_started/installation.html). This code is tested on [this commit](https://github.com/StanfordVL/OmniGibson/tree/cc0316a0574018a3cb2956fcbff3be75c07cdf0f).
20 |
21 | NOTE: If you encounter the warning `We did not find Isaac Sim under ~/.local/share/ov/pkg.` when running `./scripts/setup.sh` for OmniGibson, first ensure that Isaac Sim is installed. If it is installed in the default location, provide the following path when prompted: `/home/[USERNAME]/.local/share/ov/pkg/isaac-sim-2023.1.1` (replace `[USERNAME]` with your username).
22 |
23 | - Install ReKep in the same conda environment:
24 | ```Shell
25 | conda activate omnigibson
26 | cd ..
27 | git clone https://github.com/huangwl18/ReKep.git
28 | cd ReKep
29 | pip install -r requirements.txt
30 | ```
31 |
32 | - Obtain an [OpenAI API](https://openai.com/blog/openai-api) key and set it up as an environment variable:
33 | ```Shell
34 | export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
35 | ```
36 |
37 | ## Running Demo
38 |
39 | We provide a demo "pen-in-holder" task that illustrates the core idea in ReKep. Below we provide several options to run the demo.
40 |
41 | Notes:
42 | - An additional `--visualize` flag may be added to visualize every solution from the optimization. However, since the pipeline repeatedly solves optimization problems, the visualization is blocking and must be closed each time (by pressing "ESC") in order to continue.
43 | - Because the physics simulator is not fully deterministic, different runs with the same random seed may produce different results. It is possible that the robot may fail at the provided task, especially when external disturbances are applied. In this case, we recommend running the demo again.
44 |
45 | ### Demo with Cached Query
46 |
47 | We recommend starting with the cached VLM query.
48 |
49 | ```Shell
50 | python main.py --use_cached_query [--visualize]
51 | ```
52 |
53 | A video will be saved to `./videos` by default.
54 |
55 |
56 |
57 | ### Demo with External Disturbances
58 |
59 | Since ReKep acts as a closed-loop policy, it is robust to disturbances with automatic failure recovery both within stages and across stages. To demonstrate this in simulation, we apply the following disturbances for the "pen-in-holder" task:
60 |
61 | - Move the pen when the robot is trying to grasp the pen
62 |
63 | - Take the pen out of the gripper when the robot is trying to reorient the pen
64 |
65 | - Move the holder when the robot is trying to drop the pen into the holder
66 |
67 |
68 |
69 | Note that since the disturbances are pre-defined, we recommend running with the cached query.
70 |
71 | ```Shell
72 | python main.py --use_cached_query --apply_disturbance [--visualize]
73 | ```
74 | ### Demo with Live Query
75 |
76 | The following script queries the VLM for a new sequence of ReKep constraints and executes them on the robot:
77 |
78 | ```Shell
79 | python main.py [--visualize]
80 | ```
81 |
82 | ## Setup New Environments
83 | Leveraging the diverse objects and scenes provided by [BEHAVIOR-1K](https://behavior.stanford.edu/) in [OmniGibson](https://behavior.stanford.edu/omnigibson/index.html), new tasks and scenes can be easily configured. To change the objects, you may check out the available objects as part of the BEHAVIOR assets on this [page](https://behavior.stanford.edu/knowledgebase/objects/index.html) (click on each object instance to view its visualization). After identifying the objects, we recommend making a copy of the JSON scene file `./configs/og_scene_file_pen.json` and editing the `state` and `objects_info` entries accordingly (see the sketch below). Remember that the scene file needs to be supplied to the `Main` class at initialization. Additional [scenes](https://behavior.stanford.edu/knowledgebase/scenes/index.html) and [robots](https://behavior.stanford.edu/omnigibson/getting_started/examples.html#robots) provided by BEHAVIOR-1K may also work, but they are currently untested.
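
The following is a minimal sketch of how a new scene file could be derived from the provided one by editing the JSON programmatically. The output filename and the removed object are illustrative only; the exact fields your new objects require should mirror the existing entries in `og_scene_file_pen.json`.

```python
# Sketch: derive a new scene file from the provided pen-in-holder scene.
# The output path below is an example, not part of the codebase.
import json

with open('./configs/og_scene_file_pen.json', 'r') as f:
    scene = json.load(f)

# Task objects appear in two places: their pose under "state" -> "object_registry"
# and their construction arguments under "objects_info" -> "init_info".
scene['state']['object_registry'].pop('pen_1')
scene['objects_info']['init_info'].pop('pen_1')
# ... add your own object entries here, mirroring the removed ones ...

with open('./configs/og_scene_file_my_task.json', 'w') as f:
    json.dump(scene, f, indent=4)
```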
84 |
85 | ## Real-World Deployment
86 | To deploy ReKep in the real world, most changes should be confined to `environment.py`. Specifically, all of the "exposed functions" need to be re-implemented for the real-world environment. The following components need to be implemented:
87 |
88 | - **Robot Controller**: Our real-world implementation uses the joint impedance controller from [Deoxys](https://github.com/UT-Austin-RPL/deoxys_control) for our Franka setup. Specifically, when `execute_action` in `environment.py` receives a target end-effector pose, we first calculate IK to obtain the target joint positions and then send the command to the low-level controller (a schematic sketch is given after this list).
89 | - **Keypoint Tracker**: Keypoints need to be tracked in order to perform closed-loop replanning, and this typically needs to be achieved using RGB-D cameras. Our real-world implementation uses similarity matching of [DINOv2](https://github.com/facebookresearch/dinov2) features calculated from multiple RGB-D cameras to track the keypoints (details may be found in the [paper](https://rekep-robot.github.io/rekep.pdf) appendix). Alternatively, we also recommend trying out specialized point trackers, such as [\[1\]](https://github.com/henry123-boy/SpaTracker), [\[2\]](https://github.com/google-deepmind/tapnet), [\[3\]](https://github.com/facebookresearch/co-tracker), and [\[4\]](https://github.com/aharley/pips2).
90 | - **SDF Reconstruction**: In order to avoid collisions with irrelevant objects or the table, an SDF voxel grid of the environment needs to be provided to the solvers. Additionally, the SDF should ignore the robot arm and any grasped objects. Our real-world implementation uses [nvblox_torch](https://github.com/NVlabs/nvblox_torch) for ESDF reconstruction, [cuRobo](https://github.com/NVlabs/curobo) for segmenting out the robot arm, and [Cutie](https://github.com/hkchengrex/Cutie) for object mask tracking.
91 | - **(Optional) Consistency Cost**: If closed-loop replanning is desired, we find it helpful to include a consistency cost in the solver to encourage the new solution to be close to the previous one (more details can be found in the [paper](https://rekep-robot.github.io/rekep.pdf) appendix).
92 | - **(Optional) Grasp Metric or Grasp Detector**: In this codebase, we include a cost that encourages a top-down grasp pose, in addition to the collision avoidance cost and the ReKep constraint (for identifying the grasp keypoint), which collectively identify the 6-DoF grasp pose. Alternatively, other grasp metrics, such as force closure, can be included. Our real-world implementation instead uses the grasp detector [AnyGrasp](https://github.com/graspnet/anygrasp_sdk), which is implemented as a special routine because it is too slow to be used as an optimizable cost.
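
Below is a schematic sketch of the real-world `execute_action` flow described in the **Robot Controller** item. The `ik_solver.solve` and `controller.*` calls are hypothetical placeholders standing in for your IK library and low-level controller interface (e.g., Deoxys); they are not part of this codebase.

```python
# Schematic only: target EE pose -> IK -> joint command to the low-level controller.
# `ik_solver` and `controller` are placeholders for your own real-world interfaces.
import numpy as np

def execute_action(action, ik_solver, controller, current_joint_pos):
    # action layout follows environment.py: (x, y, z, qx, qy, qz, qw, gripper_action)
    action = np.asarray(action)
    target_ee_pose, gripper_action = action[:7], action[7]
    # 1) task space -> joint space, seeded at the current configuration
    target_joint_pos = ik_solver.solve(target_ee_pose, seed=current_joint_pos)
    # 2) stream the joint target to the joint impedance controller
    controller.send_joint_command(target_joint_pos)
    # 3) forward the gripper command separately
    controller.set_gripper(gripper_action)
```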
93 |
94 | Since there are several components in the pipeline, running them sequentially in the real world may be too slow. As a result, we recommend running the following compute-intensive components in separate processes in addition to the main process that runs `main.py`: `subgoal_solver`, `path_solver`, `keypoint_tracker`, `sdf_reconstruction`, `mask_tracker`, and `grasp_detector` (if used).
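
The sketch below shows one possible way to move a compute-heavy component into its own process using Python's `multiprocessing`; the `solve` function and queue-based interface are illustrative assumptions, not the actual ReKep APIs.

```python
# Sketch: run a solver in a separate process and communicate via queues.
import multiprocessing as mp

def solve(problem):
    # placeholder for a call into e.g. subgoal_solver or path_solver
    return problem

def solver_worker(request_queue, result_queue):
    while True:
        problem = request_queue.get()
        if problem is None:               # sentinel to shut the worker down
            break
        result_queue.put(solve(problem))

if __name__ == '__main__':
    requests, results = mp.Queue(), mp.Queue()
    worker = mp.Process(target=solver_worker, args=(requests, results), daemon=True)
    worker.start()
    # The main process (running main.py) submits problems with requests.put(...)
    # and retrieves solutions with results.get(), overlapping solving with execution.
    requests.put({'example': 'problem'})
    print(results.get())
    requests.put(None)                    # shut down the worker
```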
95 |
96 | ## Known Limitations
97 | - **Prompt Tuning**: Since ReKep relies on VLMs to generate code-based constraints to solve for the behaviors of the robot, it is sensitive to the specific VLM used and the prompts given to the VLM. Although visual prompting is used, we typically find that the prompts do not necessarily need to contain image-text or code examples, and pure-text high-level instructions can go a long way with the latest VLMs such as `GPT-4o`. As a result, when starting with a new domain, if you observe that the default prompt is failing, we recommend the following steps: 1) pick a few representative tasks in the domain for validation purposes, 2) iteratively update the prompt with high-level text examples and instructions, and 3) test the prompt by checking the text output, returning to step 2 if needed.
98 |
99 | - **Performance Tuning**: For clarity, the entire pipeline is run sequentially, so the latency introduced by the simulator and the solvers is compounded. If this is a concern, we recommend running compute-intensive components, such as the simulator, the `subgoal_solver`, and the `path_solver`, in separate processes, but concurrency needs to be handled with care. More discussion can be found in the "Real-World Deployment" section. To tune the solvers, note that the `objective` function typically takes the majority of the time, and among the different costs, the reachability cost computed by the IK solver is typically the most expensive. Depending on the task, you may reduce `sampling_maxfun` and `maxiter` in `configs/config.yaml` (see the sketch after this list) or disable the reachability cost.
100 |
101 | - **Task-Space Planning**: Since the current pipeline performs planning in the task space (i.e., solving for end-effector poses) instead of the joint space, it may occasionally produce actions that are kinematically challenging for the robot to achieve, especially for tasks that require 6-DoF motions.
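
As referenced in the **Performance Tuning** item above, here is a minimal sketch of overriding the relevant solver settings after loading `configs/config.yaml` (assuming the config is loaded with PyYAML; the chosen values are illustrative):

```python
# Sketch: load the global config and reduce solver budgets to trade accuracy for speed.
import yaml

with open('./configs/config.yaml', 'r') as f:
    config = yaml.safe_load(f)

for solver in ('subgoal_solver', 'path_solver'):
    config[solver]['sampling_maxfun'] = 2000               # default: 5000
    config[solver]['minimizer_options']['maxiter'] = 100   # default: 200
```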
102 |
103 | ## Troubleshooting
104 |
105 | For issues related to OmniGibson, please raise an issue [here](https://github.com/StanfordVL/OmniGibson/issues). You are also welcome to join the [Discord](https://discord.com/invite/bccR5vGFEx) channel for timely support.
106 |
107 | For other issues related to the code in this repo, feel free to raise an issue here, and we will try to address it as time permits.
108 |
--------------------------------------------------------------------------------
/configs/config.yaml:
--------------------------------------------------------------------------------
1 | device: &device 'cuda'
2 | seed: &seed 0
3 |
4 | main:
5 | interpolate_pos_step_size: &interpolate_pos_step_size 0.05 # controls the density of the final returned path
6 | interpolate_rot_step_size: &interpolate_rot_step_size 0.34 # about 20 degrees
7 | grasp_depth: 0.10
8 | constraint_tolerance: 0.10 # for backtracking
9 | bounds_min: &bounds_min [-0.45, -0.75, 0.698]
10 | bounds_max: &bounds_max [0.10, 0.60, 1.2]
11 | sdf_voxel_size: 0.01
12 | vlm_camera: 0
13 | action_steps_per_iter: 5
14 | seed: *seed
15 |
16 | env:
17 | video_cache_size: 2000
18 | og_sim:
19 | physics_frequency: 100
20 | action_frequency: 15
21 |
22 | scene:
23 | name: Rs_int
24 | type: InteractiveTraversableScene
25 | scene_model: Rs_int
26 |
27 | bounds_min: *bounds_min
28 | bounds_max: *bounds_max
29 | interpolate_pos_step_size: *interpolate_pos_step_size
30 | interpolate_rot_step_size: *interpolate_rot_step_size
31 |
32 | robot:
33 | robot_config:
34 | name: Fetch
35 | type: Fetch
36 | obs_modalities: [rgb, depth]
37 | action_modalities: continuous
38 | action_normalize: False
39 | position: [-0.8, 0.0, 0.]
40 | grasping_mode: assisted
41 |
42 | controller_config:
43 | base:
44 | name: DifferentialDriveController
45 | arm_0:
46 | name: OperationalSpaceController
47 | kp: 250
48 | kp_limits: [50, 400]
49 | damping_ratio: 0.6
50 | gripper_0:
51 | name: MultiFingerGripperController
52 | command_input_limits: [0.0, 1.0]
53 | mode: smooth
54 | camera:
55 | name: JointController
56 |
57 | camera:
58 | # recorder
59 | 1:
60 | name: cam_1
61 | position: [ 0.6137, 0.4764, 1.4565]
62 | orientation: [ 0.3212, 0.4682, 0.6788, 0.4656]
63 | resolution: 480
64 |
65 | # vlm camera
66 | 0:
67 | name: cam_0
68 | position: [-0.1655, 0.0167, 1.3664]
69 | orientation: [ 0.0550, 0.0544, 0.7010, 0.7090]
70 | resolution: 480
71 |
72 | path_solver:
73 | opt_pos_step_size: 0.20 # controls the density of control points in the path
74 | opt_rot_step_size: 0.78 # controls the density of control points in the path
75 | opt_interpolate_pos_step_size: 0.02 # controls the density of collision checking inside optimization
76 | opt_interpolate_rot_step_size: 0.10
77 | max_collision_points: 60
78 | sampling_maxfun: 5000
79 | bounds_min: *bounds_min
80 | bounds_max: *bounds_max
81 | constraint_tolerance: 0.0001
82 | minimizer_options:
83 | maxiter: 200
84 |
85 | subgoal_solver:
86 | bounds_min: *bounds_min
87 | bounds_max: *bounds_max
88 | sampling_maxfun: 5000
89 | max_collision_points: 60
90 | constraint_tolerance: 0.0001
91 | minimizer_options:
92 | maxiter: 200
93 |
94 | keypoint_proposer:
95 | num_candidates_per_mask: 5
96 | min_dist_bt_keypoints: 0.06
97 | max_mask_ratio: 0.5
98 | device: *device
99 | bounds_min: *bounds_min
100 | bounds_max: *bounds_max
101 | seed: *seed
102 |
103 | constraint_generator:
104 | model: chatgpt-4o-latest
105 | temperature: 0.0
106 | max_tokens: 2048
107 |
108 | visualizer:
109 | bounds_min: *bounds_min
110 | bounds_max: *bounds_max
--------------------------------------------------------------------------------
/configs/og_scene_file_pen.json:
--------------------------------------------------------------------------------
1 | {
2 | "metadata": {},
3 | "state": {
4 | "system_registry": {},
5 | "object_registry": {
6 | "ceilings_jfsbfp_0": {
7 | "root_link": {
8 | "pos": [
9 | -3.0498788356781006,
10 | 2.527890920639038,
11 | 2.549999713897705
12 | ],
13 | "ori": [
14 | 0.0,
15 | 0.0,
16 | 0.0,
17 | 1.0
18 | ],
19 | "lin_vel": [
20 | 0.0,
21 | 0.0,
22 | 0.0
23 | ],
24 | "ang_vel": [
25 | 0.0,
26 | 0.0,
27 | 0.0
28 | ]
29 | },
30 | "joints": {},
31 | "non_kin": {}
32 | },
33 | "ceilings_phrrml_0": {
34 | "root_link": {
35 | "pos": [
36 | -0.06987879425287247,
37 | -1.287109375,
38 | 2.549999713897705
39 | ],
40 | "ori": [
41 | 0.0,
42 | 0.0,
43 | 0.0,
44 | 1.0
45 | ],
46 | "lin_vel": [
47 | 0.0,
48 | 0.0,
49 | 0.0
50 | ],
51 | "ang_vel": [
52 | 0.0,
53 | 0.0,
54 | 0.0
55 | ]
56 | },
57 | "joints": {},
58 | "non_kin": {}
59 | },
60 | "ceilings_rhfker_0": {
61 | "root_link": {
62 | "pos": [
63 | 1.2158854007720947,
64 | 2.5133121013641357,
65 | 2.549999952316284
66 | ],
67 | "ori": [
68 | 0.0,
69 | 0.0,
70 | 0.0,
71 | 1.0
72 | ],
73 | "lin_vel": [
74 | 0.0,
75 | 0.0,
76 | 0.0
77 | ],
78 | "ang_vel": [
79 | 0.0,
80 | 0.0,
81 | 0.0
82 | ]
83 | },
84 | "joints": {},
85 | "non_kin": {}
86 | },
87 | "ceilings_yimfml_0": {
88 | "root_link": {
89 | "pos": [
90 | -0.793107271194458,
91 | 2.536679267883301,
92 | 2.549999952316284
93 | ],
94 | "ori": [
95 | 0.0,
96 | 0.0,
97 | 0.0,
98 | 1.0
99 | ],
100 | "lin_vel": [
101 | 0.0,
102 | 0.0,
103 | 0.0
104 | ],
105 | "ang_vel": [
106 | 0.0,
107 | 0.0,
108 | 0.0
109 | ]
110 | },
111 | "joints": {},
112 | "non_kin": {}
113 | },
114 | "ceilings_zlhppj_0": {
115 | "root_link": {
116 | "pos": [
117 | -3.0498788356781006,
118 | -0.24460941553115845,
119 | 2.549999713897705
120 | ],
121 | "ori": [
122 | 0.0,
123 | 0.0,
124 | 0.0,
125 | 1.0
126 | ],
127 | "lin_vel": [
128 | 0.0,
129 | 0.0,
130 | 0.0
131 | ],
132 | "ang_vel": [
133 | 0.0,
134 | 0.0,
135 | 0.0
136 | ]
137 | },
138 | "joints": {},
139 | "non_kin": {}
140 | },
141 | "floors_avvnuv_0": {
142 | "root_link": {
143 | "pos": [
144 | -3.0498788356781006,
145 | 2.527891159057617,
146 | -0.15000027418136597
147 | ],
148 | "ori": [
149 | 0.0,
150 | 0.0,
151 | 0.0,
152 | 1.0
153 | ],
154 | "lin_vel": [
155 | 0.0,
156 | 0.0,
157 | 0.0
158 | ],
159 | "ang_vel": [
160 | 0.0,
161 | 0.0,
162 | 0.0
163 | ]
164 | },
165 | "joints": {},
166 | "non_kin": {}
167 | },
168 | "floors_gjemfr_0": {
169 | "root_link": {
170 | "pos": [
171 | 1.2158854007720947,
172 | 2.5133121013641357,
173 | -0.15000024437904358
174 | ],
175 | "ori": [
176 | 0.0,
177 | 0.0,
178 | 0.0,
179 | 1.0
180 | ],
181 | "lin_vel": [
182 | 0.0,
183 | 0.0,
184 | 0.0
185 | ],
186 | "ang_vel": [
187 | 0.0,
188 | 0.0,
189 | 0.0
190 | ]
191 | },
192 | "joints": {},
193 | "non_kin": {}
194 | },
195 | "floors_ifmioj_0": {
196 | "root_link": {
197 | "pos": [
198 | -0.793107271194458,
199 | 2.536679267883301,
200 | -0.15000024437904358
201 | ],
202 | "ori": [
203 | 0.0,
204 | 0.0,
205 | 0.0,
206 | 1.0
207 | ],
208 | "lin_vel": [
209 | 0.0,
210 | 0.0,
211 | 0.0
212 | ],
213 | "ang_vel": [
214 | 0.0,
215 | 0.0,
216 | 0.0
217 | ]
218 | },
219 | "joints": {},
220 | "non_kin": {}
221 | },
222 | "floors_ptwlei_0": {
223 | "root_link": {
224 | "pos": [
225 | -0.06987883150577545,
226 | -1.287109136581421,
227 | -0.15000025928020477
228 | ],
229 | "ori": [
230 | 0.0,
231 | 0.0,
232 | 0.0,
233 | 1.0
234 | ],
235 | "lin_vel": [
236 | 0.0,
237 | 0.0,
238 | 0.0
239 | ],
240 | "ang_vel": [
241 | 0.0,
242 | 0.0,
243 | 0.0
244 | ]
245 | },
246 | "joints": {},
247 | "non_kin": {}
248 | },
249 | "floors_ucjdgj_0": {
250 | "root_link": {
251 | "pos": [
252 | -3.0498788356781006,
253 | -0.24460916221141815,
254 | -0.15000034868717194
255 | ],
256 | "ori": [
257 | 0.0,
258 | 0.0,
259 | 0.0,
260 | 1.0
261 | ],
262 | "lin_vel": [
263 | 0.0,
264 | 0.0,
265 | 0.0
266 | ],
267 | "ang_vel": [
268 | 0.0,
269 | 0.0,
270 | 0.0
271 | ]
272 | },
273 | "joints": {},
274 | "non_kin": {}
275 | },
276 | "walls_axhynp_0": {
277 | "root_link": {
278 | "pos": [
279 | -1.233049750328064,
280 | 3.5148379802703857,
281 | 1.1139445304870605
282 | ],
283 | "ori": [
284 | 0.0,
285 | 0.0,
286 | 0.0,
287 | 1.0
288 | ],
289 | "lin_vel": [
290 | 0.0,
291 | 0.0,
292 | 0.0
293 | ],
294 | "ang_vel": [
295 | 0.0,
296 | 0.0,
297 | 0.0
298 | ]
299 | },
300 | "joints": {},
301 | "non_kin": {}
302 | },
303 | "walls_bcjmdz_0": {
304 | "root_link": {
305 | "pos": [
306 | 1.940032720565796,
307 | -0.25174951553344727,
308 | 1.0881959199905396
309 | ],
310 | "ori": [
311 | 0.0,
312 | 0.0,
313 | 0.0,
314 | 1.0
315 | ],
316 | "lin_vel": [
317 | 0.0,
318 | 0.0,
319 | 0.0
320 | ],
321 | "ang_vel": [
322 | 0.0,
323 | 0.0,
324 | 0.0
325 | ]
326 | },
327 | "joints": {},
328 | "non_kin": {}
329 | },
330 | "walls_dekben_0": {
331 | "root_link": {
332 | "pos": [
333 | -0.04535660520195961,
334 | -4.115162372589111,
335 | 1.0688008069992065
336 | ],
337 | "ori": [
338 | 0.0,
339 | 0.0,
340 | 0.0,
341 | 1.0
342 | ],
343 | "lin_vel": [
344 | 0.0,
345 | 0.0,
346 | 0.0
347 | ],
348 | "ang_vel": [
349 | 0.0,
350 | 0.0,
351 | 0.0
352 | ]
353 | },
354 | "joints": {},
355 | "non_kin": {}
356 | },
357 | "walls_exiopd_0": {
358 | "root_link": {
359 | "pos": [
360 | -1.1345733404159546,
361 | 1.54094398021698,
362 | 1.143774390220642
363 | ],
364 | "ori": [
365 | 0.0,
366 | 0.0,
367 | 0.0,
368 | 1.0
369 | ],
370 | "lin_vel": [
371 | 0.0,
372 | 0.0,
373 | 0.0
374 | ],
375 | "ang_vel": [
376 | 0.0,
377 | 0.0,
378 | 0.0
379 | ]
380 | },
381 | "joints": {},
382 | "non_kin": {}
383 | },
384 | "walls_gxvfxw_0": {
385 | "root_link": {
386 | "pos": [
387 | -2.0799880027770996,
388 | -0.2523879408836365,
389 | 1.130429983139038
390 | ],
391 | "ori": [
392 | 0.0,
393 | 0.0,
394 | 0.0,
395 | 1.0
396 | ],
397 | "lin_vel": [
398 | 0.0,
399 | 0.0,
400 | 0.0
401 | ],
402 | "ang_vel": [
403 | 0.0,
404 | 0.0,
405 | 0.0
406 | ]
407 | },
408 | "joints": {},
409 | "non_kin": {}
410 | },
411 | "walls_gxynpx_0": {
412 | "root_link": {
413 | "pos": [
414 | -3.062925100326538,
415 | -2.0301620960235596,
416 | 1.1158838272094727
417 | ],
418 | "ori": [
419 | 0.0,
420 | 0.0,
421 | 0.0,
422 | 1.0
423 | ],
424 | "lin_vel": [
425 | 0.0,
426 | 0.0,
427 | 0.0
428 | ],
429 | "ang_vel": [
430 | 0.0,
431 | 0.0,
432 | 0.0
433 | ]
434 | },
435 | "joints": {},
436 | "non_kin": {}
437 | },
438 | "walls_xlgydu_0": {
439 | "root_link": {
440 | "pos": [
441 | -4.019967555999756,
442 | 0.7578846216201782,
443 | 1.0950665473937988
444 | ],
445 | "ori": [
446 | 0.0,
447 | 0.0,
448 | 0.0,
449 | 1.0
450 | ],
451 | "lin_vel": [
452 | 0.0,
453 | 0.0,
454 | 0.0
455 | ],
456 | "ang_vel": [
457 | 0.0,
458 | 0.0,
459 | 0.0
460 | ]
461 | },
462 | "joints": {},
463 | "non_kin": {}
464 | },
465 | "table_1": {
466 | "root_link": {
467 | "pos": [
468 | 0.13507460057735443,
469 | -0.020923776552081108,
470 | 0.5174875855445862
471 | ],
472 | "ori": [
473 | 5.044171302870382e-09,
474 | -1.635271473787725e-09,
475 | 0.002193866763263941,
476 | 0.9999977946281433
477 | ],
478 | "lin_vel": [
479 | 0.0,
480 | 0.0,
481 | 0.0
482 | ],
483 | "ang_vel": [
484 | 0.0,
485 | 0.0,
486 | 0.0
487 | ]
488 | },
489 | "joints": {},
490 | "non_kin": {}
491 | },
492 | "pen_1": {
493 | "root_link": {
494 | "pos": [
495 | -0.30,
496 | -0.15,
497 | 0.7780028676986694
498 | ],
499 | "ori": [ -0.7071068, 0, 0, 0.7071068 ],
500 | "lin_vel": [
501 | 0.0,
502 | 0.0,
503 | 0.0
504 | ],
505 | "ang_vel": [
506 | 0.0,
507 | 0.0,
508 | 0.0
509 | ]
510 | },
511 | "joints": {},
512 | "non_kin": {}
513 | },
514 | "pencil_holder_1": {
515 | "root_link": {
516 | "pos": [
517 | -0.30,
518 | 0.15,
519 | 0.7163046002388
520 | ],
521 | "ori": [
522 | 0.00011037533113267273,
523 | -0.00012373141362331808,
524 | -1.8041100702248514e-06,
525 | 1.0000001192092896
526 | ],
527 | "lin_vel": [
528 | 0.0,
529 | 0.0,
530 | 0.0
531 | ],
532 | "ang_vel": [
533 | 0.0,
534 | 0.0,
535 | 0.0
536 | ]
537 | },
538 | "joints": {},
539 | "non_kin": {}
540 | }
541 | }
542 | },
543 | "objects_info": {
544 | "init_info": {
545 | "ceilings_jfsbfp_0": {
546 | "class_module": "omnigibson.objects.dataset_object",
547 | "class_name": "DatasetObject",
548 | "args": {
549 | "name": "ceilings_jfsbfp_0",
550 | "prim_path": "/World/ceilings_jfsbfp_0",
551 | "category": "ceilings",
552 | "model": "jfsbfp",
553 | "uuid": 48693673,
554 | "scale": [
555 | 1.0000121190165758,
556 | 1.0000027376842497,
557 | 0.9999999602635717
558 | ],
559 | "fixed_base": true,
560 | "in_rooms": [
561 | "bathroom_0"
562 | ],
563 | "bddl_object_scope": null
564 | }
565 | },
566 | "ceilings_phrrml_0": {
567 | "class_module": "omnigibson.objects.dataset_object",
568 | "class_name": "DatasetObject",
569 | "args": {
570 | "name": "ceilings_phrrml_0",
571 | "prim_path": "/World/ceilings_phrrml_0",
572 | "category": "ceilings",
573 | "model": "phrrml",
574 | "uuid": 74173089,
575 | "scale": [
576 | 0.9999942598615925,
577 | 0.9999986871377896,
578 | 0.9999999602635717
579 | ],
580 | "fixed_base": true,
581 | "in_rooms": [
582 | "living_room_0"
583 | ],
584 | "bddl_object_scope": null
585 | }
586 | },
587 | "ceilings_rhfker_0": {
588 | "class_module": "omnigibson.objects.dataset_object",
589 | "class_name": "DatasetObject",
590 | "args": {
591 | "name": "ceilings_rhfker_0",
592 | "prim_path": "/World/ceilings_rhfker_0",
593 | "category": "ceilings",
594 | "model": "rhfker",
595 | "uuid": 42059608,
596 | "scale": [
597 | 1.0000056252610265,
598 | 1.000002918863586,
599 | 0.9999999602635717
600 | ],
601 | "fixed_base": true,
602 | "in_rooms": [
603 | "entryway_0"
604 | ],
605 | "bddl_object_scope": null
606 | }
607 | },
608 | "ceilings_yimfml_0": {
609 | "class_module": "omnigibson.objects.dataset_object",
610 | "class_name": "DatasetObject",
611 | "args": {
612 | "name": "ceilings_yimfml_0",
613 | "prim_path": "/World/ceilings_yimfml_0",
614 | "category": "ceilings",
615 | "model": "yimfml",
616 | "uuid": 33802283,
617 | "scale": [
618 | 0.999999934791504,
619 | 1.0000027376842497,
620 | 0.9999999602635717
621 | ],
622 | "fixed_base": true,
623 | "in_rooms": [
624 | "kitchen_0"
625 | ],
626 | "bddl_object_scope": null
627 | }
628 | },
629 | "ceilings_zlhppj_0": {
630 | "class_module": "omnigibson.objects.dataset_object",
631 | "class_name": "DatasetObject",
632 | "args": {
633 | "name": "ceilings_zlhppj_0",
634 | "prim_path": "/World/ceilings_zlhppj_0",
635 | "category": "ceilings",
636 | "model": "zlhppj",
637 | "uuid": 97287247,
638 | "scale": [
639 | 1.0000119961301062,
640 | 0.9999980648281501,
641 | 1.0000008543339716
642 | ],
643 | "fixed_base": true,
644 | "in_rooms": [
645 | "bedroom_0"
646 | ],
647 | "bddl_object_scope": null
648 | }
649 | },
650 | "floors_avvnuv_0": {
651 | "class_module": "omnigibson.objects.dataset_object",
652 | "class_name": "DatasetObject",
653 | "args": {
654 | "name": "floors_avvnuv_0",
655 | "prim_path": "/World/floors_avvnuv_0",
656 | "category": "floors",
657 | "model": "avvnuv",
658 | "uuid": 12717065,
659 | "scale": [
660 | 1.0000121190165758,
661 | 1.0000027376842497,
662 | 1.0000001589457446
663 | ],
664 | "fixed_base": true,
665 | "in_rooms": [
666 | "bathroom_0"
667 | ],
668 | "bddl_object_scope": null
669 | }
670 | },
671 | "floors_gjemfr_0": {
672 | "class_module": "omnigibson.objects.dataset_object",
673 | "class_name": "DatasetObject",
674 | "args": {
675 | "name": "floors_gjemfr_0",
676 | "prim_path": "/World/floors_gjemfr_0",
677 | "category": "floors",
678 | "model": "gjemfr",
679 | "uuid": 94669642,
680 | "scale": [
681 | 1.0000056252610265,
682 | 1.000002918863586,
683 | 0.9999999602635717
684 | ],
685 | "fixed_base": true,
686 | "in_rooms": [
687 | "entryway_0"
688 | ],
689 | "bddl_object_scope": null
690 | }
691 | },
692 | "floors_ifmioj_0": {
693 | "class_module": "omnigibson.objects.dataset_object",
694 | "class_name": "DatasetObject",
695 | "args": {
696 | "name": "floors_ifmioj_0",
697 | "prim_path": "/World/floors_ifmioj_0",
698 | "category": "floors",
699 | "model": "ifmioj",
700 | "uuid": 62654673,
701 | "scale": [
702 | 0.999999934791504,
703 | 1.0000027376842497,
704 | 0.9999999602635717
705 | ],
706 | "fixed_base": true,
707 | "in_rooms": [
708 | "kitchen_0"
709 | ],
710 | "bddl_object_scope": null
711 | }
712 | },
713 | "floors_ptwlei_0": {
714 | "class_module": "omnigibson.objects.dataset_object",
715 | "class_name": "DatasetObject",
716 | "args": {
717 | "name": "floors_ptwlei_0",
718 | "prim_path": "/World/floors_ptwlei_0",
719 | "category": "floors",
720 | "model": "ptwlei",
721 | "uuid": 64419761,
722 | "scale": [
723 | 0.9999942598615925,
724 | 0.9999987714425131,
725 | 1.0000000596046483
726 | ],
727 | "fixed_base": true,
728 | "in_rooms": [
729 | "living_room_0"
730 | ],
731 | "bddl_object_scope": null
732 | }
733 | },
734 | "floors_ucjdgj_0": {
735 | "class_module": "omnigibson.objects.dataset_object",
736 | "class_name": "DatasetObject",
737 | "args": {
738 | "name": "floors_ucjdgj_0",
739 | "prim_path": "/World/floors_ucjdgj_0",
740 | "category": "floors",
741 | "model": "ucjdgj",
742 | "uuid": 82484688,
743 | "scale": [
744 | 1.0000119961301062,
745 | 0.9999980648281501,
746 | 1.0000007549927372
747 | ],
748 | "fixed_base": true,
749 | "in_rooms": [
750 | "bedroom_0"
751 | ],
752 | "bddl_object_scope": null
753 | }
754 | },
755 | "walls_axhynp_0": {
756 | "class_module": "omnigibson.objects.dataset_object",
757 | "class_name": "DatasetObject",
758 | "args": {
759 | "name": "walls_axhynp_0",
760 | "prim_path": "/World/walls_axhynp_0",
761 | "category": "walls",
762 | "model": "axhynp",
763 | "uuid": 67186365,
764 | "scale": [
765 | 1.000001272100645,
766 | 1.0000595362588338,
767 | 0.9999999602635717
768 | ],
769 | "fixed_base": true,
770 | "in_rooms": "",
771 | "bddl_object_scope": null
772 | }
773 | },
774 | "walls_bcjmdz_0": {
775 | "class_module": "omnigibson.objects.dataset_object",
776 | "class_name": "DatasetObject",
777 | "args": {
778 | "name": "walls_bcjmdz_0",
779 | "prim_path": "/World/walls_bcjmdz_0",
780 | "category": "walls",
781 | "model": "bcjmdz",
782 | "uuid": 52223757,
783 | "scale": [
784 | 1.000050671594632,
785 | 1.00000099325917,
786 | 0.9999999602635717
787 | ],
788 | "fixed_base": true,
789 | "in_rooms": "",
790 | "bddl_object_scope": null
791 | }
792 | },
793 | "walls_dekben_0": {
794 | "class_module": "omnigibson.objects.dataset_object",
795 | "class_name": "DatasetObject",
796 | "args": {
797 | "name": "walls_dekben_0",
798 | "prim_path": "/World/walls_dekben_0",
799 | "category": "walls",
800 | "model": "dekben",
801 | "uuid": 34758690,
802 | "scale": [
803 | 1.0000019097954302,
804 | 1.0000432761023288,
805 | 0.9999999602635717
806 | ],
807 | "fixed_base": true,
808 | "in_rooms": "",
809 | "bddl_object_scope": null
810 | }
811 | },
812 | "walls_exiopd_0": {
813 | "class_module": "omnigibson.objects.dataset_object",
814 | "class_name": "DatasetObject",
815 | "args": {
816 | "name": "walls_exiopd_0",
817 | "prim_path": "/World/walls_exiopd_0",
818 | "category": "walls",
819 | "model": "exiopd",
820 | "uuid": 62109373,
821 | "scale": [
822 | 1.000001345225585,
823 | 1.000042018513113,
824 | 0.9999999602635717
825 | ],
826 | "fixed_base": true,
827 | "in_rooms": "",
828 | "bddl_object_scope": null
829 | }
830 | },
831 | "walls_gxvfxw_0": {
832 | "class_module": "omnigibson.objects.dataset_object",
833 | "class_name": "DatasetObject",
834 | "args": {
835 | "name": "walls_gxvfxw_0",
836 | "prim_path": "/World/walls_gxvfxw_0",
837 | "category": "walls",
838 | "model": "gxvfxw",
839 | "uuid": 39069885,
840 | "scale": [
841 | 1.0000561936004457,
842 | 1.0000009917765513,
843 | 0.9999999602635717
844 | ],
845 | "fixed_base": true,
846 | "in_rooms": "",
847 | "bddl_object_scope": null
848 | }
849 | },
850 | "walls_gxynpx_0": {
851 | "class_module": "omnigibson.objects.dataset_object",
852 | "class_name": "DatasetObject",
853 | "args": {
854 | "name": "walls_gxynpx_0",
855 | "prim_path": "/World/walls_gxynpx_0",
856 | "category": "walls",
857 | "model": "gxynpx",
858 | "uuid": 60001904,
859 | "scale": [
860 | 1.0000041226080834,
861 | 1.0000436412659373,
862 | 0.9999999602635717
863 | ],
864 | "fixed_base": true,
865 | "in_rooms": "",
866 | "bddl_object_scope": null
867 | }
868 | },
869 | "walls_xlgydu_0": {
870 | "class_module": "omnigibson.objects.dataset_object",
871 | "class_name": "DatasetObject",
872 | "args": {
873 | "name": "walls_xlgydu_0",
874 | "prim_path": "/World/walls_xlgydu_0",
875 | "category": "walls",
876 | "model": "xlgydu",
877 | "uuid": 99082996,
878 | "scale": [
879 | 1.0000498312191335,
880 | 1.0000013621939055,
881 | 0.9999999602635717
882 | ],
883 | "fixed_base": true,
884 | "in_rooms": "",
885 | "bddl_object_scope": null
886 | }
887 | },
888 | "table_1": {
889 | "class_module": "omnigibson.objects.dataset_object",
890 | "class_name": "DatasetObject",
891 | "args": {
892 | "name": "table_1",
893 | "prim_path": "/World/table_1",
894 | "category": "breakfast_table",
895 | "model": "dnsjnv",
896 | "uuid": 88058023,
897 | "scale": 2
898 | }
899 | },
900 | "pen_1": {
901 | "class_module": "omnigibson.objects.dataset_object",
902 | "class_name": "DatasetObject",
903 | "args": {
904 | "name": "pen_1",
905 | "prim_path": "/World/pen_1",
906 | "category": "pen",
907 | "model": "gqwdor",
908 | "scale": [
909 | 3.0,
910 | 3.0,
911 | 1.3
912 | ]
913 | }
914 | },
915 | "pencil_holder_1": {
916 | "class_module": "omnigibson.objects.dataset_object",
917 | "class_name": "DatasetObject",
918 | "args": {
919 | "name": "pencil_holder_1",
920 | "prim_path": "/World/pencil_holder_1",
921 | "category": "pencil_holder",
922 | "model": "muqeud",
923 | "scale": [
924 | 1.5,
925 | 1.5,
926 | 1.0
927 | ]
928 | }
929 | }
930 | }
931 | }
932 | }
--------------------------------------------------------------------------------
/constraint_generation.py:
--------------------------------------------------------------------------------
1 | import base64
2 | from openai import OpenAI
3 | import os
4 | import cv2
5 | import json
6 | import parse
7 | import numpy as np
8 | import time
9 | from datetime import datetime
10 |
11 | # Function to encode the image
12 | def encode_image(image_path):
13 | with open(image_path, "rb") as image_file:
14 | return base64.b64encode(image_file.read()).decode('utf-8')
15 |
16 | class ConstraintGenerator:
17 | def __init__(self, config):
18 | self.config = config
19 | self.client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
20 | self.base_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), './vlm_query')
21 | with open(os.path.join(self.base_dir, 'prompt_template.txt'), 'r') as f:
22 | self.prompt_template = f.read()
23 |
24 | def _build_prompt(self, image_path, instruction):
25 | img_base64 = encode_image(image_path)
26 | prompt_text = self.prompt_template.format(instruction=instruction)
27 | # save prompt
28 | with open(os.path.join(self.task_dir, 'prompt.txt'), 'w') as f:
29 | f.write(prompt_text)
30 | messages = [
31 | {
32 | "role": "user",
33 | "content": [
34 | {
35 | "type": "text",
36 |                         "text": prompt_text
37 | },
38 | {
39 | "type": "image_url",
40 | "image_url": {
41 | "url": f"data:image/png;base64,{img_base64}"
42 | }
43 | },
44 | ]
45 | }
46 | ]
47 | return messages
48 |
49 | def _parse_and_save_constraints(self, output, save_dir):
50 | # parse into function blocks
51 | lines = output.split("\n")
52 | functions = dict()
53 | for i, line in enumerate(lines):
54 | if line.startswith("def "):
55 | start = i
56 | name = line.split("(")[0].split("def ")[1]
57 | if line.startswith(" return "):
58 | end = i
59 | functions[name] = lines[start:end+1]
60 | # organize them based on hierarchy in function names
61 | groupings = dict()
62 | for name in functions:
63 | parts = name.split("_")[:-1] # last one is the constraint idx
64 | key = "_".join(parts)
65 | if key not in groupings:
66 | groupings[key] = []
67 | groupings[key].append(name)
68 | # save them into files
69 | for key in groupings:
70 | with open(os.path.join(save_dir, f"{key}_constraints.txt"), "w") as f:
71 | for name in groupings[key]:
72 | f.write("\n".join(functions[name]) + "\n\n")
73 | print(f"Constraints saved to {save_dir}")
74 |
75 | def _parse_other_metadata(self, output):
76 | data_dict = dict()
77 | # find num_stages
78 | num_stages_template = "num_stages = {num_stages}"
79 | for line in output.split("\n"):
80 | num_stages = parse.parse(num_stages_template, line)
81 | if num_stages is not None:
82 | break
83 | if num_stages is None:
84 | raise ValueError("num_stages not found in output")
85 | data_dict['num_stages'] = int(num_stages['num_stages'])
86 | # find grasp_keypoints
87 | grasp_keypoints_template = "grasp_keypoints = {grasp_keypoints}"
88 | for line in output.split("\n"):
89 | grasp_keypoints = parse.parse(grasp_keypoints_template, line)
90 | if grasp_keypoints is not None:
91 | break
92 | if grasp_keypoints is None:
93 | raise ValueError("grasp_keypoints not found in output")
94 | # convert into list of ints
95 | grasp_keypoints = grasp_keypoints['grasp_keypoints'].replace("[", "").replace("]", "").split(",")
96 | grasp_keypoints = [int(x.strip()) for x in grasp_keypoints]
97 | data_dict['grasp_keypoints'] = grasp_keypoints
98 | # find release_keypoints
99 | release_keypoints_template = "release_keypoints = {release_keypoints}"
100 | for line in output.split("\n"):
101 | release_keypoints = parse.parse(release_keypoints_template, line)
102 | if release_keypoints is not None:
103 | break
104 | if release_keypoints is None:
105 | raise ValueError("release_keypoints not found in output")
106 | # convert into list of ints
107 | release_keypoints = release_keypoints['release_keypoints'].replace("[", "").replace("]", "").split(",")
108 | release_keypoints = [int(x.strip()) for x in release_keypoints]
109 | data_dict['release_keypoints'] = release_keypoints
110 | return data_dict
111 |
112 | def _save_metadata(self, metadata):
113 | for k, v in metadata.items():
114 | if isinstance(v, np.ndarray):
115 | metadata[k] = v.tolist()
116 | with open(os.path.join(self.task_dir, 'metadata.json'), 'w') as f:
117 | json.dump(metadata, f)
118 | print(f"Metadata saved to {os.path.join(self.task_dir, 'metadata.json')}")
119 |
120 | def generate(self, img, instruction, metadata):
121 | """
122 | Args:
123 | img (np.ndarray): image of the scene (H, W, 3) uint8
124 | instruction (str): instruction for the query
125 | Returns:
126 |             save_dir (str): directory where the constraints and metadata are saved
127 | """
128 | # create a directory for the task
129 | fname = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + "_" + instruction.lower().replace(" ", "_")
130 | self.task_dir = os.path.join(self.base_dir, fname)
131 | os.makedirs(self.task_dir, exist_ok=True)
132 | # save query image
133 | image_path = os.path.join(self.task_dir, 'query_img.png')
134 | cv2.imwrite(image_path, img[..., ::-1])
135 | # build prompt
136 | messages = self._build_prompt(image_path, instruction)
137 | # stream back the response
138 | stream = self.client.chat.completions.create(model=self.config['model'],
139 | messages=messages,
140 | temperature=self.config['temperature'],
141 | max_tokens=self.config['max_tokens'],
142 | stream=True)
143 | output = ""
144 | start = time.time()
145 | for chunk in stream:
146 | print(f'[{time.time()-start:.2f}s] Querying OpenAI API...', end='\r')
147 | if chunk.choices[0].delta.content is not None:
148 | output += chunk.choices[0].delta.content
149 | print(f'[{time.time()-start:.2f}s] Querying OpenAI API...Done')
150 | # save raw output
151 | with open(os.path.join(self.task_dir, 'output_raw.txt'), 'w') as f:
152 | f.write(output)
153 | # parse and save constraints
154 | self._parse_and_save_constraints(output, self.task_dir)
155 | # save metadata
156 | metadata.update(self._parse_other_metadata(output))
157 | self._save_metadata(metadata)
158 | return self.task_dir
159 |
--------------------------------------------------------------------------------
/environment.py:
--------------------------------------------------------------------------------
1 | import time
2 | import numpy as np
3 | import os
4 | import datetime
5 | import transform_utils as T
6 | import trimesh
7 | import open3d as o3d
8 | import imageio
9 | import omnigibson as og
10 | from omnigibson.macros import gm
11 | from omnigibson.utils.usd_utils import PoseAPI, mesh_prim_mesh_to_trimesh_mesh, mesh_prim_shape_to_trimesh_mesh
12 | from omnigibson.robots.fetch import Fetch
13 | from omnigibson.controllers import IsGraspingState
14 | from og_utils import OGCamera
15 | from utils import (
16 | bcolors,
17 | get_clock_time,
18 | angle_between_rotmat,
19 | angle_between_quats,
20 | get_linear_interpolation_steps,
21 | linear_interpolate_poses,
22 | )
23 | from omnigibson.robots.manipulation_robot import ManipulationRobot
24 | from omnigibson.controllers.controller_base import ControlType, BaseController
25 |
26 | # Use GPU dynamics and disable flatcache
27 | gm.USE_GPU_DYNAMICS = True
28 | gm.ENABLE_FLATCACHE = False
29 |
30 | # some customization to the OG functions
31 | def custom_clip_control(self, control):
32 | """
33 | Clips the inputted @control signal based on @control_limits.
34 |
35 | Args:
36 | control (Array[float]): control signal to clip
37 |
38 | Returns:
39 | Array[float]: Clipped control signal
40 | """
41 | clipped_control = control.clip(
42 | self._control_limits[self.control_type][0][self.dof_idx],
43 | self._control_limits[self.control_type][1][self.dof_idx],
44 | )
45 | idx = (
46 | self._dof_has_limits[self.dof_idx]
47 | if self.control_type == ControlType.POSITION
48 | else [True] * self.control_dim
49 | )
50 | if len(control) > 1:
51 | control[idx] = clipped_control[idx]
52 | return control
53 |
54 | Fetch._initialize = ManipulationRobot._initialize
55 | BaseController.clip_control = custom_clip_control
56 |
57 | class ReKepOGEnv:
58 | def __init__(self, config, scene_file, verbose=False):
59 | self.video_cache = []
60 | self.config = config
61 | self.verbose = verbose
62 | self.config['scene']['scene_file'] = scene_file
63 | self.bounds_min = np.array(self.config['bounds_min'])
64 | self.bounds_max = np.array(self.config['bounds_max'])
65 | self.interpolate_pos_step_size = self.config['interpolate_pos_step_size']
66 | self.interpolate_rot_step_size = self.config['interpolate_rot_step_size']
67 | # create omnigibson environment
68 | self.step_counter = 0
69 | self.og_env = og.Environment(dict(scene=self.config['scene'], robots=[self.config['robot']['robot_config']], env=self.config['og_sim']))
70 | self.og_env.scene.update_initial_state()
71 | for _ in range(10): og.sim.step()
72 | # robot vars
73 | self.robot = self.og_env.robots[0]
74 | dof_idx = np.concatenate([self.robot.trunk_control_idx,
75 | self.robot.arm_control_idx[self.robot.default_arm]])
76 | self.reset_joint_pos = self.robot.reset_joint_pos[dof_idx]
77 | self.world2robot_homo = T.pose_inv(T.pose2mat(self.robot.get_position_orientation()))
78 | # initialize cameras
79 | self._initialize_cameras(self.config['camera'])
80 | self.last_og_gripper_action = 1.0
81 |
82 | # ======================================
83 | # = exposed functions
84 | # ======================================
85 | def get_sdf_voxels(self, resolution, exclude_robot=True, exclude_obj_in_hand=True):
86 | """
87 | open3d-based SDF computation
88 | 1. recursively get all usd prim and get their vertices and faces
89 | 2. compute SDF using open3d
90 | """
91 | start = time.time()
92 | exclude_names = ['wall', 'floor', 'ceiling']
93 | if exclude_robot:
94 | exclude_names += ['fetch', 'robot']
95 | if exclude_obj_in_hand:
96 | assert self.config['robot']['robot_config']['grasping_mode'] in ['assisted', 'sticky'], "Currently only supported for assisted or sticky grasping"
97 | in_hand_obj = self.robot._ag_obj_in_hand[self.robot.default_arm]
98 | if in_hand_obj is not None:
99 | exclude_names.append(in_hand_obj.name.lower())
100 | trimesh_objects = []
101 | for obj in self.og_env.scene.objects:
102 | if any([name in obj.name.lower() for name in exclude_names]):
103 | continue
104 | for link in obj.links.values():
105 | for mesh in link.collision_meshes.values():
106 | mesh_type = mesh.prim.GetPrimTypeInfo().GetTypeName()
107 | if mesh_type == 'Mesh':
108 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(mesh.prim)
109 | else:
110 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(mesh.prim)
111 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh.prim_path)
112 | trimesh_object.apply_transform(world_pose_w_scale)
113 | trimesh_objects.append(trimesh_object)
114 | # chain trimesh objects
115 | scene_mesh = trimesh.util.concatenate(trimesh_objects)
116 | # Create a scene and add the triangle mesh
117 | scene = o3d.t.geometry.RaycastingScene()
118 | vertex_positions = scene_mesh.vertices
119 | triangle_indices = scene_mesh.faces
120 | vertex_positions = o3d.core.Tensor(vertex_positions, dtype=o3d.core.Dtype.Float32)
121 | triangle_indices = o3d.core.Tensor(triangle_indices, dtype=o3d.core.Dtype.UInt32)
122 | _ = scene.add_triangles(vertex_positions, triangle_indices) # we do not need the geometry ID for mesh
123 | # create a grid
124 | shape = np.ceil((self.bounds_max - self.bounds_min) / resolution).astype(int)
125 | steps = (self.bounds_max - self.bounds_min) / shape
126 | grid = np.mgrid[self.bounds_min[0]:self.bounds_max[0]:steps[0],
127 | self.bounds_min[1]:self.bounds_max[1]:steps[1],
128 | self.bounds_min[2]:self.bounds_max[2]:steps[2]]
129 | grid = grid.reshape(3, -1).T
130 | # compute SDF
131 | sdf_voxels = scene.compute_signed_distance(grid.astype(np.float32))
132 | # convert back to np array
133 | sdf_voxels = sdf_voxels.cpu().numpy()
134 | # open3d has flipped sign from our convention
135 | sdf_voxels = -sdf_voxels
136 | sdf_voxels = sdf_voxels.reshape(shape)
137 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] SDF voxels computed in {time.time() - start:.4f} seconds{bcolors.ENDC}')
138 | return sdf_voxels
139 |
140 | def get_cam_obs(self):
141 | self.last_cam_obs = dict()
142 | for cam_id in self.cams:
143 | self.last_cam_obs[cam_id] = self.cams[cam_id].get_obs() # each containing rgb, depth, points, seg
144 | return self.last_cam_obs
145 |
146 | def register_keypoints(self, keypoints):
147 | """
148 | Args:
149 | keypoints (np.ndarray): keypoints in the world frame of shape (N, 3)
150 | Returns:
151 | None
152 | Given a set of keypoints in the world frame, this function registers them so that their newest positions can be accessed later.
153 | """
154 | if not isinstance(keypoints, np.ndarray):
155 | keypoints = np.array(keypoints)
156 | self.keypoints = keypoints
157 | self._keypoint_registry = dict()
158 | self._keypoint2object = dict()
159 | exclude_names = ['wall', 'floor', 'ceiling', 'table', 'fetch', 'robot']
160 | for idx, keypoint in enumerate(keypoints):
161 | closest_distance = np.inf
162 | for obj in self.og_env.scene.objects:
163 | if any([name in obj.name.lower() for name in exclude_names]):
164 | continue
165 | for link in obj.links.values():
166 | for mesh in link.visual_meshes.values():
167 | mesh_prim_path = mesh.prim_path
168 | mesh_type = mesh.prim.GetPrimTypeInfo().GetTypeName()
169 | if mesh_type == 'Mesh':
170 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(mesh.prim)
171 | else:
172 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(mesh.prim)
173 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh.prim_path)
174 | trimesh_object.apply_transform(world_pose_w_scale)
175 | points_transformed = trimesh_object.sample(1000)
176 |
177 | # find closest point
178 | dists = np.linalg.norm(points_transformed - keypoint, axis=1)
179 | point = points_transformed[np.argmin(dists)]
180 | distance = np.linalg.norm(point - keypoint)
181 | if distance < closest_distance:
182 | closest_distance = distance
183 | closest_prim_path = mesh_prim_path
184 | closest_point = point
185 | closest_obj = obj
186 | self._keypoint_registry[idx] = (closest_prim_path, PoseAPI.get_world_pose(closest_prim_path))
187 | self._keypoint2object[idx] = closest_obj
188 | # overwrite the keypoint with the closest point
189 | self.keypoints[idx] = closest_point
190 |
191 | def get_keypoint_positions(self):
192 | """
193 | Args:
194 | None
195 | Returns:
196 | np.ndarray: keypoints in the world frame of shape (N, 3)
197 | Given the registered keypoints, this function returns their current positions in the world frame.
198 | """
199 | assert hasattr(self, '_keypoint_registry') and self._keypoint_registry is not None, "Keypoints have not been registered yet."
200 | keypoint_positions = []
201 | for idx, (prim_path, init_pose) in self._keypoint_registry.items():
202 | init_pose = T.pose2mat(init_pose)
203 | centering_transform = T.pose_inv(init_pose)
204 | keypoint_centered = np.dot(centering_transform, np.append(self.keypoints[idx], 1))[:3]
205 | curr_pose = T.pose2mat(PoseAPI.get_world_pose(prim_path))
206 | keypoint = np.dot(curr_pose, np.append(keypoint_centered, 1))[:3]
207 | keypoint_positions.append(keypoint)
208 | return np.array(keypoint_positions)
209 |
210 | def get_object_by_keypoint(self, keypoint_idx):
211 | """
212 | Args:
213 | keypoint_idx (int): the index of the keypoint
214 | Returns:
215 | pointer: the object that the keypoint is associated with
216 |         Given the keypoint index, this function returns the object that the keypoint is associated with.
217 | """
218 | assert hasattr(self, '_keypoint2object') and self._keypoint2object is not None, "Keypoints have not been registered yet."
219 | return self._keypoint2object[keypoint_idx]
220 |
221 | def get_collision_points(self, noise=True):
222 | """
223 | Get the points of the gripper and any object in hand.
224 | """
225 | # add gripper collision points
226 | collision_points = []
227 | for obj in self.og_env.scene.objects:
228 | if 'fetch' in obj.name.lower():
229 | for name, link in obj.links.items():
230 | if 'gripper' in name.lower() or 'wrist' in name.lower(): # wrist_roll and wrist_flex
231 | for collision_mesh in link.collision_meshes.values():
232 | mesh_prim_path = collision_mesh.prim_path
233 | mesh_type = collision_mesh.prim.GetPrimTypeInfo().GetTypeName()
234 | if mesh_type == 'Mesh':
235 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(collision_mesh.prim)
236 | else:
237 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(collision_mesh.prim)
238 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh_prim_path)
239 | trimesh_object.apply_transform(world_pose_w_scale)
240 | points_transformed = trimesh_object.sample(1000)
241 | # add to collision points
242 | collision_points.append(points_transformed)
243 | # add object in hand collision points
244 | in_hand_obj = self.robot._ag_obj_in_hand[self.robot.default_arm]
245 | if in_hand_obj is not None:
246 | for link in in_hand_obj.links.values():
247 | for collision_mesh in link.collision_meshes.values():
248 | mesh_type = collision_mesh.prim.GetPrimTypeInfo().GetTypeName()
249 | if mesh_type == 'Mesh':
250 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(collision_mesh.prim)
251 | else:
252 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(collision_mesh.prim)
253 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(collision_mesh.prim_path)
254 | trimesh_object.apply_transform(world_pose_w_scale)
255 | points_transformed = trimesh_object.sample(1000)
256 | # add to collision points
257 | collision_points.append(points_transformed)
258 | collision_points = np.concatenate(collision_points, axis=0)
259 | return collision_points
260 |
261 | def reset(self):
262 | self.og_env.reset()
263 | self.robot.reset()
264 | for _ in range(5): self._step()
265 | self.open_gripper()
266 | # moving arm to the side to unblock view
267 | ee_pose = self.get_ee_pose()
268 | ee_pose[:3] += np.array([0.0, -0.2, -0.1])
269 | action = np.concatenate([ee_pose, [self.get_gripper_null_action()]])
270 | self.execute_action(action, precise=True)
271 | self.video_cache = []
272 | print(f'{bcolors.HEADER}Reset done.{bcolors.ENDC}')
273 |
274 | def is_grasping(self, candidate_obj=None):
275 | return self.robot.is_grasping(candidate_obj=candidate_obj) == IsGraspingState.TRUE
276 |
277 | def get_ee_pose(self):
278 | ee_pos, ee_xyzw = (self.robot.get_eef_position(), self.robot.get_eef_orientation())
279 | ee_pose = np.concatenate([ee_pos, ee_xyzw]) # [7]
280 | return ee_pose
281 |
282 | def get_ee_pos(self):
283 | return self.get_ee_pose()[:3]
284 |
285 | def get_ee_quat(self):
286 | return self.get_ee_pose()[3:]
287 |
288 | def get_arm_joint_postions(self):
289 | assert isinstance(self.robot, Fetch), "The IK solver assumes the robot is a Fetch robot"
290 | arm = self.robot.default_arm
291 | dof_idx = np.concatenate([self.robot.trunk_control_idx, self.robot.arm_control_idx[arm]])
292 | arm_joint_pos = self.robot.get_joint_positions()[dof_idx]
293 | return arm_joint_pos
294 |
295 | def close_gripper(self):
296 | """
297 | Exposed interface: 1.0 for closed, -1.0 for open, 0.0 for no change
298 | Internal OG interface: 1.0 for open, 0.0 for closed
299 | """
300 | if self.last_og_gripper_action == 0.0:
301 | return
302 | action = np.zeros(12)
303 | action[10:] = [0, 0] # gripper: float. 0. for closed, 1. for open.
304 | for _ in range(30):
305 | self._step(action)
306 | self.last_og_gripper_action = 0.0
307 |
308 | def open_gripper(self):
309 | if self.last_og_gripper_action == 1.0:
310 | return
311 | action = np.zeros(12)
312 | action[10:] = [1, 1] # gripper: float. 0. for closed, 1. for open.
313 | for _ in range(30):
314 | self._step(action)
315 | self.last_og_gripper_action = 1.0
316 |
317 | def get_last_og_gripper_action(self):
318 | return self.last_og_gripper_action
319 |
320 | def get_gripper_open_action(self):
321 | return -1.0
322 |
323 | def get_gripper_close_action(self):
324 | return 1.0
325 |
326 | def get_gripper_null_action(self):
327 | return 0.0
328 |
329 | def compute_target_delta_ee(self, target_pose):
330 | target_pos, target_xyzw = target_pose[:3], target_pose[3:]
331 | ee_pose = self.get_ee_pose()
332 | ee_pos, ee_xyzw = ee_pose[:3], ee_pose[3:]
333 | pos_diff = np.linalg.norm(ee_pos - target_pos)
334 | rot_diff = angle_between_quats(ee_xyzw, target_xyzw)
335 | return pos_diff, rot_diff
336 |
337 | def execute_action(
338 | self,
339 | action,
340 | precise=True,
341 | ):
342 | """
343 | Moves the robot gripper to a target pose by specifying the absolute pose in the world frame and executes gripper action.
344 |
345 | Args:
346 | action (x, y, z, qx, qy, qz, qw, gripper_action): absolute target pose in the world frame + gripper action.
347 | precise (bool): whether to use small position and rotation thresholds for precise movement (robot would move slower).
348 | Returns:
349 | tuple: A tuple containing the position and rotation errors after reaching the target pose.
350 | """
351 | if precise:
352 | pos_threshold = 0.03
353 | rot_threshold = 3.0
354 | else:
355 | pos_threshold = 0.10
356 | rot_threshold = 5.0
357 | action = np.array(action).copy()
358 | assert action.shape == (8,)
359 | target_pose = action[:7]
360 | gripper_action = action[7]
361 |
362 | # ======================================
363 | # = status and safety check
364 | # ======================================
365 | if np.any(target_pose[:3] < self.bounds_min) \
366 | or np.any(target_pose[:3] > self.bounds_max):
367 | print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Target position is out of bounds, clipping to workspace bounds{bcolors.ENDC}')
368 | target_pose[:3] = np.clip(target_pose[:3], self.bounds_min, self.bounds_max)
369 |
370 | # ======================================
371 | # = interpolation
372 | # ======================================
373 | current_pose = self.get_ee_pose()
374 | pos_diff = np.linalg.norm(current_pose[:3] - target_pose[:3])
375 | rot_diff = angle_between_quats(current_pose[3:7], target_pose[3:7])
376 | pos_is_close = pos_diff < self.interpolate_pos_step_size
377 | rot_is_close = rot_diff < self.interpolate_rot_step_size
378 | if pos_is_close and rot_is_close:
379 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Skipping interpolation{bcolors.ENDC}')
380 | pose_seq = np.array([target_pose])
381 | else:
382 | num_steps = get_linear_interpolation_steps(current_pose, target_pose, self.interpolate_pos_step_size, self.interpolate_rot_step_size)
383 | pose_seq = linear_interpolate_poses(current_pose, target_pose, num_steps)
384 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Interpolating for {num_steps} steps{bcolors.ENDC}')
385 |
386 | # ======================================
387 | # = move to target pose
388 | # ======================================
389 | # move faster for intermediate poses
390 | intermediate_pos_threshold = 0.10
391 | intermediate_rot_threshold = 5.0
392 | for pose in pose_seq[:-1]:
393 | self._move_to_waypoint(pose, intermediate_pos_threshold, intermediate_rot_threshold)
394 | # move to the final pose with required precision
395 | pose = pose_seq[-1]
396 | self._move_to_waypoint(pose, pos_threshold, rot_threshold, max_steps=20 if not precise else 40)
397 | # compute error
398 | pos_error, rot_error = self.compute_target_delta_ee(target_pose)
399 | self.verbose and print(f'\n{bcolors.BOLD}[environment.py | {get_clock_time()}] Move to pose completed (pos_error: {pos_error}, rot_error: {np.rad2deg(rot_error)}){bcolors.ENDC}\n')
400 |
401 | # ======================================
402 | # = apply gripper action
403 | # ======================================
404 | if gripper_action == self.get_gripper_open_action():
405 | self.open_gripper()
406 | elif gripper_action == self.get_gripper_close_action():
407 | self.close_gripper()
408 | elif gripper_action == self.get_gripper_null_action():
409 | pass
410 | else:
411 | raise ValueError(f"Invalid gripper action: {gripper_action}")
412 |
413 | return pos_error, rot_error
414 |
415 | def sleep(self, seconds):
416 | start = time.time()
417 | while time.time() - start < seconds:
418 | self._step()
419 |
420 | def save_video(self, save_path=None):
421 | save_dir = os.path.join(os.path.dirname(__file__), 'videos')
422 | os.makedirs(save_dir, exist_ok=True)
423 | if save_path is None:
424 | save_path = os.path.join(save_dir, f'{datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")}.mp4')
425 | video_writer = imageio.get_writer(save_path, fps=30)
426 | for rgb in self.video_cache:
427 | video_writer.append_data(rgb)
428 | video_writer.close()
429 | return save_path
430 |
431 | # ======================================
432 | # = internal functions
433 | # ======================================
434 | def _check_reached_ee(self, target_pos, target_xyzw, pos_threshold, rot_threshold):
435 | """
436 |         check whether the true EE pose (the hand link, in the world frame) has reached the target within the given thresholds
437 | """
438 | current_pos = self.robot.get_eef_position()
439 | current_xyzw = self.robot.get_eef_orientation()
440 | current_rotmat = T.quat2mat(current_xyzw)
441 | target_rotmat = T.quat2mat(target_xyzw)
442 | # calculate position delta
443 | pos_diff = (target_pos - current_pos).flatten()
444 | pos_error = np.linalg.norm(pos_diff)
445 | # calculate rotation delta
446 | rot_error = angle_between_rotmat(current_rotmat, target_rotmat)
447 | # print status
448 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Curr pose: {current_pos}, {current_xyzw} (pos_error: {pos_error.round(4)}, rot_error: {np.rad2deg(rot_error).round(4)}){bcolors.ENDC}')
449 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Goal pose: {target_pos}, {target_xyzw} (pos_thres: {pos_threshold}, rot_thres: {rot_threshold}){bcolors.ENDC}')
450 | if pos_error < pos_threshold and rot_error < np.deg2rad(rot_threshold):
451 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] OSC pose reached (pos_error: {pos_error.round(4)}, rot_error: {np.rad2deg(rot_error).round(4)}){bcolors.ENDC}')
452 | return True, pos_error, rot_error
453 | return False, pos_error, rot_error
454 |
455 | def _move_to_waypoint(self, target_pose_world, pos_threshold=0.02, rot_threshold=3.0, max_steps=10):
456 | pos_errors = []
457 | rot_errors = []
458 | count = 0
459 | while count < max_steps:
460 | reached, pos_error, rot_error = self._check_reached_ee(target_pose_world[:3], target_pose_world[3:7], pos_threshold, rot_threshold)
461 | pos_errors.append(pos_error)
462 | rot_errors.append(rot_error)
463 | if reached:
464 | break
465 | # convert world pose to robot pose
466 | target_pose_robot = np.dot(self.world2robot_homo, T.convert_pose_quat2mat(target_pose_world))
467 | # convert to relative pose to be used with the underlying controller
468 | relative_position = target_pose_robot[:3, 3] - self.robot.get_relative_eef_position()
469 | relative_quat = T.quat_distance(T.mat2quat(target_pose_robot[:3, :3]), self.robot.get_relative_eef_orientation())
470 | assert isinstance(self.robot, Fetch), "this action space is only for fetch"
471 | action = np.zeros(12) # first 3 are base, which we don't use
472 | action[4:7] = relative_position
473 | action[7:10] = T.quat2axisangle(relative_quat)
474 | action[10:] = [self.last_og_gripper_action, self.last_og_gripper_action]
475 | # step the action
476 | _ = self._step(action=action)
477 | count += 1
478 | if count == max_steps:
479 | print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] OSC pose not reached after {max_steps} steps (pos_error: {pos_errors[-1].round(4)}, rot_error: {np.rad2deg(rot_errors[-1]).round(4)}){bcolors.ENDC}')
480 |
481 | def _step(self, action=None):
482 | if hasattr(self, 'disturbance_seq') and self.disturbance_seq is not None:
483 | next(self.disturbance_seq)
484 | if action is not None:
485 | self.og_env.step(action)
486 | else:
487 | og.sim.step()
488 | cam_obs = self.get_cam_obs()
489 | rgb = cam_obs[1]['rgb']
490 | if len(self.video_cache) < self.config['video_cache_size']:
491 | self.video_cache.append(rgb)
492 | else:
493 | self.video_cache.pop(0)
494 | self.video_cache.append(rgb)
495 | self.step_counter += 1
496 |
497 | def _initialize_cameras(self, cam_config):
498 | """
499 |         ::param cam_config: dict mapping camera id to its config (name, resolution, position, orientation)
500 | """
501 | self.cams = dict()
502 | for cam_id in cam_config:
503 | cam_id = int(cam_id)
504 | self.cams[cam_id] = OGCamera(self.og_env, cam_config[cam_id])
505 | for _ in range(10): og.sim.render()
--------------------------------------------------------------------------------
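`ReKepOGEnv.execute_action` above clips out-of-bounds targets to the workspace, linearly interpolates the end-effector pose, tracks the intermediate waypoints with looser thresholds, and only then applies the gripper command. A minimal usage sketch, assuming an already-constructed `ReKepOGEnv` instance named `env` (as created in `main.py`):

```python
import numpy as np

# An action is an 8-vector: [x, y, z, qx, qy, qz, qw, gripper_action].
target_pose = np.array(env.get_ee_pose())   # [7] current EE pose in the world frame
target_pose[2] += 0.05                      # move the gripper 5 cm up
action = np.concatenate([target_pose, [env.get_gripper_null_action()]])

# Out-of-bounds positions are clipped to the workspace bounds; the pose is then
# interpolated and tracked waypoint-by-waypoint before the gripper action is applied.
pos_error, rot_error = env.execute_action(action, precise=True)
print(f'residual errors: {pos_error:.4f} m, {np.rad2deg(rot_error):.2f} deg')
```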
/ik_solver.py:
--------------------------------------------------------------------------------
1 | """
2 | Adapted from OmniGibson and the Lula IK solver
3 | """
4 | import omnigibson.lazy as lazy
5 | import numpy as np
6 |
7 | class IKSolver:
8 | """
9 | Class for thinly wrapping Lula IK solver
10 | """
11 |
12 | def __init__(
13 | self,
14 | robot_description_path,
15 | robot_urdf_path,
16 | eef_name,
17 | reset_joint_pos,
18 | world2robot_homo,
19 | ):
20 | # Create robot description, kinematics, and config
21 | self.robot_description = lazy.lula.load_robot(robot_description_path, robot_urdf_path)
22 | self.kinematics = self.robot_description.kinematics()
23 | self.config = lazy.lula.CyclicCoordDescentIkConfig()
24 | self.eef_name = eef_name
25 | self.reset_joint_pos = reset_joint_pos
26 | self.world2robot_homo = world2robot_homo
27 |
28 | def solve(
29 | self,
30 | target_pose_homo,
31 | position_tolerance=0.01,
32 | orientation_tolerance=0.05,
33 | position_weight=1.0,
34 | orientation_weight=0.05,
35 | max_iterations=150,
36 | initial_joint_pos=None,
37 | ):
38 | """
39 |         Backs out joint positions to achieve the desired end-effector pose given by @target_pose_homo
40 |
41 | Args:
42 | target_pose_homo (np.ndarray): [4, 4] homogeneous transformation matrix of the target pose in world frame
43 | position_tolerance (float): Maximum position error (L2-norm) for a successful IK solution
44 | orientation_tolerance (float): Maximum orientation error (per-axis L2-norm) for a successful IK solution
45 | position_weight (float): Weight for the relative importance of position error during CCD
46 |             orientation_weight (float): Weight for the relative importance of orientation error during CCD
47 | max_iterations (int): Number of iterations used for each cyclic coordinate descent.
48 | initial_joint_pos (None or n-array): If specified, will set the initial cspace seed when solving for joint
49 | positions. Otherwise, will use self.reset_joint_pos
50 |
51 | Returns:
52 | ik_results (lazy.lula.CyclicCoordDescentIkResult): IK result object containing the joint positions and other information.
53 | """
54 | # convert target pose to robot base frame
55 | target_pose_robot = np.dot(self.world2robot_homo, target_pose_homo)
56 | target_pose_pos = target_pose_robot[:3, 3]
57 | target_pose_rot = target_pose_robot[:3, :3]
58 | ik_target_pose = lazy.lula.Pose3(lazy.lula.Rotation3(target_pose_rot), target_pose_pos)
59 | # Set the cspace seed and tolerance
60 | initial_joint_pos = self.reset_joint_pos if initial_joint_pos is None else np.array(initial_joint_pos)
61 | self.config.cspace_seeds = [initial_joint_pos]
62 | self.config.position_tolerance = position_tolerance
63 | self.config.orientation_tolerance = orientation_tolerance
64 | self.config.ccd_position_weight = position_weight
65 | self.config.ccd_orientation_weight = orientation_weight
66 | self.config.max_num_descents = max_iterations
67 | # Compute target joint positions
68 | ik_results = lazy.lula.compute_ik_ccd(self.kinematics, ik_target_pose, self.eef_name, self.config)
69 | return ik_results
--------------------------------------------------------------------------------
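`IKSolver` is a thin wrapper around the Lula cyclic coordinate descent solver; in this codebase it is used only as a cheap reachability signal inside the subgoal and path optimizations. A minimal sketch of a standalone call, assuming `env` is a constructed `ReKepOGEnv` with a Fetch robot (mirroring how `main.py` builds it):

```python
import transform_utils as T
from ik_solver import IKSolver

ik_solver = IKSolver(
    robot_description_path=env.robot.robot_arm_descriptor_yamls[env.robot.default_arm],
    robot_urdf_path=env.robot.urdf_path,
    eef_name=env.robot.eef_link_names[env.robot.default_arm],
    reset_joint_pos=env.reset_joint_pos,
    world2robot_homo=env.world2robot_homo,
)

# solve() expects a [4, 4] homogeneous target pose in the world frame.
target_pose_homo = T.convert_pose_quat2mat(env.get_ee_pose())
ik_result = ik_solver.solve(target_pose_homo, max_iterations=50)
# The Lula result exposes success, position_error, num_descents, and cspace_position,
# which path_solver.py and subgoal_solver.py turn into reachability costs.
print(ik_result.success, ik_result.position_error)
```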
/keypoint_proposal.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import cv2
4 | from torch.nn.functional import interpolate
5 | from kmeans_pytorch import kmeans
6 | from utils import filter_points_by_bounds
7 | from sklearn.cluster import MeanShift
8 |
9 | class KeypointProposer:
10 | def __init__(self, config):
11 | self.config = config
12 | self.device = torch.device(self.config['device'])
13 | self.dinov2 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval().to(self.device)
14 | self.bounds_min = np.array(self.config['bounds_min'])
15 | self.bounds_max = np.array(self.config['bounds_max'])
16 | self.mean_shift = MeanShift(bandwidth=self.config['min_dist_bt_keypoints'], bin_seeding=True, n_jobs=32)
17 | self.patch_size = 14 # dinov2
18 | np.random.seed(self.config['seed'])
19 | torch.manual_seed(self.config['seed'])
20 | torch.cuda.manual_seed(self.config['seed'])
21 |
22 | def get_keypoints(self, rgb, points, masks):
23 | # preprocessing
24 | transformed_rgb, rgb, points, masks, shape_info = self._preprocess(rgb, points, masks)
25 | # get features
26 | features_flat = self._get_features(transformed_rgb, shape_info)
27 |         # for each mask, cluster in feature space to get meaningful regions, and use their centers as keypoint candidates
28 | candidate_keypoints, candidate_pixels, candidate_rigid_group_ids = self._cluster_features(points, features_flat, masks)
29 | # exclude keypoints that are outside of the workspace
30 | within_space = filter_points_by_bounds(candidate_keypoints, self.bounds_min, self.bounds_max, strict=True)
31 | candidate_keypoints = candidate_keypoints[within_space]
32 | candidate_pixels = candidate_pixels[within_space]
33 | candidate_rigid_group_ids = candidate_rigid_group_ids[within_space]
34 | # merge close points by clustering in cartesian space
35 | merged_indices = self._merge_clusters(candidate_keypoints)
36 | candidate_keypoints = candidate_keypoints[merged_indices]
37 | candidate_pixels = candidate_pixels[merged_indices]
38 | candidate_rigid_group_ids = candidate_rigid_group_ids[merged_indices]
39 | # sort candidates by locations
40 | sort_idx = np.lexsort((candidate_pixels[:, 0], candidate_pixels[:, 1]))
41 | candidate_keypoints = candidate_keypoints[sort_idx]
42 | candidate_pixels = candidate_pixels[sort_idx]
43 | candidate_rigid_group_ids = candidate_rigid_group_ids[sort_idx]
44 | # project keypoints to image space
45 | projected = self._project_keypoints_to_img(rgb, candidate_pixels, candidate_rigid_group_ids, masks, features_flat)
46 | return candidate_keypoints, projected
47 |
48 | def _preprocess(self, rgb, points, masks):
49 | # convert masks to binary masks
50 | masks = [masks == uid for uid in np.unique(masks)]
51 | # ensure input shape is compatible with dinov2
52 | H, W, _ = rgb.shape
53 | patch_h = int(H // self.patch_size)
54 | patch_w = int(W // self.patch_size)
55 | new_H = patch_h * self.patch_size
56 | new_W = patch_w * self.patch_size
57 | transformed_rgb = cv2.resize(rgb, (new_W, new_H))
58 | transformed_rgb = transformed_rgb.astype(np.float32) / 255.0 # float32 [H, W, 3]
59 | # shape info
60 | shape_info = {
61 | 'img_h': H,
62 | 'img_w': W,
63 | 'patch_h': patch_h,
64 | 'patch_w': patch_w,
65 | }
66 | return transformed_rgb, rgb, points, masks, shape_info
67 |
68 | def _project_keypoints_to_img(self, rgb, candidate_pixels, candidate_rigid_group_ids, masks, features_flat):
69 | projected = rgb.copy()
70 | # overlay keypoints on the image
71 | for keypoint_count, pixel in enumerate(candidate_pixels):
72 | displayed_text = f"{keypoint_count}"
73 | text_length = len(displayed_text)
74 | # draw a box
75 | box_width = 30 + 10 * (text_length - 1)
76 | box_height = 30
77 | cv2.rectangle(projected, (pixel[1] - box_width // 2, pixel[0] - box_height // 2), (pixel[1] + box_width // 2, pixel[0] + box_height // 2), (255, 255, 255), -1)
78 | cv2.rectangle(projected, (pixel[1] - box_width // 2, pixel[0] - box_height // 2), (pixel[1] + box_width // 2, pixel[0] + box_height // 2), (0, 0, 0), 2)
79 | # draw text
80 | org = (pixel[1] - 7 * (text_length), pixel[0] + 7)
81 | color = (255, 0, 0)
82 | cv2.putText(projected, str(keypoint_count), org, cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
83 | keypoint_count += 1
84 | return projected
85 |
86 | @torch.inference_mode()
87 | @torch.amp.autocast('cuda')
88 | def _get_features(self, transformed_rgb, shape_info):
89 | img_h = shape_info['img_h']
90 | img_w = shape_info['img_w']
91 | patch_h = shape_info['patch_h']
92 | patch_w = shape_info['patch_w']
93 | # get features
94 | img_tensors = torch.from_numpy(transformed_rgb).permute(2, 0, 1).unsqueeze(0).to(self.device) # float32 [1, 3, H, W]
95 | assert img_tensors.shape[1] == 3, "unexpected image shape"
96 | features_dict = self.dinov2.forward_features(img_tensors)
97 | raw_feature_grid = features_dict['x_norm_patchtokens'] # float32 [num_cams, patch_h*patch_w, feature_dim]
98 | raw_feature_grid = raw_feature_grid.reshape(1, patch_h, patch_w, -1) # float32 [num_cams, patch_h, patch_w, feature_dim]
99 | # compute per-point feature using bilinear interpolation
100 | interpolated_feature_grid = interpolate(raw_feature_grid.permute(0, 3, 1, 2), # float32 [num_cams, feature_dim, patch_h, patch_w]
101 | size=(img_h, img_w),
102 | mode='bilinear').permute(0, 2, 3, 1).squeeze(0) # float32 [H, W, feature_dim]
103 | features_flat = interpolated_feature_grid.reshape(-1, interpolated_feature_grid.shape[-1]) # float32 [H*W, feature_dim]
104 | return features_flat
105 |
106 | def _cluster_features(self, points, features_flat, masks):
107 | candidate_keypoints = []
108 | candidate_pixels = []
109 | candidate_rigid_group_ids = []
110 | for rigid_group_id, binary_mask in enumerate(masks):
111 | # ignore mask that is too large
112 | if np.mean(binary_mask) > self.config['max_mask_ratio']:
113 | continue
114 | # consider only foreground features
115 | obj_features_flat = features_flat[binary_mask.reshape(-1)]
116 | feature_pixels = np.argwhere(binary_mask)
117 | feature_points = points[binary_mask]
118 | # reduce dimensionality to be less sensitive to noise and texture
119 | obj_features_flat = obj_features_flat.double()
120 | (u, s, v) = torch.pca_lowrank(obj_features_flat, center=False)
121 | features_pca = torch.mm(obj_features_flat, v[:, :3])
122 | features_pca = (features_pca - features_pca.min(0)[0]) / (features_pca.max(0)[0] - features_pca.min(0)[0])
123 | X = features_pca
124 | # add feature_pixels as extra dimensions
125 | feature_points_torch = torch.tensor(feature_points, dtype=features_pca.dtype, device=features_pca.device)
126 | feature_points_torch = (feature_points_torch - feature_points_torch.min(0)[0]) / (feature_points_torch.max(0)[0] - feature_points_torch.min(0)[0])
127 | X = torch.cat([X, feature_points_torch], dim=-1)
128 | # cluster features to get meaningful regions
129 | cluster_ids_x, cluster_centers = kmeans(
130 | X=X,
131 | num_clusters=self.config['num_candidates_per_mask'],
132 | distance='euclidean',
133 | device=self.device,
134 | )
135 | cluster_centers = cluster_centers.to(self.device)
136 | for cluster_id in range(self.config['num_candidates_per_mask']):
137 | cluster_center = cluster_centers[cluster_id][:3]
138 | member_idx = cluster_ids_x == cluster_id
139 | member_points = feature_points[member_idx]
140 | member_pixels = feature_pixels[member_idx]
141 | member_features = features_pca[member_idx]
142 | dist = torch.norm(member_features - cluster_center, dim=-1)
143 | closest_idx = torch.argmin(dist)
144 | candidate_keypoints.append(member_points[closest_idx])
145 | candidate_pixels.append(member_pixels[closest_idx])
146 | candidate_rigid_group_ids.append(rigid_group_id)
147 |
148 | candidate_keypoints = np.array(candidate_keypoints)
149 | candidate_pixels = np.array(candidate_pixels)
150 | candidate_rigid_group_ids = np.array(candidate_rigid_group_ids)
151 |
152 | return candidate_keypoints, candidate_pixels, candidate_rigid_group_ids
153 |
154 | def _merge_clusters(self, candidate_keypoints):
155 | self.mean_shift.fit(candidate_keypoints)
156 | cluster_centers = self.mean_shift.cluster_centers_
157 | merged_indices = []
158 | for center in cluster_centers:
159 | dist = np.linalg.norm(candidate_keypoints - center, axis=-1)
160 | merged_indices.append(np.argmin(dist))
161 | return merged_indices
--------------------------------------------------------------------------------
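`KeypointProposer` turns a single RGB-D observation into candidate keypoints: DINOv2 patch features are clustered per object mask, out-of-workspace candidates are filtered, and nearby candidates are merged with mean shift. A minimal sketch of the call, mirroring how `main.py` wires it up (assumes `env` is a constructed `ReKepOGEnv`):

```python
from keypoint_proposal import KeypointProposer
from utils import get_config

global_config = get_config(config_path='./configs/config.yaml')
proposer = KeypointProposer(global_config['keypoint_proposer'])

cam_obs = env.get_cam_obs()
cam_id = global_config['main']['vlm_camera']
keypoints, projected_img = proposer.get_keypoints(
    cam_obs[cam_id]['rgb'],     # [H, W, 3] image
    cam_obs[cam_id]['points'],  # [H, W, 3] world-frame point cloud
    cam_obs[cam_id]['seg'],     # [H, W] semantic segmentation
)
# `keypoints` are 3D world-frame candidates; `projected_img` is the annotated image
# (numbered boxes) that is passed to the VLM for constraint generation.
```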
/main.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import numpy as np
3 | import json
4 | import os
5 | import argparse
6 | from environment import ReKepOGEnv
7 | from keypoint_proposal import KeypointProposer
8 | from constraint_generation import ConstraintGenerator
9 | from ik_solver import IKSolver
10 | from subgoal_solver import SubgoalSolver
11 | from path_solver import PathSolver
12 | from visualizer import Visualizer
13 | import transform_utils as T
14 | from omnigibson.robots.fetch import Fetch
15 | from utils import (
16 | bcolors,
17 | get_config,
18 | load_functions_from_txt,
19 | get_linear_interpolation_steps,
20 | spline_interpolate_poses,
21 | get_callable_grasping_cost_fn,
22 | print_opt_debug_dict,
23 | )
24 |
25 | class Main:
26 | def __init__(self, scene_file, visualize=False):
27 | global_config = get_config(config_path="./configs/config.yaml")
28 | self.config = global_config['main']
29 | self.bounds_min = np.array(self.config['bounds_min'])
30 | self.bounds_max = np.array(self.config['bounds_max'])
31 | self.visualize = visualize
32 | # set random seed
33 | np.random.seed(self.config['seed'])
34 | torch.manual_seed(self.config['seed'])
35 | torch.cuda.manual_seed(self.config['seed'])
36 | # initialize keypoint proposer and constraint generator
37 | self.keypoint_proposer = KeypointProposer(global_config['keypoint_proposer'])
38 | self.constraint_generator = ConstraintGenerator(global_config['constraint_generator'])
39 | # initialize environment
40 | self.env = ReKepOGEnv(global_config['env'], scene_file, verbose=False)
41 | # setup ik solver (for reachability cost)
42 | assert isinstance(self.env.robot, Fetch), "The IK solver assumes the robot is a Fetch robot"
43 | ik_solver = IKSolver(
44 | robot_description_path=self.env.robot.robot_arm_descriptor_yamls[self.env.robot.default_arm],
45 | robot_urdf_path=self.env.robot.urdf_path,
46 | eef_name=self.env.robot.eef_link_names[self.env.robot.default_arm],
47 | reset_joint_pos=self.env.reset_joint_pos,
48 | world2robot_homo=self.env.world2robot_homo,
49 | )
50 | # initialize solvers
51 | self.subgoal_solver = SubgoalSolver(global_config['subgoal_solver'], ik_solver, self.env.reset_joint_pos)
52 | self.path_solver = PathSolver(global_config['path_solver'], ik_solver, self.env.reset_joint_pos)
53 | # initialize visualizer
54 | if self.visualize:
55 | self.visualizer = Visualizer(global_config['visualizer'], self.env)
56 |
57 | def perform_task(self, instruction, rekep_program_dir=None, disturbance_seq=None):
58 | self.env.reset()
59 | cam_obs = self.env.get_cam_obs()
60 | rgb = cam_obs[self.config['vlm_camera']]['rgb']
61 | points = cam_obs[self.config['vlm_camera']]['points']
62 | mask = cam_obs[self.config['vlm_camera']]['seg']
63 | # ====================================
64 | # = keypoint proposal and constraint generation
65 | # ====================================
66 | if rekep_program_dir is None:
67 | keypoints, projected_img = self.keypoint_proposer.get_keypoints(rgb, points, mask)
68 | print(f'{bcolors.HEADER}Got {len(keypoints)} proposed keypoints{bcolors.ENDC}')
69 | if self.visualize:
70 | self.visualizer.show_img(projected_img)
71 | metadata = {'init_keypoint_positions': keypoints, 'num_keypoints': len(keypoints)}
72 | rekep_program_dir = self.constraint_generator.generate(projected_img, instruction, metadata)
73 | print(f'{bcolors.HEADER}Constraints generated{bcolors.ENDC}')
74 | # ====================================
75 | # = execute
76 | # ====================================
77 | self._execute(rekep_program_dir, disturbance_seq)
78 |
79 | def _update_disturbance_seq(self, stage, disturbance_seq):
80 | if disturbance_seq is not None:
81 | if stage in disturbance_seq and not self.applied_disturbance[stage]:
82 |                 # set the disturbance sequence; the generator yields one disturbance call per env step until it is exhausted
83 | self.env.disturbance_seq = disturbance_seq[stage](self.env)
84 | self.applied_disturbance[stage] = True
85 |
86 | def _execute(self, rekep_program_dir, disturbance_seq=None):
87 | # load metadata
88 | with open(os.path.join(rekep_program_dir, 'metadata.json'), 'r') as f:
89 | self.program_info = json.load(f)
90 | self.applied_disturbance = {stage: False for stage in range(1, self.program_info['num_stages'] + 1)}
91 | # register keypoints to be tracked
92 | self.env.register_keypoints(self.program_info['init_keypoint_positions'])
93 | # load constraints
94 | self.constraint_fns = dict()
95 | for stage in range(1, self.program_info['num_stages'] + 1): # stage starts with 1
96 | stage_dict = dict()
97 | for constraint_type in ['subgoal', 'path']:
98 | load_path = os.path.join(rekep_program_dir, f'stage{stage}_{constraint_type}_constraints.txt')
99 | get_grasping_cost_fn = get_callable_grasping_cost_fn(self.env) # special grasping function for VLM to call
100 | stage_dict[constraint_type] = load_functions_from_txt(load_path, get_grasping_cost_fn) if os.path.exists(load_path) else []
101 | self.constraint_fns[stage] = stage_dict
102 |
103 | # bookkeeping of which keypoints can be moved in the optimization
104 | self.keypoint_movable_mask = np.zeros(self.program_info['num_keypoints'] + 1, dtype=bool)
105 | self.keypoint_movable_mask[0] = True # first keypoint is always the ee, so it's movable
106 |
107 | # main loop
108 | self.last_sim_step_counter = -np.inf
109 | self._update_stage(1)
110 | while True:
111 | scene_keypoints = self.env.get_keypoint_positions()
112 | self.keypoints = np.concatenate([[self.env.get_ee_pos()], scene_keypoints], axis=0) # first keypoint is always the ee
113 | self.curr_ee_pose = self.env.get_ee_pose()
114 | self.curr_joint_pos = self.env.get_arm_joint_postions()
115 | self.sdf_voxels = self.env.get_sdf_voxels(self.config['sdf_voxel_size'])
116 | self.collision_points = self.env.get_collision_points()
117 | # ====================================
118 | # = decide whether to backtrack
119 | # ====================================
120 | backtrack = False
121 | if self.stage > 1:
122 | path_constraints = self.constraint_fns[self.stage]['path']
123 | for constraints in path_constraints:
124 | violation = constraints(self.keypoints[0], self.keypoints[1:])
125 | if violation > self.config['constraint_tolerance']:
126 | backtrack = True
127 | break
128 | if backtrack:
129 | # determine which stage to backtrack to based on constraints
130 | for new_stage in range(self.stage - 1, 0, -1):
131 | path_constraints = self.constraint_fns[new_stage]['path']
132 | # if no constraints, we can safely backtrack
133 | if len(path_constraints) == 0:
134 | break
135 | # otherwise, check if all constraints are satisfied
136 | all_constraints_satisfied = True
137 | for constraints in path_constraints:
138 | violation = constraints(self.keypoints[0], self.keypoints[1:])
139 | if violation > self.config['constraint_tolerance']:
140 | all_constraints_satisfied = False
141 | break
142 | if all_constraints_satisfied:
143 | break
144 | print(f"{bcolors.HEADER}[stage={self.stage}] backtrack to stage {new_stage}{bcolors.ENDC}")
145 | self._update_stage(new_stage)
146 | else:
147 | # apply disturbance
148 | self._update_disturbance_seq(self.stage, disturbance_seq)
149 | # ====================================
150 | # = get optimized plan
151 | # ====================================
152 | if self.last_sim_step_counter == self.env.step_counter:
153 | print(f"{bcolors.WARNING}sim did not step forward within last iteration (HINT: adjust action_steps_per_iter to be larger or the pos_threshold to be smaller){bcolors.ENDC}")
154 | next_subgoal = self._get_next_subgoal(from_scratch=self.first_iter)
155 | next_path = self._get_next_path(next_subgoal, from_scratch=self.first_iter)
156 | self.first_iter = False
157 | self.action_queue = next_path.tolist()
158 | self.last_sim_step_counter = self.env.step_counter
159 |
160 | # ====================================
161 | # = execute
162 | # ====================================
163 | # determine if we proceed to the next stage
164 | count = 0
165 | while len(self.action_queue) > 0 and count < self.config['action_steps_per_iter']:
166 | next_action = self.action_queue.pop(0)
167 | precise = len(self.action_queue) == 0
168 | self.env.execute_action(next_action, precise=precise)
169 | count += 1
170 | if len(self.action_queue) == 0:
171 | if self.is_grasp_stage:
172 | self._execute_grasp_action()
173 | elif self.is_release_stage:
174 | self._execute_release_action()
175 | # if completed, save video and return
176 | if self.stage == self.program_info['num_stages']:
177 | self.env.sleep(2.0)
178 | save_path = self.env.save_video()
179 | print(f"{bcolors.OKGREEN}Video saved to {save_path}\n\n{bcolors.ENDC}")
180 | return
181 | # progress to next stage
182 | self._update_stage(self.stage + 1)
183 |
184 | def _get_next_subgoal(self, from_scratch):
185 | subgoal_constraints = self.constraint_fns[self.stage]['subgoal']
186 | path_constraints = self.constraint_fns[self.stage]['path']
187 | subgoal_pose, debug_dict = self.subgoal_solver.solve(self.curr_ee_pose,
188 | self.keypoints,
189 | self.keypoint_movable_mask,
190 | subgoal_constraints,
191 | path_constraints,
192 | self.sdf_voxels,
193 | self.collision_points,
194 | self.is_grasp_stage,
195 | self.curr_joint_pos,
196 | from_scratch=from_scratch)
197 | subgoal_pose_homo = T.convert_pose_quat2mat(subgoal_pose)
198 | # if grasp stage, back up a bit to leave room for grasping
199 | if self.is_grasp_stage:
200 | subgoal_pose[:3] += subgoal_pose_homo[:3, :3] @ np.array([-self.config['grasp_depth'] / 2.0, 0, 0])
201 | debug_dict['stage'] = self.stage
202 | print_opt_debug_dict(debug_dict)
203 | if self.visualize:
204 | self.visualizer.visualize_subgoal(subgoal_pose)
205 | return subgoal_pose
206 |
207 | def _get_next_path(self, next_subgoal, from_scratch):
208 | path_constraints = self.constraint_fns[self.stage]['path']
209 | path, debug_dict = self.path_solver.solve(self.curr_ee_pose,
210 | next_subgoal,
211 | self.keypoints,
212 | self.keypoint_movable_mask,
213 | path_constraints,
214 | self.sdf_voxels,
215 | self.collision_points,
216 | self.curr_joint_pos,
217 | from_scratch=from_scratch)
218 | print_opt_debug_dict(debug_dict)
219 | processed_path = self._process_path(path)
220 | if self.visualize:
221 | self.visualizer.visualize_path(processed_path)
222 | return processed_path
223 |
224 | def _process_path(self, path):
225 | # spline interpolate the path from the current ee pose
226 | full_control_points = np.concatenate([
227 | self.curr_ee_pose.reshape(1, -1),
228 | path,
229 | ], axis=0)
230 | num_steps = get_linear_interpolation_steps(full_control_points[0], full_control_points[-1],
231 | self.config['interpolate_pos_step_size'],
232 | self.config['interpolate_rot_step_size'])
233 | dense_path = spline_interpolate_poses(full_control_points, num_steps)
234 | # add gripper action
235 | ee_action_seq = np.zeros((dense_path.shape[0], 8))
236 | ee_action_seq[:, :7] = dense_path
237 | ee_action_seq[:, 7] = self.env.get_gripper_null_action()
238 | return ee_action_seq
239 |
240 | def _update_stage(self, stage):
241 | # update stage
242 | self.stage = stage
243 | self.is_grasp_stage = self.program_info['grasp_keypoints'][self.stage - 1] != -1
244 | self.is_release_stage = self.program_info['release_keypoints'][self.stage - 1] != -1
245 | # can only be grasp stage or release stage or none
246 | assert self.is_grasp_stage + self.is_release_stage <= 1, "Cannot be both grasp and release stage"
247 | if self.is_grasp_stage: # ensure gripper is open for grasping stage
248 | self.env.open_gripper()
249 | # clear action queue
250 | self.action_queue = []
251 | # update keypoint movable mask
252 | self._update_keypoint_movable_mask()
253 | self.first_iter = True
254 |
255 | def _update_keypoint_movable_mask(self):
256 | for i in range(1, len(self.keypoint_movable_mask)): # first keypoint is ee so always movable
257 | keypoint_object = self.env.get_object_by_keypoint(i - 1)
258 | self.keypoint_movable_mask[i] = self.env.is_grasping(keypoint_object)
259 |
260 | def _execute_grasp_action(self):
261 | pregrasp_pose = self.env.get_ee_pose()
262 | grasp_pose = pregrasp_pose.copy()
263 | grasp_pose[:3] += T.quat2mat(pregrasp_pose[3:]) @ np.array([self.config['grasp_depth'], 0, 0])
264 | grasp_action = np.concatenate([grasp_pose, [self.env.get_gripper_close_action()]])
265 | self.env.execute_action(grasp_action, precise=True)
266 |
267 | def _execute_release_action(self):
268 | self.env.open_gripper()
269 |
270 | if __name__ == "__main__":
271 | parser = argparse.ArgumentParser()
272 | parser.add_argument('--task', type=str, default='pen', help='task to perform')
273 | parser.add_argument('--use_cached_query', action='store_true', help='instead of querying the VLM, use the cached query')
274 | parser.add_argument('--apply_disturbance', action='store_true', help='apply disturbance to test the robustness')
275 | parser.add_argument('--visualize', action='store_true', help='visualize each solution before executing (NOTE: this is blocking and needs to press "ESC" to continue)')
276 | args = parser.parse_args()
277 |
278 | if args.apply_disturbance:
279 | assert args.task == 'pen' and args.use_cached_query, 'disturbance sequence is only defined for cached scenario'
280 |
281 | # ====================================
282 | # = pen task disturbance sequence
283 | # ====================================
284 | def stage1_disturbance_seq(env):
285 | """
286 |         Move the pen in stage 1, when the robot is trying to grasp the pen
287 | """
288 | pen = env.og_env.scene.object_registry("name", "pen_1")
289 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1")
290 | # disturbance sequence
291 | pos0, orn0 = pen.get_position_orientation()
292 | pose0 = np.concatenate([pos0, orn0])
293 | pos1 = pos0 + np.array([-0.08, 0.0, 0.0])
294 | orn1 = T.quat_multiply(T.euler2quat(np.array([0, 0, np.pi/4])), orn0)
295 | pose1 = np.concatenate([pos1, orn1])
296 | pos2 = pos1 + np.array([0.10, 0.0, 0.0])
297 | orn2 = T.quat_multiply(T.euler2quat(np.array([0, 0, -np.pi/2])), orn1)
298 | pose2 = np.concatenate([pos2, orn2])
299 | control_points = np.array([pose0, pose1, pose2])
300 | pose_seq = spline_interpolate_poses(control_points, num_steps=25)
301 | def disturbance(counter):
302 | if counter < len(pose_seq):
303 | pose = pose_seq[counter]
304 | pos, orn = pose[:3], pose[3:]
305 | pen.set_position_orientation(pos, orn)
306 | counter += 1
307 | counter = 0
308 | while True:
309 | yield disturbance(counter)
310 | counter += 1
311 |
312 | def stage2_disturbance_seq(env):
313 | """
314 |         Take the pen out of the gripper in stage 2, when the robot is trying to reorient the pen
315 | """
316 | apply_disturbance = env.is_grasping()
317 | pen = env.og_env.scene.object_registry("name", "pen_1")
318 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1")
319 | # disturbance sequence
320 | pos0, orn0 = pen.get_position_orientation()
321 | pose0 = np.concatenate([pos0, orn0])
322 | pose1 = np.array([-0.30, -0.15, 0.71, -0.7071068, 0, 0, 0.7071068])
323 | control_points = np.array([pose0, pose1])
324 | pose_seq = spline_interpolate_poses(control_points, num_steps=25)
325 | def disturbance(counter):
326 | if apply_disturbance:
327 | if counter < 20:
328 | if counter > 15:
329 | env.robot.release_grasp_immediately() # force robot to release the pen
330 | else:
331 | pass # do nothing for the other steps
332 | elif counter < len(pose_seq) + 20:
333 | env.robot.release_grasp_immediately() # force robot to release the pen
334 | pose = pose_seq[counter - 20]
335 | pos, orn = pose[:3], pose[3:]
336 | pen.set_position_orientation(pos, orn)
337 | counter += 1
338 | counter = 0
339 | while True:
340 | yield disturbance(counter)
341 | counter += 1
342 |
343 | def stage3_disturbance_seq(env):
344 | """
345 |         Move the holder in stage 3, when the robot is trying to drop the pen into the holder
346 | """
347 | pen = env.og_env.scene.object_registry("name", "pen_1")
348 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1")
349 | # disturbance sequence
350 | pos0, orn0 = holder.get_position_orientation()
351 | pose0 = np.concatenate([pos0, orn0])
352 | pos1 = pos0 + np.array([-0.02, -0.15, 0.0])
353 | orn1 = orn0
354 | pose1 = np.concatenate([pos1, orn1])
355 | control_points = np.array([pose0, pose1])
356 | pose_seq = spline_interpolate_poses(control_points, num_steps=5)
357 | def disturbance(counter):
358 | if counter < len(pose_seq):
359 | pose = pose_seq[counter]
360 | pos, orn = pose[:3], pose[3:]
361 | holder.set_position_orientation(pos, orn)
362 | counter += 1
363 | counter = 0
364 | while True:
365 | yield disturbance(counter)
366 | counter += 1
367 |
368 | task_list = {
369 | 'pen': {
370 | 'scene_file': './configs/og_scene_file_pen.json',
371 | 'instruction': 'reorient the white pen and drop it upright into the black pen holder',
372 | 'rekep_program_dir': './vlm_query/pen',
373 | 'disturbance_seq': {1: stage1_disturbance_seq, 2: stage2_disturbance_seq, 3: stage3_disturbance_seq},
374 | },
375 | }
376 | task = task_list['pen']
377 | scene_file = task['scene_file']
378 | instruction = task['instruction']
379 | main = Main(scene_file, visualize=args.visualize)
380 | main.perform_task(instruction,
381 | rekep_program_dir=task['rekep_program_dir'] if args.use_cached_query else None,
382 | disturbance_seq=task.get('disturbance_seq', None) if args.apply_disturbance else None)
--------------------------------------------------------------------------------
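Each disturbance sequence above is a generator that `ReKepOGEnv._step` advances once per simulation step, so new disturbances can be added by following the same pattern. A minimal sketch of a hypothetical `my_disturbance_seq` (not part of the repo); the object name `pen_1` follows the cached pen scene:

```python
import numpy as np

def my_disturbance_seq(env):
    pen = env.og_env.scene.object_registry("name", "pen_1")
    counter = 0
    while True:
        if counter == 30:
            # nudge the pen 2 cm along x, 30 simulation steps into the stage
            pos, orn = pen.get_position_orientation()
            pen.set_position_orientation(pos + np.array([0.02, 0.0, 0.0]), orn)
        counter += 1
        yield
```

It would be registered per stage, e.g. `'disturbance_seq': {1: my_disturbance_seq}` in `task_list`, and is only applied when running with `--apply_disturbance --use_cached_query`.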
/media/pen-in-holder-disturbances.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huangwl18/ReKep/63c43fdba60354980258beaeb8a7d48e088e1e3e/media/pen-in-holder-disturbances.gif
--------------------------------------------------------------------------------
/media/pen-in-holder.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huangwl18/ReKep/63c43fdba60354980258beaeb8a7d48e088e1e3e/media/pen-in-holder.gif
--------------------------------------------------------------------------------
/og_utils.py:
--------------------------------------------------------------------------------
1 | from omnigibson.sensors.vision_sensor import VisionSensor
2 | import transform_utils as T
3 | import numpy as np
4 |
5 | class OGCamera:
6 | """
7 | Defines the camera class
8 | """
9 | def __init__(self, og_env, config) -> None:
10 | self.cam = insert_camera(name=config['name'], og_env=og_env, width=config['resolution'], height=config['resolution'])
11 | self.cam.set_position_orientation(config['position'], config['orientation'])
12 | self.intrinsics = get_cam_intrinsics(self.cam)
13 | self.extrinsics = get_cam_extrinsics(self.cam)
14 |
15 | def get_params(self):
16 | """
17 | Get the intrinsic and extrinsic parameters of the camera
18 | """
19 | return {"intrinsics": self.intrinsics, "extrinsics": self.extrinsics}
20 |
21 | def get_obs(self):
22 | """
23 | Gets the image observation from the camera.
24 |         Assumes the scene has been rendered before calling this function.
25 | No semantic handling here for now.
26 | """
27 | obs = self.cam.get_obs()
28 | ret = {}
29 | ret["rgb"] = obs[0]["rgb"][:,:,:3] # H, W, 3
30 | ret["depth"] = obs[0]["depth_linear"] # H, W
31 | ret["points"] = pixel_to_3d_points(ret["depth"], self.intrinsics, self.extrinsics) # H, W, 3
32 | ret["seg"] = obs[0]["seg_semantic"] # H, W
33 | ret["intrinsic"] = self.intrinsics
34 | ret["extrinsic"] = self.extrinsics
35 | return ret
36 |
37 | def insert_camera(name, og_env, width=480, height=480):
38 | try:
39 | cam = VisionSensor(
40 | prim_path=f"/World/{name}",
41 | name=name,
42 | image_width=width,
43 | image_height=height,
44 | modalities=["rgb", "depth_linear", "seg_semantic"]
45 | )
46 | except TypeError:
47 | cam = VisionSensor(
48 | relative_prim_path=f"/{name}",
49 | name=name,
50 | image_width=width,
51 | image_height=height,
52 | modalities=["rgb", "depth_linear", "seg_semantic"]
53 | )
54 |
55 | try:
56 | cam.load()
57 | except TypeError:
58 | cam.load(og_env.scene)
59 | cam.initialize()
60 | return cam
61 |
62 | def get_cam_intrinsics(cam):
63 | """
64 | Get the intrinsics matrix for a VisionSensor object
65 | ::param cam: VisionSensor object
66 | ::return intrinsics: 3x3 numpy array
67 | """
68 | img_width = cam.image_width
69 | img_height = cam.image_height
70 |
71 | if img_width != img_height:
72 | raise ValueError("Only square images are supported")
73 |
74 | apert = cam.prim.GetAttribute("horizontalAperture").Get()
75 | focal_len_in_pixel = cam.focal_length * img_width / apert
76 |
77 | intrinsics = np.eye(3)
78 | intrinsics[0,0] = focal_len_in_pixel
79 | intrinsics[1,1] = focal_len_in_pixel
80 | intrinsics[0,2] = img_width / 2
81 | intrinsics[1,2] = img_height / 2
82 |
83 | return intrinsics
84 |
85 | def get_cam_extrinsics(cam):
86 | return T.pose_inv(T.pose2mat(cam.get_position_orientation()))
87 |
88 | def pixel_to_3d_points(depth_image, intrinsics, extrinsics):
89 | # Get the shape of the depth image
90 | H, W = depth_image.shape
91 |
92 | # Create a grid of (x, y) coordinates corresponding to each pixel in the image
93 | i, j = np.meshgrid(np.arange(W), np.arange(H), indexing='xy')
94 |
95 | # Unpack the intrinsic parameters
96 | fx, fy = intrinsics[0, 0], intrinsics[1, 1]
97 | cx, cy = intrinsics[0, 2], intrinsics[1, 2]
98 |
99 | # Convert pixel coordinates to normalized camera coordinates
100 | z = depth_image
101 | x = (i - cx) * z / fx
102 | y = (j - cy) * z / fy
103 |
104 | # Stack the coordinates to form (H, W, 3)
105 | camera_coordinates = np.stack((x, y, z), axis=-1)
106 |
107 | # Reshape to (H*W, 3) for matrix multiplication
108 | camera_coordinates = camera_coordinates.reshape(-1, 3)
109 |
110 | # Convert to homogeneous coordinates (H*W, 4)
111 | camera_coordinates_homogeneous = np.hstack((camera_coordinates, np.ones((camera_coordinates.shape[0], 1))))
112 |
113 | # additional conversion to og convention
114 | T_mod = np.array([[1., 0., 0., 0., ],
115 | [0., -1., 0., 0.,],
116 | [0., 0., -1., 0.,],
117 | [0., 0., 0., 1.,]])
118 | camera_coordinates_homogeneous = camera_coordinates_homogeneous @ T_mod
119 |
120 | # Apply extrinsics to get world coordinates
121 | # world_coordinates_homogeneous = camera_coordinates_homogeneous @ extrinsics.T
122 | world_coordinates_homogeneous = T.pose_inv(extrinsics) @ (camera_coordinates_homogeneous.T)
123 | world_coordinates_homogeneous = world_coordinates_homogeneous.T
124 |
125 | # Convert back to non-homogeneous coordinates
126 | world_coordinates = world_coordinates_homogeneous[:, :3] / world_coordinates_homogeneous[:, 3, np.newaxis]
127 |
128 | # Reshape back to (H, W, 3)
129 | world_coordinates = world_coordinates.reshape(H, W, 3)
130 |
131 | return world_coordinates
132 |
133 | def point_to_pixel(pt, intrinsics, extrinsics):
134 | """
135 | pt -- (N, 3) 3d points in world frame
136 | intrinsics -- (3, 3) intrinsics matrix
137 | extrinsics -- (4, 4) extrinsics matrix
138 | """
139 | pt_in_cam = extrinsics @ np.hstack((pt, np.ones((pt.shape[0], 1)))).T # (4, N)
140 | # multiply y, z by -1
141 | pt_in_cam[1, :] *= -1
142 | pt_in_cam[2, :] *= -1
143 | pt_in_cam = pt_in_cam[:3, :]
144 | pt_in_cam = intrinsics @ pt_in_cam
145 | pt_in_cam /= pt_in_cam[2, :]
146 |
147 | return pt_in_cam[:2, :].T
--------------------------------------------------------------------------------
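`pixel_to_3d_points` and `point_to_pixel` are (approximately) inverse operations under the same intrinsics and extrinsics, including the axis flip into the OmniGibson camera convention. A minimal round-trip sketch with synthetic camera parameters (no `VisionSensor` is instantiated, although importing `og_utils` still requires OmniGibson):

```python
import numpy as np
from og_utils import pixel_to_3d_points, point_to_pixel

H = W = 4
intrinsics = np.array([[200.0,   0.0, W / 2],
                       [  0.0, 200.0, H / 2],
                       [  0.0,   0.0,   1.0]])
extrinsics = np.eye(4)            # camera frame coincides with the world frame
depth = np.full((H, W), 2.0)      # constant 2 m linear depth

points = pixel_to_3d_points(depth, intrinsics, extrinsics)              # (H, W, 3) world points
pixels = point_to_pixel(points.reshape(-1, 3), intrinsics, extrinsics)  # (H*W, 2)
# `pixels` should recover the original (column, row) pixel grid up to floating point error.
```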
/path_solver.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.optimize import dual_annealing, minimize
3 | from scipy.interpolate import RegularGridInterpolator
4 | import copy
5 | import time
6 | import transform_utils as T
7 | from utils import (
8 | farthest_point_sampling,
9 | get_linear_interpolation_steps,
10 | linear_interpolate_poses,
11 | normalize_vars,
12 | unnormalize_vars,
13 | get_samples_jitted,
14 | calculate_collision_cost,
15 | path_length,
16 | transform_keypoints,
17 | )
18 |
19 | # ====================================
20 | # = objective function
21 | # ====================================
22 | def objective(opt_vars,
23 | og_bounds,
24 | start_pose,
25 | end_pose,
26 | keypoints_centered,
27 | keypoint_movable_mask,
28 | path_constraints,
29 | sdf_func,
30 | collision_points_centered,
31 | opt_interpolate_pos_step_size,
32 | opt_interpolate_rot_step_size,
33 | ik_solver,
34 | initial_joint_pos,
35 | reset_joint_pos,
36 | return_debug_dict=False):
37 |
38 | debug_dict = {}
39 | debug_dict['num_control_points'] = len(opt_vars) // 6
40 |
41 | # unnormalize variables and do conversion
42 | unnormalized_opt_vars = unnormalize_vars(opt_vars, og_bounds)
43 | control_points_euler = np.concatenate([start_pose[None], unnormalized_opt_vars.reshape(-1, 6), end_pose[None]], axis=0) # [num_control_points, 6]
44 | control_points_homo = T.convert_pose_euler2mat(control_points_euler) # [num_control_points, 4, 4]
45 | control_points_quat = T.convert_pose_mat2quat(control_points_homo) # [num_control_points, 7]
46 | # get dense samples
47 | poses_quat, num_poses = get_samples_jitted(control_points_homo, control_points_quat, opt_interpolate_pos_step_size, opt_interpolate_rot_step_size)
48 | poses_homo = T.convert_pose_quat2mat(poses_quat)
49 | debug_dict['num_poses'] = num_poses
50 | start_idx, end_idx = 1, num_poses - 1 # exclude start and goal
51 |
52 | cost = 0
53 | # collision cost
54 | if collision_points_centered is not None:
55 | collision_cost = 0.5 * calculate_collision_cost(poses_homo[start_idx:end_idx], sdf_func, collision_points_centered, 0.20)
56 | debug_dict['collision_cost'] = collision_cost
57 | cost += collision_cost
58 |
59 | # penalize path length
60 | pos_length, rot_length = path_length(poses_homo)
61 | approx_length = pos_length + rot_length * 1.0
62 | path_length_cost = 4.0 * approx_length
63 | debug_dict['path_length_cost'] = path_length_cost
64 | cost += path_length_cost
65 |
66 | # reachability cost
67 | ik_cost = 0
68 | reset_reg_cost = 0
69 | debug_dict['ik_pos_error'] = []
70 | debug_dict['ik_feasible'] = []
71 | max_iterations = 20
72 | for control_point_homo in control_points_homo:
73 | ik_result = ik_solver.solve(
74 | control_point_homo,
75 | max_iterations=max_iterations,
76 | initial_joint_pos=initial_joint_pos,
77 | )
78 | debug_dict['ik_pos_error'].append(ik_result.position_error)
79 | debug_dict['ik_feasible'].append(ik_result.success)
80 | ik_cost += 20.0 * (ik_result.num_descents / max_iterations)
81 | if ik_result.success:
82 | reset_reg = np.linalg.norm(ik_result.cspace_position[:-1] - reset_joint_pos[:-1])
83 | reset_reg = np.clip(reset_reg, 0.0, 3.0)
84 | else:
85 | reset_reg = 3.0
86 | reset_reg_cost += 0.2 * reset_reg
87 | debug_dict['ik_pos_error'] = np.array(debug_dict['ik_pos_error'])
88 | debug_dict['ik_feasible'] = np.array(debug_dict['ik_feasible'])
89 | debug_dict['ik_cost'] = ik_cost
90 | debug_dict['reset_reg_cost'] = reset_reg_cost
91 | cost += ik_cost
92 |
93 |     # path constraint violation cost
94 | debug_dict['path_violation'] = None
95 | if path_constraints is not None and len(path_constraints) > 0:
96 | path_constraint_cost = 0
97 | path_violation = []
98 | for pose in poses_homo[start_idx:end_idx]:
99 | transformed_keypoints = transform_keypoints(pose, keypoints_centered, keypoint_movable_mask)
100 | for constraint in path_constraints:
101 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:])
102 | path_violation.append(violation)
103 | path_constraint_cost += np.clip(violation, 0, np.inf)
104 | path_constraint_cost = 200.0*path_constraint_cost
105 | debug_dict['path_constraint_cost'] = path_constraint_cost
106 | debug_dict['path_violation'] = path_violation
107 | cost += path_constraint_cost
108 |
109 | debug_dict['total_cost'] = cost
110 |
111 | if return_debug_dict:
112 | return cost, debug_dict
113 |
114 | return cost
115 |
116 |
117 | class PathSolver:
118 | """
119 | Given a goal pose and a start pose, solve for a sequence of intermediate poses for the end effector to follow.
120 |
121 | Optimization variables:
122 | - sequence of intermediate control points
123 | """
124 |
125 | def __init__(self, config, ik_solver, reset_joint_pos):
126 | self.config = config
127 | self.ik_solver = ik_solver
128 | self.reset_joint_pos = reset_joint_pos
129 | self.last_opt_result = None
130 | # warmup
131 | self._warmup()
132 |
133 | def _warmup(self):
134 | start_pose = np.array([0.0, 0.0, 0.3, 0, 0, 0, 1])
135 | end_pose = np.array([0.0, 0.0, 0.0, 0, 0, 0, 1])
136 | keypoints = np.random.rand(10, 3)
137 | keypoint_movable_mask = np.random.rand(10) > 0.5
138 | path_constraints = []
139 | sdf_voxels = np.zeros((10, 10, 10))
140 | collision_points = np.random.rand(100, 3)
141 | self.solve(start_pose, end_pose, keypoints, keypoint_movable_mask, path_constraints, sdf_voxels, collision_points, None, from_scratch=True)
142 | self.last_opt_result = None
143 |
144 | def _setup_sdf(self, sdf_voxels):
145 | # create callable sdf function with interpolation
146 | x = np.linspace(self.config['bounds_min'][0], self.config['bounds_max'][0], sdf_voxels.shape[0])
147 | y = np.linspace(self.config['bounds_min'][1], self.config['bounds_max'][1], sdf_voxels.shape[1])
148 | z = np.linspace(self.config['bounds_min'][2], self.config['bounds_max'][2], sdf_voxels.shape[2])
149 | sdf_func = RegularGridInterpolator((x, y, z), sdf_voxels, bounds_error=False, fill_value=0)
150 | return sdf_func
151 |
152 | def _check_opt_result(self, opt_result, path_quat, debug_dict, og_bounds):
153 |         # accept the opt_result if it was terminated only due to the iteration limit
154 | if (not opt_result.success and ('maximum' in opt_result.message.lower() or 'iteration' in opt_result.message.lower() or 'not necessarily' in opt_result.message.lower())):
155 | opt_result.success = True
156 | elif not opt_result.success:
157 | opt_result.message += '; invalid solution'
158 | # check whether path constraints are satisfied
159 | if debug_dict['path_violation'] is not None:
160 | path_violation = np.array(debug_dict['path_violation'])
161 | opt_result.message += f'; path_violation: {path_violation}'
162 | path_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in path_violation])
163 | if not path_constraints_satisfied:
164 | opt_result.success = False
165 | opt_result.message += f'; path constraint not satisfied'
166 | return opt_result
167 |
168 | def _center_collision_points_and_keypoints(self, ee_pose, collision_points, keypoints, keypoint_movable_mask):
169 | ee_pose_homo = T.pose2mat([ee_pose[:3], T.euler2quat(ee_pose[3:])])
170 | centering_transform = np.linalg.inv(ee_pose_homo)
171 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
172 | keypoints_centered = transform_keypoints(centering_transform, keypoints, keypoint_movable_mask)
173 | return collision_points_centered, keypoints_centered
174 |
175 | def solve(self,
176 | start_pose,
177 | end_pose,
178 | keypoints,
179 | keypoint_movable_mask,
180 | path_constraints,
181 | sdf_voxels,
182 | collision_points,
183 | initial_joint_pos,
184 | from_scratch=False):
185 | """
186 | Args:
187 | - start_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw]
188 | - end_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw]
189 | - keypoints (np.ndarray): [num_keypoints, 3]
190 |         - keypoint_movable_mask (np.ndarray): [num_keypoints] boolean mask indicating whether each keypoint is on the object being grasped
191 | - path_constraints (List[Callable]): path constraints
192 | - sdf_voxels (np.ndarray): [H, W, D]
193 | - collision_points (np.ndarray): [num_points, 3], point cloud of the object being grasped
194 | - initial_joint_pos (np.ndarray): [N] initial joint positions of the robot.
195 | - from_scratch (bool): whether to start from scratch
196 |
197 | Returns:
198 |         - poses_quat (np.ndarray): [num_control_points - 1, 7] optimized control points followed by end_pose (each [x, y, z, qx, qy, qz, qw])
199 | - debug_dict (dict): debug information
200 | """
201 | # downsample collision points
202 | if collision_points is not None and collision_points.shape[0] > self.config['max_collision_points']:
203 | collision_points = farthest_point_sampling(collision_points, self.config['max_collision_points'])
204 | sdf_func = self._setup_sdf(sdf_voxels)
205 |
206 | # ====================================
207 | # = setup bounds
208 | # ====================================
209 | # calculate an appropriate number of control points, including start and goal
210 | num_control_points = get_linear_interpolation_steps(start_pose, end_pose, self.config['opt_pos_step_size'], self.config['opt_rot_step_size'])
211 | num_control_points = np.clip(num_control_points, 3, 6)
212 | # transform to euler representation
213 | start_pose = np.concatenate([start_pose[:3], T.quat2euler(start_pose[3:])])
214 | end_pose = np.concatenate([end_pose[:3], T.quat2euler(end_pose[3:])])
215 |
216 | # bounds for decision variables
217 | og_bounds = [(b_min, b_max) for b_min, b_max in zip(self.config['bounds_min'], self.config['bounds_max'])] + \
218 | [(-np.pi, np.pi) for _ in range(3)]
219 | og_bounds *= (num_control_points - 2)
220 | og_bounds = np.array(og_bounds, dtype=np.float64)
221 | bounds = [(-1, 1)] * len(og_bounds)
222 | num_vars = len(bounds)
223 |
224 | # ====================================
225 | # = setup initial guess
226 | # ====================================
227 | # use previous opt_result as initial guess if available
228 | if not from_scratch and self.last_opt_result is not None:
229 | init_sol = self.last_opt_result.x
230 | # if there are more control points in this iter, fill the rest with the last value + small noise
231 | if len(init_sol) < num_vars:
232 | new_x0 = np.empty(num_vars)
233 | new_x0[:len(init_sol)] = init_sol
234 | for i in range(len(init_sol), num_vars, 6):
235 | new_x0[i:i+6] = init_sol[-6:] + np.random.randn(6) * 0.01
236 | init_sol = new_x0
237 | # otherwise, use the last num_vars values
238 | else:
239 | init_sol = init_sol[-num_vars:]
240 | # initial guess as linear interpolation
241 | else:
242 | from_scratch = True
243 | interp_poses = linear_interpolate_poses(start_pose, end_pose, num_control_points) # [num_control_points, 6]
244 | init_sol = interp_poses[1:-1].flatten() # [num_control_points-2, 6]
245 | init_sol = normalize_vars(init_sol, og_bounds)
246 |
247 | # clip the initial guess to be within bounds
248 | for i, (b_min, b_max) in enumerate(bounds):
249 | init_sol[i] = np.clip(init_sol[i], b_min, b_max)
250 |
251 | # ====================================
252 | # = other setup
253 | # ====================================
254 | collision_points_centered, keypoints_centered = self._center_collision_points_and_keypoints(start_pose, collision_points, keypoints, keypoint_movable_mask)
255 | aux_args = (og_bounds,
256 | start_pose,
257 | end_pose,
258 | keypoints_centered,
259 | keypoint_movable_mask,
260 | path_constraints,
261 | sdf_func,
262 | collision_points_centered,
263 | self.config['opt_interpolate_pos_step_size'],
264 | self.config['opt_interpolate_rot_step_size'],
265 | self.ik_solver,
266 | initial_joint_pos,
267 | self.reset_joint_pos)
268 |
269 | # ====================================
270 | # = solve optimization
271 | # ====================================
272 | start = time.time()
273 | # use global optimization for the first iteration
274 | if from_scratch:
275 | opt_result = dual_annealing(
276 | func=objective,
277 | bounds=bounds,
278 | args=aux_args,
279 | maxfun=self.config['sampling_maxfun'],
280 | x0=init_sol,
281 | no_local_search=True,
282 | minimizer_kwargs={
283 | 'method': 'SLSQP',
284 | 'options': self.config['minimizer_options'],
285 | },
286 | )
287 | # use gradient-based local optimization for the following iterations
288 | else:
289 | opt_result = minimize(
290 | fun=objective,
291 | x0=init_sol,
292 | args=aux_args,
293 | bounds=bounds,
294 | method='SLSQP',
295 | options=self.config['minimizer_options'],
296 | )
297 | solve_time = time.time() - start
298 |
299 | # ====================================
300 | # = post-process opt_result
301 | # ====================================
302 | if isinstance(opt_result.message, list):
303 | opt_result.message = opt_result.message[0]
304 | # rerun to get debug info
305 | _, debug_dict = objective(opt_result.x, *aux_args, return_debug_dict=True)
306 | debug_dict['sol'] = opt_result.x.reshape(-1, 6)
307 | debug_dict['msg'] = opt_result.message
308 | debug_dict['solve_time'] = solve_time
309 | debug_dict['from_scratch'] = from_scratch
310 | debug_dict['type'] = 'path_solver'
311 |         # unnormalize the solution
312 | sol = unnormalize_vars(opt_result.x, og_bounds)
313 | # add end pose
314 | poses_euler = np.concatenate([sol.reshape(-1, 6), end_pose[None]], axis=0)
315 | poses_quat = T.convert_pose_euler2quat(poses_euler) # [num_control_points, 7]
316 | opt_result = self._check_opt_result(opt_result, poses_quat, debug_dict, og_bounds)
317 | # cache opt_result for future use if successful
318 | if opt_result.success:
319 | self.last_opt_result = copy.deepcopy(opt_result)
320 | return poses_quat, debug_dict
--------------------------------------------------------------------------------
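`PathSolver.solve` optimizes only the intermediate control points (the start and end poses are fixed), using dual annealing on the first call and warm-started SLSQP afterwards. A minimal sketch of a call, mirroring `Main._get_next_path`; `global_config`, `ik_solver`, `env`, `next_subgoal`, and the per-iteration quantities (`keypoints`, `movable_mask`, `constraint_fns`, `sdf_voxels`, `collision_points`) are assumed to be set up as in `main.py`:

```python
from path_solver import PathSolver

path_solver = PathSolver(global_config['path_solver'], ik_solver, env.reset_joint_pos)

path, debug_dict = path_solver.solve(
    start_pose=env.get_ee_pose(),        # [7] current EE pose (xyz + xyzw quaternion)
    end_pose=next_subgoal,               # [7] subgoal returned by the SubgoalSolver
    keypoints=keypoints,                 # [num_keypoints + 1, 3], EE keypoint first
    keypoint_movable_mask=movable_mask,  # [num_keypoints + 1] bool
    path_constraints=constraint_fns,     # callables of (ee_keypoint, other_keypoints)
    sdf_voxels=sdf_voxels,
    collision_points=collision_points,
    initial_joint_pos=env.get_arm_joint_postions(),
    from_scratch=True,                   # global search on the first call
)
# `path` is a sequence of [7] quaternion poses ending at the subgoal; `debug_dict`
# carries the per-term costs that print_opt_debug_dict reports.
```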
/requirements.txt:
--------------------------------------------------------------------------------
1 | open3d==0.16.0
2 | omegaconf
3 | opencv-python
4 | hydra-core
5 | openai
6 | torch
7 | kmeans_pytorch
8 | parse
--------------------------------------------------------------------------------
/subgoal_solver.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import time
3 | import copy
4 | from scipy.optimize import dual_annealing, minimize
5 | from scipy.interpolate import RegularGridInterpolator
6 | import transform_utils as T
7 | from utils import (
8 | transform_keypoints,
9 | calculate_collision_cost,
10 | normalize_vars,
11 | unnormalize_vars,
12 | farthest_point_sampling,
13 | consistency,
14 | )
15 | def objective(opt_vars,
16 | og_bounds,
17 | keypoints_centered,
18 | keypoint_movable_mask,
19 | goal_constraints,
20 | path_constraints,
21 | sdf_func,
22 | collision_points_centered,
23 | init_pose_homo,
24 | ik_solver,
25 | initial_joint_pos,
26 | reset_joint_pos,
27 | is_grasp_stage,
28 | return_debug_dict=False):
29 |
30 | debug_dict = {}
31 | # unnormalize variables and do conversion
32 | opt_pose = unnormalize_vars(opt_vars, og_bounds)
33 | opt_pose_homo = T.pose2mat([opt_pose[:3], T.euler2quat(opt_pose[3:])])
34 |
35 | cost = 0
36 | # collision cost
37 | if collision_points_centered is not None:
38 | collision_cost = 0.8 * calculate_collision_cost(opt_pose_homo[None], sdf_func, collision_points_centered, 0.10)
39 | debug_dict['collision_cost'] = collision_cost
40 | cost += collision_cost
41 |
42 | # stay close to initial pose
43 | init_pose_cost = 1.0 * consistency(opt_pose_homo[None], init_pose_homo[None], rot_weight=1.5)
44 | debug_dict['init_pose_cost'] = init_pose_cost
45 | cost += init_pose_cost
46 |
47 | # reachability cost (approximated by number of IK iterations + regularization from reset joint pos)
48 | max_iterations = 20
49 | ik_result = ik_solver.solve(
50 | opt_pose_homo,
51 | max_iterations=max_iterations,
52 | initial_joint_pos=initial_joint_pos,
53 | )
54 | ik_cost = 20.0 * (ik_result.num_descents / max_iterations)
55 | debug_dict['ik_feasible'] = ik_result.success
56 | debug_dict['ik_pos_error'] = ik_result.position_error
57 | debug_dict['ik_cost'] = ik_cost
58 | cost += ik_cost
59 | if ik_result.success:
60 | reset_reg = np.linalg.norm(ik_result.cspace_position[:-1] - reset_joint_pos[:-1])
61 | reset_reg = np.clip(reset_reg, 0.0, 3.0)
62 | else:
63 | reset_reg = 3.0
64 | reset_reg_cost = 0.2 * reset_reg
65 | debug_dict['reset_reg_cost'] = reset_reg_cost
66 | cost += reset_reg_cost
67 |
68 | # grasp metric (better performance if using anygrasp or force-based grasp metrics)
69 | if is_grasp_stage:
70 | preferred_dir = np.array([0, 0, -1])
71 |         grasp_cost = -np.dot(opt_pose_homo[:3, 0], preferred_dir) + 1 # in [0, 2]
72 | grasp_cost = 10.0 * grasp_cost
73 | debug_dict['grasp_cost'] = grasp_cost
74 | cost += grasp_cost
75 |
76 | # goal constraint violation cost
77 | debug_dict['subgoal_constraint_cost'] = None
78 | debug_dict['subgoal_violation'] = None
79 | if goal_constraints is not None and len(goal_constraints) > 0:
80 | subgoal_constraint_cost = 0
81 | transformed_keypoints = transform_keypoints(opt_pose_homo, keypoints_centered, keypoint_movable_mask)
82 | subgoal_violation = []
83 | for constraint in goal_constraints:
84 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:])
85 | subgoal_violation.append(violation)
86 | subgoal_constraint_cost += np.clip(violation, 0, np.inf)
87 | subgoal_constraint_cost = 200.0*subgoal_constraint_cost
88 | debug_dict['subgoal_constraint_cost'] = subgoal_constraint_cost
89 | debug_dict['subgoal_violation'] = subgoal_violation
90 | cost += subgoal_constraint_cost
91 |
92 | # path constraint violation cost
93 | debug_dict['path_violation'] = None
94 | if path_constraints is not None and len(path_constraints) > 0:
95 | path_constraint_cost = 0
96 | transformed_keypoints = transform_keypoints(opt_pose_homo, keypoints_centered, keypoint_movable_mask)
97 | path_violation = []
98 | for constraint in path_constraints:
99 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:])
100 | path_violation.append(violation)
101 | path_constraint_cost += np.clip(violation, 0, np.inf)
102 | path_constraint_cost = 200.0*path_constraint_cost
103 | debug_dict['path_constraint_cost'] = path_constraint_cost
104 | debug_dict['path_violation'] = path_violation
105 | cost += path_constraint_cost
106 |
107 | debug_dict['total_cost'] = cost
108 |
109 | if return_debug_dict:
110 | return cost, debug_dict
111 |
112 | return cost
113 |
114 |
115 | class SubgoalSolver:
116 | def __init__(self, config, ik_solver, reset_joint_pos):
117 | self.config = config
118 | self.ik_solver = ik_solver
119 | self.reset_joint_pos = reset_joint_pos
120 | self.last_opt_result = None
121 | # warmup
122 | self._warmup()
123 |
124 | def _warmup(self):
125 | ee_pose = np.array([0.0, 0.0, 0.0, 0, 0, 0, 1])
126 | keypoints = np.random.rand(10, 3)
127 | keypoint_movable_mask = np.random.rand(10) > 0.5
128 | goal_constraints = []
129 | path_constraints = []
130 | sdf_voxels = np.zeros((10, 10, 10))
131 | collision_points = np.random.rand(100, 3)
132 | self.solve(ee_pose, keypoints, keypoint_movable_mask, goal_constraints, path_constraints, sdf_voxels, collision_points, True, None, from_scratch=True)
133 | self.last_opt_result = None
134 |
135 | def _setup_sdf(self, sdf_voxels):
136 | # create callable sdf function with interpolation
137 | x = np.linspace(self.config['bounds_min'][0], self.config['bounds_max'][0], sdf_voxels.shape[0])
138 | y = np.linspace(self.config['bounds_min'][1], self.config['bounds_max'][1], sdf_voxels.shape[1])
139 | z = np.linspace(self.config['bounds_min'][2], self.config['bounds_max'][2], sdf_voxels.shape[2])
140 | sdf_func = RegularGridInterpolator((x, y, z), sdf_voxels, bounds_error=False, fill_value=0)
141 | return sdf_func
142 |
143 | def _check_opt_result(self, opt_result, debug_dict):
144 | # accept the opt_result if it's only terminated due to iteration limit
145 | if (not opt_result.success and ('maximum' in opt_result.message.lower() or 'iteration' in opt_result.message.lower() or 'not necessarily' in opt_result.message.lower())):
146 | opt_result.success = True
147 | elif not opt_result.success:
148 | opt_result.message += '; invalid solution'
149 | # check whether goal constraints are satisfied
150 | if debug_dict['subgoal_violation'] is not None:
151 | goal_constraints_results = np.array(debug_dict['subgoal_violation'])
152 | opt_result.message += f'; goal_constraints_results: {goal_constraints_results} (higher is worse)'
153 | goal_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in goal_constraints_results])
154 | if not goal_constraints_satisfied:
155 | opt_result.success = False
156 | opt_result.message += f'; goal not satisfied'
157 | # check whether path constraints are satisfied
158 | if debug_dict['path_violation'] is not None:
159 | path_constraints_results = np.array(debug_dict['path_violation'])
160 | opt_result.message += f'; path_constraints_results: {path_constraints_results}'
161 | path_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in path_constraints_results])
162 | if not path_constraints_satisfied:
163 | opt_result.success = False
164 | opt_result.message += f'; path not satisfied'
165 | # check whether ik is feasible
166 | if 'ik_feasible' in debug_dict and not debug_dict['ik_feasible']:
167 | opt_result.success = False
168 | opt_result.message += f'; ik not feasible'
169 | return opt_result
170 |
171 | def _center_collision_points_and_keypoints(self, ee_pose_homo, collision_points, keypoints, keypoint_movable_mask):
172 | centering_transform = np.linalg.inv(ee_pose_homo)
173 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
174 | keypoints_centered = transform_keypoints(centering_transform, keypoints, keypoint_movable_mask)
175 | return collision_points_centered, keypoints_centered
176 |
177 | def solve(self,
178 | ee_pose,
179 | keypoints,
180 | keypoint_movable_mask,
181 | goal_constraints,
182 | path_constraints,
183 | sdf_voxels,
184 | collision_points,
185 | is_grasp_stage,
186 | initial_joint_pos,
187 | from_scratch=False,
188 | ):
189 | """
190 | Args:
191 | - ee_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw] end effector pose.
192 | - keypoints (np.ndarray): [M, 3] keypoint positions.
193 | - keypoint_movable_mask (np.ndarray): [M] boolean array indicating whether each keypoint is on the grasped object.
194 | - goal_constraints (List[Callable]): subgoal constraint functions.
195 | - path_constraints (List[Callable]): path constraint functions.
196 | - sdf_voxels (np.ndarray): [X, Y, Z] signed distance field of the environment.
197 | - collision_points (np.ndarray): [N, 3] point cloud of the object.
198 | - is_grasp_stage (bool): whether the current stage is a grasp stage.
199 | - initial_joint_pos (np.ndarray): [N] initial joint positions of the robot.
200 | - from_scratch (bool): whether to start from scratch.
201 | Returns:
202 | - sol (np.ndarray): [7] optimized subgoal pose [x, y, z, qx, qy, qz, qw].
203 | - debug_dict (dict): debug information.
204 | """
205 | # downsample collision points
206 | if collision_points is not None and collision_points.shape[0] > self.config['max_collision_points']:
207 | collision_points = farthest_point_sampling(collision_points, self.config['max_collision_points'])
208 | sdf_func = self._setup_sdf(sdf_voxels)
209 | # ====================================
210 | # = setup bounds and initial guess
211 | # ====================================
212 | ee_pose = ee_pose.astype(np.float64)
213 | ee_pose_homo = T.pose2mat([ee_pose[:3], ee_pose[3:]])
214 | ee_pose_euler = np.concatenate([ee_pose[:3], T.quat2euler(ee_pose[3:])])
215 | # normalize opt variables to [0, 1]
216 | pos_bounds_min = self.config['bounds_min']
217 | pos_bounds_max = self.config['bounds_max']
218 | rot_bounds_min = np.array([-np.pi, -np.pi, -np.pi]) # euler angles
219 | rot_bounds_max = np.array([np.pi, np.pi, np.pi]) # euler angles
220 | og_bounds = [(b_min, b_max) for b_min, b_max in zip(np.concatenate([pos_bounds_min, rot_bounds_min]), np.concatenate([pos_bounds_max, rot_bounds_max]))]
221 | bounds = [(-1, 1)] * len(og_bounds)
222 | if not from_scratch and self.last_opt_result is not None:
223 | init_sol = self.last_opt_result.x
224 | else:
225 | init_sol = normalize_vars(ee_pose_euler, og_bounds) # start from the current pose
226 | from_scratch = True
227 |
228 | # ====================================
229 | # = other setup
230 | # ====================================
231 | collision_points_centered, keypoints_centered = self._center_collision_points_and_keypoints(ee_pose_homo, collision_points, keypoints, keypoint_movable_mask)
232 | aux_args = (og_bounds,
233 | keypoints_centered,
234 | keypoint_movable_mask,
235 | goal_constraints,
236 | path_constraints,
237 | sdf_func,
238 | collision_points_centered,
239 | ee_pose_homo,
240 | self.ik_solver,
241 | initial_joint_pos,
242 | self.reset_joint_pos,
243 | is_grasp_stage)
244 |
245 | # ====================================
246 | # = solve optimization
247 | # ====================================
248 | start = time.time()
249 | # use global optimization for the first iteration
250 | if from_scratch:
251 | opt_result = dual_annealing(
252 | func=objective,
253 | bounds=bounds,
254 | args=aux_args,
255 | maxfun=self.config['sampling_maxfun'],
256 | x0=init_sol,
257 | no_local_search=False,
258 | minimizer_kwargs={
259 | 'method': 'SLSQP',
260 | 'options': self.config['minimizer_options'],
261 | },
262 | )
263 | # use gradient-based local optimization for the following iterations
264 | else:
265 | opt_result = minimize(
266 | fun=objective,
267 | x0=init_sol,
268 | args=aux_args,
269 | bounds=bounds,
270 | method='SLSQP',
271 | options=self.config['minimizer_options'],
272 | )
273 | solve_time = time.time() - start
274 |
275 | # ====================================
276 | # = post-process opt_result
277 | # ====================================
278 | if isinstance(opt_result.message, list):
279 | opt_result.message = opt_result.message[0]
280 | # rerun to get debug info
281 | _, debug_dict = objective(opt_result.x, *aux_args, return_debug_dict=True)
282 | debug_dict['sol'] = opt_result.x
283 | debug_dict['msg'] = opt_result.message
284 | debug_dict['solve_time'] = solve_time
285 | debug_dict['from_scratch'] = from_scratch
286 | debug_dict['type'] = 'subgoal_solver'
287 | # unnormalize
288 | sol = unnormalize_vars(opt_result.x, og_bounds)
289 | sol = np.concatenate([sol[:3], T.euler2quat(sol[3:])])
290 | opt_result = self._check_opt_result(opt_result, debug_dict)
291 | # cache opt_result for future use if successful
292 | if opt_result.success:
293 | self.last_opt_result = copy.deepcopy(opt_result)
294 | return sol, debug_dict
--------------------------------------------------------------------------------
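A minimal usage sketch (not part of the repository) showing how `SubgoalSolver.solve` can be driven outside the full OmniGibson pipeline, mirroring the inputs used in `_warmup`. The config values, the stub IK solver, and its return fields are illustrative placeholders standing in for `configs/config.yaml` and `ik_solver.IKSolver`; it assumes the script is run from the repo root with the dependencies from `requirements.txt` installed.

```python
import numpy as np
from types import SimpleNamespace
from subgoal_solver import SubgoalSolver

class DummyIKSolver:
    """Placeholder exposing the interface the objective expects (see ik_solver.py)."""
    def solve(self, target_pose_homo, max_iterations=20, initial_joint_pos=None):
        # pretend IK always exhausts its iteration budget without converging
        return SimpleNamespace(success=False,
                               num_descents=max_iterations,
                               position_error=np.inf,
                               cspace_position=np.zeros(8))

# illustrative stand-in for the relevant entries of configs/config.yaml
config = {
    'bounds_min': [-0.5, -0.5, 0.0],
    'bounds_max': [0.5, 0.5, 1.0],
    'sampling_maxfun': 100,                # dual_annealing budget for the first solve
    'minimizer_options': {'maxiter': 20},  # SLSQP options for local refinement
    'constraint_tolerance': 1e-2,
    'max_collision_points': 50,
}
solver = SubgoalSolver(config, DummyIKSolver(), reset_joint_pos=np.zeros(8))

# inputs shaped like those used in SubgoalSolver._warmup()
ee_pose = np.array([0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 1.0])  # [x, y, z, qx, qy, qz, qw]
keypoints = np.random.rand(5, 3)
keypoint_movable_mask = np.array([True, False, False, False, False])
sdf_voxels = np.zeros((10, 10, 10))
collision_points = np.random.rand(100, 3)

subgoal_pose, debug = solver.solve(ee_pose, keypoints, keypoint_movable_mask,
                                   goal_constraints=[], path_constraints=[],
                                   sdf_voxels=sdf_voxels,
                                   collision_points=collision_points,
                                   is_grasp_stage=True, initial_joint_pos=None,
                                   from_scratch=True)
print(subgoal_pose.shape, debug['total_cost'])  # (7,) and the final objective value
```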
/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | from numba import njit
4 | import open3d as o3d
5 | import datetime
6 | import scipy.interpolate as interpolate
7 | from scipy.spatial.transform import Slerp
8 | from scipy.spatial.transform import Rotation as R
9 | from scipy.spatial.transform import RotationSpline
10 | import transform_utils as T
11 | import yaml
12 |
13 | # ===============================================
14 | # = optimization utils
15 | # ===============================================
16 | def normalize_vars(vars, og_bounds):
17 | """
18 | Given 1D variables and bounds, normalize the variables to the [-1, 1] range.
19 | """
20 | normalized_vars = np.empty_like(vars)
21 | for i, (b_min, b_max) in enumerate(og_bounds):
22 | normalized_vars[i] = (vars[i] - b_min) / (b_max - b_min) * 2 - 1
23 | return normalized_vars
24 |
25 | def unnormalize_vars(normalized_vars, og_bounds):
26 | """
27 | Given 1D variables in [-1, 1] and original bounds, denormalize the variables to the original range.
28 | """
29 | vars = np.empty_like(normalized_vars)
30 | for i, (b_min, b_max) in enumerate(og_bounds):
31 | vars[i] = (normalized_vars[i] + 1) / 2 * (b_max - b_min) + b_min
32 | return vars
33 |
34 | def calculate_collision_cost(poses, sdf_func, collision_points, threshold):
35 | assert poses.shape[1:] == (4, 4)
36 | transformed_pcs = batch_transform_points(collision_points, poses)
37 | transformed_pcs_flatten = transformed_pcs.reshape(-1, 3) # [num_poses * num_points, 3]
38 | signed_distance = sdf_func(transformed_pcs_flatten) + threshold # [num_poses * num_points]
39 | signed_distance = signed_distance.reshape(-1, collision_points.shape[0]) # [num_poses, num_points]
40 | non_zero_mask = signed_distance > 0
41 | collision_cost = np.sum(signed_distance[non_zero_mask])
42 | return collision_cost
43 |
44 | @njit(cache=True, fastmath=True)
45 | def consistency(poses_a, poses_b, rot_weight=0.5):
46 | assert poses_a.shape[1:] == (4, 4) and poses_b.shape[1:] == (4, 4), 'poses must be of shape (N, 4, 4)'
47 | min_distances = np.zeros(len(poses_a), dtype=np.float64)
48 | for i in range(len(poses_a)):
49 | min_distance = 9999999
50 | a = poses_a[i]
51 | for j in range(len(poses_b)):
52 | b = poses_b[j]
53 | pos_distance = np.linalg.norm(a[:3, 3] - b[:3, 3])
54 | rot_distance = angle_between_rotmat(a[:3, :3], b[:3, :3])
55 | distance = pos_distance + rot_distance * rot_weight
56 | min_distance = min(min_distance, distance)
57 | min_distances[i] = min_distance
58 | return np.mean(min_distances)
59 |
60 | def transform_keypoints(transform, keypoints, movable_mask):
61 | assert transform.shape == (4, 4)
62 | transformed_keypoints = keypoints.copy()
63 | if movable_mask.sum() > 0:
64 | transformed_keypoints[movable_mask] = np.dot(keypoints[movable_mask], transform[:3, :3].T) + transform[:3, 3]
65 | return transformed_keypoints
66 |
67 | @njit(cache=True, fastmath=True)
68 | def batch_transform_points(points, transforms):
69 | """
70 | Apply multiple transformations to a point cloud and return the result of each transformation.
71 | Args:
72 | points: point cloud (N, 3).
73 | transforms: M 4x4 transformations (M, 4, 4).
74 | Returns:
75 | np.array: point clouds (M, N, 3).
76 | """
77 | assert transforms.shape[1:] == (4, 4), 'transforms must be of shape (M, 4, 4)'
78 | transformed_points = np.zeros((transforms.shape[0], points.shape[0], 3))
79 | for i in range(transforms.shape[0]):
80 | pos, R = transforms[i, :3, 3], transforms[i, :3, :3]
81 | transformed_points[i] = np.dot(points, R.T) + pos
82 | return transformed_points
83 |
84 | @njit(cache=True, fastmath=True)
85 | def get_samples_jitted(control_points_homo, control_points_quat, opt_interpolate_pos_step_size, opt_interpolate_rot_step_size):
86 | assert control_points_homo.shape[1:] == (4, 4)
87 | # calculate number of samples per segment
88 | num_samples_per_segment = np.empty(len(control_points_homo) - 1, dtype=np.int64)
89 | for i in range(len(control_points_homo) - 1):
90 | start_pos = control_points_homo[i, :3, 3]
91 | start_rotmat = control_points_homo[i, :3, :3]
92 | end_pos = control_points_homo[i+1, :3, 3]
93 | end_rotmat = control_points_homo[i+1, :3, :3]
94 | pos_diff = np.linalg.norm(start_pos - end_pos)
95 | rot_diff = angle_between_rotmat(start_rotmat, end_rotmat)
96 | pos_num_steps = np.ceil(pos_diff / opt_interpolate_pos_step_size)
97 | rot_num_steps = np.ceil(rot_diff / opt_interpolate_rot_step_size)
98 | num_path_poses = int(max(pos_num_steps, rot_num_steps))
99 | num_path_poses = max(num_path_poses, 2) # at least 2 poses, start and end
100 | num_samples_per_segment[i] = num_path_poses
101 | # fill in samples
102 | num_samples = num_samples_per_segment.sum()
103 | samples_7 = np.empty((num_samples, 7))
104 | sample_idx = 0
105 | for i in range(len(control_points_quat) - 1):
106 | start_pos, start_xyzw = control_points_quat[i, :3], control_points_quat[i, 3:]
107 | end_pos, end_xyzw = control_points_quat[i+1, :3], control_points_quat[i+1, 3:]
108 | # using proper quaternion slerp interpolation
109 | poses_7 = np.empty((num_samples_per_segment[i], 7))
110 | for j in range(num_samples_per_segment[i]):
111 | alpha = j / (num_samples_per_segment[i] - 1)
112 | pos = start_pos * (1 - alpha) + end_pos * alpha
113 | blended_xyzw = T.quat_slerp_jitted(start_xyzw, end_xyzw, alpha)
114 | pose_7 = np.empty(7)
115 | pose_7[:3] = pos
116 | pose_7[3:] = blended_xyzw
117 | poses_7[j] = pose_7
118 | samples_7[sample_idx:sample_idx+num_samples_per_segment[i]] = poses_7
119 | sample_idx += num_samples_per_segment[i]
120 | assert num_samples >= 2, f'num_samples: {num_samples}'
121 | return samples_7, num_samples
122 |
123 | @njit(cache=True, fastmath=True)
124 | def path_length(samples_homo):
125 | assert samples_homo.shape[1:] == (4, 4), 'samples_homo must be of shape (N, 4, 4)'
126 | pos_length = 0
127 | rot_length = 0
128 | for i in range(len(samples_homo) - 1):
129 | pos_length += np.linalg.norm(samples_homo[i, :3, 3] - samples_homo[i+1, :3, 3])
130 | rot_length += angle_between_rotmat(samples_homo[i, :3, :3], samples_homo[i+1, :3, :3])
131 | return pos_length, rot_length
132 |
133 | # ===============================================
134 | # = others
135 | # ===============================================
136 | def get_callable_grasping_cost_fn(env):
137 | def get_grasping_cost(keypoint_idx):
138 | keypoint_object = env.get_object_by_keypoint(keypoint_idx)
139 | return -env.is_grasping(candidate_obj=keypoint_object) + 1 # return 0 if grasping an object, 1 if not grasping any object
140 | return get_grasping_cost
141 |
142 | def get_config(config_path=None):
143 | if config_path is None:
144 | this_file_dir = os.path.dirname(os.path.abspath(__file__))
145 | config_path = os.path.join(this_file_dir, 'configs/config.yaml')
146 | assert config_path and os.path.exists(config_path), f'config file does not exist ({config_path})'
147 | with open(config_path, 'r') as f:
148 | config = yaml.load(f, Loader=yaml.FullLoader)
149 | return config
150 |
151 | class bcolors:
152 | HEADER = '\033[95m'
153 | OKBLUE = '\033[94m'
154 | OKCYAN = '\033[96m'
155 | OKGREEN = '\033[92m'
156 | WARNING = '\033[93m'
157 | FAIL = '\033[91m'
158 | ENDC = '\033[0m'
159 | BOLD = '\033[1m'
160 | UNDERLINE = '\033[4m'
161 |
162 | def get_clock_time(milliseconds=False):
163 | curr_time = datetime.datetime.now()
164 | if milliseconds:
165 | return f'{curr_time.hour}:{curr_time.minute}:{curr_time.second}.{curr_time.microsecond // 1000}'
166 | else:
167 | return f'{curr_time.hour}:{curr_time.minute}:{curr_time.second}'
168 |
169 | def angle_between_quats(q1, q2):
170 | """Angle between two quaternions"""
171 | return 2 * np.arccos(np.clip(np.abs(np.dot(q1, q2)), -1, 1))
172 |
173 | def filter_points_by_bounds(points, bounds_min, bounds_max, strict=True):
174 | """
175 | Filter points by taking only points within workspace bounds.
176 | """
177 | assert points.shape[1] == 3, "points must be (N, 3)"
178 | bounds_min = bounds_min.copy()
179 | bounds_max = bounds_max.copy()
180 | if not strict:
181 | bounds_min[:2] = bounds_min[:2] - 0.1 * (bounds_max[:2] - bounds_min[:2])
182 | bounds_max[:2] = bounds_max[:2] + 0.1 * (bounds_max[:2] - bounds_min[:2])
183 | bounds_min[2] = bounds_min[2] - 0.1 * (bounds_max[2] - bounds_min[2])
184 | within_bounds_mask = (
185 | (points[:, 0] >= bounds_min[0])
186 | & (points[:, 0] <= bounds_max[0])
187 | & (points[:, 1] >= bounds_min[1])
188 | & (points[:, 1] <= bounds_max[1])
189 | & (points[:, 2] >= bounds_min[2])
190 | & (points[:, 2] <= bounds_max[2])
191 | )
192 | return within_bounds_mask
193 |
194 | def print_opt_debug_dict(debug_dict):
195 | print('\n' + '#' * 40)
196 | print(f'# Optimization debug info:')
197 | max_key_length = max(len(str(k)) for k in debug_dict.keys())
198 | for k, v in debug_dict.items():
199 | if isinstance(v, int) or isinstance(v, float):
200 | print(f'# {k:<{max_key_length}}: {v:.05f}')
201 | elif isinstance(v, list) and all(isinstance(x, int) or isinstance(x, float) for x in v):
202 | print(f'# {k:<{max_key_length}}: {np.array(v).round(5)}')
203 | else:
204 | print(f'# {k:<{max_key_length}}: {v}')
205 | print('#' * 40 + '\n')
206 |
207 | def merge_dicts(dicts):
208 | return {
209 | k : v
210 | for d in dicts
211 | for k, v in d.items()
212 | }
213 |
214 | def exec_safe(code_str, gvars=None, lvars=None):
215 | banned_phrases = ['import', '__']
216 | for phrase in banned_phrases:
217 | assert phrase not in code_str
218 |
219 | if gvars is None:
220 | gvars = {}
221 | if lvars is None:
222 | lvars = {}
223 | empty_fn = lambda *args, **kwargs: None
224 | custom_gvars = merge_dicts([
225 | gvars,
226 | {'exec': empty_fn, 'eval': empty_fn}
227 | ])
228 | try:
229 | exec(code_str, custom_gvars, lvars)
230 | except Exception as e:
231 | print(f'Error executing code:\n{code_str}')
232 | raise e
233 |
234 | def load_functions_from_txt(txt_path, get_grasping_cost_fn):
235 | if txt_path is None:
236 | return []
237 | # load txt file
238 | with open(txt_path, 'r') as f:
239 | functions_text = f.read()
240 | # execute functions
241 | gvars_dict = {
242 | 'np': np,
243 | 'get_grasping_cost_by_keypoint_idx': get_grasping_cost_fn,
244 | } # external library APIs
245 | lvars_dict = dict()
246 | exec_safe(functions_text, gvars=gvars_dict, lvars=lvars_dict)
247 | return list(lvars_dict.values())
248 |
249 | @njit(cache=True, fastmath=True)
250 | def angle_between_rotmat(P, Q):
251 | R = np.dot(P, Q.T)
252 | cos_theta = (np.trace(R)-1)/2
253 | if cos_theta > 1:
254 | cos_theta = 1
255 | elif cos_theta < -1:
256 | cos_theta = -1
257 | return np.arccos(cos_theta)
258 |
259 | def fit_b_spline(control_points):
260 | # determine appropriate k
261 | k = min(3, control_points.shape[0]-1)
262 | spline = interpolate.splprep(control_points.T, s=0, k=k)
263 | return spline
264 |
265 | def sample_from_spline(spline, num_samples):
266 | sample_points = np.linspace(0, 1, num_samples)
267 | if isinstance(spline, RotationSpline):
268 | samples = spline(sample_points).as_matrix() # [num_samples, 3, 3]
269 | else:
270 | assert isinstance(spline, tuple) and len(spline) == 2, 'spline must be a tuple of (tck, u)'
271 | tck, u = spline
272 | samples = interpolate.splev(sample_points, tck) # [spline_dim, num_samples]
273 | samples = np.array(samples).T # [num_samples, spline_dim]
274 | return samples
275 |
276 | def linear_interpolate_poses(start_pose, end_pose, num_poses):
277 | """
278 | Interpolate between start and end pose.
279 | """
280 | assert num_poses >= 2, 'num_poses must be at least 2'
281 | if start_pose.shape == (6,) and end_pose.shape == (6,):
282 | start_pos, start_euler = start_pose[:3], start_pose[3:]
283 | end_pos, end_euler = end_pose[:3], end_pose[3:]
284 | start_rotmat = T.euler2mat(start_euler)
285 | end_rotmat = T.euler2mat(end_euler)
286 | elif start_pose.shape == (4, 4) and end_pose.shape == (4, 4):
287 | start_pos = start_pose[:3, 3]
288 | start_rotmat = start_pose[:3, :3]
289 | end_pos = end_pose[:3, 3]
290 | end_rotmat = end_pose[:3, :3]
291 | elif start_pose.shape == (7,) and end_pose.shape == (7,):
292 | start_pos, start_quat = start_pose[:3], start_pose[3:]
293 | start_rotmat = T.quat2mat(start_quat)
294 | end_pos, end_quat = end_pose[:3], end_pose[3:]
295 | end_rotmat = T.quat2mat(end_quat)
296 | else:
297 | raise ValueError('start_pose and end_pose not recognized')
298 | slerp = Slerp([0, 1], R.from_matrix([start_rotmat, end_rotmat]))
299 | poses = []
300 | for i in range(num_poses):
301 | alpha = i / (num_poses - 1)
302 | pos = start_pos * (1 - alpha) + end_pos * alpha
303 | rotmat = slerp(alpha).as_matrix()
304 | if start_pose.shape == (6,):
305 | euler = T.mat2euler(rotmat)
306 | poses.append(np.concatenate([pos, euler]))
307 | elif start_pose.shape == (4, 4):
308 | pose = np.eye(4)
309 | pose[:3, :3] = rotmat
310 | pose[:3, 3] = pos
311 | poses.append(pose)
312 | elif start_pose.shape == (7,):
313 | quat = T.mat2quat(rotmat)
314 | pose = np.concatenate([pos, quat])
315 | poses.append(pose)
316 | return np.array(poses)
317 |
318 | def spline_interpolate_poses(control_points, num_steps):
319 | """
320 | Interpolate through the control points using spline interpolation.
321 | 1. Fit a b-spline through the positional terms of the control points.
322 | 2. Fit a RotationSpline through the rotational terms of the control points.
323 | 3. Sample the b-spline and RotationSpline at num_steps.
324 |
325 | Args:
326 | control_points: [N, 6] position + euler or [N, 4, 4] pose or [N, 7] position + quat
327 | num_steps: number of poses to interpolate
328 | Returns:
329 | poses: [num_steps, 6] position + euler or [num_steps, 4, 4] pose or [num_steps, 7] position + quat
330 | """
331 | assert num_steps >= 2, 'num_steps must be at least 2'
332 | if isinstance(control_points, list):
333 | control_points = np.array(control_points)
334 | if control_points.shape[1] == 6:
335 | control_points_pos = control_points[:, :3] # [N, 3]
336 | control_points_euler = control_points[:, 3:] # [N, 3]
337 | control_points_rotmat = []
338 | for control_point_euler in control_points_euler:
339 | control_points_rotmat.append(T.euler2mat(control_point_euler))
340 | control_points_rotmat = np.array(control_points_rotmat) # [N, 3, 3]
341 | elif control_points.shape[1] == 4 and control_points.shape[2] == 4:
342 | control_points_pos = control_points[:, :3, 3] # [N, 3]
343 | control_points_rotmat = control_points[:, :3, :3] # [N, 3, 3]
344 | elif control_points.shape[1] == 7:
345 | control_points_pos = control_points[:, :3]
346 | control_points_rotmat = []
347 | for control_point_quat in control_points[:, 3:]:
348 | control_points_rotmat.append(T.quat2mat(control_point_quat))
349 | control_points_rotmat = np.array(control_points_rotmat)
350 | else:
351 | raise ValueError('control_points not recognized')
352 | # remove the duplicate points (threshold 1e-3)
353 | diff = np.linalg.norm(np.diff(control_points_pos, axis=0), axis=1)
354 | mask = diff > 1e-3
355 | # always keep the first and last points
356 | mask = np.concatenate([[True], mask[:-1], [True]])
357 | control_points_pos = control_points_pos[mask]
358 | control_points_rotmat = control_points_rotmat[mask]
359 | # fit b-spline through positional terms control points
360 | pos_spline = fit_b_spline(control_points_pos)
361 | # fit RotationSpline through rotational terms control points
362 | times = pos_spline[1]
363 | rotations = R.from_matrix(control_points_rotmat)
364 | rot_spline = RotationSpline(times, rotations)
365 | # sample from the splines
366 | pos_samples = sample_from_spline(pos_spline, num_steps) # [num_steps, 3]
367 | rot_samples = sample_from_spline(rot_spline, num_steps) # [num_steps, 3, 3]
368 | if control_points.shape[1] == 6:
369 | poses = []
370 | for i in range(num_steps):
371 | pose = np.concatenate([pos_samples[i], T.mat2euler(rot_samples[i])])
372 | poses.append(pose)
373 | poses = np.array(poses)
374 | elif control_points.shape[1] == 4 and control_points.shape[2] == 4:
375 | poses = np.zeros((num_steps, 4, 4))
376 | poses[:, :3, :3] = rot_samples
377 | poses[:, :3, 3] = pos_samples
378 | poses[:, 3, 3] = 1
379 | elif control_points.shape[1] == 7:
380 | poses = np.empty((num_steps, 7))
381 | for i in range(num_steps):
382 | quat = T.mat2quat(rot_samples[i])
383 | pose = np.concatenate([pos_samples[i], quat])
384 | poses[i] = pose
385 | return poses
386 |
387 | def get_linear_interpolation_steps(start_pose, end_pose, pos_step_size, rot_step_size):
388 | """
389 | Given start and end pose, calculate the number of steps to interpolate between them.
390 | Args:
391 | start_pose: [6] position + euler or [4, 4] pose or [7] position + quat
392 | end_pose: [6] position + euler or [4, 4] pose or [7] position + quat
393 | pos_step_size: position step size
394 | rot_step_size: rotation step size
395 | Returns:
396 | num_path_poses: number of poses to interpolate
397 | """
398 | if start_pose.shape == (6,) and end_pose.shape == (6,):
399 | start_pos, start_euler = start_pose[:3], start_pose[3:]
400 | end_pos, end_euler = end_pose[:3], end_pose[3:]
401 | start_rotmat = T.euler2mat(start_euler)
402 | end_rotmat = T.euler2mat(end_euler)
403 | elif start_pose.shape == (4, 4) and end_pose.shape == (4, 4):
404 | start_pos = start_pose[:3, 3]
405 | start_rotmat = start_pose[:3, :3]
406 | end_pos = end_pose[:3, 3]
407 | end_rotmat = end_pose[:3, :3]
408 | elif start_pose.shape == (7,) and end_pose.shape == (7,):
409 | start_pos, start_quat = start_pose[:3], start_pose[3:]
410 | start_rotmat = T.quat2mat(start_quat)
411 | end_pos, end_quat = end_pose[:3], end_pose[3:]
412 | end_rotmat = T.quat2mat(end_quat)
413 | else:
414 | raise ValueError('start_pose and end_pose not recognized')
415 | pos_diff = np.linalg.norm(start_pos - end_pos)
416 | rot_diff = angle_between_rotmat(start_rotmat, end_rotmat)
417 | pos_num_steps = np.ceil(pos_diff / pos_step_size)
418 | rot_num_steps = np.ceil(rot_diff / rot_step_size)
419 | num_path_poses = int(max(pos_num_steps, rot_num_steps))
420 | num_path_poses = max(num_path_poses, 2) # at least start and end poses
421 | return num_path_poses
422 |
423 | def farthest_point_sampling(pc, num_points):
424 | """
425 | Given a point cloud, sample num_points points that are the farthest apart.
426 | Use o3d farthest point sampling.
427 | """
428 | assert pc.ndim == 2 and pc.shape[1] == 3, "pc must be a (N, 3) numpy array"
429 | pcd = o3d.geometry.PointCloud()
430 | pcd.points = o3d.utility.Vector3dVector(pc)
431 | downpcd_farthest = pcd.farthest_point_down_sample(num_points)
432 | return np.asarray(downpcd_farthest.points)
--------------------------------------------------------------------------------
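A short sketch (not part of the repository) exercising a few of the helpers above; the bounds, poses, and step sizes are arbitrary illustrative values, and it assumes the script is run from the repo root so that `utils` and `transform_utils` are importable.

```python
import numpy as np
from utils import (normalize_vars, unnormalize_vars,
                   get_linear_interpolation_steps, linear_interpolate_poses,
                   farthest_point_sampling)

# normalize_vars / unnormalize_vars are exact inverses over the given bounds
og_bounds = [(-0.5, 0.5), (-0.5, 0.5), (0.0, 1.0)]
x = np.array([0.1, -0.2, 0.7])
assert np.allclose(unnormalize_vars(normalize_vars(x, og_bounds), og_bounds), x)

# interpolate between two [x, y, z, qx, qy, qz, qw] poses, with slerp on the rotation;
# the step-size heuristic decides how many intermediate poses are needed
start_pose = np.array([0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 1.0])
end_pose = np.array([0.2, 0.1, 0.6, 0.0, 0.0, 0.7071068, 0.7071068])
num_poses = get_linear_interpolation_steps(start_pose, end_pose,
                                           pos_step_size=0.02, rot_step_size=0.2)
path = linear_interpolate_poses(start_pose, end_pose, num_poses)
print(path.shape)       # (num_poses, 7)

# downsample a point cloud to 64 well-spread points via Open3D farthest point sampling
pc_down = farthest_point_sampling(np.random.rand(1000, 3), 64)
print(pc_down.shape)    # (64, 3)
```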
/visualizer.py:
--------------------------------------------------------------------------------
1 | import open3d as o3d
2 | import numpy as np
3 | import matplotlib
4 | import cv2
5 | import transform_utils as T
6 | from utils import filter_points_by_bounds, batch_transform_points
7 |
8 | def add_to_visualize_buffer(visualize_buffer, visualize_points, visualize_colors):
9 | assert visualize_points.shape[0] == visualize_colors.shape[0], f'got {visualize_points.shape[0]} for points and {visualize_colors.shape[0]} for colors'
10 | if len(visualize_points) == 0:
11 | return
12 | assert visualize_points.shape[1] == 3
13 | assert visualize_colors.shape[1] == 3
14 | # assert visualize_colors.max() <= 1.0 and visualize_colors.min() >= 0.0
15 | visualize_buffer["points"].append(visualize_points)
16 | visualize_buffer["colors"].append(visualize_colors)
17 |
18 | def generate_nearby_points(point, num_points_per_side=5, half_range=0.005):
19 | if point.ndim == 1:
20 | offsets = np.linspace(-1, 1, num_points_per_side)
21 | offsets_meshgrid = np.meshgrid(offsets, offsets, offsets)
22 | offsets_array = np.stack(offsets_meshgrid, axis=-1).reshape(-1, 3)
23 | nearby_points = point + offsets_array * half_range
24 | return nearby_points.reshape(-1, 3)
25 | else:
26 | assert point.shape[1] == 3, "point must be (N, 3)"
27 | assert point.ndim == 2, "point must be (N, 3)"
28 | # vectorized version
29 | offsets = np.linspace(-1, 1, num_points_per_side)
30 | offsets_meshgrid = np.meshgrid(offsets, offsets, offsets)
31 | offsets_array = np.stack(offsets_meshgrid, axis=-1).reshape(-1, 3)
32 | nearby_points = point[:, None, :] + offsets_array * half_range
33 | return nearby_points
34 |
35 | class Visualizer:
36 | def __init__(self, config, env):
37 | self.config = config
38 | self.env = env
39 | self.bounds_min = np.array(self.config['bounds_min'])
40 | self.bounds_max = np.array(self.config['bounds_max'])
41 | self.color = np.array([0.05, 0.55, 0.26])
42 | self.world2viewer = np.array([
43 | [0.3788, 0.3569, -0.8539, 0.0],
44 | [0.9198, -0.0429, 0.3901, 0.0],
45 | [-0.1026, 0.9332, 0.3445, 0.0],
46 | [0.0, 0.0, 0.0, 1.0]
47 | ]).T
48 |
49 | def show_img(self, rgb):
50 | print('showing image, click on the window and press "ESC" to close and continue')
51 | cv2.imshow('img', rgb[..., ::-1])
52 | cv2.waitKey(0)
53 | cv2.destroyAllWindows()
54 |
55 | def show_pointcloud(self, points, colors):
56 | # transform to viewer frame
57 | points = np.dot(points, self.world2viewer[:3, :3].T) + self.world2viewer[:3, 3]
58 | # clip color to [0, 1]
59 | colors = np.clip(colors, 0.0, 1.0)
60 | pcd = o3d.geometry.PointCloud()
61 | pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64)) # float64 is a lot faster than float32 when added to o3d later
62 | pcd.colors = o3d.utility.Vector3dVector(colors.astype(np.float64))
63 | print('visualizing pointcloud, click on the window and press "ESC" to close and continue')
64 | o3d.visualization.draw_geometries([pcd])
65 |
66 | def _get_scene_points_and_colors(self):
67 | # scene
68 | cam_obs = self.env.get_cam_obs()
69 | scene_points = []
70 | scene_colors = []
71 | for cam_id in range(len(cam_obs)):
72 | cam_points = cam_obs[cam_id]['points'].reshape(-1, 3)
73 | cam_colors = cam_obs[cam_id]['rgb'].reshape(-1, 3) / 255.0
74 | # clip to workspace
75 | within_workspace_mask = filter_points_by_bounds(cam_points, self.bounds_min, self.bounds_max, strict=False)
76 | cam_points = cam_points[within_workspace_mask]
77 | cam_colors = cam_colors[within_workspace_mask]
78 | scene_points.append(cam_points)
79 | scene_colors.append(cam_colors)
80 | scene_points = np.concatenate(scene_points, axis=0)
81 | scene_colors = np.concatenate(scene_colors, axis=0)
82 | return scene_points, scene_colors
83 |
84 | def visualize_subgoal(self, subgoal_pose):
85 | visualize_buffer = {
86 | "points": [],
87 | "colors": []
88 | }
89 | # scene
90 | scene_points, scene_colors = self._get_scene_points_and_colors()
91 | add_to_visualize_buffer(visualize_buffer, scene_points, scene_colors)
92 | subgoal_pose_homo = T.convert_pose_quat2mat(subgoal_pose)
93 | # subgoal
94 | collision_points = self.env.get_collision_points(noise=False)
95 | # transform collision points to the subgoal frame
96 | ee_pose = self.env.get_ee_pose()
97 | ee_pose_homo = T.convert_pose_quat2mat(ee_pose)
98 | centering_transform = np.linalg.inv(ee_pose_homo)
99 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
100 | transformed_collision_points = batch_transform_points(collision_points_centered, subgoal_pose_homo[None]).reshape(-1, 3)
101 | collision_points_colors = np.array([self.color] * len(collision_points))
102 | add_to_visualize_buffer(visualize_buffer, transformed_collision_points, collision_points_colors)
103 | # add keypoints
104 | keypoints = self.env.get_keypoint_positions()
105 | num_keypoints = keypoints.shape[0]
106 | color_map = matplotlib.colormaps["gist_rainbow"]
107 | keypoints_colors = [color_map(i / num_keypoints)[:3] for i in range(num_keypoints)]
108 | for i in range(num_keypoints):
109 | nearby_points = generate_nearby_points(keypoints[i], num_points_per_side=6, half_range=0.009)
110 | nearby_colors = np.tile(keypoints_colors[i], (nearby_points.shape[0], 1))
111 | nearby_colors = 0.5 * nearby_colors + 0.5 * np.array([1, 1, 1])
112 | add_to_visualize_buffer(visualize_buffer, nearby_points, nearby_colors)
113 | # visualize
114 | visualize_points = np.concatenate(visualize_buffer["points"], axis=0)
115 | visualize_colors = np.concatenate(visualize_buffer["colors"], axis=0)
116 | self.show_pointcloud(visualize_points, visualize_colors)
117 |
118 | def visualize_path(self, path):
119 | visualize_buffer = {
120 | "points": [],
121 | "colors": []
122 | }
123 | # scene
124 | scene_points, scene_colors = self._get_scene_points_and_colors()
125 | add_to_visualize_buffer(visualize_buffer, scene_points, scene_colors)
126 | # draw curve based on poses
127 | for t in range(len(path) - 1):
128 | start = path[t][:3]
129 | end = path[t + 1][:3]
130 | num_interp_points = int(np.linalg.norm(start - end) / 0.0002)
131 | interp_points = np.linspace(start, end, num_interp_points)
132 | interp_colors = np.tile([0.0, 0.0, 0.0], (num_interp_points, 1))
133 | # add a tint of white (the later along the path, the whiter)
134 | whitening_coef = 0.3 + 0.5 * (t / len(path))
135 | interp_colors = (1 - whitening_coef) * interp_colors + whitening_coef * np.array([1, 1, 1])
136 | add_to_visualize_buffer(visualize_buffer, interp_points, interp_colors)
137 | # subsample path with a fixed step size
138 | step_size = 0.05
139 | subpath = [path[0]]
140 | for i in range(1, len(path) - 1):
141 | dist = np.linalg.norm(np.array(path[i][:3]) - np.array(subpath[-1][:3]))
142 | if dist > step_size:
143 | subpath.append(path[i])
144 | subpath.append(path[-1])
145 | path = np.array(subpath)
146 | path_length = path.shape[0]
147 | # path points
148 | collision_points = self.env.get_collision_points(noise=False)
149 | num_points = collision_points.shape[0]
150 | start_pose = self.env.get_ee_pose()
151 | centering_transform = np.linalg.inv(T.convert_pose_quat2mat(start_pose))
152 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
153 | poses_homo = T.convert_pose_quat2mat(path[:, :7]) # the last number is gripper action
154 | transformed_collision_points = batch_transform_points(collision_points_centered, poses_homo).reshape(-1, 3) # [num_poses * num_points, 3]
155 | # calculate color based on the timestep
156 | collision_points_colors = np.ones([path_length, num_points, 3]) * self.color[None, None]
157 | for t in range(path_length):
158 | whitening_coef = 0.3 + 0.5 * (t / path_length)
159 | collision_points_colors[t] = (1 - whitening_coef) * collision_points_colors[t] + whitening_coef * np.array([1, 1, 1])
160 | collision_points_colors = collision_points_colors.reshape(-1, 3)
161 | add_to_visualize_buffer(visualize_buffer, transformed_collision_points, collision_points_colors)
162 | # keypoints
163 | keypoints = self.env.get_keypoint_positions()
164 | num_keypoints = keypoints.shape[0]
165 | color_map = matplotlib.colormaps["gist_rainbow"]
166 | keypoints_colors = [color_map(i / num_keypoints)[:3] for i in range(num_keypoints)]
167 | for i in range(num_keypoints):
168 | nearby_points = generate_nearby_points(keypoints[i], num_points_per_side=6, half_range=0.009)
169 | nearby_colors = np.tile(keypoints_colors[i], (nearby_points.shape[0], 1))
170 | nearby_colors = 0.5 * nearby_colors + 0.5 * np.array([1, 1, 1])
171 | add_to_visualize_buffer(visualize_buffer, nearby_points, nearby_colors)
172 | # visualize
173 | visualize_points = np.concatenate(visualize_buffer["points"], axis=0)
174 | visualize_colors = np.concatenate(visualize_buffer["colors"], axis=0)
175 | self.show_pointcloud(visualize_points, visualize_colors)
--------------------------------------------------------------------------------
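A small sketch (not part of the repository) of using the point-cloud viewer on its own: the constructor only reads `bounds_min`/`bounds_max` from the config and `show_pointcloud` never touches the environment, so `env` can be left as `None` here; the bounds and the random cloud are purely illustrative.

```python
import numpy as np
from visualizer import Visualizer

config = {'bounds_min': [-0.5, -0.5, 0.0], 'bounds_max': [0.5, 0.5, 1.0]}
viz = Visualizer(config, env=None)  # env is only needed by visualize_subgoal / visualize_path

points = np.random.uniform(-0.3, 0.3, size=(2000, 3))
colors = np.random.uniform(0.0, 1.0, size=(2000, 3))
viz.show_pointcloud(points, colors)  # opens an Open3D window; press ESC to close
```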
/vlm_query/pen/metadata.json:
--------------------------------------------------------------------------------
1 | {"init_keypoint_positions": [[-0.25794319243703623, -0.2360844662773828, 0.6905476882082092], [-0.2671639177617655, -0.09433527815132495, 0.7122930549896219], [-0.27151234338495983, 0.08395197554366488, 0.7154432798183405], [-0.266379743453929, 0.08175199572534772, 0.8097122583174985], [-0.3607951132392543, 0.18610731832223598, 0.7178293666264162], [-0.2624312374613168, 0.195909440226552, 0.6867906392498943], [-0.31782969861483357, 0.21783651249007482, 0.7905845242493158]], "num_keypoints": 7, "num_stages": 3, "grasp_keypoints": [1, -1, -1], "release_keypoints": [-1, -1, 1]}
--------------------------------------------------------------------------------
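For reference, a hypothetical snippet (not part of the repository) that inspects this cached VLM metadata; the field names match the JSON above, and the path assumes the repo root as the working directory.

```python
import json

with open('vlm_query/pen/metadata.json') as f:
    metadata = json.load(f)

print(metadata['num_stages'])                     # 3
print(metadata['grasp_keypoints'])                # [1, -1, -1]: grasp keypoint 1 in stage 1
print(metadata['release_keypoints'])              # [-1, -1, 1]: release keypoint 1 after stage 3
print(len(metadata['init_keypoint_positions']))   # 7, matching num_keypoints
```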
/vlm_query/pen/query_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huangwl18/ReKep/63c43fdba60354980258beaeb8a7d48e088e1e3e/vlm_query/pen/query_img.png
--------------------------------------------------------------------------------
/vlm_query/pen/stage1_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage1_subgoal_constraint1(end_effector, keypoints):
2 | """Align the end-effector with the white pen's grasping point (keypoint 1)."""
3 | grasp_point = keypoints[1]
4 | cost = np.linalg.norm(end_effector - grasp_point)
5 | return cost
6 |
7 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage2_path_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage2_path_constraint1(end_effector, keypoints):
2 | """The robot must still be grasping the white pen (keypoint 1)."""
3 | return get_grasping_cost_by_keypoint_idx(1)
4 |
5 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage2_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage2_subgoal_constraint1(end_effector, keypoints):
2 | """Ensure the white pen is upright by aligning the vector from keypoint 0 to keypoint 1 with the z-axis."""
3 | pen_vector = keypoints[1] - keypoints[0]
4 | z_axis = np.array([0, 0, 1])
5 | cost = np.linalg.norm(np.cross(pen_vector, z_axis))
6 | return cost
7 |
8 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage3_path_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage3_path_constraint1(end_effector, keypoints):
2 | """The robot must still be grasping the white pen (keypoint 1)."""
3 | return get_grasping_cost_by_keypoint_idx(1)
4 |
5 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage3_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage3_subgoal_constraint1(end_effector, keypoints):
2 | """Ensure the white pen is above the black pen holder opening (mean of keypoints 3, 4, 5, 6) and aligned with the z-axis."""
3 | holder_opening = np.mean(keypoints[3:7], axis=0)
4 | above_holder = holder_opening + np.array([0, 0, 0.2]) # 20cm above the holder opening
5 | cost = np.linalg.norm(keypoints[1] - above_holder)
6 | return cost
7 |
8 | def stage3_subgoal_constraint2(end_effector, keypoints):
9 | """Ensure the white pen is upright by aligning the vector from keypoint 0 to keypoint 1 with the z-axis."""
10 | pen_vector = keypoints[1] - keypoints[0]
11 | z_axis = np.array([0, 0, 1])
12 | cost = np.linalg.norm(np.cross(pen_vector, z_axis))
13 | return cost
14 |
15 |
--------------------------------------------------------------------------------
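The stage constraint files above are plain Python source that the pipeline turns into callables with `utils.load_functions_from_txt`. Below is a sketch (not part of the repository) of loading and evaluating them by hand; the dummy grasping-cost lambda stands in for `utils.get_callable_grasping_cost_fn(env)`, which needs a live environment, and the keypoints are taken from `metadata.json` above.

```python
import json
import numpy as np
from utils import load_functions_from_txt

# pretend the keypoint is already grasped (cost 0); the real helper queries the env
dummy_grasping_cost_fn = lambda keypoint_idx: 0

constraints = load_functions_from_txt('vlm_query/pen/stage3_subgoal_constraints.txt',
                                      dummy_grasping_cost_fn)
print([fn.__name__ for fn in constraints])
# ['stage3_subgoal_constraint1', 'stage3_subgoal_constraint2']

with open('vlm_query/pen/metadata.json') as f:
    keypoints = np.array(json.load(f)['init_keypoint_positions'])
end_effector = keypoints[1]  # arbitrary query point for the dummy end-effector argument
for fn in constraints:
    print(fn.__name__, fn(end_effector, keypoints))  # cost <= 0 means satisfied
```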
/vlm_query/prompt_template.txt:
--------------------------------------------------------------------------------
1 | ## Instructions
2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps:
3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples:
4 | - "pouring tea from teapot":
5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid"
6 | - "put red block on top of blue block":
7 | - 2 stages: "grasp red block", "drop the red block on top of blue block"
8 | - "reorient bouquet and drop it upright into vase":
9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase"
10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples:
11 | - "pouring liquid from teapot":
12 | - "grasp teapot" stage:
13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle"
14 | - 0 path constraints
15 | - "align teapot with cup opening" stage:
16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening"
17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling"
18 | - "pour liquid" stage:
19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid"
20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening"
21 | - "put red block on top of blue block":
22 | - "grasp red block" stage:
23 | - 1 sub-goal constraints: "align the end-effector with the red block"
24 | - 0 path constraints
25 | - "drop the red block on top of blue block" stage:
26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block"
27 | - 1 path constraints: "the robot must still be grasping the red block"
28 | - "eorient bouquet and drop it upright into vase":
29 | - "grasp bouquet" stage:
30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem"
31 | - 0 path constraints
32 | - "reorient bouquet" stage:
33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)"
34 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
35 | - "keep upright and drop into vase" stage:
36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening"
37 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable.
39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable.
40 |
41 | **Note:**
42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero.
43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints.
44 | - Avoid using "if" statements in your constraints.
45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels).
46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task.
47 | - Inputs to the constraints are as follows:
48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position.
49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions.
50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint.
51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function.
52 | - For grasping stage, you should only write one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed.
53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages.
54 | - The robot can only grasp one object at a time.
55 | - Grasping must be an independent stage from other stages.
56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis).
57 | - You may use multiple keypoints to specify a surface or volume.
58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array.
59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j`, and then calculate the norm of the difference between keypoint `i` and the `offsetted_point`.
60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints).
61 |
62 | **Structure your output in a single python code block as follows:**
63 | ```python
64 |
65 | # Your explanation of how many stages are involved in the task and what each stage is about.
66 | # ...
67 |
68 | num_stages = ?
69 |
70 | ### stage 1 sub-goal constraints (if any)
71 | def stage1_subgoal_constraint1(end_effector, keypoints):
72 | """Put your explanation here."""
73 | ...
74 | return cost
75 | # Add more sub-goal constraints if needed
76 | ...
77 |
78 | ### stage 1 path constraints (if any)
79 | def stage1_path_constraint1(end_effector, keypoints):
80 | """Put your explanation here."""
81 | ...
82 | return cost
83 | # Add more path constraints if needed
84 | ...
85 |
86 | # repeat for more stages
87 | ...
88 |
89 | """
90 | Summarize keypoints to be grasped in all grasping stages.
91 | The length of the list should be equal to the number of stages.
92 | For a grasping stage, write the keypoint index. For a non-grasping stage, write -1.
93 | """
94 | grasp_keypoints = [?, ..., ?]
95 |
96 | """
97 | Summarize at **the end of which stage** the robot should release the keypoints.
98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released if it has been grasped previously).
99 | Only release an object when it's necessary to complete the task, e.g., dropping the bouquet into the vase.
100 | The length of the list should be equal to the number of stages.
101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1.
102 | """
103 | release_keypoints = [?, ..., ?]
104 |
105 | ```
106 |
107 | ## Query
108 | Query Task: "{instruction}"
109 | Query Image:
--------------------------------------------------------------------------------
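For illustration, a hypothetical completed response in the format the template asks for, assembled from the cached pen-in-holder files above (the stage names are paraphrased; the constraint bodies, keypoint indices, and the `grasp_keypoints`/`release_keypoints` values come from the `stage*_constraints.txt` files and `metadata.json`). As in the pipeline, `np` and `get_grasping_cost_by_keypoint_idx` are injected when this code is executed by `utils.load_functions_from_txt`.

```python
# 3 stages: "grasp the white pen", "reorient the pen upright",
# and "move the pen above the holder opening and release".

num_stages = 3

### stage 1 sub-goal constraints
def stage1_subgoal_constraint1(end_effector, keypoints):
    """Align the end-effector with the white pen's grasping point (keypoint 1)."""
    return np.linalg.norm(end_effector - keypoints[1])

### stage 2 sub-goal constraints
def stage2_subgoal_constraint1(end_effector, keypoints):
    """Ensure the white pen is upright by aligning the keypoint 0 -> keypoint 1 vector with the z-axis."""
    pen_vector = keypoints[1] - keypoints[0]
    return np.linalg.norm(np.cross(pen_vector, np.array([0, 0, 1])))

### stage 2 path constraints
def stage2_path_constraint1(end_effector, keypoints):
    """The robot must still be grasping the white pen (keypoint 1)."""
    return get_grasping_cost_by_keypoint_idx(1)

### stage 3 sub-goal constraints
def stage3_subgoal_constraint1(end_effector, keypoints):
    """The white pen is 20cm above the holder opening (mean of keypoints 3, 4, 5, 6)."""
    above_holder = np.mean(keypoints[3:7], axis=0) + np.array([0, 0, 0.2])
    return np.linalg.norm(keypoints[1] - above_holder)

def stage3_subgoal_constraint2(end_effector, keypoints):
    """Keep the white pen upright (keypoint 0 -> keypoint 1 vector parallel to the z-axis)."""
    pen_vector = keypoints[1] - keypoints[0]
    return np.linalg.norm(np.cross(pen_vector, np.array([0, 0, 1])))

### stage 3 path constraints
def stage3_path_constraint1(end_effector, keypoints):
    """The robot must still be grasping the white pen (keypoint 1)."""
    return get_grasping_cost_by_keypoint_idx(1)

grasp_keypoints = [1, -1, -1]     # grasp keypoint 1 in stage 1
release_keypoints = [-1, -1, 1]   # release keypoint 1 at the end of stage 3
```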