├── .gitignore
├── README.md
├── configs
│   ├── config.yaml
│   ├── og_scene_file_pen.json
│   └── og_scene_file_red_pen.json
├── constraint_generation.py
├── dinov2_vits14_pretrain.pth
├── environment.py
├── ik_solver.py
├── keypoint_proposal.py
├── main.py
├── media
│   ├── pen-in-holder-disturbances.gif
│   └── pen-in-holder.gif
├── og_utils.py
├── path_solver.py
├── requirements.txt
├── subgoal_solver.py
├── transform_utils.py
├── utils.py
├── videos
│   ├── 2024-10-12-09-29-53.mp4
│   ├── 2024-10-14-02-04-49.mp4
│   ├── 2024-10-14-03-26-38.mp4
│   ├── 2024-10-14-03-42-23.mp4
│   ├── 2024-10-14-04-16-18.mp4
│   ├── 2024-10-14-04-23-14.mp4
│   └── 2024-10-14-08-46-17.mp4
├── visualizer.py
└── vlm_query
    ├── 2024-10-14_08-17-50_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder
    │   ├── prompt.txt
    │   └── query_img.png
    ├── 2024-10-14_08-24-59_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder
    │   ├── prompt.txt
    │   └── query_img.png
    ├── 2024-10-14_08-26-54_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder
    │   ├── prompt.txt
    │   └── query_img.png
    ├── 2024-10-14_08-39-29_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder
    │   ├── prompt.txt
    │   └── query_img.png
    ├── 2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder
    │   ├── metadata.json
    │   ├── output_raw.txt
    │   ├── prompt.txt
    │   ├── query_img.png
    │   ├── stage1_subgoal_constraints.txt
    │   ├── stage2_path_constraints.txt
    │   ├── stage2_subgoal_constraints.txt
    │   ├── stage3_path_constraints.txt
    │   └── stage3_subgoal_constraints.txt
    ├── pen
    │   ├── metadata.json
    │   ├── query_img.png
    │   ├── stage1_subgoal_constraints.txt
    │   ├── stage2_path_constraints.txt
    │   ├── stage2_subgoal_constraints.txt
    │   ├── stage3_path_constraints.txt
    │   └── stage3_subgoal_constraints.txt
    └── prompt_template.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | *cache*
2 | **/__pycache__/**
3 | **/*.egg-info/
4 | *.egg
5 |
6 | ./videos
7 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
2 |
3 | #### [[Project Page]](https://rekep-robot.github.io/) [[Paper]](https://rekep-robot.github.io/rekep.pdf) [[Video]](https://youtu.be/2S8YhBdLdww)
4 |
5 | [Wenlong Huang](https://wenlong.page)<sup>1</sup>, [Chen Wang](https://www.chenwangjeremy.net/)<sup>1*</sup>, [Yunzhu Li](https://yunzhuli.github.io/)<sup>2*</sup>, [Ruohan Zhang](https://ai.stanford.edu/~zharu/)<sup>1</sup>, [Li Fei-Fei](https://profiles.stanford.edu/fei-fei-li)<sup>1</sup> (\* indicates equal contributions)
6 |
7 | <sup>1</sup>Stanford University, <sup>2</sup>Columbia University
8 |
9 |
10 |
11 | This is the official demo code for [ReKep](https://rekep-robot.github.io/) implemented in [OmniGibson](https://behavior.stanford.edu/omnigibson/index.html). ReKep is a method that uses large vision models and vision-language models in a hierarchical optimization framework to generate closed-loop trajectories for manipulation tasks.
12 |
13 |
14 | ## Setup Instructions
15 |
16 | Note that this codebase is best run with a display. For running in headless mode, refer to the [instructions in OmniGibson](https://behavior.stanford.edu/omnigibson/getting_started/installation.html).
17 |
18 | - Install [OmniGibson](https://behavior.stanford.edu/omnigibson/getting_started/installation.html). This code is tested on [this commit](https://github.com/StanfordVL/OmniGibson/tree/cc0316a0574018a3cb2956fcbff3be75c07cdf0f).
19 |
20 | NOTE: If you encounter the warning `We did not find Isaac Sim under ~/.local/share/ov/pkg.` when running `./scripts/setup.sh` for OmniGibson, first ensure that you have installed Isaac Sim. If Isaac Sim is installed in the default directory, provide the following path `/home/[USERNAME]/.local/share/ov/pkg/isaac-sim-2023.1.1` (replace `[USERNAME]` with your username).
21 |
22 | - Install ReKep in the same conda environment:
23 | ```Shell
24 | conda activate omnigibson
25 | cd ..
26 | git clone https://github.com/huangwl18/ReKep.git
27 | cd ReKep
28 | pip install -r requirements.txt
29 | ```
30 |
31 | - Obtain an [OpenAI API](https://openai.com/blog/openai-api) key and set it up as an environment variable:
32 | ```Shell
33 | export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
34 | ```
35 |
36 | ## Running Demo
37 |
38 | We provide a demo "pen-in-holder" task that illustrates the core idea in ReKep. Below we provide several options to run the demo.
39 |
40 | Notes:
41 | - An additional `--visualize` flag may be added to visualize every solution from optimization, but since the pipeline repeatedly solves optimization problems, the visualization is blocking and each window must be closed (by pressing "ESC") before the pipeline can continue.
42 | - Because the physics simulator is not fully deterministic, different runs with the same random seed may produce different results. The robot may occasionally fail at the provided task, especially when external disturbances are applied; in this case, we recommend running the demo again.
43 |
44 | ### Demo with Cached Query
45 |
46 | We recommend starting with the cached VLM query.
47 |
48 | ```Shell
49 | python main.py --use_cached_query [--visualize]
50 | ```
51 |
52 | A video will be saved to `./videos` by default.
53 |
54 |
55 |
56 | ### Demo with External Disturbances
57 |
58 | Since ReKep acts as a closed-loop policy, it is robust to disturbances with automatic failure recovery both within stages and across stages. To demonstrate this in simulation, we apply the following disturbances for the "pen-in-holder" task:
59 |
60 | - Move the pen when robot is trying to grasp the pen
61 |
62 | - Take the pen out of the gripper when robot is trying to reorient the pen
63 |
64 | - Move the holder when robot is trying to drop the pen into the holder
65 |
66 |
67 |
68 | Note that since the disturbances are pre-defined, we recommend running with the cached query.
69 |
70 | ```Shell
71 | python main.py --use_cached_query --apply_disturbance [--visualize]
72 | ```
73 | ### Demo with Live Query
74 |
75 | The following script queries the VLM for a new sequence of ReKep constraints and executes them on the robot:
76 |
77 | ```Shell
78 | python main.py [--visualize]
79 | ```
80 |
81 | ## Setup New Environments
82 | Leveraging the diverse objects and scenes provided by [BEHAVIOR-1K](https://behavior.stanford.edu/) in [OmniGibson](https://behavior.stanford.edu/omnigibson/index.html), new tasks and scenes can be easily configured. To change the objects, you may check out the available objects as part of the BEHAVIOR assets on this [page](https://behavior.stanford.edu/knowledgebase/objects/index.html) (click on each object instance to view its visualization). After identifying the objects, we recommend making a copy of the JSON scene file `./configs/og_scene_file_pen.json` and editing the `state` and `objects_info` fields accordingly. Remember that the scene file needs to be supplied to the `Main` class at initialization. Additional [scenes](https://behavior.stanford.edu/knowledgebase/scenes/index.html) and [robots](https://behavior.stanford.edu/omnigibson/getting_started/examples.html#robots) provided by BEHAVIOR-1K may also work, but they are currently untested.
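
For reference, below is a minimal sketch of running a task with a custom scene file directly from Python (the red-pen scene file and the instruction string are only examples; `Main` and `perform_task` are defined in `main.py`):

```python
# minimal sketch, assuming the bundled red-pen scene file and an example instruction
from main import Main

task = Main(scene_file='./configs/og_scene_file_red_pen.json', visualize=False)
task.perform_task(instruction='reorient the red pen and drop it upright into the black pen holder')
```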
83 |
84 | ## Real-World Deployment
85 | To deploy ReKep in the real world, most changes should only be needed inside `environment.py`. Specifically, all of the "exposed functions" need to be changed for the real-world environment. The following components need to be implemented:
86 |
87 | - **Robot Controller**: Our real-world implementation uses the joint impedance controller from [Deoxys](https://github.com/UT-Austin-RPL/deoxys_control) for our Franka setup. Specifically, when `execute_action` in `environment.py` receives a target end-effector pose, we first calculate IK to obtain the target joint positions and send the command to the low-level controller. A minimal sketch of this flow is shown after this list.
88 | - **Keypoint Tracker**: Keypoints need to be tracked in order to perform closed-loop replanning, and this typically needs to be achieved using RGB-D cameras. Our real-world implementation uses similarity matching of [DINOv2](https://github.com/facebookresearch/dinov2) features calculated from multiple RGB-D cameras to track the keypoints (details may be found in the [paper](https://rekep-robot.github.io/rekep.pdf) appendix). Alternatively, we also recommend trying out specialized point trackers, such as [\[1\]](https://github.com/henry123-boy/SpaTracker), [\[2\]](https://github.com/google-deepmind/tapnet), [\[3\]](https://github.com/facebookresearch/co-tracker), and [\[4\]](https://github.com/aharley/pips2).
89 | - **SDF Reconstruction**: In order to avoid collision with irrelevant objects or the table, an SDF voxel grid of the environment needs to be provided to the solvers. Additionally, the SDF should ignore the robot arm and any grasped objects. Our real-world implementation uses [nvblox_torch](https://github.com/NVlabs/nvblox_torch) for ESDF reconstruction, [cuRobo](https://github.com/NVlabs/curobo) for segmenting the robot arm, and [Cutie](https://github.com/hkchengrex/Cutie) for object mask tracking.
90 | - **(Optional) Consistency Cost**: If closed-loop replanning is desired, we find it helpful to include a consistency cost in the solver to encourage the new solution to be close to the previous one (more details can be found in the [paper](https://rekep-robot.github.io/rekep.pdf) appendix).
91 | - **(Optional) Grasp Metric or Grasp Detector**: In this codebase, we include a cost that encourages a top-down grasp pose, in addition to the collision avoidance cost and the ReKep constraint (for identifying the grasp keypoint); these collectively determine the 6-DoF grasp pose. Alternatively, other grasp metrics can be included, such as force closure. Our real-world implementation instead uses the grasp detector [AnyGrasp](https://github.com/graspnet/anygrasp_sdk), which is run as a special routine because it is too slow to be used as an optimizable cost.
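
As a concrete reference for the **Robot Controller** item above, here is a minimal sketch of a real-world `execute_action`, assuming a Franka arm driven by a joint impedance controller; `compute_ik`, `send_joint_command`, and `set_gripper` are placeholders for your own IK routine and low-level controller interface (e.g., Deoxys), not part of this repo:

```python
import numpy as np

def execute_action(action, robot_interface, compute_ik):
    """Minimal sketch: move to an absolute world-frame EE pose, then apply the gripper action."""
    action = np.asarray(action)
    target_ee_pose, gripper_action = action[:7], action[7]  # (x, y, z, qx, qy, qz, qw, gripper)
    target_joint_pos = compute_ik(target_ee_pose)            # IK for the target end-effector pose
    robot_interface.send_joint_command(target_joint_pos)     # placeholder: joint impedance command
    robot_interface.set_gripper(gripper_action)              # placeholder: -1.0 open, 1.0 close, 0.0 no-op
```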
92 |
93 | Since there are several components in the pipeline, running them sequentially in the real world may be too slow. As a result, we recommend running the following compute-intensive components in separate processes in addition to the main process that runs `main.py`: `subgoal_solver`, `path_solver`, `keypoint_tracker`, `sdf_reconstruction`, `mask_tracker`, and `grasp_detector` (if used).
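
The exact process layout depends on your setup; below is a generic sketch of the pattern (not part of this codebase), where `solve` stands in for the actual solver call:

```python
import multiprocessing as mp

def solve(request):
    # placeholder for a real solver call (e.g., a SubgoalSolver or PathSolver invocation)
    return request

def solver_worker(request_queue, result_queue):
    while True:
        request = request_queue.get()
        if request is None:  # sentinel to shut down the worker
            break
        result_queue.put(solve(request))

if __name__ == '__main__':
    requests, results = mp.Queue(), mp.Queue()
    worker = mp.Process(target=solver_worker, args=(requests, results))
    worker.start()
    requests.put({'stage': 1})  # the main process submits work...
    print(results.get())        # ...and consumes results while the simulator keeps stepping
    requests.put(None)
    worker.join()
```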
94 |
95 | ## Known Limitations
96 | - **Prompt Tuning**: Since ReKep relies on VLMs to generate code-based constraints that are solved to produce the robot's behaviors, it is sensitive to the specific VLM used and the prompts given to the VLM. Although visual prompting is used, we typically find that the prompts do not necessarily need to contain image-text examples or code examples; pure-text high-level instructions can go a long way with the latest VLMs, such as `GPT-4o`. As a result, when starting in a new domain, if you observe that the default prompt is failing, we recommend the following steps: 1) pick a few representative tasks in the domain for validation purposes, 2) procedurally update the prompt with high-level text examples and instructions, and 3) test the prompt by checking the text output, returning to step 2 if needed.
97 |
98 | - **Performance Tuning**: For clarity, the entire pipeline is run sequentially, so the latency introduced by the simulator and the solvers compounds. If this is a concern, we recommend running compute-intensive components, such as the simulator, the `subgoal_solver`, and the `path_solver`, in separate processes, but concurrency needs to be handled with care. More discussion can be found in the "Real-World Deployment" section. When tuning the solvers, note that the `objective` function typically takes the majority of the time, and among the different costs, the reachability cost computed by the IK solver is typically the most expensive. Depending on the task, you may reduce `sampling_maxfun` and `maxiter` in `configs/config.yaml` (an example excerpt is shown after this list) or disable the reachability cost.
99 |
100 | - **Task-Space Planning**: Since the current pipeline performs planning in the task space (i.e., solving for end-effector poses) instead of the joint space, it may occasionally produce actions that are kinematically challenging for the robot to achieve, especially for tasks that require full 6-DoF motions.
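
As referenced in the **Performance Tuning** item above, the relevant entries in `configs/config.yaml` look as follows (the reduced values are only an example, not a recommendation):

```yaml
path_solver:
  sampling_maxfun: 2000   # default: 5000
  minimizer_options:
    maxiter: 100          # default: 200

subgoal_solver:
  sampling_maxfun: 2000   # default: 5000
  minimizer_options:
    maxiter: 100          # default: 200
```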
101 |
102 | ## Troubleshooting
103 |
104 | For issues related to OmniGibson, please raise an issue [here](https://github.com/StanfordVL/OmniGibson/issues). You are also welcome to join the [Discord](https://discord.com/invite/bccR5vGFEx) channel for timely support.
105 |
106 | For other issues related to the code in this repo, feel free to raise an issue here, and we will try to address it as soon as we are able.
--------------------------------------------------------------------------------
/configs/config.yaml:
--------------------------------------------------------------------------------
1 | device: &device 'cuda'
2 | seed: &seed 0
3 |
4 | main:
5 | interpolate_pos_step_size: &interpolate_pos_step_size 0.05 # controls the density of the final returned path
6 | interpolate_rot_step_size: &interpolate_rot_step_size 0.34 # about 20 degrees
7 | grasp_depth: 0.10
8 | constraint_tolerance: 0.10 # for backtracking
9 | bounds_min: &bounds_min [-0.45, -0.75, 0.698]
10 | bounds_max: &bounds_max [0.10, 0.60, 1.2]
11 | sdf_voxel_size: 0.01
12 | vlm_camera: 0
13 | action_steps_per_iter: 5
14 | seed: *seed
15 |
16 | env:
17 | video_cache_size: 2000
18 | og_sim:
19 | physics_frequency: 60
20 | action_frequency: 15
21 |
22 | scene:
23 | name: Rs_int
24 | type: InteractiveTraversableScene
25 | scene_model: Rs_int
26 |
27 | bounds_min: *bounds_min
28 | bounds_max: *bounds_max
29 | interpolate_pos_step_size: *interpolate_pos_step_size
30 | interpolate_rot_step_size: *interpolate_rot_step_size
31 |
32 | robot:
33 | robot_config:
34 | name: Fetch
35 | type: Fetch
36 | obs_modalities: [rgb, depth]
37 | action_modalities: continuous
38 | action_normalize: False
39 | position: [-0.8, 0.0, 0.]
40 | grasping_mode: assisted
41 |
42 | controller_config:
43 | base:
44 | name: DifferentialDriveController
45 | arm_0:
46 | name: OperationalSpaceController
47 | kp: 250
48 | kp_limits: [50, 400]
49 | damping_ratio: 0.6
50 | gripper_0:
51 | name: MultiFingerGripperController
52 | command_input_limits: [0.0, 1.0]
53 | mode: smooth
54 | camera:
55 | name: JointController
56 |
57 | camera:
58 | # recorder
59 | 1:
60 | name: cam_1
61 | position: [ 0.6137, 0.4764, 1.4565]
62 | orientation: [ 0.3212, 0.4682, 0.6788, 0.4656]
63 | resolution: 480
64 |
65 | # vlm camera
66 | 0:
67 | name: cam_0
68 | position: [-0.1655, 0.0167, 1.3664]
69 | orientation: [ 0.0550, 0.0544, 0.7010, 0.7090]
70 | resolution: 480
71 |
72 | path_solver:
73 | opt_pos_step_size: 0.20 # controls the density of control points in the path
74 | opt_rot_step_size: 0.78 # controls the density of control points in the path
75 | opt_interpolate_pos_step_size: 0.02 # controls the density of collision checking inside optimization
76 | opt_interpolate_rot_step_size: 0.10
77 | max_collision_points: 60
78 | sampling_maxfun: 5000
79 | bounds_min: *bounds_min
80 | bounds_max: *bounds_max
81 | constraint_tolerance: 0.0001
82 | minimizer_options:
83 | maxiter: 200
84 |
85 | subgoal_solver:
86 | bounds_min: *bounds_min
87 | bounds_max: *bounds_max
88 | sampling_maxfun: 5000
89 | max_collision_points: 60
90 | constraint_tolerance: 0.0001
91 | minimizer_options:
92 | maxiter: 200
93 |
94 | keypoint_proposer:
95 | num_candidates_per_mask: 5
96 | min_dist_bt_keypoints: 0.06
97 | max_mask_ratio: 0.5
98 | device: *device
99 | bounds_min: *bounds_min
100 | bounds_max: *bounds_max
101 | seed: *seed
102 |
103 | constraint_generator:
104 | model: gpt-4o
105 | temperature: 0.0
106 | max_tokens: 2048
107 |
108 | visualizer:
109 | bounds_min: *bounds_min
110 | bounds_max: *bounds_max
--------------------------------------------------------------------------------
/constraint_generation.py:
--------------------------------------------------------------------------------
1 | import base64
2 | from openai import OpenAI
3 | import os
4 | import cv2
5 | import json
6 | import parse
7 | import numpy as np
8 | import time
9 | from datetime import datetime
10 |
11 | # Function to encode the image
12 | def encode_image(image_path):
13 | with open(image_path, "rb") as image_file:
14 | return base64.b64encode(image_file.read()).decode('utf-8')
15 |
16 | class ConstraintGenerator:
17 | def __init__(self, config):
18 | self.config = config
19 | self.client = OpenAI(api_key=os.environ['OPENAI_API_KEY'], base_url="https://api.chatanywhere.tech/v1")
20 | self.base_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), './vlm_query')
21 | with open(os.path.join(self.base_dir, 'prompt_template.txt'), 'r') as f:
22 | self.prompt_template = f.read()
23 |
24 | def _build_prompt(self, image_path, instruction):
25 | img_base64 = encode_image(image_path)
26 | prompt_text = self.prompt_template.format(instruction=instruction)
27 | # save prompt
28 | with open(os.path.join(self.task_dir, 'prompt.txt'), 'w') as f:
29 | f.write(prompt_text)
30 | messages = [
31 | {
32 | "role": "user",
33 | "content": [
34 | {
35 | "type": "text",
36 | "text": prompt_text
37 | },
38 | {
39 | "type": "image_url",
40 | "image_url": {
41 | "url": f"data:image/png;base64,{img_base64}"
42 | }
43 | },
44 | ]
45 | }
46 | ]
47 | return messages
48 |
49 | def _parse_and_save_constraints(self, output, save_dir):
50 | # parse into function blocks
51 | lines = output.split("\n")
52 | functions = dict()
53 | for i, line in enumerate(lines):
54 | if line.startswith("def "):
55 | start = i
56 | name = line.split("(")[0].split("def ")[1]
57 | if line.startswith(" return "):
58 | end = i
59 | functions[name] = lines[start:end+1]
60 | # organize them based on hierarchy in function names
61 | groupings = dict()
62 | for name in functions:
63 | parts = name.split("_")[:-1] # last one is the constraint idx
64 | key = "_".join(parts)
65 | if key not in groupings:
66 | groupings[key] = []
67 | groupings[key].append(name)
68 | # save them into files
69 | for key in groupings:
70 | with open(os.path.join(save_dir, f"{key}_constraints.txt"), "w") as f:
71 | for name in groupings[key]:
72 | f.write("\n".join(functions[name]) + "\n\n")
73 | print(f"Constraints saved to {save_dir}")
74 |
75 | def _parse_other_metadata(self, output):
76 | data_dict = dict()
77 | # find num_stages
78 | num_stages_template = "num_stages = {num_stages}"
79 | for line in output.split("\n"):
80 | num_stages = parse.parse(num_stages_template, line)
81 | if num_stages is not None:
82 | break
83 | if num_stages is None:
84 | raise ValueError("num_stages not found in output")
85 | data_dict['num_stages'] = int(num_stages['num_stages'])
86 | # find grasp_keypoints
87 | grasp_keypoints_template = "grasp_keypoints = {grasp_keypoints}"
88 | for line in output.split("\n"):
89 | grasp_keypoints = parse.parse(grasp_keypoints_template, line)
90 | if grasp_keypoints is not None:
91 | break
92 | if grasp_keypoints is None:
93 | raise ValueError("grasp_keypoints not found in output")
94 | # convert into list of ints
95 | grasp_keypoints = grasp_keypoints['grasp_keypoints'].replace("[", "").replace("]", "").split(",")
96 | grasp_keypoints = [int(x.strip()) for x in grasp_keypoints]
97 | data_dict['grasp_keypoints'] = grasp_keypoints
98 | # find release_keypoints
99 | release_keypoints_template = "release_keypoints = {release_keypoints}"
100 | for line in output.split("\n"):
101 | release_keypoints = parse.parse(release_keypoints_template, line)
102 | if release_keypoints is not None:
103 | break
104 | if release_keypoints is None:
105 | raise ValueError("release_keypoints not found in output")
106 | # convert into list of ints
107 | release_keypoints = release_keypoints['release_keypoints'].replace("[", "").replace("]", "").split(",")
108 | release_keypoints = [int(x.strip()) for x in release_keypoints]
109 | data_dict['release_keypoints'] = release_keypoints
110 | return data_dict
111 |
112 | def _save_metadata(self, metadata):
113 | for k, v in metadata.items():
114 | if isinstance(v, np.ndarray):
115 | metadata[k] = v.tolist()
116 | with open(os.path.join(self.task_dir, 'metadata.json'), 'w') as f:
117 | json.dump(metadata, f)
118 | print(f"Metadata saved to {os.path.join(self.task_dir, 'metadata.json')}")
119 |
120 | def generate(self, img, instruction, metadata):
121 | """
122 | Args:
123 | img (np.ndarray): image of the scene (H, W, 3) uint8
124 | instruction (str): instruction for the query
125 | Returns:
126 | save_dir (str): directory where the constraints and metadata are saved
127 | """
128 | # create a directory for the task
129 | fname = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + "_" + instruction.lower().replace(" ", "_")
130 | self.task_dir = os.path.join(self.base_dir, fname)
131 | os.makedirs(self.task_dir, exist_ok=True)
132 | # save query image
133 | image_path = os.path.join(self.task_dir, 'query_img.png')
134 | cv2.imwrite(image_path, img[..., ::-1])
135 | # build prompt
136 | messages = self._build_prompt(image_path, instruction)
137 | # stream back the response
138 | stream = self.client.chat.completions.create(model=self.config['model'],
139 | messages=messages,
140 | temperature=self.config['temperature'],
141 | max_tokens=self.config['max_tokens'],
142 | stream=True)
143 | output = ""
144 | start = time.time()
145 | for chunk in stream:
146 | print(f'[{time.time()-start:.2f}s] Querying OpenAI API...', end='\r')
147 | if chunk.choices[0].delta.content is not None:
148 | output += chunk.choices[0].delta.content
149 | print(f'[{time.time()-start:.2f}s] Querying OpenAI API...Done')
150 | # save raw output
151 | with open(os.path.join(self.task_dir, 'output_raw.txt'), 'w') as f:
152 | f.write(output)
153 | # parse and save constraints
154 | self._parse_and_save_constraints(output, self.task_dir)
155 | # save metadata
156 | metadata.update(self._parse_other_metadata(output))
157 | self._save_metadata(metadata)
158 | return self.task_dir
159 |
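# ------------------------------------------------------------------------------
# Usage sketch (illustrative only; not called anywhere in this repo). The config
# values mirror the `constraint_generator` section of configs/config.yaml, and the
# instruction string is just an example. OPENAI_API_KEY must be set in the
# environment for the client to be constructed.
def _example_generate(img, instruction="reorient the red pen and drop it upright into the black pen holder"):
    config = {'model': 'gpt-4o', 'temperature': 0.0, 'max_tokens': 2048}
    generator = ConstraintGenerator(config)
    # img is an RGB image of the scene as a (H, W, 3) uint8 array
    task_dir = generator.generate(img, instruction, metadata={})
    return task_dir  # directory containing the prompt, raw output, parsed constraints, and metadata
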
--------------------------------------------------------------------------------
/dinov2_vits14_pretrain.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/dinov2_vits14_pretrain.pth
--------------------------------------------------------------------------------
/environment.py:
--------------------------------------------------------------------------------
1 | import time
2 | import numpy as np
3 | import os
4 | import datetime
5 | import transform_utils as T
6 | import trimesh
7 | import open3d as o3d
8 | import imageio
9 | import omnigibson as og
10 | from omnigibson.macros import gm
11 | from omnigibson.utils.usd_utils import PoseAPI, mesh_prim_mesh_to_trimesh_mesh, mesh_prim_shape_to_trimesh_mesh
12 | from omnigibson.robots.fetch import Fetch
13 | from omnigibson.controllers import IsGraspingState
14 | from og_utils import OGCamera
15 | from utils import (
16 | bcolors,
17 | get_clock_time,
18 | angle_between_rotmat,
19 | angle_between_quats,
20 | get_linear_interpolation_steps,
21 | linear_interpolate_poses,
22 | )
23 | from omnigibson.robots.manipulation_robot import ManipulationRobot
24 | from omnigibson.controllers.controller_base import ControlType, BaseController
25 | import torch
26 |
27 | # Use GPU dynamics; flatcache is disabled
28 | gm.USE_GPU_DYNAMICS = True
29 | gm.ENABLE_FLATCACHE = False
30 |
31 | # some customization to the OG functions
32 | def custom_clip_control(self, control):
33 | """
34 | Clips the inputted @control signal based on @control_limits.
35 |
36 | Args:
37 | control (Array[float]): control signal to clip
38 |
39 | Returns:
40 | Array[float]: Clipped control signal
41 | """
42 | clipped_control = control.clip(
43 | self._control_limits[self.control_type][0][self.dof_idx],
44 | self._control_limits[self.control_type][1][self.dof_idx],
45 | )
46 | idx = (
47 | self._dof_has_limits[self.dof_idx]
48 | if self.control_type == ControlType.POSITION
49 | else [True] * self.control_dim
50 | )
51 | if len(control) > 1:
52 | control[idx] = clipped_control[idx]
53 | return control
54 |
55 | Fetch._initialize = ManipulationRobot._initialize
56 | BaseController.clip_control = custom_clip_control
57 |
58 | class ReKepOGEnv:
59 | def __init__(self, config, scene_file, verbose=False):
60 | self.video_cache = []
61 | self.config = config
62 | self.verbose = verbose
63 | self.config['scene']['scene_file'] = scene_file
64 | self.bounds_min = np.array(self.config['bounds_min'])
65 | self.bounds_max = np.array(self.config['bounds_max'])
66 | self.interpolate_pos_step_size = self.config['interpolate_pos_step_size']
67 | self.interpolate_rot_step_size = self.config['interpolate_rot_step_size']
68 | # create omnigibson environment
69 | self.step_counter = 0
70 | self.og_env = og.Environment(dict(scene=self.config['scene'], robots=[self.config['robot']['robot_config']], env=self.config['og_sim']))
71 | self.og_env.scene.update_initial_state()
72 | for _ in range(10): og.sim.step()
73 | # robot vars
74 | self.robot = self.og_env.robots[0]
75 | dof_idx = np.concatenate([self.robot.trunk_control_idx,
76 | self.robot.arm_control_idx[self.robot.default_arm]])
77 | self.reset_joint_pos = self.robot.reset_joint_pos[dof_idx]
78 | self.world2robot_homo = T.pose_inv(T.pose2mat(self.robot.get_position_orientation()))
79 | # initialize cameras
80 | self._initialize_cameras(self.config['camera'])
81 | self.last_og_gripper_action = 1.0
82 |
83 | # ======================================
84 | # = exposed functions
85 | # ======================================
86 | def get_sdf_voxels(self, resolution, exclude_robot=True, exclude_obj_in_hand=True):
87 | """
88 | open3d-based SDF computation
89 | 1. recursively get all usd prim and get their vertices and faces
90 | 2. compute SDF using open3d
91 | """
92 | start = time.time()
93 | exclude_names = ['wall', 'floor', 'ceiling']
94 | if exclude_robot:
95 | exclude_names += ['fetch', 'robot']
96 | if exclude_obj_in_hand:
97 | assert self.config['robot']['robot_config']['grasping_mode'] in ['assisted', 'sticky'], "Currently only supported for assisted or sticky grasping"
98 | in_hand_obj = self.robot._ag_obj_in_hand[self.robot.default_arm]
99 | if in_hand_obj is not None:
100 | exclude_names.append(in_hand_obj.name.lower())
101 | trimesh_objects = []
102 | for obj in self.og_env.scene.objects:
103 | if any([name in obj.name.lower() for name in exclude_names]):
104 | continue
105 | for link in obj.links.values():
106 | for mesh in link.collision_meshes.values():
107 | mesh_type = mesh.prim.GetPrimTypeInfo().GetTypeName()
108 | if mesh_type == 'Mesh':
109 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(mesh.prim)
110 | else:
111 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(mesh.prim)
112 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh.prim_path)
113 | trimesh_object.apply_transform(world_pose_w_scale)
114 | trimesh_objects.append(trimesh_object)
115 | # chain trimesh objects
116 | scene_mesh = trimesh.util.concatenate(trimesh_objects)
117 | # Create a scene and add the triangle mesh
118 | scene = o3d.t.geometry.RaycastingScene()
119 | vertex_positions = scene_mesh.vertices
120 | triangle_indices = scene_mesh.faces
121 | vertex_positions = o3d.core.Tensor(vertex_positions, dtype=o3d.core.Dtype.Float32)
122 | triangle_indices = o3d.core.Tensor(triangle_indices, dtype=o3d.core.Dtype.UInt32)
123 | _ = scene.add_triangles(vertex_positions, triangle_indices) # we do not need the geometry ID for mesh
124 | # create a grid
125 | shape = np.ceil((self.bounds_max - self.bounds_min) / resolution).astype(int)
126 | steps = (self.bounds_max - self.bounds_min) / shape
127 | grid = np.mgrid[self.bounds_min[0]:self.bounds_max[0]:steps[0],
128 | self.bounds_min[1]:self.bounds_max[1]:steps[1],
129 | self.bounds_min[2]:self.bounds_max[2]:steps[2]]
130 | grid = grid.reshape(3, -1).T
131 | # compute SDF
132 | sdf_voxels = scene.compute_signed_distance(grid.astype(np.float32))
133 | # convert back to np array
134 | sdf_voxels = sdf_voxels.cpu().numpy()
135 | # open3d has flipped sign from our convention
136 | sdf_voxels = -sdf_voxels
137 | sdf_voxels = sdf_voxels.reshape(shape)
138 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] SDF voxels computed in {time.time() - start:.4f} seconds{bcolors.ENDC}')
139 | return sdf_voxels
140 |
141 | def get_cam_obs(self):
142 | self.last_cam_obs = dict()
143 | for cam_id in self.cams:
144 | self.last_cam_obs[cam_id] = self.cams[cam_id].get_obs() # each containing rgb, depth, points, seg
145 | return self.last_cam_obs
146 |
147 | def register_keypoints(self, keypoints):
148 | """
149 | Args:
150 | keypoints (np.ndarray): keypoints in the world frame of shape (N, 3)
151 | Returns:
152 | None
153 | Given a set of keypoints in the world frame, this function registers them so that their newest positions can be accessed later.
154 | """
155 | if not isinstance(keypoints, np.ndarray):
156 | keypoints = np.array(keypoints)
157 | self.keypoints = keypoints
158 | self._keypoint_registry = dict()
159 | self._keypoint2object = dict()
160 | exclude_names = ['wall', 'floor', 'ceiling', 'table', 'fetch', 'robot']
161 | for idx, keypoint in enumerate(keypoints):
162 | closest_distance = np.inf
163 | for obj in self.og_env.scene.objects:
164 | if any([name in obj.name.lower() for name in exclude_names]):
165 | continue
166 | for link in obj.links.values():
167 | for mesh in link.visual_meshes.values():
168 | mesh_prim_path = mesh.prim_path
169 | mesh_type = mesh.prim.GetPrimTypeInfo().GetTypeName()
170 | if mesh_type == 'Mesh':
171 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(mesh.prim)
172 | else:
173 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(mesh.prim)
174 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh.prim_path)
175 | trimesh_object.apply_transform(world_pose_w_scale)
176 | points_transformed = trimesh_object.sample(1000)
177 |
178 | # find closest point
179 | dists = np.linalg.norm(points_transformed - keypoint, axis=1)
180 | point = points_transformed[np.argmin(dists)]
181 | distance = np.linalg.norm(point - keypoint)
182 | if distance < closest_distance:
183 | closest_distance = distance
184 | closest_prim_path = mesh_prim_path
185 | closest_point = point
186 | closest_obj = obj
187 | self._keypoint_registry[idx] = (closest_prim_path, PoseAPI.get_world_pose(closest_prim_path))
188 | self._keypoint2object[idx] = closest_obj
189 | # overwrite the keypoint with the closest point
190 | self.keypoints[idx] = closest_point
191 |
192 | def get_keypoint_positions(self):
193 | """
194 | Args:
195 | None
196 | Returns:
197 | np.ndarray: keypoints in the world frame of shape (N, 3)
198 | Given the registered keypoints, this function returns their current positions in the world frame.
199 | """
200 | assert hasattr(self, '_keypoint_registry') and self._keypoint_registry is not None, "Keypoints have not been registered yet."
201 | keypoint_positions = []
202 | for idx, (prim_path, init_pose) in self._keypoint_registry.items():
203 | init_pose = T.pose2mat(init_pose)
204 | centering_transform = T.pose_inv(init_pose)
205 | keypoint_centered = np.dot(centering_transform, np.append(self.keypoints[idx], 1))[:3]
206 | curr_pose = T.pose2mat(PoseAPI.get_world_pose(prim_path))
207 | keypoint = np.dot(curr_pose, np.append(keypoint_centered, 1))[:3]
208 | keypoint_positions.append(keypoint)
209 | return np.array(keypoint_positions)
210 |
211 | def get_object_by_keypoint(self, keypoint_idx):
212 | """
213 | Args:
214 | keypoint_idx (int): the index of the keypoint
215 | Returns:
216 | pointer: the object that the keypoint is associated with
217 | Given the keypoint index, this function returns the name of the object that the keypoint is associated with.
218 | """
219 | assert hasattr(self, '_keypoint2object') and self._keypoint2object is not None, "Keypoints have not been registered yet."
220 | return self._keypoint2object[keypoint_idx]
221 |
222 | def get_collision_points(self, noise=True):
223 | """
224 | Get the points of the gripper and any object in hand.
225 | """
226 | # add gripper collision points
227 | collision_points = []
228 | for obj in self.og_env.scene.objects:
229 | if 'fetch' in obj.name.lower():
230 | for name, link in obj.links.items():
231 | if 'gripper' in name.lower() or 'wrist' in name.lower(): # wrist_roll and wrist_flex
232 | for collision_mesh in link.collision_meshes.values():
233 | mesh_prim_path = collision_mesh.prim_path
234 | mesh_type = collision_mesh.prim.GetPrimTypeInfo().GetTypeName()
235 | if mesh_type == 'Mesh':
236 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(collision_mesh.prim)
237 | else:
238 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(collision_mesh.prim)
239 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh_prim_path)
240 | trimesh_object.apply_transform(world_pose_w_scale)
241 | points_transformed = trimesh_object.sample(1000)
242 | # add to collision points
243 | collision_points.append(points_transformed)
244 | # add object in hand collision points
245 | in_hand_obj = self.robot._ag_obj_in_hand[self.robot.default_arm]
246 | if in_hand_obj is not None:
247 | for link in in_hand_obj.links.values():
248 | for collision_mesh in link.collision_meshes.values():
249 | mesh_type = collision_mesh.prim.GetPrimTypeInfo().GetTypeName()
250 | if mesh_type == 'Mesh':
251 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(collision_mesh.prim)
252 | else:
253 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(collision_mesh.prim)
254 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(collision_mesh.prim_path)
255 | trimesh_object.apply_transform(world_pose_w_scale)
256 | points_transformed = trimesh_object.sample(1000)
257 | # add to collision points
258 | collision_points.append(points_transformed)
259 | collision_points = np.concatenate(collision_points, axis=0)
260 | return collision_points
261 |
262 | def reset(self):
263 | self.og_env.reset()
264 | self.robot.reset()
265 | for _ in range(5): self._step()
266 | self.open_gripper()
267 | # moving arm to the side to unblock view
268 | ee_pose = self.get_ee_pose()
269 | ee_pose[:3] += np.array([0.0, -0.2, -0.1])
270 | action = np.concatenate([ee_pose, [self.get_gripper_null_action()]])
271 | self.execute_action(action, precise=True)
272 | self.video_cache = []
273 | print(f'{bcolors.HEADER}Reset done.{bcolors.ENDC}')
274 |
275 | def is_grasping(self, candidate_obj=None):
276 | return self.robot.is_grasping(candidate_obj=candidate_obj) == IsGraspingState.TRUE
277 |
278 | def get_ee_pose(self):
279 | ee_pos, ee_xyzw = (self.robot.get_eef_position(), self.robot.get_eef_orientation())
280 | ee_pose = np.concatenate([ee_pos, ee_xyzw]) # [7]
281 | return ee_pose
282 |
283 | def get_ee_pos(self):
284 | return self.get_ee_pose()[:3]
285 |
286 | def get_ee_quat(self):
287 | return self.get_ee_pose()[3:]
288 |
289 | def get_arm_joint_postions(self):
290 | assert isinstance(self.robot, Fetch), "The IK solver assumes the robot is a Fetch robot"
291 | arm = self.robot.default_arm
292 | dof_idx = np.concatenate([self.robot.trunk_control_idx, self.robot.arm_control_idx[arm]])
293 | arm_joint_pos = self.robot.get_joint_positions()[dof_idx]
294 | return arm_joint_pos
295 |
296 | def close_gripper(self):
297 | """
298 | Exposed interface: 1.0 for closed, -1.0 for open, 0.0 for no change
299 | Internal OG interface: 1.0 for open, 0.0 for closed
300 | """
301 | if self.last_og_gripper_action == 0.0:
302 | return
303 | action = np.zeros(12)
304 | action[10:] = [0, 0] # gripper: float. 0. for closed, 1. for open.
305 | for _ in range(30):
306 | self._step(action)
307 | self.last_og_gripper_action = 0.0
308 |
309 | def open_gripper(self):
310 | if self.last_og_gripper_action == 1.0:
311 | return
312 | action = np.zeros(12)
313 | action[10:] = [1, 1] # gripper: float. 0. for closed, 1. for open.
314 | for _ in range(30):
315 | self._step(action)
316 | self.last_og_gripper_action = 1.0
317 |
318 | def get_last_og_gripper_action(self):
319 | return self.last_og_gripper_action
320 |
321 | def get_gripper_open_action(self):
322 | return -1.0
323 |
324 | def get_gripper_close_action(self):
325 | return 1.0
326 |
327 | def get_gripper_null_action(self):
328 | return 0.0
329 |
330 | def compute_target_delta_ee(self, target_pose):
331 | target_pos, target_xyzw = target_pose[:3], target_pose[3:]
332 | ee_pose = self.get_ee_pose()
333 | ee_pos, ee_xyzw = ee_pose[:3], ee_pose[3:]
334 | pos_diff = np.linalg.norm(ee_pos - target_pos)
335 | rot_diff = angle_between_quats(ee_xyzw, target_xyzw)
336 | return pos_diff, rot_diff
337 |
338 | def execute_action(
339 | self,
340 | action,
341 | precise=True,
342 | ):
343 | """
344 | Moves the robot gripper to a target pose by specifying the absolute pose in the world frame and executes gripper action.
345 |
346 | Args:
347 | action (x, y, z, qx, qy, qz, qw, gripper_action): absolute target pose in the world frame + gripper action.
348 | precise (bool): whether to use small position and rotation thresholds for precise movement (robot would move slower).
349 | Returns:
350 | tuple: A tuple containing the position and rotation errors after reaching the target pose.
351 | """
352 | if precise:
353 | pos_threshold = 0.03
354 | rot_threshold = 3.0
355 | else:
356 | pos_threshold = 0.10
357 | rot_threshold = 5.0
358 | action = np.array(action).copy()
359 | assert action.shape == (8,)
360 | target_pose = action[:7]
361 | gripper_action = action[7]
362 |
363 | # ======================================
364 | # = status and safety check
365 | # ======================================
366 | if np.any(target_pose[:3] < self.bounds_min) \
367 | or np.any(target_pose[:3] > self.bounds_max):
368 | print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Target position is out of bounds, clipping to workspace bounds{bcolors.ENDC}')
369 | target_pose[:3] = np.clip(target_pose[:3], self.bounds_min, self.bounds_max)
370 |
371 | # ======================================
372 | # = interpolation
373 | # ======================================
374 | current_pose = self.get_ee_pose()
375 | pos_diff = np.linalg.norm(current_pose[:3] - target_pose[:3])
376 | rot_diff = angle_between_quats(current_pose[3:7], target_pose[3:7])
377 | pos_is_close = pos_diff < self.interpolate_pos_step_size
378 | rot_is_close = rot_diff < self.interpolate_rot_step_size
379 | if pos_is_close and rot_is_close:
380 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Skipping interpolation{bcolors.ENDC}')
381 | pose_seq = np.array([target_pose])
382 | else:
383 | num_steps = get_linear_interpolation_steps(current_pose, target_pose, self.interpolate_pos_step_size, self.interpolate_rot_step_size)
384 | pose_seq = linear_interpolate_poses(current_pose, target_pose, num_steps)
385 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Interpolating for {num_steps} steps{bcolors.ENDC}')
386 |
387 | # ======================================
388 | # = move to target pose
389 | # ======================================
390 | # move faster for intermediate poses
391 | intermediate_pos_threshold = 0.10
392 | intermediate_rot_threshold = 5.0
393 | for pose in pose_seq[:-1]:
394 | self._move_to_waypoint(pose, intermediate_pos_threshold, intermediate_rot_threshold)
395 | # move to the final pose with required precision
396 | pose = pose_seq[-1]
397 | self._move_to_waypoint(pose, pos_threshold, rot_threshold, max_steps=20 if not precise else 40)
398 | # compute error
399 | pos_error, rot_error = self.compute_target_delta_ee(target_pose)
400 | self.verbose and print(f'\n{bcolors.BOLD}[environment.py | {get_clock_time()}] Move to pose completed (pos_error: {pos_error}, rot_error: {np.rad2deg(rot_error)}){bcolors.ENDC}\n')
401 |
402 | # ======================================
403 | # = apply gripper action
404 | # ======================================
405 | if gripper_action == self.get_gripper_open_action():
406 | self.open_gripper()
407 | elif gripper_action == self.get_gripper_close_action():
408 | self.close_gripper()
409 | elif gripper_action == self.get_gripper_null_action():
410 | pass
411 | else:
412 | raise ValueError(f"Invalid gripper action: {gripper_action}")
413 |
414 | return pos_error, rot_error
415 |
416 | def sleep(self, seconds):
417 | start = time.time()
418 | while time.time() - start < seconds:
419 | self._step()
420 |
421 | def save_video(self, save_path=None):
422 | save_dir = os.path.join(os.path.dirname(__file__), 'videos')
423 | os.makedirs(save_dir, exist_ok=True)
424 | if save_path is None:
425 | save_path = os.path.join(save_dir, f'{datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")}.mp4')
426 | video_writer = imageio.get_writer(save_path, fps=30)
427 | for rgb in self.video_cache:
428 | # print(type(rgb), rgb.shape)
429 | if not isinstance(rgb, np.ndarray):
430 | rgb = np.array(rgb)
431 | video_writer.append_data(rgb)
432 | video_writer.close()
433 | return save_path
434 |
435 | # ======================================
436 | # = internal functions
437 | # ======================================
438 | def _check_reached_ee(self, target_pos, target_xyzw, pos_threshold, rot_threshold):
439 | """
440 | this checks the true end-effector pose against the target in the world frame
441 | """
442 | current_pos = self.robot.get_eef_position()
443 | current_xyzw = self.robot.get_eef_orientation()
444 | current_rotmat = T.quat2mat(current_xyzw)
445 | target_rotmat = T.quat2mat(target_xyzw)
446 | # calculate position delta
447 | if torch.is_tensor(current_pos):
448 | current_pos = current_pos.detach().cpu().numpy()
449 | pos_diff = (target_pos - current_pos).flatten()
450 | pos_error = np.linalg.norm(pos_diff)
451 | # calculate rotation delta
452 | rot_error = angle_between_rotmat(current_rotmat, target_rotmat)
453 | # print status
454 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Curr pose: {current_pos}, {current_xyzw} (pos_error: {pos_error.round(4)}, rot_error: {np.rad2deg(rot_error).round(4)}){bcolors.ENDC}')
455 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Goal pose: {target_pos}, {target_xyzw} (pos_thres: {pos_threshold}, rot_thres: {rot_threshold}){bcolors.ENDC}')
456 | if pos_error < pos_threshold and rot_error < np.deg2rad(rot_threshold):
457 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] OSC pose reached (pos_error: {pos_error.round(4)}, rot_error: {np.rad2deg(rot_error).round(4)}){bcolors.ENDC}')
458 | return True, pos_error, rot_error
459 | return False, pos_error, rot_error
460 |
461 | def _move_to_waypoint(self, target_pose_world, pos_threshold=0.02, rot_threshold=3.0, max_steps=10):
462 | pos_errors = []
463 | rot_errors = []
464 | count = 0
465 | while count < max_steps:
466 | reached, pos_error, rot_error = self._check_reached_ee(target_pose_world[:3], target_pose_world[3:7], pos_threshold, rot_threshold)
467 | pos_errors.append(pos_error)
468 | rot_errors.append(rot_error)
469 | if reached:
470 | break
471 | # convert world pose to robot pose
472 | target_pose_robot = np.dot(self.world2robot_homo, T.convert_pose_quat2mat(target_pose_world))
473 | # convert to relative pose to be used with the underlying controller
474 | self.relative_eef_position = self.robot.get_relative_eef_position()
475 | if torch.is_tensor(self.relative_eef_position):
476 | self.relative_eef_position = self.relative_eef_position.detach().cpu().numpy()
477 | relative_position = target_pose_robot[:3, 3] - self.relative_eef_position
478 | relative_quat = T.quat_distance(T.mat2quat(target_pose_robot[:3, :3]), self.robot.get_relative_eef_orientation())
479 | assert isinstance(self.robot, Fetch), "this action space is only for fetch"
480 | action = np.zeros(12) # first 3 are base, which we don't use
481 | action[4:7] = relative_position
482 | action[7:10] = T.quat2axisangle(relative_quat)
483 | action[10:] = [self.last_og_gripper_action, self.last_og_gripper_action]
484 | # step the action
485 | _ = self._step(action=action)
486 | count += 1
487 | if count == max_steps:
488 | print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] OSC pose not reached after {max_steps} steps (pos_error: {pos_errors[-1].round(4)}, rot_error: {np.rad2deg(rot_errors[-1]).round(4)}){bcolors.ENDC}')
489 |
490 | def _step(self, action=None):
491 | if hasattr(self, 'disturbance_seq') and self.disturbance_seq is not None:
492 | next(self.disturbance_seq)
493 | if action is not None:
494 | self.og_env.step(action)
495 | else:
496 | og.sim.step()
497 | cam_obs = self.get_cam_obs()
498 | rgb = cam_obs[1]['rgb']
499 | if len(self.video_cache) < self.config['video_cache_size']:
500 | self.video_cache.append(rgb)
501 | else:
502 | self.video_cache.pop(0)
503 | self.video_cache.append(rgb)
504 | self.step_counter += 1
505 |
506 | def _initialize_cameras(self, cam_config):
507 | """
508 | :param cam_config: dict mapping each camera id to its config (name, position, orientation, resolution)
509 | """
510 | self.cams = dict()
511 | for cam_id in cam_config:
512 | cam_id = int(cam_id)
513 | self.cams[cam_id] = OGCamera(self.og_env, cam_config[cam_id])
514 | for _ in range(10): og.sim.render()
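
# ==============================================================================
# Usage sketch (illustrative only; not called anywhere in this repo): lift the
# gripper by 10 cm and close it, using only the exposed functions above. The
# 0.10 m offset is an arbitrary example; execute_action clips the target to the
# workspace bounds if necessary.
def _example_execute_action(env):
    ee_pose = np.asarray(env.get_ee_pose())          # [x, y, z, qx, qy, qz, qw]
    target_pose = ee_pose.copy()
    target_pose[2] += 0.10
    action = np.concatenate([target_pose, [env.get_gripper_close_action()]])
    return env.execute_action(action, precise=True)  # -> (pos_error, rot_error)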
--------------------------------------------------------------------------------
/ik_solver.py:
--------------------------------------------------------------------------------
1 | """
2 | Adapted from OmniGibson and the Lula IK solver
3 | """
4 | import omnigibson.lazy as lazy
5 | import numpy as np
6 |
7 | class IKSolver:
8 | """
9 | Class for thinly wrapping Lula IK solver
10 | """
11 |
12 | def __init__(
13 | self,
14 | robot_description_path,
15 | robot_urdf_path,
16 | eef_name,
17 | reset_joint_pos,
18 | world2robot_homo,
19 | ):
20 | # Create robot description, kinematics, and config
21 | self.robot_description = lazy.lula.load_robot(robot_description_path, robot_urdf_path)
22 | self.kinematics = self.robot_description.kinematics()
23 | self.config = lazy.lula.CyclicCoordDescentIkConfig()
24 | self.eef_name = eef_name
25 | self.reset_joint_pos = reset_joint_pos
26 | self.world2robot_homo = world2robot_homo
27 |
28 | def solve(
29 | self,
30 | target_pose_homo,
31 | position_tolerance=0.01,
32 | orientation_tolerance=0.05,
33 | position_weight=1.0,
34 | orientation_weight=0.05,
35 | max_iterations=150,
36 | initial_joint_pos=None,
37 | ):
38 | """
39 | Backs out joint positions to achieve desired @target_pos and @target_quat
40 |
41 | Args:
42 | target_pose_homo (np.ndarray): [4, 4] homogeneous transformation matrix of the target pose in world frame
43 | position_tolerance (float): Maximum position error (L2-norm) for a successful IK solution
44 | orientation_tolerance (float): Maximum orientation error (per-axis L2-norm) for a successful IK solution
45 | position_weight (float): Weight for the relative importance of position error during CCD
46 | orientation_weight (float): Weight for the relative importance of orientation error during CCD
47 | max_iterations (int): Number of iterations used for each cyclic coordinate descent.
48 | initial_joint_pos (None or n-array): If specified, will set the initial cspace seed when solving for joint
49 | positions. Otherwise, will use self.reset_joint_pos
50 |
51 | Returns:
52 | ik_results (lazy.lula.CyclicCoordDescentIkResult): IK result object containing the joint positions and other information.
53 | """
54 | # convert target pose to robot base frame
55 | target_pose_robot = np.dot(self.world2robot_homo, target_pose_homo)
56 | target_pose_pos = target_pose_robot[:3, 3]
57 | target_pose_rot = target_pose_robot[:3, :3]
58 | ik_target_pose = lazy.lula.Pose3(lazy.lula.Rotation3(target_pose_rot), target_pose_pos)
59 | # Set the cspace seed and tolerance
60 | initial_joint_pos = self.reset_joint_pos if initial_joint_pos is None else np.array(initial_joint_pos)
61 | self.config.cspace_seeds = [initial_joint_pos]
62 | self.config.position_tolerance = position_tolerance
63 | self.config.orientation_tolerance = orientation_tolerance
64 | self.config.ccd_position_weight = position_weight
65 | self.config.ccd_orientation_weight = orientation_weight
66 | self.config.max_num_descents = max_iterations
67 | # Compute target joint positions
68 | ik_results = lazy.lula.compute_ik_ccd(self.kinematics, ik_target_pose, self.eef_name, self.config)
69 | return ik_results
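
# ------------------------------------------------------------------------------
# Usage sketch (illustrative only; not called anywhere in this repo). It assumes
# an IKSolver constructed as in main.py and a world-frame target given as a
# position plus an xyzw quaternion; transform_utils is the local module used
# elsewhere in this repo to build the homogeneous transform.
def _example_solve(ik_solver, target_pos, target_xyzw):
    import transform_utils as T
    target_pose_homo = T.convert_pose_quat2mat(np.concatenate([target_pos, target_xyzw]))  # [4, 4]
    return ik_solver.solve(target_pose_homo, position_tolerance=0.01, orientation_tolerance=0.05)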
--------------------------------------------------------------------------------
/keypoint_proposal.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import cv2
4 | from torch.nn.functional import interpolate
5 | from kmeans_pytorch import kmeans
6 | from utils import filter_points_by_bounds
7 | from sklearn.cluster import MeanShift
8 |
9 | class KeypointProposer:
10 | def __init__(self, config):
11 | self.config = config
12 | self.device = torch.device(self.config['device'])
13 | self.dinov2 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval().to(self.device)
14 | # local_model_path = '/omnigibson-src/ReKep/dinov2_vits14_pretrain.pth'
15 | # checkpoint = torch.load(local_model_path)
16 | # self.dinov2 = checkpoint
17 | self.bounds_min = np.array(self.config['bounds_min'])
18 | self.bounds_max = np.array(self.config['bounds_max'])
19 | self.mean_shift = MeanShift(bandwidth=self.config['min_dist_bt_keypoints'], bin_seeding=True, n_jobs=32)
20 | self.patch_size = 14 # dinov2
21 | np.random.seed(self.config['seed'])
22 | torch.manual_seed(self.config['seed'])
23 | torch.cuda.manual_seed(self.config['seed'])
24 |
25 | def get_keypoints(self, rgb, points, masks):
26 | # preprocessing
27 | # breakpoint()
28 | transformed_rgb, rgb, points, masks, shape_info = self._preprocess(rgb, points, masks)
29 | # get features
30 | features_flat = self._get_features(transformed_rgb, shape_info)
31 | # for each mask, cluster in feature space to get meaningful regions, and use their centers as keypoint candidates
32 | candidate_keypoints, candidate_pixels, candidate_rigid_group_ids = self._cluster_features(points, features_flat, masks)
33 | # exclude keypoints that are outside of the workspace
34 | within_space = filter_points_by_bounds(candidate_keypoints, self.bounds_min, self.bounds_max, strict=True)
35 | candidate_keypoints = candidate_keypoints[within_space]
36 | candidate_pixels = candidate_pixels[within_space]
37 | candidate_rigid_group_ids = candidate_rigid_group_ids[within_space]
38 | # merge close points by clustering in cartesian space
39 | merged_indices = self._merge_clusters(candidate_keypoints)
40 | candidate_keypoints = candidate_keypoints[merged_indices]
41 | candidate_pixels = candidate_pixels[merged_indices]
42 | candidate_rigid_group_ids = candidate_rigid_group_ids[merged_indices]
43 | # sort candidates by locations
44 | sort_idx = np.lexsort((candidate_pixels[:, 0], candidate_pixels[:, 1]))
45 | candidate_keypoints = candidate_keypoints[sort_idx]
46 | candidate_pixels = candidate_pixels[sort_idx]
47 | candidate_rigid_group_ids = candidate_rigid_group_ids[sort_idx]
48 | # project keypoints to image space
49 | projected = self._project_keypoints_to_img(rgb, candidate_pixels, candidate_rigid_group_ids, masks, features_flat)
50 | return candidate_keypoints, projected
51 |
52 | def _preprocess(self, rgb, points, masks):
53 | if masks.is_cuda:
54 | masks = masks.cpu()
55 | # print("***masks", masks)
56 |
57 | rgb = rgb.cpu() # move to CPU if on GPU
58 | rgb = rgb.numpy()
59 | # print("***rgb", rgb)
60 |
61 | # convert masks to binary masks
62 | masks = [masks == uid for uid in np.unique(masks.numpy())]
63 | # print("***masks2", masks)
64 | # ensure input shape is compatible with dinov2
65 | H, W, _ = rgb.shape
66 | patch_h = int(H // self.patch_size)
67 | patch_w = int(W // self.patch_size)
68 | new_H = patch_h * self.patch_size
69 | new_W = patch_w * self.patch_size
70 | # print("***rgb2", rgb)
71 | transformed_rgb = cv2.resize(rgb, (new_W, new_H))
72 | transformed_rgb = transformed_rgb.astype(np.float32) / 255.0 # float32 [H, W, 3]
73 | # shape info
74 | shape_info = {
75 | 'img_h': H,
76 | 'img_w': W,
77 | 'patch_h': patch_h,
78 | 'patch_w': patch_w,
79 | }
80 | return transformed_rgb, rgb, points, masks, shape_info
81 |
82 | def _project_keypoints_to_img(self, rgb, candidate_pixels, candidate_rigid_group_ids, masks, features_flat):
83 | projected = rgb.copy()
84 | # overlay keypoints on the image
85 | for keypoint_count, pixel in enumerate(candidate_pixels):
86 | displayed_text = f"{keypoint_count}"
87 | text_length = len(displayed_text)
88 | # draw a box
89 | box_width = 30 + 10 * (text_length - 1)
90 | box_height = 30
91 | cv2.rectangle(projected, (pixel[1] - box_width // 2, pixel[0] - box_height // 2), (pixel[1] + box_width // 2, pixel[0] + box_height // 2), (255, 255, 255), -1)
92 | cv2.rectangle(projected, (pixel[1] - box_width // 2, pixel[0] - box_height // 2), (pixel[1] + box_width // 2, pixel[0] + box_height // 2), (0, 0, 0), 2)
93 | # draw text
94 | org = (pixel[1] - 7 * (text_length), pixel[0] + 7)
95 | color = (255, 0, 0)
96 | cv2.putText(projected, str(keypoint_count), org, cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
97 | keypoint_count += 1
98 | return projected
99 |
100 | @torch.inference_mode()
101 | @torch.amp.autocast('cuda')
102 | def _get_features(self, transformed_rgb, shape_info):
103 | img_h = shape_info['img_h']
104 | img_w = shape_info['img_w']
105 | patch_h = shape_info['patch_h']
106 | patch_w = shape_info['patch_w']
107 | # get features
108 | img_tensors = torch.from_numpy(transformed_rgb).permute(2, 0, 1).unsqueeze(0).to(self.device) # float32 [1, 3, H, W]
109 | assert img_tensors.shape[1] == 3, "unexpected image shape"
110 | # breakpoint()
111 | features_dict = self.dinov2.forward_features(img_tensors)
112 | raw_feature_grid = features_dict['x_norm_patchtokens'] # float32 [num_cams, patch_h*patch_w, feature_dim]
113 | raw_feature_grid = raw_feature_grid.reshape(1, patch_h, patch_w, -1) # float32 [num_cams, patch_h, patch_w, feature_dim]
114 | # compute per-point feature using bilinear interpolation
115 | interpolated_feature_grid = interpolate(raw_feature_grid.permute(0, 3, 1, 2), # float32 [num_cams, feature_dim, patch_h, patch_w]
116 | size=(img_h, img_w),
117 | mode='bilinear').permute(0, 2, 3, 1).squeeze(0) # float32 [H, W, feature_dim]
118 | features_flat = interpolated_feature_grid.reshape(-1, interpolated_feature_grid.shape[-1]) # float32 [H*W, feature_dim]
119 | return features_flat
120 |
121 | def _cluster_features(self, points, features_flat, masks):
122 | candidate_keypoints = []
123 | candidate_pixels = []
124 | candidate_rigid_group_ids = []
125 | for rigid_group_id, binary_mask in enumerate(masks):
126 | # ignore mask that is too large
127 | # print("***binary_mask", binary_mask)
128 | binary_mask = binary_mask.cpu().numpy()
129 | if np.mean(binary_mask) > self.config['max_mask_ratio']:
130 | continue
131 | # consider only foreground features
132 | obj_features_flat = features_flat[binary_mask.reshape(-1)]
133 | feature_pixels = np.argwhere(binary_mask)
134 | feature_points = points[binary_mask]
135 | # reduce dimensionality to be less sensitive to noise and texture
136 | obj_features_flat = obj_features_flat.double()
137 | (u, s, v) = torch.pca_lowrank(obj_features_flat, center=False)
138 | features_pca = torch.mm(obj_features_flat, v[:, :3])
139 | features_pca = (features_pca - features_pca.min(0)[0]) / (features_pca.max(0)[0] - features_pca.min(0)[0])
140 | X = features_pca
141 | # add feature_pixels as extra dimensions
142 | feature_points_torch = torch.tensor(feature_points, dtype=features_pca.dtype, device=features_pca.device)
143 | feature_points_torch = (feature_points_torch - feature_points_torch.min(0)[0]) / (feature_points_torch.max(0)[0] - feature_points_torch.min(0)[0])
144 | X = torch.cat([X, feature_points_torch], dim=-1)
145 | # cluster features to get meaningful regions
146 | cluster_ids_x, cluster_centers = kmeans(
147 | X=X,
148 | num_clusters=self.config['num_candidates_per_mask'],
149 | distance='euclidean',
150 | device=self.device,
151 | )
152 | cluster_centers = cluster_centers.to(self.device)
153 | for cluster_id in range(self.config['num_candidates_per_mask']):
154 | cluster_center = cluster_centers[cluster_id][:3]
155 | member_idx = cluster_ids_x == cluster_id
156 | member_points = feature_points[member_idx]
157 | member_pixels = feature_pixels[member_idx]
158 | member_features = features_pca[member_idx]
159 | dist = torch.norm(member_features - cluster_center, dim=-1)
160 | closest_idx = torch.argmin(dist)
161 | candidate_keypoints.append(member_points[closest_idx])
162 | candidate_pixels.append(member_pixels[closest_idx])
163 | candidate_rigid_group_ids.append(rigid_group_id)
164 |
165 | candidate_keypoints = np.array(candidate_keypoints)
166 | candidate_pixels = np.array(candidate_pixels)
167 | candidate_rigid_group_ids = np.array(candidate_rigid_group_ids)
168 |
169 | return candidate_keypoints, candidate_pixels, candidate_rigid_group_ids
170 |
171 | def _merge_clusters(self, candidate_keypoints):
172 | self.mean_shift.fit(candidate_keypoints)
173 | cluster_centers = self.mean_shift.cluster_centers_
174 | merged_indices = []
175 | for center in cluster_centers:
176 | dist = np.linalg.norm(candidate_keypoints - center, axis=-1)
177 | merged_indices.append(np.argmin(dist))
178 | return merged_indices
--------------------------------------------------------------------------------
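The `_cluster_features` / `_merge_clusters` pair above follows a candidate-then-dedupe pattern: k-means over PCA-reduced features (plus normalized 3D coordinates) proposes candidates per mask, and mean shift then snaps nearby candidates together, with each cluster center mapped back to the nearest actual candidate so keypoints stay on the object surface. A self-contained sketch of just the merge step, using scikit-learn's `MeanShift` to stand in for `self.mean_shift` (which is constructed elsewhere in this file); the bandwidth and the synthetic candidates are illustrative values, not the repo's configuration:

```python
# Sketch of the snap-to-nearest-candidate merge used in _merge_clusters above.
import numpy as np
from sklearn.cluster import MeanShift

candidate_keypoints = np.random.rand(20, 3)      # stand-in for PCA/k-means candidates
mean_shift = MeanShift(bandwidth=0.05)           # bandwidth is illustrative only
mean_shift.fit(candidate_keypoints)

merged_indices = []
for center in mean_shift.cluster_centers_:
    dist = np.linalg.norm(candidate_keypoints - center, axis=-1)
    merged_indices.append(int(np.argmin(dist)))  # snap center to nearest real candidate
merged_keypoints = candidate_keypoints[merged_indices]
```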
/main.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import numpy as np
3 | import json
4 | import os
5 | import argparse
6 | from environment import ReKepOGEnv
7 | from keypoint_proposal import KeypointProposer
8 | from constraint_generation import ConstraintGenerator
9 | from ik_solver import IKSolver
10 | from subgoal_solver import SubgoalSolver
11 | from path_solver import PathSolver
12 | from visualizer import Visualizer
13 | import transform_utils as T
14 | from omnigibson.robots.fetch import Fetch
15 | from utils import (
16 | bcolors,
17 | get_config,
18 | load_functions_from_txt,
19 | get_linear_interpolation_steps,
20 | spline_interpolate_poses,
21 | get_callable_grasping_cost_fn,
22 | print_opt_debug_dict,
23 | )
24 |
25 | class Main:
26 | def __init__(self, scene_file, visualize=False):
27 | global_config = get_config(config_path="./configs/config.yaml")
28 | self.config = global_config['main']
29 | self.bounds_min = np.array(self.config['bounds_min'])
30 | self.bounds_max = np.array(self.config['bounds_max'])
31 | self.visualize = visualize
32 | # set random seed
33 | np.random.seed(self.config['seed'])
34 | torch.manual_seed(self.config['seed'])
35 | torch.cuda.manual_seed(self.config['seed'])
36 | # initialize keypoint proposer and constraint generator
37 | self.keypoint_proposer = KeypointProposer(global_config['keypoint_proposer'])
38 | self.constraint_generator = ConstraintGenerator(global_config['constraint_generator'])
39 | # initialize environment
40 | self.env = ReKepOGEnv(global_config['env'], scene_file, verbose=False)
41 | # setup ik solver (for reachability cost)
42 | assert isinstance(self.env.robot, Fetch), "The IK solver assumes the robot is a Fetch robot"
43 | ik_solver = IKSolver(
44 | robot_description_path=self.env.robot.robot_arm_descriptor_yamls[self.env.robot.default_arm],
45 | robot_urdf_path=self.env.robot.urdf_path,
46 | eef_name=self.env.robot.eef_link_names[self.env.robot.default_arm],
47 | reset_joint_pos=self.env.reset_joint_pos,
48 | world2robot_homo=self.env.world2robot_homo,
49 | )
50 | # initialize solvers
51 | self.subgoal_solver = SubgoalSolver(global_config['subgoal_solver'], ik_solver, self.env.reset_joint_pos)
52 | self.path_solver = PathSolver(global_config['path_solver'], ik_solver, self.env.reset_joint_pos)
53 | # initialize visualizer
54 | if self.visualize:
55 | self.visualizer = Visualizer(global_config['visualizer'], self.env)
56 |
57 | def perform_task(self, instruction, rekep_program_dir=None, disturbance_seq=None):
58 | self.env.reset()
59 | cam_obs = self.env.get_cam_obs()
60 | rgb = cam_obs[self.config['vlm_camera']]['rgb']
61 | points = cam_obs[self.config['vlm_camera']]['points']
62 | mask = cam_obs[self.config['vlm_camera']]['seg']
63 | # ====================================
64 | # = keypoint proposal and constraint generation
65 | # ====================================
66 | if rekep_program_dir is None:
67 | keypoints, projected_img = self.keypoint_proposer.get_keypoints(rgb, points, mask)
68 | print(f'{bcolors.HEADER}Got {len(keypoints)} proposed keypoints{bcolors.ENDC}')
69 | if self.visualize:
70 | self.visualizer.show_img(projected_img)
71 | metadata = {'init_keypoint_positions': keypoints, 'num_keypoints': len(keypoints)}
72 | rekep_program_dir = self.constraint_generator.generate(projected_img, instruction, metadata)
73 | print(f'{bcolors.HEADER}Constraints generated{bcolors.ENDC}')
74 | # ====================================
75 | # = execute
76 | # ====================================
77 | self._execute(rekep_program_dir, disturbance_seq)
78 |
79 | def _update_disturbance_seq(self, stage, disturbance_seq):
80 | if disturbance_seq is not None:
81 | if stage in disturbance_seq and not self.applied_disturbance[stage]:
82 |                 # set the disturbance sequence; the generator yields one disturbance function per env.step until it is exhausted
83 | self.env.disturbance_seq = disturbance_seq[stage](self.env)
84 | self.applied_disturbance[stage] = True
85 |
86 | def _execute(self, rekep_program_dir, disturbance_seq=None):
87 | # load metadata
88 | with open(os.path.join(rekep_program_dir, 'metadata.json'), 'r') as f:
89 | self.program_info = json.load(f)
90 | self.applied_disturbance = {stage: False for stage in range(1, self.program_info['num_stages'] + 1)}
91 | # register keypoints to be tracked
92 | self.env.register_keypoints(self.program_info['init_keypoint_positions'])
93 | # load constraints
94 | self.constraint_fns = dict()
95 | for stage in range(1, self.program_info['num_stages'] + 1): # stage starts with 1
96 | stage_dict = dict()
97 | for constraint_type in ['subgoal', 'path']:
98 | load_path = os.path.join(rekep_program_dir, f'stage{stage}_{constraint_type}_constraints.txt')
99 | get_grasping_cost_fn = get_callable_grasping_cost_fn(self.env) # special grasping function for VLM to call
100 | stage_dict[constraint_type] = load_functions_from_txt(load_path, get_grasping_cost_fn) if os.path.exists(load_path) else []
101 | self.constraint_fns[stage] = stage_dict
102 |
103 | # bookkeeping of which keypoints can be moved in the optimization
104 | self.keypoint_movable_mask = np.zeros(self.program_info['num_keypoints'] + 1, dtype=bool)
105 | self.keypoint_movable_mask[0] = True # first keypoint is always the ee, so it's movable
106 |
107 | # main loop
108 | self.last_sim_step_counter = -np.inf
109 | self._update_stage(1)
110 | while True:
111 | scene_keypoints = self.env.get_keypoint_positions()
112 | self.keypoints = np.concatenate([[self.env.get_ee_pos()], scene_keypoints], axis=0) # first keypoint is always the ee
113 | self.curr_ee_pose = self.env.get_ee_pose()
114 | self.curr_joint_pos = self.env.get_arm_joint_postions()
115 | self.sdf_voxels = self.env.get_sdf_voxels(self.config['sdf_voxel_size'])
116 | self.collision_points = self.env.get_collision_points()
117 | # ====================================
118 | # = decide whether to backtrack
119 | # ====================================
120 | backtrack = False
121 | if self.stage > 1:
122 | path_constraints = self.constraint_fns[self.stage]['path']
123 | for constraints in path_constraints:
124 | violation = constraints(self.keypoints[0], self.keypoints[1:])
125 | if violation > self.config['constraint_tolerance']:
126 | backtrack = True
127 | break
128 | if backtrack:
129 | # determine which stage to backtrack to based on constraints
130 | for new_stage in range(self.stage - 1, 0, -1):
131 | path_constraints = self.constraint_fns[new_stage]['path']
132 | # if no constraints, we can safely backtrack
133 | if len(path_constraints) == 0:
134 | break
135 | # otherwise, check if all constraints are satisfied
136 | all_constraints_satisfied = True
137 | for constraints in path_constraints:
138 | violation = constraints(self.keypoints[0], self.keypoints[1:])
139 | if violation > self.config['constraint_tolerance']:
140 | all_constraints_satisfied = False
141 | break
142 | if all_constraints_satisfied:
143 | break
144 | print(f"{bcolors.HEADER}[stage={self.stage}] backtrack to stage {new_stage}{bcolors.ENDC}")
145 | self._update_stage(new_stage)
146 | else:
147 | # apply disturbance
148 | self._update_disturbance_seq(self.stage, disturbance_seq)
149 | # ====================================
150 | # = get optimized plan
151 | # ====================================
152 | if self.last_sim_step_counter == self.env.step_counter:
153 | print(f"{bcolors.WARNING}sim did not step forward within last iteration (HINT: adjust action_steps_per_iter to be larger or the pos_threshold to be smaller){bcolors.ENDC}")
154 | next_subgoal = self._get_next_subgoal(from_scratch=self.first_iter)
155 | next_path = self._get_next_path(next_subgoal, from_scratch=self.first_iter)
156 | self.first_iter = False
157 | self.action_queue = next_path.tolist()
158 | self.last_sim_step_counter = self.env.step_counter
159 |
160 | # ====================================
161 | # = execute
162 | # ====================================
163 | # determine if we proceed to the next stage
164 | count = 0
165 | while len(self.action_queue) > 0 and count < self.config['action_steps_per_iter']:
166 | next_action = self.action_queue.pop(0)
167 | precise = len(self.action_queue) == 0
168 | self.env.execute_action(next_action, precise=precise)
169 | count += 1
170 | if len(self.action_queue) == 0:
171 | if self.is_grasp_stage:
172 | self._execute_grasp_action()
173 | elif self.is_release_stage:
174 | self._execute_release_action()
175 | # if completed, save video and return
176 | if self.stage == self.program_info['num_stages']:
177 | self.env.sleep(2.0)
178 | save_path = self.env.save_video()
179 | print(f"{bcolors.OKGREEN}Video saved to {save_path}\n\n{bcolors.ENDC}")
180 | return
181 | # progress to next stage
182 | self._update_stage(self.stage + 1)
183 |
184 | def _get_next_subgoal(self, from_scratch):
185 | subgoal_constraints = self.constraint_fns[self.stage]['subgoal']
186 | path_constraints = self.constraint_fns[self.stage]['path']
187 | subgoal_pose, debug_dict = self.subgoal_solver.solve(self.curr_ee_pose,
188 | self.keypoints,
189 | self.keypoint_movable_mask,
190 | subgoal_constraints,
191 | path_constraints,
192 | self.sdf_voxels,
193 | self.collision_points,
194 | self.is_grasp_stage,
195 | self.curr_joint_pos,
196 | from_scratch=from_scratch)
197 | subgoal_pose_homo = T.convert_pose_quat2mat(subgoal_pose)
198 | # if grasp stage, back up a bit to leave room for grasping
199 | if self.is_grasp_stage:
200 | subgoal_pose[:3] += subgoal_pose_homo[:3, :3] @ np.array([-self.config['grasp_depth'] / 2.0, 0, 0])
201 | debug_dict['stage'] = self.stage
202 | print_opt_debug_dict(debug_dict)
203 | if self.visualize:
204 | self.visualizer.visualize_subgoal(subgoal_pose)
205 | return subgoal_pose
206 |
207 | def _get_next_path(self, next_subgoal, from_scratch):
208 | path_constraints = self.constraint_fns[self.stage]['path']
209 | path, debug_dict = self.path_solver.solve(self.curr_ee_pose,
210 | next_subgoal,
211 | self.keypoints,
212 | self.keypoint_movable_mask,
213 | path_constraints,
214 | self.sdf_voxels,
215 | self.collision_points,
216 | self.curr_joint_pos,
217 | from_scratch=from_scratch)
218 | print_opt_debug_dict(debug_dict)
219 | processed_path = self._process_path(path)
220 | if self.visualize:
221 | self.visualizer.visualize_path(processed_path)
222 | return processed_path
223 |
224 | def _process_path(self, path):
225 | # spline interpolate the path from the current ee pose
226 | full_control_points = np.concatenate([
227 | self.curr_ee_pose.reshape(1, -1),
228 | path,
229 | ], axis=0)
230 | num_steps = get_linear_interpolation_steps(full_control_points[0], full_control_points[-1],
231 | self.config['interpolate_pos_step_size'],
232 | self.config['interpolate_rot_step_size'])
233 | dense_path = spline_interpolate_poses(full_control_points, num_steps)
234 | # add gripper action
235 | ee_action_seq = np.zeros((dense_path.shape[0], 8))
236 | ee_action_seq[:, :7] = dense_path
237 | ee_action_seq[:, 7] = self.env.get_gripper_null_action()
238 | return ee_action_seq
239 |
240 | def _update_stage(self, stage):
241 | # update stage
242 | self.stage = stage
243 | self.is_grasp_stage = self.program_info['grasp_keypoints'][self.stage - 1] != -1
244 | self.is_release_stage = self.program_info['release_keypoints'][self.stage - 1] != -1
245 | # can only be grasp stage or release stage or none
246 | assert self.is_grasp_stage + self.is_release_stage <= 1, "Cannot be both grasp and release stage"
247 | if self.is_grasp_stage: # ensure gripper is open for grasping stage
248 | self.env.open_gripper()
249 | # clear action queue
250 | self.action_queue = []
251 | # update keypoint movable mask
252 | self._update_keypoint_movable_mask()
253 | self.first_iter = True
254 |
255 | def _update_keypoint_movable_mask(self):
256 | for i in range(1, len(self.keypoint_movable_mask)): # first keypoint is ee so always movable
257 | keypoint_object = self.env.get_object_by_keypoint(i - 1)
258 | self.keypoint_movable_mask[i] = self.env.is_grasping(keypoint_object)
259 |
260 | def _execute_grasp_action(self):
261 | pregrasp_pose = self.env.get_ee_pose()
262 | grasp_pose = pregrasp_pose.copy()
263 | grasp_pose[:3] += T.quat2mat(pregrasp_pose[3:]) @ np.array([self.config['grasp_depth'], 0, 0])
264 | grasp_action = np.concatenate([grasp_pose, [self.env.get_gripper_close_action()]])
265 | self.env.execute_action(grasp_action, precise=True)
266 |
267 | def _execute_release_action(self):
268 | self.env.open_gripper()
269 |
270 | if __name__ == "__main__":
271 | parser = argparse.ArgumentParser()
272 | parser.add_argument('--task', type=str, default='pen', help='task to perform')
273 | parser.add_argument('--use_cached_query', action='store_true', help='instead of querying the VLM, use the cached query')
274 | parser.add_argument('--apply_disturbance', action='store_true', help='apply disturbance to test the robustness')
275 | parser.add_argument('--visualize', action='store_true', help='visualize each solution before executing (NOTE: this is blocking and needs to press "ESC" to continue)')
276 | args = parser.parse_args()
277 |
278 | if args.apply_disturbance:
279 | assert args.task == 'pen' and args.use_cached_query, 'disturbance sequence is only defined for cached scenario'
280 |
281 | # ====================================
282 | # = pen task disturbance sequence
283 | # ====================================
284 | def stage1_disturbance_seq(env):
285 | """
286 |         Move the pen during stage 1, while the robot is trying to grasp it
287 | """
288 | pen = env.og_env.scene.object_registry("name", "pen_1")
289 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1")
290 | # disturbance sequence
291 | pos0, orn0 = pen.get_position_orientation()
292 | pose0 = np.concatenate([pos0, orn0])
293 | pos1 = pos0 + np.array([-0.08, 0.0, 0.0])
294 | orn1 = T.quat_multiply(T.euler2quat(np.array([0, 0, np.pi/4])), orn0)
295 | pose1 = np.concatenate([pos1, orn1])
296 | pos2 = pos1 + np.array([0.10, 0.0, 0.0])
297 | orn2 = T.quat_multiply(T.euler2quat(np.array([0, 0, -np.pi/2])), orn1)
298 | pose2 = np.concatenate([pos2, orn2])
299 | control_points = np.array([pose0, pose1, pose2])
300 | pose_seq = spline_interpolate_poses(control_points, num_steps=25)
301 | def disturbance(counter):
302 | if counter < len(pose_seq):
303 | pose = pose_seq[counter]
304 | pos, orn = pose[:3], pose[3:]
305 | pen.set_position_orientation(pos, orn)
306 | counter += 1
307 | counter = 0
308 | while True:
309 | yield disturbance(counter)
310 | counter += 1
311 |
312 | def stage2_disturbance_seq(env):
313 | """
314 |         Take the pen out of the gripper during stage 2, while the robot is trying to reorient it
315 | """
316 | apply_disturbance = env.is_grasping()
317 | pen = env.og_env.scene.object_registry("name", "pen_1")
318 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1")
319 | # disturbance sequence
320 | pos0, orn0 = pen.get_position_orientation()
321 | pose0 = np.concatenate([pos0, orn0])
322 | pose1 = np.array([-0.30, -0.15, 0.71, -0.7071068, 0, 0, 0.7071068])
323 | control_points = np.array([pose0, pose1])
324 | pose_seq = spline_interpolate_poses(control_points, num_steps=25)
325 | def disturbance(counter):
326 | if apply_disturbance:
327 | if counter < 20:
328 | if counter > 15:
329 | env.robot.release_grasp_immediately() # force robot to release the pen
330 | else:
331 | pass # do nothing for the other steps
332 | elif counter < len(pose_seq) + 20:
333 | env.robot.release_grasp_immediately() # force robot to release the pen
334 | pose = pose_seq[counter - 20]
335 | pos, orn = pose[:3], pose[3:]
336 | pen.set_position_orientation(pos, orn)
337 | counter += 1
338 | counter = 0
339 | while True:
340 | yield disturbance(counter)
341 | counter += 1
342 |
343 | def stage3_disturbance_seq(env):
344 | """
345 |         Move the holder during stage 3, while the robot is trying to drop the pen into it
346 | """
347 | pen = env.og_env.scene.object_registry("name", "pen_1")
348 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1")
349 | # disturbance sequence
350 | pos0, orn0 = holder.get_position_orientation()
351 | pose0 = np.concatenate([pos0, orn0])
352 | pos1 = pos0 + np.array([-0.02, -0.15, 0.0])
353 | orn1 = orn0
354 | pose1 = np.concatenate([pos1, orn1])
355 | control_points = np.array([pose0, pose1])
356 | pose_seq = spline_interpolate_poses(control_points, num_steps=5)
357 | def disturbance(counter):
358 | if counter < len(pose_seq):
359 | pose = pose_seq[counter]
360 | pos, orn = pose[:3], pose[3:]
361 | holder.set_position_orientation(pos, orn)
362 | counter += 1
363 | counter = 0
364 | while True:
365 | yield disturbance(counter)
366 | counter += 1
367 |
368 | task_list = {
369 | 'pen': {
370 | 'scene_file': './configs/og_scene_file_red_pen.json',
371 | 'instruction': 'reorient the red pen and drop it upright into the black pen holder',
372 | 'rekep_program_dir': './vlm_query/pen',
373 | 'disturbance_seq': {1: stage1_disturbance_seq, 2: stage2_disturbance_seq, 3: stage3_disturbance_seq},
374 | },
375 | }
376 | task = task_list['pen']
377 | scene_file = task['scene_file']
378 | instruction = task['instruction']
379 | main = Main(scene_file, visualize=args.visualize)
380 | main.perform_task(instruction,
381 | rekep_program_dir=task['rekep_program_dir'] if args.use_cached_query else None,
382 | disturbance_seq=task.get('disturbance_seq', None) if args.apply_disturbance else None)
--------------------------------------------------------------------------------
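Each `stage*_disturbance_seq` above follows the same protocol: precompute a pose sequence, then return a generator that the environment consumes one yield per `env.step` (see `_update_disturbance_seq`). A simulator-free sketch of that protocol, where `FakeObject` is a stand-in for the OmniGibson object handle and the poses are arbitrary illustrative values:

```python
# Simulator-free sketch of the disturbance-generator protocol used above.
import numpy as np

class FakeObject:
    # stand-in for an OmniGibson object; the real code calls set_position_orientation
    def set_position_orientation(self, pos, orn):
        print("disturbed to", np.round(pos, 3))

def example_disturbance_seq(obj, num_steps=5):
    # precomputed pose sequence (position + quaternion), illustrative values only
    pose_seq = [np.concatenate([np.array([0.0, 0.0, 0.7]) + 0.01 * i * np.ones(3),
                                np.array([0.0, 0.0, 0.0, 1.0])]) for i in range(num_steps)]
    def disturbance(counter):
        if counter < len(pose_seq):
            pose = pose_seq[counter]
            obj.set_position_orientation(pose[:3], pose[3:])
    counter = 0
    while True:                 # the environment pulls one yield per env.step
        yield disturbance(counter)
        counter += 1

gen = example_disturbance_seq(FakeObject())
for _ in range(3):              # stands in for three calls to env.step
    next(gen)
```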
/media/pen-in-holder-disturbances.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/media/pen-in-holder-disturbances.gif
--------------------------------------------------------------------------------
/media/pen-in-holder.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/media/pen-in-holder.gif
--------------------------------------------------------------------------------
/og_utils.py:
--------------------------------------------------------------------------------
1 | from omnigibson.sensors.vision_sensor import VisionSensor
2 | import transform_utils as T
3 | import numpy as np
4 | import torch
5 |
6 | class OGCamera:
7 | """
8 | Defines the camera class
9 | """
10 | def __init__(self, og_env, config) -> None:
11 | self.cam = insert_camera(name=config['name'], og_env=og_env, width=config['resolution'], height=config['resolution'])
12 | self.cam.set_position_orientation(config['position'], config['orientation'])
13 | self.intrinsics = get_cam_intrinsics(self.cam)
14 | self.extrinsics = get_cam_extrinsics(self.cam)
15 |
16 | def get_params(self):
17 | """
18 | Get the intrinsic and extrinsic parameters of the camera
19 | """
20 | return {"intrinsics": self.intrinsics, "extrinsics": self.extrinsics}
21 |
22 | def get_obs(self):
23 | """
24 | Gets the image observation from the camera.
25 |         Assumes the scene has been rendered before calling this function.
26 | No semantic handling here for now.
27 | """
28 | obs = self.cam.get_obs()
29 | ret = {}
30 | ret["rgb"] = obs[0]["rgb"][:,:,:3] # H, W, 3
31 | ret["depth"] = obs[0]["depth_linear"] # H, W
32 | ret["points"] = pixel_to_3d_points(ret["depth"], self.intrinsics, self.extrinsics) # H, W, 3
33 | ret["seg"] = obs[0]["seg_semantic"] # H, W
34 | ret["intrinsic"] = self.intrinsics
35 | ret["extrinsic"] = self.extrinsics
36 | return ret
37 |
38 | def insert_camera(name, og_env, width=480, height=480):
39 | try:
40 | cam = VisionSensor(
41 | prim_path=f"/World/{name}",
42 | name=name,
43 | image_width=width,
44 | image_height=height,
45 | modalities=["rgb", "depth_linear", "seg_semantic"]
46 | )
47 | except TypeError:
48 | cam = VisionSensor(
49 | relative_prim_path=f"/{name}",
50 | name=name,
51 | image_width=width,
52 | image_height=height,
53 | modalities=["rgb", "depth_linear", "seg_semantic"]
54 | )
55 |
56 | try:
57 | cam.load()
58 | except TypeError:
59 | cam.load(og_env.scene)
60 | cam.initialize()
61 | return cam
62 |
63 | def get_cam_intrinsics(cam):
64 | """
65 | Get the intrinsics matrix for a VisionSensor object
66 |     :param cam: VisionSensor object
67 |     :return intrinsics: 3x3 numpy array
68 | """
69 | img_width = cam.image_width
70 | img_height = cam.image_height
71 |
72 | if img_width != img_height:
73 | raise ValueError("Only square images are supported")
74 |
75 | apert = cam.prim.GetAttribute("horizontalAperture").Get()
76 | focal_len_in_pixel = cam.focal_length * img_width / apert
77 |
78 | intrinsics = np.eye(3)
79 | intrinsics[0,0] = focal_len_in_pixel
80 | intrinsics[1,1] = focal_len_in_pixel
81 | intrinsics[0,2] = img_width / 2
82 | intrinsics[1,2] = img_height / 2
83 |
84 | return intrinsics
85 |
86 | def get_cam_extrinsics(cam):
87 | return T.pose_inv(T.pose2mat(cam.get_position_orientation()))
88 |
89 | def pixel_to_3d_points(depth_image, intrinsics, extrinsics):
90 | # if torch.is_tensor(intrinsics):
91 | # intrinsics = intrinsics.detach().cpu().numpy()
92 | # if torch.is_tensor(extrinsics):
93 | # extrinsics = extrinsics.detach().cpu().numpy()
94 | # Get the shape of the depth image
95 | H, W = depth_image.shape
96 |
97 | # Create a grid of (x, y) coordinates corresponding to each pixel in the image
98 | i, j = np.meshgrid(np.arange(W), np.arange(H), indexing='xy')
99 |
100 | # Unpack the intrinsic parameters
101 | fx, fy = intrinsics[0, 0], intrinsics[1, 1]
102 | cx, cy = intrinsics[0, 2], intrinsics[1, 2]
103 |
104 | # Convert pixel coordinates to normalized camera coordinates
105 | if torch.is_tensor(depth_image):
106 | depth_image = depth_image.detach().cpu().numpy()
107 | z = depth_image
108 | x = (i - cx) * z / fx
109 | y = (j - cy) * z / fy
110 |
111 | # Stack the coordinates to form (H, W, 3)
112 | camera_coordinates = np.stack((x, y, z), axis=-1)
113 |
114 | # Reshape to (H*W, 3) for matrix multiplication
115 | camera_coordinates = camera_coordinates.reshape(-1, 3)
116 |
117 | # Convert to homogeneous coordinates (H*W, 4)
118 | camera_coordinates_homogeneous = np.hstack((camera_coordinates, np.ones((camera_coordinates.shape[0], 1))))
119 |
120 | # additional conversion to og convention
121 | T_mod = np.array([[1., 0., 0., 0., ],
122 | [0., -1., 0., 0.,],
123 | [0., 0., -1., 0.,],
124 | [0., 0., 0., 1.,]])
125 | camera_coordinates_homogeneous = camera_coordinates_homogeneous @ T_mod
126 |
127 | # Apply extrinsics to get world coordinates
128 | # world_coordinates_homogeneous = camera_coordinates_homogeneous @ extrinsics.T
129 | world_coordinates_homogeneous = T.pose_inv(extrinsics) @ (camera_coordinates_homogeneous.T)
130 | world_coordinates_homogeneous = world_coordinates_homogeneous.T
131 |
132 | # Convert back to non-homogeneous coordinates
133 | world_coordinates = world_coordinates_homogeneous[:, :3] / world_coordinates_homogeneous[:, 3, np.newaxis]
134 |
135 | # Reshape back to (H, W, 3)
136 | world_coordinates = world_coordinates.reshape(H, W, 3)
137 |
138 | return world_coordinates
139 |
140 | def point_to_pixel(pt, intrinsics, extrinsics):
141 | """
142 | pt -- (N, 3) 3d points in world frame
143 | intrinsics -- (3, 3) intrinsics matrix
144 | extrinsics -- (4, 4) extrinsics matrix
145 | """
146 | pt_in_cam = extrinsics @ np.hstack((pt, np.ones((pt.shape[0], 1)))).T # (4, N)
147 | # multiply y, z by -1
148 | pt_in_cam[1, :] *= -1
149 | pt_in_cam[2, :] *= -1
150 | pt_in_cam = pt_in_cam[:3, :]
151 | pt_in_cam = intrinsics @ pt_in_cam
152 | pt_in_cam /= pt_in_cam[2, :]
153 |
154 | return pt_in_cam[:2, :].T
--------------------------------------------------------------------------------
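`pixel_to_3d_points` and `point_to_pixel` above are intended to be inverses of each other (both apply the OmniGibson-convention sign flips internally). A small consistency-check sketch with a synthetic intrinsics matrix and an identity extrinsics matrix; the numbers are illustrative, not values from the repo's config, and importing `og_utils` assumes OmniGibson is installed since the module imports `VisionSensor` at the top:

```python
# Back-project a synthetic depth image to world points, then reproject one point
# and check it lands back on the same pixel.
import numpy as np
from og_utils import pixel_to_3d_points, point_to_pixel

H = W = 64
intrinsics = np.array([[100.0,   0.0, W / 2],
                       [  0.0, 100.0, H / 2],
                       [  0.0,   0.0,   1.0]])
extrinsics = np.eye(4)                    # camera at the world origin (world-to-camera)
depth = np.full((H, W), 2.0)              # flat plane 2 m in front of the camera

points = pixel_to_3d_points(depth, intrinsics, extrinsics)        # (H, W, 3) world frame
pixel = point_to_pixel(points[10, 20][None], intrinsics, extrinsics)
print(pixel)                              # expected ~ [[20., 10.]] (column, row)
```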
/path_solver.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.optimize import dual_annealing, minimize
3 | from scipy.interpolate import RegularGridInterpolator
4 | import copy
5 | import time
6 | import transform_utils as T
7 | from utils import (
8 | farthest_point_sampling,
9 | get_linear_interpolation_steps,
10 | linear_interpolate_poses,
11 | normalize_vars,
12 | unnormalize_vars,
13 | get_samples_jitted,
14 | calculate_collision_cost,
15 | path_length,
16 | transform_keypoints,
17 | )
18 | import torch
19 |
20 | # ====================================
21 | # = objective function
22 | # ====================================
23 | def objective(opt_vars,
24 | og_bounds,
25 | start_pose,
26 | end_pose,
27 | keypoints_centered,
28 | keypoint_movable_mask,
29 | path_constraints,
30 | sdf_func,
31 | collision_points_centered,
32 | opt_interpolate_pos_step_size,
33 | opt_interpolate_rot_step_size,
34 | ik_solver,
35 | initial_joint_pos,
36 | reset_joint_pos,
37 | return_debug_dict=False):
38 |
39 | debug_dict = {}
40 | debug_dict['num_control_points'] = len(opt_vars) // 6
41 |
42 | # unnormalize variables and do conversion
43 | unnormalized_opt_vars = unnormalize_vars(opt_vars, og_bounds)
44 | control_points_euler = np.concatenate([start_pose[None], unnormalized_opt_vars.reshape(-1, 6), end_pose[None]], axis=0) # [num_control_points, 6]
45 | control_points_homo = T.convert_pose_euler2mat(control_points_euler) # [num_control_points, 4, 4]
46 | control_points_quat = T.convert_pose_mat2quat(control_points_homo) # [num_control_points, 7]
47 | # get dense samples
48 | poses_quat, num_poses = get_samples_jitted(control_points_homo, control_points_quat, opt_interpolate_pos_step_size, opt_interpolate_rot_step_size)
49 | poses_homo = T.convert_pose_quat2mat(poses_quat)
50 | debug_dict['num_poses'] = num_poses
51 | start_idx, end_idx = 1, num_poses - 1 # exclude start and goal
52 |
53 | cost = 0
54 | # collision cost
55 | if collision_points_centered is not None:
56 | collision_cost = 0.5 * calculate_collision_cost(poses_homo[start_idx:end_idx], sdf_func, collision_points_centered, 0.20)
57 | debug_dict['collision_cost'] = collision_cost
58 | cost += collision_cost
59 |
60 | # penalize path length
61 | pos_length, rot_length = path_length(poses_homo)
62 | approx_length = pos_length + rot_length * 1.0
63 | path_length_cost = 4.0 * approx_length
64 | debug_dict['path_length_cost'] = path_length_cost
65 | cost += path_length_cost
66 |
67 | # reachability cost
68 | ik_cost = 0
69 | reset_reg_cost = 0
70 | debug_dict['ik_pos_error'] = []
71 | debug_dict['ik_feasible'] = []
72 | max_iterations = 20
73 | for control_point_homo in control_points_homo:
74 | ik_result = ik_solver.solve(
75 | control_point_homo,
76 | max_iterations=max_iterations,
77 | initial_joint_pos=initial_joint_pos,
78 | )
79 | debug_dict['ik_pos_error'].append(ik_result.position_error)
80 | debug_dict['ik_feasible'].append(ik_result.success)
81 | ik_cost += 20.0 * (ik_result.num_descents / max_iterations)
82 | if ik_result.success:
83 | reset_joint_pos = reset_joint_pos.detach().cpu().numpy() if torch.is_tensor(reset_joint_pos) else reset_joint_pos
84 | reset_reg = np.linalg.norm(ik_result.cspace_position[:-1] - reset_joint_pos[:-1])
85 | reset_reg = np.clip(reset_reg, 0.0, 3.0)
86 | else:
87 | reset_reg = 3.0
88 | reset_reg_cost += 0.2 * reset_reg
89 | debug_dict['ik_pos_error'] = np.array(debug_dict['ik_pos_error'])
90 | debug_dict['ik_feasible'] = np.array(debug_dict['ik_feasible'])
91 | debug_dict['ik_cost'] = ik_cost
92 | debug_dict['reset_reg_cost'] = reset_reg_cost
93 | cost += ik_cost
94 |
95 |     # path constraint violation cost
96 | debug_dict['path_violation'] = None
97 | if path_constraints is not None and len(path_constraints) > 0:
98 | path_constraint_cost = 0
99 | path_violation = []
100 | for pose in poses_homo[start_idx:end_idx]:
101 | transformed_keypoints = transform_keypoints(pose, keypoints_centered, keypoint_movable_mask)
102 | for constraint in path_constraints:
103 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:])
104 | path_violation.append(violation)
105 | path_constraint_cost += np.clip(violation, 0, np.inf)
106 | path_constraint_cost = 200.0*path_constraint_cost
107 | debug_dict['path_constraint_cost'] = path_constraint_cost
108 | debug_dict['path_violation'] = path_violation
109 | cost += path_constraint_cost
110 |
111 | debug_dict['total_cost'] = cost
112 |
113 | if return_debug_dict:
114 | return cost, debug_dict
115 |
116 | return cost
117 |
118 |
119 | class PathSolver:
120 | """
121 | Given a goal pose and a start pose, solve for a sequence of intermediate poses for the end effector to follow.
122 |
123 | Optimization variables:
124 | - sequence of intermediate control points
125 | """
126 |
127 | def __init__(self, config, ik_solver, reset_joint_pos):
128 | self.config = config
129 | self.ik_solver = ik_solver
130 | self.reset_joint_pos = reset_joint_pos
131 | self.last_opt_result = None
132 | # warmup
133 | self._warmup()
134 |
135 | def _warmup(self):
136 | start_pose = np.array([0.0, 0.0, 0.3, 0, 0, 0, 1])
137 | end_pose = np.array([0.0, 0.0, 0.0, 0, 0, 0, 1])
138 | keypoints = np.random.rand(10, 3)
139 | keypoint_movable_mask = np.random.rand(10) > 0.5
140 | path_constraints = []
141 | sdf_voxels = np.zeros((10, 10, 10))
142 | collision_points = np.random.rand(100, 3)
143 | self.solve(start_pose, end_pose, keypoints, keypoint_movable_mask, path_constraints, sdf_voxels, collision_points, None, from_scratch=True)
144 | self.last_opt_result = None
145 |
146 | def _setup_sdf(self, sdf_voxels):
147 | # create callable sdf function with interpolation
148 | x = np.linspace(self.config['bounds_min'][0], self.config['bounds_max'][0], sdf_voxels.shape[0])
149 | y = np.linspace(self.config['bounds_min'][1], self.config['bounds_max'][1], sdf_voxels.shape[1])
150 | z = np.linspace(self.config['bounds_min'][2], self.config['bounds_max'][2], sdf_voxels.shape[2])
151 | sdf_func = RegularGridInterpolator((x, y, z), sdf_voxels, bounds_error=False, fill_value=0)
152 | return sdf_func
153 |
154 | def _check_opt_result(self, opt_result, path_quat, debug_dict, og_bounds):
155 | # accept the opt_result if it's only terminated due to iteration limit
156 | if (not opt_result.success and ('maximum' in opt_result.message.lower() or 'iteration' in opt_result.message.lower() or 'not necessarily' in opt_result.message.lower())):
157 | opt_result.success = True
158 | elif not opt_result.success:
159 | opt_result.message += '; invalid solution'
160 | # check whether path constraints are satisfied
161 | if debug_dict['path_violation'] is not None:
162 | path_violation = np.array(debug_dict['path_violation'])
163 | opt_result.message += f'; path_violation: {path_violation}'
164 | path_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in path_violation])
165 | if not path_constraints_satisfied:
166 | opt_result.success = False
167 | opt_result.message += f'; path constraint not satisfied'
168 | return opt_result
169 |
170 | def _center_collision_points_and_keypoints(self, ee_pose, collision_points, keypoints, keypoint_movable_mask):
171 | ee_pose_homo = T.pose2mat([ee_pose[:3], T.euler2quat(ee_pose[3:])])
172 | centering_transform = np.linalg.inv(ee_pose_homo)
173 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
174 | keypoints_centered = transform_keypoints(centering_transform, keypoints, keypoint_movable_mask)
175 | return collision_points_centered, keypoints_centered
176 |
177 | def solve(self,
178 | start_pose,
179 | end_pose,
180 | keypoints,
181 | keypoint_movable_mask,
182 | path_constraints,
183 | sdf_voxels,
184 | collision_points,
185 | initial_joint_pos,
186 | from_scratch=False):
187 | """
188 | Args:
189 | - start_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw]
190 | - end_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw]
191 | - keypoints (np.ndarray): [num_keypoints, 3]
192 |         - keypoint_movable_mask (np.ndarray): [num_keypoints] boolean mask indicating whether each keypoint is on the object being grasped
193 | - path_constraints (List[Callable]): path constraints
194 | - sdf_voxels (np.ndarray): [H, W, D]
195 | - collision_points (np.ndarray): [num_points, 3], point cloud of the object being grasped
196 | - initial_joint_pos (np.ndarray): [N] initial joint positions of the robot.
197 | - from_scratch (bool): whether to start from scratch
198 |
199 | Returns:
200 |             - poses_quat (np.ndarray): [num_control_points, 7] optimized control-point poses (ending with end_pose), each in [x, y, z, qx, qy, qz, qw] format
201 | - debug_dict (dict): debug information
202 | """
203 | # downsample collision points
204 | if collision_points is not None and collision_points.shape[0] > self.config['max_collision_points']:
205 | collision_points = farthest_point_sampling(collision_points, self.config['max_collision_points'])
206 | sdf_func = self._setup_sdf(sdf_voxels)
207 |
208 | # ====================================
209 | # = setup bounds
210 | # ====================================
211 | # calculate an appropriate number of control points, including start and goal
212 | num_control_points = get_linear_interpolation_steps(start_pose, end_pose, self.config['opt_pos_step_size'], self.config['opt_rot_step_size'])
213 | num_control_points = np.clip(num_control_points, 3, 6)
214 | # transform to euler representation
215 | start_pose = np.concatenate([start_pose[:3], T.quat2euler(start_pose[3:])])
216 | end_pose = np.concatenate([end_pose[:3], T.quat2euler(end_pose[3:])])
217 |
218 | # bounds for decision variables
219 | og_bounds = [(b_min, b_max) for b_min, b_max in zip(self.config['bounds_min'], self.config['bounds_max'])] + \
220 | [(-np.pi, np.pi) for _ in range(3)]
221 | og_bounds *= (num_control_points - 2)
222 | og_bounds = np.array(og_bounds, dtype=np.float64)
223 | bounds = [(-1, 1)] * len(og_bounds)
224 | num_vars = len(bounds)
225 |
226 | # ====================================
227 | # = setup initial guess
228 | # ====================================
229 | # use previous opt_result as initial guess if available
230 | if not from_scratch and self.last_opt_result is not None:
231 | init_sol = self.last_opt_result.x
232 | # if there are more control points in this iter, fill the rest with the last value + small noise
233 | if len(init_sol) < num_vars:
234 | new_x0 = np.empty(num_vars)
235 | new_x0[:len(init_sol)] = init_sol
236 | for i in range(len(init_sol), num_vars, 6):
237 | new_x0[i:i+6] = init_sol[-6:] + np.random.randn(6) * 0.01
238 | init_sol = new_x0
239 | # otherwise, use the last num_vars values
240 | else:
241 | init_sol = init_sol[-num_vars:]
242 | # initial guess as linear interpolation
243 | else:
244 | from_scratch = True
245 | interp_poses = linear_interpolate_poses(start_pose, end_pose, num_control_points) # [num_control_points, 6]
246 | init_sol = interp_poses[1:-1].flatten() # [num_control_points-2, 6]
247 | init_sol = normalize_vars(init_sol, og_bounds)
248 |
249 | # clip the initial guess to be within bounds
250 | for i, (b_min, b_max) in enumerate(bounds):
251 | init_sol[i] = np.clip(init_sol[i], b_min, b_max)
252 |
253 | # ====================================
254 | # = other setup
255 | # ====================================
256 | collision_points_centered, keypoints_centered = self._center_collision_points_and_keypoints(start_pose, collision_points, keypoints, keypoint_movable_mask)
257 | aux_args = (og_bounds,
258 | start_pose,
259 | end_pose,
260 | keypoints_centered,
261 | keypoint_movable_mask,
262 | path_constraints,
263 | sdf_func,
264 | collision_points_centered,
265 | self.config['opt_interpolate_pos_step_size'],
266 | self.config['opt_interpolate_rot_step_size'],
267 | self.ik_solver,
268 | initial_joint_pos,
269 | self.reset_joint_pos)
270 |
271 | # ====================================
272 | # = solve optimization
273 | # ====================================
274 | start = time.time()
275 | # use global optimization for the first iteration
276 | if from_scratch:
277 | opt_result = dual_annealing(
278 | func=objective,
279 | bounds=bounds,
280 | args=aux_args,
281 | maxfun=self.config['sampling_maxfun'],
282 | x0=init_sol,
283 | no_local_search=True,
284 | minimizer_kwargs={
285 | 'method': 'SLSQP',
286 | 'options': self.config['minimizer_options'],
287 | },
288 | )
289 | # use gradient-based local optimization for the following iterations
290 | else:
291 | opt_result = minimize(
292 | fun=objective,
293 | x0=init_sol,
294 | args=aux_args,
295 | bounds=bounds,
296 | method='SLSQP',
297 | options=self.config['minimizer_options'],
298 | )
299 | solve_time = time.time() - start
300 |
301 | # ====================================
302 | # = post-process opt_result
303 | # ====================================
304 | if isinstance(opt_result.message, list):
305 | opt_result.message = opt_result.message[0]
306 | # rerun to get debug info
307 | _, debug_dict = objective(opt_result.x, *aux_args, return_debug_dict=True)
308 | debug_dict['sol'] = opt_result.x.reshape(-1, 6)
309 | debug_dict['msg'] = opt_result.message
310 | debug_dict['solve_time'] = solve_time
311 | debug_dict['from_scratch'] = from_scratch
312 | debug_dict['type'] = 'path_solver'
313 |         # unnormalize
314 | sol = unnormalize_vars(opt_result.x, og_bounds)
315 | # add end pose
316 | poses_euler = np.concatenate([sol.reshape(-1, 6), end_pose[None]], axis=0)
317 | poses_quat = T.convert_pose_euler2quat(poses_euler) # [num_control_points, 7]
318 | opt_result = self._check_opt_result(opt_result, poses_quat, debug_dict, og_bounds)
319 | # cache opt_result for future use if successful
320 | if opt_result.success:
321 | self.last_opt_result = copy.deepcopy(opt_result)
322 | return poses_quat, debug_dict
--------------------------------------------------------------------------------
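Both solvers optimize decision variables normalized to [-1, 1] and map them back to workspace bounds using `normalize_vars` / `unnormalize_vars` from `utils.py`. A round-trip sketch with illustrative bounds (not the values in `configs/config.yaml`); importing `utils` assumes the repo's dependencies (numba, open3d) are installed:

```python
# Round-trip check for the variable (de)normalization used by the solvers above.
import numpy as np
from utils import normalize_vars, unnormalize_vars

og_bounds = [(-0.5, 0.5), (-0.5, 0.5), (0.0, 1.0),                   # x, y, z bounds
             (-np.pi, np.pi), (-np.pi, np.pi), (-np.pi, np.pi)]      # euler angle bounds
pose_euler = np.array([0.1, -0.2, 0.6, 0.0, 0.3, -1.0])

normalized = normalize_vars(pose_euler, og_bounds)   # every entry lands in [-1, 1]
recovered = unnormalize_vars(normalized, og_bounds)  # maps back to the original bounds
assert np.allclose(recovered, pose_euler)
```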
/requirements.txt:
--------------------------------------------------------------------------------
1 | open3d==0.16.0
2 | omegaconf
3 | opencv-python
4 | hydra-core
5 | openai
6 | torch
7 | kmeans_pytorch
8 | parse
--------------------------------------------------------------------------------
/subgoal_solver.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import time
3 | import copy
4 | from scipy.optimize import dual_annealing, minimize
5 | from scipy.interpolate import RegularGridInterpolator
6 | import transform_utils as T
7 | from utils import (
8 | transform_keypoints,
9 | calculate_collision_cost,
10 | normalize_vars,
11 | unnormalize_vars,
12 | farthest_point_sampling,
13 | consistency,
14 | )
15 | import torch
16 | def objective(opt_vars,
17 | og_bounds,
18 | keypoints_centered,
19 | keypoint_movable_mask,
20 | goal_constraints,
21 | path_constraints,
22 | sdf_func,
23 | collision_points_centered,
24 | init_pose_homo,
25 | ik_solver,
26 | initial_joint_pos,
27 | reset_joint_pos,
28 | is_grasp_stage,
29 | return_debug_dict=False):
30 |
31 | debug_dict = {}
32 | # unnormalize variables and do conversion
33 | opt_pose = unnormalize_vars(opt_vars, og_bounds)
34 | opt_pose_homo = T.pose2mat([opt_pose[:3], T.euler2quat(opt_pose[3:])])
35 |
36 | cost = 0
37 | # collision cost
38 | if collision_points_centered is not None:
39 | collision_cost = 0.8 * calculate_collision_cost(opt_pose_homo[None], sdf_func, collision_points_centered, 0.10)
40 | debug_dict['collision_cost'] = collision_cost
41 | cost += collision_cost
42 |
43 | # stay close to initial pose
44 | init_pose_cost = 1.0 * consistency(opt_pose_homo[None], init_pose_homo[None], rot_weight=1.5)
45 | debug_dict['init_pose_cost'] = init_pose_cost
46 | cost += init_pose_cost
47 |
48 | # reachability cost (approximated by number of IK iterations + regularization from reset joint pos)
49 | max_iterations = 20
50 | ik_result = ik_solver.solve(
51 | opt_pose_homo,
52 | max_iterations=max_iterations,
53 | initial_joint_pos=initial_joint_pos,
54 | )
55 | ik_cost = 20.0 * (ik_result.num_descents / max_iterations)
56 | debug_dict['ik_feasible'] = ik_result.success
57 | debug_dict['ik_pos_error'] = ik_result.position_error
58 | debug_dict['ik_cost'] = ik_cost
59 | cost += ik_cost
60 | if ik_result.success:
61 | reset_joint_pos = reset_joint_pos.detach().cpu().numpy() if torch.is_tensor(reset_joint_pos) else reset_joint_pos
62 | reset_reg = np.linalg.norm(ik_result.cspace_position[:-1] - reset_joint_pos[:-1])
63 | reset_reg = np.clip(reset_reg, 0.0, 3.0)
64 | else:
65 | reset_reg = 3.0
66 | reset_reg_cost = 0.2 * reset_reg
67 | debug_dict['reset_reg_cost'] = reset_reg_cost
68 | cost += reset_reg_cost
69 |
70 | # grasp metric (better performance if using anygrasp or force-based grasp metrics)
71 | if is_grasp_stage:
72 | preferred_dir = np.array([0, 0, -1])
73 | grasp_cost = -np.dot(opt_pose_homo[:3, 0], preferred_dir) + 1 # [0, 1]
74 | grasp_cost = 10.0 * grasp_cost
75 | debug_dict['grasp_cost'] = grasp_cost
76 | cost += grasp_cost
77 |
78 | # goal constraint violation cost
79 | debug_dict['subgoal_constraint_cost'] = None
80 | debug_dict['subgoal_violation'] = None
81 | if goal_constraints is not None and len(goal_constraints) > 0:
82 | subgoal_constraint_cost = 0
83 | transformed_keypoints = transform_keypoints(opt_pose_homo, keypoints_centered, keypoint_movable_mask)
84 | subgoal_violation = []
85 | for constraint in goal_constraints:
86 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:])
87 | subgoal_violation.append(violation)
88 | subgoal_constraint_cost += np.clip(violation, 0, np.inf)
89 | subgoal_constraint_cost = 200.0*subgoal_constraint_cost
90 | debug_dict['subgoal_constraint_cost'] = subgoal_constraint_cost
91 | debug_dict['subgoal_violation'] = subgoal_violation
92 | cost += subgoal_constraint_cost
93 |
94 | # path constraint violation cost
95 | debug_dict['path_violation'] = None
96 | if path_constraints is not None and len(path_constraints) > 0:
97 | path_constraint_cost = 0
98 | transformed_keypoints = transform_keypoints(opt_pose_homo, keypoints_centered, keypoint_movable_mask)
99 | path_violation = []
100 | for constraint in path_constraints:
101 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:])
102 | path_violation.append(violation)
103 | path_constraint_cost += np.clip(violation, 0, np.inf)
104 | path_constraint_cost = 200.0*path_constraint_cost
105 | debug_dict['path_constraint_cost'] = path_constraint_cost
106 | debug_dict['path_violation'] = path_violation
107 | cost += path_constraint_cost
108 |
109 | debug_dict['total_cost'] = cost
110 |
111 | if return_debug_dict:
112 | return cost, debug_dict
113 |
114 | return cost
115 |
116 |
117 | class SubgoalSolver:
118 | def __init__(self, config, ik_solver, reset_joint_pos):
119 | self.config = config
120 | self.ik_solver = ik_solver
121 | self.reset_joint_pos = reset_joint_pos
122 | self.last_opt_result = None
123 | # warmup
124 | self._warmup()
125 |
126 | def _warmup(self):
127 | ee_pose = np.array([0.0, 0.0, 0.0, 0, 0, 0, 1])
128 | keypoints = np.random.rand(10, 3)
129 | keypoint_movable_mask = np.random.rand(10) > 0.5
130 | goal_constraints = []
131 | path_constraints = []
132 | sdf_voxels = np.zeros((10, 10, 10))
133 | collision_points = np.random.rand(100, 3)
134 | self.solve(ee_pose, keypoints, keypoint_movable_mask, goal_constraints, path_constraints, sdf_voxels, collision_points, True, None, from_scratch=True)
135 | self.last_opt_result = None
136 |
137 | def _setup_sdf(self, sdf_voxels):
138 | # create callable sdf function with interpolation
139 | x = np.linspace(self.config['bounds_min'][0], self.config['bounds_max'][0], sdf_voxels.shape[0])
140 | y = np.linspace(self.config['bounds_min'][1], self.config['bounds_max'][1], sdf_voxels.shape[1])
141 | z = np.linspace(self.config['bounds_min'][2], self.config['bounds_max'][2], sdf_voxels.shape[2])
142 | sdf_func = RegularGridInterpolator((x, y, z), sdf_voxels, bounds_error=False, fill_value=0)
143 | return sdf_func
144 |
145 | def _check_opt_result(self, opt_result, debug_dict):
146 | # accept the opt_result if it's only terminated due to iteration limit
147 | if (not opt_result.success and ('maximum' in opt_result.message.lower() or 'iteration' in opt_result.message.lower() or 'not necessarily' in opt_result.message.lower())):
148 | opt_result.success = True
149 | elif not opt_result.success:
150 | opt_result.message += '; invalid solution'
151 | # check whether goal constraints are satisfied
152 | if debug_dict['subgoal_violation'] is not None:
153 | goal_constraints_results = np.array(debug_dict['subgoal_violation'])
154 | opt_result.message += f'; goal_constraints_results: {goal_constraints_results} (higher is worse)'
155 | goal_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in goal_constraints_results])
156 | if not goal_constraints_satisfied:
157 | opt_result.success = False
158 | opt_result.message += f'; goal not satisfied'
159 | # check whether path constraints are satisfied
160 | if debug_dict['path_violation'] is not None:
161 | path_constraints_results = np.array(debug_dict['path_violation'])
162 | opt_result.message += f'; path_constraints_results: {path_constraints_results}'
163 | path_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in path_constraints_results])
164 | if not path_constraints_satisfied:
165 | opt_result.success = False
166 | opt_result.message += f'; path not satisfied'
167 | # check whether ik is feasible
168 | if 'ik_feasible' in debug_dict and not debug_dict['ik_feasible']:
169 | opt_result.success = False
170 | opt_result.message += f'; ik not feasible'
171 | return opt_result
172 |
173 | def _center_collision_points_and_keypoints(self, ee_pose_homo, collision_points, keypoints, keypoint_movable_mask):
174 | centering_transform = np.linalg.inv(ee_pose_homo)
175 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
176 | keypoints_centered = transform_keypoints(centering_transform, keypoints, keypoint_movable_mask)
177 | return collision_points_centered, keypoints_centered
178 |
179 | def solve(self,
180 | ee_pose,
181 | keypoints,
182 | keypoint_movable_mask,
183 | goal_constraints,
184 | path_constraints,
185 | sdf_voxels,
186 | collision_points,
187 | is_grasp_stage,
188 | initial_joint_pos,
189 | from_scratch=False,
190 | ):
191 | """
192 | Args:
193 | - ee_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw] end effector pose.
194 | - keypoints (np.ndarray): [M, 3] keypoint positions.
195 |         - keypoint_movable_mask (np.ndarray): [M] boolean array indicating whether each keypoint is on the grasped object.
196 | - goal_constraints (List[Callable]): subgoal constraint functions.
197 | - path_constraints (List[Callable]): path constraint functions.
198 | - sdf_voxels (np.ndarray): [X, Y, Z] signed distance field of the environment.
199 | - collision_points (np.ndarray): [N, 3] point cloud of the object.
200 | - is_grasp_stage (bool): whether the current stage is a grasp stage.
201 | - initial_joint_pos (np.ndarray): [N] initial joint positions of the robot.
202 | - from_scratch (bool): whether to start from scratch.
203 | Returns:
204 |         - sol (np.ndarray): [7] optimized subgoal pose in [x, y, z, qx, qy, qz, qw] format.
205 | - debug_dict (dict): debug information.
206 | """
207 | # downsample collision points
208 | if collision_points is not None and collision_points.shape[0] > self.config['max_collision_points']:
209 | collision_points = farthest_point_sampling(collision_points, self.config['max_collision_points'])
210 | sdf_func = self._setup_sdf(sdf_voxels)
211 | # ====================================
212 | # = setup bounds and initial guess
213 | # ====================================
214 | ee_pose = ee_pose.astype(np.float64)
215 | ee_pose_homo = T.pose2mat([ee_pose[:3], ee_pose[3:]])
216 | ee_pose_euler = np.concatenate([ee_pose[:3], T.quat2euler(ee_pose[3:])])
217 | # normalize opt variables to [0, 1]
218 | pos_bounds_min = self.config['bounds_min']
219 | pos_bounds_max = self.config['bounds_max']
220 | rot_bounds_min = np.array([-np.pi, -np.pi, -np.pi]) # euler angles
221 | rot_bounds_max = np.array([np.pi, np.pi, np.pi]) # euler angles
222 | og_bounds = [(b_min, b_max) for b_min, b_max in zip(np.concatenate([pos_bounds_min, rot_bounds_min]), np.concatenate([pos_bounds_max, rot_bounds_max]))]
223 | bounds = [(-1, 1)] * len(og_bounds)
224 | if not from_scratch and self.last_opt_result is not None:
225 | init_sol = self.last_opt_result.x
226 | else:
227 | init_sol = normalize_vars(ee_pose_euler, og_bounds) # start from the current pose
228 | from_scratch = True
229 |
230 | # ====================================
231 | # = other setup
232 | # ====================================
233 | collision_points_centered, keypoints_centered = self._center_collision_points_and_keypoints(ee_pose_homo, collision_points, keypoints, keypoint_movable_mask)
234 | aux_args = (og_bounds,
235 | keypoints_centered,
236 | keypoint_movable_mask,
237 | goal_constraints,
238 | path_constraints,
239 | sdf_func,
240 | collision_points_centered,
241 | ee_pose_homo,
242 | self.ik_solver,
243 | initial_joint_pos,
244 | self.reset_joint_pos,
245 | is_grasp_stage)
246 |
247 | # ====================================
248 | # = solve optimization
249 | # ====================================
250 | start = time.time()
251 | # use global optimization for the first iteration
252 | if from_scratch:
253 | opt_result = dual_annealing(
254 | func=objective,
255 | bounds=bounds,
256 | args=aux_args,
257 | maxfun=self.config['sampling_maxfun'],
258 | x0=init_sol,
259 | no_local_search=False,
260 | minimizer_kwargs={
261 | 'method': 'SLSQP',
262 | 'options': self.config['minimizer_options'],
263 | },
264 | )
265 | # use gradient-based local optimization for the following iterations
266 | else:
267 | opt_result = minimize(
268 | fun=objective,
269 | x0=init_sol,
270 | args=aux_args,
271 | bounds=bounds,
272 | method='SLSQP',
273 | options=self.config['minimizer_options'],
274 | )
275 | solve_time = time.time() - start
276 |
277 | # ====================================
278 | # = post-process opt_result
279 | # ====================================
280 | if isinstance(opt_result.message, list):
281 | opt_result.message = opt_result.message[0]
282 | # rerun to get debug info
283 | _, debug_dict = objective(opt_result.x, *aux_args, return_debug_dict=True)
284 | debug_dict['sol'] = opt_result.x
285 | debug_dict['msg'] = opt_result.message
286 | debug_dict['solve_time'] = solve_time
287 | debug_dict['from_scratch'] = from_scratch
288 | debug_dict['type'] = 'subgoal_solver'
289 |         # unnormalize
290 | sol = unnormalize_vars(opt_result.x, og_bounds)
291 | sol = np.concatenate([sol[:3], T.euler2quat(sol[3:])])
292 | opt_result = self._check_opt_result(opt_result, debug_dict)
293 | # cache opt_result for future use if successful
294 | if opt_result.success:
295 | self.last_opt_result = copy.deepcopy(opt_result)
296 | return sol, debug_dict
--------------------------------------------------------------------------------
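Both `SubgoalSolver` and `PathSolver` evaluate each constraint as `constraint(transformed_keypoints[0], transformed_keypoints[1:])`: a callable that takes the end-effector point and the scene keypoints and returns a scalar, with values above `constraint_tolerance` treated as violations. A minimal hand-written constraint in that form; the keypoint index and the 10 cm offset are hypothetical, chosen only to illustrate the interface:

```python
# Sketch of the constraint callable interface the solvers above expect.
import numpy as np

def stage2_subgoal_constraint1(end_effector, keypoints):
    """Keep the end effector 10 cm above keypoint 5 (hypothetical index)."""
    target = keypoints[5] + np.array([0.0, 0.0, 0.10])
    return float(np.linalg.norm(end_effector - target))   # scalar violation, lower is better

# called by the solvers as: violation = constraint(transformed_keypoints[0], transformed_keypoints[1:])
violation = stage2_subgoal_constraint1(np.zeros(3), np.random.rand(8, 3))
```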
/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | from numba import njit
4 | import open3d as o3d
5 | import datetime
6 | import scipy.interpolate as interpolate
7 | from scipy.spatial.transform import Slerp
8 | from scipy.spatial.transform import Rotation as R
9 | from scipy.spatial.transform import RotationSpline
10 | import transform_utils as T
11 | import yaml
12 |
13 | # ===============================================
14 | # = optimization utils
15 | # ===============================================
16 | def normalize_vars(vars, og_bounds):
17 | """
18 | Given 1D variables and bounds, normalize the variables to [-1, 1] range.
19 | """
20 | normalized_vars = np.empty_like(vars)
21 | for i, (b_min, b_max) in enumerate(og_bounds):
22 | normalized_vars[i] = (vars[i] - b_min) / (b_max - b_min) * 2 - 1
23 | return normalized_vars
24 |
25 | def unnormalize_vars(normalized_vars, og_bounds):
26 | """
27 | Given 1D variables in [-1, 1] and original bounds, denormalize the variables to the original range.
28 | """
29 | vars = np.empty_like(normalized_vars)
30 | for i, (b_min, b_max) in enumerate(og_bounds):
31 | vars[i] = (normalized_vars[i] + 1) / 2 * (b_max - b_min) + b_min
32 | return vars
33 |
34 | def calculate_collision_cost(poses, sdf_func, collision_points, threshold):
35 | assert poses.shape[1:] == (4, 4)
36 | transformed_pcs = batch_transform_points(collision_points, poses)
37 | transformed_pcs_flatten = transformed_pcs.reshape(-1, 3) # [num_poses * num_points, 3]
38 | signed_distance = sdf_func(transformed_pcs_flatten) + threshold # [num_poses * num_points]
39 | signed_distance = signed_distance.reshape(-1, collision_points.shape[0]) # [num_poses, num_points]
40 | non_zero_mask = signed_distance > 0
41 | collision_cost = np.sum(signed_distance[non_zero_mask])
42 | return collision_cost
43 |
44 | @njit(cache=True, fastmath=True)
45 | def consistency(poses_a, poses_b, rot_weight=0.5):
46 | assert poses_a.shape[1:] == (4, 4) and poses_b.shape[1:] == (4, 4), 'poses must be of shape (N, 4, 4)'
47 | min_distances = np.zeros(len(poses_a), dtype=np.float64)
48 | for i in range(len(poses_a)):
49 | min_distance = 9999999
50 | a = poses_a[i]
51 | for j in range(len(poses_b)):
52 | b = poses_b[j]
53 | pos_distance = np.linalg.norm(a[:3, 3] - b[:3, 3])
54 | rot_distance = angle_between_rotmat(a[:3, :3], b[:3, :3])
55 | distance = pos_distance + rot_distance * rot_weight
56 | min_distance = min(min_distance, distance)
57 | min_distances[i] = min_distance
58 | return np.mean(min_distances)
59 |
60 | def transform_keypoints(transform, keypoints, movable_mask):
61 | assert transform.shape == (4, 4)
62 | transformed_keypoints = keypoints.copy()
63 | if movable_mask.sum() > 0:
64 | transformed_keypoints[movable_mask] = np.dot(keypoints[movable_mask], transform[:3, :3].T) + transform[:3, 3]
65 | return transformed_keypoints
66 |
67 | @njit(cache=True, fastmath=True)
68 | def batch_transform_points(points, transforms):
69 | """
70 |     Apply multiple transformations to a point cloud and return the result of each transformation.
71 | Args:
72 | points: point cloud (N, 3).
73 | transforms: M 4x4 transformations (M, 4, 4).
74 | Returns:
75 | np.array: point clouds (M, N, 3).
76 | """
77 | assert transforms.shape[1:] == (4, 4), 'transforms must be of shape (M, 4, 4)'
78 | transformed_points = np.zeros((transforms.shape[0], points.shape[0], 3))
79 | for i in range(transforms.shape[0]):
80 | pos, R = transforms[i, :3, 3], transforms[i, :3, :3]
81 | transformed_points[i] = np.dot(points, R.T) + pos
82 | return transformed_points
83 |
84 | @njit(cache=True, fastmath=True)
85 | def get_samples_jitted(control_points_homo, control_points_quat, opt_interpolate_pos_step_size, opt_interpolate_rot_step_size):
86 | assert control_points_homo.shape[1:] == (4, 4)
87 | # calculate number of samples per segment
88 | num_samples_per_segment = np.empty(len(control_points_homo) - 1, dtype=np.int64)
89 | for i in range(len(control_points_homo) - 1):
90 | start_pos = control_points_homo[i, :3, 3]
91 | start_rotmat = control_points_homo[i, :3, :3]
92 | end_pos = control_points_homo[i+1, :3, 3]
93 | end_rotmat = control_points_homo[i+1, :3, :3]
94 | pos_diff = np.linalg.norm(start_pos - end_pos)
95 | rot_diff = angle_between_rotmat(start_rotmat, end_rotmat)
96 | pos_num_steps = np.ceil(pos_diff / opt_interpolate_pos_step_size)
97 | rot_num_steps = np.ceil(rot_diff / opt_interpolate_rot_step_size)
98 | num_path_poses = int(max(pos_num_steps, rot_num_steps))
99 | num_path_poses = max(num_path_poses, 2) # at least 2 poses, start and end
100 | num_samples_per_segment[i] = num_path_poses
101 | # fill in samples
102 | num_samples = num_samples_per_segment.sum()
103 | samples_7 = np.empty((num_samples, 7))
104 | sample_idx = 0
105 | for i in range(len(control_points_quat) - 1):
106 | start_pos, start_xyzw = control_points_quat[i, :3], control_points_quat[i, 3:]
107 | end_pos, end_xyzw = control_points_quat[i+1, :3], control_points_quat[i+1, 3:]
108 | # using proper quaternion slerp interpolation
109 | poses_7 = np.empty((num_samples_per_segment[i], 7))
110 | for j in range(num_samples_per_segment[i]):
111 | alpha = j / (num_samples_per_segment[i] - 1)
112 | pos = start_pos * (1 - alpha) + end_pos * alpha
113 | blended_xyzw = T.quat_slerp_jitted(start_xyzw, end_xyzw, alpha)
114 | pose_7 = np.empty(7)
115 | pose_7[:3] = pos
116 | pose_7[3:] = blended_xyzw
117 | poses_7[j] = pose_7
118 | samples_7[sample_idx:sample_idx+num_samples_per_segment[i]] = poses_7
119 | sample_idx += num_samples_per_segment[i]
120 | assert num_samples >= 2, f'num_samples: {num_samples}'
121 | return samples_7, num_samples
122 |
123 | @njit(cache=True, fastmath=True)
124 | def path_length(samples_homo):
125 | assert samples_homo.shape[1:] == (4, 4), 'samples_homo must be of shape (N, 4, 4)'
126 | pos_length = 0
127 | rot_length = 0
128 | for i in range(len(samples_homo) - 1):
129 | pos_length += np.linalg.norm(samples_homo[i, :3, 3] - samples_homo[i+1, :3, 3])
130 | rot_length += angle_between_rotmat(samples_homo[i, :3, :3], samples_homo[i+1, :3, :3])
131 | return pos_length, rot_length
132 |
133 | # ===============================================
134 | # = others
135 | # ===============================================
136 | def get_callable_grasping_cost_fn(env):
137 | def get_grasping_cost(keypoint_idx):
138 | keypoint_object = env.get_object_by_keypoint(keypoint_idx)
139 | return -env.is_grasping(candidate_obj=keypoint_object) + 1 # return 0 if grasping an object, 1 if not grasping any object
140 | return get_grasping_cost
141 |
142 | def get_config(config_path=None):
143 | if config_path is None:
144 | this_file_dir = os.path.dirname(os.path.abspath(__file__))
145 | config_path = os.path.join(this_file_dir, 'configs/config.yaml')
146 | assert config_path and os.path.exists(config_path), f'config file does not exist ({config_path})'
147 | with open(config_path, 'r') as f:
148 | config = yaml.load(f, Loader=yaml.FullLoader)
149 | return config
150 |
151 | class bcolors:
152 | HEADER = '\033[95m'
153 | OKBLUE = '\033[94m'
154 | OKCYAN = '\033[96m'
155 | OKGREEN = '\033[92m'
156 | WARNING = '\033[93m'
157 | FAIL = '\033[91m'
158 | ENDC = '\033[0m'
159 | BOLD = '\033[1m'
160 | UNDERLINE = '\033[4m'
161 |
162 | def get_clock_time(milliseconds=False):
163 | curr_time = datetime.datetime.now()
164 | if milliseconds:
165 | return f'{curr_time.hour}:{curr_time.minute}:{curr_time.second}.{curr_time.microsecond // 1000}'
166 | else:
167 | return f'{curr_time.hour}:{curr_time.minute}:{curr_time.second}'
168 |
169 | def angle_between_quats(q1, q2):
170 | """Angle between two quaternions"""
171 | return 2 * np.arccos(np.clip(np.abs(np.dot(q1, q2)), -1, 1))
172 |
173 | def filter_points_by_bounds(points, bounds_min, bounds_max, strict=True):
174 | """
175 | Filter points by taking only points within workspace bounds.
176 | """
177 | assert points.shape[1] == 3, "points must be (N, 3)"
178 | bounds_min = bounds_min.copy()
179 | bounds_max = bounds_max.copy()
180 | if not strict:
181 | bounds_min[:2] = bounds_min[:2] - 0.1 * (bounds_max[:2] - bounds_min[:2])
182 | bounds_max[:2] = bounds_max[:2] + 0.1 * (bounds_max[:2] - bounds_min[:2])
183 | bounds_min[2] = bounds_min[2] - 0.1 * (bounds_max[2] - bounds_min[2])
184 | within_bounds_mask = (
185 | (points[:, 0] >= bounds_min[0])
186 | & (points[:, 0] <= bounds_max[0])
187 | & (points[:, 1] >= bounds_min[1])
188 | & (points[:, 1] <= bounds_max[1])
189 | & (points[:, 2] >= bounds_min[2])
190 | & (points[:, 2] <= bounds_max[2])
191 | )
192 | return within_bounds_mask
193 |
194 | def print_opt_debug_dict(debug_dict):
195 | print('\n' + '#' * 40)
196 | print(f'# Optimization debug info:')
197 | max_key_length = max(len(str(k)) for k in debug_dict.keys())
198 | for k, v in debug_dict.items():
199 | if isinstance(v, int) or isinstance(v, float):
200 | print(f'# {k:<{max_key_length}}: {v:.05f}')
201 | elif isinstance(v, list) and all(isinstance(x, int) or isinstance(x, float) for x in v):
202 | print(f'# {k:<{max_key_length}}: {np.array(v).round(5)}')
203 | else:
204 | print(f'# {k:<{max_key_length}}: {v}')
205 | print('#' * 40 + '\n')
206 |
207 | def merge_dicts(dicts):
208 | return {
209 | k : v
210 | for d in dicts
211 | for k, v in d.items()
212 | }
213 |
214 | def exec_safe(code_str, gvars=None, lvars=None):
215 | banned_phrases = ['import', '__']
216 | for phrase in banned_phrases:
217 | assert phrase not in code_str
218 |
219 | if gvars is None:
220 | gvars = {}
221 | if lvars is None:
222 | lvars = {}
223 | empty_fn = lambda *args, **kwargs: None
224 | custom_gvars = merge_dicts([
225 | gvars,
226 | {'exec': empty_fn, 'eval': empty_fn}
227 | ])
228 | try:
229 | exec(code_str, custom_gvars, lvars)
230 | except Exception as e:
231 | print(f'Error executing code:\n{code_str}')
232 | raise e
233 |
234 | def load_functions_from_txt(txt_path, get_grasping_cost_fn):
235 | if txt_path is None:
236 | return []
237 | # load txt file
238 | with open(txt_path, 'r') as f:
239 | functions_text = f.read()
240 | # execute functions
241 | gvars_dict = {
242 | 'np': np,
243 | 'get_grasping_cost_by_keypoint_idx': get_grasping_cost_fn,
244 | } # external library APIs
245 | lvars_dict = dict()
246 | exec_safe(functions_text, gvars=gvars_dict, lvars=lvars_dict)
247 | return list(lvars_dict.values())
248 |
249 | @njit(cache=True, fastmath=True)
250 | def angle_between_rotmat(P, Q):
251 | R = np.dot(P, Q.T)
252 | cos_theta = (np.trace(R)-1)/2
253 | if cos_theta > 1:
254 | cos_theta = 1
255 | elif cos_theta < -1:
256 | cos_theta = -1
257 | return np.arccos(cos_theta)
258 |
259 | def fit_b_spline(control_points):
260 | # determine appropriate k
261 | k = min(3, control_points.shape[0]-1)
262 | spline = interpolate.splprep(control_points.T, s=0, k=k)
263 | return spline
264 |
265 | def sample_from_spline(spline, num_samples):
266 | sample_points = np.linspace(0, 1, num_samples)
267 | if isinstance(spline, RotationSpline):
268 | samples = spline(sample_points).as_matrix() # [num_samples, 3, 3]
269 | else:
270 | assert isinstance(spline, tuple) and len(spline) == 2, 'spline must be a tuple of (tck, u)'
271 | tck, u = spline
272 | samples = interpolate.splev(np.linspace(0, 1, num_samples), tck) # [spline_dim, num_samples]
273 | samples = np.array(samples).T # [num_samples, spline_dim]
274 | return samples
275 |
276 | def linear_interpolate_poses(start_pose, end_pose, num_poses):
277 | """
278 | Interpolate between start and end pose.
279 | """
280 | assert num_poses >= 2, 'num_poses must be at least 2'
281 | if start_pose.shape == (6,) and end_pose.shape == (6,):
282 | start_pos, start_euler = start_pose[:3], start_pose[3:]
283 | end_pos, end_euler = end_pose[:3], end_pose[3:]
284 | start_rotmat = T.euler2mat(start_euler)
285 | end_rotmat = T.euler2mat(end_euler)
286 | elif start_pose.shape == (4, 4) and end_pose.shape == (4, 4):
287 | start_pos = start_pose[:3, 3]
288 | start_rotmat = start_pose[:3, :3]
289 | end_pos = end_pose[:3, 3]
290 | end_rotmat = end_pose[:3, :3]
291 | elif start_pose.shape == (7,) and end_pose.shape == (7,):
292 | start_pos, start_quat = start_pose[:3], start_pose[3:]
293 | start_rotmat = T.quat2mat(start_quat)
294 | end_pos, end_quat = end_pose[:3], end_pose[3:]
295 | end_rotmat = T.quat2mat(end_quat)
296 | else:
297 | raise ValueError('start_pose and end_pose not recognized')
298 | slerp = Slerp([0, 1], R.from_matrix([start_rotmat, end_rotmat]))
299 | poses = []
300 | for i in range(num_poses):
301 | alpha = i / (num_poses - 1)
302 | pos = start_pos * (1 - alpha) + end_pos * alpha
303 | rotmat = slerp(alpha).as_matrix()
304 | if start_pose.shape == (6,):
305 | euler = T.mat2euler(rotmat)
306 | poses.append(np.concatenate([pos, euler]))
307 | elif start_pose.shape == (4, 4):
308 | pose = np.eye(4)
309 | pose[:3, :3] = rotmat
310 | pose[:3, 3] = pos
311 | poses.append(pose)
312 | elif start_pose.shape == (7,):
313 | quat = T.mat2quat(rotmat)
314 | pose = np.concatenate([pos, quat])
315 | poses.append(pose)
316 | return np.array(poses)
317 |
318 | def spline_interpolate_poses(control_points, num_steps):
319 | """
320 |     Interpolate through the control points using spline interpolation.
321 | 1. Fit a b-spline through the positional terms of the control points.
322 | 2. Fit a RotationSpline through the rotational terms of the control points.
323 | 3. Sample the b-spline and RotationSpline at num_steps.
324 |
325 | Args:
326 | control_points: [N, 6] position + euler or [N, 4, 4] pose or [N, 7] position + quat
327 | num_steps: number of poses to interpolate
328 | Returns:
329 | poses: [num_steps, 6] position + euler or [num_steps, 4, 4] pose or [num_steps, 7] position + quat
330 | """
331 | assert num_steps >= 2, 'num_steps must be at least 2'
332 | if isinstance(control_points, list):
333 | control_points = np.array(control_points)
334 | if control_points.shape[1] == 6:
335 | control_points_pos = control_points[:, :3] # [N, 3]
336 | control_points_euler = control_points[:, 3:] # [N, 3]
337 | control_points_rotmat = []
338 | for control_point_euler in control_points_euler:
339 | control_points_rotmat.append(T.euler2mat(control_point_euler))
340 | control_points_rotmat = np.array(control_points_rotmat) # [N, 3, 3]
341 | elif control_points.shape[1] == 4 and control_points.shape[2] == 4:
342 | control_points_pos = control_points[:, :3, 3] # [N, 3]
343 | control_points_rotmat = control_points[:, :3, :3] # [N, 3, 3]
344 | elif control_points.shape[1] == 7:
345 | control_points_pos = control_points[:, :3]
346 | control_points_rotmat = []
347 | for control_point_quat in control_points[:, 3:]:
348 | control_points_rotmat.append(T.quat2mat(control_point_quat))
349 | control_points_rotmat = np.array(control_points_rotmat)
350 | else:
351 | raise ValueError('control_points not recognized')
352 | # remove the duplicate points (threshold 1e-3)
353 | diff = np.linalg.norm(np.diff(control_points_pos, axis=0), axis=1)
354 | mask = diff > 1e-3
355 | # always keep the first and last points
356 | mask = np.concatenate([[True], mask[:-1], [True]])
357 | control_points_pos = control_points_pos[mask]
358 | control_points_rotmat = control_points_rotmat[mask]
359 | # fit b-spline through positional terms control points
360 | pos_spline = fit_b_spline(control_points_pos)
361 | # fit RotationSpline through rotational terms control points
362 | times = pos_spline[1]
363 | rotations = R.from_matrix(control_points_rotmat)
364 | rot_spline = RotationSpline(times, rotations)
365 | # sample from the splines
366 | pos_samples = sample_from_spline(pos_spline, num_steps) # [num_steps, 3]
367 | rot_samples = sample_from_spline(rot_spline, num_steps) # [num_steps, 3, 3]
368 | if control_points.shape[1] == 6:
369 | poses = []
370 | for i in range(num_steps):
371 | pose = np.concatenate([pos_samples[i], T.mat2euler(rot_samples[i])])
372 | poses.append(pose)
373 | poses = np.array(poses)
374 | elif control_points.shape[1] == 4 and control_points.shape[2] == 4:
375 |         poses = np.zeros((num_steps, 4, 4))  # zeros so the bottom row of each pose ends up as [0, 0, 0, 1]
376 | poses[:, :3, :3] = rot_samples
377 | poses[:, :3, 3] = pos_samples
378 | poses[:, 3, 3] = 1
379 | elif control_points.shape[1] == 7:
380 | poses = np.empty((num_steps, 7))
381 | for i in range(num_steps):
382 | quat = T.mat2quat(rot_samples[i])
383 | pose = np.concatenate([pos_samples[i], quat])
384 | poses[i] = pose
385 | return poses
386 |
387 | def get_linear_interpolation_steps(start_pose, end_pose, pos_step_size, rot_step_size):
388 | """
389 | Given start and end pose, calculate the number of steps to interpolate between them.
390 | Args:
391 | start_pose: [6] position + euler or [4, 4] pose or [7] position + quat
392 | end_pose: [6] position + euler or [4, 4] pose or [7] position + quat
393 | pos_step_size: position step size
394 | rot_step_size: rotation step size
395 | Returns:
396 | num_path_poses: number of poses to interpolate
397 | """
398 | if start_pose.shape == (6,) and end_pose.shape == (6,):
399 | start_pos, start_euler = start_pose[:3], start_pose[3:]
400 | end_pos, end_euler = end_pose[:3], end_pose[3:]
401 | start_rotmat = T.euler2mat(start_euler)
402 | end_rotmat = T.euler2mat(end_euler)
403 | elif start_pose.shape == (4, 4) and end_pose.shape == (4, 4):
404 | start_pos = start_pose[:3, 3]
405 | start_rotmat = start_pose[:3, :3]
406 | end_pos = end_pose[:3, 3]
407 | end_rotmat = end_pose[:3, :3]
408 | elif start_pose.shape == (7,) and end_pose.shape == (7,):
409 | start_pos, start_quat = start_pose[:3], start_pose[3:]
410 | start_rotmat = T.quat2mat(start_quat)
411 | end_pos, end_quat = end_pose[:3], end_pose[3:]
412 | end_rotmat = T.quat2mat(end_quat)
413 | else:
414 | raise ValueError('start_pose and end_pose not recognized')
415 | pos_diff = np.linalg.norm(start_pos - end_pos)
416 | rot_diff = angle_between_rotmat(start_rotmat, end_rotmat)
417 | pos_num_steps = np.ceil(pos_diff / pos_step_size)
418 | rot_num_steps = np.ceil(rot_diff / rot_step_size)
419 | num_path_poses = int(max(pos_num_steps, rot_num_steps))
420 | num_path_poses = max(num_path_poses, 2) # at least start and end poses
421 | return num_path_poses
422 |
423 | def farthest_point_sampling(pc, num_points):
424 | """
425 | Given a point cloud, sample num_points points that are the farthest apart.
426 | Use o3d farthest point sampling.
427 | """
428 | assert pc.ndim == 2 and pc.shape[1] == 3, "pc must be a (N, 3) numpy array"
429 | pcd = o3d.geometry.PointCloud()
430 | pcd.points = o3d.utility.Vector3dVector(pc)
431 | downpcd_farthest = pcd.farthest_point_down_sample(num_points)
432 | return np.asarray(downpcd_farthest.points)
--------------------------------------------------------------------------------
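The interpolation helpers defined above (`get_linear_interpolation_steps`, `linear_interpolate_poses`, `path_length`) are typically chained to densify an end-effector motion. Below is a minimal usage sketch, assuming `utils.py` and its dependencies (numpy, scipy, numba) are importable; the poses themselves are made up for illustration.

```python
# Sketch only: densify a straight-line end-effector motion with the helpers from utils.py.
import numpy as np
from utils import get_linear_interpolation_steps, linear_interpolate_poses, path_length

start_pose = np.eye(4)                                # current end-effector pose (4x4 homogeneous)
end_pose = np.eye(4)
end_pose[:3, 3] = [0.20, 0.00, 0.10]                  # move 20 cm in x and 10 cm in z

# number of waypoints so that no step exceeds 5 mm in position or ~3 degrees in rotation
num_steps = get_linear_interpolation_steps(start_pose, end_pose,
                                           pos_step_size=0.005,
                                           rot_step_size=np.radians(3))
dense_path = linear_interpolate_poses(start_pose, end_pose, num_steps)   # (num_steps, 4, 4)
pos_len, rot_len = path_length(dense_path)            # total translation and rotation along the path
print(num_steps, pos_len, rot_len)
```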
/videos/2024-10-12-09-29-53.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-12-09-29-53.mp4
--------------------------------------------------------------------------------
/videos/2024-10-14-02-04-49.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-02-04-49.mp4
--------------------------------------------------------------------------------
/videos/2024-10-14-03-26-38.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-03-26-38.mp4
--------------------------------------------------------------------------------
/videos/2024-10-14-03-42-23.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-03-42-23.mp4
--------------------------------------------------------------------------------
/videos/2024-10-14-04-16-18.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-04-16-18.mp4
--------------------------------------------------------------------------------
/videos/2024-10-14-04-23-14.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-04-23-14.mp4
--------------------------------------------------------------------------------
/videos/2024-10-14-08-46-17.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-08-46-17.mp4
--------------------------------------------------------------------------------
/visualizer.py:
--------------------------------------------------------------------------------
1 | import open3d as o3d
2 | import numpy as np
3 | import matplotlib
4 | import cv2
5 | import transform_utils as T
6 | from utils import filter_points_by_bounds, batch_transform_points
7 |
8 | def add_to_visualize_buffer(visualize_buffer, visualize_points, visualize_colors):
9 | assert visualize_points.shape[0] == visualize_colors.shape[0], f'got {visualize_points.shape[0]} for points and {visualize_colors.shape[0]} for colors'
10 | if len(visualize_points) == 0:
11 | return
12 | assert visualize_points.shape[1] == 3
13 | assert visualize_colors.shape[1] == 3
14 | # assert visualize_colors.max() <= 1.0 and visualize_colors.min() >= 0.0
15 | visualize_buffer["points"].append(visualize_points)
16 | visualize_buffer["colors"].append(visualize_colors)
17 |
18 | def generate_nearby_points(point, num_points_per_side=5, half_range=0.005):
19 | if point.ndim == 1:
20 | offsets = np.linspace(-1, 1, num_points_per_side)
21 | offsets_meshgrid = np.meshgrid(offsets, offsets, offsets)
22 | offsets_array = np.stack(offsets_meshgrid, axis=-1).reshape(-1, 3)
23 | nearby_points = point + offsets_array * half_range
24 | return nearby_points.reshape(-1, 3)
25 | else:
26 | assert point.shape[1] == 3, "point must be (N, 3)"
27 | assert point.ndim == 2, "point must be (N, 3)"
28 | # vectorized version
29 | offsets = np.linspace(-1, 1, num_points_per_side)
30 | offsets_meshgrid = np.meshgrid(offsets, offsets, offsets)
31 | offsets_array = np.stack(offsets_meshgrid, axis=-1).reshape(-1, 3)
32 |         nearby_points = point[:, None, :] + offsets_array * half_range  # scale offsets, matching the single-point branch
33 | return nearby_points
34 |
35 | class Visualizer:
36 | def __init__(self, config, env):
37 | self.config = config
38 | self.env = env
39 | self.bounds_min = np.array(self.config['bounds_min'])
40 | self.bounds_max = np.array(self.config['bounds_max'])
41 | self.color = np.array([0.05, 0.55, 0.26])
42 | self.world2viewer = np.array([
43 | [0.3788, 0.3569, -0.8539, 0.0],
44 | [0.9198, -0.0429, 0.3901, 0.0],
45 | [-0.1026, 0.9332, 0.3445, 0.0],
46 | [0.0, 0.0, 0.0, 1.0]
47 | ]).T
48 |
49 | def show_img(self, rgb):
50 | cv2.imshow('img', rgb[..., ::-1])
51 |         print('showing image, click on the window and press "ESC" to close and continue')
52 |         cv2.waitKey(0)
53 | cv2.destroyAllWindows()
54 |
55 | def show_pointcloud(self, points, colors):
56 | # transform to viewer frame
57 | points = np.dot(points, self.world2viewer[:3, :3].T) + self.world2viewer[:3, 3]
58 | # clip color to [0, 1]
59 | colors = np.clip(colors, 0.0, 1.0)
60 | pcd = o3d.geometry.PointCloud()
61 | pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64)) # float64 is a lot faster than float32 when added to o3d later
62 | pcd.colors = o3d.utility.Vector3dVector(colors.astype(np.float64))
63 | print('visualizing pointcloud, click on the window and press "ESC" to close and continue')
64 | o3d.visualization.draw_geometries([pcd])
65 |
66 | def _get_scene_points_and_colors(self):
67 | # scene
68 | cam_obs = self.env.get_cam_obs()
69 | scene_points = []
70 | scene_colors = []
71 | for cam_id in range(len(cam_obs)):
72 | cam_points = cam_obs[cam_id]['points'].reshape(-1, 3)
73 | cam_colors = cam_obs[cam_id]['rgb'].reshape(-1, 3) / 255.0
74 | # clip to workspace
75 | within_workspace_mask = filter_points_by_bounds(cam_points, self.bounds_min, self.bounds_max, strict=False)
76 | cam_points = cam_points[within_workspace_mask]
77 | cam_colors = cam_colors[within_workspace_mask]
78 | scene_points.append(cam_points)
79 | scene_colors.append(cam_colors)
80 | scene_points = np.concatenate(scene_points, axis=0)
81 | scene_colors = np.concatenate(scene_colors, axis=0)
82 | return scene_points, scene_colors
83 |
84 | def visualize_subgoal(self, subgoal_pose):
85 | visualize_buffer = {
86 | "points": [],
87 | "colors": []
88 | }
89 | # scene
90 | scene_points, scene_colors = self._get_scene_points_and_colors()
91 | add_to_visualize_buffer(visualize_buffer, scene_points, scene_colors)
92 | subgoal_pose_homo = T.convert_pose_quat2mat(subgoal_pose)
93 | # subgoal
94 | collision_points = self.env.get_collision_points(noise=False)
95 | # transform collision points to the subgoal frame
96 | ee_pose = self.env.get_ee_pose()
97 | ee_pose_homo = T.convert_pose_quat2mat(ee_pose)
98 | centering_transform = np.linalg.inv(ee_pose_homo)
99 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
100 | transformed_collision_points = batch_transform_points(collision_points_centered, subgoal_pose_homo[None]).reshape(-1, 3)
101 | collision_points_colors = np.array([self.color] * len(collision_points))
102 | add_to_visualize_buffer(visualize_buffer, transformed_collision_points, collision_points_colors)
103 | # add keypoints
104 | keypoints = self.env.get_keypoint_positions()
105 | num_keypoints = keypoints.shape[0]
106 | color_map = matplotlib.colormaps["gist_rainbow"]
107 | keypoints_colors = [color_map(i / num_keypoints)[:3] for i in range(num_keypoints)]
108 | for i in range(num_keypoints):
109 | nearby_points = generate_nearby_points(keypoints[i], num_points_per_side=6, half_range=0.009)
110 | nearby_colors = np.tile(keypoints_colors[i], (nearby_points.shape[0], 1))
111 | nearby_colors = 0.5 * nearby_colors + 0.5 * np.array([1, 1, 1])
112 | add_to_visualize_buffer(visualize_buffer, nearby_points, nearby_colors)
113 | # visualize
114 | visualize_points = np.concatenate(visualize_buffer["points"], axis=0)
115 | visualize_colors = np.concatenate(visualize_buffer["colors"], axis=0)
116 | self.show_pointcloud(visualize_points, visualize_colors)
117 |
118 | def visualize_path(self, path):
119 | visualize_buffer = {
120 | "points": [],
121 | "colors": []
122 | }
123 | # scene
124 | scene_points, scene_colors = self._get_scene_points_and_colors()
125 | add_to_visualize_buffer(visualize_buffer, scene_points, scene_colors)
126 | # draw curve based on poses
127 | for t in range(len(path) - 1):
128 | start = path[t][:3]
129 | end = path[t + 1][:3]
130 | num_interp_points = int(np.linalg.norm(start - end) / 0.0002)
131 | interp_points = np.linspace(start, end, num_interp_points)
132 | interp_colors = np.tile([0.0, 0.0, 0.0], (num_interp_points, 1))
133 |             # add a tint of white (later segments are whiter)
134 | whitening_coef = 0.3 + 0.5 * (t / len(path))
135 | interp_colors = (1 - whitening_coef) * interp_colors + whitening_coef * np.array([1, 1, 1])
136 | add_to_visualize_buffer(visualize_buffer, interp_points, interp_colors)
137 | # subsample path with a fixed step size
138 | step_size = 0.05
139 | subpath = [path[0]]
140 | for i in range(1, len(path) - 1):
141 | dist = np.linalg.norm(np.array(path[i][:3]) - np.array(subpath[-1][:3]))
142 | if dist > step_size:
143 | subpath.append(path[i])
144 | subpath.append(path[-1])
145 | path = np.array(subpath)
146 | path_length = path.shape[0]
147 | # path points
148 | collision_points = self.env.get_collision_points(noise=False)
149 | num_points = collision_points.shape[0]
150 | start_pose = self.env.get_ee_pose()
151 | centering_transform = np.linalg.inv(T.convert_pose_quat2mat(start_pose))
152 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3]
153 | poses_homo = T.convert_pose_quat2mat(path[:, :7]) # the last number is gripper action
154 |         transformed_collision_points = batch_transform_points(collision_points_centered, poses_homo).reshape(-1, 3) # (num_poses * num_points, 3)
155 | # calculate color based on the timestep
156 | collision_points_colors = np.ones([path_length, num_points, 3]) * self.color[None, None]
157 | for t in range(path_length):
158 | whitening_coef = 0.3 + 0.5 * (t / path_length)
159 | collision_points_colors[t] = (1 - whitening_coef) * collision_points_colors[t] + whitening_coef * np.array([1, 1, 1])
160 | collision_points_colors = collision_points_colors.reshape(-1, 3)
161 | add_to_visualize_buffer(visualize_buffer, transformed_collision_points, collision_points_colors)
162 | # keypoints
163 | keypoints = self.env.get_keypoint_positions()
164 | num_keypoints = keypoints.shape[0]
165 | color_map = matplotlib.colormaps["gist_rainbow"]
166 | keypoints_colors = [color_map(i / num_keypoints)[:3] for i in range(num_keypoints)]
167 | for i in range(num_keypoints):
168 | nearby_points = generate_nearby_points(keypoints[i], num_points_per_side=6, half_range=0.009)
169 | nearby_colors = np.tile(keypoints_colors[i], (nearby_points.shape[0], 1))
170 | nearby_colors = 0.5 * nearby_colors + 0.5 * np.array([1, 1, 1])
171 | add_to_visualize_buffer(visualize_buffer, nearby_points, nearby_colors)
172 | # visualize
173 | visualize_points = np.concatenate(visualize_buffer["points"], axis=0)
174 | visualize_colors = np.concatenate(visualize_buffer["colors"], axis=0)
175 | self.show_pointcloud(visualize_points, visualize_colors)
--------------------------------------------------------------------------------
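`visualize_subgoal` and `visualize_path` above re-pose the gripper's collision points by first expressing them in the current end-effector frame and then replaying them at every candidate pose with `batch_transform_points`. A minimal sketch of that step, assuming `utils.py` is importable; the pose and point values here are placeholders.

```python
# Sketch only: the re-posing trick used by Visualizer.visualize_subgoal / visualize_path.
import numpy as np
from utils import batch_transform_points

ee_pose_homo = np.eye(4)                        # current end-effector pose in the world frame
collision_points = np.random.rand(100, 3)       # points sampled on the gripper, in the world frame
candidate_poses = np.stack([np.eye(4)] * 5)     # candidate end-effector poses, shape (5, 4, 4)

# express the points in the end-effector frame, then replay them at every candidate pose
centering = np.linalg.inv(ee_pose_homo)
points_ee = collision_points @ centering[:3, :3].T + centering[:3, 3]
points_per_pose = batch_transform_points(points_ee, candidate_poses)   # (5, 100, 3)
print(points_per_pose.shape)
```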
/vlm_query/2024-10-14_08-17-50_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/prompt.txt:
--------------------------------------------------------------------------------
1 | ## Instructions
2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps:
3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples:
4 | - "pouring tea from teapot":
5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid"
6 | - "put red block on top of blue block":
7 |     - 2 stages: "grasp red block", "drop the red block on top of blue block"
8 | - "reorient bouquet and drop it upright into vase":
9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase"
10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples:
11 | - "pouring liquid from teapot":
12 | - "grasp teapot" stage:
13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle"
14 | - 0 path constraints
15 | - "align teapot with cup opening" stage:
16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening"
17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling"
18 | - "pour liquid" stage:
19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid"
20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening"
21 | - "put red block on top of blue block":
22 | - "grasp red block" stage:
23 | - 1 sub-goal constraints: "align the end-effector with the red block"
24 | - 0 path constraints
25 | - "drop the red block on top of blue block" stage:
26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block"
27 | - 1 path constraints: "the robot must still be grasping the red block"
28 | - "eorient bouquet and drop it upright into vase":
29 | - "grasp bouquet" stage:
30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem"
31 | - 0 path constraints
32 | - "reorient bouquet" stage:
33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)"
34 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
35 | - "keep upright and drop into vase" stage:
36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening"
37 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable.
39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable.
40 |
41 | **Note:**
42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero.
43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints.
44 | - Avoid using "if" statements in your constraints.
45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels).
46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task.
47 | - Inputs to the constraints are as follows:
48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position.
49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions.
50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint.
51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function.
52 | - For a grasping stage, you should write only one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed.
53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages.
54 | - The robot can only grasp one object at a time.
55 | - Grasping must be an independent stage from other stages.
56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis).
57 | - You may use multiple keypoints to specify a surface or volume.
58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array.
59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j`, and then calculate the norm of the difference between keypoint `i` and the `offsetted_point`.
60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints).
61 |
62 | **Structure your output in a single python code block as follows:**
63 | ```python
64 |
65 | # Your explanation of how many stages are involved in the task and what each stage is about.
66 | # ...
67 |
68 | num_stages = ?
69 |
70 | ### stage 1 sub-goal constraints (if any)
71 | def stage1_subgoal_constraint1(end_effector, keypoints):
72 | """Put your explanation here."""
73 | ...
74 | return cost
75 | # Add more sub-goal constraints if needed
76 | ...
77 |
78 | ### stage 1 path constraints (if any)
79 | def stage1_path_constraint1(end_effector, keypoints):
80 | """Put your explanation here."""
81 | ...
82 | return cost
83 | # Add more path constraints if needed
84 | ...
85 |
86 | # repeat for more stages
87 | ...
88 |
89 | """
90 | Summarize keypoints to be grasped in all grasping stages.
91 | The length of the list should be equal to the number of stages.
92 | For a grasping stage, write the keypoint index. For a non-grasping stage, write -1.
93 | """
94 | grasp_keypoints = [?, ..., ?]
95 |
96 | """
97 | Summarize at **the end of which stage** the robot should release the keypoints.
98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released if it has been grasped previously).
99 | Only release an object when it is necessary to complete the task, e.g., dropping the bouquet into the vase.
100 | The length of the list should be equal to the number of stages.
101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1.
102 | """
103 | release_keypoints = [?, ..., ?]
104 |
105 | ```
106 |
107 | ## Query
108 | Query Task: "reorient the red pen and drop it upright into the black pen holder"
109 | Query Image:
--------------------------------------------------------------------------------
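The prompt above fixes the constraint signature: each function takes `end_effector` and `keypoints` and returns a cost that must be non-positive when the constraint is satisfied. Below is a toy sketch (not VLM output) of such a constraint and of how `load_functions_from_txt` in `utils.py` would ingest it; the constraint body, file path, and dummy grasping-cost function are illustrative assumptions.

```python
# Sketch only: a toy constraint in the format the prompt asks for, loaded via utils.py.
import numpy as np
from utils import load_functions_from_txt

constraint_src = (
    "def stage1_subgoal_constraint1(end_effector, keypoints):\n"
    "    # satisfied (cost <= 0) when the end-effector is within 1 cm of keypoint 0\n"
    "    return np.linalg.norm(end_effector - keypoints[0]) - 0.01\n"
)
with open("/tmp/example_constraints.txt", "w") as f:      # throwaway file for the sketch
    f.write(constraint_src)

dummy_grasping_cost = lambda keypoint_idx: 0.0            # stands in for get_callable_grasping_cost_fn(env)
constraints = load_functions_from_txt("/tmp/example_constraints.txt", dummy_grasping_cost)

keypoints = np.zeros((10, 3))
print(constraints[0](np.array([0.0, 0.0, 0.005]), keypoints))   # -0.005 -> satisfied
```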
/vlm_query/2024-10-14_08-17-50_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/2024-10-14_08-17-50_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-24-59_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/prompt.txt:
--------------------------------------------------------------------------------
1 | ## Instructions
2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps:
3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples:
4 | - "pouring tea from teapot":
5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid"
6 | - "put red block on top of blue block":
7 |     - 2 stages: "grasp red block", "drop the red block on top of blue block"
8 | - "reorient bouquet and drop it upright into vase":
9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase"
10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples:
11 | - "pouring liquid from teapot":
12 | - "grasp teapot" stage:
13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle"
14 | - 0 path constraints
15 | - "align teapot with cup opening" stage:
16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening"
17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling"
18 | - "pour liquid" stage:
19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid"
20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening"
21 | - "put red block on top of blue block":
22 | - "grasp red block" stage:
23 | - 1 sub-goal constraints: "align the end-effector with the red block"
24 | - 0 path constraints
25 | - "drop the red block on top of blue block" stage:
26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block"
27 | - 1 path constraints: "the robot must still be grasping the red block"
28 | - "eorient bouquet and drop it upright into vase":
29 | - "grasp bouquet" stage:
30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem"
31 | - 0 path constraints
32 | - "reorient bouquet" stage:
33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)"
34 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
35 | - "keep upright and drop into vase" stage:
36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening"
37 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable.
39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable.
40 |
41 | **Note:**
42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero.
43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints.
44 | - Avoid using "if" statements in your constraints.
45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels).
46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task.
47 | - Inputs to the constraints are as follows:
48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position.
49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions.
50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint.
51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function.
52 | - For a grasping stage, you should write only one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed.
53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages.
54 | - The robot can only grasp one object at a time.
55 | - Grasping must be an independent stage from other stages.
56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis).
57 | - You may use multiple keypoints to specify a surface or volume.
58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array.
59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j`, and then calculate the norm of the difference between keypoint `i` and the `offsetted_point`.
60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints).
61 |
62 | **Structure your output in a single python code block as follows:**
63 | ```python
64 |
65 | # Your explanation of how many stages are involved in the task and what each stage is about.
66 | # ...
67 |
68 | num_stages = ?
69 |
70 | ### stage 1 sub-goal constraints (if any)
71 | def stage1_subgoal_constraint1(end_effector, keypoints):
72 | """Put your explanation here."""
73 | ...
74 | return cost
75 | # Add more sub-goal constraints if needed
76 | ...
77 |
78 | ### stage 1 path constraints (if any)
79 | def stage1_path_constraint1(end_effector, keypoints):
80 | """Put your explanation here."""
81 | ...
82 | return cost
83 | # Add more path constraints if needed
84 | ...
85 |
86 | # repeat for more stages
87 | ...
88 |
89 | """
90 | Summarize keypoints to be grasped in all grasping stages.
91 | The length of the list should be equal to the number of stages.
92 | For a grasping stage, write the keypoint index. For a non-grasping stage, write -1.
93 | """
94 | grasp_keypoints = [?, ..., ?]
95 |
96 | """
97 | Summarize at **the end of which stage** the robot should release the keypoints.
98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released if it has been grasped previously).
99 | Only release an object when it is necessary to complete the task, e.g., dropping the bouquet into the vase.
100 | The length of the list should be equal to the number of stages.
101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1.
102 | """
103 | release_keypoints = [?, ..., ?]
104 |
105 | ```
106 |
107 | ## Query
108 | Query Task: "reorient the red pen and drop it upright into the black pen holder"
109 | Query Image:
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-24-59_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/2024-10-14_08-24-59_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-26-54_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/prompt.txt:
--------------------------------------------------------------------------------
1 | ## Instructions
2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps:
3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples:
4 | - "pouring tea from teapot":
5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid"
6 | - "put red block on top of blue block":
7 |     - 2 stages: "grasp red block", "drop the red block on top of blue block"
8 | - "reorient bouquet and drop it upright into vase":
9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase"
10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples:
11 | - "pouring liquid from teapot":
12 | - "grasp teapot" stage:
13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle"
14 | - 0 path constraints
15 | - "align teapot with cup opening" stage:
16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening"
17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling"
18 | - "pour liquid" stage:
19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid"
20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening"
21 | - "put red block on top of blue block":
22 | - "grasp red block" stage:
23 | - 1 sub-goal constraints: "align the end-effector with the red block"
24 | - 0 path constraints
25 | - "drop the red block on top of blue block" stage:
26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block"
27 | - 1 path constraints: "the robot must still be grasping the red block"
28 | - "eorient bouquet and drop it upright into vase":
29 | - "grasp bouquet" stage:
30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem"
31 | - 0 path constraints
32 | - "reorient bouquet" stage:
33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)"
34 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
35 | - "keep upright and drop into vase" stage:
36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening"
37 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable.
39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable.
40 |
41 | **Note:**
42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero.
43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints.
44 | - Avoid using "if" statements in your constraints.
45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels).
46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task.
47 | - Inputs to the constraints are as follows:
48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position.
49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions.
50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint.
51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function.
52 | - For a grasping stage, you should write only one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed.
53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages.
54 | - The robot can only grasp one object at a time.
55 | - Grasping must be an independent stage from other stages.
56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis).
57 | - You may use multiple keypoints to specify a surface or volume.
58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array.
59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j`, and then calculate the norm of the difference between keypoint `i` and the `offsetted_point`.
60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints).
61 |
62 | **Structure your output in a single python code block as follows:**
63 | ```python
64 |
65 | # Your explanation of how many stages are involved in the task and what each stage is about.
66 | # ...
67 |
68 | num_stages = ?
69 |
70 | ### stage 1 sub-goal constraints (if any)
71 | def stage1_subgoal_constraint1(end_effector, keypoints):
72 | """Put your explanation here."""
73 | ...
74 | return cost
75 | # Add more sub-goal constraints if needed
76 | ...
77 |
78 | ### stage 1 path constraints (if any)
79 | def stage1_path_constraint1(end_effector, keypoints):
80 | """Put your explanation here."""
81 | ...
82 | return cost
83 | # Add more path constraints if needed
84 | ...
85 |
86 | # repeat for more stages
87 | ...
88 |
89 | """
90 | Summarize keypoints to be grasped in all grasping stages.
91 | The length of the list should be equal to the number of stages.
92 | For a grasping stage, write the keypoint index. For a non-grasping stage, write -1.
93 | """
94 | grasp_keypoints = [?, ..., ?]
95 |
96 | """
97 | Summarize at **the end of which stage** the robot should release the keypoints.
98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released if it has been grasped previously).
99 | Only release an object when it is necessary to complete the task, e.g., dropping the bouquet into the vase.
100 | The length of the list should be equal to the number of stages.
101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1.
102 | """
103 | release_keypoints = [?, ..., ?]
104 |
105 | ```
106 |
107 | ## Query
108 | Query Task: "reorient the red pen and drop it upright into the black pen holder"
109 | Query Image:
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-26-54_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/2024-10-14_08-26-54_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-39-29_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/prompt.txt:
--------------------------------------------------------------------------------
1 | ## Instructions
2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps:
3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples:
4 | - "pouring tea from teapot":
5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid"
6 | - "put red block on top of blue block":
7 |     - 2 stages: "grasp red block", "drop the red block on top of blue block"
8 | - "reorient bouquet and drop it upright into vase":
9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase"
10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples:
11 | - "pouring liquid from teapot":
12 | - "grasp teapot" stage:
13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle"
14 | - 0 path constraints
15 | - "align teapot with cup opening" stage:
16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening"
17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling"
18 | - "pour liquid" stage:
19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid"
20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening"
21 | - "put red block on top of blue block":
22 | - "grasp red block" stage:
23 | - 1 sub-goal constraints: "align the end-effector with the red block"
24 | - 0 path constraints
25 | - "drop the red block on top of blue block" stage:
26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block"
27 | - 1 path constraints: "the robot must still be grasping the red block"
28 | - "eorient bouquet and drop it upright into vase":
29 | - "grasp bouquet" stage:
30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem"
31 | - 0 path constraints
32 | - "reorient bouquet" stage:
33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)"
34 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
35 | - "keep upright and drop into vase" stage:
36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening"
37 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable.
39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable.
40 |
41 | **Note:**
42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero.
43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints.
44 | - Avoid using "if" statements in your constraints.
45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels).
46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task.
47 | - Inputs to the constraints are as follows:
48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position.
49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions.
50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint.
51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function.
52 | - For a grasping stage, you should write only one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed.
53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages.
54 | - The robot can only grasp one object at a time.
55 | - Grasping must be an independent stage from other stages.
56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis).
57 | - You may use multiple keypoints to specify a surface or volume.
58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array.
59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j`, and then calculate the norm of the difference between keypoint `i` and the `offsetted_point`.
60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints).
61 |
62 | **Structure your output in a single python code block as follows:**
63 | ```python
64 |
65 | # Your explanation of how many stages are involved in the task and what each stage is about.
66 | # ...
67 |
68 | num_stages = ?
69 |
70 | ### stage 1 sub-goal constraints (if any)
71 | def stage1_subgoal_constraint1(end_effector, keypoints):
72 | """Put your explanation here."""
73 | ...
74 | return cost
75 | # Add more sub-goal constraints if needed
76 | ...
77 |
78 | ### stage 1 path constraints (if any)
79 | def stage1_path_constraint1(end_effector, keypoints):
80 | """Put your explanation here."""
81 | ...
82 | return cost
83 | # Add more path constraints if needed
84 | ...
85 |
86 | # repeat for more stages
87 | ...
88 |
89 | """
90 | Summarize keypoints to be grasped in all grasping stages.
91 | The length of the list should be equal to the number of stages.
92 | For a grasping stage, write the keypoint index. For a non-grasping stage, write -1.
93 | """
94 | grasp_keypoints = [?, ..., ?]
95 |
96 | """
97 | Summarize at **the end of which stage** the robot should release the keypoints.
98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released if it has been grasped previously).
99 | Only release an object when it is necessary to complete the task, e.g., dropping the bouquet into the vase.
100 | The length of the list should be equal to the number of stages.
101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1.
102 | """
103 | release_keypoints = [?, ..., ?]
104 |
105 | ```
106 |
107 | ## Query
108 | Query Task: "reorient the red pen and drop it upright into the black pen holder"
109 | Query Image:
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-39-29_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/2024-10-14_08-39-29_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/metadata.json:
--------------------------------------------------------------------------------
1 | {"init_keypoint_positions": [[-0.3241568895701906, -0.26135710633859177, 0.8729118042800674], [-0.13032823512207833, -0.21448098577272112, 0.7109137537982642], [0.09161993232061272, -0.18090774877203772, 0.7072121106658631], [-0.1458522626378715, -0.1448651475409916, 0.7143060215345731], [-0.13071311749231349, -0.08189964351178469, 0.7158838367022394], [0.09358737688281993, -0.0534993640009152, 0.7095674875311597], [-0.2694544418102572, 0.08538554046925578, 0.7174613392959628], [-0.3588037694448534, 0.1895498661553155, 0.7242495959791785], [-0.22739163159972983, 0.16978317185973382, 0.809730735390318], [-0.2788102970880201, 0.2161113552597725, 0.7597332230457459]], "num_keypoints": 10, "num_stages": 3, "grasp_keypoints": [1, -1, -1], "release_keypoints": [-1, -1, 1]}
--------------------------------------------------------------------------------
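A minimal sketch of how one of these metadata files might be loaded and sanity-checked; the checks below are inferred from the file format itself (one grasp/release entry per stage, one position per keypoint) and are not taken from the repository's own loading code.

```python
import json
import numpy as np

# Any vlm_query folder containing a metadata.json works the same way.
folder = "vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder"
with open(f"{folder}/metadata.json") as f:
    meta = json.load(f)

keypoints = np.array(meta["init_keypoint_positions"])  # world-frame positions, shape (K, 3)

# Consistency implied by the format.
assert keypoints.shape == (meta["num_keypoints"], 3)
assert len(meta["grasp_keypoints"]) == meta["num_stages"]
assert len(meta["release_keypoints"]) == meta["num_stages"]

# A keypoint released at the end of a stage must have been grasped in that stage or earlier.
for stage_idx, kp in enumerate(meta["release_keypoints"]):
    if kp != -1:
        assert kp in meta["grasp_keypoints"][: stage_idx + 1]

print(meta["num_stages"], "stages,", meta["num_keypoints"], "keypoints")
```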
/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/output_raw.txt:
--------------------------------------------------------------------------------
1 | ```python
2 | # The task involves 3 stages:
3 | # 1. Grasp the red pen.
4 | # 2. Reorient the red pen to be upright.
5 | # 3. Drop the red pen upright into the black pen holder.
6 |
7 | num_stages = 3
8 |
9 | ### stage 1 sub-goal constraints
10 | def stage1_subgoal_constraint1(end_effector, keypoints):
11 | """Align the end-effector with the red pen (keypoint 1)."""
12 | return np.linalg.norm(end_effector - keypoints[1])
13 |
14 | ### stage 1 path constraints
15 | # No path constraints needed for grasping.
16 |
17 | ### stage 2 sub-goal constraints
18 | def stage2_subgoal_constraint1(end_effector, keypoints):
19 | """The red pen is upright (parallel to the z-axis)."""
20 | pen_vector = keypoints[3] - keypoints[1]
21 | z_axis = np.array([0, 0, 1])
22 | return np.arccos(np.dot(pen_vector, z_axis) / (np.linalg.norm(pen_vector) * np.linalg.norm(z_axis)))
23 |
24 | ### stage 2 path constraints
25 | def stage2_path_constraint1(end_effector, keypoints):
26 | """The robot must still be grasping the red pen."""
27 | return get_grasping_cost_by_keypoint_idx(1)
28 |
29 | ### stage 3 sub-goal constraints
30 | def stage3_subgoal_constraint1(end_effector, keypoints):
31 | """The red pen must still stay upright (parallel to the z-axis)."""
32 | pen_vector = keypoints[3] - keypoints[1]
33 | z_axis = np.array([0, 0, 1])
34 | return np.arccos(np.dot(pen_vector, z_axis) / (np.linalg.norm(pen_vector) * np.linalg.norm(z_axis)))
35 |
36 | def stage3_subgoal_constraint2(end_effector, keypoints):
37 | """The red pen is 10cm above the pen holder opening."""
38 | pen_tip = keypoints[1]
39 | holder_center = (keypoints[6] + keypoints[7] + keypoints[8] + keypoints[9]) / 4
40 | offsetted_point = holder_center + np.array([0, 0, 0.1])
41 | return np.linalg.norm(pen_tip - offsetted_point)
42 |
43 | ### stage 3 path constraints
44 | def stage3_path_constraint1(end_effector, keypoints):
45 | """The robot must still be grasping the red pen."""
46 | return get_grasping_cost_by_keypoint_idx(1)
47 |
48 | """
49 | Summarize keypoints to be grasped in all grasping stages.
50 | The length of the list should be equal to the number of stages.
51 | For grasping stage, write the keypoint index. For non-grasping stage, write -1.
52 | """
53 | grasp_keypoints = [1, -1, -1]
54 |
55 | """
56 | Summarize at **the end of which stage** the robot should release the keypoints.
57 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released only if it has been grasped previously).
58 | Only release object when it's necessary to complete the task, e.g., drop bouquet in the vase.
59 | The length of the list should be equal to the number of stages.
60 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1.
61 | """
62 | release_keypoints = [-1, -1, 1]
63 | ```
--------------------------------------------------------------------------------
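One plausible way to consume such a raw response is to strip the code fences, execute the remaining Python with `np` and a stub `get_grasping_cost_by_keypoint_idx` injected, and evaluate each generated constraint on the keypoints recorded in `metadata.json`. This is only a sketch under those assumptions; the repository's own parsing and execution (e.g., in `constraint_generation.py`) may handle this differently.

```python
import json
import re
import numpy as np

folder = "vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder"
raw = open(f"{folder}/output_raw.txt").read()
meta = json.load(open(f"{folder}/metadata.json"))

# Remove the surrounding ```python fences so the remainder is plain Python source.
code = re.sub(r"```(python)?", "", raw)

def get_grasping_cost_by_keypoint_idx(i):
    """Stub: the real cost depends on simulator state; treat grasping as satisfied here."""
    return 0.0

namespace = {"np": np, "get_grasping_cost_by_keypoint_idx": get_grasping_cost_by_keypoint_idx}
exec(code, namespace)

keypoints = np.array(meta["init_keypoint_positions"])
end_effector = keypoints[1]  # pretend the gripper sits at the grasp keypoint

# Evaluate every generated constraint on the initial keypoint positions.
for name in sorted(namespace):
    if "constraint" in name and callable(namespace[name]):
        print(f"{name}: cost = {namespace[name](end_effector, keypoints):.4f}")
```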
/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/prompt.txt:
--------------------------------------------------------------------------------
1 | ## Instructions
2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlaid with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps:
3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples:
4 | - "pouring tea from teapot":
5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid"
6 | - "put red block on top of blue block":
7 |     - 2 stages: "grasp red block" and "drop the red block on top of blue block"
8 | - "reorient bouquet and drop it upright into vase":
9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase"
10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples:
11 | - "pouring liquid from teapot":
12 | - "grasp teapot" stage:
13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle"
14 | - 0 path constraints
15 | - "align teapot with cup opening" stage:
16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening"
17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling"
18 | - "pour liquid" stage:
19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid"
20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening"
21 | - "put red block on top of blue block":
22 | - "grasp red block" stage:
23 | - 1 sub-goal constraints: "align the end-effector with the red block"
24 | - 0 path constraints
25 | - "drop the red block on top of blue block" stage:
26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block"
27 | - 1 path constraints: "the robot must still be grasping the red block"
28 |   - "reorient bouquet and drop it upright into vase":
29 | - "grasp bouquet" stage:
30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem"
31 | - 0 path constraints
32 | - "reorient bouquet" stage:
33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)"
34 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
35 | - "keep upright and drop into vase" stage:
36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening"
37 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable.
39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable.
40 |
41 | **Note:**
42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero.
43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints.
44 | - Avoid using "if" statements in your constraints.
45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels).
46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task.
47 | - Inputs to the constraints are as follows:
48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position.
49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions.
50 | - For any path constraint that requires the robot to still be grasping keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)`, where `i` is the index of the keypoint.
51 | - Inside each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function.
52 | - For the grasping stage, you should write only one sub-goal constraint, which associates the end-effector with a keypoint. No path constraints are needed.
53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages.
54 | - The robot can only grasp one object at a time.
55 | - Grasping must be an independent stage from other stages.
56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis).
57 | - You may use multiple keypoints to specify a surface or volume.
58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array.
59 | - For a point `i` to be positioned relative to another point `j`, the function should define an `offsetted_point` variable that is keypoint `j` plus the desired delta, and then calculate the norm of the difference between keypoint `i` and the `offsetted_point`.
60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints).
61 |
62 | **Structure your output in a single python code block as follows:**
63 | ```python
64 |
65 | # Your explanation of how many stages are involved in the task and what each stage is about.
66 | # ...
67 |
68 | num_stages = ?
69 |
70 | ### stage 1 sub-goal constraints (if any)
71 | def stage1_subgoal_constraint1(end_effector, keypoints):
72 | """Put your explanation here."""
73 | ...
74 | return cost
75 | # Add more sub-goal constraints if needed
76 | ...
77 |
78 | ### stage 1 path constraints (if any)
79 | def stage1_path_constraint1(end_effector, keypoints):
80 | """Put your explanation here."""
81 | ...
82 | return cost
83 | # Add more path constraints if needed
84 | ...
85 |
86 | # repeat for more stages
87 | ...
88 |
89 | """
90 | Summarize keypoints to be grasped in all grasping stages.
91 | The length of the list should be equal to the number of stages.
92 | For a grasping stage, write the keypoint index. For a non-grasping stage, write -1.
93 | """
94 | grasp_keypoints = [?, ..., ?]
95 |
96 | """
97 | Summarize at **the end of which stage** the robot should release the keypoints.
98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released if it has been grasped previously).
99 | Only release an object when it is necessary to complete the task, e.g., dropping the bouquet into the vase.
100 | The length of the list should be equal to the number of stages.
101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1.
102 | """
103 | release_keypoints = [?, ..., ?]
104 |
105 | ```
106 |
107 | ## Query
108 | Query Task: "reorient the red pen and drop it upright into the black pen holder"
109 | Query Image:
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/stage1_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage1_subgoal_constraint1(end_effector, keypoints):
2 | """Align the end-effector with the red pen (keypoint 1)."""
3 | return np.linalg.norm(end_effector - keypoints[1])
4 |
5 |
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/stage2_path_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage2_path_constraint1(end_effector, keypoints):
2 | """The robot must still be grasping the red pen."""
3 | return get_grasping_cost_by_keypoint_idx(1)
4 |
5 |
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/stage2_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage2_subgoal_constraint1(end_effector, keypoints):
2 | """The red pen is upright (parallel to the z-axis)."""
3 | pen_vector = keypoints[3] - keypoints[1]
4 | z_axis = np.array([0, 0, 1])
5 | return np.arccos(np.dot(pen_vector, z_axis) / (np.linalg.norm(pen_vector) * np.linalg.norm(z_axis)))
6 |
7 |
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/stage3_path_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage3_path_constraint1(end_effector, keypoints):
2 | """The robot must still be grasping the red pen."""
3 | return get_grasping_cost_by_keypoint_idx(1)
4 |
5 |
--------------------------------------------------------------------------------
/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/stage3_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage3_subgoal_constraint1(end_effector, keypoints):
2 | """The red pen must still stay upright (parallel to the z-axis)."""
3 | pen_vector = keypoints[3] - keypoints[1]
4 | z_axis = np.array([0, 0, 1])
5 | return np.arccos(np.dot(pen_vector, z_axis) / (np.linalg.norm(pen_vector) * np.linalg.norm(z_axis)))
6 |
7 | def stage3_subgoal_constraint2(end_effector, keypoints):
8 | """The red pen is 10cm above the pen holder opening."""
9 | pen_tip = keypoints[1]
10 | holder_center = (keypoints[6] + keypoints[7] + keypoints[8] + keypoints[9]) / 4
11 | offsetted_point = holder_center + np.array([0, 0, 0.1])
12 | return np.linalg.norm(pen_tip - offsetted_point)
13 |
14 |
--------------------------------------------------------------------------------
/vlm_query/pen/metadata.json:
--------------------------------------------------------------------------------
1 | {"init_keypoint_positions": [[-0.25794319243703623, -0.2360844662773828, 0.6905476882082092], [-0.2671639177617655, -0.09433527815132495, 0.7122930549896219], [-0.27151234338495983, 0.08395197554366488, 0.7154432798183405], [-0.266379743453929, 0.08175199572534772, 0.8097122583174985], [-0.3607951132392543, 0.18610731832223598, 0.7178293666264162], [-0.2624312374613168, 0.195909440226552, 0.6867906392498943], [-0.31782969861483357, 0.21783651249007482, 0.7905845242493158]], "num_keypoints": 7, "num_stages": 3, "grasp_keypoints": [1, -1, -1], "release_keypoints": [-1, -1, 1]}
--------------------------------------------------------------------------------
/vlm_query/pen/query_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/pen/query_img.png
--------------------------------------------------------------------------------
/vlm_query/pen/stage1_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage1_subgoal_constraint1(end_effector, keypoints):
2 | """Align the end-effector with the white pen's grasping point (keypoint 1)."""
3 | grasp_point = keypoints[1]
4 | cost = np.linalg.norm(end_effector - grasp_point)
5 | return cost
6 |
7 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage2_path_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage2_path_constraint1(end_effector, keypoints):
2 | """The robot must still be grasping the white pen (keypoint 1)."""
3 | return get_grasping_cost_by_keypoint_idx(1)
4 |
5 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage2_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage2_subgoal_constraint1(end_effector, keypoints):
2 | """Ensure the white pen is upright by aligning the vector from keypoint 0 to keypoint 1 with the z-axis."""
3 | pen_vector = keypoints[1] - keypoints[0]
4 | z_axis = np.array([0, 0, 1])
5 | cost = np.linalg.norm(np.cross(pen_vector, z_axis))
6 | return cost
7 |
8 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage3_path_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage3_path_constraint1(end_effector, keypoints):
2 | """The robot must still be grasping the white pen (keypoint 1)."""
3 | return get_grasping_cost_by_keypoint_idx(1)
4 |
5 |
--------------------------------------------------------------------------------
/vlm_query/pen/stage3_subgoal_constraints.txt:
--------------------------------------------------------------------------------
1 | def stage3_subgoal_constraint1(end_effector, keypoints):
2 | """Ensure the white pen is above the black pen holder opening (mean of keypoints 3, 4, 5, 6) and aligned with the z-axis."""
3 | holder_opening = np.mean(keypoints[3:7], axis=0)
4 | above_holder = holder_opening + np.array([0, 0, 0.2]) # 20cm above the holder opening
5 | cost = np.linalg.norm(keypoints[1] - above_holder)
6 | return cost
7 |
8 | def stage3_subgoal_constraint2(end_effector, keypoints):
9 | """Ensure the white pen is upright by aligning the vector from keypoint 0 to keypoint 1 with the z-axis."""
10 | pen_vector = keypoints[1] - keypoints[0]
11 | z_axis = np.array([0, 0, 1])
12 | cost = np.linalg.norm(np.cross(pen_vector, z_axis))
13 | return cost
14 |
15 |
--------------------------------------------------------------------------------
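The two query folders express "upright" differently: the 08-44-42 output returns the angle to the z-axis via `np.arccos`, while the `pen` constraints return the norm of the cross product with the z-axis. Both costs go to zero as the pen vector becomes parallel to the axis (the cross-product form also scales with the vector's length and is likewise zero for an anti-parallel vector). A small comparison sketch, with made-up vectors:

```python
import numpy as np

Z_AXIS = np.array([0.0, 0.0, 1.0])

def angle_cost(vec):
    """Angle to the z-axis in radians, as in the 08-44-42 stage-2/3 sub-goal constraints."""
    cos_angle = np.dot(vec, Z_AXIS) / (np.linalg.norm(vec) * np.linalg.norm(Z_AXIS))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def cross_cost(vec):
    """Norm of the cross product with the z-axis, as in the pen/ stage-2/3 constraints."""
    return np.linalg.norm(np.cross(vec, Z_AXIS))

# Illustrative pen-axis vectors: lying roughly flat vs. standing upright.
for label, vec in [("flat", np.array([0.12, 0.02, 0.01])),
                   ("upright", np.array([0.0, 0.0, 0.13]))]:
    print(f"{label:8s} angle_cost={angle_cost(vec):.3f}  cross_cost={cross_cost(vec):.3f}")
```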
/vlm_query/prompt_template.txt:
--------------------------------------------------------------------------------
1 | ## Instructions
2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlaid with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps:
3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples:
4 | - "pouring tea from teapot":
5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid"
6 | - "put red block on top of blue block":
7 |     - 2 stages: "grasp red block" and "drop the red block on top of blue block"
8 | - "reorient bouquet and drop it upright into vase":
9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase"
10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples:
11 | - "pouring liquid from teapot":
12 | - "grasp teapot" stage:
13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle"
14 | - 0 path constraints
15 | - "align teapot with cup opening" stage:
16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening"
17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling"
18 | - "pour liquid" stage:
19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid"
20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening"
21 | - "put red block on top of blue block":
22 | - "grasp red block" stage:
23 | - 1 sub-goal constraints: "align the end-effector with the red block"
24 | - 0 path constraints
25 | - "drop the red block on top of blue block" stage:
26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block"
27 | - 1 path constraints: "the robot must still be grasping the red block"
28 |   - "reorient bouquet and drop it upright into vase":
29 | - "grasp bouquet" stage:
30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem"
31 | - 0 path constraints
32 | - "reorient bouquet" stage:
33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)"
34 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
35 | - "keep upright and drop into vase" stage:
36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening"
37 | - 1 path constraints: "the robot must still be grasping the bouquet stem"
38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable.
39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable.
40 |
41 | **Note:**
42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero.
43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints.
44 | - Avoid using "if" statements in your constraints.
45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels).
46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task.
47 | - Inputs to the constraints are as follows:
48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position.
49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions.
50 | - For any path constraint that requires the robot to still be grasping keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)`, where `i` is the index of the keypoint.
51 | - Inside each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function.
52 | - For the grasping stage, you should write only one sub-goal constraint, which associates the end-effector with a keypoint. No path constraints are needed.
53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages.
54 | - The robot can only grasp one object at a time.
55 | - Grasping must be an independent stage from other stages.
56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis).
57 | - You may use multiple keypoints to specify a surface or volume.
58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array.
59 | - For a point `i` to be positioned relative to another point `j`, the function should define an `offsetted_point` variable that is keypoint `j` plus the desired delta, and then calculate the norm of the difference between keypoint `i` and the `offsetted_point`.
60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints).
61 |
62 | **Structure your output in a single python code block as follows:**
63 | ```python
64 |
65 | # Your explanation of how many stages are involved in the task and what each stage is about.
66 | # ...
67 |
68 | num_stages = ?
69 |
70 | ### stage 1 sub-goal constraints (if any)
71 | def stage1_subgoal_constraint1(end_effector, keypoints):
72 | """Put your explanation here."""
73 | ...
74 | return cost
75 | # Add more sub-goal constraints if needed
76 | ...
77 |
78 | ### stage 1 path constraints (if any)
79 | def stage1_path_constraint1(end_effector, keypoints):
80 | """Put your explanation here."""
81 | ...
82 | return cost
83 | # Add more path constraints if needed
84 | ...
85 |
86 | # repeat for more stages
87 | ...
88 |
89 | """
90 | Summarize keypoints to be grasped in all grasping stages.
91 | The length of the list should be equal to the number of stages.
92 | For a grasping stage, write the keypoint index. For a non-grasping stage, write -1.
93 | """
94 | grasp_keypoints = [?, ..., ?]
95 |
96 | """
97 | Summarize at **the end of which stage** the robot should release the keypoints.
98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released if it has been grasped previously).
99 | Only release an object when it is necessary to complete the task, e.g., dropping the bouquet into the vase.
100 | The length of the list should be equal to the number of stages.
101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1.
102 | """
103 | release_keypoints = [?, ..., ?]
104 |
105 | ```
106 |
107 | ## Query
108 | Query Task: "{instruction}"
109 | Query Image:
--------------------------------------------------------------------------------
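The `{instruction}` placeholder at the end of the template is presumably substituted with the task string before the query is sent, which is consistent with the per-run `prompt.txt` files above ending in the concrete task text. A minimal sketch of that substitution, using `str.replace` as an assumption rather than whatever mechanism the pipeline actually uses:

```python
# Minimal sketch of filling the template's {instruction} placeholder; the repository's
# own query-building code may differ.
with open("vlm_query/prompt_template.txt") as f:
    template = f.read()

instruction = "reorient the red pen and drop it upright into the black pen holder"
prompt = template.replace("{instruction}", instruction)

# The tail of the filled prompt now matches the "## Query" section of the per-run prompt.txt files.
print(prompt[-300:])
```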