├── .gitignore ├── README.md ├── configs ├── config.yaml ├── og_scene_file_pen.json └── og_scene_file_red_pen.json ├── constraint_generation.py ├── dinov2_vits14_pretrain.pth ├── environment.py ├── ik_solver.py ├── keypoint_proposal.py ├── main.py ├── media ├── pen-in-holder-disturbances.gif └── pen-in-holder.gif ├── og_utils.py ├── path_solver.py ├── requirements.txt ├── subgoal_solver.py ├── transform_utils.py ├── utils.py ├── videos ├── 2024-10-12-09-29-53.mp4 ├── 2024-10-14-02-04-49.mp4 ├── 2024-10-14-03-26-38.mp4 ├── 2024-10-14-03-42-23.mp4 ├── 2024-10-14-04-16-18.mp4 ├── 2024-10-14-04-23-14.mp4 └── 2024-10-14-08-46-17.mp4 ├── visualizer.py └── vlm_query ├── 2024-10-14_08-17-50_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder ├── prompt.txt └── query_img.png ├── 2024-10-14_08-24-59_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder ├── prompt.txt └── query_img.png ├── 2024-10-14_08-26-54_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder ├── prompt.txt └── query_img.png ├── 2024-10-14_08-39-29_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder ├── prompt.txt └── query_img.png ├── 2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder ├── metadata.json ├── output_raw.txt ├── prompt.txt ├── query_img.png ├── stage1_subgoal_constraints.txt ├── stage2_path_constraints.txt ├── stage2_subgoal_constraints.txt ├── stage3_path_constraints.txt └── stage3_subgoal_constraints.txt ├── pen ├── metadata.json ├── query_img.png ├── stage1_subgoal_constraints.txt ├── stage2_path_constraints.txt ├── stage2_subgoal_constraints.txt ├── stage3_path_constraints.txt └── stage3_subgoal_constraints.txt └── prompt_template.txt /.gitignore: -------------------------------------------------------------------------------- 1 | *cache* 2 | **/__pycache__/** 3 | **/*.egg-info/ 4 | *.egg 5 | 6 | ./videos 7 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation 2 | 3 | #### [[Project Page]](https://rekep-robot.github.io/) [[Paper]](https://rekep-robot.github.io/rekep.pdf) [[Video]](https://youtu.be/2S8YhBdLdww) 4 | 5 | [Wenlong Huang](https://wenlong.page)1, [Chen Wang](https://www.chenwangjeremy.net/)1*, [Yunzhu Li](https://yunzhuli.github.io/)2*, [Ruohan Zhang](https://ai.stanford.edu/~zharu/)1, [Li Fei-Fei](https://profiles.stanford.edu/fei-fei-li)1 (\* indicates equal contributions) 6 | 7 | 1Stanford University, 3Columbia University 8 | 9 | 10 | 11 | This is the official demo code for [ReKep](https://rekep-robot.github.io/) implemented in [OmniGibson](https://behavior.stanford.edu/omnigibson/index.html). ReKep is a method that uses large vision models and vision-language models in a hierarchical optimization framework to generate closed-loop trajectories for manipulation tasks. 12 | 13 | 14 | ## Setup Instructions 15 | 16 | Note that this codebase is best run with a display. For running in headless mode, refer to the [instructions in OmniGibson](https://behavior.stanford.edu/omnigibson/getting_started/installation.html). 17 | 18 | - Install [OmniGibson](https://behavior.stanford.edu/omnigibson/getting_started/installation.html). This code is tested on [this commit](https://github.com/StanfordVL/OmniGibson/tree/cc0316a0574018a3cb2956fcbff3be75c07cdf0f). 
19 | 20 | NOTE: If you encounter the warning `We did not find Isaac Sim under ~/.local/share/ov/pkg.` when running `./scripts/setup.sh` for OmniGibson, first ensure that you have installed Isaac Sim. If Isaac Sim is installed in the default directory, provide the following path `/home/[USERNAME]/.local/share/ov/pkg/isaac-sim-2023.1.1` (replace `[USERNAME]` with your username). 21 | 22 | - Install ReKep in the same conda environment: 23 | ```Shell 24 | conda activate omnigibson 25 | cd .. 26 | git clone https://github.com/huangwl18/ReKep.git 27 | cd ReKep 28 | pip install -r requirements.txt 29 | ``` 30 | 31 | - Obtain an [OpenAI API](https://openai.com/blog/openai-api) key and set it up as an environment variable: 32 | ```Shell 33 | export OPENAI_API_KEY="YOUR_OPENAI_API_KEY" 34 | ``` 35 | 36 | ## Running Demo 37 | 38 | We provide a demo "pen-in-holder" task that illustrates the core idea in ReKep. Below we provide several options to run the demo. 39 | 40 | Notes: 41 | - An additional `--visualize` flag may be added to visualize every solution from optimization, but since the pipeline needs to repeatedly solve optimization problems, the visualization is blocking and needs to be closed every time in order to continue (by pressing "ESC"). 42 | - Due to non-determinism in the physics simulator, different runs with the same random seed may produce different results. It is possible that the robot may fail at the provided task, especially when external disturbances are applied. In this case, we recommend running the demo again. 43 | 44 | ### Demo with Cached Query 45 | 46 | We recommend starting with the cached VLM query. 47 | 48 | ```Shell 49 | python main.py --use_cached_query [--visualize] 50 | ``` 51 | 52 | A video will be saved to `./videos` by default. 53 | 54 | 55 | 56 | ### Demo with External Disturbances 57 | 58 | Since ReKep acts as a closed-loop policy, it is robust to disturbances, with automatic failure recovery both within stages and across stages. To demonstrate this in simulation, we apply the following disturbances for the "pen-in-holder" task: 59 | 60 | - Move the pen when the robot is trying to grasp the pen 61 | 62 | - Take the pen out of the gripper when the robot is trying to reorient the pen 63 | 64 | - Move the holder when the robot is trying to drop the pen into the holder 65 | 66 | 67 | 68 | Note that since the disturbances are pre-defined, we recommend running with the cached query. 69 | 70 | ```Shell 71 | python main.py --use_cached_query --apply_disturbance [--visualize] 72 | ``` 73 | ### Demo with Live Query 74 | 75 | The following script can be run to query the VLM for a new sequence of ReKep constraints and execute them on the robot: 76 | 77 | ```Shell 78 | python main.py [--visualize] 79 | ``` 80 | 81 | ## Setup New Environments 82 | Leveraging the diverse objects and scenes provided by [BEHAVIOR-1K](https://behavior.stanford.edu/) in [OmniGibson](https://behavior.stanford.edu/omnigibson/index.html), new tasks and scenes can be easily configured. To change the objects, you may check out the available objects as part of the BEHAVIOR assets on this [page](https://behavior.stanford.edu/knowledgebase/objects/index.html) (click on each object instance to view its visualization). After identifying the objects, we recommend making a copy of the JSON scene file `./configs/og_scene_file_pen.json` and editing the `state` and `objects_info` fields accordingly. Remember that the scene file needs to be supplied to the `Main` class at initialization (see the sketch below).
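For reference, a minimal sketch of wiring a new scene into the entry point, assuming you have copied `og_scene_file_pen.json` to a new file (the file name and task instruction below are hypothetical placeholders):

```python
# Minimal sketch (not part of the repo): construct Main with a custom scene file.
# "og_scene_file_new_task.json" and the instruction string are placeholders.
from main import Main

scene_file = "./configs/og_scene_file_new_task.json"  # copy of og_scene_file_pen.json with edited `state` / `objects_info`
task = Main(scene_file, visualize=False)               # Main(scene_file, visualize=...) as defined in main.py
task.perform_task(instruction="drop the marker into the mug")
```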
Additional [scenes](https://behavior.stanford.edu/knowledgebase/scenes/index.html) and [robots](https://behavior.stanford.edu/omnigibson/getting_started/examples.html#robots) provided by BEHAVIOR-1K may also be used, but they are currently untested. 83 | 84 | ## Real-World Deployment 85 | To deploy ReKep in the real world, most changes should only be needed inside `environment.py`. Specifically, all of the "exposed functions" need to be reimplemented for the real-world environment. The following components need to be implemented: 86 | 87 | - **Robot Controller**: Our real-world implementation uses the joint impedance controller from [Deoxys](https://github.com/UT-Austin-RPL/deoxys_control) for our Franka setup. Specifically, when `execute_action` in `environment.py` receives a target end-effector pose, we first calculate IK to obtain the target joint positions and send the command to the low-level controller. 88 | - **Keypoint Tracker**: Keypoints need to be tracked in order to perform closed-loop replanning, and this typically needs to be achieved using RGB-D cameras. Our real-world implementation uses similarity matching of [DINOv2](https://github.com/facebookresearch/dinov2) features calculated from multiple RGB-D cameras to track the keypoints (details may be found in the [paper](https://rekep-robot.github.io/rekep.pdf) appendix). Alternatively, we also recommend trying out specialized point trackers, such as [\[1\]](https://github.com/henry123-boy/SpaTracker), [\[2\]](https://github.com/google-deepmind/tapnet), [\[3\]](https://github.com/facebookresearch/co-tracker), and [\[4\]](https://github.com/aharley/pips2). 89 | - **SDF Reconstruction**: In order to avoid collision with irrelevant objects or the table, an SDF voxel grid of the environment needs to be provided to the solvers. Additionally, the SDF should ignore the robot arm and any grasped objects. Our real-world implementation uses [nvblox_torch](https://github.com/NVlabs/nvblox_torch) for ESDF reconstruction, [cuRobo](https://github.com/NVlabs/curobo) for segmenting the robot arm, and [Cutie](https://github.com/hkchengrex/Cutie) for object mask tracking. 90 | - **(Optional) Consistency Cost**: If closed-loop replanning is desired, we find it helpful to include a consistency cost in the solver to encourage the new solution to be close to the previous one (more details can be found in the [paper](https://rekep-robot.github.io/rekep.pdf) appendix). 91 | - **(Optional) Grasp Metric or Grasp Detector**: We include a cost that encourages a top-down grasp pose in this codebase, in addition to the collision avoidance cost and the ReKep constraint (for identifying the grasp keypoint), which collectively identify the 6 DoF grasp pose. Alternatively, other grasp metrics can be included, such as force closure. Our real-world implementation instead uses the grasp detector [AnyGrasp](https://github.com/graspnet/anygrasp_sdk), which is implemented as a special routine because it is too slow to be used as an optimizable cost. 92 | 93 | Since there are several components in the pipeline, running them sequentially in the real world may be too slow. As a result, we recommend running the following compute-intensive components in separate processes in addition to the main process that runs `main.py`: `subgoal_solver`, `path_solver`, `keypoint_tracker`, `sdf_reconstruction`, `mask_tracker`, and `grasp_detector` (if used). A rough sketch of this process layout is shown below.
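For illustration only, here is a minimal sketch of that layout using Python's `multiprocessing`; the worker function, queue-based message format, and field names are assumptions, not part of this codebase:

```python
# Illustrative sketch (assumptions, not repo code): run a heavy solver in a
# worker process and exchange requests/results with the main loop via queues.
import multiprocessing as mp

def subgoal_worker(request_q, result_q):
    # In a real setup, construct the actual solver (e.g., SubgoalSolver) here,
    # so its initialization cost is paid once inside the worker process.
    while True:
        req = request_q.get()
        if req is None:          # None is used as a shutdown signal
            break
        # Placeholder "solve": echo the request id with a dummy result.
        result_q.put({"request_id": req["request_id"], "subgoal_pose": None})

if __name__ == "__main__":
    request_q, result_q = mp.Queue(), mp.Queue()
    worker = mp.Process(target=subgoal_worker, args=(request_q, result_q), daemon=True)
    worker.start()
    # The main loop (tracking keypoints, stepping the robot) would enqueue a
    # request whenever a new subgoal is needed and poll for the latest result.
    request_q.put({"request_id": 0, "keypoints": [], "ee_pose": None})
    print(result_q.get())
    request_q.put(None)          # shut the worker down
    worker.join()
```

In practice, each solver would run in its own worker and the main process would keep only the freshest result, discarding stale ones.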
94 | 95 | ## Known Limitations 96 | - **Prompt Tuning**: Since ReKep relies on VLMs to generate code-based constraints to solve for the behaviors of the robot, it is sensitive to the specific VLM used and the prompts given to the VLM. Although visual prompting is used, typically we find that the prompts do not necessarily need to contain image-text examples or code examples, and pure-text high-level instructions can go a long way with the latest VLMs such as `GPT-4o`. As a result, when starting with a new domain and if you observe that the default prompt is failing, we recommend the following steps: 1) pick a few representative tasks in the domain for validation purposes, 2) procedurally update the prompt with high-level text examples and instructions, and 3) test the prompt by checking the text output and return to step 2 if needed. 97 | 98 | - **Performance Tuning**: For clarity purposes, the entire pipeline is run sequentially, so the latency introduced by the simulator and the solvers gets compounded. If this is a concern, we recommend running compute-intensive components, such as the simulator, the `subgoal_solver`, and the `path_solver`, in separate processes, but concurrency needs to be handled with care. More discussion can be found in the "Real-World Deployment" section. To tune the solvers, note that the `objective` function typically takes the majority of the time, and among the different costs, the reachability cost computed by the IK solver is typically the most expensive. Depending on the task, you may reduce `sampling_maxfun` and `maxiter` in `configs/config.yaml` or disable the reachability cost. 99 | 100 | - **Task-Space Planning**: Since the current pipeline performs planning in the task space (i.e., solving for end-effector poses) instead of the joint space, it may occasionally produce actions that are kinematically challenging for the robot to achieve, especially for tasks that require 6 DoF motions. 101 | 102 | ## Troubleshooting 103 | 104 | For issues related to OmniGibson, please raise an issue [here](https://github.com/StanfordVL/OmniGibson/issues). You are also welcome to join the [Discord](https://discord.com/invite/bccR5vGFEx) channel for timely support. 105 | 106 | For other issues related to the code in this repo, feel free to open an issue here and we will try to address it as time permits.
-------------------------------------------------------------------------------- /configs/config.yaml: -------------------------------------------------------------------------------- 1 | device: &device 'cuda' 2 | seed: &seed 0 3 | 4 | main: 5 | interpolate_pos_step_size: &interpolate_pos_step_size 0.05 # controls the density of the final returned path 6 | interpolate_rot_step_size: &interpolate_rot_step_size 0.34 # about 20 degrees 7 | grasp_depth: 0.10 8 | constraint_tolerance: 0.10 # for backtracking 9 | bounds_min: &bounds_min [-0.45, -0.75, 0.698] 10 | bounds_max: &bounds_max [0.10, 0.60, 1.2] 11 | sdf_voxel_size: 0.01 12 | vlm_camera: 0 13 | action_steps_per_iter: 5 14 | seed: *seed 15 | 16 | env: 17 | video_cache_size: 2000 18 | og_sim: 19 | physics_frequency: 60 20 | action_frequency: 15 21 | 22 | scene: 23 | name: Rs_int 24 | type: InteractiveTraversableScene 25 | scene_model: Rs_int 26 | 27 | bounds_min: *bounds_min 28 | bounds_max: *bounds_max 29 | interpolate_pos_step_size: *interpolate_pos_step_size 30 | interpolate_rot_step_size: *interpolate_rot_step_size 31 | 32 | robot: 33 | robot_config: 34 | name: Fetch 35 | type: Fetch 36 | obs_modalities: [rgb, depth] 37 | action_modalities: continuous 38 | action_normalize: False 39 | position: [-0.8, 0.0, 0.] 40 | grasping_mode: assisted 41 | 42 | controller_config: 43 | base: 44 | name: DifferentialDriveController 45 | arm_0: 46 | name: OperationalSpaceController 47 | kp: 250 48 | kp_limits: [50, 400] 49 | damping_ratio: 0.6 50 | gripper_0: 51 | name: MultiFingerGripperController 52 | command_input_limits: [0.0, 1.0] 53 | mode: smooth 54 | camera: 55 | name: JointController 56 | 57 | camera: 58 | # recorder 59 | 1: 60 | name: cam_1 61 | position: [ 0.6137, 0.4764, 1.4565] 62 | orientation: [ 0.3212, 0.4682, 0.6788, 0.4656] 63 | resolution: 480 64 | 65 | # vlm camera 66 | 0: 67 | name: cam_0 68 | position: [-0.1655, 0.0167, 1.3664] 69 | orientation: [ 0.0550, 0.0544, 0.7010, 0.7090] 70 | resolution: 480 71 | 72 | path_solver: 73 | opt_pos_step_size: 0.20 # controls the density of control points in the path 74 | opt_rot_step_size: 0.78 # controls the density of control points in the path 75 | opt_interpolate_pos_step_size: 0.02 # controls the density of collision checking inside optimization 76 | opt_interpolate_rot_step_size: 0.10 77 | max_collision_points: 60 78 | sampling_maxfun: 5000 79 | bounds_min: *bounds_min 80 | bounds_max: *bounds_max 81 | constraint_tolerance: 0.0001 82 | minimizer_options: 83 | maxiter: 200 84 | 85 | subgoal_solver: 86 | bounds_min: *bounds_min 87 | bounds_max: *bounds_max 88 | sampling_maxfun: 5000 89 | max_collision_points: 60 90 | constraint_tolerance: 0.0001 91 | minimizer_options: 92 | maxiter: 200 93 | 94 | keypoint_proposer: 95 | num_candidates_per_mask: 5 96 | min_dist_bt_keypoints: 0.06 97 | max_mask_ratio: 0.5 98 | device: *device 99 | bounds_min: *bounds_min 100 | bounds_max: *bounds_max 101 | seed: *seed 102 | 103 | constraint_generator: 104 | model: gpt-4o 105 | temperature: 0.0 106 | max_tokens: 2048 107 | 108 | visualizer: 109 | bounds_min: *bounds_min 110 | bounds_max: *bounds_max -------------------------------------------------------------------------------- /constraint_generation.py: -------------------------------------------------------------------------------- 1 | import base64 2 | from openai import OpenAI 3 | import os 4 | import cv2 5 | import json 6 | import parse 7 | import numpy as np 8 | import time 9 | from datetime import datetime 10 | 11 | # Function to encode the image 12 
| def encode_image(image_path): 13 | with open(image_path, "rb") as image_file: 14 | return base64.b64encode(image_file.read()).decode('utf-8') 15 | 16 | class ConstraintGenerator: 17 | def __init__(self, config): 18 | self.config = config 19 | self.client = OpenAI(api_key=os.environ['OPENAI_API_KEY'], base_url="https://api.chatanywhere.tech/v1") 20 | self.base_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), './vlm_query') 21 | with open(os.path.join(self.base_dir, 'prompt_template.txt'), 'r') as f: 22 | self.prompt_template = f.read() 23 | 24 | def _build_prompt(self, image_path, instruction): 25 | img_base64 = encode_image(image_path) 26 | prompt_text = self.prompt_template.format(instruction=instruction) 27 | # save prompt 28 | with open(os.path.join(self.task_dir, 'prompt.txt'), 'w') as f: 29 | f.write(prompt_text) 30 | messages = [ 31 | { 32 | "role": "user", 33 | "content": [ 34 | { 35 | "type": "text", 36 | "text": self.prompt_template.format(instruction=instruction) 37 | }, 38 | { 39 | "type": "image_url", 40 | "image_url": { 41 | "url": f"data:image/png;base64,{img_base64}" 42 | } 43 | }, 44 | ] 45 | } 46 | ] 47 | return messages 48 | 49 | def _parse_and_save_constraints(self, output, save_dir): 50 | # parse into function blocks 51 | lines = output.split("\n") 52 | functions = dict() 53 | for i, line in enumerate(lines): 54 | if line.startswith("def "): 55 | start = i 56 | name = line.split("(")[0].split("def ")[1] 57 | if line.startswith(" return "): 58 | end = i 59 | functions[name] = lines[start:end+1] 60 | # organize them based on hierarchy in function names 61 | groupings = dict() 62 | for name in functions: 63 | parts = name.split("_")[:-1] # last one is the constraint idx 64 | key = "_".join(parts) 65 | if key not in groupings: 66 | groupings[key] = [] 67 | groupings[key].append(name) 68 | # save them into files 69 | for key in groupings: 70 | with open(os.path.join(save_dir, f"{key}_constraints.txt"), "w") as f: 71 | for name in groupings[key]: 72 | f.write("\n".join(functions[name]) + "\n\n") 73 | print(f"Constraints saved to {save_dir}") 74 | 75 | def _parse_other_metadata(self, output): 76 | data_dict = dict() 77 | # find num_stages 78 | num_stages_template = "num_stages = {num_stages}" 79 | for line in output.split("\n"): 80 | num_stages = parse.parse(num_stages_template, line) 81 | if num_stages is not None: 82 | break 83 | if num_stages is None: 84 | raise ValueError("num_stages not found in output") 85 | data_dict['num_stages'] = int(num_stages['num_stages']) 86 | # find grasp_keypoints 87 | grasp_keypoints_template = "grasp_keypoints = {grasp_keypoints}" 88 | for line in output.split("\n"): 89 | grasp_keypoints = parse.parse(grasp_keypoints_template, line) 90 | if grasp_keypoints is not None: 91 | break 92 | if grasp_keypoints is None: 93 | raise ValueError("grasp_keypoints not found in output") 94 | # convert into list of ints 95 | grasp_keypoints = grasp_keypoints['grasp_keypoints'].replace("[", "").replace("]", "").split(",") 96 | grasp_keypoints = [int(x.strip()) for x in grasp_keypoints] 97 | data_dict['grasp_keypoints'] = grasp_keypoints 98 | # find release_keypoints 99 | release_keypoints_template = "release_keypoints = {release_keypoints}" 100 | for line in output.split("\n"): 101 | release_keypoints = parse.parse(release_keypoints_template, line) 102 | if release_keypoints is not None: 103 | break 104 | if release_keypoints is None: 105 | raise ValueError("release_keypoints not found in output") 106 | # convert into list of ints 107 | 
release_keypoints = release_keypoints['release_keypoints'].replace("[", "").replace("]", "").split(",") 108 | release_keypoints = [int(x.strip()) for x in release_keypoints] 109 | data_dict['release_keypoints'] = release_keypoints 110 | return data_dict 111 | 112 | def _save_metadata(self, metadata): 113 | for k, v in metadata.items(): 114 | if isinstance(v, np.ndarray): 115 | metadata[k] = v.tolist() 116 | with open(os.path.join(self.task_dir, 'metadata.json'), 'w') as f: 117 | json.dump(metadata, f) 118 | print(f"Metadata saved to {os.path.join(self.task_dir, 'metadata.json')}") 119 | 120 | def generate(self, img, instruction, metadata): 121 | """ 122 | Args: 123 | img (np.ndarray): image of the scene (H, W, 3) uint8 124 | instruction (str): instruction for the query 125 | Returns: 126 | save_dir (str): directory where the constraints 127 | """ 128 | # create a directory for the task 129 | fname = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + "_" + instruction.lower().replace(" ", "_") 130 | self.task_dir = os.path.join(self.base_dir, fname) 131 | os.makedirs(self.task_dir, exist_ok=True) 132 | # save query image 133 | image_path = os.path.join(self.task_dir, 'query_img.png') 134 | cv2.imwrite(image_path, img[..., ::-1]) 135 | # build prompt 136 | messages = self._build_prompt(image_path, instruction) 137 | # stream back the response 138 | stream = self.client.chat.completions.create(model=self.config['model'], 139 | messages=messages, 140 | temperature=self.config['temperature'], 141 | max_tokens=self.config['max_tokens'], 142 | stream=True) 143 | output = "" 144 | start = time.time() 145 | for chunk in stream: 146 | print(f'[{time.time()-start:.2f}s] Querying OpenAI API...', end='\r') 147 | if chunk.choices[0].delta.content is not None: 148 | output += chunk.choices[0].delta.content 149 | print(f'[{time.time()-start:.2f}s] Querying OpenAI API...Done') 150 | # save raw output 151 | with open(os.path.join(self.task_dir, 'output_raw.txt'), 'w') as f: 152 | f.write(output) 153 | # parse and save constraints 154 | self._parse_and_save_constraints(output, self.task_dir) 155 | # save metadata 156 | metadata.update(self._parse_other_metadata(output)) 157 | self._save_metadata(metadata) 158 | return self.task_dir 159 | -------------------------------------------------------------------------------- /dinov2_vits14_pretrain.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/dinov2_vits14_pretrain.pth -------------------------------------------------------------------------------- /environment.py: -------------------------------------------------------------------------------- 1 | import time 2 | import numpy as np 3 | import os 4 | import datetime 5 | import transform_utils as T 6 | import trimesh 7 | import open3d as o3d 8 | import imageio 9 | import omnigibson as og 10 | from omnigibson.macros import gm 11 | from omnigibson.utils.usd_utils import PoseAPI, mesh_prim_mesh_to_trimesh_mesh, mesh_prim_shape_to_trimesh_mesh 12 | from omnigibson.robots.fetch import Fetch 13 | from omnigibson.controllers import IsGraspingState 14 | from og_utils import OGCamera 15 | from utils import ( 16 | bcolors, 17 | get_clock_time, 18 | angle_between_rotmat, 19 | angle_between_quats, 20 | get_linear_interpolation_steps, 21 | linear_interpolate_poses, 22 | ) 23 | from omnigibson.robots.manipulation_robot import ManipulationRobot 24 | from omnigibson.controllers.controller_base import 
ControlType, BaseController 25 | import torch 26 | 27 | # Don't use GPU dynamics and use flatcache for performance boost 28 | gm.USE_GPU_DYNAMICS = True 29 | gm.ENABLE_FLATCACHE = False 30 | 31 | # some customization to the OG functions 32 | def custom_clip_control(self, control): 33 | """ 34 | Clips the inputted @control signal based on @control_limits. 35 | 36 | Args: 37 | control (Array[float]): control signal to clip 38 | 39 | Returns: 40 | Array[float]: Clipped control signal 41 | """ 42 | clipped_control = control.clip( 43 | self._control_limits[self.control_type][0][self.dof_idx], 44 | self._control_limits[self.control_type][1][self.dof_idx], 45 | ) 46 | idx = ( 47 | self._dof_has_limits[self.dof_idx] 48 | if self.control_type == ControlType.POSITION 49 | else [True] * self.control_dim 50 | ) 51 | if len(control) > 1: 52 | control[idx] = clipped_control[idx] 53 | return control 54 | 55 | Fetch._initialize = ManipulationRobot._initialize 56 | BaseController.clip_control = custom_clip_control 57 | 58 | class ReKepOGEnv: 59 | def __init__(self, config, scene_file, verbose=False): 60 | self.video_cache = [] 61 | self.config = config 62 | self.verbose = verbose 63 | self.config['scene']['scene_file'] = scene_file 64 | self.bounds_min = np.array(self.config['bounds_min']) 65 | self.bounds_max = np.array(self.config['bounds_max']) 66 | self.interpolate_pos_step_size = self.config['interpolate_pos_step_size'] 67 | self.interpolate_rot_step_size = self.config['interpolate_rot_step_size'] 68 | # create omnigibson environment 69 | self.step_counter = 0 70 | self.og_env = og.Environment(dict(scene=self.config['scene'], robots=[self.config['robot']['robot_config']], env=self.config['og_sim'])) 71 | self.og_env.scene.update_initial_state() 72 | for _ in range(10): og.sim.step() 73 | # robot vars 74 | self.robot = self.og_env.robots[0] 75 | dof_idx = np.concatenate([self.robot.trunk_control_idx, 76 | self.robot.arm_control_idx[self.robot.default_arm]]) 77 | self.reset_joint_pos = self.robot.reset_joint_pos[dof_idx] 78 | self.world2robot_homo = T.pose_inv(T.pose2mat(self.robot.get_position_orientation())) 79 | # initialize cameras 80 | self._initialize_cameras(self.config['camera']) 81 | self.last_og_gripper_action = 1.0 82 | 83 | # ====================================== 84 | # = exposed functions 85 | # ====================================== 86 | def get_sdf_voxels(self, resolution, exclude_robot=True, exclude_obj_in_hand=True): 87 | """ 88 | open3d-based SDF computation 89 | 1. recursively get all usd prim and get their vertices and faces 90 | 2. 
compute SDF using open3d 91 | """ 92 | start = time.time() 93 | exclude_names = ['wall', 'floor', 'ceiling'] 94 | if exclude_robot: 95 | exclude_names += ['fetch', 'robot'] 96 | if exclude_obj_in_hand: 97 | assert self.config['robot']['robot_config']['grasping_mode'] in ['assisted', 'sticky'], "Currently only supported for assisted or sticky grasping" 98 | in_hand_obj = self.robot._ag_obj_in_hand[self.robot.default_arm] 99 | if in_hand_obj is not None: 100 | exclude_names.append(in_hand_obj.name.lower()) 101 | trimesh_objects = [] 102 | for obj in self.og_env.scene.objects: 103 | if any([name in obj.name.lower() for name in exclude_names]): 104 | continue 105 | for link in obj.links.values(): 106 | for mesh in link.collision_meshes.values(): 107 | mesh_type = mesh.prim.GetPrimTypeInfo().GetTypeName() 108 | if mesh_type == 'Mesh': 109 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(mesh.prim) 110 | else: 111 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(mesh.prim) 112 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh.prim_path) 113 | trimesh_object.apply_transform(world_pose_w_scale) 114 | trimesh_objects.append(trimesh_object) 115 | # chain trimesh objects 116 | scene_mesh = trimesh.util.concatenate(trimesh_objects) 117 | # Create a scene and add the triangle mesh 118 | scene = o3d.t.geometry.RaycastingScene() 119 | vertex_positions = scene_mesh.vertices 120 | triangle_indices = scene_mesh.faces 121 | vertex_positions = o3d.core.Tensor(vertex_positions, dtype=o3d.core.Dtype.Float32) 122 | triangle_indices = o3d.core.Tensor(triangle_indices, dtype=o3d.core.Dtype.UInt32) 123 | _ = scene.add_triangles(vertex_positions, triangle_indices) # we do not need the geometry ID for mesh 124 | # create a grid 125 | shape = np.ceil((self.bounds_max - self.bounds_min) / resolution).astype(int) 126 | steps = (self.bounds_max - self.bounds_min) / shape 127 | grid = np.mgrid[self.bounds_min[0]:self.bounds_max[0]:steps[0], 128 | self.bounds_min[1]:self.bounds_max[1]:steps[1], 129 | self.bounds_min[2]:self.bounds_max[2]:steps[2]] 130 | grid = grid.reshape(3, -1).T 131 | # compute SDF 132 | sdf_voxels = scene.compute_signed_distance(grid.astype(np.float32)) 133 | # convert back to np array 134 | sdf_voxels = sdf_voxels.cpu().numpy() 135 | # open3d has flipped sign from our convention 136 | sdf_voxels = -sdf_voxels 137 | sdf_voxels = sdf_voxels.reshape(shape) 138 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] SDF voxels computed in {time.time() - start:.4f} seconds{bcolors.ENDC}') 139 | return sdf_voxels 140 | 141 | def get_cam_obs(self): 142 | self.last_cam_obs = dict() 143 | for cam_id in self.cams: 144 | self.last_cam_obs[cam_id] = self.cams[cam_id].get_obs() # each containing rgb, depth, points, seg 145 | return self.last_cam_obs 146 | 147 | def register_keypoints(self, keypoints): 148 | """ 149 | Args: 150 | keypoints (np.ndarray): keypoints in the world frame of shape (N, 3) 151 | Returns: 152 | None 153 | Given a set of keypoints in the world frame, this function registers them so that their newest positions can be accessed later. 
154 | """ 155 | if not isinstance(keypoints, np.ndarray): 156 | keypoints = np.array(keypoints) 157 | self.keypoints = keypoints 158 | self._keypoint_registry = dict() 159 | self._keypoint2object = dict() 160 | exclude_names = ['wall', 'floor', 'ceiling', 'table', 'fetch', 'robot'] 161 | for idx, keypoint in enumerate(keypoints): 162 | closest_distance = np.inf 163 | for obj in self.og_env.scene.objects: 164 | if any([name in obj.name.lower() for name in exclude_names]): 165 | continue 166 | for link in obj.links.values(): 167 | for mesh in link.visual_meshes.values(): 168 | mesh_prim_path = mesh.prim_path 169 | mesh_type = mesh.prim.GetPrimTypeInfo().GetTypeName() 170 | if mesh_type == 'Mesh': 171 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(mesh.prim) 172 | else: 173 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(mesh.prim) 174 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh.prim_path) 175 | trimesh_object.apply_transform(world_pose_w_scale) 176 | points_transformed = trimesh_object.sample(1000) 177 | 178 | # find closest point 179 | dists = np.linalg.norm(points_transformed - keypoint, axis=1) 180 | point = points_transformed[np.argmin(dists)] 181 | distance = np.linalg.norm(point - keypoint) 182 | if distance < closest_distance: 183 | closest_distance = distance 184 | closest_prim_path = mesh_prim_path 185 | closest_point = point 186 | closest_obj = obj 187 | self._keypoint_registry[idx] = (closest_prim_path, PoseAPI.get_world_pose(closest_prim_path)) 188 | self._keypoint2object[idx] = closest_obj 189 | # overwrite the keypoint with the closest point 190 | self.keypoints[idx] = closest_point 191 | 192 | def get_keypoint_positions(self): 193 | """ 194 | Args: 195 | None 196 | Returns: 197 | np.ndarray: keypoints in the world frame of shape (N, 3) 198 | Given the registered keypoints, this function returns their current positions in the world frame. 199 | """ 200 | assert hasattr(self, '_keypoint_registry') and self._keypoint_registry is not None, "Keypoints have not been registered yet." 201 | keypoint_positions = [] 202 | for idx, (prim_path, init_pose) in self._keypoint_registry.items(): 203 | init_pose = T.pose2mat(init_pose) 204 | centering_transform = T.pose_inv(init_pose) 205 | keypoint_centered = np.dot(centering_transform, np.append(self.keypoints[idx], 1))[:3] 206 | curr_pose = T.pose2mat(PoseAPI.get_world_pose(prim_path)) 207 | keypoint = np.dot(curr_pose, np.append(keypoint_centered, 1))[:3] 208 | keypoint_positions.append(keypoint) 209 | return np.array(keypoint_positions) 210 | 211 | def get_object_by_keypoint(self, keypoint_idx): 212 | """ 213 | Args: 214 | keypoint_idx (int): the index of the keypoint 215 | Returns: 216 | pointer: the object that the keypoint is associated with 217 | Given the keypoint index, this function returns the name of the object that the keypoint is associated with. 218 | """ 219 | assert hasattr(self, '_keypoint2object') and self._keypoint2object is not None, "Keypoints have not been registered yet." 220 | return self._keypoint2object[keypoint_idx] 221 | 222 | def get_collision_points(self, noise=True): 223 | """ 224 | Get the points of the gripper and any object in hand. 
225 | """ 226 | # add gripper collision points 227 | collision_points = [] 228 | for obj in self.og_env.scene.objects: 229 | if 'fetch' in obj.name.lower(): 230 | for name, link in obj.links.items(): 231 | if 'gripper' in name.lower() or 'wrist' in name.lower(): # wrist_roll and wrist_flex 232 | for collision_mesh in link.collision_meshes.values(): 233 | mesh_prim_path = collision_mesh.prim_path 234 | mesh_type = collision_mesh.prim.GetPrimTypeInfo().GetTypeName() 235 | if mesh_type == 'Mesh': 236 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(collision_mesh.prim) 237 | else: 238 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(collision_mesh.prim) 239 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(mesh_prim_path) 240 | trimesh_object.apply_transform(world_pose_w_scale) 241 | points_transformed = trimesh_object.sample(1000) 242 | # add to collision points 243 | collision_points.append(points_transformed) 244 | # add object in hand collision points 245 | in_hand_obj = self.robot._ag_obj_in_hand[self.robot.default_arm] 246 | if in_hand_obj is not None: 247 | for link in in_hand_obj.links.values(): 248 | for collision_mesh in link.collision_meshes.values(): 249 | mesh_type = collision_mesh.prim.GetPrimTypeInfo().GetTypeName() 250 | if mesh_type == 'Mesh': 251 | trimesh_object = mesh_prim_mesh_to_trimesh_mesh(collision_mesh.prim) 252 | else: 253 | trimesh_object = mesh_prim_shape_to_trimesh_mesh(collision_mesh.prim) 254 | world_pose_w_scale = PoseAPI.get_world_pose_with_scale(collision_mesh.prim_path) 255 | trimesh_object.apply_transform(world_pose_w_scale) 256 | points_transformed = trimesh_object.sample(1000) 257 | # add to collision points 258 | collision_points.append(points_transformed) 259 | collision_points = np.concatenate(collision_points, axis=0) 260 | return collision_points 261 | 262 | def reset(self): 263 | self.og_env.reset() 264 | self.robot.reset() 265 | for _ in range(5): self._step() 266 | self.open_gripper() 267 | # moving arm to the side to unblock view 268 | ee_pose = self.get_ee_pose() 269 | ee_pose[:3] += np.array([0.0, -0.2, -0.1]) 270 | action = np.concatenate([ee_pose, [self.get_gripper_null_action()]]) 271 | self.execute_action(action, precise=True) 272 | self.video_cache = [] 273 | print(f'{bcolors.HEADER}Reset done.{bcolors.ENDC}') 274 | 275 | def is_grasping(self, candidate_obj=None): 276 | return self.robot.is_grasping(candidate_obj=candidate_obj) == IsGraspingState.TRUE 277 | 278 | def get_ee_pose(self): 279 | ee_pos, ee_xyzw = (self.robot.get_eef_position(), self.robot.get_eef_orientation()) 280 | ee_pose = np.concatenate([ee_pos, ee_xyzw]) # [7] 281 | return ee_pose 282 | 283 | def get_ee_pos(self): 284 | return self.get_ee_pose()[:3] 285 | 286 | def get_ee_quat(self): 287 | return self.get_ee_pose()[3:] 288 | 289 | def get_arm_joint_postions(self): 290 | assert isinstance(self.robot, Fetch), "The IK solver assumes the robot is a Fetch robot" 291 | arm = self.robot.default_arm 292 | dof_idx = np.concatenate([self.robot.trunk_control_idx, self.robot.arm_control_idx[arm]]) 293 | arm_joint_pos = self.robot.get_joint_positions()[dof_idx] 294 | return arm_joint_pos 295 | 296 | def close_gripper(self): 297 | """ 298 | Exposed interface: 1.0 for closed, -1.0 for open, 0.0 for no change 299 | Internal OG interface: 1.0 for open, 0.0 for closed 300 | """ 301 | if self.last_og_gripper_action == 0.0: 302 | return 303 | action = np.zeros(12) 304 | action[10:] = [0, 0] # gripper: float. 0. for closed, 1. for open. 
305 | for _ in range(30): 306 | self._step(action) 307 | self.last_og_gripper_action = 0.0 308 | 309 | def open_gripper(self): 310 | if self.last_og_gripper_action == 1.0: 311 | return 312 | action = np.zeros(12) 313 | action[10:] = [1, 1] # gripper: float. 0. for closed, 1. for open. 314 | for _ in range(30): 315 | self._step(action) 316 | self.last_og_gripper_action = 1.0 317 | 318 | def get_last_og_gripper_action(self): 319 | return self.last_og_gripper_action 320 | 321 | def get_gripper_open_action(self): 322 | return -1.0 323 | 324 | def get_gripper_close_action(self): 325 | return 1.0 326 | 327 | def get_gripper_null_action(self): 328 | return 0.0 329 | 330 | def compute_target_delta_ee(self, target_pose): 331 | target_pos, target_xyzw = target_pose[:3], target_pose[3:] 332 | ee_pose = self.get_ee_pose() 333 | ee_pos, ee_xyzw = ee_pose[:3], ee_pose[3:] 334 | pos_diff = np.linalg.norm(ee_pos - target_pos) 335 | rot_diff = angle_between_quats(ee_xyzw, target_xyzw) 336 | return pos_diff, rot_diff 337 | 338 | def execute_action( 339 | self, 340 | action, 341 | precise=True, 342 | ): 343 | """ 344 | Moves the robot gripper to a target pose by specifying the absolute pose in the world frame and executes gripper action. 345 | 346 | Args: 347 | action (x, y, z, qx, qy, qz, qw, gripper_action): absolute target pose in the world frame + gripper action. 348 | precise (bool): whether to use small position and rotation thresholds for precise movement (robot would move slower). 349 | Returns: 350 | tuple: A tuple containing the position and rotation errors after reaching the target pose. 351 | """ 352 | if precise: 353 | pos_threshold = 0.03 354 | rot_threshold = 3.0 355 | else: 356 | pos_threshold = 0.10 357 | rot_threshold = 5.0 358 | action = np.array(action).copy() 359 | assert action.shape == (8,) 360 | target_pose = action[:7] 361 | gripper_action = action[7] 362 | 363 | # ====================================== 364 | # = status and safety check 365 | # ====================================== 366 | if np.any(target_pose[:3] < self.bounds_min) \ 367 | or np.any(target_pose[:3] > self.bounds_max): 368 | print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Target position is out of bounds, clipping to workspace bounds{bcolors.ENDC}') 369 | target_pose[:3] = np.clip(target_pose[:3], self.bounds_min, self.bounds_max) 370 | 371 | # ====================================== 372 | # = interpolation 373 | # ====================================== 374 | current_pose = self.get_ee_pose() 375 | pos_diff = np.linalg.norm(current_pose[:3] - target_pose[:3]) 376 | rot_diff = angle_between_quats(current_pose[3:7], target_pose[3:7]) 377 | pos_is_close = pos_diff < self.interpolate_pos_step_size 378 | rot_is_close = rot_diff < self.interpolate_rot_step_size 379 | if pos_is_close and rot_is_close: 380 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Skipping interpolation{bcolors.ENDC}') 381 | pose_seq = np.array([target_pose]) 382 | else: 383 | num_steps = get_linear_interpolation_steps(current_pose, target_pose, self.interpolate_pos_step_size, self.interpolate_rot_step_size) 384 | pose_seq = linear_interpolate_poses(current_pose, target_pose, num_steps) 385 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Interpolating for {num_steps} steps{bcolors.ENDC}') 386 | 387 | # ====================================== 388 | # = move to target pose 389 | # ====================================== 390 | # move faster for intermediate poses 391 | 
intermediate_pos_threshold = 0.10 392 | intermediate_rot_threshold = 5.0 393 | for pose in pose_seq[:-1]: 394 | self._move_to_waypoint(pose, intermediate_pos_threshold, intermediate_rot_threshold) 395 | # move to the final pose with required precision 396 | pose = pose_seq[-1] 397 | self._move_to_waypoint(pose, pos_threshold, rot_threshold, max_steps=20 if not precise else 40) 398 | # compute error 399 | pos_error, rot_error = self.compute_target_delta_ee(target_pose) 400 | self.verbose and print(f'\n{bcolors.BOLD}[environment.py | {get_clock_time()}] Move to pose completed (pos_error: {pos_error}, rot_error: {np.rad2deg(rot_error)}){bcolors.ENDC}\n') 401 | 402 | # ====================================== 403 | # = apply gripper action 404 | # ====================================== 405 | if gripper_action == self.get_gripper_open_action(): 406 | self.open_gripper() 407 | elif gripper_action == self.get_gripper_close_action(): 408 | self.close_gripper() 409 | elif gripper_action == self.get_gripper_null_action(): 410 | pass 411 | else: 412 | raise ValueError(f"Invalid gripper action: {gripper_action}") 413 | 414 | return pos_error, rot_error 415 | 416 | def sleep(self, seconds): 417 | start = time.time() 418 | while time.time() - start < seconds: 419 | self._step() 420 | 421 | def save_video(self, save_path=None): 422 | save_dir = os.path.join(os.path.dirname(__file__), 'videos') 423 | os.makedirs(save_dir, exist_ok=True) 424 | if save_path is None: 425 | save_path = os.path.join(save_dir, f'{datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")}.mp4') 426 | video_writer = imageio.get_writer(save_path, fps=30) 427 | for rgb in self.video_cache: 428 | # print(type(rgb), rgb.shape) 429 | if not isinstance(rgb, np.ndarray): 430 | rgb = np.array(rgb) 431 | video_writer.append_data(rgb) 432 | video_writer.close() 433 | return save_path 434 | 435 | # ====================================== 436 | # = internal functions 437 | # ====================================== 438 | def _check_reached_ee(self, target_pos, target_xyzw, pos_threshold, rot_threshold): 439 | """ 440 | this is supposed to be for true ee pose (franka hand) in robot frame 441 | """ 442 | current_pos = self.robot.get_eef_position() 443 | current_xyzw = self.robot.get_eef_orientation() 444 | current_rotmat = T.quat2mat(current_xyzw) 445 | target_rotmat = T.quat2mat(target_xyzw) 446 | # calculate position delta 447 | if torch.is_tensor(current_pos): 448 | current_pos = current_pos.detach().cpu().numpy() 449 | pos_diff = (target_pos - current_pos).flatten() 450 | pos_error = np.linalg.norm(pos_diff) 451 | # calculate rotation delta 452 | rot_error = angle_between_rotmat(current_rotmat, target_rotmat) 453 | # print status 454 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Curr pose: {current_pos}, {current_xyzw} (pos_error: {pos_error.round(4)}, rot_error: {np.rad2deg(rot_error).round(4)}){bcolors.ENDC}') 455 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] Goal pose: {target_pos}, {target_xyzw} (pos_thres: {pos_threshold}, rot_thres: {rot_threshold}){bcolors.ENDC}') 456 | if pos_error < pos_threshold and rot_error < np.deg2rad(rot_threshold): 457 | self.verbose and print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] OSC pose reached (pos_error: {pos_error.round(4)}, rot_error: {np.rad2deg(rot_error).round(4)}){bcolors.ENDC}') 458 | return True, pos_error, rot_error 459 | return False, pos_error, rot_error 460 | 461 | def _move_to_waypoint(self, 
target_pose_world, pos_threshold=0.02, rot_threshold=3.0, max_steps=10): 462 | pos_errors = [] 463 | rot_errors = [] 464 | count = 0 465 | while count < max_steps: 466 | reached, pos_error, rot_error = self._check_reached_ee(target_pose_world[:3], target_pose_world[3:7], pos_threshold, rot_threshold) 467 | pos_errors.append(pos_error) 468 | rot_errors.append(rot_error) 469 | if reached: 470 | break 471 | # convert world pose to robot pose 472 | target_pose_robot = np.dot(self.world2robot_homo, T.convert_pose_quat2mat(target_pose_world)) 473 | # convert to relative pose to be used with the underlying controller 474 | self.relative_eef_position = self.robot.get_relative_eef_position() 475 | if torch.is_tensor(self.relative_eef_position): 476 | self.relative_eef_position = self.relative_eef_position.detach().cpu().numpy() 477 | relative_position = target_pose_robot[:3, 3] - self.relative_eef_position 478 | relative_quat = T.quat_distance(T.mat2quat(target_pose_robot[:3, :3]), self.robot.get_relative_eef_orientation()) 479 | assert isinstance(self.robot, Fetch), "this action space is only for fetch" 480 | action = np.zeros(12) # first 3 are base, which we don't use 481 | action[4:7] = relative_position 482 | action[7:10] = T.quat2axisangle(relative_quat) 483 | action[10:] = [self.last_og_gripper_action, self.last_og_gripper_action] 484 | # step the action 485 | _ = self._step(action=action) 486 | count += 1 487 | if count == max_steps: 488 | print(f'{bcolors.WARNING}[environment.py | {get_clock_time()}] OSC pose not reached after {max_steps} steps (pos_error: {pos_errors[-1].round(4)}, rot_error: {np.rad2deg(rot_errors[-1]).round(4)}){bcolors.ENDC}') 489 | 490 | def _step(self, action=None): 491 | if hasattr(self, 'disturbance_seq') and self.disturbance_seq is not None: 492 | next(self.disturbance_seq) 493 | if action is not None: 494 | self.og_env.step(action) 495 | else: 496 | og.sim.step() 497 | cam_obs = self.get_cam_obs() 498 | rgb = cam_obs[1]['rgb'] 499 | if len(self.video_cache) < self.config['video_cache_size']: 500 | self.video_cache.append(rgb) 501 | else: 502 | self.video_cache.pop(0) 503 | self.video_cache.append(rgb) 504 | self.step_counter += 1 505 | 506 | def _initialize_cameras(self, cam_config): 507 | """ 508 | ::param poses: list of tuples of (position, orientation) of the cameras 509 | """ 510 | self.cams = dict() 511 | for cam_id in cam_config: 512 | cam_id = int(cam_id) 513 | self.cams[cam_id] = OGCamera(self.og_env, cam_config[cam_id]) 514 | for _ in range(10): og.sim.render() -------------------------------------------------------------------------------- /ik_solver.py: -------------------------------------------------------------------------------- 1 | """ 2 | Adapted from OmniGibson and the Lula IK solver 3 | """ 4 | import omnigibson.lazy as lazy 5 | import numpy as np 6 | 7 | class IKSolver: 8 | """ 9 | Class for thinly wrapping Lula IK solver 10 | """ 11 | 12 | def __init__( 13 | self, 14 | robot_description_path, 15 | robot_urdf_path, 16 | eef_name, 17 | reset_joint_pos, 18 | world2robot_homo, 19 | ): 20 | # Create robot description, kinematics, and config 21 | self.robot_description = lazy.lula.load_robot(robot_description_path, robot_urdf_path) 22 | self.kinematics = self.robot_description.kinematics() 23 | self.config = lazy.lula.CyclicCoordDescentIkConfig() 24 | self.eef_name = eef_name 25 | self.reset_joint_pos = reset_joint_pos 26 | self.world2robot_homo = world2robot_homo 27 | 28 | def solve( 29 | self, 30 | target_pose_homo, 31 | position_tolerance=0.01, 
32 | orientation_tolerance=0.05, 33 | position_weight=1.0, 34 | orientation_weight=0.05, 35 | max_iterations=150, 36 | initial_joint_pos=None, 37 | ): 38 | """ 39 | Backs out joint positions to achieve desired @target_pos and @target_quat 40 | 41 | Args: 42 | target_pose_homo (np.ndarray): [4, 4] homogeneous transformation matrix of the target pose in world frame 43 | position_tolerance (float): Maximum position error (L2-norm) for a successful IK solution 44 | orientation_tolerance (float): Maximum orientation error (per-axis L2-norm) for a successful IK solution 45 | position_weight (float): Weight for the relative importance of position error during CCD 46 | orientation_weight (float): Weight for the relative importance of position error during CCD 47 | max_iterations (int): Number of iterations used for each cyclic coordinate descent. 48 | initial_joint_pos (None or n-array): If specified, will set the initial cspace seed when solving for joint 49 | positions. Otherwise, will use self.reset_joint_pos 50 | 51 | Returns: 52 | ik_results (lazy.lula.CyclicCoordDescentIkResult): IK result object containing the joint positions and other information. 53 | """ 54 | # convert target pose to robot base frame 55 | target_pose_robot = np.dot(self.world2robot_homo, target_pose_homo) 56 | target_pose_pos = target_pose_robot[:3, 3] 57 | target_pose_rot = target_pose_robot[:3, :3] 58 | ik_target_pose = lazy.lula.Pose3(lazy.lula.Rotation3(target_pose_rot), target_pose_pos) 59 | # Set the cspace seed and tolerance 60 | initial_joint_pos = self.reset_joint_pos if initial_joint_pos is None else np.array(initial_joint_pos) 61 | self.config.cspace_seeds = [initial_joint_pos] 62 | self.config.position_tolerance = position_tolerance 63 | self.config.orientation_tolerance = orientation_tolerance 64 | self.config.ccd_position_weight = position_weight 65 | self.config.ccd_orientation_weight = orientation_weight 66 | self.config.max_num_descents = max_iterations 67 | # Compute target joint positions 68 | ik_results = lazy.lula.compute_ik_ccd(self.kinematics, ik_target_pose, self.eef_name, self.config) 69 | return ik_results -------------------------------------------------------------------------------- /keypoint_proposal.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import cv2 4 | from torch.nn.functional import interpolate 5 | from kmeans_pytorch import kmeans 6 | from utils import filter_points_by_bounds 7 | from sklearn.cluster import MeanShift 8 | 9 | class KeypointProposer: 10 | def __init__(self, config): 11 | self.config = config 12 | self.device = torch.device(self.config['device']) 13 | self.dinov2 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval().to(self.device) 14 | # local_model_path = '/omnigibson-src/ReKep/dinov2_vits14_pretrain.pth' 15 | # checkpoint = torch.load(local_model_path) 16 | # self.dinov2 = checkpoint 17 | self.bounds_min = np.array(self.config['bounds_min']) 18 | self.bounds_max = np.array(self.config['bounds_max']) 19 | self.mean_shift = MeanShift(bandwidth=self.config['min_dist_bt_keypoints'], bin_seeding=True, n_jobs=32) 20 | self.patch_size = 14 # dinov2 21 | np.random.seed(self.config['seed']) 22 | torch.manual_seed(self.config['seed']) 23 | torch.cuda.manual_seed(self.config['seed']) 24 | 25 | def get_keypoints(self, rgb, points, masks): 26 | # preprocessing 27 | # breakpoint() 28 | transformed_rgb, rgb, points, masks, shape_info = self._preprocess(rgb, points, masks) 29 | # get 
features 30 | features_flat = self._get_features(transformed_rgb, shape_info) 31 | # for each mask, cluster in feature space to get meaningful regions, and use their centers as keypoint candidates 32 | candidate_keypoints, candidate_pixels, candidate_rigid_group_ids = self._cluster_features(points, features_flat, masks) 33 | # exclude keypoints that are outside of the workspace 34 | within_space = filter_points_by_bounds(candidate_keypoints, self.bounds_min, self.bounds_max, strict=True) 35 | candidate_keypoints = candidate_keypoints[within_space] 36 | candidate_pixels = candidate_pixels[within_space] 37 | candidate_rigid_group_ids = candidate_rigid_group_ids[within_space] 38 | # merge close points by clustering in cartesian space 39 | merged_indices = self._merge_clusters(candidate_keypoints) 40 | candidate_keypoints = candidate_keypoints[merged_indices] 41 | candidate_pixels = candidate_pixels[merged_indices] 42 | candidate_rigid_group_ids = candidate_rigid_group_ids[merged_indices] 43 | # sort candidates by locations 44 | sort_idx = np.lexsort((candidate_pixels[:, 0], candidate_pixels[:, 1])) 45 | candidate_keypoints = candidate_keypoints[sort_idx] 46 | candidate_pixels = candidate_pixels[sort_idx] 47 | candidate_rigid_group_ids = candidate_rigid_group_ids[sort_idx] 48 | # project keypoints to image space 49 | projected = self._project_keypoints_to_img(rgb, candidate_pixels, candidate_rigid_group_ids, masks, features_flat) 50 | return candidate_keypoints, projected 51 | 52 | def _preprocess(self, rgb, points, masks): 53 | if masks.is_cuda: 54 | masks = masks.cpu() 55 | # print("***masks", masks) 56 | 57 | rgb = rgb.cpu() # move to CPU if on GPU 58 | rgb = rgb.numpy() 59 | # print("***rgb", rgb) 60 | 61 | # convert masks to binary masks 62 | masks = [masks == uid for uid in np.unique(masks.numpy())] 63 | # print("***masks2", masks) 64 | # ensure input shape is compatible with dinov2 65 | H, W, _ = rgb.shape 66 | patch_h = int(H // self.patch_size) 67 | patch_w = int(W // self.patch_size) 68 | new_H = patch_h * self.patch_size 69 | new_W = patch_w * self.patch_size 70 | # print("***rgb2", rgb) 71 | transformed_rgb = cv2.resize(rgb, (new_W, new_H)) 72 | transformed_rgb = transformed_rgb.astype(np.float32) / 255.0 # float32 [H, W, 3] 73 | # shape info 74 | shape_info = { 75 | 'img_h': H, 76 | 'img_w': W, 77 | 'patch_h': patch_h, 78 | 'patch_w': patch_w, 79 | } 80 | return transformed_rgb, rgb, points, masks, shape_info 81 | 82 | def _project_keypoints_to_img(self, rgb, candidate_pixels, candidate_rigid_group_ids, masks, features_flat): 83 | projected = rgb.copy() 84 | # overlay keypoints on the image 85 | for keypoint_count, pixel in enumerate(candidate_pixels): 86 | displayed_text = f"{keypoint_count}" 87 | text_length = len(displayed_text) 88 | # draw a box 89 | box_width = 30 + 10 * (text_length - 1) 90 | box_height = 30 91 | cv2.rectangle(projected, (pixel[1] - box_width // 2, pixel[0] - box_height // 2), (pixel[1] + box_width // 2, pixel[0] + box_height // 2), (255, 255, 255), -1) 92 | cv2.rectangle(projected, (pixel[1] - box_width // 2, pixel[0] - box_height // 2), (pixel[1] + box_width // 2, pixel[0] + box_height // 2), (0, 0, 0), 2) 93 | # draw text 94 | org = (pixel[1] - 7 * (text_length), pixel[0] + 7) 95 | color = (255, 0, 0) 96 | cv2.putText(projected, str(keypoint_count), org, cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2) 97 | keypoint_count += 1 98 | return projected 99 | 100 | @torch.inference_mode() 101 | @torch.amp.autocast('cuda') 102 | def _get_features(self, transformed_rgb, 
shape_info): 103 | img_h = shape_info['img_h'] 104 | img_w = shape_info['img_w'] 105 | patch_h = shape_info['patch_h'] 106 | patch_w = shape_info['patch_w'] 107 | # get features 108 | img_tensors = torch.from_numpy(transformed_rgb).permute(2, 0, 1).unsqueeze(0).to(self.device) # float32 [1, 3, H, W] 109 | assert img_tensors.shape[1] == 3, "unexpected image shape" 110 | # breakpoint() 111 | features_dict = self.dinov2.forward_features(img_tensors) 112 | raw_feature_grid = features_dict['x_norm_patchtokens'] # float32 [num_cams, patch_h*patch_w, feature_dim] 113 | raw_feature_grid = raw_feature_grid.reshape(1, patch_h, patch_w, -1) # float32 [num_cams, patch_h, patch_w, feature_dim] 114 | # compute per-point feature using bilinear interpolation 115 | interpolated_feature_grid = interpolate(raw_feature_grid.permute(0, 3, 1, 2), # float32 [num_cams, feature_dim, patch_h, patch_w] 116 | size=(img_h, img_w), 117 | mode='bilinear').permute(0, 2, 3, 1).squeeze(0) # float32 [H, W, feature_dim] 118 | features_flat = interpolated_feature_grid.reshape(-1, interpolated_feature_grid.shape[-1]) # float32 [H*W, feature_dim] 119 | return features_flat 120 | 121 | def _cluster_features(self, points, features_flat, masks): 122 | candidate_keypoints = [] 123 | candidate_pixels = [] 124 | candidate_rigid_group_ids = [] 125 | for rigid_group_id, binary_mask in enumerate(masks): 126 | # ignore mask that is too large 127 | # print("***binary_mask", binary_mask) 128 | binary_mask = binary_mask.cpu().numpy() 129 | if np.mean(binary_mask) > self.config['max_mask_ratio']: 130 | continue 131 | # consider only foreground features 132 | obj_features_flat = features_flat[binary_mask.reshape(-1)] 133 | feature_pixels = np.argwhere(binary_mask) 134 | feature_points = points[binary_mask] 135 | # reduce dimensionality to be less sensitive to noise and texture 136 | obj_features_flat = obj_features_flat.double() 137 | (u, s, v) = torch.pca_lowrank(obj_features_flat, center=False) 138 | features_pca = torch.mm(obj_features_flat, v[:, :3]) 139 | features_pca = (features_pca - features_pca.min(0)[0]) / (features_pca.max(0)[0] - features_pca.min(0)[0]) 140 | X = features_pca 141 | # add feature_pixels as extra dimensions 142 | feature_points_torch = torch.tensor(feature_points, dtype=features_pca.dtype, device=features_pca.device) 143 | feature_points_torch = (feature_points_torch - feature_points_torch.min(0)[0]) / (feature_points_torch.max(0)[0] - feature_points_torch.min(0)[0]) 144 | X = torch.cat([X, feature_points_torch], dim=-1) 145 | # cluster features to get meaningful regions 146 | cluster_ids_x, cluster_centers = kmeans( 147 | X=X, 148 | num_clusters=self.config['num_candidates_per_mask'], 149 | distance='euclidean', 150 | device=self.device, 151 | ) 152 | cluster_centers = cluster_centers.to(self.device) 153 | for cluster_id in range(self.config['num_candidates_per_mask']): 154 | cluster_center = cluster_centers[cluster_id][:3] 155 | member_idx = cluster_ids_x == cluster_id 156 | member_points = feature_points[member_idx] 157 | member_pixels = feature_pixels[member_idx] 158 | member_features = features_pca[member_idx] 159 | dist = torch.norm(member_features - cluster_center, dim=-1) 160 | closest_idx = torch.argmin(dist) 161 | candidate_keypoints.append(member_points[closest_idx]) 162 | candidate_pixels.append(member_pixels[closest_idx]) 163 | candidate_rigid_group_ids.append(rigid_group_id) 164 | 165 | candidate_keypoints = np.array(candidate_keypoints) 166 | candidate_pixels = np.array(candidate_pixels) 167 | 
candidate_rigid_group_ids = np.array(candidate_rigid_group_ids) 168 | 169 | return candidate_keypoints, candidate_pixels, candidate_rigid_group_ids 170 | 171 | def _merge_clusters(self, candidate_keypoints): 172 | self.mean_shift.fit(candidate_keypoints) 173 | cluster_centers = self.mean_shift.cluster_centers_ 174 | merged_indices = [] 175 | for center in cluster_centers: 176 | dist = np.linalg.norm(candidate_keypoints - center, axis=-1) 177 | merged_indices.append(np.argmin(dist)) 178 | return merged_indices -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import json 4 | import os 5 | import argparse 6 | from environment import ReKepOGEnv 7 | from keypoint_proposal import KeypointProposer 8 | from constraint_generation import ConstraintGenerator 9 | from ik_solver import IKSolver 10 | from subgoal_solver import SubgoalSolver 11 | from path_solver import PathSolver 12 | from visualizer import Visualizer 13 | import transform_utils as T 14 | from omnigibson.robots.fetch import Fetch 15 | from utils import ( 16 | bcolors, 17 | get_config, 18 | load_functions_from_txt, 19 | get_linear_interpolation_steps, 20 | spline_interpolate_poses, 21 | get_callable_grasping_cost_fn, 22 | print_opt_debug_dict, 23 | ) 24 | 25 | class Main: 26 | def __init__(self, scene_file, visualize=False): 27 | global_config = get_config(config_path="./configs/config.yaml") 28 | self.config = global_config['main'] 29 | self.bounds_min = np.array(self.config['bounds_min']) 30 | self.bounds_max = np.array(self.config['bounds_max']) 31 | self.visualize = visualize 32 | # set random seed 33 | np.random.seed(self.config['seed']) 34 | torch.manual_seed(self.config['seed']) 35 | torch.cuda.manual_seed(self.config['seed']) 36 | # initialize keypoint proposer and constraint generator 37 | self.keypoint_proposer = KeypointProposer(global_config['keypoint_proposer']) 38 | self.constraint_generator = ConstraintGenerator(global_config['constraint_generator']) 39 | # initialize environment 40 | self.env = ReKepOGEnv(global_config['env'], scene_file, verbose=False) 41 | # setup ik solver (for reachability cost) 42 | assert isinstance(self.env.robot, Fetch), "The IK solver assumes the robot is a Fetch robot" 43 | ik_solver = IKSolver( 44 | robot_description_path=self.env.robot.robot_arm_descriptor_yamls[self.env.robot.default_arm], 45 | robot_urdf_path=self.env.robot.urdf_path, 46 | eef_name=self.env.robot.eef_link_names[self.env.robot.default_arm], 47 | reset_joint_pos=self.env.reset_joint_pos, 48 | world2robot_homo=self.env.world2robot_homo, 49 | ) 50 | # initialize solvers 51 | self.subgoal_solver = SubgoalSolver(global_config['subgoal_solver'], ik_solver, self.env.reset_joint_pos) 52 | self.path_solver = PathSolver(global_config['path_solver'], ik_solver, self.env.reset_joint_pos) 53 | # initialize visualizer 54 | if self.visualize: 55 | self.visualizer = Visualizer(global_config['visualizer'], self.env) 56 | 57 | def perform_task(self, instruction, rekep_program_dir=None, disturbance_seq=None): 58 | self.env.reset() 59 | cam_obs = self.env.get_cam_obs() 60 | rgb = cam_obs[self.config['vlm_camera']]['rgb'] 61 | points = cam_obs[self.config['vlm_camera']]['points'] 62 | mask = cam_obs[self.config['vlm_camera']]['seg'] 63 | # ==================================== 64 | # = keypoint proposal and constraint generation 65 | # ==================================== 66 | if 
rekep_program_dir is None: 67 | keypoints, projected_img = self.keypoint_proposer.get_keypoints(rgb, points, mask) 68 | print(f'{bcolors.HEADER}Got {len(keypoints)} proposed keypoints{bcolors.ENDC}') 69 | if self.visualize: 70 | self.visualizer.show_img(projected_img) 71 | metadata = {'init_keypoint_positions': keypoints, 'num_keypoints': len(keypoints)} 72 | rekep_program_dir = self.constraint_generator.generate(projected_img, instruction, metadata) 73 | print(f'{bcolors.HEADER}Constraints generated{bcolors.ENDC}') 74 | # ==================================== 75 | # = execute 76 | # ==================================== 77 | self._execute(rekep_program_dir, disturbance_seq) 78 | 79 | def _update_disturbance_seq(self, stage, disturbance_seq): 80 | if disturbance_seq is not None: 81 | if stage in disturbance_seq and not self.applied_disturbance[stage]: 82 | # set the disturbance sequence, the generator will yield and instantiate one disturbance function for each env.step until it is exhausted 83 | self.env.disturbance_seq = disturbance_seq[stage](self.env) 84 | self.applied_disturbance[stage] = True 85 | 86 | def _execute(self, rekep_program_dir, disturbance_seq=None): 87 | # load metadata 88 | with open(os.path.join(rekep_program_dir, 'metadata.json'), 'r') as f: 89 | self.program_info = json.load(f) 90 | self.applied_disturbance = {stage: False for stage in range(1, self.program_info['num_stages'] + 1)} 91 | # register keypoints to be tracked 92 | self.env.register_keypoints(self.program_info['init_keypoint_positions']) 93 | # load constraints 94 | self.constraint_fns = dict() 95 | for stage in range(1, self.program_info['num_stages'] + 1): # stage starts with 1 96 | stage_dict = dict() 97 | for constraint_type in ['subgoal', 'path']: 98 | load_path = os.path.join(rekep_program_dir, f'stage{stage}_{constraint_type}_constraints.txt') 99 | get_grasping_cost_fn = get_callable_grasping_cost_fn(self.env) # special grasping function for VLM to call 100 | stage_dict[constraint_type] = load_functions_from_txt(load_path, get_grasping_cost_fn) if os.path.exists(load_path) else [] 101 | self.constraint_fns[stage] = stage_dict 102 | 103 | # bookkeeping of which keypoints can be moved in the optimization 104 | self.keypoint_movable_mask = np.zeros(self.program_info['num_keypoints'] + 1, dtype=bool) 105 | self.keypoint_movable_mask[0] = True # first keypoint is always the ee, so it's movable 106 | 107 | # main loop 108 | self.last_sim_step_counter = -np.inf 109 | self._update_stage(1) 110 | while True: 111 | scene_keypoints = self.env.get_keypoint_positions() 112 | self.keypoints = np.concatenate([[self.env.get_ee_pos()], scene_keypoints], axis=0) # first keypoint is always the ee 113 | self.curr_ee_pose = self.env.get_ee_pose() 114 | self.curr_joint_pos = self.env.get_arm_joint_postions() 115 | self.sdf_voxels = self.env.get_sdf_voxels(self.config['sdf_voxel_size']) 116 | self.collision_points = self.env.get_collision_points() 117 | # ==================================== 118 | # = decide whether to backtrack 119 | # ==================================== 120 | backtrack = False 121 | if self.stage > 1: 122 | path_constraints = self.constraint_fns[self.stage]['path'] 123 | for constraints in path_constraints: 124 | violation = constraints(self.keypoints[0], self.keypoints[1:]) 125 | if violation > self.config['constraint_tolerance']: 126 | backtrack = True 127 | break 128 | if backtrack: 129 | # determine which stage to backtrack to based on constraints 130 | for new_stage in range(self.stage - 1, 0, -1): 
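# walk backwards through earlier stages and settle on the most recent one whose path constraints still hold (falling back to stage 1 if none do)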
131 | path_constraints = self.constraint_fns[new_stage]['path'] 132 | # if no constraints, we can safely backtrack 133 | if len(path_constraints) == 0: 134 | break 135 | # otherwise, check if all constraints are satisfied 136 | all_constraints_satisfied = True 137 | for constraints in path_constraints: 138 | violation = constraints(self.keypoints[0], self.keypoints[1:]) 139 | if violation > self.config['constraint_tolerance']: 140 | all_constraints_satisfied = False 141 | break 142 | if all_constraints_satisfied: 143 | break 144 | print(f"{bcolors.HEADER}[stage={self.stage}] backtrack to stage {new_stage}{bcolors.ENDC}") 145 | self._update_stage(new_stage) 146 | else: 147 | # apply disturbance 148 | self._update_disturbance_seq(self.stage, disturbance_seq) 149 | # ==================================== 150 | # = get optimized plan 151 | # ==================================== 152 | if self.last_sim_step_counter == self.env.step_counter: 153 | print(f"{bcolors.WARNING}sim did not step forward within last iteration (HINT: adjust action_steps_per_iter to be larger or the pos_threshold to be smaller){bcolors.ENDC}") 154 | next_subgoal = self._get_next_subgoal(from_scratch=self.first_iter) 155 | next_path = self._get_next_path(next_subgoal, from_scratch=self.first_iter) 156 | self.first_iter = False 157 | self.action_queue = next_path.tolist() 158 | self.last_sim_step_counter = self.env.step_counter 159 | 160 | # ==================================== 161 | # = execute 162 | # ==================================== 163 | # determine if we proceed to the next stage 164 | count = 0 165 | while len(self.action_queue) > 0 and count < self.config['action_steps_per_iter']: 166 | next_action = self.action_queue.pop(0) 167 | precise = len(self.action_queue) == 0 168 | self.env.execute_action(next_action, precise=precise) 169 | count += 1 170 | if len(self.action_queue) == 0: 171 | if self.is_grasp_stage: 172 | self._execute_grasp_action() 173 | elif self.is_release_stage: 174 | self._execute_release_action() 175 | # if completed, save video and return 176 | if self.stage == self.program_info['num_stages']: 177 | self.env.sleep(2.0) 178 | save_path = self.env.save_video() 179 | print(f"{bcolors.OKGREEN}Video saved to {save_path}\n\n{bcolors.ENDC}") 180 | return 181 | # progress to next stage 182 | self._update_stage(self.stage + 1) 183 | 184 | def _get_next_subgoal(self, from_scratch): 185 | subgoal_constraints = self.constraint_fns[self.stage]['subgoal'] 186 | path_constraints = self.constraint_fns[self.stage]['path'] 187 | subgoal_pose, debug_dict = self.subgoal_solver.solve(self.curr_ee_pose, 188 | self.keypoints, 189 | self.keypoint_movable_mask, 190 | subgoal_constraints, 191 | path_constraints, 192 | self.sdf_voxels, 193 | self.collision_points, 194 | self.is_grasp_stage, 195 | self.curr_joint_pos, 196 | from_scratch=from_scratch) 197 | subgoal_pose_homo = T.convert_pose_quat2mat(subgoal_pose) 198 | # if grasp stage, back up a bit to leave room for grasping 199 | if self.is_grasp_stage: 200 | subgoal_pose[:3] += subgoal_pose_homo[:3, :3] @ np.array([-self.config['grasp_depth'] / 2.0, 0, 0]) 201 | debug_dict['stage'] = self.stage 202 | print_opt_debug_dict(debug_dict) 203 | if self.visualize: 204 | self.visualizer.visualize_subgoal(subgoal_pose) 205 | return subgoal_pose 206 | 207 | def _get_next_path(self, next_subgoal, from_scratch): 208 | path_constraints = self.constraint_fns[self.stage]['path'] 209 | path, debug_dict = self.path_solver.solve(self.curr_ee_pose, 210 | next_subgoal, 211 | 
self.keypoints, 212 | self.keypoint_movable_mask, 213 | path_constraints, 214 | self.sdf_voxels, 215 | self.collision_points, 216 | self.curr_joint_pos, 217 | from_scratch=from_scratch) 218 | print_opt_debug_dict(debug_dict) 219 | processed_path = self._process_path(path) 220 | if self.visualize: 221 | self.visualizer.visualize_path(processed_path) 222 | return processed_path 223 | 224 | def _process_path(self, path): 225 | # spline interpolate the path from the current ee pose 226 | full_control_points = np.concatenate([ 227 | self.curr_ee_pose.reshape(1, -1), 228 | path, 229 | ], axis=0) 230 | num_steps = get_linear_interpolation_steps(full_control_points[0], full_control_points[-1], 231 | self.config['interpolate_pos_step_size'], 232 | self.config['interpolate_rot_step_size']) 233 | dense_path = spline_interpolate_poses(full_control_points, num_steps) 234 | # add gripper action 235 | ee_action_seq = np.zeros((dense_path.shape[0], 8)) 236 | ee_action_seq[:, :7] = dense_path 237 | ee_action_seq[:, 7] = self.env.get_gripper_null_action() 238 | return ee_action_seq 239 | 240 | def _update_stage(self, stage): 241 | # update stage 242 | self.stage = stage 243 | self.is_grasp_stage = self.program_info['grasp_keypoints'][self.stage - 1] != -1 244 | self.is_release_stage = self.program_info['release_keypoints'][self.stage - 1] != -1 245 | # can only be grasp stage or release stage or none 246 | assert self.is_grasp_stage + self.is_release_stage <= 1, "Cannot be both grasp and release stage" 247 | if self.is_grasp_stage: # ensure gripper is open for grasping stage 248 | self.env.open_gripper() 249 | # clear action queue 250 | self.action_queue = [] 251 | # update keypoint movable mask 252 | self._update_keypoint_movable_mask() 253 | self.first_iter = True 254 | 255 | def _update_keypoint_movable_mask(self): 256 | for i in range(1, len(self.keypoint_movable_mask)): # first keypoint is ee so always movable 257 | keypoint_object = self.env.get_object_by_keypoint(i - 1) 258 | self.keypoint_movable_mask[i] = self.env.is_grasping(keypoint_object) 259 | 260 | def _execute_grasp_action(self): 261 | pregrasp_pose = self.env.get_ee_pose() 262 | grasp_pose = pregrasp_pose.copy() 263 | grasp_pose[:3] += T.quat2mat(pregrasp_pose[3:]) @ np.array([self.config['grasp_depth'], 0, 0]) 264 | grasp_action = np.concatenate([grasp_pose, [self.env.get_gripper_close_action()]]) 265 | self.env.execute_action(grasp_action, precise=True) 266 | 267 | def _execute_release_action(self): 268 | self.env.open_gripper() 269 | 270 | if __name__ == "__main__": 271 | parser = argparse.ArgumentParser() 272 | parser.add_argument('--task', type=str, default='pen', help='task to perform') 273 | parser.add_argument('--use_cached_query', action='store_true', help='instead of querying the VLM, use the cached query') 274 | parser.add_argument('--apply_disturbance', action='store_true', help='apply disturbance to test the robustness') 275 | parser.add_argument('--visualize', action='store_true', help='visualize each solution before executing (NOTE: this is blocking and needs to press "ESC" to continue)') 276 | args = parser.parse_args() 277 | 278 | if args.apply_disturbance: 279 | assert args.task == 'pen' and args.use_cached_query, 'disturbance sequence is only defined for cached scenario' 280 | 281 | # ==================================== 282 | # = pen task disturbance sequence 283 | # ==================================== 284 | def stage1_disturbance_seq(env): 285 | """ 286 | Move the pen in stage 0 when robot is trying to grasp the pen 
287 | """ 288 | pen = env.og_env.scene.object_registry("name", "pen_1") 289 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1") 290 | # disturbance sequence 291 | pos0, orn0 = pen.get_position_orientation() 292 | pose0 = np.concatenate([pos0, orn0]) 293 | pos1 = pos0 + np.array([-0.08, 0.0, 0.0]) 294 | orn1 = T.quat_multiply(T.euler2quat(np.array([0, 0, np.pi/4])), orn0) 295 | pose1 = np.concatenate([pos1, orn1]) 296 | pos2 = pos1 + np.array([0.10, 0.0, 0.0]) 297 | orn2 = T.quat_multiply(T.euler2quat(np.array([0, 0, -np.pi/2])), orn1) 298 | pose2 = np.concatenate([pos2, orn2]) 299 | control_points = np.array([pose0, pose1, pose2]) 300 | pose_seq = spline_interpolate_poses(control_points, num_steps=25) 301 | def disturbance(counter): 302 | if counter < len(pose_seq): 303 | pose = pose_seq[counter] 304 | pos, orn = pose[:3], pose[3:] 305 | pen.set_position_orientation(pos, orn) 306 | counter += 1 307 | counter = 0 308 | while True: 309 | yield disturbance(counter) 310 | counter += 1 311 | 312 | def stage2_disturbance_seq(env): 313 | """ 314 | Take the pen out of the gripper in stage 1 when robot is trying to reorient the pen 315 | """ 316 | apply_disturbance = env.is_grasping() 317 | pen = env.og_env.scene.object_registry("name", "pen_1") 318 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1") 319 | # disturbance sequence 320 | pos0, orn0 = pen.get_position_orientation() 321 | pose0 = np.concatenate([pos0, orn0]) 322 | pose1 = np.array([-0.30, -0.15, 0.71, -0.7071068, 0, 0, 0.7071068]) 323 | control_points = np.array([pose0, pose1]) 324 | pose_seq = spline_interpolate_poses(control_points, num_steps=25) 325 | def disturbance(counter): 326 | if apply_disturbance: 327 | if counter < 20: 328 | if counter > 15: 329 | env.robot.release_grasp_immediately() # force robot to release the pen 330 | else: 331 | pass # do nothing for the other steps 332 | elif counter < len(pose_seq) + 20: 333 | env.robot.release_grasp_immediately() # force robot to release the pen 334 | pose = pose_seq[counter - 20] 335 | pos, orn = pose[:3], pose[3:] 336 | pen.set_position_orientation(pos, orn) 337 | counter += 1 338 | counter = 0 339 | while True: 340 | yield disturbance(counter) 341 | counter += 1 342 | 343 | def stage3_disturbance_seq(env): 344 | """ 345 | Move the holder in stage 2 when robot is trying to drop the pen into the holder 346 | """ 347 | pen = env.og_env.scene.object_registry("name", "pen_1") 348 | holder = env.og_env.scene.object_registry("name", "pencil_holder_1") 349 | # disturbance sequence 350 | pos0, orn0 = holder.get_position_orientation() 351 | pose0 = np.concatenate([pos0, orn0]) 352 | pos1 = pos0 + np.array([-0.02, -0.15, 0.0]) 353 | orn1 = orn0 354 | pose1 = np.concatenate([pos1, orn1]) 355 | control_points = np.array([pose0, pose1]) 356 | pose_seq = spline_interpolate_poses(control_points, num_steps=5) 357 | def disturbance(counter): 358 | if counter < len(pose_seq): 359 | pose = pose_seq[counter] 360 | pos, orn = pose[:3], pose[3:] 361 | holder.set_position_orientation(pos, orn) 362 | counter += 1 363 | counter = 0 364 | while True: 365 | yield disturbance(counter) 366 | counter += 1 367 | 368 | task_list = { 369 | 'pen': { 370 | 'scene_file': './configs/og_scene_file_red_pen.json', 371 | 'instruction': 'reorient the red pen and drop it upright into the black pen holder', 372 | 'rekep_program_dir': './vlm_query/pen', 373 | 'disturbance_seq': {1: stage1_disturbance_seq, 2: stage2_disturbance_seq, 3: stage3_disturbance_seq}, 374 | }, 375 | } 376 | task = 
task_list['pen'] 377 | scene_file = task['scene_file'] 378 | instruction = task['instruction'] 379 | main = Main(scene_file, visualize=args.visualize) 380 | main.perform_task(instruction, 381 | rekep_program_dir=task['rekep_program_dir'] if args.use_cached_query else None, 382 | disturbance_seq=task.get('disturbance_seq', None) if args.apply_disturbance else None) -------------------------------------------------------------------------------- /media/pen-in-holder-disturbances.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/media/pen-in-holder-disturbances.gif -------------------------------------------------------------------------------- /media/pen-in-holder.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/media/pen-in-holder.gif -------------------------------------------------------------------------------- /og_utils.py: -------------------------------------------------------------------------------- 1 | from omnigibson.sensors.vision_sensor import VisionSensor 2 | import transform_utils as T 3 | import numpy as np 4 | import torch 5 | 6 | class OGCamera: 7 | """ 8 | Defines the camera class 9 | """ 10 | def __init__(self, og_env, config) -> None: 11 | self.cam = insert_camera(name=config['name'], og_env=og_env, width=config['resolution'], height=config['resolution']) 12 | self.cam.set_position_orientation(config['position'], config['orientation']) 13 | self.intrinsics = get_cam_intrinsics(self.cam) 14 | self.extrinsics = get_cam_extrinsics(self.cam) 15 | 16 | def get_params(self): 17 | """ 18 | Get the intrinsic and extrinsic parameters of the camera 19 | """ 20 | return {"intrinsics": self.intrinsics, "extrinsics": self.extrinsics} 21 | 22 | def get_obs(self): 23 | """ 24 | Gets the image observation from the camera. 25 | Assumes the scene has been rendered before calling this function. 26 | No semantic handling here for now. 
27 | """ 28 | obs = self.cam.get_obs() 29 | ret = {} 30 | ret["rgb"] = obs[0]["rgb"][:,:,:3] # H, W, 3 31 | ret["depth"] = obs[0]["depth_linear"] # H, W 32 | ret["points"] = pixel_to_3d_points(ret["depth"], self.intrinsics, self.extrinsics) # H, W, 3 33 | ret["seg"] = obs[0]["seg_semantic"] # H, W 34 | ret["intrinsic"] = self.intrinsics 35 | ret["extrinsic"] = self.extrinsics 36 | return ret 37 | 38 | def insert_camera(name, og_env, width=480, height=480): 39 | try: 40 | cam = VisionSensor( 41 | prim_path=f"/World/{name}", 42 | name=name, 43 | image_width=width, 44 | image_height=height, 45 | modalities=["rgb", "depth_linear", "seg_semantic"] 46 | ) 47 | except TypeError: 48 | cam = VisionSensor( 49 | relative_prim_path=f"/{name}", 50 | name=name, 51 | image_width=width, 52 | image_height=height, 53 | modalities=["rgb", "depth_linear", "seg_semantic"] 54 | ) 55 | 56 | try: 57 | cam.load() 58 | except TypeError: 59 | cam.load(og_env.scene) 60 | cam.initialize() 61 | return cam 62 | 63 | def get_cam_intrinsics(cam): 64 | """ 65 | Get the intrinsics matrix for a VisionSensor object 66 | ::param cam: VisionSensor object 67 | ::return intrinsics: 3x3 numpy array 68 | """ 69 | img_width = cam.image_width 70 | img_height = cam.image_height 71 | 72 | if img_width != img_height: 73 | raise ValueError("Only square images are supported") 74 | 75 | apert = cam.prim.GetAttribute("horizontalAperture").Get() 76 | focal_len_in_pixel = cam.focal_length * img_width / apert 77 | 78 | intrinsics = np.eye(3) 79 | intrinsics[0,0] = focal_len_in_pixel 80 | intrinsics[1,1] = focal_len_in_pixel 81 | intrinsics[0,2] = img_width / 2 82 | intrinsics[1,2] = img_height / 2 83 | 84 | return intrinsics 85 | 86 | def get_cam_extrinsics(cam): 87 | return T.pose_inv(T.pose2mat(cam.get_position_orientation())) 88 | 89 | def pixel_to_3d_points(depth_image, intrinsics, extrinsics): 90 | # if torch.is_tensor(intrinsics): 91 | # intrinsics = intrinsics.detach().cpu().numpy() 92 | # if torch.is_tensor(extrinsics): 93 | # extrinsics = extrinsics.detach().cpu().numpy() 94 | # Get the shape of the depth image 95 | H, W = depth_image.shape 96 | 97 | # Create a grid of (x, y) coordinates corresponding to each pixel in the image 98 | i, j = np.meshgrid(np.arange(W), np.arange(H), indexing='xy') 99 | 100 | # Unpack the intrinsic parameters 101 | fx, fy = intrinsics[0, 0], intrinsics[1, 1] 102 | cx, cy = intrinsics[0, 2], intrinsics[1, 2] 103 | 104 | # Convert pixel coordinates to normalized camera coordinates 105 | if torch.is_tensor(depth_image): 106 | depth_image = depth_image.detach().cpu().numpy() 107 | z = depth_image 108 | x = (i - cx) * z / fx 109 | y = (j - cy) * z / fy 110 | 111 | # Stack the coordinates to form (H, W, 3) 112 | camera_coordinates = np.stack((x, y, z), axis=-1) 113 | 114 | # Reshape to (H*W, 3) for matrix multiplication 115 | camera_coordinates = camera_coordinates.reshape(-1, 3) 116 | 117 | # Convert to homogeneous coordinates (H*W, 4) 118 | camera_coordinates_homogeneous = np.hstack((camera_coordinates, np.ones((camera_coordinates.shape[0], 1)))) 119 | 120 | # additional conversion to og convention 121 | T_mod = np.array([[1., 0., 0., 0., ], 122 | [0., -1., 0., 0.,], 123 | [0., 0., -1., 0.,], 124 | [0., 0., 0., 1.,]]) 125 | camera_coordinates_homogeneous = camera_coordinates_homogeneous @ T_mod 126 | 127 | # Apply extrinsics to get world coordinates 128 | # world_coordinates_homogeneous = camera_coordinates_homogeneous @ extrinsics.T 129 | world_coordinates_homogeneous = T.pose_inv(extrinsics) @ 
(camera_coordinates_homogeneous.T) 130 | world_coordinates_homogeneous = world_coordinates_homogeneous.T 131 | 132 | # Convert back to non-homogeneous coordinates 133 | world_coordinates = world_coordinates_homogeneous[:, :3] / world_coordinates_homogeneous[:, 3, np.newaxis] 134 | 135 | # Reshape back to (H, W, 3) 136 | world_coordinates = world_coordinates.reshape(H, W, 3) 137 | 138 | return world_coordinates 139 | 140 | def point_to_pixel(pt, intrinsics, extrinsics): 141 | """ 142 | pt -- (N, 3) 3d points in world frame 143 | intrinsics -- (3, 3) intrinsics matrix 144 | extrinsics -- (4, 4) extrinsics matrix 145 | """ 146 | pt_in_cam = extrinsics @ np.hstack((pt, np.ones((pt.shape[0], 1)))).T # (4, N) 147 | # multiply y, z by -1 148 | pt_in_cam[1, :] *= -1 149 | pt_in_cam[2, :] *= -1 150 | pt_in_cam = pt_in_cam[:3, :] 151 | pt_in_cam = intrinsics @ pt_in_cam 152 | pt_in_cam /= pt_in_cam[2, :] 153 | 154 | return pt_in_cam[:2, :].T -------------------------------------------------------------------------------- /path_solver.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy.optimize import dual_annealing, minimize 3 | from scipy.interpolate import RegularGridInterpolator 4 | import copy 5 | import time 6 | import transform_utils as T 7 | from utils import ( 8 | farthest_point_sampling, 9 | get_linear_interpolation_steps, 10 | linear_interpolate_poses, 11 | normalize_vars, 12 | unnormalize_vars, 13 | get_samples_jitted, 14 | calculate_collision_cost, 15 | path_length, 16 | transform_keypoints, 17 | ) 18 | import torch 19 | 20 | # ==================================== 21 | # = objective function 22 | # ==================================== 23 | def objective(opt_vars, 24 | og_bounds, 25 | start_pose, 26 | end_pose, 27 | keypoints_centered, 28 | keypoint_movable_mask, 29 | path_constraints, 30 | sdf_func, 31 | collision_points_centered, 32 | opt_interpolate_pos_step_size, 33 | opt_interpolate_rot_step_size, 34 | ik_solver, 35 | initial_joint_pos, 36 | reset_joint_pos, 37 | return_debug_dict=False): 38 | 39 | debug_dict = {} 40 | debug_dict['num_control_points'] = len(opt_vars) // 6 41 | 42 | # unnormalize variables and do conversion 43 | unnormalized_opt_vars = unnormalize_vars(opt_vars, og_bounds) 44 | control_points_euler = np.concatenate([start_pose[None], unnormalized_opt_vars.reshape(-1, 6), end_pose[None]], axis=0) # [num_control_points, 6] 45 | control_points_homo = T.convert_pose_euler2mat(control_points_euler) # [num_control_points, 4, 4] 46 | control_points_quat = T.convert_pose_mat2quat(control_points_homo) # [num_control_points, 7] 47 | # get dense samples 48 | poses_quat, num_poses = get_samples_jitted(control_points_homo, control_points_quat, opt_interpolate_pos_step_size, opt_interpolate_rot_step_size) 49 | poses_homo = T.convert_pose_quat2mat(poses_quat) 50 | debug_dict['num_poses'] = num_poses 51 | start_idx, end_idx = 1, num_poses - 1 # exclude start and goal 52 | 53 | cost = 0 54 | # collision cost 55 | if collision_points_centered is not None: 56 | collision_cost = 0.5 * calculate_collision_cost(poses_homo[start_idx:end_idx], sdf_func, collision_points_centered, 0.20) 57 | debug_dict['collision_cost'] = collision_cost 58 | cost += collision_cost 59 | 60 | # penalize path length 61 | pos_length, rot_length = path_length(poses_homo) 62 | approx_length = pos_length + rot_length * 1.0 63 | path_length_cost = 4.0 * approx_length 64 | debug_dict['path_length_cost'] = path_length_cost 65 | cost += 
path_length_cost 66 | 67 | # reachability cost 68 | ik_cost = 0 69 | reset_reg_cost = 0 70 | debug_dict['ik_pos_error'] = [] 71 | debug_dict['ik_feasible'] = [] 72 | max_iterations = 20 73 | for control_point_homo in control_points_homo: 74 | ik_result = ik_solver.solve( 75 | control_point_homo, 76 | max_iterations=max_iterations, 77 | initial_joint_pos=initial_joint_pos, 78 | ) 79 | debug_dict['ik_pos_error'].append(ik_result.position_error) 80 | debug_dict['ik_feasible'].append(ik_result.success) 81 | ik_cost += 20.0 * (ik_result.num_descents / max_iterations) 82 | if ik_result.success: 83 | reset_joint_pos = reset_joint_pos.detach().cpu().numpy() if torch.is_tensor(reset_joint_pos) else reset_joint_pos 84 | reset_reg = np.linalg.norm(ik_result.cspace_position[:-1] - reset_joint_pos[:-1]) 85 | reset_reg = np.clip(reset_reg, 0.0, 3.0) 86 | else: 87 | reset_reg = 3.0 88 | reset_reg_cost += 0.2 * reset_reg 89 | debug_dict['ik_pos_error'] = np.array(debug_dict['ik_pos_error']) 90 | debug_dict['ik_feasible'] = np.array(debug_dict['ik_feasible']) 91 | debug_dict['ik_cost'] = ik_cost 92 | debug_dict['reset_reg_cost'] = reset_reg_cost 93 | cost += ik_cost 94 | 95 | # # path constraint violation cost 96 | debug_dict['path_violation'] = None 97 | if path_constraints is not None and len(path_constraints) > 0: 98 | path_constraint_cost = 0 99 | path_violation = [] 100 | for pose in poses_homo[start_idx:end_idx]: 101 | transformed_keypoints = transform_keypoints(pose, keypoints_centered, keypoint_movable_mask) 102 | for constraint in path_constraints: 103 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:]) 104 | path_violation.append(violation) 105 | path_constraint_cost += np.clip(violation, 0, np.inf) 106 | path_constraint_cost = 200.0*path_constraint_cost 107 | debug_dict['path_constraint_cost'] = path_constraint_cost 108 | debug_dict['path_violation'] = path_violation 109 | cost += path_constraint_cost 110 | 111 | debug_dict['total_cost'] = cost 112 | 113 | if return_debug_dict: 114 | return cost, debug_dict 115 | 116 | return cost 117 | 118 | 119 | class PathSolver: 120 | """ 121 | Given a goal pose and a start pose, solve for a sequence of intermediate poses for the end effector to follow. 
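The control points are optimized to trade off collision avoidance, path length, reachability (IK feasibility and closeness to the reset configuration), and satisfaction of the path constraints (see the objective function above).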
122 | 123 | Optimization variables: 124 | - sequence of intermediate control points 125 | """ 126 | 127 | def __init__(self, config, ik_solver, reset_joint_pos): 128 | self.config = config 129 | self.ik_solver = ik_solver 130 | self.reset_joint_pos = reset_joint_pos 131 | self.last_opt_result = None 132 | # warmup 133 | self._warmup() 134 | 135 | def _warmup(self): 136 | start_pose = np.array([0.0, 0.0, 0.3, 0, 0, 0, 1]) 137 | end_pose = np.array([0.0, 0.0, 0.0, 0, 0, 0, 1]) 138 | keypoints = np.random.rand(10, 3) 139 | keypoint_movable_mask = np.random.rand(10) > 0.5 140 | path_constraints = [] 141 | sdf_voxels = np.zeros((10, 10, 10)) 142 | collision_points = np.random.rand(100, 3) 143 | self.solve(start_pose, end_pose, keypoints, keypoint_movable_mask, path_constraints, sdf_voxels, collision_points, None, from_scratch=True) 144 | self.last_opt_result = None 145 | 146 | def _setup_sdf(self, sdf_voxels): 147 | # create callable sdf function with interpolation 148 | x = np.linspace(self.config['bounds_min'][0], self.config['bounds_max'][0], sdf_voxels.shape[0]) 149 | y = np.linspace(self.config['bounds_min'][1], self.config['bounds_max'][1], sdf_voxels.shape[1]) 150 | z = np.linspace(self.config['bounds_min'][2], self.config['bounds_max'][2], sdf_voxels.shape[2]) 151 | sdf_func = RegularGridInterpolator((x, y, z), sdf_voxels, bounds_error=False, fill_value=0) 152 | return sdf_func 153 | 154 | def _check_opt_result(self, opt_result, path_quat, debug_dict, og_bounds): 155 | # accept the opt_result if it's only terminated due to iteration limit 156 | if (not opt_result.success and ('maximum' in opt_result.message.lower() or 'iteration' in opt_result.message.lower() or 'not necessarily' in opt_result.message.lower())): 157 | opt_result.success = True 158 | elif not opt_result.success: 159 | opt_result.message += '; invalid solution' 160 | # check whether path constraints are satisfied 161 | if debug_dict['path_violation'] is not None: 162 | path_violation = np.array(debug_dict['path_violation']) 163 | opt_result.message += f'; path_violation: {path_violation}' 164 | path_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in path_violation]) 165 | if not path_constraints_satisfied: 166 | opt_result.success = False 167 | opt_result.message += f'; path constraint not satisfied' 168 | return opt_result 169 | 170 | def _center_collision_points_and_keypoints(self, ee_pose, collision_points, keypoints, keypoint_movable_mask): 171 | ee_pose_homo = T.pose2mat([ee_pose[:3], T.euler2quat(ee_pose[3:])]) 172 | centering_transform = np.linalg.inv(ee_pose_homo) 173 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3] 174 | keypoints_centered = transform_keypoints(centering_transform, keypoints, keypoint_movable_mask) 175 | return collision_points_centered, keypoints_centered 176 | 177 | def solve(self, 178 | start_pose, 179 | end_pose, 180 | keypoints, 181 | keypoint_movable_mask, 182 | path_constraints, 183 | sdf_voxels, 184 | collision_points, 185 | initial_joint_pos, 186 | from_scratch=False): 187 | """ 188 | Args: 189 | - start_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw] 190 | - end_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, qw] 191 | - keypoints (np.ndarray): [num_keypoints, 3] 192 | - keypoint_movable_mask (bool): whether the keypoints are on the object being grasped 193 | - path_constraints (List[Callable]): path constraints 194 | - sdf_voxels (np.ndarray): [H, W, D] 195 | - collision_points 
(np.ndarray): [num_points, 3], point cloud of the object being grasped 196 | - initial_joint_pos (np.ndarray): [N] initial joint positions of the robot. 197 | - from_scratch (bool): whether to start from scratch 198 | 199 | Returns: 200 | - opt_result (scipy.optimize.OptimizeResult): optimization opt_result 201 | - debug_dict (dict): debug information 202 | """ 203 | # downsample collision points 204 | if collision_points is not None and collision_points.shape[0] > self.config['max_collision_points']: 205 | collision_points = farthest_point_sampling(collision_points, self.config['max_collision_points']) 206 | sdf_func = self._setup_sdf(sdf_voxels) 207 | 208 | # ==================================== 209 | # = setup bounds 210 | # ==================================== 211 | # calculate an appropriate number of control points, including start and goal 212 | num_control_points = get_linear_interpolation_steps(start_pose, end_pose, self.config['opt_pos_step_size'], self.config['opt_rot_step_size']) 213 | num_control_points = np.clip(num_control_points, 3, 6) 214 | # transform to euler representation 215 | start_pose = np.concatenate([start_pose[:3], T.quat2euler(start_pose[3:])]) 216 | end_pose = np.concatenate([end_pose[:3], T.quat2euler(end_pose[3:])]) 217 | 218 | # bounds for decision variables 219 | og_bounds = [(b_min, b_max) for b_min, b_max in zip(self.config['bounds_min'], self.config['bounds_max'])] + \ 220 | [(-np.pi, np.pi) for _ in range(3)] 221 | og_bounds *= (num_control_points - 2) 222 | og_bounds = np.array(og_bounds, dtype=np.float64) 223 | bounds = [(-1, 1)] * len(og_bounds) 224 | num_vars = len(bounds) 225 | 226 | # ==================================== 227 | # = setup initial guess 228 | # ==================================== 229 | # use previous opt_result as initial guess if available 230 | if not from_scratch and self.last_opt_result is not None: 231 | init_sol = self.last_opt_result.x 232 | # if there are more control points in this iter, fill the rest with the last value + small noise 233 | if len(init_sol) < num_vars: 234 | new_x0 = np.empty(num_vars) 235 | new_x0[:len(init_sol)] = init_sol 236 | for i in range(len(init_sol), num_vars, 6): 237 | new_x0[i:i+6] = init_sol[-6:] + np.random.randn(6) * 0.01 238 | init_sol = new_x0 239 | # otherwise, use the last num_vars values 240 | else: 241 | init_sol = init_sol[-num_vars:] 242 | # initial guess as linear interpolation 243 | else: 244 | from_scratch = True 245 | interp_poses = linear_interpolate_poses(start_pose, end_pose, num_control_points) # [num_control_points, 6] 246 | init_sol = interp_poses[1:-1].flatten() # [num_control_points-2, 6] 247 | init_sol = normalize_vars(init_sol, og_bounds) 248 | 249 | # clip the initial guess to be within bounds 250 | for i, (b_min, b_max) in enumerate(bounds): 251 | init_sol[i] = np.clip(init_sol[i], b_min, b_max) 252 | 253 | # ==================================== 254 | # = other setup 255 | # ==================================== 256 | collision_points_centered, keypoints_centered = self._center_collision_points_and_keypoints(start_pose, collision_points, keypoints, keypoint_movable_mask) 257 | aux_args = (og_bounds, 258 | start_pose, 259 | end_pose, 260 | keypoints_centered, 261 | keypoint_movable_mask, 262 | path_constraints, 263 | sdf_func, 264 | collision_points_centered, 265 | self.config['opt_interpolate_pos_step_size'], 266 | self.config['opt_interpolate_rot_step_size'], 267 | self.ik_solver, 268 | initial_joint_pos, 269 | self.reset_joint_pos) 270 | 271 | # 
==================================== 272 | # = solve optimization 273 | # ==================================== 274 | start = time.time() 275 | # use global optimization for the first iteration 276 | if from_scratch: 277 | opt_result = dual_annealing( 278 | func=objective, 279 | bounds=bounds, 280 | args=aux_args, 281 | maxfun=self.config['sampling_maxfun'], 282 | x0=init_sol, 283 | no_local_search=True, 284 | minimizer_kwargs={ 285 | 'method': 'SLSQP', 286 | 'options': self.config['minimizer_options'], 287 | }, 288 | ) 289 | # use gradient-based local optimization for the following iterations 290 | else: 291 | opt_result = minimize( 292 | fun=objective, 293 | x0=init_sol, 294 | args=aux_args, 295 | bounds=bounds, 296 | method='SLSQP', 297 | options=self.config['minimizer_options'], 298 | ) 299 | solve_time = time.time() - start 300 | 301 | # ==================================== 302 | # = post-process opt_result 303 | # ==================================== 304 | if isinstance(opt_result.message, list): 305 | opt_result.message = opt_result.message[0] 306 | # rerun to get debug info 307 | _, debug_dict = objective(opt_result.x, *aux_args, return_debug_dict=True) 308 | debug_dict['sol'] = opt_result.x.reshape(-1, 6) 309 | debug_dict['msg'] = opt_result.message 310 | debug_dict['solve_time'] = solve_time 311 | debug_dict['from_scratch'] = from_scratch 312 | debug_dict['type'] = 'path_solver' 313 | # unnormailze 314 | sol = unnormalize_vars(opt_result.x, og_bounds) 315 | # add end pose 316 | poses_euler = np.concatenate([sol.reshape(-1, 6), end_pose[None]], axis=0) 317 | poses_quat = T.convert_pose_euler2quat(poses_euler) # [num_control_points, 7] 318 | opt_result = self._check_opt_result(opt_result, poses_quat, debug_dict, og_bounds) 319 | # cache opt_result for future use if successful 320 | if opt_result.success: 321 | self.last_opt_result = copy.deepcopy(opt_result) 322 | return poses_quat, debug_dict -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | open3d==0.16.0 2 | omegaconf 3 | opencv-python 4 | hydra-core 5 | openai 6 | torch 7 | kmeans_pytorch 8 | parse -------------------------------------------------------------------------------- /subgoal_solver.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import time 3 | import copy 4 | from scipy.optimize import dual_annealing, minimize 5 | from scipy.interpolate import RegularGridInterpolator 6 | import transform_utils as T 7 | from utils import ( 8 | transform_keypoints, 9 | calculate_collision_cost, 10 | normalize_vars, 11 | unnormalize_vars, 12 | farthest_point_sampling, 13 | consistency, 14 | ) 15 | import torch 16 | def objective(opt_vars, 17 | og_bounds, 18 | keypoints_centered, 19 | keypoint_movable_mask, 20 | goal_constraints, 21 | path_constraints, 22 | sdf_func, 23 | collision_points_centered, 24 | init_pose_homo, 25 | ik_solver, 26 | initial_joint_pos, 27 | reset_joint_pos, 28 | is_grasp_stage, 29 | return_debug_dict=False): 30 | 31 | debug_dict = {} 32 | # unnormalize variables and do conversion 33 | opt_pose = unnormalize_vars(opt_vars, og_bounds) 34 | opt_pose_homo = T.pose2mat([opt_pose[:3], T.euler2quat(opt_pose[3:])]) 35 | 36 | cost = 0 37 | # collision cost 38 | if collision_points_centered is not None: 39 | collision_cost = 0.8 * calculate_collision_cost(opt_pose_homo[None], sdf_func, collision_points_centered, 0.10) 40 | 
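# the last argument (0.10) is the threshold margin added to the SDF values inside calculate_collision_cost (see utils.py)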
debug_dict['collision_cost'] = collision_cost 41 | cost += collision_cost 42 | 43 | # stay close to initial pose 44 | init_pose_cost = 1.0 * consistency(opt_pose_homo[None], init_pose_homo[None], rot_weight=1.5) 45 | debug_dict['init_pose_cost'] = init_pose_cost 46 | cost += init_pose_cost 47 | 48 | # reachability cost (approximated by number of IK iterations + regularization from reset joint pos) 49 | max_iterations = 20 50 | ik_result = ik_solver.solve( 51 | opt_pose_homo, 52 | max_iterations=max_iterations, 53 | initial_joint_pos=initial_joint_pos, 54 | ) 55 | ik_cost = 20.0 * (ik_result.num_descents / max_iterations) 56 | debug_dict['ik_feasible'] = ik_result.success 57 | debug_dict['ik_pos_error'] = ik_result.position_error 58 | debug_dict['ik_cost'] = ik_cost 59 | cost += ik_cost 60 | if ik_result.success: 61 | reset_joint_pos = reset_joint_pos.detach().cpu().numpy() if torch.is_tensor(reset_joint_pos) else reset_joint_pos 62 | reset_reg = np.linalg.norm(ik_result.cspace_position[:-1] - reset_joint_pos[:-1]) 63 | reset_reg = np.clip(reset_reg, 0.0, 3.0) 64 | else: 65 | reset_reg = 3.0 66 | reset_reg_cost = 0.2 * reset_reg 67 | debug_dict['reset_reg_cost'] = reset_reg_cost 68 | cost += reset_reg_cost 69 | 70 | # grasp metric (better performance if using anygrasp or force-based grasp metrics) 71 | if is_grasp_stage: 72 | preferred_dir = np.array([0, 0, -1]) 73 | grasp_cost = -np.dot(opt_pose_homo[:3, 0], preferred_dir) + 1 # [0, 1] 74 | grasp_cost = 10.0 * grasp_cost 75 | debug_dict['grasp_cost'] = grasp_cost 76 | cost += grasp_cost 77 | 78 | # goal constraint violation cost 79 | debug_dict['subgoal_constraint_cost'] = None 80 | debug_dict['subgoal_violation'] = None 81 | if goal_constraints is not None and len(goal_constraints) > 0: 82 | subgoal_constraint_cost = 0 83 | transformed_keypoints = transform_keypoints(opt_pose_homo, keypoints_centered, keypoint_movable_mask) 84 | subgoal_violation = [] 85 | for constraint in goal_constraints: 86 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:]) 87 | subgoal_violation.append(violation) 88 | subgoal_constraint_cost += np.clip(violation, 0, np.inf) 89 | subgoal_constraint_cost = 200.0*subgoal_constraint_cost 90 | debug_dict['subgoal_constraint_cost'] = subgoal_constraint_cost 91 | debug_dict['subgoal_violation'] = subgoal_violation 92 | cost += subgoal_constraint_cost 93 | 94 | # path constraint violation cost 95 | debug_dict['path_violation'] = None 96 | if path_constraints is not None and len(path_constraints) > 0: 97 | path_constraint_cost = 0 98 | transformed_keypoints = transform_keypoints(opt_pose_homo, keypoints_centered, keypoint_movable_mask) 99 | path_violation = [] 100 | for constraint in path_constraints: 101 | violation = constraint(transformed_keypoints[0], transformed_keypoints[1:]) 102 | path_violation.append(violation) 103 | path_constraint_cost += np.clip(violation, 0, np.inf) 104 | path_constraint_cost = 200.0*path_constraint_cost 105 | debug_dict['path_constraint_cost'] = path_constraint_cost 106 | debug_dict['path_violation'] = path_violation 107 | cost += path_constraint_cost 108 | 109 | debug_dict['total_cost'] = cost 110 | 111 | if return_debug_dict: 112 | return cost, debug_dict 113 | 114 | return cost 115 | 116 | 117 | class SubgoalSolver: 118 | def __init__(self, config, ik_solver, reset_joint_pos): 119 | self.config = config 120 | self.ik_solver = ik_solver 121 | self.reset_joint_pos = reset_joint_pos 122 | self.last_opt_result = None 123 | # warmup 124 | self._warmup() 125 | 126 | def 
_warmup(self): 127 | ee_pose = np.array([0.0, 0.0, 0.0, 0, 0, 0, 1]) 128 | keypoints = np.random.rand(10, 3) 129 | keypoint_movable_mask = np.random.rand(10) > 0.5 130 | goal_constraints = [] 131 | path_constraints = [] 132 | sdf_voxels = np.zeros((10, 10, 10)) 133 | collision_points = np.random.rand(100, 3) 134 | self.solve(ee_pose, keypoints, keypoint_movable_mask, goal_constraints, path_constraints, sdf_voxels, collision_points, True, None, from_scratch=True) 135 | self.last_opt_result = None 136 | 137 | def _setup_sdf(self, sdf_voxels): 138 | # create callable sdf function with interpolation 139 | x = np.linspace(self.config['bounds_min'][0], self.config['bounds_max'][0], sdf_voxels.shape[0]) 140 | y = np.linspace(self.config['bounds_min'][1], self.config['bounds_max'][1], sdf_voxels.shape[1]) 141 | z = np.linspace(self.config['bounds_min'][2], self.config['bounds_max'][2], sdf_voxels.shape[2]) 142 | sdf_func = RegularGridInterpolator((x, y, z), sdf_voxels, bounds_error=False, fill_value=0) 143 | return sdf_func 144 | 145 | def _check_opt_result(self, opt_result, debug_dict): 146 | # accept the opt_result if it's only terminated due to iteration limit 147 | if (not opt_result.success and ('maximum' in opt_result.message.lower() or 'iteration' in opt_result.message.lower() or 'not necessarily' in opt_result.message.lower())): 148 | opt_result.success = True 149 | elif not opt_result.success: 150 | opt_result.message += '; invalid solution' 151 | # check whether goal constraints are satisfied 152 | if debug_dict['subgoal_violation'] is not None: 153 | goal_constraints_results = np.array(debug_dict['subgoal_violation']) 154 | opt_result.message += f'; goal_constraints_results: {goal_constraints_results} (higher is worse)' 155 | goal_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in goal_constraints_results]) 156 | if not goal_constraints_satisfied: 157 | opt_result.success = False 158 | opt_result.message += f'; goal not satisfied' 159 | # check whether path constraints are satisfied 160 | if debug_dict['path_violation'] is not None: 161 | path_constraints_results = np.array(debug_dict['path_violation']) 162 | opt_result.message += f'; path_constraints_results: {path_constraints_results}' 163 | path_constraints_satisfied = all([violation <= self.config['constraint_tolerance'] for violation in path_constraints_results]) 164 | if not path_constraints_satisfied: 165 | opt_result.success = False 166 | opt_result.message += f'; path not satisfied' 167 | # check whether ik is feasible 168 | if 'ik_feasible' in debug_dict and not debug_dict['ik_feasible']: 169 | opt_result.success = False 170 | opt_result.message += f'; ik not feasible' 171 | return opt_result 172 | 173 | def _center_collision_points_and_keypoints(self, ee_pose_homo, collision_points, keypoints, keypoint_movable_mask): 174 | centering_transform = np.linalg.inv(ee_pose_homo) 175 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3] 176 | keypoints_centered = transform_keypoints(centering_transform, keypoints, keypoint_movable_mask) 177 | return collision_points_centered, keypoints_centered 178 | 179 | def solve(self, 180 | ee_pose, 181 | keypoints, 182 | keypoint_movable_mask, 183 | goal_constraints, 184 | path_constraints, 185 | sdf_voxels, 186 | collision_points, 187 | is_grasp_stage, 188 | initial_joint_pos, 189 | from_scratch=False, 190 | ): 191 | """ 192 | Args: 193 | - ee_pose (np.ndarray): [7], [x, y, z, qx, qy, qz, 
qw] end effector pose. 194 | - keypoints (np.ndarray): [M, 3] keypoint positions. 195 | - keypoint_movable_mask (bool): [M] boolean array indicating whether the keypoint is on the grasped object. 196 | - goal_constraints (List[Callable]): subgoal constraint functions. 197 | - path_constraints (List[Callable]): path constraint functions. 198 | - sdf_voxels (np.ndarray): [X, Y, Z] signed distance field of the environment. 199 | - collision_points (np.ndarray): [N, 3] point cloud of the object. 200 | - is_grasp_stage (bool): whether the current stage is a grasp stage. 201 | - initial_joint_pos (np.ndarray): [N] initial joint positions of the robot. 202 | - from_scratch (bool): whether to start from scratch. 203 | Returns: 204 | - result (scipy.optimize.OptimizeResult): optimization result. 205 | - debug_dict (dict): debug information. 206 | """ 207 | # downsample collision points 208 | if collision_points is not None and collision_points.shape[0] > self.config['max_collision_points']: 209 | collision_points = farthest_point_sampling(collision_points, self.config['max_collision_points']) 210 | sdf_func = self._setup_sdf(sdf_voxels) 211 | # ==================================== 212 | # = setup bounds and initial guess 213 | # ==================================== 214 | ee_pose = ee_pose.astype(np.float64) 215 | ee_pose_homo = T.pose2mat([ee_pose[:3], ee_pose[3:]]) 216 | ee_pose_euler = np.concatenate([ee_pose[:3], T.quat2euler(ee_pose[3:])]) 217 | # normalize opt variables to [0, 1] 218 | pos_bounds_min = self.config['bounds_min'] 219 | pos_bounds_max = self.config['bounds_max'] 220 | rot_bounds_min = np.array([-np.pi, -np.pi, -np.pi]) # euler angles 221 | rot_bounds_max = np.array([np.pi, np.pi, np.pi]) # euler angles 222 | og_bounds = [(b_min, b_max) for b_min, b_max in zip(np.concatenate([pos_bounds_min, rot_bounds_min]), np.concatenate([pos_bounds_max, rot_bounds_max]))] 223 | bounds = [(-1, 1)] * len(og_bounds) 224 | if not from_scratch and self.last_opt_result is not None: 225 | init_sol = self.last_opt_result.x 226 | else: 227 | init_sol = normalize_vars(ee_pose_euler, og_bounds) # start from the current pose 228 | from_scratch = True 229 | 230 | # ==================================== 231 | # = other setup 232 | # ==================================== 233 | collision_points_centered, keypoints_centered = self._center_collision_points_and_keypoints(ee_pose_homo, collision_points, keypoints, keypoint_movable_mask) 234 | aux_args = (og_bounds, 235 | keypoints_centered, 236 | keypoint_movable_mask, 237 | goal_constraints, 238 | path_constraints, 239 | sdf_func, 240 | collision_points_centered, 241 | ee_pose_homo, 242 | self.ik_solver, 243 | initial_joint_pos, 244 | self.reset_joint_pos, 245 | is_grasp_stage) 246 | 247 | # ==================================== 248 | # = solve optimization 249 | # ==================================== 250 | start = time.time() 251 | # use global optimization for the first iteration 252 | if from_scratch: 253 | opt_result = dual_annealing( 254 | func=objective, 255 | bounds=bounds, 256 | args=aux_args, 257 | maxfun=self.config['sampling_maxfun'], 258 | x0=init_sol, 259 | no_local_search=False, 260 | minimizer_kwargs={ 261 | 'method': 'SLSQP', 262 | 'options': self.config['minimizer_options'], 263 | }, 264 | ) 265 | # use gradient-based local optimization for the following iterations 266 | else: 267 | opt_result = minimize( 268 | fun=objective, 269 | x0=init_sol, 270 | args=aux_args, 271 | bounds=bounds, 272 | method='SLSQP', 273 | 
options=self.config['minimizer_options'], 274 | ) 275 | solve_time = time.time() - start 276 | 277 | # ==================================== 278 | # = post-process opt_result 279 | # ==================================== 280 | if isinstance(opt_result.message, list): 281 | opt_result.message = opt_result.message[0] 282 | # rerun to get debug info 283 | _, debug_dict = objective(opt_result.x, *aux_args, return_debug_dict=True) 284 | debug_dict['sol'] = opt_result.x 285 | debug_dict['msg'] = opt_result.message 286 | debug_dict['solve_time'] = solve_time 287 | debug_dict['from_scratch'] = from_scratch 288 | debug_dict['type'] = 'subgoal_solver' 289 | # unnormailze 290 | sol = unnormalize_vars(opt_result.x, og_bounds) 291 | sol = np.concatenate([sol[:3], T.euler2quat(sol[3:])]) 292 | opt_result = self._check_opt_result(opt_result, debug_dict) 293 | # cache opt_result for future use if successful 294 | if opt_result.success: 295 | self.last_opt_result = copy.deepcopy(opt_result) 296 | return sol, debug_dict -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | from numba import njit 4 | import open3d as o3d 5 | import datetime 6 | import scipy.interpolate as interpolate 7 | from scipy.spatial.transform import Slerp 8 | from scipy.spatial.transform import Rotation as R 9 | from scipy.spatial.transform import RotationSpline 10 | import transform_utils as T 11 | import yaml 12 | 13 | # =============================================== 14 | # = optimization utils 15 | # =============================================== 16 | def normalize_vars(vars, og_bounds): 17 | """ 18 | Given 1D variables and bounds, normalize the variables to [-1, 1] range. 19 | """ 20 | normalized_vars = np.empty_like(vars) 21 | for i, (b_min, b_max) in enumerate(og_bounds): 22 | normalized_vars[i] = (vars[i] - b_min) / (b_max - b_min) * 2 - 1 23 | return normalized_vars 24 | 25 | def unnormalize_vars(normalized_vars, og_bounds): 26 | """ 27 | Given 1D variables in [-1, 1] and original bounds, denormalize the variables to the original range. 
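This is the inverse of normalize_vars above.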
28 | """ 29 | vars = np.empty_like(normalized_vars) 30 | for i, (b_min, b_max) in enumerate(og_bounds): 31 | vars[i] = (normalized_vars[i] + 1) / 2 * (b_max - b_min) + b_min 32 | return vars 33 | 34 | def calculate_collision_cost(poses, sdf_func, collision_points, threshold): 35 | assert poses.shape[1:] == (4, 4) 36 | transformed_pcs = batch_transform_points(collision_points, poses) 37 | transformed_pcs_flatten = transformed_pcs.reshape(-1, 3) # [num_poses * num_points, 3] 38 | signed_distance = sdf_func(transformed_pcs_flatten) + threshold # [num_poses * num_points] 39 | signed_distance = signed_distance.reshape(-1, collision_points.shape[0]) # [num_poses, num_points] 40 | non_zero_mask = signed_distance > 0 41 | collision_cost = np.sum(signed_distance[non_zero_mask]) 42 | return collision_cost 43 | 44 | @njit(cache=True, fastmath=True) 45 | def consistency(poses_a, poses_b, rot_weight=0.5): 46 | assert poses_a.shape[1:] == (4, 4) and poses_b.shape[1:] == (4, 4), 'poses must be of shape (N, 4, 4)' 47 | min_distances = np.zeros(len(poses_a), dtype=np.float64) 48 | for i in range(len(poses_a)): 49 | min_distance = 9999999 50 | a = poses_a[i] 51 | for j in range(len(poses_b)): 52 | b = poses_b[j] 53 | pos_distance = np.linalg.norm(a[:3, 3] - b[:3, 3]) 54 | rot_distance = angle_between_rotmat(a[:3, :3], b[:3, :3]) 55 | distance = pos_distance + rot_distance * rot_weight 56 | min_distance = min(min_distance, distance) 57 | min_distances[i] = min_distance 58 | return np.mean(min_distances) 59 | 60 | def transform_keypoints(transform, keypoints, movable_mask): 61 | assert transform.shape == (4, 4) 62 | transformed_keypoints = keypoints.copy() 63 | if movable_mask.sum() > 0: 64 | transformed_keypoints[movable_mask] = np.dot(keypoints[movable_mask], transform[:3, :3].T) + transform[:3, 3] 65 | return transformed_keypoints 66 | 67 | @njit(cache=True, fastmath=True) 68 | def batch_transform_points(points, transforms): 69 | """ 70 | Apply multiple of transformation to point cloud, return results of individual transformations. 71 | Args: 72 | points: point cloud (N, 3). 73 | transforms: M 4x4 transformations (M, 4, 4). 74 | Returns: 75 | np.array: point clouds (M, N, 3). 
76 | """ 77 | assert transforms.shape[1:] == (4, 4), 'transforms must be of shape (M, 4, 4)' 78 | transformed_points = np.zeros((transforms.shape[0], points.shape[0], 3)) 79 | for i in range(transforms.shape[0]): 80 | pos, R = transforms[i, :3, 3], transforms[i, :3, :3] 81 | transformed_points[i] = np.dot(points, R.T) + pos 82 | return transformed_points 83 | 84 | @njit(cache=True, fastmath=True) 85 | def get_samples_jitted(control_points_homo, control_points_quat, opt_interpolate_pos_step_size, opt_interpolate_rot_step_size): 86 | assert control_points_homo.shape[1:] == (4, 4) 87 | # calculate number of samples per segment 88 | num_samples_per_segment = np.empty(len(control_points_homo) - 1, dtype=np.int64) 89 | for i in range(len(control_points_homo) - 1): 90 | start_pos = control_points_homo[i, :3, 3] 91 | start_rotmat = control_points_homo[i, :3, :3] 92 | end_pos = control_points_homo[i+1, :3, 3] 93 | end_rotmat = control_points_homo[i+1, :3, :3] 94 | pos_diff = np.linalg.norm(start_pos - end_pos) 95 | rot_diff = angle_between_rotmat(start_rotmat, end_rotmat) 96 | pos_num_steps = np.ceil(pos_diff / opt_interpolate_pos_step_size) 97 | rot_num_steps = np.ceil(rot_diff / opt_interpolate_rot_step_size) 98 | num_path_poses = int(max(pos_num_steps, rot_num_steps)) 99 | num_path_poses = max(num_path_poses, 2) # at least 2 poses, start and end 100 | num_samples_per_segment[i] = num_path_poses 101 | # fill in samples 102 | num_samples = num_samples_per_segment.sum() 103 | samples_7 = np.empty((num_samples, 7)) 104 | sample_idx = 0 105 | for i in range(len(control_points_quat) - 1): 106 | start_pos, start_xyzw = control_points_quat[i, :3], control_points_quat[i, 3:] 107 | end_pos, end_xyzw = control_points_quat[i+1, :3], control_points_quat[i+1, 3:] 108 | # using proper quaternion slerp interpolation 109 | poses_7 = np.empty((num_samples_per_segment[i], 7)) 110 | for j in range(num_samples_per_segment[i]): 111 | alpha = j / (num_samples_per_segment[i] - 1) 112 | pos = start_pos * (1 - alpha) + end_pos * alpha 113 | blended_xyzw = T.quat_slerp_jitted(start_xyzw, end_xyzw, alpha) 114 | pose_7 = np.empty(7) 115 | pose_7[:3] = pos 116 | pose_7[3:] = blended_xyzw 117 | poses_7[j] = pose_7 118 | samples_7[sample_idx:sample_idx+num_samples_per_segment[i]] = poses_7 119 | sample_idx += num_samples_per_segment[i] 120 | assert num_samples >= 2, f'num_samples: {num_samples}' 121 | return samples_7, num_samples 122 | 123 | @njit(cache=True, fastmath=True) 124 | def path_length(samples_homo): 125 | assert samples_homo.shape[1:] == (4, 4), 'samples_homo must be of shape (N, 4, 4)' 126 | pos_length = 0 127 | rot_length = 0 128 | for i in range(len(samples_homo) - 1): 129 | pos_length += np.linalg.norm(samples_homo[i, :3, 3] - samples_homo[i+1, :3, 3]) 130 | rot_length += angle_between_rotmat(samples_homo[i, :3, :3], samples_homo[i+1, :3, :3]) 131 | return pos_length, rot_length 132 | 133 | # =============================================== 134 | # = others 135 | # =============================================== 136 | def get_callable_grasping_cost_fn(env): 137 | def get_grasping_cost(keypoint_idx): 138 | keypoint_object = env.get_object_by_keypoint(keypoint_idx) 139 | return -env.is_grasping(candidate_obj=keypoint_object) + 1 # return 0 if grasping an object, 1 if not grasping any object 140 | return get_grasping_cost 141 | 142 | def get_config(config_path=None): 143 | if config_path is None: 144 | this_file_dir = os.path.dirname(os.path.abspath(__file__)) 145 | config_path = os.path.join(this_file_dir, 
'configs/config.yaml') 146 | assert config_path and os.path.exists(config_path), f'config file does not exist ({config_path})' 147 | with open(config_path, 'r') as f: 148 | config = yaml.load(f, Loader=yaml.FullLoader) 149 | return config 150 | 151 | class bcolors: 152 | HEADER = '\033[95m' 153 | OKBLUE = '\033[94m' 154 | OKCYAN = '\033[96m' 155 | OKGREEN = '\033[92m' 156 | WARNING = '\033[93m' 157 | FAIL = '\033[91m' 158 | ENDC = '\033[0m' 159 | BOLD = '\033[1m' 160 | UNDERLINE = '\033[4m' 161 | 162 | def get_clock_time(milliseconds=False): 163 | curr_time = datetime.datetime.now() 164 | if milliseconds: 165 | return f'{curr_time.hour}:{curr_time.minute}:{curr_time.second}.{curr_time.microsecond // 1000}' 166 | else: 167 | return f'{curr_time.hour}:{curr_time.minute}:{curr_time.second}' 168 | 169 | def angle_between_quats(q1, q2): 170 | """Angle between two quaternions""" 171 | return 2 * np.arccos(np.clip(np.abs(np.dot(q1, q2)), -1, 1)) 172 | 173 | def filter_points_by_bounds(points, bounds_min, bounds_max, strict=True): 174 | """ 175 | Filter points by taking only points within workspace bounds. 176 | """ 177 | assert points.shape[1] == 3, "points must be (N, 3)" 178 | bounds_min = bounds_min.copy() 179 | bounds_max = bounds_max.copy() 180 | if not strict: 181 | bounds_min[:2] = bounds_min[:2] - 0.1 * (bounds_max[:2] - bounds_min[:2]) 182 | bounds_max[:2] = bounds_max[:2] + 0.1 * (bounds_max[:2] - bounds_min[:2]) 183 | bounds_min[2] = bounds_min[2] - 0.1 * (bounds_max[2] - bounds_min[2]) 184 | within_bounds_mask = ( 185 | (points[:, 0] >= bounds_min[0]) 186 | & (points[:, 0] <= bounds_max[0]) 187 | & (points[:, 1] >= bounds_min[1]) 188 | & (points[:, 1] <= bounds_max[1]) 189 | & (points[:, 2] >= bounds_min[2]) 190 | & (points[:, 2] <= bounds_max[2]) 191 | ) 192 | return within_bounds_mask 193 | 194 | def print_opt_debug_dict(debug_dict): 195 | print('\n' + '#' * 40) 196 | print(f'# Optimization debug info:') 197 | max_key_length = max(len(str(k)) for k in debug_dict.keys()) 198 | for k, v in debug_dict.items(): 199 | if isinstance(v, int) or isinstance(v, float): 200 | print(f'# {k:<{max_key_length}}: {v:.05f}') 201 | elif isinstance(v, list) and all(isinstance(x, int) or isinstance(x, float) for x in v): 202 | print(f'# {k:<{max_key_length}}: {np.array(v).round(5)}') 203 | else: 204 | print(f'# {k:<{max_key_length}}: {v}') 205 | print('#' * 40 + '\n') 206 | 207 | def merge_dicts(dicts): 208 | return { 209 | k : v 210 | for d in dicts 211 | for k, v in d.items() 212 | } 213 | 214 | def exec_safe(code_str, gvars=None, lvars=None): 215 | banned_phrases = ['import', '__'] 216 | for phrase in banned_phrases: 217 | assert phrase not in code_str 218 | 219 | if gvars is None: 220 | gvars = {} 221 | if lvars is None: 222 | lvars = {} 223 | empty_fn = lambda *args, **kwargs: None 224 | custom_gvars = merge_dicts([ 225 | gvars, 226 | {'exec': empty_fn, 'eval': empty_fn} 227 | ]) 228 | try: 229 | exec(code_str, custom_gvars, lvars) 230 | except Exception as e: 231 | print(f'Error executing code:\n{code_str}') 232 | raise e 233 | 234 | def load_functions_from_txt(txt_path, get_grasping_cost_fn): 235 | if txt_path is None: 236 | return [] 237 | # load txt file 238 | with open(txt_path, 'r') as f: 239 | functions_text = f.read() 240 | # execute functions 241 | gvars_dict = { 242 | 'np': np, 243 | 'get_grasping_cost_by_keypoint_idx': get_grasping_cost_fn, 244 | } # external library APIs 245 | lvars_dict = dict() 246 | exec_safe(functions_text, gvars=gvars_dict, lvars=lvars_dict) 247 | return 
list(lvars_dict.values()) 248 | 249 | @njit(cache=True, fastmath=True) 250 | def angle_between_rotmat(P, Q): 251 | R = np.dot(P, Q.T) 252 | cos_theta = (np.trace(R)-1)/2 253 | if cos_theta > 1: 254 | cos_theta = 1 255 | elif cos_theta < -1: 256 | cos_theta = -1 257 | return np.arccos(cos_theta) 258 | 259 | def fit_b_spline(control_points): 260 | # determine appropriate k 261 | k = min(3, control_points.shape[0]-1) 262 | spline = interpolate.splprep(control_points.T, s=0, k=k) 263 | return spline 264 | 265 | def sample_from_spline(spline, num_samples): 266 | sample_points = np.linspace(0, 1, num_samples) 267 | if isinstance(spline, RotationSpline): 268 | samples = spline(sample_points).as_matrix() # [num_samples, 3, 3] 269 | else: 270 | assert isinstance(spline, tuple) and len(spline) == 2, 'spline must be a tuple of (tck, u)' 271 | tck, u = spline 272 | samples = interpolate.splev(np.linspace(0, 1, num_samples), tck) # [spline_dim, num_samples] 273 | samples = np.array(samples).T # [num_samples, spline_dim] 274 | return samples 275 | 276 | def linear_interpolate_poses(start_pose, end_pose, num_poses): 277 | """ 278 | Interpolate between start and end pose. 279 | """ 280 | assert num_poses >= 2, 'num_poses must be at least 2' 281 | if start_pose.shape == (6,) and end_pose.shape == (6,): 282 | start_pos, start_euler = start_pose[:3], start_pose[3:] 283 | end_pos, end_euler = end_pose[:3], end_pose[3:] 284 | start_rotmat = T.euler2mat(start_euler) 285 | end_rotmat = T.euler2mat(end_euler) 286 | elif start_pose.shape == (4, 4) and end_pose.shape == (4, 4): 287 | start_pos = start_pose[:3, 3] 288 | start_rotmat = start_pose[:3, :3] 289 | end_pos = end_pose[:3, 3] 290 | end_rotmat = end_pose[:3, :3] 291 | elif start_pose.shape == (7,) and end_pose.shape == (7,): 292 | start_pos, start_quat = start_pose[:3], start_pose[3:] 293 | start_rotmat = T.quat2mat(start_quat) 294 | end_pos, end_quat = end_pose[:3], end_pose[3:] 295 | end_rotmat = T.quat2mat(end_quat) 296 | else: 297 | raise ValueError('start_pose and end_pose not recognized') 298 | slerp = Slerp([0, 1], R.from_matrix([start_rotmat, end_rotmat])) 299 | poses = [] 300 | for i in range(num_poses): 301 | alpha = i / (num_poses - 1) 302 | pos = start_pos * (1 - alpha) + end_pos * alpha 303 | rotmat = slerp(alpha).as_matrix() 304 | if start_pose.shape == (6,): 305 | euler = T.mat2euler(rotmat) 306 | poses.append(np.concatenate([pos, euler])) 307 | elif start_pose.shape == (4, 4): 308 | pose = np.eye(4) 309 | pose[:3, :3] = rotmat 310 | pose[:3, 3] = pos 311 | poses.append(pose) 312 | elif start_pose.shape == (7,): 313 | quat = T.mat2quat(rotmat) 314 | pose = np.concatenate([pos, quat]) 315 | poses.append(pose) 316 | return np.array(poses) 317 | 318 | def spline_interpolate_poses(control_points, num_steps): 319 | """ 320 | Interpolate between through the control points using spline interpolation. 321 | 1. Fit a b-spline through the positional terms of the control points. 322 | 2. Fit a RotationSpline through the rotational terms of the control points. 323 | 3. Sample the b-spline and RotationSpline at num_steps. 
324 | 325 | Args: 326 | control_points: [N, 6] position + euler or [N, 4, 4] pose or [N, 7] position + quat 327 | num_steps: number of poses to interpolate 328 | Returns: 329 | poses: [num_steps, 6] position + euler or [num_steps, 4, 4] pose or [num_steps, 7] position + quat 330 | """ 331 | assert num_steps >= 2, 'num_steps must be at least 2' 332 | if isinstance(control_points, list): 333 | control_points = np.array(control_points) 334 | if control_points.shape[1] == 6: 335 | control_points_pos = control_points[:, :3] # [N, 3] 336 | control_points_euler = control_points[:, 3:] # [N, 3] 337 | control_points_rotmat = [] 338 | for control_point_euler in control_points_euler: 339 | control_points_rotmat.append(T.euler2mat(control_point_euler)) 340 | control_points_rotmat = np.array(control_points_rotmat) # [N, 3, 3] 341 | elif control_points.shape[1] == 4 and control_points.shape[2] == 4: 342 | control_points_pos = control_points[:, :3, 3] # [N, 3] 343 | control_points_rotmat = control_points[:, :3, :3] # [N, 3, 3] 344 | elif control_points.shape[1] == 7: 345 | control_points_pos = control_points[:, :3] 346 | control_points_rotmat = [] 347 | for control_point_quat in control_points[:, 3:]: 348 | control_points_rotmat.append(T.quat2mat(control_point_quat)) 349 | control_points_rotmat = np.array(control_points_rotmat) 350 | else: 351 | raise ValueError('control_points not recognized') 352 | # remove the duplicate points (threshold 1e-3) 353 | diff = np.linalg.norm(np.diff(control_points_pos, axis=0), axis=1) 354 | mask = diff > 1e-3 355 | # always keep the first and last points 356 | mask = np.concatenate([[True], mask[:-1], [True]]) 357 | control_points_pos = control_points_pos[mask] 358 | control_points_rotmat = control_points_rotmat[mask] 359 | # fit b-spline through positional terms control points 360 | pos_spline = fit_b_spline(control_points_pos) 361 | # fit RotationSpline through rotational terms control points 362 | times = pos_spline[1] 363 | rotations = R.from_matrix(control_points_rotmat) 364 | rot_spline = RotationSpline(times, rotations) 365 | # sample from the splines 366 | pos_samples = sample_from_spline(pos_spline, num_steps) # [num_steps, 3] 367 | rot_samples = sample_from_spline(rot_spline, num_steps) # [num_steps, 3, 3] 368 | if control_points.shape[1] == 6: 369 | poses = [] 370 | for i in range(num_steps): 371 | pose = np.concatenate([pos_samples[i], T.mat2euler(rot_samples[i])]) 372 | poses.append(pose) 373 | poses = np.array(poses) 374 | elif control_points.shape[1] == 4 and control_points.shape[2] == 4: 375 | poses = np.empty((num_steps, 4, 4)) 376 | poses[:, :3, :3] = rot_samples 377 | poses[:, :3, 3] = pos_samples 378 | poses[:, 3, 3] = 1 379 | elif control_points.shape[1] == 7: 380 | poses = np.empty((num_steps, 7)) 381 | for i in range(num_steps): 382 | quat = T.mat2quat(rot_samples[i]) 383 | pose = np.concatenate([pos_samples[i], quat]) 384 | poses[i] = pose 385 | return poses 386 | 387 | def get_linear_interpolation_steps(start_pose, end_pose, pos_step_size, rot_step_size): 388 | """ 389 | Given start and end pose, calculate the number of steps to interpolate between them. 
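    Note (editor's addition, restating the computation in the function body): the returned count is
        num_path_poses = max(ceil(pos_diff / pos_step_size), ceil(rot_diff / rot_step_size), 2)
    where pos_diff is the Euclidean distance between the two positions and rot_diff is the relative
    rotation angle from angle_between_rotmat, so whichever of position or rotation requires the finer
    resolution determines the step count, and the start and end poses are always included.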
390 | Args: 391 | start_pose: [6] position + euler or [4, 4] pose or [7] position + quat 392 | end_pose: [6] position + euler or [4, 4] pose or [7] position + quat 393 | pos_step_size: position step size 394 | rot_step_size: rotation step size 395 | Returns: 396 | num_path_poses: number of poses to interpolate 397 | """ 398 | if start_pose.shape == (6,) and end_pose.shape == (6,): 399 | start_pos, start_euler = start_pose[:3], start_pose[3:] 400 | end_pos, end_euler = end_pose[:3], end_pose[3:] 401 | start_rotmat = T.euler2mat(start_euler) 402 | end_rotmat = T.euler2mat(end_euler) 403 | elif start_pose.shape == (4, 4) and end_pose.shape == (4, 4): 404 | start_pos = start_pose[:3, 3] 405 | start_rotmat = start_pose[:3, :3] 406 | end_pos = end_pose[:3, 3] 407 | end_rotmat = end_pose[:3, :3] 408 | elif start_pose.shape == (7,) and end_pose.shape == (7,): 409 | start_pos, start_quat = start_pose[:3], start_pose[3:] 410 | start_rotmat = T.quat2mat(start_quat) 411 | end_pos, end_quat = end_pose[:3], end_pose[3:] 412 | end_rotmat = T.quat2mat(end_quat) 413 | else: 414 | raise ValueError('start_pose and end_pose not recognized') 415 | pos_diff = np.linalg.norm(start_pos - end_pos) 416 | rot_diff = angle_between_rotmat(start_rotmat, end_rotmat) 417 | pos_num_steps = np.ceil(pos_diff / pos_step_size) 418 | rot_num_steps = np.ceil(rot_diff / rot_step_size) 419 | num_path_poses = int(max(pos_num_steps, rot_num_steps)) 420 | num_path_poses = max(num_path_poses, 2) # at least start and end poses 421 | return num_path_poses 422 | 423 | def farthest_point_sampling(pc, num_points): 424 | """ 425 | Given a point cloud, sample num_points points that are the farthest apart. 426 | Use o3d farthest point sampling. 427 | """ 428 | assert pc.ndim == 2 and pc.shape[1] == 3, "pc must be a (N, 3) numpy array" 429 | pcd = o3d.geometry.PointCloud() 430 | pcd.points = o3d.utility.Vector3dVector(pc) 431 | downpcd_farthest = pcd.farthest_point_down_sample(num_points) 432 | return np.asarray(downpcd_farthest.points) -------------------------------------------------------------------------------- /videos/2024-10-12-09-29-53.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-12-09-29-53.mp4 -------------------------------------------------------------------------------- /videos/2024-10-14-02-04-49.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-02-04-49.mp4 -------------------------------------------------------------------------------- /videos/2024-10-14-03-26-38.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-03-26-38.mp4 -------------------------------------------------------------------------------- /videos/2024-10-14-03-42-23.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-03-42-23.mp4 -------------------------------------------------------------------------------- /videos/2024-10-14-04-16-18.mp4: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-04-16-18.mp4 -------------------------------------------------------------------------------- /videos/2024-10-14-04-23-14.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-04-23-14.mp4 -------------------------------------------------------------------------------- /videos/2024-10-14-08-46-17.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/videos/2024-10-14-08-46-17.mp4 -------------------------------------------------------------------------------- /visualizer.py: -------------------------------------------------------------------------------- 1 | import open3d as o3d 2 | import numpy as np 3 | import matplotlib 4 | import cv2 5 | import transform_utils as T 6 | from utils import filter_points_by_bounds, batch_transform_points 7 | 8 | def add_to_visualize_buffer(visualize_buffer, visualize_points, visualize_colors): 9 | assert visualize_points.shape[0] == visualize_colors.shape[0], f'got {visualize_points.shape[0]} for points and {visualize_colors.shape[0]} for colors' 10 | if len(visualize_points) == 0: 11 | return 12 | assert visualize_points.shape[1] == 3 13 | assert visualize_colors.shape[1] == 3 14 | # assert visualize_colors.max() <= 1.0 and visualize_colors.min() >= 0.0 15 | visualize_buffer["points"].append(visualize_points) 16 | visualize_buffer["colors"].append(visualize_colors) 17 | 18 | def generate_nearby_points(point, num_points_per_side=5, half_range=0.005): 19 | if point.ndim == 1: 20 | offsets = np.linspace(-1, 1, num_points_per_side) 21 | offsets_meshgrid = np.meshgrid(offsets, offsets, offsets) 22 | offsets_array = np.stack(offsets_meshgrid, axis=-1).reshape(-1, 3) 23 | nearby_points = point + offsets_array * half_range 24 | return nearby_points.reshape(-1, 3) 25 | else: 26 | assert point.shape[1] == 3, "point must be (N, 3)" 27 | assert point.ndim == 2, "point must be (N, 3)" 28 | # vectorized version 29 | offsets = np.linspace(-1, 1, num_points_per_side) 30 | offsets_meshgrid = np.meshgrid(offsets, offsets, offsets) 31 | offsets_array = np.stack(offsets_meshgrid, axis=-1).reshape(-1, 3) 32 | nearby_points = point[:, None, :] + offsets_array 33 | return nearby_points 34 | 35 | class Visualizer: 36 | def __init__(self, config, env): 37 | self.config = config 38 | self.env = env 39 | self.bounds_min = np.array(self.config['bounds_min']) 40 | self.bounds_max = np.array(self.config['bounds_max']) 41 | self.color = np.array([0.05, 0.55, 0.26]) 42 | self.world2viewer = np.array([ 43 | [0.3788, 0.3569, -0.8539, 0.0], 44 | [0.9198, -0.0429, 0.3901, 0.0], 45 | [-0.1026, 0.9332, 0.3445, 0.0], 46 | [0.0, 0.0, 0.0, 1.0] 47 | ]).T 48 | 49 | def show_img(self, rgb): 50 | cv2.imshow('img', rgb[..., ::-1]) 51 | cv2.waitKey(0) 52 | print('showing image, click on the window and press "ESC" to close and continue') 53 | cv2.destroyAllWindows() 54 | 55 | def show_pointcloud(self, points, colors): 56 | # transform to viewer frame 57 | points = np.dot(points, self.world2viewer[:3, :3].T) + self.world2viewer[:3, 3] 58 | # clip color to [0, 1] 59 | colors = np.clip(colors, 0.0, 1.0) 60 | pcd = o3d.geometry.PointCloud() 61 | pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64)) # float64 is a lot faster 
than float32 when added to o3d later 62 | pcd.colors = o3d.utility.Vector3dVector(colors.astype(np.float64)) 63 | print('visualizing pointcloud, click on the window and press "ESC" to close and continue') 64 | o3d.visualization.draw_geometries([pcd]) 65 | 66 | def _get_scene_points_and_colors(self): 67 | # scene 68 | cam_obs = self.env.get_cam_obs() 69 | scene_points = [] 70 | scene_colors = [] 71 | for cam_id in range(len(cam_obs)): 72 | cam_points = cam_obs[cam_id]['points'].reshape(-1, 3) 73 | cam_colors = cam_obs[cam_id]['rgb'].reshape(-1, 3) / 255.0 74 | # clip to workspace 75 | within_workspace_mask = filter_points_by_bounds(cam_points, self.bounds_min, self.bounds_max, strict=False) 76 | cam_points = cam_points[within_workspace_mask] 77 | cam_colors = cam_colors[within_workspace_mask] 78 | scene_points.append(cam_points) 79 | scene_colors.append(cam_colors) 80 | scene_points = np.concatenate(scene_points, axis=0) 81 | scene_colors = np.concatenate(scene_colors, axis=0) 82 | return scene_points, scene_colors 83 | 84 | def visualize_subgoal(self, subgoal_pose): 85 | visualize_buffer = { 86 | "points": [], 87 | "colors": [] 88 | } 89 | # scene 90 | scene_points, scene_colors = self._get_scene_points_and_colors() 91 | add_to_visualize_buffer(visualize_buffer, scene_points, scene_colors) 92 | subgoal_pose_homo = T.convert_pose_quat2mat(subgoal_pose) 93 | # subgoal 94 | collision_points = self.env.get_collision_points(noise=False) 95 | # transform collision points to the subgoal frame 96 | ee_pose = self.env.get_ee_pose() 97 | ee_pose_homo = T.convert_pose_quat2mat(ee_pose) 98 | centering_transform = np.linalg.inv(ee_pose_homo) 99 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3] 100 | transformed_collision_points = batch_transform_points(collision_points_centered, subgoal_pose_homo[None]).reshape(-1, 3) 101 | collision_points_colors = np.array([self.color] * len(collision_points)) 102 | add_to_visualize_buffer(visualize_buffer, transformed_collision_points, collision_points_colors) 103 | # add keypoints 104 | keypoints = self.env.get_keypoint_positions() 105 | num_keypoints = keypoints.shape[0] 106 | color_map = matplotlib.colormaps["gist_rainbow"] 107 | keypoints_colors = [color_map(i / num_keypoints)[:3] for i in range(num_keypoints)] 108 | for i in range(num_keypoints): 109 | nearby_points = generate_nearby_points(keypoints[i], num_points_per_side=6, half_range=0.009) 110 | nearby_colors = np.tile(keypoints_colors[i], (nearby_points.shape[0], 1)) 111 | nearby_colors = 0.5 * nearby_colors + 0.5 * np.array([1, 1, 1]) 112 | add_to_visualize_buffer(visualize_buffer, nearby_points, nearby_colors) 113 | # visualize 114 | visualize_points = np.concatenate(visualize_buffer["points"], axis=0) 115 | visualize_colors = np.concatenate(visualize_buffer["colors"], axis=0) 116 | self.show_pointcloud(visualize_points, visualize_colors) 117 | 118 | def visualize_path(self, path): 119 | visualize_buffer = { 120 | "points": [], 121 | "colors": [] 122 | } 123 | # scene 124 | scene_points, scene_colors = self._get_scene_points_and_colors() 125 | add_to_visualize_buffer(visualize_buffer, scene_points, scene_colors) 126 | # draw curve based on poses 127 | for t in range(len(path) - 1): 128 | start = path[t][:3] 129 | end = path[t + 1][:3] 130 | num_interp_points = int(np.linalg.norm(start - end) / 0.0002) 131 | interp_points = np.linspace(start, end, num_interp_points) 132 | interp_colors = np.tile([0.0, 0.0, 0.0], (num_interp_points, 1)) 133 
| # add a tint of white (the higher the j, the more white) 134 | whitening_coef = 0.3 + 0.5 * (t / len(path)) 135 | interp_colors = (1 - whitening_coef) * interp_colors + whitening_coef * np.array([1, 1, 1]) 136 | add_to_visualize_buffer(visualize_buffer, interp_points, interp_colors) 137 | # subsample path with a fixed step size 138 | step_size = 0.05 139 | subpath = [path[0]] 140 | for i in range(1, len(path) - 1): 141 | dist = np.linalg.norm(np.array(path[i][:3]) - np.array(subpath[-1][:3])) 142 | if dist > step_size: 143 | subpath.append(path[i]) 144 | subpath.append(path[-1]) 145 | path = np.array(subpath) 146 | path_length = path.shape[0] 147 | # path points 148 | collision_points = self.env.get_collision_points(noise=False) 149 | num_points = collision_points.shape[0] 150 | start_pose = self.env.get_ee_pose() 151 | centering_transform = np.linalg.inv(T.convert_pose_quat2mat(start_pose)) 152 | collision_points_centered = np.dot(collision_points, centering_transform[:3, :3].T) + centering_transform[:3, 3] 153 | poses_homo = T.convert_pose_quat2mat(path[:, :7]) # the last number is gripper action 154 | transformed_collision_points = batch_transform_points(collision_points_centered, poses_homo).reshape(-1, 3) # (num_poses, num_points, 3) 155 | # calculate color based on the timestep 156 | collision_points_colors = np.ones([path_length, num_points, 3]) * self.color[None, None] 157 | for t in range(path_length): 158 | whitening_coef = 0.3 + 0.5 * (t / path_length) 159 | collision_points_colors[t] = (1 - whitening_coef) * collision_points_colors[t] + whitening_coef * np.array([1, 1, 1]) 160 | collision_points_colors = collision_points_colors.reshape(-1, 3) 161 | add_to_visualize_buffer(visualize_buffer, transformed_collision_points, collision_points_colors) 162 | # keypoints 163 | keypoints = self.env.get_keypoint_positions() 164 | num_keypoints = keypoints.shape[0] 165 | color_map = matplotlib.colormaps["gist_rainbow"] 166 | keypoints_colors = [color_map(i / num_keypoints)[:3] for i in range(num_keypoints)] 167 | for i in range(num_keypoints): 168 | nearby_points = generate_nearby_points(keypoints[i], num_points_per_side=6, half_range=0.009) 169 | nearby_colors = np.tile(keypoints_colors[i], (nearby_points.shape[0], 1)) 170 | nearby_colors = 0.5 * nearby_colors + 0.5 * np.array([1, 1, 1]) 171 | add_to_visualize_buffer(visualize_buffer, nearby_points, nearby_colors) 172 | # visualize 173 | visualize_points = np.concatenate(visualize_buffer["points"], axis=0) 174 | visualize_colors = np.concatenate(visualize_buffer["colors"], axis=0) 175 | self.show_pointcloud(visualize_points, visualize_colors) -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-17-50_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/prompt.txt: -------------------------------------------------------------------------------- 1 | ## Instructions 2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps: 3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. 
Some examples: 4 | - "pouring tea from teapot": 5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid" 6 | - "put red block on top of blue block": 7 | - 3 stages: "grasp red block", "drop the red block on top of blue block" 8 | - "reorient bouquet and drop it upright into vase": 9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase" 10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples: 11 | - "pouring liquid from teapot": 12 | - "grasp teapot" stage: 13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle" 14 | - 0 path constraints 15 | - "align teapot with cup opening" stage: 16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening" 17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling" 18 | - "pour liquid" stage: 19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid" 20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening" 21 | - "put red block on top of blue block": 22 | - "grasp red block" stage: 23 | - 1 sub-goal constraints: "align the end-effector with the red block" 24 | - 0 path constraints 25 | - "drop the red block on top of blue block" stage: 26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block" 27 | - 1 path constraints: "the robot must still be grasping the red block" 28 | - "eorient bouquet and drop it upright into vase": 29 | - "grasp bouquet" stage: 30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem" 31 | - 0 path constraints 32 | - "reorient bouquet" stage: 33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)" 34 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 35 | - "keep upright and drop into vase" stage: 36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening" 37 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable. 39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable. 40 | 41 | **Note:** 42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero. 43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints. 44 | - Avoid using "if" statements in your constraints. 45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels). 46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task. 47 | - Inputs to the constraints are as follows: 48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position. 49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions. 
50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint. 51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function. 52 | - For grasping stage, you should only write one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed. 53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages. 54 | - The robot can only grasp one object at a time. 55 | - Grasping must be an independent stage from other stages. 56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis). 57 | - You may use multiple keypoints to specify a surface or volume. 58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array. 59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j and then calculate the norm of the xyz coordinates of the keypoint `i` and the `offsetted_point`. 60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints). 61 | 62 | **Structure your output in a single python code block as follows:** 63 | ```python 64 | 65 | # Your explanation of how many stages are involved in the task and what each stage is about. 66 | # ... 67 | 68 | num_stages = ? 69 | 70 | ### stage 1 sub-goal constraints (if any) 71 | def stage1_subgoal_constraint1(end_effector, keypoints): 72 | """Put your explanation here.""" 73 | ... 74 | return cost 75 | # Add more sub-goal constraints if needed 76 | ... 77 | 78 | ### stage 1 path constraints (if any) 79 | def stage1_path_constraint1(end_effector, keypoints): 80 | """Put your explanation here.""" 81 | ... 82 | return cost 83 | # Add more path constraints if needed 84 | ... 85 | 86 | # repeat for more stages 87 | ... 88 | 89 | """ 90 | Summarize keypoints to be grasped in all grasping stages. 91 | The length of the list should be equal to the number of stages. 92 | For grapsing stage, write the keypoint index. For non-grasping stage, write -1. 93 | """ 94 | grasp_keypoints = [?, ..., ?] 95 | 96 | """ 97 | Summarize at **the end of which stage** the robot should release the keypoints. 98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released only if it has been grasped previously). 99 | Only release object when it's necessary to complete the task, e.g., drop bouquet in the vase. 100 | The length of the list should be equal to the number of stages. 101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1. 102 | """ 103 | release_keypoints = [?, ..., ?] 
104 | 105 | ``` 106 | 107 | ## Query 108 | Query Task: "reorient the red pen and drop it upright into the black pen holder" 109 | Query Image: -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-17-50_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/2024-10-14_08-17-50_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-24-59_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/prompt.txt: -------------------------------------------------------------------------------- 1 | ## Instructions 2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps: 3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples: 4 | - "pouring tea from teapot": 5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid" 6 | - "put red block on top of blue block": 7 | - 3 stages: "grasp red block", "drop the red block on top of blue block" 8 | - "reorient bouquet and drop it upright into vase": 9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase" 10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. 
Some examples: 11 | - "pouring liquid from teapot": 12 | - "grasp teapot" stage: 13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle" 14 | - 0 path constraints 15 | - "align teapot with cup opening" stage: 16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening" 17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling" 18 | - "pour liquid" stage: 19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid" 20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening" 21 | - "put red block on top of blue block": 22 | - "grasp red block" stage: 23 | - 1 sub-goal constraints: "align the end-effector with the red block" 24 | - 0 path constraints 25 | - "drop the red block on top of blue block" stage: 26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block" 27 | - 1 path constraints: "the robot must still be grasping the red block" 28 | - "eorient bouquet and drop it upright into vase": 29 | - "grasp bouquet" stage: 30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem" 31 | - 0 path constraints 32 | - "reorient bouquet" stage: 33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)" 34 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 35 | - "keep upright and drop into vase" stage: 36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening" 37 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable. 39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable. 40 | 41 | **Note:** 42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero. 43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints. 44 | - Avoid using "if" statements in your constraints. 45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels). 46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task. 47 | - Inputs to the constraints are as follows: 48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position. 49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions. 50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint. 51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function. 52 | - For grasping stage, you should only write one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed. 53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages. 54 | - The robot can only grasp one object at a time. 
55 | - Grasping must be an independent stage from other stages. 56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis). 57 | - You may use multiple keypoints to specify a surface or volume. 58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array. 59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j and then calculate the norm of the xyz coordinates of the keypoint `i` and the `offsetted_point`. 60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints). 61 | 62 | **Structure your output in a single python code block as follows:** 63 | ```python 64 | 65 | # Your explanation of how many stages are involved in the task and what each stage is about. 66 | # ... 67 | 68 | num_stages = ? 69 | 70 | ### stage 1 sub-goal constraints (if any) 71 | def stage1_subgoal_constraint1(end_effector, keypoints): 72 | """Put your explanation here.""" 73 | ... 74 | return cost 75 | # Add more sub-goal constraints if needed 76 | ... 77 | 78 | ### stage 1 path constraints (if any) 79 | def stage1_path_constraint1(end_effector, keypoints): 80 | """Put your explanation here.""" 81 | ... 82 | return cost 83 | # Add more path constraints if needed 84 | ... 85 | 86 | # repeat for more stages 87 | ... 88 | 89 | """ 90 | Summarize keypoints to be grasped in all grasping stages. 91 | The length of the list should be equal to the number of stages. 92 | For grapsing stage, write the keypoint index. For non-grasping stage, write -1. 93 | """ 94 | grasp_keypoints = [?, ..., ?] 95 | 96 | """ 97 | Summarize at **the end of which stage** the robot should release the keypoints. 98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released only if it has been grasped previously). 99 | Only release object when it's necessary to complete the task, e.g., drop bouquet in the vase. 100 | The length of the list should be equal to the number of stages. 101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1. 102 | """ 103 | release_keypoints = [?, ..., ?] 104 | 105 | ``` 106 | 107 | ## Query 108 | Query Task: "reorient the red pen and drop it upright into the black pen holder" 109 | Query Image: -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-24-59_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/2024-10-14_08-24-59_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-26-54_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/prompt.txt: -------------------------------------------------------------------------------- 1 | ## Instructions 2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. 
The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps: 3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples: 4 | - "pouring tea from teapot": 5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid" 6 | - "put red block on top of blue block": 7 | - 3 stages: "grasp red block", "drop the red block on top of blue block" 8 | - "reorient bouquet and drop it upright into vase": 9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase" 10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples: 11 | - "pouring liquid from teapot": 12 | - "grasp teapot" stage: 13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle" 14 | - 0 path constraints 15 | - "align teapot with cup opening" stage: 16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening" 17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling" 18 | - "pour liquid" stage: 19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid" 20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening" 21 | - "put red block on top of blue block": 22 | - "grasp red block" stage: 23 | - 1 sub-goal constraints: "align the end-effector with the red block" 24 | - 0 path constraints 25 | - "drop the red block on top of blue block" stage: 26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block" 27 | - 1 path constraints: "the robot must still be grasping the red block" 28 | - "eorient bouquet and drop it upright into vase": 29 | - "grasp bouquet" stage: 30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem" 31 | - 0 path constraints 32 | - "reorient bouquet" stage: 33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)" 34 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 35 | - "keep upright and drop into vase" stage: 36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening" 37 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable. 39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable. 40 | 41 | **Note:** 42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero. 43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints. 44 | - Avoid using "if" statements in your constraints. 45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels). 46 | - You do not need to consider collision avoidance. 
Focus on what is necessary to complete the task. 47 | - Inputs to the constraints are as follows: 48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position. 49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions. 50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint. 51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function. 52 | - For grasping stage, you should only write one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed. 53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages. 54 | - The robot can only grasp one object at a time. 55 | - Grasping must be an independent stage from other stages. 56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis). 57 | - You may use multiple keypoints to specify a surface or volume. 58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array. 59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j and then calculate the norm of the xyz coordinates of the keypoint `i` and the `offsetted_point`. 60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints). 61 | 62 | **Structure your output in a single python code block as follows:** 63 | ```python 64 | 65 | # Your explanation of how many stages are involved in the task and what each stage is about. 66 | # ... 67 | 68 | num_stages = ? 69 | 70 | ### stage 1 sub-goal constraints (if any) 71 | def stage1_subgoal_constraint1(end_effector, keypoints): 72 | """Put your explanation here.""" 73 | ... 74 | return cost 75 | # Add more sub-goal constraints if needed 76 | ... 77 | 78 | ### stage 1 path constraints (if any) 79 | def stage1_path_constraint1(end_effector, keypoints): 80 | """Put your explanation here.""" 81 | ... 82 | return cost 83 | # Add more path constraints if needed 84 | ... 85 | 86 | # repeat for more stages 87 | ... 88 | 89 | """ 90 | Summarize keypoints to be grasped in all grasping stages. 91 | The length of the list should be equal to the number of stages. 92 | For grapsing stage, write the keypoint index. For non-grasping stage, write -1. 93 | """ 94 | grasp_keypoints = [?, ..., ?] 95 | 96 | """ 97 | Summarize at **the end of which stage** the robot should release the keypoints. 98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released only if it has been grasped previously). 99 | Only release object when it's necessary to complete the task, e.g., drop bouquet in the vase. 100 | The length of the list should be equal to the number of stages. 101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1. 102 | """ 103 | release_keypoints = [?, ..., ?] 
104 | 105 | ``` 106 | 107 | ## Query 108 | Query Task: "reorient the red pen and drop it upright into the black pen holder" 109 | Query Image: -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-26-54_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/2024-10-14_08-26-54_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-39-29_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/prompt.txt: -------------------------------------------------------------------------------- 1 | ## Instructions 2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps: 3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples: 4 | - "pouring tea from teapot": 5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid" 6 | - "put red block on top of blue block": 7 | - 3 stages: "grasp red block", "drop the red block on top of blue block" 8 | - "reorient bouquet and drop it upright into vase": 9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase" 10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. 
Some examples: 11 | - "pouring liquid from teapot": 12 | - "grasp teapot" stage: 13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle" 14 | - 0 path constraints 15 | - "align teapot with cup opening" stage: 16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening" 17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling" 18 | - "pour liquid" stage: 19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid" 20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening" 21 | - "put red block on top of blue block": 22 | - "grasp red block" stage: 23 | - 1 sub-goal constraints: "align the end-effector with the red block" 24 | - 0 path constraints 25 | - "drop the red block on top of blue block" stage: 26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block" 27 | - 1 path constraints: "the robot must still be grasping the red block" 28 | - "eorient bouquet and drop it upright into vase": 29 | - "grasp bouquet" stage: 30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem" 31 | - 0 path constraints 32 | - "reorient bouquet" stage: 33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)" 34 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 35 | - "keep upright and drop into vase" stage: 36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening" 37 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable. 39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable. 40 | 41 | **Note:** 42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero. 43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints. 44 | - Avoid using "if" statements in your constraints. 45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels). 46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task. 47 | - Inputs to the constraints are as follows: 48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position. 49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions. 50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint. 51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function. 52 | - For grasping stage, you should only write one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed. 53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages. 54 | - The robot can only grasp one object at a time. 
55 | - Grasping must be an independent stage from other stages. 56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis). 57 | - You may use multiple keypoints to specify a surface or volume. 58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array. 59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j and then calculate the norm of the xyz coordinates of the keypoint `i` and the `offsetted_point`. 60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints). 61 | 62 | **Structure your output in a single python code block as follows:** 63 | ```python 64 | 65 | # Your explanation of how many stages are involved in the task and what each stage is about. 66 | # ... 67 | 68 | num_stages = ? 69 | 70 | ### stage 1 sub-goal constraints (if any) 71 | def stage1_subgoal_constraint1(end_effector, keypoints): 72 | """Put your explanation here.""" 73 | ... 74 | return cost 75 | # Add more sub-goal constraints if needed 76 | ... 77 | 78 | ### stage 1 path constraints (if any) 79 | def stage1_path_constraint1(end_effector, keypoints): 80 | """Put your explanation here.""" 81 | ... 82 | return cost 83 | # Add more path constraints if needed 84 | ... 85 | 86 | # repeat for more stages 87 | ... 88 | 89 | """ 90 | Summarize keypoints to be grasped in all grasping stages. 91 | The length of the list should be equal to the number of stages. 92 | For grapsing stage, write the keypoint index. For non-grasping stage, write -1. 93 | """ 94 | grasp_keypoints = [?, ..., ?] 95 | 96 | """ 97 | Summarize at **the end of which stage** the robot should release the keypoints. 98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released only if it has been grasped previously). 99 | Only release object when it's necessary to complete the task, e.g., drop bouquet in the vase. 100 | The length of the list should be equal to the number of stages. 101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1. 102 | """ 103 | release_keypoints = [?, ..., ?] 
104 | 105 | ``` 106 | 107 | ## Query 108 | Query Task: "reorient the red pen and drop it upright into the black pen holder" 109 | Query Image: -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-39-29_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/2024-10-14_08-39-29_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/metadata.json: -------------------------------------------------------------------------------- 1 | {"init_keypoint_positions": [[-0.3241568895701906, -0.26135710633859177, 0.8729118042800674], [-0.13032823512207833, -0.21448098577272112, 0.7109137537982642], [0.09161993232061272, -0.18090774877203772, 0.7072121106658631], [-0.1458522626378715, -0.1448651475409916, 0.7143060215345731], [-0.13071311749231349, -0.08189964351178469, 0.7158838367022394], [0.09358737688281993, -0.0534993640009152, 0.7095674875311597], [-0.2694544418102572, 0.08538554046925578, 0.7174613392959628], [-0.3588037694448534, 0.1895498661553155, 0.7242495959791785], [-0.22739163159972983, 0.16978317185973382, 0.809730735390318], [-0.2788102970880201, 0.2161113552597725, 0.7597332230457459]], "num_keypoints": 10, "num_stages": 3, "grasp_keypoints": [1, -1, -1], "release_keypoints": [-1, -1, 1]} -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/output_raw.txt: -------------------------------------------------------------------------------- 1 | ```python 2 | # The task involves 3 stages: 3 | # 1. Grasp the red pen. 4 | # 2. Reorient the red pen to be upright. 5 | # 3. Drop the red pen upright into the black pen holder. 6 | 7 | num_stages = 3 8 | 9 | ### stage 1 sub-goal constraints 10 | def stage1_subgoal_constraint1(end_effector, keypoints): 11 | """Align the end-effector with the red pen (keypoint 1).""" 12 | return np.linalg.norm(end_effector - keypoints[1]) 13 | 14 | ### stage 1 path constraints 15 | # No path constraints needed for grasping. 
16 | 17 | ### stage 2 sub-goal constraints 18 | def stage2_subgoal_constraint1(end_effector, keypoints): 19 | """The red pen is upright (parallel to the z-axis).""" 20 | pen_vector = keypoints[3] - keypoints[1] 21 | z_axis = np.array([0, 0, 1]) 22 | return np.arccos(np.dot(pen_vector, z_axis) / (np.linalg.norm(pen_vector) * np.linalg.norm(z_axis))) 23 | 24 | ### stage 2 path constraints 25 | def stage2_path_constraint1(end_effector, keypoints): 26 | """The robot must still be grasping the red pen.""" 27 | return get_grasping_cost_by_keypoint_idx(1) 28 | 29 | ### stage 3 sub-goal constraints 30 | def stage3_subgoal_constraint1(end_effector, keypoints): 31 | """The red pen must still stay upright (parallel to the z-axis).""" 32 | pen_vector = keypoints[3] - keypoints[1] 33 | z_axis = np.array([0, 0, 1]) 34 | return np.arccos(np.dot(pen_vector, z_axis) / (np.linalg.norm(pen_vector) * np.linalg.norm(z_axis))) 35 | 36 | def stage3_subgoal_constraint2(end_effector, keypoints): 37 | """The red pen is 10cm above the pen holder opening.""" 38 | pen_tip = keypoints[1] 39 | holder_center = (keypoints[6] + keypoints[7] + keypoints[8] + keypoints[9]) / 4 40 | offsetted_point = holder_center + np.array([0, 0, 0.1]) 41 | return np.linalg.norm(pen_tip - offsetted_point) 42 | 43 | ### stage 3 path constraints 44 | def stage3_path_constraint1(end_effector, keypoints): 45 | """The robot must still be grasping the red pen.""" 46 | return get_grasping_cost_by_keypoint_idx(1) 47 | 48 | """ 49 | Summarize keypoints to be grasped in all grasping stages. 50 | The length of the list should be equal to the number of stages. 51 | For grasping stage, write the keypoint index. For non-grasping stage, write -1. 52 | """ 53 | grasp_keypoints = [1, -1, -1] 54 | 55 | """ 56 | Summarize at **the end of which stage** the robot should release the keypoints. 57 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released only if it has been grasped previously). 58 | Only release object when it's necessary to complete the task, e.g., drop bouquet in the vase. 59 | The length of the list should be equal to the number of stages. 60 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1. 61 | """ 62 | release_keypoints = [-1, -1, 1] 63 | ``` -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/prompt.txt: -------------------------------------------------------------------------------- 1 | ## Instructions 2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps: 3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. 
Some examples: 4 | - "pouring tea from teapot": 5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid" 6 | - "put red block on top of blue block": 7 | - 2 stages: "grasp red block", "drop the red block on top of blue block" 8 | - "reorient bouquet and drop it upright into vase": 9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase" 10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples: 11 | - "pouring liquid from teapot": 12 | - "grasp teapot" stage: 13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle" 14 | - 0 path constraints 15 | - "align teapot with cup opening" stage: 16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening" 17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling" 18 | - "pour liquid" stage: 19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid" 20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening" 21 | - "put red block on top of blue block": 22 | - "grasp red block" stage: 23 | - 1 sub-goal constraints: "align the end-effector with the red block" 24 | - 0 path constraints 25 | - "drop the red block on top of blue block" stage: 26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block" 27 | - 1 path constraints: "the robot must still be grasping the red block" 28 | - "reorient bouquet and drop it upright into vase": 29 | - "grasp bouquet" stage: 30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem" 31 | - 0 path constraints 32 | - "reorient bouquet" stage: 33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)" 34 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 35 | - "keep upright and drop into vase" stage: 36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening" 37 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable. 39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable. 40 | 41 | **Note:** 42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero. 43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints. 44 | - Avoid using "if" statements in your constraints. 45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels). 46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task. 47 | - Inputs to the constraints are as follows: 48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position. 49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions.
50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint. 51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function. 52 | - For grasping stage, you should only write one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed. 53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages. 54 | - The robot can only grasp one object at a time. 55 | - Grasping must be an independent stage from other stages. 56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis). 57 | - You may use multiple keypoints to specify a surface or volume. 58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array. 59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j`, and then calculate the norm of the difference between the xyz coordinates of keypoint `i` and the `offsetted_point`. 60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints). 61 | 62 | **Structure your output in a single python code block as follows:** 63 | ```python 64 | 65 | # Your explanation of how many stages are involved in the task and what each stage is about. 66 | # ... 67 | 68 | num_stages = ? 69 | 70 | ### stage 1 sub-goal constraints (if any) 71 | def stage1_subgoal_constraint1(end_effector, keypoints): 72 | """Put your explanation here.""" 73 | ... 74 | return cost 75 | # Add more sub-goal constraints if needed 76 | ... 77 | 78 | ### stage 1 path constraints (if any) 79 | def stage1_path_constraint1(end_effector, keypoints): 80 | """Put your explanation here.""" 81 | ... 82 | return cost 83 | # Add more path constraints if needed 84 | ... 85 | 86 | # repeat for more stages 87 | ... 88 | 89 | """ 90 | Summarize keypoints to be grasped in all grasping stages. 91 | The length of the list should be equal to the number of stages. 92 | For grasping stage, write the keypoint index. For non-grasping stage, write -1. 93 | """ 94 | grasp_keypoints = [?, ..., ?] 95 | 96 | """ 97 | Summarize at **the end of which stage** the robot should release the keypoints. 98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released if it has been grasped previously). 99 | Only release an object when it's necessary to complete the task, e.g., dropping the bouquet into the vase. 100 | The length of the list should be equal to the number of stages. 101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1. 102 | """ 103 | release_keypoints = [?, ..., ?]
104 | 105 | ``` 106 | 107 | ## Query 108 | Query Task: "reorient the red pen and drop it upright into the black pen holder" 109 | Query Image: -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/query_img.png -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/stage1_subgoal_constraints.txt: -------------------------------------------------------------------------------- 1 | def stage1_subgoal_constraint1(end_effector, keypoints): 2 | """Align the end-effector with the red pen (keypoint 1).""" 3 | return np.linalg.norm(end_effector - keypoints[1]) 4 | 5 | -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/stage2_path_constraints.txt: -------------------------------------------------------------------------------- 1 | def stage2_path_constraint1(end_effector, keypoints): 2 | """The robot must still be grasping the red pen.""" 3 | return get_grasping_cost_by_keypoint_idx(1) 4 | 5 | -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/stage2_subgoal_constraints.txt: -------------------------------------------------------------------------------- 1 | def stage2_subgoal_constraint1(end_effector, keypoints): 2 | """The red pen is upright (parallel to the z-axis).""" 3 | pen_vector = keypoints[3] - keypoints[1] 4 | z_axis = np.array([0, 0, 1]) 5 | return np.arccos(np.dot(pen_vector, z_axis) / (np.linalg.norm(pen_vector) * np.linalg.norm(z_axis))) 6 | 7 | -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/stage3_path_constraints.txt: -------------------------------------------------------------------------------- 1 | def stage3_path_constraint1(end_effector, keypoints): 2 | """The robot must still be grasping the red pen.""" 3 | return get_grasping_cost_by_keypoint_idx(1) 4 | 5 | -------------------------------------------------------------------------------- /vlm_query/2024-10-14_08-44-42_reorient_the_red_pen_and_drop_it_upright_into_the_black_pen_holder/stage3_subgoal_constraints.txt: -------------------------------------------------------------------------------- 1 | def stage3_subgoal_constraint1(end_effector, keypoints): 2 | """The red pen must still stay upright (parallel to the z-axis).""" 3 | pen_vector = keypoints[3] - keypoints[1] 4 | z_axis = np.array([0, 0, 1]) 5 | return np.arccos(np.dot(pen_vector, z_axis) / (np.linalg.norm(pen_vector) * np.linalg.norm(z_axis))) 6 | 7 | def stage3_subgoal_constraint2(end_effector, keypoints): 8 | """The red pen is 10cm above the pen holder opening.""" 9 | pen_tip = keypoints[1] 10 | holder_center = (keypoints[6] + keypoints[7] + keypoints[8] + keypoints[9]) / 4 11 | offsetted_point = holder_center + np.array([0, 0, 0.1]) 12 | return 
np.linalg.norm(pen_tip - offsetted_point) 13 | 14 | -------------------------------------------------------------------------------- /vlm_query/pen/metadata.json: -------------------------------------------------------------------------------- 1 | {"init_keypoint_positions": [[-0.25794319243703623, -0.2360844662773828, 0.6905476882082092], [-0.2671639177617655, -0.09433527815132495, 0.7122930549896219], [-0.27151234338495983, 0.08395197554366488, 0.7154432798183405], [-0.266379743453929, 0.08175199572534772, 0.8097122583174985], [-0.3607951132392543, 0.18610731832223598, 0.7178293666264162], [-0.2624312374613168, 0.195909440226552, 0.6867906392498943], [-0.31782969861483357, 0.21783651249007482, 0.7905845242493158]], "num_keypoints": 7, "num_stages": 3, "grasp_keypoints": [1, -1, -1], "release_keypoints": [-1, -1, 1]} -------------------------------------------------------------------------------- /vlm_query/pen/query_img.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bailey-24/Rekep/09455fe2be793ed5b571ece5fb39b64f8f4073cb/vlm_query/pen/query_img.png -------------------------------------------------------------------------------- /vlm_query/pen/stage1_subgoal_constraints.txt: -------------------------------------------------------------------------------- 1 | def stage1_subgoal_constraint1(end_effector, keypoints): 2 | """Align the end-effector with the white pen's grasping point (keypoint 1).""" 3 | grasp_point = keypoints[1] 4 | cost = np.linalg.norm(end_effector - grasp_point) 5 | return cost 6 | 7 | -------------------------------------------------------------------------------- /vlm_query/pen/stage2_path_constraints.txt: -------------------------------------------------------------------------------- 1 | def stage2_path_constraint1(end_effector, keypoints): 2 | """The robot must still be grasping the white pen (keypoint 1).""" 3 | return get_grasping_cost_by_keypoint_idx(1) 4 | 5 | -------------------------------------------------------------------------------- /vlm_query/pen/stage2_subgoal_constraints.txt: -------------------------------------------------------------------------------- 1 | def stage2_subgoal_constraint1(end_effector, keypoints): 2 | """Ensure the white pen is upright by aligning the vector from keypoint 0 to keypoint 1 with the z-axis.""" 3 | pen_vector = keypoints[1] - keypoints[0] 4 | z_axis = np.array([0, 0, 1]) 5 | cost = np.linalg.norm(np.cross(pen_vector, z_axis)) 6 | return cost 7 | 8 | -------------------------------------------------------------------------------- /vlm_query/pen/stage3_path_constraints.txt: -------------------------------------------------------------------------------- 1 | def stage3_path_constraint1(end_effector, keypoints): 2 | """The robot must still be grasping the white pen (keypoint 1).""" 3 | return get_grasping_cost_by_keypoint_idx(1) 4 | 5 | -------------------------------------------------------------------------------- /vlm_query/pen/stage3_subgoal_constraints.txt: -------------------------------------------------------------------------------- 1 | def stage3_subgoal_constraint1(end_effector, keypoints): 2 | """Ensure the white pen is above the black pen holder opening (mean of keypoints 3, 4, 5, 6) and aligned with the z-axis.""" 3 | holder_opening = np.mean(keypoints[3:7], axis=0) 4 | above_holder = holder_opening + np.array([0, 0, 0.2]) # 20cm above the holder opening 5 | cost = np.linalg.norm(keypoints[1] - above_holder) 6 | return cost 7 | 8 | def 
stage3_subgoal_constraint2(end_effector, keypoints): 9 | """Ensure the white pen is upright by aligning the vector from keypoint 0 to keypoint 1 with the z-axis.""" 10 | pen_vector = keypoints[1] - keypoints[0] 11 | z_axis = np.array([0, 0, 1]) 12 | cost = np.linalg.norm(np.cross(pen_vector, z_axis)) 13 | return cost 14 | 15 | -------------------------------------------------------------------------------- /vlm_query/prompt_template.txt: -------------------------------------------------------------------------------- 1 | ## Instructions 2 | Suppose you are controlling a robot to perform manipulation tasks by writing constraint functions in Python. The manipulation task is given as an image of the environment, overlayed with keypoints marked with their indices, along with a text instruction. For each given task, please perform the following steps: 3 | - Determine how many stages are involved in the task. Grasping must be an independent stage. Some examples: 4 | - "pouring tea from teapot": 5 | - 3 stages: "grasp teapot", "align teapot with cup opening", and "pour liquid" 6 | - "put red block on top of blue block": 7 | - 2 stages: "grasp red block", "drop the red block on top of blue block" 8 | - "reorient bouquet and drop it upright into vase": 9 | - 3 stages: "grasp bouquet", "reorient bouquet", and "keep upright and drop into vase" 10 | - For each stage, write two kinds of constraints, "sub-goal constraints" and "path constraints". The "sub-goal constraints" are constraints that must be satisfied **at the end of the stage**, while the "path constraints" are constraints that must be satisfied **within the stage**. Some examples: 11 | - "pouring liquid from teapot": 12 | - "grasp teapot" stage: 13 | - 1 sub-goal constraints: "align the end-effector with the teapot handle" 14 | - 0 path constraints 15 | - "align teapot with cup opening" stage: 16 | - 1 sub-goal constraints: "the teapot spout needs to be 10cm above the cup opening" 17 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot must stay upright to avoid spilling" 18 | - "pour liquid" stage: 19 | - 2 sub-goal constraints: "the teapot spout needs to be 5cm above the cup opening", "the teapot spout must be tilted to pour liquid" 20 | - 2 path constraints: "the robot must still be grasping the teapot handle", "the teapot spout is directly above the cup opening" 21 | - "put red block on top of blue block": 22 | - "grasp red block" stage: 23 | - 1 sub-goal constraints: "align the end-effector with the red block" 24 | - 0 path constraints 25 | - "drop the red block on top of blue block" stage: 26 | - 1 sub-goal constraints: "the red block is 10cm on top of the blue block" 27 | - 1 path constraints: "the robot must still be grasping the red block" 28 | - "reorient bouquet and drop it upright into vase": 29 | - "grasp bouquet" stage: 30 | - 1 sub-goal constraints: "align the end-effector with the bouquet stem" 31 | - 0 path constraints 32 | - "reorient bouquet" stage: 33 | - 1 sub-goal constraints: "the bouquet is upright (parallel to the z-axis)" 34 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 35 | - "keep upright and drop into vase" stage: 36 | - 2 sub-goal constraints: "the bouquet must still stay upright (parallel to the z-axis)", "the bouquet is 20cm above the vase opening" 37 | - 1 path constraints: "the robot must still be grasping the bouquet stem" 38 | - Summarize keypoints to be grasped in all grasping stages by defining the `grasp_keypoints` variable.
39 | - Summarize at the end of which stage the robot should release the keypoints by defining the `release_keypoints` variable. 40 | 41 | **Note:** 42 | - Each constraint takes a dummy end-effector point and a set of keypoints as input and returns a numerical cost, where the constraint is satisfied if the cost is smaller than or equal to zero. 43 | - For each stage, you may write 0 or more sub-goal constraints and 0 or more path constraints. 44 | - Avoid using "if" statements in your constraints. 45 | - Avoid using path constraints when manipulating deformable objects (e.g., clothing, towels). 46 | - You do not need to consider collision avoidance. Focus on what is necessary to complete the task. 47 | - Inputs to the constraints are as follows: 48 | - `end_effector`: np.array of shape `(3,)` representing the end-effector position. 49 | - `keypoints`: np.array of shape `(K, 3)` representing the keypoint positions. 50 | - For any path constraint that requires the robot to be still grasping a keypoint `i`, you may use the provided function `get_grasping_cost_by_keypoint_idx` by calling `return get_grasping_cost_by_keypoint_idx(i)` where `i` is the index of the keypoint. 51 | - Inside of each function, you may use native Python functions, any NumPy functions, and the provided `get_grasping_cost_by_keypoint_idx` function. 52 | - For grasping stage, you should only write one sub-goal constraint that associates the end-effector with a keypoint. No path constraints are needed. 53 | - In order to move a keypoint, its associated object must be grasped in one of the previous stages. 54 | - The robot can only grasp one object at a time. 55 | - Grasping must be an independent stage from other stages. 56 | - You may use two keypoints to form a vector, which can be used to specify a rotation (by specifying the angle between the vector and a fixed axis). 57 | - You may use multiple keypoints to specify a surface or volume. 58 | - The keypoints marked on the image start with index 0, same as the given argument `keypoints` array. 59 | - For a point `i` to be relative to another point `j`, the function should define an `offsetted_point` variable that has the delta added to keypoint `j`, and then calculate the norm of the difference between the xyz coordinates of keypoint `i` and the `offsetted_point`. 60 | - If you would like to specify a location not marked by a keypoint, try using multiple keypoints to specify the location (e.g., you may take the mean of multiple keypoints if the desired location is in the center of those keypoints). 61 | 62 | **Structure your output in a single python code block as follows:** 63 | ```python 64 | 65 | # Your explanation of how many stages are involved in the task and what each stage is about. 66 | # ... 67 | 68 | num_stages = ? 69 | 70 | ### stage 1 sub-goal constraints (if any) 71 | def stage1_subgoal_constraint1(end_effector, keypoints): 72 | """Put your explanation here.""" 73 | ... 74 | return cost 75 | # Add more sub-goal constraints if needed 76 | ... 77 | 78 | ### stage 1 path constraints (if any) 79 | def stage1_path_constraint1(end_effector, keypoints): 80 | """Put your explanation here.""" 81 | ... 82 | return cost 83 | # Add more path constraints if needed 84 | ... 85 | 86 | # repeat for more stages 87 | ... 88 | 89 | """ 90 | Summarize keypoints to be grasped in all grasping stages. 91 | The length of the list should be equal to the number of stages. 92 | For grasping stage, write the keypoint index. For non-grasping stage, write -1. 93 | """ 94 | grasp_keypoints = [?, ..., ?]
95 | 96 | """ 97 | Summarize at **the end of which stage** the robot should release the keypoints. 98 | The keypoint indices must appear in an earlier stage as defined in `grasp_keypoints` (i.e., a keypoint can only be released if it has been grasped previously). 99 | Only release an object when it's necessary to complete the task, e.g., dropping the bouquet into the vase. 100 | The length of the list should be equal to the number of stages. 101 | If a keypoint is to be released at the end of a stage, write the keypoint index at the corresponding location. Otherwise, write -1. 102 | """ 103 | release_keypoints = [?, ..., ?] 104 | 105 | ``` 106 | 107 | ## Query 108 | Query Task: "{instruction}" 109 | Query Image: --------------------------------------------------------------------------------
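
The prompt template above fixes a simple numerical interface: every `stageN_subgoal_constraintM` / `stageN_path_constraintM` function maps `(end_effector, keypoints)` to a scalar cost that counts as satisfied when it is smaller than or equal to zero, and the cached query directories store both the generated constraint snippets and the keypoint positions they were written against (`metadata.json`). The sketch below is a minimal, hypothetical way to exercise that interface offline against the cached `vlm_query/pen/` files; it is not part of the repository, and the `exec`-based loader, the `QUERY_DIR` constant, and the stubbed `get_grasping_cost_by_keypoint_idx` are assumptions made only for illustration (the repository's own loading logic in `main.py` / `constraint_generation.py` may differ).

```python
import json
from pathlib import Path

import numpy as np

# Hypothetical constant for illustration only; run from the repository root.
QUERY_DIR = Path("vlm_query/pen")


def get_grasping_cost_by_keypoint_idx(i):
    """Stub for the simulator-provided grasp check; <= 0 would mean keypoint `i` is still grasped."""
    return 0.0


def load_constraint_fns(path):
    """Exec one cached stage*_constraints.txt snippet and return the constraint functions it defines."""
    namespace = {"np": np, "get_grasping_cost_by_keypoint_idx": get_grasping_cost_by_keypoint_idx}
    exec(path.read_text(), namespace)
    return [fn for name, fn in namespace.items() if name.startswith("stage") and callable(fn)]


metadata = json.loads((QUERY_DIR / "metadata.json").read_text())
keypoints = np.array(metadata["init_keypoint_positions"])  # shape (K, 3), from the cached query
end_effector = keypoints[metadata["grasp_keypoints"][0]]   # pretend the gripper sits at the grasp keypoint

for txt_file in sorted(QUERY_DIR.glob("stage*_constraints.txt")):
    for fn in load_constraint_fns(txt_file):
        cost = float(fn(end_effector, keypoints))
        verdict = "satisfied" if cost <= 0 else "not yet satisfied"
        print(f"{txt_file.name}: {fn.__name__} -> cost={cost:.4f} ({verdict})")
```

Note that distance-style costs such as the `np.linalg.norm` sub-goal constraints only reach zero exactly at the target, so a solver would typically treat them as objectives to drive toward zero rather than expecting strictly negative values.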