├── .gitattributes ├── dexhand_env ├── __init__.py ├── tasks │ ├── __init__.py │ ├── base │ │ └── __init__.py │ ├── README.md │ └── base_task.py ├── utils │ ├── __init__.py │ ├── coordinate_transforms.py │ ├── config_utils.py │ └── experiment_manager.py ├── components │ ├── __init__.py │ ├── physics │ │ └── __init__.py │ ├── reset │ │ └── __init__.py │ ├── reward │ │ └── __init__.py │ ├── observation │ │ └── __init__.py │ ├── termination │ │ └── __init__.py │ ├── initialization │ │ └── __init__.py │ ├── graphics │ │ ├── __init__.py │ │ └── video │ │ │ └── __init__.py │ └── action │ │ ├── __init__.py │ │ ├── scaling.py │ │ └── default_rules.py ├── cfg │ ├── test_viewer.yaml │ ├── train_headless.yaml │ ├── base │ │ ├── debug.yaml │ │ ├── test.yaml │ │ └── video.yaml │ ├── debug.yaml │ ├── physics │ │ ├── accurate.yaml │ │ ├── fast.yaml │ │ ├── default.yaml │ │ └── README.md │ ├── test_record.yaml │ ├── test_stream.yaml │ ├── test_script.yaml │ ├── train │ │ └── BaseTaskPPO.yaml │ └── config.yaml ├── rl │ └── __init__.py ├── constants.py ├── README.md └── factory.py ├── assets ├── mjcf │ ├── open_ai_assets │ │ ├── stls │ │ │ ├── .get │ │ │ ├── hand │ │ │ │ ├── F1.stl │ │ │ │ ├── F2.stl │ │ │ │ ├── F3.stl │ │ │ │ ├── TH1_z.stl │ │ │ │ ├── TH2_z.stl │ │ │ │ ├── TH3_z.stl │ │ │ │ ├── palm.stl │ │ │ │ ├── wrist.stl │ │ │ │ ├── knuckle.stl │ │ │ │ ├── lfmetacarpal.stl │ │ │ │ ├── forearm_electric.stl │ │ │ │ └── forearm_electric_cvx.stl │ │ │ └── fetch │ │ │ │ ├── estop_link.stl │ │ │ │ ├── laser_link.stl │ │ │ │ ├── gripper_link.stl │ │ │ │ ├── torso_fixed_link.stl │ │ │ │ ├── base_link_collision.stl │ │ │ │ ├── bellows_link_collision.stl │ │ │ │ ├── l_wheel_link_collision.stl │ │ │ │ ├── r_wheel_link_collision.stl │ │ │ │ ├── elbow_flex_link_collision.stl │ │ │ │ ├── head_pan_link_collision.stl │ │ │ │ ├── head_tilt_link_collision.stl │ │ │ │ ├── torso_lift_link_collision.stl │ │ │ │ ├── wrist_flex_link_collision.stl │ │ │ │ ├── wrist_roll_link_collision.stl │ │ │ │ ├── forearm_roll_link_collision.stl │ │ │ │ ├── shoulder_pan_link_collision.stl │ │ │ │ ├── shoulder_lift_link_collision.stl │ │ │ │ └── upperarm_roll_link_collision.stl │ │ ├── textures │ │ │ ├── block.png │ │ │ └── block_hidden.png │ │ ├── hand │ │ │ ├── shadow_hand.xml │ │ │ ├── egg.xml │ │ │ ├── pen.xml │ │ │ ├── reach.xml │ │ │ ├── shared_asset.xml │ │ │ ├── manipulate_pen.xml │ │ │ ├── manipulate_pen_touch_sensors.xml │ │ │ ├── manipulate_egg.xml │ │ │ ├── manipulate_block.xml │ │ │ ├── manipulate_egg_touch_sensors.xml │ │ │ └── manipulate_block_touch_sensors.xml │ │ └── fetch │ │ │ ├── reach.xml │ │ │ ├── push.xml │ │ │ ├── slide.xml │ │ │ ├── pick_and_place.xml │ │ │ └── shared.xml │ └── nv_ant.xml └── README.md ├── prompts ├── feat-110-domain-randomization.md ├── doc-004-training.md ├── meta-001-programming-guideline.md ├── fix-009-config-consistency.md ├── meta-003-precommit.md ├── doc-000-cp.md ├── refactor-003-imports.md ├── fix-008-termination-reason-logging.md ├── feat-300-simulator-backend.md ├── refactor-007-blind-grasping.md ├── refactor-007-step-architecture.md ├── rl-001-blind-grasping-task.md ├── perf-000-physics-speed.md ├── doc-001-video.md ├── meta-002-backward-compatibility.md ├── feat-200-task-support.md ├── feat-100-bimanual.md ├── meta-000-workflow-setup.md ├── meta-004-docs.md ├── fix-000-tb-metrics.md ├── TEMPLATE.md ├── refactor-001-episode-length.md ├── fix-001-reward-logging-logic.md ├── fix-010-max-deltas.md ├── fix-006-metadata-keys.md ├── refactor-006-action-processing.md ├── 
feat-001-video-fps-control.md ├── feat-000-streaming-port.md ├── fix-005-box-bounce-physics.md ├── refactor-009-config-yaml.md ├── refactor-002-graphics-manager-in-parent.md ├── fix-007-episode-length-of-grasping.md ├── fix-003-max-iterations.md ├── doc-003-action-processing-illustration.md ├── feat-004-action-rule-example.md ├── doc-002-control-dt-illustration.md ├── refactor-008-config-key-casing.md ├── fix-001-contact-viz.md └── fix-002-consistency.md ├── .gitmodules ├── .pre-commit-config.yaml ├── .gitignore ├── LICENSE.txt ├── setup.py ├── docs ├── GETTING_STARTED.md ├── guide-viewer-controller.md ├── guide-indefinite-testing.md └── guide-environment-resets.md └── examples └── README.md /.gitattributes: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/tasks/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/utils/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/.get: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/components/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/components/physics/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/components/reset/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/components/reward/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/components/observation/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/components/termination/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/components/initialization/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dexhand_env/components/graphics/__init__.py: -------------------------------------------------------------------------------- 1 | # Graphics components 2 | -------------------------------------------------------------------------------- /dexhand_env/components/graphics/video/__init__.py: -------------------------------------------------------------------------------- 1 | # Video components 2 
| -------------------------------------------------------------------------------- /prompts/feat-110-domain-randomization.md: -------------------------------------------------------------------------------- 1 | We need a structured / systematic domain randomization scheme. 2 | -------------------------------------------------------------------------------- /prompts/doc-004-training.md: -------------------------------------------------------------------------------- 1 | Where does `TRAINING.md` fit in the doc system? Also, it has some outdated options. 2 | -------------------------------------------------------------------------------- /prompts/meta-001-programming-guideline.md: -------------------------------------------------------------------------------- 1 | Consolidate programming guidelines that are generally applicable to all projects into user memory. 2 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "assets/dexrobot_mujoco"] 2 | path = assets/dexrobot_mujoco 3 | url = https://gitee.com/dexrobot/dexrobot_mujoco.git 4 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/F1.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/F1.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/F2.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/F2.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/F3.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/F3.stl -------------------------------------------------------------------------------- /prompts/fix-009-config-consistency.md: -------------------------------------------------------------------------------- 1 | The `test_record.yaml` config file has obsolete legacy options. Check ALL config files for consistency. 2 | -------------------------------------------------------------------------------- /prompts/meta-003-precommit.md: -------------------------------------------------------------------------------- 1 | Add to CLAUDE.md: if a file is modified by a pre-commit hook, then `git add` it again before retrying the commit.
2 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/TH1_z.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/TH1_z.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/TH2_z.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/TH2_z.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/TH3_z.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/TH3_z.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/palm.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/palm.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/wrist.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/wrist.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/textures/block.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/textures/block.png -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/knuckle.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/knuckle.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/estop_link.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/estop_link.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/laser_link.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/laser_link.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/textures/block_hidden.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/textures/block_hidden.png -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/gripper_link.stl: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/gripper_link.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/lfmetacarpal.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/lfmetacarpal.stl -------------------------------------------------------------------------------- /prompts/doc-000-cp.md: -------------------------------------------------------------------------------- 1 | Document how to use `cp -P` to link the latest experiments to pinned names such as `runs/latest_train`. Example: `cp -P runs/BlindGrasping_train_20250724_120120 runs/latest_train` 2 | -------------------------------------------------------------------------------- /prompts/refactor-003-imports.md: -------------------------------------------------------------------------------- 1 | There are some ugly mid-file imports (opencv, flask). Consider just making these dependencies required to avoid unnecessary checks. 2 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/forearm_electric.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/forearm_electric.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/torso_fixed_link.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/torso_fixed_link.stl -------------------------------------------------------------------------------- /dexhand_env/tasks/base/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Base task components for dexterous hand environments.
3 | """ 4 | 5 | from .vec_task import VecTask 6 | 7 | __all__ = ["VecTask"] 8 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/base_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/base_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/hand/forearm_electric_cvx.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/forearm_electric_cvx.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/bellows_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/bellows_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/l_wheel_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/l_wheel_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/r_wheel_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/r_wheel_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/elbow_flex_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/elbow_flex_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/head_pan_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/head_pan_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/head_tilt_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/head_tilt_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/torso_lift_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/torso_lift_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/wrist_flex_link_collision.stl: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/wrist_flex_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/wrist_roll_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/wrist_roll_link_collision.stl -------------------------------------------------------------------------------- /prompts/fix-008-termination-reason-logging.md: -------------------------------------------------------------------------------- 1 | Does the termination reason logging take the average from the beginning of training? Should log the current status, not the historical average. 2 | -------------------------------------------------------------------------------- /assets/README.md: -------------------------------------------------------------------------------- 1 | # Assets Directory Structure 2 | 3 | - `mjcf`: Example MuJoCo description files provided by Isaac Gym 4 | - `dexrobot_mujoco`: Submodule providing description files for DexHand 5 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/forearm_roll_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/forearm_roll_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/shoulder_pan_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/shoulder_pan_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/shoulder_lift_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/shoulder_lift_link_collision.stl -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/stls/fetch/upperarm_roll_link_collision.stl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/upperarm_roll_link_collision.stl -------------------------------------------------------------------------------- /dexhand_env/cfg/test_viewer.yaml: -------------------------------------------------------------------------------- 1 | # @package _global_ 2 | # Test configuration with rendering enabled 3 | 4 | defaults: 5 | - config 6 | - base/test 7 | - /physics/default 8 | - _self_ 9 | 10 | env: 11 | viewer: true 12 | -------------------------------------------------------------------------------- /dexhand_env/cfg/train_headless.yaml: -------------------------------------------------------------------------------- 1 | # @package _global_ 2 | # Training configuration with headless mode for fast training 3 | 4 | defaults: 5 | - config 6 | - _self_ 7 | 8 | env: 9 | viewer: false 10 | numEnvs: 8192 11 | -------------------------------------------------------------------------------- 
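Aside: test_viewer.yaml and train_headless.yaml above are thin Hydra overlays; each pulls in the shared config plus optional base and physics fragments, then applies its own overrides last via _self_. A minimal sketch of composing one of them outside the CLI, assuming standard Hydra (>=1.2, as pinned in setup.py); treat the exact config path as an assumption from the tree above:

```python
# Sketch: compose train_headless.yaml programmatically (paths are assumptions).
from hydra import initialize, compose

with initialize(config_path="dexhand_env/cfg", version_base=None):
    cfg = compose(config_name="train_headless", overrides=["env.numEnvs=1024"])

print(cfg.env.viewer)   # False, set by train_headless.yaml
print(cfg.env.numEnvs)  # 1024, the CLI-style override wins over the file's 8192
```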
/prompts/feat-300-simulator-backend.md: -------------------------------------------------------------------------------- 1 | Support multiple simulators as backends, with a unified interface. 2 | 3 | P1: IsaacSim / IsaacLab 4 | P2: Genesis 5 | P3: MJX / MuJoCo Playground (may require significant infra change due to the use of JAX) 6 | -------------------------------------------------------------------------------- /prompts/refactor-007-blind-grasping.md: -------------------------------------------------------------------------------- 1 | The current grasping task is better called "BlindGrasping", as it does not involve visual input. The task is to grasp an object without using vision, relying on other sensors or pre-defined parameters. 2 | -------------------------------------------------------------------------------- /prompts/refactor-007-step-architecture.md: -------------------------------------------------------------------------------- 1 | Why is pre_physics_step handled in DexHandBase but post_physics_step is in StepProcessor? 2 | 3 | I'm not saying the current architecture is wrong, but you should investigate and decide on the best design. 4 | -------------------------------------------------------------------------------- /dexhand_env/cfg/base/debug.yaml: -------------------------------------------------------------------------------- 1 | # @package _global_ 2 | # Base debug configuration 3 | # Contains common settings for debug mode environments 4 | 5 | env: 6 | numEnvs: 4 # Small number for debugging 7 | 8 | logging: 9 | logLevel: "debug" # Verbose logging for debugging 10 | -------------------------------------------------------------------------------- /dexhand_env/cfg/debug.yaml: -------------------------------------------------------------------------------- 1 | # @package _global_ 2 | # Debug configuration with verbose logging and small scale 3 | 4 | defaults: 5 | - config 6 | - base/debug 7 | - base/test 8 | - _self_ 9 | 10 | env: 11 | viewer: true 12 | 13 | train: 14 | maxIterations: 100 # Override for shorter debug sessions 15 | -------------------------------------------------------------------------------- /prompts/rl-001-blind-grasping-task.md: -------------------------------------------------------------------------------- 1 | Still difficult to train a valid policy. Need to break down the details after the training and monitoring tools are updated. On the essential path: 2 | - `fix-003-max-iterations.md` - Fix max iterations for BlindGrasping task 3 | - `fix-000-tb-metrics.md` - Fix TensorBoard metrics for BlindGrasping task 4 | -------------------------------------------------------------------------------- /prompts/perf-000-physics-speed.md: -------------------------------------------------------------------------------- 1 | Determine the optimal physics accuracy for training. 2 | 3 | Only do this after obtaining a good policy with the default physics settings. 4 | 5 | New finding: if I train with default physics but test with fast physics, the performance drops significantly. 6 | 7 | Need to run quantitative tests to find the per-parameter impact. 8 | -------------------------------------------------------------------------------- /prompts/doc-001-video.md: -------------------------------------------------------------------------------- 1 | We need documentation on how to use the video system.
A recommended workflow: 2 | 3 | - Open one process to train headlessly 4 | - Open another testing process (possibly on CPU) with hot-reload enabled to monitor the training process 5 | - On server: use streaming 6 | - On local workstation: use video; use unison to sync checkpoints with training server 7 | -------------------------------------------------------------------------------- /prompts/meta-002-backward-compatibility.md: -------------------------------------------------------------------------------- 1 | Remove the requirement of backward compatibility in CLAUDE.md. 2 | 3 | On the contrary, we should **discourage** backward compatibility: research code breaks fast, so we should not bloat the codebase with legacy support. Instead, we should focus on maintaining a clean and modern codebase that embraces current best practices. 4 | -------------------------------------------------------------------------------- /dexhand_env/components/action/__init__.py: -------------------------------------------------------------------------------- 1 | """Action processing components for DexHand environment.""" 2 | 3 | from .rules import ActionRules 4 | from .scaling import ActionScaling 5 | from .default_rules import DefaultActionRules 6 | from .rule_based_controller import RuleBasedController 7 | 8 | __all__ = ["ActionRules", "ActionScaling", "DefaultActionRules", "RuleBasedController"] 9 | -------------------------------------------------------------------------------- /prompts/feat-200-task-support.md: -------------------------------------------------------------------------------- 1 | Support more manipulation tasks. 2 | 3 | Sources include: 4 | - IsaacGymEnvs examples of manipulation tasks for other hands 5 | - RoboHive environments 6 | 7 | Need to define a set of tasks and do tuning on a per-task basis. Can refer to the observations / actions / rewards / reset logic of existing IsaacGymEnvs tasks. Need to write PRD(s) for each new task.
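One way each PRD could pin the task interface down is a small declarative spec per task; a hypothetical sketch, with field names that are illustrative rather than taken from this codebase:

```python
# Hypothetical per-task spec; every name here is illustrative only.
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    name: str
    observation_keys: list = field(default_factory=list)  # e.g. ["hand_dofs", "object_pose"]
    reward_terms: dict = field(default_factory=dict)      # term name -> weight
    episode_length: int = 300

lift = TaskSpec(
    name="Lift",
    observation_keys=["hand_dofs", "object_pose"],
    reward_terms={"lift_height": 1.0, "action_penalty": -0.01},
)
```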
8 | -------------------------------------------------------------------------------- /dexhand_env/cfg/physics/accurate.yaml: -------------------------------------------------------------------------------- 1 | # @package sim 2 | # Accurate physics configuration - maximum precision for training 3 | defaults: 4 | - default # Inherit from default.yaml 5 | - _self_ # Override with accurate-specific settings 6 | 7 | # Only override precision-specific parameters 8 | substeps: 32 # Enhanced substeps for stability 9 | 10 | physx: 11 | num_position_iterations: 32 # Enhanced for penetration mitigation 12 | -------------------------------------------------------------------------------- /dexhand_env/cfg/test_record.yaml: -------------------------------------------------------------------------------- 1 | # @package _global_ 2 | # Test configuration with video recording (headless) 3 | 4 | defaults: 5 | - config # Inherit from main config (includes train.checkpoint) 6 | - base/test 7 | - base/video # Add video configuration 8 | - /physics/default 9 | - _self_ 10 | 11 | env: 12 | viewer: false # Explicitly headless 13 | videoRecord: true # Enable file recording 14 | videoStream: false # Disable HTTP streaming 15 | -------------------------------------------------------------------------------- /dexhand_env/cfg/test_stream.yaml: -------------------------------------------------------------------------------- 1 | # @package _global_ 2 | # Test configuration with HTTP video streaming (headless) 3 | 4 | defaults: 5 | - config # Inherit from main config (includes train.checkpoint) 6 | - base/test 7 | - base/video # Add video configuration 8 | - /physics/default 9 | - _self_ 10 | 11 | env: 12 | viewer: false # Explicitly headless 13 | videoRecord: false # Disable file recording 14 | videoStream: true # Enable HTTP streaming 15 | -------------------------------------------------------------------------------- /prompts/feat-100-bimanual.md: -------------------------------------------------------------------------------- 1 | Support a bimanual environment with dexhand_left and dexhand_right working in the same environment. 2 | 3 | Breakdown: 4 | - Create dexhand_left_floating mujoco model 5 | - Update hardcoded logic for loading hand asset and creating hand actors (what level of flexibility is needed?) 6 | - Pay attention to actor indices with bimanual + objects (see the sketch below) 7 | - Update action processing logic if needed 8 | - Create a template task for bimanual dexhands 9 | 10 | May need to create separate PRDs for each item.
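On the actor-index bullet: with two hands plus an object per env, the global root-state indices are easy to get wrong. A toy sketch, assuming a fixed per-env creation order of [right_hand, left_hand, object]; the order itself is an assumption:

```python
# Toy sketch: global actor indices under an assumed per-env creation order
# of [right_hand, left_hand, object].
import torch

num_envs = 4
actors_per_env = 3
env_ids = torch.arange(num_envs)

right_hand_ids = env_ids * actors_per_env + 0
left_hand_ids = env_ids * actors_per_env + 1
object_ids = env_ids * actors_per_env + 2
print(object_ids)  # tensor([ 2,  5,  8, 11])
```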
11 | -------------------------------------------------------------------------------- /dexhand_env/cfg/base/test.yaml: -------------------------------------------------------------------------------- 1 | # @package _global_ 2 | # Base test configuration 3 | # Contains common settings for test mode environments 4 | 5 | 6 | # Environment settings for testing 7 | env: 8 | numEnvs: 4 # Small number for testing 9 | 10 | # Training settings for test mode 11 | train: 12 | test: true # Enable test mode 13 | maxIterations: 1000 # Reasonable default for testing 14 | seed: 42 # Random seed for reproducible testing 15 | testGamesNum: 50 # Number of games for test configurations (0 = indefinite) 16 | logging: 17 | logLevel: "info" # Standard logging level for tests 18 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/pre-commit/pre-commit-hooks 3 | rev: v4.5.0 4 | hooks: 5 | - id: trailing-whitespace 6 | - id: end-of-file-fixer 7 | - id: check-yaml 8 | - id: check-added-large-files 9 | 10 | # Optional: Add black for consistent formatting 11 | - repo: https://github.com/psf/black 12 | rev: 23.12.1 13 | hooks: 14 | - id: black 15 | language_version: python3 16 | 17 | # Optional: Add ruff for linting 18 | - repo: https://github.com/astral-sh/ruff-pre-commit 19 | rev: v0.1.9 20 | hooks: 21 | - id: ruff 22 | args: [--fix, --exit-non-zero-on-fix] 23 | -------------------------------------------------------------------------------- /prompts/meta-000-workflow-setup.md: -------------------------------------------------------------------------------- 1 | Keep todo items in @prompts/ 2 | 3 | Create a ROADMAP.md file to organize the items. 4 | 5 | Prefix specific todo items with refactor-, feat-, fix-, or rl-. Specifically, task-specific design / tuning and RL physics tunings should fall in the `rl-` category (which is more research-oriented than traditional software engineering). 6 | 7 | Update CLAUDE.md to tell the AI to use the guideline. 8 | 9 | Fix one item in a session. Work in the following order: 10 | - Ultrathink to obtain context about the issue and expand the todo item markdown into a PRD / architecture document. 11 | - Come up with a detailed implementation plan and request user's approval. 12 | - Implement the plan and request user's review.
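For reference, the filename pattern the existing items follow; the prefix set is inferred from the files in prompts/ (which also include doc- and perf-):

```python
# Filename pattern implied by the convention above (hyphen-separated).
import re

TODO_NAME = re.compile(r"^(meta|refactor|feat|fix|rl|doc|perf)-\d{3}-[a-z0-9-]+\.md$")
assert TODO_NAME.match("refactor-001-episode-length.md")
assert TODO_NAME.match("rl-001-blind-grasping-task.md")
```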
13 | -------------------------------------------------------------------------------- /dexhand_env/cfg/physics/fast.yaml: -------------------------------------------------------------------------------- 1 | # @package sim 2 | # Fast physics configuration - optimized for real-time visualization 3 | defaults: 4 | - default # Inherit from default.yaml 5 | - _self_ # Override with fast-specific settings 6 | 7 | # Only override performance-specific parameters 8 | substeps: 2 # Reduced substeps for speed 9 | 10 | physx: 11 | num_position_iterations: 8 # Reduced for speed (50% fewer) 12 | contact_offset: 0.002 # Slightly larger for speed 13 | rest_offset: 0.001 # Maintain ratio 14 | max_depenetration_velocity: 0.5 # Faster separation 15 | default_buffer_size_multiplier: 2.0 # Reduced buffer 16 | gpu_contact_pairs_per_env: 256 # Fewer contact pairs 17 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/shadow_hand.xml: -------------------------------------------------------------------------------- (XML content not preserved in this export) -------------------------------------------------------------------------------- /dexhand_env/cfg/physics/default.yaml: -------------------------------------------------------------------------------- 1 | # @package sim 2 | # Default physics configuration - balanced quality/performance 3 | substeps: 4 # Standard physics substeps 4 | gravity: [0.0, 0.0, -9.81] 5 | num_client_threads: 0 6 | 7 | physx: 8 | solver_type: 1 # TGS solver 9 | num_position_iterations: 16 # Balanced precision 10 | num_velocity_iterations: 0 # NVIDIA recommendation 11 | contact_offset: 0.001 # High precision detection 12 | rest_offset: 0.0005 # Stability maintenance 13 | bounce_threshold_velocity: 0.15 14 | max_depenetration_velocity: 0.2 15 | default_buffer_size_multiplier: 4.0 16 | num_subscenes: 0 17 | contact_collection: 1 18 | gpu_contact_pairs_per_env: 512 19 | always_use_articulations: true 20 | num_threads: 4 21 | -------------------------------------------------------------------------------- /dexhand_env/cfg/test_script.yaml: -------------------------------------------------------------------------------- 1 | # @package _global_ 2 | # Test script configuration for examples/dexhand_test.py 3 | # Contains settings specific to environment functional testing 4 | 5 | defaults: 6 | - config 7 | - base/test 8 | - /physics/fast # Fast physics for smooth rendering 9 | - _self_ 10 | 11 | # Test script specific settings 12 | steps: 1200 # Total number of test steps to run 13 | sleep: 0.01 # Sleep time between steps in seconds 14 | headless: false # Run without GUI visualization 15 | debug: false # Enable debug output and additional logging 16 | log_level: "info" # Set logging verbosity level 17 | enablePlotting: false # Enable real-time plotting with Rerun 18 | plotEnvIdx: 0 # Environment index to plot 19 | 20 | env: 21 | viewer: true # Enable interactive visualization for test script 22 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | videos 2 | cloud 3 | recorded_frames 4 | *train_dir* 5 | *ige_logs* 6 | *.egg-info 7 | /.vs 8 | /.vscode 9 | /_package 10 | /shaders 11 | ._tmptext.txt 12 | __pycache__/ 13 | /tools/format/.lastrun 14 | *.pyc 15 | _doxygen 16 | *.pxd2 17 | logs* 18 | nn/ 19 | runs/ 20 | runs_all/ 21 | .idea 22 | outputs/ 23 | 24 | # Preserve
directory structure with .gitkeep files 25 | !**/.gitkeep 26 | *.hydra* 27 | CLAUDE.md 28 | .unison.* 29 | ~ 30 | 31 | # Directories to ignore 32 | legacy/ 33 | reference/ 34 | assets/dexrobot_urdf/ 35 | 36 | # Python 37 | *.py[cod] 38 | *$py.class 39 | *.so 40 | .Python 41 | env/ 42 | build/ 43 | develop-eggs/ 44 | dist/ 45 | downloads/ 46 | eggs/ 47 | .eggs/ 48 | lib/ 49 | lib64/ 50 | parts/ 51 | sdist/ 52 | var/ 53 | *.egg-info/ 54 | .installed.cfg 55 | *.egg 56 | 57 | # OS specific 58 | .DS_Store 59 | 60 | # Temporary debug files 61 | patch_rlgames_timing.py 62 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/egg.xml: -------------------------------------------------------------------------------- (XML content not preserved in this export) -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/fetch/reach.xml: -------------------------------------------------------------------------------- (XML content not preserved in this export) -------------------------------------------------------------------------------- /prompts/meta-004-docs.md: -------------------------------------------------------------------------------- 1 | # Documentation Development Protocol Integration 2 | 3 | **Status**: ✅ **COMPLETED** (2025-08-01) 4 | 5 | **Problem**: Need comprehensive documentation development protocol in CLAUDE.md to ensure reader-oriented, accurate, and maintainable documentation. 6 | 7 | **Solution Implemented**: Added comprehensive "Documentation Development Protocol - CRITICAL" section to CLAUDE.md covering: 8 | 9 | 1. **Motivation-First Writing**: Always explain WHY before HOW with problem context 10 | 2. **Architecture Explanation Requirements**: Explain non-standard patterns and "magic" behavior 11 | 3. **Fact-Checking and Code Validation**: Verify every technical detail against implementation 12 | 4. **Reader-Oriented Structure**: Organize around user scenarios, not implementation structure 13 | 5. **Quality Gates**: 5-point validation checklist for documentation updates 14 | 15 | **Key Improvements**: 16 | - Addresses documentation quality issues observed in guide-indefinite-testing.md critique 17 | - Provides concrete examples of wrong vs correct documentation approaches 18 | - Establishes systematic validation process for technical accuracy 19 | - Balances conciseness with essential context requirements 20 | - Identifies common anti-patterns to avoid 21 | 22 | **Integration**: The protocol is now part of the core CLAUDE.md guidelines and will be applied to all future documentation work. 23 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/fetch/push.xml: -------------------------------------------------------------------------------- (XML content not preserved in this export) -------------------------------------------------------------------------------- /prompts/fix-000-tb-metrics.md: -------------------------------------------------------------------------------- 1 | # fix-000-tb-metrics.md 2 | 3 | Fix TensorBoard data point sampling limit causing old reward breakdown data to disappear in long runs.
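Back-of-envelope arithmetic behind the fix, using the numbers cited in the notes below:

```python
# Rough budget math; values come from this file's Context and Notes sections.
points_budget = 1000               # TensorBoard's default reservoir per scalar tag
old_history = points_budget * 10   # log_interval=10  -> 10,000 episodes visible
new_history = points_budget * 100  # log_interval=100 -> 100,000 episodes visible
print(new_history // old_history)  # 10x longer visible history
```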
4 | 5 | ## Context 6 | 7 | During long training runs, TensorBoard displays only the most recent ~780.5M steps for custom reward breakdown metrics, while RL Games built-in metrics show complete training history. This is due to TensorBoard's default 1000 data point limit per scalar tag with reservoir sampling. 8 | 9 | Custom reward breakdown logs more frequently (every 10 finished episodes) compared to built-in metrics (every epoch), causing it to hit the 1000-point limit faster. 10 | 11 | ## Current State 12 | 13 | - TensorBoard default: 1000 scalar data points per tag 14 | - RewardComponentObserver logs every `log_interval=10` finished episodes 15 | - Built-in metrics log every epoch (much less frequent) 16 | 17 | ## Desired Outcome 18 | 19 | Reduce logging frequency of reward breakdown metrics to stay within TensorBoard's data point limits for longer training runs. 20 | 21 | ## Implementation Notes 22 | 23 | **Solution: Increase log_interval** 24 | - Change default `log_interval` from 10 to 100+ finished episodes 25 | - Parameter meaning: "Write to TensorBoard once per X finished episodes" 26 | - This reduces data points by 10x, extending visible history from 780.5M to 7.8B+ steps 27 | 28 | **Code change:** 29 | ```python 30 | # In train.py or config 31 | observer = RewardComponentObserver(log_interval=100) # Was 10 32 | ``` 33 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/fetch/slide.xml: -------------------------------------------------------------------------------- (XML content not preserved in this export) -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/pen.xml: -------------------------------------------------------------------------------- (XML content not preserved in this export) -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2018-2023, NVIDIA Corporation 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | 3. Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | 31 | See assets/licenses for license information for assets included in this repository 32 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/fetch/pick_and_place.xml: -------------------------------------------------------------------------------- (XML content not preserved in this export) -------------------------------------------------------------------------------- /dexhand_env/cfg/base/video.yaml: -------------------------------------------------------------------------------- 1 | # @package _global_ 2 | # Base video configuration for all video modes 3 | # Contains common video settings shared across recording and streaming 4 | # 5 | # INDEPENDENT FEATURES: 6 | # - Video recording (env.videoRecord: true/false) - Saves MP4 files to disk 7 | # - HTTP streaming (env.videoStream: true/false) - Serves real-time MJPEG over HTTP 8 | # - Both features can be enabled/disabled independently 9 | # - Both use the same unified camera system (ViewerController's persistent camera) 10 | # 11 | # USAGE EXAMPLES: 12 | # Recording only: env.videoRecord=true env.videoStream=false 13 | # Streaming only: env.videoRecord=false env.videoStream=true 14 | # Both enabled: env.videoRecord=true env.videoStream=true 15 | # Neither enabled: env.videoRecord=false env.videoStream=false (default) 16 | 17 | env: 18 | # Video recording configuration (requires OpenCV) 19 | # Note: FPS is automatically calculated from simulation timing (1.0 / control_dt) 20 | videoResolution: [1920, 1080] # Width x Height for both recording and streaming (Full HD) 21 | videoCodec: mp4v # Video codec (mp4v, XVID, H264) 22 | videoMaxDuration: 300 # Maximum duration per video file (seconds) 23 | 24 | # HTTP streaming configuration (requires Flask) 25 | videoStreamHost: "127.0.0.1" # Server host address (localhost for security) 26 | videoStreamPort: 58080 # Server port (uncommon port to avoid conflicts, auto-increments if occupied) 27 | videoStreamQuality: 100 # JPEG quality (1-100) - Maximum quality 28 | videoStreamBufferSize: 10 # Frame buffer size 29 | videoStreamBindAll: false # Bind to all interfaces (0.0.0.0) instead of localhost - SECURITY RISK if enabled 30 | -------------------------------------------------------------------------------- /dexhand_env/utils/coordinate_transforms.py: -------------------------------------------------------------------------------- 1 | """ 2 | Coordinate transformation utilities for DexHand environment. 3 | 4 | This module provides utilities for transforming points and orientations 5 | between different coordinate frames.
6 | """ 7 | 8 | # Import IsaacGym first 9 | from isaacgym.torch_utils import ( 10 | quat_rotate, 11 | quat_rotate_inverse, 12 | ) 13 | 14 | # Then import PyTorch 15 | 16 | 17 | def point_in_hand_frame(pos_world, hand_pos, hand_rot): 18 | """ 19 | Convert a point from world frame to hand frame. 20 | 21 | Args: 22 | pos_world: Position in world frame [batch_size, 3] 23 | hand_pos: Hand position in world frame [batch_size, 3] 24 | hand_rot: Hand rotation quaternion [batch_size, 4] in format [x, y, z, w] 25 | 26 | Returns: 27 | Position in hand frame [batch_size, 3] 28 | """ 29 | # Vector from hand to point in world frame 30 | rel_pos = pos_world - hand_pos 31 | 32 | # Use Isaac Gym's optimized quat_rotate_inverse to transform from world to hand frame 33 | local_pos = quat_rotate_inverse(hand_rot, rel_pos) 34 | 35 | return local_pos 36 | 37 | 38 | def point_in_world_frame(pos_local, hand_pos, hand_rot): 39 | """ 40 | Convert a point from hand frame to world frame. 41 | 42 | Args: 43 | pos_local: Position in hand frame [batch_size, 3] 44 | hand_pos: Hand position in world frame [batch_size, 3] 45 | hand_rot: Hand rotation quaternion [batch_size, 4] in format [x, y, z, w] 46 | 47 | Returns: 48 | Position in world frame [batch_size, 3] 49 | """ 50 | # Use Isaac Gym's optimized quat_rotate to transform from hand to world frame 51 | rotated_pos = quat_rotate(hand_rot, pos_local) 52 | 53 | # Add hand position to get world position 54 | world_pos = rotated_pos + hand_pos 55 | 56 | return world_pos 57 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/reach.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | """Installation script for the 'dexhand' python package.""" 2 | 3 | from __future__ import absolute_import 4 | from __future__ import print_function 5 | from __future__ import division 6 | 7 | from setuptools import setup, find_packages 8 | 9 | import os 10 | 11 | root_dir = os.path.dirname(os.path.realpath(__file__)) 12 | 13 | 14 | # Minimum dependencies required prior to installation 15 | INSTALL_REQUIRES = [ 16 | # RL 17 | "gym==0.23.1", 18 | "torch", 19 | "omegaconf", 20 | "termcolor", 21 | "jinja2", 22 | "hydra-core>=1.2", 23 | "rl-games>=1.6.0", 24 | "tensorboard", # For training metrics logging 25 | "urdfpy==0.0.22", 26 | "pysdf==0.1.9", 27 | "warp-lang==0.10.1", 28 | "trimesh==3.23.5", 29 | # Video dependencies (now required for training workflows) 30 | "opencv-python>=4.5.0", # For video recording 31 | "flask>=2.0.0", # For HTTP video streaming 32 | ] 33 | 34 | # Optional dependencies for additional features (backward compatibility) 35 | # Note: video dependencies are now required in INSTALL_REQUIRES above 36 | EXTRAS_REQUIRE = { 37 | "streaming": [], # Flask now required by default 38 | "video": [], # OpenCV now required by default 39 | "all": [], # All video features now included by default 40 | } 41 | 42 | 43 | # Installation operation 44 | setup( 45 | name="dexhand_env", 46 | author="DexRobot Inc.", 47 | version="0.1.0", 48 | description="Reinforcement learning environment for dexterous manipulation with robotic hands", 49 | keywords=["robotics", "rl", 
"dexterous", "manipulation"], 50 | include_package_data=True, 51 | python_requires=">=3.6", 52 | install_requires=INSTALL_REQUIRES, 53 | extras_require=EXTRAS_REQUIRE, 54 | packages=find_packages("."), 55 | classifiers=[ 56 | "Natural Language :: English", 57 | "Programming Language :: Python :: 3.6, 3.7, 3.8", 58 | ], 59 | zip_safe=False, 60 | ) 61 | 62 | # EOF 63 | -------------------------------------------------------------------------------- /dexhand_env/cfg/train/BaseTaskPPO.yaml: -------------------------------------------------------------------------------- 1 | # @package train 2 | algo: 3 | name: a2c_continuous 4 | 5 | model: 6 | name: continuous_a2c_logstd 7 | 8 | network: 9 | name: actor_critic 10 | separate: False 11 | 12 | space: 13 | continuous: 14 | mu_activation: None 15 | sigma_activation: None 16 | mu_init: 17 | name: default 18 | sigma_init: 19 | name: const_initializer 20 | val: 1.0 21 | fixed_sigma: True 22 | 23 | mlp: 24 | units: [512, 256, 128] 25 | activation: elu 26 | d2rl: False 27 | 28 | initializer: 29 | name: default 30 | regularizer: 31 | name: None 32 | 33 | load_checkpoint: false # Will be set to true dynamically in train.py when checkpoint is provided 34 | load_path: ${train.checkpoint} 35 | 36 | config: 37 | name: ${task.name} 38 | full_experiment_name: ${.name} 39 | env_name: ${train.envName} 40 | device: ${env.device} 41 | multi_gpu: False 42 | ppo: True 43 | mixed_precision: False 44 | normalize_input: True 45 | normalize_value: True 46 | value_bootstrap: True 47 | num_actors: ${env.numEnvs} 48 | reward_shaper: 49 | scale_value: 1.0 50 | normalize_advantage: True 51 | gamma: 0.99 52 | tau: 0.95 53 | learning_rate: 3e-4 54 | lr_schedule: adaptive 55 | schedule_type: standard 56 | kl_threshold: 0.008 57 | score_to_win: 10000 58 | max_epochs: ${train.maxIterations} 59 | save_best_after: 1 60 | save_frequency: 100 61 | print_stats: True 62 | grad_norm: 1.0 63 | entropy_coef: 0.0 64 | truncate_grads: True 65 | e_clip: 0.2 66 | horizon_length: 16 67 | minibatch_size: ${env.numEnvs} 68 | mini_epochs: 4 69 | critic_coef: 4 70 | clip_value: True 71 | seq_len: 4 72 | bounds_loss_coef: 0.0001 73 | 74 | # TensorBoard logging configuration 75 | use_tensorboard: True 76 | tensorboard_logdir: 'runs' 77 | log_interval: 1 # Log every epoch 78 | save_interval: 100 # Save checkpoints every 100 epochs 79 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/shared_asset.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /prompts/TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # [PREFIX]-[NUMBER]-[SHORT-DESCRIPTION].md 2 | 3 | Brief one-line description of the task. 
4 | 5 | ## Context 6 | 7 | 8 | 9 | 10 | 11 | ## Current State 12 | 13 | 14 | ## Desired Outcome 15 | 16 | 17 | ## Constraints 18 | 19 | 20 | 21 | ## Implementation Notes 22 | 23 | 24 | 25 | ## Dependencies 26 | 27 | 28 | --- 29 | 30 | ## Instructions 31 | 32 | ### File Naming Convention 33 | Use format: `[prefix]-[number]-[short-description].md` 34 | 35 | **Prefixes:** 36 | - `meta-` - Workflow, tooling, project organization 37 | - `refactor-` - Code quality, architectural improvements 38 | - `feat-` - New functionality, API enhancements 39 | - `fix-` - Bug fixes, issue resolution 40 | - `rl-` - Research tasks (policy tuning, physics, rewards) 41 | 42 | **Examples:** 43 | - `refactor-001-episode-length.md` 44 | - `rl-002-reward-tuning.md` 45 | - `feat-003-new-observation.md` 46 | 47 | ### Content Guidelines 48 | 49 | **Minimal Format (for simple tasks):** 50 | ```markdown 51 | # refactor-001-example-task.md 52 | Brief description of what needs to be done. 53 | ``` 54 | 55 | **Expanded Format (after Ultrathink phase):** 56 | - Fill in relevant sections as understanding deepens 57 | - Not all sections required for every task 58 | - Use during Phase 1 (Ultrathink) to develop detailed understanding 59 | 60 | ### Workflow Integration 61 | 62 | 1. **Create**: Start with minimal format (brief description) 63 | 2. **Ultrathink**: Expand sections as needed during Phase 1 64 | 3. **Plan**: Reference expanded content during Phase 2 planning 65 | 4. **Complete**: Mark as done in ROADMAP.md after Phase 3 66 | -------------------------------------------------------------------------------- /prompts/refactor-001-episode-length.md: -------------------------------------------------------------------------------- 1 | # refactor-001-episode-length.md 2 | 3 | Resolve architectural inconsistency in episodeLength configuration placement. 4 | 5 | ## Context 6 | 7 | The DexHand codebase has an architectural inconsistency where `episodeLength` is placed in different config sections: 8 | - BaseTask.yaml: `task.episodeLength: 300` 9 | - BlindGrasping.yaml: `env.episodeLength: 500` 10 | - DexHandBase code: expects `env_cfg["episodeLength"]` 11 | 12 | This creates potential runtime failures when BaseTask is used directly, since the code looks for the parameter in the env section but BaseTask defines it in the task section. 13 | 14 | ## Current State 15 | 16 | - **BaseTask.yaml**: Places `episodeLength` in `task` section (line 24) 17 | - **BlindGrasping.yaml**: Places `episodeLength` in `env` section (line 15) 18 | - **DexHandBase**: Reads from `self.env_cfg["episodeLength"]` (line 148) 19 | - **CLI Integration**: `dexhand_test.py` overrides `cfg["env"]["episodeLength"]` 20 | 21 | ## Desired Outcome 22 | 23 | Consistent placement of `episodeLength` parameter across all config files and code, eliminating the architectural inconsistency.
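To make the failure concrete, a minimal sketch of the mismatched lookup; the dict contents are abbreviated from the configs cited above:

```python
# Abbreviated illustration: BaseTask.yaml puts episodeLength under task, not env.
env_cfg = {"numEnvs": 1024}
task_cfg = {"episodeLength": 300}

episode_length = env_cfg["episodeLength"]  # KeyError: DexHandBase reads the env section
```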
24 | 25 | ## Constraints 26 | 27 | - Must maintain backward compatibility with existing BlindGrasping.yaml 28 | - Must align with existing code expectations in DexHandBase 29 | - Must preserve CLI override functionality 30 | - Should follow architectural principles for config section organization 31 | 32 | ## Implementation Notes 33 | 34 | **Expert Consensus Results:** 35 | - Multiple AI models unanimously agreed `episodeLength` belongs in `env` section 36 | - Parameter is classified as runtime execution constraint, similar to `numEnvs`, `device` 37 | - Simple fix: move one line from task to env section in BaseTask.yaml 38 | 39 | **Key Insight from User Challenge:** 40 | `episodeLength` has task-semantic properties (affects task difficulty/feasibility) but is architecturally an environment runtime constraint (timeout mechanism). Current code expects env section placement. 41 | 42 | **Technical Approach:** 43 | 1. Move `episodeLength: 300` from `task` to `env` section in BaseTask.yaml 44 | 2. Test BaseTask environment creation to verify fix 45 | 3. No code changes required (DexHandBase already expects env section) 46 | 47 | ## Dependencies 48 | 49 | None - this is a standalone configuration fix. 50 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/manipulate_pen.xml: -------------------------------------------------------------------------------- (XML content not preserved in this export) -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/manipulate_pen_touch_sensors.xml: -------------------------------------------------------------------------------- (XML content not preserved in this export) -------------------------------------------------------------------------------- /prompts/fix-001-reward-logging-logic.md: -------------------------------------------------------------------------------- 1 | # fix-001-reward-logging-logic.md 2 | 3 | Fix RewardComponentObserver logging cumulative averages instead of windowed statistics. 4 | 5 | ## Context 6 | 7 | RewardComponentObserver currently logs cumulative averages from training start instead of meaningful windowed statistics. This creates slowly-changing metrics that mask recent performance trends and provide poor insight into training progress. 8 | 9 | ## Current State 10 | 11 | **Flawed cumulative logic:** 12 | ```python 13 | # Accumulates forever (never resets except after_clear_stats) 14 | self.cumulative_sums["all"][component_name]["rewards"] += total_reward 15 | self.cumulative_sums["all"][component_name]["steps"] += total_steps 16 | 17 | # Logs cumulative average since training start 18 | step_mean = cum_rewards / max(cum_steps, 1) 19 | ``` 20 | 21 | This produces metrics like "average reward per step since training began" instead of "average reward per step over recent episodes." 22 | 23 | ## Desired Outcome 24 | 25 | Replace cumulative statistics with windowed statistics that reset after each logging interval, providing meaningful trending data.
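A toy contrast, with made-up numbers, of what the two statistics report when per-step reward jumps from 0.1 to 0.5 halfway through training:

```python
# Made-up numbers illustrating the masking effect described above.
early = [0.1] * 1000
late = [0.5] * 1000

cumulative_mean = sum(early + late) / 2000  # 0.3: the recent jump is diluted
windowed_mean = sum(late[-100:]) / 100      # 0.5: the last window shows it
```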
26 | 27 | ## Implementation Notes 28 | 29 | **Windowed statistics approach:** 30 | ```python 31 | def _log_to_tensorboard(self): 32 | # Calculate window averages for current interval 33 | for component_name in self.discovered_components: 34 | window_rewards = self.cumulative_sums["all"][component_name]["rewards"] 35 | window_steps = self.cumulative_sums["all"][component_name]["steps"] 36 | step_mean = window_rewards / max(window_steps, 1) 37 | 38 | # Log windowed average 39 | self.writer.add_scalar(f"reward_breakdown/all/raw/step/{component_name}", step_mean, frame) 40 | 41 | # Reset window for next interval 42 | for term_type in self.cumulative_sums: 43 | for component_name in self.cumulative_sums[term_type]: 44 | self.cumulative_sums[term_type][component_name] = {"rewards": 0.0, "steps": 0} 45 | ``` 46 | 47 | **Benefits:** 48 | - Meaningful trending: shows performance changes over time 49 | - Better training insights: recent performance vs long-term averages 50 | - Fewer redundant data points: values change meaningfully between logs 51 | 52 | ## Constraints 53 | 54 | - Keep episode meters (rolling averages) unchanged 55 | - Maintain log_interval in episodes (makes sense for parallel environments) 56 | - Preserve component responsibility separation 57 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/manipulate_egg.xml: -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/manipulate_block.xml: -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- /docs/GETTING_STARTED.md: -------------------------------------------------------------------------------- 1 | # Getting Started with DexHand 2 | 3 | This guide will get you from zero to your first trained RL policy in under 10 minutes. 4 | 5 | ## Prerequisites 6 | 7 | - NVIDIA GPU with CUDA support 8 | - Python 3.8+ 9 | - Isaac Gym Preview 4 (see [installation instructions](https://developer.nvidia.com/isaac-gym)) 10 | 11 | ## Quick Setup (5 minutes) 12 | 13 | ### 1. Install Isaac Gym 14 | ```bash 15 | # Download Isaac Gym Preview 4 from NVIDIA 16 | # Follow their installation instructions, then verify: 17 | cd isaacgym/python/examples 18 | python joint_monkey.py # Should show a working simulation 19 | ``` 20 | 21 | ### 2. Clone and Install DexHand 22 | ```bash 23 | git clone --recursive https://github.com/dexrobot/dexrobot_isaac 24 | cd dexrobot_isaac 25 | pip install -e . 26 | ``` 27 | 28 | > **Missing submodules?** Run `git submodule update --init --recursive` to fetch required robot models. 29 | 30 | ### 3. Verify Installation 31 | ```bash 32 | # Quick test (should show hand visualization) 33 | python examples/dexhand_test.py env.numEnvs=1 steps=100 34 | ``` 35 | 36 | You should see an Isaac Gym window with a dexterous hand in the simulation. 37 | 38 | ## Your First Training (3 minutes) 39 | 40 | ### 1.
Start Training 41 | ```bash 42 | # Train a basic policy 43 | python train.py task=BaseTask numEnvs=512 44 | ``` 45 | 46 | This creates a new experiment in `runs/BaseTask_train_YYMMDD_HHMMSS/` and begins training. 47 | 48 | ### 2. Test Your Trained Policy 49 | ```bash 50 | # Test with visualization 51 | python train.py task=BaseTask test=true checkpoint=latest viewer=true numEnvs=4 52 | ``` 53 | 54 | The system automatically finds your latest training checkpoint and visualizes the learned policy. 55 | 56 | ## Next Steps 57 | 58 | - **[Training Guide](TRAINING.md)** - Comprehensive training workflows, testing options, and experiment management 59 | - **[Task Creation Guide](guide-task-creation.md)** - Create custom manipulation tasks 60 | - **[Troubleshooting](TROUBLESHOOTING.md)** - Solutions for common setup and runtime issues 61 | - **[System Architecture](ARCHITECTURE.md)** - Understanding the component-based design 62 | 63 | ## Quick Troubleshooting 64 | 65 | - **ImportError: isaacgym** → Isaac Gym not installed correctly 66 | - **CUDA out of memory** → Reduce `numEnvs` (try 256 or 128) 67 | - **Missing assets/dexrobot_mujoco** → Run `git submodule update --init --recursive` 68 | - **No checkpoints found** → Training hasn't saved checkpoints yet (wait longer) 69 | 70 | For detailed troubleshooting, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md). 71 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/manipulate_egg_touch_sensors.xml: -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/hand/manipulate_block_touch_sensors.xml: -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- /prompts/fix-010-max-deltas.md: -------------------------------------------------------------------------------- 1 | # fix-010-max-deltas.md 2 | 3 | Investigate and verify max_deltas scaling correctness in ActionProcessor. 4 | 5 | ## Context 6 | 7 | Based on the task description, there was concern that max_deltas scaling uses incorrect timing values due to a control_dt vs physics_dt initialization bug. However, static code analysis suggests that the current implementation is architecturally correct.
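For reference, the pattern under investigation looks roughly like this (a simplified sketch of the design described in this prompt, not the actual ActionProcessor code):

```python
import torch


class ActionProcessorSketch:
    """Illustrates the two-stage pattern; names mirror this prompt's description."""

    def __init__(self, parent):
        # Stage 1: construction - control_dt has not been measured yet
        self.parent = parent
        self.max_deltas = None

    @property
    def control_dt(self):
        # Single source of truth: always read from the physics manager
        return self.parent.physics_manager.control_dt

    def finalize_setup(self, velocity_limits: torch.Tensor):
        # Stage 2: called after control_dt measurement
        self.max_deltas = self.control_dt * velocity_limits
```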
8 | 9 | ## Current State 10 | 11 | **ActionProcessor Implementation Analysis:** 12 | - `_precompute_max_deltas()` correctly called during Stage 2 (`finalize_setup()`) after control_dt measurement 13 | - Uses property decorator to access `self.parent.physics_manager.control_dt` (single source of truth) 14 | - Calculation: `max_deltas = control_dt * velocity_limit` is mathematically sound 15 | - Two-stage initialization pattern properly implemented 16 | 17 | **Configuration Values:** 18 | - BlindGrasping: `max_finger_joint_velocity: 0.5`, `sim.dt: 0.01` (physics_dt) 19 | - BaseTask: `max_finger_joint_velocity: 1.0`, `sim.dt: 0.005` (physics_dt) 20 | 21 | **Expected Calculations:** 22 | - BlindGrasping: control_dt = 0.02 (0.01 × 2 steps), max_deltas = 0.5 × 0.02 = 0.01 23 | - BaseTask: control_dt ≈ 0.005 (0.005 × 1 step), max_deltas = 1.0 × 0.005 = 0.005 24 | 25 | ## Desired Outcome 26 | 27 | **Verification Required:** 28 | 1. Confirm current implementation calculates correct max_deltas values 29 | 2. Validate no control_dt vs physics_dt confusion exists 30 | 3. Update task status based on actual findings 31 | 32 | **Possible Outcomes:** 33 | - **If correct**: Mark task as invalid/completed, no changes needed 34 | - **If incorrect**: Identify actual root cause and implement fix 35 | 36 | ## Constraints 37 | 38 | - Follow existing two-stage initialization pattern 39 | - Maintain fail-fast philosophy (no defensive programming) 40 | - Preserve single source of truth for control_dt 41 | - Respect ActionProcessor component boundaries 42 | 43 | ## Implementation Notes 44 | 45 | **Investigation Steps:** 46 | 1. Add temporary debug logging to verify actual calculated values 47 | 2. Test with both BlindGrasping and BaseTask configurations 48 | 3. Compare expected vs actual max_deltas values 49 | 4. Identify if discrepancy exists and locate root cause 50 | 51 | **Potential Issues to Check:** 52 | - Property decorator functioning correctly 53 | - control_dt measurement accuracy 54 | - Configuration parameter loading 55 | - Physics steps calculation 56 | 57 | ## Dependencies 58 | 59 | - Requires understanding of two-stage initialization pattern 60 | - Depends on PhysicsManager control_dt measurement accuracy 61 | -------------------------------------------------------------------------------- /prompts/fix-006-metadata-keys.md: -------------------------------------------------------------------------------- 1 | # fix-006-metadata-keys.md 2 | 3 | ✅ **COMPLETED** - Git metadata saving fails due to hardcoded config key assumptions. 4 | 5 | ## Context 6 | 7 | The training script's git metadata saving functionality tries to reconstruct CLI arguments by hardcoding expected config keys like `env.render`. This violates fail-fast principles and breaks when configuration keys change (as happened in refactor-004-render.md where `render` became `viewer`). 8 | 9 | Current error: 10 | ``` 11 | WARNING | Could not save git metadata: Key 'render' is not in struct 12 | full_key: env.render 13 | object_type=dict 14 | ``` 15 | 16 | ## Current State 17 | 18 | The `get_config_overrides()` function in train.py attempts to reconstruct CLI arguments for reproducibility by: 19 | 1. Hardcoding assumed "important" config keys 20 | 2. Building a synthetic "Hydra equivalent" command 21 | 3. 
Failing when keys don't exist (violating fail-fast) 22 | 23 | This approach has fundamental flaws: 24 | - **Information Loss**: Only captures subset of assumed important values 25 | - **Hardcoded Assumptions**: Breaks when config structure changes 26 | - **Incomplete Reconstruction**: Cannot fully reproduce complex configuration hierarchies 27 | - **Defensive Programming**: Uses existence checks that mask configuration issues 28 | 29 | ## Desired Outcome 30 | 31 | Replace flawed reconstruction approach with comprehensive config saving: 32 | 33 | 1. **Remove hardcoded key assumptions** - eliminate `get_config_overrides()` function entirely 34 | 2. **Save complete resolved config** - serialize the full OmegaConf configuration as YAML 35 | 3. **Preserve existing working functionality** - keep original CLI args and git info unchanged 36 | 4. **Full reproducibility** - anyone can recreate exact training conditions 37 | 38 | ## Constraints 39 | 40 | - **Fail-fast compliance**: No defensive programming or hardcoded key checks 41 | - **Single source of truth**: Config values come from resolved configuration only 42 | - **Architectural alignment**: Follows recent configuration refactoring principles 43 | - **Backward compatibility**: Don't break existing experiment tracking 44 | 45 | ## Implementation Notes 46 | 47 | **Files to modify:** 48 | - `train.py`: Remove `get_config_overrides()`, save complete config instead 49 | 50 | **Testing approach:** 51 | - Verify no warnings during metadata saving 52 | - Confirm complete config is saved in readable format 53 | - Test with BaseTask and BlindGrasping configurations 54 | 55 | **Config serialization considerations:** 56 | - Use `OmegaConf.to_yaml()` for human-readable format 57 | - Handle any non-serializable objects gracefully 58 | - Save to dedicated file for easy inspection 59 | -------------------------------------------------------------------------------- /prompts/refactor-006-action-processing.md: -------------------------------------------------------------------------------- 1 | # refactor-006-action-processing.md 2 | 3 | Split action processing timing to align with RL rollout patterns for better clarity and coherence. 4 | 5 | ## Context 6 | 7 | The current action processing logic is bundled together in `pre_physics_step()`, which doesn't align well with standard RL rollout patterns. The refactoring will split action processing into two phases: 8 | - Pre-action computation in post_physics (step N-1) to prepare DOF targets for next step's observations 9 | - Post-action processing in pre_physics (step N) to apply policy actions 10 | 11 | This improves clarity and makes the timing more coherent with RL frameworks where observations for step N are computed in step N-1. 12 | 13 | ## Current State 14 | 15 | **Current Flow (in `pre_physics_step()`):** 16 | 1. Compute observations excluding active_rule_targets 17 | 2. Apply pre-action rule → compute active_rule_targets 18 | 3. Add active_rule_targets to observations 19 | 4. Process policy actions (action rule + post filters + coupling) 20 | 21 | **Current `post_physics_step()`:** 22 | - Only processes rewards, termination, resets 23 | - Returns already-computed observations from pre_physics_step 24 | - Comments indicate "Observations were already computed in pre_physics_step" 25 | 26 | ## Desired Outcome 27 | 28 | **New Flow:** 29 | 30 | **Post-physics (step N-1):** 31 | 1. Compute observations for step N (excluding active_rule_targets) 32 | 2. 
Apply pre-action rule using these observations → get active_rule_targets 33 | 3. Add active_rule_targets to observations 34 | 4. Return complete observations for step N 35 | 36 | **Pre-physics (step N):** 37 | - Apply policy actions only (action rule + post filters + coupling) 38 | - Skip observation computation and pre-action rule 39 | 40 | ## Constraints 41 | 42 | - Must preserve two-stage initialization pattern 43 | - Must respect component boundaries and single source of truth 44 | - Must not break existing tasks or functionality 45 | - Should maintain or improve performance 46 | 47 | ## Implementation Notes 48 | 49 | **Key Changes Required:** 50 | 1. **StepProcessor**: Move observation computation and pre-action rule to post_physics 51 | 2. **DexHandBase**: Modify pre_physics_step to only handle post-action processing 52 | 3. **ActionProcessor**: Ensure pre-action and post-action can be called separately 53 | 54 | **Component Modifications:** 55 | - `StepProcessor.process_physics_step()`: Add observation computation + pre-action 56 | - `DexHandBase.pre_physics_step()`: Remove stages 1-3, keep only stage 4 57 | - Ensure ActionProcessor can handle split pre/post action processing 58 | 59 | **Testing Approach:** 60 | - Verify identical behavior before/after refactoring 61 | - Test with existing BlindGrasping task 62 | - Validate timing and performance 63 | 64 | ## Dependencies 65 | 66 | None - this is a self-contained timing refactoring. 67 | -------------------------------------------------------------------------------- /dexhand_env/cfg/config.yaml: -------------------------------------------------------------------------------- 1 | # Main Hydra configuration for DexHand training and testing 2 | # Training: python train.py 3 | # Testing: python examples/dexhand_test.py 4 | # Override: python train.py task=BlindGrasping env.numEnvs=2048 5 | # Override: python examples/dexhand_test.py task=BlindGrasping headless=true steps=500 6 | 7 | defaults: 8 | - train: BaseTaskPPO 9 | - base/video 10 | - _self_ 11 | - task: BaseTask 12 | 13 | # Task and training configs will be loaded from task/ and train/ subdirectories 14 | # The following are runtime overrides 15 | 16 | device: "cuda:0" # Device for simulation and RL 17 | 18 | # Physics engine at root level (required by VecTask) 19 | physics_engine: physx 20 | 21 | # Simulation configuration 22 | sim: 23 | substeps: 4 24 | gravity: [0.0, 0.0, -9.81] 25 | num_client_threads: 0 26 | physx: 27 | solver_type: 1 28 | num_position_iterations: 16 29 | num_velocity_iterations: 0 30 | contact_offset: 0.001 31 | rest_offset: 0.0005 32 | bounce_threshold_velocity: 0.15 33 | max_depenetration_velocity: 0.2 34 | default_buffer_size_multiplier: 4.0 35 | num_subscenes: 0 36 | contact_collection: 1 37 | gpu_contact_pairs_per_env: 512 38 | always_use_articulations: true 39 | num_threads: 4 40 | dt: 0.005 41 | graphicsDeviceId: 0 # Graphics device ID for rendering 42 | 43 | # Environment configuration 44 | env: 45 | numEnvs: 1024 # Basic config for training, can be overridden 46 | device: "cuda:0" 47 | viewer: false # Interactive visualization window 48 | videoRecord: false # Save video files to disk 49 | videoStream: false # Stream video over network 50 | controlMode: "position" # Control mode (position/position_delta) - can be overridden by tasks 51 | clipObservations: .inf # Observation clipping limit (inf = no clipping) 52 | clipActions: .inf # Action clipping limit (inf = no clipping) 53 | 54 | 55 | # Task configuration (RL task definition only) 56 | # 
Task-specific settings inherited from task/ configs 57 | 58 | # Training configuration (algorithm and training process) 59 | train: 60 | seed: 42 61 | torchDeterministic: false 62 | maxIterations: 10000 63 | test: false 64 | checkpoint: null 65 | envName: "rlgpu_dexhand" # RL Games environment name 66 | reloadInterval: 30 # Seconds between checkpoint reloads in test mode 67 | testGamesNum: 100 # Number of games to evaluate in test mode (0 = indefinite) 68 | logging: 69 | experimentName: null # Auto-generated if null 70 | rewardLogInterval: 100 # Log reward breakdown every N finished episodes (prevents TensorBoard sampling limit) 71 | logLevel: "info" 72 | noLogFile: false 73 | experiment: 74 | maxTrainRuns: 10 # Maximum number of recent training runs to keep in workspace 75 | maxTestRuns: 10 # Maximum number of recent testing runs to keep in workspace 76 | -------------------------------------------------------------------------------- /prompts/feat-001-video-fps-control.md: -------------------------------------------------------------------------------- 1 | # feat-001-video-fps-control.md 2 | 3 | Implement automatic video FPS calculation based on simulation timing to ensure accurate playback speed. 4 | 5 | ## Context 6 | 7 | The current video recording system uses a fixed `videoFps` configuration that's independent of simulation timing, causing videos to play at incorrect speeds. This creates temporal inaccuracy where recorded videos don't match the actual simulation playback speed. 8 | 9 | **Root Cause**: VideoRecorder uses hardcoded FPS while simulation runs at a different frequency determined by `control_dt`. 10 | 11 | **Physics Relationship**: Simulation frequency = 1/control_dt, but video encoding uses unrelated config FPS. 12 | 13 | ## Current State 14 | 15 | - VideoRecorder initialized with fixed `env.videoFps` from config (default: 60.0) 16 | - All frames captured during render() are recorded without timing consideration 17 | - Video playback speed incorrect when simulation frequency ≠ configured videoFps 18 | - Example: 50Hz simulation (control_dt=0.02) + 60fps config = 1.2x speed video 19 | 20 | ## Desired Outcome 21 | 22 | - Video FPS automatically calculated as `1.0 / control_dt` for accurate real-time playback 23 | - Remove obsolete `videoFps` configuration option 24 | - Videos play back at correct simulation speed regardless of physics timing 25 | - Maintain all temporal information without frame dropping 26 | 27 | ## Constraints 28 | 29 | - **Two-Stage Initialization**: VideoRecorder created before `control_dt` is measured 30 | - **Component Architecture**: Must follow established `finalize_setup()` pattern like ActionProcessor 31 | - **Fail-Fast Philosophy**: No defensive programming - crash if VideoRecorder used before finalization 32 | - **Single Source of Truth**: FPS comes from physics timing, not config 33 | 34 | ## Implementation Notes 35 | 36 | **Architecture Pattern**: Implement deferred FPS setting following ActionProcessor model: 37 | 38 | 1. **Phase 1 (Creation)**: VideoRecorder(fps=None) before control_dt available 39 | 2. 
**Phase 2 (Finalization)**: video_recorder.finalize_fps(1.0 / control_dt) after measurement 40 | 41 | **Key Changes**: 42 | - Add `finalize_fps()` method to VideoRecorder class 43 | - Modify initialization to accept fps=None initially 44 | - Add finalization call in `_perform_control_cycle_measurement()` 45 | - Remove `videoFps` from base/video.yaml 46 | - Update create_video_recorder_from_config() to handle missing fps 47 | 48 | **Testing Approach**: 49 | - Verify different control_dt values produce correct video FPS 50 | - Test initialization order (finalize before recording) 51 | - Validate video playback speed matches simulation timing 52 | 53 | ## Dependencies 54 | 55 | - Requires understanding of two-stage initialization pattern 56 | - Must preserve existing video recording functionality 57 | - Should maintain HTTP streaming compatibility (uses separate FPS logic) 58 | -------------------------------------------------------------------------------- /prompts/feat-000-streaming-port.md: -------------------------------------------------------------------------------- 1 | # HTTP Streaming Port Management Enhancement 2 | 3 | ## Status: ✅ COMPLETED (2025-07-30) 4 | 5 | ## Original Requirements 6 | - Change the default port to an uncommon one 7 | - Automatically increment the port if it is already in use 8 | - Add a quick option to bind all interfaces, not just localhost 9 | 10 | ## Implementation Summary 11 | 12 | ### Core Features Delivered 13 | 1. **Uncommon Default Port**: Changed from conflict-prone 8080 to 58080 (~90% fewer conflicts expected) 14 | 2. **Automatic Port Resolution**: Robust port conflict detection with up to 10 retry attempts (58080 → 58081 → ...) 15 | 3. **All-Interface Binding**: Optional 0.0.0.0 binding with security warnings via `videoStreamBindAll` config option 16 | 4. **CLI Convenience**: Added `streamBindAll` alias for easy command-line usage 17 | 18 | ### Architecture Improvements 19 | - **Single Source of Truth**: Host decision made once in constructor (eliminated repeated conditionals) 20 | - **Robust Port Detection**: Pre-test port availability using socket binding before Flask server start 21 | - **Clean Configuration**: Enhanced base/video.yaml with comprehensive security documentation 22 | - **Fail-Fast Philosophy**: No defensive programming, clear error handling and logging 23 | 24 | ### Files Modified 25 | - `base/video.yaml`: Updated port to 58080, added videoStreamBindAll option with security docs 26 | - `train.py`: Added stream_bind_all mapping for configuration processing 27 | - `cli_utils.py`: Added streamBindAll CLI alias 28 | - `http_video_streamer.py`: Enhanced constructor, port auto-increment logic, statistics 29 | 30 | ### Testing Verified 31 | - ✅ Port auto-increment functionality (58080 occupied → auto-increments to 58081) 32 | - ✅ Bind-all mode (correctly binds to 0.0.0.0 with security warnings) 33 | - ✅ CLI aliases work seamlessly (`streamBindAll=true`) 34 | - ✅ Configuration loading with proper defaults 35 | - ✅ Server accessibility and enhanced statistics reporting 36 | 37 | ### Usage Examples 38 | ```bash 39 | # Default configuration (localhost:58080) 40 | python train.py env.videoStream=true 41 | 42 | # With all-interface binding 43 | python train.py env.videoStream=true streamBindAll=true 44 | 45 | # Port conflicts automatically resolved 46 | # If 58080 is occupied, automatically uses 58081, 58082, etc.
47 | ``` 48 | 49 | ### Impact 50 | - Reduces port conflicts by ~90% with uncommon default port 51 | - Automatic conflict resolution eliminates manual intervention 52 | - Flexible deployment options while maintaining security-conscious defaults 53 | - Improved user experience with clear logging and CLI shortcuts 54 | - Complete port management infrastructure with ~88 lines of focused changes 55 | 56 | ## Architecture Compliance 57 | - ✅ Fail-fast philosophy (no defensive programming) 58 | - ✅ Single source of truth for configuration 59 | - ✅ Component boundaries maintained 60 | - ✅ Clean code with minimal surface area changes 61 | -------------------------------------------------------------------------------- /dexhand_env/rl/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | RL integration for DexHand environment. 3 | 4 | This module provides integration with RL libraries like rl_games. 5 | """ 6 | 7 | from rl_games.common import env_configurations, vecenv 8 | from loguru import logger 9 | 10 | 11 | def register_rlgames_env(): 12 | """Register the DexHand environment with rl_games.""" 13 | from dexhand_env.factory import make_env 14 | 15 | def create_env(**kwargs): 16 | # Extract parameters 17 | num_envs = kwargs.pop("num_actors", 1024) 18 | sim_device = kwargs.pop("sim_device", "cuda:0") 19 | rl_device = kwargs.pop("rl_device", "cuda:0") 20 | graphics_device_id = kwargs.pop("graphics_device_id", 0) 21 | headless = kwargs.pop("headless", False) 22 | cfg = kwargs.pop("cfg", None) 23 | task_name = kwargs.pop("task_name", "BaseTask") 24 | 25 | # Create and return the environment directly 26 | return make_env( 27 | task_name=task_name, 28 | num_envs=num_envs, 29 | sim_device=sim_device, 30 | rl_device=rl_device, 31 | graphics_device_id=graphics_device_id, 32 | headless=headless, 33 | cfg=cfg, 34 | ) 35 | 36 | # Register vecenv type for DexHand 37 | # Since DexHand already implements the standard Gym interface, 38 | # we can use a simple wrapper that just passes through calls 39 | class DexHandVecEnv(vecenv.IVecEnv): 40 | def __init__(self, config_name, num_actors, **kwargs): 41 | self.env = env_configurations.configurations[config_name]["env_creator"]( 42 | **kwargs 43 | ) 44 | 45 | def step(self, actions): 46 | return self.env.step(actions) 47 | 48 | def reset(self): 49 | return self.env.reset() 50 | 51 | def get_number_of_agents(self): 52 | return 1 # Single agent per environment 53 | 54 | def get_env_info(self): 55 | return { 56 | "action_space": self.env.action_space, 57 | "observation_space": self.env.observation_space, 58 | "num_envs": self.env.num_envs, 59 | } 60 | 61 | # Register the vecenv implementation 62 | vecenv.register( 63 | "RLGPU", 64 | lambda config_name, num_actors, **kwargs: DexHandVecEnv( 65 | config_name, num_actors, **kwargs 66 | ), 67 | ) 68 | 69 | # Register with rl_games 70 | env_configurations.register( 71 | "rlgpu_dexhand", 72 | { 73 | "vecenv_type": "RLGPU", 74 | "env_creator": create_env, 75 | }, 76 | ) 77 | 78 | logger.info("DexHand environment registered with rl_games") 79 | 80 | 81 | # Import after function definitions to avoid import order issues 82 | from .reward_observer import RewardComponentObserver # noqa: E402 83 | 84 | __all__ = ["register_rlgames_env", "RewardComponentObserver"] 85 | -------------------------------------------------------------------------------- /dexhand_env/constants.py: -------------------------------------------------------------------------------- 1 | """ 2 | Central constants 
and configuration for DexHand environment. 3 | 4 | This module defines all shared constants to ensure single source of truth. 5 | """ 6 | 7 | # DOF dimensions 8 | NUM_BASE_DOFS = 6 # ARTx, ARTy, ARTz, ARRx, ARRy, ARRz 9 | NUM_ACTIVE_FINGER_DOFS = 12 # 12 finger controls mapping to 19 DOFs with coupling 10 | NUM_TOTAL_FINGER_DOFS = 20 # 5 fingers × 4 joints (including fixed joint3_1) 11 | NUM_FINGERS = 5 # Thumb, index, middle, ring, pinky 12 | 13 | # Joint names 14 | BASE_JOINT_NAMES = ["ARTx", "ARTy", "ARTz", "ARRx", "ARRy", "ARRz"] 15 | 16 | FINGER_JOINT_NAMES = [ 17 | "r_f_joint1_1", 18 | "r_f_joint1_2", 19 | "r_f_joint1_3", 20 | "r_f_joint1_4", 21 | "r_f_joint2_1", 22 | "r_f_joint2_2", 23 | "r_f_joint2_3", 24 | "r_f_joint2_4", 25 | "r_f_joint3_1", 26 | "r_f_joint3_2", 27 | "r_f_joint3_3", 28 | "r_f_joint3_4", 29 | "r_f_joint4_1", 30 | "r_f_joint4_2", 31 | "r_f_joint4_3", 32 | "r_f_joint4_4", 33 | "r_f_joint5_1", 34 | "r_f_joint5_2", 35 | "r_f_joint5_3", 36 | "r_f_joint5_4", 37 | ] 38 | 39 | # Body names for fingertips and fingerpads 40 | FINGERTIP_BODY_NAMES = [ 41 | "r_f_link1_tip", 42 | "r_f_link2_tip", 43 | "r_f_link3_tip", 44 | "r_f_link4_tip", 45 | "r_f_link5_tip", 46 | ] 47 | 48 | FINGERPAD_BODY_NAMES = [ 49 | "r_f_link1_pad", 50 | "r_f_link2_pad", 51 | "r_f_link3_pad", 52 | "r_f_link4_pad", 53 | "r_f_link5_pad", 54 | ] 55 | 56 | # Finger DOF coupling mapping (12 actions -> 19 DOFs) 57 | # Actions map to finger DOFs as follows: 58 | # 0: r_f_joint1_1 (thumb spread) 59 | # 1: r_f_joint1_2 (thumb MCP) 60 | # 2: r_f_joint1_3, r_f_joint1_4 (thumb DIP - coupled) 61 | # 3: r_f_joint2_1, r_f_joint4_1, r_f_joint5_1 (finger spread - coupled, 5_1 is 2x) 62 | # 4: r_f_joint2_2 (index MCP) 63 | # 5: r_f_joint2_3, r_f_joint2_4 (index DIP - coupled) 64 | # 6: r_f_joint3_2 (middle MCP) 65 | # 7: r_f_joint3_3, r_f_joint3_4 (middle DIP - coupled) 66 | # 8: r_f_joint4_2 (ring MCP) 67 | # 9: r_f_joint4_3, r_f_joint4_4 (ring DIP - coupled) 68 | # 10: r_f_joint5_2 (pinky MCP) 69 | # 11: r_f_joint5_3, r_f_joint5_4 (pinky DIP - coupled) 70 | # Note: r_f_joint3_1 is fixed at 0 (not controlled) 71 | FINGER_COUPLING_MAP = { 72 | 0: ["r_f_joint1_1"], # thumb spread 73 | 1: ["r_f_joint1_2"], # thumb MCP 74 | 2: ["r_f_joint1_3", "r_f_joint1_4"], # thumb DIP (coupled) 75 | 3: [ 76 | ("r_f_joint2_1", 1.0), 77 | ("r_f_joint4_1", 1.0), 78 | ("r_f_joint5_1", 2.0), 79 | ], # finger spread (5_1 is 2x) 80 | 4: ["r_f_joint2_2"], # index MCP 81 | 5: ["r_f_joint2_3", "r_f_joint2_4"], # index DIP (coupled) 82 | 6: ["r_f_joint3_2"], # middle MCP 83 | 7: ["r_f_joint3_3", "r_f_joint3_4"], # middle DIP (coupled) 84 | 8: ["r_f_joint4_2"], # ring MCP 85 | 9: ["r_f_joint4_3", "r_f_joint4_4"], # ring DIP (coupled) 86 | 10: ["r_f_joint5_2"], # pinky MCP 87 | 11: ["r_f_joint5_3", "r_f_joint5_4"], # pinky DIP (coupled) 88 | } 89 | -------------------------------------------------------------------------------- /prompts/fix-005-box-bounce-physics.md: -------------------------------------------------------------------------------- 1 | # fix-005-box-bounce-physics.md 2 | 3 | Fix box bouncing at initialization in BlindGrasping task 4 | 5 | ## Issue Description 6 | 7 | After completing refactor-005-default-values, the BlindGrasping task exhibits a consistent physics behavior change where the box bounces slightly during initialization. This did not occur before the refactor. 
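A quick reproduction probe, anticipating the investigation steps listed later in this file, might look like the sketch below. Assumptions: `make_env` and its signature follow the usage in `dexhand_env/rl/__init__.py`; the 18-dim zero action assumes the policy controls base plus fingers (6 + 12 per `constants.py`); `load_blind_grasping_cfg` and `box_root_state` are hypothetical stand-ins.

```python
import torch

from dexhand_env.factory import make_env  # same entry point used by the RL registration

for substeps in (2, 4):  # original VecTask default vs. current config.yaml value
    cfg = load_blind_grasping_cfg()  # hypothetical: load the BlindGrasping config dict
    cfg["sim"]["substeps"] = substeps
    env = make_env(
        task_name="BlindGrasping",
        num_envs=1,
        sim_device="cuda:0",
        rl_device="cuda:0",
        graphics_device_id=0,
        headless=True,
        cfg=cfg,
    )
    env.reset()
    zero_action = torch.zeros((1, 18), device="cuda:0")  # 6 base + 12 finger controls
    heights = []
    for _ in range(100):
        env.step(zero_action)
        heights.append(float(env.box_root_state[0, 2]))  # hypothetical box z accessor
    print(f"substeps={substeps}: bounce amplitude ~ {max(heights) - min(heights):.4f} m")
```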
8 | 9 | ## Symptoms 10 | 11 | - **Consistent behavior**: Box bounces every time BlindGrasping environment initializes 12 | - **Timing**: Occurs during environment startup/initialization phase 13 | - **Task-specific**: Affects BlindGrasping task (BaseTask has no box to compare) 14 | - **Physics-related**: Appears to be actual physics simulation bounce, not visual glitch 15 | 16 | ## Investigation Context 17 | 18 | The refactor-005-default-values changes removed hardcoded defaults from `.get()` patterns throughout the codebase. While most changes should be functionally equivalent (replacing hardcoded defaults with explicit config values), one of the changes may have introduced a subtle difference affecting box initialization physics. 19 | 20 | ## Potential Cause Areas 21 | 22 | Based on the refactor changes, the most likely causes are: 23 | 24 | ### 1. Physics Simulation Parameters 25 | - **VecTask substeps change**: Original default was 2, config.yaml has 4 26 | - **Physics dt**: Now uses explicit config value instead of fallback 27 | - **Client threads**: Now uses explicit config value 28 | 29 | ### 2. Box Positioning Logic 30 | - Box spawn height calculation 31 | - Reference frame changes 32 | - Table/ground plane positioning 33 | 34 | ### 3. Initialization Timing 35 | - Order of physics parameter application 36 | - Tensor initialization sequence 37 | - Reset logic timing 38 | 39 | ## Current Status 40 | 41 | - **Root cause**: Not yet identified 42 | - **Workaround**: None needed (cosmetic issue) 43 | - **Priority**: Medium (affects realism but not functionality) 44 | 45 | ## Investigation Steps 46 | 47 | 1. Compare physics parameters before/after refactor 48 | 2. Check box initial position calculation 49 | 3. Verify ground plane/table reference positioning 50 | 4. Test with different substeps values to isolate physics parameter effects 51 | 5. Check initialization order and timing 52 | 53 | ## Success Criteria 54 | 55 | - Box initializes without bouncing 56 | - Physics behavior matches pre-refactor baseline 57 | - No regression in other physics aspects 58 | 59 | ## Configuration Context 60 | 61 | **Box Configuration (BlindGrasping.yaml):** 62 | ```yaml 63 | env: 64 | box: 65 | size: 0.05 # 5cm cube 66 | mass: 0.1 # 100g 67 | initial_position: 68 | z: 0.025 # Should place box on table surface (half-height) 69 | ``` 70 | 71 | **Ground plane**: z = 0 (unchanged) 72 | **Expected result**: Box should rest stable on table without bouncing 73 | 74 | ## Notes 75 | 76 | - Issue is reproducible but cosmetic (doesn't break functionality) 77 | - Physics behavior appears more realistic (bouncing may be correct physics) 78 | - Original behavior may have been artificially stable due to physics parameter differences 79 | -------------------------------------------------------------------------------- /dexhand_env/cfg/physics/README.md: -------------------------------------------------------------------------------- 1 | # Physics Configuration System 2 | 3 | This directory contains modular physics configurations that can be mixed and matched with different tasks for optimal performance vs. quality trade-offs. 
4 | 5 | ## Configuration Files 6 | 7 | ### `default.yaml` 8 | - **Purpose**: Balanced quality/performance for standard development 9 | - **Settings**: 4 substeps, 16 position iterations, 0.001 contact_offset 10 | - **Use case**: BaseTask and general development work 11 | - **Performance**: Standard baseline 12 | 13 | ### `fast.yaml` 14 | - **Purpose**: Optimized for real-time visualization and testing 15 | - **Settings**: 2 substeps, 8 position iterations, 0.002 contact_offset 16 | - **Use case**: test_viewer, test_stream configs for smooth visualization 17 | - **Performance**: ~2-3x faster than default 18 | - **Trade-off**: Slightly reduced physics accuracy for speed 19 | 20 | ### `accurate.yaml` 21 | - **Purpose**: Maximum precision for training and research 22 | - **Settings**: 16 substeps, 32 position iterations, 0.001 contact_offset 23 | - **Use case**: Training and research requiring high precision 24 | - **Performance**: ~2-3x slower than default 25 | - **Trade-off**: Highest physics quality for computational cost 26 | 27 | ## Usage 28 | 29 | ### In Task Configurations 30 | Add physics config to task defaults: 31 | ```yaml 32 | defaults: 33 | - BaseTask 34 | - /physics/accurate # Override physics settings 35 | - _self_ 36 | ``` 37 | 38 | ### In Test Configurations 39 | Add physics config to test defaults: 40 | ```yaml 41 | defaults: 42 | - config 43 | - base/test 44 | - /physics/fast # Fast physics for smooth rendering 45 | - _self_ 46 | ``` 47 | 48 | ## Parameter Comparison 49 | 50 | | Parameter | fast | default | accurate | 51 | |-----------|------|---------|----------| 52 | | **substeps** | 2 | 4 | 16 | 53 | | **num_position_iterations** | 8 | 16 | 32 | 54 | | **contact_offset** | 0.002 | 0.001 | 0.001 | 55 | | **gpu_contact_pairs_per_env** | 256 | 512 | 512 | 56 | | **default_buffer_size_multiplier** | 2.0 | 4.0 | 4.0 | 57 | 58 | ## Performance Impact 59 | 60 | - **fast → default**: ~50% performance cost for better accuracy 61 | - **default → accurate**: ~100% performance cost for maximum precision 62 | - **fast → accurate**: ~300% performance cost for maximum quality 63 | 64 | ## Design Principles 65 | 66 | 1. **Task-specific dt**: Each task controls its own timestep (`dt`) for RL environment consistency 67 | 2. **Physics-specific substeps**: Physics configs control simulation parameters only 68 | 3. **Modular inheritance**: Mix and match physics profiles with any task 69 | 4. **Clear separation**: Physics simulation vs. RL environment timing are independent 70 | 71 | ## Migration from Monolithic Configs 72 | 73 | Before: 74 | ```yaml 75 | # MyTask.yaml 76 | sim: 77 | dt: 0.01 78 | substeps: 16 79 | physx: 80 | num_position_iterations: 32 81 | # ... other physics params 82 | ``` 83 | 84 | After: 85 | ```yaml 86 | # Example: Custom task with accurate physics 87 | defaults: 88 | - BaseTask 89 | - /physics/accurate # Inherits all physics params 90 | - _self_ 91 | 92 | sim: 93 | dt: 0.01 # Only RL timing control 94 | ``` 95 | 96 | This provides clean separation and reusable physics profiles across different tasks and test scenarios. 97 | -------------------------------------------------------------------------------- /dexhand_env/components/action/scaling.py: -------------------------------------------------------------------------------- 1 | """ 2 | Action scaling utilities for DexHand environment. 3 | 4 | This module provides general mathematical scaling utilities for action processing. 5 | Contains no task-specific logic - pure mathematical operations for scaling and clamping. 
6 | """ 7 | 8 | import torch 9 | 10 | 11 | class ActionScaling: 12 | """ 13 | Provides general mathematical utilities for action scaling. 14 | 15 | This component provides pure mathematical functions: 16 | - Scale actions from [-1, 1] to target ranges 17 | - Apply velocity-based deltas 18 | - Clamp values to limits 19 | 20 | No task-specific logic or conditional behavior. 21 | """ 22 | 23 | def __init__(self, parent): 24 | """Initialize the action scaling utilities.""" 25 | self.parent = parent 26 | 27 | @staticmethod 28 | def scale_to_limits( 29 | actions: torch.Tensor, lower_limits: torch.Tensor, upper_limits: torch.Tensor 30 | ) -> torch.Tensor: 31 | """ 32 | Scale actions from [-1, 1] to specified limits. 33 | 34 | Args: 35 | actions: Raw actions in [-1, 1] 36 | lower_limits: Lower limits for scaling 37 | upper_limits: Upper limits for scaling 38 | 39 | Returns: 40 | Scaled actions in limit ranges 41 | """ 42 | # Map from [-1, 1] to [lower, upper] 43 | # action = -1 → lower limit 44 | # action = +1 → upper limit 45 | return (actions + 1.0) * 0.5 * (upper_limits - lower_limits) + lower_limits 46 | 47 | @staticmethod 48 | def apply_velocity_deltas( 49 | prev_targets: torch.Tensor, actions: torch.Tensor, max_deltas: torch.Tensor 50 | ) -> torch.Tensor: 51 | """ 52 | Apply velocity-scaled deltas to previous targets. 53 | 54 | Args: 55 | prev_targets: Previous target positions 56 | actions: Raw actions in [-1, 1] 57 | max_deltas: Maximum allowed deltas per timestep 58 | 59 | Returns: 60 | New targets with applied deltas 61 | """ 62 | deltas = actions * max_deltas 63 | return prev_targets + deltas 64 | 65 | @staticmethod 66 | def clamp_to_limits( 67 | targets: torch.Tensor, lower_limits: torch.Tensor, upper_limits: torch.Tensor 68 | ) -> torch.Tensor: 69 | """ 70 | Clamp targets to specified limits. 71 | 72 | Args: 73 | targets: Target values to clamp 74 | lower_limits: Lower limits 75 | upper_limits: Upper limits 76 | 77 | Returns: 78 | Clamped targets 79 | """ 80 | return torch.clamp(targets, lower_limits, upper_limits) 81 | 82 | @staticmethod 83 | def apply_velocity_clamp( 84 | new_targets: torch.Tensor, prev_targets: torch.Tensor, max_deltas: torch.Tensor 85 | ) -> torch.Tensor: 86 | """ 87 | Clamp target changes to respect velocity limits. 88 | 89 | Args: 90 | new_targets: Proposed new targets 91 | prev_targets: Previous targets 92 | max_deltas: Maximum allowed change per timestep 93 | 94 | Returns: 95 | Velocity-clamped targets 96 | """ 97 | delta = new_targets - prev_targets 98 | clamped_delta = torch.clamp(delta, -max_deltas, max_deltas) 99 | return prev_targets + clamped_delta 100 | -------------------------------------------------------------------------------- /prompts/refactor-009-config-yaml.md: -------------------------------------------------------------------------------- 1 | # Configuration Architecture Cleanup 2 | 3 | ## Problem Statement 4 | 5 | `config.yaml` is serving two different purposes and becoming unwieldy: 6 | 7 | 1. **Primary purpose**: Training pipeline configuration for `train.py` 8 | 2. **Secondary purpose**: Test script configuration for `examples/dexhand_test.py` 9 | 10 | The test script settings (lines 16-23: `steps`, `sleep`, `debug`, `log_level`, `enablePlotting`, `plotEnvIdx`) are mixed into the main configuration, creating separation of concerns issues. 11 | 12 | Additionally, `debug.yaml` has naming inconsistency, using `training:` section instead of `train:` which conflicts with the main config structure. 
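A minimal illustration of that failure mode (section names taken from above; the values are made up):

```python
from omegaconf import OmegaConf

base = OmegaConf.create({"train": {"maxIterations": 10000}})
debug_wrong = OmegaConf.create({"training": {"maxIterations": 50}})  # current debug.yaml naming
debug_fixed = OmegaConf.create({"train": {"maxIterations": 50}})     # corrected section name

print(OmegaConf.merge(base, debug_wrong).train.maxIterations)  # 10000 - override silently ignored
print(OmegaConf.merge(base, debug_fixed).train.maxIterations)  # 50 - override takes effect
```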
13 | 14 | ## Configuration Analysis 15 | 16 | **Three distinct "test" concepts identified:** 17 | 18 | 1. **Test Script** (`examples/dexhand_test.py`): Environment functional testing 19 | - Uses `test_render.yaml` currently 20 | - Settings: `steps`, `sleep`, `debug`, `log_level`, `enablePlotting`, `plotEnvIdx` 21 | - Purpose: Validate environment implementation 22 | 23 | 2. **Policy Testing** (`base/test.yaml`): Evaluation of trained RL policies 24 | - Settings: `env.numEnvs`, `train.test`, `train.maxIterations` 25 | - Purpose: Evaluate trained policies 26 | 27 | 3. **Policy Testing with Viewer** (`test_render.yaml`): Policy evaluation with visualization 28 | - Inherits from `base/test.yaml` + enables `env.viewer: true` 29 | - Purpose: Visual policy evaluation 30 | 31 | **Naming Issue**: `test_render.yaml` uses deprecated "render" terminology. Per refactor-004-render.md, "viewer" is the correct semantic term. 32 | 33 | ## Solution 34 | 35 | ### 1. Create Dedicated Test Script Configuration 36 | Create `test_script.yaml` in `dexhand_env/cfg/` with: 37 | - Inherits from main `config.yaml` via Hydra defaults 38 | - Contains only test script specific settings: `steps`, `sleep`, `debug`, `log_level`, `enablePlotting`, `plotEnvIdx` 39 | - Removes clutter from main training configuration 40 | 41 | ### 2. Clean Main Configuration 42 | Remove test script settings from `config.yaml` (lines 16-23) to focus on training pipeline. 43 | 44 | ### 3. Clean Base Test Configuration 45 | Remove duplicate test script settings from `base/test.yaml` (lines 5-13), keep only policy evaluation settings. 46 | 47 | ### 4. Update Test Script 48 | Modify `examples/dexhand_test.py` to use dedicated `test_script.yaml` configuration. 49 | 50 | ### 5. Fix Debug Configuration 51 | Correct naming inconsistency in `debug.yaml` from `training:` to `train:`. 52 | 53 | ### 6. Rename for Semantic Clarity 54 | Rename `test_render.yaml` → `test_viewer.yaml` to follow new naming conventions from refactor-004-render.md. 55 | 56 | ### 7. Update Documentation References 57 | Update all documentation files that reference `test_render.yaml` to use `test_viewer.yaml`: 58 | - Search codebase for `test_render` references in documentation 59 | - Update CLI examples, usage instructions, and configuration guides 60 | - Ensure consistency with new naming conventions 61 | 62 | ## Expected Outcome 63 | 64 | - Clear separation of concerns between three test types 65 | - `config.yaml` focused purely on training pipeline 66 | - `test_script.yaml` focused on environment functional testing 67 | - `test_viewer.yaml` focused on policy evaluation with visualization 68 | - `base/test.yaml` focused on common policy evaluation settings 69 | - Consistent naming conventions throughout 70 | - All documentation updated to reflect new file names 71 | - No functionality changes 72 | -------------------------------------------------------------------------------- /examples/README.md: -------------------------------------------------------------------------------- 1 | # DexHand Examples 2 | 3 | This directory contains example scripts for testing and demonstrating the DexHand environment. 4 | 5 | ## dexhand_test.py 6 | 7 | Comprehensive test script for the DexHand environment using BaseTask only. 
8 | 9 | ### Usage 10 | 11 | The test script is hardcoded to use BaseTask and supports Hydra configuration overrides: 12 | 13 | ```bash 14 | # Basic test (BaseTask only) 15 | python examples/dexhand_test.py 16 | 17 | # Headless test with custom parameters 18 | python examples/dexhand_test.py headless=true steps=100 env.numEnvs=16 19 | 20 | # Test with different control modes 21 | python examples/dexhand_test.py env.controlMode=position_delta env.policyControlsHandBase=false 22 | 23 | # Debug mode with verbose logging 24 | python examples/dexhand_test.py debug=true log_level=debug 25 | 26 | # Enable video recording and plotting 27 | python examples/dexhand_test.py env.videoRecord=true enablePlotting=true 28 | ``` 29 | 30 | ### Configuration Parameters 31 | 32 | All configuration parameters can be overridden via command line: 33 | 34 | **Test Settings:** 35 | - `steps` (1200) - Total number of test steps to run 36 | - `sleep` (0.01) - Sleep time between steps in seconds 37 | - `device` ("cuda:0") - Device for simulation and RL 38 | - `headless` (false) - Run without GUI visualization 39 | - `debug` (false) - Enable debug output and additional logging 40 | - `log_level` ("info") - Logging level (debug/info/warning/error) 41 | 42 | **Environment Settings:** 43 | - `env.numEnvs` (1024) - Number of parallel environments 44 | - `env.controlMode` ("position") - Control mode (position/position_delta) 45 | - `env.policyControlsHandBase` (true) - Include hand base in policy action space 46 | - `env.policyControlsFingers` (true) - Include fingers in policy action space 47 | 48 | **Recording & Visualization:** 49 | - `env.videoRecord` (false) - Enable video recording (works in headless mode) 50 | - `enablePlotting` (false) - Enable real-time plotting with Rerun 51 | - `plotEnvIdx` (0) - Environment index to plot 52 | 53 | **Task Selection:** 54 | - Script is hardcoded to use BaseTask only (basic task with contact test boxes) 55 | 56 | ### Key Features 57 | 58 | 1. **BaseTask Focus**: Hardcoded to BaseTask for reliable and consistent testing of core functionality 59 | 60 | 2. **Action Verification**: Tests all DOF mappings with sequential action patterns to verify control modes 61 | 62 | 3. **Performance Profiling**: Optional timing analysis for step processing components 63 | 64 | 4. **Real-time Plotting**: Integration with Rerun for visualization (when available) 65 | 66 | 5. **Video Recording**: Supports video capture in both windowed and headless modes 67 | 68 | 6. **Comprehensive Logging**: Detailed information about environment setup, action mappings, and system state 69 | 70 | ### Keyboard Controls (Non-headless mode) 71 | 72 | - `SPACE` - Toggle random actions mode 73 | - `E` - Reset current environment 74 | - `G` - Toggle between single robot and global view 75 | - `UP/DOWN` - Navigate between robots 76 | - `ENTER` - Toggle camera view mode 77 | - `C` - Toggle contact force visualization 78 | 79 | ### Configuration System 80 | 81 | The test script inherits configuration from the main Hydra config system: 82 | 83 | - Base configuration: `dexhand_env/cfg/config.yaml` 84 | - Task configuration: `dexhand_env/cfg/task/BaseTask.yaml` (hardcoded) 85 | - Physics configurations: `dexhand_env/cfg/physics/default.yaml` 86 | 87 | For testing other tasks, use the training script (`train.py`) instead. 
88 | -------------------------------------------------------------------------------- /dexhand_env/utils/config_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | Essential configuration utilities for DexHand. 3 | 4 | Provides minimal, fail-fast config validation and helper functions. 5 | """ 6 | 7 | import yaml 8 | from omegaconf import DictConfig, OmegaConf 9 | from pathlib import Path 10 | from typing import Dict, Any, Optional, Union 11 | from loguru import logger 12 | 13 | 14 | def validate_config(cfg: DictConfig) -> None: 15 | """ 16 | Validate configuration with fail-fast approach. 17 | 18 | Args: 19 | cfg: Configuration to validate 20 | 21 | Raises: 22 | AttributeError: If required fields are missing (fail fast) 23 | ValueError: If critical values are invalid 24 | """ 25 | # Required fields - let AttributeError crash if missing (fail fast) 26 | cfg.task.name 27 | cfg.env.numEnvs 28 | cfg.env.device 29 | cfg.train.seed 30 | cfg.train.test 31 | 32 | # Basic sanity checks - crash on obviously bad values 33 | if cfg.env.numEnvs <= 0: 34 | raise ValueError(f"numEnvs must be positive, got {cfg.env.numEnvs}") 35 | 36 | if cfg.env.device not in ["cuda:0", "cpu"]: 37 | raise ValueError(f"device must be 'cuda:0' or 'cpu', got {cfg.env.device}") 38 | 39 | 40 | def validate_checkpoint_exists(checkpoint_path: Optional[str]) -> bool: 41 | """Validate that checkpoint file exists if specified.""" 42 | if checkpoint_path is None or checkpoint_path == "null": 43 | return True 44 | 45 | checkpoint_file = Path(checkpoint_path) 46 | if not checkpoint_file.exists(): 47 | logger.error(f"Checkpoint file does not exist: {checkpoint_path}") 48 | return False 49 | 50 | return True 51 | 52 | 53 | def get_experiment_name(cfg: DictConfig, timestamp: str) -> str: 54 | """Generate experiment name from config and timestamp.""" 55 | # Check if experimentName is explicitly set (optional field) 56 | try: 57 | experiment_name = cfg.train.logging.experimentName 58 | if experiment_name is not None: 59 | return experiment_name 60 | except (AttributeError, KeyError): 61 | pass # logging section or experimentName not present 62 | 63 | # Simple naming: task + mode + timestamp 64 | mode = "test" if cfg.train.test else "train" 65 | return f"{cfg.task.name}_{mode}_{timestamp}" 66 | 67 | 68 | def resolve_config_safely(cfg: DictConfig) -> Dict[str, Any]: 69 | """Safely resolve configuration with better error handling.""" 70 | try: 71 | return OmegaConf.to_container(cfg, resolve=True) 72 | except Exception as e: 73 | error_msg = str(e) 74 | if "InterpolationKeyError" in error_msg and "not found" in error_msg: 75 | missing_key = error_msg.split("'")[1] if "'" in error_msg else "unknown" 76 | raise ValueError(f"Config key '{missing_key}' not found") from e 77 | raise ValueError(f"Config resolution failed: {error_msg}") from e 78 | 79 | 80 | def save_config(cfg: DictConfig, output_dir: Path) -> None: 81 | """Save configuration to output directory.""" 82 | with open(output_dir / "config.yaml", "w") as f: 83 | OmegaConf.save(cfg, f) 84 | 85 | 86 | def load_config(config_path: Union[str, Path]) -> Dict[str, Any]: 87 | """Load configuration from YAML file.""" 88 | config_path = Path(config_path) 89 | 90 | if not config_path.exists(): 91 | raise FileNotFoundError(f"Configuration file not found: {config_path}") 92 | 93 | with open(config_path, "r", encoding="utf-8") as f: 94 | return yaml.safe_load(f) 95 | -------------------------------------------------------------------------------- 
/dexhand_env/README.md: -------------------------------------------------------------------------------- 1 | # DexHandEnv: Dexterous Manipulation Environment 2 | 3 | This package provides a reinforcement learning environment for dexterous manipulation tasks with robotic hands, built on top of NVIDIA's IsaacGym. 4 | 5 | ## Overview 6 | 7 | DexHandEnv is a modular, component-based framework for dexterous manipulation research. It provides: 8 | 9 | - A unified framework for creating dexterous manipulation tasks 10 | - A component-based architecture for better code organization and reusability 11 | - A configurable environment with various observation and action spaces 12 | - Pre-built tasks like grasping and reorientation 13 | 14 | ## Installation 15 | 16 | ```bash 17 | # Clone the repository 18 | git clone https://github.com/dexrobot/dexrobot_isaac.git 19 | cd dexrobot_isaac 20 | 21 | # Install the package 22 | pip install -e . 23 | ``` 24 | 25 | ## Usage 26 | 27 | ### Running Example Tasks 28 | 29 | ```bash 30 | # Run the DexGrasp task 31 | python examples/run_dex_grasp.py 32 | 33 | # Run headless 34 | python examples/run_dex_grasp.py --headless 35 | 36 | # Specify number of environments 37 | python examples/run_dex_grasp.py --num_envs 4 38 | ``` 39 | 40 | ### Creating Custom Tasks 41 | 42 | To create a custom task, extend the `BaseTask` class: 43 | 44 | ```python 45 | from dexhand_env.tasks.base_task import BaseTask 46 | 47 | class MyCustomTask(BaseTask): 48 | def __init__(self, sim, gym, device, num_envs, cfg): 49 | super().__init__(sim, gym, device, num_envs, cfg) 50 | # Initialize task-specific parameters 51 | 52 | def compute_task_reward_terms(self, obs_dict): 53 | # Compute task-specific rewards 54 | return {"my_reward": torch.ones(self.num_envs, device=self.device)} 55 | 56 | # Implement other required methods... 57 | ``` 58 | 59 | Then register your task in the factory: 60 | 61 | ```python 62 | # In factory.py 63 | from custom_tasks import MyCustomTask 64 | 65 | def create_dex_env(...): 66 | # ... 67 | elif task_name == "MyCustomTask": 68 | task = MyCustomTask(None, None, torch.device(sim_device), cfg["env"]["numEnvs"], cfg) 69 | # ... 70 | ``` 71 | 72 | ## Architecture 73 | 74 | The DexHandEnv package uses a component-based architecture: 75 | 76 | - `DexHandBase`: Main environment class that implements common functionality 77 | - `DexTask`: Interface for task-specific behavior 78 | - Components: 79 | - `CameraController`: Handles camera control and keyboard shortcuts 80 | - `FingertipVisualizer`: Visualizes fingertip contacts with color 81 | - `SuccessFailureTracker`: Tracks success and failure criteria 82 | - `RewardCalculator`: Calculates rewards 83 | 84 | ## Configuration 85 | 86 | The environment is configured using a dictionary-like structure: 87 | 88 | ```python 89 | cfg = { 90 | "env": { 91 | "numEnvs": 2, 92 | "episodeLength": 1000, 93 | # ... more environment parameters 94 | }, 95 | "sim": { 96 | "dt": 0.01, 97 | "substeps": 2, 98 | "gravity": [0.0, 0.0, -9.81], 99 | # ... more simulation parameters 100 | }, 101 | "task": { 102 | # Task-specific parameters 103 | }, 104 | "reward": { 105 | # Reward-specific parameters 106 | } 107 | } 108 | ``` 109 | 110 | ## License 111 | 112 | See the LICENSE file for licensing information. 113 | 114 | ## Acknowledgements 115 | 116 | This package is developed by DexRobot Inc. It builds upon NVIDIA's IsaacGym and leverages ideas from various reinforcement learning frameworks. 
117 | -------------------------------------------------------------------------------- /prompts/refactor-002-graphics-manager-in-parent.md: -------------------------------------------------------------------------------- 1 | # refactor-002-graphics-manager-in-parent.md 2 | 3 | Align GraphicsManager ecosystem with established component architecture patterns. 4 | 5 | ## Context 6 | 7 | The DexRobot Isaac project follows a strict component architecture pattern where components access sibling components through parent references and property decorators, maintaining single source of truth principles. This pattern ensures: 8 | 9 | - Clean separation of concerns 10 | - Fail-fast behavior when dependencies are missing 11 | - Consistent initialization order through two-stage pattern 12 | - Reduced coupling between components 13 | 14 | From CLAUDE.md architectural guidelines: 15 | - Components should only take `parent` in constructor 16 | - Use `@property` decorators to access sibling components via parent 17 | - Never store direct references to sibling components 18 | 19 | ## Current State 20 | 21 | **✅ GraphicsManager correctly follows the pattern:** 22 | ```python 23 | class GraphicsManager: 24 | def __init__(self, parent): # ✅ Only parent reference 25 | self.parent = parent 26 | 27 | @property 28 | def device(self): # ✅ Property decorator for parent access 29 | return self.parent.device 30 | ``` 31 | 32 | **❌ VideoManager violates the pattern:** 33 | ```python 34 | class VideoManager: 35 | def __init__(self, parent, graphics_manager): # ❌ Direct sibling reference 36 | self.parent = parent 37 | self.graphics_manager = graphics_manager # ❌ Stored direct reference 38 | ``` 39 | 40 | **❌ ViewerController violates the pattern:** 41 | ```python 42 | class ViewerController: 43 | def __init__(self, parent, gym, sim, env_handles, headless, graphics_manager): # ❌ Multiple direct references 44 | # ... stores direct references instead of using parent 45 | ``` 46 | 47 | ## Desired Outcome 48 | 49 | All graphics-related components follow the established architectural pattern: 50 | 51 | 1. **VideoManager** - Access graphics_manager via property decorator 52 | 2. **ViewerController** - Access all dependencies via parent/property decorators 53 | 3. **DexHandBase** - Update instantiation calls to only pass parent references 54 | 55 | This creates consistent, maintainable architecture aligned with other components like ActionProcessor, RewardCalculator, etc. 
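Concretely, the target shape is something like the following sketch (mirroring the GraphicsManager example above; not the final code):

```python
class VideoManager:
    def __init__(self, parent):  # parent only - no direct sibling references
        self.parent = parent

    @property
    def graphics_manager(self):
        # Resolved through the parent at access time; fails fast if absent
        return self.parent.graphics_manager


class ViewerController:
    def __init__(self, parent):  # gym, sim, env_handles no longer passed in
        self.parent = parent

    @property
    def gym(self):
        return self.parent.gym

    @property
    def sim(self):
        return self.parent.sim

    @property
    def graphics_manager(self):
        return self.parent.graphics_manager
```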
56 | 57 | ## Constraints 58 | 59 | - **Maintain exact functionality** - No behavioral changes, only architectural alignment 60 | - **Respect two-stage initialization** - Components may need finalize_setup() if they depend on control_dt 61 | - **Follow fail-fast philosophy** - Let dependencies crash if parent/sibling is None 62 | - **Single source of truth** - Parent holds canonical references, components access via properties 63 | 64 | ## Implementation Notes 65 | 66 | **VideoManager refactoring:** 67 | - Remove `graphics_manager` parameter from constructor 68 | - Add `@property def graphics_manager(self): return self.parent.graphics_manager` 69 | - Update instantiation in DexHandBase 70 | 71 | **ViewerController refactoring:** 72 | - Remove direct dependency parameters (gym, sim, env_handles, graphics_manager) 73 | - Add property decorators for accessing these via parent 74 | - May need property decorators for gym, sim, env_handles if not already available on parent 75 | 76 | **Testing approach:** 77 | - Use existing test command: `python train.py config=test_stream render=true task=BlindGrasping device=cuda:0 checkpoint=runs/BlindGrasping_train_20250724_120120/nn/BlindGrasping.pth numEnvs=1` 78 | - Verify video recording and viewer functionality unchanged 79 | - Test both headless and viewer modes 80 | 81 | ## Dependencies 82 | 83 | None - this is a pure architectural refactoring that doesn't affect external interfaces. 84 | -------------------------------------------------------------------------------- /dexhand_env/tasks/README.md: -------------------------------------------------------------------------------- 1 | # DexHand Tasks 2 | 3 | This directory contains the implementation of the dexterous hand tasks using the new component-based architecture. 4 | 5 | ## Task Structure 6 | 7 | The tasks follow a component-based architecture: 8 | 9 | - `dexhand_base.py`: Base class for all dexterous hand tasks 10 | - `task_interface.py`: Interface that all task implementations must implement 11 | - `base/vec_task.py`: Base class for vector tasks (inherited by DexHandBase) 12 | 13 | ## Key Features 14 | 15 | ### Auto-detection of Physics Steps Per Control Step 16 | 17 | The environment now automatically detects how many physics steps are required between each control step based on reset stability requirements. This eliminates the need for manually configuring `controlFrequencyInv` and ensures that the environment is stable regardless of the physics setup. 18 | 19 | Usage: 20 | ```python 21 | # The environment will automatically detect the number of physics steps per control step 22 | env = create_dex_env(...) 23 | ``` 24 | 25 | ### Configurable Action Space 26 | 27 | The action space is now configurable, allowing you to specify which DOFs are controlled by the policy: 28 | 29 | - `controlHandBase`: Whether the policy controls the hand base (6 DOFs) 30 | - `controlFingers`: Whether the policy controls the finger joints (12 DOFs) 31 | 32 | For DOFs not controlled by the policy, you can specify default targets or implement a custom control policy in the task. 
33 | 34 | Example configuration: 35 | ```yaml 36 | env: 37 | controlHandBase: false # Base not controlled by policy 38 | controlFingers: true # Fingers controlled by policy 39 | defaultBaseTargets: [0.0, 0.0, 0.5, 0.0, 0.0, 0.0] # Default targets for base 40 | ``` 41 | 42 | For task-specific control of uncontrolled DOFs, implement the `get_task_dof_targets` method in your task class: 43 | 44 | ```python 45 | def get_task_dof_targets(self, num_envs, device, base_controlled, fingers_controlled): 46 | # Return targets for DOFs not controlled by the policy 47 | targets = {} 48 | 49 | if not base_controlled: 50 | # Example: Make the base follow a circular trajectory 51 | # Use episode time for smooth trajectory (assumes control_dt is available) 52 | # This creates a full circle every ~6.28 seconds 53 | episode_time = self.episode_step_count.float() * self.control_dt 54 | base_targets = torch.zeros((num_envs, 6), device=device) 55 | base_targets[:, 0] = 0.3 * torch.sin(episode_time) # x position 56 | base_targets[:, 1] = 0.3 * torch.cos(episode_time) # y position 57 | base_targets[:, 2] = 0.5 # z position (fixed height) 58 | targets["base_targets"] = base_targets 59 | 60 | if not fingers_controlled and hasattr(self, 'object_pos'): 61 | # Example: Make fingers dynamically respond to object position 62 | finger_targets = self._compute_grasp_targets(self.object_pos) 63 | targets["finger_targets"] = finger_targets 64 | 65 | return targets 66 | ``` 67 | 68 | This allows for complex scenarios such as: 69 | - **Dynamic trajectories**: The base or fingers can follow time-varying trajectories 70 | - **State-dependent control**: Targets can depend on the state of the environment (e.g., object positions) 71 | - **Task-phase control**: Different control strategies can be used during different phases of a task 72 | - **Hybrid control**: Some DOFs can be controlled by the policy while others follow programmed behaviors 73 | 74 | The task has complete control over what targets are returned, and can implement arbitrarily complex control laws. If the task returns `None` or omits a key from the targets dictionary, the environment will use the default targets specified in the configuration. 75 | -------------------------------------------------------------------------------- /prompts/fix-007-episode-length-of-grasping.md: -------------------------------------------------------------------------------- 1 | # fix-007-episode-length-of-grasping.md 2 | 3 | Fix BlindGrasping task episode length inconsistencies and configuration inheritance issues. 4 | 5 | ## Context 6 | 7 | BlindGrasping task episodes are terminating at inconsistent step counts (399-499 steps) instead of the expected 500 steps, and the physics timing configuration is not being applied correctly due to fundamental configuration inheritance architecture problems. 8 | 9 | **Observed Symptoms:** 10 | - Episodes ending at 399-499 steps instead of 500 11 | - Physics running at 200Hz (dt=0.005) instead of expected 100Hz (dt=0.01) 12 | - Control cycle requiring 2 physics steps instead of 1 13 | - "Early failure seems not enforced" behavior 14 | 15 | **Root Cause Analysis:** 16 | The configuration inheritance order is architecturally wrong. Main config.yaml has `_self_` positioned last in defaults, causing its `sim.dt: 0.005` to override task-specific settings like BlindGrasping's `sim.dt: 0.01`. 17 | 18 | ## Current State 19 | 20 | **Broken Configuration Hierarchy:** 21 | 1. Main config.yaml loads task (BlindGrasping: dt=0.01) 22 | 2. 
Main config.yaml applies `_self_` LAST (dt=0.005 overrides task) 23 | 3. Result: Wrong physics timing affects episode behavior 24 | 25 | **Evidence:** 26 | ``` 27 | physics_dt: 0.005000s # Should be 0.01 for BlindGrasping 28 | physics_steps_per_control: 2 # Should be 1 with correct dt 29 | control_dt: 0.010000s # Correct result achieved wrong way 30 | ``` 31 | 32 | ## Desired Outcome 33 | 34 | **Correct Configuration Architecture:** 35 | - Task-specific settings ALWAYS override base/global settings 36 | - BlindGrasping runs with dt=0.01 (100Hz physics, 1 physics step per control) 37 | - BaseTask continues with dt=0.005 (200Hz physics) 38 | - Episodes run for full 500 steps when no early termination criteria are met 39 | 40 | **Physics Timing Goals:** 41 | - BlindGrasping: 100Hz physics (dt=0.01) with 1 physics step per control 42 | - BaseTask: 200Hz physics (dt=0.005) with 2 physics steps per control 43 | - Other tasks: Can specify their own optimal dt values 44 | 45 | ## Constraints 46 | 47 | **Architectural Principles:** 48 | - Follow fail-fast philosophy: task configs should not need defensive checks 49 | - Maintain component responsibility separation 50 | - Respect configuration inheritance: specialized overrides general 51 | - No breaking changes to existing BaseTask behavior 52 | 53 | **Configuration Design Rules:** 54 | - Main config.yaml: Only global defaults that apply to ALL tasks 55 | - Task configs: Task-specific overrides that should never be overridden 56 | - CLI overrides: Should work as expected for task-specific parameters 57 | 58 | ## Implementation Notes 59 | 60 | **Primary Fix:** 61 | 1. Fix Hydra inheritance order so task-specific configs properly override base config 62 | 2. Investigate moving `_self_` position in main config.yaml defaults list 63 | 3. Ensure BlindGrasping.yaml's `sim.dt: 0.01` overrides config.yaml's `sim.dt: 0.005` 64 | 4. Maintain base config.yaml as source of global defaults that tasks can override 65 | 66 | **Testing Requirements:** 67 | - Verify BlindGrasping shows `physics_dt: 0.010000s` and `physics_steps_per_control: 1` 68 | - Verify BaseTask shows `physics_dt: 0.005000s` and `physics_steps_per_control: 2` 69 | - Test episode lengths reach full 500 steps when no early termination occurs 70 | - Validate termination criteria work correctly with proper timing 71 | 72 | **Secondary Investigation:** 73 | - Check if other parameters in main config.yaml have same inheritance problem 74 | - Investigate if episode termination issues persist after physics timing fix 75 | - Analyze whether early termination criteria need adjustment for new timing 76 | 77 | ## Dependencies 78 | 79 | - Configuration system understanding (Hydra inheritance order) 80 | - Physics timing measurement system (control_dt calculation) 81 | - Episode termination logic in BlindGrasping task 82 | -------------------------------------------------------------------------------- /prompts/fix-003-max-iterations.md: -------------------------------------------------------------------------------- 1 | # fix-003-max-iterations.md 2 | 3 | Fix maxIterations config override and train.py cleanup 4 | 5 | ## Context 6 | 7 | The maxIterations configuration system has several issues that violate the fail-fast philosophy: 8 | 1. **Hardcoded defaults**: `get_config_overrides()` in train.py has brittle hardcoded checks like `if cfg.train.maxIterations != 10000` 9 | 2. **Missing shorthand alias**: No simple `maxIterations` alias (requires full `train.maxIterations`) 10 | 3. 
**Test mode doesn't respect maxIterations**: `python train.py train.test=true train.maxIterations=500` has no effect in test mode 11 | 4. **Defensive programming**: train.py contains hardcoded fallbacks that should be eliminated 12 | 5. **Configuration structure inconsistency**: train_headless.yaml uses wrong section name 13 | 14 | **Note**: CLI overrides like `python train.py train.maxIterations=5000` DO work correctly for training mode. The interpolation in BaseTaskPPO.yaml works as expected. 15 | 16 | ## Current State 17 | 18 | **Problematic code in train.py:146**: 19 | ```python 20 | def get_config_overrides(cfg: DictConfig) -> List[str]: 21 | # ... other checks with hardcoded defaults ... 22 | if cfg.train.maxIterations != 10000: # Default from config.yaml 23 | overrides.append(f"train.maxIterations={cfg.train.maxIterations}") 24 | # ... more hardcoded checks ... 25 | ``` 26 | 27 | **Inconsistent alias naming in cli_utils.py:48**: 28 | ```python 29 | ALIASES = { 30 | "numEnvs": "env.numEnvs", 31 | "maxIter": "train.maxIterations", # Should be replaced with "maxIterations" 32 | # Missing: "maxIterations": "train.maxIterations" 33 | } 34 | ``` 35 | 36 | **Decision**: Expert consensus recommends standardizing on `maxIterations` for clarity and consistency with config files. The `maxIter` alias should be removed in favor of the explicit form. 37 | 38 | **Test mode issue**: 39 | - `python train.py train.test=true train.maxIterations=500` - maxIterations ignored in test mode 40 | - Test mode runs indefinitely or until manual termination 41 | - Related to feat-002-indefinite-testing.md 42 | 43 | **Config structure inconsistency in train_headless.yaml:12-14**: 44 | ```yaml 45 | training: # Should be "train:" 46 | maxIterations: 10000 47 | ``` 48 | 49 | ## Desired Outcome 50 | 51 | 1. **Remove hardcoded defaults**: Follow fail-fast philosophy - always include configuration values in reproducible commands 52 | 2. **Standardize on explicit alias**: Replace `maxIter` with `maxIterations` for clarity and consistency with config files 53 | 3. **Fix config structure**: Correct train_headless.yaml section name 54 | 4. **Clean code quality**: Remove defensive programming patterns from get_config_overrides() 55 | 56 | **Note**: Test mode iteration control is handled separately in feat-002-indefinite-testing.md 57 | 58 | ## Constraints 59 | 60 | - **Fail-fast philosophy**: No defensive programming with hardcoded fallbacks 61 | - **Single source of truth**: Configuration values come from config files only 62 | - **Reproducibility**: get_config_overrides() must generate accurate command reconstruction 63 | - **Breaking change acceptable**: `maxIter` removal justified by clarity benefits in research context 64 | 65 | ## Implementation Notes 66 | 67 | 1. **Remove hardcoded checks**: Change from "only include if different from default" to "always include key values" 68 | 2. **Replace alias**: Change `"maxIter": "train.maxIterations"` to `"maxIterations": "train.maxIterations"` in cli_utils.py ALIASES 69 | 3. **Fix config**: Change `training:` to `train:` in train_headless.yaml 70 | 4. **Clean up function**: Apply same principle to other hardcoded checks in get_config_overrides() 71 | 5. **Breaking change**: `maxIter` will no longer work - users must use `maxIterations` 72 | 73 | **Rationale**: Expert consensus (o3-mini + Gemini Pro) strongly favors explicit naming for research/ML contexts where clarity and reproducibility outweigh CLI brevity concerns. 
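A sketch of the cleaned-up code (illustrative; only the keys discussed in this document are shown, and the full always-include list is an implementation decision):

```python
# cli_utils.py - explicit alias only; maxIter removed (breaking change)
ALIASES = {
    "numEnvs": "env.numEnvs",
    "maxIterations": "train.maxIterations",
}


# train.py - always include key values; no comparisons against hardcoded defaults
def get_config_overrides(cfg: DictConfig) -> List[str]:
    return [
        f"env.numEnvs={cfg.env.numEnvs}",
        f"train.maxIterations={cfg.train.maxIterations}",
    ]
```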
74 | 75 | ## Dependencies 76 | 77 | None - isolated configuration management fix. 78 | -------------------------------------------------------------------------------- /prompts/doc-003-action-processing-illustration.md: -------------------------------------------------------------------------------- 1 | # doc-003-action-processing-illustration.md 2 | 3 | Add action processing timing illustration to existing guide-action-pipeline.md 4 | 5 | ## Background 6 | The refactor-006 split action processing timing to align with RL rollout patterns: 7 | - **post_physics (step N-1)**: Observation computation + pre-action rule (pipeline stage 1) 8 | - **pre_physics (step N)**: Action rule + post-action filters + coupling (pipeline stages 2-4) 9 | 10 | ## Implementation Plan 11 | 12 | ### Phase 1: Enhance Existing Documentation 13 | **File**: `docs/guide-action-pipeline.md` (modify existing) 14 | - **Add new section**: "Timing and Execution Flow" 15 | - **Explain timing split**: WHY post_physics vs pre_physics phases exist 16 | - **Stage mapping**: How 4-stage pipeline maps to execution phases 17 | - **RL alignment**: Why this timing benefits RL framework patterns 18 | - **Integration**: Keep all action processing concepts in single document 19 | 20 | ### Phase 2: Visual Diagram - Content and Layout Specifications 21 | 22 | **File**: `docs/assets/action-processing-timeline.svg` (new) 23 | 24 | #### Primary Content Structure: 25 | - **Two Control Steps**: Step N-1 and Step N showing temporal relationship 26 | - **Four Pipeline Stages**: Stage 1 (Pre-Action Rule), Stage 2 (Action Rule), Stage 3 (Post-Action Filters), Stage 4 (Coupling Rule) 27 | - **Timing Phase Mapping**: Stage 1 → post_physics phase, Policy forward + Stages 2-4 → pre_physics phase 28 | - **Data Flow Elements**: Specific tensor variables labeled on arrows (active_prev_targets, active_rule_targets, actions, active_raw_targets, active_next_targets, full_dof_targets) 29 | 30 | #### Layout Organization: 31 | - **Linear Pipeline Flow**: Horizontal arrangement Stage 1 → Policy → Stage 2 → Stage 3 → Stage 4 (left to right progression) 32 | - **Phase Context**: Subtle background zones indicating post_physics (Step N-1) and pre_physics (Step N) timing without overwhelming stage flow 33 | - **Clean Staging**: Each stage as distinct box (~120-140px width) with clear functional purpose 34 | - **Policy Integration**: Policy network as natural bridge between Stage 1 (observations) and Stage 2 (actions) 35 | - **Directional Flow**: Prominent arrows showing data progression through pipeline stages 36 | 37 | #### Visual Hierarchy: 38 | 1. **Primary**: Linear stage sequence showing functional pipeline progression 39 | 2. **Secondary**: Phase timing context as subtle background information 40 | 3. 
**Supporting**: Data flow arrows and timing labels (Step N-1, Step N) 41 | 42 | #### Content Approach (Descriptive, Not Promotional): 43 | - **Architecture Description**: Focus on WHAT the timing pattern accomplishes 44 | - **Timing Context**: Clear temporal labels without "benefits" language 45 | - **Functional Focus**: Describe data flow and stage purposes rather than advantages 46 | - **No Architecture Summary Box**: Keep diagram focused on visual flow, move architectural description to text documentation 47 | 48 | #### Educational Focus: 49 | - Show WHEN each stage executes through clear timing phases 50 | - Show WHAT data flows between stages with prominent arrows and specific tensor labels 51 | - Show HOW the 4-stage pipeline maps to 2 timing phases 52 | - Include all data dependencies (e.g., Stage 2 receives both actions AND active_prev_targets) 53 | - Clarify policy forward pass happens in pre_physics phase 54 | 55 | #### Additional Requirements: 56 | - **Policy Interpretation Note**: Text documentation should clarify that policy output can have any meaning; the action rule determines how to interpret policy output for DOF target updates 57 | - **Complete Data Flow**: Show all inputs to each stage, not just primary flow (e.g., Stage 2 needs active_prev_targets, active_rule_targets, AND actions) 58 | - **Variable Clarity**: Label arrows with exact tensor variable names from implementation to avoid confusion between similar-sounding targets 59 | 60 | ### Phase 3: Cross-References 61 | - Update any existing links to guide-action-pipeline.md 62 | - No new documentation file needed 63 | 64 | ## Quality Standards 65 | - Follow CLAUDE.md documentation development protocol 66 | - Maintain existing document structure and flow 67 | - Verify technical accuracy against refactor-006-action-processing.md 68 | - Single source of truth for all action processing concepts 69 | -------------------------------------------------------------------------------- /prompts/feat-004-action-rule-example.md: -------------------------------------------------------------------------------- 1 | # Action Rule Use Cases Documentation 2 | 3 | ## Status 4 | - **Example Implementation**: Not needed at this time 5 | - **Documentation**: Required - create conceptual guide 6 | 7 | ## Problem 8 | The existing `guide-action-pipeline.md` provides comprehensive technical documentation, but lacks conceptual understanding through elegant examples that demonstrate the intellectual beauty of the 4-stage pipeline approach. 9 | 10 | ## Solution 11 | Create `docs/guide-action-rule-use-cases.md` - a conceptual companion guide focusing on elegant examples and use cases rather than technical implementation details. 12 | 13 | ## Documentation PRD 14 | 15 | **Document**: `docs/guide-action-rule-use-cases.md` 16 | 17 | **Purpose**: Demonstrate the intellectual beauty and natural problem decomposition enabled by the action rule pipeline through clean, elegant examples. 18 | 19 | **Target Audience**: 20 | - Engineers who want to understand and extend standard control modes 21 | - Researchers who want to see clean examples of pipeline-based problem decomposition 22 | 23 | **Key Insight**: Standard control modes (`position`/`position_delta`) are elegant action rule implementations, not separate control pathways. This provides a natural bridge from familiar concepts to advanced research applications. 24 | 25 | **Structure**: 26 | 27 | ### 1. 
Standard Control Modes as Action Rules (2 paragraphs) 28 | - **Concept**: Show how `position` and `position_delta` modes are implemented as action rules 29 | - **Purpose**: Demystify the abstraction - "When you use position control, you're already using action rules" 30 | - **Content**: Brief pseudocode showing clean separation of concerns in standard modes 31 | - **Focus**: Familiar control modes as showcases of good pipeline design 32 | 33 | ### 2. Pipeline Philosophy (1 paragraph) 34 | - **Concept**: Why 4-stage separation creates intellectual elegance 35 | - **Content**: Each stage has distinct responsibility, enabling natural problem decomposition 36 | - **Focus**: Conceptual benefits of the pipeline approach 37 | 38 | ### 3. Research Use Cases (2-3 elegant examples) 39 | 40 | #### 3.1 Residual Learning 41 | - **Pre-action**: Set DOF targets to dataset values (baseline) 42 | - **Action rule**: Add scaled policy output to previous targets (correction) 43 | - **Post-action**: Clip targets to physical limits (constraint) 44 | - **Beauty**: Clean separation of baseline, correction, and constraint 45 | 46 | #### 3.2 Confidence-Based Selective DOF Control 47 | - **Pre-action**: Fallback controller computes complete safe baseline targets (not using policy) 48 | - **Action rule**: Per-DOF selective replacement - if confidence[i] > threshold, use policy target[i], else keep fallback target[i] 49 | - **Post-action**: Sanity-based selective reversion - check mixed targets against safety rules, revert problematic DOFs to fallback 50 | - **Beauty**: Heterogeneous control architecture with dual-layer safety (confidence + sanity validation) 51 | 52 | ### 4. Implementation Notes (brief) 53 | - **Function signatures**: Reference existing technical guide 54 | - **Registration patterns**: Basic examples 55 | - **Keep minimal**: Focus readers on conceptual understanding 56 | 57 | **Writing Principles**: 58 | - **Concise**: Each example focuses on data flow and conceptual beauty 59 | - **Objective**: Clear, descriptive language without promotional tone 60 | - **Educational**: Show natural problem decomposition through pipeline stages 61 | - **Complementary**: References technical guide for implementation details 62 | - **Elegant**: Examples chosen for intellectual beauty and clean separation of concerns 63 | 64 | **Key Messages**: 65 | 1. Standard control modes demonstrate good pipeline design 66 | 2. Complex research problems become elegant when properly decomposed 67 | 3. Each pipeline stage serves a distinct, focused purpose 68 | 4. The 4-stage approach enables natural problem decomposition 69 | 5. 
5 | Pipeline supports both uniform correction (residual learning) and selective control (confidence switching) patterns 70 | 71 | **Success Criteria**: 72 | - Readers understand how standard modes work as action rules 73 | - Readers see the conceptual elegance of pipeline-based problem decomposition 74 | - Researchers can envision how to apply the pattern to their own problems 75 | - Engineers understand how to extend familiar control modes 76 | -------------------------------------------------------------------------------- /assets/mjcf/open_ai_assets/fetch/shared.xml: -------------------------------------------------------------------------------- (XML markup not preserved in this dump) -------------------------------------------------------------------------------- /docs/guide-viewer-controller.md: -------------------------------------------------------------------------------- 1 | # Viewer Controller Guide 2 | 3 | This guide explains how to use the viewer controller for interactive control of the DexHand environment. 4 | 5 | ## Overview 6 | 7 | The ViewerController component provides keyboard shortcuts for controlling the camera view, navigating between robots, and resetting environments during visualization. It's automatically enabled when running with a viewer (non-headless mode). 8 | 9 | ## Keyboard Shortcuts 10 | 11 | ### Camera Controls 12 | 13 | | Key | Action | Description | 14 | |-----|--------|-------------| 15 | | **Enter** | Toggle View Mode | Cycles through camera views: Free → Rear → Right → Bottom → Free | 16 | | **G** | Toggle Follow Mode | Switches between following a single robot or viewing all robots globally | 17 | 18 | ### Navigation Controls 19 | 20 | | Key | Action | Description | 21 | |-----|--------|-------------| 22 | | **↑** (Up Arrow) | Previous Robot | Navigate to the previous robot (only in single follow mode) | 23 | | **↓** (Down Arrow) | Next Robot | Navigate to the next robot (only in single follow mode) | 24 | 25 | ### Environment Controls 26 | 27 | | Key | Action | Description | 28 | |-----|--------|-------------| 29 | | **P** | Reset Environment | Reset the currently selected robot/environment | 30 | 31 | ## Camera View Modes 32 | 33 | ### 1. Free Camera 34 | - Manual camera control using mouse 35 | - No automatic following 36 | - Full freedom to position camera anywhere 37 | 38 | ### 2. Rear View 39 | - Camera positioned behind the hand 40 | - Follows hand movement automatically 41 | - Good for observing finger movements from behind 42 | 43 | ### 3. Right View 44 | - Camera positioned to the right of the hand 45 | - Follows hand movement automatically 46 | - Useful for side perspective of grasping 47 | 48 | ### 4. Bottom View
49 | - Camera positioned below, looking up at the hand 50 | - Follows hand movement automatically 51 | - Ideal for observing palm and contact points 52 | 53 | ## Follow Modes 54 | 55 | ### Single Robot Mode (Default) 56 | - Camera follows one specific robot 57 | - Use arrow keys to switch between robots 58 | - Camera stays focused on the selected robot 59 | - Useful for detailed observation of individual behaviors 60 | 61 | ### Global View Mode 62 | - Camera shows all robots at once 63 | - Camera position centers on all robots 64 | - Increased camera distance for wider view 65 | - Useful for comparing multiple robots or batch training 66 | 67 | ## Usage Examples 68 | 69 | ### Basic Interaction 70 | ```bash 71 | # Run with viewer enabled 72 | python examples/dexhand_test.py 73 | 74 | # During execution: 75 | # - Press Enter to cycle through camera views 76 | # - Press G to toggle between single/global view 77 | # - Press ↑/↓ to change which robot to follow 78 | # - Press P to reset the current robot 79 | ``` 80 | 81 | ### Multi-Environment Setup 82 | ```bash 83 | # Run with multiple environments 84 | python examples/dexhand_test.py --num-envs 4 85 | 86 | # Use global view (G) to see all robots 87 | # Switch to single mode (G) to focus on one 88 | # Navigate between robots with arrow keys 89 | ``` 90 | 91 | ## Console Feedback 92 | 93 | The viewer controller provides console output for all actions: 94 | - Camera mode changes: `"Camera: Rear View (following robot 0)"` 95 | - Follow mode changes: `"Camera: Rear View (global view)"` 96 | - Robot selection: `"Following robot 2"` 97 | - Invalid actions: `"Cannot change robot in global view mode. Press G to switch to single robot mode."` 98 | 99 | ## Implementation Details 100 | 101 | ### Default Settings 102 | - Starts in **Rear View** with **Single Robot** follow mode 103 | - Follows robot 0 by default 104 | - All keyboard events are automatically subscribed when viewer is created 105 | 106 | ### Camera Positioning 107 | - Each view mode has predefined offset positions 108 | - Global view increases camera distance for better overview 109 | - Camera smoothly updates position each frame when following 110 | 111 | ### Integration with Environment 112 | - Reset command (`P` key) triggers `reset_idx()` for selected environment 113 | - Camera updates use hand positions from rigid body states 114 | - Works seamlessly with both CPU and GPU pipelines 115 | 116 | ## Troubleshooting 117 | 118 | ### Camera Not Following 119 | - Ensure you're not in Free Camera mode (press Enter to cycle) 120 | - Check that follow mode is set to Single (press G if needed) 121 | - Verify hand positions are being updated correctly 122 | 123 | ### Cannot Change Robot 124 | - You must be in Single Robot mode to navigate between robots 125 | - Press G to switch from Global to Single mode 126 | - Then use arrow keys to select different robots 127 | 128 | ### Viewer Not Responding 129 | - Ensure viewer was created (not running in headless mode) 130 | - Check that keyboard events are being processed 131 | - Verify no other application has keyboard focus 132 | -------------------------------------------------------------------------------- /prompts/doc-002-control-dt-illustration.md: -------------------------------------------------------------------------------- 1 | # Control_dt vs Physics_dt Illustration Documentation 2 | 3 | ## Problem Statement 4 | 5 | The control_dt measurement and two-stage initialization system is a core architectural concept that is poorly
understood. Current documentation lacks visual explanation of the parallel simulation constraint that drives this design. 6 | 7 | ## Current Understanding Issues 8 | 9 | - Users assume physics_steps_per_control_step is configurable (it's measured) 10 | - Confusion about why measurement is necessary (parallel GPU simulation constraint) 11 | - Misunderstanding that stepping varies per control cycle (it's deterministic after measurement - ALL control steps have the SAME number of physics steps) 12 | 13 | ## Documentation Goals 14 | 15 | Create comprehensive SVG timeline illustration showing: 16 | 17 | ### 1. Parallel Simulation Constraint (Core Concept) 18 | - Timeline showing all N environments must step together on GPU 19 | - Demonstrate why individual environment stepping is impossible 20 | - Show how worst-case reset logic determines physics step count for ALL control steps 21 | 22 | ### 2. Consistent Physics Step Count (Key Insight) 23 | Example: 4 Physics Steps Per Control Step (measured during initialization) 24 | ``` 25 | Physics Step Breakdown: 26 | ├── P₁: Standard env.step() call (always required) 27 | ├── P₂: Reset logic - moving hand to new position 28 | ├── P₃: Reset logic - placing/repositioning object 29 | └── P₄: Reset logic - final stabilization after setup 30 | 31 | Result: ALL control steps use 4 physics steps (physics_steps_per_control_step = 4) 32 | control_dt = physics_dt × 4 33 | ``` 34 | 35 | ### 3. Deterministic Operation (Post-Measurement) 36 | - EVERY control step takes exactly 4 physics steps (regardless of whether individual environments need reset) 37 | - Fixed control_dt ensures consistent action scaling and timing 38 | - Timeline shows: Control Step 1 [P₁|P₂|P₃|P₄], Control Step 2 [P₁|P₂|P₃|P₄], etc. 39 | 40 | ### 4. Impact on Action Scaling 41 | - position_delta mode requires control_dt for velocity-to-delta conversion 42 | - max_delta = control_dt × max_velocity 43 | - Action scaling coefficients computed during finalize_setup() 44 | 45 | ## Implementation Plan 46 | 47 | ### Documentation Organization 48 | - **Main document**: `docs/control-dt-timing-diagram.md` (conceptual understanding) 49 | - **SVG timeline**: `docs/assets/control-dt-timeline.svg` (visual diagrams) 50 | - **Cross-references**: Links to existing `guide-component-initialization.md` and `reference-physics-implementation.md` 51 | 52 | ### SVG Timeline Specifications 53 | **Dimensions**: ~800px × 400px 54 | **Structure**: 55 | - **Horizontal axis**: Time progression (Control Step 1, 2, 3...) 56 | - **Vertical axis**: Environment timelines (Env 0, Env 1, Env 2, Env 3) 57 | - **Control step containers**: Large boxes spanning 4 physics steps each 58 | - **Physics step subdivisions**: P₁, P₂, P₃, P₄ within each control step 59 | 60 | **Timeline Sequence to Illustrate**: 61 | 1. **Control Step 1**: 4 physics steps [P₁|P₂|P₃|P₄] across all environments 62 | 2. **Control Step 2**: 4 physics steps [P₁|P₂|P₃|P₄] (highlight which env is driving reset) 63 | 3. 
**Control Step 3**: 4 physics steps [P₁|P₂|P₃|P₄] (consistent timing) 64 | 65 | **Visual Elements**: 66 | - **Color coding**: Blue (standard step), Red (reset-driven steps P₂,P₃,P₄) 67 | - **Reset highlighting**: Show which environment needs reset, but ALL environments take 4 steps 68 | - **Synchronization emphasis**: Vertical alignment showing parallel constraint 69 | - **Callouts**: Explain physics step breakdown (env.step + 3 reset steps) 70 | 71 | ### Cross-Reference Strategy 72 | **FROM existing docs TO new timing diagram**: 73 | - `guide-component-initialization.md` → Add link in "Why Two-Stage is Necessary" section 74 | - `reference-physics-implementation.md` → Add link in "Physics Step Management" section 75 | 76 | **FROM new timing diagram TO existing docs**: 77 | - Reference component-initialization for two-stage implementation details 78 | - Reference physics-implementation for technical stepping specifics 79 | - Reference action pipeline for control_dt scaling impact 80 | 81 | ### Key Messages 82 | 1. **Parallel Constraint**: GPU simulation requires ALL environments step together 83 | 2. **Worst-Case Measurement**: System measures maximum physics steps needed (e.g., 4) 84 | 3. **Deterministic Result**: ALL control steps use same physics step count forever 85 | 4. **Reset Logic Breakdown**: Show specific physics steps (hand move, object place, stabilize) 86 | 87 | ## Success Criteria 88 | 89 | - Timeline clearly shows ALL control steps have identical physics step count (4) 90 | - Parallel environment constraint visually obvious through vertical alignment 91 | - Reset logic breakdown clearly explains where extra physics steps come from 92 | - Readers understand deterministic timing (no variation between control steps) 93 | - Cross-references create logical documentation flow from concept → implementation → technical details 94 | -------------------------------------------------------------------------------- /dexhand_env/factory.py: -------------------------------------------------------------------------------- 1 | """ 2 | Factory for creating DexHand environments. 3 | 4 | This module provides factory functions for creating DexHand environments 5 | with different tasks. 6 | """ 7 | 8 | # Import loguru 9 | from loguru import logger 10 | 11 | # Import tasks first (they will import Isaac Gym) 12 | from dexhand_env.tasks.dexhand_base import DexHandBase 13 | from dexhand_env.tasks.base_task import BaseTask 14 | from dexhand_env.tasks.blind_grasping_task import BlindGraspingTask 15 | 16 | # Import PyTorch after Isaac Gym modules 17 | import torch 18 | 19 | 20 | def create_dex_env( 21 | task_name, 22 | cfg, 23 | rl_device, 24 | sim_device, 25 | graphics_device_id, 26 | force_render=False, 27 | video_config=None, 28 | ): 29 | """ 30 | Create a DexHand environment with the specified task. 
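    Example (illustrative sketch; assumes `cfg` was loaded from the task
    YAML and contains env.numEnvs):

        env = create_dex_env(
            task_name="BaseTask",
            cfg=cfg,
            rl_device="cuda:0",
            sim_device="cuda:0",
            graphics_device_id=0,
        )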
31 | 32 | Args: 33 | task_name: Name of the task to create 34 | cfg: Configuration dictionary 35 | rl_device: Device for RL computations 36 | sim_device: Device for simulation 37 | graphics_device_id: Graphics device ID 38 | force_render: Whether to force rendering 39 | video_config: Optional video recording configuration 40 | 41 | Returns: 42 | A DexHand environment with the specified task 43 | """ 44 | logger.info(f"Creating DexHand environment with task: {task_name}") 45 | 46 | # Create the task component based on the task name 47 | try: 48 | if task_name == "BaseTask": 49 | # Base task with minimal functionality 50 | logger.debug("Creating BaseTask...") 51 | # Ensure device is properly set - rl_device is the one used for tensors 52 | task = BaseTask( 53 | None, None, torch.device(rl_device), cfg["env"]["numEnvs"], cfg 54 | ) 55 | elif task_name == "BlindGrasping": 56 | # Box grasping task 57 | logger.debug("Creating BlindGraspingTask...") 58 | task = BlindGraspingTask( 59 | None, None, torch.device(rl_device), cfg["env"]["numEnvs"], cfg 60 | ) 61 | else: 62 | raise ValueError(f"Unknown task: {task_name}") 63 | 64 | logger.debug("Task created successfully, creating environment...") 65 | 66 | # Derive headless from explicit viewer configuration 67 | headless = not cfg["env"]["viewer"] 68 | 69 | # Create the environment with the task component 70 | env = DexHandBase( 71 | cfg, 72 | task, 73 | rl_device, 74 | sim_device, 75 | graphics_device_id, 76 | headless, 77 | force_render, 78 | video_config, 79 | ) 80 | 81 | logger.debug("Environment created successfully") 82 | 83 | return env 84 | 85 | except Exception as e: 86 | logger.error(f"ERROR in create_dex_env: {e}") 87 | import traceback 88 | 89 | traceback.print_exc() 90 | raise 91 | 92 | 93 | def make_env( 94 | task_name: str, 95 | num_envs: int, 96 | sim_device: str, 97 | rl_device: str, 98 | graphics_device_id: int, 99 | cfg: dict = None, 100 | force_render: bool = False, 101 | video_config: dict = None, 102 | ): 103 | """ 104 | Create a DexHand environment for RL training. 105 | 106 | This is the main entry point for creating environments compatible with 107 | RL libraries like rl_games. 
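    Example (sketch; device strings and environment count are illustrative):

        env = make_env(
            task_name="BaseTask",
            num_envs=16,
            sim_device="cuda:0",
            rl_device="cuda:0",
            graphics_device_id=0,
        )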
108 | 109 | Args: 110 | task_name: Name of the task (e.g., "BaseTask", "BlindGrasping") 111 | num_envs: Number of parallel environments 112 | sim_device: Device for physics simulation (e.g., "cuda:0", "cpu") 113 | rl_device: Device for RL algorithm (e.g., "cuda:0", "cpu") 114 | graphics_device_id: GPU device ID for rendering 115 | cfg: Optional configuration dictionary (will load from file if not provided) 116 | force_render: Whether to force rendering even in headless mode 117 | video_config: Optional video recording configuration 118 | 119 | Returns: 120 | DexHandBase environment instance 121 | """ 122 | # Load configuration if not provided 123 | if cfg is None: 124 | from dexhand_env.utils.config_utils import load_config 125 | 126 | config_path = f"dexhand_env/cfg/task/{task_name}.yaml" 127 | cfg = load_config(config_path) 128 | 129 | # Ensure numEnvs is set in config 130 | if "numEnvs" not in cfg["env"]: 131 | cfg["env"]["numEnvs"] = num_envs 132 | elif cfg["env"]["numEnvs"] != num_envs: 133 | logger.info(f"Updating numEnvs from {cfg['env']['numEnvs']} to {num_envs}") 134 | cfg["env"]["numEnvs"] = num_envs 135 | 136 | # Create environment using existing factory function 137 | env = create_dex_env( 138 | task_name=task_name, 139 | cfg=cfg, 140 | rl_device=rl_device, 141 | sim_device=sim_device, 142 | graphics_device_id=graphics_device_id, 143 | force_render=force_render, 144 | video_config=video_config, 145 | ) 146 | 147 | return env 148 | -------------------------------------------------------------------------------- /dexhand_env/tasks/base_task.py: -------------------------------------------------------------------------------- 1 | """ 2 | Base task implementation for DexHand. 3 | 4 | This module provides a minimal task implementation that satisfies the DexTask interface 5 | without adding any specific task behavior. It can be used as a starting point for new tasks 6 | or for testing the basic environment functionality. 7 | """ 8 | 9 | from typing import Dict, Optional 10 | 11 | # Import PyTorch 12 | import torch 13 | 14 | from dexhand_env.tasks.task_interface import DexTask 15 | 16 | 17 | class BaseTask(DexTask): 18 | """ 19 | Minimal task implementation for DexHand. 20 | 21 | This task provides the minimal implementation required by the DexTask interface, 22 | without adding any specific task behavior. It returns empty reward terms, 23 | no success/failure criteria, and doesn't add any task-specific actors. 24 | 25 | Use this as a base class for new tasks or for testing the basic environment. 26 | """ 27 | 28 | def __init__(self, sim, gym, device, num_envs, cfg): 29 | """ 30 | Initialize the base task. 31 | 32 | Args: 33 | sim: Simulation instance 34 | gym: Gym instance 35 | device: PyTorch device 36 | num_envs: Number of environments 37 | cfg: Configuration dictionary 38 | """ 39 | self.sim = sim 40 | self.gym = gym 41 | self.device = device 42 | self.num_envs = num_envs 43 | self.cfg = cfg 44 | 45 | # Reference to parent environment (set by DexHandBase) 46 | self.parent_env = None 47 | 48 | def compute_task_reward_terms( 49 | self, obs_dict: Dict[str, torch.Tensor] 50 | ) -> Dict[str, torch.Tensor]: 51 | """ 52 | Compute task-specific reward components. 53 | 54 | The base task doesn't provide any specific rewards beyond the common rewards 55 | handled by DexHandBase.
56 | 57 | Args: 58 | obs_dict: Dictionary of observations 59 | 60 | Returns: 61 | Empty dictionary of task-specific reward components 62 | """ 63 | return {} 64 | 65 | def check_task_success_criteria( 66 | self, obs_dict: Optional[Dict[str, torch.Tensor]] = None 67 | ) -> Dict[str, torch.Tensor]: 68 | """ 69 | Check task-specific success criteria. 70 | 71 | Args: 72 | obs_dict: Optional dictionary of observations. If provided, can be used 73 | for efficiency to avoid recomputing observations. 74 | 75 | The base task doesn't define any success criteria. 76 | 77 | Returns: 78 | Empty dictionary of task-specific success criteria 79 | """ 80 | return {} 81 | 82 | def check_task_failure_criteria( 83 | self, obs_dict: Optional[Dict[str, torch.Tensor]] = None 84 | ) -> Dict[str, torch.Tensor]: 85 | """ 86 | Check task-specific failure criteria. 87 | 88 | Args: 89 | obs_dict: Optional dictionary of observations. If provided, can be used 90 | for efficiency to avoid recomputing observations. 91 | 92 | The base task doesn't define any failure criteria. 93 | 94 | Returns: 95 | Empty dictionary of task-specific failure criteria 96 | """ 97 | return {} 98 | 99 | def reset_task_state(self, env_ids: torch.Tensor): 100 | """ 101 | Reset task-specific state for the specified environments. 102 | 103 | The base task doesn't have any specific state to reset. 104 | 105 | Args: 106 | env_ids: Environment indices to reset 107 | """ 108 | pass 109 | 110 | def create_task_objects(self, gym, sim, env_ptr, env_id: int): 111 | """ 112 | Add task-specific objects to the environment. 113 | 114 | The base task doesn't add any specific objects. 115 | 116 | Args: 117 | gym: Gym instance 118 | sim: Simulation instance 119 | env_ptr: Pointer to the environment to add objects to 120 | env_id: Index of the environment being created 121 | """ 122 | pass 123 | 124 | def load_task_assets(self): 125 | """ 126 | Load task-specific assets and define task-specific variables. 127 | 128 | The base task doesn't load any specific assets. 129 | """ 130 | pass 131 | 132 | def get_task_observations( 133 | self, obs_dict: Dict[str, torch.Tensor] 134 | ) -> Optional[Dict[str, torch.Tensor]]: 135 | """ 136 | Get task-specific observations. 137 | 138 | The base task doesn't provide any task-specific observations. 139 | 140 | Args: 141 | obs_dict: Dictionary of current observations 142 | 143 | Returns: 144 | None, indicating no task-specific observations 145 | """ 146 | return None 147 | 148 | def set_tensor_references(self, root_state_tensor: torch.Tensor): 149 | """ 150 | Set references to simulation tensors needed by the task. 151 | 152 | The base task doesn't need tensor references. 153 | 154 | Args: 155 | root_state_tensor: Root state tensor for all actors 156 | """ 157 | pass 158 | -------------------------------------------------------------------------------- /dexhand_env/utils/experiment_manager.py: -------------------------------------------------------------------------------- 1 | """ 2 | Experiment directory management utilities for DexHand. 3 | 4 | Simple experiment management with train/test separation and latest symlinks. 5 | """ 6 | 7 | from pathlib import Path 8 | from typing import Optional, List 9 | 10 | 11 | def classify_experiment_type(experiment_name: str) -> str: 12 | """Classify experiment as 'train' or 'test' based on name.""" 13 | return "test" if "_test_" in experiment_name.lower() else "train" 14 | 15 | 16 | class ExperimentManager: 17 | """ 18 | Manages experiment directories with workspace and archive. 
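    Example (sketch; the name follows the *_train_* / *_test_* convention
    that classify_experiment_type expects):

        manager = ExperimentManager(max_train_runs=5, max_test_runs=5)
        run_dir = manager.create_experiment_directory(
            "BaseTask_train_20250101_120000"
        )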
19 | 20 | Structure: 21 | - runs_all/: Archive with all experiments 22 | - runs/: Workspace with recent experiments (symlinks) 23 | - runs/latest_train, runs/latest_test: Latest symlinks 24 | """ 25 | 26 | def __init__(self, max_train_runs: int = 10, max_test_runs: int = 10): 27 | self.max_train_runs = max_train_runs 28 | self.max_test_runs = max_test_runs 29 | 30 | self.runs_all_dir = Path("runs_all") 31 | self.runs_dir = Path("runs") 32 | 33 | self._ensure_directories() 34 | 35 | def _ensure_directories(self): 36 | """Ensure required directories exist.""" 37 | self.runs_all_dir.mkdir(exist_ok=True) 38 | self.runs_dir.mkdir(exist_ok=True) 39 | 40 | def create_experiment_directory(self, experiment_name: str) -> Path: 41 | """Create experiment directory and manage workspace.""" 42 | # Create in archive 43 | archive_dir = self.runs_all_dir / experiment_name 44 | archive_dir.mkdir(parents=True, exist_ok=True) 45 | 46 | # Create workspace symlink 47 | workspace_symlink = self.runs_dir / experiment_name 48 | if not workspace_symlink.exists(): 49 | workspace_symlink.symlink_to(archive_dir.absolute()) 50 | 51 | # Cleanup and update symlinks 52 | self._cleanup_workspace() 53 | self._update_latest_symlinks() 54 | 55 | return archive_dir 56 | 57 | def _cleanup_workspace(self): 58 | """Remove old symlinks to maintain limits.""" 59 | # Get all experiment symlinks (exclude latest_* symlinks) 60 | symlinks = [ 61 | item 62 | for item in self.runs_dir.iterdir() 63 | if item.is_symlink() and not item.name.startswith("latest_") 64 | ] 65 | 66 | # Separate by type 67 | train_symlinks = [ 68 | s for s in symlinks if classify_experiment_type(s.name) == "train" 69 | ] 70 | test_symlinks = [ 71 | s for s in symlinks if classify_experiment_type(s.name) == "test" 72 | ] 73 | 74 | # Sort by modification time (newest first) 75 | train_symlinks.sort(key=lambda p: p.lstat().st_mtime, reverse=True) 76 | test_symlinks.sort(key=lambda p: p.lstat().st_mtime, reverse=True) 77 | 78 | # Remove old symlinks 79 | for old_symlink in train_symlinks[self.max_train_runs :]: 80 | old_symlink.unlink() 81 | for old_symlink in test_symlinks[self.max_test_runs :]: 82 | old_symlink.unlink() 83 | 84 | def _update_latest_symlinks(self): 85 | """Update latest_train and latest_test symlinks.""" 86 | experiments = self.get_all_experiments() 87 | 88 | # Separate by type 89 | train_experiments = [ 90 | e for e in experiments if classify_experiment_type(e.name) == "train" 91 | ] 92 | test_experiments = [ 93 | e for e in experiments if classify_experiment_type(e.name) == "test" 94 | ] 95 | 96 | # Update latest_train 97 | if train_experiments: 98 | self._update_symlink("latest_train", train_experiments[0]) 99 | 100 | # Update latest_test 101 | if test_experiments: 102 | self._update_symlink("latest_test", test_experiments[0]) 103 | 104 | def _update_symlink(self, symlink_name: str, target: Path): 105 | """Update a symlink to point to target.""" 106 | symlink_path = self.runs_dir / symlink_name 107 | if symlink_path.exists() or symlink_path.is_symlink(): 108 | symlink_path.unlink() 109 | symlink_path.symlink_to(target.absolute()) 110 | 111 | def get_all_experiments(self) -> List[Path]: 112 | """Get all experiments sorted by modification time (newest first).""" 113 | experiments = [] 114 | if self.runs_all_dir.exists(): 115 | experiments.extend(d for d in self.runs_all_dir.iterdir() if d.is_dir()) 116 | return sorted(experiments, key=lambda p: p.stat().st_mtime, reverse=True) 117 | 118 | def get_latest_experiment(self, run_type: str = "train") -> 
Optional[Path]: 119 | """Get latest experiment of specified type.""" 120 | experiments = self.get_all_experiments() 121 | filtered = [ 122 | e for e in experiments if classify_experiment_type(e.name) == run_type 123 | ] 124 | return filtered[0] if filtered else None 125 | 126 | 127 | def create_experiment_manager(cfg) -> ExperimentManager: 128 | """Create ExperimentManager from configuration.""" 129 | experiment_cfg = getattr(cfg, "experiment", {}) 130 | max_train_runs = getattr(experiment_cfg, "maxTrainRuns", 10) 131 | max_test_runs = getattr(experiment_cfg, "maxTestRuns", 10) 132 | 133 | return ExperimentManager(max_train_runs=max_train_runs, max_test_runs=max_test_runs) 134 | -------------------------------------------------------------------------------- /assets/mjcf/nv_ant.xml: -------------------------------------------------------------------------------- (XML markup not preserved in this dump) -------------------------------------------------------------------------------- /docs/guide-indefinite-testing.md: -------------------------------------------------------------------------------- 1 | # Indefinite Policy Testing with Hot-Reload 2 | 3 | During training, you want to monitor how your policy evolves without constantly restarting test scripts or losing visual feedback. This guide shows how to set up continuous policy monitoring that automatically loads new checkpoints as training progresses. 4 | 5 | ## The Problem 6 | 7 | Traditional policy evaluation during training is cumbersome: 8 | - **Manual restarts**: Stop test script, find latest checkpoint, restart with new path 9 | - **Static evaluation**: Test a frozen checkpoint while training continues 10 | - **Visual gaps**: No continuous visual feedback on policy improvement 11 | 12 | ## The Solution: Hot-Reload Testing 13 | 14 | Hot-reload testing solves this by running indefinite policy testing with automatic checkpoint discovery and reloading. The system continuously monitors your experiment directory and seamlessly loads newer checkpoints without interrupting the visual feedback loop. 15 | 16 | **Key capabilities:** 17 | - **Automatic discovery**: `checkpoint=latest` finds your most recent training experiment using the experiment management system (see [TRAINING.md](TRAINING.md)) 18 | - **Live reloading**: Monitors the experiment directory and loads new checkpoints every 30 seconds (configurable) 19 | - **Indefinite testing**: Runs until manual termination (`testGamesNum=0`) 20 | - **Deployment flexibility**: Works with local Isaac Gym viewer or remote HTTP streaming 21 | 22 | **The `checkpoint=latest` magic:** 23 | 1. **Directory discovery**: Resolves to latest experiment directory via `runs/latest_train` symlink 24 | 2. **Continuous monitoring**: Watches the resolved directory (not a static file) for new checkpoints 25 | 3. **Dynamic loading**: Automatically loads the newest `.pth` file found in `nn/` subdirectory 26 | 27 | ## Deployment Scenarios 28 | 29 | ### Scenario 1: Local Workstation with Server Training 30 | 31 | **When to use**: You can run Isaac Gym viewer locally but training happens on a remote server.
32 | 33 | **Advantages**: Full Isaac Gym interactivity, better visual quality, local keyboard controls 34 | **Trade-offs**: Requires checkpoint synchronization, slightly more setup 35 | 36 | **Server (training):** 37 | ```bash 38 | python train.py config=train_headless task=BlindGrasping 39 | ``` 40 | 41 | **Local (checkpoint sync):** 42 | ```bash 43 | # Option A: Simple rsync loop 44 | while true; do 45 | rsync -av server:/path/to/dexrobot_isaac/runs/ ./runs/ 46 | sleep 30 47 | done & 48 | 49 | # Option B: File synchronization tools 50 | unison server_profile -repeat 30 51 | ``` 52 | 53 | **Local (testing):** 54 | ```bash 55 | python train.py config=test_viewer testGamesNum=0 checkpoint=latest 56 | # Uses runs/latest_train symlink → experiment directory → newest checkpoint 57 | ``` 58 | 59 | ### Scenario 2: Remote Server Monitoring 60 | 61 | **When to use**: Training and testing both happen on remote server, monitor via browser. 62 | 63 | **Advantages**: No file synchronization needed, accessible from anywhere, simpler setup 64 | **Trade-offs**: HTTP streaming limitations, browser-based viewing only 65 | 66 | **Server (training):** 67 | ```bash 68 | python train.py config=train_headless task=BlindGrasping 69 | ``` 70 | 71 | **Server (monitoring):** 72 | ```bash 73 | python train.py config=test_stream testGamesNum=0 checkpoint=latest streamBindAll=true 74 | # streamBindAll enables access from external IPs (security warning applies) 75 | ``` 76 | 77 | **Access**: Open `http://server-ip:58080` in browser 78 | 79 | ## Basic Usage 80 | 81 | **Indefinite testing with hot-reload:** 82 | ```bash 83 | python train.py config=test_viewer testGamesNum=0 checkpoint=latest 84 | ``` 85 | 86 | **Customize reload timing:** 87 | ```bash 88 | python train.py config=test_viewer testGamesNum=0 checkpoint=latest reloadInterval=60 89 | ``` 90 | 91 | **Use specific experiment:** 92 | ```bash 93 | python train.py config=test_viewer testGamesNum=0 checkpoint=runs/BlindGrasping_train_20250801_095943 94 | ``` 95 | 96 | ## Configuration Reference 97 | 98 | **Test duration control:** 99 | - `testGamesNum=0`: Run indefinitely until Ctrl+C (most common for monitoring) 100 | - `testGamesNum=25`: Run exactly 25 episodes then terminate 101 | 102 | **Hot-reload settings:** 103 | - `reloadInterval=30`: Check for new checkpoints every 30 seconds (default) 104 | - `reloadInterval=0`: Disable hot-reload, use static checkpoint 105 | 106 | **Configuration presets:** 107 | - `test_viewer.yaml`: Interactive Isaac Gym viewer (4 environments) 108 | - `test_stream.yaml`: HTTP video streaming (headless) 109 | - `test.yaml`: Base headless testing configuration 110 | 111 | **Parameter overrides:** 112 | ```bash 113 | # Fewer environments for smoother visualization 114 | python train.py config=test_viewer testGamesNum=0 checkpoint=latest numEnvs=1 115 | 116 | # Longer reload interval to reduce overhead 117 | python train.py config=test_viewer testGamesNum=0 checkpoint=latest reloadInterval=120 118 | ``` 119 | 120 | ## How Hot-Reload Works 121 | 122 | The hot-reload system uses a background thread that: 123 | 124 | 1. **Resolves experiment directory**: `checkpoint=latest` → `runs/latest_train` symlink → actual experiment directory 125 | 2. **Monitors for changes**: Uses `find_latest_checkpoint_file()` to check for new `.pth` files in the experiment's `nn/` directory 126 | 3. **Detects updates**: Compares file modification times every `reloadInterval` seconds 127 | 4. 
**Loads seamlessly**: When a newer checkpoint is found, loads the new weights into the running policy without interrupting the episode 128 | 5. **Logs events**: Clear console output shows when reloads occur 129 | 130 | This design enables true continuous monitoring - you start the test process once and watch your policy improve throughout the entire training session. 131 | -------------------------------------------------------------------------------- /dexhand_env/components/action/default_rules.py: -------------------------------------------------------------------------------- 1 | """ 2 | Default action rules for DexHand environment. 3 | 4 | This module provides default action rule implementations for different control modes 5 | (position and position_delta) that can be used across different tasks. 6 | """ 7 | 8 | from typing import Callable 9 | from loguru import logger 10 | 11 | 12 | class DefaultActionRules: 13 | """ 14 | Factory for default action rules used by the DexHand environment. 15 | 16 | Provides position and position_delta action rules that handle scaling 17 | and applying policy actions to DOF targets while preserving rule-based 18 | control for non-policy-controlled DOFs. 19 | """ 20 | 21 | @staticmethod 22 | def create_position_action_rule(action_processor) -> Callable: 23 | """ 24 | Create a position mode action rule. 25 | 26 | Args: 27 | action_processor: ActionProcessor instance for accessing scaling utilities 28 | 29 | Returns: 30 | Callable action rule function 31 | """ 32 | 33 | def position_action_rule( 34 | active_prev_targets, active_rule_targets, actions, config 35 | ): 36 | """Default position mode action rule using ActionScaling utilities.""" 37 | # Start with rule targets - preserves rule-based control for uncontrolled DOFs 38 | targets = active_rule_targets.clone() 39 | scaling = action_processor.action_scaling 40 | 41 | # Only update the DOFs that the policy controls 42 | if config["policy_controls_base"]: 43 | # Scale base actions from [-1, 1] to DOF limits 44 | base_lower = action_processor.active_lower_limits[:6] 45 | base_upper = action_processor.active_upper_limits[:6] 46 | scaled_base = scaling.scale_to_limits( 47 | actions[:, :6], base_lower, base_upper 48 | ) 49 | targets[:, :6] = scaled_base 50 | 51 | if config["policy_controls_fingers"]: 52 | # Get finger action indices 53 | finger_start = 6 if config["policy_controls_base"] else 0 54 | finger_end = finger_start + 12 55 | 56 | # Scale finger actions from [-1, 1] to DOF limits 57 | finger_lower = action_processor.active_lower_limits[6:] 58 | finger_upper = action_processor.active_upper_limits[6:] 59 | scaled_fingers = scaling.scale_to_limits( 60 | actions[:, finger_start:finger_end], finger_lower, finger_upper 61 | ) 62 | targets[:, 6:] = scaled_fingers 63 | 64 | return targets 65 | 66 | return position_action_rule 67 | 68 | @staticmethod 69 | def create_position_delta_action_rule(action_processor) -> Callable: 70 | """ 71 | Create a position_delta mode action rule. 
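    Example (sketch; `ap` is an already-constructed ActionProcessor, and this
    is equivalent to setup_default_action_rule(ap, "position_delta")):

        rule = DefaultActionRules.create_position_delta_action_rule(ap)
        ap.set_action_rule(rule)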
72 | 73 | Args: 74 | action_processor: ActionProcessor instance for accessing scaling utilities 75 | 76 | Returns: 77 | Callable action rule function 78 | """ 79 | 80 | def position_delta_action_rule( 81 | active_prev_targets, active_rule_targets, actions, config 82 | ): 83 | """Default position_delta mode action rule using ActionScaling utilities.""" 84 | # Start with rule targets 85 | targets = active_rule_targets.clone() 86 | ap = action_processor 87 | scaling = ap.action_scaling 88 | 89 | if config["policy_controls_base"]: 90 | # Apply base deltas using ActionScaling utility 91 | targets[:, :6] = scaling.apply_velocity_deltas( 92 | active_prev_targets[:, :6], actions[:, :6], ap.max_deltas[:6] 93 | ) 94 | 95 | if config["policy_controls_fingers"]: 96 | # Get finger action indices 97 | finger_start = 6 if config["policy_controls_base"] else 0 98 | finger_end = finger_start + 12 99 | 100 | # Apply finger deltas using ActionScaling utility 101 | targets[:, 6:] = scaling.apply_velocity_deltas( 102 | active_prev_targets[:, 6:], 103 | actions[:, finger_start:finger_end], 104 | ap.max_deltas[6:], 105 | ) 106 | 107 | # Clamp to limits using ActionScaling utility 108 | targets = scaling.clamp_to_limits( 109 | targets, ap.active_lower_limits, ap.active_upper_limits 110 | ) 111 | 112 | return targets 113 | 114 | return position_delta_action_rule 115 | 116 | @staticmethod 117 | def setup_default_action_rule(action_processor, control_mode: str): 118 | """ 119 | Set up a default action rule based on control mode. 120 | 121 | Args: 122 | action_processor: ActionProcessor instance to configure 123 | control_mode: Control mode ("position" or "position_delta") 124 | """ 125 | if control_mode == "position": 126 | action_rule = DefaultActionRules.create_position_action_rule( 127 | action_processor 128 | ) 129 | action_processor.set_action_rule(action_rule) 130 | logger.debug("Configured default position action rule") 131 | elif control_mode == "position_delta": 132 | action_rule = DefaultActionRules.create_position_delta_action_rule( 133 | action_processor 134 | ) 135 | action_processor.set_action_rule(action_rule) 136 | logger.debug("Configured default position_delta action rule") 137 | else: 138 | raise ValueError(f"Unknown control mode: {control_mode}") 139 | -------------------------------------------------------------------------------- /docs/guide-environment-resets.md: -------------------------------------------------------------------------------- 1 | # Environment Reset System Guide 2 | 3 | This guide explains the environment reset and termination system in DexHand Isaac environments. 
4 | 5 | ## Architecture Overview 6 | 7 | The DexHand environment uses a clean separation between **termination decisions** and **reset execution**: 8 | 9 | - **TerminationManager**: Decides which environments should reset and why (success/failure/timeout) 10 | - **ResetManager**: Executes physical resets (DOF states, poses, randomization) 11 | - **BaseTask**: Coordinates between the two and manages episode progress 12 | 13 | ## Key Components 14 | 15 | ### TerminationManager 16 | **Responsibility**: Evaluate termination conditions and decide which environments should reset 17 | 18 | **Three Termination Types**: 19 | - **Success**: Task completed successfully (positive reward) 20 | - **Failure**: Task failed due to violation (negative reward) 21 | - **Timeout**: Episode reached max length (neutral reward) 22 | 23 | **Key Methods**: 24 | - `evaluate(episode_step_count, success_criteria, failure_criteria)` → returns `(should_reset, termination_info, episode_rewards)` 25 | 26 | ### ResetManager 27 | **Responsibility**: Execute physical environment resets 28 | 29 | **Key Methods**: 30 | - `reset_idx(env_ids)`: Reset DOF states, poses, and apply randomization 31 | - `set_episode_step_count_buffer(buffer)`: Reference to shared episode step counter 32 | 33 | **What it does NOT do**: 34 | - ❌ Does not decide when to reset (no check_termination method) 35 | - ❌ Does not increment episode progress (no increment_progress method) 36 | 37 | ### Episode Progress Management 38 | **Handled directly in BaseTask**: 39 | ```python 40 | # In BaseTask.post_physics_step(): 41 | self.episode_step_count += 1 # Direct increment, no method needed 42 | ``` 43 | 44 | ## Clean Data Flow 45 | 46 | ```python 47 | # In BaseTask.post_physics_step(): 48 | 49 | # 1. Update episode progress directly 50 | self.episode_step_count += 1 51 | 52 | # 2. Evaluate termination conditions 53 | should_reset, termination_info, episode_rewards = self.termination_manager.evaluate( 54 | self.episode_step_count, success_criteria, failure_criteria 55 | ) 56 | 57 | # 3. Apply termination rewards 58 | for reward_type, reward_tensor in episode_rewards.items(): 59 | self.rew_buf += reward_tensor 60 | 61 | # 4. 
Reset environments that should reset 62 | if torch.any(should_reset): 63 | env_ids_to_reset = torch.nonzero(should_reset).flatten() 64 | self.reset_manager.reset_idx(env_ids_to_reset) 65 | self.termination_manager.reset_tracking(env_ids_to_reset) 66 | ``` 67 | 68 | ## Key Buffers 69 | 70 | ### should_reset Tensor 71 | - **Type**: `torch.Tensor` of shape `(num_envs,)` with dtype `torch.bool` 72 | - **Purpose**: Boolean flags indicating which environments should reset 73 | - **Generated by**: TerminationManager.evaluate() 74 | - **Used by**: BaseTask to determine which environments to reset 75 | 76 | ### episode_step_count Buffer 77 | - **Type**: `torch.Tensor` of shape `(num_envs,)` with dtype `torch.long` 78 | - **Purpose**: Tracks the number of steps in each environment's current episode 79 | - **Managed by**: BaseTask (direct increment) 80 | - **Reset by**: ResetManager.reset_idx() sets to 0 for reset environments 81 | 82 | ## Termination Types and Logging 83 | 84 | The TerminationManager provides detailed termination information for rl_games TensorBoard logging: 85 | 86 | ```python 87 | termination_info = { 88 | "success": success_termination, # Boolean tensor 89 | "failure": failure_termination, # Boolean tensor 90 | "timeout": timeout_termination, # Boolean tensor 91 | "success_rate": success_count / num_envs, 92 | "failure_rate": failure_count / num_envs, 93 | "timeout_rate": timeout_count / num_envs, 94 | } 95 | ``` 96 | 97 | This enables proper reward logging in TensorBoard because rl_games can track when episodes actually complete. 98 | 99 | ## Extending Termination Criteria 100 | 101 | ### Adding Task-Specific Success Criteria 102 | ```python 103 | # In your task class: 104 | def check_task_success_criteria(self): 105 | return { 106 | "object_grasped": self.check_grasp_success(), 107 | "target_reached": self.check_target_distance(), 108 | } 109 | ``` 110 | 111 | ### Adding Task-Specific Failure Criteria 112 | ```python 113 | # In your task class: 114 | def check_task_failure_criteria(self): 115 | return { 116 | "hand_dropped": self.check_hand_height(), 117 | "object_dropped": self.check_object_height(), 118 | } 119 | ``` 120 | 121 | ## Configuration 122 | 123 | ### Episode Length (Timeout) 124 | Set in task configuration: 125 | ```yaml 126 | # BaseTask.yaml 127 | env: 128 | episodeLength: 300 # Steps before timeout termination 129 | ``` 130 | 131 | ### Termination Rewards 132 | Set in task configuration: 133 | ```yaml 134 | # BaseTask.yaml 135 | env: 136 | successReward: 10.0 # Reward for success termination 137 | failurePenalty: 5.0 # Penalty for failure termination (applied as negative) 138 | timeoutReward: 0.0 # Reward for timeout termination (usually neutral) 139 | ``` 140 | 141 | ## Testing Episode Termination 142 | 143 | You can test termination behavior with: 144 | ```bash 145 | python examples/dexhand_test.py --episode-length 10 146 | ``` 147 | 148 | This should reset environments every 10 steps due to timeout termination, visible in the logs. 149 | 150 | ## Benefits of This Architecture 151 | 152 | 1. **No Duplication**: Single timeout check in TerminationManager (was previously duplicated) 153 | 2. **Clear Separation**: Decision logic (TerminationManager) vs execution logic (ResetManager) 154 | 3. **Better Logging**: Proper termination type tracking enables rl_games TensorBoard integration 155 | 4. **Extensible**: Easy to add new termination types or task-specific criteria 156 | 5. 
**Maintainable**: Single responsibility principle for each component
157 | 
158 | ## References
159 | - TerminationManager implementation: `dexhand_env/components/termination/termination_manager.py`
160 | - ResetManager implementation: `dexhand_env/components/reset/reset_manager.py`
161 | - Integration example: `dexhand_env/tasks/dexhand_base.py`
162 | 
--------------------------------------------------------------------------------
/prompts/refactor-008-config-key-casing.md:
--------------------------------------------------------------------------------
1 | # refactor-008-config-key-casing.md
2 | 
3 | Unify configuration key naming to snake_case under the `task:` section for code consistency.
4 | 
5 | ## Context
6 | 
7 | The configuration files have inconsistent naming conventions, particularly in the `task:` section, where some keys use camelCase while Python code conventions prefer snake_case. This creates cognitive friction when working between config files and Python code.
8 | 
9 | **Design Decision**: Keep other sections (env, sim, train) as camelCase for CLI usability, but unify the `task:` section to snake_case for code consistency, since these keys are primarily accessed by Python code rather than CLI overrides.
10 | 
11 | ## Current State
12 | 
13 | **BaseTask.yaml - 16 camelCase keys in task section:**
14 | - `policyControlsHandBase` → `policy_controls_hand_base`
15 | - `policyControlsFingers` → `policy_controls_fingers`
16 | - `defaultBaseTargets` → `default_base_targets`
17 | - `defaultFingerTargets` → `default_finger_targets`
18 | - `maxFingerJointVelocity` → `max_finger_joint_velocity`
19 | - `maxBaseLinearVelocity` → `max_base_linear_velocity`
20 | - `maxBaseAngularVelocity` → `max_base_angular_velocity`
21 | - `activeSuccessCriteria` → `active_success_criteria`
22 | - `activeFailureCriteria` → `active_failure_criteria`
23 | - `rewardWeights` → `reward_weights`
24 | - `enableComponentDebugLogs` → `enable_component_debug_logs`
25 | - `maxConsecutiveSuccesses` → `max_consecutive_successes`
26 | - `contactForceBodies` → `contact_force_bodies`
27 | - `contactBinaryThreshold` → `contact_binary_threshold`
28 | - `contactVisualization` → `contact_visualization`
29 | - `policyObservationKeys` → `policy_observation_keys`
30 | 
31 | **BlindGrasping.yaml - 9 camelCase keys in task section:**
32 | - `maxBaseLinearVelocity` → `max_base_linear_velocity`
33 | - `maxBaseAngularVelocity` → `max_base_angular_velocity`
34 | - `maxFingerJointVelocity` → `max_finger_joint_velocity`
35 | - `contactBinaryThreshold` → `contact_binary_threshold`
36 | - `penetrationPrevention` → `penetration_prevention`
37 | - `policyObservationKeys` → `policy_observation_keys`
38 | - `activeSuccessCriteria` → `active_success_criteria`
39 | - `activeFailureCriteria` → `active_failure_criteria`
40 | - `rewardWeights` → `reward_weights`
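
Concretely, the `task:` section changes shape like this (an illustrative before/after sketch; the keys are taken from the lists above, the values are hypothetical):

```yaml
# Before (camelCase)
task:
  policyControlsHandBase: true
  maxFingerJointVelocity: 2.0
  rewardWeights:
    alive: 0.1

# After (snake_case)
task:
  policy_controls_hand_base: true
  max_finger_joint_velocity: 2.0
  reward_weights:
    alive: 0.1
```

41 | 
42 | ## Desired Outcome
43 | 
44 | 1. **Configuration Files**: All task section keys use consistent snake_case naming
45 | 2. **Code References**: All Python code references updated to use new snake_case keys
46 | 3. **No Breaking CLI**: env/sim/train sections keep camelCase for CLI usability
47 | 4. 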
**Zero Backward Compatibility**: Clean break, no legacy support needed 48 | 49 | ## Code References Requiring Updates 50 | 51 | **9 Python files with 17 key references:** 52 | 53 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/tasks/dexhand_base.py` (11 references) 54 | - Line 362, 364: `contactForceBodies` → `contact_force_bodies` 55 | - Line 468: `policyControlsHandBase` → `policy_controls_hand_base` 56 | - Line 469: `policyControlsFingers` → `policy_controls_fingers` 57 | - Line 470: `maxFingerJointVelocity` → `max_finger_joint_velocity` 58 | - Line 471: `maxBaseLinearVelocity` → `max_base_linear_velocity` 59 | - Line 472: `maxBaseAngularVelocity` → `max_base_angular_velocity` 60 | - Line 480, 482: `defaultBaseTargets` → `default_base_targets` 61 | - Line 484, 486: `defaultFingerTargets` → `default_finger_targets` 62 | - Line 1103: `enableComponentDebugLogs` → `enable_component_debug_logs` 63 | 64 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/termination/termination_manager.py` (4 references) 65 | - Line 48: `activeSuccessCriteria` → `active_success_criteria` 66 | - Line 49: `activeFailureCriteria` → `active_failure_criteria` 67 | - Line 59: `rewardWeights` → `reward_weights` 68 | - Line 71: `maxConsecutiveSuccesses` → `max_consecutive_successes` 69 | 70 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/tasks/blind_grasping_task.py` (3 references) 71 | - Line 185: `penetrationPrevention` → `penetration_prevention` 72 | - Line 797: `contactBinaryThreshold` → `contact_binary_threshold` 73 | - Line 1245: `activeFailureCriteria` → `active_failure_criteria` (in comment) 74 | 75 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/reward/reward_calculator.py` (1 reference) 76 | - Line 37: `rewardWeights` → `reward_weights` 77 | 78 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/observation/observation_encoder.py` (1 reference) 79 | - Line 712: `contactBinaryThreshold` → `contact_binary_threshold` 80 | 81 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/graphics/viewer_controller.py` (1 reference) 82 | - Line 76: `contactVisualization` → `contact_visualization` 83 | 84 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/initialization/initialization_manager.py` (1 reference) 85 | - Line 71: `policyObservationKeys` → `policy_observation_keys` 86 | 87 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/initialization/hand_initializer.py` (1 reference) 88 | - Line 557: `contactForceBodies` → `contact_force_bodies` (in comment) 89 | 90 | ### `/home/yiwen/dexrobot_isaac/examples/dexhand_test.py` (1 reference) 91 | - Line 1044: `policyControlsHandBase` → `policy_controls_hand_base` (in comment) 92 | 93 | ## Implementation Notes 94 | 95 | **Architecture Compliance:** 96 | - Follows fail-fast philosophy - no backward compatibility, clean break 97 | - Maintains single source of truth - no dual naming support 98 | - Preserves CLI usability for frequently-used env/sim/train keys 99 | 100 | **Testing Strategy:** 101 | - Test both BaseTask and BlindGrasping task loading 102 | - Verify test script and training pipeline work correctly 103 | - Confirm all config key access patterns function properly 104 | 105 | **Breaking Change Protocol:** 106 | - No backward compatibility required per CLAUDE.md guidelines 107 | - Clean architectural improvement with immediate effect 108 | - All references must be updated atomically 109 | 110 | ## Constraints 111 | 112 | - **Scope Limited**: Only `task:` section keys, leave env/sim/train as camelCase 113 | - **No Legacy Support**: Clean 
break, all references must be updated 114 | - **Architecture Boundaries**: Respect component separation during updates 115 | - **Testing Required**: Both test script and training pipeline must work 116 | 117 | ## Dependencies 118 | 119 | None - standalone refactoring task with no external dependencies. 120 | -------------------------------------------------------------------------------- /prompts/fix-001-contact-viz.md: -------------------------------------------------------------------------------- 1 | # fix-001-contact-viz.md 2 | 3 | Contact visualization is implemented but not working correctly due to physics data pipeline and timing issues. 4 | 5 | ## Context 6 | 7 | The contact visualization system consists of multiple components that must work together: 8 | - Configuration loading from `task.contactVisualization` in BaseTask.yaml 9 | - Contact force data pipeline from Isaac Gym → TensorManager → ViewerController 10 | - Real-time color updates based on contact force magnitudes 11 | - Keyboard toggle ('C' key) for enabling/disabling visualization 12 | 13 | Initial investigation suggested configuration issues, but thorough analysis revealed the configuration loading path works correctly. 14 | 15 | ## Current State 16 | 17 | **Working Components:** 18 | - ✅ Configuration properly defined in BaseTask.yaml with correct inheritance to BlindGrasping 19 | - ✅ ViewerController correctly accesses config via `parent.task_cfg.get("contactVisualization", {})` 20 | - ✅ Contact body names (`r_f_link*_4`) exist in MJCF and resolve to valid indices 21 | - ✅ Keyboard shortcut 'C' registered and toggle logging implemented 22 | - ✅ Contact visualization rendering pipeline implemented 23 | 24 | **Root Cause Identified:** 25 | 26 | **Architecture Issue**: ViewerController accesses parent's `contact_forces` tensor via `self.parent.contact_forces`, but this tensor is only refreshed during the main simulation step through `physics_manager.step_physics(refresh_tensors=True)`. 27 | 28 | In `ViewerController.render()`, the code calls `gym.refresh_net_contact_force_tensor(self.sim)` to refresh Isaac Gym's tensor, but doesn't update the parent's `contact_forces` tensor through `TensorManager.refresh_tensors()`. This creates a timing mismatch where ViewerController sees stale contact force data. 29 | 30 | **Investigation Results:** 31 | 32 | ✅ **Physics setup works correctly**: ObservationEncoder can access non-zero contact forces, confirming Isaac Gym generates proper contact data 33 | ✅ **TensorManager refresh works correctly**: The main simulation loop properly calls `refresh_tensors()` 34 | ❌ **ViewerController has stale data**: It calls Isaac Gym refresh but doesn't update parent's tensor 35 | 36 | ## Alternative Architecture Solution 37 | 38 | **Proposed Fix**: Instead of ViewerController accessing `self.parent.contact_forces` (which requires coordinated tensor refresh timing), ViewerController should access contact forces from the already-computed `obs_dict`. 39 | 40 | **Available Contact Data in obs_dict:** 41 | - `contact_forces`: Raw 3D force vectors per contact body [num_envs, num_bodies * 3] 42 | - `contact_force_magnitude`: Computed force magnitudes [num_envs, num_bodies] 43 | - `contact_binary`: Binary contact indicators [num_envs, num_bodies] 44 | 45 | **Architectural Benefits:** 46 | 1. **Single source of truth**: Contact forces already computed correctly in observation pipeline 47 | 2. **No timing issues**: `obs_dict` computed at right time in simulation loop 48 | 3. 
**Clean separation**: ViewerController becomes consumer of processed data, not raw physics tensors 49 | 4. **No coupling**: ViewerController doesn't need TensorManager knowledge 50 | 5. **Reuses working code**: ObservationEncoder already processes contact forces correctly 51 | 52 | ## Desired Outcome 53 | 54 | Contact visualization should work reliably: 55 | - Bodies change color (red intensity) based on contact force magnitude 56 | - Colors update in real-time during simulation 57 | - Toggle with 'C' key shows proper enable/disable logging 58 | - System handles edge cases gracefully with proper error messages 59 | 60 | ## Implementation Strategy 61 | 62 | **Recommended Fix**: Modify ViewerController to access contact forces from `obs_dict` instead of `self.parent.contact_forces` 63 | 64 | **Implementation Steps:** 65 | 1. **Add obs_dict access**: ViewerController needs access to current observation dictionary 66 | 2. **Update contact force source**: Use `obs_dict["contact_force_magnitude"]` instead of computing `torch.norm(self.parent.contact_forces, dim=2)` 67 | 3. **Verify data format**: Ensure obs_dict contact data matches visualization expectations 68 | 4. **Clean up**: Remove unused Isaac Gym tensor refresh calls in ViewerController 69 | 70 | **Technical Details:** 71 | - ViewerController currently computes: `force_magnitudes = torch.norm(contact_forces, dim=2)` 72 | - obs_dict already provides: `contact_force_magnitude` with identical computation 73 | - Shape compatibility: Both are [num_envs, num_bodies] tensors with force magnitudes 74 | 75 | **Testing Approach:** 76 | - Run with BlindGrasping task and make hand contact with box 77 | - Press 'C' to toggle contact visualization and verify logging shows correct config values 78 | - Confirm color changes occur during contact events based on obs_dict data 79 | - Verify performance impact is minimal (should be better since no duplicate computation) 80 | 81 | ## Constraints 82 | 83 | - Must maintain fail-fast philosophy - prefer clear crashes over silent failures 84 | - Respect component architecture patterns and property decorators 85 | - Cannot modify MJCF collision exclusions without understanding full physics implications 86 | - Must preserve existing visualization performance optimizations 87 | 88 | ## Dependencies 89 | 90 | None - this is a self-contained fix within the graphics and physics data pipeline. 91 | 92 | ## Implementation Status - ✅ **COMPLETED** (2025-07-28) 93 | 94 | ### ✅ **COMPLETED TASKS:** 95 | 1. **Modified ViewerController.render()** - Added obs_dict parameter with fail-fast validation 96 | 2. **Updated DexHandBase integration** - Now passes self.obs_dict to viewer_controller.render() 97 | 3. **Implemented fail-fast architecture** - Removed all fallback logic per CLAUDE.md guidelines 98 | 4. **Updated method signature** - update_contact_force_colors() now expects contact_force_magnitudes tensor 99 | 5. **Fixed NameError** - Changed `contact_forces.device` to `contact_force_magnitudes.device` at line 510 100 | 6. **Fixed tensor indexing** - Corrected subset selection for contact bodies with valid indices 101 | 7. **Fixed color comparison logic** - Updated torch.allclose to torch.isclose with proper tensor dimensions 102 | 8. 
**Fixed dimension handling** - Corrected tensor indexing for color update operations 103 | 104 | ### ✅ **FINAL TESTING RESULTS:** 105 | - Environment initialization completes without crashes 106 | - Contact visualization keyboard shortcut ('C' key) properly registered 107 | - No NameError exceptions during rendering 108 | - System correctly handles obs_dict-based contact force data 109 | - Contact visualization displays red color intensity on finger bodies based on contact force magnitude 110 | 111 | **Architecture Benefits Achieved:** 112 | - Single source of truth: obs_dict contact data 113 | - Eliminated timing issues with stale tensor data 114 | - Fail-fast validation prevents silent failures 115 | - Better performance: no duplicate force magnitude computation 116 | - Robust tensor handling for variable numbers of valid contact bodies 117 | -------------------------------------------------------------------------------- /prompts/fix-002-consistency.md: -------------------------------------------------------------------------------- 1 | # Fix-002: Test Script and Training Consistency Issues 2 | 3 | ## Problems Identified 4 | 5 | ### 1. Test Script Base Class Compatibility 6 | - Current test script (examples/dexhand_test.py) intentionally patches BaseTask to add contact test box 7 | - Need to ensure this patching approach works reliably with BaseTask 8 | - Verify the test script provides meaningful testing of base functionality 9 | 10 | ### 2. Test Script Argument Complexity 11 | - Test script has 95+ lines of argument definitions with many complex options 12 | - Arguments include: video/rendering, control modes, plotting, profiling, debugging 13 | - Many arguments may be redundant or overly complex for typical usage 14 | - Need to simplify while maintaining essential functionality 15 | 16 | ### 3. Examples Directory Organization 17 | - Only one test script (dexhand_test.py) in examples/ 18 | - No clear documentation of what the script tests or how to use it 19 | - Consider if organization/naming could be clearer 20 | 21 | ### 4. Training Compatibility Issues 22 | - Need to verify both "BaseTask" and "BlindGrasping" work with training pipeline 23 | - Check that task switching works properly in train.py 24 | - Ensure configs are compatible and well-documented 25 | 26 | ## Analysis Results 27 | 28 | ### Root Cause Identified 29 | The core consistency issue is a **configuration loading mismatch**: 30 | 31 | 1. **dexhand_test.py**: Uses `yaml.safe_load()` - no Hydra inheritance 32 | 2. **train.py**: Uses Hydra - inheritance works properly 33 | 3. **BlindGraspingTask**: Requires `contactForceBodies` but only inherits it via Hydra defaults 34 | 35 | **Test Results:** 36 | - ✅ `dexhand_test.py` works with BaseTask (has explicit `contactForceBodies`) 37 | - ❌ `dexhand_test.py` fails with BlindGrasping ("No contact force body indices provided") 38 | - ✅ `train.py` works with BaseTask (Hydra resolves inheritance) 39 | - ✅ `train.py` works with BlindGrasping (Hydra resolves inheritance) 40 | 41 | ### Recommended Solution: Switch Test Script to Hydra 42 | 43 | **Benefits:** 44 | 1. **Fixes core issue**: BlindGrasping inheritance works properly 45 | 2. **Configuration consistency**: Test and train use identical config systems 46 | 3. **Proven approach**: `train.py` already uses Hydra successfully 47 | 4. **Future-proofing**: Any new tasks with inheritance work automatically 48 | 5. **Eliminates manual inheritance**: No custom config resolution needed 49 | 50 | **Risks (All Manageable):** 51 | 1. 
**CLI syntax change**: From `--num-envs 2` to `env.numEnvs=2` (LOW risk - documentation update)
52 | 2. **Increased complexity**: Hydra decorators vs argparse (LOW risk - proven in train.py)
53 | 3. **Dependency consistency**: Need Hydra available (MINIMAL risk - already required)
54 | 
55 | **Implementation Pattern:**
56 | ```python
57 | # Current: 95 lines of argparse + manual config loading
58 | def main():
59 |     parser = argparse.ArgumentParser()
60 |     config = load_config(parser.parse_args().config)
61 | 
62 | # New: Similar to train.py
63 | @hydra.main(version_base=None, config_path="dexhand_env/cfg", config_name="config")
64 | def main(cfg: DictConfig):
65 |     # Config already loaded with inheritance!
66 | ```
67 | 
68 | ## Implementation Plan
69 | 
70 | ### Phase 1: Convert Test Script to Hydra (HIGH PRIORITY)
71 | - Replace argparse with Hydra decorator and DictConfig
72 | - Update CLI argument syntax to match train.py patterns
73 | - Test both BaseTask and BlindGrasping functionality
74 | - Update examples documentation with new CLI syntax
75 | 
76 | ### Phase 2: Validate Cross-Task Compatibility (MEDIUM PRIORITY)
77 | - Verify both scripts work with both tasks consistently
78 | - Test edge cases and configuration overrides
79 | - Document any remaining inconsistencies
80 | 
81 | ### Phase 3: Documentation and Cleanup (LOW PRIORITY)
82 | - Add examples/README.md explaining test script purpose and usage
83 | - Consider argument simplification now that Hydra handles structure
84 | - Ensure consistent patterns between test and train workflows
85 | 
86 | ## Implementation Status: FULLY COMPLETED ✅
87 | 
88 | ### ✅ Successfully Completed:
89 | 1. **Converted test script to Hydra**: Replaced argparse with `@hydra.main()` decorator
90 | 2. **Updated CLI syntax**: Changed from `--num-envs 2` to `env.numEnvs=2` pattern
91 | 3. **Fixed configuration access**: Updated to DictConfig dot notation throughout
92 | 4. **Resolved core inheritance issue**: BlindGrasping task now loads properly with Hydra inheritance
93 | 5. **Updated documentation**: Modified CLAUDE.md build commands and created examples/README.md
94 | 6. **Verified both tasks work**: BaseTask and BlindGrasping both function with Hydra configuration
95 | 7. **✅ FIXED: Environment count issue**: Test script now uses the existing `test_viewer.yaml` configuration (4 environments)
96 | 8. **✅ FIXED: Control mode validation**: Updated validation to accept both `position` and `position_delta` modes
97 | 9. 
**✅ VERIFIED: CLI overrides**: All command-line overrides work correctly with new configuration 98 | 99 | ### Final Implementation Changes: 100 | 101 | #### Fix 1: Used Existing Test Configuration 102 | **Solution**: Changed `@hydra.main(config_name="config")` to `@hydra.main(config_name="test_viewer")` 103 | **Result**: Test script now uses existing `base/test.yaml` with `env.numEnvs: 4` (reasonable for testing) 104 | **Benefits**: 105 | - No new files needed - reuses well-designed existing configuration 106 | - Gets proper test defaults (4 environments, fast physics, rendering enabled) 107 | - Leverages existing work optimized for testing scenarios 108 | 109 | #### Fix 2: Flexible Control Mode Validation 110 | **Location**: `examples/dexhand_test.py` lines 1155-1163 111 | **Solution**: Updated validation to accept both `position` and `position_delta` as valid modes 112 | **Code change**: Replaced strict mode matching with flexible validation allowing both modes 113 | **Result**: Both BaseTask (position_delta) and BlindGrasping (position_delta) work without errors 114 | 115 | #### Fix 3: Comprehensive Testing Verification 116 | **BaseTask**: ✅ Works with 4 environments, position_delta mode, proper rendering 117 | **BlindGrasping**: ✅ Works with position_delta mode, task assets load correctly, Hydra inheritance functional 118 | **CLI Overrides**: ✅ All overrides tested and working (`env.numEnvs=2`, `steps=50`, `headless=true`) 119 | 120 | ### Final Impact Assessment: 121 | - **CORE FUNCTIONALITY**: ✅ **FIXED** - BlindGrasping inheritance works perfectly 122 | - **USABILITY**: ✅ **FIXED** - Reasonable environment defaults, flexible mode validation 123 | - **CONSISTENCY**: ✅ **ACHIEVED** - Both scripts use identical Hydra system 124 | - **MAINTAINABILITY**: ✅ **IMPROVED** - Leverages existing test configurations, minimal code changes 125 | 126 | **Overall Status**: ✅ **FULLY COMPLETED** - All consistency issues resolved, both tasks work reliably with proper test defaults and flexible validation. 127 | 128 | REOPENED: `dexhand_test.py` should be hardcoded to use `BaseTask`, without an option for specifying a task. Docs mentioning this file should be updated accordingly. 129 | --------------------------------------------------------------------------------