├── .gitattributes
├── dexhand_env
│ ├── __init__.py
│ ├── tasks
│ │ ├── __init__.py
│ │ ├── base
│ │ │ └── __init__.py
│ │ ├── README.md
│ │ └── base_task.py
│ ├── utils
│ │ ├── __init__.py
│ │ ├── coordinate_transforms.py
│ │ ├── config_utils.py
│ │ └── experiment_manager.py
│ ├── components
│ │ ├── __init__.py
│ │ ├── physics
│ │ │ └── __init__.py
│ │ ├── reset
│ │ │ └── __init__.py
│ │ ├── reward
│ │ │ └── __init__.py
│ │ ├── observation
│ │ │ └── __init__.py
│ │ ├── termination
│ │ │ └── __init__.py
│ │ ├── initialization
│ │ │ └── __init__.py
│ │ ├── graphics
│ │ │ ├── __init__.py
│ │ │ └── video
│ │ │   └── __init__.py
│ │ └── action
│ │   ├── __init__.py
│ │   ├── scaling.py
│ │   └── default_rules.py
│ ├── cfg
│ │ ├── test_viewer.yaml
│ │ ├── train_headless.yaml
│ │ ├── base
│ │ │ ├── debug.yaml
│ │ │ ├── test.yaml
│ │ │ └── video.yaml
│ │ ├── debug.yaml
│ │ ├── physics
│ │ │ ├── accurate.yaml
│ │ │ ├── fast.yaml
│ │ │ ├── default.yaml
│ │ │ └── README.md
│ │ ├── test_record.yaml
│ │ ├── test_stream.yaml
│ │ ├── test_script.yaml
│ │ ├── train
│ │ │ └── BaseTaskPPO.yaml
│ │ └── config.yaml
│ ├── rl
│ │ └── __init__.py
│ ├── constants.py
│ ├── README.md
│ └── factory.py
├── assets
│ ├── mjcf
│ │ ├── open_ai_assets
│ │ │ ├── stls
│ │ │ │ ├── .get
│ │ │ │ ├── hand
│ │ │ │ │ ├── F1.stl
│ │ │ │ │ ├── F2.stl
│ │ │ │ │ ├── F3.stl
│ │ │ │ │ ├── TH1_z.stl
│ │ │ │ │ ├── TH2_z.stl
│ │ │ │ │ ├── TH3_z.stl
│ │ │ │ │ ├── palm.stl
│ │ │ │ │ ├── wrist.stl
│ │ │ │ │ ├── knuckle.stl
│ │ │ │ │ ├── lfmetacarpal.stl
│ │ │ │ │ ├── forearm_electric.stl
│ │ │ │ │ └── forearm_electric_cvx.stl
│ │ │ │ └── fetch
│ │ │ │   ├── estop_link.stl
│ │ │ │   ├── laser_link.stl
│ │ │ │   ├── gripper_link.stl
│ │ │ │   ├── torso_fixed_link.stl
│ │ │ │   ├── base_link_collision.stl
│ │ │ │   ├── bellows_link_collision.stl
│ │ │ │   ├── l_wheel_link_collision.stl
│ │ │ │   ├── r_wheel_link_collision.stl
│ │ │ │   ├── elbow_flex_link_collision.stl
│ │ │ │   ├── head_pan_link_collision.stl
│ │ │ │   ├── head_tilt_link_collision.stl
│ │ │ │   ├── torso_lift_link_collision.stl
│ │ │ │   ├── wrist_flex_link_collision.stl
│ │ │ │   ├── wrist_roll_link_collision.stl
│ │ │ │   ├── forearm_roll_link_collision.stl
│ │ │ │   ├── shoulder_pan_link_collision.stl
│ │ │ │   ├── shoulder_lift_link_collision.stl
│ │ │ │   └── upperarm_roll_link_collision.stl
│ │ │ ├── textures
│ │ │ │ ├── block.png
│ │ │ │ └── block_hidden.png
│ │ │ ├── hand
│ │ │ │ ├── shadow_hand.xml
│ │ │ │ ├── egg.xml
│ │ │ │ ├── pen.xml
│ │ │ │ ├── reach.xml
│ │ │ │ ├── shared_asset.xml
│ │ │ │ ├── manipulate_pen.xml
│ │ │ │ ├── manipulate_pen_touch_sensors.xml
│ │ │ │ ├── manipulate_egg.xml
│ │ │ │ ├── manipulate_block.xml
│ │ │ │ ├── manipulate_egg_touch_sensors.xml
│ │ │ │ └── manipulate_block_touch_sensors.xml
│ │ │ └── fetch
│ │ │   ├── reach.xml
│ │ │   ├── push.xml
│ │ │   ├── slide.xml
│ │ │   ├── pick_and_place.xml
│ │ │   └── shared.xml
│ │ └── nv_ant.xml
│ └── README.md
├── prompts
│ ├── feat-110-domain-randomization.md
│ ├── doc-004-training.md
│ ├── meta-001-programming-guideline.md
│ ├── fix-009-config-consistency.md
│ ├── meta-003-precommit.md
│ ├── doc-000-cp.md
│ ├── refactor-003-imports.md
│ ├── fix-008-termination-reason-logging.md
│ ├── feat-300-simulator-backend.md
│ ├── refactor-007-blind-grasping.md
│ ├── refactor-007-step-architecture.md
│ ├── rl-001-blind-grasping-task.md
│ ├── perf-000-physics-speed.md
│ ├── doc-001-video.md
│ ├── meta-002-backward-compatibility.md
│ ├── feat-200-task-support.md
│ ├── feat-100-bimanual.md
│ ├── meta-000-workflow-setup.md
│ ├── meta-004-docs.md
│ ├── fix-000-tb-metrics.md
│ ├── TEMPLATE.md
│ ├── refactor-001-episode-length.md
│ ├── fix-001-reward-logging-logic.md
│ ├── fix-010-max-deltas.md
│ ├── fix-006-metadata-keys.md
│ ├── refactor-006-action-processing.md
│ ├── feat-001-video-fps-control.md
│ ├── feat-000-streaming-port.md
│ ├── fix-005-box-bounce-physics.md
│ ├── refactor-009-config-yaml.md
│ ├── refactor-002-graphics-manager-in-parent.md
│ ├── fix-007-episode-length-of-grasping.md
│ ├── fix-003-max-iterations.md
│ ├── doc-003-action-processing-illustration.md
│ ├── feat-004-action-rule-example.md
│ ├── doc-002-control-dt-illustration.md
│ ├── refactor-008-config-key-casing.md
│ ├── fix-001-contact-viz.md
│ └── fix-002-consistency.md
├── .gitmodules
├── .pre-commit-config.yaml
├── .gitignore
├── LICENSE.txt
├── setup.py
├── docs
│ ├── GETTING_STARTED.md
│ ├── guide-viewer-controller.md
│ ├── guide-indefinite-testing.md
│ └── guide-environment-resets.md
└── examples
  └── README.md
/.gitattributes:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/tasks/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/utils/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/.get:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/components/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/components/physics/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/components/reset/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/components/reward/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/components/observation/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/components/termination/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/components/initialization/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/dexhand_env/components/graphics/__init__.py:
--------------------------------------------------------------------------------
1 | # Graphics components
2 |
--------------------------------------------------------------------------------
/dexhand_env/components/graphics/video/__init__.py:
--------------------------------------------------------------------------------
1 | # Video components
2 |
--------------------------------------------------------------------------------
/prompts/feat-110-domain-randomization.md:
--------------------------------------------------------------------------------
1 | We need a structured / systematic domain randomization scheme.
2 |
--------------------------------------------------------------------------------
/prompts/doc-004-training.md:
--------------------------------------------------------------------------------
1 | Where does `TRAINING.md` fit in the doc system? Also, it has some outdated options.
2 |
--------------------------------------------------------------------------------
/prompts/meta-001-programming-guideline.md:
--------------------------------------------------------------------------------
1 | Consolidate programming guidelines that are generally applicable to all projects into user memory.
2 |
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "assets/dexrobot_mujoco"]
2 | path = assets/dexrobot_mujoco
3 | url = https://gitee.com/dexrobot/dexrobot_mujoco.git
4 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/F1.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/F1.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/F2.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/F2.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/F3.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/F3.stl
--------------------------------------------------------------------------------
/prompts/fix-009-config-consistency.md:
--------------------------------------------------------------------------------
1 | The `test_record.yaml` config file has obsolete legacy options. Check ALL config files for consistency.
2 |
--------------------------------------------------------------------------------
/prompts/meta-003-precommit.md:
--------------------------------------------------------------------------------
1 | Add to CLAUDE.md: if a file is modified by a pre-commit hook, then `git add` it again before retrying the commit.
2 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/TH1_z.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/TH1_z.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/TH2_z.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/TH2_z.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/TH3_z.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/TH3_z.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/palm.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/palm.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/wrist.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/wrist.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/textures/block.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/textures/block.png
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/knuckle.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/knuckle.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/estop_link.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/estop_link.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/laser_link.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/laser_link.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/textures/block_hidden.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/textures/block_hidden.png
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/gripper_link.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/gripper_link.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/lfmetacarpal.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/lfmetacarpal.stl
--------------------------------------------------------------------------------
/prompts/doc-000-cp.md:
--------------------------------------------------------------------------------
1 | Document how to use `cp -P` to link the latest experiments to `pinned`. Example: `cp -P runs/BlindGrasping_train_20250724_120120 runs/latest_train`
2 |
--------------------------------------------------------------------------------
/prompts/refactor-003-imports.md:
--------------------------------------------------------------------------------
1 | There are some ugly mid-file imports (opencv, flask). Consider just making these components required to avoid unnecessary checks.
2 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/forearm_electric.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/forearm_electric.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/torso_fixed_link.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/torso_fixed_link.stl
--------------------------------------------------------------------------------
/dexhand_env/tasks/base/__init__.py:
--------------------------------------------------------------------------------
1 | """
2 | Base task components for dexterous hand environments.
3 | """
4 |
5 | from .vec_task import VecTask
6 |
7 | __all__ = ["VecTask"]
8 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/base_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/base_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/hand/forearm_electric_cvx.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/hand/forearm_electric_cvx.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/bellows_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/bellows_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/l_wheel_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/l_wheel_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/r_wheel_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/r_wheel_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/elbow_flex_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/elbow_flex_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/head_pan_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/head_pan_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/head_tilt_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/head_tilt_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/torso_lift_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/torso_lift_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/wrist_flex_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/wrist_flex_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/wrist_roll_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/wrist_roll_link_collision.stl
--------------------------------------------------------------------------------
/prompts/fix-008-termination-reason-logging.md:
--------------------------------------------------------------------------------
1 | Does the termination reason logging take the average from the beginning of training? Should log the current status, not the historical average.
2 |
--------------------------------------------------------------------------------
/assets/README.md:
--------------------------------------------------------------------------------
1 | # Assets Directory Structure
2 |
3 | - `mjcf`: Example MuJoCo description files provided by Isaac Gym
4 | - `dexrobot_mujoco`: Submodule providing description files for DexHand
5 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/forearm_roll_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/forearm_roll_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/shoulder_pan_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/shoulder_pan_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/shoulder_lift_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/shoulder_lift_link_collision.stl
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/stls/fetch/upperarm_roll_link_collision.stl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DexRobot/dexrobot_isaac/HEAD/assets/mjcf/open_ai_assets/stls/fetch/upperarm_roll_link_collision.stl
--------------------------------------------------------------------------------
/dexhand_env/cfg/test_viewer.yaml:
--------------------------------------------------------------------------------
1 | # @package _global_
2 | # Test configuration with rendering enabled
3 |
4 | defaults:
5 | - config
6 | - base/test
7 | - /physics/default
8 | - _self_
9 |
10 | env:
11 | viewer: true
12 |
--------------------------------------------------------------------------------
/dexhand_env/cfg/train_headless.yaml:
--------------------------------------------------------------------------------
1 | # @package _global_
2 | # Training configuration with headless mode for fast training
3 |
4 | defaults:
5 | - config
6 | - _self_
7 |
8 | env:
9 | viewer: false
10 | numEnvs: 8192
11 |
--------------------------------------------------------------------------------
/prompts/feat-300-simulator-backend.md:
--------------------------------------------------------------------------------
1 | Support multiple simulators as backends, with unified interface.
2 |
3 | P1: IsaacSim / IsaacLab
4 | P2: Genesis
5 | P3: MJX / MuJoCo Playground (may require significant infra change due to the use of JAX)
6 |
--------------------------------------------------------------------------------
/prompts/refactor-007-blind-grasping.md:
--------------------------------------------------------------------------------
1 | The current grasping task is better called "BlindGrasping", as it does not involve visual input. The task is to grasp an object without using vision, relying on other sensors or pre-defined parameters.
2 |
--------------------------------------------------------------------------------
/prompts/refactor-007-step-architecture.md:
--------------------------------------------------------------------------------
1 | Why is pre_physics_step handled in DexHandBase but post_physics_step is in StepProcessor?
2 |
3 | I'm not saying the current architecture is wrong, but you should investigate and decide on the best design.
4 |
--------------------------------------------------------------------------------
/dexhand_env/cfg/base/debug.yaml:
--------------------------------------------------------------------------------
1 | # @package _global_
2 | # Base debug configuration
3 | # Contains common settings for debug mode environments
4 |
5 | env:
6 | numEnvs: 4 # Small number for debugging
7 |
8 | logging:
9 | logLevel: "debug" # Verbose logging for debugging
10 |
--------------------------------------------------------------------------------
/dexhand_env/cfg/debug.yaml:
--------------------------------------------------------------------------------
1 | # @package _global_
2 | # Debug configuration with verbose logging and small scale
3 |
4 | defaults:
5 | - config
6 | - base/debug
7 | - base/test
8 | - _self_
9 |
10 | env:
11 | viewer: true
12 |
13 | train:
14 | maxIterations: 100 # Override for shorter debug sessions
15 |
--------------------------------------------------------------------------------
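The defaults list above merges left to right, with `_self_` last, so this file's own keys override `base/debug` and `base/test`. A minimal sketch of that precedence, using plain OmegaConf in place of Hydra's composition (values copied from the configs in this repo):

```python
from omegaconf import OmegaConf

# Values taken from base/debug.yaml, base/test.yaml, and debug.yaml itself.
base_debug = OmegaConf.create({"env": {"numEnvs": 4}})
base_test = OmegaConf.create({"train": {"test": True, "maxIterations": 1000}})
this_file = OmegaConf.create({"env": {"viewer": True}, "train": {"maxIterations": 100}})

# Hydra merges the defaults list left to right; _self_ (this file) merges last.
merged = OmegaConf.merge(base_debug, base_test, this_file)
assert merged.train.maxIterations == 100  # this file's override wins over base/test's 1000
assert merged.env.numEnvs == 4 and merged.env.viewer is True
```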
/prompts/rl-001-blind-grasping-task.md:
--------------------------------------------------------------------------------
1 | Still difficult to train a valid policy. Need to break down the details after the training and monitoring tools are updated. On the essential path:
2 | - `fix-003-max-iterations.md` - Fix max iterations for BlindGrasping task
3 | - `fix-000-tb-metrics.md` - Fix TensorBoard metrics for BlindGrasping task
4 |
--------------------------------------------------------------------------------
/prompts/perf-000-physics-speed.md:
--------------------------------------------------------------------------------
1 | Determine the optimal physics accuracy for training.
2 |
3 | Only do this after obtaining a good policy with the default physics settings.
4 |
5 | New finding: if I train with default physics but test with fast physics, the performance drops significantly.
6 |
7 | Need to run quantitative tests to find the per-parameter impact.
8 |
--------------------------------------------------------------------------------
/prompts/doc-001-video.md:
--------------------------------------------------------------------------------
1 | We need a doc on how to use the video system. A recommended workflow:
2 |
3 | - Open one process to train headlessly
4 | - Open another testing process (possibly on CPU) with hot-reload enabled, to monitor training progress
5 | - On server: use streaming
6 | - On local workstation: use video; use unison to sync checkpoints with training server
7 |
--------------------------------------------------------------------------------
/prompts/meta-002-backward-compatibility.md:
--------------------------------------------------------------------------------
1 | Remove the requirement of backward compatibility in CLAUDE.md.
2 |
3 | On the contrary, we should **discourage** backward compatibility: research code breaks fast, so we should not bloat the codebase with legacy support. Instead, we should focus on maintaining a clean and modern codebase that embraces current best practices.
4 |
--------------------------------------------------------------------------------
/dexhand_env/components/action/__init__.py:
--------------------------------------------------------------------------------
1 | """Action processing components for DexHand environment."""
2 |
3 | from .rules import ActionRules
4 | from .scaling import ActionScaling
5 | from .default_rules import DefaultActionRules
6 | from .rule_based_controller import RuleBasedController
7 |
8 | __all__ = ["ActionRules", "ActionScaling", "DefaultActionRules", "RuleBasedController"]
9 |
--------------------------------------------------------------------------------
/prompts/feat-200-task-support.md:
--------------------------------------------------------------------------------
1 | Support more manipulation tasks.
2 |
3 | Sources include:
4 | - IsaacGymEnvs examples of manipulation tasks for other hands
5 | - RoboHive environments
6 |
7 | Need to define a set of tasks and do tuning on a per-task basis. Can refer to the observations / actions / rewards / reset logic of existing IsaacGymEnvs tasks. Need to write PRD(s) for each new task.
8 |
--------------------------------------------------------------------------------
/dexhand_env/cfg/physics/accurate.yaml:
--------------------------------------------------------------------------------
1 | # @package sim
2 | # Accurate physics configuration - maximum precision for training
3 | defaults:
4 | - default # Inherit from default.yaml
5 | - _self_ # Override with accurate-specific settings
6 |
7 | # Only override precision-specific parameters
8 | substeps: 32 # Enhanced substeps for stability
9 |
10 | physx:
11 | num_position_iterations: 32 # Enhanced for penetration mitigation
12 |
--------------------------------------------------------------------------------
/dexhand_env/cfg/test_record.yaml:
--------------------------------------------------------------------------------
1 | # @package _global_
2 | # Test configuration with video recording (headless)
3 |
4 | defaults:
5 | - config # Inherit from main config (includes train.checkpoint)
6 | - base/test
7 | - base/video # Add video configuration
8 | - /physics/default
9 | - _self_
10 |
11 | env:
12 | viewer: false # Explicitly headless
13 | videoRecord: true # Enable file recording
14 | videoStream: false # Disable HTTP streaming
15 |
--------------------------------------------------------------------------------
/dexhand_env/cfg/test_stream.yaml:
--------------------------------------------------------------------------------
1 | # @package _global_
2 | # Test configuration with HTTP video streaming (headless)
3 |
4 | defaults:
5 | - config # Inherit from main config (includes train.checkpoint)
6 | - base/test
7 | - base/video # Add video configuration
8 | - /physics/default
9 | - _self_
10 |
11 | env:
12 | viewer: false # Explicitly headless
13 | videoRecord: false # Disable file recording
14 | videoStream: true # Enable HTTP streaming
15 |
--------------------------------------------------------------------------------
/prompts/feat-100-bimanual.md:
--------------------------------------------------------------------------------
1 | Support a bimanual setup where dexhand_left and dexhand_right work in the same environment.
2 |
3 | Breakdown:
4 | - Create dexhand_left_floating mujoco model
5 | - Update hardcoded logic for loading hand asset and creating hand actors (what level of flexibility is needed?)
6 | - Pay attention to actor indices with bimanual + objects
7 | - Update action processing logic if needed
8 | - Create a template task for bimanual dexhands
9 |
10 | May need to create separate PRDs for each item.
11 |
--------------------------------------------------------------------------------
/dexhand_env/cfg/base/test.yaml:
--------------------------------------------------------------------------------
1 | # @package _global_
2 | # Base test configuration
3 | # Contains common settings for test mode environments
4 |
5 |
6 | # Environment settings for testing
7 | env:
8 | numEnvs: 4 # Small number for testing
9 |
10 | # Training settings for test mode
11 | train:
12 | test: true # Enable test mode
13 | maxIterations: 1000 # Reasonable default for testing
14 | seed: 42 # Random seed for reproducible testing
15 | testGamesNum: 50 # Number of games for test configurations (0 = indefinite)
16 | logging:
17 | logLevel: "info" # Standard logging level for tests
18 |
--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
1 | repos:
2 | - repo: https://github.com/pre-commit/pre-commit-hooks
3 | rev: v4.5.0
4 | hooks:
5 | - id: trailing-whitespace
6 | - id: end-of-file-fixer
7 | - id: check-yaml
8 | - id: check-added-large-files
9 |
10 | # Optional: Add black for consistent formatting
11 | - repo: https://github.com/psf/black
12 | rev: 23.12.1
13 | hooks:
14 | - id: black
15 | language_version: python3
16 |
17 | # Optional: Add ruff for linting
18 | - repo: https://github.com/astral-sh/ruff-pre-commit
19 | rev: v0.1.9
20 | hooks:
21 | - id: ruff
22 | args: [--fix, --exit-non-zero-on-fix]
23 |
--------------------------------------------------------------------------------
/prompts/meta-000-workflow-setup.md:
--------------------------------------------------------------------------------
1 | Keep todo items in @prompts/
2 |
3 | Create a ROADMAP.md file to organize the items.
4 |
5 | Prefix specific todo items with refactor-, feat-, fix-, or rl-. Specifically, task-specific design / tuning and RL physics tuning should fall in the `rl-` category (which is more research-oriented than traditional software engineering).
6 |
7 | Update CLAUDE.md to tell AI to use the guideline.
8 |
9 | Fix one item per session. Work in the following order:
10 | - Ultrathink to obtain context about the issue and expand the todo item markdown into a PRD / architecture document.
11 | - Come up with detailed implementation plan and request user's approval.
12 | - Implement the plan and request user's review.
13 |
--------------------------------------------------------------------------------
/dexhand_env/cfg/physics/fast.yaml:
--------------------------------------------------------------------------------
1 | # @package sim
2 | # Fast physics configuration - optimized for real-time visualization
3 | defaults:
4 | - default # Inherit from default.yaml
5 | - _self_ # Override with fast-specific settings
6 |
7 | # Only override performance-specific parameters
8 | substeps: 2 # Reduced substeps for speed
9 |
10 | physx:
11 | num_position_iterations: 8 # Reduced for speed (50% fewer)
12 | contact_offset: 0.002 # Slightly larger for speed
13 | rest_offset: 0.001 # Maintain ratio
14 | max_depenetration_velocity: 0.5 # Faster separation
15 | default_buffer_size_multiplier: 2.0 # Reduced buffer
16 | gpu_contact_pairs_per_env: 256 # Fewer contact pairs
17 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/hand/shadow_hand.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/dexhand_env/cfg/physics/default.yaml:
--------------------------------------------------------------------------------
1 | # @package sim
2 | # Default physics configuration - balanced quality/performance
3 | substeps: 4 # Standard physics substeps
4 | gravity: [0.0, 0.0, -9.81]
5 | num_client_threads: 0
6 |
7 | physx:
8 | solver_type: 1 # TGS solver
9 | num_position_iterations: 16 # Balanced precision
10 | num_velocity_iterations: 0 # NVIDIA recommendation
11 | contact_offset: 0.001 # High precision detection
12 | rest_offset: 0.0005 # Stability maintenance
13 | bounce_threshold_velocity: 0.15
14 | max_depenetration_velocity: 0.2
15 | default_buffer_size_multiplier: 4.0
16 | num_subscenes: 0
17 | contact_collection: 1
18 | gpu_contact_pairs_per_env: 512
19 | always_use_articulations: true
20 | num_threads: 4
21 |
--------------------------------------------------------------------------------
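For orientation, a hedged sketch of how keys like these conventionally map onto Isaac Gym's `gymapi.SimParams`; the project's actual physics component may wire them differently:

```python
from isaacgym import gymapi

# Sketch only: mirrors the YAML values above onto Isaac Gym's SimParams.
sim_params = gymapi.SimParams()
sim_params.substeps = 4
sim_params.gravity = gymapi.Vec3(0.0, 0.0, -9.81)
sim_params.physx.solver_type = 1                    # 1 = TGS solver
sim_params.physx.num_position_iterations = 16
sim_params.physx.num_velocity_iterations = 0
sim_params.physx.contact_offset = 0.001
sim_params.physx.rest_offset = 0.0005
sim_params.physx.bounce_threshold_velocity = 0.15
sim_params.physx.max_depenetration_velocity = 0.2
```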
/dexhand_env/cfg/test_script.yaml:
--------------------------------------------------------------------------------
1 | # @package _global_
2 | # Test script configuration for examples/dexhand_test.py
3 | # Contains settings specific to environment functional testing
4 |
5 | defaults:
6 | - config
7 | - base/test
8 | - /physics/fast # Fast physics for smooth rendering
9 | - _self_
10 |
11 | # Test script specific settings
12 | steps: 1200 # Total number of test steps to run
13 | sleep: 0.01 # Sleep time between steps in seconds
14 | headless: false # If true, run without GUI visualization
15 | debug: false # If true, enable debug output and additional logging
16 | log_level: "info" # Set logging verbosity level
17 | enablePlotting: false # Enable real-time plotting with Rerun
18 | plotEnvIdx: 0 # Environment index to plot
19 |
20 | env:
21 | viewer: true # Enable interactive visualization for test script
22 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | videos
2 | cloud
3 | recorded_frames
4 | *train_dir*
5 | *ige_logs*
6 | *.egg-info
7 | /.vs
8 | /.vscode
9 | /_package
10 | /shaders
11 | ._tmptext.txt
12 | __pycache__/
13 | /tools/format/.lastrun
14 | *.pyc
15 | _doxygen
16 | *.pxd2
17 | logs*
18 | nn/
19 | runs/
20 | runs_all/
21 | .idea
22 | outputs/
23 |
24 | # Preserve directory structure with .gitkeep files
25 | !**/.gitkeep
26 | *.hydra*
27 | CLAUDE.md
28 | .unison.*
29 | ~
30 |
31 | # Directories to ignore
32 | legacy/
33 | reference/
34 | assets/dexrobot_urdf/
35 |
36 | # Python
37 | *.py[cod]
38 | *$py.class
39 | *.so
40 | .Python
41 | env/
42 | build/
43 | develop-eggs/
44 | dist/
45 | downloads/
46 | eggs/
47 | .eggs/
48 | lib/
49 | lib64/
50 | parts/
51 | sdist/
52 | var/
53 | *.egg-info/
54 | .installed.cfg
55 | *.egg
56 |
57 | # OS specific
58 | .DS_Store
59 |
60 | # Temporary debug files
61 | patch_rlgames_timing.py
62 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/hand/egg.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/fetch/reach.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/prompts/meta-004-docs.md:
--------------------------------------------------------------------------------
1 | # Documentation Development Protocol Integration
2 |
3 | **Status**: ✅ **COMPLETED** (2025-08-01)
4 |
5 | **Problem**: Need comprehensive documentation development protocol in CLAUDE.md to ensure reader-oriented, accurate, and maintainable documentation.
6 |
7 | **Solution Implemented**: Added comprehensive "Documentation Development Protocol - CRITICAL" section to CLAUDE.md covering:
8 |
9 | 1. **Motivation-First Writing**: Always explain WHY before HOW with problem context
10 | 2. **Architecture Explanation Requirements**: Explain non-standard patterns and "magic" behavior
11 | 3. **Fact-Checking and Code Validation**: Verify every technical detail against implementation
12 | 4. **Reader-Oriented Structure**: Organize around user scenarios, not implementation structure
13 | 5. **Quality Gates**: 5-point validation checklist for documentation updates
14 |
15 | **Key Improvements**:
16 | - Addresses documentation quality issues observed in guide-indefinite-testing.md critique
17 | - Provides concrete examples of wrong vs correct documentation approaches
18 | - Establishes systematic validation process for technical accuracy
19 | - Balances conciseness with essential context requirements
20 | - Identifies common anti-patterns to avoid
21 |
22 | **Integration**: The protocol is now part of the core CLAUDE.md guidelines and will be applied to all future documentation work.
23 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/fetch/push.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/prompts/fix-000-tb-metrics.md:
--------------------------------------------------------------------------------
1 | # fix-000-tb-metrics.md
2 |
3 | Fix TensorBoard data point sampling limit causing old reward breakdown data to disappear in long runs.
4 |
5 | ## Context
6 |
7 | During long training runs, TensorBoard displays only the most recent ~780.5M steps for custom reward breakdown metrics, while RL Games built-in metrics show complete training history. This is due to TensorBoard's default 1000 data point limit per scalar tag with reservoir sampling.
8 |
9 | Custom reward breakdown logs more frequently (every 10 finished episodes) compared to built-in metrics (every epoch), causing it to hit the 1000-point limit faster.
10 |
11 | ## Current State
12 |
13 | - TensorBoard default: 1000 scalar data points per tag
14 | - RewardComponentObserver logs every `log_interval=10` finished episodes
15 | - Built-in metrics log every epoch (much less frequent)
16 |
17 | ## Desired Outcome
18 |
19 | Reduce logging frequency of reward breakdown metrics to stay within TensorBoard's data point limits for longer training runs.
20 |
21 | ## Implementation Notes
22 |
23 | **Solution: Increase log_interval**
24 | - Change default `log_interval` from 10 to 100+ finished episodes
25 | - Parameter meaning: "Write to TensorBoard once per X finished episodes"
26 | - This reduces data points by 10x, extending visible history from 780.5M to 7.8B+ steps
27 |
28 | **Code change:**
29 | ```python
30 | # In train.py or config
31 | observer = RewardComponentObserver(log_interval=100) # Was 10
32 | ```
33 |
--------------------------------------------------------------------------------
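A possible complement to raising `log_interval` (an assumption on my part, not part of the prompt above): TensorBoard's per-tag sample cap can itself be raised with the `--samples_per_plugin` flag, e.g. via its Python API:

```python
# Hedged alternative: keep more than 1000 scalar points per tag on the viewer side.
from tensorboard import program

tb = program.TensorBoard()
tb.configure(argv=[None, "--logdir", "runs", "--samples_per_plugin", "scalars=10000"])
url = tb.launch()  # returns the local URL the dashboard is served on
print(f"TensorBoard at {url}")
```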
/assets/mjcf/open_ai_assets/fetch/slide.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/hand/pen.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 |
3 | Copyright (c) 2018-2023, NVIDIA Corporation
4 | All rights reserved.
5 |
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 |
9 | 1. Redistributions of source code must retain the above copyright notice, this
10 | list of conditions and the following disclaimer.
11 |
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 | this list of conditions and the following disclaimer in the documentation
14 | and/or other materials provided with the distribution.
15 |
16 | 3. Neither the name of the copyright holder nor the names of its
17 | contributors may be used to endorse or promote products derived from
18 | this software without specific prior written permission.
19 |
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 |
31 | See assets/licenses for license information for assets included in this repository
32 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/fetch/pick_and_place.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/dexhand_env/cfg/base/video.yaml:
--------------------------------------------------------------------------------
1 | # @package _global_
2 | # Base video configuration for all video modes
3 | # Contains common video settings shared across recording and streaming
4 | #
5 | # INDEPENDENT FEATURES:
6 | # - Video recording (env.videoRecord: true/false) - Saves MP4 files to disk
7 | # - HTTP streaming (env.videoStream: true/false) - Serves real-time MJPEG over HTTP
8 | # - Both features can be enabled/disabled independently
9 | # - Both use the same unified camera system (ViewerController's persistent camera)
10 | #
11 | # USAGE EXAMPLES:
12 | # Recording only: env.videoRecord=true env.videoStream=false
13 | # Streaming only: env.videoRecord=false env.videoStream=true
14 | # Both enabled: env.videoRecord=true env.videoStream=true
15 | # Neither enabled: env.videoRecord=false env.videoStream=false (default)
16 |
17 | env:
18 | # Video recording configuration (requires OpenCV)
19 | # Note: FPS is automatically calculated from simulation timing (1.0 / control_dt)
20 | videoResolution: [1920, 1080] # Width x Height for both recording and streaming (Full HD)
21 | videoCodec: mp4v # Video codec (mp4v, XVID, H264)
22 | videoMaxDuration: 300 # Maximum duration per video file (seconds)
23 |
24 | # HTTP streaming configuration (requires Flask)
25 | videoStreamHost: "127.0.0.1" # Server host address (localhost for security)
26 | videoStreamPort: 58080 # Server port (uncommon port to avoid conflicts, auto-increments if occupied)
27 | videoStreamQuality: 100 # JPEG quality (1-100) - Maximum quality
28 | videoStreamBufferSize: 10 # Frame buffer size
29 | videoStreamBindAll: false # Bind to all interfaces (0.0.0.0) instead of localhost - SECURITY RISK if enabled
30 |
--------------------------------------------------------------------------------
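As the comment above notes, the recording FPS is derived from simulation timing rather than configured. A hypothetical sketch of how these settings would feed OpenCV's `VideoWriter` (the `control_dt` value here is an assumption, not taken from the project):

```python
import cv2

control_dt = 1.0 / 60.0                    # assumed control step; the env computes this itself
fps = 1.0 / control_dt                     # "FPS is automatically calculated from simulation timing"
fourcc = cv2.VideoWriter_fourcc(*"mp4v")   # videoCodec: mp4v
writer = cv2.VideoWriter("episode.mp4", fourcc, fps, (1920, 1080))  # videoResolution
```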
/dexhand_env/utils/coordinate_transforms.py:
--------------------------------------------------------------------------------
1 | """
2 | Coordinate transformation utilities for DexHand environment.
3 |
4 | This module provides utilities for transforming points and orientations
5 | between different coordinate frames.
6 | """
7 |
8 | # Import IsaacGym first
9 | from isaacgym.torch_utils import (
10 | quat_rotate,
11 | quat_rotate_inverse,
12 | )
13 |
14 | # The helpers below operate on torch tensors supplied by the caller; no torch import is needed here
15 |
16 |
17 | def point_in_hand_frame(pos_world, hand_pos, hand_rot):
18 | """
19 | Convert a point from world frame to hand frame.
20 |
21 | Args:
22 | pos_world: Position in world frame [batch_size, 3]
23 | hand_pos: Hand position in world frame [batch_size, 3]
24 | hand_rot: Hand rotation quaternion [batch_size, 4] in format [x, y, z, w]
25 |
26 | Returns:
27 | Position in hand frame [batch_size, 3]
28 | """
29 | # Vector from hand to point in world frame
30 | rel_pos = pos_world - hand_pos
31 |
32 | # Use Isaac Gym's optimized quat_rotate_inverse to transform from world to hand frame
33 | local_pos = quat_rotate_inverse(hand_rot, rel_pos)
34 |
35 | return local_pos
36 |
37 |
38 | def point_in_world_frame(pos_local, hand_pos, hand_rot):
39 | """
40 | Convert a point from hand frame to world frame.
41 |
42 | Args:
43 | pos_local: Position in hand frame [batch_size, 3]
44 | hand_pos: Hand position in world frame [batch_size, 3]
45 | hand_rot: Hand rotation quaternion [batch_size, 4] in format [x, y, z, w]
46 |
47 | Returns:
48 | Position in world frame [batch_size, 3]
49 | """
50 | # Use Isaac Gym's optimized quat_rotate to transform from hand to world frame
51 | rotated_pos = quat_rotate(hand_rot, pos_local)
52 |
53 | # Add hand position to get world position
54 | world_pos = rotated_pos + hand_pos
55 |
56 | return world_pos
57 |
--------------------------------------------------------------------------------
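A minimal round-trip check for the two helpers above (hypothetical usage; assumes `isaacgym` and `torch` are importable, with quaternions in `[x, y, z, w]` order as documented):

```python
import torch
from dexhand_env.utils.coordinate_transforms import (
    point_in_hand_frame,
    point_in_world_frame,
)

hand_pos = torch.tensor([[0.1, 0.0, 0.5]])       # hand origin in world frame
hand_rot = torch.tensor([[0.0, 0.0, 0.0, 1.0]])  # identity quaternion [x, y, z, w]
p_world = torch.tensor([[0.2, 0.1, 0.6]])

p_local = point_in_hand_frame(p_world, hand_pos, hand_rot)
p_back = point_in_world_frame(p_local, hand_pos, hand_rot)
assert torch.allclose(p_world, p_back, atol=1e-6)  # the two transforms invert each other
```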
/assets/mjcf/open_ai_assets/hand/reach.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | """Installation script for the 'dexhand' python package."""
2 |
3 | from __future__ import absolute_import
4 | from __future__ import print_function
5 | from __future__ import division
6 |
7 | from setuptools import setup, find_packages
8 |
9 | import os
10 |
11 | root_dir = os.path.dirname(os.path.realpath(__file__))
12 |
13 |
14 | # Minimum dependencies required prior to installation
15 | INSTALL_REQUIRES = [
16 | # RL
17 | "gym==0.23.1",
18 | "torch",
19 | "omegaconf",
20 | "termcolor",
21 | "jinja2",
22 | "hydra-core>=1.2",
23 | "rl-games>=1.6.0",
24 | "tensorboard", # For training metrics logging
25 | "urdfpy==0.0.22",
26 | "pysdf==0.1.9",
27 | "warp-lang==0.10.1",
28 | "trimesh==3.23.5",
29 | # Video dependencies (now required for training workflows)
30 | "opencv-python>=4.5.0", # For video recording
31 | "flask>=2.0.0", # For HTTP video streaming
32 | ]
33 |
34 | # Optional dependencies for additional features (backward compatibility)
35 | # Note: video dependencies are now required in INSTALL_REQUIRES above
36 | EXTRAS_REQUIRE = {
37 | "streaming": [], # Flask now required by default
38 | "video": [], # OpenCV now required by default
39 | "all": [], # All video features now included by default
40 | }
41 |
42 |
43 | # Installation operation
44 | setup(
45 | name="dexhand_env",
46 | author="DexRobot Inc.",
47 | version="0.1.0",
48 | description="Reinforcement learning environment for dexterous manipulation with robotic hands",
49 | keywords=["robotics", "rl", "dexterous", "manipulation"],
50 | include_package_data=True,
51 | python_requires=">=3.6",
52 | install_requires=INSTALL_REQUIRES,
53 | extras_require=EXTRAS_REQUIRE,
54 | packages=find_packages("."),
55 | classifiers=[
56 | "Natural Language :: English",
57 | "Programming Language :: Python :: 3.6",
58 | "Programming Language :: Python :: 3.7",
59 | "Programming Language :: Python :: 3.8",
60 | ],
61 | zip_safe=False,
62 | )
63 |
64 | # EOF
65 |
--------------------------------------------------------------------------------
/dexhand_env/cfg/train/BaseTaskPPO.yaml:
--------------------------------------------------------------------------------
1 | # @package train
2 | algo:
3 | name: a2c_continuous
4 |
5 | model:
6 | name: continuous_a2c_logstd
7 |
8 | network:
9 | name: actor_critic
10 | separate: False
11 |
12 | space:
13 | continuous:
14 | mu_activation: None
15 | sigma_activation: None
16 | mu_init:
17 | name: default
18 | sigma_init:
19 | name: const_initializer
20 | val: 1.0
21 | fixed_sigma: True
22 |
23 | mlp:
24 | units: [512, 256, 128]
25 | activation: elu
26 | d2rl: False
27 |
28 | initializer:
29 | name: default
30 | regularizer:
31 | name: None
32 |
33 | load_checkpoint: false # Will be set to true dynamically in train.py when checkpoint is provided
34 | load_path: ${train.checkpoint}
35 |
36 | config:
37 | name: ${task.name}
38 | full_experiment_name: ${.name}
39 | env_name: ${train.envName}
40 | device: ${env.device}
41 | multi_gpu: False
42 | ppo: True
43 | mixed_precision: False
44 | normalize_input: True
45 | normalize_value: True
46 | value_bootstrap: True
47 | num_actors: ${env.numEnvs}
48 | reward_shaper:
49 | scale_value: 1.0
50 | normalize_advantage: True
51 | gamma: 0.99
52 | tau: 0.95
53 | learning_rate: 3e-4
54 | lr_schedule: adaptive
55 | schedule_type: standard
56 | kl_threshold: 0.008
57 | score_to_win: 10000
58 | max_epochs: ${train.maxIterations}
59 | save_best_after: 1
60 | save_frequency: 100
61 | print_stats: True
62 | grad_norm: 1.0
63 | entropy_coef: 0.0
64 | truncate_grads: True
65 | e_clip: 0.2
66 | horizon_length: 16
67 | minibatch_size: ${env.numEnvs}
68 | mini_epochs: 4
69 | critic_coef: 4
70 | clip_value: True
71 | seq_len: 4
72 | bounds_loss_coef: 0.0001
73 |
74 | # TensorBoard logging configuration
75 | use_tensorboard: True
76 | tensorboard_logdir: 'runs'
77 | log_interval: 1 # Log every epoch
78 | save_interval: 100 # Save checkpoints every 100 epochs
79 |
--------------------------------------------------------------------------------
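One constraint worth keeping in mind (hedged: this reflects rl-games' usual PPO batching, not project-specific code): the rollout batch is `horizon_length * num_actors` and must divide evenly by `minibatch_size`. With `minibatch_size: ${env.numEnvs}`, that holds for any `numEnvs`:

```python
num_envs = 8192                 # env.numEnvs, e.g. from train_headless.yaml
horizon_length = 16
minibatch_size = num_envs       # ${env.numEnvs}

batch_size = horizon_length * num_envs   # 131072 transitions per PPO update
assert batch_size % minibatch_size == 0  # required by rl-games' a2c_continuous
print(batch_size // minibatch_size)      # -> 16 minibatches per mini_epoch
```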
/assets/mjcf/open_ai_assets/hand/shared_asset.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/prompts/TEMPLATE.md:
--------------------------------------------------------------------------------
1 | # [PREFIX]-[NUMBER]-[SHORT-DESCRIPTION].md
2 |
3 | Brief one-line description of the task.
4 |
5 | ## Context
6 |
7 |
8 |
9 |
10 |
11 | ## Current State
12 |
13 |
14 | ## Desired Outcome
15 |
16 |
17 | ## Constraints
18 |
19 |
20 |
21 | ## Implementation Notes
22 |
23 |
24 |
25 | ## Dependencies
26 |
27 |
28 | ---
29 |
30 | ## Instructions
31 |
32 | ### File Naming Convention
33 | Use format: `[prefix]-[number]-[short-description].md`
34 |
35 | **Prefixes:**
36 | - `meta-` - Workflow, tooling, project organization
37 | - `refactor-` - Code quality, architectural improvements
38 | - `feat-` - New functionality, API enhancements
39 | - `fix-` - Bug fixes, issue resolution
40 | - `rl-` - Research tasks (policy tuning, physics, rewards)
41 |
42 | **Examples:**
43 | - `refactor-001-episode-length.md`
44 | - `rl-002-reward-tuning.md`
45 | - `feat-003-new-observation.md`
46 |
47 | ### Content Guidelines
48 |
49 | **Minimal Format (for simple tasks):**
50 | ```markdown
51 | # refactor-001-example-task.md
52 | Brief description of what needs to be done.
53 | ```
54 |
55 | **Expanded Format (after Ultrathink phase):**
56 | - Fill in relevant sections as understanding deepens
57 | - Not all sections required for every task
58 | - Use during Phase 1 (Ultrathink) to develop detailed understanding
59 |
60 | ### Workflow Integration
61 |
62 | 1. **Create**: Start with minimal format (brief description)
63 | 2. **Ultrathink**: Expand sections as needed during Phase 1
64 | 3. **Plan**: Reference expanded content during Phase 2 planning
65 | 4. **Complete**: Mark as done in ROADMAP.md after Phase 3
66 |
--------------------------------------------------------------------------------
/prompts/refactor-001-episode-length.md:
--------------------------------------------------------------------------------
1 | # refactor-001-episode-length.md
2 |
3 | Resolve architectural inconsistency in episodeLength configuration placement.
4 |
5 | ## Context
6 |
7 | The DexHand codebase has an architectural inconsistency where `episodeLength` is placed in different config sections:
8 | - BaseTask.yaml: `task.episodeLength: 300`
9 | - BlindGrasping.yaml: `env.episodeLength: 500`
10 | - DexHandBase code: expects `env_cfg["episodeLength"]`
11 |
12 | This creates potential runtime failures when BaseTask is used directly, since the code looks for the parameter in the env section but BaseTask defines it in the task section.
13 |
14 | ## Current State
15 |
16 | - **BaseTask.yaml**: Places `episodeLength` in `task` section (line 24)
17 | - **BlindGrasping.yaml**: Places `episodeLength` in `env` section (line 15)
18 | - **DexHandBase**: Reads from `self.env_cfg["episodeLength"]` (line 148)
19 | - **CLI Integration**: `dexhand_test.py` overrides `cfg["env"]["episodeLength"]`
20 |
21 | ## Desired Outcome
22 |
23 | Consistent placement of `episodeLength` parameter across all config files and code, eliminating the architectural inconsistency.
24 |
25 | ## Constraints
26 |
27 | - Must maintain backward compatibility with existing BlindGrasping.yaml
28 | - Must align with existing code expectations in DexHandBase
29 | - Must preserve CLI override functionality
30 | - Should follow architectural principles for config section organization
31 |
32 | ## Implementation Notes
33 |
34 | **Expert Consensus Results:**
35 | - Multiple AI models unanimously agreed `episodeLength` belongs in `env` section
36 | - Parameter is classified as runtime execution constraint, similar to `numEnvs`, `device`
37 | - Simple fix: move one line from task to env section in BaseTask.yaml
38 |
39 | **Key Insight from User Challenge:**
40 | `episodeLength` has task-semantic properties (affects task difficulty/feasibility) but is architecturally an environment runtime constraint (timeout mechanism). Current code expects env section placement.
41 |
42 | **Technical Approach:**
43 | 1. Move `episodeLength: 300` from `task` to `env` section in BaseTask.yaml
44 | 2. Test BaseTask environment creation to verify fix
45 | 3. No code changes required (DexHandBase already expects env section)
46 |
47 | ## Dependencies
48 |
49 | None - this is a standalone configuration fix.
50 |
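51 | ## Illustration
52 |
53 | A minimal sketch (not project code) of why placement matters: `DexHandBase` reads `env_cfg["episodeLength"]`, so only the `env` placement resolves.
54 |
55 | ```python
56 | from omegaconf import OmegaConf
57 |
58 | # Before: BaseTask.yaml places the key under task
59 | before = OmegaConf.create({"task": {"episodeLength": 300}, "env": {}})
60 | # After: key moved to env, matching DexHandBase and BlindGrasping.yaml
61 | after = OmegaConf.create({"task": {}, "env": {"episodeLength": 300}})
62 |
63 | assert "episodeLength" not in before.env  # this lookup path fails at runtime
64 | assert after.env.episodeLength == 300     # consistent with code expectations
65 | ```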
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/hand/manipulate_pen.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/hand/manipulate_pen_touch_sensors.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/prompts/fix-001-reward-logging-logic.md:
--------------------------------------------------------------------------------
1 | # fix-001-reward-logging-logic.md
2 |
3 | Fix RewardComponentObserver logging cumulative averages instead of windowed statistics.
4 |
5 | ## Context
6 |
7 | RewardComponentObserver currently logs cumulative averages from training start instead of meaningful windowed statistics. This creates slowly-changing metrics that mask recent performance trends and provide poor insight into training progress.
8 |
9 | ## Current State
10 |
11 | **Flawed cumulative logic:**
12 | ```python
13 | # Accumulates forever (never resets except after_clear_stats)
14 | self.cumulative_sums["all"][component_name]["rewards"] += total_reward
15 | self.cumulative_sums["all"][component_name]["steps"] += total_steps
16 |
17 | # Logs cumulative average since training start
18 | step_mean = cum_rewards / max(cum_steps, 1)
19 | ```
20 |
21 | This produces metrics like "average reward per step since training began" instead of "average reward per step over recent episodes."
22 |
23 | ## Desired Outcome
24 |
25 | Replace cumulative statistics with windowed statistics that reset after each logging interval, providing meaningful trending data.
26 |
27 | ## Implementation Notes
28 |
29 | **Windowed statistics approach:**
30 | ```python
31 | def _log_to_tensorboard(self):
32 | # Calculate window averages for current interval
33 | for component_name in self.discovered_components:
34 | window_rewards = self.cumulative_sums["all"][component_name]["rewards"]
35 | window_steps = self.cumulative_sums["all"][component_name]["steps"]
36 | step_mean = window_rewards / max(window_steps, 1)
37 |
38 | # Log windowed average
39 | self.writer.add_scalar(f"reward_breakdown/all/raw/step/{component_name}", step_mean, frame)
40 |
41 | # Reset window for next interval
42 | for term_type in self.cumulative_sums:
43 | for component_name in self.cumulative_sums[term_type]:
44 | self.cumulative_sums[term_type][component_name] = {"rewards": 0.0, "steps": 0}
45 | ```
46 |
47 | **Benefits:**
48 | - Meaningful trending: shows performance changes over time
49 | - Better training insights: recent performance vs long-term averages
50 | - Fewer redundant data points: values change meaningfully between logs
51 |
52 | ## Constraints
53 |
54 | - Keep episode meters (rolling averages) unchanged
55 | - Maintain log_interval in episodes (makes sense for parallel environments)
56 | - Preserve component responsibility separation
57 |
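58 | ## Illustration
59 |
60 | A toy example (invented numbers, not project code) of why windowed averages trend while cumulative averages mask improvement:
61 |
62 | ```python
63 | rewards = [1.0] * 100 + [5.0] * 100            # policy improves halfway through
64 |
65 | cumulative_avg = sum(rewards) / len(rewards)   # 3.0 - the jump is diluted
66 | windowed_avg = sum(rewards[-100:]) / 100       # 5.0 - recent trend is visible
67 | ```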
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/hand/manipulate_egg.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/hand/manipulate_block.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/docs/GETTING_STARTED.md:
--------------------------------------------------------------------------------
1 | # Getting Started with DexHand
2 |
3 | This guide will get you from zero to your first trained RL policy in under 10 minutes.
4 |
5 | ## Prerequisites
6 |
7 | - NVIDIA GPU with CUDA support
8 | - Python 3.8+
9 | - Isaac Gym Preview 4 (see [installation instructions](https://developer.nvidia.com/isaac-gym))
10 |
11 | ## Quick Setup (5 minutes)
12 |
13 | ### 1. Install Isaac Gym
14 | ```bash
15 | # Download Isaac Gym Preview 4 from NVIDIA
16 | # Follow their installation instructions, then verify:
17 | cd isaacgym/python/examples
18 | python joint_monkey.py # Should show a working simulation
19 | ```
20 |
21 | ### 2. Clone and Install DexHand
22 | ```bash
23 | git clone --recursive https://github.com/dexrobot/dexrobot_isaac
24 | cd dexrobot_isaac
25 | pip install -e .
26 | ```
27 |
28 | > **Missing submodules?** Run `git submodule update --init --recursive` to fetch required robot models.
29 |
30 | ### 3. Verify Installation
31 | ```bash
32 | # Quick test (should show hand visualization)
33 | python examples/dexhand_test.py --num-envs 1 --episode-length 100
34 | ```
35 |
36 | You should see an Isaac Gym window with a dexterous hand in the simulation.
37 |
38 | ## Your First Training (3 minutes)
39 |
40 | ### 1. Start Training
41 | ```bash
42 | # Train a basic policy
43 | python train.py task=BaseTask numEnvs=512
44 | ```
45 |
46 | This creates a new experiment in `runs/BaseTask_train_YYMMDD_HHMMSS/` and begins training.
47 |
48 | ### 2. Test Your Trained Policy
49 | ```bash
50 | # Test with visualization
51 | python train.py task=BaseTask test=true checkpoint=latest viewer=true numEnvs=4
52 | ```
53 |
54 | The system automatically finds your latest training checkpoint and visualizes the learned policy.
55 |
56 | ## Next Steps
57 |
58 | - **[Training Guide](TRAINING.md)** - Comprehensive training workflows, testing options, and experiment management
59 | - **[Task Creation Guide](guide-task-creation.md)** - Create custom manipulation tasks
60 | - **[Troubleshooting](TROUBLESHOOTING.md)** - Solutions for common setup and runtime issues
61 | - **[System Architecture](ARCHITECTURE.md)** - Understanding the component-based design
62 |
63 | ## Quick Troubleshooting
64 |
65 | - **ImportError: isaacgym** → Isaac Gym not installed correctly
66 | - **CUDA out of memory** → Reduce `numEnvs` (try 256 or 128)
67 | - **Missing assets/dexrobot_mujoco** → Run `git submodule update --init --recursive`
68 | - **No checkpoints found** → Training hasn't saved checkpoints yet (wait longer)
69 |
70 | For detailed troubleshooting, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md).
71 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/hand/manipulate_egg_touch_sensors.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/hand/manipulate_block_touch_sensors.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/prompts/fix-010-max-deltas.md:
--------------------------------------------------------------------------------
1 | # fix-010-max-deltas.md
2 |
3 | Investigate and verify max_deltas scaling correctness in ActionProcessor.
4 |
5 | ## Context
6 |
7 | Based on the task description, there was concern that max_deltas scaling uses incorrect timing values due to a control_dt vs physics_dt initialization bug. However, static code analysis indicates that the current implementation is architecturally correct.
8 |
9 | ## Current State
10 |
11 | **ActionProcessor Implementation Analysis:**
12 | - `_precompute_max_deltas()` correctly called during Stage 2 (`finalize_setup()`) after control_dt measurement
13 | - Uses property decorator to access `self.parent.physics_manager.control_dt` (single source of truth)
14 | - Calculation: `max_deltas = control_dt * velocity_limit` is mathematically sound
15 | - Two-stage initialization pattern properly implemented
16 |
17 | **Configuration Values:**
18 | - BlindGrasping: `max_finger_joint_velocity: 0.5`, `sim.dt: 0.01` (physics_dt)
19 | - BaseTask: `max_finger_joint_velocity: 1.0`, `sim.dt: 0.005` (physics_dt)
20 |
21 | **Expected Calculations:**
22 | - BlindGrasping: control_dt = 0.02 (0.01 × 2 steps), max_deltas = 0.5 × 0.02 = 0.01
23 | - BaseTask: control_dt ≈ 0.005 (0.005 × 1 step), max_deltas = 1.0 × 0.005 = 0.005
24 |
25 | ## Desired Outcome
26 |
27 | **Verification Required:**
28 | 1. Confirm current implementation calculates correct max_deltas values
29 | 2. Validate no control_dt vs physics_dt confusion exists
30 | 3. Update task status based on actual findings
31 |
32 | **Possible Outcomes:**
33 | - **If correct**: Mark task as invalid/completed, no changes needed
34 | - **If incorrect**: Identify actual root cause and implement fix
35 |
36 | ## Constraints
37 |
38 | - Follow existing two-stage initialization pattern
39 | - Maintain fail-fast philosophy (no defensive programming)
40 | - Preserve single source of truth for control_dt
41 | - Respect ActionProcessor component boundaries
42 |
43 | ## Implementation Notes
44 |
45 | **Investigation Steps:**
46 | 1. Add temporary debug logging to verify actual calculated values
47 | 2. Test with both BlindGrasping and BaseTask configurations
48 | 3. Compare expected vs actual max_deltas values
49 | 4. Identify if discrepancy exists and locate root cause
50 |
51 | **Potential Issues to Check:**
52 | - Property decorator functioning correctly
53 | - control_dt measurement accuracy
54 | - Configuration parameter loading
55 | - Physics steps calculation
56 |
57 | ## Dependencies
58 |
59 | - Requires understanding of two-stage initialization pattern
60 | - Depends on PhysicsManager control_dt measurement accuracy
61 |
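62 | ## Verification Sketch
63 |
64 | A standalone check (function name hypothetical) mirroring the expected calculations above:
65 |
66 | ```python
67 | def expected_max_delta(physics_dt, steps_per_control, velocity_limit):
68 |     control_dt = physics_dt * steps_per_control  # measured during Stage 2
69 |     return velocity_limit * control_dt
70 |
71 | assert abs(expected_max_delta(0.01, 2, 0.5) - 0.01) < 1e-9    # BlindGrasping
72 | assert abs(expected_max_delta(0.005, 1, 1.0) - 0.005) < 1e-9  # BaseTask
73 | ```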
--------------------------------------------------------------------------------
/prompts/fix-006-metadata-keys.md:
--------------------------------------------------------------------------------
1 | # fix-006-metadata-keys.md
2 |
3 | ✅ **COMPLETED** - Fix git metadata saving failures caused by hardcoded config key assumptions.
4 |
5 | ## Context
6 |
7 | The training script's git metadata saving functionality tries to reconstruct CLI arguments by hardcoding expected config keys like `env.render`. This violates fail-fast principles and breaks when configuration keys change (as happened in refactor-004-render.md where `render` became `viewer`).
8 |
9 | Current error:
10 | ```
11 | WARNING | Could not save git metadata: Key 'render' is not in struct
12 | full_key: env.render
13 | object_type=dict
14 | ```
15 |
16 | ## Current State
17 |
18 | The `get_config_overrides()` function in train.py attempts to reconstruct CLI arguments for reproducibility by:
19 | 1. Hardcoding assumed "important" config keys
20 | 2. Building a synthetic "Hydra equivalent" command
21 | 3. Failing when keys don't exist (violating fail-fast)
22 |
23 | This approach has fundamental flaws:
24 | - **Information Loss**: Only captures subset of assumed important values
25 | - **Hardcoded Assumptions**: Breaks when config structure changes
26 | - **Incomplete Reconstruction**: Cannot fully reproduce complex configuration hierarchies
27 | - **Defensive Programming**: Uses existence checks that mask configuration issues
28 |
29 | ## Desired Outcome
30 |
31 | Replace flawed reconstruction approach with comprehensive config saving:
32 |
33 | 1. **Remove hardcoded key assumptions** - eliminate `get_config_overrides()` function entirely
34 | 2. **Save complete resolved config** - serialize the full OmegaConf configuration as YAML
35 | 3. **Preserve existing working functionality** - keep original CLI args and git info unchanged
36 | 4. **Full reproducibility** - anyone can recreate exact training conditions
37 |
38 | ## Constraints
39 |
40 | - **Fail-fast compliance**: No defensive programming or hardcoded key checks
41 | - **Single source of truth**: Config values come from resolved configuration only
42 | - **Architectural alignment**: Follows recent configuration refactoring principles
43 | - **Backward compatibility**: Don't break existing experiment tracking
44 |
45 | ## Implementation Notes
46 |
47 | **Files to modify:**
48 | - `train.py`: Remove `get_config_overrides()`, save complete config instead
49 |
50 | **Testing approach:**
51 | - Verify no warnings during metadata saving
52 | - Confirm complete config is saved in readable format
53 | - Test with BaseTask and BlindGrasping configurations
54 |
55 | **Config serialization considerations:**
56 | - Use `OmegaConf.to_yaml()` for human-readable format
57 | - Handle any non-serializable objects gracefully
58 | - Save to dedicated file for easy inspection
59 |
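60 | **Sketch of the replacement (function and file names assumed, not final code):**
61 |
62 | ```python
63 | from pathlib import Path
64 | from omegaconf import OmegaConf
65 |
66 | def save_resolved_config(cfg, run_dir: Path) -> None:
67 |     """Serialize the fully resolved config instead of reconstructing CLI args."""
68 |     (run_dir / "resolved_config.yaml").write_text(OmegaConf.to_yaml(cfg, resolve=True))
69 | ```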
--------------------------------------------------------------------------------
/prompts/refactor-006-action-processing.md:
--------------------------------------------------------------------------------
1 | # refactor-006-action-processing.md
2 |
3 | Split action processing timing to align with RL rollout patterns for better clarity and coherence.
4 |
5 | ## Context
6 |
7 | The current action processing logic is bundled together in `pre_physics_step()`, which doesn't align well with standard RL rollout patterns. The refactoring will split action processing into two phases:
8 | - Pre-action computation in post_physics (step N-1) to prepare DOF targets for next step's observations
9 | - Post-action processing in pre_physics (step N) to apply policy actions
10 |
11 | This improves clarity and makes the timing more coherent with RL frameworks where observations for step N are computed in step N-1.
12 |
13 | ## Current State
14 |
15 | **Current Flow (in `pre_physics_step()`):**
16 | 1. Compute observations excluding active_rule_targets
17 | 2. Apply pre-action rule → compute active_rule_targets
18 | 3. Add active_rule_targets to observations
19 | 4. Process policy actions (action rule + post filters + coupling)
20 |
21 | **Current `post_physics_step()`:**
22 | - Only processes rewards, termination, resets
23 | - Returns already-computed observations from pre_physics_step
24 | - Comments indicate "Observations were already computed in pre_physics_step"
25 |
26 | ## Desired Outcome
27 |
28 | **New Flow:**
29 |
30 | **Post-physics (step N-1):**
31 | 1. Compute observations for step N (excluding active_rule_targets)
32 | 2. Apply pre-action rule using these observations → get active_rule_targets
33 | 3. Add active_rule_targets to observations
34 | 4. Return complete observations for step N
35 |
36 | **Pre-physics (step N):**
37 | - Apply policy actions only (action rule + post filters + coupling)
38 | - Skip observation computation and pre-action rule
39 |
40 | ## Constraints
41 |
42 | - Must preserve two-stage initialization pattern
43 | - Must respect component boundaries and single source of truth
44 | - Must not break existing tasks or functionality
45 | - Should maintain or improve performance
46 |
47 | ## Implementation Notes
48 |
49 | **Key Changes Required:**
50 | 1. **StepProcessor**: Move observation computation and pre-action rule to post_physics
51 | 2. **DexHandBase**: Modify pre_physics_step to only handle post-action processing
52 | 3. **ActionProcessor**: Ensure pre-action and post-action can be called separately
53 |
54 | **Component Modifications:**
55 | - `StepProcessor.process_physics_step()`: Add observation computation + pre-action
56 | - `DexHandBase.pre_physics_step()`: Remove stages 1-3, keep only stage 4
57 | - Ensure ActionProcessor can handle split pre/post action processing
58 |
59 | **Testing Approach:**
60 | - Verify identical behavior before/after refactoring
61 | - Test with existing BlindGrasping task
62 | - Validate timing and performance
63 |
64 | ## Dependencies
65 |
66 | None - this is a self-contained timing refactoring.
67 |
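68 | ## Flow Sketch
69 |
70 | A schematic of the target split (method names assumed for illustration):
71 |
72 | ```python
73 | def post_physics_step(self):  # runs at step N-1
74 |     self.process_rewards_terminations_resets()
75 |     obs = self.compute_observations()                     # observations for step N
76 |     obs["active_rule_targets"] = self.apply_pre_action_rule(obs)
77 |     return obs                                            # complete obs for step N
78 |
79 | def pre_physics_step(self, actions):  # runs at step N
80 |     # Only post-action processing: action rule + post filters + coupling
81 |     self.action_processor.process_policy_actions(actions)
82 | ```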
--------------------------------------------------------------------------------
/dexhand_env/cfg/config.yaml:
--------------------------------------------------------------------------------
1 | # Main Hydra configuration for DexHand training and testing
2 | # Training: python train.py
3 | # Testing: python examples/dexhand_test.py
4 | # Override: python train.py task=BlindGrasping env.numEnvs=2048
5 | # Override: python examples/dexhand_test.py task=BlindGrasping headless=true steps=500
6 |
7 | defaults:
8 | - train: BaseTaskPPO
9 | - base/video
10 | - _self_
11 | - task: BaseTask
12 |
13 | # Task and training configs will be loaded from task/ and train/ subdirectories
14 | # The following are runtime overrides
15 |
16 | device: "cuda:0" # Device for simulation and RL
17 |
18 | # Physics engine at root level (required by VecTask)
19 | physics_engine: physx
20 |
21 | # Simulation configuration
22 | sim:
23 | substeps: 4
24 | gravity: [0.0, 0.0, -9.81]
25 | num_client_threads: 0
26 | physx:
27 | solver_type: 1
28 | num_position_iterations: 16
29 | num_velocity_iterations: 0
30 | contact_offset: 0.001
31 | rest_offset: 0.0005
32 | bounce_threshold_velocity: 0.15
33 | max_depenetration_velocity: 0.2
34 | default_buffer_size_multiplier: 4.0
35 | num_subscenes: 0
36 | contact_collection: 1
37 | gpu_contact_pairs_per_env: 512
38 | always_use_articulations: true
39 | num_threads: 4
40 | dt: 0.005
41 | graphicsDeviceId: 0 # Graphics device ID for rendering
42 |
43 | # Environment configuration
44 | env:
45 | numEnvs: 1024 # Basic config for training, can be overridden
46 | device: "cuda:0"
47 | viewer: false # Interactive visualization window
48 | videoRecord: false # Save video files to disk
49 | videoStream: false # Stream video over network
50 | controlMode: "position" # Control mode (position/position_delta) - can be overridden by tasks
51 | clipObservations: .inf # Observation clipping limit (inf = no clipping)
52 | clipActions: .inf # Action clipping limit (inf = no clipping)
53 |
54 |
55 | # Task configuration (RL task definition only)
56 | # Task-specific settings inherited from task/ configs
57 |
58 | # Training configuration (algorithm and training process)
59 | train:
60 | seed: 42
61 | torchDeterministic: false
62 | maxIterations: 10000
63 | test: false
64 | checkpoint: null
65 | envName: "rlgpu_dexhand" # RL Games environment name
66 | reloadInterval: 30 # Seconds between checkpoint reloads in test mode
67 | testGamesNum: 100 # Number of games to evaluate in test mode (0 = indefinite)
68 | logging:
69 | experimentName: null # Auto-generated if null
70 | rewardLogInterval: 100 # Log reward breakdown every N finished episodes (prevents TensorBoard sampling limit)
71 | logLevel: "info"
72 | noLogFile: false
73 | experiment:
74 | maxTrainRuns: 10 # Maximum number of recent training runs to keep in workspace
75 | maxTestRuns: 10 # Maximum number of recent testing runs to keep in workspace
76 |
--------------------------------------------------------------------------------
/prompts/feat-001-video-fps-control.md:
--------------------------------------------------------------------------------
1 | # feat-001-video-fps-control.md
2 |
3 | Implement automatic video FPS calculation based on simulation timing to ensure accurate playback speed.
4 |
5 | ## Context
6 |
7 | The current video recording system uses a fixed `videoFps` configuration that's independent of simulation timing, causing videos to play at incorrect speeds. This creates temporal inaccuracy where recorded videos don't match the actual simulation playback speed.
8 |
9 | **Root Cause**: VideoRecorder uses hardcoded FPS while simulation runs at a different frequency determined by `control_dt`.
10 |
11 | **Physics Relationship**: Simulation frequency = 1/control_dt, but video encoding uses unrelated config FPS.
12 |
13 | ## Current State
14 |
15 | - VideoRecorder initialized with fixed `env.videoFps` from config (default: 60.0)
16 | - All frames captured during render() are recorded without timing consideration
17 | - Video playback speed incorrect when simulation frequency ≠ configured videoFps
18 | - Example: 50Hz simulation (control_dt=0.02) + 60fps config = 1.2x speed video
19 |
20 | ## Desired Outcome
21 |
22 | - Video FPS automatically calculated as `1.0 / control_dt` for accurate real-time playback
23 | - Remove obsolete `videoFps` configuration option
24 | - Videos play back at correct simulation speed regardless of physics timing
25 | - Maintain all temporal information without frame dropping
26 |
27 | ## Constraints
28 |
29 | - **Two-Stage Initialization**: VideoRecorder created before `control_dt` is measured
30 | - **Component Architecture**: Must follow established `finalize_setup()` pattern like ActionProcessor
31 | - **Fail-Fast Philosophy**: No defensive programming - crash if VideoRecorder used before finalization
32 | - **Single Source of Truth**: FPS comes from physics timing, not config
33 |
34 | ## Implementation Notes
35 |
36 | **Architecture Pattern**: Implement deferred FPS setting following ActionProcessor model:
37 |
38 | 1. **Phase 1 (Creation)**: VideoRecorder(fps=None) before control_dt available
39 | 2. **Phase 2 (Finalization)**: video_recorder.finalize_fps(1.0 / control_dt) after measurement
40 |
41 | **Key Changes**:
42 | - Add `finalize_fps()` method to VideoRecorder class
43 | - Modify initialization to accept fps=None initially
44 | - Add finalization call in `_perform_control_cycle_measurement()`
45 | - Remove `videoFps` from base/video.yaml
46 | - Update create_video_recorder_from_config() to handle missing fps
47 |
48 | **Testing Approach**:
49 | - Verify different control_dt values produce correct video FPS
50 | - Test initialization order (finalize before recording)
51 | - Validate video playback speed matches simulation timing
52 |
53 | ## Dependencies
54 |
55 | - Requires understanding of two-stage initialization pattern
56 | - Must preserve existing video recording functionality
57 | - Should maintain HTTP streaming compatibility (uses separate FPS logic)
58 |
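59 | ## Pattern Sketch
60 |
61 | A minimal sketch of the deferred-FPS pattern (names assumed, not the final API):
62 |
63 | ```python
64 | from typing import Optional
65 |
66 | class VideoRecorder:
67 |     def __init__(self, fps: Optional[float] = None):
68 |         self.fps = fps  # None until control_dt is measured
69 |
70 |     def finalize_fps(self, fps: float) -> None:
71 |         self.fps = fps
72 |
73 |     def start_recording(self) -> None:
74 |         if self.fps is None:
75 |             raise RuntimeError("finalize_fps() must be called first")  # fail fast
76 |
77 | # After control_dt measurement:
78 | # recorder.finalize_fps(1.0 / control_dt)
79 | ```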
--------------------------------------------------------------------------------
/prompts/feat-000-streaming-port.md:
--------------------------------------------------------------------------------
1 | # HTTP Streaming Port Management Enhancement
2 |
3 | ## Status: ✅ COMPLETED (2025-07-30)
4 |
5 | ## Original Requirements
6 | - Change the default port to an uncommon one
7 | - Automatically increment port if the port is already in use
8 | - Add a quick option to bind all interfaces, not just localhost
9 |
10 | ## Implementation Summary
11 |
12 | ### Core Features Delivered
13 | 1. **Uncommon Default Port**: Changed from conflict-prone 8080 to 58080 (~90% fewer conflicts expected)
14 | 2. **Automatic Port Resolution**: Robust port conflict detection with up to 10 retry attempts (58080 → 58081 → ...)
15 | 3. **All-Interface Binding**: Optional 0.0.0.0 binding with security warnings via `videoStreamBindAll` config option
16 | 4. **CLI Convenience**: Added `streamBindAll` alias for easy command-line usage
17 |
18 | ### Architecture Improvements
19 | - **Single Source of Truth**: Host decision made once in constructor (eliminated repeated conditionals)
20 | - **Robust Port Detection**: Pre-test port availability using socket binding before Flask server start
21 | - **Clean Configuration**: Enhanced base/video.yaml with comprehensive security documentation
22 | - **Fail-Fast Philosophy**: No defensive programming, clear error handling and logging
23 |
24 | ### Files Modified
25 | - `base/video.yaml`: Updated port to 58080, added videoStreamBindAll option with security docs
26 | - `train.py`: Added stream_bind_all mapping for configuration processing
27 | - `cli_utils.py`: Added streamBindAll CLI alias
28 | - `http_video_streamer.py`: Enhanced constructor, port auto-increment logic, statistics
29 |
30 | ### Testing Verified
31 | - ✅ Port auto-increment functionality (58080 occupied → auto-increments to 58081)
32 | - ✅ Bind-all mode (correctly binds to 0.0.0.0 with security warnings)
33 | - ✅ CLI aliases work seamlessly (`streamBindAll=true`)
34 | - ✅ Configuration loading with proper defaults
35 | - ✅ Server accessibility and enhanced statistics reporting
36 |
37 | ### Usage Examples
38 | ```bash
39 | # Default configuration (localhost:58080)
40 | python train.py env.videoStream=true
41 |
42 | # With all-interface binding
43 | python train.py env.videoStream=true streamBindAll=true
44 |
45 | # Port conflicts automatically resolved
46 | # If 58080 is occupied, automatically uses 58081, 58082, etc.
47 | ```
48 |
49 | ### Impact
50 | - Reduces port conflicts by ~90% with uncommon default port
51 | - Automatic conflict resolution eliminates manual intervention
52 | - Flexible deployment options while maintaining security-conscious defaults
53 | - Improved user experience with clear logging and CLI shortcuts
54 | - Complete port management infrastructure with ~88 lines of focused changes
55 |
56 | ## Architecture Compliance
57 | - ✅ Fail-fast philosophy (no defensive programming)
58 | - ✅ Single source of truth for configuration
59 | - ✅ Component boundaries maintained
60 | - ✅ Clean code with minimal surface area changes
61 |
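62 | ## Port Probe Sketch
63 |
64 | An illustrative version (not the shipped code) of the socket pre-test described under Architecture Improvements:
65 |
66 | ```python
67 | import socket
68 |
69 | def find_free_port(start: int = 58080, attempts: int = 10) -> int:
70 |     for port in range(start, start + attempts):
71 |         with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
72 |             try:
73 |                 s.bind(("127.0.0.1", port))
74 |                 return port  # free - the Flask server can bind it next
75 |             except OSError:
76 |                 continue     # occupied - try the next port
77 |     raise RuntimeError(f"No free port in {start}-{start + attempts - 1}")
78 | ```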
--------------------------------------------------------------------------------
/dexhand_env/rl/__init__.py:
--------------------------------------------------------------------------------
1 | """
2 | RL integration for DexHand environment.
3 |
4 | This module provides integration with RL libraries like rl_games.
5 | """
6 |
7 | from rl_games.common import env_configurations, vecenv
8 | from loguru import logger
9 |
10 |
11 | def register_rlgames_env():
12 | """Register the DexHand environment with rl_games."""
13 | from dexhand_env.factory import make_env
14 |
15 | def create_env(**kwargs):
16 | # Extract parameters
17 | num_envs = kwargs.pop("num_actors", 1024)
18 | sim_device = kwargs.pop("sim_device", "cuda:0")
19 | rl_device = kwargs.pop("rl_device", "cuda:0")
20 | graphics_device_id = kwargs.pop("graphics_device_id", 0)
21 | headless = kwargs.pop("headless", False)
22 | cfg = kwargs.pop("cfg", None)
23 | task_name = kwargs.pop("task_name", "BaseTask")
24 |
25 | # Create and return the environment directly
26 | return make_env(
27 | task_name=task_name,
28 | num_envs=num_envs,
29 | sim_device=sim_device,
30 | rl_device=rl_device,
31 | graphics_device_id=graphics_device_id,
32 | headless=headless,
33 | cfg=cfg,
34 | )
35 |
36 | # Register vecenv type for DexHand
37 | # Since DexHand already implements the standard Gym interface,
38 | # we can use a simple wrapper that just passes through calls
39 | class DexHandVecEnv(vecenv.IVecEnv):
40 | def __init__(self, config_name, num_actors, **kwargs):
41 | self.env = env_configurations.configurations[config_name]["env_creator"](
42 | **kwargs
43 | )
44 |
45 | def step(self, actions):
46 | return self.env.step(actions)
47 |
48 | def reset(self):
49 | return self.env.reset()
50 |
51 | def get_number_of_agents(self):
52 | return 1 # Single agent per environment
53 |
54 | def get_env_info(self):
55 | return {
56 | "action_space": self.env.action_space,
57 | "observation_space": self.env.observation_space,
58 | "num_envs": self.env.num_envs,
59 | }
60 |
61 | # Register the vecenv implementation
62 | vecenv.register(
63 | "RLGPU",
64 | lambda config_name, num_actors, **kwargs: DexHandVecEnv(
65 | config_name, num_actors, **kwargs
66 | ),
67 | )
68 |
69 | # Register with rl_games
70 | env_configurations.register(
71 | "rlgpu_dexhand",
72 | {
73 | "vecenv_type": "RLGPU",
74 | "env_creator": create_env,
75 | },
76 | )
77 |
78 | logger.info("DexHand environment registered with rl_games")
79 |
80 |
81 | # Import after function definitions to avoid import order issues
82 | from .reward_observer import RewardComponentObserver # noqa: E402
83 |
84 | __all__ = ["register_rlgames_env", "RewardComponentObserver"]
85 |
--------------------------------------------------------------------------------
/dexhand_env/constants.py:
--------------------------------------------------------------------------------
1 | """
2 | Central constants and configuration for DexHand environment.
3 |
4 | This module defines all shared constants to ensure single source of truth.
5 | """
6 |
7 | # DOF dimensions
8 | NUM_BASE_DOFS = 6 # ARTx, ARTy, ARTz, ARRx, ARRy, ARRz
9 | NUM_ACTIVE_FINGER_DOFS = 12 # 12 finger controls mapping to 19 DOFs with coupling
10 | NUM_TOTAL_FINGER_DOFS = 20 # 5 fingers × 4 joints (including fixed joint3_1)
11 | NUM_FINGERS = 5 # Thumb, index, middle, ring, pinky
12 |
13 | # Joint names
14 | BASE_JOINT_NAMES = ["ARTx", "ARTy", "ARTz", "ARRx", "ARRy", "ARRz"]
15 |
16 | FINGER_JOINT_NAMES = [
17 | "r_f_joint1_1",
18 | "r_f_joint1_2",
19 | "r_f_joint1_3",
20 | "r_f_joint1_4",
21 | "r_f_joint2_1",
22 | "r_f_joint2_2",
23 | "r_f_joint2_3",
24 | "r_f_joint2_4",
25 | "r_f_joint3_1",
26 | "r_f_joint3_2",
27 | "r_f_joint3_3",
28 | "r_f_joint3_4",
29 | "r_f_joint4_1",
30 | "r_f_joint4_2",
31 | "r_f_joint4_3",
32 | "r_f_joint4_4",
33 | "r_f_joint5_1",
34 | "r_f_joint5_2",
35 | "r_f_joint5_3",
36 | "r_f_joint5_4",
37 | ]
38 |
39 | # Body names for fingertips and fingerpads
40 | FINGERTIP_BODY_NAMES = [
41 | "r_f_link1_tip",
42 | "r_f_link2_tip",
43 | "r_f_link3_tip",
44 | "r_f_link4_tip",
45 | "r_f_link5_tip",
46 | ]
47 |
48 | FINGERPAD_BODY_NAMES = [
49 | "r_f_link1_pad",
50 | "r_f_link2_pad",
51 | "r_f_link3_pad",
52 | "r_f_link4_pad",
53 | "r_f_link5_pad",
54 | ]
55 |
56 | # Finger DOF coupling mapping (12 actions -> 19 DOFs)
57 | # Actions map to finger DOFs as follows:
58 | # 0: r_f_joint1_1 (thumb spread)
59 | # 1: r_f_joint1_2 (thumb MCP)
60 | # 2: r_f_joint1_3, r_f_joint1_4 (thumb DIP - coupled)
61 | # 3: r_f_joint2_1, r_f_joint4_1, r_f_joint5_1 (finger spread - coupled, 5_1 is 2x)
62 | # 4: r_f_joint2_2 (index MCP)
63 | # 5: r_f_joint2_3, r_f_joint2_4 (index DIP - coupled)
64 | # 6: r_f_joint3_2 (middle MCP)
65 | # 7: r_f_joint3_3, r_f_joint3_4 (middle DIP - coupled)
66 | # 8: r_f_joint4_2 (ring MCP)
67 | # 9: r_f_joint4_3, r_f_joint4_4 (ring DIP - coupled)
68 | # 10: r_f_joint5_2 (pinky MCP)
69 | # 11: r_f_joint5_3, r_f_joint5_4 (pinky DIP - coupled)
70 | # Note: r_f_joint3_1 is fixed at 0 (not controlled)
71 | FINGER_COUPLING_MAP = {
72 | 0: ["r_f_joint1_1"], # thumb spread
73 | 1: ["r_f_joint1_2"], # thumb MCP
74 | 2: ["r_f_joint1_3", "r_f_joint1_4"], # thumb DIP (coupled)
75 | 3: [
76 | ("r_f_joint2_1", 1.0),
77 | ("r_f_joint4_1", 1.0),
78 | ("r_f_joint5_1", 2.0),
79 | ], # finger spread (5_1 is 2x)
80 | 4: ["r_f_joint2_2"], # index MCP
81 | 5: ["r_f_joint2_3", "r_f_joint2_4"], # index DIP (coupled)
82 | 6: ["r_f_joint3_2"], # middle MCP
83 | 7: ["r_f_joint3_3", "r_f_joint3_4"], # middle DIP (coupled)
84 | 8: ["r_f_joint4_2"], # ring MCP
85 | 9: ["r_f_joint4_3", "r_f_joint4_4"], # ring DIP (coupled)
86 | 10: ["r_f_joint5_2"], # pinky MCP
87 | 11: ["r_f_joint5_3", "r_f_joint5_4"], # pinky DIP (coupled)
88 | }
89 |
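90 | # Illustrative only (not used by the environment): how a consumer of
91 | # FINGER_COUPLING_MAP can expand 12 action values into per-joint targets,
92 | # handling both bare joint names (scale 1.0) and (name, scale) tuples.
93 | def _example_expand_coupling(action_values):
94 |     targets = {}
95 |     for action_idx, joints in FINGER_COUPLING_MAP.items():
96 |         for entry in joints:
97 |             name, scale = entry if isinstance(entry, tuple) else (entry, 1.0)
98 |             targets[name] = scale * action_values[action_idx]
99 |     return targets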
--------------------------------------------------------------------------------
/prompts/fix-005-box-bounce-physics.md:
--------------------------------------------------------------------------------
1 | # fix-005-box-bounce-physics.md
2 |
3 | Fix box bouncing at initialization in the BlindGrasping task.
4 |
5 | ## Issue Description
6 |
7 | After completing refactor-005-default-values, the BlindGrasping task exhibits a consistent physics behavior change where the box bounces slightly during initialization. This did not occur before the refactor.
8 |
9 | ## Symptoms
10 |
11 | - **Consistent behavior**: Box bounces every time BlindGrasping environment initializes
12 | - **Timing**: Occurs during environment startup/initialization phase
13 | - **Task-specific**: Affects BlindGrasping task (BaseTask has no box to compare)
14 | - **Physics-related**: Appears to be actual physics simulation bounce, not visual glitch
15 |
16 | ## Investigation Context
17 |
18 | The refactor-005-default-values changes removed hardcoded defaults from `.get()` patterns throughout the codebase. While most changes should be functionally equivalent (replacing hardcoded defaults with explicit config values), one of the changes may have introduced a subtle difference affecting box initialization physics.
19 |
20 | ## Potential Cause Areas
21 |
22 | Based on the refactor changes, the most likely causes are:
23 |
24 | ### 1. Physics Simulation Parameters
25 | - **VecTask substeps change**: Original default was 2, config.yaml has 4
26 | - **Physics dt**: Now uses explicit config value instead of fallback
27 | - **Client threads**: Now uses explicit config value
28 |
29 | ### 2. Box Positioning Logic
30 | - Box spawn height calculation
31 | - Reference frame changes
32 | - Table/ground plane positioning
33 |
34 | ### 3. Initialization Timing
35 | - Order of physics parameter application
36 | - Tensor initialization sequence
37 | - Reset logic timing
38 |
39 | ## Current Status
40 |
41 | - **Root cause**: Not yet identified
42 | - **Workaround**: None needed (cosmetic issue)
43 | - **Priority**: Medium (affects realism but not functionality)
44 |
45 | ## Investigation Steps
46 |
47 | 1. Compare physics parameters before/after refactor
48 | 2. Check box initial position calculation
49 | 3. Verify ground plane/table reference positioning
50 | 4. Test with different substeps values to isolate physics parameter effects
51 | 5. Check initialization order and timing
52 |
53 | ## Success Criteria
54 |
55 | - Box initializes without bouncing
56 | - Physics behavior matches pre-refactor baseline
57 | - No regression in other physics aspects
58 |
59 | ## Configuration Context
60 |
61 | **Box Configuration (BlindGrasping.yaml):**
62 | ```yaml
63 | env:
64 | box:
65 | size: 0.05 # 5cm cube
66 | mass: 0.1 # 100g
67 | initial_position:
68 | z: 0.025 # Should place box on table surface (half-height)
69 | ```
70 |
71 | **Ground plane**: z = 0 (unchanged)
72 | **Expected result**: Box should rest stable on table without bouncing
73 |
74 | ## Notes
75 |
76 | - Issue is reproducible but cosmetic (doesn't break functionality)
77 | - Physics behavior appears more realistic (bouncing may be correct physics)
78 | - Original behavior may have been artificially stable due to physics parameter differences
79 |
--------------------------------------------------------------------------------
/dexhand_env/cfg/physics/README.md:
--------------------------------------------------------------------------------
1 | # Physics Configuration System
2 |
3 | This directory contains modular physics configurations that can be mixed and matched with different tasks for optimal performance vs. quality trade-offs.
4 |
5 | ## Configuration Files
6 |
7 | ### `default.yaml`
8 | - **Purpose**: Balanced quality/performance for standard development
9 | - **Settings**: 4 substeps, 16 position iterations, 0.001 contact_offset
10 | - **Use case**: BaseTask and general development work
11 | - **Performance**: Standard baseline
12 |
13 | ### `fast.yaml`
14 | - **Purpose**: Optimized for real-time visualization and testing
15 | - **Settings**: 2 substeps, 8 position iterations, 0.002 contact_offset
16 | - **Use case**: test_viewer, test_stream configs for smooth visualization
17 | - **Performance**: ~2-3x faster than default
18 | - **Trade-off**: Slightly reduced physics accuracy for speed
19 |
20 | ### `accurate.yaml`
21 | - **Purpose**: Maximum precision for training and research
22 | - **Settings**: 16 substeps, 32 position iterations, 0.001 contact_offset
23 | - **Use case**: Training and research requiring high precision
24 | - **Performance**: ~2-3x slower than default
25 | - **Trade-off**: Highest physics quality for computational cost
26 |
27 | ## Usage
28 |
29 | ### In Task Configurations
30 | Add physics config to task defaults:
31 | ```yaml
32 | defaults:
33 | - BaseTask
34 | - /physics/accurate # Override physics settings
35 | - _self_
36 | ```
37 |
38 | ### In Test Configurations
39 | Add physics config to test defaults:
40 | ```yaml
41 | defaults:
42 | - config
43 | - base/test
44 | - /physics/fast # Fast physics for smooth rendering
45 | - _self_
46 | ```
47 |
48 | ## Parameter Comparison
49 |
50 | | Parameter | fast | default | accurate |
51 | |-----------|------|---------|----------|
52 | | **substeps** | 2 | 4 | 16 |
53 | | **num_position_iterations** | 8 | 16 | 32 |
54 | | **contact_offset** | 0.002 | 0.001 | 0.001 |
55 | | **gpu_contact_pairs_per_env** | 256 | 512 | 512 |
56 | | **default_buffer_size_multiplier** | 2.0 | 4.0 | 4.0 |
57 |
58 | ## Performance Impact
59 |
60 | - **fast → default**: ~50% performance cost for better accuracy
61 | - **default → accurate**: ~100% performance cost for maximum precision
62 | - **fast → accurate**: ~300% performance cost for maximum quality
63 |
64 | ## Design Principles
65 |
66 | 1. **Task-specific dt**: Each task controls its own timestep (`dt`) for RL environment consistency
67 | 2. **Physics-specific substeps**: Physics configs control simulation parameters only
68 | 3. **Modular inheritance**: Mix and match physics profiles with any task
69 | 4. **Clear separation**: Physics simulation vs. RL environment timing are independent
70 |
71 | ## Migration from Monolithic Configs
72 |
73 | Before:
74 | ```yaml
75 | # MyTask.yaml
76 | sim:
77 | dt: 0.01
78 | substeps: 16
79 | physx:
80 | num_position_iterations: 32
81 | # ... other physics params
82 | ```
83 |
84 | After:
85 | ```yaml
86 | # Example: Custom task with accurate physics
87 | defaults:
88 | - BaseTask
89 | - /physics/accurate # Inherits all physics params
90 | - _self_
91 |
92 | sim:
93 | dt: 0.01 # Only RL timing control
94 | ```
95 |
96 | This provides clean separation and reusable physics profiles across different tasks and test scenarios.
97 |
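98 | ## Timing Relationship
99 |
100 | The effective physics integration step is `dt / substeps`. A quick sanity check (substeps from the profiles above; `dt: 0.005` taken from the root config.yaml):
101 |
102 | ```python
103 | dt = 0.005  # task-level sim.dt
104 | for profile, substeps in {"fast": 2, "default": 4, "accurate": 16}.items():
105 |     print(profile, dt / substeps)  # 0.0025, 0.00125, 0.0003125
106 | ```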
--------------------------------------------------------------------------------
/dexhand_env/components/action/scaling.py:
--------------------------------------------------------------------------------
1 | """
2 | Action scaling utilities for DexHand environment.
3 |
4 | This module provides general mathematical scaling utilities for action processing.
5 | Contains no task-specific logic - pure mathematical operations for scaling and clamping.
6 | """
7 |
8 | import torch
9 |
10 |
11 | class ActionScaling:
12 | """
13 | Provides general mathematical utilities for action scaling.
14 |
15 | This component provides pure mathematical functions:
16 | - Scale actions from [-1, 1] to target ranges
17 | - Apply velocity-based deltas
18 | - Clamp values to limits
19 |
20 | No task-specific logic or conditional behavior.
21 | """
22 |
23 | def __init__(self, parent):
24 | """Initialize the action scaling utilities."""
25 | self.parent = parent
26 |
27 | @staticmethod
28 | def scale_to_limits(
29 | actions: torch.Tensor, lower_limits: torch.Tensor, upper_limits: torch.Tensor
30 | ) -> torch.Tensor:
31 | """
32 | Scale actions from [-1, 1] to specified limits.
33 |
34 | Args:
35 | actions: Raw actions in [-1, 1]
36 | lower_limits: Lower limits for scaling
37 | upper_limits: Upper limits for scaling
38 |
39 | Returns:
40 | Scaled actions in limit ranges
41 | """
42 | # Map from [-1, 1] to [lower, upper]
43 | # action = -1 → lower limit
44 | # action = +1 → upper limit
45 | return (actions + 1.0) * 0.5 * (upper_limits - lower_limits) + lower_limits
46 |
47 | @staticmethod
48 | def apply_velocity_deltas(
49 | prev_targets: torch.Tensor, actions: torch.Tensor, max_deltas: torch.Tensor
50 | ) -> torch.Tensor:
51 | """
52 | Apply velocity-scaled deltas to previous targets.
53 |
54 | Args:
55 | prev_targets: Previous target positions
56 | actions: Raw actions in [-1, 1]
57 | max_deltas: Maximum allowed deltas per timestep
58 |
59 | Returns:
60 | New targets with applied deltas
61 | """
62 | deltas = actions * max_deltas
63 | return prev_targets + deltas
64 |
65 | @staticmethod
66 | def clamp_to_limits(
67 | targets: torch.Tensor, lower_limits: torch.Tensor, upper_limits: torch.Tensor
68 | ) -> torch.Tensor:
69 | """
70 | Clamp targets to specified limits.
71 |
72 | Args:
73 | targets: Target values to clamp
74 | lower_limits: Lower limits
75 | upper_limits: Upper limits
76 |
77 | Returns:
78 | Clamped targets
79 | """
80 | return torch.clamp(targets, lower_limits, upper_limits)
81 |
82 | @staticmethod
83 | def apply_velocity_clamp(
84 | new_targets: torch.Tensor, prev_targets: torch.Tensor, max_deltas: torch.Tensor
85 | ) -> torch.Tensor:
86 | """
87 | Clamp target changes to respect velocity limits.
88 |
89 | Args:
90 | new_targets: Proposed new targets
91 | prev_targets: Previous targets
92 | max_deltas: Maximum allowed change per timestep
93 |
94 | Returns:
95 | Velocity-clamped targets
96 | """
97 | delta = new_targets - prev_targets
98 | clamped_delta = torch.clamp(delta, -max_deltas, max_deltas)
99 | return prev_targets + clamped_delta
100 |
--------------------------------------------------------------------------------
/prompts/refactor-009-config-yaml.md:
--------------------------------------------------------------------------------
1 | # Configuration Architecture Cleanup
2 |
3 | ## Problem Statement
4 |
5 | `config.yaml` is serving two different purposes and becoming unwieldy:
6 |
7 | 1. **Primary purpose**: Training pipeline configuration for `train.py`
8 | 2. **Secondary purpose**: Test script configuration for `examples/dexhand_test.py`
9 |
10 | The test script settings (lines 16-23: `steps`, `sleep`, `debug`, `log_level`, `enablePlotting`, `plotEnvIdx`) are mixed into the main configuration, creating separation of concerns issues.
11 |
12 | Additionally, `debug.yaml` has naming inconsistency, using `training:` section instead of `train:` which conflicts with the main config structure.
13 |
14 | ## Configuration Analysis
15 |
16 | **Three distinct "test" concepts identified:**
17 |
18 | 1. **Test Script** (`examples/dexhand_test.py`): Environment functional testing
19 | - Uses `test_render.yaml` currently
20 | - Settings: `steps`, `sleep`, `debug`, `log_level`, `enablePlotting`, `plotEnvIdx`
21 | - Purpose: Validate environment implementation
22 |
23 | 2. **Policy Testing** (`base/test.yaml`): Evaluation of trained RL policies
24 | - Settings: `env.numEnvs`, `train.test`, `train.maxIterations`
25 | - Purpose: Evaluate trained policies
26 |
27 | 3. **Policy Testing with Viewer** (`test_render.yaml`): Policy evaluation with visualization
28 | - Inherits from `base/test.yaml` + enables `env.viewer: true`
29 | - Purpose: Visual policy evaluation
30 |
31 | **Naming Issue**: `test_render.yaml` uses deprecated "render" terminology. Per refactor-004-render.md, "viewer" is the correct semantic term.
32 |
33 | ## Solution
34 |
35 | ### 1. Create Dedicated Test Script Configuration
36 | Create `test_script.yaml` in `dexhand_env/cfg/` with:
37 | - Inherits from main `config.yaml` via Hydra defaults
38 | - Contains only test script specific settings: `steps`, `sleep`, `debug`, `log_level`, `enablePlotting`, `plotEnvIdx`
39 | - Removes clutter from main training configuration
40 |
41 | ### 2. Clean Main Configuration
42 | Remove test script settings from `config.yaml` (lines 16-23) to focus on training pipeline.
43 |
44 | ### 3. Clean Base Test Configuration
45 | Remove duplicate test script settings from `base/test.yaml` (lines 5-13), keep only policy evaluation settings.
46 |
47 | ### 4. Update Test Script
48 | Modify `examples/dexhand_test.py` to use dedicated `test_script.yaml` configuration.
49 |
50 | ### 5. Fix Debug Configuration
51 | Correct naming inconsistency in `debug.yaml` from `training:` to `train:`.
52 |
53 | ### 6. Rename for Semantic Clarity
54 | Rename `test_render.yaml` → `test_viewer.yaml` to follow new naming conventions from refactor-004-render.md.
55 |
56 | ### 7. Update Documentation References
57 | Update all documentation files that reference `test_render.yaml` to use `test_viewer.yaml`:
58 | - Search codebase for `test_render` references in documentation
59 | - Update CLI examples, usage instructions, and configuration guides
60 | - Ensure consistency with new naming conventions
61 |
62 | ## Expected Outcome
63 |
64 | - Clear separation of concerns between three test types
65 | - `config.yaml` focused purely on training pipeline
66 | - `test_script.yaml` focused on environment functional testing
67 | - `test_viewer.yaml` focused on policy evaluation with visualization
68 | - `base/test.yaml` focused on common policy evaluation settings
69 | - Consistent naming conventions throughout
70 | - All documentation updated to reflect new file names
71 | - No functionality changes
72 |
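73 | ## Script Update Sketch
74 |
75 | A sketch of step 4 (decorator arguments assumed, not final code): point `examples/dexhand_test.py` at the dedicated config via Hydra:
76 |
77 | ```python
78 | import hydra
79 |
80 | @hydra.main(config_path="../dexhand_env/cfg", config_name="test_script", version_base=None)
81 | def main(cfg):
82 |     ...  # existing test logic unchanged
83 | ```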
--------------------------------------------------------------------------------
/examples/README.md:
--------------------------------------------------------------------------------
1 | # DexHand Examples
2 |
3 | This directory contains example scripts for testing and demonstrating the DexHand environment.
4 |
5 | ## dexhand_test.py
6 |
7 | Comprehensive test script for the DexHand environment using BaseTask only.
8 |
9 | ### Usage
10 |
11 | The test script is hardcoded to use BaseTask and supports Hydra configuration overrides:
12 |
13 | ```bash
14 | # Basic test (BaseTask only)
15 | python examples/dexhand_test.py
16 |
17 | # Headless test with custom parameters
18 | python examples/dexhand_test.py headless=true steps=100 env.numEnvs=16
19 |
20 | # Test with different control modes
21 | python examples/dexhand_test.py env.controlMode=position_delta env.policyControlsHandBase=false
22 |
23 | # Debug mode with verbose logging
24 | python examples/dexhand_test.py debug=true log_level=debug
25 |
26 | # Enable video recording and plotting
27 | python examples/dexhand_test.py env.videoRecord=true enablePlotting=true
28 | ```
29 |
30 | ### Configuration Parameters
31 |
32 | All configuration parameters can be overridden via command line:
33 |
34 | **Test Settings:**
35 | - `steps` (1200) - Total number of test steps to run
36 | - `sleep` (0.01) - Sleep time between steps in seconds
37 | - `device` ("cuda:0") - Device for simulation and RL
38 | - `headless` (false) - Run without GUI visualization
39 | - `debug` (false) - Enable debug output and additional logging
40 | - `log_level` ("info") - Logging level (debug/info/warning/error)
41 |
42 | **Environment Settings:**
43 | - `env.numEnvs` (1024) - Number of parallel environments
44 | - `env.controlMode` ("position") - Control mode (position/position_delta)
45 | - `env.policyControlsHandBase` (true) - Include hand base in policy action space
46 | - `env.policyControlsFingers` (true) - Include fingers in policy action space
47 |
48 | **Recording & Visualization:**
49 | - `env.videoRecord` (false) - Enable video recording (works in headless mode)
50 | - `enablePlotting` (false) - Enable real-time plotting with Rerun
51 | - `plotEnvIdx` (0) - Environment index to plot
52 |
53 | **Task Selection:**
54 | - Script is hardcoded to use BaseTask only (basic task with contact test boxes)
55 |
56 | ### Key Features
57 |
58 | 1. **BaseTask Focus**: Hardcoded to BaseTask for reliable and consistent testing of core functionality
59 |
60 | 2. **Action Verification**: Tests all DOF mappings with sequential action patterns to verify control modes
61 |
62 | 3. **Performance Profiling**: Optional timing analysis for step processing components
63 |
64 | 4. **Real-time Plotting**: Integration with Rerun for visualization (when available)
65 |
66 | 5. **Video Recording**: Supports video capture in both windowed and headless modes
67 |
68 | 6. **Comprehensive Logging**: Detailed information about environment setup, action mappings, and system state
69 |
70 | ### Keyboard Controls (Non-headless mode)
71 |
72 | - `SPACE` - Toggle random actions mode
73 | - `E` - Reset current environment
74 | - `G` - Toggle between single robot and global view
75 | - `UP/DOWN` - Navigate between robots
76 | - `ENTER` - Toggle camera view mode
77 | - `C` - Toggle contact force visualization
78 |
79 | ### Configuration System
80 |
81 | The test script inherits configuration from the main Hydra config system:
82 |
83 | - Base configuration: `dexhand_env/cfg/config.yaml`
84 | - Task configuration: `dexhand_env/cfg/task/BaseTask.yaml` (hardcoded)
85 | - Physics configurations: `dexhand_env/cfg/physics/default.yaml`
86 |
87 | For testing other tasks, use the training script (`train.py`) instead.
88 |
--------------------------------------------------------------------------------
/dexhand_env/utils/config_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Essential configuration utilities for DexHand.
3 |
4 | Provides minimal, fail-fast config validation and helper functions.
5 | """
6 |
7 | import yaml
8 | from omegaconf import DictConfig, OmegaConf
9 | from pathlib import Path
10 | from typing import Dict, Any, Optional, Union
11 | from loguru import logger
12 |
13 |
14 | def validate_config(cfg: DictConfig) -> None:
15 | """
16 | Validate configuration with fail-fast approach.
17 |
18 | Args:
19 | cfg: Configuration to validate
20 |
21 | Raises:
22 | AttributeError: If required fields are missing (fail fast)
23 | ValueError: If critical values are invalid
24 | """
25 | # Required fields - let AttributeError crash if missing (fail fast)
26 | cfg.task.name
27 | cfg.env.numEnvs
28 | cfg.env.device
29 | cfg.train.seed
30 | cfg.train.test
31 |
32 | # Basic sanity checks - crash on obviously bad values
33 | if cfg.env.numEnvs <= 0:
34 | raise ValueError(f"numEnvs must be positive, got {cfg.env.numEnvs}")
35 |
36 | if cfg.env.device not in ["cuda:0", "cpu"]:
37 | raise ValueError(f"device must be 'cuda:0' or 'cpu', got {cfg.env.device}")
38 |
39 |
40 | def validate_checkpoint_exists(checkpoint_path: Optional[str]) -> bool:
41 | """Validate that checkpoint file exists if specified."""
42 | if checkpoint_path is None or checkpoint_path == "null":
43 | return True
44 |
45 | checkpoint_file = Path(checkpoint_path)
46 | if not checkpoint_file.exists():
47 | logger.error(f"Checkpoint file does not exist: {checkpoint_path}")
48 | return False
49 |
50 | return True
51 |
52 |
53 | def get_experiment_name(cfg: DictConfig, timestamp: str) -> str:
54 | """Generate experiment name from config and timestamp."""
55 | # Check if experimentName is explicitly set (optional field)
56 | try:
57 | experiment_name = cfg.train.logging.experimentName
58 | if experiment_name is not None:
59 | return experiment_name
60 | except (AttributeError, KeyError):
61 | pass # logging section or experimentName not present
62 |
63 | # Simple naming: task + mode + timestamp
64 | mode = "test" if cfg.train.test else "train"
65 | return f"{cfg.task.name}_{mode}_{timestamp}"
66 |
67 |
68 | def resolve_config_safely(cfg: DictConfig) -> Dict[str, Any]:
69 | """Safely resolve configuration with better error handling."""
70 | try:
71 | return OmegaConf.to_container(cfg, resolve=True)
72 | except Exception as e:
73 | error_msg = str(e)
74 | if "InterpolationKeyError" in error_msg and "not found" in error_msg:
75 | missing_key = error_msg.split("'")[1] if "'" in error_msg else "unknown"
76 | raise ValueError(f"Config key '{missing_key}' not found") from e
77 | raise ValueError(f"Config resolution failed: {error_msg}") from e
78 |
79 |
80 | def save_config(cfg: DictConfig, output_dir: Path) -> None:
81 | """Save configuration to output directory."""
82 | with open(output_dir / "config.yaml", "w") as f:
83 | OmegaConf.save(cfg, f)
84 |
85 |
86 | def load_config(config_path: Union[str, Path]) -> Dict[str, Any]:
87 | """Load configuration from YAML file."""
88 | config_path = Path(config_path)
89 |
90 | if not config_path.exists():
91 | raise FileNotFoundError(f"Configuration file not found: {config_path}")
92 |
93 | with open(config_path, "r", encoding="utf-8") as f:
94 | return yaml.safe_load(f)
95 |
--------------------------------------------------------------------------------
/dexhand_env/README.md:
--------------------------------------------------------------------------------
1 | # DexHandEnv: Dexterous Manipulation Environment
2 |
3 | This package provides a reinforcement learning environment for dexterous manipulation tasks with robotic hands, built on top of NVIDIA's IsaacGym.
4 |
5 | ## Overview
6 |
7 | DexHandEnv is a modular, component-based framework for dexterous manipulation research. It provides:
8 |
9 | - A unified framework for creating dexterous manipulation tasks
10 | - A component-based architecture for better code organization and reusability
11 | - A configurable environment with various observation and action spaces
12 | - Pre-built tasks like grasping and reorientation
13 |
14 | ## Installation
15 |
16 | ```bash
17 | # Clone the repository
18 | git clone https://github.com/dexrobot/dexrobot_isaac.git
19 | cd dexrobot_isaac
20 |
21 | # Install the package
22 | pip install -e .
23 | ```
24 |
25 | ## Usage
26 |
27 | ### Running Example Tasks
28 |
29 | ```bash
30 | # Run the DexGrasp task
31 | python examples/run_dex_grasp.py
32 |
33 | # Run headless
34 | python examples/run_dex_grasp.py --headless
35 |
36 | # Specify number of environments
37 | python examples/run_dex_grasp.py --num_envs 4
38 | ```
39 |
40 | ### Creating Custom Tasks
41 |
42 | To create a custom task, extend the `BaseTask` class:
43 |
44 | ```python
45 | from dexhand_env.tasks.base_task import BaseTask
46 |
47 | class MyCustomTask(BaseTask):
48 | def __init__(self, sim, gym, device, num_envs, cfg):
49 | super().__init__(sim, gym, device, num_envs, cfg)
50 | # Initialize task-specific parameters
51 |
52 | def compute_task_reward_terms(self, obs_dict):
53 | # Compute task-specific rewards
54 | return {"my_reward": torch.ones(self.num_envs, device=self.device)}
55 |
56 | # Implement other required methods...
57 | ```
58 |
59 | Then register your task in the factory:
60 |
61 | ```python
62 | # In factory.py
63 | from custom_tasks import MyCustomTask
64 |
65 | def create_dex_env(...):
66 | # ...
67 | elif task_name == "MyCustomTask":
68 | task = MyCustomTask(None, None, torch.device(sim_device), cfg["env"]["numEnvs"], cfg)
69 | # ...
70 | ```
71 |
72 | ## Architecture
73 |
74 | The DexHandEnv package uses a component-based architecture:
75 |
76 | - `DexHandBase`: Main environment class that implements common functionality
77 | - `DexTask`: Interface for task-specific behavior
78 | - Components:
79 | - `CameraController`: Handles camera control and keyboard shortcuts
80 | - `FingertipVisualizer`: Visualizes fingertip contacts with color
81 | - `SuccessFailureTracker`: Tracks success and failure criteria
82 | - `RewardCalculator`: Calculates rewards
83 |
84 | ## Configuration
85 |
86 | The environment is configured using a dictionary-like structure:
87 |
88 | ```python
89 | cfg = {
90 | "env": {
91 | "numEnvs": 2,
92 | "episodeLength": 1000,
93 | # ... more environment parameters
94 | },
95 | "sim": {
96 | "dt": 0.01,
97 | "substeps": 2,
98 | "gravity": [0.0, 0.0, -9.81],
99 | # ... more simulation parameters
100 | },
101 | "task": {
102 | # Task-specific parameters
103 | },
104 | "reward": {
105 | # Reward-specific parameters
106 | }
107 | }
108 | ```
109 |
110 | ## License
111 |
112 | See the LICENSE file for licensing information.
113 |
114 | ## Acknowledgements
115 |
116 | This package is developed by DexRobot Inc. It builds upon NVIDIA's IsaacGym and leverages ideas from various reinforcement learning frameworks.
117 |
--------------------------------------------------------------------------------
/prompts/refactor-002-graphics-manager-in-parent.md:
--------------------------------------------------------------------------------
1 | # refactor-002-graphics-manager-in-parent.md
2 |
3 | Align GraphicsManager ecosystem with established component architecture patterns.
4 |
5 | ## Context
6 |
7 | The DexRobot Isaac project follows a strict component architecture pattern where components access sibling components through parent references and property decorators, maintaining single source of truth principles. This pattern ensures:
8 |
9 | - Clean separation of concerns
10 | - Fail-fast behavior when dependencies are missing
11 | - Consistent initialization order through two-stage pattern
12 | - Reduced coupling between components
13 |
14 | From CLAUDE.md architectural guidelines:
15 | - Components should only take `parent` in constructor
16 | - Use `@property` decorators to access sibling components via parent
17 | - Never store direct references to sibling components
18 |
19 | ## Current State
20 |
21 | **✅ GraphicsManager correctly follows the pattern:**
22 | ```python
23 | class GraphicsManager:
24 | def __init__(self, parent): # ✅ Only parent reference
25 | self.parent = parent
26 |
27 | @property
28 | def device(self): # ✅ Property decorator for parent access
29 | return self.parent.device
30 | ```
31 |
32 | **❌ VideoManager violates the pattern:**
33 | ```python
34 | class VideoManager:
35 | def __init__(self, parent, graphics_manager): # ❌ Direct sibling reference
36 | self.parent = parent
37 | self.graphics_manager = graphics_manager # ❌ Stored direct reference
38 | ```
39 |
40 | **❌ ViewerController violates the pattern:**
41 | ```python
42 | class ViewerController:
43 | def __init__(self, parent, gym, sim, env_handles, headless, graphics_manager): # ❌ Multiple direct references
44 | # ... stores direct references instead of using parent
45 | ```
46 |
47 | ## Desired Outcome
48 |
49 | All graphics-related components follow the established architectural pattern:
50 |
51 | 1. **VideoManager** - Access graphics_manager via property decorator
52 | 2. **ViewerController** - Access all dependencies via parent/property decorators
53 | 3. **DexHandBase** - Update instantiation calls to only pass parent references
54 |
55 | This creates consistent, maintainable architecture aligned with other components like ActionProcessor, RewardCalculator, etc.
56 |
57 | ## Constraints
58 |
59 | - **Maintain exact functionality** - No behavioral changes, only architectural alignment
60 | - **Respect two-stage initialization** - Components may need finalize_setup() if they depend on control_dt
61 | - **Follow fail-fast philosophy** - Let dependencies crash if parent/sibling is None
62 | - **Single source of truth** - Parent holds canonical references, components access via properties
63 |
64 | ## Implementation Notes
65 |
66 | **VideoManager refactoring:**
67 | - Remove `graphics_manager` parameter from constructor
68 | - Add `@property def graphics_manager(self): return self.parent.graphics_manager` (see the sketch below)
69 | - Update instantiation in DexHandBase
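
A minimal sketch of the target constructor, using only the names already quoted above:

```python
class VideoManager:
    def __init__(self, parent):  # only the parent reference is stored
        self.parent = parent

    @property
    def graphics_manager(self):
        # Sibling access goes through the parent (single source of truth)
        return self.parent.graphics_manager
```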
70 |
71 | **ViewerController refactoring:**
72 | - Remove direct dependency parameters (gym, sim, env_handles, graphics_manager)
73 | - Add property decorators for accessing these via parent
74 | - May need property decorators for gym, sim, env_handles if not already available on parent
75 |
76 | **Testing approach:**
77 | - Use existing test command: `python train.py config=test_stream render=true task=BlindGrasping device=cuda:0 checkpoint=runs/BlindGrasping_train_20250724_120120/nn/BlindGrasping.pth numEnvs=1`
78 | - Verify video recording and viewer functionality unchanged
79 | - Test both headless and viewer modes
80 |
81 | ## Dependencies
82 |
83 | None - this is a pure architectural refactoring that doesn't affect external interfaces.
84 |
--------------------------------------------------------------------------------
/dexhand_env/tasks/README.md:
--------------------------------------------------------------------------------
1 | # DexHand Tasks
2 |
3 | This directory contains the implementation of the dexterous hand tasks using the new component-based architecture.
4 |
5 | ## Task Structure
6 |
7 | The tasks follow a component-based architecture:
8 |
9 | - `dexhand_base.py`: Base class for all dexterous hand tasks
10 | - `task_interface.py`: Interface that all task implementations must implement
11 | - `base/vec_task.py`: Base class for vector tasks (inherited by DexHandBase)
12 |
13 | ## Key Features
14 |
15 | ### Auto-detection of Physics Steps Per Control Step
16 |
17 | The environment now automatically detects how many physics steps are required between each control step based on reset stability requirements. This eliminates the need for manually configuring `controlFrequencyInv` and ensures that the environment is stable regardless of the physics setup.
18 |
19 | Usage:
20 | ```python
21 | # The environment will automatically detect the number of physics steps per control step
22 | env = create_dex_env(...)
23 | ```
24 |
25 | ### Configurable Action Space
26 |
27 | The action space is now configurable, allowing you to specify which DOFs are controlled by the policy:
28 |
29 | - `controlHandBase`: Whether the policy controls the hand base (6 DOFs)
30 | - `controlFingers`: Whether the policy controls the finger joints (12 DOFs)
31 |
32 | For DOFs not controlled by the policy, you can specify default targets or implement a custom control policy in the task.
33 |
34 | Example configuration:
35 | ```yaml
36 | env:
37 | controlHandBase: false # Base not controlled by policy
38 | controlFingers: true # Fingers controlled by policy
39 | defaultBaseTargets: [0.0, 0.0, 0.5, 0.0, 0.0, 0.0] # Default targets for base
40 | ```
41 |
42 | For task-specific control of uncontrolled DOFs, implement the `get_task_dof_targets` method in your task class:
43 |
44 | ```python
45 | def get_task_dof_targets(self, num_envs, device, base_controlled, fingers_controlled):
46 | # Return targets for DOFs not controlled by the policy
47 | targets = {}
48 |
49 | if not base_controlled:
50 | # Example: Make the base follow a circular trajectory
51 | # Use episode time for smooth trajectory (assumes control_dt is available)
52 | # This creates a full circle every ~6.28 seconds
53 | episode_time = self.episode_step_count.float() * self.control_dt
54 | base_targets = torch.zeros((num_envs, 6), device=device)
55 | base_targets[:, 0] = 0.3 * torch.sin(episode_time) # x position
56 | base_targets[:, 1] = 0.3 * torch.cos(episode_time) # y position
57 | base_targets[:, 2] = 0.5 # z position (fixed height)
58 | targets["base_targets"] = base_targets
59 |
60 | if not fingers_controlled and hasattr(self, 'object_pos'):
61 | # Example: Make fingers dynamically respond to object position
62 | finger_targets = self._compute_grasp_targets(self.object_pos)
63 | targets["finger_targets"] = finger_targets
64 |
65 | return targets
66 | ```
67 |
68 | This allows for complex scenarios such as:
69 | - **Dynamic trajectories**: The base or fingers can follow time-varying trajectories
70 | - **State-dependent control**: Targets can depend on the state of the environment (e.g., object positions)
71 | - **Task-phase control**: Different control strategies can be used during different phases of a task
72 | - **Hybrid control**: Some DOFs can be controlled by the policy while others follow programmed behaviors
73 |
74 | The task has complete control over what targets are returned, and can implement arbitrarily complex control laws. If the task returns `None` or omits a key from the targets dictionary, the environment will use the default targets specified in the configuration.
75 |
--------------------------------------------------------------------------------
/prompts/fix-007-episode-length-of-grasping.md:
--------------------------------------------------------------------------------
1 | # fix-007-episode-length-of-grasping.md
2 |
3 | Fix BlindGrasping task episode length inconsistencies and configuration inheritance issues.
4 |
5 | ## Context
6 |
7 | BlindGrasping task episodes are terminating at inconsistent step counts (399-499 steps) instead of the expected 500 steps, and the physics timing configuration is not being applied correctly due to fundamental configuration inheritance architecture problems.
8 |
9 | **Observed Symptoms:**
10 | - Episodes ending at 399-499 steps instead of 500
11 | - Physics running at 200Hz (dt=0.005) instead of expected 100Hz (dt=0.01)
12 | - Control cycle requiring 2 physics steps instead of 1
13 | - "Early failure seems not enforced" behavior
14 |
15 | **Root Cause Analysis:**
16 | The configuration inheritance order is architecturally wrong. Main config.yaml has `_self_` positioned last in defaults, causing its `sim.dt: 0.005` to override task-specific settings like BlindGrasping's `sim.dt: 0.01`.
17 |
18 | ## Current State
19 |
20 | **Broken Configuration Hierarchy:**
21 | 1. Main config.yaml loads task (BlindGrasping: dt=0.01)
22 | 2. Main config.yaml applies `_self_` LAST (dt=0.005 overrides task)
23 | 3. Result: Wrong physics timing affects episode behavior
24 |
25 | **Evidence:**
26 | ```
27 | physics_dt: 0.005000s # Should be 0.01 for BlindGrasping
28 | physics_steps_per_control: 2 # Should be 1 with correct dt
29 | control_dt: 0.010000s # Correct result achieved wrong way
30 | ```
31 |
32 | ## Desired Outcome
33 |
34 | **Correct Configuration Architecture:**
35 | - Task-specific settings ALWAYS override base/global settings
36 | - BlindGrasping runs with dt=0.01 (100Hz physics, 1 physics step per control)
37 | - BaseTask continues with dt=0.005 (200Hz physics)
38 | - Episodes run for full 500 steps when no early termination criteria are met
39 |
40 | **Physics Timing Goals:**
41 | - BlindGrasping: 100Hz physics (dt=0.01) with 1 physics step per control
42 | - BaseTask: 200Hz physics (dt=0.005) with 2 physics steps per control
43 | - Other tasks: Can specify their own optimal dt values
44 |
45 | ## Constraints
46 |
47 | **Architectural Principles:**
48 | - Follow fail-fast philosophy: task configs should not need defensive checks
49 | - Maintain component responsibility separation
50 | - Respect configuration inheritance: specialized overrides general
51 | - No breaking changes to existing BaseTask behavior
52 |
53 | **Configuration Design Rules:**
54 | - Main config.yaml: Only global defaults that apply to ALL tasks
55 | - Task configs: Task-specific overrides that should never be overridden
56 | - CLI overrides: Should work as expected for task-specific parameters
57 |
58 | ## Implementation Notes
59 |
60 | **Primary Fix:**
61 | 1. Fix Hydra inheritance order so task-specific configs properly override base config
62 | 2. Investigate moving `_self_` position in main config.yaml defaults list (see the sketch below)
63 | 3. Ensure BlindGrasping.yaml's `sim.dt: 0.01` overrides config.yaml's `sim.dt: 0.005`
64 | 4. Maintain base config.yaml as source of global defaults that tasks can override
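
A sketch of the intended defaults ordering, assuming standard Hydra semantics where later entries in the defaults list override earlier ones (the real config.yaml will contain more entries):

```yaml
# config.yaml (sketch)
defaults:
  - _self_          # global defaults applied first
  - task: BaseTask  # task config merged afterwards, so its sim.dt wins
```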
65 |
66 | **Testing Requirements:**
67 | - Verify BlindGrasping shows `physics_dt: 0.010000s` and `physics_steps_per_control: 1`
68 | - Verify BaseTask shows `physics_dt: 0.005000s` and `physics_steps_per_control: 2`
69 | - Test episode lengths reach full 500 steps when no early termination occurs
70 | - Validate termination criteria work correctly with proper timing
71 |
72 | **Secondary Investigation:**
73 | - Check if other parameters in main config.yaml have same inheritance problem
74 | - Investigate if episode termination issues persist after physics timing fix
75 | - Analyze whether early termination criteria need adjustment for new timing
76 |
77 | ## Dependencies
78 |
79 | - Configuration system understanding (Hydra inheritance order)
80 | - Physics timing measurement system (control_dt calculation)
81 | - Episode termination logic in BlindGrasping task
82 |
--------------------------------------------------------------------------------
/prompts/fix-003-max-iterations.md:
--------------------------------------------------------------------------------
1 | # fix-003-max-iterations.md
2 |
3 | Fix maxIterations config override and train.py cleanup
4 |
5 | ## Context
6 |
7 | The maxIterations configuration system has several issues that violate the fail-fast philosophy:
8 | 1. **Hardcoded defaults**: `get_config_overrides()` in train.py has brittle hardcoded checks like `if cfg.train.maxIterations != 10000`
9 | 2. **Missing shorthand alias**: No simple `maxIterations` alias (requires full `train.maxIterations`)
10 | 3. **Test mode doesn't respect maxIterations**: `python train.py train.test=true train.maxIterations=500` has no effect in test mode
11 | 4. **Defensive programming**: train.py contains hardcoded fallbacks that should be eliminated
12 | 5. **Configuration structure inconsistency**: train_headless.yaml uses wrong section name
13 |
14 | **Note**: CLI overrides like `python train.py train.maxIterations=5000` DO work correctly for training mode. The interpolation in BaseTaskPPO.yaml works as expected.
15 |
16 | ## Current State
17 |
18 | **Problematic code in train.py:146**:
19 | ```python
20 | def get_config_overrides(cfg: DictConfig) -> List[str]:
21 | # ... other checks with hardcoded defaults ...
22 | if cfg.train.maxIterations != 10000: # Default from config.yaml
23 | overrides.append(f"train.maxIterations={cfg.train.maxIterations}")
24 | # ... more hardcoded checks ...
25 | ```
26 |
27 | **Inconsistent alias naming in cli_utils.py:48**:
28 | ```python
29 | ALIASES = {
30 | "numEnvs": "env.numEnvs",
31 | "maxIter": "train.maxIterations", # Should be replaced with "maxIterations"
32 | # Missing: "maxIterations": "train.maxIterations"
33 | }
34 | ```
35 |
36 | **Decision**: Expert consensus recommends standardizing on `maxIterations` for clarity and consistency with config files. The `maxIter` alias should be removed in favor of the explicit form.
37 |
38 | **Test mode issue**:
39 | - `python train.py train.test=true train.maxIterations=500` - maxIterations ignored in test mode
40 | - Test mode runs indefinitely or until manual termination
41 | - Related to feat-002-indefinite-testing.md
42 |
43 | **Config structure inconsistency in train_headless.yaml:12-14**:
44 | ```yaml
45 | training: # Should be "train:"
46 | maxIterations: 10000
47 | ```
48 |
49 | ## Desired Outcome
50 |
51 | 1. **Remove hardcoded defaults**: Follow fail-fast philosophy - always include configuration values in reproducible commands
52 | 2. **Standardize on explicit alias**: Replace `maxIter` with `maxIterations` for clarity and consistency with config files
53 | 3. **Fix config structure**: Correct train_headless.yaml section name
54 | 4. **Clean code quality**: Remove defensive programming patterns from get_config_overrides()
55 |
56 | **Note**: Test mode iteration control is handled separately in feat-002-indefinite-testing.md
57 |
58 | ## Constraints
59 |
60 | - **Fail-fast philosophy**: No defensive programming with hardcoded fallbacks
61 | - **Single source of truth**: Configuration values come from config files only
62 | - **Reproducibility**: get_config_overrides() must generate accurate command reconstruction
63 | - **Breaking change acceptable**: `maxIter` removal justified by clarity benefits in research context
64 |
65 | ## Implementation Notes
66 |
67 | 1. **Remove hardcoded checks**: Change from "only include if different from default" to "always include key values"
68 | 2. **Replace alias**: Change `"maxIter": "train.maxIterations"` to `"maxIterations": "train.maxIterations"` in cli_utils.py ALIASES (see the sketch below)
69 | 3. **Fix config**: Change `training:` to `train:` in train_headless.yaml
70 | 4. **Clean up function**: Apply same principle to other hardcoded checks in get_config_overrides()
71 | 5. **Breaking change**: `maxIter` will no longer work - users must use `maxIterations`
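
A sketch of the intended end state (surrounding code simplified from the excerpts above):

```python
# cli_utils.py: explicit alias only
ALIASES = {
    "numEnvs": "env.numEnvs",
    "maxIterations": "train.maxIterations",  # replaces the removed "maxIter"
}

# train.py: always record key values, with no comparison against hardcoded defaults
def get_config_overrides(cfg):
    overrides = []
    overrides.append(f"train.maxIterations={cfg.train.maxIterations}")
    # ... apply the same unconditional pattern to the other key values ...
    return overrides
```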
72 |
73 | **Rationale**: Expert consensus (o3-mini + Gemini Pro) strongly favors explicit naming for research/ML contexts where clarity and reproducibility outweigh CLI brevity concerns.
74 |
75 | ## Dependencies
76 |
77 | None - isolated configuration management fix.
78 |
--------------------------------------------------------------------------------
/prompts/doc-003-action-processing-illustration.md:
--------------------------------------------------------------------------------
1 | # doc-003-action-processing-illustration.md
2 |
3 | Add action processing timing illustration to existing guide-action-pipeline.md
4 |
5 | ## Background
6 | The refactor-006 split action processing timing to align with RL rollout patterns:
7 | - **post_physics (step N-1)**: Observation computation + pre-action rule (pipeline stage 1)
8 | - **pre_physics (step N)**: Action rule + post-action filters + coupling (pipeline stages 2-4)
9 |
10 | ## Implementation Plan
11 |
12 | ### Phase 1: Enhance Existing Documentation
13 | **File**: `docs/guide-action-pipeline.md` (modify existing)
14 | - **Add new section**: "Timing and Execution Flow"
15 | - **Explain timing split**: WHY post_physics vs pre_physics phases exist
16 | - **Stage mapping**: How 4-stage pipeline maps to execution phases
17 | - **RL alignment**: Why this timing benefits RL framework patterns
18 | - **Integration**: Keep all action processing concepts in single document
19 |
20 | ### Phase 2: Visual Diagram - Content and Layout Specifications
21 |
22 | **File**: `docs/assets/action-processing-timeline.svg` (new)
23 |
24 | #### Primary Content Structure:
25 | - **Two Control Steps**: Step N-1 and Step N showing temporal relationship
26 | - **Four Pipeline Stages**: Stage 1 (Pre-Action Rule), Stage 2 (Action Rule), Stage 3 (Post-Action Filters), Stage 4 (Coupling Rule)
27 | - **Timing Phase Mapping**: Stage 1 → post_physics phase, Policy forward + Stages 2-4 → pre_physics phase
28 | - **Data Flow Elements**: Specific tensor variables labeled on arrows (active_prev_targets, active_rule_targets, actions, active_raw_targets, active_next_targets, full_dof_targets)
29 |
30 | #### Layout Organization:
31 | - **Linear Pipeline Flow**: Horizontal arrangement Stage 1 → Policy → Stage 2 → Stage 3 → Stage 4 (left to right progression)
32 | - **Phase Context**: Subtle background zones indicating post_physics (Step N-1) and pre_physics (Step N) timing without overwhelming stage flow
33 | - **Clean Staging**: Each stage as distinct box (~120-140px width) with clear functional purpose
34 | - **Policy Integration**: Policy network as natural bridge between Stage 1 (observations) and Stage 2 (actions)
35 | - **Directional Flow**: Prominent arrows showing data progression through pipeline stages
36 |
37 | #### Visual Hierarchy:
38 | 1. **Primary**: Linear stage sequence showing functional pipeline progression
39 | 2. **Secondary**: Phase timing context as subtle background information
40 | 3. **Supporting**: Data flow arrows and timing labels (Step N-1, Step N)
41 |
42 | #### Content Approach (Descriptive, Not Promotional):
43 | - **Architecture Description**: Focus on WHAT the timing pattern accomplishes
44 | - **Timing Context**: Clear temporal labels without "benefits" language
45 | - **Functional Focus**: Describe data flow and stage purposes rather than advantages
46 | - **No Architecture Summary Box**: Keep diagram focused on visual flow, move architectural description to text documentation
47 |
48 | #### Educational Focus:
49 | - Show WHEN each stage executes through clear timing phases
50 | - Show WHAT data flows between stages with prominent arrows and specific tensor labels
51 | - Show HOW the 4-stage pipeline maps to 2 timing phases
52 | - Include all data dependencies (e.g., Stage 2 receives both actions AND active_prev_targets)
53 | - Clarify policy forward pass happens in pre_physics phase
54 |
55 | #### Additional Requirements:
56 | - **Policy Interpretation Note**: Text documentation should clarify that policy output can have any meaning; the action rule determines how to interpret policy output for DOF target updates
57 | - **Complete Data Flow**: Show all inputs to each stage, not just primary flow (e.g., Stage 2 needs active_prev_targets, active_rule_targets, AND actions)
58 | - **Variable Clarity**: Label arrows with exact tensor variable names from implementation to avoid confusion between similar-sounding targets
59 |
60 | ### Phase 3: Cross-References
61 | - Update any existing links to guide-action-pipeline.md
62 | - No new documentation file needed
63 |
64 | ## Quality Standards
65 | - Follow CLAUDE.md documentation development protocol
66 | - Maintain existing document structure and flow
67 | - Verify technical accuracy against refactor-006-action-processing.md
68 | - Single source of truth for all action processing concepts
69 |
--------------------------------------------------------------------------------
/prompts/feat-004-action-rule-example.md:
--------------------------------------------------------------------------------
1 | # Action Rule Use Cases Documentation
2 |
3 | ## Status
4 | - **Example Implementation**: Not needed at this time
5 | - **Documentation**: Required - create conceptual guide
6 |
7 | ## Problem
8 | The existing `guide-action-pipeline.md` provides comprehensive technical documentation, but lacks conceptual understanding through elegant examples that demonstrate the intellectual beauty of the 4-stage pipeline approach.
9 |
10 | ## Solution
11 | Create `docs/guide-action-rule-use-cases.md` - a conceptual companion guide focusing on elegant examples and use cases rather than technical implementation details.
12 |
13 | ## Documentation PRD
14 |
15 | **Document**: `docs/guide-action-rule-use-cases.md`
16 |
17 | **Purpose**: Demonstrate the intellectual beauty and natural problem decomposition enabled by the action rule pipeline through clean, elegant examples.
18 |
19 | **Target Audience**:
20 | - Engineers who want to understand and extend standard control modes
21 | - Researchers who want to see clean examples of pipeline-based problem decomposition
22 |
23 | **Key Insight**: Standard control modes (`position`/`position_delta`) are elegant action rule implementations, not separate control pathways. This provides a natural bridge from familiar concepts to advanced research applications.
24 |
25 | **Structure**:
26 |
27 | ### 1. Standard Control Modes as Action Rules (2 paragraphs)
28 | - **Concept**: Show how `position` and `position_delta` modes are implemented as action rules
29 | - **Purpose**: Demystify the abstraction - "When you use position control, you're already using action rules"
30 | - **Content**: Brief pseudocode showing clean separation of concerns in standard modes
31 | - **Focus**: Familiar control modes as showcases of good pipeline design
32 |
33 | ### 2. Pipeline Philosophy (1 paragraph)
34 | - **Concept**: Why 4-stage separation creates intellectual elegance
35 | - **Content**: Each stage has distinct responsibility, enabling natural problem decomposition
36 | - **Focus**: Conceptual benefits of the pipeline approach
37 |
38 | ### 3. Research Use Cases (2-3 elegant examples)
39 |
40 | #### 3.1 Residual Learning
41 | - **Pre-action**: Set DOF targets to dataset values (baseline)
42 | - **Action rule**: Add scaled policy output to previous targets (correction)
43 | - **Post-action**: Clip targets to physical limits (constraint)
44 | - **Beauty**: Clean separation of baseline, correction, and constraint (illustrative sketch below)
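
A hypothetical action rule capturing this decomposition, using the four-argument rule signature from the action pipeline (the `residual_scale` config key is assumed):

```python
def residual_action_rule(active_prev_targets, active_rule_targets, actions, config):
    # active_rule_targets holds the dataset baseline written by the pre-action stage;
    # stage 3 post-action filters handle clipping to physical limits.
    scale = config.get("residual_scale", 0.1)
    return active_rule_targets + scale * actions
```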
45 |
46 | #### 3.2 Confidence-Based Selective DOF Control
47 | - **Pre-action**: Fallback controller computes complete safe baseline targets (not using policy)
48 | - **Action rule**: Per-DOF selective replacement - if confidence[i] > threshold, use policy target[i], else keep fallback target[i]
49 | - **Post-action**: Sanity-based selective reversion - check mixed targets against safety rules, revert problematic DOFs to fallback
50 | - **Beauty**: Heterogeneous control architecture with dual-layer safety (confidence + sanity validation)
51 |
52 | ### 4. Implementation Notes (brief)
53 | - **Function signatures**: Reference existing technical guide
54 | - **Registration patterns**: Basic examples
55 | - **Keep minimal**: Focus readers on conceptual understanding
56 |
57 | **Writing Principles**:
58 | - **Concise**: Each example focuses on data flow and conceptual beauty
59 | - **Objective**: Clear, descriptive language without promotional tone
60 | - **Educational**: Show natural problem decomposition through pipeline stages
61 | - **Complementary**: References technical guide for implementation details
62 | - **Elegant**: Examples chosen for intellectual beauty and clean separation of concerns
63 |
64 | **Key Messages**:
65 | 1. Standard control modes demonstrate good pipeline design
66 | 2. Complex research problems become elegant when properly decomposed
67 | 3. Each pipeline stage serves a distinct, focused purpose
68 | 4. The 4-stage approach enables natural problem decomposition
69 | 5. Pipeline supports both uniform correction (residual learning) and selective control (confidence switching) patterns
70 |
71 | **Success Criteria**:
72 | - Readers understand how standard modes work as action rules
73 | - Readers see the conceptual elegance of pipeline-based problem decomposition
74 | - Researchers can envision how to apply the pattern to their own problems
75 | - Engineers understand how to extend familiar control modes
76 |
--------------------------------------------------------------------------------
/assets/mjcf/open_ai_assets/fetch/shared.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/docs/guide-viewer-controller.md:
--------------------------------------------------------------------------------
1 | # Viewer Controller Guide
2 |
3 | This guide explains how to use the viewer controller for interactive control of the DexHand environment.
4 |
5 | ## Overview
6 |
7 | The ViewerController component provides keyboard shortcuts for controlling the camera view, navigating between robots, and resetting environments during visualization. It's automatically enabled when running with a viewer (non-headless mode).
8 |
9 | ## Keyboard Shortcuts
10 |
11 | ### Camera Controls
12 |
13 | | Key | Action | Description |
14 | |-----|--------|-------------|
15 | | **Enter** | Toggle View Mode | Cycles through camera views: Free → Rear → Right → Bottom → Free |
16 | | **G** | Toggle Follow Mode | Switches between following a single robot or viewing all robots globally |
17 |
18 | ### Navigation Controls
19 |
20 | | Key | Action | Description |
21 | |-----|--------|-------------|
22 | | **↑** (Up Arrow) | Previous Robot | Navigate to the previous robot (only in single follow mode) |
23 | | **↓** (Down Arrow) | Next Robot | Navigate to the next robot (only in single follow mode) |
24 |
25 | ### Environment Controls
26 |
27 | | Key | Action | Description |
28 | |-----|--------|-------------|
29 | | **P** | Reset Environment | Reset the currently selected robot/environment |
30 |
31 | ## Camera View Modes
32 |
33 | ### 1. Free Camera
34 | - Manual camera control using mouse
35 | - No automatic following
36 | - Full freedom to position camera anywhere
37 |
38 | ### 2. Rear View
39 | - Camera positioned behind the hand
40 | - Follows hand movement automatically
41 | - Good for observing finger movements from behind
42 |
43 | ### 3. Right View
44 | - Camera positioned to the right of the hand
45 | - Follows hand movement automatically
46 | - Useful for side perspective of grasping
47 |
48 | ### 4. Bottom View
49 | - Camera positioned below, looking up at the hand
50 | - Follows hand movement automatically
51 | - Ideal for observing palm and contact points
52 |
53 | ## Follow Modes
54 |
55 | ### Single Robot Mode (Default)
56 | - Camera follows one specific robot
57 | - Use arrow keys to switch between robots
58 | - Camera stays focused on the selected robot
59 | - Useful for detailed observation of individual behaviors
60 |
61 | ### Global View Mode
62 | - Camera shows all robots at once
63 | - Camera position centers on all robots
64 | - Increased camera distance for wider view
65 | - Useful for comparing multiple robots or batch training
66 |
67 | ## Usage Examples
68 |
69 | ### Basic Interaction
70 | ```bash
71 | # Run with viewer enabled
72 | python examples/dexhand_test.py
73 |
74 | # During execution:
75 | # - Press Enter to cycle through camera views
76 | # - Press G to toggle between single/global view
77 | # - Press ↑/↓ to change which robot to follow
78 | # - Press P to reset the current robot
79 | ```
80 |
81 | ### Multi-Environment Setup
82 | ```bash
83 | # Run with multiple environments
84 | python examples/dexhand_test.py --num-envs 4
85 |
86 | # Use global view (G) to see all robots
87 | # Switch to single mode (G) to focus on one
88 | # Navigate between robots with arrow keys
89 | ```
90 |
91 | ## Console Feedback
92 |
93 | The viewer controller provides console output for all actions:
94 | - Camera mode changes: `"Camera: Rear View (following robot 0)"`
95 | - Follow mode changes: `"Camera: Rear View (global view)"`
96 | - Robot selection: `"Following robot 2"`
97 | - Invalid actions: `"Cannot change robot in global view mode. Press G to switch to single robot mode."`
98 |
99 | ## Implementation Details
100 |
101 | ### Default Settings
102 | - Starts in **Rear View** with **Single Robot** follow mode
103 | - Follows robot 0 by default
104 | - All keyboard events are automatically subscribed when viewer is created
105 |
106 | ### Camera Positioning
107 | - Each view mode has predefined offset positions
108 | - Global view increases camera distance for better overview
109 | - Camera smoothly updates position each frame when following
110 |
111 | ### Integration with Environment
112 | - Reset command (`P` key) triggers `reset_idx()` for selected environment
113 | - Camera updates use hand positions from rigid body states
114 | - Works seamlessly with both CPU and GPU pipelines
115 |
116 | ## Troubleshooting
117 |
118 | ### Camera Not Following
119 | - Ensure you're not in Free Camera mode (press Enter to cycle)
120 | - Check that follow mode is set to Single (press G if needed)
121 | - Verify hand positions are being updated correctly
122 |
123 | ### Cannot Change Robot
124 | - You must be in Single Robot mode to navigate between robots
125 | - Press G to switch from Global to Single mode
126 | - Then use arrow keys to select different robots
127 |
128 | ### Viewer Not Responding
129 | - Ensure viewer was created (not running in headless mode)
130 | - Check that keyboard events are being processed
131 | - Verify no other application has keyboard focus
132 |
--------------------------------------------------------------------------------
/prompts/doc-002-control-dt-illustration.md:
--------------------------------------------------------------------------------
1 | # Control_dt vs Physics_dt Illustration Documentation
2 |
3 | ## Problem Statement
4 |
5 | The control_dt measurement and two-stage initialization system is a core architectural concept that is poorly understood. Current documentation lacks visual explanation of the parallel simulation constraint that drives this design.
6 |
7 | ## Current Understanding Issues
8 |
9 | - Users assume physics_steps_per_control_step is configurable (it's measured)
10 | - Confusion about why measurement is necessary (parallel GPU simulation constraint)
11 | - Misunderstanding that stepping varies per control cycle (it's deterministic after measurement - ALL control steps have the SAME number of physics steps)
12 |
13 | ## Documentation Goals
14 |
15 | Create comprehensive SVG timeline illustration showing:
16 |
17 | ### 1. Parallel Simulation Constraint (Core Concept)
18 | - Timeline showing all N environments must step together on GPU
19 | - Demonstrate why individual environment stepping is impossible
20 | - Show how worst-case reset logic determines physics step count for ALL control steps
21 |
22 | ### 2. Consistent Physics Step Count (Key Insight)
23 | Example: 4 Physics Steps Per Control Step (measured during initialization)
24 | ```
25 | Physics Step Breakdown:
26 | ├── P₁: Standard env.step() call (always required)
27 | ├── P₂: Reset logic - moving hand to new position
28 | ├── P₃: Reset logic - placing/repositioning object
29 | └── P₄: Reset logic - final stabilization after setup
30 |
31 | Result: ALL control steps use 4 physics steps (physics_steps_per_control_step = 4)
32 | control_dt = physics_dt × 4
33 | ```
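
For example, with physics_dt = 0.005s (200Hz physics) and a measured factor of 4, control_dt = 0.005 × 4 = 0.02s, i.e. 50Hz control.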
34 |
35 | ### 3. Deterministic Operation (Post-Measurement)
36 | - EVERY control step takes exactly 4 physics steps (regardless of whether individual environments need reset)
37 | - Fixed control_dt ensures consistent action scaling and timing
38 | - Timeline shows: Control Step 1 [P₁|P₂|P₃|P₄], Control Step 2 [P₁|P₂|P₃|P₄], etc.
39 |
40 | ### 4. Impact on Action Scaling
41 | - position_delta mode requires control_dt for velocity-to-delta conversion
42 | - max_delta = control_dt × max_velocity (worked example below)
43 | - Action scaling coefficients computed during finalize_setup()
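
For example, with control_dt = 0.02s and a max_velocity of 1.0 rad/s, max_delta = 0.02 rad per control step (illustrative values).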
44 |
45 | ## Implementation Plan
46 |
47 | ### Documentation Organization
48 | - **Main document**: `docs/control-dt-timing-diagram.md` (conceptual understanding)
49 | - **SVG timeline**: `docs/assets/control-dt-timeline.svg` (visual diagrams)
50 | - **Cross-references**: Links to existing `guide-component-initialization.md` and `reference-physics-implementation.md`
51 |
52 | ### SVG Timeline Specifications
53 | **Dimensions**: ~800px × 400px
54 | **Structure**:
55 | - **Horizontal axis**: Time progression (Control Step 1, 2, 3...)
56 | - **Vertical axis**: Environment timelines (Env 0, Env 1, Env 2, Env 3)
57 | - **Control step containers**: Large boxes spanning 4 physics steps each
58 | - **Physics step subdivisions**: P₁, P₂, P₃, P₄ within each control step
59 |
60 | **Timeline Sequence to Illustrate**:
61 | 1. **Control Step 1**: 4 physics steps [P₁|P₂|P₃|P₄] across all environments
62 | 2. **Control Step 2**: 4 physics steps [P₁|P₂|P₃|P₄] (highlight which env is driving reset)
63 | 3. **Control Step 3**: 4 physics steps [P₁|P₂|P₃|P₄] (consistent timing)
64 |
65 | **Visual Elements**:
66 | - **Color coding**: Blue (standard step), Red (reset-driven steps P₂,P₃,P₄)
67 | - **Reset highlighting**: Show which environment needs reset, but ALL environments take 4 steps
68 | - **Synchronization emphasis**: Vertical alignment showing parallel constraint
69 | - **Callouts**: Explain physics step breakdown (env.step + 3 reset steps)
70 |
71 | ### Cross-Reference Strategy
72 | **FROM existing docs TO new timing diagram**:
73 | - `guide-component-initialization.md` → Add link in "Why Two-Stage is Necessary" section
74 | - `reference-physics-implementation.md` → Add link in "Physics Step Management" section
75 |
76 | **FROM new timing diagram TO existing docs**:
77 | - Reference component-initialization for two-stage implementation details
78 | - Reference physics-implementation for technical stepping specifics
79 | - Reference action pipeline for control_dt scaling impact
80 |
81 | ### Key Messages
82 | 1. **Parallel Constraint**: GPU simulation requires ALL environments step together
83 | 2. **Worst-Case Measurement**: System measures maximum physics steps needed (e.g., 4)
84 | 3. **Deterministic Result**: ALL control steps use same physics step count forever
85 | 4. **Reset Logic Breakdown**: Show specific physics steps (hand move, object place, stabilize)
86 |
87 | ## Success Criteria
88 |
89 | - Timeline clearly shows ALL control steps have identical physics step count (4)
90 | - Parallel environment constraint visually obvious through vertical alignment
91 | - Reset logic breakdown clearly explains where extra physics steps come from
92 | - Readers understand deterministic timing (no variation between control steps)
93 | - Cross-references create logical documentation flow from concept → implementation → technical details
94 |
--------------------------------------------------------------------------------
/dexhand_env/factory.py:
--------------------------------------------------------------------------------
1 | """
2 | Factory for creating DexHand environments.
3 |
4 | This module provides factory functions for creating DexHand environments
5 | with different tasks.
6 | """
7 |
8 | # Import loguru
9 | from loguru import logger
10 |
11 | # Import tasks first (they will import Isaac Gym)
12 | from dexhand_env.tasks.dexhand_base import DexHandBase
13 | from dexhand_env.tasks.base_task import BaseTask
14 | from dexhand_env.tasks.blind_grasping_task import BlindGraspingTask
15 |
16 | # Import PyTorch after Isaac Gym modules
17 | import torch
18 |
19 |
20 | def create_dex_env(
21 | task_name,
22 | cfg,
23 | rl_device,
24 | sim_device,
25 | graphics_device_id,
26 | force_render=False,
27 | video_config=None,
28 | ):
29 | """
30 | Create a DexHand environment with the specified task.
31 |
32 | Args:
33 | task_name: Name of the task to create
34 | cfg: Configuration dictionary
35 | rl_device: Device for RL computations
36 | sim_device: Device for simulation
37 | graphics_device_id: Graphics device ID
38 | force_render: Whether to force rendering
39 | video_config: Optional video recording configuration
40 |
41 | Returns:
42 | A DexHand environment with the specified task
43 | """
44 | logger.info(f"Creating DexHand environment with task: {task_name}")
45 |
46 | # Create the task component based on the task name
47 | try:
48 | if task_name == "BaseTask":
49 | # Base task with minimal functionality
50 | logger.debug("Creating BaseTask...")
51 | # Ensure device is properly set - rl_device is the one used for tensors
52 | task = BaseTask(
53 | None, None, torch.device(rl_device), cfg["env"]["numEnvs"], cfg
54 | )
55 | elif task_name == "BlindGrasping":
56 | # Box grasping task
57 | logger.debug("Creating BlindGraspingTask...")
58 | task = BlindGraspingTask(
59 | None, None, torch.device(rl_device), cfg["env"]["numEnvs"], cfg
60 | )
61 | else:
62 | raise ValueError(f"Unknown task: {task_name}")
63 |
64 | logger.debug("Task created successfully, creating environment...")
65 |
66 | # Derive headless from explicit viewer configuration
67 | headless = not cfg["env"]["viewer"]
68 |
69 | # Create the environment with the task component
70 | env = DexHandBase(
71 | cfg,
72 | task,
73 | rl_device,
74 | sim_device,
75 | graphics_device_id,
76 | headless,
77 | force_render,
78 | video_config,
79 | )
80 |
81 | logger.debug("Environment created successfully")
82 |
83 | return env
84 |
85 | except Exception as e:
86 | logger.error(f"ERROR in create_dex_env: {e}")
87 | import traceback
88 |
89 | traceback.print_exc()
90 | raise
91 |
92 |
93 | def make_env(
94 | task_name: str,
95 | num_envs: int,
96 | sim_device: str,
97 | rl_device: str,
98 | graphics_device_id: int,
99 | cfg: dict = None,
100 | force_render: bool = False,
101 | video_config: dict = None,
102 | ):
103 | """
104 | Create a DexHand environment for RL training.
105 |
106 | This is the main entry point for creating environments compatible with
107 | RL libraries like rl_games.
108 |
109 | Args:
110 |         task_name: Name of the task (e.g., "BaseTask", "BlindGrasping")
111 | num_envs: Number of parallel environments
112 | sim_device: Device for physics simulation (e.g., "cuda:0", "cpu")
113 | rl_device: Device for RL algorithm (e.g., "cuda:0", "cpu")
114 | graphics_device_id: GPU device ID for rendering
115 | cfg: Optional configuration dictionary (will load from file if not provided)
116 | force_render: Whether to force rendering even in headless mode
117 | video_config: Optional video recording configuration
118 |
119 | Returns:
120 | DexHandBase environment instance
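
    Example (illustrative; device strings depend on your setup)::

        env = make_env(
            task_name="BaseTask",
            num_envs=4,
            sim_device="cuda:0",
            rl_device="cuda:0",
            graphics_device_id=0,
        )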
121 | """
122 | # Load configuration if not provided
123 | if cfg is None:
124 | from dexhand_env.utils.config_utils import load_config
125 |
126 | config_path = f"dexhand_env/cfg/task/{task_name}.yaml"
127 | cfg = load_config(config_path)
128 |
129 | # Ensure numEnvs is set in config
130 | if "numEnvs" not in cfg["env"]:
131 | cfg["env"]["numEnvs"] = num_envs
132 | elif cfg["env"]["numEnvs"] != num_envs:
133 | logger.info(f"Updating numEnvs from {cfg['env']['numEnvs']} to {num_envs}")
134 | cfg["env"]["numEnvs"] = num_envs
135 |
136 | # Create environment using existing factory function
137 | env = create_dex_env(
138 | task_name=task_name,
139 | cfg=cfg,
140 | rl_device=rl_device,
141 | sim_device=sim_device,
142 | graphics_device_id=graphics_device_id,
143 | force_render=force_render,
144 | video_config=video_config,
145 | )
146 |
147 | return env
148 |
--------------------------------------------------------------------------------
/dexhand_env/tasks/base_task.py:
--------------------------------------------------------------------------------
1 | """
2 | Base task implementation for DexHand.
3 |
4 | This module provides a minimal task implementation that satisfies the DexTask interface
5 | without adding any specific task behavior. It can be used as a starting point for new tasks
6 | or for testing the basic environment functionality.
7 | """
8 |
9 | from typing import Dict, Optional
10 |
11 | # Import PyTorch
12 | import torch
13 |
14 | from dexhand_env.tasks.task_interface import DexTask
15 |
16 |
17 | class BaseTask(DexTask):
18 | """
19 | Minimal task implementation for DexHand.
20 |
21 | This task provides the minimal implementation required by the DexTask interface,
22 | without adding any specific task behavior. It returns empty reward terms,
23 | no success/failure criteria, and doesn't add any task-specific actors.
24 |
25 | Use this as a base class for new tasks or for testing the basic environment.
26 | """
27 |
28 | def __init__(self, sim, gym, device, num_envs, cfg):
29 | """
30 | Initialize the base task.
31 |
32 | Args:
33 | sim: Simulation instance
34 | gym: Gym instance
35 | device: PyTorch device
36 | num_envs: Number of environments
37 | cfg: Configuration dictionary
38 | """
39 | self.sim = sim
40 | self.gym = gym
41 | self.device = device
42 | self.num_envs = num_envs
43 | self.cfg = cfg
44 |
45 | # Reference to parent environment (set by DexHandBase)
46 | self.parent_env = None
47 |
48 | def compute_task_reward_terms(
49 | self, obs_dict: Dict[str, torch.Tensor]
50 | ) -> Dict[str, torch.Tensor]:
51 | """
52 | Compute task-specific reward components.
53 |
54 | The base task doesn't provide any specific rewards beyond the common rewards
55 | handled by DexHandBase.
56 |
57 | Args:
58 | obs_dict: Dictionary of observations
59 |
60 | Returns:
61 | Empty dictionary of task-specific reward components
62 | """
63 | return {}
64 |
65 | def check_task_success_criteria(
66 | self, obs_dict: Optional[Dict[str, torch.Tensor]] = None
67 | ) -> Dict[str, torch.Tensor]:
68 | """
69 |         Check task-specific success criteria.
70 |
71 |         The base task doesn't define any success criteria.
72 |
73 |         Args:
74 |             obs_dict: Optional dictionary of observations. If provided, can be used
75 |                 for efficiency to avoid recomputing observations.
76 |
77 | Returns:
78 | Empty dictionary of task-specific success criteria
79 | """
80 | return {}
81 |
82 | def check_task_failure_criteria(
83 | self, obs_dict: Optional[Dict[str, torch.Tensor]] = None
84 | ) -> Dict[str, torch.Tensor]:
85 | """
86 |         Check task-specific failure criteria.
87 |
88 |         The base task doesn't define any failure criteria.
89 |
90 |         Args:
91 |             obs_dict: Optional dictionary of observations. If provided, can be used
92 |                 for efficiency to avoid recomputing observations.
93 |
94 | Returns:
95 | Empty dictionary of task-specific failure criteria
96 | """
97 | return {}
98 |
99 | def reset_task_state(self, env_ids: torch.Tensor):
100 | """
101 | Reset task-specific state for the specified environments.
102 |
103 | The base task doesn't have any specific state to reset.
104 |
105 | Args:
106 | env_ids: Environment indices to reset
107 | """
108 | pass
109 |
110 | def create_task_objects(self, gym, sim, env_ptr, env_id: int):
111 | """
112 | Add task-specific objects to the environment.
113 |
114 | The base task doesn't add any specific objects.
115 |
116 | Args:
117 | gym: Gym instance
118 | sim: Simulation instance
119 | env_ptr: Pointer to the environment to add objects to
120 | env_id: Index of the environment being created
121 | """
122 | pass
123 |
124 | def load_task_assets(self):
125 | """
126 | Load task-specific assets and define task-specific variables.
127 |
128 | The base task doesn't load any specific assets.
129 | """
130 | pass
131 |
132 | def get_task_observations(
133 | self, obs_dict: Dict[str, torch.Tensor]
134 | ) -> Optional[Dict[str, torch.Tensor]]:
135 | """
136 | Get task-specific observations.
137 |
138 | The base task doesn't provide any task-specific observations.
139 |
140 | Args:
141 | obs_dict: Dictionary of current observations
142 |
143 | Returns:
144 | None, indicating no task-specific observations
145 | """
146 | return None
147 |
148 | def set_tensor_references(self, root_state_tensor: torch.Tensor):
149 | """
150 | Set references to simulation tensors needed by the task.
151 |
152 | The base task doesn't need tensor references.
153 |
154 | Args:
155 | root_state_tensor: Root state tensor for all actors
156 | """
157 | pass
158 |
--------------------------------------------------------------------------------
/dexhand_env/utils/experiment_manager.py:
--------------------------------------------------------------------------------
1 | """
2 | Experiment directory management utilities for DexHand.
3 |
4 | Simple experiment management with train/test separation and latest symlinks.
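
Example (illustrative)::

    manager = ExperimentManager(max_train_runs=10, max_test_runs=10)
    exp_dir = manager.create_experiment_directory("BaseTask_train_20250101_120000")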
5 | """
6 |
7 | from pathlib import Path
8 | from typing import Optional, List
9 |
10 |
11 | def classify_experiment_type(experiment_name: str) -> str:
12 | """Classify experiment as 'train' or 'test' based on name."""
13 | return "test" if "_test_" in experiment_name.lower() else "train"
14 |
15 |
16 | class ExperimentManager:
17 | """
18 | Manages experiment directories with workspace and archive.
19 |
20 | Structure:
21 | - runs_all/: Archive with all experiments
22 | - runs/: Workspace with recent experiments (symlinks)
23 | - runs/latest_train, runs/latest_test: Latest symlinks
24 | """
25 |
26 | def __init__(self, max_train_runs: int = 10, max_test_runs: int = 10):
27 | self.max_train_runs = max_train_runs
28 | self.max_test_runs = max_test_runs
29 |
30 | self.runs_all_dir = Path("runs_all")
31 | self.runs_dir = Path("runs")
32 |
33 | self._ensure_directories()
34 |
35 | def _ensure_directories(self):
36 | """Ensure required directories exist."""
37 | self.runs_all_dir.mkdir(exist_ok=True)
38 | self.runs_dir.mkdir(exist_ok=True)
39 |
40 | def create_experiment_directory(self, experiment_name: str) -> Path:
41 | """Create experiment directory and manage workspace."""
42 | # Create in archive
43 | archive_dir = self.runs_all_dir / experiment_name
44 | archive_dir.mkdir(parents=True, exist_ok=True)
45 |
46 | # Create workspace symlink
47 | workspace_symlink = self.runs_dir / experiment_name
48 |         if not (workspace_symlink.exists() or workspace_symlink.is_symlink()):
49 | workspace_symlink.symlink_to(archive_dir.absolute())
50 |
51 | # Cleanup and update symlinks
52 | self._cleanup_workspace()
53 | self._update_latest_symlinks()
54 |
55 | return archive_dir
56 |
57 | def _cleanup_workspace(self):
58 | """Remove old symlinks to maintain limits."""
59 | # Get all experiment symlinks (exclude latest_* symlinks)
60 | symlinks = [
61 | item
62 | for item in self.runs_dir.iterdir()
63 | if item.is_symlink() and not item.name.startswith("latest_")
64 | ]
65 |
66 | # Separate by type
67 | train_symlinks = [
68 | s for s in symlinks if classify_experiment_type(s.name) == "train"
69 | ]
70 | test_symlinks = [
71 | s for s in symlinks if classify_experiment_type(s.name) == "test"
72 | ]
73 |
74 | # Sort by modification time (newest first)
75 | train_symlinks.sort(key=lambda p: p.lstat().st_mtime, reverse=True)
76 | test_symlinks.sort(key=lambda p: p.lstat().st_mtime, reverse=True)
77 |
78 | # Remove old symlinks
79 | for old_symlink in train_symlinks[self.max_train_runs :]:
80 | old_symlink.unlink()
81 | for old_symlink in test_symlinks[self.max_test_runs :]:
82 | old_symlink.unlink()
83 |
84 | def _update_latest_symlinks(self):
85 | """Update latest_train and latest_test symlinks."""
86 | experiments = self.get_all_experiments()
87 |
88 | # Separate by type
89 | train_experiments = [
90 | e for e in experiments if classify_experiment_type(e.name) == "train"
91 | ]
92 | test_experiments = [
93 | e for e in experiments if classify_experiment_type(e.name) == "test"
94 | ]
95 |
96 | # Update latest_train
97 | if train_experiments:
98 | self._update_symlink("latest_train", train_experiments[0])
99 |
100 | # Update latest_test
101 | if test_experiments:
102 | self._update_symlink("latest_test", test_experiments[0])
103 |
104 | def _update_symlink(self, symlink_name: str, target: Path):
105 | """Update a symlink to point to target."""
106 | symlink_path = self.runs_dir / symlink_name
107 | if symlink_path.exists() or symlink_path.is_symlink():
108 | symlink_path.unlink()
109 | symlink_path.symlink_to(target.absolute())
110 |
111 | def get_all_experiments(self) -> List[Path]:
112 | """Get all experiments sorted by modification time (newest first)."""
113 | experiments = []
114 | if self.runs_all_dir.exists():
115 | experiments.extend(d for d in self.runs_all_dir.iterdir() if d.is_dir())
116 | return sorted(experiments, key=lambda p: p.stat().st_mtime, reverse=True)
117 |
118 | def get_latest_experiment(self, run_type: str = "train") -> Optional[Path]:
119 | """Get latest experiment of specified type."""
120 | experiments = self.get_all_experiments()
121 | filtered = [
122 | e for e in experiments if classify_experiment_type(e.name) == run_type
123 | ]
124 | return filtered[0] if filtered else None
125 |
126 |
127 | def create_experiment_manager(cfg) -> ExperimentManager:
128 | """Create ExperimentManager from configuration."""
129 | experiment_cfg = getattr(cfg, "experiment", {})
130 | max_train_runs = getattr(experiment_cfg, "maxTrainRuns", 10)
131 | max_test_runs = getattr(experiment_cfg, "maxTestRuns", 10)
132 |
133 | return ExperimentManager(max_train_runs=max_train_runs, max_test_runs=max_test_runs)
134 |
--------------------------------------------------------------------------------
/assets/mjcf/nv_ant.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/docs/guide-indefinite-testing.md:
--------------------------------------------------------------------------------
1 | # Indefinite Policy Testing with Hot-Reload
2 |
3 | During training, you want to monitor how your policy evolves without constantly restarting test scripts or losing visual feedback. This guide shows how to set up continuous policy monitoring that automatically loads new checkpoints as training progresses.
4 |
5 | ## The Problem
6 |
7 | Traditional policy evaluation during training is cumbersome:
8 | - **Manual restarts**: Stop test script, find latest checkpoint, restart with new path
9 | - **Static evaluation**: Test a frozen checkpoint while training continues
10 | - **Visual gaps**: No continuous visual feedback on policy improvement
11 |
12 | ## The Solution: Hot-Reload Testing
13 |
14 | Hot-reload testing solves this by running indefinite policy testing with automatic checkpoint discovery and reloading. The system continuously monitors your experiment directory and seamlessly loads newer checkpoints without interrupting the visual feedback loop.
15 |
16 | **Key capabilities:**
17 | - **Automatic discovery**: `checkpoint=latest` finds your most recent training experiment using the experiment management system (see [TRAINING.md](TRAINING.md))
18 | - **Live reloading**: Monitors the experiment directory and loads new checkpoints every 30 seconds (configurable)
19 | - **Indefinite testing**: Runs until manual termination (`testGamesNum=0`)
20 | - **Deployment flexibility**: Works with local Isaac Gym viewer or remote HTTP streaming
21 |
22 | **The `checkpoint=latest` magic:**
23 | 1. **Directory discovery**: Resolves to latest experiment directory via `runs/latest_train` symlink
24 | 2. **Continuous monitoring**: Watches the resolved directory (not a static file) for new checkpoints
25 | 3. **Dynamic loading**: Automatically loads the newest `.pth` file found in `nn/` subdirectory
26 |
27 | ## Deployment Scenarios
28 |
29 | ### Scenario 1: Local Workstation with Server Training
30 |
31 | **When to use**: You can run Isaac Gym viewer locally but training happens on a remote server.
32 |
33 | **Advantages**: Full Isaac Gym interactivity, better visual quality, local keyboard controls
34 | **Trade-offs**: Requires checkpoint synchronization, slightly more setup
35 |
36 | **Server (training):**
37 | ```bash
38 | python train.py config=train_headless task=BlindGrasping
39 | ```
40 |
41 | **Local (checkpoint sync):**
42 | ```bash
43 | # Option A: Simple rsync loop
44 | while true; do
45 | rsync -av server:/path/to/dexrobot_isaac/runs/ ./runs/
46 | sleep 30
47 | done &
48 |
49 | # Option B: File synchronization tools
50 | unison server_profile -repeat 30
51 | ```
52 |
53 | **Local (testing):**
54 | ```bash
55 | python train.py config=test_viewer testGamesNum=0 checkpoint=latest
56 | # Uses runs/latest_train symlink → experiment directory → newest checkpoint
57 | ```
58 |
59 | ### Scenario 2: Remote Server Monitoring
60 |
61 | **When to use**: Training and testing both happen on remote server, monitor via browser.
62 |
63 | **Advantages**: No file synchronization needed, accessible from anywhere, simpler setup
64 | **Trade-offs**: HTTP streaming limitations, browser-based viewing only
65 |
66 | **Server (training):**
67 | ```bash
68 | python train.py config=train_headless task=BlindGrasping
69 | ```
70 |
71 | **Server (monitoring):**
72 | ```bash
73 | python train.py config=test_stream testGamesNum=0 checkpoint=latest streamBindAll=true
74 | # streamBindAll enables access from external IPs (security warning applies)
75 | ```
76 |
77 | **Access**: Open `http://server-ip:58080` in browser
78 |
79 | ## Basic Usage
80 |
81 | **Indefinite testing with hot-reload:**
82 | ```bash
83 | python train.py config=test_viewer testGamesNum=0 checkpoint=latest
84 | ```
85 |
86 | **Customize reload timing:**
87 | ```bash
88 | python train.py config=test_viewer testGamesNum=0 checkpoint=latest reloadInterval=60
89 | ```
90 |
91 | **Use specific experiment:**
92 | ```bash
93 | python train.py config=test_viewer testGamesNum=0 checkpoint=runs/BlindGrasping_train_20250801_095943
94 | ```
95 |
96 | ## Configuration Reference
97 |
98 | **Test duration control:**
99 | - `testGamesNum=0`: Run indefinitely until Ctrl+C (most common for monitoring)
100 | - `testGamesNum=25`: Run exactly 25 episodes then terminate
101 |
102 | **Hot-reload settings:**
103 | - `reloadInterval=30`: Check for new checkpoints every 30 seconds (default)
104 | - `reloadInterval=0`: Disable hot-reload, use static checkpoint
105 |
106 | **Configuration presets:**
107 | - `test_viewer.yaml`: Interactive Isaac Gym viewer (4 environments)
108 | - `test_stream.yaml`: HTTP video streaming (headless)
109 | - `test.yaml`: Base headless testing configuration
110 |
111 | **Parameter overrides:**
112 | ```bash
113 | # Fewer environments for smoother visualization
114 | python train.py config=test_viewer testGamesNum=0 checkpoint=latest numEnvs=1
115 |
116 | # Longer reload interval to reduce overhead
117 | python train.py config=test_viewer testGamesNum=0 checkpoint=latest reloadInterval=120
118 | ```
119 |
120 | ## How Hot-Reload Works
121 |
122 | The hot-reload system uses a background thread that:
123 |
124 | 1. **Resolves experiment directory**: `checkpoint=latest` → `runs/latest_train` symlink → actual experiment directory
125 | 2. **Monitors for changes**: Uses `find_latest_checkpoint_file()` to check for new `.pth` files in the experiment's `nn/` directory
126 | 3. **Detects updates**: Compares file modification times every `reloadInterval` seconds
127 | 4. **Loads seamlessly**: When a newer checkpoint is found, loads the new weights into the running policy without interrupting the episode
128 | 5. **Logs events**: Clear console output shows when reloads occur
129 |
130 | This design enables true continuous monitoring - you start the test process once and watch your policy improve throughout the entire training session.
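
A schematic of that loop (illustrative sketch; `find_latest_checkpoint_file` is the helper named above, and `load_weights` stands in for however the runner swaps in new weights):

```python
import time

def watch_and_reload(experiment_dir, load_weights, reload_interval=30):
    last_mtime = 0.0
    while True:
        ckpt = find_latest_checkpoint_file(experiment_dir)  # newest .pth under nn/
        if ckpt is not None and ckpt.stat().st_mtime > last_mtime:
            last_mtime = ckpt.stat().st_mtime
            load_weights(ckpt)  # reload without interrupting the episode
        time.sleep(reload_interval)
```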
131 |
--------------------------------------------------------------------------------
/dexhand_env/components/action/default_rules.py:
--------------------------------------------------------------------------------
1 | """
2 | Default action rules for DexHand environment.
3 |
4 | This module provides default action rule implementations for different control modes
5 | (position and position_delta) that can be used across different tasks.
6 | """
7 |
8 | from typing import Callable
9 | from loguru import logger
10 |
11 |
12 | class DefaultActionRules:
13 | """
14 | Factory for default action rules used by the DexHand environment.
15 |
16 | Provides position and position_delta action rules that handle scaling
17 | and applying policy actions to DOF targets while preserving rule-based
18 | control for non-policy-controlled DOFs.
19 | """
20 |
21 | @staticmethod
22 | def create_position_action_rule(action_processor) -> Callable:
23 | """
24 | Create a position mode action rule.
25 |
26 | Args:
27 | action_processor: ActionProcessor instance for accessing scaling utilities
28 |
29 | Returns:
30 | Callable action rule function
31 | """
32 |
33 | def position_action_rule(
34 | active_prev_targets, active_rule_targets, actions, config
35 | ):
36 | """Default position mode action rule using ActionScaling utilities."""
37 | # Start with rule targets - preserves rule-based control for uncontrolled DOFs
38 | targets = active_rule_targets.clone()
39 | scaling = action_processor.action_scaling
40 |
41 | # Only update the DOFs that the policy controls
42 | if config["policy_controls_base"]:
43 | # Scale base actions from [-1, 1] to DOF limits
44 | base_lower = action_processor.active_lower_limits[:6]
45 | base_upper = action_processor.active_upper_limits[:6]
46 | scaled_base = scaling.scale_to_limits(
47 | actions[:, :6], base_lower, base_upper
48 | )
49 | targets[:, :6] = scaled_base
50 |
51 | if config["policy_controls_fingers"]:
52 | # Get finger action indices
53 | finger_start = 6 if config["policy_controls_base"] else 0
54 | finger_end = finger_start + 12
55 |
56 | # Scale finger actions from [-1, 1] to DOF limits
57 | finger_lower = action_processor.active_lower_limits[6:]
58 | finger_upper = action_processor.active_upper_limits[6:]
59 | scaled_fingers = scaling.scale_to_limits(
60 | actions[:, finger_start:finger_end], finger_lower, finger_upper
61 | )
62 | targets[:, 6:] = scaled_fingers
63 |
64 | return targets
65 |
66 | return position_action_rule
67 |
68 | @staticmethod
69 | def create_position_delta_action_rule(action_processor) -> Callable:
70 | """
71 | Create a position_delta mode action rule.
72 |
73 | Args:
74 | action_processor: ActionProcessor instance for accessing scaling utilities
75 |
76 | Returns:
77 | Callable action rule function
78 | """
79 |
80 | def position_delta_action_rule(
81 | active_prev_targets, active_rule_targets, actions, config
82 | ):
83 | """Default position_delta mode action rule using ActionScaling utilities."""
84 | # Start with rule targets
85 | targets = active_rule_targets.clone()
86 | ap = action_processor
87 | scaling = ap.action_scaling
88 |
89 | if config["policy_controls_base"]:
90 | # Apply base deltas using ActionScaling utility
91 | targets[:, :6] = scaling.apply_velocity_deltas(
92 | active_prev_targets[:, :6], actions[:, :6], ap.max_deltas[:6]
93 | )
94 |
95 | if config["policy_controls_fingers"]:
96 | # Get finger action indices
97 | finger_start = 6 if config["policy_controls_base"] else 0
98 | finger_end = finger_start + 12
99 |
100 | # Apply finger deltas using ActionScaling utility
101 | targets[:, 6:] = scaling.apply_velocity_deltas(
102 | active_prev_targets[:, 6:],
103 | actions[:, finger_start:finger_end],
104 | ap.max_deltas[6:],
105 | )
106 |
107 | # Clamp to limits using ActionScaling utility
108 | targets = scaling.clamp_to_limits(
109 | targets, ap.active_lower_limits, ap.active_upper_limits
110 | )
111 |
112 | return targets
113 |
114 | return position_delta_action_rule
115 |
116 | @staticmethod
117 | def setup_default_action_rule(action_processor, control_mode: str):
118 | """
119 | Set up a default action rule based on control mode.
120 |
121 | Args:
122 | action_processor: ActionProcessor instance to configure
123 | control_mode: Control mode ("position" or "position_delta")
124 | """
125 | if control_mode == "position":
126 | action_rule = DefaultActionRules.create_position_action_rule(
127 | action_processor
128 | )
129 | action_processor.set_action_rule(action_rule)
130 | logger.debug("Configured default position action rule")
131 | elif control_mode == "position_delta":
132 | action_rule = DefaultActionRules.create_position_delta_action_rule(
133 | action_processor
134 | )
135 | action_processor.set_action_rule(action_rule)
136 | logger.debug("Configured default position_delta action rule")
137 | else:
138 | raise ValueError(f"Unknown control mode: {control_mode}")
139 |
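# Usage sketch (hypothetical wiring): any object exposing the attributes used
# above (action_scaling, active_lower_limits, active_upper_limits, max_deltas,
# and set_action_rule) can serve as the action_processor, e.g.:
#
#     DefaultActionRules.setup_default_action_rule(action_processor, "position_delta")
#
# The processor then calls the returned rule each control step with
# (active_prev_targets, active_rule_targets, actions, config).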
--------------------------------------------------------------------------------
/docs/guide-environment-resets.md:
--------------------------------------------------------------------------------
1 | # Environment Reset System Guide
2 |
3 | This guide explains the environment reset and termination system in DexHand Isaac environments.
4 |
5 | ## Architecture Overview
6 |
7 | The DexHand environment uses a clean separation between **termination decisions** and **reset execution**:
8 |
9 | - **TerminationManager**: Decides which environments should reset and why (success/failure/timeout)
10 | - **ResetManager**: Executes physical resets (DOF states, poses, randomization)
11 | - **BaseTask**: Coordinates between the two and manages episode progress
12 |
13 | ## Key Components
14 |
15 | ### TerminationManager
16 | **Responsibility**: Evaluate termination conditions and decide which environments should reset
17 |
18 | **Three Termination Types**:
19 | - **Success**: Task completed successfully (positive reward)
20 | - **Failure**: Task failed due to violation (negative reward)
21 | - **Timeout**: Episode reached max length (neutral reward)
22 |
23 | **Key Methods**:
24 | - `evaluate(episode_step_count, success_criteria, failure_criteria)` → returns `(should_reset, termination_info, episode_rewards)`
25 |
26 | ### ResetManager
27 | **Responsibility**: Execute physical environment resets
28 |
29 | **Key Methods**:
30 | - `reset_idx(env_ids)`: Reset DOF states, poses, and apply randomization
31 | - `set_episode_step_count_buffer(buffer)`: Reference to shared episode step counter
32 |
33 | **What it does NOT do**:
34 | - ❌ Does not decide when to reset (no check_termination method)
35 | - ❌ Does not increment episode progress (no increment_progress method)
36 |
37 | ### Episode Progress Management
38 | **Handled directly in BaseTask**:
39 | ```python
40 | # In BaseTask.post_physics_step():
41 | self.episode_step_count += 1 # Direct increment, no method needed
42 | ```
43 |
44 | ## Clean Data Flow
45 |
46 | ```python
47 | # In BaseTask.post_physics_step():
48 |
49 | # 1. Update episode progress directly
50 | self.episode_step_count += 1
51 |
52 | # 2. Evaluate termination conditions
53 | should_reset, termination_info, episode_rewards = self.termination_manager.evaluate(
54 | self.episode_step_count, success_criteria, failure_criteria
55 | )
56 |
57 | # 3. Apply termination rewards
58 | for reward_type, reward_tensor in episode_rewards.items():
59 | self.rew_buf += reward_tensor
60 |
61 | # 4. Reset environments that should reset
62 | if torch.any(should_reset):
63 | env_ids_to_reset = torch.nonzero(should_reset).flatten()
64 | self.reset_manager.reset_idx(env_ids_to_reset)
65 | self.termination_manager.reset_tracking(env_ids_to_reset)
66 | ```
67 |
68 | ## Key Buffers
69 |
70 | ### should_reset Tensor
71 | - **Type**: `torch.Tensor` of shape `(num_envs,)` with dtype `torch.bool`
72 | - **Purpose**: Boolean flags indicating which environments should reset
73 | - **Generated by**: TerminationManager.evaluate()
74 | - **Used by**: BaseTask to determine which environments to reset
75 |
76 | ### episode_step_count Buffer
77 | - **Type**: `torch.Tensor` of shape `(num_envs,)` with dtype `torch.long`
78 | - **Purpose**: Tracks the number of steps in each environment's current episode
79 | - **Managed by**: BaseTask (direct increment)
80 | - **Reset by**: ResetManager.reset_idx() sets to 0 for reset environments
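
For reference, a minimal allocation matching the shapes and dtypes above (a sketch, not the actual initialization code):

```python
import torch

num_envs = 4  # example value
should_reset = torch.zeros(num_envs, dtype=torch.bool)        # per-env reset flags
episode_step_count = torch.zeros(num_envs, dtype=torch.long)  # per-env step counters
```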
81 |
82 | ## Termination Types and Logging
83 |
84 | The TerminationManager provides detailed termination information for rl_games TensorBoard logging:
85 |
86 | ```python
87 | termination_info = {
88 | "success": success_termination, # Boolean tensor
89 | "failure": failure_termination, # Boolean tensor
90 | "timeout": timeout_termination, # Boolean tensor
91 | "success_rate": success_count / num_envs,
92 | "failure_rate": failure_count / num_envs,
93 | "timeout_rate": timeout_count / num_envs,
94 | }
95 | ```
96 |
97 | This enables proper reward logging in TensorBoard because rl_games can track when episodes actually complete.
98 |
99 | ## Extending Termination Criteria
100 |
101 | ### Adding Task-Specific Success Criteria
102 | ```python
103 | # In your task class:
104 | def check_task_success_criteria(self):
105 | return {
106 | "object_grasped": self.check_grasp_success(),
107 | "target_reached": self.check_target_distance(),
108 | }
109 | ```
110 |
111 | ### Adding Task-Specific Failure Criteria
112 | ```python
113 | # In your task class:
114 | def check_task_failure_criteria(self):
115 | return {
116 | "hand_dropped": self.check_hand_height(),
117 | "object_dropped": self.check_object_height(),
118 | }
119 | ```
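
A minimal sketch of how such a criteria dict could be collapsed into a single per-env boolean tensor before evaluation (assumed logic; the actual TerminationManager may first filter by `activeSuccessCriteria`/`activeFailureCriteria`):

```python
import torch

def combine_criteria(criteria: dict) -> torch.Tensor:
    """OR-combine a dict of per-env boolean tensors into one tensor."""
    combined = None
    for tensor in criteria.values():
        combined = tensor if combined is None else (combined | tensor)
    return combined
```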
120 |
121 | ## Configuration
122 |
123 | ### Episode Length (Timeout)
124 | Set in task configuration:
125 | ```yaml
126 | # BaseTask.yaml
127 | env:
128 | episodeLength: 300 # Steps before timeout termination
129 | ```
130 |
131 | ### Termination Rewards
132 | Set in task configuration:
133 | ```yaml
134 | # BaseTask.yaml
135 | env:
136 | successReward: 10.0 # Reward for success termination
137 | failurePenalty: 5.0 # Penalty for failure termination (applied as negative)
138 | timeoutReward: 0.0 # Reward for timeout termination (usually neutral)
139 | ```
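
A sketch of how these values could map onto the `episode_rewards` dict returned by `evaluate()` (assumed mapping, shown here only to make the sign conventions concrete):

```python
def termination_rewards(success, failure, timeout, cfg):
    """Map boolean termination tensors to per-env reward tensors."""
    return {
        "success": success.float() * cfg["successReward"],
        "failure": failure.float() * -cfg["failurePenalty"],  # penalty applied as negative
        "timeout": timeout.float() * cfg["timeoutReward"],
    }
```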
140 |
141 | ## Testing Episode Termination
142 |
143 | You can test termination behavior with:
144 | ```bash
145 | python examples/dexhand_test.py --episode-length 10
146 | ```
147 |
148 | This should reset environments every 10 steps due to timeout termination, visible in the logs.
149 |
150 | ## Benefits of This Architecture
151 |
152 | 1. **No Duplication**: Single timeout check in TerminationManager (was previously duplicated)
153 | 2. **Clear Separation**: Decision logic (TerminationManager) vs execution logic (ResetManager)
154 | 3. **Better Logging**: Proper termination type tracking enables rl_games TensorBoard integration
155 | 4. **Extensible**: Easy to add new termination types or task-specific criteria
156 | 5. **Maintainable**: Single responsibility principle for each component
157 |
158 | ## References
159 | - TerminationManager implementation: `dexhand_env/components/termination/termination_manager.py`
160 | - ResetManager implementation: `dexhand_env/components/reset/reset_manager.py`
161 | - Integration example: `dexhand_env/tasks/dexhand_base.py`
162 |
--------------------------------------------------------------------------------
/prompts/refactor-008-config-key-casing.md:
--------------------------------------------------------------------------------
1 | # refactor-008-config-key-casing.md
2 |
3 | Unify configuration key naming to snake_case under task section for code consistency.
4 |
5 | ## Context
6 |
7 | The configuration files have inconsistent naming conventions, particularly in the `task:` section where some keys use camelCase while Python code conventions prefer snake_case. This creates cognitive friction when working between config files and Python code.
8 |
9 | **Design Decision**: Keep other sections (env, sim, train) as camelCase for CLI usability, but unify the `task:` section to snake_case for code consistency since these keys are primarily accessed by Python code rather than CLI overrides.
10 |
11 | ## Current State
12 |
13 | **BaseTask.yaml - 16 camelCase keys in task section:**
14 | - `policyControlsHandBase` → `policy_controls_hand_base`
15 | - `policyControlsFingers` → `policy_controls_fingers`
16 | - `defaultBaseTargets` → `default_base_targets`
17 | - `defaultFingerTargets` → `default_finger_targets`
18 | - `maxFingerJointVelocity` → `max_finger_joint_velocity`
19 | - `maxBaseLinearVelocity` → `max_base_linear_velocity`
20 | - `maxBaseAngularVelocity` → `max_base_angular_velocity`
21 | - `activeSuccessCriteria` → `active_success_criteria`
22 | - `activeFailureCriteria` → `active_failure_criteria`
23 | - `rewardWeights` → `reward_weights`
24 | - `enableComponentDebugLogs` → `enable_component_debug_logs`
25 | - `maxConsecutiveSuccesses` → `max_consecutive_successes`
26 | - `contactForceBodies` → `contact_force_bodies`
27 | - `contactBinaryThreshold` → `contact_binary_threshold`
28 | - `contactVisualization` → `contact_visualization`
29 | - `policyObservationKeys` → `policy_observation_keys`
30 |
31 | **BlindGrasping.yaml - 9 camelCase keys in task section:**
32 | - `maxBaseLinearVelocity` → `max_base_linear_velocity`
33 | - `maxBaseAngularVelocity` → `max_base_angular_velocity`
34 | - `maxFingerJointVelocity` → `max_finger_joint_velocity`
35 | - `contactBinaryThreshold` → `contact_binary_threshold`
36 | - `penetrationPrevention` → `penetration_prevention`
37 | - `policyObservationKeys` → `policy_observation_keys`
38 | - `activeSuccessCriteria` → `active_success_criteria`
39 | - `activeFailureCriteria` → `active_failure_criteria`
40 | - `rewardWeights` → `reward_weights`
41 |
42 | ## Desired Outcome
43 |
44 | 1. **Configuration Files**: All task section keys use consistent snake_case naming
45 | 2. **Code References**: All Python code references updated to use new snake_case keys
46 | 3. **No Breaking CLI**: env/sim/train sections keep camelCase for CLI usability
47 | 4. **Zero Backward Compatibility**: Clean break, no legacy support needed
48 |
49 | ## Code References Requiring Updates
50 |
51 | **9 Python files with 17 key references:**
52 |
53 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/tasks/dexhand_base.py` (11 references)
54 | - Line 362, 364: `contactForceBodies` → `contact_force_bodies`
55 | - Line 468: `policyControlsHandBase` → `policy_controls_hand_base`
56 | - Line 469: `policyControlsFingers` → `policy_controls_fingers`
57 | - Line 470: `maxFingerJointVelocity` → `max_finger_joint_velocity`
58 | - Line 471: `maxBaseLinearVelocity` → `max_base_linear_velocity`
59 | - Line 472: `maxBaseAngularVelocity` → `max_base_angular_velocity`
60 | - Line 480, 482: `defaultBaseTargets` → `default_base_targets`
61 | - Line 484, 486: `defaultFingerTargets` → `default_finger_targets`
62 | - Line 1103: `enableComponentDebugLogs` → `enable_component_debug_logs`
63 |
64 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/termination/termination_manager.py` (4 references)
65 | - Line 48: `activeSuccessCriteria` → `active_success_criteria`
66 | - Line 49: `activeFailureCriteria` → `active_failure_criteria`
67 | - Line 59: `rewardWeights` → `reward_weights`
68 | - Line 71: `maxConsecutiveSuccesses` → `max_consecutive_successes`
69 |
70 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/tasks/blind_grasping_task.py` (3 references)
71 | - Line 185: `penetrationPrevention` → `penetration_prevention`
72 | - Line 797: `contactBinaryThreshold` → `contact_binary_threshold`
73 | - Line 1245: `activeFailureCriteria` → `active_failure_criteria` (in comment)
74 |
75 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/reward/reward_calculator.py` (1 reference)
76 | - Line 37: `rewardWeights` → `reward_weights`
77 |
78 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/observation/observation_encoder.py` (1 reference)
79 | - Line 712: `contactBinaryThreshold` → `contact_binary_threshold`
80 |
81 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/graphics/viewer_controller.py` (1 reference)
82 | - Line 76: `contactVisualization` → `contact_visualization`
83 |
84 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/initialization/initialization_manager.py` (1 reference)
85 | - Line 71: `policyObservationKeys` → `policy_observation_keys`
86 |
87 | ### `/home/yiwen/dexrobot_isaac/dexhand_env/components/initialization/hand_initializer.py` (1 reference)
88 | - Line 557: `contactForceBodies` → `contact_force_bodies` (in comment)
89 |
90 | ### `/home/yiwen/dexrobot_isaac/examples/dexhand_test.py` (1 reference)
91 | - Line 1044: `policyControlsHandBase` → `policy_controls_hand_base` (in comment)
92 |
93 | ## Implementation Notes
94 |
95 | **Architecture Compliance:**
96 | - Follows fail-fast philosophy - no backward compatibility, clean break
97 | - Maintains single source of truth - no dual naming support
98 | - Preserves CLI usability for frequently-used env/sim/train keys
99 |
100 | **Testing Strategy:**
101 | - Test both BaseTask and BlindGrasping task loading
102 | - Verify test script and training pipeline work correctly
103 | - Confirm all config key access patterns function properly
104 |
105 | **Breaking Change Protocol:**
106 | - No backward compatibility required per CLAUDE.md guidelines
107 | - Clean architectural improvement with immediate effect
108 | - All references must be updated atomically
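
A hypothetical one-shot helper for the atomic update (not part of the repo; word-boundary matching avoids renaming substrings of longer identifiers):

```python
import pathlib
import re

# Subset of the renames listed above; extend with the remaining keys.
RENAMES = {
    "policyControlsHandBase": "policy_controls_hand_base",
    "activeSuccessCriteria": "active_success_criteria",
    "rewardWeights": "reward_weights",
}

for path in pathlib.Path(".").rglob("*"):
    if path.suffix not in {".py", ".yaml"}:
        continue
    text = path.read_text()
    for old, new in RENAMES.items():
        text = re.sub(rf"\b{old}\b", new, text)
    path.write_text(text)
```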
109 |
110 | ## Constraints
111 |
112 | - **Scope Limited**: Only `task:` section keys, leave env/sim/train as camelCase
113 | - **No Legacy Support**: Clean break, all references must be updated
114 | - **Architecture Boundaries**: Respect component separation during updates
115 | - **Testing Required**: Both test script and training pipeline must work
116 |
117 | ## Dependencies
118 |
119 | None - standalone refactoring task with no external dependencies.
120 |
--------------------------------------------------------------------------------
/prompts/fix-001-contact-viz.md:
--------------------------------------------------------------------------------
1 | # fix-001-contact-viz.md
2 |
3 | Contact visualization is implemented but not working correctly due to physics data pipeline and timing issues.
4 |
5 | ## Context
6 |
7 | The contact visualization system consists of multiple components that must work together:
8 | - Configuration loading from `task.contactVisualization` in BaseTask.yaml
9 | - Contact force data pipeline from Isaac Gym → TensorManager → ViewerController
10 | - Real-time color updates based on contact force magnitudes
11 | - Keyboard toggle ('C' key) for enabling/disabling visualization
12 |
13 | Initial investigation suggested configuration issues, but thorough analysis revealed the configuration loading path works correctly.
14 |
15 | ## Current State
16 |
17 | **Working Components:**
18 | - ✅ Configuration properly defined in BaseTask.yaml with correct inheritance to BlindGrasping
19 | - ✅ ViewerController correctly accesses config via `parent.task_cfg.get("contactVisualization", {})`
20 | - ✅ Contact body names (`r_f_link*_4`) exist in MJCF and resolve to valid indices
21 | - ✅ Keyboard shortcut 'C' registered and toggle logging implemented
22 | - ✅ Contact visualization rendering pipeline implemented
23 |
24 | **Root Cause Identified:**
25 |
26 | **Architecture Issue**: ViewerController accesses parent's `contact_forces` tensor via `self.parent.contact_forces`, but this tensor is only refreshed during the main simulation step through `physics_manager.step_physics(refresh_tensors=True)`.
27 |
28 | In `ViewerController.render()`, the code calls `gym.refresh_net_contact_force_tensor(self.sim)` to refresh Isaac Gym's tensor, but doesn't update the parent's `contact_forces` tensor through `TensorManager.refresh_tensors()`. This creates a timing mismatch where ViewerController sees stale contact force data.
29 |
30 | **Investigation Results:**
31 |
32 | ✅ **Physics setup works correctly**: ObservationEncoder can access non-zero contact forces, confirming Isaac Gym generates proper contact data
33 | ✅ **TensorManager refresh works correctly**: The main simulation loop properly calls `refresh_tensors()`
34 | ❌ **ViewerController has stale data**: It calls Isaac Gym refresh but doesn't update parent's tensor
35 |
36 | ## Alternative Architecture Solution
37 |
38 | **Proposed Fix**: Instead of ViewerController accessing `self.parent.contact_forces` (which requires coordinated tensor refresh timing), ViewerController should access contact forces from the already-computed `obs_dict`.
39 |
40 | **Available Contact Data in obs_dict:**
41 | - `contact_forces`: Raw 3D force vectors per contact body [num_envs, num_bodies * 3]
42 | - `contact_force_magnitude`: Computed force magnitudes [num_envs, num_bodies]
43 | - `contact_binary`: Binary contact indicators [num_envs, num_bodies]
44 |
45 | **Architectural Benefits:**
46 | 1. **Single source of truth**: Contact forces already computed correctly in observation pipeline
47 | 2. **No timing issues**: `obs_dict` computed at right time in simulation loop
48 | 3. **Clean separation**: ViewerController becomes consumer of processed data, not raw physics tensors
49 | 4. **No coupling**: ViewerController doesn't need TensorManager knowledge
50 | 5. **Reuses working code**: ObservationEncoder already processes contact forces correctly
51 |
52 | ## Desired Outcome
53 |
54 | Contact visualization should work reliably:
55 | - Bodies change color (red intensity) based on contact force magnitude
56 | - Colors update in real-time during simulation
57 | - Toggle with 'C' key shows proper enable/disable logging
58 | - System handles edge cases gracefully with proper error messages
59 |
60 | ## Implementation Strategy
61 |
62 | **Recommended Fix**: Modify ViewerController to access contact forces from `obs_dict` instead of `self.parent.contact_forces`
63 |
64 | **Implementation Steps:**
65 | 1. **Add obs_dict access**: ViewerController needs access to current observation dictionary
66 | 2. **Update contact force source**: Use `obs_dict["contact_force_magnitude"]` instead of computing `torch.norm(self.parent.contact_forces, dim=2)`
67 | 3. **Verify data format**: Ensure obs_dict contact data matches visualization expectations
68 | 4. **Clean up**: Remove unused Isaac Gym tensor refresh calls in ViewerController
69 |
70 | **Technical Details:**
71 | - ViewerController currently computes: `force_magnitudes = torch.norm(contact_forces, dim=2)`
72 | - obs_dict already provides: `contact_force_magnitude` with identical computation
73 | - Shape compatibility: Both are [num_envs, num_bodies] tensors with force magnitudes
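
A minimal sketch of the implied color mapping from `obs_dict` magnitudes to red intensities (a linear ramp with a hypothetical `threshold`; the real ViewerController logic may differ):

```python
import torch

def contact_colors(force_magnitudes: torch.Tensor, threshold: float = 10.0) -> torch.Tensor:
    """Map [num_envs, num_bodies] force magnitudes to per-body RGB colors."""
    intensity = (force_magnitudes / threshold).clamp(0.0, 1.0)
    colors = torch.zeros(*force_magnitudes.shape, 3, device=force_magnitudes.device)
    colors[..., 0] = intensity  # red channel scales with contact force
    return colors
```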
74 |
75 | **Testing Approach:**
76 | - Run with BlindGrasping task and make hand contact with box
77 | - Press 'C' to toggle contact visualization and verify logging shows correct config values
78 | - Confirm color changes occur during contact events based on obs_dict data
79 | - Verify performance impact is minimal (should be better since no duplicate computation)
80 |
81 | ## Constraints
82 |
83 | - Must maintain fail-fast philosophy - prefer clear crashes over silent failures
84 | - Respect component architecture patterns and property decorators
85 | - Cannot modify MJCF collision exclusions without understanding full physics implications
86 | - Must preserve existing visualization performance optimizations
87 |
88 | ## Dependencies
89 |
90 | None - this is a self-contained fix within the graphics and physics data pipeline.
91 |
92 | ## Implementation Status - ✅ **COMPLETED** (2025-07-28)
93 |
94 | ### ✅ **COMPLETED TASKS:**
95 | 1. **Modified ViewerController.render()** - Added obs_dict parameter with fail-fast validation
96 | 2. **Updated DexHandBase integration** - Now passes self.obs_dict to viewer_controller.render()
97 | 3. **Implemented fail-fast architecture** - Removed all fallback logic per CLAUDE.md guidelines
98 | 4. **Updated method signature** - update_contact_force_colors() now expects contact_force_magnitudes tensor
99 | 5. **Fixed NameError** - Changed `contact_forces.device` to `contact_force_magnitudes.device` at line 510
100 | 6. **Fixed tensor indexing** - Corrected subset selection for contact bodies with valid indices
101 | 7. **Fixed color comparison logic** - Updated torch.allclose to torch.isclose with proper tensor dimensions
102 | 8. **Fixed dimension handling** - Corrected tensor indexing for color update operations
103 |
104 | ### ✅ **FINAL TESTING RESULTS:**
105 | - Environment initialization completes without crashes
106 | - Contact visualization keyboard shortcut ('C' key) properly registered
107 | - No NameError exceptions during rendering
108 | - System correctly handles obs_dict-based contact force data
109 | - Contact visualization displays red color intensity on finger bodies based on contact force magnitude
110 |
111 | **Architecture Benefits Achieved:**
112 | - Single source of truth: obs_dict contact data
113 | - Eliminated timing issues with stale tensor data
114 | - Fail-fast validation prevents silent failures
115 | - Better performance: no duplicate force magnitude computation
116 | - Robust tensor handling for variable numbers of valid contact bodies
117 |
--------------------------------------------------------------------------------
/prompts/fix-002-consistency.md:
--------------------------------------------------------------------------------
1 | # Fix-002: Test Script and Training Consistency Issues
2 |
3 | ## Problems Identified
4 |
5 | ### 1. Test Script Base Class Compatibility
6 | - Current test script (examples/dexhand_test.py) intentionally patches BaseTask to add contact test box
7 | - Need to ensure this patching approach works reliably with BaseTask
8 | - Verify the test script provides meaningful testing of base functionality
9 |
10 | ### 2. Test Script Argument Complexity
11 | - Test script has 95+ lines of argument definitions with many complex options
12 | - Arguments include: video/rendering, control modes, plotting, profiling, debugging
13 | - Many arguments may be redundant or overly complex for typical usage
14 | - Need to simplify while maintaining essential functionality
15 |
16 | ### 3. Examples Directory Organization
17 | - Only one test script (dexhand_test.py) in examples/
18 | - No clear documentation of what the script tests or how to use it
19 | - Consider if organization/naming could be clearer
20 |
21 | ### 4. Training Compatibility Issues
22 | - Need to verify both "BaseTask" and "BlindGrasping" work with training pipeline
23 | - Check that task switching works properly in train.py
24 | - Ensure configs are compatible and well-documented
25 |
26 | ## Analysis Results
27 |
28 | ### Root Cause Identified
29 | The core consistency issue is a **configuration loading mismatch**:
30 |
31 | 1. **dexhand_test.py**: Uses `yaml.safe_load()` - no Hydra inheritance
32 | 2. **train.py**: Uses Hydra - inheritance works properly
33 | 3. **BlindGraspingTask**: Requires `contactForceBodies` but only inherits it via Hydra defaults
34 |
35 | **Test Results:**
36 | - ✅ `dexhand_test.py` works with BaseTask (has explicit `contactForceBodies`)
37 | - ❌ `dexhand_test.py` fails with BlindGrasping ("No contact force body indices provided")
38 | - ✅ `train.py` works with BaseTask (Hydra resolves inheritance)
39 | - ✅ `train.py` works with BlindGrasping (Hydra resolves inheritance)
40 |
41 | ### Recommended Solution: Switch Test Script to Hydra
42 |
43 | **Benefits:**
44 | 1. **Fixes core issue**: BlindGrasping inheritance works properly
45 | 2. **Configuration consistency**: Test and train use identical config systems
46 | 3. **Proven approach**: `train.py` already uses Hydra successfully
47 | 4. **Future-proofing**: Any new tasks with inheritance work automatically
48 | 5. **Eliminates manual inheritance**: No custom config resolution needed
49 |
50 | **Risks (All Manageable):**
51 | 1. **CLI syntax change**: From `--num-envs 2` to `env.numEnvs=2` (LOW risk - documentation update)
52 | 2. **Increased complexity**: Hydra decorators vs argparse (LOW risk - proven in train.py)
53 | 3. **Dependency consistency**: Need Hydra available (MINIMAL risk - already required)
54 |
55 | **Implementation Pattern:**
56 | ```python
57 | # Current: 95 lines of argparse + manual config loading
58 | def main():
59 |     parser = argparse.ArgumentParser()
60 |     config = load_config(parser.parse_args().config)  # yaml.safe_load, no inheritance
61 |
62 | # New: Similar to train.py
63 | @hydra.main(version_base=None, config_path="dexhand_env/cfg", config_name="config")
64 | def main(cfg: DictConfig):
65 |     ...  # config already loaded with Hydra inheritance applied
66 | ```
67 |
68 | ## Implementation Plan
69 |
70 | ### Phase 1: Convert Test Script to Hydra (HIGH PRIORITY)
71 | - Replace argparse with Hydra decorator and DictConfig
72 | - Update CLI argument syntax to match train.py patterns
73 | - Test both BaseTask and BlindGrasping functionality
74 | - Update examples documentation with new CLI syntax
75 |
76 | ### Phase 2: Validate Cross-Task Compatibility (MEDIUM PRIORITY)
77 | - Verify both scripts work with both tasks consistently
78 | - Test edge cases and configuration overrides
79 | - Document any remaining inconsistencies
80 |
81 | ### Phase 3: Documentation and Cleanup (LOW PRIORITY)
82 | - Add examples/README.md explaining test script purpose and usage
83 | - Consider argument simplification now that Hydra handles structure
84 | - Ensure consistent patterns between test and train workflows
85 |
86 | ## Implementation Status: FULLY COMPLETED ✅
87 |
88 | ### ✅ Successfully Completed:
89 | 1. **Converted test script to Hydra**: Replaced argparse with `@hydra.main()` decorator
90 | 2. **Updated CLI syntax**: Changed from `--num-envs 2` to `env.numEnvs=2` pattern
91 | 3. **Fixed configuration access**: Updated to DictConfig dot notation throughout
92 | 4. **Resolved core inheritance issue**: BlindGrasping task now loads properly with Hydra inheritance
93 | 5. **Updated documentation**: Modified CLAUDE.md build commands and created examples/README.md
94 | 6. **Verified both tasks work**: BaseTask and BlindGrasping both function with Hydra configuration
95 | 7. **✅ FIXED: Environment count issue**: Test script now uses existing `test_viewer.yaml` with 4 environments
96 | 8. **✅ FIXED: Control mode validation**: Updated validation to accept both `position` and `position_delta` modes
97 | 9. **✅ VERIFIED: CLI overrides**: All command-line overrides work correctly with new configuration
98 |
99 | ### Final Implementation Changes:
100 |
101 | #### Fix 1: Used Existing Test Configuration
102 | **Solution**: Changed `@hydra.main(config_name="config")` to `@hydra.main(config_name="test_viewer")`
103 | **Result**: Test script now uses existing `base/test.yaml` with `env.numEnvs: 4` (reasonable for testing)
104 | **Benefits**:
105 | - No new files needed - reuses well-designed existing configuration
106 | - Gets proper test defaults (4 environments, fast physics, rendering enabled)
107 | - Leverages existing work optimized for testing scenarios
108 |
109 | #### Fix 2: Flexible Control Mode Validation
110 | **Location**: `examples/dexhand_test.py` lines 1155-1163
111 | **Solution**: Updated validation to accept both `position` and `position_delta` as valid modes
112 | **Code change**: Replaced strict mode matching with flexible validation allowing both modes
113 | **Result**: Both BaseTask (position_delta) and BlindGrasping (position_delta) work without errors
114 |
115 | #### Fix 3: Comprehensive Testing Verification
116 | **BaseTask**: ✅ Works with 4 environments, position_delta mode, proper rendering
117 | **BlindGrasping**: ✅ Works with position_delta mode, task assets load correctly, Hydra inheritance functional
118 | **CLI Overrides**: ✅ All overrides tested and working (`env.numEnvs=2`, `steps=50`, `headless=true`)
119 |
120 | ### Final Impact Assessment:
121 | - **CORE FUNCTIONALITY**: ✅ **FIXED** - BlindGrasping inheritance works perfectly
122 | - **USABILITY**: ✅ **FIXED** - Reasonable environment defaults, flexible mode validation
123 | - **CONSISTENCY**: ✅ **ACHIEVED** - Both scripts use identical Hydra system
124 | - **MAINTAINABILITY**: ✅ **IMPROVED** - Leverages existing test configurations, minimal code changes
125 |
126 | **Overall Status**: ✅ **FULLY COMPLETED** - All consistency issues resolved, both tasks work reliably with proper test defaults and flexible validation.
127 |
128 | REOPENED: `dexhand_test.py` should be hardcoded to use `BaseTask`, without an option for specifying a task. Docs mentioning this file should be updated accordingly.
129 |
--------------------------------------------------------------------------------