# Action Representations

This guide explains the different ways robot actions can be represented in LeRobot, how they relate to each other, and when to use each one.

## Joint Space vs End-Effector Space

Before discussing action representations, it helps to understand the two coordinate spaces actions can live in.

### Joint Space

Joint-space actions directly specify target positions for each motor. For a 6-DOF arm with a gripper, a joint-space action might look like:
```
action = [shoulder_pan: 45.0, shoulder_lift: -20.0, elbow: -30.0, wrist_pitch: 10.0, wrist_roll: 0.0, wrist_yaw: 5.0, gripper: 0.8]
```

Joint space is the default in LeRobot. It is simple, requires no kinematics model, and maps directly to motor commands. Most beginner setups (SO-100, Koch) use joint-space actions.

### End-Effector (EE) Space

End-effector-space actions specify the desired position and orientation of the robot's tool tip (gripper) in Cartesian coordinates:

```
action = [x: 0.25, y: -0.10, z: 0.15, wx: 0.0, wy: 0.0, wz: 0.1, gripper: 0.8]
```

EE space is more intuitive for tasks like pick-and-place because it directly describes where the gripper should go, but it requires a kinematics model (URDF) to convert between EE poses and joint angles.

### Converting Between Spaces

LeRobot provides processor steps for converting between joint and EE spaces using forward and inverse kinematics. These are built on top of `RobotKinematics`, which loads a URDF model of your robot.
```python
from lerobot.model.kinematics import RobotKinematics
from lerobot.robots.so_follower.robot_kinematic_processor import (
    ForwardKinematicsJointsToEE,
    InverseKinematicsEEToJoints,
)

kinematics = RobotKinematics(
    urdf_path="./SO101/so101_new_calib.urdf",
    target_frame_name="gripper_frame_link",
    joint_names=["shoulder", "elbow", "wrist_pitch", "wrist_roll", "wrist_yaw"],
)

# Joints → EE (for observations: "where is my gripper?")
fk_step = ForwardKinematicsJointsToEE(kinematics=kinematics, motor_names=[...])

# EE → Joints (for actions: "move my gripper here")
ik_step = InverseKinematicsEEToJoints(kinematics=kinematics, motor_names=[...])
```

See [`examples/so100_to_so100_EE/`](https://github.com/huggingface/lerobot/tree/main/examples/so100_to_so100_EE) for a complete working example of recording, replaying, and evaluating with EE-space actions on an SO-100 arm.

## Absolute, Relative, and Delta Actions

Regardless of whether you work in joint space or EE space, the action values can be expressed in three different ways. The terminology follows [UMI (Chi et al., 2024)](https://arxiv.org/abs/2402.10329).

### Absolute Actions (LeRobot default)

Each action specifies the target position directly.

**Example** (joint space, chunk of 4):
```
current_state = [45.0, -30.0, 10.0]

action_chunk = [
    [46.0, -29.0, 11.0],  # go to 46, -29, 11
    [47.5, -27.0, 12.0],  # go to 47.5, -27, 12
    [49.0, -25.0, 13.5],  # go to 49, -25, 13.5
    [50.0, -24.0, 15.0],  # go to 50, -24, 15
]
```

Each value is a target position in the robot's coordinate frame. Simple and direct, but requires a consistent global coordinate frame. This is the default in LeRobot.

### Relative Actions (used by OpenPI / pi0)

Each action in the chunk is an offset from the **current state at the moment of prediction**. All actions in the chunk share the same reference point:
```
current_state = [45.0, -30.0, 10.0]

relative_chunk = [
    [1.0, 1.0, 1.0],  # +1 from current → target 46, -29, 11
    [2.5, 3.0, 2.0],  # +2.5 from current → target 47.5, -27, 12
    [4.0, 5.0, 3.5],  # +4 from current → target 49, -25, 13.5
    [5.0, 6.0, 5.0],  # +5 from current → target 50, -24, 15
]
```

The conversion is straightforward: `relative = absolute - current_state`. To recover absolute: `absolute = relative + current_state`.
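Using the chunk above, the roundtrip can be checked in a few lines of NumPy (an illustrative sketch, not the LeRobot processor API):

```python
import numpy as np

current_state = np.array([45.0, -30.0, 10.0])
absolute_chunk = np.array([
    [46.0, -29.0, 11.0],
    [47.5, -27.0, 12.0],
    [49.0, -25.0, 13.5],
    [50.0, -24.0, 15.0],
])

# relative = absolute - current_state, broadcast over the whole chunk
relative_chunk = absolute_chunk - current_state

# absolute = relative + current_state recovers the original targets exactly
assert np.allclose(relative_chunk + current_state, absolute_chunk)
```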

**Why use relative actions?** The model learns to predict offsets centered around zero, which is easier to normalize and leads to more stable training. Because every chunk references the same current state, there is no error accumulation across chunks.

### Delta Actions (sequential differences)

Each action is an offset from the **previous action** (or from the current state for the first step):
```
current_state = [45.0, -30.0, 10.0]

delta_chunk = [
    [1.0, 1.0, 1.0],  # current → 46, -29, 11
    [1.5, 2.0, 1.0],  # previous action → 47.5, -27, 12
    [1.5, 2.0, 1.5],  # previous action → 49, -25, 13.5
    [1.0, 1.0, 1.5],  # previous action → 50, -24, 15
]
```

Here each step is relative to the one before it. To recover absolute positions you must sum all previous deltas, which means errors accumulate over time. UMI explicitly argues against this representation for this reason.
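The accumulation is easy to see numerically. A NumPy sketch using the numbers above (not LeRobot code):

```python
import numpy as np

current_state = np.array([45.0, -30.0, 10.0])
delta_chunk = np.array([
    [1.0, 1.0, 1.0],
    [1.5, 2.0, 1.0],
    [1.5, 2.0, 1.5],
    [1.0, 1.0, 1.5],
])

# Recovering absolute targets requires summing every preceding delta
absolute_chunk = current_state + np.cumsum(delta_chunk, axis=0)

# A constant +0.1 prediction error on every step compounds linearly:
# step k carries an accumulated error of 0.1 * (k + 1)
noisy_chunk = current_state + np.cumsum(delta_chunk + 0.1, axis=0)
accumulated_error = noisy_chunk - absolute_chunk
```

With relative actions the same +0.1 error would stay +0.1 at every step, since each action references the current state directly.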

### Visual Comparison

The figure below (based on a figure from [UMI, Chi et al., 2024](https://arxiv.org/abs/2402.10329)) illustrates the key difference. With **relative trajectory**, every action in the chunk points back to the same origin (current state), so a new inference step cleanly resets the reference. With **delta**, each action depends on the previous one, so errors accumulate. **Absolute** actions require a consistent global coordinate frame.

<img
  src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/action_representations_umi.png"
  alt="Relative Trajectory as Action Representation (UMI, Chi et al., 2024)"
  width="85%"
/>
## Using Relative Actions in LeRobot

LeRobot provides `RelativeActionsProcessorStep` to convert between absolute and relative actions inside the processor pipeline. This is how pi0, pi0.5, and pi0_fast support relative actions.

> **Note:** All pi models (pi0, pi0.5, pi0_fast) apply relative conversion *before* normalization (`relative → normalize`), so the normalizer always sees relative values. This means **relative action stats are required** for all of them when training with `use_relative_actions=true`. In pi0_fast the `RelativeActionsProcessorStep` only modifies the action — the state observation is unchanged — so `NormalizerProcessorStep` still runs before the state tokenizer and the tokenizer continues to receive normalized state as expected.
### How it works

During **training** (preprocessing), actions are converted from absolute to relative before the model sees them:

```
raw absolute action → RelativeActionsProcessorStep → normalize → model
```

During **inference** (postprocessing), model predictions are converted back to absolute before being sent to the robot:

```
model output → unnormalize → AbsoluteActionsProcessorStep → robot
```

The `AbsoluteActionsProcessorStep` reads the cached current state from its paired `RelativeActionsProcessorStep`, so the two must be wired together (handled automatically by the policy factory).
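Conceptually, the pairing works like this (a deliberately simplified sketch with hypothetical class names, not the actual LeRobot implementation):

```python
import numpy as np

class RelativeStep:
    """Sketch: convert absolute actions to offsets and cache the reference state."""
    def __init__(self):
        self.last_state = None

    def __call__(self, state, actions):
        self.last_state = state   # cached for the paired absolute step
        return actions - state    # relative = absolute - state

class AbsoluteStep:
    """Sketch: re-add the state cached by the paired relative step."""
    def __init__(self, relative_step):
        self.relative_step = relative_step  # wiring done by the policy factory

    def __call__(self, actions):
        return actions + self.relative_step.last_state

relative_step = RelativeStep()
absolute_step = AbsoluteStep(relative_step)

state = np.array([45.0, -30.0, 10.0])
chunk = np.array([[46.0, -29.0, 11.0], [50.0, -24.0, 15.0]])
roundtrip = absolute_step(relative_step(state, chunk))  # recovers chunk
```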

### Enabling relative actions for the pi family (pi0, pi0.5, pi0_fast)

**Step 1**: Precompute relative action statistics for your dataset:
```bash
lerobot-edit-dataset \
    --repo_id your_dataset \
    --operation.type recompute_stats \
    --operation.relative_action true \
    --operation.chunk_size 50 \
    --operation.relative_exclude_joints "['gripper']"
```

**Step 2**: Train with relative actions enabled:
```bash
lerobot-train \
    --dataset.repo_id=your_dataset \
    --policy.type=pi0 \
    --policy.use_relative_actions=true \
    --policy.relative_exclude_joints='["gripper"]'
```

The `relative_exclude_joints` parameter specifies joints that should remain in absolute space. For example, gripper commands are typically binary (open/close) and don't benefit from relative encoding.
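The effect of excluding a joint can be illustrated with a boolean mask (illustrative NumPy only; the joint names and values are made up):

```python
import numpy as np

joint_names = ["shoulder_pan", "shoulder_lift", "elbow", "gripper"]
exclude = {"gripper"}
keep_relative = np.array([name not in exclude for name in joint_names])

state = np.array([45.0, -30.0, 10.0, 0.8])
action = np.array([46.0, -29.0, 11.0, 0.0])  # gripper command: close

# Excluded joints pass through in absolute space; the others become offsets
mixed = np.where(keep_relative, action - state, action)
# the gripper command stays 0.0 instead of becoming the offset -0.8
```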

### Combining relative actions with RTC

[RTC](https://arxiv.org/abs/2506.07339) runs policy inference at high frequency and sends actions to the robot as they are predicted rather than waiting for a full chunk. Relative actions and RTC are fully compatible: because every chunk in relative mode references the **same** current state (captured at the start of inference), each predicted action in the chunk remains a valid offset even if the robot has already moved. No special handling is needed — `RelativeActionsProcessorStep` caches the state once per inference call and `AbsoluteActionsProcessorStep` applies it to every action in the streamed output.
### Combining relative actions with EE space

Relative actions work in both joint space and EE space. For example, if your dataset stores EE actions, relative encoding converts them to offsets from the current EE pose:

```
current_ee_state = [x: 0.25, y: -0.10, z: 0.15, gripper: 0.8]

absolute_ee_chunk = [
    [0.26, -0.09, 0.16, 0.8],
    [0.28, -0.07, 0.18, 0.8],
]

relative_ee_chunk = [
    [0.01, 0.01, 0.01, 0.0],  # offset from current EE pose
    [0.03, 0.03, 0.03, 0.0],  # offset from current EE pose
]
```

## Processing Pipeline Summary

Here is how the different processors compose. Each arrow is a processor step, and they can be chained in a `RobotProcessorPipeline` or `PolicyProcessorPipeline`:
```
                 ┌─────────────────────────────────────────┐
Action Space     │ Joint Space ←──IK──→ EE Space           │
                 │ ForwardKinematicsJointsToEE             │
                 │ InverseKinematicsEEToJoints             │
                 └─────────────────────────────────────────┘

                 ┌─────────────────────────────────────────┐
Representation   │ Absolute ←────→ Relative                │
                 │ RelativeActionsProcessorStep (pre)      │
                 │ AbsoluteActionsProcessorStep (post)     │
                 └─────────────────────────────────────────┘

                 ┌─────────────────────────────────────────┐
Normalization    │ Raw ←────→ Normalized                   │
                 │ NormalizerProcessorStep (pre)           │
                 │ UnnormalizerProcessorStep (post)        │
                 └─────────────────────────────────────────┘
```

A typical training preprocessor might chain: `raw absolute joint actions → relative → normalize`. A typical inference postprocessor: `unnormalize → absolute → (optionally IK to joints)`.
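The chaining itself is just function composition. A toy sketch with stand-in lambdas (not the real processor step classes), acting on a single joint value:

```python
def compose(*steps):
    """Apply steps left to right, like a processor pipeline."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run

# Stand-ins for the real processor steps
to_relative = lambda a: a - 45.0   # RelativeActionsProcessorStep (pre)
normalize   = lambda a: a / 10.0   # NormalizerProcessorStep (pre)
unnormalize = lambda a: a * 10.0   # UnnormalizerProcessorStep (post)
to_absolute = lambda a: a + 45.0   # AbsoluteActionsProcessorStep (post)

preprocess  = compose(to_relative, normalize)
postprocess = compose(unnormalize, to_absolute)

value = postprocess(preprocess(46.0))  # roundtrips back to 46.0
```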

## References

- [Universal Manipulation Interface (UMI)](https://arxiv.org/abs/2402.10329) - Chi et al., 2024. Defines the relative trajectory action representation and compares it with absolute and delta actions.
- [Introduction to Processors](./introduction_processors) - How processor pipelines work in LeRobot.
- [`examples/so100_to_so100_EE/`](https://github.com/huggingface/lerobot/tree/main/examples/so100_to_so100_EE) - Complete example of recording and evaluating with EE-space actions.