# UMI Data with pi0 Relative EE Actions
This guide explains how to prepare a UMI-collected dataset for training a pi0 policy with relative end-effector (EE) actions, and how to run inference with the trained model.
**What we will do:**
1. How to add `observation.state` to an existing UMI LeRobot dataset.
2. How to train pi0 with `use_relative_actions=True`.
3. How to evaluate the trained policy on a real robot.
## Background
[UMI (Universal Manipulation Interface)](https://umi-gripper.github.io) collects manipulation data with hand-held grippers, recovering 6-DoF EE poses via SLAM. UMI datasets stored in LeRobot format already contain `action` (absolute EE pose) and wrist-camera images. To train pi0 with relative actions, we need two additions:
1. **`observation.state`** — the current EE pose the policy conditions on.
2. **Relative action statistics** — so the normalizer sees `action - state` distributions.
### Why relative actions?
With relative actions, each action in a chunk is an **offset from the current state** rather than an absolute target:
```
relative_action[i] = absolute_action[t + i] - state[t]
```
This is the representation advocated by UMI (Chi et al., 2024). Compared to absolute actions it removes the need for a consistent global coordinate frame, and compared to delta actions (each step relative to the previous) it avoids error accumulation across the chunk. See the [Action Representations](action_representations) guide for a full comparison.
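As a minimal sketch (hypothetical helper, not part of the lerobot API), the chunk-relative conversion anchors every step of the chunk to the *same* current state, which is why per-step errors do not accumulate the way they do with delta actions:

```python
import numpy as np

def to_relative_chunk(absolute_actions: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Express each action in a chunk as an offset from the current state.

    All steps share the same anchor state[t], unlike delta actions where
    each step is relative to the previous one.
    """
    return absolute_actions - state  # broadcasts state over the chunk axis

# Hypothetical 1-D example: a chunk of 3 absolute targets, current state 2.0
chunk = np.array([[2.5], [3.0], [3.5]])
state = np.array([2.0])
rel = to_relative_chunk(chunk, state)
# rel == [[0.5], [1.0], [1.5]]
```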
## State-Action Offset
UMI SLAM produces a single trajectory of EE poses stored as `action`. We derive `observation.state` from the same trajectory with a configurable offset:
```
state[t] = action[t - offset]
```
| Offset | `state[t]` | Meaning |
| ------ | ------------- | ---------------------------------------------------------------- |
| 0 | `action[t]` | State and action are the same pose at time t |
| 1 | `action[t-1]` | State is the previous action — where the gripper already arrived |
An offset of 1 is the typical UMI convention: at decision time the "current state" is where the gripper _already is_ (the result of the previous command), and the action is where it should go next. At episode boundaries where `t < offset`, we clamp to `action[0]`.
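The offset rule, including the clamp at episode boundaries, can be sketched as follows (hypothetical helper name; the actual conversion script may differ in details):

```python
import numpy as np

def derive_state_from_action(actions: np.ndarray, offset: int = 1) -> np.ndarray:
    """state[t] = action[t - offset], clamped to action[0] when t < offset."""
    indices = np.maximum(np.arange(len(actions)) - offset, 0)
    return actions[indices]

# Hypothetical 1-D trajectory of 4 absolute EE "poses"
actions = np.array([[0.0], [1.0], [2.0], [3.0]])
state = derive_state_from_action(actions, offset=1)
# state == [[0.0], [0.0], [1.0], [2.0]]  (first entry clamped to action[0])
```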
## Step 1: Add `observation.state`
pi0 with `use_relative_actions=True` needs `observation.state` in the dataset to compute `action - state` on the fly. The script in `examples/umi_pi0_relative_ee/convert_umi_dataset.py` adds it. Edit the constants at the top:
```python
HF_DATASET_ID = "<hf_username>/<dataset_repo_id>"
# Option A: copy an existing feature as observation.state
# STATE_SOURCE_FEATURE = "observation.joints"  # or "observation.pose", etc.
# Option B: derive state from action with an offset (leave STATE_SOURCE_FEATURE = None)
STATE_SOURCE_FEATURE = None
STATE_ACTION_OFFSET = 1
```
**Choosing the state source:**
- If your dataset already has a feature in the same space as `action` (e.g. `observation.joints` for joint-space actions, or `observation.pose` for EE-space actions), set `STATE_SOURCE_FEATURE` to copy it.
- If your dataset only has a single trajectory (like raw UMI EE data where action = the EE poses), set `STATE_SOURCE_FEATURE = None` and use `STATE_ACTION_OFFSET` to derive state from the action column with a time offset.
`observation.state` **must have the same shape as `action`** — the relative conversion computes `action - state` element-wise.
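A quick sanity check before converting (illustrative only; the shapes below are placeholders for your dataset):

```python
import numpy as np

# Hypothetical shapes: 100 frames of 6-DoF EE pose + gripper
action = np.zeros((100, 7))
state = np.zeros((100, 7))

# The relative conversion is element-wise, so the shapes must match exactly.
assert state.shape == action.shape, "observation.state must match action shape"
relative = action - state
```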
Then run:
```bash
python examples/umi_pi0_relative_ee/convert_umi_dataset.py
```
<Tip>
If your dataset already has `observation.state`, the script exits early — nothing to do.
</Tip>
## Step 2: Recompute Relative Action Stats
Use the built-in CLI to recompute dataset statistics in relative space:
```bash
lerobot-edit-dataset \
--repo_id <your_dataset> \
--operation.type recompute_stats \
--operation.relative_action true \
--operation.chunk_size 50 \
--operation.relative_exclude_joints "['gripper']" \
--push_to_hub true
```
The `relative_exclude_joints` parameter specifies joints that stay absolute. Gripper commands are typically binary or continuous open/close and don't benefit from relative encoding. Leave it as `"[]"` to convert all dimensions to relative.
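The exclusion semantics can be sketched like this (hypothetical helper; the CLI's internal implementation may differ, but the effect on excluded dimensions is the same):

```python
import numpy as np

def relative_with_exclusions(action, state, names, exclude):
    """Convert to relative, keeping excluded dims (e.g. gripper) absolute."""
    keep_absolute = np.array([n in exclude for n in names])
    rel = action - state
    rel[..., keep_absolute] = action[..., keep_absolute]  # restore absolute values
    return rel

# Hypothetical 4-D action: x, y, z plus a gripper command
names = ["x", "y", "z", "gripper"]
action = np.array([1.0, 2.0, 3.0, 1.0])
state = np.array([0.5, 0.5, 0.5, 0.5])
out = relative_with_exclusions(action, state, names, {"gripper"})
# out == [0.5, 1.5, 2.5, 1.0]  (gripper stays absolute, not 0.5)
```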
## Step 3: Train
No custom training script is needed — standard `lerobot-train` handles everything:
```bash
lerobot-train \
--dataset.repo_id=<hf_username>/<dataset_repo_id> \
--policy.type=pi0 \
--policy.pretrained_path=lerobot/pi0_base \
--policy.use_relative_actions=true \
--policy.relative_exclude_joints='["gripper"]'
```
Under the hood, the training pipeline:
- Loads relative action stats from the dataset's `meta/stats.json`.
- Configures `RelativeActionsProcessorStep` in the preprocessor (absolute → relative before normalization).
- Trains the model on normalized relative action values.
See the [pi0 documentation](pi0) for all available training options.
## Step 4: Evaluate
The evaluation script in `examples/umi_pi0_relative_ee/evaluate.py` runs inference on a real robot (SO-100 with EE space):
```bash
python examples/umi_pi0_relative_ee/evaluate.py
```
Edit `HF_MODEL_ID`, `HF_DATASET_ID`, and robot configuration at the top of the file.
The inference flow uses pi0's built-in processor pipeline — no custom wrappers needed:
1. **Robot → FK** — Joint positions are converted to EE pose via `ForwardKinematicsJointsToEE`, producing `observation.state`.
2. **Preprocessor** — `RelativeActionsProcessorStep` caches the raw `observation.state`, then `NormalizerProcessorStep` normalizes everything.
3. **pi0 inference** — The model predicts a normalized relative action chunk.
4. **Postprocessor** — `UnnormalizerProcessorStep` unnormalizes, then `AbsoluteActionsProcessorStep` adds the cached state back to get absolute EE targets.
5. **IK → Robot** — `InverseKinematicsEEToJoints` converts absolute EE targets to joint commands.
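The cache-and-restore pattern in steps 2 and 4 can be sketched as follows (a simplified stand-in for `RelativeActionsProcessorStep` / `AbsoluteActionsProcessorStep`, not their actual API):

```python
import numpy as np

class RelativeActionCache:
    """Sketch of the cache-and-restore pattern used at inference time."""

    def __init__(self):
        self._state = None

    def preprocess(self, obs_state: np.ndarray) -> np.ndarray:
        # Cache the raw absolute state before normalization touches it.
        self._state = obs_state.copy()
        return obs_state

    def postprocess(self, relative_chunk: np.ndarray) -> np.ndarray:
        # Add the cached state back to recover absolute EE targets.
        return relative_chunk + self._state

cache = RelativeActionCache()
cache.preprocess(np.array([0.1, 0.2]))
abs_chunk = cache.postprocess(np.array([[0.05, 0.0], [0.1, 0.0]]))
# abs_chunk is the cached state plus each relative step
```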
## How the Pieces Fit Together
```
Training:
dataset (absolute EE) → RelativeActionsProcessorStep → NormalizerProcessorStep → pi0 model
Inference:
  robot joints → FK → observation.state (absolute EE)
    → RelativeActionsProcessorStep (caches state)
    → NormalizerProcessorStep → pi0 model → relative action chunk
    → UnnormalizerProcessorStep
    → AbsoluteActionsProcessorStep (+ cached state → absolute EE)
    → IK → joint targets → robot
```
## References
- [UMI: Universal Manipulation Interface](https://umi-gripper.github.io) — Chi et al., 2024. Defines relative trajectory actions.
- [Action Representations](action_representations) — LeRobot guide comparing absolute, relative, and delta actions.
- [pi0 documentation](pi0) — Full pi0 configuration including `use_relative_actions`.
- [`examples/so100_to_so100_EE/`](https://github.com/huggingface/lerobot/tree/main/examples/so100_to_so100_EE) — EE-space evaluation example this builds on.