Files
lerobot/docs/source/umi_pi0_relative_ee.mdx
T
Pepijn d028978552 nit
2026-04-01 18:04:03 +02:00

147 lines
7.5 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# UMI Data with pi0 Relative EE Actions
This guide explains how to prepare a UMI-collected dataset for training a pi0 policy with relative end-effector (EE) actions, and how to run inference with the trained model.
**What we will do:**
1. Recompute dataset statistics for relative actions and state.
2. Train pi0 with `derive_state_from_action=true` (full UMI pipeline).
3. Evaluate the trained policy on a real robot.
## Background
[UMI (Universal Manipulation Interface)](https://umi-gripper.github.io) collects manipulation data with hand-held grippers, recovering 6-DoF EE poses via SLAM. UMI datasets stored in LeRobot format already contain `action` (absolute EE pose) and wrist-camera images. To train pi0 with relative actions, we need **relative action statistics** — so the normalizer sees `(action state)` distributions.
### Why relative actions?
With relative actions, each action in a chunk is an **offset from the current state** rather than an absolute target:
```
relative_action[i] = absolute_action[t + i] state[t]
```
This is the representation advocated by UMI (Chi et al., 2024). Compared to absolute actions it removes the need for a consistent global coordinate frame, and compared to delta actions (each step relative to the previous) it avoids error accumulation across the chunk. See the [Action Representations](action_representations) guide for a full comparison.
### Full UMI mode: `derive_state_from_action`
When `derive_state_from_action=true`, pi0 automatically derives `observation.state` from the action column on the fly — no separate state column or dataset conversion step needed. Under the hood:
- `action_delta_indices` extends to `[-1, 0, 1, ..., 49]` (one extra leading timestep).
- `DeriveStateFromActionStep` extracts `[action[t-1], action[t]]` as a 2-step state and strips the extra timestep from the action chunk.
- `RelativeActionsProcessorStep` converts actions to offsets from `state[t]`.
- `RelativeStateProcessorStep` converts the 2-step state to relative proprioception (velocity + zeros) and flattens.
This single flag implies `use_relative_state=true` and `state_obs_steps=2`.
During **inference**, state comes from the robot (via FK), so `DeriveStateFromActionStep` is a no-op. `RelativeStateProcessorStep` buffers the previous state and applies the same conversion automatically.
## Step 1: Recompute Stats
Use the built-in CLI to recompute dataset statistics for relative actions and derive-state-from-action:
```bash
lerobot-edit-dataset \
--repo_id <your_dataset> \
--operation.type recompute_stats \
--operation.relative_action true \
--operation.derive_state_from_action true \
--operation.chunk_size 50 \
--operation.relative_exclude_joints "['gripper']" \
--push_to_hub true
```
The `derive_state_from_action` flag tells `recompute_stats` to read from the action column (instead of `observation.state`) when computing relative state stats. It automatically enables `relative_state=true` and `state_obs_steps=2`.
The `relative_exclude_joints` parameter specifies joints that stay absolute. Gripper commands are typically binary or continuous open/close and don't benefit from relative encoding. Leave it as `"[]"` to convert all dimensions to relative.
## Step 2: Train
No custom training script is needed — standard `lerobot-train` handles everything:
```bash
lerobot-train \
--dataset.repo_id=<hf_username>/<dataset_repo_id> \
--policy.type=pi0 \
--policy.pretrained_path=lerobot/pi0_base \
--policy.derive_state_from_action=true \
--policy.use_relative_actions=true \
--policy.relative_exclude_joints='["gripper"]'
```
`derive_state_from_action=true` auto-enables `use_relative_state=true` and `state_obs_steps=2`.
Under the hood, the training pipeline:
- Loads relative action stats and relative state stats from the dataset's `meta/stats.json`.
- Extends `action_delta_indices` to `[-1, 0, 1, ..., 49]` to load one extra leading timestep.
- `DeriveStateFromActionStep` extracts the 2-step state from the action chunk and strips the extra timestep.
- `RelativeActionsProcessorStep` converts actions to offsets from `state[t]`.
- `RelativeStateProcessorStep` converts the 2-step state to relative offsets from the current timestep, then flattens.
- `NormalizerProcessorStep` normalizes everything.
- The model trains on normalized relative values.
See the [pi0 documentation](pi0) for all available training options.
## Step 3: Evaluate
The evaluation script in `examples/umi_pi0_relative_ee/evaluate.py` runs inference on a real robot (SO-100 with EE space):
```bash
python examples/umi_pi0_relative_ee/evaluate.py
```
Edit `HF_MODEL_ID`, `HF_DATASET_ID`, and robot configuration at the top of the file.
### Latency compensation
For real robot deployment, you may want to skip the first few steps of each predicted action chunk to compensate for system latency. Set `LATENCY_SKIP_STEPS` in the evaluate script:
```python
LATENCY_SKIP_STEPS = 0 # ceil(total_latency_ms / (1000 / FPS))
```
For example, at 10Hz with ~200ms total latency, set `LATENCY_SKIP_STEPS = 2`.
The inference flow uses pi0's built-in processor pipeline — no custom wrappers needed:
1. **Robot → FK** — Joint positions are converted to EE pose via `ForwardKinematicsJointsToEE`, producing `observation.state`.
2. **Preprocessor** — `DeriveStateFromActionStep` is a no-op (state comes from robot). `RelativeStateProcessorStep` buffers previous state, stacks, and converts to relative. `RelativeActionsProcessorStep` caches state. `NormalizerProcessorStep` normalizes.
3. **pi0 inference** — The model predicts a normalized relative action chunk.
4. **Postprocessor** — `UnnormalizerProcessorStep` unnormalizes, then `AbsoluteActionsProcessorStep` adds the cached state back to get absolute EE targets.
5. **IK → Robot** — `InverseKinematicsEEToJoints` converts absolute EE targets to joint commands.
## How the Pieces Fit Together
```
Training (full UMI mode: derive_state_from_action=true):
DataLoader (action: B,51,D)
→ DeriveStateFromActionStep (state = action[:,:2,:], action = action[:,1:,:])
→ RelativeActionsProcessorStep (action -= state[:,-1,:])
→ RelativeStateProcessorStep (state offsets from current, flatten → B,2*D)
→ NormalizerProcessorStep → pi0 model
Inference:
robot joints → FK → observation.state (absolute EE)
DeriveStateFromActionStep (no-op)
RelativeActionsProcessorStep (caches state)
RelativeStateProcessorStep (buffers prev, stacks, subtracts, flattens)
NormalizerProcessorStep → pi0 model → relative action chunk
UnnormalizerProcessorStep
AbsoluteActionsProcessorStep (+ cached state → absolute EE)
IK → joint targets → robot
```
## References
- [UMI: Universal Manipulation Interface](https://umi-gripper.github.io) — Chi et al., 2024. Defines relative trajectory actions.
- [Action Representations](action_representations) — LeRobot guide comparing absolute, relative, and delta actions.
- [pi0 documentation](pi0) — Full pi0 configuration including `use_relative_actions`.
- [`examples/so100_to_so100_EE/`](https://github.com/huggingface/lerobot/tree/main/examples/so100_to_so100_EE) — EE-space evaluation example this builds on.