mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-21 11:39:50 +00:00
feat(umi): simplify to derive_state_from_action and cam0-only
- Remove fix_dataset.py (user fixes dataset at source) - evaluate.py: replace observation.pose/joints with observation.state (8D, derived from action during training, from FK at inference) - evaluate.py: remove cam1 — training uses only cam0 - docs: rewrite workflow around derive_state_from_action=true, updated recompute-stats and training commands with relative_exclude_joints for gripper dims Made-with: Cursor
This commit is contained in:
@@ -1,16 +1,28 @@
|
|||||||
# UMI Data with pi0 Relative EE Actions
|
# UMI Data with pi0 Relative EE Actions
|
||||||
|
|
||||||
This guide explains how to prepare a UMI-collected dataset for training a pi0 policy with relative end-effector (EE) actions, and how to run inference with the trained model.
|
This guide explains how to train a pi0 policy with UMI-style relative end-effector (EE) actions and deploy it on a real OpenArm robot.
|
||||||
|
|
||||||
**What we will do:**
|
**What we will do:**
|
||||||
|
|
||||||
1. Recompute dataset statistics for relative actions and state.
|
1. Prepare the dataset (EE pose + gripper in the action column).
|
||||||
2. Train pi0 with `derive_state_from_action=true` (full UMI pipeline).
|
2. Recompute statistics for relative actions.
|
||||||
3. Evaluate the trained policy on a real robot.
|
3. Train pi0 with `derive_state_from_action=true`.
|
||||||
|
4. Evaluate the trained policy on a real robot.
|
||||||
|
|
||||||
## Background
|
## Background
|
||||||
|
|
||||||
[UMI (Universal Manipulation Interface)](https://umi-gripper.github.io) collects manipulation data with hand-held grippers, recovering 6-DoF EE poses via SLAM. UMI datasets stored in LeRobot format already contain `action` (absolute EE pose) and wrist-camera images. To train pi0 with relative actions, we need **relative action statistics** — so the normalizer sees `(action − state)` distributions.
|
[UMI (Universal Manipulation Interface)](https://umi-gripper.github.io) collects manipulation data with hand-held grippers, recovering 6-DoF EE poses via SLAM. The key insight from UMI (Chi et al., 2024) is that the action space must include **both EE trajectory and gripper width**, and actions should be expressed as **relative trajectories** (offsets from the current pose).
|
||||||
|
|
||||||
|
### Dataset layout
|
||||||
|
|
||||||
|
The dataset should have this structure:
|
||||||
|
|
||||||
|
| Feature | Shape | Content |
|
||||||
|
| ------------------------- | --------- | -------------------------------------------------------- |
|
||||||
|
| `observation.images.cam0` | `[3,H,W]` | Wrist camera image |
|
||||||
|
| `action` | `[8]` | `[x, y, z, ax, ay, az, proximal, distal]` (EE + gripper) |
|
||||||
|
|
||||||
|
No separate `observation.pose` or `observation.joints` columns are needed — the model derives its proprioception state directly from the action column (`derive_state_from_action=true`).
|
||||||
|
|
||||||
### Why relative actions?
|
### Why relative actions?
|
||||||
|
|
||||||
@@ -20,99 +32,128 @@ With relative actions, each action in a chunk is an **offset from the current st
|
|||||||
relative_action[i] = absolute_action[t + i] − state[t]
|
relative_action[i] = absolute_action[t + i] − state[t]
|
||||||
```
|
```
|
||||||
|
|
||||||
This is the representation advocated by UMI (Chi et al., 2024). Compared to absolute actions it removes the need for a consistent global coordinate frame, and compared to delta actions (each step relative to the previous) it avoids error accumulation across the chunk. See the [Action Representations](action_representations) guide for a full comparison.
|
UMI ablations show this is critical: absolute actions achieve only 25% success vs 100% for relative trajectory on the cup arrangement task. Compared to delta actions (each step relative to the previous), relative trajectory avoids error accumulation. See the [Action Representations](action_representations) guide for details.
|
||||||
|
|
||||||
### Full UMI mode: `derive_state_from_action`
|
### `derive_state_from_action`
|
||||||
|
|
||||||
When `derive_state_from_action=true`, pi0 automatically derives `observation.state` from the action column on the fly — no separate state column or dataset conversion step needed. Under the hood:
|
When `derive_state_from_action=true`, pi0 derives `observation.state` from the action column during training — no separate state column needed. Under the hood:
|
||||||
|
|
||||||
- `action_delta_indices` extends to `[-1, 0, 1, ..., 49]` (one extra leading timestep).
|
- `action_delta_indices` extends to `[-1, 0, 1, ..., chunk_size-1]` (one extra leading timestep).
|
||||||
- `DeriveStateFromActionStep` extracts `[action[t-1], action[t]]` as a 2-step state and strips the extra timestep from the action chunk.
|
- `DeriveStateFromActionStep` extracts `[action[t-1], action[t]]` as a 2-step state and strips the extra timestep from the action chunk.
|
||||||
- `RelativeActionsProcessorStep` converts actions to offsets from `state[t]`.
|
- `RelativeActionsProcessorStep` converts actions to offsets from `state[t]`.
|
||||||
- `RelativeStateProcessorStep` converts the 2-step state to relative proprioception (velocity + zeros) and flattens.
|
- `RelativeStateProcessorStep` converts the 2-step state to relative proprioception (velocity + zeros) and flattens.
|
||||||
|
|
||||||
This single flag implies `use_relative_state=true` and `state_obs_steps=2`.
|
This implies `use_relative_state=true` and `state_obs_steps=2`.
|
||||||
|
|
||||||
During **inference**, state comes from the robot (via FK), so `DeriveStateFromActionStep` is a no-op. `RelativeStateProcessorStep` buffers the previous state and applies the same conversion automatically.
|
During **inference**, `DeriveStateFromActionStep` is a no-op — state comes from the robot via forward kinematics. `RelativeStateProcessorStep` buffers the previous state and applies the same conversion automatically.
|
||||||
|
|
||||||
## Step 1: Recompute Stats
|
## Step 1: Recompute Stats
|
||||||
|
|
||||||
Use the built-in CLI to recompute dataset statistics for relative actions and derive-state-from-action:
|
After preparing the dataset with EE pose in the action column, recompute statistics with `derive_state_from_action=true`. This computes relative action and state stats so the normalizer sees offset distributions:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
lerobot-edit-dataset \
|
lerobot-edit-dataset \
|
||||||
--repo_id <your_dataset> \
|
--repo-id=glannuzel/grabette-dataset \
|
||||||
--operation.type recompute_stats \
|
--operation=recompute_stats \
|
||||||
--operation.relative_action true \
|
--operation.relative_action=true \
|
||||||
--operation.derive_state_from_action true \
|
--operation.relative_exclude_joints='["proximal", "distal"]' \
|
||||||
--operation.chunk_size 50 \
|
--operation.derive_state_from_action=true \
|
||||||
--operation.relative_exclude_joints "['gripper']" \
|
--operation.chunk_size=30 \
|
||||||
--push_to_hub true
|
--push_to_hub=true
|
||||||
```
|
```
|
||||||
|
|
||||||
The `derive_state_from_action` flag tells `recompute_stats` to read from the action column (instead of `observation.state`) when computing relative state stats. It automatically enables `relative_state=true` and `state_obs_steps=2`.
|
| Flag | Purpose |
|
||||||
|
| ------------------------------- | ------------------------------------------------------------------------------- |
|
||||||
The `relative_exclude_joints` parameter specifies joints that stay absolute. Gripper commands are typically binary or continuous open/close and don't benefit from relative encoding. Leave it as `"[]"` to convert all dimensions to relative.
|
| `relative_action=true` | Compute stats on `action − state` (relative actions) |
|
||||||
|
| `relative_exclude_joints` | Keep gripper dims absolute (they don't benefit from relative encoding) |
|
||||||
|
| `derive_state_from_action=true` | Derive state from action column (implies `relative_state`, `state_obs_steps=2`) |
|
||||||
|
| `chunk_size=30` | Must match training chunk size |
|
||||||
|
|
||||||
## Step 2: Train
|
## Step 2: Train
|
||||||
|
|
||||||
No custom training script is needed — standard `lerobot-train` handles everything:
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
lerobot-train \
|
#!/bin/bash
|
||||||
--dataset.repo_id=<hf_username>/<dataset_repo_id> \
|
set -euo pipefail
|
||||||
|
|
||||||
|
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:${LD_LIBRARY_PATH:-}
|
||||||
|
|
||||||
|
DATASET="glannuzel/grabette-dataset"
|
||||||
|
NUM_PROCESSES=8
|
||||||
|
|
||||||
|
echo "=== Training pi0 on $DATASET (UMI relative EE, ${NUM_PROCESSES} GPUs) ==="
|
||||||
|
accelerate launch --multi_gpu --num_processes=$NUM_PROCESSES \
|
||||||
|
-m lerobot.scripts.lerobot_train \
|
||||||
|
--dataset.repo_id="$DATASET" \
|
||||||
|
--dataset.video_backend=pyav \
|
||||||
--policy.type=pi0 \
|
--policy.type=pi0 \
|
||||||
--policy.pretrained_path=lerobot/pi0_base \
|
--policy.pretrained_path=lerobot/pi0_base \
|
||||||
|
--policy.repo_id=pepijn/grabette-umi-pi0 \
|
||||||
|
--policy.chunk_size=30 \
|
||||||
|
--policy.n_action_steps=30 \
|
||||||
--policy.derive_state_from_action=true \
|
--policy.derive_state_from_action=true \
|
||||||
--policy.use_relative_actions=true \
|
--use_relative_actions=true \
|
||||||
--policy.relative_exclude_joints='["gripper"]'
|
--policy.relative_exclude_joints='["proximal", "distal"]' \
|
||||||
|
--batch_size=32 \
|
||||||
|
--steps=5000 \
|
||||||
|
--policy.scheduler_decay_steps=5000 \
|
||||||
|
--policy.dtype=bfloat16 \
|
||||||
|
--policy.compile_model=false \
|
||||||
|
--policy.gradient_checkpointing=true \
|
||||||
|
--policy.device=cuda \
|
||||||
|
--output_dir=/fsx/pepijn/outputs/grabette-umi \
|
||||||
|
--job_name=grabette-umi-v2 \
|
||||||
|
--wandb.enable=true \
|
||||||
|
--wandb.disable_artifact=true \
|
||||||
|
--wandb.project=grabette-umi \
|
||||||
|
--log_freq=100 \
|
||||||
|
--save_freq=5000
|
||||||
```
|
```
|
||||||
|
|
||||||
`derive_state_from_action=true` auto-enables `use_relative_state=true` and `state_obs_steps=2`.
|
Key flags:
|
||||||
|
|
||||||
Under the hood, the training pipeline:
|
| Flag | Purpose |
|
||||||
|
| ------------------------------- | ---------------------------------------------------------------------- |
|
||||||
|
| `derive_state_from_action=true` | Derive proprioception from action column (full UMI mode) |
|
||||||
|
| `use_relative_actions=true` | Actions are offsets from current state |
|
||||||
|
| `relative_exclude_joints` | `["proximal", "distal"]` — gripper stays absolute, EE pose is relative |
|
||||||
|
| `chunk_size=30` | Action horizon: 30 steps (~0.65s at 46 FPS) |
|
||||||
|
| `n_action_steps=30` | Execute full chunk before replanning |
|
||||||
|
|
||||||
- Loads relative action stats and relative state stats from the dataset's `meta/stats.json`.
|
Note: `derive_state_from_action=true` automatically implies `use_relative_state=true` and `state_obs_steps=2`. No `rename_map` is needed since there are no separate observation columns to rename.
|
||||||
- Extends `action_delta_indices` to `[-1, 0, 1, ..., 49]` to load one extra leading timestep.
|
|
||||||
- `DeriveStateFromActionStep` extracts the 2-step state from the action chunk and strips the extra timestep.
|
|
||||||
- `RelativeActionsProcessorStep` converts actions to offsets from `state[t]`.
|
|
||||||
- `RelativeStateProcessorStep` converts the 2-step state to relative offsets from the current timestep, then flattens.
|
|
||||||
- `NormalizerProcessorStep` normalizes everything.
|
|
||||||
- The model trains on normalized relative values.
|
|
||||||
|
|
||||||
See the [pi0 documentation](pi0) for all available training options.
|
|
||||||
|
|
||||||
## Step 3: Evaluate
|
## Step 3: Evaluate
|
||||||
|
|
||||||
The evaluation script in `examples/umi_pi0_relative_ee/evaluate.py` runs inference on a real robot (SO-100 with EE space):
|
The evaluation script in `examples/umi_pi0_relative_ee/evaluate.py` runs inference on a real OpenArm robot:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python examples/umi_pi0_relative_ee/evaluate.py
|
python examples/umi_pi0_relative_ee/evaluate.py
|
||||||
```
|
```
|
||||||
|
|
||||||
Edit `HF_MODEL_ID`, `HF_DATASET_ID`, and robot configuration at the top of the file.
|
Edit `HF_MODEL_ID`, camera index, and robot configuration at the top of the file.
|
||||||
|
|
||||||
|
### How inference works
|
||||||
|
|
||||||
|
At inference, the training dataset has no `observation.state` — it was derived from actions. The evaluate script provides `observation.state` from the robot via forward kinematics:
|
||||||
|
|
||||||
|
1. **Robot → FK** — Arm joint positions → EE pose `[x,y,z,ax,ay,az]`, gripper → `[proximal, distal]`. Combined into `observation.state` (8D).
|
||||||
|
2. **Preprocessor** (loaded from checkpoint) — `DeriveStateFromActionStep` is a no-op. `RelativeStateProcessorStep` buffers previous state, stacks `[prev, current]`, subtracts current → velocity info. `RelativeActionsProcessorStep` caches state. `NormalizerProcessorStep` normalizes.
|
||||||
|
3. **pi0 inference** — Predicts normalized relative action chunk (30 steps).
|
||||||
|
4. **Postprocessor** — `UnnormalizerProcessorStep` unnormalizes, `AbsoluteActionsProcessorStep` adds cached state → absolute EE targets.
|
||||||
|
5. **IK → Robot** — Absolute `[x,y,z,ax,ay,az]` → arm joint targets with full 6-DOF IK (orientation weight = 1.0). `[proximal, distal]` → direct gripper position commands.
|
||||||
|
|
||||||
### Latency compensation
|
### Latency compensation
|
||||||
|
|
||||||
For real robot deployment, you may want to skip the first few steps of each predicted action chunk to compensate for system latency. Set `LATENCY_SKIP_STEPS` in the evaluate script:
|
Set `LATENCY_SKIP_STEPS` to skip the first few predicted action steps, compensating for system latency:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
LATENCY_SKIP_STEPS = 0 # ceil(total_latency_ms / (1000 / FPS))
|
LATENCY_SKIP_STEPS = 7 # ceil(total_latency_ms / (1000 / FPS))
|
||||||
```
|
```
|
||||||
|
|
||||||
For example, at 10Hz with ~200ms total latency, set `LATENCY_SKIP_STEPS = 2`.
|
At 46 FPS (~22ms/step) with ~150ms total latency: `ceil(150/22) ≈ 7`. Start with 0 for a safe first test.
|
||||||
|
|
||||||
The inference flow uses pi0's built-in processor pipeline — no custom wrappers needed:
|
|
||||||
|
|
||||||
1. **Robot → FK** — Joint positions are converted to EE pose via `ForwardKinematicsJointsToEE`, producing `observation.state`.
|
|
||||||
2. **Preprocessor** — `DeriveStateFromActionStep` is a no-op (state comes from robot). `RelativeStateProcessorStep` buffers previous state, stacks, and converts to relative. `RelativeActionsProcessorStep` caches state. `NormalizerProcessorStep` normalizes.
|
|
||||||
3. **pi0 inference** — The model predicts a normalized relative action chunk.
|
|
||||||
4. **Postprocessor** — `UnnormalizerProcessorStep` unnormalizes, then `AbsoluteActionsProcessorStep` adds the cached state back to get absolute EE targets.
|
|
||||||
5. **IK → Robot** — `InverseKinematicsEEToJoints` converts absolute EE targets to joint commands.
|
|
||||||
|
|
||||||
## Replay Viewer
|
## Replay Viewer
|
||||||
|
|
||||||
Before running on hardware, you can visualize any dataset episode in a browser-based 3D viewer. The viewer shows the EE trajectory overlaid on the OpenArm URDF model, making it easy to sanity-check recorded data or debug unexpected behavior.
|
Visualize any dataset episode in a browser-based 3D viewer before running on hardware. The viewer shows the EE trajectory overlaid on the OpenArm URDF model.
|
||||||
|
|
||||||
### Quick start
|
### Quick start
|
||||||
|
|
||||||
@@ -120,8 +161,6 @@ Before running on hardware, you can visualize any dataset episode in a browser-b
|
|||||||
python examples/umi_pi0_relative_ee/replay.py
|
python examples/umi_pi0_relative_ee/replay.py
|
||||||
```
|
```
|
||||||
|
|
||||||
This extracts the trajectory from episode 0 of the default dataset, starts a local HTTP server, and opens the viewer at [http://localhost:8765/replay_viewer.html](http://localhost:8765/replay_viewer.html).
|
|
||||||
|
|
||||||
### Options
|
### Options
|
||||||
|
|
||||||
| Flag | Default | Description |
|
| Flag | Default | Description |
|
||||||
@@ -131,64 +170,53 @@ This extracts the trajectory from episode 0 of the default dataset, starts a loc
|
|||||||
| `--port` | `8765` | HTTP server port |
|
| `--port` | `8765` | HTTP server port |
|
||||||
| `--force` | off | Re-extract trajectory even if cached |
|
| `--force` | off | Re-extract trajectory even if cached |
|
||||||
|
|
||||||
Example with a different dataset and episode:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
python examples/umi_pi0_relative_ee/replay.py \
|
|
||||||
--repo-id myuser/my-dataset \
|
|
||||||
--episode 3 \
|
|
||||||
--port 8766
|
|
||||||
```
|
|
||||||
|
|
||||||
### Viewer controls
|
### Viewer controls
|
||||||
|
|
||||||
The panel in the top-left corner shows live EE coordinates (x, y, z, ax, ay, az) and gripper state for the current frame. Below that are transport controls:
|
The panel in the top-left corner shows live EE coordinates and gripper state. Transport controls:
|
||||||
|
|
||||||
- **Play / Pause** — toggle automatic playback.
|
- **Play / Pause** — toggle automatic playback.
|
||||||
- **Step buttons** (◀ ▶) — advance or rewind one frame at a time.
|
- **Step buttons** (◀ ▶) — advance or rewind one frame.
|
||||||
- **Reset** (⟳) — jump back to frame 0.
|
- **Reset** (⟳) — jump to frame 0.
|
||||||
- **Scrubber** — drag to seek to any frame.
|
- **Scrubber** — drag to seek.
|
||||||
- **Speed selector** — 0.25×, 0.5×, 1×, 2×, or 4× playback speed.
|
- **Speed selector** — 0.25× to 4× playback speed.
|
||||||
|
|
||||||
The 3D scene uses orbit controls — click and drag to rotate, scroll to zoom, right-click drag to pan.
|
|
||||||
|
|
||||||
### Color legend
|
### Color legend
|
||||||
|
|
||||||
| Color | Meaning |
|
| Color | Meaning |
|
||||||
| ------------------ | --------------------------------------------- |
|
| ------------------ | --------------------------------------------- |
|
||||||
| Red sphere | Current EE position |
|
| Red sphere | Current EE position |
|
||||||
| Yellow trail | Past trajectory (frames already visited) |
|
| Yellow trail | Past trajectory |
|
||||||
| Dark trail | Future trajectory (frames ahead) |
|
| Dark trail | Future trajectory |
|
||||||
| Orange ring + axes | URDF `ee_target` frame (zero-joint reference) |
|
| Orange ring + axes | URDF `ee_target` frame (zero-joint reference) |
|
||||||
|
|
||||||
The trajectory is automatically re-centered so that frame 0 aligns with the robot's `openarm_right_ee_target` link in the zero-joint pose.
|
|
||||||
|
|
||||||
## How the Pieces Fit Together
|
## How the Pieces Fit Together
|
||||||
|
|
||||||
```
|
```
|
||||||
Training (full UMI mode: derive_state_from_action=true):
|
Training (derive_state_from_action=true):
|
||||||
DataLoader (action: B,51,D)
|
DataLoader loads action: [B, 31, 8] (chunk_size=30 + 1 leading)
|
||||||
→ DeriveStateFromActionStep (state = action[:,:2,:], action = action[:,1:,:])
|
→ DeriveStateFromActionStep
|
||||||
→ RelativeActionsProcessorStep (action -= state[:,-1,:])
|
state = action[:, :2, :] → [B, 2, 8]
|
||||||
→ RelativeStateProcessorStep (state offsets from current, flatten → B,2*D)
|
action = action[:, 1:, :] → [B, 30, 8]
|
||||||
→ NormalizerProcessorStep → pi0 model
|
→ RelativeActionsProcessorStep (action -= state[:, -1, :])
|
||||||
|
→ RelativeStateProcessorStep (state offsets from current, flatten → [B, 16])
|
||||||
|
→ NormalizerProcessorStep → pi0 model
|
||||||
|
|
||||||
Inference:
|
Inference:
|
||||||
robot joints → FK → observation.state (absolute EE)
|
arm joints → FK → observation.state [8D: x,y,z,ax,ay,az,prox,dist]
|
||||||
↓
|
↓
|
||||||
DeriveStateFromActionStep (no-op)
|
DeriveStateFromActionStep (no-op)
|
||||||
↓
|
↓
|
||||||
RelativeActionsProcessorStep (caches state)
|
RelativeActionsProcessorStep (caches state)
|
||||||
↓
|
↓
|
||||||
RelativeStateProcessorStep (buffers prev, stacks, subtracts, flattens)
|
RelativeStateProcessorStep (buffers prev, stacks, subtracts, flattens)
|
||||||
↓
|
↓
|
||||||
NormalizerProcessorStep → pi0 model → relative action chunk
|
NormalizerProcessorStep → pi0 model → relative action chunk [30, 8]
|
||||||
↓
|
↓
|
||||||
UnnormalizerProcessorStep
|
UnnormalizerProcessorStep
|
||||||
↓
|
↓
|
||||||
AbsoluteActionsProcessorStep (+ cached state → absolute EE)
|
AbsoluteActionsProcessorStep (+ cached state → absolute EE)
|
||||||
↓
|
↓
|
||||||
IK → joint targets → robot
|
IK → joint targets → robot
|
||||||
```
|
```
|
||||||
|
|
||||||
## References
|
## References
|
||||||
|
|||||||
@@ -15,29 +15,32 @@
|
|||||||
# limitations under the License.
|
# limitations under the License.
|
||||||
|
|
||||||
"""
|
"""
|
||||||
Inference script for a pi0 model trained with **relative EE actions** on an OpenArm robot.
|
Inference script for a pi0 model trained with UMI-style relative EE actions
|
||||||
|
on an OpenArm robot (single right arm, one wrist camera).
|
||||||
|
|
||||||
Single right OpenArm follower with one wrist camera.
|
Training dataset layout:
|
||||||
|
observation.images.cam0 [3, 720, 960]
|
||||||
|
action [x, y, z, ax, ay, az, proximal, distal] (shape 8)
|
||||||
|
|
||||||
This uses the built-in ``DeriveStateFromActionStep`` (no-op at inference),
|
The model uses ``derive_state_from_action=true``, so observation.state is
|
||||||
``RelativeActionsProcessorStep``, ``AbsoluteActionsProcessorStep``, and
|
derived from the action column during training. At inference the state must
|
||||||
``RelativeStateProcessorStep`` that are already wired into pi0's processor
|
be provided by the robot — this script uses FK to compute the current EE
|
||||||
pipeline.
|
pose and gripper position, which it exposes as ``observation.state``.
|
||||||
|
|
||||||
The inference loop:
|
Pipeline:
|
||||||
1. Reads joint positions from the robot (7-DOF arm + gripper).
|
1. Read arm joints from robot → FK → observation.state [x,y,z,ax,ay,az,prox,dist]
|
||||||
2. Converts to EE pose via forward kinematics (FK).
|
2. Read camera image → observation.images.cam0
|
||||||
This produces ``observation.state`` with the current EE pose.
|
3. pi0 preprocessor (loaded from checkpoint):
|
||||||
3. The pi0 preprocessor:
|
- DeriveStateFromActionStep: no-op at inference (state from robot)
|
||||||
a) ``DeriveStateFromActionStep`` — no-op (state comes from robot).
|
- RelativeActionsProcessorStep: caches current state
|
||||||
b) ``RelativeActionsProcessorStep`` caches the raw state.
|
- RelativeStateProcessorStep: buffers prev state, stacks [prev,cur],
|
||||||
c) ``RelativeStateProcessorStep`` buffers prev state, stacks, subtracts.
|
subtracts current → velocity info, flattens
|
||||||
d) ``NormalizerProcessorStep`` normalizes state and actions.
|
- NormalizerProcessorStep: normalizes
|
||||||
4. pi0 predicts relative action chunk.
|
4. pi0 predicts relative action chunk (30 steps)
|
||||||
5. The pi0 postprocessor:
|
5. pi0 postprocessor: unnormalize, add cached state → absolute EE
|
||||||
a) ``UnnormalizerProcessorStep`` unnormalizes.
|
6. IK: absolute EE [x,y,z,ax,ay,az] → arm joint targets
|
||||||
b) ``AbsoluteActionsProcessorStep`` adds cached state → absolute EE.
|
7. Gripper [proximal, distal] → gripper motor targets
|
||||||
6. IK converts absolute EE → joint targets → robot.
|
8. Send to robot
|
||||||
|
|
||||||
Usage:
|
Usage:
|
||||||
python evaluate.py
|
python evaluate.py
|
||||||
@@ -45,57 +48,177 @@ Usage:
|
|||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
from scipy.spatial.transform import Rotation
|
||||||
|
|
||||||
from lerobot.cameras.opencv.configuration_opencv import OpenCVCameraConfig
|
from lerobot.cameras.opencv.configuration_opencv import OpenCVCameraConfig
|
||||||
from lerobot.configs.types import FeatureType, PolicyFeature
|
|
||||||
from lerobot.datasets.feature_utils import combine_feature_dicts
|
|
||||||
from lerobot.datasets.lerobot_dataset import LeRobotDataset
|
from lerobot.datasets.lerobot_dataset import LeRobotDataset
|
||||||
from lerobot.datasets.pipeline_features import aggregate_pipeline_dataset_features, create_initial_features
|
|
||||||
from lerobot.model.kinematics import RobotKinematics
|
from lerobot.model.kinematics import RobotKinematics
|
||||||
from lerobot.policies.factory import make_pre_post_processors
|
from lerobot.policies.factory import make_pre_post_processors
|
||||||
from lerobot.policies.pi0.modeling_pi0 import PI0Policy
|
from lerobot.policies.pi0.modeling_pi0 import PI0Policy
|
||||||
from lerobot.processor import (
|
from lerobot.processor import RelativeStateProcessorStep
|
||||||
RelativeStateProcessorStep,
|
|
||||||
RobotProcessorPipeline,
|
|
||||||
make_default_teleop_action_processor,
|
|
||||||
)
|
|
||||||
from lerobot.processor.converters import (
|
|
||||||
observation_to_transition,
|
|
||||||
robot_action_observation_to_transition,
|
|
||||||
transition_to_observation,
|
|
||||||
transition_to_robot_action,
|
|
||||||
)
|
|
||||||
from lerobot.robots.openarm_follower import OpenArmFollower, OpenArmFollowerConfig
|
from lerobot.robots.openarm_follower import OpenArmFollower, OpenArmFollowerConfig
|
||||||
from lerobot.robots.so_follower.robot_kinematic_processor import (
|
|
||||||
ForwardKinematicsJointsToEE,
|
|
||||||
InverseKinematicsEEToJoints,
|
|
||||||
)
|
|
||||||
from lerobot.scripts.lerobot_record import record_loop
|
from lerobot.scripts.lerobot_record import record_loop
|
||||||
from lerobot.types import RobotAction, RobotObservation
|
from lerobot.types import RobotAction, RobotObservation
|
||||||
from lerobot.utils.control_utils import init_keyboard_listener
|
from lerobot.utils.control_utils import init_keyboard_listener
|
||||||
from lerobot.utils.utils import log_say
|
from lerobot.utils.utils import log_say
|
||||||
from lerobot.utils.visualization_utils import init_rerun
|
from lerobot.utils.visualization_utils import init_rerun
|
||||||
|
|
||||||
FPS = 30
|
# ---------------------------------------------------------------------------
|
||||||
|
# Configuration — adapt these to your setup
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
FPS = 46
|
||||||
EPISODE_TIME_SEC = 60
|
EPISODE_TIME_SEC = 60
|
||||||
TASK_DESCRIPTION = "red cube"
|
TASK_DESCRIPTION = "red cube"
|
||||||
|
|
||||||
HF_MODEL_ID = "pepijn223/grabette-umi-pi0"
|
HF_MODEL_ID = "pepijn223/grabette-umi-pi0"
|
||||||
|
|
||||||
# Latency compensation: skip this many steps from the start of each predicted
|
# Latency compensation: skip this many predicted action steps to account for
|
||||||
# action chunk. Formula: ceil(total_latency_ms / (1000 / FPS)).
|
# camera + inference + execution latency. Formula: ceil(total_ms / (1000/FPS)).
|
||||||
# E.g. at 10Hz with ~200ms total system latency: ceil(200 / 100) = 2.
|
# At 46 FPS (~22ms/step) with ~150ms total latency: ceil(150/22) ≈ 7.
|
||||||
|
# Start with 0 for a safe first test, then increase to match measured latency.
|
||||||
LATENCY_SKIP_STEPS = 0
|
LATENCY_SKIP_STEPS = 0
|
||||||
|
|
||||||
# EE feature keys produced by ForwardKinematicsJointsToEE (arm pose only).
|
|
||||||
# Gripper joints use absolute position control, not EE-relative.
|
|
||||||
EE_KEYS = ["x", "y", "z", "wx", "wy", "wz"]
|
|
||||||
|
|
||||||
URDF_PATH = "src/lerobot/robots/openarm_follower/urdf/openarm_bimanual_pybullet.urdf"
|
URDF_PATH = "src/lerobot/robots/openarm_follower/urdf/openarm_bimanual_pybullet.urdf"
|
||||||
URDF_EE_FRAME = "openarm_right_link7"
|
URDF_EE_FRAME = "openarm_right_ee_target"
|
||||||
|
|
||||||
|
IK_POSITION_WEIGHT = 1.0
|
||||||
|
IK_ORIENTATION_WEIGHT = 1.0
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Dataset features for inference
|
||||||
|
#
|
||||||
|
# The training dataset has only observation.images.cam0 and action.
|
||||||
|
# observation.state is derived from action during training
|
||||||
|
# (derive_state_from_action=true) but must be supplied by the robot at
|
||||||
|
# inference. We define it here so build_dataset_frame can map FK output
|
||||||
|
# to the right feature.
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
DATASET_FEATURES: dict = {
|
||||||
|
"observation.state": {
|
||||||
|
"dtype": "float32",
|
||||||
|
"shape": [8],
|
||||||
|
"names": ["x", "y", "z", "ax", "ay", "az", "proximal", "distal"],
|
||||||
|
},
|
||||||
|
"observation.images.cam0": {
|
||||||
|
"dtype": "video",
|
||||||
|
"shape": [3, 720, 960],
|
||||||
|
"names": ["channels", "height", "width"],
|
||||||
|
"info": {
|
||||||
|
"video.height": 720,
|
||||||
|
"video.width": 960,
|
||||||
|
"video.codec": "h264",
|
||||||
|
"video.pix_fmt": "yuv420p",
|
||||||
|
"video.is_depth_map": False,
|
||||||
|
"video.fps": FPS,
|
||||||
|
"video.channels": 3,
|
||||||
|
"has_audio": False,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
"action": {
|
||||||
|
"dtype": "float32",
|
||||||
|
"shape": [8],
|
||||||
|
"names": ["x", "y", "z", "ax", "ay", "az", "proximal", "distal"],
|
||||||
|
},
|
||||||
|
"timestamp": {"dtype": "float32", "shape": [1], "names": None},
|
||||||
|
"frame_index": {"dtype": "int64", "shape": [1], "names": None},
|
||||||
|
"episode_index": {"dtype": "int64", "shape": [1], "names": None},
|
||||||
|
"index": {"dtype": "int64", "shape": [1], "names": None},
|
||||||
|
"task_index": {"dtype": "int64", "shape": [1], "names": None},
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# FK / IK callables
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
class JointsToEE:
|
||||||
|
"""FK: raw robot observation → flat dict matching observation.state names.
|
||||||
|
|
||||||
|
Arm joint positions → EE pose [x,y,z,ax,ay,az] via forward kinematics.
|
||||||
|
Gripper motor positions → [proximal, distal].
|
||||||
|
Camera images pass through unchanged.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, kinematics: RobotKinematics, arm_motor_names: list[str]):
|
||||||
|
self.kin = kinematics
|
||||||
|
self.arm = arm_motor_names
|
||||||
|
|
||||||
|
def __call__(self, obs: RobotObservation) -> RobotObservation:
|
||||||
|
q = np.array([float(obs[f"{m}.pos"]) for m in self.arm])
|
||||||
|
t = self.kin.forward_kinematics(q)
|
||||||
|
rot = Rotation.from_matrix(t[:3, :3]).as_rotvec()
|
||||||
|
|
||||||
|
out: dict = {
|
||||||
|
"x": float(t[0, 3]),
|
||||||
|
"y": float(t[1, 3]),
|
||||||
|
"z": float(t[2, 3]),
|
||||||
|
"ax": float(rot[0]),
|
||||||
|
"ay": float(rot[1]),
|
||||||
|
"az": float(rot[2]),
|
||||||
|
"proximal": float(obs["proximal.pos"]),
|
||||||
|
"distal": float(obs["distal.pos"]),
|
||||||
|
}
|
||||||
|
for k, v in obs.items():
|
||||||
|
if not k.endswith((".pos", ".vel", ".torque")):
|
||||||
|
out[k] = v
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
class EEToJoints:
|
||||||
|
"""IK: policy action dict → motor position dict for the robot.
|
||||||
|
|
||||||
|
Reads [x,y,z,ax,ay,az] from the action, runs IK for arm joint targets.
|
||||||
|
Passes [proximal, distal] as direct gripper position commands.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
kinematics: RobotKinematics,
|
||||||
|
arm_motor_names: list[str],
|
||||||
|
position_weight: float = 1.0,
|
||||||
|
orientation_weight: float = 1.0,
|
||||||
|
):
|
||||||
|
self.kin = kinematics
|
||||||
|
self.arm = arm_motor_names
|
||||||
|
self.pw = position_weight
|
||||||
|
self.ow = orientation_weight
|
||||||
|
self.q_curr: np.ndarray | None = None
|
||||||
|
|
||||||
|
def __call__(self, args: tuple[RobotAction, RobotObservation]) -> RobotAction:
|
||||||
|
action, obs = args
|
||||||
|
|
||||||
|
q_raw = np.array([float(obs[f"{m}.pos"]) for m in self.arm])
|
||||||
|
if self.q_curr is None:
|
||||||
|
self.q_curr = q_raw
|
||||||
|
|
||||||
|
t_des = np.eye(4)
|
||||||
|
t_des[:3, :3] = Rotation.from_rotvec([action["ax"], action["ay"], action["az"]]).as_matrix()
|
||||||
|
t_des[:3, 3] = [action["x"], action["y"], action["z"]]
|
||||||
|
|
||||||
|
q_target = self.kin.inverse_kinematics(
|
||||||
|
self.q_curr, t_des, position_weight=self.pw, orientation_weight=self.ow
|
||||||
|
)
|
||||||
|
self.q_curr = q_target
|
||||||
|
|
||||||
|
out: dict = {f"{m}.pos": float(q_target[i]) for i, m in enumerate(self.arm)}
|
||||||
|
out["proximal.pos"] = float(action["proximal"])
|
||||||
|
out["distal.pos"] = float(action["distal"])
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Main
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def main():
|
def main():
|
||||||
camera_config = {"cam0": OpenCVCameraConfig(index_or_path=0, width=960, height=720, fps=FPS)}
|
camera_config = {
|
||||||
|
"cam0": OpenCVCameraConfig(index_or_path=0, width=960, height=720, fps=FPS),
|
||||||
|
}
|
||||||
robot_config = OpenArmFollowerConfig(
|
robot_config = OpenArmFollowerConfig(
|
||||||
port="can0",
|
port="can0",
|
||||||
id="right_openarm",
|
id="right_openarm",
|
||||||
@@ -110,79 +233,20 @@ def main():
|
|||||||
policy.config.latency_skip_steps = LATENCY_SKIP_STEPS
|
policy.config.latency_skip_steps = LATENCY_SKIP_STEPS
|
||||||
|
|
||||||
arm_motor_names = list(robot.bus.motors.keys())
|
arm_motor_names = list(robot.bus.motors.keys())
|
||||||
gripper_motor_names = list(robot.gripper_bus.motors.keys())
|
|
||||||
|
|
||||||
kinematics_solver = RobotKinematics(
|
kinematics = RobotKinematics(
|
||||||
urdf_path=URDF_PATH,
|
urdf_path=URDF_PATH,
|
||||||
target_frame_name=URDF_EE_FRAME,
|
target_frame_name=URDF_EE_FRAME,
|
||||||
joint_names=arm_motor_names,
|
joint_names=arm_motor_names,
|
||||||
)
|
)
|
||||||
|
|
||||||
# The policy starts from the robot's current EE pose (via FK below).
|
fk = JointsToEE(kinematics, arm_motor_names)
|
||||||
# Relative actions are predicted as deltas from that pose, so no manual
|
ik = EEToJoints(kinematics, arm_motor_names, IK_POSITION_WEIGHT, IK_ORIENTATION_WEIGHT)
|
||||||
# re-centering is needed — the starting point is always the live EE tip.
|
|
||||||
|
|
||||||
# FK: joint observation → EE observation (produces observation.state).
|
|
||||||
# gripper_names=[] means proximal/distal pass through as absolute positions.
|
|
||||||
robot_joints_to_ee_processor = RobotProcessorPipeline[RobotObservation, RobotObservation](
|
|
||||||
steps=[
|
|
||||||
ForwardKinematicsJointsToEE(
|
|
||||||
kinematics=kinematics_solver,
|
|
||||||
motor_names=arm_motor_names,
|
|
||||||
gripper_names=[],
|
|
||||||
)
|
|
||||||
],
|
|
||||||
to_transition=observation_to_transition,
|
|
||||||
to_output=transition_to_observation,
|
|
||||||
)
|
|
||||||
|
|
||||||
# IK: EE action → joint targets. Gripper actions are absolute and pass through.
|
|
||||||
robot_ee_to_joints_processor = RobotProcessorPipeline[tuple[RobotAction, RobotObservation], RobotAction](
|
|
||||||
steps=[
|
|
||||||
InverseKinematicsEEToJoints(
|
|
||||||
kinematics=kinematics_solver,
|
|
||||||
motor_names=arm_motor_names,
|
|
||||||
gripper_names=[],
|
|
||||||
initial_guess_current_joints=True,
|
|
||||||
),
|
|
||||||
],
|
|
||||||
to_transition=robot_action_observation_to_transition,
|
|
||||||
to_output=transition_to_robot_action,
|
|
||||||
)
|
|
||||||
|
|
||||||
# OpenArm observations include .vel and .torque per motor; the EE policy
|
|
||||||
# pipeline only needs .pos (converted to EE by FK) and camera features.
|
|
||||||
obs_features = {
|
|
||||||
k: v
|
|
||||||
for k, v in robot.observation_features.items()
|
|
||||||
if not (k.endswith(".vel") or k.endswith(".torque"))
|
|
||||||
}
|
|
||||||
|
|
||||||
# A dataset object is needed for its .features and .meta.stats even when
|
|
||||||
# not recording — record_loop uses them for building observation/action frames.
|
|
||||||
dataset = LeRobotDataset.create(
|
dataset = LeRobotDataset.create(
|
||||||
repo_id="tmp/openarm_eval_scratch",
|
repo_id="tmp/openarm_eval_scratch",
|
||||||
fps=FPS,
|
fps=FPS,
|
||||||
features=combine_feature_dicts(
|
features=DATASET_FEATURES,
|
||||||
aggregate_pipeline_dataset_features(
|
|
||||||
pipeline=robot_joints_to_ee_processor,
|
|
||||||
initial_features=create_initial_features(observation=obs_features),
|
|
||||||
use_videos=True,
|
|
||||||
),
|
|
||||||
aggregate_pipeline_dataset_features(
|
|
||||||
pipeline=make_default_teleop_action_processor(),
|
|
||||||
initial_features=create_initial_features(
|
|
||||||
action={
|
|
||||||
**{f"ee.{k}": PolicyFeature(type=FeatureType.ACTION, shape=(1,)) for k in EE_KEYS},
|
|
||||||
**{
|
|
||||||
f"{g}.pos": PolicyFeature(type=FeatureType.ACTION, shape=(1,))
|
|
||||||
for g in gripper_motor_names
|
|
||||||
},
|
|
||||||
}
|
|
||||||
),
|
|
||||||
use_videos=True,
|
|
||||||
),
|
|
||||||
),
|
|
||||||
robot_type=robot.name,
|
robot_type=robot.name,
|
||||||
use_videos=True,
|
use_videos=True,
|
||||||
image_writer_threads=4,
|
image_writer_threads=4,
|
||||||
@@ -195,7 +259,7 @@ def main():
|
|||||||
preprocessor_overrides={"device_processor": {"device": str(policy.config.device)}},
|
preprocessor_overrides={"device_processor": {"device": str(policy.config.device)}},
|
||||||
)
|
)
|
||||||
|
|
||||||
_relative_state_steps = [s for s in preprocessor.steps if isinstance(s, RelativeStateProcessorStep)]
|
relative_state_steps = [s for s in preprocessor.steps if isinstance(s, RelativeStateProcessorStep)]
|
||||||
|
|
||||||
robot.connect()
|
robot.connect()
|
||||||
|
|
||||||
@@ -207,7 +271,7 @@ def main():
|
|||||||
raise ValueError("Robot is not connected!")
|
raise ValueError("Robot is not connected!")
|
||||||
|
|
||||||
log_say("Starting policy execution")
|
log_say("Starting policy execution")
|
||||||
for step in _relative_state_steps:
|
for step in relative_state_steps:
|
||||||
step.reset()
|
step.reset()
|
||||||
|
|
||||||
record_loop(
|
record_loop(
|
||||||
@@ -221,9 +285,8 @@ def main():
|
|||||||
control_time_s=EPISODE_TIME_SEC,
|
control_time_s=EPISODE_TIME_SEC,
|
||||||
single_task=TASK_DESCRIPTION,
|
single_task=TASK_DESCRIPTION,
|
||||||
display_data=True,
|
display_data=True,
|
||||||
teleop_action_processor=make_default_teleop_action_processor(),
|
robot_action_processor=ik,
|
||||||
robot_action_processor=robot_ee_to_joints_processor,
|
robot_observation_processor=fk,
|
||||||
robot_observation_processor=robot_joints_to_ee_processor,
|
|
||||||
)
|
)
|
||||||
finally:
|
finally:
|
||||||
robot.disconnect()
|
robot.disconnect()
|
||||||
|
|||||||
Reference in New Issue
Block a user