mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-11 14:49:43 +00:00
a147fa4439
* feat(ci): add RoboCerebra benchmark eval job - Docker image with robosuite/libero deps for RoboCerebra eval - CI workflow: 1-episode eval with pepijn223/smolvla_robocerebra - Reuses libero env with rename_map + empty_cameras=3 * docs(robocerebra): add benchmark page and toctree entry Add a dedicated docs page for RoboCerebra that points at the canonical dataset lerobot/robocerebra_unified and shows how to run eval + fine-tune against it. Wire it into the Benchmarks section of the toctree so doc-builder picks it up. * ci: point benchmark eval checkpoints at the lerobot/ org mirrors pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in this branch (libero, metaworld, and the per-branch benchmark). The checkpoints were mirrored into the lerobot/ org and that's the canonical location going forward. * fix(robocerebra): drop alias extra + simplify docker image Two problems rolled up: 1. `uv sync --locked --extra test` was failing because pyproject.toml added a `robocerebra = ["lerobot[libero]"]` alias extra but uv.lock wasn't regenerated. Drop the alias — nothing in CI actually needs the extra name (the Dockerfile just installs what it needs directly), so this restores pyproject.toml and uv.lock to byte-exact origin/main. 2. Rebase docker/Dockerfile.benchmark.robocerebra on huggingface/lerobot-gpu:latest (same pattern as libero/metaworld/…). The nightly image already ships lerobot[all] which includes [libero], so the RoboCerebra image is essentially identical to the LIBERO one: fetch libero-assets, write ~/.libero/config.yaml, overlay source. 92 → 43 lines. Also repoint the CI workflow comment that referenced the removed extra. * ci: use dedicated lerobot/smolvla_robocerebra checkpoint for smoke eval Replace the generic pepijn223/smolvla_libero placeholder with the purpose-trained lerobot/smolvla_robocerebra model in the RoboCerebra CI smoke test. * fix(ci): align RoboCerebra eval with training pipeline Fixes 5 mismatches that caused 0% success rate: - env.type: robocerebra (unregistered) → libero - resolution: 360x360 (default) → 256x256 (matches dataset) - camera_name_mapping: map eye_in_hand → wrist_image (not image2) - empty_cameras: 3 → 1 (matches training) - add HF_USER_TOKEN guard on eval step * fix(ci): set env.fps=20 and explicit obs_type for RoboCerebra eval Match the dataset's 20 FPS (LiberoEnv defaults to 30) and make obs_type=pixels_agent_pos explicit for safety against future changes. * docs(robocerebra): align page with adding_benchmarks template Rework docs/source/robocerebra.mdx to follow the standard benchmark doc structure: intro + links + available tasks + installation + eval + recommended episodes + policy I/O + training + reproducing results. - Point everything at lerobot/smolvla_robocerebra (the released checkpoint), not the personal pepijn223 mirror. - Add the --env.fps=20 and --env.obs_type=pixels_agent_pos flags that CI actually uses, so copy-paste eval reproduces CI. - Split the "Training" block out of the recipe section into its own section with the feature table. - Add an explicit "Reproducing published results" section pointing at the CI smoke eval. * fix: integrate PR #3314 review feedback - ci(robocerebra): drop redundant hf auth login step (auth is already performed inside the eval step's container). - ci(robocerebra): add Docker Hub login before the image build to pick up the authenticated rate limit. - docs(robocerebra): align eval snippet with the CI command (observation size, camera_name_mapping, use_async_envs, device, empty_cameras=1). * fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs Port of #3416 onto this branch. * ci: gate Docker Hub login on secret availability * Update .github/workflows/benchmark_tests.yml Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * Update .github/workflows/benchmark_tests.yml Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
100 lines
4.4 KiB
Plaintext
100 lines
4.4 KiB
Plaintext
# RoboCerebra
|
||
|
||
[RoboCerebra](https://robocerebra-project.github.io/) is a long-horizon manipulation benchmark that evaluates **high-level reasoning, planning, and memory** in VLAs. Episodes chain multiple sub-goals with language-grounded intermediate instructions, built on top of LIBERO's simulator stack (MuJoCo + robosuite, Franka Panda 7-DOF).
|
||
|
||
- Paper: [RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation](https://arxiv.org/abs/2506.06677)
|
||
- Project website: [robocerebra-project.github.io](https://robocerebra-project.github.io/)
|
||
- Dataset: [`lerobot/robocerebra_unified`](https://huggingface.co/datasets/lerobot/robocerebra_unified) — LeRobot v3.0, 6,660 episodes / 571,116 frames at 20 fps, 1,728 language-grounded sub-tasks.
|
||
- Pretrained policy: [`lerobot/smolvla_robocerebra`](https://huggingface.co/lerobot/smolvla_robocerebra)
|
||
|
||
## Available tasks
|
||
|
||
RoboCerebra reuses LIBERO's simulator, so evaluation runs against the LIBERO `libero_10` long-horizon suite:
|
||
|
||
| Suite | CLI name | Tasks | Description |
|
||
| --------- | ----------- | ----- | ------------------------------------------------------------- |
|
||
| LIBERO-10 | `libero_10` | 10 | Long-horizon kitchen/living room tasks chaining 3–6 sub-goals |
|
||
|
||
Each RoboCerebra episode in the dataset is segmented into multiple sub-tasks with natural-language instructions, which the unified dataset exposes as independent supervision signals.
|
||
|
||
## Installation
|
||
|
||
RoboCerebra piggybacks on LIBERO, so the `libero` extra is all you need:
|
||
|
||
```bash
|
||
pip install -e ".[libero]"
|
||
```
|
||
|
||
<Tip>
|
||
RoboCerebra requires Linux (MuJoCo / robosuite). Set the rendering backend before training or evaluation:
|
||
|
||
```bash
|
||
export MUJOCO_GL=egl # for headless servers (HPC, cloud)
|
||
```
|
||
|
||
</Tip>
|
||
|
||
## Evaluation
|
||
|
||
RoboCerebra eval runs against LIBERO's `libero_10` suite with RoboCerebra's camera naming (`image` + `wrist_image`) and an extra empty-camera slot so a three-view-trained policy receives the expected input layout:
|
||
|
||
```bash
|
||
lerobot-eval \
|
||
--policy.path=lerobot/smolvla_robocerebra \
|
||
--env.type=libero \
|
||
--env.task=libero_10 \
|
||
--env.fps=20 \
|
||
--env.obs_type=pixels_agent_pos \
|
||
--env.observation_height=256 \
|
||
--env.observation_width=256 \
|
||
'--env.camera_name_mapping={"agentview_image": "image", "robot0_eye_in_hand_image": "wrist_image"}' \
|
||
--eval.batch_size=1 \
|
||
--eval.n_episodes=10 \
|
||
--eval.use_async_envs=false \
|
||
--policy.device=cuda \
|
||
'--rename_map={"observation.images.image": "observation.images.camera1", "observation.images.wrist_image": "observation.images.camera2"}' \
|
||
--policy.empty_cameras=1
|
||
```
|
||
|
||
### Recommended evaluation episodes
|
||
|
||
**10 episodes per task** across the `libero_10` suite (100 total) for reproducible benchmarking. Matches the protocol used in the RoboCerebra paper.
|
||
|
||
## Policy inputs and outputs
|
||
|
||
**Observations:**
|
||
|
||
- `observation.state` — 8-dim proprioceptive state (7 joint positions + gripper)
|
||
- `observation.images.image` — third-person view, 256×256 HWC uint8
|
||
- `observation.images.wrist_image` — wrist-mounted camera view, 256×256 HWC uint8
|
||
|
||
**Actions:**
|
||
|
||
- Continuous control in `Box(-1, 1, shape=(7,))` — end-effector delta (6D) + gripper (1D)
|
||
|
||
## Training
|
||
|
||
The unified dataset at [`lerobot/robocerebra_unified`](https://huggingface.co/datasets/lerobot/robocerebra_unified) exposes two RGB streams and language-grounded sub-task annotations:
|
||
|
||
| Feature | Shape | Description |
|
||
| -------------------------------- | ------------- | -------------------- |
|
||
| `observation.images.image` | (256, 256, 3) | Third-person view |
|
||
| `observation.images.wrist_image` | (256, 256, 3) | Wrist-mounted camera |
|
||
| `observation.state` | (8,) | Joint pos + gripper |
|
||
| `action` | (7,) | EEF delta + gripper |
|
||
|
||
Fine-tune a SmolVLA base on it:
|
||
|
||
```bash
|
||
lerobot-train \
|
||
--policy.path=lerobot/smolvla_base \
|
||
--dataset.repo_id=lerobot/robocerebra_unified \
|
||
--env.type=libero \
|
||
--env.task=libero_10 \
|
||
--output_dir=outputs/smolvla_robocerebra
|
||
```
|
||
|
||
## Reproducing published results
|
||
|
||
The released checkpoint [`lerobot/smolvla_robocerebra`](https://huggingface.co/lerobot/smolvla_robocerebra) was trained on `lerobot/robocerebra_unified` and evaluated with the command in the [Evaluation](#evaluation) section. CI runs the same command with `--eval.n_episodes=1` as a smoke test on every PR touching the benchmark.
|