Files
lerobot/docs/source/_toctree.yml
T
Pepijn 5adad11128 feat(sim): VLABench benchmark integration (#3396)
feat(sim): add VLABench benchmark integration
Add VLABench as a new simulation benchmark in LeRobot, following the existing LIBERO and MetaWorld patterns.
This PR wires VLABench end-to-end across environment integration, Docker setup, CI smoke evaluation, and documentation. It also fixes a number of upstream packaging and runtime issues required to make VLABench usable and reproducible in CI.
What’s included
Benchmark integration
Add VLABench as a new simulation benchmark.
Expose supported VLABench tasks through the LeRobot env interface.
Follow the established LIBERO / MetaWorld factory patterns.
Preserve lazy async-env metadata so env.unwrapped.metadata["render_fps"] continues to work.
CI smoke evaluation
Add a VLABench smoke-eval job using lerobot/smolvla_vlabench.
Use the correct rename_map for the 3-camera dataset layout.
Expand smoke coverage from 1 to 10 primitive tasks.
Extract task descriptions after eval so metrics artifacts include per-task labels.
Skip Docker Hub login when secrets are unavailable (e.g. fork PRs).
Docker / install fixes
Install VLABench from GitHub rather than PyPI.
Use uv pip, not pip, in the base image.
Fail loudly on install errors instead of masking them.
Clone VLABench into the non-root user’s home directory.
Use shallow editable installs for VLABench and rrt-algorithms to work around missing __init__.py issues.
Pin upstream clones to exact commit SHAs for reproducibility.
Add undeclared runtime dependencies required by VLABench (open3d, colorlog, scikit-learn, openai).
Unpin open3d so Python 3.12 wheels resolve.
Assets
Support downloading VLABench assets from a Hugging Face Hub mirror via VLABENCH_ASSETS_REPO.
Keep Google Drive download support as fallback.
Install huggingface_hub[hf_xet] so Xet-backed assets download correctly.
Validate required mesh/XML asset subtrees at build time.
Patch VLABench constants to tolerate missing asset directories at import time.
Runtime / env correctness
Import VLABench robots and tasks explicitly so decorator-based registry population happens.
Resize and normalize camera observations so they always match the declared (H, W, 3) uint8 observation space.
Reinstall LeRobot editably inside the image so the new env code is actually used.
Coerce agent_pos / ee_state to the expected shape.
Pad actions when needed to match data.ctrl.
Replace zero-padding fallback with proper dm_control IK for 7D end-effector actions.
Refetch dm_control physics on each step instead of caching weakrefs.
Retry unstable resets with reseeding and handle PhysicsError gracefully at step time.
Dataset / policy alignment
Align VLABench observations and actions with Hugging Face dataset conventions used by lerobot/vlabench_unified:
convert EE position between world frame and robot-base frame at the env boundary,
expose / consume Euler XYZ instead of raw quaternion layout,
align gripper semantics with dataset convention (1 = open, 0 = closed).
This fixes policy/env mismatches that previously caused incorrect IK targets and unstable behavior at evaluation time.
Docs
Add a full docs/source/vlabench.mdx page aligned with the standard benchmark template.
Document task selection forms (single task, comma list, suite shortcut).
Document installation, evaluation, training, and result reproduction.
Point examples at lerobot/smolvla_vlabench.
Add a benchmark banner image.
Remove outdated / misleading references to upstream evaluation tracks.
Document manual install flow instead of a broken vlabench extra.
Packaging cleanup
Remove the unresolvable vlabench extra from pyproject.toml.
Remove the no-op VLABench processor step.
Remove the obsolete env unit test that only covered the dropped gripper remap helper.
Apply formatting / logging / style cleanup from review feedback.
Why this is needed
VLABench is not currently consumable as a normal Python dependency and requires several upstream workarounds:
no PyPI release,
missing package declarations,
undeclared runtime deps,
SSH-only submodule references,
asset downloads outside normal package install flow,
registry population that depends on import side effects,
env outputs that do not always match declared observation shapes,
task resets that can diverge under some random layouts.
This PR makes the benchmark usable in LeRobot despite those constraints, and ensures CI runs are reproducible and informative.
If you want a much shorter squash commit message, I’d use this:
feat(sim): integrate VLABench benchmark with CI, Docker, and docs
Add VLABench as a new LeRobot simulation benchmark, following the existing LIBERO / MetaWorld patterns.
This includes:
LeRobot env integration and task exposure,
CI smoke eval with lerobot/smolvla_vlabench,
Docker install and asset-download fixes,
runtime fixes for registry loading, assets, camera obs, action handling, dm_control IK, and PhysicsError recovery,
alignment of obs/action semantics with HF VLABench datasets,
docs and packaging cleanup.
The PR also incorporates review feedback, improves reproducibility by pinning upstream commits, and makes VLABench usable in CI despite upstream packaging and asset-management issues.
2026-04-21 17:54:11 +02:00

159 lines
3.8 KiB
YAML

- sections:
- local: index
title: LeRobot
- local: installation
title: Installation
title: Get started
- sections:
- local: il_robots
title: Imitation Learning for Robots
- local: bring_your_own_policies
title: Bring Your Own Policies
- local: integrate_hardware
title: Bring Your Own Hardware
- local: hilserl
title: Train a Robot with RL
- local: hilserl_sim
title: Train RL in Simulation
- local: multi_gpu_training
title: Multi GPU training
- local: hil_data_collection
title: Human In the Loop Data Collection
- local: peft_training
title: Training with PEFT (e.g., LoRA)
- local: rename_map
title: Using Rename Map and Empty Cameras
title: "Tutorials"
- sections:
- local: lerobot-dataset-v3
title: Using LeRobotDataset
- local: porting_datasets_v3
title: Porting Large Datasets
- local: using_dataset_tools
title: Using the Dataset Tools
- local: dataset_subtask
title: Using Subtasks in the Dataset
- local: streaming_video_encoding
title: Streaming Video Encoding
title: "Datasets"
- sections:
- local: act
title: ACT
- local: smolvla
title: SmolVLA
- local: pi0
title: π₀ (Pi0)
- local: pi0fast
title: π₀-FAST (Pi0Fast)
- local: pi05
title: π₀.₅ (Pi05)
- local: groot
title: NVIDIA GR00T N1.5
- local: xvla
title: X-VLA
- local: multi_task_dit
title: Multitask DiT Policy
- local: walloss
title: WALL-OSS
title: "Policies"
- sections:
- local: sarm
title: SARM
title: "Reward Models"
- sections:
- local: async
title: Use Async Inference
- local: rtc
title: Real-Time Chunking (RTC)
title: "Inference"
- sections:
- local: envhub
title: Environments from the Hub
- local: envhub_leisaac
title: Control & Train Robots in Sim (LeIsaac)
title: "Simulation"
- sections:
- local: adding_benchmarks
title: Adding a New Benchmark
- local: libero
title: LIBERO
- local: libero_plus
title: LIBERO-plus
- local: metaworld
title: Meta-World
- local: robotwin
title: RoboTwin 2.0
- local: robocasa
title: RoboCasa365
- local: robocerebra
title: RoboCerebra
- local: robomme
title: RoboMME
- local: envhub_isaaclab_arena
title: NVIDIA IsaacLab Arena Environments
- local: vlabench
title: VLABench
title: "Benchmarks"
- sections:
- local: introduction_processors
title: Introduction to Robot Processors
- local: debug_processor_pipeline
title: Debug your processor pipeline
- local: implement_your_own_processor
title: Implement your own processor
- local: processors_robots_teleop
title: Processors for Robots and Teleoperators
- local: env_processor
title: Environment Processors
- local: action_representations
title: Action Representations
title: "Robot Processors"
- sections:
- local: so101
title: SO-101
- local: so100
title: SO-100
- local: koch
title: Koch v1.1
- local: lekiwi
title: LeKiwi
- local: hope_jr
title: Hope Jr
- local: reachy2
title: Reachy 2
- local: unitree_g1
title: Unitree G1
- local: earthrover_mini_plus
title: Earth Rover Mini
- local: omx
title: OMX
- local: openarm
title: OpenArm
title: "Robots"
- sections:
- local: phone_teleop
title: Phone
title: "Teleoperators"
- sections:
- local: cameras
title: Cameras
title: "Sensors"
- sections:
- local: torch_accelerators
title: PyTorch accelerators
title: "Supported Hardware"
- sections:
- local: notebooks
title: Notebooks
- local: feetech
title: Updating Feetech Firmware
- local: damiao
title: Damiao Motors and CAN Bus
title: "Resources"
- sections:
- local: contributing
title: Contribute to LeRobot
- local: backwardcomp
title: Backward compatibility
title: "About"