lerobot/benchmarks/libero/README.md
Pepijn fd00e38851 feat(benchmarks): add LIBERO training benchmark pipeline
Single-script benchmark that trains and evaluates all 9 LeRobot policies
on LIBERO. Each SLURM job self-publishes its result row to a HuggingFace
leaderboard dataset — no separate collection step needed.

Policies: pi0, pi0_fast, pi05, groot, act, diffusion, smolvla, xvla,
multi_task_dit. 5000 steps, BS 256, with per-policy GPU allocation and
default LR/scheduler presets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:01:49 +02:00


LeRobot LIBERO Training Benchmark

Train and evaluate all LeRobot policies on LIBERO and publish results as a HuggingFace leaderboard dataset.

Policies

| Policy | Base Model | GPUs | LR | Chunk | Notes |
|---|---|---|---|---|---|
| pi0 | lerobot/pi0_base | 8 | 2.5e-5 | 30 | PaliGemma + Gemma flow matching |
| pi0_fast | lerobot/pi0fast-base | 8 | 2.5e-5 | 30 | Requires tokenizer pre-training |
| pi05 | lerobot/pi05_base | 8 | 2.5e-5 | 30 | Quantiles normalization |
| groot | nvidia/GR00T-N1.5-3B | 8 | 1e-4 | 30 | bf16, diffusion head + projector only |
| act | From scratch | 1 | 1e-5 | 30 | ResNet-18, lightweight |
| diffusion | From scratch | 1 | 1e-4 | 32* | U-Net, horizon must be divisible by 8 |
| smolvla | lerobot/smolvla_base | 8 | 1e-4 | 30 | SmolVLM2-500M |
| xvla | lerobot/xvla-widowx | 4 | 1e-4 | 32* | Florence2 + CLIP |
| multi_task_dit | From scratch | 1 | 2e-5 | 32* | CLIP + DiT |

* These policies use horizon rather than chunk_size. The horizon is set to 32, the nearest valid value to 30.
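The choice of 32 can be reproduced with a small rounding helper. This is a minimal sketch, not code from the benchmark: `nearest_multiple` is a hypothetical name, and it assumes that "valid" means "divisible by 8" for all three starred policies, which the table states only for diffusion.

```python
def nearest_multiple(target: int, base: int = 8) -> int:
    """Return the positive multiple of `base` closest to `target` (ties round up)."""
    if base <= 0:
        raise ValueError("base must be positive")
    lower = (target // base) * base  # largest multiple <= target
    upper = lower + base             # smallest multiple > target
    # Prefer the upper multiple on ties, and never return 0.
    if lower == 0 or (upper - target) <= (target - lower):
        return upper
    return lower


# Assumption: a horizon is valid when it is divisible by 8.
print(nearest_multiple(30))  # 32
```

For a target of 30 the candidates are 24 and 32, and 32 is closer, which matches the value in the table.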

Training spec

  • Steps: 5,000 per policy
  • Batch size: 32 per GPU (effective batch size of 256 for the policies that run on 8 GPUs)
  • Dataset: lerobot/libero (libero_spatial)
  • Evaluation: 20 episodes after training
  • LR: each policy's default optimizer/scheduler preset
  • Results: each SLURM job publishes its own row to the HF leaderboard dataset automatically
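The relationship between the per-GPU batch size and the effective batch size can be checked with a few lines of arithmetic. The sketch below assumes plain data parallelism with no gradient accumulation, which this README does not confirm; the GPU counts are copied from the Policies table, and `effective_batch_size` is a hypothetical helper.

```python
PER_GPU_BATCH_SIZE = 32

# GPU allocation per policy, copied from the Policies table.
GPUS_PER_POLICY = {
    "pi0": 8,
    "pi0_fast": 8,
    "pi05": 8,
    "groot": 8,
    "act": 1,
    "diffusion": 1,
    "smolvla": 8,
    "xvla": 4,
    "multi_task_dit": 1,
}


def effective_batch_size(policy: str, per_gpu: int = PER_GPU_BATCH_SIZE) -> int:
    # Assumption: data parallelism only, no gradient accumulation.
    return per_gpu * GPUS_PER_POLICY[policy]


for name in GPUS_PER_POLICY:
    print(f"{name:15s} {effective_batch_size(name)}")
```

Under this assumption only the 8-GPU policies reach 256; xvla would train at 128 and the single-GPU policies at 32, unless the generated scripts compensate in a way not described here.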

Quick start

1. Generate SLURM scripts

python benchmarks/libero/run_benchmark.py \
    --output_dir /scratch/lerobot-benchmark \
    --hub_org lerobot

2. Submit jobs

# If using pi0_fast, submit the tokenizer job first:
sbatch /scratch/lerobot-benchmark/slurm_scripts/00_tokenizer.sh
# Wait for the tokenizer job to finish, then submit the pi0_fast script

# All other policies can run in parallel:
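for script in /scratch/lerobot-benchmark/slurm_scripts/[0-9][0-9]_*.sh; do
    # The glob also matches 00_tokenizer.sh, so skip it along with pi0_fast
    [[ "$script" == *tokenizer* || "$script" == *pi0_fast* ]] && continue
    sbatch "$script"
done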

Each job publishes its result to lerobot/benchmark-libero on the Hub when it finishes.
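The manual "wait, then submit" step for pi0_fast could instead be delegated to SLURM's job dependencies. The following is a sketch under stated assumptions, not part of the benchmark: `plan_submissions` and `submit_plan` are hypothetical helpers, the pi0_fast script is assumed to contain `pi0_fast` in its filename, and the tokenizer script is assumed to sort first (as `00_tokenizer.sh` does). It relies only on the standard `sbatch --parsable` and `--dependency=afterok:<jobid>` options.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def plan_submissions(script_dir: Path) -> List[Tuple[Path, Optional[Path]]]:
    """Return (script, prerequisite) pairs; prerequisite is None for independent jobs."""
    scripts = sorted(script_dir.glob("[0-9][0-9]_*.sh"))
    tokenizer = next((s for s in scripts if "tokenizer" in s.name), None)
    plan = []
    for script in scripts:
        # Assumption: only the pi0_fast job depends on the tokenizer job.
        needs_tokenizer = "pi0_fast" in script.name and tokenizer is not None
        plan.append((script, tokenizer if needs_tokenizer else None))
    return plan


def submit_plan(plan: List[Tuple[Path, Optional[Path]]]) -> Dict[Path, str]:
    """Submit every script with sbatch, chaining dependent jobs with afterok."""
    job_ids: Dict[Path, str] = {}
    for script, prerequisite in plan:
        cmd = ["sbatch", "--parsable"]
        if prerequisite is not None:
            # Start only after the prerequisite job has completed successfully.
            cmd.append(f"--dependency=afterok:{job_ids[prerequisite]}")
        cmd.append(str(script))
        out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
        # --parsable prints "jobid" or "jobid;cluster".
        job_ids[script] = out.strip().split(";")[0]
    return job_ids


# Example (requires a SLURM cluster):
# submit_plan(plan_submissions(Path("/scratch/lerobot-benchmark/slurm_scripts")))
```

With this approach every script is submitted once, and the scheduler holds the pi0_fast job until the tokenizer job exits successfully.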

Prerequisites

  • SLURM cluster with CUDA GPUs (A100 80GB recommended for VLM policies)
  • pip install "lerobot[pi,smolvla,groot,xvla,multi_task_dit,libero]" datasets
  • huggingface-cli login