mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-16 17:20:05 +00:00
fd00e38851
Single-script benchmark that trains and evaluates all 9 LeRobot policies on LIBERO. Each SLURM job self-publishes its result row to a HuggingFace leaderboard dataset — no separate collection step needed. Policies: pi0, pi0_fast, pi05, groot, act, diffusion, smolvla, xvla, multi_task_dit. 5000 steps, BS 256, with per-policy GPU allocation and default LR/scheduler presets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LeRobot LIBERO Training Benchmark
Train and evaluate all LeRobot policies on LIBERO and publish results as a HuggingFace leaderboard dataset.
Policies
| Policy | Base Model | GPUs | LR | Chunk | Notes |
|---|---|---|---|---|---|
| pi0 | lerobot/pi0_base | 8 | 2.5e-5 | 30 | PaliGemma + Gemma flow matching |
| pi0_fast | lerobot/pi0fast-base | 8 | 2.5e-5 | 30 | Requires tokenizer pre-training |
| pi05 | lerobot/pi05_base | 8 | 2.5e-5 | 30 | Quantiles normalization |
| groot | nvidia/GR00T-N1.5-3B | 8 | 1e-4 | 30 | bf16, diffusion head + projector only |
| act | From scratch | 1 | 1e-5 | 30 | ResNet-18, lightweight |
| diffusion | From scratch | 1 | 1e-4 | 32* | U-Net, horizon must be divisible by 8 |
| smolvla | lerobot/smolvla_base | 8 | 1e-4 | 30 | SmolVLM2-500M |
| xvla | lerobot/xvla-widowx | 4 | 1e-4 | 32* | Florence2 + CLIP |
| multi_task_dit | From scratch | 1 | 2e-5 | 32* | CLIP + DiT |
* These policies use horizon rather than chunk_size. Set to 32 (nearest valid value to 30).
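The choice of 32 can be reproduced with a small rounding helper. This is only a sketch: `nearest_valid_horizon` is a hypothetical name, not part of LeRobot, and it assumes the divisible-by-8 constraint stated for diffusion also explains the value used for xvla and multi_task_dit.

```python
def nearest_valid_horizon(target: int, multiple: int = 8) -> int:
    """Return the positive multiple of `multiple` closest to `target`.

    Ties round up, and the result is never 0.
    """
    lower = (target // multiple) * multiple
    upper = lower + multiple
    # Pick the lower candidate only if it is strictly closer and non-zero.
    if lower > 0 and (target - lower) < (upper - target):
        return lower
    return upper


# 30 lies between 24 (distance 6) and 32 (distance 2), so 32 is chosen.
print(nearest_valid_horizon(30))
```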
Training spec
- Steps: 5,000 per policy
- Batch size: 32 per GPU (effective batch size of 256 on 8 GPUs)
- Dataset: lerobot/libero (libero_spatial)
- Evaluation: 20 episodes after training
- LR: each policy's default optimizer/scheduler preset
- Results: each SLURM job publishes its own row to the HF leaderboard dataset automatically
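To make the batch-size arithmetic concrete, the sketch below multiplies the per-GPU batch size by the GPU counts from the Policies table. It assumes plain data parallelism with a fixed per-GPU batch of 32 and no gradient accumulation; the benchmark script may reach its effective batch size differently, so treat the numbers as illustrative.

```python
PER_GPU_BATCH = 32  # per-GPU batch size from the training spec
STEPS = 5_000       # training steps per policy

# GPU counts copied from the Policies table.
GPUS_PER_POLICY = {
    "pi0": 8, "pi0_fast": 8, "pi05": 8, "groot": 8, "act": 1,
    "diffusion": 1, "smolvla": 8, "xvla": 4, "multi_task_dit": 1,
}


def effective_batch_size(num_gpus: int, per_gpu_batch: int = PER_GPU_BATCH) -> int:
    # Assumption: one batch shard per GPU, no gradient accumulation.
    return num_gpus * per_gpu_batch


for policy, gpus in GPUS_PER_POLICY.items():
    batch = effective_batch_size(gpus)
    print(f"{policy:>15}: {gpus} GPU(s), batch {batch}, {batch * STEPS:,} samples seen")
```

Under these assumptions only the 8-GPU policies reach a batch of 256; the 4-GPU and single-GPU rows come out smaller.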
Quick start
1. Generate SLURM scripts
python benchmarks/libero/run_benchmark.py \
--output_dir /scratch/lerobot-benchmark \
--hub_org lerobot
2. Submit jobs
# If using pi0_fast, submit tokenizer first:
sbatch /scratch/lerobot-benchmark/slurm_scripts/00_tokenizer.sh
# Wait, then submit pi0_fast
# All other policies can run in parallel:
for script in /scratch/lerobot-benchmark/slurm_scripts/[0-9][0-9]_*.sh; do
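# Skip pi0_fast (needs the tokenizer) and the tokenizer job already submitted above
[[ "$script" == *pi0_fast* || "$script" == *tokenizer* ]] && continue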
sbatch "$script"
done
Each job publishes its result to lerobot/benchmark-libero on the Hub when it finishes.
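The submission order above (tokenizer first, pi0_fast after it, everything else in parallel) could also be handed to SLURM as a job dependency instead of waiting by hand. The following is a dry-run sketch, not part of the benchmark script: the helper names and all script file names other than `00_tokenizer.sh` are hypothetical. It relies only on the standard `sbatch` options `--parsable` (print the job ID) and `--dependency=afterok:<jobid>`.

```python
from pathlib import Path


def sbatch_command(script, after_ok=None):
    """Build an sbatch argv; `after_ok` is a job ID that must succeed first."""
    cmd = ["sbatch", "--parsable"]
    if after_ok is not None:
        cmd.append(f"--dependency=afterok:{after_ok}")
    cmd.append(str(script))
    return cmd


def split_scripts(scripts):
    """Partition scripts into (tokenizer, pi0_fast, everything else) by file name."""
    tokenizer = [s for s in scripts if "tokenizer" in Path(s).name]
    pi0_fast = [s for s in scripts if "pi0_fast" in Path(s).name]
    rest = [s for s in scripts if s not in tokenizer and s not in pi0_fast]
    return tokenizer, pi0_fast, rest


# Hypothetical file names, used only to illustrate the ordering.
scripts = ["00_tokenizer.sh", "01_pi0.sh", "02_pi0_fast.sh", "03_act.sh"]
tokenizer, pi0_fast, rest = split_scripts(scripts)

# Dry run: print the commands rather than executing them.
for script in tokenizer + rest:
    print(" ".join(sbatch_command(script)))
for script in pi0_fast:
    # "<TOKENIZER_JOB_ID>" stands for the ID printed by the tokenizer submission.
    print(" ".join(sbatch_command(script, after_ok="<TOKENIZER_JOB_ID>")))
```

To submit for real, each command list could be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`, with the job ID read from the first `;`-separated field of the output.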
Prerequisites
- SLURM cluster with CUDA GPUs (A100 80GB recommended for VLM policies)
- pip install lerobot[pi,smolvla,groot,xvla,multi_task_dit,libero] datasets
- huggingface-cli login