diff --git a/docs/source/evaluation.mdx b/docs/source/evaluation.mdx
index 6ad2e7ae6..d80140f39 100644
--- a/docs/source/evaluation.mdx
+++ b/docs/source/evaluation.mdx
@@ -98,15 +98,6 @@ For multi-task benchmarks (e.g. LIBERO with 10 tasks), environments are wrapped
 | Out of GPU memory | Decrease `batch_size`, or use `--policy.use_amp=true` |
 | Debugging / single-stepping | `--eval.batch_size=1 --eval.use_async_envs=false` |
 
-### Benchmarks
-
-Measured with `pepijn223/smolvla_libero` on `libero_spatial` (10 tasks, 100 episodes total):
-
-| Configuration | Wall time | GPU util |
-|---|---|---|
-| `batch_size=1` (sync) | ~400s | 0–8% |
-| `batch_size=10` (async) | ~189s | 0–99% |
-
 ## Output
 
 Results are written to `output_dir` (default: `outputs/eval//