The server now returns only n_action_steps actions from the predicted
chunk instead of the full chunk_size, enabling more frequent
re-planning when n_action_steps < chunk_size.
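A minimal sketch of the truncation, assuming the chunk is a per-step sequence of actions; the helper name and shapes are illustrative, not the actual server code:

```python
def select_actions(chunk, n_action_steps):
    """Return only the first n_action_steps actions from a predicted chunk.

    chunk: sequence of per-step actions, len == chunk_size (illustrative).
    """
    return chunk[:n_action_steps]

chunk = [[0.0] * 7 for _ in range(50)]   # chunk_size=50, 7-D actions
actions = select_actions(chunk, 10)      # client re-plans after 10 steps
```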
Made-with: Cursor
Each _InferenceServer now gets its own preprocessor/postprocessor
instances. HuggingFace tokenizers enforce Rust borrow rules at
runtime and raise RuntimeError when a shared tokenizer is used from
multiple threads, which happened when multiple servers ran
concurrently.
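The fix in sketch form: build per-server processor instances instead of sharing one. `Processor` and `InferenceServer` below are stand-ins for the real classes:

```python
class Processor:
    """Stand-in for a tokenizer-backed pre/postprocessor (hypothetical)."""
    def __init__(self):
        self.state = {}

class InferenceServer:
    def __init__(self):
        # Each server constructs its own processors, so concurrent
        # servers never touch the same underlying tokenizer object.
        self.preprocessor = Processor()
        self.postprocessor = Processor()

servers = [InferenceServer() for _ in range(4)]
ids = {id(s.preprocessor) for s in servers}  # all distinct instances
```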
Made-with: Cursor
New eval.runtime=multiprocess spawns local lerobot-eval-worker
subprocesses instead of Docker containers. Supports eval.policy_servers
for parallel inference. Works on SLURM clusters and anywhere Docker
is unavailable.
Usage: lerobot-eval --eval.runtime=multiprocess \
--eval.instance_count=8 --eval.policy_servers=4 --eval.port=50051
Made-with: Cursor
Add eval.policy_servers parameter (default 1) that spawns N independent
policy inference servers on consecutive ports. Containers are round-robin
assigned across servers, enabling parallel GPU inference for small models
like SmolVLA (~1.4GB each).
Usage: --eval.policy_servers=4 --eval.instance_count=20
→ 4 model copies on GPU, 20 containers distributed across them.
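The round-robin assignment can be sketched as follows; the helper name is illustrative, not the actual orchestration code:

```python
def assign_server_ports(base_port, n_servers, n_containers):
    """Round-robin container -> server port assignment (illustrative)."""
    ports = [base_port + i for i in range(n_servers)]
    return [ports[i % n_servers] for i in range(n_containers)]

# 4 servers on consecutive ports, 20 containers spread across them.
assignment = assign_server_ports(50051, 4, 20)
```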
Made-with: Cursor
The generic libero benchmark case was falling through to the wildcard
install path, which doesn't pre-create ~/.libero/config.yaml. This
caused an interactive input() prompt that crashes in Docker (EOFError).
Made-with: Cursor
- Add NVIDIA EGL/Vulkan vendor ICDs and graphics libs to both Dockerfiles
- Refactor LIBERO-plus asset download into a separate build step
- Fix KeyError in datasets/factory.py when stats dict is None or missing keys
Made-with: Cursor
The LIBERO-plus assets.zip has a deeply nested path
(inspire/hdd/project/.../assets) that didn't match the shallow glob.
Use recursive glob to find assets/scenes regardless of nesting depth.
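A sketch of the recursive lookup, demonstrated on a temporary directory mimicking the nested zip layout (paths illustrative):

```python
import tempfile
from pathlib import Path

def find_assets(root):
    """Locate an 'assets' directory at any nesting depth via rglob."""
    return sorted(p for p in Path(root).rglob("assets") if p.is_dir())

# Demo: the deeply nested layout from the LIBERO-plus zip.
with tempfile.TemporaryDirectory() as tmp:
    deep = Path(tmp) / "inspire" / "hdd" / "project" / "assets"
    deep.mkdir(parents=True)
    n_found = len(find_assets(tmp))  # a shallow glob would find 0
```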
Made-with: Cursor
The benchmark containers were missing the scene/texture/object assets
required by LIBERO-plus. Download them from HuggingFace Hub during the
Docker build so containers are self-contained and ready to run.
Made-with: Cursor
The upstream libero __init__.py calls input() when ~/.libero/config.yaml
is missing, which crashes in non-interactive Docker containers with
EOFError. Pre-create the config with default paths at build time using
importlib.util.find_spec to locate the module without triggering the
problematic import.
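The locate-without-import step in sketch form, demoed on a stdlib package since `libero` is only present at build time:

```python
import importlib.util
from pathlib import Path

def locate_package_dir(name):
    """Find an installed package's directory without importing it,
    avoiding side effects (like input()) in its __init__.py."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent

# At build time the real target would be 'libero'; 'json' stands in here.
pkg_dir = locate_package_dir("json")
```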
Made-with: Cursor
The generic Dockerfile.benchmark was using a plain `uv pip install ".[libero_plus]"`
which silently fails to make `libero` importable due to an upstream LIBERO-plus
packaging bug. Port the dedicated clone + .pth workaround from
Dockerfile.eval-libero-plus so `docker build --build-arg BENCHMARK=libero_plus`
produces working containers.
Also fix eval worker using nonexistent `parser.parse()` — use `draccus.parse()`.
Made-with: Cursor
robocasa/__init__.py hard-asserts numpy==2.2.5. When bundled with other
packages in one uv install command, uv silently skips the numpy pin
(same "already resolved" bug hit with libero_plus). Moving the pin to a
dedicated final RUN step guarantees it is applied last.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
robocasa/__init__.py asserts mujoco.__version__ == "3.3.1" and aborts
with an error if any other version is installed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
robocasa README explicitly says to use the master branch of
ARISE-Initiative/robosuite (no robocasa-specific branch exists).
Also install robocasa with --no-deps to bypass its lerobot==0.3.3
pin, and declare its actual runtime deps explicitly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Standard robosuite 1.4.x from PyPI doesn't include PandaOmron and
other robocasa-specific robots. robocasa requires the fork at
ARISE-Initiative/robosuite@robocasa_v1.4.1. Install both from source
with --no-deps; shared deps (easydict, scikit-image, scipy) installed
explicitly first.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
robocasa is not published to PyPI, so uv can't resolve it as a plain
package dep. Fix by installing its runtime deps explicitly and cloning
robocasa from GitHub with --no-deps (same pattern as libero_plus).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mani-skill==3.0.0b21 requires numpy<2.0.0 in addition to gymnasium==0.29.1,
both conflicting with lerobot's base requirements.
numpy 1.26.4 is runtime-compatible with lerobot's usage (no numpy 2.x-only
APIs are used in the eval worker or env wrappers).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mani-skill==3.0.0b21 (robomme dep) pins gymnasium==0.29.1, conflicting
with lerobot's gymnasium>=1.1.1. Use uv --override to force 0.29.1.
Both 0.29.x and 1.x use the same 5-tuple step() API (introduced in
gymnasium 0.26), so the eval worker and RoboMMEGymEnv wrapper are
fully compatible with the downgraded version.
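The shared 5-tuple contract, sketched with a dummy env (not the actual RoboMMEGymEnv) to show that the same unpacking works under both gymnasium versions:

```python
class DummyEnv:
    """Minimal env using the 5-tuple step() API of gymnasium >= 0.26."""
    def __init__(self):
        self._t = 0
    def reset(self):
        self._t = 0
        return {"obs": 0}, {}          # (observation, info)
    def step(self, action):
        self._t += 1
        terminated = self._t >= 3
        truncated = False
        return {"obs": self._t}, 1.0, terminated, truncated, {}

env = DummyEnv()
obs, info = env.reset()
# This unpacking is identical under gymnasium 0.29.x and 1.x.
obs, reward, terminated, truncated, info = env.step(0)
```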
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Temporary diagnostic to identify why uv sees robosuite as already
installed in the base venv despite it not being a base lerobot dep.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The libero @ git+...@main dep had empty install_requires, causing uv to
skip robosuite (and other deps) during resolution — they appeared
"already resolved" from a stale git dep cache even though not installed.
Fix: use lerobot[libero] as the dep source (hf-libero properly declares
all deps including robosuite via robomimic). The LIBERO-plus Python
module is installed from the git clone with --no-deps, so hf-libero's
declared deps are used but LIBERO-plus's environments override via .pth.
Also remove the duplicate egl_probe entry (the broken original) that
sat alongside hf-egl-probe.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without this, the server's local .venv gets copied into the image by
the final COPY . . step in Dockerfile.eval-base, overwriting the
freshly-created uv venv. uv then sees those packages as already
installed and skips them — but they may be missing or built for the
wrong environment, causing ModuleNotFoundError at runtime.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cmake 4.x removed backward compat with cmake_minimum_required < 3.5,
breaking egl-probe compilation. Setting CMAKE_POLICY_VERSION_MINIMUM=3.5
in the base image ENV re-enables it so robomimic's egl-probe builds.
Also adds --no-cache-base flag to build script so the base can be
force-rebuilt when Dockerfile.eval-base changes, and pins hf-egl-probe
in libero extras as the upstream-fixed fork of egl-probe.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pip's backtracking resolver hits 'resolution-too-deep' on complex
dependency graphs (robomme → mani-skill, libero_plus → robosuite/bddl).
uv resolves the same graphs in seconds without backtracking issues.
Also removes the now-redundant PATH= prefix since uv and python are
already on PATH via the base image ENV.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bddl imports the third-party `future` package at package init but
doesn't declare it as a dependency. This caused ModuleNotFoundError
inside the benchmark
Docker container when verifying the libero_plus install.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Builds lerobot-eval-base then each benchmark image (libero, libero_plus,
robomme, robocasa), runs the smoke tests, and optionally pushes to Docker Hub.
Usage:
bash docker/build_benchmark_images.sh # local only
bash docker/build_benchmark_images.sh --push --hub_org=<org> # push to Hub
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_orchestrate_multi_instance_eval spawns extra lerobot-eval processes that each
call run_eval_in_docker again, creating N^2 containers. For docker runtime,
instance_count directly controls how many env-worker containers are spawned by
run_eval_in_docker — no process-level orchestration is needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the missing lerobot.utils.hf_eval_results module imported by
lerobot_eval.py but never created. Provides:
- default_eval_date(): today's UTC date as ISO-8601
- build_eval_results_rows(): converts eval info dict → HF .eval_results rows
- upload_eval_results_yaml(): serialises rows and uploads to Hub model repo
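A sketch of the first two helpers; the row schema and keys below are illustrative, not the exact `.eval_results` format:

```python
from datetime import datetime, timezone

def default_eval_date():
    """Today's UTC date as an ISO-8601 string."""
    return datetime.now(timezone.utc).date().isoformat()

def build_eval_results_rows(eval_info, date):
    """Flatten a per-task eval info dict into row dicts (schema illustrative)."""
    return [
        {"task": task, "success_rate": metrics["pc_success"], "date": date}
        for task, metrics in eval_info.items()
    ]

rows = build_eval_results_rows(
    {"libero_spatial": {"pc_success": 87.5}}, default_eval_date()
)
```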
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add docker_runtime.py (host-side) and lerobot_eval_worker.py (container-side)
for --eval.runtime=docker. Policy loads once on the host GPU; Docker containers
run env-only workers that call back via HTTP for action chunks, maximising GPU
utilisation across parallel benchmark tasks.
- _InferenceServer: HTTP server wrapping predict_action_chunk with a single lock
- run_eval_in_docker: spawns instance_count containers, collects + merges per-task
JSON, writes eval_info.json compatible with _aggregate_eval_from_per_task
- lerobot-eval-worker CLI: make_env → shard tasks → run episodes → write JSON
- EvalDockerConfig: add port field (default 50051)
- pyproject.toml: add lerobot-eval-worker entry point
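The single-lock pattern at the heart of _InferenceServer, in sketch form (the class and method names below are stand-ins, without the HTTP layer):

```python
import threading

class InferenceServerCore:
    """One model, many worker threads: a single lock serialises inference."""
    def __init__(self):
        self._lock = threading.Lock()
        self.calls = 0

    def predict_chunk(self, obs):
        with self._lock:           # only one inference runs at a time
            self.calls += 1
            return [0.0] * 7       # placeholder action chunk

server = InferenceServerCore()
threads = [threading.Thread(target=server.predict_chunk, args=(None,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```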
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace docker-compose.benchmark.yml with per-image docker build/run
instructions. Each benchmark is built and tested independently via
--build-arg BENCHMARK=<name> on Dockerfile.benchmark.
Co-Authored-By: Claude <noreply@anthropic.com>
Add Dockerfile.benchmark (parameterized via ARG BENCHMARK), a
docker-compose.benchmark.yml with services for libero, libero_plus,
robomme, and robocasa, and a smoke_test_benchmark.sh that verifies
imports and CLI entry-points in each container.
Also add the missing `robocasa` optional dep group to pyproject.toml
(the docs already referenced `pip install ".[robocasa]"` but the group
was not defined).
Build a specific benchmark image:
docker build --build-arg BENCHMARK=robomme \
-f docker/Dockerfile.benchmark -t lerobot-benchmark-robomme .
Build all via compose:
docker compose -f docker/docker-compose.benchmark.yml build
Smoke-test inside a container:
docker compose -f docker/docker-compose.benchmark.yml run --rm robomme \
bash docker/smoke_test_benchmark.sh
Co-Authored-By: Claude <noreply@anthropic.com>
New CLI tool that fetches eval results from multiple Hub model repos
and produces a self-contained HTML leaderboard with sortable columns,
per-suite breakdowns, best-in-column highlighting, and filtering.
Made-with: Cursor
Saves eval_config.json locally and uploads it alongside results. The
model card now includes a collapsible "Eval configuration" section
showing the full config JSON used for the evaluation run.
Made-with: Cursor
Adds a push_to_hub flag to lerobot-eval that uploads eval_info.json,
rollout videos, and appends an evaluation results table to the model
card on Hugging Face. Also declares missing LIBERO-plus runtime deps
in pyproject.toml and adds an asset validation check for libero_plus.
Made-with: Cursor
Add LiberoPlusEnv config (subclass of LiberoEnv), register libero_plus
env type in factory, add import fallbacks for LIBERO-plus package
structure, and add libero_plus optional dependency group in pyproject.toml.
Made-with: Cursor
Integrates 5 selected RoboCasa kitchen tasks (3 short + 2 long) as a
LeRobot benchmark environment, following the same pattern as Libero.
Selected tasks:
Short: PickPlaceCounterToCabinet, PrepareToast, CoffeeSetupMug
Long: PrepareCoffee, RestockPantry
Changes:
- envs/robocasa.py: RoboCasaEnv wrapper with flat 12D Box action space,
3-camera pixel obs, and 16D proprioceptive state
- envs/configs.py: RoboCasaEnv config with features_map
- envs/factory.py: wire robocasa into make_env + make_env_pre_post_processors
- processor/env_processor.py: RoboCasaProcessorStep for obs key remapping
- tests/test_robocasa_env.py: full test suite (auto-skips if assets missing)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a `cudnn_deterministic` flag to `TrainPipelineConfig` (default: False)
that sets `torch.backends.cudnn.deterministic = True` and disables benchmark
mode, eliminating CUDA floating-point non-determinism at the cost of ~10-20%
training speed. When False (default) the existing benchmark=True behaviour
is preserved.
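The flag's logic in sketch form; a SimpleNamespace stands in for `torch.backends.cudnn` so the snippet runs without torch installed:

```python
from types import SimpleNamespace

def configure_cudnn(cudnn, cudnn_deterministic):
    """Apply the flag: deterministic kernels vs. the default benchmark mode."""
    if cudnn_deterministic:
        cudnn.deterministic = True
        cudnn.benchmark = False    # the autotuner is itself nondeterministic
    else:
        cudnn.benchmark = True     # existing default behaviour

# Stand-in for torch.backends.cudnn (assumption: real code sets these attrs).
cudnn = SimpleNamespace(deterministic=False, benchmark=False)
configure_cudnn(cudnn, cudnn_deterministic=True)
```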