Commit Graph

1363 Commits

Author SHA1 Message Date
Pepijn Kooijmans 8770c011b0 feat(eval): add per-episode timing logs to eval worker
Logs avg env step time, avg inference call time, and totals per episode
to identify whether env or policy server is the bottleneck.

Made-with: Cursor
2026-03-25 07:30:49 +01:00
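
The per-episode timing described in this commit could be collected roughly as follows. This is a minimal sketch, not the worker's actual code: `EpisodeTimer` and its method names are hypothetical, and it assumes the env step and the inference call are plain callables.

```python
import time
from dataclasses import dataclass


@dataclass
class EpisodeTimer:
    """Accumulates wall-clock time for env steps and inference calls (hypothetical helper)."""

    env_total: float = 0.0
    env_calls: int = 0
    infer_total: float = 0.0
    infer_calls: int = 0

    def time_env(self, fn, *args):
        # Time one environment step and add it to the env totals.
        start = time.perf_counter()
        result = fn(*args)
        self.env_total += time.perf_counter() - start
        self.env_calls += 1
        return result

    def time_infer(self, fn, *args):
        # Time one call to the policy server and add it to the inference totals.
        start = time.perf_counter()
        result = fn(*args)
        self.infer_total += time.perf_counter() - start
        self.infer_calls += 1
        return result

    def summary(self) -> dict:
        # Averages guard against division by zero when nothing was timed.
        return {
            "avg_env_step_s": self.env_total / max(self.env_calls, 1),
            "avg_inference_s": self.infer_total / max(self.infer_calls, 1),
            "total_env_s": self.env_total,
            "total_inference_s": self.infer_total,
        }
```

Comparing `total_env_s` with `total_inference_s` at the end of an episode indicates which side dominates.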
Pepijn Kooijmans ddcda8f1ca feat(eval): respect n_action_steps in inference server
The server now returns only n_action_steps actions from the predicted
chunk instead of the full chunk_size, enabling more frequent
re-planning when n_action_steps < chunk_size.

Made-with: Cursor
2026-03-25 06:21:02 +01:00
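
The truncation described in this commit amounts to slicing the predicted chunk. The sketch below assumes the chunk is a plain list of per-step actions; `select_actions` is a hypothetical name, not the server's function.

```python
def select_actions(chunk, n_action_steps):
    """Return only the first n_action_steps actions of a predicted chunk.

    `chunk` is assumed to be a sequence of per-step actions of length chunk_size.
    """
    if n_action_steps < 1:
        raise ValueError("n_action_steps must be at least 1")
    # Slicing clamps automatically when n_action_steps >= len(chunk).
    return chunk[:n_action_steps]
```

With a smaller `n_action_steps`, the worker asks for a new chunk sooner, which is the more frequent re-planning the message refers to.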
Pepijn Kooijmans 4f8ebe41b3 fix(eval): create independent preprocessors per policy server
Each _InferenceServer now gets its own preprocessor/postprocessor
instances, preventing RuntimeError from HuggingFace tokenizer's
non-thread-safe Rust borrow checker when multiple servers run
concurrently.

Made-with: Cursor
2026-03-24 22:38:03 +01:00
Pepijn Kooijmans 066976e078 feat(eval): add multiprocess runtime -- no Docker needed
New eval.runtime=multiprocess spawns local lerobot-eval-worker
subprocesses instead of Docker containers. Supports eval.policy_servers
for parallel inference. Works on SLURM clusters and anywhere Docker
is unavailable.

Usage: lerobot-eval --eval.runtime=multiprocess \
    --eval.instance_count=8 --eval.policy_servers=4 --eval.port=50051
Made-with: Cursor
2026-03-24 22:14:27 +01:00
Pepijn Kooijmans b3c2592ace feat(eval): multi-policy-server support for Docker eval
Add eval.policy_servers parameter (default 1) that spawns N independent
policy inference servers on consecutive ports. Containers are round-robin
assigned across servers, enabling parallel GPU inference for small models
like SmolVLA (~1.4GB each).

Usage: --eval.policy_servers=4 --eval.instance_count=20
  → 4 model copies on GPU, 20 containers distributed across them.
Made-with: Cursor
2026-03-24 20:28:58 +01:00
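
The round-robin assignment over consecutive ports can be illustrated as below. This is a sketch under stated assumptions: `assign_server_ports` is a hypothetical helper, and the base port of 50051 is taken from the `eval.port` value mentioned elsewhere in this log.

```python
def assign_server_ports(instance_count, policy_servers, base_port=50051):
    """Map each container index to a policy-server port, round-robin."""
    if policy_servers < 1:
        raise ValueError("policy_servers must be at least 1")
    # Servers listen on consecutive ports starting at base_port.
    ports = [base_port + i for i in range(policy_servers)]
    # Container i talks to server i mod policy_servers.
    return [ports[i % policy_servers] for i in range(instance_count)]
```

For `instance_count=20` and `policy_servers=4`, each of the four ports receives five containers.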
Pepijn Kooijmans b97ea8999f fix(docker): create libero config.yaml for non-plus LIBERO builds
The generic libero benchmark case was falling through to the wildcard
install path, which doesn't pre-create ~/.libero/config.yaml. This
caused an interactive input() prompt that crashes in Docker (EOFError).

Made-with: Cursor
2026-03-24 07:11:18 +01:00
Pepijn Kooijmans 69aeda68f5 Docker EGL/GLVND support + asset download refactor + imagenet stats fix
- Add NVIDIA EGL/Vulkan vendor ICDs and graphics libs to both Dockerfiles
- Refactor LIBERO-plus asset download into a separate build step
- Fix KeyError in datasets/factory.py when stats dict is None or missing keys

Made-with: Cursor
2026-03-23 23:33:22 +01:00
Pepijn Kooijmans a9e355bd03 Lazy env creation + smart sharding to fix container OOM 2026-03-23 23:15:23 +01:00
Pepijn Kooijmans aae68e3448 fix(docker): use recursive glob for deeply nested asset zip structure
The LIBERO-plus assets.zip has a deeply nested path
(inspire/hdd/project/.../assets) that didn't match the shallow glob.
Use recursive glob to find assets/scenes regardless of nesting depth.

Made-with: Cursor
2026-03-23 18:57:44 +01:00
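
The difference between a shallow and a recursive glob can be sketched with `pathlib`. The function name `find_scenes_dir` is hypothetical, and the Dockerfile may well do this in shell instead.

```python
from pathlib import Path


def find_scenes_dir(root):
    """Locate an assets/scenes directory at any nesting depth under root."""
    # "**" matches zero or more intermediate directories, so the depth of the
    # extracted archive does not matter.
    matches = sorted(p for p in Path(root).glob("**/assets/scenes") if p.is_dir())
    return matches[0] if matches else None
```

A shallow pattern such as `*/assets/scenes` only matches one directory level below the root, which is why it missed the deeply nested layout.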
Pepijn Kooijmans 4b9f6c4aed fix(docker): download LIBERO-plus assets (~6 GB) at image build time
The benchmark containers were missing the scene/texture/object assets
required by LIBERO-plus. Download them from HuggingFace Hub during the
Docker build so containers are self-contained and ready to run.

Made-with: Cursor
2026-03-23 18:48:25 +01:00
Pepijn Kooijmans 6057638fc1 fix(docker): pre-create libero config.yaml to avoid interactive input() prompt
The upstream libero __init__.py calls input() when ~/.libero/config.yaml
is missing, which crashes in non-interactive Docker containers with
EOFError. Pre-create the config with default paths at build time using
importlib.util.find_spec to locate the module without triggering the
problematic import.

Made-with: Cursor
2026-03-23 18:44:14 +01:00
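
Locating a package without importing it, then writing a config file ahead of time, might look like the sketch below. Both helper names are hypothetical, and the config keys passed in are placeholders rather than the keys libero expects.

```python
import importlib.util
from pathlib import Path


def locate_package_dir(name):
    """Return the directory of a top-level package without executing its __init__."""
    # For a top-level name, find_spec consults the import machinery's finders
    # but does not run the module's code.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def write_simple_config(config_path, entries):
    """Write flat key: value lines, creating parent directories as needed."""
    config_path = Path(config_path)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{key}: {value}" for key, value in entries.items()]
    config_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Note that `find_spec` on a dotted submodule name imports the parent package first, so this approach only avoids the import when applied to the top-level name.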
Pepijn Kooijmans e52e7e644a fix(docker): add libero_plus install workaround to generic Dockerfile.benchmark
The generic Dockerfile.benchmark was using a plain `uv pip install ".[libero_plus]"`
which silently fails to make `libero` importable due to an upstream LIBERO-plus
packaging bug. Port the dedicated clone + .pth workaround from
Dockerfile.eval-libero-plus so `docker build --build-arg BENCHMARK=libero_plus`
produces working containers.

Also fix eval worker using nonexistent `parser.parse()` — use `draccus.parse()`.

Made-with: Cursor
2026-03-23 18:31:57 +01:00
Pepijn 8633608d26 fix(docker): pin numpy==2.2.5 in separate RUN for robocasa
robocasa/__init__.py hard-asserts numpy==2.2.5. When bundled with other
packages in one uv install command, uv silently skips the numpy pin
(same "already resolved" bug hit with libero_plus). Moving the pin to a
dedicated final RUN step guarantees it is applied last.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 22:04:42 -07:00
Pepijn 900e6b59c8 fix(docker): pin mujoco==3.3.1 for robocasa (hard assert on import)
robocasa/__init__.py asserts mujoco.__version__ == "3.3.1" and aborts
with an error if any other version is installed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 21:53:37 -07:00
Pepijn f844fe683c fix(docker): use robosuite master branch for robocasa (per README)
robocasa README explicitly says to use the master branch of
ARISE-Initiative/robosuite (no robocasa-specific branch exists).
Also install robocasa with --no-deps to bypass its lerobot==0.3.3
pin, and declare its actual runtime deps explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 21:51:18 -07:00
Pepijn 4403675b31 fix(docker): install robocasa's robosuite fork (adds PandaOmron)
Standard robosuite 1.4.x from PyPI doesn't include PandaOmron and
other robocasa-specific robots. robocasa requires the fork at
ARISE-Initiative/robosuite@robocasa_v1.4.1. Install both from source
with --no-deps; shared deps (easydict, scikit-image, scipy) installed
explicitly first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 21:40:48 -07:00
Pepijn d18be0c3f4 feat(docker): add metaworld to default benchmark build list
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 21:38:54 -07:00
Pepijn 866f8adf11 fix(docker): install robocasa from GitHub source (not on PyPI)
robocasa is not published to PyPI, so uv can't resolve it as a plain
package dep. Fix by installing its runtime deps explicitly and cloning
robocasa from GitHub with --no-deps (same pattern as libero_plus).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 21:38:19 -07:00
Pepijn 3d6310c03d fix(docker): also override numpy==1.26.4 for robomme image
mani-skill==3.0.0b21 requires numpy<2.0.0 in addition to gymnasium==0.29.1,
both conflicting with lerobot's base requirements.

numpy 1.26.4 is runtime-compatible with lerobot's usage (no numpy 2.x-only
APIs are used in the eval worker or env wrappers).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 21:34:50 -07:00
Pepijn c3b26382e7 fix(docker): override gymnasium==0.29.1 for robomme image
mani-skill==3.0.0b21 (robomme dep) pins gymnasium==0.29.1, conflicting
with lerobot's gymnasium>=1.1.1. Use uv --override to force 0.29.1.

Both 0.29.x and 1.x use the same 5-tuple step() API (introduced in
gymnasium 0.26), so the eval worker and RoboMMEGymEnv wrapper are
fully compatible with the downgraded version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 21:28:34 -07:00
Pepijn e54e582a6f fix(docker): bypass uv extras chain bug in libero_plus Dockerfile
uv silently skips packages when resolving a nested extras chain
(lerobot[libero_plus] -> lerobot[libero] -> hf-libero -> robosuite).
POST-INSTALL grep confirmed robosuite absent after install despite uv
reporting 'Resolved 113 packages, Installed 1'.

Fix: install all libero_plus deps directly by name, bypassing the extras
chain entirely. Also add --plain flag to build script for verbose output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 21:12:26 -07:00
Pepijn 418791ebba debug(docker): add pre/post-install diagnostics to libero_plus Dockerfile
Temporary diagnostic to identify why uv sees robosuite as already
installed in the base venv despite it not being a base lerobot dep.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 21:03:42 -07:00
Pepijn ee3354a885 fix(docker): fix libero_plus deps by replacing git dep with lerobot[libero]
The libero @ git+...@main dep had empty install_requires, causing uv to
skip robosuite (and other deps) during resolution — they appeared
"already resolved" from a stale git dep cache even though not installed.

Fix: use lerobot[libero] as the dep source (hf-libero properly declares
all deps including robosuite via robomimic). The LIBERO-plus Python
module is installed from the git clone with --no-deps, so hf-libero's
declared deps are used but LIBERO-plus's environments override via .pth.

Also remove egl_probe (broken original) duplicate alongside hf-egl-probe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 20:57:14 -07:00
Pepijn 2cd06fe95b fix(docker): exclude .venv from Docker build context
Without this, the server's local .venv gets copied into the image by
the final COPY . . step in Dockerfile.eval-base, overwriting the
freshly-created uv venv. uv then sees those packages as already
installed and skips them — but they may be missing or built for the
wrong environment, causing ModuleNotFoundError at runtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 16:13:32 -07:00
Pepijn 7be84cb545 fix(docker): add CMAKE_POLICY_VERSION_MINIMUM=3.5 for cmake 4.x compat
cmake 4.x removed backward compat with cmake_minimum_required < 3.5,
breaking egl-probe compilation. Setting CMAKE_POLICY_VERSION_MINIMUM=3.5
in the base image ENV re-enables it so robomimic's egl-probe builds.

Also adds --no-cache-base flag to build script so the base can be
force-rebuilt when Dockerfile.eval-base changes, and pins hf-egl-probe
in libero extras as the upstream-fixed fork of egl-probe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 15:45:36 -07:00
Pepijn c35af1ae6a fix(docker): use uv pip instead of pip in benchmark Dockerfiles
pip's backtracking resolver hits 'resolution-too-deep' on complex
dependency graphs (robomme → mani-skill, libero_plus → robosuite/bddl).
uv resolves the same graphs in seconds without backtracking issues.

Also removes the now-redundant PATH= prefix since uv and python are
already on PATH via the base image ENV.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 15:17:03 -07:00
Pepijn 6fc024704e fix(deps): add missing future dep for bddl in libero_plus extras
bddl imports future (from-future) at package init but doesn't declare it
as a dependency. This caused ModuleNotFoundError inside the benchmark
Docker container when verifying the libero_plus install.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 11:17:51 -07:00
Pepijn c3b7a18f01 feat(docker): add build_benchmark_images.sh to build and push all eval images
Builds lerobot-eval-base then each benchmark image (libero, libero_plus,
robomme, robocasa), runs the smoke tests, and optionally pushes to Docker Hub.

Usage:
  bash docker/build_benchmark_images.sh                         # local only
  bash docker/build_benchmark_images.sh --push --hub_org=<org>  # push to Hub

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 11:03:09 -07:00
Pepijn 7fc0cdf68a fix(eval): skip multi-instance orchestration when runtime=docker
_orchestrate_multi_instance_eval spawns extra lerobot-eval processes that each
call run_eval_in_docker again, creating N^2 containers. For docker runtime,
instance_count directly controls how many env-worker containers are spawned by
run_eval_in_docker — no process-level orchestration is needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 22:48:06 -07:00
Pepijn 23bf69ebab feat(eval): add hf_eval_results utility for HF leaderboard upload
Implements the missing lerobot.utils.hf_eval_results module imported by
lerobot_eval.py but never created. Provides:
- default_eval_date(): today's UTC date as ISO-8601
- build_eval_results_rows(): converts eval info dict → HF .eval_results rows
- upload_eval_results_yaml(): serialises rows and uploads to Hub model repo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 22:41:21 -07:00
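
"Today's UTC date as ISO-8601" can be computed with the standard library alone. The sketch below is illustrative and not the module's code: `utc_today_iso` is a hypothetical name, and the optional `now` parameter is an assumption added to make the function testable.

```python
from datetime import datetime, timezone


def utc_today_iso(now=None):
    """Return the current UTC calendar date as YYYY-MM-DD."""
    # `now` must be timezone-aware if provided; it defaults to the current UTC time.
    now = now or datetime.now(timezone.utc)
    # Convert to UTC first so a late-evening local time maps to the right UTC day.
    return now.astimezone(timezone.utc).date().isoformat()
```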
Pepijn 3d5d8fa88a feat(eval): implement docker runtime with HTTP policy inference server
Add docker_runtime.py (host-side) and lerobot_eval_worker.py (container-side)
for --eval.runtime=docker. Policy loads once on the host GPU; Docker containers
run env-only workers that call back via HTTP for action chunks, maximising GPU
utilisation across parallel benchmark tasks.

- _InferenceServer: HTTP server wrapping predict_action_chunk with a single lock
- run_eval_in_docker: spawns instance_count containers, collects + merges per-task
  JSON, writes eval_info.json compatible with _aggregate_eval_from_per_task
- lerobot-eval-worker CLI: make_env → shard tasks → run episodes → write JSON
- EvalDockerConfig: add port field (default 50051)
- pyproject.toml: add lerobot-eval-worker entry point

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 22:35:59 -07:00
Pepijn e80c9e6270 chore(docker): remove docker-compose, use individual build/run commands
Replace docker-compose.benchmark.yml with per-image docker build/run
instructions. Each benchmark is built and tested independently via
--build-arg BENCHMARK=<name> on Dockerfile.benchmark.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-20 22:21:28 -07:00
Pepijn 39cf11d5dc feat(docker): add per-benchmark evaluation containers
Add Dockerfile.benchmark (parameterized via ARG BENCHMARK), a
docker-compose.benchmark.yml with services for libero, libero_plus,
robomme, and robocasa, and a smoke_test_benchmark.sh that verifies
imports and CLI entry-points in each container.

Also add the missing `robocasa` optional dep group to pyproject.toml
(the docs already referenced `pip install ".[robocasa]"` but the group
was not defined).

Build a specific benchmark image:
  docker build --build-arg BENCHMARK=robomme \
    -f docker/Dockerfile.benchmark -t lerobot-benchmark-robomme .

Build all via compose:
  docker compose -f docker/docker-compose.benchmark.yml build

Smoke-test inside a container:
  docker compose -f docker/docker-compose.benchmark.yml run --rm robomme \
    bash docker/smoke_test_benchmark.sh

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-20 22:21:28 -07:00
Pepijn Kooijmans 285c500aef speed up benchmark eval scheduling and docker workflow 2026-03-21 06:09:01 +01:00
Pepijn Kooijmans f60d163588 refactor(leaderboard): use a single positional file arg for repo IDs
Replace --repo-ids and --repo-ids-file with a single positional
repo_ids_file argument. Lines starting with # are ignored.

Made-with: Cursor
2026-03-16 02:58:28 +01:00
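
Reading such a file could be sketched as follows. `read_repo_ids` is a hypothetical name, and skipping blank lines is an assumption; the message only states that lines starting with `#` are ignored.

```python
def read_repo_ids(path):
    """Read one repo ID per line, ignoring comment lines that start with '#'."""
    repo_ids = []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            # Blank-line skipping is an assumption; '#' handling follows the commit message.
            if not line or line.startswith("#"):
                continue
            repo_ids.append(line)
    return repo_ids
```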
Pepijn Kooijmans 4a04465bb8 feat: add lerobot-leaderboard to generate interactive eval comparison pages
New CLI tool that fetches eval results from multiple Hub model repos
and produces a self-contained HTML leaderboard with sortable columns,
per-suite breakdowns, best-in-column highlighting, and filtering.

Made-with: Cursor
2026-03-16 02:57:37 +01:00
Pepijn Kooijmans 464532ec37 feat(eval): include eval config (policy, env, eval settings) in Hub push
Saves eval_config.json locally and uploads it alongside results. The
model card now includes a collapsible "Eval configuration" section
showing the full config JSON used for the evaluation run.

Made-with: Cursor
2026-03-16 02:42:38 +01:00
Pepijn Kooijmans 89f9bd78ab feat(eval): add --push_to_hub to upload eval results, videos, and model card to Hub
Adds a push_to_hub flag to lerobot-eval that uploads eval_info.json,
rollout videos, and appends an evaluation results table to the model
card on Hugging Face. Also declares missing LIBERO-plus runtime deps
in pyproject.toml and adds an asset validation check for libero_plus.

Made-with: Cursor
2026-03-16 02:39:24 +01:00
pepijn c9cfc88602 feat: add benchmark orchestration, LIBERO-plus install parity, and eval hardening
- Add lerobot-benchmark CLI for multi-benchmark train/eval workflows
- Add benchmark_training.mdx documentation
- Add libero-plus pip extra alias with EGL probe deps matching standard libero
- Harden libero.py: wand mock, init-state fallback, renderer EGL→OSMesa fallback
- Add multimodal_analysis.py script and SLURM training template

Made-with: Cursor
2026-03-15 05:52:53 +00:00
pepijn 7bef12a461 feat(envs): add RoboMME memory-augmented manipulation benchmark
- RoboMMEEnv config with 16 tasks across 4 suites (Counting, Permanence,
  Reference, Imitation)
- Gymnasium wrapper around BenchmarkEnvBuilder (robomme.py)
- Environment factory wiring for env_type="robomme"
- robomme optional dependency in pyproject.toml

Made-with: Cursor
2026-03-13 04:44:32 +00:00
pepijn db5c26f07d feat(envs): add LIBERO-plus integration for evaluation benchmarks
Add LiberoPlusEnv config (subclass of LiberoEnv), register libero_plus
env type in factory, add import fallbacks for LIBERO-plus package
structure, and add libero_plus optional dependency group in pyproject.toml.

Made-with: Cursor
2026-03-12 04:31:09 +00:00
Pepijn 8904768db4 feat(envs): add RoboCasa composite-task benchmark integration
Integrates 5 selected RoboCasa kitchen tasks (3 short + 2 long) as a
LeRobot benchmark environment, following the same pattern as Libero.

Selected tasks:
  Short: PickPlaceCounterToCabinet, PrepareToast, CoffeeSetupMug
  Long:  PrepareCoffee, RestockPantry

Changes:
- envs/robocasa.py: RoboCasaEnv wrapper with flat 12D Box action space,
  3-camera pixel obs, and 16D proprioceptive state
- envs/configs.py: RoboCasaEnv config with features_map
- envs/factory.py: wire robocasa into make_env + make_env_pre_post_processors
- processor/env_processor.py: RoboCasaProcessorStep for obs key remapping
- tests/test_robocasa_env.py: full test suite (auto-skips if assets missing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 17:08:32 +01:00
Steven Palma b0efa73520 chore(dependencies): Bump lerobot to 0.5.1 (#3118) 2026-03-09 12:43:32 +01:00
Steven Palma 00b662de02 chore(dependencies): Bump lerobot to 0.5.0 (#3117) v0.5.0 2026-03-09 11:34:52 +01:00
Steven Palma 5c51a74484 chore(deps): update requirements file (#3114) 2026-03-09 11:18:05 +01:00
Steven Palma db8547e35d test(cameras): skip flaky async_read test (#3106) 2026-03-08 14:02:33 +01:00
Steven Palma c17d949531 chore(readme): update citation with ICLR26 paper (#3107)
* peer reviewed citation 🎉

Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>

* add iclr year

Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>

* fix quentin's spelling name

Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>

* docs(readme): update citation

---------

Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>
Co-authored-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>
2026-03-08 14:01:43 +01:00
Steven Palma 1e131f93f8 chore(docs): add uv installation instructions (#3105)
* chore(docs): add uv installation instructions

* fix(docs): format tabs

* chore(docs): small details

* chore(docs): last details uv installation instructions

* chore(docs): last detail

---

Co-authored-by: sahilmaniyar888 <156301258+sahilmaniyar888@users.noreply.github.com>
2026-03-08 13:00:06 +01:00
Ignat Georgiev 2fb5c7add0 feat(train): add cudnn_deterministic option for reproducible training (#3102)
Add a `cudnn_deterministic` flag to `TrainPipelineConfig` (default: False)
that sets `torch.backends.cudnn.deterministic = True` and disables benchmark
mode, eliminating CUDA floating-point non-determinism at the cost of ~10-20%
training speed. When False (default) the existing benchmark=True behaviour
is preserved.
2026-03-08 12:29:33 +01:00
Martino Russi 4f2ef024d8 feat(robots): Unitree G1 WBC implementation (#2876)
* move locomotion from examples to robot, move controller to teleoperator class

* modify teleoperate to send back actions to robot

* whole body controller

* add holosoma to locomotors

* various updates

* update joint zeroing etc

* ensure safefail with locomotion

* add unitree locomotion

* launch camera from g1 server

* publish at varying framerates

* fix async read in camera

* attempting to fix camera lag

* test camera speedup

* training

* inference works

* remove logging from pi0

* remove logging

* push local changes

* testing

* final changes

* revert control_utils

* revert utils

* revert

* revert g1

* revert again:

* revert utils

* push recents

* remove examples

* remove junk

* remove mjlog

* revert edit_dataset

* Update lerobot_edit_dataset.py

Signed-off-by: Martino Russi <77496684+nepyope@users.noreply.github.com>

* undo teleop changes

* revert logging

* remove loggings

* remove logs

* revert dataset tools

* Update dataset_tools.py

Signed-off-by: Martino Russi <77496684+nepyope@users.noreply.github.com>

* move gravity to utils

* revert changes

* remove matplotlib viewer (rerun works fine)

* factory revert

* send policy action directly

* recent changes

* implement flexible action space

* send empty command if arms are missing

* rename locomotion to controller

* add init

* implement feedback

* add feedback for teleoperator

* fix ruff

* fix ruff

* use read_latest

* fix zmq camera

* revert exo_serial

* simplify PR

* revert exo_changes

* revert camera_zmq

* Update camera_zmq.py

Signed-off-by: Martino Russi <77496684+nepyope@users.noreply.github.com>

* remove frame duplication from zmq server

* revert channelfactoryinitialize

* keep channelfactoryinitialize

* remove zeroing out logic

* fix typo

* refactor teleop class

* simplify teleop further

* import armindex at the top

* fix visualizer again

* revert ik helper

* push stuff

* simplify image_server

* update image_server

* asd

* add threading logic

* simplify ik helper stuff

* simplify holosoma

* fix names

* fix docs

* revert leg override

* clean connect

* fix controller

* fix ruff

* clean teleoperator

* set_from_wireless

* avoid double initializations

* refactor robot class

* fix pre-commit

* update docs

* update docs format

* add teleop instructions

* unitree_g1 specific exception in record/teleoperate

* add thumbnail to docs

* add thumbnail to doc

* refactor(unitree): multiple improvements (#3103)

* refactor(unitree): multiple improvements

* test(unitree): added tests + improved installation instructions

* refactor(robots): minor changes unitree robot kinematic

* chore(robots): rename g1 kinematics file

---------

Signed-off-by: Martino Russi <77496684+nepyope@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Steven Palma <steven.palma@huggingface.co>
2026-03-08 11:33:24 +01:00