pip's backtracking resolver hits 'resolution-too-deep' on complex
dependency graphs (robomme → mani-skill, libero_plus → robosuite/bddl).
uv resolves the same graphs in seconds without backtracking issues.
Also removes the now-redundant PATH= prefix since uv and python are
already on PATH via the base image ENV.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bddl imports the `future` package at package init but doesn't declare it
as a dependency. This caused ModuleNotFoundError inside the benchmark
Docker container when verifying the libero_plus install.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Builds lerobot-eval-base then each benchmark image (libero, libero_plus,
robomme, robocasa), runs the smoke tests, and optionally pushes to Docker Hub.
Usage:
bash docker/build_benchmark_images.sh # local only
bash docker/build_benchmark_images.sh --push --hub_org=<org> # push to Hub
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_orchestrate_multi_instance_eval spawns extra lerobot-eval processes that each
call run_eval_in_docker again, creating N^2 containers. For docker runtime,
instance_count directly controls how many env-worker containers are spawned by
run_eval_in_docker — no process-level orchestration is needed.
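The fix above boils down to a dispatch guard. A minimal sketch of the idea, with stand-in functions and a hypothetical `EvalCfg` (the real entry points are named in the commit, but their signatures here are assumptions):

```python
from dataclasses import dataclass

# Illustrative stand-ins for the real entry points named in the commit.
def run_eval_in_docker(cfg):
    # Fans out to cfg.instance_count env-worker containers (stubbed here).
    return f"docker x{cfg.instance_count}"

def _orchestrate_multi_instance_eval(cfg):
    # Process-level fan-out: spawns extra lerobot-eval processes.
    return f"processes x{cfg.instance_count}"

def run_eval_local(cfg):
    return "local x1"

@dataclass
class EvalCfg:
    runtime: str = "local"
    instance_count: int = 1

def run_eval(cfg):
    if cfg.runtime == "docker":
        # Docker runtime: container fan-out happens inside
        # run_eval_in_docker, so we must NOT also orchestrate at the
        # process level; doing both is what produced N^2 containers.
        return run_eval_in_docker(cfg)
    if cfg.instance_count > 1:
        # Non-docker runtime: fan out at the process level instead.
        return _orchestrate_multi_instance_eval(cfg)
    return run_eval_local(cfg)
```

The key design point is that exactly one layer owns parallelism: either the process orchestrator or the container spawner, never both.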
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the missing lerobot.utils.hf_eval_results module imported by
lerobot_eval.py but never created. Provides:
- default_eval_date(): today's UTC date as ISO-8601
- build_eval_results_rows(): converts eval info dict → HF .eval_results rows
- upload_eval_results_yaml(): serialises rows and uploads to Hub model repo
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add docker_runtime.py (host-side) and lerobot_eval_worker.py (container-side)
for --eval.runtime=docker. Policy loads once on the host GPU; Docker containers
run env-only workers that call back via HTTP for action chunks, maximising GPU
utilisation across parallel benchmark tasks.
- _InferenceServer: HTTP server wrapping predict_action_chunk with a single lock
- run_eval_in_docker: spawns instance_count containers, collects + merges per-task
JSON, writes eval_info.json compatible with _aggregate_eval_from_per_task
- lerobot-eval-worker CLI: make_env → shard tasks → run episodes → write JSON
- EvalDockerConfig: add port field (default 50051)
- pyproject.toml: add lerobot-eval-worker entry point
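The `_InferenceServer` pattern (one HTTP endpoint, a single lock serialising policy calls from concurrent env workers) can be sketched with the standard library. This is a minimal illustration, not the actual implementation; `predict_action_chunk` is injected as a plain callable and the request/response JSON schema is an assumption:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def make_inference_server(predict_action_chunk, port=0):
    lock = threading.Lock()

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            obs = json.loads(self.rfile.read(length) or b"{}")
            with lock:  # one policy forward pass at a time
                actions = predict_action_chunk(obs)
            body = json.dumps({"actions": actions}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep the sketch quiet
            pass

    # port=0 lets the OS pick a free port; read it back from
    # server.server_address[1].
    return ThreadingHTTPServer(("127.0.0.1", port), Handler)
```

Serving on a background thread (`threading.Thread(target=server.serve_forever, daemon=True).start()`) keeps the host process free to manage containers while workers POST observations and receive action chunks.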
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace docker-compose.benchmark.yml with per-image docker build/run
instructions. Each benchmark is built and tested independently via
--build-arg BENCHMARK=<name> on Dockerfile.benchmark.
Co-Authored-By: Claude <noreply@anthropic.com>
Add Dockerfile.benchmark (parameterized via ARG BENCHMARK), a
docker-compose.benchmark.yml with services for libero, libero_plus,
robomme, and robocasa, and a smoke_test_benchmark.sh that verifies
imports and CLI entry-points in each container.
Also add the missing `robocasa` optional dep group to pyproject.toml
(the docs already referenced `pip install ".[robocasa]"` but the group
was not defined).
Build a specific benchmark image:
docker build --build-arg BENCHMARK=robomme \
-f docker/Dockerfile.benchmark -t lerobot-benchmark-robomme .
Build all via compose:
docker compose -f docker/docker-compose.benchmark.yml build
Smoke-test inside a container:
docker compose -f docker/docker-compose.benchmark.yml run --rm robomme \
bash docker/smoke_test_benchmark.sh
Co-Authored-By: Claude <noreply@anthropic.com>
New CLI tool that fetches eval results from multiple Hub model repos
and produces a self-contained HTML leaderboard with sortable columns,
per-suite breakdowns, best-in-column highlighting, and filtering.
Made-with: Cursor
Saves eval_config.json locally and uploads it alongside results. The
model card now includes a collapsible "Eval configuration" section
showing the full config JSON used for the evaluation run.
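A collapsible section like this is plain HTML `<details>` markup embedded in the model card markdown. A hedged sketch of the rendering helper (the function name and exact formatting are illustrative, only the section title follows the commit text):

```python
import json

def eval_config_markdown(config: dict) -> str:
    # Render the eval config as a collapsible "Eval configuration"
    # section for the model card.
    fence = "`" * 3  # built dynamically to avoid a nested literal fence
    pretty = json.dumps(config, indent=2, sort_keys=True)
    return (
        "<details>\n<summary>Eval configuration</summary>\n\n"
        f"{fence}json\n{pretty}\n{fence}\n</details>\n"
    )
```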
Made-with: Cursor
Adds a push_to_hub flag to lerobot-eval that uploads eval_info.json,
rollout videos, and appends an evaluation results table to the model
card on Hugging Face. Also declares missing LIBERO-plus runtime deps
in pyproject.toml and adds an asset validation check for libero_plus.
Made-with: Cursor
Add LiberoPlusEnv config (subclass of LiberoEnv), register libero_plus
env type in factory, add import fallbacks for LIBERO-plus package
structure, and add libero_plus optional dependency group in pyproject.toml.
Made-with: Cursor
Integrates 5 selected RoboCasa kitchen tasks (3 short + 2 long) as a
LeRobot benchmark environment, following the same pattern as Libero.
Selected tasks:
Short: PickPlaceCounterToCabinet, PrepareToast, CoffeeSetupMug
Long: PrepareCoffee, RestockPantry
Changes:
- envs/robocasa.py: RoboCasaEnv wrapper with flat 12D Box action space,
3-camera pixel obs, and 16D proprioceptive state
- envs/configs.py: RoboCasaEnv config with features_map
- envs/factory.py: wire robocasa into make_env + make_env_pre_post_processors
- processor/env_processor.py: RoboCasaProcessorStep for obs key remapping
- tests/test_robocasa_env.py: full test suite (auto-skips if assets missing)
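The obs key remapping done by `RoboCasaProcessorStep` amounts to renaming env-native observation keys into the `observation.*` namespace LeRobot policies expect. A sketch of the idea; the specific RoboCasa key names below are assumptions, not the actual mapping:

```python
# Hypothetical env-native -> LeRobot key mapping.
ROBOCASA_KEY_MAP = {
    "robot0_agentview_left_image": "observation.images.left",
    "robot0_agentview_right_image": "observation.images.right",
    "robot0_eye_in_hand_image": "observation.images.wrist",
    "robot0_proprio-state": "observation.state",
}

def remap_obs(obs: dict) -> dict:
    # Rename known keys, pass everything else through unchanged.
    return {ROBOCASA_KEY_MAP.get(k, k): v for k, v in obs.items()}
```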
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a `cudnn_deterministic` flag to `TrainPipelineConfig` (default: False)
that sets `torch.backends.cudnn.deterministic = True` and disables benchmark
mode, eliminating CUDA floating-point non-determinism at the cost of ~10-20%
training speed. When False (default) the existing benchmark=True behaviour
is preserved.
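The flag handling reduces to two lines of cuDNN backend configuration. A sketch, written against a stand-in object so it runs without torch; in real code `cudnn` would be `torch.backends.cudnn`:

```python
from types import SimpleNamespace

def apply_cudnn_determinism(cudnn, cudnn_deterministic: bool) -> None:
    # `cudnn` stands in for torch.backends.cudnn here.
    if cudnn_deterministic:
        cudnn.deterministic = True
        cudnn.benchmark = False  # autotuned kernels are non-deterministic
    else:
        cudnn.benchmark = True   # existing default behaviour
```

The two flags must move together: leaving `benchmark = True` while setting `deterministic = True` would still let cuDNN pick different kernels across runs.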
* fix(ci): skip HF login (and tests) in forks and community PRs
* chore(test): remove comment about test meant to be only run locally
* fix(tests): no hf login in decorator for xvla
* fix(test): no decorator in yield
* Add SLURM SARM progress annotation script.
Provide a standalone two-stage compute/aggregate pipeline for RA-BC progress generation so large datasets can be processed in parallel and optionally uploaded to the Hub.
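The two-stage shape of such a pipeline can be sketched as below: stage 1 runs once per SLURM array task over an interleaved shard, stage 2 merges the partial outputs. All names and the shard-slicing scheme are illustrative, not the actual script's API:

```python
def annotate(ep):
    # Placeholder for per-episode progress annotation.
    return {"episode": ep, "progress": 1.0}

def compute_shard(episodes, shard_idx, num_shards):
    # Stage 1: each SLURM array task processes its own slice.
    return [annotate(ep) for ep in episodes[shard_idx::num_shards]]

def aggregate(shards):
    # Stage 2: merge per-shard outputs into one flat result list.
    return [row for shard in shards for row in shard]
```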
Made-with: Cursor
* fix pr comments
* remove comments
* chore(docstrings): updating v2.1-v3.0 conversion script docstrings to match the new task label
* chore(task): renaming the default index label in the tasks DataFrame to task
* Revert "chore(docstrings): updating v2.1-v3.0 conversion script docstrings to match the new task label"
This reverts commit f55de3255278f23f18b5d955565f6768d094951d.
* chore(docstrings): updating docstrings to match dataset v3.0 architecture
* chore(format): formatting code
* Fixing metadata indexing when writing new Parquet file
Summary:
- addressing this issue: https://github.com/huggingface/lerobot/issues/2401
- vibe-coded bugfix by Claude Sonnet 4.5
* Backing out changes to convert_videos_of_camera
* Addressing Ruff pre-commit complaint
Summary:
- addressing "SIM113 Use `enumerate()` for index variable `ep_idx` in `for` loop"
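Ruff's SIM113 fix in a nutshell: replace a manually incremented index variable with `enumerate()`.

```python
episodes = ["ep_a", "ep_b", "ep_c"]

# Before (flagged by Ruff SIM113):
#   ep_idx = 0
#   for ep in episodes:
#       handle(ep_idx, ep)
#       ep_idx += 1

# After:
indexed = []
for ep_idx, ep in enumerate(episodes):
    indexed.append((ep_idx, ep))
```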
---------
Co-authored-by: Paul <238953601+pac-robotics@users.noreply.github.com>
* fix(root): adding proper support for the root and new_root arguments
* feat(roots): adding a roots argument for the merge operation
* chore(clean): cleaning up code
* chore(docstrings): updating docstrings with new features
* fix(repo_id): setting repo_id to None when not needed
* fix(roots/repo_ids): making mypy happy by using repo_ids and roots for merge operation
* fix(path): fixing path related issues
* fix(repo_id): fixing issues related to repo_id
* chore(docstrings): updating docstrings + fix typo
* chore(clean): cleaning code
* fix(split new_repo_id): reverting new_repo_id addition for split operation
* docs(docstrings): completing docstrings
* fix(repo_ids/roots): improving checks for repo_ids/roots lengths
* fix(repo_ids): making repo_ids optional in MergeConfig but raise if not given
* fix(docstrings): fixing docstrings for split operation
* fix(hints): updating get_output_path hints to accept paths as strings too
* fix(y/N prompts): removing y/N prompts in lerobot_edit_dataset
* fix(merge repo_id): fixing merge operation to use new_repo_id instead of repo_id
* fix(typo): fixing typo in docstrings
* fix(frame_index): making rerun's "frame_index" timeline compatible with behaviour1k datasets
* fix(segfault risk): removing segfault risk by calling batch["index"] in the dataloader loop