Files created by user_lerobot inside the eval container inherit a
restrictive umask, making them unreadable by the runner after the
container exits. Add a post-eval 'docker run --user root' chmod step
so upload-artifact can find the video files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace host-side chmod (unreliable across the Docker UID boundary) with a
dedicated 'docker run --user root' step that chmods from inside the
container before the eval run mounts the path.
- Use python3 instead of python (CI runners only have python3).
- Add if: always() to parse/upload steps so metrics are captured even on
eval failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- spaces/health-dashboard/app.py: Gradio Space that queries the GitHub
Actions API directly (no extra datastore). Shows benchmark status
badges, success-rate and duration trend charts, and embeds the latest
rollout video per benchmark. Results cached 5 min in-memory; video
files cached on disk by artifact ID so downloads only happen once.
- spaces/health-dashboard/requirements.txt + README.md: Space card with
setup instructions for the GITHUB_RO_TOKEN secret (actions:read,
metadata:read only).
- scripts/ci/parse_eval_metrics.py: runs on the CI host after each eval,
reads the eval_info.json written by lerobot-eval, extracts pc_success and
n_episodes, and writes metrics.json to the artifacts dir (sketched after
this list).
- .github/workflows/benchmark_tests.yml: add "Parse … metrics" and
"Upload … metrics" steps (if: always()) after each eval so the
dashboard has data even when the eval fails.
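For reference, a minimal sketch of what such a parsing script could look
like. The layout of eval_info.json and where the keys live are assumptions
here, not verified against lerobot-eval's actual output:

    # parse_eval_metrics.py (sketch): key paths inside eval_info.json are
    # assumptions; adjust to whatever lerobot-eval actually writes.
    import json
    import sys
    from pathlib import Path

    def main(eval_dir: str, artifacts_dir: str) -> None:
        info = json.loads((Path(eval_dir) / "eval_info.json").read_text())
        overall = info.get("overall", info)  # assumed top-level structure
        metrics = {
            "pc_success": overall["pc_success"],
            "n_episodes": overall["n_episodes"],
        }
        (Path(artifacts_dir) / "metrics.json").write_text(
            json.dumps(metrics, indent=2)
        )

    if __name__ == "__main__":
        main(sys.argv[1], sys.argv[2])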
The Space should be deployed as a private Space under the huggingface
org. Required secret: GITHUB_RO_TOKEN (fine-grained, read-only).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Container runs as user_lerobot (non-root); host-mounted /artifacts volume
was owned by root, causing PermissionError on first video write.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mount a host volume into the container so lerobot-eval writes videos to
/artifacts, then upload artifacts/videos/ via actions/upload-artifact.
`if: always()` ensures the video is uploaded even when the eval fails,
which helps debug rollout issues.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 586-file lerobot/libero-assets dataset was being fetched at runtime
(on first reset()), which consistently hit a 504 Gateway Timeout on CI
runners. Downloading at build time bakes the assets into the image so
no network call is needed during the smoke eval.
The config.yaml now points assets → ~/.libero/assets (the downloaded
snapshot) instead of the bundled (empty) package path.
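The build-time fetch can be done with huggingface_hub's snapshot_download;
a sketch (repo_type and local_dir are assumptions based on this message):

    # Bake the LIBERO assets into the image so the first reset() never
    # touches the network.
    from pathlib import Path
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="lerobot/libero-assets",
        repo_type="dataset",  # assumption: hosted as a dataset repo
        local_dir=Path.home() / ".libero" / "assets",
    )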
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add HF_HUB_DOWNLOAD_TIMEOUT=300 to both jobs — SmolVLM2 processor
download was timing out on CI runners with the default timeout
- MetaWorld: add --rename_map to map observation.image → camera1 and
--policy.empty_cameras=2 to pad the 2 missing cameras the policy
expects (trained with 3 cameras, env provides 1)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All MetaWorld task names in metaworld_config.json use the v3 suffix.
push-v2 caused a KeyError on TASK_DESCRIPTIONS lookup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each benchmark gets its own image (lerobot[<benchmark>,smolvla]) so
incompatible dep trees can never collide. A 1-episode smoke eval runs
per benchmark on GPU runners.
- Libero: pepijn223/smolvla_libero, libero_spatial, camera_name_mapping
- MetaWorld: pepijn223/smolvla_metaworld, metaworld-push-v2
- LIBERO config pre-created at build time to bypass interactive stdin prompt
- Triggers on envs/**, lerobot_eval.py, Dockerfiles, pyproject.toml changes
- Adds docs/source/evaluation.mdx and restores step 7 in adding_benchmarks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Step 7 (Dockerfile + benchmark_tests.yml CI job) and its table rows are
out of scope for this PR. The CI infrastructure will be added on top in a
follow-up PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Restore docs/source/adding_benchmarks.mdx (belongs in this PR)
- Restore tests/envs/test_dispatch.py (belongs in this PR)
- Revert docs/source/env_processor.mdx to main (out of scope for this PR)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benchmark CI workflow, Dockerfiles, benchmark docs, evaluation smoke-test
doc, and dispatch tests belong in a separate PR. Scope this PR to the
async env init changes only.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_LazyAsyncVectorEnv lived in libero.py, but metaworld had the same OOM
problem: all tasks' AsyncVectorEnv workers were spawned eagerly, wasting
GPU memory for tasks not yet running.
Move the class to envs/utils.py so both environments share it, then apply
the same is_async + lazy wrapping pattern in create_metaworld_envs.
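The shared pattern, roughly (a sketch, not the exact class in envs/utils.py):

    # Worker processes for a task are only forked on first use, not at
    # construction time.
    from gymnasium.vector import AsyncVectorEnv

    class _LazyAsyncVectorEnv:
        def __init__(self, env_fns):
            self._env_fns = env_fns
            self._env = None  # no worker processes spawned yet

        def _ensure(self):
            if self._env is None:
                self._env = AsyncVectorEnv(self._env_fns)
            return self._env

        def reset(self, **kwargs):
            return self._ensure().reset(**kwargs)

        def step(self, actions):
            return self._ensure().step(actions)

        def call(self, name, *args, **kwargs):
            return self._ensure().call(name, *args, **kwargs)

        def close(self):
            if self._env is not None:
                self._env.close()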
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, the next task's AsyncVectorEnv workers were spawned while the
current task was still running, causing both tasks' GPU contexts to coexist.
Moving the prefetch start into the finally block (after env.close()) ensures
workers for task N+1 only spin up once task N has released GPU memory.
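In outline (helper names here are illustrative, not the actual
eval_policy_all code):

    # Illustrative ordering: task N+1's workers start only after task N's
    # env has closed and released its GPU memory.
    def run_tasks(tasks, make_envs, eval_one):
        results, prefetched = [], None
        for i, task in enumerate(tasks):
            env = prefetched if prefetched is not None else make_envs(task)
            prefetched = None
            try:
                results.append(eval_one(env))
            finally:
                env.close()  # release task N's GPU contexts first...
                if i + 1 < len(tasks):
                    prefetched = make_envs(tasks[i + 1])  # ...then spawn N+1
        return results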
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
__del__ is unreliable as a cleanup mechanism. close() is already called
explicitly in the eval loop's finally block, so the finalizer is redundant.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
add_envs_task is replaced by env.call("task_description") in this PR.
Remove it from the pipeline walkthrough and renumber the steps (8→7).
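For context, the vector-env call fans out to every sub-env and returns a
tuple of per-env results, e.g. (assuming all sub-envs run the same task, as
in these benchmarks):

    descriptions = env.call("task_description")  # one string per sub-env
    task_description = descriptions[0]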
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_get_sub_env_attr was defined but never called anywhere in the codebase.
_sub_env_has_attr (its sibling) is kept — it is actively used in utils.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
isinstance(env, AsyncVectorEnv) silently skipped _LazyAsyncVectorEnv,
causing video rendering to produce no frames on the default async path.
Switch to hasattr(env, "call") so any async-compatible env (including
_LazyAsyncVectorEnv) hits the call("render") branch.
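The check, in essence (a sketch; the surrounding rendering code is
abbreviated):

    import numpy as np

    def render_frames(env):
        # The old guard, isinstance(env, AsyncVectorEnv), silently skipped
        # _LazyAsyncVectorEnv; duck-typing on "call" covers both.
        if hasattr(env, "call"):
            return np.stack(env.call("render"))  # one frame per sub-env
        return env.render()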
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
num2words (required by SmolVLM processor) is declared in lerobot[smolvla],
not lerobot[libero/metaworld]. Install both extras together.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The config was pointing to /tmp/libero_init, which doesn't exist.
Use importlib.util.find_spec to locate the hf-libero package directory
and write paths to the actual bundled bddl_files/init_files/assets.
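A sketch of the approach (config key names and directory layout are assumed
from the names in this message; find_spec locates the package without
importing it, so libero's input() prompt never fires):

    import importlib.util
    from pathlib import Path
    import yaml

    # Directory layout assumed: data dirs sit directly under the package.
    pkg_dir = Path(importlib.util.find_spec("libero").origin).parent

    config = {
        "bddl_files": str(pkg_dir / "bddl_files"),
        "init_files": str(pkg_dir / "init_files"),
        "assets": str(pkg_dir / "assets"),
    }
    cfg_path = Path.home() / ".libero" / "config.yaml"
    cfg_path.parent.mkdir(parents=True, exist_ok=True)
    cfg_path.write_text(yaml.safe_dump(config))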
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The continuation lines of the multiline RUN python -c "..." were being
parsed as separate Dockerfile instructions. Use printf to write
~/.libero/config.yaml directly instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
libero/__init__.py calls input() when ~/.libero/config.yaml is missing.
We write the config at image build time (without importing libero) so
the prompt never fires at runtime. Also trigger CI on pyproject.toml changes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
libero/__init__.py calls input() to ask about a custom dataset path,
which raises EOFError when stdin is closed inside Docker. Setting
LIBERO_DATA_FOLDER skips the prompt entirely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each benchmark gets its own Docker image (lerobot[libero] / lerobot[metaworld]
only) so incompatible dep trees cannot collide. A 1-episode smoke eval runs
per benchmark on GPU runners.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- AsyncVectorEnv now uses shared_memory=True for zero-copy observation
transfer (see the sketch after this list)
- LiberoEnvConfig.gym_kwargs passes observation_height/width to the env
- eval_policy_all prefetches next task's workers while current task runs
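A construction sketch (the env id is illustrative; the real code builds
LiberoEnv factories):

    import gymnasium as gym
    from gymnasium.vector import AsyncVectorEnv

    env_fns = [lambda: gym.make("CartPole-v1") for _ in range(4)]
    # shared_memory=True puts observations in a shared buffer instead of
    # pickling them through a pipe on every step.
    env = AsyncVectorEnv(env_fns, shared_memory=True)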
Made-with: Cursor
- New docs/source/evaluation.mdx covering lerobot-eval usage, batch_size
auto-tuning, AsyncVectorEnv performance, tuning tips, output format,
multi-task evaluation, and programmatic usage.
- Add evaluation page to _toctree.yml under Benchmarks section.
- Update adding_benchmarks.mdx to reference batch_size auto default and
link to the evaluation guide.
Made-with: Cursor
- batch_size=0 (default) auto-tunes based on CPU cores, capped by
n_episodes and 64, removing the need for users to guess the right
value. The old batch_size > n_episodes error is replaced by silently
clamping to n_episodes (see the sketch after this list).
- _LazyAsyncVectorEnv accepts pre-computed spaces so only one temp env
is created per suite (not per task). For libero_spatial (10 tasks)
this avoids 9 redundant LiberoEnv instantiations during env setup.
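The auto-tuning rule as described (helper name is illustrative):

    import os

    def resolve_batch_size(batch_size: int, n_episodes: int) -> int:
        # batch_size=0 -> auto-tune from CPU cores, capped at n_episodes and 64.
        if batch_size == 0:
            batch_size = min(os.cpu_count() or 1, n_episodes, 64)
        # The old code raised when batch_size > n_episodes; now clamp silently.
        return min(batch_size, n_episodes)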
Made-with: Cursor
env.call("task") returns the LIBERO task name with underscores
(e.g. "pick_up_the_black_bowl_...") instead of the natural language
description ("pick up the black bowl ..."). The VLM tokenizes these
completely differently, causing 0.0 reward across all episodes.
Made-with: Cursor
eval_policy_all never closed environments after each task completed,
causing AsyncVectorEnv worker processes to accumulate (N_tasks × n_envs).
This led to OOM, BrokenPipeError and EOFError on multi-task benchmarks.
Also fixes:
- AsyncVectorEnv compat in envs/utils.py (use get_attr/call instead of .envs)
- Tuple task handling in tokenizer_processor and lerobot_eval
- _LazyAsyncVectorEnv for deferred worker spawning in LIBERO
Made-with: Cursor
LiberoEnv and MetaworldEnv previously allocated GPU resources (EGL context,
OpenGL framebuffer) in __init__, before AsyncVectorEnv's fork(). Worker
processes inherited stale GPU handles, causing EGL_BAD_CONTEXT crashes on
first render.
Fix: defer OffScreenRenderEnv / MT1 construction to _ensure_env(), called on
first reset() or step() inside the worker subprocess. Each worker creates its
own clean context after fork().
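Schematically (a sketch; the class name and factory are illustrative, and
the real envs also build observation/action spaces):

    import gymnasium as gym

    class _DeferredRenderEnv(gym.Env):
        def __init__(self, make_inner):
            # make_inner: zero-arg factory for OffScreenRenderEnv / MT1.
            # Nothing GPU-related is touched here, in the parent process.
            self._make_inner = make_inner
            self._inner = None

        def _ensure_env(self):
            if self._inner is None:
                # First reset()/step() runs inside the AsyncVectorEnv worker,
                # so the EGL context is created post-fork, one per worker.
                self._inner = self._make_inner()
            return self._inner

        def reset(self, *, seed=None, options=None):
            return self._ensure_env().reset(seed=seed, options=options)

        def step(self, action):
            return self._ensure_env().step(action)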
Also fixes lerobot_eval.py:170 (add_envs_task TODO): replace with
env.call("task"), which works with both SyncVectorEnv and AsyncVectorEnv.
AsyncVectorEnv is now the default for n_envs > 1, auto-downgraded to
SyncVectorEnv when n_envs=1 (async brings no benefit with a single env,
and sync has less overhead).
Expected speedup: ~15-20x for LIBERO Spatial with batch_size=50.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>