lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-23 01:41:54 +00:00

Author	SHA1	Message	Date
Pepijn	e9f3f88377	fix(rollout): preserve relative action chunks	2026-04-24 15:23:36 +02:00
Steven Palma	f29ce39f39	fix(rollout): enable default	2026-04-22 18:47:15 +02:00
Steven Palma	6ee093db9a	Merge branch 'main' into feat/lerobot-rollout	2026-04-22 16:38:13 +02:00
Steven Palma	4499519dbf	chore(rollout): address coments + minor improvements	2026-04-22 16:37:19 +02:00
Steven Palma	d70c3baf7c	chore(rollout): multiple minor improvements	2026-04-22 15:18:50 +02:00
Steven Palma	bff2d50dc1	docs(rollout): edit rename_map instructions	2026-04-22 13:59:55 +02:00
Steven Palma	8d1401abe3	fix(rollout): device and task propagation + feature pos + warn fps + move rename_map config	2026-04-22 12:55:22 +02:00
Pepijn	5adad11128	feat(sim): VLABench benchmark integration (#3396 ) feat(sim): add VLABench benchmark integration Add VLABench as a new simulation benchmark in LeRobot, following the existing LIBERO and MetaWorld patterns. This PR wires VLABench end-to-end across environment integration, Docker setup, CI smoke evaluation, and documentation. It also fixes a number of upstream packaging and runtime issues required to make VLABench usable and reproducible in CI. What’s included Benchmark integration Add VLABench as a new simulation benchmark. Expose supported VLABench tasks through the LeRobot env interface. Follow the established LIBERO / MetaWorld factory patterns. Preserve lazy async-env metadata so env.unwrapped.metadata["render_fps"] continues to work. CI smoke evaluation Add a VLABench smoke-eval job using lerobot/smolvla_vlabench. Use the correct rename_map for the 3-camera dataset layout. Expand smoke coverage from 1 to 10 primitive tasks. Extract task descriptions after eval so metrics artifacts include per-task labels. Skip Docker Hub login when secrets are unavailable (e.g. fork PRs). Docker / install fixes Install VLABench from GitHub rather than PyPI. Use uv pip, not pip, in the base image. Fail loudly on install errors instead of masking them. Clone VLABench into the non-root user’s home directory. Use shallow editable installs for VLABench and rrt-algorithms to work around missing __init__.py issues. Pin upstream clones to exact commit SHAs for reproducibility. Add undeclared runtime dependencies required by VLABench (open3d, colorlog, scikit-learn, openai). Unpin open3d so Python 3.12 wheels resolve. Assets Support downloading VLABench assets from a Hugging Face Hub mirror via VLABENCH_ASSETS_REPO. Keep Google Drive download support as fallback. Install huggingface_hub[hf_xet] so Xet-backed assets download correctly. Validate required mesh/XML asset subtrees at build time. Patch VLABench constants to tolerate missing asset directories at import time. Runtime / env correctness Import VLABench robots and tasks explicitly so decorator-based registry population happens. Resize and normalize camera observations so they always match the declared (H, W, 3) uint8 observation space. Reinstall LeRobot editably inside the image so the new env code is actually used. Coerce agent_pos / ee_state to the expected shape. Pad actions when needed to match data.ctrl. Replace zero-padding fallback with proper dm_control IK for 7D end-effector actions. Refetch dm_control physics on each step instead of caching weakrefs. Retry unstable resets with reseeding and handle PhysicsError gracefully at step time. Dataset / policy alignment Align VLABench observations and actions with Hugging Face dataset conventions used by lerobot/vlabench_unified: convert EE position between world frame and robot-base frame at the env boundary, expose / consume Euler XYZ instead of raw quaternion layout, align gripper semantics with dataset convention (1 = open, 0 = closed). This fixes policy/env mismatches that previously caused incorrect IK targets and unstable behavior at evaluation time. Docs Add a full docs/source/vlabench.mdx page aligned with the standard benchmark template. Document task selection forms (single task, comma list, suite shortcut). Document installation, evaluation, training, and result reproduction. Point examples at lerobot/smolvla_vlabench. Add a benchmark banner image. Remove outdated / misleading references to upstream evaluation tracks. Document manual install flow instead of a broken vlabench extra. Packaging cleanup Remove the unresolvable vlabench extra from pyproject.toml. Remove the no-op VLABench processor step. Remove the obsolete env unit test that only covered the dropped gripper remap helper. Apply formatting / logging / style cleanup from review feedback. Why this is needed VLABench is not currently consumable as a normal Python dependency and requires several upstream workarounds: no PyPI release, missing package declarations, undeclared runtime deps, SSH-only submodule references, asset downloads outside normal package install flow, registry population that depends on import side effects, env outputs that do not always match declared observation shapes, task resets that can diverge under some random layouts. This PR makes the benchmark usable in LeRobot despite those constraints, and ensures CI runs are reproducible and informative. If you want a much shorter squash commit message, I’d use this: feat(sim): integrate VLABench benchmark with CI, Docker, and docs Add VLABench as a new LeRobot simulation benchmark, following the existing LIBERO / MetaWorld patterns. This includes: LeRobot env integration and task exposure, CI smoke eval with lerobot/smolvla_vlabench, Docker install and asset-download fixes, runtime fixes for registry loading, assets, camera obs, action handling, dm_control IK, and PhysicsError recovery, alignment of obs/action semantics with HF VLABench datasets, docs and packaging cleanup. The PR also incorporates review feedback, improves reproducibility by pinning upstream commits, and makes VLABench usable in CI despite upstream packaging and asset-management issues.	2026-04-21 17:54:11 +02:00
Pepijn	a07f22e22c	feat(envs): add LIBERO-plus robustness benchmark (#3313 ) * feat(envs): add LIBERO-plus robustness benchmark integration - LiberoPlusEnv config (subclass of LiberoEnv, same gym interface) - Docker image installing LIBERO-plus fork via PYTHONPATH - CI workflow: 1-episode smoke eval with pepijn223/smolvla_libero_plus - pyproject.toml: libero_plus extra * fix(libero): use suite's perturbation-aware init_states loader LIBERO-plus's Benchmark class exposes a `get_task_init_states(i)` method that strips perturbation suffixes (`_table_N`, `_tb_N`, `_view_`, `_language_`, `_light_`, `_add_`, `_level`) and loads the underlying base `.pruned_init` file — the on-disk name for a perturbation variant doesn't exist as a file, only the base does. lerobot's loader was bypassing that logic and trying to read the suffix-bearing filename directly, which failed for every non-zero task id and killed the eval before any rollout video could be written. Delegate to the suite's method when it exists; fall back to the path-based loader for vanilla LIBERO (which does not provide the method). Also drop the hf-libero install + init_files copy from the LIBERO-plus Dockerfile — the LIBERO-plus clone already ships both `bddl_files/` and `init_files/` for all five suites, so the copy was unnecessary and the `cp -r` into an existing dir produced a confusing nested layout. * fix(libero): resolve LIBERO-plus perturbation init_states path ourselves Delegating to `task_suite.get_task_init_states(i)` works for path resolution but LIBERO-plus's method calls `torch.load(path)` without `weights_only=False`, which fails on PyTorch 2.6+ because the pickled init_states contains numpy objects not in the default allowlist: _pickle.UnpicklingError: Weights only load failed. WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global. Mirror LIBERO-plus's suffix-stripping logic (`_table_N`, `_tb_N`, `_view_`, `_language_`, `_light_`, `_add_`, `_level`) in our own helper so we can pass `weights_only=False` ourselves. Vanilla LIBERO task names don't contain any of these patterns except for `_table_` when followed by the word `center` (e.g. `pick_up_the_black_bowl_from_table_center_...`), and the regex requires `_table_\\d+` so semantic uses are preserved. * fix(libero-plus): download perturbation assets from Sylvest/LIBERO-plus LIBERO-plus's bddl_base_domain.py resolves scene XMLs with `os.path.join(DIR_PATH, "../assets")`, so the `assets` key in config.yaml has no effect on scene lookup — MuJoCo always opens `<clone>/libero/libero/assets/scenes/...`. With no such directory present, every perturbation task fails on: FileNotFoundError: No such file or directory: .../libero-plus/libero/libero/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml These textures, views, and extra objects ship only in the 6.4 GB `assets.zip` published at `Sylvest/LIBERO-plus` (the LIBERO-plus README explicitly says to download and unzip it into the package dir). Fetch it via `hf_hub_download`, unzip into `${LIBERO_PLUS_ROOT}/`, install `unzip`, and point config.yaml at the extracted dir so everything stays consistent. The download lives in its own Docker layer so subsequent rebuilds reuse the cached assets. Drops the lerobot/libero-assets snapshot_download — that mirror only has vanilla LIBERO textures and is ignored for scene loading anyway. * fix(libero-plus): flatten deep path prefix from Sylvest/LIBERO-plus assets.zip The 6.4 GB zip ships with every entry prefixed by `inspire/hdd/project/embodied-multimodality/public/syfei/libero_new/release/dataset/LIBERO-plus-0/assets/...` (the author's internal filesystem layout, not the layout the LIBERO-plus README promises), so the previous `unzip -d ${LIBERO_PLUS_ROOT}/` created `${LIBERO_PLUS_ROOT}/inspire/.../assets/` — robosuite still opened `${LIBERO_PLUS_ROOT}/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml` and hit the same FileNotFoundError. Extract to a scratch dir, then `mv` the nested `assets/` subtree to the expected location. Verified the target file exists in the zip central directory under that exact prefix. * refactor(libero): inline init_states resolver behind single regex Collapse the three-style suffix stripper (split/re.sub/in) into one compiled regex, drop the (Path, bool) tuple return, and move the `_add_`/`_level` reshape branch into the caller so each branch loads its own file and returns directly. Net: -11 lines, one fewer helper. * refactor(libero-plus): rebase docker image on huggingface/lerobot-gpu Mirror the libero/metaworld/robomme pattern: start from the nightly GPU image (apt deps, python, uv, venv, lerobot[all] already there) and only layer on what LIBERO-plus uniquely needs — its wand/ImageMagick build deps, the non-extra runtime pips (robosuite==1.4.1, bddl, …), the PYTHONPATH-shadowed fork, and the 6.4 GB assets.zip. Drops ~50 lines of duplicated base setup (CUDA FROM, apt python, uv install, user creation, venv init) the nightly already provides. 123 → 73 lines. Also: - Add libero_plus to docs/source/_toctree.yml under Benchmarks so doc-builder's TOC integrity check stops failing. - Repoint the docs dataset link from pepijn223/libero_plus_lerobot to the canonical lerobot/libero_plus. - Revert the stray uv.lock churn (revision/marker diff that crept in from an unrelated resolve — unrelated to LIBERO-plus). * fix(libero-plus): stop touching pyproject + uv.lock The fast-tests job was rejecting the branch because pyproject.toml had a [libero_plus] extra whose git dep wasn't represented in uv.lock. The Docker image no longer needs the extra — it clones LIBERO-plus directly and PYTHONPATH-shadows hf-libero. Drop [libero_plus] from pyproject and restore pyproject.toml + uv.lock to exactly what's on origin/main, so `uv sync --locked --extra test` is a no-op for this PR. Also repoint the doc/CI/env comments that still mentioned the extra at the Docker install path. * fix(libero-plus): strip perturbation metadata from task descriptions LIBERO-plus builds task.language by space-joining the perturbation-variant filename, so every non-_language_ variant inherits a trailing blob like "view 0 0 100 0 0 initstate 0 noise 45" or "add 16". That shows up in the dashboard video labels and no longer matches the base instruction stored in the training dataset. Strip those tokens in extract_task_descriptions.py with an end-anchored regex over the {view,initstate,noise,add,tb,table,light,level}(+digits) vocabulary. The anchor preserves mid-sentence literal uses of those words (e.g. "from table center and place it on the plate") — only the trailing metadata chain is removed. _language_ variants carry real BDDL-sourced text and are left untouched. * ci: point benchmark eval checkpoints at the lerobot/ org mirrors pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in this branch (libero, metaworld, and the per-branch benchmark). The checkpoints were mirrored into the lerobot/ org and that's the canonical location going forward. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: integrate PR #3313 review feedback - docs: fix paper link to arxiv, add benchmark image, add suite descriptions, add LIBERO-plus replacement warning, restructure eval section to match LIBERO doc style, fix policy I/O section, remove false try/except claim - docker: fix shell grouping for hf-libero uninstall, replace hardcoded asset path with dynamic find - ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step - envs: add is_libero_plus param to get_task_init_states so vanilla LIBERO always takes the simple path * fix(docs): use correct LIBERO-plus teaser image URL * ci(libero-plus): drop redundant hf auth login step The standalone login step ran `hf auth login` in a throwaway `docker run --rm` container, so no credentials persisted. Auth is already performed inside the eval step's container. Removing the redundant step per PR #3313 review feedback. * fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs Port of #3416 onto this branch. Without these attributes eval crashes when calling `env.unwrapped.metadata["render_fps"]` with async vector envs. Adds `metadata` / `unwrapped` to `_LazyAsyncVectorEnv` and caches the metadata alongside obs/action spaces in the LIBERO and MetaWorld factories. * ci: gate Docker Hub login on secret availability Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`, which made every benchmark job fail at the login step before any of the actual build/eval work could run. Gate the login on the env-var expansion of the username so the step is skipped (not failed) when secrets are absent. Mirrors the existing pattern in the VLABench job. * Update .github/workflows/benchmark_tests.yml Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * Update scripts/ci/extract_task_descriptions.py Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * Update .github/workflows/benchmark_tests.yml Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * Update docker/Dockerfile.benchmark.libero_plus Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * Update .github/workflows/benchmark_tests.yml Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(libero-plus): address review feedback * ci(libero-plus): fix YAML indentation in upload-artifact steps The `uses:` key on two upload-artifact steps was at column 0 instead of nested under the step, causing `pre-commit run check-yaml` to fail with "expected <block end>, but found '<block mapping start>'". Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-04-20 21:07:21 +02:00
Pepijn	282c31cfef	feat(envs): add RoboMME benchmark (#3311 ) * feat(envs): add RoboMME benchmark integration - RoboMME env wrapper with image/wrist_image/state observations - Docker image with Vulkan, SAPIEN, mani-skill deps - CI workflow: 1-episode smoke eval with pepijn223/smolvla_robomme - preprocess_observation: handle image/wrist_image/state keys - pyproject.toml: robomme extra Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(docker): rebase RoboMME image on huggingface/lerobot-gpu Mirror the libero/metaworld pattern: start from the nightly GPU image (which already has apt deps, uv, venv, and lerobot[all] preinstalled) and only layer on what RoboMME uniquely needs — the Vulkan libs ManiSkill/SAPIEN requires, plus the robomme extra with the gymnasium/numpy overrides. Drops 48 lines of duplicated base setup (CUDA FROM, python install, user creation, venv init, base apt deps) that the nightly image already provides. Net: 102 → 54 lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(robomme): drop prototype-branch note and move dataset to lerobot/robomme - Remove the "Related work" block referencing the prototype branch feat/robomme-integration; the PR stands on its own. - Point all dataset references at lerobot/robomme (docs, env module docstring, RoboMMEEnvConfig docstring) — this is the canonical HF location once the dataset is mirrored. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(robomme): make docs build + fast tests green 1. Docs: add robomme to _toctree.yml under Benchmarks so doc-builder's TOC integrity check stops rejecting the new page. 2. Fast tests: robomme's mani-skill transitively pins numpy<2 which is unsatisfiable against the project's numpy>=2 base pin, so `uv sync` couldn't resolve a universal lockfile. Drop robomme as a pyproject extra entirely — it truly cannot coexist with the rest of the dep tree. The Dockerfile installs robomme directly from its git URL via `uv pip install --override`, which was already the runtime path. pyproject, docs, env docstrings, and the CI job comment all now point to the docker-only install. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(robomme): realign unit tests with current env API The tests were written against an earlier env layout and never updated when the wrapper was refactored, so CI's fast-test job was failing with: - KeyError: 'front_rgb' / 'wrist_rgb' — these were renamed to the lerobot-canonical 'image' / 'wrist_image' keys (matching the dataset columns and preprocess_observation's built-in fallbacks). - AssertionError: 'robomme' not in result — create_robomme_envs now returns {task_name: {task_id: env}}, not {'robomme': {...}}, so comma-separated task lists work. - ModuleNotFoundError: lerobot.envs.lazy_vec_env — LazyVectorEnv was removed; create_robomme_envs is straightforward synchronous now. Rewrite the 7 failing cases against the current API, drop the three LazyVectorEnv tests, and add a multi-task test so the new comma-separated task parsing is covered. Stub install/teardown is moved into helpers (`_install_robomme_stub` / `_uninstall_robomme_stub`) so individual tests stop repeating six boilerplate lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * ci: point benchmark eval checkpoints at the lerobot/ org mirrors pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in this branch (libero, metaworld, and the per-branch benchmark). The checkpoints were mirrored into the lerobot/ org and that's the canonical location going forward. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: integrate PR #3311 review feedback - envs: rename obs keys to pixels/image, pixels/wrist_image, agent_pos - envs: add __post_init__ for dynamic action_dim in RoboMMEEnv config - envs: remove special-case obs conversion in utils.py (no longer needed) - ci: add Docker Hub login, HF_USER_TOKEN guard, --env.task_ids=[0] - scripts: extract_task_descriptions supports multiple task_ids - docs: title to # RoboMME, add image, restructure eval section - tests: update all key assertions to match new obs naming Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(docs): use correct RoboMME teaser image URL Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * ci(robomme): smoke-eval 10 tasks instead of 5 Broader coverage on the RoboMME benchmark CI job: bump the smoke eval from 5 tasks to 10 (one episode each), all drawn from ROBOMME_TASKS. Tasks now run: PickXtimes, BinFill, StopCube, MoveCube, InsertPeg, SwingXtimes, VideoUnmask, ButtonUnmask, PickHighlight, PatternLock. Updated the parse_eval_metrics.py `--task` label from the single `PickXtimes` stub to the full comma list so the metrics artifact reflects what was actually run. `parse_eval_metrics.py` already reads `overall` for multi-task runs, so no parser change is needed. Made-with: Cursor * fix(robomme): nest `pixels` as a dict so preprocess_observation picks it up `_convert_obs` was returning flat keys (`pixels/image`, `pixels/wrist_image`). `preprocess_observation()` in envs/utils.py keys off the top-level `"pixels"` entry and, not finding it, silently dropped every image from the batch. The policy then saw zero image features and raised ValueError: All image features are missing from the batch. Match the LIBERO layout: return `{"pixels": {"image": ..., "wrist_image": ...}, "agent_pos": ...}` and declare the same shape in `observation_space`. Made-with: Cursor * fix(robomme): align docs and tests with nested pixels obs layout Addresses PR #3311 review feedback: - Docs: correct observation keys to `pixels/image` / `pixels/wrist_image` (mapped to `observation.images.image` / `observation.images.wrist_image`) and drop the now-obsolete column-rename snippet. - Tests: assert `result["pixels"]["image"]` instead of flat `pixels/image`, matching the nested layout required by `preprocess_observation()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs Port of #3416 onto this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: gate Docker Hub login on secret availability Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`, which made every benchmark job fail at the login step. Gate the login on the env-var expansion of the username so the step is skipped (not failed) when secrets are absent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(robomme): address review feedback --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 20:21:27 +02:00
Steven Palma	195b777367	fix(rollout): features check	2026-04-20 18:13:37 +02:00
Pepijn	0f1c9b0851	feat(envs): add RoboTwin 2.0 benchmark (#3315 ) * feat(envs): add RoboTwin 2.0 benchmark integration - RoboTwinEnvConfig with 4-camera setup (head/front/left_wrist/right_wrist) - Docker image with SAPIEN, mplib, CuRobo, pytorch3d (Python 3.12) - CI workflow: 1-episode smoke eval with pepijn223/smolvla_robotwin - RoboTwinProcessorStep for state float32 casting - Camera rename_map: head_camera/front_camera/left_wrist -> camera1/2/3 * fix(robotwin): re-enable autograd for CuRobo planner warmup and take_action lerobot_eval wraps the full rollout in torch.no_grad() (lerobot_eval.py:566), but RoboTwin's setup_demo → load_robot → CuroboPlanner(...) runs motion_gen.warmup(), which invokes Newton's-method trajectory optimization. That optimizer calls cost.backward() internally, which raises RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn when autograd is disabled. take_action() hits the same planner path at every step. Wrap both setup_demo and take_action in torch.enable_grad() so CuRobo's optimizer can build its computation graph. Policy inference is unaffected — rollout()'s inner torch.inference_mode() block around select_action() is untouched, so we still don't allocate grad buffers during policy forward. * fix(robotwin): read nested get_obs() output and use aloha-agilex camera names RoboTwin's base_task.get_obs() returns a nested dict: {"observation": {cam: {"rgb": ..., "intrinsic_matrix": ...}}, "joint_action": {"left_arm": ..., "left_gripper": ..., "right_arm": ..., "right_gripper": ..., "vector": np.ndarray}, "endpose": {...}} Our _get_obs was reading raw["{cam}_rgb"] / raw["{cam}"] and raw["joint_action"] as if they were flat, so np.asarray(raw["joint_action"], dtype=float64) tripped on a dict and raised TypeError: float() argument must be a string or a real number, not 'dict' Fix: - Pull images from raw["observation"][cam]["rgb"] - Pull joint state from raw["joint_action"]["vector"] (the flat array) - Update the default camera tuple to (head_camera, left_camera, right_camera) to match RoboTwin's actual wrist-camera names (envs/camera/camera.py:135-151) * refactor(robotwin): drop defensive dict guards, cache black fallback frame _get_obs was guarding every dict access with isinstance(..., dict) in case RoboTwin's get_obs returned something else — but the API contract (envs/_base_task.py:437) always returns a dict, so the guards were silently masking real failures behind plausible-looking zero observations. Drop them. Also: - Cache a single black fallback frame in __init__ instead of allocating a fresh np.zeros((H, W, 3), uint8) for every missing camera on every step — the "camera not exposed" set is static per env. - Only allocate the zero joint_state on the fallback path (not unconditionally before the real value overwrites it). - Replace .flatten() with .ravel() (no copy when already 1-D). - Fold the nested-dict schema comment and two identical torch.enable_grad() rationales into a single Autograd section in the class docstring. - Fix stale `left_wrist` camera name in the observation docstring. * fix(robotwin): align observation_space dims with D435 camera output lerobot_eval crashed in gym.vector's SyncVectorEnv.reset with: ValueError: Output array is the wrong shape because RoboTwinEnvConfig declared observation_space = (480, 640, 3) but task_config/demo_clean.yml specifies head_camera_type=D435, which renders (240, 320, 3). gym.vector.concatenate pre-allocates a buffer from the declared space, so the first np.stack raises on shape mismatch. Changes: - Config defaults now 240×320 (the D435 dims in _camera_config.yml), with a comment pointing at the source of truth. - RoboTwinEnv.__init__ accepts observation_height/width as Optional and falls back to setup_kwargs["head_camera_h/w"] so the env is self-consistent even if the config is not in sync. - Config camera_names / features_map use the actual aloha-agilex camera names (head_camera, left_camera, right_camera). Drops the stale "front_camera" and "left_wrist"/"right_wrist" entries that never matched anything RoboTwin exposes. - CI workflow's rename_map updated to match the new camera names. * fix(robotwin): expose _max_episode_steps for lerobot_eval.rollout rollout() does `env.call("_max_episode_steps")` (lerobot_eval.py:157) to know when to stop stepping. LiberoEnv and MetaworldEnv set this attribute; RoboTwinEnv was tracking the limit under `episode_length` only, so the call raised AttributeError once CuRobo finished warming up. * fix(robotwin): install av-dep so lerobot_eval can write rollout MP4s write_video (utils/io_utils.py:53) lazily imports PyAV via require_package and raises silently inside the video-writing thread when the extra is not installed — so the eval itself succeeds with pc_success=100 but no MP4 ever lands in videos/, and the artifact upload reports "No files were found". Add av-dep to the install line (same pattern as the RoboMME image). * feat(robotwin): eval 5 diverse tasks per CI run with NL descriptions Widen the smoke eval from a single task (beat_block_hammer) to five: click_bell, handover_block, open_laptop, stack_blocks_two on top of the original. Each gets its own rollout video in videos/<task>_0/ so the dashboard can surface visually distinct behaviours. extract_task_descriptions.py now has a RoboTwin branch that reads `description/task_instruction/<task>.json` (already shipped in the clone at /opt/robotwin) and pulls the `full_description` field. CI cds into the clone before invoking the script so the relative path resolves. parse_eval_metrics.py is invoked with the same 5-task list so the metrics.json embeds one entry per task. * ci: point benchmark eval checkpoints at the lerobot/ org mirrors pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in this branch (libero, metaworld, and the per-branch benchmark). The checkpoints were mirrored into the lerobot/ org and that's the canonical location going forward. * refactor(robotwin): rebase docker image on huggingface/lerobot-gpu Mirror the libero/metaworld/libero_plus/robomme pattern: start from the nightly GPU image (apt deps, python, uv, venv, lerobot[all] already there) and layer on only what RoboTwin 2.0 uniquely needs — cuda-nvcc + cuda-cudart-dev (CuRobo builds from source), Vulkan libs + NVIDIA ICD (SAPIEN renderer), sapien/mplib/open3d/pytorch3d/curobo installs, the mplib + sapien upstream patches, and the TianxingChen asset download. Drops ~90 lines of duplicated base setup (CUDA FROM, apt python, uv install, user creation, venv init, base lerobot install). 199 → 110. Also repoint the docs + env docstring dataset link from hxma/RoboTwin-LeRobot-v3.0 to the canonical lerobot/robotwin_unified. * docs(robotwin): add robotwin to _toctree.yml under Benchmarks doc-builder's TOC integrity check was rejecting the branch because docs/source/robotwin.mdx existed but wasn't listed in _toctree.yml. * fix(robotwin): defer YAML lookup and realign tests with current API __init__ was eagerly calling _load_robotwin_setup_kwargs just to read head_camera_h/w from the YAML. That import (`from envs import CONFIGS_PATH`) required a real RoboTwin install, so constructing the env — and thus every test in tests/envs/test_robotwin.py — blew up with ModuleNotFoundError on fast-tests where RoboTwin isn't installed. Replace the eager lookup with DEFAULT_CAMERA_H/W constants (240×320, the D435 dims baked into task_config/demo_clean.yml). reset() still resolves the full setup_kwargs lazily — that's fine because reset() is only called inside the benchmark Docker image where RoboTwin is present. Also resync the test file with the current env API: - mock get_obs() as the real nested {"observation": {cam: {"rgb": …}}, "joint_action": {"vector": …}} shape - patch both _load_robotwin_task and _load_robotwin_setup_kwargs (_patch_load → _patch_runtime) - drop `front_camera` / `left_wrist` from assertions — aloha-agilex exposes head_camera + left_camera + right_camera, not those - black-frame test now uses left_camera as the missing camera - setup_demo call check loosened to the caller-provided seed/is_test bits (full kwargs include the YAML-derived blob) * fix: integrate PR #3315 review feedback - ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step - docker: tie patches to pinned versions with removal guidance, remove unnecessary HF_TOKEN for public dataset, fix hadolint warnings - docs: fix paper link to arxiv, add teaser image, fix camera names (4→3 cameras), fix observation dims (480x640→240x320) * fix(docs): correct RoboTwin 2.0 paper arxiv link * fix(docs): use correct RoboTwin 2.0 teaser image URL * fix(docs): use plain markdown image to fix MDX build * ci(robotwin): smoke-eval 10 tasks instead of 5 Broader coverage on the RoboTwin 2.0 benchmark CI job: bump the smoke eval from 5 tasks to 10 (one episode each). Added tasks are all drawn from ROBOTWIN_TASKS and mirror the shape/complexity of the existing set (simple single-object or single-fixture manipulations). Tasks now run: beat_block_hammer, click_bell, handover_block, open_laptop, stack_blocks_two, click_alarmclock, close_laptop, close_microwave, open_microwave, place_block. `parse_eval_metrics.py` reads `overall` for multi-task runs so no parser change is needed. Bumped the step name and the metrics label to reflect the 10-task layout. * fix(ci): swap 4 broken RoboTwin tasks in smoke eval The smoke eval hit two upstream issues: - `open_laptop`: bug in OpenMOSS/RoboTwin main — `check_success()` uses `self.arm_tag`, but that attribute is only set inside `play_once()` (the scripted-expert path). During eval `take_action()` calls `check_success()` directly, hitting `AttributeError: 'open_laptop' object has no attribute 'arm_tag'`. - `close_laptop`, `close_microwave`, `place_block`: not present in upstream RoboTwin `envs/` at all — our ROBOTWIN_TASKS tuple drifted from upstream and these names leaked into CI. Replace the four broken tasks with upstream-confirmed equivalents that exist both in ROBOTWIN_TASKS and in RoboTwin's `envs/`: `adjust_bottle`, `lift_pot`, `stamp_seal`, `turn_switch`. New 10-task smoke set: beat_block_hammer, click_bell, handover_block, stack_blocks_two, click_alarmclock, open_microwave, adjust_bottle, lift_pot, stamp_seal, turn_switch. * fix(robotwin): sync ROBOTWIN_TASKS + doc with upstream (50 tasks) The local ROBOTWIN_TASKS tuple drifted from upstream RoboTwin-Platform/RoboTwin. Users passing names like `close_laptop`, `close_microwave`, `dump_bin`, `place_block`, `pour_water`, `fold_cloth`, etc. got past our validator (the names were in the tuple) but then crashed inside robosuite with a confusing error, because those tasks don't exist in upstream `envs/`. - Replace ROBOTWIN_TASKS with a verbatim mirror of upstream's `envs/` directory: 50 tasks as of main (was 60 with many stale entries). Added a `gh api`-based one-liner comment so future bumps are mechanical. - Update the `60 tasks` claims in robotwin.mdx and RoboTwinEnvConfig's docstring to `50`. - Replace the stale example-task table in robotwin.mdx with ten upstream-confirmed examples, and flag `open_laptop` as temporarily broken (its `check_success()` uses `self.arm_tag` which is only set inside `play_once()`; eval-mode callers hit AttributeError). - Rebuild the "Full benchmark" command with the actual 50-task list, omitting `open_laptop`. * test(robotwin): lower task-count floor from 60 to 50 ROBOTWIN_TASKS was trimmed to 50 tasks (see comment in `src/lerobot/envs/robotwin.py:48`), but the assertion still required ≥60, causing CI failures. Align the test with the current upstream task count. * fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs Port of #3416 onto this branch. * ci: gate Docker Hub login on secret availability * fix: integrate PR #3315 review feedback - envs(robotwin): default `observation_height/width` in `create_robotwin_envs` to `DEFAULT_CAMERA_H/W` (240/320) so they match the D435 dims baked into `task_config/demo_clean.yml`. - envs(robotwin): resolve `task_config/demo_clean.yml` via `CONFIGS_PATH` instead of a cwd-relative path; works regardless of where `lerobot-eval` is invoked. - envs(robotwin): replace `print()` calls in `create_robotwin_envs` with `logger.info(...)` (module-level `logger = logging.getLogger`). - envs(robotwin): use `_LazyAsyncVectorEnv` for the async path so async workers start lazily (matches LIBERO / RoboCasa / VLABench). - envs(robotwin): cast `agent_pos` space + joint-state output to float32 end-to-end (was mixed float64/float32). - envs(configs): use the existing `_make_vec_env_cls(use_async, n_envs)` helper in `RoboTwinEnvConfig.create_envs`; drop the `get_env_processors` override so RoboTwin uses the identity processor inherited from `EnvConfig`. - processor: delete `RoboTwinProcessorStep` — the float32 cast now happens in the wrapper itself, so the processor is redundant. - tests: drop the `TestRoboTwinProcessorStep` suite; update the mock obs fixture to use float32 `joint_action.vector`. - ci: hoist `ROBOTWIN_POLICY` and `ROBOTWIN_TASKS` to job-level env vars so the task list and policy aren't duplicated across eval / extract / parse steps. - docker: pin RoboTwin + CuRobo upstream clones to commit SHAs (`RoboTwin@0aeea2d6`, `curobo@ca941586`) for reproducibility.	2026-04-20 17:46:39 +02:00
Pepijn	e699e52388	feat(envs): add RoboCasa365 benchmark integration (#3375 ) * feat(envs): add RoboCasa365 benchmark integration Add RoboCasa365 (arXiv:2603.04356) as a new simulation benchmark with 365 everyday kitchen manipulation tasks across 2,500 diverse environments. New files: - src/lerobot/envs/robocasa.py: gym.Env wrapper with deferred env creation, flat 12D action / 16D state vectors, 3-camera support - docs/source/robocasa.mdx: user-facing documentation - docker/Dockerfile.benchmark.robocasa: CI benchmark image Modified files: - src/lerobot/envs/configs.py: RoboCasaEnv config (--env.type=robocasa) - pyproject.toml: robocasa optional dependency group - docs/source/_toctree.yml: sidebar entry - .github/workflows/benchmark_tests.yml: integration test job Refs: https://arxiv.org/abs/2603.04356, https://robocasa.ai Related: huggingface/lerobot#321 * fix(docker): use uv pip to install robocasa in benchmark image The huggingface/lerobot-gpu base image uses `uv` with a venv at /lerobot/.venv — `pip` is not on PATH, so `pip install` fails with "pip: not found". Switch to `uv pip install` which installs into the existing venv. Also drop the @v1.0.0 tag pin from the robocasa git URL since the upstream repo may not have that tag; use default branch instead. * fix(robocasa): editable install + switch to lerobot/smolvla_robocasa - pip install from git omits data files like box_links_assets.json (not declared in package_data). Clone and install editable so the source tree is used at runtime. - Download only tex + fixtures_lw asset types (smoke test doesn't need objaverse/aigen objects). Pipe 'y' to auto-accept download prompt. - Switch CI policy from pepijn223/smolvla_robocasa to lerobot/smolvla_robocasa. * fix(docker): re-install lerobot editably after COPY The nightly huggingface/lerobot-gpu image predates the RoboCasaEnv registration — so `lerobot-eval --env.type=robocasa` fails at argparse with "invalid choice" even after COPY . . overlays the new source. Force an editable reinstall so the venv picks up the current configs.py. * fix(ci): add rename_map for robocasa eval (image* -> camera) Policy lerobot/smolvla_robocasa expects observation.images.camera1/2/3, but RoboCasaEnv produces observation.images.image/image2/image3. fix(robocasa): override RoboCasaGymEnv default split (test -> all) RoboCasaGymEnv defaults split="test", but create_env only accepts {None, "all", "pretrain", "target"}, so the out-of-the-box default crashes with ValueError. Always pass "all" when split is None. * fix(docker): also download objs_lw (lightwheel objects) for robocasa Kitchen tasks (e.g. CloseFridge) reference lightwheel object meshes like Stool022/model.xml. fixtures_lw alone isn't enough — we also need objs_lw. Still skipping objaverse/aigen to keep image size down. Made-with: Cursor * feat(robocasa): raw camera names + benchmark-group task shortcuts Align the LeRobot env with RoboCasa's native conventions so policies trained on the upstream datasets don't need a --rename_map at eval time, and expose the standard task groups as first-class --env.task values. - Preserve raw RoboCasa camera names (e.g. robot0_agentview_left) as observation.images.<name> end-to-end. Drops camera_name_mapping and DEFAULT_CAMERA_NAME_MAPPING; features/features_map are now built dynamically from the parsed camera list. - Accept benchmark-group names as --env.task: atomic_seen, composite_seen, composite_unseen, pretrain50/100/200/300. Expanded lazily via robocasa.utils.dataset_registry and auto-sets the split ("target" \| "pretrain"). - Update CI smoke-eval rename_map to map raw cam names to the camera1/2/3 keys expected by lerobot/smolvla_robocasa. * docs(robocasa): single-task smolvla train+eval recipe on pepijn223/robocasa_CloseFridge - Rewrite observation section to use raw RoboCasa camera keys (observation.images.robot0_agentview_{left,right}, observation.images.robot0_eye_in_hand). - Add a "Training on a single task" section with a full smolvla training command on pepijn223/robocasa_CloseFridge, plus matching single-task eval command. - Document benchmark-group task shortcuts (atomic_seen, composite_seen, composite_unseen, pretrain50/100/200/300) as valid --env.task values. * fix(robocasa): restrict obj_registries to lightwheel by default CloseFridge (and most kitchen tasks) crashed at reset with `ValueError: Probabilities contain NaN` coming out of `sample_kitchen_object_helper`. RoboCasa's upstream default `obj_registries=("objaverse", "lightwheel")` normalizes per-registry candidate counts as probabilities; when a sampled category has zero mjcf paths in every configured registry (because the objaverse asset pack isn't on disk — ~30GB, skipped by our Docker build), the 0/0 divide yields NaNs and `rng.choice` raises. - Add `obj_registries: list[str] = ["lightwheel"]` to `RoboCasaEnv` config; thread it through `create_robocasa_envs`, `_make_env_fns`, and the gym.Env wrapper to the underlying `RoboCasaGymEnv` (which forwards to `create_env` → `robosuite.make` → kitchen env). - Default matches what `download_kitchen_assets --type objs_lw` actually ships, so the env works out of the box without a 30GB objaverse download. - Document the override (`--env.obj_registries='[objaverse,lightwheel]'`) for users who have downloaded the full asset set. * fix(docker): also download tex_generative for robocasa benchmark RoboCasa's lightwheel kitchen fixtures embed references to `generative_textures/wall/tex.png` directly in their MuJoCo XML, so `MjModel.from_xml_string` errors out at reset time with "No such file or directory" even when the env is constructed with `generative_textures=None`. The generative textures live under a separate asset registry key (`tex_generative`) in `download_kitchen_assets`, distinct from the base `tex` pack we were already fetching. - Add `tex_generative` to the download list so the fixture XMLs resolve. - Document the remaining omissions (objaverse/aigen, ~30GB) and how the runtime side pairs this with obj_registries=["lightwheel"] to avoid sampling from categories whose assets aren't on disk. ci(robocasa): smoke-eval 10 atomic tasks instead of 1 Broader coverage in the benchmark CI job: evaluate SmolVLA on ten fixture-centric atomic RoboCasa tasks (one episode each) instead of just CloseFridge. The tasks are all drawn from TARGET_TASKS.atomic_seen and selected to avoid object-manipulation categories that would require the objaverse/aigen asset packs (we only ship objs_lw in the Docker image, paired with obj_registries=["lightwheel"] on the runtime side). Tasks: CloseFridge, OpenCabinet, OpenDrawer, TurnOnMicrowave, TurnOffStove, CloseToasterOvenDoor, SlideDishwasherRack, TurnOnSinkFaucet, NavigateKitchen, TurnOnElectricKettle. `scripts/ci/parse_eval_metrics.py` already handles multi-task output via the `overall` key, so no parser changes needed. Bumped the metrics artifact's task label to `atomic_smoke_10` to reflect the grouping. * fix(pyproject): drop unresolvable robocasa extra robocasa's upstream setup.py hardcodes `lerobot==0.3.3` in install_requires. Exposing it as the `lerobot[robocasa]` extra made uv's dep resolver cycle: `lerobot[robocasa]` -> robocasa -> lerobot (a different version) -> unsolvable. This broke every `uv sync` — even invocations with an unrelated extra like `--extra test` — because uv validates the whole lockfile graph. - Remove the `robocasa` extra from pyproject.toml. Installation instructions in docs/source/robocasa.mdx now walk users through the manual `git clone` + `pip install --no-deps` flow, which matches what the Docker image already does and sidesteps the cyclic dep entirely. - Dockerfile: `uv pip install -e ~/robocasa --no-deps` so the shadowed lerobot==0.3.3 never lands in the image; install robocasa's actual runtime deps (numpy, numba, scipy, mujoco, tianshou, etc.) explicitly. * docs(robocasa): align page with adding_benchmarks template Rework docs/source/robocasa.mdx to follow the standard benchmark doc structure: intro + links + available tasks (with family breakdown and first-class benchmark-group shortcuts) + installation + eval + recommended episodes + policy I/O + training + reproducing results. - Fix the paper link (was pointing at a non-existent arxiv ID). - Surface lerobot/smolvla_robocasa and pepijn223/robocasa_CloseFridge in the top-of-page links so they're findable without reading the training section. - Add an explicit "Object registries" subsection explaining the `--env.obj_registries=[objaverse,lightwheel]` override path. - Add an explicit "Reproducing published results" section pointing at the CI smoke eval. * fix: integrate PR #3375 review feedback - envs(robocasa): hoist the duplicated `_parse_camera_names` helper out of `libero.py` and `robocasa.py` into `envs/utils.py` as the public `parse_camera_names`; call sites updated. - envs(robocasa): give each factory a distinct `episode_index` (`0..n_envs-1`) and derive a per-worker seed series in `reset()` so n_envs workers don't all roll the same scene under a shared outer seed. - envs(robocasa): drop the unused `*kwargs` on `_make_env`; declare `visualization_height` / `visualization_width` on both the wrapper and the `RoboCasaEnv` config + propagate via `gym_kwargs`. - envs(robocasa): emit `info["final_info"]` on termination (matching MetaWorld) so downstream vector-env auto-reset keeps the terminal task/success flags. - docs(robocasa): add `--rename_map` (robot0_agentview_left/ eye_in_hand/agentview_right → camera1/2/3) plus CI-parity flags to all three eval snippets. - docker(robocasa): pin robocasa + robosuite git SHAs and the pip dep versions (pygame, Pillow, opencv-python, pyyaml, pynput, tqdm, termcolor, imageio, h5py, lxml, hidapi, gymnasium) for reproducible benchmark images. - ci(robocasa): update the workflow comment — there is no `lerobot[robocasa]` extra; robocasa/robosuite are installed manually because upstream's `lerobot==0.3.3` pin shadows ours. docs(robocasa): add benchmark banner image * fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs Port of #3416 onto this branch. Also threads the cached metadata through the RoboCasa factory so async eval on `--env.type=robocasa` keeps the same improvement. * fix: integrate PR #3375 review feedback (round 2) - envs(robocasa): when the caller passes `seed=None` to `reset()`, fall back to `self.episode_index` for the inner env seed so each worker still samples a distinct trajectory instead of all workers inheriting the same global RNG state. - envs(robocasa): replace the two module-level `print()` calls in `create_robocasa_envs` with `logger.info(...)` via a module-level `logger = logging.getLogger(__name__)`. - ci(robocasa): run `scripts/ci/extract_task_descriptions.py` after the eval so `metrics.json` carries per-task natural-language labels, matching LIBERO / MetaWorld / VLABench jobs. Added a `_robocasa_descriptions()` extractor that splits CamelCase task names into word-level labels keyed by `<task>_0`.	2026-04-20 17:10:53 +02:00
Steven Palma	b49e4016f2	test(rollout): fix expectations	2026-04-20 16:17:23 +02:00
Steven Palma	02d8a34829	fix(docs): dagger num_episodes	2026-04-20 15:40:43 +02:00
Steven Palma	14c7a25ce4	fix(rollout) require dataset in dagger + use duration too	2026-04-20 15:34:41 +02:00
Haoming Song	b2765b39b8	Cache lazy async env metadata for eval (#3416 ) Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-04-20 15:33:13 +02:00
Steven Palma	bc06cb44ca	feat(scripts): lerobot-rollout	2026-04-20 00:45:10 +02:00
Defalt	5c43fa1cce	fix(policies): replace deprecated torch.cuda.amp.autocast with torch.amp.autocast (#3167 ) Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-04-19 16:25:08 +02:00
k1000dai	3f16d98a9b	episods→episodes (#3410 ) Fixing typo	2026-04-19 12:58:06 +02:00
whats2000	52f508c51c	fix(dataset): cleanup_interrupted_episode wipes image temp dirs (#3405 )	2026-04-19 12:04:24 +02:00
Steven Palma	a8b72d9615	feat(dataset): 2x faster dataloader via parallel decode, uint8 transport, and persistent workers (#3406 ) * feat(dataset): 2xfaster dataloader * fix(dataset): streaming return uint8 decode * fix(tests): adjust normalization step comparison * fix(dataset): with threadexecutor + False default * chore(dataset): make it a config * fix(test): account for uint8 in training path testing	2026-04-19 00:08:22 +02:00
Shu Jiuhe	a99943ca26	Improve loading performance in `_absolute_to_relative_idx` when remapping indices (#3279 )	2026-04-18 19:28:50 +02:00
Cheng Yin	a9821af61b	fix(record): pass rename_map to make_policy in lerobot-record (#3240 ) * fix(record): pass rename_map to make_policy in lerobot-record Fixes #3181. The rename_map from dataset config was used for preprocessor construction but not passed to make_policy(), causing feature mismatch errors when camera key names differ between dataset and model config. make_policy() already accepts a rename_map parameter and uses it to skip visual feature consistency validation when remapping is active, but lerobot_record.py was not passing it through. * style: fix ruff format for ternary expression --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-04-17 16:40:08 +02:00
Steven Palma	9021d2d240	refactor(imports): enforce guard pattern (#3382 ) * refactor(imports): enforce guard pattern * fix(tests): skip reachy2 if not installed * Address review feedback	2026-04-14 22:54:05 +02:00
Khalil Meftah	60e7d67cb8	fix: catch KeyboardInterrupt in safe_stop_image_writer to prevent corrupted frames (#3381 )	2026-04-14 18:22:56 +02:00
Radu	1ede000bdd	fix(rl): swap dict merge order to preserve teleop intervention flag (#3273 ) Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-04-14 16:20:54 +02:00
Khalil Meftah	d57c58a532	fix: add thread synchronization to ReplayBuffer to prevent race condition between add() and sample() (#3372 )	2026-04-14 13:16:45 +02:00
Matteo Tiezzi	b3e76a92f2	fix(groot): compatibility fixes for gr00t in v0.5 (#3182 ) * fix(groot): apply groot 0.5 fixes * fix(groot): correct indentation and add tile count in Eagle25VL processor * Fixed lint7/style	2026-04-14 13:09:18 +02:00
Maxime Ellerbach	a656a982af	fix(feetech): motor position readings overflow (#3373 )	2026-04-13 22:39:58 +02:00
Jash Shah	9bd844a3b9	fix(rl): ensure queue and process cleanup on abnormal exit (#3063 ) Wrap the main execution in actor_cli and start_learner_threads with try/finally so that queues are closed and processes are joined even when an unhandled exception occurs. Previously, exceptions in act_with_policy or add_actor_information_and_train would skip all cleanup code, leaking GPU/CPU resources. Also sets the shutdown_event on exception so child processes exit gracefully. Fixes #3059 Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-04-13 16:25:42 +02:00
Steven Palma	df0763a2bc	feat(dependencies): minimal default tag install (#3362 )	2026-04-12 20:03:04 +02:00
Caroline Pascal	d762f4bfe8	fix(dataset): adding metadata loading when reading from a dataset after writing (#3305 ) * fix(one shot load): adding metadata loading when reading from a dataset after writing * refactor(one shot load): move metadata reload to ensure_readable() on LeRobotDatasetMetadata Move the metadata reload from DatasetReader.load_and_activate() to a new public ensure_readable() method on LeRobotDatasetMetadata, called from LeRobotDataset._ensure_reader(). This places lifecycle management in the right layer: metadata owns its readiness check, the dataset orchestrates the write-to-read transition, and the reader stays clean. Also adds a regression test using delta_timestamps to exercise the meta.episodes access path in the create -> write -> finalize -> read flow. Co-authored-by: Steven Palma <imstevenpmwork@users.noreply.github.com> --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Steven Palma <imstevenpmwork@users.noreply.github.com>	2026-04-10 11:29:40 +02:00
Pepijn	919184d6f8	feat(envs): lazy env init + AsyncVectorEnv as default for n_envs > 1 (#3274 ) * docs(benchmarks): add benchmark integration guide and standardize benchmark docs Add a comprehensive guide for adding new benchmarks to LeRobot, and refactor the existing LIBERO and Meta-World docs to follow the new standardized template. Made-with: Cursor * refactor(envs): move dispatch logic from factory into EnvConfig subclasses Replace hardcoded if/elif chains in factory.py with create_envs() and get_env_processors() methods on EnvConfig. New benchmarks now only need to register a config subclass — no factory.py edits required. Net -23 lines: factory.py shrinks from ~200 to ~70 lines of logic. Made-with: Cursor * docs(benchmarks): clean up adding-benchmarks guide for clarity Rewrite for simpler language, better structure, and easier navigation. Move quick-reference table to the top, fold eval explanation into architecture section, condense the doc template to a bulleted outline. Made-with: Cursor * fix link * fix task count * fix: enable SmolVLA eval on LIBERO with custom camera mappings - Thread camera_name_mapping from LiberoEnv config through to gym envs - Sync features_map with camera_name_mapping in LiberoEnv.__post_init__ - Fix render() to use first available camera instead of hardcoded "image" - Handle non-dict final_info in rollout by falling back to info["is_success"] - Add use_peft legacy field to SmolVLAConfig for checkpoint compat - Add defaults to GR00TN15Config init=False fields for transformers 5.3 Made-with: Cursor * fix: use direct AutoresetMode import for gymnasium compat Made-with: Cursor * fix: handle gymnasium < 1.0 without AutoresetMode Made-with: Cursor * refactor: revert policy changes, keep env-only camera mapping fixes - Revert GR00T N1.5 default_factory/default changes (transformers compat) - Revert SmolVLA use_peft legacy field - Apply ruff formatting fixes - camera_name_mapping stays entirely in env/eval layer (no policy changes) Made-with: Cursor * Update docs/source/env_processor.mdx Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * feat(envs): lazy env init + AsyncVectorEnv as default for n_envs > 1 LiberoEnv and MetaworldEnv previously allocated GPU resources (EGL context, OpenGL framebuffer) in __init__, before AsyncVectorEnv's fork(). Worker processes inherited stale GPU handles, causing EGL_BAD_CONTEXT crashes on first render. Fix: defer OffScreenRenderEnv / MT1 construction to _ensure_env(), called on first reset() or step() inside the worker subprocess. Each worker creates its own clean context after fork(). Also fixes lerobot_eval.py:170 (add_envs_task TODO): replace with env.call("task") which works with both SyncVectorEnv and AsyncVectorEnv. AsyncVectorEnv is now the default for n_envs > 1; auto-downgraded to SyncVectorEnv when n_envs=1 (no benefit, less overhead). Expected speedup: ~15-20x for LIBERO Spatial with batch_size=50. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: close envs between tasks to prevent worker process accumulation eval_policy_all never closed environments after each task completed, causing AsyncVectorEnv worker processes to accumulate (N_tasks × n_envs). This led to OOM, BrokenPipeError and EOFError on multi-task benchmarks. Also fixes: - AsyncVectorEnv compat in envs/utils.py (use get_attr/call instead of .envs) - Tuple task handling in tokenizer_processor and lerobot_eval - _LazyAsyncVectorEnv for deferred worker spawning in LIBERO Made-with: Cursor * fix(eval): use task_description instead of task for language conditioning env.call("task") returns the LIBERO task name with underscores (e.g. "pick_up_the_black_bowl_...") instead of the natural language description ("pick up the black bowl ..."). The VLM tokenizes these completely differently, causing 0.0 reward across all episodes. Made-with: Cursor * docs: update adding_benchmarks for async env changes - Replace add_envs_task reference with env.call("task_description") - Update use_async_envs default to True - Add note about lazy GPU init for AsyncVectorEnv compatibility Made-with: Cursor * feat(eval): batch_size=auto + faster env loading - batch_size=0 (default) auto-tunes based on CPU cores, capped by n_episodes and 64. Removes the need for users to guess the right value. The old batch_size > n_episodes error is replaced by silently clamping to n_episodes. - _LazyAsyncVectorEnv accepts pre-computed spaces so only one temp env is created per suite (not per task). For libero_spatial (10 tasks) this avoids 9 redundant LiberoEnv instantiations during env setup. Made-with: Cursor * docs: add evaluation guide and update benchmarks doc - New docs/source/evaluation.mdx covering lerobot-eval usage, batch_size auto-tuning, AsyncVectorEnv performance, tuning tips, output format, multi-task evaluation, and programmatic usage. - Add evaluation page to _toctree.yml under Benchmarks section. - Update adding_benchmarks.mdx to reference batch_size auto default and link to the evaluation guide. Made-with: Cursor * docs(evaluation): remove benchmark table, rename section header Made-with: Cursor * perf(eval): shared memory, observation passthrough, task prefetch - AsyncVectorEnv now uses shared_memory=True for zero-copy observation transfer - LiberoEnvConfig.gym_kwargs passes observation_height/width to the env - eval_policy_all prefetches next task's workers while current task runs Made-with: Cursor * style: ruff format Made-with: Cursor * chore: revert env_processor.mdx changes (not part of this PR) Made-with: Cursor * ci(benchmarks): add isolated integration tests for libero and metaworld Each benchmark gets its own Docker image (lerobot[libero] / lerobot[metaworld] only) so incompatible dep trees cannot collide. A 1-episode smoke eval runs per benchmark on GPU runners. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci(benchmarks): pin action hashes and use uv sync --locked Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci(benchmarks): trigger only on envs/ or lerobot_eval.py changes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ci): set LIBERO_DATA_FOLDER to bypass interactive stdin prompt libero/__init__.py calls input() to ask about a custom dataset path, which raises EOFError when stdin is closed inside Docker. Setting LIBERO_DATA_FOLDER skips the prompt entirely. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(benchmarks): add CI smoke test step to adding_benchmarks guide Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ci): pre-create libero config in Dockerfile to bypass stdin prompt libero/__init__.py calls input() when ~/.libero/config.yaml is missing. We write the config at image build time (without importing libero) so the prompt never fires at runtime. Also trigger CI on pyproject.toml changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ci): use shell to create libero config instead of multiline python -c The multiline RUN python -c "..." was being parsed as Dockerfile instructions. Use printf to write ~/.libero/config.yaml directly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ci): point libero config to bundled package init_files The config was pointing to /tmp/libero_init which doesn't exist. Use importlib.util.find_spec to locate the hf-libero package directory and write paths to the actual bundled bddl_files/init_files/assets. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ci): add smolvla extra to benchmark Dockerfiles num2words (required by SmolVLM processor) is declared in lerobot[smolvla], not lerobot[libero/metaworld]. Install both extras together. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(eval): render_frame covers _LazyAsyncVectorEnv isinstance(env, AsyncVectorEnv) silently skipped _LazyAsyncVectorEnv, causing video rendering to produce no frames on the default async path. Switch to hasattr(env, "call") so any async-compatible env (including _LazyAsyncVectorEnv) hits the call("render") branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(envs): remove unused _get_sub_env_attr helper _get_sub_env_attr was defined but never called anywhere in the codebase. _sub_env_has_attr (its sibling) is kept — it is actively used in utils.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore: apply prettier formatting to docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(env_processor): remove deprecated add_envs_task from pipeline example add_envs_task is replaced by env.call("task_description") in this PR. Remove it from the pipeline walkthrough and renumber the steps (8→7). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(envs): remove __del__ from _LazyAsyncVectorEnv __del__ is unreliable as a cleanup mechanism. close() is already called explicitly in the eval loop's finally block, so the finalizer is redundant. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(eval): prefetch next task's workers after close to avoid GPU memory overlap Previously, next task's AsyncVectorEnv workers were spawned while the current task was still running, causing both tasks' GPU contexts to coexist. Moving the prefetch start into the finally block (after env.close()) ensures workers for task N+1 only spin up once task N has released GPU memory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(envs): move _LazyAsyncVectorEnv to utils and apply to metaworld _LazyAsyncVectorEnv lived in libero.py but metaworld had the same OOM problem: all tasks' AsyncVectorEnv workers were spawned eagerly, wasting GPU memory for tasks not yet running. Move the class to envs/utils.py so both environments share it, then apply the same is_async + lazy wrapping pattern in create_metaworld_envs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore: remove out-of-scope benchmark/CI/docs files from PR Benchmark CI workflow, Dockerfiles, benchmark docs, evaluation smoke-test doc, and dispatch tests belong in a separate PR. Scope this PR to the async env init changes only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore: restore adding_benchmarks + test_dispatch, drop env_processor changes - Restore docs/source/adding_benchmarks.mdx (belongs in this PR) - Restore tests/envs/test_dispatch.py (belongs in this PR) - Revert docs/source/env_processor.mdx to main (out of scope for this PR) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(adding_benchmarks): remove CI smoke test step (coming in separate PR) Step 7 (Dockerfile + benchmark_tests.yml CI job) and its table rows are out of scope for this PR. The CI infrastructure will be added on top in a follow-up PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(envs): remove unused add_envs_task Replaced by env.call("task_description") in lerobot_eval.py. No callers remain in the codebase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * style: fix prettier formatting in env_processor.mdx Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(eval): catch AttributeError and NotImplementedError explicitly for task description Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(envs): use forkserver context and close envs in test to prevent deadlock AsyncVectorEnv with default fork context leaks worker processes between test_policy parametrized cases; subsequent env creation deadlocks because new forked workers inherit stale pipe FDs from previous test's leaked workers. - configs.py: pass context="forkserver" to AsyncVectorEnv (matches _LazyAsyncVectorEnv) - test_policies.py: call close_envs(envs) at end of test_policy to clean up workers Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(envs): default use_async_envs=False in create_envs and make_env Tests that call make_env(n_envs=2) without passing use_async_envs were getting AsyncVectorEnv, whose forked workers can't resolve gym namespaces registered at runtime. Default to False (sync) so existing tests pass. lerobot_eval.py explicitly passes cfg.eval.use_async_envs, so the CLI async behaviour (controlled by EvalConfig.use_async_envs) is unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 10:29:20 +02:00
Pepijn	5de7aa5a4f	refactor(envs): move benchmark dispatch into EnvConfig subclasses (#3272 ) * docs(benchmarks): add benchmark integration guide and standardize benchmark docs Add a comprehensive guide for adding new benchmarks to LeRobot, and refactor the existing LIBERO and Meta-World docs to follow the new standardized template. * refactor(envs): move dispatch logic from factory into EnvConfig subclasses Replace hardcoded if/elif chains in factory.py with create_envs() and get_env_processors() methods on EnvConfig. New benchmarks now only need to register a config subclass — no factory.py edits required. Net -23 lines: factory.py shrinks from ~200 to ~70 lines of logic. * docs(benchmarks): clean up adding-benchmarks guide for clarity Rewrite for simpler language, better structure, and easier navigation. Move quick-reference table to the top, fold eval explanation into architecture section, condense the doc template to a bulleted outline. * fix link * fix task count * fix(tests): fix 3 failing dispatch tests - test_registry_all_types: skip non-EnvConfig stubs (e.g. TestPluginConfig) - test_processors_delegation: use None instead of abstract PreTrainedConfig - test_custom_get_env_processors_override: use DataProcessorPipeline for isinstance check (PolicyProcessorPipeline is a subscripted generic) * fix: enable SmolVLA eval on LIBERO with custom camera mappings - Thread camera_name_mapping from LiberoEnv config through to gym envs - Sync features_map with camera_name_mapping in LiberoEnv.__post_init__ - Fix render() to use first available camera instead of hardcoded "image" - Handle non-dict final_info in rollout by falling back to info["is_success"] - Add use_peft legacy field to SmolVLAConfig for checkpoint compat - Add defaults to GR00TN15Config init=False fields for transformers 5.3 Made-with: Cursor * fix: use direct AutoresetMode import for gymnasium compat Made-with: Cursor * fix: handle gymnasium < 1.0 without AutoresetMode Made-with: Cursor * refactor: revert policy changes, keep env-only camera mapping fixes - Revert GR00T N1.5 default_factory/default changes (transformers compat) - Revert SmolVLA use_peft legacy field - Apply ruff formatting fixes - camera_name_mapping stays entirely in env/eval layer (no policy changes) Made-with: Cursor * Update docs/source/env_processor.mdx Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * Update docs/source/env_processor.mdx Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * Update docs/source/env_processor.mdx Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(eval): raise RuntimeError for unsupported final_info format (Gymnasium < 1.0) Made-with: Cursor * style: fix markdown code fences in env_processor.mdx Made-with: Cursor * docs: remove duplicate code blocks in env_processor.mdx Made-with: Cursor * style: revert quadruple backticks to triple (prettier compat) * docs(env_processor): add EnvConfig subclass step and policy_cfg examples - Add missing '### 2. Update Your EnvConfig Subclass' section with get_env_processors() snippet - Update factory usage example to show policy_cfg parameter and keyword-argument style for both SmolVLA and ACT cases * docs(env_processor): rename step 2 and fix policy_cfg examples - Rename '### 2. Update the Factory' → '### 2. Update Your EnvConfig Subclass' - Update factory usage examples to use keyword-argument style with policy_cfg parameter for both SmolVLA and ACT cases --------- Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-04-08 17:48:58 +02:00
Francesco Capuano	7c032f19fc	feat(dataset): registering torchvision transforms (#3153 ) * add: a flexible transformation registry * fix: image transforms can be set both at init and after * add: tests * fix: take in review * feat(datasets): add image transform setters * fix: pre-commit * fix: CI --------- Signed-off-by: Francesco Capuano <74058581+fracapuano@users.noreply.github.com>	2026-04-07 15:59:11 +02:00
Anthony Chan	e2f27bf71b	Fix lerobot_train script without interpolation (#3281 )	2026-04-07 15:50:18 +02:00
Steven Palma	d60a700d2b	chore(policy): multi dit docs (#3285 ) * docs(policy): add libero results multi task dit + remove readme in src code * docs(policy): add hyperlink to doc file in src code * chore(style): pre-commit	2026-04-05 21:23:13 +02:00
Steven Palma	8c3d4cf900	chore(docs): no policy readme in src code (#3286 ) * chore(docs): move policies readme out of src code * chore(docs): create symlink for policy readme	2026-04-05 19:25:38 +02:00
Pepijn	818892a38b	feat(dagger): Add HIL/Dagger/HG-Dagger/RaC style data collection (#2833 ) * feat: HIL data collection, RTC interpolator, and action queue improvements - Add Human-in-the-Loop (HIL) data collection examples (sync + RTC) - Add HIL data collection documentation - Add ActionInterpolator for smoother policy control at higher rates - Integrate interpolator into lerobot-record and eval_with_real_robot - Add action queue clear() and get_processed_left_over() methods - Add rtc/__init__.py for cleaner imports * docs: expand Related Work section with paper summaries * fix: only record dataset frames at original fps, not at interpolated rate The interpolator speeds up robot control (e.g. 2x) but dataset frames should still be recorded at the original fps. Interpolated-only iterations now only send actions to the robot without writing to the dataset. * refactor: merge HIL sync and RTC scripts into single file with --rtc.enabled toggle Combines hil_data_collection.py and hil_data_collection_rtc.py into one script. RTC is toggled via --rtc.enabled=true (defaults to off for sync inference). Deletes the separate hil_data_collection_rtc.py and updates docs to reflect the single-script usage. * test: add ActionInterpolator test suite (29 tests) Covers constructor validation, passthrough (multiplier=1), 2x and 3x interpolation with exact value checks, reset/episode boundaries, control interval calculation, multi-dim actions, and simulated control loop integration. * test: add ActionQueue + ActionInterpolator integration tests Verifies the interpolator doesn't interfere with RTC's leftover chunk tracking: queue consumption rate matches base fps regardless of multiplier, get_left_over/get_processed_left_over only change on queue.get(), merge preserves smooth interpolation across chunks, and interpolator reset is independent of queue state. * feat: register SO follower/leader configs in HIL script Adds SOFollowerRobotConfig and SOLeaderTeleopConfig imports so SO100/SO101 robots can be used via --robot.type=so_follower and --teleop.type=so_leader. Updates docs accordingly. Made-with: Cursor * docs: remove em dashes from HIL documentation Made-with: Cursor * refactor: rename examples/rac to examples/hil Updates directory name and all references in docs and script docstrings. Made-with: Cursor * fix: encorperate pr feedback comments * refactor(tests): enhance ActionInterpolator test structure and add detailed docstrings * feedback pr and test fix * fix(test): pass correct real_delay in interpolator delay test The test was passing real_delay=0 and relying on _check_delays to silently override it with the index-based diff. Now passes real_delay=3 to match the 3 actions consumed during the simulated inference period. * fix pr feedback * ordering * update hil script * fix * default name * fix(bi_openarm): use kw_only=True to fix dataclass field ordering BiOpenArmFollowerConfig overrides `id` with a default, making it positional in the child — non-default `left_arm_config` then follows a default field, which Python dataclasses forbid. Adding kw_only=True (matching the parent RobotConfig) removes positional constraints. Made-with: Cursor * style: format long line in hil_data_collection.py Made-with: Cursor * pr feedback --------- Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-04-02 19:53:59 +02:00
Pepijn	15934d8d08	feat(policies): add relative action support for pi0, pi0.5, and pi0_fast (#2970 ) * Add option for pi family models to train with relative actions (relative to state) * formatting * add recomputation of stats and option to compute delta stats * normalzie after delta conversion * only recompute state for stats * calulate chunk based stats * sample 100k * load from parquet * sample 1m * stats per chunck * fix * use quantiles * stats for entire dataset * fix * max 1m frames * compute before dist * fix multi gpu processor bug * Fix RTC with delta actions and OpenArms motor_type wiring * feat: align pi0_fast delta actions with pi0/pi05 and add RTC integration tests - Add delta_exclude_joints and action_feature_names to PI0FastConfig - Move to_absolute_actions from modeling to processor pipeline for pi0_fast - Add delta action detection and logging to eval_with_real_robot.py - Add delta actions documentation to pi0 and pi05 READMEs - Fix ruff lint issues in test_delta_actions.py - Add test_rtc_delta_actions.py (24 tests) covering: - ActionQueue with delta vs absolute actions - RTC denoise step with delta leftovers - Full pipeline roundtrip (delta → RTC → absolute) - State rebasing approximation bounds - Non-delta policy compatibility - Multi-chunk consistency * chore: clean up test comments, add OpenPI attribution, remove debug logging - Replace decorative comment separators in test files with plain section headers - Add attribution comments for 1e-6 epsilon in normalize_processor.py (from OpenPI) - Remove debug logging blocks from lerobot_train.py * refactor: extract compute_delta_action_stats into compute_stats.py Move the ~70-line inline delta action stats block from lerobot_train.py into a dedicated function in compute_stats.py, where all other stats computation already lives. The training script now calls it in 6 lines. * refactor: remove unused get_processed_left_over from ActionQueue This method was never called outside of tests. Leftover actions for RTC guidance are always retrieved via get_left_over() (delta/original space). * revert: remove logging-only changes from eval_with_real_robot.py The delta actions detection helper and log message added no functional value — the script already handles delta policies correctly via the processor pipeline. * refactor: use ACTION/OBS_STATE constants instead of hardcoded strings Replace hardcoded "action" and "observation.state" with ACTION and OBS_STATE from utils.constants in compute_stats.py, dataset_tools.py, and lerobot_train.py. * style: remove stray blank lines in training loop * refactor: move delta action stats to preprocessing step, remove on-the-fly computation - Remove on-the-fly compute_delta_action_stats from lerobot_train.py - Rewrite recompute_stats to delegate action stats to compute_delta_action_stats (chunk-based sampling matching what the model sees during training) - Add chunk_size parameter to recompute_stats for delta action computation - Add delta actions documentation to pi0.mdx and pi05.mdx * feat: add recompute_stats CLI operation to lerobot-edit-dataset * fix(tests): relax quantile normalization test tolerance for 1e-6 epsilon * chore: remove agents_memory/pr_details.md from repo * refactor: rename delta actions to relative actions throughout What OpenPI calls "DeltaActions" is actually UMI's "relative trajectory" representation: each action in the chunk is an offset from the current state, not from the previous action. This avoids error accumulation. Renamed across all source, tests, docs, and CLI: - DeltaActionsProcessorStep → RelativeActionsProcessorStep - to_delta_actions → to_relative_actions - use_delta_actions → use_relative_actions - delta_exclude_joints → relative_exclude_joints - compute_delta_action_stats → compute_relative_action_stats - delta_action_processor.py → relative_action_processor.py - test_delta_actions.py → test_relative_actions.py Kept as-is: AbsoluteActionsProcessorStep (converts TO absolute), registry ID "delta_actions_processor" (backward compat), and unrelated delta references (IK pipeline, Robosuite, RA-BC metrics, gym envs). * docs: add Action Representations guide Dedicated page explaining absolute, relative, and delta actions with numerical examples, joint vs EE space, and how to use kinematics pipelines and the relative action processor. References UMI paper (Chi et al., 2024) for the terminology. * docs: remove redundant OpenPI naming note from action representations * docs: remove opinionated OpenPI reference from delta actions section * docs: replace ASCII diagram with UMI paper figure * docs: remove OpenPI reference from action representations * docs: use HF-hosted image instead of local asset * docs: clarify figure attribution * revert: restore original normalization epsilon behavior The 1e-6 unconditional epsilon change perturbed all normalized values, breaking backward compatibility tests. The original approach (1e-8 eps for MEAN_STD, conditional torch.where for QUANTILES) already handles division by zero correctly without affecting non-degenerate cases. * fix: restore delta_action_processor.py used by phone/RL teleop The rename commit incorrectly deleted delta_action_processor.py and duplicated its classes into relative_action_processor.py. Restore the original file and import from it instead. * fix(processor): address PR #2970 review comments - Remove shebang from relative_action_processor.py (library module, not script) - Add device alignment in to_relative_actions/to_absolute_actions so _last_state on CPU doesn't cause cross-device errors when actions are on CUDA - Rename delta_step → relative_step in AbsoluteActionsProcessorStep for naming consistency; update factory.py, all processor files, and tests - Expand _reconnect_relative_absolute_steps docstring to explain why post-hoc rewiring is needed after deserialization - Fix off-by-one in compute_stats.py: sample_upper_bound = total_frames - chunk_size + 1 so last valid start index is included and total_frames == chunk_size is not rejected - Remove redundant NOTE comment in processor_pi05.py (duplicated two lines below) - Fix pi0_fast processor ordering: move relative_step before NormalizerProcessorStep so normalizer sees delta actions (matching pi0/pi05); flip postprocessor to unnormalize → absolute accordingly. Relative stats are now required for all pi models - Revert use_relative_joint_actions_aloha → use_delta_joint_actions_aloha in configuration_smolvla.py (preserve existing public API) - Update action_representations.mdx: add missing joint to 6-DOF example, fix 'based on a figure', clarify pi family ordering, add RTC compatibility section * update rtc link * feat: compute relative action stats over full dataset with optional parallelism Remove the 100k sample cap from compute_relative_action_stats and process all valid chunks. Vectorize with numpy (pre-load actions/states, fancy indexing + broadcasting) for a large speedup over the per-index HF dataset loop. Add num_workers param for thread-based parallelism (numpy releases the GIL). Update docs to show --push_to_hub for recompute_stats. * style: apply ruff formatting to compute_stats.py * testing on real robot * style: fix ruff format and remove redundant .keys() calls	2026-04-01 12:59:12 +02:00
Steven Palma	720cf8e3a0	Revert "fix(deps): breaking change from transformers 5.4.0" (#3249 ) * Revert "fix(deps): breaking change from transformers 5.4.0 (#3231)" This reverts commit `07502868e5`. * chore(dependecies): pin transformers to 5.3.0 temporarily	2026-03-30 19:11:41 +02:00
Steven Palma	5d4fdf5088	feat(scripts): add transformers version (#3248 ) * feat(scripts): add transformers and torch version * chore(scripts): remove pytorch	2026-03-30 16:33:17 +02:00
四七	3b185f7f9d	fix(datasets): remove unreachable timestamp branch in add_frame (#3163 ) * fix(datasets): remove unreachable timestamp branch in add_frame and document caller contract - Remove dead code: frame.pop("timestamp") branch in add_frame() could never execute because validate_frame() raises ValueError for any DEFAULT_FEATURES key (including timestamp) before we reach that line. - Expand add_frame() docstring: explicitly document that timestamp and frame_index must NOT be passed by the caller. - Add explanatory comment in validate_frame(): clarifies why DEFAULT_FEATURES are excluded from expected_features, preventing future re-introduction of the dead branch. The dead branch originated in #1200, which fixed a shape-(1,) mismatch for a code path that was subsequently made unreachable by a refactor of validate_frame. * chore(datasets): narrow PR scope * fix(datasets): move add_frame timestamp cleanup to dataset_writer	2026-03-28 11:37:57 +01:00
Bryson Jones	2e069b1c47	Feature/add multitask diffusion transformer policy implementation (#2545 ) * Add multitask diffusion transformer policy Add multitask diffusion transformer policy * expand the observation encoder to support differnt size encoders for vision and text * add RoPE attention module as this is shown to help training dynamics and generation quality for DiTs * update readme and citations for multitask dit policy * remove dino vision encoder and simplify text and vision encoders by removing inheritance structure * adjust factory comment * update docstring for multitask dit policy processor file * simplify config for multitask dit by merging and flattening everything, then adding comments to denote where some parameters are only used for specific objectives * add references to the modeling file comments * merge all modules files into the main modeling file * add torch.no_grad decorators * split up select action return statement * remove redundant asserts * add tutorial to training with multi_task_dit * fix bugs when testing on hardware * remove environment state conditioning * update typo in test instruction comment * add processor tests to multitask dit tests * move policy to top of file * use constants for indexing into batches and remove env state references * remove the base classes since we don't need to be able to extend * fix nit formatting in generate actions fcn * reformat and clean up tutorial for multitask dit policy * add more descriptions and depth to multitask dit tutorial * note origins of each training objective * rename config param for multiple vision encoders * refactor code to perform task tokenization in the processor instead of in the modeling code for multitask dit * add multitask dit to toc for docs * add conditional transformers import to match all other policies that use transformers lib * add test handling for multitask dit when transformers isnt available * skip tests without transformers * remove cropping of images smaller than the crop size * add kwargs arg to multitask dit constructor * add wallx dep conflict management for multitask dit policy * use hyphens for cleanliness in pyproject.toml * add conflict management to pyproject toml for pi conflict for mtdp as well * update tests script to not use unnecessary uv sync call which resolves dependencies that do not need to run. This drastically reduces CI run time * revert fast tests edits * update docs and readme files, fixing some typos and adding multitask dit to readme * chore(dependencies): upgrade transformers + hggingface-hub + peft + scipy * chore(dependencies): bump pi0 family to transformers v5 * chore(dependencies): bump wall x to transformers v5 * chore(dependencies): bump gr00t to transformers v5 * chore(style): fix pre-commit * fix(policy): xvla forced_bos_token missing * test(rl): skip ci tests for resnet10 * Fix: full pi models support for transformer v5 (#2967) * fix(pi): remove loss truncation * fix(pi): remove state padding before tokenization * fix(pi): fix image padding value * fix from_pretrain * add transformer v5 changes * remove reference * more fixes * make it work * add support for rest of pi family * add pifast work * more changes * more changes * more cleanup * fix torch params * dtype fix * torch compile * embed mismatch fix * revert groot * more nit fixes * remove unused classes * more fixes * revert * nit * torch dtype warning fix * but back dynamic renaming * add tie embedding --------- Co-authored-by: Yufei Sun <skieyfly@gmail.com> * chore: fix XVLA in transformers v5 (#3006) * test(policies): enable wall x CI testing * style(test): pre-commit check * style(test): pre-commit --------- Signed-off-by: Bryson Jones <63133702+brysonjones@users.noreply.github.com> Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Jade Choghari <chogharijade@gmail.com> Co-authored-by: Yufei Sun <skieyfly@gmail.com> Co-authored-by: Steven Palma <steven.palma@huggingface.co>	2026-03-28 00:41:26 +01:00
Steven Palma	4e45acca52	fix(dataset): use revision-safe Hub cache for downloaded datasets (#3233 ) * refactor(dataset): enhance dataset root directory handling and introduce hub cache support - Updated DatasetConfig and LeRobotDatasetMetadata to clarify root directory behavior and introduce a dedicated hub cache for downloads. - Refactored LeRobotDataset and StreamingLeRobotDataset to utilize the new hub cache and improve directory management. - Added tests to ensure correct behavior when using the hub cache and handling different revisions without a specified root directory. * refactor(dataset): improve root directory handling in LeRobotDataset - Updated LeRobotDataset to store the requested root path separately from the actual root path. - Adjusted metadata loading to use the requested root, enhancing clarity and consistency in directory management. * refactor(dataset): minor improvements for hub cache support * chore(datasets): guard in resume + assertion test --------- Co-authored-by: AdilZouitine <adilzouitinegm@gmail.com> Co-authored-by: mickaelChen <mickael.chen.levinson@gmail.com>	2026-03-27 22:21:55 +01:00
Maxime Ellerbach	07502868e5	fix(deps): breaking change from transformers 5.4.0 (#3231 ) * fix(deps): breaking change from transformers 5.4.0 * Update src/lerobot/policies/xvla/modeling_florence2.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * Update src/lerobot/policies/wall_x/qwen_model/qwen2_5_vl_moe.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * removing dataclass * bumping transformers 5.4.0 --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-27 21:25:12 +01:00
Reece O'Mahoney	aa9cc9bd43	fix(logging): suppress noisy httpx INFO logs (#3173 ) Set httpx logger level to WARNING in init_logging to prevent HTTP request traces from flooding the terminal during train and eval scripts. Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-03-26 21:05:15 +01:00
Steven Palma	123495250b	refactor(dataset): split LeRobotDataset into DatasetReader & DatasetWriter (+ API cleanup) (#3180 ) * refactor(dataset): split reader and writer * chore(dataset): remove proxys * refactor(dataset): better reader & writer encapsulation * refactor(datasets): clean API + reduce leaky implementations * refactor(dataset): API cleaning for writer, reader and meta * refactor(dataset): expose writer & reader + other minor improvements * refactor(dataset): improve teardown routine * refactor(dataset): add hf_dataset property at the facade level * chore(dataset): add init for datasset module * docs(dataset): add docstrings for public API of the dataset classes * tests(dataset): add tests for new classes * fix(dataset): remove circular dependecy	2026-03-26 19:09:25 +01:00
Altman	e64fa667c3	fix(vqbet): use in-place fill_ to avoid overwriting DDP GPU buffers with CPU tensors (#3128 ) * fix(vqbet): use in-place fill_ to avoid overwriting DDP GPU buffers with CPU tensors When VQ discretization phase completes, the code was overwriting register_buffer('discretized') and register_buffer('freeze_codebook') with torch.tensor(True), which is created on CPU. DDP then fails in _sync_buffers() with: RuntimeError: No backend type associated with device type cpu. Fix by updating the buffers in-place with .fill_(True) so device and registration are preserved. Made-with: Cursor * test(vqbet): add regression test for in-place buffer update during discretization Verifies that discretize() updates the 'discretized' and 'freeze_codebook' registered buffers in-place (via fill_()) rather than replacing them with new CPU tensors. The test checks data_ptr() identity and that the tensors remain registered buffers after the call. This prevents regressions of the DDP fix. Made-with: Cursor * test(vqbet): add GPU regression test to verify buffers stay on CUDA after discretize() Directly catches the original DDP failure mode: when buffers are replaced with torch.tensor(True) they land on CPU, causing NCCL to raise 'No backend type associated with device type cpu' in _sync_buffers(). The GPU test places the model on cuda:0 and asserts both buffers remain on CUDA after discretization. Made-with: Cursor * test(vqbet): simplify to single device-check test in test_policies.py Per reviewer feedback: remove the separate test file and replace the two CPU/GPU tests (with data_ptr checks) with a single focused test in tests/policies/test_policies.py that only asserts the registered buffers remain on the model device after discretize(). Uses DEVICE from tests/utils.py so it runs on whatever device the CI/user selects (cpu, cuda, mps). Made-with: Cursor * style: fix import order in test_policies.py to pass ruff/pre-commit checks Made-with: Cursor --------- Co-authored-by: Zhan DiJia <2476100824@example.com> Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-03-18 13:24:07 +01:00

1 2 3 4 5 ...

361 Commits