lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-05-14 16:19:45 +00:00

Author	SHA1	Message	Date
Steven Palma	098ebb4d72	feat(ci): send slack notification if latest dependency test is broken (#3398)	2026-04-17 11:28:24 +02:00
Remy	bd74f6733d	chore: bump doc-builder SHA for PR upload workflow (#3386)	2026-04-15 12:15:24 +02:00
Steven Palma	6f4a96333e	chore(docs): update contributing (#3387)	2026-04-15 11:02:37 +02:00
Pepijn	187b2167ed	feat(ci): benchmark smoke tests with isolated Docker images (LIBERO + MetaWorld) (#3319) * docs(benchmarks): add benchmark integration guide and standardize benchmark docs Add a comprehensive guide for adding new benchmarks to LeRobot, and refactor the existing LIBERO and Meta-World docs to follow the new standardized template. * refactor(envs): move dispatch logic from factory into EnvConfig subclasses Replace hardcoded if/elif chains in factory.py with create_envs() and get_env_processors() methods on EnvConfig. New benchmarks now only need to register a config subclass — no factory.py edits required. Net -23 lines: factory.py shrinks from ~200 to ~70 lines of logic. * docs(benchmarks): clean up adding-benchmarks guide for clarity Rewrite for simpler language, better structure, and easier navigation. Move quick-reference table to the top, fold eval explanation into architecture section, condense the doc template to a bulleted outline. * fix link * fix task count * fix: enable SmolVLA eval on LIBERO with custom camera mappings - Thread camera_name_mapping from LiberoEnv config through to gym envs - Sync features_map with camera_name_mapping in LiberoEnv.__post_init__ - Fix render() to use first available camera instead of hardcoded "image" - Handle non-dict final_info in rollout by falling back to info["is_success"] - Add use_peft legacy field to SmolVLAConfig for checkpoint compat - Add defaults to GR00TN15Config init=False fields for transformers 5.3 * fix: use direct AutoresetMode import for gymnasium compat * fix: handle gymnasium < 1.0 without AutoresetMode * refactor: revert policy changes, keep env-only camera mapping fixes - Revert GR00T N1.5 default_factory/default changes (transformers compat) - Revert SmolVLA use_peft legacy field - Apply ruff formatting fixes - camera_name_mapping stays entirely in env/eval layer (no policy changes) * Update docs/source/env_processor.mdx Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> 
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * feat(envs): lazy env init + AsyncVectorEnv as default for n_envs > 1 LiberoEnv and MetaworldEnv previously allocated GPU resources (EGL context, OpenGL framebuffer) in __init__, before AsyncVectorEnv's fork(). Worker processes inherited stale GPU handles, causing EGL_BAD_CONTEXT crashes on first render. Fix: defer OffScreenRenderEnv / MT1 construction to _ensure_env(), called on first reset() or step() inside the worker subprocess. Each worker creates its own clean context after fork(). Also fixes lerobot_eval.py:170 (add_envs_task TODO): replace with env.call("task") which works with both SyncVectorEnv and AsyncVectorEnv. AsyncVectorEnv is now the default for n_envs > 1; auto-downgraded to SyncVectorEnv when n_envs=1 (no benefit, less overhead). Expected speedup: ~15-20x for LIBERO Spatial with batch_size=50. * fix: close envs between tasks to prevent worker process accumulation eval_policy_all never closed environments after each task completed, causing AsyncVectorEnv worker processes to accumulate (N_tasks × n_envs). This led to OOM, BrokenPipeError and EOFError on multi-task benchmarks. Also fixes: - AsyncVectorEnv compat in envs/utils.py (use get_attr/call instead of .envs) - Tuple task handling in tokenizer_processor and lerobot_eval - _LazyAsyncVectorEnv for deferred worker spawning in LIBERO * fix(eval): use task_description instead of task for language conditioning env.call("task") returns the LIBERO task name with underscores (e.g. "pick_up_the_black_bowl_...") instead of the natural language description ("pick up the black bowl ..."). The VLM tokenizes these completely differently, causing 0.0 reward across all episodes. 
* docs: update adding_benchmarks for async env changes - Replace add_envs_task reference with env.call("task_description") - Update use_async_envs default to True - Add note about lazy GPU init for AsyncVectorEnv compatibility * feat(eval): batch_size=auto + faster env loading - batch_size=0 (default) auto-tunes based on CPU cores, capped by n_episodes and 64. Removes the need for users to guess the right value. The old batch_size > n_episodes error is replaced by silently clamping to n_episodes. - _LazyAsyncVectorEnv accepts pre-computed spaces so only one temp env is created per suite (not per task). For libero_spatial (10 tasks) this avoids 9 redundant LiberoEnv instantiations during env setup. * docs: add evaluation guide and update benchmarks doc - New docs/source/evaluation.mdx covering lerobot-eval usage, batch_size auto-tuning, AsyncVectorEnv performance, tuning tips, output format, multi-task evaluation, and programmatic usage. - Add evaluation page to _toctree.yml under Benchmarks section. - Update adding_benchmarks.mdx to reference batch_size auto default and link to the evaluation guide. * docs(evaluation): remove benchmark table, rename section header * perf(eval): shared memory, observation passthrough, task prefetch - AsyncVectorEnv now uses shared_memory=True for zero-copy observation transfer - LiberoEnvConfig.gym_kwargs passes observation_height/width to the env - eval_policy_all prefetches next task's workers while current task runs * style: ruff format * chore: revert env_processor.mdx changes (not part of this PR) * ci(benchmarks): add isolated integration tests for libero and metaworld Each benchmark gets its own Docker image (lerobot[libero] / lerobot[metaworld] only) so incompatible dep trees cannot collide. A 1-episode smoke eval runs per benchmark on GPU runners. 
* ci(benchmarks): pin action hashes and use uv sync --locked * ci(benchmarks): trigger only on envs/ or lerobot_eval.py changes * fix(ci): set LIBERO_DATA_FOLDER to bypass interactive stdin prompt libero/__init__.py calls input() to ask about a custom dataset path, which raises EOFError when stdin is closed inside Docker. Setting LIBERO_DATA_FOLDER skips the prompt entirely. * docs(benchmarks): add CI smoke test step to adding_benchmarks guide * fix(ci): pre-create libero config in Dockerfile to bypass stdin prompt libero/__init__.py calls input() when ~/.libero/config.yaml is missing. We write the config at image build time (without importing libero) so the prompt never fires at runtime. Also trigger CI on pyproject.toml changes. * fix(ci): use shell to create libero config instead of multiline python -c The multiline RUN python -c "..." was being parsed as Dockerfile instructions. Use printf to write ~/.libero/config.yaml directly. * fix(ci): point libero config to bundled package init_files The config was pointing to /tmp/libero_init which doesn't exist. Use importlib.util.find_spec to locate the hf-libero package directory and write paths to the actual bundled bddl_files/init_files/assets. * fix(ci): add smolvla extra to benchmark Dockerfiles num2words (required by SmolVLM processor) is declared in lerobot[smolvla], not lerobot[libero/metaworld]. Install both extras together. * fix(eval): render_frame covers _LazyAsyncVectorEnv isinstance(env, AsyncVectorEnv) silently skipped _LazyAsyncVectorEnv, causing video rendering to produce no frames on the default async path. Switch to hasattr(env, "call") so any async-compatible env (including _LazyAsyncVectorEnv) hits the call("render") branch. * refactor(envs): remove unused _get_sub_env_attr helper _get_sub_env_attr was defined but never called anywhere in the codebase. _sub_env_has_attr (its sibling) is kept — it is actively used in utils.py. 
* chore: apply prettier formatting to docs * docs(env_processor): remove deprecated add_envs_task from pipeline example add_envs_task is replaced by env.call("task_description") in this PR. Remove it from the pipeline walkthrough and renumber the steps (8→7). * refactor(envs): remove __del__ from _LazyAsyncVectorEnv __del__ is unreliable as a cleanup mechanism. close() is already called explicitly in the eval loop's finally block, so the finalizer is redundant. * fix(eval): prefetch next task's workers after close to avoid GPU memory overlap Previously, next task's AsyncVectorEnv workers were spawned while the current task was still running, causing both tasks' GPU contexts to coexist. Moving the prefetch start into the finally block (after env.close()) ensures workers for task N+1 only spin up once task N has released GPU memory. * refactor(envs): move _LazyAsyncVectorEnv to utils and apply to metaworld _LazyAsyncVectorEnv lived in libero.py but metaworld had the same OOM problem: all tasks' AsyncVectorEnv workers were spawned eagerly, wasting GPU memory for tasks not yet running. Move the class to envs/utils.py so both environments share it, then apply the same is_async + lazy wrapping pattern in create_metaworld_envs. * chore: remove out-of-scope benchmark/CI/docs files from PR Benchmark CI workflow, Dockerfiles, benchmark docs, evaluation smoke-test doc, and dispatch tests belong in a separate PR. Scope this PR to the async env init changes only. * chore: restore adding_benchmarks + test_dispatch, drop env_processor changes - Restore docs/source/adding_benchmarks.mdx (belongs in this PR) - Restore tests/envs/test_dispatch.py (belongs in this PR) - Revert docs/source/env_processor.mdx to main (out of scope for this PR) * docs(adding_benchmarks): remove CI smoke test step (coming in separate PR) Step 7 (Dockerfile + benchmark_tests.yml CI job) and its table rows are out of scope for this PR. The CI infrastructure will be added on top in a follow-up PR. 
* refactor(envs): remove unused add_envs_task Replaced by env.call("task_description") in lerobot_eval.py. No callers remain in the codebase. * style: fix prettier formatting in env_processor.mdx * fix(ci): use root container chmod to fix PermissionError on artifact dirs Running chmod on the host doesn't propagate into Docker due to UID/SELinux mismatch. Instead, spin up the image as root to mkdir+chmod from inside the container before the eval run mounts the same path. * fix(ci): re-chmod artifacts after eval to fix unreadable files Files created by user_lerobot inside the eval container inherit a restrictive umask, making them unreadable by the runner after the container exits. Add a post-eval 'docker run --user root' chmod step so upload-artifact can find the video files. * feat(ci): add monthly schedule trigger for benchmark tests Runs on the 1st of every month at 02:00 UTC in addition to the existing push/PR and manual dispatch triggers. * fix(ci): change benchmark schedule from monthly to weekly (every Monday) * fix(ci): use docker cp instead of bind mounts for artifacts Bind mounts on these runners don't surface container-written files on the host path (likely DinD/socket-mount setup). Switch to named containers + docker cp, which copies directly through the daemon and lands files in the runner's accessible filesystem. * fix(ci): write eval output to /tmp inside container user_lerobot cannot create /artifacts at the container root. Use /tmp/eval-artifacts (always writable) then docker cp it out. * feat(ci): add parse_eval_metrics step to benchmark workflow Adds scripts/ci/parse_eval_metrics.py and wires it into both Libero and MetaWorld jobs so the dashboard can read pc_success, avg_sum_reward and eval_s from the metrics artifact instead of relying on GitHub step timing. 
* feat(ci): add Libero train+eval smoke test (1 step, eval_freq=1) Runs accelerate launch --num_processes=1 lerobot-train with: - steps=1, batch_size=1, dataset.episodes=[0] (episode 0 only) - eval_freq=1 so the training loop triggers eval after step 1 - eval.n_episodes=1, eval.use_async_envs=false Tests the full train→eval-within-training pipeline in the existing libero-benchmark-libero:ci image (no extra Docker build cost). Uploads eval video from /tmp/train-smoke/eval/ as libero-train-smoke-video. * feat(ci): extract task descriptions and embed in metrics artifact - Add scripts/ci/extract_task_descriptions.py: runs inside the benchmark Docker container (LIBERO/MetaWorld installed) after lerobot-eval and writes task_descriptions.json mapping task keys to NL instructions. LIBERO: uses libero.libero.benchmark to get suite.get_task(i).language. MetaWorld: formats task name as human-readable label. - Call extraction at the end of each eval bash-c (|| true so never fatal). - parse_eval_metrics.py reads task_descriptions.json and includes it in metrics.json so the health dashboard Space can label videos by task. * fix(ci): call extract_task_descriptions.py after eval in benchmark jobs The task descriptions were never populated in metrics.json because extract_task_descriptions.py was never invoked. The script exists and parse_eval_metrics.py already looks for its output — the call was simply missing from the workflow. Appends the extraction step to the existing bash -c block (runs inside the container where libero/metaworld is installed) so task_descriptions.json is written to the eval-artifacts dir before docker cp copies it out. * fix(test): use SyncVectorEnv in test_base_create_envs AsyncVectorEnv spawns new subprocesses that do not inherit the in-process gym registration created by the test. Pass use_async_envs=False since this test validates dispatch logic, not async parallelism. 
* perf(ci): split Dockerfile dep-install from source-copy for faster rebuilds The dep-install layer (uv sync) now only depends on pyproject.toml, uv.lock, and a minimal package stub — not the full src/ tree. Source code changes only rebuild the final COPY layer (seconds, not minutes). Also switch from type=local cache (lost on ephemeral runners) to type=gha (persisted in GitHub Actions cache, shared across all runs). Before: every src/ change → full uv sync rebuild (~8-10 min) After: src/-only change → cached dep layer, ~30s source copy * fix(ci): add Docker Hub login to avoid pull rate limits Anonymous pulls from Docker Hub are rate-limited to 100/6h, which fails when multiple benchmark jobs pull nvidia/cuda in parallel. Add docker/login-action step (conditional on DOCKERHUB_USERNAME var) to authenticate and get 200 pulls/6h. Setup: add DOCKERHUB_USERNAME as a repository variable and DOCKERHUB_TOKEN as a repository secret in GitHub Settings. * fix(ci): use existing DOCKERHUB_LEROBOT_USERNAME/PASSWORD secrets * fix(ci): use env context for secrets check in step if-condition Step-level 'if' cannot reference 'secrets' directly. Expose the secret via an env var and check that instead. * fix(ci): simplify Docker Hub login to match existing workflows Drop the conditional guard — other workflows (docker_publish, full_tests) call docker/login-action unconditionally. * fix(ci): switch Docker cache from type=gha to type=registry GHA cache is capped at 10GB per repo — a single CUDA + PyTorch + benchmark image is ~8GB so the cache evicts before it's reused. Switch to type=registry which pushes cache layers to Docker Hub (huggingface/lerobot-benchmark-cache:{libero,metaworld}). No size limit, layers persist until explicitly deleted, and shared across all runners and branches. * fix(ci): use GHCR for Docker layer cache (Docker Hub push denied) Docker Hub CI token can't push to new repos. GHCR works out of the box — GITHUB_TOKEN has automatic packages:write for the repo owner. 
- Add GHCR login step (github.actor + GITHUB_TOKEN) - Switch cache refs to ghcr.io/huggingface/lerobot/cache-benchmark - Add packages:write at job level (not workflow, per zizmor) - Keep Docker Hub login for pulling nvidia/cuda base image * fix(ci): remove GHCR cache (org blocks GITHUB_TOKEN package writes) The huggingface org restricts GHCR package creation via GITHUB_TOKEN, causing 403 on cache export. Remove all registry caching and GHCR login. The Dockerfile layer split (deps vs source) still helps when the runner has a warm Docker daemon. Also fix the metaworld job which had a stale conditional Docker Hub login and was missing the GHCR login entirely. * fix(ci): address PR review feedback for benchmark smoke tests Security: - Remove "Login to Hugging Face" step — it was a no-op (ephemeral --rm container) that exposed the HF token via CLI argument in docker inspect / /proc/<pid>/cmdline. The eval step already re-authenticates via env var. Functional: - Remove feat/benchmark-ci from push trigger branches (won't exist post-merge). Dockerfiles: - Pin uv to 0.8.0 (was unpinned, fetching whatever latest ships). - Add comment explaining the chmod +x ptxas workaround (Triton packaging bug — ships ptxas without execute bit). Scripts: - parse_eval_metrics.py: add note that it runs on bare host and must stay stdlib-only. - parse_eval_metrics.py: add NaN guard for avg_sum_reward and eval_s (was only guarding pc_success). ci(benchmarks): trigger on PRs targeting feat/benchmark-ci Benchmark PRs (robomme, libero-plus, robocerebra, robotwin) target feat/benchmark-ci, not main. Without this, the workflow never runs on those PRs. * fix(docker): use uv pip install instead of uv sync (cross-extra conflict) uv sync --locked validates the entire lockfile across all extras. Since robomme depends on mani-skill which pins numpy<2.0, and the base project requires numpy>=2.0, the full lockfile is unsatisfiable. 
Switch to uv pip install -e ".[libero,smolvla]" which only resolves the requested extras for the current Python version and platform, avoiding the cross-extra numpy conflict entirely. * chore: revert configs.py, factory.py, test_dispatch.py to main These use_async_envs default changes belong to the async-vector-env PR (#3274), not this CI PR. Restore to match origin/main. * fix: address PR review feedback — broken link, NaN guard, zizmor tags, fork skip - Remove broken Triton issue link from Dockerfile.benchmark.libero - Add module-level _safe_int helper to guard n_episodes against NaN - Move _safe_float to module level alongside _safe_int - Add # zizmor: ignore[unpinned-uses] to all upload-artifact@v4 steps - Add if: env.HF_USER_TOKEN != '' to Libero smoke eval for fork PRs * fix(ci): add fork PR guard to train-smoke and MetaWorld eval steps Add if: env.HF_USER_TOKEN != '' to the Libero train+eval smoke and MetaWorld smoke eval steps so fork PRs without the secret skip gracefully. * fix(ci): remove feat/benchmark-ci from PR trigger branches * refactor(docker): rebase benchmark images on nightly lerobot-gpu Use huggingface/lerobot-gpu:latest as base for both libero and metaworld benchmark Dockerfiles instead of building from nvidia/cuda scratch. The nightly image already has all extras installed via uv sync --extra all, so we only need to overlay the PR source code (and libero asset setup). This eliminates duplicated system dep installation, Python setup, uv venv creation, and the Triton ptxas workaround from both files. --------- Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-04-13 21:24:01 +02:00
Steven Palma	df0763a2bc	feat(dependencies): minimal default tag install (#3362)	2026-04-12 20:03:04 +02:00
Steven Palma	6799da35eb	chore(ci): proper claude args workflow (#3338)	2026-04-09 16:20:01 +02:00
Steven Palma	3e34d550c8	fix(ci): pin claude-code-action to v1.0.88 (#3336)	2026-04-09 14:16:54 +02:00
hf-security-analysis[bot]	800449aa53	chore(security): update claude.yml (#3333) * fix(security): remediate workflow vulnerability in .github/workflows/claude.yml * fix(security): right AUTHOR_ASSOCIATION fetching --------- Co-authored-by: hf-security-analysis[bot] <265538906+hf-security-analysis[bot]@users.noreply.github.com> Co-authored-by: Steven Palma <steven.palma@huggingface.co>	2026-04-09 13:02:05 +02:00
Steven Palma	8645d71e56	feat(ci): add agent assistance workflow (#3332) Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-04-09 12:06:25 +02:00
Pauline Bailly-Masson	1396b9fab7	🔒 Pin GitHub Actions to commit SHAs (#3265) * 🔒 pin quality.yml actions to commit SHAs * 🔒 pin fast_tests.yml actions to commit SHAs * 🔒 pin full_tests.yml actions to commit SHAs * 🔒 pin documentation.yml actions to commit SHAs * 🔒 pin documentation-upload-pr.yml actions to commit SHAs * 🔒 pin release.yml actions to commit SHAs * 🔒 pin security.yml actions to commit SHAs --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-04-07 16:11:14 +02:00
Steven Palma	913041e753	fix(ci): latest deps tests permissions (#3296) * fix(ci): latest deps tests permissions * fix(ci): force push dep update branch * fix(ci): change secret for permissions & Ci trigger	2026-04-06 14:56:05 +02:00
Steven Palma	50a1e67e94	feat(ci): add `uv.lock` (#3292) * feat(ci): add uv.lock * feat(ci): use uv.lock in CI PR testing * chore(ci): rename nightly to docker publish and test * feat(ci): automated update of uv.lock + remove unbound check + docker images now use uv.lock * fix(ci): add --force-with-lease + set -e for silent errors	2026-04-06 12:23:37 +02:00
Steven Palma	85de893fa7	fix(ci): skip HF log in (and tests) in forks and community PRs (#3097) * fix(ci): skip HF log in (and tests) in forks and community PRs * chore(test): remove comment about test meant to be only run locally * fix(tests): no hf log in decorator for xvla * fix(test): no decorator in yield	2026-03-06 16:33:43 +01:00
Steven Palma	a4c66e530b	chore(docs): remove pi installation note (#3095)	2026-03-06 15:52:54 +01:00
Steven Palma	e489ba24fc	feat(dependencies): require Python 3.12+ as minimum version (#3023) * feat(dependencies): upgrade to python3.12 * fix(test): processor regex message * fix(test): processor regex message * fix(dependencies): resolve all tags in python 3.12 * fix(dependencies): add more hints to faster resolve * chore(dependencies): remove cli tag huggingface-hub dep * refactor(policy): update eagle for python3.12 * chore(docs): update policy creation for python 3.12 * chore(test): skip failing tests in macos	2026-03-06 10:15:13 +01:00
Steven Palma	d324ffe810	fix(ci): test only multi-gpu tests in multi-gpu runner (#3092)	2026-03-05 19:53:40 +01:00
Steven Palma	3e45120272	fix(ci): log in HF for gated repo in nightly workflows (#3089) * fix(ci): log in HF for gated repo in nightly workflows * fix(ci): add env var * fix(ci): remove 10 min limit for multi-gpu nightly	2026-03-05 13:22:37 +01:00
Steven Palma	f0d2b37beb	chore(dependencies): bump transformers v5 (#2964) * chore(dependencies): upgrade transformers + huggingface-hub + peft + scipy * chore(dependencies): bump pi0 family to transformers v5 * chore(dependencies): bump wall x to transformers v5 * chore(dependencies): bump gr00t to transformers v5 * chore(style): fix pre-commit * fix(policy): xvla forced_bos_token missing * test(rl): skip ci tests for resnet10 * Fix: full pi models support for transformer v5 (#2967) * fix(pi): remove loss truncation * fix(pi): remove state padding before tokenization * fix(pi): fix image padding value * fix from_pretrain * add transformer v5 changes * remove reference * more fixes * make it work * add support for rest of pi family * add pifast work * more changes * more changes * more cleanup * fix torch params * dtype fix * torch compile * embed mismatch fix * revert groot * more nit fixes * remove unused classes * more fixes * revert * nit * torch dtype warning fix * put back dynamic renaming * add tie embedding --------- Co-authored-by: Yufei Sun <skieyfly@gmail.com> * chore: fix XVLA in transformers v5 (#3006) * test(policies): enable wall x CI testing * style(test): pre-commit check * style(test): pre-commit * fix wall x for transformer v5 (#3008) * tv5 fix * various wall x fixes * Delete tests/policies/pi0_pi05/print_pi05_output_logits.py Signed-off-by: Jade Choghari <chogharijade@gmail.com> * sync modeling_florence2.py with chore/bump_transformers_v5 * more * more fixes * more * remove comment * more --------- Signed-off-by: Jade Choghari <chogharijade@gmail.com> * chore(dependencies): adjust dependencies versioning after transformers v5 (#3034) * chore(dependencies): adjust dependencies versioning after transformers v5 * fix(policies): remove deprecated input_embeds * fix(policies): dict _tied_weights_keys * chore(dependencies): common qwen-vl-utils * chore(dependencies): bump transformers to 5.2 * Fix policy testing for tv5 (#3032) * fix ci logger * other fix 
* fix mypy * change logits to torch2.10 * skip wallx * remove logging --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org> * feat(ci): log into HF to unblock some CI tests (#3007) * feat(ci): log into HF to unblock some CI tests * chore(ci): change hf call + secret name * fix(ci): temp fix for pi0 rtc test * test(policies): require_cuda for unblocked tests * test(policies): require_cuda wall_x * fix(tests): require_cuda outermost for pi0 * fix(test): return instead of yield --------- Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * style(test): fix pre-commit * chore(deps): upgrade transformers (#3050) * chore(test): use lerobot model * fix(policies): change default action tokenizer for wall x * sample on cpu * Revert "Merge branch 'chore/bump_transformers_v5' of https://github.com/huggingface/lerobot into chore/bump_transformers_v5" This reverts commit `d9b76755f7`, reversing changes made to `89359cb0b6`. * Reapply "Merge branch 'chore/bump_transformers_v5' of https://github.com/huggingface/lerobot into chore/bump_transformers_v5" This reverts commit `c9914db78b`. --------- Signed-off-by: Jade Choghari <chogharijade@gmail.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Jade Choghari <chogharijade@gmail.com> Co-authored-by: Yufei Sun <skieyfly@gmail.com> Co-authored-by: Pepijn <pepijn@huggingface.co>	2026-03-05 09:25:26 +01:00
Steven Palma	5095ab0845	fix(ci): permissions triton (#3011)	2026-02-24 19:09:34 +01:00
whats2000	778db19a17	[Bug Fix] fix(ci): prevent runner group error on fork pushes (#2911) * fix(ci): prevent runner group error on fork pushes Add repository check to unbound_deps_tests workflow to ensure aws-general-8-plus runner group is only used on main repository, preventing 'Required runner group not found' errors on forks. * fix(ci): use gating job to prevent runner allocation on forks The previous approach failed because GitHub evaluates runs-on before if conditions. Now using a check-repo job that runs on ubuntu-latest first, and all jobs with special runners depend on it and check its output before being scheduled. * fix(ci): add gating job to full_tests to prevent runner allocation on forks Apply the same gating pattern used in unbound_deps_tests to full_tests.yml to prevent GitHub from trying to allocate custom runners when workflows run on forks. The check-repo job runs first on ubuntu-latest and all jobs with custom runners depend on it and check its output. * fix(ci): add repository check to unbound_deps_tests workflow Add 'if: github.repository == huggingface/lerobot' check to build-and-push-docker job to prevent runner group access errors on forks, matching the pattern used in nightly.yml * fix(ci): add repository check to full_tests workflow Add 'if: github.repository == huggingface/lerobot' check to build-and-push-docker and gpu-tests jobs to prevent runner group access errors on forks * refactor(ci): remove redundant check from gpu-tests job gpu-tests depends on build-and-push-docker via needs, so it will automatically skip when the parent job is skipped * refactor(ci): remove unnecessary fork check from full-tests job full-tests runs on ubuntu-latest which is available to all forks, no need to restrict it --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-02-10 15:21:40 +01:00
Steven Palma	6d34a986de	feat(ci): trigger manually documentation release version (#2841)	2026-01-22 12:26:17 +01:00
Steven Palma	112b2d173a	chore(ci): deactivates cron job on unbound dep tests (#2810)	2026-01-16 14:39:00 +01:00
Steven Palma	a17df523e0	chore(ci): merge annoying section in PR template (#2802) * chore(ci): merge annoying section in PR template * pre-commit	2026-01-14 17:17:56 +01:00
Pauline Bailly-Masson	a9d81e7f67	refactor(ci): Docker Hub image env (#2755) * Refactor Docker Hub image env Updated environment variable usage for Docker Hub credentials and corrected image tag extraction. Signed-off-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com> * same Signed-off-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * chore(ci): remove duplicated IMAGE_FULL variable definition --------- Signed-off-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-01-07 00:21:03 +01:00
Salman Chishti	a06f4b9140	Upgrade GitHub Actions for Node 24 compatibility (#2691)	2025-12-24 10:42:29 +01:00
Steven Palma	20c22a2799	chore(ci): make keyword matching more conservative (#2711)	2025-12-24 02:03:12 +01:00
Steven Palma	2f238fce15	feat(ci): adds release versioning to docs (#2709) * feat(ci): adds release versioning to docs * chore(ci): remove TODO	2025-12-24 00:40:56 +01:00
Pepijn	ff271e8b51	pi fixes for dependencies (#2706) * pi fixes for dependencies * add walls sarm conflict * also add conflicts for pi * fix(ci): use --extra all instead of --all-extras + --no-extra --------- Co-authored-by: Steven Palma <steven.palma@huggingface.co>	2025-12-23 23:58:34 +01:00
Tong Wu	17c5a0774f	feat: support wallx model (#2593) * support wallx * fix bugs in flow * incorporate wallx model into lerobot * update the policy methods * reduce to least config and params & pass lerobot basic test * fixed dtype bugs * add wallx dependencies * update * remove flash-attn requirement && fix bug in inference and fast mode * fix bug for inference * add some small modifications * fix pre-commit errors * remove lerobot[wallx] * fix ci * fix precommit issues * fix: exclude wallx extra properly in CI workflows * fix: add uv conflicts for wallx transformers version * fix: peft test import * pre-commit * only export WallXConfig from wall_x package to avoid peft import in CI * remove torch dep * precommit * add import --------- Co-authored-by: vincentchen <chenlufang@x2robot.com> Co-authored-by: Geoffrey19 <sympathischmann35@gmail.com> Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Pepijn <pepijn@huggingface.co>	2025-12-22 10:12:39 +01:00
Steven Palma	4a151a9682	chore(ci): minor improvement bug-report template & pr auto label (#2676) * chore(ci): minor improvement bug-report template * chore(ci): change triggers for PR auto label	2025-12-18 00:23:23 +01:00
Steven Palma	8667b9ef08	chore(ci): minor improvements auto labeling (#2675)	2025-12-17 22:54:47 +01:00
Steven Palma	86eee5c1e2	fix(ci): close bracket pattern (#2674)	2025-12-17 22:40:33 +01:00
Steven Palma	469b855e42	fix(ci): better heuristic + issue type template fix (#2672) * fix(ci): better heuristic + issue type template fix * chore(ci): remove keywords in performance tag	2025-12-17 22:31:22 +01:00
Steven Palma	292333cafc	chore(ci): update issue template (#2666)	2025-12-17 18:02:20 +01:00
Steven Palma	f0c98e23f1	feat(ci): simple automatic labelling (#2667) * ci: add pr labeler * ci: add issue labeler * ci: minor fixes for labelers * fix(ci): add explicit path for pr labeler	2025-12-17 17:52:45 +01:00
Steven Palma	7621af5acd	chore(ci): update PR template (#2665) * chore: update code of conduct to transformers one * chore: update PR template	2025-12-17 17:10:04 +01:00
Steven Palma	f9cb5e659c	chore(ci): skip workflows if not lerobot repository (#2601) Co-authored-by: Alex Tyshka <atyshka15@gmail.com>	2025-12-08 12:44:36 +01:00
Steven Palma	af4766b602	fix(ci): move hub artifacts to `/mnt` to avoid runners' `No space left on device` (#2564 ) * fix(ci): move hub & lerobot artefacts to /mnt to avoid No space left on device in the future * chore(ci): remove dh -h steps	2025-12-01 20:14:51 +01:00
Steven Palma	a5b29d4301	chore(installation): remove libero installation patch (#2416 ) * chore(installation): remove libero installation patch * fix(ci): exclude groot for unbound deps test	2025-11-10 11:51:52 +01:00
Steven Palma	2ea3043b1b	patch(ci): remove pi & libero tags from PyPi release temporary due to their reliance on git dependencies (#2300 )	2025-10-23 19:37:11 +02:00
Steven Palma	be46bdea8f	feat(policies): add Nvidia Gr00t N1.5 model (#2292 ) * feat(policies): add Nvidia Gr00t N1.5 model Co-authored-by: lbenhorin <lbenhorin@nvidia.com> Co-authored-by: Aravindh <aravindhs@nvidia.com> Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com> Co-authored-by: youliangt <youliangt@nvidia.com> Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co> Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Jade Choghari <chogharijade@gmail.com> * fix(docs): add groot to index Co-authored-by: sachdevkartik <sachdev.kartik25@gmail.com> --------- Co-authored-by: lbenhorin <lbenhorin@nvidia.com> Co-authored-by: Aravindh <aravindhs@nvidia.com> Co-authored-by: nv-sachdevkartik <ksachdev@nvidia.com> Co-authored-by: youliangt <youliangt@nvidia.com> Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co> Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Jade Choghari <chogharijade@gmail.com> Co-authored-by: sachdevkartik <sachdev.kartik25@gmail.com>	2025-10-23 13:50:30 +02:00
Steven Palma	503fc4e9f4	fix(ci): exclude motor tests in multi-gpu setup (#2276 )	2025-10-21 12:14:26 +02:00
Jade Choghari	5f6f476f32	fix: support cuda:0, cuda:1 in string selection (#2256 ) * fix * update func 2 * update nightly * fix quality * ignore test_dynamixel	2025-10-20 23:29:05 +02:00
pre-commit-ci[bot]	7aedbbf81a	[pre-commit.ci] pre-commit autoupdate (#1563 ) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v5.0.0...v6.0.0) - [github.com/astral-sh/ruff-pre-commit: v0.12.4 → v0.13.0](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.4...v0.13.0) - [github.com/adhtruong/mirrors-typos: v1.34.0 → v1.36.2](https://github.com/adhtruong/mirrors-typos/compare/v1.34.0...v1.36.2) - [github.com/gitleaks/gitleaks: v8.27.2 → v8.28.0](https://github.com/gitleaks/gitleaks/compare/v8.27.2...v8.28.0) - [github.com/woodruffw/zizmor-pre-commit: v1.11.0 → v1.13.0](https://github.com/woodruffw/zizmor-pre-commit/compare/v1.11.0...v1.13.0) * chore: update pre-commit versions --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2025-10-18 01:20:45 +02:00
dependabot[bot]	44bf283701	chore(deps): bump pypa/gh-action-pypi-publish (#1870 ) Bumps the github_actions group with 1 update in the /.github/workflows directory: [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish). Updates `pypa/gh-action-pypi-publish` from 1.12.4 to 1.13.0 - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](https://github.com/pypa/gh-action-pypi-publish/compare/v1.12.4...v1.13.0) --- updated-dependencies: - dependency-name: pypa/gh-action-pypi-publish dependency-version: 1.13.0 dependency-type: direct:production dependency-group: github_actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-10-17 15:33:37 +02:00
Steven Palma	8bd0aec618	chore(ci): relax stale bot for PRs (#2222 )	2025-10-16 17:44:50 +02:00
Pepijn	e82e7a02e9	feat(train): add accelerate for multi gpu training (#2154 ) * Enhance training and logging functionality with accelerator support - Added support for multi-GPU training by introducing an `accelerator` parameter in training functions. - Updated `update_policy` to handle gradient updates based on the presence of an accelerator. - Modified logging to prevent duplicate messages in non-main processes. - Enhanced `set_seed` and `get_safe_torch_device` functions to accommodate accelerator usage. - Updated `MetricsTracker` to account for the number of processes when calculating metrics. - Introduced a new feature in `pyproject.toml` for the `accelerate` library dependency. * Initialize logging in training script for both main and non-main processes - Added `init_logging` calls to ensure proper logging setup when using the accelerator and in standard training mode. - This change enhances the clarity and consistency of logging during training sessions. * add docs and only push model once * Place logging under accelerate and update docs * fix pre commit * only log in main process * main logging * try with local rank * add tests * change runner * fix test * dont push to hub in multi gpu tests * pre download dataset in tests * small fixes * fix path optimizer state * update docs, and small improvements in train * simplify accelerate main process detection * small improvements in train * fix OOM bug * change accelerate detection * add some debugging * always use accelerate * cleanup update method * cleanup * fix bug * scale lr decay if we reduce steps * cleanup logging * fix formatting * encorperate feedback pr * add min memory to cpu tests * use accelerate to determin logging * fix precommit and fix tests * chore: minor details --------- Co-authored-by: AdilZouitine <adilzouitinegm@gmail.com> Co-authored-by: Steven Palma <steven.palma@huggingface.co>	2025-10-16 17:41:55 +02:00
Steven Palma	b74e2a6113	feat(deps): ceil dependency versions (#2091 )	2025-10-05 17:53:43 +02:00
Steven Palma	cdd2bf1c4e	chore(ci): update stale message (#2027 )	2025-09-24 15:46:44 +02:00
Steven Palma	1666097fd3	refactor(scripts): update system info script (#2005 ) * refactor(scripts): update system info script * chore(scripts): rename info script * feat(scripts): add entrypoint for info * chore(ci): update issue report template	2025-09-23 17:55:53 +02:00

173 Commits