lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-08 02:22:02 +00:00

Author	SHA1	Message	Date
Khalil Meftah	577f14337a	refactor(tests): remove grpc import checks from test files for cleaner code	2026-04-27 16:20:13 +02:00
Khalil Meftah	47be90f040	refactor(rl): make RLAlgorithmConfig an abstract base class for better extensibility	2026-04-27 15:59:59 +02:00
Khalil Meftah	47dd65347e	refactor(rl): add type property to RLAlgorithmConfig for better clarity	2026-04-27 15:57:24 +02:00
Khalil Meftah	fd5a788120	refactor(rl): add make_algorithm_config function for RLAlgorithmConfig instantiation	2026-04-27 15:55:16 +02:00
Khalil Meftah	9ce9e01469	refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable	2026-04-27 13:39:03 +02:00
Khalil Meftah	21c16a27f0	Revert "perf(observation_processor): add CUDA support for image processing" This reverts commit `38b88c414c`.	2026-04-27 11:52:19 +02:00
Khalil Meftah	b3164543f4	fix(rl): enhance intervention handling in actor and learner (cherry picked from commit `ef8bfffbd7`)	2026-04-27 11:35:21 +02:00
Khalil Meftah	f3993cbbb1	fix(rl): improve action processing for discrete and continuous actions (cherry picked from commit `f887ab3f6a`)	2026-04-27 11:35:20 +02:00
Khalil Meftah	c278cfa026	fix(rl): postprocess action in actor (cherry picked from commit `c2556439e5`)	2026-04-27 11:35:20 +02:00
Khalil Meftah	77d18659b1	fix(rl): mirror gym_manipulator in actor (cherry picked from commit `d2a046dfc5`)	2026-04-27 11:35:19 +02:00
Khalil Meftah	6347edefb1	fix(rl): merge environment and action-processor info in transition processing (cherry picked from commit `30e1886b64`)	2026-04-27 11:35:18 +02:00
Khalil Meftah	eda47eca18	fix(rl): update neutral gripper action (cherry picked from commit `9c9064e5be`)	2026-04-27 11:35:18 +02:00
Khalil Meftah	a64e6f5070	fix(rl): clarify discrete gripper action mapping in GripperVelocityToJoint for SO100 (cherry picked from commit `494f469a2b`)	2026-04-27 11:35:17 +02:00
Khalil Meftah	3def86c2c3	fix(rl): add time limit processor to environment pipeline (cherry picked from commit `cd105f65cb`)	2026-04-27 11:35:17 +02:00
Khalil Meftah	356a64d8c4	fix(rl): correctly wire HIL-SERL gripper penalty through processor pipeline (cherry picked from commit `9c2af818ff`)	2026-04-27 11:35:16 +02:00
Khalil Meftah	38b88c414c	perf(observation_processor): add CUDA support for image processing	2026-04-24 13:36:26 +02:00
Khalil Meftah	1ed32210c7	refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic	2026-04-24 13:18:33 +02:00
Khalil Meftah	06255996ea	refactor(policies): rename policies/sac → policies/gaussian_actor	2026-04-23 19:13:18 +02:00
Khalil Meftah	8065bf15c7	fix test for flat dict structure	2026-04-21 12:06:25 +02:00
Khalil Meftah	8191d2d87f	remove unused type alias	2026-04-21 11:56:27 +02:00
Khalil Meftah	6b93f31238	fix docstring	2026-04-21 11:55:17 +02:00
Khalil Meftah	a4c0c9e358	update losses names in tests	2026-04-21 11:53:32 +02:00
Khalil Meftah	a84b0e8132	refactor(sac): decouple algorithm hyperparameters from policy config	2026-04-18 16:40:56 +02:00
Khalil Meftah	2487a6ee6d	perf(rl): use async iterators in OnlineOfflineMixer.get_iterator	2026-04-18 16:02:28 +02:00
Khalil Meftah	72fb0faf62	refactor(sac): simplify optimizer return structure	2026-04-18 15:45:22 +02:00
Khalil Meftah	2c97cb23c8	refactor(rl): update shutdown_event type hints from 'any' to 'Any' for consistency and clarity	2026-04-18 15:39:32 +02:00
Khalil Meftah	87d4c9879c	fix(sac): clarify torch.compile status	2026-04-18 15:19:35 +02:00
Khalil Meftah	e4c1a8472d	fix(config): update vision encoder model name to lerobot/resnet10	2026-04-18 15:15:59 +02:00
Khalil Meftah	d7e25c8326	refactor(rl): expose public API in rl/__init__ and use relative imports in sub-packages	2026-04-16 15:46:34 +02:00
Khalil Meftah	a5ad273b62	fix(tests): skip tests that require grpc if not available	2026-04-15 16:30:20 +02:00
Khalil Meftah	23bece96a4	fix(tests): ensure tensor stats comparison accounts for reshaping in normalization tests	2026-04-15 16:12:08 +02:00
Khalil Meftah	7a1c9e74c3	fix: skip tests that require grpc if not available	2026-04-15 15:18:04 +02:00
Khalil Meftah	c88cf979f1	fix: use string key for IS_INTERVENTION in complementary_info to avoid torch.load serialization error	2026-04-15 11:49:38 +02:00
Khalil Meftah	79a9ebdaa6	fix: add try/finally to control_loop to ensure image writer cleanup on exit	2026-04-14 17:54:35 +02:00
Khalil Meftah	da6e36fd03	Merge remote-tracking branch 'origin/main' into user/khalil-meftah/2026-02-16-rl-stack-refactor	2026-04-14 17:14:56 +02:00
Khalil Meftah	64dc08cb7b	fix: include IS_INTERVENTION in complementary_info sent to learner for offline replay buffer	2026-04-14 16:35:08 +02:00
Radu	1ede000bdd	fix(rl): swap dict merge order to preserve teleop intervention flag (#3273) Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-04-14 16:20:54 +02:00
Khalil Meftah	d57c58a532	fix: add thread synchronization to ReplayBuffer to prevent race condition between add() and sample() (#3372)	2026-04-14 13:16:45 +02:00
Matteo Tiezzi	b3e76a92f2	fix(groot): compatibility fixes for gr00t in v0.5 (#3182) * fix(groot): apply groot 0.5 fixes * fix(groot): correct indentation and add tile count in Eagle25VL processor * Fixed lint7/style	2026-04-14 13:09:18 +02:00
Khalil Meftah	f5c801fd34	fix(test): add missing device placement in multi-task DiT tests (#3349)	2026-04-14 12:25:29 +02:00
Ethan Pronovost	cff4bcf4a0	Update reward classifier training config (#3147) Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-04-14 11:28:49 +02:00
Khalil Meftah	e6d282108d	Fix: add kwargs in reward classifier __init__()	2026-04-14 11:13:43 +02:00
Maxime Ellerbach	a656a982af	fix(feetech): motor position readings overflow (#3373)	2026-04-13 22:39:58 +02:00
Pepijn	187b2167ed	feat(ci): benchmark smoke tests with isolated Docker images (LIBERO + MetaWorld) (#3319 ) * docs(benchmarks): add benchmark integration guide and standardize benchmark docs Add a comprehensive guide for adding new benchmarks to LeRobot, and refactor the existing LIBERO and Meta-World docs to follow the new standardized template. * refactor(envs): move dispatch logic from factory into EnvConfig subclasses Replace hardcoded if/elif chains in factory.py with create_envs() and get_env_processors() methods on EnvConfig. New benchmarks now only need to register a config subclass — no factory.py edits required. Net -23 lines: factory.py shrinks from ~200 to ~70 lines of logic. * docs(benchmarks): clean up adding-benchmarks guide for clarity Rewrite for simpler language, better structure, and easier navigation. Move quick-reference table to the top, fold eval explanation into architecture section, condense the doc template to a bulleted outline. * fix link * fix task count * fix: enable SmolVLA eval on LIBERO with custom camera mappings - Thread camera_name_mapping from LiberoEnv config through to gym envs - Sync features_map with camera_name_mapping in LiberoEnv.__post_init__ - Fix render() to use first available camera instead of hardcoded "image" - Handle non-dict final_info in rollout by falling back to info["is_success"] - Add use_peft legacy field to SmolVLAConfig for checkpoint compat - Add defaults to GR00TN15Config init=False fields for transformers 5.3 * fix: use direct AutoresetMode import for gymnasium compat * fix: handle gymnasium < 1.0 without AutoresetMode * refactor: revert policy changes, keep env-only camera mapping fixes - Revert GR00T N1.5 default_factory/default changes (transformers compat) - Revert SmolVLA use_peft legacy field - Apply ruff formatting fixes - camera_name_mapping stays entirely in env/eval layer (no policy changes) * Update docs/source/env_processor.mdx Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co> 
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> * feat(envs): lazy env init + AsyncVectorEnv as default for n_envs > 1 LiberoEnv and MetaworldEnv previously allocated GPU resources (EGL context, OpenGL framebuffer) in __init__, before AsyncVectorEnv's fork(). Worker processes inherited stale GPU handles, causing EGL_BAD_CONTEXT crashes on first render. Fix: defer OffScreenRenderEnv / MT1 construction to _ensure_env(), called on first reset() or step() inside the worker subprocess. Each worker creates its own clean context after fork(). Also fixes lerobot_eval.py:170 (add_envs_task TODO): replace with env.call("task") which works with both SyncVectorEnv and AsyncVectorEnv. AsyncVectorEnv is now the default for n_envs > 1; auto-downgraded to SyncVectorEnv when n_envs=1 (no benefit, less overhead). Expected speedup: ~15-20x for LIBERO Spatial with batch_size=50. * fix: close envs between tasks to prevent worker process accumulation eval_policy_all never closed environments after each task completed, causing AsyncVectorEnv worker processes to accumulate (N_tasks × n_envs). This led to OOM, BrokenPipeError and EOFError on multi-task benchmarks. Also fixes: - AsyncVectorEnv compat in envs/utils.py (use get_attr/call instead of .envs) - Tuple task handling in tokenizer_processor and lerobot_eval - _LazyAsyncVectorEnv for deferred worker spawning in LIBERO * fix(eval): use task_description instead of task for language conditioning env.call("task") returns the LIBERO task name with underscores (e.g. "pick_up_the_black_bowl_...") instead of the natural language description ("pick up the black bowl ..."). The VLM tokenizes these completely differently, causing 0.0 reward across all episodes. 
* docs: update adding_benchmarks for async env changes - Replace add_envs_task reference with env.call("task_description") - Update use_async_envs default to True - Add note about lazy GPU init for AsyncVectorEnv compatibility * feat(eval): batch_size=auto + faster env loading - batch_size=0 (default) auto-tunes based on CPU cores, capped by n_episodes and 64. Removes the need for users to guess the right value. The old batch_size > n_episodes error is replaced by silently clamping to n_episodes. - _LazyAsyncVectorEnv accepts pre-computed spaces so only one temp env is created per suite (not per task). For libero_spatial (10 tasks) this avoids 9 redundant LiberoEnv instantiations during env setup. * docs: add evaluation guide and update benchmarks doc - New docs/source/evaluation.mdx covering lerobot-eval usage, batch_size auto-tuning, AsyncVectorEnv performance, tuning tips, output format, multi-task evaluation, and programmatic usage. - Add evaluation page to _toctree.yml under Benchmarks section. - Update adding_benchmarks.mdx to reference batch_size auto default and link to the evaluation guide. * docs(evaluation): remove benchmark table, rename section header * perf(eval): shared memory, observation passthrough, task prefetch - AsyncVectorEnv now uses shared_memory=True for zero-copy observation transfer - LiberoEnvConfig.gym_kwargs passes observation_height/width to the env - eval_policy_all prefetches next task's workers while current task runs * style: ruff format * chore: revert env_processor.mdx changes (not part of this PR) * ci(benchmarks): add isolated integration tests for libero and metaworld Each benchmark gets its own Docker image (lerobot[libero] / lerobot[metaworld] only) so incompatible dep trees cannot collide. A 1-episode smoke eval runs per benchmark on GPU runners. 
* ci(benchmarks): pin action hashes and use uv sync --locked * ci(benchmarks): trigger only on envs/ or lerobot_eval.py changes * fix(ci): set LIBERO_DATA_FOLDER to bypass interactive stdin prompt libero/__init__.py calls input() to ask about a custom dataset path, which raises EOFError when stdin is closed inside Docker. Setting LIBERO_DATA_FOLDER skips the prompt entirely. * docs(benchmarks): add CI smoke test step to adding_benchmarks guide * fix(ci): pre-create libero config in Dockerfile to bypass stdin prompt libero/__init__.py calls input() when ~/.libero/config.yaml is missing. We write the config at image build time (without importing libero) so the prompt never fires at runtime. Also trigger CI on pyproject.toml changes. * fix(ci): use shell to create libero config instead of multiline python -c The multiline RUN python -c "..." was being parsed as Dockerfile instructions. Use printf to write ~/.libero/config.yaml directly. * fix(ci): point libero config to bundled package init_files The config was pointing to /tmp/libero_init which doesn't exist. Use importlib.util.find_spec to locate the hf-libero package directory and write paths to the actual bundled bddl_files/init_files/assets. * fix(ci): add smolvla extra to benchmark Dockerfiles num2words (required by SmolVLM processor) is declared in lerobot[smolvla], not lerobot[libero/metaworld]. Install both extras together. * fix(eval): render_frame covers _LazyAsyncVectorEnv isinstance(env, AsyncVectorEnv) silently skipped _LazyAsyncVectorEnv, causing video rendering to produce no frames on the default async path. Switch to hasattr(env, "call") so any async-compatible env (including _LazyAsyncVectorEnv) hits the call("render") branch. * refactor(envs): remove unused _get_sub_env_attr helper _get_sub_env_attr was defined but never called anywhere in the codebase. _sub_env_has_attr (its sibling) is kept — it is actively used in utils.py. 
* chore: apply prettier formatting to docs * docs(env_processor): remove deprecated add_envs_task from pipeline example add_envs_task is replaced by env.call("task_description") in this PR. Remove it from the pipeline walkthrough and renumber the steps (8→7). * refactor(envs): remove __del__ from _LazyAsyncVectorEnv __del__ is unreliable as a cleanup mechanism. close() is already called explicitly in the eval loop's finally block, so the finalizer is redundant. * fix(eval): prefetch next task's workers after close to avoid GPU memory overlap Previously, next task's AsyncVectorEnv workers were spawned while the current task was still running, causing both tasks' GPU contexts to coexist. Moving the prefetch start into the finally block (after env.close()) ensures workers for task N+1 only spin up once task N has released GPU memory. * refactor(envs): move _LazyAsyncVectorEnv to utils and apply to metaworld _LazyAsyncVectorEnv lived in libero.py but metaworld had the same OOM problem: all tasks' AsyncVectorEnv workers were spawned eagerly, wasting GPU memory for tasks not yet running. Move the class to envs/utils.py so both environments share it, then apply the same is_async + lazy wrapping pattern in create_metaworld_envs. * chore: remove out-of-scope benchmark/CI/docs files from PR Benchmark CI workflow, Dockerfiles, benchmark docs, evaluation smoke-test doc, and dispatch tests belong in a separate PR. Scope this PR to the async env init changes only. * chore: restore adding_benchmarks + test_dispatch, drop env_processor changes - Restore docs/source/adding_benchmarks.mdx (belongs in this PR) - Restore tests/envs/test_dispatch.py (belongs in this PR) - Revert docs/source/env_processor.mdx to main (out of scope for this PR) * docs(adding_benchmarks): remove CI smoke test step (coming in separate PR) Step 7 (Dockerfile + benchmark_tests.yml CI job) and its table rows are out of scope for this PR. The CI infrastructure will be added on top in a follow-up PR. 
* refactor(envs): remove unused add_envs_task Replaced by env.call("task_description") in lerobot_eval.py. No callers remain in the codebase. * style: fix prettier formatting in env_processor.mdx * fix(ci): use root container chmod to fix PermissionError on artifact dirs Running chmod on the host doesn't propagate into Docker due to UID/SELinux mismatch. Instead, spin up the image as root to mkdir+chmod from inside the container before the eval run mounts the same path. * fix(ci): re-chmod artifacts after eval to fix unreadable files Files created by user_lerobot inside the eval container inherit a restrictive umask, making them unreadable by the runner after the container exits. Add a post-eval 'docker run --user root' chmod step so upload-artifact can find the video files. * feat(ci): add monthly schedule trigger for benchmark tests Runs on the 1st of every month at 02:00 UTC in addition to the existing push/PR and manual dispatch triggers. * fix(ci): change benchmark schedule from monthly to weekly (every Monday) * fix(ci): use docker cp instead of bind mounts for artifacts Bind mounts on these runners don't surface container-written files on the host path (likely DinD/socket-mount setup). Switch to named containers + docker cp, which copies directly through the daemon and lands files in the runner's accessible filesystem. * fix(ci): write eval output to /tmp inside container user_lerobot cannot create /artifacts at the container root. Use /tmp/eval-artifacts (always writable) then docker cp it out. * feat(ci): add parse_eval_metrics step to benchmark workflow Adds scripts/ci/parse_eval_metrics.py and wires it into both Libero and MetaWorld jobs so the dashboard can read pc_success, avg_sum_reward and eval_s from the metrics artifact instead of relying on GitHub step timing. 
* feat(ci): add Libero train+eval smoke test (1 step, eval_freq=1) Runs accelerate launch --num_processes=1 lerobot-train with: - steps=1, batch_size=1, dataset.episodes=[0] (episode 0 only) - eval_freq=1 so the training loop triggers eval after step 1 - eval.n_episodes=1, eval.use_async_envs=false Tests the full train→eval-within-training pipeline in the existing libero-benchmark-libero:ci image (no extra Docker build cost). Uploads eval video from /tmp/train-smoke/eval/ as libero-train-smoke-video. * feat(ci): extract task descriptions and embed in metrics artifact - Add scripts/ci/extract_task_descriptions.py: runs inside the benchmark Docker container (LIBERO/MetaWorld installed) after lerobot-eval and writes task_descriptions.json mapping task keys to NL instructions. LIBERO: uses libero.libero.benchmark to get suite.get_task(i).language. MetaWorld: formats task name as human-readable label. - Call extraction at the end of each eval bash-c (\|\| true so never fatal). - parse_eval_metrics.py reads task_descriptions.json and includes it in metrics.json so the health dashboard Space can label videos by task. * fix(ci): call extract_task_descriptions.py after eval in benchmark jobs The task descriptions were never populated in metrics.json because extract_task_descriptions.py was never invoked. The script exists and parse_eval_metrics.py already looks for its output — the call was simply missing from the workflow. Appends the extraction step to the existing bash -c block (runs inside the container where libero/metaworld is installed) so task_descriptions.json is written to the eval-artifacts dir before docker cp copies it out. * fix(test): use SyncVectorEnv in test_base_create_envs AsyncVectorEnv spawns new subprocesses that do not inherit the in-process gym registration created by the test. Pass use_async_envs=False since this test validates dispatch logic, not async parallelism. 
* perf(ci): split Dockerfile dep-install from source-copy for faster rebuilds The dep-install layer (uv sync) now only depends on pyproject.toml, uv.lock, and a minimal package stub — not the full src/ tree. Source code changes only rebuild the final COPY layer (seconds, not minutes). Also switch from type=local cache (lost on ephemeral runners) to type=gha (persisted in GitHub Actions cache, shared across all runs). Before: every src/ change → full uv sync rebuild (~8-10 min) After: src/-only change → cached dep layer, ~30s source copy * fix(ci): add Docker Hub login to avoid pull rate limits Anonymous pulls from Docker Hub are rate-limited to 100/6h, which fails when multiple benchmark jobs pull nvidia/cuda in parallel. Add docker/login-action step (conditional on DOCKERHUB_USERNAME var) to authenticate and get 200 pulls/6h. Setup: add DOCKERHUB_USERNAME as a repository variable and DOCKERHUB_TOKEN as a repository secret in GitHub Settings. * fix(ci): use existing DOCKERHUB_LEROBOT_USERNAME/PASSWORD secrets * fix(ci): use env context for secrets check in step if-condition Step-level 'if' cannot reference 'secrets' directly. Expose the secret via an env var and check that instead. * fix(ci): simplify Docker Hub login to match existing workflows Drop the conditional guard — other workflows (docker_publish, full_tests) call docker/login-action unconditionally. * fix(ci): switch Docker cache from type=gha to type=registry GHA cache is capped at 10GB per repo — a single CUDA + PyTorch + benchmark image is ~8GB so the cache evicts before it's reused. Switch to type=registry which pushes cache layers to Docker Hub (huggingface/lerobot-benchmark-cache:{libero,metaworld}). No size limit, layers persist until explicitly deleted, and shared across all runners and branches. * fix(ci): use GHCR for Docker layer cache (Docker Hub push denied) Docker Hub CI token can't push to new repos. GHCR works out of the box — GITHUB_TOKEN has automatic packages:write for the repo owner. 
- Add GHCR login step (github.actor + GITHUB_TOKEN) - Switch cache refs to ghcr.io/huggingface/lerobot/cache-benchmark - Add packages:write at job level (not workflow, per zizmor) - Keep Docker Hub login for pulling nvidia/cuda base image * fix(ci): remove GHCR cache (org blocks GITHUB_TOKEN package writes) The huggingface org restricts GHCR package creation via GITHUB_TOKEN, causing 403 on cache export. Remove all registry caching and GHCR login. The Dockerfile layer split (deps vs source) still helps when the runner has a warm Docker daemon. Also fix the metaworld job which had a stale conditional Docker Hub login and was missing the GHCR login entirely. * fix(ci): address PR review feedback for benchmark smoke tests Security: - Remove "Login to Hugging Face" step — it was a no-op (ephemeral --rm container) that exposed the HF token via CLI argument in docker inspect / /proc//cmdline. The eval step already re-authenticates via env var. Functional: - Remove feat/benchmark-ci from push trigger branches (won't exist post-merge). Dockerfiles: - Pin uv to 0.8.0 (was unpinned, fetching whatever latest ships). - Add comment explaining the chmod +x ptxas workaround (Triton packaging bug — ships ptxas without execute bit). Scripts: - parse_eval_metrics.py: add note that it runs on bare host and must stay stdlib-only. - parse_eval_metrics.py: add NaN guard for avg_sum_reward and eval_s (was only guarding pc_success). ci(benchmarks): trigger on PRs targeting feat/benchmark-ci Benchmark PRs (robomme, libero-plus, robocerebra, robotwin) target feat/benchmark-ci, not main. Without this, the workflow never runs on those PRs. * fix(docker): use uv pip install instead of uv sync (cross-extra conflict) uv sync --locked validates the entire lockfile across all extras. Since robomme depends on mani-skill which pins numpy<2.0, and the base project requires numpy>=2.0, the full lockfile is unsatisfiable. 
Switch to uv pip install -e ".[libero,smolvla]" which only resolves the requested extras for the current Python version and platform, avoiding the cross-extra numpy conflict entirely. * chore: revert configs.py, factory.py, test_dispatch.py to main These use_async_envs default changes belong to the async-vector-env PR (#3274), not this CI PR. Restore to match origin/main. * fix: address PR review feedback — broken link, NaN guard, zizmor tags, fork skip - Remove broken Triton issue link from Dockerfile.benchmark.libero - Add module-level _safe_int helper to guard n_episodes against NaN - Move _safe_float to module level alongside _safe_int - Add # zizmor: ignore[unpinned-uses] to all upload-artifact@v4 steps - Add if: env.HF_USER_TOKEN != '' to Libero smoke eval for fork PRs * fix(ci): add fork PR guard to train-smoke and MetaWorld eval steps Add if: env.HF_USER_TOKEN != '' to the Libero train+eval smoke and MetaWorld smoke eval steps so fork PRs without the secret skip gracefully. * fix(ci): remove feat/benchmark-ci from PR trigger branches * refactor(docker): rebase benchmark images on nightly lerobot-gpu Use huggingface/lerobot-gpu:latest as base for both libero and metaworld benchmark Dockerfiles instead of building from nvidia/cuda scratch. The nightly image already has all extras installed via uv sync --extra all, so we only need to overlay the PR source code (and libero asset setup). This eliminates duplicated system dep installation, Python setup, uv venv creation, and the Triton ptxas workaround from both files. --------- Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-04-13 21:24:01 +02:00
Khalil Meftah	a8838c081b	perf: remove redundant CPU→GPU→CPU transition move in learner	2026-04-13 19:06:28 +02:00
Khalil Meftah	ee0814ef60	refactor: update SACAlgorithm to pass action_dim to _init_critics and fix encoder reference	2026-04-13 18:31:17 +02:00
Khalil Meftah	7b0bdf2a98	fix: add thread synchronization to ReplayBuffer to prevent race condition between add() and sample()	2026-04-13 18:27:24 +02:00
Jash Shah	9bd844a3b9	fix(rl): ensure queue and process cleanup on abnormal exit (#3063) Wrap the main execution in actor_cli and start_learner_threads with try/finally so that queues are closed and processes are joined even when an unhandled exception occurs. Previously, exceptions in act_with_policy or add_actor_information_and_train would skip all cleanup code, leaking GPU/CPU resources. Also sets the shutdown_event on exception so child processes exit gracefully. Fixes #3059 Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>	2026-04-13 16:25:42 +02:00
Khalil Meftah	9422dc98c2	fix: remove leftover normalization calls from reward classifier predict_reward Fixes #2355	2026-04-13 13:30:50 +02:00
Khalil Meftah	11a0b0174f	fix(teleop): keyboard EE teleop not registering special keys and losing intervention state Fixes #2345 Co-authored-by: jpizarrom <jpizarrom@gmail.com>	2026-04-13 12:31:00 +02:00
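
Several entries above (d57c58a532, 7b0bdf2a98) mention adding thread synchronization to ReplayBuffer so that add() and sample() cannot race. The commit log does not show the actual change, so the following is only a minimal sketch of the general technique, assuming a single lock guarding a ring buffer. The class name LockedReplayBuffer and its fields are hypothetical and are not taken from the lerobot code.

```python
import random
import threading


class LockedReplayBuffer:
    """Hypothetical fixed-capacity ring buffer; NOT the lerobot ReplayBuffer."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._items = []      # stored transitions
        self._next = 0        # index of the next slot to write
        self._lock = threading.Lock()

    def add(self, transition) -> None:
        # Hold the lock for the whole write so sample() never observes
        # a half-updated buffer (list contents and write index together).
        with self._lock:
            if len(self._items) < self._capacity:
                self._items.append(transition)
            else:
                self._items[self._next] = transition
            self._next = (self._next + 1) % self._capacity

    def sample(self, batch_size: int) -> list:
        # Sampling takes the same lock, so it sees a consistent snapshot.
        with self._lock:
            if not self._items:
                return []
            k = min(batch_size, len(self._items))
            return random.sample(self._items, k)

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)
```

In this sketch a producer thread can call add() while a consumer thread calls sample(). A single coarse lock is the simplest option; whether the real fix uses one lock, finer-grained locking, or another primitive is not stated in the log.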


1428 Commits