lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-25 04:07:02 +00:00

Author	SHA1	Message	Date
Martino Russi	9423deda02	refactor(evo1): use native HF InternVL3-1B-hf, drop trust_remote_code - Switch from OpenGVLab/InternVL3-1B (requires trust_remote_code=True) to OpenGVLab/InternVL3-1B-hf (native transformers implementation). - Replace manual _extract_feature + _prepare_and_fuse_embeddings with a single model.forward() call — verified bit-for-bit identical output. - Remove ~170 lines of manual ViT/pixel-shuffle/projection logic. - Symlink README.md to docs/source/ following repo convention. Weights are byte-identical between both model variants; only the module naming differs. All 12 existing unit tests pass. Local training (10 steps) on maximellerbach/omx_pickandplace confirmed working. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-23 17:17:19 +02:00
javadcc_mac	25556ceefe	fix(evo1): move LIBERO padding into policy processors	2026-06-21 15:58:38 +08:00
javadcc_mac	4cfa762da8	Fix eval action conversion for bf16 policies	2026-06-13 10:51:33 +08:00
javadcc_mac	fa984990c0	Fix EVO1 LIBERO eval action postprocessing	2026-06-13 10:18:34 +08:00
javadcc_mac	f9b8f297b4	Fix EVO1 LIBERO rollout processors	2026-06-09 15:10:10 +08:00
javadcc_mac	95527f6051	Merge remote-tracking branch 'upstream/main' into codex/add-evo1-policy	2026-05-12 17:40:59 +08:00
javadcc_mac	407ee867b9	docs(evo1): format results table	2026-05-12 17:40:18 +08:00
Steven Palma	26ff40ddd7	chore(deps): cap torch ceiling at <2.12, pin Linux wheels to cu128 (#3570 ) * chore(deps): ceiling + cuda * ci: bump cuda version docker image * ci: add cpu wheel to release workflow * chore(deps): update uv.lock * docs: update installation with cuda note	2026-05-11 19:47:55 +02:00
javadcc_mac	a5e6409985	fix(evo1): finalize policy guide alignment	2026-05-11 21:51:41 +08:00
Maxime Ellerbach	6d269b28c8	docs(omx): adding some examples and scripts (#3566 ) * docs(omx): adding some examples and scripts * cleaning up and reviewing the cli args * adding __init__.py to example folder, adjusting the examples * adding reference to pretrained act policy * moving `.send_action` before `dataset.add_frame` for consistency Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * adjusting docstring Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * adressing hardcoded dataset fps * removed init as it worked without --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>	2026-05-11 15:36:32 +02:00
Steven Palma	b607c8458e	docs: add policy & compute guide (#3534 ) * docs(policy): contributing a policy guide * docs(training): HW compute guide * chore(docs): add to readme and index * Apply suggestions from code review Co-authored-by: Haoming Song <1847575517@qq.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * chore(docs): slight improvements * refactor(docs): consolidate add policy docs * chore(style): fix pre-commit --------- Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Haoming Song <1847575517@qq.com>	2026-05-11 15:19:12 +02:00
Jash Shah	9e83510c99	fix(datasets): close file handle on VideoDecoder init failure in cache (#3542 ) If VideoDecoder() raises during initialization, the fsspec file handle was leaked since it was opened via __enter__() but never closed on the exception path. Now explicitly closes the handle before re-raising.	2026-05-10 17:30:37 +02:00
javadcc_mac	1c9fbba9a9	chore(evo1): align with policy contribution guide conventions - Add `src/lerobot/policies/evo1/README.md` symlink into `docs/source/evo1.mdx` to match the in-tree README convention (mirroring the EO-1 layout). - Convert `transformers` import in `internvl3_embedder.py` to the standard `TYPE_CHECKING + _transformers_available` two-step gating used by other optional-backbone policies (e.g. diffusion). The previous lazy-in-`__init__` import was functionally equivalent for runtime gating but didn't expose the real symbols to type checkers. - Add `lerobot[evo1]` to the `all` extra in `pyproject.toml` so `pip install 'lerobot[all]'` keeps installing every optional policy. Per the guidance in https://moon-ci-docs.huggingface.co/docs/lerobot/pr_3534/en/contributing_a_policy.	2026-05-10 23:14:23 +08:00
javadcc_mac	6a1b5ceb9d	Merge remote-tracking branch 'upstream/main' into codex/add-evo1-policy # Conflicts: # uv.lock	2026-05-10 22:48:17 +08:00
javadcc_mac	daa4c4dd30	chore(lock): regenerate uv.lock for evo1 extra Adds the `evo1` entry to `[package.metadata.requires-dist]` and the `provides-extras` list so that `uv sync --locked --extra test` (used by fast_tests.yml) no longer reports the lockfile as stale. Generated with `uv 0.8.0` (matching `UV_VERSION` in fast_tests.yml). The non-evo1 marker tweaks are produced by `uv lock` re-resolving the existing dep graph and are not introduced by this PR.	2026-05-10 22:43:26 +08:00
Anthony Shoumikhin	1f7b03f5f2	chore(deps): allow torch 2.11/2.12 and fix autocast deprecation (#3435 ) * chore(deps): allow torch 2.11/2.12 and fix autocast deprecation - Bump torch to >=2.7,<2.13 (was <2.11), torchvision to <0.28 (was <0.26), and torchcodec to <0.13 (was <0.11) to allow installs against the latest stable torch 2.11 and the upcoming 2.12 line. - Replace removed torch.get_autocast_gpu_dtype() with torch.get_autocast_dtype("cuda") in Florence2 and Qwen2.5-VL-MoE FlashAttention paths (the former is removed in 2.11+). - Refresh uv.lock for the new resolution (torch 2.11.0+cu130, torchvision 0.26.0+cu130, torchcodec 0.11.1, full CUDA 13 stack). Verified locally with `uv sync --locked` from a clean .venv and the lerobot test suite (pytest -n 8 --dist=loadfile --timeout=300). Failure set is identical to the pre-bump baseline: 18 pre-existing failures (test_sac_policy, test_pi0_rtc, test_pi05_rtc, test_replay_buffer), 0 new, 0 fixed. AI assistance: this change was authored with Claude Code per AI_POLICY.md. * fix(policies): use device-agnostic autocast dtype lookup Pass query_states.device.type to torch.get_autocast_dtype() instead of hardcoding 'cuda', so the cast matches the active autocast context when running under CPU/MPS/XPU autocast. --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-10 13:05:35 +02:00
Yiming Wang	ff992a7a1d	Merge branch 'main' into codex/add-evo1-policy	2026-05-10 18:54:35 +08:00
Steven Palma	cb8edf17e6	chore(dependencies): update uv.lock (#3475 )	2026-05-10 12:24:22 +02:00
Steven Palma	5699f6cbf4	chore(ci): disable auto-stale (#3550 )	2026-05-10 11:49:31 +02:00
javadcc_mac	48269dddb3	fix(evo1): infer batch size after normalizing image dims `_collect_image_batches` read `batch_size = batch[camera_keys[0]].shape[0]` before normalizing per-camera tensors to `(B, C, H, W)`. For an unbatched `(C, H, W)` input (which the function tries to support via the `image.dim() == 3` branch), this picked up the channel count `C` instead of the real batch size, making the subsequent per-sample loop iterate `C` times and indexing go out of bounds. Normalize each camera tensor up-front, then read `batch_size` from the normalized batch dim. Adds `test_collect_image_batches_handles_unbatched_chw` covering the regression. Reported by Copilot review on huggingface/lerobot#3545.	2026-05-10 11:29:23 +08:00
javadcc_mac	8df8d3d866	feat(policies): add EVO1 policy	2026-05-09 21:39:19 +08:00
masato-ka	0e6114ac36	fix(train): restrict legacy RA-BC migration to JSON checkpoints only (#3490 ) * fix(train): restrict legacy RA-BC migration to JSON checkpoints only _migrate_legacy_rabc_fields was called for all config files, causing json.load to raise DecodeError when a YAML/TOML config was passed to lerobot-train for a new training run. Guard the block with an .endswith(".json") check so migration only runs when resuming from a JSON checkpoint.	2026-05-08 20:27:01 +02:00
Steven Palma	c8ce413d73	fix(robots): allign lekiwi default with so100 use_degrees (#3531 )	2026-05-07 17:52:34 +02:00
Pepijn	82dffde7fa	fix(ci): speed up multi-task benchmark evals (parallelize + cap VLABench steps) (#3529 ) * fix(ci): run multi-task benchmark evals 5-at-a-time in parallel The eval script supports running tasks concurrently via a ThreadPoolExecutor (env.max_parallel_tasks). Apply it to the four multi-task benchmark CI jobs (RoboTwin, RoboCasa, RoboMME, LIBERO-plus — 8-10 tasks/task_ids each) so they finish in ~2 waves of 5 instead of running sequentially. Single-task jobs (Libero, MetaWorld, RoboCerebra) are unchanged. * fix(ci): cap VLABench smoke eval at 50 steps per task VLABench's default episode_length is 500 steps; with 10 tasks at ~1 it/s the smoke eval took ~80 minutes of rollouts on top of the image build. The eval is a pipeline smoke test (running_success_rate stays at 0% on this short rollout anyway), so we don't need full episodes — cap each task at 50 steps to bring total rollout time down ~10x. * fix(ci): run VLABench tasks 5-at-a-time in parallel The eval script already supports running multiple tasks concurrently via a ThreadPoolExecutor (env.max_parallel_tasks). Set it to 5 so the 10 VLABench tasks finish in ~2 waves instead of running sequentially.	2026-05-07 13:37:16 +02:00
Ville Kuosmanen	eaf0218bc8	feat(policy): use pretrained vision encoder weights by default for diffusion and vqbet (#3202 ) * feat: add pretrained vision encoder weights for diffusion and vqbet * fix test by re-generating artifacts --------- Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-05-07 12:10:38 +02:00
Pepijn	a0e52d52fe	fix(ci): bump robotwin benchmark image to CUDA 12.6 (#3525 ) The robotwin benchmark Dockerfile still installed cuda-nvcc-12-4 and cuda-cudart-dev-12-4 after #3505 upgraded the base image to CUDA 12.6.3 on Ubuntu 24.04. Those packages aren't available in the ubuntu2404 CUDA repo, so the build failed at apt-get install. Bumping both to -12-6 to match the base image.	2026-05-07 11:11:12 +02:00
Haoming Song	e99c55af4b	feat(policies): add EO-1 model (#3403 ) * feat(policies): add EO-1 model * chore(eo1): adjust policy_eo1_README.md to to avoid duplicate with eo1.mdx * chore(eo1): remove policy_eo1_README.md, link eo1.mdx in policy folder --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-05-06 18:01:16 +02:00
Steven Palma	408e0ca763	fix(robots): openarm features with openarmmini (#3524 )	2026-05-06 17:03:09 +02:00
Maxime Ellerbach	ce24063efd	feat(dagger): adding smooth handover (#3506 ) * feat(dagger): adding smooth handover * update docstring * small phase fix and documenting potential issues * cleaning up	2026-05-05 14:44:32 +02:00
Steven Palma	82934719db	chore(dep): bump transformers to 5.4.0 (#3374 ) * fix(deps): breaking change from transformers 5.4.0 * Update src/lerobot/policies/xvla/modeling_florence2.py Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * Update src/lerobot/policies/wall_x/qwen_model/qwen2_5_vl_moe.py Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * removing dataclass * bumping transformers 5.4.0 * weird i can't even pass the test on main * oops, typo * chore(style): fix pre-commit run * chore: update uv.lock * seems like a weird numerical precision issue, lets check in runners * chore: update uv.lock * chore(dependecies): adjust transformers version * chore: update uv.lock --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: Maximellerbach <maxime.ellerbach@huggingface.co> Co-authored-by: raushan <raushan@huggingface.co>	2026-05-05 14:19:09 +02:00
Steven Palma	401a217597	chore(ci): increase time stale (#3507 )	2026-05-04 22:35:16 +02:00
Steven Palma	40094b0464	chore(ci): upgrade docker internal (#3505 )	2026-05-04 21:28:52 +02:00
Jash Shah	fdbfc015a2	fix(peft): fix LoRA resume from Hub (PosixPath + double wrap) (#3485 )	2026-05-04 10:52:37 +02:00
Haoming Song	d656da8ccc	fix(pi): keep training sampling outside compiled forwards (#3487 ) Move PI0 and PI0.5 noise/time sampling into the policy wrappers so the compiled PyTorch cores receive them as tensor inputs. This keeps Beta sampling out of torch.compile on MPS, avoiding aten::_sample_dirichlet compilation errors while preserving the CUDA training path. Validation: .venv/bin/python -m pre_commit run --files src/lerobot/policies/pi0/modeling_pi0.py src/lerobot/policies/pi05/modeling_pi05.py; .venv/bin/python -m pytest -sv -rs tests/policies/pi0_pi05/test_pi0.py tests/policies/pi0_pi05/test_pi05.py tests/policies/pi0_pi05/test_pi0_rtc.py tests/policies/pi0_pi05/test_pi05_rtc.py Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-04-30 13:21:17 +02:00
Khalil Meftah	b5f65e5332	Expose sarm package API and ship reward model card template (#3477 ) * chore: List lerobot_rewardmodel_modelcard_template.md in MANIFEST.in * chore: export SARMConfig, SARMRewardModel, and make_sarm_pre_post_processors from rewards.sarm.	2026-04-29 16:17:16 +02:00
Khalil Meftah	cd6b43ea7a	fix(train): migrate legacy RA-BC fields in train config loading (#3480 )	2026-04-29 16:17:00 +02:00
Steven Palma	2236bbe7a3	fix(rollout): propagate policy-specific CLI config paramaters (#3483 ) Co-authored-by: Maxime Ellerbach <maxime.ellerbach@huggingface.co>	2026-04-29 16:13:10 +02:00
Maxime Ellerbach	cb0a944941	refactor(datasets): replace untyped dict with typed DatasetInfo dataclass (#3472 ) * refactor(datasets): replace untyped dict with typed DatasetInfo dataclass Introduce typed DatasetInfo dataclass to replace untyped dict representation of info.json. Changes: - Add DatasetInfo dataclass with explicit fields and validation - Implement __post_init__ for shape conversion (list ↔ tuple) - Add dict-style compatibility layer (__getitem__, __setitem__, .get()) - Add from_dict() and to_dict() for JSON serialization - Update io_utils to use load_info/write_info with DatasetInfo - Update dataset utilities and metadata to use attribute access - Remove aggregate.py dict-style field access - Add tests fixture support for DatasetInfo Benefits: - Type safety with IDE auto-completion - Validation at construction time - Explicit schema documentation * fix pre-commit * update docstring inside DatasetInfo.from_dict() * sorts the unknown to have deterministic output Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * refactoring the last few old fieds * fix crop dataset roi type mismatch * use consistantly int for data and video_files_size_in_mb --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: jjolla93 <jjolla93@gmail.com>	2026-04-28 18:40:30 +02:00
Khalil Meftah	8a3d64033f	Reward models refactor (#3142 ) * feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes * refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/ * refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/ * refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py * refactor(rewards): update imports and delete old reward model locations * test(rewards): add reward model tests and update existing test imports * fix(rewards): restore full Classifier and SARM implementations * test(rewards): restore missing CUDA and mixed precision classifier processor tests * refactor(lerobot_train.py): remove rabc specific configuration and replace it with a generic samplerweight class in lerobot_train * refactor(lerobot_train.py): add missing sampling weight script * linter + missing files * add testing for sampl weighter * revert some useless changes, improve typing * update docs * add automatic detection of the progress path * remove type exp * improve comment * fix: move rabc.py to rewards/sarm/ and update import paths * refactor(imports): update reward model imports to new module structure * refactor(imports): update reward model imports to reflect new module structure * refactor(imports): conditionally import pandas based on availability * feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig * refactor(policies): remove reward model branches from policy factory and __init__ * refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash * feat(train): route reward model training through rewards/factory instead of policies/factory * refactor(train): streamline reward model training logic * fix(rewards): ensure FileNotFoundError is raised for missing config_file * refactor(train): update __get_path_fields__ to include reward_model for config loading * refactor(classifier): remove redundant input normalization in predict_reward method * fix(train): raise ValueError for non-trainable reward models in train function * refactor(pretrained_rm): add model card template * refactor(tests): reward models * refactor(sarm): update reset method and remove unused action prediction methods * refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function * fix(train): raise ValueError for PEFT usage in reward model training * refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties --------- Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>	2026-04-28 17:56:24 +02:00
Steven Palma	03ee50e08f	chore(ci): bump docs workflows (#3476 )	2026-04-28 15:06:44 +02:00
Steven Palma	ca87ccd941	feat(rollout): decouple policy deployment from data recording with new `lerobot-rollout` CLI (#3413 ) * feat(scripts): lerobot-rollout * fix(rollout) require dataset in dagger + use duration too * fix(docs): dagger num_episodes * test(rollout): fix expectations * fix(rollout): features check * fix(rollout): device and task propagation + feature pos + warn fps + move rename_map config * docs(rollout): edit rename_map instructions * chore(rollout): multiple minor improvements * chore(rollout): address coments + minor improvements * fix(rollout): enable default * fix(tests): default value RTCConfig * fix(rollout): robot_observation_processor and notify_observation at policy frequency instead of interpolator rate Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(rollout): prevent relativeactions with sync inference engine Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(rollout): rtc reanchor to non normalized state Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> * fix(rollout): fixing the episode length to use hwc (#3469) also reducing default length to 5 minutes * feat(rollout): go back to initial position is now a config * fix(rollout): properly propagating video_files_size_in_mb to lerobot_dataset (#3470) * chore(rollout): note about dagger correction stage * chore(docs): update comments and docstring * fix(test): move rtc relative out of rollout module * fix(rollout): address the review comments --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com> Co-authored-by: Maxime Ellerbach <maxime.ellerbach@huggingface.co>	2026-04-28 00:57:35 +02:00
Steven Palma	77352c495c	chore(dependencies): update uv.lock (#3437 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-04-27 23:15:46 +02:00
Steven Palma	05a5223885	fix(pi): avoid peak RAM in PiGemma construction by freeing replaced submodules (#3454 ) Co-Authored-By: Daiki Kamata <daiki.kamata@access-company.com> Co-Authored-By: Jack Vial <jackvial@users.noreply.github.com> Co-Authored-By: Ajay Anubolu <AjAnubolu@users.noreply.github.com> Co-Authored-By: Finn F. <F-Fer@users.noreply.github.com>	2026-04-24 17:50:12 +02:00
Steven Palma	580d818aa9	fix(dataset): no default overwrite in lerobot tool recompute stats (#3452 )	2026-04-24 15:07:19 +02:00
Steven Palma	587aa82021	fix(imports): realsense import name is platform dependent (#3451 )	2026-04-24 12:55:38 +02:00
Chuyao Shen	12b88fce02	not use dataclass (#3414 ) Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-04-24 11:26:59 +02:00
masato-ka	fc6c94c82a	fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in… (#3419 ) * fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in CLIP encoding In transformers 5.x, CLIPModel.get_image_features() and get_text_features() return BaseModelOutputWithPooling instead of a plain torch.FloatTensor. Added isinstance check to extract pooler_output when the return value is not a tensor, maintaining backward compatibility with transformers 4.x. Fixes AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'detach' * Adding assertion check for pooler_output of CLIP. This change is response to below comment. https://github.com/huggingface/lerobot/pull/3419#discussion_r3112594387 * Adding assertion check for pooler_output of CLIP. This change is response to below comment. Change to simple check and rise https://github.com/huggingface/lerobot/pull/3419#discussion_r3126953776 --------- Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-04-23 16:26:58 +02:00
Steven Palma	1add460678	fix(policy): loss normalization for padded actions in ACT, Diffusion, and MultiTaskDiT (#3442 ) * Fix loss normalization for padded actions in ACT, Diffusion, and MultiTaskDiT When action_is_pad masks out padded timesteps, the subsequent .mean() still divides by the total element count (including zeroed-out padding), underestimating the loss. With 60-70% padding this can cut the effective gradient signal by 2-3x. Replace mask-then-mean with mask-then-sum / valid-count for all three affected policies. TDMPC is not affected because it sums over time before averaging over batch. Fixes #3353 * linting Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com> Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * Update src/lerobot/policies/diffusion/modeling_diffusion.py Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * Update src/lerobot/policies/multi_task_dit/modeling_multi_task_dit.py Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * Update src/lerobot/policies/multi_task_dit/modeling_multi_task_dit.py Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> * apply ACT loss normalization suggestion from review Divide by num_valid (timesteps * action_dim) instead of just timesteps, matching the diffusion/multi_task_dit fix. Addresses review from @whats2000 (https://github.com/huggingface/lerobot/pull/3377#discussion_r3106845791). * fix(test): update safetensor act --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Signed-off-by: Steven Palma <imstevenpmwork@ieee.org> Co-authored-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Co-authored-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: whats2000 <60466660+whats2000@users.noreply.github.com>	2026-04-23 15:23:54 +02:00
Qi Jia	4587c2b648	fix xvla docs (#3291 ) Co-authored-by: Qi Jia <kaufou@gmail.com> Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>	2026-04-23 14:50:32 +02:00
whats2000	2236cdb302	fix(smolvla): correct loss normalization for padded actions (#3434 ) Apply the same per-scalar-mean fix to SmolVLA that #3377 landed for ACT / Diffusion / MultiTaskDiT. The pre-patch form applies the `action_is_pad` mask to zero out padded timesteps, then calls `.mean()` (or `.mean(dim=(1, 2))`). Because `.mean()` divides by the total number of elements including the zeroed padding, the loss is diluted by the padding fraction. Fixed by normalizing only over valid (non-padded) scalar entries: num_valid = ((~actions_is_pad).sum(...) * losses.shape[-1]).clamp_min(1) loss = losses.sum(...) / num_valid `clamp_min(1)` preserves the all-padded-batch edge case (0/1 = 0). Both reduction paths are updated. Behavior when `action_is_pad` is missing is unchanged (`losses.mean()`). Empirical A/B on aloha_sim_transfer_cube_human (chunk_size=40, batch=2, 30 steps, fixed seed, GB200) shows `loss_A / loss_B = 0.9672 (±0.088)` — same direction and magnitude as PR #3377's `loss_A / loss_C ≈ 0.96` for ACT. Heavier-padding recipes will see a larger gap. Refs: #3353 (original report for ACT), #3377 (fix for the other three policies).	2026-04-23 10:34:11 +02:00

1 2 3 4 5 ...

1459 Commits