Move PI0 and PI0.5 noise/time sampling into the policy wrappers so the compiled PyTorch cores receive them as tensor inputs.
This keeps Beta sampling out of torch.compile on MPS, avoiding aten::_sample_dirichlet compilation errors while preserving the CUDA training path.
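A minimal sketch of the split described above, not the actual PI0 code: the wrapper draws noise and time eagerly, and the core only does tensor math on what it is given. The function names, the Beta parameters (1.5, 1.0), and the interpolation formula are assumptions made for illustration.

```python
import torch


def flow_matching_core(actions, noise, time):
    """Stand-in for the compiled core: pure tensor math, no random sampling."""
    t = time[:, None, None]                 # broadcast over (horizon, action_dim)
    x_t = t * noise + (1.0 - t) * actions   # assumed interpolation, for illustration
    u_t = noise - actions                   # assumed regression target
    return x_t, u_t


def sample_noise_and_time(actions, alpha=1.5, beta=1.0):
    """Wrapper-side sampling, kept outside any compiled region."""
    noise = torch.randn_like(actions)
    # Beta parameters live on CPU here; the sample is moved to the action device after.
    dist = torch.distributions.Beta(torch.tensor(alpha), torch.tensor(beta))
    time = dist.sample((actions.shape[0],)).to(actions.device, actions.dtype)
    return noise, time


actions = torch.zeros(4, 10, 6)             # (batch, horizon, action_dim)
noise, time = sample_noise_and_time(actions)
x_t, u_t = flow_matching_core(actions, noise, time)
```

With this shape, only `flow_matching_core` would need to be wrapped by `torch.compile`, so the Beta sampling never gets traced.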
Validation: .venv/bin/python -m pre_commit run --files src/lerobot/policies/pi0/modeling_pi0.py src/lerobot/policies/pi05/modeling_pi05.py; .venv/bin/python -m pytest -sv -rs tests/policies/pi0_pi05/test_pi0.py tests/policies/pi0_pi05/test_pi05.py tests/policies/pi0_pi05/test_pi0_rtc.py tests/policies/pi0_pi05/test_pi05_rtc.py
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
* chore: List lerobot_rewardmodel_modelcard_template.md in MANIFEST.in
* chore: export SARMConfig, SARMRewardModel, and make_sarm_pre_post_processors from rewards.sarm.
* refactor(datasets): replace untyped dict with typed DatasetInfo dataclass
Introduce typed DatasetInfo dataclass to replace untyped dict representation of info.json.
Changes:
- Add DatasetInfo dataclass with explicit fields and validation
- Implement __post_init__ for shape conversion (list ↔ tuple)
- Add dict-style compatibility layer (__getitem__, __setitem__, .get())
- Add from_dict() and to_dict() for JSON serialization
- Update io_utils to use load_info/write_info with DatasetInfo
- Update dataset utilities and metadata to use attribute access
- Remove aggregate.py dict-style field access
- Add tests fixture support for DatasetInfo
Benefits:
- Type safety with IDE auto-completion
- Validation at construction time
- Explicit schema documentation
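A rough sketch of the pattern listed above, assuming a much smaller schema than the real `DatasetInfo`. The class name `DatasetInfoSketch`, the three fields, the sample values, and the choice to silently drop unknown keys in `from_dict()` are all illustrative assumptions.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any


@dataclass
class DatasetInfoSketch:
    # Hypothetical subset of fields; the real schema is larger.
    codebase_version: str
    fps: int
    features: dict[str, dict[str, Any]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # JSON stores shapes as lists; keep tuples in memory (copying each spec).
        self.features = {
            name: {**spec, "shape": tuple(spec["shape"])} if "shape" in spec else dict(spec)
            for name, spec in self.features.items()
        }

    # Dict-style compatibility layer for legacy call sites.
    def __getitem__(self, key: str) -> Any:
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None

    def __setitem__(self, key: str, value: Any) -> None:
        setattr(self, key, value)

    def get(self, key: str, default: Any = None) -> Any:
        return getattr(self, key, default)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "DatasetInfoSketch":
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    def to_dict(self) -> dict[str, Any]:
        out = asdict(self)
        for spec in out["features"].values():
            if isinstance(spec.get("shape"), tuple):
                spec["shape"] = list(spec["shape"])  # back to JSON-friendly lists
        return out


raw = {"codebase_version": "v3.0", "fps": 30,
       "features": {"action": {"dtype": "float32", "shape": [6]}}}
info = DatasetInfoSketch.from_dict(raw)
```

Here `info.fps` and `info["fps"]` both return 30, and `info.to_dict()` turns the shape back into `[6]` for serialization.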
* fix pre-commit
* update docstring inside DatasetInfo.from_dict()
* sorts the unknown to have deterministic output
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
* refactoring the last few old fields
* fix crop dataset roi type mismatch
* use int consistently for data and video_files_size_in_mb
---------
Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net>
Co-authored-by: jjolla93 <jjolla93@gmail.com>
* feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes
* refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/
* refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/
* refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py
* refactor(rewards): update imports and delete old reward model locations
* test(rewards): add reward model tests and update existing test imports
* fix(rewards): restore full Classifier and SARM implementations
* test(rewards): restore missing CUDA and mixed precision classifier processor tests
* refactor(lerobot_train.py): remove rabc specific configuration and replace it with a generic samplerweight class in lerobot_train
* refactor(lerobot_train.py): add missing sampling weight script
* linter + missing files
* add tests for sample weighter
* revert some useless changes, improve typing
* update docs
* add automatic detection of the progress path
* remove type exp
* improve comment
* fix: move rabc.py to rewards/sarm/ and update import paths
* refactor(imports): update reward model imports to new module structure
* refactor(imports): update reward model imports to reflect new module structure
* refactor(imports): conditionally import pandas based on availability
* feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig
* refactor(policies): remove reward model branches from policy factory and __init__
* refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash
* feat(train): route reward model training through rewards/factory instead of policies/factory
* refactor(train): streamline reward model training logic
* fix(rewards): ensure FileNotFoundError is raised for missing config_file
* refactor(train): update __get_path_fields__ to include reward_model for config loading
* refactor(classifier): remove redundant input normalization in predict_reward method
* fix(train): raise ValueError for non-trainable reward models in train function
* refactor(pretrained_rm): add model card template
* refactor(tests): reward models
* refactor(sarm): update reset method and remove unused action prediction methods
* refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function
* fix(train): raise ValueError for PEFT usage in reward model training
* refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties
---------
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
* fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in CLIP encoding
In transformers 5.x, CLIPModel.get_image_features() and get_text_features()
return BaseModelOutputWithPooling instead of a plain torch.FloatTensor.
Added isinstance check to extract pooler_output when the return value is not
a tensor, maintaining backward compatibility with transformers 4.x.
Fixes AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'detach'
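The compatibility check can be sketched roughly as follows. `PooledOutputStandIn` is a hypothetical stand-in for `BaseModelOutputWithPooling` (so the example does not need transformers installed), and `as_feature_tensor` is an illustrative helper name, not the function used in SARM.

```python
from dataclasses import dataclass

import torch


@dataclass
class PooledOutputStandIn:
    """Hypothetical stand-in exposing only the field used here."""
    pooler_output: torch.Tensor


def as_feature_tensor(features):
    """Return a tensor whether the encoder gave a tensor or an output object."""
    if isinstance(features, torch.Tensor):
        return features                      # plain-tensor return: already usable
    pooled = getattr(features, "pooler_output", None)
    if pooled is None:
        raise ValueError("CLIP output has no pooler_output to use as features")
    return pooled                            # output-object return: unwrap the pooled tensor


old_style = as_feature_tensor(torch.ones(2, 8))
new_style = as_feature_tensor(PooledOutputStandIn(torch.ones(2, 8)))
```

Both calls return a `(2, 8)` tensor, so downstream code such as `.detach()` works on either return type.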
* Add assertion check for pooler_output of CLIP, in response to the review comment below.
https://github.com/huggingface/lerobot/pull/3419#discussion_r3112594387
* Add assertion check for pooler_output of CLIP, in response to the review comment below. Simplified to a plain check and raise.
https://github.com/huggingface/lerobot/pull/3419#discussion_r3126953776
---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Apply the same per-scalar-mean fix to SmolVLA that #3377 landed for
ACT / Diffusion / MultiTaskDiT. The pre-patch form applies the
`action_is_pad` mask to zero out padded timesteps, then calls `.mean()`
(or `.mean(dim=(1, 2))`). Because `.mean()` divides by the total number
of elements including the zeroed padding, the loss is diluted by the
padding fraction.
Fixed by normalizing only over valid (non-padded) scalar entries:
    num_valid = ((~actions_is_pad).sum(...) * losses.shape[-1]).clamp_min(1)
    loss = losses.sum(...) / num_valid
`clamp_min(1)` preserves the all-padded-batch edge case (0/1 = 0). Both
reduction paths are updated. Behavior when `action_is_pad` is missing is
unchanged (`losses.mean()`).
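A small worked example of the dilution and the fix, assuming per-element losses of shape `(batch, time, action_dim)` and a boolean pad mask of shape `(batch, time)`. The helper name `masked_mean_loss` and the toy values are illustrative, and only the full-batch reduction is shown.

```python
import torch


def masked_mean_loss(losses, actions_is_pad=None):
    """Mean over valid scalar entries only; falls back to a plain mean without a mask."""
    if actions_is_pad is None:
        return losses.mean()
    valid = ~actions_is_pad
    losses = losses * valid.unsqueeze(-1)                      # zero out padded timesteps
    num_valid = (valid.sum() * losses.shape[-1]).clamp_min(1)  # count valid scalars
    return losses.sum() / num_valid


losses = torch.ones(2, 4, 3)                                   # every valid entry has loss 1
pad = torch.tensor([[False, False, True, True],
                    [False, False, False, True]])              # 3 of 8 timesteps padded

diluted = (losses * (~pad).unsqueeze(-1)).mean()               # pre-patch form
fixed = masked_mean_loss(losses, pad)
all_padded = masked_mean_loss(losses, torch.ones(2, 4, dtype=torch.bool))
```

In this toy batch the pre-patch form gives 15/24 = 0.625 while the fixed form gives 15/15 = 1.0, and the all-padded case stays at 0 thanks to `clamp_min(1)`.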
Empirical A/B on aloha_sim_transfer_cube_human (chunk_size=40, batch=2,
30 steps, fixed seed, GB200) shows `loss_A / loss_B = 0.9672 (±0.088)` —
same direction and magnitude as PR #3377's `loss_A / loss_C ≈ 0.96` for
ACT. Heavier-padding recipes will see a larger gap.
Refs: #3353 (original report for ACT), #3377 (fix for the other three
policies).