* feat(rewards): add RewardModelConfig and PreTrainedRewardModel base classes
* refactor(rewards): migrate Classifier from policies/sac/reward_model/ to rewards/classifier/
* refactor(rewards): migrate SARM from policies/sarm/ to rewards/sarm/
* refactor(rewards): add rewards/factory.py and remove reward model code from policies/factory.py
* refactor(rewards): update imports and delete old reward model locations
* test(rewards): add reward model tests and update existing test imports
* fix(rewards): restore full Classifier and SARM implementations
* test(rewards): restore missing CUDA and mixed precision classifier processor tests
* refactor(lerobot_train.py): replace rabc-specific configuration with a generic sample weighter class
* refactor(lerobot_train.py): add missing sampling weight script
* apply linter fixes and add missing files
* add tests for sample weighter
* revert unnecessary changes, improve typing
* update docs
* add automatic detection of the progress path
* remove type exp
* improve comment
* fix: move rabc.py to rewards/sarm/ and update import paths
* refactor(imports): update reward model imports to new module structure
* refactor(imports): update reward model imports to reflect new module structure
* refactor(imports): conditionally import pandas based on availability
* feat(configs): add reward_model field to TrainPipelineConfig and Hub fields to RewardModelConfig
* refactor(policies): remove reward model branches from policy factory and __init__
* refactor(rewards): expand __init__ facade and fix SARMConfig __post_init__ crash
* feat(train): route reward model training through rewards/factory instead of policies/factory
* refactor(train): streamline reward model training logic
* fix(rewards): ensure FileNotFoundError is raised for missing config_file
* refactor(train): update __get_path_fields__ to include reward_model for config loading
* refactor(classifier): remove redundant input normalization in predict_reward method
* fix(train): raise ValueError for non-trainable reward models in train function
* refactor(pretrained_rm): add model card template
* refactor(tests): reward models
* refactor(sarm): update reset method and remove unused action prediction methods
* refactor(wandb): differentiate tags for reward model and policy training in cfg_to_group function
* fix(train): raise ValueError for PEFT usage in reward model training
* refactor(rewards): enhance RewardModelConfig with device handling and delta indices properties
---------
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>