Commit Graph

13 Commits

Author SHA1 Message Date
Khalil Meftah ef927ac830 refactor(rl): move actor weight-sync wire format from policy to algorithm 2026-05-09 22:47:45 +02:00
Khalil Meftah 23811b720d feat(rl): consolidate HIL-SERL checkpoint into HF-style components
Make `RLAlgorithmConfig` and `RLAlgorithm` `HubMixin`s, add abstract
`state_dict()` / `load_state_dict()` for critic ensemble, target nets
and `log_alpha`, and persist them as a sibling `algorithm/` component
next to `pretrained_model/`. Replace the pickled `training_state.pt`
with an enriched `training_step.json` carrying `step` and
`interaction_step`, so resume restores actor + critics + target nets +
temperature + optimizers + RNG + counters from HF-standard files.
2026-05-09 22:47:20 +02:00
Khalil Meftah 29fc0c6d28 refactor: replace build_algorithm with make_algorithm for SACAlgorithmConfig and update related tests 2026-05-07 12:09:23 +02:00
Khalil Meftah e298474bf3 fix(tests): gate RL tests on the datasets extra 2026-04-27 16:53:34 +02:00
Khalil Meftah 577f14337a refactor(tests): remove grpc import checks from test files for cleaner code 2026-04-27 16:20:13 +02:00
Khalil Meftah 9ce9e01469 refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable 2026-04-27 13:39:03 +02:00
Khalil Meftah 1ed32210c7 refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic 2026-04-24 13:18:33 +02:00
Khalil Meftah 06255996ea refactor(policies): rename policies/sac → policies/gaussian_actor 2026-04-23 19:13:18 +02:00
Khalil Meftah 8065bf15c7 fix test for flat dict structure 2026-04-21 12:06:25 +02:00
Khalil Meftah a4c0c9e358 update losses names in tests 2026-04-21 11:53:32 +02:00
Khalil Meftah a84b0e8132 refactor(sac): decouple algorithm hyperparameters from policy config 2026-04-18 16:40:56 +02:00
Khalil Meftah 7a1c9e74c3 fix: skip tests that require grpc if not available 2026-04-15 15:18:04 +02:00
Khalil Meftah e022207c75 refactor: RL stack refactoring — RLAlgorithm, RLTrainer, DataMixer, and SAC restructuring 2026-04-13 11:39:48 +02:00