Khalil Meftah
ef927ac830
refactor(rl): move actor weight-sync wire format from policy to algorithm
2026-05-09 22:47:45 +02:00
Khalil Meftah
23811b720d
feat(rl): consolidate HIL-SERL checkpoint into HF-style components
Make `RLAlgorithmConfig` and `RLAlgorithm` `HubMixin`s, add abstract
`state_dict()` / `load_state_dict()` for critic ensemble, target nets
and `log_alpha`, and persist them as a sibling `algorithm/` component
next to `pretrained_model/`. Replace the pickled `training_state.pt`
with an enriched `training_step.json` carrying `step` and
`interaction_step`, so resume restores actor + critics + target nets +
temperature + optimizers + RNG + counters from HF-standard files.
2026-05-09 22:47:20 +02:00
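The `training_step.json` file described in 23811b720d might be written and read roughly as sketched below. Only the file name and the two keys `step` and `interaction_step` come from the commit message. The helper names `save_training_step` and `load_training_step` and the directory layout are hypothetical, not the repository's actual API.

```python
import json
from pathlib import Path

# Hypothetical helpers: the commit only names the file and its two keys.
TRAINING_STEP_FILE = "training_step.json"


def save_training_step(checkpoint_dir: Path, step: int, interaction_step: int) -> Path:
    """Write the optimization and interaction counters as plain JSON."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path = checkpoint_dir / TRAINING_STEP_FILE
    payload = {"step": step, "interaction_step": interaction_step}
    path.write_text(json.dumps(payload, indent=2))
    return path


def load_training_step(checkpoint_dir: Path) -> tuple[int, int]:
    """Read both counters back so a resumed run continues from them."""
    data = json.loads((checkpoint_dir / TRAINING_STEP_FILE).read_text())
    return int(data["step"]), int(data["interaction_step"])
```

Unlike a pickled `training_state.pt`, a JSON file like this can be inspected and edited by hand, and loading it does not execute arbitrary code.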
Khalil Meftah
29fc0c6d28
refactor: replace build_algorithm with make_algorithm for SACAlgorithmConfig and update related tests
2026-05-07 12:09:23 +02:00
Khalil Meftah
e298474bf3
fix(tests): gate RL tests on the datasets extra
2026-04-27 16:53:34 +02:00
Khalil Meftah
577f14337a
refactor(tests): remove grpc import checks from test files for cleaner code
2026-04-27 16:20:13 +02:00
Khalil Meftah
9ce9e01469
refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable
2026-04-27 13:39:03 +02:00
Khalil Meftah
1ed32210c7
refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic
2026-04-24 13:18:33 +02:00
Khalil Meftah
06255996ea
refactor(policies): rename policies/sac → policies/gaussian_actor
2026-04-23 19:13:18 +02:00
Khalil Meftah
8065bf15c7
fix test for flat dict structure
2026-04-21 12:06:25 +02:00
Khalil Meftah
a4c0c9e358
update loss names in tests
2026-04-21 11:53:32 +02:00
Khalil Meftah
a84b0e8132
refactor(sac): decouple algorithm hyperparameters from policy config
2026-04-18 16:40:56 +02:00
Khalil Meftah
7a1c9e74c3
fix: skip tests that require grpc if not available
2026-04-15 15:18:04 +02:00
Khalil Meftah
e022207c75
refactor: RL stack refactoring — RLAlgorithm, RLTrainer, DataMixer, and SAC restructuring
2026-04-13 11:39:48 +02:00