Khalil Meftah
0944b84279
feat(rl): consolidate HIL-SERL checkpoint into HF-style components
Make and s, add abstract
/ for algorithm-owned tensors (critics,
target nets, ), and persist them as a sibling
component next to . Replace the pickled
side-file with an enriched
carrying both and , so resume restores actor +
critics + target nets + temperature + optimizers + RNG + counters from
plain HF-standard files.
2026-05-08 21:24:23 +02:00
Khalil Meftah
29fc0c6d28
refactor: replace build_algorithm with make_algorithm for SACAlgorithmConfig and update related tests
2026-05-07 12:09:23 +02:00
Khalil Meftah
e298474bf3
fix(tests): gate RL tests on the datasets extra
2026-04-27 16:53:34 +02:00
Khalil Meftah
577f14337a
refactor(tests): remove grpc import checks from test files for cleaner code
2026-04-27 16:20:13 +02:00
Khalil Meftah
9ce9e01469
refactor(rl): make algorithm a nested config so all SAC hyperparameters are JSON-addressable
2026-04-27 13:39:03 +02:00
Khalil Meftah
1ed32210c7
refactor(rl/sac): consolidate hyperparameter ownership and clean up discrete critic
2026-04-24 13:18:33 +02:00
Khalil Meftah
06255996ea
refactor(policies): rename policies/sac → policies/gaussian_actor
2026-04-23 19:13:18 +02:00
Khalil Meftah
8065bf15c7
fix test for flat dict structure
2026-04-21 12:06:25 +02:00
Khalil Meftah
a4c0c9e358
update losses names in tests
2026-04-21 11:53:32 +02:00
Khalil Meftah
a84b0e8132
refactor(sac): decouple algorithm hyperparameters from policy config
2026-04-18 16:40:56 +02:00
Khalil Meftah
7a1c9e74c3
fix: skip tests that require grpc if not available
2026-04-15 15:18:04 +02:00
Khalil Meftah
e022207c75
refactor: RL stack refactoring — RLAlgorithm, RLTrainer, DataMixer, and SAC restructuring
2026-04-13 11:39:48 +02:00