- Add RLAlgorithm base class and RLAlgorithmConfig with draccus.ChoiceRegistry
- Add RLTrainer for unified training orchestration with iterator pattern
- Add DataMixer and OnlineOfflineMixer for online/offline data mixing
- Restructure SAC algorithm with batch iterator and factory pattern
- Add observation normalization pre/post processors
- Add comprehensive tests for all new components
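
Illustration only, not the code added in this change: a minimal sketch of how a batch iterator and online/offline mixing can fit together. The names `batch_iterator`, `mix_online_offline`, and `online_ratio` are hypothetical and do not reflect the actual `DataMixer` / `OnlineOfflineMixer` API. The sketch assumes both buffers are non-empty lists of transition dicts.

```python
import random
from collections.abc import Iterator
from itertools import islice


def batch_iterator(buffer: list[dict], batch_size: int, rng: random.Random) -> Iterator[list[dict]]:
    """Yield random batches from a non-empty buffer forever (sampling with replacement)."""
    while True:
        yield [rng.choice(buffer) for _ in range(batch_size)]


def mix_online_offline(
    online: list[dict],
    offline: list[dict],
    batch_size: int,
    online_ratio: float = 0.5,  # hypothetical parameter: fraction of each batch drawn online
    seed: int = 0,
) -> Iterator[list[dict]]:
    """Yield batches with a fixed number of online and offline transitions."""
    rng = random.Random(seed)
    n_online = round(batch_size * online_ratio)
    n_offline = batch_size - n_online
    online_it = batch_iterator(online, n_online, rng)
    offline_it = batch_iterator(offline, n_offline, rng)
    while True:
        # Concatenate one online and one offline sub-batch into a single batch.
        yield next(online_it) + next(offline_it)


# Usage: a training loop pulls batches from the iterator instead of sampling itself.
online = [{"source": "online", "step": i} for i in range(10)]
offline = [{"source": "offline", "step": i} for i in range(10)]
batches = list(islice(mix_online_offline(online, offline, batch_size=8, online_ratio=0.25), 3))
```

With this shape, the training loop only depends on an iterator of batches, so the mixing policy can be swapped without touching the algorithm's update step.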
* chore: replace hard-coded OBS values with constants throughout the source code
* chore(tests): replace hard-coded OBS values with constants throughout the test code
* chore(rl): move RL-related code to its own top-level directory
* chore(style): apply pre-commit to renamed headers
* test(rl): fix rl imports
* docs(rl): update rl headers doc