Commit Graph

86 Commits

Author SHA1 Message Date
AdilZouitine 539dbd18ce Add review feedback 2025-05-16 14:25:21 +02:00
Adil Zouitine 2051dd38fc [HIL-SERL] Review feedback modifications (#1112) 2025-05-15 15:24:41 +02:00
Eugene Mironov c7a3973653 [PORT HIL-SERL] Better unit tests coverage for SAC policy (#1074) 2025-05-14 16:41:36 +02:00
AdilZouitine b8c2b0bb93 Clean the code and remove todo 2025-04-24 16:10:56 +02:00
AdilZouitine a7a51cfc9c Refactor SACPolicy and configuration to replace 'grasp_critic' terminology with 'discrete_critic'. Update related methods and comments for clarity and consistency in handling discrete actions. 2025-04-18 14:57:03 +00:00
AdilZouitine 54c3c6d684 Enhance MLP class in modeling_sac.py with detailed docstring and refactor layer construction for improved readability. Simplify layer addition logic by removing unnecessary conditions and ensuring consistent handling of activations and dropout. 2025-04-18 14:15:06 +00:00
pre-commit-ci[bot] fb92935601 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 13:33:37 +00:00
AdilZouitine dcd850feab Refactor SACObservationEncoder to improve modularity and readability. Split initialization into dedicated methods for image and state layers, and enhance caching logic for image features. Update forward method to streamline feature encoding and ensure proper normalization handling. 2025-04-18 15:10:22 +02:00
AdilZouitine 1ce368503d Refactor SACPolicy initialization by breaking down the constructor into smaller methods for normalization, encoders, critics, actor, and temperature setup. This enhances readability and maintainability. 2025-04-18 15:10:22 +02:00
AdilZouitine fb075a709d Refactor input and output normalization handling in SACPolicy for improved clarity and efficiency. Consolidate encoder initialization logic and remove redundant else statements. 2025-04-18 15:10:22 +02:00
AdilZouitine 3424644ecd Fix init temp
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-18 15:10:22 +02:00
AdilZouitine c5382a450c fix caching
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-18 15:10:22 +02:00
AdilZouitine 9e5f254db0 change the tanh distribution to match hil serl
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-18 15:10:22 +02:00
AdilZouitine 8122721f6d match target entropy hil serl
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-18 15:10:22 +02:00
AdilZouitine 5c352ae558 stick to hil serl nn architecture
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-18 15:10:22 +02:00
AdilZouitine 9386892f8e Refactor modeling_sac and parameter handling for clarity and reusability.
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-18 15:10:22 +02:00
AdilZouitine 267a837a2c fix encoder training 2025-04-18 15:10:22 +02:00
Michel Aractingi 9fd4c21d4d General fixes in code, removed delta action, fixed grasp penalty, added logic to put gripper reward in info 2025-04-18 15:10:22 +02:00
pre-commit-ci[bot] 02e1ed0bfb [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:10:22 +02:00
AdilZouitine e18274bc9a fix caching and dataset stats is optional 2025-04-18 15:10:22 +02:00
AdilZouitine 68c271ad25 Add rounding for safety 2025-04-18 15:10:22 +02:00
AdilZouitine 203315d378 fix sign issue 2025-04-18 15:10:22 +02:00
AdilZouitine d5a87f67cf Handle gripper penalty 2025-04-18 15:10:22 +02:00
AdilZouitine 8bcf41761d fix caching 2025-04-18 15:10:22 +02:00
AdilZouitine 70130b9841 Enhance SACPolicy to support shared encoder and optimize action selection
- Cached encoder output in select_action method to reduce redundant computations.
- Updated action selection and grasp critic calls to utilize cached encoder features when available.
2025-04-18 15:10:22 +02:00
AdilZouitine 6167886472 Enhance SACPolicy and learner server for improved grasp critic integration
- Updated SACPolicy to conditionally compute grasp critic losses based on the presence of discrete actions.
- Refactored the forward method to handle grasp critic model selection and loss computation more clearly.
- Adjusted learner server to utilize optimized parameters for grasp critic during training.
- Improved action handling in the ManiskillMockGripperWrapper to accommodate both tuple and single action inputs.
2025-04-18 15:10:22 +02:00
AdilZouitine f9fb9d4594 Refactor SACPolicy for improved readability and action dimension handling
- Cleaned up code formatting for better readability, including consistent spacing and removal of unnecessary blank lines.
- Consolidated continuous action dimension calculation to enhance clarity and maintainability.
- Simplified loss return statements in the forward method to improve code structure.
- Ensured grasp critic parameters are included conditionally based on configuration settings.
2025-04-18 15:10:22 +02:00
AdilZouitine d86d29fe21 Add mock gripper support and enhance SAC policy action handling
- Introduced mock_gripper parameter in ManiskillEnvConfig to enable gripper simulation.
- Added ManiskillMockGripperWrapper to adjust action space for environments with discrete actions.
- Updated SACPolicy to compute continuous action dimensions correctly, ensuring compatibility with the new gripper setup.
- Refactored action handling in the training loop to accommodate the changes in action dimensions.
2025-04-18 15:10:22 +02:00
AdilZouitine f83d215e7a Refactor SAC policy and training loop to enhance discrete action support
- Updated SACPolicy to conditionally compute losses for grasp critic based on num_discrete_actions.
- Simplified forward method to return loss outputs as a dictionary for better clarity.
- Adjusted learner_server to handle both main and grasp critic losses during training.
- Ensured optimizers are created conditionally for grasp critic based on configuration settings.
2025-04-18 15:10:22 +02:00
AdilZouitine 7361a11a4d Refactor SAC configuration and policy to support discrete actions
- Removed GraspCriticNetworkConfig class and integrated its parameters into SACConfig.
- Added num_discrete_actions parameter to SACConfig for better action handling.
- Updated SACPolicy to conditionally create grasp critic networks based on num_discrete_actions.
- Enhanced grasp critic forward pass to handle discrete actions and compute losses accordingly.
2025-04-18 15:10:22 +02:00
pre-commit-ci[bot] 88d26ae976 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:10:22 +02:00
s1lent4gnt 3a2308d86f Add grasp critic to the training loop
- Integrated the grasp critic gradient update to the training loop in learner_server
- Added Adam optimizer and configured grasp critic learning rate in configuration_sac
- Added target critics networks update after the critics gradient step
2025-04-18 15:10:22 +02:00
s1lent4gnt 66693965c0 Add grasp critic
- Implemented grasp critic to evaluate gripper actions
- Added corresponding config parameters for tuning
2025-04-18 15:10:22 +02:00
AdilZouitine 5b49601072 Fix convergence of sac, multiple torch compile on the same model caused divergence 2025-04-18 15:10:22 +02:00
AdilZouitine 0185a0b6fd Fix cuda graph break 2025-04-18 15:10:22 +02:00
pre-commit-ci[bot] eb44a06a9b [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:10:22 +02:00
AdilZouitine 4d5ecb082e Refactor SACPolicy for improved type annotations and readability
- Enhanced type annotations for variables in the `SACPolicy` class to improve code clarity.
- Updated method calls to use keyword arguments for better readability.
- Streamlined the extraction of batch components, ensuring consistent typing across the class methods.
2025-04-18 15:10:22 +02:00
AdilZouitine 6e687e2910 Refactor SACPolicy and learner_server for improved clarity and functionality
- Updated the `forward` method in `SACPolicy` to handle loss computation for actor, critic, and temperature models.
- Replaced direct calls to `compute_loss_*` methods with a unified `forward` method in `learner_server`.
- Enhanced batch processing by consolidating input parameters into a single dictionary for better readability and maintainability.
- Removed redundant code and improved documentation for clarity.
2025-04-18 15:10:22 +02:00
AdilZouitine 3beab33fac Refactor imports in modeling_sac.py for improved organization
- Rearranged import statements for better readability.
- Removed unused imports and streamlined the code structure.
2025-04-18 15:10:22 +02:00
Michel Aractingi 05a237ce10 Added gripper control mechanism to gym_manipulator
Moved HilSerl env config to configs/env/configs.py
fixes in actor_server and modeling_sac and configuration_sac
added the possibility of ignoring missing keys in env_cfg in get_features_from_env_config function
2025-04-18 15:10:22 +02:00
AdilZouitine db897a1619 [WIP] Update SAC configuration and environment settings
- Reduced frame rate in `ManiskillEnvConfig` from 400 to 200.
- Enhanced `SACConfig` with new dataclasses for actor, learner, and network configurations.
- Improved input and output feature management in `SACConfig`.
- Refactored `actor_server` and `learner_server` to access configuration properties directly.
- Updated training pipeline to validate configurations and handle dataset repo IDs more robustly.
2025-04-18 15:09:46 +02:00
AdilZouitine 80d566eb56 Handle new config with sac 2025-04-18 15:09:27 +02:00
pre-commit-ci[bot] 0ea27704f6 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:09:25 +02:00
pre-commit-ci[bot] 1c8daf11fd [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:07:46 +02:00
pre-commit-ci[bot] 42f95e827d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:06:52 +02:00
AdilZouitine 618ed00d45 Initialize log_alpha with the logarithm of temperature_init in SACPolicy
- Updated the SACPolicy class to set log_alpha using the logarithm of the initial temperature value from the configuration.
2025-04-18 15:06:52 +02:00
pre-commit-ci[bot] 50d8db481e [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:06:52 +02:00
AdilZouitine e4a5971ffd Remove unused functions and imports from modeling_sac.py
- Deleted the `find_and_copy_params` function and the `Ensemble` class, as they were deemed unnecessary.
- Cleaned up imports by removing `from_modules` from `tensordict` to enhance code clarity.
- Simplified the assertion in the `Policy` class for better readability.
2025-04-18 15:06:52 +02:00
pre-commit-ci[bot] fd74c194b6 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:06:52 +02:00
AdilZouitine 0959694bab Refactor SACPolicy and learner server for improved replay buffer management
- Updated SACPolicy to create critic heads using a list comprehension for better readability.
- Simplified the saving and loading of models using `save_model` and `load_model` functions from the safetensors library.
- Introduced `initialize_offline_replay_buffer` function in the learner server to streamline offline dataset handling and replay buffer initialization.
- Enhanced logging for dataset loading processes to improve traceability during training.
2025-04-18 15:06:52 +02:00