Refactored hilserl config

2026-05-16 17:20:05 +00:00 · 2025-08-05 01:24:46 +02:00
parent b292dbbc55
commit 9effc5214f
4 changed files with 289 additions and 104 deletions
@@ -56,27 +56,38 @@ pip install -e ".[hilserl]"

 ### Understanding Configuration

-The training process begins with proper configuration for the HILSerl environment. The configuration class of interest is `HILSerlRobotEnvConfig` in `lerobot/envs/configs.py`. Which is defined as:
+The training process begins with proper configuration for the HILSerl environment. The configuration class of interest is `HILSerlRobotEnvConfig` in `lerobot/envs/configs.py`. The configuration is now organized into focused, nested sub-configs:

 <!-- prettier-ignore-start -->
 ```python
 class HILSerlRobotEnvConfig(EnvConfig):
    robot: RobotConfig | None = None    # Main robot agent (defined in `lerobot/robots`)
-    teleop: TeleoperatorConfig | None = None    # Teleoperator agent, e.g., gamepad or leader arm, (defined in `lerobot/teleoperators`)
-    wrapper: EnvTransformConfig | None = None    # Environment wrapper settings; check `lerobot/scripts/server/gym_manipulator.py`
-    fps: int = 10    # Control frequency
+    teleop: TeleoperatorConfig | None = None    # Teleoperator agent, e.g., gamepad or leader arm
+    processor: HILSerlProcessorConfig    # Processing pipeline configuration (nested)
+    dataset: DatasetConfig    # Dataset recording/replay configuration (nested)
    name: str = "real_robot"    # Environment name
    mode: str | None = None    # "record", "replay", or None (for training)
+    device: str = "cuda"    # Compute device
+
+# Nested processor configuration
+class HILSerlProcessorConfig:
+    control_mode: str = "gamepad"    # Control mode
+    observation: ObservationConfig    # Observation processing settings
+    image_preprocessing: ImagePreprocessingConfig    # Image crop/resize settings
+    gripper: GripperConfig    # Gripper control and penalty settings
+    reset: ResetConfig    # Environment reset and timing settings
+    inverse_kinematics: InverseKinematicsConfig    # IK processing settings
+    reward_classifier: RewardClassifierConfig    # Reward classifier settings
+
+# Dataset configuration
+class DatasetConfig:
    repo_id: str | None = None    # LeRobot dataset repository ID
    dataset_root: str | None = None    # Local dataset root (optional)
    task: str = ""    # Task identifier
    num_episodes: int = 10    # Number of episodes for recording
-    episode: int = 0    # episode index for replay
-    device: str = "cuda"    # Compute device
-    push_to_hub: bool = True    # Whether to push the recorded datasets to Hub
-    pretrained_policy_name_or_path: str | None = None    # For policy loading
-    reward_classifier_pretrained_path: str | None = None    # For reward model
-    number_of_steps_after_success: int = 0    # For reward classifier, collect more positive examples after a success to train a classifier
+    episode: int = 0    # Episode index for replay
+    push_to_hub: bool = True    # Whether to push datasets to Hub
+    fps: int = 10    # Control frequency
 ```
 <!-- prettier-ignore-end -->

@@ -133,19 +144,22 @@ Create a configuration file for recording demonstrations (or edit an existing on
 1. Set `mode` to `"record"`
 2. Specify a unique `repo_id` for your dataset (e.g., "username/task_name")
 3. Set `num_episodes` to the number of demonstrations you want to collect
-4. Set `crop_params_dict` to `null` initially (we'll determine crops later)
+4. Set `processor.image_preprocessing.crop_params_dict` to `{}` initially (we'll determine crops later)
 5. Configure `robot`, `cameras`, and other hardware settings

 Example configuration section:

 ```json
 "mode": "record",
-"repo_id": "username/pick_lift_cube",
-"dataset_root": null,
-"task": "pick_and_lift",
-"num_episodes": 15,
-"episode": 0,
-"push_to_hub": true
+"dataset": {
+    "repo_id": "username/pick_lift_cube",
+    "dataset_root": null,
+    "task": "pick_and_lift",
+    "num_episodes": 15,
+    "episode": 0,
+    "push_to_hub": true,
+    "fps": 10
+}
 ```

 ### Using a Teleoperation Device
@@ -191,10 +205,13 @@ The gamepad provides a very convenient way to control the robot and the episode
 To set up the gamepad, you need to set the `control_mode` to `"gamepad"` and define the `teleop` section in the configuration file.

 ```json
-    "teleop": {
-        "type": "gamepad",
-        "use_gripper": true
-    },
+"teleop": {
+    "type": "gamepad",
+    "use_gripper": true
+},
+"processor": {
+    "control_mode": "gamepad"
+}
 ```

 <p align="center">
@@ -216,11 +233,14 @@ The SO101 leader arm has reduced gears that allows it to move and track the foll
 To set up the SO101 leader, you need to set the `control_mode` to `"leader"` and define the `teleop` section in the configuration file.

 ```json
-    "teleop": {
-        "type": "so101_leader",
-        "port": "/dev/tty.usbmodem585A0077921", # check your port number
-        "use_degrees": true
-    },
+"teleop": {
+    "type": "so101_leader",
+    "port": "/dev/tty.usbmodem585A0077921",
+    "use_degrees": true
+},
+"processor": {
+    "control_mode": "leader"
+}
 ```

 In order to annotate the success/failure of the episode, **you will need** to use a keyboard to press `s` for success, `esc` for failure.
@@ -251,7 +271,7 @@ python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/e

 During recording:

-1. The robot will reset to the initial position defined in the configuration file `fixed_reset_joint_positions`
+1. The robot will reset to the initial position defined in the configuration file `processor.reset.fixed_reset_joint_positions`
 2. Complete the task successfully
 3. The episode ends with a reward of 1 when you press the "success" button
 4. If the time limit is reached, or the fail button is pressed, the episode ends with a reward of 0
@@ -310,11 +330,15 @@ observation.images.front: [180, 250, 120, 150]
 Add these crop parameters to your training configuration:

 ```json
-"crop_params_dict": {
-    "observation.images.side": [180, 207, 180, 200],
-    "observation.images.front": [180, 250, 120, 150]
-},
-"resize_size": [128, 128]
+"processor": {
+    "image_preprocessing": {
+        "crop_params_dict": {
+            "observation.images.side": [180, 207, 180, 200],
+            "observation.images.front": [180, 250, 120, 150]
+        },
+        "resize_size": [128, 128]
+    }
+}
 ```

 **Recommended image resolution**
@@ -346,23 +370,29 @@ python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/r
 - **mode**: set it to `"record"` to collect a dataset
 - **repo_id**: `"hf_username/dataset_name"`, name of the dataset and repo on the hub
 - **num_episodes**: Number of episodes to record
-- **number_of_steps_after_success**: Number of additional frames to record after a success (reward=1) is detected
+- **processor.reset.number_of_steps_after_success**: Number of additional frames to record after a success (reward=1) is detected
 - **fps**: Number of frames per second to record
 - **push_to_hub**: Whether to push the dataset to the hub

-The `number_of_steps_after_success` parameter is crucial as it allows you to collect more positive examples. When a success is detected, the system will continue recording for the specified number of steps while maintaining the reward=1 label. Otherwise, there won't be enough states in the dataset labeled to 1 to train a good classifier.
+The `processor.reset.number_of_steps_after_success` parameter is crucial as it allows you to collect more positive examples. When a success is detected, the system will continue recording for the specified number of steps while maintaining the reward=1 label. Otherwise, there won't be enough states in the dataset labeled to 1 to train a good classifier.

 Example configuration section for data collection:

 ```json
 {
  "mode": "record",
-  "repo_id": "hf_username/dataset_name",
-  "dataset_root": "data/your_dataset",
-  "num_episodes": 20,
-  "push_to_hub": true,
-  "fps": 10,
-  "number_of_steps_after_success": 15
+  "dataset": {
+    "repo_id": "hf_username/dataset_name",
+    "dataset_root": "data/your_dataset",
+    "num_episodes": 20,
+    "push_to_hub": true,
+    "fps": 10
+  },
+  "processor": {
+    "reset": {
+      "number_of_steps_after_success": 15
+    }
+  }
 }
 ```

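The effect of `processor.reset.number_of_steps_after_success` on the label balance can be illustrated with a small sketch. The function `label_episode` is hypothetical and is not part of lerobot. It assumes the success frame itself is labeled 1 and that every extra frame recorded afterwards keeps that label, as the text describes; the actual recording loop may count frames slightly differently.

```python
def label_episode(success_step: int, steps_after_success: int) -> list[float]:
    """Return per-frame rewards for one recorded episode.

    Frames before `success_step` get reward 0. The success frame and the
    `steps_after_success` frames recorded after it get reward 1.
    """
    rewards = [0.0] * success_step                 # frames before success
    rewards += [1.0] * (1 + steps_after_success)   # success frame + extra frames
    return rewards


without_extra = label_episode(success_step=40, steps_after_success=0)
with_extra = label_episode(success_step=40, steps_after_success=15)

print(sum(without_extra), len(without_extra))  # 1.0 41
print(sum(with_extra), len(with_extra))        # 16.0 56
```

With no extra steps, an episode that succeeds at step 40 contributes a single positive frame out of 41. With 15 extra steps it contributes 16 positives out of 56, which gives the classifier far more positive examples per episode.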
@@ -422,7 +452,11 @@ To use your trained reward classifier, configure the `HILSerlRobotEnvConfig` to
 <!-- prettier-ignore-start -->
 ```python
 env_config = HILSerlRobotEnvConfig(
-    reward_classifier_pretrained_path="path_to_your_pretrained_trained_model",
+    processor=HILSerlProcessorConfig(
+        reward_classifier=RewardClassifierConfig(
+            pretrained_path="path_to_your_pretrained_trained_model"
+        )
+    ),
    # Other environment parameters
 )
 ```
@@ -432,7 +466,13 @@ or set the argument in the json config file.

 ```json
 {
-  "reward_classifier_pretrained_path": "path_to_your_pretrained_model"
+  "processor": {
+    "reward_classifier": {
+      "pretrained_path": "path_to_your_pretrained_model",
+      "success_threshold": 0.7,
+      "success_reward": 1.0
+    }
+  }
 }
 ```