feat(processor): multiple improvements to the pipeline porting (#1749)

* [Port codebase pipeline] General fixes for RL and scripts (#1748)
* Refactor dataset configuration in documentation and codebase
  - Updated dataset configuration keys from `dataset_root` to `root` and `num_episodes` to `num_episodes_to_record` for consistency.
  - Adjusted replay episode handling by renaming `episode` to `replay_episode`.
  - Enhanced documentation
  - added specific processor to transform from policy actions to delta actions
* Added Robot action to tensor processor
  Added new processor script for dealing with gym specific action processing
* removed RobotAction2Tensor processor; improved choosing observations in actor
* nit in delta action
* added missing reset functions to kinematics
* Adapt teleoperate and replay to pipeline similar to record
* refactor(processors): move to inheritance (#1750)
* fix(teleoperator): improvements phone implementation (#1752)
* fix(teleoperator): protect shared state in phone implementation
* refactor(teleop): separate classes in phone
* fix: solve breaking changes (#1753)
* refactor(policies): multiple improvements (#1754)
* refactor(processor): simpler logic in device processor (#1755)
* refactor(processor): euclidean distance in delta action processor (#1757)
* refactor(processor): improvements to joint observations processor migration (#1758)
* refactor(processor): improvements to tokenizer migration (#1759)
* refactor(processor): improvements to tokenizer migration
* fix(tests): tokenizer tests regression from #1750
* fix(processors): fix float comparison and config in hil processors (#1760)
* chore(teleop): remove unnecessary callbacks in KeyboardEndEffectorTeleop (#1761)
* refactor(processor): improvements normalize pipeline migration (#1756)
* refactor(processor): several improvements normalize processor step
* refactor(processor): more improvements normalize processor
* refactor(processor): more changes to normalizer
* refactor(processor): take a different approach to DRY
* refactor(processor): final design
* chore(record): revert comment and continue deleted (#1764)
* refactor(examples): pipeline phone examples (#1769)
* refactor(examples): phone teleop + teleop script
* refactor(examples): phone replay + replay
* chore(examples): rename phone example files & folders
* feat(processor): fix improvements to the pipeline porting (#1796)
* refactor(processor): enhance tensor device handling in normalization process (#1795)
* refactor(tests): remove unsupported device detection test for complementary data (#1797)
* chore(tests): update ToBatchProcessor test (#1798)
* refactor(tests): remove in-place mutation tests for actions and complementary data in batch processor
* test(tests): add tests for action and task processing in batch processor
* add names for android and ios phone (#1799)
* use _tensor_stats in normalize processor (#1800)
* fix(normalize_processor): correct device reference for tensor epsilon handling (#1801)
* add point 5 add missing feature contracts (#1806)
* Fix PR comments 1452 (#1807)
* use key to determine image
* Address rest of PR comments
* use PolicyFeatures in transform_features

---------

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

---------

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
2026-05-17 01:30:14 +00:00 · 2025-08-31 20:38:52 +02:00
parent 35c5d43255
commit ce665160ae
55 changed files with 1549 additions and 2024 deletions
@@ -127,11 +127,11 @@ class RewardClassifierConfig:
 # Dataset configuration
 class DatasetConfig:
    repo_id: str    # LeRobot dataset repository ID
-    dataset_root: str    # Local dataset root directory
    task: str    # Task identifier
-    num_episodes: int    # Number of episodes for recording
-    episode: int    # Episode index for replay
-    push_to_hub: bool    # Whether to push datasets to Hub
+    root: str | None = None    # Local dataset root directory
+    num_episodes_to_record: int = 5    # Number of episodes for recording
+    replay_episode: int | None = None    # Episode index for replay
+    push_to_hub: bool = False    # Whether to push datasets to Hub
 ```
 <!-- prettier-ignore-end -->

@@ -351,7 +351,7 @@ Create a configuration file for recording demonstrations (or edit an existing on

 1. Set `mode` to `"record"` at the root level
 2. Specify a unique `repo_id` for your dataset in the `dataset` section (e.g., "username/task_name")
-3. Set `num_episodes` in the `dataset` section to the number of demonstrations you want to collect
+3. Set `num_episodes_to_record` in the `dataset` section to the number of demonstrations you want to collect
 4. Set `env.processor.image_preprocessing.crop_params_dict` to `{}` initially (we'll determine crops later)
 5. Configure `env.robot`, `env.teleop`, and other hardware settings in the `env` section

@@ -390,10 +390,10 @@ Example configuration section:
  },
  "dataset": {
    "repo_id": "username/pick_lift_cube",
-    "dataset_root": null,
+    "root": null,
    "task": "pick_and_lift",
-    "num_episodes": 15,
-    "episode": 0,
+    "num_episodes_to_record": 15,
+    "replay_episode": 0,
    "push_to_hub": true
  },
  "mode": "record",
@@ -626,7 +626,7 @@ python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/r

 - **mode**: set it to `"record"` to collect a dataset (at root level)
 - **dataset.repo_id**: `"hf_username/dataset_name"`, name of the dataset and repo on the hub
-- **dataset.num_episodes**: Number of episodes to record
+- **dataset.num_episodes_to_record**: Number of episodes to record
 - **env.processor.reset.terminate_on_success**: Whether to automatically terminate episodes when success is detected (default: `true`)
 - **env.fps**: Number of frames per second to record
 - **dataset.push_to_hub**: Whether to push the dataset to the hub
@@ -664,8 +664,8 @@ Example configuration section for data collection:
    "repo_id": "hf_username/dataset_name",
    "dataset_root": "data/your_dataset",
    "task": "reward_classifier_task",
-    "num_episodes": 20,
-    "episode": 0,
+    "num_episodes_to_record": 20,
+    "replay_episode": null,
    "push_to_hub": true
  },
  "mode": "record",
@@ -107,10 +107,10 @@ To collect a dataset, set the mode to `record` whilst defining the repo_id and n
  },
  "dataset": {
    "repo_id": "username/sim_dataset",
-    "dataset_root": null,
+    "root": null,
    "task": "pick_cube",
-    "num_episodes": 10,
-    "episode": 0,
+    "num_episodes_to_record": 10,
+    "replay_episode": null,
    "push_to_hub": true
  },
  "mode": "record"
@@ -36,10 +36,10 @@ To teleoperate and collect a dataset, we need to modify this config file. Here's
  },
  "dataset": {
    "repo_id": "your_username/il_gym",
-    "dataset_root": null,
+    "root": null,
    "task": "pick_cube",
-    "num_episodes": 30,
-    "episode": 0,
+    "num_episodes_to_record": 30,
+    "replay_episode": null,
    "push_to_hub": true
  },
  "mode": "record",
@@ -50,7 +50,7 @@ To teleoperate and collect a dataset, we need to modify this config file. Here's
 Key configuration points:

 - Set your `repo_id` in the `dataset` section: `"repo_id": "your_username/il_gym"`
-- Set `num_episodes: 30` to collect 30 demonstration episodes
+- Set `num_episodes_to_record: 30` to collect 30 demonstration episodes
 - Ensure `mode` is set to `"record"`
 - If you don't have an NVIDIA GPU, change `"device": "cuda"` to `"mps"` for macOS or `"cpu"`
 - To use keyboard instead of gamepad, change `"task"` to `"PandaPickCubeKeyboard-v0"`
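
Note on migrating existing configs: the hunks above rename three keys in the `dataset` section (`dataset_root` to `root`, `num_episodes` to `num_episodes_to_record`, `episode` to `replay_episode`). A minimal sketch of how an older JSON config could be updated is shown below. The helper `migrate_dataset_section` and the mapping `RENAMED_DATASET_KEYS` are hypothetical names introduced for illustration and are not part of this commit or of LeRobot. The sketch assumes the config has already been loaded into a plain `dict`.

```python
import json

# Old key -> new key, taken from the renames shown in this commit's docs.
# (Illustrative mapping; not an object defined in the codebase.)
RENAMED_DATASET_KEYS = {
    "dataset_root": "root",
    "num_episodes": "num_episodes_to_record",
    "episode": "replay_episode",
}


def migrate_dataset_section(config: dict) -> dict:
    """Return a shallow copy of `config` whose `dataset` section uses the new key names."""
    migrated = dict(config)
    if "dataset" not in config:
        # Nothing to rename; leave the config as it is.
        return migrated

    dataset = dict(config["dataset"])
    for old_key, new_key in RENAMED_DATASET_KEYS.items():
        if old_key in dataset:
            value = dataset.pop(old_key)
            # If the new key is already set explicitly, keep that value.
            dataset.setdefault(new_key, value)
    migrated["dataset"] = dataset
    return migrated


# Old-style section, copied from the removed lines of the first JSON example.
old_config = {
    "dataset": {
        "repo_id": "username/pick_lift_cube",
        "dataset_root": None,
        "task": "pick_and_lift",
        "num_episodes": 15,
        "episode": 0,
        "push_to_hub": True,
    },
    "mode": "record",
}

new_config = migrate_dataset_section(old_config)
print(json.dumps(new_config, indent=2))
```

The helper copies the value of each old key as-is, so `"episode": 0` becomes `"replay_episode": 0`. Several of the updated examples use `"replay_episode": null` for recording configs instead, so that value may still need a manual review. One unchanged context line in the hunks above still shows `"dataset_root": "data/your_dataset"`, which suggests some documentation examples may still carry the old key.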