feat(processor): multiple improvements to the pipeline porting (#1749)

* [Port codebase pipeline] General fixes for RL and scripts (#1748)

* Refactor dataset configuration in documentation and codebase

- Updated dataset configuration keys from `dataset_root` to `root` and `num_episodes` to `num_episodes_to_record` for consistency.
- Adjusted replay episode handling by renaming `episode` to `replay_episode`.
- Enhanced documentation
- added specific processor to transform from policy actions to delta actions

* Added Robot action to tensor processor
Added new processor script for dealing with gym specific action processing

* removed RobotAction2Tensor processor; imrpoved choosing observations in actor

* nit in delta action

* added missing reset functions to kinematics

* Adapt teleoperate and replay to pipeline similar to record

* refactor(processors): move to inheritance (#1750)

* fix(teleoperator): improvements phone implementation (#1752)

* fix(teleoperator): protect shared state in phone implementation

* refactor(teleop): separate classes in phone

* fix: solve breaking changes (#1753)

* refactor(policies): multiple improvements (#1754)

* refactor(processor): simpler logic in device processor (#1755)

* refactor(processor): euclidean distance in delta action processor (#1757)

* refactor(processor): improvements to joint observations processor migration (#1758)

* refactor(processor): improvements to tokenizer migration (#1759)

* refactor(processor): improvements to tokenizer migration

* fix(tests): tokenizer tests regression from #1750

* fix(processors): fix float comparison and config in hil processors (#1760)

* chore(teleop): remove unnecessary callbacks in KeyboardEndEffectorTeleop (#1761)

* refactor(processor): improvements normalize pipeline migration (#1756)

* refactor(processor): several improvements normalize processor step

* refactor(processor): more improvements normalize processor

* refactor(processor): more changes to normalizer

* refactor(processor): take a different approach to DRY

* refactor(processor): final design

* chore(record): revert comment and continue deleted (#1764)

* refactor(examples): pipeline phone examples (#1769)

* refactor(examples): phone teleop + teleop script

* refactor(examples): phone replay + replay

* chore(examples): rename phone example files & folders

* feat(processor): fix improvements to the pipeline porting (#1796)

* refactor(processor): enhance tensor device handling in normalization process (#1795)

* refactor(tests): remove unsupported device detection test for complementary data (#1797)

* chore(tests): update ToBatchProcessor test (#1798)

* refactor(tests): remove in-place mutation tests for actions and complementary data in batch processor

* test(tests): add tests for action and task processing in batch processor

* add names for android and ios phone (#1799)

* use _tensor_stats in normalize processor (#1800)

* fix(normalize_processor): correct device reference for tensor epsilon handling (#1801)

* add point 5 add missing feature contracts (#1806)

* Fix PR comments 1452 (#1807)

* use key to determine image

* Address rest of PR comments

* use PolicyFeatures in transform_features

---------

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>

---------

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
This commit is contained in:
Steven Palma
2025-08-31 20:38:52 +02:00
committed by GitHub
parent 35c5d43255
commit ce665160ae
55 changed files with 1549 additions and 2024 deletions
+11 -11
View File
@@ -127,11 +127,11 @@ class RewardClassifierConfig:
# Dataset configuration
class DatasetConfig:
repo_id: str # LeRobot dataset repository ID
dataset_root: str # Local dataset root directory
task: str # Task identifier
num_episodes: int # Number of episodes for recording
episode: int # Episode index for replay
push_to_hub: bool # Whether to push datasets to Hub
root: str | None = None # Local dataset root directory
num_episodes_to_record: int = 5 # Number of episodes for recording
replay_episode: int | None = None # Episode index for replay
push_to_hub: bool = False # Whether to push datasets to Hub
```
<!-- prettier-ignore-end -->
@@ -351,7 +351,7 @@ Create a configuration file for recording demonstrations (or edit an existing on
1. Set `mode` to `"record"` at the root level
2. Specify a unique `repo_id` for your dataset in the `dataset` section (e.g., "username/task_name")
3. Set `num_episodes` in the `dataset` section to the number of demonstrations you want to collect
3. Set `num_episodes_to_record` in the `dataset` section to the number of demonstrations you want to collect
4. Set `env.processor.image_preprocessing.crop_params_dict` to `{}` initially (we'll determine crops later)
5. Configure `env.robot`, `env.teleop`, and other hardware settings in the `env` section
@@ -390,10 +390,10 @@ Example configuration section:
},
"dataset": {
"repo_id": "username/pick_lift_cube",
"dataset_root": null,
"root": null,
"task": "pick_and_lift",
"num_episodes": 15,
"episode": 0,
"num_episodes_to_record": 15,
"replay_episode": 0,
"push_to_hub": true
},
"mode": "record",
@@ -626,7 +626,7 @@ python -m lerobot.scripts.rl.gym_manipulator --config_path src/lerobot/configs/r
- **mode**: set it to `"record"` to collect a dataset (at root level)
- **dataset.repo_id**: `"hf_username/dataset_name"`, name of the dataset and repo on the hub
- **dataset.num_episodes**: Number of episodes to record
- **dataset.num_episodes_to_record**: Number of episodes to record
- **env.processor.reset.terminate_on_success**: Whether to automatically terminate episodes when success is detected (default: `true`)
- **env.fps**: Number of frames per second to record
- **dataset.push_to_hub**: Whether to push the dataset to the hub
@@ -664,8 +664,8 @@ Example configuration section for data collection:
"repo_id": "hf_username/dataset_name",
"dataset_root": "data/your_dataset",
"task": "reward_classifier_task",
"num_episodes": 20,
"episode": 0,
"num_episodes_to_record": 20,
"replay_episode": null,
"push_to_hub": true
},
"mode": "record",
+3 -3
View File
@@ -107,10 +107,10 @@ To collect a dataset, set the mode to `record` whilst defining the repo_id and n
},
"dataset": {
"repo_id": "username/sim_dataset",
"dataset_root": null,
"root": null,
"task": "pick_cube",
"num_episodes": 10,
"episode": 0,
"num_episodes_to_record": 10,
"replay_episode": null,
"push_to_hub": true
},
"mode": "record"
+4 -4
View File
@@ -36,10 +36,10 @@ To teleoperate and collect a dataset, we need to modify this config file. Here's
},
"dataset": {
"repo_id": "your_username/il_gym",
"dataset_root": null,
"root": null,
"task": "pick_cube",
"num_episodes": 30,
"episode": 0,
"num_episodes_to_record": 30,
"replay_episode": null,
"push_to_hub": true
},
"mode": "record",
@@ -50,7 +50,7 @@ To teleoperate and collect a dataset, we need to modify this config file. Here's
Key configuration points:
- Set your `repo_id` in the `dataset` section: `"repo_id": "your_username/il_gym"`
- Set `num_episodes: 30` to collect 30 demonstration episodes
- Set `num_episodes_to_record: 30` to collect 30 demonstration episodes
- Ensure `mode` is set to `"record"`
- If you don't have an NVIDIA GPU, change `"device": "cuda"` to `"mps"` for macOS or `"cpu"`
- To use keyboard instead of gamepad, change `"task"` to `"PandaPickCubeKeyboard-v0"`