# Train RL in Simulation

This guide explains how to use the `gym_hil` simulation environments as an alternative to real robots when working with the LeRobot framework for Human-In-the-Loop (HIL) reinforcement learning.

`gym_hil` is a package that provides Gymnasium-compatible simulation environments specifically designed for Human-In-the-Loop reinforcement learning. These environments allow you to:

- Train policies in simulation to test the RL stack before training on real robots
- Collect demonstrations in simulation using external devices like gamepads or keyboards
- Perform human interventions during policy learning

Currently, the main environment is a Franka Panda robot simulation based on MuJoCo, with tasks like picking up a cube.
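
Because the environments are Gymnasium-compatible, you can also drive one directly before wiring it into LeRobot. A minimal interaction loop is sketched below, assuming `gym_hil` registers its tasks under the `gym_hil/` namespace (observation contents may differ between versions):

```python
# Minimal sketch: step a gym_hil environment directly via the Gymnasium API.
import gymnasium as gym
import gym_hil  # noqa: F401  # importing registers the PandaPickCube* environments

env = gym.make("gym_hil/PandaPickCubeBase-v0")
obs, info = env.reset(seed=0)

for _ in range(100):
    action = env.action_space.sample()  # replace with a policy or teleop input
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```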
## Installation

First, install the `gym_hil` package within the LeRobot environment:

```bash
pip install -e ".[hilserl]"
```
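
As a quick sanity check that the simulation stack installed correctly, you can try importing the packages. This is a hypothetical smoke test, not part of the tutorial's scripts:

```python
# Hypothetical smoke test: verify the simulation dependencies import cleanly.
import gym_hil  # provides the HIL environments
import mujoco   # physics backend used by the Panda tasks

print("gym_hil:", getattr(gym_hil, "__version__", "unknown"))
print("mujoco:", mujoco.__version__)
```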
## What do I need?

- A gamepad or keyboard to control the robot
- An NVIDIA GPU
## Configuration
|
|
|
|
To use `gym_hil` with LeRobot, you need to create a configuration file. An example is provided [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/gym_hil_env.json). Key configuration sections include:
|
|
|
|
### Environment Type and Task
|
|
|
|
```json
|
|
{
|
|
"env": {
|
|
"type": "gym_manipulator",
|
|
"name": "gym_hil",
|
|
"task": "PandaPickCubeGamepad-v0",
|
|
"fps": 10
|
|
},
|
|
"device": "cuda"
|
|
}
|
|
```
Available tasks:

- `PandaPickCubeBase-v0`: Basic environment
- `PandaPickCubeGamepad-v0`: With gamepad control
- `PandaPickCubeKeyboard-v0`: With keyboard control
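
Conceptually, the `name` and `task` fields together identify one of these Gymnasium environments. A rough sketch of that mapping is shown below; `make_hil_env` is a hypothetical helper for illustration, not LeRobot's actual environment factory:

```python
# Illustrative sketch of how the `env` config section selects a Gymnasium environment.
import gymnasium as gym
import gym_hil  # noqa: F401  # registers the gym_hil/* environments

def make_hil_env(cfg: dict) -> gym.Env:
    env_cfg = cfg["env"]
    env_id = f"{env_cfg['name']}/{env_cfg['task']}"  # e.g. "gym_hil/PandaPickCubeGamepad-v0"
    return gym.make(env_id)

env = make_hil_env({"env": {"name": "gym_hil", "task": "PandaPickCubeBase-v0"}})
```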
### Processor Configuration

```json
{
  "env": {
    "processor": {
      "control_mode": "gamepad",
      "gripper": {
        "use_gripper": true,
        "gripper_penalty": -0.02
      },
      "reset": {
        "control_time_s": 15.0,
        "fixed_reset_joint_positions": [0.0, 0.195, 0.0, -2.43, 0.0, 2.62, 0.785]
      },
      "inverse_kinematics": {
        "end_effector_step_sizes": {
          "x": 0.025,
          "y": 0.025,
          "z": 0.025
        }
      }
    }
  }
}
```
Important parameters:

- `control_mode`: Set to `"gamepad"` to use a gamepad controller
- `gripper.use_gripper`: Whether to enable gripper control
- `gripper.gripper_penalty`: Penalty applied for excessive gripper movement
- `inverse_kinematics.end_effector_step_sizes`: Step size along the x, y, and z axes of the end-effector
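
To make these values concrete, the sketch below shows the kind of transformation they drive: scaling teleop input into per-step end-effector deltas and penalizing gripper actuation. The helper functions are illustrative, not LeRobot's actual processor steps:

```python
# Illustrative sketch of what the processor settings above control.
# `scale_teleop_input` and `apply_gripper_penalty` are hypothetical helpers.
STEP_SIZES = {"x": 0.025, "y": 0.025, "z": 0.025}  # end_effector_step_sizes
GRIPPER_PENALTY = -0.02                            # gripper.gripper_penalty

def scale_teleop_input(axes: dict[str, float]) -> dict[str, float]:
    """Map normalized gamepad axes in [-1, 1] to per-step end-effector deltas (meters)."""
    return {ax: STEP_SIZES[ax] * val for ax, val in axes.items()}

def apply_gripper_penalty(reward: float, gripper_moved: bool) -> float:
    """Discourage unnecessary gripper actuation by penalizing the reward."""
    return reward + (GRIPPER_PENALTY if gripper_moved else 0.0)

delta = scale_teleop_input({"x": 1.0, "y": 0.0, "z": -0.5})
# -> {"x": 0.025, "y": 0.0, "z": -0.0125}: a full stick deflection moves 2.5 cm per step
```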
## Running HIL RL with LeRobot

### Basic Usage

To simply run the environment (no recording or replay), set `"mode"` to `null` in the config file:

```bash
python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/gym_hil_env.json
```
### Recording a Dataset

To collect a dataset, set `"mode"` to `"record"` and define the `repo_id` and the number of episodes to record:

```json
{
  "env": {
    "type": "gym_manipulator",
    "name": "gym_hil",
    "task": "PandaPickCubeGamepad-v0"
  },
  "dataset": {
    "repo_id": "username/sim_dataset",
    "root": null,
    "task": "pick_cube",
    "num_episodes_to_record": 10,
    "replay_episode": null,
    "push_to_hub": true
  },
  "mode": "record"
}
```

Then run the same script:

```bash
python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/gym_hil_env.json
```
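
Once recording finishes, it is worth sanity-checking the dataset before training. A minimal sketch, assuming the `LeRobotDataset` API (the exact import path may differ across LeRobot versions):

```python
# Hedged sketch: inspect a freshly recorded dataset.
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("username/sim_dataset")
print("episodes:", dataset.num_episodes)
print("frames:", dataset.num_frames)
print("features:", list(dataset.features))
```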
### Training a Policy

To train a policy, check out the configuration example available [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/train_gym_hil_env.json) and run the actor and learner servers. First, start the actor server:

```bash
python -m lerobot.scripts.rl.actor --config_path path/to/train_gym_hil_env.json
```

In a different terminal, run the learner server:

```bash
python -m lerobot.scripts.rl.learner --config_path path/to/train_gym_hil_env.json
```
The simulation environment provides a safe and repeatable way to develop and test your Human-In-the-Loop reinforcement learning components before deploying to real robots.

Congrats 🎉, you have finished this tutorial!

> [!TIP]
> If you have any questions or need help, please reach out on [Discord](https://discord.com/invite/s3KuuzsPFb).

Paper citation:

```
@article{luo2024precise,
  title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
  author={Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey},
  journal={arXiv preprint arXiv:2410.21845},
  year={2024}
}
```