# Train RL in Simulation
This guide explains how to use the `gym_hil` simulation environments as an alternative to real robots when working with the LeRobot framework for Human-In-the-Loop (HIL) reinforcement learning.

`gym_hil` is a package that provides Gymnasium-compatible simulation environments specifically designed for Human-In-the-Loop reinforcement learning. These environments allow you to:

- Train policies in simulation to test the RL stack before training on real robots
- Collect demonstrations in simulation using external devices like gamepads or keyboards
- Perform human interventions during policy learning

Currently, the main environment is a Franka Panda robot simulation based on MuJoCo, with tasks like picking up a cube.
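If you want to poke at the environment directly before wiring it into LeRobot, the sketch below rolls out a few random actions. It assumes the default `gym_hil` registration, where tasks are exposed under the `gym_hil/` namespace once the package is imported:

```python
# Minimal sketch (assumes gym_hil registers its tasks under the "gym_hil/"
# namespace when imported): roll out a few random steps in the cube-picking task.
import gymnasium as gym
import gym_hil  # noqa: F401  # importing registers the Panda environments

env = gym.make("gym_hil/PandaPickCubeBase-v0")
obs, info = env.reset()
for _ in range(10):
    action = env.action_space.sample()  # random end-effector command
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```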
## Installation
First, install the `gym_hil` package within the LeRobot environment:

```bash
pip install -e ".[hilserl]"
```
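To verify that the simulation dependencies resolved correctly, here is a quick hedged check (assuming the `hilserl` extra pulls in `gym_hil` and `mujoco`):

```python
# Sanity check: both imports should succeed after installing the hilserl extra.
import gym_hil   # simulation environments
import mujoco    # physics backend used by gym_hil

print("MuJoCo version:", mujoco.__version__)
```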
## What do I need?
- A gamepad or keyboard to control the robot
- An NVIDIA GPU
## Configuration
To use `gym_hil` with LeRobot, you need to create a configuration file. An example is provided [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/gym_hil_env.json). Key configuration sections include:
### Environment Type and Task
```json
{
    "env": {
        "type": "gym_manipulator",
        "name": "gym_hil",
        "task": "PandaPickCubeGamepad-v0",
        "fps": 10
    },
    "device": "cuda"
}
```

Available tasks:
- `PandaPickCubeBase-v0`: Basic environment
- `PandaPickCubeGamepad-v0`: With gamepad control
- `PandaPickCubeKeyboard-v0`: With keyboard control
### Processor Configuration
```json
{
    "env": {
        "processor": {
            "control_mode": "gamepad",
            "gripper": {
                "use_gripper": true,
                "gripper_penalty": -0.02
            },
            "reset": {
                "control_time_s": 15.0,
                "fixed_reset_joint_positions": [0.0, 0.195, 0.0, -2.43, 0.0, 2.62, 0.785]
            },
            "inverse_kinematics": {
                "end_effector_step_sizes": {
                    "x": 0.025,
                    "y": 0.025,
                    "z": 0.025
                }
            }
        }
    }
}
```

Important parameters:
- `gripper.gripper_penalty`: Penalty for excessive gripper movement
- `gripper.use_gripper`: Whether to enable gripper control
- `inverse_kinematics.end_effector_step_sizes`: Size of each end-effector step along the x, y, and z axes
- `control_mode`: Set to `"gamepad"` to use a gamepad controller
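To make the role of `end_effector_step_sizes` concrete, here is a hedged illustration (not LeRobot's actual processor code) of how a normalized command in [-1, 1] per axis is typically scaled into a small end-effector displacement at each control step:

```python
import numpy as np

# Hypothetical illustration of the delta-action idea: a normalized (x, y, z)
# command is clipped to [-1, 1] and scaled by the configured step sizes.
END_EFFECTOR_STEP_SIZES = {"x": 0.025, "y": 0.025, "z": 0.025}

def delta_from_command(command: np.ndarray) -> np.ndarray:
    """Map a normalized (x, y, z) command to an end-effector displacement."""
    scale = np.array([END_EFFECTOR_STEP_SIZES[k] for k in ("x", "y", "z")])
    return np.clip(command, -1.0, 1.0) * scale

# Pushing the stick fully forward moves the end-effector one full step along x.
print(delta_from_command(np.array([1.0, 0.0, 0.0])))  # [0.025 0.    0.   ]
```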
## Running HIL RL with LeRobot
### Basic Usage
To run the environment, set `mode` to `null` in the configuration:

```bash
python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/gym_hil_env.json
```
### Recording a Dataset
To collect a dataset, set `mode` to `"record"` and define the `repo_id` and the number of episodes to record:

```json
{
    "env": {
        "type": "gym_manipulator",
        "name": "gym_hil",
        "task": "PandaPickCubeGamepad-v0"
    },
    "dataset": {
        "repo_id": "username/sim_dataset",
        "dataset_root": null,
        "task": "pick_cube",
        "num_episodes": 10,
        "episode": 0,
        "push_to_hub": true
    },
    "mode": "record"
}
```

Then launch the recording:

```bash
python -m lerobot.scripts.rl.gym_manipulator --config_path path/to/gym_hil_env.json
```
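Once recording finishes, you can sanity-check the episodes locally. A hedged sketch follows; the `LeRobotDataset` import path matches recent LeRobot releases but may differ in older ones, and `username/sim_dataset` is the placeholder repo id from the config above:

```python
# Hedged sketch: load the recorded dataset and print a quick summary.
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("username/sim_dataset")
print(dataset.num_episodes, "episodes,", dataset.num_frames, "frames")
print(list(dataset.features))  # observation/action keys stored per frame
```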
### Training a Policy
To train a policy, check out the configuration example available [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/train_gym_hil_env.json) and run the actor and learner servers. Start the actor server:

```bash
python -m lerobot.scripts.rl.actor --config_path path/to/train_gym_hil_env.json
```

In a different terminal, run the learner server:

```bash
python -m lerobot.scripts.rl.learner --config_path path/to/train_gym_hil_env.json
```
The simulation environment provides a safe and repeatable way to develop and test your Human-In-the-Loop reinforcement learning components before deploying to real robots.
Congrats 🎉, you have finished this tutorial!
> [!TIP]
> If you have any questions or need help, please reach out on [Discord](https://discord.com/invite/s3KuuzsPFb).
Paper citation:
```
@article{luo2024precise,
    title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
    author={Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey},
    journal={arXiv preprint arXiv:2410.21845},
    year={2024}
}
```