# Imitation Learning on Real-World Robots This tutorial will explain how to train a neural network to control a real robot autonomously. **You'll learn:** 1. How to record and visualize your dataset. 2. How to train a policy using your data and prepare it for evaluation. 3. How to evaluate your policy and visualize the results. By following these steps, you'll be able to replicate tasks, such as picking up a Lego block and placing it in a bin with a high success rate, as shown in the video below.
Video: pickup lego block task
This tutorial isn’t tied to a specific robot: we walk you through the commands and API snippets you can adapt for any supported platform. During data collection, you’ll use a “teloperation” device, such as a leader arm or keyboard to teleoperate the robot and record its motion trajectories. Once you’ve gathered enough trajectories, you’ll train a neural network to imitate these trajectories and deploy the trained model so your robot can perform the task autonomously. If you run into any issues at any point, jump into our [Discord community](https://discord.com/invite/s3KuuzsPFb) for support. Want to quickly get the right commands for your setup? The [quickstart notebook](https://github.com/huggingface/lerobot/blob/main/examples/notebooks/quickstart.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/lerobot/blob/main/examples/notebooks/quickstart.ipynb) lets you configure your robot once and generates all the commands below ready to paste. ## Set up and Calibrate If you haven't yet set up and calibrated your robot and teleop device, please do so by following the robot-specific tutorial. ## Teleoperate In this example, we’ll demonstrate how to teleoperate the SO101 robot. For each command, we also provide a corresponding API example. Note that the `id` associated with a robot is used to store the calibration file. It's important to use the same `id` when teleoperating, recording, and evaluating when using the same setup. ```bash lerobot-teleoperate \ --robot.type=so101_follower \ --robot.port=/dev/tty.usbmodem58760431541 \ --robot.id=my_awesome_follower_arm \ --teleop.type=so101_leader \ --teleop.port=/dev/tty.usbmodem58760431551 \ --teleop.id=my_awesome_leader_arm ``` ```python from lerobot.teleoperators.so_leader import SO101Leader, SO101LeaderConfig from lerobot.robots.so_follower import SO101Follower, SO101FollowerConfig robot_config = SO101FollowerConfig( port="/dev/tty.usbmodem58760431541", id="my_red_robot_arm", ) teleop_config = SO101LeaderConfig( port="/dev/tty.usbmodem58760431551", id="my_blue_leader_arm", ) robot = SO101Follower(robot_config) teleop_device = SO101Leader(teleop_config) robot.connect() teleop_device.connect() while True: action = teleop_device.get_action() robot.send_action(action) ``` The teleoperate command will automatically: 1. Identify any missing calibrations and initiate the calibration procedure. 2. Connect the robot and teleop device and start teleoperation. ## Cameras To add cameras to your setup, follow this [Guide](./cameras#setup-cameras). ## Teleoperate with cameras With `rerun`, you can teleoperate again while simultaneously visualizing the camera feeds and joint positions. In this example, we’re using the Koch arm. ```bash lerobot-teleoperate \ --robot.type=koch_follower \ --robot.port=/dev/tty.usbmodem58760431541 \ --robot.id=my_awesome_follower_arm \ --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 1920, height: 1080, fps: 30}}" \ --teleop.type=koch_leader \ --teleop.port=/dev/tty.usbmodem58760431551 \ --teleop.id=my_awesome_leader_arm \ --display_data=true ``` ```python from lerobot.cameras.opencv import OpenCVCameraConfig from lerobot.teleoperators.koch_leader import KochLeader, KochLeaderConfig from lerobot.robots.koch_follower import KochFollower, KochFollowerConfig camera_config = { "front": OpenCVCameraConfig(index_or_path=0, width=1920, height=1080, fps=30) } robot_config = KochFollowerConfig( port="/dev/tty.usbmodem585A0076841", id="my_red_robot_arm", cameras=camera_config ) teleop_config = KochLeaderConfig( port="/dev/tty.usbmodem58760431551", id="my_blue_leader_arm", ) robot = KochFollower(robot_config) teleop_device = KochLeader(teleop_config) robot.connect() teleop_device.connect() while True: observation = robot.get_observation() action = teleop_device.get_action() robot.send_action(action) ``` ## Record a dataset Once you're familiar with teleoperation, you can record your first dataset. We use the Hugging Face hub features for uploading your dataset. If you haven't previously used the Hub, make sure you can login via the cli using a write-access token, this token can be generated from the [Hugging Face settings](https://huggingface.co/settings/tokens). Add your token to the CLI by running this command: ```bash hf auth login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential ``` Then store your Hugging Face repository name in a variable: ```bash HF_USER=$(NO_COLOR=1 hf auth whoami | awk -F': *' 'NR==1 {print $2}') echo $HF_USER ``` Now you can record a dataset. To record 5 episodes and upload your dataset to the hub, adapt the code below for your robot and execute the command or API example. ```bash lerobot-record \ --robot.type=so101_follower \ --robot.port=/dev/tty.usbmodem585A0076841 \ --robot.id=my_awesome_follower_arm \ --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 1920, height: 1080, fps: 30}}" \ --teleop.type=so101_leader \ --teleop.port=/dev/tty.usbmodem58760431551 \ --teleop.id=my_awesome_leader_arm \ --display_data=true \ --dataset.repo_id=${HF_USER}/record-test \ --dataset.num_episodes=5 \ --dataset.single_task="Grab the black cube" \ --dataset.streaming_encoding=true \ # --dataset.vcodec=auto \ --dataset.encoder_threads=2 ``` ```python from lerobot.cameras.opencv import OpenCVCameraConfig from lerobot.datasets import LeRobotDataset from lerobot.utils.feature_utils import hw_to_dataset_features from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig from lerobot.teleoperators.so_leader import SO100Leader, SO100LeaderConfig from lerobot.common.control_utils import init_keyboard_listener from lerobot.utils.utils import log_say from lerobot.utils.visualization_utils import init_rerun from lerobot.scripts.lerobot_record import record_loop from lerobot.processor import make_default_processors NUM_EPISODES = 5 FPS = 30 EPISODE_TIME_SEC = 60 RESET_TIME_SEC = 10 TASK_DESCRIPTION = "My task description" # Create robot configuration robot_config = SO100FollowerConfig( id="my_awesome_follower_arm", cameras={ "front": OpenCVCameraConfig(index_or_path=0, width=640, height=480, fps=FPS) # Optional: fourcc="MJPG" for troubleshooting OpenCV async error. }, port="/dev/tty.usbmodem58760434471", ) teleop_config = SO100LeaderConfig( id="my_awesome_leader_arm", port="/dev/tty.usbmodem585A0077581", ) # Initialize the robot and teleoperator robot = SO100Follower(robot_config) teleop = SO100Leader(teleop_config) # Configure the dataset features action_features = hw_to_dataset_features(robot.action_features, "action") obs_features = hw_to_dataset_features(robot.observation_features, "observation") dataset_features = {**action_features, **obs_features} # Create the dataset dataset = LeRobotDataset.create( repo_id="/", fps=FPS, features=dataset_features, robot_type=robot.name, use_videos=True, image_writer_threads=4, ) # Initialize the keyboard listener and rerun visualization _, events = init_keyboard_listener() init_rerun(session_name="recording") # Connect the robot and teleoperator robot.connect() teleop.connect() # Create the required processors teleop_action_processor, robot_action_processor, robot_observation_processor = make_default_processors() episode_idx = 0 while episode_idx < NUM_EPISODES and not events["stop_recording"]: log_say(f"Recording episode {episode_idx + 1} of {NUM_EPISODES}") record_loop( robot=robot, events=events, fps=FPS, teleop_action_processor=teleop_action_processor, robot_action_processor=robot_action_processor, robot_observation_processor=robot_observation_processor, teleop=teleop, dataset=dataset, control_time_s=EPISODE_TIME_SEC, single_task=TASK_DESCRIPTION, display_data=True, ) # Reset the environment if not stopping or re-recording if not events["stop_recording"] and (episode_idx < NUM_EPISODES - 1 or events["rerecord_episode"]): log_say("Reset the environment") record_loop( robot=robot, events=events, fps=FPS, teleop_action_processor=teleop_action_processor, robot_action_processor=robot_action_processor, robot_observation_processor=robot_observation_processor, teleop=teleop, control_time_s=RESET_TIME_SEC, single_task=TASK_DESCRIPTION, display_data=True, ) if events["rerecord_episode"]: log_say("Re-recording episode") events["rerecord_episode"] = False events["exit_early"] = False dataset.clear_episode_buffer() continue dataset.save_episode() episode_idx += 1 # Clean up log_say("Stop recording") robot.disconnect() teleop.disconnect() dataset.push_to_hub() ``` #### Dataset upload Locally, your dataset is stored in this folder: `~/.cache/huggingface/lerobot/{repo-id}`. At the end of data recording, your dataset will be uploaded on your Hugging Face page (e.g. `https://huggingface.co/datasets/${HF_USER}/so101_test`) that you can obtain by running: ```bash echo https://huggingface.co/datasets/${HF_USER}/so101_test ``` Your dataset will be automatically tagged with `LeRobot` for the community to find it easily, and you can also add custom tags (in this case `tutorial` for example). You can look for other LeRobot datasets on the hub by searching for `LeRobot` [tags](https://huggingface.co/datasets?other=LeRobot). You can also push your local dataset to the Hub manually, running: ```bash hf upload ${HF_USER}/record-test ~/.cache/huggingface/lerobot/{repo-id} --repo-type dataset ``` #### Record function The `record` function provides a suite of tools for capturing and managing data during robot operation: ##### 1. Data Storage - Data is stored using the `LeRobotDataset` format and is stored on disk during recording. - By default, the dataset is pushed to your Hugging Face page after recording. - To disable uploading, use `--dataset.push_to_hub=False`. ##### 2. Checkpointing and Resuming - Checkpoints are automatically created during recording. - If an issue occurs, you can resume by re-running the same command with `--resume=true`. When resuming a recording, `--dataset.num_episodes` must be set to the **number of additional episodes to be recorded**, and not to the targeted total number of episodes in the dataset ! - To start recording from scratch, **manually delete** the dataset directory. ##### 3. Recording Parameters Set the flow of data recording using command-line arguments: - `--dataset.episode_time_s=60` Duration of each data recording episode (default: **60 seconds**). - `--dataset.reset_time_s=60` Duration for resetting the environment after each episode (default: **60 seconds**). - `--dataset.num_episodes=50` Total number of episodes to record (default: **50**). ##### 4. Keyboard Controls During Recording Control the data recording flow using keyboard shortcuts: - Press **Right Arrow (`→`)**: Early stop the current episode or reset time and move to the next. - Press **Left Arrow (`←`)**: Cancel the current episode and re-record it. - Press **Escape (`ESC`)**: Immediately stop the session, encode videos, and upload the dataset. #### Tips for gathering data Once you're comfortable with data recording, you can create a larger dataset for training. A good starting task is grasping an object at different locations and placing it in a bin. We suggest recording at least 50 episodes, with 10 episodes per location. Keep the cameras fixed and maintain consistent grasping behavior throughout the recordings. Also make sure the object you are manipulating is visible on the camera's. A good rule of thumb is you should be able to do the task yourself by only looking at the camera images. In the following sections, you’ll train your neural network. After achieving reliable grasping performance, you can start introducing more variations during data collection, such as additional grasp locations, different grasping techniques, and altering camera positions. Avoid adding too much variation too quickly, as it may hinder your results. If you want to dive deeper into this important topic, you can check out the [blog post](https://huggingface.co/blog/lerobot-datasets#what-makes-a-good-dataset) we wrote on what makes a good dataset. #### Troubleshooting: - On Linux, if the left and right arrow keys and escape key don't have any effect during data recording, make sure you've set the `$DISPLAY` environment variable. See [pynput limitations](https://pynput.readthedocs.io/en/latest/limitations.html#linux). ## Visualize a dataset If you uploaded your dataset to the hub with `--control.push_to_hub=true`, you can [visualize your dataset online](https://huggingface.co/spaces/lerobot/visualize_dataset) by copy pasting your repo id given by: ```bash echo ${HF_USER}/so101_test ``` ## Replay an episode A useful feature is the `replay` function, which allows you to replay any episode that you've recorded or episodes from any dataset out there. This function helps you test the repeatability of your robot's actions and assess transferability across robots of the same model. You can replay the first episode on your robot with either the command below or with the API example: ```bash lerobot-replay \ --robot.type=so101_follower \ --robot.port=/dev/tty.usbmodem58760431541 \ --robot.id=my_awesome_follower_arm \ --dataset.repo_id=${HF_USER}/record-test \ --dataset.episode=0 # choose the episode you want to replay ``` ```python import time from lerobot.datasets import LeRobotDataset from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig from lerobot.utils.robot_utils import precise_sleep from lerobot.utils.utils import log_say episode_idx = 0 robot_config = SO100FollowerConfig(port="/dev/tty.usbmodem58760434471", id="my_awesome_follower_arm") robot = SO100Follower(robot_config) robot.connect() dataset = LeRobotDataset("/", episodes=[episode_idx]) actions = dataset.select_columns("action") log_say(f"Replaying episode {episode_idx}") for idx in range(dataset.num_frames): t0 = time.perf_counter() action = { name: float(actions[idx]["action"][i]) for i, name in enumerate(dataset.features["action"]["names"]) } robot.send_action(action) precise_sleep(max(1.0 / dataset.fps - (time.perf_counter() - t0), 0.0)) robot.disconnect() ``` Your robot should replicate movements similar to those you recorded. For example, check out [this video](https://x.com/RemiCadene/status/1793654950905680090) where we use `replay` on a Aloha robot from [Trossen Robotics](https://www.trossenrobotics.com). ## Train a policy To train a policy to control your robot, use the [`lerobot-train`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/lerobot_train.py) script. A few arguments are required. Here is an example command: ```bash lerobot-train \ --dataset.repo_id=${HF_USER}/so101_test \ --policy.type=act \ --output_dir=outputs/train/act_so101_test \ --job_name=act_so101_test \ --policy.device=cuda \ --wandb.enable=true \ --policy.repo_id=${HF_USER}/my_policy ``` Let's explain the command: 1. We provided the dataset as argument with `--dataset.repo_id=${HF_USER}/so101_test`. 2. We provided the policy with `policy.type=act`. This loads configurations from [`configuration_act.py`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/policies/act/configuration_act.py). Importantly, this policy will automatically adapt to the number of motor states, motor actions and cameras of your robot (e.g. `laptop` and `phone`) which have been saved in your dataset. 3. We provided `policy.device=cuda` since we are training on a Nvidia GPU, but you could use `policy.device=mps` to train on Apple silicon. 4. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`. Training should take several hours. You will find checkpoints in `outputs/train/act_so101_test/checkpoints`. To resume training from a checkpoint, below is an example command to resume from `last` checkpoint of the `act_so101_test` policy: ```bash lerobot-train \ --config_path=outputs/train/act_so101_test/checkpoints/last/pretrained_model/train_config.json \ --resume=true ``` If you do not want to push your model to the hub after training use `--policy.push_to_hub=false`. Additionally you can provide extra `tags` or specify a `license` for your model or make the model repo `private` by adding this: `--policy.private=true --policy.tags=\[ppo,rl\] --policy.license=mit` #### Train using Google Colab If your local computer doesn't have a powerful GPU you could utilize Google Colab to train your model by following the [ACT training notebook](./notebooks#training-act). #### Upload policy checkpoints Once training is done, upload the latest checkpoint with: ```bash hf upload ${HF_USER}/act_so101_test \ outputs/train/act_so101_test/checkpoints/last/pretrained_model ``` You can also upload intermediate checkpoints with: ```bash CKPT=010000 hf upload ${HF_USER}/act_so101_test${CKPT} \ outputs/train/act_so101_test/checkpoints/${CKPT}/pretrained_model ``` ## Run inference and evaluate your policy Use `lerobot-rollout` to deploy a trained policy on your robot. You can choose different strategies depending on your needs: ```bash lerobot-rollout \ --strategy.type=base \ --policy.path=${HF_USER}/my_policy \ --robot.type=so100_follower \ --robot.port=/dev/ttyACM1 \ --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \ --task="Put lego brick into the transparent box" \ --duration=60 ``` ```bash lerobot-rollout \ --strategy.type=sentry \ --strategy.upload_every_n_episodes=5 \ --policy.path=${HF_USER}/my_policy \ --robot.type=so100_follower \ --robot.port=/dev/ttyACM1 \ --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \ --dataset.repo_id=${HF_USER}/eval_so100 \ --dataset.single_task="Put lego brick into the transparent box" \ --duration=600 ``` The `--strategy.type` flag selects the execution mode: - `base`: Autonomous rollout with no data recording (useful for quick evaluation) - `sentry`: Continuous recording with auto-upload (useful for large-scale evaluation) - `highlight`: Ring buffer recording with keystroke save (useful for capturing interesting events) - `dagger`: Human-in-the-loop data collection (see [HIL Data Collection](./hil_data_collection)) All strategies support `--inference.type=rtc` for smooth execution with slow VLA models (Pi0, Pi0.5, SmolVLA).