VideoDecoderCache used an unbounded dict keyed on absolute path, with no eviction in the standard LeRobotDataset path. With shuffled iteration over datasets that have many distinct mp4 files, every DataLoader worker accumulated one cached (VideoDecoder, fsspec file handle) pair per distinct path it had ever touched. Per-entry cost is ~3-5 MB of host RAM plus one open FD; at ~8 k entries this is roughly 30 GB per worker. This was hit in the wild during a SmolVLA training run on a 4,195-episode SO-101 dataset (8,390 mp4s, two cameras per episode). dmesg showed anon-rss climbing to 34.9 GB on a single pt_data_worker before the OOM killer fired ~30 min into training; with --num_workers=8 the per-worker peak halved to 17.9 GB, which is the expected inverse-scaling signature when the leak is per-decode and the workload is split across workers. The working workaround on the affected platform was --dataset.video_backend=pyav, because the pyav path opens/closes per call and never touches this cache. Switch the backing store to an OrderedDict and evict LRU entries when the cap is reached, closing the evicted file handle inside the lock so we do not leak FDs either. Default cap is DEFAULT_DECODER_CACHE_SIZE = 100, overridable via LEROBOT_VIDEO_DECODER_CACHE_SIZE or by passing max_size= to the constructor; max_size=None restores the legacy unbounded behaviour for callers that need it. Validation on the original failing workload (decode_video_frames_torchcodec called over real mp4s from the affected SO-101 dataset): unbounded: 300 files -> +1087 MB host RSS, cache=300, still climbing cap=50: 500 files -> +266 MB host RSS, cache=50, stable cap=50: 2000 calls -> +312 MB host RSS, cache=50, stable cap=100: 1000 calls -> +470 MB host RSS, cache=100, stable Three independent seeded runs at cap=50 agreed to within 1% (263 / 266 / 265 MB delta), and the 2000-call multi-pass run shows RSS plateaus after the cap is reached instead of drifting. Tests in tests/datasets/test_video_decoder_cache.py cover: default-is-bounded, size cap, LRU ordering, FD close on eviction, FD close on clear(), cache-hit invariance, max_size=None fallback, and env-var override. No regressions in test_video_encoding.py, test_streaming.py, or test_dataset_reader.py (73 prior tests still pass alongside the 8 new ones).
LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch. The goal is to lower the barrier to entry so that everyone can contribute to and benefit from shared datasets and pretrained models.
🤗 A hardware-agnostic, Python-native interface that standardizes control across diverse platforms, from low-cost arms (SO-100) to humanoids.
🤗 A standardized, scalable LeRobotDataset format (Parquet + MP4 or images) hosted on the Hugging Face Hub, enabling efficient storage, streaming and visualization of massive robotic datasets.
🤗 State-of-the-art policies that have been shown to transfer to the real-world ready for training and deployment.
🤗 Comprehensive support for the open-source ecosystem to democratize physical AI.
Quick Start
LeRobot can be installed directly from PyPI.
pip install lerobot
lerobot-info
Important
For detailed installation guide, please see the Installation Documentation.
Robots & Control
LeRobot provides a unified Robot class interface that decouples control logic from hardware specifics. It supports a wide range of robots and teleoperation devices.
from lerobot.robots.myrobot import MyRobot
# Connect to a robot
robot = MyRobot(config=...)
robot.connect()
# Read observation and send action
obs = robot.get_observation()
action = model.select_action(obs)
robot.send_action(action)
Supported Hardware: SO100, LeKiwi, Koch, HopeJR, OMX, EarthRover, Reachy2, Gamepads, Keyboards, Phones, OpenARM, Unitree G1.
While these devices are natively integrated into the LeRobot codebase, the library is designed to be extensible. You can easily implement the Robot interface to utilize LeRobot's data collection, training, and visualization tools for your own custom robot.
For detailed hardware setup guides, see the Hardware Documentation.
LeRobot Dataset
To solve the data fragmentation problem in robotics, we utilize the LeRobotDataset format.
- Structure: Synchronized MP4 videos (or images) for vision and Parquet files for state/action data.
- HF Hub Integration: Explore thousands of robotics datasets on the Hugging Face Hub.
- Tools: Seamlessly delete episodes, split by indices/fractions, add/remove features, and merge multiple datasets.
from lerobot.datasets.lerobot_dataset import LeRobotDataset
# Load a dataset from the Hub
dataset = LeRobotDataset("lerobot/aloha_mobile_cabinet")
# Access data (automatically handles video decoding)
episode_index=0
print(f"{dataset[episode_index]['action'].shape=}\n")
Learn more about it in the LeRobotDataset Documentation
SoTA Models
LeRobot implements state-of-the-art policies in pure PyTorch, covering Imitation Learning, Reinforcement Learning, and Vision-Language-Action (VLA) models, with more coming soon. It also provides you with the tools to instrument and inspect your training process.
Training a policy is as simple as running a script configuration:
lerobot-train \
--policy=act \
--dataset.repo_id=lerobot/aloha_mobile_cabinet
| Category | Models |
|---|---|
| Imitation Learning | ACT, Diffusion, VQ-BeT, Multitask DiT Policy |
| Reinforcement Learning | HIL-SERL, TDMPC & QC-FQL (coming soon) |
| VLAs Models | Pi0Fast, Pi0.5, GR00T N1.5, SmolVLA, XVLA |
Similarly to the hardware, you can easily implement your own policy & leverage LeRobot's data collection, training, and visualization tools, and share your model to the HF Hub
For detailed policy setup guides, see the Policy Documentation. For GPU/RAM requirements and expected training time per policy, see the Compute Hardware Guide.
Inference & Evaluation
Evaluate your policies in simulation or on real hardware using the unified evaluation script. LeRobot supports standard benchmarks like LIBERO, MetaWorld and more to come.
# Evaluate a policy on the LIBERO benchmark
lerobot-eval \
--policy.path=lerobot/pi0_libero_finetuned \
--env.type=libero \
--env.task=libero_object \
--eval.n_episodes=10
Learn how to implement your own simulation environment or benchmark and distribute it from the HF Hub by following the EnvHub Documentation
Resources
- Documentation: The complete guide to tutorials & API.
- Chinese Tutorials: LeRobot+SO-ARM101中文教程-同济子豪兄 Detailed doc for assembling, teleoperate, dataset, train, deploy. Verified by Seed Studio and 5 global hackathon players.
- Discord: Join the
LeRobotserver to discuss with the community. - X: Follow us on X to stay up-to-date with the latest developments.
- Robot Learning Tutorial: A free, hands-on course to learn robot learning using LeRobot.
Citation
If you use LeRobot in your project, please cite the GitHub repository to acknowledge the ongoing development and contributors:
@misc{cadene2024lerobot,
author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascal, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas},
title = {LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
howpublished = "\url{https://github.com/huggingface/lerobot}",
year = {2024}
}
If you are referencing our research or the academic paper, please also cite our ICLR publication:
ICLR 2026 Paper
@inproceedings{cadenelerobot,
title={LeRobot: An Open-Source Library for End-to-End Robot Learning},
author={Cadene, Remi and Alibert, Simon and Capuano, Francesco and Aractingi, Michel and Zouitine, Adil and Kooijmans, Pepijn and Choghari, Jade and Russi, Martino and Pascal, Caroline and Palma, Steven and Shukor, Mustafa and Moss, Jess and Soare, Alexander and Aubakirova, Dana and Lhoest, Quentin and Gallou\'edec, Quentin and Wolf, Thomas},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://arxiv.org/abs/2602.22818}
}
Contribute
We welcome contributions from everyone in the community! To get started, please read our CONTRIBUTING.md guide. Whether you're adding a new feature, improving documentation, or fixing a bug, your help and feedback are invaluable. We're incredibly excited about the future of open-source robotics and can't wait to work with you on what's next—thank you for your support!


