Covers feature mapping, auto-padding, per-dataset transforms,
weighted sampling, stats aggregation, and full config examples
for training across RoboCasa, LIBERO-plus, and RoboMME datasets.
Made-with: Cursor
Multi-dataset training support:
- NewMultiLeRobotDataset with per-dataset feature mapping, auto-padding,
per-dataset transform pipelines, and weighted sampling
- MultiDatasetMeta shim compatible with EpisodeAwareSampler and make_policy
- WeightedEpisodeAwareSampler for proportional cross-dataset sampling
- SubDatasetConfig / MultiDatasetConfig in training configs
- DatasetTransformPipeline with built-in PadAction, PadState, ResizeImages
- Factory and training script wired up for multi-dataset path
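The weighted-sampling idea above can be sketched independently of LeRobot's classes: pick a sub-dataset proportionally to its configured weight, then pick a frame inside it. This is a minimal, torch-free illustration; the function name and signature are invented here and do not mirror the actual `WeightedEpisodeAwareSampler` API.

```python
import random

def weighted_dataset_indices(dataset_lengths, weights, num_samples, seed=0):
    """Draw flat sample indices proportionally to per-dataset weights."""
    rng = random.Random(seed)
    # Cumulative offsets so each sub-dataset maps to a contiguous index range.
    offsets = [0]
    for n in dataset_lengths:
        offsets.append(offsets[-1] + n)
    indices = []
    for _ in range(num_samples):
        # Pick a sub-dataset proportionally to its weight, then a frame within it.
        d = rng.choices(range(len(dataset_lengths)), weights=weights, k=1)[0]
        indices.append(offsets[d] + rng.randrange(dataset_lengths[d]))
    return indices
```

With weights `[3, 1]` over two datasets, roughly three quarters of the drawn indices land in the first dataset regardless of its size.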
RoboMME environment integration:
- RoboMMEEnv config and Gymnasium wrapper (robomme.py)
- robomme optional dependency in pyproject.toml
Made-with: Cursor
Add LiberoPlusEnv config (subclass of LiberoEnv), register libero_plus
env type in factory, add import fallbacks for LIBERO-plus package
structure, and add libero_plus optional dependency group in pyproject.toml.
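An import fallback for differing package layouts typically looks like the sketch below. The module paths are placeholders, not the real LIBERO-plus structure:

```python
# Hypothetical fallback: try the LIBERO-plus layout first, then upstream LIBERO.
try:
    from libero_plus.envs import OffScreenRenderEnv  # LIBERO-plus layout (assumed)
except ImportError:
    try:
        from libero.libero.envs import OffScreenRenderEnv  # upstream layout (assumed)
    except ImportError:
        # Neither package installed; env creation should raise a clear error later.
        OffScreenRenderEnv = None
```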
Made-with: Cursor
Integrates 5 selected RoboCasa kitchen tasks (3 short + 2 long) as a
LeRobot benchmark environment, following the same pattern as Libero.
Selected tasks:
Short: PickPlaceCounterToCabinet, PrepareToast, CoffeeSetupMug
Long: PrepareCoffee, RestockPantry
Changes:
- envs/robocasa.py: RoboCasaEnv wrapper with flat 12D Box action space,
3-camera pixel obs, and 16D proprioceptive state
- envs/configs.py: RoboCasaEnv config with features_map
- envs/factory.py: wire robocasa into make_env + make_env_pre_post_processors
- processor/env_processor.py: RoboCasaProcessorStep for obs key remapping
- tests/test_robocasa_env.py: full test suite (auto-skips if assets missing)
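The interface described above (flat 12D action space, three camera views, 16D proprioceptive state) implies observations shaped like this sketch; the camera names, key layout, and image size are assumptions, not the wrapper's actual keys:

```python
import numpy as np

ACTION_DIM = 12   # flat Box action space
STATE_DIM = 16    # proprioceptive state
CAMERAS = ("agentview_left", "agentview_right", "eye_in_hand")  # names assumed

def fake_observation(h=128, w=128):
    """Build a dummy observation matching the documented layout."""
    obs = {f"pixels/{cam}": np.zeros((h, w, 3), dtype=np.uint8) for cam in CAMERAS}
    obs["agent_pos"] = np.zeros(STATE_DIM, dtype=np.float32)
    return obs
```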
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a `cudnn_deterministic` flag to `TrainPipelineConfig` (default: False)
that sets `torch.backends.cudnn.deterministic = True` and disables benchmark
mode, eliminating CUDA floating-point non-determinism at the cost of ~10-20%
training speed. When False (default) the existing benchmark=True behaviour
is preserved.
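The flag's effect on the cuDNN backend can be summarized in a small torch-free helper; the real config applies these values to `torch.backends.cudnn` directly.

```python
def cudnn_settings(cudnn_deterministic: bool) -> dict:
    """Flags the option implies for torch.backends.cudnn (helper name invented)."""
    if cudnn_deterministic:
        # Reproducible kernels; benchmark autotuning must be off.
        return {"deterministic": True, "benchmark": False}
    # Default: keep the existing benchmark=True behaviour.
    return {"deterministic": False, "benchmark": True}
```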
* fix(ci): skip HF login (and tests) in forks and community PRs
* chore(test): remove comment about test meant to be only run locally
* fix(tests): no HF login in decorator for xvla
* fix(test): no decorator in yield
* Add SLURM SARM progress annotation script.
Provide a standalone two-stage compute/aggregate pipeline for RA-BC progress generation so large datasets can be processed in parallel and optionally uploaded to the Hub.
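A two-stage compute/aggregate pipeline of this shape is sketched below with invented names: each parallel job writes one shard, and a final job merges them. The real script's arguments and file formats will differ.

```python
import json
import pathlib

def compute_shard(out_dir, shard_id, episodes):
    """Stage 1: one SLURM array task annotates its slice of episodes."""
    result = {str(ep): f"progress_{ep}" for ep in episodes}  # placeholder payload
    path = pathlib.Path(out_dir) / f"shard_{shard_id}.json"
    path.write_text(json.dumps(result))
    return path

def aggregate(out_dir):
    """Stage 2: merge all shards into a single annotation mapping."""
    merged = {}
    for p in sorted(pathlib.Path(out_dir).glob("shard_*.json")):
        merged.update(json.loads(p.read_text()))
    return merged
```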
Made-with: Cursor
* fix pr comments
* remove comments
* chore(docstrings): updating v2.1-v3.0 conversion script docstrings to match the new task label
* chore(task): renaming the default index label in the tasks DataFrame to task
* Revert "chore(docstrings): updating v2.1-v3.0 conversion script docstrings to match the new task label"
This reverts commit f55de3255278f23f18b5d955565f6768d094951d.
* chore(docstrings): updating docstrings to match dataset v3.0 architecture
* chore(format): formatting code
* Fixing metadata indexing when writing new Parquet file
Summary:
- addressing this issue: https://github.com/huggingface/lerobot/issues/2401
- vibe-coded bugfix by Claude Sonnet 4.5
* Backing out changes to convert_videos_of_camera
* Addressing Ruff pre-commit complaint
Summary:
- addressing "SIM113 Use `enumerate()` for index variable `ep_idx` in `for` loop"
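The SIM113 fix replaces a manually incremented `ep_idx` with `enumerate()`, for example:

```python
episodes = ["ep_a", "ep_b", "ep_c"]

# Before (flags Ruff SIM113): a manually incremented counter
# ep_idx = 0
# for ep in episodes:
#     rows.append((ep_idx, ep))
#     ep_idx += 1

# After: enumerate() owns the index
rows = []
for ep_idx, ep in enumerate(episodes):
    rows.append((ep_idx, ep))
```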
---------
Co-authored-by: Paul <238953601+pac-robotics@users.noreply.github.com>
* fix(root): adding proper support for the root and new_root arguments
* feat(roots): adding a roots argument for the merge operation
* chore(clean): cleaning up code
* chore(docstrings): updating docstrings with new features
* fix(repo_id): setting repo_id to None when not needed
* fix(roots/repo_ids): making mypy happy by using repo_ids and roots for merge operation
* fix(path): fixing path related issues
* fix(repo_id): fixing issues related to repo_id
* chore(docstrings): updating docstrings + fix typo
* chore(clean): cleaning code
* fix(split new_repo_id): reverting new_repo_id addition for split operation
* docs(docstrings): completing docstrings
* fix(repo_ids/roots): improving checks for repo_ids/roots lengths
* fix(repo_ids): making repo_ids optional in MergeConfig but raise if not given
* fix(docstrings): fixing docstrings for split operation
* fix(hints): updating get_output_path hints to accept paths as strings too
* fix(y/N prompts): removing y/N prompts in lerobot_edit_dataset
* fix(merge repo_id): fixing merge operation to use new_repo_id instead of repo_id
* fix(typo): fixing typo in docstrings
* fix(frame_index): making rerun's "frame_index" timeline compatible with behaviour1k datasets
* fix(segfault risk): removing segfault risk by calling batch["index"] in the dataloader loop
* feat(async-inference): try using async inference server with plugins
* Fix import
* Fix import error in Robot Client
---------
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* Fix SmolVLA meta tensor error by removing device_map
- Remove device_map parameter from VLM model loading
- Change torch_dtype from string to torch.bfloat16
- Add explicit .to(device) calls after initialization
This resolves NotImplementedError when training SmolVLA policy.
Fixes meta tensor copy issue in factory.py:418.
* fix: remove manual device movement logic and fix dtype handling
---------
Co-authored-by: Highsky7 <albert31115@gmail.com>
* add OpenArm Mini config and module init
* add OpenArm Mini teleoperator implementation
* add OpenArm Mini into factory and setup motors
---------
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Replaced assert statements with FrameTimestampError exceptions in
decode_video_frames_torchvision and decode_video_frames_torchcodec.
Assertions are unsuitable for runtime validation because they can be
silently disabled with python -O, and they produce unhelpful
AssertionError tracebacks. The codebase already defines
FrameTimestampError for this exact purpose but it was only used
in one of the three validation sites.
Also removed AssertionError from the except clause in
LeRobotDataset.__init__, which was masking video timestamp errors
by silently triggering a dataset re-download instead of surfacing
the actual problem.
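Raising a dedicated exception instead of asserting looks roughly like the sketch below; the function name and tolerance are illustrative, and only the `FrameTimestampError` name comes from the codebase:

```python
class FrameTimestampError(ValueError):
    """Raised when decoded frame timestamps drift beyond tolerance."""

def check_timestamps(requested, found, tolerance_s=1e-4):
    # Raise a descriptive error instead of `assert`, which disappears
    # entirely when Python runs with the -O flag.
    for req, got in zip(requested, found):
        if abs(req - got) > tolerance_s:
            raise FrameTimestampError(
                f"Requested frame at {req:.6f}s but closest decoded frame is at "
                f"{got:.6f}s (tolerance {tolerance_s}s)."
            )

check_timestamps([0.0, 0.1], [0.0, 0.1001], tolerance_s=1e-3)  # within tolerance
```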
1. Include metaworld_config.json in package distributions by adding it to
both MANIFEST.in (for sdist) and pyproject.toml package-data (for wheels).
Without this, pip-installed lerobot raises FileNotFoundError when
importing the metaworld environment.
2. Fix crash in sanity_check_dataset_name where the error message accesses
policy_cfg.type when policy_cfg is None, raising AttributeError instead
of the intended ValueError.
Fixes #2958
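Shipping a data file in both sdists and wheels takes two declarations, roughly as below; the package path is an assumption, not the actual lerobot layout, and the pyproject section assumes a setuptools build backend.

```toml
# MANIFEST.in (covers sdists):
#   include src/lerobot/envs/metaworld_config.json

# pyproject.toml (covers wheels):
[tool.setuptools.package-data]
"lerobot.envs" = ["metaworld_config.json"]
```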
* fix(dataset): Reindex videos based on frame and not on time
During split operations, floating-point imprecision in frame timestamps
can cause frames to end up in the wrong split. This change fixes the
issue by working directly with frame indices instead.
* Fix formatting
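The underlying idea: assign frames to splits by integer frame index rather than by comparing float timestamps, which accumulate rounding error. A minimal illustration with invented names:

```python
def split_by_frame_index(episode_lengths, split_at_episode):
    """Assign global frame indices to splits using integers only.

    Float timestamps like 0.1 * frame can round across a split boundary;
    integer frame indices cannot.
    """
    splits = {"train": [], "val": []}
    frame_idx = 0
    for ep, length in enumerate(episode_lengths):
        target = "train" if ep < split_at_episode else "val"
        splits[target].extend(range(frame_idx, frame_idx + length))
        frame_idx += length
    return splits
```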