Compare commits

...

63 Commits

Author SHA1 Message Date
Pepijn f147a4cd48 Add inference for training time rtc 2026-01-29 11:05:42 +01:00
Pepijn c3fa269b21 Merge branch 'main' into feat/training_time_rtc 2026-01-27 17:34:56 +01:00
Reece O'Mahoney f6b1c39b78 docs: update libero (#2857)
* update libero docs

* Update docs/source/libero.mdx

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jade Choghari <chogharijade@gmail.com>

---------

Signed-off-by: Jade Choghari <chogharijade@gmail.com>
Co-authored-by: Jade Choghari <chogharijade@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 15:31:53 +01:00
Pepijn 0c0c171d35 Add robot images to docs (#2862)
* Add robot images to docs

* increase img size

* remove img so100
2026-01-27 13:33:45 +01:00
Steven Palma 9cfb5ce546 feat(motors): add damiao motors & can bus (#2788)
* fix(motors): cleanup imports + fix signatures

* feat(motors): add damiao canbus + multiple fixes

* fix(motors): address comments -> last_state + different gains + sleep

* refactor(motors): reduce duplicated code + adressed some comments in the PR

* chore(motors): better timeouts

* tests(motors): damiao test and imports

* chore(deps): fix space

* Apply suggestions from code review

Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(motors): remove normalization tables damiao

* fix(motors): imports and signatures

* feat(motors): add motor_type_str + recv_id to motor class and _get_motor_recv_id raises if no motor_obj.recv_id

* chore(motors): remove normalize from base motor class and damaio

* tests(motors): remove bad tests (to be replaced)

* chore(motors): updated import check

* use constant for kp and kd range and check responses in mit_control_batch()

* Add docs on setting up canbus and use damiao otor bus, also add lerobot_setup_can.py and log if there is not response from a write command

* precommit format

* supress bandit as these are intentional cli commands

* fix setup-can

* add test

* skip test in ci

* nit precommit

* update doc example

* dont import can for tests

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Pepijn <pepijn@huggingface.co>
2026-01-26 17:53:25 +01:00
Reece O'Mahoney 366bef915c add task ids to libero env cfg (#2842) 2026-01-26 17:26:49 +01:00
Woojin Wie 9e10eb4a77 fix(robots): update gripper configuration and calibration settings for OMX (#2815) 2026-01-25 22:29:37 +01:00
Steven Palma 6d34a986de feat(ci): trigger manually documentation release version (#2841) 2026-01-22 12:26:17 +01:00
Steven Palma 961277d86e chore(dependencies): Bump lerobot to 0.4.4 (#2840) 2026-01-22 12:24:12 +01:00
Pepijn 385ba8d1b7 remove wall-oss from doc links 2026-01-20 20:11:56 +01:00
Pepijn f4ccf911fa format 2026-01-20 20:08:28 +01:00
Pepijn 0cb8c92fe4 Implement training time rtc for pi0, pi0.5 and smolvla 2026-01-20 20:02:10 +01:00
Steven Palma 0b067df57d feat(robots): add context managers (#2828) 2026-01-20 18:02:38 +01:00
Tommy in Tongji 9ca680dce2 Update README.md (#2827)
Add Chinese doc link.

Signed-off-by: Tommy in Tongji <36354458+TommyZihao@users.noreply.github.com>
2026-01-20 17:54:24 +01:00
sato_shinji 9919b16b36 fix: ensure action tensors are moved to client_device in async training (#2792)
* feat(async_inference): server always sends CPU tensors, client handles device conversion

* fix:fix the type annotation of RawObservation in src/lerobot/async_inference/helpers.py

* update the import of robot_client

---------

Co-authored-by: Sato shinji <wwwsatoshinji@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: KB <kevin-brian.n-diaye@epita.fr>
2026-01-20 15:17:38 +01:00
Caroline Pascal d36dfcdf71 fix(discord link): fixing discord link in CONTRIBUTING.md (#2826)
Signed-off-by: Caroline Pascal <caroline8.pascal@gmail.com>
2026-01-20 15:00:45 +01:00
Alexis D 13bfee1aa4 Set 10 direction bit for Current Load attribute (#1014) 2026-01-20 11:20:30 +01:00
Jade Choghari 79688a09f2 improve(dataset-tools): image2video editing tools : Multiple episodes per video file (#2811)
* improve image2video

* add episodes video encoding

* fix mypy failing

* iterate on review

* nit

* remove max, and let it be optional

* iterate more

* update docs

* fix test

---------

Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
2026-01-20 11:04:22 +01:00
Francesco Capuano b2ff219624 Fixes aggregation of image datasets (#2717)
* fix: use features when aggregating image based datasets

* add: test asserting for data type

* add: features param to writing dataset

---------

Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2026-01-19 23:36:41 +01:00
Maximilian Ofir 66929c5935 feat: add async server-client streaming support for Groot policy (#2812) 2026-01-19 22:13:48 +01:00
Steven Palma 5286ef8439 feat(utils): extend import check util (#2820)
* refactor(utils): is_package_available now differentiate between pkg name and module name

* refactor(tests): update require_package decorator
2026-01-19 16:43:11 +01:00
bigmbigk fe068df711 fix(train): eval env initialization on train script (#2818)
* fix: eval env initialization on train script

Signed-off-by: bigmbigk <bigmbigk@gmail.com>

* fix: eval env creation condition

---------

Signed-off-by: bigmbigk <bigmbigk@gmail.com>
2026-01-19 14:14:10 +01:00
Sung-Wook Lee da41646073 fix libero reset logic for correct resetting (#2817) 2026-01-19 13:18:52 +01:00
Steven Palma 46e19ae579 feat: is connect checks decorators (#2813) 2026-01-16 18:52:06 +01:00
Alex Tyshka 77dc49b3a3 Fix delta timestamps with episodes filter and add tests (#2612) 2026-01-16 18:14:54 +01:00
Alex Tyshka 33910673ec Bugfix: Add tests for image deletion and fix mixed image-video deletion (#2592)
* Add tests for image deletion and fix mixed-image-video deletion

* Fix docstring whitespace

* Remove debug print

Signed-off-by: Alex Tyshka <atyshka15@gmail.com>

* Remove inaccurate comment

* Remove batched video test

---------

Signed-off-by: Alex Tyshka <atyshka15@gmail.com>
2026-01-16 18:14:15 +01:00
Michel Aractingi 19dce78457 Refactor: Move PEFT config from training script to policy level (#2806)
* move peft config from `lerobot_train` to policy level

* Update src/lerobot/scripts/lerobot_train.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>

* copilot response

* Change the polciy function to return targets rather than peft config.`_get_default_peft_targets()` override in PI0, PI0.5, SmolVLA

* remove none check when building config dict

---------

Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>
2026-01-16 17:14:28 +01:00
Steven Palma 112b2d173a chore(ci): deactivates cron job on unbound dep tests (#2810) 2026-01-16 14:39:00 +01:00
Steven Palma b825880c40 chore: add security policy (#2809)
* chore: add security policy

* pre-commit style
2026-01-16 14:38:42 +01:00
./c² 76d6b71b0a Correct Frodobots Earth Rover SDK link and add setup instructions (#2671)
* Fix SDK link and enhance setup instructions

Updated the Frodobots SDK link and added credential setup instructions.

Signed-off-by: ./c² <cagataycali@icloud.com>

* Update docs/source/earthrover_mini_plus.mdx

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: ./c² <cagataycali@icloud.com>

* Update docs/source/earthrover_mini_plus.mdx

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

---------

Signed-off-by: ./c² <cagataycali@icloud.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2026-01-16 02:39:58 +01:00
Nicolas Rabault 5de38813d9 Add small context to the envHub doc page (#2807)
* Add small context to the envHub doc page

* Add the cfg: EnvConfig on the main function explaination.
2026-01-15 18:31:17 +01:00
Neko 6797ce615e chore(deps): bump wandb & protobuf (#2800) 2026-01-15 10:51:42 +01:00
Steven Palma a17df523e0 chore(ci): merge annoying section in PR template (#2802)
* chore(ci): merge annoying section in PR template

* pre-commit
2026-01-14 17:17:56 +01:00
Steven Palma 1c61b43b15 fix(teleop): add is_connected check to get_action (#2801) 2026-01-14 17:14:12 +01:00
Steven Palma 15724826dd chore: use alias & constants (#2785)
* chore: use alias and constants

* fix(rl): solve circular dependecy

* chore: nit right constant

* chore: pre-commit

* chore(script): conflict tokenizer train

---------

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
2026-01-13 09:49:46 +01:00
Jade Choghari 2cdd9f43f7 fix: train tokenizer CLI entry point (#2784) 2026-01-13 01:42:53 +01:00
samet-rob d0f57f58d1 Move cfg.validate() earlier to fix NoneType error with --policy.path (#2782) 2026-01-12 19:24:19 +01:00
Steven Palma b8ec1152d4 fix(robots): add reachy2 fixes (#2783)
* fix(robots): add reachy2 fixes

* tests(robots): remove reachy sdk stub
2026-01-12 18:05:16 +01:00
Martino Russi 6b8d4c75a6 Feat/g1 improvements record sim (#2765)
This PR extends the integration of Unitree g1 with the LeRobot codebase. By converting robot state to a flat dict we can now record and replay episodes (example groot/holosoma scripts need to be adjusted as well). We also improve the simulation integration by calling .step @ _subscribe_motor_state instead of it running in a separate thread. We also add ZMQ camera to lerobot, streaming base64 images over json
2026-01-12 17:31:39 +01:00
Steven Palma d791a431fe feat(robots): consolidates bi SO setups (#2780)
* feat(robots): consolidates bi SO setups

* fix(robots): solve circular dependecy

* fix(robots): teleop & record working

* feat(robots): only one SO

* fix(utils): rename bi so

* fix(scripts): bi so import

* fix(rl): remove imports
2026-01-12 16:01:22 +01:00
Jade Choghari 473f1bd0e0 docs: improve assets (#2777)
* add assets

* add libero results pifast:

* update

* update

* update size

* update naems:
:

* update training tokenizer
2026-01-12 13:33:28 +01:00
Michel Aractingi 91ff9c4975 Fix: Respect policy.device=cpu config in training (#2778)
* fix cpu training in lerobot_train

* Update src/lerobot/scripts/lerobot_train.py

Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>
2026-01-12 12:19:02 +01:00
Jade Choghari 1d86c9b7f2 feat(policies): add autoregressive VLAs with tokenization PiFast (#2734) 2026-01-09 23:08:37 +01:00
Pepijn ba3d2148a3 skip peft cmd test in cli (#2776)
* skip peft cmd test in cli

* pre commit

* update desc
2026-01-09 19:10:02 +01:00
Leo Tronchon 8b6fc0ae05 feat(datasets): expose video codec option for dataset recording (#2771)
* expose codec options + add tests

* pre-commit run -a
2026-01-08 18:06:39 +01:00
Steven Palma 242b65d2df chore(docs): update code block syntax to specify python for clarity (#2770) 2026-01-08 14:45:07 +01:00
Steven Palma ccfd609ece feat(robots): consolidate SO arms implementation (#2763)
* feat(robots): consolidate SO arms implementation

* chore(robots): delete unnecessary init modules
2026-01-08 13:04:30 +01:00
Steven Palma fbe4c8b94f Feat/remote rerunviz encoded images (#2767)
* feat(visualization): allow remote viewer + compress rerun images

* fix(tests): allow named argument in mocked rerun

* feat(visualization): ip instead or url & cli arg for compressing images

---------

Co-authored-by: J4nn1K <jannik@grothusen.de>
2026-01-07 17:38:13 +01:00
Steven Palma 4f7cd8d369 Revert "feat(visualization): allow remote viewer + compress rerun images (#2756)" (#2766)
This reverts commit f844c7a458.
2026-01-07 17:33:36 +01:00
Steven Palma f844c7a458 feat(visualization): allow remote viewer + compress rerun images (#2756)
* feat(visualization): allow remote viewer + compress rerun images

* fix(tests): allow named argument in mocked rerun

* feat(visualization): ip instead or url & cli arg for compressing images
2026-01-07 17:30:45 +01:00
Martino Russi 7e9d05a799 add holosoma locomotion (#2669)
Add holosoma locomotion from Amazon-FAR
Add reset method to unitree_g1
Format actions as dict
Update docs
2026-01-07 16:05:31 +01:00
Steven Palma ecd8cd23d2 chore(dependencies): bound new dependecies (#2759) 2026-01-07 11:04:21 +01:00
Pauline Bailly-Masson a9d81e7f67 refactor(ci): Docker Hub image env (#2755)
* Refactor Docker Hub image env

Updated environment variable usage for Docker Hub credentials and corrected image tag extraction.

Signed-off-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>

* same

Signed-off-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>

* chore(ci): remove duplicated IMAGE_FULL variable definition

---------

Signed-off-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>
Signed-off-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2026-01-07 00:21:03 +01:00
Steven Palma e2957d7783 fix: precise_sleep is never called with negative value (#2757) 2026-01-06 20:09:43 +01:00
Jade Choghari 963a3482fa typo LW (#2758) 2026-01-06 18:17:29 +01:00
Tong Wu 603d44434f fix a bug for kwargs in wallx (#2714)
* support wallx

* fix bugs in flow

* incorporate wallx model into lerobot

* update the policy methods

* reduce to least config and params & pass lerobot basic test

* fixed dtype bugs

* add wallx dependencies

* update

* remove flash-attn requirement && fix bug in inference and fast mode

* fix bug for inference

* add some small modifications

* fix pre-commit errors

* remove lerobot[wallx]

* fix ci

* fix precommit issues

* fix: exclude wallx extra properly in CI workflows

* fix: add uv conflicts for wallx transformers version

* fix: peft test import

* pre-commit

* only export WallXConfig from wall_x package to avoid peft import in CI

* remove torch dep

* precommit

* add import

* update doc files

* fix minor errors

* fix a bug for kwargs

* fix precommit issue

---------

Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: vincentchen <chenlufang@x2robot.com>
Co-authored-by: Geoffrey19 <sympathischmann35@gmail.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Pepijn <pepijn@huggingface.co>
Co-authored-by: geoffrey <geoffrey@x2robot.com>
2026-01-06 15:13:35 +01:00
Pepijn 6106a8136c Fix invalid syntax (#2752)
* fix invalid syntax

* also skip for torchdiffeq

* fix patch for gpu tests
2026-01-05 12:13:42 +01:00
githubnemo e670ac5daf Add basic PEFT support to train script + record module (#1411)
* Add basic support for PEFT adapter methods

This changes adds support for training policies with much less parameters
by applying adapter methods such as LoRA on specific parts of the policies
and therefore possibly higher learning rates / batch sizes.

To make this as accessible as possible I thought it useful to provide
defaults for `target_modules` and `modules_to_save`. Currently only SmolVLA
has such defaults but when we agree that this change is useful I will set
out to generate more such defaults. While the user can override these
settings, they are expected to only change the peft_method, rank and init_type
parameters.

* Implement loading of PEFT adapters

Loading a PEFT adapter is currently done by initializing a policy with default config
and then applying the adapter on the resulting model. This has the obvious drawback
that any configurations done during training are not applied in the adapted model.

Currently the `use_peft` attribute of `PreTrainedConfig` is only set during loading
to signal the following code that it has to deal with a PEFT adapter. However
we could imagine a scenario where this is already set at training time and stored
alongside the adapter.

* Store policy config alongside PEFT checkpoint

Before this change the PEFT-wrapped policy did not save the policy's config
alongside the adapter config / weights which prevented us from changing the
policy config. Now the policy config is saved both in full training and PEFT
training.

This change makes loading the PEFT policy adapter much easier as well.

* Add default config for ACT

* Support targets like `all-linear`

* Formatting

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix failing tests

* Remove PEFT compatibility changes in config

We'll wait for the PEFT release that fixes this for good.

* Remove `use_peft` parameter from training script

Instead we make the PEFT config optional which has the same effect.

* Log adapter config to WandB

* Better documentation for CLI arguments

* Don't unload & merge the PEFT model

This can make things hard when using quantized layers (user expects quantized base layers with
unquantized adapters for example, merging defaults to upcast the layers leading to higher
memory).

* Correct way of identifying when to save config

* Add CLI end-to-end tests

Currently there don't seem to be any way to test the CLI commands.
Since this change mostly happens in those I thought it best to add
a way to test these commands end-to-end.

More integrated commands like `lerobot-record` need patching but
standalone commands like training seem to work fine.

* Update default targets

Removed ACT since it doesn't make sense to fine-tune ACT without having it pretrained beforehand.
SmolVLA and Pi0/0.5 are much more senseful targets.

* Clean up loading code

- Centralized instantiation of the PEFT wrapper in `make_policy` for inference
  (e.g. in `lerobot-record`)
- Training a PEFT policy also sets `cfg.use_peft` so that all inference code loading
  the policy can rely on that attribute to identify if PEFT loading is needed
- Modified RTC example to also include PEFT policies. Mostly because this is an example
  I'm currently exploring.

* Make sure push_to_hub works

Since PEFT only wraps `push_to_hub` and not `push_model_to_hub`, the reference
to `self` in `policy.push_model_to_hub` is the unwrapped policy which, of course,
doesn't know anything about PEFT.

To make the upload process aware of PEFT, we pass the unwrapped policy down to
`push_model_to_hub` as a kwarg. This is not ideal but I think it is the best way
for now.

* formatting

* Warn when encountering from-scratch-training

* Revamp pretrained model loading

There were quite a few factors that convinced me that the status quo
is able to load pretrained models from the PEFT adapter config but
in fact that didn't work.

This commit fixes the following things:
- policies wrapped in PEFT will now have a `name_or_path` attribute
  containing the name or path of the pretrained model we're fine-tuning
- we further assume that SmolVLA without `pretrained_path` and
  `load_vlm_weights==False` must be an user-side error
- we assume that using PEFT on from-scratch-policies must be
  an user-side-error

* Make it possible to unset policy features

This is necessary to train pre-trained policies on new datasets so that the
features are inferred from the new dataset and not from the pretrained
policy.

* Use correct loading for PEFT in RTC example

* Make it possible to use PeftModels in eval

* Add test checking that PEFT actually reduces params

* Adapt state/action projections instead of full-finetuning

There doesn't seem to be a benefit to fully fine-tune these layers
over just adapting them, so we do that instead.

* Disallow PEFT training on non-pretrained policies

At first I thought it would make sense to have this feature
in case you want to fine-tune a pre-trained section but in the
end it makes more trouble than it's worth.

It's still possible to allow this in the future when a concrete
need arises.

* Add basic documentation

* Formatting

* Add peft as extra dependency, mark tests

Fast tests currently fail because of the missing dependency.

* Fix pre-commit issues

* Add walx <> peft conflict for uv

* Exclude peft from pi install for now

---------

Co-authored-by: nemo <git@ningu.net>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
2026-01-05 08:51:26 +01:00
Steven Palma 75ab388ecd chore(readme): update discord invitation link (#2750) 2026-01-04 17:24:56 +01:00
Lior Ben Horin 17c115c71f IsaacLab Arena Integration documentation update (#2749)
* wording

* added how to guide to build you own envhub repos

* include LW edits

* wording

* chat fixes

* additional

* wording

* wording

* wording

* pre commit fixes
2026-01-04 16:41:21 +01:00
Kartik fc296548cb feat(envs): Add NVIDIA IsaacLab-Arena Lerobot (#2699)
* adding Isaaclab Arena from collab

* adding into lerobot-eval

* minor modification

* added bash script for env setup

* setups

* fix applauncher not getting the arguments

* data conversion, train and eval smolvla

* fixed imports

* clean-up

* added test suits & clean up - wip

* fixed video recording

* clean-up

* hub integration working

* clean-up

* added kwargs

* Revert "added kwargs"

This reverts commit 9b445356385d0707655cf04d02be058b25138119.

* added kwargs

* clean-up

* cleaned unused function

* added logging

* docs

* cleaned up IsaaclabArenaEnv

* clean-up

* clean-up

* clean up

* added tests

* minor clean-up

* fix: support for state based envs

* feat(envs): Add NVIDIA IsaacLab Arena integration with LeRobot for policy evaluation at scale

* feat(envs): Add IsaacLab Arena integration for policy evaluation

Integrate NVIDIA IsaacLab Arena with LeRobot to enable GPU-accelerated
simulation through the EnvHub infrastructure.

This enables:
- Training imitation learning policies (PI0, SmolVLA, etc.)
- Evaluating trained policies in with IsaacLab Arena

The implementation adds:
- IsaaclabArenaEnv config with Arena-specific parameters
- IsaaclabArenaProcessorStep for observation processing
- Hub loading from nvkartik/isaaclab-arena-envs repository
- Video recording support

Available environments include GR1 microwave manipulation, Galileo
pick-and-place, G1 loco-manipulation, and button pressing tasks.

Datasets: nvkartik/Arena-GR1-Manipulation-Task
Policies: nvkartik/pi05-arena-gr1-microwave,
          nvkartik/smolvla-arena-gr1-microwave

* added isaaclab arena wrapper and corresponding tests

* added error handling

* renamed wrapper file: isaaclab_arena to isaaclab

* added extra kwarg changes

* adjustments for hub envs

* correct class name in test file

* fixed parsing of env_kwargs

* tested end to end

* removed unused code

* refactor design

* shifted IsaacLab to hub

* removed IsaacLab tests

* docs: Add LW-BenchHub evaluation instructions

* docs: Add LW-BenchHub evaluation instructions

* docs diet

* minor edits to texts

* IL Arena commit hash

* update links

* minor edits

* fix numpy version after install of lerobot

* links update

* valideated on vanilla brev

* docs: Add LW-BenchHub evaluation instructions

* remove kwargs from all make_env calls

* remove kwargs from all make_env calls

* fix LW table and indentations

* remove environment list from docs

* docs: Update lw-benchhub eval config in envhub docs

* removing kwargs

* removed extra line

* ensure pinocchio install for lightwheel + add lightwheel website link

* remove env_kwargs

* no default empty value for hub_path

* not using assert method

* remove env_processor defaults

* revert and adding default "" value for hub_path

* pinning down packages versions

* explicit None value for hub_path

* Update src/lerobot/configs/eval.py

Co-authored-by: Jade Choghari <chogharijade@gmail.com>
Signed-off-by: Lior Ben Horin <liorbenhorin@gmail.com>

* corrected formatting

* corrected job_name var in config

* updated docs and namespace

* updated namespace

* updated docs

* updated docs

* added hardware requirements

* updated docs

---------

Signed-off-by: Lior Ben Horin <liorbenhorin@gmail.com>
Co-authored-by: lbenhorin <lbenhorin@nvidia.com>
Co-authored-by: Lior Ben Horin <liorbenhorin@gmail.com>
Co-authored-by: Jade Choghari <chogharijade@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: tianheng.wu <tianheng.wu@lightwheel.ai>
2026-01-02 20:36:24 +01:00
arya 9701b9c273 feat(pi0): add train_expert_only and freeze_vision_encoder flags to pi0 and pi0.5 (#2727)
* feat(pi0): add train_expert_only and freeze_vision_encoder options

* pi_05: train_expert_only and freeze_vision_encoder flags

* comment clean up

* docs: add finetuning parameters to pi0 and pi05 docs

* updating docs to follow standards
2025-12-31 15:54:28 +01:00
Steven Palma 6d0d65a5fe chore: adds dynamic README handling and setup script (#2724) 2025-12-28 01:45:06 +01:00
222 changed files with 10987 additions and 2398 deletions
+5 -4
View File
@@ -22,20 +22,21 @@ Short, imperative summary (e.g., "fix(robots): handle None in sensor parser"). S
- Short, concrete bullets of the modifications (files/behaviour).
- Short note if this introduces breaking changes and migration steps.
## How was this tested
## How was this tested (or how to run locally)
- Tests added: list new tests or test files.
- Manual checks / dataset runs performed.
- Instructions for the reviewer
## How to run locally (reviewer)
Example:
- Run the relevant tests:
- Ran the relevant tests:
```bash
pytest -q tests/ -k <keyword>
```
- Run a quick example or CLI (if applicable):
- Reproduce with a quick example or CLI (if applicable):
```bash
lerobot-train --some.option=true
+12 -1
View File
@@ -18,6 +18,11 @@ name: Documentation
on:
# Allows running this workflow manually from the Actions tab
workflow_dispatch:
inputs:
version:
description: 'Version tag (e.g. v0.1.2) - Leave empty for standard main build'
required: false
type: string
# Triggers the workflow on push events to main for the docs folder
push:
@@ -54,7 +59,13 @@ jobs:
with:
commit_sha: ${{ github.sha }}
package: lerobot
additional_args: --not_python_module ${{ github.event_name == 'release' && format('--version {0}', github.event.release.tag_name) || '' }}
additional_args: >-
--not_python_module
${{
(github.event_name == 'release' && format('--version {0}', github.event.release.tag_name)) ||
(inputs.version != '' && format('--version {0}', inputs.version)) ||
''
}}
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
+8 -5
View File
@@ -186,15 +186,18 @@ jobs:
steps:
- name: Get Docker Hub Token and Delete Image
# zizmor: ignore[template-injection]
env:
DOCKERHUB_LEROBOT_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
DOCKERHUB_LEROBOT_PASSWORD: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
IMAGE_FULL: ${{ needs.build-and-push-docker.outputs.image_tag }}
run: |
IMAGE_NAME=$(echo "${{ needs.build-and-push-docker.outputs.image_tag }}" | cut -d':' -f1)
IMAGE_TAG=$(echo "${{ needs.build-and-push-docker.outputs.image_tag }}" | cut -d':' -f2)
IMAGE_NAME=$(echo "$IMAGE_FULL" | cut -d':' -f1)
IMAGE_TAG=$(echo "$IMAGE_FULL" | cut -d':' -f2-)
echo "Attempting to delete image: $IMAGE_NAME:$IMAGE_TAG"
TOKEN=$(curl -s -H "Content-Type: application/json" \
-X POST \
-d '{"username": "${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}", "password": "${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}"}' \
-d "{\"username\": \"$DOCKERHUB_LEROBOT_USERNAME\", \"password\": \"$DOCKERHUB_LEROBOT_PASSWORD\"}" \
https://hub.docker.com/v2/users/login/ | jq -r .token)
if [ "$TOKEN" == "null" ] || [ -z "$TOKEN" ]; then
@@ -205,7 +208,7 @@ jobs:
HTTP_RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: JWT ${TOKEN}" \
-X DELETE \
https://hub.docker.com/v2/repositories/${IMAGE_NAME}/tags/${IMAGE_TAG}/)
https://hub.docker.com/v2/repositories/${IMAGE_NAME}/tags/$IMAGE_TAG)
if [ "$HTTP_RESPONSE" -eq 204 ]; then
echo "Successfully deleted Docker image tag: $IMAGE_NAME:$IMAGE_TAG"
+10 -6
View File
@@ -20,8 +20,8 @@ on:
workflow_dispatch:
# Run on the 1st and 15th of every month at 09:00 UTC
schedule:
- cron: '0 2 1,15 * *'
# schedule:
# - cron: '0 2 1,15 * *'
permissions:
contents: read
@@ -162,15 +162,19 @@ jobs:
steps:
- name: Get Docker Hub Token and Delete Image
# zizmor: ignore[template-injection]
env:
DOCKERHUB_LEROBOT_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
DOCKERHUB_LEROBOT_PASSWORD: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
IMAGE_FULL: ${{ needs.build-and-push-docker.outputs.image_tag }}
run: |
IMAGE_NAME=$(echo "${{ needs.build-and-push-docker.outputs.image_tag }}" | cut -d':' -f1)
IMAGE_TAG=$(echo "${{ needs.build-and-push-docker.outputs.image_tag }}" | cut -d':' -f2)
IMAGE_NAME=$(echo "$IMAGE_FULL" | cut -d':' -f1)
IMAGE_TAG=$(echo "$IMAGE_FULL" | cut -d':' -f2)
echo "Attempting to delete image: $IMAGE_NAME:$IMAGE_TAG"
TOKEN=$(curl -s -H "Content-Type: application/json" \
-X POST \
-d '{"username": "${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}", "password": "${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}"}' \
-d "{\"username\": \"$DOCKERHUB_LEROBOT_USERNAME\", \"password\": \"$DOCKERHUB_LEROBOT_PASSWORD\"}" \
https://hub.docker.com/v2/users/login/ | jq -r .token)
if [ "$TOKEN" == "null" ] || [ -z "$TOKEN" ]; then
@@ -181,7 +185,7 @@ jobs:
HTTP_RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: JWT ${TOKEN}" \
-X DELETE \
https://hub.docker.com/v2/repositories/${IMAGE_NAME}/tags/${IMAGE_TAG}/)
https://hub.docker.com/v2/repositories/${IMAGE_NAME}/tags/$IMAGE_TAG)
if [ "$HTTP_RESPONSE" -eq 204 ]; then
echo "Successfully deleted Docker image tag: $IMAGE_NAME:$IMAGE_TAG"
+1 -1
View File
@@ -14,7 +14,7 @@ You can contribute in many ways:
- **Documentation:** Improve examples, guides, and docstrings.
- **Feedback:** Submit tickets related to bugs or desired new features.
If you are unsure where to start, join our [Discord Channel](https://discord.gg/JkrYNdmw).
If you are unsure where to start, join our [Discord Channel](https://discord.gg/q8Dzzpym3f).
## Development Setup
+8 -6
View File
@@ -10,6 +10,7 @@
[![Status](https://img.shields.io/pypi/status/lerobot)](https://pypi.org/project/lerobot/)
[![Version](https://img.shields.io/pypi/v/lerobot)](https://pypi.org/project/lerobot/)
[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.1-ff69b4.svg)](https://github.com/huggingface/lerobot/blob/main/CODE_OF_CONDUCT.md)
[![Discord](https://img.shields.io/badge/Discord-Join_Us-5865F2?style=flat&logo=discord&logoColor=white)](https://discord.gg/q8Dzzpym3f)
</div>
@@ -99,11 +100,11 @@ lerobot-train \
--dataset.repo_id=lerobot/aloha_mobile_cabinet
```
| Category | Models |
| -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Imitation Learning** | [ACT](./docs/source/policy_act_README.md), [Diffusion](./docs/source/policy_diffusion_README.md), [VQ-BeT](./docs/source/policy_vqbet_README.md) |
| **Reinforcement Learning** | [HIL-SERL](./docs/source/hilserl.mdx), [TDMPC](./docs/source/policy_tdmpc_README.md) & QC-FQL (coming soon) |
| **VLAs Models** | [Pi0.5](./docs/source/pi05.mdx), [GR00T N1.5](./docs/source/policy_groot_README.md), [SmolVLA](./docs/source/policy_smolvla_README.md), [XVLA](./docs/source/xvla.mdx) |
| Category | Models |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Imitation Learning** | [ACT](./docs/source/policy_act_README.md), [Diffusion](./docs/source/policy_diffusion_README.md), [VQ-BeT](./docs/source/policy_vqbet_README.md) |
| **Reinforcement Learning** | [HIL-SERL](./docs/source/hilserl.mdx), [TDMPC](./docs/source/policy_tdmpc_README.md) & QC-FQL (coming soon) |
| **VLAs Models** | [Pi0Fast](./docs/source/pi0fast.mdx), [Pi0.5](./docs/source/pi05.mdx), [GR00T N1.5](./docs/source/policy_groot_README.md), [SmolVLA](./docs/source/policy_smolvla_README.md), [XVLA](./docs/source/xvla.mdx) |
Similarly to the hardware, you can easily implement your own policy & leverage LeRobot's data collection, training, and visualization tools, and share your model to the HF Hub
@@ -127,7 +128,8 @@ Learn how to implement your own simulation environment or benchmark and distribu
## Resources
- **[Documentation](https://huggingface.co/docs/lerobot/index):** The complete guide to tutorials & API.
- **[Discord](https://discord.gg/3gxM6Avj):** Join the `LeRobot` server to discuss with the community.
- **[Chinese Tutorials: LeRobot+SO-ARM101中文教程-同济子豪兄](https://zihao-ai.feishu.cn/wiki/space/7589642043471924447)** Detailed doc for assembling, teleoperate, dataset, train, deploy. Verified by Seed Studio and 5 global hackathon players.
- **[Discord](https://discord.gg/q8Dzzpym3f):** Join the `LeRobot` server to discuss with the community.
- **[X](https://x.com/LeRobotHF):** Follow us on X to stay up-to-date with the latest developments.
- **[Robot Learning Tutorial](https://huggingface.co/spaces/lerobot/robot-learning-tutorial):** A free, hands-on course to learn robot learning using LeRobot.
+48
View File
@@ -0,0 +1,48 @@
# Security Policy
## Project Status & Philosophy
`lerobot` has so far been primarily a research and prototyping tool, which is why deployment security hasnt been a strong focus until now. As `lerobot` continues to be adopted and deployed in production, we are paying much closer attention to these kinds of issues.
Fortunately, being an open-source project, the community can also help by reporting and fixing vulnerabilities. We appreciate your efforts to responsibly disclose your findings and will make every effort to acknowledge your contributions.
## Reporting a Vulnerability
To report a security issue, please use the GitHub Security Advisory ["Report a Vulnerability"](https://github.com/huggingface/lerobot/security/advisories/new) tab.
The `lerobot` team will send a response indicating the next steps in handling your report. After the initial reply to your report, the security team will keep you informed of the progress towards a fix and full announcement, and may ask for additional information or guidance.
#### Hugging Face Security Team
Since this project is part of the Hugging Face ecosystem, feel free to submit vulnerability reports directly to: **[security@huggingface.co](mailto:security@huggingface.co)**. Someone from the HF security team will review the report and recommend next steps.
#### Open Source Disclosures
If reporting a vulnerability specific to the open-source codebase (and not the underlying Hub infrastructure), you may also use [Huntr](https://huntr.com), a vulnerability disclosure program for open source software.
## Supported Versions
Currently, we treat `lerobot` as a rolling release. We prioritize security updates for the latest available version (`main` branch).
| Version | Supported |
| -------- | --------- |
| Latest | ✅ |
| < Latest | ❌ |
## Secure Usage Guidelines
`lerobot` is tightly coupled to the Hugging Face Hub for sharing data and pretrained policies. When downloading artifacts uploaded by others, you expose yourself to risks. Please read below for recommendations to keep your runtime and robot environment safe.
### Remote Artefacts (Weights & Policies)
Models and policies uploaded to the Hugging Face Hub come in different formats. We heavily recommend uploading and downloading models in the [`safetensors`](https://github.com/huggingface/safetensors) format.
`safetensors` was developed specifically to prevent arbitrary code execution on your system, which is critical when running software on physical hardware/robots.
To avoid loading models from unsafe formats (e.g., `pickle`), you should ensure you are prioritizing `safetensors` files.
### Remote Code
Some models or environments on the Hub may require `trust_remote_code=True` to run custom architecture code.
Please **always** verify the content of the modeling files when using this argument. We recommend setting a specific `revision` (commit hash) when loading remote code to ensure you protect yourself from unverified updates to the repository.
+1 -1
View File
@@ -73,7 +73,7 @@ ENV HOME=/home/user_lerobot \
RUN uv venv --python python${PYTHON_VERSION}
# Install Python dependencies for caching
COPY --chown=user_lerobot:user_lerobot pyproject.toml README.md MANIFEST.in ./
COPY --chown=user_lerobot:user_lerobot setup.py pyproject.toml README.md MANIFEST.in ./
COPY --chown=user_lerobot:user_lerobot src/ src/
ARG UNBOUND_DEPS=false
+1 -1
View File
@@ -59,7 +59,7 @@ ENV HOME=/home/user_lerobot \
RUN uv venv
# Install Python dependencies for caching
COPY --chown=user_lerobot:user_lerobot pyproject.toml README.md MANIFEST.in ./
COPY --chown=user_lerobot:user_lerobot setup.py pyproject.toml README.md MANIFEST.in ./
COPY --chown=user_lerobot:user_lerobot src/ src/
ARG UNBOUND_DEPS=false
+12
View File
@@ -19,6 +19,8 @@
title: Train RL in Simulation
- local: multi_gpu_training
title: Multi GPU training
- local: peft_training
title: Training with PEFT (e.g., LoRA)
title: "Tutorials"
- sections:
- local: lerobot-dataset-v3
@@ -35,6 +37,8 @@
title: SmolVLA
- local: pi0
title: π₀ (Pi0)
- local: pi0fast
title: π₀-FAST (Pi0Fast)
- local: pi05
title: π₀.₅ (Pi05)
- local: groot
@@ -53,12 +57,16 @@
title: Use Async Inference
- local: rtc
title: Real-Time Chunking (RTC)
- local: training_time_rtc
title: Training-Time RTC
title: "Inference"
- sections:
- local: envhub
title: Environments from the Hub
- local: envhub_leisaac
title: Control & Train Robots in Sim (LeIsaac)
- local: envhub_isaaclab_arena
title: NVIDIA IsaacLab Arena Environments
- local: libero
title: Using Libero
- local: metaworld
@@ -93,6 +101,8 @@
title: Unitree G1
- local: earthrover_mini_plus
title: Earth Rover Mini
- local: omx
title: OMX
title: "Robots"
- sections:
- local: phone_teleop
@@ -107,6 +117,8 @@
title: Notebooks
- local: feetech
title: Updating Feetech Firmware
- local: damiao
title: Damiao Motors and CAN Bus
title: "Resources"
- sections:
- local: contributing
+2 -1
View File
@@ -169,7 +169,7 @@ python -m lerobot.async_inference.robot_client \
<!-- prettier-ignore-start -->
```python
import threading
from lerobot.robots.so100_follower import SO100FollowerConfig
from lerobot.robots.so_follower import SO100FollowerConfig
from lerobot.cameras.opencv.configuration_opencv import OpenCVCameraConfig
from lerobot.async_inference.configs import RobotClientConfig
from lerobot.async_inference.robot_client import RobotClient
@@ -195,6 +195,7 @@ client_cfg = RobotClientConfig(
robot=robot_cfg,
server_address="localhost:8080",
policy_device="mps",
client_device="cpu",
policy_type="smolvla",
pretrained_name_or_path="<user>/smolvla_async",
chunk_size_threshold=0.5,
+165
View File
@@ -0,0 +1,165 @@
# Damiao Motors and CAN Bus
This guide covers setup and usage of Damiao motors with LeRobot via CAN bus communication.
Currently, only Linux is supported, as the OpenArms CAN adapter only has drivers for Linux.
## Linux CAN Setup
Before using Damiao motors, you need to set up the CAN interface on your Linux system.
### Install CAN Utilities
```bash
sudo apt-get install can-utils
```
### Configure CAN Interface (Manual)
For standard CAN FD (recommended for OpenArms):
```bash
sudo ip link set can0 down
sudo ip link set can0 type can bitrate 1000000 dbitrate 5000000 fd on
sudo ip link set can0 up
```
For standard CAN (without FD):
```bash
sudo ip link set can0 down
sudo ip link set can0 type can bitrate 1000000
sudo ip link set can0 up
```
### Configure CAN Interface (Using LeRobot)
LeRobot provides a utility script to setup and test CAN interfaces:
```bash
# Setup multiple interfaces (e.g., OpenArms Followers with 2 CAN buses)
lerobot-setup-can --mode=setup --interfaces=can0,can1
```
## Debugging CAN Communication
Use the built-in debug tools to test motor communication:
```bash
# Test motors on all interfaces
lerobot-setup-can --mode=test --interfaces=can0,can1
# Run speed/latency test
lerobot-setup-can --mode=speed --interfaces=can0
```
The test mode will scan for motors (IDs 0x01-0x08) and report which ones respond. Example output:
```
can0: UP (CAN FD)
Motor 0x01 (joint_1): ✓ FOUND
→ Response 0x11 [FD]: 00112233...
Motor 0x02 (joint_2): ✓ FOUND
Motor 0x03 (joint_3): ✗ No response
...
Summary: 2/8 motors found
```
## Usage
### Basic Setup
```python
from lerobot.motors import Motor
from lerobot.motors.damiao import DamiaoMotorsBus
# Define your motors with send/receive CAN IDs
motors = {
"joint_1": Motor(id=0x01, motor_type_str="dm8009", recv_id=0x11),
"joint_2": Motor(id=0x02, motor_type_str="dm4340", recv_id=0x12),
"joint_3": Motor(id=0x03, motor_type_str="dm4310", recv_id=0x13),
}
# Create the bus
bus = DamiaoMotorsBus(
port="can0", # Linux socketcan interface
motors=motors,
)
# Connect
bus.connect()
```
### Reading Motor States
```python
# Read single motor position (degrees)
position = bus.read("Present_Position", "joint_1")
# Read from multiple motors
positions = bus.sync_read("Present_Position") # All motors
positions = bus.sync_read("Present_Position", ["joint_1", "joint_2"])
# Read all states at once (position, velocity, torque)
states = bus.sync_read_all_states()
# Returns: {'joint_1': {'position': 45.2, 'velocity': 1.3, 'torque': 0.5}, ...}
```
### Writing Motor Commands
```python
# Enable torque
bus.enable_torque()
# Set goal position (degrees)
bus.write("Goal_Position", "joint_1", 45.0)
# Set positions for multiple motors
bus.sync_write("Goal_Position", {
"joint_1": 45.0,
"joint_2": -30.0,
"joint_3": 90.0,
})
# Disable torque
bus.disable_torque()
```
## Configuration Options
| Parameter | Default | Description |
| -------------- | --------- | ----------------------------------------------------------- |
| `port` | - | CAN interface (`can0`) or serial port (`/dev/cu.usbmodem*`) |
| `use_can_fd` | `True` | Enable CAN FD for higher data rates |
| `bitrate` | `1000000` | Nominal bitrate (1 Mbps) |
| `data_bitrate` | `5000000` | CAN FD data bitrate (5 Mbps) |
## Motor Configuration
Each motor requires:
- `id`: CAN ID for sending commands
- `motor_type`: One of the supported motor types (e.g., `"dm8009"`, `"dm4340"`)
- `recv_id`: CAN ID for receiving responses
OpenArms default IDs follow the pattern: send ID `0x0N`, receive ID `0x1N` where N is the joint number.
## Troubleshooting
### No Response from Motors
1. **Check power**
2. **Verify CAN wiring**: Check CAN-H, CAN-L, and GND connections
3. **Check motor IDs**: Use Damiao Debugging Tools to verify/configure IDs
4. **Test CAN interface**: Run `candump can0` to see if messages are being received
5. **Run diagnostics**: `lerobot-setup-can --mode=test --interfaces=can0`
### Motor Timeout Parameter
If motors were configured with timeout=0, they won't respond to commands. Use Damiao Debugging Tools to set a non-zero timeout value.
### Verify CAN FD Status
```bash
ip -d link show can0 | grep fd
```
+29 -4
View File
@@ -1,5 +1,11 @@
# EarthRover Mini Plus
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/Earth_Rover_Mini_5_240c9adc-4f9e-44b7-982f-5d1dc24af1d8.png.webp"
alt="EarthRover Mini Plus"
width="70%"
/>
The EarthRover Mini Plus is a fully open source mobile robot that connects through the cloud using the Frodobots SDK. This lets you control the robot and record datasets for training AI models.
## What You Need
@@ -12,23 +18,42 @@ The EarthRover Mini Plus is a fully open source mobile robot that connects throu
### Setting Up the Frodobots SDK
The robot needs the [Frodobots SDK](https://github.com/Frodobots/earth-rovers-sdk) running on your computer. Here's how:
The robot needs the [Frodobots SDK](https://github.com/frodobots-org/earth-rovers-sdk) running on your computer. Here's how:
1. Download and install the SDK:
```bash
git clone https://github.com/Frodobots/earth-rovers-sdk.git
git clone https://github.com/frodobots-org/earth-rovers-sdk.git
cd earth-rovers-sdk
pip install -r requirements.txt
```
2. Start the SDK:
2. Save Credentials:
Write your .env variables with the SDK API key and bot name provided by the Frodobots team.
```bash
SDK_API_TOKEN=your_sdk_api_token_here
BOT_SLUG=your_bot_slug_here
CHROME_EXECUTABLE_PATH=/path/to/chrome_or_chromium
# Default value is MAP_ZOOM_LEVEL=18 https://wiki.openstreetmap.org/wiki/Zoom_levels
MAP_ZOOM_LEVEL=18
MISSION_SLUG=your_mission_slug_here
# Image quality between 0.1 and 1.0 (default: 0.8)
# Recommended: 0.8 for better performance
IMAGE_QUALITY=0.8
# Image format: jpeg, png or webp (default: png)
# Recommended: jpeg for better performance and lower bandwidth usage
IMAGE_FORMAT=jpeg
```
3. Start the SDK:
```bash
hypercorn main:app --reload
```
3. Open your web browser and go to `http://localhost:8000`, then click "Join"
4. Open your web browser and go to `http://localhost:8000`, then click "Join"
The SDK gives you:
+24 -17
View File
@@ -2,14 +2,32 @@
The **EnvHub** feature allows you to load simulation environments directly from the Hugging Face Hub with a single line of code. This unlocks a powerful new model for collaboration: instead of environments being locked away inside monolithic libraries, anyone can publish custom environments and share them with the community.
## Overview
## What is EnvHub?
With EnvHub, you can:
EnvHub lets you create custom robotics simulation environments with your own robot models and scenarios, and make them easily usable by anyone through the LeRobot framework.
- Load environments from the Hub instantly
- Share your custom simulation tasks with the community
- Version control your environments using Git
- Distribute complex physics simulations without packaging hassles
EnvHub packages are stored on the Hugging Face Hub, and can be seamlessly pulled and used in your AI robotics projects through LeRobot with a single line of code.
Thanks to EnvHub, you can:
1. **Create and publish environments** to the Hugging Face Hub as Git repositories, and distribute complex physics simulations without packaging hassles
2. **Load environments** dynamically, without installing them as packages
3. **Version and track** environment changes using Git semantics
4. **Discover** new simulation tasks shared by the community
This design means you can go from discovering an interesting environment on the Hub to running experiments in seconds, or create your own custom robot and environment without worrying about dependency conflicts or complex installation procedures.
When you create an EnvHub package, you can build anything you want inside it and use any simulation tool you like: this is your own space to play with. The only requirement is that the package contains an `env.py` file that defines the environment and allows LeRobot to load and use your EnvHub package.
This `env.py` file needs to expose a small API so LeRobot can load and run it. In particular, you must provide a `make_env(n_envs: int = 1, use_async_envs: bool = False)` or `make_env(n_envs: int = 1, use_async_envs: bool = False, cfg: EnvConfig)` function, which is the main entry point for LeRobot. It should return one of:
- A `gym.vector.VectorEnv` (most common)
- A single `gym.Env` (will be automatically wrapped)
- A dict mapping `{suite_name: {task_id: VectorEnv}}` (for multi-task benchmarks)
You can also pass an `EnvConfig` object to `make_env` to configure the environment (e.g. the number of environments, task, camera name, initial states, control mode, episode length, etc.).
Finally, your environment must implement the standard `gym.vector.VectorEnv` interface so it works with LeRobot, including methods like `reset` and `step`.
## Quick Start
@@ -29,17 +47,6 @@ env = make_env("lerobot/cartpole-env", trust_remote_code=True)
hash for reproducibility and security.
</Tip>
## What is EnvHub?
EnvHub is a framework that allows researchers and developers to:
1. **Publish environments** to the Hugging Face Hub as Git repositories
2. **Load environments** dynamically without installing them as packages
3. **Version and track** environment changes using Git semantics
4. **Discover** new simulation tasks shared by the community
This design means you can go from discovering an interesting environment on the Hub to running experiments in seconds, without worrying about dependency conflicts or complex installation procedures.
## Repository Structure
To make your environment loadable from the Hub, your repository must contain at minimum:
+510
View File
@@ -0,0 +1,510 @@
# NVIDIA IsaacLab Arena & LeRobot
LeRobot EnvHub now supports **GPU-accelerated simulation** with IsaacLab Arena for policy evaluation at scale.
Train and evaluate imitation learning policies with high-fidelity simulation — all integrated into the LeRobot ecosystem.
<img
src="https://huggingface.co/nvidia/isaaclab-arena-envs/resolve/main/assets/Gr1OpenMicrowaveEnvironment.png"
alt="IsaacLab Arena - GR1 Microwave Environment"
style={{ maxWidth: "100%", borderRadius: "8px", marginBottom: "1rem" }}
/>
[IsaacLab Arena](https://github.com/isaac-sim/IsaacLab-Arena) integrates with NVIDIA IsaacLab to provide:
- 🤖 **Humanoid embodiments**: GR1, G1, Galileo with various configurations
- 🎯 **Manipulation & loco-manipulation tasks**: Door opening, pick-and-place, button pressing, and more
- ⚡ **GPU-accelerated rollouts**: Parallel environment execution on NVIDIA GPUs
- 🖼️ **RTX Rendering**: Evaluate vision-based policies with realistic rendering, reflections and refractions
- 📦 **LeRobot-compatible datasets**: Ready for training with GR00T N1x, PI0, SmolVLA, ACT, and Diffusion policies
- 🔄 **EnvHub integration**: Load environments from HuggingFace EnvHub with one line
## Installation
### Prerequisites
Hardware requirements are shared with Isaac Sim, and are detailed in [Isaac Sim Requirements](https://docs.isaacsim.omniverse.nvidia.com/5.1.0/installation/requirements.html).
- NVIDIA GPU with CUDA support
- NVIDIA driver compatible with IsaacSim 5.1.0
- Linux (Ubuntu 22.04 / 24.04)
### Setup
```bash
# 1. Create conda environment
conda create -y -n lerobot-arena python=3.11
conda activate lerobot-arena
conda install -y -c conda-forge ffmpeg=7.1.1
# 2. Install Isaac Sim 5.1.0
pip install "isaacsim[all,extscache]==5.1.0" --extra-index-url https://pypi.nvidia.com
# Accept NVIDIA EULA (required)
export ACCEPT_EULA=Y
export PRIVACY_CONSENT=Y
# 3. Install IsaacLab 2.3.0
git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab
git checkout v2.3.0
./isaaclab.sh -i
cd ..
# 4. Install IsaacLab Arena
git clone https://github.com/isaac-sim/IsaacLab-Arena.git
cd IsaacLab-Arena
git checkout release/0.1.1
pip install -e .
cd ..
# 5. Install LeRobot
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e .
cd ..
# 6. Install additional dependencies
pip install onnxruntime==1.23.2 lightwheel-sdk==1.0.1 vuer[all]==0.0.70 qpsolvers==4.8.1
pip install numpy==1.26.0 # Isaac Sim 5.1 depends on numpy==1.26.0, this will be fixed in next release
```
## Evaluating Policies
### Pre-trained Policies
The following trained policies are available:
| Policy | Architecture | Task | Link |
| :-------------------------- | :----------- | :------------ | :----------------------------------------------------------------------- |
| pi05-arena-gr1-microwave | PI0.5 | GR1 Microwave | [HuggingFace](https://huggingface.co/nvidia/pi05-arena-gr1-microwave) |
| smolvla-arena-gr1-microwave | SmolVLA | GR1 Microwave | [HuggingFace](https://huggingface.co/nvidia/smolvla-arena-gr1-microwave) |
### Evaluate SmolVLA
```bash
pip install -e ".[smolvla]"
pip install numpy==1.26.0 # revert numpy to version 1.26
```
```bash
lerobot-eval \
--policy.path=nvidia/smolvla-arena-gr1-microwave \
--env.type=isaaclab_arena \
--env.hub_path=nvidia/isaaclab-arena-envs \
--rename_map='{"observation.images.robot_pov_cam_rgb": "observation.images.robot_pov_cam"}' \
--policy.device=cuda \
--env.environment=gr1_microwave \
--env.embodiment=gr1_pink \
--env.object=mustard_bottle \
--env.headless=false \
--env.enable_cameras=true \
--env.video=true \
--env.video_length=10 \
--env.video_interval=15 \
--env.state_keys=robot_joint_pos \
--env.camera_keys=robot_pov_cam_rgb \
--trust_remote_code=True \
--eval.batch_size=1
```
### Evaluate PI0.5
```bash
pip install -e ".[pi]"
pip install numpy==1.26.0 # revert numpy to version 1.26
```
<Tip>PI0.5 requires disabling torch compile for evaluation:</Tip>
```bash
TORCH_COMPILE_DISABLE=1 TORCHINDUCTOR_DISABLE=1 lerobot-eval \
--policy.path=nvidia/pi05-arena-gr1-microwave \
--env.type=isaaclab_arena \
--env.hub_path=nvidia/isaaclab-arena-envs \
--rename_map='{"observation.images.robot_pov_cam_rgb": "observation.images.robot_pov_cam"}' \
--policy.device=cuda \
--env.environment=gr1_microwave \
--env.embodiment=gr1_pink \
--env.object=mustard_bottle \
--env.headless=false \
--env.enable_cameras=true \
--env.video=true \
--env.video_length=15 \
--env.video_interval=15 \
--env.state_keys=robot_joint_pos \
--env.camera_keys=robot_pov_cam_rgb \
--trust_remote_code=True \
--eval.batch_size=1
```
<Tip>
To change the number of parallel environments, use the ```--eval.batch_size```
flag.
</Tip>
### What to Expect
During evaluation, you will see a progress bar showing the running success rate:
```
Stepping through eval batches: 8%|██████▍ | 4/50 [00:45<08:06, 10.58s/it, running_success_rate=25.0%]
```
### Video Recording
To enable video recording during evaluation, add the following flags to your command:
```bash
--env.video=true \
--env.video_length=15 \
--env.video_interval=15
```
For more details on video recording, see the [IsaacLab Recording Documentation](https://isaac-sim.github.io/IsaacLab/main/source/how-to/record_video.html).
<Tip>
When running headless with `--env.headless=true`, you must also enable cameras explicitly for camera enabled environments:
```bash
--env.headless=true --env.enable_cameras=true
```
</Tip>
### Output Directory
Evaluation videos are saved to the output directory with the following structure:
```
outputs/eval/<date>/<timestamp>_<env>_<policy>/videos/<task>_<env_id>/eval_episode_<n>.mp4
```
For example:
```
outputs/eval/2026-01-02/14-38-01_isaaclab_arena_smolvla/videos/gr1_microwave_0/eval_episode_0.mp4
```
## Training Policies
To learn more about training policies with LeRobot, please refer to the training documentation:
- [SmolVLA](./smolvla)
- [Pi0.5](./pi05)
- [GR00T N1.5](./groot)
Sample IsaacLab Arena datasets are available on HuggingFace Hub for experimentation:
| Dataset | Description | Frames |
| :-------------------------------------------------------------------------------------------------------- | :------------------------- | :----- |
| [Arena-GR1-Manipulation-Task](https://huggingface.co/datasets/nvidia/Arena-GR1-Manipulation-Task-v3) | GR1 microwave manipulation | ~4K |
| [Arena-G1-Loco-Manipulation-Task](https://huggingface.co/datasets/nvidia/Arena-G1-Loco-Manipulation-Task) | G1 loco-manipulation | ~4K |
## Environment Configuration
### Full Configuration Options
```python
from lerobot.envs.configs import IsaaclabArenaEnv
config = IsaaclabArenaEnv(
# Environment selection
environment="gr1_microwave", # Task environment
embodiment="gr1_pink", # Robot embodiment
object="power_drill", # Object to manipulate
# Simulation settings
episode_length=300, # Max steps per episode
headless=True, # Run without GUI
device="cuda:0", # GPU device
seed=42, # Random seed
# Observation configuration
state_keys="robot_joint_pos", # State observation keys (comma-separated)
camera_keys="robot_pov_cam_rgb", # Camera observation keys (comma-separated)
state_dim=54, # Expected state dimension
action_dim=36, # Expected action dimension
camera_height=512, # Camera image height
camera_width=512, # Camera image width
enable_cameras=True, # Enable camera observations
# Video recording
video=False, # Enable video recording
video_length=100, # Frames per video
video_interval=200, # Steps between recordings
# Advanced
mimic=False, # Enable mimic mode
teleop_device=None, # Teleoperation device
disable_fabric=False, # Disable fabric optimization
enable_pinocchio=True, # Enable Pinocchio for IK
)
```
### Using Environment Hub directly for advanced usage
Create a file called `test_env_load_arena.py` or [download from the EnvHub](https://huggingface.co/nvidia/isaaclab-arena-envs/blob/main/tests/test_env_load_arena.py):
```python
import logging
from dataclasses import asdict
from pprint import pformat
import torch
import tqdm
from lerobot.configs import parser
from lerobot.configs.eval import EvalPipelineConfig
@parser.wrap()
def main(cfg: EvalPipelineConfig):
"""Run random action rollout for IsaacLab Arena environment."""
logging.info(pformat(asdict(cfg)))
from lerobot.envs.factory import make_env
env_dict = make_env(
cfg.env,
n_envs=cfg.env.num_envs,
trust_remote_code=True,
)
env = next(iter(env_dict.values()))[0]
env.reset()
for _ in tqdm.tqdm(range(cfg.env.episode_length)):
with torch.inference_mode():
actions = env.action_space.sample()
obs, rewards, terminated, truncated, info = env.step(actions)
if terminated.any() or truncated.any():
obs, info = env.reset()
env.close()
if __name__ == "__main__":
main()
```
Run with:
```bash
python test_env_load_arena.py \
--env.environment=g1_locomanip_pnp \
--env.embodiment=gr1_pink \
--env.object=cracker_box \
--env.num_envs=4 \
--env.enable_cameras=true \
--env.seed=1000 \
--env.video=true \
--env.video_length=10 \
--env.video_interval=15 \
--env.headless=false \
--env.hub_path=nvidia/isaaclab-arena-envs \
--env.type=isaaclab_arena
```
## Creating New Environments
First create a new IsaacLab Arena environment by following the [IsaacLab Arena Documentation](https://isaac-sim.github.io/IsaacLab-Arena/release/0.1.1/index.html).
Clone our EnvHub repo:
```bash
git clone https://huggingface.co/nvidia/isaaclab-arena-envs
```
Modify the `example_envs.yaml` file based on your new environment.
[Upload](./envhub#step-3-upload-to-the-hub) your modified repo to HuggingFace EnvHub.
<Tip>
Your IsaacLab Arena environment code must be locally available during
evaluation. Users can clone your environment repository separately, or you can
bundle the environment code and assets directly in your EnvHub repo.
</Tip>
Then, when evaluating, use your new environment:
```bash
lerobot-eval \
--env.hub_path=<your-env-hub-path>/isaaclab-arena-envs \
--env.environment=<your new environment> \
...other flags...
```
We look forward to your contributions!
## Troubleshooting
### CUDA out of memory
Reduce `batch_size` or use a GPU with more VRAM:
```bash
--eval.batch_size=1
```
### EULA not accepted
Set environment variables before running:
```bash
export ACCEPT_EULA=Y
export PRIVACY_CONSENT=Y
```
### Video recording not working
Enable cameras when running headless:
```bash
--env.video=true --env.enable_cameras=true --env.headless=true
```
### Policy output dimension mismatch
Ensure `action_dim` matches your policy:
```bash
--env.action_dim=36
```
### libGLU.so.1 Errors during Isaac Sim initialization
Ensure you have the following dependencies installed, this is likely to happen on headless machines.
```bash
sudo apt update && sudo apt install -y libglu1-mesa libxt6
```
## See Also
- [EnvHub Documentation](./envhub.mdx) - General EnvHub usage
- [IsaacLab Arena GitHub](https://github.com/isaac-sim/IsaacLab-Arena)
- [IsaacLab Documentation](https://isaac-sim.github.io/IsaacLab/)
## Lightwheel LW-BenchHub
[Lightwheel](https://www.lightwheel.ai) is bringing `Lightwheel-Libero-Tasks` and `Lightwheel-RoboCasa-Tasks` with 268 tasks to the LeRobot ecosystem.
LW-BenchHub collects and generates large-scale datasets via teleoperation that comply with the LeRobot specification, enabling out-of-the-box training and evaluation workflows.
With the unified interface provided by EnvHub, developers can quickly build end-to-end experimental pipelines.
### Install
Assuming you followed the [Installation](#installation) steps, you can install LW-BenchHub with:
```bash
conda install pinocchio -c conda-forge -y
pip install numpy==1.26.0 # revert numpy to version 1.26
sudo apt-get install git-lfs && git lfs install
git clone https://github.com/LightwheelAI/lw_benchhub
git lfs pull # Ensure LFS files (e.g., .usd assets) are downloaded
cd lw_benchhub
pip install -e .
```
For more detailed instructions, please refer to the [LW-BenchHub Documentation](https://docs.lightwheel.net/lw_benchhub/usage/Installation).
### Lightwheel Tasks Dataset
LW-BenchHub datasets are available on HuggingFace Hub:
| Dataset | Description | Tasks | Frames |
| :------------------------------------------------------------------------------------------------------------ | :---------------------- | :---- | :----- |
| [Lightwheel-Tasks-X7S](https://huggingface.co/datasets/LightwheelAI/Lightwheel-Tasks-X7S) | X7S LIBERO and RoboCasa | 117 | ~10.3M |
| [Lightwheel-Tasks-Double-Piper](https://huggingface.co/datasets/LightwheelAI/Lightwheel-Tasks-Double-Piper) | Double-Piper LIBERO | 130 | ~6.0M |
| [Lightwheel-Tasks-G1-Controller](https://huggingface.co/datasets/LightwheelAI/Lightwheel-Tasks-G1-Controller) | G1-Controller LIBERO | 62 | ~2.7M |
| [Lightwheel-Tasks-G1-WBC](https://huggingface.co/datasets/LightwheelAI/Lightwheel-Tasks-G1-WBC) | G1-WBC RoboCasa | 32 | ~1.5M |
For training policies, refer to the [Training Policies](#training-policies) section.
### Evaluating Policies
#### Pre-trained Policies
The following trained policies are available:
| Policy | Architecture | Task | Layout | Robot | Link |
| :----------------------- | :----------- | :----------------------------- | :--------- | :-------------- | :------------------------------------------------------------------------------------ |
| smolvla-double-piper-pnp | SmolVLA | L90K1PutTheBlackBowlOnThePlate | libero-1-1 | DoublePiper-Abs | [HuggingFace](https://huggingface.co/LightwheelAI/smolvla-double-piper-pnp/tree/main) |
#### Evaluate SmolVLA
```bash
lerobot-eval \
--policy.path=LightwheelAI/smolvla-double-piper-pnp \
--env.type=isaaclab_arena \
--rename_map='{"observation.images.left_hand_camera_rgb": "observation.images.left_hand", "observation.images.right_hand_camera_rgb": "observation.images.right_hand", "observation.images.first_person_camera_rgb": "observation.images.first_person"}' \
--env.hub_path=LightwheelAI/lw_benchhub_env \
--env.kwargs='{"config_path": "configs/envhub/example.yml"}' \
--trust_remote_code=true \
--env.state_keys=joint_pos \
--env.action_dim=12 \
--env.camera_keys=left_hand_camera_rgb,right_hand_camera_rgb,first_person_camera_rgb \
--policy.device=cuda \
--eval.batch_size=10 \
--eval.n_episodes=100
```
### Environment Configuration
Evaluation can be quickly launched by modifying the `robot`, `task`, and `layout` settings in the configuration file.
#### Full Configuration Options
```yml
# =========================
# Basic Settings
# =========================
disable_fabric: false
device: cuda:0
sensitivity: 1.0
step_hz: 50
enable_cameras: true
execute_mode: eval
episode_length_s: 20.0 # Episode length in seconds, increase if episodes timeout during eval
# =========================
# Robot Settings
# =========================
robot: DoublePiper-Abs # Robot type, DoublePiper-Abs, X7S-Abs, G1-Controller or G1-Controller-DecoupledWBC
robot_scale: 1.0
# =========================
# Task & Scene Settings
# =========================
task: L90K1PutTheBlackBowlOnThePlate # Task name
scene_backend: robocasa
task_backend: robocasa
debug_assets: null
layout: libero-1-1 # Layout and style ID
sources:
- objaverse
- lightwheel
- aigen_objs
object_projects: []
usd_simplify: false
seed: 42
# =========================
# Object Placement Retry Settings
# =========================
max_scene_retry: 4
max_object_placement_retry: 3
resample_objects_placement_on_reset: true
resample_robot_placement_on_reset: true
# =========================
# Replay Configuration Settings
# =========================
replay_cfgs:
add_camera_to_observation: true
render_resolution: [640, 480]
```
### See Also
- [LW-BenchHub GitHub](https://github.com/LightwheelAI/LW-BenchHub)
- [LW-BenchHub Documentation](https://docs.lightwheel.net/lw_benchhub/)
+4 -3
View File
@@ -137,7 +137,8 @@ from lerobot.teleoperators import ( # noqa: F401
Teleoperator,
TeleoperatorConfig,
make_teleoperator_from_config,
so101_leader,
so_leader,
bi_so_leader,
)
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import init_logging
@@ -196,7 +197,7 @@ def teleop_loop(teleop: Teleoperator, env: gym.Env, fps: int):
obs, info = env.reset()
dt_s = time.perf_counter() - loop_start
precise_sleep(1 / fps - dt_s)
precise_sleep(max(1 / fps - dt_s, 0.0))
loop_s = time.perf_counter() - loop_start
print(f"\ntime: {loop_s * 1e3:.2f}ms ({1 / loop_s:.0f} Hz)")
@@ -222,7 +223,7 @@ def teleoperate(cfg: TeleoperateConfig):
def main():
teleoperate(TeleoperateConfig(
teleop=so101_leader.SO101LeaderConfig(
teleop=so_leader.SO101LeaderConfig(
port="/dev/ttyACM0",
id='leader',
use_degrees=False,
+7 -1
View File
@@ -12,6 +12,12 @@ Developers and researchers can post-train GR00T N1.5 with their own real or synt
GR00T N1.5 (specifically the GR00T-N1.5-3B model) is built using pre-trained vision and language encoders. It utilizes a flow matching action transformer to model a chunk of actions, conditioned on vision, language, and proprioception.
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-groot-paper1%20(1).png"
alt="An overview of GR00T"
width="80%"
/>
Its strong performance comes from being trained on an expansive and diverse humanoid dataset, which includes:
- Real captured data from robots.
@@ -103,7 +109,7 @@ Once you have trained your model using your parameters you can run inference in
```bash
lerobot-record \
--robot.type=bi_so100_follower \
--robot.type=bi_so_follower \
--robot.left_arm_port=/dev/ttyACM1 \
--robot.right_arm_port=/dev/ttyACM0 \
--robot.id=bimanual_follower \
+10 -10
View File
@@ -58,8 +58,8 @@ lerobot-teleoperate \
<!-- prettier-ignore-start -->
```python
from lerobot.teleoperators.so101_leader import SO101LeaderConfig, SO101Leader
from lerobot.robots.so101_follower import SO101FollowerConfig, SO101Follower
from lerobot.teleoperators.so_leader import SO101LeaderConfig, SO101Leader
from lerobot.robots.so_follower import SO101FollowerConfig, SO101Follower
robot_config = SO101FollowerConfig(
port="/dev/tty.usbmodem58760431541",
@@ -195,9 +195,9 @@ lerobot-record \
from lerobot.cameras.opencv.configuration_opencv import OpenCVCameraConfig
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.utils import hw_to_dataset_features
from lerobot.robots.so100_follower import SO100Follower, SO100FollowerConfig
from lerobot.teleoperators.so100_leader.config_so100_leader import SO100LeaderConfig
from lerobot.teleoperators.so100_leader.so100_leader import SO100Leader
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.teleoperators.so_leader.config_so100_leader import SO100LeaderConfig
from lerobot.teleoperators.so_leader.so100_leader import SO100Leader
from lerobot.utils.control_utils import init_keyboard_listener
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun
@@ -408,8 +408,8 @@ lerobot-replay \
import time
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.robots.so_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so_follower.so100_follower import SO100Follower
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say
@@ -432,7 +432,7 @@ for idx in range(dataset.num_frames):
}
robot.send_action(action)
precise_sleep(1.0 / dataset.fps - (time.perf_counter() - t0))
precise_sleep(max(1.0 / dataset.fps - (time.perf_counter() - t0), 0.0))
robot.disconnect()
```
@@ -531,8 +531,8 @@ from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.utils import hw_to_dataset_features
from lerobot.policies.act.modeling_act import ACTPolicy
from lerobot.policies.factory import make_pre_post_processors
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.robots.so_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so_follower.so100_follower import SO100Follower
from lerobot.scripts.lerobot_record import record_loop
from lerobot.utils.control_utils import init_keyboard_listener
from lerobot.utils.utils import log_say
+1 -1
View File
@@ -18,7 +18,7 @@ If you're using Feetech or Dynamixel motors, LeRobot provides built-in bus inter
- [`DynamixelMotorsBus`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/motors/dynamixel/dynamixel.py) for controlling Dynamixel servos
Please refer to the [`MotorsBus`](https://github.com/huggingface/lerobot/blob/main/src/lerobot/motors/motors_bus.py) abstract class to learn about its API.
For a good example of how it can be used, you can have a look at our own [SO101 follower implementation](https://github.com/huggingface/lerobot/blob/main/src/lerobot/robots/so101_follower/so101_follower.py)
For a good example of how it can be used, you can have a look at our own [SO101 follower implementation](https://github.com/huggingface/lerobot/blob/main/src/lerobot/robots/so_follower/so101_follower/so101_follower.py)
Use these if compatible. Otherwise, you'll need to find or write a Python interface (not covered in this tutorial):
+7 -1
View File
@@ -1,5 +1,11 @@
# LeKiwi
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/1740517739083.jpeg"
alt="LeKiwi"
width="70%"
/>
In the steps below, we explain how to assemble the LeKiwi mobile robot.
## Source the parts
@@ -204,7 +210,7 @@ lerobot-calibrate \
<!-- prettier-ignore-start -->
```python
from lerobot.teleoperators.so100_leader import SO100LeaderConfig, SO100Leader
from lerobot.teleoperators.so_leader import SO100LeaderConfig, SO100Leader
config = SO100LeaderConfig(
port="/dev/tty.usbmodem58760431551",
+1
View File
@@ -42,6 +42,7 @@ lerobot-eval \
```
- `--env.task` picks the suite (`libero_object`, `libero_spatial`, etc.).
- `--env.task_ids` picks task ids to run (`[0]`, `[1,2,3]`, etc.). Omit this flag (or set it to `null`) to run all tasks in the suite.
- `--eval.batch_size` controls how many environments run in parallel.
- `--eval.n_episodes` sets how many episodes to run in total.
+197
View File
@@ -0,0 +1,197 @@
## Order and Assemble the parts
First, assemble the OMX hardware following the official assembly guide.
OMX Assembly Guide: https://ai.robotis.com/omx/assembly_guide_omx.html
OMX robots are shipped preconfigured from the factory. Motor IDs, communication parameters, and joint offsets are already set, so no additional motor setup or calibration is required before using LeRobot.
## Install LeRobot 🤗
To install LeRobot, follow our [Installation Guide](./installation)
In addition to these instructions, you need to install the Dynamixel SDK:
```bash
pip install -e ".[dynamixel]"
```
## Connect the robot
To find the port for each bus servo adapter, run this script:
```bash
lerobot-find-port
```
This command runs and when prompted, disconnect the USB cable from either the leader or follower arm and press Enter. The output will show 'The port of this MotorsBus is [port]'. This identifies the port for the disconnected arm. Repeat for the other arm to identify both ports.
<hfoptions id="find_port">
<hfoption id="Mac">
Example output on macOS:
```
Finding all available ports for the MotorBus.
['/dev/tty.usbmodem575E0032081', '/dev/tty.usbmodem575E0031751']
Remove the USB cable from your MotorsBus and press Enter when done.
[...Disconnect corresponding leader or follower arm and press Enter...]
The port of this MotorsBus is /dev/tty.usbmodem575E0032081
Reconnect the USB cable.
```
Where the found port is: `/dev/tty.usbmodem575E0032081` corresponding to your leader or follower arm.
</hfoption>
<hfoption id="Linux">
On Linux, we strongly recommend using udev rules to assign persistent and human-readable device names to the OMX leader and follower arms. This avoids issues where device names such as ttyACM0 and ttyACM1 change when the robot is unplugged, replugged, or when the system is rebooted.
#### 1. Find your device serial numbers
You should have obtained the port numbers like ../../ttyACM? for the leader and follower using `lerobot-find-port`. You can match those results with the serial numbers using the `ls -l /dev/serial/by-id/` command.
To create udev rules, you need the unique serial number for each OMX device. The easiest way is to list devices under:
```bash
ls -l /dev/serial/by-id/
```
You will see output similar to:
```bash
usb-ROBOTIS_OpenRB-150_228BDD7B503059384C2E3120FF0A2B19-if00 -> ../../ttyACM0
usb-ROBOTIS_OpenRB-150_67E1ED68503059384C2E3120FF092234-if00 -> ../../ttyACM1
```
In each line, the serial number is the long string after `usb-ROBOTIS_OpenRB-150_` and before `-if00`.
Follower serial: `228BDD7B503059384C2E3120FF0A2B19`
Leader serial: `67E1ED68503059384C2E3120FF092234`
#### 2. Create the udev rule
Create a new udev rule file:
```bash
sudo nano /etc/udev/rules.d/99-omx.rules
```
Paste the following lines, replacing the serial numbers with the values you found above:
```bash
SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{serial}=="228BDD7B503059384C2E3120FF0A2B19", SYMLINK+="omx_follower"
SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{serial}=="67E1ED68503059384C2E3120FF092234", SYMLINK+="omx_leader"
```
Save the file and reload udev rules:
```bash
sudo udevadm control --reload-rules
sudo udevadm trigger
```
Now unplug and replug both devices once.
#### 3. Verify the symlinks
Check that the persistent device names exist:
```bash
ls -l /dev/omx_follower /dev/omx_leader
```
You should see them pointing to ttyACM\* devices:
```bash
/dev/omx_follower -> ttyACM*
/dev/omx_leader -> ttyACM*
```
These names remain stable across reboots and reconnections.
</hfoption>
</hfoptions>
## Teleoperate
After identifying the correct ports, you can directly teleoperate the follower arm using the leader arm.
<hfoptions id="teleoperate">
<hfoption id="Mac">
### Teleoperate without camera
```bash
lerobot-teleoperate \
--robot.type=omx_follower \
--robot.port=<your_follower_port> \
--robot.id=omx_follower_arm \
--teleop.type=omx_leader \
--teleop.port=<your_leader_port> \
--teleop.id=omx_leader_arm
```
During teleoperation, motions of the leader arm are mirrored in real time by the follower arm. OMX is already preconfigured, teleoperation can begin immediately without any calibration steps.
### Teleoperate with camera
You can also enable camera input during teleoperation by providing a camera configuration for the follower arm.
```bash
lerobot-teleoperate \
--robot.type=omx_follower \
--robot.port=<your_follower_port> \
--robot.id=omx_follower_arm \
--robot.cameras="{front: {type: opencv, index_or_path: '/dev/video0', width: 640, height: 480, fps: 30}}" \
--teleop.type=omx_leader \
--teleop.port=<your_leader_port> \
--teleop.id=omx_leader_arm \
--display_data=true
```
When the camera is enabled, the camera stream is displayed in real time and synchronized with the robot state. This setup is useful for visual monitoring and can be reused later for demonstration recording and imitation learning.
</hfoption>
<hfoption id="Linux">
### Teleoperate without camera
```bash
lerobot-teleoperate \
--robot.type=omx_follower \
--robot.port=/dev/omx_follower \
--robot.id=omx_follower_arm \
--teleop.type=omx_leader \
--teleop.port=/dev/omx_leader \
--teleop.id=omx_leader_arm
```
During teleoperation, motions of the leader arm are mirrored in real time by the follower arm. OMX is already preconfigured, teleoperation can begin immediately without any calibration steps.
### Teleoperate with camera
You can also enable camera input during teleoperation by providing a camera configuration for the follower arm.
```bash
lerobot-teleoperate \
--robot.type=omx_follower \
--robot.port=/dev/omx_follower \
--robot.id=omx_follower_arm \
--robot.cameras="{front: {type: opencv, index_or_path: '/dev/video0', width: 640, height: 480, fps: 30}}" \
--teleop.type=omx_leader \
--teleop.port=/dev/omx_leader \
--teleop.id=omx_leader_arm \
--display_data=true
```
When the camera is enabled, the camera stream is displayed in real time and synchronized with the robot state. This setup is useful for visual monitoring and can be reused later for demonstration recording and imitation learning.
</hfoption>
</hfoptions>
Congrats 🎉, your robot is all set to learn a task on its own.
> If you have any questions or need help, please reach out on [Discord](https://discord.com/invite/robotis).
+62
View File
@@ -0,0 +1,62 @@
# Parameter efficient fine-tuning with 🤗 PEFT
[🤗 PEFT](https://github.com/huggingface/peft) (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting
large pretrained models such as pre-trained policies (e.g., SmolVLA, π₀, ...) to new tasks without training all
of the model's parameters while yielding comparable performance.
Install the `lerobot[peft]` optional package to enable PEFT support.
To read about all the possible methods of adaption, please refer to the [🤗 PEFT docs](https://huggingface.co/docs/peft/index).
## Training SmolVLA
In this section we'll show you how to train a pre-trained SmolVLA policy with PEFT on the libero dataset.
For brevity we're only training on the `libero_spatial` subset. We will use `lerobot/smolvla_base` as the model
to parameter efficiently fine-tune:
```
lerobot-train \
--policy.path=lerobot/smolvla_base \
--policy.repo_id=your_hub_name/my_libero_smolvla \
--dataset.repo_id=HuggingFaceVLA/libero \
--policy.output_features=null \
--policy.input_features=null \
--policy.optimizer_lr=1e-3 \
--policy.scheduler_decay_lr=1e-4 \
--env.type=libero \
--env.task=libero_spatial \
--steps=100000 \
--batch_size=32 \
--peft.method_type=LORA \
--peft.r=64
```
Note the `--peft.method_type` parameter that let's you select which PEFT method to use. Here we use
[LoRA](https://huggingface.co/docs/peft/main/en/package_reference/lora) (Low-Rank Adapter) which is probably the most
popular fine-tuning method to date. Low-rank adaption means that we only fine-tune a matrix with comparably low rank
instead of the full weight matrix. This rank can be specified using the `--peft.r` parameter. The higher the rank
the closer you get to full fine-tuning
There are more complex methods that have more parameters. These are not yet supported, feel free to raise an issue
if you want to see a specific PEFT method supported.
By default, PEFT will target the `q_proj` and `v_proj` layers of the LM expert in SmolVLA. It will also target the
state and action projection matrices as they are most likely task-dependent. If you need to target different layers
you can use `--peft.target_modules` to specify which layers to target. You can refer to the respective PEFT method's
documentation to see what inputs are supported, (e.g., [LoRA's target_modules documentation](https://huggingface.co/docs/peft/main/en/package_reference/lora#peft.LoraConfig.target_modules)).
Usually a list of suffixes or a regex are supported. For example, to target the MLPs of the `lm_expert` instead of
the `q` and `v` projections, use:
```
--peft.target_modules='(model\.vlm_with_expert\.lm_expert\..*\.(down|gate|up)_proj|.*\.(state_proj|action_in_proj|action_out_proj|action_time_mlp_in|action_time_mlp_out))'
```
In case you need to fully fine-tune a layer instead of just adapting it, you can supply a list of layer suffixes
to the `--peft.full_training_modules` parameter:
```
--peft.full_training_modules=["state_proj"]
```
The learning rate and the scheduled target learning rate can usually be scaled by a factor of 10 compared to the
learning rate used for full fine-tuning (e.g., 1e-4 normal, so 1e-3 using LoRA).
+8 -8
View File
@@ -44,7 +44,7 @@ Modify the examples to use `PhoneOS.IOS` or `PhoneOS.ANDROID` in `PhoneConfig`.
Teleoperation example:
```36:43:examples/phone_so100_teleop.py
```python
from lerobot.teleoperators.phone.config_phone import PhoneConfig, PhoneOS
teleop_config = PhoneConfig(phone_os=PhoneOS.IOS) # or PhoneOS.ANDROID
@@ -103,7 +103,7 @@ Additionally you can customize mapping or safety limits by editing the processor
- Kinematics are used in multiple steps. We use [Placo](https://github.com/Rhoban/placo) which is a wrapper around Pinocchio for handling our kinematics. We construct the kinematics object by passing the robot's URDF and target frame. We set `target_frame_name` to the gripper frame.
```examples/phone_to_so100/teleoperate.py
```python
kinematics_solver = RobotKinematics(
urdf_path="./SO101/so101_new_calib.urdf",
target_frame_name="gripper_frame_link",
@@ -114,7 +114,7 @@ Additionally you can customize mapping or safety limits by editing the processor
- The `MapPhoneActionToRobotAction` step converts the calibrated phone pose and inputs into target deltas and gripper commands, below is shown what the step outputs.
```src/lerobot/teleoperators/phone/phone_processor.py
```python
action["enabled"] = enabled
action["target_x"] = -pos[1] if enabled else 0.0
action["target_y"] = pos[0] if enabled else 0.0
@@ -127,7 +127,7 @@ Additionally you can customize mapping or safety limits by editing the processor
- The `EEReferenceAndDelta` step converts target deltas to an absolute desired EE pose, storing a reference on enable, the `end_effector_step_sizes` are the step sizes for the EE pose and can be modified to change the motion speed.
```examples/phone_to_so100/teleoperate.py
```python
EEReferenceAndDelta(
kinematics=kinematics_solver,
end_effector_step_sizes={"x": 0.5, "y": 0.5, "z": 0.5},
@@ -138,7 +138,7 @@ Additionally you can customize mapping or safety limits by editing the processor
- The `EEBoundsAndSafety` step clamps EE motion to a workspace and checks for large ee step jumps to ensure safety. The `end_effector_bounds` are the bounds for the EE pose and can be modified to change the workspace. The `max_ee_step_m` are the step limits for the EE pose and can be modified to change the safety limits.
```examples/phone_to_so100/teleoperate.py
```python
EEBoundsAndSafety(
end_effector_bounds={"min": [-1.0, -1.0, -1.0], "max": [1.0, 1.0, 1.0]},
max_ee_step_m=0.10,
@@ -147,7 +147,7 @@ Additionally you can customize mapping or safety limits by editing the processor
- The `GripperVelocityToJoint` step turns a velocitylike gripper input into absolute gripper position using the current measured state. The `speed_factor` is the factor by which the velocity is multiplied.
```examples/phone_to_so100/teleoperate.py
```python
GripperVelocityToJoint(speed_factor=20.0)
```
@@ -157,7 +157,7 @@ We use different IK initial guesses in the kinematic steps. As initial guess eit
- Closed loop (used in record/eval): sets `initial_guess_current_joints=True` so IK starts from the measured joints each frame.
```examples/phone_to_so100/record.py
```python
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
@@ -167,7 +167,7 @@ We use different IK initial guesses in the kinematic steps. As initial guess eit
- Open loop (used in replay): sets `initial_guess_current_joints=False` so IK continues from the previous IK solution rather than the measured state. This preserves action stability when we replay without feedback.
```examples/phone_to_so100/replay.py
```python
InverseKinematicsEEToJoints(
kinematics=kinematics_solver,
motor_names=list(robot.bus.motors.keys()),
+17
View File
@@ -6,6 +6,12 @@
π₀ represents a breakthrough in robotics as the first general-purpose robot foundation model developed by [Physical Intelligence](https://www.physicalintelligence.company/blog/pi0). Unlike traditional robot programs that are narrow specialists programmed for repetitive motions, π₀ is designed to be a generalist policy that can understand visual inputs, interpret natural language instructions, and control a variety of different robots across diverse tasks.
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-pi0%20(1).png"
alt="An overview of Pi0"
width="85%"
/>
### The Vision for Physical Intelligence
As described by Physical Intelligence, while AI has achieved remarkable success in digital domains, from chess-playing to drug discovery, human intelligence still dramatically outpaces AI in the physical world. To paraphrase Moravec's paradox, winning a game of chess represents an "easy" problem for AI, but folding a shirt or cleaning up a table requires solving some of the most difficult engineering problems ever conceived. π₀ represents a first step toward developing artificial physical intelligence that enables users to simply ask robots to perform any task they want, just like they can with large language models.
@@ -64,6 +70,8 @@ python src/lerobot/scripts/lerobot_train.py \
--policy.compile_model=true \
--policy.gradient_checkpointing=true \
--policy.dtype=bfloat16 \
--policy.freeze_vision_encoder=false \
--policy.train_expert_only=false \
--steps=3000 \
--policy.device=cuda \
--batch_size=32
@@ -79,6 +87,15 @@ python src/lerobot/scripts/lerobot_train.py \
- [lerobot/pi0_base](https://huggingface.co/lerobot/pi0_base)
- [lerobot/pi0_libero](https://huggingface.co/lerobot/pi0_libero) (specifically trained on the Libero dataset)
### Training Parameters Explained
| Parameter | Default | Description |
| ----------------------- | ------- | ------------------------------------------- |
| `freeze_vision_encoder` | `false` | Do not freeze the vision encoder |
| `train_expert_only` | `false` | Do not freeze the VLM, train all parameters |
**💡 Tip**: Setting `train_expert_only=true` freezes the VLM and trains only the action expert and projections, allowing finetuning with reduced memory usage.
## License
This model follows the **Apache 2.0 License**, consistent with the original [OpenPI repository](https://github.com/Physical-Intelligence/openpi).
+11
View File
@@ -67,6 +67,8 @@ python src/lerobot/scripts/lerobot_train.py\
--policy.gradient_checkpointing=true \
--wandb.enable=true \
--policy.dtype=bfloat16 \
--policy.freeze_vision_encoder=false \
--policy.train_expert_only=false \
--steps=3000 \
--policy.device=cuda \
--batch_size=32
@@ -82,6 +84,15 @@ python src/lerobot/scripts/lerobot_train.py\
- [lerobot/pi05_base](https://huggingface.co/lerobot/pi05_base)
- [lerobot/pi05_libero](https://huggingface.co/lerobot/pi05_libero) (specifically trained on the Libero dataset)
### Training Parameters Explained
| Parameter | Default | Description |
| ----------------------- | ------- | ------------------------------------------- |
| `freeze_vision_encoder` | `false` | Do not freeze the vision encoder |
| `train_expert_only` | `false` | Do not freeze the VLM, train all parameters |
**💡 Tip**: Setting `train_expert_only=true` freezes the VLM and trains only the action expert and projections, allowing finetuning with reduced memory usage.
If your dataset is not converted with `quantiles`, you can convert it with the following command:
```bash
+246
View File
@@ -0,0 +1,246 @@
# π₀-FAST (Pi0-FAST)
π₀-FAST is a **Vision-Language-Action model for general robot control** that uses autoregressive next-token prediction to model continuous robot actions.
## Model Overview
π₀-FAST combines the power of Vision-Language Models with a novel action tokenization approach called **FAST (Frequency-space Action Sequence Tokenization)**. This enables training autoregressive VLAs on highly dexterous tasks that are impossible with standard binning-based discretization, while training **up to 5x faster** than diffusion-based approaches like π₀.
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-pifast.png"
alt="An overview of Pi0-FAST"
width="85%"
/>
### Why FAST?
Standard approaches for robot action tokenization use simple per-dimension, per-timestep binning schemes. While passable for simple behaviors, this rapidly breaks down for complex and dexterous skills that require precision and high-frequency control.
FAST solves this by compressing action sequences using signal processing techniques, resulting in a dense sequence of action tokens that can be predicted autoregressively—just like language tokens.
### How FAST Tokenization Works
The FAST tokenizer compresses action sequences through the following steps:
1. **Normalize**: Take a continuous action chunk of shape `(H, D)` where `H` is the horizon and `D` is the action dimension. Normalize using one of the supported normalization methods (Quantiles recommended to handle outliers).
2. **Discrete Cosine Transform (DCT)**: Apply DCT (via scipy) to each action dimension separately. DCT is a compression algorithm commonly used in image and audio codecs (JPEG, MP3).
3. **Quantization**: Round and remove insignificant coefficients for each action dimension, producing a sparse frequency matrix.
4. **Flatten**: Flatten the matrix into a 1D vector, with low-frequency components first.
5. **Byte Pair Encoding (BPE)**: Train a BPE tokenizer to compress the DCT coefficients into dense action tokens, typically achieving **10x compression** over prior tokenization approaches.
This approach can transform **any existing VLM** into a VLA by training it to predict these FAST tokens.
## Installation Requirements
1. Install LeRobot by following our [Installation Guide](./installation).
2. Install π₀-FAST dependencies by running:
```bash
pip install -e ".[pi]"
```
> [!NOTE]
> For lerobot 0.4.0, if you want to install the pi tag, you will have to do: `pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"`.
>
> This will be solved in the next patch release
## Training a Custom FAST Tokenizer
You have two options for the FAST tokenizer:
1. **Use the pre-trained tokenizer**: The `physical-intelligence/fast` tokenizer was trained on 1M+ real robot action sequences and works as a general-purpose tokenizer.
2. **Train your own tokenizer**: For maximum performance on your specific dataset, you can finetune the tokenizer on your own data.
### Training Your Own Tokenizer
```bash
lerobot-train-tokenizer \
--repo_id "user/my-lerobot-dataset" \
--action_horizon 10 \
--encoded_dims "0:6" \
--vocab_size 1024 \
--scale 10.0 \
--normalization_mode QUANTILES \
--output_dir "./my_fast_tokenizer" \
--push_to_hub \
--hub_repo_id "username/my-action-tokenizer"
```
### Key Tokenizer Parameters
| Parameter | Description | Default |
| ---------------------- | --------------------------------------------------------------------------------- | ------------ |
| `--repo_id` | LeRobot dataset repository ID | Required |
| `--action_horizon` | Number of future actions in each chunk | `10` |
| `--encoded_dims` | Comma-separated dimension ranges to encode (e.g., `"0:6,7:23"`) | `"0:6,7:23"` |
| `--vocab_size` | BPE vocabulary size | `1024` |
| `--scale` | DCT scaling factor for quantization | `10.0` |
| `--normalization_mode` | Normalization mode (`MEAN_STD`, `MIN_MAX`, `QUANTILES`, `QUANTILE10`, `IDENTITY`) | `QUANTILES` |
| `--sample_fraction` | Fraction of chunks to sample per episode | `0.1` |
## Usage
To use π₀-FAST in LeRobot, specify the policy type as:
```python
policy.type=pi0_fast
```
## Training
For training π₀-FAST, you can use the LeRobot training script:
```bash
lerobot-train \
--dataset.repo_id=your_dataset \
--policy.type=pi0_fast \
--output_dir=./outputs/pi0fast_training \
--job_name=pi0fast_training \
--policy.pretrained_path=lerobot/pi0_fast_base \
--policy.dtype=bfloat16 \
--policy.gradient_checkpointing=true \
--policy.chunk_size=10 \
--policy.n_action_steps=10 \
--policy.max_action_tokens=256 \
--steps=100000 \
--batch_size=4 \
--policy.device=cuda
```
### Key Training Parameters
| Parameter | Description | Default |
| -------------------------------------- | -------------------------------------------------- | ---------------------------- |
| `--policy.gradient_checkpointing=true` | Reduces memory usage significantly during training | `false` |
| `--policy.dtype=bfloat16` | Use mixed precision training for efficiency | `float32` |
| `--policy.chunk_size` | Number of action steps to predict (action horizon) | `50` |
| `--policy.n_action_steps` | Number of action steps to execute | `50` |
| `--policy.max_action_tokens` | Maximum number of FAST tokens per action chunk | `256` |
| `--policy.action_tokenizer_name` | FAST tokenizer to use | `physical-intelligence/fast` |
| `--policy.compile_model=true` | Enable torch.compile for faster training | `false` |
## Inference
### KV-Caching for Fast Inference
π₀-FAST supports **KV-caching**, a widely used optimization in LLM inference. This caches the key-value pairs from the attention mechanism, avoiding redundant computation during autoregressive decoding.
```python
# KV-caching is enabled by default
policy.use_kv_cache=true
```
### Inference Example
```python
from lerobot.policies.pi0_fast import PI0FastPolicy, PI0FastConfig
# Load the policy
policy = PI0FastPolicy.from_pretrained("your-model-path")
# During inference
actions = policy.predict_action_chunk(batch)
```
## Model Architecture
π₀-FAST uses a PaliGemma-based architecture:
- **Vision Encoder**: SigLIP vision tower for image understanding
- **Language Model**: Gemma 2B for processing language instructions and predicting action tokens
The model takes images, text instructions, and robot state as input, and outputs discrete FAST tokens that are decoded back to continuous actions.
## Configuration Options
| Parameter | Description | Default |
| -------------------- | ----------------------------------------------- | ---------- |
| `paligemma_variant` | VLM backbone variant (`gemma_300m`, `gemma_2b`) | `gemma_2b` |
| `max_state_dim` | Maximum state vector dimension (padded) | `32` |
| `max_action_dim` | Maximum action vector dimension (padded) | `32` |
| `temperature` | Sampling temperature (0.0 for greedy) | `0.0` |
| `max_decoding_steps` | Maximum decoding steps | `256` |
| `use_kv_cache` | Enable KV caching for faster inference | `true` |
## Comparison with π₀
| Feature | π₀ | π₀-FAST |
| --------------------- | ------------------------- | ---------------------------- |
| Action Representation | Flow Matching (Diffusion) | Autoregressive Tokens (FAST) |
| Training Speed | 1x | **5x faster** |
| Dexterity | High | High |
| Inference Method | Iterative Denoising | Autoregressive Decoding |
| KV-Caching | N/A | Supported |
## Reproducing π₀Fast results
We reproduce the results of π₀Fast on the LIBERO benchmark using the LeRobot implementation. We take the LeRobot PiFast base model [lerobot/pi0fast-base](https://huggingface.co/lerobot/pi0fast-base) and finetune for an additional 40kk steps in bfloat16, with batch size of 256 on 8 H100 GPUs using the [HuggingFace LIBERO dataset](https://huggingface.co/datasets/HuggingFaceVLA/libero).
The finetuned model can be found here:
- **π₀Fast LIBERO**: [lerobot/pi0fast-libero](https://huggingface.co/lerobot/pi0fast-libero)
With the following training command:
```bash
lerobot-train \
--dataset.repo_id=lerobot/libero \
--output_dir=outputs/libero_pi0fast \
--job_name=libero_pi0fast \
--policy.path=lerobot/pi0fast_base \
--policy.dtype=bfloat16 \
--steps=100000 \
--save_freq=20000 \
--batch_size=4 \
--policy.device=cuda \
--policy.scheduler_warmup_steps=4000 \
--policy.scheduler_decay_steps=100000 \
--policy.scheduler_decay_lr=1e-5 \
--policy.gradient_checkpointing=true \
--policy.chunk_size=10 \
--policy.n_action_steps=10 \
--policy.max_action_tokens=256 \
--policy.empty_cameras=1 \
```
We then evaluate the finetuned model using the LeRobot LIBERO implementation, by running the following command:
```bash
tasks="libero_object,libero_spatial,libero_goal,libero_10"
lerobot-eval \
--policy.path=lerobot/pi0fast-libero \
--policy.max_action_tokens=256 \
--env.type=libero \
--policy.gradient_checkpointing=false \
--env.task=${tasks} \
--eval.batch_size=1 \
--eval.n_episodes=1 \
--rename_map='{"observation.images.image":"observation.images.base_0_rgb","observation.images.image2":"observation.images.left_wrist_0_rgb"}'
```
**Note:** We set `n_action_steps=10`, similar to the original OpenPI implementation.
### Results
We obtain the following results on the LIBERO benchmark:
| Model | LIBERO Spatial | LIBERO Object | LIBERO Goal | LIBERO 10 | Average |
| ----------- | -------------- | ------------- | ----------- | --------- | -------- |
| **π₀-fast** | 70.0 | 100.0 | 100.0 | 60.0 | **82.5** |
The full evaluation output folder, including videos, is available [here](https://drive.google.com/drive/folders/1HXpwPTRm4hx6g1sF2P7OOqGG0TwPU7LQ?usp=sharing)
## License
This model follows the **Apache 2.0 License**, consistent with the original [OpenPI repository](https://github.com/Physical-Intelligence/openpi).
## References
- [FAST: Efficient Robot Action Tokenization](https://www.physicalintelligence.company/research/fast) - Physical Intelligence Blog
- [OpenPI Repository](https://github.com/Physical-Intelligence/openpi) - Original implementation
- [FAST Tokenizer on Hugging Face](https://huggingface.co/physical-intelligence/fast) - Pre-trained tokenizer
+14 -4
View File
@@ -1,20 +1,30 @@
# WALL-OSS
This repository contains the Hugging Face port of **WALL-OSS**, a Vision-Language-Action model for cross-embodiment robotic control based on Qwen2.5-VL with flow matching/FAST action prediction.
This repository contains the Hugging Face port of [**WALL-OSS**](https://x2robot.com/en/research/68bc2cde8497d7f238dde690), a Vision-Language-Action model for cross-embodiment robotic control based on Qwen2.5-VL with flow matching/FAST action prediction.
---
## Model Overview
| Feature | Description |
| ------------------ | ----------------------------------------------------- | --- |
| ------------------ | ----------------------------------------------------- |
| Base Model | Qwen2.5-VL (Vision-Language Model) |
| Action Prediction | Flow Matching (diffusion) or FAST (discrete tokens) |
| Architecture | Mixture of Experts (MoE) with action-specific routing | |
| Architecture | Mixture of Experts (MoE) with action-specific routing |
| Multi-Modal Inputs | Vision (images/videos), Language, Proprioception |
---
## Additional Resources
Paper: https://arxiv.org/pdf/2509.11766
Official Repository: https://github.com/X-Square-Robot/wall-x
Hugging Face: https://huggingface.co/x-square-robot
---
## Citation
If you use this work, please cite:
@@ -32,4 +42,4 @@ If you use this work, please cite:
## License
This port follows the **Apache 2.0 License**.
This model follows the **Apache 2.0 License**, consistent with the original [WallX repository](https://github.com/X-Square-Robot/wall-x).
+3 -3
View File
@@ -30,7 +30,7 @@ Each of these pipelines handle different conversions between different action an
Below is an example of the three pipelines that we use in the phone to SO-100 follower examples:
```69:90:examples/phone_so100_record.py
```python
phone_to_robot_ee_pose_processor = RobotProcessorPipeline[RobotAction, RobotAction]( # teleop -> dataset action
steps=[
MapPhoneActionToRobotAction(platform=teleop_config.phone_os),
@@ -84,7 +84,7 @@ Dataset features are determined by the keys saved in the dataset. Each step can
Below is and example of how we declare features with the `transform_features` method in the phone to SO-100 follower examples:
```src/lerobot/robots/so100_follower/robot_kinematic_processor.py
```python
def transform_features(
self, features: dict[PipelineFeatureType, dict[str, PolicyFeature]]
) -> dict[PipelineFeatureType, dict[str, PolicyFeature]]:
@@ -103,7 +103,7 @@ Here we declare what PolicyFeatures we modify in this step, so we know what feat
Below is an example of how we aggregate and merge features in the phone to SO-100 record example:
```121:145:examples/phone_so100_record.py
```python
features=combine_feature_dicts(
# Run the feature contract of the pipelines
# This tells you how the features would look like after the pipeline steps
+34 -19
View File
@@ -38,6 +38,7 @@ docker run --rm -it \
start_rviz:=true start_sdk_server:=true mujoco:=true
```
> [!NOTE]
> If MuJoCo runs slowly (low simulation frequency), append `-e LD_LIBRARY_PATH="/opt/host-libs:$LD_LIBRARY_PATH" \` to the previous command to improve performance:
>
> ```
@@ -141,7 +142,7 @@ If you choose this option but still want to use the VR teleoperation application
First add reachy2 and reachy2_teleoperator to the imports of the record script. Then you can use the following command:
```bash
python -m lerobot.record \
lerobot-record \
--robot.type=reachy2 \
--robot.ip_address=192.168.0.200 \
--robot.id=r2-0000 \
@@ -150,6 +151,7 @@ python -m lerobot.record \
--teleop.type=reachy2_teleoperator \
--teleop.ip_address=192.168.0.200 \
--teleop.with_mobile_base=false \
--robot.with_torso_camera=true \
--dataset.repo_id=pollen_robotics/record_test \
--dataset.single_task="Reachy 2 recording test" \
--dataset.num_episodes=1 \
@@ -165,7 +167,7 @@ python -m lerobot.record \
**Extended setup overview (all options included):**
```bash
python -m lerobot.record \
lerobot-record \
--robot.type=reachy2 \
--robot.ip_address=192.168.0.200 \
--robot.use_external_commands=true \
@@ -177,6 +179,8 @@ python -m lerobot.record \
--robot.with_left_teleop_camera=true \
--robot.with_right_teleop_camera=true \
--robot.with_torso_camera=false \
--robot.camera_width=640 \
--robot.camera_height=480 \
--robot.disable_torque_on_disconnect=false \
--robot.max_relative_target=5.0 \
--teleop.type=reachy2_teleoperator \
@@ -212,9 +216,10 @@ Must be set to true if a compliant Reachy 2 is used to control another one.
From our initial tests, recording **all** joints when only some are moving can reduce model quality with certain policies.
To avoid this, you can exclude specific parts from recording and replay using:
````
```bash
--robot.with_<part>=false
```,
```
with `<part>` being one of : `mobile_base`, `l_arm`, `r_arm", `neck`, `antennas`.
It determine whether the corresponding part is recorded in the observations. True if not set.
@@ -222,49 +227,60 @@ By default, **all parts are recorded**.
The same per-part mechanism is available in `reachy2_teleoperator` as well.
````
```bash
--teleop.with\_<part>
```
with `<part>` being one of : `mobile_base`, `l_arm`, `r_arm", `neck`, `antennas`.
Determine whether the corresponding part is recorded in the actions. True if not set.
> **Important:** In a given session, the **enabled parts must match** on both the robot and the teleoperator.
For example, if the robot runs with `--robot.with_mobile_base=false`, the teleoperator must disable the same part `--teleoperator.with_mobile_base=false`.
> For example, if the robot runs with `--robot.with_mobile_base=false`, the teleoperator must disable the same part `--teleoperator.with_mobile_base=false`.
##### Use the relevant cameras
You can do the same for **cameras**. By default, only the **teleoperation cameras** are recorded (both `left_teleop_camera` and `right_teleop_camera`). Enable or disable each camera with:
You can do the same for **cameras**. Enable or disable each camera with default parameters using:
```bash
--robot.with_left_teleop_camera=<true|false> \
--robot.with_right_teleop_camera=<true|false> \
--robot.with_torso_camera=<true|false>
```
--robot.with_left_teleop_camera=<true|false>
--robot.with_right_teleop_camera=<true|false>
--robot.with_torso_camera=<true|false>
By default, no camera is recorded, all camera arguments are set to `false`.
If you want to, you can use custom `width` and `height` parameters for Reachy 2's cameras using the `--robot.camera_width` & `--robot.camera_height` argument:
````
```bash
--robot.camera_width=1920 \
--robot.camera_height=1080
```
This will change the resolution of all 3 default robot cameras (enabled by the above bool arguments).
If you want, you can add additional cameras other than the ones in the robot as usual with:
```bash
--robot.cameras="{ extra: {type: opencv, index_or_path: 42, width: 640, height: 480, fps: 30}}" \
```
## Step 2: Replay
Make sure the robot is configured with the same parts as the dataset:
```bash
python -m lerobot.replay \
lerobot-replay \
--robot.type=reachy2 \
--robot.ip_address=192.168.0.200 \
--robot.use_external_commands=false \
--robot.with_mobile_base=false \
--dataset.repo_id=pollen_robotics/record_test \
--dataset.episode=0
--display_data=true
````
```
## Step 3: Train
```bash
python -m lerobot.scripts.train \
lerobot-train \
--dataset.repo_id=pollen_robotics/record_test \
--policy.type=act \
--output_dir=outputs/train/reachy2_test \
@@ -277,10 +293,9 @@ python -m lerobot.scripts.train \
## Step 4: Evaluate
```bash
python -m lerobot.record \
lerobot-eval \
--robot.type=reachy2 \
--robot.ip_address=192.168.0.200 \
--display_data=false \
--dataset.repo_id=pollen_robotics/eval_record_test \
--dataset.single_task="Evaluate reachy2 policy" \
--dataset.num_episodes=10 \
+6
View File
@@ -4,6 +4,12 @@ SARM (Stage-Aware Reward Modeling) is a video-based reward modeling framework fo
**Paper**: [SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation](https://arxiv.org/abs/2509.25358)
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/lerobot-sarm.png"
alt="An overview of SARM"
width="80%"
/>
## Why Reward Models?
Standard behavior cloning treats all demonstration frames equally, but real-world robot datasets are messy. They contain hesitations, corrections, and variable-quality trajectories. Reward models solve this by learning a generalizable notion of **task progress** from demonstrations: given video frames and a task description, they predict how close the robot is to completing the task (0→1). This learned "progress signal" can be used in multiple ways, two promising applications are: (1) **weighted imitation learning** (RA-BC), where high-progress frames receive more weight during policy training, and (2) **reinforcement learning**, where the reward model provides dense rewards for online or offline policy improvement.
+4 -4
View File
@@ -103,7 +103,7 @@ lerobot-setup-motors \
<!-- prettier-ignore-start -->
```python
from lerobot.robots.so100_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
config = SO100FollowerConfig(
port="/dev/tty.usbmodem585A0076841",
@@ -177,7 +177,7 @@ lerobot-setup-motors \
<!-- prettier-ignore-start -->
```python
from lerobot.teleoperators.so100_leader import SO100Leader, SO100LeaderConfig
from lerobot.teleoperators.so_leader import SO100Leader, SO100LeaderConfig
config = SO100LeaderConfig(
port="/dev/tty.usbmodem585A0076841",
@@ -579,7 +579,7 @@ lerobot-calibrate \
<!-- prettier-ignore-start -->
```python
from lerobot.robots.so100_follower import SO100FollowerConfig, SO100Follower
from lerobot.robots.so_follower import SO100FollowerConfig, SO100Follower
config = SO100FollowerConfig(
port="/dev/tty.usbmodem585A0076891",
@@ -617,7 +617,7 @@ lerobot-calibrate \
<!-- prettier-ignore-start -->
```python
from lerobot.teleoperators.so100_leader import SO100LeaderConfig, SO100Leader
from lerobot.teleoperators.so_leader import SO100LeaderConfig, SO100Leader
config = SO100LeaderConfig(
port="/dev/tty.usbmodem58760431551",
+17 -4
View File
@@ -1,5 +1,18 @@
# SO-101
<div style="display: flex; align-items: center; gap: 10px;">
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/SO101_Follower.webp"
alt="SO-101"
width="60%"
/>
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/SO101_Leader.webp"
alt="SO-101"
width="60%"
/>
</div>
In the steps below, we explain how to assemble our flagship robot, the SO-101.
## Source the parts
@@ -125,7 +138,7 @@ lerobot-setup-motors \
<!-- prettier-ignore-start -->
```python
from lerobot.robots.so101_follower import SO101Follower, SO101FollowerConfig
from lerobot.robots.so_follower import SO101Follower, SO101FollowerConfig
config = SO101FollowerConfig(
port="/dev/tty.usbmodem585A0076841",
@@ -201,7 +214,7 @@ lerobot-setup-motors \
<!-- prettier-ignore-start -->
```python
from lerobot.teleoperators.so101_leader import SO101Leader, SO101LeaderConfig
from lerobot.teleoperators.so_leader import SO101Leader, SO101LeaderConfig
config = SO101LeaderConfig(
port="/dev/tty.usbmodem585A0076841",
@@ -364,7 +377,7 @@ lerobot-calibrate \
<!-- prettier-ignore-start -->
```python
from lerobot.robots.so101_follower import SO101FollowerConfig, SO101Follower
from lerobot.robots.so_follower import SO101FollowerConfig, SO101Follower
config = SO101FollowerConfig(
port="/dev/tty.usbmodem585A0076891",
@@ -413,7 +426,7 @@ lerobot-calibrate \
<!-- prettier-ignore-start -->
```python
from lerobot.teleoperators.so101_leader import SO101LeaderConfig, SO101Leader
from lerobot.teleoperators.so_leader import SO101LeaderConfig, SO101Leader
config = SO101LeaderConfig(
port="/dev/tty.usbmodem58760431551",
+86
View File
@@ -0,0 +1,86 @@
# Training-Time RTC
Training-Time RTC teaches the model to handle inference delay during training.
It feeds the **ground-truth action prefix** to the model and trains only on the remaining postfix actions.
This keeps chunk transitions smooth without doing any inference-time inpainting.
Based on: [Training-Time Action Conditioning for Efficient Real-Time Chunking](https://arxiv.org/abs/2512.05964).
LeRobot supports this for `pi0`, `pi05` and `smolvla` without changing model parameters.
---
## How It Works
### At Training Time
- Sample a delay `d` per batch element.
- Keep the first `d` action steps as **ground truth** (no noise).
- Add noise only to the postfix actions.
- Set the flow-matching timestep to **1.0** for prefix tokens and normal timesteps for postfix tokens.
- Mask the loss to only train on the postfix.
### At Inference Time
When `rtc_training_config.enabled=true`, the model uses training-time RTC inference:
- Replace prefix positions in `x_t` with previous chunk's leftover actions.
- Set timestep to **1.0** for prefix positions.
---
## Quick Start (CLI)
```bash
lerobot-train \
--policy.type=pi0 \
--dataset.repo_id=your/dataset \
--policy.rtc_training_config.enabled=true \
--policy.rtc_training_config.min_delay=0 \
--policy.rtc_training_config.max_delay=6 \
--policy.rtc_training_config.delay_distribution=UNIFORM
```
---
## Inference with Training-Time RTC
After training with `rtc_training_config`, use the same config at inference. The model will automatically use training-time RTC inference:
```python
policy = PI0Policy.from_pretrained("path/to/trained/model")
# rtc_training_config is loaded from the saved config
actions = policy.predict_action_chunk(
batch,
inference_delay=5, # estimated delay in timesteps
prev_chunk_left_over=previous_actions, # from previous chunk
)
```
---
## Key Parameters
`RTCTrainingConfig` is available on the policy config (`pi0`, `pi05`, `smolvla`, `xvla`):
- **`enabled`**: Toggle training-time RTC (both training and inference).
- **`min_delay` / `max_delay`**: Delay range (inclusive).
- **`delay_distribution`**:
- `UNIFORM`: uniform in `[min_delay, max_delay]`
- `EXP`: exponentially decayed distribution over delays
- **`exp_decay`**: Exponential decay factor for `EXP` sampling.
---
## Notes and Recommendations
- Start with `min_delay=0` and `max_delay` around your expected worst-case inference delay.
- Use `EXP` if you want more supervision on smaller delays.
---
## Related Docs
- [Real-Time Chunking (Inference-Time RTC)](./rtc)
- [Pi0](./pi0), [Pi0.5](./pi05), [SmolVLA](./smolvla)
+26 -31
View File
@@ -1,21 +1,21 @@
# Unitree G1 Robot Setup and Control
# Unitree G1
This guide covers the complete setup process for the Unitree G1 humanoid, from initial connection to running gr00t_wbc locomotion.
## About the Unitree G1
## About
We offer support for both 29 and 23 DOF G1. We introduce:
We support both 29 and 23 DOF G1 EDU version. We introduce:
- **`unitree g1` robot class, handling low level communication with the humanoid**
- **ZMQ socket bridge** for remote communication over WiFi, allowing one to deploy policies remotely instead of over ethernet or directly on the Orin
- **GR00T locomotion policy** for bipedal walking and balance
- **MuJoCo simulation mode** for testing policies without the physical robot
- **`unitree g1` robot class, handling low level read/write from/to the humanoid**
- **ZMQ socket bridge** for remote communication and camera streaming, allowing for remote policy deployment over wlan, eth or directly on the robot
- **Locomotion policies** from NVIDIA gr00t and Amazon FAR Holosoma
- **Simulation mode** for testing policies without the physical robot in mujoco
---
## Part 1: Connect to Robot over Ethernet
## Connection guide
### Step 1: Configure Your Computer's Ethernet Interface
### Step 1: Configure Ethernet Interface
Set a static IP on the same subnet as the robot:
@@ -26,7 +26,7 @@ sudo ip addr add 192.168.123.200/24 dev enp131s0
sudo ip link set enp131s0 up
```
**Note**: The robot's Ethernet IP is fixed at `192.168.123.164`. Your computer must use `192.168.123.x` where x ≠ 164.
**Note**: The G1's Ethernet IP is fixed at `192.168.123.164`. Your computer must use `192.168.123.x` with x ≠ 164.
### Step 2: SSH into the Robot
@@ -35,25 +35,24 @@ ssh unitree@192.168.123.164
# Password: 123
```
You should now be connected to the robot's onboard computer.
You should now be connected to the G1's Orin.
---
## Part 2: Enable WiFi on the Robot
Once connected via Ethernet, follow these steps to enable WiFi:
Wlan0 is disabled by default on the G1. To enable it:
### Step 1: Enable WiFi Hardware
```bash
# Unblock WiFi radio
sudo rfkill unblock wifi
sudo rfkill unblock all
# Bring up WiFi interface
# Bring up wlan0
sudo ip link set wlan0 up
# Enable NetworkManager control
# Enable NetworkManager control of wlan0
sudo nmcli radio wifi on
sudo nmcli device set wlan0 managed yes
sudo systemctl restart NetworkManager
@@ -73,7 +72,7 @@ sudo iptables -A FORWARD -i wlp132s0f0 -o enp131s0 -m state --state RELATED,ESTA
sudo iptables -A FORWARD -i enp131s0 -o wlp132s0f0 -j ACCEPT
```
**On the robot:**
**On the G1:**
```bash
# Add laptop as default gateway
@@ -111,7 +110,7 @@ ssh unitree@<YOUR_ROBOT_IP>
# Password: 123
```
Replace `<YOUR_ROBOT_IP>` with your robot's actual WiFi IP address (e.g., `172.18.129.215`).
Replace `<YOUR_ROBOT_IP>` with your robot's actual WiFi IP address.
---
@@ -147,9 +146,9 @@ python src/lerobot/robots/unitree_g1/run_g1_server.py
---
## Part 4: Running GR00T Locomotion
## Part 4: Controlling the robot
With the robot server running, you can now control the robot from your laptop.
With the robot server running, you can now control the robot remotely. Let's launch a locomotion policy
### Step 1: Install LeRobot on your machine
@@ -172,34 +171,30 @@ Edit the config file to match your robot's WiFi IP:
robot_ip: str = "<YOUR_ROBOT_IP>" # Replace with your robot's WiFi IP.
```
**Note**: When running directly on the G1 (not remotely), set `robot_ip: str = "127.0.0.1"` instead.
### Step 3: Run the Locomotion Policy
```bash
# Run GR00T locomotion controller
python examples/unitree_g1/gr00t_locomotion.py --repo-id "nepyope/GR00T-WholeBodyControl_g1"
# Run Holosoma locomotion controller
python examples/unitree_g1/holosoma_locomotion.py
```
### Step 4: Control with Remote
- **Left stick**: Forward/backward and left/right movement
- **Right stick**: Rotation
- **R1 button**: Raise waist height
- **R2 button**: Lower waist height
Press `Ctrl+C` to stop the policy.
---
## Extra: Running in Simulation Mode (MuJoCo)
## Running in Simulation Mode (MuJoCo)
You can now test and develop policies without a physical robot using MuJoCo. to do so set `is_simulation=True` in config.
You can now test policies before unleashing them on the physical robot using MuJoCo. To do so simply set `is_simulation=True` in config.
## Additional Resources
- [Unitree SDK Documentation](https://github.com/unitreerobotics/unitree_sdk2_python)
- [GR00T Policy Repository](https://huggingface.co/nepyope/GR00T-WholeBodyControl_g1)
- [GR00T-WholeBodyControl](https://github.com/NVlabs/GR00T-WholeBodyControl)
- [Holosoma](https://github.com/amazon-far/holosoma)
- [LeRobot Documentation](https://github.com/huggingface/lerobot)
- [Unitree_IL_Lerobot](https://github.com/unitreerobotics/unitree_IL_lerobot)
+13 -6
View File
@@ -95,26 +95,26 @@ Convert an image-based dataset to video format, creating a new LeRobotDataset wh
# Local-only: Save to a custom output directory (no hub push)
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--operation.type convert_to_video \
--operation.type convert_image_to_video \
--operation.output_dir /path/to/output/pusht_video
# Save with new repo_id (local storage)
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--new_repo_id lerobot/pusht_video \
--operation.type convert_to_video
--operation.type convert_image_to_video
# Convert and push to Hugging Face Hub
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--new_repo_id lerobot/pusht_video \
--operation.type convert_to_video \
--operation.type convert_image_to_video \
--push_to_hub true
# Convert with custom video codec and quality settings
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--operation.type convert_to_video \
--operation.type convert_image_to_video \
--operation.output_dir outputs/pusht_video \
--operation.vcodec libsvtav1 \
--operation.pix_fmt yuv420p \
@@ -124,16 +124,23 @@ lerobot-edit-dataset \
# Convert only specific episodes
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--operation.type convert_to_video \
--operation.type convert_image_to_video \
--operation.output_dir outputs/pusht_video \
--operation.episode_indices "[0, 1, 2, 5, 10]"
# Convert with multiple workers for parallel processing
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--operation.type convert_to_video \
--operation.type convert_image_to_video \
--operation.output_dir outputs/pusht_video \
--operation.num_workers 8
# For memory-constrained systems, users can now specify limits:
lerobot-edit-dataset \
--repo_id lerobot/pusht_image \
--operation.type convert_to_video \
--operation.max_episodes_per_batch 50 \
--operation.max_frames_per_batch 10000
```
**Parameters:**
+6
View File
@@ -8,6 +8,12 @@ X Square Robots WALL-OSS is now integrated into Hugging Faces LeRobot ecos
The WALL-OSS team is building the embodied foundation model to capture and compress the world's most valuable data: the continuous, high-fidelity stream of physical interaction. By creating a direct feedback loop between the model's decisions and the body's lived experience, the emergence of a truly generalizable intelligence is enabled—one that understands not just how the world works, but how to act effectively within it.
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lerobot/walloss-lerobot-paper.png"
alt="An overview of WALL-OSS"
width="85%"
/>
Technically, WALL-OSS introduces a tightly coupled multimodal architecture (tightly-coupled MoE structure) that integrates both discrete and continuous action modeling strategies. Through a two-stage training pipeline (Inspiration → Integration), the model gradually unifies semantic reasoning and high-frequency action generation. Its core innovations include:
- **Embodied perceptionenhanced multimodal pretraining**: Large-scale training on unified visionlanguageaction data to strengthen spatial, causal, and manipulation understanding.
+2 -3
View File
@@ -41,8 +41,7 @@ from lerobot.robots import ( # noqa: F401
RobotConfig,
koch_follower,
make_robot_from_config,
so100_follower,
so101_follower,
so_follower,
)
from lerobot.utils.constants import ACTION
from lerobot.utils.robot_utils import precise_sleep
@@ -97,7 +96,7 @@ def replay(cfg: ReplayConfig):
robot.send_action(action)
dt_s = time.perf_counter() - start_episode_t
precise_sleep(1 / dataset.fps - dt_s)
precise_sleep(max(1 / dataset.fps - dt_s, 0.0))
robot.disconnect()
+1 -1
View File
@@ -21,7 +21,7 @@ from lerobot.robots.lekiwi.config_lekiwi import LeKiwiClientConfig
from lerobot.robots.lekiwi.lekiwi_client import LeKiwiClient
from lerobot.scripts.lerobot_record import record_loop
from lerobot.teleoperators.keyboard import KeyboardTeleop, KeyboardTeleopConfig
from lerobot.teleoperators.so100_leader import SO100Leader, SO100LeaderConfig
from lerobot.teleoperators.so_leader import SO100Leader, SO100LeaderConfig
from lerobot.utils.constants import ACTION, OBS_STR
from lerobot.utils.control_utils import init_keyboard_listener
from lerobot.utils.utils import log_say
+1 -1
View File
@@ -18,7 +18,7 @@ import time
from lerobot.robots.lekiwi import LeKiwiClient, LeKiwiClientConfig
from lerobot.teleoperators.keyboard.teleop_keyboard import KeyboardTeleop, KeyboardTeleopConfig
from lerobot.teleoperators.so100_leader import SO100Leader, SO100LeaderConfig
from lerobot.teleoperators.so_leader import SO100Leader, SO100LeaderConfig
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
+2 -3
View File
@@ -34,12 +34,11 @@ from lerobot.processor.converters import (
transition_to_observation,
transition_to_robot_action,
)
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.robot_kinematic_processor import (
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.scripts.lerobot_record import record_loop
from lerobot.utils.control_utils import init_keyboard_listener
from lerobot.utils.utils import log_say
+2 -3
View File
@@ -26,15 +26,14 @@ from lerobot.processor.converters import (
transition_to_observation,
transition_to_robot_action,
)
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.robot_kinematic_processor import (
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
EEBoundsAndSafety,
EEReferenceAndDelta,
ForwardKinematicsJointsToEE,
GripperVelocityToJoint,
InverseKinematicsEEToJoints,
)
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.scripts.lerobot_record import record_loop
from lerobot.teleoperators.phone.config_phone import PhoneConfig, PhoneOS
from lerobot.teleoperators.phone.phone_processor import MapPhoneActionToRobotAction
+3 -4
View File
@@ -23,11 +23,10 @@ from lerobot.processor.converters import (
robot_action_observation_to_transition,
transition_to_robot_action,
)
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.robot_kinematic_processor import (
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
InverseKinematicsEEToJoints,
)
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.utils.constants import ACTION
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say
@@ -96,7 +95,7 @@ def main():
# Send action to robot
_ = robot.send_action(joint_action)
precise_sleep(1.0 / dataset.fps - (time.perf_counter() - t0))
precise_sleep(max(1.0 / dataset.fps - (time.perf_counter() - t0), 0.0))
# Clean up
robot.disconnect()
+2 -3
View File
@@ -21,14 +21,13 @@ from lerobot.processor.converters import (
robot_action_observation_to_transition,
transition_to_robot_action,
)
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.robot_kinematic_processor import (
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
EEBoundsAndSafety,
EEReferenceAndDelta,
GripperVelocityToJoint,
InverseKinematicsEEToJoints,
)
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.teleoperators.phone.config_phone import PhoneConfig, PhoneOS
from lerobot.teleoperators.phone.phone_processor import MapPhoneActionToRobotAction
from lerobot.teleoperators.phone.teleop_phone import Phone
+14 -3
View File
@@ -94,9 +94,9 @@ from lerobot.rl.process import ProcessSignalHandler
from lerobot.robots import ( # noqa: F401
Robot,
RobotConfig,
bi_so_follower,
koch_follower,
so100_follower,
so101_follower,
so_follower,
)
from lerobot.robots.utils import make_robot_from_config
from lerobot.utils.constants import OBS_IMAGES
@@ -455,7 +455,18 @@ def demo_cli(cfg: RTCDemoConfig):
if cfg.policy.type == "pi05" or cfg.policy.type == "pi0":
config.compile_model = cfg.use_torch_compile
policy = policy_class.from_pretrained(cfg.policy.pretrained_path, config=config)
if config.use_peft:
from peft import PeftConfig, PeftModel
peft_pretrained_path = cfg.policy.pretrained_path
peft_config = PeftConfig.from_pretrained(peft_pretrained_path)
policy = policy_class.from_pretrained(
pretrained_name_or_path=peft_config.base_model_name_or_path, config=config
)
policy = PeftModel.from_pretrained(policy, peft_pretrained_path, config=peft_config)
else:
policy = policy_class.from_pretrained(cfg.policy.pretrained_path, config=config)
# Turn on RTC
policy.config.rtc_config = cfg.rtc
+2 -3
View File
@@ -34,12 +34,11 @@ from lerobot.processor.converters import (
transition_to_observation,
transition_to_robot_action,
)
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.robot_kinematic_processor import (
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.scripts.lerobot_record import record_loop
from lerobot.utils.control_utils import init_keyboard_listener
from lerobot.utils.utils import log_say
+3 -5
View File
@@ -27,16 +27,14 @@ from lerobot.processor.converters import (
transition_to_observation,
transition_to_robot_action,
)
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.robot_kinematic_processor import (
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
EEBoundsAndSafety,
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.scripts.lerobot_record import record_loop
from lerobot.teleoperators.so100_leader.config_so100_leader import SO100LeaderConfig
from lerobot.teleoperators.so100_leader.so100_leader import SO100Leader
from lerobot.teleoperators.so_leader import SO100Leader, SO100LeaderConfig
from lerobot.utils.control_utils import init_keyboard_listener
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun
+3 -4
View File
@@ -24,11 +24,10 @@ from lerobot.processor.converters import (
robot_action_observation_to_transition,
transition_to_robot_action,
)
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.robot_kinematic_processor import (
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
InverseKinematicsEEToJoints,
)
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.utils.constants import ACTION
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import log_say
@@ -97,7 +96,7 @@ def main():
# Send action to robot
_ = robot.send_action(joint_action)
precise_sleep(1.0 / dataset.fps - (time.perf_counter() - t0))
precise_sleep(max(1.0 / dataset.fps - (time.perf_counter() - t0), 0.0))
# Clean up
robot.disconnect()
+3 -5
View File
@@ -23,15 +23,13 @@ from lerobot.processor.converters import (
robot_action_to_transition,
transition_to_robot_action,
)
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.robot_kinematic_processor import (
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
from lerobot.robots.so_follower.robot_kinematic_processor import (
EEBoundsAndSafety,
ForwardKinematicsJointsToEE,
InverseKinematicsEEToJoints,
)
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.teleoperators.so100_leader.config_so100_leader import SO100LeaderConfig
from lerobot.teleoperators.so100_leader.so100_leader import SO100Leader
from lerobot.teleoperators.so_leader import SO100Leader, SO100LeaderConfig
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data
+1 -2
View File
@@ -5,8 +5,7 @@ from lerobot.datasets.lerobot_dataset import LeRobotDatasetMetadata
from lerobot.policies.act.modeling_act import ACTPolicy
from lerobot.policies.factory import make_pre_post_processors
from lerobot.policies.utils import build_inference_frame, make_robot_action
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
MAX_EPISODES = 5
MAX_STEPS_PER_EPISODE = 20
+2 -1
View File
@@ -4,7 +4,7 @@ from lerobot.async_inference.configs import RobotClientConfig
from lerobot.async_inference.helpers import visualize_action_queue_size
from lerobot.async_inference.robot_client import RobotClient
from lerobot.cameras.opencv.configuration_opencv import OpenCVCameraConfig
from lerobot.robots.so100_follower import SO100FollowerConfig
from lerobot.robots.so_follower import SO100FollowerConfig
def main():
@@ -30,6 +30,7 @@ def main():
robot=robot_cfg,
server_address=server_address,
policy_device="mps",
client_device="cpu",
policy_type="act",
pretrained_name_or_path="<user>/robot_learning_tutorial_act",
chunk_size_threshold=0.5, # g
@@ -5,8 +5,7 @@ from lerobot.datasets.lerobot_dataset import LeRobotDatasetMetadata
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.policies.factory import make_pre_post_processors
from lerobot.policies.utils import build_inference_frame, make_robot_action
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
MAX_EPISODES = 5
MAX_STEPS_PER_EPISODE = 20
+1 -2
View File
@@ -5,8 +5,7 @@ from lerobot.datasets.utils import hw_to_dataset_features
from lerobot.policies.factory import make_pre_post_processors
from lerobot.policies.pi0.modeling_pi0 import PI0Policy
from lerobot.policies.utils import build_inference_frame, make_robot_action
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
MAX_EPISODES = 5
MAX_STEPS_PER_EPISODE = 20
+2 -2
View File
@@ -14,8 +14,8 @@ from lerobot.policies.sac.modeling_sac import SACPolicy
from lerobot.policies.sac.reward_model.modeling_classifier import Classifier
from lerobot.rl.buffer import ReplayBuffer
from lerobot.rl.gym_manipulator import make_robot_env
from lerobot.robots.so100_follower import SO100FollowerConfig
from lerobot.teleoperators.so100_leader import SO100LeaderConfig
from lerobot.robots.so_follower import SO100FollowerConfig
from lerobot.teleoperators.so_leader import SO100LeaderConfig
from lerobot.teleoperators.utils import TeleopEvents
LOG_EVERY = 10
@@ -5,8 +5,7 @@ from lerobot.datasets.utils import hw_to_dataset_features
from lerobot.policies.factory import make_pre_post_processors
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.utils import build_inference_frame, make_robot_action
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.robots.so_follower import SO100Follower, SO100FollowerConfig
MAX_EPISODES = 5
MAX_STEPS_PER_EPISODE = 20
+99 -188
View File
@@ -13,16 +13,9 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Example: GR00T Locomotion with Pre-loaded Policies
This example demonstrates the NEW pattern for loading GR00T policies externally
and passing them to the robot class.
"""
import argparse
import logging
import threading
import time
from collections import deque
@@ -31,24 +24,26 @@ import onnxruntime as ort
from huggingface_hub import hf_hub_download
from lerobot.robots.unitree_g1.config_unitree_g1 import UnitreeG1Config
from lerobot.robots.unitree_g1.g1_utils import G1_29_JointIndex
from lerobot.robots.unitree_g1.unitree_g1 import UnitreeG1
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
GROOT_DEFAULT_ANGLES = np.zeros(29, dtype=np.float32)
GROOT_DEFAULT_ANGLES[[0, 6]] = -0.1 # hip pitch
GROOT_DEFAULT_ANGLES[[3, 9]] = 0.3 # knee
GROOT_DEFAULT_ANGLES[[4, 10]] = -0.2 # ankle pitch
GROOT_DEFAULT_ANGLES[[0, 6]] = -0.1 # Hip pitch
GROOT_DEFAULT_ANGLES[[3, 9]] = 0.3 # Knee
GROOT_DEFAULT_ANGLES[[4, 10]] = -0.2 # Ankle pitch
MISSING_JOINTS = []
G1_MODEL = "g1_23" # or "g1_29"
G1_MODEL = "g1_23" # Or "g1_29"
if G1_MODEL == "g1_23":
MISSING_JOINTS = [12, 14, 20, 21, 27, 28] # waist yaw/pitch, wrist pitch/yaw
LOCOMOTION_ACTION_SCALE = 0.25
LOCOMOTION_CONTROL_DT = 0.02
MISSING_JOINTS = [12, 14, 20, 21, 27, 28] # Waist yaw/pitch, wrist pitch/yaw
# Control parameters
ACTION_SCALE = 0.25
CONTROL_DT = 0.02 # 50Hz
ANG_VEL_SCALE: float = 0.25
DOF_POS_SCALE: float = 1.0
DOF_VEL_SCALE: float = 0.05
@@ -61,12 +56,12 @@ DEFAULT_GROOT_REPO_ID = "nepyope/GR00T-WholeBodyControl_g1"
def load_groot_policies(
repo_id: str = DEFAULT_GROOT_REPO_ID,
) -> tuple[ort.InferenceSession, ort.InferenceSession]:
"""Load GR00T dual-policy system (Balance + Walk) from Hugging Face Hub.
"""Load GR00T dual-policy system (Balance + Walk) from the hub.
Args:
repo_id: Hugging Face Hub repository ID containing the ONNX policies.
"""
logger.info(f"Loading GR00T dual-policy system from Hugging Face Hub ({repo_id})...")
logger.info(f"Loading GR00T dual-policy system from the hub ({repo_id})...")
# Download ONNX policies from Hugging Face Hub
balance_path = hf_hub_download(
@@ -88,15 +83,7 @@ def load_groot_policies(
class GrootLocomotionController:
"""
Handles GR00T-style locomotion control for the Unitree G1 robot.
This controller manages:
- Dual-policy system (Balance + Walk)
- 29-joint observation processing
- 15D action output (legs + waist)
- Policy inference and motor command generation
"""
"""GR00T lower-body locomotion controller for the Unitree G1."""
def __init__(self, policy_balance, policy_walk, robot, config):
self.policy_balance = policy_balance
@@ -104,9 +91,9 @@ class GrootLocomotionController:
self.robot = robot
self.config = config
self.locomotion_cmd = np.array([0.0, 0.0, 0.0], dtype=np.float32) # vx, vy, theta_dot
self.cmd = np.array([0.0, 0.0, 0.0], dtype=np.float32) # vx, vy, theta_dot
# GR00T-specific state
# Robot state
self.groot_qj_all = np.zeros(29, dtype=np.float32)
self.groot_dqj_all = np.zeros(29, dtype=np.float32)
self.groot_action = np.zeros(15, dtype=np.float32)
@@ -116,47 +103,39 @@ class GrootLocomotionController:
self.groot_height_cmd = 0.74 # Default base height
self.groot_orientation_cmd = np.array([0.0, 0.0, 0.0], dtype=np.float32)
# input to gr00t is 6 frames (6*86D=516)
# Input to GR00T is 6 frames (6*86D=516)
for _ in range(6):
self.groot_obs_history.append(np.zeros(86, dtype=np.float32))
# Thread management
self.locomotion_running = False
self.locomotion_thread = None
logger.info("GrootLocomotionController initialized")
def groot_locomotion_run(self):
# get current observation
robot_state = self.robot.get_observation()
def run_step(self):
# Get current observation
obs = self.robot.get_observation()
if robot_state is None:
if not obs:
return
# get command from remote controller
if robot_state.wireless_remote is not None:
self.robot.remote_controller.set(robot_state.wireless_remote)
if self.robot.remote_controller.button[0]: # R1 - raise waist
self.groot_height_cmd += 0.001
self.groot_height_cmd = np.clip(self.groot_height_cmd, 0.50, 1.00)
if self.robot.remote_controller.button[4]: # R2 - lower waist
self.groot_height_cmd -= 0.001
self.groot_height_cmd = np.clip(self.groot_height_cmd, 0.50, 1.00)
else:
self.robot.remote_controller.lx = 0.0
self.robot.remote_controller.ly = 0.0
self.robot.remote_controller.rx = 0.0
self.robot.remote_controller.ry = 0.0
# Get command from remote controller
if obs["remote.buttons"][0]: # R1 - raise waist
self.groot_height_cmd += 0.001
self.groot_height_cmd = np.clip(self.groot_height_cmd, 0.50, 1.00)
if obs["remote.buttons"][4]: # R2 - lower waist
self.groot_height_cmd -= 0.001
self.groot_height_cmd = np.clip(self.groot_height_cmd, 0.50, 1.00)
self.locomotion_cmd[0] = self.robot.remote_controller.ly # forward/backward
self.locomotion_cmd[1] = self.robot.remote_controller.lx * -1 # left/right
self.locomotion_cmd[2] = self.robot.remote_controller.rx * -1 # rotation rate
self.cmd[0] = obs["remote.ly"] # Forward/backward
self.cmd[1] = obs["remote.lx"] * -1 # Left/right
self.cmd[2] = obs["remote.rx"] * -1 # Rotation rate
for i in range(29):
self.groot_qj_all[i] = robot_state.motor_state[i].q
self.groot_dqj_all[i] = robot_state.motor_state[i].dq
# Get joint positions and velocities from flat dict
for motor in G1_29_JointIndex:
name = motor.name
idx = motor.value
self.groot_qj_all[idx] = obs[f"{name}.q"]
self.groot_dqj_all[idx] = obs[f"{name}.dq"]
# adapt observation for g1_23dof
# Adapt observation for g1_23dof
for idx in MISSING_JOINTS:
self.groot_qj_all[idx] = 0.0
self.groot_dqj_all[idx] = 0.0
@@ -165,18 +144,18 @@ class GrootLocomotionController:
qj_obs = self.groot_qj_all.copy()
dqj_obs = self.groot_dqj_all.copy()
# express imu data in gravity frame of reference
quat = robot_state.imu_state.quaternion
ang_vel = np.array(robot_state.imu_state.gyroscope, dtype=np.float32)
# Express IMU data in gravity frame of reference
quat = [obs["imu.quat.w"], obs["imu.quat.x"], obs["imu.quat.y"], obs["imu.quat.z"]]
ang_vel = np.array([obs["imu.gyro.x"], obs["imu.gyro.y"], obs["imu.gyro.z"]], dtype=np.float32)
gravity_orientation = self.robot.get_gravity_orientation(quat)
# scale joint positions and velocities before policy inference
# Scale joint positions and velocities before policy inference
qj_obs = (qj_obs - GROOT_DEFAULT_ANGLES) * DOF_POS_SCALE
dqj_obs = dqj_obs * DOF_VEL_SCALE
ang_vel_scaled = ang_vel * ANG_VEL_SCALE
# build single frame observation
self.groot_obs_single[:3] = self.locomotion_cmd * np.array(CMD_SCALE)
# Build single frame observation
self.groot_obs_single[:3] = self.cmd * np.array(CMD_SCALE)
self.groot_obs_single[3] = self.groot_height_cmd
self.groot_obs_single[4:7] = self.groot_orientation_cmd
self.groot_obs_single[7:10] = ang_vel_scaled
@@ -194,113 +173,76 @@ class GrootLocomotionController:
end_idx = start_idx + 86
self.groot_obs_stacked[start_idx:end_idx] = obs_frame
# Run policy inference (ONNX) with 516D stacked observation
cmd_magnitude = np.linalg.norm(self.locomotion_cmd)
cmd_magnitude = np.linalg.norm(self.cmd)
selected_policy = (
self.policy_balance if cmd_magnitude < 0.05 else self.policy_walk
) # balance/standing policy for small commands, walking policy for movement commands
) # Balance/standing policy for small commands, walking policy for movement commands
# run policy inference
# Run policy inference
ort_inputs = {selected_policy.get_inputs()[0].name: np.expand_dims(self.groot_obs_stacked, axis=0)}
ort_outs = selected_policy.run(None, ort_inputs)
self.groot_action = ort_outs[0].squeeze()
# transform action back to target joint positions
target_dof_pos_15 = GROOT_DEFAULT_ANGLES[:15] + self.groot_action * LOCOMOTION_ACTION_SCALE
# Transform action back to target joint positions
target_dof_pos_15 = GROOT_DEFAULT_ANGLES[:15] + self.groot_action * ACTION_SCALE
# command motors
# Build action dict (only first 15 joints for GR00T)
action_dict = {}
for i in range(15):
motor_idx = i
self.robot.msg.motor_cmd[motor_idx].q = target_dof_pos_15[i]
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = self.robot.kp[motor_idx]
self.robot.msg.motor_cmd[motor_idx].kd = self.robot.kd[motor_idx]
self.robot.msg.motor_cmd[motor_idx].tau = 0
motor_name = G1_29_JointIndex(i).name
action_dict[f"{motor_name}.q"] = float(target_dof_pos_15[i])
# adapt action for g1_23dof
# Zero out missing joints for g1_23dof
for joint_idx in MISSING_JOINTS:
self.robot.msg.motor_cmd[joint_idx].q = 0.0
self.robot.msg.motor_cmd[joint_idx].qd = 0
self.robot.msg.motor_cmd[joint_idx].kp = self.robot.kp[joint_idx]
self.robot.msg.motor_cmd[joint_idx].kd = self.robot.kd[joint_idx]
self.robot.msg.motor_cmd[joint_idx].tau = 0
motor_name = G1_29_JointIndex(joint_idx).name
action_dict[f"{motor_name}.q"] = 0.0
# send action to robot
self.robot.send_action(self.robot.msg)
# Send action to robot
self.robot.send_action(action_dict)
def _locomotion_thread_loop(self):
"""Background thread that runs the locomotion policy at specified rate."""
logger.info("Locomotion thread started")
while self.locomotion_running:
def run(repo_id: str = DEFAULT_GROOT_REPO_ID) -> None:
"""Main function to run the GR00T locomotion controller.
Args:
repo_id: Hugging Face Hub repository ID for GR00T policies.
"""
# Load policies
policy_balance, policy_walk = load_groot_policies(repo_id=repo_id)
# Initialize robot
config = UnitreeG1Config()
robot = UnitreeG1(config)
robot.connect()
# Initialize gr00T locomotion controller
groot_controller = GrootLocomotionController(
policy_balance=policy_balance,
policy_walk=policy_walk,
robot=robot,
config=config,
)
try:
robot.reset(CONTROL_DT, GROOT_DEFAULT_ANGLES)
logger.info("Use joystick: LY=fwd/back, LX=left/right, RX=rotate, R1=raise waist, R2=lower waist")
logger.info("Press Ctrl+C to stop")
# Run step
while not robot._shutdown_event.is_set():
start_time = time.time()
try:
self.groot_locomotion_run()
except Exception as e:
logger.error(f"Error in locomotion loop: {e}")
# Sleep to maintain control rate
groot_controller.run_step()
elapsed = time.time() - start_time
sleep_time = max(0, LOCOMOTION_CONTROL_DT - elapsed)
sleep_time = max(0, CONTROL_DT - elapsed)
time.sleep(sleep_time)
logger.info("Locomotion thread stopped")
def start_locomotion_thread(self):
if self.locomotion_running:
logger.warning("Locomotion thread already running")
return
logger.info("Starting locomotion control thread...")
self.locomotion_running = True
self.locomotion_thread = threading.Thread(target=self._locomotion_thread_loop, daemon=True)
self.locomotion_thread.start()
logger.info("Locomotion control thread started!")
def stop_locomotion_thread(self):
if not self.locomotion_running:
return
logger.info("Stopping locomotion control thread...")
self.locomotion_running = False
if self.locomotion_thread:
self.locomotion_thread.join(timeout=2.0)
logger.info("Locomotion control thread stopped")
def reset_robot(self):
"""Move robot legs to default standing position over 2 seconds (arms are not moved)."""
total_time = 3.0
num_step = int(total_time / self.robot.control_dt)
# Only control legs, not arms (first 12 joints)
default_pos = GROOT_DEFAULT_ANGLES # First 12 values are leg angles
dof_size = len(default_pos)
# Get current lowstate
robot_state = self.robot.get_observation()
# Record the current leg positions
init_dof_pos = np.zeros(dof_size, dtype=np.float32)
for i in range(dof_size):
init_dof_pos[i] = robot_state.motor_state[i].q
# Move legs to default pos
for i in range(num_step):
alpha = i / num_step
for motor_idx in range(dof_size):
target_pos = default_pos[motor_idx]
self.robot.msg.motor_cmd[motor_idx].q = (
init_dof_pos[motor_idx] * (1 - alpha) + target_pos * alpha
)
self.robot.msg.motor_cmd[motor_idx].qd = 0
self.robot.msg.motor_cmd[motor_idx].kp = self.robot.kp[motor_idx]
self.robot.msg.motor_cmd[motor_idx].kd = self.robot.kd[motor_idx]
self.robot.msg.motor_cmd[motor_idx].tau = 0
self.robot.msg.crc = self.robot.crc.Crc(self.robot.msg)
self.robot.lowcmd_publisher.Write(self.robot.msg)
time.sleep(self.robot.control_dt)
logger.info("Reached default position (legs only)")
except KeyboardInterrupt:
logger.info("Stopping locomotion...")
finally:
if robot.is_connected:
robot.disconnect()
logger.info("Done!")
if __name__ == "__main__":
@@ -313,35 +255,4 @@ if __name__ == "__main__":
)
args = parser.parse_args()
# load policies
policy_balance, policy_walk = load_groot_policies(repo_id=args.repo_id)
# initialize robot
config = UnitreeG1Config()
robot = UnitreeG1(config)
# initialize gr00t locomotion controller
groot_controller = GrootLocomotionController(
policy_balance=policy_balance,
policy_walk=policy_walk,
robot=robot,
config=config,
)
# reset legs and start locomotion thread
try:
groot_controller.reset_robot()
groot_controller.start_locomotion_thread()
# log status
logger.info("Robot initialized with GR00T locomotion policies")
logger.info("Locomotion controller running in background thread")
logger.info("Press Ctrl+C to stop")
# keep robot alive
while True:
time.sleep(1.0)
except KeyboardInterrupt:
print("\nStopping locomotion...")
groot_controller.stop_locomotion_thread()
print("Done!")
run(repo_id=args.repo_id)
+264
View File
@@ -0,0 +1,264 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import json
import logging
import time
import numpy as np
import onnx
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from lerobot.robots.unitree_g1.config_unitree_g1 import UnitreeG1Config
from lerobot.robots.unitree_g1.g1_utils import G1_29_JointIndex
from lerobot.robots.unitree_g1.unitree_g1 import UnitreeG1
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
DEFAULT_ANGLES = np.zeros(29, dtype=np.float32)
DEFAULT_ANGLES[[0, 6]] = -0.312 # Hip pitch
DEFAULT_ANGLES[[3, 9]] = 0.669 # Knee
DEFAULT_ANGLES[[4, 10]] = -0.363 # Ankle pitch
DEFAULT_ANGLES[[15, 22]] = 0.2 # Shoulder pitch
DEFAULT_ANGLES[16] = 0.2 # Left shoulder roll
DEFAULT_ANGLES[23] = -0.2 # Right shoulder roll
DEFAULT_ANGLES[[18, 25]] = 0.6 # Elbow
MISSING_JOINTS = []
G1_MODEL = "g1_23" # Or "g1_29"
if G1_MODEL == "g1_23":
MISSING_JOINTS = [12, 14, 20, 21, 27, 28] # Waist yaw/pitch, wrist pitch/yaw
# Control parameters
ACTION_SCALE = 0.25
CONTROL_DT = 0.02 # 50Hz
ANG_VEL_SCALE = 0.25
DOF_POS_SCALE = 1.0
DOF_VEL_SCALE = 0.05
GAIT_PERIOD = 1.0
DEFAULT_HOLOSOMA_REPO_ID = "nepyope/holosoma_locomotion"
# Policy filename mapping
POLICY_FILES = {
"fastsac": "fastsac_g1_29dof.onnx",
"ppo": "ppo_g1_29dof.onnx",
}
def load_policy(
repo_id: str = DEFAULT_HOLOSOMA_REPO_ID,
policy_type: str = "fastsac",
) -> tuple[ort.InferenceSession, np.ndarray, np.ndarray]:
"""Load Holosoma locomotion policy and extract KP/KD from metadata.
Args:
repo_id: Hugging Face Hub repo ID
policy_type: Either "fastsac" (default) or "ppo"
Returns:
(policy, kp, kd) tuple
"""
if policy_type not in POLICY_FILES:
raise ValueError(f"Unknown policy type: {policy_type}. Choose from: {list(POLICY_FILES.keys())}")
filename = POLICY_FILES[policy_type]
logger.info(f"Loading {policy_type.upper()} policy from: {repo_id}/{filename}")
policy_path = hf_hub_download(repo_id=repo_id, filename=filename)
policy = ort.InferenceSession(policy_path)
logger.info(f"Policy loaded: {policy.get_inputs()[0].shape}{policy.get_outputs()[0].shape}")
# Extract KP/KD from ONNX metadata
model = onnx.load(policy_path)
metadata = {prop.key: prop.value for prop in model.metadata_props}
if "kp" not in metadata or "kd" not in metadata:
raise ValueError("ONNX model must contain 'kp' and 'kd' in metadata")
kp = np.array(json.loads(metadata["kp"]), dtype=np.float32)
kd = np.array(json.loads(metadata["kd"]), dtype=np.float32)
logger.info(f"Loaded KP/KD from ONNX ({len(kp)} joints)")
return policy, kp, kd
class HolosomaLocomotionController:
"""Holosoma whole-body locomotion controller for Unitree G1."""
def __init__(self, policy, robot, kp: np.ndarray, kd: np.ndarray):
self.policy = policy
self.robot = robot
# Override robot's PD gains with policy gains
self.robot.kp = kp
self.robot.kd = kd
self.cmd = np.zeros(3, dtype=np.float32)
# Robot state
self.qj = np.zeros(29, dtype=np.float32)
self.dqj = np.zeros(29, dtype=np.float32)
self.obs = np.zeros(100, dtype=np.float32)
self.last_action = np.zeros(29, dtype=np.float32)
# Gait phase
self.phase = np.array([[0.0, np.pi]], dtype=np.float32)
self.phase_dt = 2 * np.pi / ((1.0 / CONTROL_DT) * GAIT_PERIOD)
self.is_standing = True
def run_step(self):
# Get current observation
obs = self.robot.get_observation()
if not obs:
return
# Get command from remote controller
ly = obs["remote.ly"] if abs(obs["remote.ly"]) > 0.1 else 0.0
lx = obs["remote.lx"] if abs(obs["remote.lx"]) > 0.1 else 0.0
rx = obs["remote.rx"] if abs(obs["remote.rx"]) > 0.1 else 0.0
self.cmd[:] = [ly, -lx, -rx]
# Get joint positions and velocities
for motor in G1_29_JointIndex:
name = motor.name
idx = motor.value
self.qj[idx] = obs[f"{name}.q"]
self.dqj[idx] = obs[f"{name}.dq"]
# Adapt observation for g1_23dof
for idx in MISSING_JOINTS:
self.qj[idx] = 0.0
self.dqj[idx] = 0.0
# Express IMU data in gravity frame of reference
quat = [obs["imu.quat.w"], obs["imu.quat.x"], obs["imu.quat.y"], obs["imu.quat.z"]]
ang_vel = np.array([obs["imu.gyro.x"], obs["imu.gyro.y"], obs["imu.gyro.z"]], dtype=np.float32)
gravity = self.robot.get_gravity_orientation(quat)
# Scale joint positions and velocities before policy inference
qj_obs = (self.qj - DEFAULT_ANGLES) * DOF_POS_SCALE
dqj_obs = self.dqj * DOF_VEL_SCALE
ang_vel_s = ang_vel * ANG_VEL_SCALE
# Update gait phase
if np.linalg.norm(self.cmd[:2]) < 0.01 and abs(self.cmd[2]) < 0.01:
self.phase[0, :] = np.pi
self.is_standing = True
elif self.is_standing:
self.phase = np.array([[0.0, np.pi]], dtype=np.float32)
self.is_standing = False
else:
self.phase = np.fmod(self.phase + self.phase_dt + np.pi, 2 * np.pi) - np.pi
sin_ph = np.sin(self.phase[0])
cos_ph = np.cos(self.phase[0])
# Build observations
self.obs[0:29] = self.last_action
self.obs[29:32] = ang_vel_s
self.obs[32] = self.cmd[2]
self.obs[33:35] = self.cmd[:2]
self.obs[35:37] = cos_ph
self.obs[37:66] = qj_obs
self.obs[66:95] = dqj_obs
self.obs[95:98] = gravity
self.obs[98:100] = sin_ph
# Run policy inference
ort_in = {self.policy.get_inputs()[0].name: self.obs.reshape(1, -1).astype(np.float32)}
raw_action = self.policy.run(None, ort_in)[0].squeeze()
action = np.clip(raw_action, -100.0, 100.0)
self.last_action = action.copy()
# Transform action back to target joint positions
target = DEFAULT_ANGLES + action * ACTION_SCALE
# Build action dict
action_dict = {}
for motor in G1_29_JointIndex:
action_dict[f"{motor.name}.q"] = float(target[motor.value])
# Zero out missing joints for g1_23dof
for joint_idx in MISSING_JOINTS:
motor_name = G1_29_JointIndex(joint_idx).name
action_dict[f"{motor_name}.q"] = 0.0
# Send action to robot
self.robot.send_action(action_dict)
def run(repo_id: str = DEFAULT_HOLOSOMA_REPO_ID, policy_type: str = "fastsac") -> None:
"""Main function to run the Holosoma locomotion controller.
Args:
repo_id: Hugging Face Hub repository ID for Holosoma policies.
policy_type: Policy type to use ('fastsac' or 'ppo').
"""
# Load policy and gains
policy, kp, kd = load_policy(repo_id=repo_id, policy_type=policy_type)
# Initialize robot
config = UnitreeG1Config()
robot = UnitreeG1(config)
robot.connect()
holosoma_controller = HolosomaLocomotionController(policy, robot, kp, kd)
try:
robot.reset(CONTROL_DT, DEFAULT_ANGLES)
logger.info("Use joystick: LY=fwd/back, LX=left/right, RX=rotate")
logger.info("Press Ctrl+C to stop")
# Run step
while not robot._shutdown_event.is_set():
start_time = time.time()
holosoma_controller.run_step()
elapsed = time.time() - start_time
sleep_time = max(0, CONTROL_DT - elapsed)
time.sleep(sleep_time)
except KeyboardInterrupt:
logger.info("Stopping locomotion...")
finally:
if robot.is_connected:
robot.disconnect()
logger.info("Done!")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Holosoma Locomotion Controller for Unitree G1")
parser.add_argument(
"--repo-id",
type=str,
default=DEFAULT_HOLOSOMA_REPO_ID,
help=f"Hugging Face Hub repo ID for Holosoma policies (default: {DEFAULT_HOLOSOMA_REPO_ID})",
)
parser.add_argument(
"--policy",
type=str,
choices=["fastsac", "ppo"],
default="fastsac",
help="Policy type to use: 'fastsac' (default) or 'ppo'",
)
args = parser.parse_args()
run(repo_id=args.repo_id, policy_type=args.policy)
+23 -9
View File
@@ -25,9 +25,9 @@ discord = "https://discord.gg/s3KuuzsPFb"
[project]
name = "lerobot"
version = "0.4.3"
version = "0.4.4"
description = "🤗 LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch"
readme = "README.md"
dynamic = ["readme"]
license = { text = "Apache-2.0" }
requires-python = ">=3.10"
authors = [
@@ -74,7 +74,7 @@ dependencies = [
"packaging>=24.2,<26.0",
"pynput>=1.7.7,<1.9.0",
"pyserial>=3.5,<4.0",
"wandb>=0.20.0,<0.22.0", # TODO: Bumb dependency (compatible with protobuf)
"wandb>=0.24.0,<0.25.0",
"torch>=2.2.1,<2.8.0", # TODO: Bumb dependency
"torchcodec>=0.2.1,<0.6.0; sys_platform != 'win32' and (sys_platform != 'linux' or (platform_machine != 'aarch64' and platform_machine != 'arm64' and platform_machine != 'armv7l')) and (sys_platform != 'darwin' or platform_machine != 'x86_64')", # TODO: Bumb dependency
@@ -97,11 +97,12 @@ dependencies = [
pygame-dep = ["pygame>=2.5.1,<2.7.0"]
placo-dep = ["placo>=0.9.6,<0.10.0"]
transformers-dep = ["transformers>=4.57.1,<5.0.0"]
grpcio-dep = ["grpcio==1.73.1", "protobuf==6.31.0"] # TODO: Bumb dependency (compatible with wandb)
grpcio-dep = ["grpcio==1.73.1", "protobuf>=6.31.1,<6.32.0"]
# Motors
feetech = ["feetech-servo-sdk>=1.0.0,<2.0.0"]
dynamixel = ["dynamixel-sdk>=3.7.31,<3.9.0"]
damiao = ["python-can>=4.2.0,<5.0.0"]
# Robots
gamepad = ["lerobot[pygame-dep]", "hidapi>=0.14.0,<0.15.0"]
@@ -109,9 +110,9 @@ hopejr = ["lerobot[feetech]", "lerobot[pygame-dep]"]
lekiwi = ["lerobot[feetech]", "pyzmq>=26.2.1,<28.0.0"]
unitree_g1 = [
"pyzmq>=26.2.1,<28.0.0",
"onnxruntime>=1.16.0"
"onnxruntime>=1.16.0,<2.0.0"
]
reachy2 = ["reachy2_sdk>=1.0.14,<1.1.0"]
reachy2 = ["reachy2_sdk>=1.0.15,<1.1.0"]
kinematics = ["lerobot[placo-dep]"]
intelrealsense = [
"pyrealsense2>=2.55.1.6486,<2.57.0 ; sys_platform != 'darwin'",
@@ -127,7 +128,7 @@ wallx = [
"torchdiffeq==0.2.5",
"qwen_vl_utils==0.0.11"
]
pi = ["transformers @ git+https://github.com/huggingface/transformers.git@fix/lerobot_openpi"]
pi = ["transformers @ git+https://github.com/huggingface/transformers.git@fix/lerobot_openpi", "scipy>=1.10.1,<1.15"]
smolvla = ["lerobot[transformers-dep]", "num2words>=0.5.14,<0.6.0", "accelerate>=1.7.0,<2.0.0", "safetensors>=0.4.3,<1.0.0"]
groot = [
"lerobot[transformers-dep]",
@@ -140,12 +141,13 @@ groot = [
"ninja>=1.11.1,<2.0.0",
"flash-attn>=2.5.9,<3.0.0 ; sys_platform != 'darwin'"
]
sarm = ["lerobot[transformers-dep]", "faker>=33.0.0,<35.0.0", "matplotlib>=3.10.3,<4.0.0", "qwen-vl-utils>=0.0.14"]
sarm = ["lerobot[transformers-dep]", "faker>=33.0.0,<35.0.0", "matplotlib>=3.10.3,<4.0.0", "qwen-vl-utils>=0.0.14,<0.1.0"]
xvla = ["lerobot[transformers-dep]"]
hilserl = ["lerobot[transformers-dep]", "gym-hil>=0.1.13,<0.2.0", "lerobot[grpcio-dep]", "lerobot[placo-dep]"]
# Features
async = ["lerobot[grpcio-dep]", "matplotlib>=3.10.3,<4.0.0"]
peft = ["lerobot[transformers-dep]", "peft>=0.18.0,<1.0.0"]
# Development
dev = ["pre-commit>=3.7.0,<5.0.0", "debugpy>=1.8.1,<1.9.0", "lerobot[grpcio-dep]", "grpcio-tools==1.73.1", "mypy>=1.19.1"]
@@ -182,7 +184,8 @@ all = [
"lerobot[phone]",
"lerobot[libero]",
"lerobot[metaworld]",
"lerobot[sarm]"
"lerobot[sarm]",
"lerobot[peft]",
]
[project.scripts]
@@ -195,11 +198,13 @@ lerobot-setup-motors="lerobot.scripts.lerobot_setup_motors:main"
lerobot-teleoperate="lerobot.scripts.lerobot_teleoperate:main"
lerobot-eval="lerobot.scripts.lerobot_eval:main"
lerobot-train="lerobot.scripts.lerobot_train:main"
lerobot-train-tokenizer="lerobot.scripts.lerobot_train_tokenizer:main"
lerobot-dataset-viz="lerobot.scripts.lerobot_dataset_viz:main"
lerobot-info="lerobot.scripts.lerobot_info:main"
lerobot-find-joint-limits="lerobot.scripts.lerobot_find_joint_limits:main"
lerobot-imgtransform-viz="lerobot.scripts.lerobot_imgtransform_viz:main"
lerobot-edit-dataset="lerobot.scripts.lerobot_edit_dataset:main"
lerobot-setup-can="lerobot.scripts.lerobot_setup_can:main"
# ---------------- Tool Configurations ----------------
[tool.setuptools.packages.find]
@@ -275,6 +280,7 @@ default.extend-ignore-identifiers-re = [
"thw",
"inpt",
"ROBOTIS",
"OT_VALUE"
]
# TODO: Uncomment when ready to use
@@ -417,6 +423,10 @@ conflicts = [
{ extra = "wallx" },
{ extra = "libero" },
],
[
{ extra = "wallx" },
{ extra = "peft" },
],
[
{ extra = "wallx" },
{ extra = "all" },
@@ -450,6 +460,10 @@ conflicts = [
{ extra = "pi" },
{ extra = "libero" },
],
[
{ extra = "pi" },
{ extra = "peft" },
],
[
{ extra = "pi" },
{ extra = "all" },
+72
View File
@@ -0,0 +1,72 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from setuptools import setup
def get_version_from_toml() -> str:
"""Return the project's version string parsed from `pyproject.toml`.
The function scans `pyproject.toml` line-by-line looking for a line
that starts with ``version`` (for example: ``version = "1.2.3"``)
and returns the value without surrounding quotes. If no such line is
found a :class:`ValueError` is raised.
Returns:
The version string from `pyproject.toml` (e.g. ``"1.2.3"`` ->
``1.2.3``).
"""
version = None
with open("pyproject.toml", encoding="utf-8") as f:
for line in f:
if line.strip().startswith("version"):
version = line.split("=")[1].strip().strip('"')
break
if version is None:
raise ValueError("Version not found in pyproject.toml")
return version
def read_long_description() -> str:
"""Read and return the project's long description for setup.
This function reads `README.md` and replaces image links that point
to the local `./media/` directory with absolute raw GitHub URLs that
reference the release tag corresponding to the version parsed from
`pyproject.toml` (for example, ``v1.2.3``). The modified README
content is returned as a string suitable for passing to
``setuptools.setup(long_description=...)``.
Returns:
The README content with rewritten media links.
"""
with open("README.md", encoding="utf-8") as f:
content = f.read()
version = get_version_from_toml()
git_tag = f"v{version}"
base_raw_url = f"https://raw.githubusercontent.com/huggingface/lerobot/{git_tag}/"
content = content.replace('src="./media/', f'src="{base_raw_url}media/')
return content
setup(
long_description=read_long_description(),
long_description_content_type="text/markdown",
)
+10
View File
@@ -126,6 +126,12 @@ class RobotClientConfig:
# Device configuration
policy_device: str = field(default="cpu", metadata={"help": "Device for policy inference"})
client_device: str = field(
default="cpu",
metadata={
"help": "Device to move actions to after receiving from server (e.g., for downstream planners)"
},
)
# Control behavior configuration
chunk_size_threshold: float = field(default=0.5, metadata={"help": "Threshold for chunk size control"})
@@ -161,6 +167,9 @@ class RobotClientConfig:
if not self.policy_device:
raise ValueError("policy_device cannot be empty")
if not self.client_device:
raise ValueError("client_device cannot be empty")
if self.chunk_size_threshold < 0 or self.chunk_size_threshold > 1:
raise ValueError(f"chunk_size_threshold must be between 0 and 1, got {self.chunk_size_threshold}")
@@ -184,6 +193,7 @@ class RobotClientConfig:
"policy_type": self.policy_type,
"pretrained_name_or_path": self.pretrained_name_or_path,
"policy_device": self.policy_device,
"client_device": self.client_device,
"chunk_size_threshold": self.chunk_size_threshold,
"fps": self.fps,
"actions_per_chunk": self.actions_per_chunk,
+2 -2
View File
@@ -23,7 +23,7 @@ DEFAULT_INFERENCE_LATENCY = 1 / DEFAULT_FPS
DEFAULT_OBS_QUEUE_TIMEOUT = 2
# All action chunking policies
SUPPORTED_POLICIES = ["act", "smolvla", "diffusion", "tdmpc", "vqbet", "pi0", "pi05"]
SUPPORTED_POLICIES = ["act", "smolvla", "diffusion", "tdmpc", "vqbet", "pi0", "pi05", "groot"]
# TODO: Add all other robots
SUPPORTED_ROBOTS = ["so100_follower", "so101_follower", "bi_so100_follower", "omx_follower"]
SUPPORTED_ROBOTS = ["so100_follower", "so101_follower", "bi_so_follower", "omx_follower"]
+3 -2
View File
@@ -18,6 +18,7 @@ import os
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
import torch
@@ -39,8 +40,8 @@ from lerobot.utils.utils import init_logging
Action = torch.Tensor
# observation as received from the robot
RawObservation = dict[str, torch.Tensor]
# observation as received from the robot (can be numpy arrays, floats, etc.)
RawObservation = dict[str, Any]
# observation as those recorded in LeRobot dataset (keys are different)
LeRobotObservation = dict[str, torch.Tensor]
@@ -381,6 +381,8 @@ class PolicyServer(services_pb2_grpc.AsyncInferenceServicer):
action_tensor = torch.stack(processed_actions, dim=1).squeeze(0)
self.logger.debug(f"Postprocessed action shape: {action_tensor.shape}")
action_tensor = action_tensor.detach().cpu()
"""5. Convert to TimedAction list"""
action_chunk = self._time_action_chunk(
observation_t.get_timestamp(), list(action_tensor), observation_t.get_timestep()
+18 -3
View File
@@ -25,6 +25,7 @@ python src/lerobot/async_inference/robot_client.py \
--policy_type=act \
--pretrained_name_or_path=user/model \
--policy_device=mps \
--client_device=cpu \
--actions_per_chunk=50 \
--chunk_size_threshold=0.5 \
--aggregate_fn_name=weighted_average \
@@ -51,12 +52,11 @@ from lerobot.cameras.realsense.configuration_realsense import RealSenseCameraCon
from lerobot.robots import ( # noqa: F401
Robot,
RobotConfig,
bi_so100_follower,
bi_so_follower,
koch_follower,
make_robot_from_config,
omx_follower,
so100_follower,
so101_follower,
so_follower,
)
from lerobot.transport import (
services_pb2, # type: ignore
@@ -286,6 +286,21 @@ class RobotClient:
timed_actions = pickle.loads(actions_chunk.data) # nosec
deserialize_time = time.perf_counter() - deserialize_start
# Log device type of received actions
if len(timed_actions) > 0:
received_device = timed_actions[0].get_action().device.type
self.logger.debug(f"Received actions on device: {received_device}")
# Move actions to client_device (e.g., for downstream planners that need GPU)
client_device = self.config.client_device
if client_device != "cpu":
for timed_action in timed_actions:
if timed_action.get_action().device.type != client_device:
timed_action.action = timed_action.get_action().to(client_device)
self.logger.debug(f"Converted actions to device: {client_device}")
else:
self.logger.debug(f"Actions kept on device: {client_device}")
self.action_chunk_size = max(self.action_chunk_size, len(timed_actions))
# Calculate network latency if we have matching observations
@@ -35,18 +35,19 @@ class Reachy2CameraConfig(CameraConfig):
name="teleop",
image_type="left",
ip_address="192.168.0.200", # IP address of the robot
fps=15,
port=50065, # Port of the camera server
width=640,
height=480,
fps=30, # Not configurable for Reachy 2 cameras
color_mode=ColorMode.RGB,
) # Left teleop camera, 640x480 @ 15FPS
) # Left teleop camera, 640x480 @ 30FPS
```
Attributes:
name: Name of the camera device. Can be "teleop" or "depth".
image_type: Type of image stream. For "teleop" camera, can be "left" or "right".
For "depth" camera, can be "rgb" or "depth". (depth is not supported yet)
fps: Requested frames per second for the color stream.
fps: Requested frames per second for the color stream. Not configurable for Reachy 2 cameras.
width: Requested frame width in pixels for the color stream.
height: Requested frame height in pixels for the color stream.
color_mode: Color mode for image output (RGB or BGR). Defaults to RGB.
@@ -62,7 +63,6 @@ class Reachy2CameraConfig(CameraConfig):
color_mode: ColorMode = ColorMode.RGB
ip_address: str | None = "localhost"
port: int = 50065
# use_depth: bool = False
def __post_init__(self) -> None:
if self.name not in ["teleop", "depth"]:
@@ -16,12 +16,13 @@
Provides the Reachy2Camera class for capturing frames from Reachy 2 cameras using Reachy 2's CameraManager.
"""
from __future__ import annotations
import logging
import os
import platform
import time
from threading import Event, Lock, Thread
from typing import Any
from typing import TYPE_CHECKING, Any
from numpy.typing import NDArray # type: ignore # TODO: add type stubs for numpy.typing
@@ -30,10 +31,19 @@ if platform.system() == "Windows" and "OPENCV_VIDEOIO_MSMF_ENABLE_HW_TRANSFORMS"
os.environ["OPENCV_VIDEOIO_MSMF_ENABLE_HW_TRANSFORMS"] = "0"
import cv2 # type: ignore # TODO: add type stubs for OpenCV
import numpy as np # type: ignore # TODO: add type stubs for numpy
from reachy2_sdk.media.camera import CameraView # type: ignore # TODO: add type stubs for reachy2_sdk
from reachy2_sdk.media.camera_manager import ( # type: ignore # TODO: add type stubs for reachy2_sdk
CameraManager,
)
from lerobot.utils.import_utils import _reachy2_sdk_available
if TYPE_CHECKING or _reachy2_sdk_available:
from reachy2_sdk.media.camera import CameraView
from reachy2_sdk.media.camera_manager import CameraManager
else:
CameraManager = None
class CameraView:
LEFT = 0
RIGHT = 1
from lerobot.utils.errors import DeviceNotConnectedError
@@ -69,17 +79,10 @@ class Reachy2Camera(Camera):
self.config = config
self.fps = config.fps
self.color_mode = config.color_mode
self.cam_manager: CameraManager | None = None
self.thread: Thread | None = None
self.stop_event: Event | None = None
self.frame_lock: Lock = Lock()
self.latest_frame: NDArray[Any] | None = None
self.new_frame_event: Event = Event()
def __str__(self) -> str:
return f"{self.__class__.__name__}({self.config.name}, {self.config.image_type})"
@@ -100,44 +103,23 @@ class Reachy2Camera(Camera):
def connect(self, warmup: bool = True) -> None:
"""
Connects to the Reachy2 CameraManager as specified in the configuration.
Raises:
DeviceNotConnectedError: If the camera is not connected.
"""
self.cam_manager = CameraManager(host=self.config.ip_address, port=self.config.port)
if self.cam_manager is None:
raise DeviceNotConnectedError(f"Could not connect to {self}.")
self.cam_manager.initialize_cameras()
logger.info(f"{self} connected.")
@staticmethod
def find_cameras(ip_address: str = "localhost", port: int = 50065) -> list[dict[str, Any]]:
def find_cameras() -> list[dict[str, Any]]:
"""
Detects available Reachy 2 cameras.
Returns:
List[Dict[str, Any]]: A list of dictionaries,
where each dictionary contains 'name', 'stereo',
and the default profile properties (width, height, fps).
Detection not implemented for Reachy2 cameras.
"""
initialized_cameras = []
camera_manager = CameraManager(host=ip_address, port=port)
for camera in [camera_manager.teleop, camera_manager.depth]:
if camera is None:
continue
height, width, _, _, _, _, _ = camera.get_parameters()
camera_info = {
"name": camera._cam_info.name,
"stereo": camera._cam_info.stereo,
"default_profile": {
"width": width,
"height": height,
"fps": 30,
},
}
initialized_cameras.append(camera_info)
camera_manager.disconnect()
return initialized_cameras
raise NotImplementedError("Camera detection is not implemented for Reachy2 cameras.")
def read(self, color_mode: ColorMode | None = None) -> NDArray[Any]:
"""
@@ -155,95 +137,49 @@ class Reachy2Camera(Camera):
(height, width, channels), using the specified or default
color mode and applying any configured rotation.
"""
start_time = time.perf_counter()
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} is not connected.")
start_time = time.perf_counter()
if self.cam_manager is None:
raise DeviceNotConnectedError(f"{self} is not connected.")
frame: NDArray[Any] = np.empty((0, 0, 3), dtype=np.uint8)
if self.cam_manager is None:
raise DeviceNotConnectedError(f"{self} is not connected.")
if self.config.name == "teleop" and hasattr(self.cam_manager, "teleop"):
if self.config.image_type == "left":
frame = self.cam_manager.teleop.get_frame(
CameraView.LEFT, size=(self.config.width, self.config.height)
)[0]
elif self.config.image_type == "right":
frame = self.cam_manager.teleop.get_frame(
CameraView.RIGHT, size=(self.config.width, self.config.height)
)[0]
elif self.config.name == "depth" and hasattr(self.cam_manager, "depth"):
if self.config.image_type == "depth":
frame = self.cam_manager.depth.get_depth_frame()[0]
elif self.config.image_type == "rgb":
frame = self.cam_manager.depth.get_frame(size=(self.config.width, self.config.height))[0]
else:
if self.config.name == "teleop" and hasattr(self.cam_manager, "teleop"):
if self.config.image_type == "left":
frame = self.cam_manager.teleop.get_frame(CameraView.LEFT, size=(640, 480))[0]
elif self.config.image_type == "right":
frame = self.cam_manager.teleop.get_frame(CameraView.RIGHT, size=(640, 480))[0]
elif self.config.name == "depth" and hasattr(self.cam_manager, "depth"):
if self.config.image_type == "depth":
frame = self.cam_manager.depth.get_depth_frame()[0]
elif self.config.image_type == "rgb":
frame = self.cam_manager.depth.get_frame(size=(640, 480))[0]
raise ValueError(f"Invalid camera name '{self.config.name}'. Expected 'teleop' or 'depth'.")
if frame is None:
return np.empty((0, 0, 3), dtype=np.uint8)
if frame is None:
return np.empty((0, 0, 3), dtype=np.uint8)
if self.config.color_mode == "rgb":
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
if self.config.color_mode == "rgb":
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
read_duration_ms = (time.perf_counter() - start_time) * 1e3
logger.debug(f"{self} read took: {read_duration_ms:.1f}ms")
return frame
def _read_loop(self) -> None:
"""
Internal loop run by the background thread for asynchronous reading.
On each iteration:
1. Reads a color frame
2. Stores result in latest_frame (thread-safe)
3. Sets new_frame_event to notify listeners
Stops on DeviceNotConnectedError, logs other errors and continues.
"""
if self.stop_event is None:
raise RuntimeError(f"{self}: stop_event is not initialized before starting read loop.")
while not self.stop_event.is_set():
try:
color_image = self.read()
with self.frame_lock:
self.latest_frame = color_image
self.new_frame_event.set()
except DeviceNotConnectedError:
break
except Exception as e:
logger.warning(f"Error reading frame in background thread for {self}: {e}")
def _start_read_thread(self) -> None:
"""Starts or restarts the background read thread if it's not running."""
if self.thread is not None and self.thread.is_alive():
self.thread.join(timeout=0.1)
if self.stop_event is not None:
self.stop_event.set()
self.stop_event = Event()
self.thread = Thread(target=self._read_loop, args=(), name=f"{self}_read_loop")
self.thread.daemon = True
self.thread.start()
def _stop_read_thread(self) -> None:
"""Signals the background read thread to stop and waits for it to join."""
if self.stop_event is not None:
self.stop_event.set()
if self.thread is not None and self.thread.is_alive():
self.thread.join(timeout=2.0)
self.thread = None
self.stop_event = None
def async_read(self, timeout_ms: float = 200) -> NDArray[Any]:
"""
Reads the latest available frame asynchronously.
Reads the latest available frame.
This method retrieves the most recent frame captured by the background
read thread. It does not block waiting for the camera hardware directly,
but may wait up to timeout_ms for the background thread to provide a frame.
This method retrieves the most recent frame available in Reachy 2's low-level software.
Args:
timeout_ms (float): Maximum time in milliseconds to wait for a frame
@@ -261,22 +197,10 @@ class Reachy2Camera(Camera):
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} is not connected.")
if self.thread is None or not self.thread.is_alive():
self._start_read_thread()
if not self.new_frame_event.wait(timeout=timeout_ms / 1000.0):
thread_alive = self.thread is not None and self.thread.is_alive()
raise TimeoutError(
f"Timed out waiting for frame from camera {self} after {timeout_ms} ms. "
f"Read thread alive: {thread_alive}."
)
with self.frame_lock:
frame = self.latest_frame
self.new_frame_event.clear()
frame = self.read()
if frame is None:
raise RuntimeError(f"Internal error: Event set but no frame available for {self}.")
raise RuntimeError(f"Internal error: No frame available for {self}.")
return frame
@@ -287,12 +211,9 @@ class Reachy2Camera(Camera):
Raises:
DeviceNotConnectedError: If the camera is already disconnected.
"""
if not self.is_connected and self.thread is None:
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} not connected.")
if self.thread is not None:
self._stop_read_thread()
if self.cam_manager is not None:
self.cam_manager.disconnect()
+5
View File
@@ -43,6 +43,11 @@ def make_cameras_from_configs(camera_configs: dict[str, CameraConfig]) -> dict[s
cameras[key] = Reachy2Camera(cfg)
elif cfg.type == "zmq":
from .zmq.camera_zmq import ZMQCamera
cameras[key] = ZMQCamera(cfg)
else:
try:
cameras[key] = cast(Camera, make_device_from_device_class(cfg))
@@ -1,6 +1,6 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -14,5 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .bi_so100_follower import BiSO100Follower
from .config_bi_so100_follower import BiSO100FollowerConfig
from .camera_zmq import ZMQCamera
from .configuration_zmq import ZMQCameraConfig
__all__ = ["ZMQCamera", "ZMQCameraConfig"]
+235
View File
@@ -0,0 +1,235 @@
#!/usr/bin/env python
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
ZMQCamera - Captures frames from remote cameras via ZeroMQ using JSON protocol in the
following format:
{
"timestamps": {"camera_name": float},
"images": {"camera_name": "<base64-jpeg>"}
}
"""
import base64
import json
import logging
import time
from threading import Event, Lock, Thread
from typing import Any
import cv2
import numpy as np
from numpy.typing import NDArray
from lerobot.utils.errors import DeviceAlreadyConnectedError, DeviceNotConnectedError
from ..camera import Camera
from ..configs import ColorMode
from .configuration_zmq import ZMQCameraConfig
logger = logging.getLogger(__name__)
class ZMQCamera(Camera):
"""
Example usage:
```python
from lerobot.cameras.zmq import ZMQCamera, ZMQCameraConfig
config = ZMQCameraConfig(server_address="192.168.123.164", port=5555, camera_name="head_camera")
camera = ZMQCamera(config)
camera.connect()
frame = camera.read()
camera.disconnect()
```
"""
def __init__(self, config: ZMQCameraConfig):
super().__init__(config)
import zmq
self.config = config
self.server_address = config.server_address
self.port = config.port
self.camera_name = config.camera_name
self.color_mode = config.color_mode
self.timeout_ms = config.timeout_ms
self.context: zmq.Context | None = None
self.socket: zmq.Socket | None = None
self._connected = False
self.thread: Thread | None = None
self.stop_event: Event | None = None
self.frame_lock: Lock = Lock()
self.latest_frame: NDArray[Any] | None = None
self.new_frame_event: Event = Event()
def __str__(self) -> str:
return f"ZMQCamera({self.camera_name}@{self.server_address}:{self.port})"
@property
def is_connected(self) -> bool:
return self._connected and self.context is not None and self.socket is not None
def connect(self, warmup: bool = True) -> None:
"""Connect to ZMQ camera server."""
if self.is_connected:
raise DeviceAlreadyConnectedError(f"{self} is already connected.")
logger.info(f"Connecting to {self}...")
try:
import zmq
self.context = zmq.Context()
self.socket = self.context.socket(zmq.SUB)
self.socket.setsockopt_string(zmq.SUBSCRIBE, "")
self.socket.setsockopt(zmq.RCVTIMEO, self.timeout_ms)
self.socket.setsockopt(zmq.CONFLATE, True)
self.socket.connect(f"tcp://{self.server_address}:{self.port}")
self._connected = True
# Auto-detect resolution
if self.width is None or self.height is None:
h, w = self.read().shape[:2]
self.height = h
self.width = w
logger.info(f"{self} resolution: {w}x{h}")
logger.info(f"{self} connected.")
if warmup:
time.sleep(0.1)
except Exception as e:
self._cleanup()
raise RuntimeError(f"Failed to connect to {self}: {e}") from e
def _cleanup(self):
"""Clean up ZMQ resources."""
self._connected = False
if self.socket:
self.socket.close()
self.socket = None
if self.context:
self.context.term()
self.context = None
@staticmethod
def find_cameras() -> list[dict[str, Any]]:
"""ZMQ cameras require manual configuration (server address/port)."""
return []
def read(self, color_mode: ColorMode | None = None) -> NDArray[Any]:
"""
Read a single frame from the ZMQ camera.
Returns:
np.ndarray: Decoded frame (height, width, 3)
"""
if not self.is_connected or self.socket is None:
raise DeviceNotConnectedError(f"{self} is not connected.")
try:
message = self.socket.recv_string()
except Exception as e:
if type(e).__name__ == "Again":
raise TimeoutError(f"{self} timeout after {self.timeout_ms}ms") from e
raise
# Decode JSON message
data = json.loads(message)
if "images" not in data:
raise RuntimeError(f"{self} invalid message: missing 'images' key")
images = data["images"]
# Get image by camera name or first available
if self.camera_name in images:
img_b64 = images[self.camera_name]
elif images:
img_b64 = next(iter(images.values()))
else:
raise RuntimeError(f"{self} no images in message")
# Decode base64 JPEG
img_bytes = base64.b64decode(img_b64)
frame = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)
if frame is None:
raise RuntimeError(f"{self} failed to decode image")
return frame
def _read_loop(self) -> None:
while self.stop_event and not self.stop_event.is_set():
try:
frame = self.read()
with self.frame_lock:
self.latest_frame = frame
self.new_frame_event.set()
except DeviceNotConnectedError:
break
except TimeoutError:
pass
except Exception as e:
logger.warning(f"Read error: {e}")
def _start_read_thread(self) -> None:
if self.thread and self.thread.is_alive():
return
self.stop_event = Event()
self.thread = Thread(target=self._read_loop, daemon=True)
self.thread.start()
def _stop_read_thread(self) -> None:
if self.stop_event:
self.stop_event.set()
if self.thread and self.thread.is_alive():
self.thread.join(timeout=2.0)
self.thread = None
self.stop_event = None
def async_read(self, timeout_ms: float = 10000) -> NDArray[Any]:
"""Read latest frame asynchronously (non-blocking)."""
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} is not connected.")
if not self.thread or not self.thread.is_alive():
self._start_read_thread()
if not self.new_frame_event.wait(timeout=timeout_ms / 1000.0):
raise TimeoutError(f"{self} async_read timeout after {timeout_ms}ms")
with self.frame_lock:
frame = self.latest_frame
self.new_frame_event.clear()
if frame is None:
raise RuntimeError(f"{self} no frame available")
return frame
def disconnect(self) -> None:
"""Disconnect from ZMQ camera."""
if not self.is_connected and not self.thread:
raise DeviceNotConnectedError(f"{self} not connected.")
self._stop_read_thread()
self._cleanup()
logger.info(f"{self} disconnected.")
@@ -0,0 +1,46 @@
#!/usr/bin/env python
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from ..configs import CameraConfig, ColorMode
__all__ = ["ZMQCameraConfig", "ColorMode"]
@CameraConfig.register_subclass("zmq")
@dataclass
class ZMQCameraConfig(CameraConfig):
server_address: str
port: int = 5555
camera_name: str = "zmq_camera"
color_mode: ColorMode = ColorMode.RGB
timeout_ms: int = 5000
def __post_init__(self) -> None:
if self.color_mode not in (ColorMode.RGB, ColorMode.BGR):
raise ValueError(
f"`color_mode` is expected to be {ColorMode.RGB.value} or {ColorMode.BGR.value}, but {self.color_mode} is provided."
)
if self.timeout_ms <= 0:
raise ValueError(f"`timeout_ms` must be positive, but {self.timeout_ms} is provided.")
if not self.server_address:
raise ValueError("`server_address` cannot be empty.")
if self.port <= 0 or self.port > 65535:
raise ValueError(f"`port` must be between 1 and 65535, but {self.port} is provided.")
+114
View File
@@ -0,0 +1,114 @@
#!/usr/bin/env python
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Streams camera images over ZMQ.
Uses lerobot's OpenCVCamera for capture, encodes images to base64 and sends them over ZMQ.
"""
import base64
import contextlib
import json
import logging
import time
from collections import deque
import cv2
import numpy as np
import zmq
from lerobot.cameras.configs import ColorMode
from lerobot.cameras.opencv import OpenCVCamera, OpenCVCameraConfig
logger = logging.getLogger(__name__)
def encode_image(image: np.ndarray, quality: int = 80) -> str:
"""Encode RGB image to base64 JPEG string."""
_, buffer = cv2.imencode(".jpg", image, [int(cv2.IMWRITE_JPEG_QUALITY), quality])
return base64.b64encode(buffer).decode("utf-8")
class ImageServer:
def __init__(self, config: dict, port: int = 5555):
self.fps = config.get("fps", 30)
self.cameras: dict[str, OpenCVCamera] = {}
for name, cfg in config.get("cameras", {}).items():
shape = cfg.get("shape", [480, 640])
cam_config = OpenCVCameraConfig(
index_or_path=cfg.get("device_id", 0),
fps=self.fps,
width=shape[1],
height=shape[0],
color_mode=ColorMode.RGB,
)
camera = OpenCVCamera(cam_config)
camera.connect()
self.cameras[name] = camera
logger.info(f"Camera {name}: {shape[1]}x{shape[0]}")
# ZMQ PUB socket
self.context = zmq.Context()
self.socket = self.context.socket(zmq.PUB)
self.socket.setsockopt(zmq.SNDHWM, 20)
self.socket.setsockopt(zmq.LINGER, 0)
self.socket.bind(f"tcp://*:{port}")
logger.info(f"ImageServer running on port {port}")
def run(self):
frame_count = 0
frame_times = deque(maxlen=60)
try:
while True:
t0 = time.time()
# Build message
message = {"timestamps": {}, "images": {}}
for name, cam in self.cameras.items():
frame = cam.read() # Returns RGB
message["timestamps"][name] = time.time()
message["images"][name] = encode_image(frame)
# Send as JSON string (suppress if buffer full)
with contextlib.suppress(zmq.Again):
self.socket.send_string(json.dumps(message), zmq.NOBLOCK)
frame_count += 1
frame_times.append(time.time() - t0)
if frame_count % 60 == 0:
logger.debug(f"FPS: {len(frame_times) / sum(frame_times):.1f}")
sleep = (1.0 / self.fps) - (time.time() - t0)
if sleep > 0:
time.sleep(sleep)
except KeyboardInterrupt:
pass
finally:
for cam in self.cameras.values():
cam.disconnect()
self.socket.close()
self.context.term()
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
config = {"fps": 30, "cameras": {"head_camera": {"device_id": 4, "shape": [480, 640]}}}
ImageServer(config, port=5555).run()
+28
View File
@@ -67,3 +67,31 @@ class EvalConfig:
f"to increase the number of episodes to match the batch size (e.g. `eval.n_episodes={self.batch_size}`), "
f"or lower the batch size (e.g. `eval.batch_size={self.n_episodes}`)."
)
@dataclass
class PeftConfig:
# PEFT offers many fine-tuning methods, layer adapters being the most common and currently also the most
# effective methods so we'll focus on those in this high-level config interface.
# Either a string (module name suffix or 'all-linear'), a list of module name suffixes or a regular expression
# describing module names to target with the configured PEFT method. Some policies have a default value for this
# so that you don't *have* to choose which layers to adapt but it might still be worthwhile depending on your case.
target_modules: list[str] | str | None = None
# Names/suffixes of modules to fully fine-tune and store alongside adapter weights. Useful for layers that are
# not part of a pre-trained model (e.g., action state projections). Depending on the policy this defaults to layers
# that are newly created in pre-trained policies. If you're fine-tuning an already trained policy you might want
# to set this to `[]`. Corresponds to PEFT's `modules_to_save`.
full_training_modules: list[str] | None = None
# The PEFT (adapter) method to apply to the policy. Needs to be a valid PEFT type.
method_type: str = "LORA"
# Adapter initialization method. Look at the specific PEFT adapter documentation for defaults.
init_type: str | None = None
# We expect that all PEFT adapters are in some way doing rank-decomposition therefore this parameter specifies
# the rank used for the adapter. In general a higher rank means more trainable parameters and closer to full
# fine-tuning.
r: int = 16
+2
View File
@@ -38,6 +38,8 @@ class EvalPipelineConfig:
seed: int | None = 1000
# Rename map for the observation to override the image and state keys
rename_map: dict[str, str] = field(default_factory=dict)
# Explicit consent to execute remote code from the Hub (required for hub environments).
trust_remote_code: bool = False
def __post_init__(self) -> None:
# HACK: We parse again the cli args here to get the pretrained path if there was one.
+14 -2
View File
@@ -55,14 +55,18 @@ class PreTrainedConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC): # type: igno
n_obs_steps: int = 1
input_features: dict[str, PolicyFeature] = field(default_factory=dict)
output_features: dict[str, PolicyFeature] = field(default_factory=dict)
# `input_features` can be set to None/null in order to infer those values from the dataset.
input_features: dict[str, PolicyFeature] | None = field(default_factory=dict)
output_features: dict[str, PolicyFeature] | None = field(default_factory=dict)
device: str | None = None # e.g. "cuda", "cuda:0", "cpu", or "mps"
# `use_amp` determines whether to use Automatic Mixed Precision (AMP) for training and evaluation. With AMP,
# automatic gradient scaling is used.
use_amp: bool = False
# Whether the policy employed PEFT for training.
use_peft: bool = False
push_to_hub: bool = True # type: ignore[assignment] # TODO: use a different name to avoid override
repo_id: str | None = None
@@ -125,6 +129,8 @@ class PreTrainedConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC): # type: igno
@property
def robot_state_feature(self) -> PolicyFeature | None:
if not self.input_features:
return None
for ft_name, ft in self.input_features.items():
if ft.type is FeatureType.STATE and ft_name == OBS_STATE:
return ft
@@ -132,6 +138,8 @@ class PreTrainedConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC): # type: igno
@property
def env_state_feature(self) -> PolicyFeature | None:
if not self.input_features:
return None
for _, ft in self.input_features.items():
if ft.type is FeatureType.ENV:
return ft
@@ -139,10 +147,14 @@ class PreTrainedConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC): # type: igno
@property
def image_features(self) -> dict[str, PolicyFeature]:
if not self.input_features:
return {}
return {key: ft for key, ft in self.input_features.items() if ft.type is FeatureType.VISUAL}
@property
def action_feature(self) -> PolicyFeature | None:
if not self.output_features:
return None
for ft_name, ft in self.output_features.items():
if ft.type is FeatureType.ACTION and ft_name == ACTION:
return ft
+2 -1
View File
@@ -24,7 +24,7 @@ from huggingface_hub.errors import HfHubHTTPError
from lerobot import envs
from lerobot.configs import parser
from lerobot.configs.default import DatasetConfig, EvalConfig, WandBConfig
from lerobot.configs.default import DatasetConfig, EvalConfig, PeftConfig, WandBConfig
from lerobot.configs.policies import PreTrainedConfig
from lerobot.optim import OptimizerConfig
from lerobot.optim.schedulers import LRSchedulerConfig
@@ -65,6 +65,7 @@ class TrainPipelineConfig(HubMixin):
scheduler: LRSchedulerConfig | None = None
eval: EvalConfig = field(default_factory=EvalConfig)
wandb: WandBConfig = field(default_factory=WandBConfig)
peft: PeftConfig | None = None
# RA-BC (Reward-Aligned Behavior Cloning) parameters
use_rabc: bool = False # Enable reward-weighted training
+5
View File
@@ -50,3 +50,8 @@ class RTCAttentionSchedule(str, Enum):
ONES = "ONES"
LINEAR = "LINEAR"
EXP = "EXP"
class RTCTrainingDelayDistribution(str, Enum):
UNIFORM = "UNIFORM"
EXP = "EXP"
+24 -5
View File
@@ -19,6 +19,7 @@ import logging
import shutil
from pathlib import Path
import datasets
import pandas as pd
import tqdm
@@ -32,6 +33,7 @@ from lerobot.datasets.utils import (
DEFAULT_VIDEO_FILE_SIZE_IN_MB,
DEFAULT_VIDEO_PATH,
get_file_size_in_mb,
get_hf_features_from_features,
get_parquet_file_size_in_mb,
to_parquet_with_hf_images,
update_chunk_file_indices,
@@ -402,12 +404,21 @@ def aggregate_data(src_meta, dst_meta, data_idx, data_files_size_in_mb, chunk_si
}
unique_chunk_file_ids = sorted(unique_chunk_file_ids)
contains_images = len(dst_meta.image_keys) > 0
# retrieve features schema for proper image typing in parquet
hf_features = get_hf_features_from_features(dst_meta.features) if contains_images else None
for src_chunk_idx, src_file_idx in unique_chunk_file_ids:
src_path = src_meta.root / DEFAULT_DATA_PATH.format(
chunk_index=src_chunk_idx, file_index=src_file_idx
)
df = pd.read_parquet(src_path)
if contains_images:
# Use HuggingFace datasets to read source data to preserve image format
src_ds = datasets.Dataset.from_parquet(str(src_path))
df = src_ds.to_pandas()
else:
df = pd.read_parquet(src_path)
df = update_data_df(df, src_meta, dst_meta)
data_idx = append_or_create_parquet_file(
@@ -417,8 +428,9 @@ def aggregate_data(src_meta, dst_meta, data_idx, data_files_size_in_mb, chunk_si
data_files_size_in_mb,
chunk_size,
DEFAULT_DATA_PATH,
contains_images=len(dst_meta.image_keys) > 0,
contains_images=contains_images,
aggr_root=dst_meta.root,
hf_features=hf_features,
)
return data_idx
@@ -488,6 +500,7 @@ def append_or_create_parquet_file(
default_path: str,
contains_images: bool = False,
aggr_root: Path = None,
hf_features: datasets.Features | None = None,
):
"""Appends data to an existing parquet file or creates a new one based on size constraints.
@@ -503,6 +516,7 @@ def append_or_create_parquet_file(
default_path: Format string for generating file paths.
contains_images: Whether the data contains images requiring special handling.
aggr_root: Root path for the aggregated dataset.
hf_features: Optional HuggingFace Features schema for proper image typing.
Returns:
dict: Updated index dictionary with current chunk and file indices.
@@ -512,7 +526,7 @@ def append_or_create_parquet_file(
if not dst_path.exists():
dst_path.parent.mkdir(parents=True, exist_ok=True)
if contains_images:
to_parquet_with_hf_images(df, dst_path)
to_parquet_with_hf_images(df, dst_path, features=hf_features)
else:
df.to_parquet(dst_path)
return idx
@@ -527,12 +541,17 @@ def append_or_create_parquet_file(
final_df = df
target_path = new_path
else:
existing_df = pd.read_parquet(dst_path)
if contains_images:
# Use HuggingFace datasets to read existing data to preserve image format
existing_ds = datasets.Dataset.from_parquet(str(dst_path))
existing_df = existing_ds.to_pandas()
else:
existing_df = pd.read_parquet(dst_path)
final_df = pd.concat([existing_df, df], ignore_index=True)
target_path = dst_path
if contains_images:
to_parquet_with_hf_images(final_df, target_path)
to_parquet_with_hf_images(final_df, target_path, features=hf_features)
else:
final_df.to_parquet(target_path)
+561 -1
View File
@@ -26,6 +26,7 @@ This module provides utilities for:
import logging
import shutil
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
import datasets
@@ -51,7 +52,8 @@ from lerobot.datasets.utils import (
write_stats,
write_tasks,
)
from lerobot.utils.constants import HF_LEROBOT_HOME
from lerobot.datasets.video_utils import encode_video_frames, get_video_info
from lerobot.utils.constants import HF_LEROBOT_HOME, OBS_IMAGE
def _load_episode_with_stats(src_dataset: LeRobotDataset, episode_idx: int) -> dict:
@@ -1083,3 +1085,561 @@ def _copy_episodes_metadata_and_stats(
else:
if src_dataset.meta.stats:
write_stats(src_dataset.meta.stats, dst_meta.root)
def _save_episode_images_for_video(
dataset: LeRobotDataset,
imgs_dir: Path,
img_key: str,
episode_index: int,
num_workers: int = 4,
) -> None:
"""Save images from a specific episode and camera to disk for video encoding.
Args:
dataset: The LeRobot dataset to extract images from
imgs_dir: Directory to save images to
img_key: The image key (camera) to extract
episode_index: Index of the episode to save
num_workers: Number of threads for parallel image saving
"""
# Create directory
imgs_dir.mkdir(parents=True, exist_ok=True)
# Get dataset without torch format for PIL image access
hf_dataset = dataset.hf_dataset.with_format(None)
# Select only this camera's images
imgs_dataset = hf_dataset.select_columns(img_key)
# Get episode start and end indices
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]
# Get all items for this episode
episode_dataset = imgs_dataset.select(range(from_idx, to_idx))
# Define function to save a single image
def save_single_image(i_item_tuple):
i, item = i_item_tuple
img = item[img_key]
# Use frame-XXXXXX.png format to match encode_video_frames expectations
img.save(str(imgs_dir / f"frame-{i:06d}.png"), quality=100)
return i
# Save images with proper naming convention for encode_video_frames (frame-XXXXXX.png)
items = list(enumerate(episode_dataset))
with ThreadPoolExecutor(max_workers=num_workers) as executor:
futures = [executor.submit(save_single_image, item) for item in items]
for future in as_completed(futures):
future.result() # This will raise any exceptions that occurred
def _save_batch_episodes_images(
dataset: LeRobotDataset,
imgs_dir: Path,
img_key: str,
episode_indices: list[int],
num_workers: int = 4,
) -> list[float]:
"""Save images from multiple episodes to disk for batch video encoding.
Args:
dataset: The LeRobot dataset to extract images from
imgs_dir: Directory to save images to
img_key: The image key (camera) to extract
episode_indices: List of episode indices to save
num_workers: Number of threads for parallel image saving
Returns:
List of episode durations in seconds
"""
imgs_dir.mkdir(parents=True, exist_ok=True)
hf_dataset = dataset.hf_dataset.with_format(None)
imgs_dataset = hf_dataset.select_columns(img_key)
# Define function to save a single image with global frame index
# Defined once outside the loop to avoid repeated closure creation
def save_single_image(i_item_tuple, base_frame_idx, img_key_param):
i, item = i_item_tuple
img = item[img_key_param]
# Use global frame index for naming
img.save(str(imgs_dir / f"frame-{base_frame_idx + i:06d}.png"), quality=100)
return i
episode_durations = []
frame_idx = 0
for ep_idx in episode_indices:
# Get episode range
from_idx = dataset.meta.episodes["dataset_from_index"][ep_idx]
to_idx = dataset.meta.episodes["dataset_to_index"][ep_idx]
episode_length = to_idx - from_idx
episode_durations.append(episode_length / dataset.fps)
# Get episode images
episode_dataset = imgs_dataset.select(range(from_idx, to_idx))
# Save images
items = list(enumerate(episode_dataset))
with ThreadPoolExecutor(max_workers=num_workers) as executor:
futures = [executor.submit(save_single_image, item, frame_idx, img_key) for item in items]
for future in as_completed(futures):
future.result()
frame_idx += episode_length
return episode_durations
def _iter_episode_batches(
episode_indices: list[int],
episode_lengths: dict[int, int],
size_per_frame_mb: float,
video_file_size_limit: float,
max_episodes: int | None,
max_frames: int | None,
):
"""Generator that yields batches of episode indices for video encoding.
Groups episodes into batches that respect size and memory constraints:
- Stays under video file size limit
- Respects maximum episodes per batch (if specified)
- Respects maximum frames per batch (if specified)
Args:
episode_indices: List of episode indices to batch
episode_lengths: Dictionary mapping episode index to episode length
size_per_frame_mb: Estimated size per frame in MB
video_file_size_limit: Maximum video file size in MB
max_episodes: Maximum number of episodes per batch (None = no limit)
max_frames: Maximum number of frames per batch (None = no limit)
Yields:
List of episode indices for each batch
"""
batch_episodes = []
estimated_size = 0.0
total_frames = 0
for ep_idx in episode_indices:
ep_length = episode_lengths[ep_idx]
ep_estimated_size = ep_length * size_per_frame_mb
# we check if adding this episode would exceed any constraint
would_exceed_size = estimated_size > 0 and estimated_size + ep_estimated_size >= video_file_size_limit
would_exceed_episodes = max_episodes is not None and len(batch_episodes) >= max_episodes
would_exceed_frames = max_frames is not None and total_frames + ep_length > max_frames
if batch_episodes and (would_exceed_size or would_exceed_episodes or would_exceed_frames):
# yield current batch before adding this episode
yield batch_episodes
# start a new batch with current episode
batch_episodes = [ep_idx]
estimated_size = ep_estimated_size
total_frames = ep_length
else:
# add to current batch
batch_episodes.append(ep_idx)
estimated_size += ep_estimated_size
total_frames += ep_length
# yield final batch if not empty
if batch_episodes:
yield batch_episodes
def _estimate_frame_size_via_calibration(
dataset: LeRobotDataset,
img_key: str,
episode_indices: list[int],
temp_dir: Path,
fps: int,
vcodec: str,
pix_fmt: str,
g: int,
crf: int,
fast_decode: int,
num_calibration_frames: int = 30,
) -> float:
"""Estimate MB per frame by encoding a small calibration sample.
Encodes a representative sample of frames using the exact codec parameters
to measure actual compression ratio, which is more accurate than heuristics.
Args:
dataset: Source dataset with images.
img_key: Image key to calibrate (e.g., "observation.images.top").
episode_indices: List of episode indices being processed.
temp_dir: Temporary directory for calibration files.
fps: Frames per second for video encoding.
vcodec: Video codec (libsvtav1, h264, hevc).
pix_fmt: Pixel format (yuv420p, etc.).
g: GOP size (group of pictures).
crf: Constant Rate Factor (quality).
fast_decode: Fast decode tuning parameter.
num_calibration_frames: Number of frames to use for calibration (default: 30).
Returns:
Estimated size in MB per frame based on actual encoding.
"""
calibration_dir = temp_dir / "calibration" / img_key
calibration_dir.mkdir(parents=True, exist_ok=True)
try:
# Select a representative episode (prefer middle episode if available)
calibration_ep_idx = episode_indices[len(episode_indices) // 2]
# Get episode range
from_idx = dataset.meta.episodes["dataset_from_index"][calibration_ep_idx]
to_idx = dataset.meta.episodes["dataset_to_index"][calibration_ep_idx]
episode_length = to_idx - from_idx
# Use up to num_calibration_frames from this episode
num_frames = min(num_calibration_frames, episode_length)
# Get frames from dataset
hf_dataset = dataset.hf_dataset.with_format(None)
sample_indices = range(from_idx, from_idx + num_frames)
# Save calibration frames
for i, idx in enumerate(sample_indices):
img = hf_dataset[idx][img_key]
img.save(str(calibration_dir / f"frame-{i:06d}.png"), quality=100)
# Encode calibration video
calibration_video_path = calibration_dir / "calibration.mp4"
encode_video_frames(
imgs_dir=calibration_dir,
video_path=calibration_video_path,
fps=fps,
vcodec=vcodec,
pix_fmt=pix_fmt,
g=g,
crf=crf,
fast_decode=fast_decode,
overwrite=True,
)
# Measure actual compressed size
video_size_bytes = calibration_video_path.stat().st_size
video_size_mb = video_size_bytes / BYTES_PER_MIB
size_per_frame_mb = video_size_mb / num_frames
logging.info(
f" Calibration: {num_frames} frames -> {video_size_mb:.2f} MB "
f"= {size_per_frame_mb:.4f} MB/frame for {img_key}"
)
return size_per_frame_mb
finally:
# Clean up calibration files
if calibration_dir.exists():
shutil.rmtree(calibration_dir)
def _copy_data_without_images(
src_dataset: LeRobotDataset,
dst_meta: LeRobotDatasetMetadata,
episode_indices: list[int],
img_keys: list[str],
) -> None:
"""Copy data files without image columns.
Args:
src_dataset: Source dataset
dst_meta: Destination metadata
episode_indices: Episodes to include
img_keys: Image keys to remove
"""
from lerobot.datasets.utils import DATA_DIR
data_dir = src_dataset.root / DATA_DIR
parquet_files = sorted(data_dir.glob("*/*.parquet"))
if not parquet_files:
raise ValueError(f"No parquet files found in {data_dir}")
episode_set = set(episode_indices)
for src_path in tqdm(parquet_files, desc="Processing data files"):
df = pd.read_parquet(src_path).reset_index(drop=True)
# Filter to only include selected episodes
df = df[df["episode_index"].isin(episode_set)].copy()
if len(df) == 0:
continue
# Remove image columns
columns_to_drop = [col for col in img_keys if col in df.columns]
if columns_to_drop:
df = df.drop(columns=columns_to_drop)
# Get chunk and file indices from path
relative_path = src_path.relative_to(src_dataset.root)
chunk_dir = relative_path.parts[1]
file_name = relative_path.parts[2]
chunk_idx = int(chunk_dir.split("-")[1])
file_idx = int(file_name.split("-")[1].split(".")[0])
# Write to destination without pandas index
dst_path = dst_meta.root / f"data/chunk-{chunk_idx:03d}/file-{file_idx:03d}.parquet"
dst_path.parent.mkdir(parents=True, exist_ok=True)
df.to_parquet(dst_path, index=False)
# Video conversion constants
BYTES_PER_KIB = 1024
BYTES_PER_MIB = BYTES_PER_KIB * BYTES_PER_KIB
def convert_image_to_video_dataset(
dataset: LeRobotDataset,
output_dir: Path,
repo_id: str | None = None,
vcodec: str = "libsvtav1",
pix_fmt: str = "yuv420p",
g: int = 2,
crf: int = 30,
fast_decode: int = 0,
episode_indices: list[int] | None = None,
num_workers: int = 4,
max_episodes_per_batch: int | None = None,
max_frames_per_batch: int | None = None,
) -> LeRobotDataset:
"""Convert image-to-video dataset.
Creates a new LeRobotDataset with images encoded as videos, following the proper
LeRobot dataset structure with videos stored in chunked MP4 files.
Args:
dataset: The source LeRobot dataset with images
output_dir: Directory to save the new video dataset
repo_id: Repository ID for the new dataset (default: original_id + "_video")
vcodec: Video codec (default: libsvtav1)
pix_fmt: Pixel format (default: yuv420p)
g: Group of pictures size (default: 2)
crf: Constant rate factor (default: 30)
fast_decode: Fast decode tuning (default: 0)
episode_indices: List of episode indices to convert (None = all episodes)
num_workers: Number of threads for parallel processing (default: 4)
max_episodes_per_batch: Maximum episodes per video batch to avoid memory issues (None = no limit)
max_frames_per_batch: Maximum frames per video batch to avoid memory issues (None = no limit)
Returns:
New LeRobotDataset with images encoded as videos
"""
# Check that it's an image dataset
if len(dataset.meta.video_keys) > 0:
raise ValueError(
f"This operation is for image datasets only. Video dataset provided: {dataset.repo_id}"
)
# Get all image keys
hf_dataset = dataset.hf_dataset.with_format(None)
img_keys = [key for key in hf_dataset.features if key.startswith(OBS_IMAGE)]
if len(img_keys) == 0:
raise ValueError(f"No image keys found in dataset {dataset.repo_id}")
# Determine which episodes to process
if episode_indices is None:
episode_indices = list(range(dataset.meta.total_episodes))
if repo_id is None:
repo_id = f"{dataset.repo_id}_video"
logging.info(
f"Converting {len(episode_indices)} episodes with {len(img_keys)} cameras from {dataset.repo_id}"
)
logging.info(f"Video codec: {vcodec}, pixel format: {pix_fmt}, GOP: {g}, CRF: {crf}")
# Create new features dict, converting image features to video features
new_features = {}
for key, value in dataset.meta.features.items():
if key not in img_keys:
new_features[key] = value
else:
# Convert image key to video format
new_features[key] = value.copy()
new_features[key]["dtype"] = "video" # Change dtype from "image" to "video"
# Video info will be updated after episodes are encoded
# Create new metadata for video dataset
new_meta = LeRobotDatasetMetadata.create(
repo_id=repo_id,
fps=dataset.meta.fps,
features=new_features,
robot_type=dataset.meta.robot_type,
root=output_dir,
use_videos=True,
chunks_size=dataset.meta.chunks_size,
data_files_size_in_mb=dataset.meta.data_files_size_in_mb,
video_files_size_in_mb=dataset.meta.video_files_size_in_mb,
)
# Create temporary directory for image extraction
temp_dir = output_dir / "temp_images"
temp_dir.mkdir(parents=True, exist_ok=True)
# Process all episodes and batch encode videos
# Use dictionary for O(1) episode metadata lookups instead of O(n) linear search
all_episode_metadata = {}
fps = int(dataset.fps)
try:
# Build episode metadata entries first
logging.info("Building episode metadata...")
cumulative_frame_idx = 0
for ep_idx in episode_indices:
src_episode = dataset.meta.episodes[ep_idx]
ep_length = src_episode["length"]
ep_meta = {
"episode_index": ep_idx,
"length": ep_length,
"dataset_from_index": cumulative_frame_idx,
"dataset_to_index": cumulative_frame_idx + ep_length,
}
if "data/chunk_index" in src_episode:
ep_meta["data/chunk_index"] = src_episode["data/chunk_index"]
ep_meta["data/file_index"] = src_episode["data/file_index"]
all_episode_metadata[ep_idx] = ep_meta
cumulative_frame_idx += ep_length
# Process each camera and batch encode multiple episodes together
video_file_size_limit = new_meta.video_files_size_in_mb
# Pre-compute episode lengths for batching
episode_lengths = {ep_idx: dataset.meta.episodes["length"][ep_idx] for ep_idx in episode_indices}
for img_key in tqdm(img_keys, desc="Processing cameras"):
# Estimate size per frame by encoding a small calibration sample
# This provides accurate compression ratio for the specific codec parameters
size_per_frame_mb = _estimate_frame_size_via_calibration(
dataset=dataset,
img_key=img_key,
episode_indices=episode_indices,
temp_dir=temp_dir,
fps=fps,
vcodec=vcodec,
pix_fmt=pix_fmt,
g=g,
crf=crf,
fast_decode=fast_decode,
)
logging.info(f"Processing camera: {img_key}")
chunk_idx, file_idx = 0, 0
cumulative_timestamp = 0.0
# Process episodes in batches to stay under size limit
for batch_episodes in _iter_episode_batches(
episode_indices=episode_indices,
episode_lengths=episode_lengths,
size_per_frame_mb=size_per_frame_mb,
video_file_size_limit=video_file_size_limit,
max_episodes=max_episodes_per_batch,
max_frames=max_frames_per_batch,
):
total_frames_in_batch = sum(episode_lengths[idx] for idx in batch_episodes)
logging.info(
f" Encoding batch of {len(batch_episodes)} episodes "
f"({batch_episodes[0]}-{batch_episodes[-1]}) = {total_frames_in_batch} frames"
)
# Save images for all episodes in this batch
imgs_dir = temp_dir / f"batch_{chunk_idx}_{file_idx}" / img_key
episode_durations = _save_batch_episodes_images(
dataset=dataset,
imgs_dir=imgs_dir,
img_key=img_key,
episode_indices=batch_episodes,
num_workers=num_workers,
)
# Encode all batched episodes into single video
video_path = new_meta.root / new_meta.video_path.format(
video_key=img_key, chunk_index=chunk_idx, file_index=file_idx
)
video_path.parent.mkdir(parents=True, exist_ok=True)
encode_video_frames(
imgs_dir=imgs_dir,
video_path=video_path,
fps=fps,
vcodec=vcodec,
pix_fmt=pix_fmt,
g=g,
crf=crf,
fast_decode=fast_decode,
overwrite=True,
)
# Clean up temporary images
shutil.rmtree(imgs_dir)
# Update metadata for each episode in the batch
for ep_idx, duration in zip(batch_episodes, episode_durations, strict=True):
from_timestamp = cumulative_timestamp
to_timestamp = cumulative_timestamp + duration
cumulative_timestamp = to_timestamp
# Find episode metadata entry and add video metadata (O(1) dictionary lookup)
ep_meta = all_episode_metadata[ep_idx]
ep_meta[f"videos/{img_key}/chunk_index"] = chunk_idx
ep_meta[f"videos/{img_key}/file_index"] = file_idx
ep_meta[f"videos/{img_key}/from_timestamp"] = from_timestamp
ep_meta[f"videos/{img_key}/to_timestamp"] = to_timestamp
# Move to next video file for next batch
chunk_idx, file_idx = update_chunk_file_indices(chunk_idx, file_idx, new_meta.chunks_size)
cumulative_timestamp = 0.0
# Copy and transform data files (removing image columns)
_copy_data_without_images(dataset, new_meta, episode_indices, img_keys)
# Save episode metadata
episodes_df = pd.DataFrame(list(all_episode_metadata.values()))
episodes_path = new_meta.root / "meta" / "episodes" / "chunk-000" / "file-000.parquet"
episodes_path.parent.mkdir(parents=True, exist_ok=True)
episodes_df.to_parquet(episodes_path, index=False)
# Update metadata info
new_meta.info["total_episodes"] = len(episode_indices)
new_meta.info["total_frames"] = sum(ep["length"] for ep in all_episode_metadata.values())
new_meta.info["total_tasks"] = dataset.meta.total_tasks
new_meta.info["splits"] = {"train": f"0:{len(episode_indices)}"}
# Update video info for all image keys (now videos)
# We need to manually set video info since update_video_info() checks video_keys first
for img_key in img_keys:
if not new_meta.features[img_key].get("info", None):
video_path = new_meta.root / new_meta.video_path.format(
video_key=img_key, chunk_index=0, file_index=0
)
new_meta.info["features"][img_key]["info"] = get_video_info(video_path)
write_info(new_meta.info, new_meta.root)
# Copy stats and tasks
if dataset.meta.stats is not None:
# Remove image stats
new_stats = {k: v for k, v in dataset.meta.stats.items() if k not in img_keys}
write_stats(new_stats, new_meta.root)
if dataset.meta.tasks is not None:
write_tasks(dataset.meta.tasks, new_meta.root)
finally:
# Clean up temporary directory
if temp_dir.exists():
shutil.rmtree(temp_dir)
logging.info(f"Completed converting {dataset.repo_id} to video format")
logging.info(f"New dataset saved to: {output_dir}")
# Return new dataset
return LeRobotDataset(repo_id=repo_id, root=output_dir)
+38 -8
View File
@@ -78,6 +78,7 @@ from lerobot.datasets.video_utils import (
from lerobot.utils.constants import HF_LEROBOT_HOME
CODEBASE_VERSION = "v3.0"
VALID_VIDEO_CODECS = {"h264", "hevc", "libsvtav1"}
class LeRobotDatasetMetadata:
@@ -540,11 +541,13 @@ class LeRobotDatasetMetadata:
return obj
def _encode_video_worker(video_key: str, episode_index: int, root: Path, fps: int) -> Path:
def _encode_video_worker(
video_key: str, episode_index: int, root: Path, fps: int, vcodec: str = "libsvtav1"
) -> Path:
temp_path = Path(tempfile.mkdtemp(dir=root)) / f"{video_key}_{episode_index:03d}.mp4"
fpath = DEFAULT_IMAGE_PATH.format(image_key=video_key, episode_index=episode_index, frame_index=0)
img_dir = (root / fpath).parent
encode_video_frames(img_dir, temp_path, fps, overwrite=True)
encode_video_frames(img_dir, temp_path, fps, vcodec=vcodec, overwrite=True)
shutil.rmtree(img_dir)
return temp_path
@@ -563,6 +566,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
download_videos: bool = True,
video_backend: str | None = None,
batch_encoding_size: int = 1,
vcodec: str = "libsvtav1",
):
"""
2 modes are available for instantiating this class, depending on 2 different use cases:
@@ -675,8 +679,13 @@ class LeRobotDataset(torch.utils.data.Dataset):
You can also use the 'pyav' decoder used by Torchvision, which used to be the default option, or 'video_reader' which is another decoder of Torchvision.
batch_encoding_size (int, optional): Number of episodes to accumulate before batch encoding videos.
Set to 1 for immediate encoding (default), or higher for batched encoding. Defaults to 1.
vcodec (str, optional): Video codec for encoding videos during recording. Options: 'h264', 'hevc',
'libsvtav1'. Defaults to 'libsvtav1'. Use 'h264' for faster encoding on systems where AV1
encoding is CPU-heavy.
"""
super().__init__()
if vcodec not in VALID_VIDEO_CODECS:
raise ValueError(f"Invalid vcodec '{vcodec}'. Must be one of: {sorted(VALID_VIDEO_CODECS)}")
self.repo_id = repo_id
self.root = Path(root) if root else HF_LEROBOT_HOME / repo_id
self.image_transforms = image_transforms
@@ -688,6 +697,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
self.delta_indices = None
self.batch_encoding_size = batch_encoding_size
self.episodes_since_last_encoding = 0
self.vcodec = vcodec
# Unused attributes
self.image_writer = None
@@ -925,17 +935,30 @@ class LeRobotDataset(torch.utils.data.Dataset):
else:
return get_hf_features_from_features(self.features)
def _get_query_indices(self, idx: int, ep_idx: int) -> tuple[dict[str, list[int | bool]]]:
def _get_query_indices(
self, abs_idx: int, ep_idx: int
) -> tuple[dict[str, list[int]], dict[str, torch.Tensor]]:
"""Compute query indices for delta timestamps.
Args:
abs_idx: The absolute index in the full dataset (not the relative index in filtered episodes).
ep_idx: The episode index.
Returns:
A tuple of (query_indices, padding) where:
- query_indices: Dict mapping keys to lists of absolute indices to query
- padding: Dict mapping "{key}_is_pad" to boolean tensors indicating padded positions
"""
ep = self.meta.episodes[ep_idx]
ep_start = ep["dataset_from_index"]
ep_end = ep["dataset_to_index"]
query_indices = {
key: [max(ep_start, min(ep_end - 1, idx + delta)) for delta in delta_idx]
key: [max(ep_start, min(ep_end - 1, abs_idx + delta)) for delta in delta_idx]
for key, delta_idx in self.delta_indices.items()
}
padding = { # Pad values outside of current episode range
f"{key}_is_pad": torch.BoolTensor(
[(idx + delta < ep_start) | (idx + delta >= ep_end) for delta in delta_idx]
[(abs_idx + delta < ep_start) | (abs_idx + delta >= ep_end) for delta in delta_idx]
)
for key, delta_idx in self.delta_indices.items()
}
@@ -1027,10 +1050,12 @@ class LeRobotDataset(torch.utils.data.Dataset):
self._ensure_hf_dataset_loaded()
item = self.hf_dataset[idx]
ep_idx = item["episode_index"].item()
# Use the absolute index from the dataset for delta timestamp calculations
abs_idx = item["index"].item()
query_indices = None
if self.delta_indices is not None:
query_indices, padding = self._get_query_indices(idx, ep_idx)
query_indices, padding = self._get_query_indices(abs_idx, ep_idx)
query_result = self._query_hf_dataset(query_indices)
item = {**item, **padding}
for key, val in query_result.items():
@@ -1211,6 +1236,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
episode_index,
self.root,
self.fps,
self.vcodec,
): video_key
for video_key in self.meta.video_keys
}
@@ -1487,7 +1513,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
episode_index = self.episode_buffer["episode_index"]
if isinstance(episode_index, np.ndarray):
episode_index = episode_index.item() if episode_index.size == 1 else episode_index[0]
for cam_key in self.meta.camera_keys:
for cam_key in self.meta.image_keys:
img_dir = self._get_image_file_dir(episode_index, cam_key)
if img_dir.is_dir():
shutil.rmtree(img_dir)
@@ -1526,7 +1552,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
Note: `encode_video_frames` is a blocking call. Making it asynchronous shouldn't speedup encoding,
since video encoding with ffmpeg is already using multithreading.
"""
return _encode_video_worker(video_key, episode_index, self.root, self.fps)
return _encode_video_worker(video_key, episode_index, self.root, self.fps, self.vcodec)
@classmethod
def create(
@@ -1542,8 +1568,11 @@ class LeRobotDataset(torch.utils.data.Dataset):
image_writer_threads: int = 0,
video_backend: str | None = None,
batch_encoding_size: int = 1,
vcodec: str = "libsvtav1",
) -> "LeRobotDataset":
"""Create a LeRobot Dataset from scratch in order to record data."""
if vcodec not in VALID_VIDEO_CODECS:
raise ValueError(f"Invalid vcodec '{vcodec}'. Must be one of: {sorted(VALID_VIDEO_CODECS)}")
obj = cls.__new__(cls)
obj.meta = LeRobotDatasetMetadata.create(
repo_id=repo_id,
@@ -1560,6 +1589,7 @@ class LeRobotDataset(torch.utils.data.Dataset):
obj.image_writer = None
obj.batch_encoding_size = batch_encoding_size
obj.episodes_since_last_encoding = 0
obj.vcodec = vcodec
if image_writer_processes or image_writer_threads:
obj.start_image_writer(image_writer_processes, image_writer_threads)
+2 -2
View File
@@ -18,12 +18,12 @@ from typing import Any
from lerobot.configs.types import PipelineFeatureType
from lerobot.datasets.utils import hw_to_dataset_features
from lerobot.processor import DataProcessorPipeline
from lerobot.processor import DataProcessorPipeline, RobotAction, RobotObservation
from lerobot.utils.constants import ACTION, OBS_IMAGES, OBS_STATE, OBS_STR
def create_initial_features(
action: dict[str, Any] | None = None, observation: dict[str, Any] | None = None
action: RobotAction | None = None, observation: RobotObservation | None = None
) -> dict[PipelineFeatureType, dict[str, Any]]:
"""
Creates the initial features dict for the dataset from action and observation specs.
+11 -2
View File
@@ -1172,12 +1172,21 @@ def validate_episode_buffer(episode_buffer: dict, total_episodes: int, features:
)
def to_parquet_with_hf_images(df: pandas.DataFrame, path: Path) -> None:
def to_parquet_with_hf_images(
df: pandas.DataFrame, path: Path, features: datasets.Features | None = None
) -> None:
"""This function correctly writes to parquet a panda DataFrame that contains images encoded by HF dataset.
This way, it can be loaded by HF dataset and correctly formatted images are returned.
Args:
df: DataFrame to write to parquet.
path: Path to write the parquet file.
features: Optional HuggingFace Features schema. If provided, ensures image columns
are properly typed as Image() in the parquet schema.
"""
# TODO(qlhoest): replace this weird synthax by `df.to_parquet(path)` only
datasets.Dataset.from_dict(df.to_dict(orient="list")).to_parquet(path)
ds = datasets.Dataset.from_dict(df.to_dict(orient="list"), features=features)
ds.to_parquet(path)
def item_to_torch(item: dict) -> dict:
+1 -1
View File
@@ -12,4 +12,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .configs import AlohaEnv, EnvConfig, PushtEnv # noqa: F401
from .configs import AlohaEnv, EnvConfig, HubEnvConfig, PushtEnv # noqa: F401
+90 -5
View File
@@ -13,7 +13,7 @@
# limitations under the License.
import abc
from dataclasses import dataclass, field
from dataclasses import dataclass, field, fields
from typing import Any
import draccus
@@ -68,6 +68,22 @@ class EnvConfig(draccus.ChoiceRegistry, abc.ABC):
raise NotImplementedError()
@dataclass
class HubEnvConfig(EnvConfig):
"""Base class for environments that delegate creation to a hub-hosted make_env.
Hub environments download and execute remote code from the HF Hub.
The hub_path points to a repository containing an env.py with a make_env function.
"""
hub_path: str | None = None # required: e.g., "username/repo" or "username/repo@branch:file.py"
@property
def gym_kwargs(self) -> dict:
# Not used for hub environments - the hub's make_env handles everything
return {}
@EnvConfig.register_subclass("aloha")
@dataclass
class AlohaEnv(EnvConfig):
@@ -244,6 +260,7 @@ class HILSerlRobotEnvConfig(EnvConfig):
@dataclass
class LiberoEnv(EnvConfig):
task: str = "libero_10" # can also choose libero_spatial, libero_object, etc.
task_ids: list[int] | None = None
fps: int = 30
episode_length: int | None = None
obs_type: str = "pixels_agent_pos"
@@ -322,10 +339,10 @@ class LiberoEnv(EnvConfig):
@property
def gym_kwargs(self) -> dict:
return {
"obs_type": self.obs_type,
"render_mode": self.render_mode,
}
kwargs: dict[str, Any] = {"obs_type": self.obs_type, "render_mode": self.render_mode}
if self.task_ids is not None:
kwargs["task_ids"] = self.task_ids
return kwargs
@EnvConfig.register_subclass("metaworld")
@@ -368,3 +385,71 @@ class MetaworldEnv(EnvConfig):
"obs_type": self.obs_type,
"render_mode": self.render_mode,
}
@EnvConfig.register_subclass("isaaclab_arena")
@dataclass
class IsaaclabArenaEnv(HubEnvConfig):
hub_path: str = "nvidia/isaaclab-arena-envs"
episode_length: int = 300
num_envs: int = 1
embodiment: str | None = "gr1_pink"
object: str | None = "power_drill"
mimic: bool = False
teleop_device: str | None = None
seed: int | None = 42
device: str | None = "cuda:0"
disable_fabric: bool = False
enable_cameras: bool = False
headless: bool = False
enable_pinocchio: bool = True
environment: str | None = "gr1_microwave"
task: str | None = "Reach out to the microwave and open it."
state_dim: int = 54
action_dim: int = 36
camera_height: int = 512
camera_width: int = 512
video: bool = False
video_length: int = 100
video_interval: int = 200
# Comma-separated keys, e.g., "robot_joint_pos,left_eef_pos"
state_keys: str = "robot_joint_pos"
# Comma-separated keys, e.g., "robot_pov_cam_rgb,front_cam_rgb"
# Set to None or "" for environments without cameras
camera_keys: str | None = None
features: dict[str, PolicyFeature] = field(default_factory=dict)
features_map: dict[str, str] = field(default_factory=dict)
kwargs: dict | None = None
def __post_init__(self):
if self.kwargs:
# dynamically convert kwargs to fields in the dataclass
# NOTE! the new fields will not bee seen by the dataclass repr
field_names = {f.name for f in fields(self)}
for key, value in self.kwargs.items():
if key not in field_names and key != "kwargs":
setattr(self, key, value)
self.kwargs = None
# Set action feature
self.features[ACTION] = PolicyFeature(type=FeatureType.ACTION, shape=(self.action_dim,))
self.features_map[ACTION] = ACTION
# Set state feature
self.features[OBS_STATE] = PolicyFeature(type=FeatureType.STATE, shape=(self.state_dim,))
self.features_map[OBS_STATE] = OBS_STATE
# Add camera features for each camera key
if self.enable_cameras and self.camera_keys:
for cam_key in self.camera_keys.split(","):
cam_key = cam_key.strip()
if cam_key:
self.features[cam_key] = PolicyFeature(
type=FeatureType.VISUAL,
shape=(self.camera_height, self.camera_width, 3),
)
self.features_map[cam_key] = f"{OBS_IMAGES}.{cam_key}"
@property
def gym_kwargs(self) -> dict:
return {}
+40 -5
View File
@@ -20,11 +20,11 @@ import gymnasium as gym
from gymnasium.envs.registration import registry as gym_registry
from lerobot.configs.policies import PreTrainedConfig
from lerobot.envs.configs import AlohaEnv, EnvConfig, LiberoEnv, PushtEnv
from lerobot.envs.configs import AlohaEnv, EnvConfig, HubEnvConfig, IsaaclabArenaEnv, LiberoEnv, PushtEnv
from lerobot.envs.utils import _call_make_env, _download_hub_file, _import_hub_module, _normalize_hub_result
from lerobot.policies.xvla.configuration_xvla import XVLAConfig
from lerobot.processor import ProcessorStep
from lerobot.processor.env_processor import LiberoProcessorStep
from lerobot.processor.env_processor import IsaaclabArenaProcessorStep, LiberoProcessorStep
from lerobot.processor.pipeline import PolicyProcessorPipeline
@@ -73,6 +73,26 @@ def make_env_pre_post_processors(
if isinstance(env_cfg, LiberoEnv) or "libero" in env_cfg.type:
preprocessor_steps.append(LiberoProcessorStep())
# For Isaaclab Arena environments, add the IsaaclabArenaProcessorStep
if isinstance(env_cfg, IsaaclabArenaEnv) or "isaaclab_arena" in env_cfg.type:
# Parse comma-separated keys (handle None for state-based policies)
if env_cfg.state_keys:
state_keys = tuple(k.strip() for k in env_cfg.state_keys.split(",") if k.strip())
else:
state_keys = ()
if env_cfg.camera_keys:
camera_keys = tuple(k.strip() for k in env_cfg.camera_keys.split(",") if k.strip())
else:
camera_keys = ()
if not state_keys and not camera_keys:
raise ValueError("At least one of state_keys or camera_keys must be specified.")
preprocessor_steps.append(
IsaaclabArenaProcessorStep(
state_keys=state_keys,
camera_keys=camera_keys,
)
)
preprocessor = PolicyProcessorPipeline(steps=preprocessor_steps)
postprocessor = PolicyProcessorPipeline(steps=postprocessor_steps)
@@ -98,7 +118,6 @@ def make_env(
hub_cache_dir (str | None): Optional cache path for downloaded hub files.
trust_remote_code (bool): **Explicit consent** to execute remote code from the Hub.
Default False — must be set to True to import/exec hub `env.py`.
Raises:
ValueError: if n_envs < 1
ModuleNotFoundError: If the requested env package is not installed
@@ -112,19 +131,35 @@ def make_env(
"""
# if user passed a hub id string (e.g., "username/repo", "username/repo@main:env.py")
# simplified: only support hub-provided `make_env`
# TODO: (jadechoghari): deprecate string API and remove this check
if isinstance(cfg, str):
hub_path: str | None = cfg
elif isinstance(cfg, HubEnvConfig):
hub_path = cfg.hub_path
else:
hub_path = None
# If hub_path is set, download and call hub-provided `make_env`
if hub_path:
# _download_hub_file will raise the same RuntimeError if trust_remote_code is False
repo_id, file_path, local_file, revision = _download_hub_file(cfg, trust_remote_code, hub_cache_dir)
repo_id, file_path, local_file, revision = _download_hub_file(
hub_path, trust_remote_code, hub_cache_dir
)
# import and surface clear import errors
module = _import_hub_module(local_file, repo_id)
# call the hub-provided make_env
raw_result = _call_make_env(module, n_envs=n_envs, use_async_envs=use_async_envs)
env_cfg = None if isinstance(cfg, str) else cfg
raw_result = _call_make_env(module, n_envs=n_envs, use_async_envs=use_async_envs, cfg=env_cfg)
# normalize the return into {suite: {task_id: vec_env}}
return _normalize_hub_result(raw_result)
# At this point, cfg must be an EnvConfig (not a string) since hub_path would have been set otherwise
if isinstance(cfg, str):
raise TypeError("cfg should be an EnvConfig at this point")
if n_envs < 1:
raise ValueError("`n_envs` must be at least 1")
+6 -4
View File
@@ -29,6 +29,8 @@ from gymnasium import spaces
from libero.libero import benchmark, get_libero_path
from libero.libero.envs import OffScreenRenderEnv
from lerobot.processor import RobotObservation
def _parse_camera_names(camera_name: str | Sequence[str]) -> list[str]:
"""Normalize camera_name into a non-empty list of strings."""
@@ -237,7 +239,7 @@ class LiberoEnv(gym.Env):
env.reset()
return env
def _format_raw_obs(self, raw_obs: dict[str, Any]) -> dict[str, Any]:
def _format_raw_obs(self, raw_obs: RobotObservation) -> RobotObservation:
images = {}
for camera_name in self.camera_name:
image = raw_obs[camera_name]
@@ -291,9 +293,9 @@ class LiberoEnv(gym.Env):
def reset(self, seed=None, **kwargs):
super().reset(seed=seed)
self._env.seed(seed)
if self.init_states and self._init_states is not None:
self._env.set_init_state(self._init_states[self._init_state_id])
raw_obs = self._env.reset()
if self.init_states and self._init_states is not None:
raw_obs = self._env.set_init_state(self._init_states[self._init_state_id])
# After reset, objects may be unstable (slightly floating, intersecting, etc.).
# Step the simulator with a no-op action for a few frames so everything settles.
@@ -313,7 +315,7 @@ class LiberoEnv(gym.Env):
info = {"is_success": False}
return observation, info
def step(self, action: np.ndarray) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
def step(self, action: np.ndarray) -> tuple[RobotObservation, float, bool, bool, dict[str, Any]]:
if action.ndim != 1:
raise ValueError(
f"Expected action to be 1-D (shape (action_dim,)), "
+7 -5
View File
@@ -25,6 +25,8 @@ import metaworld.policies as policies
import numpy as np
from gymnasium import spaces
from lerobot.processor import RobotObservation
# ---- Load configuration data from the external JSON file ----
CONFIG_PATH = Path(__file__).parent / "metaworld_config.json"
try:
@@ -161,7 +163,7 @@ class MetaworldEnv(gym.Env):
env._freeze_rand_vec = False # otherwise no randomization
return env
def _format_raw_obs(self, raw_obs: np.ndarray) -> dict[str, Any]:
def _format_raw_obs(self, raw_obs: np.ndarray) -> RobotObservation:
image = None
if self._env is not None:
image = self._env.render()
@@ -196,7 +198,7 @@ class MetaworldEnv(gym.Env):
self,
seed: int | None = None,
**kwargs,
) -> tuple[dict[str, Any], dict[str, Any]]:
) -> tuple[RobotObservation, dict[str, Any]]:
"""
Reset the environment to its initial state.
@@ -204,7 +206,7 @@ class MetaworldEnv(gym.Env):
seed (Optional[int]): Random seed for environment initialization.
Returns:
observation (Dict[str, Any]): The initial formatted observation.
observation (RobotObservation): The initial formatted observation.
info (Dict[str, Any]): Additional info about the reset state.
"""
super().reset(seed=seed)
@@ -216,7 +218,7 @@ class MetaworldEnv(gym.Env):
info = {"is_success": False}
return observation, info
def step(self, action: np.ndarray) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
def step(self, action: np.ndarray) -> tuple[RobotObservation, float, bool, bool, dict[str, Any]]:
"""
Perform one environment step.
@@ -224,7 +226,7 @@ class MetaworldEnv(gym.Env):
action (np.ndarray): The action to execute, must be 1-D with shape (action_dim,).
Returns:
observation (Dict[str, Any]): The formatted observation after the step.
observation (RobotObservation): The formatted observation after the step.
reward (float): The scalar reward for this step.
terminated (bool): Whether the episode terminated successfully.
truncated (bool): Whether the episode was truncated due to a time limit.
+18 -5
View File
@@ -29,6 +29,7 @@ from torch import Tensor
from lerobot.configs.types import FeatureType, PolicyFeature
from lerobot.envs.configs import EnvConfig
from lerobot.processor import RobotObservation
from lerobot.utils.constants import OBS_ENV_STATE, OBS_IMAGE, OBS_IMAGES, OBS_STATE, OBS_STR
from lerobot.utils.utils import get_channel_first_image_shape
@@ -46,7 +47,7 @@ def _convert_nested_dict(d):
def preprocess_observation(observations: dict[str, np.ndarray]) -> dict[str, Tensor]:
# TODO(aliberts, rcadene): refactor this to use features from the environment (no hardcoding)
# TODO(jadechoghari, imstevenpmwork): refactor this to use features from the environment (no hardcoding)
"""Convert environment observation to LeRobot format observation.
Args:
observation: Dictionary of observation batches from a Gym vector environment.
@@ -98,11 +99,19 @@ def preprocess_observation(observations: dict[str, np.ndarray]) -> dict[str, Ten
if "robot_state" in observations:
return_observations[f"{OBS_STR}.robot_state"] = _convert_nested_dict(observations["robot_state"])
# Handle IsaacLab Arena format: observations have 'policy' and 'camera_obs' keys
if "policy" in observations:
return_observations[f"{OBS_STR}.policy"] = observations["policy"]
if "camera_obs" in observations:
return_observations[f"{OBS_STR}.camera_obs"] = observations["camera_obs"]
return return_observations
def env_to_policy_features(env_cfg: EnvConfig) -> dict[str, PolicyFeature]:
# TODO(aliberts, rcadene): remove this hardcoding of keys and just use the nested keys as is
# TODO(jadechoghari, imstevenpmwork): remove this hardcoding of keys and just use the nested keys as is
# (need to also refactor preprocess_observation and externalize normalization from policies)
policy_features = {}
for key, ft in env_cfg.features.items():
@@ -144,7 +153,7 @@ def check_env_attributes_and_types(env: gym.vector.VectorEnv) -> None:
)
def add_envs_task(env: gym.vector.VectorEnv, observation: dict[str, Any]) -> dict[str, Any]:
def add_envs_task(env: gym.vector.VectorEnv, observation: RobotObservation) -> RobotObservation:
"""Adds task feature to the observation dict with respect to the first environment attribute."""
if hasattr(env.envs[0], "task_description"):
task_result = env.call("task_description")
@@ -302,7 +311,7 @@ def _import_hub_module(local_file: str, repo_id: str) -> Any:
return module
def _call_make_env(module: Any, n_envs: int, use_async_envs: bool) -> Any:
def _call_make_env(module: Any, n_envs: int, use_async_envs: bool, cfg: EnvConfig | None) -> Any:
"""
Ensure module exposes make_env and call it.
"""
@@ -311,7 +320,11 @@ def _call_make_env(module: Any, n_envs: int, use_async_envs: bool) -> Any:
f"The hub module {getattr(module, '__name__', 'hub_module')} must expose `make_env(n_envs=int, use_async_envs=bool)`."
)
entry_fn = module.make_env
return entry_fn(n_envs=n_envs, use_async_envs=use_async_envs)
# Only pass cfg if it's not None (i.e., when an EnvConfig was provided, not a string hub ID)
if cfg is not None:
return entry_fn(n_envs=n_envs, use_async_envs=use_async_envs, cfg=cfg)
else:
return entry_fn(n_envs=n_envs, use_async_envs=use_async_envs)
def _normalize_hub_result(result: Any) -> dict[str, dict[int, gym.vector.VectorEnv]]:
+5 -1
View File
@@ -14,4 +14,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .motors_bus import Motor, MotorCalibration, MotorNormMode, MotorsBus
from .motors_bus import (
Motor,
MotorCalibration,
MotorNormMode,
)
+1 -1
View File
@@ -18,7 +18,7 @@ from dataclasses import dataclass
os.environ["PYGAME_HIDE_SUPPORT_PROMPT"] = "1"
from lerobot.motors import MotorCalibration, MotorsBus
from .motors_bus import MotorCalibration, MotorsBus
BAR_LEN, BAR_THICKNESS = 450, 8
HANDLE_R = 10
@@ -14,5 +14,5 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .config_so100_leader import SO100LeaderConfig
from .so100_leader import SO100Leader
from .damiao import DamiaoMotorsBus
from .tables import *
+808
View File
@@ -0,0 +1,808 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Portions of this file are derived from DM_Control_Python by cmjang.
# Licensed under the MIT License; see `LICENSE` for the full text:
# https://github.com/cmjang/DM_Control_Python
import logging
import time
from contextlib import contextmanager
from copy import deepcopy
from functools import cached_property
from typing import TYPE_CHECKING, Any, TypedDict
from lerobot.utils.import_utils import _can_available
if TYPE_CHECKING or _can_available:
import can
else:
can.Message = object
can.interface = None
import numpy as np
from lerobot.utils.errors import DeviceAlreadyConnectedError, DeviceNotConnectedError
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import enter_pressed, move_cursor_up
from ..motors_bus import Motor, MotorCalibration, MotorsBusBase, NameOrID, Value
from .tables import (
AVAILABLE_BAUDRATES,
CAN_CMD_DISABLE,
CAN_CMD_ENABLE,
CAN_CMD_REFRESH,
CAN_CMD_SET_ZERO,
CAN_PARAM_ID,
DEFAULT_BAUDRATE,
DEFAULT_TIMEOUT_MS,
MIT_KD_RANGE,
MIT_KP_RANGE,
MOTOR_LIMIT_PARAMS,
MotorType,
)
logger = logging.getLogger(__name__)
LONG_TIMEOUT_SEC = 0.1
MEDIUM_TIMEOUT_SEC = 0.01
SHORT_TIMEOUT_SEC = 0.001
PRECISE_TIMEOUT_SEC = 0.0001
class MotorState(TypedDict):
position: float
velocity: float
torque: float
temp_mos: float
temp_rotor: float
class DamiaoMotorsBus(MotorsBusBase):
"""
The Damiao implementation for a MotorsBus using CAN bus communication.
This class uses python-can for CAN bus communication with Damiao motors.
For more info, see:
- python-can documentation: https://python-can.readthedocs.io/en/stable/
- Seedstudio documentation: https://wiki.seeedstudio.com/damiao_series/
- DM_Control_Python repo: https://github.com/cmjang/DM_Control_Python
"""
# CAN-specific settings
available_baudrates = deepcopy(AVAILABLE_BAUDRATES)
default_baudrate = DEFAULT_BAUDRATE
default_timeout = DEFAULT_TIMEOUT_MS
def __init__(
self,
port: str,
motors: dict[str, Motor],
calibration: dict[str, MotorCalibration] | None = None,
can_interface: str = "auto",
use_can_fd: bool = True,
bitrate: int = 1000000,
data_bitrate: int | None = 5000000,
):
"""
Initialize the Damiao motors bus.
Args:
port: CAN interface name (e.g., "can0" for Linux, "/dev/cu.usbmodem*" for macOS)
motors: Dictionary mapping motor names to Motor objects
calibration: Optional calibration data
can_interface: CAN interface type - "auto" (default), "socketcan" (Linux), or "slcan" (macOS/serial)
use_can_fd: Whether to use CAN FD mode (default: True for OpenArms)
bitrate: Nominal bitrate in bps (default: 1000000 = 1 Mbps)
data_bitrate: Data bitrate for CAN FD in bps (default: 5000000 = 5 Mbps), ignored if use_can_fd is False
"""
super().__init__(port, motors, calibration)
self.port = port
self.can_interface = can_interface
self.use_can_fd = use_can_fd
self.bitrate = bitrate
self.data_bitrate = data_bitrate
self.canbus: can.interface.Bus | None = None
self._is_connected = False
# Map motor names to CAN IDs
self._motor_can_ids: dict[str, int] = {}
self._recv_id_to_motor: dict[int, str] = {}
self._motor_types: dict[str, MotorType] = {}
for name, motor in self.motors.items():
if motor.motor_type_str is None:
raise ValueError(f"Motor '{name}' is missing required 'motor_type'")
self._motor_types[name] = getattr(MotorType, motor.motor_type_str.upper().replace("-", "_"))
# Map recv_id to motor name for filtering responses
if motor.recv_id is not None:
self._recv_id_to_motor[motor.recv_id] = name
# State cache for handling packet drops safely
self._last_known_states: dict[str, MotorState] = {
name: {
"position": 0.0,
"velocity": 0.0,
"torque": 0.0,
"temp_mos": 0.0,
"temp_rotor": 0.0,
}
for name in self.motors
}
# Dynamic gains storage
# Defaults: Kp=10.0 (Stiffness), Kd=0.5 (Damping)
self._gains: dict[str, dict[str, float]] = {name: {"kp": 10.0, "kd": 0.5} for name in self.motors}
@property
def is_connected(self) -> bool:
"""Check if the CAN bus is connected."""
return self._is_connected and self.canbus is not None
def connect(self, handshake: bool = True) -> None:
"""
Open the CAN bus and initialize communication.
Args:
handshake: If True, ping all motors to verify they're present
"""
if self.is_connected:
raise DeviceAlreadyConnectedError(
f"{self.__class__.__name__}('{self.port}') is already connected."
)
try:
# Auto-detect interface type based on port name
if self.can_interface == "auto":
if self.port.startswith("/dev/"):
self.can_interface = "slcan"
logger.info(f"Auto-detected slcan interface for port {self.port}")
else:
self.can_interface = "socketcan"
logger.info(f"Auto-detected socketcan interface for port {self.port}")
# Connect to CAN bus
kwargs = {
"channel": self.port,
"bitrate": self.bitrate,
"interface": self.can_interface,
}
if self.can_interface == "socketcan" and self.use_can_fd and self.data_bitrate is not None:
kwargs.update({"data_bitrate": self.data_bitrate, "fd": True})
logger.info(
f"Connected to {self.port} with CAN FD (bitrate={self.bitrate}, data_bitrate={self.data_bitrate})"
)
else:
logger.info(f"Connected to {self.port} with {self.can_interface} (bitrate={self.bitrate})")
self.canbus = can.interface.Bus(**kwargs)
self._is_connected = True
if handshake:
self._handshake()
logger.debug(f"{self.__class__.__name__} connected via {self.can_interface}.")
except Exception as e:
self._is_connected = False
raise ConnectionError(f"Failed to connect to CAN bus: {e}") from e
def _handshake(self) -> None:
"""
Verify all motors are present and populate initial state cache.
Raises ConnectionError if any motor fails to respond.
"""
logger.info("Starting handshake with motors...")
missing_motors = []
for motor_name in self.motors:
msg = self._refresh_motor(motor_name)
if msg is None:
missing_motors.append(motor_name)
else:
self._process_response(motor_name, msg)
time.sleep(MEDIUM_TIMEOUT_SEC)
if missing_motors:
raise ConnectionError(
f"Handshake failed. The following motors did not respond: {missing_motors}. "
"Check power (24V) and CAN wiring."
)
logger.info("Handshake successful. All motors ready.")
def disconnect(self, disable_torque: bool = True) -> None:
"""
Close the CAN bus connection.
Args:
disable_torque: If True, disable torque on all motors before disconnecting
"""
if not self.is_connected:
raise DeviceNotConnectedError(f"{self.__class__.__name__}('{self.port}') is not connected.")
if disable_torque:
try:
self.disable_torque()
except Exception as e:
logger.warning(f"Failed to disable torque during disconnect: {e}")
if self.canbus:
self.canbus.shutdown()
self.canbus = None
self._is_connected = False
logger.debug(f"{self.__class__.__name__} disconnected.")
def configure_motors(self) -> None:
"""Configure all motors with default settings."""
# Damiao motors don't require much configuration in MIT mode
# Just ensure they're enabled
for motor in self.motors:
self._send_simple_command(motor, CAN_CMD_ENABLE)
time.sleep(MEDIUM_TIMEOUT_SEC)
def _send_simple_command(self, motor: NameOrID, command_byte: int) -> None:
"""Helper to send simple 8-byte commands (Enable, Disable, Zero)."""
motor_id = self._get_motor_id(motor)
motor_name = self._get_motor_name(motor)
recv_id = self._get_motor_recv_id(motor)
data = [0xFF] * 7 + [command_byte]
msg = can.Message(arbitration_id=motor_id, data=data, is_extended_id=False)
self.canbus.send(msg)
if msg := self._recv_motor_response(expected_recv_id=recv_id):
self._process_response(motor_name, msg)
else:
logger.debug(f"No response from {motor_name} after command 0x{command_byte:02X}")
def enable_torque(self, motors: str | list[str] | None = None, num_retry: int = 0) -> None:
"""Enable torque on selected motors."""
target_motors = self._get_motors_list(motors)
for motor in target_motors:
for _ in range(num_retry + 1):
try:
self._send_simple_command(motor, CAN_CMD_ENABLE)
break
except Exception as e:
if _ == num_retry:
raise e
time.sleep(MEDIUM_TIMEOUT_SEC)
def disable_torque(self, motors: str | list[str] | None = None, num_retry: int = 0) -> None:
"""Disable torque on selected motors."""
target_motors = self._get_motors_list(motors)
for motor in target_motors:
for _ in range(num_retry + 1):
try:
self._send_simple_command(motor, CAN_CMD_DISABLE)
break
except Exception as e:
if _ == num_retry:
raise e
time.sleep(MEDIUM_TIMEOUT_SEC)
@contextmanager
def torque_disabled(self, motors: str | list[str] | None = None):
"""
Context manager that guarantees torque is re-enabled.
This helper is useful to temporarily disable torque when configuring motors.
"""
self.disable_torque(motors)
try:
yield
finally:
self.enable_torque(motors)
def set_zero_position(self, motors: str | list[str] | None = None) -> None:
"""Set current position as zero for selected motors."""
target_motors = self._get_motors_list(motors)
for motor in target_motors:
self._send_simple_command(motor, CAN_CMD_SET_ZERO)
time.sleep(MEDIUM_TIMEOUT_SEC)
def _refresh_motor(self, motor: NameOrID) -> can.Message | None:
"""Refresh motor status and return the response."""
motor_id = self._get_motor_id(motor)
recv_id = self._get_motor_recv_id(motor)
data = [motor_id & 0xFF, (motor_id >> 8) & 0xFF, CAN_CMD_REFRESH, 0, 0, 0, 0, 0]
msg = can.Message(arbitration_id=CAN_PARAM_ID, data=data, is_extended_id=False)
self.canbus.send(msg)
return self._recv_motor_response(expected_recv_id=recv_id)
def _recv_motor_response(
self, expected_recv_id: int | None = None, timeout: float = 0.001
) -> can.Message | None:
"""
Receive a response from a motor.
Args:
expected_recv_id: If provided, only return messages from this CAN ID
timeout: Timeout in seconds (default: 1ms for high-speed operation)
Returns:
CAN message if received, None otherwise
"""
try:
start_time = time.time()
messages_seen = []
while time.time() - start_time < timeout:
msg = self.canbus.recv(timeout=PRECISE_TIMEOUT_SEC)
if msg:
messages_seen.append(f"0x{msg.arbitration_id:02X}")
if expected_recv_id is None or msg.arbitration_id == expected_recv_id:
return msg
logger.debug(
f"Ignoring message from 0x{msg.arbitration_id:02X}, expected 0x{expected_recv_id:02X}"
)
if logger.isEnabledFor(logging.DEBUG):
if messages_seen:
logger.debug(
f"Received {len(messages_seen)} msgs from {set(messages_seen)}, expected 0x{expected_recv_id:02X}"
)
else:
logger.debug(f"No CAN messages received (expected 0x{expected_recv_id:02X})")
except Exception as e:
logger.debug(f"Failed to receive CAN message: {e}")
return None
def _recv_all_responses(
self, expected_recv_ids: list[int], timeout: float = 0.002
) -> dict[int, can.Message]:
"""
Efficiently receive responses from multiple motors at once.
Uses the OpenArms pattern: collect all available messages within timeout.
Args:
expected_recv_ids: List of CAN IDs we expect responses from
timeout: Total timeout in seconds (default: 2ms)
Returns:
Dictionary mapping recv_id to CAN message
"""
responses = {}
expected_set = set(expected_recv_ids)
start_time = time.time()
try:
while len(responses) < len(expected_recv_ids) and (time.time() - start_time) < timeout:
# 100us poll timeout
msg = self.canbus.recv(timeout=PRECISE_TIMEOUT_SEC)
if msg and msg.arbitration_id in expected_set:
responses[msg.arbitration_id] = msg
if len(responses) == len(expected_recv_ids):
break
except Exception as e:
logger.debug(f"Error receiving responses: {e}")
return responses
def _encode_mit_packet(
self,
motor_type: MotorType,
kp: float,
kd: float,
position_degrees: float,
velocity_deg_per_sec: float,
torque: float,
) -> list[int]:
"""Helper to encode control parameters into 8 bytes for MIT mode."""
# Convert degrees to radians
position_rad = np.radians(position_degrees)
velocity_rad_per_sec = np.radians(velocity_deg_per_sec)
# Get motor limits
pmax, vmax, tmax = MOTOR_LIMIT_PARAMS[motor_type]
# Encode parameters
kp_uint = self._float_to_uint(kp, *MIT_KP_RANGE, 12)
kd_uint = self._float_to_uint(kd, *MIT_KD_RANGE, 12)
q_uint = self._float_to_uint(position_rad, -pmax, pmax, 16)
dq_uint = self._float_to_uint(velocity_rad_per_sec, -vmax, vmax, 12)
tau_uint = self._float_to_uint(torque, -tmax, tmax, 12)
# Pack data
data = [0] * 8
data[0] = (q_uint >> 8) & 0xFF
data[1] = q_uint & 0xFF
data[2] = dq_uint >> 4
data[3] = ((dq_uint & 0xF) << 4) | ((kp_uint >> 8) & 0xF)
data[4] = kp_uint & 0xFF
data[5] = kd_uint >> 4
data[6] = ((kd_uint & 0xF) << 4) | ((tau_uint >> 8) & 0xF)
data[7] = tau_uint & 0xFF
return data
def _mit_control(
self,
motor: NameOrID,
kp: float,
kd: float,
position_degrees: float,
velocity_deg_per_sec: float,
torque: float,
) -> None:
"""Send MIT control command to a motor."""
motor_id = self._get_motor_id(motor)
motor_name = self._get_motor_name(motor)
motor_type = self._motor_types[motor_name]
data = self._encode_mit_packet(motor_type, kp, kd, position_degrees, velocity_deg_per_sec, torque)
msg = can.Message(arbitration_id=motor_id, data=data, is_extended_id=False)
self.canbus.send(msg)
recv_id = self._get_motor_recv_id(motor)
if msg := self._recv_motor_response(expected_recv_id=recv_id):
self._process_response(motor_name, msg)
else:
logger.debug(f"No response from {motor_name} after MIT control command")
def _mit_control_batch(
self,
commands: dict[NameOrID, tuple[float, float, float, float, float]],
) -> None:
"""
Send MIT control commands to multiple motors in batch.
Sends all commands first, then collects responses.
Args:
commands: Dict mapping motor name/ID to (kp, kd, position_deg, velocity_deg/s, torque)
Example: {'joint_1': (10.0, 0.5, 45.0, 0.0, 0.0), ...}
"""
if not commands:
return
recv_id_to_motor: dict[int, str] = {}
# Step 1: Send all MIT control commands
for motor, (kp, kd, position_degrees, velocity_deg_per_sec, torque) in commands.items():
motor_id = self._get_motor_id(motor)
motor_name = self._get_motor_name(motor)
motor_type = self._motor_types[motor_name]
data = self._encode_mit_packet(motor_type, kp, kd, position_degrees, velocity_deg_per_sec, torque)
msg = can.Message(arbitration_id=motor_id, data=data, is_extended_id=False)
self.canbus.send(msg)
recv_id_to_motor[self._get_motor_recv_id(motor)] = motor_name
# Step 2: Collect responses and update state cache
responses = self._recv_all_responses(list(recv_id_to_motor.keys()), timeout=SHORT_TIMEOUT_SEC)
for recv_id, motor_name in recv_id_to_motor.items():
if msg := responses.get(recv_id):
self._process_response(motor_name, msg)
def _float_to_uint(self, x: float, x_min: float, x_max: float, bits: int) -> int:
"""Convert float to unsigned integer for CAN transmission."""
x = max(x_min, min(x_max, x)) # Clamp to range
span = x_max - x_min
data_norm = (x - x_min) / span
return int(data_norm * ((1 << bits) - 1))
def _uint_to_float(self, x: int, x_min: float, x_max: float, bits: int) -> float:
"""Convert unsigned integer from CAN to float."""
span = x_max - x_min
data_norm = float(x) / ((1 << bits) - 1)
return data_norm * span + x_min
def _decode_motor_state(
self, data: bytearray | bytes, motor_type: MotorType
) -> tuple[float, float, float, int, int]:
"""
Decode motor state from CAN data.
Returns: (position_deg, velocity_deg_s, torque, temp_mos, temp_rotor)
"""
if len(data) < 8:
raise ValueError("Invalid motor state data")
# Extract encoded values
q_uint = (data[1] << 8) | data[2]
dq_uint = (data[3] << 4) | (data[4] >> 4)
tau_uint = ((data[4] & 0x0F) << 8) | data[5]
t_mos = data[6]
t_rotor = data[7]
# Get motor limits
pmax, vmax, tmax = MOTOR_LIMIT_PARAMS[motor_type]
# Decode to physical values
position_rad = self._uint_to_float(q_uint, -pmax, pmax, 16)
velocity_rad_per_sec = self._uint_to_float(dq_uint, -vmax, vmax, 12)
torque = self._uint_to_float(tau_uint, -tmax, tmax, 12)
return np.degrees(position_rad), np.degrees(velocity_rad_per_sec), torque, t_mos, t_rotor
def _process_response(self, motor: str, msg: can.Message) -> None:
"""Decode a message and update the motor state cache."""
try:
motor_type = self._motor_types[motor]
pos, vel, torque, t_mos, t_rotor = self._decode_motor_state(msg.data, motor_type)
self._last_known_states[motor] = {
"position": pos,
"velocity": vel,
"torque": torque,
"temp_mos": float(t_mos),
"temp_rotor": float(t_rotor),
}
except Exception as e:
logger.warning(f"Failed to decode response from {motor}: {e}")
def read(self, data_name: str, motor: str) -> Value:
"""Read a value from a single motor. Positions are always in degrees."""
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} is not connected.")
# Refresh motor to get latest state
msg = self._refresh_motor(motor)
if msg is None:
motor_id = self._get_motor_id(motor)
recv_id = self._get_motor_recv_id(motor)
raise ConnectionError(
f"No response from motor '{motor}' (send ID: 0x{motor_id:02X}, recv ID: 0x{recv_id:02X}). "
f"Check that: 1) Motor is powered (24V), 2) CAN wiring is correct, "
f"3) Motor IDs are configured correctly using Damiao Debugging Tools"
)
self._process_response(motor, msg)
return self._get_cached_value(motor, data_name)
def _get_cached_value(self, motor: str, data_name: str) -> Value:
"""Retrieve a specific value from the cache."""
state = self._last_known_states[motor]
mapping: dict[str, Any] = {
"Present_Position": state["position"],
"Present_Velocity": state["velocity"],
"Present_Torque": state["torque"],
"Temperature_MOS": state["temp_mos"],
"Temperature_Rotor": state["temp_rotor"],
}
if data_name not in mapping:
raise ValueError(f"Unknown data_name: {data_name}")
return mapping[data_name]
def write(
self,
data_name: str,
motor: str,
value: Value,
) -> None:
"""
Write a value to a single motor. Positions are always in degrees.
Can write 'Goal_Position', 'Kp', or 'Kd'.
"""
if not self.is_connected:
raise DeviceNotConnectedError(f"{self} is not connected.")
if data_name in ("Kp", "Kd"):
self._gains[motor][data_name.lower()] = float(value)
elif data_name == "Goal_Position":
kp = self._gains[motor]["kp"]
kd = self._gains[motor]["kd"]
self._mit_control(motor, kp, kd, float(value), 0.0, 0.0)
else:
raise ValueError(f"Writing {data_name} not supported in MIT mode")
def sync_read(
self,
data_name: str,
motors: str | list[str] | None = None,
) -> dict[str, Value]:
"""
Read the same value from multiple motors simultaneously.
"""
target_motors = self._get_motors_list(motors)
self._batch_refresh(target_motors)
result = {}
for motor in target_motors:
result[motor] = self._get_cached_value(motor, data_name)
return result
def sync_read_all_states(
self,
motors: str | list[str] | None = None,
*,
num_retry: int = 0,
) -> dict[str, MotorState]:
"""
Read ALL motor states (position, velocity, torque) from multiple motors in ONE refresh cycle.
Returns:
Dictionary mapping motor names to state dicts with keys: 'position', 'velocity', 'torque'
Example: {'joint_1': {'position': 45.2, 'velocity': 1.3, 'torque': 0.5}, ...}
"""
target_motors = self._get_motors_list(motors)
self._batch_refresh(target_motors)
result = {}
for motor in target_motors:
result[motor] = self._last_known_states[motor].copy()
return result
def _batch_refresh(self, motors: list[str]) -> None:
"""Internal helper to refresh a list of motors and update cache."""
# Send refresh commands
for motor in motors:
motor_id = self._get_motor_id(motor)
data = [motor_id & 0xFF, (motor_id >> 8) & 0xFF, CAN_CMD_REFRESH, 0, 0, 0, 0, 0]
msg = can.Message(arbitration_id=CAN_PARAM_ID, data=data, is_extended_id=False)
self.canbus.send(msg)
# Small delay to reduce bus congestion if necessary, though removed in sync_read previously
# precise_sleep(PRECISE_SLEEP_SEC)
# Collect responses
expected_recv_ids = [self._get_motor_recv_id(m) for m in motors]
responses = self._recv_all_responses(expected_recv_ids, timeout=MEDIUM_TIMEOUT_SEC)
# Update cache
for motor in motors:
recv_id = self._get_motor_recv_id(motor)
msg = responses.get(recv_id)
if msg:
self._process_response(motor, msg)
else:
logger.warning(f"Packet drop: {motor} (ID: 0x{recv_id:02X}). Using last known state.")
def sync_write(self, data_name: str, values: Value | dict[str, Value]) -> None:
"""
Write values to multiple motors simultaneously. Positions are always in degrees.
"""
if data_name in ("Kp", "Kd"):
key = data_name.lower()
for motor, val in values.items():
self._gains[motor][key] = float(val)
elif data_name == "Goal_Position":
# Step 1: Send all MIT control commands
recv_id_to_motor: dict[int, str] = {}
for motor, value_degrees in values.items():
motor_id = self._get_motor_id(motor)
motor_name = self._get_motor_name(motor)
motor_type = self._motor_types[motor_name]
kp = self._gains[motor]["kp"]
kd = self._gains[motor]["kd"]
data = self._encode_mit_packet(motor_type, kp, kd, float(value_degrees), 0.0, 0.0)
msg = can.Message(arbitration_id=motor_id, data=data, is_extended_id=False)
self.canbus.send(msg)
precise_sleep(PRECISE_TIMEOUT_SEC)
recv_id_to_motor[self._get_motor_recv_id(motor)] = motor_name
# Step 2: Collect responses and update state cache
responses = self._recv_all_responses(list(recv_id_to_motor.keys()), timeout=MEDIUM_TIMEOUT_SEC)
for recv_id, motor_name in recv_id_to_motor.items():
if msg := responses.get(recv_id):
self._process_response(motor_name, msg)
else:
# Fall back to individual writes
for motor, value in values.items():
self.write(data_name, motor, value)
def read_calibration(self) -> dict[str, MotorCalibration]:
"""Read calibration data from motors."""
# Damiao motors don't store calibration internally
# Return existing calibration or empty dict
return self.calibration if self.calibration else {}
def write_calibration(self, calibration_dict: dict[str, MotorCalibration], cache: bool = True) -> None:
"""Write calibration data to motors."""
# Damiao motors don't store calibration internally
# Just cache it in memory
if cache:
self.calibration = calibration_dict
def record_ranges_of_motion(
self,
motors: NameOrID | list[NameOrID] | None = None,
display_values: bool = True,
) -> tuple[dict[NameOrID, Value], dict[NameOrID, Value]]:
"""
Interactively record the min/max values of each motor in degrees.
Move the joints by hand (with torque disabled) while the method streams live positions.
Press Enter to finish.
"""
target_motors = self._get_motors_list(motors)
self.disable_torque(target_motors)
time.sleep(LONG_TIMEOUT_SEC)
start_positions = self.sync_read("Present_Position", target_motors)
mins = start_positions.copy()
maxes = start_positions.copy()
print("\nMove joints through their full range of motion. Press ENTER when done.")
user_pressed_enter = False
while not user_pressed_enter:
positions = self.sync_read("Present_Position", target_motors)
for motor in target_motors:
if motor in positions:
mins[motor] = min(positions[motor], mins.get(motor, positions[motor]))
maxes[motor] = max(positions[motor], maxes.get(motor, positions[motor]))
if display_values:
print("\n" + "=" * 50)
print(f"{'MOTOR':<20} | {'MIN (deg)':>12} | {'POS (deg)':>12} | {'MAX (deg)':>12}")
print("-" * 50)
for motor in target_motors:
if motor in positions:
print(
f"{motor:<20} | {mins[motor]:>12.1f} | {positions[motor]:>12.1f} | {maxes[motor]:>12.1f}"
)
if enter_pressed():
user_pressed_enter = True
if display_values and not user_pressed_enter:
move_cursor_up(len(target_motors) + 4)
time.sleep(LONG_TIMEOUT_SEC)
self.enable_torque(target_motors)
for motor in target_motors:
if (motor in mins) and (motor in maxes) and (int(abs(maxes[motor] - mins[motor])) < 5):
raise ValueError(f"Motor {motor} has insufficient range of motion (< 5 degrees)")
return mins, maxes
def _get_motors_list(self, motors: str | list[str] | None) -> list[str]:
"""Convert motor specification to list of motor names."""
if motors is None:
return list(self.motors.keys())
elif isinstance(motors, str):
return [motors]
elif isinstance(motors, list):
return motors
else:
raise TypeError(f"Invalid motors type: {type(motors)}")
def _get_motor_id(self, motor: NameOrID) -> int:
"""Get CAN ID for a motor."""
if isinstance(motor, str):
if motor in self.motors:
return self.motors[motor].id
else:
raise ValueError(f"Unknown motor: {motor}")
else:
return motor
def _get_motor_name(self, motor: NameOrID) -> str:
"""Get motor name from name or ID."""
if isinstance(motor, str):
return motor
else:
for name, m in self.motors.items():
if m.id == motor:
return name
raise ValueError(f"Unknown motor ID: {motor}")
def _get_motor_recv_id(self, motor: NameOrID) -> int:
"""Get motor recv_id from name or ID."""
motor_name = self._get_motor_name(motor)
motor_obj = self.motors.get(motor_name)
if motor_obj and motor_obj.recv_id is not None:
return motor_obj.recv_id
else:
raise ValueError(f"Motor {motor_obj} doesn't have a valid recv_id (None).")
@cached_property
def is_calibrated(self) -> bool:
"""Check if motors are calibrated."""
return bool(self.calibration)
+209
View File
@@ -0,0 +1,209 @@
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Configuration tables for Damiao motors."""
from enum import IntEnum
# Motor type definitions
class MotorType(IntEnum):
DM3507 = 0
DM4310 = 1
DM4310_48V = 2
DM4340 = 3
DM4340_48V = 4
DM6006 = 5
DM8006 = 6
DM8009 = 7
DM10010L = 8
DM10010 = 9
DMH3510 = 10
DMH6215 = 11
DMG6220 = 12
# Control modes
class ControlMode(IntEnum):
MIT = 1
POS_VEL = 2
VEL = 3
TORQUE_POS = 4
# Motor variable IDs (RID)
class MotorVariable(IntEnum):
UV_VALUE = 0
KT_VALUE = 1
OT_VALUE = 2
OC_VALUE = 3
ACC = 4
DEC = 5
MAX_SPD = 6
MST_ID = 7
ESC_ID = 8
TIMEOUT = 9
CTRL_MODE = 10
DAMP = 11
INERTIA = 12
HW_VER = 13
SW_VER = 14
SN = 15
NPP = 16
RS = 17
LS = 18
FLUX = 19
GR = 20
PMAX = 21
VMAX = 22
TMAX = 23
I_BW = 24
KP_ASR = 25
KI_ASR = 26
KP_APR = 27
KI_APR = 28
OV_VALUE = 29
GREF = 30
DETA = 31
V_BW = 32
IQ_C1 = 33
VL_C1 = 34
CAN_BR = 35
SUB_VER = 36
U_OFF = 50
V_OFF = 51
K1 = 52
K2 = 53
M_OFF = 54
DIR = 55
P_M = 80
XOUT = 81
# Motor limit parameters [PMAX, VMAX, TMAX]
# PMAX: Maximum position (rad)
# VMAX: Maximum velocity (rad/s)
# TMAX: Maximum torque (N·m)
MOTOR_LIMIT_PARAMS = {
MotorType.DM3507: (12.5, 30, 10),
MotorType.DM4310: (12.5, 30, 10),
MotorType.DM4310_48V: (12.5, 50, 10),
MotorType.DM4340: (12.5, 8, 28),
MotorType.DM4340_48V: (12.5, 10, 28),
MotorType.DM6006: (12.5, 45, 20),
MotorType.DM8006: (12.5, 45, 40),
MotorType.DM8009: (12.5, 45, 54),
MotorType.DM10010L: (12.5, 25, 200),
MotorType.DM10010: (12.5, 20, 200),
MotorType.DMH3510: (12.5, 280, 1),
MotorType.DMH6215: (12.5, 45, 10),
MotorType.DMG6220: (12.5, 45, 10),
}
# Motor model names
MODEL_NAMES = {
MotorType.DM3507: "dm3507",
MotorType.DM4310: "dm4310",
MotorType.DM4310_48V: "dm4310_48v",
MotorType.DM4340: "dm4340",
MotorType.DM4340_48V: "dm4340_48v",
MotorType.DM6006: "dm6006",
MotorType.DM8006: "dm8006",
MotorType.DM8009: "dm8009",
MotorType.DM10010L: "dm10010l",
MotorType.DM10010: "dm10010",
MotorType.DMH3510: "dmh3510",
MotorType.DMH6215: "dmh6215",
MotorType.DMG6220: "dmg6220",
}
# Motor resolution table (encoder counts per revolution)
MODEL_RESOLUTION = {
"dm3507": 65536,
"dm4310": 65536,
"dm4310_48v": 65536,
"dm4340": 65536,
"dm4340_48v": 65536,
"dm6006": 65536,
"dm8006": 65536,
"dm8009": 65536,
"dm10010l": 65536,
"dm10010": 65536,
"dmh3510": 65536,
"dmh6215": 65536,
"dmg6220": 65536,
}
# CAN baudrates supported by Damiao motors
AVAILABLE_BAUDRATES = [
125000, # 0: 125 kbps
200000, # 1: 200 kbps
250000, # 2: 250 kbps
500000, # 3: 500 kbps
1000000, # 4: 1 mbps (default for OpenArms)
2000000, # 5: 2 mbps
2500000, # 6: 2.5 mbps
3200000, # 7: 3.2 mbps
4000000, # 8: 4 mbps
5000000, # 9: 5 mbps
]
DEFAULT_BAUDRATE = 1000000 # 1 Mbps is standard for OpenArms
# Default timeout in milliseconds
DEFAULT_TIMEOUT_MS = 1000
# OpenArms specific configurations
# Based on: https://docs.openarm.dev/software/setup/configure-test
# OpenArms has 7 DOF per arm (14 total for dual arm)
OPENARMS_ARM_MOTOR_IDS = {
"joint_1": {"send": 0x01, "recv": 0x11}, # J1 - Shoulder pan
"joint_2": {"send": 0x02, "recv": 0x12}, # J2 - Shoulder lift
"joint_3": {"send": 0x03, "recv": 0x13}, # J3 - Elbow flex
"joint_4": {"send": 0x04, "recv": 0x14}, # J4 - Wrist flex
"joint_5": {"send": 0x05, "recv": 0x15}, # J5 - Wrist roll
"joint_6": {"send": 0x06, "recv": 0x16}, # J6 - Wrist pitch
"joint_7": {"send": 0x07, "recv": 0x17}, # J7 - Wrist rotation
}
OPENARMS_GRIPPER_MOTOR_IDS = {
"gripper": {"send": 0x08, "recv": 0x18}, # J8 - Gripper
}
# Default motor types for OpenArms
OPENARMS_DEFAULT_MOTOR_TYPES = {
"joint_1": MotorType.DM8009, # Shoulder pan - high torque
"joint_2": MotorType.DM8009, # Shoulder lift - high torque
"joint_3": MotorType.DM4340, # Shoulder rotation
"joint_4": MotorType.DM4340, # Elbow flex
"joint_5": MotorType.DM4310, # Wrist roll
"joint_6": MotorType.DM4310, # Wrist pitch
"joint_7": MotorType.DM4310, # Wrist rotation
"gripper": MotorType.DM4310, # Gripper
}
# MIT control parameter ranges
MIT_KP_RANGE = (0.0, 500.0)
MIT_KD_RANGE = (0.0, 5.0)
# CAN frame command IDs
CAN_CMD_ENABLE = 0xFC
CAN_CMD_DISABLE = 0xFD
CAN_CMD_SET_ZERO = 0xFE
CAN_CMD_REFRESH = 0xCC
CAN_CMD_QUERY_PARAM = 0x33
CAN_CMD_WRITE_PARAM = 0x55
CAN_CMD_SAVE_PARAM = 0xAA
# CAN ID for parameter operations
CAN_PARAM_ID = 0x7FF
+5 -6
View File
@@ -22,9 +22,8 @@ import logging
from copy import deepcopy
from enum import Enum
from lerobot.motors.encoding_utils import decode_twos_complement, encode_twos_complement
from ..motors_bus import Motor, MotorCalibration, MotorsBus, NameOrID, Value, get_address
from ..encoding_utils import decode_twos_complement, encode_twos_complement
from ..motors_bus import Motor, MotorCalibration, NameOrID, SerialMotorsBus, Value, get_address
from .tables import (
AVAILABLE_BAUDRATES,
MODEL_BAUDRATE_TABLE,
@@ -100,7 +99,7 @@ def _split_into_byte_chunks(value: int, length: int) -> list[int]:
return data
class DynamixelMotorsBus(MotorsBus):
class DynamixelMotorsBus(SerialMotorsBus):
"""
The Dynamixel implementation for a MotorsBus. It relies on the python dynamixel sdk to communicate with
the motors. For more info, see the Dynamixel SDK Documentation:
@@ -203,9 +202,9 @@ class DynamixelMotorsBus(MotorsBus):
for motor in self._get_motors_list(motors):
self.write("Torque_Enable", motor, TorqueMode.DISABLED.value, num_retry=num_retry)
def _disable_torque(self, motor_id: int, model: str, num_retry: int = 0) -> None:
def _disable_torque(self, motor: int, model: str, num_retry: int = 0) -> None:
addr, length = get_address(self.model_ctrl_table, model, "Torque_Enable")
self._write(addr, length, motor_id, TorqueMode.DISABLED.value, num_retry=num_retry)
self._write(addr, length, motor, TorqueMode.DISABLED.value, num_retry=num_retry)
def enable_torque(self, motors: str | list[str] | None = None, num_retry: int = 0) -> None:
for motor in self._get_motors_list(motors):
+6 -7
View File
@@ -17,9 +17,8 @@ from copy import deepcopy
from enum import Enum
from pprint import pformat
from lerobot.motors.encoding_utils import decode_sign_magnitude, encode_sign_magnitude
from ..motors_bus import Motor, MotorCalibration, MotorsBus, NameOrID, Value, get_address
from ..encoding_utils import decode_sign_magnitude, encode_sign_magnitude
from ..motors_bus import Motor, MotorCalibration, NameOrID, SerialMotorsBus, Value, get_address
from .tables import (
FIRMWARE_MAJOR_VERSION,
FIRMWARE_MINOR_VERSION,
@@ -96,7 +95,7 @@ def patch_setPacketTimeout(self, packet_length): # noqa: N802
self.packet_timeout = (self.tx_time_per_byte * packet_length) + (self.tx_time_per_byte * 3.0) + 50
class FeetechMotorsBus(MotorsBus):
class FeetechMotorsBus(SerialMotorsBus):
"""
The FeetechMotorsBus class allows to efficiently read and write to the attached motors. It relies on the
python feetech sdk to communicate with the motors, which is itself based on the dynamixel sdk.
@@ -298,11 +297,11 @@ class FeetechMotorsBus(MotorsBus):
self.write("Torque_Enable", motor, TorqueMode.DISABLED.value, num_retry=num_retry)
self.write("Lock", motor, 0, num_retry=num_retry)
def _disable_torque(self, motor_id: int, model: str, num_retry: int = 0) -> None:
def _disable_torque(self, motor: int, model: str, num_retry: int = 0) -> None:
addr, length = get_address(self.model_ctrl_table, model, "Torque_Enable")
self._write(addr, length, motor_id, TorqueMode.DISABLED.value, num_retry=num_retry)
self._write(addr, length, motor, TorqueMode.DISABLED.value, num_retry=num_retry)
addr, length = get_address(self.model_ctrl_table, model, "Lock")
self._write(addr, length, motor_id, 0, num_retry=num_retry)
self._write(addr, length, motor, 0, num_retry=num_retry)
def enable_torque(self, motors: str | list[str] | None = None, num_retry: int = 0) -> None:
for motor in self._get_motors_list(motors):
+1
View File
@@ -205,6 +205,7 @@ MODEL_BAUDRATE_TABLE = {
# Sign-Magnitude encoding bits
STS_SMS_SERIES_ENCODINGS_TABLE = {
"Present_Load": 10,
"Homing_Offset": 11,
"Goal_Position": 15,
"Goal_Velocity": 15,
+98 -35
View File
@@ -19,6 +19,8 @@
# TODO(aliberts): Add block noqa when feature below is available
# https://github.com/astral-sh/ruff/issues/3711
from __future__ import annotations
import abc
import logging
from contextlib import contextmanager
@@ -32,7 +34,7 @@ import serial
from deepdiff import DeepDiff
from tqdm import tqdm
from lerobot.utils.errors import DeviceAlreadyConnectedError, DeviceNotConnectedError
from lerobot.utils.decorators import check_if_already_connected, check_if_not_connected
from lerobot.utils.utils import enter_pressed, move_cursor_up
NameOrID: TypeAlias = str | int
@@ -41,6 +43,81 @@ Value: TypeAlias = int | float
logger = logging.getLogger(__name__)
class MotorsBusBase(abc.ABC):
"""
Base class for all motor bus implementations.
This is a minimal interface that all motor buses must implement, regardless of their
communication protocol (serial, CAN, etc.).
"""
def __init__(
self,
port: str,
motors: dict[str, Motor],
calibration: dict[str, MotorCalibration] | None = None,
):
self.port = port
self.motors = motors
self.calibration = calibration if calibration else {}
@abc.abstractmethod
def connect(self, handshake: bool = True) -> None:
"""Establish connection to the motors."""
pass
@abc.abstractmethod
def disconnect(self, disable_torque: bool = True) -> None:
"""Disconnect from the motors."""
pass
@property
@abc.abstractmethod
def is_connected(self) -> bool:
"""Check if connected to the motors."""
pass
@abc.abstractmethod
def read(self, data_name: str, motor: str) -> Value:
"""Read a value from a single motor."""
pass
@abc.abstractmethod
def write(self, data_name: str, motor: str, value: Value) -> None:
"""Write a value to a single motor."""
pass
@abc.abstractmethod
def sync_read(self, data_name: str, motors: str | list[str] | None = None) -> dict[str, Value]:
"""Read a value from multiple motors."""
pass
@abc.abstractmethod
def sync_write(self, data_name: str, values: Value | dict[str, Value]) -> None:
"""Write values to multiple motors."""
pass
@abc.abstractmethod
def enable_torque(self, motors: str | list[str] | None = None, num_retry: int = 0) -> None:
"""Enable torque on selected motors."""
pass
@abc.abstractmethod
def disable_torque(self, motors: str | list[str] | None = None, num_retry: int = 0) -> None:
"""Disable torque on selected motors."""
pass
@abc.abstractmethod
def read_calibration(self) -> dict[str, MotorCalibration]:
"""Read calibration parameters from the motors."""
pass
@abc.abstractmethod
def write_calibration(self, calibration_dict: dict[str, MotorCalibration], cache: bool = True) -> None:
"""Write calibration parameters to the motors."""
pass
def get_ctrl_table(model_ctrl_table: dict[str, dict], model: str) -> dict[str, tuple[int, int]]:
ctrl_table = model_ctrl_table.get(model)
if ctrl_table is None:
@@ -97,6 +174,8 @@ class Motor:
id: int
model: str
norm_mode: MotorNormMode
motor_type_str: str | None = None
recv_id: int | None = None
class PortHandler(Protocol):
@@ -203,15 +282,15 @@ class GroupSyncWrite(Protocol):
def txPacket(self): ...
class MotorsBus(abc.ABC):
class SerialMotorsBus(MotorsBusBase):
"""
A MotorsBus allows to efficiently read and write to the attached motors.
A SerialMotorsBus allows to efficiently read and write to motors connected via serial communication.
It represents several motors daisy-chained together and connected through a serial port.
There are currently two implementations of this abstract class:
There are currently two implementations of this class:
- DynamixelMotorsBus
- FeetechMotorsBus
Note: This class may evolve in the future should we add support for other types of bus.
This class is specifically for serial-based motor protocols (Dynamixel, Feetech, etc.).
A MotorsBus subclass instance requires a port (e.g. `FeetechMotorsBus(port="/dev/tty.usbmodem575E0031751"`)).
To find the port, you can run our utility script:
@@ -260,9 +339,7 @@ class MotorsBus(abc.ABC):
motors: dict[str, Motor],
calibration: dict[str, MotorCalibration] | None = None,
):
self.port = port
self.motors = motors
self.calibration = calibration if calibration else {}
super().__init__(port, motors, calibration)
self.port_handler: PortHandler
self.packet_handler: PacketHandler
@@ -411,6 +488,7 @@ class MotorsBus(abc.ABC):
"""bool: `True` if the underlying serial port is open."""
return self.port_handler.is_open
@check_if_already_connected
def connect(self, handshake: bool = True) -> None:
"""Open the serial port and initialise communication.
@@ -422,10 +500,6 @@ class MotorsBus(abc.ABC):
DeviceAlreadyConnectedError: The port is already open.
ConnectionError: The underlying SDK failed to open the port or the handshake did not succeed.
"""
if self.is_connected:
raise DeviceAlreadyConnectedError(
f"{self.__class__.__name__}('{self.port}') is already connected. Do not call `{self.__class__.__name__}.connect()` twice."
)
self._connect(handshake)
self.set_timeout()
@@ -447,6 +521,7 @@ class MotorsBus(abc.ABC):
def _handshake(self) -> None:
pass
@check_if_not_connected
def disconnect(self, disable_torque: bool = True) -> None:
"""Close the serial port (optionally disabling torque first).
@@ -455,10 +530,6 @@ class MotorsBus(abc.ABC):
closing the port. This can prevent damaging motors if they are left applying resisting torque
after disconnect.
"""
if not self.is_connected:
raise DeviceNotConnectedError(
f"{self.__class__.__name__}('{self.port}') is not connected. Try running `{self.__class__.__name__}.connect()` first."
)
if disable_torque:
self.port_handler.clearPort()
@@ -538,7 +609,7 @@ class MotorsBus(abc.ABC):
self.set_baudrate(self.default_baudrate)
@abc.abstractmethod
def _find_single_motor(self, motor: str, initial_baudrate: int | None) -> tuple[int, int]:
def _find_single_motor(self, motor: str, initial_baudrate: int | None = None) -> tuple[int, int]:
pass
@abc.abstractmethod
@@ -551,13 +622,13 @@ class MotorsBus(abc.ABC):
pass
@abc.abstractmethod
def disable_torque(self, motors: int | str | list[str] | None = None, num_retry: int = 0) -> None:
def disable_torque(self, motors: str | list[str] | None = None, num_retry: int = 0) -> None:
"""Disable torque on selected motors.
Disabling Torque allows to write to the motors' permanent memory area (EPROM/EEPROM).
Args:
motors (int | str | list[str] | None, optional): Target motors. Accepts a motor name, an ID, a
motors ( str | list[str] | None, optional): Target motors. Accepts a motor name, an ID, a
list of names or `None` to affect every registered motor. Defaults to `None`.
num_retry (int, optional): Number of additional retry attempts on communication failure.
Defaults to 0.
@@ -907,6 +978,7 @@ class MotorsBus(abc.ABC):
"""
pass
@check_if_not_connected
def read(
self,
data_name: str,
@@ -927,10 +999,6 @@ class MotorsBus(abc.ABC):
Returns:
Value: Raw or normalised value depending on *normalize*.
"""
if not self.is_connected:
raise DeviceNotConnectedError(
f"{self.__class__.__name__}('{self.port}') is not connected. You need to run `{self.__class__.__name__}.connect()`."
)
id_ = self.motors[motor].id
model = self.motors[motor].model
@@ -981,6 +1049,7 @@ class MotorsBus(abc.ABC):
return value, comm, error
@check_if_not_connected
def write(
self, data_name: str, motor: str, value: Value, *, normalize: bool = True, num_retry: int = 0
) -> None:
@@ -999,10 +1068,6 @@ class MotorsBus(abc.ABC):
normalize (bool, optional): Enable or disable normalisation. Defaults to `True`.
num_retry (int, optional): Retry attempts. Defaults to `0`.
"""
if not self.is_connected:
raise DeviceNotConnectedError(
f"{self.__class__.__name__}('{self.port}') is not connected. You need to run `{self.__class__.__name__}.connect()`."
)
id_ = self.motors[motor].id
model = self.motors[motor].model
@@ -1044,6 +1109,7 @@ class MotorsBus(abc.ABC):
return comm, error
@check_if_not_connected
def sync_read(
self,
data_name: str,
@@ -1063,10 +1129,6 @@ class MotorsBus(abc.ABC):
Returns:
dict[str, Value]: Mapping *motor name value*.
"""
if not self.is_connected:
raise DeviceNotConnectedError(
f"{self.__class__.__name__}('{self.port}') is not connected. You need to run `{self.__class__.__name__}.connect()`."
)
self._assert_protocol_is_compatible("sync_read")
@@ -1139,6 +1201,7 @@ class MotorsBus(abc.ABC):
# for id_ in motor_ids:
# value = self.sync_reader.getData(id_, address, length)
@check_if_not_connected
def sync_write(
self,
data_name: str,
@@ -1160,10 +1223,6 @@ class MotorsBus(abc.ABC):
normalize (bool, optional): If `True` (default) convert values from the user range to raw units.
num_retry (int, optional): Retry attempts. Defaults to `0`.
"""
if not self.is_connected:
raise DeviceNotConnectedError(
f"{self.__class__.__name__}('{self.port}') is not connected. You need to run `{self.__class__.__name__}.connect()`."
)
ids_values = self._get_ids_values_dict(values)
models = [self._id_to_model(id_) for id_ in ids_values]
@@ -1212,3 +1271,7 @@ class MotorsBus(abc.ABC):
for id_, value in ids_values.items():
data = self._serialize_data(value, length)
self.sync_writer.addParam(id_, data)
# Backward compatibility alias
MotorsBus: TypeAlias = SerialMotorsBus
+2
View File
@@ -16,6 +16,7 @@ from .act.configuration_act import ACTConfig as ACTConfig
from .diffusion.configuration_diffusion import DiffusionConfig as DiffusionConfig
from .groot.configuration_groot import GrootConfig as GrootConfig
from .pi0.configuration_pi0 import PI0Config as PI0Config
from .pi0_fast.configuration_pi0_fast import PI0FastConfig as PI0FastConfig
from .pi05.configuration_pi05 import PI05Config as PI05Config
from .smolvla.configuration_smolvla import SmolVLAConfig as SmolVLAConfig
from .smolvla.processor_smolvla import SmolVLANewLineProcessor
@@ -29,6 +30,7 @@ __all__ = [
"DiffusionConfig",
"PI0Config",
"PI05Config",
"PI0FastConfig",
"SmolVLAConfig",
"SARMConfig",
"TDMPCConfig",
+40 -3
View File
@@ -51,7 +51,11 @@ from lerobot.processor.converters import (
transition_to_batch,
transition_to_policy_action,
)
from lerobot.utils.constants import POLICY_POSTPROCESSOR_DEFAULT_NAME, POLICY_PREPROCESSOR_DEFAULT_NAME
from lerobot.utils.constants import (
ACTION,
POLICY_POSTPROCESSOR_DEFAULT_NAME,
POLICY_PREPROCESSOR_DEFAULT_NAME,
)
def get_policy_class(name: str) -> type[PreTrainedPolicy]:
@@ -91,6 +95,10 @@ def get_policy_class(name: str) -> type[PreTrainedPolicy]:
from lerobot.policies.pi0.modeling_pi0 import PI0Policy
return PI0Policy
elif name == "pi0_fast":
from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy
return PI0FastPolicy
elif name == "pi05":
from lerobot.policies.pi05.modeling_pi05 import PI05Policy
@@ -246,7 +254,7 @@ def make_pre_post_processors(
}
# Also ensure postprocessing slices to env action dim and unnormalizes with dataset stats
env_action_dim = policy_cfg.output_features["action"].shape[0]
env_action_dim = policy_cfg.output_features[ACTION].shape[0]
postprocessor_overrides["groot_action_unpack_unnormalize_v1"] = {
"stats": kwargs.get("dataset_stats"),
"normalize_min_max": True,
@@ -471,11 +479,40 @@ def make_policy(
if ds_meta is not None:
kwargs["dataset_meta"] = ds_meta
if cfg.pretrained_path:
if not cfg.pretrained_path and cfg.use_peft:
raise ValueError(
"Instantiating a policy with `use_peft=True` without a checkpoint is not supported since that requires "
"the PEFT config parameters to be set. For training with PEFT, see `lerobot_train.py` on how to do that."
)
if cfg.pretrained_path and not cfg.use_peft:
# Load a pretrained policy and override the config if needed (for example, if there are inference-time
# hyperparameters that we want to vary).
kwargs["pretrained_name_or_path"] = cfg.pretrained_path
policy = policy_cls.from_pretrained(**kwargs)
elif cfg.pretrained_path and cfg.use_peft:
# Load a pretrained PEFT model on top of the policy. The pretrained path points to the folder/repo
# of the adapter and the adapter's config contains the path to the base policy. So we need the
# adapter config first, then load the correct policy and then apply PEFT.
from peft import PeftConfig, PeftModel
logging.info("Loading policy's PEFT adapter.")
peft_pretrained_path = cfg.pretrained_path
peft_config = PeftConfig.from_pretrained(peft_pretrained_path)
kwargs["pretrained_name_or_path"] = peft_config.base_model_name_or_path
if not kwargs["pretrained_name_or_path"]:
# This means that there's a bug or we trained a policy from scratch using PEFT.
# It is more likely that this is a bug so we'll raise an error.
raise ValueError(
"No pretrained model name found in adapter config. Can't instantiate the pre-trained policy on which "
"the adapter was trained."
)
policy = policy_cls.from_pretrained(**kwargs)
policy = PeftModel.from_pretrained(policy, peft_pretrained_path, config=peft_config)
else:
# Make a fresh policy.
policy = policy_cls(**kwargs)
@@ -20,6 +20,7 @@ from lerobot.configs.policies import PreTrainedConfig
from lerobot.configs.types import FeatureType, NormalizationMode, PolicyFeature
from lerobot.optim.optimizers import AdamWConfig
from lerobot.optim.schedulers import CosineDecayWithWarmupSchedulerConfig
from lerobot.utils.constants import ACTION, OBS_STATE
@PreTrainedConfig.register_subclass("groot")
@@ -137,14 +138,14 @@ class GrootConfig(PreTrainedConfig):
"No features of type FeatureType.VISUAL found in input_features."
)
if "observation.state" not in self.input_features:
if OBS_STATE not in self.input_features:
state_feature = PolicyFeature(
type=FeatureType.STATE,
shape=(self.max_state_dim,),
)
self.input_features["observation.state"] = state_feature
self.input_features[OBS_STATE] = state_feature
else:
state_shape = self.input_features["observation.state"].shape
state_shape = self.input_features[OBS_STATE].shape
state_dim = state_shape[0] if state_shape else 0
if state_dim > self.max_state_dim:
raise ValueError(
@@ -152,14 +153,14 @@ class GrootConfig(PreTrainedConfig):
f"Either reduce state dimension or increase max_state_dim in config."
)
if "action" not in self.output_features:
if ACTION not in self.output_features:
action_feature = PolicyFeature(
type=FeatureType.ACTION,
shape=(self.max_action_dim,),
)
self.output_features["action"] = action_feature
self.output_features[ACTION] = action_feature
else:
action_shape = self.output_features["action"].shape
action_shape = self.output_features[ACTION].shape
action_dim = action_shape[0] if action_shape else 0
if action_dim > self.max_action_dim:
raise ValueError(
+3 -3
View File
@@ -46,7 +46,7 @@ from lerobot.policies.groot.action_head.flow_matching_action_head import (
FlowmatchingActionHeadConfig,
)
from lerobot.policies.groot.utils import ensure_eagle_cache_ready
from lerobot.utils.constants import HF_LEROBOT_HOME
from lerobot.utils.constants import ACTION, HF_LEROBOT_HOME
DEFAULT_VENDOR_EAGLE_PATH = str((Path(__file__).resolve().parent / "eagle2_hg_model").resolve())
DEFAULT_TOKENIZER_ASSETS_REPO = "lerobot/eagle2hg-processor-groot-n1p5"
@@ -227,8 +227,8 @@ class GR00TN15(PreTrainedModel):
detected_error = False
error_msg = ERROR_MSG
if "action" in inputs:
action = inputs["action"]
if ACTION in inputs:
action = inputs[ACTION]
# In inference, action may be omitted or None; validate only when it's a tensor.
if action is None:
pass # allow None during inference

Some files were not shown because too many files have changed in this diff Show More