# Multitask DiT Policy

## Citation

If you use this work, please cite the following:
```bibtex
@misc{jones2025multitaskditpolicy,
  author = {Bryson Jones},
  title = {Dissecting and Open-Sourcing Multitask Diffusion Transformer Policy},
  year = {2025},
  url = {https://brysonkjones.substack.com/p/dissecting-and-open-sourcing-multitask-diffusion-transformer-policy},
  note = {Blog post}
}

@misc{trilbmteam2025carefulexaminationlargebehaviormodels,
  author = {TRI LBM Team},
  title = {A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation},
  year = {2025},
  eprint = {2507.05331},
  archivePrefix = {arXiv},
  primaryClass = {cs.RO},
  url = {https://arxiv.org/abs/2507.05331}
}

@misc{bostondynamics2025largebehaviormodelsatlas,
  author = {Boston Dynamics and TRI Research Team},
  title = {Large Behavior Models and Atlas Find New Footing},
  year = {2025},
  url = {https://bostondynamics.com/blog/large-behavior-models-atlas-find-new-footing/},
  note = {Blog post}
}
```