Key fixes for the PI05Video policy implementation:
PerceiverResampler improvements:
- Add residual connection (latents + attn_out) for better gradient flow
- Add output LayerNorm after residual connection
- Initialize latents with smaller variance (*0.02) for stability
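
The three resampler changes above can be sketched together as follows. This is a minimal illustration, not the actual PI05Video module: the class name, layer choices (`nn.MultiheadAttention`), and default sizes are assumptions; only the residual connection, the output LayerNorm, and the `* 0.02` latent init come from the list above.

```python
import torch
from torch import nn


class PerceiverResamplerSketch(nn.Module):
    """Hypothetical minimal resampler; names and sizes are illustrative."""

    def __init__(self, dim: int = 64, num_latents: int = 8, num_heads: int = 4):
        super().__init__()
        # Small-variance init (scale 0.02) for the learned latent queries.
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.out_norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim); latents are broadcast over the batch.
        latents = self.latents.unsqueeze(0).expand(tokens.shape[0], -1, -1)
        attn_out, _ = self.attn(latents, tokens, tokens)
        # Residual connection (latents + attn_out), then output LayerNorm.
        return self.out_norm(latents + attn_out)


resampler = PerceiverResamplerSketch()
out = resampler(torch.randn(2, 10, 64))  # 10 input tokens -> 8 latents
```

The output always has `num_latents` tokens per sample, regardless of the input sequence length.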
Bug fixes:
- Replace expand() with repeat() in _preprocess_video to create copies
instead of memory views, preventing potential in-place modification bugs
- Fix dtype consistency in embed_video: use PaliGemma's dtype instead
of input dtype for consistent processing throughout the pipeline
- Add bfloat16/float16 support to resize_with_pad_torch
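
To illustrate why the `expand()` to `repeat()` change matters, here is a small standalone sketch (the tensor names are made up and unrelated to `_preprocess_video` itself). `expand()` returns a view whose rows all alias the same memory, while `repeat()` allocates independent copies.

```python
import torch

frame = torch.zeros(1, 3)   # a single "frame" with three values

view = frame.expand(4, 3)   # no copy: all four rows alias frame's memory
copy = frame.repeat(4, 1)   # real copy: four independent rows

frame[0, 0] = 1.0           # in-place change to the source tensor

view_col = view[:, 0].tolist()  # the change shows up in every row of the view
copy_col = copy[:, 0].tolist()  # the repeated copy is unaffected
```

The trade-off is memory: `repeat()` materializes every row, so it costs more than the stride-0 view that `expand()` produces.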
PEFT improvements:
- Remove state_proj from target modules (PI0-only, not in PI05)
- Add video_proj and video_resampler to PEFT targets for fine-tuning
Other improvements:
- Add warning when use_video_encoder=True but no image features found
- Add gradient checkpointing support for video encoder
- Remove duplicate tokenizer_max_length definition in config
- Add validation for video_num_latents and video_resampler_num_heads
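
The commit does not spell out which checks were added for `video_num_latents` and `video_resampler_num_heads`. A plausible sketch, assuming positivity checks and a divisibility check against a hypothetical `hidden_dim` field, could look like this (the class name and defaults are made up):

```python
from dataclasses import dataclass


@dataclass
class VideoConfigSketch:
    """Hypothetical stand-in for the policy config; defaults are illustrative."""

    video_num_latents: int = 16
    video_resampler_num_heads: int = 8
    hidden_dim: int = 512  # assumed field name for the embedding width

    def __post_init__(self) -> None:
        if self.video_num_latents <= 0:
            raise ValueError("video_num_latents must be a positive integer")
        if self.video_resampler_num_heads <= 0:
            raise ValueError("video_resampler_num_heads must be a positive integer")
        # Multi-head attention needs the width to split evenly across heads.
        if self.hidden_dim % self.video_resampler_num_heads != 0:
            raise ValueError("hidden_dim must be divisible by video_resampler_num_heads")
```

Validating in `__post_init__` surfaces a bad value when the config is built rather than deep inside model construction.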
* feat(async_inference): server always sends CPU tensors, client handles device conversion
* fix: fix the type annotation of RawObservation in src/lerobot/async_inference/helpers.py
* update the import of robot_client
---------
Co-authored-by: Sato shinji <wwwsatoshinji@gmail.com>
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
Co-authored-by: KB <kevin-brian.n-diaye@epita.fr>
* improve image2video
* add episodes video encoding
* fix mypy failing
* iterate on review
* nit
* remove max, and let it be optional
* iterate more
* update docs
* fix test
---------
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
* fix: use features when aggregating image based datasets
* add: test asserting for data type
* add: features param to writing dataset
---------
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
* move peft config from `lerobot_train` to policy level
* Update src/lerobot/scripts/lerobot_train.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>
* copilot response
* Change the policy function to return targets rather than a PEFT config: `_get_default_peft_targets()` is overridden in PI0, PI0.5, and SmolVLA
* remove none check when building config dict
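
The policy-level hook described above might be structured roughly as follows. Only the method name `_get_default_peft_targets()` comes from the commit; the class names, the return type (a list of module names), the example module names, and the `build_peft_kwargs` helper are assumptions made for illustration.

```python
from typing import List, Optional


class PolicySketch:
    """Hypothetical base policy with no policy-specific PEFT defaults."""

    def _get_default_peft_targets(self) -> Optional[List[str]]:
        return None


class PI05Sketch(PolicySketch):
    """Hypothetical subclass that overrides the hook."""

    def _get_default_peft_targets(self) -> Optional[List[str]]:
        # Illustrative module names, not the real PI0.5 target list.
        return ["video_proj", "video_resampler"]


def build_peft_kwargs(policy: PolicySketch, user_targets: Optional[List[str]] = None) -> dict:
    """Training-script side (assumed): user targets win, else policy defaults."""
    targets = user_targets if user_targets is not None else policy._get_default_peft_targets()
    return {"target_modules": targets}


kwargs = build_peft_kwargs(PI05Sketch())
```

Returning targets instead of a full config keeps the PEFT library dependency in the training script while each policy only declares which modules it wants adapted.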
---------
Signed-off-by: Michel Aractingi <michel.aractingi@huggingface.co>
This PR extends the integration of the Unitree G1 with the LeRobot codebase. By converting the robot state to a flat dict, we can now record and replay episodes (the example groot/holosoma scripts need to be adjusted as well). We also improve the simulation integration by calling `.step` in `_subscribe_motor_state` instead of running it in a separate thread. Finally, we add a ZMQ camera to LeRobot that streams base64-encoded images over JSON.
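
Two of the ideas above, the flat state dict and base64 images in JSON, can be sketched with the standard library alone. The key naming scheme (`joint.field`), the message fields, and all function names here are assumptions, and the ZMQ transport itself is left out.

```python
import base64
import json


def flatten_state(state: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"a.b": value} pairs (key scheme is assumed)."""
    flat = {}
    for key, value in state.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_state(value, name))
        else:
            flat[name] = value
    return flat


def encode_frame(image_bytes: bytes, camera: str) -> str:
    """Wrap raw image bytes as base64 text inside a JSON message."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"camera": camera, "image": payload})


def decode_frame(message: str) -> bytes:
    """Recover the raw image bytes from a JSON message."""
    return base64.b64decode(json.loads(message)["image"])


state = {"left_knee": {"q": 0.1, "dq": 0.0}, "imu": {"roll": 0.02}}
flat = flatten_state(state)

raw = bytes([255, 216, 255, 0, 1, 2])  # placeholder bytes, not a real image
restored = decode_frame(encode_frame(raw, camera="head"))
```

A flat dict of scalars maps directly onto per-feature dataset columns, which is what makes recording and replay straightforward. Base64 inflates the payload by roughly a third, which is the price of keeping the message plain JSON.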
* feat(robots): consolidates bi SO setups
* fix(robots): solve circular dependency
* fix(robots): teleop & record working
* feat(robots): only one SO
* fix(utils): rename bi so
* fix(scripts): bi so import
* fix(rl): remove imports