lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-26 04:37:01 +00:00

Files

T

History

Altman e64fa667c3 fix(vqbet): use in-place fill_ to avoid overwriting DDP GPU buffers with CPU tensors (#3128 )

* fix(vqbet): use in-place fill_ to avoid overwriting DDP GPU buffers with CPU tensors

When VQ discretization phase completes, the code was overwriting
register_buffer('discretized') and register_buffer('freeze_codebook')
with torch.tensor(True), which is created on CPU. DDP then fails in
_sync_buffers() with: RuntimeError: No backend type associated with
device type cpu. Fix by updating the buffers in-place with .fill_(True)
so device and registration are preserved.

Made-with: Cursor

* test(vqbet): add regression test for in-place buffer update during discretization

Verifies that discretize() updates the 'discretized' and 'freeze_codebook'
registered buffers in-place (via fill_()) rather than replacing them with new
CPU tensors. The test checks data_ptr() identity and that the tensors remain
registered buffers after the call. This prevents regressions of the DDP fix.

Made-with: Cursor

* test(vqbet): add GPU regression test to verify buffers stay on CUDA after discretize()

Directly catches the original DDP failure mode: when buffers are replaced with
torch.tensor(True) they land on CPU, causing NCCL to raise 'No backend type
associated with device type cpu' in _sync_buffers(). The GPU test places the
model on cuda:0 and asserts both buffers remain on CUDA after discretization.

Made-with: Cursor

* test(vqbet): simplify to single device-check test in test_policies.py

Per reviewer feedback: remove the separate test file and replace the two
CPU/GPU tests (with data_ptr checks) with a single focused test in
tests/policies/test_policies.py that only asserts the registered buffers
remain on the model device after discretize(). Uses DEVICE from tests/utils.py
so it runs on whatever device the CI/user selects (cpu, cuda, mps).

Made-with: Cursor

* style: fix import order in test_policies.py to pass ruff/pre-commit checks

Made-with: Cursor

---------

Co-authored-by: Zhan DiJia <2476100824@example.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>

2026-03-18 13:24:07 +01:00

artifacts

Fix(diffusion): enforce no-crop behavior when crop_ratio=1.0 (#3046 )

2026-02-27 17:44:53 +01:00

async_inference

fix(async_inference): restore robot module imports in robot_client.py (#3081 )

2026-03-06 17:14:14 +01:00

cameras

test(cameras): skip flaky async_read test (#3106 )

2026-03-08 14:02:33 +01:00

configs

chore(dataset): basic house-keeping (#3170 )