lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-06-19 01:07:18 +00:00

Author	SHA1	Message	Date
Pepijn	6496728025	docs(lingbot_va): point checkpoint paths at the lerobot org The LeRobot-format checkpoints moved from pepijn223/* to lerobot/* (libero_long, robotwin, base). Update the eval/train --policy.path examples accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 11:58:31 +02:00
Pepijn	5568ce7af1	Update lingbot_va.mdx Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>	2026-06-08 10:47:34 +02:00
pepijn223	5222f3a4a7	docs(lingbot_va): document EEF action-channel schema + camera order Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-06 16:11:35 +02:00
pepijn223	f9d12db9cf	fix(lingbot_va): CI quality gate + fast-test collection - Add tests/policies/lingbot_va/__init__.py so the test files don't clash by basename with tests/policies/vla_jepa/* under pytest's default import mode (fast-test collection error). - Fix vendored typos flagged by the typos hook (pach_scale->patch_scale, total_tolen-> total_token_len, stablized->stabilized) and a mypy union-attr in RoboTwinEnv._read_eef_pose. - Apply Prettier formatting to docs/source/lingbot_va.mdx. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-06 15:46:37 +02:00
pepijn223	71aacda05e	feat(lingbot_va): implement training / fine-tuning (flow-matching loss) - Implement LingBotVAPolicy.forward(): dual-stream flow-matching training loss (latent + action, timestep-weighted, action-masked) ported from upstream train.py; VAE-encodes camera clips, UMT5-encodes the task, noises both streams, runs the block-causal flex-attention training pass (forward_train). - training_loss_from_streams() core + _build_training_streams() data prep (action scatter into the 30-d space, multi-frame VAE encode incl. robotwin_tshape). - get_optim_params returns only trainable transformer params (LoRA/PEFT friendly); VAE/UMT5 stay frozen. Training needs attn_mode='flex'. - Add a tiny-config single-training-step test (forward->loss->backward->AdamW) and a Training/fine-tuning section in the docs. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-06 15:38:41 +02:00
pepijn223	e3deff00ad	feat(lingbot_va): RoboTwin eef-pose eval, single-file model, Hub checkpoints Make the LingBot-VA port runnable on both LIBERO and RoboTwin and clean up the package to LeRobot conventions. - Consolidate all vendored Wan2.2 model code (transformer, attention, VAE helpers, flow-matching scheduler, grid utils, flex-attention) into a single modeling_lingbot_va.py; remove the separate wan_*/schedulers modules. - Move the fixed action (un)normalization quantiles out of the config and into the post-processor (LIBERO 7-DoF + RoboTwin 16-d eef); remove the conversion script in favour of ready-to-use LeRobot-format checkpoints on the Hub. - Fixes found via on-sim validation: undo LIBERO's 180-degree image flip (image_hflip), encode obs as a multi-frame streaming-VAE clip, reset the streaming VAE cache between episodes, run the transformer in config.dtype, lazy-load frozen VAE/UMT5 by subfolder with the text encoder on CPU. - RoboTwin: add an end-effector-pose action mode to RoboTwinEnv (16-d per-arm xyz+quat+gripper deltas composed onto the initial eef pose, executed via CuRobo IK) and the robotwin_tshape latent layout (full-res head + half-res wrists via a second streaming VAE) with the upstream RoboTwin action quantiles + camera mapping. - Predicted-video saving works for both benchmarks; docs + tests updated. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-06 15:20:51 +02:00
Pepijn	4dfa8cea65	feat(policies): add LingBot-VA autoregressive video-action world model Port the LingBot-VA policy (Wan2.2 dual-stream video+action world model) into LeRobot, following the EO-1 / VLA-JEPA conventions. Covers inference, checkpoint conversion, and predicted-video saving (training is deferred to a follow-up PR). - Vendored Wan transformer/attention/flex/VAE/scheduler modules (key names preserved for near-identity conversion); torch SDPA default, flashattn/flex lazy-guarded. - LingBotVAConfig (registered "lingbot_va") + processor with fixed-quantile action unnormalization; full dual-stream sampling loop with CFG, two flow-matching schedulers and KV cache, mapped onto select_action with observed-keyframe feedback. - convert_lingbot_va_checkpoints.py (libero/robotwin variants): bundles the ~5B transformer, lazy-pulls the frozen VAE+UMT5 from the source repo. - Predicted-video plumbing in lerobot_eval (predicted_frames_callback; opt-in via --policy.save_predicted_video) and ConstantWithWarmupSchedulerConfig. - pyproject: widen diffusers-dep to <0.37, add lingbot_va + imageio-dep extras, add lingbot_va and (missing) eo1 to `all`. - Factory + policies/__init__ wiring, docs page + toctree, and tests. Note: the LIBERO success-rate correctness gate must be validated on a CUDA GPU with the converted checkpoint. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 16:28:19 +02:00

7 Commits