mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-13 06:29:57 +00:00
2e9cd87bbd
* first commit * feat(policies): add VLA-JEPA * feat(policies): add VLA-JEPA * support vla_jepa * (feat)policies: add VLA-JEPA * linting * adding deps to pyproject.toml * updating uv lock * adding guards to avoid needing transformers and diffusers for type checking and basic tests * fixing action and state dim * fix warnings with qwen processor kwargs * fixing wm_loss not propagating * adjusting obs steps, tublets size to match original implementation * some more fixes to be closer to the original implem * adding more tests to ensure good coverage * align VLA-JEPA architecture with original checkpoint - Remove stale `action_num_heads` / `action_attention_head_dim` config fields; DiT head dimensions are now always derived from the preset (DiT-B/L/test). - Add `num_target_vision_tokens` and `action_max_seq_len` config fields required by the action head's future-token embedding and positional embedding tables. - Fix default `qwen_model_name` to 2B (matches all released checkpoints). - Rename `ActionEncoder` attrs w1/w2/w3 → layer1/layer2/layer3 to match checkpoint key names; replace `nn.Sequential` decoder/state-encoder with `_MLP2` (layer1/layer2 naming). - Fix `VLAJEPAActionHead` to size ActionEncoder and StateEncoder at `inner_dim` (DiT input width) rather than `action_hidden_size` (DiT output width). - Rename `DiT.blocks` → `transformer_blocks` and `attn` → `attn1` to match checkpoint; add alternating cross/self attention (even blocks cross-attend to Qwen context, odd blocks self-attend). - Add `DiT-test` preset for unit tests. - Rewrite `ActionConditionedVideoPredictor` with explicit ViT-style blocks (`_PredictorBlock` with fused qkv) to match checkpoint structure; rename `encoder`/`norm`/`proj` → `predictor_blocks`/`predictor_norm`/`predictor_proj`. * propagate action_is_pad masking through VLA-JEPA policy pipeline Pass the `action_is_pad` tensor from the batch through to the action head so padded timesteps are excluded from the flow-matching loss. 
* update VLA-JEPA tests for arch changes and action_is_pad - Switch conftest to use `action_model_type="DiT-test"` now that `action_num_heads` / `action_attention_head_dim` have been removed. - Add action_head tests covering fully-padded loss (zero) and equivalence of action_is_pad=None vs all-zeros mask. - Remove obsolete `test_native_to_lerobot_wm_only` test. * add VLA-JEPA documentation Covers architecture overview, pretrained checkpoints, config reference, training/eval commands for LIBERO-10, and guidance on fine-tuning for single-camera datasets. * add one-shot script to convert ginwind/VLA-JEPA checkpoints to safetensors (will remove once migrated) * make default params more aligned with paper and pretrained models - adding possibility of freezing qwen backbone and world model - added tests for weight loading * trying out to re-init the action head to avoid pretraining dimension mismatch * allow different state dim and action dim * removing missleading future_action_window_size to just use chunk_size * lots of changes to make existing weights work, need to massively refactor the pre and post processing * refactoring into using pre and post processor * pre-commit cleanup * fixing doc defaults args Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> * adressing dtype zeros issue * adding guard for diffusers * fixing training and exal examples * trying to close success rate gap * fix qwen norm layer output libero eval is now as expected * adding instructions for different embodiement + fixing some tests * smol fix to avoid having default CPU device when training * fixing misconception about multiview / singleview handling * removing conversion script * adding licences * adding .mdx docs and shortening polivy_vla_jepa_README.md * removing useless pre-processor * cleanup * removing swish in favor of silu * adding configuration gripper index and threshold * fixing simlink --------- Signed-off-by: Maxime Ellerbach <maxime@ellerbach.net> Co-authored-by: ginwind 
<ginwind@mail.ustc.edu.cn>
183 lines
4.3 KiB
YAML
- sections:
    - local: index
      title: LeRobot
    - local: installation
      title: Installation
    - local: cheat-sheet
      title: Cheat sheet
  title: Get started
- sections:
    - local: il_robots
      title: Imitation Learning for Robots
    - local: lelab
      title: LeLab - Lerobot GUI
    - local: bring_your_own_policies
      title: Adding a Policy
    - local: integrate_hardware
      title: Bring Your Own Hardware
    - local: hilserl
      title: Train a Robot with RL
    - local: hilserl_sim
      title: Train RL in Simulation
    - local: multi_gpu_training
      title: Multi GPU training
    - local: hil_data_collection
      title: Human In the Loop Data Collection
    - local: peft_training
      title: Training with PEFT (e.g., LoRA)
    - local: rename_map
      title: Using Rename Map and Empty Cameras
  title: "Tutorials"
- sections:
    - local: hardware_guide
      title: Compute Hardware Guide
    - local: torch_accelerators
      title: PyTorch accelerators
  title: "Compute & Hardware"
- sections:
    - local: lerobot-dataset-v3
      title: Using LeRobotDataset
    - local: porting_datasets_v3
      title: Porting Large Datasets
    - local: using_dataset_tools
      title: Using the Dataset Tools
    - local: language_and_recipes
      title: Language Columns and Recipes
    - local: tools
      title: Tools
    - local: video_encoding_parameters
      title: Video encoding parameters
    - local: streaming_video_encoding
      title: Streaming Video Encoding
  title: "Datasets"
- sections:
    - local: act
      title: ACT
    - local: smolvla
      title: SmolVLA
    - local: pi0
      title: π₀ (Pi0)
    - local: pi0fast
      title: π₀-FAST (Pi0Fast)
    - local: pi05
      title: π₀.₅ (Pi05)
    - local: molmoact2
      title: MolmoAct2
    - local: vla_jepa
      title: VLA-JEPA
    - local: eo1
      title: EO-1
    - local: groot
      title: NVIDIA GR00T N1.5
    - local: xvla
      title: X-VLA
    - local: multi_task_dit
      title: Multitask DiT Policy
    - local: walloss
      title: WALL-OSS
  title: "Policies"
- sections:
    - local: sarm
      title: SARM
    - local: robometer
      title: ROBOMETER
    - local: topreward
      title: TOPReward
  title: "Reward Models"
- sections:
    - local: inference
      title: Policy Deployment (lerobot-rollout)
    - local: async
      title: Use Async Inference
    - local: rtc
      title: Real-Time Chunking (RTC)
  title: "Inference"
- sections:
    - local: envhub
      title: Environments from the Hub
    - local: envhub_leisaac
      title: Control & Train Robots in Sim (LeIsaac)
  title: "Simulation"
- sections:
    - local: adding_benchmarks
      title: Adding a New Benchmark
    - local: libero
      title: LIBERO
    - local: libero_plus
      title: LIBERO-plus
    - local: metaworld
      title: Meta-World
    - local: robotwin
      title: RoboTwin 2.0
    - local: robocasa
      title: RoboCasa365
    - local: robocerebra
      title: RoboCerebra
    - local: robomme
      title: RoboMME
    - local: envhub_isaaclab_arena
      title: NVIDIA IsaacLab Arena Environments
    - local: vlabench
      title: VLABench
  title: "Benchmarks"
- sections:
    - local: introduction_processors
      title: Introduction to Robot Processors
    - local: debug_processor_pipeline
      title: Debug your processor pipeline
    - local: implement_your_own_processor
      title: Implement your own processor
    - local: processors_robots_teleop
      title: Processors for Robots and Teleoperators
    - local: env_processor
      title: Environment Processors
    - local: action_representations
      title: Action Representations
  title: "Robot Processors"
- sections:
    - local: so101
      title: SO-101
    - local: so100
      title: SO-100
    - local: koch
      title: Koch v1.1
    - local: lekiwi
      title: LeKiwi
    - local: hope_jr
      title: Hope Jr
    - local: reachy2
      title: Reachy 2
    - local: unitree_g1
      title: Unitree G1
    - local: earthrover_mini_plus
      title: Earth Rover Mini
    - local: omx
      title: OMX
    - local: openarm
      title: OpenArm
    - local: rebot_b601
      title: reBot B601-DM
  title: "Robots"
- sections:
    - local: phone_teleop
      title: Phone
  title: "Teleoperators"
- sections:
    - local: cameras
      title: Cameras
  title: "Sensors"
- sections:
    - local: notebooks
      title: Notebooks
    - local: feetech
      title: Updating Feetech Firmware
    - local: damiao
      title: Damiao Motors and CAN Bus
  title: "Resources"
- sections:
    - local: contributing
      title: Contribute to LeRobot
    - local: backwardcomp
      title: Backward compatibility
  title: "About"