fix a bug for kwargs in wallx (#2714)

* support wallx
* fix bugs in flow
* incorporate wallx model into lerobot
* update the policy methods
* reduce to least config and params & pass lerobot basic test
* fixed dtype bugs
* add wallx dependencies
* update
* remove flash-attn requirement && fix bug in inference and fast mode
* fix bug for inference
* add some small modifications
* fix pre-commit errors
* remove lerobot[wallx]
* fix ci
* fix precommit issues
* fix: exclude wallx extra properly in CI workflows
* fix: add uv conflicts for wallx transformers version
* fix: peft test import
* pre-commit
* only export WallXConfig from wall_x package to avoid peft import in CI
* remove torch dep
* precommit
* add import
* update doc files
* fix minor errors
* fix a bug for kwargs
* fix precommit issue

---------

Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: vincentchen <chenlufang@x2robot.com>
Co-authored-by: Geoffrey19 <sympathischmann35@gmail.com>
Co-authored-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Pepijn <pepijn@huggingface.co>
Co-authored-by: geoffrey <geoffrey@x2robot.com>
2026-07-08 02:22:02 +00:00 · 2026-01-06 22:13:35 +08:00
parent 6106a8136c
commit 603d44434f
2 changed files with 15 additions and 5 deletions
@@ -1,20 +1,30 @@
 # WALL-OSS

-This repository contains the Hugging Face port of **WALL-OSS**, a Vision-Language-Action model for cross-embodiment robotic control based on Qwen2.5-VL with flow matching/FAST action prediction.
+This repository contains the Hugging Face port of [**WALL-OSS**](https://x2robot.com/en/research/68bc2cde8497d7f238dde690), a Vision-Language-Action model for cross-embodiment robotic control based on Qwen2.5-VL with flow matching/FAST action prediction.

 ---

 ## Model Overview

 | Feature            | Description                                           |
-| ------------------ | ----------------------------------------------------- | --- |
+| ------------------ | ----------------------------------------------------- |
 | Base Model         | Qwen2.5-VL (Vision-Language Model)                    |
 | Action Prediction  | Flow Matching (diffusion) or FAST (discrete tokens)   |
-| Architecture       | Mixture of Experts (MoE) with action-specific routing |     |
+| Architecture       | Mixture of Experts (MoE) with action-specific routing |
 | Multi-Modal Inputs | Vision (images/videos), Language, Proprioception      |

 ---

+## Additional Resources
+
+Paper: https://arxiv.org/pdf/2509.11766
+
+Official Repository: https://github.com/X-Square-Robot/wall-x
+
+Hugging Face: https://huggingface.co/x-square-robot
+
+---
+
 ## Citation

 If you use this work, please cite:
@@ -32,4 +42,4 @@ If you use this work, please cite:

 ## License

-This port follows the **Apache 2.0 License**.
+This model follows the **Apache 2.0 License**, consistent with the original [WallX repository](https://github.com/X-Square-Robot/wall-x).
@@ -1697,7 +1697,7 @@ class WallXPolicy(PreTrainedPolicy):
    config_class = WallXConfig
    name = "wall_x"

-    def __init__(self, config: WallXConfig):
+    def __init__(self, config: WallXConfig, **kwargs):
        super().__init__(config)
        config.validate_features()
        self.config = config
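
Reviewer note on the `**kwargs` change: the commit does not say which caller triggered the bug, so the following is only a minimal sketch of the general failure mode, assuming a factory that forwards extra keyword arguments to every policy constructor. All names here (`BasePolicy`, `StrictPolicy`, `TolerantPolicy`, `make_policy`, `dataset_stats`) are hypothetical stand-ins, not the lerobot API.

```python
class BasePolicy:
    """Hypothetical stand-in for a policy base class (not lerobot's)."""

    def __init__(self, config):
        self.config = config


class StrictPolicy(BasePolicy):
    # Mirrors the old signature: only `config` is accepted.
    def __init__(self, config):
        super().__init__(config)


class TolerantPolicy(BasePolicy):
    # Mirrors the new signature: extra keyword arguments are accepted
    # and ignored, and only `config` is passed to the base class.
    def __init__(self, config, **kwargs):
        super().__init__(config)


def make_policy(policy_cls, config, **extra):
    """Hypothetical factory that forwards extra keywords to the constructor."""
    return policy_cls(config, **extra)


config = {"name": "wall_x"}

# The strict constructor rejects the unexpected keyword with a TypeError.
try:
    make_policy(StrictPolicy, config, dataset_stats=None)
    strict_ok = True
except TypeError:
    strict_ok = False

# The tolerant constructor accepts the same call.
tolerant = make_policy(TolerantPolicy, config, dataset_stats=None)
```

Accepting and discarding `**kwargs` keeps the constructor compatible with such callers, at the cost of silently ignoring misspelled keyword names.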