refactor to use relative state

2026-05-20 11:09:59 +00:00 · 2026-04-01 17:23:58 +02:00
parent 0fc855df13
commit 58bd11caf3
14 changed files with 502 additions and 296 deletions
@@ -202,11 +202,22 @@ Here is how the different processors compose. Each arrow is a processor step, an
                    └─────────────────────────────────────────┘

                    ┌─────────────────────────────────────────┐
-   Representation   │   Absolute  ←────→  Relative            │
+   State Derivation │   Action column  ────→  State + Action  │
+                    │   DeriveStateFromActionStep (pre only)  │
+                    │   (UMI-style: state from action chunk)  │
+                    └─────────────────────────────────────────┘
+
+                    ┌─────────────────────────────────────────┐
+   Action Repr.     │   Absolute  ←────→  Relative            │
                    │   RelativeActionsProcessorStep (pre)    │
                    │   AbsoluteActionsProcessorStep (post)   │
                    └─────────────────────────────────────────┘

+                    ┌─────────────────────────────────────────┐
+   State Repr.      │   Absolute  ────→  Relative             │
+                    │   RelativeStateProcessorStep (pre only) │
+                    └─────────────────────────────────────────┘
+
                    ┌─────────────────────────────────────────┐
   Normalization    │   Raw  ←────→  Normalized               │
                    │   NormalizerProcessorStep (pre)         │
@@ -216,6 +227,10 @@ Here is how the different processors compose. Each arrow is a processor step, an

 A typical training preprocessor might chain: `raw absolute joint actions → relative → normalize`. A typical inference postprocessor: `unnormalize → absolute → (optionally IK to joints)`.
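
The chain above can be sketched in plain Python for joint-space actions. This is a minimal illustration, not the processor steps' actual implementation. The function names are hypothetical. "Relative" is assumed to mean element-wise subtraction of the current joint state from every action in the chunk. Normalization is assumed to be mean/std standardization.

```python
from typing import List

Vector = List[float]


def to_relative(chunk: List[Vector], current_state: Vector) -> List[Vector]:
    # Hypothetical: express every action in the chunk as an offset from the current state.
    return [[a - s for a, s in zip(action, current_state)] for action in chunk]


def to_absolute(chunk: List[Vector], current_state: Vector) -> List[Vector]:
    # Inverse of to_relative: add the current state back to each offset.
    return [[a + s for a, s in zip(action, current_state)] for action in chunk]


def normalize(chunk: List[Vector], mean: Vector, std: Vector) -> List[Vector]:
    # Assumed mean/std standardization, applied per dimension.
    return [[(x - m) / s for x, m, s in zip(row, mean, std)] for row in chunk]


def unnormalize(chunk: List[Vector], mean: Vector, std: Vector) -> List[Vector]:
    # Inverse of normalize.
    return [[x * s + m for x, m, s in zip(row, mean, std)] for row in chunk]


def preprocess(chunk: List[Vector], current_state: Vector,
               mean: Vector, std: Vector) -> List[Vector]:
    # Training: raw absolute joint actions -> relative -> normalize.
    return normalize(to_relative(chunk, current_state), mean, std)


def postprocess(model_output: List[Vector], current_state: Vector,
                mean: Vector, std: Vector) -> List[Vector]:
    # Inference: unnormalize -> absolute (IK to joints would follow for EE actions).
    return to_absolute(unnormalize(model_output, mean, std), current_state)
```

The postprocessor applies the exact inverses in reverse order, so a round trip recovers the original chunk. For end-effector poses, UMI defines relative actions as pose transforms with respect to the current pose rather than as element-wise differences. Read the subtraction here as the joint-space special case.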

+With UMI-style relative proprioception (`use_relative_state=True`), the preprocessor also expresses `observation.state` as offsets from the current timestep's state, via `RelativeStateProcessorStep`, before normalization. This step runs in preprocessing only: state is a model input, never an output, so the postprocessor has nothing to convert back.
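
The relative-state conversion can be sketched the same way. The following assumptions are hypothetical and for illustration only. The state history is a list of vectors ordered oldest to newest, so the last row is the current timestep. An offset is an element-wise difference.

```python
from typing import List

Vector = List[float]


def relative_state(history: List[Vector]) -> List[Vector]:
    """Express each state in the history as an offset from the newest one.

    Hypothetical helper: assumes history[-1] is the current timestep and that
    offsets are element-wise differences.
    """
    if not history:
        raise ValueError("state history is empty")
    current = history[-1]
    return [[x - c for x, c in zip(state, current)] for state in history]
```

The current row always becomes zero, so after this step the state encodes only how the robot moved to reach its current configuration, not where it is in absolute terms. No inverse is needed because nothing downstream consumes a reconstructed absolute state.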
+
+With `derive_state_from_action=True`, the preprocessor first runs `DeriveStateFromActionStep` to extract a state spanning two timesteps from the extended action chunk. This enables full UMI-style training without a separate `observation.state` column. See the [UMI pi0 guide](umi_pi0_relative_ee) for details.
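
Below is a hypothetical sketch of the split a derive-state step performs. The chunk layout is an assumption, not taken from `DeriveStateFromActionStep`. The action chunk is assumed to be extended backwards by one past timestep. Rows are assumed to be absolute end-effector actions. The commanded pose at a timestep is assumed to stand in for the robot's state at that timestep.

```python
from typing import List, Tuple

Vector = List[float]


def derive_state_from_action(
    extended_chunk: List[Vector], n_state: int = 2
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical split of an extended action chunk into (state, action).

    Assumed layout: rows are absolute actions at timesteps
    t-(n_state-1), ..., t, t+1, ..., i.e. the chunk is extended backwards by
    n_state - 1 past steps. The first n_state rows (ending at the current
    timestep t) double as the state; the action chunk keeps timestep t onward.
    """
    if n_state < 1 or len(extended_chunk) < n_state:
        raise ValueError("chunk is too short for the requested state length")
    state = extended_chunk[:n_state]
    action = extended_chunk[n_state - 1:]
    return state, action
```

Under this assumed layout, only the past rows are dropped from the action chunk, and the derived state can then go through the same relative-state conversion as a regular `observation.state`.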
+
 ## References

 - [Universal Manipulation Interface (UMI)](https://arxiv.org/abs/2402.10329) - Chi et al., 2024. Defines the relative trajectory action representation and compares it with absolute and delta actions.