feature(pipeline): port tokenizer pipeline for VLA (#1645)

* feat(tokenizer): Introduce TokenizerProcessor for text tokenization
  - Added TokenizerProcessor class to handle tokenization of task strings using Hugging Face's AutoTokenizer.
  - Supports both string and list inputs, with customizable parameters for task key, output key, and tokenization settings.
  - Implemented comprehensive unit tests to validate functionality, including handling of various input scenarios and integration with RobotProcessor.
  - Updated types.py to include LANGUAGE feature type and modified __init__.py to register the new processor.
* feat(language): Enhance language processing in TokenizerProcessor
  - Added OBS_LANGUAGE constant to define the observation language key.
  - Updated TokenizerProcessor to store tokenized task data in the observation dictionary, ensuring compatibility with the new language feature.
  - Introduced Pi0NewLineProcessor to append newlines to tasks for proper tokenization.
  - Modified tests to validate the integration of language tokens and attention masks in the observation structure.
* feat(tokenizer): Add padding configuration to TokenizerProcessor
  - Introduced `padding_side` parameter to the TokenizerProcessor for customizable padding direction.
  - Updated the `make_pi0_processor` function to include the new padding configuration.
  - Enhanced unit tests to validate the functionality of the `padding_side` parameter in various scenarios.
* feat(processor): Add state management methods to Pi0NewLineProcessor
* feat(normalization): Track normalization and unnormalization info in complementary data
  - Updated NormalizerProcessor and UnnormalizerProcessor to accept additional parameters for tracking normalization modes.
  - Enhanced the __call__ methods to store normalization and unnormalization information in the complementary data of transitions.
  - Added unit tests to verify the correct tracking of normalization info, including scenarios with missing stats and selective normalization keys.
* feat(factory): Add preprocessor and postprocessor overrides to ProcessorConfigKwargs
  - Updated ProcessorConfigKwargs to include optional overrides for preprocessor and postprocessor configurations.
  - Enhanced the make_processor function to utilize the new overrides, allowing for more flexible processor initialization.
* feat(processors): Integrate RenameProcessor into various processor configurations
  - Added RenameProcessor to the input steps of multiple processor functions, including make_act_processor, make_diffusion_processor, make_pi0_processor, make_sac_processor, make_tdmpc_processor, make_vqbet_processor, and make_smolvla_processor.
  - Consolidated normalization features from input and output into a single NormalizerProcessor for improved efficiency.
  - Updated the input steps to ensure compatibility with the new RenameProcessor integration.
* feat(smolvla): Refactor language processing and introduce new line processor (#1658)
  - Removed the prepare_language method and directly accessed language tokens and masks from the batch using the OBS_LANGUAGE constant.
  - Added SmolVLANewLineProcessor to ensure tasks end with a newline, enhancing tokenization compatibility.
  - Updated the make_smolvla_processor function to include the new line processor and tokenizer processor for improved input handling.
* feature(policies): add device processor (#1659)
* feat(processors): Integrate DeviceProcessor into multiple processor configurations
  - Added DeviceProcessor to the input and output steps of various processor functions, including make_act_processor, make_diffusion_processor, make_pi0_processor, make_pi0fast_processor, make_sac_processor, make_tdmpc_processor, make_vqbet_processor, and make_smolvla_processor.
  - Enhanced the DeviceProcessor class with state management methods and ensured compatibility with existing processor pipelines.
  - Introduced unit tests for DeviceProcessor to validate functionality across different scenarios, including CPU and CUDA operations.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* refactor(pipeline): Remove to() method for device management
  - Eliminated the to() method from RobotProcessor, which was responsible for moving tensor states to specified devices.
  - Removed associated unit tests that validated the functionality of the to() method across various scenarios.
  - Streamlined the pipeline code by focusing on other device management strategies.
* feat(processor): Enhance DeviceProcessor with float dtype conversion
  - Added support for optional float dtype conversion in DeviceProcessor, allowing tensors to be converted to specified floating-point types while preserving non-float types.
  - Implemented validation for float dtype input and updated the processor's configuration methods to include float dtype.
  - Refactored tensor processing logic to streamline device movement and dtype conversion.
  - Introduced comprehensive unit tests to validate the new float dtype functionality across various scenarios.
* feat(policies): Add new line processors and update module exports
* feat(processor): Enhance batch and device processors to handle index and task_index fields
  - Added logic to ToBatchProcessor for unsqueezing 0D tensors for index and task_index fields, ensuring they are processed as 1D tensors.
  - Updated DeviceProcessor to process index and task_index fields in complementary data, preserving their tensor types and ensuring non-tensor fields remain unchanged.
  - Enhanced unit tests to validate the correct handling of index and task_index fields across various scenarios, including device compatibility and dtype preservation.
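The newline handling that Pi0NewLineProcessor and SmolVLANewLineProcessor perform before tokenization can be illustrated without any lerobot imports. The sketch below is a standalone approximation: the helper names `ensure_trailing_newline` and `apply_to_transition` are hypothetical, and the plain string key `"complementary_data"` stands in for `TransitionKey.COMPLEMENTARY_DATA`, which is an assumption made so the snippet runs on its own.

```python
from typing import Any


def ensure_trailing_newline(task: Any) -> Any:
    """Append a newline to a task string, or to each string in a list of tasks.

    Any other input (None, mixed lists, tensors, ...) is returned unchanged,
    which matches the pass-through branch of the processors in this commit.
    """
    if isinstance(task, str):
        return task if task.endswith("\n") else f"{task}\n"
    if isinstance(task, list) and all(isinstance(t, str) for t in task):
        return [t if t.endswith("\n") else f"{t}\n" for t in task]
    return task


def apply_to_transition(transition: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the processor's __call__ on a dict transition."""
    # "complementary_data" is an assumed key name used only for this sketch.
    complementary = transition.get("complementary_data")
    if complementary is None or complementary.get("task") is None:
        return transition  # nothing to normalize
    complementary["task"] = ensure_trailing_newline(complementary["task"])
    return transition
```

Keeping this step separate from tokenization means the tokenizer step itself stays model-agnostic, and only the policies whose prompts need the trailing newline add it to their pipeline.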
2026-05-19 10:40:04 +00:00 · 2025-08-05 10:53:08 +02:00
parent a1734cf575
commit 5326ffe77e
26 changed files with 2776 additions and 232 deletions
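Two of the factory changes in this commit rely on plain dictionary semantics that are easy to misread: the merged feature mapping passed to the single NormalizerProcessor, and the `kwargs.get(..., {})` defaults for the new override arguments. The following is a minimal sketch of both, assuming nothing about lerobot beyond the expressions visible in the diff; `LoaderKwargs`, `merge_features`, and `resolve_loading_args` are hypothetical names introduced for illustration.

```python
from typing import Any, Optional, TypedDict


class LoaderKwargs(TypedDict, total=False):
    """Hypothetical stand-in for the ProcessorConfigKwargs fields shown in the diff."""

    preprocessor_config_filename: Optional[str]
    postprocessor_config_filename: Optional[str]
    preprocessor_overrides: Optional[dict[str, Any]]
    postprocessor_overrides: Optional[dict[str, Any]]


def merge_features(
    input_features: dict[str, Any], output_features: dict[str, Any]
) -> dict[str, Any]:
    # Same expression the diff uses for the single normalizer step.
    # If a key appears in both mappings, the output feature wins
    # because it is unpacked last.
    return {**input_features, **output_features}


def resolve_loading_args(kwargs: LoaderKwargs) -> dict[str, dict[str, Any]]:
    # Mirrors the kwargs.get(...) defaults visible in make_processor.
    return {
        "preprocessor": {
            "config_filename": kwargs.get("preprocessor_config_filename", "preprocessor.json"),
            "overrides": kwargs.get("preprocessor_overrides", {}),
        },
        "postprocessor": {
            "config_filename": kwargs.get("postprocessor_config_filename", "postprocessor.json"),
            "overrides": kwargs.get("postprocessor_overrides", {}),
        },
    }
```

One consequence worth noting: `dict.get` only falls back to `{}` when the key is absent. Since the TypedDict permits an explicit `None`, a caller passing `preprocessor_overrides=None` would forward `None` rather than an empty dict. Whether `RobotProcessor.from_pretrained` tolerates that is not shown in this diff.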
@@ -24,6 +24,7 @@ class FeatureType(str, Enum):
    ENV = "ENV"
    ACTION = "ACTION"
    REWARD = "REWARD"
+    LANGUAGE = "LANGUAGE"


 class NormalizationMode(str, Enum):
@@ -21,6 +21,7 @@ OBS_ENV_STATE = "observation.environment_state"
 OBS_STATE = "observation.state"
 OBS_IMAGE = "observation.image"
 OBS_IMAGES = "observation.images"
+OBS_LANGUAGE = "observation.language"
 ACTION = "action"
 REWARD = "next.reward"

@@ -15,6 +15,17 @@
 from .act.configuration_act import ACTConfig as ACTConfig
 from .diffusion.configuration_diffusion import DiffusionConfig as DiffusionConfig
 from .pi0.configuration_pi0 import PI0Config as PI0Config
+from .pi0.processor_pi0 import Pi0NewLineProcessor
 from .smolvla.configuration_smolvla import SmolVLAConfig as SmolVLAConfig
+from .smolvla.processor_smolvla import SmolVLANewLineProcessor
 from .tdmpc.configuration_tdmpc import TDMPCConfig as TDMPCConfig
 from .vqbet.configuration_vqbet import VQBeTConfig as VQBeTConfig
+
+__all__ = [
+    "ACTConfig",
+    "DiffusionConfig",
+    "PI0Config",
+    "SmolVLAConfig",
+    "TDMPCConfig",
+    "VQBeTConfig",
+]
@@ -17,7 +17,9 @@ import torch

 from lerobot.policies.act.configuration_act import ACTConfig
 from lerobot.processor import (
+    DeviceProcessor,
    NormalizerProcessor,
+    RenameProcessor,
    RobotProcessor,
    ToBatchProcessor,
    UnnormalizerProcessor,
@@ -28,15 +30,17 @@ def make_act_processor(
    config: ACTConfig, dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None
 ) -> tuple[RobotProcessor, RobotProcessor]:
    input_steps = [
+        RenameProcessor(rename_map={}),
        NormalizerProcessor(
-            features=config.input_features, norm_map=config.normalization_mapping, stats=dataset_stats
-        ),
-        NormalizerProcessor(
-            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
+            features={**config.input_features, **config.output_features},
+            norm_map=config.normalization_mapping,
+            stats=dataset_stats,
        ),
        ToBatchProcessor(),
+        DeviceProcessor(device=config.device),
    ]
    output_steps = [
+        DeviceProcessor(device="cpu"),
        UnnormalizerProcessor(
            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
        ),
@@ -18,7 +18,9 @@ import torch

 from lerobot.policies.diffusion.configuration_diffusion import DiffusionConfig
 from lerobot.processor import (
+    DeviceProcessor,
    NormalizerProcessor,
+    RenameProcessor,
    RobotProcessor,
    ToBatchProcessor,
    UnnormalizerProcessor,
@@ -29,15 +31,17 @@ def make_diffusion_processor(
    config: DiffusionConfig, dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None
 ) -> tuple[RobotProcessor, RobotProcessor]:
    input_steps = [
+        RenameProcessor(rename_map={}),
        NormalizerProcessor(
-            features=config.input_features, norm_map=config.normalization_mapping, stats=dataset_stats
-        ),
-        NormalizerProcessor(
-            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
+            features={**config.input_features, **config.output_features},
+            norm_map=config.normalization_mapping,
+            stats=dataset_stats,
        ),
        ToBatchProcessor(),
+        DeviceProcessor(device=config.device),
    ]
    output_steps = [
+        DeviceProcessor(device="cpu"),
        UnnormalizerProcessor(
            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
        ),
@@ -17,7 +17,7 @@
 from __future__ import annotations

 import logging
-from typing import TypedDict
+from typing import Any, TypedDict

 from torch import nn
 from typing_extensions import Unpack
@@ -111,6 +111,8 @@ class ProcessorConfigKwargs(TypedDict, total=False):

    preprocessor_config_filename: str | None
    postprocessor_config_filename: str | None
+    preprocessor_overrides: dict[str, Any] | None
+    postprocessor_overrides: dict[str, Any] | None


 def make_processor(
@@ -142,10 +144,12 @@ def make_processor(
            RobotProcessor.from_pretrained(
                source=pretrained_path,
                config_filename=kwargs.get("preprocessor_config_filename", "preprocessor.json"),
+                overrides=kwargs.get("preprocessor_overrides", {}),
            ),
            RobotProcessor.from_pretrained(
                source=pretrained_path,
                config_filename=kwargs.get("postprocessor_config_filename", "postprocessor.json"),
+                overrides=kwargs.get("postprocessor_overrides", {}),
            ),
        )

@@ -56,9 +56,8 @@ from collections import deque
 import torch
 import torch.nn.functional as F  # noqa: N812
 from torch import Tensor, nn
-from transformers import AutoTokenizer

-from lerobot.constants import ACTION, OBS_STATE
+from lerobot.constants import ACTION, OBS_LANGUAGE, OBS_STATE
 from lerobot.policies.pi0.configuration_pi0 import PI0Config
 from lerobot.policies.pi0.paligemma_with_expert import (
    PaliGemmaWithExpertConfig,
@@ -226,16 +225,12 @@ class PI0Policy(PreTrainedPolicy):
        Args:
            config: Policy configuration class instance or None, in which case the default instantiation of
                    the configuration class is used.
-            dataset_stats: Dataset statistics to be used for normalization. If not passed here, it is expected
-                that they will be passed with a call to `load_state_dict` before the policy is used.
        """

        super().__init__(config)
        config.validate_features()
        self.config = config

-        # TODO(azouitine): Add tokenizer to pipeline
-        self.language_tokenizer = AutoTokenizer.from_pretrained("google/paligemma-3b-pt-224")
        self.model = PI0FlowMatching(config)

        self.reset()
@@ -280,7 +275,8 @@ class PI0Policy(PreTrainedPolicy):
        if len(self._action_queue) == 0:
            images, img_masks = self.prepare_images(batch)
            state = self.prepare_state(batch)
-            lang_tokens, lang_masks = self.prepare_language(batch)
+            lang_tokens = batch[f"{OBS_LANGUAGE}.tokens"]
+            lang_masks = batch[f"{OBS_LANGUAGE}.attention_mask"]

            actions = self.model.sample_actions(
                images, img_masks, lang_tokens, lang_masks, state, noise=noise
@@ -306,7 +302,8 @@ class PI0Policy(PreTrainedPolicy):

        images, img_masks = self.prepare_images(batch)
        state = self.prepare_state(batch)
-        lang_tokens, lang_masks = self.prepare_language(batch)
+        lang_tokens = batch[f"{OBS_LANGUAGE}.tokens"]
+        lang_masks = batch[f"{OBS_LANGUAGE}.attention_mask"]
        actions = self.prepare_action(batch)
        actions_is_pad = batch.get("action_is_pad")

@@ -373,26 +370,6 @@ class PI0Policy(PreTrainedPolicy):

        return images, img_masks

-    def prepare_language(self, batch) -> tuple[Tensor, Tensor]:
-        """Tokenize the text input"""
-        device = batch[OBS_STATE].device
-        tasks = batch["task"]
-
-        # PaliGemma prompt has to end with a new line
-        tasks = [task if task.endswith("\n") else f"{task}\n" for task in tasks]
-
-        tokenized_prompt = self.language_tokenizer.__call__(
-            tasks,
-            padding="max_length",
-            padding_side="right",
-            max_length=self.config.tokenizer_max_length,
-            return_tensors="pt",
-        )
-        lang_tokens = tokenized_prompt["input_ids"].to(device=device)
-        lang_masks = tokenized_prompt["attention_mask"].to(device=device, dtype=torch.bool)
-
-        return lang_tokens, lang_masks
-
    def _pi_aloha_decode_state(self, state):
        # Flip the joints.
        for motor_idx in [1, 2, 8, 9]:
@@ -458,7 +435,7 @@ class PI0FlowMatching(nn.Module):
    └──────────────────────────────┘
    """

-    def __init__(self, config):
+    def __init__(self, config: PI0Config):
        super().__init__()
        self.config = config

@@ -14,34 +14,107 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+from typing import Any
+
 import torch

+from lerobot.configs.types import PolicyFeature
 from lerobot.policies.pi0.configuration_pi0 import PI0Config
 from lerobot.processor import (
+    DeviceProcessor,
    NormalizerProcessor,
    RobotProcessor,
    ToBatchProcessor,
+    TokenizerProcessor,
    UnnormalizerProcessor,
 )
+from lerobot.processor.pipeline import (
+    EnvTransition,
+    ProcessorStep,
+    ProcessorStepRegistry,
+    TransitionKey,
+)
+from lerobot.processor.rename_processor import RenameProcessor
+
+
+@ProcessorStepRegistry.register(name="pi0_new_line_processor")
+class Pi0NewLineProcessor(ProcessorStep):
+    """Add a new line to the end of the task if it doesn't have one.
+    This is required for the PaliGemma tokenizer.
+    """
+
+    def __call__(self, transition: EnvTransition) -> EnvTransition:
+        # Check if complementary_data exists
+        complementary_data = transition.get(TransitionKey.COMPLEMENTARY_DATA)
+        if complementary_data is None or "task" not in complementary_data:
+            return transition
+
+        task = complementary_data["task"]
+        if task is None:
+            return transition
+
+        # Handle both string and list of strings
+        if isinstance(task, str):
+            # Single string: add newline if not present
+            if not task.endswith("\n"):
+                complementary_data["task"] = f"{task}\n"
+        elif isinstance(task, list) and all(isinstance(t, str) for t in task):
+            # List of strings: add newline to each if not present
+            complementary_data["task"] = [t if t.endswith("\n") else f"{t}\n" for t in task]
+        # If task is neither string nor list of strings, leave unchanged
+
+        return transition
+
+    def feature_contract(self, features: dict[str, PolicyFeature]) -> dict[str, PolicyFeature]:
+        """Return the features unchanged; this step only edits the task string."""
+        return features
+
+    def state_dict(self) -> dict[str, torch.Tensor]:
+        """Return state dictionary (empty for this processor)."""
+        return {}
+
+    def load_state_dict(self, state: dict[str, torch.Tensor]) -> None:
+        """Load state dictionary (no-op for this processor)."""
+        pass
+
+    def reset(self) -> None:
+        """Reset processor state (no-op for this processor)."""
+        pass
+
+    def get_config(self) -> dict[str, Any]:
+        """Return configuration for serialization."""
+        return {}


 def make_pi0_processor(
    config: PI0Config, dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None
 ) -> tuple[RobotProcessor, RobotProcessor]:
-    input_steps = [
+    # Add remaining processors
+    input_steps: list[ProcessorStep] = [
+        RenameProcessor(rename_map={}),  # To mimic the same processor as pretrained one
        NormalizerProcessor(
-            features=config.input_features, norm_map=config.normalization_mapping, stats=dataset_stats
-        ),
-        NormalizerProcessor(
-            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
+            features={**config.input_features, **config.output_features},
+            norm_map=config.normalization_mapping,
+            stats=dataset_stats,
        ),
        ToBatchProcessor(),
+        Pi0NewLineProcessor(),  # Add newlines before tokenization for PaliGemma
+        TokenizerProcessor(
+            tokenizer_name="google/paligemma-3b-pt-224",
+            max_length=config.tokenizer_max_length,
+            padding_side="right",
+            padding="max_length",
+        ),
+        DeviceProcessor(device=config.device),
    ]
-    output_steps = [
+
+    output_steps: list[ProcessorStep] = [
+        DeviceProcessor(device="cpu"),
        UnnormalizerProcessor(
            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
        ),
    ]
+
    return RobotProcessor(steps=input_steps, name="pi0_preprocessor"), RobotProcessor(
        steps=output_steps, name="pi0_postprocessor"
    )
@@ -18,7 +18,9 @@ import torch

 from lerobot.policies.pi0.configuration_pi0 import PI0Config
 from lerobot.processor import (
+    DeviceProcessor,
    NormalizerProcessor,
+    RenameProcessor,
    RobotProcessor,
    ToBatchProcessor,
    UnnormalizerProcessor,
@@ -29,15 +31,17 @@ def make_pi0_processor(
    config: PI0Config, dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None
 ) -> tuple[RobotProcessor, RobotProcessor]:
    input_steps = [
+        RenameProcessor(rename_map={}),  # To mimic the same processor as pretrained one
        NormalizerProcessor(
-            features=config.input_features, norm_map=config.normalization_mapping, stats=dataset_stats
-        ),
-        NormalizerProcessor(
-            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
+            features={**config.input_features, **config.output_features},
+            norm_map=config.normalization_mapping,
+            stats=dataset_stats,
        ),
        ToBatchProcessor(),
+        DeviceProcessor(device=config.device),
    ]
    output_steps = [
+        DeviceProcessor(device="cpu"),
        UnnormalizerProcessor(
            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
        ),
@@ -19,7 +19,9 @@ import torch

 from lerobot.policies.sac.configuration_sac import SACConfig
 from lerobot.processor import (
+    DeviceProcessor,
    NormalizerProcessor,
+    RenameProcessor,
    RobotProcessor,
    ToBatchProcessor,
    UnnormalizerProcessor,
@@ -30,15 +32,17 @@ def make_sac_processor(
    config: SACConfig, dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None
 ) -> tuple[RobotProcessor, RobotProcessor]:
    input_steps = [
+        RenameProcessor(rename_map={}),
        NormalizerProcessor(
-            features=config.input_features, norm_map=config.normalization_mapping, stats=dataset_stats
-        ),
-        NormalizerProcessor(
-            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
+            features={**config.input_features, **config.output_features},
+            norm_map=config.normalization_mapping,
+            stats=dataset_stats,
        ),
        ToBatchProcessor(),
+        DeviceProcessor(device=config.device),
    ]
    output_steps = [
+        DeviceProcessor(device="cpu"),
        UnnormalizerProcessor(
            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
        ),
@@ -17,6 +17,7 @@ import torch

 from lerobot.policies.sac.reward_model.configuration_classifier import RewardClassifierConfig
 from lerobot.processor import (
+    DeviceProcessor,
    IdentityProcessor,
    NormalizerProcessor,
    RobotProcessor,
@@ -33,8 +34,9 @@ def make_classifier_processor(
        NormalizerProcessor(
            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
        ),
+        DeviceProcessor(device=config.device),
    ]
-    output_steps = [IdentityProcessor()]
+    output_steps = [DeviceProcessor(device="cpu"), IdentityProcessor()]
    return RobotProcessor(steps=input_steps, name="classifier_preprocessor"), RobotProcessor(
        steps=output_steps, name="classifier_postprocessor"
    )
@@ -53,17 +53,13 @@ policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
 """

 import math
-import os
-import re
 from collections import deque

-import safetensors
 import torch
 import torch.nn.functional as F  # noqa: N812
 from torch import Tensor, nn
-from transformers import AutoProcessor

-from lerobot.constants import ACTION, OBS_STATE
+from lerobot.constants import ACTION, OBS_LANGUAGE, OBS_STATE
 from lerobot.policies.pretrained import PreTrainedPolicy
 from lerobot.policies.smolvla.configuration_smolvla import SmolVLAConfig
 from lerobot.policies.smolvla.smolvlm_with_expert import SmolVLMWithExpertModel
@@ -72,102 +68,6 @@ from lerobot.policies.utils import (
 )
 from lerobot.utils.utils import get_safe_dtype

-# Matches ".soNNN", optionally followed by "-something", up to the "_buffer_" marker
-_VARIANT_RE = re.compile(r"\.so\d+(?:-[\w]+)?_buffer_")
-
-
-def canonicalise(k: str) -> str:
-    """
-    Remove dataset-variant markers like '.so100-blue_' or '.so100_' from a
-    normalisation-buffer key.
-    """
-    return _VARIANT_RE.sub(".buffer_", k)
-
-
-def standardise_state_dict(
-    checkpoint: dict[str, torch.Tensor], ref_keys: set[str], *, verbose: bool = True
-) -> tuple[dict[str, torch.Tensor], list[str]]:
-    """
-    • Re-keys `checkpoint ` so that every entry matches the *reference* key set.
-    • If several variant keys collapse to the same canonical name we keep the
-      first one and log the collision.
-    • Returns the new dict + a list of entries that could not be matched.
-    """
-    out, collisions, unmatched = {}, {}, []
-
-    for k, v in checkpoint.items():
-        canon = canonicalise(k)
-        if canon in ref_keys:
-            if canon in out:  # duplicate after collapsing
-                collisions.setdefault(canon, []).append(k)
-            else:
-                out[canon] = v
-        else:
-            unmatched.append(k)
-
-    if verbose:
-        for canon, variants in collisions.items():
-            print(f"[standardise_state_dict] '{canon}'  ←  {variants}")
-        if unmatched:
-            print(f"[standardise_state_dict] kept {len(unmatched)} unmatched keys")
-
-    out.update({k: checkpoint[k] for k in unmatched})
-    return out, unmatched
-
-
-def rename_checkpoint_keys(checkpoint: dict, rename_str: str):
-    """
-    Renames keys in a checkpoint dictionary based on the given rename string.
-
-    Args:
-        checkpoint (dict): The checkpoint dictionary.
-        rename_str (str): A string specifying key mappings in the format "old1//new1,old2//new2".
-
-    Returns:
-        dict: The modified checkpoint with renamed keys.
-    """
-
-    rename_dict = dict(pair.split("//") for pair in rename_str.split(","))
-
-    new_checkpoint = {}
-    for k, v in checkpoint.items():
-        for old_key, new_key in rename_dict.items():
-            if old_key in k:
-                k = k.replace(old_key, new_key)
-        new_checkpoint[k] = v
-    return new_checkpoint
-
-
-def load_smolvla(
-    model: torch.nn.Module,
-    filename: str | os.PathLike,
-    *,
-    device: str = "cpu",
-    checkpoint_keys_mapping: str = "",
-) -> torch.nn.Module:
-    state_dict = safetensors.torch.load_file(filename, device=device)
-
-    # Optional user-supplied renames (e.g. "model._orig_mod.//model.")
-    if checkpoint_keys_mapping and "//" in checkpoint_keys_mapping:
-        state_dict = rename_checkpoint_keys(state_dict, checkpoint_keys_mapping)
-
-    state_dict, _ = standardise_state_dict(state_dict, set(model.state_dict().keys()))
-
-    # HACK(aliberts): to not overwrite normalization parameters as they should come from the dataset
-    norm_keys = ("normalize_inputs", "normalize_targets", "unnormalize_outputs")
-    state_dict = {k: v for k, v in state_dict.items() if not k.startswith(norm_keys)}
-
-    missing, unexpected = model.load_state_dict(state_dict, strict=False)
-
-    if not all(key.startswith(norm_keys) for key in missing) or unexpected:
-        raise RuntimeError(
-            "SmolVLA %d missing / %d unexpected keys",
-            len(missing),
-            len(unexpected),
-        )
-
-    return model
-

 def create_sinusoidal_pos_embedding(
    time: torch.tensor, dimension: int, min_period: float, max_period: float, device="cpu"
@@ -333,7 +233,6 @@ class SmolVLAPolicy(PreTrainedPolicy):
        config.validate_features()
        self.config = config

-        self.language_tokenizer = AutoProcessor.from_pretrained(self.config.vlm_model_name).tokenizer
        self.model = VLAFlowMatching(config)
        self.reset()

@@ -343,23 +242,6 @@ class SmolVLAPolicy(PreTrainedPolicy):
            ACTION: deque(maxlen=self.config.n_action_steps),
        }

-    # HACK(aliberts, danaaubakirova): we overwrite this classmethod here to fix smolVLA-specific issues
-    @classmethod
-    def _load_as_safetensor(
-        cls,
-        model: "SmolVLAPolicy",
-        model_file: str,
-        map_location: str,
-        strict: bool,
-    ):
-        safetensors.torch.load_model(model, model_file, strict=strict, device=map_location)
-        return load_smolvla(
-            model,
-            model_file,
-            device=map_location,
-            checkpoint_keys_mapping="model._orig_mod.//model.",
-        )
-
    def get_optim_params(self) -> dict:
        return self.parameters()

@@ -375,7 +257,8 @@ class SmolVLAPolicy(PreTrainedPolicy):

        images, img_masks = self.prepare_images(batch)
        state = self.prepare_state(batch)
-        lang_tokens, lang_masks = self.prepare_language(batch)
+        lang_tokens = batch[f"{OBS_LANGUAGE}.tokens"]
+        lang_masks = batch[f"{OBS_LANGUAGE}.attention_mask"]

        actions = self.model.sample_actions(images, img_masks, lang_tokens, lang_masks, state, noise=noise)

@@ -435,7 +318,8 @@ class SmolVLAPolicy(PreTrainedPolicy):

        images, img_masks = self.prepare_images(batch)
        state = self.prepare_state(batch)
-        lang_tokens, lang_masks = self.prepare_language(batch)
+        lang_tokens = batch[f"{OBS_LANGUAGE}.tokens"]
+        lang_masks = batch[f"{OBS_LANGUAGE}.attention_mask"]
        actions = self.prepare_action(batch)
        actions_is_pad = batch.get("actions_id_pad")
        loss_dict = {}
@@ -499,30 +383,6 @@ class SmolVLAPolicy(PreTrainedPolicy):
            img_masks.append(mask)
        return images, img_masks

-    def prepare_language(self, batch) -> tuple[Tensor, Tensor]:
-        """Tokenize the text input"""
-        device = batch[OBS_STATE].device
-        tasks = batch["task"]
-        if isinstance(tasks, str):
-            tasks = [tasks]
-
-        if len(tasks) == 1:
-            tasks = [tasks[0] for _ in range(batch[OBS_STATE].shape[0])]
-
-        tasks = [task if task.endswith("\n") else f"{task}\n" for task in tasks]
-
-        tokenized_prompt = self.language_tokenizer.__call__(
-            tasks,
-            padding=self.config.pad_language_to,
-            padding_side="right",
-            max_length=self.config.tokenizer_max_length,
-            return_tensors="pt",
-        )
-        lang_tokens = tokenized_prompt["input_ids"].to(device=device)
-        lang_masks = tokenized_prompt["attention_mask"].to(device=device, dtype=torch.bool)
-
-        return lang_tokens, lang_masks
-
    def _pi_aloha_decode_state(self, state):
        # Flip the joints.
        for motor_idx in [1, 2, 8, 9]:
@@ -13,30 +13,46 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+from typing import Any
+
 import torch

+from lerobot.configs.types import PolicyFeature
 from lerobot.policies.smolvla.configuration_smolvla import SmolVLAConfig
 from lerobot.processor import (
+    DeviceProcessor,
    NormalizerProcessor,
+    RenameProcessor,
    RobotProcessor,
    ToBatchProcessor,
+    TokenizerProcessor,
    UnnormalizerProcessor,
 )
+from lerobot.processor.pipeline import EnvTransition, ProcessorStep, ProcessorStepRegistry, TransitionKey


 def make_smolvla_processor(
    config: SmolVLAConfig, dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None
 ) -> tuple[RobotProcessor, RobotProcessor]:
    input_steps = [
+        RenameProcessor(rename_map={}),  # To mimic the same processor as pretrained one
        NormalizerProcessor(
-            features=config.input_features, norm_map=config.normalization_mapping, stats=dataset_stats
-        ),
-        NormalizerProcessor(
-            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
+            features={**config.input_features, **config.output_features},
+            norm_map=config.normalization_mapping,
+            stats=dataset_stats,
        ),
        ToBatchProcessor(),
+        SmolVLANewLineProcessor(),
+        TokenizerProcessor(
+            tokenizer_name=config.vlm_model_name,
+            padding=config.pad_language_to,
+            padding_side="right",
+            max_length=config.tokenizer_max_length,
+        ),
+        DeviceProcessor(device=config.device),
    ]
    output_steps = [
+        DeviceProcessor(device="cpu"),
        UnnormalizerProcessor(
            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
        ),
@@ -44,3 +60,50 @@ def make_smolvla_processor(
    return RobotProcessor(steps=input_steps, name="smolvla_preprocessor"), RobotProcessor(
        steps=output_steps, name="smolvla_postprocessor"
    )
+
+
+@ProcessorStepRegistry.register(name="smolvla_new_line_processor")
+class SmolVLANewLineProcessor(ProcessorStep):
+    """Add a new line to the end of the task if it doesn't have one."""
+
+    def __call__(self, transition: EnvTransition) -> EnvTransition:
+        # Check if complementary_data exists
+        complementary_data = transition.get(TransitionKey.COMPLEMENTARY_DATA)
+        if complementary_data is None or "task" not in complementary_data:
+            return transition
+
+        task = complementary_data["task"]
+        if task is None:
+            return transition
+
+        # Handle both string and list of strings
+        if isinstance(task, str):
+            # Single string: add newline if not present
+            if not task.endswith("\n"):
+                complementary_data["task"] = f"{task}\n"
+        elif isinstance(task, list) and all(isinstance(t, str) for t in task):
+            # List of strings: add newline to each if not present
+            complementary_data["task"] = [t if t.endswith("\n") else f"{t}\n" for t in task]
+        # If task is neither string nor list of strings, leave unchanged
+
+        return transition
+
+    def feature_contract(self, features: dict[str, PolicyFeature]) -> dict[str, PolicyFeature]:
+        """Return the feature contract unchanged; this step adds no features."""
+        return features
+
+    def state_dict(self) -> dict[str, torch.Tensor]:
+        """Return state dictionary (empty for this processor)."""
+        return {}
+
+    def load_state_dict(self, state: dict[str, torch.Tensor]) -> None:
+        """Load state dictionary (no-op for this processor)."""
+        pass
+
+    def reset(self) -> None:
+        """Reset processor state (no-op for this processor)."""
+        pass
+
+    def get_config(self) -> dict[str, Any]:
+        """Return configuration for serialization."""
+        return {}
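For reference, the newline handling in `SmolVLANewLineProcessor.__call__` can be sketched without the processor machinery. `ensure_trailing_newline` is a hypothetical helper name used only for illustration; it is not part of this patch.

```python
from typing import Any


def ensure_trailing_newline(task: Any) -> Any:
    """Append a trailing newline to a task string, or to each string in a list.

    Hypothetical standalone helper mirroring the branch logic of the step above.
    """
    if isinstance(task, str):
        return task if task.endswith("\n") else f"{task}\n"
    if isinstance(task, list) and all(isinstance(t, str) for t in task):
        return [t if t.endswith("\n") else f"{t}\n" for t in task]
    # Anything else (None, mixed lists, other types) is passed through untouched.
    return task


print(repr(ensure_trailing_newline("pick up the cube")))
print(ensure_trailing_newline(["open drawer\n", "close drawer"]))
```

The operation is idempotent: applying it twice gives the same result as applying it once, so running the step on already-processed tasks is harmless.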
@@ -18,7 +18,9 @@ import torch

 from lerobot.policies.tdmpc.configuration_tdmpc import TDMPCConfig
 from lerobot.processor import (
+    DeviceProcessor,
    NormalizerProcessor,
+    RenameProcessor,
    RobotProcessor,
    ToBatchProcessor,
    UnnormalizerProcessor,
@@ -29,15 +31,17 @@ def make_tdmpc_processor(
    config: TDMPCConfig, dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None
 ) -> tuple[RobotProcessor, RobotProcessor]:
    input_steps = [
+        RenameProcessor(rename_map={}),
        NormalizerProcessor(
-            features=config.input_features, norm_map=config.normalization_mapping, stats=dataset_stats
-        ),
-        NormalizerProcessor(
-            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
+            features={**config.input_features, **config.output_features},
+            norm_map=config.normalization_mapping,
+            stats=dataset_stats,
        ),
        ToBatchProcessor(),
+        DeviceProcessor(device=config.device),
    ]
    output_steps = [
+        DeviceProcessor(device="cpu"),
        UnnormalizerProcessor(
            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
        ),
@@ -19,7 +19,9 @@ import torch

 from lerobot.policies.vqbet.configuration_vqbet import VQBeTConfig
 from lerobot.processor import (
+    DeviceProcessor,
    NormalizerProcessor,
+    RenameProcessor,
    RobotProcessor,
    ToBatchProcessor,
    UnnormalizerProcessor,
@@ -30,15 +32,17 @@ def make_vqbet_processor(
    config: VQBeTConfig, dataset_stats: dict[str, dict[str, torch.Tensor]] | None = None
 ) -> tuple[RobotProcessor, RobotProcessor]:
    input_steps = [
+        RenameProcessor(rename_map={}),  # Lets the user rename keys (no-op by default)
        NormalizerProcessor(
-            features=config.input_features, norm_map=config.normalization_mapping, stats=dataset_stats
-        ),
-        NormalizerProcessor(
-            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
+            features={**config.input_features, **config.output_features},
+            norm_map=config.normalization_mapping,
+            stats=dataset_stats,
        ),
        ToBatchProcessor(),
+        DeviceProcessor(device=config.device),
    ]
    output_steps = [
+        DeviceProcessor(device="cpu"),
        UnnormalizerProcessor(
            features=config.output_features, norm_map=config.normalization_mapping, stats=dataset_stats
        ),
@@ -33,6 +33,7 @@ from .pipeline import (
    TruncatedProcessor,
 )
 from .rename_processor import RenameProcessor
+from .tokenizer_processor import TokenizerProcessor

 __all__ = [
    "ActionProcessor",
@@ -51,6 +52,7 @@ __all__ = [
    "RewardProcessor",
    "RobotProcessor",
    "ToBatchProcessor",
+    "TokenizerProcessor",
    "TransitionKey",
    "TruncatedProcessor",
    "VanillaObservationProcessor",
@@ -106,6 +106,18 @@ class ToBatchProcessor:
            if isinstance(task_value, str):
                complementary_data["task"] = [task_value]

+        # Process index field - add batch dim if 0D
+        if "index" in complementary_data:
+            index_value = complementary_data["index"]
+            if isinstance(index_value, Tensor) and index_value.dim() == 0:
+                complementary_data["index"] = index_value.unsqueeze(0)
+
+        # Process task_index field - add batch dim if 0D
+        if "task_index" in complementary_data:
+            task_index_value = complementary_data["task_index"]
+            if isinstance(task_index_value, Tensor) and task_index_value.dim() == 0:
+                complementary_data["task_index"] = task_index_value.unsqueeze(0)
+
    def get_config(self) -> dict[str, Any]:
        """Return configuration for serialization."""
        return {}
@@ -19,24 +19,61 @@ from typing import Any
 import torch

 from lerobot.configs.types import PolicyFeature
-from lerobot.processor.pipeline import EnvTransition, TransitionKey
+from lerobot.processor.pipeline import EnvTransition, ProcessorStepRegistry, TransitionKey
 from lerobot.utils.utils import get_safe_torch_device


+@ProcessorStepRegistry.register("device_processor")
@dataclass
 class DeviceProcessor:
-    """Processes transitions by moving tensors to the specified device.
+    """Processes transitions by moving tensors to the specified device and optionally converting float dtypes.

    This processor ensures that all tensors in the transition are moved to the
-    specified device (CPU or GPU) before they are returned.
+    specified device (CPU or GPU) before they are returned. It can also convert
+    floating-point tensors to a specified dtype while preserving non-float types
+    (int, long, bool, etc.).
    """

    device: torch.device = "cpu"
+    float_dtype: str | None = None

    def __post_init__(self):
        self.device = get_safe_torch_device(self.device)
        self.non_blocking = "cuda" in str(self.device)

+        # Validate and convert float_dtype string to torch dtype
+        if self.float_dtype is not None:
+            dtype_mapping = {
+                "float16": torch.float16,
+                "float32": torch.float32,
+                "float64": torch.float64,
+                "bfloat16": torch.bfloat16,
+                "half": torch.float16,
+                "float": torch.float32,
+                "double": torch.float64,
+            }
+
+            if self.float_dtype not in dtype_mapping:
+                available_dtypes = list(dtype_mapping.keys())
+                raise ValueError(
+                    f"Invalid float_dtype '{self.float_dtype}'. Available options: {available_dtypes}"
+                )
+
+            self._target_float_dtype = dtype_mapping[self.float_dtype]
+        else:
+            self._target_float_dtype = None
+
+    def _process_tensor(self, tensor: torch.Tensor) -> torch.Tensor:
+        """Process a tensor by moving to device and optionally converting float dtype."""
+        # Move to device first
+        tensor = tensor.to(self.device, non_blocking=self.non_blocking)
+
+        # Convert float dtype if specified and tensor is floating point
+        if self._target_float_dtype is not None and tensor.is_floating_point():
+            tensor = tensor.to(dtype=self._target_float_dtype)
+
+        return tensor
+
    def __call__(self, transition: EnvTransition) -> EnvTransition:
        # Create a copy of the transition
        new_transition = transition.copy()
@@ -45,7 +82,7 @@ class DeviceProcessor:
        observation = transition.get(TransitionKey.OBSERVATION)
        if observation is not None:
            new_observation = {
-                k: v.to(self.device, non_blocking=self.non_blocking) if isinstance(v, torch.Tensor) else v
+                k: self._process_tensor(v) if isinstance(v, torch.Tensor) else v
                for k, v in observation.items()
            }
            new_transition[TransitionKey.OBSERVATION] = new_observation
@@ -53,30 +90,54 @@ class DeviceProcessor:
        # Process action tensor
        action = transition.get(TransitionKey.ACTION)
        if action is not None and isinstance(action, torch.Tensor):
-            new_transition[TransitionKey.ACTION] = action.to(self.device, non_blocking=self.non_blocking)
+            new_transition[TransitionKey.ACTION] = self._process_tensor(action)

        # Process reward tensor
        reward = transition.get(TransitionKey.REWARD)
        if reward is not None and isinstance(reward, torch.Tensor):
-            new_transition[TransitionKey.REWARD] = reward.to(self.device, non_blocking=self.non_blocking)
+            new_transition[TransitionKey.REWARD] = self._process_tensor(reward)

        # Process done tensor
        done = transition.get(TransitionKey.DONE)
        if done is not None and isinstance(done, torch.Tensor):
-            new_transition[TransitionKey.DONE] = done.to(self.device, non_blocking=self.non_blocking)
+            new_transition[TransitionKey.DONE] = self._process_tensor(done)

        # Process truncated tensor
        truncated = transition.get(TransitionKey.TRUNCATED)
        if truncated is not None and isinstance(truncated, torch.Tensor):
-            new_transition[TransitionKey.TRUNCATED] = truncated.to(
-                self.device, non_blocking=self.non_blocking
-            )
+            new_transition[TransitionKey.TRUNCATED] = self._process_tensor(truncated)
+
+        # Process complementary data tensors
+        complementary_data = transition.get(TransitionKey.COMPLEMENTARY_DATA)
+        if complementary_data is not None:
+            new_complementary_data = {}
+
+            # Process all items in complementary_data
+            for key, value in complementary_data.items():
+                if isinstance(value, torch.Tensor):
+                    new_complementary_data[key] = self._process_tensor(value)
+                else:
+                    new_complementary_data[key] = value
+
+            new_transition[TransitionKey.COMPLEMENTARY_DATA] = new_complementary_data

        return new_transition

    def get_config(self) -> dict[str, Any]:
        """Return configuration for serialization."""
-        return {"device": self.device}
+        return {"device": self.device, "float_dtype": self.float_dtype}
+
+    def state_dict(self) -> dict[str, torch.Tensor]:
+        """Return state dictionary (empty for this processor)."""
+        return {}
+
+    def load_state_dict(self, state: dict[str, torch.Tensor]) -> None:
+        """Load state dictionary (no-op for this processor)."""
+        pass
+
+    def reset(self) -> None:
+        """Reset processor state (no-op for this processor)."""
+        pass

    def feature_contract(self, features: dict[str, PolicyFeature]) -> dict[str, PolicyFeature]:
        return features
@@ -116,7 +116,7 @@ class NormalizerProcessor:
        if self.normalize_keys is not None and not isinstance(self.normalize_keys, set):
            self.normalize_keys = set(self.normalize_keys)

-    def _normalize_obs(self, observation):
+    def _normalize_obs(self, observation, normalized_info):
        if observation is None:
            return None

@@ -138,6 +138,7 @@ class NormalizerProcessor:

            # Skip normalization if mode is IDENTITY
            if norm_mode is NormalizationMode.IDENTITY:
+                normalized_info[key] = "IDENTITY"
                continue

            # Skip if no stats available for this key
@@ -156,16 +157,18 @@ class NormalizerProcessor:
                if "mean" in stats and "std" in stats:
                    mean, std = stats["mean"], stats["std"]
                    processed[key] = (tensor - mean) / (std + self.eps)
+                    normalized_info[key] = "MEAN_STD"
            elif norm_mode is NormalizationMode.MIN_MAX:
                if "min" in stats and "max" in stats:
                    min_val, max_val = stats["min"], stats["max"]
                    processed[key] = 2 * (tensor - min_val) / (max_val - min_val + self.eps) - 1
+                    normalized_info[key] = "MIN_MAX"
            else:
                raise ValueError(f"Unsupported normalization mode: {norm_mode}")

        return processed

-    def _normalize_action(self, action):
+    def _normalize_action(self, action, normalized_info):
        if action is None:
            return action

@@ -174,6 +177,7 @@ class NormalizerProcessor:

        # Skip normalization if mode is IDENTITY
        if norm_mode is NormalizationMode.IDENTITY:
+            normalized_info["action"] = "IDENTITY"
            return action

        # Skip if no stats available for actions
@@ -190,10 +194,12 @@ class NormalizerProcessor:
        if norm_mode is NormalizationMode.MEAN_STD:
            if "mean" in stats and "std" in stats:
                mean, std = stats["mean"], stats["std"]
+                normalized_info["action"] = "MEAN_STD"
                return (tensor - mean) / (std + self.eps)
        elif norm_mode is NormalizationMode.MIN_MAX:
            if "min" in stats and "max" in stats:
                min_val, max_val = stats["min"], stats["max"]
+                normalized_info["action"] = "MIN_MAX"
                return 2 * (tensor - min_val) / (max_val - min_val + self.eps) - 1
        else:
            raise ValueError(f"Unsupported normalization mode: {norm_mode}")
@@ -202,13 +208,24 @@ class NormalizerProcessor:
        raise ValueError(f"Action stats must contain appropriate values for {norm_mode} normalization")

    def __call__(self, transition: EnvTransition) -> EnvTransition:
-        observation = self._normalize_obs(transition.get(TransitionKey.OBSERVATION))
-        action = self._normalize_action(transition.get(TransitionKey.ACTION))
+        # Track what was normalized
+        normalized_info = {}
+
+        observation = self._normalize_obs(transition.get(TransitionKey.OBSERVATION), normalized_info)
+        action = self._normalize_action(transition.get(TransitionKey.ACTION), normalized_info)

        # Create a new transition with normalized values
        new_transition = transition.copy()
        new_transition[TransitionKey.OBSERVATION] = observation
        new_transition[TransitionKey.ACTION] = action
+
+        # Add normalization info to complementary data
+        if normalized_info:
+            comp_data = new_transition.get(TransitionKey.COMPLEMENTARY_DATA, {})
+            comp_data = {} if comp_data is None else dict(comp_data)
+            comp_data["normalized_keys"] = normalized_info
+            new_transition[TransitionKey.COMPLEMENTARY_DATA] = comp_data
+
        return new_transition

    def get_config(self) -> dict[str, Any]:
@@ -289,7 +306,7 @@ class UnnormalizerProcessor:
        self.stats = self.stats or {}
        self._tensor_stats = _convert_stats_to_tensors(self.stats)

-    def _unnormalize_obs(self, observation):
+    def _unnormalize_obs(self, observation, unnormalized_info):
        if observation is None:
            return None
        keys = [k for k, ft in self.features.items() if ft.type is not FeatureType.ACTION]
@@ -304,6 +321,7 @@ class UnnormalizerProcessor:

            # Skip unnormalization if mode is IDENTITY
            if norm_mode is NormalizationMode.IDENTITY:
+                unnormalized_info[key] = "IDENTITY"
                continue

            # Skip if no stats available for this key
@@ -322,16 +340,18 @@ class UnnormalizerProcessor:
                if "mean" in stats and "std" in stats:
                    mean, std = stats["mean"], stats["std"]
                    processed[key] = tensor * std + mean
+                    unnormalized_info[key] = "MEAN_STD"
            elif norm_mode is NormalizationMode.MIN_MAX:
                if "min" in stats and "max" in stats:
                    min_val, max_val = stats["min"], stats["max"]
                    processed[key] = (tensor + 1) / 2 * (max_val - min_val) + min_val
+                    unnormalized_info[key] = "MIN_MAX"
            else:
                raise ValueError(f"Unsupported normalization mode: {norm_mode}")

        return processed

-    def _unnormalize_action(self, action):
+    def _unnormalize_action(self, action, unnormalized_info):
        if action is None:
            return action

@@ -340,6 +360,7 @@ class UnnormalizerProcessor:

        # Skip unnormalization if mode is IDENTITY
        if norm_mode is NormalizationMode.IDENTITY:
+            unnormalized_info["action"] = "IDENTITY"
            return action

        # Skip if no stats available for actions
@@ -356,10 +377,12 @@ class UnnormalizerProcessor:
        if norm_mode is NormalizationMode.MEAN_STD:
            if "mean" in stats and "std" in stats:
                mean, std = stats["mean"], stats["std"]
+                unnormalized_info["action"] = "MEAN_STD"
                return tensor * std + mean
        elif norm_mode is NormalizationMode.MIN_MAX:
            if "min" in stats and "max" in stats:
                min_val, max_val = stats["min"], stats["max"]
+                unnormalized_info["action"] = "MIN_MAX"
                return (tensor + 1) / 2 * (max_val - min_val) + min_val
        else:
            raise ValueError(f"Unsupported normalization mode: {norm_mode}")
@@ -368,13 +391,24 @@ class UnnormalizerProcessor:
        raise ValueError(f"Action stats must contain appropriate values for {norm_mode} normalization")

    def __call__(self, transition: EnvTransition) -> EnvTransition:
-        observation = self._unnormalize_obs(transition.get(TransitionKey.OBSERVATION))
-        action = self._unnormalize_action(transition.get(TransitionKey.ACTION))
+        # Track what was unnormalized
+        unnormalized_info = {}
+
+        observation = self._unnormalize_obs(transition.get(TransitionKey.OBSERVATION), unnormalized_info)
+        action = self._unnormalize_action(transition.get(TransitionKey.ACTION), unnormalized_info)

        # Create a new transition with unnormalized values
        new_transition = transition.copy()
        new_transition[TransitionKey.OBSERVATION] = observation
        new_transition[TransitionKey.ACTION] = action
+
+        # Add unnormalization info to complementary data
+        if unnormalized_info:
+            comp_data = new_transition.get(TransitionKey.COMPLEMENTARY_DATA, {})
+            comp_data = {} if comp_data is None else dict(comp_data)
+            comp_data["unnormalized_keys"] = unnormalized_info
+            new_transition[TransitionKey.COMPLEMENTARY_DATA] = comp_data
+
        return new_transition

    def get_config(self) -> dict[str, Any]:
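The normalize and unnormalize formulas in the hunks above are inverses of each other up to the epsilon term. A minimal scalar sketch, assuming `eps = 1e-8` (the processors read the value from `self.eps`; the function names here are illustrative only):

```python
EPS = 1e-8  # assumed value for illustration; the processors use self.eps


def mean_std_normalize(x: float, mean: float, std: float, eps: float = EPS) -> float:
    return (x - mean) / (std + eps)


def mean_std_unnormalize(y: float, mean: float, std: float) -> float:
    return y * std + mean


def min_max_normalize(x: float, lo: float, hi: float, eps: float = EPS) -> float:
    # Maps [lo, hi] to approximately [-1, 1].
    return 2 * (x - lo) / (hi - lo + eps) - 1


def min_max_unnormalize(y: float, lo: float, hi: float) -> float:
    # Maps [-1, 1] back to [lo, hi].
    return (y + 1) / 2 * (hi - lo) + lo


x = 0.25
roundtrip_ms = mean_std_unnormalize(mean_std_normalize(x, 0.5, 2.0), 0.5, 2.0)
roundtrip_mm = min_max_unnormalize(min_max_normalize(x, 0.0, 1.0), 0.0, 1.0)
print(abs(roundtrip_ms - x) < 1e-6, abs(roundtrip_mm - x) < 1e-6)
```

Because epsilon appears only on the normalize side, the round trip is approximate rather than exact. The `normalized_keys` and `unnormalized_keys` entries added to complementary data record which of the two formulas (or `IDENTITY`) was applied per key.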
@@ -413,3 +447,29 @@ def hotswap_stats(robot_processor: RobotProcessor, stats: dict[str, dict[str, An
            step.stats = stats
            step._tensor_stats = _convert_stats_to_tensors(stats)
    return robot_processor
+
+
+def rename_stats(stats: dict[str, dict[str, Any]], rename_map: dict[str, str]) -> dict[str, dict[str, Any]]:
+    """Rename keys in the stats dictionary according to the provided mapping.
+
+    Args:
+        stats: The statistics dictionary with structure {feature_key: {stat_name: value}}
+        rename_map: Dictionary mapping old key names to new key names
+
+    Returns:
+        A new stats dictionary with renamed keys
+
+    Example:
+        >>> stats = {"observation.state": {"mean": 0.0, "std": 1.0}, "action": {"mean": 0.5, "std": 0.5}}
+        >>> rename_map = {"observation.state": "observation.robot_state"}
+        >>> new_stats = rename_stats(stats, rename_map)
+        >>> # new_stats will have "observation.robot_state" instead of "observation.state"
+    """
+    renamed_stats = {}
+
+    for old_key, sub_stats in stats.items():
+        # Use the new key if it exists in the rename map, otherwise keep the old key
+        new_key = rename_map.get(old_key, old_key)
+        renamed_stats[new_key] = deepcopy(sub_stats)
+
+    return renamed_stats
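A short usage sketch of `rename_stats`. The function body is restated in compact form so the snippet runs on its own, and list values stand in for the tensors or arrays that real stats would hold.

```python
from copy import deepcopy
from typing import Any


def rename_stats(stats: dict[str, dict[str, Any]], rename_map: dict[str, str]) -> dict[str, dict[str, Any]]:
    # Same logic as the function in this patch, copied so the snippet is standalone.
    return {rename_map.get(key, key): deepcopy(sub_stats) for key, sub_stats in stats.items()}


stats = {
    "observation.state": {"mean": [0.0], "std": [1.0]},
    "action": {"mean": [0.5], "std": [0.5]},
}
new_stats = rename_stats(stats, {"observation.state": "observation.robot_state"})

print(sorted(new_stats))  # ['action', 'observation.robot_state']

# The deep copy means edits to the result do not leak back into the input.
new_stats["action"]["mean"][0] = 9.0
print(stats["action"]["mean"])  # [0.5]
```

If two old keys map to the same new key, the later entry overwrites the earlier one, so rename maps should be one-to-one.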
@@ -201,10 +201,16 @@ def _default_batch_to_transition(batch: dict[str, Any]) -> EnvTransition:  # noq
    observation_keys = {k: v for k, v in batch.items() if k.startswith("observation.")}
    observation = observation_keys if observation_keys else None

-    # Extract padding and task keys for complementary data
+    # Extract padding, task, index, and task_index keys for complementary data
    pad_keys = {k: v for k, v in batch.items() if "_is_pad" in k}
    task_key = {"task": batch["task"]} if "task" in batch else {}
-    complementary_data = {**pad_keys, **task_key} if pad_keys or task_key else {}
+    index_key = {"index": batch["index"]} if "index" in batch else {}
+    task_index_key = {"task_index": batch["task_index"]} if "task_index" in batch else {}
+    complementary_data = (
+        {**pad_keys, **task_key, **index_key, **task_index_key}
+        if pad_keys or task_key or index_key or task_index_key
+        else {}
+    )

    transition: EnvTransition = {
        TransitionKey.OBSERVATION: observation,
@@ -231,7 +237,7 @@ def _default_transition_to_batch(transition: EnvTransition) -> dict[str, Any]:
        "info": transition.get(TransitionKey.INFO, {}),
    }

-    # Add padding and task data from complementary_data
+    # Add padding, task, index, and task_index data from complementary_data
    complementary_data = transition.get(TransitionKey.COMPLEMENTARY_DATA)
    if complementary_data:
        pad_data = {k: v for k, v in complementary_data.items() if "_is_pad" in k}
@@ -240,6 +246,12 @@ def _default_transition_to_batch(transition: EnvTransition) -> dict[str, Any]:
        if "task" in complementary_data:
            batch["task"] = complementary_data["task"]

+        if "index" in complementary_data:
+            batch["index"] = complementary_data["index"]
+
+        if "task_index" in complementary_data:
+            batch["task_index"] = complementary_data["task_index"]
+
    # Handle observation - flatten dict to observation.* keys if it's a dict
    observation = transition.get(TransitionKey.OBSERVATION)
    if isinstance(observation, dict):
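The complementary-data extraction in `_default_batch_to_transition` can be sketched on a plain dict. `extract_complementary_data` is a hypothetical name for illustration; in the patch this logic is inlined in the function.

```python
from typing import Any


def extract_complementary_data(batch: dict[str, Any]) -> dict[str, Any]:
    """Collect padding flags plus task, index and task_index from a flat batch."""
    data = {k: v for k, v in batch.items() if "_is_pad" in k}
    for key in ("task", "index", "task_index"):
        if key in batch:
            data[key] = batch[key]
    return data


batch = {
    "observation.state": [0.1, 0.2],
    "action_is_pad": [False],
    "task": "pick",
    "index": 7,
}
print(extract_complementary_data(batch))
# {'action_is_pad': [False], 'task': 'pick', 'index': 7}
```

Keys that are absent from the batch are simply skipped, so a batch with none of them yields an empty dict, matching the `else {}` branch above.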
@@ -0,0 +1,210 @@
+"""
+Tokenizer processor for handling text tokenization in robot transitions.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any
+
+import torch
+from transformers import AutoTokenizer
+
+from lerobot.configs.types import FeatureType, PolicyFeature
+from lerobot.constants import OBS_LANGUAGE
+from lerobot.processor.pipeline import EnvTransition, ProcessorStepRegistry, TransitionKey
+
+
+@ProcessorStepRegistry.register(name="tokenizer_processor")
+@dataclass
+class TokenizerProcessor:
+    """Tokenizes text tasks in complementary data using a Hugging Face tokenizer.
+
+    This processor handles tokenization of task strings found in the complementary_data
+    using a specified pretrained tokenizer from Hugging Face. It adds tokenized versions
+    to the observation data for model processing while preserving the original task string.
+
+    The processor supports both single strings and lists of strings as task inputs.
+
+    Args:
+        tokenizer_name: Name of the pretrained tokenizer to load from Hugging Face Hub
+            (e.g., "bert-base-uncased", "microsoft/DialoGPT-medium"). This will be used
+            with AutoTokenizer.from_pretrained(). If tokenizer is provided, this is ignored.
+        tokenizer: A callable tokenizer object (e.g., from the transformers library). If provided,
+            tokenizer_name is ignored. It is not serialized; pass it via overrides when loading.
+        max_length: Maximum sequence length for tokenization. Defaults to 512.
+        task_key: Key in complementary_data containing the task text. Defaults to "task".
+        padding_side: Side on which to pad sequences ("right" or "left"). Defaults to "right".
+        padding: Padding strategy for tokenization. Defaults to "max_length".
+        truncation: Whether to truncate sequences longer than max_length. Defaults to True.
+
+    Examples:
+        Using tokenizer name (auto-loaded):
+        ```python
+        processor = TokenizerProcessor(tokenizer_name="bert-base-uncased", max_length=128)
+        ```
+
+        Using custom tokenizer object:
+        ```python
+        from transformers import AutoTokenizer
+
+        custom_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+        processor = TokenizerProcessor(tokenizer=custom_tokenizer, max_length=128)
+        ```
+    """
+
+    tokenizer_name: str | None = None
+    tokenizer: AutoTokenizer | None = None
+    max_length: int = 512
+    task_key: str = "task"
+    padding_side: str = "right"
+    padding: str = "max_length"
+    truncation: bool = True
+
+    # Internal tokenizer instance (not serialized)
+    _tokenizer: Any = field(default=None, init=False, repr=False)
+
+    def __post_init__(self):
+        """Initialize the tokenizer from the provided tokenizer or tokenizer name."""
+        if self.tokenizer is not None:
+            # Use provided tokenizer object directly
+            self._tokenizer = self.tokenizer
+        elif self.tokenizer_name is not None:
+            self._tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_name)
+        else:
+            raise ValueError(
+                "Either 'tokenizer' or 'tokenizer_name' must be provided. "
+                "Pass a tokenizer object directly or a tokenizer name to auto-load."
+            )
+
+    def get_task(self, transition: EnvTransition) -> list[str] | None:
+        """Extract and normalize task from complementary data.
+
+        Args:
+            transition: Input transition containing complementary_data.
+
+        Returns:
+            List of task strings if task is present, None otherwise.
+        """
+        complementary_data = transition.get(TransitionKey.COMPLEMENTARY_DATA)
+        if complementary_data is None:
+            return None
+
+        if self.task_key not in complementary_data:
+            return None
+
+        task = complementary_data[self.task_key]
+        if task is None:
+            return None
+
+        # Convert to list of strings
+        if isinstance(task, str):
+            return [task]
+        elif isinstance(task, list) and all(isinstance(t, str) for t in task):
+            return task
+
+        return None
+
+    def __call__(self, transition: EnvTransition) -> EnvTransition:
+        """Process the transition by tokenizing the task text.
+
+        Args:
+            transition: Input transition containing complementary_data with task text.
+
+        Returns:
+            Modified transition with tokenized task added to observation.
+
+        Note:
+            The input transition is updated in place; it is returned unchanged if no task is found.
+        """
+        task = self.get_task(transition)
+        if task is None:
+            return transition
+
+        # Tokenize the task
+        tokenized_prompt = self._tokenize_text(task)
+
+        # Get or create observation dict
+        if TransitionKey.OBSERVATION not in transition or transition[TransitionKey.OBSERVATION] is None:
+            transition[TransitionKey.OBSERVATION] = {}
+        observation = transition[TransitionKey.OBSERVATION]
+
+        # Add tokenized data to observation
+        observation[f"{OBS_LANGUAGE}.tokens"] = tokenized_prompt["input_ids"]
+        observation[f"{OBS_LANGUAGE}.attention_mask"] = tokenized_prompt["attention_mask"].to(
+            dtype=torch.bool
+        )
+
+        return transition
+
+    def _tokenize_text(self, text: str | list[str]) -> dict[str, torch.Tensor]:
+        """Tokenize text using the configured tokenizer.
+
+        Args:
+            text: Text string or list of strings to tokenize.
+
+        Returns:
+            Dictionary containing tokenized output with keys like 'input_ids', 'attention_mask'.
+        """
+        return self._tokenizer(
+            text,
+            max_length=self.max_length,
+            truncation=self.truncation,
+            padding=self.padding,
+            padding_side=self.padding_side,
+            return_tensors="pt",
+        )
+
+    def get_config(self) -> dict[str, Any]:
+        """Return configuration for serialization.
+
+        Note: Only tokenizer_name is saved, not the tokenizer object itself.
+        When loading, provide the tokenizer via overrides if needed.
+        """
+        config = {
+            "max_length": self.max_length,
+            "task_key": self.task_key,
+            "padding_side": self.padding_side,
+            "padding": self.padding,
+            "truncation": self.truncation,
+        }
+
+        # Only include tokenizer_name if it was used (not when tokenizer object was provided)
+        if self.tokenizer_name is not None:
+            config["tokenizer_name"] = self.tokenizer_name
+
+        return config
+
+    def state_dict(self) -> dict[str, torch.Tensor]:
+        """Return state dictionary (empty for this processor)."""
+        return {}
+
+    def load_state_dict(self, state: dict[str, torch.Tensor]) -> None:
+        """Load state dictionary (no-op for this processor)."""
+        pass
+
+    def reset(self) -> None:
+        """Reset processor state (no-op for this processor)."""
+        pass
+
+    def feature_contract(self, features: dict[str, PolicyFeature]) -> dict[str, PolicyFeature]:
+        """Add tokenized task features to the feature contract.
+
+        Args:
+            features: Input feature dictionary.
+
+        Returns:
+            Updated feature dictionary with tokenized task features added.
+        """
+        # Add features for tokenized output if they don't exist
+        # Standard tokenizer output includes tokens and attention_mask
+        tokens_key = f"{OBS_LANGUAGE}.tokens"
+        attention_mask_key = f"{OBS_LANGUAGE}.attention_mask"
+
+        if tokens_key not in features:
+            features[tokens_key] = PolicyFeature(type=FeatureType.LANGUAGE, shape=(self.max_length,))
+
+        if attention_mask_key not in features:
+            features[attention_mask_key] = PolicyFeature(type=FeatureType.LANGUAGE, shape=(self.max_length,))
+
+        return features
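To illustrate what `padding="max_length"`, `truncation=True` and `padding_side` mean for the tensors stored under the `OBS_LANGUAGE` keys, here is a toy sketch on plain lists. It is a simplification, not the Hugging Face implementation: `pad_or_truncate` and `pad_id=0` are assumptions for illustration, special tokens are ignored, and truncation is assumed to drop tokens from the right.

```python
def pad_or_truncate(
    token_ids: list[int],
    max_length: int,
    pad_id: int = 0,
    padding_side: str = "right",
) -> tuple[list[int], list[bool]]:
    """Return (ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids)[:max_length]  # truncate from the right
    mask = [True] * len(ids)  # True marks real tokens
    n_pad = max_length - len(ids)
    if padding_side == "right":
        return ids + [pad_id] * n_pad, mask + [False] * n_pad
    if padding_side == "left":
        return [pad_id] * n_pad + ids, [False] * n_pad + mask
    raise ValueError(f"padding_side must be 'left' or 'right', got {padding_side!r}")


print(pad_or_truncate([5, 6, 7], max_length=5))
# ([5, 6, 7, 0, 0], [True, True, True, False, False])
print(pad_or_truncate([5, 6, 7], max_length=5, padding_side="left"))
# ([0, 0, 5, 6, 7], [False, False, True, True, True])
print(pad_or_truncate([1, 2, 3, 4, 5, 6], max_length=4))
# ([1, 2, 3, 4], [True, True, True, True])
```

With this padding strategy every sequence has length `max_length`, which is why `feature_contract` can declare a fixed shape of `(self.max_length,)` for both the tokens and the attention mask.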