Mirror of https://github.com/huggingface/lerobot.git, synced 2026-05-15 08:39:49 +00:00.
Commit 8d5f519fcb ("Added script for the second part of the processor doc"), committed by Adil Zouitine (parent b9d3c34ae4).

# Implement your processor

In this tutorial, we will explain how to implement your own processor. We will start by motivating the need for a custom processor and then explain the helper classes that can help you implement your own.

## Why would you need a custom processor?

In most cases, when reading raw data from a sensor such as a camera or robot motor encoders, you will need to process this data into a format that is compatible with the policies in LeRobot. For example, raw images are encoded as `uint8` with values in the range `[0, 255]`. To use these images with the policies, you will need to cast them to `float32` and normalize them to the range `[0, 1]`.

For example, in LeRobot's `ImageProcessor`, raw images come from the environment as numpy arrays with `uint8` values in range `[0, 255]` and in channel-last format `(H, W, C)`. The processor transforms them into PyTorch tensors with `float32` values in range `[0, 1]` and channel-first format `(C, H, W)`:

```python
# Input: numpy array with shape (480, 640, 3) and dtype uint8
raw_image = env_observation["pixels"]  # Values in [0, 255]

# After processing: torch tensor with shape (1, 3, 480, 640) and dtype float32
processed_image = processor(transition)["observation"]["observation.image"]  # Values in [0, 1]
```

On the other hand, when a model returns an action to be executed on the robot, you often have to post-process this action to make it compatible with the robot. For example, the model might return joint position values in the range `[-1, 1]`, which need to be scaled to the robot's minimum and maximum joint angle positions.

For instance, in LeRobot's `UnnormalizerProcessor`, model outputs are in the normalized range `[-1, 1]` and need to be converted back to actual robot joint ranges:

```python
# Input: model action with normalized values in [-1, 1]
normalized_action = torch.tensor([-0.5, 0.8, -1.0, 0.2])  # Model output

# After post-processing: real joint positions in robot's native ranges
# Example: joints range from [-180.0, 180.0]
real_action = unnormalizer(transition)["action"]
# Real action after post-processing: [-90., 144., -180., 36.]
```

The unnormalizer uses the dataset statistics to convert back:

```python
# For MIN_MAX normalization: action = (normalized + 1) * (max - min) / 2 + min
real_action = (normalized_action + 1) * (max_val - min_val) / 2 + min_val
```
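
Plugging the example action from above into this formula reproduces the post-processed values. The snippet below is a plain-Python sketch of the MIN_MAX arithmetic (not the LeRobot class itself), with `min_val = -180.0` and `max_val = 180.0` as in the example:

```python
# Sketch of the MIN_MAX unnormalization arithmetic shown above.
def unnormalize_min_max(normalized, min_val, max_val):
    # action = (normalized + 1) * (max - min) / 2 + min
    return [(x + 1) * (max_val - min_val) / 2 + min_val for x in normalized]

real = unnormalize_min_max([-0.5, 0.8, -1.0, 0.2], -180.0, 180.0)
print(real)  # approximately [-90.0, 144.0, -180.0, 36.0]
```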

All of these situations point to the need for a mechanism to preprocess the data before it is fed to the policies, and to post-process the actions that are returned before they are executed on the robot.

To that end, LeRobot provides a pipeline mechanism to implement a sequence of processing steps for the input data and the output action.

## How to implement your own processor?

Prepare the sequence of processing steps necessary for your problem. A processor step is a class that implements the following methods:

- `__call__`: implements the processing step for the input transition.
- `get_config`: returns the configuration of the processor step.
- `state_dict`: returns the state of the processor step.
- `load_state_dict`: loads the state of the processor step.
- `reset`: resets the state of the processor step.
- `feature_contract`: describes the modification made to the feature space by the processor step.

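Before walking through each method, here is a minimal do-nothing step that satisfies this interface. It is only a sketch: a plain `dict` stands in for `EnvTransition` so the snippet is self-contained, and a real step would import and use the LeRobot types instead:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class IdentityProcessorStep:
    """Minimal sketch of a processor step: every method is a no-op."""

    def __call__(self, transition: dict) -> dict:
        # Process the transition; here we pass it through unchanged.
        return transition

    def get_config(self) -> dict[str, Any]:
        # JSON-serializable configuration (no tensors).
        return {}

    def state_dict(self) -> dict:
        # Tensor state to serialize; empty for a stateless step.
        return {}

    def load_state_dict(self, state: dict) -> None:
        # Restore tensor state; nothing to restore here.
        pass

    def reset(self) -> None:
        # Clear per-episode state; stateless, so nothing to do.
        pass

    def feature_contract(self, features: dict) -> dict:
        # The feature space is unchanged by an identity step.
        return features
```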
### Implement the `__call__` method

The `__call__` method is the core of your processor step. It takes an `EnvTransition` and returns a modified `EnvTransition`. Here's a real example from LeRobot's `ImageProcessor`:

```python
from dataclasses import dataclass

import einops
import torch

from lerobot.processor.pipeline import EnvTransition, TransitionKey


@dataclass
class ImageProcessor:
    def __call__(self, transition: EnvTransition) -> EnvTransition:
        observation = transition.get(TransitionKey.OBSERVATION)

        if observation is None:
            return transition

        processed_obs = {}

        # Copy all observations first
        for key, value in observation.items():
            processed_obs[key] = value

        # Handle pixels key if present
        pixels = observation.get("pixels")
        if pixels is not None:
            # Remove pixels from processed_obs since we'll replace it with processed images
            processed_obs.pop("pixels", None)

            # Process the image
            processed_img = self._process_single_image(pixels)
            processed_obs["observation.image"] = processed_img

        # Return new transition with processed observation
        new_transition = transition.copy()
        new_transition[TransitionKey.OBSERVATION] = processed_obs
        return new_transition

    def _process_single_image(self, img):
        # Convert to tensor
        img_tensor = torch.from_numpy(img)

        # Add batch dimension if needed
        if img_tensor.ndim == 3:
            img_tensor = img_tensor.unsqueeze(0)

        # Convert to channel-first format: (B, H, W, C) -> (B, C, H, W)
        img_tensor = einops.rearrange(img_tensor, "b h w c -> b c h w").contiguous()

        # Convert to float32 and normalize to [0, 1]
        img_tensor = img_tensor.type(torch.float32) / 255.0

        return img_tensor
```

Key principles for implementing `__call__`:

- Always check that the required data exists (observations, actions, etc.)
- Return the original transition unchanged if no processing is needed
- Create a copy of the transition to avoid side effects
- Only modify the specific keys your processor is responsible for

### Configuration and State Management

LeRobot processors support serialization and deserialization through three key methods. Here's how they work, using `NormalizerProcessor` as an example:

#### `get_config()` - Serializable Configuration

This method returns all non-tensor configuration that can be saved to JSON:

```python
@dataclass
class NormalizerProcessor:
    features: dict[str, PolicyFeature]
    norm_map: dict[FeatureType, NormalizationMode]
    normalize_keys: set[str] | None = None
    eps: float = 1e-8

    def get_config(self) -> dict[str, Any]:
        """Return JSON-serializable configuration."""
        return {
            "features": {k: {"type": v.type.value, "shape": v.shape} for k, v in self.features.items()},
            "norm_map": {k.value: v.value for k, v in self.norm_map.items()},
            "normalize_keys": list(self.normalize_keys) if self.normalize_keys else None,
            "eps": self.eps,
            # Note: 'stats' is not included as it contains tensors
        }
```

#### `state_dict()` - Tensor State

This method returns only PyTorch tensors that need special serialization:

```python
def state_dict(self) -> dict[str, torch.Tensor]:
    """Return tensor state dictionary."""
    state = {}
    for key, stats in self._tensor_stats.items():
        for stat_name, tensor_val in stats.items():
            state[f"{key}.{stat_name}"] = tensor_val
    return state
```

#### `load_state_dict()` - Restore Tensor State

This method restores the tensor state from a saved state dictionary:

```python
def load_state_dict(self, state: dict[str, torch.Tensor]) -> None:
    """Load tensor state from state dictionary."""
    # Reconstruct _tensor_stats from flat state dict
    self._tensor_stats = {}
    for full_key, tensor_val in state.items():
        if "." in full_key:
            key, stat_name = full_key.rsplit(".", 1)
            if key not in self._tensor_stats:
                self._tensor_stats[key] = {}
            self._tensor_stats[key][stat_name] = tensor_val
```

#### Usage Example

```python
# Save processor
config = processor.get_config()
tensors = processor.state_dict()

# Later, restore processor
new_processor = NormalizerProcessor(**config)
new_processor.load_state_dict(tensors)
```

### Feature Contract

The `feature_contract` method defines how your processor transforms the feature space. It tells the system how input feature names and shapes change after processing. This is crucial for policy configuration and debugging.

Here's an example from `ImageProcessor`:

```python
def feature_contract(self, features: dict[str, PolicyFeature]) -> dict[str, PolicyFeature]:
    """Transforms:
    pixels -> observation.image,
    observation.pixels -> observation.image,
    pixels.<cam> -> observation.images.<cam>,
    observation.pixels.<cam> -> observation.images.<cam>
    """
    # Handle simple pixel renaming
    if "pixels" in features:
        features["observation.image"] = features.pop("pixels")
    if "observation.pixels" in features:
        features["observation.image"] = features.pop("observation.pixels")

    # Handle camera-specific pixels
    prefixes = ("pixels.", "observation.pixels.")
    for key in list(features.keys()):
        for p in prefixes:
            if key.startswith(p):
                suffix = key[len(p):]
                features[f"observation.images.{suffix}"] = features.pop(key)
                break
    return features
```

And from `StateProcessor`:

```python
def feature_contract(self, features: dict[str, PolicyFeature]) -> dict[str, PolicyFeature]:
    """Transforms:
    environment_state -> observation.environment_state,
    agent_pos -> observation.state
    """
    pairs = (
        ("environment_state", "observation.environment_state"),
        ("agent_pos", "observation.state"),
    )
    for old, new in pairs:
        if old in features:
            features[new] = features.pop(old)
        prefixed = f"observation.{old}"
        if prefixed in features:
            features[new] = features.pop(prefixed)
    return features
```

**Key principles:**

- Use `features.pop(old_key)` to remove the old feature and get its value
- Use `features[new_key] = old_feature` to add the new feature with the same properties
- Always return the modified features dictionary
- Document the transformations clearly in the docstring

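To make the renaming concrete, here is a self-contained sketch of the pixel-renaming logic shown earlier, with plain shape tuples standing in for `PolicyFeature` values (this is an illustration, not the LeRobot class itself):

```python
def rename_pixel_features(features: dict) -> dict:
    # Simple renames: pixels / observation.pixels -> observation.image
    if "pixels" in features:
        features["observation.image"] = features.pop("pixels")
    if "observation.pixels" in features:
        features["observation.image"] = features.pop("observation.pixels")

    # Camera-specific renames: pixels.<cam> -> observation.images.<cam>
    for key in list(features.keys()):
        for prefix in ("pixels.", "observation.pixels."):
            if key.startswith(prefix):
                suffix = key[len(prefix):]
                features[f"observation.images.{suffix}"] = features.pop(key)
                break
    return features


features = {"pixels.wrist": (480, 640, 3), "agent_pos": (6,)}
print(rename_pixel_features(features))
# {'agent_pos': (6,), 'observation.images.wrist': (480, 640, 3)}
```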

## Helper Classes

LeRobot provides several pre-built processor classes that handle common transformations, which you can use directly or as building blocks for your custom processors.

### Core Processing Classes

- **`ImageProcessor`** - Converts images from numpy arrays (uint8, channel-last) to PyTorch tensors (float32, channel-first)
- **`StateProcessor`** - Handles state observations, converting numpy arrays to tensors and renaming keys
- **`NormalizerProcessor`** - Normalizes observations and actions using dataset statistics (mean/std or min/max)
- **`UnnormalizerProcessor`** - Inverse of NormalizerProcessor, converts normalized values back to original ranges

### Utility Classes

- **`DeviceProcessor`** - Moves tensors to a specified device (CPU/GPU)
- **`ToBatchProcessor`** - Adds batch dimensions to observations and actions
- **`RenameProcessor`** - Renames keys in observations using a mapping dictionary
- **`TokenizerProcessor`** - Handles text tokenization for language-conditioned policies

### Composition Classes

- **`VanillaObservationProcessor`** - Combines ImageProcessor and StateProcessor for complete observation handling
- **`RobotProcessor`** - Main container that orchestrates a sequence of processor steps

### Example Usage

```python
from lerobot.processor import (
    ImageProcessor,
    NormalizerProcessor,
    DeviceProcessor,
    RobotProcessor,
)

# Create a processing pipeline
preprocessing_steps = [
    ImageProcessor(),  # Convert images
    NormalizerProcessor(features=features, norm_map=norm_map, stats=dataset_stats),  # Normalize
    DeviceProcessor(device="cuda"),  # Move to GPU
]

# Combine into a robot processor
preprocessor = RobotProcessor(steps=preprocessing_steps, name="my_preprocessor")

# Use it
processed_transition = preprocessor(raw_transition)
```

These helper classes implement the full `ProcessorStep` protocol, so they can be easily combined and extended for your specific needs.

## Best Practices

### Design Principles

- **Keep processors atomic** - Each processor should handle one specific transformation. This makes them more reusable and easier to debug.
- **Use dataclasses** - Always implement processors as dataclasses for clean initialization and automatic generation of `__init__`, `__repr__`, etc.

```python
@dataclass
class MyProcessor:
    threshold: float = 0.5
    normalize: bool = True
```

### Registration and Discovery

- **Always register your processor** - Use the `@ProcessorStepRegistry.register()` decorator to make your processor discoverable:

```python
@ProcessorStepRegistry.register("my_custom_processor")
@dataclass
class MyCustomProcessor:
    ...  # Implementation goes here
```

### State Management

- **Separate concerns in config vs state_dict** - Keep non-tensor configuration in `get_config()` and only tensors in `state_dict()`:

```python
def get_config(self) -> dict[str, Any]:
    # JSON-serializable data only
    return {"threshold": self.threshold, "mode": self.mode}

def state_dict(self) -> dict[str, torch.Tensor]:
    # Tensors only
    return {"running_mean": self.running_mean, "weights": self.weights}
```

### Safety and Robustness

- **Always check for None values** - Validate that required data exists before processing:

```python
def __call__(self, transition: EnvTransition) -> EnvTransition:
    observation = transition.get(TransitionKey.OBSERVATION)
    if observation is None:
        return transition  # Return unchanged if no observation
```

- **Use copy() for safety** - If performance is not critical, copy transitions to avoid side effects:

```python
new_transition = transition.copy()
new_transition[TransitionKey.OBSERVATION] = processed_obs
return new_transition
```

### Debugging and Development

- **Use hooks for debugging** - `RobotProcessor` supports hooks for monitoring:

```python
def debug_hook(step_idx: int, transition: EnvTransition) -> None:
    print(f"Step {step_idx}: {list(transition.keys())}")

processor = RobotProcessor(steps=[...], before_step_hooks=[debug_hook])
```

- **Use step_through() for development** - Iterate through processing steps one by one:

```python
for step_idx, transition in processor.step_through(data):
    print(f"After step {step_idx}: {transition}")
```

### Performance Considerations

- **Batch processing** - Design processors to handle batched data efficiently when possible
- **Device awareness** - Let `DeviceProcessor` handle device placement rather than hardcoding it
- **Memory efficiency** - Reuse tensors when it is safe to do so for better performance

### Documentation

- **Document feature contracts clearly** - Be explicit about how your processor transforms the feature space
- **Provide usage examples** - Include docstring examples showing typical usage patterns

## Complete Example: Custom Smoothing Processor

Here's a complete example that implements a simple action smoothing processor:

```python
from dataclasses import dataclass, field
from typing import Any

import torch
from torch import Tensor

from lerobot.processor.pipeline import (
    EnvTransition,
    ProcessorStepRegistry,
    TransitionKey,
)
from lerobot.configs.types import PolicyFeature


@ProcessorStepRegistry.register("action_smoother")
@dataclass
class ActionSmoothingProcessor:
    """Smooths actions using an exponential moving average.

    This processor maintains a running average of actions to reduce jitter
    from the policy predictions.
    """

    # Configuration
    alpha: float = 0.7  # Smoothing factor (1 = no smoothing/no memory, 0 = maximum smoothing)

    # State (not in config)
    _previous_action: Tensor | None = field(default=None, init=False, repr=False)
    _initialized: bool = field(default=False, init=False, repr=False)

    def __call__(self, transition: EnvTransition) -> EnvTransition:
        action = transition.get(TransitionKey.ACTION)

        if action is None:
            return transition

        # Convert to tensor if needed
        if not isinstance(action, torch.Tensor):
            action = torch.as_tensor(action, dtype=torch.float32)

        # Initialize on first call
        if not self._initialized:
            self._previous_action = action.clone()
            self._initialized = True
            smoothed_action = action
        else:
            # Exponential moving average: new = alpha * current + (1 - alpha) * previous
            smoothed_action = self.alpha * action + (1 - self.alpha) * self._previous_action
            self._previous_action = smoothed_action.clone()

        # Return new transition with smoothed action
        new_transition = transition.copy()
        new_transition[TransitionKey.ACTION] = smoothed_action
        return new_transition

    def get_config(self) -> dict[str, Any]:
        """Return JSON-serializable configuration."""
        return {"alpha": self.alpha}

    def state_dict(self) -> dict[str, torch.Tensor]:
        """Return tensor state."""
        if self._previous_action is not None:
            return {"previous_action": self._previous_action}
        return {}

    def load_state_dict(self, state: dict[str, torch.Tensor]) -> None:
        """Load tensor state."""
        if "previous_action" in state:
            self._previous_action = state["previous_action"]
            self._initialized = True
        else:
            self._previous_action = None
            self._initialized = False

    def reset(self) -> None:
        """Reset processor state at episode boundaries."""
        self._previous_action = None
        self._initialized = False

    def feature_contract(self, features: dict[str, PolicyFeature]) -> dict[str, PolicyFeature]:
        """Action shapes remain unchanged."""
        return features  # No transformation to feature space
```
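As a quick sanity check of the smoothing arithmetic, here is the exponential moving average worked through with hypothetical values in plain Python:

```python
# With alpha = 0.7, the smoothed action is 70% current action, 30% previous.
alpha = 0.7
previous = [0.0, 2.0]  # smoothed action from the previous step
current = [1.0, 0.0]   # fresh action from the policy

# new = alpha * current + (1 - alpha) * previous
smoothed = [alpha * c + (1 - alpha) * p for c, p in zip(current, previous)]
print(smoothed)  # approximately [0.7, 0.6]
```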

### Usage Example

```python
from lerobot.processor import RobotProcessor, DeviceProcessor

# Create a postprocessing pipeline with action smoothing
postprocessing_steps = [
    DeviceProcessor(device="cpu"),  # Move to CPU first
    ActionSmoothingProcessor(alpha=0.8),  # Apply smoothing
]

postprocessor = RobotProcessor(
    steps=postprocessing_steps,
    name="smooth_postprocessor",
)

# Use in your policy inference loop
for transition in environment_transitions:
    # Get action from policy
    action = policy(transition)

    # Post-process (including smoothing)
    transition_with_action = {"action": action}
    smoothed_transition = postprocessor(transition_with_action)

    # Execute smoothed action
    next_obs = env.step(smoothed_transition["action"])
```

This example demonstrates all the key concepts: processor registration, state management, configuration serialization, and proper integration with the LeRobot pipeline system.