# Introduction to Processors

In robotics, there's a fundamental mismatch between the data that robots and humans produce and what machine learning models expect. This creates several translation challenges:

**Raw Robot Data → Model Input:**

- Robots output raw sensor data (camera images, joint positions, force readings) that needs normalization, batching, and device placement before models can process it
- Language instructions from humans ("pick up the red cube") must be tokenized into numerical representations
- Different robots use different coordinate systems and units that need standardization

**Model Output → Robot Commands:**

- Models might output end-effector positions, but robots need joint-space commands
- Teleoperators (like gamepads) produce relative movements (delta positions), but robots expect absolute commands
- Model predictions are often normalized and need to be converted back to real-world scales

**Cross-Domain Translation:**

- Training data from one robot setup needs adaptation for deployment on different hardware
- Models trained with specific camera configurations must work with new camera arrangements
- Datasets with different naming conventions need harmonization

**That's where processors come in.** They serve as universal translators that bridge these gaps, ensuring seamless data flow from sensors to models to actuators.

Processors are the data transformation backbone of LeRobot. They handle all the preprocessing and postprocessing steps needed to convert raw environment data into model-ready inputs and vice versa.

## What are Processors?

In robotics, data comes in many forms: images from cameras, joint positions from sensors, text instructions from users, and more. Each type of data requires specific transformations before a model can use it effectively. Models need this data to be:

- **Normalized**: Scaled to appropriate ranges for neural network processing
- **Batched**: Organized with proper dimensions for batch processing
- **Tokenized**: Text converted to numerical representations
- **Device-placed**: Moved to the right hardware (CPU/GPU)
- **Type-converted**: Cast to appropriate data types

Processors handle these transformations through composable, reusable steps that can be chained together into pipelines. Think of them as a modular assembly line where each station performs a specific transformation on your data.
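
To make the assembly-line idea concrete, here is a minimal sketch of what any pipeline boils down to. This is plain pseudo-Python for intuition only, not LeRobot's actual implementation:

```python
# Conceptual sketch only: a pipeline is essentially function composition over steps.
# LeRobot's real pipeline classes (introduced below) add typing, converters, and more.
def run_pipeline(steps, data):
    for step in steps:
        data = step(data)  # each station transforms the data and passes it along
    return data
```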

## Core Concepts

### EnvTransition: The Universal Data Container

The `EnvTransition` is the fundamental data structure that flows through all processors. It's a strongly typed dictionary that represents a complete robot-environment interaction:

```python
from lerobot.processor import TransitionKey, EnvTransition, PolicyAction, RobotAction

# EnvTransition is precisely typed to handle different action types:
# - PolicyAction: torch.Tensor (for model inputs/outputs)
# - RobotAction: dict[str, Any] (for robot hardware)
# - EnvAction: np.ndarray (for gym environments)

# Example transition from a robot collecting data
transition: EnvTransition = {
    TransitionKey.OBSERVATION: {  # dict[str, Any] | None
        "observation.images.camera0": camera0_image_tensor,  # Shape: (H, W, C)
        "observation.images.camera1": camera1_image_tensor,  # Shape: (H, W, C)
        "observation.state": joint_positions_tensor,  # Shape: (7,) for a 7-DOF arm
        "observation.environment_state": env_state_tensor,  # Shape: (3,) for object position
    },
    TransitionKey.ACTION: action_tensor,  # PolicyAction | RobotAction | EnvAction | None
    TransitionKey.REWARD: 0.0,  # float | torch.Tensor | None
    TransitionKey.DONE: False,  # bool | torch.Tensor | None
    TransitionKey.TRUNCATED: False,  # bool | torch.Tensor | None
    TransitionKey.INFO: {"success": False},  # dict[str, Any] | None
    TransitionKey.COMPLEMENTARY_DATA: {  # dict[str, Any] | None
        "task": "pick up the red cube",  # Language instruction
        "task_index": 0,  # Task identifier
        "index": 42,  # Frame index
    },
}
```

Each key in the transition has a specific purpose:

- **OBSERVATION**: All sensor data (images, states, proprioception)
- **ACTION**: The action to execute or that was executed
- **REWARD**: Reinforcement learning signal
- **DONE/TRUNCATED**: Episode boundary indicators
- **INFO**: Arbitrary metadata
- **COMPLEMENTARY_DATA**: Task descriptions, indices, padding flags, inter-step data
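
Because a transition is a plain dictionary keyed by `TransitionKey`, individual fields can be read with standard dict access. A short sketch, reusing the `transition` built above:

```python
# Standard dict access on an EnvTransition (uses the `transition` example above)
obs = transition[TransitionKey.OBSERVATION]
state = obs["observation.state"]  # joint positions tensor
task = transition[TransitionKey.COMPLEMENTARY_DATA]["task"]  # "pick up the red cube"
episode_over = transition[TransitionKey.DONE] or transition[TransitionKey.TRUNCATED]
```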

### ProcessorStep: The Building Block

A `ProcessorStep` is a single transformation unit that processes transitions. It's an abstract base class with two required methods:

```python
from lerobot.processor import ProcessorStep, EnvTransition

class MyProcessorStep(ProcessorStep):
    """Example processor step - inherit and implement the abstract methods."""

    def __call__(self, transition: EnvTransition) -> EnvTransition:
        """Transform the transition - REQUIRED abstract method."""
        # Your processing logic here
        return transition

    def transform_features(self, features):
        """Declare how this step transforms feature shapes/types - REQUIRED abstract method."""
        return features  # Most processors return features unchanged
```
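
As a concrete illustration, here is a minimal custom step that scales a tensor action by a constant factor. This is a hypothetical example (it assumes the action is a `PolicyAction` tensor), not a built-in LeRobot step:

```python
from lerobot.processor import ProcessorStep, EnvTransition, TransitionKey

class ScaleActionStep(ProcessorStep):
    """Hypothetical step: scale a PolicyAction tensor by a constant factor."""

    def __init__(self, scale: float = 0.5):
        self.scale = scale

    def __call__(self, transition: EnvTransition) -> EnvTransition:
        new_transition = dict(transition)  # shallow copy; avoid mutating the input
        action = new_transition.get(TransitionKey.ACTION)
        if action is not None:
            new_transition[TransitionKey.ACTION] = action * self.scale
        return new_transition

    def transform_features(self, features):
        return features  # scaling changes values, not shapes or types
```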

### DataProcessorPipeline: The Generic Orchestrator

The `DataProcessorPipeline[TInput, TOutput]` chains multiple `ProcessorStep` instances together with compile-time type safety:

```python
from lerobot.processor import RobotProcessorPipeline, PolicyProcessorPipeline

# For robot hardware (unbatched data)
robot_processor = RobotProcessorPipeline[dict[str, Any], dict[str, Any]](
    steps=[step1, step2, step3],
    name="robot_pipeline",
)

# For model training/inference (batched data)
policy_processor = PolicyProcessorPipeline[dict[str, Any], dict[str, Any]](
    steps=[step1, step2, step3],
    name="policy_pipeline",
)
```
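
Once constructed, a pipeline is called like a function on its input type and returns its output type. A quick sketch, assuming the `robot_processor` above with steps that accept unbatched robot dicts:

```python
# Running a pipeline: every step is applied in order
raw_obs = {"joint_1": 0.5, "joint_2": -0.3}
processed_obs = robot_processor(raw_obs)  # dict in, dict out for this pipeline
```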

## RobotProcessorPipeline vs PolicyProcessorPipeline

The key distinction is in the data structures they handle:

| Aspect          | RobotProcessorPipeline                       | PolicyProcessorPipeline                  |
| --------------- | -------------------------------------------- | ---------------------------------------- |
| **Input**       | `dict[str, Any]` - Individual robot values   | `dict[str, Any]` - Batched tensors       |
| **Output**      | `dict[str, Any]` - Individual robot commands | `torch.Tensor` - Policy predictions      |
| **Use Case**    | Real-time robot control                      | Model training/inference                 |
| **Data Format** | Unbatched, heterogeneous                     | Batched, homogeneous                     |
| **Examples**    | `{"joint_1": 0.5}`                           | `{"observation.state": tensor([[0.5]])}` |

**Use `RobotProcessorPipeline`** for robot hardware interfaces:

```python
# Robot data structures: dict[str, Any] for observations and actions
robot_obs: dict[str, Any] = {
    "joint_1": 0.5,  # Individual joint values
    "joint_2": -0.3,
    "camera_0": image_array,  # Raw camera data
}

robot_action: dict[str, Any] = {
    "joint_1": 0.2,  # Target joint positions
    "joint_2": 0.1,
    "gripper": 0.8,
}
```

**Use `PolicyProcessorPipeline`** for model training and batch processing:

```python
# Policy data structures: batch dicts and tensors
policy_batch: dict[str, Any] = {
    "observation.state": torch.tensor([[0.5, -0.3]]),  # Batched states
    "observation.images.camera0": torch.tensor(...),  # Batched images
    "action": torch.tensor([[0.2, 0.1, 0.8]]),  # Batched actions
}

policy_action: torch.Tensor = torch.tensor([[0.2, 0.1, 0.8]])  # Model output tensor
```

## Converter Functions

LeRobot provides converter functions to bridge different data formats:

```python
from lerobot.processor.converters import (
    # Robot hardware converters
    robot_action_to_transition,  # Robot dict → EnvTransition
    observation_to_transition,  # Robot obs → EnvTransition
    transition_to_robot_action,  # EnvTransition → Robot dict

    # Policy/training converters
    batch_to_transition,  # Batch dict → EnvTransition
    transition_to_batch,  # EnvTransition → Batch dict
    policy_action_to_transition,  # Policy tensor → EnvTransition
    transition_to_policy_action,  # EnvTransition → Policy tensor

    # Utilities
    create_transition,  # Build transitions with defaults
    merge_transitions,  # Combine multiple transitions
    identity_transition,  # Pass-through converter
)
```
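
Converters are typically plugged into a pipeline at construction time, so the pipeline accepts and returns your native format while the steps operate on `EnvTransition` internally. A sketch mirroring the real-world example below (step list left empty for brevity):

```python
from typing import Any

from lerobot.processor import RobotProcessorPipeline
from lerobot.processor.converters import (
    observation_to_transition,
    transition_to_robot_action,
)

pipeline = RobotProcessorPipeline[dict[str, Any], dict[str, Any]](
    steps=[],  # add your ProcessorStep instances here
    to_transition=observation_to_transition,  # dict → EnvTransition on the way in
    to_output=transition_to_robot_action,  # EnvTransition → dict on the way out
)
```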

## Real-World Examples

### Robot Control Pipeline

```python
# Phone teleoperation → Robot control (from examples/phone_to_so100/)
phone_to_robot = RobotProcessorPipeline[RobotAction, RobotAction](
    steps=[
        MapPhoneActionToRobotAction(platform=PhoneOS.IOS),  # Phone → robot targets
        EEReferenceAndDelta(kinematics=solver, ...),  # Deltas → absolute pose
        EEBoundsAndSafety(bounds=..., max_step=0.2),  # Safety limits
        InverseKinematicsEEToJoints(kinematics=solver),  # Pose → joint angles
        GripperVelocityToJoint(motor_names=motors),  # Gripper control
    ],
    to_transition=robot_action_to_transition,
    to_output=transition_to_robot_action,
)

# Usage: phone_action → robot_joints
phone_input = {"phone.pos": [0.1, 0.2, 0.0], "phone.rot": rotation}
robot_joints = phone_to_robot(phone_input)
robot.send_action(robot_joints)
```

### Policy Training Pipeline

```python
# Training data preprocessing (order optimized for GPU performance)
training_preprocessor = PolicyProcessorPipeline[dict[str, Any], dict[str, Any]](
    steps=[
        RenameObservationsProcessorStep(rename_map={}),  # Standardize keys
        AddBatchDimensionProcessorStep(),  # Add batch dims
        TokenizerProcessorStep(tokenizer_name="...", ...),  # Tokenize language
        DeviceProcessorStep(device="cuda"),  # Move to GPU first ⚡
        NormalizerProcessorStep(features=..., stats=...),  # Normalize on GPU ⚡
    ]
)

# Model output postprocessing
training_postprocessor = PolicyProcessorPipeline[torch.Tensor, torch.Tensor](
    steps=[
        DeviceProcessorStep(device="cpu"),  # Move to CPU
        UnnormalizerProcessorStep(features=..., stats=...),  # Denormalize
    ]
)
```

### Mixed Robot + Policy Pipeline

```python
# Real deployment: Robot sensors → Model → Robot commands
with torch.no_grad():
    while not done:
        # 1. Get robot observation (unbatched)
        raw_obs = robot.get_observation()  # dict[str, Any]

        # 2. Process for policy (add batching, normalize)
        policy_input = policy_preprocessor(raw_obs)  # Batched dict

        # 3. Run model
        policy_output = policy.select_action(policy_input)  # Policy tensor

        # 4. Postprocess for robot (denormalize, convert to dict)
        robot_action = policy_postprocessor(policy_output)  # dict[str, Any]

        # 5. Send to robot
        robot.send_action(robot_action)
```

## Feature Contracts: Shape and Type Transformation

Processors don't just transform data - they can also **change the data structure itself**. The `transform_features()` method declares these changes, which is crucial for dataset recording and policy creation.

### Why Feature Contracts Matter

When building datasets or policies, LeRobot needs to know:

- **What data fields will exist** after processing
- **What shapes and types** each field will have
- **How to configure models** for the expected data structure

```python
# Example: A processor that adds velocity to observations
class VelocityProcessor(ObservationProcessorStep):
    def observation(self, obs):
        new_obs = obs.copy()
        if "observation.state" in obs:
            # Add computed velocity field
            new_obs["observation.velocity"] = self._compute_velocity(obs["observation.state"])
        return new_obs

    def transform_features(self, features):
        """Declare the new velocity field we're adding."""
        if PipelineFeatureType.OBSERVATION in features:
            # Add velocity feature with same shape as state
            state_feature = features[PipelineFeatureType.OBSERVATION].get("observation.state")
            if state_feature:
                features[PipelineFeatureType.OBSERVATION]["observation.velocity"] = PolicyFeature(
                    type=FeatureType.STATE,
                    shape=state_feature.shape,  # Same shape as position
                )
        return features
```

### Real Examples from LeRobot

**Phone Action Mapping** - Transforms the action structure:

```python
# Input features: {"phone.pos": (3,), "phone.rot": (4,), "phone.enabled": (1,)}
# Output features: {"target_x": (1,), "target_y": (1,), ..., "gripper": (1,)}

def transform_features(self, features):
    # Remove phone-specific keys
    features[PipelineFeatureType.ACTION].pop("phone.pos", None)
    features[PipelineFeatureType.ACTION].pop("phone.rot", None)

    # Add robot target keys
    features[PipelineFeatureType.ACTION]["target_x"] = PolicyFeature(type=FeatureType.ACTION, shape=(1,))
    features[PipelineFeatureType.ACTION]["target_y"] = PolicyFeature(type=FeatureType.ACTION, shape=(1,))
    # ... more target fields
    return features
```

**Forward Kinematics** - Adds computed observations:

```python
# Input: Joint positions
# Output: Joint positions + end-effector pose

def transform_features(self, features):
    # Add end-effector pose features computed from joints
    for axis in ["x", "y", "z", "wx", "wy", "wz"]:
        features[PipelineFeatureType.OBSERVATION][f"observation.state.ee.{axis}"] = PolicyFeature(
            type=FeatureType.STATE, shape=(1,)
        )
    return features
```

**Tokenization** - Adds language features:

```python
# Input: Text in complementary_data
# Output: Token IDs and attention mask in observations

def transform_features(self, features):
    features[PipelineFeatureType.OBSERVATION]["observation.language.tokens"] = PolicyFeature(
        type=FeatureType.LANGUAGE, shape=(self.max_length,)
    )
    features[PipelineFeatureType.OBSERVATION]["observation.language.attention_mask"] = PolicyFeature(
        type=FeatureType.LANGUAGE, shape=(self.max_length,)
    )
    return features
```

### Feature Aggregation in Practice

```python
from lerobot.datasets.pipeline_features import aggregate_pipeline_dataset_features

# Start with the robot's raw features
initial_features = create_initial_features(
    observation=robot.observation_features,  # {"joint_1.pos": float, "camera_0": (480,640,3)}
    action=robot.action_features,  # {"joint_1.pos": float, "gripper.pos": float}
)

# Apply the processor pipeline to compute the final features
final_features = aggregate_pipeline_dataset_features(
    pipeline=my_processor_pipeline,
    initial_features=initial_features,
    use_videos=True,
)

# Result: Complete feature specification for the dataset/policy
# {
#     "observation.state": {"shape": (7,), "dtype": "float32"},
#     "observation.images.camera_0": {"shape": (3, 480, 640), "dtype": "uint8"},
#     "observation.velocity": {"shape": (7,), "dtype": "float32"},  # Added by the processor!
#     "action": {"shape": (7,), "dtype": "float32"},
# }

# Use for dataset creation
dataset = LeRobotDataset.create(
    repo_id="my_dataset",
    features=final_features,  # Knows exactly what data to expect
    ...
)
```

## Common Processor Steps

LeRobot provides many registered processor steps. Here are the most commonly used core processors:

### Essential Processors

- **`normalizer_processor`**: Normalize observations/actions using dataset statistics (mean/std or min/max)
- **`device_processor`**: Move tensors to CPU/GPU with optional dtype conversion
- **`to_batch_processor`**: Add batch dimensions to transitions for model compatibility
- **`rename_observations_processor`**: Rename observation keys using mapping dictionaries
- **`tokenizer_processor`**: Tokenize natural language task descriptions into tokens and attention masks
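
As a rough sketch, a preprocessing stack composed from these steps might look like the following (class names as in the training example above; the rename map is purely illustrative):

```python
# Illustrative composition of the essential steps (hypothetical rename map)
steps = [
    RenameObservationsProcessorStep(rename_map={"camera_0": "observation.images.camera0"}),
    AddBatchDimensionProcessorStep(),
    DeviceProcessorStep(device="cuda"),
]
```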

## Performance Tips

**🚀 Critical Optimization**: Always move data to GPU **before** normalization for significant speedups:

```python
# ✅ FAST: GPU normalization
steps=[
    DeviceProcessorStep(device="cuda"),  # Move to GPU first
    NormalizerProcessorStep(...),  # Normalize on GPU - much faster!
]

# ❌ SLOW: CPU normalization
steps=[
    NormalizerProcessorStep(...),  # Normalize on CPU - slow
    DeviceProcessorStep(device="cuda"),  # Move to GPU after
]
```

## Next Steps

- **[Implement Your Own Processor](implement_your_own_processor.mdx)** - Create custom processor steps
- **[Debug Your Pipeline](debug_processor_pipeline.mdx)** - Troubleshoot and optimize pipelines
- **[Processors for Robots and Teleoperators](processors_robots_teleop.mdx)** - Real-world integration patterns

## Summary

Processors solve the data translation problem in robotics by providing:

- **Modular transformations**: Composable, reusable processing steps
- **Type safety**: Generic pipelines with compile-time checking
- **Performance optimization**: GPU-accelerated operations
- **Robot/Policy distinction**: Separate pipelines for different data structures
- **Comprehensive ecosystem**: 30+ registered processors for common tasks

The key insight: `RobotProcessorPipeline` handles unbatched robot hardware data, while `PolicyProcessorPipeline` handles batched model data. Choose the right tool for your data structure!