mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-22 03:59:42 +00:00
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
This commit is contained in:
committed by
Adil Zouitine
parent
a14af62ee3
commit
7124d471c1
@@ -3,16 +3,19 @@
|
|||||||
In robotics, there's a fundamental mismatch between the data that robots and humans produce and what machine learning models expect. This creates several translation challenges:
|
In robotics, there's a fundamental mismatch between the data that robots and humans produce and what machine learning models expect. This creates several translation challenges:
|
||||||
|
|
||||||
**Raw Robot Data → Model Input:**
|
**Raw Robot Data → Model Input:**
|
||||||
|
|
||||||
- Robots output raw sensor data (camera images, joint positions, force readings) that need normalization, batching, and device placement before models can process them
|
- Robots output raw sensor data (camera images, joint positions, force readings) that need normalization, batching, and device placement before models can process them
|
||||||
- Language instructions from humans ("pick up the red cube") must be tokenized into numerical representations
|
- Language instructions from humans ("pick up the red cube") must be tokenized into numerical representations
|
||||||
- Different robots use different coordinate systems and units that need standardization
|
- Different robots use different coordinate systems and units that need standardization
|
||||||
|
|
||||||
**Model Output → Robot Commands:**
|
**Model Output → Robot Commands:**
|
||||||
|
|
||||||
- Models might output end-effector positions, but robots need joint-space commands
|
- Models might output end-effector positions, but robots need joint-space commands
|
||||||
- Teleoperators (like gamepads) produce relative movements (delta positions), but robots expect absolute commands
|
- Teleoperators (like gamepads) produce relative movements (delta positions), but robots expect absolute commands
|
||||||
- Model predictions are often normalized and need to be converted back to real-world scales
|
- Model predictions are often normalized and need to be converted back to real-world scales
|
||||||
|
|
||||||
**Cross-Domain Translation:**
|
**Cross-Domain Translation:**
|
||||||
|
|
||||||
- Training data from one robot setup needs adaptation for deployment on different hardware
|
- Training data from one robot setup needs adaptation for deployment on different hardware
|
||||||
- Models trained with specific camera configurations must work with new camera arrangements
|
- Models trained with specific camera configurations must work with new camera arrangements
|
||||||
- Datasets with different naming conventions need harmonization
|
- Datasets with different naming conventions need harmonization
|
||||||
@@ -24,6 +27,7 @@ Processors are the data transformation backbone of LeRobot. They handle all the
|
|||||||
## What are Processors?
|
## What are Processors?
|
||||||
|
|
||||||
In robotics, data comes in many forms - images from cameras, joint positions from sensors, text instructions from users, and more. Each type of data requires specific transformations before a model can use it effectively. Models need this data to be:
|
In robotics, data comes in many forms - images from cameras, joint positions from sensors, text instructions from users, and more. Each type of data requires specific transformations before a model can use it effectively. Models need this data to be:
|
||||||
|
|
||||||
- **Normalized**: Scaled to appropriate ranges for neural network processing
|
- **Normalized**: Scaled to appropriate ranges for neural network processing
|
||||||
- **Batched**: Organized with proper dimensions for batch processing
|
- **Batched**: Organized with proper dimensions for batch processing
|
||||||
- **Tokenized**: Text converted to numerical representations
|
- **Tokenized**: Text converted to numerical representations
|
||||||
@@ -63,6 +67,7 @@ transition: EnvTransition = {
|
|||||||
```
|
```
|
||||||
|
|
||||||
Each key in the transition has a specific purpose:
|
Each key in the transition has a specific purpose:
|
||||||
|
|
||||||
- **OBSERVATION**: All sensor data (images, states, proprioception)
|
- **OBSERVATION**: All sensor data (images, states, proprioception)
|
||||||
- **ACTION**: The action to execute or that was executed
|
- **ACTION**: The action to execute or that was executed
|
||||||
- **REWARD**: Reinforcement learning signal
|
- **REWARD**: Reinforcement learning signal
|
||||||
@@ -82,27 +87,27 @@ import torch
|
|||||||
|
|
||||||
class MyProcessorStep:
|
class MyProcessorStep:
|
||||||
"""Example processor step interface - all methods must be implemented."""
|
"""Example processor step interface - all methods must be implemented."""
|
||||||
|
|
||||||
def __call__(self, transition: EnvTransition) -> EnvTransition:
|
def __call__(self, transition: EnvTransition) -> EnvTransition:
|
||||||
"""Transform the transition - this is the main processing logic."""
|
"""Transform the transition - this is the main processing logic."""
|
||||||
raise NotImplementedError
|
raise NotImplementedError
|
||||||
|
|
||||||
def feature_contract(self, features: dict[str, PolicyFeature]) -> dict[str, PolicyFeature]:
|
def feature_contract(self, features: dict[str, PolicyFeature]) -> dict[str, PolicyFeature]:
|
||||||
"""Declare how this step transforms feature shapes/types."""
|
"""Declare how this step transforms feature shapes/types."""
|
||||||
raise NotImplementedError
|
raise NotImplementedError
|
||||||
|
|
||||||
def get_config(self) -> dict[str, Any]:
|
def get_config(self) -> dict[str, Any]:
|
||||||
"""Return JSON-serializable configuration for saving/loading."""
|
"""Return JSON-serializable configuration for saving/loading."""
|
||||||
raise NotImplementedError
|
raise NotImplementedError
|
||||||
|
|
||||||
def state_dict(self) -> dict[str, torch.Tensor]:
|
def state_dict(self) -> dict[str, torch.Tensor]:
|
||||||
"""Return any learnable parameters (tensors only)."""
|
"""Return any learnable parameters (tensors only)."""
|
||||||
raise NotImplementedError
|
raise NotImplementedError
|
||||||
|
|
||||||
def load_state_dict(self, state: dict[str, torch.Tensor]) -> None:
|
def load_state_dict(self, state: dict[str, torch.Tensor]) -> None:
|
||||||
"""Load learnable parameters from saved state."""
|
"""Load learnable parameters from saved state."""
|
||||||
raise NotImplementedError
|
raise NotImplementedError
|
||||||
|
|
||||||
def reset(self) -> None:
|
def reset(self) -> None:
|
||||||
"""Reset any internal state between episodes."""
|
"""Reset any internal state between episodes."""
|
||||||
raise NotImplementedError
|
raise NotImplementedError
|
||||||
@@ -123,7 +128,7 @@ processor = RobotProcessor(
|
|||||||
step3 # Third transformation
|
step3 # Third transformation
|
||||||
],
|
],
|
||||||
name="my_preprocessing_pipeline",
|
name="my_preprocessing_pipeline",
|
||||||
|
|
||||||
# Optional: Custom converters for input/output formats
|
# Optional: Custom converters for input/output formats
|
||||||
to_transition=custom_batch_to_transition, # How to convert batch dict → EnvTransition
|
to_transition=custom_batch_to_transition, # How to convert batch dict → EnvTransition
|
||||||
to_output=custom_transition_to_batch # How to convert EnvTransition → output format
|
to_output=custom_transition_to_batch # How to convert EnvTransition → output format
|
||||||
@@ -146,7 +151,6 @@ output = processor(transition) # Stays as EnvTransition throughout
|
|||||||
The `to_transition` and `to_output` converters enable seamless integration with existing codebases.
|
The `to_transition` and `to_output` converters enable seamless integration with existing codebases.
|
||||||
By default, they handle the standard LeRobot batch format, but you can customize them for different data structures.
|
By default, they handle the standard LeRobot batch format, but you can customize them for different data structures.
|
||||||
|
|
||||||
|
|
||||||
### Data Format Conversion
|
### Data Format Conversion
|
||||||
|
|
||||||
Different data sources have different formats, but processors need a unified `EnvTransition` structure internally.
|
Different data sources have different formats, but processors need a unified `EnvTransition` structure internally.
|
||||||
@@ -191,7 +195,7 @@ processor = RobotProcessor(
|
|||||||
|
|
||||||
## Common Processor Steps
|
## Common Processor Steps
|
||||||
|
|
||||||
LeRobot provides a rich set of pre-built processor steps for common transformations.
|
LeRobot provides a rich set of pre-built processor steps for common transformations.
|
||||||
Let's explore each in detail:
|
Let's explore each in detail:
|
||||||
|
|
||||||
### Data Normalization
|
### Data Normalization
|
||||||
@@ -351,6 +355,7 @@ Different datasets and models may use different naming conventions.
|
|||||||
The `RenameProcessor` solves this mismatch:
|
The `RenameProcessor` solves this mismatch:
|
||||||
|
|
||||||
**Why is this useful?**
|
**Why is this useful?**
|
||||||
|
|
||||||
- When loading a model trained on a different dataset with different key names
|
- When loading a model trained on a different dataset with different key names
|
||||||
- When using foundation models that expect specific key naming conventions
|
- When using foundation models that expect specific key naming conventions
|
||||||
- When standardizing datasets from different sources
|
- When standardizing datasets from different sources
|
||||||
@@ -414,17 +419,17 @@ preprocessor = RobotProcessor(
|
|||||||
"observation.images.wrist": "observation.images.camera1"
|
"observation.images.wrist": "observation.images.camera1"
|
||||||
}
|
}
|
||||||
),
|
),
|
||||||
|
|
||||||
# 2. Add batch dimensions for model
|
# 2. Add batch dimensions for model
|
||||||
ToBatchProcessor(),
|
ToBatchProcessor(),
|
||||||
|
|
||||||
# 3. Tokenize language instructions if present
|
# 3. Tokenize language instructions if present
|
||||||
TokenizerProcessor(
|
TokenizerProcessor(
|
||||||
tokenizer_name="google/paligemma-3b-pt-224",
|
tokenizer_name="google/paligemma-3b-pt-224",
|
||||||
max_length=64,
|
max_length=64,
|
||||||
task_key="task"
|
task_key="task"
|
||||||
),
|
),
|
||||||
|
|
||||||
# 4. Normalize numerical data
|
# 4. Normalize numerical data
|
||||||
NormalizerProcessor(
|
NormalizerProcessor(
|
||||||
features=policy_features,
|
features=policy_features,
|
||||||
@@ -435,7 +440,7 @@ preprocessor = RobotProcessor(
|
|||||||
},
|
},
|
||||||
stats=dataset.meta.stats
|
stats=dataset.meta.stats
|
||||||
),
|
),
|
||||||
|
|
||||||
# 5. Move to GPU and convert to half precision
|
# 5. Move to GPU and convert to half precision
|
||||||
DeviceProcessor(
|
DeviceProcessor(
|
||||||
device="cuda:0",
|
device="cuda:0",
|
||||||
@@ -450,7 +455,7 @@ postprocessor = RobotProcessor(
|
|||||||
steps=[
|
steps=[
|
||||||
# 1. Move back to CPU for robot hardware
|
# 1. Move back to CPU for robot hardware
|
||||||
DeviceProcessor(device="cpu"),
|
DeviceProcessor(device="cpu"),
|
||||||
|
|
||||||
# 2. Denormalize actions to original scale
|
# 2. Denormalize actions to original scale
|
||||||
UnnormalizerProcessor(
|
UnnormalizerProcessor(
|
||||||
features=policy_features,
|
features=policy_features,
|
||||||
@@ -489,15 +494,15 @@ for epoch in range(num_epochs):
|
|||||||
for batch in dataloader:
|
for batch in dataloader:
|
||||||
# Preprocess batch
|
# Preprocess batch
|
||||||
processed_batch = preprocessor(batch)
|
processed_batch = preprocessor(batch)
|
||||||
|
|
||||||
# Forward pass - returns loss and optional metrics
|
# Forward pass - returns loss and optional metrics
|
||||||
loss, metrics = model.forward(processed_batch)
|
loss, metrics = model.forward(processed_batch)
|
||||||
|
|
||||||
# Backward pass
|
# Backward pass
|
||||||
optimizer.zero_grad()
|
optimizer.zero_grad()
|
||||||
loss.backward()
|
loss.backward()
|
||||||
optimizer.step()
|
optimizer.step()
|
||||||
|
|
||||||
# Log metrics if available
|
# Log metrics if available
|
||||||
if metrics:
|
if metrics:
|
||||||
wandb.log(metrics)
|
wandb.log(metrics)
|
||||||
@@ -515,7 +520,7 @@ preprocessor = RobotProcessor.from_pretrained(
|
|||||||
config_filename="robot_preprocessor.json"
|
config_filename="robot_preprocessor.json"
|
||||||
)
|
)
|
||||||
postprocessor = RobotProcessor.from_pretrained(
|
postprocessor = RobotProcessor.from_pretrained(
|
||||||
"path/to/model",
|
"path/to/model",
|
||||||
config_filename="robot_postprocessor.json"
|
config_filename="robot_postprocessor.json"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -534,34 +539,34 @@ with torch.no_grad():
|
|||||||
while not done:
|
while not done:
|
||||||
# Get observation from robot
|
# Get observation from robot
|
||||||
observation = robot.get_observation()
|
observation = robot.get_observation()
|
||||||
|
|
||||||
# Build dataset-compatible frame
|
# Build dataset-compatible frame
|
||||||
observation_frame = build_dataset_frame(
|
observation_frame = build_dataset_frame(
|
||||||
dataset.features,
|
dataset.features,
|
||||||
observation,
|
observation,
|
||||||
prefix="observation"
|
prefix="observation"
|
||||||
)
|
)
|
||||||
|
|
||||||
# Add task instruction to complementary data
|
# Add task instruction to complementary data
|
||||||
observation_frame["task"] = "pick up the red cube"
|
observation_frame["task"] = "pick up the red cube"
|
||||||
|
|
||||||
# Preprocess for model
|
# Preprocess for model
|
||||||
model_input = preprocessor(observation_frame)
|
model_input = preprocessor(observation_frame)
|
||||||
|
|
||||||
# Run policy
|
# Run policy
|
||||||
raw_action = policy.select_action(model_input)
|
raw_action = policy.select_action(model_input)
|
||||||
|
|
||||||
# Postprocess action
|
# Postprocess action
|
||||||
action_transition = {TransitionKey.ACTION: raw_action}
|
action_transition = {TransitionKey.ACTION: raw_action}
|
||||||
processed = postprocessor(action_transition)
|
processed = postprocessor(action_transition)
|
||||||
action = processed[TransitionKey.ACTION]
|
action = processed[TransitionKey.ACTION]
|
||||||
|
|
||||||
# Convert to robot action format
|
# Convert to robot action format
|
||||||
robot_action = {
|
robot_action = {
|
||||||
key: action[i].item()
|
key: action[i].item()
|
||||||
for i, key in enumerate(robot.action_features)
|
for i, key in enumerate(robot.action_features)
|
||||||
}
|
}
|
||||||
|
|
||||||
# Execute on robot
|
# Execute on robot
|
||||||
robot.send_action(robot_action)
|
robot.send_action(robot_action)
|
||||||
```
|
```
|
||||||
@@ -632,10 +637,10 @@ processor = RobotProcessor.from_pretrained(
|
|||||||
overrides={
|
overrides={
|
||||||
# Change device for different hardware
|
# Change device for different hardware
|
||||||
"device_processor": {"device": "cuda:1"},
|
"device_processor": {"device": "cuda:1"},
|
||||||
|
|
||||||
# Update statistics for new dataset
|
# Update statistics for new dataset
|
||||||
"normalizer_processor": {"stats": new_dataset.meta.stats},
|
"normalizer_processor": {"stats": new_dataset.meta.stats},
|
||||||
|
|
||||||
# Provide non-serializable objects (like tokenizers)
|
# Provide non-serializable objects (like tokenizers)
|
||||||
"tokenizer_processor": {"tokenizer": custom_tokenizer}
|
"tokenizer_processor": {"tokenizer": custom_tokenizer}
|
||||||
}
|
}
|
||||||
@@ -669,16 +674,16 @@ from lerobot.processor.pipeline import ProcessorStepRegistry, ObservationProcess
|
|||||||
@ProcessorStepRegistry.register("my_company/gaussian_noise")
|
@ProcessorStepRegistry.register("my_company/gaussian_noise")
|
||||||
class GaussianNoiseProcessor(ObservationProcessor):
|
class GaussianNoiseProcessor(ObservationProcessor):
|
||||||
"""Add Gaussian noise to observations for robustness training."""
|
"""Add Gaussian noise to observations for robustness training."""
|
||||||
|
|
||||||
noise_std: float = 0.01
|
noise_std: float = 0.01
|
||||||
training_only: bool = True
|
training_only: bool = True
|
||||||
is_training: bool = True
|
is_training: bool = True
|
||||||
|
|
||||||
def observation(self, observation):
|
def observation(self, observation):
|
||||||
"""Add noise to observation tensors."""
|
"""Add noise to observation tensors."""
|
||||||
if not self.is_training and self.training_only:
|
if not self.is_training and self.training_only:
|
||||||
return observation
|
return observation
|
||||||
|
|
||||||
noisy_obs = {}
|
noisy_obs = {}
|
||||||
for key, value in observation.items():
|
for key, value in observation.items():
|
||||||
if isinstance(value, torch.Tensor) and "image" not in key:
|
if isinstance(value, torch.Tensor) and "image" not in key:
|
||||||
@@ -687,9 +692,9 @@ class GaussianNoiseProcessor(ObservationProcessor):
|
|||||||
noisy_obs[key] = value + noise
|
noisy_obs[key] = value + noise
|
||||||
else:
|
else:
|
||||||
noisy_obs[key] = value
|
noisy_obs[key] = value
|
||||||
|
|
||||||
return noisy_obs
|
return noisy_obs
|
||||||
|
|
||||||
def get_config(self):
|
def get_config(self):
|
||||||
return {
|
return {
|
||||||
"noise_std": self.noise_std,
|
"noise_std": self.noise_std,
|
||||||
@@ -716,15 +721,15 @@ from lerobot.processor import ActionProcessor
|
|||||||
@ProcessorStepRegistry.register("my_company/action_clipper")
|
@ProcessorStepRegistry.register("my_company/action_clipper")
|
||||||
class ActionClipProcessor(ActionProcessor):
|
class ActionClipProcessor(ActionProcessor):
|
||||||
"""Clip actions to safe ranges."""
|
"""Clip actions to safe ranges."""
|
||||||
|
|
||||||
min_value: float = -1.0
|
min_value: float = -1.0
|
||||||
max_value: float = 1.0
|
max_value: float = 1.0
|
||||||
|
|
||||||
def action(self, action):
|
def action(self, action):
|
||||||
"""Process only the action component."""
|
"""Process only the action component."""
|
||||||
# No need to handle transition dict - base class does it
|
# No need to handle transition dict - base class does it
|
||||||
return torch.clamp(action, self.min_value, self.max_value)
|
return torch.clamp(action, self.min_value, self.max_value)
|
||||||
|
|
||||||
def get_config(self):
|
def get_config(self):
|
||||||
return {"min_value": self.min_value, "max_value": self.max_value}
|
return {"min_value": self.min_value, "max_value": self.max_value}
|
||||||
```
|
```
|
||||||
@@ -776,7 +781,7 @@ Use `step_through()` for detailed debugging of the transformation pipeline:
|
|||||||
# Inspect data at each transformation stage
|
# Inspect data at each transformation stage
|
||||||
for i, intermediate in enumerate(processor.step_through(data)):
|
for i, intermediate in enumerate(processor.step_through(data)):
|
||||||
print(f"\n=== After step {i} ===")
|
print(f"\n=== After step {i} ===")
|
||||||
|
|
||||||
# Check observation shapes
|
# Check observation shapes
|
||||||
obs = intermediate.get(TransitionKey.OBSERVATION)
|
obs = intermediate.get(TransitionKey.OBSERVATION)
|
||||||
if obs:
|
if obs:
|
||||||
@@ -786,7 +791,7 @@ for i, intermediate in enumerate(processor.step_through(data)):
|
|||||||
f"dtype={value.dtype}, "
|
f"dtype={value.dtype}, "
|
||||||
f"device={value.device}, "
|
f"device={value.device}, "
|
||||||
f"range=[{value.min():.3f}, {value.max():.3f}]")
|
f"range=[{value.min():.3f}, {value.max():.3f}]")
|
||||||
|
|
||||||
# Check action if present
|
# Check action if present
|
||||||
action = intermediate.get(TransitionKey.ACTION)
|
action = intermediate.get(TransitionKey.ACTION)
|
||||||
if action is not None and isinstance(action, torch.Tensor):
|
if action is not None and isinstance(action, torch.Tensor):
|
||||||
@@ -818,6 +823,7 @@ variant_processor = RobotProcessor(
|
|||||||
### 1. Order Matters
|
### 1. Order Matters
|
||||||
|
|
||||||
The sequence of processors is crucial. Follow this general order:
|
The sequence of processors is crucial. Follow this general order:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
# Preprocessing: Raw → Model-ready
|
# Preprocessing: Raw → Model-ready
|
||||||
1. Rename (standardize keys)
|
1. Rename (standardize keys)
|
||||||
@@ -851,6 +857,7 @@ print(ProcessorStepRegistry.list()) # See all registered processors
|
|||||||
### 3. Common Pitfalls and Solutions
|
### 3. Common Pitfalls and Solutions
|
||||||
|
|
||||||
**Tensor Device Mismatch:**
|
**Tensor Device Mismatch:**
|
||||||
|
|
||||||
```python
|
```python
|
||||||
# Problem: RuntimeError: Expected all tensors on same device
|
# Problem: RuntimeError: Expected all tensors on same device
|
||||||
# Solution: Ensure DeviceProcessor is in pipeline
|
# Solution: Ensure DeviceProcessor is in pipeline
|
||||||
@@ -863,6 +870,7 @@ preprocessor = RobotProcessor(
|
|||||||
```
|
```
|
||||||
|
|
||||||
**Missing Statistics:**
|
**Missing Statistics:**
|
||||||
|
|
||||||
```python
|
```python
|
||||||
# Problem: NormalizerProcessor has no stats
|
# Problem: NormalizerProcessor has no stats
|
||||||
# Solution 1: Compute stats from dataset
|
# Solution 1: Compute stats from dataset
|
||||||
@@ -897,6 +905,6 @@ Processors are the unsung heroes of robotics pipelines, handling the critical tr
|
|||||||
- Create custom transformations for specialized tasks
|
- Create custom transformations for specialized tasks
|
||||||
|
|
||||||
Remember: good preprocessing is often the difference between a model that works in theory
|
Remember: good preprocessing is often the difference between a model that works in theory
|
||||||
and one that works in practice!
|
and one that works in practice!
|
||||||
The modular pipeline approach ensures your transformations are testable, reproducible,
|
The modular pipeline approach ensures your transformations are testable, reproducible,
|
||||||
and portable across different robots and environments.
|
and portable across different robots and environments.
|
||||||
|
|||||||
Reference in New Issue
Block a user