diff --git a/src/lerobot/policies/xvla/IMPLEMENTATION_SUMMARY.md b/src/lerobot/policies/xvla/IMPLEMENTATION_SUMMARY.md
deleted file mode 100644
index 2874f17de..000000000
--- a/src/lerobot/policies/xvla/IMPLEMENTATION_SUMMARY.md
+++ /dev/null
@@ -1,165 +0,0 @@
-# XVLA Custom Processor Steps - Implementation Summary
-
-## Overview
-Implemented three custom processor steps for XVLA that encapsulate the preprocessing and postprocessing logic previously scattered in `lerobot_eval.py` (lines 165-184).
-
-## Files Modified
-
-### 1. `/src/lerobot/policies/xvla/processor_xvla.py`
-**Changes:**
-- Added imports: `dataclass`, `numpy`, `Rotate6D_to_AxisAngle`, processor core types
-- Implemented 3 new processor step classes (all registered with `ProcessorStepRegistry`)
-
-**New Classes:**
-
-#### `XVLAImageScaleProcessorStep`
-- **Registry Name:** `xvla_image_scale`
-- **Purpose:** Scales image observations by 255 (converts [0,1] to [0,255])
-- **Configuration:**
-  - `image_keys: list[str] | None` - Auto-detects or specify image keys
-- **Location:** Lines 93-140
-
-#### `XVLAAddDomainIdProcessorStep`
-- **Registry Name:** `xvla_add_domain_id`
-- **Purpose:** Adds domain_id tensor to complementary data
-- **Configuration:**
-  - `domain_id: int = 3` - Domain identifier
-  - `device: str = "cuda"` - Tensor device
-- **Location:** Lines 143-192
-
-#### `XVLARotation6DToAxisAngleProcessorStep`
-- **Registry Name:** `xvla_rotation_6d_to_axis_angle`
-- **Purpose:** Converts 6D rotation to axis-angle and reorganizes action dimensions
-  - Input: [eef(3), rotation_6d(6), gripper(1)] = 10D
-  - Output: [eef(3), axis_angle(3), gripper(1)] = 7D
-- **Configuration:**
-  - `expected_action_dim: int = 10`
-- **Location:** Lines 195-255
-
-### 2. `/src/lerobot/policies/xvla/README_PROCESSORS.md` (NEW)
-Comprehensive documentation covering:
-- Processor step descriptions and configurations
-- Integration examples for preprocessing/postprocessing pipelines
-- Before/after comparison showing simplified evaluation code
-- JSON/YAML configuration examples
-- Reference to Groot processor patterns
-
-## Key Features
-
-### 1. **Registry-Based Architecture**
-All processors are registered with `@ProcessorStepRegistry.register()`, enabling:
-- Instantiation from configuration files
-- Serialization/deserialization with policies
-- Easy discovery and debugging
-
-### 2. **Proper ProcessorStep Interface**
-Each processor implements:
-- `__call__(transition: EnvTransition) -> EnvTransition` - Main processing logic
-- `transform_features(features) -> features` - Feature contract declaration
-- `get_config() -> dict` - Serializable configuration
-
-### 3. **Safe Data Handling**
-- All processors use `transition.copy()` to avoid side effects
-- Proper handling of missing/None values
-- Device-aware tensor operations
-
-### 4. **Configurable and Reusable**
-- All parameters exposed in `get_config()`
-- Can be customized per deployment
-- Works with any XVLA model configuration
-
-## Usage Impact
-
-### Before (from lerobot_eval.py):
-```python
-# Lines 166-184 - scattered preprocessing/postprocessing
-observation[f"observation.images.image"] = observation[f"observation.images.image"] * 255
-observation[f"observation.images.image2"] = observation[f"observation.images.image2"] * 255
-observation = add_envs_task(env, observation)
-observation = preprocessor(observation)
-observation["domain_id"] = torch.tensor([int(3)], dtype=torch.long).to("cuda")
-
-with torch.inference_mode():
-    action = policy.select_action(observation).to("cpu").numpy()
-target_eef = action[:, :3]
-target_axis = Rotate6D_to_AxisAngle(action[:, 3:9])
-target_act = action[:, 9:10]
-action_numpy = np.concatenate([target_eef, target_axis, target_act], axis=-1)
-```
-
-### After (with custom processors):
-```python
-# Clean and simple - processors encapsulate all the logic
-observation = add_envs_task(env, observation)
-observation = preprocessor(observation)  # Includes image scaling + domain_id
-
-with torch.inference_mode():
-    action = policy.select_action(observation)
-action = postprocessor(action)  # Includes rotation conversion + device transfer
-action_numpy = action.numpy()
-```
-
-## Design Patterns Followed
-
-1. **Groot Processor Reference:** Followed same patterns as `processor_groot.py`:
-   - Dataclass-based configuration
-   - Registry registration
-   - State management via `get_config()`
-   - Proper transition handling
-
-2. **LeRobot Processor Guidelines:** (from `implement_your_own_processor.mdx`):
-   - Safe data handling with `copy()`
-   - Clear error messages
-   - Device/dtype awareness
-   - Feature contract declaration
-
-3. **Pipeline Integration:**
-   - Works seamlessly with `PolicyProcessorPipeline`
-   - Automatic dict ↔ EnvTransition conversion
-   - Composable with other processor steps
-
-## Benefits
-
-1. **Cleaner Code:** Evaluation loop is now much simpler
-2. **Maintainable:** Processing logic is centralized and well-documented
-3. **Configurable:** All parameters can be adjusted via config files
-4. **Reusable:** Can be used across different XVLA deployments
-5. **Testable:** Each processor can be tested independently
-6. **Serializable:** Processors save/load with the policy
-
-## Testing Recommendations
-
-1. **Unit Tests:**
-   - Test each processor with sample transitions
-   - Verify image scaling (multiply by 255)
-   - Verify domain_id addition and device placement
-   - Verify rotation conversion accuracy
-
-2. **Integration Tests:**
-   - Test full preprocessing pipeline
-   - Test full postprocessing pipeline
-   - Verify evaluation loop still works correctly
-   - Test with different domain_ids and devices
-
-3. **Configuration Tests:**
-   - Test loading processors from config
-   - Test serialization/deserialization
-   - Test overrides mechanism
-
-## Next Steps
-
-1. **Update XVLA Policy Factory:** Optionally add these processors to the default pipeline in `make_xvla_pre_post_processors()` or document how to add them via config
-
-2. **Update lerobot_eval.py:** Simplify the evaluation code to use the new processors
-
-3. **Add Configuration Examples:** Create sample config files showing processor integration
-
-4. **Add Tests:** Implement unit and integration tests for the new processors
-
-## Notes
-
-- No changes made to `make_xvla_pre_post_processors()` as requested
-- Processors are available but not automatically included (must be added via config)
-- All processors follow LeRobot conventions and best practices
-- Compatible with existing XVLA model configurations
-
diff --git a/src/lerobot/policies/xvla/QUICK_START.md b/src/lerobot/policies/xvla/QUICK_START.md
deleted file mode 100644
index f78f9e9c6..000000000
--- a/src/lerobot/policies/xvla/QUICK_START.md
+++ /dev/null
@@ -1,141 +0,0 @@
-# XVLA Custom Processors - Quick Start
-
-## What Was Implemented
-
-Three custom processor steps that simplify XVLA evaluation by encapsulating preprocessing and postprocessing logic:
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│                   PREPROCESSING PIPELINE                    │
-├─────────────────────────────────────────────────────────────┤
-│ 1. RenameObservationsProcessorStep                          │
-│ 2. AddBatchDimensionProcessorStep                           │
-│ 3. XVLAImageScaleProcessorStep                ← NEW         │
-│    └─ Scales images by 255                                  │
-│ 4. TokenizerProcessorStep                                   │
-│ 5. DeviceProcessorStep                                      │
-│ 6. XVLAAddDomainIdProcessorStep               ← NEW         │
-│    └─ Adds domain_id tensor                                 │
-│ 7. NormalizerProcessorStep                                  │
-└─────────────────────────────────────────────────────────────┘
-
-┌─────────────────────────────────────────────────────────────┐
-│                   POSTPROCESSING PIPELINE                   │
-├─────────────────────────────────────────────────────────────┤
-│ 1. UnnormalizerProcessorStep                                │
-│ 2. XVLARotation6DToAxisAngleProcessorStep     ← NEW         │
-│    └─ Converts 6D rotation to axis-angle (10D → 7D)         │
-│ 3. DeviceProcessorStep(device="cpu")                        │
-└─────────────────────────────────────────────────────────────┘
-```
-
-## Simplest Usage
-
-### Option 1: Import and Use Directly
-
-```python
-from lerobot.policies.xvla.processor_xvla import (
-    XVLAImageScaleProcessorStep,
-    XVLAAddDomainIdProcessorStep,
-    XVLARotation6DToAxisAngleProcessorStep,
-)
-
-# Add to your existing preprocessor steps
-preprocessor = PolicyProcessorPipeline(
-    steps=[
-        # ... your existing steps ...
-        XVLAImageScaleProcessorStep(),
-        # ... more steps ...
-        XVLAAddDomainIdProcessorStep(domain_id=3),
-    ]
-)
-
-# Add to your postprocessor steps
-postprocessor = PolicyProcessorPipeline(
-    steps=[
-        XVLARotation6DToAxisAngleProcessorStep(),
-        DeviceProcessorStep(device="cpu"),
-    ]
-)
-```
-
-### Option 2: Load from Config
-
-```python
-# In your config.json or YAML:
-{
-  "preprocessor_steps": [
-    {"name": "xvla_image_scale"},
-    {"name": "xvla_add_domain_id", "domain_id": 3, "device": "cuda"}
-  ],
-  "postprocessor_steps": [
-    {"name": "xvla_rotation_6d_to_axis_angle", "expected_action_dim": 10}
-  ]
-}
-
-# Then load:
-preprocessor = PolicyProcessorPipeline.from_pretrained("path/to/config")
-```
-
-## Evaluation Loop Comparison
-
-### ❌ Old Way (Manual Processing)
-```python
-# Scattered preprocessing
-observation["observation.images.image"] *= 255
-observation["observation.images.image2"] *= 255
-observation = add_envs_task(env, observation)
-observation = preprocessor(observation)
-observation["domain_id"] = torch.tensor([3], dtype=torch.long).to("cuda")
-
-# Policy inference
-action = policy.select_action(observation)
-
-# Manual postprocessing
-target_eef = action[:, :3]
-target_axis = Rotate6D_to_AxisAngle(action[:, 3:9])
-target_act = action[:, 9:10]
-action = np.concatenate([target_eef, target_axis, target_act], axis=-1)
-```
-
-### ✅ New Way (With Custom Processors)
-```python
-# All preprocessing in one call
-observation = add_envs_task(env, observation)
-observation = preprocessor(observation)  # Includes scaling + domain_id
-
-# Policy inference
-action = policy.select_action(observation)
-
-# All postprocessing in one call
-action = postprocessor(action)  # Includes rotation conversion
-```
-
-**Result:** 13 lines → 6 lines of cleaner, more maintainable code!
-
-## Quick Reference
-
-| Processor | Purpose | Config Key | Default |
-|-----------|---------|------------|---------|
-| **XVLAImageScaleProcessorStep** | Scale images by 255 | `xvla_image_scale` | Auto-detect images |
-| **XVLAAddDomainIdProcessorStep** | Add domain_id tensor | `xvla_add_domain_id` | domain_id=3, device="cuda" |
-| **XVLARotation6DToAxisAngleProcessorStep** | Convert 6D→axis-angle | `xvla_rotation_6d_to_axis_angle` | expected_action_dim=10 |
-
-## Key Benefits
-
-1. ✅ **Clean code** - No scattered preprocessing logic
-2. ✅ **Configurable** - Adjust via config files
-3. ✅ **Reusable** - Works across different XVLA setups
-4. ✅ **Serializable** - Saves/loads with policy
-5. ✅ **Testable** - Each processor can be tested independently
-6. ✅ **Registry-based** - Easy instantiation from config
-
-## Next Steps
-
-1. **Update your evaluation script** to use the new processors
-2. **Add processors to your config** if using config-based loading
-3. **Test with your specific XVLA model** to ensure compatibility
-4. **Adjust parameters** as needed (domain_id, device, etc.)
-
-For detailed documentation, see `README_PROCESSORS.md`.
-
diff --git a/src/lerobot/policies/xvla/XVLA_CONFIG_UPDATE_SUMMARY.md b/src/lerobot/policies/xvla/XVLA_CONFIG_UPDATE_SUMMARY.md
deleted file mode 100644
index 887164737..000000000
--- a/src/lerobot/policies/xvla/XVLA_CONFIG_UPDATE_SUMMARY.md
+++ /dev/null
@@ -1,234 +0,0 @@
-# XVLA Configuration and Evaluation Updates - Summary
-
-## Overview
-Updated XVLA configuration files and evaluation script to use the new custom processor steps, eliminating manual preprocessing and postprocessing code.
-
-## Files Modified
-
-### 1. `/src/lerobot/policies/xvla/policy_preprocessor.json`
-
-**Added two new processor steps:**
-
-#### Step 3: `xvla_image_scale` (NEW - Line 14-19)
-```json
-{
-  "registry_name": "xvla_image_scale",
-  "config": {
-    "image_keys": null
-  }
-}
-```
-- **Position:** After `to_batch_processor`, before `tokenizer_processor`
-- **Purpose:** Scales images by 255 (converts from [0,1] to [0,255])
-- **Replaces:** Manual code `observation["observation.images.image"] *= 255`
-
-#### Step 6: `xvla_add_domain_id` (NEW - Line 38-44)
-```json
-{
-  "registry_name": "xvla_add_domain_id",
-  "config": {
-    "domain_id": 3,
-    "device": "cuda"
-  }
-}
-```
-- **Position:** After `device_processor`, before `normalizer_processor`
-- **Purpose:** Adds domain_id tensor to complementary data
-- **Replaces:** Manual code `observation["domain_id"] = torch.tensor([int(3)], dtype=torch.long).to("cuda")`
-
-**Final preprocessing pipeline order:**
-1. `rename_observations_processor`
-2. `to_batch_processor`
-3. `xvla_image_scale` ⭐ NEW
-4. `tokenizer_processor`
-5. `device_processor`
-6. `xvla_add_domain_id` ⭐ NEW
-7. `normalizer_processor`
-
-### 2. `/src/lerobot/policies/xvla/policy_postprocessor.json`
-
-**Added one new processor step and updated device:**
-
-#### Step 2: `xvla_rotation_6d_to_axis_angle` (NEW - Line 23-28)
-```json
-{
-  "registry_name": "xvla_rotation_6d_to_axis_angle",
-  "config": {
-    "expected_action_dim": 10
-  }
-}
-```
-- **Position:** After `unnormalizer_processor`, before `device_processor`
-- **Purpose:** Converts 6D rotation to axis-angle (10D → 7D action)
-- **Replaces:** Manual code:
-  ```python
-  target_eef = action[:, :3]
-  target_axis = Rotate6D_to_AxisAngle(action[:, 3:9])
-  target_act = action[:, 9:10]
-  action = np.concatenate([target_eef, target_axis, target_act], axis=-1)
-  ```
-
-#### Step 3: `device_processor` (UPDATED - Line 29-35)
-- **Changed device:** `"cuda"` → `"cpu"`
-- **Purpose:** Move tensors to CPU for environment interaction
-- **Replaces:** Manual code `.to("cpu")`
-
-**Final postprocessing pipeline order:**
-1. `unnormalizer_processor`
-2. `xvla_rotation_6d_to_axis_angle` ⭐ NEW
-3. `device_processor` (device changed to "cpu") 🔧 UPDATED
-
-### 3. `/src/lerobot/scripts/lerobot_eval.py`
-
-**Removed manual preprocessing/postprocessing code:**
-
-#### Lines 91-92: Removed import (DELETED)
-```python
-# REMOVED:
-from lerobot.policies.xvla.utils import Rotate6D_to_AxisAngle
-```
-
-#### Lines 165-184: Simplified evaluation logic (REPLACED)
-
-**Before (18 lines with manual processing):**
-```python
-observation[f"observation.images.image"] = observation[f"observation.images.image"] * 255
-observation[f"observation.images.image2"] = observation[f"observation.images.image2"] * 255
-observation = add_envs_task(env, observation)
-observation = preprocessor(observation)
-observation["domain_id"] = torch.tensor([int(3)], dtype=torch.long).to("cuda")
-
-with torch.inference_mode():
-    action = policy.select_action(observation).to("cpu").numpy()
-# action = postprocessor(action)  # THIS WAS COMMENTED OUT
-target_eef = action[:, :3]
-target_axis = Rotate6D_to_AxisAngle(action[:, 3:9])
-target_act = action[:, 9:10]
-action_numpy = np.concatenate([target_eef, target_axis, target_act], axis=-1)
-
-# Convert to CPU / numpy.
-# action_numpy: np.ndarray = action.to("cpu").numpy()
-assert action_numpy.ndim == 2, "Action dimensions should be (batch, action_dim)"
-```
-
-**After (11 lines, clean and simple):**
-```python
-observation = add_envs_task(env, observation)
-
-# Preprocess observation (includes image scaling and domain_id addition)
-observation = preprocessor(observation)
-
-# Policy inference
-with torch.inference_mode():
-    action = policy.select_action(observation)
-
-# Postprocess action (includes rotation conversion and device transfer to CPU)
-action = postprocessor(action)
-
-# Convert to numpy
-action_numpy: np.ndarray = action.numpy()
-assert action_numpy.ndim == 2, "Action dimensions should be (batch, action_dim)"
-```
-
-## Impact Summary
-
-### Code Reduction
-- **Lines removed:** ~13 lines of manual processing code
-- **Lines added:** ~7 lines of clean processor calls
-- **Net reduction:** ~6 lines + cleaner structure
-- **Removed import:** No longer need `Rotate6D_to_AxisAngle` import
-
-### Benefits
-
-1. **✅ Cleaner Code**
-   - Evaluation loop is now much simpler and more readable
-   - No scattered preprocessing logic
-   - Clear separation of concerns
-
-2. **✅ Configuration-Driven**
-   - All preprocessing/postprocessing controlled via JSON config
-   - Easy to adjust parameters (domain_id, device, etc.) without code changes
-   - Can load different configs for different deployments
-
-3. **✅ Maintainable**
-   - Processing logic centralized in processor classes
-   - Single source of truth for transformations
-   - Easier to debug and test
-
-4. **✅ Reusable**
-   - Processors work across all XVLA evaluations
-   - Can be shared between training and inference
-   - Can be serialized with the model
-
-5. **✅ Consistent**
-   - Same processing pipeline guaranteed in all contexts
-   - No risk of forgetting manual steps
-   - Automatic handling of edge cases
-
-## Testing Checklist
-
-Before deploying, verify:
-
-- [ ] Images are scaled correctly (0-255 range)
-- [ ] domain_id is added to complementary data
-- [ ] 6D rotation correctly converts to axis-angle
-- [ ] Actions are 7D after postprocessing
-- [ ] Evaluation success rates match previous results
-- [ ] Video rendering still works
-- [ ] Multi-environment batching works correctly
-
-## Configuration Notes
-
-### Customizing Domain ID
-To change the domain ID for different embodiments, edit `policy_preprocessor.json`:
-```json
-{
-  "registry_name": "xvla_add_domain_id",
-  "config": {
-    "domain_id": 5,  // Change this value
-    "device": "cuda"
-  }
-}
-```
-
-### Customizing Image Keys
-To scale specific images only, edit `policy_preprocessor.json`:
-```json
-{
-  "registry_name": "xvla_image_scale",
-  "config": {
-    "image_keys": ["observation.images.image", "observation.images.wrist_cam"]
-  }
-}
-```
-
-### Customizing Action Dimensions
-To support different action dimensions, edit `policy_postprocessor.json`:
-```json
-{
-  "registry_name": "xvla_rotation_6d_to_axis_angle",
-  "config": {
-    "expected_action_dim": 12  // Adjust based on your model
-  }
-}
-```
-
-## Migration Guide
-
-If you have existing XVLA checkpoints without these configs:
-
-1. **Copy the updated JSON files** to your checkpoint directory
-2. **No model retraining needed** - processors are data transforms only
-3. **Test evaluation** to ensure consistent results
-4. **Update any custom evaluation scripts** to use processors
-
-## Related Files
-
-- Custom processors implementation: `/src/lerobot/policies/xvla/processor_xvla.py`
-- Documentation: `/src/lerobot/policies/xvla/README_PROCESSORS.md`
-- Quick start: `/src/lerobot/policies/xvla/QUICK_START.md`
-
-## Questions?
-
-See the processor documentation in `/src/lerobot/policies/xvla/README_PROCESSORS.md` for detailed usage examples and troubleshooting.
-
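The removed notes describe the postprocessing step as slicing a 10D action into `[eef(3), rotation_6d(6), gripper(1)]` and emitting a 7D `[eef(3), axis_angle(3), gripper(1)]` action, but the body of `Rotate6D_to_AxisAngle` is not shown anywhere in this diff. The NumPy sketch below illustrates that slicing and one possible conversion. It rests on assumptions: the function names are hypothetical, and the 6D layout is assumed to hold the first two columns of the rotation matrix, which may not match the convention the XVLA helper actually uses.

```python
import numpy as np


def rotation_6d_to_matrix(rot6d: np.ndarray) -> np.ndarray:
    """Orthonormalise two 3-vectors (Gram-Schmidt) into (N, 3, 3) rotation matrices.

    ASSUMPTION: rot6d[:, :3] and rot6d[:, 3:6] are the first two matrix columns.
    """
    a1, a2 = rot6d[:, :3], rot6d[:, 3:6]
    b1 = a1 / np.linalg.norm(a1, axis=-1, keepdims=True)
    # Remove the component of a2 along b1, then normalise.
    a2_orth = a2 - np.sum(b1 * a2, axis=-1, keepdims=True) * b1
    b2 = a2_orth / np.linalg.norm(a2_orth, axis=-1, keepdims=True)
    b3 = np.cross(b1, b2)
    # Stack b1, b2, b3 as the columns of each matrix.
    return np.stack([b1, b2, b3], axis=-1)


def matrix_to_axis_angle(rot: np.ndarray) -> np.ndarray:
    """Convert (N, 3, 3) rotation matrices to (N, 3) axis-angle vectors.

    Limitation of this sketch: angles very close to pi are not handled.
    """
    cos_theta = np.clip((np.trace(rot, axis1=-2, axis2=-1) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # The skew-symmetric part of R equals 2 * sin(theta) * axis.
    skew = np.stack(
        [
            rot[:, 2, 1] - rot[:, 1, 2],
            rot[:, 0, 2] - rot[:, 2, 0],
            rot[:, 1, 0] - rot[:, 0, 1],
        ],
        axis=-1,
    )
    sin_theta = np.sin(theta)
    # theta / (2 sin(theta)) tends to 0.5 as theta tends to 0.
    scale = np.where(sin_theta > 1e-8, theta / (2.0 * np.maximum(sin_theta, 1e-8)), 0.5)
    return skew * scale[:, None]


def convert_action_10d_to_7d(action: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the manual postprocessing quoted in the removed docs."""
    if action.ndim != 2 or action.shape[-1] != 10:
        raise ValueError(f"expected shape (batch, 10), got {action.shape}")
    eef = action[:, :3]
    axis_angle = matrix_to_axis_angle(rotation_6d_to_matrix(action[:, 3:9]))
    gripper = action[:, 9:10]
    return np.concatenate([eef, axis_angle, gripper], axis=-1)


# Two sample actions: identity rotation, and a 90 degree rotation about z.
sample = np.array(
    [
        [0.1, 0.2, 0.3, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
        [0.4, 0.5, 0.6, 0.0, 1.0, 0.0, -1.0, 0.0, 0.0, 0.0],
    ]
)
converted = convert_action_10d_to_7d(sample)
```

A check like this would also cover the "Actions are 7D after postprocessing" item in the removed testing checklist. Before relying on it, compare the output with the real helper on a few actions, since a row-major 6D convention would give different rotations.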