more changes

2026-05-23 12:40:08 +00:00 · 2025-12-13 21:02:07 +00:00
parent dc452f37e0
commit 7e232fb114
7 changed files with 1071 additions and 20 deletions
@@ -0,0 +1,208 @@
+# Subtask Token Generation - Quick Reference
+
+## What Was Done
+
+Added **autoregressive subtask token generation** to PI05 model with decoding and printing during both training and inference.
+
+## Key Features
+
+✅ **Training:** Prints ground truth subtask tokens for monitoring  
+✅ **Inference:** Generates and prints predicted subtask tokens using next token prediction  
+✅ **Autoregressive:** Each token conditioned on previous tokens  
+✅ **Greedy Decoding:** Selects most likely token at each step  
+
+## Implementation Location
+
+**File:** `src/lerobot/policies/pi05/modeling_pi05.py`
+
+**New Method:** `_generate_subtask_tokens()` (lines 844-914)
+- Autoregressive token generation
+- Uses PaliGemma language model head
+- Greedy decoding with early stopping
+
+**Modified Methods:**
+- `sample_actions()` - Calls generation and prints during inference
+- `predict_action_chunk()` - Passes tokenizer to enable generation
+- `forward()` - Prints ground truth tokens during training
+- `__init__()` - Loads tokenizer
+
+## Console Output Examples
+
+### Training:
+```
+[Training] Ground truth subtask 0: pick up the red block
+[Training] Ground truth subtask 1: place in blue container
+```
+
+### Inference:
+```
+[Inference] Generated subtask 0: grasp the object
+[Inference] Generated subtask 1: move to target location
+```
+
+## How to Use
+
+### No Code Changes Required!
+
+The implementation is automatic:
+
+1. **Training:** Just run your training script
+   - Subtasks will be printed to console automatically
+   
+2. **Inference:** Just run your inference script  
+   - Subtasks will be generated and printed automatically
+
+### To Disable (if needed):
+
+To disable subtask generation during inference for better performance:
+
+```python
+# In the model code, set tokenizer to None temporarily
+policy.tokenizer = None
+actions = policy.predict_action_chunk(batch)
+```
+
+## Technical Specs
+
+| Property | Value |
+|----------|-------|
+| **Generation Method** | Autoregressive (sequential) |
+| **Decoding Strategy** | Greedy (argmax) |
+| **Max Tokens** | 50 (configurable) |
+| **Tokenizer** | google/paligemma-3b-pt-224 |
+| **Attention** | Causal masking for generated tokens |
+| **Performance Cost** | ~50 extra forward passes per inference |
+
+## Architecture Flow
+
+```
+Training:    Ground Truth Tokens → Decode → Print → Loss Computation
+Inference:   Observations → Generate Tokens → Decode → Print → Action Prediction
+```
+
+## Method: `_generate_subtask_tokens()`
+
+**Purpose:** Generate subtask tokens autoregressively
+
+**Algorithm:**
+```python
+1. Start with prefix = [images, high-level task, state]
+2. For each position (up to max_length):
+   a. Forward pass → get logits
+   b. Apply LM head → token probabilities
+   c. Select best token (greedy)
+   d. Embed token
+   e. Append to prefix
+   f. Update masks (causal attention)
+3. Stop when EOS or max length reached
+4. Return generated tokens
+```
+
+**Key Parameters:**
+- `images` - Visual observations
+- `img_masks` - Image padding masks
+- `tokens` - Instruction tokens with state
+- `masks` - Token attention masks
+- `tokenizer` - For EOS detection
+- `max_length` - Maximum tokens to generate (default: 50)
+- `device` - Computation device
+
+## Files Created
+
+📄 `SUMMARY.md` - Comprehensive summary  
+📄 `SUBTASK_GENERATION_CHANGES.md` - Detailed technical docs  
+📄 `SUBTASK_GENERATION_FLOW.md` - Visual flow diagrams  
+📄 `QUICK_REFERENCE.md` - This file  
+📄 `examples/dataset/test_subtask_generation.py` - Test script  
+
+## Quick Test
+
+```bash
+# Test that tokenizer loads correctly
+python examples/dataset/test_subtask_generation.py
+
+# Run training to see ground truth subtasks
+python your_training_script.py
+
+# Run inference to see generated subtasks
+python your_inference_script.py
+```
+
+## Troubleshooting
+
+### No subtask output during inference?
+- Check that tokenizer loaded: `print(policy.tokenizer)`
+- Should see: `PaliGemmaTokenizerFast(name_or_path='google/paligemma-3b-pt-224'...)`
+
+### Tokenizer failed to load?
+- Check internet connection (first run downloads tokenizer)
+- Check transformers library installed: `pip install transformers`
+
+### Performance too slow during inference?
+- Disable subtask generation by setting `policy.tokenizer = None`
+- Or implement KV caching for faster generation (future optimization)
+
+## Integration Points
+
+The implementation integrates seamlessly with existing code:
+
+- **Training Loop:** No changes needed, prints happen automatically
+- **Inference Loop:** No changes needed, generation happens automatically  
+- **Data Processing:** Uses existing tokenizer from processor
+- **Loss Computation:** Already implemented in training forward pass
+
+## Future Enhancements
+
+Possible improvements (not yet implemented):
+
+- [ ] KV caching for faster generation
+- [ ] Temperature/top-k/top-p sampling
+- [ ] Beam search for better quality
+- [ ] Optional flag to enable/disable printing
+- [ ] Save generated subtasks to file
+- [ ] Compute subtask prediction accuracy metrics
+- [ ] Use generated subtasks in action prediction (hierarchical)
+
+## Code Snippet - How Autoregressive Generation Works
+
+```python
+# Simplified pseudocode
+generated_tokens = []
+prefix = [images, high_level_task, state]
+
+for t in range(max_length):
+    # Forward pass
+    logits = model(prefix)
+    
+    # Greedy decode
+    next_token = argmax(logits[-1])
+    
+    # Store
+    generated_tokens.append(next_token)
+    
+    # Stop if EOS
+    if next_token == EOS:
+        break
+    
+    # Append for next iteration
+    prefix = prefix + [next_token]
+
+return generated_tokens
+```
+
+## Questions?
+
+See the detailed documentation files:
+- `SUBTASK_GENERATION_CHANGES.md` - Full technical details
+- `SUBTASK_GENERATION_FLOW.md` - Visual flow diagrams
+- `SUMMARY.md` - Complete overview
+
+---
+
+**Implementation Status:** ✅ Complete and Ready to Use
+
+
+
+
+
+