mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-16 00:59:46 +00:00
Subtask Token Generation Implementation - Summary
What Was Implemented
I've added autoregressive subtask token generation and decoding to the PI05 model. The model now does the following:
- During Training: Decodes and prints ground truth subtask tokens for monitoring
- During Inference: Generates subtask tokens using next-token prediction and prints them
Key Changes
1. New Method: _generate_subtask_tokens()
File: src/lerobot/policies/pi05/modeling_pi05.py (lines 844-914)
- Implements autoregressive token generation using greedy decoding
- Uses the PaliGemma language model head for token prediction
- Generates tokens one at a time, each conditioned on previous tokens
- Stops when EOS token is generated or max length (50 tokens) is reached
2. Updated sample_actions() Method
File: src/lerobot/policies/pi05/modeling_pi05.py (lines 916-1020)
- Added optional tokenizer and max_subtask_tokens parameters
- Calls _generate_subtask_tokens() during inference if tokenizer is provided
- Decodes and prints generated subtask tokens
3. Updated PI05Policy.__init__()
File: src/lerobot/policies/pi05/modeling_pi05.py (lines 1066-1099)
- Loads PaliGemma tokenizer (google/paligemma-3b-pt-224) for decoding
- Stores as self.tokenizer for use throughout the policy
4. Updated predict_action_chunk()
File: src/lerobot/policies/pi05/modeling_pi05.py (lines 1387-1409)
- Passes tokenizer to sample_actions() to enable subtask generation
5. Updated forward() (Training Method)
File: src/lerobot/policies/pi05/modeling_pi05.py (lines 1411-1445)
- Decodes and prints ground truth subtask tokens during training
- Helps monitor what the model is learning to predict
How It Works
During Inference:
1. Initialize with prefix: [images, high-level task, state]
2. Generate tokens autoregressively:
- Forward pass → get logits
- Select most likely token (greedy decoding)
- Embed token and append to prefix
- Repeat until EOS or max length
3. Decode generated tokens to text
4. Print: "[Inference] Generated subtask {i}: {text}"
5. Continue with action prediction (flow matching)
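The inference loop above can be sketched in a few lines. This is a minimal illustration of greedy autoregressive decoding, not the code in modeling_pi05.py: `greedy_generate`, `toy_model`, and the token ids are hypothetical, the toy model stands in for the PaliGemma forward pass plus LM head, and the context holds token ids rather than embeddings. Dropping the EOS token from the returned list is also an assumption.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prefix: Sequence[int],
    eos_id: int,
    max_new_tokens: int = 50,
) -> List[int]:
    """Greedy autoregressive decoding; returns only the newly generated ids."""
    context = list(prefix)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_token_logits(context)  # one forward pass over the context
        token = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if token == eos_id:  # stop on EOS (assumption: EOS is not returned)
            break
        generated.append(token)
        context.append(token)  # condition the next step on this token
    return generated


def toy_model(context: Sequence[int]) -> List[float]:
    """Hypothetical stand-in model: predicts last id + 1, then EOS (id 0)."""
    vocab_size = 6
    target = 0 if context[-1] >= 4 else context[-1] + 1
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


subtask_ids = greedy_generate(toy_model, prefix=[1], eos_id=0)
print(subtask_ids)  # [2, 3, 4]
```

In the real model the appended item would be the embedding of the selected token, and the returned ids would then be passed to the tokenizer for decoding.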
During Training:
1. Extract ground truth subtask tokens from batch
2. Remove padding and decode to text
3. Print: "[Training] Ground truth subtask {i}: {text}"
4. Continue with normal training (subtask loss + flow loss)
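As a rough sketch of steps 1 to 3, the snippet below strips padding and decodes each sample. It assumes padding is marked by a per-token mask, which this summary does not confirm. `ToyTokenizer`, `decode_subtasks`, and the vocabulary are hypothetical stand-ins for the PaliGemma tokenizer and the actual batch fields.

```python
from typing import Dict, List, Sequence


class ToyTokenizer:
    """Hypothetical stand-in exposing a decode() method like a real tokenizer."""

    def __init__(self, vocab: Dict[int, str]) -> None:
        self.vocab = vocab

    def decode(self, ids: Sequence[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)


def decode_subtasks(
    batch_ids: Sequence[Sequence[int]],
    batch_mask: Sequence[Sequence[int]],
    tokenizer: ToyTokenizer,
) -> List[str]:
    """Drop padded positions (mask == 0) and decode each sample to text."""
    texts = []
    for ids, mask in zip(batch_ids, batch_mask):
        kept = [tok for tok, keep in zip(ids, mask) if keep]
        texts.append(tokenizer.decode(kept))
    return texts


vocab = {0: "<pad>", 1: "pick", 2: "up", 3: "the", 4: "red", 5: "block", 6: "place"}
tokenizer = ToyTokenizer(vocab)
ids = [[1, 2, 3, 4, 5], [6, 3, 5, 0, 0]]
mask = [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]

decoded = decode_subtasks(ids, mask, tokenizer)
for i, text in enumerate(decoded):
    print(f"[Training] Ground truth subtask {i}: {text}")
```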
Example Output
Training:
[Training] Ground truth subtask 0: pick up the red block
[Training] Ground truth subtask 1: move to the blue container
[Training] Ground truth subtask 2: place the object down
Inference:
[Inference] Generated subtask 0: grasp the object
[Inference] Generated subtask 1: move to target location
[Inference] Generated subtask 2: release the gripper
Benefits
- ✓ Transparency: See what subtasks the model predicts
- ✓ Debugging: Verify subtask prediction works correctly
- ✓ Interpretability: Understand the model's reasoning
- ✓ Monitoring: Track subtask quality during training
- ✓ Research: Enables hierarchical reasoning analysis
Files Modified
- src/lerobot/policies/pi05/modeling_pi05.py (main implementation)
Files Created
- examples/dataset/test_subtask_generation.py (demo script)
- SUBTASK_GENERATION_CHANGES.md (detailed documentation)
- SUBTASK_GENERATION_FLOW.md (visual flow diagrams)
- SUMMARY.md (this file)
Testing
To verify the implementation:
python examples/dataset/test_subtask_generation.py
This will check that the tokenizer loads correctly and explain the features.
Next Steps
To see subtask generation in action:
- During Training:
  - Run your training script as usual
  - Watch console for [Training] Ground truth subtask messages
- During Inference:
  - Run your inference script as usual
  - Watch console for [Inference] Generated subtask messages
Technical Details
- Generation Method: Autoregressive (one token at a time)
- Decoding Strategy: Greedy (always select most likely token)
- Max Tokens: 50 (configurable via max_subtask_tokens parameter)
- Attention: Causal masking for generated tokens
- Tokenizer: PaliGemma tokenizer (google/paligemma-3b-pt-224)
- Performance: Adds up to 50 extra forward passes per inference call (one per generated token, fewer if EOS is reached early); the cost of each pass can be reduced with KV caching
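To give a feel for the performance point, the sketch below counts how many token positions the model processes while generating subtask tokens, with and without a KV cache. The function `positions_processed` and the prefix length of 300 are hypothetical. The count is a rough proxy for compute, not a measurement of this implementation.

```python
def positions_processed(prefix_len: int, new_tokens: int, use_kv_cache: bool) -> int:
    """Total token positions fed through the model across all generation steps."""
    if new_tokens <= 0:
        return 0
    if use_kv_cache:
        # The prefix is encoded once; every later pass feeds only the newest token.
        return prefix_len + (new_tokens - 1)
    # Without a cache, step t re-processes the prefix plus the t tokens so far.
    return sum(prefix_len + t for t in range(new_tokens))


no_cache = positions_processed(300, 50, use_kv_cache=False)
with_cache = positions_processed(300, 50, use_kv_cache=True)
print(no_cache, with_cache)  # 16225 349
```

The number of forward passes is the same in both cases (one per generated token). Caching only shrinks the work done in each pass.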
Notes
- The implementation follows the same pattern as training (using LM head for prediction)
- Subtask generation happens before action prediction
- Generated subtasks are currently for visualization only (not used in action prediction)
- In the future, generated subtasks could be used for hierarchical planning or multi-step reasoning
Related Documentation
- See SUBTASK_GENERATION_CHANGES.md for detailed technical documentation
- See SUBTASK_GENERATION_FLOW.md for visual flow diagrams
- See training forward pass (lines 735-842) for reference implementation