
Jade Choghari 7e232fb114 more changes

2025-12-13 21:02:07 +00:00


Subtask Token Generation Implementation - Summary

What Was Implemented

I've added autoregressive subtask token generation and decoding to the PI05 model. The model can now:

During Training: Decode and print ground truth subtask tokens for monitoring
During Inference: Generate subtask tokens using next token prediction and print them

Key Changes

1. New Method: `_generate_subtask_tokens()`

File: src/lerobot/policies/pi05/modeling_pi05.py (lines 844-914)

Implements autoregressive token generation using greedy decoding
Uses the PaliGemma language model head for token prediction
Generates tokens one at a time, each conditioned on previous tokens
Stops when the EOS token is generated or the maximum length (50 tokens by default) is reached
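The greedy loop described above can be sketched in plain Python. This is a minimal illustration, not the code in modeling_pi05.py. It works on integer token ids, while the summary says the real method embeds each new token and appends the embedding to the prefix. `next_token_logits` is a hypothetical stand-in for one forward pass through PaliGemma plus the LM head, and dropping the EOS id from the output is an assumption of this sketch.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prefix: Sequence[int],
    eos_id: int,
    max_new_tokens: int = 50,
) -> List[int]:
    """Greedy autoregressive decoding over token ids (illustrative sketch)."""
    sequence = list(prefix)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        # One full forward pass over prefix + tokens generated so far (no KV cache).
        logits = next_token_logits(sequence)
        # Greedy decoding: take the single most likely next token.
        token = max(range(len(logits)), key=lambda i: logits[i])
        if token == eos_id:
            break  # stop at EOS; this sketch does not keep the EOS id
        generated.append(token)
        sequence.append(token)  # the next step is conditioned on this token
    return generated


# Toy stand-in for the model: always predicts (last token + 1) mod 6.
def toy_logits(seq: Sequence[int]) -> List[float]:
    logits = [0.0] * 6
    logits[(seq[-1] + 1) % 6] = 1.0
    return logits


print(greedy_generate(toy_logits, prefix=[0], eos_id=4))  # → [1, 2, 3]
```

Greedy decoding is deterministic, which suits logging: the same observation always yields the same printed subtask.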

2. Updated `sample_actions()` Method

File: src/lerobot/policies/pi05/modeling_pi05.py (lines 916-1020)

Added optional tokenizer and max_subtask_tokens parameters
Calls _generate_subtask_tokens() during inference if a tokenizer is provided
Decodes and prints generated subtask tokens
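The opt-in behavior of `sample_actions()` can be sketched as a small helper. `generate_and_log_subtasks`, `fake_generate` and `ToyTokenizer` are hypothetical names for illustration only. The loop runs per sample for readability, although the real method may decode a whole batch at once. The only tokenizer method used is `decode(token_ids, skip_special_tokens=...)`, which Hugging Face tokenizers provide.

```python
from typing import Any, Callable, List, Optional, Sequence


def generate_and_log_subtasks(
    prefixes: Sequence[Sequence[int]],
    generate_fn: Callable[[Sequence[int], int], List[int]],
    tokenizer: Optional[Any] = None,
    max_subtask_tokens: int = 50,
) -> List[str]:
    """Hypothetical helper mirroring the optional subtask step in sample_actions()."""
    if tokenizer is None:
        return []  # no tokenizer: skip subtask generation entirely
    texts = []
    for i, prefix in enumerate(prefixes):
        token_ids = generate_fn(prefix, max_subtask_tokens)
        text = tokenizer.decode(token_ids, skip_special_tokens=True)
        print(f"[Inference] Generated subtask {i}: {text}")
        texts.append(text)
    return texts


class ToyTokenizer:
    """Stand-in exposing a decode() method shaped like a Hugging Face tokenizer's."""

    vocab = {1: "grasp", 2: "the", 3: "object"}

    def decode(self, token_ids, skip_special_tokens=False):
        return " ".join(self.vocab[t] for t in token_ids)


def fake_generate(prefix, max_tokens):
    # Hypothetical generator: ignores the prefix and returns fixed ids.
    return [1, 2, 3][:max_tokens]


generate_and_log_subtasks([[0]], fake_generate, ToyTokenizer())
# prints: [Inference] Generated subtask 0: grasp the object
```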

3. Updated `PI05Policy.__init__()`

File: src/lerobot/policies/pi05/modeling_pi05.py (lines 1066-1099)

Loads PaliGemma tokenizer (google/paligemma-3b-pt-224) for decoding
Stores as self.tokenizer for use throughout the policy

4. Updated `predict_action_chunk()`

File: src/lerobot/policies/pi05/modeling_pi05.py (lines 1387-1409)

Passes tokenizer to sample_actions() to enable subtask generation

5. Updated `forward()` (Training Method)

File: src/lerobot/policies/pi05/modeling_pi05.py (lines 1411-1445)

Decodes and prints ground truth subtask tokens during training
Helps monitor what the model is learning to predict
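Decoding the ground-truth subtasks comes down to dropping padded positions before calling `decode`. A minimal sketch follows. It assumes the batch carries padded token ids plus a per-token boolean mask (True = real token). The actual batch keys in lerobot are not given here, and `decode_ground_truth_subtasks` and `ToyTokenizer` are hypothetical.

```python
from typing import Any, List, Sequence


def decode_ground_truth_subtasks(
    token_ids: Sequence[Sequence[int]],
    valid_mask: Sequence[Sequence[bool]],
    tokenizer: Any,
) -> List[str]:
    """Drop padded positions, then decode and print each ground-truth subtask."""
    texts = []
    for i, (ids, mask) in enumerate(zip(token_ids, valid_mask)):
        kept = [t for t, keep in zip(ids, mask) if keep]  # remove padding
        text = tokenizer.decode(kept, skip_special_tokens=True)
        print(f"[Training] Ground truth subtask {i}: {text}")
        texts.append(text)
    return texts


class ToyTokenizer:
    vocab = {0: "<pad>", 1: "pick", 2: "up", 3: "block"}

    def decode(self, token_ids, skip_special_tokens=False):
        return " ".join(self.vocab[t] for t in token_ids)


batch_ids = [[1, 2, 3, 0, 0], [1, 3, 0, 0, 0]]
batch_mask = [[True, True, True, False, False], [True, True, False, False, False]]
decode_ground_truth_subtasks(batch_ids, batch_mask, ToyTokenizer())
# prints: [Training] Ground truth subtask 0: pick up block
#         [Training] Ground truth subtask 1: pick block
```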

How It Works

During Inference:

1. Initialize with prefix: [images, high-level task, state]
2. Generate tokens autoregressively:
   - Forward pass → get logits
   - Select most likely token (greedy decoding)
   - Embed token and append to prefix
   - Repeat until EOS or max length
3. Decode generated tokens to text
4. Print: "[Inference] Generated subtask {i}: {text}"
5. Continue with action prediction (flow matching)
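Appending generated tokens to the prefix needs an attention mask. The Technical Details section says generated tokens use causal masking. The sketch below also assumes the prefix (images, task, state) attends bidirectionally, as in a PaliGemma-style prefix-LM; the summary does not state this. `prefix_lm_attention_mask` is a hypothetical helper.

```python
from typing import List


def prefix_lm_attention_mask(prefix_len: int, num_generated: int) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k."""
    n = prefix_len + num_generated
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if k < prefix_len:
                mask[q][k] = True  # every position sees the whole prefix
            else:
                mask[q][k] = k <= q  # generated tokens are causal
    return mask


for row in prefix_lm_attention_mask(prefix_len=2, num_generated=2):
    print("".join("1" if allowed else "0" for allowed in row))
# 1100
# 1100
# 1110
# 1111
```

Prefix positions never see generated tokens, so extending the sequence by one token leaves the prefix representations unchanged. That property is what makes KV caching possible.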

During Training:

1. Extract ground truth subtask tokens from batch
2. Remove padding and decode to text
3. Print: "[Training] Ground truth subtask {i}: {text}"
4. Continue with normal training (subtask loss + flow loss)
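The combined objective in step 4 can be sketched with plain floats. Two assumptions apply: the subtask loss is a mean cross-entropy over non-padded subtask positions, and the two terms are summed without weights. The summary only says "subtask loss + flow loss", and the real code operates on tensors. All function names here are hypothetical.

```python
import math
from typing import Sequence


def masked_cross_entropy(
    logits: Sequence[Sequence[float]], targets: Sequence[int], valid: Sequence[bool]
) -> float:
    """Mean next-token cross-entropy over non-padded positions."""
    total, count = 0.0, 0
    for row, target, keep in zip(logits, targets, valid):
        if not keep:
            continue  # padded subtask positions contribute nothing
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # stable log-sum-exp
        total += log_z - row[target]
        count += 1
    return total / max(count, 1)


def flow_matching_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and target velocities."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(logits, targets, valid, pred_velocity, target_velocity) -> float:
    # Assumption: an unweighted sum of the two terms.
    return masked_cross_entropy(logits, targets, valid) + flow_matching_mse(
        pred_velocity, target_velocity
    )


loss = total_loss(
    logits=[[0.0, 0.0], [100.0, 0.0]],
    targets=[0, 1],
    valid=[True, False],  # second position is padding
    pred_velocity=[1.0, 2.0],
    target_velocity=[1.0, 0.0],
)
print(round(loss, 4))  # log(2) + 2 ≈ 2.6931
```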

Example Output

Training:

[Training] Ground truth subtask 0: pick up the red block
[Training] Ground truth subtask 1: move to the blue container
[Training] Ground truth subtask 2: place the object down

Inference:

[Inference] Generated subtask 0: grasp the object
[Inference] Generated subtask 1: move to target location
[Inference] Generated subtask 2: release the gripper

Benefits

✓ Transparency: See what subtasks the model predicts
✓ Debugging: Verify subtask prediction works correctly
✓ Interpretability: Inspect the intermediate subtasks behind the model's actions
✓ Monitoring: Track subtask quality during training
✓ Research: Enables hierarchical reasoning analysis

Files Modified

src/lerobot/policies/pi05/modeling_pi05.py (main implementation)

Files Created

examples/dataset/test_subtask_generation.py (demo script)
SUBTASK_GENERATION_CHANGES.md (detailed documentation)
SUBTASK_GENERATION_FLOW.md (visual flow diagrams)
SUMMARY.md (this file)

Testing

To verify the implementation:

python examples/dataset/test_subtask_generation.py

This checks that the tokenizer loads correctly and prints an overview of the new features.

Next Steps

To see subtask generation in action:

During Training:
- Run your training script as usual
- Watch console for [Training] Ground truth subtask messages
During Inference:
- Run your inference script as usual
- Watch console for [Inference] Generated subtask messages

Technical Details

Generation Method: Autoregressive (one token at a time)
Decoding Strategy: Greedy (always select most likely token)
Max Tokens: 50 (configurable via max_subtask_tokens parameter)
Attention: Causal masking for generated tokens
Tokenizer: PaliGemma tokenizer (google/paligemma-3b-pt-224)
Performance: Adds up to 50 extra forward passes per inference call (one per generated token); each pass currently reprocesses the full prefix, which KV caching could avoid
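A quick calculation shows why KV caching helps. With or without a cache, generating N tokens greedily takes N forward passes. Without a cache, pass t re-encodes the prefix plus the t tokens generated so far. With a cache, only the first pass encodes the prefix, and each later pass feeds a single new token. The prefix length of 300 below is a hypothetical value, not a measured PI05 figure, and counting positions is only a rough proxy for compute, since attention cost also grows with context length.

```python
def positions_processed(prefix_len: int, new_tokens: int, kv_cache: bool) -> int:
    """Token positions fed through the model to greedily generate new_tokens tokens."""
    if new_tokens <= 0:
        return 0
    if kv_cache:
        # First pass encodes the prefix once; each later pass feeds only the newest token.
        return prefix_len + (new_tokens - 1)
    # Without a cache, pass t re-encodes the prefix plus the t tokens generated so far.
    return sum(prefix_len + t for t in range(new_tokens))


# Hypothetical prefix length of 300 positions, 50 generated tokens.
print(positions_processed(300, 50, kv_cache=False))  # → 16225
print(positions_processed(300, 50, kv_cache=True))   # → 349
```

The uncached total is N·P + N(N−1)/2 = 50·300 + 1225. The cached total is P + N − 1.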

Notes

The implementation follows the same pattern as training (using the LM head for prediction)
Subtask generation happens before action prediction
Generated subtasks are currently for visualization only (not used in action prediction)
In the future, generated subtasks could be used for hierarchical planning or multi-step reasoning

See SUBTASK_GENERATION_CHANGES.md for detailed technical documentation
See SUBTASK_GENERATION_FLOW.md for visual flow diagrams
See the training forward pass (lines 735-842) for the reference implementation
