mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-15 16:49:55 +00:00
7.1 KiB
7.1 KiB
Temporal Sampling Strategy Visualization
How --sample-interval Works
Example: 30 fps dataset, --sample-interval 1.0 (1 second)
Timeline (seconds): 0.0 0.5 1.0 1.5 2.0 2.5 3.0
│ │ │ │ │ │ │
Frames: 0───15───30───45───60───75───90───105──120──135──150
│ │ │ │ │ │ │
▼ ▼ ▼ ▼
Sampled: YES NO YES NO YES NO YES
│ │ │ │
Task Index: [0]──────────────>[1]──────────────>[2]──────────────>[3]
│ │ │ │
VLM Called: ✓ Gen ✓ Gen ✓ Gen ✓ Gen
dialogue dialogue dialogue dialogue
│ │ │ │
Frames 0-29 ─────┘ │ │ │
get task 0 │ │ │
│ │ │
Frames 30-59 ────────────────────────┘ │ │
get task 1 │ │
│ │
Frames 60-89 ──────────────────────────────────────────┘ │
get task 2 │
│
Frames 90-119 ────────────────────────────────────────────────────────────┘
get task 3
Comparison: Different Sampling Intervals
--sample-interval 2.0 (every 2 seconds)
Timeline: 0.0 1.0 2.0 3.0 4.0 5.0 6.0
│ │ │ │ │ │ │
Sampled: YES NO YES NO YES NO YES
│ │ │ │
Tasks: [0]───────────────>[1]───────────────>[2]───────────────>[3]
VLM Calls: 4 (fewer calls, faster but less granular)
--sample-interval 1.0 (every 1 second) - DEFAULT
Timeline: 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
│ │ │ │ │ │ │ │ │ │ │ │ │
Sampled: YES NO YES NO YES NO YES NO YES NO YES NO YES
│ │ │ │ │ │ │
Tasks: [0]─────────>[1]─────────>[2]─────────>[3]─────────>[4]─────────>[5]─────>[6]
VLM Calls: 7 (balanced coverage and speed)
--sample-interval 0.5 (every 0.5 seconds)
Timeline: 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
│ │ │ │ │ │ │ │ │ │ │ │ │
Sampled: YES YES YES YES YES YES YES YES YES YES YES YES YES
│ │ │ │ │ │ │ │ │ │ │ │ │
Tasks: [0]─>[1]─>[2]─>[3]─>[4]─>[5]─>[6]─>[7]─>[8]─>[9]─>[10]>[11]>[12]
VLM Calls: 13 (high granularity, slower but more detailed)
Episode Boundaries
The script always samples the first frame of each episode:
Episode 0 Episode 1 Episode 2
├─────────────────────────────────┤├─────────────────────────────────┤├──────...
│ ││ ││
Frame: 0 30 60 90 120 130 160 190 220 250 260 290 320
Time: 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0
│ │ │ │ │ │ │ │ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼
Sample:YES YES YES YES YES YES YES YES YES YES YES YES YES
│ │ │ │ │ │ │ │ │ │ │ │ │
Task: 0────1─────2─────3────4 5─────6─────7─────8────9 10────11───12
Note: Frames 0, 130, 260 are ALWAYS sampled (episode starts)
Even if they're within the sample-interval window
Real-World Example: svla_so101_pickplace Dataset
Typical stats:
- Total episodes: 50
- Avg episode length: 300 frames (10 seconds at 30 fps)
- Total frames: 15,000
Without Sampling (every frame)
Frames processed: 15,000
VLM calls: 15,000
Time estimate: ~5 hours
Unique tasks: ~12,000 (lots of duplicates)
With --sample-interval 1.0 (every 1 second)
Frames processed: 15,000 ✓
VLM calls: 500
Time estimate: ~10 minutes
Unique tasks: ~450 (meaningful variety)
Efficiency gain: 30x faster
With --sample-interval 2.0 (every 2 seconds)
Frames processed: 15,000 ✓
VLM calls: 250
Time estimate: ~5 minutes
Unique tasks: ~220
Efficiency gain: 60x faster
Key Points
- All frames get labeled: Every frame gets a
task_index_high_level - Only sampled frames call VLM: Huge efficiency gain
- Temporal coherence: Nearby frames share the same task
- Episode-aware: Always samples episode starts
- Configurable: Adjust
--sample-intervalbased on your needs
Choosing Your Sampling Interval
| Use Case | Recommended Interval | Why |
|---|---|---|
| Quick testing | 2.0s | Fastest iteration |
| Standard training | 1.0s | Good balance |
| High-quality dataset | 0.5s | Better coverage |
| Fine-grained control | 0.33s | Very detailed |
| Dense annotations | 0.1s | Nearly every frame |
Rule of thumb: Match your sampling interval to your typical skill duration. If skills last 1-3 seconds, sampling every 1 second captures each skill multiple times.