mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-16 17:20:05 +00:00
142 lines
7.1 KiB
Markdown
142 lines
7.1 KiB
Markdown
# Temporal Sampling Strategy Visualization
|
|
|
|
## How `--sample-interval` Works
|
|
|
|
### Example: 30 fps dataset, `--sample-interval 1.0` (1 second)
|
|
|
|
```
|
|
Timeline (seconds): 0.0 0.5 1.0 1.5 2.0 2.5 3.0
|
|
│ │ │ │ │ │ │
|
|
Frames: 0───15───30───45───60───75───90───105──120──135──150
|
|
│ │ │ │ │ │ │
|
|
▼ ▼ ▼ ▼
|
|
Sampled: YES NO YES NO YES NO YES
|
|
│ │ │ │
|
|
Task Index: [0]──────────────>[1]──────────────>[2]──────────────>[3]
|
|
│ │ │ │
|
|
VLM Called: ✓ Gen ✓ Gen ✓ Gen ✓ Gen
|
|
dialogue dialogue dialogue dialogue
|
|
│ │ │ │
|
|
Frames 0-29 ─────┘ │ │ │
|
|
get task 0 │ │ │
|
|
│ │ │
|
|
Frames 30-59 ────────────────────────┘ │ │
|
|
get task 1 │ │
|
|
│ │
|
|
Frames 60-89 ──────────────────────────────────────────┘ │
|
|
get task 2 │
|
|
│
|
|
Frames 90-119 ────────────────────────────────────────────────────────────┘
|
|
get task 3
|
|
```
|
|
|
|
## Comparison: Different Sampling Intervals
|
|
|
|
### `--sample-interval 2.0` (every 2 seconds)
|
|
```
|
|
Timeline: 0.0 1.0 2.0 3.0 4.0 5.0 6.0
|
|
│ │ │ │ │ │ │
|
|
Sampled: YES NO YES NO YES NO YES
|
|
│ │ │ │
|
|
Tasks: [0]───────────────>[1]───────────────>[2]───────────────>[3]
|
|
|
|
VLM Calls: 4 (fewer calls, faster but less granular)
|
|
```
|
|
|
|
### `--sample-interval 1.0` (every 1 second) - **DEFAULT**
|
|
```
|
|
Timeline: 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
|
|
│ │ │ │ │ │ │ │ │ │ │ │ │
|
|
Sampled: YES NO YES NO YES NO YES NO YES NO YES NO YES
|
|
│ │ │ │ │ │ │
|
|
Tasks: [0]─────────>[1]─────────>[2]─────────>[3]─────────>[4]─────────>[5]─────>[6]
|
|
|
|
VLM Calls: 7 (balanced coverage and speed)
|
|
```
|
|
|
|
### `--sample-interval 0.5` (every 0.5 seconds)
|
|
```
|
|
Timeline: 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
|
|
│ │ │ │ │ │ │ │ │ │ │ │ │
|
|
Sampled: YES YES YES YES YES YES YES YES YES YES YES YES YES
|
|
│ │ │ │ │ │ │ │ │ │ │ │ │
|
|
Tasks: [0]─>[1]─>[2]─>[3]─>[4]─>[5]─>[6]─>[7]─>[8]─>[9]─>[10]>[11]>[12]
|
|
|
|
VLM Calls: 13 (high granularity, slower but more detailed)
|
|
```
|
|
|
|
## Episode Boundaries
|
|
|
|
The script always samples the **first frame** of each episode:
|
|
|
|
```
|
|
Episode 0 Episode 1 Episode 2
|
|
├─────────────────────────────────┤├─────────────────────────────────┤├──────...
|
|
│ ││ ││
|
|
Frame: 0 30 60 90 120 130 160 190 220 250 260 290 320
|
|
Time: 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0 3.0 4.0 0.0 1.0 2.0
|
|
│ │ │ │ │ │ │ │ │ │ │ │ │
|
|
▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼
|
|
Sample:YES YES YES YES YES YES YES YES YES YES YES YES YES
|
|
│ │ │ │ │ │ │ │ │ │ │ │ │
|
|
Task: 0────1─────2─────3────4 5─────6─────7─────8────9 10────11───12
|
|
|
|
Note: Frames 0, 130, 260 are ALWAYS sampled (episode starts)
|
|
Even if they're within the sample-interval window
|
|
```
|
|
|
|
## Real-World Example: svla_so101_pickplace Dataset
|
|
|
|
Typical stats:
|
|
- **Total episodes**: 50
|
|
- **Avg episode length**: 300 frames (10 seconds at 30 fps)
|
|
- **Total frames**: 15,000
|
|
|
|
### Without Sampling (every frame)
|
|
```
|
|
Frames processed: 15,000
|
|
VLM calls: 15,000
|
|
Time estimate: ~5 hours
|
|
Unique tasks: ~12,000 (lots of duplicates)
|
|
```
|
|
|
|
### With `--sample-interval 1.0` (every 1 second)
|
|
```
|
|
Frames processed: 15,000 ✓
|
|
VLM calls: 500
|
|
Time estimate: ~10 minutes
|
|
Unique tasks: ~450 (meaningful variety)
|
|
Efficiency gain: 30x faster
|
|
```
|
|
|
|
### With `--sample-interval 2.0` (every 2 seconds)
|
|
```
|
|
Frames processed: 15,000 ✓
|
|
VLM calls: 250
|
|
Time estimate: ~5 minutes
|
|
Unique tasks: ~220
|
|
Efficiency gain: 60x faster
|
|
```
|
|
|
|
## Key Points
|
|
|
|
1. **All frames get labeled**: Every frame gets a `task_index_high_level`
|
|
2. **Only sampled frames call VLM**: Huge efficiency gain
|
|
3. **Temporal coherence**: Nearby frames share the same task
|
|
4. **Episode-aware**: Always samples episode starts
|
|
5. **Configurable**: Adjust `--sample-interval` based on your needs
|
|
|
|
## Choosing Your Sampling Interval
|
|
|
|
| Use Case | Recommended Interval | Why |
|
|
|----------|---------------------|-----|
|
|
| Quick testing | 2.0s | Fastest iteration |
|
|
| Standard training | 1.0s | Good balance |
|
|
| High-quality dataset | 0.5s | Better coverage |
|
|
| Fine-grained control | 0.33s | Very detailed |
|
|
| Dense annotations | 0.1s | Nearly every frame |
|
|
|
|
**Rule of thumb**: Match your sampling interval to your typical skill duration.
|
|
If skills last 1-3 seconds, sampling every 1 second captures each skill multiple times.
|
|
|