# Temporal Sampling Strategy Visualization

## How `--sample-interval` Works

### Example: 30 fps dataset, `--sample-interval 1.0` (1 second)

```
Timeline (seconds):  0.0      0.5      1.0      1.5      2.0      2.5      3.0
                     │        │        │        │        │        │        │
Frames:              0────────15───────30───────45───────60───────75───────90
                     │        │        │        │        │        │        │
                     ▼                 ▼                 ▼                 ▼
Sampled:            YES      NO       YES      NO       YES      NO       YES
                     │                 │                 │                 │
Task Index:         [0]──────────────>[1]──────────────>[2]──────────────>[3]
                     │                 │                 │                 │
VLM Called:         ✓ Gen             ✓ Gen             ✓ Gen             ✓ Gen
                    dialogue          dialogue          dialogue          dialogue
                     │                 │                 │                 │
Frames 0-29    ─────┘                 │                 │                 │
get task 0                             │                 │                 │
                                       │                 │                 │
Frames 30-59  ────────────────────────┘                 │                 │
get task 1                                               │                 │
                                                         │                 │
Frames 60-89  ──────────────────────────────────────────┘                 │
get task 2                                                                 │
                                                                           │
Frames 90-119 ────────────────────────────────────────────────────────────┘
get task 3
```
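
The diagram above can be reproduced in a few lines of Python. This is a minimal sketch for a single episode, not the script's actual code: the name `assign_task_indices` is hypothetical, and it assumes samples are spaced by frame count (`round(fps * sample_interval)` frames apart), whereas the real script may compare timestamps instead.

```python
def assign_task_indices(num_frames, fps, sample_interval):
    """Return (sampled_frames, task_index_per_frame) for one episode.

    Hypothetical helper: assumes one sample every round(fps * sample_interval)
    frames, starting at frame 0.
    """
    step = max(1, round(fps * sample_interval))  # frames between samples
    sampled_frames = []
    task_indices = []
    for frame in range(num_frames):
        if frame % step == 0:
            # Sampled frame: this is where the VLM would be called.
            sampled_frames.append(frame)
        # Every frame inherits the task of the most recent sample.
        task_indices.append(len(sampled_frames) - 1)
    return sampled_frames, task_indices


sampled, tasks = assign_task_indices(num_frames=120, fps=30, sample_interval=1.0)
print(sampled)                                      # [0, 30, 60, 90]
print(tasks[0], tasks[29], tasks[30], tasks[119])   # 0 0 1 3
```

Only four frames trigger a VLM call, yet all 120 frames end up with a task index.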

## Comparison: Different Sampling Intervals

### `--sample-interval 2.0` (every 2 seconds)
```
Timeline:    0.0      1.0      2.0      3.0      4.0      5.0      6.0
             │        │        │        │        │        │        │
Sampled:    YES      NO       YES      NO       YES      NO       YES
             │                 │                 │                 │
Tasks:      [0]───────────────>[1]───────────────>[2]───────────────>[3]
             
VLM Calls:   4 (fewer calls, faster but less granular)
```

### `--sample-interval 1.0` (every 1 second) - **DEFAULT**
```
Timeline:    0.0   0.5   1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0
             │     │     │     │     │     │     │     │     │     │     │     │     │
Sampled:    YES   NO   YES   NO   YES   NO   YES   NO   YES   NO   YES   NO   YES
             │           │           │           │           │           │           │
Tasks:      [0]─────────>[1]─────────>[2]─────────>[3]─────────>[4]─────────>[5]─────>[6]
             
VLM Calls:   7 (balanced coverage and speed)
```

### `--sample-interval 0.5` (every 0.5 seconds)
```
Timeline:    0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0
             │    │    │    │    │    │    │    │    │    │    │    │    │
Sampled:    YES  YES  YES  YES  YES  YES  YES  YES  YES  YES  YES  YES  YES
             │    │    │    │    │    │    │    │    │    │    │    │    │
Tasks:      [0]─>[1]─>[2]─>[3]─>[4]─>[5]─>[6]─>[7]─>[8]─>[9]─>[10]>[11]>[12]
             
VLM Calls:   13 (high granularity, slower but more detailed)
```
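
The call counts in the three diagrams come from counting sample points on a timeline that includes both endpoints (0.0 s and 6.0 s). The sketch below does that arithmetic in whole frames to avoid floating-point surprises. `calls_on_timeline` is a hypothetical helper, and 30 fps is assumed, as in the first example.

```python
def calls_on_timeline(duration_s, sample_interval, fps=30):
    """Count samples at 0, T, 2T, ... up to and including duration_s."""
    step = max(1, round(fps * sample_interval))  # frames between samples
    last_frame = round(fps * duration_s)         # frame at the end of the timeline
    return last_frame // step + 1                # +1 for the sample at t = 0


for interval in (2.0, 1.0, 0.5):
    print(interval, calls_on_timeline(6.0, interval))
# 2.0 4
# 1.0 7
# 0.5 13
```

A 6-second recording stored as frames 0-179 has no frame at exactly 6.0 s, so it would see one fewer call at each of these intervals.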

## Episode Boundaries

The script always samples the **first frame** of each episode, and the sampling interval is then measured from that frame:

```
Episode 0                          Episode 1                          Episode 2
├─────────────────────────────────┤├─────────────────────────────────┤├──────...
│                                 ││                                 ││
Frame: 0    30    60    90   120  130   160   190   220  250  260   290  320
Time:  0.0  1.0   2.0   3.0  4.0  0.0   1.0   2.0   3.0  4.0  0.0   1.0  2.0
       │    │     │     │    │    │     │     │     │    │    │     │    │
       ▼    ▼     ▼     ▼    ▼    ▼     ▼     ▼     ▼    ▼    ▼     ▼    ▼
Sample:YES  YES   YES   YES  YES  YES   YES   YES   YES  YES  YES   YES  YES
       │    │     │     │    │    │     │     │     │    │    │     │    │
Task:  0────1─────2─────3────4    5─────6─────7─────8────9    10────11───12

Note: Frames 0, 130, 260 are ALWAYS sampled (episode starts),
      even when less than one sample interval has passed since the previous
      sample (frame 130 comes only 10 frames after frame 120).
```
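
One way to get this behavior is to track the episode and timestamp of the last sample, and force a sample whenever the episode changes. The following is a sketch under assumptions rather than the script's real logic: `sample_frames` and its arguments are hypothetical names, and timestamps are assumed to restart at 0.0 in every episode, as in the diagram. The third episode's length is cut off in the diagram, so the 70 frames used here are an arbitrary choice for illustration.

```python
def sample_frames(episode_ids, timestamps, sample_interval, eps=1e-6):
    """Return global indices of frames that would be sent to the VLM."""
    sampled = []
    last_episode = None
    last_time = None
    for i, (episode, t) in enumerate(zip(episode_ids, timestamps)):
        starts_episode = episode != last_episode
        # eps guards against float rounding in timestamps such as 1/30 s.
        if starts_episode or t - last_time >= sample_interval - eps:
            sampled.append(i)
            last_episode, last_time = episode, t
    return sampled


# Rebuild the layout from the diagram: 30 fps, episodes of 130, 130 and 70 frames.
fps = 30
episode_ids, timestamps = [], []
for episode, length in enumerate([130, 130, 70]):
    for frame in range(length):
        episode_ids.append(episode)
        timestamps.append(frame / fps)  # time restarts at 0.0 in each episode

print(sample_frames(episode_ids, timestamps, sample_interval=1.0))
# [0, 30, 60, 90, 120, 130, 160, 190, 220, 250, 260, 290, 320]
```

The 13 sampled frames correspond to tasks 0 through 12 in the diagram.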

## Real-World Example: svla_so101_pickplace Dataset

Typical stats:
- **Total episodes**: 50
- **Avg episode length**: 300 frames (10 seconds at 30 fps)
- **Total frames**: 15,000

### Without Sampling (every frame)
```
Frames processed:    15,000
VLM calls:           15,000
Time estimate:       ~5 hours
Unique tasks:        ~12,000 (mostly near-duplicates)
```

### With `--sample-interval 1.0` (every 1 second)
```
Frames processed:    15,000 ✓
VLM calls:           500
Time estimate:       ~10 minutes
Unique tasks:        ~450 (meaningful variety)
Efficiency gain:     30x faster
```

### With `--sample-interval 2.0` (every 2 seconds)
```
Frames processed:    15,000 ✓
VLM calls:           250
Time estimate:       ~5 minutes
Unique tasks:        ~220
Efficiency gain:     60x faster
```
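
The figures above can be checked with some quick arithmetic. This sketch uses the dataset stats listed earlier and a per-call cost of 1.2 s, which is simply what the "~5 hours for 15,000 calls" estimate implies; it is not a measured value. The function name `estimate` is hypothetical.

```python
import math

EPISODES = 50
FRAMES_PER_EPISODE = 300
FPS = 30
# Implied by the estimate above: ~5 hours for 15,000 calls.
SECONDS_PER_CALL = 5 * 3600 / 15_000  # 1.2 s, an estimate rather than a measurement


def estimate(sample_interval):
    """Return (vlm_calls, reduction_factor, minutes) for the whole dataset."""
    step = max(1, round(FPS * sample_interval))               # frames between samples
    calls_per_episode = math.ceil(FRAMES_PER_EPISODE / step)  # includes frame 0
    calls = EPISODES * calls_per_episode
    reduction = EPISODES * FRAMES_PER_EPISODE / calls
    minutes = calls * SECONDS_PER_CALL / 60
    return calls, reduction, minutes


for interval in (1.0, 2.0):
    calls, reduction, minutes = estimate(interval)
    print(f"{interval}s: {calls} calls, {reduction:.0f}x fewer, ~{minutes:.0f} min")
# 1.0s: 500 calls, 30x fewer, ~10 min
# 2.0s: 250 calls, 60x fewer, ~5 min
```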

## Key Points

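1. **All frames get labeled**: Every frame gets a `task_index_high_level`, whether or not it was sampled
2. **Only sampled frames call the VLM**: At 30 fps and `--sample-interval 1.0`, that is 1 call per 30 frames
3. **Temporal coherence**: Frames between two samples share the task of the most recent sample
4. **Episode-aware**: The first frame of every episode is always sampled
5. **Configurable**: Adjust `--sample-interval` to trade annotation detail against VLM cost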

## Choosing Your Sampling Interval

| Use Case | Recommended Interval | Why |
|----------|---------------------|-----|
| Quick testing | 2.0s | Fastest iteration |
| Standard training | 1.0s | Good balance |
| High-quality dataset | 0.5s | Better coverage |
| Fine-grained control | 0.33s | Very detailed |
| Dense annotations | 0.1s | Every 3rd frame at 30 fps |

**Rule of thumb**: Keep your sampling interval at or below your shortest typical skill duration.
If skills last 1-3 seconds, sampling every 1 second captures each skill at least once, and the longer ones up to three times.
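
To check an interval against your own skill durations, you can count how many samples can land inside one skill. The helper below is a hypothetical sketch: it treats samples as a regular grid, lets the skill start anywhere relative to that grid, and ignores the extra sample forced at each episode start.

```python
import math
from fractions import Fraction


def samples_per_skill(skill_duration_s, sample_interval_s):
    """Return (min, max) number of samples that can fall inside one skill.

    Hypothetical helper: the skill is a window of the given length that may
    start anywhere relative to a regular sampling grid.
    """
    # Fractions built from the decimal strings keep values like 0.1 exact.
    ratio = Fraction(str(skill_duration_s)) / Fraction(str(sample_interval_s))
    return math.floor(ratio), math.ceil(ratio)


for interval in (0.5, 1.0, 2.0):
    print(interval, samples_per_skill(1.0, interval), samples_per_skill(3.0, interval))
# 0.5 (2, 2) (6, 6)
# 1.0 (1, 1) (3, 3)
# 2.0 (0, 1) (1, 2)
```

A minimum of 0 means a skill of that length can fall entirely between two samples, so a 2.0 s interval may miss skills that last only 1 second.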