- Replace cv2.VideoCapture with PyAV (the `av` library), which handles the
AV1 codec properly. Decode each video once and index frames by frame number.
- Use AutoImageProcessor instead of AutoProcessor to avoid loading
the SigLIP tokenizer (which requires sentencepiece).
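A minimal sketch of the decode-once-and-index pattern described above. The helper names `decode_all_frames` and `build_frame_index` are hypothetical (not taken from the script), and the `av` import is deferred inside the decoder so the pure indexing logic works without PyAV installed.

```python
"""Decode a video once with PyAV, then index frames for random access."""
from typing import Dict, List

import numpy as np


def decode_all_frames(path: str) -> List[np.ndarray]:
    """Decode every frame of a video once with PyAV (handles AV1)."""
    import av  # deferred: only needed when actually decoding

    with av.open(path) as container:
        return [
            frame.to_ndarray(format="rgb24")
            for frame in container.decode(video=0)
        ]


def build_frame_index(frames: List[np.ndarray]) -> Dict[int, np.ndarray]:
    """Index decoded frames by frame number for O(1) lookup."""
    return {i: f for i, f in enumerate(frames)}
```

After indexing, `index[n]` returns frame `n` without re-seeking the container, which sidesteps the unreliable seeking behaviour that motivated dropping cv2.VideoCapture.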
Made-with: Cursor
Run two parallel KNN analyses per dataset:
1. State-based: KNN in joint-state space
2. Image-based: KNN in SigLIP embedding space (google/siglip-base-patch16-224)
Both measure action chunk variance among cross-episode neighbors.
Comparing them reveals whether visual and proprioceptive similarity
agree on where the data is inconsistent.
The output is a 4-row figure: state histogram, image histogram,
overlaid per-episode curves, and a spatial heatmap colored by
image-based variance.
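A minimal numpy sketch of the state-based branch (the image-based branch is identical once SigLIP embeddings replace the joint states). The function name `chunk_variance_knn` and the brute-force distance computation are assumptions for illustration, not the script's actual implementation.

```python
import numpy as np


def chunk_variance_knn(
    states: np.ndarray,       # (N, D) joint states, or SigLIP embeddings
    actions: np.ndarray,      # (N, A) per-frame actions
    episode_ids: np.ndarray,  # (N,) episode index per frame
    k: int = 5,
    chunk: int = 30,
) -> np.ndarray:
    """For each frame, find k nearest cross-episode neighbours in state
    space and return the mean variance of their action chunks."""
    n = len(states)
    # Brute-force pairwise squared distances (fine for small datasets).
    d2 = ((states[:, None, :] - states[None, :, :]) ** 2).sum(-1)
    out = np.zeros(n)
    for i in range(n):
        # Restrict candidates to frames from *other* episodes.
        cand = np.where(episode_ids != episode_ids[i])[0]
        nn = cand[np.argsort(d2[i, cand])[:k]]
        # Stack each neighbour's action chunk, clipped at the data end.
        chunks = [actions[j:j + chunk] for j in nn]
        m = min(len(c) for c in chunks)
        stacked = np.stack([c[:m] for c in chunks])  # (k, m, A)
        out[i] = stacked.var(axis=0).mean()
    return out
```

High values flag states where demonstrators took divergent actions from visually or proprioceptively similar situations, which is what both histograms and the heatmap visualise.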
Made-with: Cursor
Add three new analysis scripts for dataset-quality insight:
- create_frame_grid.py: random frame grid JPG for visual inspection
- workspace_density.py: 3D TCP trajectory clustering with K-means
- action_consistency.py: KNN-based action-state consistency analysis
with action-chunk support (default chunk=30) to match the chunked
action targets used in policy learning
Also update create_progress_videos.py with configurable camera selection.
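A plain-numpy sketch of the clustering step in workspace_density.py, using Lloyd's iterations on (N, 3) TCP positions. The function name `kmeans_3d` and the optional `init` parameter are assumptions; the script may instead use a library implementation such as scikit-learn's KMeans.

```python
import numpy as np


def kmeans_3d(points: np.ndarray, k: int, iters: int = 50,
              seed: int = 0, init: np.ndarray = None):
    """Lloyd's K-means on (N, 3) TCP positions; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    if init is None:
        init = points[rng.choice(len(points), size=k, replace=False)]
    centers = init
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Recompute centers; keep a center in place if its cluster empties.
        new = np.array([
            points[labels == j].mean(axis=0) if (labels == j).any()
            else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

Cluster sizes then give a density estimate over the workspace: sparsely populated clusters mark regions the demonstrations rarely visit.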
Made-with: Cursor