lerobot

mirror of https://github.com/huggingface/lerobot.git synced 2026-07-06 17:41:47 +00:00

Pepijn fbcb9225f5 feat: oversample sparse VQA annotations (recipe consumption + weighted sampler)

VQA annotations are sparse, so VQA was badly underrepresented in training:
its effective share was blend weight x annotation density, and blend draws
that picked an ask_vqa* sub-recipe for a non-VQA frame were wasted entirely.
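
To make the "weight x density" arithmetic concrete, here is a minimal sketch. It assumes frames are drawn uniformly and the blend draw is independent of the frame; the function names and the numbers are hypothetical and are not taken from the lerobot code.

```python
def effective_vqa_share(blend_weight: float, vqa_density: float) -> float:
    """Share of draws that pick an ask_vqa* sub-recipe AND land on a VQA frame."""
    return blend_weight * vqa_density


def wasted_draw_share(blend_weight: float, vqa_density: float) -> float:
    """Share of draws that pick an ask_vqa* sub-recipe but land on a non-VQA frame."""
    return blend_weight * (1.0 - vqa_density)


# Hypothetical numbers, for illustration only.
blend_weight = 0.25  # assumed probability of drawing an ask_vqa* sub-recipe
vqa_density = 0.04   # assumed fraction of frames carrying a VQA annotation

share = effective_vqa_share(blend_weight, vqa_density)
wasted = wasted_draw_share(blend_weight, vqa_density)
print(f"effective VQA share: {share:.3f}")
print(f"wasted draw share:   {wasted:.3f}")
```

Under these assumed numbers, most draws that select a VQA sub-recipe cannot be rendered, which is the waste described above.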

Two pieces:

1. Recipe-side consumption (language_render.py): render_sample now routes
   any frame that carries a VQA annotation to a matching ask_vqa* sub-recipe,
   regardless of the weighted blend draw. No VQA annotation is wasted and no
   draw lands on a non-renderable VQA recipe, so VQA's recipe-side share now
   equals the VQA-annotation density.

2. Dataset-side oversampling (WeightedEpisodeAwareSampler + vqa_target_fraction):
   a new weighted, episode-aware sampler draws frames with replacement by
   per-frame weight. When TrainPipelineConfig.vqa_target_fraction is set, the
   train script scans language_events, weights VQA frames so they make up
   ~that fraction of the training stream, and uses the weighted sampler. This
   is what actually lets VQA exceed its natural density. The default (None)
   keeps uniform episode-aware sampling unchanged.
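
The weighting behind vqa_target_fraction can be sketched as follows. This is not the actual WeightedEpisodeAwareSampler: it is a standard-library illustration that assumes non-VQA frames keep weight 1 and every VQA frame gets one shared weight w chosen so that w * n_vqa / (w * n_vqa + n_other) equals the target fraction. The helper names (vqa_frame_weights, sample_indices) are hypothetical, and episode-aware handling is left out.

```python
import random


def vqa_frame_weights(is_vqa: list[bool], target_fraction: float) -> list[float]:
    """Per-frame weights so VQA frames make up ~target_fraction of the draws."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    n_vqa = sum(is_vqa)
    n_other = len(is_vqa) - n_vqa
    if n_vqa == 0 or n_other == 0:
        # Nothing to rebalance: fall back to uniform weights.
        return [1.0] * len(is_vqa)
    # Solve w * n_vqa / (w * n_vqa + n_other) = target_fraction for w.
    w = target_fraction * n_other / ((1.0 - target_fraction) * n_vqa)
    return [w if flag else 1.0 for flag in is_vqa]


def sample_indices(weights: list[float], num_samples: int, seed: int = 0) -> list[int]:
    """Draw frame indices with replacement, proportionally to their weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)


# Hypothetical dataset: 5 VQA frames out of 100, target 30% VQA in the stream.
is_vqa = [True] * 5 + [False] * 95
weights = vqa_frame_weights(is_vqa, target_fraction=0.3)
indices = sample_indices(weights, num_samples=20_000)
observed = sum(is_vqa[i] for i in indices) / len(indices)
print(f"VQA weight: {weights[0]:.3f}, observed VQA fraction: {observed:.3f}")
```

Because frames are drawn with replacement, the same VQA frame can appear many times in the stream, which is how VQA can exceed its natural density.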

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 15:30:00 +02:00

test_aggregate.py

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

test_compute_stats.py

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

test_dataset_metadata.py

feat(dependencies): minimal default tag install (#3362)

2026-04-12 20:03:04 +02:00

test_dataset_reader.py

feat(dependencies): minimal default tag install (#3362)