Steerable annotation pipeline (`lerobot-annotate`) that populates the `language_persistent` and `language_events` columns introduced in PR 1 (#3467), writing them directly into `data/chunk-*/file-*.parquet`.
This is PR 2 of the three-PR plan:
- PR 1 (Add extensive language support, #3467): schema + DSL + rendering; the base of this PR
- PR 2 (this PR): annotation pipeline writing into PR 1's columns
- PR 3: model with language prediction and runtime
A VLM (Qwen-VL family, served on vLLM) watches each episode's video and emits grounded language annotations: subtasks, plans, memory, task rephrasings, interjections + speech, and per-camera VQA. The pipeline is built for production annotation at scale: single-camera grounding, embedded-frame inputs, a describe-then-segment grounding flow, and a deterministic full-episode coverage guarantee. These choices are informed by Scale's dense-captioning findings (representation > sampling, rules > reasoning, model capacity is the biggest lever, and two-pass systems compound errors).
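The description does not spell out how the full-episode coverage guarantee is enforced. As a minimal sketch, assuming subtask annotations are time spans in seconds, one deterministic way to guarantee coverage is to sort the spans, clamp them to the episode bounds, trim overlaps, and close gaps by stretching the preceding span. `Segment` and `ensure_full_coverage` are hypothetical names used for illustration only; they are not part of `lerobot-annotate`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """Hypothetical subtask span in seconds (illustrative, not a lerobot type)."""
    start: float
    end: float
    text: str


def ensure_full_coverage(
    segments: list[Segment],
    episode_start: float,
    episode_end: float,
    fallback_text: str = "",
) -> list[Segment]:
    """Return spans that tile [episode_start, episode_end] with no gaps or overlaps."""
    if episode_end <= episode_start:
        raise ValueError("episode_end must be greater than episode_start")

    # Sort by (start, end, text) so ties always resolve the same way (deterministic).
    ordered = sorted(segments, key=lambda s: (s.start, s.end, s.text))

    # Clamp each span to the episode bounds; drop spans that end up empty.
    clamped = []
    for seg in ordered:
        start = max(seg.start, episode_start)
        end = min(seg.end, episode_end)
        if end > start:
            clamped.append(replace(seg, start=start, end=end))

    if not clamped:
        # No usable model output: fall back to one span covering the whole episode.
        return [Segment(episode_start, episode_end, fallback_text)]

    result: list[Segment] = []
    cursor = episode_start  # end of the covered prefix so far
    for seg in clamped:
        if seg.end <= cursor:
            # Already covered by earlier spans: drop it.
            continue
        start = max(seg.start, cursor)  # trim any overlap with the previous span
        if result and start > cursor:
            # Gap between spans: stretch the previous span to meet this one.
            result[-1] = replace(result[-1], end=start)
        elif not result:
            # Leading gap: the first span always starts at the episode start.
            start = episode_start
        result.append(replace(seg, start=start))
        cursor = seg.end

    # Trailing gap: stretch the last span to the episode end.
    result[-1] = replace(result[-1], end=episode_end)
    return result
```

Closing gaps by stretching a neighbouring span, rather than inserting unlabeled filler, keeps every timestamp attached to a real annotation; whether the actual pipeline resolves gaps this way is an assumption.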
* feat(policies): Initial setup to push policies to hub with tags and model card
* feat: add dataset that is used to train
* Add model template summary
* fix: update link in model_card template
* fix: remove print
* fix: change import name
* fix: add model summary in template
* fix: minor text
* fix: address Lucain's comments
* fix: address Steven's feedback
* fix: restructure push to hub
* fix: remove unneeded changes
* fix: import
* fix: import (follow-up)
* Add MANIFEST.in
* fix: address PR feedback
* Fix tests
* tests: Add smolvla end-to-end test
* Fix: smolvla test
* fix test name
* fix policy tests
* Add policy tests with push_to_hub set to false
* Make push to hub cleaner
* fix(ci): add push_to_hub false in tests
---------
Co-authored-by: Steven Palma <steven.palma@huggingface.co>
- Changes to the `test.yml` workflow:
  - Use poetry instead of pip. Contrary to what I wrote in #75, it is possible to use poetry (and benefit from its shorter install times) without maintaining two separate versions of `pyproject.toml` and `poetry.lock`.
  - Reduce the trigger scope so the workflow only runs when files in these directories are modified:
    - `lerobot/`
    - `tests/`
    - `examples/`
    - `.github/`
- Add a `style.yml` workflow that runs a `ruff check` pass on the code
- More cleanup (removed a deprecated workflow)