mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-18 18:20:08 +00:00
review: fix dead-code bug, add thread safety, atomic writes, smaller cleanups
**Critical: video_for_episode was unreachable dead code.**
``video_for_episode`` was indented inside ``_decode_pyav_direct``, after
its ``return`` statement — Python parsed it as a nested function that
never executed. Module 1's ``_episode_video_block`` calls
``self.frame_provider.video_for_episode(record, target_count)`` on the
``use_video_url=False`` path, which would have AttributeError'd on any
real dataset. Tests passed only because they used ``_StubFrameProvider``
/ ``_NullProvider`` which have the method. Moved it to be a proper
method of ``VideoFrameProvider`` (right after ``frames_at``).
**Thread safety on VideoFrameProvider.**
The executor runs Module 1/2/3 phases under a ``ThreadPoolExecutor``, so
the per-instance ``_cache`` dict and the one-shot ``_warned_decode_fail``
flag were exposed to concurrent reads/writes. Added a ``threading.Lock``
field, wrapped cache reads/writes and the warn-flag check-and-set in
``with self._lock:``. Stub fixtures unaffected.
**episode_clip_path is now a method of VideoFrameProvider.**
Used to be a free function reaching into ``provider._meta.episodes`` and
``provider._meta.get_video_file_path`` from outside the class. As a
method it just uses ``self._meta``. The only caller (Module 1) updated;
no external callers.
**Atomic write in LanguageColumnsWriter.**
``pq.write_table(new_table, path)`` was overwriting the parquet shard
in place — a crash mid-write would corrupt the file. Now writes to a
sibling ``.tmp`` and ``Path.replace`` atomically.
**Smaller items:**
* ``executor.py`` docstring opened with "four phases" but listed six.
Now says "six phases" to match.
* ``[annotations]`` extra in ``pyproject.toml`` now includes
``openai>=1.40,<2.0``. Default ``VlmConfig.backend`` is ``"openai"``,
so without it ``_make_openai_client`` would ImportError on a fresh
``uv sync --extra annotations``.
* ``_snap_to_frame`` was duplicated identically in
``plan_subtasks_memory.py`` and ``interjections_and_speech.py``.
Promoted to ``snap_to_frame`` in ``reader.py`` (next to
``EpisodeRecord``); both modules now import it. Backwards-compat alias
not needed — no external callers.
* ``EpisodeRecord.frames_df()`` was re-reading the full parquet on every
call. Now memoizes via a private dataclass field so repeat calls from
different modules pay the cost once. Method signature unchanged.
* ``_extract_first_json_object`` had a redundant ``and not escape`` guard
that was dead because the prior block already handled and reset
``escape``. Replaced with a comment explaining the invariant.
**Pre-existing lint cleanups surfaced once these files entered
pre-commit's scope:**
* dead local ``client = clients[0]`` in ``_make_openai_client`` (the
real round-robin uses ``clients[rr_counter[...]]``).
* ``cmd = ... if "{port}" in cmd else f"...{port}"`` ternary collapse in
``_spawn_parallel_inference_servers``.
* ``seek_pts = 0 if stream.time_base is None else int(...)`` ternary
collapse in ``_decode_pyav_direct``.
* ``# nosec B310`` on the localhost ``urllib.request.urlopen`` probe in
``_server_is_up`` — the URL is the user-configured local-server endpoint
the CLI itself spawned, not arbitrary user input.
**Test added.**
``tests/annotations/test_frames.py`` pins the regression on
``VideoFrameProvider``: asserts ``video_for_episode`` and
``episode_clip_path`` are callable methods (not nested dead code or
free functions), and that the ``_lock`` field is a real
``threading.Lock``.
Sweep: 64 passed, 2 failed (same pre-existing module-impl bugs as
before this commit). Pre-commit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,67 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
"""Unit tests for :class:`VideoFrameProvider` method bindings.
|
||||
|
||||
These were prompted by a real regression: ``video_for_episode`` was
|
||||
indented one level too deep so it ended up nested *inside* the
|
||||
``_decode_pyav_direct`` helper (after that function's ``return``
|
||||
statement) — silently dead code that meant production runs with
|
||||
``use_video_url=False`` would ``AttributeError`` on
|
||||
``self.frame_provider.video_for_episode(...)``. The existing module
|
||||
tests didn't catch it because they exercise stub providers.
|
||||
|
||||
The tests below assert on the class itself (not on an instance), so a
|
||||
future reindent regression flips them to red without needing a real
|
||||
LeRobot dataset on disk.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import pytest
|
||||
|
||||
pytest.importorskip("datasets", reason="datasets is required (install lerobot[dataset])")
|
||||
|
||||
from lerobot.annotations.steerable_pipeline.frames import ( # noqa: E402
|
||||
VideoFrameProvider,
|
||||
)
|
||||
|
||||
|
||||
def test_video_for_episode_is_a_method_of_videoframeprovider():
|
||||
"""``video_for_episode`` must be a bound method, not nested dead code."""
|
||||
assert callable(getattr(VideoFrameProvider, "video_for_episode", None))
|
||||
|
||||
|
||||
def test_episode_clip_path_is_a_method_of_videoframeprovider():
|
||||
"""``episode_clip_path`` is now a method (was a free function reaching
|
||||
into ``provider._meta`` from outside the class)."""
|
||||
assert callable(getattr(VideoFrameProvider, "episode_clip_path", None))
|
||||
|
||||
|
||||
def test_videoframeprovider_has_a_lock_for_concurrent_use():
|
||||
"""A ``ThreadPoolExecutor`` runs Module 1/2/3 phases concurrently;
|
||||
the cache + warn-flag accesses must be guarded.
|
||||
"""
|
||||
import threading
|
||||
|
||||
# Fresh-instance check via a minimal fake to avoid touching the hub.
|
||||
# The lock is declared with ``init=False`` and has a default factory,
|
||||
# so a constructed instance must own a real ``threading.Lock``.
|
||||
lock_field = next(
|
||||
(f for f in VideoFrameProvider.__dataclass_fields__.values() if f.name == "_lock"),
|
||||
None,
|
||||
)
|
||||
assert lock_field is not None
|
||||
assert lock_field.default_factory is threading.Lock
|
||||
Reference in New Issue
Block a user