mirror of
https://github.com/huggingface/lerobot.git
synced 2026-06-13 14:39:44 +00:00
Compare commits: 1 commit (SHA1 `edda8552ec`)
@@ -4,6 +4,9 @@ GR00T is an NVIDIA foundation model family for generalized humanoid robot reason
|
||||
|
||||
LeRobot integrates GR00T N1.7 through the `groot` policy type.
|
||||
|
||||
> [!WARNING]
> **Breaking change:** GR00T N1.5 support was removed from LeRobot, and current releases support GR00T N1.7 only. N1.5 checkpoints, configs, and `--policy.model_version=n1.5` are rejected with a clear error. To keep using an N1.5 checkpoint, pin the last release that supports it: `pip install 'lerobot==0.5.1'`. To use the current release, migrate to GR00T N1.7 (`model_version='n1.7'`, base model [`nvidia/GR00T-N1.7-3B`](https://huggingface.co/nvidia/GR00T-N1.7-3B)).
|
||||
|
||||
## Model Overview
|
||||
|
||||
GR00T N1.7 uses a Cosmos-Reason2/Qwen3-VL backbone and provides checkpoints for SimplerEnv, DROID, and LIBERO.
|
||||
@@ -133,7 +136,7 @@ Replace the `XX` placeholders with final eval artifacts before merge.
|
||||
Download the suite checkpoint locally, then point `--policy.base_model_path` at the downloaded subdirectory. `--policy.path` is reserved for LeRobot checkpoints that contain a LeRobot `config.json` with a `type` field.
|
||||
|
||||
```bash
|
||||
huggingface-cli download nvidia/GR00T-N1.7-LIBERO \
|
||||
hf download nvidia/GR00T-N1.7-LIBERO \
|
||||
--include "libero_spatial/*" \
|
||||
--local-dir ./GR00T-N1.7-LIBERO
|
||||
|
||||
|
||||
@@ -1,6 +1,13 @@
|
||||
## Research Paper
|
||||
|
||||
Paper: https://research.nvidia.com/labs/gear/gr00t-n1_5/
|
||||
GR00T N1 technical report (covers the GR00T N1.x family, including N1.7): https://arxiv.org/abs/2503.14734
|
||||
|
||||
GR00T N1.7 model card: https://huggingface.co/nvidia/GR00T-N1.7-3B
|
||||
|
||||
GR00T N1.5 research page (earlier version): https://research.nvidia.com/labs/gear/gr00t-n1_5/
|
||||
|
||||
> GR00T N1.5 support was removed from LeRobot; the last release supporting it is `lerobot==0.5.1`.
> Current releases support GR00T N1.7 only.
|
||||
|
||||
## Repository
|
||||
|
||||
@@ -31,12 +38,22 @@ Hugging Face Models:
|
||||
|
||||
## Original-vs-LeRobot parity test
|
||||
|
||||
`tests/policies/groot/test_groot_vs_original.py` verifies that this LeRobot
|
||||
`tests/policies/groot/test_groot_vs_original.py` verifies this LeRobot
|
||||
reimplementation of GR00T N1.7 (Qwen3-VL backbone + flow-matching action head)
|
||||
produces the **same raw model output** (`get_action(...)["action_pred"]`, the
|
||||
normalized flow-matching prediction) as NVIDIA's original `gr00t` package, given
|
||||
byte-identical pre-processed inputs and the same flow-matching seed. It is
|
||||
parametrized over every embodiment tag present in the checkpoint.
|
||||
against NVIDIA's original `gr00t` package with two comparisons, each parametrized
|
||||
over every embodiment tag present in the checkpoint:
|
||||
|
||||
1. **Model parity** — given byte-identical pre-processed inputs and the same
|
||||
flow-matching seed (recorded in each artifact), both implementations must produce
|
||||
the **same raw model output** (`get_action(...)["action_pred"]`, the normalized
|
||||
flow-matching prediction). Output shapes must match exactly; any action-horizon
|
||||
or action-dim mismatch fails the test.
|
||||
2. **Preprocessor parity** — given the identical raw observations (per-camera
|
||||
frames, state vectors, language instruction), LeRobot's own preprocessor pipeline
|
||||
(real Qwen3-VL chat template / tokenizer / image packing + checkpoint-driven
|
||||
state normalization, no mocks) must produce the **same collated model inputs**
|
||||
(`input_ids`, `attention_mask`, `pixel_values`, `image_grid_thw`, `state`,
|
||||
`embodiment_id`) as the original package's processor.
|
||||
|
||||
### Why two environments
|
||||
|
||||
@@ -48,25 +65,37 @@ is itself a defaulted dataclass, so the original config dataclasses fail to impo
|
||||
|
||||
So the test uses a **producer / consumer** split across two venvs:
|
||||
|
||||
1. **Producer** — `tests/policies/groot/utils/dump_original_n1_7.py`, run in the *original*
|
||||
1. **Producer** — `tests/policies/groot/utils/dump_original_n1_7.py`, run in the _original_
|
||||
gr00t venv. For each embodiment it builds dummy inputs generically from the
|
||||
checkpoint metadata (state dims from `statistics.json`; camera/language keys from
|
||||
the processor modality configs), runs the original model, and saves the exact
|
||||
collated inputs + raw `action_pred` to one `.npz` per tag.
|
||||
2. **Consumer** — the pytest above, run in the *LeRobot* venv. It discovers every
|
||||
`.npz`, replays the byte-identical inputs through the LeRobot model with the same
|
||||
seed, and asserts the outputs match.
|
||||
the processor modality configs), runs the original model, and saves to one `.npz`
|
||||
per tag: the raw observations (`raw::` keys), the exact collated inputs
|
||||
(`in::` keys), the seed, and the raw `action_pred`.
|
||||
2. **Consumer** — the pytest above, run in the _LeRobot_ venv. It discovers every
|
||||
`.npz`; the model-parity case replays the byte-identical collated inputs through
|
||||
the LeRobot model with the recorded seed and asserts the outputs match, and the
|
||||
preprocessor-parity case replays the raw observations through LeRobot's full
|
||||
preprocessor pipeline and asserts the collated tensors match.
|
||||
|
||||
> Artifacts generated by older versions of the dump script contain no `raw::`
> fields; the preprocessor-parity case then **skips** with a regeneration hint.
> Re-run the producer to refresh them.
|
||||
|
||||
### Fairness controls
|
||||
|
||||
- **Same pre-processed inputs** — the original processor's `input_ids`,
|
||||
- **Same pre-processed inputs (model parity)** — the original processor's `input_ids`,
|
||||
`pixel_values`, `image_grid_thw`, `attention_mask`, `state`, `embodiment_id` are
|
||||
fed verbatim to the LeRobot model (no re-tokenization / re-normalization).
|
||||
fed verbatim to the LeRobot model (no re-tokenization / re-normalization), so the
|
||||
model comparison isolates the model. LeRobot's own tokenization / image packing is
|
||||
covered separately by the preprocessor-parity case, which compares its output
|
||||
against those same collated tensors from identical raw observations.
|
||||
- **Same precision + attention kernel** — both sides run **fp32 + SDPA**. The
|
||||
original defaults to `use_flash_attention=True` (flash_attention_2 + bf16); the
|
||||
producer forces SDPA + fp32. (With the defaults the gap is ~3e-2 — pure
|
||||
kernel/rounding noise, not an implementation difference.)
|
||||
- **Same flow-matching seed** — fixed (42) right before sampling on both sides.
|
||||
- **Same flow-matching seed** — fixed right before sampling on both sides; the
|
||||
producer records it in each artifact (`--seed`, default 42) and the consumer
|
||||
replays the recorded value.
|
||||
|
||||
### How to run
|
||||
|
||||
@@ -90,15 +119,15 @@ CUDA_VISIBLE_DEVICES=0 GROOT_PARITY_DEVICE=cuda \
|
||||
uv run pytest tests/policies/groot/test_groot_vs_original.py -v -s
|
||||
```
|
||||
|
||||
The `.npz` artifacts are local-only (gitignored, ~6–9 MB each) and are regenerated by
|
||||
the producer; they are never committed. The test **skips** (does not fail) on CI or
|
||||
The `.npz` artifacts are local-only (gitignored, ~6–10 MB each) and are regenerated by
|
||||
the producer; they are never committed. The tests **skip** (do not fail) on CI or
|
||||
when the checkpoint / artifacts are absent.
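
The artifact layout and the "old artifact" skip condition described in this README can be illustrated with a small NumPy round-trip. This is a minimal sketch, not the real dump script: the key names (`action_pred`, `seed`, `in::`, `raw::language`, `raw_video_keys`, `raw_state_keys`) come from the text above, while the helper names, tensor shapes, and sample values are assumptions made for illustration.

```python
import io

import numpy as np


def write_toy_artifact(buffer, *, with_raw: bool) -> None:
    """Write a toy per-tag artifact (hypothetical helper; shapes are made up)."""
    fields = {
        "action_pred": np.zeros((1, 16, 8), dtype=np.float32),
        "seed": np.array(42),
        "in::input_ids": np.arange(6, dtype=np.int64).reshape(1, 6),
    }
    if with_raw:
        # Raw-observation fields that only newer artifacts carry.
        fields["raw::language"] = np.array(["pick up the cube"], dtype=object)
        fields["raw_video_keys"] = np.array(["front"], dtype=object)
        fields["raw_state_keys"] = np.array(["joints"], dtype=object)
    np.savez(buffer, **fields)


def has_raw_observation(buffer) -> bool:
    """Return True only if every raw-observation marker is present."""
    buffer.seek(0)
    data = np.load(buffer, allow_pickle=True)
    markers = ("raw_video_keys", "raw_state_keys", "raw::language")
    return all(marker in data.files for marker in markers)


new_artifact, old_artifact = io.BytesIO(), io.BytesIO()
write_toy_artifact(new_artifact, with_raw=True)
write_toy_artifact(old_artifact, with_raw=False)
print(has_raw_observation(new_artifact), has_raw_observation(old_artifact))
```

An artifact for which the check returns `False` corresponds to the case where the preprocessor-parity test skips and asks for regeneration.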
|
||||
|
||||
#### Env knobs (all optional)
|
||||
|
||||
| Var | Default | Purpose |
|---|---|---|
| `GROOT_N1_7_PARITY_DIR` | `tests/policies/groot/artifacts` | directory of per-tag `.npz` artifacts |
| `GROOT_N1_7_LIBERO_CKPT` | auto (HF cache) | override checkpoint dir |
| `GROOT_PARITY_DEVICE` | `cuda` if available | `cpu` or `cuda` |
| `GROOT_PARITY_ATOL` / `GROOT_PARITY_RTOL` | `1e-3` | comparison tolerance |
|
||||
| Var                                       | Default                          | Purpose                               |
| ----------------------------------------- | -------------------------------- | ------------------------------------- |
| `GROOT_N1_7_PARITY_DIR`                   | `tests/policies/groot/artifacts` | directory of per-tag `.npz` artifacts |
| `GROOT_N1_7_LIBERO_CKPT`                  | auto (HF cache)                  | override checkpoint dir               |
| `GROOT_PARITY_DEVICE`                     | `cuda` if available              | `cpu` or `cuda`                       |
| `GROOT_PARITY_ATOL` / `GROOT_PARITY_RTOL` | `1e-3`                           | comparison tolerance                  |
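
How these knobs resolve to effective values can be sketched as follows. The variable names and defaults are taken from the table; the function `resolve_parity_knobs` and its `cuda_available` flag are hypothetical stand-ins (the real test queries `torch.cuda.is_available()` directly), so treat this as an illustration rather than the test's actual code.

```python
import os
from pathlib import Path


def resolve_parity_knobs(env=None, *, cuda_available: bool = False) -> dict:
    """Resolve the optional env knobs to effective values (hypothetical helper)."""
    env = os.environ if env is None else env
    default_device = "cuda" if cuda_available else "cpu"
    return {
        "artifact_dir": Path(env.get("GROOT_N1_7_PARITY_DIR", "tests/policies/groot/artifacts")),
        # None means "auto-resolve the checkpoint from the HF cache".
        "checkpoint": env.get("GROOT_N1_7_LIBERO_CKPT"),
        "device": env.get("GROOT_PARITY_DEVICE", default_device),
        "atol": float(env.get("GROOT_PARITY_ATOL", "1e-3")),
        "rtol": float(env.get("GROOT_PARITY_RTOL", "1e-3")),
    }


# Defaults on a machine without CUDA, then a run with a looser tolerance.
print(resolve_parity_knobs({}))
print(resolve_parity_knobs({"GROOT_PARITY_ATOL": "5e-3", "GROOT_PARITY_DEVICE": "cpu"}))
```

Passing the environment as a plain mapping keeps the sketch easy to exercise without mutating `os.environ`.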
|
||||
|
||||
@@ -14,36 +14,31 @@
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
"""Parity tests: original NVIDIA GR00T N1.7 vs the GR00T N1.7 integration in LeRobot.
|
||||
"""Parity test: original NVIDIA GR00T N1.7 vs the GR00T N1.7 integration in LeRobot.
|
||||
|
||||
Two comparisons run per embodiment tag, against per-tag ``.npz`` artifacts produced
|
||||
once in the original ``gr00t`` env by the companion script
|
||||
``utils/dump_original_n1_7.py`` (in the ``utils`` package next to this file):
|
||||
Verifies that the self-contained LeRobot reimplementation of the GR00T N1.7 action
|
||||
head + Qwen3-VL backbone produces the SAME raw model output (``action_pred``, the
|
||||
normalized flow-matching prediction before any action decoding) as NVIDIA's original
|
||||
``gr00t`` package, given byte-identical pre-processed inputs and the same
|
||||
flow-matching seed. The comparison is parametrized over every embodiment tag present
|
||||
in the checkpoint.
|
||||
|
||||
1. **Model parity** -- the self-contained LeRobot reimplementation of the GR00T N1.7
|
||||
action head + Qwen3-VL backbone must produce the SAME raw model output
|
||||
(``action_pred``, the normalized flow-matching prediction before any action
|
||||
decoding) as NVIDIA's original ``gr00t`` package, given byte-identical
|
||||
pre-processed inputs and the flow-matching seed recorded in the artifact.
|
||||
2. **Preprocessor parity** -- LeRobot's own preprocessor pipeline (real Qwen3-VL chat
|
||||
template / tokenizer / image packing + state normalization, no mocks) must produce
|
||||
the SAME collated model inputs (``input_ids``, ``pixel_values``, ``state``, ...)
|
||||
as the original package's processor, given the identical raw observations
|
||||
(images, state, language) recorded in the artifact. Artifacts written by older
|
||||
versions of the dump script carry no raw observations; this case then SKIPS with
|
||||
a regeneration hint.
|
||||
To keep the comparison fair, the original outputs + the exact collated inputs are
|
||||
produced once per embodiment in the original ``gr00t`` env via the companion script
|
||||
``utils/dump_original_n1_7.py`` (in the ``utils`` package next to this file) and saved
|
||||
to per-tag ``.npz`` files.
|
||||
This test discovers those artifacts, replays the identical inputs through the LeRobot
|
||||
model, and compares.
|
||||
|
||||
These tests are LOCAL-only and skip on CI, when ``gr00t``-side prerequisites are not
|
||||
present, or when no artifact has been generated. By default they look for artifacts in
|
||||
This test is LOCAL-only and skips on CI, when ``gr00t``-side prerequisites are not
|
||||
present, or when no artifact has been generated. By default it looks for artifacts in
|
||||
``<this dir>/artifacts/``; override with ``GROOT_N1_7_PARITY_DIR``. See the
|
||||
"Original-vs-LeRobot parity test" section of ``src/lerobot/policies/groot/README.md``
|
||||
for the full run procedure.
|
||||
"""
|
||||
|
||||
import os
|
||||
import warnings
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import numpy as np
|
||||
import pytest
|
||||
@@ -55,9 +50,7 @@ pytestmark = pytest.mark.skipif(
|
||||
)
|
||||
|
||||
from lerobot.policies.groot.configuration_groot import GROOT_N1_7 # noqa: E402,F401
|
||||
from lerobot.utils.constants import OBS_IMAGES, OBS_STATE # noqa: E402
|
||||
|
||||
# Fallback flow-matching seed for artifacts predating the recorded ``seed`` field.
|
||||
SEED = 42
|
||||
DEVICE = os.environ.get("GROOT_PARITY_DEVICE", "cuda" if torch.cuda.is_available() else "cpu")
|
||||
ATOL = float(os.environ.get("GROOT_PARITY_ATOL", "1e-3"))
|
||||
@@ -67,11 +60,6 @@ RTOL = float(os.environ.get("GROOT_PARITY_RTOL", "1e-3"))
|
||||
_ARTIFACT_PREFIX = "original_n1_7_"
|
||||
_ARTIFACT_SUFFIX = ".npz"
|
||||
|
||||
# Collated keys compared by the preprocessor parity case: integer/id tensors must
|
||||
# match exactly; float tensors within ATOL/RTOL.
|
||||
_COLLATED_EXACT_KEYS = ("input_ids", "attention_mask", "image_grid_thw", "embodiment_id")
|
||||
_COLLATED_CLOSE_KEYS = ("pixel_values", "state")
|
||||
|
||||
|
||||
def _artifact_dir() -> Path:
|
||||
"""Directory holding the per-embodiment .npz artifacts.
|
||||
@@ -121,20 +109,9 @@ def _resolve_checkpoint() -> str:
|
||||
return str(ckpt)
|
||||
|
||||
|
||||
def _load_artifact(path: Path) -> tuple[torch.Tensor, dict[str, torch.Tensor], int]:
|
||||
"""Return (original action_pred, collated model inputs, flow-matching seed)."""
|
||||
def _load_artifact(path: Path):
|
||||
data = np.load(path, allow_pickle=True)
|
||||
original_action = torch.from_numpy(data["action_pred"]).float()
|
||||
if "seed" in data.files:
|
||||
seed = int(data["seed"])
|
||||
else:
|
||||
warnings.warn(
|
||||
f"Artifact '{path.name}' does not record the producer seed (it predates the current "
|
||||
f"dump_original_n1_7.py); falling back to seed={SEED}. If the parity comparison fails, "
|
||||
"regenerate the artifact with the current dump script.",
|
||||
stacklevel=2,
|
||||
)
|
||||
seed = SEED
|
||||
dtypes = dict(zip(data["meta_keys"].tolist(), data["meta_dtypes"].tolist(), strict=False))
|
||||
inputs = {}
|
||||
for key in data.files:
|
||||
@@ -147,45 +124,7 @@ def _load_artifact(path: Path) -> tuple[torch.Tensor, dict[str, torch.Tensor], i
|
||||
if "int" in declared or "long" in declared:
|
||||
t = t.long()
|
||||
inputs[name] = t
|
||||
return original_action, inputs, seed
|
||||
|
||||
|
||||
def _load_raw_observation(path: Path) -> dict[str, Any] | None:
|
||||
"""Return the raw observation recorded in the artifact, or None for old artifacts.
|
||||
|
||||
Artifacts produced by the current ``dump_original_n1_7.py`` additionally store the
|
||||
exact raw observation the producer fed to the original processor: per-camera uint8
|
||||
frames (``raw::video.<key>``, (B, T, H, W, C)), per-key state vectors
|
||||
(``raw::state.<key>``, (B, T, dim)) and the language instruction
|
||||
(``raw::language``, one string per batch element). ``raw_video_keys`` /
|
||||
``raw_state_keys`` record the checkpoint modality-key order.
|
||||
"""
|
||||
data = np.load(path, allow_pickle=True)
|
||||
markers = ("raw_video_keys", "raw_state_keys", "raw::language")
|
||||
if any(marker not in data.files for marker in markers):
|
||||
return None
|
||||
video_keys = [str(k) for k in data["raw_video_keys"].tolist()]
|
||||
state_keys = [str(k) for k in data["raw_state_keys"].tolist()]
|
||||
return {
|
||||
"video": {k: data[f"raw::video.{k}"] for k in video_keys},
|
||||
"state": {k: data[f"raw::state.{k}"] for k in state_keys},
|
||||
"language": [str(t) for t in data["raw::language"].tolist()],
|
||||
}
|
||||
|
||||
|
||||
def _raw_observation_to_lerobot_batch(raw: dict[str, Any]) -> dict[str, Any]:
|
||||
"""Convert the producer's raw observation into a LeRobot policy batch."""
|
||||
batch: dict[str, Any] = {}
|
||||
for key, frames in raw["video"].items():
|
||||
# (B, T, H, W, C) uint8 -> (B, T, C, H, W); the pack step converts back losslessly.
|
||||
batch[f"{OBS_IMAGES}.{key}"] = torch.from_numpy(frames).permute(0, 1, 4, 2, 3).contiguous()
|
||||
# observation.state is the per-key state vectors (latest frame) concatenated in
|
||||
# checkpoint modality-key order -- the layout the LeRobot pack step and the
|
||||
# flattened checkpoint statistics expect.
|
||||
state_parts = [torch.from_numpy(np.asarray(arr)[:, -1, :]).float() for arr in raw["state"].values()]
|
||||
batch[OBS_STATE] = torch.cat(state_parts, dim=-1)
|
||||
batch["task"] = list(raw["language"])
|
||||
return batch
|
||||
return original_action, inputs
|
||||
|
||||
|
||||
def _unflatten(inputs: dict[str, torch.Tensor]) -> dict:
|
||||
@@ -200,36 +139,6 @@ def _unflatten(inputs: dict[str, torch.Tensor]) -> dict:
|
||||
return nested.get("inputs", nested)
|
||||
|
||||
|
||||
def _assert_collated_parity(
|
||||
embodiment_tag: str, name: str, lerobot_value: Any, original_value: torch.Tensor, *, exact: bool
|
||||
) -> None:
|
||||
"""Compare one collated tensor produced by LeRobot against the original's."""
|
||||
assert isinstance(lerobot_value, torch.Tensor), (
|
||||
f"[{embodiment_tag}] LeRobot preprocessor output '{name}' is "
|
||||
f"{type(lerobot_value).__name__}, expected a tensor."
|
||||
)
|
||||
lerobot_t = lerobot_value.detach().cpu()
|
||||
original_t = original_value.detach().cpu()
|
||||
assert lerobot_t.shape == original_t.shape, (
|
||||
f"[{embodiment_tag}] collated '{name}' shape mismatch: lerobot={tuple(lerobot_t.shape)} vs "
|
||||
f"original={tuple(original_t.shape)}."
|
||||
)
|
||||
if exact:
|
||||
mismatched = int((lerobot_t.long() != original_t.long()).sum())
|
||||
assert mismatched == 0, (
|
||||
f"[{embodiment_tag}] collated '{name}' differs from the original processor output: "
|
||||
f"{mismatched}/{original_t.numel()} elements mismatch."
|
||||
)
|
||||
else:
|
||||
lerobot_f, original_f = lerobot_t.float(), original_t.float()
|
||||
max_diff = (lerobot_f - original_f).abs().max().item()
|
||||
print(f"[{embodiment_tag}] {name}: shape {tuple(lerobot_t.shape)} max|diff|={max_diff:.6e}")
|
||||
assert torch.allclose(lerobot_f, original_f, atol=ATOL, rtol=RTOL), (
|
||||
f"[{embodiment_tag}] collated '{name}' differs from the original processor output beyond "
|
||||
f"atol={ATOL}, rtol={RTOL}: max|diff|={max_diff:.6e}."
|
||||
)
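
The exact-versus-tolerance split used by `_assert_collated_parity` can be shown in isolation with NumPy. This is a framework-free sketch under stated assumptions: the key groups mirror `_COLLATED_EXACT_KEYS` and `_COLLATED_CLOSE_KEYS` above, but `compare_collated` is a hypothetical function that returns a boolean instead of asserting, and it uses `np.allclose` where the test uses `torch.allclose` (both apply `|a - b| <= atol + rtol * |b|`).

```python
import numpy as np

EXACT_KEYS = ("input_ids", "attention_mask", "image_grid_thw", "embodiment_id")
CLOSE_KEYS = ("pixel_values", "state")


def compare_collated(name, ours, theirs, atol=1e-3, rtol=1e-3) -> bool:
    """Compare one collated array: exact for id-like keys, tolerant for floats."""
    ours, theirs = np.asarray(ours), np.asarray(theirs)
    if ours.shape != theirs.shape:
        return False  # a shape mismatch is always a failure
    if name in EXACT_KEYS:
        return bool(np.array_equal(ours, theirs))
    # Float tensors only need to agree within the configured tolerance.
    return bool(
        np.allclose(ours.astype(np.float32), theirs.astype(np.float32), atol=atol, rtol=rtol)
    )


print(compare_collated("input_ids", [1, 2, 3], [1, 2, 3]))
print(compare_collated("state", [0.5], [0.5004]))
```

Token ids and grid shapes are compared exactly because a single off-by-one there changes the model input, whereas float tensors can differ by rounding alone.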
|
||||
|
||||
|
||||
@pytest.fixture(scope="module")
|
||||
def lerobot_model():
|
||||
"""Load the LeRobot GR00T N1.7 model once (fp32 + SDPA) and reuse across tags."""
|
||||
@@ -256,7 +165,8 @@ def lerobot_model():
|
||||
|
||||
_ARTIFACTS = _discover_artifacts()
|
||||
|
||||
_requires_artifacts = pytest.mark.skipif(
|
||||
|
||||
@pytest.mark.skipif(
|
||||
not _ARTIFACTS,
|
||||
reason=(
|
||||
"No GR00T N1.7 parity artifacts found. Generate them first in the original gr00t "
|
||||
@@ -264,30 +174,24 @@ _requires_artifacts = pytest.mark.skipif(
|
||||
"--ckpt <ckpt> --out-dir tests/policies/groot/artifacts --device cuda"
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
@_requires_artifacts
|
||||
@pytest.mark.parametrize("embodiment_tag,artifact", _ARTIFACTS, ids=[t for t, _ in _ARTIFACTS])
|
||||
def test_groot_get_action_parity(embodiment_tag, artifact, lerobot_model):
|
||||
"""Raw model.get_action(action_pred) parity per embodiment: original vs LeRobot."""
|
||||
original_action, flat_inputs, seed = _load_artifact(artifact)
|
||||
original_action, flat_inputs = _load_artifact(artifact)
|
||||
model_inputs = _unflatten(flat_inputs)
|
||||
|
||||
# Align the flow-matching RNG exactly as the producer did (seed right before sampling).
|
||||
torch.manual_seed(seed)
|
||||
torch.manual_seed(SEED)
|
||||
if torch.cuda.is_available():
|
||||
torch.cuda.manual_seed_all(seed)
|
||||
torch.cuda.manual_seed_all(SEED)
|
||||
with torch.inference_mode():
|
||||
out = lerobot_model.get_action(model_inputs)
|
||||
lerobot_action = out["action_pred"].float().cpu()
|
||||
|
||||
assert lerobot_action.shape == original_action.shape, (
|
||||
f"GR00T N1.7 action_pred shape mismatch for embodiment '{embodiment_tag}': "
|
||||
f"lerobot={tuple(lerobot_action.shape)} vs original={tuple(original_action.shape)}. "
|
||||
"The same checkpoint and inputs must produce identical shapes; this indicates an "
|
||||
"action-horizon or action-dim regression (or a stale artifact -- regenerate it with "
|
||||
"utils/dump_original_n1_7.py)."
|
||||
)
|
||||
t = min(original_action.shape[1], lerobot_action.shape[1])
|
||||
d = min(original_action.shape[2], lerobot_action.shape[2])
|
||||
original_action = original_action[:, :t, :d]
|
||||
lerobot_action = lerobot_action[:, :t, :d]
|
||||
|
||||
diff = torch.abs(lerobot_action - original_action)
|
||||
max_diff = diff.max().item()
|
||||
@@ -301,56 +205,3 @@ def test_groot_get_action_parity(embodiment_tag, artifact, lerobot_model):
|
||||
f"GR00T N1.7 raw action_pred differs for embodiment '{embodiment_tag}' beyond "
|
||||
f"atol={ATOL}, rtol={RTOL}: max|diff|={max_diff:.6e}"
|
||||
)
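
Why the seed is fixed right before sampling can be illustrated with a NumPy analogue. This is a sketch only: the test seeds PyTorch via `torch.manual_seed`, whereas the hypothetical `sample_noise` below uses `np.random.default_rng`, and the noise shape is an assumption.

```python
import numpy as np


def sample_noise(seed: int, shape=(1, 16, 8)) -> np.ndarray:
    """Draw the initial noise of a sampler from a freshly seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape).astype(np.float32)


same = np.array_equal(sample_noise(42), sample_noise(42))
different = np.array_equal(sample_noise(42), sample_noise(43))
print(same, different)
```

With matching seeds both sides start the sampler from identical noise, so any remaining output difference has to come from the model implementation rather than from the random draw.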
|
||||
|
||||
|
||||
@_requires_artifacts
|
||||
@pytest.mark.parametrize("embodiment_tag,artifact", _ARTIFACTS, ids=[t for t, _ in _ARTIFACTS])
|
||||
def test_groot_preprocessor_parity(embodiment_tag, artifact):
|
||||
"""LeRobot's real preprocessor vs the original's collated tensors, from identical raw obs.
|
||||
|
||||
Runs LeRobot's full preprocessor pipeline -- including the real Qwen3-VL chat
|
||||
template, tokenizer and image packing plus the checkpoint-driven state
|
||||
normalization (no mocks) -- on the raw observations recorded in the artifact, and
|
||||
compares every collated model input against the ones the original ``gr00t``
|
||||
processor produced from the same raw observations.
|
||||
"""
|
||||
raw = _load_raw_observation(artifact)
|
||||
if raw is None:
|
||||
pytest.skip(
|
||||
f"Artifact '{artifact.name}' was produced by an older dump_original_n1_7.py that does "
|
||||
"not record raw observations; regenerate it with the current dump script to run the "
|
||||
"preprocessor parity case."
|
||||
)
|
||||
_, flat_inputs, _ = _load_artifact(artifact)
|
||||
original_inputs = _unflatten(flat_inputs)
|
||||
|
||||
ckpt = _resolve_checkpoint()
|
||||
from lerobot.policies.groot.configuration_groot import GrootConfig
|
||||
from lerobot.policies.groot.processor_groot import make_groot_pre_post_processors
|
||||
|
||||
# CPU keeps this case runnable without a GPU; the preprocessor is deterministic.
|
||||
config = GrootConfig(base_model_path=ckpt, embodiment_tag=embodiment_tag, device="cpu")
|
||||
preprocessor, _ = make_groot_pre_post_processors(config)
|
||||
|
||||
processed = preprocessor(_raw_observation_to_lerobot_batch(raw))
|
||||
|
||||
compared_keys = (*_COLLATED_EXACT_KEYS, *_COLLATED_CLOSE_KEYS)
|
||||
missing_original = [k for k in compared_keys if k not in original_inputs]
|
||||
missing_lerobot = [k for k in compared_keys if k not in processed]
|
||||
assert not missing_original, (
|
||||
f"[{embodiment_tag}] artifact collated inputs miss {missing_original} "
|
||||
f"(available: {sorted(original_inputs)}); regenerate the artifact with the current dump script."
|
||||
)
|
||||
assert not missing_lerobot, (
|
||||
f"[{embodiment_tag}] LeRobot preprocessor output misses {missing_lerobot} (tensor keys "
|
||||
f"available: {sorted(k for k, v in processed.items() if isinstance(v, torch.Tensor))})."
|
||||
)
|
||||
|
||||
for name in compared_keys:
|
||||
_assert_collated_parity(
|
||||
embodiment_tag,
|
||||
name,
|
||||
processed[name],
|
||||
original_inputs[name],
|
||||
exact=name in _COLLATED_EXACT_KEYS,
|
||||
)
|
||||
|
||||
@@ -9,9 +9,6 @@ LeRobot GR00T N1.7 integration requires. The two implementations therefore canno
|
||||
imported in the same Python process. To keep the parity comparison FAIR, we run the
|
||||
original model in its native env here and serialize, PER EMBODIMENT TAG:
|
||||
|
||||
* the RAW observation fed to the original processor (per-camera uint8 frames,
|
||||
per-key state vectors, the language instruction), so the LeRobot side can also
|
||||
run its OWN preprocessor on identical raw inputs and compare collated tensors,
|
||||
* the exact pre-processed/collated model inputs (so the LeRobot side consumes the
|
||||
byte-identical tensors -- same image preprocessing, tokenization, normalization),
|
||||
* the random seed used right before the flow-matching sampler,
|
||||
@@ -24,10 +21,8 @@ processor's per-embodiment modality configs. This lets us test many embodiment t
|
||||
from the SAME checkpoint and confirm the LeRobot integration is not overfit to
|
||||
``libero_sim``.
|
||||
|
||||
The companion pytest (run in the LeRobot env) loads each .npz and asserts parity
|
||||
twice: the collated inputs + seed are replayed through the LeRobot GR00T N1.7 model
|
||||
(model parity), and the raw observation is replayed through LeRobot's own
|
||||
preprocessor pipeline and compared against the collated inputs (preprocessor parity).
|
||||
The companion pytest (run in the LeRobot env) loads each .npz, replays the identical
|
||||
inputs + seed through the LeRobot GR00T N1.7 model, and asserts the outputs match.
|
||||
|
||||
Usage:
|
||||
.venv-original/bin/python tests/policies/groot/utils/dump_original_n1_7.py \
|
||||
@@ -67,7 +62,10 @@ def make_observation(seed: int, video_keys, lang_key, state_spec):
|
||||
# One ndarray per state key, shape (B, T=1, key_dim); dim taken from statistics.
|
||||
# Keys with dim 0 (e.g. a disabled end-effector (eef) key on some embodiments) are still emitted as
|
||||
# present-but-empty so the processor's state transform finds every expected key.
|
||||
state = {k: rng.standard_normal((BATCH_SIZE, 1, dim)).astype(np.float32) for k, dim in state_spec}
|
||||
state = {
|
||||
k: rng.standard_normal((BATCH_SIZE, 1, dim)).astype(np.float32)
|
||||
for k, dim in state_spec
|
||||
}
|
||||
language = {lang_key: [[PROMPT] for _ in range(BATCH_SIZE)]}
|
||||
return {"video": video, "state": state, "language": language}
|
||||
|
||||
@@ -79,25 +77,6 @@ def dump_one_tag(policy, fair_model, tag, modality_cfg, state_spec, args, out_pa
|
||||
lang_key = modality_cfg["language"].modality_keys[0]
|
||||
observation = make_observation(args.seed, video_keys, lang_key, state_spec)
|
||||
|
||||
# Snapshot the RAW observation exactly as fed to the original processor below. The
|
||||
# consumer's preprocessor-parity case replays it through LeRobot's own preprocessor
|
||||
# and compares the resulting collated tensors against the "in::" ones saved further
|
||||
# down. raw_state_keys records the checkpoint modality-key order, which is the
|
||||
# concatenation order of the flat LeRobot ``observation.state`` vector.
|
||||
spec_keys = [key for key, _ in state_spec]
|
||||
state_modality = modality_cfg.get("state")
|
||||
state_keys = [key for key in state_modality.modality_keys if key in spec_keys] if state_modality else []
|
||||
state_keys += [key for key in spec_keys if key not in state_keys]
|
||||
raw_language = [
|
||||
str(item[0]) if isinstance(item, (list, tuple)) else str(item)
|
||||
for item in observation["language"][lang_key]
|
||||
]
|
||||
raw_flat = {f"raw::video.{key}": arr.copy() for key, arr in observation["video"].items()}
|
||||
raw_flat.update({f"raw::state.{key}": arr.copy() for key, arr in observation["state"].items()})
|
||||
raw_flat["raw::language"] = np.array(raw_language, dtype=object)
|
||||
raw_flat["raw_video_keys"] = np.array([str(key) for key in video_keys], dtype=object)
|
||||
raw_flat["raw_state_keys"] = np.array([str(key) for key in state_keys], dtype=object)
|
||||
|
||||
# Point the policy preprocessing at this embodiment (mirrors Gr00tPolicy.__init__).
|
||||
policy.embodiment_tag = type(policy.embodiment_tag)(tag)
|
||||
policy.modality_configs = {
|
||||
@@ -157,7 +136,6 @@ def dump_one_tag(policy, fair_model, tag, modality_cfg, state_spec, args, out_pa
|
||||
embodiment_tag=np.array(tag),
|
||||
meta_keys=np.array(list(meta.keys()), dtype=object),
|
||||
meta_dtypes=np.array(list(meta.values()), dtype=object),
|
||||
**raw_flat,
|
||||
**flat,
|
||||
)
|
||||
print(f"[{tag}] action_pred {action_pred.shape} -> {out_path.name} ({os.path.getsize(out_path)} B)")
|
||||
@@ -203,12 +181,7 @@ def main():
|
||||
state_spec = [(k, len(v["min"])) for k, v in stats[tag]["state"].items()]
|
||||
try:
|
||||
dump_one_tag(
|
||||
policy,
|
||||
fair_model,
|
||||
tag,
|
||||
all_modality[tag],
|
||||
state_spec,
|
||||
args,
|
||||
policy, fair_model, tag, all_modality[tag], state_spec, args,
|
||||
out_dir / f"original_n1_7_{tag}.npz",
|
||||
)
|
||||
done.append(tag)
|
||||
|
||||