mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-11 22:59:50 +00:00
Compare commits
13 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 63dedac255 | |||
| b0286b10cf | |||
| 7a8b02cd32 | |||
| 892e9f13b7 | |||
| 4b8436aefa | |||
| 9d97426cb8 | |||
| e8f504edaa | |||
| db7334a384 | |||
| 5de7aa5a4f | |||
| fc8d89b128 | |||
| e0bde22193 | |||
| 055f20f658 | |||
| 30d2fe3bb3 |
@@ -0,0 +1,86 @@
|
||||
# LeRobot — Claude Code Instructions
|
||||
|
||||
You are a senior robotics ML engineer reviewing code for **LeRobot**, a PyTorch framework for real-world robot learning.
|
||||
Apply these principles to every PR review, fix, or task.
|
||||
|
||||
---
|
||||
|
||||
## Core Abstractions
|
||||
|
||||
These are the load-bearing types. Handle them with care — breaking changes here affect every user.
|
||||
|
||||
| Type | Location | Role |
|
||||
| ---------------- | ---------------------------- | ------------------------------------------------------------ |
|
||||
| `LeRobotDataset` | `src/lerobot/datasets/` | Streaming replay buffer; HF Hub integration |
|
||||
| `Policy` | `src/lerobot/policies/` | Base class for all learning agents (ACT, Diffusion, SARM, …) |
|
||||
| `Robot` | `src/lerobot/robots/` | Hardware abstraction; carries `_output_pipeline` |
|
||||
| `Teleoperator` | `src/lerobot/teleoperators/` | Leader-side hardware abstraction; carries `_output_pipeline` |
|
||||
| `Env` | `src/lerobot/envs/` | Gym-like robotics environments |
|
||||
| `Processor` | `src/lerobot/processor/` | Data transformation pipelines attached to robots/teleops |
|
||||
|
||||
**Never break their public APIs without a migration note and explicit user approval.**
|
||||
|
||||
---
|
||||
|
||||
## Engineering Principles
|
||||
|
||||
### Code quality
|
||||
|
||||
- Explicit over magic — no hidden control flow, no implicit state.
|
||||
- No deep inheritance trees. Prefer composition.
|
||||
- No decorative comment separators (`===`, `---`, etc.).
|
||||
- Add comments only where the logic is non-obvious.
|
||||
- No over-engineering. YAGNI applies strictly.
|
||||
|
||||
### Type safety
|
||||
|
||||
- All new and modified Python code must be fully typed (PEP 484).
|
||||
- `mypy --strict` must pass on changed files.
|
||||
- Do not widen or weaken existing type signatures.
|
||||
|
||||
### Backwards compatibility
|
||||
|
||||
- Public API changes require migration notes.
|
||||
- Additive changes are preferred over modifications.
|
||||
- `so100_follower` / `so101_follower` are aliases — never bleed changes there unintentionally.
|
||||
|
||||
### HF ecosystem
|
||||
|
||||
- Use `push_to_hub()`, HF Hub dataset streaming, and `evaluate` scripts.
|
||||
- Dataset changes must preserve streaming compatibility.
|
||||
- Prefer reusing HF primitives over rolling custom solutions.
|
||||
|
||||
---
|
||||
|
||||
## PR Review Checklist
|
||||
|
||||
Before approving or marking P1 issues resolved, verify:
|
||||
|
||||
- [ ] `pre-commit run -a` would pass (ruff, mypy, typos, zizmor, bandit)
|
||||
- [ ] All new/modified code is typed and passes `mypy --strict`
|
||||
- [ ] New features have unit tests; no silent behavioral changes
|
||||
- [ ] Public APIs of `LeRobotDataset`, `Policy`, `Robot`, `Teleoperator`, `Env` are unchanged (or migration note present)
|
||||
- [ ] HF Hub streaming still works for dataset changes
|
||||
- [ ] No unnecessary abstractions introduced
|
||||
- [ ] No breaking changes to training scripts (`lerobot-train`, `lerobot-eval`, `lerobot-record`)
|
||||
|
||||
---
|
||||
|
||||
## ML-Specific Checks
|
||||
|
||||
Flag these as **P1** if found:
|
||||
|
||||
- **Data leakage**: train and val/test splits must be constructed before any normalization or augmentation that uses train statistics.
|
||||
- **Loss function errors**: verify reduction mode (`mean` vs `sum`), correct masking, correct shape alignment.
|
||||
- **Gradient flow**: new modules must have gradients flowing (check `requires_grad`, no detached tensors in the loss path by accident).
|
||||
- **Distributed training**: operations on tensors must be DDP-safe; no in-place ops on parameters; batch norm needs `SyncBatchNorm` if used.
|
||||
- **Memory leaks**: no accumulation of tensors outside the training loop; `optimizer.zero_grad()` called correctly.
|
||||
|
||||
---
|
||||
|
||||
## What to Skip
|
||||
|
||||
- Don't flag style nitpicks on unchanged surrounding code.
|
||||
- Don't propose refactors outside the PR's scope.
|
||||
- Don't add docstrings or comments to code the PR didn't touch.
|
||||
- Don't suggest speculative future features (YAGNI).
|
||||
@@ -1,309 +0,0 @@
|
||||
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Integration tests: build an isolated Docker image per benchmark and run a
|
||||
# 1-episode smoke eval. Each benchmark gets its own image so incompatible
|
||||
# dependency trees (e.g. hf-libero vs metaworld==3.0.0) can never collide.
|
||||
#
|
||||
# To add a new benchmark:
|
||||
# 1. Add docker/Dockerfile.benchmark.<name> (install only lerobot[<name>])
|
||||
# 2. Copy one of the jobs below and adjust the image name and eval command.
|
||||
name: Benchmark Integration Tests
|
||||
|
||||
on:
|
||||
# Run manually from the Actions tab
|
||||
workflow_dispatch:
|
||||
|
||||
# Run every Monday at 02:00 UTC.
|
||||
schedule:
|
||||
- cron: "0 2 * * 1"
|
||||
|
||||
push:
|
||||
branches:
|
||||
- feat/benchmark-ci
|
||||
- main
|
||||
paths:
|
||||
- "src/lerobot/envs/**"
|
||||
- "src/lerobot/scripts/lerobot_eval.py"
|
||||
- "docker/Dockerfile.benchmark.*"
|
||||
- ".github/workflows/benchmark_tests.yml"
|
||||
- "pyproject.toml"
|
||||
|
||||
pull_request:
|
||||
branches:
|
||||
- main
|
||||
paths:
|
||||
- "src/lerobot/envs/**"
|
||||
- "src/lerobot/scripts/lerobot_eval.py"
|
||||
- "docker/Dockerfile.benchmark.*"
|
||||
- ".github/workflows/benchmark_tests.yml"
|
||||
- "pyproject.toml"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_VERSION: "0.8.0"
|
||||
PYTHON_VERSION: "3.12"
|
||||
|
||||
# Cancel in-flight runs for the same branch/PR.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
# ── LIBERO ────────────────────────────────────────────────────────────────
|
||||
# Isolated image: lerobot[libero] only (hf-libero, dm-control, mujoco chain)
|
||||
libero-integration-test:
|
||||
name: Libero — build image + 1-episode eval
|
||||
runs-on:
|
||||
group: aws-g6-4xlarge-plus
|
||||
env:
|
||||
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
with:
|
||||
persist-credentials: false
|
||||
lfs: true
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
||||
with:
|
||||
cache-binary: false
|
||||
|
||||
# Build the benchmark-specific image; layer cache lives in the runner's
|
||||
# local Docker daemon — reused across re-runs on the same machine.
|
||||
- name: Build Libero benchmark image
|
||||
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
||||
with:
|
||||
context: .
|
||||
file: docker/Dockerfile.benchmark.libero
|
||||
push: false
|
||||
load: true
|
||||
tags: lerobot-benchmark-libero:ci
|
||||
cache-from: type=local,src=/tmp/.buildx-cache-libero
|
||||
cache-to: type=local,dest=/tmp/.buildx-cache-libero,mode=max
|
||||
|
||||
- name: Login to Hugging Face
|
||||
if: env.HF_USER_TOKEN != ''
|
||||
run: |
|
||||
docker run --rm \
|
||||
-e HF_HOME=/tmp/hf \
|
||||
lerobot-benchmark-libero:ci \
|
||||
bash -c "hf auth login --token '$HF_USER_TOKEN' --add-to-git-credential && hf auth whoami"
|
||||
|
||||
- name: Run Libero smoke eval (1 episode)
|
||||
run: |
|
||||
# Named container (no --rm) so we can docker cp artifacts out.
|
||||
# Output to /tmp inside the container — user_lerobot cannot create
|
||||
# root-level dirs like /artifacts.
|
||||
docker run --name libero-eval --gpus all \
|
||||
--shm-size=4g \
|
||||
-e HF_HOME=/tmp/hf \
|
||||
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
||||
-e HF_HUB_DOWNLOAD_TIMEOUT=300 \
|
||||
lerobot-benchmark-libero:ci \
|
||||
bash -c "
|
||||
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
||||
lerobot-eval \
|
||||
--policy.path=pepijn223/smolvla_libero \
|
||||
--env.type=libero \
|
||||
--env.task=libero_spatial \
|
||||
--eval.batch_size=1 \
|
||||
--eval.n_episodes=1 \
|
||||
--eval.use_async_envs=false \
|
||||
--policy.device=cuda \
|
||||
'--env.camera_name_mapping={\"agentview_image\": \"camera1\", \"robot0_eye_in_hand_image\": \"camera2\"}' \
|
||||
--policy.empty_cameras=1 \
|
||||
--output_dir=/tmp/eval-artifacts
|
||||
python3 /lerobot/scripts/ci/extract_task_descriptions.py \
|
||||
--env libero --task libero_spatial \
|
||||
--output /tmp/eval-artifacts/task_descriptions.json 2>/dev/null || true
|
||||
"
|
||||
|
||||
- name: Copy Libero artifacts from container
|
||||
if: always()
|
||||
run: |
|
||||
mkdir -p /tmp/libero-artifacts
|
||||
docker cp libero-eval:/tmp/eval-artifacts/. /tmp/libero-artifacts/ 2>/dev/null || true
|
||||
docker rm -f libero-eval || true
|
||||
|
||||
- name: Parse Libero eval metrics
|
||||
if: always()
|
||||
run: |
|
||||
python3 scripts/ci/parse_eval_metrics.py \
|
||||
--artifacts-dir /tmp/libero-artifacts \
|
||||
--env libero \
|
||||
--task libero_spatial \
|
||||
--policy pepijn223/smolvla_libero
|
||||
|
||||
- name: Upload Libero rollout video
|
||||
if: always()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: libero-rollout-video
|
||||
path: /tmp/libero-artifacts/videos/
|
||||
if-no-files-found: warn
|
||||
|
||||
- name: Upload Libero eval metrics
|
||||
if: always()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: libero-metrics
|
||||
path: /tmp/libero-artifacts/metrics.json
|
||||
if-no-files-found: warn
|
||||
|
||||
# ── LIBERO TRAIN+EVAL SMOKE ──────────────────────────────────────────────
|
||||
# Train SmolVLA for 1 step (batch_size=1, dataset episode 0 only) then
|
||||
# immediately runs eval inside the training loop (eval_freq=1, 1 episode).
|
||||
# Tests the full train→eval-within-training pipeline end-to-end.
|
||||
- name: Run Libero train+eval smoke (1 step, eval_freq=1)
|
||||
run: |
|
||||
docker run --name libero-train-smoke --gpus all \
|
||||
--shm-size=4g \
|
||||
-e HF_HOME=/tmp/hf \
|
||||
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
||||
-e HF_HUB_DOWNLOAD_TIMEOUT=300 \
|
||||
lerobot-benchmark-libero:ci \
|
||||
bash -c "
|
||||
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
||||
accelerate launch --num_processes=1 \$(which lerobot-train) \
|
||||
--policy.path=lerobot/smolvla_base \
|
||||
--policy.load_vlm_weights=true \
|
||||
--policy.scheduler_decay_steps=25000 \
|
||||
--policy.freeze_vision_encoder=false \
|
||||
--policy.train_expert_only=false \
|
||||
--dataset.repo_id=lerobot/libero \
|
||||
--dataset.episodes=[0] \
|
||||
--dataset.use_imagenet_stats=false \
|
||||
--env.type=libero \
|
||||
--env.task=libero_spatial \
|
||||
'--env.camera_name_mapping={\"agentview_image\": \"camera1\", \"robot0_eye_in_hand_image\": \"camera2\"}' \
|
||||
--policy.empty_cameras=1 \
|
||||
--output_dir=/tmp/train-smoke \
|
||||
--steps=1 \
|
||||
--batch_size=1 \
|
||||
--eval_freq=1 \
|
||||
--eval.n_episodes=1 \
|
||||
--eval.batch_size=1 \
|
||||
--eval.use_async_envs=false \
|
||||
--save_freq=1 \
|
||||
--policy.push_to_hub=false \
|
||||
'--rename_map={\"observation.images.image\": \"observation.images.camera1\", \"observation.images.image2\": \"observation.images.camera2\"}'
|
||||
"
|
||||
|
||||
- name: Copy Libero train-smoke artifacts from container
|
||||
if: always()
|
||||
run: |
|
||||
mkdir -p /tmp/libero-train-smoke-artifacts
|
||||
docker cp libero-train-smoke:/tmp/train-smoke/. /tmp/libero-train-smoke-artifacts/ 2>/dev/null || true
|
||||
docker rm -f libero-train-smoke || true
|
||||
|
||||
- name: Upload Libero train-smoke eval video
|
||||
if: always()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: libero-train-smoke-video
|
||||
path: /tmp/libero-train-smoke-artifacts/eval/
|
||||
if-no-files-found: warn
|
||||
|
||||
# ── METAWORLD ─────────────────────────────────────────────────────────────
|
||||
# Isolated image: lerobot[metaworld] only (metaworld==3.0.0, mujoco>=3 chain)
|
||||
metaworld-integration-test:
|
||||
name: MetaWorld — build image + 1-episode eval
|
||||
runs-on:
|
||||
group: aws-g6-4xlarge-plus
|
||||
env:
|
||||
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
with:
|
||||
persist-credentials: false
|
||||
lfs: true
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
||||
with:
|
||||
cache-binary: false
|
||||
|
||||
- name: Build MetaWorld benchmark image
|
||||
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
||||
with:
|
||||
context: .
|
||||
file: docker/Dockerfile.benchmark.metaworld
|
||||
push: false
|
||||
load: true
|
||||
tags: lerobot-benchmark-metaworld:ci
|
||||
cache-from: type=local,src=/tmp/.buildx-cache-metaworld
|
||||
cache-to: type=local,dest=/tmp/.buildx-cache-metaworld,mode=max
|
||||
|
||||
- name: Run MetaWorld smoke eval (1 episode)
|
||||
run: |
|
||||
docker run --name metaworld-eval --gpus all \
|
||||
--shm-size=4g \
|
||||
-e HF_HOME=/tmp/hf \
|
||||
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
||||
-e HF_HUB_DOWNLOAD_TIMEOUT=300 \
|
||||
lerobot-benchmark-metaworld:ci \
|
||||
bash -c "
|
||||
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
||||
lerobot-eval \
|
||||
--policy.path=pepijn223/smolvla_metaworld \
|
||||
--env.type=metaworld \
|
||||
--env.task=metaworld-push-v3 \
|
||||
--eval.batch_size=1 \
|
||||
--eval.n_episodes=1 \
|
||||
--eval.use_async_envs=false \
|
||||
--policy.device=cuda \
|
||||
'--rename_map={\"observation.image\": \"observation.images.camera1\"}' \
|
||||
--policy.empty_cameras=2 \
|
||||
--output_dir=/tmp/eval-artifacts
|
||||
python3 /lerobot/scripts/ci/extract_task_descriptions.py \
|
||||
--env metaworld --task metaworld-push-v3 \
|
||||
--output /tmp/eval-artifacts/task_descriptions.json 2>/dev/null || true
|
||||
"
|
||||
|
||||
- name: Copy MetaWorld artifacts from container
|
||||
if: always()
|
||||
run: |
|
||||
mkdir -p /tmp/metaworld-artifacts
|
||||
docker cp metaworld-eval:/tmp/eval-artifacts/. /tmp/metaworld-artifacts/ 2>/dev/null || true
|
||||
docker rm -f metaworld-eval || true
|
||||
|
||||
- name: Parse MetaWorld eval metrics
|
||||
if: always()
|
||||
run: |
|
||||
python3 scripts/ci/parse_eval_metrics.py \
|
||||
--artifacts-dir /tmp/metaworld-artifacts \
|
||||
--env metaworld \
|
||||
--task metaworld-push-v3 \
|
||||
--policy pepijn223/smolvla_metaworld
|
||||
|
||||
- name: Upload MetaWorld rollout video
|
||||
if: always()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: metaworld-rollout-video
|
||||
path: /tmp/metaworld-artifacts/videos/
|
||||
if-no-files-found: warn
|
||||
|
||||
- name: Upload MetaWorld eval metrics
|
||||
if: always()
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: metaworld-metrics
|
||||
path: /tmp/metaworld-artifacts/metrics.json
|
||||
if-no-files-found: warn
|
||||
@@ -0,0 +1,49 @@
|
||||
name: Claude Code Review
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
types: [opened, synchronize, ready_for_review, reopened]
|
||||
|
||||
jobs:
|
||||
claude-review:
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
contents: read
|
||||
pull-requests: write
|
||||
issues: read
|
||||
id-token: write
|
||||
actions: read
|
||||
env:
|
||||
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 1
|
||||
persist-credentials: false
|
||||
|
||||
- name: Run Claude Code Review
|
||||
id: claude-review
|
||||
uses: anthropics/claude-code-action@26ddc358fe3befff50c5ec2f80304c90c763f6f8 # v1
|
||||
with:
|
||||
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
use_sticky_comment: true
|
||||
prompt: |
|
||||
Read `.github/CLAUDE.md` for lerobot-specific conventions, then review this PR.
|
||||
Provide structured, actionable feedback.
|
||||
|
||||
Focus areas (in priority order):
|
||||
1. **Correctness**: Logic errors, off-by-ones, wrong tensor shapes, incorrect loss functions
|
||||
2. **Type safety**: All new/modified Python code must pass `mypy --strict`; check for missing annotations
|
||||
3. **Backwards compatibility**: Does this break `LeRobotDataset`, `Policy`, `Robot`, `Teleoperator`, `Env`, or `Processor` public APIs?
|
||||
4. **Tests**: New features must have tests; no silent behavioral changes
|
||||
5. **Code style**: Explicit over magic, no unnecessary abstractions, no decorative comments
|
||||
6. **HF integration**: Dataset streaming, `push_to_hub`, HF Hub compatibility preserved?
|
||||
7. **pre-commit**: Would `pre-commit run -a` pass? (ruff, mypy, typos, zizmor)
|
||||
|
||||
Format findings as P1 (must fix) / P2 (should fix) / P3 (nice to have).
|
||||
Skip P3 if the PR is already high quality.
|
||||
claude_args: '--model claude-opus-4-6'
|
||||
# See https://github.com/anthropics/claude-code-action/blob/main/docs/usage.md
|
||||
# or https://code.claude.com/docs/en/cli-reference for available options
|
||||
@@ -0,0 +1,58 @@
|
||||
name: Claude Code
|
||||
|
||||
on:
|
||||
issue_comment:
|
||||
types: [created]
|
||||
pull_request_review_comment:
|
||||
types: [created]
|
||||
issues:
|
||||
types: [opened, assigned]
|
||||
pull_request_review:
|
||||
types: [submitted]
|
||||
|
||||
jobs:
|
||||
claude:
|
||||
if: |
|
||||
(github.event_name == 'issue_comment' &&
|
||||
contains(github.event.comment.body, '@claude') &&
|
||||
(github.event.comment.author_association == 'OWNER' || github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'COLLABORATOR')) ||
|
||||
(github.event_name == 'pull_request_review_comment' &&
|
||||
contains(github.event.comment.body, '@claude') &&
|
||||
(github.event.comment.author_association == 'OWNER' || github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'COLLABORATOR')) ||
|
||||
(github.event_name == 'pull_request_review' &&
|
||||
contains(github.event.review.body, '@claude') &&
|
||||
(github.event.review.author_association == 'OWNER' || github.event.review.author_association == 'MEMBER' || github.event.review.author_association == 'COLLABORATOR')) ||
|
||||
(github.event_name == 'issues' &&
|
||||
(contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')) &&
|
||||
(github.event.issue.author_association == 'OWNER' || github.event.issue.author_association == 'MEMBER' || github.event.issue.author_association == 'COLLABORATOR'))
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
contents: read
|
||||
pull-requests: write
|
||||
issues: write
|
||||
id-token: write
|
||||
actions: read
|
||||
env:
|
||||
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 1
|
||||
persist-credentials: false
|
||||
|
||||
- name: Run Claude Code
|
||||
id: claude
|
||||
uses: anthropics/claude-code-action@26ddc358fe3befff50c5ec2f80304c90c763f6f8 # v1
|
||||
with:
|
||||
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
use_sticky_comment: true
|
||||
|
||||
# This is an optional setting that allows Claude to read CI results on PRs
|
||||
additional_permissions: |
|
||||
actions: read
|
||||
|
||||
claude_args: '--system-prompt "Read .github/CLAUDE.md for lerobot-specific conventions before responding."'
|
||||
# See https://github.com/anthropics/claude-code-action/blob/main/docs/usage.md
|
||||
# or https://code.claude.com/docs/en/cli-reference for available options
|
||||
@@ -1,89 +0,0 @@
|
||||
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Isolated benchmark image for LIBERO integration tests.
|
||||
# Installs only lerobot[libero] so its dep tree (hf-libero, dm-control, mujoco)
|
||||
# cannot conflict with other benchmarks.
|
||||
#
|
||||
# Build: docker build -f docker/Dockerfile.benchmark.libero -t lerobot-benchmark-libero .
|
||||
# Run: docker run --gpus all --rm lerobot-benchmark-libero lerobot-eval ...
|
||||
|
||||
ARG CUDA_VERSION=12.4.1
|
||||
ARG OS_VERSION=22.04
|
||||
FROM nvidia/cuda:${CUDA_VERSION}-base-ubuntu${OS_VERSION}
|
||||
|
||||
ARG PYTHON_VERSION=3.12
|
||||
|
||||
ENV DEBIAN_FRONTEND=noninteractive \
|
||||
MUJOCO_GL=egl \
|
||||
PATH=/lerobot/.venv/bin:$PATH \
|
||||
CUDA_VISIBLE_DEVICES=0 \
|
||||
DEVICE=cuda
|
||||
|
||||
# System deps — same set as Dockerfile.internal
|
||||
RUN apt-get update && apt-get install -y --no-install-recommends \
|
||||
software-properties-common build-essential git curl \
|
||||
libglib2.0-0 libgl1-mesa-glx libegl1-mesa ffmpeg \
|
||||
libusb-1.0-0-dev speech-dispatcher libgeos-dev portaudio19-dev \
|
||||
cmake pkg-config ninja-build \
|
||||
&& add-apt-repository -y ppa:deadsnakes/ppa \
|
||||
&& apt-get update \
|
||||
&& apt-get install -y --no-install-recommends \
|
||||
python${PYTHON_VERSION} \
|
||||
python${PYTHON_VERSION}-venv \
|
||||
python${PYTHON_VERSION}-dev \
|
||||
&& curl -LsSf https://astral.sh/uv/install.sh | sh \
|
||||
&& mv /root/.local/bin/uv /usr/local/bin/uv \
|
||||
&& useradd --create-home --shell /bin/bash user_lerobot \
|
||||
&& usermod -aG sudo user_lerobot \
|
||||
&& apt-get clean && rm -rf /var/lib/apt/lists/*
|
||||
|
||||
WORKDIR /lerobot
|
||||
RUN chown -R user_lerobot:user_lerobot /lerobot
|
||||
USER user_lerobot
|
||||
|
||||
ENV HOME=/home/user_lerobot \
|
||||
HF_HOME=/home/user_lerobot/.cache/huggingface \
|
||||
HF_LEROBOT_HOME=/home/user_lerobot/.cache/huggingface/lerobot \
|
||||
TORCH_HOME=/home/user_lerobot/.cache/torch \
|
||||
TRITON_CACHE_DIR=/home/user_lerobot/.cache/triton
|
||||
|
||||
RUN uv venv --python python${PYTHON_VERSION}
|
||||
|
||||
# Install only lerobot[libero] — completely isolated from metaworld's dep tree
|
||||
COPY --chown=user_lerobot:user_lerobot setup.py pyproject.toml uv.lock README.md MANIFEST.in ./
|
||||
COPY --chown=user_lerobot:user_lerobot src/ src/
|
||||
|
||||
RUN uv sync --locked --extra libero --extra smolvla --no-cache
|
||||
|
||||
# Pre-download lerobot/libero-assets from HF Hub so nothing is fetched at
|
||||
# runtime (which times out on CI). Point the libero config at the cached path.
|
||||
# libero/libero/__init__.py calls input() when ~/.libero/config.yaml is missing,
|
||||
# so we write the config before any libero import can happen.
|
||||
RUN LIBERO_DIR=$(python${PYTHON_VERSION} -c \
|
||||
"import importlib.util, os; s=importlib.util.find_spec('libero'); \
|
||||
print(os.path.join(os.path.dirname(s.origin), 'libero'))") && \
|
||||
mkdir -p /home/user_lerobot/.libero && \
|
||||
python${PYTHON_VERSION} -c "\
|
||||
from huggingface_hub import snapshot_download; \
|
||||
snapshot_download(repo_id='lerobot/libero-assets', repo_type='dataset', \
|
||||
local_dir='/home/user_lerobot/.libero/assets')" && \
|
||||
printf "assets: /home/user_lerobot/.libero/assets\nbddl_files: ${LIBERO_DIR}/bddl_files\ndatasets: ${LIBERO_DIR}/../datasets\ninit_states: ${LIBERO_DIR}/init_files\n" \
|
||||
> /home/user_lerobot/.libero/config.yaml
|
||||
|
||||
RUN chmod +x /lerobot/.venv/lib/python${PYTHON_VERSION}/site-packages/triton/backends/nvidia/bin/ptxas
|
||||
|
||||
COPY --chown=user_lerobot:user_lerobot . .
|
||||
|
||||
CMD ["/bin/bash"]
|
||||
@@ -1,74 +0,0 @@
|
||||
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
# Isolated benchmark image for MetaWorld integration tests.
|
||||
# Installs only lerobot[metaworld] so its dep tree (metaworld==3.0.0, mujoco>=3)
|
||||
# cannot conflict with other benchmarks.
|
||||
#
|
||||
# Build: docker build -f docker/Dockerfile.benchmark.metaworld -t lerobot-benchmark-metaworld .
|
||||
# Run: docker run --gpus all --rm lerobot-benchmark-metaworld lerobot-eval ...
|
||||
|
||||
ARG CUDA_VERSION=12.4.1
|
||||
ARG OS_VERSION=22.04
|
||||
FROM nvidia/cuda:${CUDA_VERSION}-base-ubuntu${OS_VERSION}
|
||||
|
||||
ARG PYTHON_VERSION=3.12
|
||||
|
||||
ENV DEBIAN_FRONTEND=noninteractive \
|
||||
MUJOCO_GL=egl \
|
||||
PATH=/lerobot/.venv/bin:$PATH \
|
||||
CUDA_VISIBLE_DEVICES=0 \
|
||||
DEVICE=cuda
|
||||
|
||||
# System deps — same set as Dockerfile.internal
|
||||
RUN apt-get update && apt-get install -y --no-install-recommends \
|
||||
software-properties-common build-essential git curl \
|
||||
libglib2.0-0 libgl1-mesa-glx libegl1-mesa ffmpeg \
|
||||
libusb-1.0-0-dev speech-dispatcher libgeos-dev portaudio19-dev \
|
||||
cmake pkg-config ninja-build \
|
||||
&& add-apt-repository -y ppa:deadsnakes/ppa \
|
||||
&& apt-get update \
|
||||
&& apt-get install -y --no-install-recommends \
|
||||
python${PYTHON_VERSION} \
|
||||
python${PYTHON_VERSION}-venv \
|
||||
python${PYTHON_VERSION}-dev \
|
||||
&& curl -LsSf https://astral.sh/uv/install.sh | sh \
|
||||
&& mv /root/.local/bin/uv /usr/local/bin/uv \
|
||||
&& useradd --create-home --shell /bin/bash user_lerobot \
|
||||
&& usermod -aG sudo user_lerobot \
|
||||
&& apt-get clean && rm -rf /var/lib/apt/lists/*
|
||||
|
||||
WORKDIR /lerobot
|
||||
RUN chown -R user_lerobot:user_lerobot /lerobot
|
||||
USER user_lerobot
|
||||
|
||||
ENV HOME=/home/user_lerobot \
|
||||
HF_HOME=/home/user_lerobot/.cache/huggingface \
|
||||
HF_LEROBOT_HOME=/home/user_lerobot/.cache/huggingface/lerobot \
|
||||
TORCH_HOME=/home/user_lerobot/.cache/torch \
|
||||
TRITON_CACHE_DIR=/home/user_lerobot/.cache/triton
|
||||
|
||||
RUN uv venv --python python${PYTHON_VERSION}
|
||||
|
||||
# Install only lerobot[metaworld] — completely isolated from libero's dep tree
|
||||
COPY --chown=user_lerobot:user_lerobot setup.py pyproject.toml uv.lock README.md MANIFEST.in ./
|
||||
COPY --chown=user_lerobot:user_lerobot src/ src/
|
||||
|
||||
RUN uv sync --locked --extra metaworld --extra smolvla --no-cache
|
||||
|
||||
RUN chmod +x /lerobot/.venv/lib/python${PYTHON_VERSION}/site-packages/triton/backends/nvidia/bin/ptxas
|
||||
|
||||
COPY --chown=user_lerobot:user_lerobot . .
|
||||
|
||||
CMD ["/bin/bash"]
|
||||
@@ -73,8 +73,6 @@
|
||||
title: Control & Train Robots in Sim (LeIsaac)
|
||||
title: "Simulation"
|
||||
- sections:
|
||||
- local: evaluation
|
||||
title: Evaluation (lerobot-eval)
|
||||
- local: adding_benchmarks
|
||||
title: Adding a New Benchmark
|
||||
- local: libero
|
||||
|
||||
@@ -26,7 +26,7 @@ During evaluation, data moves through four stages:
|
||||
1. gym.Env ──→ raw observations (numpy dicts)
|
||||
|
||||
2. Preprocessing ──→ standard LeRobot keys + task description
|
||||
(preprocess_observation in envs/utils.py, env.call("task_description"))
|
||||
(preprocess_observation, add_envs_task in envs/utils.py)
|
||||
|
||||
3. Processors ──→ env-specific then policy-specific transforms
|
||||
(env_preprocessor, policy_preprocessor)
|
||||
@@ -122,17 +122,15 @@ Each `EnvConfig` subclass declares two dicts that tell the policy what to expect
|
||||
|
||||
### Checklist
|
||||
|
||||
| File | Required | Why |
|
||||
| ----------------------------------------- | -------- | ------------------------------------------------------------ |
|
||||
| `src/lerobot/envs/<benchmark>.py` | Yes | Wraps the simulator as a standard gym.Env |
|
||||
| `src/lerobot/envs/configs.py` | Yes | Registers your benchmark and its `create_envs()` for the CLI |
|
||||
| `src/lerobot/processor/env_processor.py` | Optional | Custom observation/action transforms |
|
||||
| `src/lerobot/envs/utils.py` | Optional | Only if you need new raw observation keys |
|
||||
| `pyproject.toml` | Yes | Declares benchmark-specific dependencies |
|
||||
| `docs/source/<benchmark>.mdx` | Yes | User-facing documentation page |
|
||||
| `docs/source/_toctree.yml` | Yes | Adds your page to the docs sidebar |
|
||||
| `docker/Dockerfile.benchmark.<benchmark>` | Yes | Isolated Docker image for CI smoke tests |
|
||||
| `.github/workflows/benchmark_tests.yml` | Yes | CI job that builds the image and runs a 1-episode smoke eval |
|
||||
| File | Required | Why |
|
||||
| ---------------------------------------- | -------- | ------------------------------------------------------------ |
|
||||
| `src/lerobot/envs/<benchmark>.py` | Yes | Wraps the simulator as a standard gym.Env |
|
||||
| `src/lerobot/envs/configs.py` | Yes | Registers your benchmark and its `create_envs()` for the CLI |
|
||||
| `src/lerobot/processor/env_processor.py` | Optional | Custom observation/action transforms |
|
||||
| `src/lerobot/envs/utils.py` | Optional | Only if you need new raw observation keys |
|
||||
| `pyproject.toml` | Yes | Declares benchmark-specific dependencies |
|
||||
| `docs/source/<benchmark>.mdx` | Yes | User-facing documentation page |
|
||||
| `docs/source/_toctree.yml` | Yes | Adds your page to the docs sidebar |
|
||||
|
||||
### 1. The gym.Env wrapper (`src/lerobot/envs/<benchmark>.py`)
|
||||
|
||||
@@ -163,8 +161,6 @@ class MyBenchmarkEnv(gym.Env):
|
||||
...
|
||||
```
|
||||
|
||||
**GPU-based simulators (e.g. MuJoCo with EGL rendering):** If your simulator allocates GPU/EGL contexts during `__init__`, defer that allocation to a `_ensure_env()` helper called on first `reset()`/`step()`. This avoids inheriting stale GPU handles when `AsyncVectorEnv` spawns worker processes. See `LiberoEnv._ensure_env()` for the pattern.
|
||||
|
||||
Also provide a factory function that returns the nested dict structure:
|
||||
|
||||
```python
|
||||
@@ -211,7 +207,7 @@ class MyBenchmarkEnvConfig(EnvConfig):
|
||||
def gym_kwargs(self) -> dict:
|
||||
return {"obs_type": self.obs_type, "render_mode": self.render_mode}
|
||||
|
||||
def create_envs(self, n_envs: int, use_async_envs: bool = True):
|
||||
def create_envs(self, n_envs: int, use_async_envs: bool = False):
|
||||
"""Override for multi-task benchmarks or custom env creation."""
|
||||
from lerobot.envs.<benchmark> import create_<benchmark>_envs
|
||||
return create_<benchmark>_envs(task=self.task, n_envs=n_envs, ...)
|
||||
@@ -297,87 +293,14 @@ Add your benchmark to the "Benchmarks" section:
|
||||
title: "Benchmarks"
|
||||
```
|
||||
|
||||
### 7. CI smoke test (`docker/` + `.github/workflows/benchmark_tests.yml`)
|
||||
|
||||
Each benchmark must have an isolated Docker image and a CI job that runs a 1-episode eval. This catches install-time regressions (broken transitive deps, import errors, interactive prompts) before they reach users.
|
||||
|
||||
**Create `docker/Dockerfile.benchmark.<benchmark>`** — copy an existing one and change only the extra name:
|
||||
|
||||
```dockerfile
|
||||
# Isolated benchmark image — installs lerobot[<benchmark>] only.
|
||||
# Build: docker build -f docker/Dockerfile.benchmark.<benchmark> -t lerobot-benchmark-<benchmark> .
|
||||
ARG CUDA_VERSION=12.4.1
|
||||
ARG OS_VERSION=22.04
|
||||
FROM nvidia/cuda:${CUDA_VERSION}-base-ubuntu${OS_VERSION}
|
||||
ARG PYTHON_VERSION=3.12
|
||||
# ... (same system deps as Dockerfile.benchmark.libero) ...
|
||||
RUN uv sync --locked --extra <benchmark> --no-cache
|
||||
```
|
||||
|
||||
Each benchmark gets its own image so its dependency tree (pinned simulator packages, specific mujoco/scipy versions) cannot conflict with other benchmarks.
|
||||
|
||||
**Add a job to `.github/workflows/benchmark_tests.yml`** — copy an existing job block and adjust:
|
||||
|
||||
```yaml
|
||||
<benchmark>-integration-test:
|
||||
name: <Benchmark> — build image + 1-episode eval
|
||||
runs-on:
|
||||
group: aws-g6-4xlarge-plus
|
||||
env:
|
||||
HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
|
||||
steps:
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
with:
|
||||
persist-credentials: false
|
||||
lfs: true
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
|
||||
with:
|
||||
cache-binary: false
|
||||
- name: Build <Benchmark> image
|
||||
uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
|
||||
with:
|
||||
context: .
|
||||
file: docker/Dockerfile.benchmark.<benchmark>
|
||||
push: false
|
||||
load: true
|
||||
tags: lerobot-benchmark-<benchmark>:ci
|
||||
cache-from: type=local,src=/tmp/.buildx-cache-<benchmark>
|
||||
cache-to: type=local,dest=/tmp/.buildx-cache-<benchmark>,mode=max
|
||||
- name: Run <Benchmark> smoke eval (1 episode)
|
||||
run: |
|
||||
docker run --rm --gpus all \
|
||||
--shm-size=4g \
|
||||
-e HF_HOME=/tmp/hf \
|
||||
-e HF_USER_TOKEN="${HF_USER_TOKEN}" \
|
||||
lerobot-benchmark-<benchmark>:ci \
|
||||
bash -c "
|
||||
hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
|
||||
lerobot-eval \
|
||||
--policy.path=<hub_policy_path> \
|
||||
--env.type=<benchmark> \
|
||||
--env.task=<task> \
|
||||
--eval.batch_size=1 \
|
||||
--eval.n_episodes=1 \
|
||||
--eval.use_async_envs=false \
|
||||
--policy.device=cuda
|
||||
"
|
||||
```
|
||||
|
||||
**Tips:**
|
||||
|
||||
- If the benchmark library prompts for user input on import (like LIBERO asking for a dataset folder), pass the relevant env var in the `docker run` command (e.g. `-e LIBERO_DATA_FOLDER=/tmp/libero_data`).
|
||||
- The job is scoped to only trigger on changes to `src/lerobot/envs/**`, `src/lerobot/scripts/lerobot_eval.py`, and the Dockerfiles — it won't run on unrelated PRs.
|
||||
|
||||
## Verifying your integration
|
||||
|
||||
After completing the steps above, confirm that everything works:
|
||||
|
||||
1. **Install** — `pip install -e ".[mybenchmark]"` and verify the dependency group installs cleanly.
|
||||
2. **Smoke test env creation** — call `make_env()` with your config in Python, check that the returned dict has the expected `{suite: {task_id: VectorEnv}}` shape, and that `reset()` returns observations with the right keys.
|
||||
3. **Run a full eval** — `lerobot-eval --env.type=<name> --env.task=<task> --eval.n_episodes=1 --policy.path=<any_compatible_policy>` to exercise the full pipeline end-to-end. (`batch_size` defaults to auto-tuning based on CPU cores; pass `--eval.batch_size=1` to force a single environment.)
|
||||
3. **Run a full eval** — `lerobot-eval --env.type=<name> --env.task=<task> --eval.n_episodes=1 --eval.batch_size=1 --policy.path=<any_compatible_policy>` to exercise the full pipeline end-to-end.
|
||||
4. **Check success detection** — verify that `info["is_success"]` flips to `True` when the task is actually completed. This is what the eval loop uses to compute success rates.
|
||||
5. **Add CI smoke test** — follow step 7 above to add a Dockerfile and CI job. This ensures the install stays green as dependencies evolve.
|
||||
|
||||
## Writing a benchmark doc page
|
||||
|
||||
@@ -388,7 +311,7 @@ Each benchmark `.mdx` page should include:
|
||||
- **Overview image or GIF.**
|
||||
- **Available tasks** — table of task suites with counts and brief descriptions.
|
||||
- **Installation** — `pip install -e ".[<benchmark>]"` plus any extra steps (env vars, system packages).
|
||||
- **Evaluation** — recommended `lerobot-eval` command with `n_episodes` for reproducible results. `batch_size` defaults to auto; only specify it if needed. Include single-task and multi-task examples if applicable. See the [Evaluation guide](evaluation) for details.
|
||||
- **Evaluation** — recommended `lerobot-eval` command with `n_episodes` and `batch_size` for reproducible results. Include single-task and multi-task examples if applicable.
|
||||
- **Policy inputs and outputs** — observation keys with shapes, action space description.
|
||||
- **Recommended evaluation episodes** — how many episodes per task is standard.
|
||||
- **Training** — example `lerobot-train` command.
|
||||
|
||||
@@ -90,11 +90,17 @@ The same policy can work with different environment processors, and the same env
|
||||
|
||||
```python
|
||||
# Use SmolVLA policy with LIBERO environment
|
||||
libero_preprocessor, libero_postprocessor = make_env_pre_post_processors(libero_cfg)
|
||||
# Use SmolVLA policy with LIBERO environment
|
||||
libero_preprocessor, libero_postprocessor = make_env_pre_post_processors(
|
||||
env_cfg=libero_cfg,
|
||||
policy_cfg=smolvla_cfg,
|
||||
)
|
||||
smolvla_preprocessor, smolvla_postprocessor = make_pre_post_processors(smolvla_cfg)
|
||||
|
||||
# Or use ACT policy with the same LIBERO environment
|
||||
libero_preprocessor, libero_postprocessor = make_env_pre_post_processors(libero_cfg)
|
||||
libero_preprocessor, libero_postprocessor = make_env_pre_post_processors(
|
||||
env_cfg=libero_cfg,
|
||||
policy_cfg=act_cfg,
|
||||
)
|
||||
act_preprocessor, act_postprocessor = make_pre_post_processors(act_cfg)
|
||||
```
|
||||
|
||||
@@ -151,7 +157,7 @@ observation = {
|
||||
|
||||
### Factory Function
|
||||
|
||||
The `make_env_pre_post_processors` function follows the same pattern as `make_pre_post_processors` for policies:
|
||||
The `make_env_pre_post_processors` function delegates to `env_cfg.get_env_processors()`:
|
||||
|
||||
```python
|
||||
from lerobot.envs.factory import make_env_pre_post_processors
|
||||
@@ -159,47 +165,31 @@ from lerobot.envs.configs import LiberoEnv, PushtEnv
|
||||
|
||||
# For LIBERO: Returns LiberoProcessorStep in preprocessor
|
||||
libero_cfg = LiberoEnv(task="libero_spatial", camera_name=["agentview"])
|
||||
env_preprocessor, env_postprocessor = make_env_pre_post_processors(libero_cfg)
|
||||
env_preprocessor, env_postprocessor = make_env_pre_post_processors(libero_cfg, policy_cfg)
|
||||
|
||||
# For other environments: Returns identity processors (no-op)
|
||||
pusht_cfg = PushtEnv()
|
||||
env_preprocessor, env_postprocessor = make_env_pre_post_processors(pusht_cfg)
|
||||
env_preprocessor, env_postprocessor = make_env_pre_post_processors(pusht_cfg, policy_cfg)
|
||||
```
|
||||
|
||||
### Implementation in `envs/factory.py`
|
||||
### How It Works
|
||||
|
||||
Each `EnvConfig` subclass can override `get_env_processors()` to return benchmark-specific
|
||||
processor pipelines. The base class returns identity (no-op) processors by default.
|
||||
|
||||
```python
|
||||
def make_env_pre_post_processors(
|
||||
env_cfg: EnvConfig,
|
||||
) -> tuple[
|
||||
PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
|
||||
PolicyProcessorPipeline[dict[str, Any], dict[str, Any]],
|
||||
]:
|
||||
"""
|
||||
Create preprocessor and postprocessor pipelines for environment observations.
|
||||
|
||||
Args:
|
||||
env_cfg: The configuration of the environment.
|
||||
|
||||
Returns:
|
||||
A tuple containing:
|
||||
- preprocessor: Pipeline that processes environment observations
|
||||
- postprocessor: Pipeline that processes environment outputs
|
||||
"""
|
||||
# For LIBERO environments, add the LiberoProcessorStep to preprocessor
|
||||
if isinstance(env_cfg, LiberoEnv) or "libero" in env_cfg.type:
|
||||
preprocessor = PolicyProcessorPipeline(steps=[LiberoProcessorStep()])
|
||||
else:
|
||||
# For all other environments, return an identity preprocessor
|
||||
preprocessor = PolicyProcessorPipeline(steps=[])
|
||||
|
||||
# Postprocessor is currently identity for all environments
|
||||
# Future: Could add environment-specific action transformations
|
||||
postprocessor = PolicyProcessorPipeline(steps=[])
|
||||
|
||||
return preprocessor, postprocessor
|
||||
# In your EnvConfig subclass:
|
||||
def get_env_processors(self):
|
||||
from lerobot.processor.pipeline import PolicyProcessorPipeline
|
||||
return (
|
||||
PolicyProcessorPipeline(steps=[MyProcessorStep()]),
|
||||
PolicyProcessorPipeline(steps=[]),
|
||||
)
|
||||
```
|
||||
|
||||
The factory function `make_env_pre_post_processors` simply delegates to this method,
|
||||
with a special case for `XVLAConfig` policies which override the env processors entirely.
|
||||
|
||||
### Integration in Evaluation
|
||||
|
||||
In `lerobot_eval.py`, the environment processors are created once and used throughout:
|
||||
@@ -219,7 +209,10 @@ def eval_main(cfg: EvalPipelineConfig):
|
||||
)
|
||||
|
||||
# Create environment processors (NEW!)
|
||||
env_preprocessor, env_postprocessor = make_env_pre_post_processors(env_cfg=cfg.env)
|
||||
env_preprocessor, env_postprocessor = make_env_pre_post_processors(
|
||||
env_cfg=cfg.env,
|
||||
policy_cfg=cfg.policy,
|
||||
)
|
||||
|
||||
# Run evaluation with both processor types
|
||||
eval_policy_all(
|
||||
@@ -323,21 +316,22 @@ class MyEnvProcessorStep(ObservationProcessorStep):
|
||||
return processed
|
||||
```
|
||||
|
||||
### 2. Update the Factory
|
||||
### 2. Update Your `EnvConfig` Subclass
|
||||
|
||||
```python
|
||||
# In src/lerobot/envs/factory.py
|
||||
# In src/lerobot/envs/configs.py
|
||||
@EnvConfig.register_subclass("myenv")
|
||||
@dataclass
|
||||
class MyEnvConfig(EnvConfig):
|
||||
# ... task/features/gym kwargs ...
|
||||
|
||||
def make_env_pre_post_processors(env_cfg: EnvConfig):
|
||||
if isinstance(env_cfg, LiberoEnv) or "libero" in env_cfg.type:
|
||||
preprocessor = PolicyProcessorPipeline(steps=[LiberoProcessorStep()])
|
||||
elif isinstance(env_cfg, MyEnvConfig) or "myenv" in env_cfg.type:
|
||||
preprocessor = PolicyProcessorPipeline(steps=[MyEnvProcessorStep()])
|
||||
else:
|
||||
preprocessor = PolicyProcessorPipeline(steps=[])
|
||||
def get_env_processors(self):
|
||||
from lerobot.processor.pipeline import PolicyProcessorPipeline
|
||||
|
||||
postprocessor = PolicyProcessorPipeline(steps=[])
|
||||
return preprocessor, postprocessor
|
||||
return (
|
||||
PolicyProcessorPipeline(steps=[MyEnvProcessorStep()]),
|
||||
PolicyProcessorPipeline(steps=[]),
|
||||
)
|
||||
```
|
||||
|
||||
### 3. Use in Evaluation
|
||||
|
||||
@@ -1,162 +0,0 @@
|
||||
# Evaluation
|
||||
|
||||
`lerobot-eval` runs a trained policy on a simulation benchmark and reports success rate, reward, and (optionally) episode videos. It handles environment creation, batched rollouts, and metric aggregation automatically.
|
||||
|
||||
## Quick start
|
||||
|
||||
Evaluate a Hub-hosted policy on LIBERO:
|
||||
|
||||
```bash
|
||||
lerobot-eval \
|
||||
--policy.path=pepijn223/smolvla_libero \
|
||||
--env.type=libero \
|
||||
--env.task=libero_spatial \
|
||||
--eval.n_episodes=10 \
|
||||
--policy.device=cuda
|
||||
```
|
||||
|
||||
Evaluate a local checkpoint:
|
||||
|
||||
```bash
|
||||
lerobot-eval \
|
||||
--policy.path=outputs/train/act_pusht/checkpoints/005000/pretrained_model \
|
||||
--env.type=pusht \
|
||||
--eval.n_episodes=10
|
||||
```
|
||||
|
||||
`batch_size` defaults to **auto** (based on CPU cores). The script picks the right number of parallel environments for your machine.
|
||||
|
||||
## Key flags
|
||||
|
||||
| Flag | Default | Description |
|
||||
| ----------------------- | -------------- | ------------------------------------------------------------------------------------- |
|
||||
| `--policy.path` | required | Hub repo ID or local path to a pretrained model |
|
||||
| `--env.type` | required | Benchmark name (`pusht`, `libero`, `metaworld`, etc.) |
|
||||
| `--env.task` | varies | Task or suite name (e.g. `libero_spatial`, `libero_10`) |
|
||||
| `--eval.n_episodes` | `50` | Total episodes to run (across all tasks) |
|
||||
| `--eval.batch_size` | `0` (auto) | Number of parallel environments. `0` = auto-tune from CPU cores |
|
||||
| `--eval.use_async_envs` | `true` | Use `AsyncVectorEnv` (parallel stepping). Auto-downgrades to sync when `batch_size=1` |
|
||||
| `--policy.device` | `cuda` | Inference device |
|
||||
| `--policy.use_amp` | `false` | Mixed-precision inference (saves VRAM, faster on Ampere+) |
|
||||
| `--seed` | `1000` | Random seed for reproducibility |
|
||||
| `--output_dir` | auto-generated | Where to write results and videos |
|
||||
|
||||
### Environment-specific flags
|
||||
|
||||
Some benchmarks accept additional flags through `--env.*`:
|
||||
|
||||
```bash
|
||||
# LIBERO: map simulator camera names to policy feature names
|
||||
--env.camera_name_mapping='{"agentview_image": "camera1", "robot0_eye_in_hand_image": "camera2"}'
|
||||
|
||||
# Fill unused camera slots with zeros
|
||||
--policy.empty_cameras=1
|
||||
```
|
||||
|
||||
See each benchmark's documentation ([LIBERO](libero), [Meta-World](metaworld)) for benchmark-specific flags.
|
||||
|
||||
## How batch_size works
|
||||
|
||||
`batch_size` controls how many environments run in parallel within a single `VectorEnv`:
|
||||
|
||||
| `batch_size` | Behavior |
|
||||
| ------------- | -------------------------------------------------------------------- |
|
||||
| `0` (default) | Auto-tune: `floor(cpu_cores × 0.7)`, capped by `n_episodes` and `64` |
|
||||
| `1` | Single environment, synchronous. Useful for debugging |
|
||||
| `N` | N environments step in parallel via `AsyncVectorEnv` |
|
||||
|
||||
When `batch_size > 1` and `use_async_envs=true`, each environment runs in its own subprocess via Gymnasium's `AsyncVectorEnv`. This parallelizes the simulation stepping (the main bottleneck), while the policy runs a single batched forward pass on GPU.
|
||||
|
||||
**Example:** On a 16-core machine with `n_episodes=100`:
|
||||
|
||||
- Auto batch_size = `floor(16 × 0.7)` = `11`
|
||||
- 11 environments step simultaneously → ~11× faster than sequential
|
||||
|
||||
## Performance
|
||||
|
||||
### AsyncVectorEnv (default)
|
||||
|
||||
`AsyncVectorEnv` spawns one subprocess per environment. Each subprocess has its own simulator instance. While the policy computes actions on GPU, all environments step in parallel on CPU:
|
||||
|
||||
```
|
||||
GPU: [inference]....[inference]....[inference]....
|
||||
CPU: [step × N]....................[step × N]......
|
||||
↑ parallel ↑ parallel
|
||||
```
|
||||
|
||||
For GPU-based simulators (LIBERO, Meta-World), the environments use **lazy initialization**: the GPU/EGL context is created inside the worker subprocess on first `reset()`, not in the parent process. This avoids `EGL_BAD_CONTEXT` crashes from inheriting stale GPU handles across `fork()`.
|
||||
|
||||
### Lazy task loading
|
||||
|
||||
For multi-task benchmarks (e.g. LIBERO with 10 tasks), environments are wrapped in `_LazyAsyncVectorEnv` which defers worker creation until the task is actually evaluated. This keeps peak process count = `batch_size` instead of `n_tasks × batch_size`. After each task completes, workers are closed to free resources.
|
||||
|
||||
### Tuning for speed
|
||||
|
||||
| Situation | Recommendation |
|
||||
| ------------------------------ | ----------------------------------------------------- |
|
||||
| Slow eval, low GPU utilization | Increase `batch_size` (or leave at auto) |
|
||||
| Out of memory (system RAM) | Decrease `batch_size` |
|
||||
| Out of GPU memory | Decrease `batch_size`, or use `--policy.use_amp=true` |
|
||||
| Debugging / single-stepping | `--eval.batch_size=1 --eval.use_async_envs=false` |
|
||||
|
||||
## Output
|
||||
|
||||
Results are written to `output_dir` (default: `outputs/eval/<date>/<time>_<job_name>/`):
|
||||
|
||||
- `eval_info.json` — full metrics: per-episode, per-task, per-group, and overall aggregates
|
||||
- `videos/` — episode recordings (when `--eval.n_episodes_to_render > 0`)
|
||||
|
||||
### Metrics
|
||||
|
||||
| Metric | Description |
|
||||
| ---------------- | -------------------------------------------------------------------- |
|
||||
| `pc_success` | Success rate (%). Based on `info["is_success"]` from the environment |
|
||||
| `avg_sum_reward` | Mean cumulative reward per episode |
|
||||
| `avg_max_reward` | Mean peak reward per episode |
|
||||
| `n_episodes` | Total episodes evaluated |
|
||||
| `eval_s` | Total wall-clock time |
|
||||
| `eval_ep_s` | Mean wall-clock time per episode |
|
||||
|
||||
## Multi-task evaluation
|
||||
|
||||
For benchmarks with multiple tasks (LIBERO suites, Meta-World MT50), `lerobot-eval` automatically:
|
||||
|
||||
1. Creates environments for all tasks in the selected suite(s)
|
||||
2. Evaluates each task sequentially (one task's workers at a time)
|
||||
3. Aggregates metrics per-task, per-group (suite), and overall
|
||||
|
||||
```bash
|
||||
# Evaluate all 10 tasks in libero_spatial
|
||||
lerobot-eval \
|
||||
--policy.path=pepijn223/smolvla_libero \
|
||||
--env.type=libero \
|
||||
--env.task=libero_spatial \
|
||||
--eval.n_episodes=10
|
||||
|
||||
# Evaluate multiple suites
|
||||
lerobot-eval \
|
||||
--policy.path=pepijn223/smolvla_libero \
|
||||
--env.type=libero \
|
||||
--env.task="libero_spatial,libero_object" \
|
||||
--eval.n_episodes=10
|
||||
```
|
||||
|
||||
## API usage
|
||||
|
||||
You can call the eval functions directly from Python:
|
||||
|
||||
```python
|
||||
from lerobot.envs.factory import make_env
|
||||
from lerobot.policies.factory import make_policy
|
||||
from lerobot.scripts.lerobot_eval import eval_policy
|
||||
|
||||
envs = make_env(env_cfg, n_envs=10)
|
||||
policy = make_policy(cfg=policy_cfg, env_cfg=env_cfg)
|
||||
|
||||
metrics = eval_policy(
|
||||
env=envs["libero_spatial"][0],
|
||||
policy=policy,
|
||||
n_episodes=10,
|
||||
)
|
||||
print(metrics["pc_success"])
|
||||
```
|
||||
@@ -1,89 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
"""Extract natural-language task descriptions for a benchmark suite.
|
||||
|
||||
Runs inside the benchmark Docker container (where the env library is installed)
|
||||
immediately after lerobot-eval, writing a JSON file that parse_eval_metrics.py
|
||||
picks up and embeds in metrics.json.
|
||||
|
||||
Output format: {"<suite>_<task_idx>": "<nl instruction>", ...}
|
||||
|
||||
Usage:
|
||||
python scripts/ci/extract_task_descriptions.py \\
|
||||
--env libero --task libero_spatial \\
|
||||
--output /tmp/eval-artifacts/task_descriptions.json
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
def _libero_descriptions(task_suite: str) -> dict[str, str]:
|
||||
from libero.libero import benchmark # type: ignore[import-untyped]
|
||||
|
||||
suite_dict = benchmark.get_benchmark_dict()
|
||||
if task_suite not in suite_dict:
|
||||
print(
|
||||
f"[extract_task_descriptions] Unknown LIBERO suite '{task_suite}'. "
|
||||
f"Available: {list(suite_dict.keys())}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
return {}
|
||||
suite = suite_dict[task_suite]()
|
||||
return {f"{task_suite}_{i}": suite.get_task(i).language for i in range(suite.n_tasks)}
|
||||
|
||||
|
||||
def _metaworld_descriptions(task_name: str) -> dict[str, str]:
|
||||
# MetaWorld tasks don't expose a separate NL description attribute;
|
||||
# use a cleaned version of the task name as the description.
|
||||
label = task_name.removeprefix("metaworld-").replace("-", " ").strip()
|
||||
return {f"{task_name}_0": label}
|
||||
|
||||
|
||||
def main() -> int:
|
||||
parser = argparse.ArgumentParser(description=__doc__)
|
||||
parser.add_argument("--env", required=True, help="Environment family (libero, metaworld, ...)")
|
||||
parser.add_argument("--task", required=True, help="Task/suite name (e.g. libero_spatial)")
|
||||
parser.add_argument("--output", required=True, help="Path to write task_descriptions.json")
|
||||
args = parser.parse_args()
|
||||
|
||||
descriptions: dict[str, str] = {}
|
||||
try:
|
||||
if args.env == "libero":
|
||||
descriptions = _libero_descriptions(args.task)
|
||||
elif args.env == "metaworld":
|
||||
descriptions = _metaworld_descriptions(args.task)
|
||||
else:
|
||||
print(
|
||||
f"[extract_task_descriptions] No description extractor for env '{args.env}'.",
|
||||
file=sys.stderr,
|
||||
)
|
||||
except Exception as exc:
|
||||
print(f"[extract_task_descriptions] Warning: {exc}", file=sys.stderr)
|
||||
|
||||
out_path = Path(args.output)
|
||||
out_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
out_path.write_text(json.dumps(descriptions, indent=2))
|
||||
print(f"[extract_task_descriptions] {len(descriptions)} descriptions → {out_path}")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
@@ -1,129 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
"""Parse lerobot-eval output into a small metrics.json artifact.
|
||||
|
||||
Reads eval_info.json written by lerobot-eval --output_dir and extracts the
|
||||
key metrics needed by the health dashboard. Handles both single-task and
|
||||
multi-task eval output formats.
|
||||
|
||||
Usage:
|
||||
python scripts/ci/parse_eval_metrics.py \\
|
||||
--artifacts-dir /tmp/libero-artifacts \\
|
||||
--env libero \\
|
||||
--task libero_spatial \\
|
||||
--policy pepijn223/smolvla_libero
|
||||
|
||||
Writes <artifacts-dir>/metrics.json. The CI workflow then uploads this file
|
||||
as a GitHub Actions artifact named "<env>-metrics".
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import math
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
def _extract_metrics(info: dict) -> tuple[float | None, int | None, float | None, float | None]:
|
||||
"""Extract (pc_success, n_episodes, avg_sum_reward, eval_s) from eval_info.json.
|
||||
|
||||
Handles two output shapes:
|
||||
- Single-task: {"aggregated": {"pc_success": 80.0, ...}}
|
||||
- Multi-task: {"overall": {"pc_success": 80.0, "n_episodes": 5, ...}}
|
||||
"""
|
||||
for key in ("aggregated", "overall"):
|
||||
if key not in info:
|
||||
continue
|
||||
agg = info[key]
|
||||
pc = agg.get("pc_success")
|
||||
n = agg.get("n_episodes")
|
||||
reward = agg.get("avg_sum_reward")
|
||||
eval_s = agg.get("eval_s")
|
||||
if pc is not None and not math.isnan(pc):
|
||||
return (
|
||||
float(pc),
|
||||
int(n) if n is not None else None,
|
||||
float(reward) if reward is not None else None,
|
||||
float(eval_s) if eval_s is not None else None,
|
||||
)
|
||||
|
||||
return None, None, None, None
|
||||
|
||||
|
||||
def main() -> int:
|
||||
parser = argparse.ArgumentParser(
|
||||
description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
|
||||
)
|
||||
parser.add_argument("--artifacts-dir", required=True, help="Path to the mounted artifacts volume")
|
||||
parser.add_argument("--env", required=True, help="Environment name (e.g. libero)")
|
||||
parser.add_argument("--task", required=True, help="Task name (e.g. libero_spatial)")
|
||||
parser.add_argument("--policy", required=True, help="Policy hub path (e.g. pepijn223/smolvla_libero)")
|
||||
args = parser.parse_args()
|
||||
|
||||
artifacts_dir = Path(args.artifacts_dir)
|
||||
eval_info_path = artifacts_dir / "eval_info.json"
|
||||
|
||||
pc_success: float | None = None
|
||||
n_episodes: int | None = None
|
||||
avg_sum_reward: float | None = None
|
||||
eval_s: float | None = None
|
||||
|
||||
if eval_info_path.exists():
|
||||
try:
|
||||
info = json.loads(eval_info_path.read_text())
|
||||
pc_success, n_episodes, avg_sum_reward, eval_s = _extract_metrics(info)
|
||||
except (json.JSONDecodeError, KeyError, TypeError) as exc:
|
||||
print(f"[parse_eval_metrics] Warning: could not parse eval_info.json: {exc}", file=sys.stderr)
|
||||
else:
|
||||
print(
|
||||
f"[parse_eval_metrics] Warning: {eval_info_path} not found — eval may have failed.",
|
||||
file=sys.stderr,
|
||||
)
|
||||
|
||||
task_descriptions: dict[str, str] = {}
|
||||
task_desc_path = artifacts_dir / "task_descriptions.json"
|
||||
if task_desc_path.exists():
|
||||
try:
|
||||
task_descriptions = json.loads(task_desc_path.read_text())
|
||||
except json.JSONDecodeError as exc:
|
||||
print(
|
||||
f"[parse_eval_metrics] Warning: could not parse task_descriptions.json: {exc}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
|
||||
metrics = {
|
||||
"env": args.env,
|
||||
"task": args.task,
|
||||
"policy": args.policy,
|
||||
"pc_success": pc_success,
|
||||
"n_episodes": n_episodes,
|
||||
"avg_sum_reward": avg_sum_reward,
|
||||
"eval_s": eval_s,
|
||||
"task_descriptions": task_descriptions,
|
||||
}
|
||||
|
||||
out_path = artifacts_dir / "metrics.json"
|
||||
out_path.write_text(json.dumps(metrics, indent=2))
|
||||
print(f"[parse_eval_metrics] Written: {out_path}")
|
||||
print(json.dumps(metrics, indent=2))
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
@@ -65,27 +65,20 @@ class WandBConfig:
|
||||
class EvalConfig:
|
||||
n_episodes: int = 50
|
||||
# `batch_size` specifies the number of environments to use in a gym.vector.VectorEnv.
|
||||
# Set to 0 for auto-tuning based on available CPU cores and n_episodes.
|
||||
batch_size: int = 0
|
||||
batch_size: int = 50
|
||||
# `use_async_envs` specifies whether to use asynchronous environments (multiprocessing).
|
||||
# Defaults to True; automatically downgraded to SyncVectorEnv when batch_size=1.
|
||||
use_async_envs: bool = True
|
||||
use_async_envs: bool = False
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
if self.batch_size == 0:
|
||||
self.batch_size = self._auto_batch_size()
|
||||
if self.batch_size > self.n_episodes:
|
||||
self.batch_size = self.n_episodes
|
||||
|
||||
def _auto_batch_size(self) -> int:
|
||||
"""Pick batch_size based on CPU cores, capped by n_episodes."""
|
||||
import math
|
||||
import os
|
||||
|
||||
cpu_cores = os.cpu_count() or 4
|
||||
# Each async env worker needs ~1 core; leave headroom for main process + inference.
|
||||
by_cpu = max(1, math.floor(cpu_cores * 0.7))
|
||||
return min(by_cpu, self.n_episodes, 64)
|
||||
raise ValueError(
|
||||
"The eval batch size is greater than the number of eval episodes "
|
||||
f"({self.batch_size} > {self.n_episodes}). As a result, {self.batch_size} "
|
||||
f"eval environments will be instantiated, but only {self.n_episodes} will be used. "
|
||||
"This might significantly slow down evaluation. To fix this, you should update your command "
|
||||
f"to increase the number of episodes to match the batch size (e.g. `eval.n_episodes={self.batch_size}`), "
|
||||
f"or lower the batch size (e.g. `eval.batch_size={self.n_episodes}`)."
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
|
||||
@@ -44,13 +44,6 @@ from lerobot.utils.constants import (
|
||||
)
|
||||
|
||||
|
||||
def _make_vec_env_cls(use_async: bool, n_envs: int):
|
||||
"""Return the right VectorEnv constructor."""
|
||||
if use_async and n_envs > 1:
|
||||
return gym.vector.AsyncVectorEnv
|
||||
return gym.vector.SyncVectorEnv
|
||||
|
||||
|
||||
@dataclass
|
||||
class EnvConfig(draccus.ChoiceRegistry, abc.ABC):
|
||||
task: str | None = None
|
||||
@@ -82,14 +75,13 @@ class EnvConfig(draccus.ChoiceRegistry, abc.ABC):
|
||||
def create_envs(
|
||||
self,
|
||||
n_envs: int,
|
||||
use_async_envs: bool = True,
|
||||
use_async_envs: bool = False,
|
||||
) -> dict[str, dict[int, gym.vector.VectorEnv]]:
|
||||
"""Create {suite: {task_id: VectorEnv}}.
|
||||
|
||||
Default: single-task env via gym.make(). Multi-task benchmarks override.
|
||||
AsyncVectorEnv is the default for n_envs > 1; auto-downgraded to Sync for n_envs=1.
|
||||
"""
|
||||
env_cls = gym.vector.AsyncVectorEnv if (use_async_envs and n_envs > 1) else gym.vector.SyncVectorEnv
|
||||
env_cls = gym.vector.AsyncVectorEnv if use_async_envs else gym.vector.SyncVectorEnv
|
||||
|
||||
if self.gym_id not in gym_registry:
|
||||
print(f"gym id '{self.gym_id}' not found, attempting to import '{self.package_name}'...")
|
||||
@@ -402,22 +394,17 @@ class LiberoEnv(EnvConfig):
|
||||
|
||||
@property
|
||||
def gym_kwargs(self) -> dict:
|
||||
kwargs: dict[str, Any] = {
|
||||
"obs_type": self.obs_type,
|
||||
"render_mode": self.render_mode,
|
||||
"observation_height": self.observation_height,
|
||||
"observation_width": self.observation_width,
|
||||
}
|
||||
kwargs: dict[str, Any] = {"obs_type": self.obs_type, "render_mode": self.render_mode}
|
||||
if self.task_ids is not None:
|
||||
kwargs["task_ids"] = self.task_ids
|
||||
return kwargs
|
||||
|
||||
def create_envs(self, n_envs: int, use_async_envs: bool = True):
|
||||
def create_envs(self, n_envs: int, use_async_envs: bool = False):
|
||||
from lerobot.envs.libero import create_libero_envs
|
||||
|
||||
if self.task is None:
|
||||
raise ValueError("LiberoEnv requires a task to be specified")
|
||||
env_cls = _make_vec_env_cls(use_async_envs, n_envs)
|
||||
env_cls = gym.vector.AsyncVectorEnv if use_async_envs else gym.vector.SyncVectorEnv
|
||||
return create_libero_envs(
|
||||
task=self.task,
|
||||
n_envs=n_envs,
|
||||
@@ -481,12 +468,12 @@ class MetaworldEnv(EnvConfig):
|
||||
"render_mode": self.render_mode,
|
||||
}
|
||||
|
||||
def create_envs(self, n_envs: int, use_async_envs: bool = True):
|
||||
def create_envs(self, n_envs: int, use_async_envs: bool = False):
|
||||
from lerobot.envs.metaworld import create_metaworld_envs
|
||||
|
||||
if self.task is None:
|
||||
raise ValueError("MetaWorld requires a task to be specified")
|
||||
env_cls = _make_vec_env_cls(use_async_envs, n_envs)
|
||||
env_cls = gym.vector.AsyncVectorEnv if use_async_envs else gym.vector.SyncVectorEnv
|
||||
return create_metaworld_envs(
|
||||
task=self.task,
|
||||
n_envs=n_envs,
|
||||
|
||||
@@ -58,7 +58,7 @@ def make_env_pre_post_processors(
|
||||
def make_env(
|
||||
cfg: EnvConfig | str,
|
||||
n_envs: int = 1,
|
||||
use_async_envs: bool = True,
|
||||
use_async_envs: bool = False,
|
||||
hub_cache_dir: str | None = None,
|
||||
trust_remote_code: bool = False,
|
||||
) -> dict[str, dict[int, gym.vector.VectorEnv]]:
|
||||
|
||||
+19
-51
@@ -29,7 +29,6 @@ from gymnasium import spaces
|
||||
from libero.libero import benchmark, get_libero_path
|
||||
from libero.libero.envs import OffScreenRenderEnv
|
||||
|
||||
from lerobot.envs.utils import _LazyAsyncVectorEnv
|
||||
from lerobot.types import RobotObservation
|
||||
|
||||
|
||||
@@ -151,17 +150,7 @@ class LiberoEnv(gym.Env):
|
||||
|
||||
self.init_state_id = self.episode_index # tie each sub-env to a fixed init state
|
||||
|
||||
# Extract task metadata without allocating GPU resources (safe before fork).
|
||||
task = task_suite.get_task(task_id)
|
||||
self.task = task.name
|
||||
self.task_description = task.language
|
||||
self._task_bddl_file = os.path.join(
|
||||
get_libero_path("bddl_files"), task.problem_folder, task.bddl_file
|
||||
)
|
||||
self._env: OffScreenRenderEnv | None = (
|
||||
None # deferred — created on first reset() inside the worker subprocess
|
||||
)
|
||||
|
||||
self._env = self._make_envs_task(task_suite, self.task_id)
|
||||
default_steps = 500
|
||||
self._max_episode_steps = (
|
||||
TASK_SUITE_MAX_STEPS.get(task_suite_name, default_steps)
|
||||
@@ -232,33 +221,29 @@ class LiberoEnv(gym.Env):
|
||||
low=ACTION_LOW, high=ACTION_HIGH, shape=(ACTION_DIM,), dtype=np.float32
|
||||
)
|
||||
|
||||
def _ensure_env(self) -> None:
|
||||
"""Create the underlying OffScreenRenderEnv on first use.
|
||||
|
||||
Called inside the worker subprocess after fork(), so each worker gets
|
||||
its own clean EGL context rather than inheriting a stale one from the
|
||||
parent process (which causes EGL_BAD_CONTEXT crashes with AsyncVectorEnv).
|
||||
"""
|
||||
if self._env is not None:
|
||||
return
|
||||
env = OffScreenRenderEnv(
|
||||
bddl_file_name=self._task_bddl_file,
|
||||
camera_heights=self.observation_height,
|
||||
camera_widths=self.observation_width,
|
||||
)
|
||||
env.reset()
|
||||
self._env = env
|
||||
|
||||
def render(self):
|
||||
self._ensure_env()
|
||||
raw_obs = self._env.env._get_observations()
|
||||
pixels = self._format_raw_obs(raw_obs)["pixels"]
|
||||
image = next(iter(pixels.values()))
|
||||
image = image[::-1, ::-1] # flip both H and W for visualization
|
||||
return image
|
||||
|
||||
def _make_envs_task(self, task_suite: Any, task_id: int = 0):
|
||||
task = task_suite.get_task(task_id)
|
||||
self.task = task.name
|
||||
self.task_description = task.language
|
||||
task_bddl_file = os.path.join(get_libero_path("bddl_files"), task.problem_folder, task.bddl_file)
|
||||
|
||||
env_args = {
|
||||
"bddl_file_name": task_bddl_file,
|
||||
"camera_heights": self.observation_height,
|
||||
"camera_widths": self.observation_width,
|
||||
}
|
||||
env = OffScreenRenderEnv(**env_args)
|
||||
env.reset()
|
||||
return env
|
||||
|
||||
def _format_raw_obs(self, raw_obs: RobotObservation) -> RobotObservation:
|
||||
assert self._env is not None, "_format_raw_obs called before _ensure_env()"
|
||||
images = {}
|
||||
for camera_name in self.camera_name:
|
||||
image = raw_obs[camera_name]
|
||||
@@ -310,7 +295,6 @@ class LiberoEnv(gym.Env):
|
||||
)
|
||||
|
||||
def reset(self, seed=None, **kwargs):
|
||||
self._ensure_env()
|
||||
super().reset(seed=seed)
|
||||
self._env.seed(seed)
|
||||
raw_obs = self._env.reset()
|
||||
@@ -337,8 +321,6 @@ class LiberoEnv(gym.Env):
|
||||
return observation, info
|
||||
|
||||
def step(self, action: np.ndarray) -> tuple[RobotObservation, float, bool, bool, dict[str, Any]]:
|
||||
self._ensure_env()
|
||||
assert self._env is not None
|
||||
if action.ndim != 1:
|
||||
raise ValueError(
|
||||
f"Expected action to be 1-D (shape (action_dim,)), "
|
||||
@@ -363,8 +345,7 @@ class LiberoEnv(gym.Env):
|
||||
return observation, reward, terminated, truncated, info
|
||||
|
||||
def close(self):
|
||||
if self._env is not None:
|
||||
self._env.close()
|
||||
self._env.close()
|
||||
|
||||
|
||||
def _make_env_fns(
|
||||
@@ -447,8 +428,6 @@ def create_libero_envs(
|
||||
if task_ids_filter is not None:
|
||||
print(f"Restricting to task_ids={task_ids_filter}")
|
||||
|
||||
is_async = env_cls is gym.vector.AsyncVectorEnv
|
||||
|
||||
out: dict[str, dict[int, Any]] = defaultdict(dict)
|
||||
for suite_name in suite_names:
|
||||
suite = _get_suite(suite_name)
|
||||
@@ -457,11 +436,6 @@ def create_libero_envs(
|
||||
if not selected:
|
||||
raise ValueError(f"No tasks selected for suite '{suite_name}' (available: {total}).")
|
||||
|
||||
# All tasks in a suite share identical observation/action spaces.
|
||||
# Probe once and reuse to avoid creating a temp env per task.
|
||||
cached_obs_space: spaces.Space | None = None
|
||||
cached_act_space: spaces.Space | None = None
|
||||
|
||||
for tid in selected:
|
||||
fns = _make_env_fns(
|
||||
suite=suite,
|
||||
@@ -475,14 +449,8 @@ def create_libero_envs(
|
||||
control_mode=control_mode,
|
||||
camera_name_mapping=camera_name_mapping,
|
||||
)
|
||||
if is_async:
|
||||
lazy = _LazyAsyncVectorEnv(fns, cached_obs_space, cached_act_space)
|
||||
if cached_obs_space is None:
|
||||
cached_obs_space = lazy.observation_space
|
||||
cached_act_space = lazy.action_space
|
||||
out[suite_name][tid] = lazy
|
||||
else:
|
||||
out[suite_name][tid] = env_cls(fns)
|
||||
out[suite_name][tid] = env_cls(fns)
|
||||
print(f"Built vec env | suite={suite_name} | task_id={tid} | n_envs={n_envs}")
|
||||
|
||||
# return plain dicts for predictability
|
||||
return {suite: dict(task_map) for suite, task_map in out.items()}
|
||||
|
||||
@@ -25,7 +25,6 @@ import metaworld.policies as policies
|
||||
import numpy as np
|
||||
from gymnasium import spaces
|
||||
|
||||
from lerobot.envs.utils import _LazyAsyncVectorEnv
|
||||
from lerobot.types import RobotObservation
|
||||
|
||||
# ---- Load configuration data from the external JSON file ----
|
||||
@@ -98,9 +97,8 @@ class MetaworldEnv(gym.Env):
|
||||
self.visualization_height = visualization_height
|
||||
self.camera_name = camera_name
|
||||
|
||||
self._env_name = self.task # already stripped of "metaworld-" prefix above
|
||||
self._env = None # deferred — created on first reset() inside the worker subprocess
|
||||
self._max_episode_steps = 500 # MT1 environments always have max_path_length=500
|
||||
self._env = self._make_envs_task(self.task)
|
||||
self._max_episode_steps = self._env.max_path_length
|
||||
self.task_description = TASK_DESCRIPTIONS[self.task]
|
||||
|
||||
self.expert_policy = TASK_POLICY_MAPPING[self.task]()
|
||||
@@ -138,24 +136,6 @@ class MetaworldEnv(gym.Env):
|
||||
|
||||
self.action_space = spaces.Box(low=-1, high=1, shape=(ACTION_DIM,), dtype=np.float32)
|
||||
|
||||
def _ensure_env(self) -> None:
|
||||
"""Create the underlying MetaWorld env on first use.
|
||||
|
||||
Called inside the worker subprocess after fork(), so each worker gets
|
||||
its own clean rendering context rather than inheriting a stale one from
|
||||
the parent process (which causes crashes with AsyncVectorEnv).
|
||||
"""
|
||||
if self._env is not None:
|
||||
return
|
||||
mt1 = metaworld.MT1(self._env_name, seed=42)
|
||||
env = mt1.train_classes[self._env_name](render_mode="rgb_array", camera_name=self.camera_name)
|
||||
env.set_task(mt1.train_tasks[0])
|
||||
if self.camera_name == "corner2":
|
||||
env.model.cam_pos[2] = [0.75, 0.075, 0.7]
|
||||
env.reset()
|
||||
env._freeze_rand_vec = False # otherwise no randomization
|
||||
self._env = env
|
||||
|
||||
def render(self) -> np.ndarray:
|
||||
"""
|
||||
Render the current environment frame.
|
||||
@@ -163,13 +143,26 @@ class MetaworldEnv(gym.Env):
|
||||
Returns:
|
||||
np.ndarray: The rendered RGB image from the environment.
|
||||
"""
|
||||
self._ensure_env()
|
||||
image = self._env.render()
|
||||
if self.camera_name == "corner2":
|
||||
# Images from this camera are flipped — correct them
|
||||
image = np.flip(image, (0, 1))
|
||||
return image
|
||||
|
||||
def _make_envs_task(self, env_name: str):
|
||||
mt1 = metaworld.MT1(env_name, seed=42)
|
||||
env = mt1.train_classes[env_name](render_mode="rgb_array", camera_name=self.camera_name)
|
||||
env.set_task(mt1.train_tasks[0])
|
||||
if self.camera_name == "corner2":
|
||||
env.model.cam_pos[2] = [
|
||||
0.75,
|
||||
0.075,
|
||||
0.7,
|
||||
] # corner2 position, similar to https://arxiv.org/pdf/2206.14244
|
||||
env.reset()
|
||||
env._freeze_rand_vec = False # otherwise no randomization
|
||||
return env
|
||||
|
||||
def _format_raw_obs(self, raw_obs: np.ndarray) -> RobotObservation:
|
||||
image = None
|
||||
if self._env is not None:
|
||||
@@ -216,7 +209,6 @@ class MetaworldEnv(gym.Env):
|
||||
observation (RobotObservation): The initial formatted observation.
|
||||
info (Dict[str, Any]): Additional info about the reset state.
|
||||
"""
|
||||
self._ensure_env()
|
||||
super().reset(seed=seed)
|
||||
|
||||
raw_obs, info = self._env.reset(seed=seed)
|
||||
@@ -240,7 +232,6 @@ class MetaworldEnv(gym.Env):
|
||||
truncated (bool): Whether the episode was truncated due to a time limit.
|
||||
info (Dict[str, Any]): Additional environment info.
|
||||
"""
|
||||
self._ensure_env()
|
||||
if action.ndim != 1:
|
||||
raise ValueError(
|
||||
f"Expected action to be 1-D (shape (action_dim,)), "
|
||||
@@ -272,8 +263,7 @@ class MetaworldEnv(gym.Env):
|
||||
return observation, reward, terminated, truncated, info
|
||||
|
||||
def close(self):
|
||||
if self._env is not None:
|
||||
self._env.close()
|
||||
self._env.close()
|
||||
|
||||
|
||||
# ---- Main API ----------------------------------------------------------------
|
||||
@@ -307,9 +297,6 @@ def create_metaworld_envs(
|
||||
|
||||
print(f"Creating Meta-World envs | task_groups={task_groups} | n_envs(per task)={n_envs}")
|
||||
|
||||
is_async = env_cls is gym.vector.AsyncVectorEnv
|
||||
cached_obs_space = None
|
||||
cached_act_space = None
|
||||
out: dict[str, dict[int, Any]] = defaultdict(dict)
|
||||
|
||||
for group in task_groups:
|
||||
@@ -322,14 +309,7 @@ def create_metaworld_envs(
|
||||
# build n_envs factories
|
||||
fns = [(lambda tn=task_name: MetaworldEnv(task=tn, **gym_kwargs)) for _ in range(n_envs)]
|
||||
|
||||
if is_async:
|
||||
lazy = _LazyAsyncVectorEnv(fns, cached_obs_space, cached_act_space)
|
||||
if cached_obs_space is None:
|
||||
cached_obs_space = lazy.observation_space
|
||||
cached_act_space = lazy.action_space
|
||||
out[group][tid] = lazy
|
||||
else:
|
||||
out[group][tid] = env_cls(fns)
|
||||
out[group][tid] = env_cls(fns)
|
||||
|
||||
# return a plain dict for consistency
|
||||
return {group: dict(task_map) for group, task_map in out.items()}
|
||||
|
||||
+27
-70
@@ -16,7 +16,7 @@
|
||||
import importlib.util
|
||||
import os
|
||||
import warnings
|
||||
from collections.abc import Callable, Mapping, Sequence
|
||||
from collections.abc import Mapping, Sequence
|
||||
from functools import singledispatch
|
||||
from typing import Any
|
||||
|
||||
@@ -130,99 +130,56 @@ def env_to_policy_features(env_cfg: EnvConfig) -> dict[str, PolicyFeature]:
|
||||
return policy_features
|
||||
|
||||
|
||||
def _sub_env_has_attr(env: gym.vector.VectorEnv, attr: str) -> bool:
|
||||
try:
|
||||
env.get_attr(attr)
|
||||
return True
|
||||
except (AttributeError, Exception):
|
||||
return False
|
||||
|
||||
|
||||
class _LazyAsyncVectorEnv:
|
||||
"""Defers AsyncVectorEnv creation until first use.
|
||||
|
||||
Creating all tasks' AsyncVectorEnvs upfront spawns N_tasks × n_envs worker
|
||||
processes, all of which allocate EGL/GPU resources immediately. Since tasks
|
||||
are evaluated sequentially, only one task's workers need to be alive at a
|
||||
time. This wrapper stores the factory functions and creates the real
|
||||
AsyncVectorEnv on first reset()/step()/call(), keeping peak process count = n_envs.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
env_fns: list[Callable],
|
||||
observation_space=None,
|
||||
action_space=None,
|
||||
):
|
||||
self._env_fns = env_fns
|
||||
self._env: gym.vector.AsyncVectorEnv | None = None
|
||||
self.num_envs = len(env_fns)
|
||||
if observation_space is not None and action_space is not None:
|
||||
self.observation_space = observation_space
|
||||
self.action_space = action_space
|
||||
else:
|
||||
tmp = env_fns[0]()
|
||||
self.observation_space = tmp.observation_space
|
||||
self.action_space = tmp.action_space
|
||||
tmp.close()
|
||||
self.single_observation_space = self.observation_space
|
||||
self.single_action_space = self.action_space
|
||||
|
||||
def _ensure(self) -> None:
|
||||
if self._env is None:
|
||||
self._env = gym.vector.AsyncVectorEnv(self._env_fns, context="forkserver", shared_memory=True)
|
||||
|
||||
def reset(self, **kwargs):
|
||||
self._ensure()
|
||||
return self._env.reset(**kwargs)
|
||||
|
||||
def step(self, actions):
|
||||
self._ensure()
|
||||
return self._env.step(actions)
|
||||
|
||||
def call(self, name, *args, **kwargs):
|
||||
self._ensure()
|
||||
return self._env.call(name, *args, **kwargs)
|
||||
|
||||
def get_attr(self, name):
|
||||
self._ensure()
|
||||
return self._env.get_attr(name)
|
||||
|
||||
def close(self) -> None:
|
||||
if self._env is not None:
|
||||
self._env.close()
|
||||
self._env = None
|
||||
def are_all_envs_same_type(env: gym.vector.VectorEnv) -> bool:
|
||||
first_type = type(env.envs[0]) # Get type of first env
|
||||
return all(type(e) is first_type for e in env.envs) # Fast type check
|
||||
|
||||
|
||||
def check_env_attributes_and_types(env: gym.vector.VectorEnv) -> None:
|
||||
with warnings.catch_warnings():
|
||||
warnings.simplefilter("once", UserWarning)
|
||||
warnings.simplefilter("once", UserWarning) # Apply filter only in this function
|
||||
|
||||
if not (_sub_env_has_attr(env, "task_description") and _sub_env_has_attr(env, "task")):
|
||||
if not (hasattr(env.envs[0], "task_description") and hasattr(env.envs[0], "task")):
|
||||
warnings.warn(
|
||||
"The environment does not have 'task_description' and 'task'. Some policies require these features.",
|
||||
UserWarning,
|
||||
stacklevel=2,
|
||||
)
|
||||
if not are_all_envs_same_type(env):
|
||||
warnings.warn(
|
||||
"The environments have different types. Make sure you infer the right task from each environment. Empty task will be passed instead.",
|
||||
UserWarning,
|
||||
stacklevel=2,
|
||||
)
|
||||
|
||||
|
||||
def add_envs_task(env: gym.vector.VectorEnv, observation: RobotObservation) -> RobotObservation:
|
||||
"""Adds task feature to the observation dict with respect to the first environment attribute."""
|
||||
if _sub_env_has_attr(env, "task_description"):
|
||||
task_result = list(env.call("task_description"))
|
||||
if hasattr(env.envs[0], "task_description"):
|
||||
task_result = env.call("task_description")
|
||||
|
||||
if isinstance(task_result, tuple):
|
||||
task_result = list(task_result)
|
||||
|
||||
if not isinstance(task_result, list):
|
||||
raise TypeError(f"Expected task_description to return a list, got {type(task_result)}")
|
||||
if not all(isinstance(item, str) for item in task_result):
|
||||
raise TypeError("All items in task_description result must be strings")
|
||||
|
||||
observation["task"] = task_result
|
||||
elif _sub_env_has_attr(env, "task"):
|
||||
task_result = list(env.call("task"))
|
||||
elif hasattr(env.envs[0], "task"):
|
||||
task_result = env.call("task")
|
||||
|
||||
if isinstance(task_result, tuple):
|
||||
task_result = list(task_result)
|
||||
|
||||
if not isinstance(task_result, list):
|
||||
raise TypeError(f"Expected task to return a list, got {type(task_result)}")
|
||||
if not all(isinstance(item, str) for item in task_result):
|
||||
raise TypeError("All items in task result must be strings")
|
||||
|
||||
observation["task"] = task_result
|
||||
else:
|
||||
else: # For envs without language instructions, e.g. aloha transfer cube and etc.
|
||||
num_envs = observation[list(observation.keys())[0]].shape[0]
|
||||
observation["task"] = ["" for _ in range(num_envs)]
|
||||
return observation
|
||||
|
||||
@@ -136,8 +136,8 @@ class TokenizerProcessorStep(ObservationProcessorStep):
|
||||
# Standardize to a list of strings for the tokenizer
|
||||
if isinstance(task, str):
|
||||
return [task]
|
||||
elif isinstance(task, (list, tuple)) and all(isinstance(t, str) for t in task):
|
||||
return list(task)
|
||||
elif isinstance(task, list) and all(isinstance(t, str) for t in task):
|
||||
return task
|
||||
|
||||
return None
|
||||
|
||||
|
||||
@@ -73,6 +73,7 @@ from lerobot.configs import parser
|
||||
from lerobot.configs.eval import EvalPipelineConfig
|
||||
from lerobot.envs.factory import make_env, make_env_pre_post_processors
|
||||
from lerobot.envs.utils import (
|
||||
add_envs_task,
|
||||
check_env_attributes_and_types,
|
||||
close_envs,
|
||||
preprocess_observation,
|
||||
@@ -165,15 +166,9 @@ def rollout(
|
||||
if return_observations:
|
||||
all_observations.append(deepcopy(observation))
|
||||
|
||||
# Infer "task" from sub-environments (prefer natural language description).
|
||||
# env.call() works with both SyncVectorEnv and AsyncVectorEnv.
|
||||
try:
|
||||
observation["task"] = list(env.call("task_description"))
|
||||
except Exception:
|
||||
try:
|
||||
observation["task"] = list(env.call("task"))
|
||||
except Exception:
|
||||
observation["task"] = [""] * env.num_envs
|
||||
# Infer "task" from attributes of environments.
|
||||
# TODO: works with SyncVectorEnv but not AsyncVectorEnv
|
||||
observation = add_envs_task(env, observation)
|
||||
|
||||
# Apply environment-specific preprocessing (e.g., LiberoProcessorStep for LIBERO)
|
||||
observation = env_preprocessor(observation)
|
||||
@@ -323,9 +318,8 @@ def eval_policy(
|
||||
n_to_render_now = min(max_episodes_rendered - n_episodes_rendered, env.num_envs)
|
||||
if isinstance(env, gym.vector.SyncVectorEnv):
|
||||
ep_frames.append(np.stack([env.envs[i].render() for i in range(n_to_render_now)])) # noqa: B023
|
||||
elif hasattr(env, "call"):
|
||||
elif isinstance(env, gym.vector.AsyncVectorEnv):
|
||||
# Here we must render all frames and discard any we don't need.
|
||||
# Covers AsyncVectorEnv and _LazyAsyncVectorEnv (which wraps one).
|
||||
ep_frames.append(np.stack(env.call("render")[:n_to_render_now]))
|
||||
|
||||
if max_episodes_rendered > 0:
|
||||
@@ -527,7 +521,7 @@ def eval_main(cfg: EvalPipelineConfig):
|
||||
|
||||
logging.info(colored("Output dir:", "yellow", attrs=["bold"]) + f" {cfg.output_dir}")
|
||||
|
||||
logging.info(f"Making environment (batch_size={cfg.eval.batch_size}, async={cfg.eval.use_async_envs}).")
|
||||
logging.info("Making environment.")
|
||||
envs = make_env(
|
||||
cfg.env,
|
||||
n_envs=cfg.eval.batch_size,
|
||||
@@ -761,39 +755,23 @@ def eval_policy_all(
|
||||
)
|
||||
|
||||
if max_parallel_tasks <= 1:
|
||||
prefetch_thread: threading.Thread | None = None
|
||||
for i, (task_group, task_id, env) in enumerate(tasks):
|
||||
if prefetch_thread is not None:
|
||||
prefetch_thread.join()
|
||||
prefetch_thread = None
|
||||
|
||||
try:
|
||||
tg, tid, metrics = task_runner(task_group, task_id, env)
|
||||
_accumulate_to(tg, metrics)
|
||||
per_task_infos.append({"task_group": tg, "task_id": tid, "metrics": metrics})
|
||||
finally:
|
||||
env.close()
|
||||
# Prefetch next task's workers *after* closing current env to prevent
|
||||
# GPU memory overlap between consecutive tasks.
|
||||
if i + 1 < len(tasks):
|
||||
next_env = tasks[i + 1][2]
|
||||
if hasattr(next_env, "_ensure"):
|
||||
prefetch_thread = threading.Thread(target=next_env._ensure, daemon=True)
|
||||
prefetch_thread.start()
|
||||
# sequential path (single accumulator path on the main thread)
|
||||
# NOTE: keeping a single-threaded accumulator avoids concurrent list appends or locks
|
||||
for task_group, task_id, env in tasks:
|
||||
tg, tid, metrics = task_runner(task_group, task_id, env)
|
||||
_accumulate_to(tg, metrics)
|
||||
per_task_infos.append({"task_group": tg, "task_id": tid, "metrics": metrics})
|
||||
else:
|
||||
# threaded path: submit all tasks, consume completions on main thread and accumulate there
|
||||
with cf.ThreadPoolExecutor(max_workers=max_parallel_tasks) as executor:
|
||||
fut2meta = {}
|
||||
for task_group, task_id, env in tasks:
|
||||
fut = executor.submit(task_runner, task_group, task_id, env)
|
||||
fut2meta[fut] = (task_group, task_id, env)
|
||||
fut2meta[fut] = (task_group, task_id)
|
||||
for fut in cf.as_completed(fut2meta):
|
||||
tg, tid, env = fut2meta[fut]
|
||||
try:
|
||||
tg, tid, metrics = fut.result()
|
||||
_accumulate_to(tg, metrics)
|
||||
per_task_infos.append({"task_group": tg, "task_id": tid, "metrics": metrics})
|
||||
finally:
|
||||
env.close()
|
||||
tg, tid, metrics = fut.result()
|
||||
_accumulate_to(tg, metrics)
|
||||
per_task_infos.append({"task_group": tg, "task_id": tid, "metrics": metrics})
|
||||
|
||||
# compute aggregated metrics helper (robust to lists/scalars)
|
||||
def _agg_from_list(xs):
|
||||
|
||||
@@ -90,7 +90,7 @@ def test_base_create_envs():
|
||||
envs = _Env().create_envs(n_envs=2)
|
||||
assert "_dispatch_base_test" in envs
|
||||
env = envs["_dispatch_base_test"][0]
|
||||
assert isinstance(env, gym.vector.VectorEnv)
|
||||
assert isinstance(env, gym.vector.SyncVectorEnv)
|
||||
assert env.num_envs == 2
|
||||
env.close()
|
||||
finally:
|
||||
|
||||
@@ -189,30 +189,6 @@ def test_list_of_strings_tokenization(mock_auto_tokenizer):
|
||||
assert attention_mask.shape == (2, 8)
|
||||
|
||||
|
||||
@require_package("transformers")
|
||||
@patch("lerobot.processor.tokenizer_processor.AutoTokenizer")
|
||||
def test_tuple_of_strings_tokenization(mock_auto_tokenizer):
|
||||
"""Test tokenization of a tuple of strings (returned by VectorEnv.call())."""
|
||||
mock_tokenizer = MockTokenizer(vocab_size=100)
|
||||
mock_auto_tokenizer.from_pretrained.return_value = mock_tokenizer
|
||||
|
||||
processor = TokenizerProcessorStep(tokenizer_name="test-tokenizer", max_length=8)
|
||||
|
||||
transition = create_transition(
|
||||
observation={"state": torch.tensor([1.0, 2.0])},
|
||||
action=torch.tensor([0.1, 0.2]),
|
||||
complementary_data={"task": ("pick up cube", "place on table")},
|
||||
)
|
||||
|
||||
result = processor(transition)
|
||||
|
||||
observation = result[TransitionKey.OBSERVATION]
|
||||
tokens = observation[f"{OBS_LANGUAGE}.tokens"]
|
||||
attention_mask = observation[f"{OBS_LANGUAGE}.attention_mask"]
|
||||
assert tokens.shape == (2, 8)
|
||||
assert attention_mask.shape == (2, 8)
|
||||
|
||||
|
||||
@require_package("transformers")
|
||||
@patch("lerobot.processor.tokenizer_processor.AutoTokenizer")
|
||||
def test_custom_keys(mock_auto_tokenizer):
|
||||
|
||||
Reference in New Issue
Block a user