mirror of
https://github.com/huggingface/lerobot.git
synced 2026-05-11 14:49:43 +00:00
Compare commits
8 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| a5b29d4301 | |||
| a4aa316470 | |||
| f6b16f6d97 | |||
| df0c335a5a | |||
| 87ed3a2b6e | |||
| d57d1aa197 | |||
| 3f8c5d9809 | |||
| d1548e1d13 |
@@ -83,11 +83,11 @@ jobs:
|
||||
fi
|
||||
|
||||
- name: Remove Tags with Git dependencies
|
||||
# TODO(Steven): Temporary patch to remove libero and pi from PyPi 0.4.0 release due to its reliance on git dependencies.
|
||||
# TODO(Steven): Temporary patch to remove pi from PyPi 0.4.0 release due to its reliance on git dependencies.
|
||||
run: |
|
||||
echo "::info:: Checking for Git dependencies to remove from pyproject.toml..."
|
||||
grep -E '@ git\+https|lerobot\[pi\]|lerobot\[libero\]' pyproject.toml | sed 's/^/::warning:: Removing line: /' || true
|
||||
sed -E -i '/@ git\+https|lerobot\[pi\]|lerobot\[libero\]/d' pyproject.toml
|
||||
grep -E '@ git\+https|lerobot\[pi\]' pyproject.toml | sed 's/^/::warning:: Removing line: /' || true
|
||||
sed -E -i '/@ git\+https|lerobot\[pi\]/d' pyproject.toml
|
||||
echo "::info:: Git dependencies removed. Proceeding with build."
|
||||
|
||||
- name: Install build dependencies
|
||||
|
||||
@@ -70,7 +70,7 @@ jobs:
|
||||
echo "Dependencies unbound:" && cat pyproject.toml
|
||||
|
||||
- name: Install lerobot with all extras
|
||||
run: uv sync --all-extras
|
||||
run: uv sync --all-extras --no-extra groot # TODO(Steven): Make flash-attn optional
|
||||
|
||||
- name: Run pytest (all extras)
|
||||
run: uv run pytest tests -vv
|
||||
|
||||
@@ -186,7 +186,7 @@ For a full list of optional dependencies, see:
|
||||
https://pypi.org/project/lerobot/
|
||||
|
||||
> [!NOTE]
|
||||
> For lerobot 0.4.0, if you want to install libero or pi tags, you will have to do: `pip install "lerobot[pi,libero]@git+https://github.com/huggingface/lerobot.git"`.
|
||||
> For lerobot 0.4.0, if you want to install pi tags, you will have to do: `pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"`.
|
||||
>
|
||||
> This will be solved in the next patch release
|
||||
|
||||
|
||||
@@ -41,6 +41,8 @@
|
||||
title: NVIDIA GR00T N1.5
|
||||
title: "Policies"
|
||||
- sections:
|
||||
- local: envhub
|
||||
title: Environments from the Hub
|
||||
- local: il_sim
|
||||
title: Imitation Learning in Sim
|
||||
- local: libero
|
||||
|
||||
@@ -0,0 +1,424 @@
|
||||
# Loading Environments from the Hub
|
||||
|
||||
The **EnvHub** feature allows you to load simulation environments directly from the Hugging Face Hub with a single line of code. This unlocks a powerful new model for collaboration: instead of environments being locked away inside monolithic libraries, anyone can publish custom environments and share them with the community.
|
||||
|
||||
## Overview
|
||||
|
||||
With EnvHub, you can:
|
||||
|
||||
- Load environments from the Hub instantly
|
||||
- Share your custom simulation tasks with the community
|
||||
- Version control your environments using Git
|
||||
- Distribute complex physics simulations without packaging hassles
|
||||
|
||||
## Quick Start
|
||||
|
||||
Loading an environment from the Hub is as simple as:
|
||||
|
||||
```python
|
||||
from lerobot.envs.factory import make_env
|
||||
|
||||
# Load a hub environment (requires explicit consent to run remote code)
|
||||
env = make_env("lerobot/cartpole-env", trust_remote_code=True)
|
||||
```
|
||||
|
||||
<Tip warning={true}>
|
||||
**Security Notice**: Loading environments from the Hub executes Python code
|
||||
from third-party repositories. Only use `trust_remote_code=True` with
|
||||
repositories you trust. We strongly recommend pinning to a specific commit
|
||||
hash for reproducibility and security.
|
||||
</Tip>
|
||||
|
||||
## What is EnvHub?
|
||||
|
||||
EnvHub is a framework that allows researchers and developers to:
|
||||
|
||||
1. **Publish environments** to the Hugging Face Hub as Git repositories
|
||||
2. **Load environments** dynamically without installing them as packages
|
||||
3. **Version and track** environment changes using Git semantics
|
||||
4. **Discover** new simulation tasks shared by the community
|
||||
|
||||
This design means you can go from discovering an interesting environment on the Hub to running experiments in seconds, without worrying about dependency conflicts or complex installation procedures.
|
||||
|
||||
## Repository Structure
|
||||
|
||||
To make your environment loadable from the Hub, your repository must contain at minimum:
|
||||
|
||||
### Required Files
|
||||
|
||||
**`env.py`** (or custom Python file)
|
||||
|
||||
- Must expose a `make_env(n_envs: int, use_async_envs: bool)` function
|
||||
- This function should return one of:
|
||||
- A `gym.vector.VectorEnv` (most common)
|
||||
- A single `gym.Env` (will be automatically wrapped)
|
||||
- A dict mapping `{suite_name: {task_id: VectorEnv}}` (for multi-task benchmarks)
|
||||
|
||||
### Optional Files
|
||||
|
||||
**`requirements.txt`**
|
||||
|
||||
- List any additional dependencies your environment needs
|
||||
- Users will need to install these manually before loading your environment
|
||||
|
||||
**`README.md`**
|
||||
|
||||
- Document your environment: what task it implements, observation/action spaces, rewards, etc.
|
||||
- Include usage examples and any special setup instructions
|
||||
|
||||
**`.gitignore`**
|
||||
|
||||
- Exclude unnecessary files from your repository
|
||||
|
||||
### Example Repository Structure
|
||||
|
||||
```
|
||||
my-environment-repo/
|
||||
├── env.py # Main environment definition (required)
|
||||
├── requirements.txt # Dependencies (optional)
|
||||
├── README.md # Documentation (recommended)
|
||||
├── assets/ # Images, videos, etc. (optional)
|
||||
│ └── demo.gif
|
||||
└── configs/ # Config files if needed (optional)
|
||||
└── task_config.yaml
|
||||
```
|
||||
|
||||
## Creating Your Environment Repository
|
||||
|
||||
### Step 1: Define Your Environment
|
||||
|
||||
Create an `env.py` file with a `make_env` function:
|
||||
|
||||
```python
|
||||
# env.py
|
||||
import gymnasium as gym
|
||||
|
||||
def make_env(n_envs: int = 1, use_async_envs: bool = False):
|
||||
"""
|
||||
Create vectorized environments for your custom task.
|
||||
|
||||
Args:
|
||||
n_envs: Number of parallel environments
|
||||
use_async_envs: Whether to use AsyncVectorEnv or SyncVectorEnv
|
||||
|
||||
Returns:
|
||||
gym.vector.VectorEnv or dict mapping suite names to vectorized envs
|
||||
"""
|
||||
def _make_single_env():
|
||||
# Create your custom environment
|
||||
return gym.make("CartPole-v1")
|
||||
|
||||
# Choose vector environment type
|
||||
env_cls = gym.vector.AsyncVectorEnv if use_async_envs else gym.vector.SyncVectorEnv
|
||||
|
||||
# Create vectorized environment
|
||||
vec_env = env_cls([_make_single_env for _ in range(n_envs)])
|
||||
|
||||
return vec_env
|
||||
```
|
||||
|
||||
### Step 2: Test Locally
|
||||
|
||||
Before uploading, test your environment locally:
|
||||
|
||||
```python
|
||||
from lerobot.envs.utils import _load_module_from_path, _call_make_env, _normalize_hub_result
|
||||
|
||||
# Load your module
|
||||
module = _load_module_from_path("./env.py")
|
||||
|
||||
# Test the make_env function
|
||||
result = _call_make_env(module, n_envs=2, use_async_envs=False)
|
||||
normalized = _normalize_hub_result(result)
|
||||
|
||||
# Verify it works
|
||||
suite_name = next(iter(normalized))
|
||||
env = normalized[suite_name][0]
|
||||
obs, info = env.reset()
|
||||
print(f"Observation shape: {obs.shape if hasattr(obs, 'shape') else type(obs)}")
|
||||
env.close()
|
||||
```
|
||||
|
||||
### Step 3: Upload to the Hub
|
||||
|
||||
Upload your repository to Hugging Face:
|
||||
|
||||
```bash
|
||||
# Install huggingface_hub if needed
|
||||
pip install huggingface_hub
|
||||
|
||||
# Login to Hugging Face
|
||||
huggingface-cli login
|
||||
|
||||
# Create a new repository
|
||||
huggingface-cli repo create my-custom-env --type space --org my-org
|
||||
|
||||
# Initialize git and push
|
||||
git init
|
||||
git add .
|
||||
git commit -m "Initial environment implementation"
|
||||
git remote add origin https://huggingface.co/my-org/my-custom-env
|
||||
git push -u origin main
|
||||
```
|
||||
|
||||
Alternatively, use the `huggingface_hub` Python API:
|
||||
|
||||
```python
|
||||
from huggingface_hub import HfApi
|
||||
|
||||
api = HfApi()
|
||||
|
||||
# Create repository
|
||||
api.create_repo("my-custom-env", repo_type="space")
|
||||
|
||||
# Upload files
|
||||
api.upload_folder(
|
||||
folder_path="./my-env-folder",
|
||||
repo_id="username/my-custom-env",
|
||||
repo_type="space",
|
||||
)
|
||||
```
|
||||
|
||||
## Loading Environments from the Hub
|
||||
|
||||
### Basic Usage
|
||||
|
||||
```python
|
||||
from lerobot.envs.factory import make_env
|
||||
|
||||
# Load from the hub
|
||||
envs_dict = make_env(
|
||||
"username/my-custom-env",
|
||||
n_envs=4,
|
||||
trust_remote_code=True
|
||||
)
|
||||
|
||||
# Access the environment
|
||||
suite_name = next(iter(envs_dict))
|
||||
env = envs_dict[suite_name][0]
|
||||
|
||||
# Use it like any gym environment
|
||||
obs, info = env.reset()
|
||||
action = env.action_space.sample()
|
||||
obs, reward, terminated, truncated, info = env.step(action)
|
||||
```
|
||||
|
||||
### Advanced: Pinning to Specific Versions
|
||||
|
||||
For reproducibility and security, pin to a specific Git revision:
|
||||
|
||||
```python
|
||||
# Pin to a specific branch
|
||||
env = make_env("username/my-env@main", trust_remote_code=True)
|
||||
|
||||
# Pin to a specific commit (recommended for papers/experiments)
|
||||
env = make_env("username/my-env@abc123def456", trust_remote_code=True)
|
||||
|
||||
# Pin to a tag
|
||||
env = make_env("username/my-env@v1.0.0", trust_remote_code=True)
|
||||
```
|
||||
|
||||
### Custom File Paths
|
||||
|
||||
If your environment definition is not in `env.py`:
|
||||
|
||||
```python
|
||||
# Load from a custom file
|
||||
env = make_env("username/my-env:custom_env.py", trust_remote_code=True)
|
||||
|
||||
# Combine with version pinning
|
||||
env = make_env("username/my-env@v1.0:envs/task_a.py", trust_remote_code=True)
|
||||
```
|
||||
|
||||
### Async Environments
|
||||
|
||||
For better performance with multiple environments:
|
||||
|
||||
```python
|
||||
envs_dict = make_env(
|
||||
"username/my-env",
|
||||
n_envs=8,
|
||||
use_async_envs=True, # Use AsyncVectorEnv for parallel execution
|
||||
trust_remote_code=True
|
||||
)
|
||||
```
|
||||
|
||||
## URL Format Reference
|
||||
|
||||
The hub URL format supports several patterns:
|
||||
|
||||
| Pattern | Description | Example |
|
||||
| -------------------- | ------------------------------ | -------------------------------------- |
|
||||
| `user/repo` | Load `env.py` from main branch | `make_env("lerobot/pusht-env")` |
|
||||
| `user/repo@revision` | Load from specific revision | `make_env("lerobot/pusht-env@main")` |
|
||||
| `user/repo:path` | Load custom file | `make_env("lerobot/envs:pusht.py")` |
|
||||
| `user/repo@rev:path` | Revision + custom file | `make_env("lerobot/envs@v1:pusht.py")` |
|
||||
|
||||
## Multi-Task Environments
|
||||
|
||||
For benchmarks with multiple tasks (like LIBERO), return a nested dictionary:
|
||||
|
||||
```python
|
||||
def make_env(n_envs: int = 1, use_async_envs: bool = False):
|
||||
env_cls = gym.vector.AsyncVectorEnv if use_async_envs else gym.vector.SyncVectorEnv
|
||||
|
||||
# Return dict: {suite_name: {task_id: VectorEnv}}
|
||||
return {
|
||||
"suite_1": {
|
||||
0: env_cls([lambda: gym.make("Task1-v0") for _ in range(n_envs)]),
|
||||
1: env_cls([lambda: gym.make("Task2-v0") for _ in range(n_envs)]),
|
||||
},
|
||||
"suite_2": {
|
||||
0: env_cls([lambda: gym.make("Task3-v0") for _ in range(n_envs)]),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Security Considerations
|
||||
|
||||
<Tip warning={true}>
|
||||
**Important**: The `trust_remote_code=True` flag is required to execute
|
||||
environment code from the Hub. This is by design for security.
|
||||
</Tip>
|
||||
|
||||
When loading environments from the Hub:
|
||||
|
||||
1. **Review the code first**: Visit the repository and inspect `env.py` before loading
|
||||
2. **Pin to commits**: Use specific commit hashes for reproducibility
|
||||
3. **Check dependencies**: Review `requirements.txt` for suspicious packages
|
||||
4. **Use trusted sources**: Prefer official organizations or well-known researchers
|
||||
5. **Sandbox if needed**: Run untrusted code in isolated environments (containers, VMs)
|
||||
|
||||
Example of safe usage:
|
||||
|
||||
```python
|
||||
# ❌ BAD: Loading without inspection
|
||||
env = make_env("random-user/untrusted-env", trust_remote_code=True)
|
||||
|
||||
# ✅ GOOD: Review code, then pin to specific commit
|
||||
# 1. Visit https://huggingface.co/trusted-org/verified-env
|
||||
# 2. Review the env.py file
|
||||
# 3. Copy the commit hash
|
||||
env = make_env("trusted-org/verified-env@a1b2c3d4", trust_remote_code=True)
|
||||
```
|
||||
|
||||
## Example: CartPole from the Hub
|
||||
|
||||
Here's a complete example using the reference CartPole environment:
|
||||
|
||||
```python
|
||||
from lerobot.envs.factory import make_env
|
||||
import numpy as np
|
||||
|
||||
# Load the environment
|
||||
envs_dict = make_env("lerobot/cartpole-env", n_envs=4, trust_remote_code=True)
|
||||
|
||||
# Get the vectorized environment
|
||||
suite_name = next(iter(envs_dict))
|
||||
env = envs_dict[suite_name][0]
|
||||
|
||||
# Run a simple episode
|
||||
obs, info = env.reset()
|
||||
done = np.zeros(env.num_envs, dtype=bool)
|
||||
total_reward = np.zeros(env.num_envs)
|
||||
|
||||
while not done.all():
|
||||
# Random policy
|
||||
action = env.action_space.sample()
|
||||
obs, reward, terminated, truncated, info = env.step(action)
|
||||
total_reward += reward
|
||||
done = terminated | truncated
|
||||
|
||||
print(f"Average reward: {total_reward.mean():.2f}")
|
||||
env.close()
|
||||
```
|
||||
|
||||
## Benefits of EnvHub
|
||||
|
||||
### For Environment Authors
|
||||
|
||||
- **Easy distribution**: No PyPI packaging required
|
||||
- **Version control**: Use Git for environment versioning
|
||||
- **Rapid iteration**: Push updates instantly
|
||||
- **Documentation**: Hub README renders beautifully
|
||||
- **Community**: Reach LeRobot users directly
|
||||
|
||||
### For Researchers
|
||||
|
||||
- **Quick experiments**: Load any environment in one line
|
||||
- **Reproducibility**: Pin to specific commits
|
||||
- **Discovery**: Browse environments on the Hub
|
||||
- **No conflicts**: No need to install conflicting packages
|
||||
|
||||
### For the Community
|
||||
|
||||
- **Growing ecosystem**: More diverse simulation tasks
|
||||
- **Standardization**: Common `make_env` API
|
||||
- **Collaboration**: Fork and improve existing environments
|
||||
- **Accessibility**: Lower barrier to sharing research
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Refusing to execute remote code"
|
||||
|
||||
You must explicitly pass `trust_remote_code=True`:
|
||||
|
||||
```python
|
||||
env = make_env("user/repo", trust_remote_code=True)
|
||||
```
|
||||
|
||||
### "Module X not found"
|
||||
|
||||
The hub environment has dependencies you need to install:
|
||||
|
||||
```bash
|
||||
# Check the repo's requirements.txt and install dependencies
|
||||
pip install gymnasium numpy
|
||||
```
|
||||
|
||||
### "make_env not found in module"
|
||||
|
||||
Your `env.py` must expose a `make_env` function:
|
||||
|
||||
```python
|
||||
def make_env(n_envs: int, use_async_envs: bool):
|
||||
# Your implementation
|
||||
pass
|
||||
```
|
||||
|
||||
### Environment returns wrong type
|
||||
|
||||
The `make_env` function must return:
|
||||
|
||||
- A `gym.vector.VectorEnv`, or
|
||||
- A single `gym.Env`, or
|
||||
- A dict `{suite_name: {task_id: VectorEnv}}`
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Document your environment**: Include observation/action space descriptions, reward structure, and termination conditions in your README
|
||||
2. **Add requirements.txt**: List all dependencies with versions
|
||||
3. **Test thoroughly**: Verify your environment works locally before pushing
|
||||
4. **Use semantic versioning**: Tag releases with version numbers
|
||||
5. **Add examples**: Include usage examples in your README
|
||||
6. **Keep it simple**: Minimize dependencies when possible
|
||||
7. **License your work**: Add a LICENSE file to clarify usage terms
|
||||
|
||||
## Future Directions
|
||||
|
||||
The EnvHub ecosystem enables exciting possibilities:
|
||||
|
||||
- **GPU-accelerated physics**: Share Isaac Gym or Brax environments
|
||||
- **Photorealistic rendering**: Distribute environments with advanced graphics
|
||||
- **Multi-agent scenarios**: Complex interaction tasks
|
||||
- **Real-world simulators**: Digital twins of physical setups
|
||||
- **Procedural generation**: Infinite task variations
|
||||
- **Domain randomization**: Pre-configured DR pipelines
|
||||
|
||||
As more researchers and developers contribute, the diversity and quality of available environments will grow, benefiting the entire robotics learning community.
|
||||
|
||||
## See Also
|
||||
|
||||
- [Hugging Face Hub Documentation](https://huggingface.co/docs/hub/en/index)
|
||||
- [Gymnasium Documentation](https://gymnasium.farama.org/index.html)
|
||||
- [Example Hub Environment](https://huggingface.co/lerobot/cartpole-env)
|
||||
@@ -40,7 +40,7 @@ python -c "import flash_attn; print(f'Flash Attention {flash_attn.__version__} i
|
||||
3. Install LeRobot by running:
|
||||
|
||||
```bash
|
||||
pip install lerobot[groot] # consider also installing libero,dev and test tags
|
||||
pip install lerobot[groot]
|
||||
```
|
||||
|
||||
## Usage
|
||||
@@ -83,6 +83,9 @@ accelerate launch \
|
||||
|
||||
### Libero Benchmark Results
|
||||
|
||||
> [!NOTE]
|
||||
> Follow our instructions for Libero usage: [Libero](./libero)
|
||||
|
||||
GR00T has demonstrated strong performance on the Libero benchmark suite. To compare and test its LeRobot implementation, we finetuned the GR00T N1.5 model for 30k steps on the Libero dataset and compared the results to the GR00T reference results.
|
||||
|
||||
| Benchmark | LeRobot Implementation | GR00T Reference |
|
||||
|
||||
@@ -82,7 +82,7 @@ For a full list of optional dependencies, see:
|
||||
https://pypi.org/project/lerobot/
|
||||
|
||||
> [!NOTE]
|
||||
> For lerobot 0.4.0, if you want to install libero or pi, you will have to do: `pip install "lerobot[pi,libero]@git+https://github.com/huggingface/lerobot.git"`
|
||||
> For lerobot 0.4.0, if you want to install pi, you will have to do: `pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"`
|
||||
|
||||
### Troubleshooting
|
||||
|
||||
|
||||
@@ -28,6 +28,11 @@ As described by Physical Intelligence, while AI has achieved remarkable success
|
||||
pip install -e ".[pi]"
|
||||
```
|
||||
|
||||
> [!NOTE]
|
||||
> For lerobot 0.4.0, if you want to install pi tag, you will have to do: `pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"`.
|
||||
>
|
||||
> This will be solved in the next patch release
|
||||
|
||||
## Training Data and Capabilities
|
||||
|
||||
π₀ is trained on the largest robot interaction dataset to date, combining three key data sources:
|
||||
|
||||
@@ -36,6 +36,11 @@ This diverse training mixture creates a "curriculum" that enables generalization
|
||||
pip install -e ".[pi]"
|
||||
```
|
||||
|
||||
> [!NOTE]
|
||||
> For lerobot 0.4.0, if you want to install pi tag, you will have to do: `pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"`.
|
||||
>
|
||||
> This will be solved in the next patch release
|
||||
|
||||
## Usage
|
||||
|
||||
To use π₀.₅ in your LeRobot configuration, specify the policy type as:
|
||||
|
||||
+1
-1
@@ -142,7 +142,7 @@ video_benchmark = ["scikit-image>=0.23.2,<0.26.0", "pandas>=2.2.2,<2.4.0"]
|
||||
# Simulation
|
||||
aloha = ["gym-aloha>=0.1.2,<0.2.0"]
|
||||
pusht = ["gym-pusht>=0.1.5,<0.2.0", "pymunk>=6.6.0,<7.0.0"] # TODO: Fix pymunk version in gym-pusht instead
|
||||
libero = ["lerobot[transformers-dep]", "libero @ git+https://github.com/huggingface/lerobot-libero.git@main#egg=libero"]
|
||||
libero = ["lerobot[transformers-dep]", "hf-libero>=0.1.3,<0.2.0"]
|
||||
metaworld = ["metaworld==3.0.0"]
|
||||
|
||||
# All
|
||||
|
||||
@@ -39,6 +39,7 @@ from lerobot.datasets.aggregate import aggregate_datasets
|
||||
from lerobot.datasets.compute_stats import aggregate_stats
|
||||
from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
|
||||
from lerobot.datasets.utils import (
|
||||
DATA_DIR,
|
||||
DEFAULT_CHUNK_SIZE,
|
||||
DEFAULT_DATA_FILE_SIZE_IN_MB,
|
||||
DEFAULT_DATA_PATH,
|
||||
@@ -962,28 +963,23 @@ def _copy_data_with_feature_changes(
|
||||
remove_features: list[str] | None = None,
|
||||
) -> None:
|
||||
"""Copy data while adding or removing features."""
|
||||
if dataset.meta.episodes is None:
|
||||
dataset.meta.episodes = load_episodes(dataset.meta.root)
|
||||
data_dir = dataset.root / DATA_DIR
|
||||
parquet_files = sorted(data_dir.glob("*/*.parquet"))
|
||||
|
||||
# Map file paths to episode indices to extract chunk/file indices
|
||||
file_to_episodes: dict[Path, set[int]] = {}
|
||||
for ep_idx in range(dataset.meta.total_episodes):
|
||||
file_path = dataset.meta.get_data_file_path(ep_idx)
|
||||
if file_path not in file_to_episodes:
|
||||
file_to_episodes[file_path] = set()
|
||||
file_to_episodes[file_path].add(ep_idx)
|
||||
if not parquet_files:
|
||||
raise ValueError(f"No parquet files found in {data_dir}")
|
||||
|
||||
frame_idx = 0
|
||||
|
||||
for src_path in tqdm(sorted(file_to_episodes.keys()), desc="Processing data files"):
|
||||
df = pd.read_parquet(dataset.root / src_path).reset_index(drop=True)
|
||||
for src_path in tqdm(parquet_files, desc="Processing data files"):
|
||||
df = pd.read_parquet(src_path).reset_index(drop=True)
|
||||
|
||||
# Get chunk_idx and file_idx from the source file's first episode
|
||||
episodes_in_file = file_to_episodes[src_path]
|
||||
first_ep_idx = min(episodes_in_file)
|
||||
src_ep = dataset.meta.episodes[first_ep_idx]
|
||||
chunk_idx = src_ep["data/chunk_index"]
|
||||
file_idx = src_ep["data/file_index"]
|
||||
relative_path = src_path.relative_to(dataset.root)
|
||||
chunk_dir = relative_path.parts[1]
|
||||
file_name = relative_path.parts[2]
|
||||
|
||||
chunk_idx = int(chunk_dir.split("-")[1])
|
||||
file_idx = int(file_name.split("-")[1].split(".")[0])
|
||||
|
||||
if remove_features:
|
||||
df = df.drop(columns=remove_features, errors="ignore")
|
||||
@@ -1009,7 +1005,7 @@ def _copy_data_with_feature_changes(
|
||||
df[feature_name] = feature_slice
|
||||
frame_idx = end_idx
|
||||
|
||||
# Write using the preserved chunk_idx and file_idx from source
|
||||
# Write using the same chunk/file structure as source
|
||||
dst_path = new_meta.root / DEFAULT_DATA_PATH.format(chunk_index=chunk_idx, file_index=file_idx)
|
||||
dst_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
|
||||
@@ -430,9 +430,7 @@ class LeRobotDatasetMetadata:
|
||||
video_keys = [video_key] if video_key is not None else self.video_keys
|
||||
for key in video_keys:
|
||||
if not self.features[key].get("info", None):
|
||||
video_path = self.root / self.video_path.format(
|
||||
video_key=video_key, chunk_index=0, file_index=0
|
||||
)
|
||||
video_path = self.root / self.video_path.format(video_key=key, chunk_index=0, file_index=0)
|
||||
self.info["features"][key]["info"] = get_video_info(video_path)
|
||||
|
||||
def update_chunk_settings(
|
||||
@@ -942,11 +940,26 @@ class LeRobotDataset(torch.utils.data.Dataset):
|
||||
return query_timestamps
|
||||
|
||||
def _query_hf_dataset(self, query_indices: dict[str, list[int]]) -> dict:
|
||||
return {
|
||||
key: torch.stack(self.hf_dataset[q_idx][key])
|
||||
for key, q_idx in query_indices.items()
|
||||
if key not in self.meta.video_keys
|
||||
}
|
||||
"""
|
||||
Query dataset for indices across keys, skipping video keys.
|
||||
|
||||
Tries column-first [key][indices] for speed, falls back to row-first.
|
||||
|
||||
Args:
|
||||
query_indices: Dict mapping keys to index lists to retrieve
|
||||
|
||||
Returns:
|
||||
Dict with stacked tensors of queried data (video keys excluded)
|
||||
"""
|
||||
result: dict = {}
|
||||
for key, q_idx in query_indices.items():
|
||||
if key in self.meta.video_keys:
|
||||
continue
|
||||
try:
|
||||
result[key] = torch.stack(self.hf_dataset[key][q_idx])
|
||||
except (KeyError, TypeError, IndexError):
|
||||
result[key] = torch.stack(self.hf_dataset[q_idx][key])
|
||||
return result
|
||||
|
||||
def _query_videos(self, query_timestamps: dict[str, list[float]], ep_idx: int) -> dict[str, torch.Tensor]:
|
||||
"""Note: When using data workers (e.g. DataLoader with num_workers>0), do not call this function
|
||||
|
||||
@@ -19,6 +19,7 @@ import gymnasium as gym
|
||||
from gymnasium.envs.registration import registry as gym_registry
|
||||
|
||||
from lerobot.envs.configs import AlohaEnv, EnvConfig, LiberoEnv, PushtEnv
|
||||
from lerobot.envs.utils import _call_make_env, _download_hub_file, _import_hub_module, _normalize_hub_result
|
||||
|
||||
|
||||
def make_env_config(env_type: str, **kwargs) -> EnvConfig:
|
||||
@@ -33,15 +34,24 @@ def make_env_config(env_type: str, **kwargs) -> EnvConfig:
|
||||
|
||||
|
||||
def make_env(
|
||||
cfg: EnvConfig, n_envs: int = 1, use_async_envs: bool = False
|
||||
cfg: EnvConfig | str,
|
||||
n_envs: int = 1,
|
||||
use_async_envs: bool = False,
|
||||
hub_cache_dir: str | None = None,
|
||||
trust_remote_code: bool = False,
|
||||
) -> dict[str, dict[int, gym.vector.VectorEnv]]:
|
||||
"""Makes a gym vector environment according to the config.
|
||||
"""Makes a gym vector environment according to the config or Hub reference.
|
||||
|
||||
Args:
|
||||
cfg (EnvConfig): the config of the environment to instantiate.
|
||||
cfg (EnvConfig | str): Either an `EnvConfig` object describing the environment to build locally,
|
||||
or a Hugging Face Hub repository identifier (e.g. `"username/repo"`). In the latter case,
|
||||
the repo must include a Python file (usually `env.py`).
|
||||
n_envs (int, optional): The number of parallelized env to return. Defaults to 1.
|
||||
use_async_envs (bool, optional): Whether to return an AsyncVectorEnv or a SyncVectorEnv. Defaults to
|
||||
False.
|
||||
hub_cache_dir (str | None): Optional cache path for downloaded hub files.
|
||||
trust_remote_code (bool): **Explicit consent** to execute remote code from the Hub.
|
||||
Default False — must be set to True to import/exec hub `env.py`.
|
||||
|
||||
Raises:
|
||||
ValueError: if n_envs < 1
|
||||
@@ -54,6 +64,21 @@ def make_env(
|
||||
- For single-task environments: a single suite entry (cfg.type) with task_id=0.
|
||||
|
||||
"""
|
||||
# if user passed a hub id string (e.g., "username/repo", "username/repo@main:env.py")
|
||||
# simplified: only support hub-provided `make_env`
|
||||
if isinstance(cfg, str):
|
||||
# _download_hub_file will raise the same RuntimeError if trust_remote_code is False
|
||||
repo_id, file_path, local_file, revision = _download_hub_file(cfg, trust_remote_code, hub_cache_dir)
|
||||
|
||||
# import and surface clear import errors
|
||||
module = _import_hub_module(local_file, repo_id)
|
||||
|
||||
# call the hub-provided make_env
|
||||
raw_result = _call_make_env(module, n_envs=n_envs, use_async_envs=use_async_envs)
|
||||
|
||||
# normalize the return into {suite: {task_id: vec_env}}
|
||||
return _normalize_hub_result(raw_result)
|
||||
|
||||
if n_envs < 1:
|
||||
raise ValueError("`n_envs` must be at least 1")
|
||||
|
||||
|
||||
@@ -13,6 +13,8 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import importlib.util
|
||||
import os
|
||||
import warnings
|
||||
from collections.abc import Mapping, Sequence
|
||||
from functools import singledispatch
|
||||
@@ -22,6 +24,7 @@ import einops
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
import torch
|
||||
from huggingface_hub import hf_hub_download, snapshot_download
|
||||
from torch import Tensor
|
||||
|
||||
from lerobot.configs.types import FeatureType, PolicyFeature
|
||||
@@ -195,3 +198,132 @@ def _(envs: Sequence) -> None:
|
||||
@close_envs.register
|
||||
def _(env: gym.Env) -> None:
|
||||
_close_single_env(env)
|
||||
|
||||
|
||||
# helper to safely load a python file as a module
|
||||
def _load_module_from_path(path: str, module_name: str | None = None):
|
||||
module_name = module_name or f"hub_env_{os.path.basename(path).replace('.', '_')}"
|
||||
spec = importlib.util.spec_from_file_location(module_name, path)
|
||||
if spec is None:
|
||||
raise ImportError(f"Could not load module spec for {module_name} from {path}")
|
||||
module = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(module) # type: ignore
|
||||
return module
|
||||
|
||||
|
||||
# helper to parse hub string (supports "user/repo", "user/repo@rev", optional path)
|
||||
# examples:
|
||||
# "user/repo" -> will look for env.py at repo root
|
||||
# "user/repo@main:envs/my_env.py" -> explicit revision and path
|
||||
def _parse_hub_url(hub_uri: str):
|
||||
# very small parser: [repo_id][@revision][:path]
|
||||
# repo_id is required (user/repo or org/repo)
|
||||
revision = None
|
||||
file_path = "env.py"
|
||||
if "@" in hub_uri:
|
||||
repo_and_rev, *rest = hub_uri.split(":", 1)
|
||||
repo_id, rev = repo_and_rev.split("@", 1)
|
||||
revision = rev
|
||||
if rest:
|
||||
file_path = rest[0]
|
||||
else:
|
||||
repo_id, *rest = hub_uri.split(":", 1)
|
||||
if rest:
|
||||
file_path = rest[0]
|
||||
return repo_id, revision, file_path
|
||||
|
||||
|
||||
def _download_hub_file(
|
||||
cfg_str: str,
|
||||
trust_remote_code: bool,
|
||||
hub_cache_dir: str | None,
|
||||
) -> tuple[str, str, str, str]:
|
||||
"""
|
||||
Parse `cfg_str` (hub URL), enforce `trust_remote_code`, and return
|
||||
(repo_id, file_path, local_file, revision).
|
||||
"""
|
||||
if not trust_remote_code:
|
||||
raise RuntimeError(
|
||||
f"Refusing to execute remote code from the Hub for '{cfg_str}'. "
|
||||
"Executing hub env modules runs arbitrary Python code from third-party repositories. "
|
||||
"If you trust this repo and understand the risks, call `make_env(..., trust_remote_code=True)` "
|
||||
"and prefer pinning to a specific revision: 'user/repo@<commit-hash>:env.py'."
|
||||
)
|
||||
|
||||
repo_id, revision, file_path = _parse_hub_url(cfg_str)
|
||||
|
||||
try:
|
||||
local_file = hf_hub_download(
|
||||
repo_id=repo_id, filename=file_path, revision=revision, cache_dir=hub_cache_dir
|
||||
)
|
||||
except Exception as e:
|
||||
# fallback to snapshot download
|
||||
snapshot_dir = snapshot_download(repo_id=repo_id, revision=revision, cache_dir=hub_cache_dir)
|
||||
local_file = os.path.join(snapshot_dir, file_path)
|
||||
if not os.path.exists(local_file):
|
||||
raise FileNotFoundError(
|
||||
f"Could not find {file_path} in repository {repo_id}@{revision or 'main'}"
|
||||
) from e
|
||||
|
||||
return repo_id, file_path, local_file, revision
|
||||
|
||||
|
||||
def _import_hub_module(local_file: str, repo_id: str) -> Any:
|
||||
"""
|
||||
Import the downloaded file as a module and surface helpful import error messages.
|
||||
"""
|
||||
module_name = f"hub_env_{repo_id.replace('/', '_')}"
|
||||
try:
|
||||
module = _load_module_from_path(local_file, module_name=module_name)
|
||||
except ModuleNotFoundError as e:
|
||||
missing = getattr(e, "name", None) or str(e)
|
||||
raise ModuleNotFoundError(
|
||||
f"Hub env '{repo_id}:{os.path.basename(local_file)}' failed to import because the dependency "
|
||||
f"'{missing}' is not installed locally.\n\n"
|
||||
) from e
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
f"Failed to load hub env module '{repo_id}:{os.path.basename(local_file)}'. Import error: {e}\n\n"
|
||||
) from e
|
||||
return module
|
||||
|
||||
|
||||
def _call_make_env(module: Any, n_envs: int, use_async_envs: bool) -> Any:
|
||||
"""
|
||||
Ensure module exposes make_env and call it.
|
||||
"""
|
||||
if not hasattr(module, "make_env"):
|
||||
raise AttributeError(
|
||||
f"The hub module {getattr(module, '__name__', 'hub_module')} must expose `make_env(n_envs=int, use_async_envs=bool)`."
|
||||
)
|
||||
entry_fn = module.make_env
|
||||
return entry_fn(n_envs=n_envs, use_async_envs=use_async_envs)
|
||||
|
||||
|
||||
def _normalize_hub_result(result: Any) -> dict[str, dict[int, gym.vector.VectorEnv]]:
|
||||
"""
|
||||
Normalize possible return types from hub `make_env` into the mapping:
|
||||
{ suite_name: { task_id: vector_env } }
|
||||
Accepts:
|
||||
- dict (assumed already correct)
|
||||
- gym.vector.VectorEnv
|
||||
- gym.Env (will be wrapped into SyncVectorEnv)
|
||||
"""
|
||||
if isinstance(result, dict):
|
||||
return result
|
||||
|
||||
# VectorEnv: use its spec.id if available
|
||||
if isinstance(result, gym.vector.VectorEnv):
|
||||
suite_name = getattr(result, "spec", None) and getattr(result.spec, "id", None) or "hub_env"
|
||||
return {suite_name: {0: result}}
|
||||
|
||||
# Single Env: wrap into SyncVectorEnv
|
||||
if isinstance(result, gym.Env):
|
||||
vec = gym.vector.SyncVectorEnv([lambda: result])
|
||||
suite_name = getattr(result, "spec", None) and getattr(result.spec, "id", None) or "hub_env"
|
||||
return {suite_name: {0: vec}}
|
||||
|
||||
raise ValueError(
|
||||
"Hub `make_env` must return either a mapping {suite: {task_id: vec_env}}, "
|
||||
"a gym.vector.VectorEnv, or a single gym.Env."
|
||||
)
|
||||
|
||||
@@ -38,6 +38,7 @@ from lerobot.policies.sac.configuration_sac import SACConfig
|
||||
from lerobot.policies.sac.reward_model.configuration_classifier import RewardClassifierConfig
|
||||
from lerobot.policies.smolvla.configuration_smolvla import SmolVLAConfig
|
||||
from lerobot.policies.tdmpc.configuration_tdmpc import TDMPCConfig
|
||||
from lerobot.policies.utils import validate_visual_features_consistency
|
||||
from lerobot.policies.vqbet.configuration_vqbet import VQBeTConfig
|
||||
from lerobot.processor import PolicyAction, PolicyProcessorPipeline
|
||||
from lerobot.processor.converters import (
|
||||
@@ -420,20 +421,7 @@ def make_policy(
|
||||
# policy = torch.compile(policy, mode="reduce-overhead")
|
||||
|
||||
if not rename_map:
|
||||
expected_features = set(cfg.input_features.keys()) | set(cfg.output_features.keys())
|
||||
provided_features = set(features.keys())
|
||||
if expected_features and provided_features != expected_features:
|
||||
missing = expected_features - provided_features
|
||||
extra = provided_features - expected_features
|
||||
# TODO (jadechoghari): provide a dynamic rename map suggestion to the user.
|
||||
raise ValueError(
|
||||
f"Feature mismatch between dataset/environment and policy config.\n"
|
||||
f"- Missing features: {sorted(missing) if missing else 'None'}\n"
|
||||
f"- Extra features: {sorted(extra) if extra else 'None'}\n\n"
|
||||
f"Please ensure your dataset and policy use consistent feature names.\n"
|
||||
f"If your dataset uses different observation keys (e.g., cameras named differently), "
|
||||
f"use the `--rename_map` argument, for example:\n"
|
||||
f' --rename_map=\'{{"observation.images.left": "observation.images.camera1", '
|
||||
f'"observation.images.top": "observation.images.camera2"}}\''
|
||||
)
|
||||
validate_visual_features_consistency(cfg, features)
|
||||
# TODO: (jadechoghari) - add a check_state(cfg, features) and check_action(cfg, features)
|
||||
|
||||
return policy
|
||||
|
||||
@@ -22,6 +22,8 @@ import numpy as np
|
||||
import torch
|
||||
from torch import nn
|
||||
|
||||
from lerobot.configs.policies import PreTrainedConfig
|
||||
from lerobot.configs.types import FeatureType, PolicyFeature
|
||||
from lerobot.datasets.utils import build_dataset_frame
|
||||
from lerobot.processor import PolicyAction, RobotAction, RobotObservation
|
||||
from lerobot.utils.constants import ACTION, OBS_STR
|
||||
@@ -198,3 +200,42 @@ def make_robot_action(action_tensor: PolicyAction, ds_features: dict[str, dict])
|
||||
f"{name}": float(action_tensor[i]) for i, name in enumerate(action_names)
|
||||
}
|
||||
return act_processed_policy
|
||||
|
||||
|
||||
def raise_feature_mismatch_error(
|
||||
provided_features: set[str],
|
||||
expected_features: set[str],
|
||||
) -> None:
|
||||
"""
|
||||
Raises a standardized ValueError for feature mismatches between dataset/environment and policy config.
|
||||
"""
|
||||
missing = expected_features - provided_features
|
||||
extra = provided_features - expected_features
|
||||
# TODO (jadechoghari): provide a dynamic rename map suggestion to the user.
|
||||
raise ValueError(
|
||||
f"Feature mismatch between dataset/environment and policy config.\n"
|
||||
f"- Missing features: {sorted(missing) if missing else 'None'}\n"
|
||||
f"- Extra features: {sorted(extra) if extra else 'None'}\n\n"
|
||||
f"Please ensure your dataset and policy use consistent feature names.\n"
|
||||
f"If your dataset uses different observation keys (e.g., cameras named differently), "
|
||||
f"use the `--rename_map` argument, for example:\n"
|
||||
f' --rename_map=\'{{"observation.images.left": "observation.images.camera1", '
|
||||
f'"observation.images.top": "observation.images.camera2"}}\''
|
||||
)
|
||||
|
||||
|
||||
def validate_visual_features_consistency(
|
||||
cfg: PreTrainedConfig,
|
||||
features: dict[str, PolicyFeature],
|
||||
) -> None:
|
||||
"""
|
||||
Validates visual feature consistency between a policy config and provided dataset/environment features.
|
||||
|
||||
Args:
|
||||
cfg (PreTrainedConfig): The model or policy configuration containing input_features and type.
|
||||
features (Dict[str, PolicyFeature]): A mapping of feature names to PolicyFeature objects.
|
||||
"""
|
||||
expected_visuals = {k for k, v in cfg.input_features.items() if v.type == FeatureType.VISUAL}
|
||||
provided_visuals = {k for k, v in features.items() if v.type == FeatureType.VISUAL}
|
||||
if not provided_visuals.issubset(expected_visuals):
|
||||
raise_feature_mismatch_error(provided_visuals, expected_visuals)
|
||||
|
||||
+159
-1
@@ -17,6 +17,7 @@ import importlib
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
import pytest
|
||||
import torch
|
||||
from gymnasium.envs.registration import register, registry as gym_registry
|
||||
@@ -26,7 +27,11 @@ import lerobot
|
||||
from lerobot.configs.types import PolicyFeature
|
||||
from lerobot.envs.configs import EnvConfig
|
||||
from lerobot.envs.factory import make_env, make_env_config
|
||||
from lerobot.envs.utils import preprocess_observation
|
||||
from lerobot.envs.utils import (
|
||||
_normalize_hub_result,
|
||||
_parse_hub_url,
|
||||
preprocess_observation,
|
||||
)
|
||||
from tests.utils import require_env
|
||||
|
||||
OBS_TYPES = ["state", "pixels", "pixels_agent_pos"]
|
||||
@@ -108,3 +113,156 @@ def test_factory_custom_gym_id():
|
||||
finally:
|
||||
if gym_id in gym_registry:
|
||||
del gym_registry[gym_id]
|
||||
|
||||
|
||||
# Hub environment loading tests
|
||||
|
||||
|
||||
def test_make_env_hub_url_parsing():
|
||||
"""Test URL parsing for hub environment references."""
|
||||
# simple repo_id
|
||||
repo_id, revision, file_path = _parse_hub_url("user/repo")
|
||||
assert repo_id == "user/repo"
|
||||
assert revision is None
|
||||
assert file_path == "env.py"
|
||||
|
||||
# repo with revision
|
||||
repo_id, revision, file_path = _parse_hub_url("user/repo@main")
|
||||
assert repo_id == "user/repo"
|
||||
assert revision == "main"
|
||||
assert file_path == "env.py"
|
||||
|
||||
# repo with custom file path
|
||||
repo_id, revision, file_path = _parse_hub_url("user/repo:custom_env.py")
|
||||
assert repo_id == "user/repo"
|
||||
assert revision is None
|
||||
assert file_path == "custom_env.py"
|
||||
|
||||
# repo with revision and custom file path
|
||||
repo_id, revision, file_path = _parse_hub_url("user/repo@v1.0:envs/my_env.py")
|
||||
assert repo_id == "user/repo"
|
||||
assert revision == "v1.0"
|
||||
assert file_path == "envs/my_env.py"
|
||||
|
||||
# repo with commit hash
|
||||
repo_id, revision, file_path = _parse_hub_url("org/repo@abc123def456")
|
||||
assert repo_id == "org/repo"
|
||||
assert revision == "abc123def456"
|
||||
assert file_path == "env.py"
|
||||
|
||||
|
||||
def test_normalize_hub_result():
|
||||
"""Test normalization of different return types from hub make_env."""
|
||||
# test with VectorEnv (most common case)
|
||||
mock_vec_env = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1")])
|
||||
result = _normalize_hub_result(mock_vec_env)
|
||||
assert isinstance(result, dict)
|
||||
assert len(result) == 1
|
||||
suite_name = next(iter(result))
|
||||
assert 0 in result[suite_name]
|
||||
assert isinstance(result[suite_name][0], gym.vector.VectorEnv)
|
||||
mock_vec_env.close()
|
||||
|
||||
# test with single Env
|
||||
mock_env = gym.make("CartPole-v1")
|
||||
result = _normalize_hub_result(mock_env)
|
||||
assert isinstance(result, dict)
|
||||
suite_name = next(iter(result))
|
||||
assert 0 in result[suite_name]
|
||||
assert isinstance(result[suite_name][0], gym.vector.VectorEnv)
|
||||
result[suite_name][0].close()
|
||||
|
||||
# test with dict (already normalized)
|
||||
mock_vec_env = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1")])
|
||||
input_dict = {"my_suite": {0: mock_vec_env}}
|
||||
result = _normalize_hub_result(input_dict)
|
||||
assert result == input_dict
|
||||
assert "my_suite" in result
|
||||
assert 0 in result["my_suite"]
|
||||
mock_vec_env.close()
|
||||
|
||||
# test with invalid type
|
||||
with pytest.raises(ValueError, match="Hub `make_env` must return"):
|
||||
_normalize_hub_result("invalid_type")
|
||||
|
||||
|
||||
def test_make_env_from_hub_requires_trust_remote_code():
|
||||
"""Test that loading from hub requires explicit trust_remote_code=True."""
|
||||
hub_id = "lerobot/cartpole-env"
|
||||
|
||||
# Should raise RuntimeError when trust_remote_code=False (default)
|
||||
with pytest.raises(RuntimeError, match="Refusing to execute remote code"):
|
||||
make_env(hub_id, trust_remote_code=False)
|
||||
|
||||
# Should also raise when not specified (defaults to False)
|
||||
with pytest.raises(RuntimeError, match="Refusing to execute remote code"):
|
||||
make_env(hub_id)
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"hub_id",
|
||||
[
|
||||
"lerobot/cartpole-env",
|
||||
"lerobot/cartpole-env@main",
|
||||
"lerobot/cartpole-env:env.py",
|
||||
],
|
||||
)
|
||||
def test_make_env_from_hub_with_trust(hub_id):
|
||||
"""Test loading environment from Hugging Face Hub with trust_remote_code=True."""
|
||||
# load environment from hub
|
||||
envs_dict = make_env(hub_id, n_envs=2, trust_remote_code=True)
|
||||
|
||||
# verify structure
|
||||
assert isinstance(envs_dict, dict)
|
||||
assert len(envs_dict) >= 1
|
||||
|
||||
# get the first suite and task
|
||||
suite_name = next(iter(envs_dict))
|
||||
task_id = next(iter(envs_dict[suite_name]))
|
||||
env = envs_dict[suite_name][task_id]
|
||||
|
||||
# verify it's a vector environment
|
||||
assert isinstance(env, gym.vector.VectorEnv)
|
||||
assert env.num_envs == 2
|
||||
|
||||
# test basic environment interaction
|
||||
obs, info = env.reset()
|
||||
assert obs is not None
|
||||
assert isinstance(obs, (dict, np.ndarray))
|
||||
|
||||
# take a random action
|
||||
action = env.action_space.sample()
|
||||
obs, reward, terminated, truncated, info = env.step(action)
|
||||
assert obs is not None
|
||||
assert isinstance(reward, np.ndarray)
|
||||
assert len(reward) == 2
|
||||
|
||||
# clean up
|
||||
env.close()
|
||||
|
||||
|
||||
def test_make_env_from_hub_async():
|
||||
"""Test loading hub environment with async vector environments."""
|
||||
hub_id = "lerobot/cartpole-env"
|
||||
|
||||
# load with async envs
|
||||
envs_dict = make_env(hub_id, n_envs=2, use_async_envs=True, trust_remote_code=True)
|
||||
|
||||
suite_name = next(iter(envs_dict))
|
||||
task_id = next(iter(envs_dict[suite_name]))
|
||||
env = envs_dict[suite_name][task_id]
|
||||
|
||||
# verify it's an async vector environment
|
||||
assert isinstance(env, gym.vector.AsyncVectorEnv)
|
||||
assert env.num_envs == 2
|
||||
|
||||
# test basic interaction
|
||||
obs, info = env.reset()
|
||||
assert obs is not None
|
||||
|
||||
action = env.action_space.sample()
|
||||
obs, reward, terminated, truncated, info = env.step(action)
|
||||
assert len(reward) == 2
|
||||
|
||||
# clean up
|
||||
env.close()
|
||||
|
||||
@@ -0,0 +1,157 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
"""
|
||||
Visual Feature Consistency Tests
|
||||
|
||||
This module tests the `validate_visual_features_consistency` function,
|
||||
which ensures that visual features (camera observations) in a dataset/env
|
||||
match the expectations defined in a policy configuration.
|
||||
|
||||
The purpose of this check is to prevent mismatches between what a policy expects
|
||||
(e.g., `observation.images.camera1`, `camera2`, `camera3`) and what a dataset or
|
||||
environment actually provides (e.g., `observation.images.top`, `side`, or fewer cameras).
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
import numpy as np
|
||||
import pytest
|
||||
|
||||
from lerobot.configs.default import DatasetConfig
|
||||
from lerobot.configs.policies import PreTrainedConfig
|
||||
from lerobot.configs.train import TrainPipelineConfig
|
||||
from lerobot.datasets.lerobot_dataset import LeRobotDataset
|
||||
from lerobot.policies.factory import make_policy_config
|
||||
from lerobot.scripts.lerobot_train import train
|
||||
from lerobot.utils.utils import auto_select_torch_device
|
||||
|
||||
pytest.importorskip("transformers")
|
||||
|
||||
DUMMY_REPO_ID = "dummy/repo"
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def temp_dir(tmp_path):
|
||||
return tmp_path
|
||||
|
||||
|
||||
DUMMY_STATE_DIM = 6
|
||||
DUMMY_ACTION_DIM = 6
|
||||
IMAGE_SIZE = 8
|
||||
DEVICE = auto_select_torch_device()
|
||||
|
||||
|
||||
def make_dummy_dataset(camera_keys, tmp_path):
|
||||
"""Creates a minimal dummy dataset for testing rename_mapping logic."""
|
||||
features = {
|
||||
"action": {"dtype": "float32", "shape": (DUMMY_ACTION_DIM,), "names": None},
|
||||
"observation.state": {"dtype": "float32", "shape": (DUMMY_STATE_DIM,), "names": None},
|
||||
}
|
||||
for cam in camera_keys:
|
||||
features[f"observation.images.{cam}"] = {
|
||||
"dtype": "image",
|
||||
"shape": (IMAGE_SIZE, IMAGE_SIZE, 3),
|
||||
"names": ["height", "width", "channel"],
|
||||
}
|
||||
dataset = LeRobotDataset.create(
|
||||
repo_id=DUMMY_REPO_ID,
|
||||
fps=30,
|
||||
features=features,
|
||||
root=tmp_path / "_dataset",
|
||||
)
|
||||
root = tmp_path / "_dataset"
|
||||
for ep_idx in range(2):
|
||||
for _ in range(3):
|
||||
frame = {
|
||||
"action": np.random.randn(DUMMY_ACTION_DIM).astype(np.float32),
|
||||
"observation.state": np.random.randn(DUMMY_STATE_DIM).astype(np.float32),
|
||||
}
|
||||
for cam in camera_keys:
|
||||
frame[f"observation.images.{cam}"] = np.random.randint(
|
||||
0, 255, size=(IMAGE_SIZE, IMAGE_SIZE, 3), dtype=np.uint8
|
||||
)
|
||||
frame["task"] = f"task_{ep_idx}"
|
||||
dataset.add_frame(frame)
|
||||
dataset.save_episode()
|
||||
|
||||
dataset.finalize()
|
||||
return dataset, root
|
||||
|
||||
|
||||
def custom_validate(train_config: TrainPipelineConfig, policy_path: str, empty_cameras: int):
|
||||
train_config.policy = PreTrainedConfig.from_pretrained(policy_path)
|
||||
train_config.policy.pretrained_path = Path(policy_path)
|
||||
# override empty_cameras and push_to_hub for testing
|
||||
train_config.policy.empty_cameras = empty_cameras
|
||||
train_config.policy.push_to_hub = False
|
||||
if train_config.use_policy_training_preset:
|
||||
train_config.optimizer = train_config.policy.get_optimizer_preset()
|
||||
train_config.scheduler = train_config.policy.get_scheduler_preset()
|
||||
return train_config
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="Skipping this test as it results OOM")
|
||||
@pytest.mark.parametrize(
|
||||
"camera_keys, empty_cameras, rename_map, expect_success",
|
||||
[
|
||||
# case 1: dataset has fewer cameras than policy (3 instead of 4), but we specify empty_cameras=1 for smolvla, pi0, pi05
|
||||
(["camera1", "camera2", "camera3"], 1, {}, True),
|
||||
# case 2: dataset has 2 cameras with different names, rename_mapping provided
|
||||
(
|
||||
["top", "side"],
|
||||
0,
|
||||
{
|
||||
"observation.images.top": "observation.images.camera1",
|
||||
"observation.images.side": "observation.images.camera2",
|
||||
},
|
||||
True,
|
||||
),
|
||||
# case 3: dataset has 2 cameras, policy expects 3, names do not match, no empty_cameras
|
||||
(["top", "side"], 0, {}, False),
|
||||
# TODO: case 4: dataset has 2 cameras, policy expects 3, no rename_map, no empty_cameras, should raise for smolvla
|
||||
# (["camera1", "camera2"], 0, {}, False),
|
||||
],
|
||||
)
|
||||
def test_train_with_camera_mismatch(camera_keys, empty_cameras, rename_map, expect_success, tmp_path):
|
||||
"""Tests that training works or fails depending on camera/feature alignment."""
|
||||
|
||||
_dataset, root = make_dummy_dataset(camera_keys, tmp_path)
|
||||
pretrained_path = "lerobot/smolvla_base"
|
||||
dataset_config = DatasetConfig(repo_id=DUMMY_REPO_ID, root=root)
|
||||
policy_config = make_policy_config(
|
||||
"smolvla",
|
||||
optimizer_lr=0.01,
|
||||
push_to_hub=False,
|
||||
pretrained_path=pretrained_path,
|
||||
device=DEVICE,
|
||||
)
|
||||
policy_config.empty_cameras = empty_cameras
|
||||
train_config = TrainPipelineConfig(
|
||||
dataset=dataset_config,
|
||||
policy=policy_config,
|
||||
rename_map=rename_map,
|
||||
output_dir=tmp_path / "_output",
|
||||
steps=1,
|
||||
)
|
||||
train_config = custom_validate(train_config, policy_path=pretrained_path, empty_cameras=empty_cameras)
|
||||
# HACK: disable the internal CLI validation step for tests, we did it with custom_validate
|
||||
train_config.validate = lambda: None
|
||||
if expect_success:
|
||||
train(train_config)
|
||||
else:
|
||||
with pytest.raises(ValueError):
|
||||
train(train_config)
|
||||
Reference in New Issue
Block a user