# Action tokenizer benchmark

## Questions

What is the trade-off between:
- Compression: how many tokens are needed to represent an action chunk (e.g. horizon × action_dim floats)?
- Reconstruction quality: how well does encode-then-decode preserve the original actions?
- Speed: how long does encoding and decoding take per chunk?
How to choose an action tokenizer?
- Which tokenizer architecture (e.g. DCT + BPE)?
- Which action horizon and encoded dimensions to use?
- Which normalization (QUANTILES, MEAN_STD, MIN_MAX) and delta transform (relative vs absolute actions)?
- How do reconstruction error and compression ratio vary across datasets and tokenizer settings?
This benchmark loads action chunks from a LeRobot dataset using the same pipeline as `lerobot-train-tokenizer`, runs a trained action tokenizer in encode/decode mode, and reports reconstruction error, compression stats, and timing. Results are saved as JSON under `outputs/` for comparison and analysis.
## Variables

### Dataset & chunking

- `repo_id`: LeRobot dataset (e.g. `lerobot/pusht`). Action statistics and normalization are taken from the dataset metadata when available.
- `action_horizon`: Number of future steps per action chunk (must match the tokenizer's training).
- `encoded_dims`: Dimension ranges to encode (e.g. `0:6` or `0:6,7:14`). Must match the tokenizer.
- `max_episodes`: Cap on episodes to load (default: all).
- `sample_fraction`: Fraction of chunks to sample per episode (default `0.2`) to keep runtime manageable (see the chunking sketch below).
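To make the chunking concrete, here is a minimal sketch of the sliding-window construction over one episode's actions, with per-episode sampling applied afterwards; the helper name and signature are illustrative, not the script's actual code:

```python
import numpy as np

def build_action_chunks(episode_actions, action_horizon, sample_fraction=0.2, rng=None):
    """episode_actions: (episode_len, action_dim) array for a single episode.

    Returns a random subset of all length-`action_horizon` sliding windows.
    """
    rng = rng or np.random.default_rng()
    num_chunks = len(episode_actions) - action_horizon + 1
    if num_chunks <= 0:
        return np.empty((0, action_horizon, episode_actions.shape[1]))
    chunks = np.stack(
        [episode_actions[t : t + action_horizon] for t in range(num_chunks)]
    )
    # Keep roughly `sample_fraction` of the chunks to bound runtime.
    keep = rng.random(num_chunks) < sample_fraction
    return chunks[keep]
```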
### Transform & normalization

- `normalization_mode`: `IDENTITY`, `MEAN_STD`, `MIN_MAX`, `QUANTILES`, or `QUANTILE10`. Should match the tokenizer's training.
- `delta_dims`: Comma-separated dimension indices for the delta (relative) transform.
- `use_delta_transform`: Whether to convert actions to deltas relative to the current state for those dimensions (see the sketch below).
- `state_key`: Dataset key for the state (e.g. `observation.state`) used when applying the delta transform.
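For illustration, a minimal sketch of the delta transform as described above, assuming actions and the state share indexing for the dimensions in `delta_dims`; the helper is hypothetical and not the script's actual implementation:

```python
import numpy as np

def apply_delta_transform(action_chunk, current_state, delta_dims):
    """Convert selected action dimensions to deltas relative to the current state.

    action_chunk: (action_horizon, action_dim) array of future actions.
    current_state: (state_dim,) array aligned with the action dimensions in delta_dims.
    delta_dims: iterable of dimension indices to convert to relative values.
    """
    chunk = np.asarray(action_chunk, dtype=np.float64).copy()
    for d in delta_dims:
        # Each future step becomes an offset from the state at chunk start.
        chunk[:, d] -= current_state[d]
    return chunk
```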
### Tokenizer & evaluation

- `action_tokenizer_path`: Path or Hugging Face repo id of the trained tokenizer (e.g. `outputs/wavetoken`).
- `max_chunks_for_reconstruction`: Max number of chunks to use for reconstruction and timing (default `500`) to limit runtime.
## Main parameters

| parameter | default | description |
|---|---|---|
| `action_tokenizer_path` | (required) | Path or Hub id of the trained action tokenizer. |
| `repo_id` | (required) | LeRobot dataset repo id. |
| `action_horizon` | `10` | Future steps per chunk. |
| `encoded_dims` | `0:6` | Dimension ranges to encode (e.g. `0:6,7:14`). |
| `normalization_mode` | `QUANTILES` | Normalization mode for actions. |
| `max_episodes` | all | Max episodes to load. |
| `sample_fraction` | `0.2` | Fraction of chunks sampled per episode. |
| `max_chunks_for_reconstruction` | `500` | Chunks used for reconstruction and timing. |
| `output_dir` | `outputs/action_tokenizer_benchmark` | Directory for results JSON. |
## Metrics

### Reconstruction (lower is better)

- `reconstruction_mae`: Mean absolute error between original and decoded action chunks.
- `reconstruction_mse`: Mean squared error.
- `reconstruction_rmse`: Root mean squared error.
- `reconstruction_max_abs_error`: Maximum absolute error over all dimensions and samples.
- `per_dimension_mae`: MAE per action dimension (list of length `action_dim`).
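As a reference for how these quantities relate, a minimal sketch of computing them with NumPy over a batch of original and decoded chunks; the actual script may aggregate slightly differently:

```python
import numpy as np

def reconstruction_metrics(original, decoded):
    """original, decoded: (num_chunks, action_horizon, action_dim) arrays."""
    err = decoded - original
    abs_err = np.abs(err)
    return {
        "reconstruction_mae": float(abs_err.mean()),
        "reconstruction_mse": float((err ** 2).mean()),
        "reconstruction_rmse": float(np.sqrt((err ** 2).mean())),
        "reconstruction_max_abs_error": float(abs_err.max()),
        # Average over chunks and horizon steps, keeping one value per action dimension.
        "per_dimension_mae": abs_err.mean(axis=(0, 1)).tolist(),
    }
```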
### Compression

- `compression_ratio`: (action_horizon × action_dim) / mean number of tokens per chunk. Higher means more compression.
- `mean_token_length`, `std_token_length`: Mean and standard deviation of token count per chunk.
- `min_token_length`, `max_token_length`: Min and max token count.
- `p50_token_length`, `p99_token_length`: 50th and 99th percentile token counts.
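A small sketch of how the token-length statistics and compression ratio can be derived from the per-chunk token sequences (the helper name is illustrative):

```python
import numpy as np

def compression_stats(token_sequences, action_horizon, action_dim):
    """token_sequences: list of per-chunk token id lists produced by the tokenizer."""
    lengths = np.array([len(t) for t in token_sequences], dtype=np.float64)
    return {
        # Original floats per chunk divided by the average number of tokens.
        "compression_ratio": float(action_horizon * action_dim / lengths.mean()),
        "mean_token_length": float(lengths.mean()),
        "std_token_length": float(lengths.std()),
        "min_token_length": int(lengths.min()),
        "max_token_length": int(lengths.max()),
        "p50_token_length": float(np.percentile(lengths, 50)),
        "p99_token_length": float(np.percentile(lengths, 99)),
    }
```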
### Timing (seconds per chunk)

- `mean_encode_time_sec`: Mean time to encode one chunk.
- `mean_decode_time_sec`: Mean time to decode one chunk.

The JSON output also includes `num_chunks_evaluated` and `total_chunks_available` for context.
## How the benchmark works

- Load dataset: the LeRobot dataset is loaded for the given `repo_id` and `root`.
- Build action chunks: for each episode (up to `max_episodes`), action chunks are built with the same logic as `lerobot-train-tokenizer`: a sliding window of length `action_horizon`, an optional delta transform, and per-episode sampling with `sample_fraction`.
- Extract and normalize: only `encoded_dims` are kept. Normalization is applied using the dataset's action stats when available, according to `normalization_mode`.
- Encode / decode: a random sample of chunks (of size `max_chunks_for_reconstruction`) is encoded and then decoded with the tokenizer. Encode and decode times are recorded per chunk.
- Compute metrics: reconstruction metrics are computed between original and decoded chunks; compression and timing stats are aggregated.
- Save results: a JSON file is written to `output_dir` with the name `{timestamp}_{repo_id}_action_tokenizer_results.json`, containing the full config and all metrics.
The pipeline (chunking, dimensions, normalization, delta transform) must match how the tokenizer was trained; otherwise reconstruction error can be large or the tokenizer may raise an error.
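The encode/decode step can be pictured roughly as below. The per-chunk `encode()`/`decode()` calls are assumptions for illustration, not the actual API of the trained tokenizer:

```python
import time
import numpy as np

def benchmark_encode_decode(tokenizer, chunks):
    """chunks: (num_chunks, action_horizon, action_dim) array of normalized chunks.

    Returns decoded chunks plus per-chunk token sequences and mean timings.
    Assumes the tokenizer exposes per-chunk encode()/decode() methods.
    """
    tokens, decoded, encode_times, decode_times = [], [], [], []
    for chunk in chunks:
        t0 = time.perf_counter()
        ids = tokenizer.encode(chunk)          # hypothetical per-chunk encode
        encode_times.append(time.perf_counter() - t0)

        t0 = time.perf_counter()
        decoded.append(tokenizer.decode(ids))  # hypothetical per-chunk decode
        decode_times.append(time.perf_counter() - t0)
        tokens.append(ids)

    return {
        "decoded": np.stack(decoded),
        "tokens": tokens,
        "mean_encode_time_sec": float(np.mean(encode_times)),
        "mean_decode_time_sec": float(np.mean(decode_times)),
    }
```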
## Caveats

- The tokenizer's `action_horizon` and `action_dim` (and optionally DCT settings) are fixed at training time. The benchmark infers dimensions from the dataset and encoded dims; the tokenizer path must correspond to a model trained with the same horizon and encoded dimensions.
- Reconstruction is evaluated in normalized space (the same space the tokenizer sees). To interpret errors in raw action space, you would need to invert the normalization outside this script (see the sketch after this list).
- Only one tokenizer and one dataset are evaluated per run. To compare tokenizers or datasets, run the script multiple times and compare the saved JSON files.
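As an illustration of what inverting the normalization involves, here is a minimal sketch for the `MEAN_STD` case using the dataset's action statistics; the stats layout is an assumption, and the quantile-based modes would need the corresponding quantile values instead:

```python
import numpy as np

def unnormalize_mean_std(normalized_chunk, action_mean, action_std):
    """Map a MEAN_STD-normalized chunk back to raw action space.

    normalized_chunk: (action_horizon, action_dim) array in normalized space.
    action_mean, action_std: (action_dim,) arrays from the dataset's action stats.
    """
    return np.asarray(normalized_chunk) * np.asarray(action_std) + np.asarray(action_mean)
```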
## Example

Quick run with a local tokenizer and a small number of episodes:

```bash
python benchmarks/tokens/run_action_tokenizer_benchmark.py \
    --action-tokenizer-path=outputs/wavetoken \
    --repo-id=lerobot/pusht \
    --action-horizon=10 \
    --max-episodes=50 \
    --output-dir=outputs/action_tokenizer_benchmark
```
With delta transform and custom encoded dimensions:

```bash
python benchmarks/tokens/run_action_tokenizer_benchmark.py \
    --action-tokenizer-path=outputs/wavetoken \
    --repo-id=lerobot/pusht \
    --action-horizon=10 \
    --encoded-dims=0:6,7:14 \
    --delta-dims=0,1,2,3,4,5 \
    --use-delta-transform \
    --normalization-mode=QUANTILES \
    --max-chunks-for-reconstruction=500 \
    --output-dir=outputs/action_tokenizer_benchmark
```
Results are written to e.g. `outputs/action_tokenizer_benchmark/2026-02-12_14-30-00_lerobot_pusht_action_tokenizer_results.json`.
## Results

Results are stored as JSON in the directory given by `--output-dir` (default: `outputs/action_tokenizer_benchmark`). Each file contains:

- `config`: All script arguments (tokenizer path, `repo_id`, `action_horizon`, `encoded_dims`, `normalization_mode`, etc.) for reproducibility.
- `metrics`: All reconstruction, compression, and timing metrics described above.

To compare runs, load and diff or aggregate these JSON files with your own scripts or notebooks.
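For example, a small sketch that collects the saved JSON files and prints one comparison line per run; it assumes the default output directory and the `config`/`metrics` keys described above:

```python
import json
from pathlib import Path

def collect_results(results_dir="outputs/action_tokenizer_benchmark"):
    """Load every benchmark JSON and print a compact comparison line per run."""
    for path in sorted(Path(results_dir).glob("*_action_tokenizer_results.json")):
        data = json.loads(path.read_text())
        cfg, metrics = data["config"], data["metrics"]
        print(
            f"{path.name}: repo_id={cfg['repo_id']} "
            f"mae={metrics['reconstruction_mae']:.4f} "
            f"compression={metrics['compression_ratio']:.2f} "
            f"encode={metrics['mean_encode_time_sec'] * 1e3:.2f} ms"
        )

if __name__ == "__main__":
    collect_results()
```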