add various experiments for wavelet

2026-07-24 10:16:09 +00:00 · 2026-02-13 10:27:02 +00:00
parent fc8a388a25
commit 32fc5504cc
3 changed files with 584 additions and 7 deletions
@@ -0,0 +1,134 @@
+# Action tokenizer benchmark
+
+## Questions
+
+What is the trade-off between:
+
+- **Compression**: how many tokens are needed to represent an action chunk (e.g. horizon × action_dim floats)?
+- **Reconstruction quality**: how well does encode-then-decode preserve the original actions?
+- **Speed**: how long does encoding and decoding take per chunk?
+
+How to choose an action tokenizer?
+
+- Which tokenizer architecture (e.g. DCT + BPE)?
+- Which **action horizon** and **encoded dimensions** to use?
+- Which **normalization** (QUANTILES, MEAN_STD, MIN_MAX) and **delta transform** (relative vs absolute actions)?
+- How do reconstruction error and compression ratio vary across datasets and tokenizer settings?
+
+This benchmark loads action chunks from a LeRobot dataset using the same pipeline as `lerobot-train-tokenizer`, runs a trained action tokenizer in encode/decode mode, and reports reconstruction error, compression stats, and timing. Results are saved as JSON under `outputs/` for comparison and analysis.
+
+## Variables
+
+**Dataset & chunking**
+
+- **repo_id**: LeRobot dataset (e.g. `lerobot/pusht`). Action statistics and normalization are taken from the dataset metadata when available.
+- **action_horizon**: Number of future steps per action chunk (must match the tokenizer’s training).
+- **encoded_dims**: Dimension ranges to encode (e.g. `0:6` or `0:6,7:14`). Must match the tokenizer.
+- **max_episodes**: Cap on episodes to load (default: all).
+- **sample_fraction**: Fraction of chunks to sample per episode (default `0.2`) to keep runtime manageable.
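As an illustration of the `encoded_dims` format, here is a minimal sketch of how a spec such as `0:6,7:14` could be expanded into dimension indices. The helper name `parse_encoded_dims` is hypothetical, and the exclusive end index (Python slice convention) is an assumption, not something confirmed by the benchmark script.

```python
def parse_encoded_dims(spec: str) -> list[int]:
    """Expand a spec such as "0:6,7:14" into a flat list of dimension indices.

    Assumption: each range is "start:end" with an exclusive end, like a Python slice.
    """
    indices: list[int] = []
    for part in spec.split(","):
        start, end = (int(x) for x in part.strip().split(":"))
        if start >= end:
            raise ValueError(f"empty or reversed range: {part!r}")
        indices.extend(range(start, end))
    return indices


if __name__ == "__main__":
    dims = parse_encoded_dims("0:6,7:14")
    print(len(dims), dims)
```

Under that assumption, `0:6,7:14` selects 13 dimensions and skips index 6.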
+
+**Transform & normalization**
+
+- **normalization_mode**: `IDENTITY`, `MEAN_STD`, `MIN_MAX`, `QUANTILES`, `QUANTILE10`. Should match the tokenizer’s training.
+- **delta_dims**: Comma-separated dimension indices for delta (relative) transform.
+- **use_delta_transform**: Whether to convert actions so they are relative to the current state for the dimensions listed in `delta_dims`.
+- **state_key**: Dataset key for state (e.g. `observation.state`) used when applying delta transform.
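The delta transform and normalization modes can be sketched roughly as follows. This is an illustration only: the function names, the statistics keys (`mean`, `std`, `min`, `max`, `q01`, `q99`, `q10`, `q90`), and the choice of `[-1, 1]` as the target range are assumptions, and the actual pipeline may differ.

```python
EPS = 1e-8  # guards against division by zero for constant dimensions


def to_relative(chunk, state, delta_dims):
    """Subtract the current state from the selected dimensions of every step."""
    delta = set(delta_dims)
    return [
        [a - state[d] if d in delta else a for d, a in enumerate(step)]
        for step in chunk
    ]


def normalize(value, stats, mode):
    """Normalize one scalar using per-dimension statistics (hypothetical keys)."""
    if mode == "IDENTITY":
        return value
    if mode == "MEAN_STD":
        return (value - stats["mean"]) / max(stats["std"], EPS)
    if mode == "MIN_MAX":
        lo, hi = stats["min"], stats["max"]
    elif mode == "QUANTILES":
        lo, hi = stats["q01"], stats["q99"]
    elif mode == "QUANTILE10":
        lo, hi = stats["q10"], stats["q90"]
    else:
        raise ValueError(f"unknown normalization mode: {mode}")
    # Assumed convention: map [lo, hi] linearly onto [-1, 1].
    return 2.0 * (value - lo) / max(hi - lo, EPS) - 1.0
```

Whatever the exact formulas, the benchmark must apply the same transform and statistics the tokenizer saw during training.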
+
+**Tokenizer & evaluation**
+
+- **action_tokenizer_path**: Local path or Hugging Face Hub repo id of the trained tokenizer (e.g. `outputs/wavetoken`).
+- **max_chunks_for_reconstruction**: Max number of chunks to use for reconstruction and timing (default `500`) to limit runtime.
+
+### Main parameters
+
+| parameter                         | default                              | description                                     |
+| --------------------------------- | ------------------------------------ | ----------------------------------------------- |
+| **action_tokenizer_path**         | (required)                           | Path or Hub id of the trained action tokenizer. |
+| **repo_id**                       | (required)                           | LeRobot dataset repo id.                        |
+| **action_horizon**                | `10`                                 | Future steps per chunk.                         |
+| **encoded_dims**                  | `0:6`                                | Dimension ranges to encode (e.g. `0:6,7:14`).   |
+| **normalization_mode**            | `QUANTILES`                          | Normalization mode for actions.                 |
+| **max_episodes**                  | all                                  | Max episodes to load.                           |
+| **sample_fraction**               | `0.2`                                | Fraction of chunks sampled per episode.         |
+| **max_chunks_for_reconstruction** | `500`                                | Chunks used for reconstruction and timing.      |
+| **output_dir**                    | `outputs/action_tokenizer_benchmark` | Directory for results JSON.                     |
+
+## Metrics
+
+**Reconstruction (lower is better)**
+
+- **reconstruction_mae**: Mean absolute error between original and decoded action chunks.
+- **reconstruction_mse**: Mean squared error.
+- **reconstruction_rmse**: Root mean squared error.
+- **reconstruction_max_abs_error**: Maximum absolute error over all dimensions and samples.
+- **per_dimension_mae**: MAE per action dimension (list of length `action_dim`).
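For reference, the reconstruction metrics above can be computed as in the sketch below. It uses plain nested lists (chunk, then step, then dimension) to stay dependency-free; the function name is hypothetical, and the benchmark script itself may well use array operations instead.

```python
import math


def reconstruction_metrics(original, decoded):
    """Compare original and decoded chunks shaped [chunk][step][dimension]."""
    action_dim = len(original[0][0])
    per_dim_abs = [0.0] * action_dim  # running sum of |error| per dimension
    sq_sum = 0.0
    max_abs = 0.0
    n_steps = 0
    for chunk_o, chunk_d in zip(original, decoded):
        for step_o, step_d in zip(chunk_o, chunk_d):
            n_steps += 1
            for d in range(action_dim):
                err = abs(step_o[d] - step_d[d])
                per_dim_abs[d] += err
                sq_sum += err * err
                max_abs = max(max_abs, err)
    n_values = n_steps * action_dim
    mse = sq_sum / n_values
    return {
        "reconstruction_mae": sum(per_dim_abs) / n_values,
        "reconstruction_mse": mse,
        "reconstruction_rmse": math.sqrt(mse),
        "reconstruction_max_abs_error": max_abs,
        "per_dimension_mae": [s / n_steps for s in per_dim_abs],
    }
```

Comparing `per_dimension_mae` against the overall MAE helps spot a single dimension (for example a gripper channel) that dominates the error.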
+
+**Compression**
+
+- **compression_ratio**: Ratio of values per chunk to tokens per chunk, i.e. (action_horizon × action_dim) / mean number of tokens. Higher means more compression.
+- **mean_token_length**, **std_token_length**: Mean and standard deviation of token count per chunk.
+- **min_token_length**, **max_token_length**: Min and max token count.
+- **p50_token_length**, **p99_token_length**: 50th and 99th percentile token counts.
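A small sketch of how these compression statistics relate to a list of per-chunk token counts. The function name is hypothetical, and two details are assumptions: the population standard deviation and nearest-rank percentiles. The script may use a different estimator or interpolated percentiles.

```python
import math
import statistics


def compression_stats(token_lengths, action_horizon, action_dim):
    """Aggregate per-chunk token counts into the compression metrics."""
    lengths = sorted(token_lengths)
    mean_len = statistics.fmean(lengths)

    def percentile(p):
        # Nearest-rank percentile (assumption; no interpolation).
        rank = max(1, math.ceil(p / 100 * len(lengths)))
        return lengths[rank - 1]

    return {
        "compression_ratio": action_horizon * action_dim / mean_len,
        "mean_token_length": mean_len,
        "std_token_length": statistics.pstdev(lengths),
        "min_token_length": lengths[0],
        "max_token_length": lengths[-1],
        "p50_token_length": percentile(50),
        "p99_token_length": percentile(99),
    }
```

For example, with `action_horizon=10` and 6 encoded dimensions, a mean of 20 tokens per chunk corresponds to a compression ratio of 60 / 20 = 3.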
+
+**Timing (seconds per chunk)**
+
+- **mean_encode_time_sec**: Mean time to encode one chunk.
+- **mean_decode_time_sec**: Mean time to decode one chunk.
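Per-chunk timing can be measured along these lines. The `toy_encode` and `toy_decode` functions are hypothetical stand-ins for a real tokenizer (they simply quantize to two decimals), and `timed_map` is an illustrative helper rather than the script's own code.

```python
import time


def timed_map(fn, inputs):
    """Apply fn to each input; return the outputs and the mean seconds per call."""
    outputs, durations = [], []
    for item in inputs:
        start = time.perf_counter()
        outputs.append(fn(item))
        durations.append(time.perf_counter() - start)
    return outputs, sum(durations) / len(durations)


ACTION_DIM = 2  # toy value for this example only


def toy_encode(chunk):
    """Stand-in encoder: one integer token per value."""
    return [round(v * 100) for step in chunk for v in step]


def toy_decode(tokens):
    """Stand-in decoder: regroup tokens into steps of ACTION_DIM values."""
    return [
        [t / 100 for t in tokens[i:i + ACTION_DIM]]
        for i in range(0, len(tokens), ACTION_DIM)
    ]


chunks = [[[0.25, 0.5], [0.75, 1.0]] for _ in range(3)]
tokens, mean_encode_time_sec = timed_map(toy_encode, chunks)
decoded, mean_decode_time_sec = timed_map(toy_decode, tokens)
```

`time.perf_counter` is a monotonic, high-resolution clock, which makes it a better fit for short intervals than `time.time`.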
+
+The JSON output also includes **num_chunks_evaluated** and **total_chunks_available** for context.
+
+## How the benchmark works
+
+1. **Load dataset**: The LeRobot dataset is loaded for the given `repo_id` and `root`.
+2. **Build action chunks**: For each episode (up to `max_episodes`), action chunks are built with the same logic as `lerobot-train-tokenizer`: sliding window of length `action_horizon`, optional delta transform, and per-episode sampling with `sample_fraction`.
+3. **Extract and normalize**: Only `encoded_dims` are kept. Normalization is applied using the dataset’s action stats when available, according to `normalization_mode`.
+4. **Encode / decode**: A random sample of chunks (size `max_chunks_for_reconstruction`) is encoded and then decoded with the tokenizer. Encode and decode times are recorded per chunk.
+5. **Compute metrics**: Reconstruction metrics are computed between original and decoded chunks; compression and timing stats are aggregated.
+6. **Save results**: A JSON file named `{timestamp}_{repo_id}_action_tokenizer_results.json` is written to `output_dir`, containing the full config and all metrics. As the example file name further down shows, the `/` in `repo_id` becomes `_`.
+
+The pipeline (chunking, dimensions, normalization, delta) must match how the tokenizer was trained; otherwise reconstruction error can be large or the tokenizer may raise an error.
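Step 2 (sliding window plus per-episode sampling) can be sketched as below. The function name is hypothetical, and the sketch assumes that windows running past the end of an episode are dropped; the real chunking logic in `lerobot-train-tokenizer` may pad them instead.

```python
import random


def build_chunks(episode_actions, action_horizon, sample_fraction, rng):
    """Slide a window of length action_horizon over one episode, then subsample.

    Assumption: incomplete windows at the end of the episode are dropped.
    """
    n_windows = len(episode_actions) - action_horizon + 1
    if n_windows <= 0:
        return []  # episode shorter than one chunk
    n_keep = max(1, int(n_windows * sample_fraction))
    starts = sorted(rng.sample(range(n_windows), n_keep))
    return [episode_actions[s:s + action_horizon] for s in starts]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled chunks are reproducible
    episode = [[float(t), float(-t)] for t in range(14)]  # 14 steps, 2 dims
    sampled = build_chunks(episode, action_horizon=10, sample_fraction=0.2, rng=rng)
    print(len(sampled), len(sampled[0]))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible across runs, which matters when comparing tokenizers on the same chunks.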
+
+## Caveats
+
+- The tokenizer’s **action_horizon** and **action_dim** (and optionally DCT settings) are fixed at training time. The benchmark infers dimensions from the dataset and encoded dims; the tokenizer path must correspond to a model trained with the same horizon and encoded dimensions.
+- Reconstruction is evaluated in **normalized space** (the same space the tokenizer sees). To interpret errors in raw action space, invert the normalization outside this script.
+- Only one tokenizer and one dataset are evaluated per run. To compare tokenizers or datasets, run the script multiple times and compare the saved JSON files.
+
+## Example
+
+Quick run with a local tokenizer and a small number of episodes:
+
+```bash
+python benchmarks/tokens/run_action_tokenizer_benchmark.py \
+    --action-tokenizer-path=outputs/wavetoken \
+    --repo-id=lerobot/pusht \
+    --action-horizon=10 \
+    --max-episodes=50 \
+    --output-dir=outputs/action_tokenizer_benchmark
+```
+
+With delta transform and custom encoded dimensions:
+
+```bash
+python benchmarks/tokens/run_action_tokenizer_benchmark.py \
+    --action-tokenizer-path=outputs/wavetoken \
+    --repo-id=lerobot/pusht \
+    --action-horizon=10 \
+    --encoded-dims=0:6,7:14 \
+    --delta-dims=0,1,2,3,4,5 \
+    --use-delta-transform \
+    --normalization-mode=QUANTILES \
+    --max-chunks-for-reconstruction=500 \
+    --output-dir=outputs/action_tokenizer_benchmark
+```
+
+Results are written to a file such as `outputs/action_tokenizer_benchmark/2026-02-12_14-30-00_lerobot_pusht_action_tokenizer_results.json`.
+
+## Results
+
+Results are stored as JSON in the directory given by `--output-dir` (default: `outputs/action_tokenizer_benchmark`). Each file contains:
+
+- **config**: All script arguments (tokenizer path, repo_id, action_horizon, encoded_dims, normalization_mode, etc.) for reproducibility.
+- **metrics**: All reconstruction, compression, and timing metrics described above.
+
+To compare runs, load and diff or aggregate these JSON files with your own scripts or notebooks.
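One possible way to aggregate several result files is sketched here. It assumes the top-level JSON keys are literally `config` and `metrics` and that metric names match those listed under Metrics; `summarize_runs` is a hypothetical helper, not part of the benchmark.

```python
import json
from pathlib import Path

DEFAULT_KEYS = ("reconstruction_mae", "compression_ratio", "mean_encode_time_sec")


def summarize_runs(output_dir, keys=DEFAULT_KEYS):
    """Collect selected metrics from every results file in output_dir."""
    rows = []
    for path in sorted(Path(output_dir).glob("*_action_tokenizer_results.json")):
        with path.open() as f:
            result = json.load(f)
        row = {
            "file": path.name,
            "tokenizer": result.get("config", {}).get("action_tokenizer_path"),
        }
        # Missing metrics become None instead of raising, so old files still load.
        row.update({k: result.get("metrics", {}).get(k) for k in keys})
        rows.append(row)
    return rows

# Example call (directory name taken from the default --output-dir):
#   for row in summarize_runs("outputs/action_tokenizer_benchmark"):
#       print(row)
```

Because file names start with a timestamp, sorting by name lists the runs in chronological order.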