The generic Dockerfile.benchmark used a plain `uv pip install ".[libero_plus]"`,
which succeeds but leaves `libero` unimportable because of an upstream
LIBERO-plus packaging bug. Port the dedicated clone + .pth workaround from
Dockerfile.eval-libero-plus so that
`docker build --build-arg BENCHMARK=libero_plus` produces working containers.
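For context, a .pth workaround relies on Python's `site` module, which
appends each existing directory listed in a `.pth` file to `sys.path`. The
sketch below reproduces that mechanism in a temporary directory. The module
name `fake_libero` and the directory layout are hypothetical stand-ins, not
the paths used in the Dockerfile.

```python
import importlib
import site
import tempfile
from pathlib import Path


def demo_pth_import() -> str:
    """Make a cloned source tree importable through a .pth file."""
    root = Path(tempfile.mkdtemp())

    # Hypothetical stand-in for the cloned repository checkout.
    clone_dir = root / "clone"
    clone_dir.mkdir()
    (clone_dir / "fake_libero.py").write_text("NAME = 'fake_libero'\n")

    # Hypothetical stand-in for site-packages. Each line of a .pth file
    # names one directory to append to sys.path.
    site_dir = root / "site-packages"
    site_dir.mkdir()
    (site_dir / "fake_libero.pth").write_text(f"{clone_dir}\n")

    # addsitedir() adds site_dir to sys.path and processes its .pth files.
    site.addsitedir(str(site_dir))
    importlib.invalidate_caches()

    module = importlib.import_module("fake_libero")
    return module.NAME
```

In a real image the `.pth` file would sit in the environment's own
site-packages directory, so the interpreter picks it up at startup without
an explicit `addsitedir()` call.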
Also fix the eval worker, which called a nonexistent `parser.parse()`; it now
uses `draccus.parse()`.
Made-with: Cursor
Add Dockerfile.benchmark (parameterized via ARG BENCHMARK), a
docker-compose.benchmark.yml with services for libero, libero_plus,
robomme, and robocasa, and a smoke_test_benchmark.sh that verifies
imports and CLI entry-points in each container.
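The two kinds of check named above can be sketched in Python as follows.
This is an illustration only, not the contents of smoke_test_benchmark.sh;
the function name `smoke_check` and its arguments are assumptions.

```python
import importlib.util
import shutil


def smoke_check(modules: list[str], commands: list[str]) -> list[str]:
    """Return human-readable failures; an empty list means all checks passed."""
    failures = []
    for name in modules:
        # find_spec() returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            failures.append(f"module not found: {name}")
    for cmd in commands:
        # shutil.which() returns None when the executable is not on PATH.
        if shutil.which(cmd) is None:
            failures.append(f"CLI entry-point not found: {cmd}")
    return failures
```

Inside a container, calling it with the benchmark's module name (for
instance `libero`) and whichever CLI names the image is expected to expose
would return an empty list on success. Note that `find_spec()` only locates
a module; a stricter check would import it as well.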
Also add the missing `robocasa` optional dep group to pyproject.toml
(the docs already referenced `pip install ".[robocasa]"` but the group
was not defined).
Build a specific benchmark image:

    docker build --build-arg BENCHMARK=robomme \
        -f docker/Dockerfile.benchmark -t lerobot-benchmark-robomme .

Build all via compose:

    docker compose -f docker/docker-compose.benchmark.yml build

Smoke-test inside a container:

    docker compose -f docker/docker-compose.benchmark.yml run --rm robomme \
        bash docker/smoke_test_benchmark.sh

Co-Authored-By: Claude <noreply@anthropic.com>