From 7241f029c6fab6ea9cb556f2d1b40346c66fe1a2 Mon Sep 17 00:00:00 2001
From: Pepijn
Date: Tue, 9 Jun 2026 17:08:54 +0200
Subject: [PATCH] =?UTF-8?q?docs(streaming):=20A100/H100=20NVDEC=20cannot?=
 =?UTF-8?q?=20decode=20AV1=20=E2=80=94=20correct=20guidance?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

NVIDIA's decode support matrix: the compute GPUs A100 (GA100) and H100
(GH100) have no AV1 NVDEC decoder; only Ada (L4/L40/RTX40) and some
Ampere (A10/A40/A16) do. So on A100/H100 nodes, AV1 datasets must be
decoded on CPU or re-encoded to H.265/H.264 — no torchcodec build
enables cuda AV1 decode there. Also distinguish that error from
"Unsupported device: cuda (variant: ffmpeg)", which is a
torchcodec-built-without-CUDA issue. Update diagnose_decode.py message +
benchmark README accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 benchmarks/streaming/README.md          |  9 +++++++++
 benchmarks/streaming/diagnose_decode.py | 12 ++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/benchmarks/streaming/README.md b/benchmarks/streaming/README.md
index d598e555a..82d6dd4c1 100644
--- a/benchmarks/streaming/README.md
+++ b/benchmarks/streaming/README.md
@@ -55,6 +55,15 @@ limited number of concurrent decode sessions per GPU; if you hit session/IPC lim
 or compare against `--num_workers 0` (single-process NVDEC, which often saturates the decode engine on
 its own). Result files include the decode device in their name (`..._w6_cuda.json`).
 
+> **Codec ⇄ NVDEC compatibility (important).** NVDEC can only decode codecs its hardware supports. LeRobot
+> v3 datasets are often **AV1**-encoded, and the **A100 and H100 compute GPUs have no AV1 NVDEC decoder**
+> (per NVIDIA's [decode support matrix](https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new));
+> only Ada (L4/L40/RTX40) and a few Ampere cards (A10/A40/A16) do. On A100/H100, AV1 must be decoded on
+> **CPU**, or the dataset re-encoded to H.265/H.264 (which those GPUs' NVDEC do support). Run
+> `diagnose_decode.py --video_decode_device cuda` to check your exact node before relying on `cuda` decode.
+> A `cuda` torchcodec build also needs an FFmpeg with NVDEC; see
+> .
+
 Reference data root: bucket sources resolve through `--data_files_root hf://buckets//`
 (metadata still loads from `--repo_id`). The local `single`/`sarm` CPU baselines on this dataset were
 ~176 / ~212 frames/s/node at `--num_workers 3` (3 cameras, fps 20).
diff --git a/benchmarks/streaming/diagnose_decode.py b/benchmarks/streaming/diagnose_decode.py
index 087a91c6c..f84a1e1c6 100644
--- a/benchmarks/streaming/diagnose_decode.py
+++ b/benchmarks/streaming/diagnose_decode.py
@@ -95,10 +95,14 @@ def main() -> None:
         "(see the 'codec' line on a working machine). Then:\n"
         " - CPU decode needs an ffmpeg built with an AV1 decoder (libdav1d/libaom); a build without it "
         "reports 'No valid stream found'.\n"
-        " - GPU/NVDEC decode of AV1 requires an Ada-generation GPU or newer (RTX 40 / L4 / L40). "
-        "Ampere/Volta (A100/V100) NVDEC cannot decode AV1, so the decoder opens but yields 0 frames.\n"
-        "Fix: install an AV1-capable ffmpeg/torchcodec on the node (and use an Ada+ GPU for --video_"
-        "decode_device cuda), or re-encode the dataset to H.264/H.265.\n"
+        " - GPU/NVDEC decode of AV1 is only on AV1-capable NVDEC GPUs: Ada (L4/L40/RTX40) and some "
+        "Ampere (A10/A40/A16). The COMPUTE GPUs A100 and H100 have NO AV1 NVDEC decoder (per NVIDIA's "
+        "support matrix), so no torchcodec build enables cuda decode of AV1 on them.\n"
+        " - 'Unsupported device: cuda (variant: ffmpeg)' instead means torchcodec was built without "
+        "the CUDA backend; install a CUDA-enabled wheel (see README) — but on A100/H100 that still "
+        "won't decode AV1.\n"
+        "Fix: decode on CPU, run NVDEC on an Ada GPU, or re-encode the dataset to H.265/H.264 (which "
+        "A100/H100 NVDEC do support).\n"
         "If ftyp=False instead, the handle resolved to a placeholder/error page (auth, revision, or Xet "
         "resolution) rather than the video bytes."
     )
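
The GPU split described in this patch could also be expressed as a small lookup, for example to choose a decode device before opening a stream. The sketch below is illustrative only and is not part of `diagnose_decode.py`: the function names and regex patterns are hypothetical, and the model lists are copied from the commit message above rather than from NVIDIA's full support matrix.

```python
from __future__ import annotations

import re

# Models the patch text lists as lacking an AV1 NVDEC decoder.
_NO_AV1 = re.compile(r"\b(?:A100|H100)\b", re.IGNORECASE)

# Models the patch text lists as AV1-capable. The suffixed variants (A10G,
# L40S) and the "GeForce RTX 40xx" pattern are assumptions about how device
# names are spelled, not something stated in the patch.
_AV1 = re.compile(r"\b(?:L4|L40S?|A10G?|A40|A16|GeForce RTX 40\d\d)\b", re.IGNORECASE)


def av1_nvdec_support(gpu_name: str) -> bool | None:
    """Return True/False for GPUs named in the patch, None for anything else."""
    if _NO_AV1.search(gpu_name):
        return False
    if _AV1.search(gpu_name):
        return True
    return None  # not covered by the patch text; check NVIDIA's matrix


def suggested_av1_decode_device(gpu_name: str) -> str:
    """Pick "cuda" only when AV1 NVDEC support is known; otherwise fall back to "cpu"."""
    return "cuda" if av1_nvdec_support(gpu_name) else "cpu"


if __name__ == "__main__":
    for name in ("NVIDIA A100-SXM4-80GB", "NVIDIA L4", "Tesla V100-SXM2-16GB"):
        print(name, "->", suggested_av1_decode_device(name))
```

On a real node the name string could come from `torch.cuda.get_device_name(0)`. Unknown GPUs deliberately map to `cpu`, so a model missing from the list costs throughput instead of silently yielding zero frames.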