From 7241f029c6fab6ea9cb556f2d1b40346c66fe1a2 Mon Sep 17 00:00:00 2001
From: Pepijn <pepijn@huggingface.co>
Date: Tue, 9 Jun 2026 17:08:54 +0200
Subject: [PATCH] =?UTF-8?q?docs(streaming):=20A100/H100=20NVDEC=20cannot?=
 =?UTF-8?q?=20decode=20AV1=20=E2=80=94=20correct=20guidance?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per NVIDIA's decode support matrix, the compute GPUs A100 (GA100) and H100 (GH100) have no
AV1 NVDEC decoder; only Ada (L4/L40/RTX40) and some Ampere cards (A10/A40/A16) do. On
A100/H100 nodes, AV1 datasets must therefore be decoded on CPU or re-encoded to
H.265/H.264; no torchcodec build enables cuda AV1 decode there. Also distinguish that
failure from "Unsupported device: cuda (variant: ffmpeg)", which means torchcodec was built
without CUDA. Update the diagnose_decode.py message and the benchmark README accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 benchmarks/streaming/README.md          |  9 +++++++++
 benchmarks/streaming/diagnose_decode.py | 12 ++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/benchmarks/streaming/README.md b/benchmarks/streaming/README.md
index d598e555a..82d6dd4c1 100644
--- a/benchmarks/streaming/README.md
+++ b/benchmarks/streaming/README.md
@@ -55,6 +55,15 @@ limited number of concurrent decode sessions per GPU; if you hit session/IPC lim
 or compare against `--num_workers 0` (single-process NVDEC, which often saturates the decode engine on its
 own). Result files include the decode device in their name (`..._w6_cuda.json`).
 
+> **Codec ⇄ NVDEC compatibility (important).** NVDEC can only decode codecs its hardware supports. LeRobot
+> v3 datasets are often **AV1**-encoded, and the **A100 and H100 compute GPUs have no AV1 NVDEC decoder**
+> (per NVIDIA's [decode support matrix](https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new));
+> only Ada (L4/L40/RTX40) and a few Ampere cards (A10/A40/A16) do. On A100/H100, AV1 must be decoded on
+> **CPU**, or the dataset re-encoded to H.265/H.264 (which NVDEC on those GPUs does support). Run
+> `diagnose_decode.py --video_decode_device cuda` to check your exact node before relying on `cuda` decode.
+> A `cuda` torchcodec build also needs an FFmpeg with NVDEC; see
+> <https://github.com/meta-pytorch/torchcodec#installing-cuda-enabled-torchcodec>.
+
 Reference data root: bucket sources resolve through `--data_files_root hf://buckets/<owner>/<name>` (metadata
 still loads from `--repo_id`). The local `single`/`sarm` CPU baselines on this dataset were ~176 / ~212
 frames/s/node at `--num_workers 3` (3 cameras, fps 20).
diff --git a/benchmarks/streaming/diagnose_decode.py b/benchmarks/streaming/diagnose_decode.py
index 087a91c6c..f84a1e1c6 100644
--- a/benchmarks/streaming/diagnose_decode.py
+++ b/benchmarks/streaming/diagnose_decode.py
@@ -95,10 +95,14 @@ def main() -> None:
             "(see the 'codec' line on a working machine). Then:\n"
             "  - CPU decode needs an ffmpeg built with an AV1 decoder (libdav1d/libaom); a build without it "
             "reports 'No valid stream found'.\n"
-            "  - GPU/NVDEC decode of AV1 requires an Ada-generation GPU or newer (RTX 40 / L4 / L40). "
-            "Ampere/Volta (A100/V100) NVDEC cannot decode AV1, so the decoder opens but yields 0 frames.\n"
-            "Fix: install an AV1-capable ffmpeg/torchcodec on the node (and use an Ada+ GPU for --video_"
-            "decode_device cuda), or re-encode the dataset to H.264/H.265.\n"
+            "  - GPU/NVDEC decode of AV1 needs an AV1-capable NVDEC: Ada (L4/L40/RTX40) and some "
+            "Ampere (A10/A40/A16). The COMPUTE GPUs A100 and H100 have NO AV1 NVDEC decoder (per NVIDIA's "
+            "support matrix), so no torchcodec build enables cuda decode of AV1 on them.\n"
+            "  - 'Unsupported device: cuda (variant: ffmpeg)' instead means torchcodec was built without "
+            "the CUDA backend; install a CUDA-enabled wheel (see README) — but on A100/H100 that still "
+            "won't decode AV1.\n"
+            "Fix: decode on CPU, run NVDEC on an AV1-capable GPU, or re-encode the dataset to H.265/H.264 "
+            "(which A100/H100 NVDEC does support).\n"
             "If ftyp=False instead, the handle resolved to a placeholder/error page (auth, revision, or Xet "
             "resolution) rather than the video bytes."
         )