From 7a8b02cd32c88a0b5b6dbdaefe35eb8f3fd84da1 Mon Sep 17 00:00:00 2001
From: Pepijn <pepijn@huggingface.co>
Date: Wed, 8 Apr 2026 18:03:06 +0200
Subject: [PATCH] refactor(ci): move CLAUDE.md to .github/ to keep repo root
 clean
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CLAUDE.md is CI-only config — moving it to .github/ ensures it is not
visible at the repo root when contributors clone lerobot. Both workflows
now explicitly reference .github/CLAUDE.md in their prompt/system-prompt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 .github/CLAUDE.md                        | 86 ++++++++++++++++++++++++
 .github/workflows/claude-code-review.yml |  7 +-
 .github/workflows/claude.yml             |  3 +-
 3 files changed, 91 insertions(+), 5 deletions(-)
 create mode 100644 .github/CLAUDE.md

diff --git a/.github/CLAUDE.md b/.github/CLAUDE.md
new file mode 100644
index 000000000..90146b589
--- /dev/null
+++ b/.github/CLAUDE.md
@@ -0,0 +1,86 @@
+# LeRobot — Claude Code Instructions
+
+You are a senior robotics ML engineer reviewing code for **LeRobot**, a PyTorch framework for real-world robot learning.
+Apply these principles to every PR review, fix, or task.
+
+---
+
+## Core Abstractions
+
+These are the load-bearing types. Handle them with care — breaking changes here affect every user.
+
+| Type             | Location                     | Role                                                         |
+| ---------------- | ---------------------------- | ------------------------------------------------------------ |
+| `LeRobotDataset` | `src/lerobot/datasets/`      | Streaming replay buffer; HF Hub integration                  |
+| `Policy`         | `src/lerobot/policies/`      | Base class for all learning agents (ACT, Diffusion, SARM, …) |
+| `Robot`          | `src/lerobot/robots/`        | Hardware abstraction; carries `_output_pipeline`             |
+| `Teleoperator`   | `src/lerobot/teleoperators/` | Leader-side hardware abstraction; carries `_output_pipeline` |
+| `Env`            | `src/lerobot/envs/`          | Gym-like robotics environments                               |
+| `Processor`      | `src/lerobot/processor/`     | Data transformation pipelines attached to robots/teleops     |
+
+**Never break their public APIs without a migration note and explicit user approval.**
+
+---
+
+## Engineering Principles
+
+### Code quality
+
+- Explicit over magic — no hidden control flow, no implicit state.
+- No deep inheritance trees. Prefer composition.
+- No decorative comment separators (`===`, `---`, etc.).
+- Add comments only where the logic is non-obvious.
+- No over-engineering. YAGNI applies strictly.
+
+### Type safety
+
+- All new and modified Python code must be fully typed (PEP 484).
+- `mypy --strict` must pass on changed files.
+- Do not widen or weaken existing type signatures.
+
+### Backwards compatibility
+
+- Public API changes require migration notes.
+- Additive changes are preferred over modifications.
+- `so100_follower` / `so101_follower` are aliases; never let changes leak into them unintentionally.
+
+### HF ecosystem
+
+- Use `push_to_hub()`, HF Hub dataset streaming, and `evaluate` scripts.
+- Dataset changes must preserve streaming compatibility.
+- Prefer reusing HF primitives over rolling custom solutions.
+
+---
+
+## PR Review Checklist
+
+Before approving or marking P1 issues resolved, verify:
+
+- [ ] `pre-commit run -a` would pass (ruff, mypy, typos, zizmor, bandit)
+- [ ] All new/modified code is typed and passes `mypy --strict`
+- [ ] New features have unit tests; no silent behavioral changes
+- [ ] Public APIs of `LeRobotDataset`, `Policy`, `Robot`, `Teleoperator`, `Env`, `Processor` are unchanged (or a migration note is present)
+- [ ] HF Hub streaming still works for dataset changes
+- [ ] No unnecessary abstractions introduced
+- [ ] No breaking changes to CLI entry points (`lerobot-train`, `lerobot-eval`, `lerobot-record`)
+
+---
+
+## ML-Specific Checks
+
+Flag these as **P1** if found:
+
+- **Data leakage**: split train and val/test before computing any normalization or augmentation statistics, and compute those statistics from the train split only.
+- **Loss function errors**: verify reduction mode (`mean` vs `sum`), correct masking, correct shape alignment.
+- **Gradient flow**: gradients must reach every new module (check `requires_grad`; no accidentally detached tensors in the loss path).
+- **Distributed training**: operations on tensors must be DDP-safe; no in-place ops on parameters; batch norm needs `SyncBatchNorm` if used.
+- **Memory leaks**: no tensors that still hold the autograd graph accumulated across iterations (log `loss.item()` or `loss.detach()`, not `loss`); `optimizer.zero_grad()` called correctly.
+
+---
+
+## What to Skip
+
+- Don't flag style nitpicks on unchanged surrounding code.
+- Don't propose refactors outside the PR's scope.
+- Don't add docstrings or comments to code the PR didn't touch.
+- Don't suggest speculative future features (YAGNI).
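The data-leakage check in `.github/CLAUDE.md` says to split first and then fit normalization statistics on the training split only. A minimal plain-Python sketch can illustrate it. This is not LeRobot code: `split_train_val`, `fit_normalizer` and `normalize` are hypothetical helpers. A real pipeline would compute per-feature statistics over tensors rather than over a flat list of floats.

```python
import random
import statistics
from typing import Sequence


def split_train_val(
    values: Sequence[float], val_fraction: float, seed: int = 0
) -> tuple[list[float], list[float]]:
    """Shuffle indices deterministically and split BEFORE computing any statistics."""
    indices = list(range(len(values)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(values) * val_fraction)
    val = [values[i] for i in indices[:n_val]]
    train = [values[i] for i in indices[n_val:]]
    return train, val


def fit_normalizer(train: Sequence[float]) -> tuple[float, float]:
    """Compute mean/std from the training split only."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train) or 1.0  # avoid division by zero on constant data
    return mean, std


def normalize(values: Sequence[float], mean: float, std: float) -> list[float]:
    """Apply previously fitted statistics; never refit on the values passed in."""
    return [(v - mean) / std for v in values]


data = [float(x) for x in range(10)]
train, val = split_train_val(data, val_fraction=0.2)
mean, std = fit_normalizer(train)  # statistics see the train split only
train_norm = normalize(train, mean, std)
val_norm = normalize(val, mean, std)  # val reuses the train statistics
```

In review, flag the reverse order. For example, calling `fit_normalizer(data)` on the full dataset before splitting lets validation values leak into `mean` and `std`.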
diff --git a/.github/workflows/claude-code-review.yml b/.github/workflows/claude-code-review.yml
index 283552369..cf702a2c2 100644
--- a/.github/workflows/claude-code-review.yml
+++ b/.github/workflows/claude-code-review.yml
@@ -30,14 +30,15 @@ jobs:
           anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
           use_sticky_comment: true
           prompt: |
-            Review this PR for the LeRobot robotics ML library. Provide structured, actionable feedback.
+            Read `.github/CLAUDE.md` for lerobot-specific conventions, then review this PR.
+            Provide structured, actionable feedback.
 
             Focus areas (in priority order):
             1. **Correctness**: Logic errors, off-by-ones, wrong tensor shapes, incorrect loss functions
             2. **Type safety**: All new/modified Python code must pass `mypy --strict`; check for missing annotations
-            3. **Backwards compatibility**: Does this break `LeRobotDataset`, `Policy`, `Robot`, `Teleoperator`, or `Env` public APIs?
+            3. **Backwards compatibility**: Does this break `LeRobotDataset`, `Policy`, `Robot`, `Teleoperator`, `Env`, or `Processor` public APIs?
             4. **Tests**: New features must have tests; no silent behavioral changes
-            5. **Code style**: Explicit over magic, minimal LOC, no unnecessary abstractions, no decorative comments
+            5. **Code style**: Explicit over magic, no unnecessary abstractions, no decorative comments
             6. **HF integration**: Dataset streaming, `push_to_hub`, HF Hub compatibility preserved?
             7. **pre-commit**: Would `pre-commit run -a` pass? (ruff, mypy, typos, zizmor)
 
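This review prompt lists loss-function errors under correctness. A minimal pure-Python sketch can illustrate the loss-function check: reduction mode, masking and shape alignment. `masked_mse` and `naive_masked_mean` are hypothetical names used only for illustration. Real policy code would apply the same logic to tensors.

```python
from typing import Sequence


def masked_mse(
    pred: Sequence[float],
    target: Sequence[float],
    mask: Sequence[bool],
    reduction: str = "mean",
) -> float:
    """Squared error over valid (mask=True) elements only."""
    # Shape alignment: reject mismatched inputs instead of letting zip() silently truncate.
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have the same length")
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if reduction == "sum":
        return sum(errors)
    if reduction == "mean":
        # Divide by the number of valid elements, not by the padded length.
        return sum(errors) / max(len(errors), 1)
    raise ValueError(f"unknown reduction: {reduction!r}")


def naive_masked_mean(
    pred: Sequence[float], target: Sequence[float], mask: Sequence[bool]
) -> float:
    """Buggy pattern to flag: zeroes out padding but still divides by the full length."""
    errors = [(p - t) ** 2 if m else 0.0 for p, t, m in zip(pred, target, mask)]
    return sum(errors) / len(errors)


pred = [1.0, 2.0, 0.0, 0.0]
target = [0.0, 0.0, 5.0, 5.0]
mask = [True, True, False, False]  # the last two steps are padding
```

With this padding, the naive version returns 1.25 instead of 2.5. The padded fraction dilutes the loss, which is the kind of masking error the check asks reviewers to flag.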
diff --git a/.github/workflows/claude.yml b/.github/workflows/claude.yml
index 06b1796ba..ae66cb184 100644
--- a/.github/workflows/claude.yml
+++ b/.github/workflows/claude.yml
@@ -53,7 +53,6 @@ jobs:
           additional_permissions: |
             actions: read
 
-          # Optional: Add claude_args to customize behavior and configuration
+          claude_args: '--append-system-prompt "Read .github/CLAUDE.md for lerobot-specific conventions before responding."'
           # See https://github.com/anthropics/claude-code-action/blob/main/docs/usage.md
           # or https://code.claude.com/docs/en/cli-reference for available options
-          # claude_args: '--allowed-tools Bash(gh pr:*)'