Help us improve
Share bugs, ideas, or general feedback.
From claude-leverage
Audits a repo for AI-readiness, scoring ~20 dimensions across Foundation, Why, What, Hygiene, and Sync. Use when inheriting a legacy repo or asking "is this repo agent-ready?"
npx claudepluginhub filip-podstavec/claude-leverage --plugin claude-leverageHow this skill is triggered — by the user, by Claude, or both
Slash command
/claude-leverage:repo-doctor [--score] [--json] [--fail-on missing|todo|stale] [--scope foundation|why|what|hygiene|all] [--quiet][--score] [--json] [--fail-on missing|todo|stale] [--scope foundation|why|what|hygiene|all] [--quiet]This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Reads the current repo and answers the question prose `AGENTS.md`
Audits a repository to map its real stack, conventions, assets, tests, docs, risks, and integration points. Persists results in reusable markdown to reduce re-reading and save tokens. Also calculates a harnessability score (0-100) to assess how well the codebase supports autonomous agent work.
Assesses codebase for AI agent readiness by detecting stacks, monorepos, git setup, and evaluating style, testing, code quality, secrets, and file sizes.
Scaffolds AGENTS.md, ARCHITECTURE.md, and docs/ structure to make codebases legible to AI agents. Analyzes structure with bash recon, generates progressive disclosure docs, audits existing artifacts for coherence.
Share bugs, ideas, or general feedback.
Reads the current repo and answers the question prose AGENTS.md
doesn't cheaply answer per session: what's missing for AI-first
work here? Output is a Markdown report scored across ~15 dimensions,
with a concrete fix action per gap (often "invoke /X").
Read-only. Modifies nothing. The skill is an audit, not a bootstrap.
This skill complements three existing ones with clean differentiation:
| Skill | Question it answers |
|---|---|
/init-repo | "Set this fresh repo up." (writes files) |
/stack-check | "What I have — is it stale?" (freshness audit) |
/repo-doctor | "What I don't have — what's missing?" (completeness audit) |
/security-review | "Is this diff safe to commit?" (orthogonal — code-level scan) |
/init-repo ran, to see what else the bootstrap didn't
cover.skill-cheatsheet SessionStart hook suggests a checkup.Do NOT invoke for:
/stack-check./security-review./init-repo.# Repo Doctor — <repo-name> — <YYYY-MM-DD>
## Summary
✅ 8 pass · ⚠️ 4 attention · ❌ 3 missing · **Score: 67/100**
## Foundation (loaded every session)
| Check | Status | Fix |
|---|---|---|
| AGENTS.md (root) | ✅ 4.2 KiB | — |
| CLAUDE.md (one-line `@AGENTS.md` import) | ✅ present | — |
| Per-dir AGENTS.md | ⚠️ 3 source dirs >500 LOC missing | drop `templates/AGENTS.md.example` into `src/billing/`, `src/auth/`, `src/api/` |
## Why (load-bearing decisions)
| Check | Status | Fix |
|---|---|---|
| ADRs (docs/adr/) | ❌ directory absent | next load-bearing decision → invoke `/adr-new` |
| Session logs (docs/sessions/) | ⚠️ last entry 47 days old | end of next substantial session → `/session-log` |
## What (domain + structure)
| Check | Status | Fix |
|---|---|---|
| GLOSSARY.md | ❌ missing | invoke `/glossary-init` (auto-surfaces candidates) |
| architecture.yml | ❌ missing | invoke `/arch-map` |
## In-code discoverability
| Check | Status | Fix |
|---|---|---|
| AIDEV anchors | 12 anchors / 4823 LOC = 2.5/KLOC | typical (target 1–5/KLOC) |
| Overdue anchors | ⚠️ 2 past deadline | see `/stack-check` for detail |
## Engineering hygiene
| Check | Status | Fix |
|---|---|---|
| Tests present | ✅ 28 test files | — |
| Test/source LOC ratio | ⚠️ 0.18 (target 0.5–1.0) | `src/billing` largest, fewest tests — add coverage there first |
| Structured logging | ❌ `print(` 14×, no structured logger detected | invoke `/log-structured` |
| .gitignore (claude-leverage state) | ✅ present | — |
| README quickstart | ✅ present | — |
| Language manifest | ✅ pyproject.toml | — |
## Sync (code ↔ docs drift)
| Check | Status | Fix |
|---|---|---|
| `architecture.yml` ↔ disk | ⚠️ 2 drifts: `public_surface: [LegacyClient]` not in code; `src/old/` orphan (on disk, not in YAML) | invoke `/arch-map` to refresh |
| `GLOSSARY.md` ↔ code | ⚠️ term `Lead` no longer ref'd in code (last seen 4 months ago) | edit `GLOSSARY.md` or `/glossary-init --add` |
| Per-dir `AGENTS.md` staleness | ✅ all in sync | — |
| `CHANGELOG` ↔ version | ❌ `plugin.json` says 1.6.0, `CHANGELOG` top is 1.5.0 | add a `## [1.6.0]` entry to `CHANGELOG.md` |
| `README` slash-refs ↔ skills | ✅ all 13 slash commands resolve | — |
## Recommended next 3 actions
1. **`/arch-map`** — biggest unblock for refactor proposals; also fixes
sync drift on `public_surface: [LegacyClient]`; <5 min
2. **`/glossary-init`** — `Account` appears in 12 files; agent likely
hallucinating meaning
3. Add a `## [1.6.0]` entry to `CHANGELOG.md` to close the
version-drift gap
AGENTS.md (root) — test -f AGENTS.md. Evaluate the size bands
largest-first (a 40 KiB file is a fail, not a warn):
project_doc.max_bytes (32768), so part of the file is invisible to
Codex agents. This is data loss, not a style nit.docs/ behind a when-to-read link. See ADR 0009.CLAUDE.md (root) — test -f CLAUDE.md.
@AGENTS.md (i.e., diverges
from canonical guidance — split surface to maintain).@AGENTS.md import.Per-directory AGENTS.md — for each top-level source dir
(heuristic: has files matching *.py|*.ts|*.tsx|*.js|*.jsx|*.go|*.rs|*.java|*.rb|*.php|*.cs|*.kt|*.swift)
compute LOC (wc -l aggregated). For each with > 500 LOC and no
AGENTS.md at that dir root, count it.
ADRs — ls docs/adr/ 2>/dev/null | grep -E '^[0-9]{4}-.*\.md$'.
docs/adr/ absent OR 0 numbered ADR files.Session logs — ls docs/sessions/*.md 2>/dev/null and parse
the date from filename (YYYY-MM-DD-topic.md) or mtime fallback.
docs/sessions/ absent OR 0 logs.GLOSSARY.md — test -f GLOSSARY.md.
<TODO> placeholders
(parse H2 sections, count those whose body contains the literal
<TODO>).architecture.yml — test -f architecture.yml.
<TODO> placeholders
for role or stability.AIDEV anchor density — grep -rE 'AIDEV-(NOTE|TODO|QUESTION)'
across tracked files (skip bench archive, vendor, node_modules,
pycache, .git, dist, build). Count matches. Divide by total
tracked code LOC (git ls-files filtered to code extensions →
wc -l). Express as anchors-per-KLOC. Bands use half-open
intervals ([a, b)) so every value falls in exactly one band:
< 0.3/KLOC AND total LOC > 1000 (the repo is
big enough that the absence is signal).[0.3, 1.0)/KLOC (sparse for an AI-first repo).[1.0, 10.0]/KLOC (typical).> 10/KLOC (anchor noise — clutter dilutes
load-bearing ones).Overdue / due-soon anchors — borrowed from /stack-check's
anchor walk (intentional overlap; /stack-check provides the
full actionable detail, this dimension just surfaces the count
in the completeness report). Parse AIDEV-(TODO|QUESTION)(by: YYYY-MM-DD) and compare to today.
file:line and days
overdue; remind to run /stack-check for the rest.Tests present — exists tests/ dir OR ≥1 file matching
**/test_*.{py}, **/*_test.{go,py}, **/*.test.{ts,tsx,js,jsx},
**/*.spec.{ts,tsx,js,jsx}, **/*Test.{java,kt}, etc.
Test-to-source LOC ratio — sum LOC of test files (above
patterns) divided by sum LOC of source files (excluding tests,
excluding the same noise paths as anchor walk). Bands use
half-open intervals so every value falls in exactly one band.
Note: the ❌ floor at 0.15 is claude-leverage's own
judgment (below that the test suite is decorative); the ✅
band of [0.5, 1.5] is anchored on the
Count.co healthy range
of 0.5–1.0.
ratio < 0.15.[0.15, 0.5) (below the healthy range).[0.5, 1.5] (healthy).ratio > 1.5 (possible over-testing of trivial code
— judgment call).Structured logging — grep for unstructured logging patterns
(print(, console\.(log|info|warn|error)\(, fmt\.Print,
println!) vs structured logger imports (structlog, pino,
slog, tracing::, log/slog).
.gitignore claude-leverage state — grep -E '\.last-stack-check|claude-leverage' .gitignore.
.gitignore missing entirely.README quickstart — grep README.md for one of: ## Install,
## Quickstart, ## Getting started, ## Setup, ## Run,
## Usage (case-insensitive, first 200 lines).
Language manifest present — pyproject.toml, package.json,
go.mod, Cargo.toml, Gemfile, composer.json, pom.xml,
mix.exs. (Same list as bare-repo-nudge.sh.)
These dimensions check that the descriptive layer (architecture.yml, GLOSSARY.md, per-dir AGENTS.md, CHANGELOG, README) is still synchronized with the code. Differentiated from earlier dimensions (which check presence): a repo can have all artifacts present and still be in deep drift if those artifacts last described a previous version of the code.
Every Sync dimension returns N/A (excluded from divisor) when its
target artifact does not exist — drift is meaningless when there's
nothing to drift from. The presence gap is already reported by the
relevant earlier dimension (e.g. Dim 7 for architecture.yml).
architecture.yml ↔ disk + symbol drift — parse
architecture.yml. For each modules[].path, test -d it. For
each modules[].public_surface entry (a string), grep its name
in the declared path subtree. Walk top-level dirs on disk (same
noise-path filter as Dim 8) and identify any plausible source dir
(≥100 LOC, has code files) that is NOT covered by any
modules[].path — those are orphan modules, candidates for
/arch-map to add.
≤ 2.≥ 3.architecture.yml does not exist.GLOSSARY.md ↔ code drift — parse GLOSSARY.md. For each
## <Term> heading, grep the term across tracked code files
(skip noise paths, skip the glossary itself). For each Code:
bullet path in the entry body, test -e it. Separately:
identify the top-5 most-referenced PascalCase / domain-shaped
identifiers in the repo (using the same heuristic as
/glossary-init step 4) that are NOT in the glossary AND
appear ≥10 times — those are missing terms.
Code: paths AND no
obvious missing terms.≤ 2 total issues (stale + broken + missing combined).≥ 3.GLOSSARY.md does not exist.Per-dir AGENTS.md staleness vs dir activity — for each
<dir>/AGENTS.md (depth ≤ 3, skip noise paths), compute:
agents_md_ts = git log -1 --format=%ct -- <dir>/AGENTS.mddir_ts = git log -1 --format=%ct -- <dir> (any change in
the dir; for the comparison, ignore changes that touched
ONLY the AGENTS.md itself — see --invert-grep workaround
below).gap_days = (dir_ts - agents_md_ts) / 86400.If gap_days > N (default 30; override via
CLAUDE_LEVERAGE_AGENTS_MD_DRIFT_DAYS), the AGENTS.md is
likely describing a stale state of the dir.
1–2 stale.≥ 3 stale.gap_days.Implementation note: filtering "changes that only touched
AGENTS.md" requires git log -- <dir> plus a follow-up
git show --name-only per commit, or an approximation: subtract
1 day from agents_md_ts before comparing. The approximation
is fine — we're looking for month-scale drift, not hour-scale.
CHANGELOG.md ↔ version manifest — read the top-of-file
## [X.Y.Z] heading in CHANGELOG.md (first match). Compare
to the version in the primary manifest for this repo, in
order of precedence:
package.json#versionpyproject.toml [project] version or [tool.poetry] versionCargo.toml [package] version.claude-plugin/plugin.json#version (this stack)composer.json#versionUse the first manifest found.
CHANGELOG_top == manifest_version.manifest_version > CHANGELOG_top by exactly one minor /
patch level (probably an unreleased version about to ship — a
legit transient state).CHANGELOG_top > manifest_version which is "promised but not
shipped").CHANGELOG.md nor any recognized manifest
exists (a docs/scratch repo).README.md slash-refs ↔ skill availability — grep
README.md for /[a-z][a-z0-9-]+ tokens (slash-prefixed
identifiers). Filter to plausible skill / command references
(drop e.g. file paths, regex examples, dates). For each /foo:
skills/foo/SKILL.md exists at repo root → resolved.commands/foo.md exists → resolved.<plugin>" / "upstream" → resolved (external skill).1–2 unresolved.≥ 3 unresolved./foo slash-refs.Distinct from /stack-check's markdown link audit (which
checks file paths in markdown); this one checks slash-command
references against installed skills.
Resolve repo root. git rev-parse --show-toplevel. If not in
a git repo, STOP and report: "repo-doctor needs a git checkout to
walk tracked files".
Optionally narrow by --scope. Run only the dimensions in the
requested scope group. Default: all 20.
Run each dimension's check. Use the allowed-tools listed —
Read for AGENTS.md / CLAUDE.md / README / GLOSSARY.md /
architecture.yml; Grep for AIDEV anchors + structured-logging
patterns; Bash for git ls-files, wc -l, find. Cap walks
at 5000 tracked files; skip the same noise paths everywhere
(bench/archive-*, node_modules, pycache, .git, dist, build,
vendor, target, .next, .pytest_cache).
Compute the score as simple sum: ✅ = 1.0, ⚠️ = 0.5, ❌ = 0.
Divide by the number of dimensions actually evaluated — that is,
20 minus the count of N/A verdicts (some Sync and Hygiene
dimensions return N/A when their target artifact doesn't exist
or the language has no convention to check). The divisor is
further narrowed by --scope. Multiply by 100. Round.
Document the N/A count in the report's Summary line so the
Score: X/100 number is interpretable (e.g.
Score: 67/100 (3 N/A: arch-yml-drift, glossary-drift, structured-logging)).
Emit the report. Markdown by default. With --json, emit a
structured object:
{
"repo": "name",
"date": "2026-05-26",
"score": 67,
"summary": {"pass": 8, "attention": 4, "missing": 3},
"dimensions": [
{"name": "agents-md-root", "status": "pass", "value": "4.2 KiB", "fix": null},
{"name": "glossary-md", "status": "missing", "value": null, "fix": "/glossary-init"},
...
],
"recommended": ["/arch-map", "/glossary-init", "per-dir AGENTS.md"]
}
--quiet: suppress ✅ rows; show only ⚠️ + ❌ + the summary +
recommendations. Default is full report.
Exit code. 0 always, UNLESS --fail-on was passed:
--fail-on missing → exit 2 if any ❌--fail-on todo → exit 1 if any ⚠️ (TODO/draft state)--fail-on stale → exit 1 if any "stale" status (overdue
anchors, old session log)This is what makes the skill useful in CI as a gate.
<state>/.last-repo-doctor timestamp, which is in
~/.local/state/claude-leverage/ not the repo). The report is
text on stdout (or stdout-bound markdown).bench/archive-token-savings-thesis/)
and standard noise paths. Don't penalize a repo for old
archived experiments not following current conventions.--score — print only the integer 0–100 score on stdout, no
Markdown. Useful for CI scripts: score=$(claude /skill repo-doctor --score).--json — structured output (see step 5).--fail-on missing|todo|stale — exit non-zero per step 7.--scope foundation|why|what|hygiene|sync|all — narrow the check
set. sync runs only Dimensions 16–20 (drift detection); useful
for "did my last commit invalidate any docs?" runs.--quiet — suppress passing rows.--no-recommend — skip the "Recommended next 3 actions" section./init-repo (for AGENTS.md
/glossary-init, /arch-map, /adr-new, /session-log)./stack-check./security-review.pytest, eslint, cargo clippy, …). The skill checks that
tests exist, not that they pass.skill-cheatsheet SessionStart hook can
suggest /repo-doctor, but doesn't run it.Same SKILL.md ships in Codex via scripts/install-codex.sh. All
checks use plain Bash + Read + Grep — no Claude-Code-specific tools.
--diff-base <ref> to score a PR's incremental contribution
to AI-readiness (e.g., "did this PR add the AGENTS.md it should
have?"). Future enhancement.--json outputs.