From audit-plugin
Multi-round independent content audit of Claude Code plugins (plugin.json, SKILL.md, references, examples, hooks, commands). Fixes only approved issues and loops until convergence.
npx claudepluginhub minhthang1009/dotclaude --plugin audit-plugin
How this skill is triggered (by the user, by Claude, or both)
Slash command
/audit-plugin:audit-plugin <plugin-path>
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Audit the content quality of a Claude Code plugin through multiple independent review rounds, fix only what the user approves, and loop until convergence. The target is the plugin path passed as the skill argument (`<plugin-path>`); if no argument was given — or the path does not exist, or it contains no `.claude-plugin/plugin.json` (not a plugin) — stop and ask the user before proceeding.
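The precondition above (argument present, path exists, `.claude-plugin/plugin.json` found) can be sketched as a small check. This is an illustration only; the function name `check_plugin_path` and the reason strings are hypothetical and not part of the plugin.

```python
from pathlib import Path


def check_plugin_path(arg):
    """Return (ok, reason) for the raw skill argument (hypothetical helper)."""
    if not arg:
        return False, "no plugin path was given"
    root = Path(arg)
    if not root.is_dir():
        return False, f"path does not exist or is not a directory: {root}"
    # A directory only counts as a plugin if the manifest is present.
    if not (root / ".claude-plugin" / "plugin.json").is_file():
        return False, f"no .claude-plugin/plugin.json under {root} (not a plugin)"
    return True, "ok"
```

Any `False` result corresponds to the "stop and ask the user" branch rather than to an automatic retry.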
Core principles: every claim must carry a verbatim quote + file:line; finders maximize coverage, a validator filters false positives afterward; whoever fixes never signs off on their own fixes; the user approves the scope and every design decision.
.claude-plugin/marketplace.json, enabledPlugins in settings, sync configs (e.g. a load list consumed by a sync script), junctions/symlinks. Marketplace registration alone proves nothing if the repo loads skills another way; a gap in any one mechanism is a structural finding: record it now and carry it into the Stage 1 findings list (it enters the ledger with everything else).
references/benchmark-guide.md §4 if Stage 6 may apply.
improvement-proposals.md; if criteria-update proposals are marked PROPOSED, ask the user once whether to apply them to the canonical block before this audit begins, then mark each APPLIED or DECLINED (dated). This step is the trigger that closes the self-update loop; without it the loop only ever writes.

Stage 1 is performed by you (the lead), in-context; it builds the understanding needed to orchestrate every later stage. Dispatched zero-context reviewers run the same criteria only at Stage 5. Read 100% of the files. Bundled scripts (hooks, helpers) may be exercised with adversarial input in a temp dir only. Test artifacts like caches are acceptable; never modify tracked plugin files (same rule as criterion 6 in the canonical block; that wording wins). Delete the probe temp dir when the probe is done, and track every temp path you create; Stage 7 sweeps them.
The 7 criteria, the floor-not-ceiling rule, and the severity rubric are defined ONCE — inside the fresh-reviewer prompt block in references/reviewer-prompts.md. Read that block in full now and apply it as written: Stage 1 (lead) and Stage 5 (dispatched) review against the identical text; there is no second copy to drift. Criteria changes happen in that one place only (see Stage 7).
For high-stakes audits (the plugin gates other work, ships hooks, or will be distributed), WebFetch the current Claude Code docs — https://code.claude.com/docs/en/plugins and https://code.claude.com/docs/en/skills — and cross-check the canonical criteria before starting. The local criteria are a cache; the docs are the source.
Each finding: file:line + verbatim quote + severity per the canonical rubric. Suspected but unconfirmed → still report, tagged [UNCERTAIN]. Coverage matters more than precision — Stage 2 filters.
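As an illustration of the finding shape described above, here is a minimal sketch of a record type. The class name, field names, and rendered layout are assumptions for the example; the severity names HIGH/MEDIUM/LOW are the ones this document mentions, and the canonical rubric in references/reviewer-prompts.md remains authoritative.

```python
from dataclasses import dataclass

# Assumed severity set; the canonical rubric may define it differently.
SEVERITIES = ("HIGH", "MEDIUM", "LOW")


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    quote: str          # verbatim text copied from disk
    claim: str          # what is wrong with the quoted text
    severity: str
    uncertain: bool = False

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")
        if not self.quote.strip():
            raise ValueError("a finding needs a verbatim quote")

    def render(self):
        # Suspected but unconfirmed findings are still reported, with a tag.
        tag = " [UNCERTAIN]" if self.uncertain else ""
        return f'{self.file}:{self.line} [{self.severity}]{tag} "{self.quote}": {self.claim}'
```

Rejecting an empty quote at construction time is one way to enforce the "every claim carries a verbatim quote" principle mechanically.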
When Stage 1 completes, write the findings list to the audit ledger — .claude/audit-plugin-ledger.md at the project root (created here, maintained through Stage 5, deleted at Stage 7). Never keep it chat-only: it must survive /compact and session death; re-read it when resuming. The ledger holds ALL resume-critical state: the findings with statuses, the Stage 0 classification (benchmark applies or not), and the fresh-review round counter (rounds used / 4). If the file already exists, another audit may be in flight — stop and ask the user.
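A minimal sketch of the "create the ledger, but stop if one already exists" rule, assuming a simple text layout. The layout, the function name `create_ledger`, and its parameters are hypothetical; only the file path and the cap of 4 rounds come from the text above.

```python
from pathlib import Path

LEDGER = Path(".claude") / "audit-plugin-ledger.md"
MAX_ROUNDS = 4  # fresh-review hard cap named in the text


def create_ledger(project_root, findings, benchmark_applies):
    """Write the initial ledger; raise FileExistsError if one is already there."""
    path = Path(project_root) / LEDGER
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [
        "# audit-plugin ledger",
        f"benchmark applies: {'yes' if benchmark_applies else 'no'}",
        f"fresh-review rounds used: 0 / {MAX_ROUNDS}",
        "",
        "## findings",
    ]
    lines += [f"- {item}" for item in findings]
    # Mode "x" refuses to overwrite: an existing ledger may belong to
    # another audit in flight, so the caller should stop and ask the user.
    with open(path, "x", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return path
```

Exclusive-create mode makes the existence check and the write a single step, so a second audit cannot silently clobber the first one's state.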
Every finding must pass independent verification before any fix is proposed (sole exception: the round ≥2 fast-path defined in Stage 5):
finding-validator agent.
/validator skill.
references/reviewer-prompts.md §Validator fallback.

The validator receives the finding list (file + line + claim) but NOT the Stage 1 reasoning or conclusions. Strip [UNCERTAIN] and severity tags first; they are confidence signals that bias verdicts. Verdict per finding: CONFIRMED / FALSE POSITIVE / PARTIALLY TRUE, each with a quote from disk. Report retracted or overstated claims to the user honestly, before proposing any fix. Record every verdict in the ledger (refuted findings included, since Stage 5's NEW-matching depends on them).
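The stripping step can be sketched as follows, assuming findings are held as dictionaries with `file`, `line`, and `claim` keys plus any extra fields. The dictionary shape and the function name `for_validator` are assumptions made for this example.

```python
import re

# Tags that signal the finder's confidence and would bias the validator.
_TAGS = re.compile(r"\s*\[(?:UNCERTAIN|HIGH|MEDIUM|LOW)\]")


def for_validator(findings):
    """Keep only file, line and claim; drop tags, severity and reasoning."""
    cleaned = []
    for f in findings:
        cleaned.append({
            "file": f["file"],
            "line": f["line"],
            "claim": _TAGS.sub("", f["claim"]).strip(),
        })
    return cleaned
```

Building a new dictionary from an allow-list of keys, instead of deleting known-bad keys, means any field added later (such as notes or reasoning) is excluded by default.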
Zero findings (Stage 1 found nothing, or Stage 2 refuted everything): skip Stages 3–4 and run ONE fresh-review round (Stage 5) as confirmation — a clean first pass is more often a coverage failure than a clean plugin. The confirmation round counts toward the Stage 5 hard cap. If it returns 0 HIGH/MEDIUM → LOW-only output counts as clean (record the LOWs in the ledger as deferred polish) — proceed to Stage 6 if it applies per the Stage 0 classification (with user consent), otherwise Stage 7. If it returns any HIGH/MEDIUM → re-enter the normal path (Stage 2 validation → Stage 3 gate → Stage 4).
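The branching after a zero-findings confirmation round can be written out as a small decision function. This is a sketch of the rule as stated above; the function name and the string return values are hypothetical.

```python
def after_confirmation_round(severities, benchmark_applies, user_consents):
    """Pick the next stage from the severities the confirmation round returned."""
    if any(s in ("HIGH", "MEDIUM") for s in severities):
        # Any HIGH/MEDIUM re-enters the normal path, starting with validation.
        return "Stage 2"
    # LOW-only (or empty) output counts as clean; LOWs are deferred polish.
    if benchmark_applies and user_consents:
        return "Stage 6"
    return "Stage 7"
```

For example, `after_confirmation_round(["LOW"], False, False)` returns `"Stage 7"`, while a single `"MEDIUM"` in the list returns `"Stage 2"`.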
Present to the user: the validated findings table (severity, location, proposed fix) plus every design decision they must settle (use AskUserQuestion, each question with a recommended option and the reason). Design decisions include: changes to workflow behavior, trade-offs between two valid fixes, and anything touching files outside the plugin.
Wait for the user to approve the scope before editing anything. No reply → no edits.
If the user rejects ALL findings or aborts the audit: skip Stages 4–6 and go directly to Stage 7 — record every finding as rejected/deferred with the user's reasons. Never run a fresh review of unchanged disk.
python -m json.tool).
git commit unless the user explicitly orders it.

After each fix batch, spawn a fresh reviewer: a general-purpose agent with zero context. It does NOT know what was fixed and does NOT receive prior findings (feeding it the old list recreates the previous round's blind spots). It only reads the current state on disk and reruns the Stage 1 criteria, using the prompt template in references/reviewer-prompts.md §Fresh reviewer.
.claude/audit-plugin-ledger.md (created at Stage 1): every finding ever reported, with status: confirmed-pending (validated, awaiting gate/fix) / fixed / deferred-by-user / rejected-by-user / refuted / unfixed-at-cap. Stage 2 verdict mapping: CONFIRMED → confirmed-pending; FALSE POSITIVE → refuted; PARTIALLY TRUE → confirmed-pending with the valid part restated and the incorrect part noted. Update the round counter here after every fresh round. A fresh round's finding is NEW only if it matches no ledger entry by location + substance (wording may differ between rounds). Re-reports of deferred or rejected findings are not new and do not block convergence; list them in the final report as "still open (deferred)" or "rejected by user" respectively.
rejected-by-user and state plainly that a known HIGH ships unfixed). On approval: fix it, verify that single fix with a targeted validator pass instead of a full round, then STOP. Do not run a 5th full round on your own initiative.

The benchmark costs real money. Give the user the cost estimate from references/benchmark-guide.md §4 and run it only on explicit approval. Build a controlled fixture and run the plugin's full pipeline headless, scoring by hand-verified evidence. Fixture design, the 2-stage run (human gate → resume), Windows/headless caveats, and the scoring rubric: see references/benchmark-guide.md. For content-only plugins, state "Stage 6 not applicable: " and skip.
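The ledger's Stage 2 verdict mapping and the final-report labels for re-reported findings can be sketched as two lookup tables. The status and verdict strings are taken from the text; the helper names `status_for` and `rereport_label` are hypothetical.

```python
VERDICT_TO_STATUS = {
    "CONFIRMED": "confirmed-pending",
    "FALSE POSITIVE": "refuted",
    # Recorded with the valid part restated and the incorrect part noted.
    "PARTIALLY TRUE": "confirmed-pending",
}

# Re-reports with these statuses are not new and do not block convergence.
FINAL_REPORT_LABEL = {
    "deferred-by-user": "still open (deferred)",
    "rejected-by-user": "rejected by user",
}


def status_for(verdict):
    """Map a validator verdict to the ledger status it produces."""
    key = " ".join(verdict.upper().split())  # normalise case and spacing
    if key not in VERDICT_TO_STATUS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return VERDICT_TO_STATUS[key]


def rereport_label(status):
    """Final-report label for a re-reported finding, or None if none applies."""
    return FINAL_REPORT_LABEL.get(status)
```

Matching a fresh finding against ledger entries by "location + substance" is a judgment call that this sketch deliberately leaves to the reviewer.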
improvement-proposals.md at the audited plugin's root (create it if absent; if it exists, prepend a new entry and never edit historical entries). Split rule: the plugin-root file ships with every install, so it carries user-relevant content only (learnings, deferred findings, criteria notes); internal-only content (cost figures, operator-specific USER ACTIONs, machine/environment state) goes to the project-level .claude/audit-plugin-proposals.md instead. USER ACTION lines must carry an as-of timestamp, and each new audit entry must explicitly close or restate prior entries' still-open USER ACTIONs (a stale "do X" note becomes misinformation once X is done). If the plugin root is not writable (e.g. a marketplace cache install), write everything to the project-level file and say so. Note: this per-plugin file is distinct from the project-level .claude/improvement-proposals.md that subagent-system's pipeline-retrospective writes; when both exist, name which one you mean.
[OUT-OF-CRITERIA] (and every defect discovered by real-world events that no criterion would have caught) MUST produce a proposal to update this plugin's own criteria/prompts, written to improvement-proposals.md at the audit-plugin root (in addition to the audited plugin's record). The criteria list stays alive only through this loop; a criteria set that never grows is a systematic blind spot shared by every future review round. Criteria changes are applied in exactly ONE place, the canonical block in references/reviewer-prompts.md; SKILL.md only points there, so there is nothing to sync.
.claude/audit-plugin-ledger.md: the audit is closed; an orphaned ledger blocks the next audit's Stage 1.
~/.claude/projects/ stay; cleanupPeriodDays handles them.
git status/git diff: unexpected working-tree changes from a read-only agent mean stop and investigate before continuing.
No git → compare key files by content.
references/benchmark-guide.md §4 (that file wins on divergence).
references/reviewer-prompts.md: canonical verbatim prompt templates for the fresh reviewer and the validator fallback, plus context-isolation rules. These prompts are the single source; use them verbatim, replacing only the placeholders.
references/benchmark-guide.md: fixture design, 2-stage headless run, Windows caveats, the 6-dimension scoring rubric, and the hand-verification checklist.
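The working-tree check mentioned above (git status/git diff, or a content comparison when there is no git) might look like the sketch below. It assumes the short `git status --porcelain` format, where each line is two status characters, a space, and the path; the helper names and the choice of SHA-256 are illustrative and not taken from the plugin.

```python
import hashlib
from pathlib import Path


def changed_paths(porcelain_output):
    """Paths listed in `git status --porcelain` output (assumed v1 format)."""
    # Do not strip the line before slicing: a leading space is a status char.
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def snapshot(root, names):
    """No-git fallback: SHA-256 of each key file's content."""
    root = Path(root)
    return {n: hashlib.sha256((root / n).read_bytes()).hexdigest() for n in names}


def drifted(before, after):
    """Names whose content hash changed between two snapshots."""
    return sorted(n for n in before if before[n] != after.get(n))
```

In a git checkout, the input for `changed_paths` could come from `subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout`; a non-empty result after a read-only agent has run is the "stop and investigate" signal.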