From ai-tools
Audits a codebase across nine dimensions with a 0-10 score. Supports dotnet and JavaScript/TypeScript ecosystems using deterministic JSON evidence. Activates on quality audit, post-merge health review, or due-diligence requests.
Slash command: /ai-tools:code-scorecard
Audit a codebase across nine dimensions on a 0–10 scale. Use scorecard-native JSON evidence as authoritative for every deterministic dimension it contains. Fall back only for dimensions whose probes are missing, skipped, failed, or not yet implemented.
Announce at start: "I'm using the code-scorecard skill to perform a 9-dimension audit."
The deterministic pass starts from scorecard-native JSON evidence at <repo-root>\.scorecard\<ecosystem>\evidence.json (schema v2), produced by the CodeMetrics.AI analyzer for each detected ecosystem — the code-metrics dotnet global tool for dotnet, the codemetrics-ai NPM CLI for javascript-typescript. Before generating or trusting evidence, update the analyzer so the latest deterministic rules are used. Do not silently substitute qualitative scoring when JSON evidence includes a deterministic dimension score.
- ✅ Manual request for a codebase audit, scorecard, or quality review
- ✅ After a major code update, refactor, large feature merge, or release
- ✅ Pre-handover or pre-acquisition due-diligence reviews
- ✅ Periodic health checks on a long-lived codebase
- ❌ Narrowly scoped reviews (use security-audit, root-cause, or code-map instead)
- ❌ Implementation work: this skill produces a scorecard, not fixes
- <repo-root>\.scorecard\<ecosystem>\evidence.json (ecosystem ids: dotnet, javascript-typescript)
- dotnet ecosystem only: <repo-root>\.scorecard\dotnet\metrics.csv

First-time setup, missing evidence, or regenerating after a tool update: see bootstrap.md for ecosystem detection and the per-ecosystem generation procedure.
Scope: if the repo contains multiple .sln/.slnx/.csproj files or multiple package workspaces, specify which entry point to use at invocation time — e.g. "run the scorecard against eContract.API.slnx" or "run the scorecard against Worker.csproj". To restrict a polyglot repo to one ecosystem, say so — e.g. "scorecard the dotnet side only". If nothing is specified, the skill detects and prompts.
For dimensions without usable JSON evidence, source code access is required. The skill will read targeted files as needed during fallback qualitative scoring — it does not need to read every file.
Detect which analyzers apply before touching evidence. Check the repo root (non-recursive):
| Marker | Ecosystem id | Analyzer |
|---|---|---|
| *.sln / *.slnx / *.csproj | dotnet | code-metrics dotnet global tool |
| package.json | javascript-typescript | codemetrics-ai NPM CLI |
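The marker check in the table can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation: the function names `detect_ecosystems` and `stale_evidence_dirs` are hypothetical, and only the marker patterns and ecosystem ids come from the table.

```python
from pathlib import Path

# Marker patterns per ecosystem id, copied from the table above.
MARKERS = {
    "dotnet": ("*.sln", "*.slnx", "*.csproj"),
    "javascript-typescript": ("package.json",),
}


def detect_ecosystems(repo_root):
    """Return ecosystem ids whose markers sit directly in repo_root (non-recursive)."""
    root = Path(repo_root)
    found = []
    for ecosystem, patterns in MARKERS.items():
        # A plain glob pattern (no "**") only matches direct children of root.
        if any(any(root.glob(pattern)) for pattern in patterns):
            found.append(ecosystem)
    return found


def stale_evidence_dirs(repo_root):
    """Return known ecosystem directories under .scorecard that have no marker."""
    root = Path(repo_root)
    detected = set(detect_ecosystems(root))
    scorecard = root / ".scorecard"
    if not scorecard.is_dir():
        return []
    return sorted(
        d.name
        for d in scorecard.iterdir()
        if d.is_dir() and d.name in MARKERS and d.name not in detected
    )
```

A non-empty result from `stale_evidence_dirs` would correspond to evidence that should be reported rather than scored.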
- Marker present: expect evidence at .scorecard\<ecosystem>\evidence.json. Missing, invalid, or mismatched evidence → run the bootstrap for that ecosystem (bootstrap.md).
- Evidence directory without a matching marker (e.g. .scorecard\dotnet\ in a repo with no .sln, .slnx, or .csproj) → stale evidence; report it and do not score from it.

Inspect the invocation args string for these flags before scoring:
- Entry-point path (.sln, .slnx, .csproj, or a package.json): scope the scorecard to that entry point's ecosystem (see Scope above).
- Ecosystem id (dotnet, javascript-typescript): scope a polyglot repo to one ecosystem.
- --verbose: also emit Sections 4 (Score Lift Summary), 5 (Top Offenders by Metric), and 6 (Deterministic Detail). Use when the user wants the extra prose backing the scores.
- --explain: also emit Section 7 (Score Derivation Detail): filter counts, per-signal scores, threshold lookups, offender attribution. Load metrics-glossary.md for the formulas and threshold rationale. Section 7 is written to stand on its own; it does not require --verbose.

The flags are additive; pass both for the full breakdown. If args are absent, run the default scorecard (Sections 1, 2, 3).
1. Architecture & SOLID: Layering, boundaries, interface use, dependency inversion, single responsibility. God classes and direct static dependencies are penalized.
2. Code Quality: Decomposition ratio, single-method complexity, and offender concentration, computed from the metrics export.
3. Testing: Test coverage and quality. Empty stub files, brittle tests, and zero-test projects are penalized. Integration coverage counts.
4. Security: Secret management, authentication, authorization, input validation, CSRF protection, dependency CVEs, error message leakage.
5. Error Handling: Exception strategy, logging, observability. Empty catches, swallowed exceptions, and stack-trace destruction (e.g. throw ex) are penalized.
6. Documentation: README, inline docs where they add value, architecture docs, AI/onboarding instructions, intent in code reviews. TODO/TBD markers and missing expected docs are penalized; do not infer staleness from filesystem mtimes.
7. Dependency Management: Currency of packages, central management, version consistency, transitive risk. Outdated or mixed framework targets are penalized.
8. Performance & Async: Async usage where I/O is involved, query efficiency, caching, pagination, N+1 awareness. Synchronous I/O on hot paths is penalized.
9. Maintainability: Maintainability index distribution and bottom-tail health, computed from the metrics export.
Apply to every qualitative dimension:
| Score | Meaning |
|---|---|
| 10 | Best-in-class. Industry exemplar. No meaningful gaps. |
| 8 | Strong. Minor gaps, no systemic issues. |
| 6 | Adequate. Inconsistent in places but functional. |
| 4 | Weak. Real problems that will compound under change. |
| 2 | Poor. Will block scaling, onboarding, or safe modification. |
| 0 | Absent or actively harmful. |
When JSON evidence contains a dimension score, use it as authoritative and cite its basis/status. Qualitative anchors apply only to dimensions without usable JSON evidence. CSV deterministic fallback applies only to Code Quality and Maintainability, and only for the dotnet ecosystem — its thresholds and archetypes are Roslyn-calibrated.
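The precedence between JSON evidence, CSV fallback, and qualitative scoring can be read as a small decision function. The sketch below is only one way to express it: `choose_source` and its return labels are hypothetical names, and the "ask for evidence" branch reflects the rule stated later in this document for dotnet when neither JSON nor CSV is available.

```python
# Only these two dimensions have a CSV deterministic fallback (dotnet only).
CSV_FALLBACK_DIMENSIONS = {"Code Quality", "Maintainability"}


def choose_source(ecosystem, dimension, json_status, csv_available):
    """Pick the scoring source for one dimension.

    json_status is the dimension's status in evidence.json, or None when the
    dimension is absent from the evidence.
    """
    # A scored JSON dimension is authoritative; never override it.
    if json_status == "scored":
        return "json"
    if dimension in CSV_FALLBACK_DIMENSIONS and ecosystem == "dotnet":
        # Deterministic dimensions on dotnet are never guessed from reading code.
        return "csv-fallback" if csv_available else "ask-for-evidence"
    # Everything else falls back to the qualitative anchors.
    return "qualitative"
```

For example, `choose_source("javascript-typescript", "Code Quality", "skipped", True)` yields `"qualitative"`, because the CSV thresholds are Roslyn-calibrated and do not apply outside dotnet.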
Run this pass once per detected ecosystem, starting from .scorecard\<ecosystem>\evidence.json:
1. Confirm schemaVersion == 2. If the schema is missing or unsupported, report that explicitly and regenerate with the latest analyzer for that ecosystem (see bootstrap.md).
2. Check that the evidence matches the request: tool.ecosystem must equal the directory name the evidence was found under; subject.entryPoint must match the resolved entry point; for dotnet, subject.variant must match the requested configuration; tool.version must match the analyzer version just installed/updated. If any value is missing or mismatched, regenerate evidence.
3. Do not compare generatedAtUtc or filesystem LastWriteTime values to source-file mtimes. Those values vary across clones and CI checkouts.
4. For each entry in dimensions, use the JSON score when status is scored.
5. Include status, basis, and top finding counts in the evidence summary.
6. If a dimension is skipped or failed, report the status and reason. Fall back only for that dimension:
   - dotnet ecosystem only: use the CSV deterministic procedure in csv-fallback.md against .scorecard\dotnet\metrics.csv
7. If the ecosystem is uncalibrated (see shared/scorecard-schema/dimensions.md in the CodeMetrics.AI repo), add one line noting that scores are not calibrated against other ecosystems.

If evidence is missing entirely for a detected ecosystem, jump to bootstrap.md before scoring that ecosystem.
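The identity checks in this pass could look roughly like the following. It is a sketch under assumptions: the dotted names in the text (`tool.ecosystem`, `subject.entryPoint`, and so on) are treated as nested JSON objects, entry points are compared as plain strings, and `evidence_problems` is a hypothetical helper name. The real schema is defined by the analyzer, not by this example.

```python
import json
from pathlib import Path


def evidence_problems(evidence_path, entry_point, analyzer_version, variant=None):
    """Return reasons to regenerate evidence; an empty list means it is usable."""
    path = Path(evidence_path)
    evidence = json.loads(path.read_text(encoding="utf-8"))
    tool = evidence.get("tool", {})
    subject = evidence.get("subject", {})
    ecosystem_dir = path.parent.name  # e.g. "dotnet" for .scorecard/dotnet/evidence.json

    problems = []
    if evidence.get("schemaVersion") != 2:
        problems.append("schemaVersion is missing or unsupported")
    if tool.get("ecosystem") != ecosystem_dir:
        problems.append("tool.ecosystem does not match the evidence directory")
    if tool.get("version") != analyzer_version:
        problems.append("tool.version does not match the installed analyzer")
    if subject.get("entryPoint") != entry_point:
        problems.append("subject.entryPoint does not match the resolved entry point")
    if ecosystem_dir == "dotnet" and subject.get("variant") != variant:
        problems.append("subject.variant does not match the requested configuration")
    # Deliberately no timestamp comparison: generatedAtUtc and file mtimes
    # vary across clones and CI checkouts.
    return problems
```

Any non-empty result would send that ecosystem back through the bootstrap procedure before scoring.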
Return exactly this, in this order. Which sections render depends on invocation args:
| Mode | Sections emitted |
|---|---|
| Default | 1, 2, 3 |
| --verbose | 1, 2, 3, 4, 5, 6 |
| --explain | 1, 2, 3, 7 |
| --verbose --explain | 1, 2, 3, 4, 5, 6, 7 |
Sections 1–3 are always shown (summary layer). Sections 4–6 are verbose justification, ordered from highest-leverage to most analytical. Section 7 is the math, gated on --explain.
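The mode table reduces to a simple additive rule. A minimal sketch, assuming the invocation args arrive as one whitespace-separated string (`sections_for` is a hypothetical name):

```python
def sections_for(args):
    """Map an invocation args string to the list of section numbers to emit."""
    tokens = (args or "").split()
    sections = [1, 2, 3]  # summary layer, always shown
    if "--verbose" in tokens:
        sections += [4, 5, 6]  # verbose justification
    if "--explain" in tokens:
        sections.append(7)  # score derivation detail
    return sections
```

Because the two flags are checked independently, passing both yields Sections 1 through 7, and other tokens such as an entry-point path or ecosystem id do not affect the result.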
Polyglot repos: when more than one ecosystem was scored, render the selected sections once per ecosystem, each under an ## <ecosystem> heading, then close with a single Suite Summary table:
| Dimension | dotnet | javascript-typescript |
|---|---|---|
| ... one row per dimension, then a per-ecosystem Overall row ... |
Never average, combine, or rank scores across ecosystems — cross-ecosystem comparability requires the calibration procedure in the shared contract, and uncalibrated ecosystems must carry a one-line caveat under the table.
Markdown table with columns: Dimension, Score, Evidence.
For deterministic dimensions, evidence is a one-sentence summary of the three signal scores plus the primary offender. For qualitative dimensions, evidence is one sentence with a concrete artifact (file, pattern, count).
| Dimension | Score | Evidence |
|---|---|---|
| Architecture & SOLID | | |
| Code Quality | X.X | Decomp X / MaxCC X / extreme rate Y%; worst: ClassName (ratio Z) |
| Testing | | |
| Security | | |
| Error Handling | | |
| Documentation | | |
| Dependency Management | | |
| Performance & Async | | |
| Maintainability | X.X | %MI<60: Y%, p10 MI: Z, N classes with MI<40 |
| Overall | Unweighted mean of applicable scores, one decimal | |
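The Overall row is an unweighted mean rounded to one decimal. The sketch below assumes that a dimension without an applicable score is represented as `None`; that representation and the name `overall_score` are illustrative choices, not something the source specifies.

```python
def overall_score(scores):
    """Unweighted mean of the applicable scores, rounded to one decimal.

    scores maps dimension name -> score, with None meaning "not applicable".
    Returns None when no dimension has a score.
    """
    applicable = [s for s in scores.values() if s is not None]
    if not applicable:
        return None
    return round(sum(applicable) / len(applicable), 1)
```

Each dimension counts equally regardless of whether its score came from JSON evidence, CSV fallback, or qualitative anchors.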
Below the main scorecard, include a compact table of JSON evidence status:
| Dimension | Source | Status | Basis / probe summary |
|---|---|---|---|
| Code Quality | JSON or CSV fallback | scored/skipped/failed/fallback | Key thresholds or fallback reason |
For skipped/failed dimensions, show the explicit reason and what fallback was used. If JSON was unavailable and CSV fallback was used, say so in the Source column.
Highest-impact problems to fix first. For each:
Section 4: Score Lift Summary (--verbose, when applicable)

If Top 3 Issues touch deterministic dimensions, restate the projected score after addressing them.

Section 5: Top Offenders by Metric (--verbose)

For each of the three primary metrics, list the top 5 (not 10) worst classes with their metric value, archetype, and a one-sentence reason. Surface God/Legacy reclassifications even if they rank below 5.

Section 6: Deterministic Detail (--verbose, Dimensions 2 and 9)

Three-signal breakdown for the deterministic dimensions:
Code Quality detail
Decomposition ratio: P=X T=X E=X → score X.X
Max member CC: P=X T=X E=X → score X.X
Composite: → score X.X
Maintainability detail
Maintainability index: P=X T=X E=X → score X.X
Section 7: Score Derivation Detail (--explain only)

Emit this section only when --explain appeared in the invocation args. Otherwise skip entirely. Section 7 is self-contained: it does not assume Section 6 was shown, so it must restate the three-signal breakdown for any deterministic dimension it covers.
For each deterministic dimension (Code Quality, Maintainability), show:
For qualitative dimensions when --explain is set, briefly state what evidence was inspected (files read, patterns counted, scope of search) so the user can audit the call.
Load metrics-glossary.md for formulas, threshold rationale, and the canonical layout of this section. Keep prose minimal — the user asked for the math, not narrative.
For dotnet, if neither JSON nor CSV fallback is available for Code Quality or Maintainability, ask for evidence instead of guessing from reading code. For non-dotnet ecosystems without usable JSON, score qualitatively and state that deterministic evidence was unavailable.

- bootstrap.md: first-time setup, tool install/update, evidence regeneration (Steps 0–5), Path A/B input details
- csv-fallback.md: CSV deterministic procedure for Code Quality (dim 2) and Maintainability (dim 9), dotnet ecosystem only, including the 9-step pass, archetype tagging, per-archetype scoring reference, and calibration notes
- troubleshooting.md: common failures and fixes (tool not found, entry-point load errors, missing/unsupported evidence, skipped probes, empty CSV)
- metrics-glossary.md: load only when --explain is set. Formulas behind decomposition ratio, max member CC, and MI; threshold rationale; how a dimension score is derived from the three signals; canonical layout for the Section 7 output.
- shared/scorecard-schema/
npx claudepluginhub sean-m-cooper/ai_tools --plugin ai-tools