From claude-swe-workflows
Performs first-pass strategic review of repositories, producing evidence-cited maps calibrated to reference classes to guide where to engage, tread carefully, or leave alone. Advisory only.
npx claudepluginhub chrisallenlane/claude-swe-workflows --plugin claude-swe-workflows

This skill uses the workspace's default tool permissions.
Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.
Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.
Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.
A first-pass review skill for the moment you want to step back and assess a repo strategically: you've just inherited it, you're evaluating a FOSS project for adoption, you're onboarding a teammate, or you're revisiting your own project to decide where to invest. The skill produces an evidence-cited map of the repo's state — not a grade. Its output is built to inform strategic decisions about engagement, not to itemize every imperfection.
This skill is advisory only. It makes no changes. To act on findings, hand off to /refactor, /review-arch, /review-test, /review-security, or other specialists as the findings indicate.
Observation before interpretation. The skill's procedure enforces an OODA cadence — Observe, Orient, Decide, Act — with strict phase gates. The Observe phase collects signals without verdicts. Only after observation is complete does interpretation begin. This is the structural countermeasure to the most common failure mode of informal code review: fixating on the first file opened and building a distorted mental model from there.
"Good" is relational, not absolute. A 34% test-coverage finding is a different finding in a research prototype than in an OSS library with external consumers. The skill's calibration is anchored in reference classes (references/classes/): the repo is classified into a class first, and each dimension is evaluated against class-specific expectations. Classification is a cited, overridable output — the user sees which class was applied and can correct it in one line. Every downstream finding reflows against the correct class.
Every claim carries evidence. A finding without a file:line citation or a tool-output reference is not a finding; it is an assertion and must be dropped or demoted to the Coverage Manifest. This is enforced structurally (the phase-integrity check in §Evidence Discipline), not stylistically.
Named unknowns beat silent unknowns. The Coverage Manifest is first-class output. Tools that weren't available, signals that couldn't be computed, and questions that couldn't be answered are named explicitly with the reasons they couldn't be resolved. An honest "we couldn't assess X" beats a confident assessment that silently excluded X.
Breadth first. Compose, don't reinvent. This skill is the wide-and-shallow pass. When a finding warrants a deep-and-narrow follow-up, the skill routes to a sibling specialist (/review-arch, /refactor, /review-test, /review-security, etc.) with a scoped argument. The skill does its own work at breadth level; it defers to siblings only when a sibling is an exact fit for what needs doing.
New-repo orientation has a small set of predictable cognitive failures — fixating on the first file opened, grading without a reference class, asserting without evidence, silently omitting what couldn't be assessed — and the skill's procedure is designed to counter each:
The practitioner traditions this skill draws from — OODA (Boyd), medical differential diagnosis, home inspection (ASHI severity tiers), marine surveying (reference-class calibration), technical due diligence (M&A), intelligence situational-awareness briefings — are all codifications of these countermeasures in domains that have had to solve "rapid assessment of unfamiliar systems under uncertainty" at higher stakes than software review.
┌────────────────────────────────────────────────────────────────┐
│ REVIEW-HEALTH │
├────────────────────────────────────────────────────────────────┤
│ Phase 0 — Preflight │
│ • Elicit lens (what's the user doing?) │
│ • Determine scope (whole repo / directory / module) │
│ │
│ Phase 1 — OBSERVE (no interpretation allowed) │
│ • Repo enumeration (size, structure, entry points) │
│ • Git-history signals (cadence, churn, bus factor) │
│ • Tooling signals (lint, coverage, CI, deps, secrets) │
│ • Documentation signals (README, inline, architectural) │
│ • Coverage Manifest: what couldn't be collected and why │
│ │
│ Phase 2 — ORIENT (classification + rubric application) │
│ • Differential diagnosis → reference class (cited) │
│ • Apply class rubric to each dimension │
│ • Severity-tier individual findings │
│ • Cross-cutting synthesis │
│ │
│ Phase 3 — DECIDE (strategic options for the user's lens) │
│ • Generate 2-4 engagement options │
│ • Each option cites Orient findings │
│ │
│ Phase 4 — ACT (concrete next steps) │
│ • Top 3-5 recommendations │
│ • Sibling-skill routing with scoped arguments │
└────────────────────────────────────────────────────────────────┘
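The phase gates in the diagram above can be pictured as a small state machine: a phase may only be closed out in order, and a later phase cannot run until every earlier phase has produced its artifact. This is a hypothetical sketch — the class and method names are illustrative, not the skill's actual internals.

```python
# Hypothetical sketch of OODA phase gating; names are illustrative,
# not the skill's actual internals.
PHASES = ["preflight", "observe", "orient", "decide", "act"]

class PhaseGateError(Exception):
    pass

class Review:
    def __init__(self):
        self.completed = []   # phases finished, in order
        self.artifacts = {}   # phase -> the output it produced

    def finish(self, phase, artifact):
        expected = PHASES[len(self.completed)]
        if phase != expected:
            raise PhaseGateError(f"cannot finish {phase!r}; {expected!r} is not complete")
        self.completed.append(phase)
        self.artifacts[phase] = artifact

r = Review()
r.finish("preflight", {"lens": "inheriting", "scope": "."})
r.finish("observe", {"O1": "tokei: 12k LOC Go"})
try:
    r.finish("act", [])   # skipping Orient/Decide violates the gate
except PhaseGateError as e:
    print(e)
```

The point of the gate is structural: an Act-phase claim literally cannot exist until the Observe and Orient artifacts it must cite exist first.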
Elicit the lens. Ask the user one question:
What's the context for this review?
- Inheriting a work repository — taking over maintenance or ownership
- Evaluating a FOSS project — for adoption, contribution, or dependency
- Revisiting my own repo — strategic overview of current state
- Onboarding a teammate — generating a map for someone else
- Other — briefly describe
Default: ask; do not guess from context. The lens is a load-bearing input — it shapes the Decide phase's option generation and informs classification.
Determine scope. Default: the whole repo. If the user specifies a directory or module, honor that scope in all phases. If the repo is unusually large (>100k LOC or >10 top-level subsystems), ask whether to scope down before running the full pass.
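The large-repo guard reduces to a tiny predicate. The thresholds come from the text above; the function name is hypothetical.

```python
# Decide whether to ask the user to scope down before running the full pass.
# Thresholds from the skill text: >100k LOC or >10 top-level subsystems.
def should_offer_scope_down(loc: int, top_level_subsystems: int) -> bool:
    return loc > 100_000 or top_level_subsystems > 10

print(should_offer_scope_down(120_000, 4))   # large by LOC
print(should_offer_scope_down(30_000, 12))   # many subsystems
print(should_offer_scope_down(30_000, 6))    # within bounds
```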
Abort if: not a git repository, or no source files in scope. Do not abort on missing tooling — that's handled by the Coverage Manifest.
Collect signals. Interpretation is not allowed in this phase. Output language is neutral and factual — "X is present / absent / at value Y," not "X is good / concerning / inadequate." Every signal carries a citation: a file:line reference, a tool invocation with its output, or an explicit "could not be collected" entry.
Signal battery (run what applies; record what doesn't):
Repo shape:
- Lines of code and language breakdown (tokei, scc, cloc, or equivalent; fall back to file-extension counting)
- Directory layout (tree -L 2 -d or ls -la)
- Entry points (main.go, cli.py, src/index.ts, bin/, cmd/; also: manifest declarations for binaries/scripts)
- Package manifests (package.json, go.mod, Cargo.toml, pyproject.toml, Gemfile, composer.json, etc.)

Git-history signals:
- First and latest commit dates (git log --reverse | head -1, git log -1)
- Commit cadence by month (git log --pretty=format:"%ad" --date=format:"%Y-%m" + sort/count)
- Contributor distribution / bus factor (git shortlog -sne --since="12 months ago")
- Churn hotspots (git log --name-only --pretty=format: | sort | uniq -c | sort -rn | head -20)

Tooling signals (best-effort; record absences):
- Test presence (tests/, *_test.go, spec/)
- Coverage: if a report exists (.coverage, coverage.xml, lcov.info), parse it; if not, note absence.
- Linter/formatter config (.eslintrc, .golangci.yml, pyproject.toml [tool.ruff], etc.)
- CI config (.github/workflows/, .gitea/workflows/, .gitlab-ci.yml, .circleci/, Jenkinsfile, etc.)
- Dependency audit (npm audit, pip-audit, bundle audit, cargo audit, govulncheck, or equivalent if available; record output summary)
- Outdated dependencies (npm outdated, pip list --outdated, go list -u -m all, or equivalent)
- Secret exposure (.env tracked in git, API_KEY= in source, private-key headers in files)

Documentation signals:
- README presence and depth; inline documentation; architecture docs (docs/, ARCHITECTURE.md, ADR directories)

Coverage Manifest entries. For every tool that couldn't run, signal that couldn't be computed, or area that couldn't be assessed, record an entry naming what was missed and the reason it couldn't be resolved.
Output of Phase 1: a structured Observation Record with numbered entries (O1, O2, ...) or equivalent referenceable IDs. Every Orient finding will cite these IDs.
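One way to picture the Observation Record is an append-only log that mints referenceable IDs, where every entry carries a neutral statement plus its citation. This is a sketch; the skill does not prescribe an implementation.

```python
# Minimal Observation Record: append-only, auto-numbered IDs (O1, O2, ...),
# each entry pairing a neutral factual statement with its citation.
class ObservationRecord:
    def __init__(self):
        self.entries = {}

    def add(self, statement: str, citation: str) -> str:
        oid = f"O{len(self.entries) + 1}"
        self.entries[oid] = {"statement": statement, "citation": citation}
        return oid

rec = ObservationRecord()
o1 = rec.add("test-to-code ratio is 0.34", "tokei output + test-file count")
o2 = rec.add("CI workflow present", ".github/workflows/ci.yml:1")
print(o1, o2)  # O1 O2
```

Because IDs are minted at collection time, every later phase can cite them without re-deriving the underlying evidence.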
Interpret signals. This is where findings are produced.
Step 2a: Classify the repo (differential diagnosis).
Load the class definitions from references/classes/. For each candidate class, gather confirming and disconfirming evidence from the Observation Record and the user's stated lens:
Candidate: production-service
For: O12 (Dockerfile present with runtime target), O17 (CI workflow gates merges), O8 (>5 contributors in last 12 months)
Against: O4 (no deployment artifacts, no on-call docs)
Confidence: Low
Rank candidates by weight of evidence. Pick the best-fit class. Hedge explicitly:
Classified as: solo-utility (medium confidence)
Reason: small contributor count, no deployment infrastructure, no external-consumer signals.
Override if you consider this repo to be a different class.
For hybrid repos, apply multiple class rubrics to the dimensions where they differ and flag the hybrid explicitly. (Example: a CLI tool that's also a published library gets solo-utility for its command surface and oss-library for its public API.)
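The differential-diagnosis step can be sketched as weighing confirming against disconfirming evidence per candidate class. The scoring scheme below (citation counts, fixed confidence cutoffs) is an illustrative stand-in for the skill's qualitative judgment; the real criteria live in references/classes/.

```python
# Rank candidate reference classes by weight of evidence.
# Scoring (count of for-citations minus against-citations) is illustrative.
def classify(candidates):
    """candidates: {class_name: {"for": [O-ids], "against": [O-ids]}}"""
    ranked = sorted(
        candidates.items(),
        key=lambda kv: len(kv[1]["for"]) - len(kv[1]["against"]),
        reverse=True,
    )
    best, evidence = ranked[0]
    margin = len(evidence["for"]) - len(evidence["against"])
    confidence = "high" if margin >= 3 else "medium" if margin == 2 else "low"
    return best, confidence

best, conf = classify({
    "production-service": {"for": ["O12", "O17", "O8"], "against": ["O4"]},
    "solo-utility":       {"for": ["O3"],               "against": ["O12", "O17"]},
})
print(best, conf)  # production-service medium
```

Keeping the for/against lists as O-IDs is what makes the classification a cited, overridable output rather than a gut call.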
Step 2b: Load the class rubric.
Read the matched class file under references/classes/<class-name>.md. The rubric defines five dimensions (test health, dependency health, CI/automation health, documentation, architecture hygiene) with three levels each (Foundational / Adequate / Strong) and per-level criteria.
Step 2c: Apply the rubric to each dimension.
For each dimension, place the repo at Foundational / Adequate / Strong with cited evidence from the Observation Record:
Test Health: Adequate
Criteria met: O22 shows test-to-code ratio of 0.34 (above 0.15 threshold for Adequate); O23 confirms tests cover all three primary CLI commands.
Strong would require O24's coverage report to show ≥60% for core modules (observed: 45%), plus a matrix against supported runtime versions.
Gap to Strong: coverage below threshold in src/parser/ and src/validators/; no runtime-version matrix in CI.
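Placing a dimension at a level amounts to checking each level's criteria from strongest down and stopping at the first level whose criteria all hold. The thresholds below mirror the worked example; real thresholds are per-class, and the function name is hypothetical.

```python
# Place the test-health dimension at Foundational / Adequate / Strong.
# Thresholds mirror the example: Adequate needs test-to-code ratio > 0.15;
# Strong additionally needs >= 60% core-module coverage and a
# runtime-version matrix in CI.
def place_test_health(ratio: float, core_coverage: float,
                      has_runtime_matrix: bool) -> str:
    if ratio > 0.15 and core_coverage >= 0.60 and has_runtime_matrix:
        return "Strong"
    if ratio > 0.15:
        return "Adequate"
    return "Foundational"

print(place_test_health(0.34, 0.45, False))  # Adequate (the example repo)
print(place_test_health(0.34, 0.65, True))   # Strong
print(place_test_health(0.05, 0.10, False))  # Foundational
```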
Step 2d: Severity-tier individual findings.
Within each dimension's assessment, individual notable findings get ASHI severity tiers (see references/severity-tiers.md): Safety Hazard / Major / Minor / Cosmetic. Severity is intrinsic to the finding, independent of class. Lead with Safety, then Major. Minor is reported concisely; Cosmetic is aggregated or omitted unless requested.
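The reporting rule above is mechanical: order findings by tier, and collapse Cosmetic findings to a count rather than listing each one. A sketch (tier names follow the ASHI scheme the skill adopts; the data shapes are illustrative):

```python
# Order findings Safety Hazard -> Major -> Minor; aggregate Cosmetic
# into a single count instead of listing each item.
TIER_ORDER = {"Safety Hazard": 0, "Major": 1, "Minor": 2, "Cosmetic": 3}

def present(findings):
    reported = sorted(
        (f for f in findings if f["tier"] != "Cosmetic"),
        key=lambda f: TIER_ORDER[f["tier"]],
    )
    cosmetic_count = sum(1 for f in findings if f["tier"] == "Cosmetic")
    return reported, cosmetic_count

reported, cosmetic = present([
    {"tier": "Minor", "text": "inconsistent naming in cli.py"},
    {"tier": "Safety Hazard", "text": ".env tracked in git"},
    {"tier": "Cosmetic", "text": "trailing whitespace"},
    {"tier": "Major", "text": "no CI gate on main"},
])
print([f["tier"] for f in reported], cosmetic)
```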
Step 2e: Cross-cutting synthesis.
Look across dimensions and modules for patterns no single dimension would reveal:
- Patterns that span otherwise separate modules (e.g., parallel implementations in src/api/ and src/workers/)

Cross-cutting findings are often the highest-leverage observations in the whole review. They're the ones per-dimension or per-language reviewers miss structurally.
Step 2f: Coverage Manifest.
Carry forward the Phase 1 Coverage Manifest entries, and add any new ones surfaced during interpretation. Name what the skill couldn't assess. This section is not a caveat; it is first-class output.
Phase 2 integrity check: every Orient claim must cite one or more Observation Record IDs. Any claim that cannot be cited is dropped, demoted to Coverage Manifest, or explicitly flagged as "interpretation — no direct evidence."
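The integrity check is mechanical enough to sketch: scan each claim for Observation Record IDs and demote anything uncited. The regex and return shape are illustrative.

```python
import re

# Enforce the Phase 2 rule: a claim with no Observation Record citation
# is demoted (e.g., to the Coverage Manifest) rather than reported.
OID = re.compile(r"\bO\d+\b")

def check_claims(claims):
    findings, demoted = [], []
    for claim in claims:
        (findings if OID.search(claim) else demoted).append(claim)
    return findings, demoted

findings, demoted = check_claims([
    "Test health is Adequate (O22, O23).",
    "The code feels rushed.",   # no citation -> demoted
])
print(len(findings), len(demoted))  # 1 1
```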
Generate strategic options for the user's lens. For each option, state what it entails and what findings from Orient support or complicate it.
Lens-specific option patterns are generated for each lens: inheriting a work repository, evaluating a FOSS project, revisiting your own repo, and onboarding a teammate.
Each option cites Orient finding IDs. The user picks; the skill does not pick for them.
Phase 3 integrity check: every option's supporting rationale must cite Orient findings by ID.
Produce a concrete next-step queue: top 3-5 recommendations, prioritized by severity × lens-relevance. Each recommendation is either a direct action the user can take or a routed handoff to a sibling skill.
Sibling-skill routing pattern:
Architecture concerns concentrated in services/billing/: Run /review-arch services/billing/ for a deep structural read.

Test gaps in src/parser/ and src/validators/: Run /review-test src/parser/ src/validators/ for coverage-gap analysis.

Dependency audit surfaced high-severity CVEs: Run /review-security with focus on direct and transitive dependencies.
Route to a sibling skill only when the sibling does exactly what needs to be done at the depth warranted. If the finding needs shallow follow-up, recommend the action directly rather than routing.
The skill's credibility is load-bearing. These rules are structural, not stylistic:
Every claim cites evidence. Findings in Orient, Decide, and Act cite either an Observation Record ID (preferred) or a direct file:line / tool-output reference. A claim without evidence is dropped or moved to Coverage Manifest.
Phase integrity. Orient claims cite Observe. Decide options cite Orient. Act recommendations cite Orient or Decide. A claim in a later phase that cannot cite an earlier phase's finding is a violation and must be re-examined.
Self-check before delivery. Before presenting results, scan for: uncited adjectives, claims phrased as "the author should..." without a cited basis, any verdict word in the Observe section. Remove or re-cite.
Coverage Manifest is not a weakness. Named unknowns are strictly better than silent unknowns. "We couldn't assess X because Y" is informative; confident-sounding output that silently omitted X is not.
The skill is primarily executed by the main Claude instance — OODA cadence is inherently sequential and benefits from synthesis within a single context. Subagents are used narrowly:
Do not routinely spawn per-language SMEs. That is the prior failure mode this redesign corrects.
/review-health is allowed and encouraged to recommend sibling skills. It does not normally invoke them itself — recommendations are for the user.
| Finding type | Sibling skill | When to route |
|---|---|---|
| Architectural coupling / module boundaries | /review-arch | Architecture hygiene at Foundational + cross-cutting coupling findings |
| Tactical cleanup (DRY, naming, dead code) | /refactor | Multiple Minor findings in a single module |
| Test coverage or test quality gaps | /review-test | Test health at Foundational, or specific coverage gaps |
| Security findings (CVEs, secrets, auth) | /review-security | Safety-tier findings involving security |
| Performance hotspots | /review-perf | Observations of perf-sensitive code without benchmarks |
| Accessibility (for web projects) | /review-a11y | Documentation findings for web UI code |
| Documentation gaps | /review-doc | Documentation at Foundational |
| Pre-release readiness | /review-release | When the user is preparing to publish/deploy |
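The routing table reads as a lookup from finding category to sibling skill, with the scope appended when one is known. A sketch (category keys are illustrative; the skill names come from the table above):

```python
# Map a finding category to its sibling skill and build a scoped invocation.
SIBLINGS = {
    "architecture":  "/review-arch",
    "cleanup":       "/refactor",
    "tests":         "/review-test",
    "security":      "/review-security",
    "performance":   "/review-perf",
    "accessibility": "/review-a11y",
    "documentation": "/review-doc",
    "release":       "/review-release",
}

def route(category: str, scope: str = "") -> str:
    skill = SIBLINGS[category]
    return f"{skill} {scope}".strip()

print(route("architecture", "services/billing/"))  # /review-arch services/billing/
print(route("security"))                           # /review-security
```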
Route with a scoped argument when possible: /review-arch services/billing/ is more useful than /review-arch.
Inline by default. Present the report as structured markdown in the conversation. The user can save it manually if desired.
Output structure. Keep the Observation Record compact and scannable (tables where appropriate; not prose paragraphs). The Findings section is the primary narrative. Recommendations are bulleted and concrete.
Abort: not a git repository, or no source files in scope.

Do NOT abort on findings outside this skill's breadth-level remit — note them and route instead: tactical cleanup to /refactor or /review-arch; security findings to /review-security; performance issues to /review-perf. Missing tooling goes in the Coverage Manifest, not to an abort.