Help us improve
Share bugs, ideas, or general feedback.
From squid
Sweeps codebase architecture periodically by reading ADRs, mapping module/dependency graphs, surfacing 5-10 smells with severity, and generating refactor proposals for /refactor. Triggered by '/architecture-review'.
npx claudepluginhub iusztinpaul/squid --plugin squidHow this skill is triggered — by the user, by Claude, or both
Slash command
/squid:architecture-reviewThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
The team's day-to-day pipeline (`/day`, `/night`) operates at task grain. Long-horizon codebase health — drift, unintended coupling, dead modules, layering violations — accumulates between tasks and never gets addressed unless someone deliberately looks. This skill is that deliberate look.
Scans a codebase, conducts a structured interview, and produces a shareable architecture insights document. Use when assessing codebase architecture or direction.
Analyzes codebase architecture via noun analysis and produces a target blueprint. Advisory only — does not implement changes, but offers to cut tickets for planned work.
Identifies deepening opportunities in codebases by spotting shallow modules and tight coupling, informed by domain glossary and ADRs. Proposes refactors for better architecture, testability, and AI-navigability.
Share bugs, ideas, or general feedback.
The team's day-to-day pipeline (/day, /night) operates at task grain. Long-horizon codebase health — drift, unintended coupling, dead modules, layering violations — accumulates between tasks and never gets addressed unless someone deliberately looks. This skill is that deliberate look.
The output is not a single PR. It's a prioritised backlog of refactor proposals, each shaped so /refactor can pick one up and turn it into a Tasks Plan.
You are the auditor — you delegate exploration to sub-agents, you read docs/adr/ to avoid re-proposing settled questions, and you produce a written report. You do NOT write code, do NOT start refactors, and do NOT decide priority for the team — you propose, the human prioritises.
$ARGUMENTS:
packages/backend/) → audit only that subtree.backend, frontend-web) → audit only that component.Smaller scopes produce sharper findings; whole-repo audits produce broader maps but blunter recommendations.
/refactor lands and you want to find the next one./refactor with the named target./review territory.Resolve $ARGUMENTS. If empty, ask the user one question via AskUserQuestion:
Whole repo, one component, or a specific subtree?
Echo the resolved scope back as a one-line confirmation. Don't block.
Read docs/adr/ end-to-end if it exists. Build a mental model of:
This is the single highest-value step. An architecture review that proposes "split the auth module" when ADR-0005 already explained why it's deliberately combined wastes everyone's time. If ADRs don't exist, note that — recommend adr.md as a prerequisite to capturing this review's outcome.
Also read, if present:
docs/glossary.md — to know whether the team already names concepts consistently.CLAUDE.md and any per-component CLAUDE.md — for stated rules.Spawn 2–3 Explore agents in parallel:
Agent(
subagent_type="Explore",
prompt="""Architecture mapping pass on {scope}.
Produce four artefacts:
(1) Module / package list with one-line responsibility per module (file:line for the canonical entry point).
(2) Dependency graph — who imports whom. Flag any cycles. Flag any "junk drawer" modules imported by everyone.
(3) Layering — name the de-facto layers (e.g., handlers → services → repositories → models) and list any layer-skipping imports.
(4) Public surface — what's exported / consumed by other components or external callers. Mark anything intended-internal that's leaked.
Be exhaustive on the dependency graph; missing an edge produces a wrong recommendation. Report as four sections."""
)
Agent(
subagent_type="Explore",
prompt="""Hot-spot scan on {scope}.
Find:
(1) Files with > 500 lines.
(2) Files / modules touched by > 30% of commits in the last 6 months — `git log --since='6 months ago' --name-only | sort | uniq -c | sort -rn | head -20`.
(3) Tests with > 100 setup lines or > 5 fixtures — usually a smell that the system-under-test is too coupled.
(4) `# TODO` / `# FIXME` / `# HACK` markers — count and cluster.
(5) Any `# type: ignore` / `eslint-disable` / `nolint` clusters — places where the team is fighting their own static analysis.
Report as five sections with file:line."""
)
When agents return, read the top-3 most-implicated files yourself. Don't rely on summaries on the load-bearing modules — you need firsthand familiarity to call findings credibly.
Walk these dimensions. For each, form 0 or more findings. Don't manufacture findings to pad the report — empty dimensions are fine and signal health.
| Dimension | What to look for |
|---|---|
| Cycles | Import cycles between modules / packages. Always a smell. |
| Layer violations | Lower layers (e.g., models) importing higher layers (e.g., HTTP handlers). |
| God modules | One module that everything imports; usually means responsibilities accreted. |
| Junk drawers | utils/, common/, helpers/ modules with > 10 unrelated functions. |
| Hidden coupling | Two modules that should be independent but share a hidden contract (a shape, a magic string, an env var). |
| Test smells | High setup-to-assert ratio; brittle tests on private internals; flaky tests. Usually rooted in a structural problem in the SUT. |
| API leakage | Internal types reaching public surface; public surface that should be private (e.g., a route that no caller uses). |
| Dead code | Unused exports, unreachable branches, modules with zero importers. |
| Drift from CLAUDE.md | Stated rules the codebase no longer follows. The rule is right; the code drifted. |
| ADR drift | Decisions ADRs describe that the code no longer reflects (silently superseded). Either update the ADR or fix the code. |
| Performance shape | Sync I/O in async handlers; N+1 patterns; in-memory aggregation that should be a query. |
| Observability gaps | Critical paths without logs / metrics / traces; logs that don't include correlation IDs. |
For each finding, assign:
Prioritise by severity / effort ratio, with confidence as a tiebreaker. Do not propose more than 10 findings. A 30-finding report gets ignored; a 7-finding report gets acted on. Cluster related findings into one if they share a root cause.
Path:
tracker/architecture-review-{YYYY-MM-DD}.md.architecture-review.Template:
# Architecture review — {scope}, {YYYY-MM-DD}
**Scope:** {whole repo / component / path}
**ADRs read:** {N} ({list, e.g. "0001–0007"})
**Findings:** {N total — by severity: S1×N, S2×N, S3×N, S4×N}
## Summary
{2–3 sentences. The headline shape: "Three module cycles between auth/ and core/, all introduced in the last 4 months, blocking the planned auth-extraction work. Tests on the affected modules have grown 2× in setup size. No ADR governs the boundary. Recommend ADR + extraction refactor."}
## Findings
### F1 — {one-line title} (S{1–4}, effort: {S/M/L}, confidence: {H/M/L})
**Where:** `path/to/a.py:42`, `path/to/b.py:117`
**Symptom:** {what is concretely wrong, observable}
**Root cause hypothesis:** {your best read on why}
**Proposed refactor:** {one paragraph; the shape of the fix, not the fix itself}
**ADR exposure:** {does an ADR govern this? if yes, cite. If no, an ADR should accompany the refactor.}
**Out of scope (explicit):** {what this finding is NOT recommending}
### F2 — ...
(Repeat for each finding, top to bottom by priority.)
## Not recommended
{Smells you found but are deliberately NOT proposing. State why — e.g., "F-skip-1: The `utils/strings.py` module has 12 unrelated functions and looks like a junk drawer. Skipping because every function is < 10 lines and the cost of splitting exceeds the reading cost."}
## Pre-existing decisions (from ADRs) you should NOT undo
{One-bullet-each summary of ADRs that explain *why* something looks weird but is intentional. This protects the next reader from re-litigating.}
## Suggested order of operations
1. F{best-first} — {why it goes first}
2. F{next} — {why}
...
Each finding above can be picked up by `/refactor F{N}` (paste the finding's body into the refactor goal). Don't tackle all of them at once — pick the top 1–3 and run them through the pipeline first.
Single block:
## Architecture review complete — {scope}
**Report:** {path or issue URL}
**Findings:** {N} ({severity breakdown})
**Top recommendation:** F1 — {title}
### Recommended next step
`/refactor F1` (or paste the F1 body into a fresh `/refactor` invocation). The first finding is the highest leverage by severity-over-effort.
### Open question (if applicable)
{Anything you found but couldn't conclude on without team input — e.g., "F4 hinges on whether the auth/billing boundary should be a hard-coded contract or a feature-flag-toggled split. Surfacing for team decision before refactoring."}
### ADR action
{One of:}
- "Recommend ADR-{NNNN} to record the resolution of F{X} when its refactor lands."
- "No ADRs in `docs/adr/` — recommend bootstrapping with [`adr.md`](../scaffold/specs/adr.md) before proceeding so this review's findings stay durable."
/refactor then plans. /night then executes. Skipping ahead to "we should do X" pre-commits the team.