From archspec
Read-only workflow that consults SERVICE_MAP.yaml to clarify ambiguous requirements, produce a change plan with Mermaid diagram, YAML patch, event fan-out, risk register, and a definition of done before implementation begins.
npx claudepluginhub krus210/archspec --plugin archspec
Trigger: slash command /archspec:architecture-investigate
Read-side workflow. Never modifies files. Produces:
- […] SERVICE_MAP.yaml.

Every run should end with the same shape, so the user can act on it without guessing what is next:
- […] docs/SERVICE_MAP.yaml.
- […] :::new.
- […] SERVICE_MAP.yaml set, the complete list of producers and consumers for each new or changed event and dedup/join key, with each dead-end branch's terminal state + notification; undetermined fan-out marked # UNCONFIRMED.
- […] edge_cases): every gap, deviation, # UNCONFIRMED, and join-key risk surfaced above, restated as a concrete edge_cases[] entry inside the YAML patch (id + a description that carries the given/when/then + a test: path). This is the bridge that carries a finding into code: a sentence in chat is forgotten the moment the plan step takes over, but an edge_cases[] entry persists in the contract and DET-003 blocks the commit until its test file exists.
- […] Self-review: <N> pass(es), <findings or "no findings"> recording the loop result (always emitted).
- […] /archspec:validate (and, for cross-service work, /archspec:check-architecture) is green and every edge_cases[] entry added above has a test that exercises it.
- […] apply YAML edits -> /archspec:sync -> implement -> /archspec:validate -> /archspec:check-architecture when the change spans services.

[…] /archspec:investigate.

1. Locate the contract:
test -f docs/SERVICE_MAP.yaml || echo "no SERVICE_MAP.yaml — run /archspec:init"
If missing, stop and tell the user.
2. Read the slice that matters. Use the Read tool on docs/SERVICE_MAP.yaml. Identify which sections relate to the user's question:
| User mention | Relevant sections |
|---|---|
| endpoint, route, handler | api.endpoints, architecture_rules.required_layers |
| call, dependency, downstream | dependencies.downstream, dependencies.storage |
| event, kafka, topic | events.published, events.consumed, consistency.write_path |
| aggregate, lock, conflict | concurrency.aggregates, consistency.bounded_aggregate |
| retry, fallback, failure | dependencies.downstream.sync.*.fallback, failure_modes |
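The lookup in the table above can be sketched as a simple keyword match. This is only an illustration, not part of archspec: the helper name `relevant_sections` and the substring-matching rule are assumptions, while the keywords and section paths are copied from the table.

```python
# Hypothetical helper: map words in the user's question to SERVICE_MAP.yaml sections.
# Keywords and section paths come from the table; the matching rule is an assumption.
SECTION_HINTS = {
    ("endpoint", "route", "handler"): ["api.endpoints", "architecture_rules.required_layers"],
    ("call", "dependency", "downstream"): ["dependencies.downstream", "dependencies.storage"],
    ("event", "kafka", "topic"): ["events.published", "events.consumed", "consistency.write_path"],
    ("aggregate", "lock", "conflict"): ["concurrency.aggregates", "consistency.bounded_aggregate"],
    ("retry", "fallback", "failure"): ["dependencies.downstream.sync.*.fallback", "failure_modes"],
}


def relevant_sections(question: str) -> list[str]:
    """Return the contract sections worth reading for this question, in table order."""
    text = question.lower()
    sections: list[str] = []
    for keywords, paths in SECTION_HINTS.items():
        if any(keyword in text for keyword in keywords):
            for path in paths:
                if path not in sections:  # keep each section once
                    sections.append(path)
    return sections
```

Substring matching is deliberately crude (a question containing "recall" would also match "call"), so treat the result as a starting slice to read, not a filter on what may be relevant.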
2a. Optional: ingest a reference / golden architecture spec. A feature prompt rarely states the naming conventions or the out-of-prompt invariants the team already expects (e.g. "reassignment reuses the initial match snapshot: do not re-run the search"; the canonical subject is offer.declined, not a freshly invented task.offer_rejected). Ask the user once with AskUserQuestion:

"Is there a reference / golden architecture spec for this change: a design doc, an RFC, a target-state diagram, a naming convention? Paste a path, or answer skip."

- […] offer.declined → task.offer_rejected or quietly drop an invariant the spec requires. The spec is a hint, never an override: if it contradicts the live SERVICE_MAP.yaml or the code, prefer reality and say so.
- skip / none: proceed, and note in the output that no reference spec was supplied, so naming and any out-of-prompt invariants were inferred from the contract alone (a known blind spot: the prompt cannot be assumed complete).

3. Clarify ambiguities before proposing anything (gate). A feature prompt is almost never complete enough to design a cross-service change. Walk the checklist below and, for every dimension the prompt and SERVICE_MAP.yaml do not already answer unambiguously, ask the user with AskUserQuestion (batch the questions into as few calls as possible). Asking is read-only: it modifies nothing.
| Dimension | What to pin down | Failure it prevents |
|---|---|---|
| Entry point & ownership | Which service receives the trigger? When the trigger comes from an external actor, does the request enter through the public edge (api-gateway / BFF) and not just the internal owner — i.e. is the whole reference flow (client → gateway → owner) wired, not only the last hop? Does an existing endpoint/flow already own this action (a decline/cancel path), and which aggregate owns the state it mutates? | Bolting the trigger onto the wrong service, or leaving no public entry point. |
| Trusted identity & actor | Separate sub-question — do not consider it answered just because the entry point is settled. Is any identifier in the request payload (e.g. worker_id, user_id, account_id) used as the actor's identity instead of being derived from the authenticated session/caller? Whoever the action runs as must come from the caller's credentials, not the body. If the source is unproven, mark the field # UNCONFIRMED: trusted from client and ask. | A client spoofing another actor by passing their id in the request body (declining an offer on behalf of any worker). |
| Async state & ordering | Does the trigger read or mutate state that a different async path writes (e.g. it checks assigned_worker, which a match.found consumer sets)? Can the trigger arrive before that write lands, or can a stale/replayed copy of that event arrive after the trigger and overwrite what it just changed? | Trigger no-ops because the state it depends on isn't there yet; a late or duplicate event resurrects state the trigger just cleared. |
| Delivery semantics & idempotency | Is the trigger event at-least-once? What is the dedup key, and which side dedups? If the dedup key changes (e.g. task_id → (task_id, attempt)), is the new key applied to every consumer of that event, not just the one you touched? | Duplicate delivery double-counting attempts; a dedup key fixed in one consumer but left stale in another. |
| Numeric limits & boundaries | Exact meaning of every limit ("max 3 retries" = 3 after the first attempt, or 3 total?). | Off-by-one in retry/attempt caps. |
| Identity & join keys | Which identifiers join entities across services (city_id vs free-text city, task_id vs match_id)? When a lookup can fail to resolve (free-text "Saint Petersburg" never maps to a city_id), what happens — and is that failure silent? | Joining on the wrong field; a lookup that silently degrades the result for every row. |
| Failure & terminal paths | For every branch that can dead-end (limit exhausted, and no candidates found, and a downstream returns empty), which state transition and which notification happen? Not just the happy path, and not just one terminal. | Silent terminal failures; a dead-end branch that logs-and-returns, leaving the aggregate stuck. |
| Write-path conformance | Must the new publish/persist go through this service's consistency.write_path.pattern (e.g. outbox)? May the publish error be swallowed? | Publishing straight to the broker outside the outbox; returning success after a swallowed error. |
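The "Trusted identity & actor" row can be illustrated with a small guard. This is a sketch under stated assumptions: `Session`, `ActorMismatch`, and `resolve_actor` are hypothetical names, and the session object stands in for whatever the service's auth layer actually provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical authenticated session; worker_id is set by the auth layer."""
    worker_id: str


class ActorMismatch(Exception):
    """Raised when the request body claims a different actor than the caller."""


def resolve_actor(session: Session, body: dict) -> str:
    """Return the actor id from the session, never from the request body."""
    claimed = body.get("worker_id")
    if claimed is not None and claimed != session.worker_id:
        # A body-supplied id that disagrees with the caller is rejected, not trusted.
        raise ActorMismatch(f"body worker_id {claimed!r} does not match the caller")
    return session.worker_id
```

The point of the design is that the body can at most agree with the session; it can never select the actor.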
Rules:
- […] worker_id come from?"). A dimension is cleared only when every clause in it is answered or quoted from the contract; a dimension that bundles two questions is the most common place one half gets silently skipped.
- […] # UNCONFIRMED instead of asserting them.

4. Summarise what the contract says, in 5–8 lines. Quote field paths (e.g. consistency.write_path.pattern: outbox) so the user can verify.
5. Draw a chat-only Mermaid diagram of the proposed change. Embed it in the response; do not write to disk. Highlight new/changed nodes with a :::new class:

```mermaid
flowchart LR
  classDef new stroke:#0a0,stroke-width:2px;
  svc[listing-service]
  newEp[POST /listings/bulk]:::new
  newEp --> svc
  svc --> kafka>listings.created v1]
```
6. Propose YAML edits as a unified-diff snippet. Don't apply them; let the user accept, tweak, then run /archspec:sync. Example shape:

```diff
 api:
   endpoints:
+    - name: BulkCreateListings
+      protocol: HTTP
+      idempotency:
+        required: true
+        key_source: "header: X-Idempotency-Key"
+        storage: "redis: idemp:{key}"
+      contract: "api/openapi.yaml#/paths/~1listings~1bulk/post"
+      sla: { p99_latency: "300ms", availability: "99.9%" }
```
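To make the idempotency fields in the example patch concrete, here is a rough sketch of the behaviour they describe. It is an assumption-laden illustration: `handle_bulk_create` is a hypothetical handler, a plain dict stands in for Redis, and the response shape is invented for the example. Only the header name and the `idemp:{key}` slot format come from the patch.

```python
def handle_bulk_create(headers: dict, payload: list, store: dict) -> dict:
    """Replay the stored response when the same X-Idempotency-Key is seen again."""
    key = headers.get("X-Idempotency-Key")
    if not key:
        # idempotency.required: true, so a missing key is a client error.
        return {"status": 400, "error": "X-Idempotency-Key is required"}
    slot = f"idemp:{key}"  # mirrors storage: "redis: idemp:{key}"
    if slot in store:
        return store[slot]  # duplicate delivery: no second write
    response = {"status": 201, "created": len(payload)}
    store[slot] = response
    return response
```

A real implementation would need an atomic check-and-set (and an expiry) in the shared store; the dict version only shows the replay semantics.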
Deviation guard: when an edit crosses an existing boundary — changes who owns an action, adds a publish that sidesteps consistency.write_path.pattern, or relaxes an entry in service.invariants / consistency.cross_service_invariants — call it out in one line ("this deviates from <field>: <why>") and get explicit confirmation. A generated contract line must not silently ratify a design the user never affirmed.
7. Trace every new or changed event and key across all producers and consumers. Step 2 let you read only the slice that matters; this step is the exception: you must scan the full SERVICE_MAP.yaml set (every service's contract in the monorepo), because the "dedup fixed in one consumer but missed in another" class is invisible from a single slice. Operationally, for each event you add or change and each dedup / idempotency / join key you change:
- […] events.published, events.consumed, and the topic/event name, plus any idempotency / dedup / join-key field that references it.

If the fan-out cannot be fully determined from the contracts (a consumer's dedup key is undocumented, a topic's subscriber set is unclear), it is not a free pass: raise it as an open question and mark the affected YAML lines # UNCONFIRMED rather than assuming the fan-out is complete.
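The full-set scan described above can be sketched as a small fan-out builder. The contract shape used here is an assumption for illustration: each contract is a parsed dict with `events.published` and `events.consumed` lists of `{name, dedup_key}` items, and `dedup_key` in particular is a hypothetical field name, not a confirmed part of the SERVICE_MAP.yaml schema.

```python
from collections import defaultdict


def trace_fanout(contracts: dict) -> dict:
    """Build {event: {"producers": [...], "consumers": {service: dedup_key}}}."""
    fanout = defaultdict(lambda: {"producers": [], "consumers": {}})
    for service, contract in contracts.items():
        events = contract.get("events", {})
        for event in events.get("published", []):
            fanout[event["name"]]["producers"].append(service)
        for event in events.get("consumed", []):
            # A missing dedup_key stays None so it can be surfaced, not assumed.
            fanout[event["name"]]["consumers"][service] = event.get("dedup_key")
    return dict(fanout)


def dedup_findings(fanout: dict) -> list:
    """List undocumented dedup keys and events whose consumers disagree."""
    findings = []
    for event, info in sorted(fanout.items()):
        consumers = info["consumers"]
        for service, key in sorted(consumers.items()):
            if key is None:
                findings.append(f"{event}: {service} dedup key # UNCONFIRMED")
        distinct = {key for key in consumers.values() if key is not None}
        if len(distinct) > 1:
            findings.append(f"{event}: consumers disagree on dedup key")
    return findings
```

Run against every contract in the monorepo, a result such as "match.found: consumers disagree on dedup key" is exactly the stale-key case that a single-slice read cannot show.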
A single event must not carry two unrelated semantic roles. One task.reassignment_requested used both as the matching trigger and as a "we are reassigning" client notification fires the notification before a new worker is actually found — that is prohibited, not merely discouraged. Either propose separate events (one trigger, one notification emitted only after the outcome is known) or, if you cannot resolve the split yourself, block the YAML patch and surface it as an open question. Do not ship a single dual-role event with only a warning attached.
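One way to picture the split is shown below. This is a sketch, not the project's design: `task.reassignment_requested` comes from the text, but `task.reassigned` and `task.reassignment_failed` are hypothetical event names, and `publish` is a stand-in for whatever write path (e.g. an outbox) the service actually uses.

```python
from typing import Callable, Optional

Publish = Callable[[str, dict], None]


def request_reassignment(task_id: str, publish: Publish) -> None:
    """Matching trigger only; carries no client-facing meaning."""
    publish("task.reassignment_requested", {"task_id": task_id})


def on_matching_outcome(task_id: str, worker_id: Optional[str], publish: Publish) -> None:
    """Client notification, emitted only once the outcome is known."""
    if worker_id is None:
        # Hypothetical event name for the dead-end branch.
        publish("task.reassignment_failed", {"task_id": task_id})
    else:
        # Hypothetical event name for the success branch.
        publish("task.reassigned", {"task_id": task_id, "worker_id": worker_id})
```

With this shape, nothing the client sees is published until matching has either found a worker or reached its terminal failure.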
8. Flag invariants the user must preserve, citing service.invariants and consistency.cross_service_invariants.
8a. Materialise every risk as an edge_cases[] entry — the investigate→code bridge. This is the step that stops a finding from dying as chat prose. The strongest investigate output is worthless if it never reaches the code: an agent that writes the plan next, then dispatches implementation subagents, does not re-read this chat — it reads the contract. So for each gap, deviation, # UNCONFIRMED, join-key risk, and dead-end terminal you surfaced in steps 3–8, add a concrete edge_cases[] entry to the YAML patch:
```yaml
edge_cases:
  - id: EC-014
    description: "worker city joins to geo by city_id, not free-text city_name; an unresolved city_id must fail loudly, never silently collapse the distance tie-breaker to a default"
    test: "services/matching-service/usecase/matching_geo_test.go::TestEC014"
```
Why an edge_cases[] entry works where prose does not:
- […] ARCHITECTURE.md, so the plan-writer and the implementation subagents see it without the chat history.
- […] test: path is enforced by DET-003 at commit time: the commit is blocked until that test file exists. A risk you cannot yet test is a risk you must at least name.

Schema is closed, with exactly three fields: id matches ^EC-\d+$ (use the next free EC-NNN); description is one line carrying the given/when/then; test is a path::TestName reference. The single highest-value entry is usually a join-key risk (the field a lookup resolves on) or a reuse-vs-recompute invariant (a snapshot that must not be recomputed), exactly the findings that read as obvious in chat and then vanish. If a risk genuinely cannot be expressed as a test, it stays an open question, not a silent omission.
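The closed schema above is small enough to check mechanically before proposing a patch. The sketch below is an illustration, not archspec's own validator: `validate_edge_case` and `next_edge_case_id` are hypothetical helpers, and reading "next free EC-NNN" as highest existing number plus one, zero-padded to three digits, is an assumption.

```python
import re

EC_ID = re.compile(r"EC-\d+")  # used with fullmatch, equivalent to ^EC-\d+$
ALLOWED_FIELDS = {"id", "description", "test"}


def validate_edge_case(entry: dict) -> list:
    """Return a list of schema problems; an empty list means the entry is well-formed."""
    errors = []
    if set(entry) != ALLOWED_FIELDS:
        errors.append(f"fields must be exactly {sorted(ALLOWED_FIELDS)}, got {sorted(entry)}")
    if not EC_ID.fullmatch(str(entry.get("id", ""))):
        errors.append("id must match ^EC-\\d+$")
    description = str(entry.get("description", ""))
    if not description.strip() or "\n" in description:
        errors.append("description must be a single non-empty line")
    path, separator, test_name = str(entry.get("test", "")).partition("::")
    if not (path and separator and test_name):
        errors.append("test must be a path::TestName reference")
    return errors


def next_edge_case_id(existing_ids: list) -> str:
    """Assumed rule: highest existing number plus one, zero-padded to three digits."""
    numbers = [int(i.split("-")[1]) for i in existing_ids if EC_ID.fullmatch(i)]
    return f"EC-{max(numbers, default=0) + 1:03d}"
```

This only checks the shape of an entry. Whether the referenced test file exists is, per the text, DET-003's job at commit time.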
9. Self-review loop: turn the checklist on your own draft, not just the prompt. The clarify gate (step 3) interrogates the requirements; this step interrogates the design you just drew. Re-read your own diagram and YAML and walk the checklist again, plus the anti-pattern list below. Loop until a full pass surfaces nothing new; the first pass routinely surfaces something.
Anti-patterns to hunt in your own proposal:
- […] assigned_worker) that arrives via a separate event consumer, and can the two reorder?
- […] (task_id, attempt) key reach every handler of that event, including the one in the owning service?
- […] maxDistance), killing a tie-breaker with no signal?
- […] FailTask) failed or was swallowed? The notification must be gated on the transition actually committing.
- […] open") tighter than the states the entity can actually be in when the branch fires (it may be assigned by then), so the terminal silently no-ops?
- […] edge_cases[] entry (step 8a): did every gap / deviation / # UNCONFIRMED / join-key risk become a testable edge_cases[] entry or an explicit open question, or is one still sitting in the narrative where the plan step will scroll past it?

Record the outcome as a one-line note in the output using the literal prefix and shape Self-review: <N> pass(es), <what was found and fixed, or "no findings">, writing the count grammatically (1 pass, 2 passes). E.g. Self-review: 2 passes, found+fixed premature client notify and a stale dedup key; no remaining findings. Always emit this line, even on a clean first pass (Self-review: 1 pass, no findings). If a finding can't be resolved without the user, raise it as a new open question rather than shipping it.
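The shape of the Self-review line is strict enough to pin down in a few lines. The formatter below is a hypothetical helper (`self_review_line` is not an archspec function); it only encodes the prefix, the grammatical count, and the "no findings" default described above.

```python
def self_review_line(passes: int, findings: str = "") -> str:
    """Format the always-emitted Self-review note with a grammatical pass count."""
    if passes < 1:
        raise ValueError("the self-review loop runs at least one pass")
    count = "1 pass" if passes == 1 else f"{passes} passes"
    return f"Self-review: {count}, {findings or 'no findings'}"
```

For example, `self_review_line(1)` gives the clean-first-pass form, and passing a findings summary with `passes=2` reproduces the two-pass example from the text.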
10. End with the full loop, not just sync. The contract is only safe if code is checked back against it. Spell out the path: apply the YAML edits → /archspec:sync → implement → /archspec:validate (runs the behavioural linters: outbox, idempotency, optimistic-locking) → /archspec:check-architecture for any change that spans more than one service. A green build or passing unit tests is not a substitute for /archspec:validate: those tests usually cover only the happy path that was just written.
Spell out the Definition of done as a literal checklist, because the agent that closes the branch is often a separate finishing-a-development-branch pass that knows nothing about archspec and will otherwise stop at green unit tests:
- […] edge_cases[] entry added in step 8a has a test that actually exercises it (not merely a file that exists);
- /archspec:validate is green, or every finding has an exceptions[] entry with a reason;
- /archspec:check-architecture is green for cross-service changes;
- […] # UNCONFIRMED marker is now resolved, or carried into an edge_cases[] entry or an ADR.

A green go build / go test clears none of these boxes.
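The checklist above can be read as four independent gates. The sketch below models them as plain data to make the cross-service condition explicit; `BranchState` and `unmet_done_criteria` are hypothetical names, and the code does not call archspec. How the counts are gathered is left open.

```python
from dataclasses import dataclass


@dataclass
class BranchState:
    edge_cases_without_exercising_test: int   # step 8a entries lacking a real test
    validate_findings_without_exception: int  # findings with no exceptions[] entry
    spans_services: bool                      # does the change cross services?
    check_architecture_green: bool
    open_unconfirmed_markers: int             # neither resolved nor carried forward


def unmet_done_criteria(state: BranchState) -> list:
    """Return the Definition-of-done boxes that are still unchecked."""
    unmet = []
    if state.edge_cases_without_exercising_test:
        unmet.append("edge_cases[] entries without an exercising test")
    if state.validate_findings_without_exception:
        unmet.append("/archspec:validate findings without exceptions[]")
    if state.spans_services and not state.check_architecture_green:
        unmet.append("/archspec:check-architecture not green")
    if state.open_unconfirmed_markers:
        unmet.append("# UNCONFIRMED markers still open")
    return unmet
```

Note that build and unit-test status do not appear as inputs at all, which mirrors the point that a green go build / go test clears none of the boxes.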
[…] # UNCONFIRMED, and join-key risk must become an edge_cases[] entry (step 8a) or an explicit open question, never a sentence the plan step will scroll past.