From spec-ops
Companion to `write-spec`. `write-spec` produces the foundational spec; this skill drives that spec — over **as many passes as it takes** — to a state that is **accurate, grounded, lean, and implementation-ready**, so the user can go straight from here into building. Loop until the readiness gate passes. Do not stop after one pass, and do not start implementing.
How this skill is triggered — by the user, by Claude, or both
Slash command
/spec-ops:refine-spec
Arguments: $ARGUMENTS
- The spec to refine is the path or @-mention in the arguments above. If none is given, ask which file with AskUserQuestion.
- Read the spec in full before doing anything. Also read every doc, file, or sibling spec it links or refers to, so you review it in context.
Run the five-step pass below repeatedly (step 0 — ingesting any pending verify-spec amendments — runs once at the start). Keep looping until a pass produces no corrections, no open questions, and the readiness gate passes. Then stop. Announce each pass (e.g. "Pass 2") so the user can follow the convergence.
Verify → Reconcile → Resolve → Refine → Re-check ↺
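The stopping rule above can be sketched as a small loop. This is only an illustration: `PassResult`, `run_pass`, and the `max_passes` safety cap are hypothetical names introduced for the sketch, not part of the skill or its hook.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PassResult:
    corrections: int      # fixes applied during this pass
    open_questions: int   # questions still waiting for a user disposition
    gate_passed: bool     # readiness-gate verdict for this pass


def refine_until_ready(run_pass: Callable[[int], PassResult], max_passes: int = 20) -> int:
    """Run passes until one is clean; return how many passes it took.

    max_passes is an assumption of this sketch (a guard against an endless
    loop); the skill itself sets no upper bound.
    """
    for n in range(1, max_passes + 1):
        print(f"Pass {n}")  # announce each pass so the user can follow convergence
        result = run_pass(n)
        if result.corrections == 0 and result.open_questions == 0 and result.gate_passed:
            return n
    raise RuntimeError("did not converge within max_passes")
```

A pass that changes anything, or leaves a question open, always triggers another pass; only a fully clean pass with a passing gate ends the loop.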
A Stop hook blocks you from ending your turn until the spec is genuinely ready, so you cannot quit a pass early. It reads a ledger you maintain at:
/tmp/claude-refine-spec-${CLAUDE_SESSION_ID}.json
At the start of the run, and at the start of every pass, write the ledger with the Write tool (overwrite it each time — that also keeps it fresh so the loop doesn't expire mid-run). Write strict, valid JSON exactly matching the schema below — gate flags and resolved must be JSON booleans (true/false, not strings), and spec must be the correct absolute path. The hook validates the ledger and will block you with a correction message if it is malformed, so a typo can't silently disable the gate:
{
  "spec": "<absolute path to the spec file>",
  "gate": {
    "claims_verified": false,
    "no_open_questions": false,
    "no_overengineering": false,
    "no_bloat": false,
    "implementable_cold": false,
    "ac_complete": false
  },
  "openQuestions": [
    { "q": "short text of an open question you found", "resolved": false }
  ]
}
- Record every open question you find in openQuestions; set its resolved to true only once the user has given it a disposition: a concrete answer or an explicit "leave it / defer".
- Set a gate flag to true only when that dimension genuinely holds. The six flags map 1:1 to the Readiness gate below.
- The spec itself must be free of placeholder markers: TODO / TBD / FIXME / ??? / "to be decided" / "open question" / [NEEDS CLARIFICATION: …]. Those block the stop too, so don't leave them in the spec.

When every flag is true, every question is resolved, the spec is clean, and the ready spec is committed (the hook enforces the commit, scoped to the spec file; see Handoff), the hook removes the ledger and lets you stop. If the user redirects to unrelated work, delete the ledger file and stop instead of continuing to refine.
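To make the ledger rules concrete, here is a minimal sketch of a structural check in Python. It is an assumption about what such validation could look like, not the hook's actual code; `ledger_errors` and `ready_to_stop` are hypothetical names, and the sketch ignores the placeholder scan and the commit check that the hook also performs.

```python
import json
import os

# The six gate flags from the ledger schema above.
GATE_FLAGS = (
    "claims_verified", "no_open_questions", "no_overengineering",
    "no_bloat", "implementable_cold", "ac_complete",
)


def ledger_errors(text: str) -> list[str]:
    """Return a list of structural problems; an empty list means well-formed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be an object"]
    errors = []
    spec = data.get("spec")
    if not isinstance(spec, str) or not os.path.isabs(spec):
        errors.append("spec must be an absolute path")
    gate = data.get("gate")
    if not isinstance(gate, dict):
        errors.append("gate must be an object")
        gate = {}
    for flag in GATE_FLAGS:
        # isinstance(..., bool) rejects the strings "true" / "false"
        if not isinstance(gate.get(flag), bool):
            errors.append(f"gate.{flag} must be a JSON boolean")
    questions = data.get("openQuestions")
    if not isinstance(questions, list):
        errors.append("openQuestions must be an array")
        questions = []
    for i, item in enumerate(questions):
        if not (isinstance(item, dict) and isinstance(item.get("q"), str)
                and isinstance(item.get("resolved"), bool)):
            errors.append(f"openQuestions[{i}] needs a string q and a boolean resolved")
    return errors


def ready_to_stop(data: dict) -> bool:
    """True when every gate flag holds and every question has a disposition."""
    return (all(data["gate"][flag] for flag in GATE_FLAGS)
            and all(item["resolved"] for item in data["openQuestions"]))
```

Running a check like this on the JSON before writing it with the Write tool catches the most likely slip, a quoted `"true"` where a boolean is required.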
A prior verify-spec run may have left proposed acceptance criteria — behaviors its backward sweep found in the implementation that map to no AC (a missed requirement). They're carried over a /tmp handoff so you don't re-key them. At the very start of the run, check for them:
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/spec_amendments.py" load <abs-spec-path>
Triage each proposal with AskUserQuestion: an intended proposal is a confirmed gap → offer to add it as a new AC-id; an unsure one → ask; an unintended one is scope-creep in the code to remove, not a spec change → flag it, don't add. Fold every accepted proposal into the Acceptance Criteria table as a new criterion (it then gets grounded by the normal loop like any other), then clear the handoff so it can't re-apply:
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/spec_amendments.py" clear <abs-spec-path>
This closes the verify→refine loop: verify-spec (read-only) proposes the missed requirement, refine-spec (with your confirmation) amends the spec. It never edits the spec on its own.
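The triage rule for carried-over proposals reduces to a three-way decision table. The sketch below encodes it in Python; the `Intent` enum and the action strings are illustrative names, not the format `spec_amendments.py` actually emits.

```python
from enum import Enum


class Intent(Enum):
    INTENDED = "intended"      # behavior was meant to exist: a confirmed gap
    UNSURE = "unsure"          # unclear whether the behavior was meant
    UNINTENDED = "unintended"  # scope-creep in the code, not a spec change


def triage(intent: Intent) -> str:
    """Map a proposal's classification to the action described above."""
    if intent is Intent.INTENDED:
        return "offer as new AC-id"
    if intent is Intent.UNSURE:
        return "ask the user"
    return "flag for removal from code; do not add to spec"
```

In every branch the spec changes only after the user confirms, which matches the rule that the spec is never edited on its own.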
List every checkable claim in the spec: file paths, function / class / method names, table / column names, routes, config or env keys, library or framework behavior, "the system currently does X" statements, data shapes, and counts.
Dispatch parallel Explore subagents (the Task tool, subagent_type: Explore) to check these claims against ground truth, never against other docs — they are read-only and fast. Ground truth, in order of authority: the actual codebase at branch HEAD; the latest git commits (git log / git diff on the working branch — specs drift after out-of-band commits and dev→infra merges, so re-ground against HEAD rather than trusting the spec's own history); and, for infra/ops specs, live state via the named CLI (e.g. aws, gh). Treat sibling or "completed" specs as possibly stale — never as ground truth. Split the claims by area (e.g. one agent per subsystem, model layer, or route group) and scale the agent count to the spec: a short spec may need a single verifier; a large one, several. Give each agent the relevant spec excerpt plus its claim list, and require a structured verdict per claim:
For each claim return one of:
confirmed / wrong (give the correct value and where you found it: file:line, a commit SHA, or CLI output) / not found. Quote the supporting evidence. Do not speculate; if you cannot verify it, say so.
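One way to picture the structured verdict is as a small record with a completeness check. This is a sketch under assumptions: the `Verdict` fields and `verdict_problems` are hypothetical, chosen to mirror the wording of the instruction above rather than any format the Explore agents are known to return.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Status = Literal["confirmed", "wrong", "not found"]


@dataclass(frozen=True)
class Verdict:
    claim: str
    status: Status
    evidence: str = ""                   # quoted supporting evidence
    correct_value: Optional[str] = None  # required when status is "wrong"
    source: Optional[str] = None         # file:line, commit SHA, or CLI output


def verdict_problems(v: Verdict) -> list[str]:
    """Return what is missing from a verdict; an empty list means usable."""
    problems = []
    if v.status not in ("confirmed", "wrong", "not found"):
        problems.append("status must be confirmed, wrong, or not found")
    if v.status == "wrong" and not (v.correct_value and v.source):
        problems.append("a wrong verdict needs the correct value and where it was found")
    if v.status in ("confirmed", "wrong") and not v.evidence:
        problems.append("quote the supporting evidence")
    return problems
```

A "not found" verdict carries no evidence by design: it records that the claim could not be verified instead of inviting speculation.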
Run one more agent (or do it yourself) as a skeptic lens: read the whole spec hunting for internal contradictions, over-engineering, speculative scope, anything an implementer could not actually act on, and unstated non-functional constraints the change implies but never pins as an AC — performance, security, idempotency, limits, concurrency (the requirements most often dropped).
Dedupe the findings and bucket each one:
| Bucket | What it is | Action |
|---|---|---|
| Inaccuracy | Contradicts the codebase | Fix to the verified value |
| Open question | Cannot be verified; needs a human decision | Queue for Resolve |
| Over-engineering | Speculative, gold-plated, or beyond the stated goal | Propose cutting |
| Bloat | Text not needed to build it: historical/background prose, rationale for why / previously / originally it got this way, problem statements, changelog, speculative out-of-scope narrative, restated field names, duplication — classically a Checklist or plan that re-describes the Acceptance Criteria instead of indexing them. Keep: decision/config/field tables, the Acceptance Criteria table, load-bearing failure-mode rationale — the "doing X the obvious way breaks Y" that stops an implementer applying a wrong fix (tighten it, don't cut it) — and a Checklist collapsed to a thin code-area → AC-id index (collapse, don't delete; first move any fact living only in it into the body). | Cut |
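The dedupe-and-bucket step can be sketched as follows. The bucket names and actions come from the table above; the `reconcile` function and its whitespace- and case-insensitive notion of a duplicate are assumptions made for illustration.

```python
# Bucket -> action, copied from the table above.
ACTIONS = {
    "Inaccuracy": "Fix to the verified value",
    "Open question": "Queue for Resolve",
    "Over-engineering": "Propose cutting",
    "Bloat": "Cut",
}


def reconcile(findings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Dedupe (bucket, text) findings and group them by bucket.

    Two findings count as duplicates when they share a bucket and their
    text matches after lowercasing and collapsing whitespace (an
    assumption of this sketch; real findings may need human judgment).
    """
    grouped: dict[str, list[str]] = {bucket: [] for bucket in ACTIONS}
    seen: set[tuple[str, str]] = set()
    for bucket, text in findings:
        if bucket not in ACTIONS:
            raise ValueError(f"unknown bucket: {bucket}")
        key = (bucket, " ".join(text.lower().split()))
        if key in seen:
            continue  # the same finding reported by more than one agent
        seen.add(key)
        grouped[bucket].append(text)
    return grouped
```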
For every genuine ambiguity, unverifiable assumption, or open decision that changes what gets built, ask the user with AskUserQuestion. Batch related questions into one call; ask as many as you need across passes. It is better to ask one too many questions than to let the spec ship an assumption. Never guess to fill a gap.
You do not need to ask about facts you already verified and corrected against the codebase — fix those directly and note them in the final summary.
Apply, as one coherent edit per pass:
- Apply the write-spec philosophy: say things once, in the right place; describe behavior, not implementation; show with tables / mermaid / examples instead of prose; bold the key terms; every sentence must earn its place.
- Maintain an Acceptance Criteria table (| AC | Criterion |) that captures every functional requirement and constraint as a discrete, atomic, testable assertion. Promote anything that exists only in prose into a criterion, split compound ones, and confirm each is an observable end-state (not a task). If the spec already carries a Validation, test-plan, or acceptance section, split what it holds: each assertion becomes an AC-id in the table, while any verification step it documents (how to check on staging, what to observe) is the human analog of a per-AC verification method; keep those as a lean section that cites the AC-ids rather than restating the assertions. Never leave two parallel sets of assertions. Cross-check coverage both ways: every criterion is addressed by the Checklist/plan, and every behavioral rule in the body maps back to an AC-id. The Checklist, if present, is that coverage turned into a task index: grouped by code area, each item citing the AC-id(s) it lands, never restating the criteria. If it currently paraphrases the ACs, collapse each item to a one-line code-area pointer after moving any Checklist-only fact into the body. This table is load-bearing; never cut it as bloat.
- Group the criteria by capability under numbered headings (### 1. <capability> — start here, ### 2. <capability> — needs §1) and add a needs §X header edge only for a real dependency you have grounded against the codebase, never a guessed or scheduling order. needs §X is the only binding order; group sequence is otherwise a suggested reading order. Prefer ≥2 ACs per group (a solo-AC group is allowed only for a genuine "start here" capability, e.g. a walking-skeleton AC). No dates, time-boxes, or effort estimates; order is dependency-derived only. If grouping would exceed ~5–6 groups, surface it, but distinguish the cause: many groups because the spec bundles independent changes → recommend splitting the spec; many groups for one coherent change with real cross-group dependencies → keep it whole and note that launch-spec will phase the build by group (R3). The trigger to split is independence, not the count. AC-ids stay globally unique and stable; a one-group spec is just a single table.
- Pin every non-functional constraint the change implies (performance, security, idempotency, limits, concurrency) as its own AC-id; these are the most-dropped requirements, and a capable implementer will silently skip what isn't written. For every behavioral AC, run the completeness probe (what initiates this, under what precondition, and what's the observable bound?) and close any gap it exposes. This is a prompt for finding holes, not a syntax to impose: keep each AC a plain testable sentence (no EARS/Gherkin templates, no type tags), and exempt pure-math / decision-table criteria and the Boundaries section, which have no stimulus-response shape. Convey meaning over template.

Preserve every detail an implementer needs. Simplify wording and structure, never silently drop substance. If you are unsure whether a detail is load-bearing, ask before cutting it. Keep edits reviewable as a clean git diff.
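The mechanical parts of the grouping rules (unique AC-ids, needs edges that point at real groups, the solo-AC exception, the group-count warning) can be checked with a short script. This is a sketch only: the dictionary layout and the `grouping_problems` name are hypothetical, and whether a dependency is real still has to be grounded against the codebase by hand.

```python
def grouping_problems(groups: dict[int, dict]) -> list[str]:
    """Check AC groups keyed by section number.

    Each value is a dict with "acs" (list of AC-ids), an optional "needs"
    (list of section numbers), and an optional "start_here" flag. This
    layout is an assumption of the sketch, not a format the skill defines.
    """
    problems = []
    seen: dict[str, int] = {}
    for number, group in groups.items():
        for ac in group["acs"]:
            if ac in seen:
                # AC-ids must stay globally unique across groups
                problems.append(f"{ac} appears in both §{seen[ac]} and §{number}")
            else:
                seen[ac] = number
        for needed in group.get("needs", []):
            if needed == number or needed not in groups:
                problems.append(f"§{number} needs §{needed}, which is not another group")
        if len(group["acs"]) < 2 and not group.get("start_here", False):
            problems.append(f"§{number} has fewer than 2 ACs and is not a start-here group")
    if len(groups) > 6:
        problems.append("more than ~5-6 groups: surface it and check whether the changes are independent")
    return problems
```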
Prove no silent loss on a rewrite. If the spec is already tracked in git and this pass rewrote or heavily condensed it, diff the result against the prior committed version (git diff, or git show HEAD:<path>) and surface a short removed: list of any non-bloat content you cut, so the user can veto a wrongful drop. Skip this for a brand-new, untracked spec.
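A rough way to draft the removed: list is to diff the prior committed text against the rewrite and collect the lines that disappeared. The sketch below uses Python's standard `difflib`; it takes both versions as strings so it stays self-contained (in practice the old text would come from `git show HEAD:<path>`), and `removed_lines` is a hypothetical helper name.

```python
import difflib


def removed_lines(old: str, new: str) -> list[str]:
    """Return non-blank lines present in the old text but absent from the new.

    A reworded line also shows up here under its old wording, so the output
    is a candidate list to review, not a verdict on what was wrongly cut.
    """
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff prefixes lines unique to the first sequence with "- "
    return [line[2:] for line in diff if line.startswith("- ") and line[2:].strip()]
```

The script cannot tell bloat from substance; filtering the candidates down to non-bloat content remains a judgment call before the list goes to the user.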
Re-read the edited spec. Edits can introduce new claims, new ambiguities, or new contradictions. If the pass changed anything, loop and verify again. If a full pass produced no fixes and no questions, evaluate the readiness gate with the independent judge (below) — don't sign off on your own work. Before you try to stop, make the ledger reflect reality — unresolved questions marked, gate flags set only where they truly hold. The Stop hook bounces you back here if anything is still open.
Finish only when all of these hold. Report the gate's status at the end of each pass. Each maps to a ledger flag (in parentheses).
The agent that did the refining does not get to declare it done. Before setting any gate flag to true, dispatch a fresh readiness judge, an independent subagent (Task) with no memory of your edits, and hand it the current spec plus the criteria below. Instruct it to be adversarial: read the spec and the codebase and hunt for any remaining inaccuracy, ambiguity, over-engineering, bloat, missing detail that would block implementation, and any functional requirement or unstated non-functional constraint (performance, security, idempotency, limits, concurrency) not captured in the Acceptance Criteria table as a discrete, testable assertion; classify each finding as [Gap] (something required is missing), [Ambiguity] (more than one reasonable reading), or [Conflict] (two parts disagree); name the exact AC-id for every coverage finding ("AC-7 is not captured", never "some criteria missing", because precision drives recovery of dropped items); and return a per-criterion PASS/FAIL with specific reasons. Set each gate flag true only for the criteria the judge passes; every FAIL becomes findings for another pass.
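Turning the judge's per-criterion verdicts into ledger flags might look like the sketch below. The `apply_judge` name and the verdict dictionary are assumptions for illustration; the one rule taken from the text is that a flag becomes true only on an explicit PASS.

```python
# The six gate flags from the ledger schema.
GATE_FLAGS = (
    "claims_verified", "no_open_questions", "no_overengineering",
    "no_bloat", "implementable_cold", "ac_complete",
)


def apply_judge(results: dict[str, str]) -> tuple[dict[str, bool], list[str]]:
    """Map judge verdicts ("PASS" / "FAIL") to gate flags.

    A criterion the judge did not report on is treated as a FAIL (a
    conservative choice made for this sketch). Returns the gate dict and
    the flags that need another pass.
    """
    gate = {flag: results.get(flag) == "PASS" for flag in GATE_FLAGS}
    failed = [flag for flag in GATE_FLAGS if not gate[flag]]
    return gate, failed
```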
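- Every checkable claim is verified against ground truth. (claims_verified)
- No open questions, [NEEDS CLARIFICATION] markers, or contradictions remain anywhere in the spec. (no_open_questions)
- Nothing over-engineered or speculative remains. (no_overengineering)
- No bloat remains. (no_bloat)
- An implementer can build from the spec cold. (implementable_cold)
- The Acceptance Criteria table captures every requirement and constraint. (ac_complete)

When the gate passes, give a short summary: what you corrected, what you cut, and which open questions you resolved (with the user's answers). State plainly that the spec is ready to implement. Then commit the ready spec, scoped to that one file: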
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/spec_git.py" commit <abs-spec-path> "docs(spec): {spec name} ready for implementation"
The helper commits only the spec file (never git add -A, never other staged changes, never a push) and no-ops if it isn't a git repo. The Stop hook enforces this: while the spec file has uncommitted changes in a git repo it will not let the turn end — so commit after your final edit. The hook clears the ledger and releases the stop once the gate passes and the spec is committed. Stop there — do not begin implementation. To build it, hand the ready spec to launch-spec, which compiles it into a /goal driver; run that, then gate with verify-spec.
npx claudepluginhub zakattack9/agentic-coding --plugin spec-ops