From engine
The autonomous build engine. Use when the user says "use the engine for this", "run the engine on …", "/engine", or asks for an autonomous / long-horizon build that should design its own checks and verify its own work before handing back. Also fires for designing or running ANY autonomous loop — "set up a loop", "run this overnight", "let it iterate until…", "fan out agents", "create a workflow to…", a maker/checker pair, worktree parallelism, or any task whose done-condition you intend to express as a machine-checkable condition. Given any task, you understand the intent, DESIGN A GATE that can prove it done (and can fail), get the human's sign-off on the target, then build and self-verify against that gate until green — stopping at a committed slice. Carries the full loop discipline inline (gate design, verify panel, fan-out, context budgeting, retro) so it is self-sufficient for multi-day build runs; for the broader loop-design space (review / plan / infra modes, the full playbook, station templates) install the companion `agent-loops` / `review` / `planning` / `infra` plugins in this marketplace.
Slash command
/engine:engine
One entrypoint for autonomous, self-verifying builds. Given any task you: understand the intent → design a gate that can prove it done and can fail → get the human's yes on the target → build and self-verify against that gate until green → hand back a committed slice. This skill is self-sufficient for long autonomous runs — it carries the gate-design, verify-panel, fan-out, context-budgeting and retro discipline inline.
Deeper reference (companion plugins). The build pipeline below is the fast path and is self-sufficient for a build. For the broader loop-design space — the four modes (build / review / plan / infra), the full operating manual, field notes, and copy-paste station skeletons — install the companion plugins in this marketplace:
agent-loops — the foundation: the accumulated-judgment playbook (read it before designing a non-build loop or any unusually large run) + the goal-template build-station skeleton.
review / planning / infra — the non-build modes, each with its own trigger and station template.
templates/engine-template.md — the engine parameterized for your repo.
Quick gut-check before you loop: (1) is there a command that can actually fail? no gate = no loop. (2) is "done" an exit code / checked artifact, not a vibe? (3) dynamic Workflow → did the user explicitly opt in? (4) is the iteration count capped? (5) does it respect the repo's HARD RULES (ports, push flags, prod env)? (6) will it outlive a context window → ledger on disk, /compact at phase boundaries. (7) does the goal include the retro before declaring done?
Bind to the active project first. The engine is project-agnostic; its gate is not.
Before designing anything, read the active repo's contract — CLAUDE.md / AGENTS.md,
package.json scripts (or Makefile / pyproject.toml / justfile), and any
.claude/docs loop or goals doc the project keeps. Those define this run's real checks,
hard rules, and (if present) deeper field notes. Where the project documents an aggregate
"done" gate or a goal template, prefer it over anything you'd invent.
The engine has no fixed gate. For each task you design the checks that prove "done" and that can fail, then run until they're green.
On Gate-1 approval, set /goal <approved DONE> and run to green without soft stops.
The ONLY interruptions:
Decide vs. surface — the autonomy boundary. Between the gates you own every reversible call: implementation details, naming, which helper, local refactors — decide it, log it, keep moving; asking would just be noise. Surface a fork only when it (a) changes the approved target / DONE, (b) is hard to reverse — a data migration, a public API or contract, an added dependency, anything destructive or outward-facing — or (c) trades a value the human owns (security / privacy posture, cost, product behavior) with no obviously-right answer. When you must surface mid-run, bring a recommendation + the alternatives, not an open question. Auto-deciding an (a)/(b)/(c) fork to "keep the run going" is the failure mode, not the speed-up.
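The (a)/(b)/(c) test above can be sketched as a small predicate. This is a minimal illustration, not part of the engine: the `Fork` type, its field names, and the `SURFACE:` / `DECIDED:` log prefixes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Fork:
    """A decision point hit mid-run. All field names are hypothetical."""
    description: str
    changes_approved_target: bool = False   # (a) alters the approved DONE
    hard_to_reverse: bool = False           # (b) migration, public API, new dependency, destructive
    trades_human_owned_value: bool = False  # (c) security/privacy posture, cost, product behavior


def must_surface(fork: Fork) -> bool:
    """True when the fork belongs to the human; False when the engine decides and logs."""
    return (
        fork.changes_approved_target
        or fork.hard_to_reverse
        or fork.trades_human_owned_value
    )


def handle(fork: Fork, recommendation: str, alternatives: Sequence[str]) -> str:
    """Return the log line for a fork; a surfaced fork carries a recommendation."""
    if must_surface(fork):
        options = "; ".join(alternatives)
        return f"SURFACE: {fork.description} | recommend: {recommendation} | alternatives: {options}"
    return f"DECIDED: {fork.description} -> {recommendation}"
```

A rename or local refactor falls through to `DECIDED`; anything flagged on one of the three axes is surfaced with the recommendation and alternatives attached, never as an open question.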
Fix at root cause: never suppress a check (no lint-disable, no type-escape hatch like
any/# type: ignore) to make it pass. Invoking the engine IS the standing Workflow
opt-in — announce heavy fan-out (what's parallelized, rough scale) before it runs;
honor a +Nk budget directive. Pre-authorized is not invisible.
Blocked-boundary rule (anti-stuck): a phase whose gate depends on UNMERGED external work is "complete" once the unblockable part is done and the blocker is written to the ledger. Close at the achievable frontier; move to the next phase. Never spin on a gate you can't legitimately pass, never fake one.
STAGE 0 — UNDERSTAND & DESIGN THE GATE
superpowers:brainstorming for large.
/goal <DONE>, create the ledger.
STAGE 1 — PLAN — planning is the highest-ROI scaffold there is: a written plan is reportedly the difference between a medium task landing ~20–30% of the time and ~70–80% of the time, for the cost of a few hundred tokens. So don't skip it on anything non-trivial. For any non-trivial fork, generate 2–3 candidate approaches and pick one with a one-line reason (the cheapest form of the judge-panel pattern) — the first approach that comes to mind silently commits you to its constraints and is rarely the best. Self-verify the chosen plan: grounding (does each step touch files/APIs that actually exist? verify the load-bearing assumption, don't build on a guess) + premortem (how could this be wrong?). Every phase names the gate it closes against. Log the alternatives + why-rejected to the ledger (so a later phase doesn't relitigate or silently contradict), then write the plan to disk.
STAGE 2 — BUILD — drive the project's goal/station template if it has one. Quality is built at write-time, not bolted on in review:
CLAUDE.md). The best diff reads as if the file's own author wrote it.
STAGE 3 — VERIFY — run the gate you designed; loop until green; re-run the real user journey after every fix (not "the last error is gone" — layered failures hide behind each other). Run the adversarial panel (see "The verify panel").
STAGE 4 — RETRO + COMMIT — promote ledger traps (see "Retro"); ONE pathspec commit of the focused slice (never bundle others' dirty files).
Assemble a gate from the pieces that fit the task; each must be able to fail; together they cover the DONE block. Use the active project's real checks — discover them, don't assume them.
Code correctness — find this project's checks (compose only what the task touches):
package.json scripts / Makefile / pyproject.toml / justfile and the repo's CLAUDE.md / AGENTS.md. Identify the real commands for: type-check, lint (note if it's configured to fail on warnings — e.g. --max-warnings 0 — and honor that), unit tests, and any integration / component suite run separately from unit tests.
verify script that chains type-check + lint + tests into one exit code). Prefer it when it exists — it is the project's own definition of the gate.
New behavior with no check → write the failing test first, watch it fail, then pass it. User-journey changes → run the project's end-to-end / smoke path (e.g. an auth preflight then a smoke suite against the running app). If it needs auth or a live server and that's unavailable, STOP and ask for the human's re-auth/setup protocol — never bypass it. The smoke is load-bearing: unit tests that mock a dependency away are structurally blind to that dependency's integration bugs.
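Where the project has no aggregate script of its own, chaining the discovered checks into one exit code might look like the sketch below. `run_gate`, `PASSING` and `FAILING` are hypothetical names; the two placeholder commands only demonstrate pass and fail and stand in for the repo's real type-check, lint and test commands.

```python
import subprocess
import sys
from typing import Sequence


def run_gate(checks: Sequence[Sequence[str]]) -> int:
    """Run each check in order; return 0 only if every check exits 0.

    Stops at the first failure and returns that check's exit code, so the
    whole gate collapses to a single machine-checkable exit code.
    """
    if not checks:
        # An empty gate cannot fail, so it is not a gate.
        raise ValueError("empty gate: nothing here can fail")
    for argv in checks:
        result = subprocess.run(list(argv))
        if result.returncode != 0:
            print(f"gate failed at: {' '.join(argv)} (exit {result.returncode})", file=sys.stderr)
            return result.returncode
    return 0


# Placeholder checks (assumptions): substitute the commands found in the
# repo's own scripts. These exist only to show a green and a red check.
PASSING = [sys.executable, "-c", "pass"]
FAILING = [sys.executable, "-c", "raise SystemExit(3)"]
```

Refusing an empty check list is a deliberate choice here: it keeps the runner from going green when nothing was actually run.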
Prove the gate can fail before trusting it (a planted type error trips the type-check; a planted visual break is caught by the visual pass). A gate you haven't seen say "no" isn't yet a gate.
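One way to see a gate say "no" is to plant a fault, run the gate, and restore the file. The sketch below assumes a stand-in gate (byte-compiling a Python file, so a planted syntax error trips it) in place of the project's real type-check; `gate_passes` and `gate_can_fail` are hypothetical names.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def gate_passes(target: Path) -> bool:
    """Stand-in gate (assumption): byte-compile the file; a syntax error fails it."""
    result = subprocess.run(
        [sys.executable, "-m", "py_compile", str(target)],
        capture_output=True,
    )
    return result.returncode == 0


def gate_can_fail(target: Path, planted_fault: str) -> bool:
    """Plant a fault, confirm the gate goes red, then restore the original file."""
    original = target.read_text()
    try:
        target.write_text(original + "\n" + planted_fault + "\n")
        saw_red = not gate_passes(target)
    finally:
        target.write_text(original)  # always restore, even if the gate crashes
    # Trust the gate only if it rejected the fault and accepts the clean file.
    return saw_red and gate_passes(target)
```

The same shape applies to any check: swap `gate_passes` for the real command and pick a fault that the check is supposed to catch.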
.claude/docs/design, a design-system doc, or CLAUDE.md).
Run read-only verifier subagents prompted to break the work, not bless it; default each toward "refuted/failing unless proven otherwise."
correctness · one domain-invariant lens (the one that pays — pick the task's invariant: a data-isolation / visibility boundary, money/parity, auth/security, as fits the domain) · simplicity.
Workflow agents — a panel that silently errors is indistinguishable from one that found nothing.
Reserve fan-out for genuinely large work (many files, audits, migrations); for a small task a dynamic Workflow is just an expensive single agent. When you do fan out:
pipeline() by default — items flow through stages with no barrier; only use a barrier (parallel()) when a stage genuinely needs ALL prior results at once (dedup, early-exit on zero, cross-item comparison).
A multi-phase run will exceed one context window; the main loop auto-compacts (lossy, not free). Engineer for it:
/compact at phase boundaries only — right after commit + ledger update (zero in-flight state). Never mid-edit.
claude -p whose contract is: read ledger → do the next unchecked phase → run gates → commit → update ledger. /goal works in -p, so each phase can carry its own completion condition.
The engine's /goal is the outermost ring; phases nest under it, verify rounds under
phases, per-finding refute loops under those. Three rules: every level has its own
done-condition; inner iterations are cheaper than outer; a stuck inner loop fails UP
with its ledger state (never improvises a different approach — that's the parent's call).
Subagents can't type /goal; give them a goal by writing the DONE + stations into their
prompt — the parent's goal evaluator waits for them before judging "met?".
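The three nesting rules can be sketched as a capped inner loop that raises to its parent rather than changing approach. Everything here is illustrative: `StuckLoop`, `run_loop` and `run_phase` are hypothetical names, and the in-memory list stands in for the on-disk ledger.

```python
from typing import Callable, List


class StuckLoop(Exception):
    """Raised by an inner loop that hit its cap; carries ledger state to the parent."""

    def __init__(self, level: str, ledger: List[str]):
        super().__init__(f"{level} exhausted its iteration cap")
        self.level = level
        self.ledger = ledger


def run_loop(level: str, step: Callable[[int], bool], max_iters: int) -> int:
    """Iterate until `step` reports its done-condition met, or fail UP at the cap.

    Returns the number of iterations used. Never switches approach on its own.
    """
    ledger: List[str] = []
    for attempt in range(1, max_iters + 1):
        if step(attempt):
            return attempt
        ledger.append(f"{level} attempt {attempt}: not done")
    raise StuckLoop(level, ledger)


def run_phase(refute_step: Callable[[int], bool]) -> str:
    """Parent level: owns the decision about what happens when a child is stuck."""
    try:
        used = run_loop("refute", refute_step, max_iters=3)
        return f"refute loop green after {used} iteration(s)"
    except StuckLoop as stuck:
        # Only the parent chooses the next move (re-plan, surface, close at the frontier).
        return f"{stuck.level} stuck; parent re-plans with {len(stuck.ledger)} ledger entries"
```

Each level has its own done-condition (`step`) and its own cap, and the exception is the only path by which a stuck child influences the parent.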
behavior matching no code in the tree · tests green but the feature visibly dead/blank · a failure that disappears without your fix explaining why · logs referencing files/models you didn't touch · silent success (an op with no side effects it should have had) · a gate that goes green on empty arrays. Every mocked boundary must name the station that exercises the real thing; one with none is a ledger-recorded gap, not a shrug. At phase end, sketch the verification matrix (paths × real-vs-mocked × envs) and list the unexercised cells.
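The end-of-phase matrix can be enumerated mechanically. A minimal sketch, assuming each exercised cell is recorded as a `(path, "real" | "mocked", env)` tuple; the `Cell` alias and `unexercised_cells` are hypothetical names, and the example values are made up.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

Cell = Tuple[str, str, str]  # (path, "real" or "mocked", environment)


def unexercised_cells(
    paths: Iterable[str],
    envs: Iterable[str],
    exercised: Set[Cell],
) -> List[Cell]:
    """List every (path, real-vs-mocked, env) cell that no station has run."""
    matrix = product(paths, ("real", "mocked"), envs)
    return [cell for cell in matrix if cell not in exercised]


# Illustrative data only: which cells the stations actually ran.
ran = {("login", "mocked", "ci"), ("login", "real", "ci"), ("checkout", "mocked", "ci")}
gaps = unexercised_cells(["login", "checkout"], ["ci"], ran)
# gaps == [("checkout", "real", "ci")]: checkout was only ever run against a mock
```

Every cell in `gaps` is either assigned a station or written to the ledger as a known gap.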
<plans-dir>/<task>-progress.md, created at Gate 1)
Source of truth; updated before every commit; survives compaction. Put it where the project keeps plans (e.g. .claude/plans/) or alongside the work. Sections:
Near-miss pass: what was caught by exactly one station, by design or luck? Strengthen
single-sensor catches. Promote any trap that bit twice / cost a phase / would bite a
fresh session into the project's loop/lessons doc (or CLAUDE.md if it has none) as part
of the final commit; session-specific noise dies with the task. (Promotion to a cross-repo
home wants the stronger bar — independent convergence in a second repo with a different
stack.)
The project's own hard rules win; read them and obey them. Universally:
git push only; never --force / --no-verify / --mirror / --delete / +refspec, and never the project's break-glass env flags, unless the human explicitly authorizes it for this run.
npx claudepluginhub wilrf/agent-markdowns --plugin engine