Skill

supergoal

Plans and autonomously builds a software task end-to-end. Recons codebase, researches best practices, decomposes into phases, then generates a single ready-to-paste /goal command that drives the entire chain to completion with retry and recovery.

automation

developer-tools


Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/supergoal:supergoal <describe what you want built, fixed, or shipped>

User invocable

Model invocable

Inline context

Default effort

Argument hint: <describe what you want built, fixed, or shipped>

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are running the Supergoal workflow. The user's task is:

Supporting Files

references/goal-format.md
references/phase-design.md
references/planning-depth.md
references/repo-state-comparison.md
scripts/claim-run.sh
scripts/detect-env.sh
scripts/detect-stack.sh
scripts/repo-state.sh
scripts/summarize-repo.sh
scripts/validate-phase.sh
templates/PROTOCOL.md
templates/ROADMAP.md
templates/STATE.md
templates/phase-goal.txt

SKILL.md

564 lines · ~10.1k tokens (exceeds 5k compaction limit)

Stats

Language: Shell

Stars: 174

Forks: 12

Maintenance: Excellent

Last commit: Jun 7, 2026


Supergoal

You are running the Supergoal workflow. The user's task is:

$ARGUMENTS

Your job: plan deeply, then auto-execute under a single /goal until the task is verifiably complete across every phase.

What "every aspect is perfect" means here

The user's bar is high. Translate it into measurable criteria, not vibes:

Functional — the feature works for the golden path and the obvious edge cases
Engineering — build, typecheck, lint, tests all pass; no new warnings
Polish — UX/copy, error states, empty states, loading states are handled
Hardening — security review, input validation, no obvious regressions
Verification — every phase produces transcript evidence the evaluator can see

If a phase can't be measured, it isn't a phase. Rewrite it until it can.

How this skill works (one-shot summary)

Available context — preload memory; detect available tools (Context7, WebSearch, MCPs, skills); resume any in-progress Supergoal state
Intake — restate, classify, ask enough questions to cover every material gap. Greenfield walks the full category checklist (platform, stack, design direction, integrations, scope, audience, perf, data model) in batches of up to 4 until everything material is filled in; brownfield asks 0–2 since recon answers most structural questions.
Recon — parallel codebase + environment scan
Deep think — research best practices with whatever tools exist (optional, not required); list top-3 risks + dependencies
Decompose — derive phase count from the task itself; no fixed cap
Write phase specs — one work-spec file per phase under $SUPERGOAL_ROOT/phases/phase-N.md (any length, no char budget)
Plan review — show summary + concrete revision menu; wait for explicit go/no-go
Hand off one ready-to-paste /goal with a short end-state condition. The user pastes it once; the agent inside that fresh /goal session then executes phases sequentially with retry, fix-spec recovery, and per-phase memory writeback, and finishes with a final audit that re-verifies the work against the original ROADMAP and self-heals any gaps before the completion condition can hold.

Two human gates only: clarifying questions for true gaps (Stage 1) and plan review (Stage 6). Everything else runs autonomously.

Why one `/goal`, not a chain

/goal in both Claude Code and Codex takes a short end-state condition, not a long task body. A fast evaluator checks the condition against the transcript after each turn and auto-continues until it holds. Supergoal v3 leverages this directly: one /goal covers the whole run; phase work lives in files the agent reads from disk; the condition is "all phases done, SUPERGOAL_RUN_COMPLETE printed." No char budget, no inter-session chain dispatch, no fragility.

Locate the skill directory

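# Prefer the user-level install, then the project-level one; warn if neither exists.
SUPERGOAL_DIR=""
for cand in "$HOME/.claude/skills/supergoal" "$PWD/.claude/skills/supergoal"; do
  [ -f "$cand/SKILL.md" ] && SUPERGOAL_DIR="$cand" && break
done
[ -n "$SUPERGOAL_DIR" ] || echo "supergoal: SKILL.md not found under ~/.claude or ./.claude" >&2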
export SUPERGOAL_DIR
# $SUPERGOAL_BASE holds ALL runs. Each run gets its own namespaced subdir under it
# (claimed in Stage 0) so two runs in the same working tree never clobber each other.
# The per-run dir — $SUPERGOAL_ROOT — is set in Stage 0, not here.
export SUPERGOAL_BASE="${SUPERGOAL_BASE:-.supergoal}"
mkdir -p "$SUPERGOAL_BASE"
echo "SUPERGOAL_DIR=$SUPERGOAL_DIR"
echo "SUPERGOAL_BASE=$SUPERGOAL_BASE"

All artifacts for a run live under $SUPERGOAL_ROOT — a per-run subdir of $SUPERGOAL_BASE, claimed in Stage 0. Skill assets (scripts, references, templates) live under $SUPERGOAL_DIR.

Stage 0 — Available context (memory + tools)

Before doing anything else, sense what's available this session. This is what makes the run frictionless — if memory already knows the user's preferences, don't ask; if a tool isn't available, don't try to call it.

Claim the run namespace (resume or fresh)

Do this first — before memory preload, recon, or anything that writes a file. Every run gets its own subdirectory under $SUPERGOAL_BASE, so two runs started in the same working tree can never overwrite each other's STATE/ROADMAP/phases (the v0.7 fix for concurrent-run clobbering).

# Look for an in-progress run to resume. Scan per-run dirs AND the legacy flat layout
# (.supergoal/STATE.md from pre-0.7 runs). A run is "active" unless its STATE.md Status
# is COMPLETE. (The unfilled template's "PLANNING → IN_PROGRESS → COMPLETE" arrow line
# is not a terminal COMPLETE, so it correctly reads as active.)
ACTIVE_RUNS=""
for s in "$SUPERGOAL_BASE"/*/STATE.md "$SUPERGOAL_BASE"/STATE.md; do
  [ -f "$s" ] || continue
  grep -Eqi 'status:\**[[:space:]]*complete[[:space:]]*$' "$s" && continue
  ACTIVE_RUNS="${ACTIVE_RUNS}$(dirname "$s")"$'\n'
done
printf 'Active runs in this tree:\n%s\n' "${ACTIVE_RUNS:-  (none)}"

Then decide:

Fresh run (default for a new task) — claim a unique namespace:
```
SUPERGOAL_ROOT="$(bash "$SUPERGOAL_DIR/scripts/claim-run.sh" "$ARGUMENTS")"
export SUPERGOAL_ROOT
echo "SUPERGOAL_ROOT=$SUPERGOAL_ROOT"   # e.g. .supergoal/add-dark-mode-Ab3Kx9
```
claim-run.sh uses mktemp -d to create-and-claim the dir atomically, so two simultaneous starts always get distinct dirs — the race that caused the overwrite is gone.
Resume — if an active run clearly matches this task (its STATE.md title ≈ $ARGUMENTS, or the user said "resume"/"continue"), set SUPERGOAL_ROOT to that run dir and follow the resume path (don't re-plan). If several active runs exist and intent is ambiguous, ask with one AskUserQuestion which to resume — or to start fresh.
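The atomic claim can be sketched as below. This is a hypothetical `claim_run` function that only illustrates the mktemp -d idea; the shipped claim-run.sh may slugify the task differently or perform extra checks.

```
# Hypothetical sketch of the claim-run idea; not the shipped script.
# Assumes SUPERGOAL_BASE is set (defaults to .supergoal).
claim_run() {
  local task="$1" base="${SUPERGOAL_BASE:-.supergoal}" slug
  # Lowercase, collapse runs of non-alphanumerics to "-", cap at 40 chars, trim dashes.
  slug="$(printf '%s' "$task" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g' | cut -c1-40 | sed -E 's/^-+//; s/-+$//')"
  [ -n "$slug" ] || slug="run"
  mkdir -p "$base" || return 1
  # mktemp -d creates a fresh directory and never reuses an existing name,
  # so two concurrent callers always receive distinct paths.
  mktemp -d "$base/$slug-XXXXXX"
}
```

Calling it twice with the same task yields two different directories, such as `.supergoal/add-dark-mode-` followed by six random characters.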

Coexistence notice (load-bearing — print it). If ACTIVE_RUNS is non-empty and you're starting a fresh run, surface this before continuing:

⚠ Another Supergoal run is active in this working tree (<list>). Your planning artifacts are isolated under <SUPERGOAL_ROOT>, so they won't collide — but two /goal executions in the same working tree will still edit the same source files and clobber each other's code. Namespacing protects the plan, not the build. For true parallel execution, run each task in its own git worktree; or resume the existing run instead of starting a second.

That boundary is the honest one: namespacing removes the artifact overwrite that happens during planning; it does not make two autonomous builds in one tree safe.

Memory preload

# Detect a memory directory. Common locations:
MEM_DIR=""
for cand in \
  "$HOME/.claude/projects/-Users-$(whoami)/memory" \
  "$HOME/.claude/memory" \
  "$PWD/.claude/memory" \
  "$SUPERGOAL_ROOT/memory"; do
  [[ -d "$cand" ]] && MEM_DIR="$cand" && break
done
echo "MEM_DIR=$MEM_DIR"

if [[ -n "$MEM_DIR" && -f "$MEM_DIR/MEMORY.md" ]]; then
  echo "--- MEMORY INDEX ---"
  cat "$MEM_DIR/MEMORY.md"
fi

Read the index. Then selectively read individual memory files that look relevant to the task (feedback memories about the stack/domain, user role memories, related project memories). Don't dump them all into context — pull what matters.

Capture applicable memory hits in $SUPERGOAL_ROOT/applied-memories.md (one line per memory: name, why-applicable, what-it-changes). Surface them in Stage 1 as "Applied from memory: …" so the user can see what's being inherited and correct anything stale.

Tool discovery

Tools differ between sessions and hosts (Claude Code vs Codex, different MCP server sets). Detect, don't assume:

Context7 — available if mcp__claude_ai_Context7__resolve-library-id or similar is in the tool list. If absent, skip it; rely on training-cutoff knowledge + WebSearch if that's present.
WebSearch / WebFetch — available if listed. If neither, skip web research.
Project skills — check the available-skills list for domain-relevant skills (e.g. mobile-ios-design, clerk-auth, expo-dev-client) and note them in $SUPERGOAL_ROOT/applied-skills.md to invoke from inside phase goals if relevant.
Prior Supergoal state — handled above in "Claim the run namespace": active runs are detected per-namespace and either resumed (reuse their $SUPERGOAL_ROOT) or explicitly coexisted-with.

Write detected tools to $SUPERGOAL_ROOT/tools.md. Stage 3 and the phase goals reference this file when deciding what to invoke.

Resume detection

If you resolved to resume a run in "Claim the run namespace", read its $SUPERGOAL_ROOT/STATE.md. If Status is IN_PROGRESS / READY_TO_DISPATCH / BLOCKED with a phase pending, do not re-plan. Print a one-line "Resuming Supergoal from phase N ($SUPERGOAL_ROOT)" and jump straight to Stage 6 (plan review) with the existing artifacts, or directly to Stage 7 (dispatch) if the user confirms resume.
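For the resume check, a hypothetical pair of helpers (`state_field`, `should_resume`) can read the Status line back out of STATE.md. The `**Status:** VALUE` line shape is an assumption modeled on the grep in "Claim the run namespace"; adjust the pattern to the real template. This checks only the status, not whether a phase is still pending.

```
# Hypothetical helpers; assumes STATE.md lines like "**Status:** IN_PROGRESS".
state_field() {
  local file="$1" key="$2"
  # Case-insensitive key match, optional bullets/bold, then strip up to the value.
  grep -Ei "^[-*[:space:]]*${key}:\**[[:space:]]*" "$file" | head -n1 \
    | sed -E 's/^[-*[:space:]]*[^:]*:\**[[:space:]]*//; s/[[:space:]]+$//'
}

should_resume() {
  case "$(state_field "$1" status)" in
    IN_PROGRESS|READY_TO_DISPATCH|BLOCKED) return 0 ;;
    *) return 1 ;;   # COMPLETE, PLANNING, or the unfilled template line
  esac
}
```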

Stage 1 — Intake & clarifying questions

Echo the task back in one sentence. Then classify it (tags can combine):

Tag	Trigger
`greenfield`	Request implies a new project; cwd has no `.git/` or is an empty tree
`brownfield`	Change in an existing repo
`bugfix`	Mentions "bug", "broken", "fails", "regression"
`refactor`	Mentions "refactor", "clean up", "restructure"
`ui`	Mentions "design", "polish", "UI", "UX", "responsive", "redesign"
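The trigger table can be approximated mechanically. The sketch below is a hypothetical `classify_task` helper that matches whole words only (so "build" does not count as "ui") and treats a directory with `.git/` plus at least one other entry as brownfield. It cannot judge whether a request "implies a new project"; that remains the agent's call.

```
# Hypothetical keyword classifier mirroring the trigger table; illustration only.
has_word() {  # has_word <text> <ERE alternatives>; case-insensitive, whole words
  printf '%s\n' "$1" | grep -Eqiw -- "$2"
}

classify_task() {
  local task="$1" dir="${2:-.}" tags
  # Brownfield only if .git/ exists and something other than .git is present.
  if [ -d "$dir/.git" ] && ls -A "$dir" | grep -qvx '\.git'; then
    tags="brownfield"
  else
    tags="greenfield"
  fi
  has_word "$task" 'bugs?|broken|fails?|regression' && tags="$tags bugfix"
  has_word "$task" 'refactor(ing)?|clean up|restructure' && tags="$tags refactor"
  has_word "$task" 'design|redesign|polish|ui|ux|responsive' && tags="$tags ui"
  printf '%s\n' "$tags"
}
```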

Calibrate the question count to the context. Greenfield has no codebase to scan, so it needs enough verbal context to plan well — never artificially limit questions when material info is missing. Brownfield runs lean on recon, so questions are sparse.

Greenfield — gather enough context to plan well

A new project has no signal beyond the user's prompt + memory. The planner's job in Stage 1 is to enumerate every category that meaningfully shapes the plan, eliminate the ones already answered by memory or prompt, and ask about every remaining one. Don't stop until every material gap is filled.

Category checklist — work through this for every greenfield run:

Category	Why it shapes the plan
Target platform / surface	iOS, Android, web, desktop, CLI, multi — the biggest fork. Different stacks, different phases.
Stack / framework preference	Next.js vs SvelteKit, Expo vs bare RN, FastAPI vs Django, SwiftUI vs UIKit, etc. Affects every phase.
Design direction / aesthetic	Minimal-mono, brutalist, glassmorphism, Apple-native, dashboardy-corporate, retro, etc. Determines tokens, component shapes, Polish phase content.
Integration anchors	Auth provider, database, payments, hosting, analytics, file storage, email — anything that locks in a vendor up front.
Scope cut-line	MVP-this-week vs full feature; what's explicitly out of scope vs deferred to v2.
Primary use case / audience	Solo-dev tool, team SaaS, public consumer app, internal admin — drives auth flow, onboarding shape, error tolerance.
Performance / scale constraints	"Realtime sub-100ms" vs "background batch ok"; expected traffic; offline-first or online-only. Only ask if non-trivial.
Data model anchors	If the prompt implies data, ask the shape ("users + posts? users + projects + tasks?"). Only if not obvious.

Process:

For each category, ask: did the user's prompt mention it? Does memory have a relevant preference?
If yes → use that, surface as "Applied from memory: …" or "From your prompt: …"
If no → that category becomes a question.
Ask all remaining questions in batches of up to 4 (the AskUserQuestion tool ceiling) until every material gap is filled. Two batches is fine for greenfield; three is rare but allowed if a complex task genuinely warrants it.
Within each batch, lead with the highest-leverage choices (the ones that change the phase shape most).

Anti-patterns:

Don't ask one batch and then plan around silent assumptions for the rest. If you're about to assume the design direction, the auth provider, AND the scope cut-line, that's 3 assumptions and one batch of follow-up is cheaper than getting it wrong.
Don't pad questions when memory/prompt already covers them. Reading "I want a SwiftUI iOS app with Liquid Glass" → don't ask "what platform?", "what stack?", or "what aesthetic?". Just ask about integrations, scope, and use case.
Don't ask micro-details that belong in plan review: naming, file paths, copy wording, color palette specifics, library minor versions, default test framework if the stack has one. Those go into ROADMAP.md as assumptions and surface in Stage 6's revision menu.

Brownfield — 0–2 questions, one batch

The codebase plus recon scripts already answer most structural questions (stack, package manager, build/test/lint, conventions, what exists). Ask only about true gaps that memory, prompt, and recon leave open:

Scope cut-line ("just this surface, or also touch the related ones?")
Compatibility surface ("backwards compat with the old API path, or break it?")
Primary fork when ambiguous ("which of these two existing patterns do you want me to extend?")

Most well-described brownfield tasks ask zero questions.

In both modes

Lead with "Applied from memory: …" and "From your prompt: …" so the user sees what's being inherited or read off before answering.
Each AskUserQuestion batch caps at 4 (tool limit). Greenfield can use multiple sequential batches; brownfield is one batch max.
If you genuinely need zero questions, say "No clarifying questions — proceeding from prompt + memory + recon." and move straight to Stage 2.
Never ask about anything you can responsibly assume — those go into the Stage 6 plan review for one-click correction.

Stage 2 — Recon (parallel)

Run recon scripts in parallel. They populate context files under $SUPERGOAL_ROOT/.

Brownfield path

bash "$SUPERGOAL_DIR/scripts/detect-stack.sh"   > "$SUPERGOAL_ROOT/context.md"
bash "$SUPERGOAL_DIR/scripts/summarize-repo.sh" > "$SUPERGOAL_ROOT/repo-map.md"
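The two brownfield scripts can run concurrently rather than one after the other. A minimal bash sketch, assuming `SUPERGOAL_DIR` and `SUPERGOAL_ROOT` are already set; `run_recon_parallel` is a hypothetical name:

```
# Sketch: start both recon scripts in the background, then wait on each PID.
run_recon_parallel() {
  local pids=() status=0 pid
  bash "$SUPERGOAL_DIR/scripts/detect-stack.sh"   > "$SUPERGOAL_ROOT/context.md"  & pids+=("$!")
  bash "$SUPERGOAL_DIR/scripts/summarize-repo.sh" > "$SUPERGOAL_ROOT/repo-map.md" & pids+=("$!")
  # wait <pid> returns that job's exit status, so each failure is caught.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```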

Greenfield path

bash "$SUPERGOAL_DIR/scripts/detect-env.sh" > "$SUPERGOAL_ROOT/context.md"

Read the outputs. Then print a 5-line summary to the user: stack, package manager, build/test/lint commands, notable modules (if any), risky areas. This is what tells them you've actually understood their codebase before planning.

Stage 3 — Deep think

This is the difference between a generic plan and a Supergoal. Spend real cycles here — but use only what's available.

Required regardless of tools:

Identify the top 3 risks: what's most likely to go wrong, what's hardest to undo, what's easy to miss until shipped.
Identify non-obvious dependencies: things that have to happen in a specific order or block other work.
Apply memory hits from $SUPERGOAL_ROOT/applied-memories.md — bake them into goals, constraints, or risk mitigations.

Optional, use if available (check $SUPERGOAL_ROOT/tools.md):

Context7 — if available, query current docs for any third-party SDK touched. Don't plan against stale APIs. If unavailable, lean on training-cutoff knowledge and call it out as an assumption ("planned against my training-cutoff understanding of Expo SDK — verify in phase 1").
WebSearch — if available, look up current consensus on patterns you're unsure about (auth flows, payment idempotency, accessibility standards). If unavailable, skip.
Project skills — if relevant skills are listed in $SUPERGOAL_ROOT/applied-skills.md (e.g. clerk-auth, mobile-ios-design), note them in THINKING.md as "consult <skill> skill during phase N" so the executor invokes them at the right moment.

Write $SUPERGOAL_ROOT/THINKING.md with sections: Goals, Constraints, Risks, Dependencies, Open Questions (already-assumed), Memory hits applied, Tools/skills relied on, Best Practices Applied. Keep it tight — 1–2 pages. This is the substrate the roadmap derives from.
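A hypothetical `scaffold_thinking` helper can lay down the eight sections before they are filled in. The `## ` heading style is an assumption; the section names are the ones listed above.

```
# Sketch: create an empty THINKING.md skeleton in the given run dir.
scaffold_thinking() {
  local out="$1/THINKING.md" section
  [ -e "$out" ] && return 0   # never overwrite an existing plan
  {
    echo "# THINKING"
    for section in "Goals" "Constraints" "Risks" "Dependencies" \
      "Open Questions (already-assumed)" "Memory hits applied" \
      "Tools/skills relied on" "Best Practices Applied"; do
      printf '\n## %s\n\n- \n' "$section"
    done
  } > "$out"
}
```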

See references/planning-depth.md for the bar to clear here.

Stage 4 — Decompose into phases

Break the work into as many phases as the task actually needs — no fixed count, no upper or lower cap. The right number falls out of the work itself: how many independently verifiable units exist between empty repo (or current state) and "done perfectly." A trivial change might need 2 phases; a typical feature 4–6; a full-stack greenfield app 8–12; a major migration 15+. Read references/phase-design.md for how to slice well — the short version:

Each phase delivers something verifiable on its own (it builds, it passes its own tests, you could ship it as a partial increment)
Phases have explicit dependencies (phase 3 depends on 1 and 2)
The last phase is always a "Polish & Harden" phase covering edge cases, error states, security, accessibility, copy, perf — this is how "every aspect is perfect" gets enforced
For UI work, include a dedicated visual polish phase with screenshot/visual evidence requirements
For brownfield, include an early safety net phase if test coverage is thin (add characterization tests before changing behavior)

Each phase has:

Name (5 words max, action-first: "Build auth foundation")
Why (1 sentence)
Deliverables (concrete files/features that will exist when done)
Acceptance criteria (5–10 measurable items)
Mandatory commands (build, typecheck, lint, test that must pass)
Evidence required (what the agent must print into the transcript to prove completion)
Dependencies (which prior phases must be done)

Stage 5 — Write the roadmap and phase specs

Three kinds of artifact, all under $SUPERGOAL_ROOT/:

ROADMAP.md — the plan (template at $SUPERGOAL_DIR/templates/ROADMAP.md).
STATE.md — live progress file the executor updates per phase (template at $SUPERGOAL_DIR/templates/STATE.md).
phases/phase-N.md — one work-spec file per phase (template at $SUPERGOAL_DIR/templates/phase-goal.txt, renamed conceptually to "phase spec"). Any length — these are read from disk by the executor, not passed to /goal, so no char budget.

Each phase spec must include these markers so the agent and evaluator both have stable anchors:

SUPERGOAL_PHASE_START
Phase: <N> of <total> — <name>
Task: <one-line>
Mandatory commands: <list>
Acceptance criteria: <count>
Evidence required: <list>
Depends on phases: <list or "none">

[... full work description, acceptance criteria, evidence requirements ...]

[Agent will print SUPERGOAL_PHASE_VERIFY and SUPERGOAL_PHASE_DONE here during execution]

Validate each spec with bash $SUPERGOAL_DIR/scripts/validate-phase.sh "$SUPERGOAL_ROOT/phases/phase-N.md" — it confirms the required markers exist. No char budget.
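To make the marker requirement concrete, here is a hypothetical `validate_phase_spec` function that checks the same seven markers with fixed-string grep. It is an illustration, not the shipped validate-phase.sh, which may check more.

```
# Hypothetical marker check for one phase spec; returns 1 if any marker is missing.
validate_phase_spec() {
  local spec="$1" marker missing=0
  [ -f "$spec" ] || { echo "missing file: $spec" >&2; return 2; }
  for marker in "SUPERGOAL_PHASE_START" "Phase:" "Task:" "Mandatory commands:" \
    "Acceptance criteria:" "Evidence required:" "Depends on phases:"; do
    # -F: treat the marker as a fixed string, not a regex.
    if ! grep -qF -- "$marker" "$spec"; then
      echo "$spec: missing marker '$marker'" >&2
      missing=1
    fi
  done
  return "$missing"
}
```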

Stage 6 — Plan review & confirmation (hard gate)

Before any /goal is dispatched, show the user the full plan and ask for explicit confirmation. The chain runs unsupervised once it starts, so this is the last cheap moment to correct course. Skipping this step is a bug.

Stage 6a — Self-critique pass (cheap, runs once)

Plan-time is the cheapest moment to catch the most expensive bugs (vague criteria, mis-sliced phases, weak dependencies). Before printing the summary, run one self-critique turn answering exactly three questions:

Falsifiability: Is every acceptance criterion across every phase a yes/no test, not a vibe? Flag any that say "works", "good", "ready", "correct" without a measurable predicate.
Phase atomicity: Is any phase secretly two coherent units packed into one (deliverables that don't share a verify gate, names containing "and", split-able dependency lines)?
Weakest dependency: Where would a partial failure cascade worst? (e.g., phase 2 unblocks 3, 4, and 5 — if 2 ships shaky, three phases inherit the bug.)

Output:

If clean: record Self-critique: clean. and proceed.
If findings: list 1–3 specific findings (no padding). For falsifiability issues, rewrite the offending criteria in place in the affected phase-N.md files and ROADMAP.md before printing the summary. Re-run validate-phase.sh on any touched spec. Surface the rewrites in the Stage 6 summary so the user sees the post-critique version, not the pre-critique one.

Honesty check: this pass must produce findings or a "clean" verdict per run. If it silently always says "clean" on real plans, it's theater and we remove it in the next release.

Stage 6b — Summary print

Print a scannable summary in this exact shape:

✓ Plan ready for review. <N> phases.

Applied from memory:
  - <memory hit 1>
  - <memory hit 2>
  (or: "none — clean run")

Phases:
  1. <name> — <one-line deliverable>
  2. <name> — <one-line deliverable>
  ...
  N. Polish & Harden — every aspect verified

Stack: <stack> · pkg: <pm> · build/test/lint: <commands>

Key assumptions (correct any that are wrong):
  - <assumption 1>
  - <assumption 2>
  - <assumption 3>

Top risks & mitigations:
  1. <risk> → <mitigation>
  2. <risk> → <mitigation>
  3. <risk> → <mitigation>

Self-critique:
  - <finding 1, or "clean">
  - <finding 2 (optional)>
  - <finding 3 (optional)>
  (criteria rewrites applied in-place if any were flagged)

Artifacts:
  Roadmap: <run-root>/ROADMAP.md
  Progress: <run-root>/STATE.md (auto-updates)
  Phase specs: <run-root>/phases/phase-1..N.md

Once you confirm, I'll print a ready-to-paste `/goal` line. Paste it
once and the chain runs through to completion, with auto-retry and
fix-spec recovery.

Then call AskUserQuestion with one question, header "Start chain?", offering concrete revision modes (not a vague "revise plan"):

Start now — run pre-flight smoke check (Stage 6.5), then print the ready-to-paste /goal line; I paste it and the chain runs unsupervised
Adjust an assumption — pick one to change (will re-show plan)
Tweak a phase — change criteria, scope, or commands for a specific phase
Restructure phases — merge, split, add, or remove a phase

Keep options at 4 max. If the user picks any revision option, follow up with a second AskUserQuestion to pin down exactly what (e.g., "Which assumption?" with the assumptions listed). Apply the change, update ROADMAP/THINKING/STATE and the affected phase specs, re-run validate-phase.sh on each touched spec, then re-show the Stage 6 summary and ask again. Loop until "Start now" or user aborts.

Wait for the answer. Do not dispatch /goal until the user picks "Start now". Never assume confirmation; never start the chain on silence.

Stage 6.5 — Pre-flight smoke check

After Stage 6 returns "Start now" and before printing the /goal block, run a single pre-flight pass against the deduplicated mandatory commands. This catches the case where the baseline is already broken (e.g., pnpm build red before phase 1 ever ran) — without this, the 3-strike loop would thrash trying to "fix" phase 1 work that was never the cause.

Procedure:

Read every phase-N.md spec and union their Mandatory commands: lines into a deduplicated set.
Run each once. Capture exit code and last ~5 lines.
If all green:
- Append a Notable events line to $SUPERGOAL_ROOT/STATE.md: <DATE> — Pre-flight green: <N> commands clean.
- Print PREFLIGHT_GREEN with the per-command summary.
- Proceed to Stage 7.
If any red:
- Append <DATE> — Pre-flight red: <cmd> exited <code>. to STATE.md.
- Print PREFLIGHT_RED with the failing command, exit code, last ~5 lines.
- Re-show the Stage 6 summary with the failures surfaced and a revised menu (still 4 options to stay under the AskUserQuestion ceiling): "Skip pre-flight, dispatch anyway" (replaces "Start now" — the user might know the baseline is intentionally broken, e.g., phase 1's whole job is to fix it) / "Adjust an assumption" / "Tweak a phase" / "Restructure phases". If "Skip pre-flight, dispatch anyway" → log <DATE> — Pre-flight bypassed by user. and proceed to Stage 7. Any other choice loops back through the normal Stage 6 revision flow; after the user finishes revising, Stage 6.5 re-runs.
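The procedure above, minus the STATE.md logging, can be sketched in bash. The hypothetical `collect_mandatory_commands` assumes each spec has a single `Mandatory commands:` line with commands separated by `;`; the real spec format may differ.

```
# Sketch: union + dedupe mandatory commands across phase specs (first-seen order).
collect_mandatory_commands() {
  local root="$1"
  grep -h '^Mandatory commands:' "$root"/phases/phase-*.md 2>/dev/null \
    | sed 's/^Mandatory commands:[[:space:]]*//' \
    | tr ';' '\n' \
    | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' \
    | awk 'NF && !seen[$0]++'
}

# Sketch: run each command once; report exit code and last 5 lines on failure.
preflight() {
  local root="$1" list cmd out code red=0 n=0
  list="$(collect_mandatory_commands "$root")"
  while IFS= read -r cmd; do
    [ -n "$cmd" ] || continue
    n=$((n + 1))
    # </dev/null keeps a command from reading anything from this loop's stdin.
    out="$(bash -c "$cmd" </dev/null 2>&1)"
    code=$?
    if [ "$code" -ne 0 ]; then
      red=1
      echo "PREFLIGHT_RED: $cmd exited $code"
      printf '%s\n' "$out" | tail -n 5
    fi
  done <<< "$list"
  [ "$red" -eq 0 ] && echo "PREFLIGHT_GREEN: $n commands clean"
  return "$red"
}
```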

Honesty test: real command run, real exit code. The "skip anyway" option keeps the user in control — no forced re-plan if the baseline being red is the point.

Stage 7 — Hand off the `/goal` dispatch (one paste)

Slash commands on both Claude Code and Codex fire only from user input — agent message text is never parsed as a command. So Stage 7 is not an automatic dispatch; it's an honest one-paste handoff. After explicit "Start now" in Stage 6:

Update STATE.md: Status: READY_TO_DISPATCH, Current phase: 1, and capture the baseline ref — set Baseline ref: to the output of git rev-parse HEAD 2>/dev/null || echo "no-git". The audit reads this to diff deliverables against the working tree.
Copy the operating manual and comparison helper into this run's namespace, baking the run root into the manual:
```
sed "s#{{RUN_ROOT}}#$SUPERGOAL_ROOT#g" "$SUPERGOAL_DIR/templates/PROTOCOL.md" > "$SUPERGOAL_ROOT/PROTOCOL.md"
cp "$SUPERGOAL_DIR/scripts/repo-state.sh" "$SUPERGOAL_ROOT/repo-state.sh"
```
PROTOCOL.md is the manual the executing agent reads at the start of the /goal session; the sed substitutes the concrete run root for every {{RUN_ROOT}} placeholder, so the agent reads $SUPERGOAL_ROOT/STATE.md (etc.), never a placeholder. repo-state.sh is the complete-working-tree comparison helper the cleanliness + deliverable checks invoke (strategy in references/repo-state-comparison.md); it takes paths as arguments, so it needs no substitution.
Verify each $SUPERGOAL_ROOT/phases/phase-N.md exists; run bash $SUPERGOAL_DIR/scripts/validate-phase.sh "$SUPERGOAL_ROOT/phases/phase-<N>.md" on each.
Print a fenced code block with the ready-to-paste /goal command. Substitute the literal value of $SUPERGOAL_ROOT for every <run-root> below (e.g. .supergoal/add-dark-mode-Ab3Kx9) — the pasted line must contain the real directory, not the variable or the <run-root> placeholder. The condition is short, instructional but measurable, and well under the 4000-char /goal argument limit:

```
/goal "Execute all phases of <run-root>/ROADMAP.md sequentially. Read <run-root>/phases/phase-N.md for each phase; do the work; run mandatory commands; print SUPERGOAL_PHASE_VERIFY then SUPERGOAL_PHASE_DONE for each phase; follow the failure-recovery protocol in <run-root>/PROTOCOL.md if any criterion fails. After the last phase, run the FINAL AUDIT in <run-root>/PROTOCOL.md (re-verify against <run-root>/ROADMAP.md; re-run aggregated mandatory commands; spot-check criteria; on gaps, write <run-root>/phases/audit-fix-<round>.md and execute inline). Only after AUDIT_COMPLETE, print SUPERGOAL_RUN_COMPLETE. Done when SUPERGOAL_RUN_COMPLETE appears in the transcript with one SUPERGOAL_PHASE_DONE per phase, AUDIT_COMPLETE printed before SUPERGOAL_RUN_COMPLETE, and no FAILURE_HANDOFF or AUDIT_HANDOFF this run."
```

Follow the fenced block with exactly this one-line instruction:

Paste the /goal line above into your input to dispatch the chain. From there it runs autonomously — auto-retry, fix-spec recovery, per-phase memory writeback — until SUPERGOAL_RUN_COMPLETE appears.

Stop. Do not generate any further output. The Supergoal invocation ends here. The user's paste begins the autonomous run under a fresh /goal session, which reads PROTOCOL.md, ROADMAP.md, STATE.md, and the phase specs from disk and runs the loop documented in the next sections.

Once /goal is active (you'll see the ◎ /goal active indicator on Claude Code), the per-turn evaluator keeps the agent working until the end-state condition holds. On Codex, the auto-continuation loop does the same. The agent inside the /goal session has zero special context from the Supergoal invocation; everything it needs is in the files on disk — by design.
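Before Stage 7 prints the fenced /goal block, the `<run-root>` substitution and the 4000-character limit can be checked mechanically. `render_goal_line` is a hypothetical helper; it assumes the run root contains no `&` (bash 5.2 treats an unquoted `&` in a pattern replacement as the matched text).

```
# Sketch: fill <run-root> with the concrete run dir and enforce the length limit.
render_goal_line() {
  local template="$1" root="$2" line
  line="${template//<run-root>/$root}"
  case "$line" in
    *'<run-root>'*) echo "unsubstituted <run-root> left in goal line" >&2; return 1 ;;
  esac
  if [ "${#line}" -ge 4000 ]; then
    echo "goal line is ${#line} chars; must stay under 4000" >&2
    return 1
  fi
  printf '%s\n' "$line"
}
```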

Phase execution loop (inside the single `/goal` session)

The agent's loop, repeated until SUPERGOAL_RUN_COMPLETE:

Read STATE.md → find current phase N.
Read <run-root>/phases/phase-N.md → full work spec.
Print SUPERGOAL_PHASE_START block with values from the spec.
Do the work; run mandatory commands; surface evidence into the transcript.
Print SUPERGOAL_PHASE_VERIFY block (every criterion pass|fail + engineering checks + cleanliness checks: grep the output of bash <run-root>/repo-state.sh added-lines <Baseline ref>, which covers every line added since the baseline including uncommitted and untracked work, for stack-specific debug prints, session TODO/FIXME, and dead imports; non-zero counts trigger the 3-strike protocol unless the phase spec declares Cleanliness override:).
Memory writeback check — anything non-obvious learned? If yes, write a memory file under the detected MEM_DIR; print MEMORY_SAVED: <name> (or MEMORY_SAVED: none).
Print SUPERGOAL_PHASE_DONE, update STATE.md (mark phase N complete, set Current phase = N+1, append events line).
User-interrupt check — if a new user message has arrived since the last turn, pause and address it before continuing.
If N < total: loop to step 1 for phase N+1.
If N == total: do not print SUPERGOAL_RUN_COMPLETE yet. Run the Final audit (next section). Only after AUDIT_COMPLETE, print SUPERGOAL_RUN_COMPLETE with a 5-line summary. The /goal condition is now satisfied and clears.
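The cleanliness counts in the VERIFY step reduce to grep over the added lines. A sketch, assuming the added lines are passed in as text (for example the output of `bash <run-root>/repo-state.sh added-lines <Baseline ref>`); `cleanliness_check` is hypothetical, the debug-print patterns are JS/Python-flavoured examples to swap for the real stack, and dead-import detection is not covered.

```
# Sketch: count debug prints and TODO/FIXME in added lines; non-zero status = not clean.
cleanliness_check() {
  local added="$1" debug todo
  # grep -c prints 0 (and exits 1) when nothing matches; only the count is used.
  debug="$(printf '%s\n' "$added" | grep -cE 'console\.log\(|print\(|debugger;')"
  todo="$(printf '%s\n' "$added" | grep -cE 'TODO|FIXME')"
  echo "debug_prints=$debug todo_fixme=$todo"
  [ "$debug" -eq 0 ] && [ "$todo" -eq 0 ]
}
```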

Final audit (Stage 10 of the loop — before completion)

Per-phase VERIFY blocks are self-reports. A phase can pass its own check while a later phase silently breaks it (a type added in phase 2 violated in phase 5; tests that passed mid-run break after refactor; etc.). The audit closes that loophole by re-validating against the original ROADMAP.md, not against the run's own self-reports.

The audit runs once after the final phase. If it finds gaps, it writes a focused fix spec and re-runs itself. Cap at 3 audit rounds; if the 3rd round still fails, print AUDIT_HANDOFF.

Audit steps:

Print AUDIT_START with round number, total phase count, criteria count, and the deduplicated set of mandatory commands to re-run.
Re-read ROADMAP.md — pull every phase's acceptance criteria fresh from the original plan. Do not trust prior VERIFY summaries.
Phase completeness check — scan the transcript: does every phase 1..N have a SUPERGOAL_PHASE_DONE block? Surface any missing.
Re-run aggregated mandatory commands once each (build, typecheck, lint, full test suite). Surface last ~10 lines + exit code for each. Any non-zero exit → an AUDIT_GAP.
Spot-check verifiable criteria — for each acceptance criterion across all phases:
- "File X exists" / "Function Y exported" / "Config key Z set" / "No console.log in app code" → re-check via ls/grep/cat.
- "Screenshot showed X" / "Manual smoke test passed" / other non-deterministic checks → mark trust-prior-verify, do not re-run.
5b. Deliverable check — for each phase block in ROADMAP.md, parse the **Deliverables:** bullets. For each bullet that names a file path or glob, run bash <run-root>/repo-state.sh deliverable <Baseline ref> "<path>" — it checks the complete working tree (committed + staged + unstaged + deleted) against the baseline and detects untracked new files separately. missing (exit 1) → AUDIT_GAP: phase <N> deliverable "<bullet>" not present. Repository ground-truth — catches "agent said done but didn't ship," even when the run never committed. Strategy: references/repo-state-comparison.md.
Print AUDIT_VERIFY block:
- Per-phase status (DONE present or missing)
- Each mandatory command's exit code
- Each criterion's pass | fail | trust-prior-verify with evidence
- Deliverables: block from step 5b — phase N / "<bullet>": present | missing
If any gaps:
- Print AUDIT_GAPS with the list.
- Write <run-root>/phases/audit-fix-<round>.md — a focused fix spec targeting only the failing criteria, with the original phase's VERIFY as the success gate, scope creep forbidden.
- Execute the fix spec inline (same agent, same /goal, same 3-strike per-criterion protocol from regular phases).
- On fix success: loop back to step 1 (round + 1). On 3rd round's failure: print AUDIT_HANDOFF (full gap history + suggested next move), update STATE.md to BLOCKED, stop. Do not print SUPERGOAL_RUN_COMPLETE.
If clean:
- Compute audit coverage = re_verified / (re_verified + trust_prior) as a percentage (where re_verified = criteria with pass + deliverables marked present; trust_prior = criteria marked trust-prior-verify).
- Print AUDIT_COMPLETE with phases verified, commands re-run clean, criteria pass/trust-prior counts, deliverables present/missing counts, and the audit coverage %.
- Print SUPERGOAL_RUN_COMPLETE. If trust_prior / (re_verified + trust_prior) > 30%, prepend an honesty banner: ⚠ Audit coverage: X re-verified, Y trust-prior (Z%). Eyeball UI/UX before merging. Below 30%, print the plain coverage line without the warning prefix.
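The coverage arithmetic and the 30% banner rule above can be sketched as a small shell function. This is a minimal sketch, assuming the pass, present, and trust-prior counts have already been tallied from the AUDIT_VERIFY block and that Z in the banner is the coverage percentage. The name audit_coverage_line is hypothetical, and treating "nothing scored" as 100% is an assumption. Integer math keeps it portable; comparing trust_prior * 100 against 30 * total keeps exactly 30% on the plain line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turns audit tallies into the final coverage line.
# Assumes Z in the banner is the coverage percentage (re-verified share).
set -euo pipefail

audit_coverage_line() {
  local passed=$1 present=$2 trust_prior=$3
  local re_verified=$(( passed + present ))        # criteria pass + deliverables present
  local total=$(( re_verified + trust_prior ))
  if (( total == 0 )); then
    # Nothing was scored; reporting 100% here is an assumption.
    echo "Audit coverage: 0 re-verified, 0 trust-prior (100%)."
    return 0
  fi
  local coverage=$(( re_verified * 100 / total ))   # integer percent, rounded down
  local line="Audit coverage: ${re_verified} re-verified, ${trust_prior} trust-prior (${coverage}%)."
  # Warn only when the trust-prior share is strictly above 30%.
  if (( trust_prior * 100 > 30 * total )); then
    echo "⚠ ${line} Eyeball UI/UX before merging."
  else
    echo "$line"
  fi
}
```

For example, 8 passing criteria, 3 present deliverables, and 2 trust-prior criteria (audit_coverage_line 8 3 2) print "Audit coverage: 11 re-verified, 2 trust-prior (84%)." with no banner.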

The audit is the difference between "every phase passed its own self-report" and "the final state matches the plan I originally approved." That is the bar.
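The audit's phase completeness check reduces to a transcript scan. The sketch below assumes the transcript has been saved to a file and that each DONE block contains a line of the hypothetical form SUPERGOAL_PHASE_DONE phase=<N>; the real block layout is whatever PROTOCOL.md prescribes, so the pattern would need adjusting. The function name missing_done_phases is likewise illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical completeness check over a saved transcript.
# Assumed (not confirmed) marker format: "SUPERGOAL_PHASE_DONE phase=<N>".
set -euo pipefail

missing_done_phases() {
  local transcript=$1 total_phases=$2
  local n missing=""
  for (( n = 1; n <= total_phases; n++ )); do
    # The ([^0-9]|$) suffix stops phase=1 from matching a phase=12 line.
    if ! grep -qE "SUPERGOAL_PHASE_DONE phase=${n}([^0-9]|$)" "$transcript"; then
      missing+="${missing:+ }${n}"   # space-separated list of missing phases
    fi
  done
  if [[ -n $missing ]]; then
    echo "AUDIT_GAP: missing SUPERGOAL_PHASE_DONE for phase(s): ${missing}"
    return 1
  fi
  echo "All ${total_phases} phases have SUPERGOAL_PHASE_DONE."
}
```

A non-zero return maps naturally onto an AUDIT_GAP entry, while the printed line doubles as the per-phase status evidence the evaluator can read.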

Failure recovery (3-strike, built into the protocol)

First failure of any criterion:

Print FAILURE_PROBE (what failed, what was tried, root-cause hypothesis).
Append probe to STATE.md failure log.
Auto-retry the same phase once with the probe injected as feedback. Do not advance.

Second failure (auto-retry also failed):

Print FAILURE_ESCALATE.
Write a focused fix spec at <run-root>/phases/phase-N.fix.md (targets only the failing criterion, no scope creep).
Execute the fix spec inline (same agent, same /goal — no new dispatch). On success, re-run the original phase's VERIFY block; on pass, advance to N+1.

Third failure (fix spec also failed):

Print FAILURE_HANDOFF with: the failing criterion, the full probe history, the three attempts made, and a suggested next move.
Update STATE.md: Status: BLOCKED. The user takes the wheel.
The /goal condition will remain unsatisfied. The host's evaluator may keep evaluating, but the agent should stop attempting and surface the handoff clearly.

This recovers automatically from flaky environments, simple typos, and missed dependencies. Only real blockers escalate to the user.
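The escalation ladder above is essentially a three-state decision on the failure count. A minimal sketch, assuming the caller counts failures per criterion and that STATE.md carries a single line beginning with "Status:". Both function names are hypothetical; only the marker names and the fix-spec path come from the protocol.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the 3-strike ladder. Marker names come from the
# protocol; function names and the STATE.md layout are assumptions.
set -euo pipefail

# Map the number of failures of one criterion to the next action.
next_recovery_step() {
  local failures=$1 phase=$2 run_root=$3
  case $failures in
    1) echo "FAILURE_PROBE: log the probe in STATE.md, retry phase ${phase} once with it as feedback" ;;
    2) echo "FAILURE_ESCALATE: write ${run_root}/phases/phase-${phase}.fix.md, run it inline, re-run phase ${phase} VERIFY" ;;
    3) echo "FAILURE_HANDOFF: set Status: BLOCKED in STATE.md and stop attempting" ;;
    *) echo "unexpected failure count: ${failures}" >&2; return 2 ;;
  esac
}

# Flip the Status line to BLOCKED without relying on GNU-only `sed -i`.
mark_blocked() {
  local state_file=$1 tmp
  tmp=$(mktemp)
  sed 's/^Status:.*/Status: BLOCKED/' "$state_file" > "$tmp"
  mv "$tmp" "$state_file"
}
```

Keeping the decision separate from the side effect makes the ladder easy to print into the transcript before acting on it.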

Mid-run interruption

If the user sends any message during the /goal run, the agent pauses at the next phase boundary, addresses the message, and asks before resuming. Phase boundaries are after SUPERGOAL_PHASE_DONE and before reading the next phase spec.

Memory writeback rules (referenced by PROTOCOL.md)

Memory is load-bearing. Future runs start smarter because past runs wrote down what they learned. The phase execution loop's step 6 references these rules.

At each phase boundary, ask: "Did this phase surface anything a future Supergoal run on a similar task would benefit from knowing?"

Worth saving:

A library API quirk that wasn't in the docs
A user preference confirmed during this run ("user accepted dark-only UI without pushback")
A project-level fact ("auth lives in lib/auth/ not app/api/auth/")
A failure pattern + fix ("X always fails on first build; second build works")

Write the memory file under the detected MEM_DIR using the standard name / description / metadata.type frontmatter, and link it from MEMORY.md. Print MEMORY_SAVED: <name> to the transcript. If the phase surfaced nothing non-obvious, print MEMORY_SAVED: none.
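A minimal writeback sketch. It assumes that "metadata.type" means a type key nested under a metadata key, that MEMORY.md is a flat list of markdown links, and that the memory name doubles as the file name. save_memory is a hypothetical helper, not one of the skill's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical writeback helper. Assumes "metadata.type" means a nested
# metadata: type: key and that MEMORY.md is a flat list of markdown links.
set -euo pipefail

save_memory() {
  local mem_dir=$1 name=$2 description=$3 type=$4 body=$5
  local file="${mem_dir}/${name}.md"
  mkdir -p "$mem_dir"
  # Unquoted EOF: the variables are expanded once; their contents are not re-expanded.
  cat > "$file" <<EOF
---
name: ${name}
description: ${description}
metadata:
  type: ${type}
---

${body}
EOF
  # Add the index link only once, so re-running a phase does not duplicate it.
  if ! grep -qF "(${name}.md)" "${mem_dir}/MEMORY.md" 2>/dev/null; then
    printf -- '- [%s](%s.md): %s\n' "$name" "$name" "$description" >> "${mem_dir}/MEMORY.md"
  fi
  echo "MEMORY_SAVED: ${name}"
}
```

When a phase surfaces nothing worth keeping, skip the call and print MEMORY_SAVED: none instead. The final-phase project_<slug>.md memory can go through the same helper with a project_-prefixed name.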

At the final phase, always write a project_<slug>.md memory pointing at the new or changed project (location, stack, status, ROADMAP link). This ensures future Supergoal runs on the same project start from the latest state.

Never save secrets, transient task details, or ephemeral state. The bar is "useful to a future run." When in doubt, skip.

Operating principles (read every run)

One /goal, short condition. /goal takes an end-state, not a task body. Long content lives in files the agent reads from disk. This is the natural shape on both Claude Code and Codex.
Frictionless is the goal. Memory + prompt + recon should answer most questions. Zero clarifying questions on well-described tasks is a win.
Adapt to available tools. Detect what's there (Context7, WebSearch, MCPs, skills). Use what's available; degrade gracefully without it. Never hard-require a tool that might not be present.
Memory is load-bearing. Preload at Stage 0, surface as "Applied from memory: …" in Stage 1, write back at every phase boundary.
"Perfect" is not a stopping condition — criteria are. Translate every "perfect" into observable, falsifiable criteria.
Two human gates, no more: clarifying questions (Stage 1; for greenfield, walk the full category checklist in batches of up to 4 until all material info is gathered; for brownfield, often zero) and plan review (Stage 6). Everything between and after runs autonomously.
The loop self-heals. Auto-retry once, then write a fix spec and execute inline, then escalate. Don't stop on first failure.
The evaluator only sees the transcript. Phase specs require the agent to surface each phase's contract (START, commands, evidence, VERIFY, DONE) in the conversation, not just point at files.
Each phase is independently shippable in spirit. If phase 3 can't build/test on its own, the slicing is wrong.
The Polish & Harden phase is mandatory. It's how "every aspect is perfect" gets enforced.

When to deviate from the workflow

Very small task (< 1 hour of work, single file): tell the user this doesn't need Supergoal, suggest just doing it. Don't force the machinery.
The user pushes back on a phase during intake: collapse, re-plan, continue.
Mid-run interruption: if the user stops the run and asks for a change, update the affected <run-root>/phases/phase-N.md spec, run validate-phase.sh on it, then ask the user to resume (they can re-dispatch the same /goal or just say "continue"). No need to restart phase 1.

Reference files

references/planning-depth.md — what makes a plan deep enough to deserve "Super"
references/phase-design.md — how to slice phases that auto-chain cleanly
references/goal-format.md — what /goal is on Claude Code + Codex, Supergoal's single-/goal shape, required transcript blocks

Scripts

scripts/detect-stack.sh — identifies language, package manager, framework, build/test/lint commands (brownfield)
scripts/detect-env.sh — greenfield environment recon
scripts/summarize-repo.sh — compressed repo map (brownfield)
scripts/validate-phase.sh — checks a phase spec has the required SUPERGOAL_PHASE_START marker and a non-empty acceptance criteria section
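For intuition, the two checks that validate-phase.sh performs can be approximated as below. This is a hypothetical re-implementation, not the shipped script, and it assumes the criteria sit under a "## Acceptance criteria" heading that ends at the next "## " heading.

```shell
#!/usr/bin/env bash
# Hypothetical re-implementation, not the shipped scripts/validate-phase.sh.
# Assumption: criteria live under "## Acceptance criteria" and end at the next "## " heading.
set -euo pipefail

validate_phase_spec() {
  local spec=$1 count
  if ! grep -q 'SUPERGOAL_PHASE_START' "$spec"; then
    echo "FAIL: ${spec} has no SUPERGOAL_PHASE_START marker" >&2
    return 1
  fi
  # Count non-blank lines between the criteria heading and the next level-2 heading.
  count=$(awk '
    /^## / { in_section = tolower($0) ~ /^## acceptance criteria/; next }
    in_section && NF > 0 { n++ }
    END { print n + 0 }
  ' "$spec")
  if (( count == 0 )); then
    echo "FAIL: ${spec} has an empty or missing acceptance criteria section" >&2
    return 1
  fi
  echo "OK: ${spec}"
}
```

Running a check like this after every spec edit (including the mid-run edits described above) catches a spec that would otherwise give the evaluator nothing to verify.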

Templates

templates/ROADMAP.md — phase plan with dependencies
templates/STATE.md — live progress file
templates/phase-goal.txt — phase spec skeleton (work, criteria, evidence, mandatory commands)
templates/PROTOCOL.md — phase execution loop, failure recovery, memory writeback (copied to <run-root>/PROTOCOL.md at dispatch, with {{RUN_ROOT}} substituted for the run root)
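The {{RUN_ROOT}} substitution at dispatch amounts to a one-placeholder template render. A sketch assuming {{RUN_ROOT}} is the only placeholder; render_protocol is a hypothetical name. It escapes backslash, ampersand, and the | delimiter because those are special in the replacement half of a sed s command.

```shell
#!/usr/bin/env bash
# Hypothetical dispatch-time render of templates/PROTOCOL.md.
# Assumption: {{RUN_ROOT}} is the only placeholder to fill.
set -euo pipefail

render_protocol() {
  local template=$1 run_root=$2 escaped
  # Backslash, & and the | delimiter are special in sed replacement text.
  escaped=$(printf '%s' "$run_root" | sed 's/[\\&|]/\\&/g')
  mkdir -p "$run_root"
  sed "s|{{RUN_ROOT}}|${escaped}|g" "$template" > "${run_root}/PROTOCOL.md"
}
```

Escaping first means a run root containing an ampersand is copied literally instead of being replaced by the matched placeholder.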
