Plans and autonomously builds a software task end-to-end. Recons codebase, researches, decomposes into phases, and generates a /goal command for execution. Useful for features, refactors, or redesigns driven to completion.
npx claudepluginhub robzilla1738/supergoal --plugin supergoal
Slash command: /supergoal:supergoal <describe what you want built, fixed, or shipped>
You are running the Supergoal workflow. The user's task is:
$ARGUMENTS
Your job: plan deeply, then auto-execute under a single /goal until the task is verifiably complete across every phase.
The user's bar is high. Translate it into measurable criteria, not vibes.
If a phase can't be measured, it isn't a phase. Rewrite it until it can.
- Planning writes one work spec per phase to .supergoal/phases/phase-N.md (any length, no char budget).
- Dispatch prints a single /goal with a short end-state condition; the user pastes once, and the agent inside that fresh /goal session executes phases sequentially with retry + fix-spec recovery + per-phase memory writeback, then runs a final audit that re-verifies the work against the original ROADMAP and self-heals any gaps before completion holds.

Two human gates only: clarifying questions for true gaps (Stage 1) and plan review (Stage 6). Everything else runs autonomously.
/goal, not a chain

/goal in both Claude Code and Codex takes a short end-state condition, not a long task body. A fast evaluator checks the condition against the transcript after each turn and auto-continues until it holds. Supergoal v3 leverages this directly: one /goal covers the whole run; phase work lives in files the agent reads from disk; the condition is "all phases done, SUPERGOAL_RUN_COMPLETE printed." No char budget, no inter-session chain dispatch, no fragility.
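# Locate the installed skill: user-level install first, then project-level.
SKILL_FILE=$(ls -1 \
  "$HOME/.claude/skills/supergoal/SKILL.md" \
  "$PWD/.claude/skills/supergoal/SKILL.md" \
  2>/dev/null | head -n1)
# Guard: dirname of an empty string is ".", which would silently point at the cwd.
if [[ -z "$SKILL_FILE" ]]; then
  echo "supergoal: SKILL.md not found under ~/.claude/skills or ./.claude/skills" >&2
fi
SUPERGOAL_DIR=$(dirname "$SKILL_FILE")
export SUPERGOAL_DIR
# All run artifacts go under SUPERGOAL_ROOT (default: .supergoal).
export SUPERGOAL_ROOT="${SUPERGOAL_ROOT:-.supergoal}"
mkdir -p "$SUPERGOAL_ROOT/goals"
echo "SUPERGOAL_DIR=$SUPERGOAL_DIR"
echo "SUPERGOAL_ROOT=$SUPERGOAL_ROOT"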
All artifacts live under $SUPERGOAL_ROOT. Skill assets (scripts, references, templates) live under $SUPERGOAL_DIR.
Before doing anything else, sense what's available this session. This is what makes the run frictionless — if memory already knows the user's preferences, don't ask; if a tool isn't available, don't try to call it.
# Detect a memory directory. Common locations:
MEM_DIR=""
for cand in \
  "$HOME/.claude/projects/-Users-$(whoami)/memory" \
  "$HOME/.claude/memory" \
  "$PWD/.claude/memory" \
  "$SUPERGOAL_ROOT/memory"; do
  [[ -d "$cand" ]] && MEM_DIR="$cand" && break
done
echo "MEM_DIR=$MEM_DIR"
if [[ -n "$MEM_DIR" && -f "$MEM_DIR/MEMORY.md" ]]; then
  echo "--- MEMORY INDEX ---"
  cat "$MEM_DIR/MEMORY.md"
fi
Read the index. Then selectively read individual memory files that look relevant to the task (feedback memories about the stack/domain, user role memories, related project memories). Don't dump them all into context — pull what matters.
Capture applicable memory hits in $SUPERGOAL_ROOT/applied-memories.md (one line per memory: name, why-applicable, what-it-changes). Surface them in Stage 1 as "Applied from memory: …" so the user can see what's being inherited and correct anything stale.
Tools differ between sessions and hosts (Claude Code vs Codex, different MCP server sets). Detect, don't assume:
- Docs lookup: check whether mcp__claude_ai_Context7__resolve-library-id or similar is in the tool list. If absent, skip it; rely on training-cutoff knowledge + WebSearch if that's present.
- Skills: look for installed skills relevant to the task (e.g. mobile-ios-design, clerk-auth, expo-dev-client) and note them in $SUPERGOAL_ROOT/applied-skills.md to invoke from inside phase goals if relevant.
- Prior run: if $SUPERGOAL_ROOT/STATE.md exists from a previous run, read it; resume rather than restart.

Write detected tools to $SUPERGOAL_ROOT/tools.md. Stage 3 and the phase goals reference this file when deciding what to invoke.
If STATE.md exists and shows Status: IN_PROGRESS with a phase pending, do not re-plan. Print a one-line "Resuming Supergoal from phase N" and jump straight to Stage 6 (plan review) with the existing artifacts, or directly to Stage 7 (dispatch) if the user confirms resume.
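The resume rule can be sketched as a small check. This is only an illustration: it assumes STATE.md carries literal `Status: <value>` and `Current phase: <N>` lines (the field names come from the text; the exact template layout is an assumption), and `supergoal_resume_phase` is a hypothetical helper name.

```shell
# Sketch: decide whether to resume an earlier run instead of re-planning.
# Assumption: STATE.md holds "Status: <value>" and "Current phase: <N>" lines.
supergoal_resume_phase() {
  local state_file="${1:-.supergoal/STATE.md}" phase
  [ -f "$state_file" ] || return 1
  grep -q '^Status: IN_PROGRESS' "$state_file" || return 1
  # Extract the pending phase number; fail if the field is missing.
  phase=$(sed -n 's/^Current phase: *\([0-9][0-9]*\).*/\1/p' "$state_file" | head -n1)
  [ -n "$phase" ] || return 1
  echo "$phase"
}

if phase=$(supergoal_resume_phase "${SUPERGOAL_ROOT:-.supergoal}/STATE.md"); then
  echo "Resuming Supergoal from phase $phase"
else
  echo "No run in progress: start planning"
fi
```

The function fails closed: a missing file, a different status, or an unreadable phase number all fall through to a fresh plan.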
Echo the task back in one sentence. Then classify it (tags can combine):
| Tag | Trigger |
|---|---|
| greenfield | Request implies a new project; cwd has no .git/ or empty tree |
| brownfield | Change in an existing repo |
| bugfix | Mentions "bug", "broken", "fails", "regression" |
| refactor | Mentions "refactor", "clean up", "restructure" |
| ui | Mentions "design", "polish", "UI", "UX", "responsive", "redesign" |
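A rough sketch of how the trigger table could be applied mechanically. The keyword lists mirror the table; the directory heuristic, the whole-word matching for "UI"/"UX", and the name `classify_task` are assumptions made for illustration, and real classification also weighs what the request implies.

```shell
# Sketch: derive Stage 1 tags from the task text and the target directory.
# Plain substring matching is a simplification (e.g. "debug" matches "bug").
classify_task() {
  local task="$1" dir="${2:-$PWD}" tags
  # No .git/ or an empty tree is treated as a new project.
  if [ ! -d "$dir/.git" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    tags="greenfield"
  else
    tags="brownfield"
  fi
  if printf '%s\n' "$task" | grep -qiE 'bug|broken|fails|regression'; then
    tags="$tags bugfix"
  fi
  if printf '%s\n' "$task" | grep -qiE 'refactor|clean up|restructure'; then
    tags="$tags refactor"
  fi
  # "UI" and "UX" are matched as whole words so "build" does not count.
  if printf '%s\n' "$task" | grep -qiE 'design|polish|responsive' \
    || printf '%s\n' "$task" | grep -qiwE 'ui|ux'; then
    tags="$tags ui"
  fi
  echo "$tags"
}
```

Tags combine, so a single request can come back as, say, `brownfield bugfix refactor`.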
Calibrate the question count to the context. Greenfield has no codebase to scan, so it needs enough verbal context to plan well — never artificially limit questions when material info is missing. Brownfield runs lean on recon, so questions are sparse.
A new project has no signal beyond the user's prompt + memory. The planner's job in Stage 1 is to enumerate every category that meaningfully shapes the plan, eliminate the ones already answered by memory or prompt, and ask about every remaining one. Don't stop until every material gap is filled.
Category checklist — work through this for every greenfield run:
| Category | Why it shapes the plan |
|---|---|
| Target platform / surface | iOS, Android, web, desktop, CLI, multi — the biggest fork. Different stacks, different phases. |
| Stack / framework preference | Next.js vs SvelteKit, Expo vs bare RN, FastAPI vs Django, Swift vs SwiftUI vs UIKit, etc. Affects every phase. |
| Design direction / aesthetic | Minimal-mono, brutalist, glass morphism, Apple-native, dashboardy-corporate, retro, etc. Determines tokens, component shapes, Polish phase content. |
| Integration anchors | Auth provider, database, payments, hosting, analytics, file storage, email — anything that locks in a vendor up front. |
| Scope cut-line | MVP-this-week vs full feature; what's explicitly out of scope vs deferred to v2. |
| Primary use case / audience | Solo-dev tool, team SaaS, public consumer app, internal admin — drives auth flow, onboarding shape, error tolerance. |
| Performance / scale constraints | "Realtime sub-100ms" vs "background batch ok"; expected traffic; offline-first or online-only. Only ask if non-trivial. |
| Data model anchors | If the prompt implies data, ask the shape ("users + posts? users + projects + tasks?"). Only if not obvious. |
Process:
- Ask in batches of up to 4 questions (AskUserQuestion tool ceiling) until every material gap is filled. Two batches is fine for greenfield; three is rare but allowed if a complex task genuinely warrants it.
The codebase plus recon scripts already answer most structural questions (stack, package manager, build/test/lint, conventions, what exists). Ask only for true gaps memory + prompt + recon leave open:
Most well-described brownfield tasks ask zero questions.
The AskUserQuestion batch caps at 4 (tool limit). Greenfield can use multiple sequential batches; brownfield is one batch max.

Run recon scripts in parallel. They populate context files under $SUPERGOAL_ROOT/.
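# Brownfield: detect the stack and map the repo.
bash "$SUPERGOAL_DIR/scripts/detect-stack.sh" > "$SUPERGOAL_ROOT/context.md"
bash "$SUPERGOAL_DIR/scripts/summarize-repo.sh" > "$SUPERGOAL_ROOT/repo-map.md"
# Greenfield: environment recon. It writes the same context.md, so run it
# instead of detect-stack.sh; running both would overwrite the stack report.
bash "$SUPERGOAL_DIR/scripts/detect-env.sh" > "$SUPERGOAL_ROOT/context.md"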
Read the outputs. Then print a 5-line summary to the user: stack, package manager, build/test/lint commands, notable modules (if any), risky areas. This is what tells them you've actually understood their codebase before planning.
This is the difference between a generic plan and a Supergoal. Spend real cycles here — but use only what's available.
Required regardless of tools:
- Apply the memory hits recorded in $SUPERGOAL_ROOT/applied-memories.md: bake them into goals, constraints, or risk mitigations.

Optional, use if available (check $SUPERGOAL_ROOT/tools.md):
- If relevant skills are listed in $SUPERGOAL_ROOT/applied-skills.md (e.g. clerk-auth, mobile-ios-design), note them in THINKING.md as "consult <skill> skill during phase N" so the executor invokes them at the right moment.

Write $SUPERGOAL_ROOT/THINKING.md with sections: Goals, Constraints, Risks, Dependencies, Open Questions (already-assumed), Memory hits applied, Tools/skills relied on, Best Practices Applied. Keep it tight: 1–2 pages. This is the substrate the roadmap derives from.
See references/planning-depth.md for the bar to clear here.
Break the work into as many phases as the task actually needs — no fixed count, no upper or lower cap. The right number falls out of the work itself: how many independently verifiable units exist between empty repo (or current state) and "done perfectly." A trivial change might need 2 phases; a typical feature 4–6; a full-stack greenfield app 8–12; a major migration 15+. Read references/phase-design.md for how to slice well — the short version:
Each phase has:
Three files, all under $SUPERGOAL_ROOT/:
- ROADMAP.md: the plan (template at $SUPERGOAL_DIR/templates/ROADMAP.md).
- STATE.md: live progress file the executor updates per phase (template at $SUPERGOAL_DIR/templates/STATE.md).
- phases/phase-N.md: one work-spec file per phase (template at $SUPERGOAL_DIR/templates/phase-goal.txt, renamed conceptually to "phase spec"). Any length; these are read from disk by the executor, not passed to /goal, so no char budget.

Each phase spec must include these markers so the agent and evaluator both have stable anchors:
SUPERGOAL_PHASE_START
Phase: <N> of <total> — <name>
Task: <one-line>
Mandatory commands: <list>
Acceptance criteria: <count>
Evidence required: <list>
Depends on phases: <list or "none">
[... full work description, acceptance criteria, evidence requirements ...]
[Agent will print SUPERGOAL_PHASE_VERIFY and SUPERGOAL_PHASE_DONE here during execution]
Validate each spec with bash $SUPERGOAL_DIR/scripts/validate-phase.sh .supergoal/phases/phase-N.md — it confirms the required markers exist. No char budget.
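For orientation, here is a minimal sketch of the kind of check such a validator performs. It is not the shipped validate-phase.sh: it assumes the spec header carries an `Acceptance criteria: <count>` line with a count of at least 1, and `check_phase_spec` is a hypothetical name.

```shell
# Sketch of a phase-spec check; NOT the shipped validate-phase.sh.
# Assumption: the header carries "Acceptance criteria: <count>" with count >= 1.
check_phase_spec() {
  local spec="$1" count
  [ -f "$spec" ] || { echo "missing file: $spec" >&2; return 1; }
  grep -q '^SUPERGOAL_PHASE_START' "$spec" \
    || { echo "$spec: no SUPERGOAL_PHASE_START marker" >&2; return 1; }
  count=$(sed -n 's/^Acceptance criteria: *\([0-9][0-9]*\).*/\1/p' "$spec" | head -n1)
  if [ -z "$count" ] || [ "$count" -eq 0 ]; then
    echo "$spec: acceptance criteria missing or empty" >&2
    return 1
  fi
  echo "$spec: ok ($count criteria)"
}
```

A non-zero exit status is the signal to rewrite the spec before it reaches plan review.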
Before any /goal is dispatched, show the user the full plan and ask for explicit confirmation. The chain runs unsupervised once it starts, so this is the last cheap moment to correct course. Skipping this step is a bug.
Plan-time is the cheapest moment to catch the most expensive bugs (vague criteria, mis-sliced phases, weak dependencies). Before printing the summary, run one self-critique turn answering exactly three questions:
Output:
- No findings: print "Self-critique: clean." and proceed.
- Findings: rewrite the affected phase-N.md files and ROADMAP.md before printing the summary. Re-run validate-phase.sh on any touched spec. Surface the rewrites in the Stage 6 summary so the user sees the post-critique version, not the pre-critique one.

Honesty check: this pass must produce findings or a "clean" verdict per run. If it silently always says "clean" on real plans, it's theater and we remove it in the next release.
Print a scannable summary in this exact shape:
✓ Plan ready for review. <N> phases.
Applied from memory:
- <memory hit 1>
- <memory hit 2>
(or: "none — clean run")
Phases:
1. <name> — <one-line deliverable>
2. <name> — <one-line deliverable>
...
N. Polish & Harden — every aspect verified
Stack: <stack> · pkg: <pm> · build/test/lint: <commands>
Key assumptions (correct any that are wrong):
- <assumption 1>
- <assumption 2>
- <assumption 3>
Top risks & mitigations:
1. <risk> → <mitigation>
2. <risk> → <mitigation>
3. <risk> → <mitigation>
Self-critique:
- <finding 1, or "clean">
- <finding 2 (optional)>
- <finding 3 (optional)>
(criteria rewrites applied in-place if any were flagged)
Artifacts:
Roadmap: .supergoal/ROADMAP.md
Progress: .supergoal/STATE.md (auto-updates)
Phase specs: .supergoal/phases/phase-1..N.md
Once you confirm, I'll print a ready-to-paste `/goal` line. Paste it
once and the chain runs through to completion, with auto-retry and
fix-spec recovery.
Then call AskUserQuestion with one question, header "Start chain?", offering concrete revision modes (not a vague "revise plan"):
- Start now: print the /goal line; I paste it and the chain runs unsupervised
- Adjust an assumption
- Tweak a phase
- Restructure phases

Keep options at 4 max. If the user picks any revision option, follow up with a second AskUserQuestion to pin down exactly what (e.g., "Which assumption?" with the assumptions listed). Apply the change, update ROADMAP/THINKING/STATE and the affected phase specs, re-run validate-phase.sh on each touched spec, then re-show the Stage 6 summary and ask again. Loop until "Start now" or user aborts.
Wait for the answer. Do not dispatch /goal until the user picks "Start now". Never assume confirmation; never start the chain on silence.
After Stage 6 returns "Start now" and before printing the /goal block, run a single pre-flight pass against the deduplicated mandatory commands. This catches the case where the baseline is already broken (e.g., pnpm build red before phase 1 ever ran) — without this, the 3-strike loop would thrash trying to "fix" phase 1 work that was never the cause.
Procedure:
1. Read every phase-N.md spec and union their Mandatory commands: lines into a deduplicated set.
2. Run each command once and record its real exit code.
3. If all are green:
   - Append a Notable events line to .supergoal/STATE.md: <DATE> — Pre-flight green: <N> commands clean.
   - Print PREFLIGHT_GREEN with the per-command summary.
4. If any is red:
   - Append <DATE> — Pre-flight red: <cmd> exited <code>. to STATE.md.
   - Print PREFLIGHT_RED with the failing command, exit code, last ~5 lines.
   - Ask again with four options (AskUserQuestion ceiling): "Skip pre-flight, dispatch anyway" (replaces "Start now"; the user might know the baseline is intentionally broken, e.g., phase 1's whole job is to fix it) / "Adjust an assumption" / "Tweak a phase" / "Restructure phases". If "Skip pre-flight, dispatch anyway" → log <DATE> — Pre-flight bypassed by user. and proceed to Stage 7. Any other choice loops back through the normal Stage 6 revision flow; after the user finishes revising, Stage 6.5 re-runs.

Honesty test: real command run, real exit code. The "skip anyway" option keeps the user in control: no forced re-plan if the baseline being red is the point.
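The pre-flight pass can be sketched roughly as follows. The sketch assumes each spec has a single `Mandatory commands:` line with comma-separated commands (the separator is not stated in the text), and it leaves out the STATE.md logging and the output tail; `collect_mandatory_commands` and `run_preflight` are hypothetical names.

```shell
# Sketch of the pre-flight pass; STATE.md logging and output tails are omitted.
# Assumption: each spec has one "Mandatory commands:" line, comma-separated.
collect_mandatory_commands() {
  grep -h '^Mandatory commands:' "$1"/phase-*.md 2>/dev/null \
    | sed 's/^Mandatory commands: *//' \
    | tr ',' '\n' \
    | sed 's/^ *//; s/ *$//' \
    | grep -v '^$' \
    | sort -u
}

run_preflight() {
  local cmds cmd code total=0
  cmds=$(collect_mandatory_commands "$1")
  while IFS= read -r cmd; do
    [ -n "$cmd" ] || continue
    total=$((total + 1))
    # Real command run, real exit code. stdin is detached so the command
    # cannot swallow the rest of the list.
    code=0
    sh -c "$cmd" </dev/null >/dev/null 2>&1 || code=$?
    if [ "$code" -ne 0 ]; then
      echo "PREFLIGHT_RED: $cmd exited $code"
      return 1
    fi
  done <<EOF
$cmds
EOF
  echo "PREFLIGHT_GREEN: $total commands clean"
}
```

Calling `run_preflight .supergoal/phases` stops at the first red command, which is the point where the user is asked whether to skip or revise.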
/goal dispatch (one paste)

Slash commands on both Claude Code and Codex fire only from user input; agent message text is never parsed as a command. So Stage 7 is not an automatic dispatch; it's an honest one-paste handoff. After explicit "Start now" in Stage 6:
1. Update STATE.md: Status: READY_TO_DISPATCH, Current phase: 1, and capture the baseline ref: set Baseline ref: to the output of git rev-parse HEAD 2>/dev/null || echo "no-git". The audit reads this to diff deliverables against the working tree.
2. Copy $SUPERGOAL_DIR/templates/PROTOCOL.md to .supergoal/PROTOCOL.md (the operating manual the executing agent reads at the start of the /goal session), and copy $SUPERGOAL_DIR/scripts/repo-state.sh to .supergoal/repo-state.sh (the complete-working-tree comparison helper the cleanliness + deliverable checks invoke; strategy in references/repo-state-comparison.md).
3. Confirm every .supergoal/phases/phase-N.md exists; run bash $SUPERGOAL_DIR/scripts/validate-phase.sh .supergoal/phases/phase-<N>.md on each.
4. Print the /goal command. The condition below is short, instructional but measurable, and well under the 4000-char /goal argument limit:
```
/goal "Execute all phases of .supergoal/ROADMAP.md sequentially. Read .supergoal/phases/phase-N.md for each phase; do the work; run mandatory commands; print SUPERGOAL_PHASE_VERIFY then SUPERGOAL_PHASE_DONE for each phase; follow the failure-recovery protocol in .supergoal/PROTOCOL.md if any criterion fails. After the last phase, run the FINAL AUDIT in PROTOCOL.md (re-verify against ROADMAP.md; re-run aggregated mandatory commands; spot-check criteria; on gaps, write audit-fix-<round>.md and execute inline). Only after AUDIT_COMPLETE, print SUPERGOAL_RUN_COMPLETE. Done when SUPERGOAL_RUN_COMPLETE appears in the transcript with one SUPERGOAL_PHASE_DONE per phase, AUDIT_COMPLETE printed before SUPERGOAL_RUN_COMPLETE, and no FAILURE_HANDOFF or AUDIT_HANDOFF this run."
```
Paste the /goal line above into your input to dispatch the chain. From there it runs autonomously (auto-retry, fix-spec recovery, per-phase memory writeback) until SUPERGOAL_RUN_COMPLETE appears.
Execution moves to the fresh /goal session, which reads PROTOCOL.md, ROADMAP.md, STATE.md, and the phase specs from disk and runs the loop documented in the next sections.

Once /goal is active (you'll see the ◎ /goal active indicator on Claude Code), the per-turn evaluator keeps the agent working until the end-state condition holds. On Codex, the auto-continuation loop does the same. The agent inside the /goal session has zero special context from the Supergoal invocation; everything it needs is in the files on disk, by design.
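To make the end-state condition concrete, here is a sketch that checks it against a saved transcript file. The host's evaluator is not reproduced here and works differently; this only assumes markers appear at the start of a line, and `goal_condition_holds` is a hypothetical helper.

```shell
# Sketch: check the /goal end-state condition against a transcript file.
# Assumption: every marker is printed at the start of its own line.
goal_condition_holds() {
  local transcript="$1" phases="$2" done_count audit_line complete_line
  # No handoff may have been printed during the run.
  if grep -qE '^(FAILURE_HANDOFF|AUDIT_HANDOFF)' "$transcript"; then
    return 1
  fi
  # One SUPERGOAL_PHASE_DONE per phase.
  done_count=$(grep -c '^SUPERGOAL_PHASE_DONE' "$transcript")
  [ "$done_count" -eq "$phases" ] || return 1
  # AUDIT_COMPLETE must be printed before SUPERGOAL_RUN_COMPLETE.
  audit_line=$(grep -n '^AUDIT_COMPLETE' "$transcript" | head -n1 | cut -d: -f1)
  complete_line=$(grep -n '^SUPERGOAL_RUN_COMPLETE' "$transcript" | head -n1 | cut -d: -f1)
  [ -n "$audit_line" ] && [ -n "$complete_line" ] && [ "$audit_line" -lt "$complete_line" ]
}
```

Each clause of the condition maps to one check, which is what keeps the condition short enough to evaluate after every turn.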
Phase execution loop (inside the /goal session)

The agent's loop, repeated until SUPERGOAL_RUN_COMPLETE:
1. Read STATE.md → find current phase N.
2. Read .supergoal/phases/phase-N.md → full work spec.
3. Print the SUPERGOAL_PHASE_START block with values from the spec.
4. Do the work and run the mandatory commands.
5. Print the SUPERGOAL_PHASE_VERIFY block (every criterion pass|fail + engineering checks + cleanliness checks). For cleanliness, grep the output of bash .supergoal/repo-state.sh added-lines <Baseline ref> (complete added/new lines since baseline, including uncommitted and untracked work) for stack-specific debug prints, session TODO/FIXME, and dead imports; non-zero counts trigger 3-strike unless the phase spec declares Cleanliness override:.
6. Memory writeback: print MEMORY_SAVED: <name> (or MEMORY_SAVED: none).
7. Print SUPERGOAL_PHASE_DONE, update STATE.md (mark phase N complete, set Current phase = N+1, append events line).
8. After the last phase, do not print SUPERGOAL_RUN_COMPLETE yet. Run the Final audit (next section). Only after AUDIT_COMPLETE, print SUPERGOAL_RUN_COMPLETE with a 5-line summary. The /goal condition is now satisfied and clears.

Per-phase VERIFY blocks are self-reports. A phase can pass its own check while a later phase silently breaks it (a type added in phase 2 violated in phase 5; tests that passed mid-run break after refactor; etc.). The audit closes that loophole by re-validating against the original ROADMAP.md, not against the run's own self-reports.
The audit runs once after the final phase. If it finds gaps, it writes a focused fix spec and re-runs itself. Cap at 3 audit rounds; on the 3rd round's failure, AUDIT_HANDOFF.
Audit steps:
1. Print AUDIT_START with round number, total phase count, criteria count, and the deduplicated set of mandatory commands to re-run.
2. Re-read ROADMAP.md: pull every phase's acceptance criteria fresh from the original plan. Do not trust prior VERIFY summaries.
3. Phase coverage: does every phase have a SUPERGOAL_PHASE_DONE block? Surface any missing.
4. Re-run the deduplicated mandatory commands. Any failure is an AUDIT_GAP.
5. Spot-check criteria:
   - Cheap to check mechanically (e.g. "no console.log in app code") → re-check via ls/grep/cat.
   - Anything else → mark trust-prior-verify, do not re-run.

5b. Deliverable check: for each phase block in ROADMAP.md, parse the **Deliverables:** bullets. For each bullet that names a file path or glob, run bash .supergoal/repo-state.sh deliverable <Baseline ref> "<path>"; it checks the complete working tree (committed + staged + unstaged + deleted) against the baseline and detects untracked new files separately. missing (exit 1) → AUDIT_GAP: phase <N> deliverable "<bullet>" not present. This is repository ground truth: it catches "agent said done but didn't ship," even when the run never committed. Strategy: references/repo-state-comparison.md.

6. Print the AUDIT_VERIFY block:
   - Each criterion: pass | fail | trust-prior-verify with evidence
   - Deliverables: block from step 5b, one line per bullet: phase N / "<bullet>": present | missing
7. If there are gaps:
   - Print AUDIT_GAPS with the list.
   - Write .supergoal/phases/audit-fix-<round>.md: a focused fix spec targeting only the failing criteria, with the original phase's VERIFY as the success gate, scope creep forbidden.
   - Execute it inline (same /goal, same 3-strike per-criterion protocol from regular phases), then re-run the audit as the next round.
   - On the 3rd round's failure: print AUDIT_HANDOFF (full gap history + suggested next move), update STATE.md to BLOCKED, stop. Do not print SUPERGOAL_RUN_COMPLETE.
8. Compute audit coverage = re_verified / (re_verified + trust_prior) as a percentage (where re_verified = criteria with pass + deliverables marked present; trust_prior = criteria marked trust-prior-verify).
9. Print AUDIT_COMPLETE with phases verified, commands re-run clean, criteria pass/trust-prior counts, deliverables present/missing counts, and the audit coverage %.
10. Print SUPERGOAL_RUN_COMPLETE. If trust_prior / (re_verified + trust_prior) > 30%, prepend an honesty banner: ⚠ Audit coverage: X re-verified, Y trust-prior (Z%). Eyeball UI/UX before merging. At 30% or below, print the plain coverage line without the warning prefix.

The audit is the difference between "every phase passed its own self-report" and "the final state matches the plan I originally approved." That is the bar.
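The coverage arithmetic and the 30% banner rule can be sketched like this. Percentages are integers rounded down; the wording of the plain line and the reading of Z as the coverage percentage are assumptions, and `audit_coverage_line` is a hypothetical name.

```shell
# Sketch: audit coverage line with the >30% trust-prior honesty banner.
# Assumption: Z in the banner is the coverage percentage, rounded down.
audit_coverage_line() {
  local re_verified="$1" trust_prior="$2" total coverage line
  total=$((re_verified + trust_prior))
  if [ "$total" -eq 0 ]; then
    echo "Audit coverage: nothing to verify"
    return 0
  fi
  coverage=$((re_verified * 100 / total))
  line="Audit coverage: $re_verified re-verified, $trust_prior trust-prior (${coverage}%)."
  # trust_prior / total > 30%  <=>  trust_prior * 100 > 30 * total,
  # which avoids rounding errors from integer division.
  if [ $((trust_prior * 100)) -gt $((30 * total)) ]; then
    echo "⚠ $line Eyeball UI/UX before merging."
  else
    echo "$line"
  fi
}
```

For example, 6 re-verified and 4 trust-prior gives 60% coverage with a 40% trust-prior share, so the banner is printed; 7 and 3 sits exactly at 30% and stays plain.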
First failure of any criterion:
- Print FAILURE_PROBE (what failed, what tried, root-cause hypothesis).
- Append an entry to the STATE.md failure log.
- Auto-retry.

Second failure (auto-retry also failed):
- Print FAILURE_ESCALATE.
- Write .supergoal/phases/phase-N.fix.md (targets only the failing criterion, no scope creep).
- Execute it inline (same /goal, no new dispatch). On success, re-run the original phase's VERIFY block; on pass, advance to N+1.

Third failure (fix spec also failed):
- Print FAILURE_HANDOFF with: failing criterion, full probe history, three things tried, suggested next move.
- Update STATE.md: Status: BLOCKED. The user takes the wheel.
- The /goal condition will not be satisfied; the host's evaluator will keep evaluating but the agent should stop attempting and surface the handoff clearly.

This recovers from flaky envs, simple typos, and missed deps automatically. Only real blockers escalate.
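The three-strike ladder for a single criterion can be sketched as a small function. The three command arguments (`check`, `retry`, `apply_fix_spec`) are hypothetical stand-ins supplied by the caller; the real protocol lives in PROTOCOL.md and also writes the STATE.md entries this sketch omits.

```shell
# Sketch of the three-strike ladder for one failing criterion.
# Assumption: check, retry and apply_fix_spec are caller-supplied commands.
three_strike() {
  local check="$1" retry="$2" apply_fix_spec="$3"
  if sh -c "$check"; then echo "PASS"; return 0; fi
  # Strike 1: probe, then auto-retry.
  echo "FAILURE_PROBE"
  sh -c "$retry" || true
  if sh -c "$check"; then echo "PASS after retry"; return 0; fi
  # Strike 2: escalate and execute a focused fix spec inline.
  echo "FAILURE_ESCALATE"
  sh -c "$apply_fix_spec" || true
  if sh -c "$check"; then echo "PASS after fix spec"; return 0; fi
  # Strike 3: stop and hand control back to the user.
  echo "FAILURE_HANDOFF"
  return 1
}
```

The criterion is re-checked after every recovery step, so a flaky first run clears at strike 1 and only a persistent failure reaches the handoff.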
If the user sends any message during the /goal run, the agent pauses at the next phase boundary, addresses the message, and asks before resuming. Phase boundaries are after SUPERGOAL_PHASE_DONE and before reading the next phase spec.
Memory is load-bearing. Future runs start smarter because past runs wrote down what they learned. The phase execution loop's step 6 references these rules.
At each phase boundary, ask: "Did this phase surface anything a future Supergoal run on a similar task would benefit from knowing?"
Worth saving:
- Non-obvious findings, such as a path convention ("lib/auth/ not app/api/auth/")

Write the memory file under the detected MEM_DIR using the standard name / description / metadata.type frontmatter. Link it from MEMORY.md. Print MEMORY_SAVED: <name> to the transcript. If nothing non-obvious this phase: print MEMORY_SAVED: none.
At the final phase, always write a project_<slug>.md memory pointing at the new/changed project (location, stack, status, ROADMAP link). Guarantees future Supergoal runs on the same project start from the latest state.
Never save: secrets, transient task details, ephemeral state. Bar is "useful to a future run." When in doubt, skip.
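A sketch of the writeback step, under loud assumptions: the frontmatter is written as YAML with `name`, `description`, and a nested `metadata.type`, and the index entry is a markdown link line. Neither layout is confirmed by the text, `save_memory` is a hypothetical name, and the sketch also reports `none` when no memory directory was detected.

```shell
# Sketch: write one memory file and link it from the index.
# Assumptions: YAML frontmatter carrying name, description and metadata.type,
# plus a "- [name](file): description" index line. Both layouts are guesses.
save_memory() {
  local mem_dir="$1" name="$2" type="$3" description="$4" body="$5"
  if [ -z "$mem_dir" ] || [ ! -d "$mem_dir" ]; then
    echo "MEMORY_SAVED: none"
    return 0
  fi
  {
    printf '%s\n' '---'
    printf 'name: %s\n' "$name"
    printf 'description: %s\n' "$description"
    printf 'metadata:\n  type: %s\n' "$type"
    printf '%s\n\n' '---'
    printf '%s\n' "$body"
  } > "$mem_dir/$name.md"
  printf '%s\n' "- [$name]($name.md): $description" >> "$mem_dir/MEMORY.md"
  echo "MEMORY_SAVED: $name"
}
```

A final-phase call might look like `save_memory "$MEM_DIR" project_demo project "Demo app state" "Location, stack, status, ROADMAP link"`, with the slug and text chosen by the run.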
- /goal, short condition. /goal takes an end-state, not a task body. Long content lives in files the agent reads from disk. This is the natural shape on both Claude Code and Codex.
- To change a phase mid-run, edit its .supergoal/phases/phase-N.md spec, run validate-phase.sh on it, then ask the user to resume (they can re-dispatch the same /goal or just say "continue"). No need to restart phase 1.

- references/planning-depth.md: what makes a plan deep enough to deserve "Super"
- references/phase-design.md: how to slice phases that auto-chain cleanly
- references/goal-format.md: what /goal is on Claude Code + Codex, Supergoal's single-/goal shape, required transcript blocks
- scripts/detect-stack.sh: identifies language, package manager, framework, build/test/lint commands (brownfield)
- scripts/detect-env.sh: greenfield environment recon
- scripts/summarize-repo.sh: compressed repo map (brownfield)
- scripts/validate-phase.sh: checks a phase spec has the required SUPERGOAL_PHASE_START marker and a non-empty acceptance criteria section
- templates/ROADMAP.md: phase plan with dependencies
- templates/STATE.md: live progress file
- templates/phase-goal.txt: phase spec skeleton (work, criteria, evidence, mandatory commands)
- templates/PROTOCOL.md: phase execution loop, failure recovery, memory writeback (copied to .supergoal/PROTOCOL.md at dispatch)