From Ultragoal
Turns unstructured brain dumps into verifiable goals with rubrics, works autonomously until verified, and supports follow-up rounds on previous goals.
Trigger: slash command /ultragoal:goal
Skill-listing summary (what Claude sees when deciding whether to auto-load this skill): Turn the user's brief into an armed, self-correcting goal loop, then start working toward it.
Bundled files: experiment-guide.md, goal-template.md, qa-capability-map.md, rubric-guide.md, rubrics/INDEX.md, rubrics/accessibility.md, rubrics/api-endpoint.md, rubrics/app-store-readiness.md, rubrics/bug-fix.md, rubrics/build-ci-speedup.md, rubrics/cli-tool.md, rubrics/dependency-upgrade.md, rubrics/documentation.md, rubrics/frontend-design.md, rubrics/nextjs-react-feature.md, rubrics/realtime-stability.md, rubrics/refactor-migration.md, rubrics/rn-feature.md, rubrics/security-pass.md, rubrics/test-suite-health.md
The input may be an unedited speech transcript: expect filler words, self-corrections, topic jumps, and missing structure. Extract intent; never quote the mess back at the user.
User's brief:
$ARGUMENTS
If .ultragoal/ does not exist in the project root, run the ultragoal:setup skill first (it scaffolds the directories, asks about the preference knobs, and offers the CLAUDE.md block), then continue here.
Before asking the user anything:
- Read .ultragoal/memory/MEMORY.md and any topic files relevant to the brief. Trust [VERIFIED] facts; treat [UNVERIFIED] ones as hypotheses.
- Check .ultragoal/goals/archive/ for related past goals, especially their Decision journals and failure notes.
Off-ramp: not everything needs the loop. If this consult shows a contained, reversible change that you could finish and verify with a couple of commands (no real forks, no long unattended run), say so in one line and do the work directly in this conversation instead of arming a goal. Keep the same evidence discipline, but skip the spec, the interview, and the ceremony. The machinery exists for work a plain session can't hold; spending more tokens on ceremony than on the work is a failure mode. Arm anyway only if the user says they want the loop (for the audit trail, or to walk away mid-run).
Goals are per-session: each lives in .ultragoal/goals/active/<slug>/goal.md with a session: field, and the gate enforces only the goal armed by the current session. So a goal active in another session does not block you — arm a new one freely; concurrent goals across sessions are the intended model. Only if this same session already has an active goal do you ask the user to pick: keep the current one (drop this brief), or replace it (pause/abandon per the ultragoal:stop protocol, then arm this).
One caution: sessions share the working tree. If another session's active goal touches files this goal will also touch, warn the user before arming and suggest running one of the goals in its own checkout instead (claude --worktree, or npx ultragoal run --worktree). Treat any goal with kind: experiment as touching everything, since it commits and resets constantly. The gate keeps the loops from interfering with each other; only a worktree keeps their file changes from colliding.
Ask high-leverage questions only — the forks where different answers produce materially different work. A question earns its place when all three hold: (1) the answer changes what you build, not just cosmetics; (2) you genuinely can't pick it confidently from the brief, repo, and memory; (3) guessing wrong is expensive (rework, wasted budget, wrong direction). If a question fails any of these, don't ask it — answer it yourself from the codebase, or take the obvious default and note it.
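The three-part test can be sketched as a filter. This is a minimal illustration, not part of Ultragoal: it assumes each candidate question has already been scored with three booleans and carries a fallback default (all field names here are hypothetical). Questions that fail any condition become noted assumptions instead of questions.

```javascript
// Hypothetical candidate shape:
// { text, changesWhatYouBuild, cantPickConfidently, wrongGuessIsExpensive, fallback }
function triageQuestions(candidates) {
  const ask = [];
  const assumed = [];
  for (const q of candidates) {
    // A question earns its place only when all three conditions hold.
    const earnsPlace =
      q.changesWhatYouBuild && q.cantPickConfidently && q.wrongGuessIsExpensive;
    if (earnsPlace) {
      ask.push(q);
    } else {
      // Otherwise take the obvious default and record it as an assumption.
      assumed.push({ text: q.text, assumption: q.fallback });
    }
  }
  return { ask, assumed };
}
```

In a non-interactive run, the questions left in ask would also be settled by the most defensible call and recorded in the spec's Context the same way.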
The leverage usually lives in these forks (pick the 2–5 that actually matter for this brief):
Size the interview to what a wrong guess costs. Check interview-depth in .ultragoal/config.md (the user can override per-goal by saying "quick" or "deep/thorough interview"):
Whatever the depth, the same rules keep even a 25-question interview painless: every question concrete and decision-shaped — real options (not "what do you think?"), your recommended default first with a one-line why, so ratifying is one tap and overriding is deliberate. At most 4 questions per AskUserQuestion batch; multiSelect where choices aren't exclusive. When going deep, open with one line that sets expectations ("Big goal — about 5 short rounds, every question has a recommended default; accepting all defaults is a fine answer."). Skip any round the brief already settles; stop when a round stops changing the plan.
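The batching rules can be sketched as a small transform. This assumes a simplified question object loosely modeled on AskUserQuestion; the recommended and exclusive fields are hypothetical, and the output is not the tool's exact input schema. It puts the recommended option first, enables multiSelect only for non-exclusive choices, and caps each batch at four questions.

```javascript
// Hypothetical input shape: { question, options: [{ label, recommended }], exclusive }
const MAX_PER_BATCH = 4; // at most 4 questions per AskUserQuestion batch

function toBatches(questions) {
  const prepared = questions.map((q) => ({
    question: q.question,
    // Recommended default first (Array.prototype.sort is stable, so the rest keep order).
    options: [...q.options].sort(
      (a, b) => Number(Boolean(b.recommended)) - Number(Boolean(a.recommended))
    ),
    // multiSelect only where the choices are not mutually exclusive.
    multiSelect: q.exclusive === false,
  }));
  const batches = [];
  for (let i = 0; i < prepared.length; i += MAX_PER_BATCH) {
    batches.push(prepared.slice(i, i + MAX_PER_BATCH));
  }
  return batches;
}
```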
Never ask what the codebase can answer — go look. If running non-interactively (no user), don't ask: make the most defensible call on each fork and record every such decision explicitly in the spec's Context as an assumption.
First decide the goal's kind:
- task (default): success is "this exists and works" (features, fixes, migrations, investigations).
- experiment: success is "this number improved" (latency, build time, size, cost, score), and one command can measure it. Read experiment-guide.md and compile the spec as a measure-and-ratchet loop instead of a checklist. If the brief asks to optimize something but no reliable measure command exists, the spec's first rubric item is building one.
Check the rubric library first: read rubrics/INDEX.md, and if the brief matches a domain, load that template as your starting point; it carries research-backed thresholds and check commands. Adapt it to this repo (real commands, applicable items only) rather than transplanting it blindly. Also scan the skills available in this session against the template's "Skills to pair" line and the task domain. If a matching skill exists (e.g. frontend-design for UI work, vercel-react-best-practices for React), plan to use it during execution and note it in the spec's Context.
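For kind: experiment goals, a measure-and-ratchet loop keeps a change only when the measured number improves. Below is a minimal sketch of the keep/revert decision and a results.tsv row, assuming a lower-is-better metric by default. The function names, the column layout, and the minGain noise threshold are hypothetical; experiment-guide.md defines the real protocol.

```javascript
// Decide whether an experiment step is kept (commit) or reverted (reset).
// direction: "lower" (latency, build time, size, cost) or "higher" (score).
function ratchetStep(best, measured, { direction = "lower", minGain = 0 } = {}) {
  if (direction !== "lower" && direction !== "higher") {
    throw new Error(`unknown direction: ${direction}`);
  }
  const gain = direction === "lower" ? best - measured : measured - best;
  const keep = gain > minGain; // must beat the current best by more than the noise floor
  return {
    keep,
    best: keep ? measured : best, // the ratchet only ever moves forward
    action: keep ? "commit" : "reset",
  };
}

// Hypothetical results.tsv row: step, measured value, best so far, action taken.
function tsvRow(step, measured, result) {
  return [step, measured, result.best, result.action].join("\t");
}
```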
Copy the structure from goal-template.md and write the rubric following rubric-guide.md — read it; rubric quality decides whether this loop converges. Also read qa-capability-map.md before finalizing the rubric. For every claim, choose the proof rung that matches the promise: static checks for wiring, tests for pure logic, browser/simulator screenshots or rendered-size assertions for UI pixels, click/deep-link checks for reachability, live/staging smokes for external seams, and paired failure-mode checks for new boundaries. A well-designed rubric is doing more work than the model.
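One way to keep the rung choice honest is a lookup that refuses to guess. A minimal sketch: the claim-kind keys are hypothetical labels paraphrasing the rungs listed above, and qa-capability-map.md remains the real reference.

```javascript
// Claim kind -> the proof rung that matches the promise (paraphrased from the text).
const PROOF_RUNGS = {
  wiring: "static check",
  "pure-logic": "tests",
  "ui-pixels": "browser/simulator screenshot or rendered-size assertion",
  reachability: "click or deep-link check",
  "external-seam": "live/staging smoke",
  "new-boundary": "paired failure-mode check",
};

function proofRungFor(claimKind) {
  if (!Object.prototype.hasOwnProperty.call(PROOF_RUNGS, claimKind)) {
    // Unknown claim kinds get classified by hand, never silently defaulted.
    throw new Error(`no proof rung for claim kind: ${claimKind}`);
  }
  return PROOF_RUNGS[claimKind];
}
```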
Before showing the user, adversarially review your own rubric against the anti-pattern list in the guide (vague judgments, unmeasurable criteria, missing stop conditions, no incremental order, checks the repo can't actually run) — plus the defect taxonomy that judge research keeps finding: compound items bundling two claims (split them), redundant items double-counting one property (merge them), coverage gaps against the brief's own stated criteria (every acceptance criterion the user voiced must map to an item), missing must-NOT items for things that shouldn't happen, checks whose output is noisy or ambiguous (wrap them to emit one decisive line), and behavioral or visual claims verified only by a static proxy (a "renders"/"visible" item checked with grep/typecheck/lint/build proves wired, never renders — give it a real runtime observation or split wired-vs-renders). Three more from the same family, since structure-not-behavior is the loop's most common false positive: a claim that crosses a process/network boundary backed only by mocked tests (it needs a live smoke that actually succeeds against the real/staging dependency — mock what you're sure of, live-test the seam you're least sure of); a self-disclosed runtime gap ("won't run until the secret is set", "mocked for now") parked as a footnote instead of a blocking [ ] item; and a rubric that never checks how the feature's failure cascades into the app (on 4xx/5xx: clear error, graceful degrade, no global logout). Sequence at least one live, end-to-end slice early in the rubric rather than deferring all real exercise to a final manual round. Fix what you find.
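Two of these defects are mechanical enough to flag with a crude lint. A sketch under stated assumptions: rubric items are hypothetical { claim, check } objects, the word lists are illustrative rather than complete, and the "and" test for compound items will produce false positives, so treat its output as prompts for review, not verdicts.

```javascript
// Words suggesting a runtime/visual claim, and commands that only prove wiring.
const RUNTIME_WORDS = /\b(renders?|visible|displays?|shows?)\b/i;
const STATIC_ONLY = /^\s*(grep|rg|tsc|eslint|npm run (lint|typecheck|build)|pnpm (lint|typecheck|build))\b/;

function findDefects(items) {
  const defects = [];
  items.forEach((item, index) => {
    // A "renders"/"visible" claim proved only by a static proxy proves wired, never renders.
    if (RUNTIME_WORDS.test(item.claim) && STATIC_ONLY.test(item.check)) {
      defects.push({ index, kind: "static-proxy" });
    }
    // Crude compound-item heuristic: two claims joined by "and" usually need splitting.
    if (/\band\b/i.test(item.claim)) {
      defects.push({ index, kind: "compound" });
    }
  });
  return defects;
}
```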
Set the rigor: scale the loop to the model, and let the user choose per run. The rigor setting has a default in .ultragoal/config.md (vanilla if unset), but it's a per-goal choice, so OFFER it as a dial at arm time (in the same batch as the depth dial below): the config value is the recommended default (one tap to accept), and the user can dial up to max or down to vanilla for this run without editing config. Skip the question only when the brief already settles it ("max mode" / "vanilla" / "standard") or the goal took the off-ramp. Rigor sets how much scaffolding the harness adds to compensate for model strength; it picks the loop's baseline, which the stakes × ambiguity × length sizing below then modulates:
- vanilla: verify: on (or off if the verification knob is off). This is today's behavior; add nothing.
- standard: verify: on.
- max: verify: panel, and at the end dispatch three ultragoal:verifier subagents in one message (lenses: checks / refute / constraints); all three must PASS. Verify every claim as you check it (every-claim cadence). Use multi-modal scout sweeps plus a completeness critic for read-heavy work. Favor the deep interview and offer rubric variants. The monitor is active.
Whatever rigor selects, still default to less within the tier: a contained vanilla brief doesn't earn scouts, and a small max goal still doesn't need five scattered agents (2–4 focused agents beat five; per MacNet, 2–3 judges capture the gain, so the panel is exactly 3, never more). Name the kit you chose and what you skipped in the spec's Context. A quietly smaller plan and silent gold-plating are both failures; the user's lever is informed consent.
Let the user own the dials. After drafting, pull out the 2–4 thresholds that define the contract — the latency bar, the coverage floor, how strict the constraints are, how deep to go — and put them to the user as one AskUserQuestion batch, recommended value first with the research behind it. A number the user chose is a number they'll trust at verification time; a number buried in a recap is one they'll dispute later. Skip this for thresholds the interview already settled.
The depth dial. Ask the budget as a depth question, never as a raw number — users steer effort in human terms, the same way the API's effort parameter replaced budget_tokens with named levels. Offer named tiers with your recommendation first and the cap as the parenthetical detail:
The chosen tier's cap goes in the spec's budget: as a plain integer. A "turn" is a gate-checked stop — the loop's own heartbeat, counted deterministically by the gate with zero machinery; it is a checkpoint trigger (when the gate demands an honest status report), not a spend meter. The user picks depth; the gate counts turns. An undersized budget pauses good work mid-flight — when in doubt between tiers, recommend the deeper one and say why. Skip the question entirely when the brief or interview already settles depth ("quick fix", "take the night").
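The turn accounting is deliberately simple. A minimal sketch of one gate-checked stop, assuming .turns holds a single integer (it starts at 0 when the goal is armed) and that reaching the cap means stopping for a status report; the function and field names are hypothetical.

```javascript
// One gate-checked stop: the gate, not the model, advances the counter.
function gateCheck(turns, budget) {
  const next = turns + 1;
  return {
    turns: next, // value written back to .turns
    // At the cap, stop for an honest status report instead of continuing.
    pauseForReport: next >= budget,
  };
}
```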
The rigor dial. Ask rigor the same way, in the same AskUserQuestion batch as depth — they're the two effort dials, and a user steering one usually wants to steer the other. Offer the modes in human terms, the config default recommended first (one tap to keep):
The pick sets the spec's verify: (vanilla / standard → on, max → panel) and selects the loop kit described above. Skip only when the brief already said "max mode" / "vanilla" / "standard". Like depth, when in doubt between modes recommend the higher one and say what it buys.
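The mapping from the rigor pick to the spec's verify: field, sketched as a pure function. Assumptions: verificationKnobOn stands in for the verification setting in .ultragoal/config.md (its real key name isn't given here), and the knob being off takes precedence, following the order stated for the armed frontmatter.

```javascript
// Returns the verify: value to write into goal.md's frontmatter.
function verifyFor(rigor, verificationKnobOn = true) {
  if (!["vanilla", "standard", "max"].includes(rigor)) {
    throw new Error(`unknown rigor: ${rigor}`);
  }
  if (!verificationKnobOn) return "off"; // gate accepts a fully checked rubric without a verifier
  if (rigor === "max") return "panel"; // three-lens verifier panel sign-off
  return "on"; // vanilla and standard: single verifier pass
}
```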
For goals that earned a deep interview, draft the rubric at two or three contract levels — lean (core checks only, ship fast), standard (recommended), strict (production-grade: the domain template's full security/a11y/perf items) — and present them as previews in a single question so the user picks the bar. Drafting the variants costs minutes; it turns the user from spec-reader into contract-author, and the unchosen items go in the spec's Context as a noted non-goal.
First, write the finished spec as a draft: create .ultragoal/goals/active/<slug>/ and write the spec to goal.md inside it with status: draft in the frontmatter (if that directory already exists for a different session, add a short suffix to the slug). A draft is inert — the gate ignores it. This ordering is enforced, not advisory: a guard hook blocks the arm question if no draft exists, because the recap must be read back from a real artifact, not improvised.
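The slug-collision rule can be sketched as follows. The existing map (slug to owning session) is a hypothetical stand-in for reading .ultragoal/goals/active/, and the "-2", "-3" scheme is one possible choice; the text only asks for "a short suffix".

```javascript
// existing: Map of slug -> session id already present under .ultragoal/goals/active/
function draftSlug(baseSlug, sessionId, existing) {
  // Free slot, or a directory this same session already owns: use the slug as-is.
  if (!existing.has(baseSlug) || existing.get(baseSlug) === sessionId) {
    return baseSlug;
  }
  // Taken by a different session: append the first unused numeric suffix.
  for (let n = 2; ; n++) {
    const candidate = `${baseSlug}-${n}`;
    if (!existing.has(candidate)) return candidate;
  }
}
```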
Immediately run the rubric audit against the draft:
node <this-skill-dir>/scripts/rubric-audit.mjs .ultragoal/goals/active/<slug>/goal.md
Resolve <this-skill-dir> to the directory containing this SKILL.md; in the ultragoal source repo the equivalent shorthand is node scripts/rubric-audit.mjs <goal>. If the script is unavailable, do the same audit manually from the guide: missing check commands, placeholders, subjective items, behavioral claims proved by static checks, missing reachability checks, mock-only external seams, missing failure-cascade items, missing stop conditions, missing constraints where scope matters, and missing verifier item. A BLOCKER is a draft defect: revise the draft and run the audit again before recapping. A WARN must either be fixed or recorded in the draft's Context as an intentional tradeoff; do not hide warnings from yourself and then ask the user to arm a weak contract.
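The rule for leaving the audit step can be written as a predicate. A sketch, assuming the findings of the latest audit run have been collected into hypothetical { level, message, resolved } objects; this is not rubric-audit.mjs's actual output format.

```javascript
// resolved: "fixed" or "context" (recorded in the draft's Context as a tradeoff)
function readyToRecap(findings) {
  // Any BLOCKER in the latest run means: revise the draft and audit again.
  const blockers = findings.filter((f) => f.level === "BLOCKER");
  // Every WARN must be fixed or recorded as an intentional tradeoff.
  const openWarns = findings.filter(
    (f) => f.level === "WARN" && f.resolved !== "fixed" && f.resolved !== "context"
  );
  return {
    ready: blockers.length === 0 && openWarns.length === 0,
    blockers,
    openWarns,
  };
}
```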
Then give the user a tight, skimmable recap built from that draft, so they can course-correct while it's still cheap. Five short parts, in plain language:
For any goal that adds or changes a screen, destination, or navigation, lead the recap with a "What you'll see" block — describe the built thing from the user's chair, never what the code does. Prose like "a Messages tab" hides exactly the disagreement that a picture surfaces: the user pictures a bottom-bar destination, the agent (reading a hidden route in the config) pictures a header icon, and nothing in a sentence catches it. Make the end state visible instead:
This replaces the prose of part 3 for UI work — a nav map is denser and clearer than three sentences about navigation, so it's not extra ceremony — and it's skipped entirely for backend/refactor goals, which instead confirm their own end-state artifact (an endpoint signature, the resulting file tree, a sample input→output). When the placement or layout has real alternatives, use AskUserQuestion's preview to show them as side-by-side ASCII so the user picks what it will look like, rather than approving a noun. Carry this same block into the finish summary, and — when the spec will be handed to another session — into the handoff prompt, so the locked picture travels with it instead of being re-derived from config. (The rubric must then check the placement, not just that the screen renders — see rubric-guide's reachability pattern.)
Then ask to arm with a standalone AskUserQuestion — exactly one question, header exactly Arm goal, options "Yes, arm it" / "Edits first" — in the same message as the recap, recap first. Never bundle the arm question into an interview batch (the guard hook blocks that too), and never ask it before the draft exists. This holds for every goal, including follow-up rounds in a session that has already run goals — earlier rounds never waive the recap, because each round's decisions and rubric are new. Keep the recap scannable — it's a confirmation, not the full spec dump; the draft file holds the detail.
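The arm-question rules are strict enough to check mechanically. A sketch of such a check; the payload shape is an assumption loosely modeled on AskUserQuestion's input (a questions array whose entries carry a header and labeled options), not its authoritative schema, and draftExists stands in for whatever the guard hook actually inspects.

```javascript
// True only for a standalone, correctly shaped arm question asked after the draft exists.
function canAskToArm(payload, draftExists) {
  if (!draftExists) return false; // the recap must be read back from a real draft
  const qs = payload && payload.questions;
  if (!Array.isArray(qs) || qs.length !== 1) return false; // never bundled: exactly one question
  const q = qs[0];
  if (q.header !== "Arm goal") return false;
  const labels = (q.options || []).map((o) => o.label);
  return labels.length === 2 && labels[0] === "Yes, arm it" && labels[1] === "Edits first";
}
```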
On yes:
- Flip status: draft to status: active. The frontmatter already carries session: ${CLAUDE_SESSION_ID} (the gate enforces only this session's goal) and verify:, set to off if the verification knob is off, panel if rigor is max (the 3-lens panel sign-off), and on otherwise. (Off means the gate accepts a fully checked rubric without a verifier pass.)
- Write 0 to .ultragoal/goals/active/<slug>/.turns. (Experiment goals keep their results.tsv in this same directory, beside goal.md.)
- … /ultragoal:stop, or the turn budget).
On "Edits first": revise the draft and re-recap. If the goal is dropped entirely, delete the draft directory; never leave orphan drafts.
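The arming step itself is a small state transition. A minimal sketch over a hypothetical parsed-frontmatter object; the session guard is an extra safety check added for illustration, not a documented rule.

```javascript
// goal: parsed frontmatter of goal.md, e.g. { status, session, verify, budget }
function armGoal(goal, sessionId) {
  if (goal.status !== "draft") {
    throw new Error(`can only arm a draft, got status: ${goal.status}`);
  }
  if (goal.session !== sessionId) {
    throw new Error("draft belongs to another session"); // illustrative guard
  }
  return {
    goal: { ...goal, status: "active" }, // the gate now enforces this goal for this session
    turns: 0, // initial contents of .ultragoal/goals/active/<slug>/.turns
  };
}
```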
Then begin working immediately. Do not end the turn with a plan.
- evidence: `command` -> key output line
Never write a `ULTRAGOAL-VERIFIED` line yourself; that verdict is the verifier's alone. The default verification cadence is **one pass at the final sign-off**: dispatch the `ultragoal:verifier` subagent before finishing, **passing it the exact path to this goal's `goal.md`** so it hashes and signs the right file. It audits the evidence ledger (a checked box without evidence is an automatic FAIL), re-runs every check itself, and appends the verdict. Dispatch it earlier only for an item that already failed verification, for a check that looks shaky, or when `.ultragoal/config.md` sets `verification-cadence: every-claim`, and then pipeline it: dispatch in the background and keep building while it checks. Scoped early checks record `ULTRAGOAL-INTERIM:` lines, which the gate ignores; only a full-rubric pass earns `ULTRAGOAL-VERIFIED`. **If the goal's `verify:` is `panel`** (rigor=max), the final sign-off is instead three verifiers dispatched in ONE message (lenses checks / refute / constraints, mutually blind), each appending its own `lens=`-tagged verdict; the gate releases only when the latest verdict for all three lenses is PASS on the current rubric.
A self-disclosed runtime gap ("won't run until the secret is set", "mocked for now") is a blocking [ ] item, not a footnote; never close the goal as done on a mocked-only external seam. Build a thin slice that makes one real call as early as you can instead of deferring all live exercise to a final manual round; the 5-minute "mint a real token, hit the real endpoint, read the real error" check is worth more than a hundred green mocks.
… <goal dir>/logs/<name>.log; tail or grep the log for the decisive lines when you need them. If the goal is genuinely blocked on a long background run (a suite, a deploy), don't burn budget on idle polling turns: set status: paused with a note, arm a completion watch, and flip back to active when it fires. Pausing on a real blocker is honest, not quitting.
… the ! prefix so the output arrives in the session.
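The release rule for the final sign-off can be sketched as follows. The verdict-line format in the regular expression is an assumption for illustration (the real format is whatever the ultragoal:verifier subagent appends), and rubricHash stands in for however the verifier fingerprints goal.md. verifyMode is "on" or "panel".

```javascript
// Hypothetical verdict lines, e.g. "ULTRAGOAL-VERIFIED: PASS lens=refute rubric=ab12".
// ULTRAGOAL-INTERIM: lines never match, so they are ignored, as the gate ignores them.
const VERDICT = /^ULTRAGOAL-VERIFIED:\s+(PASS|FAIL)(?:\s+lens=(\w+))?\s+rubric=(\S+)/;
const PANEL_LENSES = ["checks", "refute", "constraints"];

function gateReleases(lines, verifyMode, rubricHash) {
  const latest = new Map(); // lens (or "single") -> most recent verdict
  for (const line of lines) {
    const m = VERDICT.exec(line);
    if (!m) continue;
    const [, verdict, lens = "single", hash] = m;
    latest.set(lens, { verdict, hash }); // later lines overwrite earlier ones
  }
  // A verdict counts only if it is PASS and was issued against the current rubric.
  const passes = (key) => {
    const v = latest.get(key);
    return Boolean(v) && v.verdict === "PASS" && v.hash === rubricHash;
  };
  return verifyMode === "panel" ? PANEL_LENSES.every(passes) : passes("single");
}
```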
One precise ask beats three vague ones.
Set status: paused in the goal file, report honestly where every rubric item stands, and stop.
Install: npx claudepluginhub morphaxl/ultragoal --plugin ultragoal