ultragoal

Tell Claude what you want once. It works until the job is verifiably done — and it gets smarter every time.

When you prompt, you are the loop

In a long agent session you do the real work around the typing: you read each output, catch the false claims, remind the model what it forgot, and decide when the job is done. The model types; you are the quality control, the memory, and the off switch. That's fine for five minutes; it collapses at five hours. Not because the model isn't capable (Fable 5 one-shots most single tasks), but because you don't scale. You can't review two hundred actions an hour or stay awake overnight, and the moment you stop watching, the system has zero verification left. A model capable of running for days, driven this way, is capped at the speed of your attention.

Loop engineering is the fix. Instead of steering prompt by prompt, you design a small system around the model — a goal, a check, a memory, a stopping rule — and the system does the steering. You build the loop once; the loop does the prompting. The model is the easy part now; writing "done" in a form a command can check is the skill. (The full argument, with the research behind it: docs/loop-engineering.md.)
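The four pieces named above (goal, check, memory, stopping rule) fit in a few lines of control flow. This is a minimal Python sketch, not ultragoal's code: `Goal`, `run_loop`, the `worker` callable, and the `max_turns` budget are hypothetical names chosen for illustration. It shows the division of labor: the worker only does work, a check decides when to stop, and every outcome is written to a memory the next turn can read.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Goal:
    # Hypothetical goal record: what to achieve plus a machine-checkable "done".
    description: str
    check: Callable[[], bool]  # True only when the work verifiably holds
    memory: list[str] = field(default_factory=list)  # notes carried between turns


def run_loop(goal: Goal, worker: Callable[[Goal], None], max_turns: int = 50) -> bool:
    """Run worker turns until goal.check() passes or the turn budget runs out."""
    for turn in range(1, max_turns + 1):
        worker(goal)  # one unit of work; the worker may read goal.memory
        if goal.check():  # stopping rule: the check decides, not the worker
            goal.memory.append(f"turn {turn}: check passed")
            return True
        goal.memory.append(f"turn {turn}: check failed")  # remembered for next turn
    return False  # budget exhausted without verified completion


if __name__ == "__main__":
    state = {"value": 0}

    def toy_worker(goal: Goal) -> None:
        state["value"] += 1  # toy work: step toward a target value

    goal = Goal("reach 3", check=lambda: state["value"] >= 3)
    print(run_loop(goal, toy_worker))
    print(goal.memory)
```

The `max_turns` cap is an assumption added so the toy cannot spin forever; the text above does not say how, or whether, ultragoal bounds a run.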

ultragoal is that loop, packaged: a system, not you, does the verifying, remembering, and stopping. You ramble (a messy voice note is fine); it interviews you on the few forks that actually change the outcome, compiles a rubric where every line is checkable by a command, and arms a loop you can walk away from. The loop runs turn after turn, session after session, until an independent verifier confirms the work holds and the lessons are written down. It's the goal-loop architecture Anthropic's engineers describe using with Fable 5 (the same one Claude Code ships natively as /goal), with the pieces the published workflow still leaves an expert to wire by hand built into the harness: a checkable rubric, a fresh-eyes verifier, a memory discipline, and a goal that survives the session. That's how you actually leverage a model built to run for days: the structure holds the standard, so the model's full range isn't bounded by how long you can watch it. Goals on steroids.
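A rubric "where every line is checkable by a command" can be modeled as a list of claims, each paired with a command whose exit status is the verdict. The sketch below assumes that shape; `Criterion`, `run_rubric`, the 300-second timeout, and the toy commands are hypothetical and are not ultragoal's spec format. A zero exit code passes, and anything else, including a timeout or a missing binary, fails, so a claim like "tests pass" is settled by running a command rather than by the model agreeing with itself.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Criterion:
    # Hypothetical rubric line: a human-readable claim plus the command that proves it.
    claim: str
    command: list[str]


def run_rubric(rubric: list[Criterion], timeout_s: float = 300.0) -> dict[str, bool]:
    """Run every criterion's command; exit code 0 means the claim holds."""
    results: dict[str, bool] = {}
    for criterion in rubric:
        try:
            proc = subprocess.run(criterion.command, capture_output=True, timeout=timeout_s)
            results[criterion.claim] = proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # A check that cannot start or cannot finish counts as failing.
            results[criterion.claim] = False
    return results


if __name__ == "__main__":
    # Toy rubric that uses the current interpreter so the example runs anywhere.
    rubric = [
        Criterion("arithmetic holds", [sys.executable, "-c", "assert 2 + 2 == 4"]),
        Criterion("p95 under 200ms (toy value)",
                  [sys.executable, "-c", "latency_ms = 250; assert latency_ms < 200"]),
    ]
    results = run_rubric(rubric)
    print(results)
    print("done" if all(results.values()) else "not done")
```

In a real rubric the toy commands would be something like a test runner, or a load-test script that exits non-zero when p95 is over budget.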

Every mechanism in the loop is research-backed — verifier design, evidence ledgers, rubric architecture, memory provenance all trace to published results from Anthropic, DeepSeek, Alibaba, ByteDance, Tencent, and academic agent-systems work. The full mechanism→evidence map lives in docs/research-foundations.md, fed by dated research sweeps in docs/research/.

  BRIEF ──► GOAL ──► LOOP ──► VERIFY ──► DISTILL
   │          │        │         │           │
   ramble    spec    work     fresh-eyes   memory
   (voice)  +rubric  turns    subagent     grows
                         ▲                    │
                         └──── consult ◄──────┘    next session starts smarter
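The DISTILL and consult edge in the diagram is where memory can go wrong: if a guess is saved next to a verified lesson, a later session may cite it as fact. A minimal sketch of provenance tagging, assuming a JSON Lines file committed to the repo; `MemoryStore`, the `verified`/`unverified` tags, and the entry kinds are hypothetical, not ultragoal's actual memory format. `consult` returns verified and unverified entries as separate lists, so a caller cannot mix them by accident.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MemoryStore:
    # Hypothetical memory file: one JSON object per line, meant to be committed to git.
    path: Path

    def distill(self, kind: str, text: str, provenance: str) -> None:
        """Append a lesson; provenance is 'verified' only if a verifier confirmed it."""
        if provenance not in ("verified", "unverified"):
            raise ValueError(f"unknown provenance: {provenance}")
        entry = {"kind": kind, "text": text, "provenance": provenance}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def consult(self) -> tuple[list[dict], list[dict]]:
        """Return (facts, hypotheses); only verified entries count as facts."""
        if not self.path.exists():
            return [], []
        lines = self.path.read_text(encoding="utf-8").splitlines()
        entries = [json.loads(line) for line in lines if line.strip()]
        facts = [e for e in entries if e["provenance"] == "verified"]
        hypotheses = [e for e in entries if e["provenance"] == "unverified"]
        return facts, hypotheses


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = MemoryStore(Path(d) / "memory.jsonl")
        store.distill("fact", "integration tests need DB_URL set", "verified")
        store.distill("dead_end", "caching the ORM session did not help", "verified")
        store.distill("pattern", "retrying the flaky test fixes CI", "unverified")
        facts, hypotheses = store.consult()
        print(len(facts), len(hypotheses))
```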

Four parts keep each other honest:

A real definition of done. Every goal becomes a spec whose rubric is checkable by commands — "tests pass", "p95 under 200ms" — never vibes. In the research's words: rubric design is the skill now; a well-designed rubric does more work than the model.

Fresh eyes, not self-review. A separate verifier agent, with no knowledge of how the work was done, re-runs every check and tries to prove the work wrong. Anthropic's guidance is blunt: fresh-context verifiers outperform self-critique. The gate releases only on the verifier's sign-off, and the worker is instructed never to write that verdict itself. (As with any instruction given to Claude, this is a prompt-level boundary, not a sandbox: the rigor comes from the separation and the honest rubric, not from locking the worker out of a file.)

A loop that can't quit early. A gate blocks Claude from stopping while the goal is unfinished — and because the goal lives in a file, it survives /clear, restarts, and days away. Goals are per-session: run different goals in different sessions of the same repo at once, each gated independently. Same architecture as Claude Code's built-in /goal, with upgrades (see how the loop works).

Memory that compounds, for the whole team. Every goal ends by saving verified facts, working patterns, and dead ends into your repo. Fable-class models run the continual-learning progression (fail → investigate → verify → distill → consult) largely on their own once they have somewhere durable to write. ultragoal's store is shared through git, so every teammate's Claude feeds and consults one brain, and it is provenance-tagged, so the memory can't quietly start citing its own guesses as fact.
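The gate in "A loop that can't quit early" can be approximated with a Claude Code Stop hook: a command Claude Code runs when the agent tries to end its turn, which receives JSON (including `session_id`) on stdin and can print `{"decision": "block", "reason": ...}` to keep the agent working, with the reason fed back to the model. This sketch assumes ultragoal works roughly this way, which the text implies but does not spell out, and the file layout is invented: one goal file per session at `.ultragoal/goals/<session_id>.json` with a `verdict` field only the verifier is meant to write. Because the goal lives on disk, `/clear` or a restart does not reset it, and keying by session lets separate sessions in one repo run separate goals. Nothing here stops the worker from editing the file; as the text says, that separation is enforced by instruction.

```python
import json
import sys
from pathlib import Path
from typing import Optional

# Hypothetical layout: one goal file per session, written when the goal is armed.
GOALS_DIR = Path(".ultragoal") / "goals"


def gate_decision(hook_input: dict, goals_dir: Path = GOALS_DIR) -> Optional[dict]:
    """Return a block decision while this session's goal lacks a passing verdict."""
    session_id = hook_input.get("session_id", "")
    goal_file = goals_dir / f"{session_id}.json"
    if not session_id or not goal_file.exists():
        return None  # no armed goal for this session: allow the stop
    goal = json.loads(goal_file.read_text(encoding="utf-8"))
    if goal.get("verdict") == "pass":  # meant to be written by the verifier only
        return None
    return {
        "decision": "block",
        "reason": f"Goal not verified: {goal.get('description', '(no description)')}. "
                  "Keep working, then ask the verifier to re-run the rubric.",
    }


if __name__ == "__main__":
    # Claude Code pipes the hook input as JSON; skip reading when run by hand in a terminal.
    raw = "" if sys.stdin.isatty() else sys.stdin.read()
    decision = gate_decision(json.loads(raw) if raw.strip() else {})
    if decision is not None:
        print(json.dumps(decision))  # block output; printing nothing allows the stop
```

Claude Code also passes a `stop_hook_active` flag that hooks can use to avoid blocking forever. This sketch ignores it on purpose, since the gate is meant to hold until the verdict lands; the trade-off is that a broken rubric can stall the session.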

Similar Plugins

goalkeeper

goalbuddy

learning-agents

backpressured

learning-goal

supergoal