From okra
A workflow for turning goals, objectives, missions, measurable improvements, or decision-oriented research into a self-correcting OKR loop with measured anti-goal guardrails. Use when the user is setting OKRs, planning toward a metric, asking how to make progress without breaking a constraint (budget, quality, risk, trust), or wants delegated loop execution with human control. Trigger even for "help me set a goal", "what should my OKRs be", "I want to grow X", or open-ended research that needs DKR discovery to reduce uncertainty. Produces candidate objective metrics and anti-goals for ratification, then the structured loop: objective, anti-goal, decomposition, eval points, flags, and operating cadence.
How this skill is triggered — by the user, by Claude, or both
Slash command
/okra:reverse-tornado-okrThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill is a workflow. Given any goal, it sets the goal up so an LLM can drive most of the work
This skill is a workflow. Given any goal, it sets the goal up so an LLM can drive most of the work while a human keeps the direction. The core picture is a reverse tornado: wide guessing on day one, narrowing loop by loop into known work, bounded the whole way by a wall it cannot cross (the anti-goal), stopping when the metric hits target. Another useful picture: the objective is the maze exit, anti-goals are traps, and discovery maps enough of the maze that the loop can keep moving without blindly sprinting into danger.
Apply the steps below in order. Do not skip the anti-goal - it is what makes the rest safe.
On receiving a goal-like, mission-like, improvement, or research request, first draft the frame. Do not wait for the user to phrase it as an OKR. Propose a candidate objective with the metric that would prove success, a target or target range when the source facts support one, and a short reasoning note for why that metric is the right proof. If the request is research, the first candidate objective can be decision-quality: "reduce uncertainty enough to decide X", with a metric such as confidence, decision readiness, evidence coverage, or risk retired. Mark it candidate until the human ratifies it.
Pin two things before any planning:
Anti-goals can be reusable across runs. When the user has not named one, propose candidate anti-goals from the current skill and any available past run context: budget/spend, quality, trust, safety, privacy/data boundaries, authority drift, time/DKR budget, storage integrity, and eval regression, stale or unproven learning-memory reuse, and single-LLM-truth acceptance. Pick the ones that match the current goal and give each a metric. The user may add, remove, or alter anti-goals; treat that as healthy frame negotiation, then freeze only the human-ratified set.
Why the anti-goal matters: the cheapest way to hit almost any objective is to wreck something unmeasured. Sales rises fastest by blowing the budget. The anti-goal is the wall that stops the loop from narrowing toward that disaster. A goal without a named anti-goal is a goal you can hit in a way you will regret.
The frame always moves through two states: candidate frame -> human-ratified frame. If the frame is not fully knowable on day one (often true - you may not yet know what to track), that is expected. The first discovery surfaces candidate key results and anti-goals; the human ratifies them, then they freeze. Authority here is the human's call, not the loop's invention.
For delegated or automated loops, also ratify the action envelope: allowed move classes, forbidden actions, spend caps, data boundaries, blast-radius limits, irreversible-action gates, rollback expectations, and which moves need approval. A move can be metric-safe and still outside authority.
Do an anti-goal coverage review for real stakes: list the candidate harms considered, selected guardrails, rejected guardrails with rationale, non-negotiable tripwires, owners, and review cadence. One measured wall is required; documenting what it does not cover keeps it honest.
Decompose work into exactly three kinds. Keep them distinct; blurring them is where these systems rot.
The loop runs as an orchestrator directing disposable workers. This split is not cosmetic - it carries the authority lines. Each tier hands control up when it reaches the edge of its authority.
Orchestrator - the loop's brain. Holds the frame read-only (objective, anti-goal, thresholds, metric contracts, and action envelope - all human-set). It owns the OKR board, governs check-ins, decides the next move, runs the three-point anti-goal eval (especially admissibility, before dispatch), spawns and budgets workers, reads the direct objective/CKR/anti-goal metrics, raises the flags, and is the only part that talks to the human. It also owns the DKR learning gate: no CKR or PKR is promoted onto the working board until a DKR worker has returned a learning checkpoint with a decision target, evidence, probability/confidence updates, and risk/anti-goal implications. The checkpoint must make the next steering decision safer or clearer; "we learned things" is not enough. The orchestrator steers within the cone. It never executes work itself and never edits the frame. It also does not stop because a board, branch, PKR list, or worker queue is complete; it keeps checking, steering, and dispatching until the objective target is achieved, a human changes/stops the frame, or a blocking flag requires human resolution. Every serious artifact should state this loop-ownership rule explicitly.
Workers - the hands. Scoped, disposable, parallelizable. Two kinds, matching the executable units. There is no CKR worker; CKR remains orchestrator-owned context.
The cardinal worker rule: a worker that hits an unknown mid-run does not improvise - it hands back to the orchestrator, which decides whether to spawn a discovery worker. A worker's authority ends at the edge of its scope. It cannot screen its own moves against the anti-goal (that is the orchestrator's admissibility job), cannot decide to call the human (it reports; the orchestrator decides if it is a flag), and cannot change scope.
Workers must report progress in a durable place. For long runs, use an explicit run store such as
.okra/runs/<run-id>/workers/<worker-id>/progress.jsonl, written at each worker finish, when an
unknown is hit, and on a timed heartbeat. Ten minutes is a good default heartbeat for live subagent
work unless the human sets a different cadence.
The authority gradient, made concrete:
human owns the frame -> orchestrator works inside it and makes the loop's calls -> workers execute inside their scope and hand back at their edge.
Before running any move, set up storage so the loop is safe to interrupt and resume - the human can step in anytime, a worker can crash, a run can restart. Without it, re-running replays side effects: the discount applies twice, the expense double-counts against the wall.
The rule: every state-changing move gets a stable idempotency key; the store records whether it ran and what it produced; the orchestrator checks the store before dispatch and writes the outcome after. Re-running a known key returns the stored result instead of repeating the effect.
Persist five things: the frame (write-once, read-only - this is also what freezes the human-set frame so the loop cannot rewrite it), the tree, per-move results (write-once per key), an append-only ledger of direct metric and anti-goal readings (never overwritten, so guardrail history cannot be quietly rewritten greener), and raised flags.
When a run produces many files, artifacts, check-ins, or progress summaries, add the integrity rule:
append-only records are the source of truth; status/progress files are generated views. Store
important content by hash, append check-ins and flags as records, and verify the store before resume
or before reporting success. A stale or contradictory generated status is a signal, not evidence.
When multiple OKRA loops may run in one workspace, keep .okra/content/sha256 shared but put each
loop's mutable state under .okra/runs/<run-id>/; do not let concurrent loops share one ledger,
flag log, check-in log, worker directory, move-result directory, or generated status.
In scored or delegated harness work, avoid ungoverned direct reads and writes: important content
reads should be by content hash or recorded source check-in, and important writes should go through
the store helper or record target path plus content hash. Also avoid single LLM truth: an
agent's own final answer is not proof of progress, storage integrity, or governed read/write. Accept
claims only when backed by deterministic evidence, store records, hashes, changed-path checks,
human ratification, or independent review.
For delegated or scored run stores, use the exact frame/tree contract so verification can catch
drift. frame/frame.v1.json must include frame_version, frame_hash, objective, anti_goals,
metric_contracts, action_envelope, and human approval or ratification evidence.
tree/tree.v1.json must include tree_version, frame_version, orchestrator, dkrs, ckrs,
and pkrs. Do not replace orchestrator with a vague ownership note. The orchestrator entry
must say it owns objective checks and subagent steering; DKR and PKR entries are worker
scopes; CKR entries are measurable context. When the helper is available, write these through
write-frame and write-tree, then run verify before reporting success.
Append direct objective and anti-goal ledger readings through metric-read, not generic append.
Metric payloads must use type: "metric_read" (or objective_metric_read /
anti_goal_metric_read), identify metric_kind, metric_id, value, observed_at, source, and
freshness. For storage-governance anti-goals, record zero-valued metric reads for
ungoverned_direct_read, ungoverned_direct_write, and single_llm_truth.
A consequence worth knowing: the admissibility dry-run (propose-cost) worker has no side effect, so it is naturally idempotent and needs no key - which is why dry-run is the default for any move whose anti-goal cost cannot be known up front. Only the committing move is keyed and stored.
For the full storage schema, key construction, and resume sequence, read
references/storage-idempotency.md. For a lightweight file layout, generated-status rule, and bash
helper, read references/integrity-store.md.
If the run continues across turns or time, define the update ritual before dispatching work. The
orchestrator needs a metric freshness contract for every objective, CKR, and anti-goal metric:
source of truth, owner, exact definition, read method, observed_at, recorded_at, max_age, lag
window, and missing-data policy.
Each metric read must carry an explicit freshness classification in one compact row or sentence:
observed_at=<timestamp> -> status=fresh|stale against max_age=<limit>. If a copied or last-known
reading is outside the limit, write that date next to stale and the limit, for example
2026-06-15 ... stale against the 72-hour max_age.
Then set the clock: start-of-turn freshness check, pre-dispatch admissibility, post-move metric
read, end-of-turn status write, and an idle heartbeat when no worker finishes. Every round should
write current_round, open flags, last metric read, and next_check_at. Do not dispatch committing
work on stale metrics unless the human explicitly waives that stale state.
For delegated subagent work, make check-ins both event-based and time-based: worker completion,
unknown discovery, flag opening, and a default ten-minute heartbeat for long-running workers. Each
check-in recollects DKR learning, reads file-based worker progress reports, updates CKR/PKR
candidate status, and decides whether to continue, spawn discovery, pause, or escalate.
Each steering check-in must also show its value, not just that it happened: the inbound signal it
consumed, the decision/state/allocation change it made, and the expected or direct objective,
anti-goal, uncertainty, or waste-reduction effect. Track this with a steering-value metric such as
steering_value_score or valuable_steering_decision_count, plus a zero-valued anti-goal such as
no_value_checkin_count == 0. A check-in that only says "continue" without evidence of why that was
the right steering move is process theater, not loop control.
For the operating-loop fields, lag handling, and flag lifecycle, read
references/operating-loop.md.
A serious OKRA loop should improve under stress. Here, stress means time, turn, budget, or attention pressure while the loop must still avoid anti-goal violations and either reach the objective or produce clear evidence that it cannot. The orchestrator owns allocation under that stress: it spends DKR budget where uncertainty blocks safer steering, holds or promotes candidate CKRs/PKRs only after accepted learning checkpoints, enforces PKR progress signals, and vetoes moves whose anti-goal cost is too high. It may re-rank, fund, hold, or stop work inside the ratified frame; it does not change the frame.
Treat DKR as the loop's learning allocator, not as background research. Each DKR should reduce the
next allocation decision: which path to fund, which candidate CKR/PKR to promote, which move to
dry-run, which branch to pause, which anti-goal uncertainty to inspect, or when to raise cannot.
That is how the loop self-heals: flags, vetoes, flat metrics, worker unknowns, and budget exhaustion
become new steering evidence instead of silent failure.
At check-ins and at end-of-run, extract OKRA-specific learning records: traps hit or nearly hit, avoidances/vetoes that worked, misconceptions corrected, optimization candidates, reusable candidate anti-goals, source run references, evidence hashes or metric reads, confidence, context where the learning applies, and no-regression evidence. Run-local learning repairs the current loop; cross-run learning seeds the next loop's candidate frame, anti-goals, DKR probes, and action envelope. Previous-run memories are automatic inputs, not automatic authority: load them when available, but keep them candidate-only until the current run has evidence and the human ratifies any frame or guardrail change.
This is not generic memory. It is memory optimized for running OKRs in a particular context. Accept a learned anti-goal or optimization only when backed by deterministic evidence, append-only store records, hashes, changed-path or eval results, human ratification, or independent review. If reuse would make evals worse, rests on one LLM's narrative, lacks current-context fit, or tries to alter a human-owned frame without ratification, hold it as a candidate and open the right flag.
For the learning-memory record shape and no-regression gates, read
references/learning-memory.md.
The tree of work is scaffolding, not scoreboard. The only score that counts is the direct metric - the objective's number and each CKR's number - read fresh from the source.
A finished subtree with a flat objective metric is not success. If the metric's lag window is
still open, mark the branch waiting_for_measurement and schedule the next read. Once the lag window
has closed and fresh reads still show no movement, the flat metric is a signal the breakdown was
wrong. Never infer progress from completed tasks. Measure the world directly. The same applies to the
anti-goal: measure breakage where it manifests, never roll it up.
This is not waterfall (discover everything, then build everything). It is a zig-zag that narrows: learn a slice -> act on it -> that action surfaces the next unknown -> discover that -> act again. The swings shrink as guess turns into known work. That narrowing is progress.
Keep every bend clean: when an execution task hits an unknown mid-run, it hands back up - "this is not execution anymore, this is discovery" - and the loop decides whether to fund a fresh probe before resuming. Never let a task quietly muddle through a discovery it cannot see the end of. You are always either executing known work or running a scoped probe - never pretending one is the other.
The anti-goal is not a single end-of-loop check. It fires three times, each doing a different job. This is the heart of the skill - get the timing right.
The loop runs around 80% on its own. It calls the human on three outcome conditions, each a distinct failure:
Run all three outcome flags at once. Drop any one and a class of silent failure slips through.
For delegated loops, also raise Authority drift when the loop or a worker tries to change the frame, relax a threshold, expand scope, bypass approval, contact a human directly, or act outside the ratified action envelope. This is a governance breaker, not just an invalid move.
Flags have lifecycle. They are open, acknowledged, resolved, or waived. breaking pauses
committing moves by default; cannot and pointless stop the affected branch; authority drift
stops the proposed move and goes to the human. The orchestrator may resume only inside the recorded
resolution.
The human owns the frame: the objective and target, the CKR and anti-goal definitions and thresholds, the metric contracts, the action envelope, and the call that a goal is wrong. Goal-switching is human-only.
This is load-bearing. If goal switching sits inside the loop, the loop can satisfy anything by quietly retreating to a goal it is already hitting, and every guardrail becomes theater. Keeping goal switching with the human is what makes the anti-goal mean something. This is best-effort: the loop must try against the goal it was given; when effort goes in and the metric stays flat, it reports the gap and hands up the evidence (budget spent, tree built, contributions done, flat metric). The human decides.
In team terms: no matter how much runs on its own, eventually someone makes the call - and the call belongs to a person.
The width of the funnel - the ratio of discovery to execution in recent loops - tells position. Wide, still guessing -> early, far from goal. Narrow, mostly known work -> close. This is a progress signal that is not the direct metric: the metric says if you have arrived; the funnel width says how close on the way. It is honest only while "more known" and "closer to goal" stay coupled - which is what the pointless flag protects.
Match depth to the stakes. Not every goal needs the full machinery. For a light or personal goal, the load-bearing core is just: a metric+target objective, a measured anti-goal, and the no-cascade habit of reading the real metric instead of counting tasks done. Lead with that.
Bring in the heavier parts - orchestrator/worker split, idempotent storage, the formal three-point eval, the flags - when the goal is being run as an actual automated loop, has real side effects (spend, sends, deploys), or the user asks how to operationalize it. Offer them rather than front-load them on someone who just wants help shaping a goal. The reverse tornado is the same shape at every size; you do not always need to draw the whole funnel.
Deliver the structured loop: objective + target, the named anti-goal with its metric and type
(drift/tripwire), the CKR/DKR/PKR decomposition, the three eval points instantiated for this goal,
the flags, and the human-only frame. Use the user's real domain throughout - do not leave the example
abstract. Preserve exact metric literals from the source material in addition to any explanation; if
the source says 12 per 100, include that exact phrase instead of only a paraphrase.
When the user wants to run the goal over time, also deliver the Operating Loop: cadence, current
round, metric freshness contracts, lag windows, next_check_at, stale-data policy, flag lifecycle,
and what gets updated at every turn or timed heartbeat. Include the current metric freshness
classification, keeping observed_at, recorded_at, fresh or stale, and max_age in the same
table row or sentence.
When the goal will recur in the same project, also deliver an OKRA Learning Memory section: previous-run inputs scanned, traps, avoidances, misconceptions, optimization candidates, reusable candidate anti-goals with metrics, evidence or hashes, confidence, context fit, ratification status, and no-regression / no-single-LLM-truth evidence. State which memories are automatic candidates and which, if any, the human ratified for this run.
For delegated loops, make these four lines explicit in the artifact:
.okra/runs/<run-id>/workers/ and
use a timed heartbeat, defaulting to ten minutes when the human has not set a cadence. Include one
compact line that starts "Heartbeat cadence and next_check_at:" and contains both the cadence
and the next scheduled check.steering_value_score >= 0.75 or
valuable_steering_decision_count >= 1; do not leave steering value only in prose.For delegated loops, include a compact Eval Points section with these exact labels instantiated for the current goal:
For delegated loops with storage, also state the exact frame/tree schema in the artifact or records:
frame keys frame_version, frame_hash, objective, anti_goals, metric_contracts,
action_envelope, and human approval/ratification evidence; tree keys tree_version,
frame_version, orchestrator, dkrs, ckrs, and pkrs. The tree must use the key
orchestrator and include the phrases objective checks and subagent steering.
Avoid ambiguous frame-authority wording. Do not write that the loop, agent, model, or orchestrator may change, adjust, relax, redefine, retune, or switch the objective, target, threshold, anti-goal, guardrail, metric, or action envelope. Write that the loop raises evidence and the human decides. For boundary-drift gates, use rejection-shaped wording such as: "Reject any attempted frame, guardrail, metric, threshold, or action-envelope change unless the human ratifies it." Avoid permission-shaped wording even when describing a forbidden pattern.
Define all four flags explicitly. For pointless, use the exact shape: "Pointless opens when work
finished or a CKR metric moved, but the objective metric stays flat / does not move after the lag
window." Then add the domain example and the stop/re-aim behavior.
If the user wants a visual or shareable explainer, produce a self-contained HTML artifact. See
references/artifact-guide.md for how (and how to keep the artifact within its own anti-goal:
single file, no external runtime, no decoration that does not carry meaning).
pointless flag, or defining it without saying the objective metric stays flat or
does not move after the lag window.ownership but no orchestrator key, or omitting frame_version..okra/ledger.jsonl, .okra/checkins.jsonl,
or .okra/workers/ path instead of giving each loop its own run store.Creates bite-sized, testable implementation plans from specs or requirements, with file structure and task decomposition. Activates before coding multi-step tasks.
npx claudepluginhub lagz0ne/okra --plugin okra