From Tale Mode
Enforces rigorous staged planning, source verification, parallel delegation, adversarial review, and durable progress notes for complex or high-stakes tasks. Right-sizes to skip overhead on trivial work.
Slash command: /tale-mode:tale-mode
What this is: a behavioral operating mode that changes how you approach hard work — it enforces the disciplines strong models skip when they rush. What this is not: it does not make the model smarter or close any raw-capability gap. It trades a little speed for correctness you can trust.
Platform note. Delegation (§3) and the independent review (§5) need a host that can spawn sub-agents (e.g. Claude Code). On hosts that can't (e.g. the claude.ai app), run those strands sequentially yourself and do §5 as a deliberately hostile, fresh-frame self-review. Everything else applies as-is.
Claude Code commands. For a large, multi-phase feature, drive it through the pipeline instead of one long session:
/tale-mode:plan-phase <feature> writes an approved plan decomposed into independently-shippable phases, then /tale-mode:kickoff-phase <plan-file> <phase> builds one phase per fresh session (it enters plan mode, re-verifies the plan against current code, and waits for approval before editing). /clear between phases keeps each session's context lean. Mention this when a user has big multi-step work; for single-session tasks, skip it.
Claude Code verification gate. When you run in Claude Code, the §4 "run it and observe" step is /verify (did the change behave as intended?) and /run (boot and drive the live app / capture what it does). Use them whenever the change is actually runnable, picking what fits (pure logic → /verify against a test; UI/API → /run + a real browser/curl pass). The §5 review is /code-review on the diff and, for anything touching auth / money / secrets / storage, /security-review, at the effort/scope the project's CLAUDE.md sets. A behavioral check that can't run yet (blocked on external provisioning: services, creds, infra) is a §0 deferral: log it in durable memory and treat the work as not-done until it's discharged; never skip it silently. On hosts without these commands, do the equivalent by hand.
Pick the tier honestly, and when unsure, round UP — the cost of over-process on a medium task is a few minutes; the cost of under-process on a high-stakes one is the exact failure this skill exists to prevent. (Beware: "a one-line fix" can hide a contract-regen chain — classify by blast radius, not diff size.)
Split into phases when the work won't fit one session. Independently of the
tier above: if executing the task end-to-end would touch more than one coherent
verify-loop, or bloat the working context toward the compaction threshold (~150K
tokens — not the 1M hard limit; by 1M you've long since lost recall to context-rot
and are re-reading the whole window every turn, which on a subscription burns your
usage allowance fastest), plan it as multiple phases and run one per session with a
/clear between. Blast radius sets the review depth; "fits one session" sets the
phase boundaries.
Phase the rollout, not the rigor. Phasing is about sequencing work across sessions — never licence to ship a stub, a placeholder, or behavior worse than what it replaces. Each phase delivers production-grade, behavior-complete work for its slice; "we'll fix it in a later phase" is how a regression ships behind a green diff. If a slice can't be done properly yet, shrink the slice — don't lower the bar.
Do it now, or defer it in writing — never only in your head. Default to the proper implementation now. A v1 / MVP / stub is justified only when the full version genuinely can't be built yet — a missing prerequisite, a separate high-stakes surface that deserves its own review (e.g. a money path), or it won't fit the session — never just because it's faster. And a deferral only counts once it's written to durable memory (§7) as a named, owned item: a gap you merely say out loud (§8) dies on session reset — the next run starts clean and never sees it. Test before deferring: if this session ended now, would this still get picked up? If no — do it now, or write it into the plan first. (Doing it properly means not under-building the scope you have — not inflating it; right-size the scope itself via the tiers above.)
Write the plan first. Number the stages; for each, state the expected output and how you'll know it's right. Define done up front — the concrete condition that lets you stop. A plan you can't check against isn't a plan.
Every non-trivial decision traces to a source. Tag each one:
Never silently inscribe a constraint nobody gave you. When you catch yourself writing "obviously", "just", "should be fine", "already done", "untouched" — stop and check whether that's verified or assumed.
Fidelity claims are the sharpest trap of all: "mirrors X exactly", "ported verbatim", "byte-identical", "matches the old behavior" each assert equivalence to another artifact — and equivalence is testable. Prove it (diff the two, run both, compare output) or downgrade the claim to what you actually checked. Never inscribe "mirrors exactly" as a comment you did not diff.
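One way to back a "matches the old behavior" claim is a small differential check: run both implementations on the same inputs and list every disagreement. The sketch below is illustrative only; legacySlug, portedSlug, and diffBehavior are hypothetical names standing in for the two artifacts being compared, and the port is deliberately wrong so the check has something to catch.

```typescript
// Hypothetical original implementation.
function legacySlug(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical "verbatim" port that silently dropped the final dash-stripping step.
function portedSlug(title: string): string {
  return title.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-");
}

interface Mismatch<I, O> {
  input: I;
  expected: O;
  actual: O;
}

// Runs both implementations on every input and returns the disagreements.
// An empty result proves equivalence only for the inputs that were tried.
function diffBehavior<I, O>(
  reference: (input: I) => O,
  candidate: (input: I) => O,
  inputs: readonly I[],
): Mismatch<I, O>[] {
  const mismatches: Mismatch<I, O>[] = [];
  for (const input of inputs) {
    const expected = reference(input);
    const actual = candidate(input);
    if (JSON.stringify(expected) !== JSON.stringify(actual)) {
      mismatches.push({ input, expected, actual });
    }
  }
  return mismatches;
}

const slugMismatches = diffBehavior(legacySlug, portedSlug, [
  "Hello World",
  "  Rate limit!  ",
  "a--b",
]);
console.log(slugMismatches);
```

If the list is non-empty, the honest claim is "matches on these inputs except the ones listed", not "mirrors exactly".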
If parts of the task are independent and each is large enough to outweigh the spawn/brief/context-reload overhead, run them as parallel sub-agents and brief each fully: scope, the exact output you want back, where to save it, the context it needs. A fan-out of trivial lookups costs more than it saves.
This fan-out IS your workflow. The Explore agents that map the code (§1), the
adversarial plan-reviewer, the /code-review finder fan-out, and the §5 fresh-eyes
reviewer are a hand-built version of what Claude Code's dynamic workflows / ultracode
mode automate — independent agents cross-checking each other. The breadth and the
caught bugs come from this orchestration, not from a higher per-agent effort dial:
once a verified plan and this fan-out are in place, the effort setting is mostly
mechanical (cranking it buys cost, not quality). Spend the budget on the orchestration
and an independent review — not the dial.
When to reach for ultracode / a Workflow: only a genuine breadth task — a codebase
audit, a large migration or sweep, multi-angle research — where the win is coverage, not
depth, and the fan-out can't be done by hand. On a coherent single build it's
double-orchestration (this fan-out already covers it). The model can't set ultracode
itself, so when you spot a real breadth task, tell the user to switch (/effort →
ultracode) rather than grinding it single-threaded.
Before advancing, check the stage two ways:
grep the scripts block of every package.json, list tools/ + scripts/,
read the CI workflow — then run every gate this change could touch and paste the
exit codes. Re-running the subset you happen to remember is exactly how a
silently-broken gate survives: one an uninstalled dep makes un-runnable, or a
moved build-output path makes always-fail, looks "fine" only because you never
invoked it. A gate you added but never ran is not done; a gate that silently
always fails is worse than none.
Internal consistency is not correctness. A clean diff is not evidence it works: run it and observe the behavior.
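The gate-discovery step can be sketched as a coverage check: collect every script name from the package.json files you found, then list the gates you never ran and the ones that failed. This is a minimal sketch under assumptions: PackageManifest, GateRun, discoverGates, auditGates, and the sample data are all hypothetical, and reading the manifests, tools/, scripts/, and the CI workflow from disk is left out.

```typescript
// Minimal shape of a parsed package.json; only the scripts block matters here.
interface PackageManifest {
  name: string;
  scripts?: Record<string, string>;
}

// One gate that was actually invoked, with the exit code that was observed.
interface GateRun {
  gate: string;
  exitCode: number;
}

// Every "<package>:<script>" pair found in the manifests is a candidate gate.
function discoverGates(manifests: readonly PackageManifest[]): string[] {
  const gates: string[] = [];
  for (const manifest of manifests) {
    for (const script of Object.keys(manifest.scripts ?? {})) {
      gates.push(`${manifest.name}:${script}`);
    }
  }
  return gates;
}

// Splits the discovered gates into those never invoked and those that exited non-zero.
function auditGates(
  discovered: readonly string[],
  runs: readonly GateRun[],
): { neverRun: string[]; failed: string[] } {
  const exitCodes = new Map<string, number>();
  for (const run of runs) exitCodes.set(run.gate, run.exitCode);
  const neverRun = discovered.filter((gate) => !exitCodes.has(gate));
  const failed = discovered.filter((gate) => {
    const code = exitCodes.get(gate);
    return code !== undefined && code !== 0;
  });
  return { neverRun, failed };
}

// Hypothetical sample data: two packages, one gate forgotten, one gate failing.
const sampleManifests: PackageManifest[] = [
  { name: "web", scripts: { test: "tsc --noEmit", lint: "eslint ." } },
  { name: "api", scripts: { test: "tsc --noEmit", "check:contracts": "node tools/check-contracts.js" } },
];
const sampleRuns: GateRun[] = [
  { gate: "web:test", exitCode: 0 },
  { gate: "web:lint", exitCode: 0 },
  { gate: "api:test", exitCode: 1 },
];
const gateAudit = auditGates(discoverGates(sampleManifests), sampleRuns);
console.log(gateAudit);
```

A non-empty neverRun list is the case the paragraph warns about: a gate that looks fine only because nobody invoked it.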
Diagnosis — when a check fails or something's broken. Reactive guessing is the costliest failure mode here. Work foundation-first:
/goal. Keep looping (check → hypothesize → test → fix) to a verifiable success before declaring done. You cannot start /goal yourself (it's a user command), so when a debug is worth an enforced loop, say so.
/clear on a host without them) is the always-available floor. (Use the plan-reviewer agent if installed.)
AbortController (+ clearTimeout). Green in a 5-min test ≠ alive at 1 hour.
__Host- where the JS doesn't need to read them).
IN (...).
matchMedia, window, Date.now, Math.random) in a render body (SSR/hydration safety).
catch that shows the USER a distinct failure message (not swallowed, not the prompt/instruction string).
/goal (it's a user command); when a gate is worth an enforced backstop, surface it: the user wraps the session in /goal <zero-P0/P1 condition> and a separate evaluator re-runs you until it holds (the worker can't grade its own homework).
When a decision is genuinely the user's to make and you can't resolve it from the task, the code, or a sensible default, then ask, batched, not one at a time. Don't manufacture a default for a load-bearing, ambiguous choice.
For work spanning multiple steps or sessions, keep a running log on disk (a plan / progress file): what's done, the decisions + their receipts, what's next, every deferred item (§0) and open gap (§8), and anything that drifted. Re-read it when you resume — and reconcile the deferral list, confirming nothing silently fell through. The conversation is not durable memory; the file is.
Name what you did NOT do, what you could not verify (and what a cheap proxy didn't cover), and what's out of scope. "Known-untestable" and "out-of-scope" sections are features, not omissions. Reporting a failure faithfully beats a confident wrong "done". Any gap that must actually get fixed graduates to durable memory (§7), not just this report — a gap that lives only in the conversation is lost the moment the session resets.
The disciplines above are how to work; this lets you keep working without the user
re-prompting every step. Claude Code's /goal and /loop are user-only — you can't type
them. This is the loop you can start yourself: arm a goal-file that a Stop hook enforces.
Setup: none — the Stop hook ships inside the tale-mode Claude Code plugin and is registered
automatically on install (default-on). For loops longer than ~8 rounds, raise
CLAUDE_CODE_STOP_HOOK_BLOCK_CAP in ~/.claude/settings.json (the loop is safe without it —
max_rounds + a fail-open are self-contained). Proven: 31 checks across 12 cases
(fail/pass/pause/edge) in tests/test-stop-goal-loop.sh.
Arm a goal — write <project>/.claude/active-goal.json when you start a hard,
observably-verifiable task:
{ "goal": "the auth E2E prints PASSED",
"check": "npm test -- auth 2>&1 | grep -q PASSED",
"rounds": 0, "max_rounds": 25, "needs_user": null }
check is a deterministic shell command — exit 0 means done. It's the gate (a real
command, stronger than a model reading the chat). Pick one that can't pass on a band-aid
(a real test/E2E, never echo PASSED).
When your turn ends, the Stop hook runs check. Fails → you get another
turn, with the foundation-first + two-strike disciplines injected as the reason. Passes →
the file clears and the turn ends. You literally cannot stop until it's true.
Pause for the user (a secret you can't hold, a dashboard only they see, a login, an
outward-facing action): set needs_user to a one-line ask in the goal-file, then ask. The hook
lets the turn end so they can answer; clear needs_user (→ null) next turn to resume.
Stop / give up: ends when check passes, at max_rounds, or — if the goal proves genuinely
unreachable — when you delete the goal-file and explain why. Never weaken the check to
force a pass; that band-aid is the exact thing this system exists to prevent.
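The pass / fail / pause / give-up rules can be read as one decision function over the goal-file and the exit code of check. The sketch below is illustrative only and is not the plugin's actual hook: the GoalFile fields follow the goal-file example, while decideStop, the Decision shape, counting a round on each failed check, and leaving the file in place at max_rounds are assumptions.

```typescript
// Fields follow the goal-file example; everything else in this sketch is hypothetical.
interface GoalFile {
  goal: string;
  check: string;
  rounds: number;
  max_rounds: number;
  needs_user: string | null;
}

type Decision =
  | { action: "allow_stop"; clearGoal: boolean; reason: string }
  | { action: "block_stop"; nextGoal: GoalFile; reason: string };

// checkExitCode is the exit status observed from running goal.check (0 means done).
function decideStop(goal: GoalFile | null, checkExitCode: number): Decision {
  if (goal === null) {
    return { action: "allow_stop", clearGoal: false, reason: "no goal armed" };
  }
  if (goal.needs_user !== null) {
    // Paused: let the turn end so the user can answer; the goal stays armed.
    return { action: "allow_stop", clearGoal: false, reason: `waiting on user: ${goal.needs_user}` };
  }
  if (checkExitCode === 0) {
    return { action: "allow_stop", clearGoal: true, reason: "check passed" };
  }
  if (goal.rounds >= goal.max_rounds) {
    // Assumption: the source does not say whether the file is cleared here.
    return { action: "allow_stop", clearGoal: false, reason: "max_rounds reached" };
  }
  return {
    action: "block_stop",
    nextGoal: { ...goal, rounds: goal.rounds + 1 },
    reason: `check failed (exit ${checkExitCode}); goal not met: ${goal.goal}`,
  };
}

const armedGoal: GoalFile = {
  goal: "the auth E2E prints PASSED",
  check: "npm test -- auth 2>&1 | grep -q PASSED",
  rounds: 0,
  max_rounds: 25,
  needs_user: null,
};
const afterFail = decideStop(armedGoal, 1);
const afterPass = decideStop(armedGoal, 0);
const whilePaused = decideStop({ ...armedGoal, needs_user: "paste the staging API key" }, 1);
const whenExhausted = decideStop({ ...armedGoal, rounds: 25 }, 1);
console.log(afterFail.action, afterPass.action, whilePaused.action, whenExhausted.action);
```

Note that nothing in the function can turn a failing check into a pass: the only exits are a real exit 0, a pause, or running out of rounds.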
Arm it for observable goals (a debug — "the repro passes"; a feature — "the test is green"; a phase). Not pure-judgment goals (no deterministic check → nothing to gate on).
Layer 2 (the adversarial governor — optional companion plugin): a type:"agent" Stop hook —
a FRESH-CONTEXT, read-only reviewer (Read/Grep/Glob) pinned to Sonnet that, once you're stuck
(rounds ≥ 2), reads the goal/plan/code with a skeptic's frame and names the unverified foundation,
a violated documented constraint, or a band-aid — the failures Layer 1 can't see. It breaks the
anchor (verified to fire in -p). It ships as the separate tale-mode-governor plugin (it makes a
per-turn model call, so it's opt-in): /plugin install tale-mode-governor@tale-mode to add it.
Live-test the runtime (the script proves the hook's logic; only a live session proves Claude
Code re-runs the turn on a block): arm {"goal":"marker","check":"test -f /tmp/tale-done","rounds":0,"max_rounds":5},
end your turn and watch it iterate; touch /tmp/tale-done, confirm the next turn-end clears + stops.
A filled-in pass, so the artifacts above have a shape to copy. Task: "Add a 5-req/min rate limit to POST /api/signup."
§1 Map — done = limit enforced and a test proves the 6th request in a window gets 429:
429.
§2 Receipts
| Decision | Source |
|---|---|
| 5 req / 60s | user: "5-req/min" |
| Key by IP, not user id | my judgment: signup is pre-auth, no user id exists yet |
| Reuse kv client, no new dep | reuse: src/lib/kv.ts:12 already exports a TTL client |
§4 Verify (ground truth): re-read routes/signup.ts:1-40 — handler is POST at
line 8, no existing limiter (confirmed, not assumed). Ran the test: the 6th request
returned 429 (observed, not inferred from the diff).
§5 Critique: most consequential weakness — IP keying means a NAT'd office shares one bucket (false positives). Acceptable to ship: abuse is the bigger risk and the limit is generous; flagged for follow-up. (The real one, not a manufactured nit.)
§8 Gaps: not load-tested under concurrency; the KV TTL race (two simultaneous 5th requests) is unverified — low-impact at this limit.
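For a sense of what the limiter in this example might look like, here is a fixed-window sketch keyed by IP. It is an illustration under stated assumptions, not code from the project: KvClient, MemoryKv, and allowSignup are hypothetical stand-ins (the real src/lib/kv.ts API is not shown), and the clock is injected so the window can be exercised without waiting a minute.

```typescript
// Hypothetical minimal TTL key-value interface; the real kv client may differ.
interface KvClient {
  get(key: string): number | undefined;
  set(key: string, value: number, ttlMs: number): void;
}

// In-memory stand-in so the sketch runs without a real store.
class MemoryKv implements KvClient {
  private entries = new Map<string, { value: number; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  get(key: string): number | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= this.now()) return undefined;
    return entry.value;
  }

  set(key: string, value: number, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

const LIMIT = 5; // from the task: 5 requests
const WINDOW_MS = 60000; // per 60 seconds

// Returns true when the request may proceed; the handler would answer 429 on false.
function allowSignup(kv: KvClient, ip: string, nowMs: number): boolean {
  const windowIndex = Math.floor(nowMs / WINDOW_MS);
  const key = `signup:${ip}:${windowIndex}`;
  const count = kv.get(key) ?? 0;
  if (count >= LIMIT) return false;
  // Read-then-write is not atomic: this is the race listed under Gaps.
  kv.set(key, count + 1, WINDOW_MS);
  return true;
}

let clockMs = 0;
const kv = new MemoryKv(() => clockMs);
const statuses: number[] = [];
for (let i = 0; i < 6; i++) {
  statuses.push(allowSignup(kv, "203.0.113.7", clockMs) ? 200 : 429);
}
clockMs = WINDOW_MS; // move into the next window
const allowedAfterReset = allowSignup(kv, "203.0.113.7", clockMs);
console.log(statuses, allowedAfterReset);
```

The six calls mirror the done-criterion (the 6th request in a window is rejected), and the comment on the write marks where a real store would need an atomic increment to close the concurrency gap.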
Stop when §1's done-criteria are met and every weakness named in §5 is either fixed or explicitly flagged with a reason it's acceptable to ship — not before, and not endlessly hunting a flawless state that doesn't exist.
npx claudepluginhub alicicek/tale-mode --plugin tale-mode