Fable 5 behavioral patterns as a harness for Opus 4.8: grounded progress, self-verification loops, delegation triggers, file-based memory, autonomy calibration. Zero-config: installs a SessionStart hook; skills auto-trigger.
Extract a full task specification up front (goal, constraints, definition of done), then execute — the primary documented lever for Opus 4.8 long-horizon quality.
Write this session's lessons into the project's lessons/ directory using the memory-discipline format.
Spawn a fresh-context verifier subagent to check completed work against its specification before trusting it.
Use when deciding whether to spawn subagents — explicit rules for when delegation pays off and when direct work is faster. Opus 4.8 under-delegates by default; these rules correct that.
Use when recording lessons learned in a project, or when a project contains lessons/ or MEMORY.md — file-based memory format and read/write rules that measurably improve long-horizon performance.
Use when a task spans more than a few steps or files — establishes a checkable definition of done, a verification cadence, and fresh-context verifier subagents before claiming completion.
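The file-based memory described above (lessons written into a project's lessons/ directory and read back by a later session) can be sketched roughly as follows. This is a minimal illustration, not the plugin's implementation: the function names `write_lesson` and `read_lessons`, the file naming scheme, and the Markdown layout are all hypothetical, since the actual memory-discipline format is not shown in this listing.

```python
from datetime import date
from pathlib import Path
from typing import List


def write_lesson(project_root: Path, title: str, context: str, lesson: str) -> Path:
    """Write one lesson as a small Markdown file under <project_root>/lessons/."""
    lessons_dir = project_root / "lessons"
    lessons_dir.mkdir(parents=True, exist_ok=True)

    # Build a filesystem-safe slug from the title (letters and digits only).
    cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
    slug = "-".join(cleaned.split()) or "lesson"

    # Hypothetical naming and layout; the real memory-discipline format may differ.
    # A second lesson with the same title on the same day overwrites the first.
    path = lessons_dir / f"{date.today().isoformat()}-{slug}.md"
    path.write_text(
        f"# {title}\n\nContext: {context}\n\nLesson: {lesson}\n",
        encoding="utf-8",
    )
    return path


def read_lessons(project_root: Path) -> List[str]:
    """Return every stored lesson, oldest file name first; empty if none exist."""
    lessons_dir = project_root / "lessons"
    if not lessons_dir.is_dir():
        return []
    return [p.read_text(encoding="utf-8") for p in sorted(lessons_dir.glob("*.md"))]
```

A later session would call `read_lessons(Path.cwd())` before starting work, which is the sense in which the files act as the bridge between sessions.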
Uses power tools
Uses Bash, Write, or Edit tools
Make every model cheaper or better. Measured on four Claude models — none got worse.
What it actually is: a zero-config Claude Code plugin. On every session start it injects a ≈910-token behavioral core — six working practices distilled from how Fable 5 was trained to operate — plus three on-demand skills and a fresh-context verifier agent. No commands to learn; the model just starts working differently:
| ✅ Grounded progress — only claims backed by a tool result; "tests fail" said plainly | ⚡ Act, don't overplan — enough information means act, not narrate options |
| 🎯 Autonomy calibration — decides minor things itself, asks only on scope or destructive actions | 🔍 Self-verification loops — a checkable definition of done, real checks on a cadence, a fresh-context verifier before "done" |
| 🔀 Delegation triggers — explicit rules for when to fan work out to subagents | 📝 Cross-session memory — writes lessons and plans to files, so the next session can pick up the work |
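
The session-start injection described above could look roughly like the sketch below. It assumes the hook is a script whose standard output is a JSON object carrying the text in `hookSpecificOutput.additionalContext`, which is the shape Claude Code's hooks documentation describes for SessionStart as far as we know; check the current hooks reference before relying on it. `BEHAVIORAL_CORE` is a placeholder built from the six practice names in the table, not the plugin's actual ≈910-token text, and `build_session_start_output` is a hypothetical helper name.

```python
import json

# Placeholder only: the six practice names from the table above.
# The plugin's real behavioral core is not reproduced here.
PRACTICES = [
    "Grounded progress",
    "Act, don't overplan",
    "Autonomy calibration",
    "Self-verification loops",
    "Delegation triggers",
    "Cross-session memory",
]
BEHAVIORAL_CORE = "Working practices:\n" + "\n".join(f"- {name}" for name in PRACTICES)


def build_session_start_output(core_text: str) -> str:
    """Serialize the context to inject as a SessionStart hook result (assumed shape)."""
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": core_text,
        }
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # The hook runner is assumed to read this JSON from standard output.
    print(build_session_start_output(BEHAVIORAL_CORE))
```

Keeping the injected text in one constant makes its size easy to measure, which matters when the core is added to every session.
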
| 🐛 Bug hunts (4 tasks · 96 runs): find and fix planted defects (TTL cache, CSV quoting, rate limiter, date rollover) | ✨ Features from spec (4 tasks · 96 runs): build to a written spec (retry backoff, config merging, cursor pagination, slugify) | ♻️ Refactors (2 tasks · 48 runs): restructure code with zero behavior change, verified structurally |
| 🧠 Long-horizon builds (2 tasks · 48 runs): multi-stage pipelines where later steps depend on earlier decisions | 🧩 Spec-dense traps (3 tasks · 72 runs): 18+ interacting rules (discount engine, mini-interpreter) that punish shallow reading | 🔁 Session handoffs (2 tasks · 48 runs): a fresh session must finish another session's work; memory is the only bridge |
17 tasks × 3 attempts × 8 configurations = 408 runs. Grading is hidden and binary: test suites the agent never sees decide pass/fail. No LLM judge. Every task ships with a reference solution proving it solvable.
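The run count above can be checked by enumerating the matrix. The sketch below takes the per-category task counts from the benchmark description and reads the eight configurations as four models times two variants (plain and with modelharness), which is an inference from the results table rather than something stated outright.

```python
from itertools import product

# Task counts per category, as listed in the benchmark description.
TASKS_PER_CATEGORY = {
    "bug hunts": 4,
    "features from spec": 4,
    "refactors": 2,
    "long-horizon builds": 2,
    "spec-dense traps": 3,
    "session handoffs": 2,
}
ATTEMPTS = 3
MODELS = ["Fable 5", "Opus 4.8", "Sonnet 4.6", "Haiku 4.5"]
VARIANTS = ["plain model", "+ modelharness"]

# Assumption: a "configuration" is one (model, variant) pair.
configurations = list(product(MODELS, VARIANTS))

runs_per_category = {
    name: count * ATTEMPTS * len(configurations)
    for name, count in TASKS_PER_CATEGORY.items()
}
total_tasks = sum(TASKS_PER_CATEGORY.values())
total_runs = sum(runs_per_category.values())

if __name__ == "__main__":
    print(f"{total_tasks} tasks x {ATTEMPTS} attempts x {len(configurations)} configurations = {total_runs} runs")
    for name, runs in runs_per_category.items():
        print(f"  {name}: {runs} runs")
```

The per-category totals (96, 96, 48, 48, 72, 48) match the figures quoted for each task group and sum to 408.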
Same 17 tasks, 3 runs per configuration. Higher pass rate and lower cost/time are better. 🟢 = better with modelharness, 🔴 = worse (explained in the last row).
| What we measured | Fable 5: plain model | Fable 5: + modelharness | Opus 4.8 (⭐ biggest gain): plain model | Opus 4.8 (⭐ biggest gain): + modelharness | Sonnet 4.6: plain model | Sonnet 4.6: + modelharness | Haiku 4.5: plain model | Haiku 4.5: + modelharness |
|---|---|---|---|---|---|---|---|---|
| Tasks completed successfully | 100% | 100% | 100% | 100% | 100% | 100% | 🔴 98% | 🟢 100% |
| Average cost per task | $1.80 | 🟢 $1.73 | $0.89 | 🟢 $0.77 | $0.41 | 🟢 $0.40 | $0.24 | $0.24 |
| · bug hunts | $1.30 | 🟢 $1.26 | $0.63 | 🟢 $0.55 | $0.26 | 🟢 $0.24 | $0.16 | 🟢 $0.13 |
| · features from spec | $1.44 | 🔴 $1.49 | $0.76 | 🟢 $0.60 | $0.34 | 🟢 $0.32 | $0.18 | 🟢 $0.16 |
| · refactors | $0.91 | $0.91 | $0.51 | $0.51 | | | | |
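
To make the cost columns easier to compare, the relative change per model can be computed from the "Average cost per task" row. This is a small sketch using only the figures in that row; `percent_change` is a hypothetical helper, and negative values mean the harnessed run was cheaper.

```python
# (plain model, + modelharness) average cost per task in USD, from the table above.
AVERAGE_COST_USD = {
    "Fable 5": (1.80, 1.73),
    "Opus 4.8": (0.89, 0.77),
    "Sonnet 4.6": (0.41, 0.40),
    "Haiku 4.5": (0.24, 0.24),
}


def percent_change(plain: float, harnessed: float) -> float:
    """Relative cost change in percent; negative means cheaper with the harness."""
    return (harnessed - plain) / plain * 100.0


changes = {
    model: percent_change(plain, harnessed)
    for model, (plain, harnessed) in AVERAGE_COST_USD.items()
}
# The most negative change is the largest relative saving.
biggest_saving = min(changes, key=changes.get)

if __name__ == "__main__":
    for model, change in changes.items():
        print(f"{model}: {change:+.1f}%")
    print("Largest relative saving:", biggest_saving)
```

On these figures the largest relative saving falls on Opus 4.8, consistent with the "biggest gain" marker in the table header.
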
npx claudepluginhub vitaliikapliuk/modelharness --plugin modelharness