By toolsbbb
BELCORT Planner → Generator → Evaluator pipeline for Claude Code. Adapted from Anthropic's harness engineering research. Opinionated, file-based, git-tracked.
Cross-artifact consistency check (SpecKit-inspired) — verifies PRD coverage, NFR alignment, constitution compliance across spec and feature contracts. CRITICAL findings halt the pipeline; warnings pass through. Writes analysis-report.md.
Verification debt scan (GSD-inspired) — finds deferred issues, stale known-issues, silent skips, TODO/FIXME without owners across the harness state.
Targeted modification of an existing spec file, with downstream references updated to match (BMAD tri-modal). Example: change the DB stack, and the edit propagates to the architecture, init.sh, and the NFRs.
Generator ↔ Evaluator contract negotiation round. Generator proposes HOW, Evaluator reviews, iterate up to 3 rounds before code is written. Bridges the Planner's "what/why" to a testable "how".
Fast pipeline — skip Planner, use minimal contract, single build + QA pass. Use for small well-defined tasks (<30 min).
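The contract negotiation round described above can be pictured as a bounded loop. This is only a minimal sketch, assuming the Generator and Evaluator can be modeled as plain callables. The harness itself is driven by markdown prompts and files, not Python, and `propose`, `review`, and the `Review` type are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ROUNDS = 3  # the description above caps negotiation at 3 rounds


@dataclass
class Review:
    """Hypothetical Evaluator verdict on a proposed contract."""
    approved: bool
    feedback: str = ""


def negotiate(
    propose: Callable[[str], str],
    review: Callable[[str], Review],
    max_rounds: int = MAX_ROUNDS,
) -> Optional[str]:
    """Return the approved contract text, or None if no round was approved."""
    feedback = ""
    for _ in range(max_rounds):
        contract = propose(feedback)  # Generator proposes HOW
        verdict = review(contract)    # Evaluator reviews the proposal
        if verdict.approved:
            return contract
        feedback = verdict.feedback   # carried into the next round
    return None
```

Returning `None` after the last round leaves the caller to decide what happens when negotiation fails; the source does not say how the harness handles that case.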
Executes bash commands
Hook triggers when Bash tool is used
Uses power tools
Uses Bash, Write, or Edit tools
An opinionated harness for Claude Code that implements a Planner → Generator → Evaluator pipeline, inspired by Anthropic's published research on long-running agent harness design.
Built for Claude Opus 4.6+. Tuned for TypeScript/Node.js full-stack projects but adaptable.
A set of Skills, agent prompts, and hooks that plug into Claude Code to enable autonomous multi-agent software development. You give Claude a 1–4 sentence prompt and the harness orchestrates planning, contract negotiation, test-driven implementation, adversarial QA, and retrospective drift analysis — all file-based, all auditable via git.
This is NOT a framework or a library. It's a set of markdown files that shape how Claude Code behaves when working on substantial projects.
Based on Anthropic Labs' engineering blog post "Harness design for long-running application development" (Rajasekaran, 2026) and its predecessor, "Effective harnesses for long-running agents".
See docs/anthropic-alignment.md for a point-by-point mapping between design decisions in this harness and the source material.
| Decision | Source |
|---|---|
| Three agents (Planner / Generator / Evaluator) as separate subagents | GAN-inspired architecture described in the Anthropic post |
| Evaluator MUST have separate context from Generator | "Separating the agent doing the work from the agent judging it proves to be a strong lever" |
| Planner outputs high-level direction only, NOT file paths or components | "stay focused on product context and high level technical design rather than detailed technical implementation" |
| Generator and Evaluator negotiate a sprint contract BEFORE any code is written | "Before each sprint, the generator and evaluator negotiated a sprint contract... before any code was written" |
| File-based agent communication | "Communication was handled via files: one agent would write a file, another agent would read it..." |
| Evaluator grades against 4 hard-threshold criteria | "Each criterion had a hard threshold, and if any one fell below it, the sprint failed" |
| Few-shot calibration examples for Evaluator scoring | "I calibrated the evaluator using few-shot examples with detailed score breakdowns" |
| Tuning loop: capture human-Evaluator divergence, refine over time | "The tuning loop was to read the evaluator's logs, find examples where its judgment diverged from mine..." |
| Criteria weighting emphasizes model's weak dimensions | "by weighting design and originality more heavily it pushed the model toward more aesthetic risk-taking" |
| Criteria wording deliberately chosen (shapes Generator output, not just Evaluator scoring) | "The wording of the criteria steered the generator in ways I didn't fully anticipate" |
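
The hard-threshold rule in the table (any single criterion below its threshold fails the sprint) can be illustrated with a short sketch. The criterion names and threshold values below are placeholders, not the harness's actual rubric, and the function names are hypothetical.

```python
from typing import Dict, List

# Placeholder criteria and thresholds, for illustration only;
# the harness defines its own four criteria and cut-offs.
THRESHOLDS: Dict[str, float] = {
    "criterion_a": 7.0,
    "criterion_b": 7.0,
    "criterion_c": 6.0,
    "criterion_d": 6.0,
}


def failed_criteria(
    scores: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> List[str]:
    """List every criterion whose score is missing or below its threshold."""
    return [
        name
        for name, minimum in thresholds.items()
        if scores.get(name, float("-inf")) < minimum
    ]


def sprint_passes(
    scores: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> bool:
    """A sprint passes only if no criterion falls below its threshold."""
    return not failed_criteria(scores, thresholds)
```

Because each criterion is checked independently, a high score on one criterion cannot offset a failing score on another, which is the point of a hard threshold over a weighted average.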
Requires Claude Code, installed and working. To install as a plugin, run these commands inside Claude Code:
/plugin marketplace add mosaladtaooo/belcort-harness
/plugin install harness@belcort-harness
/harness:setup
That's it. The plugin auto-registers the skill, three agents, ten slash commands, two hooks (SessionStart + PreToolUse), and two MCP servers (context7 + playwright). The one-time /harness:setup command patches ~/.claude/CLAUDE.md with the harness behavioral rules so they apply globally and survive context compaction. The patch is idempotent, version-aware, and removable (scripts/uninstall-rules.sh).
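One common way to make a patch like this idempotent, version-aware, and removable is to wrap the inserted rules in marker comments. The sketch below illustrates that general technique on plain strings. It is not the plugin's actual implementation, and the marker format and function names are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical marker format; the plugin's real markers may differ.
BEGIN = "<!-- harness-rules:begin v{version} -->"
END = "<!-- harness-rules:end -->"
BLOCK_RE = re.compile(
    r"<!-- harness-rules:begin v[^ ]+ -->.*?<!-- harness-rules:end -->\n?",
    re.DOTALL,
)
VERSION_RE = re.compile(r"<!-- harness-rules:begin v([^ ]+) -->")


def installed_version(text: str) -> Optional[str]:
    """Return the version recorded in the begin marker, if a block is present."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def remove_rules(text: str) -> str:
    """Strip any previously installed rules block; other content is kept."""
    return BLOCK_RE.sub("", text)


def apply_rules(text: str, rules: str, version: str) -> str:
    """Install or upgrade the rules block; repeating the same call changes nothing."""
    block = f"{BEGIN.format(version=version)}\n{rules.strip()}\n{END}\n"
    base = remove_rules(text)  # drop any older block before appending the new one
    if base and not base.endswith("\n"):
        base += "\n"
    return base + block
```

Removing the old block before appending the new one is what keeps repeated runs from stacking duplicate copies, and the version in the begin marker lets an installer tell an upgrade from a fresh install.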
Alternatively, install manually from a clone of the repository:
git clone https://github.com/mosaladtaooo/belcort-harness.git
cd belcort-harness
./install/install.sh
Then complete the two manual steps the installer prints: append the CLAUDE.md snippet and register the hooks in ~/.claude/settings.json.
Verify either install:
./install/verify.sh
Once installed, in any project directory:
# Start Claude Code, then:
/harness:sprint "Build a minimal bookmark manager with tags and search"
The harness will orchestrate planning, negotiation, build, and evaluation across the session. Your feedback gets captured into the Evaluator tuning loop for next time.
npx claudepluginhub toolsbbb/belcort-harness --plugin harness