Harness Engineering Skills

Stometa's public curated Claude Code skillset — a small, opinionated set of skills we use ourselves, published periodically.

Why this repo

This is the public companion to Stometa's private stometa-skillset. We dogfood a larger internal skillset day-to-day; selected skills are extracted, polished, and published here in batches. The goal is to share the workflows that actually hold up under real engineering work — not a pile of prototypes.

The first batch ships two skills: review-loop (already proven in daily use) and harness (multi-agent orchestration for larger tasks). Both are installable as a single Claude Code plugin.

Workflow at a glance

harness is a cybernetics-inspired orchestrator built on four mechanisms. Planning and execution live in separate sessions, so planning context cannot leak into execution. Every checkpoint runs in a fresh sub-agent, which resets eigenbehavior (the drift a long-running agent accumulates). An engine script owns state and enforces hard gates, so the LLM cannot self-certify. A cross-model peer (a different vendor's CLI) reviews the work before the PR is opened, so you never merge on a single model's opinion. A persistent retro feeds learnings back into future tasks, which closes the loop of the cybernetic system.
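To make the "engine owns state, LLM cannot self-certify" idea concrete, here is a minimal Python sketch of a phase machine with hard gates. It is an illustration only and does not reproduce harness-engine.sh: the `Engine` class, the `PHASES` list, and the `record`/`advance` methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Hypothetical phase names, loosely following the stages named in this README.
PHASES = ["plan", "execute", "review", "verify", "pr"]


@dataclass
class Engine:
    """Owns task state; a phase advances only when its recorded gate passed."""

    phase: str = "plan"
    evidence: dict = field(default_factory=dict)

    def record(self, gate: str, exit_code: int) -> None:
        # Evidence is the exit code of a command the engine ran itself,
        # not a statement made by the model.
        self.evidence[gate] = exit_code

    def advance(self) -> str:
        # Hard gate: refuse to move on unless the current phase has
        # a recorded exit code of 0.
        if self.evidence.get(self.phase) != 0:
            raise PermissionError(f"gate '{self.phase}' has not passed")
        idx = PHASES.index(self.phase)
        if idx + 1 < len(PHASES):
            self.phase = PHASES[idx + 1]
        return self.phase
```

The design point is that the model can only ask the engine to advance; whether the request succeeds depends on evidence the engine recorded, so a confident but wrong claim from the LLM changes nothing.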

flowchart TB
    H((Human))
    H -->|"harness plan task-id"| HOST
    subgraph HOST["Orchestrator host — pick one (symmetric)"]
        direction LR
        CC["Claude Code CLI"]
        CX["Codex CLI"]
    end
    HOST --> ENG[["harness-engine.sh single source of truth state · phase machine · hard gates"]]
    ENG --> S1
    subgraph S1["Session 1 — Planning (recommended host: Claude Code)"]
        direction TB
        PL["Orchestrator = Planner brainstorm + draft spec.md"]
        SE["harness-spec-evaluator fresh sub-agent (Claude)"]
        OK1["spec.md approved"]
        PL --> SE
        SE -->|revise| PL
        SE -->|approve| OK1
    end
    S1 -. session ends — planning context discarded .-> S2
    subgraph S2["Session 2 — Execution (recommended host: Codex)"]
        direction TB
        CPL{{"For each Checkpoint NN"}}
        GEN["harness-generator fresh sub-agent per CP TDD + verification preloaded"]
        EVL{"harness-evaluator fresh sub-agent per CP Tier 1 deterministic + Tier 2 LLM"}
        MORE{"more checkpoints?"}
        E2E["E2E Evaluator cross-checkpoint data-flow audit"]
        RL[["review-loop cross-model quality gate"]]
        FV["full-verify tests · coverage ≥ threshold · lint · types"]
        PR["Open PR"]
        RT["harness-retro fresh sub-agent"]
        CPL --> GEN --> EVL
        EVL -->|FAIL / REVIEW| GEN
        EVL -->|PASS| MORE
        MORE -->|yes| CPL
        MORE -->|no| E2E
        E2E -->|FAIL| GEN
        E2E -->|PASS| RL
        RL --> FV --> PR --> RT
    end
    subgraph RLSUB["review-loop · cross-LLM peer review"]
        direction LR
        PEER["Peer reviewer CLI codex OR gemini (allowlisted)"]
        HEV["Host LLM evaluates ACCEPT / REJECT / INSIST"]
        FRESH["Fresh peer session final approval pass"]
        DONE["pass-review-loop"]
        PEER -->|FINDING fN| HEV
        HEV -->|fix + commit| PEER
        PEER -.CONSENSUS.-> FRESH --> DONE
    end
    RL -. invokes .-> RLSUB
    RT --> RD[(".harness/retro/ cross-task learnings git-tracked, persistent")]
    RD -. informs future tasks .-> H
    classDef antiDrift stroke:#d97706,stroke-width:2px;
    class GEN,EVL,SE,RT antiDrift;
    classDef gate stroke:#059669,stroke-width:2px;
    class ENG,RL,FV gate;

Legend — orange-bordered nodes are the fresh-sub-agent drift firewalls; green-bordered nodes are the engine-enforced gates that the LLM cannot bypass.
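The per-checkpoint loop in the diagram (generator, then evaluator, retry on FAIL or REVIEW) can be sketched roughly as below. This is a hedged illustration, not the skill's actual code: `run_checkpoint`, the `spawn_*` factories, the feedback string, and the `max_attempts` cap are all assumptions made for the example. The one property it aims to show is that every attempt gets newly spawned agents, so no context carries over between attempts.

```python
from typing import Callable

Verdict = str  # "PASS", "FAIL" or "REVIEW", as labelled in the diagram


def run_checkpoint(
    spawn_generator: Callable[[], Callable[[str], str]],
    spawn_evaluator: Callable[[], Callable[[str], Verdict]],
    task: str,
    max_attempts: int = 3,  # hypothetical retry cap, not taken from the source
) -> tuple[str, int]:
    """Loop generator -> evaluator until PASS, with fresh agents per attempt."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        generate = spawn_generator()  # fresh generator: no memory of earlier attempts
        evaluate = spawn_evaluator()  # fresh evaluator: shares no context with the generator
        artifact = generate(task + feedback)
        verdict = evaluate(artifact)
        if verdict == "PASS":
            return artifact, attempt
        # Only the verdict is carried forward, not the previous agent's context.
        feedback = f"\nprevious verdict: {verdict}"
    raise RuntimeError(f"checkpoint did not pass after {max_attempts} attempts")
```

In a real setup the two factories would start sub-agent sessions; here they are plain callables so the control flow can be exercised with stubs.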

Hosts and roles

The model running each role is decoupled from the model hosting the session — that's why the same pipeline works whether you start in Claude Code or Codex.
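One consequence of this decoupling is that the cross-model peer can be chosen relative to whichever host is running. The sketch below assumes a simple rule, "pick an allowlisted peer CLI from a different vendor than the host"; the rule itself, the `pick_peer` function, the `preferred` parameter, and the dictionary keys are hypothetical and may not match how the skill actually selects a peer.

```python
# Peer CLIs allowlisted in the diagram, mapped to their vendors.
PEER_ALLOWLIST = {"codex": "openai", "gemini": "google"}

# Vendor behind each orchestrator host named in the diagram.
# The key spellings are illustrative identifiers, not real config values.
HOST_VENDOR = {"claude-code": "anthropic", "codex": "openai"}


def pick_peer(host: str, preferred: str = "codex") -> str:
    """Return an allowlisted peer CLI whose vendor differs from the host's."""
    host_vendor = HOST_VENDOR[host]
    # Try the preferred peer first, then the remaining allowlisted peers.
    candidates = [preferred] + [p for p in PEER_ALLOWLIST if p != preferred]
    for peer in candidates:
        if peer in PEER_ALLOWLIST and PEER_ALLOWLIST[peer] != host_vendor:
            return peer
    raise ValueError(f"no cross-vendor peer available for host {host!r}")
```

Under these assumptions, a Claude Code host would get `codex` as its reviewer, while a Codex host would fall through to `gemini`, since reviewing with the host's own vendor would defeat the purpose of a cross-model check.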
