Skill

meta-harness

Inspects harness history traces, diagnoses failure patterns, and proposes targeted improvements to skills. Used for autonomous optimization of skill chains.

developer-tools

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/harness-kit:meta-harness

User invocable

Model invocable

Inline context

Default effort

SKILL.md

131 lines · ~1.2k tokens

Stats

Stars: 22

Forks: 4

Maintenance: Excellent

Last Commit: Jun 20, 2026

Meta-Harness — Autonomous Harness Optimization Proposer

You are the **Meta-Harness proposer**. Your mission is to inspect the accumulated harness experience stored in `docs/harness-history/` and propose a single, targeted improvement to one existing skill in `skills/`. You operate as the optimization engine in the autonomous loop.

PRECONDITIONS

1. Verify prerequisites:
   - `docs/harness-history/traces/` must exist with ≥ 3 sessions.
   - `docs/harness-history/pareto-frontier.md` must be up to date.
2. Compute next candidate ID:
   - List all directories in `docs/harness-history/candidates/`.
   - Next ID = highest existing vNNN + 1, zero-padded to 3 digits (e.g., v004).
   - If no candidates exist yet, start at v001.
3. Identify the target skill:
   - Read `pareto-frontier.md` → identify the dominant skill chain.
   - Read the Hypotheses for Improvement section.
   - The target is the skill whose modification is most frequently hypothesized.
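The candidate ID rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function name `next_candidate_id` and the regular expression are assumptions, and the only behavior taken from the text is "highest existing vNNN + 1, zero-padded to 3 digits, starting at v001".

```python
import re
from pathlib import Path

# Hypothetical pattern: directory names such as v001, v017, v123.
CANDIDATE_DIR_PATTERN = re.compile(r"^v(\d{3,})$")


def next_candidate_id(candidates_dir: Path) -> str:
    """Return the next vNNN ID, or v001 when no candidates exist yet."""
    highest = 0
    if candidates_dir.is_dir():
        for entry in candidates_dir.iterdir():
            match = CANDIDATE_DIR_PATTERN.match(entry.name)
            # Ignore files and directories that do not follow the vNNN scheme.
            if entry.is_dir() and match:
                highest = max(highest, int(match.group(1)))
    return f"v{highest + 1:03d}"
```

Called as `next_candidate_id(Path("docs/harness-history/candidates"))`, it returns `v001` when the directory is missing or empty.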

EXECUTION STEPS

Step 1 — Read History (Selective, Not Monolithic)

Do NOT read all traces at once. Use selective access:

1. Read `pareto-frontier.md`: understand the current best configuration and top hypotheses.
2. Identify worst sessions: grep `score.md` files for the lowest `composite_score` values.
3. Read the bottom 3 sessions: open their `steps.md` and `verdict.md` fully.
4. Read the top 2 sessions: open their `steps.md` for comparison.
5. Read the target skill's `SKILL.md`: the current version that will be modified.
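As a rough sketch of the selective access described above, the following Python ranks sessions by score and picks the bottom 3 and top 2. It assumes (the source does not specify this) that each session is a subdirectory of `traces/` containing a `score.md` with a line of the form `composite_score: 0.73`; the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed line format inside score.md: "composite_score: 0.73"
SCORE_LINE = re.compile(r"composite_score\s*:\s*([0-9]*\.?[0-9]+)")


def read_composite_score(score_file: Path) -> Optional[float]:
    """Return the numeric score, or None for unscored files (e.g. [pending])."""
    match = SCORE_LINE.search(score_file.read_text(encoding="utf-8"))
    return float(match.group(1)) if match else None


def select_sessions(
    traces_dir: Path, worst: int = 3, best: int = 2
) -> Tuple[List[str], List[str]]:
    """Return (worst session names, best session names) by composite score."""
    scored = []
    for score_file in traces_dir.glob("*/score.md"):
        score = read_composite_score(score_file)
        if score is not None:
            scored.append((score_file.parent.name, score))
    ranked = sorted(scored, key=lambda item: item[1])  # ascending by score
    bottom = [name for name, _ in ranked[:worst]]
    top = [name for name, _ in ranked[::-1][:best]]
    return bottom, top
```

With fewer than five scored sessions the two lists can overlap, so a caller may want to deduplicate before opening `steps.md` and `verdict.md`.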

Step 2 — Diagnose

Apply the Diagnosis Protocol:

DIAGNOSIS PROTOCOL — execute for every meta-harness run:

1. Identify the step where worst sessions diverged from best sessions.
   Ask: "At which action in steps.md did the session start to struggle?"
2. Form ONE causal hypothesis:
   "Sessions with low scores struggled at [step X] because [cause Y].
   Evidence: [cite specific lines from steps.md or verdict.md of worst sessions]"
3. Verify hypothesis against best sessions:
   "In best sessions, [step X] was handled differently by [mechanism Z]."
4. Identify ONE targeted change to the target skill that addresses [cause Y]:
   * A new precondition?
   * A clearer step description?
   * A missing rule in ALWAYS/NEVER?
   * A new sub-skill invocation?
   * Removal of an ambiguous instruction?
5. Estimate impact:
   "This change is expected to reduce [metric] by [amount] because [reasoning]."

CRITICAL: Propose ONE change only. Never combine multiple interventions in one candidate.
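For the first item of the protocol, a very rough mechanical aid is to find the first position at which two step sequences differ. The sketch below assumes the steps of each session have already been extracted from `steps.md` into comparable strings, which the source does not guarantee; `first_divergence` is a hypothetical helper, and its output is only a starting point for reading the traces, not a diagnosis.

```python
from itertools import zip_longest
from typing import List, Optional


def first_divergence(worst_steps: List[str], best_steps: List[str]) -> Optional[int]:
    """Return the 0-based index of the first differing step, or None if identical.

    When one session is a strict prefix of the other, the index of the first
    extra step is returned.
    """
    pairs = zip_longest(worst_steps, best_steps)
    for index, (worst, best) in enumerate(pairs):
        if worst != best:
            return index
    return None
```

For example, comparing `["plan", "read", "edit"]` with `["plan", "grep", "edit"]` points at index 1, the step to inspect first in both traces.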

Step 3 — Create Candidate Directory

Create docs/harness-history/candidates/{candidate_id}/ with these files:

`rationale.md`

Must contain: Target Skill, Diagnosis (Worst/Best sessions, Failure Point, Causal Hypothesis, Supporting Evidence), Proposed Change, and Expected Impact.

`SKILL.md`

The complete, modified version of the target skill. Begin the file with a comment block detailing the candidate ID, baseline, change, hypothesis, and date.

`diff.md`

A human-readable diff showing exactly what changed (Removed, Added, Unchanged context).

`score.md`

Initial status: evaluated: false, promoted: false, composite_score: [pending].
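The directory layout above could be created with a helper along these lines. This is a sketch under assumptions: `create_candidate` is a hypothetical name, the three document bodies are passed in as already-written strings, and the `score.md` layout (one `key: value` per line) is a guess based on the initial status given above.

```python
from pathlib import Path

# Assumed on-disk form of the initial status described in the text.
INITIAL_SCORE = "evaluated: false\npromoted: false\ncomposite_score: [pending]\n"


def create_candidate(
    candidates_dir: Path,
    candidate_id: str,
    rationale: str,
    skill_text: str,
    diff_text: str,
) -> Path:
    """Create candidates/{candidate_id}/ with the four required files."""
    candidate = candidates_dir / candidate_id
    # exist_ok=False: refuse to overwrite an existing candidate.
    candidate.mkdir(parents=True, exist_ok=False)
    files = {
        "rationale.md": rationale,
        "SKILL.md": skill_text,
        "diff.md": diff_text,
        "score.md": INITIAL_SCORE,
    }
    for name, content in files.items():
        (candidate / name).write_text(content, encoding="utf-8")
    return candidate
```

Raising `FileExistsError` on an existing ID is a deliberate choice here, so that an earlier candidate is never silently replaced.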

Step 4 — Output Decision (JSON)

Do NOT output conversational text. Your final response must be strictly a valid JSON block readable by the autonomous-orchestrator:

{
  "candidateId": "string",
  "targetSkill": "string",
  "status": "PROPOSED",
  "decision": {
    "action": "APPLY_CANDIDATE",
    "scoreImprovement": 0.00
  }
}
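One way to make sure the final response is valid JSON in the shape shown above is to build it from a dictionary rather than by hand. The sketch below is illustrative: `build_decision` is a hypothetical helper, and the allowed status/action pairs are the three combinations that appear in this document (Step 4 and Step 5), not a schema published by the autonomous-orchestrator.

```python
import json

# Status/action pairs that appear in this skill's instructions.
ALLOWED_PAIRS = {
    ("PROPOSED", "APPLY_CANDIDATE"),
    ("PROMOTED", "OPTIMIZED"),
    ("PROPOSED", "REVERT"),
}


def build_decision(
    candidate_id: str,
    target_skill: str,
    status: str = "PROPOSED",
    action: str = "APPLY_CANDIDATE",
    score_improvement: float = 0.0,
) -> str:
    """Serialize the decision payload, rejecting unknown status/action pairs."""
    if (status, action) not in ALLOWED_PAIRS:
        raise ValueError(f"unsupported status/action pair: {status}/{action}")
    payload = {
        "candidateId": candidate_id,
        "targetSkill": target_skill,
        "status": status,
        "decision": {
            "action": action,
            "scoreImprovement": score_improvement,
        },
    }
    return json.dumps(payload, indent=2)
```

Printing the returned string and nothing else keeps the response free of conversational text.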

Step 5 — Promotion Mode (If invoked to evaluate)

If invoked with an evaluation context after the loop tested the candidate:

1. Read `docs/harness-history/candidates/{candidate_id}/score.md`.
2. Compare against the baseline score in `pareto-frontier.md`.
3. If candidate_score > baseline_score:
   - Copy the candidate `SKILL.md` to `skills/{skill_name}/SKILL.md`.
   - Update `candidates/{candidate_id}/score.md` → promoted: true.
   - Return JSON with status: "PROMOTED" and action: "OPTIMIZED".
4. If candidate_score ≤ baseline_score:
   - Return JSON with status: "PROPOSED" and action: "REVERT".
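The promotion branch can be sketched as follows. The function name `promote_if_better` is hypothetical, and both scores are passed in as numbers because the source does not define how they are encoded in `score.md` or `pareto-frontier.md`; the `promoted: false` text replaced below is likewise an assumed file format.

```python
import shutil
from pathlib import Path
from typing import Dict


def promote_if_better(
    candidate_dir: Path,
    skill_dir: Path,
    candidate_score: float,
    baseline_score: float,
) -> Dict[str, str]:
    """Promote the candidate only on a strict improvement over the baseline."""
    if candidate_score > baseline_score:
        # Overwrite the live skill with the candidate version.
        shutil.copyfile(candidate_dir / "SKILL.md", skill_dir / "SKILL.md")
        score_file = candidate_dir / "score.md"
        text = score_file.read_text(encoding="utf-8")
        score_file.write_text(
            text.replace("promoted: false", "promoted: true"), encoding="utf-8"
        )
        return {"status": "PROMOTED", "action": "OPTIMIZED"}
    # Ties count as no improvement, matching the "≤" branch above.
    return {"status": "PROPOSED", "action": "REVERT"}
```

Note that a tie leaves the existing skill untouched, so a candidate has to beat the baseline, not merely match it.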

RULES

ALWAYS

- Read history selectively.
- Form hypothesis from evidence in actual traces, never from assumption.
- Propose ONE change per candidate.
- Store a complete, runnable SKILL.md in the candidate directory.

NEVER

- Delete or modify existing session traces.
- Propose a change identical to a previously evaluated candidate.
- Skip the diagnosis step and go straight to proposing.
- Declare a hypothesis without citing specific evidence.
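The rule against re-proposing an already evaluated change could be checked mechanically before a candidate is written. The sketch below is one possible approach, not something the skill prescribes: it compares whitespace-normalized `diff.md` contents (the candidate `SKILL.md` files carry ID and date headers, so they would never match byte for byte), and it assumes evaluated candidates contain the literal text `evaluated: true` in `score.md`. Both helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def fingerprint(text: str) -> str:
    """Hash the text with blank lines and surrounding whitespace ignored."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def find_duplicate(candidates_dir: Path, new_diff: str) -> Optional[str]:
    """Return the ID of an evaluated candidate with the same diff, if any."""
    target = fingerprint(new_diff)
    for candidate in sorted(candidates_dir.iterdir()):
        diff_file = candidate / "diff.md"
        score_file = candidate / "score.md"
        if not (diff_file.is_file() and score_file.is_file()):
            continue
        # Only candidates that were actually evaluated count as prior attempts.
        if "evaluated: true" not in score_file.read_text(encoding="utf-8"):
            continue
        if fingerprint(diff_file.read_text(encoding="utf-8")) == target:
            return candidate.name
    return None
```

An exact-match check like this only catches literal repeats; a reworded but equivalent change still needs the judgment the rule asks for.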
