Improvement
- Reworked the skill into Skill Forge required sections with explicit triggers, stop conditions, and evidence logging.
- Added prompt-architect ceiling discipline and contract-style IO to keep iterations auditable.
LIBRARY-FIRST PROTOCOL (MANDATORY)
Before writing ANY code, you MUST check:
Step 1: Library Catalog
- Location: .claude/library/catalog.json
- If match >70%: REUSE or ADAPT
Step 2: Patterns Guide
- Location: .claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md
- If pattern exists: FOLLOW documented approach
Step 3: Existing Projects
- Location: D:\Projects\*
- If found: EXTRACT and adapt
Decision Matrix
| Match | Action |
|---|---|
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
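A minimal sketch of this matrix in Python follows; the catalog layout (an `entries` list with `name` fields) and the token-overlap scorer are illustrative assumptions, not a documented API.

```python
# Hypothetical rendering of the decision matrix; thresholds mirror the table.
import json
from pathlib import Path

CATALOG = Path(".claude/library/catalog.json")

def decide(match_score: float, pattern_exists: bool, in_project: bool) -> str:
    """Map a library match score and context flags to a build decision."""
    if match_score > 0.90:
        return "REUSE"            # Library >90%: reuse directly
    if match_score > 0.70:
        return "ADAPT"            # Library 70-90%: adapt minimally
    if pattern_exists:
        return "FOLLOW_PATTERN"   # documented pattern: follow it
    if in_project:
        return "EXTRACT"          # found in an existing project
    return "BUILD"                # no match: build, then add to library

def best_match_score(query: str) -> float:
    """Naive token-overlap score against catalog entry names (illustrative only)."""
    entries = json.loads(CATALOG.read_text()).get("entries", [])  # assumed layout
    q = set(query.lower().split())
    scores = [len(q & set(e.get("name", "").lower().split())) / max(len(q), 1)
              for e in entries]
    return max(scores, default=0.0)
```

For example, decide(best_match_score("csv exporter"), pattern_exists=False, in_project=False) resolves to one of the five actions above.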
STANDARD OPERATING PROCEDURE
Purpose
Manage recursive improvement cycles that collect failures, propose targeted changes, validate them, and stop when marginal gains level off.
Trigger Conditions
- Positive: iterative hardening of skills/prompts/agents, regression triage, A/B comparisons, or postmortem action items.
- Negative/reroute: net-new prompt design (route to prompt-architect/prompt-forge) or new skill scaffolding (route to skill-forge/skill-builder).
Guardrails
- Define stop criteria up front (delta threshold, timebox, risk acceptance) to avoid infinite loops; a stop-check sketch follows this list.
- Capture evidence for each iteration: inputs, changes, tests, and results with confidence ceilings.
- Keep outputs in English; avoid hidden reasoning.
- Do not silently discard failed experiments; record them for future avoidance.
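The stop-criteria guardrail can be made concrete. A minimal sketch, assuming these field names and defaults (they mirror this SOP's 2% delta rule but are not a fixed schema):

```python
# Stop-check sketch; thresholds and field names are assumptions from this SOP.
import time
from dataclasses import dataclass

@dataclass
class StopConditions:
    min_delta: float = 0.02        # stop when marginal gain falls below 2%
    timebox_seconds: float = 3600  # hard wall-clock budget
    max_iterations: int = 10       # absolute iteration cap

def should_stop(cond: StopConditions, start: float, iteration: int,
                last_delta: float) -> str | None:
    """Return a human-readable stop reason, or None to keep iterating."""
    if last_delta < cond.min_delta:
        return f"delta {last_delta:.3f} below threshold {cond.min_delta}"
    if time.monotonic() - start > cond.timebox_seconds:
        return "timebox exceeded"
    if iteration >= cond.max_iterations:
        return "iteration cap reached"
    return None
```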
Execution Phases
- Intake: Identify artifact to improve, goals, constraints, baseline metrics, and stop criteria; classify constraints as HARD/SOFT/INFERRED.
- Hypothesis: Propose targeted changes addressing failures or goals; prioritize by impact.
- Apply & Test: Implement changes, run validation (happy/edge/adversarial), and log results with ceilings.
- Assess: Compare metrics vs baseline; decide to continue, pivot, or stop based on delta.
- Package: Summarize iterations, residual risks, and recommended next steps. (A sketch wiring these phases into a loop follows.)
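The loop below is illustrative only; every callable (should_stop from the Guardrails sketch, plus evaluate, propose, apply, log) is a hypothetical stand-in for skill-specific logic:

```python
# Illustrative orchestration of the five phases; not a prescribed implementation.
import time

def improvement_cycle(target, goals, cond, should_stop,
                      evaluate, propose, apply, log):
    baseline = evaluate(target)                   # Intake: record baseline metrics
    start, iteration, delta = time.monotonic(), 0, float("inf")
    while (reason := should_stop(cond, start, iteration, delta)) is None:
        hypothesis = propose(target, goals)       # Hypothesis: one focused change
        candidate = apply(target, hypothesis)     # Apply & Test: implement change
        score = evaluate(candidate)               # happy/edge/adversarial suites
        delta = score - baseline
        log(iteration, hypothesis, score, delta)  # evidence with confidence ceilings
        if delta > 0:                             # Assess: keep only real gains
            target, baseline = candidate, score
        iteration += 1
    return target, reason                         # Package: best variant + stop reason
```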
Pattern Recognition
- Quality drift → focus on regression tests and schema tightening.
- Safety issues → add guardrails, refusals, and escalation rules.
- Performance/latency complaints → simplify prompts and reduce tool calls. (A lookup-table sketch of these mappings follows.)
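These mappings can live in a simple lookup table; the keys and remedy strings below are illustrative:

```python
# Pattern-to-remedy table mirroring the bullets above; extend as patterns emerge.
REMEDIES = {
    "quality_drift": ["add regression tests", "tighten output schema"],
    "safety_issue": ["add guardrails", "add refusal rules", "define escalation"],
    "latency": ["simplify prompts", "reduce tool calls"],
}

def remedies_for(pattern: str) -> list[str]:
    # Unrecognized patterns fall through to specialist escalation.
    return REMEDIES.get(pattern, ["escalate to a domain specialist"])
```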
Advanced Techniques
- Use multi-armed bandit sampling to allocate a limited trial budget across competing variants, as sketched after this list.
- Apply self-consistency or debate to stress-test high-risk changes.
- Snapshot checkpoints so reverting is easy when metrics regress.
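For the bandit-style sampling, a minimal epsilon-greedy sketch under a fixed trial budget; the reward callable and variant names are assumptions:

```python
# Epsilon-greedy allocation of a limited budget across competing variants.
import random

def bandit_trials(variants: list[str], reward, budget: int,
                  epsilon: float = 0.2) -> str:
    """Spend `budget` trials, exploring with probability epsilon; return the best."""
    totals = {v: 0.0 for v in variants}
    counts = {v: 0 for v in variants}
    def mean(v):
        return totals[v] / counts[v] if counts[v] else 0.0
    for _ in range(budget):
        explore = random.random() < epsilon or not any(counts.values())
        choice = random.choice(variants) if explore else max(variants, key=mean)
        totals[choice] += reward(choice)  # reward: e.g. pass rate on a test suite
        counts[choice] += 1
    return max(variants, key=mean)
```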
Common Anti-Patterns
- Iterating without baseline metrics.
- Changing multiple variables simultaneously, making results ambiguous.
- Ignoring ceiling discipline or failing to log evidence.
Practical Guidelines
- Limit each iteration to one or two focused hypotheses.
- Keep a changelog with timestamps, metrics, and confidence statements.
- Escalate to domain specialists when improvements stall.
Cross-Skill Coordination
- Upstream: prompt-architect/prompt-forge to clarify artifacts under test.
- Parallel: cognitive-lensing to unlock new hypotheses.
- Downstream: skill-forge/agent-creator to bake improvements into canonical docs.
MCP Requirements
- Optional memory/vector MCP to store iteration history; tag WHO=recursive-improvement-{session}, WHY=skill-execution. A record-shape sketch follows.
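A sketch of what one stored record might look like; the enclosing store call is omitted because no specific MCP client API is prescribed here:

```python
# Iteration-history record shape with the WHO/WHY tags named above.
from datetime import datetime, timezone

def iteration_record(session: str, iteration: int, hypothesis: str,
                     result: dict) -> dict:
    return {
        "WHO": f"recursive-improvement-{session}",
        "WHY": "skill-execution",
        "when": datetime.now(timezone.utc).isoformat(),
        "iteration": iteration,
        "hypothesis": hypothesis,
        "result": result,  # tests run, outcomes, confidence ceiling
    }
```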
Input/Output Contracts
inputs:
target: string # required artifact to improve
goals: list[string] # required goals/metrics
constraints: list[string] # optional constraints
stop_conditions: object # required thresholds/timebox
outputs:
iterations: list[object] # steps taken, tests, outcomes, ceilings
recommendation: string # continue/stop decision with rationale
artifacts: list[file] # updated files if applicable
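A minimal validation sketch against the inputs block above; the key names mirror the contract while the checks and messages are illustrative:

```python
# Checks the three required input fields before a cycle starts.
def validate_inputs(payload: dict) -> list[str]:
    errors = []
    for field, kind in (("target", str), ("goals", list),
                        ("stop_conditions", dict)):
        if field not in payload:
            errors.append(f"missing required field: {field}")
        elif not isinstance(payload[field], kind):
            errors.append(f"{field} must be {kind.__name__}")
    return errors
```

For example, validate_inputs({"target": "skills/foo.md", "goals": ["reduce hallucinations"], "stop_conditions": {"min_delta": 0.02}}) returns an empty list.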
Recursive Improvement
- Meta: apply this SOP to itself; stop when the improvement delta falls below 2% or residual risks are documented.
Examples
- Harden a prompt that occasionally hallucinates by tightening schema and adding refusals.
- Improve a code-generation skill by adding edge-case tests and measuring deltas.
Troubleshooting
- No improvement after multiple cycles → revisit hypotheses or broaden search (new lenses/tools).
- Metrics regressing → revert to last good checkpoint and reassess constraints.
- Timebox exceeded → summarize current best variant and open risks.
Completion Verification
Confidence: 0.70 (ceiling: inference 0.70) - Recursive Improvement SOP rewritten with Skill Forge structure and prompt-architect ceilings.