Improvement
- Reworked the skill into Skill Forge required sections with explicit triggers, stop conditions, and evidence logging.
- Added prompt-architect ceiling discipline and contract-style IO to keep iterations auditable.
LIBRARY-FIRST PROTOCOL (MANDATORY)
Before writing ANY code, you MUST check:
Step 1: Library Catalog
- Location: .claude/library/catalog.json
- If match >70%: REUSE or ADAPT
Step 2: Patterns Guide
- Location: .claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md
- If pattern exists: FOLLOW documented approach
Step 3: Existing Projects
- Location: D:\Projects\*
- If found: EXTRACT and adapt
Decision Matrix
| Match | Action |
|---|---|
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
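A minimal sketch of this matrix in Python follows; the catalog layout (an `entries` list with `name` fields) and the token-overlap scorer are illustrative assumptions, not a documented API.

```python
# Hypothetical rendering of the decision matrix; thresholds mirror the table.
import json
from pathlib import Path

CATALOG = Path(".claude/library/catalog.json")

def decide(match_score: float, pattern_exists: bool, in_project: bool) -> str:
    """Map a library match score and context flags to a build decision."""
    if match_score > 0.90:
        return "REUSE"            # Library >90%: reuse directly
    if match_score > 0.70:
        return "ADAPT"            # Library 70-90%: adapt minimally
    if pattern_exists:
        return "FOLLOW_PATTERN"   # documented pattern: follow it
    if in_project:
        return "EXTRACT"          # found in an existing project
    return "BUILD"                # no match: build, then add to library

def best_match_score(query: str) -> float:
    """Naive token-overlap score against catalog entry names (illustrative only)."""
    entries = json.loads(CATALOG.read_text()).get("entries", [])  # assumed layout
    q = set(query.lower().split())
    scores = [len(q & set(e.get("name", "").lower().split())) / max(len(q), 1)
              for e in entries]
    return max(scores, default=0.0)
```

For example, decide(best_match_score("csv exporter"), pattern_exists=False, in_project=False) resolves to one of the five actions above.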
STANDARD OPERATING PROCEDURE
Purpose
Manage recursive improvement cycles that collect failures, propose targeted changes, validate them, and stop when marginal gains level off.
Trigger Conditions
- Positive: iterative hardening of skills/prompts/agents, regression triage, A/B comparisons, or postmortem action items.
- Negative/reroute: net-new prompt design (route to prompt-architect/prompt-forge) or new skill scaffolding (route to skill-forge/skill-builder).
Guardrails
- Define stop criteria up front (delta threshold, timebox, risk acceptance) to avoid infinite loops; a stop-check sketch follows this list.
- Capture evidence for each iteration: inputs, changes, tests, and results with confidence ceilings.
- Keep outputs in English; avoid hidden reasoning.
- Do not silently discard failed experiments; record them for future avoidance.
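The stop-criteria guardrail can be made concrete. A minimal sketch, assuming these field names and defaults (they mirror this SOP's 2% delta rule but are not a fixed schema):

```python
# Stop-check sketch; thresholds and field names are assumptions from this SOP.
import time
from dataclasses import dataclass

@dataclass
class StopConditions:
    min_delta: float = 0.02        # stop when marginal gain falls below 2%
    timebox_seconds: float = 3600  # hard wall-clock budget
    max_iterations: int = 10       # absolute iteration cap

def should_stop(cond: StopConditions, start: float, iteration: int,
                last_delta: float) -> str | None:
    """Return a human-readable stop reason, or None to keep iterating."""
    if last_delta < cond.min_delta:
        return f"delta {last_delta:.3f} below threshold {cond.min_delta}"
    if time.monotonic() - start > cond.timebox_seconds:
        return "timebox exceeded"
    if iteration >= cond.max_iterations:
        return "iteration cap reached"
    return None
```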
Execution Phases
- Intake: Identify artifact to improve, goals, constraints, baseline metrics, and stop criteria; classify constraints as HARD/SOFT/INFERRED.
- Hypothesis: Propose targeted changes addressing failures or goals; prioritize by impact.
- Apply & Test: Implement changes, run validation (happy/edge/adversarial), and log results with ceilings.
- Assess: Compare metrics vs baseline; decide to continue, pivot, or stop based on delta.
- Package: Summarize iterations, residual risks, and recommended next steps. (A sketch wiring these phases into a loop follows.)
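The loop below is illustrative only; every callable (should_stop from the Guardrails sketch, plus evaluate, propose, apply, log) is a hypothetical stand-in for skill-specific logic:

```python
# Illustrative orchestration of the five phases; not a prescribed implementation.
import time

def improvement_cycle(target, goals, cond, should_stop,
                      evaluate, propose, apply, log):
    baseline = evaluate(target)                   # Intake: record baseline metrics
    start, iteration, delta = time.monotonic(), 0, float("inf")
    while (reason := should_stop(cond, start, iteration, delta)) is None:
        hypothesis = propose(target, goals)       # Hypothesis: one focused change
        candidate = apply(target, hypothesis)     # Apply & Test: implement change
        score = evaluate(candidate)               # happy/edge/adversarial suites
        delta = score - baseline
        log(iteration, hypothesis, score, delta)  # evidence with confidence ceilings
        if delta > 0:                             # Assess: keep only real gains
            target, baseline = candidate, score
        iteration += 1
    return target, reason                         # Package: best variant + stop reason
```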
Pattern Recognition
- Quality drift → focus on regression tests and schema tightening.
- Safety issues → add guardrails, refusals, and escalation rules.
- Performance/latency complaints → simplify prompts and reduce tool calls. (A lookup-table sketch of these mappings follows.)
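These mappings can live in a simple lookup table; the keys and remedy strings below are illustrative:

```python
# Pattern-to-remedy table mirroring the bullets above; extend as patterns emerge.
REMEDIES = {
    "quality_drift": ["add regression tests", "tighten output schema"],
    "safety_issue": ["add guardrails", "add refusal rules", "define escalation"],
    "latency": ["simplify prompts", "reduce tool calls"],
}

def remedies_for(pattern: str) -> list[str]:
    # Unrecognized patterns fall through to specialist escalation.
    return REMEDIES.get(pattern, ["escalate to a domain specialist"])
```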
Advanced Techniques
- Use multi-armed bandit sampling to allocate a limited trial budget across competing variants, as sketched after this list.
- Apply self-consistency or debate to stress-test high-risk changes.
- Snapshot checkpoints so reverting is easy when metrics regress.
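For the bandit-style sampling, a minimal epsilon-greedy sketch under a fixed trial budget; the reward callable and variant names are assumptions:

```python
# Epsilon-greedy allocation of a limited budget across competing variants.
import random

def bandit_trials(variants: list[str], reward, budget: int,
                  epsilon: float = 0.2) -> str:
    """Spend `budget` trials, exploring with probability epsilon; return the best."""
    totals = {v: 0.0 for v in variants}
    counts = {v: 0 for v in variants}
    def mean(v):
        return totals[v] / counts[v] if counts[v] else 0.0
    for _ in range(budget):
        explore = random.random() < epsilon or not any(counts.values())
        choice = random.choice(variants) if explore else max(variants, key=mean)
        totals[choice] += reward(choice)  # reward: e.g. pass rate on a test suite
        counts[choice] += 1
    return max(variants, key=mean)
```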
Common Anti-Patterns
- Iterating without baseline metrics.
- Changing multiple variables simultaneously, making results ambiguous.
- Ignoring ceiling discipline or failing to log evidence.
Practical Guidelines
- Limit each iteration to one or two focused hypotheses.
- Keep a changelog with timestamps, metrics, and confidence statements.
- Escalate to domain specialists when improvements stall.
Cross-Skill Coordination
- Upstream: prompt-architect/prompt-forge to clarify artifacts under test.
- Parallel: cognitive-lensing to unlock new hypotheses.
- Downstream: skill-forge/agent-creator to bake improvements into canonical docs.
MCP Requirements
- Optional memory/vector MCP to store iteration history; tag WHO=recursive-improvement-{session}, WHY=skill-execution. A record-shape sketch follows.
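A sketch of what one stored record might look like; the enclosing store call is omitted because no specific MCP client API is prescribed here:

```python
# Iteration-history record shape with the WHO/WHY tags named above.
from datetime import datetime, timezone

def iteration_record(session: str, iteration: int, hypothesis: str,
                     result: dict) -> dict:
    return {
        "WHO": f"recursive-improvement-{session}",
        "WHY": "skill-execution",
        "when": datetime.now(timezone.utc).isoformat(),
        "iteration": iteration,
        "hypothesis": hypothesis,
        "result": result,  # tests run, outcomes, confidence ceiling
    }
```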
Input/Output Contracts
inputs:
target: string # required artifact to improve
goals: list[string] # required goals/metrics
constraints: list[string] # optional constraints
stop_conditions: object # required thresholds/timebox
outputs:
iterations: list[object] # steps taken, tests, outcomes, ceilings
recommendation: string # continue/stop decision with rationale
artifacts: list[file] # updated files if applicable
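A minimal validation sketch against the inputs block above; the key names mirror the contract while the checks and messages are illustrative:

```python
# Checks the three required input fields before a cycle starts.
def validate_inputs(payload: dict) -> list[str]:
    errors = []
    for field, kind in (("target", str), ("goals", list),
                        ("stop_conditions", dict)):
        if field not in payload:
            errors.append(f"missing required field: {field}")
        elif not isinstance(payload[field], kind):
            errors.append(f"{field} must be {kind.__name__}")
    return errors
```

For example, validate_inputs({"target": "skills/foo.md", "goals": ["reduce hallucinations"], "stop_conditions": {"min_delta": 0.02}}) returns an empty list.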
Recursive Improvement
- Meta: apply this SOP to itself; stop when the improvement delta falls below 2% or residual risks are documented.
Examples
- Harden a prompt that occasionally hallucinates by tightening schema and adding refusals.
- Improve a code-generation skill by adding edge-case tests and measuring deltas.
Troubleshooting
- No improvement after multiple cycles → revisit hypotheses or broaden search (new lenses/tools).
- Metrics regressing → revert to last good checkpoint and reassess constraints.
- Timebox exceeded → summarize current best variant and open risks.
Completion Verification
Confidence: 0.70 (ceiling: inference 0.70) - Recursive Improvement SOP rewritten with Skill Forge structure and prompt-architect ceilings.