Pattern 5: Concurrent Event-Driven Multi-Agent Loop. Coordinates multiple Claude sessions as OS threads sharing a common event bus and filesystem-backed memory. Every loop cycle is a full improvement cycle: execute, eval against benchmark (KEEP/DISCARD), emit friction events during work, close with post_run_metrics, agent self-assessment survey saved to retrospectives, memory persistence, and Triple-Loop Retrospective trigger if the friction threshold is crossed. Four coordination topologies: turn-signal, fan-out, request-reply, triple-loop (Pattern D).
From agent-agentic-os. Install:

npx claudepluginhub richfrem/agent-plugins-skills --plugin agent-agentic-os

This skill is limited to using the following bundled tools and references:

assets/architecture-overview.md
assets/diagrams/triple-loop-learning-system.mmd
evals/evals.json
evals/results.tsv
references/dual-loop.md
references/memory/improvement-ledger-spec.md
references/memory/metrics.md
references/memory/post_run_survey.md
references/meta/backlog.md
references/operations/triple-loop.md
references/program.md
references/testing/test-registry-protocol.md
scripts/eval_runner.py
scripts/evaluate.py
scripts/generate_report.py
scripts/kernel.py
scripts/post_run_metrics.py
Pattern 5 in the agent-loops taxonomy. Treats concurrent Claude sessions as OS threads sharing a filesystem address space. The kernel event bus coordinates signals. Every cycle includes real work, eval against benchmark, friction tracking, agent self-assessment survey, post-run metrics, and memory persistence. The OS learns from every run.
There are two distinct Triple-Loop orchestration cycles operating at different scopes. Do not conflate them.
┌─────────────────────────────────────────────────────────┐
│ TRIPLE-LOOP ARCHITECT — OS Self-Improvement (this skill) │
│ │
│ os-improvement-loop evaluates and improves the OS │
│ workflows, protocols, agent coordination patterns, │
│ and this SKILL.md itself. │
│ │
│ Target: the OS machinery — ledgers, surveys, kernel, │
│ event bus, loop protocol. │
│ Eval gate: ORCHESTRATOR + PEER_AGENT run eval_runner.py │
│ on the OS skill being patched. │
│ Self-improvement: ORCHESTRATOR updates this SKILL.md │
│ when a confirmed protocol fix is found. │
└────────────────────┬────────────────────────────────────┘
│ spawns / governs
┌────────────────────▼────────────────────────────────────┐
│ TRIPLE-LOOP EXECUTOR — Individual Skill Improvement │
│ │
│ os-eval-runner + os-skill-improvement evaluate and │
│ improve a specific target SKILL.md (routing accuracy, │
│ trigger descriptions, example blocks). │
│ │
│ Target: a single skill's description and routing. │
│ Eval gate: os-eval-runner scores the target skill. │
│ Improvement: os-skill-improvement runs RED-GREEN-REFACTOR│
│ until score ≥ threshold. │
└─────────────────────────────────────────────────────────┘
Key distinction:
Triple-Loop Retrospective vs os-improvement-loop: Triple-Loop Retrospective (agent) is the
trigger/diagnostic layer — it analyzes friction events, identifies improvement targets,
and decides which Triple-Loop to invoke. os-improvement-loop (skill) is the execution
protocol that agents follow once a target has been identified. Do not conflate them.
Session Lifecycle Invariant: The OUTER loop owns session lifecycle. INNER loop work
(os-eval-runner, os-skill-improvement) never closes a session. A session is incomplete
until Phase 6 (os-memory-manager) is executed. An INNER loop that completes without running
Phase 6/7 has silently discarded its learnings.
Each Triple-Loop has its own eval targets, its own memory artifacts, and its own close protocol. A session that runs INNER loop work must still close through the OUTER loop's Phase 6/7 (os-memory-manager + os-skill-improvement) to persist learnings and harden OS-level routing.
See assets/diagrams/triple-loop-learning-system.mmd for the full visual.
Every loop cycle uses one of two tiers. The Triple-Loop cycle is the default. Use the Standard Cycle only when the north star is regressing or a full audit is explicitly requested.
digraph fast_cycle {
rankdir=TB;
node [shape=box, style="rounded,filled", fillcolor=white, fontname=Helvetica];
edge [fontname=Helvetica, fontsize=10];
Orientation [label="1. Orientation\nRead: ledger, registry, last survey, friction patterns"];
Scenario [label="2. Document Test Scenario\nWrite hypothesis + AC to tests/ BEFORE emitting loop.start"];
Execution [label="3. Execute\nINNER_AGENT reads packet, does work, emits friction events immediately"];
Eval [label="4. Eval Against Baseline\nINNER_AGENT + PEER_AGENT run eval_runner.py independently\nKEEP if both accuracy AND F1 >= baseline"];
Verdict [label="KEEP / DISCARD?", shape=diamond, fillcolor=lightyellow];
Apply [label="5a. Apply (KEEP)\nApply change to canonical skill or artifact"];
Correction [label="5b. Correction Packet (DISCARD)\nWrite correction-{CID}.md (CRITICAL/MODERATE/MINOR)\nRe-assign to INNER_AGENT"];
LoopClose [label="6. Loop Close -- all 4 required, no exceptions\n6a: ledger + registry row 6b: surveys (all agents)\n6c: metrics + report 6d: log M/L-class issues as tasks"];
Trigger [label="7. Trigger Check\n3+ same-type friction events this cycle?\nFlag Triple-Loop Retrospective for Full Loop on next session start"];
Orientation -> Scenario -> Execution -> Eval -> Verdict;
Verdict -> Apply [label="KEEP"];
Verdict -> Correction [label="DISCARD"];
Apply -> LoopClose;
Correction -> Execution [label="re-cycle", style=dashed];
LoopClose -> Trigger;
}
Orientation -- ORCHESTRATOR reads improvement-ledger.md (score trend, pending Section 2
items) and the last registry row (what was recommended next).
Test scenario documented -- ORCHESTRATOR writes hypothesis + acceptance criteria to
context/memory/tests/[CYCLE_ID]_[TARGET].md BEFORE emitting task.assigned.
Execution -- INNER_AGENT reads strategy packet, does real work, emits friction events immediately on uncertainty or wrong syntax.
Eval against baseline -- INNER_AGENT runs eval_runner.py. PEER_AGENT runs it
independently. KEEP if both accuracy AND F1 score >= baseline. DISCARD otherwise.
On BASELINE verdict (first run of a skill): record the score, do not apply or revert any
change, proceed to step 5.
Apply verdict -- KEEP: apply change. DISCARD: correction packet, re-assign to INNER_AGENT.
Loop close (4 required actions -- all mandatory every Triple-Loop cycle):
6a. Ledger + Registry -- Append one row to ledger Section 1 (date, cycle ID, target,
scores, verdict). Update context/memory/tests/registry.md row to CLOSED-KEEP or
CLOSED-DISCARD. Full scenario file fill-in and ledger Section 2+3 are Standard Cycle only.
6b. Survey child agents -- INNER_AGENT and PEER_AGENT each complete the Post-Run
Self-Assessment Survey (references/memory/post_run_survey.md), save to
context/memory/retrospectives/survey_[DATE]_[TIME]_[AGENT].md, and emit survey_completed.
Even on Triple-Loop cycle, surveys are required -- they are the source of truth for what to improve next.
6c. Metrics + report -- Run post_run_metrics.py --correlation-id "$CID". If this is a
KEEP cycle, optionally run generate_report.py to update the progress chart. Update
temp/agent-agentic-os-review/HOW-TO-RESTART.md in UPSTREAM with any state changes
(new known bugs fixed, backlog items added, what-exists status changed).
6d. Log issues as tasks -- Any problem, opportunity, or improvement observed this cycle
that is M-class or L-class (requires thought or architecture) MUST be created as a task file
in tasks/backlog/ in UPSTREAM using the naming convention NNNN-[slug].md:
# Task NNNN: [Title]
## Objective
[what needs to change and why -- cite cycle ID and agent that observed it]
## Acceptance Criteria
[specific, testable definition of done]
## Notes
[options considered, links to backlog.md entry if one exists]
S-class issues (trivial, <5 min fix) can go directly into backlog.md without a task file.
The next available task number is the highest NNNN across all lanes in tasks/ + 1.
Check with: ls tasks/backlog/ tasks/todo/ tasks/in-progress/ tasks/done/ | grep -o '^[0-9]*' | sort -n | tail -1
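The numbering rule above can be sketched in Python. This is an illustrative helper, not one of the shipped scripts; the lane names and the NNNN-[slug].md filename convention are taken from this section.

```python
import re
from pathlib import Path

def next_task_number(tasks_root: str = "tasks") -> str:
    """Return the next available NNNN, zero-padded, across all task lanes."""
    lanes = ["backlog", "todo", "in-progress", "done"]
    numbers = [0]  # an empty tree yields 0001
    for lane in lanes:
        for f in Path(tasks_root, lane).glob("*.md"):
            m = re.match(r"(\d+)-", f.name)  # NNNN-[slug].md convention
            if m:
                numbers.append(int(m.group(1)))
    return f"{max(numbers) + 1:04d}"
```

Unlike the `ls | grep | sort` one-liner, this preserves the four-digit zero padding when formatting the result.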
6e. ORCHESTRATOR memory ownership -- ORCHESTRATOR is solely responsible for writing and keeping current all of the following at loop close. No other agent owns these files.
| File | Location | When written |
|---|---|---|
| improvement-ledger.md | LAB context/memory/ | Every Triple-Loop cycle (Section 1). Standard Cycle adds S2+S3. |
| tests/registry.md | LAB context/memory/tests/ | Every cycle -- row updated to CLOSED. |
| tests/[CID]_[TARGET].md | LAB context/memory/tests/ | Before emit (scenario). Results filled in on Standard Cycle. |
| memory/YYYY-MM-DD.md | LAB context/memory/ | Standard Cycle only (session log). |
| loop-reports/report_[CID].md | LAB context/memory/loop-reports/ | Standard Cycle only. |
| memory.md | LAB context/ | Standard Cycle -- promoted L3 facts via os-memory-manager. |
| HOW-TO-RESTART.md | UPSTREAM temp/agent-agentic-os-review/ | Every cycle -- reflect state changes. |
| tasks/backlog/NNNN-[slug].md | UPSTREAM tasks/backlog/ | When an M/L-class issue is observed. |
| references/meta/backlog.md | UPSTREAM references/ | When any issue is observed (S/M/L). |
| SKILL.md (this file) | UPSTREAM .agents/skills/os-improvement-loop/ | When applicable -- if the loop produces a confirmed protocol improvement (step unclear, gap found, new requirement), ORCHESTRATOR updates this file before closing the cycle. Self-improvement of the loop protocol is a first-class output of every loop. |
A cycle that produces a protocol fix but does not update this SKILL.md has not fully closed.
Cycle Completion Checklist — a Triple-Loop cycle is complete only when ALL of these exist:
- Ledger row appended (improvement-ledger.md)
- Registry row closed (tests/registry.md)
- Surveys saved for all agents (context/memory/retrospectives/)
- Metrics emitted (post_run_metrics.py --correlation-id "$CID")
- Auto-memory reviewed (memory/MEMORY.md) — see 4.9
- loop.close event emitted

Missing any item = incomplete cycle. Do not start the next cycle until the checklist is done.
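A pre-flight sketch of that checklist, assuming the context/ layout shown later in this skill. It only checks that artifacts exist and that loop.close hit the bus, not that their content is complete.

```python
from pathlib import Path

def cycle_missing_items(ctx: str, cid: str) -> list:
    """Return checklist items still missing before the next cycle may start."""
    root = Path(ctx)
    required = {
        "improvement-ledger.md": root / "memory" / "improvement-ledger.md",
        "tests/registry.md": root / "memory" / "tests" / "registry.md",
        "retrospectives dir": root / "memory" / "retrospectives",
    }
    missing = [name for name, p in required.items() if not p.exists()]
    # The loop.close event must appear on the bus for this correlation id.
    bus = root / "events.jsonl"
    if not bus.exists() or not any(
        '"loop.close"' in line and cid in line for line in bus.open()
    ):
        missing.append("loop.close event")
    return missing
```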
Check context/memory/improvement-ledger.md Section 3 at close: if the last two Trend values are both negative, emit a north_star_regression event and trigger Triple-Loop Retrospective immediately (do not wait for the next session).

Emitting eval.result without completing steps 6a-6d and 7 is an incomplete Triple-Loop cycle.
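The regression rule can be sketched as a parser over ledger Section 3 rows; it assumes Trend is the final table column and is written as a signed number, which is an interpretation of the ledger spec, not a quote from it.

```python
def north_star_regressing(section3_rows: list) -> bool:
    """True if the last two Trend values (final column) are both negative."""
    trends = []
    for row in section3_rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        # Skip header/template rows whose last cell is not a signed number.
        if cells and cells[-1].lstrip("+-").replace(".", "", 1).isdigit():
            trends.append(float(cells[-1]))
    return len(trends) >= 2 and trends[-1] < 0 and trends[-2] < 0
```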
Used when: north star completion rate declining, or explicitly requested. These steps are NOT required on every run:
- Full per-agent surveys saved to context/memory/retrospectives/.
- Session log written to context/memory/YYYY-MM-DD.md.
- Loop report written to context/memory/loop-reports/report_[CYCLE_ID].md with baseline vs result table, survey summary, and artifacts updated.
- registry.md row updated to CLOSED, with the recommended next test written.
- os-memory-manager run for L3 promotion.

Use when: the north star completion rate is declining, or a full audit is explicitly requested.

Do NOT use for: routine per-cycle skill improvement (use learning-loop or triple-loop instead).

| Role | Responsibility |
|---|---|
| ORCHESTRATOR | Orients, writes strategy packets, applies improvements on KEEP, owns git, runs metrics, closes all memory files, updates SKILL.md when protocol improvements are found |
| PEER_AGENT | Runs os-eval-runner independently, produces KEEP/DISCARD verdict, completes self-assessment survey |
| INNER_AGENT | Reads strategy packet, executes work, runs eval_runner.py, emits friction events during work, completes self-assessment survey |
| WORKER | Stateless subprocess, no bus, returns result via file/stdout, no survey required |
${CLAUDE_PROJECT_DIR}/context/
events.jsonl <- shared event bus (append-only, atomic)
agents.json <- permitted agent registry
os-state.json <- shared counters and state
agents/<id>.cursor <- per-agent read cursor (line-count)
.locks/ <- per-resource execution lock directories
memory/YYYY-MM-DD.md <- session log written at every loop close
memory/retrospectives/ <- per-agent self-assessment surveys
survey_[DATE]_[TIME]_[AGENT].md <- one file per agent per cycle
memory.md <- L3 long-term facts (promoted from session logs)
memory/hook-errors.log <- hook failures (read by post_run_metrics.py)
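The "append-only, atomic" contract on events.jsonl can be sketched as below. This assumes POSIX O_APPEND semantics, where a single small write(2) from concurrent processes does not interleave; kernel.py's actual implementation may differ.

```python
import json
import os
import time

def emit(bus_path: str, event: dict) -> None:
    """Append one event as a single JSON line; O_APPEND keeps writers safe."""
    event.setdefault("ts", time.time())
    line = json.dumps(event) + "\n"
    fd = os.open(bus_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode())  # one write call = one atomic append
    finally:
        os.close(fd)
```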
Companion skills (all required for a complete loop):
- triple-loop — strategy packet format, correction packet protocol, verification
- os-eval-lab-setup — bootstrap experiment dirs (deploys program.md, evals.json, results.tsv); use before running any eval cycle on a new target
- os-eval-runner — eval_runner.py (pure scorer), evaluate.py (loop gate with KEEP/DISCARD exit codes), results.tsv baseline; the canonical eval engine
- os-memory-manager — session log template, L2/L3 promotion, deduplication
- Triple-Loop Retrospective — root cause analysis, Full Loop improvement, auto-patching skills

> [!TIP] See INSTALL.md for instructions on how to install missing dependencies.
Agents MUST emit a type: friction event immediately whenever they encounter:
- a <WRITE_FAILED> or tool error requiring retry

python3 "$KERNEL_PY" emit_event \
--agent INNER_AGENT --type friction --action encountered \
--correlation-id "$CID" \
--summary "step:eval-runner cause:wrong-flag-name"
These events are counted by post_run_metrics.py at close and drive the Triple-Loop Retrospective
auto-trigger (3+ friction events of same type = Full Loop improvement automatically).
poll_for_event() {
local AGENT=$1 ACTION=$2 CID=$3
for i in $(seq 1 30); do
EVENTS=$(python3 "$KERNEL_PY" read_events --agent "$AGENT")
MATCH=$(echo "$EVENTS" | python3 -c "
import sys, json
evs = json.load(sys.stdin)
hits = [e for e in evs if e.get('action') == '$ACTION'
and (not '$CID' or e.get('correlation_id') == '$CID')]
print(json.dumps(hits[0]) if hits else '')
")
if [ -n "$MATCH" ]; then echo "$MATCH"; return 0; fi
sleep 2
done
echo ""; return 1
}
Goal: Every agent orients before any work begins. No agent starts cold.
New target? Before running any eval cycle on a target skill for the first time, use
os-eval-lab-setup to bootstrap the experiment dir. This deploys:

- evals/evals.json — test prompts with the should_trigger boolean schema (REQUIRED — legacy expected_behavior string fields score 0.0 and will destroy accuracy)
- evals/results.tsv — baseline ledger (written when you run evaluate.py --baseline)
- references/program.md — your optimization goal, target score, and max iterations

Without this setup, evaluate.py will fail with exit code 2 (missing experiment structure).
Orientation reads:

- context/memory/improvement-ledger.md — cross-session OS-level trajectory per skill, survey-to-action trace, north star trend
- <target-experiment-dir>/evals/results.tsv — per-experiment baseline and iteration history (written by os-eval-runner's evaluate.py); this is the authoritative score history for the specific target being improved
- context/memory/tests/registry.md — what has been tested, what was recommended next
- context/memory.md (L3 long-term facts)
- context/memory/YYYY-MM-DD.md
- context/memory/retrospectives/ (most recent per agent)
- context/events.jsonl — last 100 lines, for friction patterns from the prior cycle

agents.json lists all participating agents. Emit agent_start:
python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type agent_start --action registered \
--summary "ORCHESTRATOR online — registry read, designing test from prior results"
Write the test scenario to context/memory/tests/[CYCLE_ID]_[TARGET_SLUG].md per references/testing/test-registry-protocol.md — hypothesis, acceptance criteria, failure criteria, prior results consulted, known weaknesses — BEFORE emitting loop.start. Add a row to context/memory/tests/registry.md with status IN PROGRESS. Then emit loop.start:
CYCLE_ID="cycle-$(date +%Y%m%d-%H%M%S)"
python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type intent --action loop.start \
--correlation-id "$CYCLE_ID" \
--summary "target:[TARGET_SLUG] hypothesis:[one-line] scenario:tests/${CYCLE_ID}_[TARGET_SLUG].md"
# ORCHESTRATOR: apply fix, signal PEER_AGENT to eval
python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type signal --action signal.wakeup \
--to PEER_AGENT --correlation-id "$CID" \
--summary "target:skills/skill-A/SKILL.md change:updated-triggers"
# PEER_AGENT: poll, run full eval cycle (Stage 3), emit verdict
RESULT=$(poll_for_event ORCHESTRATOR eval.result "$CID")
# ORCHESTRATOR: act on verdict (Stage 3 and Stage 4)
for partition in 1 2 3; do
(
CLAIM=$(python3 "$KERNEL_PY" claim_task \
--task-id "$CYCLE_ID" --partition $partition --agent INNER_AGENT --ttl 600)
if [ "$CLAIM" = "claimed" ]; then
# INNER_AGENT: full execution obligation (Stage 3)
python3 "$KERNEL_PY" emit_event \
--agent INNER_AGENT --type result --action task.complete \
--status success --to ORCHESTRATOR --correlation-id "$CYCLE_ID" \
--summary "partition:$partition score:0.88 verdict:KEEP survey:saved"
python3 "$KERNEL_PY" release_lock "task_${CYCLE_ID}_p${partition}"
fi
) &
done
wait
python3 "$KERNEL_PY" read_events --agent ORCHESTRATOR
CID=$(python3 -c "import uuid; print(uuid.uuid4().hex[:8])")
python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type intent --action task.assigned \
--to INNER_AGENT --correlation-id "$CID" \
--summary "packet:handoffs/packet-${CID}.md target:skill-B"
# INNER_AGENT: poll, execute, eval, survey, reply (Stage 3)
REPLY=$(poll_for_event ORCHESTRATOR task.complete "$CID")
Mandatory event chain:
loop.start -> task.assigned -> task.complete -> eval.result -> orchestrator.decision -> loop.close
MANDATORY GATE: ORCHESTRATOR must receive
eval.result with KEEP/DISCARD verdict from PEER_AGENT before applying any improvement or emitting orchestrator.decision. The eval.result event carries the verdict AND the PEER_AGENT self-assessment reference. Merging on task.complete alone is a protocol violation.
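The gate can be enforced mechanically. Here is a hedged sketch that checks the six mandatory actions occur in order for a correlation id, reading raw events.jsonl lines; the field names are assumed from the kernel examples in this skill.

```python
import json

CHAIN = ["loop.start", "task.assigned", "task.complete",
         "eval.result", "orchestrator.decision", "loop.close"]

def chain_valid(lines: list, cid: str) -> bool:
    """True if every mandatory action occurs for cid, in chain order."""
    actions = []
    for raw in lines:
        if not raw.strip():
            continue
        event = json.loads(raw)
        if event.get("correlation_id") == cid:
            actions.append(event.get("action"))
    it = iter(actions)
    # Subsequence check: each step must be found after the previous one.
    return all(step in it for step in CHAIN)
```

Extra events between chain steps (friction, signals) do not break validity; a missing or out-of-order step does.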
# ORCHESTRATOR assigns task
python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type intent --action task.assigned \
--to INNER_AGENT --correlation-id "$CID" \
--summary "packet:handoffs/packet-${CID}.md target:skills/skill-A/SKILL.md"
# Wait for task.complete
TC=$(poll_for_event ORCHESTRATOR task.complete "$CID")
# Signal PEER_AGENT to eval
python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type signal --action signal.wakeup \
--to PEER_AGENT --correlation-id "$CID" \
--summary "eval-target:skills/skill-A/SKILL.md output:handoffs/out-${CID}.md"
# Wait for eval.result — MANDATORY before any decision
ER=$(poll_for_event ORCHESTRATOR eval.result "$CID")
# Emit decision
python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type result --action orchestrator.decision \
--status success --correlation-id "$CID" \
--summary "verdict:KEEP improvements-applied:yes"
Every time INNER_AGENT receives task.assigned, it MUST:
Read the strategy packet at the path in the event summary.
Execute the assigned work — edit target skill, workflow doc, or artifact.
Emit friction events immediately when hitting uncertainty, wrong syntax, or needing help.
Run the eval engine using the os-eval-runner canonical scripts.
The experiment dir must have been bootstrapped by os-eval-lab-setup first
(deploys evals/evals.json with should_trigger boolean schema, evals/results.tsv,
and references/program.md).
Option A — pure scorer (get JSON metrics, decide KEEP/DISCARD manually):
python3 ./scripts/eval_runner.py --skill path/to/target/
# Pass the FOLDER path, not a file. Output: JSON with accuracy + F1 scores.
Option B — loop gate (evaluate.py returns exit 0=KEEP, 1=DISCARD automatically):
python3 ./scripts/evaluate.py --skill path/to/target/
# Exit 0 = KEEP (accuracy AND F1 >= baseline). Exit 1 = DISCARD. Exit 2 = path error.
# Exit 3 = tampered env (.lock.hashes mismatch) — delete .lock.hashes, re-run --baseline.
See os-eval-runner Troubleshooting section for exit code reference, keywords footgun,
and 4-character word floor.
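A sketch of wiring the documented exit-code contract into an agent's control flow. The verdict labels for exit codes 2 and 3 are illustrative names, not official output of evaluate.py.

```python
import subprocess

# Exit-code contract documented above (0/1/2/3).
VERDICTS = {0: "KEEP", 1: "DISCARD", 2: "PATH_ERROR", 3: "TAMPERED"}

def run_eval_gate(skill_dir: str) -> str:
    """Run evaluate.py as the loop gate and map its exit code to a verdict."""
    proc = subprocess.run(
        ["python3", "./scripts/evaluate.py", "--skill", skill_dir],
        capture_output=True, text=True,
    )
    return VERDICTS.get(proc.returncode, "UNKNOWN({})".format(proc.returncode))
```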
If DISCARD: revert edit, note failure in output file, emit task.complete --status fail.
Write output to handoffs/out-${CID}.md.
Complete the Post-Run Self-Assessment Survey (see Stage 4.2).
Emit task.complete including score, output path, and survey path in summary.
Every time PEER_AGENT receives signal.wakeup for eval, it MUST:
Run evaluate.py independently — do NOT read the score from the INNER_AGENT event.
Use evaluate.py (loop gate) for KEEP/DISCARD; it compares against results.tsv baseline
automatically and returns exit code 0=KEEP or 1=DISCARD.
python3 ./scripts/evaluate.py --skill path/to/target/
# Note: PEER_AGENT runs this from its OWN session independently.
results.tsv is the authoritative per-experiment baseline (written by os-eval-runner). The improvement-ledger.md tracks cross-cycle OS-level trajectory. Emit eval.result with KEEP/DISCARD verdict, score delta, and survey path:
python3 "$KERNEL_PY" emit_event \
--agent PEER_AGENT --type result --action eval.result \
--status success --to ORCHESTRATOR --correlation-id "$CID" \
--summary "verdict:KEEP score-before:0.82 score-after:0.89 gaps:adversarial survey:retrospectives/survey_DATE_PEER_AGENT.md"
On KEEP verdict: apply the improvement to the canonical skill or artifact, then emit orchestrator.decision.
On DISCARD verdict: revert the change, write a correction packet to handoffs/correction-${CID}.md using the severity schema (CRITICAL/MODERATE/MINOR), re-assign to INNER_AGENT, and do not emit orchestrator.decision until KEEP is received.

Close the loop:

python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type result --action loop.close \
--status success --correlation-id "$CYCLE_ID" \
--summary "improvements-applied:N friction-events:N"
Every agent that performed work this cycle MUST complete the Post-Run Self-Assessment Survey
(references/memory/post_run_survey.md). Answer every section — do not skip.
Save completed survey to:
context/memory/retrospectives/survey_[YYYYMMDD]_[HHMM]_[AGENT].md
Survey sections (all mandatory):
- Run Metadata: date, task type, task complexity, skill under test
- Completion Outcome
- Count-Based Signals (Karpathy Parity)
- Qualitative Friction
- Improvement Recommendation
After saving, emit survey_completed event:
python3 "$KERNEL_PY" emit_event \
--agent PEER_AGENT --type learning --action survey_completed \
--summary "retrospectives/survey_${DATE}_${TIME}_PEER_AGENT.md"
python3 "${CLAUDE_PROJECT_DIR}/context/kernel.py" emit_event \
--agent post_run_hook --type intent --action session_summary
python3 ./scripts/post_run_metrics.py
This emits a type: metric event with:
- human_interventions — count of human rescues this cycle
- workflow_uncertainty — count of uncertainty friction events
- missed_steps — count of skipped required steps
- cli_errors — count of wrong CLI syntax errors
- friction_events_total — total friction events
- hook_errors — count from context/memory/hook-errors.log

ORCHESTRATOR writes context/memory/YYYY-MM-DD.md:
# Session Log: YYYY-MM-DD (Cycle: CYCLE_ID)
## Summary
[What was improved, which skills/workflows were modified]
## Eval Results
- Target: [skill or artifact]
- Score before: [baseline from results.tsv]
- Score after: [new score]
- Verdict: KEEP / DISCARD
- Gaps remaining: [from PEER_AGENT survey]
## Metrics (from post_run_metrics.py)
- Human interventions: N
- Friction events: N
- CLI errors: N
- Hook errors: N
## Agent Surveys
- INNER_AGENT: retrospectives/survey_DATE_TIME_INNER_AGENT.md
- PEER_AGENT: retrospectives/survey_DATE_TIME_PEER_AGENT.md
- Top recommendation: [single most impactful change from surveys]
## Skills / Workflows Updated
- [skill name]: [what changed and why]
## Open Items
- [ ] [Gaps flagged CRITICAL or MODERATE in surveys for next cycle]
ORCHESTRATOR writes a Loop Report before running os-memory-manager. This is the
cycle's official record. Save to context/memory/loop-reports/report_[CYCLE_ID].md:
# Loop Report: [CYCLE_ID] — [YYYY-MM-DD HH:MM]
## Agent Summaries
### ORCHESTRATOR
[2-3 sentence summary: what was assigned, what decision was made, what was applied]
### INNER_AGENT
[2-3 sentence summary: what was executed, what score was produced, what friction was hit]
### PEER_AGENT
[2-3 sentence summary: eval run, verdict, gaps identified, self-assessment headline]
## Baseline vs Result
| Metric | Before | After | Delta |
|--------|--------|-------|-------|
| Eval score | [results.tsv baseline] | [new score] | [+/-] |
| Friction events | [prior cycle count] | [this cycle count] | [+/-] |
| Human interventions | [prior] | [this cycle] | [+/-] |
## Survey Response Summary
- INNER_AGENT biggest friction: [one line from survey qualitative section]
- PEER_AGENT biggest friction: [one line from survey qualitative section]
- ORCHESTRATOR biggest friction: [one line from survey qualitative section]
- Top improvement recommendation: [the single most impactful change cited across surveys]
## Artifacts Updated This Cycle
- [ ] Skill updated: [path] — [what changed]
- [ ] Script updated: [path] — [what changed]
- [ ] Hook updated: [path] — [what changed]
- [ ] Memory updated: context/memory/YYYY-MM-DD.md
- [ ] L3 promoted: [N facts to context/memory.md]
- [ ] Survey saved: retrospectives/survey_[DATE]_[AGENT].md (each agent)
## Status
- [ ] Results saved to memory: YES / NO
- [ ] Triple-Loop Retrospective triggered: YES (cause: [friction pattern]) / NO
Emit loop report written event:
python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type result --action loop.report \
--correlation-id "$CYCLE_ID" \
--summary "report:loop-reports/report_${CYCLE_ID}.md"
The loop report is always written to disk. After writing, ask the user:
"Loop report saved to
context/memory/loop-reports/report_[CYCLE_ID].md. Would you like me to surface the summary now?"
Only display the report content if the user says yes. Never display it automatically.
After the loop report is written, update the test scenario record per
references/testing/test-registry-protocol.md:
Open context/memory/tests/[CYCLE_ID]_[TARGET_SLUG].md and fill in the Results section:
Update context/memory/tests/registry.md row from IN PROGRESS to CLOSED with verdict.
If the hypothesis was Confirmed: promote the finding to context/memory.md L3 with
a dedup ID and a reference to the cycle ID as evidence.
If the hypothesis was Falsified: add a "DO NOT RE-TEST" entry to context/memory.md
with the cycle ID, so future cycles do not waste time re-running it.
If Inconclusive: note what additional data would be needed and what to change in the test design before retrying.
Emit registry updated event:
python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type learning --action test_registry_updated \
--correlation-id "$CYCLE_ID" \
--summary "scenario:tests/${CYCLE_ID}_[TARGET_SLUG].md verdict:[KEEP/DISCARD] next-hypothesis:[one-line]"
After the test registry update, ORCHESTRATOR appends to context/memory/improvement-ledger.md.
This is the longitudinal record that makes the cycle of improvement visible over time.
See references/memory/improvement-ledger-spec.md for the full format and writing protocol.
Section 1 — Eval Score Progression (one row, every cycle):
| [DATE] | [CYCLE_ID] | [TARGET] | [baseline score] | [after score] | [+/-delta] | KEEP/DISCARD | [N sub-cycles] | [what changed in 5-10 words] |
Section 2 — Survey-to-Action Trace (one row per friction item that generated a change):
| [DATE] | [survey file name] | [AGENT] | [friction item — exact quote from survey] | [action taken] | [target file] | [what changed] | [eval delta after change] | KEEP/DISCARD/pending |
Section 3 — North Star Metric (one row per session, written ONCE at session close):
| [DATE] | [session ID] | [total cycles] | [cycles without human rescue] | [completion %] | [human interventions total] | [friction events total] | [trend vs prior session] |
After appending, emit:
python3 "$KERNEL_PY" emit_event \
--agent ORCHESTRATOR --type learning --action ledger_updated \
--correlation-id "$CYCLE_ID" \
--summary "target:[TARGET] delta:[DELTA] verdict:[VERDICT] survey-actions:[N rows added to section 2]"
Optional: update progress chart (run after every KEEP cycle, or on user request):
python3 ./scripts/generate_report.py \
--project-dir "${CLAUDE_PROJECT_DIR}" \
--plugin-dir "${CLAUDE_PLUGIN_ROOT}"
After running: "Progress chart updated at context/memory/reports/progress_[TIMESTAMP].png. Want to see the summary?"
Only display the chart/summary if the user says yes — never auto-display.
If north star regresses 2 consecutive sessions: log a warning in the ledger and invoke
Triple-Loop Retrospective in Full Loop mode at the start of the next session. Do not wait for the
friction event threshold — a completion rate decline is a systemic signal.
Run os-memory-manager to evaluate session log entries for L3 promotion:
- <SUPERSEDE old_id=NNN> if overwriting a prior fact

After os-memory-manager runs, review the session for facts worth persisting in Claude's
cross-session auto-memory (memory/MEMORY.md in the project memory directory).
This is distinct from os-memory-manager (which promotes facts into context/memory.md
inside the lab). Auto-memory persists across all future conversations — it is the agent's
durable long-term knowledge about the user, project, and working patterns.
What belongs here (not in os-memory-manager):
What does NOT belong here (use os-memory-manager instead, or skip):
Procedure:
- Read memory/MEMORY.md — check for stale entries that need updating.
- Append new or corrected facts to memory/MEMORY.md.

Checklist — ask before closing:

- Did the user give feedback or correct an approach? (feedback_*.md)
- Did durable project facts change? (project_*.md)
- Was a non-obvious choice confirmed to work? (project_*.md or feedback_*.md)
- Did you learn something lasting about the user? (user_*.md)

If all four answers are "no", skip this step. Otherwise, update memory before emitting loop.close.
Note: The most common omission is feedback memory — if the user corrected an approach or confirmed a non-obvious choice worked, that should be saved. Watch for it.
After metrics are collected, ORCHESTRATOR checks the friction threshold:
FRICTION=$(python3 -c "
import json
events = [json.loads(l) for l in open('${CLAUDE_PROJECT_DIR}/context/events.jsonl') if l.strip()]
# Count friction events by cause this cycle
from collections import Counter
causes = Counter(e.get('summary','').split('cause:')[-1].split()[0]
for e in events if e.get('type') == 'friction' and e.get('correlation_id') == '$CYCLE_ID')
print(max(causes.values()) if causes else 0, list(causes.most_common(1)))
")
If any single friction cause appears 3+ times this cycle: invoke Triple-Loop Retrospective in
Full Loop mode automatically. Pass the friction pattern and relevant survey excerpts.
The learning loop will run root cause analysis (Kernel/RAM/Stdlib layer), propose a fix,
run the eval-gate, and apply the improvement before the next cycle begins.
python3 "$KERNEL_PY" release_lock memory
# Each agent:
python3 "$KERNEL_PY" emit_event --agent <ROLE> --type agent_stop --action shutdown \
--summary "surveys:saved metrics:emitted memory:written"
Invoke os-clean-locks if any .lock dirs remain.
Autonomous Workflow Completion Rate: percentage of cycles that complete the full
loop.start -> task.complete -> eval.result -> orchestrator.decision -> loop.close
chain without human rescue. Track this in the session log. Goal: increase every cycle.
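The completion-rate metric can be sketched from the bus alone: count loop.start correlation ids that also reached loop.close. Detecting "human rescue" is left out of this sketch; post_run_metrics.py owns that signal.

```python
import json

def completion_rate(lines: list) -> float:
    """Fraction of loop.start correlation ids that reached loop.close."""
    started, closed = set(), set()
    for raw in lines:
        if not raw.strip():
            continue
        event = json.loads(raw)
        cid = event.get("correlation_id")
        if event.get("action") == "loop.start":
            started.add(cid)
        elif event.get("action") == "loop.close":
            closed.add(cid)
    return len(started & closed) / len(started) if started else 0.0
```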
Supporting metrics (all tracked by post_run_metrics.py, goal: decrease every cycle):