From crucible
Runs structured retrospectives after significant tasks, consults past lessons before new tasks, and proposes skill improvements from patterns in 10+ retrospectives.
npx claudepluginhub raddue/crucible
This skill uses the workspace's default tool permissions.
<!-- CANONICAL: shared/dispatch-convention.md -->
All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.
Long retrospectives and multi-session forward-pass consultations maintain an Invariant Cairn per shared/cairn-convention.md. See ## Cairn (Layer 3) below.
Self-improving retrospective system. After tasks complete, runs structured retrospectives. Before tasks begin, consults accumulated lessons. Periodically proposes concrete skill edits based on evidence.
Core principle: The agent that never reviews its own performance never improves. The Forge closes the loop.
Announce at start: "I'm using the forge skill to [run a retrospective / consult past lessons / propose skill improvements]."
Per shared/cairn-convention.md. Forge-specific bindings:
receipt-ledger.jsonl; Rule 4 checks only that dispatched subagents' receipts are live.

digraph forge_modes {
"Task just completed?" [shape=diamond];
"Starting new task?" [shape=diamond];
"10+ retros + recurring pattern?" [shape=diamond];
"Run Retrospective" [shape=box];
"Run Feed-Forward" [shape=box];
"Run Mutation Analysis" [shape=box];
"Not applicable" [shape=box];
"Task just completed?" -> "Run Retrospective" [label="yes"];
"Task just completed?" -> "Starting new task?" [label="no"];
"Starting new task?" -> "Run Feed-Forward" [label="yes"];
"Starting new task?" -> "10+ retros + recurring pattern?" [label="no"];
"10+ retros + recurring pattern?" -> "Run Mutation Analysis" [label="yes"];
"10+ retros + recurring pattern?" -> "Not applicable" [label="no"];
}
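Three modes: Retrospective (after a significant task completes), Feed-Forward (before a new task begins), and Mutation Analysis (after 10+ retrospectives show a recurring pattern).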
Significant task = anything that used crucible:build, crucible:debugging, or crucible:finish. Simple questions and file reads do not qualify.
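The decision flow in the digraph can be sketched as a small function. This is a minimal illustration, not the skill's actual implementation: the function names and return strings are hypothetical, and the "3+ occurrences of the same deviation type" threshold is taken from the mutation-analysis trigger described later in this document.

```python
from collections import Counter
from typing import Iterable

# Skills whose use makes a task "significant" (from the definition above).
SIGNIFICANT_SKILLS = {"crucible:build", "crucible:debugging", "crucible:finish"}


def is_significant(skills_used: Iterable[str]) -> bool:
    """True if the task used at least one of the significant skills."""
    return any(skill in SIGNIFICANT_SKILLS for skill in skills_used)


def has_recurring_pattern(deviation_types: Iterable[str], min_occurrences: int = 3) -> bool:
    """True if any single deviation type occurs min_occurrences times or more."""
    counts = Counter(deviation_types)
    return any(n >= min_occurrences for n in counts.values())


def select_mode(
    task_just_completed: bool,
    starting_new_task: bool,
    retro_count: int,
    deviation_types: Iterable[str],
) -> str:
    """Walk the digraph top to bottom and return the first mode that applies."""
    if task_just_completed:
        return "retrospective"
    if starting_new_task:
        return "feed-forward"
    if retro_count >= 10 and has_recurring_pattern(deviation_types):
        return "mutation-analysis"
    return "not-applicable"
```

The checks are ordered exactly as the edges in the graph, so a completed task always wins over a new task, and mutation analysis is only considered when neither applies.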
All data lives in the project memory directory:
~/.claude/projects/<project-hash>/memory/forge/
retrospectives/
YYYY-MM-DD-HHMMSS-<slug>.md # Individual entries (<40 lines each)
patterns.md # Aggregated patterns (max 200 lines)
mutation-proposals/
YYYY-MM-DD-<topic>.md # Skill mutation proposals
skill-proposals/
YYYY-MM-DD-<topic>.md # Skill extraction proposals
chronicle/
signals.jsonl # Always-on execution signals (1 line per skill completion)
summary.md # Bounded summary (~100 lines, regenerated on read)
Context budget: patterns.md MUST stay under 200 lines. chronicle/summary.md MUST stay under 100 lines. Both are loaded into context during feed-forward. Individual retrospective files are NOT loaded during feed-forward — only during mutation analysis.
Chronicle is always-on — no config toggle. Signals contain no prompt content or task descriptions, only operational metrics (skill name, duration, outcome, files touched, skill-specific counts). This is separate from trajectory capture, which remains opt-in.
Skill-Worthy Patterns section format (within patterns.md):
## Skill-Worthy Patterns
- **[Pattern name]** (count: N, last seen: YYYY-MM-DD): [Description]
- Status: none | proposed ([path]) | accepted | rejected
Trajectory capture records structured data about real skill invocations for eval generation. It is OFF by default and requires explicit opt-in.
Check for ~/.claude/projects/<hash>/memory/trajectory-config.json before any
trajectory operation. If the file does not exist or enabled is false, skip all
trajectory recording silently.
Config schema:
{
"enabled": false,
"max_entries": 500,
"include_prompt_summary": true,
"additional_redact_patterns": []
}
- enabled: Master switch. Default false.
- max_entries: Maximum entries per JSONL file before oldest are pruned. Default 500.
- include_prompt_summary: Whether to include the one-line redacted prompt summary. If false, prompt_summary is set to "[omitted]". Default true.
- additional_redact_patterns: List of regex strings for project-specific secret patterns applied during the redaction pass.

All trajectory data lives alongside other forge data:
~/.claude/projects/<hash>/memory/trajectories/
trajectory_samples.jsonl # Successful completions
failed_trajectories.jsonl # Failures, partial completions, aborts
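A minimal sketch of the opt-in check, assuming the config is plain JSON at the path given above. The function names are hypothetical, and the sketch does not handle malformed JSON, which the source does not specify.

```python
import copy
import json
from pathlib import Path

# Defaults as listed in the config schema above.
DEFAULTS = {
    "enabled": False,
    "max_entries": 500,
    "include_prompt_summary": True,
    "additional_redact_patterns": [],
}


def load_trajectory_config(config_path: Path) -> dict:
    """Return the config merged over the defaults; a missing file yields the defaults."""
    config = copy.deepcopy(DEFAULTS)
    if config_path.exists():
        with config_path.open(encoding="utf-8") as fh:
            config.update(json.load(fh))
    return config


def trajectory_enabled(config_path: Path) -> bool:
    """True only when the file exists and sets enabled to true."""
    return bool(load_trajectory_config(config_path)["enabled"])
```

Because the default for enabled is false, a missing file and an explicit `"enabled": false` are treated the same way: every trajectory operation is skipped.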
When trajectory capture is first enabled (config file is created or enabled transitions
from false to true), output to the user:
"Trajectory capture is now enabled. Here is what this means:
Before writing ANY trajectory entry, apply these redaction steps in order:
1. Prompt summary generation: Do NOT copy the user's prompt. Instead, generate a one-line summary that captures the task TYPE without revealing specific content.
   - Good: "Add authentication middleware to REST API"
   - Bad: "Add JWT auth to the Acme Corp billing API at /srv/acme/billing/api.py"

   If include_prompt_summary is false in config, set prompt_summary to "[omitted]".
2. File path normalization: Replace absolute paths with project-relative paths. Replace home directory segments with ~. Replace username segments with [user]. Example: /home/alice/projects/myapp/src/auth.py becomes ~/projects/myapp/src/auth.py, or src/auth.py if within the project root.
3. Secret pattern matching: Scan all string fields for patterns matching:
   - Strings of [A-Za-z0-9_-]{20,} preceded by key/token/secret/api
   - URLs (:// with credentials)
   - Assignments (KEY=value patterns)

   Replace matches with [REDACTED].
4. Custom patterns: Apply each regex in additional_redact_patterns from the config file against all string fields. Replace matches with [REDACTED].
5. Set redacted flag: Only set redacted: true after steps 1-4 complete successfully. If any step fails, do NOT write the entry.
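The redaction pass could look roughly like the sketch below. The concrete regular expressions are assumptions chosen to approximate the three pattern families named above, not the skill's actual patterns, and the function names are hypothetical. Generating the prompt summary itself is left to the model; the code only handles the "[omitted]" case.

```python
import re
from typing import Any, Callable, Optional

REDACTED = "[REDACTED]"

# Assumed regexes approximating the pattern families listed in the redaction rules.
SECRET_PATTERNS = [
    # 20+ character token preceded by a key/token/secret/api label
    re.compile(r"(?:key|token|secret|api)[\w-]*\s*[:=]\s*['\"]?[A-Za-z0-9_-]{20,}", re.IGNORECASE),
    # URL with embedded credentials, e.g. scheme://user:pass@host
    re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s/:@]+:[^\s/@]+@\S+"),
    # KEY=value assignments
    re.compile(r"\b[A-Z][A-Z0-9_]*=\S+"),
]


def normalize_paths(text: str, project_root: str, home: str) -> str:
    """Make paths project-relative, then replace home and username segments."""
    text = text.replace(project_root.rstrip("/") + "/", "")
    text = text.replace(home.rstrip("/") + "/", "~/")
    return re.sub(r"/(home|Users)/[^/\s]+", r"/\1/[user]", text)


def _map_strings(value: Any, fn: Callable[[str], str]) -> Any:
    """Apply fn to every string nested inside dicts and lists."""
    if isinstance(value, str):
        return fn(value)
    if isinstance(value, list):
        return [_map_strings(v, fn) for v in value]
    if isinstance(value, dict):
        return {k: _map_strings(v, fn) for k, v in value.items()}
    return value


def redact_entry(entry: dict, config: dict, project_root: str, home: str) -> Optional[dict]:
    """Return a redacted copy with redacted=True, or None if any step fails."""
    try:
        extra = [re.compile(p) for p in config.get("additional_redact_patterns", [])]
        out = dict(entry)
        if not config.get("include_prompt_summary", True):
            out["prompt_summary"] = "[omitted]"

        def scrub(text: str) -> str:
            text = normalize_paths(text, project_root, home)
            for pattern in SECRET_PATTERNS + extra:
                text = pattern.sub(REDACTED, text)
            return text

        out = _map_strings(out, scrub)
    except re.error:
        return None  # a failed step means the entry must not be written
    out["redacted"] = True
    return out
```

Returning None on failure keeps the caller honest: there is no partially redacted entry to write by accident, and the redacted flag is only ever set on the last line.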
Retrospective mode runs after any skill that completes a significant task reports success. The calling skill (or orchestrator) invokes crucible:forge in retrospective mode.
1. Capture raw execution metrics (if trajectory capture is enabled): Before dispatching the retrospective analyst, gather and hold in context:

   These raw metrics are NOT written to disk yet. They are held in context for step 8 (trajectory recording) after the retrospective completes, where they are merged with the retrospective's analytical output (deviation type, outcome, tags) to form the complete trajectory entry.

   If trajectory capture is disabled, skip this step.
2. Dispatch a Retrospective Analyst subagent (Sonnet) using ./retrospective-prompt.md
   - Provide: task description, the plan (if any), actual execution summary, skills used, duration estimate
3. Subagent returns structured retrospective entry
4. Write entry to ~/.claude/projects/<project-hash>/memory/forge/retrospectives/YYYY-MM-DD-HHMMSS-<slug>.md
5. Update patterns.md: read current file, merge new findings, rewrite
6. For debugging sessions, the retrospective also extracts diagnostic patterns using a dedicated extraction subagent (Opus). Dispatch using ./diagnostic-extraction-prompt.md. Patterns are written to cartographer's landmines via crucible:cartographer (record mode) with dead_ends and diagnostic_path fields. Tag dead-end entries with (source: debugging).
6b. For build sessions with QG fix journals: glob for ~/.claude/projects/<project-hash>/memory/quality-gate/fix-journal-*.md. For each handoff file found:
a. Read landmines.md and check for existing entries matching the same module + same failed approach (same file path AND same module AND 3+ non-stopword shared terms). If matching entries exist, skip extraction — handoff was already processed. Delete the handoff file.
b. If no match: dispatch the diagnostic extraction subagent (Opus) using ./diagnostic-extraction-prompt.md with the QG-specific addendum (see that file's "Source Context: Quality Gate Fix Journal" section). Tag dead-end entries with (source: qg).
c. Write extracted dead ends to cartographer's landmines via crucible:cartographer (record mode).
d. Delete the handoff file after successful extraction.
e. Cap-pressure behavior: If landmines.md is within 10 lines of its 100-line cap, write only Fatal-severity dead ends. At cap, skip and emit a chronicle signal: { "event": "dead_end_cap_skip", "module": "<module>", "source": "qg" }.
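The cap-pressure rule in sub-step (e) can be sketched as a pure function. The `severity` field name and the use of a line count as input are assumptions for illustration; the source only states the thresholds and the shape of the skip signal.

```python
from typing import Optional

LANDMINES_CAP = 100      # line cap for landmines.md
PRESSURE_MARGIN = 10     # "within 10 lines of the cap"


def filter_for_cap(
    current_lines: int,
    dead_ends: list,
    module: str,
    source: str = "qg",
) -> tuple:
    """Return (dead_ends_to_write, skip_signal_or_None) for the current file size."""
    if current_lines >= LANDMINES_CAP:
        # At cap: write nothing and emit a chronicle signal instead.
        signal: Optional[dict] = {"event": "dead_end_cap_skip", "module": module, "source": source}
        return [], signal
    if current_lines >= LANDMINES_CAP - PRESSURE_MARGIN:
        # Under pressure: keep only Fatal-severity dead ends.
        return [d for d in dead_ends if d.get("severity") == "Fatal"], None
    return list(dead_ends), None
```

Keeping the decision separate from the file write makes the three regimes (normal, pressure, at cap) easy to check in isolation.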
7. For build sessions with a decision journal, the retrospective also extracts substantive design decisions. The retrospective analyst identifies decisions that are NOT operational routing (reviewer-model, gate-round, task-grouping, cleanup-removal types from the journal) but are substantive design choices (technology selection, API design, architecture, constraint trade-offs). These are passed to a cartographer recorder dispatch with the "Extract decisions for cartographer" directive, alongside the module mapping from the build session's task list and design doc.
8. Trajectory recording (if trajectory capture is enabled):
a. Check ~/.claude/projects/<hash>/memory/trajectory-config.json — if missing
or enabled: false, skip this step entirely.
b. Construct the raw trajectory entry from execution data available in context:
   - trajectory_id: Generate a UUID
   - timestamp: ISO-8601 of when the skill invocation started
   - skill: The Crucible skill that was invoked (build, debugging, audit, etc.)
   - completed: Whether the skill ran to its natural completion
   - outcome: Derived from the retrospective's outcome field (success/partial/failure)
   - duration_ms: From pipeline status timestamps or session timing
   - tool_call_count: Estimated from execution summary
   - error_recovery_events: Count of error-then-retry sequences observed
   - user_acceptance: Whether the user accepted the output (accepted/rejected/modified/unknown)
   - phases_reached: For multi-phase skills, which phases completed
   - deviation_type: From the retrospective entry
   - prompt_hash: SHA-256 of the original user prompt
   - prompt_summary: One-line redacted summary (if include_prompt_summary is true)
   - redacted: Set to true only after step (c) completes
   - tags: From the retrospective entry's tags
c. Run the redaction pass (see Redaction Rules above).
d. Append the entry as a single JSON line to the appropriate file:
   - If completed == true AND outcome == "success": append to trajectory_samples.jsonl
   - Otherwise: append to failed_trajectories.jsonl
e. Check file size: if the target file exceeds max_entries lines, remove the
oldest entries (from the top of the file) to bring it back to max_entries.
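Sub-steps (d) and (e) amount to an append followed by a bounded trim. The sketch below assumes the entry has already been through the redaction pass; the function names are hypothetical.

```python
import json
from pathlib import Path


def target_file(entry: dict, trajectories_dir: Path) -> Path:
    """Successful completions go to the samples file, everything else to the failures file."""
    ok = entry.get("completed") is True and entry.get("outcome") == "success"
    name = "trajectory_samples.jsonl" if ok else "failed_trajectories.jsonl"
    return trajectories_dir / name


def append_trajectory(entry: dict, trajectories_dir: Path, max_entries: int = 500) -> Path:
    """Append one JSON line, then drop the oldest lines beyond max_entries."""
    if entry.get("redacted") is not True:
        raise ValueError("refusing to write an entry that has not been redacted")
    path = target_file(entry, trajectories_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    lines = path.read_text(encoding="utf-8").splitlines()
    if len(lines) > max_entries:
        # Oldest entries sit at the top of the file, so keep the tail.
        path.write_text("\n".join(lines[-max_entries:]) + "\n", encoding="utf-8")
    return path
```

Rewriting the whole file on overflow is simple and adequate for a 500-line cap; a larger cap might justify a different strategy.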
8.5. Chronicle signal (always-on — runs regardless of trajectory capture config):
a. Construct signal entry from execution data already in context:
   - v: 1 (schema version)
   - ts: ISO-8601 completion timestamp
   - skill: The Crucible skill that just completed
   - outcome: From retrospective's outcome field (success/failure/partial)
   - duration_m: Wall clock minutes from start to completion
   - branch: Current git branch
   - files_touched: Project-relative paths of files modified during the skill invocation
   - metrics: Skill-specific metrics bag (see table below)
b. Compute efficiency sub-object from manifest (if enriched manifest data is available):
   - Locate the dispatch directory from the .dispatch-active-* marker in the pipeline's scratch directory, or from the metrics log.
   - Read manifest.jsonl from the dispatch directory.
   - Check whether any entries have input_chars populated (non-null). If none do (pre-enrichment run), skip: do NOT include the efficiency sub-object (no zeros, no nulls).
   - Otherwise compute:
     - total_input_chars: sum of all input_chars values (skip nulls)
     - total_output_chars: sum of all output_chars values (skip nulls)
     - est_input_tokens: total_input_chars / 4 (rounded to nearest integer)
     - est_output_tokens: total_output_chars / 4 (rounded to nearest integer)
     - dispatches_by_tier: count of dispatches grouped by model_tier (e.g., {"opus": 5, "sonnet": 8, "haiku": 2}); skip null tiers
     - est_rework_tokens: for any seq with a failed/errored entry followed by a retry, sum the retry's (input_chars + output_chars) / 4. 0 if no retries occurred.
     - rework_pct: est_rework_tokens / (est_input_tokens + est_output_tokens) * 100, rounded to 1 decimal
     - active_work_m: from existing metrics log computation (overlapping parallel intervals merged)
     - wall_clock_m: from existing duration computation
   - Store the result as metrics.efficiency in the signal entry.
c. Append as a single JSON line to ~/.claude/projects/<hash>/memory/chronicle/signals.jsonl
d. If the file or directory doesn't exist, create it
e. This step does NOT require redaction: signals contain no prompt content,
task descriptions, or secrets. Only operational facts.

Example signal (with efficiency):
{"v":1,"ts":"2026-03-25T10:00:00Z","skill":"build","outcome":"success","duration_m":42,"branch":"feat/auth-refactor","files_touched":["src/auth/token.ts","src/auth/refresh.ts"],"metrics":{"mode":"feature","tasks":5,"tasks_passed":5,"qg_rounds":3,"review_rounds":2,"stagnation":false,"efficiency":{"total_input_chars":128400,"total_output_chars":82000,"est_input_tokens":32100,"est_output_tokens":20500,"est_rework_tokens":4200,"rework_pct":8.0,"dispatches_by_tier":{"opus":5,"sonnet":8,"haiku":2},"active_work_m":28,"wall_clock_m":42}}}
Example signal (without efficiency — pre-enrichment or no manifest data):
{"v":1,"ts":"2026-03-25T10:00:00Z","skill":"build","outcome":"success","duration_m":42,"branch":"feat/auth-refactor","files_touched":["src/auth/token.ts","src/auth/refresh.ts"],"metrics":{"mode":"feature","tasks":5,"tasks_passed":5,"qg_rounds":3,"review_rounds":2,"stagnation":false}}
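The efficiency arithmetic from step (b) can be sketched as follows. The manifest schema is not shown in this document, so the `seq`, `status`, `input_chars`, `output_chars`, and `model_tier` field names on each manifest entry are assumptions, as is the rule that any entry following a failed/errored entry with the same seq counts as a retry. active_work_m and wall_clock_m come from existing computations and are left out here.

```python
from collections import Counter
from typing import Optional


def compute_efficiency(entries: list) -> Optional[dict]:
    """Return the efficiency sub-object, or None for a pre-enrichment manifest."""
    if not any(e.get("input_chars") is not None for e in entries):
        return None  # omit the sub-object entirely: no zeros, no nulls

    total_in = sum(e["input_chars"] for e in entries if e.get("input_chars") is not None)
    total_out = sum(e["output_chars"] for e in entries if e.get("output_chars") is not None)
    # Note: Python's round() rounds exact .5 ties to the nearest even integer.
    est_in = round(total_in / 4)
    est_out = round(total_out / 4)

    tiers = Counter(e["model_tier"] for e in entries if e.get("model_tier") is not None)

    # Any entry that follows a failed/errored entry with the same seq is a retry.
    failed_seqs = set()
    rework_chars = 0
    for e in entries:
        seq = e.get("seq")
        if seq in failed_seqs:
            rework_chars += (e.get("input_chars") or 0) + (e.get("output_chars") or 0)
        if e.get("status") in ("failed", "errored"):
            failed_seqs.add(seq)
    est_rework = round(rework_chars / 4)

    total_tokens = est_in + est_out
    rework_pct = round(est_rework / total_tokens * 100, 1) if total_tokens else 0.0

    return {
        "total_input_chars": total_in,
        "total_output_chars": total_out,
        "est_input_tokens": est_in,
        "est_output_tokens": est_out,
        "est_rework_tokens": est_rework,
        "rework_pct": rework_pct,
        "dispatches_by_tier": dict(tiers),
    }
```

As a check against the example signal above: 128400 input characters give 32100 estimated tokens, 82000 output characters give 20500, and 4200 rework tokens out of 52600 total round to a rework_pct of 8.0.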
Metrics bag by skill:
| Skill | Metrics |
|---|---|
| build | mode, tasks, tasks_passed, qg_rounds, review_rounds, stagnation |
| debugging | hypotheses, root_cause_category, where_else_hits |
| quality-gate | artifact_type, rounds, fatals_found, stagnation |
| design | questions_investigated, auto_resolved |
| planning | task_count, review_rounds |
| audit | findings_count, lenses_dispatched |
| code-review | rounds, findings_by_severity |
| TDD | cycles, red_green_refactor_count |
| all skills | efficiency (optional sub-object, present only when enriched manifest data exists) |
Signal scope rule: Emit one signal per top-level skill invocation, not per sub-skill dispatch. When build calls quality-gate internally, quality-gate does NOT emit its own signal — its metrics are captured in build's metrics bag. Standalone invocations of quality-gate, code-review, etc. DO emit signals.
This is self-enforcing: forge retrospective only runs at the end of a top-level skill invocation, so Step 8.5 naturally fires once per top-level skill. Sub-skills called within build do not trigger their own forge retrospective.
9. Skill extraction check (all sessions): Evaluate the just-produced
retrospective entry against the following trigger heuristics. If ANY
trigger fires, dispatch a Skill Extraction Analyst subagent (Sonnet)
using ./extraction-analyst-prompt.md.
Trigger heuristics (ANY fires = dispatch analyst):
Dispatch input: retrospective entry, execution summary, existing skill names/descriptions, existing proposals in skill-proposals/ and mutation-proposals/.
Handle output:
source: extraction tag, update patterns.md entry with status: proposed

This step is RECOMMENDED, not REQUIRED. Failure does not break the retrospective. If the analyst cannot determine skill-worthiness, record the pattern and move on.
patterns.md (create if first retrospective)

If total retrospective count >= 10 AND any deviation type has 3+ occurrences, suggest to user:
"Forge has accumulated enough data for skill improvement proposals. Would you like to run mutation analysis?"
If a skill extraction proposal was generated in step 9, notify the user:
"Forge detected a skill-worthy workflow: [proposed skill name or extension target]. Proposal written to [path]. When you're ready, you can use skill-creator with this proposal as a starting point."
Do NOT prompt for immediate action. The notification is informational. The user decides when (or whether) to act on it.
Forge retrospective is RECOMMENDED but not REQUIRED. When a significant skill completes but no retrospective is triggered (user declines, session ending, quick task), trajectory data would be lost.
To handle this, any skill that completes a significant task SHOULD write a minimal trajectory entry if:
The minimal entry uses deviation_type: "unknown", tags: [], and
outcome based on the completion signal alone (success if the skill reported
success, failure if it reported failure, partial otherwise). The entry still
goes through the full redaction pass.
This ensures trajectory data is captured even when forge does not run, at the
cost of less-rich analytical fields. The skill-creator's eval generation pipeline
handles entries with deviation_type: "unknown" by clustering on execution
metrics alone.
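A minimal fallback entry might be built as in the sketch below. The function names and the `reported_status` argument are hypothetical; only the deviation_type, tags, and outcome rules come from the text above. The entry is returned with redacted set to false because it still has to go through the full redaction pass before being written.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def outcome_from_completion(reported_status: str) -> str:
    """Map the skill's own completion signal to success/failure/partial."""
    if reported_status in ("success", "failure"):
        return reported_status
    return "partial"


def minimal_trajectory_entry(skill: str, reported_status: str, prompt: str, completed: bool) -> dict:
    """Build a bare trajectory entry for sessions where no retrospective ran."""
    return {
        "trajectory_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "skill": skill,
        "completed": completed,
        "outcome": outcome_from_completion(reported_status),
        "deviation_type": "unknown",  # no retrospective analysis available
        "tags": [],
        "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "redacted": False,  # flipped to True only by the redaction pass
    }
```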
Similarly, any skill that completes a significant task SHOULD append a minimal
chronicle signal if no forge retrospective is expected to run. The minimal signal
uses outcome from the skill's own completion status, files_touched from
git diff --name-only, and whatever metrics are available in context. Chronicle
signals require no redaction (they contain no prompt content), so the fallback
path is simpler than trajectory fallback. This ensures chronicle data is captured
even when forge does not run.
Feed-forward runs before crucible:design, crucible:planning, or crucible:build begins its core work.
Check whether ~/.claude/projects/<project-hash>/memory/forge/patterns.md exists
Read patterns.md (under 200 lines, safe for context)
3.5. Chronicle context (always-on):
a. Check if ~/.claude/projects/<hash>/memory/chronicle/signals.jsonl exists
b. If not found: skip (cold start — no chronicle data yet)
c. If found: compare signals.jsonl mtime with chronicle/summary.md mtime
- If summary.md doesn't exist OR signals.jsonl is newer: regenerate summary.md
- Regeneration: Read all signals from signals.jsonl, compute:
- Hotspots: Group files_touched by cartographer module (if module maps exist
in memory/cartographer/modules/) or by directory prefix. A module qualifies as
a hotspot when it has 3+ signals with friction indicators (stagnation=true,
metrics.qg_rounds>2, skill="debugging", or outcome="failure"/"stagnation").
Show top 5 hotspots sorted by signal count.
- Skill Performance: Aggregate runs, avg duration, avg QG rounds, stagnation
rate, success rate per skill. Cap at 8 rows.
- Trends: Compare last 10 signals vs prior 10 for key metrics.
- Recent Friction: Last 5 signals with friction indicators.
- Hard cap at 100 lines — drop Trends and Recent Friction sections first if needed.
- Write regenerated summary to chronicle/summary.md
d. Load chronicle/summary.md into context alongside patterns.md
e. Pass both to the Feed-Forward Advisor in Step 4
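The hotspot computation in step 3.5(c) can be sketched as below, using the directory-prefix fallback rather than cartographer module maps. Treating "directory prefix" as the immediate parent directory, and counting each signal at most once per directory, are assumptions; the friction indicators and the 3-signal and top-5 thresholds are from the text above.

```python
from collections import Counter
from pathlib import PurePosixPath


def has_friction(signal: dict) -> bool:
    """True if the signal shows any of the friction indicators."""
    metrics = signal.get("metrics") or {}
    return (
        metrics.get("stagnation") is True
        or (metrics.get("qg_rounds") or 0) > 2
        or signal.get("skill") == "debugging"
        or signal.get("outcome") in ("failure", "stagnation")
    )


def top_hotspots(signals: list, min_signals: int = 3, limit: int = 5) -> list:
    """Return up to `limit` (directory, friction_signal_count) pairs, busiest first."""
    counts = Counter()
    for signal in signals:
        if not has_friction(signal):
            continue
        # A signal counts once per directory, however many files it touched there.
        dirs = {str(PurePosixPath(p).parent) for p in signal.get("files_touched", [])}
        counts.update(dirs)
    hot = [(d, n) for d, n in counts.items() if n >= min_signals]
    hot.sort(key=lambda pair: (-pair[1], pair[0]))
    return hot[:limit]
```

Sorting by count and then by name keeps the output stable between regenerations, which matters because summary.md is rewritten whenever signals.jsonl is newer.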
3.7. Dead-end context (if cartographer data exists):
a. Check if ~/.claude/projects/<project-hash>/memory/cartographer/landmines.md exists
b. If not found: skip (no dead-end data yet)
c. If found: identify the upcoming task's target file paths from the task description. Resolve each to a cartographer module via Path: prefix matching (same logic as Cartographer Mode 3 Load step 7). Scan landmines.md for entries with file paths resolving to the same modules.
d. If 0 matching entries: skip
e. If 1+ matching entries: extract the matching entries (both source: qg and source: debugging). Pass to the Feed-Forward Advisor in Step 4 under the "Dead-End Context" section.
Note: Forge scans landmines.md directly rather than routing through Cartographer Mode 3 Load to avoid coupling, so feed-forward works even when no Cartographer consult runs in the current session.
4. Dispatch the Feed-Forward Advisor subagent (Sonnet) using ./feed-forward-prompt.md
4b. Trajectory context (if trajectory capture is enabled):
Also read ~/.claude/projects/<hash>/memory/trajectories/failed_trajectories.jsonl
and extract the 5 most recent failure entries for the upcoming skill type.
Pass these to the Feed-Forward Advisor alongside patterns.md.
The advisor can surface trajectory-specific warnings like:
Mutation analysis runs when patterns.md shows 10+ total retrospectives AND recurring patterns (3+ occurrences of the same deviation type). It can also be invoked manually.
1. Read patterns.md and ALL individual retrospective files in retrospectives/
2. Dispatch the mutation analyst subagent (Opus) using ./mutation-proposal-prompt.md
3. Write the proposal to ~/.claude/projects/<project-hash>/memory/forge/mutation-proposals/YYYY-MM-DD-<topic>.md

NEVER AUTO-MODIFY SKILLS. PROPOSALS ONLY.
The Forge produces proposals for human review. It does not edit skill files. It does not dispatch subagents to edit skill files. It does not suggest "just making this small change." Every mutation requires explicit human approval.
| Calling Skill | Mode | When | What to Pass |
|---|---|---|---|
| crucible:build | Feed-Forward | Phase 1 start | Feature description |
| crucible:build | Retrospective | Phase 4, after red-team, before finishing | Full build summary |
| crucible:debugging | Retrospective | After fix verified | Bug description + hypothesis log |
| crucible:debugging | Retrospective (diagnostic extraction) | After fix verified | Session artifacts → cartographer landmines with dead_ends + diagnostic_path |
| crucible:finish | Retrospective | After Step 3, before Step 4 | Branch summary + review findings |
| crucible:design | Feed-Forward | Before first question | Topic description |
| crucible:build | Retrospective (decision extraction) | After fix verified | Decision journal + task list → cartographer decisions via recorder |
| Any skill | Trajectory Record | After retrospective step 7 | Execution data + retrospective output (opt-in only) |
| Any skill | Chronicle Signal | After retrospective step 8 | Execution metrics (always-on) |
Forge is RECOMMENDED, not REQUIRED. It is a learning accelerator, not a quality gate. Skipping it does not produce broken output — it misses an opportunity to learn.
Skill extraction is an internal step within Mode 1 (Retrospective). It does not require any calling skill to pass additional data -- the retrospective entry itself provides the input. The extraction analyst may read existing skill descriptions from the skill directories to check for overlap.
| Mode | Trigger | Model | Template | Output |
|---|---|---|---|---|
| Retrospective | Task completes | Sonnet | retrospective-prompt.md | Entry file + patterns.md update |
| Feed-Forward | Task begins | Sonnet | feed-forward-prompt.md | 3-5 targeted warnings |
| Mutation | 10+ retros + manual | Opus | mutation-proposal-prompt.md | Proposal doc for human review |
Never:
- Let patterns.md exceed 200 lines

Always:
- Set redacted: true only after the redaction pass completes

| Excuse | Reality |
|---|---|
| "Task was too simple for a retrospective" | Simple tasks reveal patterns too. 2 minutes max. |
| "No time for retrospective" | Retrospective prevents the NEXT task from repeating the mistake. |
| "Feed-forward data is stale" | Prune mechanism handles staleness. Read it anyway. |
| "Mutation proposal is obviously correct, just apply it" | Iron Law: proposals only. Humans decide. |
| "Only one data point, feed-forward is useless" | Even one warning is better than none. Report limited data. |
| "I'll run the retrospective later" | Later never comes. Run it now, while context is fresh. |
| "I already know what went wrong" | Knowing is not recording. Write it down so FUTURE sessions know too. |
| "This workflow is obviously worth a skill, just create it" | Iron Law: proposals only. Humans decide. Even the best-looking workflow may be a one-off. |
| "Every session has a skill-worthy pattern" | If >30% of retrospectives trigger proposals, the heuristics are too loose. Tighten them. |
| "The proposal is low-confidence, not worth writing" | Low-confidence proposals are seeds. They become medium when they recur. Write them down. |
| "Trajectory data is too noisy to be useful" | Even noisy data reveals patterns at scale. 10 failed trajectories with the same deviation type IS a signal. |
| "I'll enable trajectory capture later" | Later means no data for the current project. Enable it now if you want eval generation from real usage. |
| "The redaction pass is too conservative" | Conservative redaction protects the user. A missed eval scenario is cheaper than a leaked secret. |
| "This task is too small to record" | Small tasks reveal patterns too. The trajectory entry is one JSON line — the cost is negligible. |
Bloating patterns.md
Skipping feed-forward on cold start
Treating warnings as requirements
Running mutation analysis too early
Proposing skills for domain-specific workflows
Ignoring extraction proposals
Duplicate proposals from extraction and mutation
- ./retrospective-prompt.md: Post-task retrospective analyst dispatch
- ./feed-forward-prompt.md: Pre-task feed-forward advisor dispatch
- ./mutation-proposal-prompt.md: Skill mutation analyst dispatch
- ./diagnostic-extraction-prompt.md: Debugging session diagnostic pattern extraction dispatch
- ./extraction-analyst-prompt.md: Skill-worthy workflow detection and proposal generation dispatch