Part of the crucible plugin. Evaluates competing approaches against codebase constraints using recon briefs or caller context. Returns structured recommendations with confidence scoring, kill criteria, and evidence-grounded citations.

Install: `npx claudepluginhub raddue/crucible`

This skill uses the workspace's default tool permissions.
<!-- CANONICAL: shared/dispatch-convention.md -->
All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.
Evaluate competing approaches against codebase constraints. Returns a structured Assay Report with a recommendation, alternatives with kill criteria, and confidence scoring. Evidence-grounded — recommendations cite specific file:line references, not generic best practices.
Skill type: Rigid — follow exactly, no shortcuts.
Models: Opus (the evaluation runs as a single Opus agent; see the dispatch step below).
Announce at start: "I'm using the assay skill to evaluate competing approaches."
Name origin: In metallurgy, an assay tests raw material to determine its quality and composition before committing it to the forge.
Invocation:

```
/assay
  question: "How should the auth middleware handle token refresh?"
  context: { ... }
  decision_type: "architecture"
  approaches: [...]
  cascading_decisions: [...]
```
question (required) — The decision or question to evaluate. One clear sentence.
context (required) — Evidence for the evaluator to reason against. Accepts different shapes depending on the caller:
| Caller | Context Shape | Key Fields |
|---|---|---|
| /design | Recon brief + agent findings | project_structure, existing_patterns, scope_boundaries, prior_art |
| /spec | Recon brief + agent findings (autonomous) | project_structure, existing_patterns, scope_boundaries, prior_art |
| /migrate | Recon brief + migration analysis | project_structure, migration_target, breaking_changes, blast_radius |
| Generic caller | Freeform evidence | description (string) — unstructured context, lower confidence |
When context contains unrecognized keys, the evaluator treats them as additional evidence. When context is a bare string, treat it as { "description": context } (see the normalization sketch after this list).
decision_type (optional) — architecture | strategy | diagnosis | optimization. Auto-detected from the question if omitted. Defaults to architecture when ambiguous.
approaches (optional) — Array of { name, description } candidates to evaluate. When omitted, the evaluator generates 2-4 candidates from the question and context.
cascading_decisions (optional) — Array of { decision, reasoning } representing prior decisions. Treated as hard constraints — the evaluator cannot modify or challenge them. Conflicts are reported in prior_decision_conflicts.
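The shape rules above can be summarized in a short normalization sketch; `AssayInput` and `normalizeInput` are hypothetical names for illustration, not part of the skill itself:

```ts
// Minimal sketch of the input normalization rules above; names are
// illustrative, not the skill's actual implementation.
type Approach = { name: string; description: string };

type AssayInput = {
  question: string;
  context: Record<string, unknown> | string;
  decision_type?: "architecture" | "strategy" | "diagnosis" | "optimization";
  approaches?: Approach[];
  cascading_decisions?: { decision: string; reasoning: string }[];
};

function normalizeInput(input: AssayInput) {
  // A bare-string context becomes { description: context } (lower confidence).
  const context =
    typeof input.context === "string" ? { description: input.context } : input.context;

  // An omitted decision_type is auto-detected downstream from the question.
  const decisionType = input.decision_type ?? "auto-detect";

  // Fewer than 2 usable approaches: drop them and let the evaluator
  // generate 2-4 candidates itself (see Failure handling).
  const approaches =
    input.approaches && input.approaches.length >= 2 ? input.approaches : undefined;

  return { question: input.question, context, decisionType, approaches };
}
```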
Before dispatch, validate the inputs:
- question is present and non-empty
- context is present (object or string)
- If decision_type is provided, validate it's one of the 4 recognized values
- If approaches is provided, verify it's an array with at least 2 entries, each having name and description

Dispatch a single Opus agent using skills/assay/assay-evaluator-prompt.md.
Fill template placeholders before writing the dispatch file:
- {{QUESTION}} — the decision question
- {{CONTEXT}} — the full context object/string
- {{DECISION_TYPE}} — the decision type (provided or "auto-detect")
- {{APPROACHES}} — the approaches array (or "Generate 2-4 candidates")
- {{CASCADING_DECISIONS}} — cascading decisions array (or "None")
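As a sketch of what that filling step might look like (the helper itself is an assumption; only the placeholder names and fallback values come from the list above):

```ts
// Hypothetical helper that fills the dispatch template before it is written
// to disk. Placeholder names and defaults mirror the list above.
import { readFileSync } from "node:fs";

function fillDispatchTemplate(input: {
  question: string;
  context: unknown;
  decisionType?: string;
  approaches?: unknown[];
  cascadingDecisions?: unknown[];
}): string {
  const template = readFileSync("skills/assay/assay-evaluator-prompt.md", "utf8");
  return template
    .replace("{{QUESTION}}", input.question)
    .replace("{{CONTEXT}}", JSON.stringify(input.context, null, 2))
    .replace("{{DECISION_TYPE}}", input.decisionType ?? "auto-detect")
    .replace(
      "{{APPROACHES}}",
      input.approaches ? JSON.stringify(input.approaches, null, 2) : "Generate 2-4 candidates"
    )
    .replace(
      "{{CASCADING_DECISIONS}}",
      input.cascadingDecisions ? JSON.stringify(input.cascadingDecisions, null, 2) : "None"
    );
}
```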
Parse the evaluator's response as JSON. Validate:
- Top-level keys: decision_type, confidence, missing_information, recommended, alternatives, prior_decision_conflicts
- recommended has: name, rationale, evidence, risks, kill_criteria, constraint_fit
- Each alternative has: name, constraint_fit, pros, cons, would_recommend_if
- constraint_fit objects have: pattern_alignment, scope_fit, reversibility, integration_risk
- confidence is one of: high, medium, low

On validation failure: Retry once with the validation errors as feedback. On second failure, return:
{ "error": "Evaluator produced invalid output after retry", "raw_output": "..." }
Return the validated Assay Report to the caller.
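A minimal sketch of that shape check (illustrative; the actual validator may differ):

```ts
// Illustrative shape check for the evaluator's response. Returns a list of
// validation errors; an empty list means the report passed.
function validateAssayReport(raw: string): string[] {
  const errors: string[] = [];
  let report: Record<string, unknown>;
  try {
    report = JSON.parse(raw);
  } catch {
    return ["response is not valid JSON"];
  }
  const topLevel = [
    "decision_type", "confidence", "missing_information",
    "recommended", "alternatives", "prior_decision_conflicts",
  ];
  for (const key of topLevel) {
    if (!(key in report)) errors.push(`missing top-level key: ${key}`);
  }
  if (!["high", "medium", "low"].includes(report.confidence as string)) {
    errors.push("confidence must be one of: high, medium, low");
  }
  // On failure, these errors are fed back to the evaluator for one retry;
  // a second failure returns the structured error object shown above.
  return errors;
}
```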
The evaluator adapts scoring weights based on decision type:
| Type | Primary Weight | Secondary Weight |
|---|---|---|
| architecture | Reversibility, constraint fit | Long-term cost, extensibility |
| strategy | Risk, phasing | Blast radius, team capacity |
| diagnosis | Evidence strength, testability | Explanation coverage, simplicity |
| optimization | Measurable improvement | Disruption cost, reversibility |
Example Assay Report:

```json
{
  "decision_type": "architecture",
  "confidence": "high",
  "missing_information": [],
  "recommended": {
    "name": "Event-driven via message bus",
    "rationale": "Aligns with existing src/events/bus.ts pattern...",
    "evidence": ["src/events/bus.ts:14 — existing event dispatch"],
    "risks": ["Adds async complexity to currently synchronous flow"],
    "kill_criteria": "Switch away if latency requirements exceed 50ms p99",
    "constraint_fit": {
      "pattern_alignment": "high",
      "scope_fit": "high",
      "reversibility": "two-way door",
      "integration_risk": "low"
    }
  },
  "alternatives": [
    {
      "name": "Direct service calls",
      "constraint_fit": {
        "pattern_alignment": "medium",
        "scope_fit": "high",
        "reversibility": "one-way door",
        "integration_risk": "medium"
      },
      "pros": ["Simpler mental model", "Synchronous"],
      "cons": ["Tight coupling", "Requires shared deployment"],
      "would_recommend_if": "Latency is critical or team prefers simplicity"
    }
  ],
  "prior_decision_conflicts": []
}
```
| Level | Criteria |
|---|---|
| high | One approach clearly dominates on all weighted dimensions |
| medium | Two viable options with trade-offs that depend on priority |
| low | Need more information — missing_information lists what would help |
Every recommendation must cite specific evidence from the context:
"This is the industry standard approach" is NOT evidence. "This aligns with how src/api/routes/users.ts already handles it" IS evidence.
Without a recon brief, evidence cites the caller's context. Confidence scores skew lower.
Two output fields make decisions revisitable without re-running the full analysis:
- kill_criteria on the recommended approach: the condition that would flip the recommendation
- would_recommend_if on each alternative: the condition that would make it the recommendation
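A sketch of how a caller might lean on these fields later. The consumer is hypothetical, and both conditions are prose judged by a human rather than machine-evaluated:

```ts
// Hypothetical consumer of a stored Assay Report. kill_criteria and
// would_recommend_if are prose conditions, so this sketch only surfaces
// them for a human to judge instead of evaluating them automatically.
type StoredReport = {
  recommended: { name: string; kill_criteria: string };
  alternatives: { name: string; would_recommend_if: string }[];
};

function revisit(report: StoredReport, killCriterionMet: boolean): void {
  if (!killCriterionMet) {
    console.log(`Keep: ${report.recommended.name}`);
    return;
  }
  // The recommendation's kill criterion tripped: check which alternative's
  // condition now holds instead of re-running the full evaluation.
  console.log(`Kill criterion met: ${report.recommended.kill_criteria}`);
  for (const alt of report.alternatives) {
    console.log(`Consider ${alt.name} if: ${alt.would_recommend_if}`);
  }
}
```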
| Failure | Behavior |
|---|---|
| Missing question or context | Return error immediately — no dispatch |
| Evaluator returns invalid JSON | Retry once with validation errors. Second failure returns { "error": ... } |
| Evaluator timeout | Return { "error": "Evaluator timed out" } |
| Invalid decision_type | Warn and default to architecture |
| approaches has fewer than 2 entries | Ignore provided approaches, let evaluator generate candidates |
| Skill | Decision Type | Context Source | Approaches |
|---|---|---|---|
| /design | architecture | Recon brief + cascading decisions | Evaluator generates |
| /spec | architecture | Recon brief + cascading decisions (autonomous — confidence routing) | Evaluator generates |
| /migrate | strategy | Recon brief + migration analysis | Evaluator generates |
Not called by (investigated, not a fit): /debugging (hypothesis evaluation uses quality-gate, not assay), /prospector (competing design evaluation is more sophisticated than assay for this use case). See #147 for rationale.
From /design:

```
/assay
  question: "How should components communicate in the new auth module?"
  context: { recon brief with project_structure, existing_patterns }
  decision_type: "architecture"
  cascading_decisions: [{ decision: "Using Redis for session store", reasoning: "..." }]
```
From /spec:

```
/assay
  question: "How should the auth middleware handle token refresh?"
  context: { recon brief + investigation findings }
  decision_type: "architecture"
  cascading_decisions: [{ decision: "Using Redis for session store", reasoning: "..." }]
```
Spec consumes assay output autonomously: high confidence = accept, medium = terminal alert, low = block alert.
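A sketch of that routing; the right-hand descriptions are hedged paraphrases of "accept / terminal alert / block alert", not actual skill output:

```ts
// Sketch of /spec's autonomous confidence routing described above.
type Confidence = "high" | "medium" | "low";

const SPEC_ROUTING: Record<Confidence, string> = {
  high: "accept the recommendation and continue",
  medium: "raise a terminal alert",
  low: "raise a blocking alert",
};

// Usage: SPEC_ROUTING[report.confidence]
```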
From /migrate:

```
/assay
  question: "What migration strategy minimizes risk for the React 18→19 upgrade?"
  context: { recon brief + migration_target: "React 19", breaking_changes: [...] }
  decision_type: "strategy"
```
From a generic caller:

```
/assay
  question: "Should we use PostgreSQL or SQLite for this project?"
  context: "Small team, <10K users, read-heavy workload, deployed on single server"
```
See also: skills/assay/assay-evaluator-prompt.md, recon briefs (/recon), and /design's Challenger agent.