crucible

What survives the test, ships — evidence-gated execution for Claude Code.

Live site: crucible.withagents.dev
Field journal entry: (coming)

A Claude Code plugin that converts task execution into a scientific procedure. Every change-producing run produces a reproducible evidence package, and completion is forbidden unless every Mandatory Success Criterion is backed by an inspectable artifact and a quorum of independent Oracles approves.
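The completion rule described above can be sketched as a small predicate. This is a minimal illustration, not the plugin's `gate.py`: the `Criterion` type, the `completion_allowed` function, the reading of "inspectable" as "file exists and is not empty", and the `quorum` parameter are all assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Tuple


@dataclass
class Criterion:
    """One Mandatory Success Criterion and the artifact offered as proof (hypothetical type)."""
    name: str
    artifact: Optional[str]  # path relative to the evidence root, or None if nothing was produced


def completion_allowed(
    criteria: List[Criterion],
    oracle_verdicts: List[str],
    quorum: int,
    evidence_root: Path,
) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons); reasons lists every unmet condition."""
    reasons: List[str] = []
    for c in criteria:
        path = evidence_root / c.artifact if c.artifact else None
        # Assumption: "inspectable" means the file exists and is not empty.
        if path is None or not path.is_file() or path.stat().st_size == 0:
            reasons.append(f"criterion {c.name!r} has no inspectable artifact")
    approvals = sum(1 for v in oracle_verdicts if v == "APPROVE")
    # Assumption: the quorum threshold is a caller-supplied number of approvals.
    if approvals < quorum:
        reasons.append(f"oracle quorum not met: {approvals}/{quorum} approvals")
    return (not reasons, reasons)
```

Returning the full list of reasons, rather than stopping at the first failure, mirrors the idea that a refusal should say everything that still needs fixing.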

In one sentence: Crucible is the gate between "I did the work" and "the work is done."

What's in the box

| Component | Count |
| --- | --- |
| Slash commands (`/crucible:*`) | 19, in three tiers: orchestration, authoring, inspection |
| Skills | 12: codebase-analysis, docs-research, planning, skill-enrichment (NEW v0.4), validation, evidence-indexing, session-log-audit, oracle-review, completion-gate, enable, disable, setup |
| Subagents | 11: planner, codebase-analyst, docs-researcher, skill-discoverer (NEW v0.4), validator, 3 reviewers, 3 oracles |
| Hooks | 4: SessionStart, PreToolUse, PostToolUse, Stop |
| Bin scripts | 4: hook handlers (read JSON stdin, exit 2 to block) |
| Setup scripts | 2: CLAUDE.md installer + progress tracker |
| Skill scripts | 3: `gate.py` (completion gate), `build_index.py` (evidence indexer), `discover_skills.py` (NEW v0.4: skill-enrichment) |
| Rule templates | 4: Iron-Rule, Cite-or-Refuse, Cite-Paths, No-Self-Review |

Iron-Rule violations: 0. Crucible was itself built under its own discipline — the build evidence package lives in ../evidence/ of this repo.

Why it exists

LLM-driven engineering systems routinely declare success without proof. They claim a feature works because the code looks right; they claim a refactor is safe because it compiles; they emit "Done!" while leaving silent test failures and missing migrations behind. This isn't adversarial — it's the default failure mode of a context-bounded system trained to produce coherent text. Coherent text is not evidence.

Crucible removes the option to fake completion. Three moves at the plugin layer:

Hooks watch every tool use. PreToolUse rejects writes to test files, mocks, stubs, fixtures. Stop refuses session end unless evidence/completion-gate/report.json shows overall=COMPLETE.
Verdicts cite paths or are invalid. Every PASS / FAIL / APPROVE / BLOCK must point to a specific file (and ideally line range). Prose isn't a citation.
Independence is structural, not advisory. The agent that produced an artifact may not also approve it. Three reviewers in isolation. Three Oracles in isolation. The synthesizer aggregates raw verdicts; it never rewrites them.
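To make the first move concrete, here is a rough sketch of what a PreToolUse handler of this kind could look like. It is not the plugin's actual bin script. The protected directory names, the filename pattern, and the set of write tools are hypothetical, and the `tool_name` and `tool_input.file_path` fields are assumed from the hook payload Claude Code passes on stdin; only "read JSON stdin, exit 2 to block" comes from this README.

```python
import json
import re
import sys
from pathlib import PurePosixPath
from typing import Any, Dict

# Hypothetical lists: the real plugin's patterns are not shown in this README.
WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}
PROTECTED_DIRS = {"test", "tests", "__tests__", "mock", "mocks", "__mocks__",
                  "stub", "stubs", "fixture", "fixtures"}
TEST_NAME = re.compile(r"(^test_)|([._](test|spec)\.[a-z0-9]+$)")


def is_protected(path: str) -> bool:
    """True if the path sits in a protected directory or looks like a test file."""
    parts = [p.lower() for p in PurePosixPath(path.replace("\\", "/")).parts]
    if not parts:
        return False
    if any(p in PROTECTED_DIRS for p in parts[:-1]):
        return True
    return TEST_NAME.search(parts[-1]) is not None


def decide(payload: Dict[str, Any]) -> int:
    """Map a hook payload to an exit code: 2 blocks the tool call, 0 allows it."""
    if payload.get("tool_name") not in WRITE_TOOLS:
        return 0
    file_path = str((payload.get("tool_input") or {}).get("file_path", ""))
    return 2 if is_protected(file_path) else 0


def main() -> int:
    """Read the hook payload from stdin and report the reason on stderr when blocking."""
    code = decide(json.load(sys.stdin))
    if code == 2:
        print("blocked: writes to tests, mocks, stubs, and fixtures are not allowed",
              file=sys.stderr)
    return code
```

A handler script built this way would end with `sys.exit(main())`. Keeping `decide` free of I/O makes the blocking rule easy to check in isolation.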

When Crucible says COMPLETE, an outside reviewer with only evidence/ can independently verify. When it refuses, the refusal is structured, machine-readable, and remediable.
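An outside check of that kind might start from the gate report alone. The sketch below assumes that `evidence/completion-gate/report.json` is a JSON object with a top-level `overall` key, as the `overall=COMPLETE` wording above suggests; the function name `stop_exit_code` and the choice to treat a missing or unreadable report as a refusal are assumptions for illustration.

```python
import json
from pathlib import Path

# Path taken from the README; the file's full schema is not documented here.
REPORT = Path("evidence") / "completion-gate" / "report.json"


def stop_exit_code(project_root: Path) -> int:
    """Return 0 if the gate report says COMPLETE, otherwise 2 (refuse)."""
    report_path = project_root / REPORT
    try:
        report = json.loads(report_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Assumption: no report, or an unparseable one, counts as "not complete".
        return 2
    if isinstance(report, dict) and report.get("overall") == "COMPLETE":
        return 0
    return 2
```

Failing closed on a missing report is the conservative choice here: absence of evidence is treated the same as a failed gate.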

Quick start

# 1. install (once per machine)
claude plugin marketplace add krzemienski/crucible
claude plugin install crucible@crucible-local

# 2. set up (once per project; run /crucible:* commands inside a Claude Code session)
cd my-project
/crucible:setup --local

# 3. work
/crucible:forge "Add /healthz endpoint that returns {status:ok}"

If /crucible:forge refuses:

/crucible:remediate          # auto-generates delta plan from REFUSAL.md
/crucible:forge              # retry

Or use /crucible:autopilot <task> to run the forge → remediate → forge loop automatically, for up to 3 attempts.
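The retry loop can be pictured roughly as follows. This is a sketch of the control flow only: `forge` and `remediate` are hypothetical callables standing in for the two commands, and reading "up to 3 attempts" as "at most three forge runs" is an assumption.

```python
from typing import Callable


def autopilot(
    forge: Callable[[], bool],
    remediate: Callable[[], None],
    max_attempts: int = 3,
) -> bool:
    """Run forge until it succeeds or attempts run out; remediate between failures."""
    for attempt in range(1, max_attempts + 1):
        if forge():
            return True
        # No point remediating after the final attempt: nothing would retry it.
        if attempt < max_attempts:
            remediate()
    return False
```

Under this reading, three failed forge runs trigger two remediation passes and then a final refusal.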

If you're stuck and need out:

/crucible:disable             # clean opt-out
touch .crucible/disabled      # nuclear opt-out
CRUCIBLE_DISABLE=1 claude     # one-shell escape
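For the last two escape hatches, a hook handler could plausibly begin with a check like the one below. The marker file `.crucible/disabled` and the `CRUCIBLE_DISABLE=1` variable come from the commands above; the function name `is_disabled` and the exact comparison against `"1"` are assumptions, and what `/crucible:disable` does internally is not covered here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def is_disabled(project_root: Path, env: Optional[Mapping[str, str]] = None) -> bool:
    """True if either opt-out signal is present for this project or shell."""
    env = os.environ if env is None else env
    # One-shell escape: the variable only affects the process it was set for.
    if env.get("CRUCIBLE_DISABLE") == "1":
        return True
    # Marker-file opt-out: persists until the file is removed.
    return (project_root / ".crucible" / "disabled").exists()
```

A handler that sees `is_disabled(...)` return true would exit 0 immediately and let the tool call through.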

Documentation

| Doc | When to read it |
| --- | --- |
| `docs/OVERVIEW.md` | The conceptual reference: architecture, philosophy, evidence model, gate sequence, quorum mechanics, refusal protocol |
| `docs/USAGE.md` | Per-command reference (all 19), per-skill reference (all 12), per-subagent reference (all 11), three worked walkthroughs, refusal recovery playbook, FAQ |
| `docs/CRUCIBLE-CLAUDE-MD.md` | The canonical CLAUDE.md fragment that `/crucible:setup` installs |
| `INSTALL.md` | Three install paths, prerequisites, troubleshooting, activation lifecycle |
| `CHANGELOG.md` | Release history (v0.1.0 → v0.4.0) |

For "what does X actually do?" questions, run:

/crucible:explain forge          # DAG of any pipeline
/crucible:doctor                 # 9-check installation health
/crucible:status                 # current gate state

Command tiers (at a glance)

Tier 1 — Orchestration (the conductors)

/crucible:forge · /crucible:autopilot · /crucible:remediate · /crucible:resume · /crucible:trial
