By krzemienski
Evidence-gated task planning, execution, and validation for Claude Code. Refuses completion without quorum-approved proof. No mocks. No stubs. No silent retries past the gate.
npx claudepluginhub krzemienski/crucible --plugin crucible
Scaffold a new Crucible subagent at agents/<name>.md with role-appropriate frontmatter and tool grants. Roles: planner, reviewer, oracle, validator, analyst, generic.
Run the Final Oracle evidence audit independently against the current evidence/ tree. Convenes 3 Oracle auditors, computes quorum, writes decision.md. Idempotent — safe to re-run. PRD §1.16.2 CMD-3.
/crucible:forge in a refusal-driven retry loop. Runs forge; on REFUSED, parses REFUSAL.md, invokes /crucible:remediate, re-gates. Stops on COMPLETE or after --max-attempts (default 3). Iron Rule preserved at every iteration.
Scaffold a new Crucible slash command at commands/<name>.md with proper frontmatter (name, description, allowed-tools), pipeline section, and refusal-modes section.
Verify Crucible installation health. Compares the plugin manifest (plugin.json) against the Claude Code plugin record and reports drift. Verifies SDK reachability. Read-only diagnostic. PRD §1.16.2 CMD-5.
Use this subagent to build repo-wide context before any modification in Crucible's planning mode. Activate whenever a planning task starts, whenever the planner subagent needs a module map, or whenever a refactor task requires understanding existing code. Read-only — never modifies source. Always runs before the planner builds the executable plan. Outputs a structured evidence/codebase-analysis/ directory.
Use this subagent to fetch and cite current upstream documentation for every external dependency in scope. Activate before writing code against any SDK/framework/API/CLI, whenever training data may be outdated, or whenever a fact must be sourced rather than recalled. Produces evidence/documentation-research/ with raw markdown sources, ISO-8601 fetch timestamps, and a SUMMARY.md citing 3-5 verified facts per source pointing to local sources/ filenames. Refuses memory-only references.
Use this subagent as the FIRST of at least three Final Oracle auditors in Crucible's quorum-gated final evidence audit (VG-14). Oracle 1's emphasis is COMPLETENESS-AND-CITATION — do reviewer-consensus + every MSC have approved verdicts and citations? Activate when the final evidence audit phase begins. Read-only access to evidence/. Issues APPROVE or BLOCK with cited blockers. Never shares context with Oracle 2 or 3. Quorum requires ≥2 APPROVE.
Use this subagent as the SECOND of at least three Final Oracle auditors in Crucible's quorum-gated final evidence audit (VG-14). Oracle 2's emphasis is STRUCTURAL INTEGRITY — does every directory have README.md + INDEX.md, are gate-receipt files (vg0-* through vg15-*) all present, and does the report.json schema parse? Activate when the final evidence audit phase begins. Read-only access to evidence/. Issues APPROVE or BLOCK with cited blockers. Never shares context with Oracle 1 or 3.
Use this subagent as the THIRD of at least three Final Oracle auditors in Crucible's quorum-gated final evidence audit (VG-14). Oracle 3's emphasis is ADVERSARIAL SKEPTICISM — try to find what a hostile reviewer would point at to BLOCK completion. Activate when the final evidence audit phase begins. Read-only access to evidence/. Issues APPROVE or BLOCK with cited blockers. Never shares context with Oracle 1 or 2. Designed to catch what completeness/integrity audits miss.
Build repo-wide context before any modification. Use this skill whenever starting a planning task in a real codebase, refactoring across multiple files, surveying module boundaries, identifying hot paths, or understanding existing code before changing it. Produces a structured evidence/codebase-analysis/ artifact (file inventory, module map, dependency manifests, hot-path identification). Read-only — never modifies source. Always runs before the planning skill in comprehensive mode.
Evaluate the completion gate — refuse on any missing criterion. Use this skill ONLY when invoked as the final step of a Crucible run, when VG-15 executes, or when a user attempts to claim completion. Reads the entire evidence/ tree, evaluates every Mandatory Success Criterion (MSC-1..MSC-21) against cited evidence, AND requires three-reviewer consensus PASS plus Oracle quorum APPROVED. Emits machine-readable evidence/completion-gate/report.json. Has NO override flag. NO force-complete. Refusal is a feature.
Deactivate Crucible enforcement in the current project. Use this when Crucible's hooks are blocking a session in a project that is not actively running a Crucible workflow, or when you want to step out of enforcement temporarily. This is the explicit opt-out — Crucible's hooks become silent no-ops. Removes the .crucible/active sentinel. Reversible via /crucible:enable. Does not delete any evidence/ artifacts.
Fetch and cite current upstream documentation for every external dependency in scope. Use this skill before writing any code that calls an external SDK, framework, API, or CLI. Use whenever training data might be outdated. Use whenever a fact must be sourced rather than recalled. Produces evidence/documentation-research/ with raw markdown sources, ISO-8601 fetch timestamps, and a SUMMARY.md citing 3-5 verified facts per source pointing to local sources/ filenames. Refuses memory-only references — every fact must cite a sources/ file.
Activate Crucible enforcement in the current project. Use this when you intentionally want Crucible's hooks (PreToolUse, PostToolUse, Stop) to enforce evidence-gated completion. Without activation, Crucible is silent in this project. This is the explicit opt-in step before starting any /crucible:planning or /crucible:validation workflow. Creates a sentinel file at .crucible/active in the project root. Reversible via /crucible:disable. Safe to invoke multiple times — idempotent.
Executes bash commands
Hook triggers when Bash tool is used
Modifies files
Hook triggers on file write and edit operations
What survives the test, ships — evidence-gated execution for Claude Code.
Live site: crucible.withagents.dev
Field journal entry: (coming)
What survives the test, ships.
A Claude Code plugin that converts task execution into a scientific procedure. Every change-producing run produces a reproducible evidence package, and completion is forbidden unless every Mandatory Success Criterion is backed by an inspectable artifact and a quorum of independent Oracles approves.
In one sentence: Crucible is the gate between "I did the work" and "the work is done."
| Component | Count | Contents |
|---|---|---|
| Slash commands (/crucible:*) | 19 | three tiers: orchestration, authoring, inspection |
| Skills | 12 | codebase-analysis, docs-research, planning, skill-enrichment (NEW v0.4), validation, evidence-indexing, session-log-audit, oracle-review, completion-gate, enable, disable, setup |
| Subagents | 11 | planner, codebase-analyst, docs-researcher, skill-discoverer (NEW v0.4), validator, 3 reviewers, 3 oracles |
| Hooks | 4 | SessionStart, PreToolUse, PostToolUse, Stop |
| Bin scripts | 4 | hook handlers (read JSON stdin, exit 2 to block) |
| Setup scripts | 2 | CLAUDE.md installer + progress tracker |
| Skill scripts | 3 | gate.py (completion gate), build_index.py (evidence indexer), discover_skills.py (NEW v0.4: skill-enrichment) |
| Rule templates | 4 | Iron-Rule, Cite-or-Refuse, Cite-Paths, No-Self-Review |
Iron-Rule violations: 0. Crucible was itself built under its own
discipline — the build evidence package lives in ../evidence/ of this repo.
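The hook handlers listed above read a JSON payload on stdin and exit with code 2 to block a tool call. A minimal sketch of what such a PreToolUse handler could look like follows. It assumes the payload carries `tool_name` and `tool_input.file_path`; the names `BLOCKED_PARTS`, `is_blocked_path`, and `decide` are hypothetical, and the path patterns are illustrative guesses, not Crucible's actual block list.

```python
import json
import sys
from pathlib import PurePosixPath

# Hypothetical patterns; the plugin's real block list is not shown in this README.
BLOCKED_PARTS = {"test", "tests", "mocks", "__mocks__", "stubs", "fixtures"}
WRITE_TOOLS = {"Write", "Edit"}


def is_blocked_path(file_path: str) -> bool:
    """Return True if the path looks like a test, mock, stub, or fixture file."""
    path = PurePosixPath(file_path.replace("\\", "/"))
    # Any parent directory named like a test/mock/stub/fixture folder blocks the write.
    if any(part.lower() in BLOCKED_PARTS for part in path.parts[:-1]):
        return True
    name = path.name.lower()
    return name.startswith("test_") or ".test." in name or ".spec." in name


def decide(payload: dict) -> tuple[int, str]:
    """Map a hook payload to (exit_code, message); exit code 2 blocks the call."""
    if payload.get("tool_name") not in WRITE_TOOLS:
        return 0, ""
    file_path = payload.get("tool_input", {}).get("file_path", "")
    if file_path and is_blocked_path(file_path):
        return 2, f"Blocked: {file_path} is a test/mock/stub/fixture path"
    return 0, ""


def main(stdin=sys.stdin, stderr=sys.stderr) -> int:
    """Read one JSON payload from stdin and return the exit code to use."""
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=stderr)
    return code
```

A script wired into a hook would end with `raise SystemExit(main())`. Keeping `decide` free of I/O makes the block/allow rule easy to exercise without a live session.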
LLM-driven engineering systems routinely declare success without proof. They claim a feature works because the code looks right; they claim a refactor is safe because it compiles; they emit "Done!" while leaving silent test failures and missing migrations behind. This isn't adversarial — it's the default failure mode of a context-bounded system trained to produce coherent text. Coherent text is not evidence.
Crucible removes the option to fake completion. Three moves at the plugin layer:
1. Hooks watch every tool use. PreToolUse rejects writes to test files, mocks, stubs, fixtures. Stop refuses session end unless evidence/completion-gate/report.json shows overall=COMPLETE.
2. Verdicts cite paths or are invalid. Every PASS / FAIL / APPROVE / BLOCK must point to a specific file (and ideally line range). Prose isn't a citation.
3. Independence is structural, not advisory. The agent that produced an artifact may not also approve it. Three reviewers in isolation. Three Oracles in isolation. The synthesizer aggregates raw verdicts; it never rewrites them.
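The second and third moves can be sketched as a validity check followed by a quorum count. This is an illustration under stated assumptions: the citation syntax (`path` or `path:L3-L9`), the `Verdict` shape, and the `BLOCKED` label are hypothetical, and the threshold of two approvals is taken from the Oracle descriptions. Whether Crucible discards or rejects an uncited verdict is not specified here; the sketch simply does not count it.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable

# Assumed citation shape: "evidence/x/file.md", optionally followed by ":L10-L20".
CITATION_RE = re.compile(r"^(?P<path>[^:\s]+)(?::L(?P<start>\d+)(?:-L?(?P<end>\d+))?)?$")


@dataclass(frozen=True)
class Verdict:
    author: str                 # who issued the verdict, e.g. "oracle-1"
    decision: str               # "APPROVE" or "BLOCK"
    citations: tuple[str, ...]  # cited paths, each optionally with a line range


def file_exists(path: str) -> bool:
    return Path(path).is_file()


def is_valid(verdict: Verdict, artifact_author: str,
             exists: Callable[[str], bool] = file_exists) -> bool:
    """A verdict counts only if it is not a self-review and cites existing files."""
    if verdict.author == artifact_author:
        return False  # the producer of an artifact may not approve it
    if not verdict.citations:
        return False  # prose alone is not a citation
    for citation in verdict.citations:
        match = CITATION_RE.match(citation)
        if match is None or not exists(match["path"]):
            return False
    return True


def quorum(verdicts: Iterable[Verdict], artifact_author: str,
           exists: Callable[[str], bool] = file_exists, needed: int = 2) -> str:
    """Count valid APPROVE verdicts as issued; nothing is rewritten or merged."""
    approvals = sum(
        1 for v in verdicts
        if v.decision == "APPROVE" and is_valid(v, artifact_author, exists)
    )
    return "APPROVED" if approvals >= needed else "BLOCKED"
```

The `exists` parameter is injectable only so the rule can be checked against an in-memory set of paths instead of a real evidence/ tree.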
When Crucible says COMPLETE, an outside reviewer with only evidence/ can
independently verify. When it refuses, the refusal is structured,
machine-readable, and remediable.
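An outside check of the gate result might look like the sketch below. It reads evidence/completion-gate/report.json and accepts only `overall` equal to `COMPLETE`. The only field it relies on is `overall`, as named above; the rest of the report schema is not assumed, and the function names `evaluate_report` and `gate_status` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

REPORT_PATH = Path("evidence/completion-gate/report.json")


def evaluate_report(raw: Optional[str]) -> tuple[bool, str]:
    """Return (complete, reason) for the raw text of report.json, or None if absent."""
    if raw is None:
        return False, f"missing {REPORT_PATH.as_posix()}"
    try:
        report = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"unparseable report: {exc}"
    overall = report.get("overall") if isinstance(report, dict) else None
    if overall != "COMPLETE":
        return False, f"overall={overall!r}, expected 'COMPLETE'"
    return True, "overall=COMPLETE"


def gate_status(project_root: Path) -> tuple[bool, str]:
    """Load the report under project_root (if any) and evaluate it."""
    report_file = project_root / REPORT_PATH
    raw = report_file.read_text(encoding="utf-8") if report_file.is_file() else None
    return evaluate_report(raw)
```

Every failure path returns a reason string, in keeping with the idea that a refusal should be structured enough to act on. A missing or malformed report is treated as a refusal, never as a pass.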
```bash
# 1. install (once per machine)
claude plugin marketplace add krzemienski/crucible
claude plugin install crucible@crucible-local

# 2. set up (once per project)
cd my-project
/crucible:setup --local

# 3. work
/crucible:forge "Add /healthz endpoint that returns {status:ok}"
```
If /crucible:forge refuses:
```bash
/crucible:remediate     # auto-generates delta plan from REFUSAL.md
/crucible:forge         # retry
```
Or use /crucible:autopilot <task> to loop forge → remediate → forge up to
3 attempts automatically.
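The forge → remediate → forge loop can be sketched as a bounded retry. This shows the control flow only: `forge` and `remediate` are hypothetical callables standing in for the slash commands, and treating every non-COMPLETE outcome as a refusal is an assumption.

```python
from typing import Callable


def autopilot(task: str,
              forge: Callable[[str], str],
              remediate: Callable[[], None],
              max_attempts: int = 3) -> tuple[str, int]:
    """Run forge up to max_attempts times, remediating between refusals.

    Returns (final_outcome, attempts_used). The gate is evaluated on every
    attempt; nothing here can turn a refusal into a completion.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    outcome = "REFUSED"
    for attempt in range(1, max_attempts + 1):
        outcome = forge(task)
        if outcome == "COMPLETE":
            return outcome, attempt
        if attempt < max_attempts:
            remediate()  # build a delta plan from the refusal before retrying
    return outcome, max_attempts
```

Remediation is skipped after the final attempt, since there is no further forge run to consume its plan.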
If you're stuck and need out:
```bash
/crucible:disable            # clean opt-out
touch .crucible/disabled     # nuclear opt-out
CRUCIBLE_DISABLE=1 claude    # one-shell escape
```
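One way a hook could combine these three switches with the `.crucible/active` sentinel is sketched below. The precedence order (environment variable, then `disabled` marker, then `active` sentinel) is an assumption, as are the function names; the README does not state how the checks are ordered.

```python
import os
from pathlib import Path
from typing import Mapping


def find_markers(project_root: Path) -> set[str]:
    """Return which of the .crucible marker files exist under project_root."""
    crucible_dir = project_root / ".crucible"
    return {name for name in ("active", "disabled") if (crucible_dir / name).exists()}


def enforcement_active(markers: set[str],
                       env: Mapping[str, str] = os.environ) -> bool:
    """Decide whether hooks should enforce or stay silent (assumed precedence)."""
    if env.get("CRUCIBLE_DISABLE") == "1":
        return False             # one-shell escape wins over everything
    if "disabled" in markers:
        return False             # marker file overrides the sentinel
    return "active" in markers   # without the sentinel, hooks are no-ops
```

A hook would call `enforcement_active(find_markers(Path.cwd()))` first and exit 0 immediately when it returns `False`.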
| Doc | When to read it |
|---|---|
| docs/OVERVIEW.md | Architecture, philosophy, evidence model, gate sequence, quorum mechanics, refusal protocol (the conceptual reference) |
| docs/USAGE.md | Per-command reference (all 19), per-skill reference (all 12), per-subagent reference (all 11), three worked walkthroughs, refusal recovery playbook, FAQ |
| docs/CRUCIBLE-CLAUDE-MD.md | The canonical CLAUDE.md fragment that /crucible:setup installs |
| INSTALL.md | Three install paths, prerequisites, troubleshooting, activation lifecycle |
| CHANGELOG.md | Release history (v0.1.0 → v0.4.0) |
For "what does X actually do?" questions, run:
```bash
/crucible:explain forge    # DAG of any pipeline
/crucible:doctor           # 9-check installation health
/crucible:status           # current gate state
```
/crucible:forge · /crucible:autopilot · /crucible:remediate ·
/crucible:resume · /crucible:trial