By raddue
Run eval-tested AI agent skills to automate systematic software development pipelines: design features from ideas, generate TDD plans with failing tests first, dispatch adversarial testing and code reviews on git diffs, enforce quality gates with shadow checkpoints, debug cross-component issues, and safely merge PRs after verifications.
```shell
npx claudepluginhub raddue/crucible
```

Use after completing implementation to find unknown failure modes. Reads implementation diff and writes up to 5 tests designed to make it break. Triggers on 'break it', 'adversarial test', 'stress test implementation', 'find weaknesses', or any task seeking to expose unknown failure modes.
Recon-informed approach evaluator. Weighs competing options against codebase constraints and returns structured recommendations with confidence scoring, kill criteria, and evidence grounding. Consumes recon briefs or caller context. Used by design, spec, migrate. Triggers on /assay, 'evaluate approaches', 'which option', 'compare alternatives'.
Adversarial review of code subsystems or non-code artifacts (design docs, plans, concepts) through parallel analytical lenses. Triggers on 'audit', 'review subsystem', 'audit this design', 'review this plan', 'audit concept', 'check the save system', 'examine the UI code', or any task requesting adversarial review of existing artifacts.
Use when exploring unfamiliar code and you want to persist what you learn, when starting a task and you want to consult known codebase structure, or when collaborators need module-specific context for implementation or review
Shadow git checkpoint system for pipeline rollback. Creates working directory snapshots without modifying the project's git history. Invoked by build, quality-gate, and debugging orchestrators at pipeline boundaries.
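The underlying trick can be sketched with plain git plumbing: `git stash create` records a snapshot commit without moving HEAD, the index, or any branch ref. This is an illustrative sketch only, not crucible's actual implementation; the tag name and file are made up.

```shell
# Illustrative sketch only, not crucible's implementation.
# `git stash create` records the dirty working tree as a dangling commit
# without moving HEAD, the index, or any branch ref.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email ci@example.com && git config user.name ci
echo v1 > file.txt && git add file.txt && git commit -qm baseline

echo v2 > file.txt                       # uncommitted pipeline change
snap=$(git stash create "checkpoint")    # snapshot commit; no refs moved
git tag -f pipeline-checkpoint "$snap"   # pin it so gc cannot reap it

echo v3 > file.txt                       # pipeline keeps mutating the tree
git checkout pipeline-checkpoint -- file.txt   # roll back to the snapshot
cat file.txt                             # file is back to v2
```

Because no branch ref ever moves, `git log` on the project branch is identical before and after the checkpoint/rollback cycle.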
Use when completing tasks, implementing major features, or before merging to verify work meets requirements
Multi-model consensus for high-stakes quality decisions. Opt-in MCP-based system that dispatches prompts to multiple LLM providers in parallel and synthesizes their responses. Used by quality-gate (stagnation judge, periodic red-team) and design (Challenger).
You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation.
Convert heavy document formats (PDF, Word, Excel, PowerPoint, and 10+ others) to token-efficient Markdown/CSV with structurally-aware digest compression. Use when Claude needs to read documents without burning excessive context budget. Triggers on /distill, 'distill this', 'convert to markdown', 'make this readable'.
Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for merge, PR, or cleanup
Use when a significant task completes and you need a retrospective, when starting a new task and want to consult past lessons, or when 10+ retrospectives have accumulated and skill improvements should be proposed
Use when starting any conversation - establishes how to find and use skills, requiring skill activation before ANY response including clarifying questions
Use when a design doc or implementation plan is finalized and you want a divergent creativity injection before adversarial review. Proposes the single most impactful addition.
Use when a full feature is assembled and you want to hunt cross-component bugs before final quality gate. Runs 5 parallel adversarial dimensions against the complete implementation diff. Triggers on 'inquisitor', 'hunt bugs', 'cross-component test', 'find integration issues', or automatically in build pipeline Phase 4.
Use when merging a PR after implementation is complete - verifies CI, runs local tests, checks repo safety, and monitors post-merge health. Triggers on 'merge pr', 'merge this', 'land this PR', or any task executing a PR merge.
Autonomous migration planning and execution. Takes a migration target (framework upgrade, API version bump, dependency major version, deprecation removal) and produces a phased migration plan with compatibility verification, then optionally executes via build's refactor mode. Triggers on /migrate, 'migration plan', 'upgrade X from Y to Z', 'remove deprecated', 'major version bump'.
Use when implementing Unity UI Toolkit code from a mockup, HTML reference, screenshot, or visual spec. Triggers on "implement this mockup", "translate to USS", "build this UI", "match the mock", "UI looks wrong", "fix the layout", "spacing is off", "doesn't look like the design", or any task turning a visual reference into Unity UI Toolkit USS/C# code.
Use when creating a visual mockup, UI prototype, or HTML reference for your project's UI. Triggers on "mockup", "prototype", "UI reference", "design a panel", "mock up", or any task producing a visual HTML file for later Unity UI Toolkit implementation.
Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies
Use when you have a spec or requirements for a multi-step task, before touching code
Generate a stakeholder-facing PRD from a design doc. Use when you have a finalized design doc and need a non-technical product requirements document for archival or sharing. Triggers on /prd, 'generate PRD', 'write a PRD', 'product requirements'.
Use when onboarding to an unfamiliar codebase and you want full structural context before the first real task. Deep-scans the repo and discovers cross-repo topology.
Explore a codebase for architectural friction and propose competing redesigns. Triggers on 'prospector', 'find improvements', 'architecture friction', 'what should I refactor', 'where are the structural problems', or any task requesting discovery of codebase improvement opportunities.
Iterative red-teaming of any artifact (design docs, plans, code, hypotheses, mockups). Loops until clean or stagnation. Invoked by artifact-producing skills or their parent orchestrator.
Query session history from the persistent activity index. Returns event logs, summaries, and filtered views that survive context compaction.
Standalone codebase investigation. Produces a layered Investigation Brief with core findings (structure, patterns, scope, prior art) plus optional depth modules. Dispatches parallel scouts, synthesizes findings, and feeds cartographer. Use before any task requiring codebase understanding.
Use when you need adversarial review of any artifact — design docs, implementation plans, code, PRs, or documentation. Iterates until clean or stagnation.
Resume interrupted pipelines from dispatch manifests, or replay historical pipelines with template mutations for A/B experimentation. Triggers on /replay, 'resume pipeline', 'replay build', 'A/B test templates'.
Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation
Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.
Eval-only skill for measuring skill routing accuracy. Not invoked directly — contains selection evals that test whether the agent picks the correct skill for a given prompt.
Enforces the Detect → Fetch → Implement → Cite protocol when implementing against external frameworks or libraries. Invoke when a change touches an external API surface and the edit exceeds the triviality threshold, so implementations come from current official docs rather than stale training-data recall.
Use when you have a GitHub epic (or equivalent) with child tickets and want to autonomously produce design docs, implementation plans, and machine-readable contracts for each ticket without human interaction. Triggers on /spec, 'spec out', 'write specs for', 'spec this epic'.
Audits all crucible skills for overlap, staleness, broken references, and quality. Quick scan or full evaluation modes.
Audit existing tests for staleness, needed updates, or removal after code changes. Use after any code modification to verify test suite alignment. Triggers on 'audit tests', 'check test alignment', 'stale tests', 'test health', or any task verifying tests match changed code. Technology-agnostic — works with any language and test framework.
Use when implementing any feature or bugfix, before writing implementation code
Use when verifying Unity UI matches a mockup, after implementing UI from a visual reference, when a user shares a screenshot showing UI drift, when checking theming compliance, or when UI looks wrong. Triggers on "verify", "compare to mock", "does this match", "check the UI", "screenshot shows wrong", "UI looks wrong", "fix the layout", "spacing is off", "colors are off", "doesn't look like the design", "visual bug", or completing any UI implementation task.
Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
Use when starting feature work that needs isolation from current workspace or before executing implementation plans - creates isolated git worktrees with smart directory selection and safety verification
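The worktree pattern itself is plain git; here is a minimal sketch under stated assumptions (branch and directory names are illustrative, and `git init -b` needs git ≥ 2.28):

```shell
# Minimal illustration of isolated feature worktrees (names are made up).
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q -b main
git config user.email dev@example.com && git config user.name dev
git commit -qm init --allow-empty

# One step: create branch feature/login AND an isolated checkout for it,
# leaving the main working directory untouched.
git worktree add -b feature/login "$repo-login"
git worktree list    # lists both the main checkout and the new one
```

Each worktree has its own index and HEAD, so work in `$repo-login` never dirties the main checkout.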
A collection of agent skills for systematic software development. Works with Claude Code, Cursor, OpenAI Codex, Amp, Cline, and any platform that supports the SKILL.md format.
42 skills across the full development lifecycle — design, planning, TDD implementation, code review, debugging, adversarial testing, and quality gates. Every skill is eval-tested with measured A/B deltas (49 execution evals + 18 sequence evals, +29% / +31% average).
Originally forked from obra/superpowers, now independently maintained and significantly diverged. Pipeline checkpoint system, auto skill extraction, structured context compression, and trajectory capture inspired by NousResearch/hermes-agent.
| Platform | Status |
|---|---|
| Claude Code | Pending review |
| skills.sh | `npx skills add raddue/crucible` |
| Cursor / Codex / Amp / Cline | Compatible — same SKILL.md format |
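All of these platforms consume the same SKILL.md layout: a Markdown file whose YAML frontmatter carries the skill's name and its triggering description. A purely illustrative sketch (this skill does not exist in the repo):

```markdown
---
name: example-skill
description: Use when doing X. Triggers on 'do x', 'run x'.
---

# Example Skill

Step-by-step instructions the agent follows once this skill activates.
```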
- `quality-gate` loops until clean (red-team → fix → fresh re-review), with weighted stagnation detection. Measured +68% delta (see eval results).
- `build` chains design → plan → execute → complete with shadow-git checkpoints, crash recovery, and structured compaction-state blocks. One command, idea to merged PR.
- `adversarial-tester` writes tests designed to break the implementation; `inquisitor` runs 5 parallel attack dimensions against the full feature diff; `siege` runs 6 attacker-perspective security agents until zero Critical/High findings remain.
- Heavy artifacts land in /tmp; only ~100-token pointers enter orchestrator context. A full build saves 73-131K tokens of fossilized context.
- `replay` resumes interrupted pipelines from the last phase boundary (10 minutes, not 90); `recall` queries a searchable session activity index that survives compaction.

See docs/architecture.md for how the orchestrator skills compose, and docs/skills.md for the full catalog.
```shell
git clone git@github.com:raddue/crucible.git ~/repos/crucible
ln -sf ~/repos/crucible/skills/* ~/.claude/skills/
```
Or, when available on the marketplace: `claude plugin install raddue/crucible`.
```shell
git clone git@github.com:raddue/crucible.git ~/repos/crucible
```
Then point your platform at the cloned skills directory. See Cursor plugin docs and Codex skills docs. Some advanced features (parallel subagent dispatch, agent teams, persistent memory) are platform-specific and degrade gracefully — see PLATFORMS.md.
Run in auto mode for autonomous pipelines (`/build`, `/debugging`, long `/quality-gate` runs). Auto mode lets Claude execute tool calls without per-action prompts while still confirming destructive operations. It supersedes earlier `--dangerously-skip-permissions` guidance.
Set `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` for build's parallel team execution. Without it, independent tasks fall back to sequential dispatch — everything still works, just slower.
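For example, in the shell that launches Claude Code (variable name taken from above; persisting it in your shell profile is left to you):

```shell
# Opt in to experimental agent teams for this shell session only.
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
```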
Recommended but optional: enable the forge (retrospectives) and cartographer (codebase map) knowledge accelerators — they compound skill quality over time. See docs/architecture.md.
For hooks (session activity index, build routing advisor), MCP-based external model review, autocompact tuning, and other Claude Code specifics, see docs/configuration.md.
Persona-driven AI development team: orchestrator, team agents, review agents, skills, slash commands, and advisory hooks for Claude Code
Share bugs, ideas, or general feedback.
Correctness workflow for structured development. 26 skills with configurable intensity levels: spec-before-code, skeptical review, enforced TDD, formal modeling, adversarial review, verification, documentation, and metrics. Intensity gates let you choose standard, high, or critical rigor per project.
Complete Superpowers skills library by Jesse Vincent (obra/superpowers), packaged for GitHub Copilot CLI. Includes TDD, systematic debugging, brainstorming, plan writing, subagent-driven development, code review, and more.
SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd
Production-grade engineering skills for AI coding agents — covering the full software development lifecycle from spec to ship.
Core skills library for the Lerian Team: TDD, debugging, collaboration patterns, and proven techniques. Features parallel 3-reviewer code review system (Foundation, Correctness, Safety), systematic debugging, and workflow orchestration. 21 essential skills for software engineering excellence.