By raddue
Run eval-tested AI agent skills to automate systematic software development pipelines: design features from ideas, generate TDD plans with failing tests first, dispatch adversarial testing and code reviews on git diffs, enforce quality gates with shadow checkpoints, debug cross-component issues, and safely merge PRs after verifications.
```shell
npx claudepluginhub raddue/crucible
```

Use after completing implementation to find unknown failure modes. Reads implementation diff and writes up to 5 tests designed to make it break. Triggers on 'break it', 'adversarial test', 'stress test implementation', 'find weaknesses', or any task seeking to expose unknown failure modes.
Recon-informed approach evaluator. Weighs competing options against codebase constraints and returns structured recommendations with confidence scoring, kill criteria, and evidence grounding. Consumes recon briefs or caller context. Used by design, spec, migrate. Triggers on /assay, 'evaluate approaches', 'which option', 'compare alternatives'.
Adversarial review of code subsystems or non-code artifacts (design docs, plans, concepts) through parallel analytical lenses. Triggers on 'audit', 'review subsystem', 'audit this design', 'review this plan', 'audit concept', 'check the save system', 'examine the UI code', or any task requesting adversarial review of existing artifacts.
Use when exploring unfamiliar code and you want to persist what you learn, when starting a task and you want to consult known codebase structure, or when collaborators need module-specific context for implementation or review
Shadow git checkpoint system for pipeline rollback. Creates working directory snapshots without modifying the project's git history. Invoked by build, quality-gate, and debugging orchestrators at pipeline boundaries.
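The underlying trick can be sketched with plain git plumbing: `git stash create` records a snapshot commit without moving HEAD, the index, or any branch ref. This is an illustrative sketch only, not crucible's actual implementation; the tag name and file are made up.

```shell
# Illustrative sketch only, not crucible's implementation.
# `git stash create` records the dirty working tree as a dangling commit
# without moving HEAD, the index, or any branch ref.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email ci@example.com && git config user.name ci
echo v1 > file.txt && git add file.txt && git commit -qm baseline

echo v2 > file.txt                       # uncommitted pipeline change
snap=$(git stash create "checkpoint")    # snapshot commit; no refs moved
git tag -f pipeline-checkpoint "$snap"   # pin it so gc cannot reap it

echo v3 > file.txt                       # pipeline keeps mutating the tree
git checkout pipeline-checkpoint -- file.txt   # roll back to the snapshot
cat file.txt                             # file is back to v2
```

Because no branch ref ever moves, `git log` on the project branch is identical before and after the checkpoint/rollback cycle.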
Use when completing tasks, implementing major features, or before merging to verify work meets requirements
Multi-model consensus for high-stakes quality decisions. Opt-in MCP-based system that dispatches prompts to multiple LLM providers in parallel and synthesizes their responses. Used by quality-gate (stagnation judge, periodic red-team) and design (Challenger).
You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation.
Convert heavy document formats (PDF, Word, Excel, PowerPoint, and 10+ others) to token-efficient Markdown/CSV with structurally-aware digest compression. Use when Claude needs to read documents without burning excessive context budget. Triggers on /distill, 'distill this', 'convert to markdown', 'make this readable'.
Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for merge, PR, or cleanup
Use when a significant task completes and you need a retrospective, when starting a new task and want to consult past lessons, or when 10+ retrospectives have accumulated and skill improvements should be proposed
Use when starting any conversation - establishes how to find and use skills, requiring skill activation before ANY response including clarifying questions
Use when a design doc or implementation plan is finalized and you want a divergent creativity injection before adversarial review. Proposes the single most impactful addition.
Use when a full feature is assembled and you want to hunt cross-component bugs before final quality gate. Runs 5 parallel adversarial dimensions against the complete implementation diff. Triggers on 'inquisitor', 'hunt bugs', 'cross-component test', 'find integration issues', or automatically in build pipeline Phase 4.
Use when merging a PR after implementation is complete - verifies CI, runs local tests, checks repo safety, and monitors post-merge health. Triggers on 'merge pr', 'merge this', 'land this PR', or any task executing a PR merge.
Autonomous migration planning and execution. Takes a migration target (framework upgrade, API version bump, dependency major version, deprecation removal) and produces a phased migration plan with compatibility verification, then optionally executes via build's refactor mode. Triggers on /migrate, 'migration plan', 'upgrade X from Y to Z', 'remove deprecated', 'major version bump'.
Use when implementing Unity UI Toolkit code from a mockup, HTML reference, screenshot, or visual spec. Triggers on "implement this mockup", "translate to USS", "build this UI", "match the mock", "UI looks wrong", "fix the layout", "spacing is off", "doesn't look like the design", or any task turning a visual reference into Unity UI Toolkit USS/C# code.
Use when creating a visual mockup, UI prototype, or HTML reference for your project's UI. Triggers on "mockup", "prototype", "UI reference", "design a panel", "mock up", or any task producing a visual HTML file for later Unity UI Toolkit implementation.
Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies
Use when you have a spec or requirements for a multi-step task, before touching code
Generate a stakeholder-facing PRD from a design doc. Use when you have a finalized design doc and need a non-technical product requirements document for archival or sharing. Triggers on /prd, 'generate PRD', 'write a PRD', 'product requirements'.
Use when onboarding to an unfamiliar codebase and you want full structural context before the first real task. Deep-scans the repo and discovers cross-repo topology.
Explore a codebase for architectural friction and propose competing redesigns. Triggers on 'prospector', 'find improvements', 'architecture friction', 'what should I refactor', 'where are the structural problems', or any task requesting discovery of codebase improvement opportunities.
Iterative red-teaming of any artifact (design docs, plans, code, hypotheses, mockups). Loops until clean or stagnation. Invoked by artifact-producing skills or their parent orchestrator.
Query session history from the persistent activity index. Returns event logs, summaries, and filtered views that survive context compaction.
Standalone codebase investigation. Produces a layered Investigation Brief with core findings (structure, patterns, scope, prior art) plus optional depth modules. Dispatches parallel scouts, synthesizes findings, and feeds cartographer. Use before any task requiring codebase understanding.
Use when you need adversarial review of any artifact — design docs, implementation plans, code, PRs, or documentation. Iterates until clean or stagnation.
Resume interrupted pipelines from dispatch manifests, or replay historical pipelines with template mutations for A/B experimentation. Triggers on /replay, 'resume pipeline', 'replay build', 'A/B test templates'.
Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation
Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.
Eval-only skill for measuring skill routing accuracy. Not invoked directly — contains selection evals that test whether the agent picks the correct skill for a given prompt.
Enforces the Detect → Fetch → Implement → Cite protocol when implementing against external frameworks or libraries. Invoke when a change touches an external API surface and the edit exceeds the triviality threshold, so implementations come from current official docs rather than stale training-data recall.
Use when you have a GitHub epic (or equivalent) with child tickets and want to autonomously produce design docs, implementation plans, and machine-readable contracts for each ticket without human interaction. Triggers on /spec, 'spec out', 'write specs for', 'spec this epic'.
Audits all crucible skills for overlap, staleness, broken references, and quality. Quick scan or full evaluation modes.
Audit existing tests for staleness, needed updates, or removal after code changes. Use after any code modification to verify test suite alignment. Triggers on 'audit tests', 'check test alignment', 'stale tests', 'test health', or any task verifying tests match changed code. Technology-agnostic — works with any language and test framework.
Use when implementing any feature or bugfix, before writing implementation code
Use when verifying Unity UI matches a mockup, after implementing UI from a visual reference, when a user shares a screenshot showing UI drift, when checking theming compliance, or when UI looks wrong. Triggers on "verify", "compare to mock", "does this match", "check the UI", "screenshot shows wrong", "UI looks wrong", "fix the layout", "spacing is off", "colors are off", "doesn't look like the design", "visual bug", or completing any UI implementation task.
Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
Use when starting feature work that needs isolation from current workspace or before executing implementation plans - creates isolated git worktrees with smart directory selection and safety verification
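The worktree pattern itself is plain git; here is a minimal sketch under stated assumptions (branch and directory names are illustrative, and `git init -b` needs git ≥ 2.28):

```shell
# Minimal illustration of isolated feature worktrees (names are made up).
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q -b main
git config user.email dev@example.com && git config user.name dev
git commit -qm init --allow-empty

# One step: create branch feature/login AND an isolated checkout for it,
# leaving the main working directory untouched.
git worktree add -b feature/login "$repo-login"
git worktree list    # lists both the main checkout and the new one
```

Each worktree has its own index and HEAD, so work in `$repo-login` never dirties the main checkout.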
A collection of agent skills for systematic software development. Works with Claude Code, Cursor, OpenAI Codex, Amp, Cline, and any platform that supports the SKILL.md format.
42 skills across the full development lifecycle — design, planning, TDD implementation, code review, debugging, adversarial testing, and quality gates. Every skill is eval-tested with measured A/B deltas (49 execution evals + 18 sequence evals, +29% / +31% average).
Originally forked from obra/superpowers, now independently maintained and significantly diverged. Pipeline checkpoint system, auto skill extraction, structured context compression, and trajectory capture inspired by NousResearch/hermes-agent.
| Platform | Status |
|---|---|
| Claude Code | Pending review |
| skills.sh | `npx skills add raddue/crucible` |
| Cursor / Codex / Amp / Cline | Compatible — same SKILL.md format |
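All of these platforms consume the same SKILL.md layout: a Markdown file whose YAML frontmatter carries the skill's name and its triggering description. A purely illustrative sketch (this skill does not exist in the repo):

```markdown
---
name: example-skill
description: Use when doing X. Triggers on 'do x', 'run x'.
---

# Example Skill

Step-by-step instructions the agent follows once this skill activates.
```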
- `quality-gate` loops until clean (red-team → fix → fresh re-review), with weighted stagnation detection. Measured +68% delta (see eval results).
- `build` chains design → plan → execute → complete with shadow-git checkpoints, crash recovery, and structured compaction-state blocks. One command, idea to merged PR.
- `adversarial-tester` writes tests designed to break the implementation; `inquisitor` runs 5 parallel attack dimensions against the full feature diff; `siege` runs 6 attacker-perspective security agents until zero Critical/High findings remain.
- Heavy artifacts land in /tmp; only ~100-token pointers enter orchestrator context. A full build saves 73-131K tokens of fossilized context.
- `replay` resumes interrupted pipelines from the last phase boundary (10 minutes, not 90); `recall` queries a searchable session activity index that survives compaction.

See docs/architecture.md for how the orchestrator skills compose, and docs/skills.md for the full catalog.
```shell
git clone git@github.com:raddue/crucible.git ~/repos/crucible
ln -sf ~/repos/crucible/skills/* ~/.claude/skills/
```
Or, when available on the marketplace: `claude plugin install raddue/crucible`.
```shell
git clone git@github.com:raddue/crucible.git ~/repos/crucible
```
Then point your platform at the cloned skills directory. See Cursor plugin docs and Codex skills docs. Some advanced features (parallel subagent dispatch, agent teams, persistent memory) are platform-specific and degrade gracefully — see PLATFORMS.md.
Run in auto mode for autonomous pipelines (`/build`, `/debugging`, long `/quality-gate` runs). Auto mode lets Claude execute tool calls without per-action prompts while still confirming destructive operations. It supersedes earlier `--dangerously-skip-permissions` guidance.
Set `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` for build's parallel team execution. Without it, independent tasks fall back to sequential dispatch — everything still works, just slower.
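For example, in the shell that launches Claude Code (variable name taken from above; persisting it in your shell profile is left to you):

```shell
# Opt in to experimental agent teams for this shell session only.
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
```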
Recommended but optional: enable the forge (retrospectives) and cartographer (codebase map) knowledge accelerators — they compound skill quality over time. See docs/architecture.md.
For hooks (session activity index, build routing advisor), MCP-based external model review, autocompact tuning, and other Claude Code specifics, see docs/configuration.md.
Persona-driven AI development team: orchestrator, team agents, review agents, skills, slash commands, and advisory hooks for Claude Code
Share bugs, ideas, or general feedback.
Correctness workflow for structured development. 26 skills with configurable intensity levels: spec-before-code, skeptical review, enforced TDD, formal modeling, adversarial review, verification, documentation, and metrics. Intensity gates let you choose standard, high, or critical rigor per project.
Complete Superpowers skills library by Jesse Vincent (obra/superpowers), packaged for GitHub Copilot CLI. Includes TDD, systematic debugging, brainstorming, plan writing, subagent-driven development, code review, and more.
SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd
Production-grade engineering skills for AI coding agents — covering the full software development lifecycle from spec to ship.
Core skills library for the Lerian Team: TDD, debugging, collaboration patterns, and proven techniques. Features parallel 3-reviewer code review system (Foundation, Correctness, Safety), systematic debugging, and workflow orchestration. 21 essential skills for software engineering excellence.