From crucible
Iteratively red-teams artifacts like design docs, plans, code, hypotheses, and mockups by dispatching subagents for adversarial review and fixes until clean or stagnation.
npx claudepluginhub raddue/crucible

This skill uses the workspace's default tool permissions.
<!-- CANONICAL: shared/dispatch-convention.md -->
Performs adversarial reviews of design docs, implementation plans, code, PRs, or documentation using fresh Devil's Advocate subagents. Iterates until clean or stagnation detected.
Spawns three parallel adversarial reviewers (Feasibility, Completeness, Scope & Alignment) after plan drafting; all must pass before presenting to user. For non-trivial plans (2+ work units, 3+ files).
Runs multi-agent verification loop post-implementation, dispatching specialized agents for review with autonomous subagent fixes and retries until unanimous approval.
All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.
All subagent returns (red-team agents, judges, fix agents) use the Ledger Return Protocol. Every subagent returns exactly one Evidence Receipt per shared/return-convention.md; the orchestrator applies the two-tier receipt linter (see the "Receipt Linter (Ledger Return Protocol)" section below) to every Task return before acting on the declared VERDICT.
The gate maintains an Invariant Cairn per shared/cairn-convention.md. Each gate round is a cairn phase. See ## Cairn (Layer 3) below.
Shared iterative red-teaming mechanism invoked at the end of artifact-producing skills. Provides rigorous adversarial review as the core quality mechanism.
Announce at start: "Running quality gate on [artifact type]."
Skill type: Rigid -- follow exactly, no shortcuts.
Execution model: When this skill is running, YOU are the orchestrator. You drive the loop, dispatch fix agents and reviewers as subagents, track scores, and make escalation decisions. All references to "the orchestrator" in this document refer to you.
Every subagent dispatched by the gate (red-team agents, fix agents, judges) returns exactly one Evidence Receipt per shared/return-convention.md. After every Task return, apply this check before scoring findings or escalating.
Tier 1 — Structural (in-context):
- Parse sections in the order RCPT, VERDICT, ARTIFACTS, TRACE, CLAIMS, WITNESS, SUSPICION, NEXT (unknown headers after NEXT are ignored).
- Every CLAIM citation must resolve.
- Every EXEC has exit=/dur=/out= and a listed out= artifact; byte-ranges ≤ 4 KiB.
- Every DISPATCHED carries a valid rcpt-sha256 present in receipt-ledger.jsonl.
- WITNESS is mandatory (no (n/a)); kind ∈ {exec, grep, lint}; expect-fail non-empty, not wildcard-only, ≥ 4 chars (exemptions: exit-clause forms; the bare token match — valid only for kind=grep).
- PASS: ran=TRACE#N or SKIPPED:<reason>. FAIL/BLOCKED UNRUNNABLE: reason from the closed vocabulary.
- ran=SKIPPED requires NEXT to contain the witness payload verbatim.
- ran=TRACE#N verb-binding: exec → EXEC; grep → EXEC/READ/WROTE; lint → any verb (rule re-applied to the receipt itself).
Tier 2 — Witness verification: for PASS+TRACE#N, Read the cited range (≤ 4 KiB) and fail if the witness would have matched expect-fail. For FAIL+TRACE#N (weak positive-evidence), reject only if no evidence of failure is visible in the range. For SKIPPED/UNRUNNABLE, no read; record the deferred obligation.
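The section-order rule in Tier 1 can be sketched on its own. This is a minimal illustration of just that one check, not the full linter; header extraction and the per-verb rules are assumed to happen elsewhere.

```python
# Fixed section order required by the Ledger Return Protocol (Tier 1).
REQUIRED_ORDER = ["RCPT", "VERDICT", "ARTIFACTS", "TRACE",
                  "CLAIMS", "WITNESS", "SUSPICION", "NEXT"]

def check_section_order(headers):
    """True if the receipt's headers match the required order.

    Unknown headers appearing after NEXT are ignored, per the convention;
    a missing or misplaced section fails the structural check.
    """
    if "NEXT" in headers:
        # Drop anything after NEXT before comparing.
        headers = headers[:headers.index("NEXT") + 1]
    return headers == REQUIRED_ORDER
```

A receipt with trailing custom sections still passes, while any reordering of the eight required sections fails.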
Per shared/cairn-convention.md. Quality-gate-specific bindings:
- Phases: round/1, round/2, …. A round begins at red-team dispatch and ends at the judge verdict (either PASS/escalate or loop-again with the score delta recorded).
- Entry format: round/N | dispatches=<red-team+judge+fix> receipts=<same> verdict=<PASS|FAIL|MIXED> | <score delta + key finding>. Advance PHASE to the next round on loop; advance to terminal/N on PASS or escalation.
- Delete active-run.md on terminal; keep cairn-<run-id>.md.
- Record score deltas (e.g. score-delta: -2) for stagnation-detection audit.
- Mark stale invariants superseded (SUPERSEDED_BY) — keeping the invariants list from ballooning across long gates.

Starting with convention v1.1, every QG subagent (red-team, judge, fix-agent) returns a receipt carrying TRIPWIRE:, SUPERSEDES:, and (when applicable) TRIPWIRE-CHILD: lines. Full grammar in shared/return-convention.md.
Manifest: After each Task return (post-lint), append:
<rcpt-sha256-prefix-12> <skill>/<dispatch-id> <verdict> TRIPWIRE: <predicates> [SUPERSEDED_BY=<prefix>] [keys=quality-gate:<k>:<v>,…] [files=<path>:<h6>,…]
Namespace CLAIM-key discriminators as quality-gate:<key> (e.g. quality-gate:severity-max:minor) — prevents collision with build/siege keys.
Sweep (dispatch-loop clause): The orchestrator MAY NOT dispatch the next round until it has: (1) linted; (2) appended; (3) processed SUPERSEDES; (4) evaluated self-checks; (5) evaluated forward-checks against every active prior entry (TRIPWIRE ∪ TRIPWIRE-CHILD); (6) Read each firing M's full receipt and narrated the re-read; (7) then dispatch.
Fix-agent supersession. A QG fix-agent supersedes the prior FAIL red-team receipt. SUPERSEDES: <fail-prefix> + cited CLAIM + exec/grep witness with ran=TRACE#N. Tier-2 re-runs the witness against the fix — only survives if clean.
Stagnation-judge tripwires. A stagnation judge's receipt declaring TRIPWIRE: peer-dispatch-disagrees(count) lets a later round's divergent issue-count fire a re-read, surfacing judge-vs-judge disagreement without a separate escalation channel.
Mandatory-work declarations for quality-gate subagent types:
- Red-team agent: read-artifact, emit-findings.
- Judge: read-findings, emit-scores.
- Fix agent: read-findings, apply-edits, run-tests (if tests exist for the artifact's subtree).

On lint failure: treat the return as structurally BLOCKED regardless of declared VERDICT. Re-dispatch with the lint errors appended to the brief, or escalate.
At the start of the quality gate, check whether the consensus_query MCP tool
is available in the current environment:
Do NOT:
Consensus is a transparent enhancement. Its presence improves coverage; its absence changes nothing.
At the start of the quality gate, check whether the external_review MCP tool
is available in the current environment AND skills.quality_gate is enabled in
the external review config. If either check fails, skip all external review
steps silently — no warnings, no prompts.
Every red-team round, alongside the host red-team dispatch. Call
external_review with:
- prompt: contents of skills/shared/external-review-prompt.md
- context: the same artifact context given to the red-team subagent
- skill: "quality_gate" (top-level argument for per-skill toggle enforcement)
- metadata: {"skill": "quality_gate", "round": N} (traceability)

On consensus-eligible rounds where both consensus_query and external_review are available:
- Call external_review FIRST, before calling consensus_query.
- Include only reviews whose error is null. Skip errored reviews — their empty content would corrupt the consensus signal.
- Pass the surviving external reviews via the additional_responses parameter to consensus_query.

CRITICAL: External findings do NOT affect the scoring algorithm.
This invariant is load-bearing. The quality gate's convergence guarantees depend on a single, consistent scoring source. Mixing external signal into scoring would create non-deterministic stagnation behavior.
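The ordering constraint can be sketched as follows. This is a hedged illustration only: `call_tool` stands in for whatever MCP client the host provides, and the argument shapes follow this document's description, not a published tool schema.

```python
def gather_consensus(red_team_prompt, artifact, call_tool):
    """Ordering sketch: external reviews first, then consensus_query.

    Errored reviews are dropped before forwarding, since their empty
    content would corrupt the consensus signal. `call_tool(name, args)`
    is a hypothetical stand-in for the MCP invocation layer.
    """
    ext = call_tool("external_review", {
        "prompt": "<contents of external-review-prompt.md>",
        "context": artifact,
        "skill": "quality_gate",
    })
    # Keep only reviews whose error field is null.
    clean = [r for r in ext.get("reviews", []) if r.get("error") is None]
    return call_tool("consensus_query", {
        "mode": "review",
        "prompt": red_team_prompt,
        "context": artifact,
        "additional_responses": clean,
    })
```

Note that nothing returned here feeds the scoring algorithm; external findings remain display-only, per the invariant above.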
- external_review tool not available (MCP server not running): skip silently.
- status is "unavailable" (no config or disabled): skip silently.
- status is "error" (all models failed): skip silently, note the failure in round output. Distinct from "unavailable" — means the feature is configured but every model errored.
- status is "partial" (some models failed): include available reviews, note which models failed in round output.

| Rationalization | Rebuttal | Rule |
|---|---|---|
| "This finding is minor, I'll just fix it inline instead of dispatching a fix agent." | Orchestrator-applied fixes break separation of concerns and corrupt the fix journal. Fix-agent overhead for trivial fixes is negligible; the risk of conflation is not. | All fixes route through the fix agent — no exceptions, no matter how small. |
| "Round N fixed everything, I can return PASS without another red-team round." | Fixing is not passing. A fresh red-team round is the verification step. Skipping it is a skip disguised as a pass. | The gate is only PASS after a fresh red-team round returns 0 Fatal, 0 Significant. |
| "The red-team finding is wrong / overblown, I'll mark it resolved without a fix." | Rationalizing away findings defeats the point of adversarial review. If a finding is wrong, the fix agent explicitly justifies dismissal in the fix journal — the orchestrator does not dismiss findings unilaterally. | Every Fatal/Significant finding is either fixed or documented as dismissed by the fix agent with reasoning. |
| "The score went up but I can tell it's close, skip the stagnation judge." | Stagnation detection uses weighted score, not orchestrator intuition. Score-based inline judgment is the exact failure the judge exists to catch. | Dispatch the stagnation judge whenever score is not strictly lower than the prior round. |
| "Round 15 hit — I'll squeeze in one more round, surely the next will pass." | The 15-round limit is a circuit breaker, not a suggestion. Exceeding it silently is how runaway loops happen. | At round 15, escalate to the user with full round history — never silently continue. |
| "Pre-flight dependency audit is noise for this artifact, skip it." | The audit only runs on code artifacts, and on code artifacts it's mandatory. Dependency drift is a documented source of shipped bugs. | Run the dependency audit on every code artifact; skip silently only for non-code types. |
| "The user said 'move on', that's approval to skip the gate." | General feedback is never skip approval. Skip requires an unambiguous instruction specifically referencing the gate. | Only an explicit, gate-referencing instruction counts as skip approval. |
If the artifact type is code, run the pre-flight dependency audit (see Pre-Flight Dependency Audit below). If the result is BLOCKED and the user does not approve continuation, abort the gate. For all other artifact types, skip this step entirely — no scan, no output, no scratch files.

Dispatch crucible:red-team as a single-pass reviewer (one dispatch = one review round). Quality-gate owns the iteration loop; red-team produces findings for one round and returns. Red-team does NOT run its own stagnation loop when invoked by quality-gate.

c. Append the verifier's output under the ### Verifier Assessment heading; write the verdict summary to round-N-verification.md
d. If Fatal-severity Unresolved: flag as "prior unresolved Fatal — must address" in next round's fix dispatch (binding, one-round grace)
e. If Significant-severity Unresolved: appended to fix journal as informational context
f. Invoke a FRESH red-team on the revised artifact (no anchoring)

Applies to: Round 1 and every 3rd round thereafter (rounds 1, 4, 7, 10, 13). Intermediate rounds: standard single-model red-team dispatch (no change).
On consensus-eligible rounds:
- Call consensus_query(mode: "review") with the red-team prompt and artifact content.
- Write the merged findings to round-N-findings.md.

Cost control: The consensus dispatch replaces (not supplements) the single-model dispatch on eligible rounds. Fallback: If consensus is unavailable on an eligible round, dispatch a standard single-model red-team review.
This gate cannot be bypassed without explicit user approval. Task size, complexity, or scope is never a valid reason to skip. The invoking skill is responsible for always dispatching the gate AND letting it run to completion.
The gate is not "done" until it completes with a clean round (0 Fatal, 0 Significant on a fresh review). Fixing findings and moving on without a verification round is a skip, not a pass. The iteration loop exists because fix agents introduce new issues or incompletely resolve old ones — fresh-eyes re-review catches what the fixer missed.
The only valid skip is an unambiguous user instruction specifically referencing the gate (e.g., "skip the quality gate"). General feedback like "looks good" or "move on" is not skip approval. Once a gate has run and presented findings to the user, the user's decision to proceed is authoritative.
The orchestrator coordinates the loop but does NOT fix artifacts directly. Fixes are dispatched to a separate subagent to maintain separation of concerns between coordination, review, and remediation.
| Artifact Type | Fix Agent |
|---|---|
| design | Plan Writer subagent revises the doc |
| plan | Plan Writer subagent revises the plan |
| code | Fix subagent (new, not the original implementer) |
| hypothesis | Debugging skill's hypothesis refinement (see below) |
| mockup | Fix subagent |
| translation | Fix subagent revises the translation map |
Before dispatching the fix agent (code artifacts only): If crucible:checkpoint is available, create checkpoint with reason "pre-qg-fix-round-N". Non-code artifacts (design, plan, hypothesis, mockup, translation) skip this step — they are fully captured by the existing artifact-N.md snapshots.
The fix agent receives: (a) the current artifact, (b) the red-team findings, (c) project context, and (d) the fix journal from prior rounds (see Fix Memory below). It returns the revised artifact. The orchestrator writes the revised artifact to the scratch directory and dispatches the next red-team round.
The orchestrator never applies fixes directly. Even trivial fixes go through a fix agent to maintain separation of concerns. The cost of dispatching for a small fix is negligible; the risk of the orchestrator conflating coordination with fixing is not.
Fix agents are prone to drift — addressing findings by adding unrequested features, restructuring documents, or expanding scope beyond what was asked. This costs real time in re-anchoring and rework.
Before dispatching each fix agent, the orchestrator MUST include in the fix prompt:
Why this matters: The #1 user friction with the quality gate is fix agents drifting from the original design by adding unrequested content. Scope anchoring turns "stop. skipping. steps." into a structural guardrail.
Anti-anchoring is a property of review, not remediation. Reviewers need fresh eyes to avoid confirmation bias. Fix agents need institutional memory to avoid repeating failed strategies.
The quality gate maintains a fix journal (fix-journal.md in the scratch directory) that accumulates across rounds. After each fix agent completes, the orchestrator appends a structured entry:
## Round N Fix
- **Findings addressed:** [list of Fatal/Significant findings from round N, summarized]
- **Approach taken:** [1-2 sentence description of fix strategy]
- **Files changed:** [list of files modified]
- **Reasoning:** [why this approach was chosen over alternatives]
On subsequent rounds, the fix agent receives the full fix journal. This gives the fix agent critical context:
Anti-anchoring is preserved. The fix journal is NEVER passed to the red-team reviewer. Reviewers see only the clean artifact. The journal flows exclusively through the remediation path: fix agent writes it, next fix agent reads it, orchestrator maintains it.
Round 1 fix agents receive an empty journal (no prior rounds). This is the only round where the fix agent works without remediation history.
Why this matters: Without fix memory, the most common causes of stagnation and oscillation are fix agents repeating failed approaches or unknowingly reverting prior fixes while addressing new findings. Fix memory turns these escalation events into solvable problems -- the fix agent can see what was already tried and choose a genuinely different approach.
Compaction recovery: The fix journal is written to fix-journal.md in the scratch directory alongside round scores and findings. It is recovered automatically when the orchestrator reads the scratch directory after compaction.
After each fix agent completes and before the next red-team round, dispatch a Fix Verifier — a dedicated Sonnet agent that checks whether each fix actually resolves its stated finding. No re-fix sub-loop; the verifier checks once, and its output feeds into the fix journal for the next round.
Dispatch method: Task tool (model: Sonnet), same pattern as the stagnation judge. The verifier needs no file access; the orchestrator includes all input in the dispatch file directly.
Input the orchestrator provides:
- the ## Round N Fix section just appended (not the full journal)
- fix-verifier-prompt.md as the agent's instructions

Reading the verdict: The verifier returns a per-finding Resolved/Unresolved table and an overall PASS/FAIL.
Handling Unresolved findings:
Fix journal integration: The verifier's output is appended under a ### Verifier Assessment heading in the fix journal, distinct from the ## Round N Fix entry format. This keeps verifier assessments on the remediation path (fix agents see them) without contaminating the review path (red-team never sees them).
Anti-anchoring preserved: The verifier is on the remediation path — its output flows to fix agents only, never to the red-team reviewer. Same isolation as the fix journal itself.
Round counter unchanged: The verifier dispatch does not increment the round counter. It is part of the fix step, not a separate review round.
Two-layer system: the orchestrator handles scoring; a dedicated judge agent handles semantic analysis.
Stagnation uses weighted scoring (Fatal=3, Significant=1) AND Fatal count tracking.
Progress requires EITHER:
- the weighted score is strictly lower than the prior round's, OR
- the Fatal count decreased.
If either condition is met → progress, loop again. No judge needed.
Oscillation detection: If the weighted score increases (not just stays the same), escalate immediately as a regression. Report: "Round N score (X) is higher than Round N-1 score (Y). The fix cycle introduced new issues. Escalating." No judge needed.
Regression with checkpoint: If a pre-qg-fix-round checkpoint exists for the prior round, include in the escalation: "A checkpoint of the pre-fix state exists (<hash>). Options: (a) restore to pre-fix checkpoint and retry with different fix strategy, (b) continue with current state, (c) escalate to user." If no checkpoint exists, escalate as currently specified.
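The scoring layer above can be sketched compactly. This assumes the progress conditions stated in this section (strictly lower weighted score, or a reduced Fatal count) and the oscillation rule; it is an illustration of the decision logic, not the full escalation flow.

```python
def score(fatals, significants):
    """Weighted stagnation score: Fatal=3, Significant=1; Minor not counted."""
    return 3 * fatals + significants

def next_action(prev, curr):
    """Decide the loop step from (fatals, significants) of two rounds."""
    prev_score, curr_score = score(*prev), score(*curr)
    if curr_score > prev_score:
        # Oscillation: the fix cycle introduced new issues. Escalate.
        return "escalate-regression"
    if curr_score < prev_score or curr[0] < prev[0]:
        # Progress on either weighted score or Fatal count: loop again.
        return "loop"
    # Same score, no Fatal improvement: semantic comparison needed.
    return "dispatch-stagnation-judge"
```

Only the third branch costs a judge dispatch; the first two are decided inline by the orchestrator.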
When the consensus_query MCP tool is available and consensus mode verdict is enabled:
Instead of dispatching a single Sonnet judge via Task tool, call
consensus_query(mode: "verdict") with:
- the contents of stagnation-judge-prompt.md

Read the consensus response:
- status: "complete" or status: "partial": use the synthesis verdict (PROGRESS/STAGNATION/DIMINISHING_RETURNS)
- status: "unavailable": fall back to the standard single-Sonnet judge dispatch
The comparison file (round-N-comparison.md) includes the consensus
metadata: models queried, models responded, agreement level, and any
dissenting verdicts.
If neither progress condition is met AND the score did not increase (i.e., same score, no Fatal count improvement), dispatch the Stagnation Judge — a dedicated Sonnet agent that performs semantic comparison of findings across rounds. If the consensus_query tool is not available in the environment, this step uses the standard single-Sonnet dispatch described below.
Dispatch method: Task tool (model: Sonnet). The judge needs no file access; the orchestrator includes all input in the dispatch file directly.
Input the orchestrator provides:
- round-N-findings.md (current round)
- round-(N-1)-findings.md (prior round)
- the ## Round N Fix section from fix-journal.md (not the full journal)
- round-*-comparison.md files (for consecutive-round state tracking)
- stagnation-judge-prompt.md as the agent's instructions

Reading the verdict: The judge returns a structured verdict: PROGRESS, STAGNATION, or DIMINISHING_RETURNS.
The judge also writes: a round-N-comparison.md file. The orchestrator saves the judge's full output as round-N-comparison.md in the scratch directory. This file is used by future judge dispatches for consecutive-round tracking.
Pass the full artifact content to the red-team subagent. No preparation needed.
Code artifacts vary in size. The orchestrator prepares the artifact based on scope:
Write chunk-manifest.md (lists all chunks with gated/pending status) to the parent scratch directory. Per-chunk round files go in chunk-N/ subdirectories. Only delete the parent scratch directory after the final cross-chunk round completes. The active-run.md marker references the parent run-id throughout.

The red-team subagent receives the prepared artifact, not the raw diff. This mirrors audit's Tier 1/Tier 2 context management approach.
Hypotheses are 1-2 sentence statements, not plans or designs. The red-team prompt template is plan-centric and does not map well to hypothesis testing. For hypothesis artifacts, the orchestrator frames the red-team dispatch with hypothesis-specific attack vectors:
Include these in the dispatch prompt alongside the standard red-team template. The debugging skill's Phase 3.5 defines these questions -- the quality-gate orchestrator should use them.
Minor issues do not trigger fix rounds and do not count toward stagnation. However, they accumulate across rounds and contain useful information. Do not silently discard them.
After the gate completes (artifact approved or stagnation escalated):
Runs ecosystem-appropriate dependency audit commands before the red-team loop begins. Produces an independent supply-chain signal that is surfaced to the orchestrator and user — the red-team never sees audit data.
Artifact-type scoping: Runs only when the artifact type is code. Unconditionally skipped for design, plan, hypothesis, mockup, and translation artifacts. When skipped, no audit section appears in gate output and no scratch files are written.
Timing: Runs after the active-run marker is written (setup phase, before the numbered steps in How It Works) but before artifact preparation and red-team dispatch. The pre-flight completes fully before the first red-team round begins.
skip_blocking (boolean, default: false) — Global override. When true, disables ALL blocking regardless of min_blocking_severity. Findings are still reported in audit-results.md but no blocking occurs and the result is FINDINGS (not BLOCKED). skip_blocking supersedes min_blocking_severity entirely — they do not interact as independent thresholds.
min_blocking_severity (string, default: "critical", case-insensitive) — The minimum normalized severity at which a finding triggers blocking. Accepted values: "critical", "high", "moderate", "low". Invalid values are rejected with an error before execution begins. This does not change what gets reported — all findings always appear in audit-results.md; it only affects whether the result is BLOCKED vs FINDINGS.
Walk the directory tree from artifact root, collecting all manifest files matching the supported set:
| Manifest File | Ecosystem |
|---|---|
package.json | Node.js |
Cargo.toml | Rust |
requirements.txt | Python |
pyproject.toml | Python |
Excluded directories: node_modules/, .git/, target/, dist/, vendor/, third_party/, .venv/, venv/. These contain vendored or installed dependencies, not the project's own manifests.
Symlinks are not followed — following them risks infinite recursion in repos with circular symlinks or deeply nested node_modules.
npm workspace detection: Before scheduling per-directory npm audit runs, inspect each discovered package.json for a top-level "workspaces" field. If a workspace root is detected, schedule a single npm audit from that root directory. Do not schedule separate runs for package.json files in subdirectories that are members of that workspace.
Python dual-manifest handling: When a directory contains both requirements.txt and pyproject.toml, audit both — they may represent different dependency sets. Duplicate findings are deduplicated at result-write time in audit-results.md using the key (package name + CVE ID): each unique (package, CVE) pair appears once, with a note of which sources reported it. Version differences for the same (package, CVE) pair are noted but not double-counted.
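The dedup rule might look like this. A minimal sketch: the finding-dict shape (`package`, `cve`, `version`, `source` keys) is an assumption for illustration, not the actual audit-tool output format.

```python
def dedup_findings(findings):
    """Merge findings on the (package, CVE) key.

    Each unique pair appears once; the merged entry records every
    manifest source that reported it, and differing versions are
    collected rather than double-counted.
    """
    merged = {}
    for f in findings:  # f: {"package", "cve", "version", "source"}
        key = (f["package"], f["cve"])
        entry = merged.setdefault(key, {**f, "sources": [], "versions": set()})
        entry["sources"].append(f["source"])
        entry["versions"].add(f["version"])
    return list(merged.values())
```

A (package, CVE) pair reported by both requirements.txt and pyproject.toml thus contributes one entry with two sources.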
Manifest list finalization: The manifest list is written to preflight-audit.md before any audit tool is invoked. This list is the authoritative scope for the run. If compaction occurs after this point, the gate resumes from the recorded list — it does not re-scan.
Zero manifests: If zero manifests are found anywhere in the tree, pre-flight completes as a no-op and notes this in the output summary.
Detected manifests are audited in fixed order for deterministic output: Node.js -> Rust -> Python.
| Manifest File | Audit Command | Notes |
|---|---|---|
package.json | npm audit --json | Run from workspace root if applicable, otherwise cwd = manifest directory |
Cargo.toml | cargo audit --json | Run with cwd = manifest directory |
requirements.txt | pip-audit --format json -r requirements.txt | Explicit -r flag; does NOT require active venv |
pyproject.toml | pip-audit --format json | Requires active venv or lockfile (see below) |
All detected manifests are audited independently (after workspace consolidation). Each runs as an isolated subprocess. A failure in one audit does not abort or skip audits for other manifests. All ecosystems run to completion before the overall result is computed — a BLOCKED result from one ecosystem does not short-circuit audits for remaining ecosystems.
Before invoking any audit tool, the gate checks availability:
| Case | Condition | Action |
|---|---|---|
| Available | Tool in PATH, environment ready | Run audit |
| Tool missing | Tool not in PATH | Write warning to audit-results.md, surface to user |
| Tool broken | Tool found but --version fails | Write warning, skip |
| Environment not ready | Tool found but required environment absent | Write specific reason, skip with warning |
Per-manifest environment readiness checks:
- requirements.txt: pip-audit -r requirements.txt reads the file directly. No virtualenv required. Available if pip-audit is on PATH.
- pyproject.toml: pip-audit without -r inspects the installed environment. Requires an active virtualenv or a lockfile (poetry.lock, pdm.lock, uv.lock). If neither is present, skip with: "pip-audit requires a virtual environment or lock file for pyproject.toml; results would be unreliable."
- Cargo.toml: Requires Cargo.lock to be present. If absent: "skipped — Cargo.lock absent; run cargo generate-lockfile first."
- package.json: Requires package-lock.json (or npm-shrinkwrap.json) in the same directory (or workspace root). If absent: "skipped — no lockfile found; run npm install to generate package-lock.json." npm must be on PATH.

Python manifest confidence: When only pyproject.toml is present (no requirements.txt or lockfile in the same directory), include a notice in audit-results.md: "Confidence: Reduced — No requirements.txt or lock file found. pip-audit is resolving dependencies from pyproject.toml directly. Results may be incomplete."
Tool availability results are written to audit-results.md (not preflight-audit.md), because they are discovered at execution time, not scan time.
A run where all manifests are skipped (missing tools or environment-not-ready) is reported as INCONCLUSIVE, not passing.
Audit tools exit non-zero for two distinct reasons: vulnerability findings were detected, or the tool itself errored. The two must be distinguished, not conflated.
Exit code contracts per tool:
| Tool | Clean | Findings | Error |
|---|---|---|---|
npm audit | exit 0 | exit 1 | exit 2+ |
cargo audit | exit 0 | exit 1 | exit 2+ |
pip-audit | exit 0 | exit 1 | exit 2+ (or non-zero with unparseable stdout) |
Use exit codes to distinguish outcomes. Do not parse stderr substring content to classify results.
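A classification sketch under the exit-code contract above. The JSON-parse fallback for pip-audit (findings exit but unparseable stdout counts as an error) is included; the subprocess invocation details are illustrative.

```python
import json
import subprocess

def classify_audit(cmd, cwd):
    """Run one audit tool and classify its outcome by exit code.

    Contract (npm audit / cargo audit / pip-audit alike):
      exit 0 -> clean, exit 1 -> findings, exit >= 2 -> tool error.
    stderr content is never used for classification.
    """
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if proc.returncode == 0:
        return "CLEAN", None
    if proc.returncode == 1:
        try:
            return "FINDINGS", json.loads(proc.stdout)
        except json.JSONDecodeError:
            # Findings exit code but unparseable stdout: a tool error.
            return "ERROR", None
    return "ERROR", None
```

Because each manifest runs as an isolated subprocess, an "ERROR" here never aborts the audits scheduled for other manifests.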
Audit tools use different severity vocabularies. The gate normalizes to a common scale. CVSS boundaries are inclusive on the lower bound, exclusive on the upper (e.g., a CVSS score of exactly 9.0 is Critical, not High).
| Level | npm audit | cargo audit | pip-audit |
|---|---|---|---|
| Critical | critical | CVSS >= 9.0 | CVSS >= 9.0 |
| High | high | CVSS >= 7.0 and < 9.0 | CVSS >= 7.0 and < 9.0 |
| Moderate | moderate | CVSS >= 4.0 and < 7.0 | CVSS >= 4.0 and < 7.0 |
| Low | low | CVSS >= 0.1 and < 4.0 | CVSS >= 0.1 and < 4.0 |
| Informational | — | CVSS = 0.0 | CVSS = 0.0 |
CVSS 0.0 findings are classified as Informational — reported in audit-results.md but never count toward blocking. They do not map to any blocking severity level.
If a finding has no CVSS score (advisory-only, no CVE assigned), it is treated as Moderate and flagged with [no-cvss] in the output.
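The normalization table reduces to a small function. This sketch follows the boundaries above (lower bound inclusive, upper exclusive) and the no-CVSS and Informational rules; the `[no-cvss]` flag placement is one possible encoding.

```python
def normalize_severity(tool, raw_level, cvss):
    """Map a tool-specific severity to the gate's common scale."""
    if tool == "npm":
        # npm audit already reports named levels.
        return {"critical": "Critical", "high": "High",
                "moderate": "Moderate", "low": "Low"}.get(raw_level, "Moderate")
    if cvss is None:
        # Advisory-only finding, no CVE/CVSS assigned: treated as Moderate.
        return "Moderate [no-cvss]"
    if cvss == 0.0:
        return "Informational"   # reported, never counts toward blocking
    if cvss >= 9.0:
        return "Critical"        # exactly 9.0 is Critical, not High
    if cvss >= 7.0:
        return "High"
    if cvss >= 4.0:
        return "Moderate"
    return "Low"                 # 0.1 <= cvss < 4.0
```

The boundary check matters: a CVSS of exactly 9.0 lands in Critical because the lower bound is inclusive.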
Pre-flight produces two files under scratch/<run-id>/:
preflight-audit.md — Scan-time plan. Written before any audit tool runs. Contains only scan-time information:
- generated-at timestamp (ISO-8601)

This file is not updated after execution begins. It is the immutable record of what the scan discovered.
audit-results.md — Execution-time output. Written incrementally as each ecosystem completes. Contains:
Each ecosystem section ends with a status: complete sentinel line. A section without this sentinel is considered incomplete and must be discarded and re-run on recovery.
Schema for audit-results.md:
# Dependency Audit
generated-at: <ISO-8601>
run-id: <run-id>
> This section is independent of red-team findings. The red-team did not see this data.
## Tool Availability
- npm audit: available
- cargo audit: available
- pip-audit (requirements.txt): available
- pip-audit (pyproject.toml): unavailable — no venv or lock file
## Summary
Result: CLEAN | FINDINGS | BLOCKED | INCONCLUSIVE | FAILED
Critical: N High: N Moderate: N Low: N Informational: N
## npm — packages/api/package.json — FINDINGS
[findings list: package, severity, CVE, fix-available]
status: complete
## pip — src/requirements.txt — FINDINGS
## pip — src/pyproject.toml — FINDINGS
[deduplicated: CVE-2024-XXXXX reported by both src/requirements.txt and src/pyproject.toml — counted once]
status: complete
## Warnings
[environment-not-ready, reduced-confidence, or deduplication notes]
When results span multiple manifests with mixed outcomes, the overall Result: field uses this precedence (highest wins):
| Priority | Result | Condition |
|---|---|---|
| 1 (highest) | BLOCKED | Findings at or above min_blocking_severity and skip_blocking is false |
| 2 | FINDINGS | At least one manifest returned vulnerability findings (below blocking threshold or override active) |
| 3 | INCONCLUSIVE | At least one manifest was skipped (tool missing, environment not ready); no findings |
| 4 | FAILED | At least one manifest tool errored; no findings and no skips |
| 5 (lowest) | CLEAN | All manifests completed without findings |
INCONCLUSIVE outranks FAILED because unknown coverage (a manifest exists but was never audited) is more dangerous than a known, retryable tool error.
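The precedence table can be expressed as a fold over per-manifest outcomes. A sketch assuming per-manifest statuses of "FINDINGS", "SKIPPED", "ERROR", or "CLEAN", with the blocking decision passed in separately.

```python
def overall_result(per_manifest, blocking_hit, skip_blocking):
    """Combine per-manifest outcomes; the highest-priority result wins.

    blocking_hit: a finding at or above min_blocking_severity exists.
    skip_blocking: the global override that disables all blocking.
    """
    if blocking_hit and not skip_blocking:
        return "BLOCKED"
    if "FINDINGS" in per_manifest:
        return "FINDINGS"
    if "SKIPPED" in per_manifest:
        # Unknown coverage outranks a known, retryable tool error.
        return "INCONCLUSIVE"
    if "ERROR" in per_manifest:
        return "FAILED"
    return "CLEAN"
```

Note that with skip_blocking active, blocking-level findings fall through to FINDINGS rather than BLOCKED, matching the override semantics.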
When a finding at or above min_blocking_severity is present and skip_blocking is not true:
- Note fix availability per finding: fixable findings may be an npm audit fix / cargo update away; no-fix blockers may require dependency replacement or acceptance.
- In non-interactive sessions, record Result: BLOCKED and return to the parent orchestrator without prompting.

Whether a session is interactive is a Claude Code runtime property, not something the skill detects via TTY heuristics or environment inspection.
Parent-pipeline integration: When the gate returns with Result: BLOCKED, the parent orchestrator (build, spec, or direct user invocation) treats this the same as any gate failure — escalate to the user with the blocking findings listed. The red-team-rounds: 0 field indicates the red-team loop never ran.
Neither preflight-audit.md nor audit-results.md is passed to red-team dispatch. The red-team receives only the artifact under review — unchanged from current behavior. Audit findings are surfaced to the user at gate completion as an independent signal alongside (not merged with) red-team findings.
The generated-at timestamp marks when results were produced. Results are valid for that point in time only. The gate does not re-run pre-flight after fix-agent remediation within the same gate run. This is an explicit design boundary: the gate run is a point-in-time evaluation.
The iterative loop's value depends on each reviewer seeing the artifact with fresh eyes. To prevent information leaking between rounds:
- This isolation is also enforced inside crucible:red-team but is restated here because the quality-gate orchestrator is the most likely point of accidental leakage.

Quality gate writes round state to disk for compaction recovery.
Scratch directory: ~/.claude/projects/<project-hash>/memory/quality-gate/scratch/<run-id>/ where <run-id> is a timestamp generated at the start of the gate. This path is persistent and discoverable (matching the audit skill's pattern), so it survives compaction even if the run-id is lost from context — the orchestrator can list the directory to find active runs.
Tool constraint: All scratch directory operations (create, read, list, delete) must use Write, Read, and Glob tools — NOT Bash. Safety hooks block Bash commands referencing .claude/ paths.
Active run marker: At the start of the gate, write ~/.claude/projects/<project-hash>/memory/quality-gate/active-run-<run-id>.md containing the run-id and scratch directory path. Delete only your own marker when the gate completes. After compaction, glob for active-run-*.md files to locate active runs — recover the one whose run-id matches context, or the most recent if context is lost.
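The marker lifecycle above can be sketched as follows. This is a hypothetical Python illustration, with a generic base directory standing in for ~/.claude/projects/&lt;project-hash&gt;/memory/quality-gate; the real skill performs these operations with the Write and Glob tools, never Bash or scripts.

```python
import glob
import os
import time

def start_run(base):
    """Create the scratch dir and active-run marker for a new gate run.
    `base` stands in for the persistent quality-gate memory directory."""
    run_id = time.strftime("%Y%m%d-%H%M%S")  # timestamp run-id
    scratch = os.path.join(base, "scratch", run_id)
    os.makedirs(scratch, exist_ok=True)
    marker = os.path.join(base, f"active-run-{run_id}.md")
    with open(marker, "w") as f:
        f.write(f"run-id: {run_id}\nscratch: {scratch}\n")
    return run_id, scratch, marker

def recover_markers(base):
    """After compaction: glob for active-run markers, most recent first."""
    markers = glob.glob(os.path.join(base, "active-run-*.md"))
    return sorted(markers, key=os.path.getmtime, reverse=True)
```

The glob in `recover_markers` is what makes the path discoverable: even when the run-id is lost from context, listing the markers recovers it.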
Stale cleanup: At the start of each gate, delete scratch directories whose timestamps are older than 2 hours AND that are NOT referenced by any active-run-*.md marker. Also delete any fix-journal-*.md handoff files in the memory/quality-gate/ directory whose mtime is older than 24 hours (the longer window accommodates overnight breaks between QG and forge sessions).
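The cleanup predicate for scratch directories can be sketched like this (a hypothetical helper that approximates "older than 2 hours" with directory mtime; again, the real skill must use Glob and Read for anything under .claude/ paths):

```python
import glob
import os
import re
import time

def stale_scratch_dirs(base, now=None):
    """Return scratch dirs older than 2 hours whose run-id is NOT
    referenced by any active-run marker. Uses mtime as the age proxy."""
    now = now if now is not None else time.time()
    active = set()
    for marker in glob.glob(os.path.join(base, "active-run-*.md")):
        m = re.match(r"active-run-(.+)\.md$", os.path.basename(marker))
        if m:
            active.add(m.group(1))
    stale = []
    for d in glob.glob(os.path.join(base, "scratch", "*")):
        too_old = now - os.path.getmtime(d) > 2 * 3600
        if os.path.basename(d) not in active and too_old:
            stale.append(d)
    return stale
```

Both conditions must hold: age alone never deletes a directory that a live marker still references.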
After each round, write:
- round-N-score.md: weighted score, Fatal count, Significant count, Minor count
- round-N-findings.md: the red-team findings for this round
- artifact-N.md: the artifact snapshot after fixes (input to round N+1)
- fix-journal.md: cumulative fix journal (appended after each fix agent completes; see Fix Memory above)
- round-N-comparison.md: stagnation judge output (only exists for rounds where the judge was dispatched — absence on clean-progress rounds is expected, not an error). When multi-model consensus was used, this file also contains consensus metadata: models queried, models responded, agreement level, and any dissenting verdicts.
- round-N-verification.md: fix verifier verdict summary (written after every fix round — unlike comparison files, these exist for every round that had fixes)

Compaction recovery:
0. Read ## Compression State from pipeline-status.md — recover Goal, Key Decisions (including parent skill decisions that affect the gate), Active Constraints, and Next Steps. If absent, skip to step 1. Note: quality-gate is invoked by a parent skill (build, debugging, spec), so the Compression State reflects the parent's context. The quality-gate orchestrator inherits this context.
1. Glob for active-run-*.md markers to locate the scratch directory.
1b. Pre-flight recovery (code artifacts only): Check for preflight-audit.md in the scratch directory. If absent, restart from manifest scan. If present, read it to recover the manifest list. Then check audit-results.md for completed ecosystem sections (those ending with status: complete sentinel). Sections without the sentinel are discarded as incomplete. Resume from the first manifest not yet present as a complete section. Recovery re-invokes the audit tool for incomplete manifests — no raw output is cached between compaction events. After all manifests complete, regenerate the Summary section of audit-results.md.
2. Determine the current round (count the round-N-score.md files).
3. Read the latest artifact-N.md as the current artifact state.
4. Read the round-N-score.md files to reconstruct the score progression.
5. Read the round-N-comparison.md files to reconstruct consecutive-round state for the stagnation judge. Absence of comparison files is expected on clean-progress rounds.
6. Read the round-N-verification.md files to recover fix verifier state. If any Fatal-severity Unresolved verdicts exist in the latest verification file, carry them forward as binding context for the next fix dispatch.
7. Re-check whether the consensus_query MCP tool is available (consensus availability may have changed across the compaction boundary). Use current availability for subsequent rounds regardless of what was used pre-compaction.

Emit a Compression State Block at:
Dead-end handoff (step 5, code artifacts only): After Minor Issue Handling and before cleanup, if fix-journal.md exists in the scratch directory and contains 1+ round entries, copy its contents to ~/.claude/projects/<project-hash>/memory/quality-gate/fix-journal-<run-id>.md (using the gate's run-id). This is a transient handoff artifact for the next forge retrospective. On stagnation/escalation exit paths, also write the handoff file before escalating — stagnated sessions produce the highest-value dead-end data.
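A minimal sketch of the handoff step (a hypothetical helper; it checks only that fix-journal.md exists rather than parsing its round entries, and the real skill uses the Read and Write tools):

```python
import os
import shutil

def write_dead_end_handoff(scratch, base, run_id):
    """Copy the cumulative fix journal to a stable handoff path before
    the scratch directory is deleted. Returns the handoff path, or None
    when no fix journal was written (no fix rounds ran)."""
    journal = os.path.join(scratch, "fix-journal.md")
    if not os.path.exists(journal):
        return None
    dest = os.path.join(base, f"fix-journal-{run_id}.md")
    shutil.copyfile(journal, dest)
    return dest
```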
Cleanup: Delete scratch directory and your active-run-<run-id>.md marker after the gate completes (pass or stagnation). Do NOT delete verdict marker files (gate-verdict-<run-id>.md) — the build orchestrator is responsible for their lifecycle.
After Minor Issue Handling completes and before cleanup begins, write a verdict marker file to a stable location outside the scratch directory. This marker survives scratch cleanup and serves as a cross-skill consistency signal for the build orchestrator's gate ledger.
When: After Minor Issue Handling (the quick-fix pass on consolidated minors) and before cleanup. Written on ALL exit paths — PASS, FAIL, STAGNATION, and ESCALATED. The Verdict field reflects the actual outcome.
Path: ~/.claude/projects/<project-hash>/memory/quality-gate/gate-verdict-<run-id>.md
Format: Key-value pairs, one per line:
Verdict: PASS | FAIL | STAGNATION | ESCALATED
Phase: <phase name from invoking orchestrator, omit if standalone>
PipelineID: <pipeline-id from invoking orchestrator, omit if standalone>
Rounds: <total round count>
FinalScore: <weighted score from last round>
Timestamp: <ISO-8601>
RunID: <quality-gate run-id>
Tool: Write tool (not Bash) since the path is under .claude/.
Standalone invocations: When quality-gate is invoked directly (not by build), the Phase and PipelineID fields are omitted. The marker is still written — it serves as a completion record even without pipeline context.
Stale cleanup exclusion: Verdict markers are NOT subject to the 2-hour stale cleanup that applies to scratch directories. They are deleted by the build orchestrator after writing the corresponding gate ledger entry. Orphaned markers (from crashed runs) are cleaned up during the build skill's ledger initialization.
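Reading the marker back, e.g. when the build orchestrator writes its gate ledger entry, can be sketched with a small parser for the key-value format above (hypothetical helper):

```python
def parse_verdict_marker(text):
    """Parse the one-key-per-line verdict marker into a dict. Optional
    fields (Phase and PipelineID on standalone runs) are simply absent.
    Splitting on the first colon keeps ISO-8601 timestamps intact."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields
```

Callers should treat Phase and PipelineID as optional lookups rather than required keys.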
Quality gate is invoked by the outermost orchestrator only — not self-invoked by child skills. This avoids double-gating.
Rule: Skills NEVER self-invoke quality-gate. They only document that their output is gateable. The outermost orchestrator (build, the user session, or another pipeline) always handles gating. This eliminates the ambiguity of skills trying to detect whether they are running standalone or as a sub-skill.
The user's session is the outermost orchestrator. When a user runs /design directly, the design skill produces the doc and documents it as gateable. The user's session (following the design skill's instructions) invokes quality-gate.
Build is the outermost orchestrator and controls all quality gates:
Context from invoking orchestrator: When build invokes quality-gate, it includes a "Context from invoking orchestrator" block in the dispatch prompt containing:
- Phase: <phase name> — "design", "plan", or "code"
- PipelineID: <pipeline-id> — the build's PipelineID (format: build-YYYYMMDD-HHMMSS)

Quality-gate reads these values from its dispatch context and includes them in the verdict marker. These are dispatch context values, not tool arguments — quality-gate is a skill, not an API.
| Type | Produced By | Gate Trigger |
|---|---|---|
| design | crucible:design | After design doc is saved |
| plan | crucible:planning | After plan passes review |
| hypothesis | crucible:debugging | Phase 3.5, before implementation |
| code | crucible:debugging, build | After implementation/fix |
| mockup | crucible:mockup-builder | After mockup is created |
| translation | crucible:mock-to-unity | After self-verification |
Each artifact-producing skill's SKILL.md documents:
"This skill produces [artifact type]. The outermost orchestrator invokes
crucible:quality-gate after [trigger]."
Three exit modes beyond clean approval:
- round-N-comparison.md (breaks consecutive-round tracking)
- crucible:finish).

Use crucible:quality-gate for fix review, not crucible:red-team directly. This ensures fixes get iteration tracking, compaction recovery, and user checkpoints.