From darkroom
Runs adversarial verification of code with three competing agents (issue-finder, disprover, judge) to identify and validate bugs. Use it before production for security-sensitive code, data integrity, financial logic, and breaking changes.
npx claudepluginhub darkroomengineering/cc-settings
This skill uses the workspace's default tool permissions.
Three agents with competing incentives: one finds issues, one disproves them, one judges.
Before starting work, create a marker: `mkdir -p ~/.claude/tmp && echo "verify" > ~/.claude/tmp/heavy-skill-active && date -u +"%Y-%m-%dT%H:%M:%SZ" >> ~/.claude/tmp/heavy-skill-active`
Agent(reviewer, "You are a bug finder. Analyze the following code/changes thoroughly.
Score yourself: +1 for low-impact issues, +5 for medium-impact, +10 for critical.
Report every potential issue you find — edge cases, race conditions, missing validation,
security holes, logic errors, performance problems.
Report your total score at the end.
Target: [describe what to verify]
Files: [list files]")
Finder over-reports by design — this is the superset of all possible issues.
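To make the finder's scoring rule concrete, here is a minimal Python sketch of how the total could be tallied. The `Issue` class, the `POINTS` table, and `finder_score` are hypothetical names introduced for illustration; only the point values (+1, +5, +10) come from the prompt above.

```python
from dataclasses import dataclass

# Point values from the finder prompt: low = 1, medium = 5, critical = 10.
POINTS = {"low": 1, "medium": 5, "critical": 10}


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for one reported issue."""
    title: str
    impact: str  # one of "low", "medium", "critical"


def finder_score(issues: list[Issue]) -> int:
    """Sum the finder's self-reported score over every issue it raised."""
    return sum(POINTS[issue.impact] for issue in issues)


issues = [
    Issue("Missing input validation on amount", "critical"),
    Issue("Unbounded retry loop", "medium"),
    Issue("Unused import", "low"),
]
print(finder_score(issues))  # 16
```

Because every report adds points and nothing subtracts them, this rule rewards volume, which is consistent with the finder over-reporting by design.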
Takes the finder's output and tries to disprove each issue.
Agent(reviewer, "You are an adversarial reviewer. For each issue below, try to DISPROVE it.
Score yourself: gain the issue's point value for each one you successfully disprove,
but lose 2x its point value if you wrongly disprove a real issue.
Issues to challenge:
[paste finder output]
For each issue, state:
- DISPROVED: [reason it's not actually an issue]
- CONFIRMED: [reason it is a real issue]
- UNCERTAIN: [what would need to be checked]")
Adversary filters aggressively but cautiously — this is the subset of likely-real issues.
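One way to read "aggressively but cautiously" is through expected value. The sketch below assumes a risk-neutral adversary that can estimate the probability an issue is a false positive; both function names are hypothetical, and the 2x penalty is the only part taken from the prompt. Under that assumption, disproving is worthwhile only when the probability exceeds 2/3.

```python
def disprove_expected_value(points: int, p_false_positive: float) -> float:
    """Expected score change for marking an issue DISPROVED.

    Gain `points` if the issue really is a false positive,
    lose 2 * `points` if it turns out to be a real issue.
    """
    return p_false_positive * points - (1.0 - p_false_positive) * 2 * points


def should_disprove(p_false_positive: float) -> bool:
    """True when disproving has positive expected value (p above 2/3)."""
    return disprove_expected_value(1, p_false_positive) > 0


print(disprove_expected_value(10, 0.75))  # 2.5
print(disprove_expected_value(10, 0.5))   # -5.0
```

The asymmetric penalty is what keeps the adversary from dismissing everything: a coin-flip guess on a 10-point issue loses points on average.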
Takes both inputs and produces the final verdict.
Agent(oracle, "You are a neutral referee scoring two reviewers.
You will get +1 for each correct judgment and -1 for each incorrect one.
The ground truth exists and will be checked against your answers.
For each issue, produce a final verdict:
REAL BUG — with severity (Critical/Warning/Minor)
FALSE POSITIVE — explain why
NEEDS HUMAN CHECK — genuinely ambiguous
Finder report:
[paste finder output]
Adversary report:
[paste adversary output]")
Sequential — each agent depends on the previous output.
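The sequential dependency can be sketched as a small Python function. This is an illustration, not the skill's actual implementation: `run_agent` is a hypothetical stand-in for the `Agent(...)` calls above, and the prompts are abbreviated.

```python
from typing import Callable

# Hypothetical stand-in for Agent(...): (agent_name, prompt) -> output text.
RunAgent = Callable[[str, str], str]


def verify(target: str, files: list[str], run_agent: RunAgent) -> dict[str, str]:
    """Run finder, adversary, and referee in order.

    Each prompt embeds the output of the previous step, so the
    three calls cannot be run in parallel.
    """
    finder = run_agent(
        "reviewer",
        f"You are a bug finder.\nTarget: {target}\nFiles: {', '.join(files)}",
    )
    adversary = run_agent(
        "reviewer",
        f"You are an adversarial reviewer.\nIssues to challenge:\n{finder}",
    )
    referee = run_agent(
        "oracle",
        "You are a neutral referee scoring two reviewers.\n"
        f"Finder report:\n{finder}\nAdversary report:\n{adversary}",
    )
    return {"finder": finder, "adversary": adversary, "referee": referee}


# Usage with a fake agent that only records the call order.
calls: list[str] = []


def fake_agent(name: str, prompt: str) -> str:
    calls.append(name)
    return f"<{name} output {len(calls)}>"


result = verify("login flow", ["auth.py"], fake_agent)
print(calls)  # ['reviewer', 'reviewer', 'oracle']
```

Passing the agent runner in as a parameter keeps the ordering logic testable without calling a real model.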
## Adversarial Verification Report
### Scope
[What was verified]
### Verdict: [PASS / FAIL / NEEDS REVIEW]
### Confirmed Issues
| # | Severity | Issue | File:Line | Action Required |
|---|----------|-------|-----------|----------------|
| 1 | Critical | [description] | [location] | [what to fix] |
### Disproved (False Positives)
| # | Claimed Issue | Why Not Real |
|---|---------------|--------------|
| 1 | [description] | [reason] |
### Needs Human Check
| # | Issue | Why Ambiguous |
|---|-------|---------------|
| 1 | [description] | [what to check] |
### Confidence
Finder: N issues. Adversary disproved: M. Referee confirmed: K.
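The Confidence line and overall verdict could be derived from the referee's per-issue verdicts along these lines. Note the assumption: the source does not say how PASS, FAIL, and NEEDS REVIEW are chosen, so the mapping below (any real bug fails, any ambiguous item needs review) is illustrative only, as is the `summarize` helper itself.

```python
from collections import Counter


def summarize(finder_count: int, adversary_disproved: int,
              verdicts: list[str]) -> tuple[str, str]:
    """Return (overall verdict, confidence line) for the report.

    ASSUMPTION: the PASS / FAIL / NEEDS REVIEW mapping is illustrative;
    the source does not define it.
    """
    counts = Counter(verdicts)
    confirmed = counts["REAL BUG"]
    if confirmed:
        overall = "FAIL"
    elif counts["NEEDS HUMAN CHECK"]:
        overall = "NEEDS REVIEW"
    else:
        overall = "PASS"
    line = (
        f"Finder: {finder_count} issues. "
        f"Adversary disproved: {adversary_disproved}. "
        f"Referee confirmed: {confirmed}."
    )
    return overall, line


verdicts = ["REAL BUG", "FALSE POSITIVE", "FALSE POSITIVE", "NEEDS HUMAN CHECK"]
overall, line = summarize(4, 2, verdicts)
print(overall)  # FAIL
print(line)     # Finder: 4 issues. Adversary disproved: 2. Referee confirmed: 1.
```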
For smaller changes, skip the referee:
Agent(reviewer, "Find all issues in [target]. Be thorough.")
Agent(reviewer, "Challenge each issue: [paste output]. Disprove what you can.")
Review surviving issues yourself.
If you catch yourself thinking any of the following, STOP — you are rationalizing skipping verification:
| Rationalization | Why It's Wrong |
|---|---|
| "The change is too simple to need three agents" | Simple changes to auth/payments have caused the worst production incidents |
| "I already reviewed it myself" | Self-review has a known blind spot for logic errors you just wrote |
| "It's just a refactor, behavior doesn't change" | Refactors that "don't change behavior" are the #1 source of subtle regressions |
| "Tests are passing, that's enough" | Tests verify expected behavior; adversarial review finds unexpected behavior |
| "This would take too long" | A 5-minute verification is cheaper than a production incident |
| "The reviewer agent already checked it" | The reviewer checks quality; verification checks correctness under adversarial pressure |
If any agent (including yourself) uses these phrases, verification is NOT complete — restart the verification step:
Each of these must be replaced with evidence: a specific test, a concrete trace through the code, or a cited invariant.