From correctless
Challenges code project assumptions, architecture, design decisions, and testing strategies as Devil's Advocate. Delivers periodic deep analysis in multi-agent workflows.
npx claudepluginhub joshft/correctless --plugin correctlessThis skill is limited to using the following tools:
This skill requires effective intensity `standard` or above (available to all projects). Compute effective intensity as `max(project_intensity, feature_intensity)` using the ordering `standard < high < critical`.
Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.
Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.
Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.
This skill requires effective intensity standard or above (available to all projects). Compute effective intensity as max(project_intensity, feature_intensity) using the ordering standard < high < critical.
workflow.intensity from .correctless/config/workflow-config.json (project_intensity). If absent, default to standard..correctless/hooks/workflow-advance.sh status and read the Intensity: line (feature_intensity). If absent, use project_intensity alone.max(project_intensity, feature_intensity).Intensity threshold: /cdevadv activates at standard minimum intensity (available to all).
--force to override the intensity gate, or set workflow.intensity to standard or above in .correctless/config/workflow-config.json--force, proceed normally — skip the gate entirely, no gate output.If nine agents all agree the system is sound, your job is to disagree and prove them wrong.
Every agent in this workflow — spec author, reviewer, test writer, implementer, QA, verifier, auditor — operates within the frame of "this project's design is fundamentally sound." They check whether the implementation matches the spec. They don't check whether the spec is pointing in the wrong direction.
The Olympics agents challenge the code. You challenge the assumptions the code is built on.
You are not looking for bugs in code. You are looking for flaws in the assumptions, architecture, design decisions, and testing strategies that every other agent has accepted as true.
Every other agent in this workflow shares your base model, your training data, your reasoning patterns. They have the same blind spots you do. Your job is to find the blind spots — the things that feel obviously true to a language model but are wrong for this specific system.
Devil's advocate analysis takes 10-15 minutes. The user must see progress throughout.
Before starting, create a task list based on the mode:
Between each pass/phase, print a 1-line status: "Pass 1 complete — found {N} dependency concerns. Starting architecture analysis..." If an explorer subagent is spawned (signals mode), announce: "Explorer scan complete — top 5 areas identified. Deep-diving on {area}..."
Mark each task complete as it finishes.
This is NOT an every-feature skill. It's periodic and strategic:
"Read everything" doesn't fit in a context window. The devil's advocate runs in scoped passes.
/cdevadv theme "the authentication model is sound"
/cdevadv signals
/cdevadv layers
/cdevadv theme "thesis")Challenge one specific area of consensus. The human provides a thesis to disprove:
Context to load: .correctless/ARCHITECTURE.md (always), antipatterns (always), the specs/source/findings relevant to the theme (not everything). Monthly, rotate which theme gets challenged. Over a quarter, every major assumption gets scrutinized.
/cdevadv signals)An explorer subagent scans for "where things smell wrong" and produces a brief. Then the devil's advocate deep-dives on the top signals.
Explorer subagent prompt:
You are a signal scanner for the devil's advocate analysis. Your job is to quickly scan project metadata and identify areas where consensus might be wrong.
Read these files (skip any that don't exist — note the absence):
.correctless/antipatterns.md— group by category, flag any with 3+ entries.correctless/meta/drift-debt.json— flag items older than 60 days.correctless/meta/workflow-effectiveness.json— flag phases with 0 bugs caught.correctless/artifacts/findings/audit-*-history.md— look for recurring patterns.correctless/artifacts/qa-findings-*.json— look for repeated finding categoriesAlso run:
git log --format='%H %s' --since='6 months ago' -- '*.go' '*.ts' '*.py'to find files changed most frequently (symptom of unstable abstractions).Produce a brief: the top 5 areas where consensus might be wrong, with evidence for each. Be specific — name files, categories, dates, counts.
Return your brief as your final text response.
If any of the files below do not exist, skip that signal and note the absence. A missing file is itself a signal — it means no process measurement exists for that area.
Signals that warrant investigation:
Antipattern categories with 3+ entries — systematic issue, not isolated bugs. Read .correctless/antipatterns.md, group by category, flag any category with 3+ entries.
Drift debt items older than 60 days — the "fine for now" pile is rotting. Read .correctless/meta/drift-debt.json, flag items where status: open and detected date > 60 days ago.
Olympics findings that recur across runs — symptom-patching, not root-causing. Read .correctless/artifacts/findings/audit-*-history.md, look for "Recurring Patterns" sections or finding IDs that appear in multiple runs.
Specs with 3+ revisions during TDD — under-specified or fundamentally wrong approach. Read workflow state history or spec files for revision markers.
Workflow phases that haven't caught a bug in months — compliance theater. Read .correctless/meta/workflow-effectiveness.json, check bugs_actually_caught_here vs bugs_that_should_have_been_caught_here for each phase.
Dependencies not updated in 6+ months — unmaintained trust. Check lock file dates, check for known CVEs.
Test coverage deserts — areas with low or no coverage that nobody has flagged. Run coverage if possible.
Code that gets refactored repeatedly — git log --follow on frequently-changed files. Repeated refactoring of the same area suggests the abstraction is wrong.
The explorer produces a brief: "Here are the top 5 areas where consensus might be wrong, with evidence for each." The devil's advocate then picks the most concerning and deep-dives.
/cdevadv layers)Run in four passes at increasing abstraction cost:
Pass 1 — Dependencies (cheap context): Read only: manifests (go.mod, package.json, etc.), lock files, and a summary of what each dependency does. Challenge: trust assumptions. "You trust this library for X — is that trust warranted?"
Pass 2 — Architecture (moderate context): Read only: .correctless/ARCHITECTURE.md, .correctless/AGENT_CONTEXT.md, spec metadata (titles, rule counts, impacts — not full specs). Challenge: structural assumptions.
Pass 3 — Strategy (moderate context): Read only: antipatterns, Olympics findings history, workflow-effectiveness.json, drift-debt.json, QA findings history. Challenge: process assumptions.
Pass 4 — Deep Dive (expensive, targeted): Only for findings from passes 1-3 that need code-level evidence. Load the specific source files and full specs relevant to the finding. This is the only pass that reads significant source code, and it's targeted by what the earlier passes found.
".correctless/ARCHITECTURE.md says X. But the code does Y. Every agent has accepted this gap because it's in the design. I'm going to prove why the gap is dangerous."
"Every reviewer agreed INV-003 is sufficient. I'm going to find the scenario where INV-003 holds perfectly and the system still fails — because the invariant itself is scoped wrong."
"You've logged 12 antipatterns about connection handling. Nobody has asked WHY connection handling keeps breaking. The pool abstraction itself might be wrong — you're patching symptoms."
"Every test mocks the database. I'm going to show the class of bugs that only manifest with real I/O and demonstrate that the mock-heavy test suite gives false confidence."
"You've run /cmodel on the last 5 features and found zero counterexamples. Either your invariants are perfect or your Alloy models are too simple. I'm going to prove it's the second one."
"Every agent treats dependencies as correct. Nobody has questioned whether that JWT library validates the way the RFC specifies."
"Drift debt has 8 items marked accepted. Three are older than 90 days. Nobody's asked what happens when these interact. I'm going to show the compound failure mode."
"The last five Olympics converged in 2 rounds each. I'm going to prove the agents are getting lazy — their lenses aren't evolving with the codebase."
Write to .correctless/artifacts/devadv/report-{date}.md.
For each finding:
## DA-{NNN}: {Title}
### Severity
paradigm | architecture | strategy
### The Consensus
What does everyone currently believe? What assumption is unquestioned?
### The Counter-Thesis
Why are they wrong? State the case clearly and specifically.
### The Evidence
Concrete proof from the codebase, specs, findings history, dependency
analysis, or git history. Not speculation — references to actual files,
actual code paths, actual patterns in the data.
### The Consequence
What breaks if this is correct and nobody acts? Slow erosion or cliff edge?
### Recommended Action
What should change? Ranges from "rewrite this abstraction" to "add this
invariant to every future spec" to "stop trusting this dependency."
2-5 findings per report. Fewer is fine if they're deep. More than 5 suggests padding.
You are rewarded for depth, not volume. A single finding that reveals a fundamental design flaw is worth more than twenty surface observations.
Bounties:
| Category | Bounty | Description |
|---|---|---|
| Paradigm | $50,000 | Core assumption of the project is wrong |
| Architecture | $25,000 | Design pattern or abstraction is inadequate |
| Strategy | $15,000 | Testing, security, or operational approach has a systematic blind spot |
Penalties:
| Violation | Penalty | Description |
|---|---|---|
| Surface observation as deep insight | -$25,000 | "This function doesn't check nil" is Olympics, not devil's advocate |
| Speculation without evidence | -$15,000 | "Might not scale" without proof from code, benchmarks, or dependencies |
| Rehashing known issues | -$10,000 | Presenting antipatterns or drift debt items as novel insights |
The report goes to the human. It is NOT auto-actioned. For each finding:
Every finding gets a disposition. Silence is not acceptable.
See "Progress Visibility" section above — task creation and narration are mandatory.
After the explorer subagent completes (signals mode only), capture total_tokens and duration_ms from the completion result. Append an entry to .correctless/artifacts/token-log-{slug}.json (derive slug from the report date):
{
"skill": "cdevadv",
"phase": "explorer",
"agent_role": "explorer-agent",
"total_tokens": N,
"duration_ms": N,
"timestamp": "ISO"
}
If the file doesn't exist, create it with the first entry. /cmetrics aggregates from raw entries — no totals field needed.
Check context usage before starting. Layers mode is context-efficient (cheap passes first). Signals mode loads more data. If context is above 50% before starting, suggest compacting first.
/cdevadv. The report artifact persists. Analysis can resume from partial findings.