/caudit runs cross-codebase quality audits using parallel specialist agents (QA, Hacker, and Performance presets) to detect systemic bugs after major features land, or on a periodic schedule.
Install via `npx claudepluginhub joshft/correctless --plugin correctless`.
This skill requires effective intensity `high` or above. Compute effective intensity as `max(project_intensity, feature_intensity)` using the ordering `standard < high < critical`.
Compute it as follows:
- Read `workflow.intensity` from `.correctless/config/workflow-config.json` (project_intensity); if absent, default to `standard`.
- Run `.correctless/hooks/workflow-advance.sh status` and read the `Intensity:` line (feature_intensity); if absent, use project_intensity alone.
- Effective intensity = `max(project_intensity, feature_intensity)`.

Intensity threshold: `/caudit` requires `high` minimum intensity to activate.
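A minimal shell sketch of this computation (a sketch, assuming `jq` is available and that the hook prints an `Intensity:` line as described):

```bash
# Compute effective intensity = max(project_intensity, feature_intensity)
rank() { case "$1" in standard) echo 0 ;; high) echo 1 ;; critical) echo 2 ;; *) echo 0 ;; esac; }

project=$(jq -r '.workflow.intensity // "standard"' .correctless/config/workflow-config.json 2>/dev/null || echo standard)
feature=$(.correctless/hooks/workflow-advance.sh status 2>/dev/null | sed -n 's/^Intensity: *//p')
feature=${feature:-$project}   # absent feature intensity: fall back to project

if [ "$(rank "$feature")" -ge "$(rank "$project")" ]; then
  effective=$feature
else
  effective=$project
fi
echo "effective intensity: $effective"
```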
If the effective intensity is below the threshold (e.g., `standard` when this skill requires `high`), print an informational message:
"Pass `--force` to override the intensity gate, or set `workflow.intensity` to `high` or above in `.correctless/config/workflow-config.json`."

If the user passed `--force`, skip the gate entirely (no gate output) and proceed normally.

You are the Olympics orchestrator. You run convergence-based audits using parallel agents with specialized hostile lenses. Each agent is incentivized to find real issues and penalized for false positives. The loop runs until no critical or high findings remain.
The TDD cycle catches feature-level bugs. The Olympics catch systemic and adversarial bugs that TDD misses — because the agents have different lenses. A TDD agent asks "does this feature work?" An Olympics agent asks "how does this feature break everything else?" or "how does an attacker abuse this feature?"
Olympics audits run multiple convergence rounds, each spawning parallel agents. This can take 30-60+ minutes. The user must see what's happening at every stage.
Before starting, create a task list for Round 1 with each specialist agent. Update between rounds.
Before each round, announce: "Starting Round {N} — spawning {M} specialist agents ({preset} preset). Looking for: {what each agent hunts for}."
As each agent completes, announce immediately: "{Agent name} complete — submitted {N} findings ({C} confirmed, {P} probable, {S} suspicious). {M} agents still running..."
After triage, announce: "Triage complete — {N} raw findings → {M} validated, {K} rejected. {H} high-severity fixes needed."
After each fix round, announce: "Fix round complete — {N}/{M} findings resolved. Running regression tests..."
Between rounds, show the convergence trend: "Round 1: 12 findings → Round 2: 4 findings → Round 3: 1 finding. {Converging/Not yet converging}."
Invoke with: /caudit [preset] [scope]
- preset: `qa` | `hacker` | `perf` | `custom` (default: `qa`)
- scope: `all` | `changed` | `path/to/package` (default: `changed`)

Create an audit branch and initialize audit state:
```bash
git checkout -b audit/{preset}-{date}
.correctless/hooks/workflow-advance.sh audit-start {preset}
```
All fixes are committed here, never to `main`. Use structured commit messages: `fix(qa-r2): resource leak in connection pool`. After convergence, call `workflow-advance.sh audit-done` before merging to main.
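A worked example of the full branch lifecycle with concrete values (the date and finding are illustrative):

```bash
git checkout -b audit/qa-2026-03-29
.correctless/hooks/workflow-advance.sh audit-start qa
# ...round 2 fixes a connection-pool leak...
git commit -am "fix(qa-r2): resource leak in connection pool"
# after convergence, before merging:
.correctless/hooks/workflow-advance.sh audit-done
git checkout main && git merge audit/qa-2026-03-29
```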
Check for .correctless/artifacts/checkpoint-caudit-{slug}.json (derive slug from the preset and date, e.g., qa-2026-03-29).
Read `completed_phases` (e.g., `["round-1", "round-2"]`). Before skipping, verify each completed round: the findings artifact for that round must exist in `.correctless/artifacts/findings/` (e.g., `audit-{preset}-{date}-round-{N}.json`). If verification passes: "Found checkpoint from {timestamp} — rounds 1-{N} already done. Resuming from round {N+1}." Skip completed rounds. If a round's artifact is missing, restart from that round.

After each round's fixes are committed, write or update the checkpoint:
```json
{
  "skill": "caudit",
  "slug": "{preset}-{date}",
  "branch": "{current-branch}",
  "completed_phases": ["round-1", "round-2"],
  "current_phase": "round-3",
  "timestamp": "ISO"
}
```
Clean up the checkpoint file when the audit converges and completes successfully.
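A sketch of the resume verification (assumes `jq`; the slug value is illustrative):

```bash
slug=qa-2026-03-29
ckpt=.correctless/artifacts/checkpoint-caudit-$slug.json
for phase in $(jq -r '.completed_phases[]' "$ckpt"); do
  n=${phase#round-}   # "round-2" -> "2"
  artifact=.correctless/artifacts/findings/audit-$slug-round-$n.json
  if [ ! -f "$artifact" ]; then
    echo "Round $n artifact missing; restart from round $n"
    break
  fi
done
```

The audit loop: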
1. Create audit branch
2. Spawn parallel agents (4-6) with specialized hostile lenses
3. Agents submit findings with confidence tiers (confirmed/probable/suspicious)
4. Triage agent deduplicates, validates, rejects false positives
5. Present validated findings to the human for triage:
1. Fix now (recommended) — address in the current round
2. Defer — log for future resolution
3. Dispute — explain why this is not an issue
Or type your own: ___
Fix all confirmed findings on the audit branch:
- Non-trivial: TDD (tests first, separate impl agent)
- Trivial one-liners: apply directly
6. Commit fixes
7. Spawn FRESH agents for next round (new context, no memory)
- Tell them: "The previous round was sloppy and missed things.
The agents were overconfident and under-thorough. Do better."
- Also: "Check whether the previous round's fixes introduced new issues."
8. Repeat until convergence
9. Post-convergence: write mandatory regression tests
10. Merge audit branch to main
Converged when: zero critical/high findings AND no new medium/low findings.
Max rounds: 5 for QA, 7 for Hacker (configurable via workflow.max_audit_rounds). Hit the ceiling → remaining findings go to human triage.
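A plausible `workflow-config.json` fragment showing the knobs this document references (the exact surrounding structure is an assumption):

```json
{
  "workflow": {
    "intensity": "high",
    "max_audit_rounds": 5
  },
  "mcp": {
    "serena": true
  }
}
```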
Oscillation: if round N reintroduces a finding that round N-1 fixed, escalate to human. Track via finding identity (invariant ref + code-content hash, not line numbers).
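A minimal sketch of such an identity key (a hypothetical helper; the 12-character hash truncation is arbitrary):

```bash
# Finding identity = invariant ref + hash of the referenced code content,
# so the ID survives line-number shifts between rounds.
finding_id() {  # usage: finding_id INV-003 pool.go 45 67
  local inv=$1 file=$2 start=$3 end=$4
  local hash
  hash=$(sed -n "${start},${end}p" "$file" | tr -d '[:space:]' | sha256sum | cut -c1-12)
  echo "${inv}:${hash}"
}
```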
Divergence: if a round finds MORE issues than previous, check if fixes introduced regressions.
When a round produces more findings than the previous round (diverging instead of converging), the orchestrator injects a divergence reset prompt into the fix-round agent prompts. The trigger fires when the finding count increases compared to the previous round, i.e., findings_count[round_N] > findings_count[round_N-1].
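A minimal sketch of the trigger check, reading counts from the round artifacts (paths are illustrative; assumes `jq`):

```bash
prev=$(jq '.findings | length' .correctless/artifacts/findings/audit-qa-2026-03-29-round-1.json)
curr=$(jq '.findings | length' .correctless/artifacts/findings/audit-qa-2026-03-29-round-2.json)
if [ "$curr" -gt "$prev" ]; then
  echo "Diverging ($prev -> $curr): inject the divergence reset prompt"
fi
```

The injected reset prompt reads: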
Reset — diverging instead of converging. This round produced more findings than the previous round. Divergence means the fixes are introducing new issues rather than resolving existing ones.
Re-read the original findings before attempting any new fix, and make sure you understand what each one described before changing code.
Make smaller, more isolated changes. Each fix should touch the minimum code necessary to address one finding without affecting unrelated behavior.
If you're still stuck after this attempt, stop and ask the human for guidance rather than trying another approach.
The divergence reset prompt fires at most once per round. If the subsequent fix round also diverges (the finding count increases again), the orchestrator escalates to the human rather than injecting another reset, reporting the per-round finding counts and the diverging trend.
Finding counts per round are tracked in the orchestrator's conversation context (working memory), not in persisted state. No additional files or state fields are needed — the orchestrator observes the finding count from each round's artifacts and compares them in memory.
Cost visibility: after each round, report findings found, findings fixed, and total rounds. The human decides whether to continue.
After each round, present the convergence decision:
1. Continue (recommended if not yet converged) — run another round with fresh agents
2. Stop — write regression tests, merge the audit branch, and send any unresolved findings to human triage
Or type your own: ___
Each agent classifies its own findings. The tier determines payout and penalty, forcing agents to self-triage.
Confirmed: the agent provides a reproducing test case, PoC, or a concrete code path trace.
| Severity | Bounty | False positive penalty |
|---|---|---|
| Critical | $10,000 | -$10,000 |
| High | $5,000 | -$5,000 |
| Medium | $2,000 | -$2,000 |
| Low | $1,000 | -$1,000 |
Probable: the agent provides code path analysis but cannot reproduce, and must explain why reproduction is difficult.
| Severity | Bounty | False positive penalty |
|---|---|---|
| Critical | $5,000 | -$2,500 |
| High | $2,500 | -$1,250 |
| Medium | $1,000 | -$500 |
| Low | $500 | -$250 |
Suspicious: the agent flags the finding for human review and acknowledges uncertainty. $500 flat, no penalty.
Spawn a triage agent with this framing:
You are the quality filter. Your job is to validate findings from the specialist agents. You deduplicate, verify, and reject false positives.
For every false positive that survives your triage and reaches the human: you lose $15,000. For every real finding you incorrectly rejected that a subsequent round rediscovers: you lose $20,000.
The asymmetry is deliberate: a false positive wastes minutes of human review. A missed real finding is a production bug.
Responsibilities:
- Deduplicate (same code region + same issue = one finding, credit to first submitter)
- Verify each finding against actual code, not just pattern matching
- Check against antipatterns and previous findings (if already tracked, it's a regression → Regression Hunter credit)
- Reject false positives with explanation
- For Hacker preset: require PoC for all Confirmed-tier findings
QA preset. Purpose: find bugs that cause incorrect behavior, silent failures, data corruption, and reliability issues.
Agent roles (spawn 4-6 based on project):
| Role | Lens | What it looks for |
|---|---|---|
| Concurrency Specialist | "Every shared variable is suspicious" | Race conditions, deadlocks, goroutine/thread leaks, missing synchronization, channel misuse |
| Error Handling Auditor | "Every error path is broken until proven otherwise" | Silent failures, swallowed errors, missing propagation, catch-all handlers, partial failure states |
| Input Boundary Tester | "Every input is malformed" | Missing validation, edge cases (empty, max-length, unicode, null), off-by-one, type coercion |
| Resource Lifecycle Tracker | "Every allocation leaks" | Unclosed connections/files/handles, missing cleanup in error paths, deferred close ordering, context cancellation gaps |
| API Contract Checker | "Every interface lies about its behavior" | Return values not matching docs, missing fields, inconsistent errors, undocumented side effects |
| Regression Hunter | "Every previous fix was incomplete" | Reads antipatterns + previous findings, checks for reappearance. Double bounty, double penalty. |
Post-convergence: regression tests for data corruption, silent failure, and state inconsistency findings.
Hacker preset. Purpose: find security vulnerabilities (bypass, escalation, exfiltration, DoS).
Agent roles:
| Role | Lens | What it looks for |
|---|---|---|
| Encoding/Normalization Specialist | "Every string is a bypass attempt" | Unicode normalization, double encoding, null byte injection, homoglyph attacks, case-folding bypasses |
| Protocol Abuse Specialist | "Every protocol has a spec violation that becomes a vulnerability" | HTTP smuggling, TLS downgrade, WebSocket hijacking, gRPC manipulation, header injection |
| Auth/AuthZ Attacker | "Every auth check has a gap" | Token forgery, session fixation, privilege escalation, IDOR, JWT algorithm confusion, missing auth on internal endpoints |
| Config Manipulation Specialist | "Every config value is attacker-controlled" | Env var injection, config race conditions, default-open permissions, exposed admin interfaces, debug modes |
| Injection Specialist | "Every input reaches a dangerous sink" | SQLi, command injection, template injection, path traversal, SSRF |
| Regression Hunter | "Every previous bypass was patched incorrectly" | Double bounty, double penalty. |
Post-convergence (MANDATORY): regression tests for EVERY finding involving detection bypass, auth bypass, protocol abuse, encoding tricks, or config manipulation. Each test reproduces the exact attack path. Not optional.
Performance preset. Purpose: find performance bottlenecks, memory waste, algorithmic inefficiency, and scalability cliffs.
Agent roles:
| Role | Lens | What it looks for |
|---|---|---|
| Allocation Hunter | "Every allocation is unnecessary" | Heap allocations in hot paths, allocations in loops, unnecessary copies, buffer reuse opportunities |
| Algorithmic Complexity Auditor | "Every loop hides an O(n²)" | Nested iterations, linear scans → hash lookups, repeated computations → cache, unbounded growth |
| I/O & Network Bottleneck Specialist | "Every I/O call blocks the world" | Sync I/O in async contexts, missing connection pooling, N+1 queries, unbatched calls, missing timeouts |
| Concurrency Efficiency Specialist | "Every lock is a bottleneck" | Lock contention on hot paths, over-serialization, mutex held during I/O, missing parallelism |
| Regression Hunter | "Every optimization was reverted" | Performance antipatterns, previously-fixed bottlenecks |
Post-convergence: for each finding, provide estimated impact (order of magnitude). Benchmark-backed findings get 1.5x bounty.
Custom preset. Define custom agent roles with project-specific lenses.
For each specialist agent, spawn with:
You are a {ROLE_NAME} participating in {PRESET_NAME}.
Your lens: "{LENS_DESCRIPTION}"
Your job is to find real {CATEGORY} issues in this codebase, not
plausible-looking noise. You are paid for valid findings and penalized
for false positives.
You choose your confidence tier for each finding:
CONFIRMED (with reproducing test, PoC, or concrete code path trace):
Critical: $10,000 | If false positive: -$10,000
High: $5,000 | If false positive: -$5,000
Medium: $2,000 | If false positive: -$2,000
Low: $1,000 | If false positive: -$1,000
PROBABLE (with code path analysis, no reproduction):
Critical: $5,000 | If false positive: -$2,500
High: $2,500 | If false positive: -$1,250
Medium: $1,000 | If false positive: -$500
Low: $500 | If false positive: -$250
SUSPICIOUS (flag for review, uncertainty acknowledged):
Any severity: $500 flat | No penalty
{NUM_COMPETITORS} other specialists are competing against you. An
independent triage agent validates every finding — it loses $15,000
for each false positive it lets through, so it will reject anything
that doesn't hold up.
Your running balance: $0
All fixes are committed to the audit branch, never to main.
{IF ROUND 2+: "The previous round was sloppy and missed things.
The agents were overconfident and under-thorough. Do better.
Check whether the previous round's fixes introduced new issues."}
Context you receive (skip any files that don't exist — this is normal on first runs):
- Source code: {SCOPE}
- .correctless/ARCHITECTURE.md
- .correctless/AGENT_CONTEXT.md
- .correctless/antipatterns.md
- Previous Olympics findings: .correctless/artifacts/findings/audit-{preset}-history.md
- QA findings from TDD: .correctless/artifacts/qa-findings-*.json
For each finding, submit:
- Tier: confirmed | probable | suspicious
- Severity: critical | high | medium | low
- Title: short description
- Description: what's wrong and why it matters
- Evidence: (tier-appropriate)
- Impact: what happens if exploited/triggered in production
- Location: file(s), function(s), line range(s)
- Invariant ref: INV-xxx or R-xxx if related to a spec rule
- Instance fix: what fixes this specific bug
- Class fix: what prevents this category from recurring
Orchestrator: before spawning each agent, you MUST resolve `{SCOPE}` to a concrete file list: `git diff --name-only main...HEAD` for changed scope, or `find . -name '*.go' -o -name '*.ts'` (etc.) for full scope. Pass the file list, not a placeholder (see the sketch after the Regression Hunter prompt below).

When spawning the Regression Hunter, add:
You are the Regression Hunter. Your sole job is to check whether
previously-fixed issues have reappeared.
You receive:
- .correctless/antipatterns.md
- Previous Olympics findings: .correctless/artifacts/findings/audit-{preset}-history.md
- QA findings from TDD: .correctless/artifacts/qa-findings-*.json
DOUBLE bounty for confirmed regressions:
Critical: $20,000 | If false positive: -$20,000
High: $10,000 | If false positive: -$10,000
Standard rates for net-new findings.
You will be fired if a known pattern recurs and you don't find it.
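Back at the orchestrator level, a minimal sketch of the scope resolution required before spawning (the `node_modules` exclusion is an added assumption):

```bash
# Resolve {SCOPE} to a concrete file list before spawning agents
if [ "$scope" = "changed" ]; then
  files=$(git diff --name-only main...HEAD)
else
  files=$(find . \( -name '*.go' -o -name '*.ts' \) -not -path './node_modules/*')
fi
printf '%s\n' "$files"
```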
Write per-round findings to .correctless/artifacts/findings/audit-{preset}-{date}-round-{N}.json (date format: ISO 8601 YYYY-MM-DD).
Note: This schema differs from /ctdd QA findings (.correctless/artifacts/qa-findings-*.json). Olympics findings have tier, agent, bounty fields. TDD QA findings have severity (BLOCKING/NON-BLOCKING) and rule_ref. Consuming agents must handle both schemas.
```json
{
  "preset": "qa",
  "date": "2026-03-29",
  "round": 1,
  "findings": [
    {
      "id": "QA-001",
      "severity": "critical",
      "tier": "confirmed",
      "agent": "concurrency-specialist",
      "title": "Race in connection pool resize",
      "description": "what's wrong",
      "evidence": "reproducing test or code path trace",
      "impact": "production consequence",
      "location": {"file": "pool.go", "lines": [45, 67]},
      "invariant_ref": "INV-003 or null",
      "instance_fix": "add mutex around resize",
      "class_fix": "structural test for all pool operations under concurrent access",
      "status": "open",
      "bounty": 10000
    }
  ],
  "rejected": [
    {
      "id": "QA-005",
      "reason": "false positive — lock already covers this path",
      "penalty_applied": -5000
    }
  ]
}
```
Also maintain persistent history at .correctless/artifacts/findings/audit-{preset}-history.md. Read the existing file first, then use Edit to append the new run — do NOT use Write, which would overwrite all previous history:
```markdown
# {Preset} Olympics Findings — {Project}

## Run: {date}

### Round 1
| ID | Severity | Tier | Title | Status | Fixed in |
|----|----------|------|-------|--------|----------|
| QA-001 | critical | confirmed | Race in pool resize | fixed | abc123 |

### Round 2
Zero findings. Converged.

### Regression tests added
- QA-001: test/pool_race_test.go

## Recurring Patterns
(patterns appearing across runs — these are architectural issues)
- Connection pool lifecycle: QA-001, QA-014, QA-027 across 3 runs
  → Consider architectural change, not just fixes
```
When a finding category recurs across runs, it's a systemic issue that belongs in .correctless/ARCHITECTURE.md as a design constraint.
Update `.correctless/antipatterns.md` with new entries for each finding class, then run `.correctless/hooks/workflow-advance.sh audit-done`. If any finding category appeared in 2+ previous audit runs (check `.correctless/artifacts/findings/audit-*-history.md`), append to the `## Correctless Learnings` section of CLAUDE.md:
```markdown
### {date} — Audit pattern: {finding category}
- Recurs across {N} audit runs — always check {description of what to look for}
- Source: /caudit {preset}
```
Before appending, read the existing Correctless Learnings section. If this audit pattern is already recorded with the same category, skip. If the ## Correctless Learnings section doesn't exist in CLAUDE.md, create it with the header before appending.
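A grep-based sketch of the recurrence check (a heuristic; exact-string category matching and the category value are assumptions):

```bash
category="connection pool lifecycle"
# Count history files that mention this category (approximates prior runs)
hits=$(grep -ril "$category" .correctless/artifacts/findings/audit-*-history.md 2>/dev/null | wc -l)
if [ "$hits" -ge 2 ]; then
  echo "Recurring pattern ($hits files): append to ## Correctless Learnings in CLAUDE.md"
fi
```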
Feed to /cdevadv: if a learning category has appeared 3+ times, note it as a candidate for devil's advocate analysis — it may indicate an architectural issue, not just recurring bugs.
See "Progress Visibility" section above — task creation and round-by-round narration are mandatory.
Context enforcement (mandatory): Between rounds, check context usage. Each round's agents run forked (clean context), but the orchestrator must stay coherent to manage convergence. If above 70%: "Context at {N}%. Spawning round {N+1} agents in fresh context, but convergence tracking may degrade. Run /compact for reliable convergence." If above 85%: "Context critically full. Stopping audit. Run /compact and re-run /caudit — the checkpoint resumes from round {N}."
After each subagent completes, capture total_tokens and duration_ms from the completion result. Append an entry to .correctless/artifacts/token-log-{slug}.json (derive slug from the preset and date):
```json
{
  "skill": "caudit",
  "phase": "{round-N-{agent-role}|round-N-triage}",
  "agent_role": "{specialist-role|triage}",
  "total_tokens": N,
  "duration_ms": N,
  "timestamp": "ISO"
}
```
If the file doesn't exist, create it with the first entry. /cmetrics aggregates from raw entries — no totals field needed.
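A sketch of the append, assuming the log file holds a JSON array of entries (the entry values are illustrative; assumes `jq`):

```bash
log=.correctless/artifacts/token-log-qa-2026-03-29.json
entry='{"skill":"caudit","phase":"round-1-triage","agent_role":"triage","total_tokens":52340,"duration_ms":184000,"timestamp":"2026-03-29T14:02:11Z"}'
[ -f "$log" ] || echo '[]' > "$log"   # create the file on first use
jq --argjson e "$entry" '. + [$e]' "$log" > "$log.tmp" && mv "$log.tmp" "$log"
```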
After each round's agents complete and triage finishes, print: "Round {N} complete. {M} findings. Running token cost: ~{total}k tokens. Continue to round {N+1}?" This gives the user cost visibility to decide whether to continue.
After each round, report: findings found, findings fixed, findings rejected, cumulative rounds. The human can decide whether to continue or stop.
If mcp.serena is true in workflow-config.json, use Serena MCP for symbol-level code analysis. Each specialist agent uses Serena scoped to its domain:
Concurrency specialist: Use find_symbol to locate synchronization primitives (Mutex, RWMutex, chan, sync.WaitGroup, atomic). Use find_referencing_symbols on shared data structures to trace concurrent access.
Error handling auditor: Use search_for_pattern for error-returning functions and error suppression patterns (empty catch blocks, ignored error returns).
Resource lifecycle tracker: Use find_symbol for types implementing Close, Dispose, or cleanup interfaces. Use find_referencing_symbols to verify every allocation site has a corresponding cleanup.
- Use `find_symbol` instead of grepping for function/type names
- Use `find_referencing_symbols` to trace callers and dependencies
- Use `get_symbols_overview` for a structural overview of a module
- Use `replace_symbol_body` for precise edits during fix phases
- Use `search_for_pattern` for regex searches with symbol context
Fallback table — if Serena is unavailable, fall back silently to text-based equivalents:
| Serena Operation | Fallback |
|---|---|
| `find_symbol` | Grep for function/type name |
| `find_referencing_symbols` | Grep for symbol name across source files |
| `get_symbols_overview` | Read directory + read index files |
| `replace_symbol_body` | Edit tool |
| `search_for_pattern` | Grep tool |
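For example, a grep-based stand-in for `find_symbol` when Serena is unavailable (the symbol name is illustrative):

```bash
# Locate a function/type definition by name across Go and TypeScript sources
grep -rn --include='*.go' --include='*.ts' -E '(func|type|class|interface)[[:space:]]+ConnectionPool' .
```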
Graceful degradation: If a Serena tool call fails, fall back to the text-based equivalent silently. Do not abort, do not retry, do not warn the user mid-operation. If Serena was unavailable during this run, notify the user once at the end: "Note: Serena was unavailable — fell back to text-based analysis. If this persists, check that the Serena MCP server is running (uvx serena-mcp-server)." Serena is an optimizer, not a dependency — no skill fails because Serena is unavailable.
There is no automatic round-level resume beyond the checkpoint mechanism above: without a valid checkpoint, re-running `/caudit` restarts from Round 1. Prior round findings are persisted in `.correctless/artifacts/findings/` and provide context, so the skill re-reads them and avoids re-reporting already-fixed issues. When the audit converges, run `workflow-advance.sh audit-done` to mark it complete and move on. If findings need to be shared or published, consult `templates/redaction-rules.md` first.