From the correctless plugin.

Audits Claude Code/Correctless workflows for phase execution, rule coverage in QA/review, and agent thoroughness. Use when you suspect shortcuts, or after bugs escape despite a "completed" workflow.

Install with `npx claudepluginhub joshft/correctless --plugin correctless`.
Every other skill watches the code. This skill watches the agents watching the code. It answers: **"Did the workflow actually do what the skill instructions said it should do?"**
Invoke with `/cwtf` (analyzes the most recent or current workflow) or `/cwtf {phase}` (analyzes a specific phase).
This report identifies gaps, not blame. "QA checked 4 of 6 rules" is a fact, not an accusation. The QA agent may have had good reason — context overflow, rate limiting, or the 2 unchecked rules were trivially satisfied by the implementation. Present findings with context, not judgment. Let the user decide what matters.
Frame gaps as: "R-003 was not checked during QA. This may indicate context overflow, rate limiting, or that the agent prioritized higher-risk rules." NOT: "The QA agent FAILED to check R-003."
## Progress Visibility

Accountability analysis takes 5-10 minutes. The user must see progress throughout.
Before starting, create a task list with one entry per analysis step below (phase execution, rule coverage, QA thoroughness, review thoroughness, deviations, verdict).
Between each step, print a 1-line status: "Phase execution verified — all 7 phases ran. Checking rule coverage..." Mark each task complete as it finishes.
Derive the branch slug and hash using the same formula as other hooks (`sed` + `md5sum`/`md5`). Determine the repo root with `git rev-parse --show-toplevel` — prepend this to all relative paths for the Read tool.

Derive the task-slug from the workflow state's `.task` field: lowercase, non-alphanumeric characters replaced with `-`, consecutive dashes collapsed, leading/trailing dashes removed. This differs from the branch slug.
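A rough shell sketch of both derivations follows. The authoritative slug/hash formula lives in the other hooks; the exact `sed` expression and hash handling below are assumptions, so mirror the real hook code rather than this sketch.

```bash
repo_root="$(git rev-parse --show-toplevel)"
branch="$(git rev-parse --abbrev-ref HEAD)"

# Branch slug + hash: ASSUMED formula; copy the real one from the other hooks.
slug="$(printf '%s' "$branch" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g')"
hash="$(printf '%s' "$branch" | md5sum | cut -d' ' -f1)"   # macOS: pipe to md5 instead

# Task slug from the workflow state's .task field, per the rules above.
state="$repo_root/.correctless/artifacts/workflow-state-$slug-$hash.json"
task_slug="$(jq -r '.task' "$state" \
  | tr '[:upper:]' '[:lower:]' \
  | sed -e 's/[^a-z0-9]/-/g' -e 's/--*/-/g' -e 's/^-//' -e 's/-$//')"
```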
Read these data sources (skip any that don't exist):
- `.correctless/artifacts/workflow-state-{slug}-{hash}.json`
- The spec file named by the workflow state's `.spec_file` field (relative to repo root — prepend repo root for the Read tool)
- `.correctless/artifacts/qa-findings-{task-slug}.json`
- `.correctless/artifacts/tdd-test-edits.log`
- `.correctless/artifacts/audit-trail-{slug}-{hash}.jsonl`
- `.correctless/artifacts/override-log.json`
- `.correctless/verification/{task-slug}-verification.md`
- Session metadata: `find ~/.claude/usage-data/session-meta/ -name '*.json'`, filtered by `project_path` matching the repo root
- The conversation JSONL under `~/.claude/projects/`. List candidates with `find ~/.claude/projects/ -maxdepth 2 -name '*.jsonl'`, identify the correct file by matching the project path pattern in the directory name, and select the most recent file. This file can be very large — use targeted jq queries, never read it entirely.

If no workflow state file exists: "No active or completed workflow on this branch. Nothing to analyze."
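A minimal sketch of the bail-out check and path handling, reusing `$repo_root` and `$state` from the sketch above. The JSONL selection shown (newest file overall) is a simplification of the directory-name matching described above:

```bash
if [ ! -f "$state" ]; then
  echo "No active or completed workflow on this branch. Nothing to analyze."
  exit 0
fi

# .spec_file is relative to the repo root; prepend it before handing to Read.
spec_file="$repo_root/$(jq -r '.spec_file' "$state")"

# Conversation JSONL: newest file under ~/.claude/projects/ (simplified selection).
jsonl="$(find ~/.claude/projects/ -maxdepth 2 -name '*.jsonl' -print0 | xargs -0 ls -t | head -n1)"
```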
Extract the spec rules: grep the spec file for R-xxx or INV-xxx identifiers. Count them — this is the baseline for coverage checks.
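For example (the ID width is not assumed; any `R-`/`INV-` number matches):

```bash
# Distinct rule IDs in the spec; this count is the coverage baseline.
rules="$(grep -oE '(R|INV)-[0-9]+' "$spec_file" | sort -u)"
echo "$rules" | grep -c .
```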
Check whether all mandatory phases executed:
- At standard intensity: spec → review → tdd-tests → tdd-impl → tdd-qa → done → verified → documented
- At high+ intensity: spec → review-spec (or model → review-spec) → tdd-tests → tdd-impl → tdd-qa → (tdd-verify →) done → verified → documented
Primary source for phase history: the audit trail. The workflow state's `phase_entered_at` field only contains the MOST RECENT transition timestamp — it cannot prove earlier phases ran. Instead, extract distinct phase values from the audit trail: `jq -r '.phase' .correctless/artifacts/audit-trail-{slug}-{hash}.jsonl | sort -u`. This shows every phase that had tool activity. If no audit trail exists, fall back to the current phase field as a minimum marker and note: "No audit trail — can only verify current phase, not history."
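A sketch of the comparison at standard intensity. Note that purely administrative phases (done, verified, documented) may legitimately show no tool activity, so treat a missing entry as a prompt to investigate, not proof of a skip:

```bash
audit="$repo_root/.correctless/artifacts/audit-trail-$slug-$hash.jsonl"
ran="$(jq -r '.phase' "$audit" | sort -u)"
for p in spec review tdd-tests tdd-impl tdd-qa done verified documented; do
  echo "$ran" | grep -qx "$p" || echo "no recorded activity for phase: $p"
done
```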
Also check:
- `spec_update_history` (many updates suggest the spec was undercooked)

Report: "All {N} phases executed" or "Phase {X} was skipped via override: '{reason}'"
For each spec rule (R-xxx or INV-xxx):
- A test that references the rule ID (e.g., test names or assertions containing `R-001` or `"R-001"`). Use `patterns.test_file` from `workflow-config.json` to find test files.
- For tests tagged `[integration]`, check whether the test uses real wiring or mocks.

Output as a table:
| Rule | Test | QA Checked | Verify Status |
|------|------|-----------|---------------|
| R-001 | auth.test.ts:42 | Yes | covered |
| R-002 | — | NO | UNCOVERED |
Recommended action for gaps: "R-002 has no test. Run /ctdd from the tests phase to add coverage, or run /cverify to confirm this is a known gap."
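A sketch of the per-rule lookup, reusing `$rules` from the earlier sketch. The `*.test.ts` glob is a placeholder: substitute the real glob from `patterns.test_file` in `workflow-config.json`:

```bash
for rule in $rules; do
  # First test file mentioning the rule ID, or an em-dash if none exists.
  hit="$(grep -rl --include='*.test.ts' -- "$rule" "$repo_root" | head -n1)"
  printf '| %s | %s | ... | ... |\n' "$rule" "${hit:-—}"
done
```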
The most valuable analysis. Assess QA coverage and depth:
**Rule mention count:** Search the QA findings artifact AND the conversation JSONL for mentions of each rule ID. Count: "QA mentioned {N} of {M} spec rules." Missing rules are listed.

**Token budget indicator (best-effort):** Find the most recent session-meta entry matching this project's path. Report its total `output_tokens`. If multiple prior sessions exist for the project, compute a rough average and compare: "This session used {N}k tokens ({X}% of project average)." A session at 30% of average likely took shortcuts. Note: identifying which session corresponds to QA specifically is imprecise — this is a rough signal, not a measurement. If session-meta is unavailable, skip this metric.

**File coverage (if audit trail exists):** From the audit trail, which files had Read operations during the tdd-qa phase? Compare against files modified during tdd-impl. "QA read {N} of {M} files modified during implementation." Missing files are listed.
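A sketch of the first and third metrics, reusing `$rules`, `$jsonl`, `$audit`, and `$task_slug` from the earlier sketches. The `.tool` and `.file` field names in the audit-trail queries are assumptions; adjust them to the real schema:

```bash
qa_findings="$repo_root/.correctless/artifacts/qa-findings-$task_slug.json"

# Rule mention count across QA findings and the conversation JSONL.
for rule in $rules; do
  grep -q -- "$rule" "$qa_findings" "$jsonl" 2>/dev/null || echo "QA never mentioned: $rule"
done

# File coverage: modified during tdd-impl but never read during tdd-qa.
qa_read="$(jq -r 'select(.phase == "tdd-qa" and .tool == "Read") | .file' "$audit" | sort -u)"
impl_mod="$(jq -r 'select(.phase == "tdd-impl" and (.tool == "Edit" or .tool == "Write")) | .file' "$audit" | sort -u)"
comm -13 <(printf '%s\n' "$qa_read") <(printf '%s\n' "$impl_mod")
```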
Recommended action: "QA did not check R-003 or R-005. Consider: re-run QA with workflow-advance.sh fix then workflow-advance.sh qa, or manually verify these rules are satisfied."
Check whether the review agent was thorough:
Security checklist coverage: If the spec touches auth, user input, data storage, or APIs, the security checklist should have fired. Search conversation JSONL for security-related terms (CSRF, XSS, injection, auth bypass, SSRF, RLS, CORS, HSTS). Count how many categories were checked vs how many were applicable.
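For instance:

```bash
# Flag applicable security categories that the conversation never mentions.
for term in CSRF XSS injection 'auth bypass' SSRF RLS CORS HSTS; do
  grep -qiF -- "$term" "$jsonl" || echo "never mentioned: $term"
done
```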
Antipattern check: Did the review mention any antipattern IDs (AP-xxx)? If the project has `antipatterns.md` with entries, the review should have checked against them.
Recommended action: "Review did not check for CSRF despite the spec touching API endpoints. Run /creview again or verify CSRF protection manually."
Cross-reference what happened against what should have happened:
Source files modified during QA: The audit trail should show no Edit/Write operations on source files during tdd-qa phase. If it does: "Source file {file} was modified during QA phase at {timestamp}. This is a gate bypass — the QA agent should be read-only."
Test files modified during GREEN without logging: Compare audit trail (test file edits during tdd-impl) against the test-edit log. If the audit trail shows a test edit that the log doesn't mention: "Test file {file} was edited during GREEN at {timestamp} but not logged in tdd-test-edits.log."
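Both checks reduce to audit-trail queries. As before, `.tool`, `.file`, and `.timestamp` are assumed field names, and the GREEN-phase comparison should first be narrowed to test files via `patterns.test_file`:

```bash
# Gate bypass: any Edit/Write on source files during tdd-qa.
jq -r 'select(.phase == "tdd-qa" and (.tool == "Edit" or .tool == "Write"))
       | "\(.timestamp)  \(.file)"' "$audit"

# Unlogged test edits during GREEN (narrow to test files before comparing).
impl_edits="$(jq -r 'select(.phase == "tdd-impl" and (.tool == "Edit" or .tool == "Write")) | .file' "$audit" | sort -u)"
for f in $impl_edits; do
  grep -qF -- "$f" "$repo_root/.correctless/artifacts/tdd-test-edits.log" || echo "unlogged edit: $f"
done
```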
Overrides: List all overrides with their reasons and when they occurred relative to the workflow timeline.
Spec updates: If spec_updates > 0, list each update with its reason. Multiple updates suggest the spec wasn't thorough enough.
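Both lists come straight from the artifacts. The entry shapes shown (`timestamp`/`phase`/`reason`) are assumptions about the log schemas:

```bash
# Overrides with reasons and timing.
jq -r '.[] | "\(.timestamp // "?")  \(.phase // "?")  \(.reason // "?")"' \
  "$repo_root/.correctless/artifacts/override-log.json"

# Spec updates recorded in the workflow state.
jq -r '.spec_update_history[]? | "\(.timestamp // "?")  \(.reason // "?")"' "$state"
```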
Recommended action per deviation: specific, not vague. "Source file edited during QA — check if this was a legitimate fix round (should have used `workflow-advance.sh fix` first) or a gate bypass."
Assess the overall workflow quality and assign a verdict of THOROUGH, ADEQUATE, INCOMPLETE, or SHORTCUT.
Precedence rule: If any SHORTCUT criterion is met (phase skip via override, source edit during QA, token usage far below average), the verdict cannot be higher than SHORTCUT regardless of other positive signals. A single gate bypass dominates all other indicators.
The verdict is a judgment call within those constraints. Explain the reasoning. Users can disagree.
## Workflow Accountability Report
### Workflow: {task name}
**Branch:** {branch}
**Phases completed:** {list}
**QA rounds:** {N}
**Overrides used:** {N}
### Phase Execution: {PASS | {N} issues}
{details}
### Rule Coverage: {N}/{M} rules covered
| Rule | Test | QA Checked | Verify Status |
|------|------|-----------|---------------|
| R-001 | auth.test.ts:42 | Yes | covered |
| R-002 | — | NO | UNCOVERED |
### Agent Thoroughness
**QA**: Checked {N}/{M} rules. Token usage: {N}k ({X}% of average).
- {Missing rules with recommended actions}
**Review**: {N}/{M} security checks applied.
- {Skipped checks with recommended actions}
### Deviations ({N} found)
{Each deviation with timestamp, evidence, and recommended action}
### Verdict: {THOROUGH | ADEQUATE | INCOMPLETE | SHORTCUT}
{2-3 sentences explaining the assessment and what, if anything, should be done about it.}
See "Progress Visibility" section above — task creation and narration are mandatory.
If `mcp.serena` is true in `workflow-config.json`, use Serena MCP for symbol-level code analysis during thoroughness checking — particularly call-graph-based analysis:

- `find_symbol` instead of grepping for function/type names
- `find_referencing_symbols` to trace callers and dependencies for call-graph completeness
- `get_symbols_overview` for a structural overview of a module
- `replace_symbol_body` for precise edits (not used in this skill — wtf is read-only)
- `search_for_pattern` for regex searches with symbol context

Fallback table — if Serena is unavailable, fall back silently to text-based equivalents:
| Serena Operation | Fallback |
|---|---|
| `find_symbol` | Grep for function/type name |
| `find_referencing_symbols` | Grep for symbol name across source files |
| `get_symbols_overview` | Read directory + read index files |
| `replace_symbol_body` | Edit tool |
| `search_for_pattern` | Grep tool |
Graceful degradation: If a Serena tool call fails, fall back to the text-based equivalent silently. Do not abort, do not retry, do not warn the user mid-operation. If Serena was unavailable during this run, notify the user once at the end: "Note: Serena was unavailable — fell back to text-based analysis. If this persists, check that the Serena MCP server is running (`uvx serena-mcp-server`)." Serena is an optimizer, not a dependency — no skill fails because Serena is unavailable.
- Use targeted `grep` and `jq` queries on the conversation JSONL: search for specific rule IDs, tool names, or phase-related keywords (examples below). Never read the entire file into memory.
- Prefer `grep` for rule IDs and `jq` for structured extraction; never `cat` the entire file.
- Read `templates/redaction-rules.md` first.
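A few illustrative probes: cheap `grep` counts plus one structured `jq` extraction. The `.type`/`.name` fields in the jq query are illustrative, not a guaranteed schema:

```bash
grep -c -- 'R-003' "$jsonl"                    # mention count for one rule
grep -oE 'AP-[0-9]+' "$jsonl" | sort -u        # antipattern IDs the review cited
# Structured extraction; field names are assumptions about the JSONL schema:
jq -r 'select(.type? == "tool_use") | .name' "$jsonl" 2>/dev/null | sort | uniq -c
```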