From superpowers-plus
Simulates three escalating critic personas for adversarial review of non-code deliverables such as plans, specs, designs, and documents. Scores correctness, simplicity, testability, edge cases, and security; rejects if the weighted score falls below 7. Invoke before presenting work.

Install via:

```bash
npx claudepluginhub bordenet/superpowers-plus --plugin superpowers-plus
```

This skill uses the workspace's default tool permissions.
> **Wrong skill?** Code PR review → `progressive-code-review-gate`. File-protocol review → `code-review-respond`. Quick feedback → `providing-code-review`.
**Purpose:** Multi-persona adversarial review that catches what self-review cannot.

**Pattern:** Three escalating critic personas, each scoring independently.
Announce at start: "I'm using the progressive-harsh-review skill to red-team this work."
Intent-based (self-fire — do not wait to be asked):
Explicit request:
NOT for:
- `code-review-battery` instead
- `debate` handles that

### Persona 1: Nitpicker

- **Focus:** typos, formatting, naming, style, obvious bugs, missing error handling.
- **Tone:** eager, thorough, detail-oriented.
- **Full access:** all codebase context is available; every persona may follow any lead.
- **Start from:** the local diff — line-by-line reading of changed code.
- **Prioritize:** surface correctness, naming consistency, null handling, off-by-one errors, error message quality.
- **Dimension weights:** Correctness 35%, Simplicity 25%, Edge Cases 20%, Testability 15%, Security/Perf 5%.
### Persona 2: SeniorArchCritic

- **Focus:** architecture, design patterns, separation of concerns, extensibility, testability.
- **Tone:** experienced, skeptical, pattern-aware.
- **Full access:** all codebase context is available; every persona may follow any lead.
- **Start from:** interface contracts and public APIs affected by the change; check downstream impact before scoring.
- **Prioritize:** ripple analysis across all callers and consumers, design pattern adherence, reversibility, extensibility.
- **Dimension weights:** Correctness 25%, Simplicity 15%, Edge Cases 15%, Testability 25%, Security/Perf 20%.
### Persona 3: ProdOpsHardass

- **Focus:** failure modes, edge cases, security, performance, monitoring, rollback.
- **Tone:** battle-scarred, worst-case thinker, "what breaks at 3 AM?"
- **Full access:** all codebase context is available; every persona may follow any lead.
- **Start from:** failure modes and state transitions; trace error handling paths and retry/rollback behavior first.
- **Prioritize:** operational safety, monitoring/logging coverage, backward compatibility, deployment risk, 3 AM resilience.
- **Dimension weights:** Correctness 25%, Simplicity 10%, Edge Cases 25%, Testability 10%, Security/Perf 30%.
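The three weight tables above can be captured in one small data structure. A minimal sketch (persona names and weights come from the definitions above; the structure itself is illustrative):

```python
# Per-persona dimension weights, as percentages, taken from the persona definitions.
PERSONA_WEIGHTS = {
    "Nitpicker": {
        "Correctness": 35, "Simplicity": 25, "Edge Cases": 20,
        "Testability": 15, "Security/Perf": 5,
    },
    "SeniorArchCritic": {
        "Correctness": 25, "Simplicity": 15, "Edge Cases": 15,
        "Testability": 25, "Security/Perf": 20,
    },
    "ProdOpsHardass": {
        "Correctness": 25, "Simplicity": 10, "Edge Cases": 25,
        "Testability": 10, "Security/Perf": 30,
    },
}

# Sanity check: each persona covers all five dimensions and its weights sum to 100.
for persona, weights in PERSONA_WEIGHTS.items():
    assert len(weights) == 5 and sum(weights.values()) == 100, persona
```

Keeping the weights in data rather than prose makes the sum-to-100 invariant checkable before any scoring run.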
HARD GATE: Author ≠ Reviewer. Use a sub-agent or explicit role switch.
For each persona, answer ALL scoring dimensions:
| Dimension | Weight | Question |
|---|---|---|
| Correctness | 30% | Does it do what it claims? Are there bugs? |
| Simplicity | 20% | Is it the simplest solution? Over-engineered? |
| Testability | 15% | Can each component be tested independently? |
| Edge Cases | 20% | What breaks? What wasn't considered? |
| Security/Perf | 15% | Vulnerabilities? Performance bottlenecks? |
Each persona scores 1-10 on each dimension. Aggregation rule: take the weighted mean across all three personas (equal persona weight by default). This replaces the previous MINIMUM rule, which was overly pessimistic when one persona was mismatched to the task.
Critical veto: If ANY persona scores Correctness or Security/Perf ≤4 AND cites a specific defect (not a general concern), that finding acts as a hard veto — automatic REJECT regardless of the weighted mean. This preserves safety without making the whole system hostage to the weakest persona on non-critical dimensions.
| Weighted Mean | Verdict | Action |
|---|---|---|
| ≥8 | PASS | Ship it |
| 7–7.9 | PASS_WITH_FIXES | Fix all findings, re-score changed areas only. Exit when mean ≥8 or Round 2 finds no new issues. |
| <7 | REJECT | Root-cause analysis → remediate → full re-review |
| Any | REJECT (veto) | Critical veto fired — fix the cited defect, full re-review |
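The aggregation rule, critical veto, and verdict table can be sketched as a single function. This is a simplified sketch: it applies the default dimension weights from the scoring table to every persona (the full skill uses per-persona weights), and all names and signatures are illustrative:

```python
# Default dimension weights (percent) from the scoring table.
DEFAULT_WEIGHTS = {"Correctness": 30, "Simplicity": 20, "Testability": 15,
                   "Edge Cases": 20, "Security/Perf": 15}

def verdict(persona_scores, cited_defects=()):
    """persona_scores: {persona: {dimension: score 1-10}}.
    cited_defects: (persona, dimension) pairs where the persona cited a
    specific defect, not just a general concern."""
    cited = set(cited_defects)
    # Critical veto: Correctness or Security/Perf <= 4 plus a cited defect.
    for persona, scores in persona_scores.items():
        for dim in ("Correctness", "Security/Perf"):
            if scores[dim] <= 4 and (persona, dim) in cited:
                return "REJECT (veto)"
    # Weighted mean per persona, then equal-weight mean across personas.
    means = [sum(scores[d] * w for d, w in DEFAULT_WEIGHTS.items()) / 100
             for scores in persona_scores.values()]
    mean = sum(means) / len(means)
    if mean >= 8:
        return "PASS"
    if mean >= 7:
        return "PASS_WITH_FIXES"
    return "REJECT"
```

Note how the veto short-circuits before aggregation: a cited critical defect rejects even when the other personas' scores would pull the mean above 8.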
On REJECT:
- `debate` (generate alternatives)
- `think-twice` (fresh perspective)
- `plan-and-execute` (replan)

After scoring, scan persona outputs for shared blind spots:

- **⚠️ CORRELATED EVIDENCE.** At least one persona must re-examine from a different starting point (Nitpicker: local diff, ArchCritic: interface contracts, ProdOps: failure modes).
- **⚠️ ECHO REASONING.** Require the echoing persona to restate the finding through their own analytical lens.

Flags trigger re-examination, not automatic verdict changes.
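One cheap way to surface echo reasoning is textual similarity between findings from different personas. A heuristic sketch, not part of the skill itself; the 0.85 threshold is an assumption:

```python
from difflib import SequenceMatcher

def echo_flags(findings, threshold=0.85):
    """findings: {persona: finding text}. Returns persona pairs whose findings
    are near-duplicates and should be restated through independent lenses."""
    flagged = []
    personas = list(findings)
    for i, a in enumerate(personas):
        for b in personas[i + 1:]:
            ratio = SequenceMatcher(None, findings[a].lower(),
                                    findings[b].lower()).ratio()
            if ratio >= threshold:
                flagged.append((a, b))
    return flagged
```

A flagged pair triggers re-examination only; per the rule above, it never changes a verdict by itself.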
### Persona: SeniorArchCritic
| Dimension | Score | Finding |
|-----------|-------|---------|
| Correctness | 7 | Lock release doesn't check PID ownership |
| Simplicity | 8 | Clean, minimal |
| Testability | 6 | No unit tests for lock race condition |
| Edge Cases | 5 | What if process dies mid-lock? |
| Security/Perf | 7 | Lock file readable by any user |
**Weighted Average: 6.6 → REJECT** (below 7.0 threshold, using SeniorArchCritic's dimension weights)
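For reference, the weighted average can be recomputed from SeniorArchCritic's own dimension weights (weights from the persona definition; this is plain arithmetic, not skill output):

```python
# SeniorArchCritic's dimension weights (percent) and the example scores above.
weights = {"Correctness": 25, "Simplicity": 15, "Testability": 25,
           "Edge Cases": 15, "Security/Perf": 20}
scores = {"Correctness": 7, "Simplicity": 8, "Testability": 6,
          "Edge Cases": 5, "Security/Perf": 7}

weighted_avg = sum(scores[d] * w for d, w in weights.items()) / 100
print(round(weighted_avg, 2))  # 6.6 -> below the 7.0 threshold, so REJECT
```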
```bash
# Run the harsh review tool
cd ~/.codex/superpowers-plus && bash tools/harsh-review.sh

# Check specific skill
bash tools/harsh-review.sh skills/engineering/my-skill/skill.md
```
| Anti-Pattern | Detection | Correction |
|---|---|---|
| Soft review | No score <7 given | Recalibrate with known-bad example |
| Same feedback loop | Same comment 3 iterations | Escalate to structural fix |
| Style over substance | All comments are formatting | Check logic, edge cases, error handling first |
| Perfection paralysis | 5+ rounds, no convergence | Set hard limit: 3 rounds then ship |
| Missing context | Review without reading full file | Load surrounding context first |
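The "perfection paralysis" and "same feedback loop" corrections above amount to a hard round cap and an escalation check in the review loop. A sketch under those rules; the function names and return values are illustrative:

```python
MAX_ROUNDS = 3  # hard limit from the anti-pattern table: 3 rounds, then ship

def review_loop(run_review_round, apply_fixes):
    """run_review_round() -> (score, findings); apply_fixes(findings) remediates.
    Ships on a passing score or at the round cap; escalates to a structural
    fix when two consecutive rounds produce identical findings."""
    previous_findings = None
    for round_no in range(1, MAX_ROUNDS + 1):
        score, findings = run_review_round()
        if score >= 8:
            return ("ship", round_no)
        if findings == previous_findings:  # same feedback loop anti-pattern
            return ("escalate", round_no)
        apply_fixes(findings)
        previous_findings = findings
    return ("ship", MAX_ROUNDS)  # round cap reached: ship per the hard limit
```

The cap guarantees termination; the findings comparison turns a stalled loop into an explicit escalation instead of a fourth identical round.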
| Failure | Fix |
|---|---|
| Self-reviewed in same thinking pass | Use sub-agent or explicit role switch — author ≠ reviewer |
| All personas gave same feedback | Each persona must name ≥1 plausible failure mode unique to their lens, or cite a specific property of the change that explains why none exists (a generic dismissal is a rubber-stamp); identical findings mean the lenses aren't distinct |
| Score inflated to avoid re-work | Findings with concrete issues MUST score ≤7 on that dimension |
| Remediation skipped after REJECT | REJECT means start over. No "fix one thing and call it done" |
| Only reviewed happy path | ProdOpsHardass must consider failure, rollback, 3am scenarios |