From superpowers-plus
Simulates three escalating critic personas for adversarial review of non-code deliverables such as plans, specs, designs, and documents. Scores correctness, simplicity, testability, edge cases, and security; rejects if the weighted score falls below 7. Invoke before presenting work.

Install via:

```bash
npx claudepluginhub bordenet/superpowers-plus --plugin superpowers-plus
```

This skill uses the workspace's default tool permissions.
> **Wrong skill?** Code PR review → `progressive-code-review-gate`. File-protocol review → `code-review-respond`. Quick feedback → `providing-code-review`.
**Purpose:** Multi-persona adversarial review that catches what self-review cannot.

**Pattern:** Three escalating critic personas, each scoring independently.
Announce at start: "I'm using the progressive-harsh-review skill to red-team this work."
Intent-based (self-fire — do not wait to be asked):
Explicit request:
NOT for:
- `code-review-battery` instead
- `debate` handles that

### Persona 1: Nitpicker

- **Focus:** typos, formatting, naming, style, obvious bugs, missing error handling.
- **Tone:** eager, thorough, detail-oriented.
- **Full access:** all codebase context is available; every persona may follow any lead.
- **Start from:** the local diff — line-by-line reading of changed code.
- **Prioritize:** surface correctness, naming consistency, null handling, off-by-one errors, error message quality.
- **Dimension weights:** Correctness 35%, Simplicity 25%, Edge Cases 20%, Testability 15%, Security/Perf 5%.
### Persona 2: SeniorArchCritic

- **Focus:** architecture, design patterns, separation of concerns, extensibility, testability.
- **Tone:** experienced, skeptical, pattern-aware.
- **Full access:** all codebase context is available; every persona may follow any lead.
- **Start from:** interface contracts and public APIs affected by the change; check downstream impact before scoring.
- **Prioritize:** ripple analysis across all callers and consumers, design pattern adherence, reversibility, extensibility.
- **Dimension weights:** Correctness 25%, Simplicity 15%, Edge Cases 15%, Testability 25%, Security/Perf 20%.
### Persona 3: ProdOpsHardass

- **Focus:** failure modes, edge cases, security, performance, monitoring, rollback.
- **Tone:** battle-scarred, worst-case thinker, "what breaks at 3 AM?"
- **Full access:** all codebase context is available; every persona may follow any lead.
- **Start from:** failure modes and state transitions; trace error handling paths and retry/rollback behavior first.
- **Prioritize:** operational safety, monitoring/logging coverage, backward compatibility, deployment risk, 3 AM resilience.
- **Dimension weights:** Correctness 25%, Simplicity 10%, Edge Cases 25%, Testability 10%, Security/Perf 30%.
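The three weight tables above can be captured in one small data structure. A minimal sketch (persona names and weights come from the definitions above; the structure itself is illustrative):

```python
# Per-persona dimension weights, as percentages, taken from the persona definitions.
PERSONA_WEIGHTS = {
    "Nitpicker": {
        "Correctness": 35, "Simplicity": 25, "Edge Cases": 20,
        "Testability": 15, "Security/Perf": 5,
    },
    "SeniorArchCritic": {
        "Correctness": 25, "Simplicity": 15, "Edge Cases": 15,
        "Testability": 25, "Security/Perf": 20,
    },
    "ProdOpsHardass": {
        "Correctness": 25, "Simplicity": 10, "Edge Cases": 25,
        "Testability": 10, "Security/Perf": 30,
    },
}

# Sanity check: each persona covers all five dimensions and its weights sum to 100.
for persona, weights in PERSONA_WEIGHTS.items():
    assert len(weights) == 5 and sum(weights.values()) == 100, persona
```

Keeping the weights in data rather than prose makes the sum-to-100 invariant checkable before any scoring run.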
HARD GATE: Author ≠ Reviewer. Use a sub-agent or explicit role switch.
For each persona, answer ALL scoring dimensions:
| Dimension | Weight | Question |
|---|---|---|
| Correctness | 30% | Does it do what it claims? Are there bugs? |
| Simplicity | 20% | Is it the simplest solution? Over-engineered? |
| Testability | 15% | Can each component be tested independently? |
| Edge Cases | 20% | What breaks? What wasn't considered? |
| Security/Perf | 15% | Vulnerabilities? Performance bottlenecks? |
Each persona scores 1-10 on each dimension. Aggregation rule: take the weighted mean across all three personas (equal persona weight by default). This replaces the previous MINIMUM rule, which was overly pessimistic when one persona was mismatched to the task.
Critical veto: If ANY persona scores Correctness or Security/Perf ≤4 AND cites a specific defect (not a general concern), that finding acts as a hard veto — automatic REJECT regardless of the weighted mean. This preserves safety without making the whole system hostage to the weakest persona on non-critical dimensions.
| Weighted Mean | Verdict | Action |
|---|---|---|
| ≥8 | PASS | Ship it |
| 7–7.9 | PASS_WITH_FIXES | Fix all findings, re-score changed areas only. Exit when mean ≥8 or Round 2 finds no new issues. |
| <7 | REJECT | Root-cause analysis → remediate → full re-review |
| Any | REJECT (veto) | Critical veto fired — fix the cited defect, full re-review |
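The aggregation rule, critical veto, and verdict table can be sketched as a single function. This is a simplified sketch: it applies the default dimension weights from the scoring table to every persona (the full skill uses per-persona weights), and all names and signatures are illustrative:

```python
# Default dimension weights (percent) from the scoring table.
DEFAULT_WEIGHTS = {"Correctness": 30, "Simplicity": 20, "Testability": 15,
                   "Edge Cases": 20, "Security/Perf": 15}

def verdict(persona_scores, cited_defects=()):
    """persona_scores: {persona: {dimension: score 1-10}}.
    cited_defects: (persona, dimension) pairs where the persona cited a
    specific defect, not just a general concern."""
    cited = set(cited_defects)
    # Critical veto: Correctness or Security/Perf <= 4 plus a cited defect.
    for persona, scores in persona_scores.items():
        for dim in ("Correctness", "Security/Perf"):
            if scores[dim] <= 4 and (persona, dim) in cited:
                return "REJECT (veto)"
    # Weighted mean per persona, then equal-weight mean across personas.
    means = [sum(scores[d] * w for d, w in DEFAULT_WEIGHTS.items()) / 100
             for scores in persona_scores.values()]
    mean = sum(means) / len(means)
    if mean >= 8:
        return "PASS"
    if mean >= 7:
        return "PASS_WITH_FIXES"
    return "REJECT"
```

Note how the veto short-circuits before aggregation: a cited critical defect rejects even when the other personas' scores would pull the mean above 8.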
On REJECT:
- `debate` (generate alternatives)
- `think-twice` (fresh perspective)
- `plan-and-execute` (replan)

After scoring, scan persona outputs for shared blind spots:

- **⚠️ CORRELATED EVIDENCE.** At least one persona must re-examine from a different starting point (Nitpicker: local diff, ArchCritic: interface contracts, ProdOps: failure modes).
- **⚠️ ECHO REASONING.** Require the echoing persona to restate the finding through their own analytical lens.

Flags trigger re-examination, not automatic verdict changes.
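One cheap way to surface echo reasoning is textual similarity between findings from different personas. A heuristic sketch, not part of the skill itself; the 0.85 threshold is an assumption:

```python
from difflib import SequenceMatcher

def echo_flags(findings, threshold=0.85):
    """findings: {persona: finding text}. Returns persona pairs whose findings
    are near-duplicates and should be restated through independent lenses."""
    flagged = []
    personas = list(findings)
    for i, a in enumerate(personas):
        for b in personas[i + 1:]:
            ratio = SequenceMatcher(None, findings[a].lower(),
                                    findings[b].lower()).ratio()
            if ratio >= threshold:
                flagged.append((a, b))
    return flagged
```

A flagged pair triggers re-examination only; per the rule above, it never changes a verdict by itself.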
### Persona: SeniorArchCritic
| Dimension | Score | Finding |
|-----------|-------|---------|
| Correctness | 7 | Lock release doesn't check PID ownership |
| Simplicity | 8 | Clean, minimal |
| Testability | 6 | No unit tests for lock race condition |
| Edge Cases | 5 | What if process dies mid-lock? |
| Security/Perf | 7 | Lock file readable by any user |
**Weighted Average: 6.6 → REJECT** (below 7.0 threshold, using SeniorArchCritic's dimension weights)
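For reference, the weighted average can be recomputed from SeniorArchCritic's own dimension weights (weights from the persona definition; this is plain arithmetic, not skill output):

```python
# SeniorArchCritic's dimension weights (percent) and the example scores above.
weights = {"Correctness": 25, "Simplicity": 15, "Testability": 25,
           "Edge Cases": 15, "Security/Perf": 20}
scores = {"Correctness": 7, "Simplicity": 8, "Testability": 6,
          "Edge Cases": 5, "Security/Perf": 7}

weighted_avg = sum(scores[d] * w for d, w in weights.items()) / 100
print(round(weighted_avg, 2))  # 6.6 -> below the 7.0 threshold, so REJECT
```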
```bash
# Run the harsh review tool
cd ~/.codex/superpowers-plus && bash tools/harsh-review.sh

# Check specific skill
bash tools/harsh-review.sh skills/engineering/my-skill/skill.md
```
| Anti-Pattern | Detection | Correction |
|---|---|---|
| Soft review | No score <7 given | Recalibrate with known-bad example |
| Same feedback loop | Same comment 3 iterations | Escalate to structural fix |
| Style over substance | All comments are formatting | Check logic, edge cases, error handling first |
| Perfection paralysis | 5+ rounds, no convergence | Set hard limit: 3 rounds then ship |
| Missing context | Review without reading full file | Load surrounding context first |
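The "perfection paralysis" and "same feedback loop" corrections above amount to a hard round cap and an escalation check in the review loop. A sketch under those rules; the function names and return values are illustrative:

```python
MAX_ROUNDS = 3  # hard limit from the anti-pattern table: 3 rounds, then ship

def review_loop(run_review_round, apply_fixes):
    """run_review_round() -> (score, findings); apply_fixes(findings) remediates.
    Ships on a passing score or at the round cap; escalates to a structural
    fix when two consecutive rounds produce identical findings."""
    previous_findings = None
    for round_no in range(1, MAX_ROUNDS + 1):
        score, findings = run_review_round()
        if score >= 8:
            return ("ship", round_no)
        if findings == previous_findings:  # same feedback loop anti-pattern
            return ("escalate", round_no)
        apply_fixes(findings)
        previous_findings = findings
    return ("ship", MAX_ROUNDS)  # round cap reached: ship per the hard limit
```

The cap guarantees termination; the findings comparison turns a stalled loop into an explicit escalation instead of a fourth identical round.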
| Failure | Fix |
|---|---|
| Self-reviewed in same thinking pass | Use sub-agent or explicit role switch — author ≠ reviewer |
| All personas gave same feedback | Each persona must name ≥1 plausible failure mode unique to their lens, or cite a specific property of the change that explains why none exists (a generic dismissal is a rubber-stamp); identical findings mean the lenses aren't distinct |
| Score inflated to avoid re-work | Findings with concrete issues MUST score ≤7 on that dimension |
| Remediation skipped after REJECT | REJECT means start over. No "fix one thing and call it done" |
| Only reviewed happy path | ProdOpsHardass must consider failure, rollback, 3am scenarios |