Orchestrates multi-agent team review of PHPUnit unit tests in 4 waves: independent analysis, peer debate, adversarial red team, and defense.
npx claudepluginhub shopwarelabs/ai-coding-tools --plugin test-writing
Wave-based orchestration: spawn agents per wave, collect outputs, assemble inputs for the next wave. You (the skill executor) act as team lead.
Run via Bash:
printenv CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS
If the output is NOT exactly 1, output the following and stop immediately:
Agent Teams is not enabled. Team-based review requires the experimental Agent Teams feature.
To enable it, add the following to the "env" section of ~/.claude/settings.json:
"CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
Then restart Claude Code and try again.
Then ask via AskUserQuestion: "Would you like to use the standard single-reviewer instead?"
Do not proceed to Phase 1.
Read references/input-resolution.md first — then follow its resolution strategies to build the file manifest. Do not run any git or file discovery commands before reading it.
Output: [{path}] — each entry is a validated test file. Let N = number of files in the manifest.
Calculate reviewer count R (a worked sketch follows the setup steps below):
if N == 1: R = 3
else: R = min(5, max(4, ceil(N * 3 / 5)))
Calculate adversary count A and compute file assignments per references/reviewer-allocation.md: round-robin for reviewers, partitioning for adversaries.
Call TeamCreate(team_name: "test-review", description: "PHPUnit test review — {R} reviewers + {A} adversaries")
No agents spawned yet. Agents are spawned per wave.
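A worked sketch of the allocation math, assuming each file is assigned to exactly 3 reviewers (implied by the 3 binding stances extracted in Phase 8). The exact round-robin and partitioning schemes, and the adversary count A, are defined in references/reviewer-allocation.md; treat this as illustrative, not authoritative:

import math

def plan_reviewers(manifest: list[str]) -> tuple[int, list[list[str]]]:
    """Compute R and an illustrative round-robin file assignment."""
    n = len(manifest)
    r = 3 if n == 1 else min(5, max(4, math.ceil(n * 3 / 5)))
    # Each file goes to 3 distinct reviewers so that Phase 8 can
    # extract 3 binding stances per file (r is always >= 3).
    assignments: list[list[str]] = [[] for _ in range(r)]
    for i, path in enumerate(manifest):
        for k in range(3):
            assignments[(i * 3 + k) % r].append(path)
    return r, assignments

# Worked example: N = 4 files -> R = min(5, max(4, ceil(12 / 5))) = 4,
# so each reviewer receives 3 files and every file has 3 reviewers.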
Spawn R reviewer agents + A adversary agents in a single message (parallel).
Agent names include the wave number as suffix (reviewer-{n}-{wave}) to avoid collisions within the team; for example, reviewer-2 is spawned as reviewer-2-0 in Wave 0 and reviewer-2-1 in Wave 1. Use the same reviewer-{n} identity in output contracts and co-reviewer references across waves.
For each reviewer:
Agent(
agent: "test-writing:test-reviewer",
team_name: "test-review",
name: "reviewer-{n}-0",
prompt: "Invoke Skill(test-writing:phpunit-unit-test-reviewing) for each of your assigned files.
Assigned files:
{for each file: - {path} (Category {category})}
After ALL reviews complete, return your combined findings for all files
using this format:
type: findings
reviewer: reviewer-{n}
files:
- path: {path}
category: {category}
findings: [{rule_id, enforce, location, summary, current, suggested}]"
)
For each adversary:
Agent(
agent: "test-writing:test-adversary",
team_name: "test-review",
name: "adversary-{n}-0",
prompt: "Read your assigned test files and their source classes (from #[CoversClass]).
Form intuitive impressions — what concerns you about these tests?
Assigned files:
{for each file: - {path} (Category {category})}
Use these heuristic lenses (do NOT use MCP rule tools):
- Absence detection: what's NOT tested that you'd expect?
- Consequence weighting: which gaps would cause the most production damage?
- Dependency fan-out: which shared assumptions could mask bugs?
- Pattern anomalies: inconsistencies in style, mocking, assertions?
- The 'surprised?' test: if the test passed but behavior was broken, would you be surprised?
Return your impressions per file:
impressions:
- file_path: {path}
concerns:
- area: 'description'
severity: high | medium | low"
)
Wait for all agents to complete. Collect findings and impressions.
For each reviewer, assemble the debate input (own findings, peer findings on shared files, and the co-reviewer roster), then spawn R reviewer agents in a single message (parallel):
Agent(
agent: "test-writing:test-reviewer",
team_name: "test-review",
name: "reviewer-{n}-1",
prompt: "Invoke Skill(test-writing:phpunit-unit-test-debating) with this input.
Own findings:
[reviewer's Wave 0 findings]
Peer findings:
[per co-reviewer, their findings on shared files]
Co-reviewers (use these names for SendMessage):
[list of {name: reviewer-{m}-1, shared_files}]
Debate with your co-reviewers via SendMessage, then return your final stance."
)
Wait for all agents to complete. Collect final stances.
Evaluate skip conditions per references/red-team-context.md using Wave 1 final stances:
If skipped, proceed directly to Phase 8. Use Wave 1 final stances as binding input.
For each file, merge Wave 1 final stances into a preliminary consensus (same logic as Phase 8 merge, but intermediate)
Assemble the context package for each adversary per references/red-team-context.md: consensus findings, withdrawn findings with reasons, and debate evidence per file.
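As a rough illustration of the package shape (field names follow the description above; the authoritative schema is references/red-team-context.md, and every concrete value here is hypothetical):

# Hypothetical per-adversary context package; all values are illustrative.
context_package = {
    "files": [
        {
            "path": "tests/Unit/ExampleServiceTest.php",
            "consensus_findings": [
                {"rule_id": "...", "enforce": "...", "location": "...", "summary": "..."},
            ],
            "withdrawn_findings": [
                {"rule_id": "...", "withdrawn_by": "reviewer-2", "reason": "..."},
            ],
            "debate_evidence": ["..."],  # excerpts from the Wave 1 SendMessage debate
        },
    ],
}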
Spawn A adversary agents:
Agent(
agent: "test-writing:test-adversary",
team_name: "test-review",
name: "adversary-{n}-2",
prompt: "Invoke Skill(test-writing:phpunit-unit-test-adversarial-reviewing) with this input.
Consensus package:
[per-file context package as YAML]
Impressions from Wave 0:
[this adversary's Wave 0 impressions]
Return your challenges."
)
Wait. Collect challenges.
For each reviewer whose files received adversary challenges, assemble the defense input below, then spawn those reviewer agents:
Agent(
agent: "test-writing:test-reviewer",
team_name: "test-review",
name: "reviewer-{n}-3",
prompt: "Invoke Skill(test-writing:phpunit-unit-test-defending) with this input.
Own final stance:
[reviewer's Wave 1 final stance]
Adversary challenges:
[adversary challenges for this reviewer's files]
Return your defense stance."
)
Wait. Collect defense stances.
If the red team round ran (Phases 6-7), use Wave 3 defense stances as input. If skipped, use Wave 1 final stances.
For each file, extract the 3 binding stances from its assigned reviewers. For each unique (rule_id, location) pair, apply a majority vote across the 3 stances (a merge sketch follows these rules):
Location matching: match by rule_id first, then treat locations within a 5-line range of the same method as the same finding. Use the location from the majority if ambiguous.
Enforce level conflicts: if reviewers agree a violation exists but disagree on enforce level, use the majority enforce level and note the disagreement.
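A minimal sketch of this merge, assuming a 2-of-3 majority vote and a "method:line" location format; both are assumptions, and the finding fields mirror the Wave 0 output contract:

from collections import Counter

def parse_location(loc: str) -> tuple[str, int]:
    """Assumes locations look like "method:line" (illustrative; the real
    format comes from the reviewer output contract)."""
    method, _, line = loc.rpartition(":")
    return method, int(line)

def group_by_location(findings: list[dict]) -> list[list[dict]]:
    """Treat findings in the same method within a 5-line range as one finding."""
    groups: list[list[dict]] = []
    for f in sorted(findings, key=lambda f: parse_location(f["location"])):
        method, line = parse_location(f["location"])
        for group in groups:
            g_method, g_line = parse_location(group[0]["location"])
            if g_method == method and abs(g_line - line) <= 5:
                group.append(f)
                break
        else:
            groups.append([f])
    return groups

def merge_stances(stances: list[list[dict]]) -> list[dict]:
    """Merge the 3 binding stances for one file: a finding survives when
    at least 2 of the 3 assigned reviewers kept it."""
    by_rule: dict[str, list[dict]] = {}
    for findings in stances:
        for f in findings:
            by_rule.setdefault(f["rule_id"], []).append(f)
    consensus: list[dict] = []
    for findings in by_rule.values():
        for group in group_by_location(findings):
            if len(group) >= 2:  # majority of the 3 assigned reviewers
                levels = Counter(f["enforce"] for f in group)
                merged = dict(group[0])
                merged["enforce"] = levels.most_common(1)[0][0]  # majority level
                merged["enforce_disputed"] = len(levels) > 1  # note disagreement
                consensus.append(merged)
    return consensus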
After all per-file verdicts, scan for pattern divergences using cross_file_references from all debate outputs. Consistency findings are should-fix (warnings): they count toward NEEDS_ATTENTION but not ISSUES_FOUND.
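A sketch of the verdict rule this implies, assuming enforce levels named must-fix and should-fix and a hypothetical PASS label for clean files (the actual labels are defined in references/report-format.md):

def file_verdict(findings: list[dict]) -> str:
    """Illustrative only; the real verdict names live in report-format.md."""
    if any(f["enforce"] == "must-fix" for f in findings):
        return "ISSUES_FOUND"
    if findings:  # only should-fix warnings, e.g. consistency findings
        return "NEEDS_ATTENTION"
    return "PASS"  # hypothetical name for the clean verdict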
For each finding in the final report, assign an adversary_impact tag. When the red team round was skipped, all findings receive adversary_impact: unchanged.
Generate the report per references/report-format.md.
Call TeamDelete directly. Do NOT send a SendMessage to any agent or broadcast to "*"; agents have already completed and returned after each wave, so there is nothing to shut down.
On ALL exit paths (success, failure, partial failure), ensure TeamDelete is called.
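A minimal sketch of that guarantee, with the tool calls stubbed out as hypothetical Python functions (TeamCreate and TeamDelete are tool calls, not real Python APIs):

def team_create(name: str) -> None: ...  # stands in for the TeamCreate tool call
def team_delete(name: str) -> None: ...  # stands in for the TeamDelete tool call
def run_waves() -> None: ...             # Waves 0-3 plus the Phase 8 synthesis

def run_review() -> None:
    team_create("test-review")
    try:
        run_waves()  # may exit via success, failure, or partial failure
    finally:
        team_delete("test-review")  # always reached, even on exceptions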
For all error scenarios and recovery actions, see references/error-handling.md.