Multi-model PR review using the 3 Amigos pattern: Claude Sonnet (architect), Gemini 2.5 Pro (bug hunter), DeepSeek (DX reviewer), orchestrated via Mastra. Use when reviewing a PR that needs multi-perspective analysis, when a single-model review has missed bugs before, or when the PR touches critical paths (auth, data layer, proto). Do NOT use Claude Code Max OAuth for these — API keys only, due to ToS. See docs/RATIONALE.md for the design decision.
npx claudepluginhub toru-oizumi/claude-harness-engineering --plugin harness-engineering
This skill uses the workspace's default tool permissions.
A PR review agent that runs three specialized reviewers in parallel, each on a different model, then merges their findings into a single structured review comment. The design rationale is in `docs/RATIONALE.md`.
| Role | Model | Focus |
|---|---|---|
| Architect Reviewer | Claude Sonnet (API) | Design principles, dependency direction, layer violations, abstraction level |
| Bug Hunter | Gemini 2.5 Pro (API) | Race conditions, null/nil handling, edge cases, security, error swallowing |
| DX Reviewer | DeepSeek Chat (API) | Naming, testability, readability, test coverage, public API clarity |
Each runs independently on the full PR diff plus repo context. By design, the reviewers do not see each other's output, which avoids groupthink.
                 ┌─────────────────────┐
                 │  GitHub webhook /   │
                 │ /harness:review-pr  │
                 └──────────┬──────────┘
                            │
                     ┌──────▼──────┐
                     │   Mastra    │  (Supervisor)
                     │ Orchestrator│
                     └──────┬──────┘
                            │ parallel
           ┌────────────────┼────────────────┐
           ▼                ▼                ▼
    ┌────────────┐   ┌────────────┐   ┌────────────┐
    │ Architect  │   │ Bug Hunter │   │ DX Review  │
    │  (Sonnet)  │   │  (Gemini)  │   │ (DeepSeek) │
    └──────┬─────┘   └──────┬─────┘   └──────┬─────┘
           │                │                │
           └────────────────┼────────────────┘
                            ▼
                     ┌────────────┐
                     │   Merger   │  (Claude Sonnet)
                     │  + Dedup   │
                     └──────┬─────┘
                            ▼
                Structured review comment
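The fan-out step in the diagram can be sketched in plain TypeScript. This is a minimal illustration, not the plugin's implementation and not Mastra's API: the `Role`, `Finding`, and `Reviewer` types and the `runReviewers` function are hypothetical, and tolerating a single failed reviewer is an assumption rather than documented behavior.

```typescript
// Hypothetical types; the plugin's real data shapes are not documented here.
type Role = "architect" | "bug-hunter" | "dx";

interface Finding {
  role: Role;
  message: string;
}

// A reviewer is any async function from the PR diff to findings.
// In a real deployment each one would call a different model API.
type Reviewer = (diff: string) => Promise<Finding[]>;

interface FanOutResult {
  findings: Finding[];
  failedRoles: Role[];
}

// Run all reviewers concurrently. Each receives only the diff, never
// another reviewer's output, so the reviews stay independent.
async function runReviewers(
  reviewers: Record<Role, Reviewer>,
  diff: string,
): Promise<FanOutResult> {
  const roles = Object.keys(reviewers) as Role[];
  const outcomes = await Promise.all(
    roles.map(async (role) => {
      try {
        return { role, ok: true, findings: await reviewers[role](diff) };
      } catch {
        // Assumption: one failing model should not sink the whole review.
        return { role, ok: false, findings: [] as Finding[] };
      }
    }),
  );

  const findings: Finding[] = [];
  const failedRoles: Role[] = [];
  for (const outcome of outcomes) {
    if (outcome.ok) {
      findings.push(...outcome.findings);
    } else {
      failedRoles.push(outcome.role);
    }
  }
  return { findings, failedRoles };
}
```

The combined `findings` array is what a merger step would receive. `failedRoles` lets the final comment say which perspective is missing instead of silently omitting it.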
Claude Max plan OAuth inside a third-party orchestrator (Mastra server, GitHub Actions) is a ToS grey area. The plugin uses Anthropic API keys server-side to avoid this risk. Claude Max is reserved for local Claude Code use only.
Create .env in the Mastra project (NOT in this plugin's repo):
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
DEEPSEEK_API_KEY=...
GITHUB_TOKEN=ghp_...
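A server that depends on these four variables may want to fail fast when one is missing. The sketch below is one way to do that. The variable names come from the example above, while `missingKeys` and `assertEnv` are hypothetical helpers, not part of the plugin or of Mastra.

```typescript
// The four variable names come from the .env example above.
const REQUIRED_KEYS = [
  "ANTHROPIC_API_KEY",
  "GEMINI_API_KEY",
  "DEEPSEEK_API_KEY",
  "GITHUB_TOKEN",
] as const;

// Return the names of required variables that are unset or blank.
function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => !env[key]?.trim());
}

// Throw once at startup, listing every missing variable in one message.
function assertEnv(env: Record<string, string | undefined>): void {
  const missing = missingKeys(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In a Node-based server this would typically be called as `assertEnv(process.env)` before any reviewer is constructed.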
The plugin does not ship the Mastra server itself — that lives in your own repo (one per deployment). A minimal starter is in the Mastra github-pr-code-review-agent template.
Recommended deployment target: GitHub Actions (simplest, already authenticated to your repo) or ECS Fargate (if you want it always-on for faster response).
On PR open / synchronize → POST the PR number to the Mastra server endpoint. The server pulls the diff, runs the 3 Amigos, and posts a review comment back via GITHUB_TOKEN.
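The trigger logic can be sketched as two small functions. `opened` and `synchronize` are the `action` values GitHub sends in `pull_request` webhook payloads. The `/review` path and the `{ prNumber }` body are assumptions made for illustration, so adjust them to whatever your Mastra server actually exposes.

```typescript
// Subset of GitHub's pull_request webhook payload used here.
interface PullRequestEvent {
  action: string;
  number: number;
}

// "opened" fires for a new PR, "synchronize" when new commits are pushed.
const TRIGGER_ACTIONS = new Set(["opened", "synchronize"]);

function shouldTriggerReview(event: PullRequestEvent): boolean {
  return TRIGGER_ACTIONS.has(event.action);
}

// Build the POST request for the Mastra server.
// Hypothetical: the "/review" path and the { prNumber } body shape.
function buildReviewRequest(baseUrl: string, event: PullRequestEvent) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/review`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prNumber: event.number }),
    },
  };
}
```

A webhook handler would call `shouldTriggerReview` first and, when it returns true, pass the result of `buildReviewRequest` to `fetch(url, init)`.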
Once the Mastra server is reachable:
/harness:review-pr <PR-number>
Output format: a single markdown comment with three reviewer sections followed by a consolidated list of top issues:
## Multi-Model Review
### 🏛 Architect (Claude Sonnet)
- ...
### 🐛 Bug Hunter (Gemini 2.5 Pro)
- ...
### ✨ DX Reviewer (DeepSeek)
- ...
### 🔍 Consolidated Top Issues
1. ... (mentioned by 2+ reviewers — high confidence)
2. ...
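A rough sketch of how the comment above could be assembled from raw findings. The section headings are taken from the format shown. Everything else is an assumption: the `ReviewFinding` shape, and in particular the `key` field used to decide that two reviewers reported the same issue, are hypothetical, and the real merger uses a model rather than exact key matching.

```typescript
type ReviewerId = "architect" | "bugHunter" | "dx";

interface ReviewFinding {
  reviewer: ReviewerId;
  // Hypothetical dedup key, e.g. "src/auth.ts:42:null-check".
  key: string;
  message: string;
}

// Section headings copied from the output format above.
const SECTIONS: [ReviewerId, string][] = [
  ["architect", "### 🏛 Architect (Claude Sonnet)"],
  ["bugHunter", "### 🐛 Bug Hunter (Gemini 2.5 Pro)"],
  ["dx", "### ✨ DX Reviewer (DeepSeek)"],
];

// Messages for issues whose key was reported by at least two distinct reviewers.
function consolidate(findings: ReviewFinding[]): string[] {
  const byKey = new Map<string, { message: string; reviewers: Set<ReviewerId> }>();
  for (const f of findings) {
    const entry =
      byKey.get(f.key) ?? { message: f.message, reviewers: new Set<ReviewerId>() };
    entry.reviewers.add(f.reviewer);
    byKey.set(f.key, entry);
  }
  return Array.from(byKey.values())
    .filter((entry) => entry.reviewers.size >= 2)
    .map((entry) => entry.message);
}

function renderComment(findings: ReviewFinding[]): string {
  const lines = ["## Multi-Model Review"];
  for (const [id, heading] of SECTIONS) {
    lines.push(heading);
    const own = findings.filter((f) => f.reviewer === id);
    if (own.length === 0) {
      lines.push("- No findings.");
    } else {
      for (const f of own) lines.push(`- ${f.message}`);
    }
  }
  lines.push("### 🔍 Consolidated Top Issues");
  consolidate(findings).forEach((message, i) => {
    lines.push(`${i + 1}. ${message} (mentioned by 2+ reviewers)`);
  });
  return lines.join("\n");
}
```

Counting distinct reviewers per key, rather than raw occurrences, keeps one reviewer repeating itself from being mistaken for cross-model agreement.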
For a typical 500 LOC PR:
For small PRs (100 LOC): ~$0.10. Cheap relative to human time.
After collecting findings from all 3 Amigos, the Merger runs a Validation Pass before synthesis. For each finding, a lightweight sub-agent checks whether the issue is real and significant in the context of the full diff.
| Score | Meaning | Action |
|---|---|---|
| ≥ 80 | Real issue, significant | Include in output |
| 50–79 | Possible issue, minor | Include as "Note" only |
| < 50 | Likely false positive | Discard silently |
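The thresholds in the table map directly onto a small triage function. This is a sketch under stated assumptions: scores are taken to be numbers on a 0 to 100 scale, and the `ScoredFinding` shape and function names are hypothetical.

```typescript
type Verdict = "include" | "note" | "discard";

// Thresholds from the table above: >= 80 include, 50-79 note, < 50 discard.
function triage(score: number): Verdict {
  if (score >= 80) return "include";
  if (score >= 50) return "note";
  return "discard";
}

interface ScoredFinding {
  message: string;
  score: number; // Assumed 0-100, assigned by the validation sub-agent.
}

// Split validated findings into full issues and downgraded notes.
function applyValidation(findings: ScoredFinding[]) {
  const issues: string[] = [];
  const notes: string[] = [];
  for (const f of findings) {
    const verdict = triage(f.score);
    if (verdict === "include") {
      issues.push(f.message);
    } else if (verdict === "note") {
      notes.push(`Note: ${f.message}`);
    }
    // "discard" findings are dropped silently.
  }
  return { issues, notes };
}
```

Keeping `triage` separate from `applyValidation` makes the boundary values (49, 50, 79, 80) easy to unit test on their own.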
Auto-discard without scoring:
Related:
- architecture-enforcement: deterministic checks that should run BEFORE the reviewers
- edit-lint-feedback-loop: catches cheap issues before they reach the reviewers

See gotchas.md.