From claude-impl-tools
Performs comprehensive quality audits verifying planning conformance, DDD validation, security checks, tests, browser verification, and metrics before deployment or PR merge.
```shell
npx claudepluginhub insightflo/claude-impl-tools --plugin claude-impl-tools
```

This skill uses the workspace's default tool permissions.
> **Purpose**: Comprehensive quality audit against planning documents + quantitative metric tracking + verification discipline enforcement.
v3.0.0: Absorbed `evaluation` (metrics) and `verification-before-completion` (evidence discipline).
Fixes are the implementation agent's responsibility. If management/mini-prd.md and docs/planning/*.md are both absent, run /governance-setup first.

Audit workflow:

1. Verify planning document existence (Mini-PRD or Socrates)
2. Load context (read reference documents)
3. Two-stage review (Spec Compliance → Code Quality)
4. DDD (Demo-Driven Development) validation
5. Security validation (invoke /security-review)
6. Dynamic validation (run tests)
7. UI/UX browser validation (agent-browser CLI + Lighthouse CLI)
8. Write quality report + provide fix guidance
When the skill is triggered, verify the following before starting the audit:

- management/mini-prd.md or docs/planning/*.md must be present.
- If neither exists, run /governance-setup first.

Skeptical QA baseline: the default assumption is "It doesn't work." Prove otherwise with evidence. (Inspired by the Harness Evaluator pattern.)
| Rule | Violation Blocked |
|---|---|
| All scores require evidence (file:line, test output, screenshot) | Score without evidence = 0 |
| Console errors > 0 → Functionality capped at 7/10 | Ignoring console errors |
| Uncaught exceptions → auto FAIL for that route | Rationalizing "minor" exceptions |
| Untestable (server down) → Score 0, not skip | "Couldn't test, so pass" |
| BLOCKING issues must be fixed before deploy | Shipping with known blockers |
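The console-error cap from the table above can be sketched as a small helper. The score variables are illustrative inputs, not real skill output:

```shell
# Enforce the "console errors > 0 => Functionality capped at 7/10" rule.
cap_functionality_score() {
  local score=$1 console_errors=$2
  if [ "$console_errors" -gt 0 ] && [ "$score" -gt 7 ]; then
    score=7
  fi
  echo "$score"
}
```

For example, `cap_functionality_score 9 2` prints `7`, while a clean console (`cap_functionality_score 9 0`) leaves the score untouched.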
Issue classification (BLOCKING vs NON-BLOCKING) applies to all audit output.
```shell
# One of the two options
ls management/mini-prd.md 2>/dev/null   # Option A: Mini-PRD
ls docs/planning/*.md 2>/dev/null       # Option B: Socrates
```
Mini-PRD required fields: purpose, features, tech_stack
Socrates required documents: 01-prd.md, 02-trd.md, 07-coding-convention.md
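The Option A field check can be sketched as a shell helper. The field names come from above; the flat `key:` layout of mini-prd.md is an assumption:

```shell
# Report any Mini-PRD required field that is absent from the file.
check_mini_prd() {
  local prd=$1 missing=0
  for field in purpose features tech_stack; do
    if ! grep -q "$field" "$prd" 2>/dev/null; then
      echo "missing: $field"
      missing=1
    fi
  done
  return $missing
}
```

A nonzero return means at least one required field is missing, which maps to the "run /governance-setup first" rule.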
```shell
/security-review --path src --summary
```
| Severity | Meaning | Deployable |
|---|---|---|
| CRITICAL | Immediate fix required | No — cannot deploy |
| HIGH | Fix recommended before deployment | Conditional |
| MEDIUM | Known issue | Yes — can deploy |
| Project Type | Test Command |
|---|---|
| Node.js | npm test |
| Python | pytest |
| Python (Poetry) | poetry run pytest |
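Project-type detection for the table above might look like the following. The marker-file heuristics are an assumption, not part of the skill spec:

```shell
# Echo the test command implied by marker files in the project root.
detect_test_command() {
  local dir=$1
  if [ -f "$dir/poetry.lock" ]; then
    echo "poetry run pytest"
  elif [ -f "$dir/pyproject.toml" ] || [ -f "$dir/pytest.ini" ]; then
    echo "pytest"
  elif [ -f "$dir/package.json" ]; then
    echo "npm test"
  else
    echo "unknown"
    return 1
  fi
}
```

Note the Poetry check runs first so a Poetry project is not misclassified as plain pytest or Node.js.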
Uses the agent-browser CLI and the Lighthouse CLI:

```shell
# 1. Open page + take snapshot
agent-browser open http://localhost:3000
agent-browser snapshot               # accessibility tree (@ref based)
agent-browser screenshot audit.png   # visual capture

# 2. Lighthouse audit (accessibility + performance + SEO)
npx lighthouse http://localhost:3000 --output=json --quiet

# 3. Check console errors
agent-browser console                # error/warning count
```
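One way to turn the captured console output into a pass/fail signal. The `error:` line prefix is an assumed format; adjust it to whatever agent-browser actually prints:

```shell
# Count lines that look like console errors in a saved console log.
count_console_errors() {
  grep -ci '^error' "$1" || true
}
```

Usage sketch: save the console output to a file, then treat a nonzero count as BLOCKING per the skeptical-QA rules (and cap the functionality score accordingly).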
AI Slop Detection — Auto-deduct from visual score: ≥1 pattern → −1pt, ≥3 patterns → −2pt.
| # | Pattern | # | Pattern |
|---|---|---|---|
| 1 | Hero section with no real image (placeholder/gradient) | 6 | Generic icons only (no custom illustrations) |
| 2 | 3-column generic feature grid | 7 | Empty state not handled |
| 3 | Meaningless gradient decoration | 8 | Hardcoded demo data visible in UI |
| 4 | Lorem ipsum text remaining | 9 | Excessive rounded-xl on everything |
| 5 | Identical card layout repeated throughout | 10 | Purposeless animation/motion |
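Two of the patterns above (4: leftover lorem ipsum, 9: excessive rounded-xl) are grep-detectable. A minimal sketch, with an illustrative threshold for "excessive":

```shell
# Count grep-detectable slop patterns under a source directory.
detect_slop() {
  local dir=$1 hits=0
  grep -riq "lorem ipsum" "$dir" 2>/dev/null && hits=$((hits + 1))
  [ "$(grep -ro "rounded-xl" "$dir" 2>/dev/null | wc -l)" -gt 20 ] && hits=$((hits + 1))
  echo "$hits"
}
```

The remaining patterns (generic hero sections, repeated card layouts, purposeless motion) still need the browser-based visual review.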
```
┌─────────────────────────────────────────┐
│ Quality Audit Result                    │
├─────────────────────────────────────────┤
│ Score: 85/100                           │
│ Verdict: CAUTION                        │
│                                         │
│ ✅ Feature conformance: 95%             │
│ ⚠️ Conventions: 75%                     │
│ Security: 88% (1 medium issue)          │
│ ✅ Tests: passed (coverage 82%)         │
└─────────────────────────────────────────┘
```
Verdict criteria:
| Score | Verdict | Meaning |
|---|---|---|
| 90+ | PASS | Ready to deploy immediately |
| 70–89 | CAUTION | Deploy after minor fixes |
| Below 70 | FAIL | Major fixes required |
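The score-to-verdict mapping above can be sketched as:

```shell
# Map the 0-100 audit score to a verdict per the table above.
verdict() {
  local score=$1
  if [ "$score" -ge 90 ]; then
    echo "PASS"
  elif [ "$score" -ge 70 ]; then
    echo "CAUTION"
  else
    echo "FAIL"
  fi
}
```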
| BLOCKING | Priority | Category | Description | Related File |
|---|---|---|---|---|
| ✅ YES | Critical | Security | Hardcoded API key | src/api/auth.py:23 |
| ✅ YES | High | Bug | Missing duplicate check | src/api/auth.py:45 |
| Audit Result | Recommended Skill |
|---|---|
| Spec mismatch | /agile iterate |
| Code quality issues | /checkpoint → re-audit |
| Security vulnerabilities | Re-run /security-review |
| Deep review needed | /multi-ai-review |
Automates the deployment approval process in collaboration with the QA Manager agent.
```
/audit → calculate quality score → request QA Manager approval
        ↓
✅ Approved    → proceed to deployment
⚠️ Conditional → re-validate after fixing issues
❌ Rejected    → send feedback to Specialist
```
Detailed integration patterns: see references/agent-integration.md
Absorbed from `verification-before-completion`. Applies to ALL completion claims.
Iron Law: No claims without fresh evidence.
Before asserting any state ("tests pass", "bug fixed", "build succeeds"):
1. IDENTIFY — What command proves this claim?
2. RUN — Execute the full command (fresh, complete)
3. READ — Check full output, exit code, failure count
4. VERIFY — Does the output confirm the claim?
- NO → state actual status with evidence
- YES → state claim with evidence
5. ONLY THEN — Make the claim
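The five steps above can be sketched as a wrapper that only emits a claim when the proving command exits 0. The output format is illustrative:

```shell
# Run the proving command fresh; attach its result as evidence.
claim_with_evidence() {
  local claim=$1
  shift
  local output
  if output=$("$@" 2>&1); then
    echo "VERIFIED: $claim (command: $*, exit 0)"
  else
    echo "NOT VERIFIED: $claim (command: $*): $output"
    return 1
  fi
}
```

Usage sketch: `claim_with_evidence "tests pass" npm test`. A stale or assumed result never reaches the VERIFIED branch because the command is re-run every time.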
| Claim | Required evidence | NOT sufficient |
|---|---|---|
| Tests pass | Test command output: 0 failures | Previous run, "should pass" |
| Lint clean | Lint output: 0 errors | Partial check, inference |
| Build succeeds | Build command: exit 0 | Lint passed, logs look OK |
| Bug fixed | Original symptom test: passes | Code changed, assumed fixed |
Red flags — stop immediately if you catch yourself saying "should pass", "probably works", or "assumed fixed" without fresh command output.
Absorbed from `evaluation`. Optional — run when metrics tracking is needed.
| Metric | Command | Target | Warning |
|---|---|---|---|
| Test coverage | pytest --cov / vitest --coverage | ≥70% | <60% |
| Cyclomatic complexity | radon cc / eslint complexity | ≤10 | >15 |
| Code duplication | jscpd / pylint duplicate | ≤5% | >10% |
| Lint errors | ruff / eslint | 0 | >5 |
| Type errors | mypy / tsc | 0 | >0 |
| Security score | bandit / npm audit | 0 critical | any critical |
| Metric | Target |
|---|---|
| Task completion rate | ≥95% |
| Average retries per task | ≤2 |
| First-attempt success rate | ≥80% |
Store metrics in .claude/metrics/ for trend tracking across phases.
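One way to append a dated snapshot under `.claude/metrics/`. The file name and JSON fields are illustrative, not a format the skill mandates:

```shell
# Append one dated metrics record as a JSON line for trend tracking.
record_metrics() {
  local dir=$1 coverage=$2 lint_errors=$3
  mkdir -p "$dir"
  printf '{"date":"%s","coverage":%s,"lint_errors":%s}\n' \
    "$(date +%F)" "$coverage" "$lint_errors" >> "$dir/history.jsonl"
}
```

Usage sketch: `record_metrics .claude/metrics 82 0` after each audit; the JSONL history makes per-phase trends easy to diff.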
references/agent-integration.md — QA Manager integration patterns, feedback routing

Last Updated: 2026-04-01 (v3.1.0 — Skeptical QA baseline + BLOCKING/NON-BLOCKING classification + AI Slop Detection)
Core principle: Task completion ≠ Goal achievement
```
Goal
  ↓
Must-have (what must be true)
  ↓
Must-exist (what must exist)
  ↓
Must-wired (what must be connected)
  ↓
Verification against the actual codebase
```
```shell
# Extract the goal of the current Phase/task from TASKS.md
GOAL=$(grep -A5 "## Phase" TASKS.md | grep -E "^>" | head -1)
```
Derive the required conditions by working backward from the goal:

```
Goal: "Users must be able to chat"
  ↓
Must-have:
- Messages can be sent
- Messages can be received
- Messages can be displayed
```
For each must-have, confirm the code actually exists:

```shell
# Example: verify the chat feature
grep -r "sendMessage\|ChatInput\|MessageList" src/
```
Confirm the components are wired together:

```shell
# Check import/export relationships
grep -r "import.*from.*chat" src/
```
```markdown
# {Phase} - Verification Report
**Date verified:** {date}
**Status:** {PASS|FAIL}

## Goal
{goal that was verified}

## Must-have verification
| ID | Must-have | Status | Evidence |
|----|-----------|--------|----------|
| M-01 | {item} | ✅/❌ | {file:line} |

## Gaps (on failure)
- {missing items}
- {fixes required}

## Next steps
{On PASS: next Phase}
{On FAIL: work to close the gaps}
```