[BUILD] Comprehensive verification with parallel test agents. Use when verifying implementations or validating changes.
Runs comprehensive feature verification using parallel specialized agents that provide nuanced grading and improvement suggestions.
/plugin marketplace add yonatangross/orchestkit
/plugin install ork@orchestkit
/verify authentication flow
/verify user profile feature
/verify --scope=backend database migrations
BEFORE creating tasks, clarify verification scope:
AskUserQuestion(
    questions=[{
        "question": "What scope for this verification?",
        "header": "Scope",
        "options": [
            {"label": "Full verification (Recommended)", "description": "All tests + security + code quality + grades"},
            {"label": "Tests only", "description": "Run unit + integration + e2e tests"},
            {"label": "Security audit", "description": "Focus on security vulnerabilities"},
            {"label": "Code quality", "description": "Lint, types, complexity analysis"},
            {"label": "Quick check", "description": "Just run tests, skip detailed analysis"}
        ],
        "multiSelect": false
    }]
)
Based on answer, adjust workflow:
# Create main verification task
TaskCreate(
    subject="Verify [feature-name] implementation",
    description="Comprehensive verification with nuanced grading",
    activeForm="Verifying [feature-name] implementation"
)
# Create subtasks for the 8-phase process
subjects = ["Run code quality checks", "Execute security audit",
            "Verify test coverage", "Validate API", "Check UI/UX",
            "Calculate grades", "Generate suggestions", "Compile report"]
active_forms = ["Running code quality checks", "Executing security audit",
                "Verifying test coverage", "Validating API", "Checking UI/UX",
                "Calculating grades", "Generating suggestions", "Compiling report"]
for subject, form in zip(subjects, active_forms):
    TaskCreate(subject=subject, activeForm=form)
| Phase | Activities | Output |
|---|---|---|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 5 agents evaluate | 0-10 scores |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts | Final report |
# PARALLEL - Run in ONE message
git diff main --stat
git log main..HEAD --oneline
git diff main --name-only | sort -u
Launch ALL agents in ONE message with run_in_background=True.
| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
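The dispatch step might look like the following sketch, assuming the standard Task tool signature (the agent names come from the table above; the prompt text is illustrative):

Task(subagent_type="code-quality-reviewer", prompt="Review the diff for lint, type, and pattern issues; score 0-10", run_in_background=True)
Task(subagent_type="security-auditor", prompt="Audit the diff for OWASP issues, secrets, and CVEs; score 0-10", run_in_background=True)
# ...one Task call per agent, all in the SAME message so they run in parallel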
See Grading Rubric for detailed scoring criteria.
# PARALLEL - Backend and frontend
cd backend && poetry run pytest tests/ -v --cov=app --cov-report=json
cd frontend && npm run test -- --coverage
See Grading Rubric for full scoring details.
Weights:
| Dimension | Weight |
|---|---|
| Code Quality | 20% |
| Security | 25% |
| Test Coverage | 20% |
| API Compliance | 20% |
| UI Compliance | 15% |
Grade Interpretation:
| Score | Grade | Action |
|---|---|---|
| 9.0-10.0 | A+ | Ship it! |
| 8.0-8.9 | A | Ready for merge |
| 7.0-7.9 | B | Minor improvements optional |
| 6.0-6.9 | C | Consider improvements |
| 5.0-5.9 | D | Improvements recommended |
| 0.0-4.9 | F | Do not merge |
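The weighting and grade tables above can be expressed as a small scoring helper (weights and cutoffs are taken from the tables; the function names are illustrative, not part of the plugin):

```python
# Dimension weights from the Weights table (sum to 1.0)
WEIGHTS = {
    "code_quality": 0.20,
    "security": 0.25,
    "test_coverage": 0.20,
    "api_compliance": 0.20,
    "ui_compliance": 0.15,
}

def composite_score(scores: dict[str, float]) -> float:
    """Weighted average of the five 0-10 dimension scores, to one decimal."""
    return round(sum(scores[d] * w for d, w in WEIGHTS.items()), 1)

def letter_grade(score: float) -> str:
    """Map a composite score to the A+..F scale from the grade table."""
    if score >= 9.0: return "A+"
    if score >= 8.0: return "A"
    if score >= 7.0: return "B"
    if score >= 6.0: return "C"
    if score >= 5.0: return "D"
    return "F"
```

For example, a run scoring 8.0 on every dimension yields a composite of 8.0 and grade A (ready for merge).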
Each suggestion includes effort (1-5) and impact (1-5) with priority = impact/effort.
| Points | Effort | Impact |
|---|---|---|
| 1 | < 15 min | Minimal |
| 2 | 15-60 min | Low |
| 3 | 1-4 hrs | Medium |
| 4 | 4-8 hrs | High |
| 5 | 1+ days | Critical |
Quick Wins: Effort <= 2 AND Impact >= 4
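The priority formula and quick-win rule above can be sketched directly (effort and impact are the 1-5 point values from the table; the helper names are illustrative):

```python
def priority(effort: int, impact: int) -> float:
    """Priority = impact / effort; higher means do it sooner."""
    return round(impact / effort, 2)

def is_quick_win(effort: int, impact: int) -> bool:
    """Quick win: cheap (effort <= 2) AND valuable (impact >= 4)."""
    return effort <= 2 and impact >= 4
```

A 15-minute fix with high impact (effort 1, impact 4) gets priority 4.0 and is a quick win; a day-long critical rewrite (effort 5, impact 5) gets priority 1.0 and is not.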
See Alternative Comparison for template.
Use when persisting metrics for trend analysis across verification runs:
mcp__memory__create_entities(entities=[{
"name": "verification-{date}-{feature}",
"entityType": "VerificationMetrics",
"observations": [f"composite_score: {score}", ...]
}])
Query trends: mcp__memory__search_nodes(query="VerificationMetrics")
See Report Template for full format.
# Feature Verification Report
**Composite Score: [N.N]/10** (Grade: [LETTER])
## Top Improvement Suggestions
| # | Suggestion | Effort | Impact | Priority |
|---|------------|--------|--------|----------|
| 1 | [highest] | [N] | [N] | [N.N] |
## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**
See Policy-as-Code for configuration.
Define verification rules in .claude/policies/verification-policy.json:
{
  "thresholds": {
    "composite_minimum": 6.0,
    "security_minimum": 7.0,
    "coverage_minimum": 70
  },
  "blocking_rules": [
    {"dimension": "security", "below": 5.0, "action": "block"}
  ]
}
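A minimal evaluator for this policy format might look like the following (field names match the JSON above; the function itself is a hypothetical sketch, not the plugin's implementation):

```python
def evaluate_policy(policy: dict, scores: dict[str, float],
                    coverage: float, composite: float) -> str:
    """Return 'block' or 'pass' for a verification run against the policy."""
    t = policy["thresholds"]
    # Any blocking rule that fires blocks the merge outright
    for rule in policy.get("blocking_rules", []):
        if scores.get(rule["dimension"], 10.0) < rule["below"]:
            return rule["action"]
    # Threshold checks against the minimums
    if composite < t["composite_minimum"]:
        return "block"
    if scores.get("security", 10.0) < t["security_minimum"]:
        return "block"
    if coverage < t["coverage_minimum"]:
        return "block"
    return "pass"
```

With the example policy, a run with security 4.0 is blocked by the blocking rule regardless of its composite score.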
| Decision | Choice | Rationale |
|---|---|---|
| Scoring scale | 0-10 with decimals | Nuanced, not binary |
| Improvement priority | Impact / Effort ratio | Do high-value first |
| Alternative comparison | Optional phase | Only when multiple valid approaches |
| Metrics persistence | Memory MCP | Track trends over time |
Related commands:
- implement - Full implementation with verification
- review-pr - PR-specific verification
- run-tests - Detailed test execution
- quality-gates - Quality gate patterns

Version: 3.0.0 (January 2026)