Runs parallel specialized agents to verify implementations, run tests (unit/e2e/integration/perf/LLM), grade quality (0-10 scale), and suggest improvements. Use before merging.
npx claudepluginhub yonatangross/orchestkit --plugin ork

This skill is limited to using the following tools:
Bundled files:
- assets/gallery-template.html
- assets/quality-policy.yaml
- assets/verification-report.md
- checklists/verification-checklist.md
- references/alternative-comparison.md
- references/grading-rubric.md
- references/orchestration-mode.md
- references/policy-as-code.md
- references/quality-model.md
- references/report-template.md
- references/verification-checklist.md
- references/verification-phases.md
- references/visual-capture.md
- rules/_sections.md
- rules/evidence-collection.md
- rules/scoring-rubric.md
- test-cases.json
Comprehensive verification using parallel specialized agents with nuanced grading (0-10 scale) and improvement suggestions.
/ork:verify authentication flow
/ork:verify --model=opus user profile feature
/ork:verify --scope=backend database migrations
SCOPE = "$ARGUMENTS"          # Full argument string, e.g., "authentication flow"
SCOPE_TOKEN = "$ARGUMENTS[0]" # First token for flag detection (e.g., "--scope=backend")
# $ARGUMENTS[0], $ARGUMENTS[1], etc. for indexed access (CC 2.1.59)

# Model override detection (CC 2.1.72)
MODEL_OVERRIDE = None
for token in "$ARGUMENTS".split():
    if token.startswith("--model="):
        MODEL_OVERRIDE = token.split("=", 1)[1]  # "opus", "sonnet", "haiku"
        SCOPE = SCOPE.replace(token, "").strip()
Pass MODEL_OVERRIDE to all Agent() calls via model=MODEL_OVERRIDE when set. Accepts symbolic names (opus, sonnet, haiku) or full IDs (claude-opus-4-6) per CC 2.1.74.
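For illustration only, the override can be threaded into each spawn like the sketch below; the `agent_kwargs` dict and the `subagent_type` parameter name are assumptions, not part of the skill:

```
# Sketch: pass the user-supplied --model= override through to every agent spawn
agent_kwargs = {"model": MODEL_OVERRIDE} if MODEL_OVERRIDE else {}

Agent(
    subagent_type="code-quality-reviewer",  # assumed parameter name; one of the 6 verifier agents
    prompt=f"Verify code quality for scope: {SCOPE}",
    **agent_kwargs,                          # only set when --model= was provided
)
```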
Opus 4.6: Agents use native adaptive thinking (no MCP sequential-thinking needed). Extended 128K output supports comprehensive verification reports.
Scale verification depth based on /effort level:
| Effort Level | Phases Run | Agents | Output |
|---|---|---|---|
| low | Run tests only → pass/fail | 0 agents | Quick check |
| medium | Tests + code quality + security | 3 agents | Score + top issues |
| high (default) | All 8 phases + visual capture | 6-7 agents | Full report + grades |
Override: Explicit user selection (e.g., "Full verification") overrides /effort downscaling.
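A minimal sketch of how the table above could drive phase selection; the `current_effort_level()` helper, `ALL_8_PHASES` constant, and variable names are illustrative:

```
# Illustrative mapping from /effort level to verification depth
EFFORT = current_effort_level()          # hypothetical helper; defaults to "high"

if EFFORT == "low":
    phases, agent_count = ["Run tests"], 0                        # quick pass/fail only
elif EFFORT == "medium":
    phases, agent_count = ["Run tests", "Code quality", "Security audit"], 3
else:  # high (default)
    phases, agent_count = ALL_8_PHASES, 7                         # full report, grades, visual capture
```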
BEFORE creating tasks, clarify verification scope:
AskUserQuestion(
questions=[{
"question": "What scope for this verification?",
"header": "Scope",
"options": [
{"label": "Full verification (Recommended)", "description": "All tests + security + code quality + visual + grades", "markdown": "```\nFull Verification (10 phases)\n─────────────────────────────\n 7 parallel agents:\n ┌────────────┐ ┌────────────┐\n │ Code │ │ Security │\n │ Quality │ │ Auditor │\n ├────────────┤ ├────────────┤\n │ Test │ │ Backend │\n │ Generator │ │ Architect │\n ├────────────┤ ├────────────┤\n │ Frontend │ │ Performance│\n │ Developer │ │ Engineer │\n ├────────────┤ └────────────┘\n │ Visual │\n │ Capture │ → gallery.html\n └────────────┘\n ▼\n Composite Score (0-10)\n 8 dimensions + Grade\n + Visual Gallery\n```"},
{"label": "Tests only", "description": "Run unit + integration + e2e tests", "markdown": "```\nTests Only\n──────────\n npm test ──▶ Results\n ┌─────────────────────┐\n │ Unit tests ✓/✗ │\n │ Integration ✓/✗ │\n │ E2E ✓/✗ │\n │ Coverage NN% │\n └─────────────────────┘\n Skip: security, quality, UI\n Output: Pass/fail + coverage\n```"},
{"label": "Security audit", "description": "Focus on security vulnerabilities", "markdown": "```\nSecurity Audit\n──────────────\n security-auditor agent:\n ┌─────────────────────────┐\n │ OWASP Top 10 ✓/✗ │\n │ Dependency CVEs ✓/✗ │\n │ Secrets scan ✓/✗ │\n │ Auth flow review ✓/✗ │\n │ Input validation ✓/✗ │\n └─────────────────────────┘\n Output: Security score 0-10\n + vulnerability list\n```"},
{"label": "Code quality", "description": "Lint, types, complexity analysis", "markdown": "```\nCode Quality\n────────────\n code-quality-reviewer agent:\n ┌─────────────────────────┐\n │ Lint errors N │\n │ Type coverage NN% │\n │ Cyclomatic complex N.N │\n │ Dead code N │\n │ Pattern violations N │\n └─────────────────────────┘\n Output: Quality score 0-10\n + refactor suggestions\n```"},
{"label": "Quick check", "description": "Just run tests, skip detailed analysis", "markdown": "```\nQuick Check (~1 min)\n────────────────────\n Run tests ──▶ Pass/Fail\n\n Output:\n ├── Test results\n ├── Build status\n └── Lint status\n No agents, no grading,\n no report generation\n```"}
],
"multiSelect": true
}]
)
Based on the answer, adjust the workflow.
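An illustrative mapping from the selected option(s) to workflow adjustments; the option labels come from the question above, but the dict itself is a sketch, not skill code:

```
# Sketch: adjust the run based on the chosen scope (labels match the options above)
adjustments = {
    "Full verification (Recommended)": {"phases": "all", "agents": 7, "visual": True},
    "Tests only":      {"phases": ["tests"], "agents": 0},
    "Security audit":  {"agents": ["security-auditor"]},
    "Code quality":    {"agents": ["code-quality-reviewer"]},
    "Quick check":     {"phases": ["tests"], "grading": False, "report": False},
}
```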
Load details: Read("${CLAUDE_SKILL_DIR}/references/orchestration-mode.md") for env var check logic, Agent Teams vs Task Tool comparison, and mode selection rules.
Choose Agent Teams (mesh -- verifiers share findings) or Task tool (star -- all report to lead) based on the orchestration mode reference.
ToolSearch(query="select:mcp__memory__search_nodes")  # check whether the memory MCP server is available
Write(".claude/chain/capabilities.json", { memory, timestamp })  # record detected capabilities for the chain
Read(".claude/chain/state.json")  # resume from prior chain state if it exists
After verification completes, write results:
Write(".claude/chain/verify-results.json", JSON.stringify({
"phase": "verify", "skill": "verify",
"timestamp": now(), "status": "completed",
"outputs": {
"tests_passed": N, "tests_failed": N,
"coverage": "87%", "security_scan": "clean"
}
}))
Optionally schedule post-verification monitoring:
# Guard: skip cron in headless/CI (CLAUDE_CODE_DISABLE_CRON)
# If CLAUDE_CODE_DISABLE_CRON is set, run a single check instead
CronCreate(
    schedule="0 8 * * *",
    prompt="Daily regression check: npm test.
            If 7 consecutive passes → CronDelete.
            If failures → alert with details."
)
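A sketch of the guard described in the comments above; only CLAUDE_CODE_DISABLE_CRON and CronCreate come from this skill, while `env()` and the Bash fallback are illustrative:

```
# Sketch: in headless/CI runs, fall back to a one-off check instead of a schedule
if env("CLAUDE_CODE_DISABLE_CRON"):       # env() is an illustrative helper
    Bash("npm test")                       # single regression check, no recurring job
else:
    CronCreate(schedule="0 8 * * *", prompt="Daily regression check: npm test. ...")
```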
# Create main verification task
TaskCreate(
    subject="Verify [feature-name] implementation",
    description="Comprehensive verification with nuanced grading",
    activeForm="Verifying [feature-name] implementation"
)

# Create subtasks for the 8-phase process
phases = ["Run code quality checks", "Execute security audit",
          "Verify test coverage", "Validate API", "Check UI/UX",
          "Calculate grades", "Generate suggestions", "Compile report"]
for phase in phases:
    TaskCreate(subject=phase, activeForm=f"{phase}ing")
Load details: Read("${CLAUDE_SKILL_DIR}/references/verification-phases.md") for complete phase details, agent spawn definitions, Agent Teams alternative, and team teardown.
| Phase | Activities | Output |
|---|---|---|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 6 agents evaluate | 0-10 scores |
| 2.5 Visual Capture | Screenshot routes, AI vision eval | Gallery + visual score |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts + gallery.html | Final report |
| 8.5 Agentation Loop | User annotates, ui-feedback fixes | Before/after diffs |
| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |
Launch ALL agents in ONE message with run_in_background=True and max_turns=25.
Output each agent's score as soon as it completes — don't wait for all 6-7 agents:
Security: 8.2/10 — No critical vulnerabilities found
Code Quality: 7.5/10 — 3 complexity hotspots identified
[...remaining agents still running...]
This gives users real-time visibility into multi-agent verification. If any dimension scores below the security_minimum threshold (default 5.0), flag it as a blocker immediately — the user can terminate early without waiting for remaining agents.
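A sketch of this dispatch-and-stream pattern: run_in_background=True, max_turns=25, the six verifier names, and the 5.0 security blocker come from this skill, while `as_completed()`, the result fields, and the reuse of `agent_kwargs` from the earlier model-override sketch are illustrative:

```
# Sketch: launch every verifier in one message, then surface scores as they complete
VERIFIERS = ["code-quality-reviewer", "security-auditor", "test-generator",
             "backend-system-architect", "frontend-ui-developer",
             "python-performance-engineer"]

handles = [Agent(subagent_type=name, prompt=f"Verify scope: {SCOPE}",
                 run_in_background=True, max_turns=25, **agent_kwargs)
           for name in VERIFIERS]

for result in as_completed(handles):      # illustrative helper: yields each agent as it finishes
    print(f"{result.dimension}: {result.score}/10 -- {result.summary}")
    if result.dimension == "security" and result.score < 5.0:
        print("BLOCKER: security below the 5.0 policy minimum -- user may terminate early")
```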
Load details: Read("${CLAUDE_SKILL_DIR}/references/visual-capture.md") for auto-detection, route discovery, screenshot capture, and AI vision evaluation.
Summary: Auto-detects project framework, starts dev server, discovers routes, uses agent-browser to screenshot each route, evaluates with Claude vision, generates self-contained gallery.html with base64-embedded images.
Output: verification-output/{timestamp}/gallery.html — open in browser to see all screenshots with AI evaluations, scores, and annotation diffs.
Graceful degradation: If no frontend detected or server won't start, skips visual capture with a warning — never blocks verification.
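Purely illustrative pseudocode for the capture flow just summarized; every helper name here is a placeholder, and the authoritative steps live in references/visual-capture.md:

```
# Hypothetical sketch of Phase 2.5 (all helper names are placeholders)
if not detect_frontend():                       # graceful degradation: warn, never block
    warn("No frontend detected -- skipping visual capture")
else:
    start_dev_server()                          # e.g., npm run dev
    evaluations = {}
    for route in discover_routes():             # routes found from the project's router config
        shot = capture_screenshot(route)        # taken via the agent-browser integration
        evaluations[route] = vision_evaluate(shot)   # Claude vision: score + notes per route
    write_gallery("verification-output/{timestamp}/gallery.html", evaluations)  # base64-embedded images
```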
Load details: Read("${CLAUDE_SKILL_DIR}/references/visual-capture.md") (Phase 8.5 section) for agentation loop workflow.
Trigger: Only when agentation MCP is configured. Offers user the choice to annotate the live UI. ui-feedback agent processes annotations, re-screenshots show before/after.
Load details: Read("${CLAUDE_PLUGIN_ROOT}/skills/quality-gates/references/unified-scoring-framework.md") for dimensions, weights, grade thresholds, and improvement prioritization; Read("${CLAUDE_SKILL_DIR}/references/quality-model.md") for verify-specific extensions (Visual dimension); Read("${CLAUDE_SKILL_DIR}/references/grading-rubric.md") for per-agent scoring criteria.
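As a rough sketch of the composite calculation only: the dimension names, weights, and grade cutoffs below are placeholders, and the authoritative values are defined in unified-scoring-framework.md:

```
# Placeholder weights for the 8 unified dimensions -- real values live in the framework reference
WEIGHTS = {"code_quality": 0.15, "security": 0.20, "coverage": 0.15, "api": 0.10,
           "ui": 0.10, "performance": 0.10, "visual": 0.10, "docs": 0.10}

def composite(scores):
    # Weighted average of per-dimension 0-10 scores
    return sum(scores[d] * w for d, w in WEIGHTS.items()) / sum(WEIGHTS.values())

def grade(score):
    # Illustrative letter-grade cutoffs only
    for cutoff, letter in [(9.0, "A"), (8.0, "B"), (7.0, "C"), (6.0, "D")]:
        if score >= cutoff:
            return letter
    return "F"
```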
Load details: Read("${CLAUDE_SKILL_DIR}/rules/evidence-collection.md") for git commands, test execution patterns, metrics tracking, and post-verification feedback.
Load details: Read("${CLAUDE_SKILL_DIR}/references/policy-as-code.md") for configuration.
Define verification rules in .claude/policies/verification-policy.json:
{
  "thresholds": {
    "composite_minimum": 6.0,
    "security_minimum": 7.0,
    "coverage_minimum": 70
  },
  "blocking_rules": [
    {"dimension": "security", "below": 5.0, "action": "block"}
  ]
}
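A minimal sketch of how such a policy could be enforced against the per-dimension scores; only the file path and field names come from the example above, and the function itself is illustrative:

```
import json

def enforce_policy(scores, composite_score, coverage_pct):
    # Illustrative enforcement of .claude/policies/verification-policy.json
    policy = json.load(open(".claude/policies/verification-policy.json"))
    blockers = []
    if composite_score < policy["thresholds"]["composite_minimum"]:
        blockers.append("Composite score below minimum")
    if scores.get("security", 10) < policy["thresholds"]["security_minimum"]:
        blockers.append("Security score below minimum")
    if coverage_pct < policy["thresholds"]["coverage_minimum"]:
        blockers.append("Coverage below minimum")
    for rule in policy["blocking_rules"]:
        if rule["action"] == "block" and scores.get(rule["dimension"], 10) < rule["below"]:
            blockers.append(f"{rule['dimension']} below {rule['below']} -- BLOCKED")
    return blockers
```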
Load details: Read("${CLAUDE_SKILL_DIR}/references/report-template.md") for full format. Summary:
# Feature Verification Report
**Composite Score: [N.N]/10** (Grade: [LETTER])
## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**
Load on demand with Read("${CLAUDE_SKILL_DIR}/references/<file>"):
| File | Content |
|---|---|
| verification-phases.md | 8-phase workflow, agent spawn definitions, Agent Teams mode |
| visual-capture.md | Phase 2.5 + 8.5: screenshot capture, AI vision, gallery generation, agentation loop |
| quality-model.md | Scoring dimensions and weights (8 unified) |
| grading-rubric.md | Per-agent scoring criteria |
| report-template.md | Full report format with visual evidence section |
| alternative-comparison.md | Approach comparison template |
| orchestration-mode.md | Agent Teams vs Task Tool |
| policy-as-code.md | Verification policy configuration |
| verification-checklist.md | Pre-flight checklist |
Load on demand with Read("${CLAUDE_SKILL_DIR}/rules/<file>"):
| File | Content |
|---|---|
| scoring-rubric.md | Composite scoring, grades, verdicts |
| evidence-collection.md | Evidence gathering and test patterns |
Related:
- ork:implement - Full implementation with verification
- ork:review-pr - PR-specific verification
- testing-unit / testing-integration / testing-e2e - Test execution patterns
- ork:quality-gates - Quality gate patterns
- browser-tools - Browser automation for visual capture

Version: 4.2.0 (March 2026) — Added progressive output for incremental agent scores