Generates and runs unit, integration (testcontainers/docker-compose), and Playwright E2E test suites for JS/TS code. Analyzes coverage gaps with parallel test-generator agents, executes tests, and heals failures up to 3 times.
npx claudepluginhub yonatangross/orchestkit --plugin ork
Generate comprehensive test suites for existing code with real-service integration testing and automated failure healing.
Discovers unit testing gaps and generates new tests following project conventions. Supports deep iterative mode and harness for automated multi-cycle coverage improvement.
Creates and manages unit and integration tests by analyzing codebase, auto-detecting test frameworks, and generating tests that follow project conventions.
Conducts five-phase test suite review: fills unit coverage gaps, surveys integration and E2E (webapps) coverage, identifies fuzz opportunities, audits test quality.
/ork:cover authentication flow
/ork:cover --model=opus payment processing
/ork:cover --tier=unit,integration user service
/ork:cover --real-services checkout pipeline
SCOPE = "$ARGUMENTS"  # e.g., "authentication flow"

# Flag parsing
MODEL_OVERRIDE = None
TIERS = ["unit", "integration", "e2e"]  # default: all three
REAL_SERVICES = False

for token in "$ARGUMENTS".split():
    if token.startswith("--model="):
        MODEL_OVERRIDE = token.split("=", 1)[1]
        SCOPE = SCOPE.replace(token, "").strip()
    elif token.startswith("--tier="):
        TIERS = token.split("=", 1)[1].split(",")
        SCOPE = SCOPE.replace(token, "").strip()
    elif token == "--real-services":
        REAL_SERVICES = True
        SCOPE = SCOPE.replace(token, "").strip()
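The parsing above can be captured in a small standalone helper. A sketch only: `parse_cover_args` is a hypothetical name, not part of the skill API.

```python
def parse_cover_args(arguments: str):
    """Split /ork:cover arguments into (scope, model, tiers, real_services)."""
    model_override = None
    tiers = ["unit", "integration", "e2e"]  # default: all three
    real_services = False
    scope_tokens = []
    for token in arguments.split():
        if token.startswith("--model="):
            model_override = token.split("=", 1)[1]
        elif token.startswith("--tier="):
            tiers = token.split("=", 1)[1].split(",")
        elif token == "--real-services":
            real_services = True
        else:
            scope_tokens.append(token)  # everything else is the scope text
    return " ".join(scope_tokens), model_override, tiers, real_services
```

Collecting non-flag tokens rather than string-replacing them avoids accidentally mangling a scope phrase that happens to contain a flag-like substring.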
Scale test generation depth based on /effort level:
| Effort Level | Tiers Generated | Agents | Heal Iterations |
|---|---|---|---|
| low | Unit only | 1 agent | 1 max |
| medium | Unit + Integration | 2 agents | 2 max |
| high (default) | Unit + Integration + E2E | 3 agents | 3 max |
Override: An explicit --tier= flag or user selection overrides /effort downscaling.
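Mapped to code, the effort scaling and its override rule might look like the following sketch (hypothetical names, mirroring the table above):

```python
# Hypothetical lookup mirroring the effort table above.
EFFORT_PLAN = {
    "low":    {"tiers": ["unit"], "agents": 1, "max_heal": 1},
    "medium": {"tiers": ["unit", "integration"], "agents": 2, "max_heal": 2},
    "high":   {"tiers": ["unit", "integration", "e2e"], "agents": 3, "max_heal": 3},
}

def plan_for_effort(level: str, explicit_tiers=None):
    """Resolve generation depth; explicit --tier= selection wins over /effort."""
    plan = dict(EFFORT_PLAN.get(level, EFFORT_PLAN["high"]))  # default: high
    if explicit_tiers:
        plan["tiers"] = explicit_tiers
        plan["agents"] = len(explicit_tiers)  # one agent per requested tier
    return plan
```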
# Probe MCPs (parallel):
ToolSearch(query="select:mcp__memory__search_nodes")
ToolSearch(query="select:mcp__context7__resolve-library-id")
Write(".claude/chain/capabilities.json", {
"memory": <true if found>,
"context7": <true if found>,
"skill": "cover",
"timestamp": now()
})
# Resume check:
Read(".claude/chain/state.json")
# If exists and skill == "cover": resume from current_phase
# Otherwise: initialize state
AskUserQuestion(
questions=[
{
"question": "What test tiers should I generate?",
"header": "Test Tiers",
"options": [
{"label": "Full coverage (Recommended)", "description": "Unit + Integration (real services) + E2E", "markdown": "```\nFull Coverage\n─────────────\n Unit Integration E2E\n ┌─────────┐ ┌─────────────┐ ┌──────────┐\n │ AAA │ │ Real DB │ │Playwright│\n │ Mocks │ │ Real APIs │ │Page obj │\n │ Factory │ │ Testcontain │ │A11y │\n └─────────┘ └─────────────┘ └──────────┘\n 3 parallel test-generator agents\n```"},
{"label": "Unit + Integration", "description": "Skip E2E, focus on logic and service boundaries", "markdown": "```\nUnit + Integration\n──────────────────\n Unit tests for business logic\n Integration tests at API boundaries\n Real services if docker-compose found\n Skip: browser automation\n```"},
{"label": "Unit only", "description": "Fast isolated tests for business logic", "markdown": "```\nUnit Only (~2 min)\n──────────────────\n AAA pattern tests\n MSW/VCR mocking\n Factory-based data\n Coverage gap analysis\n Skip: real services, browser\n```"},
{"label": "Integration only", "description": "API boundary and real-service tests", "markdown": "```\nIntegration Only\n────────────────\n API endpoint tests (Supertest/httpx)\n Database tests (real or in-memory)\n Contract tests (Pact)\n Testcontainers if available\n```"},
{"label": "E2E only", "description": "Playwright browser tests", "markdown": "```\nE2E Only\n────────\n Playwright page objects\n User flow tests\n Visual regression\n Accessibility (axe-core)\n```"}
],
"multiSelect": false
},
{
"question": "Healing strategy for failing tests?",
"header": "Failure Handling",
"options": [
{"label": "Auto-heal (Recommended)", "description": "Fix failing tests up to 3 iterations"},
{"label": "Generate only", "description": "Write tests, report failures, don't fix"},
{"label": "Strict", "description": "All tests must pass or abort"}
],
"multiSelect": false
}
]
)
Override TIERS based on selection. Skip this step if --tier= flag was provided.
TaskCreate(
subject=f"Cover: {SCOPE}",
description="Generate comprehensive test suite with real-service testing",
activeForm=f"Generating tests for {SCOPE}"
)
# Subtasks per phase
TaskCreate(subject="Discover scope and detect frameworks", activeForm="Discovering test scope")
TaskCreate(subject="Analyze coverage gaps", activeForm="Analyzing coverage gaps")
TaskCreate(subject="Generate tests (parallel per tier)", activeForm="Generating tests")
TaskCreate(subject="Execute generated tests", activeForm="Running tests")
TaskCreate(subject="Heal failing tests", activeForm="Healing test failures")
TaskCreate(subject="Generate coverage report", activeForm="Generating report")
| Phase | Activities | Output |
|---|---|---|
| 1. Discovery | Detect frameworks, scan scope, find untested code | Framework map, file list |
| 2. Coverage Analysis | Run existing tests, map gaps per tier | Coverage baseline, gap map |
| 3. Generation | Parallel test-generator agents per tier | Test files created |
| 4. Execution | Run all generated tests | Pass/fail results |
| 5. Heal | Fix failures, re-run (max 3 iterations) | Green test suite |
| 6. Report | Coverage delta, test count, summary | Coverage report |
| After Phase | Handoff File | Key Outputs |
|---|---|---|
| 1. Discovery | 01-cover-discovery.json | Frameworks, scope files, tier plan |
| 2. Analysis | 02-cover-analysis.json | Baseline coverage, gap map |
| 3. Generation | 03-cover-generation.json | Files created, test count per tier |
| 5. Heal | 05-cover-healed.json | Final pass/fail, iterations used |
Detect the project's test infrastructure and scope the work.
# PARALLEL — all in ONE message:
# 1. Framework detection (hook handles this, but also scan manually)
Grep(pattern="vitest|jest|mocha|playwright|cypress", glob="package.json", output_mode="content")
Grep(pattern="pytest|unittest|hypothesis", glob="pyproject.toml", output_mode="content")
Grep(pattern="pytest|unittest|hypothesis", glob="requirements*.txt", output_mode="content")
# 2. Real-service infrastructure
Glob(pattern="**/docker-compose*.yml")
Glob(pattern="**/testcontainers*")
Grep(pattern="testcontainers", glob="**/package.json", output_mode="content")
Grep(pattern="testcontainers", glob="**/requirements*.txt", output_mode="content")
# 3. Existing test structure
Glob(pattern="**/tests/**/*.test.*")
Glob(pattern="**/tests/**/*.spec.*")
Glob(pattern="**/__tests__/**/*")
Glob(pattern="**/test_*.py")
# 4. Scope files (what to test)
# If SCOPE specified, find matching source files
Grep(pattern=SCOPE, output_mode="files_with_matches")
Real-service decision:
- docker-compose*.yml found → integration tests use real services
- testcontainers in deps → use testcontainers for isolated service instances
- --real-services flag with neither found → error: "No docker-compose or testcontainers found. Install testcontainers or remove --real-services flag."

Load real-service detection details: Read("${CLAUDE_SKILL_DIR}/references/real-service-detection.md")
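The decision above can be sketched as a single function (hypothetical name and return labels, for illustration only):

```python
def choose_integration_backend(has_compose: bool, has_testcontainers: bool,
                               real_services_flag: bool) -> str:
    """Pick how integration tests reach services, per the detection results."""
    if has_testcontainers:
        return "testcontainers"   # isolated per-test service instances
    if has_compose:
        return "docker-compose"   # shared real services
    if real_services_flag:
        raise RuntimeError(
            "No docker-compose or testcontainers found. "
            "Install testcontainers or remove --real-services flag."
        )
    return "in-memory"            # fall back to mocked/in-memory services
```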
Run existing tests and identify gaps.
# Detect and run coverage command
# TypeScript: npx vitest run --coverage --reporter=json
# Python: pytest --cov=<scope> --cov-report=json
# Go: go test -coverprofile=coverage.out ./...
# Parse coverage output to identify:
# 1. Files with 0% coverage (priority targets)
# 2. Files below threshold (default 70%)
# 3. Uncovered functions/methods
# 4. Untested edge cases (error paths, boundary conditions)
Output coverage baseline to user immediately (progressive output).
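Gap identification from the parsed coverage output can be sketched as follows. The flat `path → percent` shape is a simplifying assumption; real JSON reporters nest this deeper.

```python
def find_gaps(coverage: dict, threshold: float = 70.0) -> dict:
    """Bucket per-file line coverage into zero-coverage and below-threshold sets.

    `coverage` maps file path -> percent covered (assumed pre-flattened
    from the reporter's JSON output).
    """
    zero = sorted(f for f, pct in coverage.items() if pct == 0.0)
    low = sorted(f for f, pct in coverage.items() if 0.0 < pct < threshold)
    return {"zero": zero, "below_threshold": low}
```

Zero-coverage files are the priority targets handed to the generator agents; below-threshold files contribute specific uncovered functions to the gap map.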
Spawn test-generator agents per tier. Launch ALL in ONE message with run_in_background=true.
# Unit tests agent
if "unit" in TIERS:
    Agent(
        subagent_type="test-generator",
        prompt=f"""Generate unit tests for: {SCOPE}
Coverage gaps: {gap_map.unit_gaps}
Framework: {detected_framework}
Existing tests: {existing_test_files}
Focus on:
- AAA pattern (Arrange-Act-Assert)
- Parametrized tests for multiple inputs
- MSW/VCR for HTTP mocking (never mock fetch directly)
- Factory-based test data (FactoryBoy/faker-js)
- Edge cases: empty input, errors, timeouts, boundary values
- Target: 90%+ business logic coverage""",
        isolation="worktree",
        run_in_background=True,
        max_turns=50,
        model=MODEL_OVERRIDE
    )

# Integration tests agent
if "integration" in TIERS:
    Agent(
        subagent_type="test-generator",
        prompt=f"""Generate integration tests for: {SCOPE}
Coverage gaps: {gap_map.integration_gaps}
Framework: {detected_framework}
Real services available: {real_service_infra}
Focus on:
- API endpoint tests (Supertest/httpx)
- Database tests with {'real DB via testcontainers/docker-compose' if real_services else 'in-memory/mocked DB'}
- Contract tests (Pact) for service boundaries
- Zod/Pydantic schema validation at edges
- Fresh state per test (transaction rollback or cleanup)
- Target: all API endpoints and service boundaries""",
        isolation="worktree",
        run_in_background=True,
        max_turns=50,
        model=MODEL_OVERRIDE
    )

# E2E tests agent
if "e2e" in TIERS:
    Agent(
        subagent_type="test-generator",
        prompt=f"""Generate E2E tests for: {SCOPE}
Framework: Playwright
Routes/pages: {discovered_routes}
Focus on:
- Semantic locators (getByRole > getByLabel > getByTestId)
- Page Object Model for complex pages
- User flow tests (happy path + error paths)
- Accessibility tests (axe-core WCAG 2.2 AA)
- Visual regression (toHaveScreenshot)
- No hardcoded waits (use auto-wait)""",
        isolation="worktree",
        run_in_background=True,
        max_turns=50,
        model=MODEL_OVERRIDE
    )
Output each agent's results as soon as it returns — don't wait for all agents. This lets users see generated tests incrementally.
Partial results (CC 2.1.76): If an agent is killed (timeout, context limit), its response is tagged [PARTIAL RESULT]. Include partial tests but flag them in Phase 4.
Run all generated tests and collect results.
# Run test commands per tier (PARALLEL if independent):
# Unit: npx vitest run tests/unit/ OR pytest tests/unit/
# Integration: npx vitest run tests/integration/ OR pytest tests/integration/
# E2E: npx playwright test
# Collect: pass count, fail count, error details, coverage delta
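A minimal sketch of per-tier execution and result collection (hypothetical helper names; actual commands come from Phase 1 framework detection):

```python
import subprocess

TIER_COMMANDS = {  # assumed commands; substitute whatever Phase 1 detected
    "unit": ["npx", "vitest", "run", "tests/unit/"],
    "integration": ["npx", "vitest", "run", "tests/integration/"],
    "e2e": ["npx", "playwright", "test"],
}

def run_tier(tier: str) -> dict:
    """Run one tier's test command and record pass/fail plus raw output."""
    proc = subprocess.run(TIER_COMMANDS[tier], capture_output=True, text=True)
    return {"tier": tier, "passed": proc.returncode == 0,
            "output": proc.stdout + proc.stderr}

def summarize(results: list) -> dict:
    """Aggregate tier results into the pass/fail summary the heal phase needs."""
    failed = [r["tier"] for r in results if not r["passed"]]
    return {"all_green": not failed, "failed_tiers": failed}
```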
Fix failing tests iteratively. Max 3 iterations to prevent infinite loops.
for iteration in range(3):
    if all_tests_pass:
        break
    # For each failing test:
    # 1. Read the test file and the source code it tests
    # 2. Analyze the failure (assertion error? import error? timeout?)
    # 3. Fix the test (not the source code — tests only)
    # 4. Re-run the fixed tests

# Common fixes:
# - Wrong assertions (expected value mismatch)
# - Missing imports or setup
# - Stale selectors in E2E tests
# - Race conditions (add proper waits)
# - Mock configuration errors
Load heal strategy details: Read("${CLAUDE_SKILL_DIR}/references/heal-loop-strategy.md")
Boundary: heal fixes TESTS, not source code. If a test fails because the source code has a bug, report it — don't silently fix production code.
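The triage step behind those fix patterns might be sketched as heuristic string matching. The labels and patterns below are illustrative assumptions, not the reference strategy from heal-loop-strategy.md:

```python
def classify_failure(output: str) -> str:
    """Rough failure triage used to pick a fix pattern."""
    text = output.lower()
    if ("cannot find module" in text or "importerror" in text
            or "modulenotfounderror" in text):
        return "import-error"      # fix: add missing import or setup
    if "timeout" in text or "timed out" in text:
        return "timeout"           # fix: proper waits, not hardcoded sleeps
    if "locator" in text or "selector" in text:
        return "stale-selector"    # fix: update E2E locator
    if "expected" in text and ("received" in text or "actual" in text):
        return "assertion"         # fix: correct the expected value
    return "unknown"               # escalate: may be a real source bug
```

The "unknown" bucket is where the boundary rule bites: an unclassifiable failure is reported rather than patched.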
Generate coverage report with before/after comparison.
Coverage Report: {SCOPE}
═══════════════════════════
Baseline → After
────────────────
Unit: 67.2% → 91.3% (+24.1%)
Integration: 42.0% → 78.5% (+36.5%)
E2E: 0.0% → 65.0% (+65.0%)
Overall: 48.4% → 82.1% (+33.7%)
Tests Generated
───────────────
Unit: 23 tests (18 pass, 5 healed)
Integration: 12 tests (10 pass, 2 healed)
E2E: 8 tests (8 pass)
Total: 43 tests
Heal Iterations: 2/3
Files Created
─────────────
tests/unit/services/test_auth.py
tests/unit/services/test_payment.py
tests/integration/api/test_users.py
tests/integration/api/test_checkout.py
tests/e2e/checkout.spec.ts
tests/e2e/pages/CheckoutPage.ts
Real Services Used: PostgreSQL (testcontainers), Redis (docker-compose)
Remaining Gaps
──────────────
- src/services/notification.ts (0% — no tests generated, out of scope)
- src/utils/crypto.ts (45% — edge cases not covered)
Next Steps
──────────
/ork:verify {SCOPE} # Grade the implementation + tests
/ork:commit # Commit generated tests
/loop 10m npm test -- --coverage # Watch coverage while coding
Optionally schedule weekly coverage drift detection:
# Guard: Skip cron in headless/CI (CLAUDE_CODE_DISABLE_CRON)
# if env CLAUDE_CODE_DISABLE_CRON is set, run a single check instead
CronCreate(
schedule="0 2 * * 0",
prompt="Weekly coverage drift check for {SCOPE}: npm test -- --coverage.
If coverage >= baseline → CronDelete.
If coverage drops > 5% → alert with regression details and recommendation."
)
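The cron prompt's decision rule can be sketched as a small helper (hypothetical name; thresholds taken from the prompt above):

```python
def drift_action(baseline: float, current: float, tolerance: float = 5.0) -> str:
    """Decide the weekly drift-check outcome from coverage percentages."""
    if current >= baseline:
        return "delete-cron"   # coverage held; the job can retire itself
    if baseline - current > tolerance:
        return "alert"         # regression beyond tolerance: report details
    return "noop"              # small dip; keep watching
```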
- ork:implement — generates tests during implementation (Phase 5); use /ork:cover after for deeper coverage
- ork:verify — grades existing tests 0-10; chain: implement → cover → verify
- testing-unit / testing-integration / testing-e2e — knowledge skills loaded by test-generator agents
- ork:commit — commit generated test files

Load on demand with Read("${CLAUDE_SKILL_DIR}/references/<file>"):
| File | Content |
|---|---|
| real-service-detection.md | Docker-compose/testcontainers detection, service startup, teardown |
| heal-loop-strategy.md | Failure classification, fix patterns, iteration budget |
| coverage-report-template.md | Report format, delta calculation, gap analysis |
Version: 1.0.0 (March 2026) — Initial release