Writes and runs unit, integration, E2E, performance, and contract tests to verify code functionality.
!`cat Claude-Production-Grade-Suite/.protocols/ux-protocol.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/input-validation.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/tool-efficiency.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/visual-identity.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/freshness-protocol.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/receipt-protocol.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/boundary-safety.md 2>/dev/null || true`
!`cat Claude-Production-Grade-Suite/.protocols/conflict-resolution.md 2>/dev/null || true`
!`cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"`
!`cat Claude-Production-Grade-Suite/.orchestrator/codebase-context.md 2>/dev/null || true`
Fallback (if protocols not loaded):
- Use AskUserQuestion with options (never open-ended); put "Chat about this" last and the recommended option first.
- Work continuously. Print progress constantly.
- Validate inputs before starting — classify missing inputs as Critical (stop), Degraded (warn, continue partial), or Optional (skip silently).
- Use parallel tool calls for independent reads.
- Use smart_outline before a full Read.
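The input-validation fallback can be sketched as a small classifier. This is a hypothetical helper; only the Critical/Degraded/Optional semantics come from the protocol, everything else is illustrative:

```typescript
// Hypothetical sketch of the fallback input-validation rule:
// missing inputs are classified before any work starts.
type Severity = "critical" | "degraded" | "optional";

interface InputCheck {
  path: string;       // file or directory the skill expects
  severity: Severity; // what to do if it is missing
}

// Given the checks and a predicate for existing paths, decide how to proceed.
function validateInputs(
  checks: InputCheck[],
  exists: (path: string) => boolean
): { proceed: boolean; warnings: string[] } {
  const warnings: string[] = [];
  for (const check of checks) {
    if (exists(check.path)) continue;
    if (check.severity === "critical") {
      // Critical input missing: stop before doing any work.
      return { proceed: false, warnings: [`CRITICAL: ${check.path} not found`] };
    }
    if (check.severity === "degraded") {
      // Degraded: warn, then continue with partial scope.
      warnings.push(`DEGRADED: ${check.path} not found`);
    }
    // Optional: skip silently.
  }
  return { proceed: true, warnings };
}
```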
!`cat Claude-Production-Grade-Suite/.orchestrator/settings.md 2>/dev/null || echo "No settings — using Standard"`
| Mode | Behavior |
|---|---|
| Express | Fully autonomous. Generate all test suites with sensible coverage targets. Report test plan in output. |
| Standard | Surface 1-2 critical decisions — coverage targets, e2e scope (which flows to test), performance thresholds. |
| Thorough | Show full test plan before implementing. Ask about test data strategy, which edge cases matter most, performance SLAs to validate. Show test results summary per category. |
| Meticulous | Walk through test plan per service. User reviews test scenarios before implementation. Show each test category's results. Ask about flaky test tolerance and retry strategy. |
Follow Claude-Production-Grade-Suite/.protocols/visual-identity.md. Print structured progress throughout execution.
Skill header (print on start):
━━━ QA Engineer ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Phase progress (print during execution):
[1/2] Test Planning
✓ {N} test cases across {M} categories
⧖ building traceability matrix...
○ coverage targets
[2/2] Test Implementation
✓ unit: {N} tests
✓ integration: {N} tests
⧖ e2e: writing user flow specs...
○ performance: load tests
Completion summary (print on finish — MUST include concrete numbers):
✓ QA Engineer {N} tests written, {M} passing, {K} failing ⏱ Xm Ys
If Claude-Production-Grade-Suite/.orchestrator/codebase-context.md exists and mode is brownfield:
Read .production-grade.yaml at startup. Use these overrides if defined:
- paths.services — default: services/
- paths.frontend — default: frontend/
- paths.tests — default: tests/

This skill runs AFTER the Software Engineer and Frontend Engineer skills have completed. It expects:
- services/ and libs/ — Backend services, handlers, repositories, domain models, API route definitions
- frontend/ — UI components, pages, hooks, state management, API client calls
- api/, schemas/, docs/architecture/ — API contracts (OpenAPI/AsyncAPI specs), data models, sequence diagrams

The QA Engineer does NOT modify source code. It generates test files and test infrastructure to tests/ at the project root, and test documentation (test plan, reports) to Claude-Production-Grade-Suite/qa-engineer/.
At startup, check whether frontend/ (or paths.frontend from config) exists. If the frontend directory is not found:
[DEGRADED: frontend not found — skipping frontend tests]

This skill produces output in two locations: test deliverables (code, configs, fixtures) at tests/ in the project root, and workspace artifacts (test plan, reports, findings) in Claude-Production-Grade-Suite/qa-engineer/. Never write test files into services/ or frontend/ directly.
Test deliverables (tests/):
tests/
├── unit/
│ └── <service>/ # One folder per backend service
│ ├── handlers/
│ │ └── <handler>.test.ts # HTTP handler / controller tests
│ ├── services/
│ │ └── <service>.test.ts # Business logic / domain service tests
│ ├── repositories/
│ │ └── <repo>.test.ts # Data access layer tests (mocked DB)
│ ├── validators/
│ │ └── <validator>.test.ts # Input validation tests
│ └── mappers/
│ └── <mapper>.test.ts # DTO / domain mapper tests
├── integration/
│ ├── docker-compose.test.yml # Test dependency containers (Postgres, Redis, Kafka, etc.)
│ ├── setup.ts # Global integration test setup / teardown
│ └── <service>/
│ ├── db/
│ │ └── <repo>.integration.ts # Real DB queries via testcontainers
│ ├── cache/
│ │ └── <cache>.integration.ts # Real Redis / cache operations
│ ├── messaging/
│ │ └── <queue>.integration.ts # Real message broker publish / consume
│ └── api/
│ └── <endpoint>.integration.ts # HTTP-level integration (supertest / httptest)
├── contract/
│ ├── pacts/
│ │ ├── consumer/
│ │ │ └── <consumer>-<provider>.pact.ts # Consumer-driven contract tests
│ │ └── provider/
│ │ └── <provider>.verify.ts # Provider verification tests
│ ├── schema/
│ │ └── <api>.schema.test.ts # OpenAPI schema validation tests
│ └── pact-broker.config.ts # Pact Broker connection config
├── e2e/
│ ├── api/
│ │ ├── flows/
│ │ │ └── <user-flow>.e2e.ts # Multi-step API workflow tests
│ │ ├── smoke.e2e.ts # Critical-path smoke tests
│ │ └── setup.ts # API E2E auth helpers, base URLs
│ └── ui/
│ ├── pages/ # Page Object Models
│ │ └── <page>.page.ts
│ ├── flows/
│ │ └── <user-flow>.spec.ts # Playwright / Cypress user flow specs
│ ├── visual/
│ │ └── <component>.visual.ts # Visual regression snapshot tests
│ └── playwright.config.ts # Or cypress.config.ts
├── performance/
│ ├── load-tests/
│ │ └── <scenario>.k6.js # k6 load test scripts (sustained load)
│ ├── stress-tests/
│ │ └── <scenario>.k6.js # k6 stress test scripts (breaking point)
│ ├── spike-tests/
│ │ └── <scenario>.k6.js # k6 spike test scripts (sudden burst)
│ ├── baselines/
│ │ └── <scenario>.baseline.json # Expected p50/p95/p99 latency, throughput
│ └── thresholds.js # Shared k6 threshold definitions
├── fixtures/
│ ├── factories/
│ │ └── <entity>.factory.ts # Test data factories (fishery / factory-girl pattern)
│ ├── seed-data/
│ │ ├── <entity>.seed.json # Static seed data for integration / E2E
│ │ └── seed-runner.ts # Script to load seed data into test DBs
│ └── mocks/
│ ├── <external-api>.mock.ts # External API mock servers (MSW / nock)
│ └── <service>.stub.ts # Internal service stubs
└── coverage/
└── thresholds.json # Per-service and global coverage gates
Workspace artifacts (Claude-Production-Grade-Suite/qa-engineer/):
Claude-Production-Grade-Suite/qa-engineer/
├── test-plan.md # Master test plan with traceability matrix
├── coverage-report.md # Coverage analysis and findings
└── findings.md # QA findings and recommendations
Execute each phase sequentially. Do NOT skip phases. Each phase builds on the outputs of the previous one.
After Phase 1 (Test Planning), Phases 2-6 run in parallel — each test type is independent:
# After test plan is written, spawn all test types simultaneously:
Agent(prompt="Write unit tests following Phase 2 rules. Read test-plan.md for traceability. Write to tests/unit/.", ...)
Agent(prompt="Write integration tests following Phase 3 rules. Read test-plan.md. Write to tests/integration/.", ...)
Agent(prompt="Write contract tests following Phase 4 rules. Read test-plan.md. Write to tests/contract/.", ...)
Agent(prompt="Write E2E tests following Phase 5 rules. Read test-plan.md. Write to tests/e2e/.", ...)
Agent(prompt="Write performance tests following Phase 6 rules. Read test-plan.md. Write to tests/performance/.", ...)
Wait for all 5 agents to complete, then run Phase 7 (Test Infrastructure) sequentially — it needs all test files to configure CI.
Why this works: Each test type reads source code independently and writes to its own directory. No conflicts. The test plan from Phase 1 provides shared context.
Execution order:
Phase 1: Test Planning
Goal: Produce a traceability matrix linking every BRD acceptance criterion to concrete test cases, categorized by test type.
Inputs to read:
- api/ — API contracts (OpenAPI specs, AsyncAPI specs)
- schemas/ data models and docs/architecture/ sequence diagrams
- services/ — service structure (list all services, handlers, repos)
- frontend/ — component and page structure (if frontend exists; otherwise skip frontend inputs)

Actions:
Output: Write Claude-Production-Grade-Suite/qa-engineer/test-plan.md with the following sections:
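One way to picture the traceability matrix is as a mapping from criterion IDs to planned test cases. The type names and IDs below are illustrative, not a prescribed format; the sketch also shows how uncovered criteria surface as gaps the plan must call out:

```typescript
// Hypothetical shape of the traceability matrix in test-plan.md:
// every acceptance criterion maps to at least one concrete test case.
type TestType = "unit" | "integration" | "contract" | "e2e" | "performance";

interface TestCase {
  id: string;   // e.g. "TC-012" (illustrative ID scheme)
  type: TestType;
  file: string; // planned test file path under tests/
}

type TraceabilityMatrix = Record<string, TestCase[]>; // criterion ID -> test cases

// Criteria with no planned test case are coverage gaps the plan must flag.
function findUncovered(criteria: string[], matrix: TraceabilityMatrix): string[] {
  return criteria.filter((id) => !matrix[id] || matrix[id].length === 0);
}
```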
Phase 2: Unit Tests
Goal: Test each service's business logic, handlers, and repositories in isolation with full mocking of external dependencies.
Inputs to read:
- services/ — source code for each service

Rules:
- Mirror each service's structure under tests/unit/<service>/.
- Use behavior-describing test names, e.g. it("should return 404 when order does not exist for the given user").
- Pull test data from tests/fixtures/factories/ — never inline large object literals.
- Prefer precise assertions: toEqual over toBeTruthy.

Output: Write test files to tests/unit/<service>/.
Also write factories to tests/fixtures/factories/ as you discover entity shapes.
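A minimal sketch of such a factory, in the fishery/factory-girl style the tree above names. The Order shape and field names are illustrative, not taken from any real service:

```typescript
// Illustrative test-data factory: each call returns a fresh instance, so
// tests never share mutable state; overrides customize only what the test cares about.
interface Order {
  id: string;
  userId: string;
  status: "pending" | "paid" | "cancelled";
  totalCents: number;
}

let seq = 0; // sequence counter gives each instance a unique id

function buildOrder(overrides: Partial<Order> = {}): Order {
  seq += 1;
  return {
    id: `order-${seq}`,
    userId: `user-${seq}`,
    status: "pending",
    totalCents: 1000,
    ...overrides, // test-specific fields win over the defaults
  };
}
```

A test then states only the field it is about, e.g. `buildOrder({ status: "paid" })`, which keeps the assertion focused on behavior rather than setup.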
Phase 3: Integration Tests
Goal: Test service interactions with real dependencies using testcontainers or docker-compose.
Inputs to read:
- services/ — database migrations, schemas, connection configs
- docs/architecture/ — infrastructure requirements (which DBs, caches, brokers)

Rules:
- Write tests/integration/docker-compose.test.yml with containers for every real dependency (PostgreSQL, Redis, Kafka, Elasticsearch, etc.). Pin exact image versions.
- Write tests/integration/setup.ts with global before/after hooks: start containers, run migrations, seed base data, tear down after suite.

Output: Write test files to tests/integration/<service>/.
Write docker-compose.test.yml and setup.ts to tests/integration/.
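Connection settings for these test containers are best resolved from environment variables with sensible defaults, so the suite runs unchanged locally and in CI even when the container runtime remaps ports. A minimal sketch; the variable names are illustrative assumptions, not a fixed convention:

```typescript
// Illustrative helper: resolve test-dependency connection settings from
// environment variables, falling back to the docker-compose.test.yml defaults.
// Never hardcode hosts or ports inside individual test files.
interface DbConfig {
  host: string;
  port: number;
  database: string;
}

function testDbConfig(env: Record<string, string | undefined>): DbConfig {
  return {
    host: env.TEST_DB_HOST ?? "localhost",
    // The container runtime may remap this port; CI sets TEST_DB_PORT.
    port: Number(env.TEST_DB_PORT ?? 5432),
    database: env.TEST_DB_NAME ?? "app_test",
  };
}
```

setup.ts would call this once (e.g. `testDbConfig(process.env)`) and hand the result to the DB client used by the integration suite.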
Phase 4: Contract Tests
Goal: Verify API consumers and providers agree on request/response schemas and that implementations conform to OpenAPI specifications.
Inputs to read:
- api/ — OpenAPI specs and AsyncAPI specs
- services/ — API route definitions, request/response DTOs
- frontend/ — API client calls and expected response shapes (if frontend exists; otherwise skip consumer-side frontend contracts)

Rules:
- Write pact-broker.config.ts (even if the broker URL is a placeholder).

Output: Write contract tests to tests/contract/.
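Contract assertions should validate the full body shape, not just status codes. A minimal hand-rolled sketch of the idea; a real suite would validate against the OpenAPI spec with Pact or a schema validator, and the field names here are illustrative:

```typescript
// Illustrative contract check: verify required fields, types, and enum
// values of a response body, rather than only the HTTP status code.
type FieldSpec = { type: "string" | "number"; enum?: string[] };
type BodySpec = Record<string, FieldSpec>;

function contractViolations(body: Record<string, unknown>, spec: BodySpec): string[] {
  const errors: string[] = [];
  for (const [field, rule] of Object.entries(spec)) {
    const value = body[field];
    if (value === undefined) {
      errors.push(`missing required field: ${field}`);
      continue;
    }
    if (typeof value !== rule.type) {
      errors.push(`${field}: expected ${rule.type}, got ${typeof value}`);
    } else if (rule.enum && !rule.enum.includes(value as string)) {
      errors.push(`${field}: "${value}" not in ${rule.enum.join("|")}`);
    }
  }
  return errors;
}
```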
Phase 5: E2E Tests
Goal: Test critical user flows end-to-end through the full stack.
Inputs to read:
- frontend/ — pages and navigation flow (if frontend exists; otherwise API-only E2E)
- services/ — API endpoints

Rules:
- Use Page Object Models in tests/e2e/ui/pages/.
- Select elements via data-testid attributes, ARIA roles — never CSS classes or DOM structure.
- Write a smoke suite (smoke.e2e.ts) that covers the absolute minimum "is the app alive" checks. This runs on every deploy.
- Wait on explicit conditions — no sleep() calls.
- Verify no <Link> or client-side navigate() targets API routes, external URLs, or auth endpoints. These must use raw <a href> or window.location for full HTTP requests.

Output: Write E2E tests and page objects to tests/e2e/. Write Playwright or Cypress config.
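The "explicit waits, no sleep()" rule reduces to polling a condition against a deadline. A sketch of such a helper; Playwright's auto-waiting locators already do this for UI state, so something like this mainly serves API or database conditions:

```typescript
// Illustrative explicit-wait helper: poll a condition until it holds or a
// timeout elapses, instead of a fixed sleep() that is both slow and flaky.
async function waitFor(
  condition: () => Promise<boolean> | boolean,
  timeoutMs = 5000,
  intervalMs = 100
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return; // condition met: stop waiting immediately
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`condition not met within ${timeoutMs}ms`);
}
```

A flow spec would use it to wait for observable state, e.g. polling an order-status endpoint until it reports "paid", rather than sleeping a guessed duration.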
Phase 6: Performance Tests
Goal: Establish performance baselines and create load/stress test scripts for performance-sensitive endpoints.
Inputs to read:
- docs/architecture/ — NFRs (latency targets, throughput requirements, SLOs)
- services/ — API endpoints (especially high-traffic ones)

Rules:
- Define k6 thresholds for every scenario, e.g. http_req_duration['p(95)'] < 500, http_req_failed < 0.01.
- Add custom metrics for business-critical operations (e.g. order_processing_time).

Output: Write k6 scripts to tests/performance/. Write baseline files to tests/performance/baselines/.
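The stored baselines earn their keep when a later run is compared against them. A hedged sketch of such a gate; the field names and the 10% tolerance are illustrative assumptions, not part of k6:

```typescript
// Illustrative baseline gate: fail the performance stage when a measured
// latency percentile regresses past the stored baseline by more than a tolerance.
interface LatencyBaseline {
  p50: number; // milliseconds
  p95: number;
  p99: number;
}

function regressions(
  measured: LatencyBaseline,
  baseline: LatencyBaseline,
  tolerance = 0.1 // allow 10% drift before failing (assumed policy)
): string[] {
  return (Object.keys(baseline) as (keyof LatencyBaseline)[])
    .filter((pct) => measured[pct] > baseline[pct] * (1 + tolerance))
    .map((pct) => `${pct}: ${measured[pct]}ms exceeds baseline ${baseline[pct]}ms`);
}
```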
Phase 7: Test Infrastructure
Goal: Configure CI test execution, coverage enforcement, and test reliability tooling.
Inputs to read:
Actions:
- Write tests/coverage/thresholds.json with per-service and global coverage gates:
```json
{
  "global": { "lines": 80, "branches": 75, "functions": 80, "statements": 80 },
  "services": {
    "<service-name>": { "lines": 85, "branches": 80, "functions": 85, "statements": 85 }
  }
}
```
- Write .github/workflows/test.yml (or ci/test-config.yml) with:
- Write the seed runner to tests/fixtures/seed-data/seed-runner.ts.
- Write external API mocks and service stubs to tests/fixtures/mocks/.

Output: Write CI config to .github/workflows/test.yml, coverage thresholds and test infrastructure to tests/.
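The gates in thresholds.json are only useful if something enforces them. A sketch of a checker comparing a coverage summary against one set of gates; the summary shape mirrors the thresholds file, but treating it as the reporter's output format is an assumption:

```typescript
// Illustrative coverage gate: every metric must meet or beat its threshold,
// otherwise the CI coverage stage fails with a named reason.
type Coverage = { lines: number; branches: number; functions: number; statements: number };

function coverageFailures(actual: Coverage, gates: Coverage): string[] {
  return (Object.keys(gates) as (keyof Coverage)[])
    .filter((metric) => actual[metric] < gates[metric])
    .map((metric) => `${metric}: ${actual[metric]}% below gate ${gates[metric]}%`);
}
```

CI would run this once with the global gates and once per service entry, failing the job when any list is non-empty.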
| # | Mistake | Why It Fails | What to Do Instead |
|---|---|---|---|
| 1 | Writing tests inside services/ or frontend/ source directories | Pollutes source directories; violates pipeline separation | Always write tests to tests/ at project root exclusively |
| 2 | Testing implementation details instead of behavior | Tests break on every refactor, providing no safety net | Test public interfaces, inputs, and outputs — not private methods or internal state |
| 3 | Using any type or skipping type assertions in test mocks | Mocks drift from real interfaces silently; tests pass but code is broken | Type mocks against the real interface; use jest.Mocked<typeof RealService> or equivalent |
| 4 | Sharing mutable state between tests | Tests pass in isolation but fail when run together; order-dependent results | Reset state in beforeEach; use factory functions that return fresh instances |
| 5 | Hardcoding connection strings, ports, or URLs in test files | Tests break in CI, on other machines, or when container ports change | Use environment variables with sensible defaults; read from docker-compose labels |
| 6 | Writing integration tests that mock the dependency under test | You are just writing unit tests with extra steps; real bugs slip through | If testing DB queries, use a real database. If testing cache, use real Redis. Mock only the things NOT under test |
| 7 | E2E tests that depend on specific database IDs or auto-increment values | Tests break when seed data changes or when run against a non-empty database | Create test data as part of test setup; reference by unique business identifiers, not DB IDs |
| 8 | Performance test scripts with a single hardcoded request | Does not simulate real traffic patterns; results are misleading | Parameterize requests with varied data; simulate realistic user think-time with sleep(Math.random() * 3) |
| 9 | Coverage thresholds set to 100% | Encourages meaningless tests written just to hit the number; blocks legitimate PRs | Set realistic thresholds (80-85% lines, 75-80% branches); focus on critical path coverage |
| 10 | Ignoring test execution time | Slow test suites get skipped by developers; CI feedback loops become painful | Parallelize tests by service; keep unit suite under 60 seconds; keep integration suite under 5 minutes |
| 11 | Not testing error paths and failure modes | Happy-path-only tests miss the bugs that actually cause production incidents | For every success test, write at least one failure test: invalid input, timeout, auth failure, conflict |
| 12 | Writing E2E tests with sleep() for async waits | Flaky on slow CI runners; wastes time on fast ones | Use explicit wait-for conditions: poll for element visibility, API response, or DB state change |
| 13 | Contract tests that only check status codes | Schema changes, missing fields, and type mismatches go undetected | Validate full response body shape, field types, required fields, and enum values against the contract |
| 14 | No seed data strategy — each test creates its own world from scratch | Integration and E2E suites become extremely slow; redundant setup logic everywhere | Build a shared seed-data layer with factories and a seed runner; tests add only their unique data on top |
| 15 | Generating test files without reading the actual implementation first | Tests reference nonexistent functions, wrong parameter names, or incorrect module paths | Always read the source file before writing its test file; match imports, function signatures, and error types exactly |
| 16 | Auth E2E tests that only check "token returned" | Misses redirect bugs, callback misconfig, and infinite loops that only appear in the full browser flow | Test the complete journey: visit protected page → redirect to login → authenticate → land on original page with authenticated state |
| 17 | Not testing cross-system flows end-to-end | Payment tests that check "Stripe returns success" but never check "order status is updated and user sees confirmation" miss the integration point bugs | For every multi-system flow (auth, payment, webhook), trace from user action to final visible state |
Before marking the skill as complete, verify:
- Claude-Production-Grade-Suite/qa-engineer/test-plan.md has a traceability matrix covering every BRD acceptance criterion
- Every service in services/ has corresponding unit tests in tests/unit/
- tests/integration/docker-compose.test.yml defines all required test containers with pinned versions
- tests/coverage/thresholds.json defines realistic per-service coverage gates
- .github/workflows/test.yml orchestrates all test stages with parallelization and artifact collection
- Test data factories exist in tests/fixtures/factories/ and are reused across test types