# enterprise-harness-engineering
Generates complete layered testing strategy (L1-L4 pyramid), plans, architecture, scenarios, code templates, and CI/CD configs for Backend+APP, Backend+WEB, or Backend+APP+Embedded projects.
```shell
npx claudepluginhub addxai/enterprise-harness-engineering --plugin enterprise-harness-engineering
```

This skill uses the workspace's default tool permissions.
This Skill generates a complete layered testing strategy based on project type (Backend+APP / Backend+WEB / Backend+APP+Embedded), including test layer architecture, scenario adaptation matrices, code templates, and CI/CD configuration.
Core philosophy: "Quality is built in, not tested in."
- Case IDs use the `L{layer}-<MODULE>-NNN` format
- Use `make mock-scenario` or equivalent commands for controlled, repeatable input

Ask or auto-detect which scenario the project belongs to and determine the tech stack:
| Scenario | Typical Tech Stack | Special Focus |
|---|---|---|
| Backend+APP | Go/Java + Flutter/RN | App UI testing, API contracts, push notifications |
| Backend+WEB | Go/Java + React/Vue | Browser compatibility, SEO, SSR |
| Backend+APP+Embedded | Go/Java + Flutter + C/C++ (Bazel) | HAL abstraction, Wasm simulation, Digital Twin, Software Update |
```mermaid
graph BT
    L1["L1: Unit Test"] --> L2["L2: Single-Service Integration"]
    L2 --> L3["L3: Cross-Service E2E"]
    L3 --> L4["L4: User Acceptance Test"]
```
Goal: Verify the logical correctness of the smallest code unit (function/class/state machine). Includes interface contract and schema contract (validation without I/O dependencies).
| Project Type | Test Target | Tools | Focus |
|---|---|---|---|
| Backend | Service/Repository/Domain logic + API Schema | Go Test / JUnit / pytest | Business rules, boundary conditions, interface contracts |
| APP (Flutter) | Widget/BLoC/Provider logic | flutter_test | State management, data transformation |
| APP (RN) | Component/Hook logic | Jest | State management, data transformation |
| WEB | Components/Hooks/Store | Vitest / Jest | Rendering logic, state management |
| Embedded | Cluster logic, FSM, algorithms | GTest + Mock HAL | State transitions, HAL interactions, memory safety |
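To make the L1 row for embedded concrete, here is a hedged sketch of an FSM unit test in Python (the panel states and transition table are hypothetical; no I/O is involved, per the L1 goal):

```python
# Hypothetical L1 unit test: a minimal arming state machine with no I/O.
class PanelFSM:
    """Tiny finite-state machine: disarmed <-> armed_away."""

    TRANSITIONS = {
        ("disarmed", "arm_away"): "armed_away",
        ("armed_away", "disarm"): "disarmed",
    }

    def __init__(self):
        self.state = "disarmed"

    def handle(self, event: str) -> str:
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise ValueError(f"illegal transition: {key}")
        self.state = self.TRANSITIONS[key]
        return self.state


def test_l1_fsm_transitions_and_boundaries():
    """L1-FSM-001 (hypothetical ID): legal transitions plus an illegal-event boundary."""
    fsm = PanelFSM()
    assert fsm.handle("arm_away") == "armed_away"
    assert fsm.handle("disarm") == "disarmed"
    try:
        fsm.handle("disarm")  # already disarmed: must be rejected
        assert False, "expected ValueError"
    except ValueError:
        pass
```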
L1 Writing Standards:
Goal: Verify module collaboration within a single service boundary, using real middleware (DB/Redis/MQ/Vault, etc.) but without crossing service boundaries.
| Project Type | Test Target | Tools |
|---|---|---|
| Backend | Service <-> DB/Redis/Kafka | Docker Compose / Testcontainers |
| APP | App <-> Mock Server | Integration Test + Mock Server |
| WEB | Frontend <-> Mock API | MSW / Vitest |
| Embedded | Device + HAL (Wasm) | Vitest Browser (Digital Twin) |
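For shape only, an L2-style repository roundtrip sketch. Here sqlite stands in for the real database purely to keep the example self-contained; an actual L2 case would run against the real middleware via Docker Compose / Testcontainers as the table says:

```python
import sqlite3

# Hypothetical repository under test; a real L2 case would point this
# connection at a containerized Postgres/MySQL instead of sqlite.
class UserRepo:
    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def get(self, user_id: int):
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def test_l2_repo_roundtrip():
    """L2-USER-001 (hypothetical ID): service <-> DB roundtrip within one service boundary."""
    repo = UserRepo(sqlite3.connect(":memory:"))
    uid = repo.add("alice")
    assert repo.get(uid) == "alice"
```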
Goal: Verify the complete chain across multiple independent services, including real third-party APIs.
| Project Type | Test Target | Tools |
|---|---|---|
| Backend+APP | App → API → DB → Push | Appium / Flutter Integration Test |
| Backend+WEB | Browser → API → DB → SSE | Playwright / Cypress |
| Backend+APP+Embedded | App → Cloud → Hub(Wasm) → HAL | Simulator + Vitest Browser + Playwright |
Black-box principle: L3 tests are from the user's perspective — the test client (browser/App) → frontend → backend → database full-chain connectivity. Intercepting or mocking at any intermediate layer is prohibited. The only exception is uncontrollable real third-party services (e.g., payment gateways), which may use stub services, but internal components must not be mocked.
L3-k8s sub-layer (optional): When test scenarios depend on the K8s environment (e.g., Pod scheduling, Spot Recovery), mark as L3-k8s. Use L3 (no K8s) for daily development; use L3-k8s for Nightly/Pre-release.
L4 is not entirely manual: Automation is primary; only scenarios requiring real third-party client interaction retain manual testing.
| Type | Design Principle | Execution Method | Scenario Example |
|---|---|---|---|
| L4-Auto | Acceptance criteria are quantifiable: has input and expected output | make test-l4-uat (automated, Staging environment) | API Roundtrip, Memory persistence, approval workflows |
| L4-Manual | Requires real third-party client visual verification | QA manual testing in real environment | Feishu Card UI rendering, message delivery visual check |
These three suites are not new test layers — they are run suites formed by tagging (markers) existing L1-L4 test cases.
| Suite | Meaning | Case Source | Run Command | Timing |
|---|---|---|---|---|
| Smoke | Quick validation of the most critical paths (system is alive) | A few key L3 cases tagged with @smoke | make smoke | Run immediately after deployment |
| Regression | Full regression verifying historical features are not broken | All L1+L2+L3 cases | make regression | Before PR merge / CI quality gate |
| UAT-Auto | Business acceptance automation (Staging real environment) | Quantifiable portion of L4 cases | make test-l4-uat | Before release / Release Tag |
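One way these suites could map onto pytest marker expressions (a sketch; the lowercase marker names follow the document's `@pytest.mark.l3` / `@pytest.mark.l4` examples, but their exact set is an assumption):

```python
# Hypothetical suite -> pytest "-m" expression mapping, mirroring the table:
# smoke = tagged critical-path cases, regression = all L1+L2+L3, UAT-Auto = L4.
SUITE_MARKERS = {
    "smoke": "smoke",
    "regression": "l1 or l2 or l3",
    "uat-auto": "l4",
}

def pytest_args(suite: str) -> list:
    """Build the pytest CLI arguments that select one run suite."""
    return ["-m", SUITE_MARKERS[suite]]
```

A Makefile target like `make smoke` would then boil down to `pytest -m smoke`.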
Key principles:
- Add `@pytest.mark.smoke` to existing L3 critical-path cases — do not create new test files
- A smoke case carries both `@pytest.mark.smoke` and `@pytest.mark.l3`

```python
# Example: A case belonging to both l3 and smoke
@pytest.mark.l3
@pytest.mark.smoke
async def test_completions_roundtrip(...):
    """L3-COMP-001: The most critical E2E case."""
    ...
```
```python
# Example: L4 Auto case — Staging environment, does not assert specific text
@pytest.mark.l4
async def test_agent_memory_persists(letta_client):
    """L4-MS-001: Agent remembers user preferences across conversations."""
    archival = await letta_client.archival_memory_search(agent_id, keyword)
    assert len(archival.items) > 0  # Having content is sufficient
```
Based on project type, determine required / recommended / optional for each layer:
| Layer | Status | Focus |
|---|---|---|
| L1 (Unit) | Required | Backend business logic + App state management |
| L2-1 (Interface) | Required | API contracts (OpenAPI/Protobuf) |
| L2-2 (Integration) | Required | Backend service + DB/MQ integration |
| L2-3 (E2E) | Required | App → API full chain |
| L2-4 (Playground) | Recommended | Swagger UI + Mock environment |
| L3-1 (Contract) | Recommended | Frontend-backend API contracts |
| L3-2 (Cross-System) | Optional | Multi-subsystem coordination |
| L4 (UAT) | Required | Real device testing + App Store review process |
Special focus areas:
| Layer | Status | Focus |
|---|---|---|
| L1 (Unit) | Required | Backend business logic + Frontend components/Store |
| L2-1 (Interface) | Required | API contracts + Component Props interface |
| L2-2 (Integration) | Required | Backend service integration + Frontend API layer |
| L2-3 (E2E) | Required | Browser → API full chain (Playwright) |
| L2-4 (Playground) | Recommended | Storybook + Staging environment |
| L3-1 (Contract) | Recommended | Frontend-backend API change compatibility |
| L3-2 (Cross-System) | Optional | Multi-subsystem coordination |
| L4 (UAT) | Recommended | Real browser testing |
Special focus areas:
| Layer | Status | Focus |
|---|---|---|
| L1 (Unit) | Required | Backend + App + Cluster/FSM/algorithms |
| L2-1 (Interface) | Required | API contracts + C ABI + AxData protocol |
| L2-2 (Integration) | Required | Backend integration + Device Wasm integration (Digital Twin) |
| L2-3 (E2E) | Required | App → Cloud → Hub(Wasm) → HAL full chain |
| L2-4 (Playground) | Required | Web Simulator |
| L3-1 (Contract) | Required | Device/cloud/edge protocol contracts |
| L3-2 (Cross-System) | Recommended | Multi-subsystem coordination (Security <-> AI <-> Push) |
| L4 (UAT) | Required | Real hardware + Real App + Real cloud |
Special focus areas:
- `select` switches at build time

```text
docs/testing/
├── strategy.md              # Test plan overview (SSOT, ≤400 lines)
└── scenarios/               # Scenario matrices (split when strategy.md is too long)
    ├── ep1-<epic-name>.md   # By Epic (product perspective): User Story → AC scenario traceability
    ├── ep2-<epic-name>.md
    ├── tech-<module>.md     # By technical module (developer perspective): service/component traceability
    └── tech-nfr.md          # NFR degradation/fault tolerance
```
Epic files are for product/QA audiences (organized by User Story); technical files are for developers (organized by module). Both file types share the same set of case IDs for bidirectional traceability.
# <Project Name> Test Plan
## 1. Test Layer Overview
| Layer | Case Count | Test Goal | Real Dependencies | Mock Dependencies | Real Infra | Mock Infra | Execution Timing | Duration | Code Location |
|------|-------|---------|---------|----------|-----------|-----------|---------|------|---------|
> 10-column standard table — real dependencies vs mock dependencies is the core decision basis for layering.
### 1.1 Layering Logic
| Layer | Core Problem Solved | Why the Layer Above Is Insufficient |
|------|-------------|--------------|
> L2 vs L3 boundary: L2 = single application, no external dependencies (all mocked); L3 = real dependency integration.
### 1.2 Shift-Left Principle
| Verification Point | First Appearing Layer | Notes |
|--------|-------------|------|
Prohibited anti-patterns:
- No verifying logic at L3 that should be covered at L1
- No omitting L3 User Story AC cases just because "L1/L2 already tested it"
## 2. Mock Infrastructure (SSOT)
Mock infrastructure is managed by the `mock-engine` skill (start/stop mock services, load test data, create test scenarios).
[mock directory structure + WireMock per-layer switching strategy table]
| External System | L2 Handling | L3 Handling |
|---------|-----------|-----------|
## 3. L2 Integration Tests
> L1 unit tests co-exist with code and are not listed case-by-case in this plan.
Detailed cases in [scenarios/tech-<module>.md].
## 4. L3 E2E Tests (Black-Box)
> Test client interacts through the user interface — **no intermediate layer interception**.
Detailed cases in [scenarios/ep*.md] (by User Story) and [scenarios/tech-*.md] (by module).
## 5. L4 Acceptance Criteria
| Acceptance Item | User Story | Execution Method | Pass Criteria |
|--------|----|---------|----|
## 6. Requirements Traceability Matrix
| User Story | L1 | L2-1 | L2-2 | L3-1 | L3-2 | L4 |
|------------|----|------|------|------|------|-----|
> Each column is filled with specific case IDs. L1 lists covered logic points (co-exists with code, no case IDs).
## 7. Test Scenario Data
| Scenario Name | Purpose | DB Initial State | Covered Cases |
|--------------|------|-----------|---------|
> Each Scenario = DB seed data + Mock stub configuration. Switch with `make mock-scenario SCENARIO=<name>`.
## 8. CI/CD Automation Pipeline + Quality Gates
| Gate | Checkpoint | Criteria |
|------|-------|------|
Epic files and technical files share the AC-level traceability table:
## US-TP-01 Device Card Badge
| AC Scenario | Smoke | L1 | L2-1 | L2-2 | L3-1 | L3-2 | L4 |
|---------|-------|----|------|------|------|------|-----|
| Show locked badge when unsubscribed | fire | TestClass | CONTRACT-001 | EVAL-001 | FG01-001 | TP01-001 | pass |
Fixed 8 columns. Each cell is filled with a specific case ID; use `-` when not covered. fire = smoke case. Complete real-world example: see `references/engagement-example.md` (includes directory structure, table format, mock architecture, semantics annotation conventions, etc.).
- `L1-<MODULE>-NNN` (e.g., `L1-AUTH-001`)
- `L2-<MODULE>-NNN` (e.g., `L2-CRED-001`)
- `L3-<FLOW>-NNN` (e.g., `L3-COMP-001`)
- `L3k8s-<FLOW>-NNN`
- `L4-<FLOW>-NNN`

Code templates for each tech stack are in `references/code-templates.md` (Go / Java / Flutter / React / C++ / Playwright / Flutter Web+Playwright / Wasm Digital Twin).
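These naming conventions can be linted mechanically; the regex below is an assumption derived from the listed formats, not part of the skill itself:

```python
import re

# Matches L1-AUTH-001, L2-CRED-001, L3-COMP-001, L3k8s-RECOV-001, L4-UAT-001.
# Assumes: layer 1-4, only L3 may carry the k8s suffix, three-digit sequence.
CASE_ID = re.compile(r"^L(?:[1-4]|3k8s)-[A-Z]+-\d{3}$")

def is_valid_case_id(case_id: str) -> bool:
    """Return True when a case ID follows the L{layer}-<MODULE>-NNN convention."""
    return bool(CASE_ID.match(case_id))
```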
```javascript
// Correct: event-driven
await waitFor(() => shadowUpdates.get("panel_status") === "armed_away");

// Wrong: never use setTimeout
await new Promise((resolve) => setTimeout(resolve, 3000));
```

Embedded-specific patterns (State-Wait, Forced Cycle) are in `references/code-templates.md`.
Switch preset test data states via `make mock-scenario` for controlled, repeatable test input:

```shell
make mock-scenario SCENARIO=new-user        # No history
make mock-scenario SCENARIO=returning-user  # Has past interaction history
make mock-scenario SCENARIO=converted-user  # Has completed conversion
```
Each Scenario = a set of DB seed data + Mock stub configuration. Playwright tests specify the scenario via URL parameter (?scenario=new-user) or environment variable, integrating seamlessly with CI.
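On the test side, resolving the active scenario might look like this sketch (scenario names come from above; the seed contents and the `SCENARIO` env-var convention are illustrative):

```python
import os

# Hypothetical per-scenario seed data; the real skill drives this through
# `make mock-scenario` (DB seed + WireMock stub configuration).
SCENARIO_SEEDS = {
    "new-user":       {"interactions": 0, "converted": False},
    "returning-user": {"interactions": 5, "converted": False},
    "converted-user": {"interactions": 9, "converted": True},
}

def current_scenario(default: str = "new-user") -> dict:
    """Resolve the active scenario from the SCENARIO env var (CI-friendly)."""
    name = os.environ.get("SCENARIO", default)
    return SCENARIO_SEEDS[name]
```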
Inject fault scenarios via WireMock to verify service degradation behavior:
Inject a 5s delay (expected: the service degrades and returns HTTP 200 with an empty list):

```json
{ "request": { "method": "POST", "urlPattern": "/api/eval" },
  "response": { "fixedDelayMilliseconds": 5000, "status": 200 } }
```

Return HTTP 500 for feature queries:

```json
{ "request": { "method": "GET", "urlPattern": "/v1/features/.*" },
  "response": { "status": 500 } }
```
The CI pipeline is organized by test layer into stages: test-l1 (every commit) → test-l2 (Nightly/Merge) → test-e2e (Release) → quality (coverage). The complete .gitlab-ci.yml template is in references/code-templates.md.
| Gate | Checkpoint | Criteria |
|---|---|---|
| CI Gate | L1 | 100% pass, coverage ≥ 80% |
| Nightly | L2 + L3 | 100% pass |
| Release | L1 + L2 + L3 + L4 | 100% pass |
| Memory Safety | ASan/TSan (Embedded) | 0 errors |
| Code Quality | SonarQube | Quality Gate pass |
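The L1 coverage gate can be enforced with a small check; this sketch assumes a Cobertura-style XML report (the `line-rate` attribute), which many coverage tools can emit:

```python
import xml.etree.ElementTree as ET

THRESHOLD = 0.80  # CI gate from the table: L1 coverage >= 80%

def check_coverage(xml_text: str, threshold: float = THRESHOLD) -> bool:
    """Parse a Cobertura-style report and compare its line-rate to the gate."""
    root = ET.fromstring(xml_text)
    return float(root.get("line-rate", "0.0")) >= threshold
```

A CI job would read the generated report file and fail the pipeline when `check_coverage` returns False.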
Manual browser debugging is prohibited — bugs must be reproduced and verified through test cases.
Select the test tool based on bug layer:
| Bug Type | Test Layer | Tools |
|---|---|---|
| Backend business logic | L1 | Go Test / JUnit / GTest |
| API interface | L2-1 | Interface test + Mock |
| Frontend component | L1 | Vitest / Jest / flutter_test |
| Frontend-backend integration | L2-2 | Docker Compose + Integration |
| Full chain | L2-3 | Playwright / Simulator |
| Embedded logic | L1 | GTest + Mock HAL |
| Wasm/Browser | L2-2 | Vitest Browser (Digital Twin) |
1. Write a reproduction test → run to confirm it fails.
2. Analyze test logs → fix code → run to confirm it passes.
3. Run the full test suite to confirm no regressions; retain the test case as regression protection.
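The workflow ends with a retained regression case; a minimal sketch (the bug id and the `search` function are hypothetical):

```python
def search(keyword: str) -> list:
    """Fixed implementation: an empty keyword now returns [] instead of raising."""
    if not keyword:
        return []  # the fix: previously this path raised ValueError
    return [item for item in CATALOG if keyword in item]

CATALOG = ["alpha", "beta"]

def test_bug_1234_empty_keyword_returns_empty_list():
    """Reproduction test retained as regression protection.

    Step 1: this test failed before the fix (reproduction).
    Step 2: after the fix it passes.
    Step 3: it stays in the suite so the bug cannot silently return.
    """
    assert search("") == []
    assert search("alp") == ["alpha"]
```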
Wrong (flat pipeline, no layering):

```yaml
stages: [build, test]
test:
  script: [go test ./..., flutter test, npm run test]
# Problem: no layering, no gates, L1 failures cannot be quickly pinpointed
```

Correct (full template in `references/code-templates.md`):

```yaml
stages: [build, test-l1, test-l2, test-e2e, quality]
# Each layer has its own stage, trigger rules, and gates
```
```javascript
// Manual setTimeout wait — unreliable and fragile
await new Promise((resolve) => setTimeout(resolve, 3000));
console.log("manually check the browser to see if it's working"); // Manual browser debugging prohibited
```

```javascript
// Using Promise Resolver Pattern for precise state change waiting
await waitFor(() => shadowUpdates.get("panel_status") === "armed_away");
expect(shadowUpdates.get("panel_status")).toBe("armed_away");
```
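For pytest-based L3 cases, the same event-driven wait can be sketched in Python (the helper name mirrors the JS `waitFor`; the polling interval is an assumption):

```python
import time

def wait_for(predicate, timeout_s: float = 5.0, interval_s: float = 0.05) -> bool:
    """Poll `predicate` until it is truthy or the timeout expires.

    Event-driven waiting: the test advances as soon as the condition
    holds, instead of sleeping a fixed 3 seconds and hoping.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    raise TimeoutError(f"condition not met within {timeout_s}s")
```

Usage in a test would look like `wait_for(lambda: shadow_updates.get("panel_status") == "armed_away")`.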
`docs/<project>_test_plan.md`