Test strategy design — pyramid, automation, E2E, contract testing, shift-left, test data management, QA-as-a-service strategy, test factory design, PITT methodology, QA CoE design. Use when the user asks to "design test strategy", "build test automation", "implement contract testing", "manage test data", "define quality gates", or mentions test pyramid, Pact, Playwright, Cypress, coverage targets, flaky tests, chaos engineering.
Testing strategy defines how quality is verified, automated, and measured across the software delivery lifecycle. The skill produces comprehensive test architectures covering shape selection, automation frameworks, contract testing, performance and chaos testing, test data management, and quality metrics that shift quality left while maintaining production confidence.
A test that cannot fail protects nothing. A testing strategy is not measured by its coverage percentage; it is measured by the confidence it gives you to deploy on a Friday at 5pm.
The user provides a project or system name as $ARGUMENTS. Parse $1 as the project/system name used throughout all output artifacts.
Parameters:
{MODO}: piloto-auto (default) | desatendido | supervisado | paso-a-paso
{FORMATO}: markdown (default) | html | dual
{VARIANTE}: ejecutiva (~40%: S1 pyramid + S3 contracts + S6 metrics) | técnica (full 6 sections, default)
{TIPO_SERVICIO}: SDA (default) | QA
Before generating strategy, detect the codebase context:
!find . -name "*.test.*" -o -name "*.spec.*" -o -name "*test*" -type d -o -name "jest*" -o -name "pytest*" -o -name "cypress*" | head -20
Use detected testing frameworks, languages, and existing test structure to tailor recommendations.
If reference materials exist, load them:
Read ${CLAUDE_SKILL_DIR}/references/testing-patterns.md
Define the test architecture shape with ratio targets and ROI-based prioritization.
Shape selection decision matrix:
| System Type | Recommended Shape | Ratio (Unit:Integration:E2E) | Rationale |
|---|---|---|---|
| Backend API, business logic-heavy | Pyramid (Fowler) | 70:20:10 | Pure functions, fast unit tests dominate |
| Frontend SPA, UI-heavy | Trophy (Kent C. Dodds) | 20:50:20 + static 10 | Integration tests provide best ROI for component interactions |
| Microservices, many boundaries | Honeycomb | 10:70:20 | Integration across service boundaries matters most |
| Monolith, legacy | Ice cream cone (invert it) | Start E2E, add unit as refactored | Characterization tests first, unit tests on new code |
Testing Trophy (Kent C. Dodds): Static analysis at base, then unit, then integration as the largest layer, then E2E at top. The trophy argues integration tests provide the best confidence-to-cost ratio — they combine realistic coverage with reasonable speed. "Write tests. Not too many. Mostly integration." As E2E tooling matures (Playwright, Vitest Browser Mode), the trophy's top layer grows increasingly cost-effective.
Produce:
Key decisions:
Design the automation infrastructure including tool selection, patterns, and execution strategy.
Tool selection matrix:
| Language/Platform | Unit | Integration | E2E | Visual Regression |
|---|---|---|---|---|
| JavaScript/TypeScript | Vitest, Jest | Testing Library, Supertest | Playwright | Chromatic, Percy |
| Python | pytest | pytest + Testcontainers | Playwright | Percy |
| Java/Kotlin | JUnit 5, jqwik | Spring Boot Test, Testcontainers | Playwright, Selenium | Percy |
| .NET | xUnit, NUnit | WebApplicationFactory | Playwright | Percy |
| Mobile (iOS) | XCTest | XCUITest | Detox, Appium | Percy |
| Mobile (Android) | JUnit, Espresso | Espresso | Detox, Appium | Percy |
Produce:
Ensure service interfaces remain compatible through consumer-driven contracts and schema validation.
Produce:
Contract testing decision matrix:
| Scenario | Approach | Tool |
|---|---|---|
| Internal microservices, multiple consumers | Consumer-driven | Pact |
| Public API, schema-first | Provider-driven | Specmatic, Prism |
| Event-driven, async messaging | Schema registry | Confluent Schema Registry, AWS Glue |
| GraphQL | Schema validation | Apollo Studio, GraphQL Inspector |
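Stripped to its core, the consumer-driven approach in the first row works like this minimal stdlib sketch: the consumer publishes the interactions it depends on, and the provider replays them against its real handler. This is not Pact's actual API; all names (`checkout-web`, `pricing-api`, `provider_handle`) are illustrative, and Pact adds brokers, versioning, and can-i-deploy on top of this loop.

```python
# Contract written from the consumer's perspective: the interactions it
# relies on, with the exact responses it expects.
contract = {
    "consumer": "checkout-web",
    "provider": "pricing-api",
    "interactions": [
        {
            "description": "get price for a known SKU",
            "request": {"method": "GET", "path": "/prices/SKU-1"},
            "response": {"status": 200, "body": {"sku": "SKU-1", "amount": 999}},
        }
    ],
}

def provider_handle(method, path):
    """Stand-in for the real provider endpoint."""
    if method == "GET" and path.startswith("/prices/"):
        return 200, {"sku": path.rsplit("/", 1)[1], "amount": 999}
    return 404, {}

def verify(contract):
    """Replay every consumer interaction against the provider; return the
    descriptions of interactions the provider no longer satisfies."""
    failures = []
    for it in contract["interactions"]:
        status, body = provider_handle(it["request"]["method"], it["request"]["path"])
        if (status, body) != (it["response"]["status"], it["response"]["body"]):
            failures.append(it["description"])
    return failures
```

An empty failures list is exactly the signal a can-i-deploy gate would check before releasing a new provider version.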
Integrate performance validation and failure injection into the testing lifecycle.
Produce:
Chaos maturity model:
| Level | Practice | Environment |
|---|---|---|
| 1 - Learning | Manual fault injection, document results | Staging only |
| 2 - Automated | Scheduled chaos experiments in CI | Staging |
| 3 - Production | Canary chaos with automatic rollback | Production canary |
| 4 - Advanced | Continuous chaos, GameDays, cross-team | Production |
Tools: Chaos Monkey, Litmus, Gremlin, or custom fault injection. Require automatic rollback if safety thresholds are exceeded.
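The safety-threshold requirement reduces to a control loop: escalate fault injection in steps, and abort with an automatic rollback the moment an observed metric breaches the agreed limit. The sketch below is illustrative (the 5% error-rate threshold and all callback names are assumptions, not any tool's API):

```python
def run_chaos_experiment(inject, rollback, read_error_rate, steps, threshold=0.05):
    """Increase fault injection step by step; abort and roll back the
    moment the observed error rate breaches the safety threshold."""
    for level in steps:
        inject(level)
        if read_error_rate() > threshold:
            rollback()
            return ("aborted", level)
    rollback()  # always restore steady state at the end
    return ("completed", steps[-1])

# Example: the error rate jumps once 30% of traffic is affected.
errors = {0.1: 0.01, 0.2: 0.02, 0.3: 0.12}
state = {"level": 0}
result = run_chaos_experiment(
    inject=lambda lvl: state.update(level=lvl),
    rollback=lambda: state.update(level=0),
    read_error_rate=lambda: errors[state["level"]],
    steps=[0.1, 0.2, 0.3],
)
```

Here the experiment aborts at the 30% step and the rollback restores steady state, which is the behavior levels 3-4 of the maturity model require before chaos is allowed near production.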
Design strategies for creating, managing, and cleaning test data across environments.
Produce:
Database strategy per test level:
| Test Level | Strategy | Tools |
|---|---|---|
| Unit | In-memory, mocked | H2, SQLite, mocks |
| Integration | Containerized, real engine | Testcontainers |
| E2E | Seeded staging or ephemeral | Terraform, Docker Compose |
| Performance | Production-scale anonymized | Custom ETL, Faker at scale |
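For the anonymization row, one common technique is deterministic pseudonymization: hash PII with a salt so the same input always maps to the same token, which keeps foreign-key joins intact across anonymized tables. A sketch (salt management is deliberately simplified; in practice the salt is a managed secret):

```python
import hashlib

def pseudonymize(value: str, salt: str) -> str:
    """Deterministic pseudonym: the same (salt, value) pair always yields
    the same token, so relationships between anonymized tables still line up."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]
```

The same email in a `users` table and an `orders` table maps to the same token, so performance tests against the anonymized copy still exercise realistic join cardinalities.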
Incorporate modern testing approaches and define measurable quality indicators.
Property-based testing: Define properties that must hold for all inputs instead of hand-writing examples (e.g., "encode then decode returns original"). Generate hundreds of random inputs; shrink failures to minimal reproducible cases. Particularly effective for parsers, serializers, algorithms, and state machines.
| Language | Tool | Integration |
|---|---|---|
| JS/TS | fast-check | Jest, Vitest |
| Python | Hypothesis | pytest |
| Java | jqwik | JUnit 5 |
| Scala | ScalaCheck | ScalaTest |
| Haskell | QuickCheck | HSpec |
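Stripped of shrinking and smart generators, the core loop these tools automate looks like the stdlib sketch below, using a hypothetical encode/decode pair for the round-trip property:

```python
import random

def encode(xs):
    return ",".join(str(x) for x in xs)

def decode(s):
    return [int(x) for x in s.split(",")] if s else []

def check_roundtrip(trials=200, seed=42):
    """Property: decode(encode(xs)) == xs for any list of ints.
    Returns a counterexample, or None if all trials pass. Hypothesis and
    fast-check add shrinking of counterexamples to minimal cases."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
        if decode(encode(xs)) != xs:
            return xs
    return None
```

A hand-written example suite might never try an empty list or negative values; the generator covers both on every run.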
Mutation testing: Seed small faults (mutants) into production code and verify tests catch them. Line coverage measures quantity; mutation testing measures quality.
| Language | Tool | Target Score | CI Cadence |
|---|---|---|---|
| Java | PIT/pitest | >80% on critical paths | Nightly |
| JS/TS/.NET | Stryker | >80% on critical paths | Nightly |
| Python | mutmut | >80% on critical paths | Nightly |
Run on CI nightly, not per-commit (too slow). Focus on critical business logic modules, not the entire codebase.
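In miniature, this is what PIT, Stryker, and mutmut automate: seed a small fault into the code under test and check that at least one test now fails, "killing" the mutant. The hand-made mutant and function names below are illustrative:

```python
def discounted(total, rate):
    return total - total * rate       # production code

def discounted_mutant(total, rate):
    return total + total * rate       # seeded fault: '-' mutated to '+'

def suite_passes(fn):
    """A single example-based test standing in for the whole suite."""
    return fn(100, 0.1) == 90
```

The suite passes on the original and fails on the mutant, so this mutant is killed; a surviving mutant would point at behavior the tests do not actually constrain, which line coverage alone cannot reveal.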
Visual regression testing: Capture screenshots of UI components/pages, compare against baselines pixel-by-pixel or perceptually.
| Tool | Strength | Best For |
|---|---|---|
| Chromatic | Storybook-native, component-level | Design systems, component libraries |
| Percy | Cross-browser, full-page | Multi-browser apps, full pages |
| Playwright screenshots | Free, CI-integrated | Budget-conscious, custom pipelines |
| BackstopJS | Open source, self-hosted | Self-hosted requirement |
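Pixel-by-pixel comparison reduces to a diff ratio against a stored baseline; the tools above add perceptual tolerance, anti-aliasing handling, and review workflows on top. A stdlib sketch over flat lists of RGB tuples (the 0.1% budget is an illustrative default):

```python
def diff_ratio(baseline, candidate):
    """Fraction of pixels that differ between two equally sized frames,
    each represented as a flat list of (r, g, b) tuples."""
    if len(baseline) != len(candidate):
        raise ValueError("frames must have identical dimensions")
    changed = sum(1 for a, b in zip(baseline, candidate) if a != b)
    return changed / len(baseline)

def visual_gate(baseline, candidate, budget=0.001):
    """Pass the build only if at most 0.1% of pixels changed."""
    return diff_ratio(baseline, candidate) <= budget
```

A strict per-pixel gate like this is noisy in practice (font rendering differs across CI agents), which is why the commercial tools compare perceptually rather than exactly.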
Test impact analysis: Map code changes to affected tests using coverage data. Run only tests that exercise changed code paths. Tools: Launchable, Gradle Enterprise predictive test selection. Reduces CI time 40-70% in large codebases while maintaining confidence.
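The selection step itself is just an intersection between a coverage map and the changed file set, as in this sketch (real tools build and refresh the map from per-test coverage traces; the names below are illustrative):

```python
# test -> source files it executes, built from a prior coverage run
coverage_map = {
    "test_checkout": {"cart.py", "pricing.py"},
    "test_login": {"auth.py", "session.py"},
    "test_pricing_rules": {"pricing.py"},
}

def impacted_tests(changed_files):
    """Select only the tests whose covered files intersect the change set."""
    changed = set(changed_files)
    return {t for t, files in coverage_map.items() if files & changed}
```

A change to `pricing.py` selects two of the three tests; a docs-only change selects none, which is where the large CI-time savings come from.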
Quality metrics — track these:
| Decision | Enables | Constrains | When to Use |
|---|---|---|---|
| Heavy unit testing | Fast feedback, cheap maintenance | Misses integration issues | Business logic-heavy, pure functions |
| E2E-heavy strategy | Catches user-facing bugs | Slow, flaky, expensive | Small apps, critical journeys only |
| Contract testing | Decoupled deployment, fast verification | Setup overhead, team coordination | Microservices, multi-team consumers |
| Chaos engineering | Reveals hidden failures | Risk of impact, needs monitoring | Production-ready with observability |
| Mutation testing | Validates test quality | Slow, high compute | Critical business logic modules |
| Property-based testing | Finds edge cases humans miss | Learning curve, slower tests | Parsers, serializers, algorithms |
Greenfield Project: Start with unit test framework from day one. Add integration tests as external dependencies emerge. Defer E2E until user journeys stabilize. Establish conventions early.
Legacy System with No Tests: Start with characterization tests (capture current behavior). Add integration tests around critical paths. Introduce unit tests for new code only. Do not attempt 80% coverage retroactively.
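A characterization (golden master) test pins the currently observed output, whatever it is, so refactoring becomes safe before the behavior is fully understood. `legacy_format` below is a hypothetical stand-in for undocumented legacy code:

```python
def legacy_format(n):
    """Imagine this is untested legacy code with no written spec."""
    return ("%0.2f" % n).replace(".", ",")

def test_characterization():
    # Expected values were captured from the current implementation,
    # not derived from requirements: they document behavior as-is.
    assert legacy_format(3.14159) == "3,14"
    assert legacy_format(0) == "0,00"
    assert legacy_format(-1.5) == "-1,50"
```

If a later refactor breaks one of these pins, that is either a regression or a deliberate, now-visible behavior change; either way the team finds out before production does.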
Microservices with Many Consumers: Contract testing is essential. Set up Pact Broker or schema registry. Establish can-i-deploy gates. Each team owns their consumer tests.
Monorepo with Multiple Teams: Use test impact analysis: detect changed modules, run only related tests. Shared test utilities in a common package. Team-owned suites with cross-team integration tests.
Regulated Environment: Test evidence is a compliance artifact. Document test plans, link tests to requirements, archive results. Maintain a traceability matrix: requirement --> test case --> execution result.
Before finalizing delivery, verify:
graph TD
subgraph Core
TS[Testing Strategy]
end
subgraph Inputs
I1[Codebase & Frameworks] --> TS
I2[System Architecture] --> TS
I3[Quality Requirements] --> TS
I4[CI/CD Pipeline] --> TS
end
subgraph Outputs
TS --> O1[Test Shape & Pyramid Design]
TS --> O2[Automation Framework]
TS --> O3[Contract Testing Setup]
TS --> O4[Performance & Chaos Plan]
TS --> O5[Test Data Strategy]
TS --> O6[Quality Metrics & Gates]
end
subgraph Related Skills
RS1[quality-engineering] -.-> TS
RS2[devsecops-architecture] -.-> TS
RS3[observability] -.-> TS
RS4[software-architecture] -.-> TS
RS5[qa-service-discovery] -.-> TS
end
MD format (default):
# Testing Strategy: {project_name}
## S1: Test Shape Selection & Pyramid Design
### Shape Decision Matrix | Layer Definitions | Coverage Targets | Shift-Left
## S2: Test Automation Framework
### Tool Selection | Page Object/Screen Object | Parallel Execution | CI Triggers
## S3: Contract & API Testing
### Consumer-Driven | Pact/Specmatic | Schema Validation | can-i-deploy
## S4: Performance & Chaos Testing
### Load in CI | Performance Budgets | Chaos Maturity | Game Days
## S5: Test Data Management
### Synthetic Generation | Anonymization | Environment Strategy | Isolation
## S6: Advanced Techniques & Quality Metrics
### Property-Based | Mutation Testing | Visual Regression | Test Impact Analysis
HTML format:
A-01_Testing_Strategy.html -- Branded HTML with MetodologIA's Design System CSS. Includes an interactive test pyramid diagram, a per-platform tool matrix, and a quality metrics dashboard with targets and trends.
DOCX format (on demand):
{fase}_{entregable}_{cliente}_{WIP}.docx
XLSX format (on demand):
{fase}_testing-strategy_{cliente}_{WIP}.xlsx
PPTX format (on demand):
{fase}_{entregable}_{cliente}_{WIP}.pptx
| Dimension | Weight | Criterion (7/10 minimum) |
|---|---|---|
| Trigger Accuracy | 10% | Fires on test strategy, pyramid, automation, and contract testing keywords; not on code architecture |
| Completeness | 25% | All 6 sections cover shape, automation, contracts, chaos, data, and metrics with concrete tools |
| Clarity | 20% | Shape selection is justified per system type; the tool matrix is specific per language/platform |
| Robustness | 20% | Edge cases (greenfield, legacy, microservices, monorepo, regulated) each get a differentiated strategy |
| Efficiency | 10% | Executive variant (S1+S3+S6) delivers shape + contracts + metrics in ~40% of the content |
| Value Density | 15% | Every section produces applicable configuration: framework setup, Pact config, chaos experiments, quality gates |
Minimum threshold: 7/10 on every dimension. Weighted composite >= 7.0 for the output to be considered acceptable.
| Format | Default | Description |
|---|---|---|
| markdown | Yes | Rich Markdown + Mermaid diagrams. Token-efficient. |
| html | On demand | Branded HTML (Design System). Visual impact. |
| dual | On demand | Both formats. |
Default output is Markdown with embedded Mermaid diagrams. HTML generation requires explicit {FORMATO}=html parameter.
Primary: A-01_Testing_Strategy.html — Executive summary, test shape design, automation framework, contract testing setup, chaos engineering plan, test data strategy, quality metrics dashboard.
Secondary: Test framework configuration files, Pact contract examples, quality gate definitions, flaky test management runbook.
QA service variant ({TIPO_SERVICIO}=QA): When invoked with {TIPO_SERVICIO}=QA, this skill shifts from "test strategy for a software project" to "test strategy design as a QA service offering." Additional sections generated:
Output Artifact (QA variant): Testing_Strategy_QA_Service_{project}.md
Author: Javier Montaño | Last updated: March 12, 2026