From team-of-agents
Use when writing test plans, designing test cases, identifying edge cases and failure scenarios, reviewing code for testability, setting up test automation, evaluating test coverage, performing exploratory testing, defining quality standards, or any task focused on ensuring software works correctly and reliably before and after release.
```
npx claudepluginhub pranav8494/team-of-agents
```
This skill uses the workspace's default tool permissions.
Quality is built in, not tested in. The best time to catch a defect is at requirements.
A flaky test is worse than no test — it erodes suite trust and masks real failures.
Use this table to determine what to produce for each task type:
| User asks for | What to produce |
|---|---|
| Test plan | Risk-based test plan with: scope (in/out), test levels with target ratios from the Test Pyramid (unit ~70% / integration ~20% / E2E ~10%), test cases mapped to acceptance criteria, entry/exit criteria, defect severity matrix, automation candidates vs. manual-only areas |
| Test cases for a feature | Happy path cases + error path cases (invalid input, auth failure, downstream timeout) + edge cases (boundary values, empty state, max limits) using EP and BVA techniques; each case has: ID, precondition, steps, expected result |
| Bug report | Completed bug report using the Bug Report Format: summary, environment, preconditions, numbered reproduction steps, expected result, actual result, severity, evidence (screenshot/log); incomplete reports are returned for more detail |
| Automation strategy | Automation Decision Table evaluation for each candidate test; recommended framework + rationale; CI integration point; quarantine policy for flaky tests |
| Exploratory testing session | Session charter (focus area + time-box) + findings log (anomalies, questions, confirmed issues) + severity classification per finding using SFDPO heuristics |
| Code review (testability) | Assessment of: test isolation (shared state risks), assertion quality (behaviour vs. internal state), async handling (sleep vs. polling), test naming convention, flakiness risk; concrete refactor suggestions |
| Test coverage analysis | Coverage report interpretation: identify untested equivalence partitions, missing boundary values, uncovered state transitions, and integration gaps; recommended test cases to close each gap |
| Non-functional testing plan | Area-specific plan from the Non-Functional Testing table: load test scenarios with ramp profile and p99 target, security test cases (OWASP inputs), accessibility audit scope (automated + manual), compatibility matrix |
| Quality standards definition | Test naming convention, assertion rules, flakiness policy (quarantine threshold + SLA to fix), severity/priority definitions, definition of done for test coverage |
The Test Pyramid sets the target ratio, the characteristics, and what a failure at each level tells you:
| Level | Coverage target | Characteristics | When it fails, it means |
|---|---|---|---|
| Unit | ~70% of test count | Fast, isolated, no I/O, mocks at boundaries | Logic is wrong |
| Integration | ~20% of test count | Real DB (Testcontainers), real HTTP (WireMock), slow | Wiring or query is wrong |
| Contract (Pact) | API boundaries only | Service-to-service, no full stack needed | Producer broke consumer expectations |
| E2E | ~10% of test count, critical paths only | Full environment, slowest, most fragile | User-facing regression exists |
Deviating toward the top of the pyramid (more E2E) means you have poor isolation of failure causes and slow feedback. Deviating toward the bottom (unit-only) means you have gaps in integration correctness.
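As an illustration of the bottom layer, here is a minimal sketch of a fast, isolated unit test with the persistence boundary stubbed in memory. It uses TypeScript with the built-in node:test runner; `greetingFor` and `UserRepo` are hypothetical names, not part of this skill.

```ts
// Unit test sketch: no I/O, the persistence boundary is an in-memory stub.
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical port: the only boundary this unit touches.
interface UserRepo {
  findEmail(id: string): Promise<string | null>;
}

// Hypothetical unit under test: pure logic over the injected boundary.
async function greetingFor(id: string, repo: UserRepo): Promise<string> {
  const email = await repo.findEmail(id);
  return email ? `Hello ${email}` : "Hello guest";
}

test("greets by email when the user exists", async () => {
  const stub: UserRepo = { findEmail: async () => "a@example.com" };
  assert.equal(await greetingFor("u1", stub), "Hello a@example.com");
});

test("falls back to guest when the user is unknown", async () => {
  const stub: UserRepo = { findEmail: async () => null };
  assert.equal(await greetingFor("u1", stub), "Hello guest");
});
```

Because the boundary is stubbed, a failure here can only mean the logic is wrong, never the wiring.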
Divide the input space into partitions where all values in a partition should produce equivalent behaviour. Test one representative from each partition.
Example — age field with valid range 18–65: the partitions are below 18 (invalid), 18–65 (valid), and above 65 (invalid); one representative from each (e.g. 10, 40, 80) covers the input space.
Test the edges of each partition, not just the middle. Applied on top of EP.
Example — same age field: the boundary values are 17, 18, 65, and 66, one value on each side of each partition edge. A combined EP + BVA sketch follows.
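A combined EP + BVA sketch for the age field, assuming a hypothetical `validateAge` function and the built-in node:test runner:

```ts
// EP + BVA for the age field: one representative per partition,
// plus the values on each side of every boundary.
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical validator under test.
function validateAge(age: number): boolean {
  return Number.isInteger(age) && age >= 18 && age <= 65;
}

// Equivalence partitions: below range, in range, above range.
const partitionReps: Array<[number, boolean]> = [
  [10, false], // representative of < 18
  [40, true],  // representative of 18–65
  [80, false], // representative of > 65
];

// Boundary values: either side of the lower edge, either side of the upper edge.
const boundaryValues: Array<[number, boolean]> = [
  [17, false],
  [18, true],
  [65, true],
  [66, false],
];

for (const [age, expected] of [...partitionReps, ...boundaryValues]) {
  test(`validateAge(${age}) -> ${expected}`, () => {
    assert.equal(validateAge(age), expected);
  });
}
```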
When behaviour depends on combinations of conditions:
| Condition A | Condition B | Condition C | Expected |
|---|---|---|---|
| true | true | true | Result 1 |
| true | true | false | Result 2 |
| true | false | any | Result 3 |
| false | any | any | Result 4 |
Useful for complex validation rules, pricing engines, and permissions.
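One way to drive tests directly from such a table, with a hypothetical `discountFor` pricing rule standing in for conditions A/B/C:

```ts
// Decision-table-driven tests: each row of the table becomes one case.
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical pricing rule: a first order always gets 20%;
// otherwise membership gives 10% and a coupon adds 5%.
function discountFor(member: boolean, coupon: boolean, firstOrder: boolean): number {
  if (firstOrder) return 20;
  return (member ? 10 : 0) + (coupon ? 5 : 0);
}

// One entry per table row; "any" rows collapse to one representative.
const rows = [
  { member: true,  coupon: true,  firstOrder: true,  expected: 20 },
  { member: true,  coupon: true,  firstOrder: false, expected: 15 },
  { member: true,  coupon: false, firstOrder: false, expected: 10 },
  { member: false, coupon: false, firstOrder: false, expected: 0 },
];

for (const r of rows) {
  test(`member=${r.member} coupon=${r.coupon} firstOrder=${r.firstOrder} -> ${r.expected}%`, () => {
    assert.equal(discountFor(r.member, r.coupon, r.firstOrder), r.expected);
  });
}
```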
For workflows with distinct states (order: pending → processing → fulfilled → cancelled): test every valid transition, assert that every invalid transition is rejected, and confirm terminal states accept no further changes, as sketched below.
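A sketch of exhaustive transition coverage for that order workflow; the `transition` function and `VALID` map are hypothetical stand-ins for the real state machine:

```ts
// State-transition sketch: cover every valid transition and reject all others.
import { test } from "node:test";
import assert from "node:assert/strict";

type State = "pending" | "processing" | "fulfilled" | "cancelled";
const STATES: State[] = ["pending", "processing", "fulfilled", "cancelled"];

// Hypothetical transition map for the order workflow.
const VALID: Record<State, State[]> = {
  pending: ["processing", "cancelled"],
  processing: ["fulfilled", "cancelled"],
  fulfilled: [],  // terminal
  cancelled: [],  // terminal
};

// Hypothetical function under test.
function transition(from: State, to: State): State {
  if (!VALID[from].includes(to)) throw new Error(`invalid transition ${from} -> ${to}`);
  return to;
}

for (const from of STATES) {
  for (const to of STATES) {
    if (VALID[from].includes(to)) {
      test(`${from} -> ${to} is allowed`, () => {
        assert.equal(transition(from, to), to);
      });
    } else {
      test(`${from} -> ${to} is rejected`, () => {
        assert.throws(() => transition(from, to));
      });
    }
  }
}
```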
Apply the Automation Decision Table to each candidate test:
| Candidate | Automate? | Reason |
|---|---|---|
| Regression paths that run on every PR | Yes | High repetition, stable behaviour |
| Happy path for critical user journeys | Yes | High risk of breaking, high user impact |
| Exploratory testing of a new feature | No | Automation cannot discover unknown unknowns |
| One-time migration validation | No | Too narrow; cost of automation exceeds benefit |
| Accessibility checks (static rules) | Yes | axe-core, pa11y — fast and repeatable |
| Visual regression | Conditional | Only when UI is stable; use Percy/Chromatic |
| Flaky, environment-dependent tests | No — quarantine first | Automating non-determinism produces noise |
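One way to implement that quarantine, assuming node:test and an environment flag (the `RUN_QUARANTINED` variable and the ticket reference are illustrative assumptions):

```ts
// Quarantine sketch: the flaky test is skipped on PR runs but still runs
// when RUN_QUARANTINED=1 (e.g. in a nightly job).
import { test } from "node:test";

const runQuarantined = process.env.RUN_QUARANTINED === "1";

test(
  "checkout total updates after coupon applied [quarantine]",
  { skip: !runQuarantined && "quarantined until FLAKY-123 is fixed" }, // hypothetical ticket
  async () => {
    // flaky, environment-dependent assertions live here until stabilised
  },
);
```

Pair this with an SLA: a quarantined test that is not fixed within the agreed window is deleted, not left to rot.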
Exploratory testing is structured investigation, not random clicking.
Session structure: define a charter (focus area + time-box), investigate while keeping a findings log (anomalies, questions, confirmed issues), then debrief and classify each finding by severity.
Useful heuristics (SFDPO): walk the product's Structure (what it is built from), Function (what it does), Data (what it processes), Platform (what it depends on), and Operations (how it will be used).
| | High Priority | Low Priority |
|---|---|---|
| High Severity | Blocker: P0 — critical function broken, no workaround; blocks release | P2 — data corruption in edge case; fix before next release |
| Low Severity | P1 — high-visibility cosmetic issue on login page; fix quickly | P3 — minor cosmetic in rarely-used screen; backlog |
Severity = impact on functionality (set by QA). Priority = urgency of fix (set by product/business context). A severity-1 bug in a rarely-used admin screen may be priority-3. A low-severity bug on the homepage may be priority-1.
Every bug report must include:
**Summary**: [One sentence describing the symptom]
**Environment**: [OS, browser, app version, test environment]
**Preconditions**: [Account state, data setup required]
**Steps to Reproduce**:
1. [Step]
2. [Step]
**Expected Result**: [What should happen]
**Actual Result**: [What actually happens]
**Severity**: [Critical / High / Medium / Low]
**Evidence**: [Screenshot, video, log snippet]
A report without reproduction steps is incomplete — return it.
A good test name describes behaviour: `when a payment is submitted with an expired card, the user sees a clear error message`. Avoid `Thread.sleep()` in async assertions — use polling or event-based assertions (`waitFor`, timeout); see the polling sketch after the table below.
| Area | Technique | Tools |
|---|---|---|
| Performance / Load | Ramp load to expected peak; check p99 latency and error rate | k6, Locust, JMeter |
| Security | Input validation, auth bypass attempts, data exposure in responses | OWASP ZAP, Burp Suite |
| Accessibility | Automated rule check + manual screen reader test | axe-core, NVDA, VoiceOver |
| Compatibility | Cross-browser, cross-device smoke tests for critical paths | Playwright (multi-browser), BrowserStack |
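A sketch of the polling assertion mentioned above, using a hand-rolled `pollUntil` helper; in real suites, prefer the framework's built-in equivalent (e.g. Playwright's `expect().toPass()` or Testing Library's `waitFor`):

```ts
// Polling assertion sketch: retry the check until it passes or times out,
// instead of sleeping a fixed amount and hoping the system caught up.
import { test } from "node:test";
import assert from "node:assert/strict";

async function pollUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`condition not met within ${timeoutMs}ms`);
}

test("order becomes fulfilled without a fixed sleep", async () => {
  let status = "processing";
  setTimeout(() => { status = "fulfilled"; }, 250); // stand-in for async work
  await pollUntil(() => status === "fulfilled", 2000);
  assert.equal(status, "fulfilled");
});
```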
End every response with a confidence signal on its own line:
CONFIDENCE: [High|Medium|Low] — [one-line reason]
If the task is outside this skill's scope or you lack the information needed to proceed, return this instead of a confidence signal:
BLOCKED: [reason] — [what information would unblock this]
Do not guess or produce low-quality output to avoid returning BLOCKED. A precise BLOCKED is more useful than a low-confidence guess.