From team-of-agents
Use when writing test plans, designing test cases, identifying edge cases and failure scenarios, reviewing code for testability, setting up test automation, evaluating test coverage, performing exploratory testing, defining quality standards, or any task focused on ensuring software works correctly and reliably before and after release.
```
npx claudepluginhub pranav8494/team-of-agents
```
This skill uses the workspace's default tool permissions.
Quality is built in, not tested in. The best time to catch a defect is at requirements.
A flaky test is worse than no test — it erodes suite trust and masks real failures.
Use this table to determine what to produce for each task type:
| User asks for | What to produce |
|---|---|
| Test plan | Risk-based test plan with: scope (in/out), test levels with target ratios from the Test Pyramid (unit ~70% / integration ~20% / E2E ~10%), test cases mapped to acceptance criteria, entry/exit criteria, defect severity matrix, automation candidates vs. manual-only areas |
| Test cases for a feature | Happy path cases + error path cases (invalid input, auth failure, downstream timeout) + edge cases (boundary values, empty state, max limits) using EP and BVA techniques; each case has: ID, precondition, steps, expected result |
| Bug report | Completed bug report using the Bug Report Format: summary, environment, preconditions, numbered reproduction steps, expected result, actual result, severity, evidence (screenshot/log); incomplete reports are returned for more detail |
| Automation strategy | Automation Decision Table evaluation for each candidate test; recommended framework + rationale; CI integration point; quarantine policy for flaky tests |
| Exploratory testing session | Session charter (focus area + time-box) + findings log (anomalies, questions, confirmed issues) + severity classification per finding using SFDPO heuristics |
| Code review (testability) | Assessment of: test isolation (shared state risks), assertion quality (behaviour vs. internal state), async handling (sleep vs. polling), test naming convention, flakiness risk; concrete refactor suggestions |
| Test coverage analysis | Coverage report interpretation: identify untested equivalence partitions, missing boundary values, uncovered state transitions, and integration gaps; recommended test cases to close each gap |
| Non-functional testing plan | Area-specific plan from the Non-Functional Testing table: load test scenarios with ramp profile and p99 target, security test cases (OWASP inputs), accessibility audit scope (automated + manual), compatibility matrix |
| Quality standards definition | Test naming convention, assertion rules, flakiness policy (quarantine threshold + SLA to fix), severity/priority definitions, definition of done for test coverage |
The Test Pyramid sets the target ratio, the characteristics, and what a failure at each level tells you:
| Level | Coverage target | Characteristics | When it fails, it means |
|---|---|---|---|
| Unit | ~70% of test count | Fast, isolated, no I/O, mocks at boundaries | Logic is wrong |
| Integration | ~20% of test count | Real DB (Testcontainers), real HTTP (WireMock), slow | Wiring or query is wrong |
| Contract (Pact) | API boundaries only | Service-to-service, no full stack needed | Producer broke consumer expectations |
| E2E | ~10% of test count, critical paths only | Full environment, slowest, most fragile | User-facing regression exists |
Deviating toward the top of the pyramid (more E2E) means you have poor isolation of failure causes and slow feedback. Deviating toward the bottom (unit-only) means you have gaps in integration correctness.
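As an illustration of the bottom layer, here is a minimal sketch of a fast, isolated unit test with the persistence boundary stubbed in memory. It uses TypeScript with the built-in node:test runner; `greetingFor` and `UserRepo` are hypothetical names, not part of this skill.

```ts
// Unit test sketch: no I/O, the persistence boundary is an in-memory stub.
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical port: the only boundary this unit touches.
interface UserRepo {
  findEmail(id: string): Promise<string | null>;
}

// Hypothetical unit under test: pure logic over the injected boundary.
async function greetingFor(id: string, repo: UserRepo): Promise<string> {
  const email = await repo.findEmail(id);
  return email ? `Hello ${email}` : "Hello guest";
}

test("greets by email when the user exists", async () => {
  const stub: UserRepo = { findEmail: async () => "a@example.com" };
  assert.equal(await greetingFor("u1", stub), "Hello a@example.com");
});

test("falls back to guest when the user is unknown", async () => {
  const stub: UserRepo = { findEmail: async () => null };
  assert.equal(await greetingFor("u1", stub), "Hello guest");
});
```

Because the boundary is stubbed, a failure here can only mean the logic is wrong, never the wiring.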
Divide the input space into partitions where all values in a partition should produce equivalent behaviour. Test one representative from each partition.
Example — age field with valid range 18–65: the partitions are below 18 (invalid), 18–65 (valid), and above 65 (invalid); one representative from each (e.g. 10, 40, 80) covers the input space.
Test the edges of each partition, not just the middle. Applied on top of EP.
Example — same age field: the boundary values are 17, 18, 65, and 66, one value on each side of each partition edge. A combined EP + BVA sketch follows.
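A combined EP + BVA sketch for the age field, assuming a hypothetical `validateAge` function and the built-in node:test runner:

```ts
// EP + BVA for the age field: one representative per partition,
// plus the values on each side of every boundary.
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical validator under test.
function validateAge(age: number): boolean {
  return Number.isInteger(age) && age >= 18 && age <= 65;
}

// Equivalence partitions: below range, in range, above range.
const partitionReps: Array<[number, boolean]> = [
  [10, false], // representative of < 18
  [40, true],  // representative of 18–65
  [80, false], // representative of > 65
];

// Boundary values: either side of the lower edge, either side of the upper edge.
const boundaryValues: Array<[number, boolean]> = [
  [17, false],
  [18, true],
  [65, true],
  [66, false],
];

for (const [age, expected] of [...partitionReps, ...boundaryValues]) {
  test(`validateAge(${age}) -> ${expected}`, () => {
    assert.equal(validateAge(age), expected);
  });
}
```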
When behaviour depends on combinations of conditions:
| Condition A | Condition B | Condition C | Expected |
|---|---|---|---|
| true | true | true | Result 1 |
| true | true | false | Result 2 |
| true | false | any | Result 3 |
| false | any | any | Result 4 |
Useful for complex validation rules, pricing engines, and permissions.
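One way to drive tests directly from such a table, with a hypothetical `discountFor` pricing rule standing in for conditions A/B/C:

```ts
// Decision-table-driven tests: each row of the table becomes one case.
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical pricing rule: a first order always gets 20%;
// otherwise membership gives 10% and a coupon adds 5%.
function discountFor(member: boolean, coupon: boolean, firstOrder: boolean): number {
  if (firstOrder) return 20;
  return (member ? 10 : 0) + (coupon ? 5 : 0);
}

// One entry per table row; "any" rows collapse to one representative.
const rows = [
  { member: true,  coupon: true,  firstOrder: true,  expected: 20 },
  { member: true,  coupon: true,  firstOrder: false, expected: 15 },
  { member: true,  coupon: false, firstOrder: false, expected: 10 },
  { member: false, coupon: false, firstOrder: false, expected: 0 },
];

for (const r of rows) {
  test(`member=${r.member} coupon=${r.coupon} firstOrder=${r.firstOrder} -> ${r.expected}%`, () => {
    assert.equal(discountFor(r.member, r.coupon, r.firstOrder), r.expected);
  });
}
```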
For workflows with distinct states (order: pending → processing → fulfilled → cancelled): test every valid transition, assert that every invalid transition is rejected, and confirm terminal states accept no further changes, as sketched below.
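A sketch of exhaustive transition coverage for that order workflow; the `transition` function and `VALID` map are hypothetical stand-ins for the real state machine:

```ts
// State-transition sketch: cover every valid transition and reject all others.
import { test } from "node:test";
import assert from "node:assert/strict";

type State = "pending" | "processing" | "fulfilled" | "cancelled";
const STATES: State[] = ["pending", "processing", "fulfilled", "cancelled"];

// Hypothetical transition map for the order workflow.
const VALID: Record<State, State[]> = {
  pending: ["processing", "cancelled"],
  processing: ["fulfilled", "cancelled"],
  fulfilled: [],  // terminal
  cancelled: [],  // terminal
};

// Hypothetical function under test.
function transition(from: State, to: State): State {
  if (!VALID[from].includes(to)) throw new Error(`invalid transition ${from} -> ${to}`);
  return to;
}

for (const from of STATES) {
  for (const to of STATES) {
    if (VALID[from].includes(to)) {
      test(`${from} -> ${to} is allowed`, () => {
        assert.equal(transition(from, to), to);
      });
    } else {
      test(`${from} -> ${to} is rejected`, () => {
        assert.throws(() => transition(from, to));
      });
    }
  }
}
```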
Apply the Automation Decision Table to each candidate test:
| Candidate | Automate? | Reason |
|---|---|---|
| Regression paths that run on every PR | Yes | High repetition, stable behaviour |
| Happy path for critical user journeys | Yes | High risk of breaking, high user impact |
| Exploratory testing of a new feature | No | Automation cannot discover unknown unknowns |
| One-time migration validation | No | Too narrow; cost of automation exceeds benefit |
| Accessibility checks (static rules) | Yes | axe-core, pa11y — fast and repeatable |
| Visual regression | Conditional | Only when UI is stable; use Percy/Chromatic |
| Flaky, environment-dependent tests | No — quarantine first | Automating non-determinism produces noise |
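One way to implement that quarantine, assuming node:test and an environment flag (the `RUN_QUARANTINED` variable and the ticket reference are illustrative assumptions):

```ts
// Quarantine sketch: the flaky test is skipped on PR runs but still runs
// when RUN_QUARANTINED=1 (e.g. in a nightly job).
import { test } from "node:test";

const runQuarantined = process.env.RUN_QUARANTINED === "1";

test(
  "checkout total updates after coupon applied [quarantine]",
  { skip: !runQuarantined && "quarantined until FLAKY-123 is fixed" }, // hypothetical ticket
  async () => {
    // flaky, environment-dependent assertions live here until stabilised
  },
);
```

Pair this with an SLA: a quarantined test that is not fixed within the agreed window is deleted, not left to rot.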
Exploratory testing is structured investigation, not random clicking.
Session structure: define a charter (focus area + time-box), investigate while keeping a findings log (anomalies, questions, confirmed issues), then debrief and classify each finding by severity.
Useful heuristics (SFDPO): walk the product's Structure (what it is built from), Function (what it does), Data (what it processes), Platform (what it depends on), and Operations (how it will be used).
| | High Priority | Low Priority |
|---|---|---|
| High Severity | Blocker: P0 — critical function broken, no workaround; blocks release | P2 — data corruption in edge case; fix before next release |
| Low Severity | P1 — high-visibility cosmetic issue on login page; fix quickly | P3 — minor cosmetic in rarely-used screen; backlog |
Severity = impact on functionality (set by QA). Priority = urgency of fix (set by product/business context). A severity-1 bug in a rarely-used admin screen may be priority-3. A low-severity bug on the homepage may be priority-1.
Every bug report must include:
**Summary**: [One sentence describing the symptom]
**Environment**: [OS, browser, app version, test environment]
**Preconditions**: [Account state, data setup required]
**Steps to Reproduce**:
1. [Step]
2. [Step]
**Expected Result**: [What should happen]
**Actual Result**: [What actually happens]
**Severity**: [Critical / High / Medium / Low]
**Evidence**: [Screenshot, video, log snippet]
A report without reproduction steps is incomplete — return it.
A good test name describes behaviour: `when a payment is submitted with an expired card, the user sees a clear error message`. Avoid `Thread.sleep()` in async assertions — use polling or event-based assertions (`waitFor`, timeout); see the polling sketch after the table below.
| Area | Technique | Tools |
|---|---|---|
| Performance / Load | Ramp load to expected peak; check p99 latency and error rate | k6, Locust, JMeter |
| Security | Input validation, auth bypass attempts, data exposure in responses | OWASP ZAP, Burp Suite |
| Accessibility | Automated rule check + manual screen reader test | axe-core, NVDA, VoiceOver |
| Compatibility | Cross-browser, cross-device smoke tests for critical paths | Playwright (multi-browser), BrowserStack |
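A sketch of the polling assertion mentioned above, using a hand-rolled `pollUntil` helper; in real suites, prefer the framework's built-in equivalent (e.g. Playwright's `expect().toPass()` or Testing Library's `waitFor`):

```ts
// Polling assertion sketch: retry the check until it passes or times out,
// instead of sleeping a fixed amount and hoping the system caught up.
import { test } from "node:test";
import assert from "node:assert/strict";

async function pollUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`condition not met within ${timeoutMs}ms`);
}

test("order becomes fulfilled without a fixed sleep", async () => {
  let status = "processing";
  setTimeout(() => { status = "fulfilled"; }, 250); // stand-in for async work
  await pollUntil(() => status === "fulfilled", 2000);
  assert.equal(status, "fulfilled");
});
```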
End every response with a confidence signal on its own line:
CONFIDENCE: [High|Medium|Low] — [one-line reason]
If the task is outside this skill's scope or you lack the information needed to proceed, return this instead of a confidence signal:
BLOCKED: [reason] — [what information would unblock this]
Do not guess or produce low-quality output to avoid returning BLOCKED. A precise BLOCKED is more useful than a low-confidence guess.