From nexus
Diagnoses and eliminates flaky or nondeterministic tests by classifying failure types (ordering, timing, resource, environment, external, concurrency) and isolating root causes with reproducible fixes.
Slash command
/nexus:testing
Structured protocol for diagnosing, isolating, and permanently eliminating non-deterministic test failures. Not a retry wrapper — treats flakiness as a first-class defect.
name: nexus-testing
category: testing / quality
required_context: test file path, failure frequency, CI log or local output, stack trace if available
expected_inputs: test name, framework, failure pattern, environment (local/CI/both), reproduction rate
expected_outputs: flakiness type, reproduction steps, root cause (one sentence), narrowest fix +
verification command, prevention recommendation
Do not mark a test xfail/skip without a linked issue and expiry date.
Required before any investigation:
| Signal | How |
|---|---|
| Failure rate | Estimate from CI history (1/5? 1/100?) |
| Stack trace | Full verbatim trace from a failing run |
| Test path | Exact: tests/users/test_create.py::test_create_user |
| Framework + version | pytest --version, jest --version, go version |
| CI vs local | Fails only in CI, only locally, or both? |
| Parallelism config | -n auto, --workers, t.Parallel()? |
| Recent changes | git log --oneline -10 on test file and its imports |
If the failure is CI-only, diff these between CI and local before reading code: runtime version, OS/arch, parallelism,
TZ / NTP, network access, env vars (missing vars silently produce defaults), I/O speed, Docker layer cache.
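The env-var part of that comparison can be sketched in Python. This is a minimal illustration rather than part of the skill: `diff_env` and the sample snapshots are hypothetical, and it assumes each environment has already been dumped to a plain mapping (for example from `os.environ`).

```python
from typing import Mapping


def diff_env(local: Mapping[str, str], ci: Mapping[str, str]) -> dict[str, dict]:
    """Compare two environment snapshots and group the differences."""
    local_keys, ci_keys = set(local), set(ci)
    return {
        # Set locally but absent in CI: code may silently fall back to a default there.
        "missing_in_ci": {k: local[k] for k in sorted(local_keys - ci_keys)},
        # Set only in CI.
        "missing_locally": {k: ci[k] for k in sorted(ci_keys - local_keys)},
        # Set in both with different values, stored as (local, ci) pairs.
        "changed": {
            k: (local[k], ci[k])
            for k in sorted(local_keys & ci_keys)
            if local[k] != ci[k]
        },
    }


if __name__ == "__main__":
    # Hypothetical snapshots, for illustration only.
    local_env = {"TZ": "Europe/Berlin", "DATABASE_URL": "postgres://localhost/test", "LANG": "en_US.UTF-8"}
    ci_env = {"TZ": "UTC", "LANG": "en_US.UTF-8", "CI": "true"}
    for group, entries in diff_env(local_env, ci_env).items():
        print(group, entries)
```

Entries under `missing_in_ci` are the usual suspects for the "missing vars silently produce defaults" case.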
| Type | Signature | Mechanism |
|---|---|---|
| Ordering | Fails after specific other tests; passes alone | Shared mutable state not reset |
| Timing | Fails on slow machines or under load | Hard-coded delays, no backoff, wall-clock assertions |
| Resource | Fails parallel, passes single-threaded | Port conflicts, shared temp dirs, DB row locks |
| Environment | Fails in CI only | Runtime delta, missing env var, OS behavior |
| External | Network errors, connection refused, timeouts | Real HTTP/DB calls in test |
| Concurrency | Assertion error varies each run; stack trace differs | Thread/async race, missing lock |
Heuristic:
Record: Primary: <type> | Secondary: <type or none>
Run the test at least 5 times before investigating, to confirm it is flaky rather than consistently failing.
pytest tests/path/to/test.py::test_fn -xvs # alone, verbose
pytest tests/ --randomly-seed=last -x # ordering check
pytest tests/ -n auto -x # contention check
for i in {1..20}; do pytest tests/path/to/test.py::test_fn -x --tb=no -q; done | grep -c FAILED
go test ./pkg/... -run TestFn -count=20 -v 2>&1 | grep -E "PASS|FAIL" # Go
jest --testNamePattern "name" --runInBand --verbose # Jest
Record: X/N failed, fails alone or suite-only, rate under parallelism, verbatim stack trace.
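One way to produce the X/N figure is a small Python harness around the same pytest invocation. This is a sketch under assumptions: `failure_count` and `pytest_runner` are hypothetical helper names, and the runner assumes pytest is installed for the current interpreter.

```python
import subprocess
import sys
from typing import Callable


def failure_count(run_once: Callable[[], bool], runs: int = 20) -> int:
    """Call run_once `runs` times and count how often it reports failure (returns False)."""
    return sum(1 for _ in range(runs) if not run_once())


def pytest_runner(test_id: str) -> Callable[[], bool]:
    """Build a runner that executes one pytest node ID and returns True on exit code 0."""

    def run_once() -> bool:
        result = subprocess.run(
            [sys.executable, "-m", "pytest", test_id, "-x", "--tb=no", "-q"],
            capture_output=True,
        )
        return result.returncode == 0

    return run_once
```

For example, `failure_count(pytest_runner("tests/path/to/test.py::test_fn"), runs=20)` returns the X in X/20. Separating the counter from the runner keeps the counting logic usable with jest or go test by swapping in a different runner.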
Ordering: find unreset shared state — grep -n "global\|@classmethod\|setUp\|tearDown\|autouse";
look for DB rows not rolled back, in-memory caches, module-level singletons, scope="session" fixtures.
Timing: grep -rn "time.sleep\|asyncio.sleep\|setTimeout" and polling loops without timeout;
look for fixed delays insufficient on slow machines, wall-clock assertions.
Resource: grep -rn "port.*=[0-9]\{4,5\}\|localhost:[0-9]\{4,5\}" and grep -rn "tmp\|tempfile";
look for hardcoded ports, shared temp dirs, DB connections without rollback.
Environment: cat .github/workflows/*.yml | grep -E "python-version|node-version|go-version";
compare env vars between local and CI.
External: grep -rn "requests\.\|httpx\.\|fetch(\|http.Get" and real DB connection strings in tests.
Concurrency: grep -rn "threading\.\|asyncio\.\|go func", then check for a missing Lock/Mutex/await.
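For the Ordering case, when grep does not reveal the shared state, the interfering test can often be located by bisecting the tests that run before the victim. The sketch below is an assumption-laden illustration, not a step the skill prescribes: `find_polluter` and `victim_fails_after` are hypothetical names, and it assumes a single polluter whose effect is deterministic once it has run.

```python
from typing import Callable, Optional, Sequence


def find_polluter(
    candidates: Sequence[str],
    victim_fails_after: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Bisect `candidates` to find the one test whose prior execution makes the victim fail.

    `victim_fails_after(subset)` must run `subset` followed by the victim test
    and return True if the victim failed.
    """
    if not candidates or not victim_fails_after(candidates):
        return None  # The victim passes after the full list: no polluter among the candidates.
    remaining = list(candidates)
    while len(remaining) > 1:
        half = remaining[: len(remaining) // 2]
        # Keep whichever half still reproduces the failure.
        remaining = half if victim_fails_after(half) else remaining[len(half):]
    return remaining[0]
```

In practice `victim_fails_after` would shell out to the test runner with the subset followed by the victim, in a fixed order. Each round halves the candidates, so a suite of a few hundred tests needs fewer than ten runs.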
State root cause as exactly one sentence:
"test_X fails intermittently because Y when Z."
If you cannot state it in one sentence, root cause is not yet identified.
Apply the narrowest fix. Do not improve unrelated code.
| Type | Wrong fix | Correct fix |
|---|---|---|
| Ordering | Delete the interfering test | Add teardown resetting shared state after each test |
| Timing | Increase time.sleep(N) | Replace with wait_for(condition, timeout=N) |
| Resource | Manually assign different port | Use port=0 or a free_port() fixture |
| Environment | Hard-code CI env locally | Parametrize via env var; test both modes in CI |
| External | Add retry in test | Mock external call; test real integration separately |
| Concurrency | Add sleep before assertion | Use lock, event, barrier, or join |
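The table names `wait_for(condition, timeout=N)` and `free_port()` without defining them. The following is one plausible shape for both, offered as a sketch: the default values, the `poll_interval` parameter, and the choice of `TimeoutError` are assumptions, not the skill's own implementation.

```python
import socket
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 5.0, poll_interval: float = 0.05) -> None:
    """Poll `condition` until it returns True; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout  # monotonic clock is unaffected by wall-clock jumps
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(poll_interval)


def free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0.

    The port is released before returning, so another process could take it first;
    passing port=0 straight to the server under test avoids that window.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```

Unlike a fixed sleep, `wait_for` returns as soon as the condition holds on a fast machine and only spends the full timeout when something is actually wrong.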
Verification — required consecutive passes by failure rate:
| Rate | Required passes |
|---|---|
| >20% | 20 consecutive passes |
| 5–20% | 30–50 consecutive passes |
| 1–5% | 50 consecutive passes |
| <1% | 100 passes or 30-day CI monitoring |
for i in {1..20}; do pytest tests/path/to/test.py::test_fn -x --tb=short -q; done
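To gauge how much a streak of passes actually proves, the arithmetic can be written out. This is a rough sketch that assumes runs are independent and the pre-fix failure rate is known; the function names and the 5% threshold are illustrative assumptions, not part of the skill.

```python
import math


def chance_of_false_green(failure_rate: float, passes: int) -> float:
    """Probability that an unfixed test with this failure rate passes `passes` times in a row."""
    return (1.0 - failure_rate) ** passes


def passes_needed(failure_rate: float, max_false_green: float = 0.05) -> int:
    """Smallest run count at which an unfixed test stays green with probability <= max_false_green."""
    return math.ceil(math.log(max_false_green) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    for rate in (0.20, 0.05, 0.01):
        print(f"rate={rate:.0%} passes_needed={passes_needed(rate)}")
    print(f"1% rate, 100 passes: {chance_of_false_green(0.01, 100):.2f} chance of a false green")
```

Under these assumptions, a test that fails 1% of the time still has roughly a 37% chance of passing 100 times in a row while unfixed, which is why the low-rate row pairs the pass count with longer CI monitoring.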
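Prevention by type:
Ordering: run pytest with pytest-randomly enabled on every PR (it reshuffles with a new seed each run by default); add fixture teardown to the PR template.
Timing: add a wait_for(condition, timeout, poll_interval) helper; flag new time.sleep in CI.
Resource: add free_port() and tmp_dir() fixtures to conftest.py.
External: use --block-network in unit tests; keep real integrations under a separate @pytest.mark.integration.
Concurrency: add an AsyncEventLoop fixture; set asyncio_mode = auto in pytest.ini.
Every investigation closes with this report. Fill every field.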
## Flaky Test Report
**Test:** [full path + test name]
**Framework:** [pytest / jest / go test / other]
**Failure Rate:** [X/N runs]
**Primary Type:** [Ordering | Timing | Resource | Environment | External | Concurrency]
**Secondary Type:** [same options, or "none"]
**Root Cause:** [one sentence — "test_X fails because Y when Z"]
**Why It Happens:** [2–3 sentences — mechanism]
**Fix Applied:** [file path + line numbers + change description]
**Verification:** [command + result — "20/20 passes"]
**Prevention:** [guardrail that stops this class from returning]
**Follow-up Needed:** [yes/no — if yes, describe]
Adding or lengthening time.sleep(): masks the race and breaks on slower machines.
Marking a test xfail/skip without a linked issue and target fix date.
| Document | Purpose |
|---|---|
| checklists/investigation-checklist.md | Pre/during/post checklists for structured investigation |
| anti-patterns/common-mistakes.md | Common wrong fixes and why they fail |
| validation/output-validation.md | Fix confirmation, confidence scoring, escalation |
npx claudepluginhub aayushostwal/nexus --plugin nexus