From antigravity-awesome-skills
Executes test plans, reproduces bugs, audits CI signal integrity, and files WJTTC reports with tier verdicts. Use for testing code, validating functionality, or reproducing failures.
Slash command: /antigravity-awesome-skills:wjttc-tester
**"We break things so others never have to know they were broken."**
Apply F1-inspired standards to software testing. When brakes must work flawlessly at race pace, so must the code in production. This skill executes test plans and files reports — it is the driver, not the engineer. To plan and generate the suite, use wjttc-builder.
Triage every test by blast radius. The first three set severity; Tyre and Pit cover durability and the release gate.
| Tier | Symbol | Meaning | Examples |
|---|---|---|---|
| Brake | 🚨 | Life-critical — failure is catastrophic | data loss, auth bypass, payment errors, destructive ops without confirm |
| Engine | ⚡ | Performance-critical — wrong results / poor UX | API accuracy, data transforms, calculations, format compliance, perf |
| Aero | 🏁 | Polish & edge cases — minor inconvenience | UI quirks, rare message formatting, optional-feature edges, docs |
| Tyre | 🛞 | Durability under load — degradation over time | stress/volume, concurrency, memory growth, large inputs |
| Pit | 🔧 | Release gate — the stop that lets you go | smoke/regression suite, CI green, the WJTTC report filed |
Test Brake first. If the brakes don't work, nothing else matters.
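As a minimal sketch of that ordering, the snippet below sorts a list of tests so Brake runs first. Only "Brake first" comes from the text; using the table's row order for the remaining tiers is an assumption, and the test names are hypothetical.

```python
# Tier order taken from the table above. Only "Brake first" is stated in the
# text; the order of the remaining tiers is an assumption.
TIER_ORDER = ["Brake", "Engine", "Aero", "Tyre", "Pit"]


def order_by_tier(tests):
    """Return tests sorted so the highest blast radius runs first."""
    rank = {tier: position for position, tier in enumerate(TIER_ORDER)}
    return sorted(tests, key=lambda test: rank[test["tier"]])


# Hypothetical test names, for illustration only.
tests = [
    {"name": "tooltip wraps long labels", "tier": "Aero"},
    {"name": "delete requires confirmation", "tier": "Brake"},
    {"name": "currency rounding", "tier": "Engine"},
]

ordered = order_by_tier(tests)
for test in ordered:
    print(test["tier"], "-", test["name"])
```

Because `sorted` is stable, tests within the same tier keep their original relative order.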
Red CI is a contract: it must always mean "stop, look, fix." A suite with high coverage but flaky reds is less trustworthy than a smaller suite with zero false alarms — because the team has stopped reading the reds. Fix the signal before you add more tests.
Method — classify the last 30 days of CI failures:
| Bucket | Definition | Verdict |
|---|---|---|
| Real bug | Red mapped to a real defect; fixed by a code change | ✓ Signal worked |
| Flake | Timing/network/concurrency noise; passed on rerun, no code change | ✗ Test design defect |
| Infra | Missing secret, runner image change, upstream dep — not the code | ✗ Workflow design defect |
Signal Integrity Score: SI = Real bugs / (Real bugs + Flakes + Infra) × 100
| SI % | Verdict | Action |
|---|---|---|
| 100% | ✪ | Maintain — exemplary signal |
| 95–99% | ★ Championship | Annotate any flake immediately |
| 85–94% | ◇ Acceptable | Schedule the flake-class fix this sprint |
| 70–84% | ● Eroding | Stop adding tests — fix flakes first |
| <70% | ○ Dead signal | Block merges until signal restored |
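The score and its bands can be sketched as follows. This assumes the band lower bounds act as thresholds (so a fractional score such as 94.5 falls in the 85–94% band) and that a window with no red runs has no score; neither detail is spelled out above. The counts in the example are made up.

```python
def signal_integrity(real_bugs, flakes, infra):
    """SI = Real bugs / (Real bugs + Flakes + Infra) x 100."""
    total = real_bugs + flakes + infra
    if total == 0:
        return None  # assumption: no reds in the window means no score
    return 100 * real_bugs / total


def si_action(si):
    """Map an SI percentage to the action column of the table above."""
    if si >= 100:
        return "Maintain"
    if si >= 95:
        return "Annotate any flake immediately"
    if si >= 85:
        return "Schedule the flake-class fix this sprint"
    if si >= 70:
        return "Stop adding tests; fix flakes first"
    return "Block merges until signal restored"


# Made-up counts for a 30-day window: 18 real bugs, 1 flake, 1 infra failure.
si = signal_integrity(real_bugs=18, flakes=1, infra=1)
print(si, si_action(si))
```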
Eliminate on sight: hard absolute-time perf asserts on shared runners (expect(t).toBeLessThan(30)) → move to a non-gating workflow; network calls in the main suite → mock at the boundary; concurrency tests without explicit ordering; secret-dependent steps that hard-fail when missing → grey-skip.
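One possible reading of "grey-skip", sketched with Python's standard `unittest`: when the secret is absent the test is reported as skipped rather than failed, so the run stays green and the red signal is reserved for real defects. The environment variable name and the test class are hypothetical.

```python
import io
import os
import unittest

# Hypothetical secret name, used only for illustration.
SECRET_ENV = "PAYMENTS_API_TOKEN"


class PaymentsSmokeTest(unittest.TestCase):
    def test_token_is_usable(self):
        token = os.environ.get(SECRET_ENV)
        if not token:
            # Skip instead of hard-failing: a missing secret is an
            # infrastructure condition, not a code defect.
            self.skipTest(f"{SECRET_ENV} not set")
        self.assertTrue(token.strip())


def run_suite():
    """Run the test case quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PaymentsSmokeTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

With the variable unset, `run_suite().skipped` holds one entry and `wasSuccessful()` is still true.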
The inverse rule: green CI that passes while something is broken is equally a violation. If a real bug shipped despite green, write the regression test BEFORE the fix lands.
The conversation is the real gate. CI is supporting infrastructure for the human + AI audit; flaky CI wastes the audit's bandwidth. Signal Integrity keeps CI worthy of the conversation.
faf wjttc --path tests # audit tier coverage (vendor-neutral)
faf wjttc --strict --json # CI gate: non-zero if any test is untiered
Save reports to ./wjttc-reports/ in the project under test (or a path the user specifies). Never write to an absolute/personal path. Name files YYYY-MM-DD-{project}-{feature}-tests.yaml.
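A minimal sketch of the naming rule, assuming project and feature names are lowercased and non-alphanumeric runs become hyphens (the text does not say how names are normalised). The `report_path` and `slug` helpers are hypothetical, not part of any faf tooling.

```python
import re
from datetime import date
from pathlib import Path


def slug(text):
    """Lowercase and hyphenate a name (normalisation rule is an assumption)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def report_path(project, feature, day, root="."):
    """Build YYYY-MM-DD-{project}-{feature}-tests.yaml under wjttc-reports/.

    `root` is the project under test; the default keeps the path relative,
    so nothing is written to an absolute or personal location.
    """
    name = f"{day.isoformat()}-{slug(project)}-{slug(feature)}-tests.yaml"
    return Path(root) / "wjttc-reports" / name


path = report_path("project-name", "feature-being-tested", date(2026, 6, 26))
print(path.as_posix())
```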
---
# WJTTC Test Report
project: "project-name"
feature: "feature-being-tested"
date: "2026-06-26"
tier: "Engine" # Brake | Engine | Aero | Tyre | Pit
result: "PASS" # PASS | FAIL | BLOCKED
environment: "OS, runtime version, key deps"
---
## Summary
objective: What was tested
totals: { total: 25, passed: 23, failed: 2, blocked: 0, pass_rate: "92%" }
## Failures
- name: "Long-string handling"
  tier: "Engine ⚡"
  status: "FAIL"
  steps: ["...", "..."]
  expected: "Handle gracefully"
  actual: "Crash"
  error: "RangeError: ..."
  root_cause: "Unbounded buffer"
  fix: "Cap input length / stream"
## Edge cases
- { case: "Empty string", input: "''", expected: "error", actual: "error", status: "PASS" }
- { case: "Unicode", input: "🏎️", expected: "stored", actual: "stored", status: "PASS" }
## Performance
- { op: "file read", target: "<50ms", actual: "18ms", status: "PASS" }
- { op: "parse YAML", target: "<50ms", actual: "12ms", status: "PASS" }
## Bugs found
- id: 1
  title: "..."
  severity: "Brake" # tier doubles as severity
  reproducibility: "Always"
  impact: "Who is affected, how serious"
  fix: "..."
## Coverage
tested: ["happy path", "edges", "error handling", "perf"]
not_tested: ["concurrent access", "files >100MB"]
## Verdict
tier: "◇ Bronze" # from the tier table below (92% pass rate)
to_next: ["Fix 2 failing Engine tests", "Add Tyre concurrency tests"]
Map the pass rate (or SI score) to the single canonical FAF tier ladder. No second ladder, no medals.
| Score | Tier | Symbol |
|---|---|---|
| 100% | Trophy | ✪ |
| 99% | Gold | ★ |
| 95% | Silver | ◆ |
| 85% | Bronze | ◇ |
| 70% | Green | ● |
| 55% | Yellow | ● |
| 1% | Red | ○ |
| 0% | White | ♡ |
The FAF score is deterministic — same input, same score. A test report should be just as falsifiable: every verdict traces to a reproducible run. FAF doesn't lie.
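The ladder lookup can be sketched as a pure function, which keeps the verdict as deterministic as the score. This assumes each score in the table is the minimum needed for its tier; the table does not say so explicitly, and `tier_for` is a hypothetical helper rather than part of faf.

```python
# (minimum score, tier, symbol), copied from the ladder above, highest first.
# Treating each score as a floor is an assumption.
LADDER = [
    (100, "Trophy", "✪"),
    (99, "Gold", "★"),
    (95, "Silver", "◆"),
    (85, "Bronze", "◇"),
    (70, "Green", "●"),
    (55, "Yellow", "●"),
    (1, "Red", "○"),
    (0, "White", "♡"),
]


def tier_for(score):
    """Return (tier, symbol) for a percentage score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for floor, name, symbol in LADDER:
        if score >= floor:
            return name, symbol


for score in (100, 96, 60, 0):
    print(score, *tier_for(score))
```

The same input always yields the same tier, so a verdict can be rechecked from the recorded pass rate alone.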
faf wjttc enforces that nothing ships untiered.
faf taf setup --write # create .github/workflows/taf.yml (test receipts)
faf score --json # deterministic score snapshot for the receipt
faf wjttc --strict green: every test tiered
./wjttc-reports/
Made with 🧡 by wolfejam.dev: "We break things so others never have to know they were broken."
npx claudepluginhub sickn33/antigravity-awesome-skills --plugin antigravity-bundle-aas-localization-international-growth