Help us improve
Share bugs, ideas, or general feedback.
From AFK Coding Pipeline
Use when implementation and refactoring are done but no human has seen the feature run — exercises the real running project on whatever surface it actually has (browser via agent-browser, HTTP API via curl, or CLI invocations), walks the happy path and negative paths from the spec, and writes an evidence-backed QA report.
npx claudepluginhub alexanderop/afk --plugin afkHow this skill is triggered — by the user, by Claude, or both
Slash command
/afk:qaThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Tests prove the units work. They don't prove a user can finish the flow. The gap
Provides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Guides systematic root-cause debugging when tests fail, builds break, or unexpected errors occur. Provides a structured triage checklist to preserve evidence, localize, and fix issues instead of guessing.
Share bugs, ideas, or general feedback.
Tests prove the units work. They don't prove a user can finish the flow. The gap between "tests pass" and "the feature actually completes" is where AFK features quietly fail — and it's exactly the gap a walk on the real, running surface closes. The surface differs per project — a browser UI, an HTTP API, a CLI — but the job is the same: drive the real thing, observe what actually happens, report with evidence. A claim without evidence (screenshot or captured transcript) is not a finding.
This skill runs in a forked context (snapshots, response bodies, and console
dumps stay out of the main session — only the report comes back). Orient
yourself from disk: .afk/pipeline/<slug>.md for the feature slug and PRD path
if a pipeline is running, otherwise the newest docs/specs/prd-*.md.
Read qa.mode from .afk/config.json (recorded by afk:setup). If absent,
infer it and state your choice in the report:
| Mode | When | You drive it with |
|---|---|---|
| browser | The feature has a UI a human would click | agent-browser against devUrl |
| api | Pure backend: HTTP/RPC endpoints, webhooks, no UI | curl (or the project's HTTP client) against the running service |
| cli | Command-line tool or library | Real invocations of the built binary / a scratch consumer script |
A feature can span modes (an endpoint AND the form that calls it) — then QA the outermost surface a user touches, and drop to the API for cases the UI can't reach (malformed payloads, missing auth).
.afk/config.json for the dev command and devUrl. Start the app or
service; wait until it responds. (cli mode: build the binary / link the
package instead.)agent-browser --help works. If not installed, tell
the user how to install it and offer to fall back to api mode against the
same endpoints, or the project's e2e runner. Do not silently skip QA.qa/ and qa/evidence/<slug>/.Derive test cases from the PRD — not from the implementation:
How to drive and what counts as evidence:
@e1 refs,
not CSS selectors), act, screenshot every state transition, and read the
console/network even when the UI looks right. A rendered success screen with
a 500 in the console is a FAIL. Evidence: qa/evidence/<slug>/tc01-*.png.{"ok":true}, or a missing side effect is a FAIL. Also watch the service
logs for stack traces on requests that "succeeded". Evidence:
qa/evidence/<slug>/tc01.http.md (request, response, side-effect check).qa/evidence/<slug>/tc01.txt.Write qa/<slug>.md:
# QA Report: <feature> — <date>
Mode: browser | api | cli (and why, if inferred)
## Verdict: PASS | FAIL (N of M cases failed)
## TC-01: <name> — PASS/FAIL
Steps taken: (numbered, exactly what you did — commands/requests verbatim)
Expected: (from the PRD)
Actual: (what happened, console/log/stderr errors included)
Evidence: qa/evidence/<slug>/tc01-*
Failures must be reproducible from the report alone — a developer (or the next pipeline phase) fixes from your steps, not from your memory. If anything is red, hand the failing cases back to afk:ralph as fix tasks before review.
| Thought | Reality |
|---|---|
| "Pure backend, no browser — QA doesn't apply" | The surface is the API. curl the spec's acceptance criteria against the running service. No-UI ≠ no-QA. |
| "The e2e tests cover this, QA is redundant" | The e2e tests were written by the same loops that wrote the bugs. Independent eyes, real surface. |
| "It returned 200 / exit 0, mark it PASS" | Check the body, the side effect, the logs, the console. Looking right and being right differ by one swallowed error. |
| "I'll test what was implemented" | Test what was SPECIFIED. The difference between the two is the bug. |
| "Skip the hostile pass, users won't do that" | Users (and integrators, and retrying clients) do exactly that, within the first hour. |
| "Evidence at the end is enough" | Every state transition / every request-response pair. The bug is always in the step you didn't capture. |
qa.mode from .afk/config.json (afk:setup).