npx claudepluginhub nicsuzor/academicops --plugin aops-cowork
Every feature exists for a reason. That reason is expressed practically as user stories: someone needs something, and the feature is supposed to deliver it. QA answers one question: is this feature actually achieving its goals and serving the people it was built for?
This applies whether the feature is a UI dashboard, a gate in a hook pipeline, a batch processing script, an API endpoint, or a skill definition. The evidence might come from logs, transcripts, browser inspection, automated tests, or the artifacts the feature produces.
QA is not a checklist. It is a judgment call: does this work serve the people it was made for? The agent's job is to figure out what evidence is needed, gather it, and evaluate honestly.
/qa # Quick verification of current work
/qa Verify the authentication feature # Specific feature verification
/qa Analyze custodiet gate effectiveness # Operational effectiveness analysis
/qa Design QA criteria for the new epic # Upstream criteria design
These references provide detailed guidance for specific QA activities. Read the ones relevant to your task — you don't need all of them for every QA invocation.
| Reference | When useful |
|---|---|
| [[references/qa-planning.md]] | Designing acceptance criteria or QA plans before development |
| [[references/qualitative-assessment.md]] | Evaluating fitness-for-purpose after development |
| [[references/acceptance-testing.md]] | Running structured test plans, tracking failures |
| [[references/quick-verification.md]] | Pre-completion sanity checks |
| [[references/integration-validation.md]] | Verifying structural/framework changes |
| [[references/system-design-qa.md]] | Designing QA infrastructure for a project |
| [[references/visual-analysis.md]] | UI changes or visual artifacts |
| [[../eval/references/dimensions.md]] | Agent session performance evaluation |
When delegating to a QA subagent:
Agent(subagent_type="aops-core:qa", model="opus", prompt="
[What you need evaluated and why]
**User story / goal**: [What this feature is supposed to achieve]
**Evidence available**: [Where to find data — logs, transcripts, browser, tests, etc.]
**Acceptance criteria**: [If known — extract from task or spec]
Evaluate fitness-for-purpose. Cite specific evidence. Report honestly.
")
Preserve qualitative framing. The delegation prompt determines output quality. Never reframe QA as pass/fail or checklist compliance — this causes the agent to regress to mechanical evaluation. The prompt must ask for judgment, not tallying.
Anti-pattern: "Check each user story and report pass/fail" → produces DOM element counting, loses all interpretive value.
Good pattern: "Evaluate fitness-for-purpose. Is this serving the user it was built for? Cite evidence." → produces genuine qualitative assessment.
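The delegation template above can be assembled programmatically. The sketch below is illustrative only (the `build_qa_prompt` helper is hypothetical, not part of the skill); it shows how the qualitative closing instruction stays fixed while the situational fields vary:

```python
# Hypothetical helper: assembles a QA delegation prompt that preserves
# qualitative framing. Field names mirror the template above; the helper
# itself is an illustration, not provided by the skill.

def build_qa_prompt(goal: str, user_story: str, evidence: str,
                    criteria: str = "none specified") -> str:
    """Build a delegation prompt that asks for judgment, not tallying."""
    return (
        f"{goal}\n\n"
        f"**User story / goal**: {user_story}\n"
        f"**Evidence available**: {evidence}\n"
        f"**Acceptance criteria**: {criteria}\n\n"
        # The closing line keeps the framing qualitative, never pass/fail.
        "Evaluate fitness-for-purpose. Cite specific evidence. Report honestly."
    )

prompt = build_qa_prompt(
    goal="Evaluate the new transcript dashboard",
    user_story="Researchers need to skim a session's key decisions quickly",
    evidence="rendered dashboard, raw transcripts in logs/",
)
```

Note that the qualitative instruction is hard-coded rather than parameterized, so a caller cannot accidentally swap it for a pass/fail framing.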
For features with data pipelines (dashboards, transcripts, reports, generated artifacts), explicitly instruct the agent to trace the pipeline, not just inspect output:
Agent(subagent_type="aops-core:qa", model="opus", prompt="
Qualitative assessment of [FEATURE] against user stories in [SPEC].
For each section: trace the data pipeline from source to output.
Verify data correctness, not just that output appears. Cross-verify against actual sources.
Go deep on 2-3 critical sections rather than skimming everything.
Evaluate fitness-for-purpose. Cite specific evidence. Report honestly.
")
For agent session evaluation, extract sessions first:
cd "$AOPS"
PYTHONPATH=aops-core uv run python \
aops-core/skills/eval/scripts/prepare_evaluation.py \
--recent 10 --pretty
Evidence storage for evaluations:
$ACA_DATA/eval/
├── YYYY-MM-DD-<session-id>.md # Individual session evaluations
├── trends/
│ └── YYYY-MM-DD-batch.md # Batch trend reports
└── insights/
└── YYYY-MM-DD-<topic>.md # Cross-cutting quality insights
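The naming convention in the tree above can be sketched as a path builder. The helper is illustrative only (the skill does not ship this function); it assumes `$ACA_DATA` is set in the environment:

```python
# Sketch of the evidence-storage naming convention shown above.
# The eval_path helper is hypothetical, not provided by the skill.
import datetime
import os
from pathlib import Path

def eval_path(session_id: str = "", kind: str = "session",
              topic: str = "") -> Path:
    root = Path(os.environ.get("ACA_DATA", ".")) / "eval"
    today = datetime.date.today().isoformat()  # YYYY-MM-DD
    if kind == "session":
        return root / f"{today}-{session_id}.md"       # per-session evaluation
    if kind == "batch":
        return root / "trends" / f"{today}-batch.md"   # batch trend report
    return root / "insights" / f"{today}-{topic}.md"   # cross-cutting insight
```

Keeping the date prefix uniform across all three subtrees makes evaluations sortable chronologically with a plain directory listing.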
When invoked as /qa with no arguments, do a quick verification of the current session's work:
complete_task() → post_qa_trigger() detects the QA invocation