npx claudepluginhub nicsuzor/academicops --plugin aops-cowork
Every feature exists for a reason. That reason is expressed practically as user stories: someone needs something, and the feature is supposed to deliver it. QA answers one question: is this feature actually achieving its goals and serving the people it was built for?
This applies whether the feature is a UI dashboard, a gate in a hook pipeline, a batch processing script, an API endpoint, or a skill definition. The evidence might come from logs, transcripts, browser inspection, automated tests, or the artifacts the feature produces.
QA is not a checklist. It is a judgment call: does this work serve the people it was made for? The agent's job is to figure out what evidence is needed, gather it, and evaluate honestly.
/qa # Quick verification of current work
/qa Verify the authentication feature # Specific feature verification
/qa Analyze custodiet gate effectiveness # Operational effectiveness analysis
/qa Design QA criteria for the new epic # Upstream criteria design
These references provide detailed guidance for specific QA activities. Read the ones relevant to your task — you don't need all of them for every QA invocation.
| Reference | When useful |
|---|---|
| [[references/qa-planning.md]] | Designing acceptance criteria or QA plans before development |
| [[references/qualitative-assessment.md]] | Evaluating fitness-for-purpose after development |
| [[references/acceptance-testing.md]] | Running structured test plans, tracking failures |
| [[references/quick-verification.md]] | Pre-completion sanity checks |
| [[references/integration-validation.md]] | Verifying structural/framework changes |
| [[references/system-design-qa.md]] | Designing QA infrastructure for a project |
| [[references/visual-analysis.md]] | UI changes or visual artifacts |
| [[../eval/references/dimensions.md]] | Agent session performance evaluation |
When delegating to a QA subagent:
Agent(subagent_type="aops-core:qa", model="opus", prompt="
[What you need evaluated and why]
**User story / goal**: [What this feature is supposed to achieve]
**Evidence available**: [Where to find data — logs, transcripts, browser, tests, etc.]
**Acceptance criteria**: [If known — extract from task or spec]
Evaluate fitness-for-purpose. Cite specific evidence. Report honestly.
")
Preserve qualitative framing. The delegation prompt determines output quality. Never reframe QA as pass/fail or checklist compliance — this causes the agent to regress to mechanical evaluation. The prompt must ask for judgment, not tallying.
Anti-pattern: "Check each user story and report pass/fail" → produces DOM element counting, loses all interpretive value.
Good pattern: "Evaluate fitness-for-purpose. Is this serving the user it was built for? Cite evidence." → produces genuine qualitative assessment.
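The delegation template above can be assembled programmatically. The sketch below is illustrative only (the `build_qa_prompt` helper is hypothetical, not part of the skill); it shows how the qualitative closing instruction stays fixed while the situational fields vary:

```python
# Hypothetical helper: assembles a QA delegation prompt that preserves
# qualitative framing. Field names mirror the template above; the helper
# itself is an illustration, not provided by the skill.

def build_qa_prompt(goal: str, user_story: str, evidence: str,
                    criteria: str = "none specified") -> str:
    """Build a delegation prompt that asks for judgment, not tallying."""
    return (
        f"{goal}\n\n"
        f"**User story / goal**: {user_story}\n"
        f"**Evidence available**: {evidence}\n"
        f"**Acceptance criteria**: {criteria}\n\n"
        # The closing line keeps the framing qualitative, never pass/fail.
        "Evaluate fitness-for-purpose. Cite specific evidence. Report honestly."
    )

prompt = build_qa_prompt(
    goal="Evaluate the new transcript dashboard",
    user_story="Researchers need to skim a session's key decisions quickly",
    evidence="rendered dashboard, raw transcripts in logs/",
)
```

Note that the qualitative instruction is hard-coded rather than parameterized, so a caller cannot accidentally swap it for a pass/fail framing.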
For features with data pipelines (dashboards, transcripts, reports, generated artifacts), explicitly instruct the agent to trace the pipeline, not just inspect output:
Agent(subagent_type="aops-core:qa", model="opus", prompt="
Qualitative assessment of [FEATURE] against user stories in [SPEC].
For each section: trace the data pipeline from source to output.
Verify data correctness, not just that output appears. Cross-verify against actual sources.
Go deep on 2-3 critical sections rather than skimming everything.
Evaluate fitness-for-purpose. Cite specific evidence. Report honestly.
")
For agent session evaluation, extract sessions first:
cd "$AOPS"
PYTHONPATH=aops-core uv run python \
aops-core/skills/eval/scripts/prepare_evaluation.py \
--recent 10 --pretty
Evidence storage for evaluations:
$ACA_DATA/eval/
├── YYYY-MM-DD-<session-id>.md # Individual session evaluations
├── trends/
│ └── YYYY-MM-DD-batch.md # Batch trend reports
└── insights/
└── YYYY-MM-DD-<topic>.md # Cross-cutting quality insights
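The naming convention in the tree above can be sketched as a path builder. The helper is illustrative only (the skill does not ship this function); it assumes `$ACA_DATA` is set in the environment:

```python
# Sketch of the evidence-storage naming convention shown above.
# The eval_path helper is hypothetical, not provided by the skill.
import datetime
import os
from pathlib import Path

def eval_path(session_id: str = "", kind: str = "session",
              topic: str = "") -> Path:
    root = Path(os.environ.get("ACA_DATA", ".")) / "eval"
    today = datetime.date.today().isoformat()  # YYYY-MM-DD
    if kind == "session":
        return root / f"{today}-{session_id}.md"       # per-session evaluation
    if kind == "batch":
        return root / "trends" / f"{today}-batch.md"   # batch trend report
    return root / "insights" / f"{today}-{topic}.md"   # cross-cutting insight
```

Keeping the date prefix uniform across all three subtrees makes evaluations sortable chronologically with a plain directory listing.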
When invoked as /qa with no arguments, do a quick verification of the current session's work:
complete_task() → post_qa_trigger() detects the QA invocation