From agentic-usability
Analyzes SDK benchmark results to identify failure patterns, documentation gaps, and API design issues. Use when reviewing evaluation runs or improving SDK usability.
npx claudepluginhub pspdfkit-labs/agentic-usability --plugin agentic-usability
Slash command
/agentic-usability:insights [project-directory]
You are acting as an SDK usability analyst. Your task is to analyze benchmark results and help the developer understand where their SDK is lacking and what improvements would have the biggest impact.
Results are at results/<runId>/<target>/<testId>/:
| File | Content |
|---|---|
| `judge.json` | Scores: apiDiscovery, callCorrectness, completeness, functionalCorrectness (0-100), overallVerdict, notes |
| `generated-solution.json` | Agent's solution `[{path, content}]` |
| `agent-notes.md` | Agent's first-person account of confusion, failed attempts, gotchas |
| `agent-output.log` | Raw agent stdout/stderr |
| `agent-session.jsonl` | Full agent conversation log |
| `agent-egress.log.json` | Network traffic (what URLs the agent accessed) |
| `judge-session.jsonl` | Judge conversation log |
| `judge-egress.log.json` | Judge network traffic |
| `workspace-snapshot.tar.gz` | Full sandbox state |
The test suite with reference solutions is at suite.json in the project root.
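As a rough illustration of how the layout above could be read programmatically, the following Python sketch walks `results/<runId>/<target>/<testId>/` and averages the score fields from each `judge.json`. It assumes the score names appear as top-level numeric keys in that file, which the table suggests but does not confirm; `load_judgements` and `mean_scores` are hypothetical helper names, not part of the agentic-usability CLI.

```python
import json
from pathlib import Path
from statistics import mean

# Score fields named in the table above. That they are top-level keys of
# judge.json is an assumption of this sketch.
SCORE_KEYS = ("apiDiscovery", "callCorrectness", "completeness", "functionalCorrectness")


def load_judgements(results_root):
    """Return (run_id, target, test_id, judge_dict) for every judge.json found."""
    found = []
    for judge_path in sorted(Path(results_root).glob("*/*/*/judge.json")):
        # The last three directory names are <runId>/<target>/<testId>.
        run_id, target, test_id = judge_path.parent.parts[-3:]
        with judge_path.open(encoding="utf-8") as fh:
            found.append((run_id, target, test_id, json.load(fh)))
    return found


def mean_scores(judgements):
    """Average each score field across tests, skipping tests that lack it."""
    averages = {}
    for key in SCORE_KEYS:
        values = [j[key] for _, _, _, j in judgements if isinstance(j.get(key), (int, float))]
        if values:
            averages[key] = mean(values)
    return averages
```

Calling `mean_scores(load_judgements("results"))` would then give one average per score dimension, a quick way to see which dimension drags a run down before reading individual `agent-notes.md` files.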
overallVerdict can be true even with low apiDiscovery (a different but working approach).
The following prompt contains all benchmark results, aggregate stats, and analysis instructions:
!agentic-usability insights --prompt-only -p $ARGUMENTS
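The note that overallVerdict can be true even with low apiDiscovery suggests one check worth automating: listing tests that passed while mostly bypassing the intended API. The sketch below is one possible way to do that. The threshold of 50, the function name `divergent_passes`, and the sample test ids are all illustrative assumptions, and the field names are assumed to be top-level keys of `judge.json`.

```python
def divergent_passes(judgements, discovery_threshold=50):
    """Return test ids that passed overall but scored low on apiDiscovery.

    `judgements` maps a test id to its parsed judge.json dict. The default
    threshold is an arbitrary illustration, not a value defined by the tool.
    """
    flagged = []
    for test_id, judge in judgements.items():
        passed = judge.get("overallVerdict") is True
        discovery = judge.get("apiDiscovery")
        if passed and isinstance(discovery, (int, float)) and discovery < discovery_threshold:
            flagged.append(test_id)
    return sorted(flagged)


# Hypothetical sample data, not real benchmark output.
sample = {
    "create-document": {"overallVerdict": True, "apiDiscovery": 20},
    "merge-pages": {"overallVerdict": True, "apiDiscovery": 90},
    "export-pdf": {"overallVerdict": False, "apiDiscovery": 10},
}
print(divergent_passes(sample))  # ['create-document']
```

Tests flagged this way are good candidates for a documentation review: the agent reached a working result, but probably not through the API path the reference solution in suite.json uses.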