Use when user wants to compare, evaluate, or choose between AI coding skills, or when user says "skill picker". Also use when user is unsure which skill to install for a specific problem, wants to benchmark skill effectiveness, or asks "which skill is better".
npx claudepluginhub qhuang20/skill-picker --plugin skill-picker

This skill uses the workspace's default tool permissions.
Compare skills head-to-head to find which one actually improves CC's performance for your specific problem.
Run controlled A/B tests: same model (CC), same tasks, different skills. Each candidate runs in an isolated worktree sub-agent. Output: a markdown comparison report with scores and a winner.
digraph skill_battle {
rankdir=TB;
node [shape=box];
diagnose [label="Phase 1: Diagnose\nUnderstand the pain point"];
source [label="Phase 2: Source Skills\nLocal path or web search"];
define [label="Phase 3: Define Tests\nTasks + evaluation criteria"];
execute [label="Phase 4: Execute Battle\nParallel worktree sub-agents"];
report [label="Phase 5: Report\nCompare and output .md"];
diagnose -> source -> define -> execute -> report;
user_has_skills [label="User already\nhas skills?" shape=diamond];
diagnose -> user_has_skills [style=dashed];
user_has_skills -> source [label="no"];
user_has_skills -> define [label="yes, skip search"];
}
Goal: Understand what problem the user is trying to solve with a skill.
If user triggers skill-picker without context (just says "skill picker"), start with question 1.
If user provides skills upfront (e.g., "compare these two skill folders"), skip diagnosis — extract the category from the skill content and go to Phase 2 Step 2a or Phase 3.
Goal: Collect 2+ candidate skills for comparison.
Step 2a (local path): If the user provides skill folders, read each SKILL.md, noting the .md filename and the description from its frontmatter.
Step 2b (web search): Otherwise, search the web with queries such as:
- "[category] skill" site:github.com SKILL.md
- "awesome claude skills" [category]
- "[category]" CLAUDE.md agent skill github
- "[category]" awesome-agent-skills
Present the candidates to the user:
Found 5 candidate skills for [category]:
1. user/repo (4.2K stars) — description
2. user/repo (2.1K stars) — description
3. ...
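The Step 2b queries can be generated mechanically from the diagnosed category; a small hypothetical sketch (the templates simply mirror the queries listed above):

```python
# Hypothetical helper: expand the diagnosed category into the Step 2b
# web-search queries listed above.
def candidate_queries(category: str) -> list[str]:
    templates = [
        '"{c} skill" site:github.com SKILL.md',
        '"awesome claude skills" {c}',
        '"{c}" CLAUDE.md agent skill github',
        '"{c}" awesome-agent-skills',
    ]
    return [t.format(c=category) for t in templates]

print(candidate_queries("fact-checking"))
```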
Download each selected candidate into .skill-picker/candidates/<skill-name>/. If a skill references supporting directories (reference/, templates/, scripts/), download those too; a skill missing its dependencies will not execute properly.
Copy each candidate to .claude/skills/picker-<name>/SKILL.md (the picker- prefix avoids name conflicts with the user's existing skills), then create a .claude/agents/picker-<name>.md for each:
---
name: picker-<name>
description: Skill picker test agent with <name>
skills:
- picker-<name>
---
Complete the assigned task using the loaded skill's methodology. Do your best work.
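Putting the Phase 2 setup together, a minimal Python sketch of the staging step (a hypothetical helper; it assumes candidates were already downloaded under .skill-picker/candidates/ as described above):

```python
# Hypothetical staging helper: copy each downloaded candidate into
# .claude/skills/picker-<name>/ and write the matching agent definition.
import shutil
from pathlib import Path

AGENT_TEMPLATE = """---
name: picker-{name}
description: Skill picker test agent with {name}
skills:
- picker-{name}
---
Complete the assigned task using the loaded skill's methodology. Do your best work.
"""

def stage(candidate: Path) -> None:
    name = candidate.name
    # copytree keeps reference/, templates/, scripts/ alongside SKILL.md
    shutil.copytree(candidate, Path(".claude/skills") / f"picker-{name}",
                    dirs_exist_ok=True)
    agent = Path(".claude/agents") / f"picker-{name}.md"
    agent.parent.mkdir(parents=True, exist_ok=True)
    agent.write_text(AGENT_TEMPLATE.format(name=name))

for cand in Path(".skill-picker/candidates").iterdir():
    if cand.is_dir():
        stage(cand)
```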
Also create .claude/agents/picker-baseline.md (no skills: field) for the control group. Then ask the user to /exit to restart CC, come back, and say "continue" to proceed from Phase 3.
(Agents and skills are loaded at session startup; a restart is required for new ones to be recognized.)

Goal: Agree on test tasks, establish ground truth with the user, and define evaluation criteria.
Before designing tasks, restate the problem to the user: "Just to confirm — the problem we're solving is [X]. The skills should help with [Y]. Correct?" This prevents wasted effort if CC misunderstood the problem (e.g., searching for "citation verification" skills when the problem is "inaccurate numbers").
CC's own answers carry LLM bias. Ground truth MUST be verified with the user.
For each task:
Task 1: [title]
Ground truth (draft):
- Key fact A: [value] (source: [URL])
- Key fact B: [value] (source: [URL])
- ...
Save the ground truth to .skill-picker/ground-truth-YYYY-MM-DD.md; include all tasks, standard answers, sources, and evaluation criteria. This file is referenced during Phase 5 scoring.

Why this matters: without user-verified ground truth, CC is both writing the exam and grading it, and its own biases would propagate into the scoring. The user is the ultimate authority on what counts as correct.
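If it helps to automate that save, a hypothetical Python sketch that serializes the user-confirmed tasks into the ground-truth file (field names follow the draft template above):

```python
# Hypothetical writer for .skill-picker/ground-truth-YYYY-MM-DD.md.
from datetime import date
from pathlib import Path

def write_ground_truth(tasks: list[dict]) -> Path:
    out = Path(".skill-picker") / f"ground-truth-{date.today():%Y-%m-%d}.md"
    out.parent.mkdir(exist_ok=True)
    lines = ["# Ground Truth", ""]
    for i, task in enumerate(tasks, 1):
        lines += [f"## Task {i}: {task['title']}", "Ground truth:"]
        lines += [f"- {fact}: {value} (source: {url})"
                  for fact, value, url in task["facts"]]
        lines += ["Evaluation criteria:"]
        lines += [f"- {c}" for c in task["criteria"]]
        lines.append("")
    out.write_text("\n".join(lines))
    return out
```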
For each task, define with the user:
Hard criteria (against ground truth):
Soft criteria (CC judges, informed by ground truth):
Weights (optional): Ask user if any criteria matter more. Default: equal weight.
Confirm: "Ground truth set, criteria defined. Ready to run?"
Goal: Run each skill + baseline against the same tasks in parallel sub-agents.
Skills are loaded via predefined agent definitions (.claude/agents/picker-<name>.md) with the skills: frontmatter field. This ensures the system injects the full skill content into each sub-agent at startup — no truncation, no simplification. This is more reliable than prompt injection, where the main CC may simplify long skill content.
Prerequisites (done in Phase 2):
- .claude/skills/picker-<name>/SKILL.md exists for each candidate
- .claude/agents/picker-<name>.md exists with skills: [picker-<name>]
- .claude/agents/picker-baseline.md exists (no skills)

If the current directory is a git repo, add isolation: worktree to sub-agent calls for file-level isolation. This is important for code tasks where sub-agents write or modify files. For pure Q&A tasks, worktree is optional but still recommended.
Note to user: If not in a git repo, run git init && git add -A && git commit -m "init" then restart your CC session (CC caches git status at startup).
For each candidate skill AND the baseline, spawn a sub-agent using the predefined agent type:
Agent(subagent_type: "picker-database-lookup", isolation: worktree, prompt: "...")
Agent(subagent_type: "picker-paper-lookup", isolation: worktree, prompt: "...")
Agent(subagent_type: "picker-fact-checking", isolation: worktree, prompt: "...")
Agent(subagent_type: "picker-baseline", isolation: worktree, prompt: "...")
Since skills are already loaded via the agent definition, the prompt only needs the task:
You are participating in a skill comparison test. Do your best work.
Follow the methodology from your loaded skills.
## Your Task
<task description from Phase 3>
## Output Requirements
When you finish, end your response with this exact format:
## Picker Result
**Skill:** <skill-name or "baseline">
**Task:** <task title>
### Output
<your full work product>
### Self-Check
- [criterion]: PASS/FAIL — brief reason
(repeat for each hard criterion from Phase 3)
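Because every sub-agent ends with the same "## Picker Result" block, the results can be pulled out mechanically during Phase 5; a hypothetical parsing sketch:

```python
# Hypothetical parser for the "## Picker Result" block each sub-agent emits.
import re

def parse_picker_result(transcript: str) -> dict:
    block = transcript.split("## Picker Result", 1)[1]
    skill = re.search(r"\*\*Skill:\*\*\s*(.+)", block).group(1).strip()
    task = re.search(r"\*\*Task:\*\*\s*(.+)", block).group(1).strip()
    body, _, checks_text = block.partition("### Self-Check")
    output = body.split("### Output", 1)[1].strip()
    checks = re.findall(r"-\s*(.+?):\s*(PASS|FAIL)", checks_text)
    return {"skill": skill, "task": task, "output": output,
            "self_check": {name.strip(): verdict for name, verdict in checks}}
```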
After the report is generated, offer to clean up test artifacts:
- .claude/skills/picker-* directories
- .claude/agents/picker-* files
- .skill-picker/ (candidates, ground truth, reports)

Goal: Compare all results and output a markdown report.
For each task x skill combination, score the hard criteria against the user-verified ground truth, have CC judge the soft criteria against the same ground truth, and combine them into a total, as in the sketch below.
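A minimal roll-up sketch in Python; the 60/40 hard/soft split, equal task weights, and the 100-point scale are illustrative assumptions, not prescribed by the skill:

```python
# Hypothetical score roll-up: combine hard (pass/fail) and soft (0-5)
# criteria into a 0-100 score per task, then average across tasks.
def task_score(hard_passed: int, hard_total: int, soft: float,
               hard_weight: float = 0.6) -> float:
    hard_part = hard_passed / hard_total if hard_total else 0.0
    soft_part = soft / 5.0
    return 100 * (hard_weight * hard_part + (1 - hard_weight) * soft_part)

def skill_score(task_results: list[tuple[int, int, float]]) -> float:
    scores = [task_score(h, n, s) for h, n, s in task_results]
    return sum(scores) / len(scores)

# Example: two tasks for one candidate skill -> roughly 82/100.
print(round(skill_score([(3, 4, 4.0), (2, 2, 3.5)])))
```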
Save to: .skill-picker/report-YYYY-MM-DD.md
# Skill Picker Report
**Date:** YYYY-MM-DD
**Problem:** <diagnosed problem>
**Skills tested:** <list with sources>
**Tasks:** <count>
## Summary
| Rank | Skill | Score | Strengths | Weaknesses |
|------|-------|-------|-----------|------------|
| 1 | skill-name | 85/100 | ... | ... |
| 2 | skill-name | 72/100 | ... | ... |
| - | baseline | 60/100 | ... | ... |
## Winner: [skill-name]
<2-3 sentences: why it won, how much better than baseline>
## Detailed Results
### Task 1: [title]
| Skill | Hard | Soft | Total | Notes |
|-------|------|------|-------|-------|
| ... | .../N | .../5 | ... | ... |
<comparison of outputs>
(repeat for each task)
## Execution Stats
| Skill | Tool Calls | Duration | Search Behavior |
|-------|-----------|----------|-----------------|
| ... | N | Ns | searched / no search |
## Recommendation
<which skill to install, any caveats, setup tips>