From crucible
Audits crucible skills for overlap, staleness, broken references, and quality with quick scan or full evaluation modes.
npx claudepluginhub raddue/crucible

This skill uses the workspace's default tool permissions.
Announce at start: "I'm using the stocktake skill to audit skill health."
Triggers: /stocktake, or the user asks to audit skills.

| Mode | Trigger | Duration |
|---|---|---|
| Quick scan | results.json exists (default) | ~5 min |
| Full stocktake | results.json absent, or /stocktake full | ~20 min |
| Efficiency report | /stocktake efficiency | ~5 min |
Results cache: skills/stocktake/results.json
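The mode dispatch in the table above can be sketched as follows. This is a minimal sketch; the `pick_mode` helper and its argument handling are assumptions, while the cache path and mode names come from the table.

```python
from pathlib import Path

def pick_mode(arg=None, cache=Path("skills/stocktake/results.json")):
    """Map a /stocktake argument and cache state to a mode per the table above."""
    if arg == "efficiency":
        return "efficiency"          # /stocktake efficiency -> efficiency report
    if arg == "full" or not cache.exists():
        return "full"                # explicit full, or no cached results.json
    return "quick"                   # results.json exists -> quick scan (default)
```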
If skills/stocktake/results.json exists, compare its evaluated_at timestamp against skill file mtimes; skills modified since the last evaluation need re-evaluation. If skills/stocktake/results.json is absent, run a full stocktake.

Enumerate all skill directories under skills/. For each:
Present inventory table:
| Skill | Files | Lines | Last Modified | Description |
|---|---|---|---|---|
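The staleness check (comparing the cache's evaluated_at timestamp against SKILL.md mtimes) could look like the sketch below. The `cache_is_stale` helper is hypothetical; the path layout and field name come from the spec above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def cache_is_stale(results_path: Path, skills_dir: Path) -> bool:
    """True if results.json is missing or any SKILL.md changed after evaluated_at."""
    if not results_path.exists():
        return True
    evaluated_at = datetime.fromisoformat(
        json.loads(results_path.read_text())["evaluated_at"].replace("Z", "+00:00")
    )
    for skill_md in skills_dir.glob("*/SKILL.md"):
        mtime = datetime.fromtimestamp(skill_md.stat().st_mtime, tz=timezone.utc)
        if mtime > evaluated_at:
            return True
    return False
```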
Dispatch an Opus Explore agent with all skill contents and the evaluation checklist.
Each skill is evaluated against:
- crucible: links resolve to existing skills?

Each skill gets a verdict:
| Verdict | Meaning |
|---|---|
| Keep | Useful and current |
| Improve | Worth keeping, specific improvements needed |
| Retire | Low quality, stale, or cost-asymmetric |
| Merge into [X] | Substantial overlap with another skill; name the merge target |
Reason quality requirements — the reason field must be self-contained and decision-enabling:
| Skill | Verdict | Reason |
|---|---|---|
Save verdicts to skills/stocktake/results.json.

Triggered by /stocktake efficiency or by forge feed-forward when 10+ chronicle signals with efficiency data exist.
Read ~/.claude/projects/<hash>/memory/chronicle/signals.jsonl and keep only signals carrying a metrics.efficiency sub-object. Group filtered signals by skill. For each skill, compute:

- Average of (est_input_tokens + est_output_tokens) across runs
- Average duration_m
- Average dispatches (sum of dispatches_by_tier values)
- Average rework_pct across runs. If rework_pct is missing (pre-rework-tracking signal), display "—"

If any skill has average rework >30%, append a note: "[skill]: rework >30% — consider reviewing dispatch templates or quality-gate prompts for this skill."
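The per-skill aggregation can be sketched as below. This is a sketch under assumptions: the `summarize` helper and the exact nesting of signal records are hypothetical, while the field names (est_input_tokens, duration_m, rework_pct) come from the spec above.

```python
from collections import defaultdict

def summarize(signals):
    """Average per-skill efficiency metrics from chronicle signals.

    Assumed signal shape (hypothetical):
    {"skill": ..., "metrics": {"efficiency": {"est_input_tokens": ..., ...}}}
    """
    by_skill = defaultdict(list)
    for sig in signals:
        eff = sig.get("metrics", {}).get("efficiency")
        if eff:                          # keep only signals with efficiency data
            by_skill[sig["skill"]].append(eff)
    out = {}
    for skill, runs in by_skill.items():
        n = len(runs)
        avg_tokens = sum(r["est_input_tokens"] + r["est_output_tokens"] for r in runs) / n
        reworks = [r["rework_pct"] for r in runs if "rework_pct" in r]
        out[skill] = {
            "runs": n,
            "avg_est_tokens": round(avg_tokens),
            "avg_duration_m": round(sum(r["duration_m"] for r in runs) / n),
            # "—" when no run carries rework data (pre-rework-tracking signals)
            "rework_pct": round(sum(reworks) / len(reworks), 1) if reworks else "—",
        }
    return out
```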
Output:
## Skill Efficiency Report
**Period:** <oldest signal date> to <newest signal date>
**Tracked runs:** N
**Disclaimer:** Estimates based on dispatch file sizes (chars/4). Actual token consumption may vary +/-30%.
### Per-Skill Summary
| Skill | Runs | Avg Est. Tokens (in+out) | Rework % | Avg Duration | Avg Dispatches | Trend |
|-------|------|--------------------------|----------|--------------|----------------|-------|
For each skill, compute dispatch tier distribution and categorize dispatches as review vs. implementation:
- dispatches_by_tier averaged across runs

Note: Review vs. implementation breakdown requires reading manifest entries (role field). If manifests are not available (only chronicle signals), report "N/A" for these columns.
Output:
### Dispatch Breakdown
| Skill | Opus % | Sonnet % | Haiku % | Review % | Impl % |
|-------|--------|----------|---------|----------|--------|
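Averaging dispatches_by_tier across runs can be sketched as follows. The `tier_percentages` helper is hypothetical; it pools tier counts over all runs and converts them to percentages for the Opus/Sonnet/Haiku columns.

```python
def tier_percentages(runs):
    """Pooled dispatch tier distribution across runs, as percentages.

    Each run is assumed to carry dispatches_by_tier, e.g. {"opus": 1, "sonnet": 4}.
    """
    totals = {}
    for run in runs:
        for tier, count in run.get("dispatches_by_tier", {}).items():
            totals[tier] = totals.get(tier, 0) + count
    grand = sum(totals.values())
    if grand == 0:
        return {}                    # no dispatch data at all
    return {tier: round(100 * count / grand, 1) for tier, count in totals.items()}
```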
For each skill, compute:
- total_input_chars / total dispatches — measures context per subagent
- review dispatches / total dispatches * 100 — what fraction of work is quality assurance (requires manifest data; "N/A" if unavailable)

Output:
### Structural Efficiency
| Skill | Avg Input/Dispatch | Context Distribution | Quality Overhead % |
|-------|--------------------|-----------------------|--------------------|
For each skill with sufficient data (3+ runs):
- (total_input_chars + total_output_chars) per run — total context the pipeline touched
- total_input_chars / total dispatches per run — how much context each subagent receives on average
- avg input per dispatch / avg total context — lower values mean each subagent sees a smaller slice of the total, indicating effective context distribution
- review dispatches / total dispatches — fraction of dispatches dedicated to quality assurance (requires manifest data; "N/A" if only chronicle signals available)

Output:
### Baseline Comparison (Structural)
| Skill | Avg Total Context | Avg Input/Dispatch | Context Focus Ratio | Quality Investment |
|-------|-------------------|--------------------|---------------------|--------------------|
**Interpretation:** Context focus ratio measures how much of the total pipeline context each
subagent receives. Lower values mean more focused dispatches. Quality investment shows the
fraction of dispatches dedicated to review, red-team, and quality gates. These are structural
comparisons, not cost savings claims — they measure how the skill distributes work, not what
a monolithic alternative would cost.
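The context focus ratio and quality investment defined above can be computed as in this sketch. The `structural_metrics` helper and the per-run record shape are assumptions; the formulas mirror the definitions above.

```python
def structural_metrics(runs):
    """Context focus ratio and quality investment for one skill's runs.

    Assumed run shape (hypothetical): {"total_input_chars": ..., "total_output_chars": ...,
    "dispatches": ..., "review_dispatches": ... or absent when no manifest data}.
    """
    n = len(runs)
    avg_total = sum(r["total_input_chars"] + r["total_output_chars"] for r in runs) / n
    avg_in_per_dispatch = sum(r["total_input_chars"] / r["dispatches"] for r in runs) / n
    runs_with_roles = [r for r in runs if r.get("review_dispatches") is not None]
    return {
        "avg_total_context": avg_total,
        "avg_input_per_dispatch": avg_in_per_dispatch,
        # lower ratio = each subagent sees a smaller slice of the total context
        "context_focus_ratio": avg_in_per_dispatch / avg_total,
        # "N/A" when only chronicle signals are available (no manifest role data)
        "quality_investment": (
            sum(r["review_dispatches"] for r in runs_with_roles)
            / sum(r["dispatches"] for r in runs_with_roles)
            if runs_with_roles else "N/A"
        ),
    }
```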
Save efficiency report data to skills/stocktake/results.json under a new efficiency key (separate from the skill verdict cache):
```json
{
  "efficiency": {
    "computed_at": "2026-04-07T10:00:00Z",
    "signals_with_efficiency": 15,
    "total_signals": 42,
    "per_skill": {
      "build": { "runs": 8, "avg_est_tokens": 52600, "avg_duration_m": 45, "trend": "stable" },
      "debugging": { "runs": 5, "avg_est_tokens": 25000, "avg_duration_m": 22, "trend": "improving" }
    }
  }
}
```
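Writing the efficiency key without clobbering the verdict cache could look like this sketch. The `save_efficiency` helper is hypothetical; the key layout matches the schema above.

```python
import json
from pathlib import Path

def save_efficiency(results_path: Path, report: dict) -> None:
    """Write the efficiency report under its own key, preserving the verdict cache."""
    cache = json.loads(results_path.read_text()) if results_path.exists() else {}
    cache["efficiency"] = report       # the verdict cache under "skills" stays untouched
    results_path.write_text(json.dumps(cache, indent=2))
```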
skills/stocktake/results.json:
```json
{
  "evaluated_at": "2026-03-07T10:00:00Z",
  "mode": "full",
  "skills": {
    "skill-name": {
      "path": "skills/skill-name/SKILL.md",
      "verdict": "Keep",
      "reason": "Concrete, actionable, unique value for X workflow",
      "mtime": "2026-01-15T08:30:00Z"
    }
  }
}
```