From crucible
Audits all crucible skills for overlap, staleness, broken references, and quality. Quick scan or full evaluation modes.
Slash command: /crucible:stocktake
Announce at start: "I'm using the stocktake skill to audit skill health."
Use when the user runs /stocktake or asks to audit skills.

| Mode | Trigger | Duration |
|---|---|---|
| Quick scan | results.json exists (default) | ~5 min |
| Full stocktake | results.json absent, or /stocktake full | ~20 min |
| Efficiency report | /stocktake efficiency | ~5 min |
Results cache: skills/stocktake/results.json
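Quick scan reads skills/stocktake/results.json, identifies skills changed since its evaluated_at timestamp (compare file mtimes), re-evaluates only those, and writes the result back to skills/stocktake/results.json.

Enumerate all skill directories under skills/. For each, record the file count, line count, last-modified time, and description.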
Present inventory table:
| Skill | Files | Lines | Last Modified | Description |
|---|---|---|---|---|
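
The enumeration step can be sketched in Python as follows. This is a minimal sketch, not the skill's own implementation: the function names are hypothetical, and it assumes each skill keeps a SKILL.md whose frontmatter has a single-line `description:` field.

```python
from datetime import datetime, timezone
from pathlib import Path


def read_description(skill_md):
    """Return the first `description:` value in SKILL.md, or "" if absent.

    Assumption: the description sits on one line of the frontmatter.
    """
    if not skill_md.is_file():
        return ""
    for line in skill_md.read_text(encoding="utf-8", errors="replace").splitlines():
        if line.startswith("description:"):
            return line.split(":", 1)[1].strip()
    return ""


def skill_inventory(skills_root):
    """Build one inventory row per skill directory under skills_root."""
    rows = []
    for skill_dir in sorted(p for p in Path(skills_root).iterdir() if p.is_dir()):
        files = sorted(f for f in skill_dir.rglob("*") if f.is_file())
        if not files:
            continue  # an empty directory has nothing to report
        # Decode leniently so an oddly encoded file does not abort the scan.
        lines = sum(
            len(f.read_text(encoding="utf-8", errors="replace").splitlines())
            for f in files
        )
        newest = max(f.stat().st_mtime for f in files)
        rows.append({
            "skill": skill_dir.name,
            "files": len(files),
            "lines": lines,
            "last_modified": datetime.fromtimestamp(newest, tz=timezone.utc).strftime("%Y-%m-%d"),
            "description": read_description(skill_dir / "SKILL.md"),
        })
    return rows


def render_table(rows):
    """Render rows as the markdown inventory table."""
    out = [
        "| Skill | Files | Lines | Last Modified | Description |",
        "|---|---|---|---|---|",
    ]
    for r in rows:
        out.append(
            f"| {r['skill']} | {r['files']} | {r['lines']} | {r['last_modified']} | {r['description']} |"
        )
    return "\n".join(out)
```

Calling `print(render_table(skill_inventory("skills")))` from the repo root would print the table for the current checkout.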
Structural invariants (repo-level). Run the tracked invariant checker from the repo root and treat a non-zero exit as a stocktake failure to surface:
- `python3 scripts/check_i2_marker.py`: the I2 engine-dispatch marker allowlist. The set of files carrying a column-0 `dispatch: delve-engine` body line must equal exactly {delve, temper} (a stray third dispatcher or a missing one fails). Added #336.
- `python3 scripts/check_qg_stagnation_minor.py`: the Minor-aware stagnation judge contract. Asserts quality-gate/stagnation-judge-prompt.md carries the Step-3 Mixed-branch Minor-accumulation rule + the Consecutive recurring-Minor rounds counter + the DR-Cause enum; that quality-gate/SKILL.md's Minor prose is reconciled (the bare "do not count toward stagnation" claim is gone; path-pinned, literal match); and that the convergence-log dr_cause value set (`minor-accumulation | structural-saturation | consensus | null`) is documented. Added #260.
- `python3 scripts/check_crossref.py`: the cross-reference invariant. Every live `crucible:<token>` in a git-tracked `*.md` resolves to a real `skills/<token>/` dir. Plugin-namespaced agent types (the `crucible-*` namespace, e.g. crucible-red-team) are resolved against `agents/<token>.md` (so a typo'd agent ref is still caught); documented template placeholders (skill-name/old-name/new-name) are blanket-exempt. docs/plans/, docs/prds/, and docs/handoffs/ are gitignored and thus naturally excluded, since git ls-files lists tracked files only (so new files under those paths are not scanned); tracked surfaces (`skills/**`, top-level `docs/*.md`, docs/research/, etc.) ARE scanned. `--selftest` runs the resolution-logic regression cases. Added #365.
- `python3 scripts/catalog.py check`: the generated skill-catalog contract. Asserts docs/skills.md's `<!-- CATALOG:START/END -->` rows are in bijection with `skills/*/SKILL.md` frontmatter names (no omission, no bogus entry, no naming mismatch), every on-disk skill is in CATEGORIES (and vice versa, no dangling category), and every registered count token (README, workshop, plugin.json) equals the runtime skill count n. Added #364.
- `python3 scripts/check_calibration_dispatch.py`: the calibration-weighted-dispatch wiring invariant. Asserts each of the 5 consumers (siege, quality-gate, inquisitor, delve, audit) carries the `<!-- CANONICAL: shared/calibration-weighted-dispatch.md -->` marker + its own `advise <skill>` invocation, and that none inlines the convention's prose body (the net-new no-copy assertion check_canonical_drift/check_crossref don't cover). `--selftest` runs the marker/invocation/no-copy logic cases. Added #372.
- `python3 scripts/check_model_pins.py`: the model-tier guardrail. No fable-family pin (fable/claude-fable-5, case-insensitive, across frontmatter `model:` + inline Task tool/Agent tool forms) on any `<!-- MODEL-TIER: security-hard-out -->`-marked file, AND every file in the security-surface set (`skills/siege/**`, `skills/dependency-audit/**`, agents/crucible-red-team.md, plus narrow offensive/CVE name-stems gated match-then-check-for-pin; audit/test-coverage/stocktake carved out) carries the marker (default-deny). Static tracked `*.md` pins only; inherit/session-model roles and untracked consensus config are disclosed residuals (see skills/shared/model-tier-policy.md). `--selftest` runs the detection-logic cases. Added #392.

(Other tracked checkers under `scripts/check_*.py` may be run here too as they are brought into alignment.)
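
Running the checkers and surfacing every non-zero exit can be sketched as below. The helper name `run_checkers` and the `CHECKERS` list are illustrative; the script paths are the ones named in this section, and nothing is assumed about their output beyond the exit code.

```python
import subprocess


def run_checkers(commands, cwd="."):
    """Run each checker command and return (command, exit_code) for every failure."""
    failures = []
    for cmd in commands:
        # capture_output keeps checker noise out of the stocktake transcript;
        # only the exit code decides pass or fail here.
        result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((cmd, result.returncode))
    return failures


# Commands as listed in this section, meant to be run from the repo root.
CHECKERS = [
    ["python3", "scripts/check_i2_marker.py"],
    ["python3", "scripts/check_qg_stagnation_minor.py"],
    ["python3", "scripts/check_crossref.py"],
    ["python3", "scripts/catalog.py", "check"],
    ["python3", "scripts/check_calibration_dispatch.py"],
    ["python3", "scripts/check_model_pins.py"],
]
```

`run_checkers(CHECKERS)` returns an empty list when every invariant holds; any entry in the result is a stocktake failure to report.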
Dispatch an Opus Explore agent with all skill contents and the evaluation checklist.
Each skill is evaluated against:
- Do `crucible:` links resolve to existing skills?
- Does it reference the `<!-- CANONICAL: shared/dispatch-convention.md -->` / return-convention.md markers (vs. copying or omitting them)?
- Does it have an evals/ directory, or is one warranted given its surface? (eval-before-publish; flag absence, don't auto-fail; evals are aspirational across the suite)

Each skill gets a verdict:
| Verdict | Meaning |
|---|---|
| Keep | Useful and current |
| Improve | Worth keeping, specific improvements needed |
| Retire | Low quality, stale, or cost-asymmetric |
| Merge into [X] | Substantial overlap with another skill; name the merge target |
Reason quality requirements — the reason field must be self-contained and decision-enabling:
| Skill | Verdict | Reason |
|---|---|---|
Save results to skills/stocktake/results.json.

The efficiency report is triggered by /stocktake efficiency or by forge feed-forward when 10+ chronicle signals with efficiency data exist.
- Read `~/.claude/projects/<hash>/memory/chronicle/signals.jsonl`.
- Keep only signals that carry a metrics.efficiency sub-object.

Group filtered signals by skill. For each skill, compute:

- Avg est. tokens: mean of (est_input_tokens + est_output_tokens) across runs
- Avg duration: mean of duration_m
- Avg dispatches: mean of (sum of dispatches_by_tier values)
- Rework %: mean of rework_pct across runs. If rework_pct is missing (pre-rework-tracking signal), display "—"

If any skill has average rework >30%, append a note: "[skill]: rework >30% — consider reviewing dispatch templates or quality-gate prompts for this skill."
Output:
## Skill Efficiency Report
**Period:** <oldest signal date> to <newest signal date>
**Tracked runs:** N
**Disclaimer:** Estimates based on dispatch file sizes (chars/4). Actual token consumption may vary +/-30%.
### Per-Skill Summary
| Skill | Runs | Avg Est. Tokens (in+out) | Rework % | Avg Duration | Avg Dispatches | Trend |
|-------|------|--------------------------|----------|--------------|----------------|-------|
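
The per-skill aggregation can be sketched as below. The field names under metrics.efficiency come from this section; the top-level `skill` key and the exact signal layout are assumptions, since the signal schema is not shown here, and the function names are hypothetical.

```python
import json
from collections import defaultdict
from statistics import mean


def summarize(signal_lines):
    """Group signals that have metrics.efficiency by skill and average them."""
    by_skill = defaultdict(list)
    for line in signal_lines:
        line = line.strip()
        if not line:
            continue
        signal = json.loads(line)
        eff = (signal.get("metrics") or {}).get("efficiency")
        if eff:  # signals without efficiency data are filtered out
            by_skill[signal["skill"]].append(eff)  # "skill" key is an assumption

    summary = {}
    for skill, runs in by_skill.items():
        rework = [r["rework_pct"] for r in runs if r.get("rework_pct") is not None]
        summary[skill] = {
            "runs": len(runs),
            "avg_est_tokens": mean(r["est_input_tokens"] + r["est_output_tokens"] for r in runs),
            "avg_duration_m": mean(r["duration_m"] for r in runs),
            "avg_dispatches": mean(sum(r["dispatches_by_tier"].values()) for r in runs),
            # None means no run carried rework_pct; the report shows its placeholder.
            "rework_pct": mean(rework) if rework else None,
        }
    return summary


def high_rework(summary, threshold=30):
    """Return the skills whose average rework exceeds the threshold."""
    return sorted(
        skill for skill, s in summary.items()
        if s["rework_pct"] is not None and s["rework_pct"] > threshold
    )
```

Each skill returned by `high_rework` would get the rework note appended to the report.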
For each skill, compute dispatch tier distribution and categorize dispatches as review vs. implementation:
- Tier distribution: dispatches_by_tier averaged across runs

Note: the review vs. implementation breakdown requires reading manifest entries (role field). If manifests are not available (only chronicle signals), report "N/A" for these columns.
Output:
### Dispatch Breakdown
| Skill | Opus % | Sonnet % | Haiku % | Review % | Impl % |
|-------|--------|----------|---------|----------|--------|
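
The tier percentages for one skill can be sketched as below. The tier keys `opus`, `sonnet`, and `haiku` are assumed to match the dispatches_by_tier keys, and the function name is hypothetical. Because every tier's average shares the same divisor (the run count), percentages of averaged counts equal percentages of summed counts, so the sketch sums.

```python
def tier_distribution(runs, tiers=("opus", "sonnet", "haiku")):
    """Return each tier's share of dispatches as a percentage.

    `runs` is a list of dispatches_by_tier mappings, one per run.
    """
    totals = {tier: 0 for tier in tiers}
    for run in runs:
        for tier in tiers:
            totals[tier] += run.get(tier, 0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No dispatches recorded: there is no meaningful percentage to report.
        return {tier: None for tier in tiers}
    return {tier: round(100 * totals[tier] / grand_total, 1) for tier in tiers}
```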
For each skill, compute:
- Avg input/dispatch: total_input_chars / total dispatches; measures context per subagent
- Quality overhead %: review dispatches / total dispatches * 100; what fraction of work is quality assurance (requires manifest data; "N/A" if unavailable)

Output:
### Structural Efficiency
| Skill | Avg Input/Dispatch | Context Distribution | Quality Overhead % |
|-------|--------------------|-----------------------|--------------------|
For each skill with sufficient data (3+ runs):
- Avg total context: (total_input_chars + total_output_chars) per run; total context the pipeline touched
- Avg input/dispatch: total_input_chars / total dispatches per run; how much context each subagent receives on average
- Context focus ratio: avg input per dispatch / avg total context; lower values mean each subagent sees a smaller slice of the total, indicating effective context distribution
- Quality investment: review dispatches / total dispatches; fraction of dispatches dedicated to quality assurance (requires manifest data; "N/A" if only chronicle signals available)

Output:
### Baseline Comparison (Structural)
| Skill | Avg Total Context | Avg Input/Dispatch | Context Focus Ratio | Quality Investment |
|-------|-------------------|--------------------|---------------------|--------------------|
**Interpretation:** Context focus ratio measures how much of the total pipeline context each
subagent receives. Lower values mean more focused dispatches. Quality investment shows the
fraction of dispatches dedicated to review, red-team, and quality gates. These are structural
comparisons, not cost savings claims — they measure how the skill distributes work, not what
a monolithic alternative would cost.
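
As a worked sketch of the four baseline columns: the code below assumes each run is a dict with total_input_chars and total_output_chars plus hypothetical `dispatches` and `review_dispatches` counts (the latter only when manifest data exists). Averaging the per-run review ratio is one reading of "review dispatches / total dispatches"; pooling the counts across runs would be another.

```python
from statistics import mean

MIN_RUNS = 3  # "sufficient data" threshold from the text


def baseline_metrics(runs):
    """Compute the structural baseline columns, or None with too few runs.

    Assumes every run has at least one dispatch.
    """
    if len(runs) < MIN_RUNS:
        return None
    avg_total_context = mean(r["total_input_chars"] + r["total_output_chars"] for r in runs)
    avg_input_per_dispatch = mean(r["total_input_chars"] / r["dispatches"] for r in runs)
    # Quality investment needs manifest data; report "N/A" when any run lacks it.
    if all("review_dispatches" in r for r in runs):
        quality = mean(r["review_dispatches"] / r["dispatches"] for r in runs)
    else:
        quality = "N/A"
    return {
        "avg_total_context": avg_total_context,
        "avg_input_per_dispatch": avg_input_per_dispatch,
        "context_focus_ratio": avg_input_per_dispatch / avg_total_context,
        "quality_investment": quality,
    }
```

For three identical runs of 8,000 input chars, 2,000 output chars, and 4 dispatches (1 of them review), each subagent sees 2,000 of 10,000 chars, so the context focus ratio is 0.2 and the quality investment is 0.25.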
Save efficiency report data to skills/stocktake/results.json under a new efficiency key (separate from the skill verdict cache):
{
  "efficiency": {
    "computed_at": "2026-04-07T10:00:00Z",
    "signals_with_efficiency": 15,
    "total_signals": 42,
    "per_skill": {
      "build": { "runs": 8, "avg_est_tokens": 52600, "avg_duration_m": 45, "trend": "stable" },
      "debugging": { "runs": 5, "avg_est_tokens": 25000, "avg_duration_m": 22, "trend": "improving" }
    }
  }
}
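
Writing the efficiency key without disturbing the verdict cache can be sketched as a read-merge-write. The function name `save_efficiency` is hypothetical; the only behavior taken from the text is that efficiency data lives under its own key in the same file.

```python
import json
from pathlib import Path


def save_efficiency(results_path, efficiency):
    """Store `efficiency` under its own key, leaving all other keys intact."""
    path = Path(results_path)
    # Start from the existing cache so evaluated_at, mode, and skills survive.
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    data["efficiency"] = efficiency
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data
```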
skills/stocktake/results.json:
{
  "evaluated_at": "2026-03-07T10:00:00Z",
  "mode": "full",
  "skills": {
    "skill-name": {
      "path": "skills/skill-name/SKILL.md",
      "verdict": "Keep",
      "reason": "Concrete, actionable, unique value for X workflow",
      "mtime": "2026-01-15T08:30:00Z"
    }
  }
}
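
Given a cache in this shape, the quick scan's mtime comparison can be sketched as below. The function names are hypothetical, and treating a missing skill directory as changed is a design choice of this sketch rather than documented behavior.

```python
from datetime import datetime
from pathlib import Path


def parse_utc(stamp):
    """Parse an ISO-8601 timestamp such as 2026-03-07T10:00:00Z."""
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def changed_skills(results, repo_root="."):
    """Return the cached skills with any file modified after evaluated_at."""
    cutoff = parse_utc(results["evaluated_at"]).timestamp()
    changed = []
    for name, entry in sorted(results["skills"].items()):
        skill_dir = (Path(repo_root) / entry["path"]).parent
        files = [f for f in skill_dir.rglob("*") if f.is_file()] if skill_dir.is_dir() else []
        # A missing or empty directory counts as changed so a deleted skill
        # is not silently kept in the cache.
        if not files or max(f.stat().st_mtime for f in files) > cutoff:
            changed.append(name)
    return changed
```

Loading skills/stocktake/results.json with `json.load` and passing the result to `changed_skills` yields the skills a quick scan would need to revisit; skills added since the last run are not in the cache and need a separate directory comparison.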
npx claudepluginhub raddue/crucible