From armory
Evaluates Claude Code packages across 6 quality dimensions (such as frontmatter quality and structural completeness) for all 7 package types, producing scored audit reports. Use for quick single-package audits or full repository scans.
`npx claudepluginhub mathews-tom/armory --plugin armory`

This skill uses the workspace's default tool permissions.
Packages that do not activate on relevant queries waste the entire investment in writing them. A skill can have deep, well-structured content and still deliver zero value if its frontmatter description lacks the trigger phrases users actually type. An agent without a decision tree produces inconsistent results. A hook without a handler script is inert. Quality evaluation catches trigger gaps, missing sections, structural deficiencies, and shallow content before deployment — turning a package from a static document into a reliable tool.
| File | Contents |
|---|---|
| `references/evaluation-rubric.md` | Detailed 1-5 scoring criteria per dimension, weight justifications, type-specific criteria, worked examples for calibration |
Two modes, selected by input:
| Input | Mode |
|---|---|
| Path to a specific package directory or definition file | Quick Audit |
| "all", "every package", no path specified | Full Audit |
| "--type agents" or type filter | Full Audit filtered to one type |
| Multiple specific paths | Quick Audit for each, then comparative summary |
Six dimensions, each scored 1-5. Weighted sum determines overall percentage.
### D1: Frontmatter Quality (20%)
Evaluates the YAML frontmatter block for completeness and discoverability.
Signals:
- `name` field present and non-empty
- `description` field present and non-empty

Scoring constraints: A description under 100 characters caps this dimension at 2/5. A missing `name` or `description` field caps at 1/5.
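For calibration, frontmatter that clears both caps might look like the sketch below. The package name and description wording are hypothetical; what D1 checks is only the presence and length of the `name` and `description` fields:

```yaml
---
name: sql-migration-auditor
description: >
  Reviews SQL migration files for destructive operations, missing rollback
  steps, and lock-heavy DDL. Use when writing, reviewing, or debugging
  database migrations, schema changes, or ALTER TABLE statements.
---
```

The description is well over the 100-character floor and carries the concrete trigger phrases that D2 evaluates.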
### D2: Trigger Coverage (18%)
Evaluates whether the package activates on the queries users actually type.
Signals:
Scoring constraints: Fewer than 3 distinct trigger phrases caps at 2/5. Zero trigger phrases in the description caps at 1/5.
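The caps above can be sketched as a small post-processing helper. The function name and the idea of pre-counting distinct trigger phrases are illustrative, not part of the skill itself:

```python
def cap_trigger_coverage(base_score: int, distinct_triggers: int) -> int:
    """Apply the D2 scoring constraints to a raw 1-5 score.

    Zero trigger phrases caps the dimension at 1/5; fewer than three
    distinct phrases caps it at 2/5, mirroring the rules above.
    """
    if distinct_triggers == 0:
        return min(base_score, 1)
    if distinct_triggers < 3:
        return min(base_score, 2)
    return base_score
```

For example, a package with otherwise strong content but only two trigger phrases would have its D2 score reduced from 4 to 2.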
### D3: Structural Completeness (20%)
Evaluates whether the package contains the sections needed to function reliably.
Signals:
Scoring constraints: A package with no workflow section caps at 2/5. A package with a workflow but no error handling or output format caps at 3/5.
### D4: Content Depth (22%)
Evaluates the substantive quality of the package's guidance — whether it provides enough detail for an agent to execute well without human intervention.
Signals:
Scoring constraints: A package consisting only of bare commands with no explanatory context caps at 2/5. Reference files count toward this dimension only if they contain substantive guidance (checklists, rubrics, criteria), not just link collections.
### D5: Consistency & Integrity (12%)
Evaluates internal consistency and structural integrity.
Signals:
- Directory name matches the `name` field in frontmatter exactly
- No cross-package references (`../other-package/`) in the definition file or reference files. Packages must be standalone — all referenced files must live within the package's own directory. Shared content should use the `_templates/` sync system to maintain local copies.

Scoring constraints: A name mismatch between directory and frontmatter is a CRITICAL finding and caps at 1/5. Cross-package `../` references are a CRITICAL finding and cap at 1/5 — they break standalone packaging. Missing referenced files cap at 2/5.
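A minimal way to detect the cross-package references described above might look like the following. This is a sketch: a real evaluation would also resolve the matched paths against the package directory rather than relying on the pattern alone:

```python
import re

# Matches parent-directory path segments such as ../other-package/rules.md
CROSS_REF = re.compile(r"\.\./[\w.-]+/")

def find_cross_package_refs(body: str) -> list[str]:
    """Return every ../-style path segment found in a definition file body."""
    return CROSS_REF.findall(body)
```

Each match is a candidate CRITICAL D5 finding; an empty result means the package references only its own files.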
### D6: CONTRIBUTING Compliance (8%)
Evaluates adherence to the repository's contribution guidelines.
Signals:
Scoring constraints: Any single violation caps at 3/5. Multiple violations cap at 2/5. Invalid YAML that prevents parsing caps at 1/5.
In addition to the 6 shared dimensions, each package type has type-specific quality signals that influence D3 (Structural Completeness) and D4 (Content Depth) scoring.
When evaluating an AGENT.md:
D3 additions:
- `model` field specified (opus/sonnet/haiku)
- `color` field specified
- `metadata.category` specified
- `metadata.execution_phase` specified (pre-write/post-write/pre-commit)
- `metadata.language_targets` specified

D4 additions:
Scoring impact: An agent without a decision tree caps D4 at 2/5. An agent without model/category/phase metadata caps D3 at 3/5.
When evaluating a HOOK.md:
D3 additions:
- `hook.events` list specified (PreToolUse, PostToolUse, Stop, etc.)
- `hook.handler.type` specified (command/python-module)
- `hook.handler.command` specified
- `handler.sh` or equivalent handler script exists in the directory

D4 additions:
Scoring impact: A hook without a handler script caps D4 at 1/5. A hook without documented events caps D3 at 2/5.
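A hook frontmatter fragment satisfying the D3 signals above might look like this sketch. The field names come from the signal list; the specific event and handler values are illustrative:

```yaml
hook:
  events:
    - PreToolUse
    - PostToolUse
  handler:
    type: command
    command: ./handler.sh
```

Here `./handler.sh` must actually exist in the package directory, or D4 is capped at 1/5.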
When evaluating a RULE.md:
D3 additions:
- `metadata.scope` specified (global/project)
- `metadata.applies_to.languages` specified if scope is language-specific

D4 additions:
Scoring impact: A rule with only vague principles and no concrete requirements caps D4 at 2/5.
When evaluating a COMMAND.md:
D3 additions:
- `command.syntax` specified with argument documentation
- `command.handler` specified (inline/command)

D4 additions:
Scoring impact: A command without a step-by-step workflow caps D4 at 2/5.
When evaluating a UTILITY.md:
D3 additions:
- `utility.runtime` specified (python/node/shell)
- `utility.entry_point` specified and the file exists
- `utility.executable` specified

D4 additions:
Scoring impact: A utility without a working entry_point script caps D4 at 1/5.
When evaluating a PRESET.md:
D3 additions:
- `preset.packages` specified with at least one type section
- `preset.compatibility.platforms` specified

D4 additions:
Scoring impact: A preset referencing non-existent packages is a CRITICAL finding.
| Severity | Criteria | Score Impact |
|---|---|---|
| CRITICAL | Package cannot activate or breaks on load — missing frontmatter, name mismatch, invalid YAML, non-existent preset references | Caps overall score at 40% |
| HIGH | Significant trigger gap or missing core section — no workflow, no error handling, zero trigger phrases, missing handler script | Caps affected dimension at 3/5 |
| MEDIUM | Weak coverage, shallow content, few trigger synonyms, missing type-specific metadata | Dimension needs improvement but functions |
| LOW | Minor polish — formatting inconsistencies, slightly short description, missing calibration rules | Fix when convenient |
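The score-impact column can be read as a final pass over the computed scores. A sketch of the overall cap (representing findings as a list of severity strings is an assumed shape, not part of the skill):

```python
def apply_overall_cap(overall_pct: float, severities: list[str]) -> float:
    """Any CRITICAL finding caps the overall score at 40%, per the table above."""
    if "CRITICAL" in severities:
        return min(overall_pct, 40.0)
    return overall_pct
```

So a package that would otherwise score 85% but carries one CRITICAL finding reports 40%, landing it in the Deficient band.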
Determine the package type from its definition filename: SKILL.md = skill, AGENT.md = agent, HOOK.md = hook, RULE.md = rule, COMMAND.md = command, UTILITY.md = utility, PRESET.md = preset. If the path points to a definition file directly, use its parent directory.

For a Full Audit, scan `skills/`, `agents/`, `hooks/`, `rules/`, `commands/`, `utilities/`, and `presets/` for directories that contain a recognized definition file. If a `--type` filter is specified, restrict to that type's directory.

For each package under evaluation:

1. Parse the YAML frontmatter and extract the `name` and `description` fields. If YAML parsing fails, record a CRITICAL finding and score D1 and D6 as 1/5.
2. Check the `references/` directory for existence and contents. Verify every file referenced in the definition file body exists on disk.
3. Scan for cross-package references (`../` patterns pointing outside the package directory). Flag any as CRITICAL D5 findings.
4. Score each of the six dimensions against the criteria in `references/evaluation-rubric.md`.
5. Calibrate scores against the worked examples in `references/evaluation-rubric.md`.
6. Compute the overall score: Overall% = (sum of dimension_score x weight) / 5 x 100.

| Range | Verdict |
|---|---|
| 90-100% | Exemplary |
| 80-89% | Strong |
| 70-79% | Adequate |
| 60-69% | Needs Work |
| Below 60% | Deficient |
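The formula and verdict bands above can be combined into one scoring routine. The weights are taken from the report template's scorecard (D1 20%, D2 18%, D3 20%, D4 22%, D5 12%, D6 8%); the function and variable names are illustrative:

```python
WEIGHTS = {"D1": 0.20, "D2": 0.18, "D3": 0.20, "D4": 0.22, "D5": 0.12, "D6": 0.08}

VERDICTS = [(90, "Exemplary"), (80, "Strong"), (70, "Adequate"),
            (60, "Needs Work"), (0, "Deficient")]

def overall_score(scores: dict[str, int]) -> tuple[float, str]:
    """Overall% = (sum of dimension_score x weight) / 5 x 100, mapped to a verdict."""
    weighted_sum = sum(scores[dim] * weight for dim, weight in WEIGHTS.items())
    pct = weighted_sum / 5 * 100
    verdict = next(label for floor, label in VERDICTS if pct >= floor)
    return pct, verdict
```

A package scoring 5/5 on every dimension reaches 100% (Exemplary); 1/5 across the board yields 20% (Deficient).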
Generate the structured output using the appropriate template below.
## Package Audit: {package-name} ({type})
| Dimension | Score | Weight | Weighted | Key Finding |
|-----------|-------|--------|----------|-------------|
| D1: Frontmatter Quality | X/5 | 20% | X.XXX | ... |
| D2: Trigger Coverage | X/5 | 18% | X.XXX | ... |
| D3: Structural Completeness | X/5 | 20% | X.XXX | ... |
| D4: Content Depth | X/5 | 22% | X.XXX | ... |
| D5: Consistency & Integrity | X/5 | 12% | X.XXX | ... |
| D6: CONTRIBUTING Compliance | X/5 | 8% | X.XXX | ... |
**Overall: XX% — {Verdict}**
### Type-Specific Findings
[Findings from type-specific signal evaluation, if any.]
### Findings
[Severity-sorted list. Each entry includes dimension tag, severity, description,
evidence, and recommendation.]
- **[CRITICAL] D5:** ...
- **[HIGH] D2:** ...
- **[MEDIUM] D4:** ...
- **[LOW] D3:** ...
### Score Calculation
D1: {score} x 0.20 = {result}
D2: {score} x 0.18 = {result}
D3: {score} x 0.20 = {result}
D4: {score} x 0.22 = {result}
D5: {score} x 0.12 = {result}
D6: {score} x 0.08 = {result}
Sum = {weighted_sum}
Overall = {weighted_sum} / 5 x 100 = {percentage}% — {Verdict}
## Package Repository Audit
| Package | Type | Overall | Verdict | Worst Dimension | Top Issue |
|---------|------|---------|---------|-----------------|-----------|
| {name} | {type} | XX% | {verdict} | {dimension} | {issue} |
| ... | ... | ... | ... | ... | ... |
### Per-Package Summaries
[Condensed Quick Audit for each package: scorecard table, overall score, top 3 findings.
Omit the full Score Calculation section in condensed mode.]
| Problem | Cause | Fix |
|---|---|---|
| Definition file not found in directory | Path incorrect or file missing | Report as CRITICAL; do not attempt evaluation; surface the path and stop |
| Unknown definition file | Directory has no recognized definition file (SKILL.md, AGENT.md, HOOK.md, RULE.md, COMMAND.md, UTILITY.md, PRESET.md) | Skip directory; report as warning in Full Audit; report as CRITICAL in Quick Audit |
| YAML frontmatter parse failure | Invalid YAML syntax (unclosed quotes, bad indentation) | Report as CRITICAL finding; score D1 and D6 as 1/5; continue evaluating the body content where parseable |
| `references/` directory missing | Package has no reference files | Not an error — score D5 normally; check only that any files referenced in the definition file body actually exist on disk |
| `references/` exists but referenced file is absent | File path in definition file body doesn't resolve | Record as a CRITICAL D5 finding; missing referenced files cap D5 at 2/5 |
| Empty definition file (zero bytes or whitespace only) | File created but never populated | Treat as CRITICAL; score all dimensions 1/5; overall verdict: Deficient |
| `references/evaluation-rubric.md` not found | Evaluator's own reference file missing | Note the irony; evaluate using the criteria inline in this SKILL.md; flag D5 as a CRITICAL finding |
| Handler script missing for hook | HOOK.md references a handler that doesn't exist | Record as CRITICAL D4 finding; cap D4 at 1/5 |
| Preset references non-existent package | PRESET.md lists a package not in the manifest | Record as CRITICAL finding; caps overall at 40% |