Analyze skill execution metrics and identify unstable or underperforming skills using continual learning analysis.
Analyzes skill execution metrics and identifies unstable or underperforming skills using continual learning analysis.
/plugin marketplace add athola/claude-night-market/plugin install parseltongue@claude-night-marketAnalyze skill execution metrics and identify unstable or underperforming skills using continual learning analysis.
/skill-review # Show all skill metrics
/skill-review --all-plugins # Aggregate by plugin
/skill-review --skill <plugin:skill> # Specific skill deep-dive
/skill-review --unstable-only # Only skills with stability_gap > 0.3
/skill-review --top <n> # Top N by execution count
/skill-review --recommendations # Get improvement recommendations
For each skill, calculates:
| Metric | Description |
|---|---|
| Execution Count | Total invocations |
| Success Rate | Percentage of successful executions |
| Average Duration | Mean execution time |
| Worst-Case Accuracy | Lowest accuracy in execution history |
| Stability Gap | average_accuracy - worst_case_accuracy |
The stability gap detects inconsistent skills - those that work sometimes but fail unpredictably:
stability_gap = average_accuracy - worst_case_accuracy
Interpretation:
Why it matters: Traditional metrics only measure average performance. Stability gap catches skills that are "usually fine but sometimes catastrophically wrong."
/skill-review
Output:
Skill Performance Review
========================
abstract:skill-auditor:
Executions: 47
Success rate: 94%
Average duration: 134ms
Worst-case accuracy: 0.85
Stability gap: 0.09 ✓ (stable)
sanctum:pr-review:
Executions: 23
Success rate: 91%
Average duration: 2.1s
Worst-case accuracy: 0.82
Stability gap: 0.09 ✓ (stable)
imbue:proof-of-work:
Executions: 31
Success rate: 68%
Average duration: 1.8s
Worst-case accuracy: 0.40
Stability gap: 0.38 ⚠️ (unstable)
/skill-review --unstable-only
Output:
Skills with Stability Gap > 0.3
================================
imbue:proof-of-work
Stability gap: 0.38
Issue: High failure rate (32%)
Suggestion: Review recent failures with /memory-palace:skill-logs --skill imbue:proof-of-work --failures-only
your-custom:my-skill
Stability gap: 0.45
Issue: Inconsistent performance
Suggestion: Check error patterns in execution logs
/skill-review --skill abstract:skill-auditor
Output:
Skill Analysis: abstract:skill-auditor
======================================
Summary:
Total executions: 47
First seen: 2025-01-01
Last used: 2025-01-08
Performance:
Success rate: 94% (44/47)
Average duration: 134ms
Median duration: 128ms
95th percentile: 245ms
Stability:
Worst-case accuracy: 0.85
Stability gap: 0.09 ✓
Trend: Improving (last 10: 100% success)
Recent Failures (3):
2025-01-05: "Skill file not found"
2025-01-03: "Invalid frontmatter format"
2025-01-02: "Timeout exceeded"
Recommendation: HEALTHY - No action needed
/skill-review --all-plugins
Output:
Metrics by Plugin
=================
abstract:
Skills tracked: 12
Total executions: 1,234
Avg success rate: 92%
Skills with issues: 2
sanctum:
Skills tracked: 8
Total executions: 856
Avg success rate: 89%
Skills with issues: 1
memory-palace:
Skills tracked: 6
Total executions: 423
Avg success rate: 95%
Skills with issues: 0
/skill-review --recommendations
Output:
Skill Improvement Recommendations
=================================
CRITICAL (stability_gap > 0.5):
None
NEEDS ATTENTION (stability_gap > 0.3):
imbue:proof-of-work
- Review last 5 failures for common patterns
- Consider adding input validation
- Check for environmental dependencies
MONITORING (stability_gap 0.2 - 0.3):
conjure:template-engine
- Performance variance detected
- Consider caching or optimization
HEALTHY (stability_gap < 0.2):
45 skills performing well
Reads from metrics stored by memory-palace:
~/.claude/skills/logs/*/*/*.jsonl~/.claude/skills/logs/.history.json| Command | Plugin | Description |
|---|---|---|
/skill-logs | memory-palace | View raw execution history |
/skill-review | pensive | Analyze metrics and stability |
/skill-history | pensive | Recent executions with context |
Default: Markdown with color-coded indicators
JSON: Add --format json for programmatic access
Uses the continual evaluation approach from Avalanche (ICLR 2023) to detect:
This enables proactive skill improvement before users notice problems.