Diagnoses LLM/prompt behavior issues: tool selection failures, prompt regressions, context window problems, parsing failures. Dispatched by debug-conductor in forked debugging.
`npx claudepluginhub bordenet/superpowers-plus --plugin superpowers-plus`

This skill uses the workspace's default tool permissions.
> **Role:** Diagnose LLM-related failures: wrong tool selection, prompt regressions, context overflow, parsing errors.
Mandates invoking relevant skills via tools before any response in coding sessions. Covers access, priorities, and adaptations for Claude Code, Copilot CLI, Gemini CLI.
**Dispatched by:** `debug-conductor`; never invoked directly by the user.
**Evidence type:** `LLMEvidence` (see `skills/_shared/evidence-schema.md`)
The conductor dispatches this skill when the incident involves AI/LLM behavior: tool misselection, prompt regressions, context window pressure, or output parsing failures.
| Mode | Symptoms | Investigation Path |
|---|---|---|
| Tool selection failure | Wrong tool invoked; correct tool available | Step 2A: Tool call audit |
| Prompt regression | Behavior changed after prompt/template update | Step 2B: Prompt diff analysis |
| Context overflow | Degraded quality on long conversations | Step 2C: Context window analysis |
| Parsing failure | LLM output can't be parsed by downstream code | Step 2D: Output format audit |
| Hallucination in tool args | Tool called with fabricated parameters | Step 2A + Step 2C |
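Step 2A can be scripted as a simple audit over the transcript's tool calls. The sketch below is illustrative only: `ObservedToolCall` and `auditToolCalls` are hypothetical names, assuming the transcript can be reduced to a list of calls annotated with the tool a reviewer expected.

```typescript
// Hypothetical shape for one observed tool call; field names follow the
// evidence example later in this document, but the helper itself is illustrative.
interface ObservedToolCall {
  tool: string;                     // tool the model actually invoked
  params: Record<string, unknown>;  // arguments as emitted by the model
  success: boolean;                 // whether the call itself succeeded
  expected?: string;                // tool a reviewer says should have been used
}

// Flag calls where the invoked tool differs from the expected one.
function auditToolCalls(calls: ObservedToolCall[]): ObservedToolCall[] {
  return calls.filter((c) => c.expected !== undefined && c.expected !== c.tool);
}

// Example: the send_email-instead-of-make_call misselection from the evidence below.
const misselections = auditToolCalls([
  { tool: "send_email", params: { to: "customer" }, success: true, expected: "make_call" },
]);
console.log(`${misselections.length} misselection(s) found`);
```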
- Step 2B (prompt diff analysis): record each diff as `{ section, before, after, impact }`.
- Step 2C (context window analysis): compare `failingAvg` vs. `succeedingAvg` token counts for a significant difference, and flag `usedTokens / maxTokens > 0.8` as the "high utilization zone".

Return `LLMEvidence` to the conductor:
```json
{
  "toolCalls": [
    { "tool": "send_email", "params": {"to": "customer"}, "success": true, "expected": "make_call" }
  ],
  "promptDiffs": [
    { "section": "make_call description", "before": "Initiate outbound phone call", "after": "Reach out via voice channel", "impact": "Ambiguity increase" }
  ],
  "contextUsage": { "promptTokens": 108000, "maxTokens": 128000, "utilization": 0.84 },
  "parsingFailures": []
}
```
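For illustration, a minimal TypeScript rendering of that payload might look like the sketch below; field names follow the example, but the authoritative shape is the one in `skills/_shared/evidence-schema.md`.

```typescript
// Minimal TypeScript rendering of the LLMEvidence payload above. The
// authoritative shape lives in skills/_shared/evidence-schema.md; this
// interface only mirrors the example for illustration.
interface PromptDiff {
  section: string;
  before: string;
  after: string;
  impact: string;
}

interface ContextUsage {
  promptTokens: number;
  maxTokens: number;
  utilization: number; // promptTokens / maxTokens
}

interface LLMEvidence {
  toolCalls: Array<{
    tool: string;
    params: Record<string, unknown>;
    success: boolean;
    expected?: string;
  }>;
  promptDiffs: PromptDiff[];
  contextUsage: ContextUsage;
  parsingFailures: unknown[];
}

// Step 2C's "high utilization zone" check: usedTokens / maxTokens > 0.8.
const inHighUtilizationZone = (e: LLMEvidence): boolean =>
  e.contextUsage.utilization > 0.8;
```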
Plus the standard evidence wrapper defined in `skills/_shared/evidence-schema.md`.
| Pattern | Evidence Shape |
|---|---|
| Ambiguous tool description | Misselections cluster around specific tool; description is vague |
| Context window pressure | Misselections correlate with high utilization (>80%) |
| Prompt regression | Behavior change correlates with prompt template deployment |
| Tool argument hallucination | Tool called with plausible but fabricated parameters |
| Format drift | Output structure degrades under high context load |
| Compound failure | 2+ factors required together (e.g., ambiguous description + high context) |
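As a rough illustration of the "context window pressure" pattern, one might bucket turns by utilization and compare misselection rates; the `TurnSample` shape and helper names below are assumptions, not part of the skill.

```typescript
// Illustrative check for the "context window pressure" pattern: do
// misselections cluster in turns with > 80% context utilization? The
// TurnSample shape is an assumption, not the skill's real data model.
interface TurnSample {
  utilization: number;  // usedTokens / maxTokens for the turn
  misselected: boolean; // whether the turn invoked the wrong tool
}

function misselectionRate(samples: TurnSample[]): number {
  return samples.length === 0
    ? 0
    : samples.filter((s) => s.misselected).length / samples.length;
}

function contextPressureSignal(samples: TurnSample[]): { high: number; low: number } {
  return {
    high: misselectionRate(samples.filter((s) => s.utilization > 0.8)),
    low: misselectionRate(samples.filter((s) => s.utilization <= 0.8)),
  };
}

// A markedly higher rate in the high bucket supports the pattern.
const signal = contextPressureSignal([
  { utilization: 0.84, misselected: true },
  { utilization: 0.91, misselected: true },
  { utilization: 0.42, misselected: false },
  { utilization: 0.55, misselected: false },
]);
console.log(`misselection rate: ${signal.high} above 80% vs ${signal.low} below`);
```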
| Mode | Symptom | Recovery |
|---|---|---|
| Prompt red herring | Blaming prompt when model changed | Check model version and deployment first |
| Context window overflow | Subtle truncation not detected | Measure actual token count vs limit |
| Non-determinism | Cannot reproduce intermittent failure | Run multiple trials, report distribution |
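For the non-determinism row, a minimal sketch of the multi-trial approach might look like this, assuming a caller-supplied `runScenario` probe (a hypothetical name, not part of the skill):

```typescript
// Sketch of the non-determinism recovery: rerun the failing scenario several
// times and report the outcome distribution rather than a single pass/fail.
// runScenario is a hypothetical async probe supplied by the investigator.
async function trialDistribution(
  runScenario: () => Promise<boolean>,
  trials = 10,
): Promise<{ passes: number; failures: number; passRate: number }> {
  let passes = 0;
  for (let i = 0; i < trials; i++) {
    if (await runScenario()) passes += 1;
  }
  return { passes, failures: trials - passes, passRate: passes / trials };
}

// Report "7/10 trials passed" instead of claiming the failure is fixed or unreproducible.
```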