From rkstack
Use when encountering any bug, test failure, or unexpected behavior. Use before proposing fixes. Use when asked to debug, investigate, find root cause, or figure out why something is broken.
```bash
npx claudepluginhub mrkhachaturov/ccode-personal-plugins --plugin rkstack
```

This skill is limited to using the following tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
```bash
# === RKstack Preamble (systematic-debugging) ===
# Read detection cache (written by session-start via rkstack detect)
if [ -f .rkstack/settings.json ]; then
  cat .rkstack/settings.json
else
  echo "WARNING: .rkstack/settings.json not found — detection cache missing"
fi

# Session-volatile checks (can change mid-session)
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
_HAS_CLAUDE_MD=$([ -f CLAUDE.md ] && echo "yes" || echo "no")
echo "BRANCH: $_BRANCH"
echo "CLAUDE_MD: $_HAS_CLAUDE_MD"
```
Use the detection cache and preamble output to adapt your behavior:
- `detection.flowType` (web or default). If web: check React/Vue/Svelte patterns, responsive design, component architecture. If default: CLI tools, MCP servers, backend scripts.
- Use `just` commands instead of raw shell.
- `detection.stack` for what's in the project and `detection.stats` for scale (files, code, complexity).
- `detection.repoMode` for solo vs collaborative.
- `detection.services` for Supabase and other service integrations.

ALWAYS follow this structure for every AskUserQuestion call:
- The current branch (the `_BRANCH` value from preamble — NOT any branch from conversation history or gitStatus) and the current plan/task. (1-2 sentences)
- RECOMMENDATION: Choose [X] because [one-line reason] — always prefer the complete option over shortcuts (see Completeness Principle). Include Completeness: X/10 for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work.
- A) ... B) ... C) ... — when an option involves effort, show both scales: (human: ~X / CC: ~Y)

Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
AI makes completeness near-free. Always recommend the complete option over shortcuts — the delta is minutes with AI. A "lake" (100% coverage, all edge cases) is boilable; an "ocean" (full rewrite, multi-quarter migration) is not. Boil lakes, flag oceans.
Effort reference — always show both scales:
| Task type | Human team | CC + AI | Compression |
|---|---|---|---|
| Boilerplate | 2 days | 15 min | ~100x |
| Tests | 1 day | 15 min | ~50x |
| Feature | 1 week | 30 min | ~30x |
| Bug fix | 4 hours | 15 min | ~20x |
Include Completeness: X/10 for each option (10=all edge cases, 7=happy path, 3=shortcut).
When completing a skill workflow, report status using one of: DONE, DONE_WITH_CONCERNS, or BLOCKED.
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
Escalation format:
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.
Fixing symptoms creates whack-a-mole debugging. Every fix that doesn't address root cause makes the next bug harder to find. Random fixes waste time and create new bugs. Quick patches mask underlying issues.
If you haven't completed Phase 1, you cannot propose fixes. This is not optional. This is not flexible. Violating the letter of the rules is violating the spirit of the rules.
Use for ANY technical issue:
Use this ESPECIALLY when:
Don't skip when:
Gather context before forming any hypothesis. No guessing. No fixing. Just evidence.
Read the error messages, stack traces, and reproduction steps completely. Don't skip past warnings. They often contain the exact answer.
Note:
If the user hasn't provided enough context, ask ONE question at a time via AskUserQuestion:
I need more context about this bug before I can investigate.
RECOMMENDATION: Choose A — the reproduction steps will help me find the root cause faster.
A) Describe the exact steps to reproduce — what did you do, what happened, what did you expect?
B) Paste the full error output / stack trace
C) Point me to the relevant file or test
Trace the code path from the symptom back to potential causes. Use Grep to find all references. Use Read to understand the logic. Do NOT skim — read every line in the critical path.
For multi-component systems (API -> service -> database, CI -> build -> signing), trace data across EVERY boundary:
For EACH component boundary:
- What data enters the component?
- What data exits the component?
- Is environment/config propagating correctly?
- What is the state at each layer?
```bash
# What changed recently in the affected files?
git log --oneline -20 -- <affected-files>

# Full diff of recent changes
git diff HEAD~5 -- <affected-files>
```
Was this working before? What changed? A regression means the root cause is in the diff. Check:
Can you trigger the bug deterministically? If not, gather MORE evidence before proceeding. Do not guess.
```bash
# Run the specific failing test
<test-command> <specific-test-file>

# Or reproduce the specific scenario
<reproduction-steps>
```
If the bug is intermittent, add instrumentation to capture state on the NEXT occurrence rather than guessing at the cause.
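A minimal sketch of such capture-on-occurrence instrumentation in Python (the state fields, file path, and trigger condition are hypothetical illustrations, not part of this skill):

```python
import json
import time
import traceback

def capture_on_failure(state, path="bug-capture.json"):
    """Dump the suspect state plus a stack trace the moment the bug fires."""
    snapshot = {
        "timestamp": time.time(),
        "state": state,
        "stack": traceback.format_stack(),
    }
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2, default=str)
    return snapshot

# At the suspected failure point (hypothetical condition):
# if queue_depth > expected_max:
#     capture_on_failure({"queue_depth": queue_depth, "last_job": last_job_id})
```

On the next occurrence you get real evidence (the state and the call path) instead of a guess.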
For multi-component systems only. Add diagnostic instrumentation at each layer boundary:
```bash
# Example: tracing through a multi-layer system

# Layer 1: Entry point
echo "=== Input at entry: ==="
echo "PARAM_A: ${PARAM_A:-UNSET}"

# Layer 2: Processing
echo "=== State after processing: ==="
echo "RESULT: ${RESULT:-UNSET}"

# Layer 3: Output
echo "=== Final output: ==="
echo "OUTPUT: ${OUTPUT:-UNSET}"
```
This reveals WHICH layer fails (entry -> processing OK, processing -> output FAILS).
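In application code, the same layer tracing can be sketched as a decorator (a hypothetical Python example; `service_layer` and its return value are illustrative, not from this skill):

```python
import functools

def trace_boundary(func):
    """Print what enters and exits a component boundary."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"=== enter {func.__name__}: args={args!r} kwargs={kwargs!r}")
        result = func(*args, **kwargs)
        print(f"=== exit  {func.__name__}: result={result!r}")
        return result
    return wrapper

@trace_boundary
def service_layer(user_id):
    # Hypothetical middle layer between API handler and database
    return {"id": user_id, "name": "example"}
```

Applying the decorator at each boundary answers the checklist above (what enters, what exits) without changing any logic.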
See root-cause-tracing.md in this directory for the complete backward tracing technique — trace from symptom up through the call chain to the original trigger.
"Root cause hypothesis: ..." — a specific, testable claim about what is wrong and why. Not "something is wrong with auth" but "the JWT token is not refreshed when the session cookie expires because refreshToken() only checks localStorage, not the HTTP-only cookie."
After forming your root cause hypothesis, lock edits to the affected module to prevent scope creep. Debug sessions that touch unrelated code create more bugs than they fix.
```bash
[ -x "${CLAUDE_PLUGIN_ROOT}/skills/freeze/bin/check-freeze.sh" ] && echo "FREEZE_AVAILABLE" || echo "FREEZE_UNAVAILABLE"
```
If FREEZE_AVAILABLE: Identify the narrowest directory containing the affected files. Write it to the freeze state file:
```bash
STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.rkstack}"
mkdir -p "$STATE_DIR"
echo "<detected-directory>/" > "$STATE_DIR/freeze-dir.txt"
echo "Debug scope locked to: <detected-directory>/"
```
Substitute <detected-directory> with the actual directory path (e.g., src/auth/). Tell the user: "Edits restricted to <dir>/ for this debug session. This prevents changes to unrelated code. Run /unfreeze to remove the restriction."
If the bug spans the entire repo or the scope is genuinely unclear, skip the lock and note why.
If FREEZE_UNAVAILABLE: Skip scope lock. Edits are unrestricted.
Check if this bug matches a known pattern before forming your hypothesis:
| Pattern | Signature | Where to look |
|---|---|---|
| Race condition | Intermittent, timing-dependent | Concurrent access to shared state |
| Nil/null propagation | NoMethodError, TypeError, undefined | Missing guards on optional values |
| State corruption | Inconsistent data, partial updates | Transactions, callbacks, hooks |
| Integration failure | Timeout, unexpected response | External API calls, service boundaries |
| Configuration drift | Works locally, fails in staging/prod | Env vars, feature flags, DB state |
| Stale cache | Shows old data, fixes on cache clear | Redis, CDN, browser cache, Turbo |
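For instance, the nil/null propagation row might look like this in Python (a hypothetical sketch; `display_name` and its data shape are illustrative, not from this skill):

```python
def display_name(user):
    # Symptom: TypeError ("'NoneType' object is not subscriptable")
    # surfaces on the last line, but the root cause is the unguarded
    # optional value entering here.
    profile = user.get("profile")
    if profile is None:
        # Guard at the point where the optional value enters,
        # not downstream where the symptom appears.
        return "<anonymous>"
    return profile["name"]
```

The signature ("TypeError, undefined") points you at the boundary where the optional value entered, which is where the guard belongs.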
- TODOS.md (if it exists) for related known issues
- git log for prior fixes in the same area — recurring bugs in the same files are an architectural smell, not a coincidence

If the bug doesn't match a known pattern above, use WebSearch for:
If WebSearch is unavailable, skip this search and proceed with hypothesis testing. If a documented solution or known dependency bug surfaces, present it as a candidate hypothesis in Phase 3.
Before writing ANY fix, verify your hypothesis. Scientific method — one variable at a time.
State it clearly and specifically:
"maxConnections defaults to 5 but the request handler opens a new connection per middleware layer", not "something wrong with connections".

Add a temporary log statement, assertion, or debug output at the suspected root cause. Run the reproduction. Does the evidence match your hypothesis?
```bash
# Add temporary instrumentation — NOT a fix
# Example: log the value at the suspected point of failure
```
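As a Python sketch, continuing the connection-pool hypothesis above (the pool structure and names are assumptions for illustration):

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("hypothesis")

def acquire_connection(pool):
    # TEMPORARY instrumentation, not a fix; remove it once the
    # hypothesis is confirmed or rejected.
    log.debug("active=%d max=%d", len(pool["active"]), pool["max_connections"])
    assert len(pool["active"]) < pool["max_connections"], (
        "hypothesis confirmed: pool already exhausted at acquire time"
    )
    conn = object()
    pool["active"].append(conn)
    return conn
```

If the assertion fires during reproduction, the evidence matches the hypothesis; if it never fires, the hypothesis is wrong and you return to evidence gathering.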
If the hypothesis is confirmed: Proceed to Phase 4 (Implementation).
If the hypothesis is wrong: Before forming the next hypothesis, consider searching for the error.
Sanitize first — strip hostnames, IPs, file paths, SQL fragments, customer identifiers, and any internal/proprietary data from the error message. Search only the generic error type and framework context: "{component} {sanitized error type} {framework version}".
If the error message is too specific to sanitize safely, skip the search. If WebSearch is unavailable, skip and proceed.
Then return to Phase 1. Gather more evidence. Do not guess.
If 3 hypotheses fail, STOP. This is non-negotiable. Use AskUserQuestion:
3 hypotheses tested, none confirmed. This may be an architectural issue
rather than a simple bug.
Hypotheses tested:
1. [hypothesis] — [evidence that disproved it]
2. [hypothesis] — [evidence that disproved it]
3. [hypothesis] — [evidence that disproved it]
RECOMMENDATION: Choose B — 3 failed hypotheses suggests this needs someone
who knows the system architecture. Completeness: 9/10.
A) Continue investigating — I have a new hypothesis: [describe] — Completeness: 7/10
B) Escalate for human review — this needs someone who knows the system — Completeness: 9/10
C) Add logging and wait — instrument the area and catch it next time — Completeness: 5/10
If you catch yourself thinking any of these, you are guessing, not debugging:
Once root cause is confirmed with evidence — not guessed, not assumed, CONFIRMED:
The smallest change that eliminates the actual problem. Not a workaround. Not a guard that catches the symptom downstream. The root cause.
See root-cause-tracing.md in this directory — trace backward through the call chain until you find the original trigger. Fix at the source. Never fix just where the error appears.
Fewest files touched, fewest lines changed. Resist the urge to refactor adjacent code. No "while I'm here" improvements. No bundled refactoring. This is a bug fix, not a cleanup session.
Before applying the fix, write a test that:
```bash
# Run the new test — it should FAIL before the fix
<test-command> <new-test-file>
```
This is TDD applied to debugging. The failing test IS your proof that you understand the bug.
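A minimal pytest-style sketch, using a hypothetical `refresh_token` bug as a stand-in for your actual root cause:

```python
# test_regression_token_refresh.py (hypothetical example)

def refresh_token(session):
    # Stand-in for the real function under test. The buggy version
    # checked only "local_token"; the fixed version also falls back
    # to "cookie_token".
    return session.get("local_token") or session.get("cookie_token")

def test_refresh_falls_back_to_cookie_token():
    # Encodes the bug: this test fails before the fix, passes after.
    session = {"local_token": None, "cookie_token": "abc123"}
    assert refresh_token(session) == "abc123"
```

The test names the exact behavior that was broken, so a future regression in this path fails loudly instead of silently recurring.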
After fixing the root cause, consider adding validation at multiple layers. See defense-in-depth.md in this directory for the four-layer pattern:
Don't stop at one validation point. Make the bug structurally impossible, not just fixed.
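A sketch of validating at more than one layer (the function, the limits, and the environment names are assumptions for illustration; see defense-in-depth.md for the full pattern):

```python
def set_max_connections(value, env="production"):
    # Layer 1: entry validation, reject bad input at the boundary.
    if not isinstance(value, int) or value <= 0:
        raise ValueError(f"max_connections must be a positive int, got {value!r}")
    # Layer 2: business-logic invariant the original bug violated.
    if env == "production" and value > 100:
        raise ValueError("production pools are capped at 100 connections")
    # Layer 3: environment check, fail loudly instead of silently defaulting.
    if env not in {"production", "staging", "development"}:
        raise ValueError(f"unknown environment: {env}")
    return {"max_connections": value, "env": env}
```

Even if one layer is bypassed or refactored away later, the others still reject the invalid state, which is what "structurally impossible" means in practice.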
```bash
# Run ALL tests, not just the one you wrote
<full-test-command>
```
Paste the output. No regressions allowed. If other tests break, your fix is wrong or incomplete — return to Phase 1 with the new evidence.
If the fix touches >5 files, use AskUserQuestion before proceeding:
This fix touches N files. That's a large blast radius for a bug fix.
Files affected:
- file1.ts (root cause fix)
- file2.ts (defense-in-depth validation)
- ...
RECOMMENDATION: Choose A if the root cause genuinely spans these files.
Completeness: 9/10.
A) Proceed — the root cause genuinely spans these files — Completeness: 9/10
B) Split — fix the critical path now, defer defense-in-depth — Completeness: 7/10
C) Rethink — maybe there's a more targeted approach — Completeness: 5/10
Reproduce the original bug scenario and confirm it's fixed. This is not optional. Do not skip this. Do not say "this should fix it." Verify and prove it.
```bash
# Reproduce the original failure scenario — it should now succeed
<reproduction-command>

# Run the full test suite
<full-test-command>
```
Paste the output.
Output a structured report:
```
DEBUG REPORT
════════════════════════════════════════
Symptom: [what the user observed]
Root cause: [what was actually wrong — specific, technical]
Fix: [what was changed, with file:line references]
Evidence: [test output, reproduction attempt showing fix works]
Regression test: [file:line of the new test]
Defense layers: [any defense-in-depth validations added]
Related: [TODOS.md items, prior bugs in same area, architectural notes]
Status: DONE | DONE_WITH_CONCERNS | BLOCKED
════════════════════════════════════════
```
These companion files are part of systematic debugging and available in this directory:
- root-cause-tracing.md — Trace bugs backward through the call chain to find the original trigger. Covers the 5-step backward tracing process, adding stack traces for instrumentation, and finding which test causes pollution.
- defense-in-depth.md — Add validation at multiple layers (entry, business logic, environment, debug) after finding root cause. Make the bug structurally impossible, not just fixed.

Related skills: