# diagnose (from sdd-pipeline)
Turns bug symptoms into fixes via regression tests: interviews for details if vague, then builds feedback loops, reproduces, hypothesizes, probes, specs for TDD. For failures, slowdowns, regressions.
Install: `npx claudepluginhub eduwxyz/my-awesome-skills --plugin sdd-pipeline`

This skill uses the workspace's default tool permissions.
Symptom in, fix-with-regression-test out. An interview if the user opens vague, then five phases, two cardinal rules.
Not for:
- One-line obvious fixes (a typo, a missing import, an off-by-one in a test you just wrote).
- "Bugs" that are actually missing functionality; that's a feature, use `interview-to-spec`.
- Flaky tests caused by infrastructure (CI, network) rather than the code under test.
Two cardinal rules hold throughout: never debug without a reproduction loop, and never fix without a confirmed root cause. If either slips, back up.
## The interview

Before building a loop, you need to know what you're trying to reproduce. If the user opened with detail covering the points below, skip this phase. If they opened vague ("there's a bug", "X is broken", or just /diagnose), interview them.
Ask one question at a time. Stop the moment you have enough to attempt Phase 1 — do not gather more than you need.
The minimum set you're trying to fill: what was observed, what was expected, the trigger, and the environment it happens in.
If the answer to a question is in the codebase rather than in the user's head, read the code instead of asking.
When the answers converge, propose a short slug for the bug (e.g. `token-expiry-not-rejected`). It becomes the filename of the spec written in Phase 5. Then move to Phase 1.
If after the interview the user still cannot give you enough to reproduce, that is itself a Phase 1 outcome: go to Phase 1 and follow "When you genuinely cannot build a loop".
## Phase 1: build the feedback loop

This is the skill. Everything downstream is mechanical once you have a fast, deterministic, agent-runnable pass/fail signal for the bug. Without one, no amount of staring at code finds the cause.
Spend disproportionate effort here. Be aggressive. Refuse to give up.
A failing test works, a script works, `git bisect run` works: anything that gives a pass/fail exit. Treat the loop as a product. Once you have a loop, ask how fast and how deterministic it is.
A 30-second flaky loop is barely a loop. A 2-second deterministic loop is a debugging superpower.
For flaky bugs, the goal isn't a clean repro; it's a higher reproduction rate. Loop the trigger 100×, parallelise, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; a 1% one is not. Keep raising the rate until it is.
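A minimal sketch of a rate-raising loop, assuming the trigger is a shell command that exits non-zero when the bug appears (the command shown is hypothetical; substitute whatever your loop runs):

```typescript
// Run a flaky trigger N times and report how often the bug reproduces.
import { execSync } from "node:child_process";

const TRIGGER = "npm test -- --grep token-expiry"; // hypothetical trigger
const RUNS = 100;

let reproduced = 0;
for (let i = 0; i < RUNS; i++) {
  try {
    execSync(TRIGGER, { stdio: "ignore" }); // exit 0: bug did not appear this run
  } catch {
    reproduced++; // non-zero exit: bug reproduced
  }
}
console.log(`reproduced ${reproduced}/${RUNS} runs (${(100 * reproduced) / RUNS}%)`);
```

If the rate climbs as you narrow the timing window, you are converging on the trigger.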
### When you genuinely cannot build a loop

Stop and say so explicitly. List what you tried. Ask the user for one of: a sharper trigger, access to an environment where the bug occurs, or an explicit decision to proceed without a loop.
Do not proceed to Phase 3 without a loop.
## Phase 2: reproduce

Run the loop. Watch the bug appear. Confirm the failure matches the reported symptom: same error message, same wrong output, same slow timing.
Do not proceed until you reproduce the bug.
## Phase 3: hypothesise

Generate 3–5 ranked hypotheses before testing any. Generating a single hypothesis anchors you on the first plausible idea.
Each hypothesis must be falsifiable — state the prediction it makes:
> If `<X>` is the cause, then changing `<Y>` will make the bug disappear / changing `<Z>` will make it worse.
If you cannot state the prediction, the hypothesis is a vibe — discard or sharpen it.
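As an illustration only, a ranked list for the hypothetical `token-expiry-not-rejected` bug might read:

1. The expiry check compares a seconds timestamp against milliseconds. Prediction: normalising both to the same unit makes the bug disappear.
2. A token cache serves entries past their expiry. Prediction: disabling the cache makes the bug disappear.
3. Clock skew between issuer and verifier. Prediction: pinning both to one clock source makes the bug disappear; widening the skew makes it worse.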
Show the ranked list to the user before probing. They often have domain knowledge that re-ranks instantly ("we just deployed a change to #3"), or know hypotheses they've already ruled out. Cheap checkpoint, big time saver. Do not block — proceed with your ranking if the user is AFK.
## Phase 4: probe

Each probe maps to a specific prediction from Phase 3. Change one variable at a time.
Tool order: start with the cheapest probe that can falsify a prediction (reading code, a targeted log line) and escalate only when it cannot (debugger, bisect, profiler).
Tag every debug log with a unique prefix, e.g. `[DEBUG-a4f2]`. Cleanup at the end becomes a single grep. Untagged logs survive; tagged logs die.
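A sketch of what a tagged probe can look like; the probe site and field names are hypothetical:

```typescript
// Hypothetical probe for one Phase 3 prediction. The unique tag makes
// every added line findable later: grep -rn "DEBUG-a4f2" lists exactly
// what to delete once the probe has served its purpose.
const TAG = "[DEBUG-a4f2]";

function isTokenValid(expiresAt: number, now: number): boolean {
  console.error(`${TAG} expiresAt=${expiresAt} now=${now} delta=${expiresAt - now}`);
  return expiresAt > now;
}
```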
**Perf branch.** For performance regressions, logs lie. Establish a baseline measurement (timing harness, `performance.now()`, profiler, query plan), then bisect. Measure first.
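A minimal timing-harness sketch, assuming the suspect code can be wrapped in an async function (`slowPath` is a stand-in, not a real API):

```typescript
// Take a median over repeated runs; a single measurement is noise.
async function slowPath(): Promise<void> {
  // hypothetical stand-in for the code under suspicion
}

async function baseline(runs = 30): Promise<number> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    await slowPath();
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(runs / 2)]; // median, in milliseconds
}

baseline().then((ms) => console.log(`baseline median: ${ms.toFixed(1)} ms`));
```

Record the number before touching anything; every later probe compares against it.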
By the end of Phase 4 you have the root cause — the hypothesis from Phase 3 that survived.
## Phase 5: spec

Synthesise everything into `spec/<slug>.md`. The spec is the input contract for `tdd`.
Use these sections:
# <slug>
## Symptom
What the user observed (error message, wrong output, slow timing).
## Reproduction
The deterministic trigger from Phase 1. A test name, a curl, a script — whatever the loop is.
## Root cause
What is actually wrong. The hypothesis from Phase 3 that survived Phase 4.
## Fix
What will change to correct the cause. One paragraph.
## Acceptance criteria
- A new regression test (named `<test_name>`) fails without the fix and passes with it.
- Behaviour Y still works.
- (Add more as needed — keep them observable, not internal.)
## Related risks
Other places in the code that share the pattern or module and may carry the same bug. Listed for follow-up, not for this fix.
Then hand off to `tdd`. The spec's Reproduction + Acceptance criteria become the test queue.
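As a sketch of what the first acceptance criterion can turn into (module and function names are hypothetical; the runner is Node's built-in `node:test`):

```typescript
// Hypothetical regression test for token-expiry-not-rejected. It must
// fail on the commit before the fix and pass on the commit after it.
import { test } from "node:test";
import assert from "node:assert/strict";
import { verifyToken } from "./auth"; // hypothetical module under test

test("expired token is rejected", () => {
  const expired = { expiresAt: Date.now() - 60_000 }; // expired one minute ago
  assert.equal(verifyToken(expired), false);
});
```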
If Phase 4 surfaces an architectural issue (no good test seam, tangled callers, hidden coupling), capture it under Related risks — but do not rewrite architecture during diagnose. Make that recommendation after the fix lands.
All `[DEBUG-...]` instrumentation is removed before handoff (grep the prefix). You hand off to `tdd` with `spec/<slug>.md` written; `tdd` runs red-green-refactor against the acceptance criteria. After `tdd` is green, the work joins the normal review / PR flow.