Structured four-phase debugging workflow (hypothesis → reproduce → isolate → fix) with mandatory context-hygiene pause-and-resume after N failed attempts.
From the workflow-orchestration plugin (npx claudepluginhub mikecubed/agent-orchestration --plugin workflow-orchestration). This skill uses the workspace's default tool permissions.
Use this skill when a developer is stuck on a non-obvious bug and needs a structured, repeatable process for moving from symptom to root cause to verified fix.
Persistent team, squad, or fleet-style long-lived orchestration is out of scope for this skill. Use a separate orchestration layer if persistent coordination is needed.
This skill is for systematic fault isolation and root-cause analysis. It is not a general task runner or a substitute for a working test suite.
Activate when the developer asks for things like:
Also activate when:
Before you start, identify:
max-failed-attempts — the number of failed hypotheses before the skill triggers a context-hygiene pause (default: 3);
… .agent/SESSION.md).

If max-failed-attempts is not provided by the developer, use 3.
Use separate roles for:
All roles may receive the discovery brief as factual context. Do not share the investigator's working hypotheses with the reviewer before a fix is proposed — the reviewer must assess the fix independently.
Resolve the active model for each role using this priority chain:
Project config — look for the runtime-specific config file in the current project root:
.copilot/models.yaml (Copilot CLI)
.claude/models.yaml (Claude Code)

Read the implementer, reviewer, and scout keys directly. If a key is absent, fall back to the baked-in default for that role.
Session cache — if models were already confirmed earlier in this session, reuse them.
Baked-in defaults — if neither config file nor session cache exists, use the defaults below, ask the developer to confirm or override once, then cache for the session.
| Runtime | Role | Default model |
|---|---|---|
| Copilot CLI | Investigator | claude-opus-4.6 |
| Copilot CLI | Reviewer | gpt-5.4 |
| Copilot CLI | Scout | claude-haiku-4.5 |
| Claude Code | Investigator | claude-opus-4.6 |
| Claude Code | Reviewer | claude-opus-4.6 |
| Claude Code | Scout | claude-haiku-4.5 |
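A minimal Python sketch of the priority chain above. It makes three assumptions. First, the config file has already been parsed into a dict (or is None when no file exists). Second, the session cache is a dict keyed by role. Third, the investigator role reads the config's implementer key. The names resolve_model, DEFAULTS, and CONFIG_KEY are hypothetical.

```python
from typing import Optional

# Baked-in defaults, copied from the table above.
DEFAULTS = {
    ("copilot-cli", "investigator"): "claude-opus-4.6",
    ("copilot-cli", "reviewer"): "gpt-5.4",
    ("copilot-cli", "scout"): "claude-haiku-4.5",
    ("claude-code", "investigator"): "claude-opus-4.6",
    ("claude-code", "reviewer"): "claude-opus-4.6",
    ("claude-code", "scout"): "claude-haiku-4.5",
}

# Assumption: the investigator role is read from the config's "implementer" key.
CONFIG_KEY = {"investigator": "implementer", "reviewer": "reviewer", "scout": "scout"}


def resolve_model(runtime: str, role: str,
                  project_config: Optional[dict],
                  session_cache: dict) -> tuple[str, str]:
    """Return (model, source) using: project config -> session cache -> default."""
    default = DEFAULTS[(runtime, role)]
    if project_config is not None:
        key = CONFIG_KEY[role]
        if key in project_config:
            return project_config[key], "project-config"
        # Config exists but lacks this key: fall back to the baked-in default.
        return default, "default"
    if role in session_cache:
        return session_cache[role], "session-cache"
    # No config file and no cache: the caller should ask the developer to confirm.
    return default, "default-needs-confirmation"
```

When the source is default-needs-confirmation, ask the developer once and store the answer in session_cache. Later lookups in the same session then hit the second step of the chain.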
Each hypothesis must:
Do not proceed to the next hypothesis until the current one is confirmed or explicitly ruled out with evidence.
Once a hypothesis is ruled out, record it in ## Failed Hypotheses in .agent/SESSION.md
with:
A hypothesis in ## Failed Hypotheses must never be retried, including in resumed
sessions. Before forming new hypotheses, scan this section and treat all listed entries as
definitively ruled out.
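One way to enforce the never-retry rule is to parse ## Failed Hypotheses before proposing anything new. This Python sketch assumes that each ruled-out hypothesis is a top-level `- ` bullet whose text is the hypothesis statement, with evidence on indented sub-bullets. It matches candidates by case- and whitespace-insensitive text. The real entry format is defined by docs/session-md-schema.md and may differ, and the function names are hypothetical.

```python
import re


def failed_hypotheses(session_md: str) -> list[str]:
    """Return the top-level bullet entries under '## Failed Hypotheses'."""
    entries, in_section = [], False
    for line in session_md.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## Failed Hypotheses"
            continue
        # Only unindented bullets are entries; indented ones hold evidence.
        if in_section and line.startswith("- "):
            entries.append(line[2:].strip())
    return entries


def _normalize(text: str) -> str:
    # Drop a DO-NOT-RETRY marker, then fold case and whitespace.
    text = re.sub(r"\(?DO-NOT-RETRY\)?", "", text)
    return " ".join(text.lower().split())


def filter_candidates(candidates: list[str], session_md: str) -> list[str]:
    """Drop candidates that match an entry already ruled out."""
    ruled_out = {_normalize(e) for e in failed_hypotheses(session_md)}
    return [c for c in candidates if _normalize(c) not in ruled_out]
```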
When the failed-attempt counter reaches max-failed-attempts:
… .agent/SESSION.md, including every ruled-out hypothesis in ## Failed Hypotheses marked DO-NOT-RETRY.
… ## Failed Hypotheses);
… ## Failed Hypotheses.

Before applying any fix:
Do not apply a fix that cannot be validated by a test or a reproducible sequence.
Run one lightweight scout pass to produce a factual context brief:
Use the discovery brief template from docs/workflow-artifact-templates.md.
Skip condition: Skip the scout when a complete factual brief already exists (e.g., the developer provided full reproduction steps, error text, and affected files). Record the skip reason in the brief.
Based on the brief, form a ranked list of hypotheses ordered by:
Announce the hypothesis list to the developer before testing any hypothesis. Allow the developer to add, remove, or reorder before proceeding.
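The ranking criteria are not listed here. This Python sketch therefore assumes two: estimated likelihood (higher first) and cost to test (cheaper first on ties). Hypothesis, rank, and apply_developer_order are hypothetical names. The developer's approved order always wins over the computed one.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    hid: str
    statement: str
    likelihood: float  # assumed estimate in [0, 1]
    cost: float        # assumed relative effort to test


def rank(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    # Assumed criteria: most likely first; cheaper to test breaks ties.
    return sorted(hypotheses, key=lambda h: (-h.likelihood, h.cost))


def apply_developer_order(ranked: list[Hypothesis], order: list[str]) -> list[Hypothesis]:
    # The developer's approved order wins; ids left out of `order` are dropped.
    by_id = {h.hid: h for h in ranked}
    return [by_id[hid] for hid in order if hid in by_id]
```

Consider the example at the end of this skill. H1 would rank first by likelihood, but the developer's approval of H2 → H1 → H3 moves the cheap check to the front.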
Write .agent/SESSION.md at the end of this phase as a complete, schema-compliant
document (see docs/session-md-schema.md). All five YAML frontmatter fields are required
on every write — do not write a partial file:
current-task: the bug description being investigated
current-phase: "hypothesis"
next-action: "begin reproduction"
workspace: the active branch or PR reference
last-updated: current ISO-8601 timestamp

Include all five ## sections (Decisions, Files Touched, Open Questions, Blockers, Failed Hypotheses), updating each with the current session state. An empty body is acceptable for sections with nothing to record yet.
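A minimal Python sketch of a write that refuses to produce a partial file. It checks all five frontmatter fields, emits every ## section even when the section is empty, and fills last-updated with a UTC ISO-8601 timestamp if the caller omits it. The exact layout required by docs/session-md-schema.md is not shown here, so the quoting and spacing below are assumptions. render_session_md is a hypothetical name, and the caller is responsible for writing the returned string to .agent/SESSION.md.

```python
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = ("current-task", "current-phase", "next-action",
                   "workspace", "last-updated")
SECTIONS = ("Decisions", "Files Touched", "Open Questions",
            "Blockers", "Failed Hypotheses")


def render_session_md(fields: dict, sections: dict) -> str:
    """Render a complete SESSION.md, or raise rather than emit a partial one."""
    fields = dict(fields)
    # Default last-updated to the current UTC time in ISO-8601 form.
    fields.setdefault(
        "last-updated", datetime.now(timezone.utc).isoformat(timespec="seconds"))
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing frontmatter fields: {missing}")

    lines = ["---"]
    # json.dumps yields a double-quoted string that is also valid YAML.
    lines += [f"{name}: {json.dumps(str(fields[name]))}" for name in REQUIRED_FIELDS]
    lines.append("---")
    for name in SECTIONS:
        lines += ["", f"## {name}"]
        lines += sections.get(name, [])  # an empty body is acceptable
    return "\n".join(lines) + "\n"
```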
Attempt to reproduce the bug using the top hypothesis.
… ## Failed Hypotheses, mark the hypothesis ruled out, and return to Phase 1 to form the next hypothesis.

Write .agent/SESSION.md at the end of this phase as a complete, schema-compliant
document (see docs/session-md-schema.md). All five YAML frontmatter fields are required
on every write — do not write a partial file:
current-task: the bug description being investigated
current-phase: "reproduce"
next-action: "begin isolation"
workspace: the active branch or PR reference
last-updated: current ISO-8601 timestamp

Include all five ## sections (Decisions, Files Touched, Open Questions, Blockers, Failed Hypotheses), updating each with the current session state. An empty body is acceptable for sections with nothing to record yet.
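Phase 2 asks whether the bug reproduces under the top hypothesis. For timing-sensitive bugs a single run can mislead, so this Python sketch runs a probe several times and classifies the result. The probe (a callable that returns True when the bug is observed), the run count, and the labels are assumptions and are not part of the skill.

```python
from typing import Callable


def reproduction_rate(probe: Callable[[], bool], runs: int = 10) -> float:
    """Fraction of `runs` in which `probe` observes the bug."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    hits = sum(1 for _ in range(runs) if probe())
    return hits / runs


def classify(rate: float) -> str:
    if rate == 1.0:
        return "reliable"
    if rate == 0.0:
        return "not-reproduced"
    return "intermittent"
```

An intermittent result still counts as a reproduction signal, but it hints at an ordering or timing dependency, like the CI race in the example at the end of this skill.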
Narrow the reproduction to the smallest failing case:
… ## Failed Hypotheses if they were previously considered as causes.
… ## Failed Hypotheses, return to Phase 1 for the next hypothesis, and increment the failed-attempt counter.
… max-failed-attempts is reached: trigger the context-hygiene cycle (see Core Rules §3) before continuing.

Write .agent/SESSION.md at the end of this phase as a complete, schema-compliant
document (see docs/session-md-schema.md). All five YAML frontmatter fields are required
on every write — do not write a partial file:
current-task: the bug description being investigated
current-phase: "isolate"
next-action: "form fix proposal"
workspace: the active branch or PR reference
last-updated: current ISO-8601 timestamp

Include all five ## sections (Decisions, Files Touched, Open Questions, Blockers, Failed Hypotheses), updating each with the current session state. An empty body is acceptable for sections with nothing to record yet.
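Narrowing to the smallest failing case can be mechanized when the reproduction is a sequence of steps. This Python sketch is a simplified, greedy variant of delta debugging, not the full ddmin algorithm. It removes one step at a time while the failure persists, and it stops when dropping any single remaining step makes the failure disappear. `still_fails` is an assumed oracle that replays a candidate sequence.

```python
from typing import Callable, Sequence


def minimize(steps: Sequence, still_fails: Callable[[list], bool]) -> list:
    """Greedily drop single steps while the reduced case still fails."""
    current = list(steps)
    if not still_fails(current):
        raise ValueError("the starting case does not fail")
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate  # keep the smaller failing case
                changed = True
                break                # restart the scan on the new case
    return current
```

The result is 1-minimal: every remaining step is needed for the failure. A smaller failing case may still exist if two steps must be removed together.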
… .agent/SESSION.md ## Decisions.

Write .agent/SESSION.md at the end of this phase as a complete, schema-compliant
document (see docs/session-md-schema.md). All five YAML frontmatter fields are required
on every write — do not write a partial file:
current-task: the bug description being investigated
current-phase: "fix-complete"
next-action: "done"
workspace: the active branch or PR reference
last-updated: current ISO-8601 timestamp

Include all five ## sections (Decisions, Files Touched, Open Questions, Blockers, Failed Hypotheses), updating each with the current session state, including the root-cause note in ## Decisions. An empty body is acceptable for sections with nothing to record.
If the proposed fix does not resolve the bug: do not mark the hypothesis as confirmed.
Increment the failed-attempt counter, record the failed fix attempt in ## Failed Hypotheses,
and return to Phase 1.
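The counter rules in Phases 2 to 4 can be kept in one place. This Python sketch records each ruled-out hypothesis with its evidence, refuses to record one that is already ruled out, and reports when the context-hygiene pause should trigger. AttemptTracker is hypothetical. The skill does not say whether the counter resets in the resumed session, so the sketch keeps the count separate from the ruled-out set, which must always persist.

```python
from dataclasses import dataclass, field


@dataclass
class AttemptTracker:
    max_failed_attempts: int = 3                    # the skill's default
    count: int = 0                                  # failed attempts this session
    ruled_out: dict = field(default_factory=dict)   # hypothesis -> evidence

    def record_failure(self, hypothesis: str, evidence: str) -> bool:
        """Record a failed attempt; return True when the pause should trigger."""
        if hypothesis in self.ruled_out:
            raise ValueError(f"{hypothesis!r} is already ruled out (DO-NOT-RETRY)")
        self.ruled_out[hypothesis] = evidence
        self.count += 1
        return self.count >= self.max_failed_attempts
```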
When triggered (failed-attempt counter = max-failed-attempts):
Write .agent/SESSION.md as a complete, schema-compliant document (see
docs/session-md-schema.md) — do not write a partial file. All five YAML frontmatter
fields are required:
current-task: the bug description being investigated
current-phase: "context-hygiene-pause"
next-action: "resume in fresh session — load confirmed steps + Failed Hypotheses only"
workspace: the active branch or PR reference
last-updated: current ISO-8601 timestamp

Include all five ## sections (Decisions, Files Touched, Open Questions, Blockers, Failed Hypotheses), populating ## Failed Hypotheses with every ruled-out hypothesis (DO-NOT-RETRY). Announce the pause. Begin a fresh session loading only the allowed context (see Core Rules §3). The fresh session MUST NOT re-attempt any hypothesis in ## Failed Hypotheses.
A phase is not complete until:
… current-phase value for this phase;
… ## Failed Hypotheses.

Before declaring the debugging session done, confirm ALL of the following.

… ## Decisions — PASS / FAIL
… ## Failed Hypotheses — PASS / FAIL

If any item is FAIL: surface the failing item and do not declare the session done.
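The completion gate reduces to "done only if every item passes". Only two checklist items are visible in this text, so callers supply the item names. session_done is a hypothetical helper, and it treats an empty checklist as not done rather than vacuously passing.

```python
def session_done(checklist: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (done, failing_items); done needs a non-empty, all-PASS checklist."""
    failing = [item for item, passed in checklist.items() if not passed]
    return (bool(checklist) and not failing), failing
```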
Before stopping, produce a durable debugging summary that records:
… ## Failed Hypotheses list from SESSION.md;

"Durable" means written to a repository-appropriate sink — a committed document, a PR comment, or an issue — not only to chat. Chat-only summaries do not satisfy this requirement.
Symptom: UserService.authenticate returns null for valid credentials in CI but not locally.
max-failed-attempts: 3
Phase 1 — Hypothesis:
H1: DB fixture seed order differs between local and CI (most likely)
H2: Environment variable AUTH_SECRET is unset in CI (cheap to check)
H3: Session store uses in-memory TTL that expires faster in CI
Developer approved order: H2 → H1 → H3
Phase 2 — Reproduce (H2):
Added AUTH_SECRET check to CI config. Bug still present. H2 ruled out.
Failed hypotheses: 1 / 3
Phase 1 (revised) → Phase 2 — Reproduce (H1):
Ran seed script with explicit order. Bug reproduced reliably.
Phase 3 — Isolate:
Minimal case: UserService.authenticate called before seed index 3 (admin fixture) inserts.
Confirmed: all other authenticate calls pass. Race condition in seed ordering.
Phase 4 — Fix:
Root cause: CI seed runs asynchronously; admin fixture sometimes missing at query time.
Fix: await seed completion before authenticate call in CI setup hook.
Failing test: auth.test.js > authenticates admin user
After fix: test passes, 0 regressions.
Committed: "fix: await seed completion in CI to prevent auth race condition"
Durable debugging summary: committed to docs/debug-session-2025-01-15.md
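The root cause in this example is a seed that runs asynchronously while the test queries. It can be modeled in a few lines of Python asyncio. This is a toy model, not the project's JavaScript code. FakeDB, seed, and authenticate are hypothetical stand-ins, and the race is made deterministic by having each fixture yield to the event loop before it inserts.

```python
import asyncio
from typing import Optional


class FakeDB:
    """In-memory stand-in for the CI database."""

    def __init__(self) -> None:
        self.users: dict[str, str] = {}


async def seed(db: FakeDB) -> None:
    # Insert fixtures one at a time, yielding to the event loop before each one.
    for name in ("alice", "bob", "carol", "admin"):
        await asyncio.sleep(0)
        db.users[name] = "valid-hash"


def authenticate(db: FakeDB, name: str) -> Optional[str]:
    return name if db.users.get(name) == "valid-hash" else None


async def buggy_setup() -> Optional[str]:
    db = FakeDB()
    task = asyncio.create_task(seed(db))  # seeding is started but not awaited
    await asyncio.sleep(0)                # the test runs after one loop turn
    result = authenticate(db, "admin")    # admin fixture is not inserted yet
    await task                            # let seeding finish before exiting
    return result


async def fixed_setup() -> Optional[str]:
    db = FakeDB()
    await seed(db)                        # wait for every fixture first
    return authenticate(db, "admin")
```

The buggy version returns None because the test queries after one event-loop turn, before any fixture has been inserted. Awaiting seed, as the fix does, makes the admin lookup succeed.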