This skill should be used when the user asks to "check session health", "detect failure patterns", "run a health audit", "find agentic loops", "check for spec drift", "verify session quality", or needs to identify and remediate failure patterns in agentic coding sessions including spec drift, sycophantic confirmation, silent failures, tool selection errors, agentic loops, and context degradation.
From agentic-watchdog. Install: npx claudepluginhub nbkm8y5/claude-plugins --plugin agentic-watchdog. This skill uses the workspace's default tool permissions.
Detect, diagnose, and remediate failure patterns in agentic coding sessions. Agentic AI tools can fail in subtle ways that are invisible to both the user and the agent itself. This skill provides a systematic framework for identifying six common failure patterns before they compound into wasted work.
Definition: Spec drift is the implementation gradually diverging from requirements as the session progresses. New files appear that no spec mentioned. Requirements get dropped or modified without acknowledgment.
Why it happens: As context accumulates, the agent focuses on the most recent instructions and loses sight of the original specification. Each small deviation is individually reasonable but compounds into significant drift.
How to detect: Compare current implementation against spec artifacts. Look for files with no spec coverage and requirements with no code coverage. See references/detection_heuristics.md for specific signals.
How to remediate: Re-read the original specification. Create an alignment matrix (requirement to code). Address gaps. Remove unspecified code or get explicit user approval to keep it.
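The alignment matrix can be sketched as a simple coverage check in both directions. The requirement IDs and file names below are hypothetical examples, not part of the skill:

```python
# Requirement-to-code alignment matrix: find requirements with no
# implementing code, and code files with no spec coverage.
requirements = {"R1": "user login", "R2": "password reset", "R3": "audit log"}
implemented = {"R1": ["auth/login.py"], "R3": ["logging/audit.py"]}

# Every file the spec accounts for, via some requirement.
spec_covered = {f for files in implemented.values() for f in files}
all_files = {"auth/login.py", "logging/audit.py", "auth/oauth.py"}

uncovered_reqs = [r for r in requirements if r not in implemented]
unspecified_files = sorted(all_files - spec_covered)

print("Requirements with no code coverage:", uncovered_reqs)
print("Files with no spec coverage:", unspecified_files)
```

Each entry in either list is a gap to address: implement the missing requirement, or remove (or get approval for) the unspecified file.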
Definition: Sycophantic confirmation is when the agent agrees with incorrect assumptions, confirms success without verification, or avoids pushing back on technically questionable requests.
Why it happens: Language models have a bias toward agreement. When the user states something confidently, the agent tends to confirm it rather than question it, even when verification would be trivial.
How to detect: Look for claims of "success" without running verification commands. Check for agreement with contradictory requirements. See references/detection_heuristics.md for specific signals.
How to remediate: Run actual verification commands (tests, build, lint). Question assumptions explicitly. If the user says "this should work," run the command to prove it.
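Verification-first behavior can be sketched with a small helper that actually runs the command and reports on its result rather than assuming success. The command shown is an illustrative stand-in, not a real project's test suite:

```python
import subprocess
import sys

def verify(cmd):
    """Run a verification command and report its real outcome."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    ok = result.returncode == 0
    print(("PASS" if ok else "FAIL"), " ".join(cmd))
    return ok, result.stdout + result.stderr

# Instead of agreeing that "this should work," run the command to prove it.
ok, output = verify([sys.executable, "-c", "print('2 + 2 =', 2 + 2)"])
```

The point is the habit, not the helper: every claim of success should be backed by a command that was actually executed.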
Definition: Silent failures are commands that exit with code 0 but contain errors in their output. Tests pass because they do not actually test anything. Error-handling code swallows exceptions.
Why it happens: Agents often check only the exit code, not the output content. A build that prints deprecation warnings, a test suite where "0 tests ran," or a script that catches and ignores all exceptions can each appear to succeed.
How to detect: Parse command output for error keywords even when exit code is 0. Check test counts. Look for empty catch blocks. See references/detection_heuristics.md for specific signals.
How to remediate: Read command output carefully. Verify test counts are non-zero. Ensure error handlers log or re-raise. Re-run suspicious commands with verbose output.
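These checks can be sketched as a small heuristic. The error keywords and the test-count pattern below are illustrative assumptions, not the skill's actual detection rules:

```python
import re

# Keywords that suggest failure even when the exit code says otherwise.
ERROR_KEYWORDS = ("error", "traceback", "exception", "failed")

def looks_like_silent_failure(exit_code, output):
    """Flag output that 'succeeded' by exit code but smells like a failure."""
    if exit_code != 0:
        return True  # an ordinary, loud failure
    lowered = output.lower()
    if any(keyword in lowered for keyword in ERROR_KEYWORDS):
        return True
    # A test run that executed zero tests proves nothing.
    match = re.search(r"ran (\d+) tests?", lowered)
    if match and int(match.group(1)) == 0:
        return True
    return False

print(looks_like_silent_failure(0, "Ran 0 tests in 0.001s\nOK"))
print(looks_like_silent_failure(0, "Ran 12 tests in 0.4s\nOK"))
```

The first call flags a zero-test run as suspicious despite the clean exit code; the second passes.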
Definition: Tool selection errors are uses of the wrong tool, API, or approach for the task at hand. Examples include using the grep command when the Grep tool is available, rewriting a whole file when an edit would suffice, or choosing a heavyweight framework for a simple task.
Why it happens: As context grows, the agent may forget which tools are available or default to familiar patterns that are not optimal for the current environment.
How to detect: Look for Bash commands that duplicate available tool functionality. Check for full file rewrites when targeted edits would work. See references/detection_heuristics.md for specific signals.
How to remediate: Review available tools before acting. Prefer targeted operations (Edit over Write, specific tool over Bash workaround).
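This check can be sketched as a lookup from shell commands to the dedicated tools they duplicate. The command-to-tool mapping below is a hypothetical example; the tool names follow the ones this skill assumes are available:

```python
# Shell commands that duplicate dedicated tool functionality.
BASH_TO_TOOL = {"grep": "Grep", "find": "Glob", "cat": "Read", "sed": "Edit"}

def suggest_tool(bash_command):
    """Return the preferred tool for a Bash command, or None if fine."""
    head = bash_command.strip().split()[0]
    return BASH_TO_TOOL.get(head)

print(suggest_tool("grep -rn TODO src/"))
```

A non-None suggestion is a signal to use the dedicated tool instead of the Bash workaround.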
Definition: Agentic loops are repetitions of similar actions without progress: editing the same file back and forth, retrying a command that already failed, or oscillating between two approaches.
Why it happens: When an approach fails, the agent may try minor variations of the same thing instead of stepping back to reconsider. Without explicit loop detection, this can continue indefinitely.
How to detect: Track file edit frequency -- the same file edited 3+ times in a short span is suspicious. Watch for repeated Bash commands. See references/detection_heuristics.md for specific signals.
How to remediate: Stop. Articulate why the current approach is failing. Consider a fundamentally different strategy. If stuck, ask the user for guidance rather than continuing to loop.
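The edit-frequency heuristic can be sketched in a few lines. The threshold of 3 follows the text above; the log format (one file path per edit, in order) is an assumption:

```python
from collections import Counter

EDIT_THRESHOLD = 3  # 3+ edits to one file in a short span is suspicious

def detect_edit_loops(edit_log):
    """Return files edited often enough to suggest a loop."""
    counts = Counter(edit_log)
    return [path for path, n in counts.items() if n >= EDIT_THRESHOLD]

log = ["src/app.py", "src/util.py", "src/app.py", "src/app.py"]
print(detect_edit_loops(log))
```

A real detector would also window the log by time and watch for repeated identical Bash commands, but the counting idea is the same.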
Definition: Context degradation is the loss of original requirements, decisions, and context as the context window fills up. The agent makes decisions that contradict earlier ones and asks the user to repeat information it already received.
Why it happens: As context fills, earlier information gets compressed or falls out of the effective attention window. The agent operates on incomplete context without realizing it.
How to detect: Test recall of original requirements. Look for decisions that contradict earlier ones. Check if the agent is asking for information it already received. See references/detection_heuristics.md for specific signals.
How to remediate: Re-read the original spec or session state file. Write a summary of key decisions to a persistent file. Use the PreCompact hook to checkpoint critical context before compression.
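Checkpointing key decisions can be sketched as a small read/write helper. The file name session_state.json and the record shape are hypothetical, not paths this skill defines:

```python
import json
import pathlib

STATE = pathlib.Path("session_state.json")

def checkpoint(decisions):
    """Persist key decisions so they survive context compression."""
    STATE.write_text(json.dumps({"decisions": decisions}, indent=2))

def recall():
    """Re-read the persisted decisions after compaction."""
    return json.loads(STATE.read_text())["decisions"]

checkpoint(["use SQLite for storage", "keep API synchronous"])
print(recall())
```

Writing this file from a PreCompact hook means the summary exists before compression happens, so nothing critical depends on the compacted context alone.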
references/failure_patterns.md -- Detailed catalog of all 6 patterns with concrete examples and severity levels
references/detection_heuristics.md -- Specific signals and indicators for automated and manual detection
references/severity_schema.md -- Severity levels, report format, and health report template
.claude/watchdog-report.md -- Generated health report