This skill should be used when the user asks to "check session health", "detect failure patterns", "run a health audit", "find agentic loops", "check for spec drift", "verify session quality", or needs to identify and remediate failure patterns in agentic coding sessions including spec drift, sycophantic confirmation, silent failures, tool selection errors, agentic loops, and context degradation.
From agentic-watchdog. Install: npx claudepluginhub nbkm8y5/claude-plugins --plugin agentic-watchdog. This skill uses the workspace's default tool permissions.
Detect, diagnose, and remediate failure patterns in agentic coding sessions. Agentic AI tools can fail in subtle ways that are invisible to both the user and the agent itself. This skill provides a systematic framework for identifying six common failure patterns before they compound into wasted work.
Definition: Spec drift is the implementation gradually diverging from requirements as the session progresses. New files appear that no spec mentioned. Requirements get dropped or modified without acknowledgment.
Why it happens: As context accumulates, the agent focuses on the most recent instructions and loses sight of the original specification. Each small deviation is individually reasonable but compounds into significant drift.
How to detect: Compare current implementation against spec artifacts. Look for files with no spec coverage and requirements with no code coverage. See references/detection_heuristics.md for specific signals.
How to remediate: Re-read the original specification. Create an alignment matrix (requirement to code). Address gaps. Remove unspecified code or get explicit user approval to keep it.
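The alignment matrix can be sketched as a simple coverage check in both directions. The requirement IDs and file names below are hypothetical examples, not part of the skill:

```python
# Requirement-to-code alignment matrix: find requirements with no
# implementing code, and code files with no spec coverage.
requirements = {"R1": "user login", "R2": "password reset", "R3": "audit log"}
implemented = {"R1": ["auth/login.py"], "R3": ["logging/audit.py"]}

# Every file the spec accounts for, via some requirement.
spec_covered = {f for files in implemented.values() for f in files}
all_files = {"auth/login.py", "logging/audit.py", "auth/oauth.py"}

uncovered_reqs = [r for r in requirements if r not in implemented]
unspecified_files = sorted(all_files - spec_covered)

print("Requirements with no code coverage:", uncovered_reqs)
print("Files with no spec coverage:", unspecified_files)
```

Each entry in either list is a gap to address: implement the missing requirement, or remove (or get approval for) the unspecified file.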
Definition: Sycophantic confirmation is when the agent agrees with incorrect assumptions, confirms success without verification, or avoids pushing back on technically questionable requests.
Why it happens: Language models have a bias toward agreement. When the user states something confidently, the agent tends to confirm it rather than question it, even when verification would be trivial.
How to detect: Look for claims of "success" without running verification commands. Check for agreement with contradictory requirements. See references/detection_heuristics.md for specific signals.
How to remediate: Run actual verification commands (tests, build, lint). Question assumptions explicitly. If the user says "this should work," run the command to prove it.
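Verification-first behavior can be sketched with a small helper that actually runs the command and reports on its result rather than assuming success. The command shown is an illustrative stand-in, not a real project's test suite:

```python
import subprocess
import sys

def verify(cmd):
    """Run a verification command and report its real outcome."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    ok = result.returncode == 0
    print(("PASS" if ok else "FAIL"), " ".join(cmd))
    return ok, result.stdout + result.stderr

# Instead of agreeing that "this should work," run the command to prove it.
ok, output = verify([sys.executable, "-c", "print('2 + 2 =', 2 + 2)"])
```

The point is the habit, not the helper: every claim of success should be backed by a command that was actually executed.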
Definition: Silent failures are commands that exit with code 0 but contain errors in their output. Tests pass because they do not actually test anything. Error-handling code swallows exceptions.
Why it happens: Agents often check only the exit code, not the output content. A build that prints deprecation warnings, a test suite where "0 tests ran," or a script that catches and ignores all exceptions can each appear to succeed.
How to detect: Parse command output for error keywords even when exit code is 0. Check test counts. Look for empty catch blocks. See references/detection_heuristics.md for specific signals.
How to remediate: Read command output carefully. Verify test counts are non-zero. Ensure error handlers log or re-raise. Re-run suspicious commands with verbose output.
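These checks can be sketched as a small heuristic. The error keywords and the test-count pattern below are illustrative assumptions, not the skill's actual detection rules:

```python
import re

# Keywords that suggest failure even when the exit code says otherwise.
ERROR_KEYWORDS = ("error", "traceback", "exception", "failed")

def looks_like_silent_failure(exit_code, output):
    """Flag output that 'succeeded' by exit code but smells like a failure."""
    if exit_code != 0:
        return True  # an ordinary, loud failure
    lowered = output.lower()
    if any(keyword in lowered for keyword in ERROR_KEYWORDS):
        return True
    # A test run that executed zero tests proves nothing.
    match = re.search(r"ran (\d+) tests?", lowered)
    if match and int(match.group(1)) == 0:
        return True
    return False

print(looks_like_silent_failure(0, "Ran 0 tests in 0.001s\nOK"))
print(looks_like_silent_failure(0, "Ran 12 tests in 0.4s\nOK"))
```

The first call flags a zero-test run as suspicious despite the clean exit code; the second passes.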
Definition: Tool selection errors are uses of the wrong tool, API, or approach for the task at hand. Examples include using the grep command when the Grep tool is available, rewriting a whole file when an edit would suffice, or choosing a heavyweight framework for a simple task.
Why it happens: As context grows, the agent may forget which tools are available or default to familiar patterns that are not optimal for the current environment.
How to detect: Look for Bash commands that duplicate available tool functionality. Check for full file rewrites when targeted edits would work. See references/detection_heuristics.md for specific signals.
How to remediate: Review available tools before acting. Prefer targeted operations (Edit over Write, specific tool over Bash workaround).
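This check can be sketched as a lookup from shell commands to the dedicated tools they duplicate. The command-to-tool mapping below is a hypothetical example; the tool names follow the ones this skill assumes are available:

```python
# Shell commands that duplicate dedicated tool functionality.
BASH_TO_TOOL = {"grep": "Grep", "find": "Glob", "cat": "Read", "sed": "Edit"}

def suggest_tool(bash_command):
    """Return the preferred tool for a Bash command, or None if fine."""
    head = bash_command.strip().split()[0]
    return BASH_TO_TOOL.get(head)

print(suggest_tool("grep -rn TODO src/"))
```

A non-None suggestion is a signal to use the dedicated tool instead of the Bash workaround.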
Definition: Agentic loops are repetitions of similar actions without progress: editing the same file back and forth, retrying a command that already failed, or oscillating between two approaches.
Why it happens: When an approach fails, the agent may try minor variations of the same thing instead of stepping back to reconsider. Without explicit loop detection, this can continue indefinitely.
How to detect: Track file edit frequency -- the same file edited 3+ times in a short span is suspicious. Watch for repeated Bash commands. See references/detection_heuristics.md for specific signals.
How to remediate: Stop. Articulate why the current approach is failing. Consider a fundamentally different strategy. If stuck, ask the user for guidance rather than continuing to loop.
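The edit-frequency heuristic can be sketched in a few lines. The threshold of 3 follows the text above; the log format (one file path per edit, in order) is an assumption:

```python
from collections import Counter

EDIT_THRESHOLD = 3  # 3+ edits to one file in a short span is suspicious

def detect_edit_loops(edit_log):
    """Return files edited often enough to suggest a loop."""
    counts = Counter(edit_log)
    return [path for path, n in counts.items() if n >= EDIT_THRESHOLD]

log = ["src/app.py", "src/util.py", "src/app.py", "src/app.py"]
print(detect_edit_loops(log))
```

A real detector would also window the log by time and watch for repeated identical Bash commands, but the counting idea is the same.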
Definition: Context degradation is the loss of original requirements, decisions, and context as the context window fills up. The agent makes decisions that contradict earlier ones and asks the user to repeat information it already received.
Why it happens: As context fills, earlier information gets compressed or falls out of the effective attention window. The agent operates on incomplete context without realizing it.
How to detect: Test recall of original requirements. Look for decisions that contradict earlier ones. Check if the agent is asking for information it already received. See references/detection_heuristics.md for specific signals.
How to remediate: Re-read the original spec or session state file. Write a summary of key decisions to a persistent file. Use the PreCompact hook to checkpoint critical context before compression.
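Checkpointing key decisions can be sketched as a small read/write helper. The file name session_state.json and the record shape are hypothetical, not paths this skill defines:

```python
import json
import pathlib

STATE = pathlib.Path("session_state.json")

def checkpoint(decisions):
    """Persist key decisions so they survive context compression."""
    STATE.write_text(json.dumps({"decisions": decisions}, indent=2))

def recall():
    """Re-read the persisted decisions after compaction."""
    return json.loads(STATE.read_text())["decisions"]

checkpoint(["use SQLite for storage", "keep API synchronous"])
print(recall())
```

Writing this file from a PreCompact hook means the summary exists before compression happens, so nothing critical depends on the compacted context alone.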
references/failure_patterns.md -- Detailed catalog of all 6 patterns with concrete examples and severity levels
references/detection_heuristics.md -- Specific signals and indicators for automated and manual detection
references/severity_schema.md -- Severity levels, report format, and health report template
.claude/watchdog-report.md -- Generated health report