Structured root cause analysis methodology with three-test isolation and prevention analysis
Applies a structured three-test methodology to isolate true root causes from contributing factors. Use this when investigating failures to distinguish between symptoms, amplifiers, and the single factor that passes all three tests (counterfactual, sufficiency, necessity).
/plugin marketplace add violetio/violet-ai-plugins
/plugin install v@violet
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Evidence-based methodology for isolating true root causes from contributing factors.
Root cause analysis is evidence gathering, not solution design.
The RCA process produces:
The RCA process does NOT produce:
Most teams conflate root cause with contributing factors. Use these three tests to isolate the true root cause:
Counterfactual test: "If this factor didn't exist, would the failure still have occurred?"
Sufficiency test: "Is this factor alone sufficient to cause the failure?"
Necessity test: "Is this factor necessary for the failure to occur?"
| Factor | Counterfactual | Sufficiency | Necessity | Classification |
|---|---|---|---|---|
| True Root Cause | NO (prevents failure) | YES | YES | ROOT CAUSE |
| Contributing Factor | YES (still fails) | NO | NO | Contributing |
| Necessary Condition | NO | NO | YES | Enabler |
| Amplifying Factor | YES | NO | NO | Amplifier |
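The root cause is the factor that passes ALL THREE tests: removing it prevents the failure (Counterfactual: NO), it alone is enough to cause the failure (Sufficiency: YES), and the failure cannot occur without it (Necessity: YES).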
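The classification matrix above can be sketched as a small lookup. This is a minimal illustration, not part of the skill itself: the names `FactorAnswers` and `classify` are hypothetical, and the "Unclassified" fallback is an assumption for answer combinations the matrix does not list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactorAnswers:
    """Answers to the three tests for one candidate factor (hypothetical type)."""
    still_fails_without: bool  # Counterfactual: would the failure still occur without it?
    sufficient: bool           # Sufficiency: is it alone enough to cause the failure?
    necessary: bool            # Necessity: is it required for the failure to occur?


def classify(a: FactorAnswers) -> str:
    """Map the three answers onto the rows of the classification matrix."""
    if not a.still_fails_without and a.sufficient and a.necessary:
        return "ROOT CAUSE"
    if not a.still_fails_without and not a.sufficient and a.necessary:
        return "Enabler"
    if a.still_fails_without and not a.sufficient and not a.necessary:
        # The matrix gives contributing and amplifying factors identical
        # answers, so the tests alone cannot tell them apart.
        return "Contributing or Amplifier"
    # Assumption: any combination not in the matrix is sent back for review.
    return "Unclassified (re-check the evidence)"


print(classify(FactorAnswers(still_fails_without=False, sufficient=True, necessary=True)))
```

Because contributing and amplifying factors share the same row values, separating them still requires judgment about whether the factor merely worsened the impact.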
Goal: Establish clear boundaries around what failed and what didn't.
Required Outputs:
Checkpoint Questions:
Goal: Collect all relevant data without interpretation.
Evidence Types:
Rules:
Checkpoint Questions:
Goal: Build the chain of events that led to failure.
Process:
Output Format:
[Triggering Event]
↓
[Intermediate Event 1]
↓
[Intermediate Event 2]
↓
[Failure Event]
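As a rough sketch, the output format above can be produced from an ordered list of events. The function name `render_chain` and the rule that a chain needs at least two events are assumptions made for illustration.

```python
def render_chain(events: list[str]) -> str:
    """Render ordered events as a vertical chain, one bracketed event per line."""
    if len(events) < 2:
        # Assumption: a chain needs at least a trigger and a failure event.
        raise ValueError("a causal chain needs a trigger and a failure event")
    return "\n↓\n".join(f"[{event}]" for event in events)


print(render_chain([
    "Triggering Event",
    "Intermediate Event 1",
    "Intermediate Event 2",
    "Failure Event",
]))
```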
Checkpoint Questions:
Goal: Apply the three tests to identify the true root cause.
Process:
Classification Output:
| Factor | Counterfactual | Sufficiency | Necessity | Classification |
|---|---|---|---|---|
| [Factor A] | [Result] | [Result] | [Result] | [Type] |
| [Factor B] | [Result] | [Result] | [Result] | [Type] |
Checkpoint Questions:
Goal: Identify WHAT needs to change to prevent recurrence (not HOW to change it).
Categories:
Rules:
Checkpoint Questions:
Critical Question: Why did THIS case fail when similar cases succeed?
This section is REQUIRED for every RCA. It forces examination of:
Template:
## Differential Analysis
### Similar Cases That Succeeded
[List similar situations that didn't fail]
### Key Differences in Failure Case
[What was different about this case?]
### Safeguard Failures
[Why didn't existing protections work?]
### Unique Conditions
[What conditions were present only in this failure?]
# Root Cause Analysis: [Incident Name]
**Date**: [Analysis Date]
**Incident Date**: [When failure occurred]
**Analyst**: [Who conducted RCA]
**Status**: [Draft | Review | Final]
---
## 1. Problem Definition
### Failure Statement
[One sentence describing what failed]
### Scope
- **In Scope**: [What this RCA covers]
- **Out of Scope**: [What this RCA excludes]
### Success Criteria
[What "working correctly" looks like]
### Timeline Bounds
- **Start**: [When failure window began]
- **End**: [When failure was resolved]
---
## 2. Evidence Timeline
| Timestamp | Event | Source | Evidence |
|-----------|-------|--------|----------|
| [Time] | [What happened] | [Log/Person/System] | [Exact data] |
---
## 3. Causal Chain
[Visual chain from trigger to failure]
[Event 1] → [Event 2] → [Event 3] → [FAILURE]
### Chain Narrative
[Prose explanation of how events connected]
---
## 4. Root Cause Isolation
### Candidate Factors
| Factor | Counterfactual | Sufficiency | Necessity | Classification |
|--------|----------------|-------------|-----------|----------------|
| [Factor] | [YES/NO] | [YES/NO] | [YES/NO] | [Type] |
### Root Cause Statement
[The factor that passed all three tests]
### Contributing Factors
[Factors that amplified or enabled but aren't root cause]
---
## 5. Differential Analysis
### Similar Cases That Succeeded
[List similar situations that didn't fail]
### Key Differences
[What was unique about this failure case?]
### Safeguard Failures
[Why didn't existing protections work?]
---
## 6. Prevention Analysis
### Detection Gaps
- [What monitoring would catch this earlier?]
### Process Gaps
- [What process changes would prevent root cause?]
### System Gaps
- [What system changes would eliminate failure mode?]
### Knowledge Gaps
- [What documentation/training gaps contributed?]
---
## Appendix: Raw Evidence
[Attach logs, screenshots, configs, communications]
When conducting RCA with a user, follow this interactive flow:
I'll guide you through a structured Root Cause Analysis. We'll work through 5 phases:
1. Problem Definition
2. Evidence Gathering
3. Causal Chain Construction
4. Root Cause Isolation (using three tests)
5. Prevention Analysis
Let's start with Phase 1: Problem Definition.
Can you describe in one sentence what failed?
At each phase checkpoint, verify completion before proceeding:
Before we move to [Next Phase], let me verify:
- [ ] [Checkpoint 1]
- [ ] [Checkpoint 2]
- [ ] [Checkpoint 3]
[If incomplete]: We're missing [X]. Can you provide [specific ask]?
[If complete]: Great, let's proceed to [Next Phase].
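The checkpoint gate above could be sketched as follows, assuming each checkpoint is tracked as a name mapped to a done/not-done flag. The function name `checkpoint_message` and the example checkpoint names are hypothetical.

```python
def checkpoint_message(checkpoints: dict[str, bool], next_phase: str) -> str:
    """Return the follow-up prompt for a phase based on its checkpoint status."""
    missing = [name for name, done in checkpoints.items() if not done]
    if missing:
        # Block progress and name exactly what is still needed.
        return f"We're missing: {', '.join(missing)}. Can you provide these?"
    return f"Great, let's proceed to {next_phase}."


# Hypothetical checkpoints for Phase 1.
status = {"failure statement": True, "scope": True, "timeline bounds": False}
print(checkpoint_message(status, "Evidence Gathering"))
```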
Now let's apply the three tests to isolate the root cause.
For [Factor]:
1. Counterfactual: If [Factor] didn't happen, would the failure still occur?
2. Sufficiency: Is [Factor] alone enough to cause this failure?
3. Necessity: Is [Factor] required for this failure to occur?
Based on your answers: [Classification]
Based on our analysis:
**Root Cause**: [Statement]
**Contributing Factors**: [List]
**Prevention Gaps**: [Categories]
Would you like me to generate the full RCA document?
Merchant integration failed during high-traffic sale event.
| Time | Event | Source |
|---|---|---|
| 09:15 | Traffic spike began | Metrics |
| 09:23 | MAX_COST_EXCEEDED errors started | API logs |
| 09:24 | Retry storm began | Client logs |
| 10:10 | Manual intervention resolved | Incident channel |
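Intervals between timeline entries are often worth stating explicitly. A minimal sketch, assuming all timestamps fall on the same day and use the `HH:MM` format shown in the table; the helper `minutes_between` and the dictionary keys are hypothetical names.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Timestamps copied from the evidence timeline above.
timeline = {
    "traffic_spike": "09:15",
    "first_error": "09:23",
    "retry_storm": "09:24",
    "resolved": "10:10",
}

# Time from traffic spike to first error.
print(minutes_between(timeline["traffic_spike"], timeline["first_error"]))  # 8
# Time from first error to manual resolution.
print(minutes_between(timeline["first_error"], timeline["resolved"]))  # 47
```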
[Traffic spike] → [Query complexity exceeded limit] → [MAX_COST_EXCEEDED] → [Client retries] → [Amplified load] → [Extended outage]
| Factor | Counterfactual | Sufficiency | Necessity | Classification |
|---|---|---|---|---|
| Traffic spike | YES (still fails if queries complex) | NO | NO | Amplifier |
| Complex queries | NO (prevents error) | YES | YES | ROOT CAUSE |
| Aggressive retries | YES (still errors) | NO | NO | Amplifier |
| No alerting | YES (still errors) | NO | NO | Contributing |
Root Cause: Query complexity exceeded platform limits, even under normal traffic patterns.
Why This Merchant?