Systematic gap analysis of claimed versus actual work completion. Uses 100+ sequential thoughts to identify assumptions, partial completions, missing components, and rationalization patterns. Validates completion claims against the original plan, detects scope deviations, and reveals quality gaps. Essential for self-assessment before declaring work complete. Use when: claiming completion, running final reviews or quality audits, or detecting rationalization patterns in your own work.
Systematic gap analysis using 100+ sequential thoughts to identify discrepancies between claimed and actual work completion. Triggers automatically before any completion declaration, final commit, or quality audit to catch assumptions, partial work, and rationalization patterns that mental reviews miss.
/plugin marketplace add krzemienski/shannon-framework
/plugin install shannon@shannon-framework
This skill is limited to using the following tools:
Purpose: Systematic gap analysis using 100+ sequential thoughts to identify discrepancies between claimed completion and actual delivery. Prevents premature completion declarations by revealing assumptions, partial work, missing components, and rationalization patterns.
Core Value: Catches the moment when you're about to claim "100% complete" on work that is actually 50% done.
Key Innovation: Self-assessment protocol that replicates critical external review, catching gaps before they become credibility issues.
Use this skill when:
❌ "Work looks complete" → Appearances deceive, systematic check required ❌ "I'm confident it's done" → Confidence without verification is overconfidence ❌ "Takes too long" → 30-minute reflection prevents hours of rework ❌ "Already did self-review" → Mental review misses 40-60% of gaps ❌ "User didn't explicitly ask" → Professional responsibility to verify completion
CRITICAL: Agents systematically skip honest reflection to avoid discovering gaps. Below are the 6 most common rationalizations detected in baseline testing, with mandatory counters.
Example: Agent finishes implementing 8 features, thinks "all features done", and declares complete without checking the original spec, which required 12 features
COUNTER:
Rule: Run systematic reflection before ANY completion claim. No exceptions.
Example: Agent thinks "reflection would take 30 minutes, I'll just commit and fix gaps if reported"
COUNTER:
Rule: Reflection is a time investment with 800% ROI. Always worth it.
Example: Plan requires 16 tasks, agent completes 8 high-quality tasks, declares success based on quality not quantity
COUNTER:
Rule: Quality AND quantity both matter. Track both separately.
Example: Agent mentally aware of incomplete work but doesn't document it, commits with "complete" claim anyway
COUNTER:
Rule: If gap exists, document it. Mental awareness insufficient.
Example: Agent ships work with gaps, user doesn't immediately comment, agent assumes gaps acceptable
COUNTER:
Rule: Disclose gaps before user discovers them. Builds trust.
Example: Agent subconsciously avoids reflection because it might require redoing work
COUNTER:
Rule: If you're avoiding reflection, that's WHY you need it most.
Objective: Understand what was actually requested
Process:
1. Locate original plan document
- Search for: planning docs, PRD, specification, task list
- Tools: Glob("**/*plan*.md"), Grep("## Phase", "### Task")
2. Read plan COMPLETELY
- Count: Total tasks, phases, hours estimated
- Parse: Each task's deliverables, acceptance criteria
- Tool: Read (entire plan, don't skim)
3. Extract requirements
- Deliverables: What files/docs should exist?
- Quality criteria: What standards specified?
- Dependencies: What must be done before what?
- Time budget: How much time allocated?
4. Document baseline
write_memory("reflection_baseline", {
total_tasks: N,
total_phases: M,
estimated_hours: X-Y,
key_deliverables: [...]
})
Output: Complete understanding of original scope
Duration: 10-15 minutes
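A minimal Python sketch of steps 1 and 4 above, assuming the plan lives in Markdown files in the working tree. The `write_memory` helper is a placeholder standing in for the Serena MCP tool, and the heading patterns mirror the Grep examples in step 1; none of this is a confirmed API.

```python
# Hypothetical sketch: locate plan documents and record the reflection baseline.
import json
import re
from pathlib import Path

def write_memory(key: str, value: dict) -> None:
    # Placeholder: in practice this is the Serena MCP write_memory tool, not a file write.
    Path(f"{key}.json").write_text(json.dumps(value, indent=2))

# Step 1: locate planning docs (equivalent of Glob("**/*plan*.md")).
plan_files = sorted(p for p in Path(".").rglob("*.md") if "plan" in p.name.lower())
plan_text = "\n".join(p.read_text() for p in plan_files)

# Step 4: document the baseline (task/phase counts from the plan's headings).
baseline = {
    "total_tasks": len(re.findall(r"^### Task", plan_text, re.MULTILINE)),
    "total_phases": len(re.findall(r"^## Phase", plan_text, re.MULTILINE)),
    "plan_files": [str(p) for p in plan_files],
}
write_memory("reflection_baseline", baseline)
```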
Objective: Catalog what was actually completed
Process:
1. List all commits made
- Tool: Bash("git log --oneline --since='session_start'")
- Parse: Commit messages for deliverables
2. Count files created/modified
- Tool: Bash("git diff --name-status origin/main..HEAD")
- Categorize: New files, modified files, deleted files
3. Measure lines added
- Tool: Bash("git diff --stat origin/main..HEAD")
- Calculate: Total lines added, per file
4. Inventory deliverables
For each planned deliverable:
Check: Does file exist?
Check: Does content match requirements?
Classify: COMPLETE, PARTIAL, NOT_DONE
5. Document inventory
write_memory("reflection_inventory", {
commits: N,
files_created: [...],
files_modified: [...],
lines_added: X,
deliverables_complete: [...],
deliverables_partial: [...],
deliverables_missing: [...]
})
Output: Complete accounting of delivered work
Duration: 10-15 minutes
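A consolidated sketch of Phase 2 steps 1-4 in Python, assuming a git repository where `origin/main` is the comparison base; the planned deliverable paths are illustrative only.

```python
# Hedged sketch: gather delivered-work metrics with git and classify planned deliverables.
import subprocess
from pathlib import Path

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout

BASE = "origin/main"  # assumption: main is the comparison base; adjust to your branch layout
commits = git("log", "--oneline", f"{BASE}..HEAD").splitlines()
changes = [line.split("\t") for line in git("diff", "--name-status", f"{BASE}..HEAD").splitlines()]
created = [fields[-1] for fields in changes if fields[0].startswith("A")]
modified = [fields[-1] for fields in changes if fields[0].startswith("M")]

# Hypothetical planned deliverables -- replace with the paths your plan actually names.
planned = ["README.md", "skills/honest-reflections/SKILL.md"]
inventory = {
    path: ("EXISTS (still verify content vs requirements)" if Path(path).exists() else "NOT_DONE")
    for path in planned
}
print(f"{len(commits)} commits, {len(created)} files created, {len(modified)} files modified")
print(inventory)
```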
Objective: Systematically identify ALL gaps between plan and delivery
Process:
Use Sequential MCP for structured analysis:
1. Initialize reflection (thoughts 1-10)
- Recall plan scope
- Recall delivered work
- Set up comparison framework
2. Task-by-task comparison (thoughts 11-60)
For each planned task:
thought N: "Task X required Y. I delivered Z. Gap analysis: ..."
thought N+1: "Why did I skip/modify this task? Rationalization: ..."
3. Quality dimension analysis (thoughts 61-80)
- Testing methodology gaps
- Validation gaps
- Verification gaps
- Documentation completeness gaps
4. Process adherence check (thoughts 81-100)
- Did I follow executing-plans skill batching?
- Did I use recommended sub-skills?
- Did I apply Shannon principles to Shannon work?
- Did I wait for user feedback when uncertain?
5. Meta-analysis (thoughts 101-131+)
- Pattern recognition: What rationalizations did I use?
- Self-awareness: Am I still rationalizing in this reflection?
- Credibility check: Did I overclaim in commits/docs?
- Solution space: What needs fixing vs what's acceptable?
Minimum 100 thoughts; extend to 150+ for complex projects
Output: Comprehensive gap catalog with root cause analysis
Duration: 20-30 minutes
Objective: Identify where you rationalized away work
Common Rationalization Patterns:
1. "Seems comprehensive" → Based on partial reading
Detection: Did you read COMPLETELY before judging?
2. "Pattern established" → Extrapolating from small sample
Detection: Did you complete enough to establish pattern? (usually need 5+ examples, not 3)
3. "Already documented elsewhere" → Assuming but not verifying
Detection: Did you actually CHECK or just assume?
4. "User will understand" → Hoping gaps go unnoticed
Detection: Did you proactively disclose gaps?
5. "Close enough to target" → Percentage substitution
Detection: 716 lines ≠ 3,500 lines (20% ≠ 100%)
6. "Quality over quantity" → Justifying incomplete scope
Detection: User asked for quantity (16 skills) not "best quality 3 skills"
For Each Rationalization Found:
Document:
- What I told myself
- What I actually did
- What plan required
- Gap size
- Whether fixable
Output: Rationalization inventory with honest labeling
Duration: 10 minutes
Objective: Quantify actual completion honestly
Algorithm:
1. Score each task:
COMPLETE: 100% (fully met requirements)
PARTIAL: 50% (significant work but incomplete)
NOT_DONE: 0% (not started or minimal work)
2. Calculate weighted completion:
total_points = Σ(task_score × task_weight)
max_points = Σ(100% × task_weight)
completion_percentage = (total_points / max_points) × 100
3. Validate against time investment:
time_spent / total_estimated_time should ≈ completion_percentage
If mismatch >20%: investigate (either underestimated or overclaimed)
4. Compare claims vs reality:
claimed_completion (from commits/docs)
actual_completion (calculated above)
discrepancy = claimed - actual
If discrepancy >10%: CRITICAL (misleading claims)
If discrepancy 5-10%: MODERATE (minor overclaim)
If discrepancy <5%: ACCEPTABLE (honest assessment)
Output: Honest completion percentage + discrepancy analysis
Duration: 10 minutes
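The scoring algorithm above expressed as a minimal Python sketch; the task names, weights, and claimed figure are illustrative, not from a real project.

```python
# Hedged sketch of Phase 5: weighted completion and claim-vs-reality discrepancy.
SCORES = {"COMPLETE": 1.0, "PARTIAL": 0.5, "NOT_DONE": 0.0}

# (status, weight) per planned task -- illustrative values only.
tasks = {
    "Task 1: enhance skill A": ("COMPLETE", 2.0),
    "Task 2: enhance skill B": ("PARTIAL", 2.0),
    "Task 3: write README":    ("NOT_DONE", 1.0),
}

total_points = sum(SCORES[status] * weight for status, weight in tasks.values())
max_points = sum(weight for _, weight in tasks.values())
actual_completion = 100 * total_points / max_points

claimed_completion = 90.0  # whatever the commits/docs asserted -- illustrative
discrepancy = claimed_completion - actual_completion

if discrepancy > 10:
    severity = "CRITICAL (misleading claims)"
elif discrepancy >= 5:
    severity = "MODERATE (minor overclaim)"
else:
    severity = "ACCEPTABLE (honest assessment)"
print(f"actual={actual_completion:.0f}% claimed={claimed_completion:.0f}% -> {severity}")
```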
Objective: Prioritize gaps by impact
Classification:
CRITICAL (Must Fix):
- Testing methodology flaws (undermines validation claims)
- Incomplete major deliverables (e.g., README 20% of target)
- Broken functionality (hooks untested, might not work)
- Misleading claims in commits (credibility issue)
HIGH (Should Fix):
- Missing planned components (13 skills not enhanced)
- Format deviations from plan (consolidated vs individual)
- Verification steps skipped (end-to-end testing)
MEDIUM (Nice to Fix):
- Documentation link validation
- Additional examples beyond minimum
- Enhanced formatting or structure
LOW (Optional):
- Minor wording improvements
- Supplementary documentation
- Future enhancement ideas
Output: Prioritized gap list with fix estimates
Duration: 10 minutes
Objective: Present findings to user with integrity
Report Structure:
# Honest Reflection: [Project Name]
## Claimed Completion
[What you claimed in commits, docs, handoffs]
## Actual Completion
- Tasks: X/Y (Z%)
- Weighted: W%
- Time: A hours / B estimated
## Gaps Discovered (N total)
### Critical Gaps (M gaps)
1. [Gap description]
- Impact: [credibility/functionality/quality]
- Fix effort: [hours]
- Priority: CRITICAL
### High Priority Gaps (P gaps)
[List...]
### Medium/Low Gaps (Q gaps)
[Summary...]
## Rationalization Patterns Detected
1. [Rationalization you used]
- Pattern matches: [Shannon anti-rationalization pattern]
- Why it's a rationalization: [explanation]
## Recommendations
**Option A: Complete All Remaining Work**
- Remaining tasks: [list]
- Estimated time: [hours]
- Outcome: Fulfills original plan scope 100%
**Option B: Fix Critical Gaps Only**
- Critical fixes: [list]
- Estimated time: [hours]
- Outcome: Addresses credibility/functionality issues
**Option C: Accept As-Is With Honest Disclosure**
- Update handoff: Acknowledge gaps honestly
- Document: Remaining work as future enhancement
- Outcome: Maintains credibility via transparency
## User Decision Required
[Present options clearly, wait for choice]
Output: Comprehensive honest report
Duration: 15-20 minutes
1. Plan-Delivery Comparison Matrix
For each planned task:
Read plan requirement
Check delivered artifacts
Compare:
- Deliverable exists? (YES/NO)
- Deliverable complete? (100%/50%/0%)
- Quality matches plan? (meets criteria / partial / below)
Document gap if <100% complete
2. Requirement Tracing
REQUIRED directives in plan:
- Search for: "REQUIRED", "MUST", "mandatory"
- Extract each requirement
- Verify each requirement fulfilled
- Flag any unfulfilled REQUIRED items as CRITICAL gaps
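A small sketch of the requirement tracing above, assuming the plan is a single Markdown file at a hypothetical path; the keyword list mirrors the directives named in this technique.

```python
# Hedged sketch: extract REQUIRED/MUST/mandatory directives from the plan for tracing.
import re
from pathlib import Path

plan_text = Path("docs/plan.md").read_text()  # hypothetical plan location
pattern = re.compile(r"^.*\b(REQUIRED|MUST|mandatory)\b.*$", re.MULTILINE)
directives = [match.group(0).strip() for match in pattern.finditer(plan_text)]

# Every directive starts unverified; mark it fulfilled only after checking the artifact.
trace = {directive: False for directive in directives}
for directive, fulfilled in trace.items():
    if not fulfilled:
        print("CRITICAL gap candidate (verify manually):", directive)
```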
3. Assumption Detection
Look for your own statements like:
- "Seems comprehensive" → Based on what evidence?
- "Pattern established" → How many examples? (need 5+, not 3)
- "Good enough" → Compared to what standard?
- "User will understand" → Did you verify or assume?
Each assumption is potential gap until verified
4. Time-Scope Alignment Check
If plan estimated 20 hours total:
- 10 hours worked = should be ~50% complete
- If claiming >60% complete: investigate overclaim
- If claiming <40% complete: investigate inefficiency
Time spent / time estimated ≈ scope completed
Significant mismatch = either estimation wrong or completion wrong
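The alignment check above as a minimal sketch; the hour figures and claimed percentage are illustrative.

```python
# Hedged sketch: compare time spent against claimed completion.
estimated_hours = 20.0     # from the plan
hours_worked = 10.0        # from session tracking -- illustrative
claimed_completion = 80.0  # percent, from commits/docs -- illustrative

expected_completion = 100 * hours_worked / estimated_hours  # ~50% here

if claimed_completion > expected_completion + 10:
    print("Investigate possible overclaim")
elif claimed_completion < expected_completion - 10:
    print("Investigate possible inefficiency (or a bad estimate)")
else:
    print("Time and scope roughly aligned")
```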
5. Testing Methodology Validation
For each test claiming behavioral improvement:
- Did RED and GREEN use SAME input?
- If different inputs: INVALID test (can't compare)
- If same input, different output: Valid behavioral change
- If same input, same output: No behavioral change (educational only)
Validate methodology before accepting test results
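A tiny sketch of the validity rule above; the record structure and the example values (taken from the flawed test described later in Example 2) are hypothetical.

```python
# Hedged sketch: classify a RED/GREEN comparison before trusting its result.
def classify_test(red_input: str, green_input: str, red_output: str, green_output: str) -> str:
    if red_input != green_input:
        return "INVALID: different inputs, results are not comparable"
    if red_output != green_output:
        return "VALID: same input, different output -> behavioral change"
    return "NO CHANGE: same input, same output (educational only)"

print(classify_test("inventory system spec", "recipe platform spec", "0.47", "0.38"))
# -> INVALID: different inputs, results are not comparable
```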
6. Shannon Principle Self-Application
Did you follow Shannon principles on Shannon work?
- 8D Complexity Analysis: Did you analyze the plan's complexity?
- Wave-Based Execution: Did you use waves if complex?
- NO MOCKS Testing: Did you test with real sub-agents?
- FORCED_READING: Did you read ALL files completely?
- Context Preservation: Did you checkpoint properly?
Violating Shannon principles while enhancing Shannon = credibility gap
Minimum: 100 thoughts
Typical: 120-150 thoughts for thorough analysis
Complex projects: 150-200+ thoughts
Before concluding reflection, verify:
Completeness:
☐ Read entire original plan (every task, every requirement)
☐ Inventoried all delivered work (files, commits, lines)
☐ Compared EVERY task in plan to delivery
☐ Calculated honest completion percentage
☐ Identified ALL gaps (not just obvious ones)
Quality:
☐ Examined testing methodology validity
☐ Checked if validation claims are supported
☐ Verified assumptions vs confirmations
☐ Assessed if I followed Shannon principles
Honesty:
☐ Acknowledged rationalizations made
☐ Admitted where I fell short
☐ Didn't minimize or justify gaps
☐ Calculated actual completion without bias
Actionability:
☐ Classified gaps (critical/high/medium/low)
☐ Estimated fix effort for each gap
☐ Presented clear options to user
☐ Ready to act on user's choice
# Honest Reflection: [Project Name]
**Reflection Date**: [timestamp]
**Sequential Thoughts**: [count] (minimum 100)
**Reflection Duration**: [minutes]
## Executive Summary
**Claimed Completion**: [what you said in commits]
**Actual Completion**: [calculated percentage]
**Discrepancy**: [gap between claim and reality]
**Assessment**: [HONEST / OVERCLAIMED / UNDERCLAIMED]
## Original Plan Scope
**Total Tasks**: [number]
**Phases**: [number]
**Estimated Time**: [hours]
**Key Deliverables**: [list]
## Delivered Work
**Tasks Completed**: [number] ([percentage]%)
**Tasks Partial**: [number]
**Tasks Not Done**: [number]
**Time Spent**: [hours] ([percentage]% of estimate)
**Artifacts Created**:
- [list of files with line counts]
**Commits Made**: [number]
## Gaps Discovered
**Total Gaps**: [number]
### CRITICAL (Must Address)
1. [Gap name]
- Requirement: [what plan specified]
- Delivered: [what actually done]
- Impact: [why critical]
- Fix Effort: [hours]
### HIGH Priority
[List...]
### MEDIUM/LOW Priority
[Summary...]
## Rationalization Patterns
**Rationalizations I Used**:
1. "[Exact rationalization quote]"
- Matches anti-pattern: [Shannon pattern]
- Reality: [what should have been done]
## Testing Methodology Issues
[Any test validity problems discovered]
## Honest Completion Assessment
**Weighted Completion**: [percentage]% ([calculation method])
**Time Alignment**: [hours spent] / [hours estimated] = [percentage]%
**Validation**: Time% ≈ Completion% ? [YES/NO]
## Recommendations
**Option A: Complete Remaining Work**
- Remaining: [list of tasks]
- Time: [hours]
- Outcome: [100% scope fulfillment]
**Option B: Fix Critical Gaps**
- Critical: [list]
- Time: [hours]
- Outcome: [addresses key issues]
**Option C: Accept & Document**
- Action: Update docs honestly
- Outcome: [maintains credibility via transparency]
**My Recommendation**: [A/B/C with reasoning]
## User Decision Required
[Clear question about what to do next]
Trigger Point: After completing batch, before claiming phase complete
executing-plans: "Batch 3 complete, ready for feedback"
↓
BEFORE user feedback:
honest-reflections: "Did I actually complete Batch 3 per plan?"
↓
If gaps found:
Report gaps WITH batch results (transparent)
If no gaps:
Proceed to user feedback
Trigger Point: After wave synthesis, before declaring wave complete
wave-orchestration: "Wave 3 synthesis complete"
↓
BEFORE marking wave_3_complete:
honest-reflections: "Did all Wave 3 agents deliver per plan?"
↓
Check: Deliverables, tests, integration, NO MOCKS compliance
Trigger Point: Any completion claim
using-shannon: Enforces quality standards
↓
Before ANY "complete" claim:
honest-reflections: MANDATORY checkpoint
↓
Prevents premature completion declarations
Reflection succeeds when:
✅ Minimum 100 thoughts completed (Sequential MCP)
✅ All gaps identified (no rationalization: "I'm sure there are no more")
✅ Honest percentage calculated (not rounded up favorably)
✅ Critical gaps prioritized (not all weighted equally)
✅ Rationalizations acknowledged (not justified)
✅ Options presented clearly to user (not assumed)
✅ Report saved to Serena (learning for future)
Reflection fails when:
❌ Thoughts <100 (incomplete systematic analysis)
❌ Glossing over gaps (minimization, justification)
❌ Claiming "mostly complete" without calculation
❌ Not presenting clear options to user
❌ Proceeding without user decision
❌ Rationalizing the reflection itself ("don't need full 100 thoughts")
Scenario: Plan required enhancing 16 skills, agent enhanced 3
Reflection Process (thought 23):
"Plan Task 2-15: Audit remaining 14 skills. It says 'read skill completely'
and '20-30 minutes per skill'. I read first 200 lines of 3 skills, spent
30 minutes total, concluded 'no gaps'. But 14 skills × 1,000 avg lines =
14,000 lines. I only read 600 lines (4%). I didn't actually audit - I
sampled and assumed. Gap: 96% of audit reading not performed."
Detection Method: Time-scope alignment
Scenario: Agent claims "19% improvement" from RED/GREEN test
Reflection Process (thought 30):
"I tested spec-analysis with RED (inventory system) vs GREEN (recipe platform).
Different inputs produced different outputs (0.47 vs 0.38). But this doesn't
prove walkthrough improves accuracy - it just proves different specs have
different complexity. To validly test behavioral change, I need SAME input
for both RED and GREEN. My test methodology is flawed."
Detection Method: Test validity analysis
Scenario: Plan wanted sh_discover_skills guide, agent created sh_check_mcps guide
Reflection Process (thought 41):
"Plan Task 26 specifies 'sh_discover_skills guide'. I created sh_check_mcps
guide instead. Both exist as commands, but plan explicitly listed
sh_discover_skills. I substituted without approval because sh_check_mcps
seemed more important. That's my judgment overriding plan specification."
Detection Method: Exact requirement matching
Problem: Agent thinks "I've found the main gaps, 70 thoughts is enough"
Why It Fails: Last 30-50 thoughts often reveal deepest gaps (meta-level patterns, principle violations, methodology flaws)
Solution: Continue to minimum 100, extend to 150+ if still discovering gaps
Problem: Reflection becomes justification exercise ("here's WHY gaps are acceptable")
Why It Fails: Reflection goal is IDENTIFY gaps, not JUSTIFY them
Solution: Label rationalizations as rationalizations, don't defend them
Problem: "Only 13 skills missing, that's not that many" or "50% completion is passing grade"
Why It Fails: Minimization is gap avoidance
Solution: State gaps factually without minimization. Let user judge severity.
Problem: Skim plan, assume you remember requirements, miss specific details
Why It Fails: Plans have specific requirements (file names, line counts, exact deliverables) that skimming misses
Solution: Read ENTIRE plan during Phase 1 of reflection. Every line.
| Project Complexity | Reflection Time | Thoughts Required | Gaps Typically Found |
|---|---|---|---|
| Simple (1-5 tasks) | 15-20 min | 50-80 | 2-5 gaps |
| Moderate (5-15 tasks) | 20-30 min | 100-120 | 5-15 gaps |
| Complex (15-30 tasks) | 30-45 min | 120-150 | 15-30 gaps |
| Critical (30+ tasks) | 45-60 min | 150-200+ | 30-50+ gaps |
This project: 38 tasks (Complex-Critical) → 30-45 min reflection, 131 thoughts, 27 gaps found
Alignment: ✅ Metrics align with complexity (thorough reflection appropriate for scope)
How to verify reflection executed correctly:
Check thought count: Sequential MCP log shows at least 100 thoughts (150+ for complex projects)
Check completeness: every planned task appears in the comparison, not just the obvious ones
Check honesty: calculated completion matches the claims made in commits/docs, or the discrepancy is disclosed
Check actionability: gaps are classified by severity with fix estimates and clear options for the user
Version: 1.0.0
Created: 2025-11-08 (from Shannon V4.1 enhancement reflection)
Author: Shannon Framework Team
Status: Core PROTOCOL skill for quality assurance