Analyze mistakes with conversation length as potential cause (DOG-specific)
Analyzes mistakes using 5-whys methodology with DOG-specific context degradation detection.
/plugin marketplace add cowwoc/claude-code-dog
/plugin install dog@claude-code-dog

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Analyze mistakes using 5-whys methodology with DOG-specific consideration of conversation length and context degradation as potential root causes. Integrates with token tracking data to identify context-related failure patterns and recommend preventive measures including earlier decomposition.
```yaml
mistake:
  timestamp: 2026-01-10T16:30:00Z
  type: incorrect_implementation
  description: |
    Subagent implemented parser with wrong precedence rules.
    Expressions like "a + b * c" parsed as "(a + b) * c" instead
    of "a + (b * c)".
  impact: |
    All tests using operator precedence failing.
    Required complete rewrite of expression parsing.
```
DOG-specific: always collect token data first:

```bash
SESSION_ID="${SUBAGENT_SESSION}"
SESSION_FILE="/home/node/.config/claude/projects/-workspace/${SESSION_ID}.jsonl"

# Token usage at time of mistake
TOKENS_AT_ERROR=$(jq -s 'map(select(.type == "assistant")) |
  map(.message.usage | .input_tokens + .output_tokens) | add' "${SESSION_FILE}")

# Compaction events before mistake
COMPACTIONS=$(jq -s '[.[] | select(.type == "summary")] | length' "${SESSION_FILE}")

# Messages before mistake
MESSAGE_COUNT=$(jq -s '[.[] | select(.type == "assistant")] | length' "${SESSION_FILE}")

# Time since session start
SESSION_DURATION=$(calculate_duration "${SESSION_FILE}")
```
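The `calculate_duration` helper is not defined by the skill itself. A minimal sketch, assuming each JSONL entry carries an ISO-8601 `.timestamp` field and GNU `date` is available:

```bash
# Hypothetical helper: session duration from the first and last
# .timestamp fields in the session JSONL. Entries without a
# timestamp (e.g. summary records) are skipped by `// empty`.
calculate_duration() {
  local file="$1"
  local first last
  first=$(jq -rs 'map(.timestamp // empty) | first' "${file}")
  last=$(jq -rs 'map(.timestamp // empty) | last' "${file}")
  # GNU date: convert ISO-8601 to epoch seconds and subtract
  echo "$(( $(date -d "${last}" +%s) - $(date -d "${first}" +%s) ))s"
}
```

On BSD/macOS `date`, the `-d` flag would need replacing with `-j -f`.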
Standard analysis with DOG-specific consideration:

```yaml
five_whys:
  - why: "Why was precedence implemented incorrectly?"
    answer: "Subagent confused multiplication and addition handling"
  - why: "Why was the subagent confused?"
    answer: "Earlier context about precedence rules was not referenced"
  - why: "Why wasn't earlier context referenced?"
    answer: "Session had 95K tokens, approaching context limit"
  - why: "Why were there 95K tokens in the session?"
    answer: "Task scope was too large for single context window"
  - why: "Why wasn't the task decomposed earlier?"
    answer: "Token monitoring wasn't triggering at 40% threshold"

root_cause: "Task exceeded safe context bounds without decomposition"
category: CONTEXT_DEGRADATION
```
DOG-specific analysis checklist:

```yaml
context_degradation_analysis:
  # Token-related factors
  tokens_at_error: 95000
  threshold_exceeded: true  # > 80K
  threshold_exceeded_by: 15000

  # Compaction factors
  compaction_events: 2
  errors_after_compaction: true

  # Temporal factors
  session_duration: 4.5 hours
  messages_before_error: 127

  # Quality trend
  early_session_quality: high
  late_session_quality: degraded
  quality_degradation_detected: true

  # Conclusion
  context_related: LIKELY
  confidence: 0.85
```
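The checklist verdict can be approximated mechanically from the variables collected earlier. A rough heuristic sketch; the 80K/60K cutoffs mirror the thresholds discussed in this skill, but the function and its labels are illustrative, not a DOG API:

```bash
# Illustrative classifier: combine token count and compaction count
# into a context-relation verdict (LIKELY / POSSIBLY / UNLIKELY).
classify_context_relation() {
  local tokens="$1" compactions="$2"
  if [ "${tokens}" -gt 80000 ] && [ "${compactions}" -gt 0 ]; then
    echo "LIKELY"     # over threshold AND compacted
  elif [ "${tokens}" -gt 60000 ] || [ "${compactions}" -gt 0 ]; then
    echo "POSSIBLY"   # one degradation signal present
  else
    echo "UNLIKELY"   # short session, no compaction
  fi
}
```

Applied to the worked examples below, 95K tokens with 2 compactions yields LIKELY, 75K with 1 compaction yields POSSIBLY, and 25K with none yields UNLIKELY.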
Standard hierarchy with DOG additions:

```yaml
prevention_hierarchy:
  # Level 1: Code fix (best)
  - level: 1
    type: code_fix
    description: "Make code self-correcting or impossible to get wrong"

  # Level 2: DOG-specific - earlier decomposition
  - level: 2
    type: earlier_decomposition
    description: "Trigger task split before context degradation occurs"
    dog_specific: true

  # Level 3: Validation/hook
  - level: 3
    type: validation
    description: "Add automated checks that catch the mistake early"

  # Level 4: Lower threshold
  - level: 4
    type: threshold_adjustment
    description: "Reduce context threshold from 40% to a more conservative value"
    dog_specific: true

  # Level 5: Process change
  - level: 5
    type: process
    description: "Change workflow to prevent the mistake"

  # Level 6: Documentation (last resort)
  - level: 6
    type: documentation
    description: "Document to prevent future occurrence"
```
For context-related mistakes:

```yaml
prevention_action:
  if_context_related:
    primary:
      action: "Adjust token monitoring threshold"
      current_threshold: 80000  # 40%
      new_threshold: 60000      # 30%
      rationale: "Earlier warning gives time to decompose"
    secondary:
      action: "Add quality checkpoint at 50% context"
      implementation: |
        At 50% context (100K tokens), pause and verify:
        - Is work quality consistent with early session?
        - Are earlier decisions still being referenced?
        - Should the task be decomposed now?
    tertiary:
      action: "Enhance PLAN.md with explicit checkpoints"
      implementation: |
        Add context-aware milestones to task plans.
        Each milestone = potential decomposition point.
  verification:
    action: "Rerun similar task with new threshold"
    success_criteria:
      - Decomposition triggered before 60K tokens
      - No quality degradation observed
      - Original mistake type does not recur
```
```yaml
learning_record:
  mistake_id: M019
  date: 2026-01-10
  category: CONTEXT_DEGRADATION
  summary: |
    Parser precedence error due to context degradation at 95K tokens.
  root_cause: |
    Task too large for single context window. Quality degraded
    as earlier context became less accessible.
  prevention: |
    - Lowered monitoring threshold from 40% to 30%
    - Added 50% context quality checkpoint
    - Task plans now include decomposition milestones
  dog_specific_learning: |
    Long conversations (>60K tokens) show measurable quality
    degradation. Complex tasks should be decomposed proactively,
    not reactively at context limits.
```
```yaml
mistake:
  type: "Forgot earlier requirement"
  tokens_at_error: 110000
  compactions: 3
analysis:
  context_related: YES
  pattern: "Requirement stated at 15K tokens, forgotten by 110K"
prevention:
  type: earlier_decomposition
  action: "Split task at 40K tokens, before degradation"
```

```yaml
mistake:
  type: "Used wrong API method"
  tokens_at_error: 25000
  compactions: 0
analysis:
  context_related: NO
  pattern: "Simple misunderstanding of API, not context issue"
prevention:
  type: validation
  action: "Add API usage verification in code review checklist"
```

```yaml
mistake:
  type: "Inconsistent code style"
  tokens_at_error: 75000
  compactions: 1
analysis:
  context_related: POSSIBLY
  pattern: "Style was consistent until compaction, then diverged"
  contributing_factors:
    - Compaction lost style context
    - No automated style check
prevention:
  type: hybrid
  actions:
    - "Add automated style linting (code fix)"
    - "Lower threshold to avoid compaction (DOG-specific)"
```
```yaml
# ❌ Standard analysis only
five_whys:
  - "Why error?" -> "Bad implementation"
  - "Why bad?" -> "Misunderstood requirements"
  # Stops here, misses context cause

# ✅ DOG-specific analysis
five_whys:
  - "Why error?" -> "Bad implementation"
  - "Why bad?" -> "Misunderstood requirements"
  - "Why misunderstood?" -> "Earlier context not referenced"
  - "Why not referenced?" -> "95K tokens, context pressure"
  - "Why 95K tokens?" -> "Task not decomposed"
```

```yaml
# ❌ Blaming context for everything
mistake: "Typo in variable name"
analysis: "Must be context degradation"

# ✅ Honest analysis
mistake: "Typo in variable name"
analysis: |
  Tokens at error: 15000 (15% of context)
  Compactions: 0
  Context-related: NO
  Actual cause: simple typo, needs spellcheck
```

```yaml
# ❌ Arbitrary threshold change
new_threshold: 20000  # "Let's be extra safe"

# ✅ Data-driven adjustment
analysis: |
  Errors consistently occur after 70K tokens.
  Quality degradation measurable at 60K.
  Setting threshold at 50K provides safety margin.
new_threshold: 50000
```

```yaml
# ❌ Implement and forget
prevention: "Lower threshold to 30%"
# Never verified!

# ✅ Verify prevention works
prevention: "Lower threshold to 30%"
verification:
  - Run similar task
  - Confirm decomposition triggers at 30%
  - Confirm mistake type doesn't recur
```
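The data-driven adjustment can be sketched as a small helper: take the lowest token count at which an error was observed across sessions and subtract a safety margin. The function name and the 10K margin are illustrative assumptions, not DOG defaults:

```bash
# Illustrative threshold calculation from observed error data.
# Arguments: token counts at which past errors occurred.
compute_threshold() {
  local margin=10000   # assumed safety margin below earliest failure
  local min_error_tokens
  # Earliest failure point = smallest observed token count
  min_error_tokens=$(printf '%s\n' "$@" | sort -n | head -1)
  echo $(( min_error_tokens - margin ))
}
```

For the three example mistakes above (95K, 110K, and 75K tokens), this would recommend a 65K threshold, safely below the earliest observed failure.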
Related skills:

- dog:token-report - Provides data for context analysis
- dog:decompose-task - Implements earlier decomposition
- dog:monitor-subagents - Catches context issues early
- dog:collect-results - Preserves progress before intervention