From contextd
Use when reviewing agent behavior patterns, improving CLAUDE.md based on past failures, or checking ReasoningBank health. REQUIRES contextd MCP server - this skill is inoperable without it.
npx claudepluginhub fyrsmithlabs/marketplace --plugin contextd
How this skill is triggered (by the user, by Claude, or both):
Slash command
/contextd:self-reflection
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Mine memories and remediations for behavior patterns, surface findings to user, remediate docs with pressure-tested improvements.
Core loop: Search -> Report -> User prioritizes -> Brainstorm -> Pressure test -> Apply
contextd-workflow skill, /remember
Focus on agent behaviors, not technical failures:
| Behavior Type | Description | Examples |
|---|---|---|
| rationalized-skip | Justified skipping required step | "too simple to test", "user implied consent" |
| overclaimed | Absolute language inappropriately | "ensures", "guarantees", "production ready" |
| ignored-instruction | Didn't follow CLAUDE.md/skill | Skipped contextd search, ignored TDD |
| assumed-context | Assumed without verification | Assumed permission, requirements, state |
| undocumented-decision | Significant choice without rationale | Changed architecture without comparison |
| Severity | Combination |
|---|---|
| CRITICAL | rationalized-skip + destructive/security operation |
| HIGH | rationalized-skip + validation skip, ignored-instruction |
| MEDIUM | overclaimed, assumed-context |
| LOW | undocumented-decision, style issues |
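The severity table can be read as a small classification rule. The following is a minimal sketch of that reading, not contextd's implementation; the context flag names (`destructive`, `security`, `validation-skip`) and the default for a rationalized skip with no aggravating context are assumptions made for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def classify_severity(behavior, context=()):
    """Map a behavior type plus context flags to a severity level.

    `context` is an iterable of hypothetical flag names such as
    "destructive", "security", or "validation-skip".
    """
    flags = set(context)
    if behavior == "rationalized-skip":
        if flags & {"destructive", "security"}:
            return Severity.CRITICAL
        # The table lists "validation skip" as HIGH. Assumption: any other
        # rationalized skip is also treated as HIGH, erring on caution.
        return Severity.HIGH
    if behavior == "ignored-instruction":
        return Severity.HIGH
    if behavior in {"overclaimed", "assumed-context"}:
        return Severity.MEDIUM
    # undocumented-decision, style issues, and anything unrecognized
    return Severity.LOW
```

Because `Severity` is an `IntEnum`, findings can be sorted so that the most severe come first, for example `sorted(levels, reverse=True)`.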
For each finding, surface:
Present findings
|
User selects findings to remediate
|
Generate doc improvements
|
Generate pressure scenarios (from real failures)
|
Run batch tests via subagents
|
Pass? --No--> Iterate
| Yes
Create Issue/PR
|
Apply changes
|
Close feedback loop:
memory_feedback(memory_id, helpful=true)
Tag original memories as remediated
# Rationalized skips
memory_search("skip OR skipped OR bypass OR ignored")
memory_search("too simple OR trivial OR obvious")
# User feedback indicating ignored instructions
memory_search("why did you OR should have OR forgot to")
# Assumptions without verification
memory_search("assumed OR without checking")
# Overclaiming
memory_search("ensures OR guarantees OR production ready")
Filter out technical bugs: Exclude memories with error:* tags or stack traces.
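The filter described above could look roughly like this. It is a sketch under stated assumptions: the memory record shape (`tags` and `content` keys) is hypothetical, and the stack-trace pattern is a simple heuristic covering Python and Java-style traces only.

```python
import re

# Heuristic only: matches a Python traceback header or a Java-style "at pkg.Cls.method(File:line)" frame.
STACK_TRACE = re.compile(
    r"Traceback \(most recent call last\)|^\s+at \S+\(.*\)$",
    re.MULTILINE,
)


def is_technical_bug(memory):
    """Return True if the memory has an error:* tag or looks like a stack trace."""
    tags = memory.get("tags", [])
    if any(tag.startswith("error:") for tag in tags):
        return True
    return bool(STACK_TRACE.search(memory.get("content", "")))


def behavioral_only(memories):
    """Keep only memories that are not technical bugs."""
    return [m for m in memories if not is_technical_bug(m)]
```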
--health flag analyzes:
| Action | Command |
|---|---|
| Full report | /reflect |
| Health only | /reflect --health |
| Apply fixes | /reflect --apply |
| Recent only | /reflect --since=7d |
| Filter by behavior | /reflect --behavior=rationalized-skip |
| Filter by severity | /reflect --severity=HIGH |
| Mistake | Why It Fails |
|---|---|
| Skipping pressure tests | "Fixed" docs don't actually prevent behavior |
| Modifying plugin source | Breaks on update; use includes |
| Auto-applying security fixes | High-stakes changes need review |
| Ignoring frequency | 10 TDD skips is systemic, not minor |
| Absolute claims in fixes | "This prevents X" -> "This helps reduce X" |
Go beyond symptoms to find root causes:
{
  "finding_id": "ref_001",
  "behavior": "rationalized-skip",
  "symptom": "Skipped tests before claiming fix complete",
  "causal_chain": [
    {
      "level": 1,
      "cause": "Agent claimed fix without running tests",
      "evidence": ["mem_123", "mem_124"]
    },
    {
      "level": 2,
      "cause": "CLAUDE.md test instruction buried in long section",
      "evidence": ["claude_md_line_245"]
    },
    {
      "level": 3,
      "cause": "No PreToolUse hook enforcing test requirement",
      "evidence": ["hooks.json missing enforcement"]
    }
  ],
  "root_cause": "Missing automated enforcement of test-before-fix policy",
  "fix_target": "hooks.json + CLAUDE.md restructure"
}
| Level | Description | Fix Location |
|---|---|---|
| 1 | Immediate behavior | Agent prompt/skill |
| 2 | Missing guidance | CLAUDE.md/documentation |
| 3 | Missing enforcement | Hooks/automation |
| 4 | Systemic gap | Plugin/skill redesign |
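One way to connect a causal chain to the level table is to take the deepest level reached and look up its fix location. This is an illustrative sketch; treating the deepest link as the root cause is an assumption based on the example finding, and the helper names are hypothetical.

```python
# Fix locations copied from the level table above.
FIX_LOCATIONS = {
    1: "Agent prompt/skill",
    2: "CLAUDE.md/documentation",
    3: "Hooks/automation",
    4: "Plugin/skill redesign",
}


def deepest_cause(finding):
    """Return the causal_chain link with the highest level."""
    return max(finding["causal_chain"], key=lambda link: link["level"])


def fix_location(finding):
    """Map the deepest causal level to the place where the fix belongs."""
    return FIX_LOCATIONS[deepest_cause(finding)["level"]]
```

For the `ref_001` finding above, whose chain stops at level 3, this points at hooks and automation, which matches its `fix_target` of hooks.json.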
Find patterns across incidents:
causal_correlate(findings: [ref_001, ref_002, ref_003])
Returns:
  shared_root_causes: [
    { cause: "Missing hook enforcement", incidents: [ref_001, ref_002] },
    { cause: "Ambiguous CLAUDE.md section", incidents: [ref_002, ref_003] }
  ]
  recommended_fixes: [
    { target: "hooks.json", impact: "high", fixes_incidents: 2 }
  ]
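The correlation step amounts to grouping findings by the causes they share. The sketch below shows that grouping in plain Python; it is not the `causal_correlate` implementation, and the input shape (finding id mapped to a list of cause strings) is an assumption.

```python
from collections import defaultdict


def shared_root_causes(findings):
    """Group finding ids by cause and keep causes seen in more than one finding.

    `findings` maps a finding id to the list of causes in its causal chain.
    """
    by_cause = defaultdict(list)
    for finding_id, causes in findings.items():
        for cause in causes:
            by_cause[cause].append(finding_id)

    shared = [
        {"cause": cause, "incidents": ids}
        for cause, ids in by_cause.items()
        if len(ids) > 1
    ]
    # Most widely shared causes first; ties broken alphabetically for stable output.
    shared.sort(key=lambda entry: (-len(entry["incidents"]), entry["cause"]))
    return shared
```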
Track improvement (or regression):
{
  "benchmark_period": "2026-01-01 to 2026-01-28",
  "metrics": {
    "rationalized_skip": {
      "count": 5,
      "previous_period": 12,
      "trend": "improving",
      "change_pct": -58
    },
    "ignored_instruction": {
      "count": 8,
      "previous_period": 6,
      "trend": "regressing",
      "change_pct": 33
    },
    "assumed_context": {
      "count": 3,
      "previous_period": 3,
      "trend": "stable",
      "change_pct": 0
    }
  }
}
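The `trend` and `change_pct` fields follow from the two counts. A minimal sketch of that arithmetic, assuming lower counts are always better and that percentages are rounded to whole numbers as in the record above:

```python
def change_pct(count, previous):
    """Percentage change from the previous period, rounded to an integer.

    Returns None when the previous count is zero, since the ratio is undefined.
    """
    if previous == 0:
        return None
    return round((count - previous) / previous * 100)


def trend(count, previous):
    """Fewer incidents than last period is an improvement."""
    if count < previous:
        return "improving"
    if count > previous:
        return "regressing"
    return "stable"
```

For example, 5 skips against 12 in the previous period gives (5 - 12) / 12 = -58.3%, recorded as -58.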
| Metric | Target | Good | Warning | Critical |
|---|---|---|---|---|
| rationalized_skip/week | 0 | < 2 | 2-5 | > 5 |
| ignored_instruction/week | 0 | < 3 | 3-7 | > 7 |
| overclaimed/week | 0 | < 5 | 5-10 | > 10 |
| test_coverage_skip | 0% | < 5% | 5-15% | > 15% |
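The threshold table can be applied mechanically. In this sketch the boundaries are copied from the table, the function name is hypothetical, and `test_coverage_skip` values are assumed to be expressed in percent.

```python
# (good_below, warning_max) per metric, taken from the table above.
THRESHOLDS = {
    "rationalized_skip": (2, 5),
    "ignored_instruction": (3, 7),
    "overclaimed": (5, 10),
    "test_coverage_skip": (5, 15),  # percent
}


def metric_status(metric, value):
    """Classify a weekly value as "good", "warning", or "critical"."""
    good_below, warning_max = THRESHOLDS[metric]
    if value < good_below:
        return "good"
    if value <= warning_max:
        return "warning"
    return "critical"
```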
/reflect --benchmark --compare-periods "2026-01" "2025-12"
Output:
| Behavior | Dec 2025 | Jan 2026 | Change |
|----------|----------|----------|--------|
| rationalized-skip | 12 | 5 | -58% |
| ignored-instruction | 6 | 8 | +33% |
Top Improvement: Hook enforcement reduced skips
Top Regression: New skills lack CLAUDE.md entries
Predict likely future failures based on patterns:
{
  "prediction": {
    "behavior": "rationalized-skip",
    "likelihood": 0.75,
    "conditions": [
      "Complex task with > 5 sub-steps",
      "Time pressure mentioned in prompt",
      "No explicit test requirement in task"
    ],
    "historical_basis": ["mem_101", "mem_102", "mem_103"],
    "prevention": "Add explicit test checkpoint to complex task prompts"
  }
}
| Factor | Risk Increase | Mitigation |
|---|---|---|
| Task complexity > 5 steps | +40% skip risk | Explicit checkpoints |
| "Quick fix" language | +60% skip risk | Reject quick-fix framing |
| No acceptance criteria | +50% assumption risk | Require criteria |
| Security-adjacent code | +30% overclaim risk | Require review |
{
  "alert": "high_risk_task_detected",
  "task_description": "Quick fix for authentication bug",
  "risk_factors": ["quick_fix_language", "security_adjacent"],
  "predicted_behaviors": ["rationalized-skip", "assumed-context"],
  "recommended_guardrails": [
    "Require explicit test plan before starting",
    "Trigger consensus-review before merge"
  ]
}
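How risk factors are detected is not specified here. As a rough illustration, a keyword heuristic over the task description could produce the `risk_factors` list in the alert above; the keyword lists and function names below are assumptions, not part of contextd.

```python
# Hypothetical keyword lists; a real detector would likely be more nuanced.
RISK_KEYWORDS = {
    "quick_fix_language": ("quick fix", "quickly", "just a small"),
    "security_adjacent": ("auth", "password", "token", "permission"),
}


def detect_risk_factors(task_description):
    """Return the risk factor names whose keywords appear in the description."""
    text = task_description.lower()
    return [
        factor
        for factor, keywords in RISK_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def build_alert(task_description):
    """Build an alert record when at least one risk factor matches, else None."""
    factors = detect_risk_factors(task_description)
    if not factors:
        return None
    return {
        "alert": "high_risk_task_detected",
        "task_description": task_description,
        "risk_factors": factors,
    }
```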
Auto-intervene when risk detected:
{
  "hook_type": "PreToolUse",
  "tool_name": "Edit",
  "condition": "file_path.contains('auth') AND prediction.skip_risk > 0.5",
  "prompt": "High skip risk detected for security code. Before editing, confirm: 1) Tests exist 2) Review planned 3) No assumptions about user state"
}
Tag reflection findings with standard types:
| Finding Type | Tag | Purpose |
|---|---|---|
| Behavior pattern | type:pattern, category:behavior | Track patterns |
| Root cause | type:decision, category:analysis | Document cause |
| Fix proposal | type:learning, category:improvement | Capture fix |
| Regression | type:failure, category:regression | Track setbacks |
| Policy update | type:policy, category:enforcement | New rules |
<org>/<project>/reflections/<reflection_id>
Examples:
fyrsmithlabs/contextd/reflections/2026-01-weekly
fyrsmithlabs/marketplace/reflections/v1.6-pre-release
<reflection_namespace>/findings/<finding_id>
Example:
fyrsmithlabs/contextd/reflections/2026-01-weekly/findings/ref_001
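The two path formats above compose directly. A small sketch with hypothetical helper names, assuming that path segments must be non-empty and must not themselves contain a slash:

```python
def _segment(value):
    """Validate a single path segment (assumption: no empty parts, no slashes)."""
    if not value or "/" in value:
        raise ValueError(f"invalid path segment: {value!r}")
    return value


def reflection_namespace(org, project, reflection_id):
    """Build <org>/<project>/reflections/<reflection_id>."""
    return "/".join(
        [_segment(org), _segment(project), "reflections", _segment(reflection_id)]
    )


def finding_path(namespace, finding_id):
    """Build <reflection_namespace>/findings/<finding_id>."""
    return f"{namespace}/findings/{_segment(finding_id)}"
```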
All reflection records include:
| Field | Description | Auto-set |
|---|---|---|
| created_by | Reflection agent/session | Yes |
| created_at | Analysis timestamp | Yes |
| period_start | Analysis period start | Yes |
| period_end | Analysis period end | Yes |
| memory_count | Memories analyzed | Yes |
| finding_count | Findings generated | Yes |
| remediation_count | Fixes applied | Yes |
Run reflection analysis without blocking:
Task(
  subagent_type: "general-purpose",
  prompt: "Analyze memories for behavior patterns over past 7 days",
  run_in_background: true,
  description: "Background reflection analysis"
)
// Continue other work...
// Collect results later:
TaskOutput(task_id, block: true)
Chain reflection phases:
search_task = Task(prompt: "Search memories for behavior patterns")
analyze_task = Task(prompt: "Analyze patterns, build causal chains", addBlockedBy: [search_task.id])
benchmark_task = Task(prompt: "Compare to previous period", addBlockedBy: [analyze_task.id])
predict_task = Task(prompt: "Generate predictions", addBlockedBy: [analyze_task.id])
report_task = Task(prompt: "Synthesize report", addBlockedBy: [benchmark_task.id, predict_task.id])
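The `addBlockedBy` links above form a dependency graph in which the benchmark and prediction phases can run in parallel once analysis finishes. The sketch below uses Python's standard `graphlib` to show the resulting execution order; the short phase names are labels chosen for illustration, and this does not call the Task tool itself.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases it is blocked by.
BLOCKED_BY = {
    "search": set(),
    "analyze": {"search"},
    "benchmark": {"analyze"},
    "predict": {"analyze"},
    "report": {"benchmark", "predict"},
}


def execution_waves(graph):
    """Return lists of phases that can run concurrently, in dependency order."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With the graph above, the phases fall into four waves: search, then analyze, then benchmark and predict together, then report.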
Auto-alert on predicted risky operations:
{
  "hook_type": "PreToolUse",
  "tool_name": "Edit|Bash",
  "condition": "prediction_model.risk_score > 0.7",
  "prompt": "High-risk operation predicted. Review risk factors and confirm guardrails are in place before proceeding."
}
Auto-record behavior patterns:
{
  "hook_type": "PostToolUse",
  "tool_name": "Task",
  "condition": "task_description.contains('reflection')",
  "prompt": "Reflection complete. Record findings to memory with type:pattern tags. Update benchmarks."
}
Self-reflection emits events for other skills:
{
  "event": "reflection_complete",
  "payload": {
    "reflection_id": "2026-01-weekly",
    "findings_count": 12,
    "critical_count": 1,
    "high_count": 3,
    "trend": "improving",
    "top_behavior": "rationalized-skip"
  },
  "notify": ["setup", "workflow", "orchestration"]
}
Subscribe to reflection events:
reflection_started - Analysis began
reflection_complete - Analysis finished
critical_finding - CRITICAL behavior detected
regression_detected - Metrics worsening
benchmark_updated - New baseline recorded
prediction_generated - Risk prediction available
intervention_triggered - Auto-guardrail activated
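The subscription mechanism itself is not described here. As a sketch of what consuming these events could look like, the following is a minimal in-process dispatcher; the class and method names are hypothetical, and only the event names come from the list above.

```python
from collections import defaultdict

KNOWN_EVENTS = {
    "reflection_started",
    "reflection_complete",
    "critical_finding",
    "regression_detected",
    "benchmark_updated",
    "prediction_generated",
    "intervention_triggered",
}


class ReflectionEvents:
    """Tiny publish/subscribe registry keyed by event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event, handler):
        if event not in KNOWN_EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._handlers[event].append(handler)

    def emit(self, event, payload):
        """Call every handler registered for the event; return how many ran."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

A skill interested only in escalations would subscribe to `critical_finding` and `regression_detected` and ignore the rest.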