From claude-code-hermit
Reflects on recent work by analyzing session reports, detecting token cost spikes, reviewing proposals, checking resolutions, and proposing pattern-based improvements.
npx claudepluginhub gtapps/claude-code-hermit --plugin claude-code-fitness-hermitThis skill uses the workspace's default tool permissions.
Pause and think about your recent work.
Reviews conversation history for corrections, traces errors to skills, CLAUDE.md, memory or tools, and proposes process fixes. Invoke after corrections or at session end.
Reviews completed coding sessions to extract actionable improvements: DX friction, documentation gaps, architecture issues, anti-patterns, bug prevention, and tooling updates.
Guides Next.js Cache Components and Partial Prerendering (PPR): 'use cache' directives, cacheLife(), cacheTag(), revalidateTag() for caching, invalidation, static/dynamic optimization. Auto-activates on cacheComponents: true.
Share bugs, ideas, or general feedback.
Pause and think about your recent work.
This skill is silent by default. Only notify the operator (per the channel policy in CLAUDE.md § Operator Notification) if reflect produces an outcome: a proposal candidate, a micro-approval, a resolved proposal, a graduated sub-threshold observation, or a cost spike.
Read SHELL.md for current context
Read last 20 lines of cost-log.jsonl. Compute today's total and the 7-day median. If today's total > 2× the 7-day median (and both are non-zero), record the spike to project memory as a sub-threshold observation with pattern cost_spike: $X.XX vs 7d median $Y.YY and today's session_id — it becomes input to later reflects and may graduate via the recurrence rule.
2b. Compute phase — gates adapt to hermit age so cold-start installs produce visible output without eroding mature-hermit rigor.
counters.since from state/reflection-state.json (set once at hatch, never rewritten). If missing or unparseable → default $PHASE = adult and continue. Never block.age_days = whole days between counters.since and now.$PHASE table (age is monotonic → no hysteresis):
newborn — age_days < 3juvenile — 3 ≤ age_days < 14adult — age_days ≥ 14$PHASE for the rest of this run; it gates recurrence (Three-Condition Rule #1), sub-threshold surfacing (Outcomes), and the Progress Log annotation.Scan proposals/ for existing proposals (dedup, stale check, feedback loop). Parse metadata from YAML frontmatter if present (file starts with ---). Fall back to parsing bullet-point metadata (**Status:**, **Source:**, etc.) for pre-Observatory proposals. Also tail the last 100 lines of state/proposal-metrics.jsonl and count responded events by action (accept / defer / dismiss) — the dismissal ratio feeds the operator-value self-check below.
Resolution Check — check whether any accepted proposals can be marked resolved. Cap: check up to 5 per reflect cycle, round-robin.
a. Read state/reflection-state.json → last_resolution_check (last PROP-NNN checked, or null if first run).
b. Read all proposals with status: accepted. Sort by accepted_date ascending. Resume from the proposal after last_resolution_check, wrapping around. Take up to 5.
c. If the accepted list from step b is empty, skip to step f.
d. For each proposal: read its title and Evidence section to understand the original pattern.
Glob .claude-code-hermit/sessions/S-*-REPORT.md, sort descending, take the 3 most recent.
e. If the pattern is absent from all 3 checked sessions — apply the cadence-aware resolution rule:
Compute original cadence:
related_sessions list from frontmatter.S-NNN, read date: from sessions/S-NNN-REPORT.md frontmatter.original_cadence_days = max(date) - min(date) in whole days. Single session → 0.related_sessions is empty → treat as sparse (conservative fallback).Frequent pattern (original_cadence_days ≤ 14) — auto-resolve if ≥ 14 days have elapsed since accepted_date:
status: resolved, resolved_date: <now ISO>.resolved event:
node ${CLAUDE_PLUGIN_ROOT}/scripts/append-metrics.js .claude-code-hermit/state/proposal-metrics.jsonl '{"ts":"<now ISO>","type":"resolved","proposal_id":"PROP-NNN"}'
Sparse pattern (original_cadence_days > 14) — never auto-resolve. Surface for operator confirmation if elapsed ≥ 2 × original_cadence_days since accepted_date:
state/reflection-state.json → last_sparse_nudge.<PROP-NNN>. If present and fewer than 7 days have elapsed since that date, skip (prevents daily re-nudge noise).PROP-NNN appears resolved (pattern absent 3/3 recent sessions, original cadence Nd, Xd since accept). Run /claude-code-hermit:proposal-act resolve PROP-NNN to confirm.
"last_sparse_nudge": {"<PROP-NNN>": "<now ISO>"} in the State Update payload below. The update script merges under the top-level last_sparse_nudge key.If the pattern is absent but the elapsed guard is not yet met (frequent: < 14d, sparse: < 2×cadence), skip and revisit next cycle.
f. Note the last PROP-NNN checked (or null if batch was empty). Include as last_resolution_check in the final state write in the State Update section below — do NOT write reflection-state.json here.
Now reflect — ultrathink — using your memory and the context above:
responded event counts from step 3: a high dismiss ratio, proposals stacking in deferred, or compiled/brief outputs that go uncited in subsequent sessions are signals that some output is noise. If the signal is strong, treat it like the routine-silence check below and consider a Tier 1 micro-proposal to pare back the offending output.Only create a proposal if all three are true:
newborn: 1+ session acceptable for Tier 1 only — use Evidence Source: current-session and cite Sessions: current when the pattern is present only in the live SHELL.md (judge returns ACCEPT (current-session)). Tier 2/3 still require 2+ sessions.juvenile / adult: 2+ sessions (baseline — observed more than once, across sessions).If any of the three cannot be stated concretely, do not create the proposal. Sub-threshold observations (interesting but failing the rule) are recorded to project memory so they can graduate on later recurrence — see the Outcomes section.
If SHELL.md status is idle — think broader:
Should any recurring check be added to HEARTBEAT.md?
Is there a preference or constraint missing from OPERATOR.md?
Would a sub-agent improve a type of work that keeps coming up?
Would a skill formalize a workflow you keep repeating?
Is a manual request repeating on a schedule? (e.g., "operator asked for dependency check 3 of last 4 Mondays")
If so: create a proposal with Type: routine and a ## Config block containing the routine JSON:
## Config
{"id":"weekly-deps","schedule":"0 9 * * 1","skill":"claude-code-hermit:session-start --task 'dependency audit'","enabled":true}
When accepted via proposal-act, this JSON is parsed and added to config.json routines automatically.
Is a routine firing repeatedly with no visible downstream effect? Read the tail of state/routine-metrics.jsonl (last 200 lines), group fired events by routine_id, and cross-reference with the last 3 session reports (sessions/S-*-REPORT.md). If a routine has ≥5 fires in the last 14 days and no session report cites its routine_id or skill output as producing findings, decisions, or follow-ups — apply the Three-Condition Rule. If all three conditions hold:
enabled: false (disable) via a Type: routine proposal reusing the existing id. proposal-act upserts by id, so no delete path is needed.schedule if the routine is valuable but mis-timed.disable → Tier 1 (micro-approval). Re-time → Tier 1. Both are fully reversible. Operators can clean up disabled entries any time via /hermit-settings routines.## Config
{"id":"weekly-deps","schedule":"0 9 * * 1","skill":"claude-code-hermit:session-start --task 'dependency audit'","enabled":false}
Check whether any skill, agent, or hook is underperforming.
Skills:
Agents: read state/reflection-state.json cumulative counters (they accumulate since the since timestamp). Flag if reflection-judge shows judge_suppress dominating judge_accept (rough threshold: suppress count > 2× accept count, with at least 5 total verdicts since since) — the gate may be too strict and killing legitimate candidates. proposal-triage has no verdict counters today; treat it as a known gap and skip unless qualitative evidence (e.g., a recent DUPLICATE verdict that was actually novel) is visible in SHELL.md Findings.
Hooks: out of scope here — there is no hook execution telemetry. Document as a known gap if hook misbehavior is suspected; do not try to infer from side-effects.
Signal ladder (same for all three):
/claude-code-hermit:proposal-create with the evidence (subject to Three-Condition Rule)./claude-code-hermit:proposal-create with the evidence and include a ## Skill Improvement section (or ## Agent Improvement) listing the component name, observed failures, and suggested eval criteria. When the proposal is accepted via proposal-act, use /skill-creator eval and /skill-creator improve to implement the changes. If /skill-creator is not available, apply the changes to the component's definition file directly.For any candidate with Evidence Source: current-session, reflect must not add or rewrite evidence-bearing lines in ## Findings or ## Blockers of SHELL.md before reflection-judge runs. The judge validates against pre-existing session content; injecting the pattern text immediately before the judge reads it would make the system self-certifying.
Exempt (always allowed, any time): the mandatory ## Progress Log append (see § Progress Log Entry) and housekeeping notes that do not describe the candidate's pattern (e.g. skipped-scheduled-check lines, resolved-proposal notes).
If the pattern is only visible to reflect via inference (cost log, token counters, timing), the candidate is not eligible for Evidence Source: current-session in that run. Keep it sub-threshold until it recurs and can be cited from independent historical evidence (Evidence Source: archived-session).
Before acting on any proposal candidate, delegate to claude-code-hermit:reflection-judge.
Pass each candidate as:
Candidate: <title>
Tier: <1|2|3>
Evidence Source: archived-session | current-session | scheduled-check/<id> | operator-request
Evidence: <summary>
Sessions: <S-001, S-002, ...> (or "none")
Evidence Source: defaults to archived-session if omitted. Plugin-check candidates use Evidence Source: scheduled-check/<id> with Sessions: none. Newborn Tier-1 candidates from live SHELL.md evidence use Evidence Source: current-session with Sessions: current.
scheduled-check/<id> and operator-request share the same bypass policy at every gate (skip recurrence, enforce consequence + actionability). They are kept distinct on purpose: scheduled-check/<id> carries the check identifier for telemetry and debugging; operator-request marks human-initiated flows (e.g. baseline audits in session-start). Future routing (e.g. KAIROS) will read them as different provenance classes. Do not collapse them into one value.
no-sessions, note the candidate in SHELL.md Findings for future revisit. Otherwise drop silently.Only act on ACCEPT and DOWNGRADE verdicts.
After reflecting and validating with claude-code-hermit:reflection-judge, choose exactly one outcome per observation:
claude-code-hermit:proposal-triage first (see below), then queue micro-approval in state/micro-proposals.jsonclaude-code-hermit:proposal-triage first (see below), then call /claude-code-hermit:proposal-createSub-threshold observations (interesting but failing the Three-Condition Rule — typically single-occurrence) do not surface to the operator in steady state. Record them to project memory with a short pattern label and today's session_id so they can graduate via recurrence on a later reflect. Do not generate observations for their own sake, and do not surface them before they graduate.
Reflect-generated inferences (cost spikes, token-count shapes, timing patterns) never use bypass Evidence Sources (scheduled-check/* or operator-request). They remain sub-threshold observations recorded to project memory and graduate only by genuine recurrence across sessions, at which point they can be cited as Evidence Source: archived-session or current-session like any other session-grounded observation.
Phase-aware surfacing exception:
newborn: also log each sub-threshold observation inline to SHELL.md Findings as Noticed: <pattern> (single line, no ceremony). Gives the operator early signal that reflect is watching while recurrence data accumulates.juvenile: emit a weekly digest instead of per-observation lines. Read last_digest_at from state/reflection-state.json (top-level, may be absent). If absent or older than 7 days, write a single Noticed (digest): <N> observations — <top 3 pattern labels> line to SHELL.md Findings, and include "last_digest_at": "<now ISO>" in the State Update payload below so the update script persists it.adult: silent (baseline).Review past dismissed and deferred proposals. Avoid re-suggesting recently dismissed ideas. If significantly more evidence has accumulated since a dismissal, it may be worth revisiting.
Before creating a proposal or acting silently, classify every proposal candidate into a tier:
Tier 1 — reversible, routine, low-scope: queue micro-approval, do NOT create PROP-NNN. Example: "For 3 weeks I've added the same 5 hashtags manually. Proposing to automate that step."
Tier 2 — meaningful but non-critical: queue micro-approval, create PROP-NNN only after operator says yes. Example: "Morning brief is consistently ignored on weekdays before 9am. Proposing to shift it to 9:30am."
Tier 3 — safety-critical, irreversible, or cross-hermit scope: create PROP-NNN immediately via /claude-code-hermit:proposal-create, skip micro-approval entirely.
Example: "Operator's gate automation script has an error that could trigger physical actuators unexpectedly. Requires explicit review before any change."
Before queuing a micro-approval or calling proposal-create, call claude-code-hermit:proposal-triage. Pass Evidence Source: when known:
Title: <title>
Evidence Source: <value from the candidate, or omit to default to archived-session>
Evidence: <one-paragraph evidence summary>
CREATE — proceedDUPLICATE:<PROP-ID> — link to existing proposal in SHELL.md Findings instead, do not createSUPPRESS — drop silentlyEvery micro-proposal question must include: [observed pattern + duration] + [consequence] + [exact proposed change] + "Yes / No"
Do not queue vague questions like "Found a pattern. Want me to improve it?" — all three components must be present.
Dedup: do not re-append the same candidate if an entry with the same title/id already exists in pending.
MP-YYYYMMDD-N where N increments within the same day (0, 1, 2). Check existing micro-queued events in proposal-metrics.jsonl for today to determine N.state/micro-proposals.json. Append a new entry to pending with id: "MP-YYYYMMDD-N", tier: <1|2>, status: "pending", follow_up_count: 0, ts: "<now ISO>", question: "<full question text>". Write the file.micro-queued event to proposal-metrics.jsonl via append-metrics.js: {"ts":"<now ISO>","type":"micro-queued","micro_id":"MP-YYYYMMDD-N","tier":1,"question":"<full question text>"}MP-YYYYMMDD-N (tier <N>): <question> — Reply "MP-YYYYMMDD-N yes" or "MP-YYYYMMDD-N no" (bare yes/no accepted when only one entry is pending).After each reflection run, call update-reflection-state.js with the run's verdict counts:
node ${CLAUDE_PLUGIN_ROOT}/scripts/update-reflection-state.js \
.claude-code-hermit/state/reflection-state.json \
'{"last_resolution_check":"<last-PROP-NNN-or-null>","ran_with_candidates":<true|false>,"judge_accept":<N>,"judge_downgrade":<N>,"judge_suppress":<N>,"proposals_created":<N>,"micro_proposals_queued":<N>}'
Include "last_digest_at":"<now ISO>" in the payload only when a juvenile digest fired in this run (see Outcomes → phase-aware surfacing). Omit otherwise — the script preserves the prior value.
Include "last_sparse_nudge":{"<PROP-NNN>":"<now ISO>"} when a sparse-pattern nudge was emitted this run (step 4e). The script merges the provided map into the existing last_sparse_nudge top-level key. Omit if no sparse nudge was emitted — the script preserves the prior map.
The script handles: counter increments, last_reflection/last_run_at timestamps, missing-counters fallback, since preservation, last_digest_at passthrough, last_sparse_nudge merge, and atomic write. It always exits 0 — if the write fails it logs one line to stderr and continues. Counters are diagnostic, not audit-grade — a missed increment is acceptable.
On every reflect run, including empty ones, append one line to SHELL.md ## Progress Log:
[HH:MM] reflect (<phase>) — N candidates; verdicts: accept=A downgrade=D suppress=S; outcomes: <list or "none">
When any suppressions occurred, append a compact suffix:
[HH:MM] reflect (<phase>) — N candidates; verdicts: accept=A downgrade=D suppress=S; outcomes: <list or "none">; suppressed: [<slug>: <code>, ...]
Format per suppression: <candidate-title-slug>: <code> where <code> is the canonical code from the judge or triage verdict (no-evidence, no-sessions, weak-recurrence, weak-consequence, not-actionable). Cap at 3 entries with +N more overflow. Omit the suppressed: suffix entirely when suppress=0.
<phase> is one of newborn / juvenile / adult (from step 2b). If phase detection fell back to adult due to missing counters.since, annotate as adult silently — no operator-facing distinction.
This is the audit trail. The silent-by-default rule at the top governs operator pings only — the log line always goes in.