From junjak-ai-harness
Extract reusable patterns from sessions, manage session state, and evolve learnings into skills. Use at session end, after completing features, or when patterns emerge during work.
How this skill is triggered — by the user, by Claude, or both
Slash command
/junjak-ai-harness:continuous-learningThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Capture, evaluate, and evolve reusable patterns from coding sessions. Three modes: session state tracking, pattern extraction, skill evolution.
Capture, evaluate, and evolve reusable patterns from coding sessions. Three modes: session state tracking, pattern extraction, skill evolution.
During any non-trivial session, maintain .claude/session-state/current.md:
# Session State — {date}
## Current Task
{what you're working on}
## Progress
- [x] Completed steps
- [ ] Remaining steps
## Verified Approaches
{approach}: {evidence it works}
## Failed Approaches
{approach}: {why it failed}
## Key Decisions
- {decision}: {rationale}
## Discoveries
- {non-obvious findings about the codebase}
| Trigger | Action |
|---|---|
| Starting a task | Write initial state with task description |
| Completing a milestone | Update progress, add verified approaches |
| Failed approach | Document what failed and why |
| Before compaction | Full state snapshot (pre-compact hook reminds you) |
| Before session end | Final state with remaining work items |
Session start → Load .claude/session-state/last-session.md (if exists)
During session → Write/update .claude/session-state/current.md
Session end → Stop hook archives current.md → last-session.md
Next session → SessionStart hook outputs last-session.md
Extract patterns that are:
| Category | Example |
|---|---|
| Debugging | "This error always means X, fix with Y" |
| Architecture | "This codebase uses pattern X for Y" |
| Tooling | "Use X tool with Y flags for best results" |
| Domain | "Business rule: X always requires Y" |
| Performance | "Query X is slow because Y, use Z instead" |
After completing a significant task:
.claude/session-state/learnings/# {Pattern Name}
**Confidence**: 0.3–0.9 (numeric — see §4)
**Category**: debugging | architecture | tooling | domain | performance
**Discovered**: {date}
**Last validated**: {date}
## Pattern
{Concise description of the reusable pattern}
## Evidence
{How this was validated — test results, build success, etc.}
## When to Apply
{Trigger conditions — when should this pattern be used?}
## Anti-patterns
{What NOT to do — common mistakes this pattern prevents}
When 3+ related learnings or instincts cluster around a topic, evolve them — and not only into a skill. Pick the target by shape:
learnings + instincts → cluster by domain → draft (skill | command | agent) → validate → install
---
name: {skill-name}
description: "{when to activate this skill}"
---
# {Skill Name}
## When to Activate
{Conditions that trigger this skill}
## Core Concepts
{Distilled from clustered learnings}
## Practical Examples
{From validated evidence in learnings}
## Anti-Patterns
{From failed approaches across learnings}
An instinct is an atomic learned behavior — one trigger, one action — finer-grained than a learning. A learning is a prose pattern; an instinct is a small, evidence-backed rule the agent applies by default.
Observation that relies on a skill firing is probabilistic (~50–80% — it fires on the agent's judgment). Hooks fire 100% of the time, deterministically. So capture observations through the harness's hooks (Stop / PreCompact / PostToolUse), not by hoping a skill notices — every user correction, error resolution, and repeated workflow is an observation, and none should be missed.
Store at .claude/session-state/instincts/<id>.md (repo-local = project-scoped by nature):
---
id: grep-before-edit
trigger: "before editing an unfamiliar file"
action: "grep/Read the file and its callers first"
confidence: 0.6 # 0.3 tentative → 0.9 near-certain
domain: tooling # code-style | testing | git | debugging | workflow | tooling | domain | security
scope: global # project (default) | global
evidence: 3 # observation count backing it
discovered: {date}
last-validated: {date}
---
- Action: {exactly what to do}
- Evidence: {the observations that created or raised it}
| Score | Meaning | Behavior |
|---|---|---|
| 0.3 | Tentative | suggested, not enforced |
| 0.5 | Moderate | applied when relevant |
| 0.7 | Strong | auto-applied |
| 0.9 | Near-certain | core behavior |
Increase when: repeatedly observed · user does not correct the applied behavior · an instinct from another source agrees. Decrease when: user corrects it · not observed for a long stretch · contradicting evidence appears.
Default project (lives in this repo's .claude/session-state/instincts/). Tag global only for cross-project-stable behaviors.
| Pattern type | Scope |
|---|---|
| Framework/lang conventions, file structure, code style, error strategy | project |
| Security practices, general best practices, tool-workflow, git practices | global |
Promotion (project → global): when the same instinct appears in 2+ projects with avg confidence ≥ 0.8, promote it — for this harness that means moving the stable fact into the project-profile (§6, where every agent reads it) or evolving it into a skill (§3). Project-scoping prevents cross-project contamination (React conventions never leak into a Python repo).
Observation (hook-captured)
→ instinct @0.3 (atomic, project-scoped)
→ confidence grows with repeat / non-correction
→ 3+ related, ≥0.7 → cluster → evolve (§3: skill | command | agent)
→ cross-project stable → promote to global (profile / skill)
To get v2's deterministic 100% capture without a background process: a lean shell PostToolUse hook appends a compact line (tool, target, outcome) to .claude/session-state/observations.jsonl; the existing Stop hook's session-end step extracts instincts from it. This stays inside the harness's shell-hook + .claude/session-state/ model — no background observer, no Python. Until that hook is wired, capture observations in current.md during work and extract instincts at milestones / Stop.
After each team workflow completion:
Capturing learnings is worthless if nothing ever reads them back. Extraction (§2) is only half the loop — this is the consumption half. Without it, the same mistake recurs every session despite being "learned."
Before starting non-trivial work, route the relevant learnings into the task:
.claude/session-state/learnings/ for entries whose "When to Apply" matches the current task's domain/area.When spawning an agent for a task that a learning covers, include the relevant learning in the agent's briefing (the subagent-orchestration Context section). An agent that never sees the learning cannot apply it. Prefer pasting the one matching learning over hoping the agent re-derives it.
When learnings/ grows past ~10 entries, maintain a one-line index (learnings/index.md: area → file → hook) so "which learnings apply to area X" is a single read, not a folder scan. This is the router that makes reuse cheap enough to actually happen.
A high-confidence learning that is a stable fact about this project (not a transient finding) belongs in the project-profile (.claude/project-profile/) where every agent already reads it — e.g. "the authoritative typecheck command is X", "bulk search caps at N", "store Y must be eagerly initialized." Move it there so it is consumed by default, and leave a pointer in the learning.
For projects that maintain a curated knowledge base (a wiki, a profile, an index, an ADR set), the base drifts the moment code changes without a matching doc update. A living knowledge base needs an explicit maintenance contract, or it rots into a misleading liability.
| When this changes | Update this |
|---|---|
| API contract / endpoint / DTO | api-layer profile + any contract doc |
| Authoritative verify command / baseline | stack.md "Build & Verify" |
| A new recurring gotcha is confirmed | a learning (§2), promoted to profile if project-stable (§6) |
| Architecture / module boundary | the architecture overview/wiki page (link to the code, don't restate it) |
| A documented file/command/flag is renamed or removed | every doc that named it (grep the knowledge base for the old name) |
Periodically (e.g. at workflow completion, or when a doc feels stale): pick a knowledge-base page, follow its claims to the code, and flag any that no longer hold. A recalled learning or wiki line that names a file/function/flag is only valid as of when it was written — verify it still exists before acting on it.
wiki skillThe agent wiki (.claude/wiki/) is maintained by the separate wiki skill, but continuous-learning feeds and governs it — no overlap:
learnings/ is one of the wiki's ingest sources. A high-confidence, project-stable learning may be promoted to a wiki page (in parallel with §6 profile promotion — routing, not duplication).State file: .claude/session-state/current.md (update during session)
Learnings: .claude/session-state/learnings/{topic}.md
Archive: .claude/session-state/archive/ (auto-managed by hooks)
Extract: After milestones — identify, validate, generalize, score, store
Reuse: At task start, load matching learnings; route them into agent briefings; index when >10
Instincts: .claude/session-state/instincts/<id>.md — atomic (1 trigger/1 action), confidence 0.3–0.9, scope project|global
Promote: Project-stable learning/instinct → project-profile (read by default); project→global when seen in 2+ projects ≥0.8
Evolve: 3+ related items at ≥0.7 → skill | command | agent
Maintain: Link don't duplicate; same change updates the doc it invalidates; verify a recalled fact still exists
Prune: 30-day TTL for pending, immediate for contradicted
npx claudepluginhub junjak/ai-harness --plugin junjak-ai-harnessCreates bite-sized, testable implementation plans from specs or requirements, with file structure and task decomposition. Activates before coding multi-step tasks.