Help us improve
Share bugs, ideas, or general feedback.
From mtk
Loads minimal relevant context when starting, switching phases, or entering unfamiliar code. Prevents output drift by refreshing scope and rules.
npx claudepluginhub moberghr/mtk-agent-toolkit --plugin mtkHow this skill is triggered — by the user, by Claude, or both
Slash command
/mtk:context-engineeringThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
```!
Mandates invoking relevant skills via tools before any response in coding sessions. Covers access, priorities, and adaptations for Claude Code, Copilot CLI, Gemini CLI.
Share bugs, ideas, or general feedback.
echo "--- Tech Stack ---"
cat .claude/tech-stack 2>/dev/null || echo "(not set)"
if [ -f .claude/tech-stack-pm ]; then echo "--- Package Manager ---"; cat .claude/tech-stack-pm; fi
Good output depends on good context. Load the minimum relevant context needed to act correctly, then refresh it when the task shifts.
CLAUDE.md when present..claude/manifest.json
may declare an applyTo glob array. When the current task has a known
set of files in scope (from the spec's change_manifest or from
git diff --name-only HEAD):
mtk_resolve_references tool is available, call it
with the list of touched files. It returns deterministic glob matches
against the manifest's applyTo arrays. Use its output directly.case / fnmatch semantics).applyTo are always-on when needed (e.g.
coding-guidelines, framework-patterns); load on demand per phase.applyTo references activated and whyReference reads in load-context steps are independent — issue multiple Read calls in a single message, not sequentially. Same applies to independent Glob/Grep discovery and to reviewer agents fanning out on orthogonal axes. If Call B's input would mention Call A's output, force them sequential; otherwise batch them.
See docs/parallelism-patterns.md for canonical patterns (parallel ref load, Stage 2 reviewer fan-out, batch deferred-tool hydration).
Track four lightweight signals during a session and flag fatigue early — refresh, prune, or hand off before output quality collapses. None of these require tooling; estimate from session state.
| Signal | Weight | Read as |
|---|---|---|
| Token utilization | 40% | Approaching the conversation's context limit (e.g., compaction warnings appearing). High = imminent fatigue. |
| Scope scatter | 25% | Number of distinct directories or features touched this session. >3 unrelated areas = scope creep, recall degrades. |
| Re-read ratio | 20% | How often the same file is re-loaded because earlier reads aged out. >2 re-reads of the same file = context evicted. |
| Error density | 15% | Build/test failures, corrections from the engineer, or tool-call retries per phase. Rising density = signal-to-noise dropping. |
Composite reading. If 2+ signals are elevated simultaneously:
handoff — capture state, end the session, resume in a fresh context.Honest reporting. These are heuristics, not measurements. When you report fatigue, name which signals are elevated and why — don't hide behind a composite score.
Track the cumulative context loaded in the session. Fewer, focused instructions beat many, diluted ones — every extra rule competes with the ones that actually matter most for the current task.
Budget guidelines:
When to check the budget:
Warning signals:
After completing reference loading at the end of Phase 0 (and after any subsequent phase that loads new references), emit a one-block footprint report so the engineer can see the cost of what was loaded:
# Run wc -l on each loaded reference file, then format the output:
# Example output:
#
# Context footprint (Phase 0):
# security-checklist.md 78 lines (~2k tokens)
# testing-patterns.md 112 lines (~2k tokens)
# dotnet/coding-guidelines.md 195 lines (~3k tokens)
# ─────────────────────────────────────────────────────────────────
# Total: 3 files, 385 lines (~5k tokens)
# (actual load depends on path-scoped matching — unmatched refs not counted)
Token estimate: 1 line ≈ 13 tokens (median for reference docs at ~65 chars/line ÷ 5 chars/token). This is a proxy, not an exact count.
Omit the block if no references were loaded in that phase (e.g., a Bash-only phase that touched no reference files). Keep it skimmable — one line per file, one totals line. Engineers can skip past it if they already know their setup.
applyTo globs: if a reference's globs don't match any touched
file, do NOT load it as a "just in case" measure. That defeats the budget.git diff --name-only HEAD as
the authoritative list of touched files.Not all tasks need the same model tier. Route work by complexity to optimize cost and quality:
| Phase / Skill | Model | Rationale |
|---|---|---|
| Pre-commit linter | N/A (bash) | Deterministic — no model needed |
| Setup-bootstrap scan recipes | haiku | File discovery, grep — structured data collection |
| Setup-audit scan recipes | haiku | Same — structured data collection |
| Planning and task breakdown | sonnet | Judgment needed but scope is bounded |
| Incremental implementation | sonnet | Standard code generation |
| Test-driven development | sonnet | Standard test generation |
| Pre-commit AI review | sonnet | Fast review with bounded scope |
| Compliance review | opus | Security, financial state, audit trails — highest stakes |
| Security and hardening | opus | Security decisions cannot be shallow |
| Spec-drift detection | sonnet | Structured comparison, moderate judgment |
| Architecture review | sonnet | Pattern matching against known rules |
| Test review | sonnet | Assertion quality, coverage gaps |
| Brainstorming | opus | Creative exploration benefits from deeper reasoning |
Agent frontmatter model: sets the model for subagents. Entry-point skills run on the user's selected model. When a skill spawns a reviewer agent, the agent's frontmatter controls its model.
Shared table for all MTK skills. Individual skills reference this section instead of repeating their own tables. If you catch yourself thinking one of these, stop and re-read what the current skill actually requires.
| Rationalization | Reality |
|---|---|
| "I'll just start coding and adjust later" | Early wrong assumptions produce the most expensive rework. Read before writing. |
| "More context is always better" | No. Irrelevant context crowds out the rules that actually matter. |
| "I already read a similar file in another project" | Local codebase patterns win over generic memory. |
| "This change is trivial, it obviously works" | Trivial changes cause production incidents. Verify anyway. |
| "I'll verify / test / document it later" | Later rarely happens. Do it now or it won't happen. |
| "I know where the bug / issue is without reproducing it" | You have a hunch, not evidence. Reproduce first. |
| "It's only one more file" | Hidden scope creep is how quick fixes become feature work. Escalate instead. |
| "Probably works / should work / the framework handles it" | Probably is not a control. Verify the actual behavior. |
| "The tests pass, so this is fine" | Passing tests do not clear architecture, security, or performance risks. |
| "I'll remember this for next time" | You won't — no persistent memory without explicit capture. Write it down. |
| "The approach is obvious — skip planning / approval / alternatives" | Obvious to whom? Planning and approval exist to catch the mis-framings that feel obvious. |
| "The spec is outdated; the implementation is right" | Then amend the spec and re-approve. Drift checks run against the current spec, not a hypothetical one. |