Orchestrates non-trivial software tasks by verifiability: parallel background agents execute the technical work, Claude manages the routing, and the user owns strategy. Experimental memesh protocol.
Install:

`npx claudepluginhub pcircle-ai/memesh-llm-memory --plugin memesh`

This skill uses the workspace's default tool permissions.
> **Status — experimental, instrumented, validation in progress.** This skill is shipped to begin collecting evidence about whether a structured verifiability-router protocol changes Claude's behavior in ways that measurably help users.
`memesh patterns` exposes a local counter so you can see how often the banner is injected and how often `verify_agent_work` is invoked in your real usage. None of that data ever leaves your machine.
The suggested roles:
- User = CTO / PM. Owns understanding, strategy, product taste, "what is worth building."
- Claude = Orchestrator / engineering manager. Routes work, dispatches agents, reviews diffs, surfaces decisions, never the bottleneck.
- Background agents = engineering interns. Execute high-verifiability technical work in parallel.
The hypothesis behind this skill: Claude as a single-threaded synchronous coding partner spends a meaningful portion of the user's time on work that could run in the background. If that hypothesis holds, this skill should noticeably reduce wall-clock time on multi-step technical tasks. We do not yet have field data either way.
Announce at start: "Using agentic-orchestration (experimental protocol) to route this work."
memesh is a local memory layer and a working-model activator. Three parts compose it: the core memory layer is always on, while parts 2 and 3 ("the protocol") are opt-in via `MEMESH_ENABLE_AGENTIC_ORCHESTRATION=1`. With the flag set, long tasks are dispatched with `run_in_background: true` instead of running synchronously. Default is OFF for parts 2 and 3; the core memory features work without setting any flag. Opt in if you want to participate in the experiment. Doing so also enables local-only telemetry (`memesh patterns`) so the protocol's effectiveness can later be validated with real usage data.
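A minimal opt-in sketch (shell syntax; the flag name is from the docs above, and invoking `memesh patterns` from the shell is an assumption):

```bash
# Opt in to the experimental protocol (parts 2 and 3) for this session.
export MEMESH_ENABLE_AGENTIC_ORCHESTRATION=1

# Inspect the local-only counters: banner injections and
# verify_agent_work invocations. Nothing leaves this machine.
memesh patterns
```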
Plus: memesh's self-improving lessons + the agent_pattern entity type
(record what dispatch patterns worked) close the loop — the longer you
use memesh, the better Claude gets at orchestrating your team's
specific kinds of work.
Memory is the substrate; the operating model is what makes Claude Code feel different on day one.
Before doing any task, classify it. This decides whether Claude does it foreground or dispatches it as a background agent.
digraph router {
"New task arrives" [shape=doublecircle];
"Outcome is checkable by code?" [shape=diamond];
"Multiple independent subtasks?" [shape=diamond];
"User must own the decision?" [shape=diamond];
"Foreground sync (Claude does it now)" [shape=box, style=filled];
"Single background agent (run_in_background:true)" [shape=box, style=filled];
"Parallel background agents (one per subtask)" [shape=box, style=filled];
"New task arrives" -> "User must own the decision?";
"User must own the decision?" -> "Foreground sync (Claude does it now)" [label="yes"];
"User must own the decision?" -> "Outcome is checkable by code?" [label="no"];
"Outcome is checkable by code?" -> "Foreground sync (Claude does it now)" [label="no"];
"Outcome is checkable by code?" -> "Multiple independent subtasks?" [label="yes"];
"Multiple independent subtasks?" -> "Single background agent (run_in_background:true)" [label="no"];
"Multiple independent subtasks?" -> "Parallel background agents (one per subtask)" [label="yes"];
}
| Tier | What it is | Verification cost | Dispatch verdict |
|---|---|---|---|
| Tier 1 — Machine-verifiable | tsc, vitest, lint, build, migrate, benchmark, gh run watch | seconds, deterministic | Background, parallel OK |
| Tier 2 — Review-verifiable | API shape, schema, public types, generated docs, code review against checklist | minutes, semi-automated | Background OK + auto-trigger code-review after |
| Tier 3 — Judgment-required | UX, naming, architecture, strategy, public-facing copy | hours, human only | Foreground only — do not dispatch |
Operating principle: anything Tier 1 or Tier 2 should be agentic; verifying it is the bottleneck, not doing it. If verification of an agent's claim takes longer than the work itself, the dispatch is a net negative — design verification first, then dispatch.
If the outcome is checkable by code, the agent can self-verify because the goal is mechanically checkable. If no mechanical check exists, the user's understanding is the verification — keep it foreground.
If unsure: default to foreground. The cost of a wrong delegation on strategic work is much higher than the cost of one extra synchronous turn.
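A worked pass through the router, using examples from the tier table: "fix every lint error" is machine-verifiable and splits per package, so parallel background agents; "write and run the migration" is machine-verifiable but a single unit, so one background agent; "choose the public API naming" is judgment-required, so foreground sync.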
For one self-contained verifiable task that takes ≥10 minutes.
Task tool:
subagent_type: general-purpose (or domain-specific)
description: 3-5 word summary
prompt: Self-contained brief. Include: goal, context the agent needs,
what to produce, what NOT to do (e.g. "do not push to remote",
"do not modify production code"), how to verify success.
isolation: "worktree" ← if it touches files
mode: "acceptEdits" ← so the agent can edit existing files
without permission prompts
run_in_background: true ← always for ≥10min work
After dispatch: continue foreground work with the user; when the agent reports done, run the verification gate below before reporting anything as complete.
For 2+ independent verifiable subtasks. Send all of them in one message with multiple Task tool calls, not sequentially. Then continue with foreground work (e.g. discussing strategy with the user) while they run.
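A sketch in the same notation as the single-agent template above; the three subtasks are illustrative:

```
One message, three Task tool calls:
  call 1: description: "fix lint errors in src/api"    isolation: "worktree"  run_in_background: true
  call 2: description: "add parser regression tests"   isolation: "worktree"  run_in_background: true
  call 3: description: "regenerate API docs"           isolation: "worktree"  run_in_background: true
```

Each call carries its own self-contained brief; Claude keeps the strategy discussion going while all three run.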
For low-verifiability work where the user must stay in the loop. Stop generating long monologues. Send shorter messages. Ask one focused question at a time when blocked. Do not make strategic decisions that the user did not authorize.
Most real work is mixed. Run each part in the right shape: dispatch the verifiable parts as background agents, keep the judgment parts foreground, and interleave the two.
An agent's summary is not evidence. The diff is. Tests passing locally are.
Before reporting any agent's work as "done" — to the user, to memory, in a commit message, anywhere — the orchestrator MUST run the verification gate. No exceptions for "this agent is reliable" or "I read the prompt carefully." The discipline is mechanical because human trust scales worse than agents do.
1. Reality check — did the claimed changes actually happen?
git -C <agent_workdir> diff --stat <base>..HEAD
→ compare against agent's claim of "files changed"
→ if mismatch: agent fabricated. Discard, do not commit.
2. Hard verification — do the deterministic checks pass?
npm run typecheck # tsc --noEmit
npm test -- --run # full suite, not "the new tests"
npm run lint (if exists)
npm run build (if changes touch build output)
→ if any fail: agent's work is incomplete. Fix-then-dispatch
a follow-up, or take over foreground. Do NOT commit broken state.
3. Cross-check — do the numbers in the agent's summary match reality?
"added 5 tests" → grep -c "^\s*it\(" <new test files>
"77/77 pass" → re-run test count, verify
"R@5 = 95.40%" → spot-check one or two of the result rows
→ numbers that match by accident are still verified;
numbers that the agent calculated must be re-derived independently.
4. Independent review (Tier 2 only) — does an outside reviewer see issues?
Spawn a fresh-context code-review subagent with no memory of the
original work. Have it review only the diff against the project's
standards. Surface any non-overlapping findings.
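Steps 1 to 3 can be chained into one deterministic script. A minimal sketch, assuming the npm scripts named above exist and that it runs inside the agent's worktree (the script name is illustrative):

```bash
#!/usr/bin/env bash
# verify_gate.sh <base-commit>: run from inside the agent's worktree.
set -euo pipefail
BASE="$1"  # the commit the agent branched from

# 1. Reality check: there must be an actual diff to compare against the claim.
git diff --stat "$BASE"..HEAD
if [ -z "$(git diff --name-only "$BASE"..HEAD)" ]; then
  echo "No changes on disk; the agent fabricated its summary." >&2
  exit 1
fi

# 2. Hard verification: the full deterministic checks, not just the new tests.
npm run typecheck
npm test -- --run
npm run lint --if-present
npm run build --if-present

# 3. Cross-check: re-derive one claimed number instead of trusting it.
echo "test cases added in this diff:"
git diff "$BASE"..HEAD -- '*test*' | grep -c '^+.*it(' || true
```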
The verification gate's steps must be deterministic commands, not LLM judgment. An "LLM that verifies an LLM" is the same risk class as no verification — both can fabricate. The only safe verifiers are deterministic tools: `git status` / `git diff`, the typechecker, the test runner, the linter, the build. LLM-as-reviewer is useful for opinion ("does this look idiomatic?"), useless for fact ("did the test actually run?"). Use it as Tier 2 augmentation, never as a Tier 1 substitute.
Treat a verification-gate failure as a debugging signal, not a personal failure. Record it:

lesson_learned: "When dispatching <task type>, verification at step <N> caught <failure>"

This is how the orchestrator learns which dispatch shapes are reliable for the user's stack.
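A hypothetical filled-in example (the task and the finding are invented for illustration):

lesson_learned: "When dispatching doc-generation agents, verification at step 1 caught an empty diff; the summary described edits that were never made."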
In current Claude Code (as of memesh 4.1), background agents launched
with isolation: "worktree" can edit existing files freely but
sometimes cannot create new files even with mode: "acceptEdits".
The user's permission system blocks fresh Write calls inside the
isolated worktree.
Implication: if a task requires creating multiple new source files
(e.g., a new module with new tests), foreground that work or use
isolation other than "worktree". For pure-edit tasks (refactors,
fixes, doc updates) and for benchmark/test tasks that only touch
existing files plus a results/ directory, background dispatch works.
When in doubt: dispatch one tiny "smoke test" agent that just creates a new empty file. If that succeeds, the larger task is safe to dispatch.
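The check on the smoke agent is itself mechanical, not its summary. A sketch, with the worktree path and file name as illustrative assumptions:

```bash
# Did the smoke agent actually create its file inside the isolated worktree?
if [ -f "$AGENT_WORKTREE/smoke-test.txt" ]; then
  echo "new-file creation works; safe to dispatch the larger task"
else
  echo "Write blocked in this worktree; foreground the file-creating work"
fi
```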
Surface results, not progress. When an agent finishes, report numbers and decisions, not "I'm running step 12 of 17". The user does not need a progress bar.
Review every agent's actual diff before reporting "done". Agents summarise what they intended; only the diff shows what they did. This is the orchestrator's last line of defence against fabricated progress.
Keep agent prompts self-contained. Brief them like a smart colleague who just walked into the room. Include goal, constraints, success criteria, and explicit "do NOT" lines.
Do not be afraid of isolation: "worktree". Agent work in an
isolated copy is automatically discarded if it produces no useful
change, and merge-able if it does. There is no downside.
Spike → land or drop, same day. Per CONTRIBUTING.md branch lifecycle discipline: a spike that lives past its verdict becomes technical debt. Dispatch, review, decide, close.
Bias toward delete. A discarded agent worktree is reflog-recoverable. An undeleted speculation accumulates and blocks attention.
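A sketch of the discard-and-recover cycle; the worktree path and branch name are illustrative. `git branch -D` prints the deleted tip's sha, which stays recoverable until git's garbage collection prunes it:

```bash
git worktree remove ../agent-spike --force   # drop the speculative copy
git branch -D agent/spike-1                  # prints e.g. "(was abc1234)"
# Changed your mind? Resurrect the branch from the printed sha:
git branch agent/spike-1 abc1234
```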
| Old habit (single-thread Claude) | New habit (orchestrator Claude) |
|---|---|
| Read 8 files sequentially in foreground | Dispatch one agent: "read these 8 files and summarise X" |
| Write a migration in foreground, watch user wait | Dispatch background agent with verification criteria |
| Run lint/typecheck/tests one at a time | Dispatch one agent with a self-loop until all green |
| Wait for CI, polling every 30s | gh run watch once OR launch a background watcher agent |
| Sequential PR cleanups, one at a time | Parallel agents, one per PR, dispatched together |
| Long synchronous "let me read all of memesh-cloud" tour | One Explore agent with focused questions |
Background agents are not a panacea. The following must stay foreground: Tier 3 judgment work (UX, naming, architecture, strategy, public-facing copy) and any decision the user must own. In these cases, announce that you are not using agent dispatch and why.
Every time Claude is about to do a 10+ minute task in foreground, it must ask:
"Is this task verifiable? If yes, why am I doing it synchronously instead of dispatching an agent and freeing the user?"
If the honest answer is "no good reason — habit / fear of dispatch failure / wanting to look responsive" → dispatch the agent. The user gets their time back.
The user's time is the bottleneck. Claude's time is not. Optimise for the user's time.
mode: "acceptEdits" so the agent can act without
permission prompts?npm test -- --run and report the
result")?git diff --stat actually show the changes the agent
claimed?npm run typecheck && npm test -- --run pass on my machine,
not just inside the agent's worktree?In these cases, announce that you are not using agent dispatch and why.
- memesh skill (sibling) — manages the memory layer that records agent_patterns, lesson_learned, and project decisions over time. Use it together with this one.
- CONTRIBUTING.md Branch Lifecycle Discipline — the three-rule policy on dev checkpoints, pivots, and spikes that keeps git tidy as a side-effect of agentic orchestration.