Designs and builds mlld orchestrators for LLM workflows: pipelines that coordinate LLM calls, data processing at scale, and decision-driven automation.
npx claudepluginhub mlld-lang/mlld --plugin mlld

This skill uses the workspace's default tool permissions.
**IMMEDIATELY AFTER READING THIS SKILL, YOU MUST RUN `mlld howto intro` before writing any mlld code.** The intro covers syntax, gotchas, built-in methods, file loading, and common traps. Skipping it leads to inventing non-existent features and writing code that validates but fails at runtime.
mlld howto intro # Language fundamentals — read this first
mlld init # Initialize project (enables mlld run)
mlld install @mlld/claude # Install the Claude module
Before writing an orchestrator, view at least one of the examples in plugins/mlld/examples/ — audit/, research/, and development/ each demonstrate a complete archetype.
Every mlld orchestrator follows one flow:
gather context → execute (LLM call) → invalidate → remediate → re-invalidate
Work is broken until the adversary can't break it. This is the default stance.
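A minimal sketch of that flow as a decision loop (the action names and helpers @buildContext and @callDecisionAgent are illustrative, following the archetype examples later in this guide):

loop(endless) [
  let @context = @buildContext(@runDir)          >> gather context
  let @decision = @callDecisionAgent(@context)   >> execute (LLM call)
  when @decision.action [
    "invalidate" => [...]                        >> adversary attacks the work
    "remediate" => [...]                         >> fix findings, then re-invalidate
    "complete" => done
  ]
]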
Every orchestrator should be:

- Label LLM calls with `exe llm`. Crash recovery is automatic via the LLM cache. Use `checkpoint` directives between phases for `--resume` targeting. Use `--new` to start a fresh run.
- Use `for parallel(N)` wherever items are independent. Accept `--parallel n` to let the caller cap concurrency (default: 20).
- Save worker prompts (`output @prompt to "@runDir/worker-*.prompt.md"`).
- See `/mlld:llm-first` for the full design philosophy.
- Write output to `llm/output/{script-name}/YYYY-MM-DD-n` by default unless the caller specifies otherwise.

One cache, three invalidation strategies:
| Strategy | Scope | Use case | How it works |
|---|---|---|---|
| Automatic (LLM cache) | Per-call | Crash recovery | Re-run script, llm-labeled calls with same args hit cache |
| --resume @fn | Per-function | Prompt iteration | Invalidate all cached results for a function, re-execute |
| --resume "name" | Per-phase | Workflow navigation | Named checkpoint directive marks a position; invalidate all LLM calls after it |
The llm label on exe marks calls for caching. Checkpointing auto-enables when llm-labeled calls exist — no flag needed.
>> Define once — every invocation is independently cached by argument hash
exe llm @review(prompt) = @claudePoll(@prompt, { model: "sonnet", tools: @tools, poll: @outPath })
>> For one-off calls
var llm @summary = @claudePoll(@prompt, { model: "sonnet", poll: @outPath })
Just re-run. Completed llm-labeled calls hit cache automatically. No run directories, no event logs, no idempotency checks.
mlld run pipeline # crashes at item 47 of 100
mlld run pipeline # items 1-46 are instant cache hits, continues from 47
checkpoint directives mark phase boundaries. On --resume "name", everything before the checkpoint hits cache; everything after re-executes.
exe llm @collect(item) = @claudePoll(@collectPrompt(@item), { model: "sonnet", tools: @tools, poll: @outPath })
exe llm @analyze(item, data) = @claudePoll(@analyzePrompt(@item, @data), { model: "opus", tools: @tools, poll: @outPath })
checkpoint "collection"
var @data = for parallel(20) @item in @items => @collect(@item)
checkpoint "analysis"
var @results = for parallel(20) @item in @items => @analyze(@item, @data)
mlld run pipeline # auto-resumes via cache
mlld run pipeline --resume "analysis" # skip to analysis phase
mlld run pipeline --resume @analyze # re-run all @analyze calls
mlld run pipeline --new # fresh run, clear cache
Checkpoints are top-level only. Loops are covered by the other two strategies:
- `for parallel` over a collection: each llm-labeled call gets a unique cache key (different args per item). Crash at item 547? Re-run; items 1-546 hit cache automatically.
- `loop()` convergence loops: each iteration calls functions with evolving arguments, so they get different cache keys. The cache handles crash recovery call-by-call. Use `--resume @fn` to invalidate a specific function across all iterations.

BEFORE (manual resumption, ~40 lines per phase):
loop(@maxAttempts) [
for parallel(@parallelism) @item in @items [
let @outPath = `@runDir/phase1/@item.id\.json`
let @alreadyDone = @fileExists(@outPath)
if @alreadyDone == "yes" [
show ` @item.name: skipped`
=> null
]
show ` @item.name: processing...`
@claudePoll(@prompt, { model: "opus", tools: @tools, poll: @outPath })
let @result = <@outPath>?
if !@result [
show ` @item.name: FAILED`
@logEvent(@runDir, "failed", { id: @item.id })
=> null
]
@logEvent(@runDir, "complete", { id: @item.id })
=> null
]
let @checks = for @c in @items [
let @exists = @fileExists(`@runDir/phase1/@c.id\.json`)
if @exists != "yes" [ => 1 ]
=> null
]
let @missing = for @x in @checks when @x => @x
if @missing.length == 0 [ done ]
show ` Retrying @missing.length failed items...`
continue
]
AFTER (checkpoint, ~3 lines):
exe llm @review(item) = @claudePoll(@buildPrompt(@item), { model: "opus", tools: @tools, poll: @outPath })
checkpoint "phase-1"
var @results = for parallel(20) @item in @items => @review(@item)
LLM-first cheat sheet (see /mlld:llm-first for details):
Your mlld code should be boring. It gathers context, asks an LLM "what should I do?", executes the answer mechanically, and repeats. All intelligence lives in prompts. The orchestrator is a switch statement.
Code does: gather context, call the LLM, execute its answer mechanically, and loop.

Code does NOT: interpret results, make domain decisions, or grow if-else branches to handle new behavior.
When you need new behavior, add guidance to the decision prompt. Don't add if-else to the orchestrator.
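A quick contrast, as a sketch (the guidance text and variable name are illustrative):

>> Don't: hardcode new behavior in the orchestrator
>> if @item.type == "legacy" [ ...special handling... ]

>> Do: extend the decision prompt instead
var @extraGuidance = `If an item is a legacy module, prefer refactoring over rewriting.`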
Invalidation is the throughline of every orchestrator. The default is failure: work is broken until proven otherwise.
Evidence-based: Every claim requires command + output + interpretation. "Probably works" is invalid.
No substitution: Test the actual mechanism described. If the spec says "env block restricts tools," test an env block — don't test a different mechanism and call it equivalent.
Remediation requires re-invalidation: Fixing a finding doesn't close it. The adversary must re-test after the fix.
All three example archetypes include invalidation workers. See:
- examples/audit/ — verified invalidation with tool escalation
- examples/research/ — invalidation of synthesis claims
- examples/development/ — adversarial verification of implementation

The naive pattern — give the adversary the same context as the worker and ask "is this right?" — produces false positives. The adversary has no more information than the original worker, so it either rubber-stamps or invents objections from thin air.
The fix: give the adversary more tools than the worker had. This is the verified invalidation pattern:
The verifier must answer: "Can I prove this finding is real?" — not "Does this look right?"
Phase 1 worker: Read-only, scoped context → candidate findings
Phase 2 verifier: Read + search + execute → verified findings with evidence
Concrete tool escalation:
- Worker: Read,Write,Glob,Grep (enough to compare inputs)
- Verifier: Read,Write,Glob,Grep,Bash(mlld:*),Bash(ls:*) (can search the codebase, run validation commands, check test cases)

Verification requirements (enforce in the prompt):
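A sketch of what these might look like in the verifier's prompt (the tag and field names are illustrative; the command + output + interpretation rule comes from the principles above):

<verification_requirements>
For every finding, report:
- command: the exact command you ran
- output: the relevant output, quoted verbatim
- interpretation: why this confirms or refutes the finding
Findings missing any of the three are invalid.
</verification_requirements>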
Classification taxonomy (give the verifier these options):

- confirmed: Evidence supports the finding — cite specific files
- false-positive: The feature/content exists, worker just didn't see it
- insufficient-context: Worker's context was too narrow — note where the answer lives
- needs-human: Ambiguous, might be intentional design

This pattern comes from the QA workflow (llm/run/qa/), where Phase 1 (black-box testing with limited docs) produces candidate issues, and Phase 2 (self-review with test cases + source access) verifies them empirically. The self-review consistently reclassifies 30-50% of Phase 1 findings as false positives.
Anti-pattern: Giving the adversary the exact same tools and context as the original worker. If the worker couldn't tell, neither can the adversary. Escalate access or the invalidation step is theater.
Use pipeline => retry with @mx.hint for step-level quality checks on LLM output. The gate validates the output and either accepts it, retries with feedback, or falls back after max attempts.
>> Source: calls the LLM (re-runs on retry with gate feedback via @mx.hint)
exe @callAgent() = [
let @feedback = @mx.hint ? `\n\nPrevious attempt was rejected: @mx.hint\n\nAddress the feedback and try again.` : ""
let @fullPrompt = `@prompt@feedback
IMPORTANT: Write your JSON response to @outPath using the Write tool.`
@claudePoll(@fullPrompt, { model: "sonnet", tools: @tools, poll: @outPath })
=> <@outPath>?
]
>> Gate: validates output, retries with feedback, falls back after 3 attempts
exe @qualityGate() = [
if !@mx.input [ => { status: "failed" } ]
let @gate = @checkOutput(@task, @mx.input)
if @gate.pass [ => @mx.input ]
if @mx.try < 3 [ => retry @gate.feedback ]
=> { status: "failed", reason: "gate_retry_failed" }
]
var @result = @callAgent() | @qualityGate
On retry, @callAgent() re-executes with @mx.try incremented and @mx.hint set to the gate's feedback. The LLM sees the rejection reason and can correct its output.
Model escalation follows the same pattern — use @mx.try to pick the model:
exe @classify() = [
let @model = @mx.try > 1 ? "sonnet" : "haiku"
>> ... call @claudePoll with @model ...
=> @result
]
exe @ensureConfidence() = when [
@mx.input.confidence == "low" && @mx.try < 2 => retry
* => @mx.input
]
var @routing = @classify() | @ensureConfidence
When to use pipeline retry vs. decision-loop repair: pipeline retry handles step-level output quality (the gate rejects, and the same call re-runs with feedback); decision-loop repair handles strategic failures that call for a different action, not another attempt at the same call.

Both can coexist. A worker might use pipeline retry for output quality, while the outer decision loop handles strategic failures.
Glob files → parallel LLM review (limited tools) → collect findings → parallel verification (expanded tools) → output classified results.
No decision agent. Linear pipeline. Fastest to build. Demonstrates tool escalation between phases.
Use when: Processing a batch of similar items independently (file review, data extraction, classification).
>> Phase 1: reviewer sees only the file
var @reviewTools = ["Read", "Write"]
>> Phase 2: verifier can explore the codebase
var @verifyTools = ["Read", "Write", "Glob", "Grep"]
exe llm @review(file) = @claudePoll(@reviewPrompt(@file), { model: "sonnet", tools: @reviewTools, poll: @outPath })
exe llm @verify(finding, source) = @claudePoll(@verifyPrompt(@finding, @source), { model: "sonnet", tools: @verifyTools, poll: @outPath })
var @files = <src/**/*.ts>
checkpoint "review"
var @results = for parallel(20) @file in @files => @review(@file)
checkpoint "verify"
var @verified = for parallel(20) @finding in @results => @verify(@finding, @finding.file)
Decision agent infers phase from filesystem state. Parallel fan-out for batch operations. Builds toward a synthesis that gets invalidated.
Use when: Multi-step analysis where phases depend on prior results (document analysis, research synthesis, data pipeline).
exe llm @assess(source) = @claudePoll(@assessPrompt(@source), { model: "sonnet", tools: @workerTools, poll: @outPath })
exe llm @synthesize(data) = @claudePoll(@synthesizePrompt(@data), { model: "opus", tools: @workerTools, poll: @outPath })
loop(endless) [
let @context = @buildContext(@runDir)
let @decision = @callDecisionAgent(@context)
when @decision.action [
"discover" => [...]
"assess" => [...] >> for parallel(20)
"synthesize" => [...]
"invalidate" => [...] >> for parallel(20)
"complete" => done
]
]
Continuous decision loop with external state (GitHub Issues). Creates issues, dispatches workers, runs adversarial verification. Quality gate before completion.
Use when: Open-ended tasks requiring iteration, external coordination, and quality assurance (feature development, project automation).
See: ../../examples/development/
exe llm @callWorker(prompt, config) = @claudePoll(@prompt, @config)
exe llm @callDecisionAgent(context) = @claudePoll(@decisionPrompt(@context), { model: "opus", tools: @decisionTools, poll: @outPath })
loop(endless) [
let @context = @buildContext(@config, @runDir)
let @decision = @callDecisionAgent(@context)
when @decision.action [
"work" => [...]
"create_issue" => [...]
"close_issue" => [...]
"blocked" => [ @writeQuestions(...); done ]
"complete" => done
]
]
llm/ Convention

The llm/ directory is the standard home for LLM workflows in any mlld project — the equivalent of src/ for application code.
project/
├── llm/
│ ├── run/ # Scripts for `mlld run <name>`
│ │ └── my-pipeline/ # Each pipeline gets a subdirectory
│ │ └── index.mld # Entry point
│ ├── mcp/ # MCP tool modules (`mlld mcp` auto-serves this dir)
│ ├── agents/ # Agent definitions
│ ├── prompts/ # Shared prompt templates
│ └── lib/ # Shared utilities
├── mlld-config.json # Created by `mlld init`
└── ...
mlld run <name> looks in llm/run/<name>/ for an index.mld. mlld mcp with no arguments serves every module in llm/mcp/.
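For example:

mlld run my-pipeline   # runs llm/run/my-pipeline/index.mld
mlld mcp               # serves every module in llm/mcp/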
Orchestrators live in llm/run/ and follow this internal structure:
llm/run/my-orchestrator/
├── index.mld # Entry point — main loop
├── lib/
│ ├── context.mld # State management, context gathering
│ └── [domain].mld # Domain-specific helpers
├── prompts/
│ ├── decision/
│ │ └── core.att # Decision agent prompt template
│ ├── workers/
│ │ ├── [role].att # One template per worker type
│ │ └── verify.att # Verification worker (expanded tools)
│ └── shared/
│ └── [fragment].md # Reusable prompt fragments
└── schemas/
├── decision.json # Decision output JSON Schema
└── worker-result.json # Worker output JSON Schema
Label LLM calls and mark phase boundaries. The checkpoint system handles crash recovery and resumption automatically.
>> Label LLM calls — caching is automatic
exe llm @review(prompt) = @claudePoll(@prompt, { model: "sonnet", tools: @tools, poll: @outPath })
>> Mark phase boundaries
checkpoint "collection"
var @data = for parallel(20) @item in @items => @collect(@item)
checkpoint "analysis"
var @results = for parallel(20) @item in @items => @analyze(@item)
mlld run pipeline # auto-resumes via cache
mlld run pipeline --resume "analysis" # skip to analysis phase
mlld run pipeline --resume @analyze # re-run all @analyze calls
mlld run pipeline --new # fresh run, clear cache
For decision-loop orchestrators where the LLM agent needs to see what happened, event logs serve as data for the decision agent, not as a resumption mechanism. The checkpoint cache handles resumption.
exe @logEvent(runDir, eventType, data) = [
let @event = { ts: @now, event: @eventType, ...@data }
append @event to "@runDir/events.jsonl"
]
exe @loadRecentEvents(runDir, limit) = [
let @lines = @tailFile(`@runDir/events.jsonl`, @limit)
=> for @line in @lines.split("\n") when @line.trim() => @line | @json
]
Use event logs when the decision agent reads recent history to inform its next action (the development/j2bd archetype). For linear pipelines (the audit archetype), event logs are unnecessary — the checkpoint cache handles everything.
Decision-loop orchestrators that track cross-iteration state (last worker result, last error) still use run.json:
{
"id": "2026-02-09-0",
"created": "2026-02-09T10:00:00Z",
"lastResult": null,
"lastError": null
}
This is program state the decision agent reads, not resumption infrastructure.
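A sketch of reading and updating that state inside the loop (the field updates are illustrative; the loading and output syntax follow patterns used elsewhere in this guide):

let @statePath = `@runDir/run.json`
>> ? returns null if the file doesn't exist yet
let @run = <@statePath>?
>> After an action completes, persist the result for the next iteration
let @updated = { ...@run, lastResult: @result, lastError: null }
output @updated to "@runDir/run.json"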
Tell the LLM to write structured output to a specific file path. Don't parse streaming output.
let @outputPath = `@runDir/decision-@iteration.json`
let @fullPrompt = `@prompt
IMPORTANT: Write your JSON response to @outputPath using the Write tool.`
@claudePoll(@fullPrompt, { model: "opus", tools: @tools, poll: @outputPath })
let @decision = <@outputPath>?
The orchestrator reads the file after the agent finishes. The file doubles as a debugging artifact.
Prompts are .att template files with @variable interpolation. Declared as executables:
exe @decisionPrompt(tickets, events, lastError) = template "./prompts/decision/core.att"
exe @workerPrompt(task, guidance, context) = template "./prompts/workers/implement.att"
Inside .att files, use XML-tagged sections for structured context:
<tickets>
@tickets
</tickets>
<recent_events>
@recentEvents
</recent_events>
<last_error>
@lastError
</last_error>
Standard XML-tagged sections in decision prompts:
| Section | Purpose |
|---|---|
| <goal> | What we're trying to accomplish |
| <state> | Current state: open issues, file inventory |
| <history> | Recent events from events.jsonl |
| <last_result> | Output from the previous action |
| <last_error> | Error from last iteration (if any) |
| <constraints> | Hard limits, budgets, rules |
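A sketch of how these sections compose in a core.att template (the variable names are illustrative):

<goal>
@goal
</goal>

<state>
@state
</state>

<history>
@recentEvents
</history>

<last_result>
@lastResult
</last_result>

<last_error>
@lastError
</last_error>

<constraints>
@constraints
</constraints>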
Use selective context loading — only load what the current job needs.
Loaded files have parsed frontmatter (@f.mx.fm.title, @f.mx.fm.tags) and metadata (@f.mx.tokens, @f.mx.relative) — don't use js/node blocks for this. See mlld howto file-loading-metadata.
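For example, a sketch that builds a lightweight index from loaded files using the metadata fields above (the glob path is illustrative):

var @docs = <llm/prompts/shared/*.md>
var @index = for @f in @docs => `@f.mx.relative: @f.mx.fm.title (@f.mx.tokens tokens)`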
Decision agents return one action type. Use conditional JSON Schema:
{
"required": ["reasoning", "action"],
"allOf": [
{
"if": { "properties": { "action": { "const": "work" } } },
"then": { "required": ["task", "guidance"] }
},
{
"if": { "properties": { "action": { "const": "blocked" } } },
"then": { "required": ["questions"] }
},
{
"if": { "properties": { "action": { "const": "complete" } } },
"then": { "required": ["summary"] }
}
]
}
The orchestrator switches mechanically on decision.action. No interpretation.
Different agents get different tool sets:
var @decisionTools = ["Read", "Write", "Glob", "Grep"]
var @workerTools = ["Read", "Write", "Edit", "Glob", "Grep", "Bash(git:*)", "Bash(npm:*)"]
Decision agents: read + write (for output file). Workers: full access scoped to needs.
Use for parallel(N) for batch operations. The llm label makes each call independently cached — no manual idempotency checks needed.
exe llm @review(file) = @claudePoll(@reviewPrompt(@file), { model: "sonnet", tools: @tools, poll: @outPath })
var @results = for parallel(20) @file in @files => @review(@file)
Crash at item 47 out of 100? Re-run. Items 1-46 are instant cache hits.
When to parallelize: Independent items (file reviews, assessments, invalidation checks). When to sequence: Data dependencies (synthesis depends on assessments, decisions depend on prior state).
The LLM cache handles idempotency automatically for llm-labeled calls. Each call is cached by argument hash — same args return the cached result without calling the LLM.
Manual idempotency checks (<@outPath>?) are only needed for non-LLM side effects (file writes, shell commands) that you don't want to repeat.
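A sketch of such a check (the side effect @writeReport is hypothetical):

>> Skip a non-LLM side effect if its artifact already exists
let @existing = <@outPath>?
if !@existing [
  @writeReport(@runDir, @data)
]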
When blocked, write structured questions and exit cleanly:
"blocked" => [
@writeQuestionsFile(@runDir, @decision.questions)
show `Resume with: mlld run myorch`
done
]
Human answers at their leisure. Re-run the script — the cache auto-resumes past completed work. For phase targeting: mlld run myorch --resume "phase-name". Decision agent reads answers from context and continues.
Audit archetype: sonnet for everything (simple, parallel). Research archetype: sonnet for assessment, opus for synthesis and invalidation. Development archetype: opus for decisions and workers (high-stakes).
- Save worker prompts to `@runDir/worker-*.prompt.md` for replay.
- `MLLD_DEBUG_CLAUDE_POLL=1`: diagnostics for `@claudePoll` polling behavior.

>> Hook: log every LLM call with cache status
hook @trace after op:exe = [
if @mx.op.labels.includes("llm") [
show ` @mx.op.name | cached: @mx.checkpoint.hit`
]
]
>> Save prompt for debugging
output @workerPrompt to "@runDir/worker-@task-@iteration.prompt.md"
Two approaches for two archetypes:
Linear pipelines (audit archetype): Use named checkpoint directives between phases. The script runs top-to-bottom; --resume "phase-name" skips to a phase.
checkpoint "review"
var @results = for parallel(20) @file in @files => @review(@file)
checkpoint "verify"
var @verified = for parallel(20) @finding in @results => @verify(@finding)
Decision-loop orchestrators (research/development archetypes): The decision agent infers the current phase from what exists rather than tracking phase in code:
Infer phase from filesystem state:
1. No assessments/ → discovery phase
2. assessments/ incomplete → assessment phase
3. All assessed, no synthesis.json → synthesis phase
4. synthesis.json exists → invalidation phase
Put this logic in the decision prompt. The orchestrator never checks what phase it's in. Checkpoints are not needed inside loop(endless) — the LLM cache handles crash recovery call-by-call.
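In the decision prompt this can live in an XML-tagged section, per the template conventions above (the tag name is illustrative):

<phase_inference>
Infer the current phase from filesystem state:
1. No assessments/ → discovery phase
2. assessments/ incomplete → assessment phase
3. All assessed, no synthesis.json → synthesis phase
4. synthesis.json exists → invalidation phase
</phase_inference>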
Use checkpoint directives for linear pipelines; let decision agents track phases via prompts for loop-based orchestrators.

For the full design philosophy with 17 principles and worked examples, see /mlld:llm-first.
See anti-patterns.md for traps to avoid.
See syntax-reference.md for mlld syntax cheat sheet.
See gotchas.md for mlld language gotchas and sharp edges.
To scaffold a new orchestrator: /mlld:scaffold.
To learn by example: read the three archetypes in ../../examples/.