From simmer
Records coding iteration results in trajectory table, tracks best candidate, handles regression rollback, and passes ASI forward in simmer loops. Supports single-file and workspace modes.
npx claudepluginhub 2389-research/claude-plugins --plugin simmer

This skill uses the workspace's default tool permissions.
You are the only subskill that sees the full score history. Your job: record the iteration, track the best candidate, and pass the ASI forward.
Update {OUTPUT_DIR}/trajectory.md with the running score table.
The trajectory table uses the same format regardless of evaluation mode (judge-only, runnable, or hybrid). Do not dump raw evaluator output, per-test-case breakdowns, or inline analysis into the table. The table is a clean score record.
Required format (do not add extra columns beyond those listed below):
Single-file mode:
# Simmer Trajectory
| Iteration | [criterion 1] | [criterion 2] | [criterion 3] | Composite | Key Change |
|-----------|---------------|---------------|---------------|-----------|------------|
| 0 | 4 | 5 | 3 | 4.0 | seed |
| 1 | 7 | 5 | 4 | 5.3 | [summary] |
| 2 | 7 | 6 | 7 | 6.7 | [summary] |
Best candidate: iteration 2 (composite: 6.7/10)
Workspace mode:
# Simmer Trajectory
| Iteration | Coverage | Efficiency | Noise | Composite | Config | Key Change |
|-----------|----------|------------|-------|-----------|--------|------------|
| 0 | 2 | 6 | 2 | 3.3 | qwen3.5:4b, single-call | seed |
| 1 | 4 | 4 | 4 | 4.0 | qwen3.5:27b, single-call | model swap + rich prompt |
| 2 | 3 | 5 | 3 | 3.7 | qwen3.5:9b, single-call | REGRESSION — 9b too weak |
Best candidate: iteration 1 (composite: 4.0/10)
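Judging from the example rows, the Composite column is the mean of the criterion scores rounded to one decimal. A minimal sketch of row construction under that assumption (the skill may weight criteria differently; nothing here is spec):

```python
def composite(scores):
    """Mean of criterion scores, rounded to one decimal.

    Assumption inferred from the example rows (e.g. 7, 6, 7 -> 6.7);
    an actual simmer setup may define a weighted composite instead.
    """
    return round(sum(scores) / len(scores), 1)

def table_row(iteration, scores, key_change):
    """Format one markdown row for the single-file trajectory table."""
    cells = [str(iteration), *map(str, scores), f"{composite(scores):.1f}", key_change]
    return "| " + " | ".join(cells) + " |"
```

Appending `table_row(3, [8, 6, 7], "tightened prompt")` to trajectory.md keeps the record machine-checkable without touching earlier rows.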
If evaluator details matter for context, add them in a separate section BELOW the table:
## Evaluator Details
### Iteration 1
- Video ozXhzdjT8tU: 54% coverage (23/43 matched)
- Video DRqUWtnXEXA: 48% coverage (15/31 matched)
- Missed: [list of key misses]
The table itself stays clean. Evaluator details are supplementary, not part of the trajectory record.
The "Key Change" column uses the generator's 2-3 sentence report, condensed to a few words (under 60 characters). For iteration 0 (the seed), Key Change is always "seed".
Compare this iteration's score to the best-so-far. Update the "Best candidate" line at the bottom of the trajectory.
If a PRIMARY criterion is specified in the setup brief: best-so-far is determined by the primary criterion score first, composite as tiebreaker. Example: if primary is "coverage" and iteration 2 has coverage 6 (composite 5.3) vs iteration 4 with coverage 5 (composite 6.0), iteration 2 is best.
If no primary criterion: best-so-far is determined by composite (default).
The best candidate may not be the latest one. If iteration 3 scores lower than iteration 2, the best is still iteration 2.
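The best-so-far rule above reduces to a sort key: primary criterion first when one is declared, composite otherwise. A sketch, assuming iterations are tracked as plain dicts (the field names are illustrative, not part of the skill):

```python
def best_candidate(iterations, primary=None):
    """Pick the best iteration so far.

    iterations: list of dicts like
        {"iter": 2, "scores": {"coverage": 6}, "composite": 5.3}
    primary: optional PRIMARY criterion name from the setup brief.
    With a primary, that score dominates; composite breaks ties.
    """
    if primary:
        key = lambda it: (it["scores"][primary], it["composite"])
    else:
        key = lambda it: it["composite"]
    return max(iterations, key=key)
```

With the example from above, `best_candidate` returns iteration 2 when primary is "coverage" even though iteration 4 has the higher composite.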
If this iteration's composite is LOWER than best-so-far:
REGRESSION: true — use iteration [N] as input to next generator

Workspace mode regression: the orchestrator will selectively restore workspace files from the best iteration's commit (git checkout <commit> -- <files>). trajectory.md and other tracking files are NOT reverted. Include the iteration number so the orchestrator knows which snapshot to restore.
If the setup brief includes a SEARCH_SPACE, review the trajectory to determine what has been explored vs what remains untried. Look at the Config and Key Change columns to identify which models, topologies, and prompt variations have been tried, and which search-space options remain untouched.
Produce a concise exploration summary. Example:
Models tried: qwen3.5:4b (iter 0), qwen3.5:27b (iter 1-3). Untried: qwen3.5:9b.
Topologies tried: single-call (all iterations). Untried: multi-call.
Prompt changes: 4 variations tried.
Skip this for text/creative mode or when no search space is specified.
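The tried-vs-untried summary can be derived mechanically from the Config column when both the search space and the per-iteration configs are kept as structured data. A sketch under that assumption (the dimension names "model" and "topology" are illustrative):

```python
def exploration_status(search_space, configs):
    """Compare a declared search space against configs already tried.

    search_space: {"model": {"qwen3.5:4b", ...}, "topology": {...}}
    configs: one dict per iteration, e.g.
        {"model": "qwen3.5:4b", "topology": "single-call"}
    Returns {dimension: {"tried": set, "untried": set}}.
    """
    status = {}
    for dim, options in search_space.items():
        tried = {c[dim] for c in configs if dim in c}
        status[dim] = {"tried": tried, "untried": set(options) - tried}
    return status
```

Rendering each dimension's sets as one line reproduces the "Models tried / Untried" format shown in the example.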
Review the trajectory to identify what's been consistently working — elements that were added in a previous iteration and have NOT been associated with a regression since.
Look at the Key Change column across iterations. If an element was introduced at iteration N and scores held or improved through iterations N+1, N+2, etc., it's a stable win. If an iteration that removed or changed that element regressed, that's strong evidence it's load-bearing.
Produce a concise list. Example:
STABLE WINS (do not remove):
- Correction lookup table (added iter 1, held through iters 2-4)
- Worked examples format (added iter 3, improved coverage each iteration since)
NOT WORKING:
- Verbose rule lists (tried iter 2, regressed)
- Multi-step prompt structure (tried iter 1, skipped by executor)
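A rough first pass at this triage can be computed from the score trajectory alone — a heuristic sketch, not part of the skill's spec; real triage also reads the Key Change text to connect regressions to removed elements:

```python
def classify_changes(composites, changes):
    """Heuristic triage of Key Change entries.

    composites: composite score per iteration, index = iteration number.
    changes: {iteration: "key change summary"} for iterations >= 1.
    A change is a stable win if its iteration held or improved the
    composite AND no later iteration dropped below that level; it is
    not working if its own iteration regressed.
    """
    stable, not_working = [], []
    for it, change in sorted(changes.items()):
        improved = composites[it] >= composites[it - 1]
        held = all(c >= composites[it] for c in composites[it + 1:])
        if improved and held:
            stable.append(change)
        elif not improved:
            not_working.append(change)
    return stable, not_working
```

Changes that improved their own iteration but were later undercut land in neither bucket and deserve a manual look at the Key Change history.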
This gets passed to the judge board's deliberation summary so judges and the generator know what to preserve and what to avoid.
Return to the orchestrator:
ITERATION [N] RECORDED
BEST SO FAR: iteration [N] (composite: [N.N]/10)
REGRESSION: [true/false] — [if true: use iteration N as input to next generator]
ITERATIONS REMAINING: [N]
ASI FOR NEXT ROUND: [the judge's ASI, unchanged]
EXPLORATION STATUS: [what's been tried vs untried — omit for text/creative or no search space]
STABLE WINS: [what's working — do not remove]
NOT WORKING: [what's been tried and failed — do not retry same approach]
Common mistakes to avoid:
- Dumping evaluator output into the trajectory table
- Modifying the ASI
- Not tracking best-so-far separately
- Writing candidate content into the trajectory
- Using prose format instead of a table