Run a continuous optimization loop — edit code, benchmark, keep or discard, repeat. Use when systematically optimizing a metric through iterative code changes.
From interlab: `npx claudepluginhub mistakeknot/interagency-marketplace --plugin interlab`. This skill uses the workspace's default tool permissions.
Announce at start: "I'm using the autoresearch skill to run an autonomous experiment loop."
Use this skill when you want to systematically optimize a metric by iterating through code changes.
The interlab MCP tools must be available: init_experiment, run_experiment, log_experiment.
Verify with a quick mental check: can you see these tools in your tool list? If not, the interlab plugin is not loaded — stop and tell the user.
If no interlab.md exists in the working directory:
Ask the user (or infer from context):
- the primary metric, its unit, and direction (lower_is_better or higher_is_better)
- how to measure it (a command that emits METRIC name=value lines)

Write interlab.sh (or use an inline command) that outputs metrics in the format:
```
METRIC <name>=<value>
```
Multiple METRIC lines are supported (one primary + optional secondary metrics). The script must be deterministic enough to measure real changes — avoid metrics that fluctuate >5% between identical runs.
Example:
```bash
#!/usr/bin/env bash
set -euo pipefail

# Run the thing being measured once, and time it
START=$(date +%s%N)
OUTPUT=$(go test ./... -v 2>&1)
END=$(date +%s%N)
DURATION_MS=$(( (END - START) / 1000000 ))

echo "METRIC test_duration=$DURATION_MS"
echo "METRIC test_count=$(grep -c '^--- PASS' <<<"$OUTPUT")"
```
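Before starting a campaign, it is worth sanity-checking that the benchmark is deterministic enough. A minimal sketch, assuming your script prints a `METRIC test_duration=...` line (the helper names `metric_value` and `drift_ok` are illustrative, and the two sample values stand in for two real runs of `bash interlab.sh`):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Extract the value of one METRIC line from benchmark output on stdin
metric_value() {
  sed -n "s/^METRIC $1=//p"
}

# Succeed if two measurements differ by no more than 5%
drift_ok() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    base = (a > b) ? a : b
    d = (a > b) ? a - b : b - a
    exit (base > 0 && d / base <= 0.05) ? 0 : 1
  }'
}

# In practice A and B would each come from a run of: bash interlab.sh
A=$(echo "METRIC test_duration=1980" | metric_value test_duration)
B=$(echo "METRIC test_duration=2050" | metric_value test_duration)

if drift_ok "$A" "$B"; then
  echo "stable: $A vs $B"
else
  echo "noisy: $A vs $B"
fi
```

If two identical runs come out "noisy", fix the benchmark (warm caches, pin iterations, average several runs) before optimizing against it.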
Call init_experiment with:
- `name`: short campaign name (e.g., "skaffen-test-speed")
- `metric_name`: primary metric to optimize (must match a METRIC line name)
- `metric_unit`: unit (ms, bytes, ops/s, etc.)
- `direction`: "lower_is_better" or "higher_is_better"
- `benchmark_command`: command to run (e.g., "bash interlab.sh")
- `working_directory`: project root (omit to use cwd)

This creates interlab.jsonl and checks out a branch interlab/<name>.
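Assuming JSON-style MCP tool arguments, a call might look like the following sketch (all values are illustrative, not prescribed):

```json
{
  "name": "skaffen-test-speed",
  "metric_name": "test_duration",
  "metric_unit": "ms",
  "direction": "lower_is_better",
  "benchmark_command": "bash interlab.sh",
  "working_directory": "."
}
```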
Create interlab.md in the working directory:
```markdown
# interlab: <goal>

## Objective
<what we're optimizing and why>

## Metrics
- **Primary**: <name> (<unit>, <direction>)
- **Secondary**: <names if any>

## How to Run
`bash interlab.sh` — outputs METRIC name=value lines

## Files in Scope
<list of files the agent may modify>

## Constraints
<hard rules: tests must pass, no new deps, API must not change, etc.>

## What's Been Tried
<updated after each experiment — key wins, dead ends, insights>
```
1. Call run_experiment to establish the starting metric value.
2. Call log_experiment with decision: "keep" and description: "baseline measurement".
3. Update interlab.md with the baseline value under "What's Been Tried".

After establishing the baseline, check for prior approaches on this task type:
Call mutation_query with:
- `task_type`: the campaign's task type (e.g., "agent-quality", "plugin-quality")
- `is_new_best`: true (only successful approaches)
- `limit`: 10

If results are returned, add them to interlab.md under a "## Prior Approaches (from mutation store)" section.
If mutation_query fails or returns empty: continue normally. The mutation store is optional.
In addition to the local mutation store, check for broadcasts from parallel sessions:
Call list_topic_messages with topic: "mutation" to get recent mutation broadcasts from other agents.
For each broadcast message:
- If its task_type matches the current campaign's task type, add it to the "Prior Approaches" section.

If list_topic_messages fails or is unavailable, continue with the local mutation store only.
This enables compound learning: Agent A's discovery feeds Agent B's hypothesis generation, even when they're running in different sessions.
LOOP FOREVER. Never ask "should I continue?" Never pause to check in. The circuit breaker is the safety net — trust it.
Each iteration:
Read interlab.md to refresh on what's been tried, what works, and what constraints apply. If interlab.ideas.md exists, check for untried ideas.
Look at the code, the metrics, and past attempts. Think about what single change could improve the primary metric. Prioritize:
- untried ideas from the backlog (interlab.ideas.md) first

Make ONE focused change. Small, targeted edits beat large rewrites. You need to isolate what caused any metric shift.
If the campaign has a test constraint, run tests before proceeding to step 4. If tests fail, fix them or revert and try a different approach.
Call run_experiment. Read the output carefully:
| Condition | Decision | Action |
|---|---|---|
| Primary improved AND secondaries acceptable | "keep" | Changes committed automatically |
| Primary regressed | "discard" | Changes reverted automatically |
| Secondary degraded >20% even if primary improved | "discard" | Changes reverted automatically |
| Benchmark crashed (non-zero exit, timeout, error) | "crash" | Changes reverted automatically |
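The 20% secondary-metric threshold is simple arithmetic. A hedged shell sketch (the helper name `pct_change` and the sample values are illustrative, not part of the interlab tools):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Percent change from baseline to current; for lower_is_better metrics,
# a positive result means the metric got worse
pct_change() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f", (cur - base) / base * 100 }'
}

# Example: secondary metric memory_kb went from 1000 to 1250
CHANGE=$(pct_change 1000 1250)
echo "secondary changed by ${CHANGE}%"   # 25.0: past the 20% gate, so discard
```

Here the secondary degraded by 25%, so the experiment is a "discard" even if the primary metric improved.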
Call log_experiment with the decision and a description of what you changed and why.
After each log_experiment call, record the mutation for provenance tracking:
Call mutation_record with:
- `task_type`: campaign's task type
- `hypothesis`: the description passed to log_experiment
- `quality_signal`: the metric value from run_experiment
- `campaign_id`: the campaign name
- `inspired_by`: if the hypothesis was explicitly inspired by a prior approach from the mutation query, include that session_id

Note whether is_new_best was true; this signals a meaningful improvement.
If mutation_record fails: log a warning but do NOT stop the campaign. Mutation recording is best-effort.
After recording the mutation, broadcast it so parallel sessions can discover this approach:
Call broadcast_message with:
- `topic`: "mutation"
- `subject`: "[<campaign_name>] <keep|discard|crash>: <hypothesis summary>"
- `body`: JSON string with: {"task_type": "<type>", "hypothesis": "<description>", "quality_signal": <value>, "is_new_best": <bool>, "campaign_id": "<name>", "session_id": "<id>"}

If broadcast_message fails or is unavailable (interlock not loaded), continue silently. Broadcasting is best-effort.
Important: log_experiment handles git operations. On "keep", it stages in-scope files and commits. On "discard" or "crash", it reverts in-scope files. Do NOT run git commands yourself.
- interlab.md: append the result to "What's Been Tried" (1-2 lines per experiment).
- interlab.ideas.md: if you discovered new optimization ideas during this iteration, add them. Mark completed ideas as tried.

Go back to step 1. Do not pause. Do not ask the user.
Stop the loop when ANY of these are true:
- run_experiment returns an error about limits (max experiments: 50, max consecutive crashes: 3, max no-improvement streak: 10)

When stopping:
Add to interlab.md:
```markdown
## Final Summary
- **Starting**: <baseline metric value>
- **Ending**: <best metric value>
- **Improvement**: <absolute and percentage>
- **Experiments**: <total> (<kept>/<discarded>/<crashed>)
- **Key wins**: <top 2-3 changes that moved the needle>
- **Key insights**: <what you learned about this codebase/metric>
```
Completed campaigns are saved to campaigns/<name>/ for future reference:
```bash
mkdir -p campaigns/<name>
cp interlab.jsonl campaigns/<name>/results.jsonl
```
Write campaigns/<name>/learnings.md with validated insights, dead ends, and generalizable patterns (see template in Learnings Document section below).
Update campaigns/README.md index table with the campaign summary row.
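The exact columns of the index table are up to you; one possible summary row shape (all values hypothetical):

```markdown
| Campaign | Metric | Baseline | Best | Improvement | Experiments |
|---|---|---|---|---|---|
| skaffen-test-speed | test_duration (ms) | 2050 | 1480 | -27.8% | 23 (9/12/2) |
```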
Delete interlab.jsonl and interlab.md from the working directory — the archived copies in campaigns/ are the permanent record. The next campaign starts fresh.
If interlab.md already exists when this skill is invoked:
- Read interlab.md for full context on the campaign.
- Read interlab.ideas.md if it exists; prune completed or invalid ideas.

Do not re-run the baseline. Do not re-initialize. The JSONL has all the history.
Maintain interlab.ideas.md as a lightweight holding pen:
```markdown
# Ideas Backlog

## Promising
- [ ] <idea> — <expected impact>

## Tried
- [x] <idea> — <result>

## Rejected
- [-] <idea> — <why not>
```
Keep this file lean. One line per idea. Move ideas between sections as they're attempted.
After significant discoveries (not every iteration — only genuine insights), update interlab-learnings.md:
```markdown
# interlab Learnings: <campaign>

## Validated Insights
- <insight> — proved by experiment #N, delta <X>%
  - Evidence: <what changed, what metrics showed>

## Dead Ends
- <approach> — tried in experiment #N, no improvement because <reason>

## Patterns
- <general pattern discovered> — applies beyond this campaign
```
These are non-negotiable:
- Measure every change with run_experiment.
- Never delete interlab.md.
- Let log_experiment handle all git staging, committing, and reverting.
- Record every experiment in interlab.md. This is how future sessions (and humans) understand what happened.

Common mistakes to avoid:
- Bundling multiple changes into one experiment
- Ignoring secondary metrics
- Not reading run_experiment output before deciding
- Forgetting to update interlab.md
- Running git commands manually
- Pausing to ask the user