From epistemic-skills
Executes preregistered hypotheses under locked methods, generating provisional evidence without contaminating headline outputs. For runs that respect preregistration, judge locks, and cost accounting.
npx claudepluginhub atomicstrata/epistemic --plugin epistemic-skills

Slash command: `/epistemic-skills:experiment-execution`
> **Related skills:** `/skill:preregistration`, `/skill:baseline-reproduction`, `/skill:falsification-review`, `/skill:kill-or-ship`
Execution is where method turns into evidence. Not where you redesign the method, switch carriers because today is inconvenient, or leak a promising number into a headline file.
Your job is narrow:
- Route the run by `computeTarget`.
- Verify `prereg.md`, `judge.lock`, and `environment.lock` before launch.
- Run the preregistered `n`.
- Write provisional artifacts to `experiments/{id}/smokes/`.
- Record every attempted run with `appendCostRecord(...)`.

Core principle: execution is measurement under a locked contract. Routing, environment, and cost accounting are part of that contract. If you improvise on any of them, downstream review cannot rescue the result.
| Need | File or API | Rule |
|---|---|---|
| Load hypothesis state | loadHypotheses(cwd) | Start from repo state, not memory |
| Select the live experiment | getActiveHypothesis(entries) | Do not guess the active id |
| Read execution target | HypothesisEntry.computeTarget | local, docker, or modal is part of the method |
| Mark execution start | updateHypothesisStatus(cwd, id, "RUNNING") | Bookkeeping only |
| Check preregistration | fileExists("experiments/{id}/prereg.md") | No prereg, no run |
| Read judge lock | getJudgeLock(cwd, id) | Missing or drifted judge lock blocks scored execution |
| Read environment lock | getEnvironmentLock(cwd, id) | Missing or drifted environment lock blocks all execution |
| Compare judge hash | computeJudgeHash(judgeRef, id) | Check before the first scored call |
| Read spend | getHypothesisSpend(cwd, id) | Watch the cap before more runs |
| Split spend by type | getHypothesisSpendByCategory(cwd, id) | Separate llm from compute burn |
| Append compute cost | appendCostRecord(cwd, record) | Use category: "compute" after every attempted run |
| Local or docker env contract | active Dockerfile + requirements.txt | Hash the exact pair used for the run |
| Modal env contract | experiments/{id}/modal-app.py | Hash the exact file you will execute |
| Docker writable path | experiments/{id}/ | Mount this read-write; everything else read-only |
| Provisional artifacts | experiments/{id}/smokes/ | Logs and raw outputs live here |
| Headline files | experiments/{id}/RESULTS.md and any root RESULTS.md | Do not write here yet |
NO RUN WITHOUT LOCKS; NO RUN WITHOUT A LEDGER ROW
Every legitimate run leaves four traces:
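- a preregistration at `experiments/{id}/prereg.md`
- lock files that match what was actually run
- raw artifacts under `experiments/{id}/smokes/`
- a ledger row in `.epistemic/cost-ledger.jsonl`

If any trace is missing, you do not have evidence. You have a story.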
Use this skill when:
- `experiments/{id}/prereg.md` already exists
- the active entry in `HYPOTHESES.md` is OPEN or RUNNING
- the hypothesis declares `computeTarget`: local, docker, or modal
- `environment.lock` and, if applicable, `judge.lock` must be enforced before launch
- you need to run the preregistered `n` and compute the promised summary

Do not use this skill when the work belongs to another skill:

- `/skill:research-question`
- `/skill:preregistration`
- `/skill:baseline-reproduction`
- `/skill:falsification-review`
- `/skill:kill-or-ship`

A quick run that changes what you believe is not a harmless preview. It is a real run with missing governance.
- Load hypothesis state with `loadHypotheses(cwd)`.
- Select the active hypothesis with `getActiveHypothesis(entries)`.
- Read `id`, `claim`, `falsifier`, `n`, `judgeRef`, `baselineRef`, `costCap`, `computeTarget`, and `status`.
- Treat `computeTarget` as part of the preregistered method.
- The valid targets are `local`, `docker`, and `modal`.

Paths used during execution:

- `experiments/{id}/prereg.md`
- `experiments/{id}/judge.lock`
- `experiments/{id}/environment.lock`
- `experiments/{id}/modal-app.py`
- `experiments/{id}/smokes/`
- `experiments/{id}/RESULTS.md`
- `.epistemic/cost-ledger.jsonl`

- Move the status from OPEN to RUNNING with `updateHypothesisStatus(cwd, id, "RUNNING")`.
- Do not mark the hypothesis CONFIRMED here.

Verify `prereg.md`:

- Check that `experiments/{id}/prereg.md` exists with `fileExists(path)`.
- Do not run `bun`, `python`, `pytest`, `benchmark`, `train`, `docker`, or `modal` commands and promise yourself you will document the method later.
- Confirm the preregistered `n`.
- If `HypothesisEntry.n`, `judgeRef`, `baselineRef`, or `computeTarget` disagree between preregistration and `HYPOTHESES.md`, repair the inconsistency before running.

Verify the judge lock:

- Read `judgeRef` from the active hypothesis.
- Read the lock with `getJudgeLock(cwd, id)`.
- Compute the current hash with `computeJudgeHash(judgeRef, id)`.
- The lock is written with `writeJudgeLock(cwd, id, judgeRef)`.

Verify the environment lock:

- Read the lock with `getEnvironmentLock(cwd, id)`.
- If it returns `null`, stop.
- Do not run without `environment.lock`.
- For `local` and `docker`, the environment contract is the exact `Dockerfile` and `requirements.txt` registered for this run.
- Hash `Dockerfile` plus `requirements.txt` in a deterministic order and compare the result to `environment.lock`.
- For `modal`, the environment contract is `experiments/{id}/modal-app.py`.
- Hash `modal-app.py` and compare the result to `environment.lock`.
- Do not rewrite `environment.lock` after editing `Dockerfile`, `requirements.txt`, or `modal-app.py`.

Check spend:

- The ledger is `.epistemic/cost-ledger.jsonl`.
- Read total spend with `getHypothesisSpend(cwd, id)`.
- Split it by category with `getHypothesisSpendByCategory(cwd, id)`.
- Read spend across hypotheses with `getAllHypothesisSpends(cwd)`.
- Append a `CostRecord` with `category: "compute"` after every attempted run.

Route from `computeTarget`, not from convenience:

- Read `h.computeTarget`.
- `local` means run in a virtual environment under the locked dependency contract.
- Log to `experiments/{id}/smokes/run-{n}.log`.
- `docker` means build from the registered `Dockerfile`, then execute inside a container.
- Mount the repo at `/work` inside the container.
- Mount `experiments/{id}/` read-write.

```bash
docker run --rm \
  -v "$(pwd):/work:ro" \
  -v "$(pwd)/experiments/{id}:/work/experiments/{id}:rw" \
  -w /work \
  <image> <command>
```

- `modal` means write `experiments/{id}/modal-app.py` as a Modal app: a `modal.App` with `@app.function()`-decorated functions.
- Launch with `modal run experiments/{id}/modal-app.py`.
- Log to `experiments/{id}/smokes/run-{n}.log`.

Record cost:

- Call `appendCostRecord(cwd, record)`.
- Use the `CostRecord` shape from `src/state/repo.ts`.
- Set `category: "compute"`.
- For `local` and `docker`, record `estimatedCost: 0`.
- For `modal`, record `estimatedCost = gpuSeconds × rate`.
- Fix `rate` before the run.
- Set `toolName` to the actual backend.
- Set `isError` to reflect whether the run failed.
- Run the preregistered `n` unless the preregistered stopping rule explicitly says otherwise.
- Do not switch `computeTarget` between runs.

Write to `smokes/` and nowhere else:

- Use `experiments/{id}/smokes/` as the provisional artifact directory.
- Example file names: `run-001.log`, `run-001.json`, `run-002.log`, `aggregate.md`, `notes.md`.
- Keep every run artifact in `smokes/`.
- Treat `experiments/{id}/smokes/` as provisional and non-quotable.
- Do not write to `experiments/{id}/RESULTS.md` or a root `RESULTS.md`.
- Keep notes in `smokes/` too if they mention provisional numbers.
- Do not write `experiments/{id}/falsifiers/{model}.md` here.
- Leave provisional outputs in `experiments/{id}/smokes/`.
- Do not write to `experiments/{id}/RESULTS.md` yet.
- If the repo has a root `RESULTS.md`, do not write there either.
- Do not mark the hypothesis CONFIRMED just because the mean looks good.

State to leave behind:

- `experiments/{id}/prereg.md`
- `experiments/{id}/judge.lock`
- `experiments/{id}/environment.lock`
- `experiments/{id}/modal-app.py` if `computeTarget` is `modal`
- `experiments/{id}/smokes/` artifacts
- `.epistemic/cost-ledger.jsonl`
- `HYPOTHESES.md`

`.epistemic/cost-ledger.jsonl` is JSON Lines: one JSON object per line, append-only.
Do not treat it like one array and do not rewrite history to make spend look cleaner.
The CostRecord shape defined in src/state/repo.ts is:
```ts
interface CostRecord {
  timestamp: string;
  hypothesisId: string;
  toolName: string;
  estimatedCost: number;
  category: "llm" | "compute";
  isError: boolean;
}
```
Execution adds category: "compute" rows.
Examples:

```json
{"timestamp":"2026-05-31T18:04:11.233Z","hypothesisId":"h-rag-precision","toolName":"compute:local","estimatedCost":0,"category":"compute","isError":false}
{"timestamp":"2026-05-31T18:12:44.901Z","hypothesisId":"h-rag-precision","toolName":"compute:docker","estimatedCost":0,"category":"compute","isError":true}
{"timestamp":"2026-05-31T18:19:52.918Z","hypothesisId":"h-rag-precision","toolName":"compute:modal:a10g","estimatedCost":1.12,"category":"compute","isError":false}
```
Read it with concrete questions:
- Total spend so far: `getHypothesisSpend(cwd, id)`.
- Split between llm and compute: `getHypothesisSpendByCategory(cwd, id)`.
- `isError: true` rows are execution evidence, not bookkeeping noise.

What not to do:

- Do not leave `local` or `docker` blank because the cost is zero. Zero is still a recorded decision.

The ledger is methodology, not bookkeeping theater. Untracked cost usually means untracked execution.
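To make the "one JSON object per line" rule concrete, here is a minimal sketch of reading the ledger text and summing spend for one hypothesis. `parseLedger`, `summarizeSpend`, and `SpendSummary` are hypothetical names for illustration only. In the repo itself, `getHypothesisSpend` and `getHypothesisSpendByCategory` are the functions to call, and their real implementations may differ from this.

```typescript
// Shape copied from the CostRecord interface shown earlier in this document.
interface CostRecord {
  timestamp: string;
  hypothesisId: string;
  toolName: string;
  estimatedCost: number;
  category: "llm" | "compute";
  isError: boolean;
}

// Hypothetical summary type, not part of the skill's API.
interface SpendSummary {
  llm: number;
  compute: number;
  failedRuns: number;
}

// Parse JSON Lines text: each non-empty line is one standalone JSON object.
function parseLedger(text: string): CostRecord[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as CostRecord);
}

// Sum estimated cost per category for one hypothesis and count failed runs.
function summarizeSpend(records: CostRecord[], hypothesisId: string): SpendSummary {
  const summary: SpendSummary = { llm: 0, compute: 0, failedRuns: 0 };
  for (const r of records) {
    if (r.hypothesisId !== hypothesisId) continue;
    summary[r.category] += r.estimatedCost;
    if (r.isError) summary.failedRuns += 1;
  }
  return summary;
}
```

Failed rows are counted here, not filtered out, which matches the rule that `isError: true` rows are evidence.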
| Excuse | Reality |
|---|---|
| “The hypothesis says local, but Docker is cleaner on this machine.” | Carrier choice is part of the registered method. Convenience does not overrule it. |
| “I only changed requirements.txt a little.” | A little dependency drift is still dependency drift. |
| “I can rewrite environment.lock after I finish debugging.” | Retroactive compliance is theater. Stop and repair the protocol first. |
| “I will let the container write anywhere in the repo because it is faster.” | Wide write access destroys containment and makes the run harder to audit. |
| “Modal cost is hard to estimate, so I will leave compute blank.” | Untracked compute is hidden spend. Estimate it and record it. |
| “The first five runs already prove the point.” | Your preregistered n exists to stop exactly that impulse. |
| “I can switch carriers midway to reduce infra noise.” | Mid-run carrier changes are methodology changes after peeking. |
| “Failed launches do not count because no result file was produced.” | They still consumed time, budget, and feasibility. Log them. |
| “I only wrote the number into RESULTS.md as a placeholder.” | Headline files are claims, not scratchpads. |
| “I found a better metric after seeing the data.” | Then it is a new analysis, not this preregistered execution. |
Stop immediately if any of these are true:
- There is no `experiments/{id}/prereg.md`.
- You are not sure which `id` is active.
- You are about to run on a carrier other than the `computeTarget` the active hypothesis specifies.
- `judge.lock` exists but you have not compared it against `computeJudgeHash(judgeRef, id)`.
- `environment.lock` exists but you have not compared it against the current environment hash.
- `environment.lock` does not match and you are tempted to “just proceed once.”
- The container can write outside `experiments/{id}/`.
- `modal-app.py` changed after the lock was recorded and you are still planning to run it.
- Run outputs are landing outside `smokes/`.
- A run was attempted and no `CostRecord` was appended.
- A provisional number is about to land in `RESULTS.md`, a PR, or a commit message.

All of those mean the same thing: stop, return to the contract, and repair the method before generating more evidence.
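Several of these stop conditions can be checked mechanically before launch. The sketch below is one way to do that, assuming the relevant facts have already been gathered into a plain object. `PreflightState`, its field names, and `preflight` are hypothetical and not part of the skill's API. Treating spend at or above `costCap` as a blocker is also an assumption, since the source only says to watch the cap.

```typescript
// Hypothetical snapshot of the facts the gate needs; field names are illustrative.
interface PreflightState {
  preregExists: boolean;
  scored: boolean; // true when the run makes judge-scored calls
  lockedJudgeHash: string | null;
  currentJudgeHash: string | null;
  lockedEnvHash: string | null;
  currentEnvHash: string;
  spend: number;
  costCap: number;
}

// Returns every reason the run must not start; an empty array means clear to launch.
function preflight(s: PreflightState): string[] {
  const blockers: string[] = [];
  if (!s.preregExists) blockers.push("missing prereg.md");

  // A missing or drifted environment lock blocks all execution.
  if (s.lockedEnvHash === null) blockers.push("missing environment.lock");
  else if (s.lockedEnvHash !== s.currentEnvHash) blockers.push("environment drift");

  // A missing or drifted judge lock blocks scored execution only.
  if (s.scored) {
    if (s.lockedJudgeHash === null) blockers.push("missing judge.lock");
    else if (s.lockedJudgeHash !== s.currentJudgeHash) blockers.push("judge drift");
  }

  // Assumption: reaching the cap is treated as a hard stop.
  if (s.spend >= s.costCap) blockers.push("cost cap reached");
  return blockers;
}
```

Returning all blockers at once, instead of throwing on the first, makes it easier to repair the protocol in one pass.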
```ts
const entries = await loadHypotheses(cwd);
const h = getActiveHypothesis(entries);
if (!h) throw new Error("No OPEN or RUNNING hypothesis.");

switch (h.computeTarget) {
  case "local":
  case "docker":
  case "modal":
    break;
  default:
    throw new Error(`Unknown compute target: ${h.computeTarget}`);
}
```
Good because the carrier comes from repo state, not from vibes.
```ts
const target = process.env.USE_DOCKER ? "docker" : "local";
```
Bad because the hypothesis already owns that decision.
```ts
const lockedEnv = await getEnvironmentLock(cwd, h.id);
if (!lockedEnv) throw new Error("Missing environment.lock.");

const currentEnvHash = computeCurrentEnvironmentHash();
if (lockedEnv !== currentEnvHash) {
  throw new Error("Environment drift detected.");
}
```
Good because the environment is checked before results exist.
```ts
await writeFile(`experiments/${h.id}/environment.lock`, currentEnvHash, "utf8");
```
Bad because you are laundering drift into compliance.
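The good example calls `computeCurrentEnvironmentHash()` without showing it. The sketch below is one way a deterministic environment hash could work, assuming SHA-256 over the files sorted by path. `EnvFile`, `hashEnvironmentFiles`, and `assertNoDrift` are hypothetical names. The hashing scheme the real tool uses to write `environment.lock` is not given in this document, so a hash produced this way should not be expected to match an existing lock.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input type: one entry per file in the environment contract.
interface EnvFile {
  path: string; // e.g. "Dockerfile" or "requirements.txt"
  content: string; // exact file contents used for the run
}

// Hash files in a deterministic order so the caller's ordering never matters.
function hashEnvironmentFiles(files: EnvFile[]): string {
  const sorted = [...files].sort((a, b) =>
    a.path < b.path ? -1 : a.path > b.path ? 1 : 0,
  );
  const hash = createHash("sha256");
  for (const f of sorted) {
    // JSON-encode path and content together so file boundaries stay unambiguous.
    hash.update(JSON.stringify([f.path, f.content]));
  }
  return hash.digest("hex");
}

// Throws on a missing lock or on drift; never writes a new lock.
function assertNoDrift(lockedHash: string | null, files: EnvFile[]): void {
  if (lockedHash === null) throw new Error("Missing environment.lock.");
  if (lockedHash.trim() !== hashEnvironmentFiles(files)) {
    throw new Error("Environment drift detected.");
  }
}
```

For `local` and `docker` the input would be the registered `Dockerfile` and `requirements.txt`. For `modal` it would be the single `modal-app.py` file.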
```bash
-v "$(pwd):/work:ro"
-v "$(pwd)/experiments/h-rag-precision:/work/experiments/h-rag-precision:rw"
```
Good because the container can write evidence without rewriting the repo.
```bash
-v "$(pwd):/work:rw"
```
Bad because the run can silently mutate unrelated files.
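Building the mount flags in code, not by hand, keeps the read-only and read-write split from eroding. The sketch below only builds the argument list and executes nothing. `buildDockerRunArgs` is a hypothetical helper. The id check is an assumption too: the document does not define the id format, so the pattern here is just a conservative guess that accepts ids like `h-rag-precision`.

```typescript
// Hypothetical helper: returns the argv for `docker run`, for example to pass
// to a process spawner. It does not run Docker itself.
function buildDockerRunArgs(
  repoRoot: string,
  id: string,
  image: string,
  command: string[],
): string[] {
  // Assumption: ids are simple slugs. Reject anything that could escape experiments/.
  if (!/^[A-Za-z0-9._-]+$/.test(id) || id === "." || id === "..") {
    throw new Error(`Unsafe hypothesis id: ${id}`);
  }
  const expDir = `experiments/${id}`;
  return [
    "run",
    "--rm",
    "-v",
    `${repoRoot}:/work:ro`, // whole repo, read-only
    "-v",
    `${repoRoot}/${expDir}:/work/${expDir}:rw`, // only this experiment is writable
    "-w",
    "/work",
    image,
    ...command,
  ];
}
```

Returning an array instead of a shell string avoids quoting problems when the repo path contains spaces.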
```ts
await appendCostRecord(cwd, {
  timestamp: new Date().toISOString(),
  hypothesisId: h.id,
  toolName: `compute:${h.computeTarget}`,
  estimatedCost: h.computeTarget === "modal" ? gpuSeconds * rate : 0,
  category: "compute",
  isError: runFailed,
});
```
Good because compute burn is explicit and auditable.
```ts
const estimatedTotal = 4.0; // roughly what all runs cost
```
Bad because roughly is not a ledger.
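The cost rule (`estimatedCost = gpuSeconds × rate` for Modal, `0` for local and docker) can be wrapped in a small builder that refuses to emit a Modal row without a rate. This is a sketch under assumptions: `buildComputeRecord` and `ModalUsage` are hypothetical names, and the rate in the usage line is a made-up number for the arithmetic, not a real Modal price.

```typescript
type ComputeTarget = "local" | "docker" | "modal";

// Shape copied from the CostRecord interface shown earlier in this document.
interface CostRecord {
  timestamp: string;
  hypothesisId: string;
  toolName: string;
  estimatedCost: number;
  category: "llm" | "compute";
  isError: boolean;
}

// Hypothetical input: measured GPU time and the rate fixed before the run.
interface ModalUsage {
  gpuSeconds: number;
  ratePerGpuSecond: number;
}

// Build one compute row for an attempted run, whether it succeeded or failed.
function buildComputeRecord(
  hypothesisId: string,
  target: ComputeTarget,
  runFailed: boolean,
  usage?: ModalUsage,
  now: Date = new Date(),
): CostRecord {
  let estimatedCost = 0; // local and docker are recorded as an explicit zero
  if (target === "modal") {
    if (!usage || usage.gpuSeconds < 0 || usage.ratePerGpuSecond < 0) {
      throw new Error("Modal runs need non-negative gpuSeconds and a rate fixed before the run.");
    }
    estimatedCost = usage.gpuSeconds * usage.ratePerGpuSecond;
  }
  return {
    timestamp: now.toISOString(),
    hypothesisId,
    toolName: `compute:${target}`,
    estimatedCost,
    category: "compute",
    isError: runFailed,
  };
}
```

With a hypothetical rate of 0.25 per GPU-second, `buildComputeRecord("h-rag-precision", "modal", false, { gpuSeconds: 8, ratePerGpuSecond: 0.25 })` yields `estimatedCost: 2`. The result would then be passed to `appendCostRecord(cwd, record)`.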
Clean execution buys you things improvisation never will.
- A matching `environment.lock` means the dependencies were frozen before launch.
- Every attempted run has a `compute` row, including zero-cost local and docker runs and billable Modal runs.
- Provisional numbers stay in `smokes/` until later review decides what, if anything, deserves a headline.

Execution is not where you prove brilliance. Execution is where you prove restraint.
After execution is complete, use /skill:statistical-rigor, then /skill:falsification-review.