Skill

experiment

Validates development rules using the scientific method: register hypotheses, design experiments, execute them, score confidence, and graduate or kill rules based on evidence.

developer-tools

code-quality

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/kernel:experiment

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Grep, Bash, Write

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Supporting Files

reference/experiment-research.md

SKILL.md

79 lines · ~958 tokens

Stats

Language: JavaScript

Stars: 11

Maintenance: Excellent

Last Commit: Jun 19, 2026

Rules without evidence are superstitions. Apply scientific method: register hypothesis, design experiment, execute, record evidence, update confidence, graduate or kill.

Deep patterns, SQL schema, domain design templates, confidence-scoring examples: skills/experiment/reference/experiment-research.md

- --domain : filter to domain (methodology|coordination|testing|git|security|performance|quality)
- --status : filter by status (unproven|testing|supported|refuted|graduated)
- --confidence : filter by minimum confidence

Seed — parse CLAUDE.md for imperative/assertive statements → register as hypotheses
- (gate: each hypothesis has id, statement, source, domain, status=unproven, confidence=0.0)
- Auto-assign H001, H002, … ; classify domain from list
- agentdb learn hypothesis "H{N}: {statement}" "{source}:{line}"
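
The Seed step above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `Hypothesis` dataclass, the `seed_hypotheses` function, and the keyword heuristic for spotting imperative statements are all assumptions, and domain classification is reduced to a placeholder.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristic: a line counts as imperative/assertive if it
# contains one of these words. A real parser would likely be richer.
IMPERATIVE = re.compile(
    r"\b(always|never|must|should|do not|don't|prefer|avoid)\b", re.IGNORECASE
)


@dataclass
class Hypothesis:
    id: str
    statement: str
    source: str                # "{file}:{line}", as in the agentdb call above
    domain: str = "quality"    # placeholder; real classification is out of scope
    status: str = "unproven"
    confidence: float = 0.0


def seed_hypotheses(text: str, filename: str = "CLAUDE.md") -> list[Hypothesis]:
    """Register each imperative-looking line as an unproven hypothesis."""
    found: list[Hypothesis] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Drop leading list/heading markers before matching.
        statement = line.strip().lstrip("-*# ").strip()
        if statement and IMPERATIVE.search(statement):
            found.append(
                Hypothesis(
                    id=f"H{len(found) + 1:03d}",  # H001, H002, ...
                    statement=statement,
                    source=f"{filename}:{lineno}",
                )
            )
    return found
```

Every record produced this way satisfies the gate's required fields, with `status=unproven` and `confidence=0.0` as defaults.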
Design — for each hypothesis under test, define experiment BEFORE running anything
- method: what you will do (A/B, timed comparison, fuzz, track-and-count)
- measurement: quantitative metric preferred (time, error count, rework frequency)
- control condition: what happens WITHOUT the rule applied
- pass_criteria: specific observable that supports the hypothesis
- fail_criteria: specific observable that refutes it
- (gate: experiment_designed — must be falsifiable; if no outcome could refute, redesign)
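
One way to represent a design record and its gate is sketched below. The `ExperimentDesign` class and the `experiment_designed` check are hypothetical names; the check is only a weak structural proxy for falsifiability (all fields filled in, and the fail criterion distinct from the pass criterion), so a human or model review of the criteria is still assumed.

```python
from dataclasses import dataclass


@dataclass
class ExperimentDesign:
    hypothesis_id: str
    method: str             # e.g. "A/B", "timed comparison", "fuzz"
    measurement: str        # quantitative metric preferred
    control_condition: str  # what happens WITHOUT the rule applied
    pass_criteria: str      # observable that supports the hypothesis
    fail_criteria: str      # observable that refutes it


def experiment_designed(design: ExperimentDesign) -> bool:
    """Structural gate: every field present and a distinct refuting outcome."""
    fields = [
        design.method,
        design.measurement,
        design.control_condition,
        design.pass_criteria,
        design.fail_criteria,
    ]
    if any(not value.strip() for value in fields):
        return False
    # If pass and fail criteria are identical, no outcome could refute.
    return design.pass_criteria.strip().lower() != design.fail_criteria.strip().lower()
```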
Execute — run the experiment; record raw observations
- (gate: control condition was actually tested, not just assumed)
- minimum sample sizes: methodology/coordination/git/quality ≥ 3 comparisons; testing ≥ 5 tasks per condition; security ≥ 50 fuzz inputs
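
The minimum sample sizes above could be encoded as a lookup, as in this small sketch. The names `MIN_SAMPLES` and `meets_minimum_sample` are illustrative assumptions. The text states no minimum for the performance domain, so the sketch raises an error there rather than guessing one.

```python
# Minimums as listed in the Execute step; units differ per domain
# (comparisons, tasks per condition, fuzz inputs).
MIN_SAMPLES = {
    "methodology": 3,
    "coordination": 3,
    "git": 3,
    "quality": 3,
    "testing": 5,
    "security": 50,
}


def meets_minimum_sample(domain: str, sample_count: int) -> bool:
    """Return True if sample_count reaches the stated minimum for the domain."""
    if domain not in MIN_SAMPLES:
        raise ValueError(f"no minimum sample size stated for domain {domain!r}")
    return sample_count >= MIN_SAMPLES[domain]
```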
Score — apply Bayesian-style update to confidence (a fixed-step heuristic, not a full Bayes' rule computation)
- supports: confidence += (1 - confidence) * 0.25
- refutes: confidence -= confidence * 0.3
- inconclusive: no change (record in evidence log)
- agentdb learn experiment "H{N} result={supports|refutes|inconclusive}" "{evidence}"
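
The scoring rules above translate directly into a small function. The sketch below assumes a hypothetical `update_confidence` helper; only the two formulas and the no-change rule for inconclusive results come from the text.

```python
def update_confidence(confidence: float, result: str) -> float:
    """Apply one scoring step to a confidence value in [0.0, 1.0]."""
    if result == "supports":
        # Move a quarter of the remaining distance toward 1.0.
        return confidence + (1 - confidence) * 0.25
    if result == "refutes":
        # Lose 30% of the current confidence.
        return confidence - confidence * 0.3
    if result == "inconclusive":
        return confidence
    raise ValueError(f"unknown result: {result!r}")


# Worked example: consecutive supporting results from a fresh hypothesis.
confidence = 0.0
history = []
for _ in range(6):
    confidence = update_confidence(confidence, "supports")
    history.append(confidence)
# history is 0.25, 0.4375, 0.578125, 0.68359375, 0.7626953125, 0.822021484375
```

After n consecutive supporting results from 0.0, confidence equals 1 - 0.75^n, so the 0.8 threshold is first crossed on the sixth result. A single refuting result at 0.4375 would drop confidence to about 0.306.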
Transition — update hypothesis status per lifecycle rules
- unproven → testing: first experiment registered
- testing → supported: confidence ≥ 0.8 AND evidence_for ≥ 3 AND evidence_for:evidence_against ratio ≥ 3:1
- testing → refuted: confidence < 0.2 AND evidence_against ≥ 2
- supported → graduated: human approval after sustained confidence
- refuted → killed: human approval to remove from rules
- any → unproven: rule is modified (resets all evidence)
- (gate: evidence_recorded — verdict must include specific, measurable evidence string)
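
The lifecycle rules above can be read as a small state machine, sketched here under stated assumptions. The `next_status` function and its parameters are hypothetical. The 3:1 ratio is interpreted as `evidence_for >= 3 * evidence_against`, which also covers the case of zero refuting results. "Sustained confidence" is not defined in the text, so it is left to the human approver rather than encoded.

```python
def next_status(
    status: str,
    confidence: float,
    evidence_for: int,
    evidence_against: int,
    *,
    experiment_registered: bool = False,
    human_approved: bool = False,
    rule_modified: bool = False,
) -> str:
    """Return the hypothesis status after applying the lifecycle rules once."""
    if rule_modified:
        # any -> unproven; the caller is expected to reset evidence as well.
        return "unproven"
    if status == "unproven" and experiment_registered:
        return "testing"
    if status == "testing":
        if (
            confidence >= 0.8
            and evidence_for >= 3
            and evidence_for >= 3 * evidence_against  # assumed reading of 3:1
        ):
            return "supported"
        if confidence < 0.2 and evidence_against >= 2:
            return "refuted"
    if status == "supported" and human_approved:
        return "graduated"
    if status == "refuted" and human_approved:
        return "killed"
    return status
```

Graduation and killing never happen without `human_approved`, which mirrors the human-approval requirement in the rules.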
Report — surface verdict
- verdict: SUPPORTED | REFUTED | INCONCLUSIVE
- confidence: current 0.0–1.0 value
- evidence: required (specific, measurable, not narrative)
- if GRADUATED: propose rule promotion to CLAUDE.md with human approval
- if KILLED: propose rule removal from CLAUDE.md with human approval

<anti_patterns>
- Confirm a hypothesis without running a real experiment.
- Use a single data point to graduate a hypothesis.
- Ignore refuting evidence because the rule "feels right".
- Test a hypothesis with a method that can only confirm (design for falsifiability).
- Modify the hypothesis after seeing results (that is a new hypothesis).
</anti_patterns>

<on_complete> agentdb write-end '{"skill":"experiment","hypotheses_tested":N,"graduated":[],"killed":[],"inconclusive":[]}' </on_complete>
