
Scientific Method

Name: Scientific Method
Author: 88plug


Falsification-first investigation workflow: convert every assertion into a labeled falsifiable hypothesis, predict before measuring, run controlled experiments, verify findings adversarially (REFUTE-first), and persist verdicts in a hypothesis ledger so killed ideas are never re-attacked

npx claudepluginhub 88plug/claude-code-plugins --plugin scientific-method

Popularity

Stars: median 0, average 273

Installs: median 0, average 1

What's Inside

Slash Commands (7)

Council

/council

Convene a model council: the same question answered independently across different models, dissent surfaced, cruxes routed to probes

Falsify

/falsify

Attack an asserted limit, ceiling, or claim with designed experiments

Invent

/invent

Run a full invention campaign: ideate past a limit, refute, build, measure against acceptance criteria, provenance-search, certify

Investigate

/investigate

Run a full scientific investigation: hypotheses, controlled experiments, verdicts, ledger

Ledger

/ledger

Create or update the persistent hypothesis ledger (EXPERIMENTS.md) for this project

Agents (5)

council-member

/council-member

Use this agent as one independent seat in a model council — the same question is posed to several seats, each running on a different model (pass a different model override per spawn), blind to each other's answers. The seat investigates with its own read-only probes and returns a structured position with evidence, calibrated confidence, and an explicit "what would change my mind". Spawn 3-5 seats in parallel for judgment calls: design decisions, interpretation of ambiguous results, risk assessments, go/no-go calls. Do not use a council to settle purely empirical questions — those go to experiments.
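A minimal sketch of how a parent might collate blind council seats and surface dissent, assuming each seat returns a position, a calibrated confidence, and a "what would change my mind" line. `SeatPosition`, `collate`, and the model names are hypothetical illustrations, not the plugin's API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SeatPosition:
    # Hypothetical shape of one seat's structured answer.
    model: str              # model override used for this seat
    position: str           # the seat's answer to the shared question
    confidence: float       # calibrated, 0.0 to 1.0
    would_change_mind: str  # the seat's explicit "what would change my mind"


def collate(seats: list[SeatPosition]) -> dict:
    """Group blind seat answers by position and surface the dissent."""
    if not seats:
        raise ValueError("a council needs at least one seat")
    models = [s.model for s in seats]
    if len(set(models)) != len(models):
        raise ValueError("each seat should run on a different model")
    counts = Counter(s.position for s in seats)
    majority, _ = counts.most_common(1)[0]
    dissent = [s for s in seats if s.position != majority]
    # A dissenter's "what would change my mind" names a crux; if it is
    # empirical, route it to a probe instead of settling it by vote.
    cruxes = [s.would_change_mind for s in dissent]
    return {"majority": majority, "counts": dict(counts),
            "dissent": dissent, "cruxes": cruxes}


seats = [
    SeatPosition("model-a", "ship", 0.70, "p99 latency regresses in canary"),
    SeatPosition("model-b", "ship", 0.60, "error budget already spent"),
    SeatPosition("model-c", "hold", 0.80, "a load test showing no queue growth"),
]
result = collate(seats)
```

The majority is reported, but the point is the `cruxes` list: the dissenting seat's condition is a factual question, so it becomes an experiment rather than a tiebreak.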

experiment-designer

/experiment-designer

Use this agent to turn one asserted limit, claim, or candidate root cause into a rigorously designed experiment — without running it. It returns a falsifiable hypothesis with an explicit null, a written probe artifact, a pre-committed outcome→conclusion table, and what each outcome unlocks. Spawn one per hypothesis when fanning out a falsification campaign (the parent runs the probes serially for clean numbers). Use whenever an investigation has accumulated untested assertions ("the ceiling is X", "the daemon causes Y", "Z can't work") that need designed experiments rather than debate.
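The pre-committed outcome→conclusion table can be sketched as a small record that must be frozen before any result is read. This is a hypothetical illustration of the idea, not the agent's actual output format; the hypothesis, null, and probe path (`probes/h1_stop_daemon.sh`) are made-up examples.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass
class ExperimentDesign:
    hypothesis: str
    null: str
    probe: str  # path to the written probe artifact (hypothetical)
    # Outcome -> conclusion table, written BEFORE the probe is run.
    outcomes: dict[str, str] = field(default_factory=dict)
    committed: Optional[Mapping[str, str]] = field(default=None, init=False)

    def commit(self) -> None:
        """Freeze the table; edits made to `outcomes` afterwards are ignored."""
        if not self.outcomes:
            raise ValueError("pre-commit at least one outcome before measuring")
        self.committed = MappingProxyType(dict(self.outcomes))

    def conclude(self, observed: str) -> str:
        if self.committed is None:
            raise RuntimeError("commit predictions before reading results")
        # An outcome nobody predicted does not get a post-hoc story.
        return self.committed.get(
            observed, "unanticipated outcome: redesign before concluding")


design = ExperimentDesign(
    hypothesis="H1: the daemon causes the latency spike",
    null="latency is unchanged with the daemon stopped",
    probe="probes/h1_stop_daemon.sh",
    outcomes={
        "spike persists": "kill H1; the daemon is not the cause",
        "spike disappears": "H1 survives; send to a refuter",
    },
)
design.commit()
```

Snapshotting the table into a read-only `MappingProxyType` means a conclusion cannot be quietly rewritten after the numbers come in.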

meta-reviewer

/meta-reviewer

Use this agent as the area chair who closes a peer-review round. It receives the submission packet, all independent reviews, the author rebuttal, and any re-scores, then issues the final decision (accept / minor-revision / major-revision / reject) with camera-ready requirements. It weighs evidence quality rather than counting votes — one reviewer with a failed reproduction outweighs three approving skims. Spawn exactly one, after rebuttal, never before all reviews are in.
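One way to picture "weighing evidence rather than counting votes" is an evidence-weighted score. The weights, thresholds, and the `Review`/`decide` names below are hypothetical, chosen only to illustrate the principle; they are not the agent's actual rubric.

```python
from dataclasses import dataclass

# Hypothetical weights: executed checks count far more than skims.
EVIDENCE_WEIGHT = {
    "failed_reproduction": 10.0,
    "reproduced": 3.0,
    "rederived": 2.0,
    "skim": 0.5,
}


@dataclass(frozen=True)
class Review:
    lens: str            # e.g. "soundness", "reproducibility"
    recommendation: str  # "accept", "revise", or "reject"
    evidence: str        # a key of EVIDENCE_WEIGHT


def decide(reviews: list[Review]) -> str:
    """Evidence-weighted decision; thresholds are illustrative only."""
    if not reviews:
        raise ValueError("decide only after all reviews are in")
    weight = {"accept": 0.0, "revise": 0.0, "reject": 0.0}
    for r in reviews:
        weight[r.recommendation] += EVIDENCE_WEIGHT[r.evidence]
    total = sum(weight.values())
    # A "revise" recommendation counts as half support.
    support = (weight["accept"] + 0.5 * weight["revise"]) / total
    if support >= 0.8:
        return "accept"
    if support >= 0.6:
        return "minor-revision"
    if support >= 0.3:
        return "major-revision"
    return "reject"
```

With three approving skims and one failed reproduction, a vote count says accept 3 to 1, but the weighted support is 1.5 / 11.5, so `decide` returns "reject".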

peer-reviewer

/peer-reviewer

Use this agent as one independent reviewer in a peer-review round for an invention, design, finding, or paper-style writeup. Each reviewer gets the same submission packet plus ONE assigned lens (soundness, prior-art/provenance, reproducibility, significance, or fatal-flaw) and works blind to the other reviewers. Reviews are execution-grounded: depending on the lens, the reviewer runs the Reproduce block, searches prior art, or re-derives the numbers, then returns structured scores plus an accept/revise/reject recommendation. Spawn 3-5 reviewers with different lenses after a finding survives the refute gate and before it is built, merged, published, or sent externally.

refuter

/refuter

Use this agent to evaluate a finding, claimed root cause, performance claim, invention, or research result by attempting to refute it before judging what survives. It returns a verdict (confirmed / prototype / research / kill) with the refutation analysis, calibrated confidence, and a kill_reason when applicable. Spawn one fresh refuter per claim — the author of a finding must not referee it. Use proactively before acting on any finding, sending conclusions externally, merging a "fix", or relaying research-agent results the parent has not independently verified.
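A refuter's verdict can be sketched as a validated record: one of the four verdicts, a calibrated confidence, a `kill_reason` whenever the verdict is kill, and a guard that the author is not the referee. The field names `author` and `referee` are hypothetical additions used to express the "author must not referee" rule.

```python
from dataclasses import dataclass
from typing import Optional

VERDICTS = ("confirmed", "prototype", "research", "kill")


@dataclass(frozen=True)
class RefuterVerdict:
    claim: str
    verdict: str             # one of VERDICTS
    confidence: float        # calibrated, 0.0 to 1.0
    refutation: str          # what was tried in order to break the claim
    author: str              # who produced the finding (hypothetical field)
    referee: str             # which fresh refuter judged it (hypothetical field)
    kill_reason: Optional[str] = None

    def __post_init__(self) -> None:
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict {self.verdict!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if self.verdict == "kill" and not self.kill_reason:
            raise ValueError("a kill verdict needs a kill_reason")
        if self.author == self.referee:
            raise ValueError("the author of a finding must not referee it")
```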

Skills (1)

scientific-method

/scientific-method

Use this skill whenever someone doubts a number, demands rigorous proof of a cause, or wants something invented and proven: any time a benchmark, metric, ceiling, or root-cause story must be verified rather than trusted, or a limit must be broken with a built, measured mechanism.

Typical situations:
a measurement looks suspicious ("2x faster but I don't trust it");
an incident needs its true cause before a decision ("the team blames X; confirm it, no guessing");
ANY production incident, outage, brownout, latency/error spike, regression, or "started failing" / "load suddenly went 5x" / "every component met spec but it still broke" report where the cause is not yet proven, especially when a recent deploy or change is suspected ("is the deploy to blame", "this coincided with the rollout"), since the obvious suspect is exactly what needs a control case;
an asserted ceiling or "impossible" limit needs breaking or proving real;
numbers need validation before publishing;
a problem needs an invented solution that demonstrably meets acceptance criteria.

Runs as a falsification-first campaign (hypotheses, predictions before measurement, controlled experiments, adversarial refutation, calibrated confidence, persistent ledger) with invention as the default continuation: surviving limits are attacked with designed mechanisms, built, measured against tuned baselines, and provenance-searched (claims are demonstrated or searched, never asserted).

Triggers: "use the scientific method", "prove it", "validate these claims", "verify, don't trust", "are you sure", "root cause this", "root cause and fix", "why did X happen / why is X slow", "diagnose this incident/outage/brownout", "retry storm", "cascading failure", "what caused the regression", "I need real numbers", "falsify", "is it real", "invent", "break the ceiling", plus any incident or any challenge to an asserted number or cause, even unnamed and even when phrased as a routine ask.

Use proactively before asserting any ceiling or root cause of your own. Do NOT use for ordinary feature work, refactoring, or one-file questions.

Hooks (1)

Event Hooks

1 hook across 1 event

Stats

Version: 1.0.0

Language: Python

Stars: 0

Maintenance: Excellent

License: FSL-1.1-ALv2

Last Commit: Jun 23, 2026

Added: Jul 2, 2026


Available In

88plug (1)

Safety Signals

Caution

Uses power tools

Uses Bash, Write, or Edit tools

README

Scientific Method

A Claude Code plugin that runs investigations, debugging, performance work, and claim validation as falsification-first campaigns — for engineers who need to be right, not just confident.

Install

/plugin marketplace add 88plug/scientific-method
/plugin install scientific-method@scientific-method

Quickstart

Run a full campaign on any problem in one command:

/scientific-method:investigate the API returns 500 under load but not in tests

You get back labeled hypotheses (H1..Hn), a prediction for each written before any measurement, the cheapest probe run first, and a calibrated verdict — with every result logged to a persistent ledger so killed ideas stay killed.

No setup, no API keys, no MCP server. The plugin enforces method over your existing tools.

What it does

Most debugging is guessing dressed up as analysis. This plugin makes Claude work like a scientist: turn each assertion into a falsifiable hypothesis, predict the outcome before measuring, run a controlled experiment, attack the result before trusting it, and record the verdict so it survives across sessions.

It is distilled from real session transcripts where the method cracked problems ordinary debugging did not — a GPU codec campaign that falsified four asserted "physical" performance walls, a fleet forensics investigation that killed two plausible-but-wrong root causes with control cases before filing a vendor bug, and benchmark work where honest baselines caught regressions that averages hid.

Note: This is a methodology plugin. It ships a skill, commands, agents, and one read-only hook; there is no MCP server, output style, or statusline. The hook is the only thing that runs automatically, and it only reads EXPERIMENTS.md.

What it enforces

Every asserted limit, cause, or claim becomes a labeled falsifiable hypothesis (H1..Hn) with an explicit null.
Predictions and outcome-to-conclusion tables are written before measuring.
Cheapest falsification first — one probe beats five agents arguing.
Controls and baselines are mandatory for causal and performance claims.
Findings pass a REFUTE-first adversarial gate before being trusted.
Confidence is calibrated — 0.90+ needs ground-truth proof, and being confidently wrong is worse than being inconclusive.
Verdicts persist in a hypothesis ledger (EXPERIMENTS.md) with a falsification log marked DO-NOT-RE-ATTACK, so killed ideas stay killed across sessions and compactions.
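A minimal sketch of the "killed ideas stay killed" check, assuming a simple one-line-per-entry falsification log. The entry format and the function names are hypothetical; the actual EXPERIMENTS.md layout the plugin writes is not shown in this README.

```python
import re

# Assumed entry format (hypothetical; not the plugin's documented schema):
#   - H3: <claim> | KILLED | DO-NOT-RE-ATTACK | <reason>
ENTRY = re.compile(r"^- (H\d+): (.+?) \| KILLED \| DO-NOT-RE-ATTACK \| (.+)$")


def normalize(claim: str) -> str:
    # Naive exact-match key: case-folded, whitespace-collapsed.
    return " ".join(claim.casefold().split())


def killed_claims(ledger_text: str) -> dict[str, tuple[str, str]]:
    """Map each killed claim's key to (label, reason)."""
    killed = {}
    for line in ledger_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            label, claim, reason = match.groups()
            killed[normalize(claim)] = (label, reason)
    return killed


def check_new_hypothesis(claim: str, ledger_text: str) -> None:
    """Refuse to re-open a hypothesis the ledger already killed."""
    hit = killed_claims(ledger_text).get(normalize(claim))
    if hit:
        label, reason = hit
        raise ValueError(f"already killed as {label} ({reason}); DO-NOT-RE-ATTACK")


LEDGER = """\
## Falsification log
- H2: the recent deploy caused the outage | KILLED | DO-NOT-RE-ATTACK | control host without the deploy failed too
"""
```

Exact matching after normalization is deliberately simple; a reworded claim would slip past it, so a real check would need the reader (or a refuter) to judge equivalence.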

The workflow

Each stage maps to a command you can run on its own, or you can let `investigate` chain the stages for you.

Hypothesis — turn each assertion into a labeled, falsifiable claim with a null.
Prediction — write what each outcome would mean, before measuring.
Experiment — run the cheapest controlled probe that can falsify the claim.
Refute — attack surviving findings adversarially before trusting them.
Verdict — issue a calibrated result: confirmed, prototype, research, or kill.
Ledger — persist verdicts so killed ideas are never re-attacked.
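The ordering rule behind this workflow (no measuring before predicting, no verdict before refutation) and the 0.90 ground-truth rule can be sketched as a small state machine. `Stage`, `Campaign`, and `check_confidence` are hypothetical names for illustration, not part of the plugin.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    HYPOTHESIS = 1
    PREDICTION = 2
    EXPERIMENT = 3
    REFUTE = 4
    VERDICT = 5
    LEDGER = 6


GROUND_TRUTH_THRESHOLD = 0.90  # "0.90+ needs ground-truth proof"


class Campaign:
    """Tracks one hypothesis and refuses to skip or reorder stages."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.stage: Optional[Stage] = None

    def advance(self, to: Stage) -> None:
        if self.stage is Stage.LEDGER:
            raise RuntimeError(f"{self.label} is already recorded in the ledger")
        expected = Stage.HYPOTHESIS if self.stage is None else Stage(self.stage + 1)
        if to is not expected:
            raise RuntimeError(
                f"{self.label}: cannot enter {to.name} before {expected.name}")
        self.stage = to


def check_confidence(confidence: float, ground_truth: bool) -> float:
    """Reject confidence of 0.90 or more that is not backed by ground truth."""
    if confidence >= GROUND_TRUTH_THRESHOLD and not ground_truth:
        raise ValueError("0.90+ confidence needs ground-truth proof")
    return confidence
```

Failing loudly on a skipped stage mirrors the method's stance that being confidently wrong is worse than being inconclusive.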

Commands

| Command | What it does |
| --- | --- |
| `/scientific-method:investigate <problem>` | Full campaign: hypotheses, controlled experiments, verdicts, ledger |
| `/scientific-method:falsify <claim>` | Attack an asserted limit, ceiling, or claim with designed probes |
| `/scientific-method:invent <problem>` | Invention campaign: ideate past a limit, refute, build, measure vs tuned baseline, provenance-search, certify |
| `/scientific-method:verdict [claims]` | Adversarial REFUTE-first review of findings before they are trusted |
| `/scientific-method:ledger [sync]` | Create or update the persistent hypothesis ledger |
| `/scientific-method:council <question>` | Model council: same question to independent seats on different models, dissent surfaced, factual cruxes routed to probes |
| `/scientific-method:peer-review <work>` | Blind lensed reviewers who execute, rebuttal answered with evidence, area-chair decision |

Agents

These run under the hood when a campaign fans out. You can also invoke them directly.



Similar Plugins

ecc

fullstack-dev-skills

prompts.chat

context7-plugin

superpowers

godot-skills

More by 88plug

Caveman Plus

screen-mcp

Amnesia

SearXNG

Drift Detector
