Help us improve
Share bugs, ideas, or general feedback.
From science-superpowers
Enforces fresh verification of analysis results before making claims. Requires running the analysis from raw data and reading actual output before reporting any finding.
npx claudepluginhub k-dense-ai/science-superpowers --plugin science-superpowersHow this skill is triggered — by the user, by Claude, or both
Slash command
/science-superpowers:verifying-results-before-claimingThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Claiming a finding without fresh verification is dishonesty, not efficiency.
Verifies data science analysis reproducibility and completion with automated guard hooks and a structured checklist. Useful when confirming analysis results or validating output.
Dispatching a skeptical reviewer subagent to attack your conclusion before you believe or report it. Useful after analysis steps, before writeups, or when results are surprising or convenient.
Reviews data analysis pipelines for quality, correctness, and reproducibility. Assesses data quality, model validation, leakage detection, and verifies reproducibility. Use for pre-publication reviews, ML pipeline validation, or regulatory audits.
Share bugs, ideas, or general feedback.
Claiming a finding without fresh verification is dishonesty, not efficiency.
Core principle: Evidence before claims, always.
Violating the letter of this rule is violating the spirit of this rule.
NO CLAIMS WITHOUT FRESH REPRODUCED EVIDENCE
If you haven't run the analysis in this state and read its actual output, you cannot claim its result. "It was significant earlier" is not evidence now.
BEFORE claiming any result or expressing satisfaction:
1. IDENTIFY: What command/analysis proves this claim?
2. RUN: Execute it fresh and complete (from the immutable raw data, fixed seed)
3. READ: The actual output — the estimate, the interval, the p-value, the diagnostics
4. CHECK: Do the method's assumptions hold? Does it reproduce?
5. VERIFY: Does the output actually support the claim?
- If NO: state the real result with evidence
- If YES: state the claim WITH the evidence (number + interval)
6. ONLY THEN: make the claim
Skip any step = asserting, not verifying
| Claim | Requires | Not Sufficient |
|---|---|---|
| "The effect is significant" | Fresh run; read estimate, CI, and p | "It was significant before" |
| "There's no effect" | Effect size + interval showing precision | A non-significant p (could be underpowered) |
| "The result reproduces" | Re-run from raw data + fixed seed → same number | "It ran fine earlier" |
| "Assumptions are met" | The diagnostic output, read | "It's probably fine" |
| "The model is good" | Out-of-sample metric | In-sample fit / training accuracy |
| "Data cleaned correctly" | Validation counts (rows in/out, ranges) | "The script ran without error" |
| "Confirmatory finding" | It was pre-registered AND re-run as registered | Matches what I expected |
| "The subagent finished" | Inspect the committed artifacts/diff | The subagent said "done" |
| Excuse | Reality |
|---|---|
| "It was significant last run" | Re-run it now. Code/data may have changed. |
| "I'm confident in the result" | Confidence is not evidence. |
| "The script ran without errors" | Running ≠ correct. Read the output. |
| "p < .05, so it's real" | p is not the probability the effect is real. Report effect + interval. |
| "p > .05, so no effect" | Could be underpowered. Absence of evidence ≠ evidence of absence. |
| "The subagent reported success" | Verify the artifacts independently. |
| "It reproduces, I'm sure" | Re-run from raw + seed and show the same number. |
| "Different words, so the rule doesn't apply" | Spirit over letter. |
Significance / effect:
✅ [Re-run the model] [Read: beta=0.23, 95% CI [0.08, 0.38], p=.002] "Exposure raises the outcome; CI excludes zero."
❌ "The effect looked significant."
Reproducibility:
✅ Fresh env → re-run from data/raw with seed → same estimate to reported precision → "Reproduces."
❌ "It ran earlier, so it reproduces."
Null result:
✅ [Read: beta=0.01, 95% CI [-0.12, 0.14]] "No detectable effect; the interval rules out effects larger than ~0.14."
❌ "p=0.3, so there's no effect."
Model performance:
✅ [Held-out test set, used once] [AUC=0.78] "Out-of-sample AUC 0.78."
❌ "Training accuracy is 0.97, the model is great."
Subagent delegation:
✅ Subagent reports done → inspect committed code + output artifact → confirm → report actual state
❌ Trust the report
ALWAYS before:
Applies to: exact phrases, paraphrases, synonyms, and any implication of a verified finding.
Run it fresh. Read the output. Check it reproduces. THEN state the result — with the number and the interval.
This is non-negotiable.