From grimoire
Calculates required sample size for a study or evaluates whether a completed study had adequate statistical power to detect an effect.
npx claudepluginhub jeffreytse/grimoire --plugin grimoire
How this skill is triggered: by the user, by Claude, or both
Slash command
/grimoire:calculate-statistical-power
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Determine the minimum sample size needed to detect a biologically meaningful effect with specified confidence, preventing both underpowered and wasteful studies.
Adopted by: NIH grant review criteria (power statement required since 2014), FDA guidelines for clinical trials, Nature/Science submission checklists, ARRIVE 2.0 animal research guidelines.
Impact: Button et al. (2013) found median power of 21% in neuroscience studies, meaning a true effect of the assumed size would be missed about 79% of the time. Studies with ≥80% power cap the false-negative rate at ≤20% and improve replication rates by ~2×.
Why best: Power analysis forces researchers to specify effect size before data collection, preventing HARKing; it quantifies the tradeoff between Type I error (α), Type II error (β), effect size, and n.
Sources: Cohen (1988) chapters 2–8; Button et al. Nature Rev Neurosci 14:365–376 (2013); Faul et al. Behav Res Methods 39:175–191 (2007).
Set α (significance level) — use α=0.05 as default; use α=0.01 for exploratory studies with many comparisons; adjust for multiple comparisons (Bonferroni: α/k).
Set desired power (1−β) — use 0.80 as minimum; use 0.90 for high-stakes experiments (clinical, irreversible interventions).
Specify the statistical test — identify the test you will use: t-test, ANOVA, chi-square, correlation, regression, survival analysis. Power formulas differ by test.
Estimate the effect size — use Cohen's d for t-tests, f for ANOVA, r for correlations, w for chi-square. Sources in order of preference: (a) pilot data, (b) meta-analytic estimate, (c) smallest effect of biological importance, (d) Cohen's conventional values (small d=0.2, medium d=0.5, large d=0.8).
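When pilot data are the source of the effect size, Cohen's d can be computed as the mean difference divided by the pooled standard deviation. The following is a minimal sketch using only the Python standard library; the function name `cohens_d` and the pilot values are hypothetical and chosen for illustration, not taken from any real study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical pilot measurements, for illustration only.
pilot_treated = [12.0, 14.0, 16.0, 18.0, 20.0]
pilot_control = [10.0, 12.0, 14.0, 16.0, 18.0]

d = cohens_d(pilot_treated, pilot_control)
print(f"Cohen's d from pilot data: {d:.3f}")
```

Estimates from small pilots are noisy, so it may be worth comparing the result against a meta-analytic estimate or the smallest effect of biological importance before committing to it.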
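Calculate n using G*Power or formula: for a two-sample t-test (two-tailed, equal group sizes), n per group ≈ 2(z_{1−α/2} + z_{1−β})² / d², a normal approximation that slightly underestimates the exact t-based n; use G*Power 3.1 software for complex designs or run in R: pwr::pwr.t.test(d=0.5, sig.level=0.05, power=0.80).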
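The two-sample formula in this step can be sketched in Python with the standard library alone. This is an illustration of the normal approximation, not a replacement for G*Power or `pwr`; the function name `n_per_group` is an assumption made for this example, and the calculation assumes a two-tailed test with equal group sizes.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-tailed, two-sample t-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the two-tailed alpha
    z_beta = std_normal.inv_cdf(power)           # quantile corresponding to power = 1 - beta
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2


n_raw = n_per_group(d=0.5, alpha=0.05, power=0.80)
n_required = ceil(n_raw)  # always round up to a whole participant
print(f"n per group: {n_raw:.2f} -> {n_required}")
```

For d=0.5 this approximation rounds up to 63 per group, while exact t-based tools report about 64 for the same inputs, so the software result is the one to carry into a protocol.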
Account for attrition — inflate n by expected dropout rate: n_adjusted = n / (1 − dropout_rate). Use 10–20% for animal studies, 20–30% for human clinical trials.
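The attrition adjustment is simple arithmetic, but dividing by (1 − dropout_rate) differs from multiplying by (1 + dropout_rate), which undershoots. A short sketch follows, assuming a hypothetical starting n of 64 per group and a helper name, `inflate_for_attrition`, invented for this example.

```python
from math import ceil


def inflate_for_attrition(n, dropout_rate):
    """Enrollment target so that about n participants remain after expected dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in the range [0, 1)")
    return ceil(n / (1 - dropout_rate))


n_per_group = 64  # hypothetical result from the sample-size step
n_adjusted = inflate_for_attrition(n_per_group, dropout_rate=0.15)
n_naive = ceil(n_per_group * 1.15)  # common shortcut that under-enrolls

print(f"enroll {n_adjusted} per group (naive shortcut would give {n_naive})")
```

With these inputs the division form asks for 76 per group, while the multiplicative shortcut gives 74, which would leave fewer than 64 completers if 15% drop out.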
Report power justification — write: "We require n=X per group to detect d=Y with 80% power at α=0.05 (two-tailed), based on [source of effect size estimate]."
For completed studies: calculate observed power only to contextualize a non-significant result, and interpret it cautiously because it is determined by the observed p-value; do not use observed power to retroactively justify the design.
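For a completed two-group study, the power that the achieved sample size provides for a given effect size can be approximated as below. This is a sketch under the same normal approximation as the sample-size formula; the function name `power_two_sample` and the example inputs are assumptions for illustration. Passing the effect size specified before data collection, rather than the one estimated from the results, is a choice made in this example.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample test with equal group sizes."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected test statistic under the alternative
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical completed studies that aimed to detect d = 0.5.
power_adequate = power_two_sample(d=0.5, n_per_group=64)
power_small = power_two_sample(d=0.5, n_per_group=20)

print(f"n=64 per group: power = {power_adequate:.2f}")
print(f"n=20 per group: power = {power_small:.2f}")
```

A non-significant result from the smaller design says little about whether an effect of that size exists, which is the kind of context this step asks for.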