From medsci-presentation
Guides medical researchers through interactive test selection and generates IRB-ready sample size justifications with reproducible R/Python code for diagnostic accuracy, survival, ANOVA, logistic regression, and non-inferiority designs.
npx claudepluginhub aperivue/medsci-skills --plugin medsci-literature
Slash command: /medsci-presentation:calc-sample-size
You are assisting a medical researcher with sample size and power calculations. Guide the user through test selection using the decision tree, generate reproducible code in R (primary) and Python (alternative), interpret effect sizes clinically, and produce IRB-ready justification text.
Reference files:
- `${CLAUDE_SKILL_DIR}/references/formulas.md` -- mathematical formulas, R/Python functions, effect size conventions
- `${CLAUDE_SKILL_DIR}/references/observational_cohort.md`
- analyze-stats skill at `references/templates/sample_size.R` for the 7 original tests

Read `formulas.md` before generating calculation code.
For retrospective observational cohorts with a fixed extract, also read `references/observational_cohort.md` and report the event budget and confidence-interval precision instead of forcing a prospective, recruitment-style power calculation.
Formulas for each test are in `${CLAUDE_SKILL_DIR}/references/formulas.md`.
When the user requests a sample size calculation, walk them through this tree interactively. Ask one question at a time. Do not assume answers.
What is your primary outcome?
|
+-- Binary (yes/no, positive/negative)
| |
| +-- Paired data (same subjects, two methods)?
| | +-- YES --> [5] McNemar test
| | +-- NO --> How many groups?
| |     +-- 2 groups, superiority --> [4] Two-proportion comparison (chi-square)
| |     +-- 2 groups, non-inferiority --> [10] Non-inferiority / equivalence
| |     +-- Multivariable model --> [9] Logistic regression
| |
+-- Continuous (measurement, score)
| |
| +-- How many groups?
|     +-- 2 groups --> [6] Independent t-test
|     +-- 3+ groups --> [8] One-way ANOVA
|
+-- Time-to-event (survival, recurrence)
| |
| +-- Two groups, unadjusted --> [7] Log-rank test
| +-- Multivariable / adjusted HR --> [7] Log-rank (Schoenfeld) + [11] Cox EPV
|
+-- Agreement (inter-rater, reproducibility)
| |
| +-- Continuous measurements --> [2] ICC
| +-- Categorical ratings --> [3] Kappa
|
+-- Diagnostic accuracy (Se, Sp, AUC precision)
|
+--> [1] Diagnostic accuracy (precision-based)
### Test 1: Diagnostic Accuracy (Precision-Based)
When to use: Estimating the sample size required to achieve a desired precision for sensitivity or specificity in a diagnostic accuracy study.
Required parameters (ask the user):
| Parameter | Description | Default |
|---|---|---|
| sensitivity_expected | Expected sensitivity | 0.85 |
| ci_half_width | Desired half-width of 95% CI | 0.05 |
| prevalence | Disease prevalence in study population | 0.30 |
| alpha | Significance level | 0.05 |
| attrition_rate | Expected dropout/exclusion rate | 0.15 |
Effect size interpretation: The CI half-width determines precision. A half-width of 0.05 means the 95% CI for sensitivity will be within +/-5 percentage points. Narrower CIs require larger samples.
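A minimal Python sketch of the precision-based calculation, using only the standard library. It assumes the usual normal-approximation (Wald) formula commonly attributed to Buderer (1996): diseased cases n = z² · Se(1 − Se) / d², then total N = n / prevalence, then the attrition adjustment. The function name `diag_accuracy_n` is hypothetical, and `formulas.md` remains the authoritative source for the formula the skill actually uses. For specificity, substitute the expected specificity and divide by (1 − prevalence) instead.

```python
import math
from statistics import NormalDist


def diag_accuracy_n(sensitivity_expected=0.85, ci_half_width=0.05,
                    prevalence=0.30, alpha=0.05, attrition_rate=0.15):
    """Precision-based sample size for sensitivity (Wald normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% CI
    se = sensitivity_expected
    # Diseased cases needed so the CI for sensitivity has the target half-width
    n_cases = math.ceil(z ** 2 * se * (1 - se) / ci_half_width ** 2)
    # Total subjects needed so that, at this prevalence, enough are diseased
    n_total = math.ceil(n_cases / prevalence)
    # Inflate for expected dropout/exclusions
    n_adj = math.ceil(n_total / (1 - attrition_rate))
    return n_cases, n_total, n_adj


# (diseased cases, total N, N after attrition) using the table defaults
print(diag_accuracy_n())
```

Halving the half-width roughly quadruples the number of cases, which is why precision targets should be agreed on before the calculation is run.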
### Test 2: Intraclass Correlation Coefficient (ICC)
When to use: Inter-rater or intra-rater agreement for continuous measurements (e.g., tumor size, angle measurement).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| icc_expected | Expected ICC value | 0.75 |
| icc_null | Null hypothesis ICC (lower bound) | 0.50 |
| n_raters | Number of raters | 2 |
| alpha | Significance level | 0.05 |
| power | Desired power | 0.80 |
| attrition_rate | Expected dropout rate | 0.10 |
Effect size interpretation: ICC < 0.50 = poor, 0.50-0.75 = moderate, 0.75-0.90 = good, > 0.90 = excellent (Koo & Li, 2016).
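One published approximation for this design is that of Walter, Eliasziw & Donner (1998), which tests H0: ICC ≤ icc_null against a one-sided alternative. The sketch below assumes that approximation, a one-sided alpha, and that n_raters is the number of ratings per subject. The function name `icc_n_subjects` is hypothetical, and the result should be cross-checked against the method in `formulas.md`.

```python
import math
from statistics import NormalDist


def icc_n_subjects(icc_expected=0.75, icc_null=0.50, n_raters=2,
                   alpha=0.05, power=0.80, attrition_rate=0.10):
    """Approximate number of subjects for H0: ICC <= icc_null (one-sided)."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # one-sided test
    z_b = NormalDist().inv_cdf(power)
    # theta = ICC / (1 - ICC) under the null and under the alternative
    theta0 = icc_null / (1 - icc_null)
    theta1 = icc_expected / (1 - icc_expected)
    c0 = (1 + n_raters * theta0) / (1 + n_raters * theta1)
    k = 1 + 2 * (z_a + z_b) ** 2 * n_raters / (math.log(c0) ** 2 * (n_raters - 1))
    n = math.ceil(k)
    return n, math.ceil(n / (1 - attrition_rate))


print(icc_n_subjects())  # (subjects, subjects after attrition)
```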
### Test 3: Kappa
When to use: Inter-rater agreement for categorical ratings (e.g., BI-RADS category, lesion present/absent).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| kappa_expected | Expected kappa value | 0.70 |
| kappa_null | Null hypothesis kappa | 0.40 |
| po_expected | Expected proportion of agreement | 0.75 |
| alpha | Significance level | 0.05 |
| power | Desired power | 0.80 |
| attrition_rate | Expected dropout rate | 0.10 |
Effect size interpretation: Kappa < 0.00 = poor, 0.00-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial, 0.81-1.00 = almost perfect (Landis & Koch, 1977).
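As a rough cross-check only, the sketch below uses a Wald-type normal approximation built on Cohen's (1960) large-sample standard error of kappa, SE ≈ √(po(1 − po) / (n(1 − pe)²)). It assumes that the chance agreement pe implied by po_expected and kappa_expected is the same under H0 and H1, and it uses a one-sided alpha. The function name is hypothetical, and this is not necessarily the method in `formulas.md`. Exact methods for kappa are more involved.

```python
import math
from statistics import NormalDist


def kappa_n_subjects(kappa_expected=0.70, kappa_null=0.40, po_expected=0.75,
                     alpha=0.05, power=0.80, attrition_rate=0.10):
    """Rough N for H0: kappa <= kappa_null (one-sided), via Cohen's approximate SE."""
    # Chance agreement implied by kappa = (po - pe) / (1 - pe)
    pe = (po_expected - kappa_expected) / (1 - kappa_expected)
    if not 0 <= pe < 1:
        raise ValueError("po_expected and kappa_expected are inconsistent")
    po_null = pe + kappa_null * (1 - pe)  # observed agreement implied under H0
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    numerator = (z_a * math.sqrt(po_null * (1 - po_null))
                 + z_b * math.sqrt(po_expected * (1 - po_expected))) ** 2
    denominator = (1 - pe) ** 2 * (kappa_expected - kappa_null) ** 2
    n = math.ceil(numerator / denominator)
    return n, math.ceil(n / (1 - attrition_rate))


print(kappa_n_subjects())  # (subjects, subjects after attrition)
```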
### Test 4: Two-Proportion Comparison (Chi-Square)
When to use: Comparing proportions between two independent groups (e.g., AI detection rate vs. conventional detection rate).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| p1 | Proportion in group 1 | -- |
| p2 | Proportion in group 2 | -- |
| alpha | Significance level | 0.05 |
| power | Desired power | 0.80 |
| attrition_rate | Expected dropout rate | 0.15 |
Effect size interpretation: Cohen's h = 2 * arcsin(sqrt(p1)) - 2 * arcsin(sqrt(p2)). Small = 0.20, medium = 0.50, large = 0.80.
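A minimal sketch of the arcsine (Cohen's h) approach, assuming equal group sizes and a two-sided test, with n per group = 2((z₁₋α/₂ + z₁₋β)/h)². This is the normal approximation that `pwr::pwr.2p.test` is built on, so results should closely match it. The function name and the example proportions (0.80 vs. 0.65) are hypothetical.

```python
import math
from statistics import NormalDist


def two_prop_n_per_group(p1, p2, alpha=0.05, power=0.80, attrition_rate=0.15):
    """Per-group N for two independent proportions via Cohen's h (two-sided)."""
    h = 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    # Var of 2*asin(sqrt(p_hat)) is about 1/n per group, so 2/n for the difference
    n = math.ceil(2 * ((z_a + z_b) / h) ** 2)
    return h, n, math.ceil(n / (1 - attrition_rate))


h, n, n_adj = two_prop_n_per_group(0.80, 0.65)
print(f"h = {h:.3f}, n per group = {n}, after attrition = {n_adj}")
```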
### Test 5: McNemar Test
When to use: Paired binary outcomes (e.g., two readers reading same cases, before/after on same patients).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| p01 | P(Method A negative, Method B positive) | -- |
| p10 | P(Method A positive, Method B negative) | -- |
| alpha | Significance level | 0.05 |
| power | Desired power | 0.80 |
| attrition_rate | Expected dropout rate | 0.10 |
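Effect size interpretation: Only discordant pairs contribute information. The required sample size is driven by the difference between the discordant proportions (p10 - p01) relative to their sum (p10 + p01). The more asymmetric the discordant pairs are (the further the discordant ratio p10/p01 is from 1), the fewer subjects are needed.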
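A minimal sketch of Connor's (1987) normal-approximation formula for the paired design, n = [z₁₋α/₂·√ψ + z₁₋β·√(ψ − δ²)]² / δ². Here ψ = p01 + p10 is the total discordant proportion and δ = p10 − p01. The function name and the example discordant proportions (0.05 and 0.15) are hypothetical.

```python
import math
from statistics import NormalDist


def mcnemar_n(p01, p10, alpha=0.05, power=0.80, attrition_rate=0.10):
    """Number of pairs for McNemar's test (Connor's approximation, two-sided)."""
    psi = p01 + p10    # total discordant proportion
    delta = p10 - p01  # difference in discordant proportions
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n = (z_a * math.sqrt(psi) + z_b * math.sqrt(psi - delta ** 2)) ** 2 / delta ** 2
    n = math.ceil(n)
    return n, math.ceil(n / (1 - attrition_rate))


print(mcnemar_n(0.05, 0.15))  # (pairs, pairs after attrition)
```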
### Test 6: Independent t-Test
When to use: Comparing means between two independent groups (e.g., lesion size in malignant vs. benign).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| mean_diff | Expected mean difference | -- |
| pooled_sd | Pooled standard deviation (from literature/pilot) | -- |
| alpha | Significance level | 0.05 |
| power | Desired power | 0.80 |
| attrition_rate | Expected dropout rate | 0.15 |
Effect size interpretation: Cohen's d = mean_diff / pooled_sd. Small = 0.20, medium = 0.50, large = 0.80. In clinical terms, d = 0.50 means the groups differ by half a standard deviation.
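A sketch of the normal approximation for two equal groups, n per group = 2(z₁₋α/₂ + z₁₋β)²/d². It adds the commonly used z²₁₋α/₂/4 small-sample correction so the result lands close to exact t-based software such as `pwr::pwr.t.test`. The function name and the example values (mean difference 5, pooled SD 10) are hypothetical.

```python
import math
from statistics import NormalDist


def ttest_n_per_group(mean_diff, pooled_sd, alpha=0.05, power=0.80,
                      attrition_rate=0.15):
    """Per-group N for a two-sided, two-sample t-test (normal approx + correction)."""
    d = mean_diff / pooled_sd  # Cohen's d
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n = math.ceil(2 * ((z_a + z_b) / d) ** 2 + z_a ** 2 / 4)
    return n, math.ceil(n / (1 - attrition_rate))


print(ttest_n_per_group(mean_diff=5, pooled_sd=10))  # d = 0.5
```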
### Test 7: Log-Rank Test (Schoenfeld)
When to use: Comparing survival or time-to-event between two groups (e.g., treatment vs. control, RFA vs. surgery).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| hr | Expected hazard ratio | -- |
| median_ctrl | Median survival in control arm (months) | -- |
| accrual_time | Accrual period (months) | 12 |
| follow_up | Follow-up after accrual (months) | 24 |
| drop_rate | Annual dropout rate | 0.05 |
| alpha | Significance level | 0.05 |
| power | Desired power | 0.80 |
Effect size interpretation: HR < 1 favors treatment. HR = 0.50 means treatment halves the hazard (strong effect); HR = 0.80 is a modest 20% reduction in hazard. The Schoenfeld formula calculates the required number of events. That number is then inflated for the expected probability of observing an event and for dropout to give the number of subjects.
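A sketch of this two-step logic under explicit assumptions:

- 1:1 allocation.
- Exponential survival in both arms, with control hazard ln 2 / median_ctrl and the treatment hazard equal to that multiplied by hr.
- Uniform accrual over accrual_time.
- Exponential dropout derived from the annual drop_rate and treated as a competing risk.

Required events come from Schoenfeld's formula, D = 4(z₁₋α/₂ + z₁₋β)²/(ln HR)². The total N is D divided by the average probability of observing an event. These modelling choices and the function name are illustrative, and `formulas.md` may handle dropout differently.

```python
import math
from statistics import NormalDist


def logrank_sample_size(hr, median_ctrl, accrual_time=12, follow_up=24,
                        drop_rate=0.05, alpha=0.05, power=0.80):
    """Schoenfeld events plus an exponential model for P(event); 1:1 allocation."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    events = math.ceil(4 * (z_a + z_b) ** 2 / math.log(hr) ** 2)

    lam_c = math.log(2) / median_ctrl     # monthly event hazard, control arm
    lam_t = lam_c * hr                    # monthly event hazard, treatment arm
    eta = -math.log(1 - drop_rate) / 12   # annual dropout -> monthly hazard

    def p_event(lam):
        # Entry uniform on [0, A], analysis at A + F, so follow-up is uniform on [F, A + F].
        # Averaged P(any exit by end of follow-up), times the share of exits that are events.
        tau = lam + eta
        a, f = accrual_time, follow_up
        p_exit = 1 - (math.exp(-tau * f) - math.exp(-tau * (a + f))) / (tau * a)
        return lam / tau * p_exit

    p_avg = (p_event(lam_c) + p_event(lam_t)) / 2
    return events, p_avg, math.ceil(events / p_avg)


events, p_avg, n_total = logrank_sample_size(hr=0.7, median_ctrl=24)
print(f"events = {events}, P(event) = {p_avg:.3f}, total N = {n_total}")
```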
### Test 8: One-Way ANOVA
When to use: Comparing means across 3 or more independent groups (e.g., comparing AI model performance across 3 architectures, comparing measurement accuracy across multiple readers).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| k | Number of groups | -- |
| f | Cohen's f effect size | -- |
| alpha | Significance level | 0.05 |
| power | Desired power | 0.80 |
| attrition_rate | Expected dropout rate | 0.15 |
Help user estimate Cohen's f:
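R function: pwr::pwr.anova.test(k, f, sig.level, power), which returns n per group.
Python equivalent: statsmodels.stats.power.FTestAnovaPower().solve_power(effect_size, nobs, alpha, power, k_groups). Here nobs is the total sample size across all groups, so divide by k to compare with the per-group n from pwr.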
Effect size interpretation: Cohen's f = 0.25 (medium) means the group means span about half a pooled SD. In clinical terms, this is typically a meaningful difference across treatment arms or measurement methods.
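To help the user estimate Cohen's f, a small sketch with hypothetical function names offers two routes. The first uses anticipated group means and a common within-group SD, f = SD of the group means / pooled SD, assuming equal group sizes and computing the SD of the means with divisor k. The second starts from an anticipated eta-squared, f = √(η²/(1 − η²)). The example means (10, 12, 14) and SD (8) are hypothetical.

```python
import math
from statistics import pstdev


def cohens_f_from_means(group_means, pooled_sd):
    """Cohen's f for equal-sized groups: SD of group means (divisor k) / within-group SD."""
    return pstdev(group_means) / pooled_sd


def cohens_f_from_eta_squared(eta_sq):
    """Convert an anticipated eta-squared (proportion of variance explained) to f."""
    return math.sqrt(eta_sq / (1 - eta_sq))


print(f"f from means: {cohens_f_from_means([10, 12, 14], pooled_sd=8):.3f}")
print(f"f from eta^2 = 0.06: {cohens_f_from_eta_squared(0.06):.3f}")
```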
### Test 9: Logistic Regression
When to use: Multivariable binary outcome models (e.g., predicting malignancy from multiple imaging features). Two approaches are provided.
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| n_predictors | Number of predictor variables | -- |
| event_rate | Expected event rate (proportion with outcome) | -- |
| or_interest | Odds ratio of interest (for Hsieh formula) | -- |
| r2_other | R-squared from regressing the predictor of interest on the other predictors | 0.0 |
| alpha | Significance level | 0.05 |
| power | Desired power | 0.80 |
| attrition_rate | Expected dropout rate | 0.10 |
Approach A: Peduzzi Rule of Thumb (EPV >= 10)
Approach B: Hsieh (1989) Formula
Always report both approaches and recommend the larger N.
Effect size interpretation: OR = 1.5 is a small-to-moderate effect; OR = 2.0 is moderate; OR = 3.0+ is large. The Peduzzi rule ensures model stability; the Hsieh formula targets power for the primary predictor.
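A sketch that reports both approaches and keeps the larger N, under stated assumptions:

- Approach A applies the Peduzzi EPV rule, taking the less frequent outcome class as the limiting count.
- Approach B uses a simplified large-sample approximation for a single continuous, approximately normal predictor, n = (z₁₋α/₂ + z₁₋β)² / [P(1 − P)(ln OR)²]. It is inflated by 1/(1 − r2_other), and or_interest is treated as the OR per 1-SD increase.

This simplified form is an assumption and may differ from the Hsieh (1989) formula in `formulas.md`. The function name and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def logistic_n(n_predictors, event_rate, or_interest, r2_other=0.0,
               alpha=0.05, power=0.80, attrition_rate=0.10, epv=10):
    # Approach A: EPV rule; the rarer outcome class limits the usable events
    limiting_rate = min(event_rate, 1 - event_rate)
    n_epv = math.ceil(epv * n_predictors / limiting_rate)

    # Approach B: simplified power formula for one continuous predictor
    # (assumption: or_interest is the odds ratio per 1-SD increase)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n_one = (z_a + z_b) ** 2 / (event_rate * (1 - event_rate) * math.log(or_interest) ** 2)
    n_power = math.ceil(n_one / (1 - r2_other))  # variance inflation for correlated predictors

    n_final = max(n_epv, n_power)  # recommend the larger N
    return n_epv, n_power, math.ceil(n_final / (1 - attrition_rate))


print(logistic_n(n_predictors=5, event_rate=0.25, or_interest=1.5))
```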
### Test 10: Non-Inferiority / Equivalence
When to use: Demonstrating that a new method is not worse than the standard by more than a pre-specified margin (non-inferiority) or that two methods are equivalent within a margin (equivalence / TOST).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| design | "non-inferiority" or "equivalence" | "non-inferiority" |
| outcome_type | "proportion" or "continuous" | -- |
| p_reference | Reference group proportion (if proportion) | -- |
| margin | Non-inferiority or equivalence margin (delta) | -- |
| sd | Standard deviation (if continuous) | -- |
| alpha | One-sided alpha for NI; alpha for each of the two one-sided tests for equivalence | 0.025 (NI) / 0.05 (equiv) |
| power | Desired power | 0.80 |
| attrition_rate | Expected dropout rate | 0.15 |
Key guidance for margin selection:
Non-inferiority (one-sided test):
Equivalence (TOST):
Effect size interpretation: The margin defines the largest clinically acceptable difference. A smaller margin requires a larger sample. Always justify the margin based on clinical reasoning and prior literature.
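A sketch for the common simplifying case where the true difference between arms is assumed to be zero. It uses per-group n = V(z₁₋α + z₁₋β)²/δ² for non-inferiority; for equivalence (TOST) it substitutes z₁₋β/₂ for z₁₋β. V = 2p(1 − p) for proportions or 2σ² for continuous outcomes. The zero-true-difference assumption, the function name, and the example inputs are illustrative. If a non-zero true difference is expected, the formulas in `formulas.md` should be used instead.

```python
import math
from statistics import NormalDist


def ni_equiv_n(margin, design="non-inferiority", outcome_type="proportion",
               p_reference=None, sd=None, alpha=None, power=0.80,
               attrition_rate=0.15):
    """Per-group N assuming the true difference between arms is zero."""
    if alpha is None:  # table defaults: 0.025 one-sided (NI), 0.05 per TOST test
        alpha = 0.025 if design == "non-inferiority" else 0.05
    z_a = NormalDist().inv_cdf(1 - alpha)
    if design == "non-inferiority":
        z_b = NormalDist().inv_cdf(power)
    elif design == "equivalence":
        # With a zero true difference, beta is split between the two bounds
        z_b = NormalDist().inv_cdf(1 - (1 - power) / 2)
    else:
        raise ValueError("design must be 'non-inferiority' or 'equivalence'")

    if outcome_type == "proportion":
        variance = 2 * p_reference * (1 - p_reference)  # same p assumed in both arms
    elif outcome_type == "continuous":
        variance = 2 * sd ** 2
    else:
        raise ValueError("outcome_type must be 'proportion' or 'continuous'")

    n = math.ceil(variance * (z_a + z_b) ** 2 / margin ** 2)
    return n, math.ceil(n / (1 - attrition_rate))


print(ni_equiv_n(margin=0.10, p_reference=0.85))
print(ni_equiv_n(margin=5, design="equivalence", outcome_type="continuous", sd=10))
```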
### Test 11: Cox Regression (EPV)
When to use: Multivariable Cox proportional hazards models, to ensure enough events for stable model estimates. Same EPV logic as logistic regression (Test 9), applied to time-to-event outcomes.
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| n_predictors | Number of predictor variables in Cox model | -- |
| event_rate | Expected proportion of subjects experiencing the event | -- |
| epv | Events per variable target | 10 |
| attrition_rate | Expected dropout rate | 0.10 |
Formula:
N_events = EPV × n_predictors
N_total = N_events / event_rate
N_adj = N_total / (1 - attrition_rate)
EPV guidelines:
Effect size interpretation: The EPV rule ensures model stability, not power for a specific HR. If the user also needs power for detecting a specific HR, combine with Test 7 (log-rank/Schoenfeld) and report the larger N.
Always report both approaches (EPV minimum + Schoenfeld power, if HR is available) and recommend the larger N.
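A sketch of the EPV arithmetic above, reported alongside a Schoenfeld events-based N so the larger value can be recommended. For simplicity it assumes 1:1 allocation for the Schoenfeld step. It also converts required events to subjects using the same overall event_rate, ignoring accrual and follow-up timing. The function names and example inputs (8 predictors, event rate 0.25, HR 0.6) are hypothetical.

```python
import math
from statistics import NormalDist


def cox_epv_n(n_predictors, event_rate, epv=10, attrition_rate=0.10):
    n_events = epv * n_predictors                      # N_events = EPV x n_predictors
    n_total = math.ceil(n_events / event_rate)         # N_total = N_events / event_rate
    n_adj = math.ceil(n_total / (1 - attrition_rate))  # N_adj = N_total / (1 - attrition)
    return n_events, n_total, n_adj


def schoenfeld_events(hr, alpha=0.05, power=0.80):
    """Events needed to detect hr with 1:1 allocation (two-sided alpha)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_a + z_b) ** 2 / math.log(hr) ** 2)


events_epv, n_epv, n_epv_adj = cox_epv_n(n_predictors=8, event_rate=0.25)
events_power = schoenfeld_events(hr=0.6)
n_power_adj = math.ceil(math.ceil(events_power / 0.25) / (1 - 0.10))
print(f"EPV: {events_epv} events, N_adj = {n_epv_adj}")
print(f"Schoenfeld: {events_power} events, N_adj = {n_power_adj}")
print(f"Recommended N = {max(n_epv_adj, n_power_adj)}")
```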
The 11 tests listed above cover most sample size calculations needed in medical imaging research, diagnostic accuracy studies, and clinical trials.
The following designs require specialized software or biostatistician consultation:
If the user requests any of these, respond:
"This design requires specialized tools beyond this skill's scope. Consider using G*Power software (free, https://www.psychologie.hhu.de/gpower), PASS software, or consulting a biostatistician for [specific design]."
For retrospective studies, formal power analysis is often impractical because the dataset already exists. In these cases, an experience-based justification is acceptable to many IRBs and journals. Offer this path when the user describes a retrospective design.
Two approaches:
Approach 1 (institutional volume): Estimate N from the number of examinations performed at the institution during the study period.
Total exams in period × prevalence of target condition × (1 - exclusion rate) = Expected N
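The same arithmetic as a small helper, for filling in the IRB justification template. The function name and example figures are hypothetical, and the result is rounded to the nearest whole patient.

```python
def expected_retrospective_n(exams_per_year, years, prevalence, exclusion_rate):
    """Total exams in period x prevalence x (1 - exclusion rate)."""
    total_exams = exams_per_year * years
    eligible = total_exams * prevalence
    return round(eligible * (1 - exclusion_rate))


# e.g., 2,000 exams/year over 3 years, 5% prevalence, 20% excluded
print(expected_retrospective_n(2000, 3, 0.05, 0.20))
```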
IRB justification template:
Based on approximately [X] [modality] examinations performed annually at [institution], and an estimated prevalence of [condition] of [Y]%, we anticipate identifying approximately [N] eligible patients over the [Z]-year study period. After accounting for an estimated [W]% exclusion rate (due to [reasons]), we expect a final sample of approximately [N_adj] patients for analysis.
Approach 2 (prior literature): Use sample sizes from published studies with similar designs as justification.
IRB justification template:
Previous studies evaluating [similar topic] with [similar design] enrolled [N1] (Author1 et al., Year), [N2] (Author2 et al., Year), and [N3] (Author3 et al., Year) patients. Our anticipated sample of [N] patients is [comparable to / larger than] these prior studies.
Even for retrospective studies, a formal sample size calculation is preferred when:
In these cases, proceed to Phase 3 with the appropriate test from the decision tree.
Use `${CLAUDE_SKILL_DIR}/references/formulas.md` for the exact formula.
If the user is uncertain about parameters, offer a sensitivity table showing N across a range of plausible values (e.g., varying the effect size, or varying power from 0.80 to 0.90).
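A sketch of such a sensitivity table for the two-group t-test case. It uses the normal approximation with the z²/4 small-sample correction, which is an assumption: substitute the formula for whichever test was selected. The effect sizes and power levels are illustrative, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Two-sample t-test N per group (normal approximation + z^2/4 correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) / d) ** 2 + z_a ** 2 / 4)


def sensitivity_table(effect_sizes, powers, alpha=0.05):
    """One row per effect size: [d, N at powers[0], N at powers[1], ...]."""
    return [[d] + [n_per_group(d, alpha, p) for p in powers] for d in effect_sizes]


powers = [0.80, 0.85, 0.90]
table = sensitivity_table([0.3, 0.4, 0.5, 0.6], powers)
print("d     " + "  ".join(f"power={p:.2f}" for p in powers))
for d, *ns in table:
    print(f"{d:<5} " + "  ".join(f"{n:>10}" for n in ns))
```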
Always structure the final output as follows:
## Sample Size Calculation Report
### Study Design
[1-2 sentence summary of the design and test selected]
### Parameters
| Parameter | Value | Source |
|-----------|-------|--------|
| ... | ... | user / literature / convention |
### Result
- **Required sample size**: N = [value]
- **With [X]% attrition adjustment**: N_adj = [value]
### R Code (Reproducible)
```r
# [complete, self-contained R script]
# Dependencies: [list packages]
# Run: Rscript sample_size_calc.R
```
### Python Code (Alternative)
```python
# [complete, self-contained Python script]
# Dependencies: [list packages]
# Run: python sample_size_calc.py
```
### IRB Justification Text
A sample of [N] participants is required to detect [effect description] with [power]% power at a [one/two]-sided significance level of [alpha], assuming [key assumptions]. Accounting for an estimated [X]% attrition rate, we plan to enroll [N_adj] participants. This calculation is based on [formula/method reference].
### Effect Size Interpretation
[Cohen's benchmark classification + clinical meaning in the context of this study]
---
## IRB Justification Text Guidelines
The IRB text must:
1. State the required N clearly.
2. Name the statistical test and its formula source.
3. Specify all assumed parameters (effect size, alpha, power).
4. State the attrition adjustment and final enrollment target.
5. Cite the methodological reference (e.g., "Schoenfeld, 1981" for survival).
6. Use formal, third-person language suitable for an ethics board.
---
## Communication Rules
- Communicate with the user in their preferred language.
- Use English for all statistical terminology, effect size names, and test names.
- Be explicit about assumptions and their impact on the result.
- When the user provides vague effect size estimates, flag the uncertainty and suggest a sensitivity analysis.
- Never fabricate references. Cite only verified methodological sources from `formulas.md`.
## Anti-Hallucination
- **Never fabricate file paths, URLs, DOIs, or package names.** Verify existence before recommending.
- **Never invent journal metadata, impact factors, or submission policies** without verification at the journal's website.
- If a tool, package, or resource does not exist or you are unsure, say so explicitly rather than guessing.