Guides statistical test selection, assumption checks, effect sizes, power analysis, and APA reporting for frequentist (t-test, ANOVA, chi-square, regression) and Bayesian methods using statsmodels or pymc-bayesian-modeling.
Statistical analysis is the systematic process of selecting appropriate tests, verifying assumptions, quantifying effect magnitudes, and reporting results. This knowhow guides test selection, assumption diagnostics, and APA-style reporting for frequentist and Bayesian analyses in academic research.
| Aspect | Frequentist | Bayesian |
|---|---|---|
| Core output | p-value, confidence interval | Posterior distribution, credible interval |
| Interpretation | "How likely is this data if H0 is true?" | "How likely is H1 given the data?" |
| Null support | Cannot support H0 (only fail to reject) | Can quantify evidence for H0 via Bayes Factor |
| Prior info | Not used | Incorporated via prior distributions |
| Sample size | Requires adequate power | Interpretable at any n; informative priors help stabilize small-sample estimates |
| Best for | Standard analyses, large samples | Small samples, prior info, complex models |
A statistically significant result (p < .05) may correspond to an effect that is trivially small in practice. Always report an effect size alongside the p-value:
| Test | Effect Size | Small | Medium | Large |
|---|---|---|---|---|
| t-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| t-test (small n) | Hedges' g | 0.20 | 0.50 | 0.80 |
| ANOVA | eta-squared partial | 0.01 | 0.06 | 0.14 |
| ANOVA | omega-squared | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R-squared | 0.02 | 0.13 | 0.26 |
| Regression | f-squared | 0.02 | 0.15 | 0.35 |
| Chi-square | Cramer's V | 0.07 | 0.21 | 0.35 |
| Chi-square 2x2 | phi coefficient | 0.10 | 0.30 | 0.50 |
Cohen's benchmarks are guidelines, not rigid thresholds -- domain context always matters.
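As a minimal sketch of the first two rows of the table, the following computes Cohen's d (pooled SD) and Hedges' g (small-sample bias correction) with NumPy on simulated groups; the group data are illustrative, not from the skill:

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled SD."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1)
                  + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

def hedges_g(x, y):
    """Hedges' g: Cohen's d times the small-sample correction factor."""
    df = len(x) + len(y) - 2
    return cohens_d(x, y) * (1 - 3 / (4 * df - 1))

rng = np.random.default_rng(0)
a = rng.normal(0.5, 1.0, 30)  # simulated treatment group
b = rng.normal(0.0, 1.0, 30)  # simulated control group
print(cohens_d(a, b), hedges_g(a, b))
```

Note that |g| is always slightly smaller than |d|, which is why g is preferred at small n.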
Most parametric tests require: (1) independence of observations, (2) normality of the outcome (or of residuals), and (3) homogeneity of variance across groups.
When assumptions are violated: use robust corrections (e.g., Welch's t-test), non-parametric alternatives, or data transformations.
T-test assumptions: (1) Check normality per group with Shapiro-Wilk + Q-Q plots. (2) Check homogeneity with Levene's test. (3) If normality violated: Mann-Whitney U (independent) or Wilcoxon signed-rank (paired). If variance heterogeneity: use Welch's t-test.
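The three-step t-test workflow above can be sketched with scipy alone (data simulated for illustration; real analyses should also inspect Q-Q plots):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(5.0, 1.0, 40)  # simulated group 1
g2 = rng.normal(5.5, 2.0, 40)  # simulated group 2 (wider spread)

# Step 1: normality per group (Shapiro-Wilk)
norm_ok = all(stats.shapiro(g).pvalue > .05 for g in (g1, g2))

# Step 2: homogeneity of variance (Levene's test)
var_ok = stats.levene(g1, g2).pvalue > .05

# Step 3: choose the test accordingly
if not norm_ok:
    res = stats.mannwhitneyu(g1, g2)                 # non-parametric fallback
elif not var_ok:
    res = stats.ttest_ind(g1, g2, equal_var=False)   # Welch's t-test
else:
    res = stats.ttest_ind(g1, g2)                    # Student's t-test
print(res)
```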
ANOVA assumptions: (1) Normality per group. (2) Homogeneity via Levene's test. (3) For repeated measures: check sphericity (Mauchly's test); if violated, apply Greenhouse-Geisser (epsilon < 0.75) or Huynh-Feldt (epsilon > 0.75) correction. (4) If normality violated: Kruskal-Wallis (independent) or Friedman (repeated).
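For the independent-groups branch, a minimal scipy sketch (simulated data) runs one-way ANOVA with its Kruskal-Wallis fallback and derives partial eta-squared from the F statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
groups = [rng.normal(mu, 1.0, 25) for mu in (0.0, 0.3, 0.8)]  # simulated groups

# Parametric: one-way ANOVA (assumes normality + equal variances)
f_res = stats.f_oneway(*groups)

# Non-parametric fallback if normality fails: Kruskal-Wallis
kw_res = stats.kruskal(*groups)

# Eta-squared from F: eta^2 = F*df_b / (F*df_b + df_w)
df_b = len(groups) - 1
df_w = sum(len(g) for g in groups) - len(groups)
eta_sq = (f_res.statistic * df_b) / (f_res.statistic * df_b + df_w)
print(f_res.pvalue, kw_res.pvalue, eta_sq)
```

Sphericity checks for repeated measures (Mauchly's test, Greenhouse-Geisser correction) are not in scipy; pingouin provides them.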
Linear regression assumptions: (1) Linearity via residuals-vs-fitted plot. (2) Independence via Durbin-Watson test (1.5-2.5 acceptable). (3) Homoscedasticity via Breusch-Pagan test + scale-location plot. (4) Normality of residuals via Q-Q plot + Shapiro-Wilk. (5) Multicollinearity via VIF (>10 = severe, >5 = moderate).
Logistic regression assumptions: (1) Independence. (2) Linearity of log-odds with continuous predictors (Box-Tidwell test). (3) No perfect multicollinearity (VIF). (4) Adequate sample size (10-20 events per predictor minimum).
Beyond the main decision flowchart, several specialized test families address specific data types:
Survival / time-to-event analysis: log-rank test to compare survival curves; Cox proportional hazards regression when covariates are involved.
Count outcome models: Poisson regression for equidispersed counts; negative binomial for overdispersion; zero-inflated Poisson/NB for excess zeros.
Agreement and reliability: Cohen's kappa (2 raters) and Fleiss' kappa (>2 raters) for categorical ratings; ICC for continuous ratings; Bland-Altman analysis for comparing two measurement methods; Cronbach's alpha for internal consistency.
Categorical data extensions: Fisher's exact test for sparse tables; McNemar's test for paired categories; Cochran-Armitage trend test for ordered categories.
What is your research question?
|
+-- Comparing GROUPS on a continuous outcome?
| |
| +-- How many groups?
| | +-- 2 groups
| | | +-- Independent -> Independent t-test (or Mann-Whitney U)
| | | +-- Paired/repeated -> Paired t-test (or Wilcoxon signed-rank)
| | +-- 3+ groups
| | +-- Independent -> One-way ANOVA (or Kruskal-Wallis)
| | +-- Repeated -> Repeated-measures ANOVA (or Friedman)
| |
| +-- Multiple factors? -> Factorial ANOVA / Mixed ANOVA
| +-- With covariates? -> ANCOVA
|
+-- Testing a RELATIONSHIP between variables?
| |
| +-- Both continuous?
| | +-- Normal -> Pearson correlation
| | +-- Non-normal or ordinal -> Spearman correlation
| |
| +-- Predicting continuous outcome?
| | +-- 1 predictor -> Simple linear regression
| | +-- Multiple predictors -> Multiple linear regression
| |
| +-- Predicting categorical outcome?
| | +-- Binary -> Logistic regression
| | +-- Ordinal -> Ordinal logistic regression
| |
| +-- Predicting count outcome?
| | +-- Equidispersed -> Poisson regression
| | +-- Overdispersed -> Negative binomial regression
| | +-- Excess zeros -> Zero-inflated Poisson/NB
| |
| +-- Time-to-event outcome?
| +-- Compare survival curves -> Log-rank test
| +-- With covariates -> Cox proportional hazards
|
+-- Testing ASSOCIATION between categorical variables?
| +-- Expected cell count >= 5 -> Chi-square test
| +-- Expected cell count < 5 -> Fisher's exact test
| +-- Ordered categories -> Cochran-Armitage trend test
| +-- Paired categories -> McNemar's test
|
+-- Assessing AGREEMENT / RELIABILITY?
+-- Categorical, 2 raters -> Cohen's kappa
+-- Categorical, >2 raters -> Fleiss' kappa
+-- Continuous ratings -> ICC
+-- Two measurement methods -> Bland-Altman analysis
+-- Internal consistency -> Cronbach's alpha
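The two-group branch of the flowchart can be encoded as a small helper; this is a simplified illustration (hypothetical function name), not an exhaustive selector:

```python
def select_two_group_test(paired: bool, normal: bool, equal_var: bool = True) -> str:
    """Pick a two-group test following the flowchart's two-group branch."""
    if paired:
        return "Paired t-test" if normal else "Wilcoxon signed-rank"
    if not normal:
        return "Mann-Whitney U"
    return "Independent t-test" if equal_var else "Welch's t-test"

print(select_two_group_test(paired=False, normal=False))  # Mann-Whitney U
```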
| Research Question | Data Type | Normal? | Test | Non-parametric Alternative |
|---|---|---|---|---|
| 2 independent groups | Continuous | Yes | Independent t-test | Mann-Whitney U |
| 2 paired groups | Continuous | Yes | Paired t-test | Wilcoxon signed-rank |
| 3+ independent groups | Continuous | Yes | One-way ANOVA | Kruskal-Wallis |
| 3+ repeated groups | Continuous | Yes | Repeated-measures ANOVA | Friedman test |
| 2 variables | Continuous | Yes | Pearson r | Spearman rho |
| Predict continuous | Mixed | -- | Linear regression | -- |
| Predict binary | Mixed | -- | Logistic regression | -- |
| Predict counts | Count | -- | Poisson / Negative binomial | -- |
| Time-to-event | Survival | -- | Cox PH / Log-rank | -- |
| 2 categorical | Categorical | -- | Chi-square / Fisher's exact | -- |
| Rater agreement | Categorical | -- | Cohen's kappa / Fleiss' kappa | -- |
| Method agreement | Continuous | -- | Bland-Altman / ICC | -- |
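The expected-count rule for the categorical row can be applied programmatically, since scipy returns the expected counts; the contingency table below is made up to trigger the sparse-table branch:

```python
import numpy as np
from scipy import stats

table = np.array([[8, 2],
                  [1, 5]])  # small, illustrative 2x2 contingency table

# Rule from the table above: chi-square only if all expected counts >= 5
chi2, p_chi, dof, expected = stats.chi2_contingency(table)
if (expected < 5).any():
    odds_ratio, p = stats.fisher_exact(table)  # exact test for sparse tables
    chosen = "Fisher's exact"
else:
    p = p_chi
    chosen = "Chi-square"
print(chosen, p)
```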
Misinterpreting p-values as probability of the hypothesis being true. p-values measure P(data | H0), not P(H0 | data). How to avoid: Use precise language: "If the null hypothesis were true, the probability of observing data this extreme is p = ..."
Confusing statistical significance with practical importance. A large sample can make trivially small effects significant. How to avoid: Always report and interpret effect sizes alongside p-values
Running post-hoc power analysis after a non-significant result. Post-hoc power is a mathematical function of the p-value and adds no new information. How to avoid: Use sensitivity analysis instead -- determine what effect size the study could detect at 80% power
Ignoring assumption violations and proceeding with parametric tests. How to avoid: Run assumption checks systematically. Use Welch's corrections, non-parametric alternatives, or transformations when violated
Multiple comparisons without correction. Running 20 tests at alpha = .05 gives ~64% chance of at least one false positive. How to avoid: Apply Bonferroni, Holm, or FDR correction. Report both corrected and uncorrected p-values
Treating ordinal data as continuous. Likert scales are ordinal -- means and standard deviations assume equal intervals. How to avoid: Use non-parametric tests (Mann-Whitney, Kruskal-Wallis) or ordinal regression
Ignoring missing data patterns. Listwise deletion assumes MCAR, which is rarely true. How to avoid: Assess missingness mechanism (MCAR, MAR, MNAR). Use multiple imputation for MAR data
Confusing correlation with causation. Observational studies cannot establish causal relationships regardless of effect size. How to avoid: Use causal language only for experimental designs with random assignment
Not reporting non-significant results. Publication bias and file-drawer effect distort the literature. How to avoid: Report all pre-registered analyses. Consider registered reports
Using one-tailed tests to "improve" significance. One-tailed tests should be pre-specified based on strong directional hypotheses. How to avoid: Default to two-tailed. Only use one-tailed when justified a priori
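Two of the pitfalls above are directly checkable in code: the family-wise error arithmetic for uncorrected tests, and the corrections via statsmodels (p-values simulated under the null for illustration):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Family-wise error rate for 20 uncorrected tests at alpha = .05:
fwer = 1 - 0.95 ** 20
print(f"{fwer:.2f}")  # 0.64

rng = np.random.default_rng(5)
pvals = rng.uniform(size=20)  # 20 simulated p-values from true-null tests

# Bonferroni, Holm, and Benjamini-Hochberg FDR corrections
for method in ("bonferroni", "holm", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, int(reject.sum()), "rejections")
```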
1. Define research question and hypotheses
2. Select statistical test (use Decision Framework above)
3. Conduct a priori power analysis (statsmodels.stats.power, pingouin)
4. Inspect and clean data; the assumption_checks.py script provides automated normality, homogeneity, and outlier detection with visualization
5. Check assumptions (see Test-Specific Assumption Workflows above)
6. Run primary analysis (for Bayesian analyses, see references/bayesian_statistics.md)
7. Conduct post-hoc and secondary analyses
8. Report results in APA format (see references/reporting_standards.md for templates)
references/effect_sizes_and_power.md -- Detailed guide to calculating, interpreting, and reporting effect sizes (Cohen's d, Hedges' g, Glass's delta, eta-squared, omega-squared, partial eta-squared, phi coefficient, standardized beta, f-squared, Cramer's V, odds ratio); a priori, sensitivity, and correlation power analysis with code examples. Condensed from 582-line original.
references/bayesian_statistics.md -- Comprehensive Bayesian analysis guide: Bayes' theorem, prior specification, ROPE (Region of Practical Equivalence), prior sensitivity analysis, Bayesian t-test/ANOVA/correlation/regression, hierarchical models, model comparison (WAIC/LOO), convergence diagnostics. Condensed from 662-line original.
references/reporting_standards.md -- APA-style reporting templates for t-tests, ANOVA, regression, correlation, chi-square, non-parametric, and Bayesian analyses; pre-registration guidance; methods section templates (participants, design, measures); null results reporting; reporting checklist. Condensed from 470-line original.
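The a priori power analysis in Workflow Step 3 can be sketched with statsmodels; the effect size, alpha, and power target below are the conventional illustrative defaults, not values prescribed by this skill:

```python
from statsmodels.stats.power import TTestIndPower

# A priori power analysis for an independent-samples t-test:
# expected d = 0.5 (medium), alpha = .05 two-tailed, target power = .80
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))  # 64 participants per group
```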
test_selection_guide.md (130 lines original) -- Fully consolidated into Decision Framework (flowchart + Quick Reference Table) and Specialized Test Categories subsection in Key Concepts. Combined coverage: flowchart (~35 lines) + Quick Reference Table (~15 lines) + Specialized Test Categories (~35 lines) = ~85 lines covering all original capabilities. Original content on sample size considerations, multiple comparisons, and missing data was consolidated into Best Practices and Common Pitfalls. Omitted: study design considerations (RCTs, observational, clustered data) -- general guidance covered by statsmodels-statistical-modeling skill.
assumptions_and_diagnostics.md (370 lines original) -- Fully consolidated into Key Concepts (Assumptions Overview + Test-Specific Assumption Workflows) and Workflow Steps 4-5. Combined coverage: Assumptions Overview (~12 lines) + Test-Specific Assumption Workflows (~20 lines) + Workflow Steps 4-5 (~16 lines) = ~48 lines. The original contained detailed code blocks for each assumption check; since this is Knowhow (not Skill), code is referenced rather than reproduced. Key diagnostic thresholds preserved (VIF > 10, Durbin-Watson 1.5-2.5, variance ratio < 2-3). Omitted: extensive Python code blocks for individual checks (normality, homogeneity, linearity, logistic regression diagnostics) -- available in scipy.stats and pingouin documentation. Sample size rules of thumb covered in Workflow Step 3.
assumption_checks.py (540 lines) -- Contains 6 functions: check_normality(), check_normality_per_group(), check_homogeneity_of_variance(), check_linearity(), detect_outliers(), comprehensive_assumption_check(). As Knowhow entry, script functions are referenced in Workflow Step 4 rather than reproduced inline. Key capabilities (Shapiro-Wilk, Levene's, IQR/z-score outlier detection, Q-Q plots) are described in Assumptions Overview and Test-Specific Assumption Workflows. Users needing automated checking should use scipy.stats and pingouin directly following the patterns described.