Guides statistical analysis with test selection, assumption checking, power analysis, and APA reporting. Use with /ds:experiment for methodology design, validation, and results.
npx claudepluginhub andikarachman/data-science-plugin --plugin ds

This skill uses the workspace's default tool permissions.
Statistical analysis is a systematic process for testing hypotheses and quantifying relationships. Conduct hypothesis tests (t-test, ANOVA, chi-square), regression, correlation, and Bayesian analyses with assumption checks and APA reporting.
Role in the ds plugin: This skill is invoked by /ds:experiment at step 3 (Methodology Design) for test selection and power analysis, and at step 7 (Generate Results) for assumption checking and APA-formatted reporting. For concrete statsmodels API patterns (model fitting, result object methods, diagnostics, convergence troubleshooting), see the statsmodels skill. This skill focuses on the statistical workflow: selecting the right test, verifying assumptions, interpreting results, and reporting in APA format.
This skill should be used when selecting statistical tests, checking assumptions, conducting power analysis, running hypothesis tests, or reporting results in APA format.
Use this decision tree to determine your analysis path:
START
|
+-- Need to SELECT a statistical test?
| +-- YES -> See "Test Selection Guide"
| +-- NO -> Continue
|
+-- Ready to check ASSUMPTIONS?
| +-- YES -> See "Assumption Checking"
| +-- NO -> Continue
|
+-- Ready to run ANALYSIS?
| +-- YES -> See "Running Statistical Tests"
| +-- NO -> Continue
|
+-- Need to REPORT results?
+-- YES -> See "Reporting Results"
Use references/test_selection_guide.md for comprehensive guidance. Quick reference:
Comparing Two Groups: independent-samples t-test (or paired t-test for repeated measures); use a non-parametric alternative such as the Mann-Whitney U test when assumptions are violated.
Comparing 3+ Groups: one-way ANOVA with post hoc tests (e.g., Tukey's HSD); Kruskal-Wallis as the non-parametric alternative.
Relationships: Pearson correlation for linear relationships between continuous variables, linear regression to predict an outcome from one or more predictors, or a chi-square test of independence for two categorical variables.
Bayesian Alternatives: All tests have Bayesian versions that provide direct posterior probabilities and credible intervals; see references/bayesian_statistics.md.
ALWAYS check assumptions before interpreting test results.
Use the provided scripts/assumption_checks.py module for automated checking:
from scripts.assumption_checks import comprehensive_assumption_check
# Comprehensive check with visualizations
results = comprehensive_assumption_check(
    data=df,
    value_col='score',
    group_col='group',  # Optional: for group comparisons
    alpha=0.05
)
This performs normality testing, homogeneity-of-variance testing, linearity checks, and outlier detection, with accompanying visualizations.
For targeted checks, use individual functions:
from scripts.assumption_checks import (
    check_normality,
    check_normality_per_group,
    check_homogeneity_of_variance,
    check_linearity,
    detect_outliers
)
# Example: Check normality with visualization
result = check_normality(
    data=df['score'],
    name='Test Score',
    alpha=0.05,
    plot=True
)
print(result['interpretation'])
print(result['recommendation'])
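If the scripts module is unavailable, equivalent targeted checks can be run directly with scipy. A minimal sketch on synthetic data (the group values here are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=75, scale=8, size=50)
group_b = rng.normal(loc=68, scale=9, size=50)

# Shapiro-Wilk normality test per group (H0: data are normally distributed)
w_a, p_a = stats.shapiro(group_a)
w_b, p_b = stats.shapiro(group_b)
print(f"Group A: W = {w_a:.2f}, p = {p_a:.3f}")
print(f"Group B: W = {w_b:.2f}, p = {p_b:.3f}")

# Levene's test for homogeneity of variance (H0: equal variances)
f_stat, p_lev = stats.levene(group_a, group_b)
print(f"Levene's test: F = {f_stat:.2f}, p = {p_lev:.3f}")
```

A non-significant p-value on each check (relative to alpha) is consistent with the assumption holding.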
Normality violated: use a non-parametric alternative (e.g., Mann-Whitney U instead of the independent t-test) or apply a normalizing transformation (e.g., log) before retesting.
Homogeneity of variance violated: use Welch's correction (pingouin's ttest applies it automatically with correction='auto'), or Welch's ANOVA for 3+ groups.
Linearity violated (regression): transform variables or add polynomial terms to capture the curvature.
See references/assumptions_and_diagnostics.md for comprehensive guidance.
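When the checks flag violations, the fallback tests can be run directly with scipy (a sketch on deliberately skewed synthetic data; pingouin offers equivalents such as pg.mwu):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.exponential(scale=2.0, size=40)  # skewed: normality violated
group_b = rng.exponential(scale=3.0, size=40)

# Normality violated -> Mann-Whitney U instead of the independent t-test
u_stat, p_mwu = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_mwu:.3f}")

# Homogeneity violated -> Welch's t-test (no equal-variance assumption)
t_stat, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch's t = {t_stat:.2f}, p = {p_welch:.3f}")
```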
Primary libraries for statistical analysis: pingouin (hypothesis tests and effect sizes), statsmodels (regression and power analysis), and pymc with arviz (Bayesian modeling).
import pingouin as pg
import numpy as np
# Run independent t-test
result = pg.ttest(group_a, group_b, correction='auto')
# Extract results
t_stat = result['T'].values[0]
df = result['dof'].values[0]
p_value = result['p-val'].values[0]
cohens_d = result['cohen-d'].values[0]
ci_lower = result['CI95%'].values[0][0]
ci_upper = result['CI95%'].values[0][1]
# Report
print(f"t({df:.0f}) = {t_stat:.2f}, p = {p_value:.3f}")
print(f"Cohen's d = {cohens_d:.2f}, 95% CI [{ci_lower:.2f}, {ci_upper:.2f}]")
import pingouin as pg
# One-way ANOVA
aov = pg.anova(dv='score', between='group', data=df, detailed=True)
print(aov)
# If significant, conduct post-hoc tests
if aov['p-unc'].values[0] < 0.05:
    posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
    print(posthoc)
# Effect size
eta_squared = aov['np2'].values[0] # Partial eta-squared
print(f"Partial eta-squared = {eta_squared:.3f}")
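Chi-square tests of independence, the remaining test from the selection guide, are not shown above; a sketch using scipy, with Cramer's V computed from the chi-square statistic (the contingency table here is illustrative):

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 contingency table: rows = condition, columns = outcome
table = np.array([[30, 10],
                  [10, 30]])

chi2, p_value, dof, expected = chi2_contingency(table)

# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

print(f"chi2({dof}) = {chi2:.2f}, p = {p_value:.4f}, Cramer's V = {cramers_v:.2f}")
```

Note that scipy applies Yates' continuity correction to 2x2 tables by default (`correction=True`).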
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Fit model
X = sm.add_constant(X_predictors) # Add intercept
model = sm.OLS(y, X).fit()
# Summary
print(model.summary())
# Check multicollinearity (VIF)
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif_data)
import pymc as pm
import arviz as az
import numpy as np
with pm.Model() as model:
    # Priors
    mu1 = pm.Normal('mu_group1', mu=0, sigma=10)
    mu2 = pm.Normal('mu_group2', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=10)
    # Likelihood
    y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group_a)
    y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group_b)
    # Derived quantity
    diff = pm.Deterministic('difference', mu1 - mu2)
    # Sample
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)
# Summarize
print(az.summary(trace, var_names=['difference']))
# Probability that group1 > group2
prob_greater = np.mean(trace.posterior['difference'].values > 0)
print(f"P(mu1 > mu2 | data) = {prob_greater:.3f}")
Effect sizes quantify the magnitude of an effect; p-values only indicate whether an effect is statistically detectable.
See references/effect_sizes_and_power.md for comprehensive guidance.
| Test | Effect Size | Small | Medium | Large |
|---|---|---|---|---|
| T-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| ANOVA | eta-squared | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R-squared | 0.02 | 0.13 | 0.26 |
| Chi-square | Cramer's V | 0.07 | 0.21 | 0.35 |
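The table's thresholds can be applied programmatically. A sketch computing Cohen's d from its pooled-standard-deviation formula in plain numpy, then labeling it against the conventional cutoffs above (function names are illustrative):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_sd = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                        / (n1 + n2 - 2))
    return (a.mean() - b.mean()) / pooled_sd

def classify_d(d):
    """Label |d| against the small/medium/large cutoffs from the table."""
    d = abs(d)
    if d < 0.20:
        return "negligible"
    if d < 0.50:
        return "small"
    if d < 0.80:
        return "medium"
    return "large"

d = cohens_d([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"d = {d:.2f} ({classify_d(d)})")
```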
Most effect sizes are automatically calculated by pingouin:
# T-test returns Cohen's d
result = pg.ttest(x, y)
d = result['cohen-d'].values[0]
# ANOVA returns partial eta-squared
aov = pg.anova(dv='score', between='group', data=df)
eta_p2 = aov['np2'].values[0]
# Correlation: r is already an effect size
corr = pg.corr(x, y)
r = corr['r'].values[0]
Determine required sample size before data collection:
from statsmodels.stats.power import (
    tt_ind_solve_power,
    FTestAnovaPower
)
# T-test: What n is needed to detect d = 0.5?
n_required = tt_ind_solve_power(
    effect_size=0.5,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Required n per group: {n_required:.0f}")
# ANOVA: What total n is needed to detect f = 0.25?
anova_power = FTestAnovaPower()
n_total = anova_power.solve_power(
    effect_size=0.25,
    k_groups=3,  # Note: the parameter is k_groups, and nobs is the TOTAL sample size
    alpha=0.05,
    power=0.80
)
print(f"Required total n: {n_total:.0f} ({n_total / 3:.0f} per group)")
Determine what effect size you could detect:
# With n=50 per group, what effect could we detect?
detectable_d = tt_ind_solve_power(
    effect_size=None,  # Solve for this
    nobs1=50,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Study could detect d >= {detectable_d:.2f}")
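The same solver also reports achieved power for a fixed design by leaving power=None (a sketch; the n and d here are illustrative):

```python
from statsmodels.stats.power import tt_ind_solve_power

# With n = 64 per group and d = 0.5, what power does the design achieve?
achieved_power = tt_ind_solve_power(
    effect_size=0.5,
    nobs1=64,
    alpha=0.05,
    power=None,  # Solve for this
    ratio=1.0,
    alternative='two-sided'
)
print(f"Achieved power: {achieved_power:.2f}")
```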
See references/effect_sizes_and_power.md for detailed guidance.
Follow guidelines in references/reporting_standards.md.
Group A (n = 48, M = 75.2, SD = 8.5) scored significantly higher than
Group B (n = 52, M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77,
95% CI [0.36, 1.18], two-tailed. Assumptions of normality (Shapiro-Wilk:
Group A W = 0.97, p = .18; Group B W = 0.96, p = .12) and homogeneity
of variance (Levene's F(1, 98) = 1.23, p = .27) were satisfied.
A one-way ANOVA revealed a significant main effect of treatment condition
on test scores, F(2, 147) = 8.45, p < .001, eta-squared = .10. Post hoc
comparisons using Tukey's HSD indicated that Condition A (M = 78.2,
SD = 7.3) scored significantly higher than Condition B (M = 71.5,
SD = 8.1, p = .002, d = 0.87) and Condition C (M = 70.1, SD = 7.9,
p < .001, d = 1.07). Conditions B and C did not differ significantly
(p = .52, d = 0.18).
Multiple linear regression was conducted to predict exam scores from
study hours, prior GPA, and attendance. The overall model was significant,
F(3, 146) = 45.2, p < .001, R-squared = .48, adjusted R-squared = .47.
Study hours (B = 1.80, SE = 0.31, beta = .35, t = 5.78, p < .001,
95% CI [1.18, 2.42]) and prior GPA (B = 8.52, SE = 1.95, beta = .28,
t = 4.37, p < .001, 95% CI [4.66, 12.38]) were significant predictors,
while attendance was not (B = 0.15, SE = 0.12, beta = .08, t = 1.25,
p = .21, 95% CI [-0.09, 0.39]). Multicollinearity was not a concern
(all VIF < 1.5).
A Bayesian independent samples t-test was conducted using weakly
informative priors (Normal(0, 1) for mean difference). The posterior
distribution indicated that Group A scored higher than Group B
(M_diff = 6.8, 95% credible interval [3.2, 10.4]). The Bayes Factor
BF10 = 45.3 provided very strong evidence for a difference between
groups, with a 99.8% posterior probability that Group A's mean exceeded
Group B's mean.
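The report strings above follow a fixed pattern, so they can be assembled from test statistics with a small helper. A sketch in plain Python (the function names are hypothetical; APA style drops the leading zero from p-values, since p cannot exceed 1):

```python
def format_p(p):
    """APA-style p-value: 'p < .001' below .001, else 'p = .xxx'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

def format_ttest_apa(t, df, p, d, ci):
    """APA report fragment for an independent-samples t-test."""
    return (f"t({df:.0f}) = {t:.2f}, {format_p(p)}, d = {d:.2f}, "
            f"95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")

line = format_ttest_apa(3.82, 98, 0.0002, 0.77, (0.36, 1.18))
print(line)
```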
Consider Bayesian approaches when sample sizes are small, meaningful prior information is available, you want direct probability statements about hypotheses (e.g., P(mu1 > mu2 | data)), or you need to quantify evidence for the null rather than merely fail to reject it.
See references/bayesian_statistics.md for comprehensive guidance.
comprehensive_assumption_check(): Complete workflow
check_normality(): Normality testing with Q-Q plots
check_homogeneity_of_variance(): Levene's test with box plots
check_linearity(): Regression linearity checks
detect_outliers(): IQR and z-score outlier detection