From light
Analyzes experimental results, model outputs, and data with statistical rigor and diagnostic depth.
Slash command
/light:light-result-analysis
1. **Describe**: summarize the metrics, their distributions, and the gap to the baseline, with error bars or confidence intervals.
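The descriptive step above can be sketched as follows. This is a minimal illustration, not the skill's own code: the helper name `summarize`, the t-based interval, and the accuracy values are all assumptions made up for the example.

```python
import numpy as np
from scipy import stats


def summarize(values, confidence=0.95):
    """Return n, mean, sample std, median and a t-based CI for the mean.

    Hypothetical helper for illustration; not part of the skill's scripts.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    mean = x.mean()
    std = x.std(ddof=1)                      # sample standard deviation
    sem = std / np.sqrt(n)                   # standard error of the mean
    half = stats.t.ppf(0.5 + confidence / 2, df=n - 1) * sem
    return {
        "n": n,
        "mean": mean,
        "std": std,
        "median": float(np.median(x)),
        "ci_low": mean - half,
        "ci_high": mean + half,
    }


# Made-up accuracies over 5 seeds (illustration only, not real results).
baseline = [0.81, 0.80, 0.82, 0.79, 0.81]
method = [0.84, 0.83, 0.85, 0.84, 0.82]

s_base = summarize(baseline)
s_new = summarize(method)
gap = s_new["mean"] - s_base["mean"]         # gap to the baseline
print(f"baseline {s_base['mean']:.3f} [{s_base['ci_low']:.3f}, {s_base['ci_high']:.3f}]")
print(f"method   {s_new['mean']:.3f} [{s_new['ci_low']:.3f}, {s_new['ci_high']:.3f}]")
print(f"gap      {gap:.3f}")
```

The half-width of the interval is what would be drawn as the error bar; with only a handful of seeds the t quantile makes it noticeably wider than a normal-approximation interval.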
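- assets/result_analysis_report_template.md
- examples/worked_example.py
- references.md
- scripts/analysis_out/summary.json
- scripts/analysis_out/summary.md
- scripts/analyze_results.py
- scripts/explain_shap.py
- scripts/leakage_overfit_check.py
- scripts/leakage_report.json
- scripts/make_figs.py
- scripts/significance_test.py

- … `multipletests(method="fdr_bh")`). `analyze_results.py --paired-by seed` automatically aligns pairs on the shared column; the paired effect size is d_z and the CI of the difference comes from the bootstrap (see the "Paired-design identification" section of references). `analyze_results.py --slice-by <col>` reruns the same EDA + tests + effect sizes + FDR for every slice value and automatically flags small-n slices as "needs review" (sensitive attributes from the fairness dimension can serve as `slice_by`, related to a10; see the "Slice analysis protocol" section of references).
- `ProfileReport(df, minimal=True).to_file(...)` produces distributions, missingness, correlations, and alerts in one call; deepchecks `data_integrity().run(Dataset(df, label=...))` checks for leakage, duplicates, and single-value columns.
- `smf.ols("y~x1+x2", data).fit().summary()` gives coefficients, p-values, R², and AIC (with `OLS`, add the intercept via `sm.add_constant`; the formula interface adds it automatically); ANOVA via `anova_lm(model, typ=2)`; tests live in the submodules of `statsmodels.stats` (`ttest_ind` / `proportions_ztest` / `multipletests` / `het_breuschpagan`).
- `scripts/explain_shap.py` (produces the beeswarm/bar/waterfall figures in one step). Use `shap.TreeExplainer` for tree models (fast) and `KernelExplainer` as the general fallback (slow; the background set must be subsampled); `shap.plots.beeswarm` shows the global direction, `bar` the importance ranking, and `waterfall` the decomposition of a single sample. SHAP reflects the model, not causality.
- `fig, ax = plt.subplots(layout="constrained")`, then `savefig(dpi=300, bbox_inches="tight")` to save vector PDF/SVG; seaborn axes-level functions (`boxplot`/`heatmap`/`barplot`; `barplot` draws 95% CI error bars by default) can be embedded in `ax`, while figure-level functions (`relplot`/`catplot`) handle faceting themselves. Use a colorblind-friendly colormap such as viridis and avoid jet.
- Three plotting-language options (Python primary, R/MATLAB as alternatives): this skill's `make_figs.py` (Python/matplotlib) is the default; R users can use ggplot2 (`+ theme_classic() + scale_color_viridis_d()`, patchwork for composing panels, `ggsave(units="mm")` to export at the journal's column width); the MATLAB ecosystem (signal processing/control) uses `exportgraphics(..., ContentType="vector")` for vector output and `tiledlayout` for multi-panel figures. All three reach comparable publication-grade vector quality, so choose by project stack. Detailed usage and column-width rules are kept in m11 (light-figure-drawing); this skill only covers "analysis → figure" and does not repeat plotting details.
- … `px.scatter(..., color=, facet_col=)` → `write_html`) or altair (`alt.Chart(df).mark_point().encode(x="a:Q", ...)`), for appendices and supplementary material.
- `Report(metrics=[DataDriftPreset()]).run(reference_data=ref, current_data=cur)` (the API is version-sensitive; confirm the version first).
- nbconvert produces the report; to compile several groups of experiments into a site, use Jupyter Book (`_config.yml` + `_toc.yml` → `jupyter-book build`).
- `claim_evidence_table.md` (the handoff artifact for m07/m09; naming per CONVENTIONS §6.1) is generated automatically by `analyze_results.py --emit-claim-table`: every comparison (claim) is linked to its test, p, q (FDR), Cohen's d, CI, and n; significance is judged by q after BH-FDR, and non-significant rows are marked "must not be claimed as better".
- `scripts/explain_shap.py` directly outputs dpi300 PDF/SVG/PNG.

- `scripts/analyze_results.py`: one-step analysis of a results-table csv. EDA summary (n / mean ± std / median / 95% CI / normality) + automatic test selection by normality and number of groups (2 groups: normal → Welch t, non-normal → Mann-Whitney; ≥3 groups: normal → Levene's test for homogeneity of variance first, then ANOVA + Tukey if homogeneous or an automatic switch to Welch ANOVA if not, non-normal → Kruskal-Wallis) + Cohen's d for every pair (Hedges-corrected) + BH-FDR correction across comparisons. Outputs `summary.json` + `summary.md`. For small samples (smallest group n<10) it adds a warning that Shapiro lacks power (with small n a "normal" verdict is unreliable; it suggests switching to a non-parametric or pre-specified test). Usage: `python scripts/analyze_results.py results.csv --group method --metric acc f1`; with shared seeds/folds add `--paired-by seed` for paired tests; `--slice-by <col>` for slice analysis (recompute per slice and flag small n); `--emit-claim-table` produces `claim_evidence_table.md` (the §6.1 artifact); with no arguments it runs a synthetic demo.
- `scripts/significance_test.py`: a "p + Cohen's d + CI + FDR" function library (`welch_t`/`cohens_d`/`mean_diff_ci`/`bootstrap_ci`/`benjamini_hochberg`/`compare_two`/`delong_two_auroc`). Its `__main__` checks each function against scipy/statsmodels and prints ALL PASS. Reuses `../../../code_assets/stats_tests.py`. `delong_two_auroc(y, score_a, score_b)` (borrowing the DeLong test common in medical evaluation) tests whether the AUROC difference between two models on the same test set is significant (the samples are correlated, so the covariance must be subtracted; an ordinary independent-samples test would be wrong), and agrees numerically with sklearn `roc_auc_score`.
- `scripts/make_figs.py`: publication-grade matplotlib template (OO interface, constrained_layout, colorblind-friendly viridis, error bars, dpi300 output as vector PDF/SVG plus PNG). Builders: `grouped_bar_ci`/`box_strip`/`line_with_band`/`heatmap` + `save_all`. `python scripts/make_figs.py` produces a four-panel demo figure.
- `scripts/leakage_overfit_check.py`: pure numpy/pandas checks for the train/val/test gap (overfitting/drift), leakage through high feature-label correlation, duplicated rows between train and test, and near-constant columns; it degrades automatically when deepchecks is missing (not a hard dependency). Thresholds can be overridden from the CLI (`--gap-overfit`/`--gap-shift`/`--leak-corr`/`--near-const`; the defaults are heuristic only and strongly task-dependent, and the report states which set of thresholds was used). `python scripts/leakage_overfit_check.py` runs a synthetic demo with planted leakage.
- `scripts/explain_shap.py`: SHAP model-explainability figures. Dispatches `TreeExplainer` (fast path for tree models) → `shap.Explainer` (unified entry point that picks Tree/Linear automatically) → `KernelExplainer` (model-agnostic fallback; the background set is subsampled to about 100 rows with `shap.kmeans`/`sample` to control cost). It wraps the SHAP figure-saving pitfalls (for each figure: `show=False` → `plt.gcf()` → reuse `save_all` from make_figs to save dpi300 PDF/SVG/PNG) and produces three figures: beeswarm (global direction), bar (importance ranking), and waterfall (single-sample decomposition). When shap is not installed it degrades gracefully, skips, and exits 0 (same handling as deepchecks; not a hard dependency). `python scripts/explain_shap.py` runs a synthetic self-test with make_classification + RandomForest. Pitfalls: strongly correlated features dilute contributions, SHAP is not causal, and KernelExplainer is expensive.
- `examples/worked_example.py`: end-to-end EDA → significance → figures → leakage check → a filled-in `example_report.md`, all written to `examples/example_out/`.
- `assets/result_analysis_report_template.md`: four-part report template (phenomenon → cause → evidence → significance for the paper), with checklists for highlights, anomalies, and experiments still to be run.

Highlights → support writing in m07; anomalies and shortcomings → back to m05 for more experiments or back to m03 for new ideas; conclusions are written to db09. Label honestly what is verified and what is unverified (CONVENTIONS §4).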
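The two-group branch of the test-selection rule described above (normal → Welch t, non-normal → Mann-Whitney, then a Hedges-corrected effect size and BH-FDR across comparisons) can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the code in `scripts/analyze_results.py` or `scripts/significance_test.py`: the function names `hedges_g`, `benjamini_hochberg`, and `compare` are hypothetical here, and the data are synthetic.

```python
import numpy as np
from scipy import stats


def hedges_g(a, b):
    """Cohen's d with pooled std, multiplied by the Hedges small-sample factor."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = a.size, b.size
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    d = (a.mean() - b.mean()) / pooled
    return d * (1 - 3 / (4 * (na + nb) - 9))   # J = 1 - 3 / (4*df - 1), df = na+nb-2


def benjamini_hochberg(pvals):
    """Return BH-adjusted q-values in the original order of pvals."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, walking from the largest p-value downward.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q


def compare(a, b, alpha=0.05):
    """Pick Welch t or Mann-Whitney from a Shapiro normality check (2 groups)."""
    normal = all(stats.shapiro(x).pvalue > alpha for x in (a, b))
    if normal:
        res = stats.ttest_ind(a, b, equal_var=False)       # Welch t-test
        test = "welch_t"
    else:
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        test = "mann_whitney"
    return {"test": test, "p": float(res.pvalue), "g": float(hedges_g(a, b))}


# Synthetic scores for illustration only (not real experiment results).
rng = np.random.default_rng(0)
acc_new = rng.normal(0.85, 0.01, size=10)
acc_base = rng.normal(0.80, 0.01, size=10)

result = compare(acc_new, acc_base)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(result)
print(q_values)
```

With several metrics or method pairs, the p-value from each `compare` call would be collected first and adjusted together; `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` is the library route to the same adjustment.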
Per-tool notes on the real endpoints, APIs, parameters, and known pitfalls are in references.md.
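For the paired design mentioned earlier (runs that share a seed or fold), the paired effect size d_z and a bootstrap CI of the mean difference can be sketched as follows. This is an assumption-laden illustration rather than the skill's implementation: `paired_effect` is a hypothetical name, the percentile bootstrap is one of several possible bootstrap variants, and the scores are made up.

```python
import numpy as np


def paired_effect(a, b, n_boot=5000, confidence=0.95, seed=0):
    """d_z and a percentile-bootstrap CI for the mean of paired differences.

    a[i] and b[i] must come from the same seed/fold (already aligned).
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a - b
    d_z = diff.mean() / diff.std(ddof=1)     # mean difference / std of differences
    rng = np.random.default_rng(seed)
    # Resample the differences (pairs stay intact) with replacement.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.percentile(
        boot_means,
        [100 * (1 - confidence) / 2, 100 * (1 + confidence) / 2],
    )
    return {
        "mean_diff": float(diff.mean()),
        "d_z": float(d_z),
        "ci_low": float(lo),
        "ci_high": float(hi),
    }


# Made-up per-seed accuracies, aligned by seed (illustration only).
method = [0.84, 0.83, 0.85, 0.84, 0.82]
baseline = [0.81, 0.80, 0.82, 0.79, 0.81]

paired = paired_effect(method, baseline)
print(paired)
```

Resampling the differences rather than each group separately is what keeps the pairing intact; with very few pairs the percentile interval is coarse and should be read with caution.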
npx claudepluginhub light0305/light-skills