From icml-skills
Audits ML experiments for ICML submission/rebuttal: baselines, ablations, variance, data leakage, compute disclosure, reproducibility, negative results.
Slash command
/icml-skills:icml-experiments
Use this before submission or rebuttal when the central issue is whether experiments are sound enough for ICML. The question is not just "does it win"; it is whether the evidence supports the ML claim under fair comparison.
| Pushback | Why it lands at ICML | Fix |
|---|---|---|
| "Convergence guarantees under assumptions the experiments violate" | Theory paper asserts a rate under smoothness or bounded variance, but the deep-learning runs violate those assumptions | State assumptions honestly, add a figure showing the rate holds empirically in-regime, and flag where it does not |
| "Missing strong, tuned baselines" | The leaderboard win used an undertuned competitor | Re-tune the baseline with matched budget, report the search protocol |
| "No variance, single seed" | One run cannot separate signal from noise | Report results over multiple seeds with confidence intervals, or justify why the pipeline is deterministic |
| "Compute not disclosed" | ICML expects hardware and training-cost transparency | Add a compute table and confirm comparison fairness |
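The "No variance, single seed" row can be addressed with very little code. The following is a minimal sketch, assuming per-seed scores have already been collected and that seeds are matched by index between the method and the baseline. The function names `mean_ci95` and `paired_gain` are hypothetical, and the hard-coded critical values cover only 2 to 10 seeds.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_ci95(scores):
    """Return (mean, half_width) of a 95% t-interval over per-seed scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two seeds to estimate variance")
    if n - 1 not in T_CRIT_95:
        raise ValueError("lookup table covers 2 to 10 seeds; extend it for more")
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return mean, T_CRIT_95[n - 1] * sem


def paired_gain(method_scores, baseline_scores):
    """95% interval on the per-seed difference (method - baseline)."""
    if len(method_scores) != len(baseline_scores):
        raise ValueError("seeds must be matched one-to-one")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    return mean_ci95(diffs)


def gain_is_supported(method_scores, baseline_scores):
    """True when the whole 95% interval on the paired gain lies above zero."""
    gain, half_width = paired_gain(method_scores, baseline_scores)
    return gain - half_width > 0
```

The paired form is used here because both methods share seeds in this sketch, which removes seed-to-seed noise from the comparison. If seeds are not shared, report separate intervals with `mean_ci95` instead.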
A paper claims a new adaptive step-size method beats Adam with a non-convex convergence guarantee. The audit asks: is Adam tuned with the same budget, do the benchmark losses actually satisfy the proof's assumptions, and do gains survive across seeds and model sizes? If the win shrinks under a tuned baseline or the assumptions hold only on toy quadratics, the right move is to narrow the claim to the regime where both theory and experiments agree, rather than overclaim a universal speedup.
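One way to check whether a claimed rate holds empirically is to fit the decay exponent on a log-log scale and compare it with the exponent in the theorem. The sketch below assumes the tracked quantity (for example, a running minimum of the squared gradient norm) is positive and logged at known step counts. The names `loglog_slope` and `rate_holds`, the tolerance, and the exponents in the usage note are illustrative assumptions, not values from any specific paper.

```python
import math


def loglog_slope(steps, values):
    """Least-squares slope of log(values) against log(steps)."""
    if len(steps) != len(values) or len(steps) < 2:
        raise ValueError("need at least two matched (step, value) pairs")
    xs = [math.log(s) for s in steps]
    ys = [math.log(v) for v in values]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("steps must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def rate_holds(steps, values, claimed_exponent, tolerance=0.1):
    """True if values decay at least as fast as steps**(-claimed_exponent),
    allowing `tolerance` of slack on the fitted exponent."""
    return loglog_slope(steps, values) <= -claimed_exponent + tolerance
```

For instance, calling `rate_holds(steps, values, claimed_exponent=0.5)` on a run whose fitted slope is near -0.2 returns `False`, which is a signal to narrow the claim to the regime where the fit matches. A single fitted slope is a coarse diagnostic, so it is worth fitting only the range of steps where the proof's assumptions are plausible.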
During the author response, prefer a small decisive table, a corrected baseline, a missing ablation, or a concise error analysis over a broad new experimental section. ICML gives one discussion round, so a single tuned-baseline row or in-regime variance plot moves a reviewer more than a sprawling new study.
[Evidence status] strong / adequate / weak
[Most vulnerable claim] <claim>
[Critical missing result] <baseline/ablation/variance/leakage/compute>
[Small response result] <feasible clarification>
[Claim narrowing] <text if evidence is not enough>
npx claudepluginhub brycewang-stanford/awesome-journal-skills --plugin icml-skills