Skill

iclr-experiments

Audits ICLR experiments for scientific rigor: baselines, ablations, scaling laws, robustness, statistics, benchmarks, human evaluation, and compute reporting. Helps pre-answer reviewer questions and isolate mechanisms with compute-matched controls.

ai-ml

Popularity

Parent stars

342

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/iclr-skills:iclr-experiments

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use this before submission or during a revision pass to stress-test empirical claims. ICLR

SKILL.md

70 lines · ~927 tokens

Stats

LanguageStata

Parent stars342

Parent forks45

MaintenanceGood

Last CommitJun 10, 2026

Actions

View Source View Plugin View on GitHub View README

ICLR Experiments

Use this before submission or during a revision pass to stress-test empirical claims. ICLR experiments should answer the scientific question, not merely assemble a leaderboard.

Experiment audit

Match each experiment to a claim in the introduction.
Compare against current strong baselines, open-source systems, and the most relevant recent OpenReview/arXiv papers.
Add ablations that isolate one mechanism at a time.
Report variance across seeds or runs when randomness can change conclusions.
Include robustness checks for dataset shift, prompt changes, architecture variants, hyperparameter sensitivity, or compute scale when those affect the claim.
State compute budget, hardware, training time, inference cost, and environmental or access limits where relevant.
For human evaluation, document task, annotator instructions, aggregation, quality control, and IRB or ethics status when needed.

Reviewer questions to pre-answer

Is the baseline tuned fairly?
Does the method win because of more compute, data, parameters, or prompt search?
Does the effect persist outside the easiest benchmark?
Are negative results hidden?
Can a reviewer reproduce the headline table from the supplement or artifact?

What ICLR reviewers reward in evidence

ICLR's empirical culture prizes honest ablations and mechanism over leaderboard position. A clean ablation that explains why a representation works often outscores a larger raw number.

Claim type	Evidence that convinces ICLR reviewers	Common reject trigger
New objective helps	Ablate the objective with everything else fixed	Gains confounded with extra tuning
Method scales	Several model sizes/tasks with a trend	One large run, no scaling curve
Robust representation	Tests across shifts, seeds, prompts	Single-seed peak on one benchmark
Beats prior method	Tuned, current, open-source baseline	Stale or under-tuned baseline

Worked vignette

A paper claims a new self-supervised pretext task yields better linear-probe accuracy. Reviewers ask whether the gain is the pretext task or simply longer pretraining. The author audit: hold total pretraining compute fixed, swap only the pretext objective, and report linear-probe accuracy with error bars over five seeds. The compute-matched ablation isolates the mechanism and is small enough to post inline during discussion, where the table becomes part of the permanent public record.

Reviewer-pushback patterns

"You win because of more compute." Add a compute-matched control; report FLOPs, not just wall time.
"Only one seed." Report mean and spread across seeds; an unstable benchmark needs variance.
"Baseline is weak." Cite the baseline's own recommended settings and show you matched them.
"Ablation removes two things at once." Split into single-mechanism ablations a reviewer can read.

Output format

[Claim] <paper claim>
[Experiment evidence] sufficient / needs baseline / needs ablation / needs robustness
[Fairness issue] <compute, tuning, data, prompt, metric>
[Fast fix] <experiment or analysis feasible before deadline>
[Appendix placement] <what can move out of main text>

iclr-experiments

Popularity

Invocation

Context Preview

SKILL.md

iclr-experiments

Popularity

Invocation

Context Preview

SKILL.md

ICLR Experiments

Experiment audit

Reviewer questions to pre-answer

What ICLR reviewers reward in evidence

Worked vignette

Reviewer-pushback patterns

Output format

Similar Skills

ICLR Experiments

Experiment audit

Reviewer questions to pre-answer

What ICLR reviewers reward in evidence

Worked vignette

Reviewer-pushback patterns

Output format

Similar Skills