From aaai-skills
Audits AAAI experimental evidence including baselines, ablations, statistical significance, robustness, human evaluation, and reproducibility-checklist alignment for Phase-1 survival.
Slash command
/aaai-skills:aaai-experiments
Use this before submission to ensure empirical evidence supports the AI contribution. AAAI reviewers may come from adjacent AI subfields, so experiments must be interpretable beyond one benchmark community.
Because an AAAI reviewer from an adjacent subfield must trust your numbers quickly, classify each experimental block by how much weight it can bear and what would strengthen it.
| Block | Carries the claim when | Reviewer doubt | Cheap reinforcement |
|---|---|---|---|
| Headline benchmark | beats tuned recent baselines | "lucky seed" | seeds, variance bars |
| Ablation | isolates one mechanism | "joint removal" | single-factor toggles |
| Robustness | holds across split/shift | "one setting" | extra split or perturbation |
| Human eval | protocol is documented | "rater bias" | IRB note, inter-rater agreement |
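The "single-factor toggles" reinforcement in the Ablation row can be sketched as a small config generator. This is a minimal illustration, not part of the skill itself: the function name `single_factor_ablations` and the component names are hypothetical, and a real paper would substitute its own mechanisms.

```python
def single_factor_ablations(components):
    """Build one config per component, each disabling exactly one mechanism.

    Keeping every other component enabled is what answers the
    "joint removal" doubt: each variant isolates a single factor.
    """
    full = {name: True for name in components}
    configs = {"full": full}
    for name in components:
        variant = dict(full)      # copy so the full config stays untouched
        variant[name] = False     # toggle off this one component only
        configs[f"no_{name}"] = variant
    return configs


# Hypothetical component names, for illustration only.
for label, config in single_factor_ablations(["heuristic", "pruning", "replay"]).items():
    disabled = [name for name, enabled in config.items() if not enabled]
    print(label, "disables", disabled)
```

Each variant differs from the full system in one switch, so a drop in the metric can be attributed to that mechanism rather than to an interaction between several removed parts.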
Example: a planning paper reports a single-seed win on one domain. The audit marks the headline block as "needs robustness" and flags the missing seed variance, so the fix before the deadline is five seeds with confidence intervals plus one extra IPC-style domain. Because new results cannot rescue this in rebuttal, the team runs both before submission and aligns the checklist's seed answer with the supplement.
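The "five seeds with confidence intervals" fix can be sketched with the standard library alone. This is a rough sketch under stated assumptions: it uses a two-sided 95% Student-t interval (one common choice, not something the skill prescribes), the helper name `mean_ci95` is hypothetical, and the scores are placeholder values, not results from any paper.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_ci95(scores):
    """Return (mean, half_width) of a 95% t-interval over per-seed scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two seeds to estimate variance")
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("extend T_CRIT_95 to cover more than 10 seeds")
    mean = statistics.mean(scores)
    # Sample standard deviation (n - 1 denominator) scaled to a standard error.
    half_width = T_CRIT_95[df] * statistics.stdev(scores) / math.sqrt(n)
    return mean, half_width


# Placeholder per-seed scores, made up for illustration.
runs = {
    "ours": [0.71, 0.69, 0.73, 0.70, 0.72],
    "baseline": [0.66, 0.70, 0.65, 0.68, 0.67],
}
for name, scores in runs.items():
    mean, hw = mean_ci95(scores)
    print(f"{name}: {mean:.3f} +/- {hw:.3f} (n={len(scores)})")
```

Reporting the seed count next to the interval keeps the table consistent with the checklist's seed answer.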
[Claim] <paper claim>
[Evidence status] sufficient / needs baseline / needs ablation / needs robustness / unclear
[Fairness issue] <compute, tuning, data, prompt, metric, human eval>
[Checklist dependency] <what checklist answer this supports>
[Fast fix] <experiment or analysis feasible before deadline>
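The template above can be filled programmatically, which helps keep the status values consistent across claims. The following is a minimal sketch; the class name `AuditEntry` and the example field values are assumptions for illustration, while the field labels and allowed statuses are copied from the template.

```python
from dataclasses import dataclass

# Allowed values, taken from the [Evidence status] line of the template.
EVIDENCE_STATUSES = (
    "sufficient", "needs baseline", "needs ablation", "needs robustness", "unclear",
)


@dataclass
class AuditEntry:
    claim: str
    evidence_status: str
    fairness_issue: str
    checklist_dependency: str
    fast_fix: str

    def __post_init__(self):
        # Reject statuses outside the template's vocabulary.
        if self.evidence_status not in EVIDENCE_STATUSES:
            raise ValueError(f"unknown evidence status: {self.evidence_status!r}")

    def render(self):
        """Format the entry using the template's bracketed labels."""
        return "\n".join([
            f"[Claim] {self.claim}",
            f"[Evidence status] {self.evidence_status}",
            f"[Fairness issue] {self.fairness_issue}",
            f"[Checklist dependency] {self.checklist_dependency}",
            f"[Fast fix] {self.fast_fix}",
        ])


# Hypothetical entry modeled on the planning-paper example.
entry = AuditEntry(
    claim="Planner outperforms tuned baselines",
    evidence_status="needs robustness",
    fairness_issue="single seed, one domain",
    checklist_dependency="number of seeds reported",
    fast_fix="five seeds with confidence intervals plus one extra domain",
)
print(entry.render())
```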
npx claudepluginhub brycewang-stanford/awesome-journal-skills --plugin aaai-skills
Audits IJCAI/IJCAI-ECAI experiments for baselines, ablations, statistical evidence, hyperparameters, compute, dataset handling, ethics, and reproducibility.
Audits AAAI paper reproducibility: maps claims to evidence, checks seed/hyperparameter/compute reporting, verifies code/data availability and licensing, and cross-checks the reproducibility checklist for contradictions.
Designs and audits AISTATS experiments: simulations, baselines, statistical tests, uncertainty estimates, ablations, and theory-validation checks with claim-to-evidence mapping.