From eer-skills
Runs a robustness battery for EER-style results: tests specification, sample, measurement, inference, and multiple-hypothesis sensitivity. Use when a referee demands disciplined stress tests beyond the author's preferred specification.
Slash command: /eer-skills:eer-robustness
- The headline estimate exists but its fragility has not been probed
A general-interest result must be believable beyond the authors' favorite specification. EER referees — methods-aware under single-anonymized review — expect a disciplined battery, not a scattershot appendix: vary the things that could plausibly overturn the result, report them transparently, and say which (if any) move the estimate. The goal is a result that is robust where it matters and honest where it is fragile. Robustness is not infinite specification mining; choose tests with a reason.
| Dimension | Test | Why it matters |
|---|---|---|
| Specification | add/drop controls; alternative functional form; FE structure | shows the estimate is not a control artifact |
| Sample | leave-one-out (unit/region/year); alternative windows; trimming outliers | shows no single observation drives it |
| Measurement | alternative outcome/treatment definitions; alternative data source | shows it is not a coding choice |
| Estimator | heterogeneity-robust DiD vs TWFE; alternative IV/RDD bandwidth | shows method-robustness |
| Inference | clustering level; wild-cluster bootstrap (few clusters); spatial/cross-sectional dependence; randomization inference | shows SEs are valid under real dependence |
| Multiple testing | Romano–Wolf / Bonferroni–Holm across families | guards against cherry-picked significance |
| Structural | parameter sensitivity; alternative calibration targets; grid/tuning | shows quantity is not a tuning artifact |
| Pre-trends | honest-DiD sensitivity (Rambachan–Roth); placebo timing | bounds violations of parallel trends |
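The multiple-testing row can be made concrete with a small sketch of the Bonferroni–Holm step-down adjustment. This is an illustration rather than part of the skill itself: the function name `holm_adjust`, the outcome labels, and the raw p-values are all hypothetical. Romano–Wolf is not shown, since it requires resampling the joint distribution of the test statistics.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the hypotheses, ordered from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        scaled = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted p-value never falls below an earlier one.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for one family of three outcomes.
raw = {"price": 0.01, "quantity": 0.04, "entry": 0.03}
adj = dict(zip(raw, holm_adjust(list(raw.values()))))
for name in raw:
    print(f"{name}: raw p = {raw[name]:.3f}, Holm-adjusted p = {adj[name]:.3f}")
```

Holm controls the family-wise error rate under arbitrary dependence, so it is a conservative default; when the outcomes are strongly correlated, a resampling-based procedure such as Romano–Wolf will typically reject more.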
An IO paper finds a merger raised prices 4%. A weak appendix re-runs with more controls. An EER battery: leave-one-market-out (range 3.1–4.6%, illustrative), alternative price index, synthetic-control placebo on untreated markets, wild-cluster bootstrap (28 markets), and a Romano–Wolf correction across the three outcomes. Verdict stated plainly: "the price effect is 3.1–4.6% and significant in all but the trimmed-outlier sample, where it is 2.0% (s.e. 1.1)." The reader trusts the number because its fragility was mapped.
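The leave-one-market-out step in this example could be sketched as follows. This is a minimal illustration under stated assumptions: the rows are synthetic, the column names (`market`, `treated`, `dlog_price`) are hypothetical, and a difference in means stands in for whatever estimator the paper actually uses.

```python
from statistics import mean


def diff_in_means(rows):
    """Toy estimator: mean outcome of treated rows minus mean of control rows."""
    treated = [r["dlog_price"] for r in rows if r["treated"]]
    control = [r["dlog_price"] for r in rows if not r["treated"]]
    return mean(treated) - mean(control)


def leave_one_group_out(rows, group_key, estimator):
    """Re-estimate once per group, each time dropping that group's rows."""
    groups = sorted({r[group_key] for r in rows})
    return {g: estimator([r for r in rows if r[group_key] != g]) for g in groups}


# Synthetic rows for illustration only (not taken from any paper).
rows = [
    {"market": "A", "treated": True, "dlog_price": 0.05},
    {"market": "B", "treated": True, "dlog_price": 0.03},
    {"market": "C", "treated": False, "dlog_price": 0.01},
    {"market": "D", "treated": False, "dlog_price": -0.01},
]

full = diff_in_means(rows)
loo = leave_one_group_out(rows, "market", diff_in_means)
lo, hi = min(loo.values()), max(loo.values())
print(f"full sample: {full:.3f}; leave-one-out range: {lo:.3f} to {hi:.3f}")
```

Passing the estimator in as a function keeps the loop reusable: the same `leave_one_group_out` call works for dropping regions or years by changing `group_key`.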
【Core claim under test】one sentence
【Threats probed】[spec / sample / measurement / estimator / inference / MHT / structural / pre-trends]
【Most dangerous test + result】[...]
【Estimate range across specs】X–Y (where it breaks: Z)
【Honest fragilities】[...]
【Next step】eer-tables-figures (present the battery) or eer-referee-strategy
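One way to fill part of this template mechanically is sketched below. The helper `summarize_battery` is hypothetical, the standard errors for the first two specifications are placeholders, and the rule used to flag "where it breaks" (|estimate| < 1.96 × s.e.) is a simplifying assumption that ignores few-cluster and multiple-testing corrections.

```python
def summarize_battery(claim, estimates, threats):
    """Build a short report card.

    estimates maps a specification label to (point estimate, standard error).
    """
    points = [est for est, _ in estimates.values()]
    lo, hi = min(points), max(points)
    # Assumed rule: a spec "breaks" when it fails a two-sided 5% normal test.
    breaks = [label for label, (est, se) in estimates.items() if abs(est) < 1.96 * se]
    where = ", ".join(breaks) if breaks else "none"
    return "\n".join([
        f"【Core claim under test】{claim}",
        f"【Threats probed】[{' / '.join(threats)}]",
        f"【Estimate range across specs】{lo:.1f}–{hi:.1f} (where it breaks: {where})",
    ])


# Point estimates in percent; the first two standard errors are placeholders.
estimates = {
    "leave-one-market-out (min)": (3.1, 0.9),
    "leave-one-market-out (max)": (4.6, 1.0),
    "trimmed outliers": (2.0, 1.1),
}
card = summarize_battery(
    "The merger raised prices.",
    estimates,
    ["sample", "measurement", "inference", "MHT"],
)
print(card)
```

The remaining fields (most dangerous test, honest fragilities, next step) call for judgment and are better written by hand.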
npx claudepluginhub brycewang-stanford/awesome-journal-skills --plugin eer-skills
Builds a robustness suite for REStat manuscripts: tests whether headline estimates survive specification, sample, measurement, identification, and inference alternatives.
Builds robustness suites for AEJ: Applied manuscripts to show headline estimates survive specification, sample, and inference choices.
Organizes robustness checks for IER papers by threat to load-bearing assumption, without running regressions. Helps structure responses to referee concerns.