Robustness & Multiple-Testing Discipline (rfs-robustness)
When to trigger
- The main result holds in one specification but you have not stress-tested it
- A new return predictor / cross-sectional anomaly is the central claim
- You tested many candidate variables and want to report the ones that "worked"
- Reviewers will ask "does this survive [alternative spec / subsample / period]?"
- Results may be sensitive to outliers, windsorization, or a single event
The two robustness mandates at RFS
RFS punishes fragile results and, in cross-sectional asset pricing, undisciplined multiple testing. Build the battery proactively — a result that only the authors can reproduce in one specification is treated as no result.
Two RFS-specific mechanisms raise the bar above JF/JFE:
- Public code release is a condition of publication. Referees and post-publication readers can and do re-run your code. A robustness claim you cannot reproduce from the released code is a liability, not a footnote — design the battery so the released scripts regenerate every check.
- Registered Reports neutralize the multiple-testing critique by construction. Because RFS offers pre-results review (the format it pioneered in finance), pre-specifying the hypothesis and the test before seeing outcomes is the strongest possible answer to "you data-mined this." Even outside the Registered Report track, pre-specification is the RFS-preferred defense. The q-factor spanning logic of Hou, Xue, and Zhang (2015) "Digesting Anomalies" (RFS 28(3)) is a model for confronting the anomaly zoo head-on.
A. General-fragility battery (all empirical papers)
- Alternative specifications: different FE, control sets, functional forms — show the coefficient is stable.
- Alternative measures: re-estimate with an alternative proxy for the key variable.
- Subsamples: split by period, size, industry, region; the sign should not flip without explanation.
- Outliers: re-run with alternative winsorization/trimming; drop influential observations.
- Placebo / falsification: a setting where the effect should be zero.
- Alternative clustering: show inference is not driven by an SE choice.
B. Multiple-testing discipline (cross-sectional asset pricing especially)
- If the claim is a new predictor or anomaly, confront the data-mining critique head-on.
- Report and discuss multiple-testing-adjusted significance (e.g., Bonferroni / Holm, FDR control, or the higher t-hurdles argued in the asset-pricing replication literature such as Harvey–Liu–Zhu).
- Distinguish in-sample fit from out-of-sample performance; report OOS explicitly.
- Pre-specify the hypothesis; do not present a survivor of a large specification search as if it were a single test.
- For factor claims, run spanning tests against established factor models and report the alpha after controls.
C. Mechanism / external validity (supporting robustness)
- Show the mechanism, not just the reduced-form effect — heterogeneity consistent with the proposed channel strengthens credibility.
- Triangulate with a second data source or setting when feasible.
Sequencing the battery
- Run the fragility checks before writing the main tables — a result that moves under reasonable perturbation is not ready.
- For asset-pricing claims, treat the out-of-sample and multiple-testing checks as primary, not optional add-ons.
- Decide early which single check per result earns a place in the main paper; the rest go to the Internet Appendix (
rfs-internet-appendix).
- If a check weakens the result, address it in the text — do not bury it or omit it; referees will re-run the obvious ones.
Checklist
Anti-patterns
- Reporting the 3 predictors that worked out of 40 tested, with no adjustment.
- Calling a result "robust" after one alternative control set.
- In-sample-only predictability dressed as economically meaningful.
- A subsample sign flip mentioned only in a footnote with no explanation.
- Dumping 30 robustness tables into the main paper instead of the IA.
Output format
【Main result】coefficient / magnitude
【Fragility checks done】[specs, measures, subsamples, outliers, placebo]
【Multiple-testing】adjustment used + OOS result (if asset pricing)
【Surviving concerns】[...]
【Where reported】main paper vs. Internet Appendix
【Next step】rfs-tables-figures