From jape-skills
Sets up reproducible estimation and inference pipelines for JAE manuscripts, covering robust inference, master-script discipline, Monte Carlo evidence, and plain-text archive formatting.
Slash command: /jape-skills:jape-data-analysis
- Setting up the estimation pipeline for a JAE paper
JAE's identity is replicable applied work, and accepted papers must deposit data and (typically) programs in the JAE Data Archive. Structure the analysis as if a referee will rerun it:
- A master script (run_all.do / make / Snakefile) regenerates every exhibit from raw inputs, with no manual steps.
- Pin software versions (version in Stata, renv/sessionInfo() in R, pinned requirements.txt in Python); fix and log all seeds.
- Deposit data as plain CSV with a readme alongside; never let a .dta be the only copy.

Match inference to structure: HAC/Newey–West for serial correlation; cluster-robust SEs for panels; wild cluster bootstrap with few clusters; weak-IV-robust inference for IV. State which adjustment you use and why.
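The master-script and seed-logging discipline above can be sketched in Python. This is a minimal illustration, not a prescribed layout: `run_all`, `table1`, and `run_log.json` are hypothetical names, and the single step stands in for a real estimation routine.

```python
import json
import platform
import random
from pathlib import Path


def run_all(steps, seed, out_dir):
    """Run every step in order and write a log naming the seed and version."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    log = {"seed": seed, "python": platform.python_version(), "steps": []}
    for name, step in steps:
        # Derive each step's generator from the master seed and the step
        # name, so adding or reordering steps leaves other outputs unchanged.
        rng = random.Random(f"{seed}:{name}")
        target = out / f"{name}.csv"
        step(rng, target)
        log["steps"].append({"name": name, "output": target.name})
    (out / "run_log.json").write_text(json.dumps(log, indent=2))
    return log


def table1(rng, target):
    # Hypothetical exhibit: placeholder draws instead of an estimation step.
    draws = [rng.gauss(0.0, 1.0) for _ in range(5)]
    target.write_text("draw\n" + "\n".join(f"{d:.6f}" for d in draws) + "\n")


if __name__ == "__main__":
    run_all([("table1", table1)], seed=20240101, out_dir="output")
```

Running the script twice with the same seed should produce byte-identical exhibit files, which is the property a rerun of the archive relies on.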
If you stress-test a method, report the DGP, sample sizes, replication count, and seeds, and ship the simulation script so the tables regenerate exactly. Simulations should illuminate the empirical problem, not stand alone.
Before submission, create a ledger with one row per main claim:
Claim | Exhibit | Input data | Code target | Inference choice | Archive file
Use it to catch unsupported claims and archive gaps. If a row has no code target, the exhibit is not reproducible. If a row has no inference rationale, a referee can challenge the standard errors before engaging the economics. If a row has no archive file, the Data Archive package will not reproduce the published paper.
The ledger also clarifies what belongs in the online appendix: diagnostics and robustness checks that support a ledger row should be preserved there, even if the main article cannot spare space.
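The ledger check lends itself to automation. The sketch below assumes the ledger is kept as a CSV with the six columns listed above; `audit_ledger` and the two example rows are hypothetical and only illustrate the three failure modes described.

```python
import csv
import io

# Required columns and what a blank cell implies, following the text above.
REQUIRED = {
    "Code target": "exhibit is not reproducible",
    "Inference choice": "standard errors can be challenged",
    "Archive file": "archive package will not reproduce the paper",
}


def audit_ledger(csv_text):
    """Return (claim, column, consequence) for every blank required cell."""
    gaps = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, consequence in REQUIRED.items():
            if not (row.get(column) or "").strip():
                gaps.append((row["Claim"], column, consequence))
    return gaps


# Hypothetical ledger: the second row lacks a code target and an archive file.
ledger = """Claim,Exhibit,Input data,Code target,Inference choice,Archive file
Policy lowers outcome,Table 2,panel.csv,infer_main.do,Wild cluster bootstrap,panel.csv
Effect persists,Figure 3,panel.csv,,Cluster-robust,
"""

for claim, column, consequence in audit_ledger(ledger):
    print(f"{claim}: missing '{column}' ({consequence})")
```

An empty result from `audit_ledger` is a reasonable gate to pass before building the archive package.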
JAE referees rerun your code, so the inference choice must be defensible and visible in the deposited scripts. Default mapping:
| Data structure | Expected inference at JAE | What the table note states |
|---|---|---|
| Time series, serial correlation | HAC / Newey–West; justify bandwidth/lag choice | Kernel, lag truncation, sample span |
| Panel, many clusters (≈40+) | Cluster-robust at the treatment/assignment level | Cluster variable and count |
| Panel, few clusters (<~30) | Wild cluster bootstrap (Rademacher/Webb weights) | Bootstrap type, replications, seed |
| IV with modest first stage | Weak-IV-robust (Anderson–Rubin CI; effective F) | First-stage F alongside the 2SLS column |
| Forecast comparisons | Diebold–Mariano-style tests with HAC variance | Loss function and comparison window |
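For the time-series row, a Newey–West standard error can be written out by hand so the kernel and lag truncation are visible in the deposited code. The sketch below is a simplification under stated assumptions: a single regressor plus intercept, Bartlett weights, and no small-sample adjustment. The function name `newey_west_slope` and the example series are hypothetical.

```python
import math


def newey_west_slope(y, x, lags):
    """OLS slope of y on x (with intercept) and its Newey-West standard error."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]                    # demeaned regressor
    sxx = sum(d * d for d in xd)
    slope = sum(d * yi for d, yi in zip(xd, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for yi, xi in zip(y, x)]
    score = [d * u for d, u in zip(xd, resid)]       # demeaned x times residual
    long_run = sum(s * s for s in score)             # lag-0 term (White HC0)
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1)              # Bartlett kernel weight
        gamma = sum(score[t] * score[t - lag] for t in range(lag, n))
        long_run += 2.0 * weight * gamma
    return slope, math.sqrt(long_run) / sxx


# Hypothetical series, for illustration only.
x = [float(t) for t in range(1, 9)]
y = [1.0, 2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9]
slope, se = newey_west_slope(y, x, lags=2)
print(f"slope = {slope:.4f}, Newey-West s.e. (2 lags) = {se:.4f}")
```

Setting `lags=0` reduces the estimator to the heteroskedasticity-robust (HC0) standard error, which gives a quick sanity check on the implementation.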
Two inference failures recur in JAE referee reports: cluster-robust SEs treated as valid with a handful of clusters, and 2SLS t-statistics reported without weak-IV diagnostics. Pre-empt both — report the cluster count in every clustered table, switch to wild cluster bootstrap p-values when it is small, and pair any IV column with the effective first-stage F plus an Anderson–Rubin interval. Put the bootstrap loop in the deposited code with its seed so the archived p-value regenerates digit-for-digit.
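A seeded bootstrap loop of the kind described above might look like the following sketch. It makes several simplifying assumptions: one regressor plus intercept, the null of a zero slope imposed when generating bootstrap samples, and no small-sample factor on the cluster-robust variance (the factor is constant across draws, so it cancels in the p-value). The names `cluster_t` and `wild_cluster_pvalue` and the simulated data are hypothetical; a real application would typically use an established package.

```python
import math
import random

RADEMACHER = [-1.0, 1.0]
WEBB = [-math.sqrt(1.5), -1.0, -math.sqrt(0.5),
        math.sqrt(0.5), 1.0, math.sqrt(1.5)]        # Webb six-point weights


def cluster_t(y, x, cluster):
    """Slope of y on x (with intercept) and its cluster-robust t-statistic."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in xd)
    slope = sum(d * yi for d, yi in zip(xd, y)) / sxx
    intercept = y_bar - slope * x_bar
    sums = {}                                        # within-cluster score sums
    for d, yi, xi, g in zip(xd, y, x, cluster):
        sums[g] = sums.get(g, 0.0) + d * (yi - intercept - slope * xi)
    se = math.sqrt(sum(s * s for s in sums.values())) / sxx
    return slope, slope / se


def wild_cluster_pvalue(y, x, cluster, reps, seed, weights=RADEMACHER):
    """Two-sided wild cluster bootstrap p-value for H0: slope = 0."""
    rng = random.Random(seed)                        # seed fixed and reported
    _, t_obs = cluster_t(y, x, cluster)
    y_bar = sum(y) / len(y)                          # restricted fit under H0
    resid0 = [yi - y_bar for yi in y]                # restricted residuals
    groups = sorted(set(cluster))                    # stable draw order
    exceed = 0
    for _ in range(reps):
        draw = {g: rng.choice(weights) for g in groups}   # one weight per cluster
        y_star = [y_bar + draw[g] * u for g, u in zip(cluster, resid0)]
        _, t_star = cluster_t(y_star, x, cluster)
        exceed += abs(t_star) >= abs(t_obs)
    return exceed / reps


# Hypothetical data: 12 clusters of 20 observations, 6 clusters treated.
gen = random.Random(0)
cluster, x, y = [], [], []
for g in range(12):
    effect = gen.gauss(0.0, 0.5)                     # cluster-level shock
    for _ in range(20):
        treated = 1.0 if g < 6 else 0.0
        cluster.append(g)
        x.append(treated)
        y.append(5.0 * treated + effect + gen.gauss(0.0, 1.0))

p = wild_cluster_pvalue(y, x, cluster, reps=499, seed=42)
print(f"wild cluster bootstrap p-value: {p:.3f}")
```

Passing `weights=WEBB` switches to the six-point distribution, which is often preferred when the cluster count is small enough that Rademacher draws yield few distinct bootstrap samples.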
A staggered state-policy evaluation with 13 treated-side clusters: CRVE gives β = −0.042, s.e. 0.018, p ≈ 0.02; the wild cluster bootstrap (Webb weights, 9,999 draws, seed logged) gives p ≈ 0.08. The JAE-grade move is to report both, lead with the bootstrap, and let the online appendix carry the full grid (weight choice, replications, leave-one-cluster-out). The archived infer_main.do regenerates each p-value; the readme names the seed. Hiding the fragile p-value is the move referees at this venue are trained to catch.
When simulation evidence backs the empirical design, register it like an exhibit:
DGP: [equations + parameter values, calibrated to the application]
Sample sizes: [matching the real data, plus stress values]
Replications: [count] | Seed: [value] | Software: [version]
Script: sims/mc_main.R → outputs tables/mc_table3.csv
Question answered: [which empirical inference concern this resolves]
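As one way to fill in the registration fields above, the sketch below runs a small size experiment and returns one row per sample size with the DGP, replication count, seed, and software version attached. Everything here is illustrative: the DGP (i.i.d. standard normal draws under a true null), the 1.96 critical value, and the names `rejection_rate` and `register` are assumptions, not part of the template.

```python
import math
import platform
import random


def rejection_rate(n, reps, seed, crit=1.96):
    """Share of replications where a t-test of mean = 0 rejects a true null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # DGP: i.i.d. N(0, 1)
        mean = sum(sample) / n
        var = sum((v - mean) ** 2 for v in sample) / (n - 1)
        t_stat = mean / math.sqrt(var / n)
        rejections += abs(t_stat) > crit
    return rejections / reps


def register(n_values, reps, seed):
    """One row per sample size, carrying the metadata the template asks for."""
    return [
        {
            "dgp": "y_i ~ N(0, 1), H0: mean = 0",
            "n": n,
            "replications": reps,
            "seed": seed,
            "software": f"Python {platform.python_version()}",
            "rejection_rate": rejection_rate(n, reps, seed),
        }
        for n in n_values
    ]


for row in register([30, 100], reps=400, seed=7):
    print(row)
```

Writing these rows to the table file with `csv.DictWriter` keeps the metadata next to the numbers, so the seed and replication count travel with the archived output.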
【Master script】regenerates all exhibits? [Y/N]
【Repro】versions pinned + seeds fixed? [Y/N]
【Inference】HAC / clustered / bootstrap / weak-IV — matched? [Y/N]
【Few-cluster guard】cluster counts reported; bootstrap where small? [Y/N]
【Archive format】plain CSV + readme alongside (not .dta-only)? [Y/N]
- ../../resources/external_tools.md: estimation, inference, reproducibility tooling
- ../../resources/official-source-map.md: archive format-rule sources

npx claudepluginhub brycewang-stanford/awesome-journal-skills --plugin jape-skills