This skill covers applied microeconomic empirical methods and research design. Use when the user is selecting an identification strategy, comparing estimators, running diagnostics, designing a research study, or evaluating an empirical strategy. Triggers on "which method", "what estimator", "how to choose", "method comparison", "empirical strategy", "research design", "applied micro", "identification strategy", "power analysis", "design-based", "model-based", "minimum detectable effect", "specification".
Install: `npx claudepluginhub james-traina/science-plugins --plugin compound-science`

This skill uses the workspace's default tool permissions.
Reference for applied micro research design: method selection, diagnostics, inference, pitfalls, reporting standards, and power analysis.
Use when the user is selecting an identification strategy, comparing estimators, running diagnostics, designing a research study, running a power analysis, or evaluating an empirical strategy (see references/data-sources.md for FRED/World Bank API access).

Skip when the task belongs to a more specific skill:
- causal-inference skill (for IV, DiD, RDD, SC, matching)
- structural-modeling skill
- submission-guide skill
- identification-proofs skill
- bayesian-estimation skill

After selecting a method, the econometric-reviewer agent can review the implementation and the identification-critic agent can evaluate the identification argument.
Start with the fundamental question: What source of variation identifies the causal effect?
| Source of Variation | Method Family | Key Assumption |
|---|---|---|
| Randomized assignment (with full compliance) | Experimental analysis (OLS on treatment indicator) | Random assignment |
| Randomized assignment (with imperfect compliance) | IV / 2SLS using random assignment as instrument | Exclusion restriction, monotonicity |
| Policy change at a sharp threshold | Sharp RDD | Continuity of potential outcomes at cutoff |
| Policy change at a threshold with imperfect compliance | Fuzzy RDD (= IV at the cutoff) | Continuity + monotonicity at cutoff |
| Policy change at a point in time, with affected and unaffected groups | Difference-in-differences | Parallel trends |
| Staggered policy adoption across units over time | Staggered DiD (Callaway-Sant'Anna, Sun-Abraham, etc.) | Parallel trends (conditional on group and time) |
| Rare event affecting a single unit, long pre-treatment data | Synthetic control | Pre-treatment fit implies post-treatment counterfactual |
| Exogenous shifter of treatment that does not affect outcome directly | IV / 2SLS / GMM | Exclusion restriction, relevance, monotonicity |
| Rich set of observables that plausibly captures all confounders | Matching, IPW, AIPW (selection on observables) | Conditional independence (no unobserved confounders) |
| No credible exogenous variation | Sensitivity analysis, bounds, partial identification | Depends on bounding assumptions |
Within DiD:
Is treatment timing staggered?
├── No → Classic 2x2 DiD (TWFE is fine)
└── Yes
├── Can treatment turn off (reversals)?
│ ├── Yes → de Chaisemartin-D'Haultfoeuille (2020)
│ └── No
│ ├── Do you have never-treated units?
│ │ ├── Yes → Callaway-Sant'Anna (2021) with never-treated controls
│ │ └── No → Callaway-Sant'Anna with not-yet-treated controls
│ │ or Sun-Abraham (2021)
│ └── Are effects likely heterogeneous across cohorts?
│ ├── Yes → Callaway-Sant'Anna or Sun-Abraham (NOT TWFE)
│ └── No → TWFE is OK, but report Bacon decomposition
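The tree above can be encoded as a small helper for quick sanity checks (a hypothetical function written for illustration, not part of any package):

```python
def choose_did_estimator(staggered, reversals=False, never_treated=False,
                         heterogeneous=True):
    """Map the staggered-DiD decision tree to an estimator recommendation."""
    if not staggered:
        return "classic 2x2 DiD (TWFE)"
    if reversals:
        # Treatment can switch off: need an estimator robust to reversals
        return "de Chaisemartin-D'Haultfoeuille (2020)"
    if not heterogeneous:
        return "TWFE, but report the Bacon decomposition"
    controls = "never-treated" if never_treated else "not-yet-treated"
    return f"Callaway-Sant'Anna ({controls} controls) or Sun-Abraham (2021)"
```

The default `heterogeneous=True` reflects the safe assumption that effects differ across cohorts unless shown otherwise.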
Within IV:
How many instruments for how many endogenous regressors?
├── Exactly identified (K instruments = K endogenous)
│ └── 2SLS (= IV = Wald estimator for single instrument)
├── Over-identified (K instruments > K endogenous)
│ ├── 2SLS (default)
│ ├── GMM (efficient, use if heteroskedasticity suspected)
│ └── LIML (less biased with weak instruments)
└── Under-identified (K instruments < K endogenous)
└── Cannot identify all parameters — need more instruments or fewer endogenous regressors
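The identity noted in the exactly identified branch (2SLS = IV = Wald for a single instrument) can be verified numerically; the data-generating process below is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.integers(0, 2, n)                   # binary instrument (e.g., a lottery offer)
u = rng.normal(size=n)                      # unobserved confounder
d = (0.4 * z + 0.3 * u + rng.normal(size=n) > 0).astype(float)  # endogenous take-up
y = 2.0 * d + u + rng.normal(size=n)        # true effect = 2

# Wald estimator: ratio of mean differences across instrument arms
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

# IV estimator: cov(z, y) / cov(z, d) -- algebraically identical with one binary instrument
iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(wald, iv)  # the two numbers agree to floating-point precision
```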
Within RDD:
Does crossing the threshold guarantee treatment?
├── Yes → Sharp RDD
└── No → Fuzzy RDD
└── Is the running variable continuous?
├── Yes → Standard rdrobust
└── No (discrete / few mass points)
└── Cattaneo-Idrobo-Titiunik (2019) discrete RD methods
Within Matching / Selection on Observables:
Is the selection-on-observables assumption plausible?
├── No → Need a different identification strategy
└── Yes
├── Do you need ATE or ATT?
│ ├── ATE → IPW or AIPW
│ └── ATT → Matching or IPW with ATT weights
├── Is the propensity score model well-specified?
│ ├── Uncertain → Use AIPW (doubly robust)
│ └── Confident → IPW or regression adjustment
└── Many covariates or nonlinear confounding?
├── Yes → ML-based methods (causal forests, DML)
└── No → Parametric PS model + AIPW
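As a minimal sketch of the IPW branch, using a known propensity score to keep the example short (in practice the score is estimated, e.g. by logit, and the parameters below are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
x = rng.normal(size=n)                      # confounder
p = 1 / (1 + np.exp(-x))                    # true propensity score (known here)
t = rng.binomial(1, p)                      # treatment, more likely for high x
y = 2.0 * t + x + rng.normal(size=n)        # true ATE = 2; x confounds

naive = y[t == 1].mean() - y[t == 0].mean()                 # biased by confounding
ipw = np.mean(t * y / p) - np.mean((1 - t) * y / (1 - p))   # Horvitz-Thompson IPW

print(naive, ipw)  # naive is biased upward; IPW recovers roughly 2
```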
Key diagnostics to run for each method family. For full reporting checklists and minimum standards, see references/reporting-standards.md.
| Method | Must-Run Diagnostics | Key Concern |
|---|---|---|
| IV / 2SLS | First-stage F (KP), reduced form, overid test | Weak instruments (F < 10), exclusion restriction |
| DiD (classic) | Pre-trend F-test, event study plot, raw means by group/period | Parallel trends violation |
| Staggered DiD | Bacon decomposition, Callaway-Sant'Anna group-time ATTs | Negative TWFE weights with heterogeneous effects |
| RDD | McCrary density test, covariate balance at cutoff, bandwidth sensitivity | Manipulation of running variable, extrapolation bias |
| Synthetic Control | Pre-fit RMSPE, permutation p-value, leave-one-out | Pre-period fit quality, donor pool sensitivity |
| Matching / AIPW | Overlap plots, Love plot (SMD before/after), Oster/Rosenbaum bounds | Lack of overlap, unobserved confounders |
| Structural | Convergence, identification rank condition, robustness to starting values | Global vs local optimum, identification failure |
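For the IV row, the simplest one-instrument first-stage F can be computed by hand (a homoskedastic sketch on simulated data; in practice report the Kleibergen-Paap statistic from a canned routine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)                  # instrument
d = 0.5 * z + rng.normal(size=n)        # first stage: D on Z

# OLS of d on [1, z]
X = np.column_stack([np.ones(n), z])
beta, *_ = np.linalg.lstsq(X, d, rcond=None)
resid = d - X @ beta
sigma2 = resid @ resid / (n - 2)
var_beta = sigma2 * np.linalg.inv(X.T @ X)
t_z = beta[1] / np.sqrt(var_beta[1, 1])
F = t_z ** 2                            # with one instrument, first-stage F = t^2

print(F)  # well above the F < 10 red flag in this strong-instrument simulation
```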
For implementation details and diagnostic code by method, see the causal-inference skill.
| Mistake | Consequence | Fix |
|---|---|---|
| Clustering too fine (individual when treatment is at state level) | SEs too small; over-rejection | Cluster at the level of treatment assignment |
| Few clusters (< 30–40) with standard cluster-robust SEs | Poor finite-sample properties | Wild cluster bootstrap |
| Not clustering when treatment varies at group level | SEs dramatically understated | Always cluster at level of treatment assignment |
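The first and third rows can be demonstrated directly: with treatment assigned at the cluster level and a common cluster shock, heteroskedasticity-robust SEs badly understate uncertainty (simulated sketch; the CRV0 sandwich below omits small-sample corrections):

```python
import numpy as np

rng = np.random.default_rng(5)
G, m = 40, 50                                  # 40 clusters of 50 units
cluster = np.repeat(np.arange(G), m)
treat = np.repeat(rng.integers(0, 2, G), m)    # treatment assigned at cluster level
# Cluster-level shock + idiosyncratic noise; no true treatment effect
y = 0.0 * treat + np.repeat(rng.normal(size=G), m) + rng.normal(size=G * m)

X = np.column_stack([np.ones(G * m), treat])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Heteroskedasticity-robust (HC0): ignores within-cluster correlation
V_hc = XtX_inv @ ((X.T * e**2) @ X) @ XtX_inv
# Cluster-robust (CRV0): sum of X_g' e_g e_g' X_g over clusters
meat = np.zeros((2, 2))
for g in range(G):
    idx = cluster == g
    s = X[idx].T @ e[idx]
    meat += np.outer(s, s)
V_cl = XtX_inv @ meat @ XtX_inv

print(np.sqrt(V_hc[1, 1]), np.sqrt(V_cl[1, 1]))  # cluster SE is several times larger
```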
| Dimension | Design-Based | Model-Based |
|---|---|---|
| Source of randomness | Treatment assignment mechanism | Outcome draws from a superpopulation |
| Key assumption | Known or modeled treatment assignment | Correct outcome model specification |
| Examples | Experiments, RCTs, RDD, DiD, natural experiments | Structural models, matching, cross-sectional surveys |
| Advantages | Transparent; does not require outcome model | More powerful; extends to complex settings |
Design-based inference is appropriate when the assignment mechanism is known (experiments, lotteries, cutoffs); model-based inference is appropriate when random sampling from a superpopulation is reasonable. The standard in applied micro is a hybrid: design-based identification combined with model-based inference. Doubly robust methods (AIPW) combine both.
The key quantity is the Minimum Detectable Effect (MDE) — the smallest effect detectable with 80% power at alpha = 0.05.
Quick MDE formula (equal groups, two-sided test):
MDE = 2.8 × sigma / sqrt(N)
Required N = (2.8 × sigma / MDE)²
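The quick formula translates directly into code (2.8 ≈ 1.96 + 0.84, i.e. 80% power at alpha = 0.05; sigma is the outcome SD):

```python
import math

def mde(sigma, n):
    """Minimum detectable effect: equal groups, 80% power, alpha = 0.05."""
    return 2.8 * sigma / math.sqrt(n)

def required_n(sigma, target_mde):
    """Total sample size needed to detect target_mde."""
    return math.ceil((2.8 * sigma / target_mde) ** 2)

print(mde(1.0, 784))         # 0.1 SD detectable with N = 784
print(required_n(1.0, 0.1))  # ~784 units for a 0.1 SD effect
```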
For IV designs, the effective MDE is inflated by the inverse of the first-stage coefficient: MDE_IV ≈ MDE_OLS / |pi|. A weak first stage (small pi) dramatically reduces power.
For DiD designs, effective power increases with more post-treatment periods and higher within-group correlation (absorbed by FEs). For RDD, use effective N (observations within bandwidth), not total N.
For cluster-randomized designs, the design effect (1 + (m-1) × ICC) inflates variance — with ICC = 0.05 and cluster size m = 50, you need 3.45x as many observations.
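Both adjustments above can be checked numerically (a sketch using the same 2.8 rule of thumb; pi is the first-stage coefficient, icc the intracluster correlation, m the cluster size):

```python
import math

def mde_iv(sigma, n, pi):
    """IV MDE: the OLS MDE inflated by 1 / |first-stage coefficient|."""
    return (2.8 * sigma / math.sqrt(n)) / abs(pi)

def design_effect(m, icc):
    """Variance inflation from cluster randomization with cluster size m."""
    return 1 + (m - 1) * icc

print(design_effect(50, 0.05))  # 3.45: need ~3.45x the observations
print(mde_iv(1.0, 1000, 0.5))   # twice the OLS MDE when pi = 0.5
```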
For full MDE formulas (DiD, IV, RDD, cluster-randomized), power simulation code, and MDE interpretation tables, see references/reporting-standards.md.
A "bad control" is a variable that is itself an outcome of treatment. Conditioning on it introduces selection bias.
| Variable Type | Example | Why It Is Bad |
|---|---|---|
| Post-treatment outcome | Controlling for occupation when estimating returns to education | Education affects occupation; conditioning selects on an outcome of treatment |
| Mediator | Controlling for wages when estimating effect of training on employment | Blocks part of the causal effect |
| Collider | Conditioning on "survived" when estimating health effects | Opens a non-causal path |
Rule of thumb: If you cannot be sure a variable is determined before treatment, do not include it as a control. When in doubt, draw the DAG.
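The collider row can be demonstrated in a few lines: X and Y are independent by construction, yet selecting on a common outcome (here, "survival") induces a spurious negative association (illustrative simulation with made-up parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=n)                       # "treatment"
y = rng.normal(size=n)                       # independent of x by construction
c = x + y + rng.normal(scale=0.5, size=n)    # collider: caused by both

r_all = np.corrcoef(x, y)[0, 1]              # unconditional: essentially zero
keep = c > 1.0                               # condition on the collider ("survivors")
r_cond = np.corrcoef(x[keep], y[keep])[0, 1]

print(r_all, r_cond)  # near zero, then clearly negative
```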
| Mistake | Consequence | Fix |
|---|---|---|
| Running TWFE with staggered timing | Already-treated units used as controls; negative weights; estimate can have wrong sign | Use Callaway-Sant'Anna, Sun-Abraham, or other modern DiD estimator |
| Using single post-treatment indicator for all cohorts | Masks heterogeneity in treatment effects across cohorts | Estimate group-time ATTs separately, then aggregate |
| Not reporting the Bacon decomposition | Reader cannot assess how much of the TWFE estimate comes from problematic comparisons | Report bacondecomp output |
Never plug a manual first-stage into an OLS second stage (SEs are wrong — use proper 2SLS). Never use a nonlinear first stage with linear second stage (not consistent — use control function). Never include generated regressors without bootstrapping the full two-step procedure.
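The first rule can be verified directly: the manual two-step point estimate coincides with 2SLS, but naive second-stage residuals (computed against the fitted first stage) give the wrong standard error (homoskedastic formulas, single demeaned instrument, simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # confounder
d = 1.0 * z + u                              # first stage, pi = 1
y = 1.0 * d + 1.5 * u + rng.normal(size=n)   # beta = 1; d is endogenous via u

# Manual first stage, then second stage on fitted values
pi_hat = (z @ d) / (z @ z)
d_hat = pi_hat * z
beta_hat = (d_hat @ y) / (d_hat @ d_hat)     # identical to the 2SLS point estimate

# Correct 2SLS residuals use actual d; the naive two-step uses d_hat
resid_2sls = y - beta_hat * d
resid_naive = y - beta_hat * d_hat
se_2sls = np.sqrt((resid_2sls @ resid_2sls / (n - 1)) / (d_hat @ d_hat))
se_naive = np.sqrt((resid_naive @ resid_naive / (n - 1)) / (d_hat @ d_hat))

print(beta_hat, se_2sls, se_naive)  # same beta; naive SE is badly off (inflated here)
```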
For full minimum reporting standards (method-specific checklists for IV, DiD, RDD, SC, Matching) and complete power analysis code, see references/reporting-standards.md. For sensitivity analysis procedures (Oster bounds, Conley bounds, breakdown frontiers, specification curves), see references/sensitivity-analysis.md.
Agents:
- econometric-reviewer: Reviews identification strategy, standard errors, and diagnostic results
- identification-critic: Evaluates identification argument completeness and exclusion restrictions
- numerical-auditor: Designs power simulations for nonstandard study designs
- journal-referee: Reviews whether the empirical strategy meets journal standards

Cross-references:
- identification-proofs skill: Formalize an identification argument for the chosen method
- references/diagnostic-battery.md: Run the full diagnostic battery for the estimated specification
- references/sensitivity-analysis.md: Run sensitivity analysis (Oster bounds, specification curve, breakdown frontier)
- publication-output skill: Format regression tables and diagnostic output for publication