From great-econometrics
Econometrics skill for Regression Discontinuity Design (RDD). Activates when the user asks about: "regression discontinuity", "RDD", "RD design", "sharp RDD", "fuzzy RDD", "running variable", "forcing variable", "cutoff", "bandwidth selection", "local linear regression", "McCrary test", "density test", "RDROBUST", "continuity assumption", "donut hole RDD", "geographic RDD", "断点回归", "回归不连续", "运行变量", "截断值", "带宽选择", "精确断点", "模糊断点", "密度检验", "局部线性回归"
npx claudepluginhub zhouziyue233/great-econometrics --plugin econometrics
This skill uses the workspace's default tool permissions.
This skill covers sharp and fuzzy RDD: identification assumptions, bandwidth selection, local polynomial estimation, validity tests, and reporting standards for academic papers.
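RDD exploits a known threshold in a continuous "running variable" (X) that determines treatment assignment. Provided units cannot precisely manipulate X, those just above and just below the cutoff (c) are comparable in expectation on everything except treatment, so a jump in the outcome at c identifies a local treatment effect.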
Sharp RDD: Treatment perfectly determined by crossing cutoff
Fuzzy RDD: Crossing cutoff increases probability of treatment (like an instrument)
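To make the sharp/fuzzy distinction concrete, here is a minimal simulation sketch. The sample size, the cutoff at 0, and the take-up probabilities (0.2 below, 0.8 above) are illustrative assumptions, not values from any real study.

```python
import numpy as np

rng = np.random.default_rng(0)
n, cutoff = 2000, 0.0
x = rng.uniform(-1, 1, n)      # running variable
above = x >= cutoff            # indicator for crossing the cutoff

# Sharp design: treatment is a deterministic function of crossing the cutoff
d_sharp = above.astype(int)

# Fuzzy design: crossing only raises the treatment probability (assumed 0.2 -> 0.8)
p_treat = np.where(above, 0.8, 0.2)
d_fuzzy = rng.binomial(1, p_treat)

print("Sharp take-up below/above:", d_sharp[~above].mean(), d_sharp[above].mean())
print("Fuzzy take-up below/above:", d_fuzzy[~above].mean(), d_fuzzy[above].mean())
```

In the sharp case take-up jumps from 0 to 1 at the cutoff; in the fuzzy case it jumps by less than 1, which is why the crossing indicator acts as an instrument rather than as the treatment itself.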
Always plot the raw data with binned means before any regression.
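# Python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Bin the running variable
# Note: equal-width bins may straddle the cutoff; pass explicit bin edges that include it if needed
df['bin'] = pd.cut(df['running_var'], bins=50)
bin_means = df.groupby('bin', observed=True)[['running_var', 'y']].mean().reset_index()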
plt.figure(figsize=(10, 6))
plt.scatter(bin_means['running_var'], bin_means['y'], s=30, color='steelblue')
plt.axvline(x=cutoff, color='red', linestyle='--', label='Cutoff')
plt.xlabel('Running Variable'); plt.ylabel('Outcome')
plt.title('RDD: Binned Scatter Plot')
plt.legend(); plt.show()
# R (rdplot from rdrobust)
library(rdrobust)
rdplot(y = df$y, x = df$running_var, c = cutoff,
title = "RDD Binned Scatter", x.label = "Running Variable",
y.label = "Outcome")
* Stata
rdplot y running_var, c(cutoff) graph_options(title("RDD Visualization"))
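Default: use a data-driven, MSE-optimal bandwidth: Imbens-Kalyanaraman (IK) or Calonico-Cattaneo-Titiunik (CCT). Recent versions of rdrobust and rdbwselect implement the CCT-type selectors and apply the MSE-optimal one when no bandwidth is supplied.
# Python
from rdrobust import rdbwselect
bw = rdbwselect(y=df['y'], x=df['running_var'], c=cutoff)
print(bw)  # the returned object prints its own summary table
# R
rdbwselect(y = df$y, x = df$running_var, c = cutoff)
* Stata
rdbwselect y running_var, c(cutoff) all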
# Python (rdrobust): triangular kernel, local linear
from rdrobust import rdrobust
result = rdrobust(y=df['y'], x=df['running_var'], c=cutoff,
kernel='triangular', p=1)
print(result)  # the returned object prints its own summary table
# R
main_rdd <- rdrobust(y = df$y, x = df$running_var, c = cutoff,
kernel = "triangular", p = 1)
summary(main_rdd)
* Stata
rdrobust y running_var, c(cutoff) kernel(triangular) p(1)
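To show what a local linear fit with a triangular kernel computes, the sketch below implements the point estimate by hand with NumPy on simulated data. It is not the rdrobust implementation: it takes the bandwidth `h` as given and omits bias correction and robust standard errors. The function name `local_linear_rd` and the simulated data-generating process are assumptions made for illustration.

```python
import numpy as np

def local_linear_rd(y, x, cutoff, h):
    """Sharp RD jump at `cutoff` from triangular-kernel local linear fits."""
    y = np.asarray(y, dtype=float)
    xc = np.asarray(x, dtype=float) - cutoff     # center the running variable
    w = np.clip(1 - np.abs(xc) / h, 0, None)     # triangular kernel weights

    def intercept(mask):
        # Weighted least squares of y on (1, xc), via sqrt-weight rescaling
        sw = np.sqrt(w[mask])
        X = np.column_stack([np.ones(mask.sum()), xc[mask]])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[mask] * sw, rcond=None)
        return beta[0]                           # fitted value at the cutoff

    right = (xc >= 0) & (w > 0)
    left = (xc < 0) & (w > 0)
    return intercept(right) - intercept(left)

# Simulated sharp design with an assumed true jump of 2.0 at cutoff 0
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 4000)
y = 1.0 + 0.5 * x + 2.0 * (x >= 0) + rng.normal(0, 0.3, x.size)

tau_hat = local_linear_rd(y, x, cutoff=0.0, h=0.3)
print(f"Estimated jump at the cutoff: {tau_hat:.3f}")
```

The estimate is the difference between the two fitted intercepts at the cutoff. For inference, rely on rdrobust's robust bias-corrected output rather than this hand-rolled version.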
H₀: No discontinuity in density of running variable at cutoff
# Python (rddensity is a separate package from rdrobust: pip install rddensity)
from rddensity import rddensity
density_test = rddensity(df['running_var'], c=cutoff)
print(density_test)  # the returned object prints its own results table
# R
library(rddensity)
rdd_density <- rddensity(df$running_var, c = cutoff)
summary(rdd_density)
rdplotdensity(rdd_density, df$running_var)
* Stata
rddensity running_var, c(cutoff)
Interpretation: p > 0.05 → fail to reject H₀; no statistical evidence of bunching, so manipulation is unlikely ✓
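As a rough intuition check for what a density test looks for, the sketch below runs a simple binomial count test in a small window around the cutoff. This is a swapped-in, cruder technique, not the McCrary or rddensity estimator: it only asks whether observations near the cutoff split evenly between the two sides. The function name, the window width, and the simulated manipulation are illustrative assumptions.

```python
import math
import numpy as np

def window_binomial_test(x, cutoff, window):
    """Two-sided normal-approximation test of an even split within +/- window."""
    x = np.asarray(x, dtype=float)
    n_left = int(np.sum((x >= cutoff - window) & (x < cutoff)))
    n_right = int(np.sum((x >= cutoff) & (x <= cutoff + window)))
    n = n_left + n_right
    if n == 0:
        raise ValueError("no observations inside the window")
    z = (n_right - n / 2) / math.sqrt(n / 4)     # H0: P(right side) = 0.5
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return n_left, n_right, p_value

rng = np.random.default_rng(2)
x_smooth = rng.uniform(-1, 1, 5000)              # no manipulation

# Assumed manipulation: units just below the cutoff push themselves above it
near_below = (x_smooth >= -0.05) & (x_smooth < 0)
x_manip = np.where(near_below, x_smooth + 0.05, x_smooth)

res_smooth = window_binomial_test(x_smooth, cutoff=0.0, window=0.1)
res_manip = window_binomial_test(x_manip, cutoff=0.0, window=0.1)
print("smooth:      n_left, n_right, p =", res_smooth)
print("manipulated: n_left, n_right, p =", res_manip)
```

The manipulated sample piles up just above the cutoff and the test rejects an even split. Unlike rddensity, this check ignores any slope in the density near the cutoff, so treat it as a sanity check only.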
Run RDD on pre-determined covariates — should find no discontinuity.
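# R
for (cov in c("age", "income_pre", "gender")) {
# Treat each pre-determined covariate as the outcome
res <- rdrobust(y = df[[cov]], x = df$running_var, c = cutoff)
# coef[1] is the conventional estimate; pv[3] is the robust p-value
cat(cov, ": coef =", res$coef[1], ", p =", res$pv[3], "\n")
}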
Run RDD at fake cutoffs above and below actual cutoff — should find no effects.
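# R
for (fake_c in c(cutoff - 5, cutoff + 5)) {
# Keep only the side of the true cutoff that contains the fake cutoff,
# so the real discontinuity cannot contaminate the placebo estimate
df_sub <- if (fake_c < cutoff) df[df$running_var < cutoff, ] else df[df$running_var >= cutoff, ]
res <- rdrobust(df_sub$y, df_sub$running_var, c = fake_c)
cat("Placebo c =", fake_c, ": coef =", res$coef[1], "\n")
}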
Report estimates at 50%, 75%, 125%, and 150% of the optimal bandwidth, alongside the baseline (100%) estimate.
bw_opt <- rdbwselect(df$y, df$running_var, c = cutoff)$bws[1,1]
for (mult in c(0.5, 0.75, 1, 1.25, 1.5)) {
res <- rdrobust(df$y, df$running_var, c = cutoff, h = bw_opt * mult)
cat("BW =", round(bw_opt*mult,2), ": coef =", round(res$coef[1],3),
", p =", round(res$pv[3],3), "\n")
}
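The same bandwidth sweep can be sketched in Python without rdrobust. The example below uses simulated data and a hand-written triangular-kernel local linear estimator. The value `h_opt = 0.3` is a hypothetical placeholder for whatever the bandwidth selector returns, and `rd_jump` is an illustrative helper, not a library function.

```python
import numpy as np

def rd_jump(y, x, cutoff, h):
    """Difference in local linear intercepts at the cutoff (triangular kernel)."""
    xc = x - cutoff
    k = np.clip(1 - np.abs(xc) / h, 0, None)     # triangular kernel weights
    intercepts = []
    for side in (xc >= 0, xc < 0):
        m = side & (k > 0)
        # np.polyfit weights multiply residuals, so pass sqrt of kernel weights
        slope, intercept = np.polyfit(xc[m], y[m], 1, w=np.sqrt(k[m]))
        intercepts.append(intercept)
    return intercepts[0] - intercepts[1]

# Simulated sharp design with an assumed true jump of 2.0 at cutoff 0
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 4000)
y = 1.0 + 0.5 * x + 2.0 * (x >= 0) + rng.normal(0, 0.3, x.size)

h_opt = 0.3  # hypothetical; in practice taken from the bandwidth selector
estimates = {}
for mult in (0.5, 0.75, 1.0, 1.25, 1.5):
    estimates[mult] = rd_jump(y, x, cutoff=0.0, h=h_opt * mult)
    print(f"BW = {h_opt * mult:.3f}: coef = {estimates[mult]:.3f}")
```

Estimates that stay close to each other across multipliers suggest the result is not driven by one particular bandwidth choice.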
# R — fuzzy RDD (uses crossing as instrument for actual treatment)
fuzzy_rdd <- rdrobust(y = df$y, x = df$running_var, c = cutoff,
fuzzy = df$actual_treatment)
summary(fuzzy_rdd)
* Stata
rdrobust y running_var, c(cutoff) fuzzy(actual_treatment)
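The fuzzy estimate can be read as a ratio: the jump in the outcome at the cutoff divided by the jump in treatment take-up. The sketch below computes that ratio by hand on simulated data. The helper `jump_at_cutoff`, the fixed bandwidth, the take-up probabilities, and the true effect of 2.0 are all illustrative assumptions; rdrobust additionally provides bias correction and valid standard errors.

```python
import numpy as np

def jump_at_cutoff(v, x, cutoff, h):
    """Local linear (triangular kernel) discontinuity of v at the cutoff."""
    xc = x - cutoff
    k = np.clip(1 - np.abs(xc) / h, 0, None)
    intercepts = []
    for side in (xc >= 0, xc < 0):
        m = side & (k > 0)
        # np.polyfit weights multiply residuals, so pass sqrt of kernel weights
        slope, intercept = np.polyfit(xc[m], v[m], 1, w=np.sqrt(k[m]))
        intercepts.append(intercept)
    return intercepts[0] - intercepts[1]

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 6000)
above = x >= 0
# Assumed take-up: 20% below the cutoff, 80% above (first stage of 0.6)
d = rng.binomial(1, np.where(above, 0.8, 0.2)).astype(float)
# Assumed constant treatment effect of 2.0
y = 1.0 + 0.5 * x + 2.0 * d + rng.normal(0, 0.3, x.size)

h = 0.3
reduced_form = jump_at_cutoff(y, x, 0.0, h)   # jump in the outcome
first_stage = jump_at_cutoff(d, x, 0.0, h)    # jump in take-up
tau_fuzzy = reduced_form / first_stage
print(f"reduced form = {reduced_form:.3f}, first stage = {first_stage:.3f}, "
      f"fuzzy estimate = {tau_fuzzy:.3f}")
```

A small first-stage jump makes the ratio unstable, so it is worth reporting the first stage alongside the fuzzy estimate.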
Report in this order:
Key sentence template for papers:
"We estimate the RDD using a local linear regression with a triangular kernel and the CCT optimal bandwidth (h = [X]). The point estimate at the cutoff is [β] (SE = [se], p = [p])."
See references/rdd-reference.md for geographic RDD, kink designs, donut-hole robustness, discrete running variable RDD, and multi-cutoff/multi-score RDD.