Weighted pandas DataFrames for survey microdata analysis - inequality, poverty, and distributional calculations. Triggers: "weighted mean", "Gini", "poverty rate", "inequality", "MicroDataFrame", "MicroSeries", "weighted statistics", "decile", "quintile", "income distribution", "microdf"
MicroDF provides weighted pandas DataFrames and Series for analyzing survey microdata, with built-in support for inequality and poverty calculations.
When you see poverty rates, Gini coefficients, or distributional charts in PolicyEngine, those are calculated using MicroDF.
MicroDF powers:
Gini coefficient:
Poverty rates:
Percentiles:
uv pip install microdf-python
import microdf as mdf
import pandas as pd
# Create sample data
df = pd.DataFrame({
    'income': [10000, 20000, 30000, 40000, 50000],
    'weights': [1, 2, 3, 2, 1]
})
# Create MicroDataFrame
mdf_df = mdf.MicroDataFrame(df, weights='weights')
# All operations are weight-aware
print(f"Weighted mean: ${mdf_df.income.mean():,.0f}")
print(f"Gini coefficient: {mdf_df.income.gini():.3f}")
Weighted statistics:
mdf_df.income.mean() # Weighted mean
mdf_df.income.median() # Weighted median
mdf_df.income.sum() # Weighted sum
mdf_df.income.std() # Weighted standard deviation
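For reference, the weighted statistics above reduce to simple formulas. The sketch below is plain Python rather than MicroDF's implementation: the helper names are hypothetical, and the standard deviation uses a population-style formula (no degrees-of-freedom correction), which is an assumption and may differ from the convention MicroDF applies.

```python
import math

def weighted_sum(values, weights):
    """Population total: each record counts `weight` times."""
    return sum(v * w for v, w in zip(values, weights))

def weighted_mean(values, weights):
    """Weighted sum divided by the total weight."""
    return weighted_sum(values, weights) / sum(weights)

def weighted_std(values, weights):
    """Population-style weighted standard deviation (assumed convention)."""
    mu = weighted_mean(values, weights)
    variance = sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(variance)

# Same sample data as the quick-start example
incomes = [10000, 20000, 30000, 40000, 50000]
weights = [1, 2, 3, 2, 1]

total = weighted_sum(incomes, weights)
mean = weighted_mean(incomes, weights)
std = weighted_std(incomes, weights)
print(f"sum={total:,} mean={mean:,.0f} std={std:,.0f}")
```

Working through the formulas by hand on a tiny dataset like this is a quick way to sanity-check results from a real survey file.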
Inequality metrics:
mdf_df.income.gini() # Gini coefficient
mdf_df.income.top_x_pct_share(10) # Top 10% share
mdf_df.income.top_x_pct_share(1) # Top 1% share
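To make the inequality metrics concrete, here is a minimal plain-Python sketch of a weighted Gini (half the weighted relative mean absolute difference) and a top-share calculation. The function names are hypothetical, and both definitions are assumptions chosen for clarity; MicroDF may use a different formula or treat the record that straddles the cutoff differently, so small discrepancies are possible.

```python
def weighted_gini(values, weights):
    """Gini as half the weighted relative mean absolute difference (O(n^2))."""
    total_weight = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total_weight
    abs_diff = sum(
        wi * wj * abs(vi - vj)
        for vi, wi in zip(values, weights)
        for vj, wj in zip(values, weights)
    )
    return abs_diff / (2 * total_weight ** 2 * mean)

def top_share(values, weights, top_fraction):
    """Share of total income held by the richest `top_fraction` of the
    weighted population; the boundary record is split proportionally."""
    total_weight = sum(weights)
    total_income = sum(v * w for v, w in zip(values, weights))
    remaining = top_fraction * total_weight  # population weight still to allocate
    top_income = 0.0
    for value, weight in sorted(zip(values, weights), reverse=True):
        take = min(weight, remaining)
        top_income += value * take
        remaining -= take
        if remaining <= 0:
            break
    return top_income / total_income

incomes = [10000, 20000, 30000, 40000, 50000]
weights = [1, 2, 3, 2, 1]

gini = weighted_gini(incomes, weights)
share = top_share(incomes, weights, 0.10)
print(f"Gini: {gini:.3f}, top 10% share: {share:.1%}")
```

The pairwise Gini is quadratic in the number of records, so it is only suitable for illustration; for real microdata, use the library methods.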
Poverty analysis:
# Poverty rate (income < threshold)
poverty_rate = mdf_df.poverty_rate(
    income_measure='income',
    threshold=poverty_line
)
# Poverty gap (how far below threshold)
poverty_gap = mdf_df.poverty_gap(
    income_measure='income',
    threshold=poverty_line
)
# Deep poverty (income < 50% of threshold)
deep_poverty_rate = mdf_df.deep_poverty_rate(
    income_measure='income',
    threshold=poverty_line,
    deep_poverty_line=0.5
)
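The three poverty measures can be sketched in plain Python as follows. This is an illustration of the definitions given in the comments above, not MicroDF's code: the function names, the `poverty_line` value of 25,000, and the choice to report the gap as an aggregate weighted shortfall are all assumptions.

```python
def poverty_rate(incomes, weights, threshold):
    """Weighted share of records with income strictly below the threshold."""
    poor_weight = sum(w for y, w in zip(incomes, weights) if y < threshold)
    return poor_weight / sum(weights)

def poverty_gap(incomes, weights, threshold):
    """Aggregate weighted shortfall below the threshold (assumed definition)."""
    return sum(w * max(threshold - y, 0) for y, w in zip(incomes, weights))

def deep_poverty_rate(incomes, weights, threshold, deep_poverty_line=0.5):
    """Poverty rate measured against a fraction of the threshold."""
    return poverty_rate(incomes, weights, threshold * deep_poverty_line)

incomes = [10000, 20000, 30000, 40000, 50000]
weights = [1, 2, 3, 2, 1]
poverty_line = 25000  # hypothetical threshold for illustration

rate = poverty_rate(incomes, weights, poverty_line)
gap = poverty_gap(incomes, weights, poverty_line)
deep_rate = deep_poverty_rate(incomes, weights, poverty_line)
print(f"rate={rate:.1%} gap={gap:,} deep={deep_rate:.1%}")
```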
Quantiles:
# Deciles
mdf_df.income.decile_values()
# Quintiles
mdf_df.income.quintile_values()
# Custom quantiles
mdf_df.income.quantile(0.25) # 25th percentile
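A weighted quantile can be read off the cumulative weight distribution. The sketch below uses the simplest convention (the smallest value whose cumulative weight reaches the target share, with no interpolation); the function name is hypothetical, and MicroDF may interpolate or break ties differently.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (no interpolation)."""
    pairs = sorted(zip(values, weights))
    target = q * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]

incomes = [10000, 20000, 30000, 40000, 50000]
weights = [1, 2, 3, 2, 1]

p25 = weighted_quantile(incomes, weights, 0.25)
median = weighted_quantile(incomes, weights, 0.50)
# Decile cut points: the 10th, 20th, ..., 90th weighted percentiles
deciles = [weighted_quantile(incomes, weights, k / 10) for k in range(1, 10)]
print(p25, median, deciles)
```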
# Extract a Series with weights
income_series = mdf_df.income # This is a MicroSeries
# MicroSeries operations
income_series.mean()
income_series.gini()
income_series.percentile(50)
WARNING: .values and .to_numpy() strip weights. These methods now emit a UserWarning because they return plain numpy arrays where operations like .mean() are unweighted. Always use MicroSeries methods directly for weighted calculations:
ms = mdf_df.income  # any MicroSeries
# ❌ WRONG - strips weights, .mean() is unweighted
ms.values.mean()
ms.to_numpy().mean()
# ✅ CORRECT - weighted automatically
ms.mean()
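How much the answer changes when weights are dropped depends on how uneven the weights are. A small plain-Python illustration, using made-up numbers where one low-income record represents nine times as many people as the high-income record:

```python
from statistics import mean

# Hypothetical two-record survey: the low-income record stands for 9 people,
# the high-income record for 1.
incomes = [10000, 50000]
weights = [9, 1]

unweighted = mean(incomes)  # what an array without weights would give
weighted = sum(y * w for y, w in zip(incomes, weights)) / sum(weights)

print(f"Unweighted mean: {unweighted:,.0f}")
print(f"Weighted mean:   {weighted:,.0f}")
```

Here the unweighted figure is more than double the weighted one, which is why calling `.mean()` on a stripped array can badly misstate population averages.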
import microdf as mdf
import pandas as pd
from policyengine_us import Simulation
# Run simulation with axes (multiple households)
situation_with_axes = {...} # See policyengine-us-skill
sim = Simulation(situation=situation_with_axes)
# Get results as arrays
incomes = sim.calculate("household_net_income", 2026)
weights = sim.calculate("household_weight", 2026)
# Create MicroDataFrame
df = pd.DataFrame({'income': incomes, 'weight': weights})
mdf_df = mdf.MicroDataFrame(df, weights='weight')
# Calculate metrics
gini = mdf_df.income.gini()
poverty_rate = mdf_df.poverty_rate('income', threshold=15000)
print(f"Gini: {gini:.3f}")
print(f"Poverty rate: {poverty_rate:.1%}")
Location: PolicyEngine/microdf
Clone:
git clone https://github.com/PolicyEngine/microdf
cd microdf
To see current API:
# Main classes
cat microdf/microframe.py # MicroDataFrame
cat microdf/microseries.py # MicroSeries
# Key modules
cat microdf/generic.py # Generic weighted operations
cat microdf/inequality.py # Gini, top shares
cat microdf/poverty.py # Poverty metrics
To see all methods:
# MicroDataFrame methods
grep "def " microdf/microframe.py
# MicroSeries methods
grep "def " microdf/microseries.py
To see test patterns:
ls tests/
cat tests/test_microframe.py
Run tests:
make test
# Or
pytest tests/ -v
Before contributing:
Common contributions:
# Define custom weighted aggregation
def weighted_operation(series, weights):
    return (series * weights).sum() / weights.sum()
# Apply to MicroSeries
result = weighted_operation(mdf_df.income, mdf_df.weights)
# Group by with weights
grouped = mdf_df.groupby('state')
state_means = grouped.income.mean() # Weighted means by state
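Conceptually, a weighted group mean accumulates the weighted income and the weight separately within each group and divides at the end. The following plain-Python sketch shows that idea; the helper name and the sample `states` column are hypothetical, and it does not reflect how MicroDF implements `groupby` internally.

```python
from collections import defaultdict

def weighted_group_means(groups, values, weights):
    """Weighted mean of `values` within each group label."""
    weighted_totals = defaultdict(float)  # sum of value * weight per group
    weight_totals = defaultdict(float)    # sum of weight per group
    for group, value, weight in zip(groups, values, weights):
        weighted_totals[group] += value * weight
        weight_totals[group] += weight
    return {g: weighted_totals[g] / weight_totals[g] for g in weighted_totals}

states = ["CA", "CA", "NY", "NY", "NY"]  # hypothetical grouping column
incomes = [10000, 20000, 30000, 40000, 50000]
weights = [1, 2, 3, 2, 1]

state_means = weighted_group_means(states, incomes, weights)
for state, value in state_means.items():
    print(f"{state}: {value:,.0f}")
```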
To see decomposition methods:
grep -r -A 20 "def.*decomp" microdf/
# Pattern from PolicyEngine blog posts
import microdf as mdf
import pandas as pd
# Get simulation results
baseline_income = baseline_sim.calculate("household_net_income", 2026)
reform_income = reform_sim.calculate("household_net_income", 2026)
weights = baseline_sim.calculate("household_weight", 2026)
# Create MicroDataFrame
df = pd.DataFrame({
    'baseline_income': baseline_income,
    'reform_income': reform_income,
    'weight': weights
})
mdf_df = mdf.MicroDataFrame(df, weights='weight')
# Calculate impacts
baseline_gini = mdf_df.baseline_income.gini()
reform_gini = mdf_df.reform_income.gini()
print(f"Gini change: {reform_gini - baseline_gini:+.4f}")
# Calculate poverty under baseline and reform
import microdf as mdf
import pandas as pd
from policyengine_us import Simulation
baseline_sim = Simulation(situation=situation)
reform_sim = Simulation(situation=situation, reform=reform)
# Get incomes — use household_weight (the only calibrated weight) mapped to spm_unit
baseline_income = baseline_sim.calculate("spm_unit_net_income", 2026)
reform_income = reform_sim.calculate("spm_unit_net_income", 2026)
spm_threshold = baseline_sim.calculate("spm_unit_poverty_threshold", 2026)
weights = baseline_sim.calculate("household_weight", 2026, map_to="spm_unit")
# Calculate poverty rates
df_baseline = mdf.MicroDataFrame(
    pd.DataFrame({'income': baseline_income, 'threshold': spm_threshold, 'weight': weights}),
    weights='weight'
)
poverty_baseline = (df_baseline.income < df_baseline.threshold).mean() # Weighted
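# Repeat for the reform scenario
df_reform = mdf.MicroDataFrame(
    pd.DataFrame({'income': reform_income, 'threshold': spm_threshold, 'weight': weights}),
    weights='weight'
)
poverty_reform = (df_reform.income < df_reform.threshold).mean()  # Weighted
print(f"Poverty reduction: {(poverty_baseline - poverty_reform):.1%}")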
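The pattern above relies on the weighted mean of a boolean mask being the weighted share of units below their own threshold. A plain-Python sketch of the same comparison, using made-up incomes, thresholds, and weights (all hypothetical, not simulation output):

```python
def weighted_share(flags, weights):
    """Weighted mean of a boolean mask: share of total weight where the flag is True."""
    return sum(w for flag, w in zip(flags, weights) if flag) / sum(weights)

# Hypothetical SPM units, each with its own poverty threshold
baseline_income = [12000, 18000, 30000, 45000]
reform_income = [15000, 19000, 30000, 45000]
threshold = [14000, 20000, 25000, 25000]
weights = [100, 200, 300, 400]

poverty_baseline = weighted_share(
    [y < t for y, t in zip(baseline_income, threshold)], weights
)
poverty_reform = weighted_share(
    [y < t for y, t in zip(reform_income, threshold)], weights
)
reduction = poverty_baseline - poverty_reform
print(f"Baseline: {poverty_baseline:.1%}, reform: {poverty_reform:.1%}")
print(f"Reduction: {reduction * 100:.1f} percentage points")
```

Note that the difference between two rates is a change in percentage points, not a percent change in the poverty rate.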
Maturity: Stable, production-ready
API stability: Stable (rarely breaking changes)
Performance: Optimized for large datasets
To see version:
pip show microdf-python
To see changelog:
cat CHANGELOG.md # In microdf repo
Repository: https://github.com/PolicyEngine/microdf
PyPI: https://pypi.org/project/microdf-python/
Issues: https://github.com/PolicyEngine/microdf/issues