Skill

policyengine-microsimulation

Enables PolicyEngine microsimulation for population-level policy analysis, winners/losers, poverty impact, revenue, and distributional effects. Provides code patterns and release verification.

Python

backend

data-engineering

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/data-science:policyengine-microsimulation-skill

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

- **Microsimulation API**: https://policyengine.github.io/policyengine-us/usage/microsimulation.html

SKILL.md

548 lines · ~6k tokens(exceeds 5k compaction limit)

Stats

LanguagePython

Stars31

Forks5

MaintenanceExcellent

Last CommitJun 23, 2026

Actions

View Source View Plugin View on GitHub View README

PolicyEngine Microsimulation

Documentation References

Microsimulation API: https://policyengine.github.io/policyengine-us/usage/microsimulation.html
Parameter Discovery: https://policyengine.github.io/policyengine-us/usage/parameter-discovery.html
Reform.from_dict(): https://policyengine.github.io/policyengine-core/usage/reforms.html

Latest rules and microdata provenance

When the user asks to use the latest PolicyEngine rules, data, microdata, variable weights, sampled households, or default datasets, verify the package release from authoritative live metadata before running the microsimulation. Do not infer "latest" from search snippets, local installed versions, lockfiles, or repo constraints.

For policyengine.py bundled rules plus default microdata:

# Verify the latest policyengine.py release.
python - <<'PY'
import json
import urllib.request

with urllib.request.urlopen(
    "https://pypi.org/pypi/policyengine/json",
    timeout=20,
) as response:
    print(json.load(response)["info"]["version"])
PY
python -m pip index versions policyengine

# Install the exact verified version with the country extra.
uv pip install "policyengine[us]==X.Y.Z"

Then confirm the resolved model/data bundle without importing top-level policyengine, which may initialize countries and private data you are not using:

python - <<'PY'
import json
from importlib import metadata
from pathlib import Path

for package in ["policyengine", "policyengine-us"]:
    print(f"{package}=={metadata.version(package)}")
    direct_url = metadata.distribution(package).read_text("direct_url.json")
    if direct_url:
        print(f"{package} direct_url={direct_url}")

manifest_path = Path(
    metadata.distribution("policyengine").locate_file(
        "policyengine/data/release_manifests/us.json"
    )
)
manifest = json.loads(manifest_path.read_text())
print(json.dumps({
    "bundle_id": manifest.get("bundle_id"),
    "model_package": manifest.get("model_package"),
    "data_package": manifest.get("data_package"),
    "default_dataset": manifest.get("default_dataset"),
    "default_dataset_uri": (
        manifest.get("certified_data_artifact") or {}
    ).get("uri"),
    "certification": manifest.get("certification"),
}, indent=2, sort_keys=True))
PY

Only call a run "latest" after this verification. If PyPI, npm, GitHub, or the official provider source cannot be reached, report that latest could not be verified instead of guessing.

CRITICAL: Use calc() with MicroSeries — never strip weights or fetch them manually

MicroSeries handles all weighting automatically. Never convert to numpy, strip types, or do manual weight math.

NEVER strip MicroSeries weights

calc() and calculate() return MicroSeries with embedded weights AND entity context. Any of these operations strip both, producing silently wrong results:

Anti-pattern	Why it's wrong
`np.array(series)`	Converts to unweighted numpy array
`series.values` / `series.to_numpy()`	Same — strips weights and entity context
`series.astype(float)` / `.astype(int)`	Converts MicroSeries to plain pandas Series, losing weight metadata
`float(series.sum())`	Premature scalar extraction — usually a sign of manual weight math nearby
`np.average(x, weights=w)`	Manual weighting — `.mean()` already does this correctly

NEVER fetch weight variables manually

calc() returns MicroSeries that already knows its weights. There is no reason to fetch household_weight, spm_unit_weight, person_weight, or tax_unit_weight yourself. If you're writing sim.calc("spm_unit_weight", ...), something is wrong — calc() handles weight mapping internally via the map_to parameter.

# ❌ WRONG — fetching weights and doing manual math
liheap = sim.calc("dc_liheap_payment", period=2026).astype(float)  # strips weights
weights = sim.calc("spm_unit_weight", period=2026).astype(float)    # unnecessary
total = float((liheap * weights).sum())                              # manual weighting
avg = float(np.average(liheap[liheap > 0], weights=weights[liheap > 0]))  # numpy

# ✅ CORRECT — MicroSeries does everything
liheap = sim.calc("dc_liheap_payment", period=2026)
total = liheap.sum()                  # Weighted total
avg = liheap[liheap > 0].mean()       # Weighted mean of recipients

# ❌ WRONG — np.array() strips weights AND entity context
change_arr = np.array(sim.calc("income_tax", period=2026))
weights = np.array(sim.calc("household_weight", period=2026))
# These may be DIFFERENT LENGTHS (tax units vs households)!
losers = weights[change_arr < -1].sum()  # SILENTLY WRONG

# ✅ CORRECT — keep as MicroSeries, all operations are weighted
income_tax_b = baseline.calc("income_tax", period=2026)
income_tax_r = reformed.calc("income_tax", period=2026)
tax_change = income_tax_r - income_tax_b
loser_count = (tax_change > 1).sum()    # Weighted count of losers
loser_share = (tax_change > 1).mean()   # Weighted share of losers
avg_change = tax_change.mean()          # Weighted mean change
total_change = tax_change.sum()         # Weighted total

Entity-level matching

When comparing variables across entities, use map_to to align them — never mix raw arrays from different entities:

# ❌ WRONG - income_tax is tax_unit level, household_weight is household level
tax = np.array(sim.calc("income_tax", period=2026))        # 23K tax units
wt = np.array(sim.calc("household_weight", period=2026))   # 15K households
# tax and wt have DIFFERENT lengths — any indexing is wrong

# ✅ CORRECT - map income_tax to household level, or just use MicroSeries
tax = sim.calc("income_tax", period=2026)  # tax_unit level MicroSeries
losers = (tax > 0).sum()  # Weighted count, correct entity

Quick start

from policyengine_us import Microsimulation
from policyengine_core.reforms import Reform

baseline = Microsimulation()
reform = Reform.from_dict({
    'gov.irs.credits.ctc.amount.base[0].amount': {'2026-01-01.2100-12-31': 3000}
}, 'policyengine_us')
reformed = Microsimulation(reform=reform)

# calc() returns MicroSeries - all operations are weighted automatically
baseline_income = baseline.calc('household_net_income', period=2026, map_to='person')
reformed_income = reformed.calc('household_net_income', period=2026, map_to='person')
change = reformed_income - baseline_income

# Weighted stats - no manual weight handling needed!
print(f"Average impact: ${change.mean():,.0f}")
print(f"Total cost: ${change.sum()/1e9:,.1f}B")
print(f"Share losing: {(change < 0).mean():.1%}")

API methods

US: sim.calc() — policyengine_us.Microsimulation uses .calc(). Do NOT use .calculate() for US.
UK: sim.calculate() — policyengine_uk.Microsimulation uses .calculate(). It does NOT have .calc().
Both return MicroSeries with automatic weighting. Use .sum(), .mean(), arithmetic operators.
Use the period= keyword: sim.calc("variable", period=2026), not sim.calc("variable", 2026).

Creating reforms

US (policyengine_us.Microsimulation)

from policyengine_us import Microsimulation
from policyengine_core.reforms import Reform

reform = Reform.from_dict({
    'gov.irs.credits.ctc.amount.base[0].amount': {'2026-01-01.2100-12-31': 3600},
}, 'policyengine_us')

baseline = Microsimulation()
reformed = Microsimulation(reform=reform)

UK (policyengine_uk.Microsimulation)

from policyengine_uk import Microsimulation

# UK: pass reform dict directly — do NOT use Reform.from_dict()
reform = {
    'gov.hmrc.income_tax.allowances.personal_allowance.amount': {'2026-01-01.2100-12-31': 15000}
}

baseline = Microsimulation()
reformed = Microsimulation(reform=reform)

IMPORTANT: For UK, pass the dict directly to Microsimulation(reform=dict). Reform.from_dict() works for US but causes errors in UK.

Dataset argument

National: Omit dataset argument entirely (default is correct)
State-level: Microsimulation(dataset='hf://policyengine/policyengine-us-data/states/NY.h5')
Never pass short strings like dataset="cps_2024" — use full HF URLs or omit

Available Datasets (HuggingFace)

# National (default)
sim = Microsimulation()

# State-level
sim = Microsimulation(dataset='hf://policyengine/policyengine-us-data/states/NY.h5')

# Congressional district - SEE policyengine-district-analysis skill for full examples
sim = Microsimulation(dataset='hf://policyengine/policyengine-us-data/districts/NY-17.h5')

For congressional district analysis (representative's constituents, district-level impacts), use the policyengine-district-analysis skill which has complete examples.

Memory considerations

State-level datasets are large (~590MB each). Loading two Microsimulation objects simultaneously (baseline + reformed) requires ~2GB+ RAM. Options:

Preferred: Use the national dataset (default) — it includes all states with proper weighting
If state-calibrated weights needed: Compute baseline values first, delete the baseline object (del baseline), then create the reformed simulation

Key MicroSeries Methods

MicroSeries (from microdf) handles all weighting automatically — see the microdf skill for full documentation.

income = sim.calc('household_net_income', period=2026, map_to='person')

# Basic weighted statistics
income.mean()           # Weighted mean
income.sum()            # Weighted sum
income.median()         # Weighted median
(income > 50000).mean() # Weighted share meeting condition

# Inequality metrics (see microdf skill for more)
income.gini()           # Weighted Gini coefficient

Inequality & Distributional Analysis

Use built-in MicroSeries methods — never reimplement Gini or other inequality metrics manually:

Include the full distribution by default. PolicyEngine distributional tables should not drop the bottom 5%, trim nonpositive incomes, or otherwise exclude income tails unless the user explicitly requests that exclusion or you are replicating a cited external method. If you do apply an exclusion, make it visible in the table/chart labels and methodology.

baseline_income = baseline.calc('household_net_income', period=2026, map_to='person')
reformed_income = reformed.calc('household_net_income', period=2026, map_to='person')

# Gini coefficient change
print(f"Baseline Gini: {baseline_income.gini():.4f}")
print(f"Reform Gini:   {reformed_income.gini():.4f}")

# Poverty rate (boolean MicroSeries — .mean() gives weighted rate)
baseline_in_poverty = baseline.calc('person_in_poverty', period=2026, map_to='person')
print(f"SPM poverty rate: {baseline_in_poverty.mean():.1%}")

# Decile-level analysis
income.decile_rank()    # Assign decile ranks (1-10)

Poverty analysis

Overall and child poverty

Use household_weight (the only calibrated weight) with MicroSeries arithmetic for all poverty calculations. No .values or np.sum() needed.

from policyengine_us import Microsimulation

baseline = Microsimulation()
reformed = Microsimulation(reform=reform)

YEAR = 2026

# Overall poverty rate — .mean() gives weighted rate automatically
baseline_in_poverty = baseline.calc('person_in_poverty', period=YEAR, map_to='person')
reform_in_poverty = reformed.calc('person_in_poverty', period=YEAR, map_to='person')
baseline_poverty_rate = baseline_in_poverty.mean()
reform_poverty_rate = reform_in_poverty.mean()
print(f"Poverty: {baseline_poverty_rate:.1%} → {reform_poverty_rate:.1%}")

# Child poverty rate — filter by is_child, then .mean()
is_child = baseline.calc('is_child', period=YEAR)
baseline_child_pov_rate = (baseline_in_poverty * is_child).sum() / is_child.sum()
reform_child_pov_rate = (reform_in_poverty * is_child).sum() / is_child.sum()
print(f"Child poverty: {baseline_child_pov_rate:.1%} → {reform_child_pov_rate:.1%}")

# People lifted out of poverty
total_people = baseline_in_poverty.sum() / baseline_in_poverty.mean()  # total weighted pop
people_lifted = (baseline_poverty_rate - reform_poverty_rate) * total_people
children_lifted = (baseline_child_pov_rate - reform_child_pov_rate) * is_child.sum()

WARNING: Never subtract boolean MicroSeries directly. NumPy 2.4+ raises TypeError on boolean subtraction (True - False). Use .mean() to get float rates first, then subtract:
# ❌ DON'T: diff = baseline_in_poverty - reform_in_poverty  # TypeError in numpy 2.4+
# ✅ DO: compute rates with .mean(), then subtract floats
baseline_rate = baseline_in_poverty.mean()
reform_rate = reform_in_poverty.mean()
reduction_pp = baseline_rate - reform_rate

Deep poverty

in_deep_poverty is at the SPM unit level. Use household_weight mapped to the appropriate entity level:

# Deep poverty — person-level via map_to
baseline_in_deep_poverty = baseline.calc('in_deep_poverty', period=YEAR, map_to='person')
reform_in_deep_poverty = reformed.calc('in_deep_poverty', period=YEAR, map_to='person')

baseline_deep_rate = baseline_in_deep_poverty.mean()
reform_deep_rate = reform_in_deep_poverty.mean()
print(f"Deep poverty: {baseline_deep_rate:.1%} → {reform_deep_rate:.1%}")

# Deep child poverty rate
is_child = baseline.calc('is_child', period=YEAR)
baseline_deep_child_rate = (baseline_in_deep_poverty * is_child).sum() / is_child.sum()
reform_deep_child_rate = (reform_in_deep_poverty * is_child).sum() / is_child.sum()
print(f"Deep child poverty: {baseline_deep_child_rate:.1%} → {reform_deep_child_rate:.1%}")

Subgroup analysis note

MicroSeries arithmetic handles subgroup analysis — you should rarely need .values. Multiply by a boolean MicroSeries to filter (e.g., pov * is_child * pw) and use .sum() / .mean() directly.

UK microsimulation

Key differences from US

UK uses .calculate(), NOT .calc() — policyengine_uk.Microsimulation does NOT have a .calc() method at all. Always use .calculate().
UK reform: Pass a plain dict directly to Microsimulation(reform=dict). Do NOT use Reform.from_dict() — it causes errors with UK.
UK poverty variables: in_poverty_bhc (before housing costs) and in_poverty_ahc (after housing costs) — both at household level. UK poverty analysis typically reports both BHC and AHC rates; always note which measure you are using.
UK entity structure: household (not SPM unit), benunit (benefit unit), person. Use household_weight and household_count_people for person-weighted rates.

Example: UK personal allowance reform with poverty analysis

from policyengine_uk import Microsimulation

reform = {'gov.hmrc.income_tax.allowances.personal_allowance.amount': {'2026-01-01.2100-12-31': 15000}}

baseline = Microsimulation()
reformed = Microsimulation(reform=reform)

# Cost
baseline_income = baseline.calculate('household_net_income', period=2026)
reform_income = reformed.calculate('household_net_income', period=2026)
cost = (reform_income - baseline_income).sum()

# Poverty (BHC) — map to person level for .mean()
baseline_in_poverty = baseline.calculate('in_poverty_bhc', period=2026, map_to='person')
reform_in_poverty = reformed.calculate('in_poverty_bhc', period=2026, map_to='person')
baseline_rate = baseline_in_poverty.mean()
reform_rate = reform_in_poverty.mean()

print(f"Cost: £{cost / 1e9:,.1f}B")
print(f"Poverty (BHC): {baseline_rate:.1%} → {reform_rate:.1%}")

CRITICAL: Budgetary impact calculation

Start with a BOTEC range before running code, and flag if the point estimate diverges

Use `household_net_income` for total cost — but understand what it includes

The budgetary cost of a reform is the change in household_net_income, NOT the change in the directly-modified program variable. A reform that changes one program (e.g., CTC) can have cascading effects on other taxes and benefits through interactions (refundability, phase-outs, benefit clawbacks). Summing only the program-specific variable will undercount the true cost.

This matches the pattern used in the PolicyEngine API (policyengine-api/endpoints/economy/compare.py).

IMPORTANT: household_net_income includes state tax effects. Many states inherit federal taxable_income, so a federal reform that changes taxable_income will indirectly change state taxes too. For federal-only revenue estimates, use income_tax directly:

# Total cost including state tax interactions
total_cost = (reformed.calc('household_net_income', period=YEAR).sum() -
              baseline.calc('household_net_income', period=YEAR).sum()) / 1e9

# Federal-only revenue impact (use this when scoring a federal bill)
federal_rev = (reformed.calc('income_tax', period=YEAR).sum() -
               baseline.calc('income_tax', period=YEAR).sum()) / 1e9

# Break out all components
state_tax_cost = (baseline.calc('state_income_tax', period=YEAR).sum() -
                  reformed.calc('state_income_tax', period=YEAR).sum()) / 1e9
benefit_cost = (reformed.calc('household_benefits', period=YEAR).sum() -
                baseline.calc('household_benefits', period=YEAR).sum()) / 1e9

print(f"Total budgetary cost: ${total_cost:,.1f}B")
print(f"Federal income tax revenue change: ${federal_rev:,.1f}B")
print(f"State/local tax revenue loss: ${state_tax_cost:,.1f}B")
print(f"Benefit spending increase: ${benefit_cost:,.1f}B")

Why not sum the program variable directly? Example: making the CTC fully refundable shifts credits from non-refundable to refundable, changing income_tax by much more than the ctc variable itself changes. The household_net_income change captures the full effect.

Per-program decomposition

Individual program changes are still useful for understanding where the cost comes from, but they don't substitute for the total household_net_income cost above.

programs = ["income_tax", "ctc", "eitc", "snap", "ssi", "household_benefits"]
for prog in programs:
    b = baseline.calc(prog, period=YEAR).sum()
    r = reformed.calc(prog, period=YEAR).sum()
    if abs(r - b) > 1e6:
        print(f"{prog}: ${(r - b) / 1e9:+.1f}B")

Current law context

Always check baseline parameter values before interpreting reform impacts. Tax law changes frequently (TCJA, OBBBA, etc.). Use CountryTaxBenefitSystem().parameters to look up current-law values:

from policyengine_us import CountryTaxBenefitSystem
p = CountryTaxBenefitSystem().parameters
print(p.gov.irs.credits.ctc.amount.base("2026-01-01"))
print(p.gov.irs.credits.ctc.refundable.fully_refundable("2026-01-01"))

Finding parameter paths

grep -r "salt" policyengine_us/parameters/gov/irs/ --include="*.yaml"

Parameter tree: gov.irs.deductions, gov.irs.credits, gov.states.{state}.tax

Patterns: Filing status variants (SINGLE, JOINT, etc.), bracket syntax [index], date format 'YYYY-MM-DD.YYYY-MM-DD'

CRITICAL: Bracket path syntax for scale parameters

When referencing bracket/scale parameters, the bracket index goes directly on the scale node — there is NO .brackets in the path.

# ✅ Correct — bracket index on the scale node
'gov.irs.credits.ctc.amount.base[0].amount'
'gov.states.ca.tax.income.rates.single[8].rate'
'gov.states.ca.tax.income.rates.single[8].threshold'

# ❌ Wrong — ".brackets" does not exist in the path
'gov.irs.credits.ctc.amount.base.brackets[0].amount'
'gov.states.ca.tax.income.rates.single.brackets[8].rate'

The YAML file has a brackets: list, but the parameter tree flattens it. The index attaches to the node containing the brackets (the YAML filename without .yaml), not to a child called brackets.

To verify a path, inspect the parameter tree:

from policyengine_us import CountryTaxBenefitSystem
p = CountryTaxBenefitSystem().parameters
print(p.gov.irs.credits.ctc.amount.base[0].amount("2026-01-01"))

Two types of bracket parameters

ParameterScale (marginal rate schedules, single YAML with brackets: at root):
- Path: parent_node.scale_name[index].rate or .threshold
- Example: gov.states.ca.tax.income.rates.single[8].rate
ParameterNode with indexed children (folder-based, separate YAML files):
- Path: node_name[index].child_name
- Example: gov.irs.credits.ctc.amount.base[0].amount

Both use [index] syntax in Reform.from_dict() — the difference is in the YAML structure. Use CountryTaxBenefitSystem().parameters to navigate and verify paths.

Complete analysis recipe: single-program impact with breakdowns

This pattern covers the common case of analyzing a single benefit or tax variable with subgroup breakdowns. All operations stay in MicroSeries — no manual weights, no numpy, no .astype().

from policyengine_us import Microsimulation

sim = Microsimulation()
YEAR = 2026

# All variables via calc() — returns weighted MicroSeries
benefit = sim.calc("dc_liheap_payment", period=YEAR)
income_level = sim.calc("dc_liheap_income_level", period=YEAR)
unit_size = sim.calc("spm_unit_size", period=YEAR)

# Summary stats — .sum() and .mean() are weighted automatically
recipients = (benefit > 0)
print(f"Total spending:  ${benefit.sum():>12,.0f}")
print(f"Recipient units: {recipients.sum():>12,.0f}")
print(f"Avg benefit:     ${benefit[recipients].mean():>12,.0f}")

# Subgroup breakdowns — boolean mask preserves MicroSeries weights
for level in range(1, 11):
    at_level = recipients & (income_level == level)
    if at_level.any():
        print(f"Level {level}: {at_level.sum():,.0f} units, "
              f"avg ${benefit[at_level].mean():,.0f}, "
              f"total ${benefit[at_level].sum():,.0f}")

# Size breakdown (grouped)
for size in [1, 2, 3]:
    mask = recipients & (unit_size == size)
    if mask.any():
        print(f"Size {size}: {mask.sum():,.0f} units, avg ${benefit[mask].mean():,.0f}")
large = recipients & (unit_size >= 4)
if large.any():
    print(f"Size 4+: {large.sum():,.0f} units, avg ${benefit[large].mean():,.0f}")

Common variables for microsimulation

Weights

household_weight — the only calibrated weight. calc() uses it internally via map_to — you should never need to fetch it directly. Use map_to='person' or map_to='spm_unit' on any calc() call to project to other entity levels.

Person-level

person_in_poverty — SPM poverty indicator (boolean)
is_child — under 18
age, employment_income

Household-level

household_net_income — net income after taxes/transfers
household_count_people — number of people in household

SPM unit-level

spm_unit_size, spm_unit_count_children
in_poverty, in_deep_poverty

Tax/benefit variables

income_tax, ctc, eitc, snap, ssi

policyengine-microsimulation

Popularity

Invocation

Context Preview

SKILL.md

policyengine-microsimulation

Popularity

Invocation

Context Preview

SKILL.md

PolicyEngine Microsimulation

Documentation References

Latest rules and microdata provenance

CRITICAL: Use calc() with MicroSeries — never strip weights or fetch them manually

NEVER strip MicroSeries weights

NEVER fetch weight variables manually

Entity-level matching

Quick start

API methods

Creating reforms

US (policyengine_us.Microsimulation)

UK (policyengine_uk.Microsimulation)

Dataset argument

Available Datasets (HuggingFace)

Memory considerations

Key MicroSeries Methods

Inequality & Distributional Analysis

Poverty analysis

Overall and child poverty

Deep poverty

Subgroup analysis note

UK microsimulation

Key differences from US

Example: UK personal allowance reform with poverty analysis

CRITICAL: Budgetary impact calculation

Start with a BOTEC range before running code, and flag if the point estimate diverges

Use household_net_income for total cost — but understand what it includes

Per-program decomposition

Current law context

Finding parameter paths

CRITICAL: Bracket path syntax for scale parameters

Two types of bracket parameters

Complete analysis recipe: single-program impact with breakdowns

Common variables for microsimulation

Weights

Person-level

Household-level

SPM unit-level

Tax/benefit variables

Similar Skills

PolicyEngine Microsimulation

Documentation References

Latest rules and microdata provenance

CRITICAL: Use calc() with MicroSeries — never strip weights or fetch them manually

NEVER strip MicroSeries weights

NEVER fetch weight variables manually

Entity-level matching

Quick start

API methods

Creating reforms

US (policyengine_us.Microsimulation)

UK (policyengine_uk.Microsimulation)

Dataset argument

Available Datasets (HuggingFace)

Memory considerations

Key MicroSeries Methods

Inequality & Distributional Analysis

Poverty analysis

Overall and child poverty

Deep poverty

Subgroup analysis note

UK microsimulation

Key differences from US

Example: UK personal allowance reform with poverty analysis

CRITICAL: Budgetary impact calculation

Start with a BOTEC range before running code, and flag if the point estimate diverges

Use household_net_income for total cost — but understand what it includes

Per-program decomposition

Current law context

Finding parameter paths

Use `household_net_income` for total cost — but understand what it includes

Use `household_net_income` for total cost — but understand what it includes