PolicyEngine vectorization patterns - NumPy operations, where/select usage, avoiding scalar logic with arrays
From essentialnpx claudepluginhub policyengine/policyengine-claude --plugin data-scienceThis skill uses the workspace's default tool permissions.
Guides Next.js Cache Components and Partial Prerendering (PPR) with cacheComponents enabled. Implements 'use cache', cacheLife(), cacheTag(), revalidateTag(), static/dynamic optimization, and cache debugging.
Critical patterns for vectorized operations in PolicyEngine. Scalar logic with arrays will crash the microsimulation.
PolicyEngine processes multiple households simultaneously using NumPy arrays. NEVER use if-elif-else with entity data.
# THIS WILL CRASH - household data is an array
def formula(household, period, parameters):
income = household("income", period)
if income > 1000: # ❌ CRASH: "truth value of array is ambiguous"
return 500
else:
return 100
# CORRECT - works with arrays
def formula(household, period, parameters):
income = household("income", period)
return where(income > 1000, 500, 100) # ✅ Vectorized
where()# Instead of if-else
❌ if age >= 65:
amount = senior_amount
else:
amount = regular_amount
✅ amount = where(age >= 65, senior_amount, regular_amount)
select()# Instead of if-elif-else
❌ if age < 18:
benefit = child_amount
elif age >= 65:
benefit = senior_amount
else:
benefit = adult_amount
✅ benefit = select(
[age < 18, age >= 65],
[child_amount, senior_amount],
default=adult_amount
)
When dispatching on an Enum variable, list all values explicitly in select(). Set the default to match the Enum's default_value so unexpected values get consistent behavior.
# ❌ BAD — hides one option in default, unclear which is the fallback
provider_type = person("ri_ccap_provider_type", period)
types = provider_type.possible_values
return select(
[
provider_type == types.LICENSED_CENTER,
provider_type == types.LICENSED_FAMILY,
],
[center_rate, family_rate],
default=exempt_rate, # Why exempt as fallback?
)
# ✅ GOOD — all options listed, default matches Enum default_value
provider_type = person("ri_ccap_provider_type", period)
types = provider_type.possible_values
return select(
[
provider_type == types.LICENSED_CENTER,
provider_type == types.LICENSED_FAMILY,
provider_type == types.LICENSE_EXEMPT,
],
[center_rate, family_rate, exempt_rate],
default=center_rate, # Matches default_value = LICENSED_CENTER
)
# Combining conditions
eligible = (age >= 18) & (income < threshold) # Use & not 'and'
eligible = (is_disabled | is_elderly) # Use | not 'or'
eligible = ~is_excluded # Use ~ not 'not'
# Instead of if for bounds checking
❌ if amount < 0:
amount = 0
elif amount > maximum:
amount = maximum
✅ amount = clip(amount, 0, maximum)
# Or: amount = max_(0, min_(amount, maximum))
When subtracting values and wanting to floor at zero, you must wrap the entire subtraction in max_():
# Common scenario: income after deductions/losses
❌ WRONG - Creates phantom negative values:
income = max_(income, 0) - capital_loss # If capital_loss > income, result is negative!
✅ CORRECT - Properly floors at zero:
income = max_(income - capital_loss, 0) # Entire subtraction floored
# Real example from MT income tax bug:
❌ WRONG - Tax on phantom negative income:
def formula(tax_unit, period, parameters):
income = tax_unit("adjusted_gross_income", period)
capital_gains = tax_unit("capital_gains", period)
# BUG: If capital_gains is negative (loss), this creates negative income
# But max_() only floors income, not the result
regular_income = max_(income, 0) - capital_gains
return calculate_tax(regular_income) # Tax on negative number!
✅ CORRECT - No phantom income:
def formula(tax_unit, period, parameters):
income = tax_unit("adjusted_gross_income", period)
capital_gains = tax_unit("capital_gains", period)
# Properly floors the entire result
regular_income = max_(income - capital_gains, 0)
return calculate_tax(regular_income) # Never negative
Why this matters:
capital_gains = -3000 (loss), then income - capital_gains = income + 3000max_(income, 0) - capital_gains allows the subtraction to make the result negativeRule: When the formula is A - B and you want the result floored at zero, use max_(A - B, 0), NOT max_(A, 0) - B.
# OK - parameters are scalars, not arrays
def formula(entity, period, parameters):
p = parameters(period).gov.program
# This is fine - p.enabled is a scalar boolean
if p.enabled:
base = p.base_amount
else:
base = 0
# But must vectorize when using entity data
income = entity("income", period)
return where(income < p.threshold, base, 0)
# OK - controlling which calculation to use
def formula(entity, period, parameters):
year = period.start.year
if year >= 2024:
# Use new formula (still vectorized)
return entity("new_calculation", period)
else:
# Use old formula (still vectorized)
return entity("old_calculation", period)
❌ WRONG:
if household("income", period) > 1000:
# Error: truth value of array is ambiguous
✅ CORRECT:
income = household("income", period)
high_income = income > 1000 # Boolean array
benefit = where(high_income, low_benefit, high_benefit)
❌ WRONG:
eligible = is_elderly or is_disabled # Python's 'or'
✅ CORRECT:
eligible = is_elderly | is_disabled # NumPy's '|'
❌ WRONG:
if eligible:
if income < threshold:
return full_benefit
else:
return partial_benefit
else:
return 0
✅ CORRECT:
return where(
eligible,
where(income < threshold, full_benefit, partial_benefit),
0
)
where() for Divisionwhere() evaluates BOTH branches before selecting. This causes divide-by-zero warnings even when the zero case wouldn't be selected:
# ❌ WRONG - causes divide-by-zero warning
proportion = where(
total_income > 0,
person_income / total_income, # Still evaluated when total_income = 0!
0,
)
np.divide with where Parameter# ✅ CORRECT - only divides where mask is True
# The `out` parameter IS the default value - positions where mask=False keep this value
mask = total_income > 0
proportion = np.divide(
person_income,
total_income,
out=np.zeros_like(person_income), # Default to 0 where mask is False
where=mask,
)
How out works as the default:
out=np.zeros_like(...) → default is 0out=np.ones_like(...) → default is 1where=False keep their out value unchanged# ✅ CORRECT - traditional mask assignment
proportion = np.zeros_like(total_income)
mask = total_income > 0
proportion[mask] = person_income[mask] / total_income[mask]
Proportional allocation (e.g., splitting deductions between spouses):
# Allocate proportionally by income
unit_income = tax_unit.sum(person_income)
mask = unit_income > 0
share = np.divide(
person_income,
unit_income,
out=np.zeros_like(person_income),
where=mask,
)
# Default share when unit has no income
share = where(mask, share, where(is_head, 1.0, 0.0))
Calculating ratios:
# AGI ratio for credit calculations
mask = us_agi != 0
ratio = np.divide(
state_agi,
us_agi,
out=np.zeros_like(us_agi),
where=mask,
)
See these files for reference implementations:
taxable_social_security.py - person share of unit benefitsmo_taxable_income.py - AGI share allocationmd_two_income_subtraction.py - head's share of couple incomeok_child_care_child_tax_credit.py - AGI ratio# Instead of if-elif for ranges
❌ if size == 1:
amount = 100
elif size == 2:
amount = 150
elif size == 3:
amount = 190
✅ # Using parameter brackets
amount = p.benefit_schedule.calc(size)
✅ # Or using select
amounts = [100, 150, 190, 220, 250]
amount = select(
[size == i for i in range(1, 6)],
amounts[:5],
default=amounts[-1] # 5+ people
)
# Building complex eligibility
income_eligible = income < p.income_threshold
resource_eligible = resources < p.resource_limit
demographic_eligible = (age < 18) | is_pregnant
# Combine with & (not 'and')
eligible = income_eligible & resource_eligible & demographic_eligible
# Sum only for eligible members
person = household.members
is_eligible = person("is_eligible", period)
person_income = person("income", period)
# Only count income of eligible members
eligible_income = where(is_eligible, person_income, 0)
total = household.sum(eligible_income)
Error messages:
Performance:
# Your formula should work with arrays
def test_vectorization():
# Create array inputs
incomes = np.array([500, 1500, 3000])
# Should return array output
benefits = formula_with_arrays(incomes)
assert len(benefits) == 3
| Operation | Scalar (WRONG) | Vectorized (CORRECT) |
|---|---|---|
| Simple condition | if x > 5: | where(x > 5, ...) |
| Multiple conditions | if-elif-else | select([...], [...]) |
| Boolean AND | and | & |
| Boolean OR | or | | |
| Boolean NOT | not | ~ |
| Bounds checking | if x < 0: x = 0 | max_(0, x) |
| Floor subtraction | max_(x, 0) - y ❌ | max_(x - y, 0) ✅ |
| Complex logic | Nested if | Nested where/select |
When state tax calculations produce small non-zero values (e.g., $277) even though taxable income is zero, the root cause is usually phantom intermediate values in calculation chains.
When taxable income is zero but tax is non-zero, trace the calculation chain:
# Tax calculation chain (Montana example)
taxable_income: 0 # ✓ Correct
rate: 0.0475 # Used despite zero income
brackets: [15_600] # Used despite zero income
tax_before_credits: 277.41 # ❌ PHANTOM VALUE
# The bug: Brackets calculated regular tax even when taxable income was zero
# due to missing zero-check in bracket calculation
When you see phantom tax values:
Check the calculation chain - Run test with verbose output to see intermediate values:
pytest tests/file.py -vv
Verify zero-income handling - Look for formulas that don't short-circuit on zero income:
✅ GOOD:
def formula(entity, period, parameters):
taxable_income = entity("taxable_income", period)
# Short-circuit when income is zero
return where(taxable_income == 0, 0, calculate_tax(...))
❌ BAD:
def formula(entity, period, parameters):
# Always calculates, even when income is zero
return calculate_brackets(taxable_income, rates, brackets)
Check type consistency - Ensure operations preserve NumPy array dtypes:
✅ Use: max_(value, 0) or clip(value, 0, None)
❌ Avoid: max(value, 0) - Python's max can cause type issues