PolicyEngine testing patterns - YAML test structure, naming conventions, period handling, and quality standards
Comprehensive patterns and standards for creating PolicyEngine tests.
policyengine_us/tests/policy/baseline/gov/states/[state]/[agency]/[program]/
├── [variable_name].yaml # Unit test for specific variable
├── [another_variable].yaml # Another unit test
└── integration.yaml # Integration test (NEVER prefixed)
Supported test periods:
- 2024-01 - First month only
- 2024 - Whole year
- 2024-04 - Other months NOT supported
- 2024-01-01 - Full dates NOT supported

Choose the margin based on the output type:
- Boolean outputs (true/false, eligibility, flags): no error margin at all. Booleans are exact, with no rounding, so omit absolute_error_margin entirely.
- Numeric outputs: absolute_error_margin: 0.01 or absolute_error_margin: 0.001
- A margin that is too large on a boolean output makes true (1) and false (0) indistinguishable, rendering the test meaningless.

Naming quick reference:
- Unit test file: variable_name.yaml (matches variable exactly)
- Integration test file: integration.yaml (never prefixed)
- Case names: Case 1, description. (numbered, comma, period)
- People: person1, person2 (never descriptive names)

Unit tests - Named after the variable they test:
✅ CORRECT:
az_liheap_eligible.yaml # Tests az_liheap_eligible variable
az_liheap_benefit.yaml # Tests az_liheap_benefit variable
❌ WRONG:
test_az_liheap.yaml # Wrong prefix
liheap_tests.yaml # Wrong pattern
Integration tests - Always named integration.yaml:
✅ CORRECT:
integration.yaml # Standard name
❌ WRONG:
az_liheap_integration.yaml # Never prefix integration
program_integration.yaml # Never prefix integration
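The file-naming rules above are mechanical enough to check in a script. This is a minimal sketch that assumes a hypothetical `check_test_filename` helper and a set of variable names supplied by the caller. PolicyEngine does not ship this function.

```python
from pathlib import PurePosixPath


def check_test_filename(filename: str, variable_names: set) -> bool:
    """Return True if a test file name follows the naming rules (hypothetical helper).

    - Integration tests must be named exactly "integration.yaml".
    - Unit tests must be "<variable_name>.yaml" for a known variable.
    """
    path = PurePosixPath(filename)
    if path.suffix != ".yaml":
        return False
    if path.name == "integration.yaml":
        return True
    if path.stem.endswith("integration"):
        # e.g. az_liheap_integration.yaml: integration tests are never prefixed
        return False
    return path.stem in variable_names


known_variables = {"az_liheap_eligible", "az_liheap_benefit"}
for name in ["az_liheap_eligible.yaml", "test_az_liheap.yaml", "az_liheap_integration.yaml"]:
    print(name, check_test_filename(name, known_variables))
```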
Follow state/agency/program hierarchy:
gov/
└── states/
    └── [state_code]/
        └── [agency]/
            └── [program]/
                ├── eligibility/
                │   └── income_eligible.yaml
                ├── income/
                │   └── countable_income.yaml
                └── integration.yaml
PolicyEngine test system ONLY supports:
- 2024-01 - First month of year
- 2024 - Whole year

Never use:
- 2024-04 - April (will fail)
- 2024-10 - October (will fail)
- 2024-01-01 - Full date (will fail)

If policy changes April 1, 2024:
# Option 1: Test with first month
period: 2024-01 # Tests January 2024, which is before the April 1 change
# Option 2: Test next year
period: 2025-01 # When policy definitely active
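You can catch unsupported periods before running the suite with a format check. This is a minimal sketch under the rules stated above, which allow only a whole year or January. `is_supported_period` is a hypothetical helper, not part of PolicyEngine. It calls `str()` because YAML loaders typically read `period: 2024` as an integer.

```python
import re

# Whole year ("2024") or the first month ("2024-01"); nothing else.
SUPPORTED_PERIOD = re.compile(r"\d{4}(-01)?")


def is_supported_period(period) -> bool:
    """Return True if a YAML test `period` uses a supported format."""
    return SUPPORTED_PERIOD.fullmatch(str(period)) is not None


for period in [2024, "2024-01", "2025-01", "2024-04", "2024-01-01"]:
    print(period, is_supported_period(period))
```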
Use numbered cases with descriptions:
✅ CORRECT:
- name: Case 1, single parent with one child.
- name: Case 2, two parents with two children.
- name: Case 3, income at threshold.
❌ WRONG:
- name: Single parent test
- name: Test case for family
- name: Case 1 - single parent # Wrong punctuation
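The case-name convention can be written as a regular expression. This sketch assumes the pattern is exactly "Case", a number, a comma, a description and a closing period. `is_valid_case_name` is hypothetical.

```python
import re

# "Case <n>, <description>." : numbered, comma after the number, period at the end.
CASE_NAME = re.compile(r"Case [1-9]\d*, \S.*\.")


def is_valid_case_name(name: str) -> bool:
    return CASE_NAME.fullmatch(name) is not None


print(is_valid_case_name("Case 1, single parent with one child."))
print(is_valid_case_name("Case 1 - single parent"))
```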
CRITICAL: Always append new test cases at the bottom of the file. Never insert cases in the middle of existing tests.
# Existing file has Cases 1-3
# ✅ CORRECT - Add Case 4 at the bottom:
- name: Case 3, income above threshold.
...
- name: Case 4, new edge case scenario.
...
# ❌ WRONG - Inserting between existing cases and renumbering:
- name: Case 1, ...
- name: Case 2, new case inserted here. # Renumbered!
- name: Case 3, was previously Case 2. # Renumbered!
Why: Inserting in the middle forces renumbering of existing cases, which creates noisy diffs and makes review harder. Appending at the bottom keeps existing cases untouched.
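Appending means the new case takes the next number after the highest existing one, and no other case changes. This sketch uses a hypothetical `next_case_name` helper that reads the existing case names and builds the name for the appended case.

```python
import re

CASE_NUMBER = re.compile(r"Case (\d+),")


def next_case_name(existing_names: list, description: str) -> str:
    """Name for a case appended at the bottom; existing cases keep their numbers."""
    numbers = [int(m.group(1)) for name in existing_names if (m := CASE_NUMBER.match(name))]
    return f"Case {max(numbers, default=0) + 1}, {description.rstrip('.')}."


existing = ["Case 1, single parent.", "Case 2, two parents.", "Case 3, income above threshold."]
print(next_case_name(existing, "new edge case scenario"))
```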
Use generic sequential names:
✅ CORRECT:
people:
  person1:
    age: 30
  person2:
    age: 10
  person3:
    age: 8
❌ WRONG:
people:
  parent:
    age: 30
  child1:
    age: 10
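To flag descriptive names such as `parent` or `child1`, compare each key in the `people` block against the expected sequence. In this sketch, `misnamed_people` is a hypothetical helper. It takes the already-parsed `people` mapping and assumes the YAML loader preserves key order.

```python
def misnamed_people(people: dict) -> list:
    """Return keys in a test's `people` block that break the person1, person2, ... sequence."""
    return [
        name
        for position, name in enumerate(people, start=1)
        if name != f"person{position}"
    ]


print(misnamed_people({"person1": {"age": 30}, "person2": {"age": 10}, "person3": {"age": 8}}))
print(misnamed_people({"parent": {"age": 30}, "child1": {"age": 10}}))
```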
Use simplified format without entity key:
✅ CORRECT:
output:
  tx_tanf_eligible: true
  tx_tanf_benefit: 250
❌ WRONG:
output:
  tx_tanf_eligible:
    spm_unit: true # Don't nest under entity
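Entity-nested expectations are easy to spot once the output block is parsed, because the expected value is a mapping instead of a scalar or list. This is a sketch with a hypothetical `entity_nested_outputs` helper.

```python
def entity_nested_outputs(output: dict) -> list:
    """Return output variable names whose expected value is nested under an entity key."""
    return [name for name, expected in output.items() if isinstance(expected, dict)]


print(entity_nested_outputs({"tx_tanf_eligible": True, "tx_tanf_benefit": 250}))
print(entity_nested_outputs({"tx_tanf_eligible": {"spm_unit": True}}))
```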
Skip tests for simple composition variables using only adds or subtracts:
# NO TEST NEEDED - just summing
class tx_tanf_countable_income(Variable):
    adds = ["earned_income", "unearned_income"]
# NO TEST NEEDED - simple arithmetic
class net_income(Variable):
    adds = ["gross_income"]
    subtracts = ["deductions"]
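Create tests for variables with:
- Conditional logic (where, select, if)

# NEEDS TEST - has logic
class tx_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        # enrolled, passes_test and other_test are computed earlier in the formula (elided here)
        return where(enrolled, passes_test, other_test)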
The key rule: Input matches the larger of (variable period, test period). Output matches the test period.
| Variable definition_period | Test Period | Input Value | Output Value |
|---|---|---|---|
| YEAR | YEAR | Yearly | Yearly |
| YEAR | MONTH | Yearly (always!) | Monthly (÷12) |
| MONTH | YEAR | Yearly (÷12 per month) | Yearly (sum of 12) |
| MONTH | MONTH | Monthly | Monthly |
# YEAR variable + YEAR period
- name: Case 1, yearly test.
  period: 2024
  input:
    employment_income: 12_000 # Yearly input
  output:
    employment_income: 12_000 # Yearly output
# YEAR variable + MONTH period
- name: Case 2, monthly test with yearly variable.
  period: 2024-01
  input:
    employment_income: 12_000 # Still yearly input!
  output:
    employment_income: 1_000 # Monthly output (12_000/12)
# MONTH variable + YEAR period
- name: Case 3, yearly test with monthly variable.
  period: 2024
  input:
    some_monthly_var: 1_200 # Yearly total (divided by 12 = 100/month)
  output:
    some_monthly_var: 1_200 # Yearly sum
# MONTH variable + MONTH period
- name: Case 4, monthly test with monthly variable.
  period: 2024-01
  input:
    some_monthly_var: 100 # Monthly input (just January)
  output:
    some_monthly_var: 100 # Monthly output
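All four cases follow one rule that can be written as arithmetic. This is a simplified model of the table that assumes a single input value split evenly across months. It is not PolicyEngine's actual period-conversion code, and `expected_output` is a hypothetical name.

```python
def expected_output(variable_period: str, test_period: str, input_value: float) -> float:
    """Expected YAML output for one input value, following the period table.

    Periods are "YEAR" or "MONTH". `input_value` is what the test writes
    under `input:` (yearly unless both periods are MONTH).
    """
    if variable_period not in {"YEAR", "MONTH"} or test_period not in {"YEAR", "MONTH"}:
        raise ValueError("periods must be 'YEAR' or 'MONTH'")
    if variable_period == "YEAR" and test_period == "MONTH":
        # Yearly input, reported for a single month.
        return input_value / 12
    if variable_period == "MONTH" and test_period == "YEAR":
        # Yearly input is spread evenly over 12 months, then summed back.
        monthly_values = [input_value / 12] * 12
        return sum(monthly_values)
    # Same period on both sides: output equals input.
    return input_value


print(expected_output("YEAR", "MONTH", 12_000))
```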
See policyengine-period-patterns skill for the full explanation of period auto-conversion.
Use underscores as thousands separators in numeric values:
✅ CORRECT:
employment_income: 50_000
cash_assets: 1_500
❌ WRONG:
employment_income: 50000
cash_assets: 1500
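Python's format mini-language can produce these literals directly, which helps when a script generates expected values. This is a small sketch. `yaml_number` is a hypothetical helper, while the `_` grouping option is standard Python (PEP 515).

```python
def yaml_number(value: int) -> str:
    """Render an integer with underscore thousands separators, e.g. 50000 -> '50_000'."""
    return format(value, "_")


print(yaml_number(50000), yaml_number(1500))
# Python's int() also accepts underscores, so the literal round-trips.
print(int(yaml_number(50000)) == 50000)
```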
Document every calculation step:
- name: Case 2, earnings with deductions.
  period: 2025-01
  absolute_error_margin: 0.01
  input:
    people:
      person1:
        employment_income: 3_000 # $250/month
      person2:
        employment_income: 0 # Second entry in the person-level arrays
  output:
    # Person-level arrays
    tx_tanf_gross_earned_income: [250, 0]
    # Person1: 3,000/12 = 250
    tx_tanf_earned_after_disregard: [86.67, 0]
    # Person1: 250 - 120 = 130
    # Disregard: 130/3 = 43.33
    # After: 130 - 43.33 = 86.67
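Recomputing the commented chain in code is a cheap way to confirm the expected value before you write it into the test. This sketch mirrors the comments above: a $120 flat disregard, then one third of the remainder. `earned_after_disregard` is hypothetical, and the rule comes from the comments, not from the TX TANF formula itself.

```python
def earned_after_disregard(annual_earnings: float, flat: float = 120, fraction: float = 1 / 3) -> float:
    """Monthly earnings after a flat and a fractional disregard (rule taken from the comments)."""
    monthly = annual_earnings / 12               # 3,000 / 12 = 250
    after_flat = monthly - flat                  # 250 - 120 = 130
    return after_flat - after_flat * fraction    # 130 - 43.33... = 86.66...


value = earned_after_disregard(3_000)
print(round(value, 2))
# With absolute_error_margin: 0.01, an expected value of 86.67 passes; 87.1 would not.
print(abs(value - 86.67) <= 0.01, abs(value - 87.1) <= 0.01)
```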
Include 5-7 scenarios covering:
Check 8-10 values per test:
output:
  # Income calculation chain
  program_gross_income: 250
  program_earned_after_disregard: 86.67
  program_deductions: 200
  program_countable_income: 0
  # Eligibility chain
  program_income_eligible: true
  program_resources_eligible: true
  program_eligible: true
  # Final benefit
  program_benefit: 320
# Demographics
age: 30
is_disabled: false
is_pregnant: false
# Income
employment_income: 50_000
self_employment_income: 10_000
social_security: 12_000
ssi: 9_000
# Benefits
snap: 200
tanf: 150
medicaid: true
# Location
state_code: CA
county_code: "06037" # String for FIPS
Never use these (not in PolicyEngine):
- heating_expense
- utility_expense
- utility_shut_off_notice
- past_due_balance
- bulk_fuel_amount
- weatherization_needed

Before using enums in tests:
# Find enum definition
grep -r "class ImmigrationStatus" --include="*.py"
# Check actual values
class ImmigrationStatus(Enum):
    CITIZEN = "Citizen"
    LEGAL_PERMANENT_RESIDENT = "Legal Permanent Resident" # NOT "PERMANENT_RESIDENT"
    REFUGEE = "Refugee"
✅ CORRECT:
immigration_status: LEGAL_PERMANENT_RESIDENT
❌ WRONG:
immigration_status: PERMANENT_RESIDENT # Doesn't exist
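You can script the same check. YAML inputs name the enum *member*, so validate against member names rather than display values. This sketch uses Python's standard `enum.Enum` as a stand-in for the real class, whose base may differ in PolicyEngine, and includes only the three members shown above. `is_valid_enum_input` is hypothetical.

```python
from enum import Enum


class ImmigrationStatus(Enum):
    # Stand-in mirroring only the members shown above.
    CITIZEN = "Citizen"
    LEGAL_PERMANENT_RESIDENT = "Legal Permanent Resident"
    REFUGEE = "Refugee"


def is_valid_enum_input(enum_cls, name: str) -> bool:
    """YAML inputs use member names (LEGAL_PERMANENT_RESIDENT), not display values."""
    return name in enum_cls.__members__


for candidate in ["LEGAL_PERMANENT_RESIDENT", "PERMANENT_RESIDENT", "Legal Permanent Resident"]:
    print(candidate, is_valid_enum_input(ImmigrationStatus, candidate))
```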
Before submitting tests:
- Period: 2024-01 or 2024 only

- name: Case 1, income exactly at threshold.
  period: 2024-01
  input:
    people:
      person1:
        employment_income: 30_360 # Annual limit
  output:
    program_income_eligible: true # At threshold = eligible
- name: Case 2, elderly priority.
  period: 2024-01
  input:
    people:
      person1:
        age: 65
  output:
    program_priority_group: true
- name: Case 3, SNAP categorical.
  period: 2024-01
  input:
    spm_units:
      spm_unit:
        snap: 200 # Receives SNAP
  output:
    program_categorical_eligible: true
Always include a test that verifies benefits are capped at the maximum payment amount when countable income is negative. This prevents the bug where max_benefit - (-N) = max_benefit + N, inflating benefits beyond the payment standard.
# Tests that benefits are capped at the maximum payment amount,
# even when countable income is negative.
# Prevents: benefit = max - (-5M) = max + 5M
- name: Case N, negative countable income does not inflate benefit.
  period: 2025-01
  input:
    people:
      person1:
        age: 30
        self_employment_income: -60_000_000 # -$5M/month
      person2:
        age: 8
    spm_units:
      spm_unit:
        members: [person1, person2]
    households:
      household:
        members: [person1, person2]
        state_code: XX
  output:
    xx_tanf: 300 # Capped at max payment standard, not max + 5M
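The buggy calculation and the capped one differ by a single clamp. This is a scalar sketch with hypothetical helper names. PolicyEngine formulas operate on arrays, so a real formula would use vectorized equivalents rather than Python's built-in `max` and `min`.

```python
def uncapped_benefit(max_benefit: float, countable_income: float) -> float:
    # Bug: negative countable income is subtracted, so the benefit grows past the maximum.
    return max(max_benefit - countable_income, 0)


def capped_benefit(max_benefit: float, countable_income: float) -> float:
    # Treat negative countable income as zero, then keep the result within [0, max_benefit].
    reduction = max(countable_income, 0)
    return min(max(max_benefit - reduction, 0), max_benefit)


monthly_income = -60_000_000 / 12  # -5,000,000, as in the test case above
print(uncapped_benefit(300, monthly_income))
print(capped_benefit(300, monthly_income))
```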
When a variable depends on a multi-valued dimension (provider type, care setting, filing status), every dimension value needs at least one test case. Zero coverage of an entire dimension hides bugs.
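You can find coverage gaps of this kind by collecting every value a dimension takes across a file's cases. This sketch works over already-parsed YAML cases. The `provider_type` input and its values are hypothetical, and `uncovered_values` is not a PolicyEngine function.

```python
def collect_values(node, key):
    """Yield every value stored under `key` anywhere in a nested test-case mapping."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            else:
                yield from collect_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from collect_values(item, key)


def uncovered_values(test_cases: list, key: str, all_values: set) -> set:
    """Dimension values that no test case's input exercises."""
    covered = set()
    for case in test_cases:
        covered.update(collect_values(case.get("input", {}), key))
    return set(all_values) - covered


cases = [
    {"name": "Case 1, licensed center.", "input": {"people": {"person1": {"provider_type": "CENTER"}}}},
    {"name": "Case 2, family home.", "input": {"people": {"person1": {"provider_type": "FAMILY_HOME"}}}},
]
print(uncovered_values(cases, "provider_type", {"CENTER", "FAMILY_HOME", "RELATIVE"}))
```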
defined_for filtering.
When values change mid-year (e.g., July 1), test both sides of the boundary (e.g., June vs July). January-only tests miss off-by-one errors in effective dates.
When a benefit has supplements or adjustments, test that each flows through to the top-level benefit variable — not just in isolation.
When a source provides combined amounts (e.g., Federal SSI + State SSP), test both components independently with comments showing the combined math matches the source.
Check the formula's actual input variable names before writing tests. Use the variable the formula reads (e.g., employment_income_before_lsr), not a similar-sounding upstream variable.
A case named "$275 weekly" that expects $250 misleads reviewers. Keep names and expected values consistent.
When fixing a buggy parameter or formula, sweep ALL test files referencing the affected variable. Stale expected values silently mask regressions.
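A whole-word search over the test tree is a simple way to do that sweep. This sketch uses only the standard library, and `find_test_files_referencing` is hypothetical. The word-boundary match keeps `tx_tanf_benefit` from counting as a reference to `tx_tanf`.

```python
import re
from pathlib import Path


def mentions_variable(text: str, variable: str) -> bool:
    """True if `variable` appears as a whole identifier in `text`."""
    return re.search(rf"\b{re.escape(variable)}\b", text) is not None


def find_test_files_referencing(root: Path, variable: str) -> list:
    """Sorted YAML test files under `root` that reference `variable`."""
    return sorted(
        path
        for path in root.rglob("*.yaml")
        if mentions_variable(path.read_text(encoding="utf-8"), variable)
    )


# Example (path is illustrative):
# for path in find_test_files_referencing(Path("policyengine_us/tests/policy/baseline"), "tx_tanf"):
#     print(path)
```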