Builds data layers for PolicyEngine dashboards from plan.yaml: precomputed JSON files, PolicyEngine API clients, or custom Modal backends. Outputs typed API clients, React Query hooks, and tests.
From essentialnpx claudepluginhub policyengine/policyengine-claude --plugin data-scienceopusOrchestrates plugin quality evaluation: runs static analysis CLI, dispatches LLM judge subagent, computes weighted composite scores/badges (Platinum/Gold/Silver/Bronze), and actionable recommendations on weaknesses.
IMPORTANT: Use careful, step-by-step reasoning before taking any action. Think through:
Builds the data layer for a dashboard based on the approved plan.yaml.
Always prefer simpler patterns. Before building a custom backend:
api.policyengine.org/calculate handle it? → Use Pattern B (policyengine-api)custom-modal) with gateway + pollingprecomputed / precomputed-csv)Pattern C is the most complex and should be the last resort. If the plan specifies custom-modal, the plan MUST include a reason explaining why Pattern B is insufficient.
Before starting ANY work, use the Skill tool to load each required skill:
Skill: policyengine-interactive-tools-skillSkill: policyengine-us-skill (if US dashboard)Skill: policyengine-uk-skill (if UK dashboard)Skill: policyengine-simulation-mechanics-skill (if custom-backend pattern)Skill: policyengine-parameter-patterns-skill (if custom-backend pattern — required for correct Reform.from_dict() paths)plan.yaml and skeleton API clientdata_pattern chosen from the patterns defined in the policyengine-interactive-tools-skill (loaded in the First step)precomputed)When data_pattern: precomputed, the dashboard ships static JSON files with pre-run results. No backend, no API calls at runtime.
Generate JSON files in public/data/ based on the plan's data requirements:
public/data/
results.json # or split by dimension:
results_by_state.json
results_by_year.json
Data should be structured for direct consumption by the frontend — no post-processing needed.
Create lib/api/client.ts:
// client.ts
export interface DashboardData {
// Types matching the JSON structure from plan.yaml
}
export async function loadData(): Promise<DashboardData> {
const res = await fetch('/data/results.json');
if (!res.ok) throw new Error(`Failed to load data: ${res.status}`);
return res.json();
}
Create lib/hooks/useData.ts:
import { useQuery } from '@tanstack/react-query';
import { loadData } from '../api/client';
export function useDashboardData() {
return useQuery({
queryKey: ['dashboard-data'],
queryFn: loadData,
staleTime: Infinity, // Static data never goes stale
});
}
Create lib/api/__tests__/client.test.ts:
policyengine-api)When data_pattern: policyengine-api, the dashboard calls api.policyengine.org directly for household calculations. No custom backend needed.
Read the plan's endpoints and generate TypeScript interfaces for the household request and response:
// types.ts
/** Household structure for the PolicyEngine API */
export interface HouseholdRequest {
household: {
people: Record<string, Record<string, Record<string, number | boolean | string>>>;
tax_units: Record<string, { members: string[]; [key: string]: unknown }>;
spm_units: Record<string, { members: string[] }>;
households: Record<string, { members: string[]; [key: string]: unknown }>;
};
}
/** API response from /calculate */
export interface CalculateResponse {
status: 'ok' | 'error';
message: string | null;
result: Record<string, unknown>;
}
// Add dashboard-specific types for the variables in the plan
Create lib/api/client.ts:
// client.ts
const API_BASE = 'https://api.policyengine.org';
export async function calculate(
countryId: string,
household: HouseholdRequest['household'],
): Promise<CalculateResponse> {
const res = await fetch(`${API_BASE}/${countryId}/calculate`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ household }),
});
if (!res.ok) throw new Error(`API error: ${res.status}`);
return res.json();
}
Create lib/hooks/useCalculation.ts:
import { useMutation } from '@tanstack/react-query';
import { calculate } from '../api/client';
import type { HouseholdRequest } from '../api/types';
export function useCalculation(countryId: string) {
return useMutation({
mutationFn: (household: HouseholdRequest['household']) =>
calculate(countryId, household),
});
}
Create lib/api/__tests__/client.test.ts:
custom-modal) — Gateway + PollingWhen data_pattern: custom-modal, build a two-layer architecture on Modal: a lightweight gateway that manages job submission/polling, and worker functions that run the heavy policyengine computations. This mirrors the pattern used by PolicyEngine API v2's simulation service.
Why not synchronous HTTP? Modal's dev gateway (modal serve) and production gateway have a ~150s timeout. US statewide microsimulations take 2-5+ minutes, causing HTTP 303 redirects that browser fetch() cannot follow for POST requests. The gateway + polling architecture avoids this entirely.
Backend structure: The backend uses a three-file structure to avoid a common crash-loop where module-level imports of pydantic or policyengine fail because those packages are only available inside the Modal function's image, not at module import time.
| File | Purpose | Module-level imports |
|---|---|---|
backend/_image_setup.py | Standalone snapshot function — runs during image build | None (all inside function body) |
backend/app.py | Modal app + function decorators | Only modal |
backend/simulation.py | Pure business logic | policyengine_us/_uk (captured in image snapshot) |
backend/modal_app.py | Lightweight gateway (FastAPI) | modal, fastapi, pydantic |
Before writing any code, look up the latest version from PyPI. Do NOT guess or use a version from memory — these packages release frequently and stale versions will have bugs.
# For US dashboards:
pip index versions policyengine-us 2>/dev/null | head -1
# For UK dashboards:
pip index versions policyengine-uk 2>/dev/null | head -1
Use the version number returned (e.g., 1.592.4) in the pip_install() call below.
Generate backend/_image_setup.py. This is a standalone function with no package imports at module level — it runs during image build via .run_function():
def snapshot_models():
"""Pre-load models at image build time for fast cold starts."""
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.info("Pre-loading tax-benefit system...")
from policyengine_us import CountryTaxBenefitSystem # or policyengine_uk
CountryTaxBenefitSystem()
logger.info("Models pre-loaded into image snapshot")
Generate backend/simulation.py. This is pure business logic — no Modal imports. policyengine imports are at module level because they are captured in the image snapshot.
from policyengine_us import Simulation, Microsimulation # Snapshotted at build time
from pydantic import BaseModel # Available in image
# Pydantic models, constants, and helper functions live here.
# Each endpoint in plan.yaml gets a run_*() function.
def run_household(params: dict) -> dict:
# Build household from params (per plan.yaml endpoints)
sim = Simulation(situation=params["household"])
return {"net_income": float(sim.calculate("household_net_income", 2025).sum())}
def run_statewide(params: dict) -> dict:
baseline = Microsimulation()
reform_sim = Microsimulation(reform=params["reform"])
# ... compute and return impacts
return {"revenue_change": ..., "winners": ..., "losers": ...}
CRITICAL — every parameter path used in Reform.from_dict() or reform dictionaries MUST be verified against the actual YAML files. Incorrect paths cause silent failures or runtime errors like "Could not find the parameter". Consult the policyengine-parameter-patterns-skill section 6.5 for the bracket path syntax rules.
Common mistakes:
gov.irs.income.bracket.rates has keys 1-7, not 0-6). Always check whether the YAML uses a list (0-indexed) or explicit integer keys (use those exact keys).brackets: list) require .amount or .rate after the index. E.g., gov.irs.credits.eitc.max[0].amount, NOT gov.irs.credits.eitc.max[0].gov.irs.credits.ctc.phase_out.threshold[SINGLE]).Verification procedure for every parameter path:
Find the YAML file in the policyengine-us (or policyengine-uk) parameters directory:
# Convert dotted path to directory path and search
find $(python3 -c "import policyengine_us; import os; print(os.path.dirname(policyengine_us.__file__))") \
-path "*/parameters/gov/irs/income/bracket.yaml" 2>/dev/null
Read the YAML and check whether the parameter uses:
1:, 2:, 3:) → use those exact indices: path[1], path[2]brackets: list → use 0-indexed with sub-key: path[0].amount, path[0].rate[SINGLE], [JOINT], etc.Verify programmatically (if policyengine is installed locally):
from policyengine_us import CountryTaxBenefitSystem
p = CountryTaxBenefitSystem().parameters
# Navigate and confirm the path resolves:
print(p.gov.irs.income.bracket.rates[1]("2026-01-01")) # Should return 0.10
Do NOT guess parameter paths from memory. Always verify against the actual YAML files.
Generate backend/app.py. Only modal at module level. Imports business logic inside each function body:
import modal
from pathlib import Path
from _image_setup import snapshot_models
_BACKEND_DIR = Path(__file__).parent
app = modal.App("DASHBOARD_NAME-workers")
image = (
modal.Image.debian_slim(python_version="3.11")
.pip_install("policyengine-us==LATEST_VERSION", "pydantic") # Pinned — looked up from PyPI in Step 1
.run_function(snapshot_models)
.add_local_file(str(_BACKEND_DIR / "simulation.py"), remote_path="/root/simulation.py")
)
@app.function(image=image, cpu=8.0, memory=32768, timeout=3600)
def compute_household(params: dict) -> dict:
from simulation import run_household
return run_household(params)
@app.function(image=image, cpu=8.0, memory=32768, timeout=3600)
def compute_statewide(params: dict) -> dict:
from simulation import run_statewide
return run_statewide(params)
Create one @app.function per endpoint in the plan. Set timeout based on the plan's worker_timeout values. Microsimulation endpoints need at least timeout=3600.
Generate backend/modal_app.py. The gateway is lightweight — no policyengine in its image. It spawns worker jobs and polls for results:
import modal
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
app = modal.App("DASHBOARD_NAME")
gateway_image = modal.Image.debian_slim(python_version="3.11").pip_install(
"fastapi", "pydantic",
)
WORKER_APP = "DASHBOARD_NAME-workers"
# Map endpoint names to worker function names
FUNCTION_MAP = {
"household-impact": "compute_household",
"statewide-impact": "compute_statewide",
}
web_app = FastAPI()
web_app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_methods=["*"],
allow_headers=["*"],
)
class SubmitResponse(BaseModel):
job_id: str
class StatusResponse(BaseModel):
status: str # "computing" | "ok" | "error"
result: dict | None = None
message: str | None = None
@web_app.post("/submit/{endpoint}")
def submit(endpoint: str, params: dict):
if endpoint not in FUNCTION_MAP:
raise HTTPException(status_code=404, detail=f"Unknown endpoint: {endpoint}")
fn = modal.Function.from_name(WORKER_APP, FUNCTION_MAP[endpoint])
call = fn.spawn(params)
return SubmitResponse(job_id=call.object_id)
@web_app.get("/status/{job_id}")
def status(job_id: str):
from modal.functions import FunctionCall
call = FunctionCall.from_id(job_id)
try:
result = call.get(timeout=0)
return StatusResponse(status="ok", result=result)
except TimeoutError:
return StatusResponse(status="computing")
except Exception as e:
return StatusResponse(status="error", message=str(e))
@app.function(image=gateway_image)
@modal.asgi_app()
def fastapi_app():
return web_app
Generate lib/api/client.ts:
const API_URL = process.env.NEXT_PUBLIC_API_URL
|| 'https://policyengine--DASHBOARD_NAME-fastapi-app.modal.run';
interface JobResponse { job_id: string }
export interface StatusResponse {
status: 'computing' | 'ok' | 'error';
result?: unknown;
message?: string;
}
export async function submitJob(endpoint: string, params: unknown): Promise<string> {
const res = await fetch(`${API_URL}/submit/${endpoint}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(params),
});
if (!res.ok) throw new Error(`Submit failed: ${res.status}`);
const data: JobResponse = await res.json();
return data.job_id;
}
export async function pollStatus(jobId: string): Promise<StatusResponse> {
const res = await fetch(`${API_URL}/status/${jobId}`);
if (!res.ok) throw new Error(`Status check failed: ${res.status}`);
return res.json();
}
Create lib/hooks/useCalculation.ts:
import { useQuery } from '@tanstack/react-query';
import { useState, useEffect } from 'react';
import { submitJob, pollStatus } from '../api/client';
import type { StatusResponse } from '../api/client';
export function useAsyncCalculation<T>(
queryKey: unknown[],
endpoint: string,
params: unknown,
options?: { enabled?: boolean },
) {
const [jobId, setJobId] = useState<string | null>(null);
// Reset jobId when params change
useEffect(() => { setJobId(null); }, [JSON.stringify(params)]);
// Step 1: Submit job
const submit = useQuery({
queryKey: [...queryKey, 'submit'],
queryFn: async () => {
const id = await submitJob(endpoint, params);
setJobId(id);
return id;
},
enabled: options?.enabled ?? true,
});
// Step 2: Poll for results
const poll = useQuery<StatusResponse>({
queryKey: [...queryKey, 'poll', jobId],
queryFn: () => pollStatus(jobId!),
enabled: !!jobId,
refetchInterval: (query) =>
query.state.data?.status === 'computing' ? 2000 : false,
});
return {
isLoading: submit.isLoading || (!!jobId && poll.isLoading),
isComputing: poll.data?.status === 'computing',
isError: submit.isError || poll.data?.status === 'error',
data: poll.data?.status === 'ok' ? (poll.data.result as T) : undefined,
error: poll.data?.message || submit.error?.message,
};
}
Generate backend/tests/test_simulation.py from the plan's tests.api_tests:
def test_basic_calculation():
"""From plan.yaml: basic_calculation test"""
# Test with known inputs, verify outputs are in expected range
pass
def test_zero_income():
"""From plan.yaml: zero_income test"""
pass
Use uv for Python dependency management. Do NOT use requirements.txt or pip install.
cd backend
uv init --no-workspace
uv add policyengine-us # or policyengine-uk
uv add --dev pytest
This creates a pyproject.toml with pinned dependencies and a uv.lock lockfile. Commit both files.
| Context | Default timeout | Max timeout | Notes |
|---|---|---|---|
@app.function(timeout=...) | 300s | 86,400s (24h) | Set per-function |
modal serve dev gateway | ~150s | Not configurable | Returns HTTP 303 on timeout |
modal deploy prod gateway | ~150s | Not configurable | Returns HTTP 303 on timeout |
US statewide microsimulations take 2-5+ minutes. This exceeds the gateway timeout, which is why synchronous HTTP calls fail for microsimulation endpoints. The gateway + polling architecture avoids this by using non-blocking job submission. Household-level simulations typically complete in 10-40s.
Cold starts: With .run_function(snapshot_models), cold starts are ~2s because the tax-benefit system is pre-loaded into the image. Without the snapshot, cold starts take 3-5 minutes as policyengine must initialize from scratch.
precomputed-csv)When data_pattern: precomputed-csv, the dashboard ships CSV files generated by a Python microsimulation pipeline. Follow Pattern A but use CSV files in public/data/ instead of JSON, and parse them at runtime with a lightweight CSV parser.
See the policyengine-interactive-tools-skill for examples (e.g., snap-bbce-repeal, uk-spring-statement-2026).
/deploy-dashboard)requirements.txt or pip install — always use uv for Python dependency management