[codex] Align SPM thresholds and stabilize clone-half priors #702
Temporarily closing to retrigger GitHub Actions on the latest head; reopening immediately.
…-align-data # Conflicts: # policyengine_us_data/datasets/cps/extended_cps.py # tests/unit/test_extended_cps.py
Exciting that this could help us with poverty:
Tenure-specific geoadj: if you're applying renter housing shares to owner households, their SPM thresholds are
wrong, which means their poverty status is wrong, which means calibration is optimizing toward incorrect poverty
targets. Fixing this makes the targets mean what they're supposed to mean.
Watch for my comments about defining NYC with CDs, as I consider this a regression compared to what was merged in with #671.
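To make the tenure point concrete, here is a minimal sketch of a tenure-specific geographic adjustment. It assumes the usual SPM form, in which the housing portion of the threshold is scaled by a local rent index and the remainder is left unchanged. The housing shares, the `local_rent_index` argument, and the function names are hypothetical illustrations, not values or code from this PR.

```python
# Hypothetical housing shares by tenure (illustrative values only,
# not the published SPM shares and not taken from this PR).
HOUSING_SHARE = {
    "renter": 0.50,
    "owner_with_mortgage": 0.45,
    "owner_without_mortgage": 0.35,
}


def geoadj(tenure: str, local_rent_index: float) -> float:
    """Scale only the housing portion of the threshold by the local rent index."""
    share = HOUSING_SHARE[tenure]
    return share * local_rent_index + (1.0 - share)


def spm_threshold(base_threshold: float, tenure: str, local_rent_index: float) -> float:
    """Apply the tenure-specific geographic adjustment to a base threshold."""
    return base_threshold * geoadj(tenure, local_rent_index)


# Same base threshold and area, different tenure: using the renter share
# for an owner without a mortgage overstates the threshold in a high-rent area.
owner_correct = spm_threshold(30_000, "owner_without_mortgage", 1.4)
owner_with_renter_share = spm_threshold(30_000, "renter", 1.4)
```

In a high-rent area (index above 1), the renter share produces a larger adjustment than the owner share, so an owner household assigned the renter share can be pushed across the poverty line in either direction relative to its correct status.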
My robot seems to think this is a high risk / high reward PR. Finding out why the integration test failed might give us some more insight into that.
    "KINGS_COUNTY_NY",
}

NYC_CDS = [
Heads up! PR #702 switches to filtering by NYC_CDS (congressional districts) + NYC_COUNTIES (county name strings). This would be a regression from the block-based approach used currently in main, going back to the CD-level geography that the block assignment work was meant to improve.
Honestly, any time I see "CDs", I get nervous.
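For comparison, a block-based membership test can be sketched as below. This is not the implementation in main; it only assumes that each household carries a 15-digit census block GEOID, whose first five digits are the state and county FIPS codes. The function name `is_nyc_block` is hypothetical.

```python
# State + county FIPS prefixes for the five NYC boroughs.
NYC_COUNTY_FIPS = {
    "36005",  # Bronx
    "36047",  # Kings (Brooklyn)
    "36061",  # New York (Manhattan)
    "36081",  # Queens
    "36085",  # Richmond (Staten Island)
}


def is_nyc_block(block_geoid: str) -> bool:
    """Return True if a 15-digit census block GEOID lies in an NYC county."""
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return block_geoid[:5] in NYC_COUNTY_FIPS
```

Because the county is read directly from the block identifier, this kind of filter does not depend on how congressional district lines happen to be drawn.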
What changed
- `policyengine-us`
- `SPM_CAPHOUSESUB` benchmark for `spm_unit_capped_housing_subsidy`
- `housing_assistance`, so Census SPM capped subsidy and HUD spending/assisted-household counts are no longer mixed together
- `spm_unit_spm_threshold` for the PUF clone half
- `spm_unit_spm_threshold` deterministically from the donor half's geography and the current threshold formula
- `+1` sparse-reweighting prior with deterministic near-zero priors for zero-weight clone households, while keeping donor-half priors close to their survey weights

Why
The data pipeline had three distinct SPM issues:
- `policyengine-us-data` had drifted from the model-side logic in `policyengine-us`
- `spm_unit_spm_threshold`, even though thresholds should be derived from donor geography plus composition, not predicted statistically

The housing benchmark cleanup is separate but related concept hygiene:

- `spm_unit_capped_housing_subsidy` is a Census SPM concept and should be benchmarked to CPS ASEC `SPM_CAPHOUSESUB`
- `housing_assistance` is a HUD program/spending concept and should be benchmarked separately to HUD USER assisted-household counts and spending totals

Impact

- `policyengine-us`

Root cause

`policyengine-us-data` had:

- `policyengine-us`
- `spm_unit_spm_threshold` into the CPS-only QRF output set

Validation
- `uv run pytest -q tests/unit/test_extended_cps.py`
- `uv run pytest -q tests/unit/calibration/test_calibration_puf_impute.py`
- `uv run pytest -q policyengine_us_data/tests/test_local_area_calibration/test_spm_thresholds.py`
- `uv run pytest -q tests/integration/test_enhanced_cps.py -k 'household_count or poverty_rate_reasonable'`
- `git diff --check`
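The clone-half prior described under "What changed" can be illustrated with a small sketch. It assumes only what the description states: zero-weight clone households receive a deterministic near-zero prior, and donor-half households keep a prior close to their survey weight. The function name `build_priors` and the `epsilon` value are hypothetical and are not taken from the PR.

```python
def build_priors(survey_weights, is_clone, epsilon=1e-6):
    """Build reweighting priors without any randomness.

    Zero-weight clone rows get a small fixed prior (epsilon, a hypothetical
    value); every other row keeps its survey weight as its prior.
    """
    priors = []
    for weight, clone in zip(survey_weights, is_clone):
        if clone and weight == 0:
            priors.append(epsilon)
        else:
            priors.append(float(weight))
    return priors


# Two donor households followed by their zero-weight clones.
weights = [1200.0, 850.0, 0.0, 0.0]
clone_flags = [False, False, True, True]
priors = build_priors(weights, clone_flags)
```

Since no random draw is involved, repeated runs on the same inputs give identical priors, which is the property the word "deterministic" in the description points at.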