Refresh local SOI state targets used by local calibration #664

Open
MaxGhenis wants to merge 2 commits into main from codex/local-soi-refresh

@MaxGhenis MaxGhenis commented Mar 30, 2026

Summary

  • add an explicit refresh script for the legacy agi_state.csv artifact consumed by local calibration
  • regenerate agi_state.csv from the latest IRS geographic state SOI file and fix the missing DC GEO_ID rows
  • document the local-vs-national SOI split and add regression coverage for the local target format
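
The DC fix in the summary lends itself to a small completeness check. The following is a minimal sketch, not the test shipped in this PR: it assumes agi_state.csv has a `GEO_ID` column that uses Census-style state identifiers (`0400000US` followed by the two-digit state FIPS code), and the helper name `missing_state_geo_ids` is hypothetical.

```python
import csv
import io

# FIPS codes for the 50 states plus DC (11).
# Codes 03, 07, 14, 43 and 52 are not assigned to any state.
STATE_FIPS = [f"{i:02d}" for i in range(1, 57) if i not in {3, 7, 14, 43, 52}]


def missing_state_geo_ids(csv_text, geo_column="GEO_ID"):
    """Return the expected state GEO_IDs that never appear in the CSV text.

    The column name is an assumption about the artifact's schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {row[geo_column] for row in reader}
    expected = [f"0400000US{fips}" for fips in STATE_FIPS]
    return [geo_id for geo_id in expected if geo_id not in seen]
```

Given a file that lacks the DC rows, this would return `["0400000US11"]`; an empty list means every state and DC is present.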

Context

PR #660 refreshed the tracked national SOI workbook targets in soi_targets.csv, but local calibration still reads the separate agi_state.csv artifact in utils/loss.py. This follow-up makes that local refresh path explicit and keeps the tracked file in sync with its IRS source.

The IRS geographic state SOI file currently only goes through tax year 2022, so this PR refreshes the latest available local artifact rather than moving local calibration to TY2023.

Testing

  • uv run ruff format --check policyengine_us_data/storage/calibration_targets/refresh_local_agi_state_targets.py tests/test_refresh_local_agi_state_targets.py
  • uv run pytest tests/test_refresh_local_agi_state_targets.py tests/test_refresh_soi_table_targets.py -q

anth-volk added a commit that referenced this pull request Mar 30, 2026
- Pass HUGGING_FACE_TOKEN to unit test step in pr.yaml so tests that
  transitively import huggingface.py can collect without crashing
- Fix test_etl_national_targets.py: remove nonexistent
  TAX_EXPENDITURE_REFORM_ID import, use reform_id > 0 filter instead
  (mirrors fix from unmerged PR #664)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
anth-volk added a commit that referenced this pull request Mar 31, 2026
- Pass HUGGING_FACE_TOKEN to unit test step in pr.yaml so tests that
  transitively import huggingface.py can collect without crashing
- Fix test_etl_national_targets.py: remove nonexistent
  TAX_EXPENDITURE_REFORM_ID import, use reform_id > 0 filter instead
  (mirrors fix from unmerged PR #664)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MaxGhenis MaxGhenis force-pushed the codex/local-soi-refresh branch from 78aea43 to 59a0a6a Compare April 1, 2026 14:16
@baogorek baogorek left a comment

Just a comment for now: I see this as legacy ECPS targeting enhancements. Nothing but love for the ECPS, but my primary aim is to make sure nothing gets undone with local area targets. (It shouldn't, because this doesn't touch the /etl folder, but I still get nervous!)

Do please see my comment about the max 500K bucket, given this is legacy.

@MaxGhenis MaxGhenis force-pushed the codex/local-soi-refresh branch from 59a0a6a to 8d1f57c Compare April 9, 2026 15:53
@MaxGhenis

Addressing Ben's top-bucket review note: the legacy state refresh now preserves the IRS state split for $500k-$1m and $1m+ instead of re-merging them into a single $500k+ bucket. I also regenerated the tracked agi_state.csv and tightened the refresh regression test around the separate top-tail rows.
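
To make the top-tail change concrete, here is a hedged sketch of the merge step that the refresh no longer applies. It assumes each target row can be represented as an `(agi_lower, agi_upper, value)` tuple; the function name `merge_top_tail` and the numbers are placeholders for illustration, not code or figures from this repository or the IRS file.

```python
import math


def merge_top_tail(rows, threshold=500_000):
    """Collapse rows whose lower bound is at or above `threshold` into one open-ended row.

    Each row is (agi_lower, agi_upper, value). Illustrative only.
    """
    below = [row for row in rows if row[0] < threshold]
    top_total = sum(value for lower, _, value in rows if lower >= threshold)
    return below + [(threshold, math.inf, top_total)]


# Placeholder values, not IRS figures.
split_rows = [
    (200_000, 500_000, 30.0),
    (500_000, 1_000_000, 12.0),
    (1_000_000, math.inf, 8.0),
]

# Merging loses the distinction between the two highest brackets.
merged_rows = merge_top_tail(split_rows)
```

Keeping `split_rows` as-is retains one target per source bracket, whereas `merged_rows` ends in a single open-ended row that sums the two highest brackets.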

2 participants