Split Enhanced CPS CTC calibration targets across national and unified paths #711

Merged
MaxGhenis merged 15 commits into main from codex/fix-legacy-ctc-calibration
Apr 10, 2026

Conversation

@MaxGhenis
Contributor

Closes #709.

This branch fixes the legacy national Enhanced CPS CTC calibration path so the refundable and non-refundable components are calibrated and validated separately.

What changed:

  • Adds the IRS geography-file mapping for the non-refundable child and other dependent credit alongside the refundable CTC mapping.
  • Calibrates the legacy national loss matrix to separate refundable and non-refundable CTC amounts and recipient counts.
  • Updates the national and staging validation helpers to report both CTC components against IRS SOI references.
  • Expands unit coverage for the IRS target lookup and legacy loss-target assembly.
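
As a rough illustration of what calibrating the two components separately means, the sketch below builds one amount column and one recipient-indicator column per credit and computes the weighted totals that would be compared against IRS SOI references. The records, weights, and helper names (`loss_columns`, `weighted_total`) are hypothetical and are not taken from `policyengine_us_data/utils/loss.py`.

```python
# Hypothetical tax-unit records: (weight, refundable_ctc, non_refundable_ctc).
TAX_UNITS = [
    (1000.0, 1500.0, 500.0),
    (2000.0, 0.0, 2000.0),
    (1500.0, 0.0, 0.0),
]


def loss_columns(values):
    """Return (amount column, recipient-indicator column) for one credit."""
    amounts = list(values)
    recipients = [1.0 if v > 0 else 0.0 for v in values]
    return amounts, recipients


def weighted_total(weights, column):
    """Weighted sum of one loss-matrix column."""
    return sum(w * x for w, x in zip(weights, column))


weights = [unit[0] for unit in TAX_UNITS]
estimates = {}
for name, idx in (("refundable_ctc", 1), ("non_refundable_ctc", 2)):
    amounts, recipients = loss_columns([unit[idx] for unit in TAX_UNITS])
    # Each credit contributes its own amount target and recipient-count target.
    estimates[f"{name}/amount"] = weighted_total(weights, amounts)
    estimates[f"{name}/count"] = weighted_total(weights, recipients)
```

Keeping four separate rows (two amounts, two counts) lets the calibration miss on one component without the other masking it, which a single combined CTC total could not reveal.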

Tests:

  • uv run pytest tests/unit/test_etl_irs_soi_overlay.py tests/unit/calibration/test_loss_targets.py tests/unit/calibration/test_validate_national_h5.py tests/unit/calibration/test_check_staging_sums.py
  • uv run ruff check policyengine_us_data/db/etl_irs_soi.py policyengine_us_data/utils/loss.py policyengine_us_data/calibration/validate_national_h5.py policyengine_us_data/calibration/check_staging_sums.py tests/unit/test_etl_irs_soi_overlay.py tests/unit/calibration/test_loss_targets.py tests/unit/calibration/test_validate_national_h5.py tests/unit/calibration/test_check_staging_sums.py

Collaborator

@baogorek baogorek left a comment

The title says "legacy," but since this PR is already tackling some files used only by local area calibration, it might be worth it to just go all in and attempt to fix CTC in both spots. You can remind the robot that this is how we build the X matrix for the local area calibration:

python -m policyengine_us_data.calibration.unified_calibration \
  --build-only

At the very least, make database must pass:

● make database fails at the etl_irs_soi.py step with:

  sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) Invalid period value for targets
  [parameters: ('tax_unit_count', 2021, 10356, 0, 37086500.0, ...)]

  The DB's field_valid_values trigger rejects period=2021 because only 2022-2025 are registered as valid periods. So CI will fail on any workflow that runs make database.

If that didn't fail during the PR build, I'd appreciate it if you could check why.

I also requested some light documentation on ORG.

@MaxGhenis MaxGhenis changed the title from "Split legacy Enhanced CPS CTC calibration targets" to "Split Enhanced CPS CTC calibration targets across national and unified paths" Apr 9, 2026
@MaxGhenis
Contributor Author

Addressed the main review blocker on this branch:

  • The new DB-backed CTC geography targets now write period=target_year rather than period=source_year, while keeping the IRS source year in notes. That fixes the period=2021 integrity error Ben flagged.
  • uv run make database now passes locally on this branch.
  • Added a regression test that reproduces the old failure mode with target_year=2023 and source_year=2021.
  • Added a short module docstring in policyengine_us_data/datasets/org/org.py explaining that census_cps_org_2024_wages.csv.gz is a generated cache built from CPS basic-month files, not a vendored source artifact.
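
The period fix and its regression test can be sketched roughly as follows. This is a self-contained approximation, not the project's code: the `targets` schema, the `validate_period` trigger, and the `insert_target` helper are hypothetical stand-ins for the real database and its `field_valid_values` trigger. Only the error message, the valid range 2022-2025, and the years 2021 and 2023 come from the discussion above.

```python
import sqlite3


def make_db():
    """In-memory DB whose trigger rejects periods outside 2022-2025."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE targets "
        "(variable TEXT, period INTEGER, value REAL, notes TEXT)"
    )
    conn.execute(
        """
        CREATE TRIGGER validate_period BEFORE INSERT ON targets
        FOR EACH ROW WHEN NEW.period NOT IN (2022, 2023, 2024, 2025)
        BEGIN
            SELECT RAISE(ABORT, 'Invalid period value for targets');
        END
        """
    )
    return conn


def insert_target(conn, variable, value, target_year, source_year):
    """Write period=target_year and keep the IRS source year in notes."""
    conn.execute(
        "INSERT INTO targets VALUES (?, ?, ?, ?)",
        (variable, target_year, value, f"IRS SOI source year: {source_year}"),
    )


conn = make_db()

# Old failure mode: using the source year (2021) as the period is rejected.
try:
    conn.execute(
        "INSERT INTO targets VALUES (?, ?, ?, ?)",
        ("tax_unit_count", 2021, 37086500.0, ""),
    )
    old_path_failed = False
except sqlite3.IntegrityError:
    old_path_failed = True

# Fixed behavior: the row is stored under the target year (2023).
insert_target(conn, "tax_unit_count", 37086500.0, target_year=2023, source_year=2021)
stored = conn.execute("SELECT period, notes FROM targets").fetchall()
```

Recording the source year in `notes` keeps the provenance of the IRS figure without asking the period validation to accept a year the database does not register.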

On the CI question: the PR workflow currently gates unit-tests, smoke-test, and integration-tests on lint, and there is no dedicated make database job in .github/workflows/pr.yaml. So once lint failed, the jobs that might have surfaced the DB issue were skipped rather than proving the branch clean.
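
To make the gating behavior concrete, here is a toy model of how dependent jobs are skipped when a job they need fails, which is GitHub Actions' default behavior for `needs`. The job names match those mentioned above; the `resolve` function and the dependency map are illustrative assumptions, not a reading of the actual `.github/workflows/pr.yaml`.

```python
def resolve(jobs, outcomes):
    """Resolve job statuses.

    jobs: mapping of job name -> list of jobs it needs, listed in
          dependency order (a job appears after everything it needs).
    outcomes: mapping of job name -> False if the job would fail when run;
              jobs not listed are assumed to pass.
    """
    status = {}
    for name, needs in jobs.items():
        if any(status[dep] != "success" for dep in needs):
            # A failed or skipped dependency means this job never runs.
            status[name] = "skipped"
        else:
            status[name] = "success" if outcomes.get(name, True) else "failure"
    return status


# Assumed dependency structure: everything is gated on lint.
JOBS = {
    "lint": [],
    "unit-tests": ["lint"],
    "smoke-test": ["lint"],
    "integration-tests": ["lint"],
}

status = resolve(JOBS, {"lint": False})
```

In this model a lint failure leaves every downstream job as `skipped`, so a red lint check says nothing about whether the database build would have passed.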

The unified/local path piece is already on this branch via the target_config.yaml additions for non_refundable_ctc amount and recipient-count targets; I re-verified that with tests/unit/calibration/test_target_config.py.

@MaxGhenis
Contributor Author

Addressed the review items from Ben:

  • Retitled the PR to drop the legacy wording now that it also touches the unified path.
  • Added non_refundable_ctc to the shared IRS geography-file target registry in etl_irs_soi.py, so the national overlay and transformed SOI targets reuse one source of truth instead of a one-off hard-coded map.
  • Added the matching unified-calibration district target in target_config.yaml, so the non-refundable CTC split is no longer only a legacy-national change.
  • Kept the ORG docstring update in org.py.
  • Re-ran uv run make database locally after the period fix and again after this registry refactor; it passes cleanly.
  • The earlier unit-test CI failure was stale-head related: the fix had initially only been pushed to the fork branch, while #711 tracks PolicyEngine/codex/fix-legacy-ctc-calibration. The PR is now on upstream head e53f022c, and GitHub has started a fresh Actions run from that commit.
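
The "one source of truth" idea from the registry item above might look something like the sketch below, where both consumers derive their column and target names from a single list. Everything here is an assumption for illustration: the registry shape, the helper names, and the IRS codes (`11070` for the additional, refundable child tax credit and `07225` for the child and other dependent credit, with `A`/`N` prefixes for amounts and return counts) should be checked against `etl_irs_soi.py` and the IRS SOI documentation.

```python
# Assumed registry shape and IRS codes; verify before relying on them.
IRS_GEOGRAPHY_TARGETS = [
    {"code": "11070", "variable": "refundable_ctc"},
    {"code": "07225", "variable": "non_refundable_ctc"},
]


def amount_column(code):
    """IRS geography files prefix amount columns with 'A'."""
    return f"A{code}"


def count_column(code):
    """IRS geography files prefix number-of-returns columns with 'N'."""
    return f"N{code}"


def national_overlay_columns(registry=IRS_GEOGRAPHY_TARGETS):
    """Hypothetical consumer 1: variable -> (amount column, count column)."""
    return {
        entry["variable"]: (amount_column(entry["code"]), count_column(entry["code"]))
        for entry in registry
    }


def district_target_names(registry=IRS_GEOGRAPHY_TARGETS):
    """Hypothetical consumer 2: target names for a district-level config."""
    names = []
    for entry in registry:
        names.append(f"{entry['variable']}/amount")
        names.append(f"{entry['variable']}/count")
    return names


overlay = national_overlay_columns()
targets = district_target_names()
```

With this layout, adding a credit means appending one registry entry, and both the national overlay and the district targets pick it up without a second hard-coded map.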

@MaxGhenis MaxGhenis enabled auto-merge (squash) April 9, 2026 20:55
@vercel

vercel bot commented Apr 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
pipeline-diagrams Error Error Apr 10, 2026 0:53am

Collaborator

@baogorek baogorek left a comment

Fire at will

@MaxGhenis MaxGhenis merged commit c0e924d into main Apr 10, 2026
10 of 11 checks passed

Development

Successfully merging this pull request may close these issues.

Legacy Enhanced CPS does not properly calibrate refundable CTC

2 participants