Tighten remaining tail in Census SPM childcare cap replication #7961

## Summary
After the recent childcare-cap fixes, we reproduce Census 2024 SPM work expenses and capped work+childcare expenses closely in aggregate, but a narrow tail of unit-level mismatches remains.

Current direct raw-CPS replication against the official 2025 ASEC public-use file for 2024 shows:
- `SPM_WKXPNS` weighted total ratio: `1.00185`
- `SPM_CAPWKCCXPNS` weighted total ratio: `0.99958`
- `SPM_CAPWKCCXPNS` positive-unit MAE: about `$17`
- `SPM_CAPWKCCXPNS` positive-unit share within `$1`: about `97.22%`

So the broad formula looks right. The remaining issue is localized tail accuracy, not aggregate bias.
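
For reference, the metrics above could be computed roughly as follows. This is a minimal sketch, not the script that produced the numbers: the function name `fit_metrics`, the choice to weight the unit-level metrics, and the definition of "positive unit" as a unit with a positive Census value are all assumptions, and the data is a toy example.

```python
def fit_metrics(census, model, weights, tol=1.0):
    """Compare model values with Census values for one SPM-unit variable.

    Assumptions (not confirmed by the issue): unit-level metrics are
    weighted, and "positive unit" means the Census value is > 0.
    """
    # Weighted total ratio: model aggregate divided by Census aggregate.
    census_total = sum(w * c for c, w in zip(census, weights))
    model_total = sum(w * m for m, w in zip(model, weights))

    # Restrict unit-level metrics to units with a positive Census value.
    pos = [(c, m, w) for c, m, w in zip(census, model, weights) if c > 0]
    pos_weight = sum(w for _, _, w in pos)

    # Weighted mean absolute error and weighted share within the tolerance.
    mae = sum(w * abs(m - c) for c, m, w in pos) / pos_weight
    within = sum(w for c, m, w in pos if abs(m - c) <= tol) / pos_weight

    return {
        "total_ratio": model_total / census_total,
        "positive_mae": mae,
        "share_within_tol": within,
    }


# Toy data only; these are not values from the ASEC file.
toy = fit_metrics(
    census=[0.0, 1000.0, 2500.0, 400.0],
    model=[0.0, 1000.0, 2400.0, 400.5],
    weights=[100.0, 200.0, 100.0, 100.0],
)
print(toy)
```

Keeping the aggregate ratio and the unit-level metrics in one function makes it harder to improve one while silently degrading the other.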

## What remains
The largest remaining gaps come from a small set of units where we still choose the wrong lower-earner/reference-person pairing for the childcare cap.

Current pattern:
- the top remaining misses are no longer mostly cohabiting units
- only about `12%` of the top-100 cap misses are cohabiting units
- cohabitors account for only about `9.8%` of top-100 absolute error
- the remaining tail is now mostly in more complex non-cohabiting married/multi-adult SPM units

Representative misses:
- underpredictions where Census allows a much larger childcare add-on than our selected lower-earner cap
- overpredictions where we still allow too much childcare relative to Census in a narrow set of units
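
The composition figures above could be reproduced along these lines. This is a sketch under stated assumptions: the field names (`census`, `model`, `is_cohabiting`) are hypothetical, misses are ranked by unweighted absolute error, and the records are toy values rather than ASEC data.

```python
def top_miss_composition(units, n=100):
    """Summarize the n largest absolute misses.

    `units` is a list of dicts with hypothetical keys
    'census', 'model', and 'is_cohabiting'.
    """
    def abs_err(u):
        return abs(u["model"] - u["census"])

    # Rank by absolute error (unweighted here, which is an assumption).
    ranked = sorted(units, key=abs_err, reverse=True)[:n]
    total_err = sum(abs_err(u) for u in ranked)
    cohab = [u for u in ranked if u["is_cohabiting"]]

    return {
        # Share of top-n misses that are cohabiting units.
        "cohab_count_share": len(cohab) / len(ranked),
        # Share of top-n absolute error attributable to cohabiting units.
        "cohab_error_share": sum(abs_err(u) for u in cohab) / total_err,
        # Direction of the misses relative to Census.
        "underpredictions": sum(1 for u in ranked if u["model"] < u["census"]),
        "overpredictions": sum(1 for u in ranked if u["model"] > u["census"]),
    }


# Toy records only.
toy_units = [
    {"census": 1000.0, "model": 500.0, "is_cohabiting": False},
    {"census": 200.0, "model": 500.0, "is_cohabiting": True},
    {"census": 800.0, "model": 600.0, "is_cohabiting": False},
    {"census": 100.0, "model": 110.0, "is_cohabiting": True},
]
composition = top_miss_composition(toy_units, n=3)
print(composition)
```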

## Likely cause
The remaining tail appears to come from incomplete reconstruction of Census family-role logic in complex SPM units, especially around:
- exact reference person / spouse mapping in multi-adult units
- edge cases where tax-unit roles do not fully recover Census's SPM reference-person structure
- possible additional relationship fields or tie-break rules not yet carried through the CPS pipeline
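
To illustrate why the pairing matters, here is a simplified sketch of the cap. It assumes that combined work and childcare expenses are limited to the earnings of the lower earner among the reference person and spouse/partner. The function name, its signature, and the dollar amounts are hypothetical, and the code is not taken from the policyengine-us implementation.

```python
def capped_work_childcare(work_expenses, childcare, ref_earnings,
                          partner_earnings=None):
    """Simplified cap: work + childcare expenses cannot exceed the
    earnings of the lower earner in the reference-person pair.

    Assumption: with no spouse/partner, the reference person's own
    earnings serve as the cap.
    """
    if partner_earnings is None:
        cap = ref_earnings
    else:
        cap = min(ref_earnings, partner_earnings)
    # Negative earnings (e.g. self-employment losses) floor the cap at zero.
    return min(work_expenses + childcare, max(cap, 0.0))


# A hypothetical multi-adult unit: the result depends entirely on which
# adult is treated as the reference person's spouse/partner.
work, care = 2000.0, 8000.0
paired_with_low_earner = capped_work_childcare(work, care, 60000.0, 5000.0)
paired_with_mid_earner = capped_work_childcare(work, care, 60000.0, 30000.0)
print(paired_with_low_earner, paired_with_mid_earner)
```

In this toy unit, the same expenses yield a cap-bound amount under one pairing and the full uncapped amount under the other, which is consistent with both the under- and overpredictions described above.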

## Proposed follow-up
1. Build a reproducible raw-CPS comparison script into the repo so this does not live only in local analysis.
2. Audit the worst remaining `SPM_CAPWKCCXPNS` misses record-by-record.
3. Identify which extra CPS relationship/reference fields are needed to match Census role assignment in the remaining tail.
4. Tighten the model only if the extra role reconstruction clearly improves the tail without hurting aggregate fit.
5. Add regression tests for the newly understood edge cases.
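
The acceptance rule in step 4 could be encoded as a simple gate. This is a sketch only: `accept_change`, the metric keys, and the drift tolerance are hypothetical, and the candidate numbers are placeholders rather than results.

```python
def accept_change(baseline, candidate, max_extra_drift=0.0005):
    """Accept a role-reconstruction change only if the tail improves
    and aggregate fit does not get materially worse.

    Both arguments are dicts with hypothetical keys:
      'ratio' - weighted total ratio (model / Census)
      'mae'   - positive-unit mean absolute error in dollars
    """
    tail_improves = candidate["mae"] < baseline["mae"]
    # Aggregate fit is measured as distance of the ratio from 1.
    base_dev = abs(baseline["ratio"] - 1.0)
    cand_dev = abs(candidate["ratio"] - 1.0)
    aggregate_ok = cand_dev <= base_dev + max_extra_drift
    return tail_improves and aggregate_ok


# Baseline uses the figures reported in the summary; candidates are made up.
baseline = {"ratio": 0.99958, "mae": 17.0}
good = accept_change(baseline, {"ratio": 0.99950, "mae": 12.0})
hurts_aggregate = accept_change(baseline, {"ratio": 0.99000, "mae": 12.0})
no_tail_gain = accept_change(baseline, {"ratio": 0.99958, "mae": 18.0})
print(good, hurts_aggregate, no_tail_gain)
```

A gate like this could run inside the comparison script from step 1, so that every candidate change is checked against the same baseline.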

## Related work
- policyengine-us-data PR #705
- policyengine-us PR #7960
- related broader SPM tracking issue: #3686

