Tighten remaining tail in Census SPM childcare cap replication #7961

@MaxGhenis

Summary

We now reproduce Census 2024 SPM work expenses and capped work+childcare expenses closely in aggregate after the recent childcare-cap fixes, but a narrow unit-level tail remains.

The current direct raw-CPS replication against the official 2025 ASEC public-use file (reference year 2024) shows:

  • SPM_WKXPNS weighted total ratio: 1.00185
  • SPM_CAPWKCCXPNS weighted total ratio: 0.99958
  • SPM_CAPWKCCXPNS positive-unit MAE: about $17
  • SPM_CAPWKCCXPNS positive-unit share within $1: about 97.22%

So the broad formula looks right. The remaining issue is localized tail accuracy, not aggregate bias.
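
For reference, a minimal sketch of how the four metrics above could be computed. The `UnitRecord` fields are hypothetical names, not pipeline variables, and the sketch assumes the MAE and within-$1 share are unweighted and that "positive unit" means a unit whose Census value is above zero. None of that is confirmed by the local analysis.

```python
from dataclasses import dataclass


@dataclass
class UnitRecord:
    weight: float      # SPM-unit weight (hypothetical field name)
    census: float      # Census-published value, e.g. SPM_CAPWKCCXPNS
    replicated: float  # our replicated value for the same unit


def weighted_total_ratio(units):
    """Replicated weighted total divided by the Census weighted total."""
    census_total = sum(u.weight * u.census for u in units)
    replicated_total = sum(u.weight * u.replicated for u in units)
    return replicated_total / census_total


def _positive(units):
    # Assumption: "positive unit" means the Census value is above zero.
    return [u for u in units if u.census > 0]


def positive_unit_mae(units):
    """Unweighted mean absolute error over positive units (assumption)."""
    positive = _positive(units)
    return sum(abs(u.replicated - u.census) for u in positive) / len(positive)


def positive_unit_share_within(units, tolerance=1.0):
    """Share of positive units whose error is within `tolerance` dollars."""
    positive = _positive(units)
    hits = sum(1 for u in positive if abs(u.replicated - u.census) <= tolerance)
    return hits / len(positive)
```

If the local analysis weights the unit-level metrics, the last two functions would need the weights folded in.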

What remains

The largest remaining gaps come from a small set of units where we still choose the wrong lower-earner/reference-person pairing for the childcare cap.
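
To illustrate why the pairing matters, here is a rough sketch of a lower-earner cap. It assumes the cap applies to the combined work and childcare amount and equals the lower of the two paired adults' earnings. The function name and arguments are hypothetical, and this is not a statement of the exact Census rule.

```python
def capped_work_childcare(work_expenses, childcare_expenses,
                          ref_earnings, partner_earnings=None):
    """Cap combined expenses at the lower earner's earnings (assumed rule)."""
    if partner_earnings is None:
        # No paired spouse/partner: cap at the reference person's earnings.
        cap = ref_earnings
    else:
        cap = min(ref_earnings, partner_earnings)
    # Negative earnings (e.g. self-employment losses) are treated as a zero cap.
    return min(work_expenses + childcare_expenses, max(cap, 0.0))


# Same unit, two candidate pairings for the reference person.
paired_with_low_earner = capped_work_childcare(2000, 8000, 50000, 3000)
paired_with_high_earner = capped_work_childcare(2000, 8000, 50000, 40000)
```

With these made-up numbers the first pairing binds the cap at the lower earner's 3000, while the second leaves the full 10000 uncapped. A single wrong pairing can move one unit by thousands of dollars and barely affect the aggregate.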

Current pattern:

  • the largest remaining misses are no longer mostly cohabiting units
  • only about 12% of the top-100 cap misses are cohabiting units
  • cohabitors account for only about 9.8% of top-100 absolute error
  • the remaining tail is now mostly in more complex non-cohabiting married/multi-adult SPM units
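
A sketch of how the top-100 breakdown above could be reproduced. The record keys (`census`, `replicated`, `is_cohabiting`) are hypothetical, and the ranking is assumed to use unweighted absolute error per unit.

```python
def top_miss_breakdown(records, n=100):
    """Cohabitor share of the top-n misses, by count and by absolute error."""
    def abs_err(r):
        return abs(r["replicated"] - r["census"])

    # Rank units by absolute error, largest first, and keep the top n.
    ranked = sorted(records, key=abs_err, reverse=True)[:n]
    total_error = sum(abs_err(r) for r in ranked)
    cohab = [r for r in ranked if r["is_cohabiting"]]

    return {
        "count_share": len(cohab) / len(ranked) if ranked else 0.0,
        "error_share": (
            sum(abs_err(r) for r in cohab) / total_error if total_error else 0.0
        ),
    }
```

Comparing the two shares shows whether cohabiting units are over- or under-represented among the largest errors relative to their count.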

Representative misses:

  • underpredictions where Census allows a much larger childcare add-on than our selected lower-earner cap
  • overpredictions where we still allow too much childcare relative to Census in a narrow set of units

Likely cause

The remaining tail appears to come from incomplete reconstruction of Census family-role logic in complex SPM units, especially around:

  • exact reference person / spouse mapping in multi-adult units
  • edge cases where tax-unit roles do not fully recover Census's SPM reference-person structure
  • possible additional relationship fields or tie-break rules not yet carried through the CPS pipeline

Proposed follow-up

  1. Build a reproducible raw-CPS comparison script into the repo so this does not live only in local analysis.
  2. Audit the worst remaining SPM_CAPWKCCXPNS misses record-by-record.
  3. Identify which extra CPS relationship/reference fields are needed to match Census role assignment in the remaining tail.
  4. Tighten the model only if the extra role reconstruction clearly improves the tail without hurting aggregate fit.
  5. Add regression tests for the newly understood edge cases.
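
Step 4 could be made mechanical with a simple acceptance check like the sketch below. The metric keys and the `max_ratio_drift` tolerance are placeholders chosen for illustration, not agreed thresholds.

```python
def accept_candidate(baseline, candidate, max_ratio_drift=0.001):
    """Accept a role-reconstruction change only if it helps the tail
    without degrading the aggregate fit.

    Both arguments are dicts with hypothetical keys:
      - "total_ratio": weighted total ratio vs. Census (1.0 is perfect)
      - "tail_abs_error": summed absolute error over the worst misses
    """
    tail_improves = candidate["tail_abs_error"] < baseline["tail_abs_error"]

    # Allow the aggregate ratio to move away from 1.0 by at most the tolerance.
    baseline_gap = abs(baseline["total_ratio"] - 1.0)
    candidate_gap = abs(candidate["total_ratio"] - 1.0)
    aggregate_holds = candidate_gap <= baseline_gap + max_ratio_drift

    return tail_improves and aggregate_holds
```

The same check could back a regression test, so that later changes to the role logic cannot silently trade aggregate fit for tail fit.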

Related work
