Use tax liability aggregates to validate/prune disaggregated PUF records #619

## Summary

Use the tax liability values carried by the original PUF aggregate records as additional calibration targets when generating synthetic disaggregated records. Combined with L0 regularization, this would prune implausible records.

## Motivation

The 4 aggregate PUF records contain not just income variables but also computed tax liabilities (E05800, E06200, E06500, E09600, E10300, etc.). Like every other variable on those records, these are per-return averages. Currently the disaggregation calibrates only to income totals. We could additionally require that the synthetic records, when scored through PolicyEngine, reproduce the known tax liability totals.

This is valuable because:
- A record with $500M AGI but an implausible deduction structure would produce the wrong tax liability
- Tax liabilities encode the **joint plausibility** of the entire return, not just individual variables
- L0 regularization naturally zeros out records that can't contribute to matching both income AND tax targets

## Approach

1. **Overgenerate** synthetic records (e.g., 200-400 per bucket instead of 20-40)
2. **Score** each through PolicyEngine to compute tax liabilities
3. **Set up a reweighting problem**: find non-negative weights that minimize the distance to:
   - Income component totals (E00200, P23250, E00650, etc.)
   - Tax liability totals (E05800, E09600, etc.)
   - Count targets from GDB
4. **L0 regularization** zeros out records that don't help match both sets of targets
5. **Output** the surviving records with their optimized weights
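
Steps 3 to 5 can be illustrated with a small, self-contained sketch. It is not the `microcalibrate` or `enhanced_cps.py` implementation: it swaps in iterative hard thresholding (a projected gradient step, then keeping only the largest weights) as a simple stand-in for L0 regularization. The names `sparse_reweight`, `relative_loss`, and `max_records` are hypothetical, the numbers are made up, and the tax column is assumed to have come from PolicyEngine scoring already.

```python
def relative_loss(features, targets, weights):
    """Sum of squared relative errors between weighted totals and targets."""
    loss = 0.0
    for j, target in enumerate(targets):
        total = sum(w * row[j] for w, row in zip(weights, features))
        loss += (total / target - 1.0) ** 2
    return loss


def sparse_reweight(features, targets, max_records, n_iter=5000):
    """Non-negative weights with at most `max_records` non-zero entries.

    features[i][j]: value of target variable j on candidate record i.
    targets[j]: known aggregate total for variable j (assumed non-zero).
    """
    n, m = len(features), len(targets)
    # Divide by the target so every variable contributes a relative error.
    scaled = [[features[i][j] / targets[j] for j in range(m)] for i in range(n)]
    # Conservative step: the squared Frobenius norm bounds the Lipschitz constant.
    step = 1.0 / sum(v * v for row in scaled for v in row)
    weights = [0.0] * n
    for _ in range(n_iter):
        # Relative residual per target under the current weights.
        resid = [
            sum(weights[i] * scaled[i][j] for i in range(n)) - 1.0
            for j in range(m)
        ]
        # Gradient step on 0.5 * ||resid||^2, clipped to non-negative weights.
        stepped = [
            max(0.0, weights[i] - step * sum(scaled[i][j] * resid[j] for j in range(m)))
            for i in range(n)
        ]
        # Hard threshold: keep only the `max_records` largest weights.
        keep = set(sorted(range(n), key=lambda i: stepped[i], reverse=True)[:max_records])
        weights = [stepped[i] if i in keep else 0.0 for i in range(n)]
    return weights


# Made-up candidates: columns are (wages, capital gains, scored tax liability).
candidates = [
    [100.0, 0.0, 20.0],
    [0.0, 100.0, 15.0],
    [50.0, 50.0, 40.0],  # tax far too high for this income mix
    [80.0, 20.0, 19.0],
    [20.0, 80.0, 16.0],
    [60.0, 40.0, 5.0],   # tax far too low for this income mix
]
targets = [1000.0, 1000.0, 350.0]  # hypothetical income and tax totals

weights = sparse_reweight(candidates, targets, max_records=3)
survivors = [(i, w) for i, w in enumerate(weights) if w > 0.0]
print(survivors, relative_loss(candidates, targets, weights))
```

Capping the number of surviving records makes the sparsity explicit. A penalty-based L0 formulation would instead trade fit against record count through a regularization strength, so the cap here is only a convenience for the sketch.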

### Tax year considerations

The raw PUF covers 2015 but gets uprated to 2021, and the tax liability values get uprated along with it. We'd score the uprated synthetic records under 2021 tax law (which PolicyEngine supports) and compare the results to the uprated tax liability totals.
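
The comparison step could look something like the sketch below, which reports the relative gap between weighted scored totals and the uprated targets for each tax variable. The helper name `relative_gaps` and all numbers are hypothetical, and the per-record values are assumed to have been computed by PolicyEngine beforehand (no PolicyEngine calls are shown).

```python
def relative_gaps(weights, scored, targets):
    """Return {variable: (weighted_total - target) / target}."""
    gaps = {}
    for name, target in targets.items():
        total = sum(w * record[name] for w, record in zip(weights, scored))
        gaps[name] = (total - target) / target
    return gaps


# Made-up scored records and uprated targets, keyed by PUF variable name.
scored = [
    {"E05800": 20.0, "E09600": 1.0},
    {"E05800": 15.0, "E09600": 3.0},
]
record_weights = [10.0, 10.0]
uprated_targets = {"E05800": 350.0, "E09600": 50.0}

gaps = relative_gaps(record_weights, scored, uprated_targets)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1%}")
```

A large gap on a tax variable while the income totals match would point to an implausible return structure rather than a simple scaling problem.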

### Integration point

This would replace the current Dirichlet + donor scaling approach in `disaggregate_puf.py`. The overgeneration step is fast; the PE scoring is the bottleneck, though scoring roughly 1,000 records is feasible.

## Dependencies

- #606 (aggregate record disaggregation — in progress)
- PolicyEngine-US must support the target tax year's law
- `microcalibrate` or custom L0 reweighting (already used in enhanced CPS)

## References

- L0 regularization: already used in `enhanced_cps.py` for sparse reweighting
- Tax variables in aggregate records: E05800, E06200, E06500, E09600, E10300, E10700, E10900

