
[FEATURE] Improve NA handling inside modeling #342

@talgalili

Description

As seen in the quickstart tutorial (https://import-balance.org/docs/tutorials/quickstart/), when a variable contains NA values, the default modeling logs the following:

        [ipw/ipw (line 767)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
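The `_is_na_gender` term suggests that the default transformation adds a 0/1 indicator for rows where `gender` is missing, and fills the missing values so those rows can still enter the model. A stdlib-only sketch of that idea, for illustration only: the helper `add_na_indicator_sketch` and the `"_NA"` fill value are assumptions, not balance's actual code.

```python
from typing import Dict, List, Optional


def add_na_indicator_sketch(
    data: Dict[str, List[Optional[object]]],
    fill_value: object = "_NA",  # assumed fill value, for illustration only
) -> Dict[str, List[object]]:
    """Hypothetical sketch: for each column containing None, add an
    `_is_na_<column>` 0/1 indicator and fill the missing entries."""
    out: Dict[str, List[object]] = {}
    indicators: Dict[str, List[int]] = {}
    for name, values in data.items():
        is_na = [1 if v is None else 0 for v in values]
        if any(is_na):
            out[name] = [fill_value if v is None else v for v in values]
            indicators[f"_is_na_{name}"] = is_na
        else:
            out[name] = list(values)
    # Indicator columns come after the original covariates,
    # matching the term order in the logged formula.
    out.update(indicators)
    return out


covars = {
    "income": [10.0, 25.5, 7.2],
    "gender": ["Female", None, "Male"],
    "age_group": ["18-24", "25-34", "35-44"],
}
transformed = add_na_indicator_sketch(covars)
print(list(transformed))
# ['income', 'gender', 'age_group', '_is_na_gender']
print(transformed["_is_na_gender"])
# [0, 1, 0]
```

A real implementation would presumably use different fill values for numeric and categorical columns; the single `fill_value` just keeps the sketch short.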

There are several ways to improve this (each of these should be its own PR!):

1/ It's not clear what the model does with NA values in gender. If they are imputed with the mean (or something similar), it's probably worth logging an INFO message about it.
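If the model does fill NAs, a log message along these lines would make that explicit. This is a minimal sketch assuming mean imputation of a numeric column (one of the options mentioned above; it would not apply to a categorical column like `gender`). `fill_na_with_info` and the logger name are hypothetical.

```python
import logging
from statistics import mean
from typing import List, Optional

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("na_imputation_sketch")


def fill_na_with_info(column: str, values: List[Optional[float]]) -> List[float]:
    """Hypothetical helper: mean-impute missing numeric values and log an INFO message."""
    observed = [v for v in values if v is not None]
    n_na = len(values) - len(observed)
    if n_na == 0:
        return observed  # nothing to impute
    if not observed:
        raise ValueError(f"Column '{column}' has no observed values to impute from.")
    fill = mean(observed)
    logger.info(
        "Column '%s': %d of %d values (%.1f%%) were NA and were imputed with the mean (%.3g).",
        column, n_na, len(values), 100 * n_na / len(values), fill,
    )
    return [fill if v is None else v for v in values]


logging.basicConfig(level=logging.INFO)
print(fill_na_with_info("income", [10.0, None, 20.0, None]))
# [10.0, 15.0, 20.0, 15.0]
```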

2/ Adding more information about missingness in variables (in the default outputs and elsewhere) would be valuable, since missingness can be a big deal. E.g., instead of just printing this:

print(adjusted)
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender,age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness

we could add missingness info, such as:

print(adjusted)
        Adjusted balance Sample object with target set using ipw
        1000 observations x 3 variables: gender(8% NA),age_group,income
        id_column: id, weight_column: weight,
        outcome_columns: happiness
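One way to produce the `gender(8% NA)` style shown above is to compute each variable's NA share and append it only when it is non-zero. The sketch below is stdlib-only and hypothetical (`format_variables_with_na` is not an existing balance function), runs on toy data, and rounds to whole percentages as an assumption chosen to match the example.

```python
from typing import Dict, List, Optional


def format_variables_with_na(data: Dict[str, List[Optional[object]]]) -> str:
    """Hypothetical helper: render each variable as 'name' or 'name(k% NA)'."""
    parts = []
    for name, values in data.items():
        n_na = sum(v is None for v in values)
        if n_na == 0:
            parts.append(name)
            continue
        pct = round(100 * n_na / len(values))
        # Avoid printing "0% NA" when only a few values are missing.
        label = f"{pct}%" if pct > 0 else "<1%"
        parts.append(f"{name}({label} NA)")
    return ",".join(parts)


# Toy data for illustration only.
n = 1000
covars = {
    "gender": [None] * 80 + ["Female"] * 460 + ["Male"] * 460,
    "age_group": ["25-34"] * n,
    "income": [42.0] * n,
}
print(f"{n} observations x {len(covars)} variables: {format_variables_with_na(covars)}")
# 1000 observations x 3 variables: gender(8% NA),age_group,income
```

Showing the share only for variables that actually have NAs keeps the common (no-missingness) case identical to the current output.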

3/ Consider more carefully what should be included in adjusted.covars().summary().

4/ Include the is_na indicator as an interaction term, instead of only as an additive term, and call this transformation default_with_na.
The new formula would be:

        [ipw/ipw (line 767)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender + gender:_is_na_gender']
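The proposed formula could be generated mechanically from the covariate list and the set of variables that contain NA. A minimal sketch, assuming a hypothetical `build_formula` helper and the `_is_na_<var>` naming seen in the logs above: `interact=False` reproduces the currently logged additive formula, and `interact=True` produces the proposed default_with_na one.

```python
from typing import Iterable, List


def build_formula(covariates: List[str], na_covariates: Iterable[str], interact: bool = True) -> str:
    """Hypothetical sketch: build the model-matrix formula string with NA indicators."""
    na_set = set(na_covariates)
    na_vars = [c for c in covariates if c in na_set]  # keep the covariate order
    terms = list(covariates)
    terms += [f"_is_na_{c}" for c in na_vars]  # additive indicators, as in the current log
    if interact:
        terms += [f"{c}:_is_na_{c}" for c in na_vars]  # proposed default_with_na terms
    return " + ".join(terms)


covariates = ["income", "gender", "age_group"]
print(build_formula(covariates, ["gender"], interact=False))
# income + gender + age_group + _is_na_gender
print(build_formula(covariates, ["gender"]))
# income + gender + age_group + _is_na_gender + gender:_is_na_gender
```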

More ideas?

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)