Description
As seen here: https://import-balance.org/docs/tutorials/quickstart/
When a variable contains NA values, the default modeling produces this formula:
[ipw/ipw (line 767)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender']
There are several ways to do better (each of these should be its own PR!):
1/ It's not clear what the model does with NA values in gender. If they are imputed with the mean (or something similar), it's probably worth adding an INFO message about it.
2/ Adding more information about missingness in variables (in the default output and elsewhere) is valuable, since missingness can be a big deal. E.g., instead of just printing this:
print(adjusted)
Adjusted balance Sample object with target set using ipw
1000 observations x 3 variables: gender,age_group,income
id_column: id, weight_column: weight,
outcome_columns: happiness
we could include missingness info, such as:
print(adjusted)
Adjusted balance Sample object with target set using ipw
1000 observations x 3 variables: gender(8% NA),age_group,income
id_column: id, weight_column: weight,
outcome_columns: happiness
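A minimal sketch of how the variable list with NA percentages could be built. It is not balance's implementation: `is_missing` and `describe_variables` are hypothetical names, the data is passed as a plain dict of lists rather than a Sample object, and rounding to whole percents is an assumption.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def describe_variables(columns):
    """Return a comma-separated variable list, tagging columns that have NAs.

    `columns` maps a variable name to a list of its values.
    """
    parts = []
    for name, values in columns.items():
        n_missing = sum(is_missing(v) for v in values)
        if n_missing:
            pct = 100 * n_missing / len(values)
            parts.append(f"{name}({pct:.0f}% NA)")
        else:
            parts.append(name)
    return ",".join(parts)


# Toy data: 8 of 100 gender values are missing.
example = {
    "gender": ["Female"] * 92 + [None] * 8,
    "age_group": ["25-34"] * 100,
    "income": [10.0] * 100,
}
print(describe_variables(example))
```

Very small missingness rates would round to "0% NA" here, so a real implementation might want one decimal place or a "<1%" label.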
3/ Consider more carefully what should be in adjusted.covars().summary()
4/ Include is_na as an interaction term, instead of just an additive term.
Call this transformation default_with_na.
The new formula should be:
[ipw/ipw (line 767)]: The formula used to build the model matrix: ['income + gender + age_group + _is_na_gender + gender:_is_na_gender']
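The proposed formula can be sketched as plain string construction. This is only an illustration: `build_formula` is a hypothetical helper, not part of balance, and the `_is_na_` prefix is taken from the log line above.

```python
def build_formula(variables, na_variables, interaction=True):
    """Build a formula string with NA-indicator terms.

    variables:    all covariate names, in order.
    na_variables: the subset of covariates that contain NA values.
    interaction:  if True, also add a `var:_is_na_var` interaction term.
    """
    terms = list(variables)
    for var in na_variables:
        indicator = f"_is_na_{var}"
        terms.append(indicator)
        if interaction:
            terms.append(f"{var}:{indicator}")
    return " + ".join(terms)


covars = ["income", "gender", "age_group"]

# Current behavior: additive indicator only.
print(build_formula(covars, ["gender"], interaction=False))

# Proposed default_with_na behavior: indicator plus interaction.
print(build_formula(covars, ["gender"]))
```

With `interaction=False` this reproduces the current formula, and with the default it reproduces the proposed one.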
More ideas?