-
Notifications
You must be signed in to change notification settings - Fork 74
&joshdunnlime [ENH] TransformedDistribution and TransformedTargetRegressor cdf support
#611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
FYI @joshdunnlime, this is what I meant for a quick fix |
This reverts commit 36e20a3.
|
@joshdunnlime, any idea why the test is failing? cdf and ppf do not seem to be inverse to each other. |
|
@fkiraly is
So the |
|
@fkiraly - What are the methods |
|
@fkiraly merge https://github.com/joshdunnlime/skpro/tree/ttr-cdf-pdf-add-inv or just add inverse_transform=self.inverse_transform,to the cls call in |
|
This still raises a |
These are methods to subset or reorder indices of the array distribution. This is called in If the distribution is parametric, this is taken care of by the default, but in general, e.g., if the distribution is composite, it needs to be done manually currently. There is space to define a broader default including distribution objects as components, but that is currently an open issue:
I see, that must be it! |
TransformedDistribution and TransformedTargetRegressor cdf supportTransformedDistribution and TransformedTargetRegressor cdf support
can you outline your understanding of why this occurs? Which two objects exactly where, when passed to |
| assume_monotonic=self.assume_monotonic, | ||
| index=new_index, | ||
| columns=new_columns, | ||
| **params_dict, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is safer and more extensible. This might already be an almost solution for #559 (only needs to be combined with checks for type?)
If you have a fix, @joshdunnlime, could you open a PR with only the fix? So we can merge it quickly while the more complicated design questions remain open? |
|
Done. It is literally that one line to allow the exact |
|
|
||
| inv_trafo = self.inverse_transform | ||
|
|
||
| inv_x = inv_trafo(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where MinMaxScaler in my local script raised the warning.
I used:
warnings.filterwarnings('error')to catch and debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
I think this is resolved by ensuring x is a pd.DataFrame when we pass it.
I have added a comment on where the warning is raised. The I think the solution I have implemented in #612, where every function or transformer is wrapped in a DifferentiableTransformer would be the nicest way to fix this. It is then very easy to keep all of the transformation logic in there and out of the if not isinstance(x_t, pd.DataFrame):
x_t = pd.DataFrame(x_t, index=x.index, columns=x.columns)
else:
x_t.columns = x.columns
x_t.index = x.indexThis is the case for |
Can you explain why this would be the case? The data arrives inside This may have been an unfortunate design choice (not sure about this yet), but that also means whatever we do with |
|
I have opened a new design issue here: #615 I will now merge this PR to expedite the various improvements. |
I'm talking more about the output of the cdf, ppf and pdf. We are nearly always applying some transform in the TD so having the outputs (e.g. Jacobian) as dataframe would tidy up the TD code. It does deviate from default sklearn behaviour but sklearn does have a transformer to dataframe setting. Skpro TD method outputs are dataframes so this keeps consistency with that and makes debugging easier IMO.
I'll take a look. Thanks. |
This PR adds
cdfsupport toTransformedTargetRegressor, via changes inTransformedDistribution:TransformedDistributionnow acceptsinverse_transform, which can be used for an exactcdfTransformedTargetRegressorpasses the fittedself.transformer_transformto theinverse_transformofTransformedDistributionTogether with #610, it means that
TransformedTargetRegressorcan now produce distributions with reasonably reliablecdfandpdf.Goes partially towards #601.