[ENH] Add LogLoss to TransformedTargetRegressor (TTR) #605
base: main
Conversation
…ion. This allows us to get more information from the transformer class, e.g. MinMaxScaler's scale_ parameter. Though not part of the typical forward-facing interface, TransformedDistribution.transform is now no longer callable; instead, TransformedDistribution.transformer.inverse_transform would be needed.
This is useful when testing XGBoostLLS and hyperparameter optimisation is not important. XGBoostLLS can be computationally expensive when searching over the default 30 trials.
Does so by adding the log_pdf with change-of-variables (Jacobian of the inverse transform), a direct method for linear transforms, and a numerical method for non-linear transforms. TransformedDistribution now takes transformer instead of transform - that is, the transformer class instead of just the inverse transform function/method.
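The change-of-variables computation can be sketched as follows. This is an illustrative reconstruction, not the actual skpro code: the `log_pdf` helper, the standard-normal base distribution, and the use of MinMaxScaler are all assumptions for the sketch.

```python
import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import MinMaxScaler

# Fit a transformer on the target; model the *transformed* target with a
# standard normal, standing in for the regressor's predictive distribution.
y_train = np.linspace(1.0, 10.0, 50).reshape(-1, 1)
scaler = MinMaxScaler().fit(y_train)

def log_pdf(y):
    """Change of variables: log p_Y(y) = log p_X(t(y)) + log|dt/dy|,
    where t is the forward transform and X = t(Y) is the base variable."""
    x = scaler.transform(y)
    # MinMaxScaler is linear, so dt/dy is the constant scale_ -
    # this is the "direct method for linear transforms" mentioned above.
    log_jac = np.log(np.abs(scaler.scale_))
    return norm.logpdf(x) + log_jac

print(log_pdf(np.array([[5.5]])))  # approximately [[-3.2412]]
```

For non-linear transforms the constant `scale_` term would be replaced by a pointwise Jacobian, which is where the numerical method comes in.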
This is not actually intended as an example. It is a simple way to share findings. This can (should) be removed before any final merges into main.
```diff
 self,
 distribution,
-transform,
+transformer,
```
sorry, we cannot change the names of already released arguments, or it might break user code. Not without deprecation and warnings, at least.
Understood. I'll make some changes.
```diff
-trafo = self.transform
+trafo = self.transformer.inverse_transform
```
why are we changing this?
self.transform is currently just the inverse function, not the whole transformer class. The idea was to expose the whole class so we can do numerical differentiation or access the scale_ attribute. Based on the comment above, this isn't favourable.
I think not easily possible, because the __init__ resets every estimator that is a component to a pre-fitted state on construction, and because of deprecation.
Overall, I think it is a very good idea, though one may have to find the right implementation.
Would passing inverse_transform as an optional additional argument work?
Alternatively, we have to pass something like sklearn FrozenEstimator wrapping the transformer, though it feels like a bit of an overhead software engineering wise.
The general question is to whether, and if yes how, to allow, API-wise, fitted estimators to be passed to __init__ - currently this violates the scikit-learn-like design.
fkiraly
left a comment
Great idea! Though we cannot merge this as is, since:
- this changes the arguments of existing released classes, possibly breaking user code
- I think it introduces too much coupling between TTR and the TransformedDistribution; we should avoid that.
Suggestion: how about we allow the user to pass both transform and inverse_transform (optional) to the TransformedDistribution? Then we can do numerical differentiation.
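A central-difference approximation along these lines would suffice for that suggestion. This is a minimal sketch; the name `numerical_jacobian` and its signature are hypothetical, not part of the PR.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference approximation of d func(x) / dx, elementwise.

    Adequate for monotone transforms applied elementwise, which is the
    typical TransformedTargetRegressor setting.
    """
    x = np.asarray(x, dtype=float)
    return (func(x + eps) - func(x - eps)) / (2 * eps)

# Jacobian of the log transform; the exact value is 1/x.
x = np.array([0.5, 1.0, 2.0])
print(numerical_jacobian(np.log, x))  # close to [2.0, 1.0, 0.5]
```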
fkiraly
left a comment
may I request:
- remove the BaseTransformer from this pull request and open a separate one, since we will have some API discussions; let's aim to complete this with numerical differentiation, and later extend it with exact Jacobians in a separate PR
- may I suggest to move the numerical differentiation to TD for it to happen only inside _pdf and _log_pdf? And leave TTR untouched? This we could merge quickly, and it could be "version 0.1" of the feature you want.
- testing it might also highlight aspects we have not yet thought about
Apologies, I made the DiffTransformer changes on the wrong branch.

No problem. May I suggest to change this PR to revert changes across the classes, and add only an approximate Jacobian? That would be a good starting point for adding more features in separate PRs (e.g., autodiff-like).
fkiraly
left a comment
Thanks - almost ready to merge. The public methods should not be overridden; private methods should be implemented.
See how, for instance, _cdf is implemented.
```diff
 else:
     raise NotImplementedError

+jac = np.abs(self.transformer_.inverse_transform_diff(x))
```
this we should replace with an approximate jacobian
This is needed so that when TransformedDistribution.distribution is called, it has the same indices as the wrapper distribution.
I have implemented the changes. This PR could be merged, but it does have the potential to return very approximate pdf and log_pdf results. My suggestion would instead be to make the DifferentiableTransformer or DifferentiableFunction changes on a branch from this (branch stacking) and merge it all at once.

Notes for reviewer/@fkiraly:
- I have added some warnings to inform the user that these are approximations, and added this to the docs as well. These can be removed when we implement the Jacobian.
- A second, rather subtle issue came up when making the change.
- Finally, to pass the "pdf_log_pdf" tests, these had to be added to the approx tags.
Yes, good thinking - this is consistent with the default approximations in
Yes, I think that is the correct choice to hard code, it is the most general class. We can dynamically change this later if we want to.
Well, looks like the tests were doing their job.
fkiraly
left a comment
Some questions which may imply change requests:
- why are we doing the branch if hasattr(dist, "_distribution_attr")? This feels like a violation of the law of Demeter. We should only use dist.pdf.
- the wrong indices are explained by this, I think. You should not need to correct them if all we do is call dist.pdf.
Can you also explain to me how the computation is an approximation to the transformed pdf? It is missing the Jacobian.
I guess that depends on the definition of approximate - I might have been somewhat loose with it. Refer to my sentence from above:
I recommend not merging this independently but waiting until we have a solution for the exact Jacobian in place. My suggestion is to use branch stacking to implement the DiffT or DiffFunc classes. I have this done already and will open a new PR (in place of #608).
Actually, if it was
I was under the impression you were going to implement numerical derivatives. I have now made a draft here in the base class: #610. This is due to realizing that this is nothing specific to the transformed distribution - any distribution that does not have a log_pdf. This should, in particular, fix it for the TransformedDistribution.
Perhaps a misunderstanding on my behalf. This is now fixed. Would we like to roll back the changes to TTR, or are these beneficial? E.g. if a user wants to dig into the internals of TTR/TD, the indices are shared between TD and the wrapped distribution.
It was
I would say:
What are the indices currently? I think they should be the same between row and column index.
OK - some confusion around approximate here:
Yes, but I am also adding explicit differentiation in two cases: 1) the user passes the explicit inverse_transform_diff. In these cases, the Jacobian is exact and therefore so is the log_pdf.
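For affine sklearn scalers the exact Jacobian can be read straight off the fitted attributes. A sketch under that assumption - the helper name `inverse_jacobian_exact` is hypothetical, not the PR's actual method:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def inverse_jacobian_exact(transformer, x):
    """Exact derivative of transformer.inverse_transform at x, available
    in closed form for affine scalers:
      StandardScaler: inverse_transform(x) = x * scale_ + mean_
      MinMaxScaler:   inverse_transform(x) = (x - min_) / scale_
    Both derivatives are constant in x."""
    if isinstance(transformer, StandardScaler):
        return np.ones_like(x) * transformer.scale_
    if isinstance(transformer, MinMaxScaler):
        return np.ones_like(x) / transformer.scale_
    raise NotImplementedError("no closed-form Jacobian for this transformer")

y = np.array([[1.0], [4.0], [9.0]])
sc = StandardScaler().fit(y)
print(inverse_jacobian_exact(sc, sc.transform(y)))  # constant, equal to std(y)
```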
Understood, but the TD doesn't have a (edit) or is this implemented for TD via the BaseDistribution?
Yes - happy to test. It will be interesting to compare this to the exact methods above.
I know, but I think we need to add these gradually. Numerical differentiation is a sensible default in the absence of anything else. Therefore imo it makes sense to put it even in the base class as the last fallback.
Oh, I see. You would need the inverse transformation for that.
Prior to the change, it was always a pandas DataFrame.
Is this a bug in the wrapped regressor then? It should return index and columns in line with the
I have now opened an additional PR, #611, which adds an exact Jacobian. Combined with this, #610 should be able to produce a reasonable log_pdf. The sequence of computations, when calling log_pdf, would be:
(the user should see nothing of this, optimally)
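One plausible reading of that sequence, sketched end to end: prefer an exact Jacobian when one is supplied, otherwise fall back to numerical differentiation. The dispatch and the names here are assumptions for illustration; #610/#611 may differ in detail.

```python
import numpy as np
from scipy.stats import lognorm, norm

def log_pdf_transformed(base_logpdf, transform, x, jacobian=None, eps=1e-6):
    """Fallback chain for a transformed distribution's log_pdf: use an
    exact Jacobian of the forward transform when supplied, otherwise
    fall back to a central-difference numerical approximation."""
    if jacobian is not None:
        jac = jacobian(x)  # exact path, e.g. closed form for linear scalers
    else:
        jac = (transform(x + eps) - transform(x - eps)) / (2 * eps)  # numerical fallback
    return base_logpdf(transform(x)) + np.log(np.abs(jac))

# Sanity check: normal base + log transform recovers the log-normal density.
x = np.array([0.5, 1.0, 2.0])
print(np.allclose(log_pdf_transformed(norm.logpdf, np.log, x),
                  lognorm.logpdf(x, s=1.0), atol=1e-5))  # True
```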
Reference Issues/PRs
#601
Problem Summary: LogLoss not included due to lack of a log_pdf. It can be computed but requires calculating the Jacobian of the transform. For many transformers this is non-trivial.

What does this implement/fix? Explain your changes.
I have implemented several pieces here. They break down as follows:
- log_pdf method to TTR.
- _jacobian method to TransformedDistribution (TD).
- transform method/function replaced with the transformer class in the TD.
- ordered_gradient function to handle gradient(f, x) where the arrays are not sorted by x.

Does your contribution introduce a new dependency? If yes, which one?
No.
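The ordered_gradient helper listed above can be sketched as follows; this is an illustrative reconstruction under the stated description (sort by x, differentiate, undo the permutation), not the actual skpro code.

```python
import numpy as np

def ordered_gradient(f, x):
    """np.gradient assumes coordinates in order; when x is unsorted,
    sort both arrays by x, differentiate, then undo the permutation."""
    order = np.argsort(x)
    grad_sorted = np.gradient(f[order], x[order])
    grad = np.empty_like(grad_sorted)
    grad[order] = grad_sorted  # scatter back to the original positions
    return grad

# d/dx of (3x + 1) at unsorted points; the exact answer is 3 everywhere.
x = np.array([3.0, 1.0, 2.0, 0.0])
print(ordered_gradient(3 * x + 1, x))  # [3. 3. 3. 3.]
```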
What should a reviewer concentrate their feedback on?
- log_pdf in TTR.

Did you add any tests for the change?
I haven't yet added tests. I have modified the get_test_params to take a transformer instead of a single function.

Any other comments?
PR checklist
For all contributions
How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges:
- code - fixing a bug, or adding code logic.
- doc - writing or improving documentation or docstrings.
- bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).
- maintenance - CI, test framework, release.

See here for full badge reference
For new estimators
- docs/source/api_reference/taskname.rst, follow the pattern.
- Examples section.
- python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.