[ENH] Add Differentiable Transformer and Exact PDFs for TargetTransformRegressor #612
Conversation
…ion. This allows us to get more information from the transformer class, e.g. the MinMaxScaler scale_ parameter. Though not part of the typical forward-facing interface, TransformedDistribution.transform is no longer callable; TransformedDistribution.transform.inverse_transform would be needed instead.
This is useful when testing XGBoostLSS and hyperparameter optimisation is not important. XGBoostLSS can be computationally expensive when searching over the default 30 trials.
Does so by adding log_pdf with change-of-variables (Jacobian of the inverse transform): a direct method for linear transforms, and a numerical method for non-linear transforms. TransformedDistribution now takes transformer instead of transform - that is, the transformer class instead of just the inverse transform function/method.
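For reference, a minimal sketch of the change-of-variables computation described above (all names here are illustrative, not the skpro API): for a monotone transform t fitted on the target, log p_Y(y) = log p_Z(t(y)) + log |t'(y)|. For a linear transform such as MinMaxScaler the Jacobian is just scale_, so the direct method gives the exact density:

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
y_train = rng.normal(10.0, 2.0, size=(500, 1))

# fit the (linear) transform and move the target to transformed space
scaler = MinMaxScaler((0.1, 0.9)).fit(y_train)
z = scaler.transform(y_train)

# stand-in for a fitted predictive distribution in transformed space
dist_z = stats.norm(z.mean(), z.std())

def log_pdf_y(y):
    """Exact log-density in original space via change of variables."""
    # log p_Y(y) = log p_Z(t(y)) + log |t'(y)|
    z_val = scaler.transform(np.asarray(y, dtype=float).reshape(-1, 1)).ravel()
    jac = scaler.scale_[0]  # MinMaxScaler is linear, so t'(y) = scale_
    return dist_z.logpdf(z_val) + np.log(abs(jac))
```

Because the transform is linear and the transformed-space distribution is normal, this recovers exactly the normal fitted directly on the original target.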
This is not actually intended as an example. It is a simple way to share findings. This can (should) be removed before any final merges into main.
This is needed so that when TransformedDistribution.distribution is called, it has the same indices as the wrapper distribution.
Usage examples. Create some data:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from skpro.metrics import LogLoss, CRPS
from skpro.regression.xgboostlss import XGBoostLSS
from skpro.regression.compose import TransformedTargetRegressor
from skpro.compose import DifferentiableTransformer

size = 1000
X = np.random.normal(1, 1, size)
y = (2 * X + np.random.normal(0, 0.5, size))  # / 15 + 0.5 # for Beta
Xy = pd.DataFrame(
    {"target": y, "feature": X},
    index=range(size)
)
```

Now we can pass an sklearn transformer to TTR:

```python
xgb = XGBoostLSS(**params)
mms = MinMaxScaler((0.1, 0.9))
pipe = TransformedTargetRegressor(regressor=xgb, transformer=mms)
```

or a DT:

```python
xgb = XGBoostLSS(**params)
mms = MinMaxScaler((0.1, 0.9))
dt = DifferentiableTransformer(transformer=mms)
pipe = TransformedTargetRegressor(regressor=xgb, transformer=dt)
```

Both are handled the same:

```python
pipe.fit(X=Xy[["feature"]], y=Xy["target"])
pp = pipe.predict_proba(Xy[["feature"]])
p = pipe.predict(Xy[["feature"]])
CRPS()(y_true=Xy["target"], y_pred=pp)
LogLoss()(y_true=Xy["target"], y_pred=pp)
```

And in either case, if we inspect the TD ( pp.transformer_
I tried to resolve the conflicts; have a look.
…unnlime/skpro into differentiable-transformer
Here we consider the numerical differentiation to be classed as approx. For well-behaved functions (which change-of-variables requires), this is likely to be overkill.
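As an aside, a sixth-order central difference of the kind mentioned later in this thread can be sketched as follows (a minimal illustration with assumed names; the stencil coefficients are the standard sixth-order ones, not copied from the PR):

```python
import numpy as np

def diff6(f, x, h=1e-2):
    """Sixth-order central-difference approximation of f'(x)."""
    # standard 6th-order stencil: coefficients (-1, 9, -45, 45, -9, 1) / 60
    w = np.array([-1.0, 9.0, -45.0, 45.0, -9.0, 1.0]) / 60.0
    offsets = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
    return sum(wi * f(x + oi * h) for wi, oi in zip(w, offsets)) / h
```

For smooth functions the truncation error scales as h^6, which is indeed overkill for the well-behaved transforms change-of-variables requires.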
This should give a working implementation. There are a couple of design choices that might be worth discussing:
In addition to the above, the documentation needs some tidying, and comments, tags and licenses need updating. Further improvements:
Oh, very nice! I will review but first leave some comments on the design choices.
I think we should not take a dependency on that package. I will contact the author to see what is going on, but for now I think we should avoid it.
That is unclear to me and we should really think carefully about pros and cons.
But then we would need to turn the fitted transformer into a
If a transformer, I am not strongly in favour of either option. If a function, I would strongly favour an skbase implementation, because it is too far from the sklearn transformer.
I like this, though I still need time to think about it too.
Where exactly? I have already implemented numerical differentiation (with sixth-order approximation) in the
That is correct, currently. I think that API question needs to be answered before we move to the differentiation question (but it might make sense to look at both in close sequence to validate the design). E.g., do we need to have a separate method for multivariate
FYI, I opened an issue to collect API discussion for multivariate distributions here:
I'll revert to scipy with version handling for
I've had much more of a think about having a
My impression is that if we were implementing TTR and TD from scratch, with

```python
# what this PR has
xgb = XGBoostLSS(**params)
mms = MinMaxScaler((0.1, 0.9))
dt = DifferentiableTransformer(transformer=mms)
pipe = TransformedTargetRegressor(regressor=xgb, transformer=dt)
```

or

```python
# we could easily add "dft" with the current PR
xgb = XGBoostLSS(**params)
dft = DifferentiableFuncTransformer(func, inverse_func, func_diff, inverse_diff)
pipe = TransformedTargetRegressor(regressor=xgb, transformer=dft)
```

I think the main issue here is extending/changing the TTR input options, e.g. point 3 above. For example, the
I would keep it inheriting from sklearn for now then. My understanding is it should be straightforward to replace this with skbase if/when needed.
It would be good to get some more thoughts on this. As mentioned above, I think this is the main constraint in how this gets implemented.
In the
I assumed that this was fully supported. Until these other parts are implemented it will be difficult to design around it. I would consider this an edge case - sklearn transformers don't even handle combined column transformations (afaik).
ok, I also think that a transformer is the better way to go, after some thinking. How do we proceed practically - do you want to give it a stab and then we maybe refine?
@fkiraly - I have removed any additional packages, and the differentiation is now done by scipy, with both the old and new derivative APIs implemented. The transformer implementation we spoke about is in place and now just needs refining. I have moved some methods down to the DiffT class so as to keep the BaseT as generic as possible.
Nice! Will review in the coming days - this is much appreciated, but I think I need more time to digest the API design. |
@fkiraly - just bringing this one to your attention again. Thanks
Reference Issues/PRs
Background on why we want this: #601 and #605
Supersedes this PR: #608
What does this implement/fix? Explain your changes.
This implementation adds a DifferentiableTransformer (DT). This acts as a wrapper around an sklearn transformer and gives the user the option to use either (in order):
inverse_func_diff, scaler_.inverse_transform. The derivative is available for both the (forward) transform and inverse_transform.
The DT has a coerce classmethod that takes an sklearn transformer or a function and coerces it to a DT.
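The coerce pattern might look roughly like the following (a hypothetical sketch with a toy stand-in class; the class name, branching logic and use of FunctionTransformer are assumptions, not the skpro implementation):

```python
from sklearn.preprocessing import FunctionTransformer

class DiffT:
    """Toy stand-in for DifferentiableTransformer (illustrative only)."""

    def __init__(self, transformer):
        self.transformer = transformer

    @classmethod
    def coerce(cls, obj):
        if isinstance(obj, cls):
            return obj  # already a DiffT: pass through unchanged
        if callable(obj) and not hasattr(obj, "transform"):
            # a plain function: wrap it in an sklearn FunctionTransformer
            return cls(FunctionTransformer(func=obj))
        return cls(obj)  # assume an sklearn-style transformer
```

The benefit of a coerce classmethod over constructor logic is that callers (e.g. TTR) can accept a function, a transformer, or a ready-made DT uniformly.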
In addition to the above, it also makes changes to #605 to apply the Jacobian (derivative of the transform/inverse_transform) to the pdf and log_pdf. In cases 1) and 2) above this returns the exact pdf and log_pdf. This preserves the current functionality of being able to pass a function to TTR (TransformedTargetRegressor) and allows a user to extend this by passing an sklearn transformer or their own DT. The final point allows the user to configure their DT with either the explicit derivative, or to pass kwargs that can be used for numerical differentiation. In theory, it also allows a user to pass these kwargs as hyperparameters via a gridsearch/optimisation.
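As a side note on why a derivative of either direction suffices for the Jacobian: by the standard inverse function rule, (t⁻¹)'(z) = 1 / t'(t⁻¹(z)), so the derivative of the inverse transform can always be recovered from the forward derivative. A minimal check with illustrative names:

```python
import numpy as np

# forward transform t(y) = log(y) with derivative t'(y) = 1/y (y > 0)
t, t_prime, t_inv = np.log, (lambda y: 1.0 / y), np.exp

z = 1.3
exact = np.exp(z)                   # d/dz of the inverse transform exp(z)
via_rule = 1.0 / t_prime(t_inv(z))  # inverse function rule
```

Both quantities agree, so a DT only strictly needs one explicit derivative.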
Does your contribution introduce a new dependency? If yes, which one?
No.
What should a reviewer concentrate their feedback on?
Implementation of the DT is the most important part:
Changes to TD (TransformedDistribution) - the main change here is internally handling a transformer instead of a function.
Changes to the TTR are minor.
On points 2) and 3), note that if we wish to explicitly pass transform only as a method/function, it is very straightforward to change this back. We would simply need to add inverse_transform and inverse_func_diff as kwargs to the TD. This does mean adding more kwargs if we wish to support numerical derivative kwargs.

Secondary feedback
Did you add any tests for the change?
Yes, added a param3 to TD, passing a FunctionTransformer instead of just a function.
Any other comments?
I had the bulk of this implementation down prior to the discussion on what we pass to TTR/TD here. Again, it is trivial to move the creation of the DT into TTR and pass the transform, inverse_transform and inverse_func_diff, but I feel the current approach is much cleaner as it keeps the majority of the new logic in the DT and within the new _transformer.py module.

PR checklist
For all contributions
How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges:
- code - fixing a bug, or adding code logic.
- doc - writing or improving documentation or docstrings.
- bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).
- maintenance - CI, test framework, release.
See here for full badge reference
For new estimators
- docs/source/api_reference/taskname.rst, follow the pattern.
- Examples section.
- python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.