Moo ensemble #3
base: development
Conversation
This PR is ready from my side except for the missing documentation, unit tests and ... weighting. While I know how to do the first two, I'm not 100% sure how to do weighting of metrics properly. The ParEGO paper states:

but gives no directions on how to estimate such limits. For bounded metrics (accuracy, AUC) we could use the known bounds. However, these might be loose (the ensemble will never observe something worse than 0.5 AUC because the ensemble builder will drop that before) or unknown (what's the upper bound of RMSE?). Here's a potential remedy:

What do you all think? After writing this, I am considering passing the scores computed in the ensemble builder to the ensemble method as an additional argument.
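As a sketch of that remedy (this is not autosklearn code; the function name `parego_scalarize`, the `rho` default, and the use of the builder's observed scores as normalization bounds are all assumptions on my part), normalizing each metric by its observed range and then applying ParEGO's augmented Tchebycheff scalarization could look like:

```python
import numpy as np

def parego_scalarize(losses: np.ndarray, weights: np.ndarray, rho: float = 0.05) -> np.ndarray:
    """Augmented Tchebycheff scalarization over min-max normalized losses.

    losses: (n_models, n_metrics) array of observed losses from the ensemble builder.
    weights: (n_metrics,) non-negative weight vector.
    """
    lo = losses.min(axis=0)
    hi = losses.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against zero range
    normed = (losses - lo) / span           # each metric scaled to [0, 1]
    weighted = normed * weights
    # ParEGO: max over weighted objectives plus a small weighted-sum term
    return weighted.max(axis=1) + rho * weighted.sum(axis=1)
```

This sidesteps the unknown-bounds problem entirely by using the empirical min/max over models the builder has already evaluated, at the cost of the scalarization changing as new models arrive.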
* Load single best model as fallback
* Update estimators.py
* Improve comment in code.
* Fix meta-data generation test, potentially improve model loading
eddiebergman left a comment
Some minor changes, but mostly it all looks good. As you mentioned, some docs and tests are needed.

The only major discussion point is what to do with the fact that the API for specifying ensembling has now changed. ensemble_size has been removed, and I feel like this is a parameter quite a few people will have specified. If we can keep it so that the default behaviour is maintained when ensemble_size is specified, I would be happy with that.
```python
if self._ensemble_class is not None:
    raise ValueError(
        "Not starting ensemble builder because there "
        "is no time left. Try increasing the value "
        "of time_left_for_this_task."
    )
```
I didn't really notice this before, but does this mean we can end up in a situation where a user calls fit and, because of bad time management on autosklearn's part, gets a ValueError? It would make more sense to make this a RuntimeError, but maybe it's not worth changing at this point.
No, they should get an ensemble with a DummyEstimator.
```python
ensemble_class: Type[AbstractEnsemble] | None = EnsembleSelection,
ensemble_kwargs: Dict[str, Any] | None = None,
```
This is the main API-breaking change. I imagine most power users will have ensemble_size specified; maybe we should deprecate it, maintaining the old default behaviour by inserting it into ensemble_kwargs? We can issue a warning:
```python
# In the function definition
ensemble_size: int | None = None

# User specified `ensemble_size` explicitly, warn them about deprecation
if ensemble_size is not None:
    warnings.warn(
        "`ensemble_size` has been deprecated, please use "
        f"`ensemble_kwargs={{'ensemble_size': {ensemble_size}}}`. "
        "Inserting `ensemble_size` into `ensemble_kwargs` for now.",
        DeprecationWarning,
    )
    # Keep consistent behaviour
    if ensemble_kwargs is None:
        ensemble_kwargs = {"ensemble_size": ensemble_size}
    else:
        ensemble_kwargs["ensemble_size"] = ensemble_size
else:
    # Old default behaviour, no need to warn here as they haven't set `ensemble_size`
    if ensemble_kwargs is None:
        ensemble_kwargs = {"ensemble_size": 50}
```

Probably not the cleanest solution, but hopefully it gets the point across.
As mentioned above, I'll deprecate this here so that automl.py will only accept ensemble_kwargs.
Updated this.
```python
import numpy as np

from autosklearn.automl import AutoML

import pytest
from pytest_cases import filters as ft
from pytest_cases import parametrize, parametrize_with_cases

import test.test_automl.cases as cases


@parametrize("ensemble_size", [-10, -1, 0])
@parametrize_with_cases("automl", cases=cases, filter=~ft.has_tag("fitted"))
def test_non_positive_ensemble_size_raises(
    tmp_dir: str,
    automl: AutoML,
    ensemble_size: int,
) -> None:
    """
    Parameters
    ----------
    automl: AutoML
        The AutoML object to test

    ensemble_size : int
        The ensemble size to use

    Expects
    -------
    * Can't fit ensemble with non-positive ensemble size
    """
    dummy_data = np.array([1, 1, 1])

    with pytest.raises(ValueError):
        automl.fit_ensemble(dummy_data, ensemble_size=ensemble_size)
```
This should probably be replaced by its new counterpart behaviour, i.e. that passing no ensemble_class will raise.
I can do so, but I'm not 100% sure it makes a lot of sense. The user would have to set ensemble_class=None in both the constructor and fit_ensemble.
…s around random state for ensembles
I think SMAC does the min-max scaling, and having thought about it, I don't see another good alternative.
I've disabled them for now; we can always re-enable them at the end.
```python
    by=lambda r: has_metrics(r) and tangible_losses(r),
)
all_discarded.update(discarded)
print("all_discarded", all_discarded)
```
This print should not be here, I guess.
Nope, twas some debugging, good spot :)
Hi everybody,
Here's a draft PR on how to pass custom ensemble classes to Auto-sklearn and how to use multi-objective ensembles. It is currently in the private fork because we don't want to make it public yet. As before, I'll add unit tests and doc strings as soon as folks are happy with the API.
Concretely, this PR adds:
Open TODOs: