@Louquinze
Collaborator

In this PR we replace the categorical dictionary with the 'feat_type' dictionary, which stores all (column name, column type) pairs.
The overall behavior of the code does not change.

In the calculate metafeatures.py:1180 we still create the boolean dictionary, which is then passed to the helper functions. This ensures that the behavior of the code does not change. It also allows for more advanced changes later, such as including strings in the meta-features.
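As a rough illustration of the reduction described above, here is a minimal sketch of how a 'feat_type' dictionary could be collapsed into the boolean categorical dictionary. The helper name `feat_type_to_categorical`, its `string_as_categorical` flag, and the example column names are assumptions made for this sketch; they are not the actual functions touched in this PR.

```python
from typing import Dict, Union

ColumnKey = Union[str, int]


def feat_type_to_categorical(
    feat_type: Dict[ColumnKey, str],
    string_as_categorical: bool = True,
) -> Dict[ColumnKey, bool]:
    """Reduce a column -> type mapping to a column -> is_categorical mapping.

    Hypothetical helper for illustration only; not part of the auto-sklearn API.
    """
    # Types that count as categorical for the meta-feature helpers.
    categorical_types = {"categorical"}
    if string_as_categorical:
        # Assumption: string columns are grouped with categorical ones.
        categorical_types.add("string")
    return {
        column: ftype.lower() in categorical_types
        for column, ftype in feat_type.items()
    }


# Example column names and types are made up for this sketch.
feat_type = {"age": "numerical", "color": "categorical", "comment": "string"}
categorical = feat_type_to_categorical(feat_type)
```

With the flag switched off, the string column would fall back to the non-categorical side, which corresponds to the previous treatment of string columns.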

… are marked as categorical. Previously, string columns were treated as not categorical, which meant they were handled as numerical. We decided that encoded string columns are more similar to OHE than to numerical transformations. This change affects the metalearning part of `autosklearn` exclusively.
…feat_type dictionary which stores more information and reduce it to categorical where it is really needed.
@Louquinze Louquinze requested review from eddiebergman and mfeurer May 2, 2022 09:28
@codecov

codecov bot commented May 2, 2022

Codecov Report

Merging #1454 (fc9628d) into development (daa9ad6) will decrease coverage by 0.05%.
The diff coverage is 95.34%.

@@               Coverage Diff               @@
##           development    #1454      +/-   ##
===============================================
- Coverage        84.31%   84.25%   -0.06%     
===============================================
  Files              147      147              
  Lines            11284    11287       +3     
  Branches          1934     1939       +5     
===============================================
- Hits              9514     9510       -4     
- Misses            1256     1257       +1     
- Partials           514      520       +6     

Contributor

@eddiebergman eddiebergman left a comment

I imagine this needs some kind of clear test to describe the intended behaviour. I can see what's going on, but not what's meant to come from it.

@Louquinze
Collaborator Author

Does anyone know what the issue is with the Docs / build-and-deploy (pull_request) test, which always fails? I tried to understand it but I do not really get it :(

@eddiebergman
Contributor

Seems it was just a timeout, I would just rerun it and hope it works. I manually checked the link and it seems to work fine, so I dunno what's up.

/home/runner/work/auto-sklearn/auto-sklearn/doc/examples/60_search/example_successive_halving.rst.rst:31:broken link: https://jmlr.org/papers/volume18/16-558/16-558.pdf (HTTPSConnectionPool(host='jmlr.org', port=443): Max retries exceeded with url: /papers/volume18/16-558/16-558.pdf (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f35a9fb44f0>: Failed to establish a new connection: [Errno 110] Connection timed out')))

@Louquinze Louquinze requested review from eddiebergman and mfeurer May 4, 2022 11:28
Contributor

@mfeurer mfeurer left a comment

Alright, this looks very nice overall. I believe there are a few places where the code can be simplified, and it would be great if you could have a look at them.

@Louquinze Louquinze requested a review from mfeurer May 5, 2022 08:24
@Louquinze Louquinze requested a review from mfeurer May 8, 2022 19:25
@mfeurer mfeurer merged commit 800e659 into automl:development May 9, 2022
github-actions bot pushed a commit that referenced this pull request May 9, 2022
eddiebergman pushed a commit that referenced this pull request Aug 18, 2022
* change treatment of string features. In file `smbo.py` string columns are marked as categorical. Previously, string columns were treated as not categorical, which meant they were handled as numerical. We decided that encoded string columns are more similar to OHE than to numerical transformations. This change affects the metalearning part of `autosklearn` exclusively.

* made categorical dictionary more cohesive. Use the previously created feat_type dictionary, which stores more information, and reduce it to categorical where it is really needed.

* change test files since they now get the feat type dictionary and no longer the categorical one

* moved feat_type to categorical conv. to the helper functions

* fixed minor issues

* fixed bug in metalearning tests

3 participants