Made categorical dictionary more cohesive to overall structure #1454
… are marked as categorical. Previously, string columns were treated as non-categorical, which in practice meant numerical. We decided that encoded string columns are more similar to one-hot-encoded (OHE) features than to numerical ones. This change affects only the metalearning part of `autosklearn`.
# Conflicts:
#	autosklearn/smbo.py
…feat_type dictionary, which stores more information, and reduce it to a categorical dictionary only where that is really needed.
Codecov Report
@@              Coverage Diff               @@
##           development    #1454      +/-   ##
===============================================
- Coverage        84.31%   84.25%   -0.06%
===============================================
  Files              147      147
  Lines            11284    11287       +3
  Branches          1934     1939       +5
===============================================
- Hits              9514     9510       -4
- Misses            1256     1257       +1
- Partials           514      520       +6
eddiebergman
left a comment
I imagine this needs some kind of clear test describing the intended behaviour: I can see what's going on, but not what is meant to come out of it.
…longer the categorical one
Does anyone know what the issue is with the …
Seems it was just a timeout; I would just rerun it and hope it works. I manually checked the link and it seems to work fine, so I don't know what's up.
mfeurer
left a comment
Alright, this looks very nice overall. I believe there are a few places where the code can be simplified, and it would be great if you could have a look at them.
test/test_metalearning/pyMetaLearn/test_meta_features_sparse.py (two outdated, resolved review threads)
* change treatment of string features. In file `smbo.py` string columns are marked as categorical. Previously, string columns were treated as non-categorical, which in practice meant numerical. We decided that encoded string columns are more similar to OHE features than to numerical ones. This change affects only the metalearning part of `autosklearn`.
* made categorical dictionary more cohesive. Use the previously created feat_type dictionary, which stores more information, and reduce it to a categorical dictionary only where that is really needed.
* change test files since they now get the feat type dictionary and no longer the categorical one
* moved the feat_type to categorical conversion into the helper functions
* fixed minor issues
* fixed bug in metalearning tests
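A minimal sketch of the string treatment described in the first commit, assuming `feat_type` maps column names to type strings such as "numerical", "categorical" and "string" (compared case-insensitively). The function name `categorical_mask_from_feat_type` is hypothetical; this illustrates the idea and is not the actual auto-sklearn code.

```python
from typing import Dict


def categorical_mask_from_feat_type(feat_type: Dict[str, str]) -> Dict[str, bool]:
    """Reduce a feat_type dictionary to a column -> is_categorical mapping.

    Hypothetical helper: string columns are flagged as categorical because,
    once encoded, they behave more like one-hot-encoded features than like
    numerical ones. Every other type is flagged as not categorical.
    """
    return {
        column: kind.lower() in {"categorical", "string"}
        for column, kind in feat_type.items()
    }


# Example feat_type dictionary (column names are made up for illustration).
feat_type = {"age": "Numerical", "color": "Categorical", "comment": "String"}
print(categorical_mask_from_feat_type(feat_type))
# {'age': False, 'color': True, 'comment': True}
```

Before this change, the "comment" column in this example would have been flagged `False` and treated like the numerical "age" column.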
In this PR we replace the `categorical` dictionary with the `feat_type` dictionary, which stores every column name together with its column type. We do not change the overall behavior of the code.
In calculate metafeatures.py:1180 we still create the boolean dictionary, which is then passed to the helper functions. This ensures that the behavior of the code does not change. More advanced changes would include adding string features to the meta-features.