[SPARK-21027][ML][PYTHON] Added tunable parallelism to one vs. rest in both Scala mllib and Pyspark #19110
…st classification. Added a parallelism parameter to the scala implementation of one vs. rest for python persistence but have not yet used it to tune the scala parallelism implementation.
…n of the one vs. rest algorithm.
…ts for testing that parallelism doesn't affect the output.
…executor service with a given level of parallelism in a separate trait that OneVsRest inherits from.
…ng OneVsRest and OneVsRest model JavaMLReadable and JavaMLWritable)
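The commit notes above mention an executor service with a given level of parallelism, kept in a separate trait that OneVsRest inherits from. A rough Python analogue of that idea might look like the sketch below. The names `HasParallelismSketch`, `ToyOneVsRest`, `new_executor`, and `fit_all` are hypothetical and are not Spark APIs; only the default of 1 and the `>= 1` constraint come from the param description quoted in this review.

```python
from concurrent.futures import ThreadPoolExecutor


class HasParallelismSketch:
    """Hypothetical mixin: stores a thread count and builds an executor from it."""

    def __init__(self):
        # Default of 1 matches the shared param default quoted in the review.
        self._parallelism = 1

    def set_parallelism(self, value):
        # Enforce the documented "(>= 1)" constraint up front.
        if not isinstance(value, int) or value < 1:
            raise ValueError("parallelism must be an integer >= 1, got %r" % (value,))
        self._parallelism = value
        return self

    def get_parallelism(self):
        return self._parallelism

    def new_executor(self):
        # Assumption: a plain thread pool stands in for the executor service.
        return ThreadPoolExecutor(max_workers=self._parallelism)


class ToyOneVsRest(HasParallelismSketch):
    """Hypothetical estimator that runs independent fitting tasks on the pool."""

    def fit_all(self, tasks):
        with self.new_executor() as pool:
            # map() returns results in submission order, whatever the thread count.
            return list(pool.map(lambda task: task(), tasks))


est = ToyOneVsRest().set_parallelism(2)
results = est.fit_all([lambda k=k: k * k for k in range(4)])
```

Keeping the thread-pool logic in a mixin means any estimator that fits several independent models could reuse it, which appears to be the motivation for the separate trait.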
|
Test build #81350 has finished for PR 19110 at commit
|
|
Jenkins, test this please. |
|
Test build #81352 has finished for PR 19110 at commit
|
24f4499 to fc6fd5e
|
Test build #81381 has finished for PR 19110 at commit
|
|
LGTM. Btw, if someone else merges this, then @ajaysaini725 and @WeichenXu123 should both be authors, with @ajaysaini725 as the primary one. |
BryanCutler left a comment
Just a few minor comments, I think left over from the previous PR.
| None, "TypeConverters.toString"), | ||
| ("aggregationDepth", "suggested depth for treeAggregate (>= 2).", "2", | ||
| "TypeConverters.toInt"), | ||
| ("parallelism", "number of threads to use when fitting models in parallel (>= 1).", "1", |
Maybe this should have a more generic description since it is a shared param?
| /** | ||
| * @group expertSetParam | ||
| * The implementation of parallel one vs. rest runs the classification for | ||
| * each class in a separate threads. |
from the previous PR, @group expertSetParam should be at the bottom
|
Test build #81436 has finished for PR 19110 at commit
|
|
Test build #81438 has finished for PR 19110 at commit
|
|
FYI I went ahead and merged #16774 - the doc on the shared param trait is a little more detailed there which I slightly prefer. @WeichenXu123 you will just need to resolve the small merge conflict that introduces. |
|
@MLnick Conflict resolved. Thanks! |
|
Test build #81456 has finished for PR 19110 at commit
|
|
LGTM! |
|
Thanks @MLnick @BryanCutler. Would you mind helping review another similar PR, #19122? We need some other features that are blocked on that PR. Thanks! |
python/pyspark/ml/param/shared.py
Outdated
| Mixin for param parallelism: number of threads to use when fitting models in parallel. | ||
| """ | ||
|
|
||
| parallelism = Param(Params._dummy(), "parallelism", "the number of threads to use when running parallel algorithms.", typeConverter=TypeConverters.toInt) |
nit: Is this out of date? It's missing the "(>= 1)" from the code gen file.
|
Other than that 1 item, this looks ready |
|
Test build #81653 has finished for PR 19110 at commit
|
|
LGTM |
| * each class in a separate threads. | ||
| * | ||
| * @group expertSetParam | ||
| */ |
missing since annotation
Thanks! I created a PR to fix this.
## What changes were proposed in this pull request?
add missing since tag for `setParallelism` in apache#19110
## How was this patch tested?
N/A
Author: WeichenXu <[email protected]>
Closes apache#19214 from WeichenXu123/minor01.
What changes were proposed in this pull request?
Added tunable parallelism to the pyspark implementation of one vs. rest classification. Added a parallelism parameter to the Scala implementation of one vs. rest along with functionality for using the parameter to tune the level of parallelism.
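As a rough illustration of what tunable parallelism means here, the following standard-library sketch fits one toy binary model per class on a thread pool and checks that the thread count does not change the result. It does not use Spark; `fit_binary`, `fit_one_vs_rest`, and the centroid-based "model" are assumptions made purely for illustration and are not the PR's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_binary(points, positive_label):
    """Toy binary 'model': centroids of the positive class and of the rest."""
    pos = [x for x, label in points if label == positive_label]
    neg = [x for x, label in points if label != positive_label]
    return positive_label, sum(pos) / len(pos), sum(neg) / len(neg)


def fit_one_vs_rest(points, parallelism=1):
    """Fit one toy binary model per class using up to `parallelism` threads."""
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    labels = sorted({label for _, label in points})
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() yields results in input order, so the output does not
        # depend on which thread happens to finish first.
        return list(pool.map(lambda label: fit_binary(points, label), labels))


data = [(0.0, 0), (1.0, 0), (4.0, 1), (6.0, 1), (9.0, 2), (11.0, 2)]
sequential = fit_one_vs_rest(data, parallelism=1)
parallel = fit_one_vs_rest(data, parallelism=3)
```

Because each per-class fit is independent of the others, `sequential` and `parallel` should be identical; only wall-clock time is expected to differ.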
I am taking over PR #18281 because the original author is busy and we need to merge this change soon.
After this has been merged, we can close #18281.
How was this patch tested?
Test suite added.