Skip to content

Conversation

@WeichenXu123
Copy link
Contributor

@WeichenXu123 WeichenXu123 commented Sep 3, 2017

What changes were proposed in this pull request?

Added tunable parallelism to the pyspark implementation of one vs. rest classification. Added a parallelism parameter to the Scala implementation of one vs. rest along with functionality for using the parameter to tune the level of parallelism.

I take this PR #18281 over because the original author is busy but we need merge this PR soon.
After this been merged, we can close #18281 .

How was this patch tested?

Test suite added.

ajaysaini725 and others added 20 commits June 12, 2017 14:46
…st classification. Added a parallelism parameter to the scala implementation of one vs. rest for python persistence but have not yet used it to tune the scala parallelism implementation.
…ts for testing that parallelism doesn't affect the output.
…executor service with a given level of parallelism in a separat trait that OneVsRest inherits from.
…ng OneVsRest and OneVsRest model JavaMLReadable and JavaMLWritable)
@SparkQA
Copy link

SparkQA commented Sep 3, 2017

Test build #81350 has finished for PR 19110 at commit 24f4499.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WeichenXu123
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Sep 3, 2017

Test build #81352 has finished for PR 19110 at commit 24f4499.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 4, 2017

Test build #81381 has finished for PR 19110 at commit fc6fd5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

jkbradley commented Sep 5, 2017

LGTM
Pinging on #16774 first to coordinate merging

Btw, if someone else merges this, then @ajaysaini725 and @WeichenXu123 should both be authors, with @ajaysaini725 as the primary one.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments, I think left over from the previous PR.

None, "TypeConverters.toString"),
("aggregationDepth", "suggested depth for treeAggregate (>= 2).", "2",
"TypeConverters.toInt"),
("parallelism", "number of threads to use when fitting models in parallel (>= 1).", "1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should have a more generic description since it is a shared param?

/**
* @group expertSetParam
* The implementation of parallel one vs. rest runs the classification for
* each class in a separate threads.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the previous PR, @group expertSetParam should be at the bottom

@SparkQA
Copy link

SparkQA commented Sep 6, 2017

Test build #81436 has finished for PR 19110 at commit 7d0849e.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 6, 2017

Test build #81438 has finished for PR 19110 at commit edcf85c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick
Copy link
Contributor

MLnick commented Sep 6, 2017

FYI I went ahead and merged #16774 - the doc on the shared param trait is a little more detailed there which I slightly prefer. @WeichenXu123 you will just need to resolve the small merge conflict that introduces.

@WeichenXu123
Copy link
Contributor Author

@MLnick Conflict resolved. Thanks!

@SparkQA
Copy link

SparkQA commented Sep 6, 2017

Test build #81456 has finished for PR 19110 at commit 7a1d404.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class NettyMemoryMetrics implements MetricSet

@BryanCutler
Copy link
Member

LGTM!

@WeichenXu123
Copy link
Contributor Author

Thanks @MLnick @BryanCutler . Would you mind helping review another similar PR #19122 ? We need some other features but blocking on that PR. Thanks!

Mixin for param parallelism: number of threads to use when fitting models in parallel.
"""

parallelism = Param(Params._dummy(), "parallelism", "the number of threads to use when running parallel algorithms.", typeConverter=TypeConverters.toInt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is this out of date? It's missing the "(>= 1)" from the code gen file.

@jkbradley
Copy link
Member

Other than that 1 item, this looks ready

@SparkQA
Copy link

SparkQA commented Sep 12, 2017

Test build #81653 has finished for PR 19110 at commit c24d4e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

LGTM
Merging with master
Thanks!

@asfgit asfgit closed this in 720c94f Sep 12, 2017
@WeichenXu123 WeichenXu123 deleted the spark-21027 branch September 13, 2017 01:49
* each class in a separate threads.
*
* @group expertSetParam
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing since annotation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I create a PR to fix this.

ghost pushed a commit to dbtsai/spark that referenced this pull request Sep 13, 2017
## What changes were proposed in this pull request?

add missing since tag for `setParallelism` in apache#19110

## How was this patch tested?

N/A

Author: WeichenXu <[email protected]>

Closes apache#19214 from WeichenXu123/minor01.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants