Skip to content

Conversation

@fjiang6
Copy link

@fjiang6 fjiang6 commented Jan 28, 2015

Add single pseudo-eigenvector PIC
Including documentations and updated pom.xml with the following codes:
mllib/src/main/scala/org/apache/spark/mllib/clustering/PIClustering.scala
mllib/src/test/scala/org/apache/spark/mllib/clustering/PIClusteringSuite.scala

fjiang6 and others added 24 commits January 22, 2015 13:52
@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26252 has started for PR 4254 at commit 121e4d5.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 28, 2015

Test build #26252 has finished for PR 4254 at commit 121e4d5.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26252/
Test FAILed.

@mengxr
Copy link
Contributor

mengxr commented Jan 28, 2015

Is it possible to do Gaussian similarity in another PR? It should be part of the feature transformation but not within PIC. It would be easier for code review if the PR is minimal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not handle I/O here. We can have an example code under examples/ and load files there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need ordering vt before calling kmeans.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, it was used sorted for other purpose, will remove.

@mengxr
Copy link
Contributor

mengxr commented Jan 29, 2015

@fjiang6 @javadba Please focus on the public APIs first and then the implementation. The best way to check public APIs is generating the html doc and look what are exposed to users. We can probably refactor the test code in later PRs.

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26354 has started for PR 4254 at commit 24fbf52.

  • This patch merges cleanly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too large for unit tests. Unit tests should be as minimal as possible. For this one, we can construct a very small graph, compute its eigenvector, and derive the clustering result manually, then verify PIC result. For example

a - b - c - g - h
| \     |   | \ |
d - e - f   i - j

Assign each edge distance 1 and run PIC with k = 2. The solution should be clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26354 has finished for PR 4254 at commit 24fbf52.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PowerIterationClustering(

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26354/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26420 has started for PR 4254 at commit f292f31.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26423 has started for PR 4254 at commit 4550850.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26420 has finished for PR 4254 at commit f292f31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PowerIterationClusteringModel(

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26420/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26423 has finished for PR 4254 at commit 4550850.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PowerIterationClusteringModel(

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26423/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use relative path "api/graphx/...". See examples in this markdown file.

@asfgit asfgit closed this in f377431 Jan 30, 2015
@mengxr
Copy link
Contributor

mengxr commented Jan 30, 2015

LGTM except minor user guide issues, which will be addressed in SPARK-5503. I've merged this into master. Thanks for the contributing! (Now MLlib depends on GraphX.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants