Skip to content

Conversation

@jkbradley
Copy link
Member

Updated DecisionTree documentation, with examples for Java, Python.
Added same Java example to code as well.
CC: @mengxr @manishamde @atalwalkar

@SparkQA
Copy link

SparkQA commented Aug 20, 2014

QA tests have started for PR 2063 at commit b9bee04.

  • This patch merges cleanly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache the data before computation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is cached by tree training, but should we cache it here too since it used again for testing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cached the binned features in training. But in this example, we visit the raw features twice. Since it is reading from disk, it should help if we cache the data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are only calculating numClasses in the Java example. Should we eliminate it since it makes the already verbose Java code even more verbose? Else, we need to make the same change to the Scala and Python examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to cache, let's note why we are doing it via a comment. Else, some users might get confused and decide to cache always before calling the tree algorithm.

@SparkQA
Copy link

SparkQA commented Aug 20, 2014

QA tests have finished for PR 2063 at commit b9bee04.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • is used for ordering. In multiclass classification, all$2^`
    • public final class JavaDecisionTree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just call them random forests instead? :-)

@SparkQA
Copy link

SparkQA commented Aug 20, 2014

QA tests have started for PR 2063 at commit d802369.

  • This patch merges cleanly.

@manishamde
Copy link
Contributor

Thanks @jkbradley

I had some minor comments that I have noted above. LGTM!

@jkbradley
Copy link
Member Author

@mengxr @manishamde Thanks for the feedback! I believe I've addressed all of the comments.

@SparkQA
Copy link

SparkQA commented Aug 20, 2014

QA tests have started for PR 2063 at commit 9dd1b6b.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 20, 2014

QA tests have finished for PR 2063 at commit d802369.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 20, 2014

QA tests have finished for PR 2063 at commit 9dd1b6b.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • In multiclass classification, all$2^`
    • public final class JavaDecisionTree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation is specific to binary classification, though I think it's supposed to explain a strategy that's applicable to both binary classification and regression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

@jkbradley
Copy link
Member Author

@atalwalkar Thanks for the comments! I believe I've fixed the issues.

@SparkQA
Copy link

SparkQA commented Aug 21, 2014

QA tests have started for PR 2063 at commit 2dd2c19.

  • This patch merges cleanly.

@atalwalkar
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Aug 21, 2014

QA tests have finished for PR 2063 at commit 2dd2c19.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Aug 21, 2014

I've merged this into master and branch-1.1. Thanks!!

asfgit pushed a commit that referenced this pull request Aug 21, 2014
Updated DecisionTree documentation, with examples for Java, Python.
Added same Java example to code as well.
CC: @mengxr  @manishamde @atalwalkar

Author: Joseph K. Bradley <[email protected]>

Closes #2063 from jkbradley/dt-docs and squashes the following commits:

2dd2c19 [Joseph K. Bradley] Last updates based on github review.
9dd1b6b [Joseph K. Bradley] Updated decision tree doc.
d802369 [Joseph K. Bradley] Updates based on comments: cache data, corrected doc text.
b9bee04 [Joseph K. Bradley] Updated DT examples
57eee9f [Joseph K. Bradley] Created JavaDecisionTree example from example in docs, and corrected doc example as needed.
d939a92 [Joseph K. Bradley] Updated DecisionTree documentation.  Added Java, Python examples.

(cherry picked from commit 050f8d0)
Signed-off-by: Xiangrui Meng <[email protected]>
@asfgit asfgit closed this in 050f8d0 Aug 21, 2014
@jkbradley jkbradley deleted the dt-docs branch August 26, 2014 17:41
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Updated DecisionTree documentation, with examples for Java, Python.
Added same Java example to code as well.
CC: @mengxr  @manishamde @atalwalkar

Author: Joseph K. Bradley <[email protected]>

Closes apache#2063 from jkbradley/dt-docs and squashes the following commits:

2dd2c19 [Joseph K. Bradley] Last updates based on github review.
9dd1b6b [Joseph K. Bradley] Updated decision tree doc.
d802369 [Joseph K. Bradley] Updates based on comments: cache data, corrected doc text.
b9bee04 [Joseph K. Bradley] Updated DT examples
57eee9f [Joseph K. Bradley] Created JavaDecisionTree example from example in docs, and corrected doc example as needed.
d939a92 [Joseph K. Bradley] Updated DecisionTree documentation.  Added Java, Python examples.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants