[SparkR][SPARK-21381]:SparkR: pass on setHandleInvalid for classification algorithms #18605
Conversation
Test build #79546 has finished for PR 18605 at commit
@felixcheung This is a follow-up PR of SPARK-20307.
Trigger windows check.
Reopen for windows check
felixcheung
left a comment
could you update this to make it consistent with the earlier PR? I think it's mostly the param document wording
Sure. I am reading the #18613 comments. I just came back from a business trip. Thanks!
Test build #79678 has finished for PR 18605 at commit
@yanboliang After #18613, the unit tests fail if "skip" is used. For example, they fail as if "error" were used. If I change "skip" to "keep", then predictions$click[0] is NULL.
I am not sure whether this is expected or whether there is a bug. Before, the unit tests worked fine.
@wangmiao1981 This is expected; see my comment here. This uncovers an existing bug for
@yanboliang Thanks for your reply! I will change the unit tests now.
Test build #79728 has finished for PR 18605 at commit
Test build #79734 has finished for PR 18605 at commit
@yanboliang I have made changes accordingly. Thanks!
@felixcheung Can you take a look? Thanks!
I'll take a look
felixcheung
left a comment
sorry about the delay, one comment otherwise LG
R/pkg/R/mllib_tree.R
Outdated
#' "error" (throw an error), "keep" (put invalid data in a special additional
#' bucket, at index numLabels). Default is "error".
#' @param handleInvalid How to handle invalid data (unseen labels or NULL values) in features and label
#' column of string type.
this was only for classification though, so we should keep the original text in the classification model.
ditto for decisionTree and gbt in this .R file
Test build #80081 has finished for PR 18605 at commit
merged to master
What changes were proposed in this pull request?
SPARK-20307 Added handleInvalid option to RFormula for tree-based classification algorithms. We should add this parameter for other classification algorithms in SparkR.
This is a followup PR for SPARK-20307.
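For readers unfamiliar with the option, the three handleInvalid modes can be sketched roughly as below. This is a plain-Python illustration of the semantics described in the param doc, not SparkR or Spark code: the function name index_labels and its arguments are hypothetical, labels are ordered alphabetically only to keep the example deterministic, and "skip" is assumed to drop the offending rows as Spark's StringIndexer does.

```python
from typing import List, Optional


def index_labels(train: List[str], data: List[Optional[str]],
                 handle_invalid: str = "error") -> List[int]:
    """Hypothetical helper: map strings to indices learned from `train`."""
    if handle_invalid not in ("error", "keep", "skip"):
        raise ValueError(f"unsupported handle_invalid: {handle_invalid!r}")

    # Assumption: alphabetical order for determinism (Spark orders by frequency by default).
    labels = sorted(set(train))
    lookup = {label: i for i, label in enumerate(labels)}
    num_labels = len(labels)

    out = []
    for value in data:
        if value in lookup:
            out.append(lookup[value])
        elif handle_invalid == "keep":
            # Invalid data goes to a special additional bucket at index numLabels.
            out.append(num_labels)
        elif handle_invalid == "skip":
            # The row containing invalid data is dropped.
            continue
        else:
            # "error": fail on the first unseen label or NULL value.
            raise ValueError(f"unseen label or NULL value: {value!r}")
    return out
```

With training labels ["a", "b"] and data ["a", "c", None, "b"], "keep" yields [0, 2, 2, 1], "skip" yields [0, 1], and "error" raises on "c".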
How was this patch tested?
New unit tests are added.