[SparkR][SPARK-21381]:SparkR: pass on setHandleInvalid for classification algorithms #18605

wangmiao1981 · 2017-07-12T00:43:04Z

What changes were proposed in this pull request?

SPARK-20307 added the handleInvalid option to RFormula for tree-based classification algorithms. We should add this parameter to the other classification algorithms in SparkR.

This is a followup PR for SPARK-20307.
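
For readers unfamiliar with the option, here is a minimal base-R sketch of the three handleInvalid policies discussed in this thread ("error", "keep", "skip"). It is an illustration only, not the SparkR or RFormula implementation: `index_labels` is a hypothetical helper, and it assigns indices alphabetically, which is an assumption made for simplicity.

```r
# Hypothetical helper, NOT part of SparkR: index string labels seen at fit
# time, then apply one of three policies to labels unseen at fit time.
index_labels <- function(train, test,
                         handleInvalid = c("error", "keep", "skip")) {
  handleInvalid <- match.arg(handleInvalid)  # defaults to "error"
  labels <- sort(unique(train))              # assumption: alphabetical order
  idx <- match(test, labels) - 1             # zero-based; NA for unseen or NA
  unseen <- is.na(idx)
  if (any(unseen)) {
    if (handleInvalid == "error") {
      stop("Unseen label: ", paste(unique(test[unseen]), collapse = ", "))
    } else if (handleInvalid == "keep") {
      idx[unseen] <- length(labels)          # extra bucket at index numLabels
    } else {
      idx <- idx[!unseen]                    # "skip": drop the invalid rows
    }
  }
  idx
}

train <- c("this", "that", "this")
test <- c("that", "the other", "this")

index_labels(train, test, "keep")  # 0 2 1: "the other" goes to the extra bucket
index_labels(train, test, "skip")  # 0 1: the row with "the other" is dropped
```

With the default "error", the same call stops with an error that names the unseen label.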

How was this patch tested?

New unit tests are added.

SparkQA · 2017-07-12T01:58:51Z

Test build #79546 has finished for PR 18605 at commit 77b04a3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

wangmiao1981 · 2017-07-12T05:58:41Z

@felixcheung This is a follow-up PR for SPARK-20307.

wangmiao1981 · 2017-07-12T18:49:34Z

Trigger Windows check.

wangmiao1981 · 2017-07-12T18:49:43Z

Reopen for Windows check.

felixcheung

Could you update this to make it consistent with the earlier PR? I think it's mostly the param documentation wording.

wangmiao1981 · 2017-07-15T19:20:12Z

Sure. I am reading the #18613 comments. I just came back from a business trip. Thanks!

SparkQA · 2017-07-17T19:32:09Z

Test build #79678 has finished for PR 18605 at commit e59941e.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

wangmiao1981 · 2017-07-17T21:34:33Z

@yanboliang After #18613, the unit tests fail if "skip" is used.

For example,
data <- data.frame(clicked = base::sample(c(0, 1), 10, replace = TRUE),
                   someString = base::sample(c("this", "that"), 10, replace = TRUE),
                   stringsAsFactors = FALSE)
trainidxs <- base::sample(nrow(data), nrow(data) * 0.7)
traindf <- as.DataFrame(data[trainidxs, ])
testdf <- as.DataFrame(rbind(data[-trainidxs, ], c(0, "the other")))
model <- spark.mlp(traindf, clicked ~ ., layers = c(1, 3), handleInvalid = "skip")
predictions <- predict(model, testdf)
expect_equal(class(collect(predictions)$clicked[1]), "character")

It fails as if "error" were used.

If I change "skip" to "keep", then collect(predictions)$click[1] is NULL.

collect(predictions)
[1] clicked someString prediction
<0 rows> (or 0-length row.names)
collect(predictions)$click[1]
[[1]]
NULL

I am not sure whether this is expected or there is a bug.

Before #18613, the unit tests worked fine.

yanboliang · 2017-07-17T23:26:34Z

@wangmiao1981 This is expected, see my comment here. This uncovers an existing bug in forceIndexLabel. I will send a fix later; for now, please use "keep" to test handleInvalid. Thanks.

wangmiao1981 · 2017-07-18T16:56:37Z

@yanboliang Thanks for your reply! I will change the unit tests now.

SparkQA · 2017-07-18T22:36:42Z

Test build #79728 has finished for PR 18605 at commit 3d7c517.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-07-19T01:35:56Z

Test build #79734 has finished for PR 18605 at commit 3ebb5cd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

wangmiao1981 · 2017-07-20T23:57:58Z

@yanboliang I have made changes accordingly. Thanks!

wangmiao1981 · 2017-07-24T17:20:33Z

@felixcheung Can you take a look? Thanks!

felixcheung · 2017-07-25T16:14:29Z

I'll take a look

felixcheung

Sorry about the delay. One comment, otherwise LG.

felixcheung · 2017-07-29T03:44:25Z

R/pkg/R/mllib_tree.R

-#'                           "error" (throw an error), "keep" (put invalid data in a special additional
-#'                           bucket, at index numLabels). Default is "error".
+#' @param handleInvalid How to handle invalid data (unseen labels or NULL values) in features and label
+#'                      column of string type.


This was only for classification though, so we should keep the original text for the classification model.
Ditto for decisionTree and gbt in this .R file.

SparkQA · 2017-07-31T18:49:35Z

Test build #80081 has finished for PR 18605 at commit 18c69d2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-08-01T03:37:36Z

merged to master

felixcheung mentioned this pull request Jul 12, 2017

[SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should handle invalid for both features and label column. #18613

Closed

wangmiao1981 closed this Jul 12, 2017

wangmiao1981 reopened this Jul 12, 2017

felixcheung reviewed Jul 15, 2017

wangmiao1981 added 2 commits July 15, 2017 12:23

add handleInvalid for classifications

aca87bd

revise comments

e59941e

wangmiao1981 force-pushed the class branch from 77b04a3 to e59941e Compare July 17, 2017 18:14

revise unit tests

3d7c517

fix test failure

3ebb5cd

felixcheung reviewed Jul 29, 2017

address review comments

18c69d2

felixcheung approved these changes Aug 1, 2017

asfgit closed this in 9570e81 Aug 1, 2017
