-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-12044] [SparkR] Fix usage of isnan, isNaN #10037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
6805952
00ce43a
041c9c6
10a5fe7
95fdd2c
3ee7d5c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -488,19 +488,35 @@ setMethod("initcap", | |
| column(jc) | ||
| }) | ||
|
|
||
| #' isNaN | ||
| #' isnan | ||
| #' | ||
| #' Return true iff the column is NaN. | ||
| #' Return true if the column is NaN. | ||
| #' | ||
| #' @rdname isNaN | ||
| #' @name isNaN | ||
| #' @rdname isnan | ||
| #' @name isnan | ||
| #' @family normal_funcs | ||
| #' @export | ||
| #' @examples \dontrun{isNaN(df$c)} | ||
| setMethod("isNaN", | ||
| #' @examples \dontrun{isnan(df$c)} | ||
| setMethod("isnan", | ||
| signature(x = "Column"), | ||
| function(x) { | ||
| jc <- callJStatic("org.apache.spark.sql.functions", "isNaN", x@jc) | ||
| jc <- callJStatic("org.apache.spark.sql.functions", "isnan", x@jc) | ||
| column(jc) | ||
| }) | ||
|
|
||
| #' isnull | ||
| #' | ||
| #' Return true if the column is NULL. | ||
| #' | ||
| #' @rdname isnull | ||
| #' @name isnull | ||
| #' @family normal_funcs | ||
| #' @export | ||
| #' @examples \dontrun{isnull(df$c)} | ||
| setMethod("isnull", | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. similarly,
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is.null works on the object as a whole. is.na is better()?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indeed,
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the context of DataFrame column, "null" means missing value, which I think NA in R means. When we read a column from a DataFrame to R side, null will be converted to NA, see the code at https://github.com/apache/spark/blob/master/R/pkg/R/deserialize.R#L115
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this gets tricky... R's usage for NULL is not the same as JVM's. In many cases it might be closer to R's NA though not exactly the same either. eg. can't set a column in data.frame to NULL in the conventional way: Maybe it's fine to leave it as
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Our replies crossed - it would seem to me to say Scala/Python In any case we could definitely use some explanations on NULL/NA in the SPARKR programming guide, opened JIRA https://issues.apache.org/jira/browse/SPARK-12071
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good. It's a place where we can discuss how we map R NA or NULL to Scala null. |
||
| signature(x = "Column"), | ||
| function(x) { | ||
| jc <- callJStatic("org.apache.spark.sql.functions", "isnull", x@jc) | ||
| column(jc) | ||
| }) | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -621,6 +621,10 @@ setGeneric("getField", function(x, ...) { standardGeneric("getField") }) | |
| #' @export | ||
| setGeneric("getItem", function(x, ...) { standardGeneric("getItem") }) | ||
|
|
||
| #' @rdname column | ||
| #' @export | ||
| setGeneric("isNaN", function(x) { standardGeneric("isNaN") }) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also, isNaN is around since Spark 1.5 - if we are taking this out we would need to note this breaking change in release doc with a JIRA
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sorry, rereading the description. it sounds like we have
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, indeed. I have send #10056 to push SQL side make change to uniform interface.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It won't uniform the
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand the resolution here?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok I think I get it. Let me summarize the situation below and let know if I am getting it right.
I think the change looks fine to me, but I just want to understand the different things going on here
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe that's correct.We could scope this PR to only isnan on Column. And track others with JIRAs.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @shivaram, to be clear, IsNaN, isnan and is.nan are functions (in org.apache.spark.sql.functions) applied to Column, they are not DataFrame operators. IsNaN function is deprecated by isnan function. We add is.nan as an alias of isnan. |
||
|
|
||
| #' @rdname column | ||
| #' @export | ||
| setGeneric("isNull", function(x) { standardGeneric("isNull") }) | ||
|
|
@@ -796,9 +800,13 @@ setGeneric("initcap", function(x) { standardGeneric("initcap") }) | |
| #' @export | ||
| setGeneric("instr", function(y, x) { standardGeneric("instr") }) | ||
|
|
||
| #' @rdname isNaN | ||
| #' @rdname isnan | ||
| #' @export | ||
| setGeneric("isNaN", function(x) { standardGeneric("isNaN") }) | ||
| setGeneric("isnan", function(x) { standardGeneric("isnan") }) | ||
|
|
||
| #' @rdname isnull | ||
| #' @export | ||
| setGeneric("isnull", function(x) { standardGeneric("isnull") }) | ||
|
|
||
| #' @rdname kurtosis | ||
| #' @export | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -878,7 +878,7 @@ test_that("column functions", { | |
| c2 <- avg(c) + base64(c) + bin(c) + bitwiseNOT(c) + cbrt(c) + ceil(c) + cos(c) | ||
| c3 <- cosh(c) + count(c) + crc32(c) + exp(c) | ||
| c4 <- explode(c) + expm1(c) + factorial(c) + first(c) + floor(c) + hex(c) | ||
| c5 <- hour(c) + initcap(c) + isNaN(c) + last(c) + last_day(c) + length(c) | ||
| c5 <- hour(c) + initcap(c) + isnan(c) + isnull(c) + last(c) + last_day(c) + length(c) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please add test cases for "isNaN", "isNull", "isNotNul" for Column |
||
| c6 <- log(c) + (c) + log1p(c) + log2(c) + lower(c) + ltrim(c) + max(c) + md5(c) | ||
| c7 <- mean(c) + min(c) + month(c) + negate(c) + quarter(c) | ||
| c8 <- reverse(c) + rint(c) + round(c) + rtrim(c) + sha1(c) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be called (or have an alias)
is.nan?https://stat.ethz.ch/R-manual/R-devel/library/base/html/is.finite.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it makes sense to add an alias as is.nan