42 changes: 20 additions & 22 deletions R/pkg/R/functions.R
@@ -361,10 +361,13 @@ setMethod("column",
#'
#' @rdname corr
#' @name corr
#' @family math functions
#' @family aggregate functions
#' @export
#' @aliases corr,Column-method
#' @examples \dontrun{corr(df$c, df$d)}
#' @examples
#' \dontrun{
#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
Member

Do we need a space/newline in front of this example, like the other ones?

Contributor Author

This one does not need the extra newline, since it's in its own Rd and there are no examples before it.

Member

Great. I know we've talked about it, but we might consider getting all examples on the Rd into one \dontrun block again. As of now it's very hard to review a new PR without knowing whether a newline is needed or not...

Member

Or maybe we should have a newline at the end of every @examples block (when there are multiple examples on one Rd)? This way we don't have to know which one goes first.
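The newline question in this thread can be illustrated with a minimal roxygen sketch. The functions `add_one` and `add_two` and the topic name `arith_helpers` are hypothetical and are not part of SparkR; the sketch only assumes that roxygen2 merges the `@examples` sections of all blocks sharing an `@rdname` into one Rd examples section, so a leading blank `#'` line is what keeps consecutive \dontrun blocks visually apart, as the author describes.

```r
# Hypothetical helpers documented on one shared Rd page ("arith_helpers").

#' Add a constant
#'
#' @rdname arith_helpers
#' @examples
#'
#' \dontrun{
#' add_one(1)}
add_one <- function(x) x + 1

#' @rdname arith_helpers
#' @examples
#'
#' \dontrun{
#' add_two(1)}
add_two <- function(x) x + 2
```

With this layout each block carries its own separator, so a reviewer does not need to know the order in which the blocks end up on the merged page.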

#' head(select(df, corr(df$mpg, df$hp)))}
#' @note corr since 1.6.0
setMethod("corr", signature(x = "Column"),
function(x, col2) {
@@ -375,27 +378,32 @@ setMethod("corr", signature(x = "Column"),

#' cov
#'
#' Compute the sample covariance between two expressions.
#' Compute the covariance between two expressions.
#'
#' @details
#' \code{cov}: Compute the sample covariance between two expressions.
#'
#' @rdname cov
#' @name cov
#' @family math functions
#' @family aggregate functions
#' @export
#' @aliases cov,characterOrColumn-method
#' @examples
#' \dontrun{
#' cov(df$c, df$d)
#' cov("c", "d")
#' covar_samp(df$c, df$d)
#' covar_samp("c", "d")
#' }
#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
#' head(select(df, cov(df$mpg, df$hp), cov("mpg", "hp"),
#' covar_samp(df$mpg, df$hp), covar_samp("mpg", "hp"),
#' covar_pop(df$mpg, df$hp), covar_pop("mpg", "hp")))}
#' @note cov since 1.6.0
setMethod("cov", signature(x = "characterOrColumn"),
function(x, col2) {
stopifnot(is(class(col2), "characterOrColumn"))
covar_samp(x, col2)
})

#' @details
#' \code{covar_sample}: Alias for \code{cov}.
#'
#' @rdname cov
#'
#' @param col1 the first Column.
@@ -414,23 +422,13 @@ setMethod("covar_samp", signature(col1 = "characterOrColumn", col2 = "characterO
column(jc)
})

#' covar_pop
#'
#' Compute the population covariance between two expressions.
#'
#' @param col1 First column to compute cov_pop.
#' @param col2 Second column to compute cov_pop.
#' @details
#' \code{covar_pop}: Computes the population covariance between two expressions.
#'
#' @rdname covar_pop
#' @rdname cov
#' @name covar_pop
#' @family math functions
#' @export
#' @aliases covar_pop,characterOrColumn,characterOrColumn-method
#' @examples
#' \dontrun{
#' covar_pop(df$c, df$d)
#' covar_pop("c", "d")
#' }
#' @note covar_pop since 2.0.0
setMethod("covar_pop", signature(col1 = "characterOrColumn", col2 = "characterOrColumn"),
function(col1, col2) {
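The docs above distinguish sample covariance (`cov`, `covar_samp`) from population covariance (`covar_pop`). The following base-R sketch shows the difference, assuming the two Spark functions follow the usual definitions (dividing by n - 1 and by n respectively). It uses only base R and the built-in `mtcars` data, not SparkR.

```r
x <- mtcars$mpg
y <- mtcars$hp
n <- length(x)

# Sum of cross-products of deviations from the means.
cross <- sum((x - mean(x)) * (y - mean(y)))

# Sample covariance: divide by n - 1 (this is what base R's cov() computes).
sample_cov <- cross / (n - 1)

# Population covariance: divide by n.
pop_cov <- cross / n

# The two differ only by the factor (n - 1) / n.
stopifnot(isTRUE(all.equal(sample_cov, stats::cov(x, y))))
stopifnot(isTRUE(all.equal(pop_cov, sample_cov * (n - 1) / n)))
```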
20 changes: 7 additions & 13 deletions R/pkg/R/stats.R
@@ -52,22 +52,17 @@ setMethod("crosstab",
collect(dataFrame(sct))
})

#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
Member

Hmm, this is one of the tricky ones where there is one page for DataFrame & Columns.
I think it's useful to touch on how this works with a SparkDataFrame and keep this line in some form?

Contributor Author
@actuaryzhang, Jun 19, 2017

The method for SparkDataFrame is still there. I'm just removing redundant doc here.
See the screenshot here.

[two screenshots]

Member
@felixcheung, Jun 19, 2017

I see. In that case, can we add this in the @details for cov?
I feel like this has bits of info that could be useful:
Calculate the sample covariance of two numerical columns of a SparkDataFrame - say, numerical columns of one SparkDataFrame, as opposed to cov(df, df$name, df2$bar)
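To make the distinction in this thread concrete, here is a rough sketch of the two call forms that share the `cov` page, based on the examples in this diff. It assumes SparkR is installed and a local Spark session can be started; `df_cov` and `col_cov` are illustrative variable names.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark installation is available

df <- createDataFrame(mtcars)

# SparkDataFrame method: both columns are given by name and belong to the
# same SparkDataFrame; the result comes back to the driver as a number.
df_cov <- cov(df, "mpg", "hp")

# Column method: builds an aggregate expression that is evaluated
# inside select() on the SparkDataFrame.
col_cov <- head(select(df, cov(df$mpg, df$hp)))

sparkR.session.stop()
```

The first form is the one the removed line described: it works on two named columns of a single SparkDataFrame.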

#'
#' @param colName1 the name of the first column
#' @param colName2 the name of the second column
#' @return The covariance of the two columns.
Member

What would the @return line be in the final doc?

Contributor Author
@actuaryzhang, Jun 19, 2017

OK. I added this back. The doc should be very clear even without this return value. Indeed, most functions in SparkR do not document the return value. See what it looks like in the image above.

Member

Possibly, but better clarity wouldn't hurt, right?

#'
#' @rdname cov
#' @name cov
#' @aliases cov,SparkDataFrame-method
#' @family stat functions
#' @export
#' @examples
#'\dontrun{
#' df <- read.json("/path/to/file.json")
#' cov <- cov(df, "title", "gender")
#' }
#'
#' \dontrun{
Member

Shouldn't the newline be after the \dontrun?

Contributor Author
@actuaryzhang, Jun 19, 2017

No. The newline should be between @examples and \dontrun to separate multiple \dontrun blocks. See the screenshot above.

#' cov(df, "mpg", "hp")}
#' @note cov since 1.6.0
setMethod("cov",
signature(x = "SparkDataFrame"),
@@ -93,11 +88,10 @@ setMethod("cov",
#' @family stat functions
#' @export
#' @examples
#'\dontrun{
#' df <- read.json("/path/to/file.json")
#' corr <- corr(df, "title", "gender")
#' corr <- corr(df, "title", "gender", method = "pearson")
#' }
#'
#' \dontrun{
#' corr(df, "mpg", "hp")
#' corr(df, "mpg", "hp", method = "pearson")}
#' @note corr since 1.6.0
setMethod("corr",
signature(x = "SparkDataFrame"),