Commit aa023fd

[SPARK-17902][R] Revive stringsAsFactors option for collect() in SparkR
## What changes were proposed in this pull request?

This PR revives the `stringsAsFactors` option in the `collect` API, which was mistakenly removed in 71a138c. During primitive type conversion, it casts a `character` vector to `factor` when the condition `stringsAsFactors && is.character(vec)` holds.

## How was this patch tested?

Unit test in `R/pkg/tests/fulltests/test_sparkSQL.R`.

Author: hyukjinkwon <[email protected]>

Closes #19551 from HyukjinKwon/SPARK-17902.

(cherry picked from commit a83d8d5)
Signed-off-by: hyukjinkwon <[email protected]>
1 parent 3e77b74 commit aa023fd

2 files changed: 9 additions & 0 deletions

R/pkg/R/DataFrame.R

Lines changed: 3 additions & 0 deletions
@@ -1173,6 +1173,9 @@ setMethod("collect",
           vec <- do.call(c, col)
           stopifnot(class(vec) != "list")
           class(vec) <- PRIMITIVE_TYPES[[colType]]
+          if (is.character(vec) && stringsAsFactors) {
+            vec <- as.factor(vec)
+          }
           df[[colIndex]] <- vec
         } else {
           df[[colIndex]] <- col
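
The effect of the added branch can be tried without a Spark session. The following is a minimal base-R sketch, not the SparkR implementation: `convert_column` is a hypothetical helper that mimics only the flattening and the new conditional cast, and the input list is made-up sample data.

```r
# Hypothetical helper (not part of SparkR): flattens a list of scalars into a
# vector and applies the same conditional cast that the patch adds to collect().
convert_column <- function(col, stringsAsFactors = FALSE) {
  vec <- do.call(c, col)
  stopifnot(class(vec) != "list")
  if (is.character(vec) && stringsAsFactors) {
    vec <- as.factor(vec)
  }
  vec
}

# Made-up sample columns standing in for collected data.
species <- list("setosa", "versicolor", "setosa")
lengths <- list(5.1, 7.0, 4.9)

as_chr <- convert_column(species)                           # stays character
as_fct <- convert_column(species, stringsAsFactors = TRUE)  # becomes factor
as_num <- convert_column(lengths, stringsAsFactors = TRUE)  # numeric is untouched

print(class(as_chr))
print(class(as_fct))
print(class(as_num))
```

Only character columns are affected by the flag; numeric and other primitive columns pass through unchanged, which matches the `is.character(vec)` guard in the diff.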

R/pkg/tests/fulltests/test_sparkSQL.R

Lines changed: 6 additions & 0 deletions
@@ -417,6 +417,12 @@ test_that("create DataFrame with different data types", {
   expect_equal(collect(df), data.frame(l, stringsAsFactors = FALSE))
 })
 
+test_that("SPARK-17902: collect() with stringsAsFactors enabled", {
+  df <- suppressWarnings(collect(createDataFrame(iris), stringsAsFactors = TRUE))
+  expect_equal(class(iris$Species), class(df$Species))
+  expect_equal(iris$Species, df$Species)
+})
+
 test_that("SPARK-17811: can create DataFrame containing NA as date and time", {
   df <- data.frame(
     id = 1:2,
