Commit 96c9589

HyukjinKwon authored and dongjoon-hyun committed
[SPARK-35045][SQL] Add an internal option to control input buffer in univocity
### What changes were proposed in this pull request?

This PR makes the input buffer size configurable (as an internal option). This is mainly to work around uniVocity/univocity-parsers#449.

### Why are the changes needed?

To work around uniVocity/univocity-parsers#449.

### Does this PR introduce _any_ user-facing change?

No, it is only an internal option.

### How was this patch tested?

Manually tested by modifying the unit test added in apache#31858 as below:

```diff
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
index fd25a79..b58f0bd3661 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
@@ -2460,6 +2460,7 @@ abstract class CSVSuite
       Seq(line).toDF.write.text(path.getAbsolutePath)
       assert(spark.read.format("csv")
         .option("delimiter", "|")
+        .option("inputBufferSize", "128")
         .option("ignoreTrailingWhiteSpace", "true").load(path.getAbsolutePath).count() == 1)
     }
   }
```

Closes apache#32145 from HyukjinKwon/SPARK-35045.

Lead-authored-by: Hyukjin Kwon <[email protected]>
Co-authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit 1f56215)
Signed-off-by: Max Gekk <[email protected]>
1 parent 8494c14 commit 96c9589

1 file changed

Lines changed: 3 additions & 0 deletions

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala

@@ -211,6 +211,8 @@ class CSVOptions(
   }
   val lineSeparatorInWrite: Option[String] = lineSeparator
 
+  val inputBufferSize: Option[Int] = parameters.get("inputBufferSize").map(_.toInt)
+
   /**
    * The handling method to be used when unescaped quotes are found in the input.
    */
@@ -257,6 +259,7 @@ class CSVOptions(
     settings.setIgnoreLeadingWhitespaces(ignoreLeadingWhiteSpaceInRead)
     settings.setIgnoreTrailingWhitespaces(ignoreTrailingWhiteSpaceInRead)
     settings.setReadInputOnSeparateThread(false)
+    inputBufferSize.foreach(settings.setInputBufferSize)
     settings.setMaxColumns(maxColumns)
     settings.setNullValue(nullValue)
     settings.setEmptyValue(emptyValueInRead)
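
The change follows an "optional override" pattern: the option is parsed into an `Option[Int]`, and the parser setting is touched only when a value was supplied. Below is a minimal standalone sketch of that pattern, not the Spark code itself. `StubSettings`, its placeholder default of 1024, and the `InputBufferSketch` object are hypothetical names introduced for illustration, and a plain `Map` is assumed for the parameters.

```scala
// Hypothetical stand-in for the parser settings object; this is NOT the univocity class.
final class StubSettings {
  // Placeholder default chosen for this sketch only; the real parser default may differ.
  private var bufferSize: Int = 1024
  def setInputBufferSize(n: Int): Unit = { bufferSize = n }
  def inputBufferSize: Int = bufferSize
}

object InputBufferSketch {
  // Same shape as the added line in the diff: look up an optional string
  // parameter and convert it to Int. A non-numeric value makes toInt throw
  // NumberFormatException.
  def parseBufferSize(parameters: Map[String, String]): Option[Int] =
    parameters.get("inputBufferSize").map(_.toInt)

  // Apply the override only when the option is present; otherwise the
  // settings object keeps whatever default it already had.
  def configure(parameters: Map[String, String]): StubSettings = {
    val settings = new StubSettings
    parseBufferSize(parameters).foreach(settings.setInputBufferSize)
    settings
  }
}
```

With this shape, omitting the option leaves the parser's own default in place, which is presumably why the commit does not introduce a Spark-side default value.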