
Commit ad9f8a0 (parent c7f31b3)

Update DataStreamReader

2 files changed: 14 additions and 37 deletions

docs/sql-data-sources-parquet.md

Lines changed: 6 additions & 0 deletions
@@ -260,6 +260,12 @@ Data source options of Parquet can be set via:
 <table class="table">
   <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
+  <tr>
+    <td><code>maxFilesPerTrigger</code></td>
+    <td>None</td>
+    <td>Sets the maximum number of new files to be considered in every trigger.</td>
+    <td>read</td>
+  </tr>
   <tr>
     <td><code>datetimeRebaseMode</code></td>
     <td>The SQL config <code>spark.sql.parquet</code> <code>.datetimeRebaseModeInRead</code> which is <code>EXCEPTION</code> by default</td>
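The `maxFilesPerTrigger` row added to the table documents an existing read option of the streaming file source. A minimal sketch of how it would be set on a streaming Parquet read (the schema and the input path below are illustrative placeholders, not taken from this commit):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StructType, TimestampType}

val spark = SparkSession.builder().appName("parquet-stream").getOrCreate()

// Streaming file sources require a user-supplied schema.
val schema = new StructType().add("id", LongType).add("ts", TimestampType)

val events = spark.readStream
  .schema(schema)
  .option("maxFilesPerTrigger", 10) // consider at most 10 new files per trigger
  .parquet("/data/events")          // placeholder input directory
```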

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala

Lines changed: 8 additions & 37 deletions
@@ -476,43 +476,14 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
   /**
    * Loads a Parquet file stream, returning the result as a `DataFrame`.
    *
-   * You can set the following Parquet-specific option(s) for reading Parquet files:
-   * <ul>
-   * <li>`maxFilesPerTrigger` (default: no max limit): sets the maximum number of new files to be
-   * considered in every trigger.</li>
-   * <li>`mergeSchema` (default is the value specified in `spark.sql.parquet.mergeSchema`): sets
-   * whether we should merge schemas collected from all
-   * Parquet part-files. This will override
-   * `spark.sql.parquet.mergeSchema`.</li>
-   * <li>`pathGlobFilter`: an optional glob pattern to only include files with paths matching
-   * the pattern. The syntax follows <code>org.apache.hadoop.fs.GlobFilter</code>.
-   * It does not change the behavior of partition discovery.</li>
-   * <li>`recursiveFileLookup`: recursively scan a directory for files. Using this option
-   * disables partition discovery</li>
-   * <li>`datetimeRebaseMode` (default is the value specified in the SQL config
-   * `spark.sql.parquet.datetimeRebaseModeInRead`): the rebasing mode for the values
-   * of the `DATE`, `TIMESTAMP_MICROS`, `TIMESTAMP_MILLIS` logical types from the Julian to
-   * Proleptic Gregorian calendar:
-   * <ul>
-   * <li>`EXCEPTION` : Spark fails in reads of ancient dates/timestamps that are ambiguous
-   * between the two calendars</li>
-   * <li>`CORRECTED` : loading of dates/timestamps without rebasing</li>
-   * <li>`LEGACY` : perform rebasing of ancient dates/timestamps from the Julian to Proleptic
-   * Gregorian calendar</li>
-   * </ul>
-   * </li>
-   * <li>`int96RebaseMode` (default is the value specified in the SQL config
-   * `spark.sql.parquet.int96RebaseModeInRead`): the rebasing mode for `INT96` timestamps
-   * from the Julian to Proleptic Gregorian calendar:
-   * <ul>
-   * <li>`EXCEPTION` : Spark fails in reads of ancient `INT96` timestamps that are ambiguous
-   * between the two calendars</li>
-   * <li>`CORRECTED` : loading of timestamps without rebasing</li>
-   * <li>`LEGACY` : perform rebasing of ancient `INT96` timestamps from the Julian to Proleptic
-   * Gregorian calendar</li>
-   * </ul>
-   * </li>
-   * </ul>
+   * Parquet-specific option(s) for reading Parquet file stream can be found in
+   * <a href=
+   * "https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option">
+   * Data Source Option</a>
+   * and
+   * <a href=
+   * "https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html">
+   * Generic Files Source Options</a> in the version you use.
    *
    * @since 2.0.0
    */
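The Scaladoc option list removed in this diff now lives behind the two links; the options themselves are unchanged and are still passed through `option(...)`. For instance, `datetimeRebaseMode`, described in the removed text, would be used as follows (the SparkSession name, schema, and path are illustrative placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DateType, StringType, StructType}

val spark = SparkSession.builder().appName("legacy-parquet-stream").getOrCreate()

// Placeholder schema containing a DATE column that may predate the Gregorian switch.
val schema = new StructType().add("name", StringType).add("born", DateType)

val legacy = spark.readStream
  .schema(schema)
  .option("datetimeRebaseMode", "CORRECTED") // load ancient dates without Julian-to-Gregorian rebasing
  .parquet("/data/legacy")                   // placeholder input directory
```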
