[SPARK-26024][SQL]: Update documentation for repartitionByRange #23025
Changes from 4 commits
b47b6d0
5a50282
5bce520
654fed9
f829dfe
7ca4821
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2789,6 +2789,12 @@ class Dataset[T] private[sql]( | |
| * When no explicit sort order is specified, "ascending nulls first" is assumed. | ||
| * Note, the rows are not sorted in each partition of the resulting Dataset. | ||
| * | ||
| * | ||
| * Note that due to performance reasons this method uses sampling to estimate the ranges. | ||
| * Hence, the output may not be consistent, since sampling can return different values. | ||
| * The sample size can be controlled by setting the value of the parameter | ||
| * `spark.sql.execution.rangeExchange.sampleSizePerPartition`. | ||
|
Contributor
It's not a parameter but a config. So I'd like to propose
Contributor
Author
@cloud-fan the sentence has been changed according to your suggestion (in both Spark & PySpark). |
||
| * | ||
| * @group typedrel | ||
| * @since 2.3.0 | ||
| */ | ||
|
|
@@ -2813,6 +2819,11 @@ class Dataset[T] private[sql]( | |
| * When no explicit sort order is specified, "ascending nulls first" is assumed. | ||
| * Note, the rows are not sorted in each partition of the resulting Dataset. | ||
| * | ||
| * Note that due to performance reasons this method uses sampling to estimate the ranges. | ||
| * Hence, the output may not be consistent, since sampling can return different values. | ||
| * The sample size can be controlled by setting the value of the parameter | ||
| * `spark.sql.execution.rangeExchange.sampleSizePerPartition`. | ||
| * | ||
| * @group typedrel | ||
| * @since 2.3.0 | ||
| */ | ||
|
|
||
Besides Python, we also have the `repartitionByRange` API in R. Can you also update it?
Oh right, I missed it! Pushed.
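
To illustrate the note this PR adds to the docs, here is a minimal, Spark-free sketch in plain Python of why sampling-based range estimation can produce different boundaries from run to run. The helper names `estimate_range_bounds` and `assign_range` are hypothetical, and the logic is a simplification rather than Spark's actual implementation; `sample_size_per_partition` only plays the role of the `spark.sql.execution.rangeExchange.sampleSizePerPartition` config.

```python
import bisect
import random


def estimate_range_bounds(partitions, num_ranges, sample_size_per_partition, seed=None):
    """Estimate range boundaries from a random sample of each input partition.

    Illustrative only: a simplified stand-in for sampling-based range
    estimation, not the algorithm Spark uses internally.
    """
    rng = random.Random(seed)
    sample = []
    for part in partitions:
        # Never ask for more rows than the partition holds.
        k = min(sample_size_per_partition, len(part))
        sample.extend(rng.sample(part, k))
    sample.sort()
    # Pick num_ranges - 1 evenly spaced cut points from the sorted sample.
    return [sample[(i * len(sample)) // num_ranges] for i in range(1, num_ranges)]


def assign_range(value, bounds):
    """Return the index of the output range for value (ascending order).

    A value equal to a boundary stays in the lower range.
    """
    return bisect.bisect_left(bounds, value)


# 1000 rows spread over 4 input partitions (hypothetical data).
data = list(range(1000))
partitions = [data[i::4] for i in range(4)]

# Small samples: boundaries depend on which rows happen to be drawn.
bounds_a = estimate_range_bounds(partitions, 4, 10, seed=1)
bounds_b = estimate_range_bounds(partitions, 4, 10, seed=2)

# Sample covers every row: boundaries are exact and repeatable.
bounds_full = estimate_range_bounds(partitions, 4, 1000, seed=1)  # [250, 500, 750]

print(bounds_a, bounds_b, bounds_full)
```

With a small sample the two seeds will typically yield different boundaries, so the same row can end up in a different output partition between runs. Once the sample covers every row, the boundaries no longer depend on the draw, which is the trade-off the sample-size config exposes.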