[SPARK-24722][SQL] pivot() with Column type argument #21699
Closed
Changes from 6 commits
20 commits
889e922 Adding pivot() which takes Column as its argument (MaxGekk)
f736ea2 Tests for new function (MaxGekk)
5e68226 the since tag is updated (MaxGekk)
c82c397 Test for nested columns (MaxGekk)
7d0d226 Python test for nested columns (MaxGekk)
0fdd11f Adding ticket number to test's title (MaxGekk)
74ddcdd Making diff shorter (MaxGekk)
390d832 Adding a function which accepts Column argument (MaxGekk)
d62b7e7 Adding Java-specific functions (MaxGekk)
fae4fd2 Tests for column expression (MaxGekk)
8ffdc32 Tests for pivot column which refers to multiple other columns (MaxGekk)
57c0f64 Test pivoting by constant (MaxGekk)
f32a85b Test for pivoting by an aggregate (MaxGekk)
b9996df Merge remote-tracking branch 'origin/master' into pivot-column (MaxGekk)
e76e7ad Improving comments by referencing to the overloaded methods (MaxGekk)
34535a9 Fix expected error message (MaxGekk)
e869f85 Merge remote-tracking branch 'origin/master' into pivot-column (MaxGekk)
cf55135 Fix expected message (MaxGekk)
5da5a2c Support multiple values (MaxGekk)
ca1250b Revert "Support multiple values" (MaxGekk)
@@ -340,36 +340,52 @@ class RelationalGroupedDataset protected[sql](

   /**
    * Pivots a column of the current `DataFrame` and performs the specified aggregation.
-   * There are two versions of pivot function: one that requires the caller to specify the list
-   * of distinct values to pivot on, and one that does not. The latter is more concise but less
-   * efficient, because Spark needs to first compute the list of distinct values internally.
    *
    * {{{
    *   // Compute the sum of earnings for each year by course with each course as a separate column
-   *   df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings")
-   *
-   *   // Or without specifying column values (less efficient)
-   *   df.groupBy("year").pivot("course").sum("earnings")
+   *   df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings")
    * }}}
    *
-   * @param pivotColumn Name of the column to pivot.
+   * @param pivotColumn the column to pivot.
    * @param values List of values that will be translated to columns in the output DataFrame.
-   * @since 1.6.0
+   * @since 2.4.0
    */
-  def pivot(pivotColumn: String, values: Seq[Any]): RelationalGroupedDataset = {
+  def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = {
Member
To make diffs smaller, can you move this under the signature
     groupType match {
       case RelationalGroupedDataset.GroupByType =>
         new RelationalGroupedDataset(
           df,
           groupingExprs,
-          RelationalGroupedDataset.PivotType(df.resolve(pivotColumn), values.map(Literal.apply)))
+          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
       case _: RelationalGroupedDataset.PivotType =>
         throw new UnsupportedOperationException("repeated pivots are not supported")
       case _ =>
         throw new UnsupportedOperationException("pivot is only supported after a groupBy")
     }
   }

+  /**
+   * Pivots a column of the current `DataFrame` and performs the specified aggregation.
+   * There are two versions of pivot function: one that requires the caller to specify the list
+   * of distinct values to pivot on, and one that does not. The latter is more concise but less
+   * efficient, because Spark needs to first compute the list of distinct values internally.
+   *
+   * {{{
+   *   // Compute the sum of earnings for each year by course with each course as a separate column
+   *   df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings")
+   *
+   *   // Or without specifying column values (less efficient)
+   *   df.groupBy("year").pivot("course").sum("earnings")
+   * }}}
+   *
+   * @param pivotColumn Name of the column to pivot.
+   * @param values List of values that will be translated to columns in the output DataFrame.
+   * @since 1.6.0
+   */
+  def pivot(pivotColumn: String, values: Seq[Any]): RelationalGroupedDataset = {
+    pivot(Column(pivotColumn), values)
+  }
+
   /**
    * (Java-specific) Pivots a column of the current `DataFrame` and performs the specified
    * aggregation.
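For context, here is a rough usage sketch of what the hunk above enables, assuming Spark 2.4 or later (spark-sql) is on the classpath. The object name `PivotColumnSketch`, the helper `pivots`, and the sample rows are made up for illustration and are not part of the PR. The sketch calls `sum("earnings")` with a column name, since `RelationalGroupedDataset.sum` takes `String` arguments; the `lower(...)` variant is only an assumption about how an arbitrary column expression could be passed to the new overload.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lower

object PivotColumnSketch {
  // Builds the same pivot three ways; the sample rows are illustrative only.
  def pivots(spark: SparkSession): (DataFrame, DataFrame, DataFrame) = {
    import spark.implicits._

    val df = Seq(
      (2012, "dotNET", 10000),
      (2012, "Java", 20000),
      (2013, "dotNET", 48000),
      (2013, "Java", 30000)
    ).toDF("year", "course", "earnings")

    val courses = Seq("dotNET", "Java")

    // Existing overload: the pivot column is given by name.
    val byName = df.groupBy("year").pivot("course", courses).sum("earnings")

    // Overload added in this PR: the pivot column is given as a Column.
    val byColumn = df.groupBy($"year").pivot($"course", courses).sum("earnings")

    // A Column can also carry an expression; the values must match its results.
    val byExpr = df.groupBy($"year")
      .pivot(lower($"course"), courses.map(_.toLowerCase))
      .sum("earnings")

    (byName, byColumn, byExpr)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pivot-column-sketch")
      .getOrCreate()
    try {
      val (byName, byColumn, byExpr) = pivots(spark)
      byName.show()
      byColumn.show()
      byExpr.show()
    } finally {
      spark.stop()
    }
  }
}
```

Passing the list of values explicitly, as in all three calls, avoids the extra pass that the doc comment describes for the value-less variant.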
Shall we note this in the Column API too, or note that this is an overloaded version of the String one?
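One way to picture the second option in the comment above is the toy model below. It is plain Scala, not Spark code: `Col`, `Grouped`, `GroupKind`, `GroupByKind` and `PivotKind` are hypothetical stand-ins for `Column`, `RelationalGroupedDataset` and its group types, and the Scaladoc wording is only a suggestion. It shows the name-based overload delegating to the column-based one, with the doc comment pointing back at the overload, plus the guard against repeated pivots seen in the diff.

```scala
// Toy model with hypothetical names; it mirrors the shape of the diff, not Spark's real classes.
final case class Col(expr: String)

sealed trait GroupKind
case object GroupByKind extends GroupKind
final case class PivotKind(pivotExpr: String, values: Seq[Any]) extends GroupKind

final case class Grouped(kind: GroupKind) {
  /**
   * Pivots on a column expression.
   * This is an overloaded version of `pivot(String, Seq[Any])`.
   */
  def pivot(pivotColumn: Col, values: Seq[Any]): Grouped = kind match {
    case GroupByKind =>
      Grouped(PivotKind(pivotColumn.expr, values))
    case _: PivotKind =>
      throw new UnsupportedOperationException("repeated pivots are not supported")
  }

  /** Name-based overload: wraps the name in a `Col` and delegates. */
  def pivot(pivotColumn: String, values: Seq[Any]): Grouped =
    pivot(Col(pivotColumn), values)
}

object OverloadSketch {
  def main(args: Array[String]): Unit = {
    val start = Grouped(GroupByKind)
    val byName = start.pivot("course", Seq("dotNET", "Java"))
    val byCol = start.pivot(Col("course"), Seq("dotNET", "Java"))
    // Both overloads end up in the same state because one delegates to the other.
    println(byName == byCol)
  }
}
```

Keeping the logic in one overload means the two doc comments only need to describe the argument type and can refer to each other for the rest.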