[SPARK-38124][SS][FOLLOWUP] Document the current challenge on fixing distribution of stateful operator #35512
Changes from 2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -101,6 +101,14 @@ case class ClusteredDistribution( | |
| * Since this distribution relies on [[HashPartitioning]] on the physical partitioning of the | ||
| * stateful operator, only [[HashPartitioning]] (and HashPartitioning in | ||
| * [[PartitioningCollection]]) can satisfy this distribution. | ||
| * | ||
| * NOTE: This is applied only stream-stream join as of now. For other stateful operators, we have | ||
| * been using ClusteredDistribution, which could construct the physical partitioning of the state | ||
| * in different way. (ClusteredDistribution requires relaxed condition and multiple | ||
| * partitionings can satisfy the requirement.) We need to construct the way to fix this with | ||
| * minimizing possibility to break the existing checkpoints. | ||
| * | ||
| * TODO: SPARK-38204 to address above note. | ||
| */ | ||
| case class StatefulOpClusteredDistribution( | ||
| expressions: Seq[Expression], | ||
"applied only to stream-stream join"?
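For readers following the doc comment in the diff, the difference between the relaxed and the strict requirement can be sketched with a toy model. This is only an illustration under stated assumptions: `HashPartitioningModel`, `ClusteredModel`, `StatefulOpClusteredModel`, and `satisfies` are hypothetical stand-ins that use plain column names, not Spark's actual classes or matching logic.

```scala
// Toy model only: every type here is a hypothetical stand-in, not a Spark class.
object DistributionSketch {
  // "Hash rows by these columns, in this order."
  final case class HashPartitioningModel(keys: Seq[String])

  sealed trait DistributionModel
  // Relaxed: rows that agree on all clustering columns must be co-located.
  final case class ClusteredModel(clustering: Seq[String]) extends DistributionModel
  // Strict: the hash must be computed over exactly these keys, in this order.
  final case class StatefulOpClusteredModel(keys: Seq[String]) extends DistributionModel

  def satisfies(p: HashPartitioningModel, d: DistributionModel): Boolean = d match {
    case ClusteredModel(clustering) =>
      // Hashing on any non-empty subset of the clustering columns still
      // co-locates rows that agree on all of them, so several partitionings pass.
      p.keys.nonEmpty && p.keys.forall(clustering.contains)
    case StatefulOpClusteredModel(keys) =>
      // Only one key list passes, so the physical layout of state is fixed.
      p.keys == keys
  }

  def main(args: Array[String]): Unit = {
    val groupingKeys = Seq("userId", "sessionId")
    val byUserOnly = HashPartitioningModel(Seq("userId"))
    val byBoth = HashPartitioningModel(groupingKeys)

    println(satisfies(byUserOnly, ClusteredModel(groupingKeys)))           // true
    println(satisfies(byBoth, ClusteredModel(groupingKeys)))               // true
    println(satisfies(byUserOnly, StatefulOpClusteredModel(groupingKeys))) // false
    println(satisfies(byBoth, StatefulOpClusteredModel(groupingKeys)))     // true
  }
}
```

In this model two different partitionings satisfy the relaxed requirement, which is the situation the NOTE describes: state written under one layout could be read back under another, so tightening the requirement for existing operators risks breaking existing checkpoints.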