-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26141] Enable custom metrics implementation in shuffle write #23106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #99130 has finished for PR 23106 at commit
|
squito
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just a minor comment
| taskContext.taskMetrics().incDiskBytesSpilled(writeMetricsToUse.bytesWritten()); | ||
|
|
||
| // This is guaranteed to be a ShuffleWriteMetrics based on the if check in the beginning | ||
| // of this file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found "beginning of this file" to be confusing, I thought you meant the beginning of ShuffleExternalSorter.java, not the spill file. maybe "beginning of this method"
also is the comment above this about SPARK-3577 out of date now that has been fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes. nice catch
|
Test build #99285 has finished for PR 23106 at commit
|
|
Test build #99298 has finished for PR 23106 at commit
|
|
Merging in master. Thanks @squito. |
## What changes were proposed in this pull request? This patch defines an internal Spark interface for reporting shuffle metrics and uses that in shuffle reader. Before this patch, shuffle metrics is tied to a specific implementation (using a thread local temporary data structure and accumulators). After this patch, callers that define their own shuffle RDDs can create a custom metrics implementation. With this patch, we would be able to create a better metrics for the SQL layer, e.g. reporting shuffle metrics in the SQL UI, for each exchange operator. Note that I'm separating read side and write side implementations, as they are very different, to simplify code review. Write side change is at apache#23106 ## How was this patch tested? No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases. Closes apache#23105 from rxin/SPARK-26140. Authored-by: Reynold Xin <[email protected]> Signed-off-by: gatorsmile <[email protected]>
## What changes were proposed in this pull request? This is the write side counterpart to apache#23105 ## How was this patch tested? No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases. Closes apache#23106 from rxin/SPARK-26141. Authored-by: Reynold Xin <[email protected]> Signed-off-by: Reynold Xin <[email protected]>
What changes were proposed in this pull request?
This is the write side counterpart to #23105
How was this patch tested?
No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases.