Skip to content

Conversation

@rxin
Copy link
Contributor

@rxin rxin commented Nov 21, 2018

What changes were proposed in this pull request?

This is the write side counterpart to #23105

How was this patch tested?

No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases.

@rxin rxin changed the title [SPARK-26141] Enable custom shuffle metrics implementation in shuffle write [SPARK-26141] Enable custom metrics implementation in shuffle write Nov 21, 2018
@SparkQA
Copy link

SparkQA commented Nov 21, 2018

Test build #99130 has finished for PR 23106 at commit 115bd8b.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@squito squito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just a minor comment

taskContext.taskMetrics().incDiskBytesSpilled(writeMetricsToUse.bytesWritten());

// This is guaranteed to be a ShuffleWriteMetrics based on the if check in the beginning
// of this file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found "beginning of this file" to be confusing, I thought you meant the beginning of ShuffleExternalSorter.java, not the spill file. maybe "beginning of this method"

also is the comment above this about SPARK-3577 out of date now that has been fixed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes. nice catch

@SparkQA
Copy link

SparkQA commented Nov 26, 2018

Test build #99285 has finished for PR 23106 at commit 35e0c1a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 27, 2018

Test build #99298 has finished for PR 23106 at commit 4e9b756.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor Author

rxin commented Nov 27, 2018

Merging in master. Thanks @squito.

@asfgit asfgit closed this in 6a064ba Nov 27, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?
This patch defines an internal Spark interface for reporting shuffle metrics and uses that in shuffle reader. Before this patch, shuffle metrics is tied to a specific implementation (using a thread local temporary data structure and accumulators). After this patch, callers that define their own shuffle RDDs can create a custom metrics implementation.

With this patch, we would be able to create a better metrics for the SQL layer, e.g. reporting shuffle metrics in the SQL UI, for each exchange operator.

Note that I'm separating read side and write side implementations, as they are very different, to simplify code review. Write side change is at apache#23106

## How was this patch tested?
No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases.

Closes apache#23105 from rxin/SPARK-26140.

Authored-by: Reynold Xin <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?
This is the write side counterpart to apache#23105

## How was this patch tested?
No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases.

Closes apache#23106 from rxin/SPARK-26141.

Authored-by: Reynold Xin <[email protected]>
Signed-off-by: Reynold Xin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants