-
Notifications
You must be signed in to change notification settings - Fork 51
[SPARK-25299] Local shuffle implementation of the shuffle writer API #524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…#6) Implement default version of the API that would be used across all shuffle writers, writes to local disk. Shuffle Writes [2/6] [3/6] [5/6]
654cfeb to
cb261e1
Compare
- We always have to make a partition writer regardless if input spill files exist or not. - Initialize executor components in SortShuffleManager - Ensure we're always using the initialized executor components
vanzin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't see anything that caught my attention, but I kinda skimmed the tests.
core/src/main/java/org/apache/spark/api/shuffle/ShufflePartitionWriter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/spark/api/shuffle/ShufflePartitionWriter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/spark/api/shuffle/ShufflePartitionWriter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/spark/shuffle/sort/io/DefaultShuffleMapOutputWriter.java
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala
Outdated
Show resolved
Hide resolved
|
Took a look through the changes since bloomberg#6, lgtm! |
mccheah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'm going to merge this now - we've had some time vetting this back and forth on the Bloomberg fork. Furthermore having access to ShuffleExecutorComponents will allow the reader plugin to be a more complete patch in #523, and it also gives the locations API something to hook into in #517. We discussed in other forums that making the plugin implementation handle Spark's compression and encryption is impossible because of the way BypassMergeSortShuffleWriter is fundamentally built.
I'm going to merge this as is. Incremental improvements can be proposed here and can be added as follow-up patches moving forward. @felixcheung @vanzin @squito for SA.
Thanks @ifilonenko for working on this!
Allows more efficient writing of chunks.
…524) Implements the shuffle writer API by writing shuffle files to local disk and using the index block resolver to commit data and write index files. The logic in `BypassMergeSortShuffleWriter` has been refactored to use the base implementation of the plugin instead. APIs have been slightly renamed to clarify semantics after considering nuances in how these are to be implemented by other developers. Follow-up commits are to come for `SortShuffleWriter` and `UnsafeShuffleWriter`. Ported from bloomberg#6, credits to @ifilonenko.
Implements the shuffle writer API by writing shuffle files to local disk and using the index block resolver to commit data and write index files.
The logic in
BypassMergeSortShuffleWriterhas been refactored to use the base implementation of the plugin instead.APIs have been slightly renamed to clarify semantics after considering nuances in how these are to be implemented by other developers.
Follow-up commits are to come for
SortShuffleWriterandUnsafeShuffleWriter.Ported from bloomberg#6, credits to @ifilonenko.