
mccheah commented Apr 19, 2019

We originally made the shuffle map output writer API behave like an iterator that fetches the "next" partition writer. However, the shuffle writer implementations tend to skip opening writers for empty partitions. An iterator-like API would force us to open a partition writer for every single partition, including the empty ones. This change goes back to requesting partition writers by explicit partition identifier, which gives implementations the freedom to avoid creating writers for empty partitions.
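To make the difference concrete, here is a minimal Java sketch contrasting the two API shapes. All names in it (`PartitionWriterSketch`, `IteratorStyleOutputWriter.getNextPartitionWriter`, `IdBasedOutputWriter.getPartitionWriter`, `PartitionWriter`) are hypothetical illustrations, not the actual interfaces on the spark-25299 branch, and partition data is modeled as plain byte arrays for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; these are not Spark's real shuffle API types. */
public class PartitionWriterSketch {

    /** Hypothetical per-partition writer that only counts the bytes written to it. */
    static final class PartitionWriter {
        final int partitionId;
        long bytesWritten = 0;
        PartitionWriter(int partitionId) { this.partitionId = partitionId; }
        void write(byte[] data) { bytesWritten += data.length; }
    }

    /** Iterator-style API: the caller can only ask for the "next" partition, in order. */
    static final class IteratorStyleOutputWriter {
        private int next = 0;
        final List<PartitionWriter> opened = new ArrayList<>();
        PartitionWriter getNextPartitionWriter() {
            PartitionWriter w = new PartitionWriter(next++);
            opened.add(w);
            return w;
        }
    }

    /** Id-based API: the caller names the partition it wants. */
    static final class IdBasedOutputWriter {
        final List<PartitionWriter> opened = new ArrayList<>();
        PartitionWriter getPartitionWriter(int partitionId) {
            PartitionWriter w = new PartitionWriter(partitionId);
            opened.add(w);
            return w;
        }
    }

    /** With the iterator API, reaching partition i means opening writers for 0..i-1 too. */
    static IteratorStyleOutputWriter writeWithIterator(byte[][] partitions) {
        IteratorStyleOutputWriter out = new IteratorStyleOutputWriter();
        for (byte[] data : partitions) {
            // A writer is created even when data is empty.
            out.getNextPartitionWriter().write(data);
        }
        return out;
    }

    /** With the id-based API, empty partitions never get a writer. */
    static IdBasedOutputWriter writeWithIds(byte[][] partitions) {
        IdBasedOutputWriter out = new IdBasedOutputWriter();
        for (int i = 0; i < partitions.length; i++) {
            if (partitions[i].length == 0) {
                continue; // Skip empty partition i entirely.
            }
            out.getPartitionWriter(i).write(partitions[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] partitions = { {1, 2}, {}, {}, {3} };
        System.out.println(writeWithIterator(partitions).opened.size()); // 4
        System.out.println(writeWithIds(partitions).opened.size());      // 2
    }
}
```

With four partitions of which two are empty, the iterator-style path opens four writers, while the id-based path opens only two (for partitions 0 and 3). In this sketch the iterator shape enforces ordering implicitly, whereas the id-based shape leaves ordering to the caller; that flexibility is what lets an implementation skip empty partitions.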

mccheah (author) commented Apr 19, 2019

@yifeih @ifilonenko

yifeih left a comment

LGTM

bulldozer-bot merged commit b8e255b into spark-25299 on Apr 19, 2019
bulldozer-bot deleted the shuffle-writer-specific-partition-ids branch on April 19, 2019 at 22:09
ifilonenko commented

Late LGTM

mccheah added a commit that referenced this pull request Jun 27, 2019
…tion ids (#540)

