
Conversation

@yifeih yifeih commented Mar 26, 2019

No description provided.

@yifeih yifeih changed the title [SPARK-25299] [WIP] shuffle reader API [SPARK-25299] shuffle reader API Mar 26, 2019

yifeih commented Mar 26, 2019

@mccheah @ifilonenko for initial review

The thing I found most awkward about this PR is translating things into an API and then immediately translating them back. This happens twice: once when translating from Scala Iterables to Java Iterables just to fit the Java API, and then translating back to Scala to make it compatible with ShuffleBlockFetcherIterator because it's implemented in Java. The other is translating everything into ShuffleLocationBlocks and then back out, because certain things, like BlockId, are not part of the public API.
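For illustration, the round-trip looks roughly like this (a sketch; roundTrip and its argument are stand-in names, not code from the PR):

import java.io.InputStream
import scala.collection.JavaConverters._

// Scala -> Java at the public API boundary, then immediately Java -> Scala
// again inside the implementation.
def roundTrip(scalaIterator: Iterator[InputStream]): Iterator[InputStream] = {
  val javaIterable: java.lang.Iterable[InputStream] = scalaIterator.toIterable.asJava
  javaIterable.iterator().asScala
}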

private final ShuffleBlockInfo[] shuffleBlocks;
private final Optional<BlockManagerId> shuffleLocation;

public static final class ShuffleBlockInfo {


looks like a great candidate for a scala case class - does this need to be java code?

yifeih (Author):

The rest of the API is in Java to allow people to write the plugin in either Scala or Java, so I think it makes more sense to have Java here.

@mccheah mccheah left a comment

Will add more later today


package org.apache.spark.api.shuffle;

public final class ShuffleBlockInfo {

Nit: Spark doesn't usually use final modifiers.

when(mapOutputTracker.getMapSizesByExecutorId(shuffleId, reduceId, reduceId + 1))
  .thenAnswer(new Answer[Iterator[(BlockManagerId, Seq[(BlockId, Long)])]] {
    def answer(invocationOnMock: InvocationOnMock):
      Iterator[(BlockManagerId, Seq[(BlockId, Long)])] = {

4-space indent this from def

    serializerManager: SerializerManager,
    mapOutputTracker: MapOutputTracker) extends ShuffleReadSupport {

  val maxBytesInFlight = SparkEnv.get.conf.get(config.REDUCER_MAX_SIZE_IN_FLIGHT) * 1024 * 1024

Should these be private?

yifeih (Author):

hmmm yes xD

shuffleMetrics = TaskContext.get().taskMetrics().createTempShuffleReadMetrics()
).toCompletionIterator

new ShuffleBlockInputStreamIterator(shuffleBlockFetchIterator).toIterable.asJava

toIterable is scary because it assumes that the single iterator instance can be returned multiple times, when in fact by definition an iterator object can only be traversed once. I think we want to return an Iterable object that creates the ShuffleBlockFetcherIterator when Iterable#iterator() is called.
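A minimal sketch of that shape (hypothetical class and parameter names; the real fetcher construction takes many more arguments):

import java.io.InputStream
import scala.collection.JavaConverters._
import org.apache.spark.storage.BlockId

// Defer building the fetch iterator until iterator() is called, so each call to
// Iterable#iterator() gets a fresh, single-use iterator underneath.
class ShuffleBlockInputStreamIterable(
    createFetchIterator: () => Iterator[(BlockId, InputStream)])
  extends java.lang.Iterable[InputStream] {

  override def iterator(): java.util.Iterator[InputStream] =
    createFetchIterator().map(_._2).asJava
}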

  new ShuffleBlockInputStreamIterator(Iterator.empty).toIterable.asJava
} else {
  val minReduceId = blockMetadata.asScala.map(block => block.getReduceId).min
  val maxReduceId = blockMetadata.asScala.map(block => block.getReduceId).max

I wonder if we can derive both the minReduceId and the maxReduceId in a single pass over the blockMetadata. It would probably require a manual search that tracks both the running minimum and the running maximum over the iteration. Since the algorithm wouldn't be hard here I think it's worth doing to save some CPU cycles.
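For example, a single-pass version might look like this (a sketch, assuming blockMetadata and the JavaConverters import from the surrounding code):

// Track the running minimum and maximum in one traversal instead of two.
val (minReduceId, maxReduceId) = blockMetadata.asScala
  .foldLeft((Int.MaxValue, Int.MinValue)) { case ((curMin, curMax), block) =>
    val id = block.getReduceId
    (math.min(curMin, id), math.max(curMax, id))
  }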

}
}

private class ShuffleBlockInputStreamIterator(

Do we need the explicit iterator subclass? I'd think we can just call blockFetchIterator.map(_._2).asJava.

yifeih (Author):

Heh scala magic that I am still learning 🙃

    mapOutputTracker: MapOutputTracker) extends ShuffleReadSupport {

  private val maxBytesInFlight =
    SparkEnv.get.conf.get(config.REDUCER_MAX_SIZE_IN_FLIGHT) * 1024 * 1024

Let's just pass in the SparkConf - we can certainly do that once we have ShuffleExecutorComponents hooked up.


mccheah commented Apr 15, 2019

@squito we have plans to discuss this further offline. I'll just note for now that I posted a proof of concept implementation of the reader APIs here: https://github.com/mccheah/ignite-shuffle-service/pull/1/files. It's largely incomplete and not tested, but hopefully it illustrates some of the API decisions with another implementation.


mccheah commented Apr 19, 2019

We decided to introduce an exception API, which will probably just mean making FetchFailedException part of the plugin API as an exception the reader can throw. But we'll tackle both that and the scheduler changes in a follow-up patch.

@yifeih and I talked and we want to get the shuffle locations API merged first and to have this PR use the locations as well.


@Override
public int hashCode() {
  return Objects.hash(shuffleId, mapId, reduceId, length);

Should we hash the location?

&& shuffleId == ((ShuffleBlockInfo) other).shuffleId
&& mapId == ((ShuffleBlockInfo) other).mapId
&& reduceId == ((ShuffleBlockInfo) other).reduceId
&& length == ((ShuffleBlockInfo) other).length;

Should we check equality against the location?

package org.apache.spark.api.shuffle;

import java.util.Objects;
import java.util.Optional;

Ah, Optional. I've noticed around Spark they use their own implementation of Optional, pretty sure it's org.apache.spark.api.java.Optional. I think we're expected to use that everywhere?


The JRE's optional is not serializable, which is why the Spark one exists. Not sure whether that matters here.

 * and writers are expected to cast this down to an implementation-specific representation.
 */
public interface ShuffleLocation {
  ShuffleLocation EMPTY_LOCATION = new ShuffleLocation() {};

Why do we need this - can't we pass Optional.empty everywhere?

 *
 * @throws Exception if current block cannot be retried.
 */
default void retryLastBlock(Throwable t) throws Exception {

Should this construct still be part of the API? We talked offline about special casing the default shuffle implementation on this subject since stream corruption shouldn't be a factor in non-Spark-core plugins.

val shuffleLoc = status.mapShuffleLocations.getLocationForBlock(part)
splitsByAddress.getOrElseUpdate(shuffleLoc, ListBuffer()) +=
if (status.mapShuffleLocations == null) {
splitsByAddress.getOrElseUpdate(ShuffleLocation.EMPTY_LOCATION, ListBuffer()) +=

Can we return a map of Option[ShuffleLocation] as the key? Removes the need for this placeholder.
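A sketch of what that could look like (status, part, blockId, and size are stand-ins for values in the surrounding MapOutputTracker code, which is truncated in the excerpt above):

import scala.collection.mutable
import scala.collection.mutable.ListBuffer
import org.apache.spark.api.shuffle.ShuffleLocation
import org.apache.spark.storage.BlockId

// None stands in for a missing location, replacing the EMPTY_LOCATION placeholder.
val splitsByAddress =
  mutable.HashMap.empty[Option[ShuffleLocation], ListBuffer[(BlockId, Long)]]
val key =
  if (status.mapShuffleLocations == null) None
  else Some(status.mapShuffleLocations.getLocationForBlock(part))
splitsByAddress.getOrElseUpdate(key, ListBuffer()) += ((blockId, size))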

@Experimental
public class ShuffleReaderInputStream {

  private final ShuffleBlockInfo shuffleBlockInfo;

I thought about this a bit more, and I don't think we want to be putting the shuffle block info into this part of the API. The reader should just return plain InputStream objects without this wrapper.

We originally included the block info here because in the default implementation, we need to know the length of the block to know if we should be checking for corruption. But this is again only applicable for the default implementation. Putting the block info here would result in it being unused by other shuffle storage plugins.

What I think we want instead is for BlockStoreShuffleReader to check that each returned InputStream is an instance of some class, then have that input stream class have a method that returns which block it's reading and what its length is. Something like that, anyways.

We might want to use pattern matching instead of isInstanceOf and asInstanceOf in a lot of cases.

Can we see if we can get the reader API to return Iterable<InputStream>?
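A sketch of that shape (ShuffleBlockInputStream and blockInfoOf are hypothetical names, not code from the PR):

import java.io.InputStream
import org.apache.spark.api.shuffle.ShuffleBlockInfo

// Hypothetical marker type the default implementation could expose, letting
// BlockStoreShuffleReader recover the block and its length from a plain stream.
trait ShuffleBlockInputStream extends InputStream {
  def blockInfo: ShuffleBlockInfo
}

// Pattern matching instead of isInstanceOf/asInstanceOf: only streams from the
// default implementation carry block info and need the corruption check.
def blockInfoOf(in: InputStream): Option[ShuffleBlockInfo] = in match {
  case s: ShuffleBlockInputStream => Some(s.blockInfo)
  case _ => None
}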

yifeih (Author):

Sorry for the slow turnaround. I'm not sure that returning just Iterable<InputStream> would work, because the decryption/decompression function, serializerManager.wrapStream, must take a BlockId as one of its arguments. Every implementation needs some way of giving that information back to the BlockStoreShuffleReader for the decryption/decompression to feasibly be done outside the plugin.


Ah ok, that makes sense; let's keep it this way until we can think of something better. I don't think we necessarily need to use wrapStream directly - we can call wrapForCompression and wrapForEncryption individually, both of which only accept streams and don't require block ids. I think block ids are only required to check what type of input stream it is, but you could do that check a layer above? Regardless, this way might be fine as is, but look around and see what we can do.

yifeih (Author):

Ah ok, yea, I guess I could call those two functions directly. I think I can also move the corruption detection logic back inside ShuffleBlockFetcherIterator in that case too.

yifeih (Author):

Yea, it looks like other parts of the codebase call those two functions directly, so it seems reasonable.


Well, now that the class is no longer used, we should delete it, yeah? =)


Never mind it's gone =D

@bulldozer-bot bulldozer-bot bot merged commit b35d238 into spark-25299 Apr 30, 2019
@bulldozer-bot bulldozer-bot bot deleted the yh/reader-api branch April 30, 2019 21:07

yifeih commented May 1, 2019

ah oops thanks @mccheah!

@ifilonenko

Good work @yifeih :)
