Skip to content

Conversation

@xuanyuanking
Copy link
Member

@xuanyuanking xuanyuanking commented Aug 1, 2019

What changes were proposed in this pull request?

After SPARK-27677, the shuffle client not only handles the shuffle block but also responsible for local persist RDD blocks. For better code scalability and precise semantics(as the discussion), here we did several changes:

  • Rename ShuffleClient to BlockStoreClient.
  • Correspondingly rename the ExternalShuffleClient to ExternalBlockStoreClient, also change the server-side class from ExternalShuffleBlockHandler to ExternalBlockHandler.
  • Move MesosExternalBlockStoreClient to Mesos package.

Note, we still keep the name of BlockTransferService, because the Service contains both client and server, also the name of BlockTransferService is not referencing shuffle client only.

How was this patch tested?

Existing UT.

@SparkQA
Copy link

SparkQA commented Aug 1, 2019

Test build #108522 has finished for PR 25327 at commit 6e02ff0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

@cloud-fan

val blockTransferService: BlockTransferService,
securityManager: SecurityManager,
externalShuffleClient: Option[ExternalShuffleClient])
externalShuffleClient: Option[ExternalBlockStoreClient])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename the variable as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I tried this, but there are plenty of names bind with Shuffle, so in ae64538 I just change this param name and the member var shuffleClient.

@cloud-fan
Copy link
Contributor

the new name loos clearer, also cc @attilapiros @vanzin @tgravescs

@SparkQA
Copy link

SparkQA commented Aug 2, 2019

Test build #108544 has finished for PR 25327 at commit f8d9777.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 2, 2019

Test build #108547 has finished for PR 25327 at commit ae64538.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Aug 2, 2019

Test build #108560 has finished for PR 25327 at commit ae64538.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Copy link
Contributor

I'm ok with renaming, it is more clear. My only concern is changing public api's when maybe not truly needed, although this is 3.0 so the time to do it and I don't think this will impact very many users.

@cloud-fan
Copy link
Contributor

AFAIK the network is an internal module, these interfaces/classes are not even developer APIs. I think it's fine to change them.

Copy link
Contributor

@attilapiros attilapiros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new names are much bette even self explanatory.

But what about going one step further and renaming the ExternalShuffleService to ExternalBlockStoreService?

I know it will generate a lot of documentation changes and a temporary mapping in the mind of the operators running Spark but if we would like to ever fix this then this is the best time to do so.

cc @squito

@xuanyuanking
Copy link
Member Author

Thanks for the review and comments.
@attilapiros Yep, I tried to rename the ExternalShuffleService to ExternalBlockStoreService, compare with the present changes, it's more like an API change and influence for end users. After changing the class to ExternalBlockStoreService, we'll get a dilemma that should we also change the plenty of variable names, related classes and configs name. Maybe we just limit the scope to the developers, better not to the end-users. WDYT?

@cloud-fan
Copy link
Contributor

It takes time to educate the users that external shuffle service is now external block service. We need to update documents, configs, etc. and probably need a few blog posts, presentations, etc.

I don't think it's a good idea to do them with one single PR. I'd like to merge this PR first, as a start of internal renaming.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in db39f45 Aug 5, 2019
@attilapiros
Copy link
Contributor

ok, I see skipping its rename was result of a conscientious decision. That's fine.

@xuanyuanking
Copy link
Member Author

Yep, thanks for the advice and review :)

@xuanyuanking xuanyuanking deleted the SPARK-28593 branch August 6, 2019 05:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants