[GLUTEN-2051] Make ColumnarBatchSerializer supports relocation so that continuous shuffle block fetching can be enabled #2052

WangGuangxin · 2023-06-23T14:33:18Z

What changes were proposed in this pull request?

Spark supports fetching contiguous shuffle blocks in batch, which is enabled by default (via the conf spark.sql.adaptive.fetchShuffleBlocksInBatch). This feature gives a big performance improvement in our production workloads.

However, since ColumnarBatchSerializer's supportsRelocationOfSerializedObjects currently returns false, this feature cannot take effect.
In fact, Arrow serialization does support relocation if we don't write the schema (which defaults to true) and don't write EOS (which is optional in the Arrow IPC serialization format).

https://wesm.github.io/arrow-site-test/format/IPC.html#streaming-format
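
To illustrate why dropping the EOS marker matters for relocation, here is a toy sketch. It is not Gluten or Arrow code: the framing (a 4-byte length prefix per batch, a zero length as EOS) and the names `write_block` and `read_stream` are assumptions made up for the example. It only shows that blocks written without EOS can be concatenated and still read back as one stream.

```python
import io
import struct

# Toy end-of-stream marker: a zero length prefix (an assumption for this sketch,
# not the actual Arrow IPC byte layout).
EOS = struct.pack("<i", 0)


def write_block(batches, write_eos):
    """Serialize one shuffle block as length-prefixed batches, optionally ending with EOS."""
    out = io.BytesIO()
    for payload in batches:
        out.write(struct.pack("<i", len(payload)))
        out.write(payload)
    if write_eos:
        out.write(EOS)
    return out.getvalue()


def read_stream(data):
    """Read batches until an EOS marker or the end of the bytes, whichever comes first."""
    stream = io.BytesIO(data)
    batches = []
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break  # ran out of bytes: normal end when no EOS is written
        (length,) = struct.unpack("<i", header)
        if length == 0:
            break  # EOS: the reader stops here and ignores anything after it
        batches.append(stream.read(length))
    return batches


block_a = [b"a1", b"a2"]
block_b = [b"b1"]

# Two blocks concatenated, as a batch fetch would deliver them.
with_eos = write_block(block_a, True) + write_block(block_b, True)
without_eos = write_block(block_a, False) + write_block(block_b, False)

print(read_stream(with_eos))     # [b'a1', b'a2']  (block_b is lost after the first EOS)
print(read_stream(without_eos))  # [b'a1', b'a2', b'b1']
```

With EOS written per block, a reader of the merged bytes stops after the first block; without it, every batch from both blocks is recovered.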

(Fixes: #2051)

How was this patch tested?

Manually tested using our internal query.

github-actions · 2023-06-23T14:33:35Z

Run Gluten Clickhouse CI

github-actions · 2023-06-23T14:33:39Z

#2051

github-actions · 2023-06-23T14:37:34Z

Run Gluten Clickhouse CI

WangGuangxin · 2023-06-23T14:37:49Z

@zhztheplayer @zhouyuan Can you please help review this?

zhouyuan · 2023-06-25T03:19:20Z

@WangGuangxin good finding! Curious what the performance gain looks like? The marker is only 4 bytes, if I understand this correctly.

-yuan

WangGuangxin · 2023-06-25T04:58:16Z

spark.sql.adaptive.fetchShuffleBlocksInBatch

@zhouyuan Hi, the benefit does not come from the 4-byte saving, but from making the Spark AQE feature "batch fetch shuffle blocks" take effect.
For details on the feature, you can refer to its initial commit, apache/spark#26040.

In short, when Spark AQE coalesces partitions, it can fetch continuous blocks in batch instead of fetching blocks one by one. But this has some preconditions, one of which is that the shuffle serializer must support so-called relocation: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala#L48C48-L48C85.
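
As a rough sketch of that precondition check (in Python, not Spark's actual Scala code): only the relocation requirement is stated in this thread. The other conditions below (compression codec concatenation, old fetch protocol, IO encryption) are my recollection of the linked file and may differ between Spark versions, so treat them as assumptions. `ShuffleSettings` and `can_batch_fetch` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ShuffleSettings:
    """Hypothetical bundle of inputs; field names are illustrative, not Spark's."""
    batch_fetch_requested: bool          # spark.sql.adaptive.fetchShuffleBlocksInBatch
    serializer_relocatable: bool         # Serializer.supportsRelocationOfSerializedObjects
    shuffle_compressed: bool = True      # spark.shuffle.compress (assumed condition)
    codec_concatenable: bool = True      # codec can concatenate serialized streams (assumed)
    use_old_fetch_protocol: bool = False # spark.shuffle.useOldFetchProtocol (assumed)
    io_encryption: bool = False          # spark.io.encryption.enabled (assumed)


def can_batch_fetch(s: ShuffleSettings) -> bool:
    """Return True only if every precondition for batch fetching holds."""
    return (
        s.batch_fetch_requested
        and s.serializer_relocatable
        and (not s.shuffle_compressed or s.codec_concatenable)
        and not s.use_old_fetch_protocol
        and not s.io_encryption
    )


# Before this PR: the serializer reports no relocation support, so batch fetch is off.
print(can_batch_fetch(ShuffleSettings(batch_fetch_requested=True, serializer_relocatable=False)))
# After this PR: relocation is supported, so the other defaults allow batch fetch.
print(can_batch_fetch(ShuffleSettings(batch_fetch_requested=True, serializer_relocatable=True)))
```

The point is that a single `false` from the serializer disables the whole feature, regardless of the conf value.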

Currently, ColumnarBatchSerializer doesn't support relocation. In fact, we can make it support relocation with some simple modifications, namely by not writing EOS after each shuffle block.

zhouyuan · 2023-06-25T05:13:56Z

@WangGuangxin thanks for the detailed explanation, got the idea now.

zhouyuan

👍

CLTFOREVER · 2023-07-10T02:44:38Z

@WangGuangxin @zhouyuan we have met the same problem. If the PR is ready, can we merge it?

github-actions · 2023-07-10T13:01:29Z

Run Gluten Clickhouse CI

github-actions · 2023-07-11T02:32:40Z

Run Gluten Clickhouse CI

github-actions · 2023-07-11T02:32:43Z

Run Gluten Clickhouse CI

github-actions · 2023-07-11T11:55:26Z

Run Gluten Clickhouse CI

github-actions · 2023-07-21T01:54:59Z

Run Gluten Clickhouse CI

github-actions · 2023-07-21T02:11:45Z

Run Gluten Clickhouse CI

zhouyuan

👍 works for me

WangGuangxin force-pushed the support_relocation branch from acc088e to 6ef2b73 Compare June 23, 2023 14:37

zhouyuan previously approved these changes Jun 25, 2023

zhouyuan dismissed their stale review via 99ff551 July 10, 2023 13:01

zhouyuan force-pushed the support_relocation branch from 6ef2b73 to 99ff551 Compare July 10, 2023 13:01

WangGuangxin force-pushed the support_relocation branch from 6ececfe to fd0736b Compare July 11, 2023 11:55

Make ColumnarBatchSerializer supports relocation so that continuous shuffle block fetching can be enabled

4ebd167

zhouyuan force-pushed the support_relocation branch from fd0736b to 4ebd167 Compare July 21, 2023 01:54

fix for prefer cache code path in shuffle

fc2f101

Signed-off-by: Yuan Zhou <[email protected]>

zhouyuan approved these changes Jul 21, 2023

PHILO-HE approved these changes Jul 21, 2023

PHILO-HE merged commit 55103a3 into apache:main Jul 21, 2023
