perf: Add experimental feature to replace SortMergeJoin with ShuffledHashJoin #1007

andygrove · 2024-10-09T17:38:07Z

Which issue does this PR close?

Closes #1006

Rationale for this change

Improved performance

What changes are included in this PR?

Add a new config option to replace SortMergeJoin (SMJ) with ShuffledHashJoin (SHJ).

How are these changes tested?

I manually ran TPC-H and saw improved performance. I will post benchmarks once I have run more tests.

andygrove · 2024-10-09T18:04:35Z

spark/src/test/resources/tpcds-plan-stability/approved-plans-v1_4-spark3_5/q16/simplified.txt

+                                    CometFilter [ca_address_sk,ca_state]
+                                      CometScan parquet spark_catalog.default.customer_address [ca_address_sk,ca_state]
+                    InputAdapter
+                      BroadcastExchange #7


This is a regression that I am looking into (falling back to Spark for BroadcastHashJoin)

andygrove · 2024-10-09T18:37:32Z

Here is a teaser for the performance improvement. This is for TPC-H q11 (SF=100) with broadcast joins disabled (I am looking into a regression with those). I ran the query five times each with the rule enabled and disabled.

Rule Off

79.87537693977356,
77.76734256744385,
75.35734295845032,
75.44863200187683,
72.88174152374268

Rule On

39.33945274353027,
36.159271240234375,
35.83299708366394,
35.638232707977295,
35.67777371406555
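
For a quick read of the two runs above, the averages and the resulting speedup can be computed with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the timings are assumed to be wall-clock seconds.

```python
from statistics import mean

# Timings copied from the "Rule Off" and "Rule On" runs above (assumed to be seconds).
rule_off = [79.87537693977356, 77.76734256744385, 75.35734295845032,
            75.44863200187683, 72.88174152374268]
rule_on = [39.33945274353027, 36.159271240234375, 35.83299708366394,
           35.638232707977295, 35.67777371406555]

mean_off = mean(rule_off)
mean_on = mean(rule_on)
speedup = mean_off / mean_on  # a value above 1 means the rule made the query faster

print(f"rule off: {mean_off:.2f}  rule on: {mean_on:.2f}  speedup: {speedup:.2f}x")
```

On these five runs the rule roughly halves the query time, though a single query at one scale factor says little about other workloads.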

parthchandra · 2024-10-09T20:05:41Z

There is a small danger in enabling this without a good estimate of the size of the build side. ShuffledHashJoin has limits on how much data it can process efficiently. If the build-side hash table cannot spill, a large enough build side will cause OOMs; and if there is spilling, then SMJ can frequently lead to better performance. We might even see this when we scale the benchmark from SF1 to, say, SF10.
Is there a way for us to get the cardinality and row size of the build side?
Still worth adding this option, though.
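
One way to make this concern concrete is a size guard that picks ShuffledHashJoin only when the estimated per-partition build side fits in a memory budget. The sketch below is purely illustrative: `BuildSideEstimate`, `choose_join`, their parameters, and the 0.5 safety factor are hypothetical and are not part of Comet or Spark. It also assumes that a row count and average row size are available for the build side, which is exactly the open question here.

```python
from dataclasses import dataclass


@dataclass
class BuildSideEstimate:
    """Hypothetical statistics for the build side of a join."""
    row_count: int       # estimated cardinality
    avg_row_bytes: int   # estimated average row size in bytes


def choose_join(est: BuildSideEstimate, num_partitions: int,
                memory_per_task_bytes: int, safety_factor: float = 0.5) -> str:
    """Return "SHJ" if the per-partition hash table is expected to fit, else "SMJ".

    Assumes the shuffle spreads build rows evenly across partitions (no skew).
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    total_bytes = est.row_count * est.avg_row_bytes
    per_partition_bytes = total_bytes / num_partitions
    budget = memory_per_task_bytes * safety_factor
    return "SHJ" if per_partition_bytes <= budget else "SMJ"


small = BuildSideEstimate(row_count=1_000_000, avg_row_bytes=100)       # ~100 MB total
large = BuildSideEstimate(row_count=10_000_000_000, avg_row_bytes=100)  # ~1 TB total
print(choose_join(small, num_partitions=200, memory_per_task_bytes=1 << 30))
print(choose_join(large, num_partitions=200, memory_per_task_bytes=1 << 30))
```

A real cost model would also need to account for skew and for hash-table overhead beyond the raw row bytes.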

parthchandra · 2024-10-09T20:07:04Z

> if there is spilling, then SMJ can frequently lead to better performance

I have seen this happen with Spark with some TPC-DS queries at SF10.

viirya · 2024-10-09T21:11:10Z

common/src/main/scala/org/apache/comet/CometConf.scala

+    conf(s"$COMET_EXEC_CONFIG_PREFIX.replaceSortMergeJoin")
+      .doc("Whether to replace SortMergeJoin with ShuffledHashJoin for improved performance.")
+      .booleanConf
+      .createWithDefault(true)


I think the default value should be false for stability. Spark chooses SMJ for several reasons, including data statistics. If Spark thinks SHJ may not work, I think we should follow that decision unless the user explicitly asks otherwise.

The other accelerators (Spark RAPIDS and Gluten) default this to true. Perhaps we should benchmark at large scale factors before and see if we run into any issues?

I guess it is okay for benchmark datasets like TPC-DS or TPC-H. The cases I worry about are the production ones, although those might be mostly internal cases.

For OSS, maybe enabling it by default is okay.

At the least, we should expand the description here to mention the risk.

> Perhaps we should benchmark at large scale factors before and see if we run into any issues?

Agreed. (Also, when I wrote SF1 and SF10 I meant 1 TB and 10 TB, which are really SF1000 and SF10000.)

For this PR, I disabled the feature by default. I created the following PR to enable it by default and update the tests. I will add documentation as part of this PR.

#1008

Here is a new follow-on issue for enabling it by default:

#1011

andygrove · 2024-10-09T21:14:28Z

Current benchmarks:

Speedup of using HashJoin instead of SortMergeJoin:

codecov-commenter · 2024-10-09T23:14:12Z

Codecov Report

❌ Patch coverage is 19.44444% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.27%. Comparing base (e3ac6cf) to head (1073517).
⚠️ Report is 748 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ain/scala/org/apache/comet/rules/RewriteJoin.scala | 0.00% | 26 Missing ⚠️ |
| ...org/apache/comet/CometSparkSessionExtensions.scala | 40.00% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1007      +/-   ##
============================================
- Coverage     34.41%   34.27%   -0.14%     
+ Complexity      886      881       -5     
============================================
  Files           112      113       +1     
  Lines         43479    43514      +35     
  Branches       9656     9663       +7     
============================================
- Hits          14962    14914      -48     
- Misses        25442    25510      +68     
- Partials       3075     3090      +15

andygrove · 2024-10-10T15:37:54Z

I will add documentation to this PR today, explaining pros/cons of this feature in our tuning guide.

andygrove · 2024-10-10T16:18:57Z

kube/Dockerfile


 # note the use of a wildcard in the file name so that this works with both snapshot and final release versions
-COPY --from=builder  /comet/spark/target/comet-spark-spark${SPARK_VERSION}_$SCALA_VERSION-0.2.0*.jar $SPARK_HOME/jars
+COPY --from=builder  /comet/spark/target/comet-spark-spark${SPARK_VERSION}_$SCALA_VERSION-*.jar $SPARK_HOME/jars


Unrelated, but I ran into this hard-coded version number during testing.
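
The effect of loosening the wildcard can be illustrated with Python's `fnmatch`, whose `*` behaves the same way as the `*` in the `COPY` pattern for flat file names like these. The jar names below are made-up examples (the Spark, Scala, and Comet version numbers are assumptions for illustration), not actual build output.

```python
from fnmatch import fnmatchcase

old_pattern = "comet-spark-spark3.4_2.12-0.2.0*.jar"  # hard-coded Comet version
new_pattern = "comet-spark-spark3.4_2.12-*.jar"       # any Comet version

# Hypothetical jar names for a snapshot build and a final release.
jars = [
    "comet-spark-spark3.4_2.12-0.4.0-SNAPSHOT.jar",
    "comet-spark-spark3.4_2.12-0.4.0.jar",
]

for jar in jars:
    # Print whether each pattern matches the jar name.
    print(jar, fnmatchcase(jar, old_pattern), fnmatchcase(jar, new_pattern))
```

With the hard-coded `0.2.0` prefix, neither name matches once the project version moves on, while the looser pattern matches both.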

andygrove · 2024-10-10T16:20:12Z

@viirya @parthchandra This is now ready for review. The new option is disabled by default and I added a section to the tuning guide explaining why users may want to enable this new option.

parthchandra

lgtm.

andygrove · 2024-10-11T01:52:11Z

I have run into a deadlock when running TPC-DS benchmarks with this feature, so I am moving to draft while I investigate. It is possibly related to the memory pool issues that we are also working on in other PRs.

andygrove · 2024-10-15T14:52:44Z

After upmerging, I no longer see the deadlock, but instead get an error if I have insufficient memory allocated, which is an improvement.

org.apache.comet.CometNativeException (External error: Internal error: Partition is still not able to allocate enough memory for the array builders after spilling..

However, when I increase memory, I see queries fail due to #1019.

andygrove · 2024-10-18T20:56:13Z

I have now marked the feature as experimental and explained in the tuning guide that there is no spill to disk so this could result in OOM.
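
For readers who want to try the option, the sketch below assembles `spark-submit` `--conf` arguments. The key `spark.comet.exec.replaceSortMergeJoin` is the one added in this PR, and `spark.sql.autoBroadcastJoinThreshold=-1` is the standard Spark setting for disabling broadcast joins, matching the earlier q11 experiment. The helper name `to_conf_args` is hypothetical, and the other Comet settings a real job needs are omitted.

```python
def to_conf_args(conf: dict[str, str]) -> list[str]:
    """Turn a config mapping into a flat list of spark-submit arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = {
    # Experimental: no spill to disk, so a large build side can cause OOM.
    "spark.comet.exec.replaceSortMergeJoin": "true",
    # Optional: disable broadcast joins, as in the q11 experiment in this thread.
    "spark.sql.autoBroadcastJoinThreshold": "-1",
}

print(" ".join(to_conf_args(conf)))
```

Because the option defaults to false, omitting the first entry leaves Spark's own join selection untouched.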

andygrove · 2024-10-19T18:19:47Z

Fresh benchmarks after upmerging.

andygrove · 2024-10-20T00:03:00Z

TPC-DS excluding q97 (OOM with ShuffledHashJoin).

jaceklaskowski · 2024-10-20T19:41:40Z

common/src/main/scala/org/apache/comet/CometConf.scala

+  val COMET_REPLACE_SMJ: ConfigEntry[Boolean] =
+    conf(s"$COMET_EXEC_CONFIG_PREFIX.replaceSortMergeJoin")
+      .doc("Experimental feature to force Spark to replace SortMergeJoin with ShuffledHashJoin " +
+        "for improved performance. See tuning guide for more information regarding stability of " +


Can we add a link to the tuning guide?

Good point. Updated. Thanks @jaceklaskowski

jaceklaskowski · 2024-10-20T19:42:42Z

docs/source/user-guide/configs.md

 | spark.comet.exec.localLimit.enabled | Whether to enable localLimit by default. | true |
 | spark.comet.exec.memoryFraction | The fraction of memory from Comet memory overhead that the native memory manager can use for execution. The purpose of this config is to set aside memory for untracked data structures, as well as imprecise size estimation during memory acquisition. Default value is 0.7. | 0.7 |
 | spark.comet.exec.project.enabled | Whether to enable project by default. | true |
+| spark.comet.exec.replaceSortMergeJoin | Experimental feature to force Spark to replace SortMergeJoin with ShuffledHashJoin for improved performance. See tuning guide for more information regarding stability of this feature. | false |


Can we add a link to the tuning guide? 🙏

## Which issue does this PR close?

Closes #.

## Rationale for this change

## What changes are included in this PR?

```
cb3e977 perf: Add experimental feature to replace SortMergeJoin with ShuffledHashJoin (apache#1007)
3df9d5c fix: Make comet-git-info.properties optional (apache#1027)
4033687 chore: Reserve memory for native shuffle writer per partition (apache#1022)
bd541d6 (public/main) remove hard-coded version number from Dockerfile (apache#1025)
e3ac6cf feat: Implement bloom_filter_agg (apache#987)
8d097d5 (origin/main) chore: Revert "chore: Reserve memory for native shuffle writer per partition (apache#988)" (apache#1020)
591f45a chore: Bump arrow-rs to 53.1.0 and datafusion (apache#1001)
e146cfa chore: Reserve memory for native shuffle writer per partition (apache#988)
abd9f85 fix: Fallback to Spark if named_struct contains duplicate field names (apache#1016)
22613e9 remove legacy comet-spark-shell (apache#1013)
d40c802 clarify that Maven central only has jars for Linux (apache#1009)
837c256 docs: Various documentation improvements (apache#1005)
0667c60 chore: Make parquet reader options Comet options instead of Hadoop options (apache#968)
0028f1e fix: Fallback to Spark if scan has meta columns (apache#997)
b131cc3 feat: Support `GetArrayStructFields` expression (apache#993)
3413397 docs: Update tuning guide (apache#995)
afd28b9 Quality of life fixes for easier hacking (apache#982)
18150fb chore: Don't transform the HashAggregate to CometHashAggregate if Comet shuffle is disabled (apache#991)
a1599e2 chore: Update for 0.3.0 release, prepare for 0.4.0 development (apache#970)
```

## How are these changes tested?

andygrove added 7 commits October 9, 2024 09:34

experiment

a1d04f5

fix and add credit

598735e

disable by default and make internal

d55f2ea

remove sort

e3313bd

minor optimization

a0d1381

minor optimization

948f2c0

remove unused import

fd87412

andygrove marked this pull request as draft October 9, 2024 18:01

andygrove commented Oct 9, 2024

viirya reviewed Oct 9, 2024

andygrove force-pushed the replace-smj branch from c551f6c to fd87412 Compare October 9, 2024 22:15

disable feature by default

99eca10

andygrove mentioned this pull request Oct 9, 2024

perf: Enable replaceSortMergeJoin by default #1008

Closed

fix dockerfile

1d5b58d

andygrove marked this pull request as ready for review October 10, 2024 03:06

andygrove added 2 commits October 10, 2024 10:17

Add section to tuning guide

1a5de4e

update benchmarking guide

7cce6a5

andygrove commented Oct 10, 2024

parthchandra approved these changes Oct 10, 2024

andygrove mentioned this pull request Oct 10, 2024

[Research] Use custom cost model when deciding between SMJ and SHJ #1011

Open

andygrove marked this pull request as draft October 11, 2024 01:50

Merge remote-tracking branch 'apache/main' into replace-smj

8f5d440

andygrove added 2 commits October 15, 2024 09:04

Revert "chore: Reserve memory for native shuffle writer per partition (apache#988)"

7ce8726

This reverts commit e146cfa.

mark feature as experimental and explain risks

26f9a4f

andygrove marked this pull request as ready for review October 18, 2024 20:55

upmerge

27a02af

workaround for TPC-DS q14 hanging on a RightSemi join

63ce71c

andygrove added 3 commits October 20, 2024 10:02

revert a change

60d1028

remove debug logging:

662d0de

format

6ed01c1

jaceklaskowski reviewed Oct 20, 2024

add link to tuning guide

1073517

andygrove requested review from huaxingao, kazuyukitanimura and viirya October 21, 2024 13:58

andygrove changed the title ~~perf: Add option to replace SortMergeJoin with ShuffledHashJoin~~ perf: Add experimental feature to replace SortMergeJoin with ShuffledHashJoin Oct 21, 2024

kazuyukitanimura approved these changes Oct 21, 2024

viirya approved these changes Oct 21, 2024

andygrove merged commit cb3e977 into apache:main Oct 21, 2024

andygrove deleted the replace-smj branch October 21, 2024 19:43
