[SPARK-25850][SQL] Make the split threshold for the code generated function configurable #22847
Conversation
…thod configurable
@cloud-fan @dongjoon-hyun @gengliangwang Kindly help review.
.internal()
.doc("The maximum source code length of a single Java function by codegen. When the " +
  "generated Java function source code exceeds this threshold, it will be split into " +
  "multiple small functions, each function length is spark.sql.codegen.methodSplitThreshold." +
The phrase "each function length is spark.sql.codegen.methodSplitThreshold" is not true: the method size is always larger than the threshold. cc @kiszk any idea about the naming and description of this config?
IMHO, spark.sql.codegen.methodSplitThreshold can be used, but the description should be changed. For example,
The threshold of source code length without comment of a single Java function by codegen to be split. When the generated Java function source code exceeds this threshold, it will be split into multiple small functions. ...
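For concreteness, here is a rough sketch (mine, not from the thread) of how that revised wording would read in the SQLConf entry; the builder calls match those visible in the diff, and the 1024 default mirrors the previously hardcoded value:

```scala
// Sketch only: config entry in SQLConf.scala with the suggested description.
val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
  .internal()
  .doc("The threshold of source code length without comment of a single Java function " +
    "by codegen to be split. When the generated Java function source code exceeds this " +
    "threshold, it will be split into multiple small functions.")
  .intConf
  .createWithDefault(1024)
```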
Test build #98080 has finished for PR 22847 at commit
| "generated Java function source code exceeds this threshold, it will be split into " + | ||
| "multiple small functions, each function length is spark.sql.codegen.methodSplitThreshold." + | ||
| " A function's bytecode should not go beyond 8KB, otherwise it will not be JITted, should " + | ||
| "also not be too small, or we will have many function calls. We can't know how many " + |
it will not be JITted; it also should not be too small, otherwise there will be many function calls.
Test build #98129 has finished for PR 22847 at commit
Test build #98130 has finished for PR 22847 at commit
Test build #98132 has finished for PR 22847 at commit
| "be generated, so use the code length as metric. A function's bytecode should not go " + | ||
| "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " + | ||
| "there will be many function calls.") | ||
| .intConf |
According to the description, it seems we had better have a checkValue here. Could you recommend reasonable min/max values, @kiszk?
To be more accurate, I think I should add "When running on HotSpot, a function's bytecode should not go beyond 8KB...".
Yep. That could be max.
1000 is conservative. But there is no single recommendation, since the bytecode size depends on the content (e.g. pushing the constant 0 takes 1 byte of bytecode, while pushing 9 takes 2).
@kiszk agree, 1000 might not be the best; see my benchmark on the wide table, where 2048 is better.
================================================================================================
projection on wide table
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
projection on wide table: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
split threshold 10 8464 / 8737 0.1 8072.0 1.0X
split threshold 100 5959 / 6251 0.2 5683.4 1.4X
split threshold 1024 3202 / 3248 0.3 3053.2 2.6X
split threshold 2048 3009 / 3097 0.3 2869.2 2.8X
split threshold 4096 3414 / 3458 0.3 3256.1 2.5X
split threshold 8196 4095 / 4112 0.3 3905.5 2.1X
split threshold 65536 28800 / 29705 0.0 27465.8 0.3X
Could you try some very long alias names or complex expressions? You will get different numbers, right?
Seems like long alias names have no influence.
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
[info] Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
[info] projection on wide table: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------
[info] split threshold 10 6512 / 6736 0.2 6210.4 1.0X
[info] split threshold 100 5730 / 6329 0.2 5464.9 1.1X
[info] split threshold 1024 3119 / 3184 0.3 2974.6 2.1X
[info] split threshold 2048 2981 / 3100 0.4 2842.9 2.2X
[info] split threshold 4096 3289 / 3379 0.3 3136.6 2.0X
[info] split threshold 8196 4307 / 4338 0.2 4108.0 1.5X
[info] split threshold 65536 29147 / 30212 0.0 27797.0 0.2X
No averylongprefixrepeatedmultipletimes appears in the expression codegen:
/* 047 */ private void createExternalRow_0_8(InternalRow i, Object[] values_0) {
/* 048 */
/* 049 */ // input[80, bigint, false]
/* 050 */ long value_81 = i.getLong(80);
/* 051 */ if (false) {
/* 052 */ values_0[80] = null;
/* 053 */ } else {
/* 054 */ values_0[80] = value_81;
/* 055 */ }
/* 056 */
/* 057 */ // input[81, bigint, false]
/* 058 */ long value_82 = i.getLong(81);
/* 059 */ if (false) {
/* 060 */ values_0[81] = null;
/* 061 */ } else {
/* 062 */ values_0[81] = value_82;
/* 063 */ }
/* 064 */
/* 065 */ // input[82, bigint, false]
/* 066 */ long value_83 = i.getLong(82);
/* 067 */ if (false) {
/* 068 */ values_0[82] = null;
/* 069 */ } else {
/* 070 */ values_0[82] = value_83;
/* 071 */ }
/* 072 */
...
My benchmark:
object WideTableBenchmark extends SqlBasedBenchmark {
override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
runBenchmark("projection on wide table") {
val N = 1 << 20
val df = spark.range(N)
val columns = (0 until 400).map{ i => s"id as averylongprefixrepeatedmultipletimes_id$i"}
val benchmark = new Benchmark("projection on wide table", N, output = output)
Seq("10", "100", "1024", "2048", "4096", "8196", "65536").foreach { n =>
benchmark.addCase(s"split threshold $n", numIters = 5) { iter =>
withSQLConf("spark.testing.codegen.splitThreshold" -> n) {
df.selectExpr(columns: _*).foreach(identity(_))
}
}
}
benchmark.run()
}
}
}
Will keep benchmarking with complex expressions.
The "freshNamePrefix" prefix is only applied in whole-stage codegen,
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L87
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L169
It doesn't take any effect in non-whole-stage codegen.
If you intend to stress test expression codegen but don't see the prefix being prepended, you're probably not adding it in the right place. Where did you add it?
Oh I see, you're using the column name...that's not the right place to put the "prefix". Column names are almost never carried over to the generated code in the current framework (the only exception is the lambda variable name).
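One way to double-check this claim (my suggestion, not part of the thread): dump the generated code with Spark's debug helpers and grep for the alias; it should not appear.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._  // provides debugCodegen() on Dataset

val spark = SparkSession.builder().master("local[*]").appName("codegen-peek").getOrCreate()

// The alias below does not survive into the generated Java source; variable
// names there come from CodegenContext.freshName, not from column names.
val df = spark.range(10).selectExpr("id as averylongprefixrepeatedmultipletimes_id0")
df.debugCodegen()  // prints the generated code of each whole-stage-codegen subtree
```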
@rednaxelafx the wide table benchmark I used has 400 columns, so whole-stage codegen is disabled by default (the column count exceeds spark.sql.codegen.maxFields).
.internal()
.doc("The threshold of source code length without comment of a single Java function by " +
  "codegen to be split. When the generated Java function source code exceeds this threshold" +
  ", it will be split into multiple small functions. We can't know how many bytecode will " +
Not a big deal, but I would avoid abbreviations in documentation: can't -> cannot.
Test build #98182 has finished for PR 22847 at commit
retest this please
Test build #98190 has finished for PR 22847 at commit
| "bytecode should not go beyond 8KB, otherwise it will not be JITted; it also should not " + | ||
| "be too small, otherwise there will be many function calls.") | ||
| .intConf | ||
| .createWithDefault(1024) |
let's add a checkValue to make sure the value is positive. We can figure out a lower and upper bound later.
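A minimal sketch of that check (the error message is just an example); lower/upper bounds could be tightened in a follow-up:

```scala
// Sketch only: the same entry with a positivity check via checkValue.
val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
  .internal()
  .doc("...")  // description as in the diff above
  .intConf
  .checkValue(threshold => threshold > 0, "The threshold must be a positive integer.")
  .createWithDefault(1024)
```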
@cloud-fan @dongjoon-hyun @kiszk I just added a check that the value is positive; maybe we need another PR to figure out a better value later, if it is not easy to decide now.
dongjoon-hyun left a comment
Thanks! For me, it looks enough to proceed for now.
Test build #98288 has finished for PR 22847 at commit
.internal()
.doc("The threshold of source code length without comment of a single Java function by " +
  "codegen to be split. When the generated Java function source code exceeds this threshold" +
  ", it will be split into multiple small functions. We cannot know how many bytecode will " +
The threshold of source-code splitting in the codegen. When the number of characters in a single JAVA function (without comment) exceeds the threshold, the function will be automatically split to multiple smaller ones.
val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
  .internal()
  .doc("The threshold of source-code splitting in the codegen. When the number of characters " +
    "in a single JAVA function (without comment) exceeds the threshold, the function will be " +
nit: JAVA -> Java
Test build #98295 has finished for PR 22847 at commit
retest this please
Test build #98298 has finished for PR 22847 at commit
Test build #98301 has finished for PR 22847 at commit
Using source code length will have to be a very coarse-grained, "fuzzy" heuristic. It's not meant to be accurate, so just pick some number that makes sense. Tidbit: before this PR, since the …

Side notes for those curious: performance-wise, it might be very interesting to note that, running on HotSpot, sometimes it is better to split into more but smaller methods, and sometimes it is better to split into fewer but larger methods. The theory is this: if a method is really hot, we'd like it to eventually be inlined by the JIT to reduce method invocation overhead and get better optimization. In HotSpot, hot methods will eventually be compiled by C2, which uses multiple bytecode-size-based thresholds to determine whether a callee is fit for inlining (the values are the defaults for C2 on x86). Don't be confused by the name, though. So the deal is: on HotSpot, if you want to save bytecode size by …

So, if a method is compute-intensive and is small enough, it is better to strive to keep the split-off methods within the thresholds mentioned above.
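For readers who want the concrete numbers on their own JVM (my addition, not part of the review): the C2 inlining thresholds referred to here are ordinary HotSpot product flags, e.g. MaxInlineSize (35 bytes of bytecode by default) and FreqInlineSize (325 on x86), and methods whose bytecode exceeds roughly 8000 bytes are not JIT-compiled at all. They can be inspected at runtime:

```scala
import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean

// Print the inlining-related HotSpot flags of the running JVM.
object InlineThresholds {
  def main(args: Array[String]): Unit = {
    val bean = ManagementFactory.getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])
    Seq("MaxInlineSize", "FreqInlineSize", "DontCompileHugeMethods").foreach { flag =>
      println(s"$flag = ${bean.getVMOption(flag).getValue}")
    }
  }
}
```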
Can / should we apply the same threshold conf to … ?
@rednaxelafx ah good point! It's hardcoded as 1024 too, and it's also doing method splitting. Let's apply the config there too.
Just in case people wonder, the following is the hack patch that I used for stress testing code splitting before this PR:

--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -647,11 +647,13 @@ class CodegenContext(val useStreamlining: Boolean) {
* Returns a term name that is unique within this instance of a `CodegenContext`.
*/
def freshName(name: String): String = synchronized {
- val fullName = if (freshNamePrefix == "") {
+ // hack: intentionally add a very long prefix (length=300 characters) to
+ // trigger code splitting more frequently
+ val fullName = ("averylongprefix" * 20) + (if (freshNamePrefix == "") {
name
} else {
s"${freshNamePrefix}_$name"
- }
+ })
if (freshNameIds.contains(fullName)) {
val id = freshNameIds(fullName)
freshNameIds(fullName) = id + 1

Of course, now with this PR, we can simply set the split threshold to a very low value (e.g. …).
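For example (my sketch, assuming it runs inside a suite or benchmark that already provides spark and the withSQLConf helper, like the WideTableBenchmark above):

```scala
// Force very aggressive splitting through the new config instead of patching freshName.
// The threshold value and the 400-column projection are illustrative only.
val columns = (0 until 400).map(i => s"id as c$i")
withSQLConf("spark.sql.codegen.methodSplitThreshold" -> "10") {
  spark.range(1 << 20).selectExpr(columns: _*).foreach(identity(_))
}
```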
I used the WideTableBenchmark to test this configuration.

Scenario: …

Perf results: …

Complete benchmark source: WideTableBenchmark.scala

And I have another finding: … Not sure if it is a known issue?
@cloud-fan @gatorsmile How about merging this PR first? Then we can discuss those performance issues in another PR.
did you address #22847 (comment)?
@cloud-fan @rednaxelafx I missed that! Please help review.
Test build #98384 has finished for PR 22847 at commit
retest this please
Test build #98386 has finished for PR 22847 at commit
thanks, merging to master!
…nction configurable

## What changes were proposed in this pull request?

As per the discussion in [apache#22823](https://github.com/apache/spark/pull/22823/files#r228400706), add a new configuration to make the split threshold for the code generated function configurable. When the generated Java function source code exceeds `spark.sql.codegen.methodSplitThreshold`, it will be split into multiple small functions.

## How was this patch tested?

manual tests

Closes apache#22847 from yucai/splitThreshold.

Authored-by: yucai <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
As per the discussion in #22823, add a new configuration to make the split threshold for the code generated function configurable.
When the generated Java function source code exceeds spark.sql.codegen.methodSplitThreshold, it will be split into multiple small functions.

How was this patch tested?
manual tests
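For completeness, a minimal usage sketch of the new knob (my example; the 2048 value is only the sweet spot from the wide-table benchmark above, not an official recommendation):

```scala
import org.apache.spark.sql.SparkSession

object MethodSplitThresholdDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("method-split-threshold-demo")
      .master("local[*]")
      .getOrCreate()

    // Internal config introduced by this PR; the default is 1024 characters of source.
    spark.conf.set("spark.sql.codegen.methodSplitThreshold", "2048")

    // A wide projection similar to the benchmark; the generated code is split
    // once a single function's source exceeds the configured threshold.
    val columns = (0 until 400).map(i => s"id as c$i")
    spark.range(1 << 16).selectExpr(columns: _*).foreach(identity(_))

    spark.stop()
  }
}
```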