-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-21603][SQL]The wholestage codegen will be much slower then that is closed when the function is too long #18810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 11 commits
ca9eff6
1b0ac5e
5582f00
7c185c6
7e84753
c4235dc
52da6b2
d0c753a
d3238e9
d44a2f8
ce544a5
08f5ddf
4fbe5f8
5c180ac
b83cd1c
6814047
8b32b54
32813b0
b879dbf
44ce894
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -355,6 +355,20 @@ class CodegenContext { | |
| */ | ||
| private val placeHolderToComments = new mutable.HashMap[String, String] | ||
|
|
||
| /** | ||
| * It will count the lines of every Java function generated by whole-stage codegen, | ||
| * if there is a function of length greater than spark.sql.codegen.maxLinesPerFunction, | ||
| * it will return true. | ||
| */ | ||
| def isTooLongGeneratedFunction: Boolean = { | ||
| classFunctions.values.exists { _.values.exists { | ||
| code => | ||
| val codeWithoutComments = CodeFormatter.stripExtraNewLinesAndComments(code) | ||
| codeWithoutComments.count(_ == '\n') > SQLConf.get.maxLinesPerFunction | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /** | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add one more empty line
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, added, thanks |
||
| * Returns a term name that is unique within this instance of a `CodegenContext`. | ||
| */ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -572,6 +572,14 @@ object SQLConf { | |
| "disable logging or -1 to apply no limit.") | ||
| .createWithDefault(1000) | ||
|
|
||
| val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction") | ||
| .internal() | ||
| .doc("The maximum lines of a single Java function generated by whole-stage codegen. " + | ||
| "When the generated function exceeds this threshold, " + | ||
| "the whole-stage codegen is deactivated for this subtree of the current query plan.") | ||
| .intConf | ||
| .createWithDefault(1500) | ||
|
||
|
|
||
| val FILES_MAX_PARTITION_BYTES = buildConf("spark.sql.files.maxPartitionBytes") | ||
| .doc("The maximum number of bytes to pack into a single partition when reading files.") | ||
| .longConf | ||
|
|
@@ -1014,6 +1022,8 @@ class SQLConf extends Serializable with Logging { | |
|
|
||
| def loggingMaxLinesForCodegen: Int = getConf(CODEGEN_LOGGING_MAX_LINES) | ||
|
|
||
| def maxLinesPerFunction: Int = getConf(WHOLESTAGE_MAX_LINES_PER_FUNCTION) | ||
|
|
||
| def tableRelationCacheSize: Int = | ||
| getConf(StaticSQLConf.FILESOURCE_TABLE_RELATION_CACHE_SIZE) | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co | |
|
|
||
| override def doExecute(): RDD[InternalRow] = { | ||
| val (ctx, cleanedSource) = doCodeGen() | ||
| if (ctx.isTooLongGeneratedFunction) { | ||
| logWarning("Found too long generated codes and JIT optimization might not work, " + | ||
| "Whole-stage codegen disabled for this plan, " + | ||
| "You can change the config spark.sql.codegen.MaxFunctionLength " + | ||
| "to adjust the function length limit:\n " | ||
| + s"$treeString") | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This could be very big. Please follow what did in #18658
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gatorsmile , thank you for review, the treeString not contains the code, it only contains the tree string of the Physical plan like below: |
||
| return child.execute() | ||
| } | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need to add a test in which we create a query with long generated function, and check if whole-stage codegen is disabled for it.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it can be tested by " max function length of wholestagecodegen" added in AggregateBenchmark.scala, thanks.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. AggregateBenchmark is more like a benchmark than a test. It won't run every time. We need a test to prevent regression brought by future change.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @viirya, it is hard to check if whole-stage codegen is disabled or not for me, would you like to give me some suggestion, thanks.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can check if there is a
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok. I'll take a look later. Thanks.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are multiple ways to verify it. One of the solution is using SQL metrics.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am fine about your proposed way, but needs to simplify it.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gatorsmile Do you mean
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes |
||
| // try to compile and fallback if it failed | ||
| try { | ||
| CodeGenerator.compile(cleanedSource) | ||
|
|
||


There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: align
// strip //commentwith above// strip /*comment*/.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok,modified, thanks