[SPARK-21717][SQL] Decouple consume functions of physical operators in whole-stage codegen #18931
Changes from 24 commits
05274e7
e0e7a6e
413707d
0bb8c0e
6d600d5
502139a
5fe3762
4bef567
1694c9b
8f3b984
c04da15
9540195
1101b2c
ff77bfe
e36ec3c
edb73d6
601c225
476994f
bdc1146
58eaf00
2f2d1fd
9f0d1da
79d0106
6384aec
0c4173e
c859d53
11946e7
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -661,6 +661,14 @@ object SQLConf { | |
| .intConf | ||
| .createWithDefault(CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT) | ||
|
|
||
| val DECOUPLE_OPERATOR_CONSUME_FUNCTIONS = buildConf("spark.sql.codegen.decoupleOperatorConsume") | ||
| .internal() | ||
| .doc("When true, whole stage codegen puts the logic of consuming rows of each physical " + | ||
| "operator into individual methods, instead of a single big method. This can be used to " + | ||
| "avoid oversized functions that miss the opportunity of JIT optimization.") | ||
| .booleanConf | ||
| .createWithDefault(true) | ||
|
|
||
| val FILES_MAX_PARTITION_BYTES = buildConf("spark.sql.files.maxPartitionBytes") | ||
| .doc("The maximum number of bytes to pack into a single partition when reading files.") | ||
| .longConf | ||
|
|
@@ -1263,6 +1271,8 @@ class SQLConf extends Serializable with Logging { | |
|
|
||
| def hugeMethodLimit: Int = getConf(WHOLESTAGE_HUGE_METHOD_LIMIT) | ||
|
|
||
| def decoupleOperatorConsumeFuncs: Boolean = getConf(DECOUPLE_OPERATOR_CONSUME_FUNCTIONS) | ||
|
||
|
|
||
| def tableRelationCacheSize: Int = | ||
| getConf(StaticSQLConf.FILESOURCE_TABLE_RELATION_CACHE_SIZE) | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -156,13 +156,94 @@ trait CodegenSupport extends SparkPlan { | |
| ctx.INPUT_ROW = null | ||
| ctx.freshNamePrefix = parent.variablePrefix | ||
| val evaluated = evaluateRequiredVariables(output, inputVars, parent.usedInputs) | ||
|
|
||
| // Under certain conditions, we can put the logic to consume the rows of this operator into | ||
|
Member
Could you elaborate?
Member
Author
Added more comments to elaborate on the idea.
||
| // another function, so that the generated function does not grow too long to be optimized by JIT. | ||
| // The conditions: | ||
| // 1. The config "SQLConf.DECOUPLE_OPERATOR_CONSUME_FUNCTIONS" is enabled. | ||
| // 2. The parent uses all variables in output. We can't defer variable evaluation when | ||
| // consuming in another function. | ||
| // 3. The output variables are not empty. If it's empty, we don't bother to do that. | ||
|
||
| // 4. We don't use a row variable. Constructing the row relies on deferred variable | ||

||
| // evaluation, which we can't do here. | ||
| // 5. The number of output variables must be less than the maximum number of parameters in a | ||
| // Java method declaration. | ||
|
Contributor
My only concern is if we have a bunch of simple operators and we create a lot of small methods here. Maybe it's fine as the optimizer would prevent such cases.
Contributor
Maybe we can be super safe and only do this for certain operators, like
Contributor
Or introduce a config so that users can turn it off.
Member
Author
Added a config for it so we can turn it off.
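The five conditions in the code comment above can be sketched as a standalone predicate. This is only an illustration: `Col`, `paramSlots`, and `shouldSplit` are hypothetical names that do not exist in Spark, and the slot counting is an assumption about what `ctx.isValidParamLength` checks. It relies on the JVM rule that a method may declare at most 255 parameter slots, where `long` and `double` take two slots each and `this` takes one.

```scala
// Hypothetical sketch, not Spark's implementation.
object ConsumeSplitSketch {
  // Stand-in for an output attribute: its Java type and nullability.
  final case class Col(javaType: String, nullable: Boolean)

  // JVM limit on parameter slots of a single method.
  val MaxJvmParamSlots: Int = 255

  // Count the slots a consume function would need for these columns.
  // Assumption: nullable columns pass an extra boolean `isNull` argument.
  def paramSlots(cols: Seq[Col]): Int = {
    val perColumn = cols.map { c =>
      val valueSlots = if (c.javaType == "long" || c.javaType == "double") 2 else 1
      val isNullSlot = if (c.nullable) 1 else 0
      valueSlots + isNullSlot
    }
    1 + perColumn.sum // one slot for `this` on an instance method
  }

  // Mirrors conditions 1-5 from the comment in `consume`.
  def shouldSplit(
      configEnabled: Boolean,   // 1. the config is enabled
      parentUsesAll: Boolean,   // 2. the parent uses all output variables
      output: Seq[Col],         // 3. output is not empty
      usesRow: Boolean          // 4. no row variable is used
  ): Boolean =
    configEnabled && parentUsesAll && output.nonEmpty && !usesRow &&
      paramSlots(output) <= MaxJvmParamSlots // 5. fits in a Java method declaration
}
```

With this sketch, an operator with a few output columns qualifies, while one with no output columns, or with more columns than fit in 255 slots, falls back to inlining `parent.doConsume`.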
||
| val requireAllOutput = output.forall(parent.usedInputs.contains(_)) | ||
| val consumeFunc = | ||
| if (SQLConf.get.decoupleOperatorConsumeFuncs && row == null && outputVars.nonEmpty && | ||
| requireAllOutput && ctx.isValidParamLength(output)) { | ||
| constructDoConsumeFunction(ctx, inputVars) | ||
|
||
| } else { | ||
| parent.doConsume(ctx, inputVars, rowVar) | ||
| } | ||
| s""" | ||
| |${ctx.registerComment(s"CONSUME: ${parent.simpleString}")} | ||
| |$evaluated | ||
| - |${parent.doConsume(ctx, inputVars, rowVar)} | ||
| + |$consumeFunc | ||
| """.stripMargin | ||
| } | ||
|
|
||
| /** | ||
| * To prevent the concatenated function from growing too long to be optimized by JIT, we can | ||
| * separate the parent's `doConsume` code of a `CodegenSupport` operator into a function to call. | ||
| */ | ||
| private def constructDoConsumeFunction( | ||
| ctx: CodegenContext, | ||
| inputVars: Seq[ExprCode]): String = { | ||
| val (callingParams, arguList, inputVarsInFunc) = | ||
|
||
| constructConsumeParameters(ctx, output, inputVars) | ||
|
|
||
| // Set up rowVar because parent plan can possibly consume UnsafeRow instead of variables. | ||
| val colExprs = output.zipWithIndex.map { case (attr, i) => | ||
| BoundReference(i, attr.dataType, attr.nullable) | ||
| } | ||
| // Don't need to copy the variables because they're already evaluated before entering function. | ||
| ctx.INPUT_ROW = null | ||
| ctx.currentVars = inputVarsInFunc | ||
| val ev = GenerateUnsafeProjection.createCode(ctx, colExprs, false) | ||
| val rowVar = ExprCode(ev.code.trim, "false", ev.value) | ||
|
|
||
| val doConsume = ctx.freshName("doConsume") | ||
|
Contributor
Shall we put the operator name in this function name?
Member
Author
The
||
| ctx.currentVars = inputVarsInFunc | ||
| ctx.INPUT_ROW = null | ||
| val doConsumeFuncName = ctx.addNewFunction(doConsume, | ||
| s""" | ||
| | private void $doConsume($arguList) throws java.io.IOException { | ||
| | ${parent.doConsume(ctx, inputVarsInFunc, rowVar)} | ||
| | } | ||
| """.stripMargin) | ||
|
|
||
| s""" | ||
| | $doConsumeFuncName($callingParams); | ||
| """.stripMargin | ||
| } | ||
|
|
||
| /** | ||
| * Returns the source code of the arguments for calling the consume function, the parameter | ||
| * list of the consume function, and the `ExprCode`s for the parameters. | ||
| */ | ||
| private def constructConsumeParameters( | ||
| ctx: CodegenContext, | ||
| attributes: Seq[Attribute], | ||
| variables: Seq[ExprCode]): (String, String, Seq[ExprCode]) = { | ||
| val params = variables.zipWithIndex.map { case (ev, i) => | ||
| val arguName = ctx.freshName(s"expr_$i") | ||
| val arguType = ctx.javaType(attributes(i).dataType) | ||
|
|
||
| val (callingParam, funcParams, arguIsNull) = if (!attributes(i).nullable) { | ||
| // When the argument is not nullable, we don't need to pass in `isNull` param for it and | ||
| // simply give a `false`. | ||
| val arguIsNull = "false" | ||
| (ev.value, s"$arguType $arguName", arguIsNull) | ||
| } else { | ||
| val arguIsNull = ctx.freshName(s"exprIsNull_$i") | ||
| (ev.value + ", " + ev.isNull, s"$arguType $arguName, boolean $arguIsNull", arguIsNull) | ||
| } | ||
| (callingParam, funcParams, ExprCode("", arguIsNull, arguName)) | ||
| }.unzip3 | ||
| (params._1.mkString(", "), params._2.mkString(", "), params._3) | ||
| } | ||
|
|
||
| /** | ||
| * Returns source code to evaluate all the variables, and clear the code of them, to prevent | ||
| * them to be evaluated twice. | ||
|
|
||
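To make the output of `constructConsumeParameters` concrete, here is a simplified, self-contained sketch of the same idea. `Var` and `build` are hypothetical names for illustration only. Unlike the real code, the sketch uses fixed names instead of `ctx.freshName` and takes the Java type as a plain string instead of calling `ctx.javaType`.

```scala
// Hypothetical sketch, not Spark's implementation.
object ConsumeParamsSketch {
  // Stand-in for an evaluated variable: the Java expressions holding its value
  // and null flag, plus the column's Java type and nullability.
  final case class Var(value: String, isNull: String, javaType: String, nullable: Boolean)

  // Returns (arguments at the call site, parameter list of the consume function).
  def build(vars: Seq[Var]): (String, String) = {
    val parts = vars.zipWithIndex.map { case (v, i) =>
      val name = s"expr_$i"
      if (!v.nullable) {
        // Non-nullable: pass only the value; the callee treats isNull as `false`.
        (v.value, s"${v.javaType} $name")
      } else {
        // Nullable: pass the value together with its isNull flag.
        (s"${v.value}, ${v.isNull}", s"${v.javaType} $name, boolean exprIsNull_$i")
      }
    }
    (parts.map(_._1).mkString(", "), parts.map(_._2).mkString(", "))
  }
}
```

For a nullable `long` held in `agg_value`/`agg_isNull` followed by a non-nullable `int` held in `key`, `build` returns the call arguments `agg_value, agg_isNull, key` and the parameter list `long expr_0, boolean exprIsNull_0, int expr_1`. Skipping the `isNull` parameter for non-nullable columns keeps the parameter count lower, which matters for the limit on Java method parameters.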
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -205,7 +205,7 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext { | |
| val codeWithShortFunctions = genGroupByCode(3) | ||
| val (_, maxCodeSize1) = CodeGenerator.compile(codeWithShortFunctions) | ||
| assert(maxCodeSize1 < SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.defaultValue.get) | ||
| - val codeWithLongFunctions = genGroupByCode(20) | ||
| + val codeWithLongFunctions = genGroupByCode(50) | ||
|
Member
Author
We reduced the length of the generated code, so to make this test work we increased the number of expressions.
Member
In my PR, I changed the code to just check if long functions have the larger value of max code size:
||
| val (_, maxCodeSize2) = CodeGenerator.compile(codeWithLongFunctions) | ||
| assert(maxCodeSize2 > SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.defaultValue.get) | ||
| } | ||
|
|
||
`decoupleOperatorConsume` looks weird, how about `splitConsumeFuncByOperator`?

`DECOUPLE_OPERATOR_CONSUME_FUNCTIONS` -> `WHOLESTAGE_SPLIT_CONSUME_FUNC_BY_OPERATOR`

Looks good to me.