Skip to content
Closed
Changes from 2 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
05274e7
Decouple consume functions of physical operators in whole-stage codegen.
viirya Aug 13, 2017
e0e7a6e
shouldStop is called outside consume().
viirya Aug 13, 2017
413707d
Fix the condition and the case of using continue in consume.
viirya Aug 13, 2017
0bb8c0e
More comment.
viirya Aug 13, 2017
6d600d5
Fix aggregation.
viirya Aug 13, 2017
502139a
Also deal with sort case.
viirya Aug 13, 2017
5fe3762
Fix broadcasthash join.
viirya Aug 14, 2017
4bef567
Add more comments.
viirya Aug 14, 2017
1694c9b
Fix the cases where operators set up its produce framework.
viirya Aug 14, 2017
8f3b984
Fix Expand.
viirya Aug 14, 2017
c04da15
Fix BroadcastHashJoin.
viirya Aug 15, 2017
9540195
Rename variables.
viirya Aug 17, 2017
1101b2c
Don't create consume function if the number of arguments are more tha…
viirya Sep 1, 2017
ff77bfe
Merge remote-tracking branch 'upstream/master' into SPARK-21717
viirya Sep 26, 2017
e36ec3c
Remove the part of "continue" processing.
viirya Sep 26, 2017
edb73d6
Merge remote-tracking branch 'upstream/master' into SPARK-21717
viirya Oct 6, 2017
601c225
Fix test.
viirya Oct 7, 2017
476994f
More accurate calculation of valid method parameter length.
viirya Oct 11, 2017
bdc1146
Address comment.
viirya Oct 12, 2017
58eaf00
Address comments.
viirya Jan 24, 2018
2f2d1fd
Merge remote-tracking branch 'upstream/master' into SPARK-21717
viirya Jan 24, 2018
9f0d1da
Copy variables used for creating unsaferow.
viirya Jan 24, 2018
79d0106
Revert vairables copying.
viirya Jan 24, 2018
6384aec
Add final to constants.
viirya Jan 24, 2018
0c4173e
Address comments.
viirya Jan 25, 2018
c859d53
Add tests.
viirya Jan 25, 2018
11946e7
Refactor isValidParamLength a bit.
viirya Jan 25, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -149,13 +149,64 @@ trait CodegenSupport extends SparkPlan {

ctx.freshNamePrefix = parent.variablePrefix
val evaluated = evaluateRequiredVariables(output, inputVars, parent.usedInputs)

// Under certain conditions, we can put the logic to consume the rows of this operator into
Copy link
Member

@kiszk kiszk Aug 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate certain conditions in the comment if you have time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comment to elaborate the idea.

// another function. So we can prevent a generated function too long to be optimized by JIT.
val consumeFunc =
if (row == null && outputVars.nonEmpty && parent.usedInputs.size == inputVars.size) {
constructDoConsumeFunction(ctx, inputVars)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always split consume functions?; we don't need to check if this consume function is too long?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking to check it. But the whole-stage codegen is a non-breaking processing which produce/consume calls are embeded together. You don't have a break to check the function length here.

Actually I think it should have no negative effect to split consume functions always. From the benchmarking numbers, looks it shows no harm to normal queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should pass row to this function, if it's non-null, we can save a projection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should create a method for generating rowVar, so that we can use it in both consume and constructDoConsumeFunction

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

} else {
parent.doConsume(ctx, inputVars, rowVar)
}
s"""
|${ctx.registerComment(s"CONSUME: ${parent.simpleString}")}
|$evaluated
|${parent.doConsume(ctx, inputVars, rowVar)}
|$consumeFunc
""".stripMargin
}

/**
* To prevent concatenated function growing too long to be optimized by JIT. We separate the
* consume function of each `CodegenSupport` operator into a function to call.
*/
protected def constructDoConsumeFunction(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private?

ctx: CodegenContext,
inputVars: Seq[ExprCode]): String = {
val (callingParams, arguList, inputVarsInFunc) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add fall back path to the original code gen (i.e. without creating consume function) if the number of arguments is more than 254 (255 - one for non-static method) (e.g. #19082).
If the total number of arguments is more than 255, janino will fail its compilation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. I'll follow it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it's cleaner to return paramNames, paramTypes, paramVars, then we can simply do

void $doConsume(paramTypes.zip(paramNames).map(i => i._1 + " " + i._2).mkString(", "))

and

doConsumeFuncName(paramNames.mkString(", "))

inside constructConsumeParameters we can just create 3 mutable collections and go through variables to fill these collections.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds cleaner. I need to change it a little because the arguments and parameters are not the same. Some variables are not able parameterized, e.g., constants or statements.

constructConsumeParameters(ctx, output, inputVars)
val rowVar = ExprCode("", "false", "unsafeRow")
val doConsume = ctx.freshName("doConsume")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we put the operator name in this function name?

Copy link
Member Author

@viirya viirya Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The freshName here will add variablePrefix before doConsume. So it already has operator name, e.g., agg_doConsume.

val doConsumeFuncName = ctx.addNewFunction(doConsume,
s"""
| private void $doConsume($arguList) throws java.io.IOException {
| ${parent.doConsume(ctx, inputVarsInFunc, rowVar)}
| }
""".stripMargin)

s"$doConsumeFuncName($callingParams);"
}

/**
* Returns source code for calling consume function and the argument list of the consume function
* and also the `ExprCode` for the argument list.
*/
protected def constructConsumeParameters(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private?

ctx: CodegenContext,
attributes: Seq[Attribute],
variables: Seq[ExprCode]): (String, String, Seq[ExprCode]) = {
val params = variables.zipWithIndex.map { case (ev, i) =>
val callingParam = ev.value + ", " + ev.isNull
val arguName = ctx.freshName(s"expr_$i")
val arguIsNull = ctx.freshName(s"exprIsNull_$i")
(callingParam,
ctx.javaType(attributes(i).dataType) + " " + arguName + ", boolean " + arguIsNull,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about s"${ctx.javaType(attributes(i).dataType)} $arguName, boolean $arguIsNull"?

ExprCode("", arguIsNull, arguName))
}.unzip3
(params._1.mkString(", "),
params._2.mkString(", "),
params._3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the above 3 lines can be one line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/**
* Returns source code to evaluate all the variables, and clear the code of them, to prevent
* them to be evaluated twice.
Expand Down