[SPARK-31101][BUILD][3.0] Upgrade Janino to 3.0.16
### What changes were proposed in this pull request?
This PR (SPARK-31101) proposes to upgrade Janino to 3.0.16, which was released recently.
* Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate".
Please see the commit log.
- https://github.com/janino-compiler/janino/commits/3.0.16
The changelog is available at http://janino-compiler.github.io/janino/changelog.html, though the release note for Janino 3.0.16 there is actually incorrect.
### Why are the changes needed?
We received a report of a user's query failing because Janino throws an error while compiling the generated code. The issue is tracked at janino-compiler/janino#113. It contains the generated code, the symptom (error), and an analysis of the bug, so please refer to the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables Janino to compile the user's query successfully.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing UTs.
The test code below fails on branch-3.0 and passes with this patch.
```
  /**
   * NOTE: The test code tries to control the size of the for/switch statement in
   * expand_doConsume, as well as the overall size of expand_doConsume, so that the query
   * triggers the known Janino bug - janino-compiler/janino#113.
   *
   * The expected exception message from Janino when we use a switch statement for "ExpandExec":
   * - "Operand stack inconsistent at offset xxx: Previous size 1, now 0"
   * which will not happen when we use an if-else-if statement for "ExpandExec".
   *
   * "The number of fields" and "The number of distinct aggregation functions" are the major
   * factors that increase the size of the generated code: while these values should be large
   * enough to trigger the Janino bug, they should also not be too big; otherwise one of the
   * exceptions below might be thrown:
   * - "expand_doConsume would be beyond 64KB"
   * - "java.lang.ClassFormatError: Too many arguments in method signature in class file"
   */
  test("SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation") {
    withSQLConf(
      (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true"),
      (SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key, "10000"),
      (SQLConf.CODEGEN_FALLBACK.key, "false"),
      (SQLConf.CODEGEN_LOGGING_MAX_LINES.key, "-1")
    ) {
      var df = Seq(("1", "2", 1), ("1", "2", 2), ("2", "3", 3), ("2", "3", 4)).toDF("a", "b", "c")

      // The value is tested under commit "e807118eef9e0214170ff62c828524d237bd58e3":
      // the query fails with the switch statement, whereas it passes with the if-else statement.
      // Note that the value depends on the Spark logic as well - different Spark versions may
      // require a different value to ensure the test fails with the switch statement.
      val numNewFields = 100

      df = df.withColumns(
        (1 to numNewFields).map { idx => s"a$idx" },
        (1 to numNewFields).map { idx =>
          when(col("c").mod(lit(2)).===(lit(0)), lit(idx)).otherwise(col("c"))
        }
      )

      // Alternate between distinct and non-distinct counts over the generated columns.
      val aggExprs: Array[Column] = Range(1, numNewFields).map { idx =>
        if (idx % 2 == 0) {
          coalesce(countDistinct(s"a$idx"), lit(0))
        } else {
          coalesce(count(s"a$idx"), lit(0))
        }
      }.toArray

      val aggDf = df
        .groupBy("a", "b")
        .agg(aggExprs.head, aggExprs.tail: _*)

      // We are only interested in whether the code compilation fails or not, so we skip
      // verification of the outputs.
      aggDf.collect()
    }
  }
```
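When reproducing the failure locally, it can help to confirm which Janino jar the JVM actually loaded, since an older copy earlier on the classpath would hide the upgrade. The following is a minimal sketch and not part of this patch: the object name `JaninoVersionCheck` and its `describe` helper are hypothetical, and it assumes only that Janino's `org.codehaus.janino.SimpleCompiler` class is the one to look up. The manifest version may be absent, so the jar location is reported as well.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark or Janino): reports where a class was loaded from.
object JaninoVersionCheck {

  // Returns Some((manifestVersion, jarLocation)) if the class can be loaded, None otherwise.
  // Both inner values are optional because the JVM may not expose them.
  def describe(
      className: String = "org.codehaus.janino.SimpleCompiler"
  ): Option[(Option[String], Option[String])] =
    Try(Class.forName(className)).toOption.map { cls =>
      // Implementation-Version comes from the jar manifest and may be undeclared (null).
      val version = Option(cls.getPackage).flatMap(p => Option(p.getImplementationVersion))
      // The code source is null for classes loaded by the bootstrap class loader.
      val location = Option(cls.getProtectionDomain)
        .flatMap(pd => Option(pd.getCodeSource))
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
      (version, location)
    }

  def main(args: Array[String]): Unit = describe() match {
    case Some((version, location)) =>
      println(s"Janino manifest version: ${version.getOrElse("<not declared>")}")
      println(s"Janino loaded from: ${location.getOrElse("<unknown>")}")
    case None =>
      println("Janino is not on the classpath")
  }
}
```

The jar file name in the reported location normally embeds the version, which makes it a useful cross-check when the manifest does not declare one.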
Closes #27996 from HeartSaVioR/SPARK-31101-branch-3.0.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>