[SPARK-22215][SQL] Add configuration to set the threshold for generated class #19447
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
+      SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD.key,
+      SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD.defaultValue.get)
     val (className, classInstance) = if (inlineToOuterClass) {
       outerClassName -> ""
-    } else if (currClassSize > 1600000) {
+    } else if (currClassSize > generatedClassLengthThreshold) {
       val className = freshName("NestedClass")
       val classInstance = freshName("nestedClassInstance")
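The decision this hunk changes can be isolated as a small, self-contained Scala sketch. It is not the Spark implementation: `ClassPlacement`, `placeFunction`, and `DefaultThreshold` are hypothetical names, and the final `else` branch (keeping the function in the current class) is an assumption because the hunk ends before it. Only the strict `currClassSize > threshold` comparison and the 1600000-byte default come from the diff.

```scala
// Hypothetical sketch of the threshold check this PR makes configurable in CodegenContext.
// ClassPlacement, placeFunction and DefaultThreshold are illustrative names, not Spark APIs.
object NestedClassSketch {
  // Default from the PR's config entry: 1600000 bytes (1600k).
  val DefaultThreshold: Int = 1600000

  sealed trait ClassPlacement
  case object OuterClass extends ClassPlacement     // inlineToOuterClass = true
  case object NewNestedClass extends ClassPlacement // class too large: start a private nested class
  case object CurrentClass extends ClassPlacement   // assumed fallback branch (not shown in the hunk)

  /** Mirrors the if / else-if chain from the diff, plus an assumed final else. */
  def placeFunction(
      inlineToOuterClass: Boolean,
      currClassSize: Int,
      threshold: Int = DefaultThreshold): ClassPlacement =
    if (inlineToOuterClass) OuterClass
    else if (currClassSize > threshold) NewNestedClass // strict '>' as in the diff
    else CurrentClass

  def main(args: Array[String]): Unit = {
    // With the default threshold, a 1,000,000-byte class still accepts new functions.
    println(placeFunction(inlineToOuterClass = false, currClassSize = 1000000))
    // Lowering the threshold moves functions into nested classes sooner.
    println(placeFunction(inlineToOuterClass = false, currClassSize = 1000000, threshold = 500000))
  }
}
```

Making the threshold a parameter rather than a literal is the whole point of the change: the comparison stays the same, but the cut-off can be tuned without recompiling.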
@@ -934,6 +934,15 @@ object SQLConf {
       .intConf
       .createWithDefault(10000)
 
+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +

Member: Please add
Contributor (author): Sorry for my ignorance. If users cannot set it, how can this parameter be changed?
Member: I think this is not a frequently used parameter for users. Also, other JVM implementation-specific parameters are internal, e.g.,
Contributor (author): I agree with you, but could you please explain to me how to set a parameter which is internal? Thanks.
Member: Users can change this value in the same way as other public parameters, but
Contributor (author): Thanks for the explanation, I'll add it immediately.

+        "size is higher of this value, a private nested class is created and used." +
+        "This is useful to limit the number of named constants in the class " +
+        "and therefore its Constant Pool. The default is 1600k.")
+      .intConf

Member: Please add

+      .createWithDefault(1600000)
+
   object Deprecated {
     val MAPRED_REDUCE_TASKS = "mapred.reduce.tasks"
   }
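The exchange about internal parameters rests on the reviewer's point that an internal value is changed the same way as a public one. A minimal sketch of that idea, assuming a simplified model: `IntConfEntry`, `isInternal`, and `getInt` are hypothetical stand-ins and are not Spark's `ConfigBuilder`/`ConfigEntry` API. Here the internal flag is carried as metadata only and plays no part in the lookup.

```scala
// Hypothetical, simplified model of a config entry; NOT Spark's ConfigBuilder/ConfigEntry API.
object ConfSketch {
  final case class IntConfEntry(key: String, defaultValue: Int, doc: String, isInternal: Boolean)

  // Key and default copied from the diff; the isInternal flag reflects the reviewer's request.
  val GeneratedClassLengthThreshold: IntConfEntry = IntConfEntry(
    key = "spark.sql.codegen.generatedClass.size.threshold",
    defaultValue = 1600000,
    doc = "Threshold in bytes for the size of a generated class.",
    isInternal = true)

  /** Reads an int setting. The isInternal flag is ignored: internal entries are set like public ones. */
  def getInt(settings: Map[String, String], entry: IntConfEntry): Int =
    settings.get(entry.key).map(_.trim.toInt).getOrElse(entry.defaultValue)

  def main(args: Array[String]): Unit = {
    val unset = Map.empty[String, String]
    val overridden = Map(GeneratedClassLengthThreshold.key -> "800000")
    println(getInt(unset, GeneratedClassLengthThreshold))      // falls back to the default
    println(getInt(overridden, GeneratedClassLengthThreshold)) // user-supplied value wins
  }
}
```

In this model, marking an entry internal only signals that it is not meant for everyday tuning; it does not block a user from supplying the key.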
Can you ensure that `SparkEnv.get` is always non-null?

Can this be empty? In spark/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java (line 80 in 83488cc):
None...

The code that you pointed out is not in `CodeGenerator.scala`. In my case, there are some cases in `CodeGenerator.scala` where `SparkEnv.get` is `null`. If you have not seen this `null` in your case, that is good. @gatorsmile will implement a better approach to get this conf soon.

You had better refer to other code in `CodeGenerator`: https://github.com/mgaido91/spark/blob/c69be31314d9aa96c3920073beaf7cca46d507fa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L934
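One defensive pattern for the `null` case raised in this thread is to wrap the environment lookup in `Option` and fall back to the entry's default. This is a minimal sketch, not what the PR, the linked code, or the approach mentioned for @gatorsmile actually does: `Env` and `currentEnv` are hypothetical stand-ins for `SparkEnv` and `SparkEnv.get`, and the settings are modeled as a plain `Map`.

```scala
// Hypothetical sketch of a null-safe config read; Env and currentEnv are NOT Spark APIs.
object NullSafeLookupSketch {
  // Stand-in for SparkEnv: only a string-to-string settings map.
  final case class Env(conf: Map[String, String])

  // Stand-in for SparkEnv.get; it may be null, as reported in the review.
  @volatile var currentEnv: Env = null

  /** Returns the configured threshold, or `default` if the env is null or the key is unset. */
  def thresholdOrDefault(key: String, default: Int): Int =
    Option(currentEnv)          // None when the environment is null
      .flatMap(_.conf.get(key)) // None when the key is not set
      .map(_.trim.toInt)
      .getOrElse(default)
}
```

For example, with `currentEnv` left `null`, `thresholdOrDefault("spark.sql.codegen.generatedClass.size.threshold", 1600000)` returns the default instead of throwing a `NullPointerException`. Whether silently using the default is acceptable is exactly the design question the reviewers raise.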