[SPARK-15165][SQL] Codegen can break because toCommentSafeString is not actually safe #12939
Changes from all commits: 30ad081, ed2e1e7, 15a23aa, 7106f23, 1140642, 447b24d, 43a340f, f2b7adb
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -162,7 +162,18 @@ package object util { | |
| def toCommentSafeString(str: String): String = { | ||
| val len = math.min(str.length, 128) | ||
| val suffix = if (str.length > len) "..." else "" | ||
| str.substring(0, len).replace("*/", "\\*\\/").replace("\\u", "\\\\u") + suffix | ||
|
|
||
| // Unicode literals, like \u0022, should be escaped before | ||
| // they are put in code comment to avoid codegen breaking. | ||
| // To escape them, single "\" should be prepended to a series of "\" just before "u" | ||
| // only when the number of "\" is odd. | ||
| // For example, \u0022 should become to \\u0022 | ||
| // but \\u0022 should not become to \\\u0022 because the first backslash escapes the second one, | ||
| // and \u0022 will remain, means not escaped. | ||
| // Otherwise, the runtime Java compiler will fail to compile or code injection can be allowed. | ||
| // For details, see SPARK-15165. | ||
| str.substring(0, len).replace("*/", "*\\/") | ||
| .replaceAll("(^|[^\\\\])(\\\\(\\\\\\\\)*u)", "$1\\\\$2") + suffix | ||
|
Contributor
How about we also add a comment here?
Member Author
Yeah, makes sense.
Is the implementation of
Member Author
Thanks for the suggestion, @mhseiden. |
||
| } | ||
|
|
||
| /* FIX ME | ||
|
|
||
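
To make the odd/even backslash rule from the new code comment easier to check, here is a minimal standalone sketch that copies the patched `toCommentSafeString` body from the diff into a hypothetical `CommentSafe` object (the object name and the `main` method are assumptions for illustration, not part of the PR). Inputs are written with doubled backslashes so that the Scala compiler itself never sees a raw Unicode escape.

```scala
object CommentSafe {
  // Copy of the patched helper from the diff: truncate to 128 characters,
  // neutralize "*/", then prepend one backslash to any odd-length run of
  // backslashes that sits directly before a 'u'.
  def toCommentSafeString(str: String): String = {
    val len = math.min(str.length, 128)
    val suffix = if (str.length > len) "..." else ""
    str.substring(0, len).replace("*/", "*\\/")
      .replaceAll("(^|[^\\\\])(\\\\(\\\\\\\\)*u)", "$1\\\\$2") + suffix
  }

  def main(args: Array[String]): Unit = {
    // Each input is a Scala string literal; "\\" denotes one backslash character.
    val inputs = Seq(
      "\\u0022",       // one backslash before 'u': odd run, gets one more
      "\\\\u0022",     // two backslashes: even run, left unchanged
      "\\\\\\u0022",   // three backslashes: odd run, gets one more
      "a*/b"           // comment terminator: the slash gets a backslash in front
    )
    inputs.foreach(s => println(s + "  ->  " + toCommentSafeString(s)))
  }
}
```

After the call, every run of backslashes in front of a `u` has even length, so the Java compiler no longer treats it as the start of a Unicode escape inside the generated comment.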
We only need to make sure that the comment string does not contain */; *\/ will be OK. One simpler solution could be

Thanks for the advice.
I think "\u" should be escaped too; otherwise, compilation will fail when invalid Unicode escapes, like \u002X or \u001, appear in literals.

Good point, LGTM
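
The point about `\u` can be illustrated with a rough model of how the Java compiler handles Unicode escapes before it recognizes comments (Java Language Specification, section 3.3). The sketch below is an approximation written for this discussion, not Spark or javac code: `UnicodeEscapeCheck`, `translate`, and `escapeBackslashU` are hypothetical names, and `escapeBackslashU` simply reuses the regex from the diff.

```scala
object UnicodeEscapeCheck {
  // Rough model of the compiler's first translation step: a backslash preceded
  // by an even number of backslashes, followed by one or more 'u' and four hex
  // digits, is replaced by that character before comments are lexed.
  // Returns Left(message) for a malformed escape, which the compiler rejects.
  def translate(src: String): Either[String, String] = {
    val out = new StringBuilder
    var i = 0
    var backslashes = 0 // contiguous backslashes immediately before position i
    while (i < src.length) {
      val c = src.charAt(i)
      val startsEscape = c == '\\' && backslashes % 2 == 0 &&
        i + 1 < src.length && src.charAt(i + 1) == 'u'
      if (startsEscape) {
        var j = i + 1
        while (j < src.length && src.charAt(j) == 'u') j += 1
        val hex = src.substring(j, math.min(j + 4, src.length))
        if (hex.length < 4 || !hex.forall(h => Character.digit(h, 16) >= 0)) {
          return Left(s"illegal unicode escape at index $i")
        }
        out.append(Integer.parseInt(hex, 16).toChar)
        i = j + 4
        backslashes = 0
      } else {
        out.append(c)
        backslashes = if (c == '\\') backslashes + 1 else 0
        i += 1
      }
    }
    Right(out.toString)
  }

  // Assumption: same regex as in the diff, applied without truncation.
  def escapeBackslashU(s: String): String =
    s.replaceAll("(^|[^\\\\])(\\\\(\\\\\\\\)*u)", "$1\\\\$2")

  def main(args: Array[String]): Unit = {
    val hidden = "\\u002a\\u002f" // escapes for '*' and '/', with no literal "*/"
    println(translate(hidden))                    // the terminator appears after translation
    println(translate("\\u002X"))                 // malformed escape is rejected
    println(translate(escapeBackslashU(hidden)))  // escaped form passes through unchanged
  }
}
```

Under this model, replacing only `*/` is not enough: an input made of the escapes for `*` and `/` contains no literal terminator, yet it turns into one during translation, and a malformed escape such as `\u002X` fails outright. Doubling the backslash in front of `u` prevents both.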