[SPARK-33100][SQL][3.0] Ignore a semicolon inside a bracketed comment in spark-sql #31033
…park-sql

Currently, spark-sql cannot parse SQL statements that contain bracketed comments.
For the SQL statements:
```
/* SELECT 'test'; */ SELECT 'test';
```
Would be split into two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`
Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the SQL statements.
Besides, we ignore the comment without any content.

spark-sql might split statements inside bracketed comments, which is not correct.

No.

Added UT.

Closes apache#29982 from turboFei/SPARK-33110.
Lead-authored-by: fwang12 <[email protected]>
Co-authored-by: turbofei <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
Hmmm... I've checked it, and the backport for branch-2.4 looks complicated, so I will backport the fix only for branch-3.0 for now. Anyway, thanks for the check, @turboFei

ok to test

Test build #133683 has finished for PR 31033 at commit

no space left on jenkins node:

Kubernetes integration test starting

Kubernetes integration test status failure
dongjoon-hyun left a comment
Please update this PR by including #31054.
Also, I converted this PR as

got it, thanks

@turboFei Could you merge the flakiness fix into this PR? Thanks!
…in spark-sql
### What changes were proposed in this pull request?
This PR helps find the correct bounds of a bracketed comment in spark-sql.
Here is the CliSuite log for the SPARK-33100 UT before this change:
```
2021-01-05 13:22:34.768 - stdout> spark-sql> /* SELECT 'test';*/ SELECT 'test';
2021-01-05 13:22:41.523 - stderr> Time taken: 6.716 seconds, Fetched 1 row(s)
2021-01-05 13:22:41.599 - stdout> test
2021-01-05 13:22:41.6 - stdout> spark-sql> ;;/* SELECT 'test';*/ SELECT 'test';
2021-01-05 13:22:41.709 - stdout> test
2021-01-05 13:22:41.709 - stdout> spark-sql> /* SELECT 'test';*/;; SELECT 'test';
2021-01-05 13:22:41.902 - stdout> spark-sql> SELECT 'test'; -- SELECT 'test';
2021-01-05 13:22:41.902 - stderr> Time taken: 0.129 seconds, Fetched 1 row(s)
2021-01-05 13:22:41.902 - stderr> Error in query:
2021-01-05 13:22:41.902 - stderr> mismatched input '<EOF>' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 19)
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> == SQL ==
2021-01-05 13:22:42.006 - stderr> /* SELECT 'test';*/
2021-01-05 13:22:42.006 - stderr> -------------------^^^
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> Time taken: 0.226 seconds, Fetched 1 row(s)
2021-01-05 13:22:42.006 - stdout> test
```
The root cause is that the `insideBracketedComment` flag is not accurate.
For `/* comment */`, the last character `/` is not counted as inside the bracketed comment, so it is treated as the beginning of a statement.
This PR fixes that issue.
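The boundary problem can be illustrated with a small sketch. It is written in Python for brevity and is not the spark-sql implementation; the names `comment_flags` and `first_statement_char` and the `include_closer` switch are hypothetical. It assumes one way such an inaccuracy can arise: the flag is cleared on the closing `/` itself rather than on the character after it. Quotes and `--` comments are ignored for simplicity.

```python
from typing import List, Optional


def comment_flags(sql: str, include_closer: bool) -> List[bool]:
    """Return, per character, whether it counts as inside a /* ... */ comment.

    include_closer=False clears the flag on the closing '/', so that '/' is
    reported as outside the comment (the inaccurate behaviour).
    include_closer=True defers clearing the flag to the next character.
    """
    flags: List[bool] = []
    inside = False
    leaving = False   # set on the closing '/', applied on the next character
    opened_at = -1    # index of the '/' that opened the current comment
    for i, ch in enumerate(sql):
        if leaving:
            inside, leaving = False, False
        if not inside and sql.startswith("/*", i):
            inside, opened_at = True, i
        elif inside and ch == "/" and sql[i - 1] == "*" and i >= opened_at + 3:
            # The '*' of '*/' must not be the same '*' that opened the comment.
            if include_closer:
                leaving = True
            else:
                inside = False
        flags.append(inside)
    return flags


def first_statement_char(sql: str, include_closer: bool) -> Optional[int]:
    """Index of the first non-space character outside any comment, or None."""
    flags = comment_flags(sql, include_closer)
    for i, (ch, in_comment) in enumerate(zip(sql, flags)):
        if not in_comment and not ch.isspace():
            return i
    return None


fragment = "/* SELECT 'test';*/"
print(first_statement_char(fragment, include_closer=False))
print(first_statement_char(fragment, include_closer=True))
```

With `include_closer=False`, the comment-only fragment appears to contain statement text (the trailing `/`), so it would be handed to the parser; with `include_closer=True`, it is recognized as comment-only.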
### Why are the changes needed?
To fix the issue described above.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing UT
Closes apache#31054 from turboFei/SPARK-33100-followup.
Authored-by: fwang12 <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
thanks, have merged the followup PR

Kubernetes integration test starting

Kubernetes integration test status success

Test build #133797 has finished for PR 31033 at commit
… in spark-sql
### What changes were proposed in this pull request?
Currently, spark-sql cannot parse SQL statements that contain bracketed comments.
For the SQL statements:
```
/* SELECT 'test'; */ SELECT 'test';
```
Would be split into two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`
Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the SQL statements.
Besides, we ignore the comment without any content.
NOTE: This backport comes from #29982
### Why are the changes needed?
spark-sql might split statements inside bracketed comments, which is not correct.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added UT.
Closes #31033 from turboFei/SPARK-33100.
Authored-by: fwang12 <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
Thanks! Merged to branch-3.0.
What changes were proposed in this pull request?
Currently, spark-sql cannot parse SQL statements that contain bracketed comments.
For the SQL statements `/* SELECT 'test'; */ SELECT 'test';`, the input would be split into two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`
Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the SQL statements.
Besides, we ignore the comment without any content.
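A minimal sketch of the splitting behaviour described above, in Python rather than the CLI's own code. The function name `split_statements` is hypothetical, and the sketch reads "the comment without any content" as a fragment that holds only a comment, which is an assumption. It deliberately ignores quoted strings and `--` comments, which a real splitter also has to handle.

```python
from typing import List


def split_statements(sql: str) -> List[str]:
    """Split on ';' outside /* ... */ comments and drop comment-only fragments."""
    statements: List[str] = []
    current: List[str] = []   # pieces of the fragment being built
    has_code = False          # True once a non-space char outside a comment is seen
    i = 0
    while i < len(sql):
        if sql.startswith("/*", i):
            # Copy the whole comment verbatim; an unterminated comment
            # swallows the rest of the input.
            end = sql.find("*/", i + 2)
            end = len(sql) if end == -1 else end + 2
            current.append(sql[i:end])
            i = end
        elif sql[i] == ";":
            # A semicolon outside a comment ends the fragment.
            if has_code:
                statements.append("".join(current).strip())
            current, has_code = [], False
            i += 1
        else:
            has_code = has_code or not sql[i].isspace()
            current.append(sql[i])
            i += 1
    if has_code:
        statements.append("".join(current).strip())
    return statements


sql = "/* SELECT 'test'; */ SELECT 'test';"
# Naive split on every ';' cuts the comment in half.
print([s.strip() for s in sql.split(";") if s.strip()])
# Comment-aware split keeps the comment attached to the single real statement.
print(split_statements(sql))
```

The naive split reproduces the two illegal fragments listed above, while the comment-aware version returns one statement and skips fragments such as `/* only a comment */;`.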
NOTE: This backport comes from #29982
Why are the changes needed?
spark-sql might split statements inside bracketed comments, which is not correct.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added UT.