Skip to content

Conversation

@turboFei
Copy link
Member

@turboFei turboFei commented Jan 5, 2021

What changes were proposed in this pull request?

Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:

/* SELECT 'test'; */
SELECT 'test';

Would be split to two statements:
The first one: /* SELECT 'test'
The second one: */ SELECT 'test'

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

NOTE: This backport comes from #29982

Why are the changes needed?

Spark-sql might split the statements inside bracketed comments and it is not correct.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added UT.

…park-sql

Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

Spark-sql might split the statements inside bracketed comments and it is not correct.

No.

Added UT.

Closes apache#29982 from turboFei/SPARK-33110.

Lead-authored-by: fwang12 <[email protected]>
Co-authored-by: turbofei <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
@turboFei
Copy link
Member Author

turboFei commented Jan 5, 2021

For branch-2.4, this pr depends on below patches:
#25018

#27321

#28393

#27920

@maropu
Should I back port all these prs for branch-2.4?

FYI:
I have create one for branch-2.4

#31040

@maropu
Copy link
Member

maropu commented Jan 5, 2021

For branch-2.4, this pr depends on below patches:

Hmmm... I've checked it and the backport for branch-2.4 looks complicated, so I will backport the fix only for branch-3.0 for now. Anyway, thanks for the check, @turboFei

@maropu
Copy link
Member

maropu commented Jan 5, 2021

ok to test

@maropu maropu changed the title [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql [SPARK-33100][SQL][3.0] Ignore a semicolon inside a bracketed comment in spark-sql Jan 5, 2021
@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Test build #133683 has finished for PR 31033 at commit 3ee7483.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei
Copy link
Member Author

turboFei commented Jan 5, 2021

no space left on jenkins node:

[error] /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/core/target/java/org/apache/spark/sql/LHMapClass$.java (No space left on device)

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38272/

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38272/

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this PR by including #31054 .

@dongjoon-hyun dongjoon-hyun marked this pull request as draft January 6, 2021 07:38
@dongjoon-hyun
Copy link
Member

Also, I converted this PR as Draft to prevent accidental merging.

@turboFei
Copy link
Member Author

turboFei commented Jan 6, 2021

got it,thanks

@maropu
Copy link
Member

maropu commented Jan 7, 2021

@turboFei Could you merge the flakiness fix into this PR? Thanks!

…in spark-sql

### What changes were proposed in this pull request?

This PR help find correct bound of bracketed comment in spark-sql.

Here is the log for UT of SPARK-33100 in CliSuite before:
```
2021-01-05 13:22:34.768 - stdout> spark-sql> /* SELECT 'test';*/ SELECT 'test';
2021-01-05 13:22:41.523 - stderr> Time taken: 6.716 seconds, Fetched 1 row(s)
2021-01-05 13:22:41.599 - stdout> test
2021-01-05 13:22:41.6 - stdout> spark-sql> ;;/* SELECT 'test';*/ SELECT 'test';
2021-01-05 13:22:41.709 - stdout> test
2021-01-05 13:22:41.709 - stdout> spark-sql> /* SELECT 'test';*/;; SELECT 'test';
2021-01-05 13:22:41.902 - stdout> spark-sql> SELECT 'test'; -- SELECT 'test';
2021-01-05 13:22:41.902 - stderr> Time taken: 0.129 seconds, Fetched 1 row(s)
2021-01-05 13:22:41.902 - stderr> Error in query:
2021-01-05 13:22:41.902 - stderr> mismatched input '<EOF>' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 19)
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> == SQL ==
2021-01-05 13:22:42.006 - stderr> /* SELECT 'test';*/
2021-01-05 13:22:42.006 - stderr> -------------------^^^
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> Time taken: 0.226 seconds, Fetched 1 row(s)
2021-01-05 13:22:42.006 - stdout> test
```
The root cause is that the insideBracketedComment is not accurate.

For `/* comment */`, the last character `/` is not insideBracketedComment and it would be treat as beginning of statements.

In this PR, this issue is fixed.

### Why are the changes needed?
To fix the issue described above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes apache#31054 from turboFei/SPARK-33100-followup.

Authored-by: fwang12 <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
@turboFei turboFei marked this pull request as ready for review January 7, 2021 12:20
@turboFei turboFei requested a review from dongjoon-hyun January 7, 2021 12:20
@turboFei
Copy link
Member Author

turboFei commented Jan 7, 2021

thanks, have merged the followup PR

@SparkQA
Copy link

SparkQA commented Jan 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38385/

@SparkQA
Copy link

SparkQA commented Jan 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38385/

@SparkQA
Copy link

SparkQA commented Jan 7, 2021

Test build #133797 has finished for PR 31033 at commit a15f125.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu pushed a commit that referenced this pull request Jan 8, 2021
… in spark-sql

### What changes were proposed in this pull request?
Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

NOTE: This backport comes from #29982

### Why are the changes needed?
Spark-sql might split the statements inside bracketed comments and it is not correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #31033 from turboFei/SPARK-33100.

Authored-by: fwang12 <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
@maropu
Copy link
Member

maropu commented Jan 8, 2021

Thanks! Merged to branch-3.0.

@maropu maropu closed this Jan 8, 2021
@turboFei turboFei deleted the SPARK-33100 branch January 8, 2021 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants