[SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend #34815

AngersZhuuuu · 2021-12-06T06:11:44Z

What changes were proposed in this pull request?

In current spark-sql cli interface, if the end SQL is not a close comment, the SQL won't be passed to backend engine and just ignored. This caused a problem that if user write a SQL with wrong comment. It's just ignored and won't throw exception.
For example:

spark-sql> /* This is a comment without end symbol SELECT 1;
spark-sql>

After this pr:

spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

 == SQL ==
 /* This is a comment without end symbol SELECT 1;
 ^^^

In SPARK-33100 add this change #29982

Hive related code
https://github.com/apache/hive/blob/1090c93b1a02d480bdee2af2cecf503f8a54efc6/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L488-L490

Why are the changes needed?

Exact exceptions are thrown for wrong statements, which is convenient for users to troubleshoot.

Does this PR introduce any user-facing change?

Yes, if user write a wrong comment in sql/sql file or query in the end.
Before it's just ignored since it's not a statement. Now it will be passed to backend engine and if the statement is not correct, it will throw SQL exception.

How was this patch tested?

added UT and test by handle.

spark-sql> /* SELECT /*+ HINT() 4; */;
Error in query:
mismatched input ';' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 26)

== SQL ==
/* SELECT /*+ HINT() 4; */;
--------------------------^^^

spark-sql> /* SELECT /*+ HINT() 4; */
         >         SELECT 1;
1
Time taken: 3.16 seconds, Fetched 1 row(s)
spark-sql> /* SELECT /*+ HINT() */ 4; */;
spark-sql>
         > ;
spark-sql>
         > /* SELECT /*+ HINT() 4\\;
         >         SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

== SQL ==
/* SELECT /*+ HINT() 4\\;
^^^
        SELECT 1;

spark-sql>

…kend

AngersZhuuuu · 2021-12-06T07:08:30Z

ping @cloud-fan

SparkQA · 2021-12-06T07:23:19Z

Test build #145939 has finished for PR 34815 at commit e874bbd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-12-06T07:35:20Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50415/

SparkQA · 2021-12-06T07:42:05Z

Test build #145941 has finished for PR 34815 at commit c67dcaf.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-12-06T07:47:11Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50417/

cloud-fan · 2021-12-06T08:27:45Z

what if the users want to write the queries in multiple lines? does our CLI support it?

SparkQA · 2021-12-06T08:28:22Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50417/

AngersZhuuuu · 2021-12-06T08:32:02Z

what if the users want to write the queries in multiple lines? does our CLI support it?

CLI will cut input when meet a un-escaped ;

spark-sql> select
         > 1,
         > 2,
         > 3,
         > 4;
1	2	3	4
Time taken: 3.442 seconds, Fetched 1 row(s)
spark-sql> select
         > 1,
         > 2 \\;
         > ;
Error in query:
extraneous input '\' expecting {<EOF>, ';'}(line 3, pos 2)

== SQL ==
select
1,
2 \;
--^^^

spark-sql>

SparkQA · 2021-12-06T08:35:44Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50415/

SparkQA · 2021-12-06T10:21:55Z

Test build #145949 has finished for PR 34815 at commit e45b12d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-12-06T10:23:36Z

...e-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala

      isStatement = statementInProgress(index)
    }
-    if (isStatement) {
+    if (beginIndex < line.length()) {


what does this condition mean?

Avoid pass an blank string to processLine, although in processLine it will handle this

// we can not use "split" function directly as ";" may be quoted val commands = splitSemiColon(line).asScala var command: String = "" for (oneCmd <- commands) { if (StringUtils.endsWith(oneCmd, "\\")) { command += StringUtils.chop(oneCmd) + ";" } else { command += oneCmd if (!StringUtils.isBlank(command)) { val ret = processCmd(command) command = "" lastRet = ret val ignoreErrors = HiveConf.getBoolVar(conf, HiveConf.ConfVars.CLIIGNOREERRORS) if (ret != 0 && !ignoreErrors) { CommandProcessorFactory.clean(conf.asInstanceOf[HiveConf]) return ret } } } }

Also keep same cod with hive

@AngersZhuuuu @cloud-fan Wondering why do we keep the CLI in the hive-thriftserver package. It has nothing to do with ThriftServer.

SparkQA · 2021-12-06T10:36:51Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50424/

SparkQA · 2021-12-06T11:34:51Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50424/

cloud-fan · 2021-12-07T04:41:33Z

thanks, merging to master!

juliuszsompolski · 2021-12-08T23:14:59Z

spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

 == SQL ==
 /* This is a comment without end symbol SELECT 1;
 ^^^

Can /* be the start of a multiline comment, in which case the behaviour should be

spark-sql> /* This is a comment without end symbol SELECT 1;
         >

(waiting for next line of multiline input)?

AngersZhuuuu · 2021-12-09T02:35:40Z

spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

 == SQL ==
 /* This is a comment without end symbol SELECT 1;
 ^^^
Can /* be the start of a multiline comment, in which case the behaviour should be
spark-sql> /* This is a comment without end symbol SELECT 1;
         > 
(waiting for next line of multiline input)?

If you write

spark-sql> /* This is a comment without end symbol SELECT 1\\;
         >

it will wait for input.

If want

spark-sql> /* This is a comment without end symbol SELECT 1;
         >

to wait for input, we need to change the console terminate logic.

cc @cloud-fan WDYT?

cloud-fan · 2021-12-09T02:59:32Z

In the spark-sql console, ; has a special meaning and it means to terminate a sql statement (similar to ctrl+D in the paste mode). I don't have a strong opinion to change it, as writing comments in the console is not common anyway.

juliuszsompolski · 2021-12-09T12:25:59Z

I don't have a strong opinion either, but in this case it is a ; inside a comment, and other examples here showed that ; inside a comment should not terminate command:

spark-sql> /* SELECT /*+ HINT() 4; */
         >         SELECT 1;

if the ; after HINT() does not terminate a command here,
then I think the ; at end of line in

spark-sql> /* This is a comment without end symbol SELECT 1;

also shouldn't?

I can imagine an edge case, where someone has a multi-line comment in their SQL that contains either some snippet of code with ; terminated lines, or some bullet points with some arguments terminated with ; that they copy paste into the console.
But I don't have a strong opinion and for the most part I think it's not a practical edge case...

cloud-fan · 2021-12-09T12:55:56Z

Yea, maybe we should leave it as it is and document the behavior clearly in https://github.com/apache/spark/pull/34821/files

### What changes were proposed in this pull request? In #34815 we change back support unclosed bracketed comment to backend. But miss the case such as ``` SELECT 1; --comment ``` ``` SELECT 1; /* comment */ ``` It's a common use case in sql job. We should ignore the comment at end of SQL script. Need to clarify that when use `-e`, we directly pass SQL to `splitSemiColon`, when use `-f`, CliDriver will add a `\n` for query. ``` public int processReader(BufferedReader r) throws IOException { StringBuilder qsb = new StringBuilder(); String line; while((line = r.readLine()) != null) { if (!line.startsWith("--")) { qsb.append(line + "\n"); } } return this.processLine(qsb.toString()); } ``` So in `splitSemiColon`, we should consider both case. In this pr, the final behavior like below For `-e` | Query | Behavior before | Behavior now| -------|---------|--------------| | `SELECT 1; --comment` | Will pass both `SELECT 1` and `--comment` to backend engine and throw exception since `--comment` can't be executed| Only pass `SELECT 1` to backend engine and will ignore the simple comment | | `SELECT 1; /* comment */ ` | Will pass both `SELECT 1` and `/* comment */` to backend engine and throw exception since `/* comment */` can't be executed | Only pass `SELECT 1` to backend engine| | `SELECT 1; /* comment ` | Will pass `SELECT 1` and `/* comment` to backend engine | Will pass `SELECT 1` and `/* comment` to backend engine | | `SELECT 1; /* comment SELECT 1` | Will pass `SELECT 1` and `/* comment SELECT 1` to backend engine | Will pass `SELECT 1` and `/* comment SELECT 1` to backend engine | | `/ * comment SELECT 1;` | Will pass `/ * comment SELECT 1;` to back end engine and throw `unclose bracketed comment exception`| Will pass `/ * comment SELECT 1;` to back end engine and throw `unclose bracketed comment exception`| For `-f`, since `-f` will add a `\n` at the end line if it's not started as `--` | Query | Behavior before | Behavior now| -------|---------|--------------| | `SELECT 1; --comment\n` | Will pass both `SELECT 1` and `--comment` to backend engine and throw exception since `--comment` can't be executed| Only pass `SELECT 1` to backend engine and will ignore the simple comment | | `SELECT 1; /* comment */ \n` | Will pass both `SELECT 1` and `/* comment */` to backend engine and throw exception since `/* comment */` can't be executed | Only pass `SELECT 1` to backend engine| | `SELECT 1; /* comment \n` | Will pass `SELECT 1` and `/* comment\n` to backend engine | Will pass `SELECT 1` and `/* comment\n` to backend engine | | `SELECT 1; /* comment SELECT 1\n` | Will pass `SELECT 1` and `/* comment SELECT 1\n` to backend engine | Will pass `SELECT 1` and `/* comment SELECT 1\n` to backend engine | | `/ * comment SELECT 1;\n` | Will pass `/ * comment SELECT 1;\n` to back end engine and throw `unclose bracketed comment exception`| Will pass `/ * comment SELECT 1;\n` to back end engine and throw `unclose bracketed comment exception`| ### Why are the changes needed? Spark sql should not pass last entire comment to backend ### Does this PR introduce _any_ user-facing change? Use can write SQL script end with a comment ``` SELECT 1; --comment ``` ``` SELECT 1; /* comment */ ``` ### How was this patch tested? Added UT Closes #35206 from AngersZhuuuu/SPARK-37906. Lead-authored-by: Angerszhuuuu <[email protected]> Co-authored-by: AngersZhuuuu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>

bogdanghit · 2022-01-27T09:21:53Z

sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala

    )
  }
+
+  test("SPARK-37555: spark-sql should pass last unclosed comment to backend") {


@AngersZhuuuu This test is flaky, fails quite often on repeated runs. Here's a sample error:

2021-12-08 12:01:27.68 - stderr> Setting default log level to "WARN". 2021-12-08 12:01:27.68 - stderr> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 2021-12-08 12:01:39.459 - stderr> Spark master: local, Application Id: local-1638993689929 2021-12-08 12:01:40.688 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */; 2021-12-08 12:01:41.299 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */ SELECT 1; 2021-12-08 12:01:41.56 - stderr> Error in query: 2021-12-08 12:01:41.56 - stderr> mismatched input ';' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 26) 2021-12-08 12:01:41.56 - stderr> 2021-12-08 12:01:41.56 - stderr> == SQL == 2021-12-08 12:01:41.56 - stderr> /* SELECT /*+ HINT() 4; */; 2021-12-08 12:01:41.56 - stderr> --------------------------^^^ 2021-12-08 12:01:41.56 - stderr> 2021-12-08 12:01:47.573 - stdout> 1 2021-12-08 12:01:47.573 - stderr> Time taken: 6.272 seconds, Fetched 1 row(s) 2021-12-08 12:01:47.592 - stdout> spark-sql> /* Here is a unclosed bracketed comment SELECT 1; 2021-12-08 12:01:47.601 - stderr> Error in query: 2021-12-08 12:01:47.601 - stderr> Unclosed bracketed comment(line 1, pos 0) 2021-12-08 12:01:47.601 - stderr> 2021-12-08 12:01:47.601 - stderr> == SQL == 2021-12-08 12:01:47.601 - stderr> /* Here is a unclosed bracketed comment SELECT 1; 2021-12-08 12:01:47.601 - stderr> ^^^ 2021-12-08 12:01:47.601 - stderr> 2021-12-08 12:01:47.612 - stdout> spark-sql> /* SELECT /*+ HINT() */ 4; */; 2021-12-08 12:01:49.552 - stdout> spark-sql>

Let me check it.

I have run this many times. Not failed.

It fails on my PRs periodically too. For instance, see https://github.com/MaxGekk/spark/actions/runs/3383705774/jobs/5619867814:

[info] - SPARK-37555: spark-sql should pass last unclosed comment to backend *** FAILED *** (2 minutes, 8 seconds) [info] ======================= [info] CliSuite failure output [info] ======================= [info] Spark SQL CLI command line: ../../bin/spark-sql --master local --driver-java-options -Dderby.system.durability=test --conf spark.ui.enabled=false --hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/home/runner/work/spark/spark/target/tmp/spark-16a98152-adaa-4653-93d1-82425ddc6ed5;create=true --hiveconf hive.exec.scratchdir=/home/runner/work/spark/spark/target/tmp/spark-d3ccd810-41f0-4a29-9592-f8c08ad7f50a --hiveconf conf1=conftest --hiveconf conf2=1 --hiveconf hive.metastore.warehouse.dir=/home/runner/work/spark/spark/target/tmp/spark-c80a8015-a17d-422d-aeff-11be2a4b9986 [info] Exception: java.util.concurrent.TimeoutException: Futures timed out after [2 minutes] [info] Failed to capture next expected output "Unclosed bracketed comment" within 2 minutes. [info] [info] 2022-11-03 01:00:52.245 - stderr> Setting default log level to "WARN". [info] 2022-11-03 01:00:52.245 - stderr> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). [info] 2022-11-03 01:00:59.101 - stderr> Spark master: local, Application Id: local-1667462453894 [info] 2022-11-03 01:00:59.939 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */; [info] 2022-11-03 01:01:00.378 - stderr> [info] 2022-11-03 01:01:00.379 - stderr> [PARSE_SYNTAX_ERROR] Syntax error at or near ';'(line 1, pos 26) [info] 2022-11-03 01:01:00.379 - stderr> [info] 2022-11-03 01:01:00.379 - stderr> == SQL == [info] 2022-11-03 01:01:00.379 - stderr> /* SELECT /*+ HINT() 4; */; [info] 2022-11-03 01:01:00.379 - stderr> --------------------------^^^ [info] 2022-11-03 01:01:00.379 - stderr> [info] 2022-11-03 01:01:00.401 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */ SELECT 1; [info] 2022-11-03 01:01:02.78 - stdout> 1 [info] 2022-11-03 01:01:02.78 - stderr> Time taken: 2.384 seconds, Fetched 1 row(s) [info] 2022-11-03 01:01:02.791 - stderr> [info] 2022-11-03 01:01:02.791 - stderr> Unclosed bracketed comment(line 1, pos 0) [info] 2022-11-03 01:01:02.791 - stderr> [info] 2022-11-03 01:01:02.791 - stderr> == SQL == [info] 2022-11-03 01:01:02.791 - stderr> /* Here is a unclosed bracketed comment SELECT 1; [info] 2022-11-03 01:01:02.791 - stderr> ^^^ [info] 2022-11-03 01:01:02.791 - stdout> spark-sql> /* Here is a unclosed bracketed comment SELECT 1; [info] 2022-11-03 01:01:02.792 - stderr> [info] 2022-11-03 01:01:02.795 - stdout> spark-sql> /* SELECT /*+ HINT() */ 4; */; [info] 2022-11-03 01:01:02.995 - stdout> spark-sql> [info] =========================== [info] End CliSuite failure output [info] =========================== (CliSuite.scala:213) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)

Maybe we don't wait for 2 minutes in real life.

ok, let me check again.

Hey this also fails on my PR. Just FYI didn't meant to push!

@AngersZhuuuu shall we increase the timeout? The github action machines are not stable.

@AngersZhuuuu shall we increase the timeout? The github action machines are not stable.

Yea, the logic should be ok...
Add below PR and we can trigger it many times to see if it's stable?
#38571

[SPARK-37555][SQL] spark-sql should pass last unclosed comment to bac…

7ac7a71

…kend

github-actions bot added the SQL label Dec 6, 2021

AngersZhuuuu added 2 commits December 6, 2021 14:36

Update CliSuite.scala

e874bbd

Update CliSuite.scala

c67dcaf

Update SparkSQLCLIDriver.scala

e45b12d

cloud-fan reviewed Dec 6, 2021

View reviewed changes

cloud-fan closed this in d50d464 Dec 7, 2021

AngersZhuuuu mentioned this pull request Jan 14, 2022

[SPARK-37906][SQL] spark-sql should not pass last comment to backend #35206

Closed

bogdanghit reviewed Jan 27, 2022

View reviewed changes

[SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend #34815

[SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend #34815

Uh oh!

Conversation

AngersZhuuuu commented Dec 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

AngersZhuuuu commented Dec 6, 2021

Uh oh!

SparkQA commented Dec 6, 2021

Uh oh!

SparkQA commented Dec 6, 2021

Uh oh!

SparkQA commented Dec 6, 2021

Uh oh!

SparkQA commented Dec 6, 2021

Uh oh!

cloud-fan commented Dec 6, 2021

Uh oh!

SparkQA commented Dec 6, 2021

Uh oh!

AngersZhuuuu commented Dec 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SparkQA commented Dec 6, 2021

Uh oh!

SparkQA commented Dec 6, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Dec 6, 2021

Uh oh!

SparkQA commented Dec 6, 2021

Uh oh!

cloud-fan commented Dec 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

juliuszsompolski commented Dec 8, 2021

Uh oh!

AngersZhuuuu commented Dec 9, 2021

Uh oh!

cloud-fan commented Dec 9, 2021

Uh oh!

juliuszsompolski commented Dec 9, 2021

Uh oh!

cloud-fan commented Dec 9, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

AngersZhuuuu commented Dec 6, 2021 •

edited

Loading

AngersZhuuuu commented Dec 6, 2021 •

edited

Loading

cloud-fan commented Dec 7, 2021 •

edited

Loading