Skip to content

Conversation

@AngersZhuuuu
Copy link
Contributor

@AngersZhuuuu AngersZhuuuu commented Dec 6, 2021

What changes were proposed in this pull request?

In current spark-sql cli interface, if the end SQL is not a close comment, the SQL won't be passed to backend engine and just ignored. This caused a problem that if user write a SQL with wrong comment. It's just ignored and won't throw exception.
For example:

spark-sql> /* This is a comment without end symbol SELECT 1;
spark-sql>

After this pr:

spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

 == SQL ==
 /* This is a comment without end symbol SELECT 1;
 ^^^

In SPARK-33100 add this change #29982

Hive related code
https://github.com/apache/hive/blob/1090c93b1a02d480bdee2af2cecf503f8a54efc6/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L488-L490

Why are the changes needed?

Exact exceptions are thrown for wrong statements, which is convenient for users to troubleshoot.

Does this PR introduce any user-facing change?

Yes, if user write a wrong comment in sql/sql file or query in the end.
Before it's just ignored since it's not a statement. Now it will be passed to backend engine and if the statement is not correct, it will throw SQL exception.

How was this patch tested?

added UT and test by handle.

spark-sql> /* SELECT /*+ HINT() 4; */;
Error in query:
mismatched input ';' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 26)

== SQL ==
/* SELECT /*+ HINT() 4; */;
--------------------------^^^

spark-sql> /* SELECT /*+ HINT() 4; */
         >         SELECT 1;
1
Time taken: 3.16 seconds, Fetched 1 row(s)
spark-sql> /* SELECT /*+ HINT() */ 4; */;
spark-sql>
         > ;
spark-sql>
         > /* SELECT /*+ HINT() 4\\;
         >         SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

== SQL ==
/* SELECT /*+ HINT() 4\\;
^^^
        SELECT 1;

spark-sql>

@github-actions github-actions bot added the SQL label Dec 6, 2021
@AngersZhuuuu
Copy link
Contributor Author

ping @cloud-fan

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Test build #145939 has finished for PR 34815 at commit e874bbd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50415/

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Test build #145941 has finished for PR 34815 at commit c67dcaf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50417/

@cloud-fan
Copy link
Contributor

what if the users want to write the queries in multiple lines? does our CLI support it?

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50417/

@AngersZhuuuu
Copy link
Contributor Author

AngersZhuuuu commented Dec 6, 2021

what if the users want to write the queries in multiple lines? does our CLI support it?

CLI will cut input when meet a un-escaped ;

spark-sql> select
         > 1,
         > 2,
         > 3,
         > 4;
1	2	3	4
Time taken: 3.442 seconds, Fetched 1 row(s)
spark-sql> select
         > 1,
         > 2 \\;
         > ;
Error in query:
extraneous input '\' expecting {<EOF>, ';'}(line 3, pos 2)

== SQL ==
select
1,
2 \;
--^^^

spark-sql>

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50415/

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Test build #145949 has finished for PR 34815 at commit e45b12d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

isStatement = statementInProgress(index)
}
if (isStatement) {
if (beginIndex < line.length()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this condition mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid pass an blank string to processLine, although in processLine it will handle this

      // we can not use "split" function directly as ";" may be quoted
      val commands = splitSemiColon(line).asScala
      var command: String = ""
      for (oneCmd <- commands) {
        if (StringUtils.endsWith(oneCmd, "\\")) {
          command += StringUtils.chop(oneCmd) + ";"
        } else {
          command += oneCmd
          if (!StringUtils.isBlank(command)) {
            val ret = processCmd(command)
            command = ""
            lastRet = ret
            val ignoreErrors = HiveConf.getBoolVar(conf, HiveConf.ConfVars.CLIIGNOREERRORS)
            if (ret != 0 && !ignoreErrors) {
              CommandProcessorFactory.clean(conf.asInstanceOf[HiveConf])
              return ret
            }
          }
        }
      }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also keep same cod with hive

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AngersZhuuuu @cloud-fan Wondering why do we keep the CLI in the hive-thriftserver package. It has nothing to do with ThriftServer.

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50424/

@SparkQA
Copy link

SparkQA commented Dec 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50424/

@cloud-fan
Copy link
Contributor

cloud-fan commented Dec 7, 2021

thanks, merging to master!

@cloud-fan cloud-fan closed this in d50d464 Dec 7, 2021
@juliuszsompolski
Copy link
Contributor

spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

 == SQL ==
 /* This is a comment without end symbol SELECT 1;
 ^^^

Can /* be the start of a multiline comment, in which case the behaviour should be

spark-sql> /* This is a comment without end symbol SELECT 1;
         > 

(waiting for next line of multiline input)?

@AngersZhuuuu
Copy link
Contributor Author

spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

 == SQL ==
 /* This is a comment without end symbol SELECT 1;
 ^^^

Can /* be the start of a multiline comment, in which case the behaviour should be

spark-sql> /* This is a comment without end symbol SELECT 1;
         > 

(waiting for next line of multiline input)?

If you write

spark-sql> /* This is a comment without end symbol SELECT 1\\;
         > 

it will wait for input.

If want

spark-sql> /* This is a comment without end symbol SELECT 1;
         > 

to wait for input, we need to change the console terminate logic.

cc @cloud-fan WDYT?

@cloud-fan
Copy link
Contributor

In the spark-sql console, ; has a special meaning and it means to terminate a sql statement (similar to ctrl+D in the paste mode). I don't have a strong opinion to change it, as writing comments in the console is not common anyway.

@juliuszsompolski
Copy link
Contributor

I don't have a strong opinion either, but in this case it is a ; inside a comment, and other examples here showed that ; inside a comment should not terminate command:

spark-sql> /* SELECT /*+ HINT() 4; */
         >         SELECT 1;

if the ; after HINT() does not terminate a command here,
then I think the ; at end of line in

spark-sql> /* This is a comment without end symbol SELECT 1;

also shouldn't?

I can imagine an edge case, where someone has a multi-line comment in their SQL that contains either some snippet of code with ; terminated lines, or some bullet points with some arguments terminated with ; that they copy paste into the console.
But I don't have a strong opinion and for the most part I think it's not a practical edge case...

@cloud-fan
Copy link
Contributor

Yea, maybe we should leave it as it is and document the behavior clearly in https://github.com/apache/spark/pull/34821/files

cloud-fan pushed a commit that referenced this pull request Jan 18, 2022
### What changes were proposed in this pull request?
In #34815 we change back support unclosed bracketed comment to backend.

But miss the case such as
```
SELECT 1; --comment
```

```
SELECT 1; /* comment */
```

It's a common use case in sql job. We should ignore the comment at end of SQL script.

Need to clarify that when use `-e`, we directly pass SQL to `splitSemiColon`, when use `-f`, CliDriver will add a `\n` for query.
```
  public int processReader(BufferedReader r) throws IOException {
    StringBuilder qsb = new StringBuilder();

    String line;
    while((line = r.readLine()) != null) {
      if (!line.startsWith("--")) {
        qsb.append(line + "\n");
      }
    }

    return this.processLine(qsb.toString());
  }
```

So in `splitSemiColon`, we should consider both case.

In this pr, the final behavior like below

For `-e`

| Query | Behavior before  | Behavior now|
-------|---------|--------------|
|   `SELECT 1; --comment` |  Will pass both `SELECT 1` and `--comment` to backend engine and throw exception since `--comment` can't be executed| Only pass `SELECT 1` to backend engine and will ignore the simple comment |
|   `SELECT 1;  /* comment */ ` |  Will pass both `SELECT 1` and `/* comment */` to backend engine and throw exception since `/* comment */` can't be executed |  Only pass `SELECT 1` to backend engine|
|   `SELECT 1;  /* comment ` | Will pass `SELECT 1` and `/* comment` to backend engine  |  Will pass `SELECT 1` and `/* comment` to backend engine  |
|   `SELECT 1; /* comment SELECT 1` | Will pass `SELECT 1` and `/* comment SELECT 1` to backend engine  |  Will pass `SELECT 1` and `/* comment SELECT 1` to backend engine  |
|   `/ * comment SELECT 1;`  |  Will pass `/ * comment SELECT 1;` to back end engine and throw `unclose bracketed comment exception`|  Will pass `/ * comment SELECT 1;` to back end engine and throw `unclose bracketed comment exception`|

For `-f`, since `-f` will add a `\n` at the end line if it's not started  as `--`
| Query | Behavior before  | Behavior now|
-------|---------|--------------|
|   `SELECT 1; --comment\n` |  Will pass both `SELECT 1` and `--comment` to backend engine and throw exception since `--comment` can't be executed| Only pass `SELECT 1` to backend engine and will ignore the simple comment |
|   `SELECT 1;  /* comment */ \n` |  Will pass both `SELECT 1` and `/* comment */` to backend engine and throw exception since `/* comment */` can't be executed |  Only pass `SELECT 1` to backend engine|
|   `SELECT 1;  /* comment \n` | Will pass `SELECT 1` and `/* comment\n` to backend engine  |  Will pass `SELECT 1` and `/* comment\n` to backend engine  |
|   `SELECT 1; /* comment SELECT 1\n` | Will pass `SELECT 1` and `/* comment SELECT 1\n` to backend engine  |  Will pass `SELECT 1` and `/* comment SELECT 1\n` to backend engine  |
|   `/ * comment SELECT 1;\n`  |  Will pass `/ * comment SELECT 1;\n` to back end engine and throw `unclose bracketed comment exception`|  Will pass `/ * comment SELECT 1;\n` to back end engine and throw `unclose bracketed comment exception`|

### Why are the changes needed?
Spark sql should not pass last entire comment   to backend

### Does this PR introduce _any_ user-facing change?
Use can write SQL script end with a comment
```
SELECT 1; --comment
```

```
SELECT 1; /* comment */
```

### How was this patch tested?
Added UT

Closes #35206 from AngersZhuuuu/SPARK-37906.

Lead-authored-by: Angerszhuuuu <[email protected]>
Co-authored-by: AngersZhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
)
}

test("SPARK-37555: spark-sql should pass last unclosed comment to backend") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AngersZhuuuu This test is flaky, fails quite often on repeated runs. Here's a sample error:

2021-12-08 12:01:27.68 - stderr> Setting default log level to "WARN".
2021-12-08 12:01:27.68 - stderr> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2021-12-08 12:01:39.459 - stderr> Spark master: local, Application Id: local-1638993689929
2021-12-08 12:01:40.688 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */;
2021-12-08 12:01:41.299 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */ SELECT 1;
2021-12-08 12:01:41.56 - stderr> Error in query: 
2021-12-08 12:01:41.56 - stderr> mismatched input ';' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 26)
2021-12-08 12:01:41.56 - stderr> 
2021-12-08 12:01:41.56 - stderr> == SQL ==
2021-12-08 12:01:41.56 - stderr> /* SELECT /*+ HINT() 4; */;
2021-12-08 12:01:41.56 - stderr> --------------------------^^^
2021-12-08 12:01:41.56 - stderr> 
2021-12-08 12:01:47.573 - stdout> 1
2021-12-08 12:01:47.573 - stderr> Time taken: 6.272 seconds, Fetched 1 row(s)
2021-12-08 12:01:47.592 - stdout> spark-sql> /* Here is a unclosed bracketed comment SELECT 1;
2021-12-08 12:01:47.601 - stderr> Error in query: 
2021-12-08 12:01:47.601 - stderr> Unclosed bracketed comment(line 1, pos 0)
2021-12-08 12:01:47.601 - stderr> 
2021-12-08 12:01:47.601 - stderr> == SQL ==
2021-12-08 12:01:47.601 - stderr> /* Here is a unclosed bracketed comment SELECT 1;
2021-12-08 12:01:47.601 - stderr> ^^^
2021-12-08 12:01:47.601 - stderr> 
2021-12-08 12:01:47.612 - stdout> spark-sql> /* SELECT /*+ HINT() */ 4; */;
2021-12-08 12:01:49.552 - stdout> spark-sql> 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run this many times. Not failed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fails on my PRs periodically too. For instance, see https://github.com/MaxGekk/spark/actions/runs/3383705774/jobs/5619867814:

[info] - SPARK-37555: spark-sql should pass last unclosed comment to backend *** FAILED *** (2 minutes, 8 seconds)
[info]   =======================
[info]   CliSuite failure output
[info]   =======================
[info]   Spark SQL CLI command line: ../../bin/spark-sql --master local --driver-java-options -Dderby.system.durability=test --conf spark.ui.enabled=false --hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/home/runner/work/spark/spark/target/tmp/spark-16a98152-adaa-4653-93d1-82425ddc6ed5;create=true --hiveconf hive.exec.scratchdir=/home/runner/work/spark/spark/target/tmp/spark-d3ccd810-41f0-4a29-9592-f8c08ad7f50a --hiveconf conf1=conftest --hiveconf conf2=1 --hiveconf hive.metastore.warehouse.dir=/home/runner/work/spark/spark/target/tmp/spark-c80a8015-a17d-422d-aeff-11be2a4b9986
[info]   Exception: java.util.concurrent.TimeoutException: Futures timed out after [2 minutes]
[info]   Failed to capture next expected output "Unclosed bracketed comment" within 2 minutes.
[info]   
[info]   2022-11-03 01:00:52.245 - stderr> Setting default log level to "WARN".
[info]   2022-11-03 01:00:52.245 - stderr> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[info]   2022-11-03 01:00:59.101 - stderr> Spark master: local, Application Id: local-1667462453894
[info]   2022-11-03 01:00:59.939 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */;
[info]   2022-11-03 01:01:00.378 - stderr> 
[info]   2022-11-03 01:01:00.379 - stderr> [PARSE_SYNTAX_ERROR] Syntax error at or near ';'(line 1, pos 26)
[info]   2022-11-03 01:01:00.379 - stderr> 
[info]   2022-11-03 01:01:00.379 - stderr> == SQL ==
[info]   2022-11-03 01:01:00.379 - stderr> /* SELECT /*+ HINT() 4; */;
[info]   2022-11-03 01:01:00.379 - stderr> --------------------------^^^
[info]   2022-11-03 01:01:00.379 - stderr> 
[info]   2022-11-03 01:01:00.401 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */ SELECT 1;
[info]   2022-11-03 01:01:02.78 - stdout> 1
[info]   2022-11-03 01:01:02.78 - stderr> Time taken: 2.384 seconds, Fetched 1 row(s)
[info]   2022-11-03 01:01:02.791 - stderr> 
[info]   2022-11-03 01:01:02.791 - stderr> Unclosed bracketed comment(line 1, pos 0)
[info]   2022-11-03 01:01:02.791 - stderr> 
[info]   2022-11-03 01:01:02.791 - stderr> == SQL ==
[info]   2022-11-03 01:01:02.791 - stderr> /* Here is a unclosed bracketed comment SELECT 1;
[info]   2022-11-03 01:01:02.791 - stderr> ^^^
[info]   2022-11-03 01:01:02.791 - stdout> spark-sql> /* Here is a unclosed bracketed comment SELECT 1;
[info]   2022-11-03 01:01:02.792 - stderr> 
[info]   2022-11-03 01:01:02.795 - stdout> spark-sql> /* SELECT /*+ HINT() */ 4; */;
[info]   2022-11-03 01:01:02.995 - stdout> spark-sql> 
[info]   ===========================
[info]   End CliSuite failure output
[info]   =========================== (CliSuite.scala:213)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)

Maybe we don't wait for 2 minutes in real life.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let me check again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey this also fails on my PR. Just FYI didn't meant to push!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AngersZhuuuu shall we increase the timeout? The github action machines are not stable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AngersZhuuuu shall we increase the timeout? The github action machines are not stable.

Yea, the logic should be ok...
Add below PR and we can trigger it many times to see if it's stable?
#38571

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants