-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend #34815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ping @cloud-fan |
|
Test build #145939 has finished for PR 34815 at commit
|
|
Kubernetes integration test starting |
|
Test build #145941 has finished for PR 34815 at commit
|
|
Kubernetes integration test starting |
|
what if the users want to write the queries in multiple lines? does our CLI support it? |
|
Kubernetes integration test status failure |
CLI will cut input when meet a un-escaped |
|
Kubernetes integration test status failure |
|
Test build #145949 has finished for PR 34815 at commit
|
| isStatement = statementInProgress(index) | ||
| } | ||
| if (isStatement) { | ||
| if (beginIndex < line.length()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this condition mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid pass an blank string to processLine, although in processLine it will handle this
// we can not use "split" function directly as ";" may be quoted
val commands = splitSemiColon(line).asScala
var command: String = ""
for (oneCmd <- commands) {
if (StringUtils.endsWith(oneCmd, "\\")) {
command += StringUtils.chop(oneCmd) + ";"
} else {
command += oneCmd
if (!StringUtils.isBlank(command)) {
val ret = processCmd(command)
command = ""
lastRet = ret
val ignoreErrors = HiveConf.getBoolVar(conf, HiveConf.ConfVars.CLIIGNOREERRORS)
if (ret != 0 && !ignoreErrors) {
CommandProcessorFactory.clean(conf.asInstanceOf[HiveConf])
return ret
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also keep same cod with hive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu @cloud-fan Wondering why do we keep the CLI in the hive-thriftserver package. It has nothing to do with ThriftServer.
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
thanks, merging to master! |
Can (waiting for next line of multiline input)? |
If you write it will wait for input. If want to wait for input, we need to change the console terminate logic. cc @cloud-fan WDYT? |
|
In the spark-sql console, |
|
I don't have a strong opinion either, but in this case it is a if the also shouldn't? I can imagine an edge case, where someone has a multi-line comment in their SQL that contains either some snippet of code with |
|
Yea, maybe we should leave it as it is and document the behavior clearly in https://github.com/apache/spark/pull/34821/files |
### What changes were proposed in this pull request? In #34815 we change back support unclosed bracketed comment to backend. But miss the case such as ``` SELECT 1; --comment ``` ``` SELECT 1; /* comment */ ``` It's a common use case in sql job. We should ignore the comment at end of SQL script. Need to clarify that when use `-e`, we directly pass SQL to `splitSemiColon`, when use `-f`, CliDriver will add a `\n` for query. ``` public int processReader(BufferedReader r) throws IOException { StringBuilder qsb = new StringBuilder(); String line; while((line = r.readLine()) != null) { if (!line.startsWith("--")) { qsb.append(line + "\n"); } } return this.processLine(qsb.toString()); } ``` So in `splitSemiColon`, we should consider both case. In this pr, the final behavior like below For `-e` | Query | Behavior before | Behavior now| -------|---------|--------------| | `SELECT 1; --comment` | Will pass both `SELECT 1` and `--comment` to backend engine and throw exception since `--comment` can't be executed| Only pass `SELECT 1` to backend engine and will ignore the simple comment | | `SELECT 1; /* comment */ ` | Will pass both `SELECT 1` and `/* comment */` to backend engine and throw exception since `/* comment */` can't be executed | Only pass `SELECT 1` to backend engine| | `SELECT 1; /* comment ` | Will pass `SELECT 1` and `/* comment` to backend engine | Will pass `SELECT 1` and `/* comment` to backend engine | | `SELECT 1; /* comment SELECT 1` | Will pass `SELECT 1` and `/* comment SELECT 1` to backend engine | Will pass `SELECT 1` and `/* comment SELECT 1` to backend engine | | `/ * comment SELECT 1;` | Will pass `/ * comment SELECT 1;` to back end engine and throw `unclose bracketed comment exception`| Will pass `/ * comment SELECT 1;` to back end engine and throw `unclose bracketed comment exception`| For `-f`, since `-f` will add a `\n` at the end line if it's not started as `--` | Query | Behavior before | Behavior now| -------|---------|--------------| | `SELECT 1; --comment\n` | Will pass both `SELECT 1` and `--comment` to backend engine and throw exception since `--comment` can't be executed| Only pass `SELECT 1` to backend engine and will ignore the simple comment | | `SELECT 1; /* comment */ \n` | Will pass both `SELECT 1` and `/* comment */` to backend engine and throw exception since `/* comment */` can't be executed | Only pass `SELECT 1` to backend engine| | `SELECT 1; /* comment \n` | Will pass `SELECT 1` and `/* comment\n` to backend engine | Will pass `SELECT 1` and `/* comment\n` to backend engine | | `SELECT 1; /* comment SELECT 1\n` | Will pass `SELECT 1` and `/* comment SELECT 1\n` to backend engine | Will pass `SELECT 1` and `/* comment SELECT 1\n` to backend engine | | `/ * comment SELECT 1;\n` | Will pass `/ * comment SELECT 1;\n` to back end engine and throw `unclose bracketed comment exception`| Will pass `/ * comment SELECT 1;\n` to back end engine and throw `unclose bracketed comment exception`| ### Why are the changes needed? Spark sql should not pass last entire comment to backend ### Does this PR introduce _any_ user-facing change? Use can write SQL script end with a comment ``` SELECT 1; --comment ``` ``` SELECT 1; /* comment */ ``` ### How was this patch tested? Added UT Closes #35206 from AngersZhuuuu/SPARK-37906. Lead-authored-by: Angerszhuuuu <[email protected]> Co-authored-by: AngersZhuuuu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
| ) | ||
| } | ||
|
|
||
| test("SPARK-37555: spark-sql should pass last unclosed comment to backend") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu This test is flaky, fails quite often on repeated runs. Here's a sample error:
2021-12-08 12:01:27.68 - stderr> Setting default log level to "WARN".
2021-12-08 12:01:27.68 - stderr> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2021-12-08 12:01:39.459 - stderr> Spark master: local, Application Id: local-1638993689929
2021-12-08 12:01:40.688 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */;
2021-12-08 12:01:41.299 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */ SELECT 1;
2021-12-08 12:01:41.56 - stderr> Error in query:
2021-12-08 12:01:41.56 - stderr> mismatched input ';' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 26)
2021-12-08 12:01:41.56 - stderr>
2021-12-08 12:01:41.56 - stderr> == SQL ==
2021-12-08 12:01:41.56 - stderr> /* SELECT /*+ HINT() 4; */;
2021-12-08 12:01:41.56 - stderr> --------------------------^^^
2021-12-08 12:01:41.56 - stderr>
2021-12-08 12:01:47.573 - stdout> 1
2021-12-08 12:01:47.573 - stderr> Time taken: 6.272 seconds, Fetched 1 row(s)
2021-12-08 12:01:47.592 - stdout> spark-sql> /* Here is a unclosed bracketed comment SELECT 1;
2021-12-08 12:01:47.601 - stderr> Error in query:
2021-12-08 12:01:47.601 - stderr> Unclosed bracketed comment(line 1, pos 0)
2021-12-08 12:01:47.601 - stderr>
2021-12-08 12:01:47.601 - stderr> == SQL ==
2021-12-08 12:01:47.601 - stderr> /* Here is a unclosed bracketed comment SELECT 1;
2021-12-08 12:01:47.601 - stderr> ^^^
2021-12-08 12:01:47.601 - stderr>
2021-12-08 12:01:47.612 - stdout> spark-sql> /* SELECT /*+ HINT() */ 4; */;
2021-12-08 12:01:49.552 - stdout> spark-sql>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have run this many times. Not failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It fails on my PRs periodically too. For instance, see https://github.com/MaxGekk/spark/actions/runs/3383705774/jobs/5619867814:
[info] - SPARK-37555: spark-sql should pass last unclosed comment to backend *** FAILED *** (2 minutes, 8 seconds)
[info] =======================
[info] CliSuite failure output
[info] =======================
[info] Spark SQL CLI command line: ../../bin/spark-sql --master local --driver-java-options -Dderby.system.durability=test --conf spark.ui.enabled=false --hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/home/runner/work/spark/spark/target/tmp/spark-16a98152-adaa-4653-93d1-82425ddc6ed5;create=true --hiveconf hive.exec.scratchdir=/home/runner/work/spark/spark/target/tmp/spark-d3ccd810-41f0-4a29-9592-f8c08ad7f50a --hiveconf conf1=conftest --hiveconf conf2=1 --hiveconf hive.metastore.warehouse.dir=/home/runner/work/spark/spark/target/tmp/spark-c80a8015-a17d-422d-aeff-11be2a4b9986
[info] Exception: java.util.concurrent.TimeoutException: Futures timed out after [2 minutes]
[info] Failed to capture next expected output "Unclosed bracketed comment" within 2 minutes.
[info]
[info] 2022-11-03 01:00:52.245 - stderr> Setting default log level to "WARN".
[info] 2022-11-03 01:00:52.245 - stderr> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[info] 2022-11-03 01:00:59.101 - stderr> Spark master: local, Application Id: local-1667462453894
[info] 2022-11-03 01:00:59.939 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */;
[info] 2022-11-03 01:01:00.378 - stderr>
[info] 2022-11-03 01:01:00.379 - stderr> [PARSE_SYNTAX_ERROR] Syntax error at or near ';'(line 1, pos 26)
[info] 2022-11-03 01:01:00.379 - stderr>
[info] 2022-11-03 01:01:00.379 - stderr> == SQL ==
[info] 2022-11-03 01:01:00.379 - stderr> /* SELECT /*+ HINT() 4; */;
[info] 2022-11-03 01:01:00.379 - stderr> --------------------------^^^
[info] 2022-11-03 01:01:00.379 - stderr>
[info] 2022-11-03 01:01:00.401 - stdout> spark-sql> /* SELECT /*+ HINT() 4; */ SELECT 1;
[info] 2022-11-03 01:01:02.78 - stdout> 1
[info] 2022-11-03 01:01:02.78 - stderr> Time taken: 2.384 seconds, Fetched 1 row(s)
[info] 2022-11-03 01:01:02.791 - stderr>
[info] 2022-11-03 01:01:02.791 - stderr> Unclosed bracketed comment(line 1, pos 0)
[info] 2022-11-03 01:01:02.791 - stderr>
[info] 2022-11-03 01:01:02.791 - stderr> == SQL ==
[info] 2022-11-03 01:01:02.791 - stderr> /* Here is a unclosed bracketed comment SELECT 1;
[info] 2022-11-03 01:01:02.791 - stderr> ^^^
[info] 2022-11-03 01:01:02.791 - stdout> spark-sql> /* Here is a unclosed bracketed comment SELECT 1;
[info] 2022-11-03 01:01:02.792 - stderr>
[info] 2022-11-03 01:01:02.795 - stdout> spark-sql> /* SELECT /*+ HINT() */ 4; */;
[info] 2022-11-03 01:01:02.995 - stdout> spark-sql>
[info] ===========================
[info] End CliSuite failure output
[info] =========================== (CliSuite.scala:213)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
Maybe we don't wait for 2 minutes in real life.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let me check again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey this also fails on my PR. Just FYI didn't meant to push!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu shall we increase the timeout? The github action machines are not stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu shall we increase the timeout? The github action machines are not stable.
Yea, the logic should be ok...
Add below PR and we can trigger it many times to see if it's stable?
#38571
What changes were proposed in this pull request?
In current spark-sql cli interface, if the end SQL is not a close comment, the SQL won't be passed to backend engine and just ignored. This caused a problem that if user write a SQL with wrong comment. It's just ignored and won't throw exception.
For example:
After this pr:
In SPARK-33100 add this change #29982
Hive related code
https://github.com/apache/hive/blob/1090c93b1a02d480bdee2af2cecf503f8a54efc6/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L488-L490
Why are the changes needed?
Exact exceptions are thrown for wrong statements, which is convenient for users to troubleshoot.
Does this PR introduce any user-facing change?
Yes, if user write a wrong comment in sql/sql file or query in the end.
Before it's just ignored since it's not a statement. Now it will be passed to backend engine and if the statement is not correct, it will throw SQL exception.
How was this patch tested?
added UT and test by handle.