[SPARK-33259][SS] Disable streaming query with possible correctness issue by default #30210

viirya · 2020-10-30T21:55:43Z

What changes were proposed in this pull request?

This patch proposes to disable, by default, streaming queries with a possible correctness issue in chained stateful operators. The behavior is controlled by a SQL config, so users who understand the risk and still want to run the query can disable the check.
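
To make the idea concrete, here is a minimal toy model of such a check, written in plain Scala with no Spark dependency. It is only a sketch: the `Op` type, the operator values, `findRiskyPair`, and `validate` are hypothetical names invented for illustration and are not the classes or logic this patch touches. It assumes a linear pipeline in which aggregation and stream-stream outer join (the two operators called out as high risk later in this thread) may emit late rows.

```scala
// Toy model only: every name here is hypothetical and not part of Spark.
object ChainedStatefulCheck {
  /** One operator in a linear pipeline, listed from source to sink. */
  final case class Op(name: String, stateful: Boolean, mayEmitLateRows: Boolean)

  val Source      = Op("source", stateful = false, mayEmitLateRows = false)
  val Project     = Op("project", stateful = false, mayEmitLateRows = false)
  // Assumption for this sketch: both stateful operators may emit late rows.
  val Aggregation = Op("aggregation", stateful = true, mayEmitLateRows = true)
  val OuterJoin   = Op("stream-stream outer join", stateful = true, mayEmitLateRows = true)

  /** First (upstream, downstream) pair in which a stateful operator runs
    * after an operator that may emit late rows, if any. */
  def findRiskyPair(pipeline: Seq[Op]): Option[(Op, Op)] = {
    val pairs = for {
      (down, i) <- pipeline.zipWithIndex
      if down.stateful
      up <- pipeline.take(i)
      if up.mayEmitLateRows
    } yield (up, down)
    pairs.headOption
  }

  /** Left(error) when the pipeline is blocked; Right(optional warning) when it may run. */
  def validate(pipeline: Seq[Op], checkEnabled: Boolean): Either[String, Option[String]] =
    findRiskyPair(pipeline) match {
      case Some((up, down)) =>
        val msg = s"possible correctness issue: '${down.name}' runs after '${up.name}'"
        if (checkEnabled) Left(msg) else Right(Some(msg))
      case None => Right(None)
    }

  def main(args: Array[String]): Unit = {
    val chained = Seq(Source, OuterJoin, Project, Aggregation)
    println(validate(chained, checkEnabled = true))  // blocked by default
    println(validate(chained, checkEnabled = false)) // allowed, with a warning
  }
}
```

The flag mirrors the behavior described above: the same detection runs either way, and the config only decides whether a detected pattern fails the query or is merely reported.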

Why are the changes needed?

The possible correctness issue in chained stateful operators in a streaming query is not obvious to users. From the user's perspective, it would be considered a Spark bug. Worse, users may not be aware of the correctness issue at all and end up using wrong results.

A better approach is to disable such queries and let users choose to run them if they understand the risk, instead of implicitly running the query and leaving users to discover the correctness issue by themselves and report this known issue to the Spark community.

Does this PR introduce any user-facing change?

Yes. Streaming queries with a possible correctness issue will be blocked from running unless users explicitly disable the SQL config.

How was this patch tested?

Unit test.

viirya · 2020-10-30T21:55:56Z

cc @dongjoon-hyun @HeartSaVioR

dongjoon-hyun · 2020-10-30T22:14:06Z

Thank you so much, @viirya !

HeartSaVioR · 2020-10-30T22:28:05Z

I'd rather try to reach some sort of consensus by initiating a discussion on the dev@ mailing list. I see the Spark community doing everything in PRs (even things that require some sort of consensus), which I don't think is ideal, and the result comes from consensus among a narrow group.

To explain why I made it just log instead of failing the query: I tried to get consensus on how to deal with this before:

I failed to get any voice except @gaborgsomogyi in #24890, and the approach wasn't radical, so I did it.

This change may break some queries that would work if end users are super careful, know the details, and go ahead. (There's a new config, for sure.) I don't expect the majority of end users to be that careful; I'm just thinking hypothetically. I'm OK with disabling such queries, but I'm not 100% sure everyone is on the same page. (Someone might be concerned, and you'd better check that.)

SparkQA · 2020-10-30T22:50:50Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35080/

HeartSaVioR · 2020-10-30T22:52:27Z

cc. @tdas @zsxwing @jose-torres @gaborgsomogyi @xuanyuanking

SparkQA · 2020-10-30T23:14:35Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35080/

SparkQA · 2020-10-31T03:06:27Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35087/

SparkQA · 2020-10-31T03:39:23Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35087/

SparkQA · 2020-10-31T06:40:57Z

Test build #130483 has finished for PR 30210 at commit d480632.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2020-10-31T18:06:44Z

cc @dbtsai , @gatorsmile since this is a correctness issue.

xuanyuanking · 2020-11-02T10:19:56Z

I think it's ok to change the original param to a SQL config for end users.

> This change may break some queries that would work if end users are super careful, know the details, and go ahead.

+1 for this concern. So how about changing the default value to false?

> I failed to get any voice except @gaborgsomogyi in #24890, and the approach wasn't radical, so I did it.

Actually, I reviewed that PR after merging 😂, thanks for the excellent doc!
@viirya qq: Do we have real cases of enabling this config without correctness issues? It would be great to keep updating the document by providing demo cases and specific usage of this config.

viirya · 2020-11-02T20:48:55Z

> I think it's ok to change the original param to a SQL config for end users.
> > This change may break some queries that would work if end users are super careful, know the details, and go ahead.
>
> +1 for this concern. So how about changing the default value to false?

I believe this is not the first change that may break some queries. We have made similar changes before. For such changes, we provided configs so that users can keep the legacy behavior if they want. This change basically follows that approach.

This involves correctness, and users may not be aware of it. Users need to be very careful to avoid the issue. I think we should provide a baseline that is definitely correct, and provide an option (the config) for users to run with the correctness risk.

> @viirya qq: Do we have real cases of enabling this config without correctness issues? It would be great to keep updating the document by providing demo cases and specific usage of this config.

For outer join or aggregation, I think the risk of correctness issues is pretty high. For FlatMapGroupsWithState, I am not sure, but I think it is possible to avoid emitting late rows in the state function. Maybe @HeartSaVioR has some real cases?
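
For illustration, the sketch below shows the shape of query under discussion: a stream-stream left outer join feeding a windowed aggregation in Append mode. The rate source, column names, watermark delays, and intervals are assumptions made up for this example and do not come from the PR. It needs the spark-sql dependency on the classpath. With the check enabled, a query of this shape is the kind that would be expected to be rejected when started, though the exact error text is not reproduced here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, window}

// Hypothetical example object; all column names and intervals are illustrative.
object ChainedStatefulQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("chained-stateful-sketch")
      .master("local[2]")
      .getOrCreate()

    // Two synthetic streams from the built-in rate source (columns: timestamp, value).
    val impressions = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
      .select(col("value").as("adId"), col("timestamp").as("impressionTime"))
      .withWatermark("impressionTime", "10 seconds")

    val clicks = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
      .select(col("value").as("clickAdId"), col("timestamp").as("clickTime"))
      .withWatermark("clickTime", "20 seconds")

    // First stateful operator: a stream-stream left outer join with a time-range condition.
    val joined = impressions.join(
      clicks,
      expr("adId = clickAdId AND clickTime >= impressionTime " +
        "AND clickTime <= impressionTime + interval 1 minute"),
      "leftOuter")

    // Second stateful operator downstream of the join: a windowed aggregation.
    val counts = joined.groupBy(window(col("impressionTime"), "1 minute")).count()

    // Starting the query is where plan-level checks would be expected to apply.
    val query = counts.writeStream.outputMode("append").format("console").start()
    query.awaitTermination()
  }
}
```

The concern is that unmatched rows from the outer join are only emitted once the watermark has passed them, so the downstream aggregation may treat them as late and drop them.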

HeartSaVioR · 2020-11-03T02:50:28Z

No, I don't have a real case of knowing and taking the risk. I could probably create some query that evades the issue, but I agree that's more theoretical than a real case.

To say it again, I don't object to the change. If you look back at my proposal, you'll find that blocking the query is also one of the options in it. My point was that such a change warrants discussion, ideally on the dev@ mailing list instead of in a PR. We should avoid making an important decision in a closed group.

viirya · 2020-11-03T16:52:42Z

Hmm, from what I have seen, discussion on the dev@ mailing list is not so active, and PRs attract more discussion in the Spark community, but I'm okay with dropping a note on the dev@ mailing list to see if we can get some feedback.

HeartSaVioR · 2020-11-03T21:05:06Z

That's the chicken-and-egg problem, you know. The dev@ list is not so active because the important discussions aren't passing through it, which I think is bad from the perspective of "community over code". And only @xuanyuanking and I responded on the approach, which is more like just a regular SS PR.

viirya · 2020-11-07T07:01:15Z

Posted to the dev mailing list: http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Disable-streaming-query-with-possible-correctness-issue-by-default-td30380.html

tgravescs · 2020-11-10T14:13:35Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

+    buildConf("spark.sql.streaming.statefulOperator.correctnessCheck")
+      .internal()
+      .doc("When true, the stateful operators for streaming query will be checked for possible " +
+        "correctness issue. Once the issue is detected, Spark will throw analysis exception. " +


This should have more information about the correctness issue, or point to somewhere that does, so users can properly make a decision.

viirya · Nov 11, 2020

Added more info. Thanks.

dongjoon-hyun · 2020-11-10T23:00:24Z

Retest this please

dongjoon-hyun · 2020-11-10T23:02:14Z

cc @tdas , @zsxwing , @cloud-fan , @gatorsmile

SparkQA · 2020-11-10T23:41:12Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35497/

SparkQA · 2020-11-11T00:11:44Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35497/

SparkQA · 2020-11-11T01:24:37Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35502/

SparkQA · 2020-11-11T01:45:37Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35502/

SparkQA · 2020-11-11T03:29:54Z

Test build #130891 has finished for PR 30210 at commit d480632.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-11-11T05:21:47Z

Test build #130896 has finished for PR 30210 at commit 1222f1e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2020-11-12T06:20:23Z

Shall we finalize and merge this PR by addressing @HeartSaVioR's comments, @viirya?

HeartSaVioR · 2020-11-12T06:25:13Z

> Let's also mention the behavior change in ss-migration-guide.md

Let's make sure this review comment is addressed as well. I skipped mentioning it again as it's already been commented.

viirya · 2020-11-12T06:42:24Z

> Shall we finalize and merge this PR by addressing @HeartSaVioR's comments, @viirya?

Yeah, I will address these comments.

dongjoon-hyun · 2020-11-12T07:08:48Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

-  val STATEFUL_OPERATOR_CORRECTNESS_CHECK_ENABLED =
-    buildConf("spark.sql.streaming.statefulOperator.correctnessCheck")
+  val STATEFUL_OPERATOR_CHECK_CORRECTNESS_ENABLED =
+    buildConf("spark.sql.streaming.statefulOperator.checkCorrectness.enabled")


For the naming, cc @cloud-fan .

SparkQA · 2020-11-12T08:00:24Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35581/

SparkQA · 2020-11-12T08:23:38Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35581/

tgravescs

I'm not super picky about the config name. We were trying to get rid of extra .xxx. segments, and nothing else uses .statefulOperator., but combining it with checkCorrectness would be very long, so this seems fine to me.

viirya · 2020-11-12T17:20:27Z

retest this please

SparkQA · 2020-11-12T18:15:08Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35620/

SparkQA · 2020-11-12T18:37:09Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35620/

SparkQA · 2020-11-12T22:20:35Z

Test build #131014 has finished for PR 30210 at commit 5dff48f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

+1, LGTM. Thank you, @viirya and all!
Merged to master for Apache Spark 3.1.
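
For readers who understand the risk and want to opt out, a minimal sketch of setting the config follows. The key is the renamed one from the SQLConf diff above; the object name, application name, and local master are assumptions for illustration, and the snippet needs the spark-sql dependency on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object showing one way to set the config from this PR.
object OptOutOfCorrectnessCheck {
  private val CheckKey = "spark.sql.streaming.statefulOperator.checkCorrectness.enabled"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("opt-out-of-correctness-check") // illustrative name
      .master("local[2]")
      // "false" disables the check, accepting the correctness risk discussed above.
      .config(CheckKey, "false")
      .getOrCreate()

    // Read the value back to confirm what the session is using.
    println(s"$CheckKey = ${spark.conf.get(CheckKey)}")
    spark.stop()
  }
}
```

Leaving the key unset keeps the new default, in which queries with the detected pattern are blocked.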

HeartSaVioR · 2020-11-14T05:40:36Z

Sorry to all for all the noise. Please disregard the whole conversation. I'll remove it now.

HeartSaVioR · 2020-11-14T05:45:36Z

I just initiated the discussion on the dev@ mailing list, which I should have done in the first place.
https://lists.apache.org/thread.html/r30069e17f59e8d29267ae296d56840970905476019023f20164ee5a3%40%3Cdev.spark.apache.org%3E

My apologies for making noise and making anyone feel unhappy.

Disable streaming query with possible correctness issue.

4862499

Fix.

d480632

viirya changed the title from "[SPARK-33259][SS] Disable streaming query with possible correctness issue" to "[SPARK-33259][SS] Disable streaming query with possible correctness issue by default" on Nov 1, 2020

tgravescs reviewed Nov 10, 2020

Add more info.

1222f1e

For review comments.

5dff48f

dongjoon-hyun reviewed Nov 12, 2020

tgravescs approved these changes Nov 12, 2020

dongjoon-hyun approved these changes Nov 12, 2020

dongjoon-hyun closed this in 2c64b73 Nov 12, 2020

apache deleted a comment from dongjoon-hyun Nov 14, 2020

apache deleted a comment from HyukjinKwon Nov 14, 2020

apache deleted a comment from dongjoon-hyun Nov 14, 2020

viirya deleted the SPARK-33259 branch December 27, 2023 18:28
