Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Oct 30, 2020

What changes were proposed in this pull request?

This patch proposes to disable the streaming query with possible correctness issue in chained stateful operators. The behavior can be controlled by a SQL config, so if users understand the risk and still want to run the query, they can disable the check.

Why are the changes needed?

The possible correctness in chained stateful operators in streaming query is not straightforward for users. From users perspective, it will be considered as a Spark bug. It is also possible the worse case, users are not aware of the correctness issue and use wrong results.

A better approach should be to disable such queries and let users choose to run the query if they understand there is such risk, instead of implicitly running the query and let users to find out correctness issue by themselves and report this known to Spark community.

Does this PR introduce any user-facing change?

Yes. Streaming query with possible correctness issue will be blocked to run, except for users explicitly disable the SQL config.

How was this patch tested?

Unit test.

@viirya
Copy link
Member Author

viirya commented Oct 30, 2020

cc @dongjoon-hyun @HeartSaVioR

@dongjoon-hyun
Copy link
Member

Thank you so much, @viirya !

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Oct 30, 2020

I'd rather try to get some sort of consensus via initiating the discussion around dev@ mailing list. I see Spark community does everything in the PR (even that requires some sort of consensus) which I don't think it's ideal and the result is from consensus among the narrow group.

To explain why I made it just logging instead of failing the query - I tried to get consensus around how to deal with this before:

I failed to get any voice except @gaborgsomogyi in #24890 and the approach wasn't radical so I did it.

This change may break some query which may work if end users are super careful and know in details and go ahead. (there's new config for sure though) I don't expect majority of end users could be, but just hypothetically thinking. I'm OK to disable such query, but I'm not 100% sure everyone is on the same page. (Someone might concern and you'd better to check that.)

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35080/

@HeartSaVioR
Copy link
Contributor

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35080/

@SparkQA

This comment has been minimized.

@SparkQA
Copy link

SparkQA commented Oct 31, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35087/

@SparkQA
Copy link

SparkQA commented Oct 31, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35087/

@SparkQA
Copy link

SparkQA commented Oct 31, 2020

Test build #130483 has finished for PR 30210 at commit d480632.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

cc @dbtsai , @gatorsmile since this is a correctness issue.

@viirya viirya changed the title [SPARK-33259][SS] Disable streaming query with possible correctness issue [SPARK-33259][SS] Disable streaming query with possible correctness issue by default Nov 1, 2020
@xuanyuanking
Copy link
Member

I think it's ok to change the original param to a SQL config for end users.

This change may break some query which may work if end users are super careful and know in details and go ahead.

+1 for this concern. So how about change the default value to false?

I failed to get any voice except @gaborgsomogyi in #24890 and the approach wasn't radical so I did it.

Actually, I reviewed that PR after merging 😂, thanks for the excellent doc!
@viirya qq: Do we have the real cases on enabling this config without correctness issues? It would be great to keep updating the document by providing demo cases and specific usage of this config.

@viirya
Copy link
Member Author

viirya commented Nov 2, 2020

I think it's ok to change the original param to a SQL config for end users.

This change may break some query which may work if end users are super careful and know in details and go ahead.

+1 for this concern. So how about change the default value to false?

I believe this is not the first change that may break some queries. We did some similar. For such changes, we provided some configs so users still can keep with legacy behavior if they want. This change basically follows this approach.

This involves correctness and may not be aware by users. Users need to be very careful to avoid the issue. I think we should provide a baseline which is definitely correct, and provide an option (the config) for users to run with correctness risk.

@viirya qq: Do we have the real cases on enabling this config without correctness issues? It would be great to keep updating the document by providing demo cases and specific usage of this config.

For outer join or aggregation, I think the risk of correctness is pretty high. FlatMapGroupsWithState, I am not sure, but I think it is possible to not emit late rows in the state function, maybe @HeartSaVioR has some real cases?

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Nov 3, 2020

No I don't have real case for knowing and taking the risk. Probably I could create some query which could evade the issue, but I agree that's more likely in theory and not real case.

Saying again I don't object the change. If you look back my proposal then you'll find blocking the query is also one of options in my proposal. My point was that such change warrants the discussion, ideally in dev@ mailing list instead of PR. We should avoid making an important decision in closed group.

@viirya
Copy link
Member Author

viirya commented Nov 3, 2020

Hmm, based on what I saw, it seems to me the discussion on dev@ mailing list is not so active, and the PR attracts more discussion in Spark community, but I'm okay to drop some words in dev@ mailing list and see if we can get some feedback.

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Nov 3, 2020

That's the chicken and egg problem, you know. dev@ list is not so active because all the important discussions aren't passing through the dev list, which is I think bad in perspective of "community over code". And only me and @xuanyuanking responded for the approach which is more likely just a regular SS PRs.

@viirya
Copy link
Member Author

viirya commented Nov 7, 2020

buildConf("spark.sql.streaming.statefulOperator.correctnessCheck")
.internal()
.doc("When true, the stateful operators for streaming query will be checked for possible " +
"correctness issue. Once the issue is detected, Spark will throw analysis exception. " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have more information about the correctness issue or point to somewhere that does so users can properly make a decision.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more info. Thanks.

@dongjoon-hyun
Copy link
Member

Retest this please

@dongjoon-hyun
Copy link
Member

cc @tdas , @zsxwing , @cloud-fan , @gatorsmile

@SparkQA
Copy link

SparkQA commented Nov 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35497/

@SparkQA
Copy link

SparkQA commented Nov 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35497/

@SparkQA
Copy link

SparkQA commented Nov 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35502/

@SparkQA
Copy link

SparkQA commented Nov 11, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35502/

@SparkQA
Copy link

SparkQA commented Nov 11, 2020

Test build #130891 has finished for PR 30210 at commit d480632.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 11, 2020

Test build #130896 has finished for PR 30210 at commit 1222f1e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Nov 12, 2020

Shall we finalize and merge this PR by address @HeartSaVioR 's comments, @viirya ?

@HeartSaVioR
Copy link
Contributor

Let's also mention the behavior change in ss-migration-guide.md

Let's make sure this review comment is also addressed as well. I just skipped mentioning it as it's already commented.

@viirya
Copy link
Member Author

viirya commented Nov 12, 2020

Shall we finalize and merge this PR by address @HeartSaVioR 's comments, @viirya ?

Yeah, I will address these comments.

val STATEFUL_OPERATOR_CORRECTNESS_CHECK_ENABLED =
buildConf("spark.sql.streaming.statefulOperator.correctnessCheck")
val STATEFUL_OPERATOR_CHECK_CORRECTNESS_ENABLED =
buildConf("spark.sql.streaming.statefulOperator.checkCorrectness.enabled")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the naming, cc @cloud-fan .

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35581/

@SparkQA

This comment has been minimized.

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35581/

Copy link
Contributor

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super picky on the config name, we were trying to get rid of extra .xxx. and nothing else uses .statefulOperator. but changing to combine with checkCorrectness seems very long so seems fine to me

@viirya
Copy link
Member Author

viirya commented Nov 12, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35620/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35620/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Test build #131014 has finished for PR 30210 at commit 5dff48f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @viirya and all!
Merged to master for Apache Spark 3.1.

@HeartSaVioR
Copy link
Contributor

Sorry to all for all noises. Please disregard all conversation. I'll remove them now.

@apache apache deleted a comment from dongjoon-hyun Nov 14, 2020
@apache apache deleted a comment from dongjoon-hyun Nov 14, 2020
@apache apache deleted a comment from dongjoon-hyun Nov 14, 2020
@apache apache deleted a comment from dongjoon-hyun Nov 14, 2020
@apache apache deleted a comment from dongjoon-hyun Nov 14, 2020
@apache apache deleted a comment from dongjoon-hyun Nov 14, 2020
@apache apache deleted a comment from HyukjinKwon Nov 14, 2020
@apache apache deleted a comment from HyukjinKwon Nov 14, 2020
@apache apache deleted a comment from dongjoon-hyun Nov 14, 2020
@apache apache deleted a comment from dongjoon-hyun Nov 14, 2020
@apache apache deleted a comment from dongjoon-hyun Nov 14, 2020
@apache apache deleted a comment from dongjoon-hyun Nov 14, 2020
@HeartSaVioR
Copy link
Contributor

I just initiated the discussion on dev@ mailing list which I should have been done instead.
https://lists.apache.org/thread.html/r30069e17f59e8d29267ae296d56840970905476019023f20164ee5a3%40%3Cdev.spark.apache.org%3E

My apologize to make noises and feel anyone unhappy.

@viirya viirya deleted the SPARK-33259 branch December 27, 2023 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants