[SPARK-32165][SQL] Ensure Spark only initiates SharedState once across SparkSessions #35224
Conversation
|
Hey @cloud-fan - could I please get an "ok to test" here (and any thoughts you have about this approach)? Seems like this issue is hitting folks in production per comments on this: https://issues.apache.org/jira/browse/SPARK-32165. |
|
I thought the memory leak had been fixed by 4d90c5d? If there is a new memory leak, can you explain the object reference path from the GC root? |
|
Can one of the admins verify this patch? |
|
@cloud-fan It is an old memory leak, originally mentioned in PR#28128 and the comment. I reproduced the issue on Spark 3.2.0 by running the code from the comment, and I see the same result: |
|
Then can you explain more about how the memory leak happens, such as the object reference path and the GC root? The added listeners can be GCed after 4d90c5d, so I don't see how they can cause a memory leak again. It would be great if you could share a heap dump taken when the memory leak happens. |
|
@dnskr - I will defer to you here to provide this information given that it sounds like you are hitting this in prod and are experiencing production failures (I'm mostly operating off of the info from your comment on the jira ticket). I hadn't previously seen 4d90c5d, and it does look like that should resolve this issue, but if we have a case to the contrary in prod, we can definitely investigate further. If it's just the unit test from that other ticket that's failing, that's likely fine given that after 4d90c5d, the orphaned listeners will just be GCed away. This test replicates this behavior: spark/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala Line 49 in 661b80d
|
|
I haven't seen 4d90c5d either. The change fixes I have created a similar PR#35234. @cloud-fan @vinooganesh could you please have a look at it? Regarding the heap dump and related things, I don't have much experience with memory leak investigations, so I would highly appreciate it if you could help with that. |
|
@vinooganesh The related Jira ticket https://issues.apache.org/jira/browse/SPARK-32165 has been closed, so you can close the PR. |
|
Closing per comment in https://issues.apache.org/jira/browse/SPARK-32165. Thanks @cloud-fan ! |
What changes were proposed in this pull request?
The memory leak that was partially fixed in #28128 doesn't cover the case where sessionState is touched. From the initial description: "Once SessionState is touched, it will add two more listeners into the SparkContext, namely SQLAppStatusListener and ExecutionListenerBus."
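To make the quoted behavior concrete, here is a minimal toy model in plain Python, not Spark code. `ToyContext`, `ToySession`, and `listeners_after` are hypothetical names invented for illustration. The sketch assumes only what the quote states (touching session state registers two listeners on a shared, long-lived context) and does not model any later cleanup of orphaned listeners, such as the one discussed around 4d90c5d.

```python
class ToyContext:
    """Hypothetical stand-in for a long-lived context that owns a listener list."""

    def __init__(self):
        self.listeners = []


class ToySession:
    """Hypothetical stand-in for a session whose state is created lazily."""

    def __init__(self, context):
        self.context = context
        self._state = None

    @property
    def session_state(self):
        if self._state is None:
            # Assumption taken from the quote above: the first touch of the
            # session state registers two listeners on the shared context.
            self.context.listeners.append("SQLAppStatusListener")
            self.context.listeners.append("ExecutionListenerBus")
            self._state = object()
        return self._state


def listeners_after(n_sessions, touch_state):
    """Create n_sessions sessions on one context and return the listener count."""
    context = ToyContext()
    for _ in range(n_sessions):
        session = ToySession(context)
        if touch_state:
            session.session_state  # first access triggers registration
    return len(context.listeners)


if __name__ == "__main__":
    print(listeners_after(100, touch_state=False))
    print(listeners_after(100, touch_state=True))
```

In this model the sessions themselves go out of scope on every loop iteration, but the context keeps growing by two entries per session whose state was touched, which is the shape of growth the description attributes to the leak.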
Why are the changes needed?
This fixes a memory leak that can cause a Spark application to OOM if many Spark sessions are created.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test included as a part of the PR.