Skip to content

Conversation

@vinooganesh
Copy link
Contributor

What changes were proposed in this pull request?

The memory leak that was partially fixed in #28128 doesn't cover the case where sessionState is touched. From the initial description: "Once SessionState is touched, it will add two more listeners into the SparkContext, namely SQLAppStatusListener and ExecutionListenerBus."

Why are the changes needed?

This fixes a memory leak that that can cause a spark application to oom if many spark sessions are created

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test included as a part of the PR.

@github-actions github-actions bot added the SQL label Jan 17, 2022
@vinooganesh
Copy link
Contributor Author

Hey @cloud-fan - could I please get an "ok to test" here (and any thoughts you have about this approach)? Seems like this issue is hitting folks in production per comments on this: https://issues.apache.org/jira/browse/SPARK-32165.

@cloud-fan
Copy link
Contributor

I thought the memory lead has been fixed by 4d90c5d ?

If there is a new memory leak, can you explain the object reference path from the GC root?

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@dnskr
Copy link

dnskr commented Jan 17, 2022

@cloud-fan It is old memory leak and originally it was mentioned in the PR#28128 and comment.

I reproduce the issue on Spark 3.2.0 by running the code from the comment and I see the same result:
SQLAppStatusListener and ExecutionListenerBus added each time called

    SparkSession
      .builder()
      .sparkContext(context)
      .master("local")
      .getOrCreate()
      .sessionState // this touches the sessionState

@cloud-fan
Copy link
Contributor

Then can you explain more about how the memory leak happens? such as the object references path and the GC root? The added listeners can be GCed after 4d90c5d so I don't see how it can cause memory leak again.

It will be great if you can share the heap dump when the memory leak happens.

@vinooganesh
Copy link
Contributor Author

@dnskr - I will defer to you here to provide this information given that it sounds like you are hitting this in prod and are experiencing production failures (I'm mostly operating off of the info from your comment on the jira ticket). I hadn't previously seen 4d90c5d, and it does look like that should resolve this issue, but if we have a case to the contrary in prod, we can definitely investigate further.

If it's just the unit test from that other ticket that's failing, that's likely fine given that after 4d90c5d, the orphaned listeners will just be GCed away. This test replicates this behavior:

test("SPARK-34087: Fix memory leak of ExecutionListenerBus") {

@dnskr
Copy link

dnskr commented Jan 18, 2022

I haven't seen 4d90c5d either. The change fixes ExecutionListenerBus memory leak but not the memory leak of SQLAppStatusListener in case when new SessionState is touched.

I have created similar PR#35234. @cloud-fan @vinooganesh could you please have a look into it?

Regarding heap dump and related things, I don't have good experience with memory leak investigations so would highly appreciate if you could help with it.

@dnskr
Copy link

dnskr commented Jan 19, 2022

@vinooganesh Related jira ticket https://issues.apache.org/jira/browse/SPARK-32165 has been closed so you can close the PR.

@vinooganesh
Copy link
Contributor Author

Closing per comment in https://issues.apache.org/jira/browse/SPARK-32165. Thanks @cloud-fan !

@vinooganesh vinooganesh deleted the vinooganesh/SPARK-32165 branch January 19, 2022 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants