[SPARK-42219][CORE] Introducing a config to close all active SparkContexts after the Main method has finished #39775
attilapiros wants to merge 8 commits into apache:master from
Conversation
dongjoon-hyun
left a comment
That's a bad example from the ancient YARN age, @attilapiros. I'd not reinforce those bad habits. Instead, I can give you counter-examples like
spark/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala, lines 88 to 90 (at commit aeb2a13)
@dongjoon-hyun I have moved the config into the k8s module.

cc @holdenk

This pyspark test failure is unrelated:

cc @HyukjinKwon, @srowen

SPARK-42698 (#40314) is aiming to expand the scope of stopping SparkContext after
srowen
left a comment
Should this be specific to Kubernetes?
Does it need to be a config or a method you can call?
Actually, why would you not kill the contexts after main exits in any case?
The original #32283 was Kubernetes specific. This PR just adds a new config that keeps the old behaviour as the default while making the new one available as well.
Unfortunately there is a use case for both behaviours. See the next point.
I bumped into this change when I analysed an application where Spark was used as a job server.

cc @mridulm

Our customer also encountered this issue recently. They are migrating their Spark job server (a Spring Boot application) from Spark 2.4 to Spark 3.
I am closing this PR. Job servers have the option to do a blocking call in the main method to avoid the auto-stopping of the active SparkContexts.
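The blocking-call workaround mentioned above can be sketched as follows. This is a minimal illustration only: `JobServerMain`, `shutdownLatch`, and `requestShutdown` are hypothetical names, and the Spark and Spring Boot parts are elided so the sketch stays self-contained.

```scala
import java.util.concurrent.CountDownLatch

// Minimal sketch of a job-server style main that blocks until shutdown
// is requested, so the JVM (and any SparkContext created elsewhere)
// survives after the setup code in main() finishes.
object JobServerMain {
  private val shutdownLatch = new CountDownLatch(1)

  // Called from, e.g., a shutdown HTTP endpoint or a signal handler.
  def requestShutdown(): Unit = shutdownLatch.countDown()

  def main(args: Array[String]): Unit = {
    // ... start the embedded web server and create the SparkContext here ...
    // Block here: main() only returns once shutdown has been requested,
    // so the launcher never reaches the point where it would stop the
    // active SparkContexts.
    shutdownLatch.await()
  }
}
```

With this pattern the lifetime of the contexts is controlled by the application itself, not by spark-submit's cleanup after main returns.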
What changes were proposed in this pull request?
Introducing a config to close all active SparkContexts after the Main method has finished.
Why are the changes needed?
We ran into errors after upgrading from Spark 3.1 to Spark 3.2, as the SparkContext got closed right after the application started. It turned out the root cause is SPARK-34674, which introduced the closing of the SparkContexts after the main method has finished. For details see #32283.
This application was a Spark job server built on top of Spring Boot, so all the job submits happened outside of the main method.
Does this PR introduce any user-facing change?
No. With the current default (true) it keeps the same behaviour as for YARN.
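For illustration only, a hedged sketch of how such a config might be set at submit time. The key name below is a hypothetical placeholder; the real key is defined in the PR's diff (in the Kubernetes module) and is not quoted in this conversation.

```properties
# Hypothetical key name, for illustration only; the real key lives in
# the PR's Kubernetes module.
# true (the default): stop all active SparkContexts after main() returns,
# matching the behaviour introduced by SPARK-34674.
# false: leave active SparkContexts running, for job-server style apps.
spark.kubernetes.submission.stopActiveSparkContexts=true
```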
How was this patch tested?
Manually.