python/pyspark/pandas/utils.py (16 changes: 4 additions & 12 deletions)
@@ -464,20 +464,12 @@ def is_testing() -> bool:
     return "SPARK_TESTING" in os.environ


-def default_session(conf: Optional[Dict[str, Any]] = None) -> SparkSession:
-    if conf is None:
-        conf = dict()
+def default_session() -> SparkSession:
Member:
I'm not sure we can remove the conf argument here? I guess we should show a deprecation warning if it's not None for now, and remove it in the future?

Member Author:
I think this isn't a documented API, so it should be fine.

Member:
I'd just leave it to you.
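For reference, a minimal sketch of the deprecation path the first comment suggests; the warning text and the FutureWarning category are illustrative, not part of this PR:

```python
import warnings
from typing import Any, Dict, Optional

from pyspark.sql import SparkSession


def default_session(conf: Optional[Dict[str, Any]] = None) -> SparkSession:
    if conf is not None:
        # Hypothetical transition warning; the PR as written removes the
        # argument outright instead of deprecating it.
        warnings.warn(
            "The `conf` argument of default_session is deprecated and will "
            "be removed; configure the session via SparkSession.builder "
            "instead.",
            FutureWarning,
        )
    spark = SparkSession.getActiveSession()
    if spark is not None:
        return spark
    return SparkSession.builder.appName("pandas-on-Spark").getOrCreate()
```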

     spark = SparkSession.getActiveSession()
     if spark is not None:
         return spark

     builder = SparkSession.builder.appName("pandas-on-Spark")
-    for key, value in conf.items():
-        builder = builder.config(key, value)
-    # Currently, pandas-on-Spark is dependent on such join due to 'compute.ops_on_diff_frames'
-    # configuration. This is needed with Spark 3.0+.
-    builder.config("spark.sql.analyzer.failAmbiguousSelfJoin", False)
Member Author:
In fact, we fixed this bug in the master branch, so we don't need to set this anymore.
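Not part of the PR, but for completeness: on an older Spark version where the ambiguous self-join bug is still present, the removed workaround can be applied by hand when building the session, using the same config key the diff drops:

```python
from pyspark.sql import SparkSession

# Manual equivalent of the workaround this PR removes: disable the
# ambiguous self-join analyzer check. Only relevant on Spark versions
# where the underlying bug is not yet fixed.
spark = (
    SparkSession.builder
    .appName("pandas-on-Spark")
    .config("spark.sql.analyzer.failAmbiguousSelfJoin", "false")
    .getOrCreate()
)
```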


-    if is_testing():
-        builder.config("spark.executor.allowSparkContext", False)
Member Author:
And this will be set separately when the SparkContext is created, not here.

Member Author:
And in fact, this was set when we ran tests with pytest, back when it was in the Koalas repo.

Member:
Actually, this was added for our tests, to check that our code doesn't create a SparkContext in executors. But we can remove it anyway, because the default value is False now.
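The diff doesn't show where the flag gets set during SparkContext creation; as a hypothetical sketch of that kind of test bootstrap (and since, per the comment above, the flag now defaults to False, making the explicit builder call redundant):

```python
from pyspark import SparkConf, SparkContext

# Hypothetical test bootstrap: explicitly forbid creating a SparkContext
# inside executors. With the default now being false, this setting is
# redundant, which is what lets the PR drop the builder.config(...) call.
conf = SparkConf().set("spark.executor.allowSparkContext", "false")
sc = SparkContext(conf=conf)
```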


     return builder.getOrCreate()
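With the conf argument gone, a caller that previously passed options through default_session would instead create or configure the session up front, so that getActiveSession() picks it up. A sketch, assuming the pyspark.pandas entry point; the shuffle-partitions key is just an illustrative option:

```python
from pyspark.sql import SparkSession

import pyspark.pandas as ps

# Create (or get) the session before any pandas-on-Spark operation; options
# set here take effect because default_session() returns the active session.
spark = (
    SparkSession.builder
    .appName("pandas-on-Spark")
    .config("spark.sql.shuffle.partitions", "8")  # illustrative option
    .getOrCreate()
)

psdf = ps.DataFrame({"a": [1, 2, 3]})  # runs on the session configured above
```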

