@cloud-fan
Contributor

@cloud-fan cloud-fan commented Apr 3, 2017

What changes were proposed in this pull request?

This is a follow-up of #17285.

How was this patch tested?

existing tests

@cloud-fan
Contributor Author

cc @hvanhovell @gatorsmile

Contributor Author

cc @ueshin

Contributor Author

Most of the content in this file is copied from SQLConf, except the lazy val for session local timezone, and the 3 copy methods here.

Contributor Author

copied from object SQLConf

Contributor Author

moved to CatalystConf.STATIC and SQLConf.STATIC

@SparkQA

SparkQA commented Apr 3, 2017

Test build #75485 has finished for PR 17521 at commit ef0bb16.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 3, 2017

Test build #75486 has finished for PR 17521 at commit eafa58c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 3, 2017

Test build #75489 has finished for PR 17521 at commit 709b3a0.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 3, 2017

Test build #75491 has finished for PR 17521 at commit 107bfd4.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 3, 2017

Test build #75492 has finished for PR 17521 at commit 32aaf63.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Apr 3, 2017

To be clear, I don't think we should have two separate places to define config entries. If this is what the pr is doing, I strongly veto.

I get that there is a small incremental improvement we can do by properly bucketing things into logical vs physical components, but I don't think it is worth the overhead of us needing to think about where a config is each time we want to look it up ...

@cloud-fan cloud-fan changed the title [SPARK-20204][SQL] separate SQLConf into catalyst confs and sql confs [SPARK-20204][SQL] remove SimpleCatalystConf and CatalystConf type alias Apr 4, 2017
@rxin
Contributor

rxin commented Apr 4, 2017

LGTM

@SparkQA

SparkQA commented Apr 4, 2017

Test build #75510 has finished for PR 17521 at commit 1616eb2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Apr 4, 2017

Merging in master.

@asfgit asfgit closed this in 402bf2a Apr 4, 2017
@nsyca
Contributor

nsyca commented Apr 4, 2017

After merging my local branch up to this PR, I ran some of the regression tests from a machine in the Eastern time zone and observed the following failures:

[info] *** 35 TESTS FAILED ***
[error] Failed: Total 3094, Failed 35, Errors 0, Passed 3059, Ignored 54
[error] Failed tests:
[error] 	org.apache.spark.sql.SQLQuerySuite
[error] 	org.apache.spark.sql.DateFunctionsSuite
[error] 	org.apache.spark.sql.SQLQueryTestSuite
[error] 	org.apache.spark.sql.execution.datasources.csv.CSVSuite
[error] 	org.apache.spark.sql.DataFrameSuite
[error] 	org.apache.spark.sql.sources.PartitionedWriteSuite
[error] 	org.apache.spark.sql.JsonFunctionsSuite
[error] 	org.apache.spark.sql.execution.datasources.json.JsonSuite
[error] 	org.apache.spark.sql.execution.datasources.parquet.ParquetQuerySuite
[error] (sql/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 1132 s, completed 4-Apr-2017 6:09:08 PM

One failure is SQLQuerySuite/SPARK-3173 Timestamp support in the parser. Once I changed my machine to the Pacific time zone, the failure went away.

I am not making a statement that this PR causes the above failures. It could be something before this PR. I just want to share this information.

@rxin
Contributor

rxin commented Apr 4, 2017

@nsyca can you look into it?

@nsyca
Contributor

nsyca commented Apr 4, 2017

I will investigate. I will start from the last known-good point at which I merged my private branch with master and work forward from there.

@cloud-fan
Contributor Author

@nsyca can you try marking SQLConf.SESSION_LOCAL_TIMEZONE as a lazy val? I think the issue is that once object SQLConf is instantiated, the default value for SESSION_LOCAL_TIMEZONE is fixed, so changing the timezone later in the test has no effect.
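
The initialization-order effect described in this comment can be reproduced outside Spark. The following is a minimal sketch, assuming a hypothetical `DemoConf` object (it is not Spark's `SQLConf`) whose members stand in for a config default: a plain `val` is evaluated when the object is first initialized, while a `lazy val` waits for its own first access.

```scala
import java.util.TimeZone

// Hypothetical stand-in for a config registry; this is NOT Spark's SQLConf.
object DemoConf {
  // Evaluated once, when DemoConf is first initialized.
  val eagerZone: String = TimeZone.getDefault.getID
  // Evaluated once, on the first access to lazyZone itself.
  lazy val lazyZone: String = TimeZone.getDefault.getID
}

object TimeZoneCaptureDemo {
  // Returns (eager value at init, eager value after the change, lazy value).
  def run(): (String, String, String) = {
    val original = TimeZone.getDefault
    try {
      TimeZone.setDefault(TimeZone.getTimeZone("America/New_York"))
      val atInit = DemoConf.eagerZone // the first touch initializes DemoConf
      // A test harness changes the JVM default only afterwards.
      TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
      (atInit, DemoConf.eagerZone, DemoConf.lazyZone)
    } finally {
      TimeZone.setDefault(original) // restore the JVM default
    }
  }
}
```

In this sketch the eager member keeps the zone that was current when the object was initialized, and the lazy member picks up the later one. A `lazy val` only postpones the capture: after its first read it is just as fixed as the eager one.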

@nsyca
Contributor

nsyca commented Apr 5, 2017

@dilipbiswal has narrowed it down: this PR is what changes the behaviour. He will continue to investigate and will post an update in the next hour or so before he calls it a day.

@dilipbiswal
Contributor

dilipbiswal commented Apr 5, 2017

@cloud-fan @nsyca Thanks! Making it lazy works for the test cases I have tried. I am running the full tests now.

@dilipbiswal
Contributor

@cloud-fan @nsyca A quick update: I ran the problematic tests, and they pass when the time zone setting code is moved to PlanTest.scala, just before we create the SQLConf, as follows:

  // Timezone is fixed to America/Los_Angeles for those timezone sensitive tests (timestamp_*)
  TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
  // Add Locale setting
  Locale.setDefault(Locale.US)

  protected val conf = new SQLConf().copy(SQLConf.CASE_SENSITIVE -> true)

Please let me know what you think.

@gatorsmile
Member

gatorsmile commented Apr 5, 2017

SESSION_LOCAL_TIMEZONE sounds like a static SQLConf. Do we allow users to change it at runtime? It sounds like our code does not allow users to change or refresh it.

Update: nvm. The purpose of this configuration is session specific.

@cloud-fan
Contributor Author

Also cc @ueshin. I think the default value of SESSION_LOCAL_TIMEZONE should always be the current timezone in the JVM. We may need something like ConfigEntryWithDefaultFunction, so that the default value is not fixed when the conf entry is created.
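
To make the idea concrete, here is a rough sketch of a config entry whose default is a function rather than a stored value. Every name in it (`Entry`, `FixedDefaultEntry`, `FunctionDefaultEntry`, `SimpleConf`, the `demo.*` keys) is hypothetical and chosen for illustration; it does not reproduce Spark's config classes or their signatures.

```scala
import java.util.TimeZone

// Hypothetical, simplified model of config entries; not Spark's actual classes.
sealed trait Entry[T] {
  def key: String
  def defaultValue: T
}

// The default is computed once, when the entry is created.
final case class FixedDefaultEntry[T](key: String, value: T) extends Entry[T] {
  def defaultValue: T = value
}

// The default is recomputed on every lookup that falls back to it.
final class FunctionDefaultEntry[T](val key: String, defaultFn: () => T) extends Entry[T] {
  def defaultValue: T = defaultFn()
}

// Immutable conf: explicit settings win, otherwise the entry's default is used.
final class SimpleConf(settings: Map[String, Any] = Map.empty) {
  def get[T](entry: Entry[T]): T =
    settings.get(entry.key).map(_.asInstanceOf[T]).getOrElse(entry.defaultValue)

  def set[T](entry: Entry[T], value: T): SimpleConf =
    new SimpleConf(settings + (entry.key -> value))
}

object DefaultFunctionDemo {
  // Returns (fixed default, function default, explicitly set value).
  def run(): (String, String, String) = {
    val original = TimeZone.getDefault
    try {
      TimeZone.setDefault(TimeZone.getTimeZone("America/New_York"))
      val fixed = FixedDefaultEntry("demo.fixedZone", TimeZone.getDefault.getID)
      val dynamic = new FunctionDefaultEntry[String](
        "demo.dynamicZone", () => TimeZone.getDefault.getID)
      val conf = new SimpleConf()

      // The JVM default changes after both entries already exist.
      TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
      (conf.get(fixed), conf.get(dynamic), conf.set(dynamic, "UTC").get(dynamic))
    } finally {
      TimeZone.setDefault(original) // restore the JVM default
    }
  }
}
```

With this shape, an entry created early no longer pins the default, while an explicitly set value still takes precedence over whatever the function would return.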

@gatorsmile
Member

These test cases are timezone sensitive, right? If so, the changes made by @dilipbiswal are reasonable to me.

@ueshin
Member

ueshin commented Apr 5, 2017

I agree with @cloud-fan's suggestion, the ConfigEntryWithDefaultFunction approach. Ideally, the default value would be fixed when the SQLConf instance is created, using the JVM's timezone at that moment.

@gatorsmile
Member

@dilipbiswal Can you please try it based on what @cloud-fan and @ueshin suggested? Does it resolve the issue you reported?

@dilipbiswal
Contributor

dilipbiswal commented Apr 5, 2017

@gatorsmile @cloud-fan @ueshin Sorry, I was in transit from work. Sure, I will give it a try. However, I wanted to understand this a bit more. In my understanding, the problem we are trying to address exists only in our test environment. As part of the test infrastructure, we keep the timezone constant because the tests contain hardcoded timestamp literals, and for row comparisons to work the timezone must match the one those literals assume. Today, we do that in QueryTest.scala here

However, due to the change in this PR, this happens a bit too late, because we have already created a cloned copy of SQLConf in PlanTest.scala (the superclass of QueryTest.scala) in here. So, in my understanding, it is this copy step where the configurations are used for the first time and the singleton object SQLConf is populated. Previously, the Analyzer, Optimizer, etc. instantiated a new CatalystConf class and saw the updated default timezone. Please correct me if I am wrong. That is the reason I suggested moving the timezone setting up into PlanTest, before the place where SQLConf is cloned.

I will study the ConfigEntryWithDefault approach in the meantime.
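
The ordering argument in this comment rests on how Scala runs constructor bodies: a superclass body finishes before the subclass body starts. Below is a small sketch of that rule with hypothetical classes (`LateBase`, `LateSuite`, `EarlyBase`, `EarlySuite`) that only loosely mirror a base suite and its subclass; they are not Spark's `PlanTest` or `QueryTest`.

```scala
import java.util.TimeZone

// Hypothetical stand-ins for a base suite and its subclass; not Spark's test classes.
abstract class LateBase {
  // The superclass body runs first, so this sees the default in effect at that point.
  val zoneSeenAtConfCreation: String = TimeZone.getDefault.getID
}

class LateSuite extends LateBase {
  // Runs only after LateBase's body has finished: too late for the val above.
  TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
}

abstract class EarlyBase {
  // Setting the default before the dependent val fixes the ordering.
  TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
  val zoneSeenAtConfCreation: String = TimeZone.getDefault.getID
}

class EarlySuite extends EarlyBase

object InitOrderDemo {
  // Returns (zone seen by the late variant, zone seen by the early variant).
  def run(): (String, String) = {
    val original = TimeZone.getDefault
    try {
      // Pretend the machine runs in the Eastern time zone.
      TimeZone.setDefault(TimeZone.getTimeZone("America/New_York"))
      val late = new LateSuite().zoneSeenAtConfCreation

      TimeZone.setDefault(TimeZone.getTimeZone("America/New_York"))
      val early = new EarlySuite().zoneSeenAtConfCreation
      (late, early)
    } finally {
      TimeZone.setDefault(original) // restore the JVM default
    }
  }
}
```

Under these assumptions the late variant still observes the machine's zone, and only the early variant observes the zone the tests expect.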

@nsyca
Contributor

nsyca commented Apr 5, 2017

Just to add another viewpoint on this incident.

If those failing test cases had been written with the time zone fixed to a region other than the Pacific time zone, they would have failed fast on the first run on the regression machines on the West Coast. We should have picked a time zone such as GMT+1, one that differs both from the GMT used by java.sql.Timestamp and from the local time zone of the regression machines.
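
As a quick illustration of why the choice of test zone matters, the sketch below renders one fixed instant (the Unix epoch) in several zones using `java.time`. The object and method names are made up for the example; the point is only that an expected wall-clock value written for one zone matches no other zone, so a zone that no machine uses by default exposes a missing timezone setup immediately.

```scala
import java.time.{Instant, ZoneId}

object ZoneRenderingDemo {
  // Hour of day at which the Unix epoch (1970-01-01T00:00:00Z) falls in the given zone.
  def epochHourIn(zoneId: String): Int =
    Instant.EPOCH.atZone(ZoneId.of(zoneId)).getHour

  // Same instant, different wall-clock hour per zone.
  def run(): Map[String, Int] =
    Seq("UTC", "America/Los_Angeles", "America/New_York", "GMT+1")
      .map(zone => zone -> epochHourIn(zone))
      .toMap
}
```

A hardcoded literal that assumes Pacific time happens to pass on a Pacific machine even when the suite never sets the zone; with GMT+1 the mismatch would show up on any machine that is not itself in GMT+1.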

@viirya
Member

viirya commented Apr 5, 2017

The ConfigEntryWithDefaultFunction approach sounds good to me.

@dilipbiswal
Contributor

@viirya Thanks! I had assumed the problem was limited to tests, and that changing the timezone on the fly was not a realistic scenario. Looks like it is :-)

I will submit a PR with wenchen's suggestion.

@viirya
Member

viirya commented Apr 5, 2017

The PR is at #17537.
