[SPARK-22016][SQL] Add HiveDialect for JDBC connection to Hive #19238
|
Can one of the admins verify this patch? |
|
Why not directly connecting to Hive metastore? |
|
@gatorsmile if Hive lies on the same infrastructure as the application, then the metastore should definitely solve the issue, but a connection over JDBC is needed when data comes from an external source which only exposes such a connection through its Hive server. We encountered this and ended up adding the HiveDialect to solve it. |
| assert(df3.collect() === Array(Row(21519, 1234))) | ||
| } | ||
| assert(df3.collect() === Array(Row(21519, 1234)) | ||
| ) |
This ')' is wrong. Lines 1105~1107 of the original have an indentation issue.
It must have changed when I formatted the code with the IDE. The Scalastyle checks passed, but let me roll that back anyway.
@dongjoon-hyun done! Thank you!
Ur, actually, I meant that the original Spark code is also wrong in terms of indentation. You can fix the indentation of the original lines 1105~1107 here. :)
@dongjoon-hyun You are right! I misread the parenthesis. I think it is correct now. Thank you for the observation :)
|
I can see the value, but it does not perform well in most cases if we use a JDBC connection. Instead of adding the extra dialect upstream, could you please add Hive as a separate data source? Thanks! |
|
Seems logical. Then, unless someone disagrees, feel free to close this PR and we will create a new spark package with this feature in a new repository. Thanks! |
|
This merge request would partly solve https://issues.apache.org/jira/browse/SPARK-21063 |
Closes apache#13794 Closes apache#18474 Closes apache#18897 Closes apache#18978 Closes apache#19152 Closes apache#19238 Closes apache#19295 Closes apache#19334 Closes apache#19335 Closes apache#19347 Closes apache#19236 Closes apache#19244 Closes apache#19300 Closes apache#19315 Closes apache#19356 Closes apache#15009 Closes apache#18253 Author: hyukjinkwon <[email protected]> Closes apache#19348 from HyukjinKwon/stale-prs.
What changes were proposed in this pull request?
Added a HiveDialect for JDBC connection to Hive.
It overrides two methods:
How was this patch tested?
The added tests pass, and the dialect was also used against a real Hive instance with real data.
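For readers who want to try the idea outside of Spark itself, as suggested in the discussion, here is a minimal sketch of what such a dialect could look like. The list of overridden methods is not preserved above, so this sketch assumes they are `canHandle` and `quoteIdentifier` from Spark's public `JdbcDialect` API. The object names `HiveDialectSketch` and `HiveDialectSketchDemo`, the `jdbc:hive2` URL prefix check, and the host name in the example URL are illustrative assumptions, not the code from this patch.

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical sketch, not the code from this PR.
object HiveDialectSketch extends JdbcDialect {

  // Assumption: HiveServer2 JDBC URLs start with "jdbc:hive2".
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:hive2")

  // Assumption: Hive expects backtick-quoted identifiers, whereas the
  // default JdbcDialect implementation wraps column names in double quotes.
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"
}

object HiveDialectSketchDemo {
  def main(args: Array[String]): Unit = {
    // Register the dialect so Spark can pick it for matching JDBC URLs.
    JdbcDialects.registerDialect(HiveDialectSketch)

    // "hive-host" is a placeholder host name.
    println(HiveDialectSketch.canHandle("jdbc:hive2://hive-host:10000/default"))
    println(HiveDialectSketch.quoteIdentifier("id"))
  }
}
```

Once registered, an ordinary `spark.read.jdbc(...)` call against a `jdbc:hive2` URL should pick up this dialect, provided a Hive JDBC driver is on the classpath. Keeping the dialect in a separate package, rather than upstream, matches the direction agreed on in the conversation.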