-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-29644][SQL] Corrected ShortType and ByteType mapping to SmallInt and TinyInt in JDBCUtils #26301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
[SPARK-29644][SQL] Corrected ShortType and ByteType mapping to SmallInt and TinyInt in JDBCUtils #26301
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
e13326d
Fixed ShortType wrongly set as Int in JDBCUtils.scala. Added test cas…
shivsood 395fe99
fix for ShortType and ByteType in JDBCUtils.scala, Unit test cases in…
shivsood 66dfd4c
test fixes
shivsood 9cdad2b
fixed test failure in JDBCSuite.scala to get tinyint and smallint as …
shivsood 0600894
removed redundant test cases.
shivsood File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto - #26549 (comment). Can anyone explain why it was possible, and how do we handle unsigned cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Continuing from #26549 (comment) : TINYINT looks like it's a single byte alright, so using ByteType is reasonable. However it looks like it's treated as signed by some but not all DBMSes. Is it unsigned in SQL Server?
Just checking: these types like TINYINT and SMALLINT are not standard types, although widely supported, right? should these types be used by default for all JDBC sources?
Yeah I have some more doubts now that the TINYINT issue was pointed out. @shivsood
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. @gatorsmile , @HyukjinKwon , @srowen . I overlooked that mismatch between TINYINT vs Byte in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SMALLINTis widely supported because it's SQL-92. For TINYINT, I agree that we need to revert partially for that type.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all for pointing out these issues. I had overlooked handling of unsigned cases and the fact that each database may define on its own. I think the problem exists for both SMALLINT and TINYINT.
My understanding is that there are no unsigned type in Spark. c.f. https://spark.apache.org/docs/latest/sql-reference.html. Is that assertion right?
Do we have a test for an integer where integer value is 4294967295? Per MySQL documentation that's possible that an unsigned integer will have that value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior would be as follows.
Overwrite scenario
Read scenario
If an existing table in DBMSS has type TinyInt, a read in Spark would results in a ShortType. Because ShortType range in Spark in -32786 to +32768, DBMSS signed value -127 to +127 and unsigned range of 0 to 255 will be handled.
If an existing table in DBMSS has type SmallInt, a read would result in Spark dataframe having a column type as Integer. Because Integer range in Spark in -2147483648 to +2147483648, DBMSS signed value --32768 to +32768 as well as unsigned range of 0 to 65535 will be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have to use a widening conversion both ways. That's the safest thing to do, I guess, and less of a change from the current behavior, where bytes go all the way to ints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, I see. right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, thanks for the explanation, @shivsood
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised PR #27172 with the proposed fix. ShortType is unchanged, only ByteType fix is modified to map to ShortType on the read path so enable support for 0 to 255 range.
@maropu @srowen @HyukjinKwon as FYI. Thanks