[SPARK-32576][SQL] Support PostgreSQL bpchar type and array of char type
#29192
Conversation
|
Could you add tests in |
|
So, I think it's okay to add it for better interoperability. |
|
btw, thanks for your first contribution, @kujon ! |
|
ok to test |
will do! |
|
Test build #126374 has finished for PR 29192 at commit
|
Thank you for your first contribution, @kujon . Also, thank you for trying to add a test into PostgresIntegrationSuite.
Could you revise the PR description to be complete without referencing JIRA? The PR description will become a permanent commit log and will be read many more times.
|
@kujon Any update? If you get stuck, please let me know. |
|
Hi @maropu, I just need a few more days. In the meantime, I struggled to find said file. Do you want me to add a fresh integration test? |
|
This one: https://github.com/apache/spark/blob/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala |
|
kindly ping. |
|
It seems that this is too hard a requirement for a new contributor. How do you want to proceed with this, @maropu ? |
…tgresIntegrationSuite
### What changes were proposed in this pull request?
This PR intends to add tests to check if all the character types in PostgreSQL are supported. The document for character types in PostgreSQL: https://www.postgresql.org/docs/current/datatype-character.html
Closes #29192.
### Why are the changes needed?
For better test coverage.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Add tests.
Closes #29394 from maropu/pr29192.
Lead-authored-by: Takeshi Yamamuro <[email protected]>
Co-authored-by: kujon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit b2c45f7)
Signed-off-by: Dongjoon Hyun <[email protected]>
|
ok to test |
I manually verified that this achieved the goal. We will add test coverage later as a follow-up.
postgres=# \d t
                     Table "public.t"
 Column |      Type       | Collation | Nullable | Default
--------+-----------------+-----------+----------+---------
 a      | character(64)[] |           |          |
scala> spark.read.jdbc("jdbc:postgresql://127.0.0.1:5432/?user=postgres&password=rootpass", "t", new java.util.Properties).show
+--------------------+
|                   a|
+--------------------+
|[str1            ...|
+--------------------+
scala> spark.read.jdbc("jdbc:postgresql://127.0.0.1:5432/?user=postgres&password=rootpass", "t", new java.util.Properties).printSchema
root
 |-- a: array (nullable = true)
 |    |-- element: string (containsNull = true)
… type
### What changes were proposed in this pull request?
This PR fixes the support for char(n)[], character(n)[] data types. Prior to this change, a user would get `Unsupported type ARRAY` exception when attempting to interact with the table with such types.
The description is a bit more detailed in the [JIRA](https://issues.apache.org/jira/browse/SPARK-32393) itself, but the crux of the issue is that postgres driver names char and character types as `bpchar`. The relevant driver code can be found [here](https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/TypeInfoCache.java#L85-L87). `char` is very likely to be still needed, as it seems that pg makes a distinction between `char(1)` and `char(n > 1)` as per [this code](https://github.com/pgjdbc/pgjdbc/blob/b7fd9f3cef734b4c219e2f6bc6c19acf68b2990b/pgjdbc/src/main/java/org/postgresql/core/Oid.java#L64).
### Why are the changes needed?
For completeness of the pg dialect support.
### Does this PR introduce _any_ user-facing change?
Yes, successful reads of tables with bpchar array instead of errors after this fix.
### How was this patch tested?
Unit tests
Closes #29192 from kujon/fix_postgres_bpchar_array_support.
Authored-by: kujon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 0ae94ad)
Signed-off-by: Dongjoon Hyun <[email protected]>
|
Thank you for your first contribution, @kujon . |
|
Test build #127252 has finished for PR 29192 at commit
|
|
@dongjoon-hyun apologies, I am quite short on time recently. Thanks for accepting the PR. I'll add the tests when I find some spare time. |
|
Ahh, I see it's being tackled in #29397 already. Thank you for that! |
|
Never mind, @kujon! Thanks for your contribution! |
…ray types in PostgresIntegrationSuite
### What changes were proposed in this pull request?
This is a follow-up PR of #29192 that adds integration tests for character arrays in `PostgresIntegrationSuite`.
### Why are the changes needed?
For better test coverage.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Add tests.
Closes #29397 from maropu/SPARK-32576-FOLLOWUP.
Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit 7990ea1)
Signed-off-by: Takeshi Yamamuro <[email protected]>
What changes were proposed in this pull request?
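This PR fixes the support for char(n)[], character(n)[] data types. Prior to this change, a user would get `Unsupported type ARRAY` exception when attempting to interact with the table with such types.
The description is a bit more detailed in the [JIRA](https://issues.apache.org/jira/browse/SPARK-32393) itself, but the crux of the issue is that postgres driver names char and character types as `bpchar`. The relevant driver code can be found [here](https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/TypeInfoCache.java#L85-L87). `char` is very likely to be still needed, as it seems that pg makes a distinction between `char(1)` and `char(n > 1)` as per [this code](https://github.com/pgjdbc/pgjdbc/blob/b7fd9f3cef734b4c219e2f6bc6c19acf68b2990b/pgjdbc/src/main/java/org/postgresql/core/Oid.java#L64).
Why are the changes needed?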
For completeness of the pg dialect support.
Does this PR introduce any user-facing change?
Yes, successful reads of tables with bpchar array instead of errors after this fix.
How was this patch tested?
Unit tests
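For readers unfamiliar with the dialect code, the idea behind the fix can be sketched in plain Scala with no Spark dependency. This is only an illustration: `SimpleType`, `BpcharMappingSketch`, `elementType` and `columnType` are hypothetical names, the list of type names is deliberately tiny, and the leading underscore on array type names (for example `_bpchar`) is assumed from PostgreSQL's array type naming. Spark's actual `PostgresDialect` differs in detail.

```scala
// Hypothetical, simplified stand-ins for Spark's Catalyst types.
sealed trait SimpleType
case object StringT extends SimpleType
case object IntT extends SimpleType
final case class ArrayT(element: SimpleType) extends SimpleType

object BpcharMappingSketch {
  // Map a scalar PostgreSQL type name to a simplified type.
  // Listing "bpchar" next to "char" is the essence of the fix: the driver
  // reports char(n) / character(n) columns under the name "bpchar".
  def elementType(typeName: String): Option[SimpleType] = typeName match {
    case "int4" => Some(IntT)
    case "text" | "varchar" | "char" | "bpchar" => Some(StringT)
    case _ => None // unknown name: the caller would report an unsupported type
  }

  // Assumption: array type names carry a leading underscore ("_bpchar"),
  // so the element type is resolved from the name with that prefix dropped.
  def columnType(typeName: String): Option[SimpleType] =
    if (typeName.startsWith("_")) elementType(typeName.drop(1)).map(t => ArrayT(t))
    else elementType(typeName)
}
```

With `bpchar` missing from the match, `columnType("_bpchar")` would return `None` in this sketch, which is the analogue of the `Unsupported type ARRAY` error described above.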