Skip to content

Conversation

@kujon
Copy link
Contributor

@kujon kujon commented Jul 22, 2020

What changes were proposed in this pull request?

This PR fixes the support for char(n)[], character(n)[] data types. Prior to this change, a user would get Unsupported type ARRAY exception when attempting to interact with the table with such types.

The description is a bit more detailed in the JIRA itself, but the crux of the issue is that postgres driver names char and character types as bpchar. The relevant driver code can be found here. char is very likely to be still needed, as it seems that pg makes a distinction between char(1) and char(n > 1) as per this code.

Why are the changes needed?

For completeness of the pg dialect support.

Does this PR introduce any user-facing change?

Yes, successful reads of tables with bpchar array instead of errors after this fix.

How was this patch tested?

Unit tests

@maropu
Copy link
Member

maropu commented Jul 23, 2020

Could you add tests in PostgresIntegrationSuite then check if the test can pass on your local env? (Note: our testing framework, Jenkins, does not run it).

@maropu
Copy link
Member

maropu commented Jul 23, 2020

bpchar is internally used in postgresql [1][2], but users can use it in DDL;

postgres=# create table t (v bpchar);
CREATE TABLE

So, I think its okay to add it for better interoperability.

@maropu
Copy link
Member

maropu commented Jul 23, 2020

btw, thanks for your first contribution, @kujon !

@maropu
Copy link
Member

maropu commented Jul 23, 2020

ok to test

@kujon
Copy link
Contributor Author

kujon commented Jul 23, 2020

Could you add tests in PostgresIntegrationSuite then check if the test can pass on your local env? (Note: our testing framework, Jenkins, does not run it).

will do!

@SparkQA
Copy link

SparkQA commented Jul 23, 2020

Test build #126374 has finished for PR 29192 at commit 1000377.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your first contribution, @kujon . Also, thank you for trying to add a test into PostgresIntegrationSuite.
Could you revise the PR description to be complete without referencing JIRA? The PR description will become a permanent commit log and will be read much more times.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32393][SQL] Fix postgres bpchar array support [SPARK-32393][SQL] Support PostgreSQL bpchar array Jul 24, 2020
@maropu
Copy link
Member

maropu commented Jul 28, 2020

@kujon Any update? If you get stuck, please let me know.

@kujon
Copy link
Contributor Author

kujon commented Jul 28, 2020

Hi @maropu, I just need a few more days. I'm the meantime, I struggled finding said file, do you want me to add a fresh integration test?

@maropu
Copy link
Member

maropu commented Jul 28, 2020

This one: https://github.com/apache/spark/blob/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
You can add test cases there, then run the tests via ./build/mvn -Pdocker-integration-tests -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite

@maropu
Copy link
Member

maropu commented Aug 3, 2020

kindly ping.

@dongjoon-hyun
Copy link
Member

It seems that it's too hard requirement for a new contributor. How do you want to proceed this, @maropu ?

dongjoon-hyun pushed a commit that referenced this pull request Aug 10, 2020
…tgresIntegrationSuite

### What changes were proposed in this pull request?

This PR intends to add tests to check if all the character types in PostgreSQL supported.

The document for character types in PostgreSQL: https://www.postgresql.org/docs/current/datatype-character.html

Closes #29192.

### Why are the changes needed?

For better test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Add tests.

Closes #29394 from maropu/pr29192.

Lead-authored-by: Takeshi Yamamuro <[email protected]>
Co-authored-by: kujon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit b2c45f7)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun dongjoon-hyun reopened this Aug 10, 2020
@dongjoon-hyun
Copy link
Member

I reopened this again, @kujon .

Hi, @maropu . It seems that this PR aims to support CREATE TABLE t (a char(64)[]); instead CREATE TABLE t (a bpchar).

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32393][SQL] Support PostgreSQL bpchar array [SPARK-32576][SQL] Support PostgreSQL bpchar array Aug 10, 2020
@dongjoon-hyun
Copy link
Member

ok to test

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32576][SQL] Support PostgreSQL bpchar array [SPARK-32576][SQL] Support PostgreSQL bpchar type and array of char type Aug 10, 2020
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I manually verified that this achieved the goal. We will add a test coverage later as a follow-up.

postgres=# \d t
                     Table "public.t"
 Column |      Type       | Collation | Nullable | Default
--------+-----------------+-----------+----------+---------
 a      | character(64)[] |           |          |

scala> spark.read.jdbc("jdbc:postgresql://127.0.0.1:5432/?user=postgres&password=rootpass", "t", new java.util.Properties).show
+--------------------+
|                   a|
+--------------------+
|[str1            ...|
+--------------------+


scala> spark.read.jdbc("jdbc:postgresql://127.0.0.1:5432/?user=postgres&password=rootpass", "t", new java.util.Properties).printSchema
root
 |-- a: array (nullable = true)
 |    |-- element: string (containsNull = true)

dongjoon-hyun pushed a commit that referenced this pull request Aug 10, 2020
… type

### What changes were proposed in this pull request?
This PR fixes the support for char(n)[], character(n)[] data types. Prior to this change, a user would get `Unsupported type ARRAY` exception when attempting to interact with the table with such types.

The description is a bit more detailed in the [JIRA](https://issues.apache.org/jira/browse/SPARK-32393) itself, but the crux of the issue is that postgres driver names char and character types as `bpchar`. The relevant driver code can be found [here](https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/TypeInfoCache.java#L85-L87). `char` is very likely to be still needed, as it seems that pg makes a distinction between `char(1)` and `char(n > 1)` as per [this code](https://github.com/pgjdbc/pgjdbc/blob/b7fd9f3cef734b4c219e2f6bc6c19acf68b2990b/pgjdbc/src/main/java/org/postgresql/core/Oid.java#L64).

### Why are the changes needed?
For completeness of the pg dialect support.

### Does this PR introduce _any_ user-facing change?
Yes, successful reads of tables with bpchar array instead of errors after this fix.

### How was this patch tested?
Unit tests

Closes #29192 from kujon/fix_postgres_bpchar_array_support.

Authored-by: kujon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 0ae94ad)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member

Thank you for your first contribution, @kujon .
We added you to the Apache Spark contributor group and assign SPARK-32576 to you.
Welcome!

@SparkQA
Copy link

SparkQA commented Aug 10, 2020

Test build #127252 has finished for PR 29192 at commit 1000377.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kujon
Copy link
Contributor Author

kujon commented Aug 10, 2020

@dongjoon-hyun apologies, I am quite short on time recently. Thanks for accepting the PR. I'll add the tests when I find some spare time.

@kujon
Copy link
Contributor Author

kujon commented Aug 10, 2020

Ahh, I see it's being tackled in #29397 already. Thank you for that!

@maropu
Copy link
Member

maropu commented Aug 10, 2020

Never mind, @kujon! Thanks for your contribution!

maropu added a commit that referenced this pull request Aug 10, 2020
…ray types in PostgresIntegrationSuite

### What changes were proposed in this pull request?

This is a follow-up PR of #29192 that adds integration tests for character arrays in `PostgresIntegrationSuite`.

### Why are the changes needed?

For better test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Add tests.

Closes #29397 from maropu/SPARK-32576-FOLLOWUP.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
maropu added a commit that referenced this pull request Aug 10, 2020
…ray types in PostgresIntegrationSuite

### What changes were proposed in this pull request?

This is a follow-up PR of #29192 that adds integration tests for character arrays in `PostgresIntegrationSuite`.

### Why are the changes needed?

For better test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Add tests.

Closes #29397 from maropu/SPARK-32576-FOLLOWUP.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit 7990ea1)
Signed-off-by: Takeshi Yamamuro <[email protected]>
@dongjoon-hyun
Copy link
Member

Yep. Thank you, @kujon and @maropu . We are collaborating! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants