Skip to content

Conversation

@mojodna
Copy link
Contributor

@mojodna mojodna commented Jun 12, 2019

What changes were proposed in this pull request?

PostgreSQL doesn't have TINYINT, which would map directly, but SMALLINTs are sufficient for uni-directional translation.

A side-effect of this fix is that AggregatedDialect is now usable with multiple dialects targeting jdbc:postgresql, as PostgresDialect.getJDBCType no longer throws (for which reason backporting this fix would be lovely):

dialects.flatMap(_.getJDBCType(dt)).headOption

dialects.flatMap currently throws on the first attempt to get a JDBC type preventing subsequent dialects in the chain from providing an alternative.

How was this patch tested?

Unit tests.

PostgreSQL doesn't have TINYINT, which would map directly, but SMALLINTs
are sufficient for uni-directional translation.
@mojodna mojodna changed the title Map ByteType to SMALLINT Map ByteType to SMALLINT when using JDBC with PostgreSQL Jun 12, 2019
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making a PR, @mojodna .

  1. In general, PostgreSQL users will not use unsupported types.
    I'm wondering if your goal is the one you mentioned as side-effects AggregatedDialect is now usable. Could you describe a little bit more about the use cases?
  2. Also, please create a JIRA issue for this suggestion and use the JIRA id to the PR title. PR is valuable, but JIRA issue also becomes a history.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File a JIRA, yes. You've checked this works on postgres?

@mojodna mojodna changed the title Map ByteType to SMALLINT when using JDBC with PostgreSQL [SPARK-28097] Map ByteType to SMALLINT when using JDBC with PostgreSQL Jun 18, 2019
@mojodna
Copy link
Contributor Author

mojodna commented Jun 18, 2019

@srowen yes, it works on Postgres.

@dongjoon-hyun https://issues.apache.org/jira/browse/SPARK-28100 describes the underlying problem (i.e. why I can't provide a custom dialect with an alternate mapping).

The actual use-case is writing spatial data from Spark (using Spark JTS) to Postgres using the JDBC sink (where I may need to map sqlType == Types.OTHER && typeName == "geometry" to `BinaryType, but that's entirely separate (and can be handled by my own dialect).

ByteTypes are in my data model as type discriminators, which is how I stumbled on this.

Some time later, I realized that I can cast my bytes to shorts before handing off to the JDBC sink, so this is more a case of things not working as I'd expect them to.

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test to PostgresIntegrationSuite?

@gatorsmile
Copy link
Member

ok to test

@mojodna
Copy link
Contributor Author

mojodna commented Jun 18, 2019

@gatorsmile PostgresIntegrationSuite seems to just contain read-related tests (JDBC → DF); JDBCSuite.scala tests the mappings when writing (DF → JDBC).

@SparkQA
Copy link

SparkQA commented Jun 19, 2019

Test build #106641 has finished for PR 24845 at commit 50cb99a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jun 24, 2019

@gatorsmile given the comment at #24845 (comment) are you OK with this change?

@gatorsmile
Copy link
Member

@mojodna @srowen I think we still needs an end-to-end test to ensure it works well in postgreSQL. Is it hard to do it in PostgresIntegrationSuite ?

@srowen
Copy link
Member

srowen commented Jul 2, 2019

@mojodna what do you think about adding a simple additional test here to verify?

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28097] Map ByteType to SMALLINT when using JDBC with PostgreSQL [SPARK-28097][SQL] Map ByteType to SMALLINT when using JDBC with PostgreSQL Jul 2, 2019
@mojodna
Copy link
Contributor Author

mojodna commented Jul 2, 2019

I'm swamped for the next couple weeks, but sure thing. Is there a specific test within PostgresIntegrationSuite that I should use as a template?

@maropu
Copy link
Member

maropu commented Jul 3, 2019

PostgresIntegrationSuite can run (DF → JDBC) tests easily like this;

  test("write byte as smallint") {
    sqlContext.createDataFrame(Seq((1.toByte, 2.toShort)))
      .write.jdbc(jdbcUrl, "byte_to_smallint_test", new Properties)
    val df = sqlContext.read.jdbc(jdbcUrl, "byte_to_smallint_test", new Properties)
    val schema = df.schema
    assert(schema(0).dataType == ShortType)
    assert(schema(1).dataType == ShortType)
    val rows = df.collect()
    assert(rows.length === 1)
    assert(rows(0).getShort(0) === 1)
    assert(rows(0).getShort(1) === 2)
  }

@dongjoon-hyun
Copy link
Member

Gentle ping, @mojodna .

@mojodna
Copy link
Contributor Author

mojodna commented Jul 12, 2019

Thanks @dongjoon-hyun. We just moved and are getting settled in, so sometime next week looks very likely.

@dongjoon-hyun
Copy link
Member

Thank you, @mojodna .

@mojodna
Copy link
Contributor Author

mojodna commented Jul 17, 2019

Updated. Thanks @maropu for doing the hard part!

@SparkQA
Copy link

SparkQA commented Jul 17, 2019

Test build #107798 has finished for PR 24845 at commit 67713ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master.
Thank you, @mojodna , @srowen , @gatorsmile , @maropu !

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28097][SQL] Map ByteType to SMALLINT when using JDBC with PostgreSQL [SPARK-28097][SQL] Map ByteType to SMALLINT for PostgresDialect Jul 17, 2019
@dongjoon-hyun
Copy link
Member

Thank you so much for your contribution, @mojodna .
You are added to the Apache Spark contributor group and SPARK-28097 is assigned to you.

vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jul 18, 2019
## What changes were proposed in this pull request?

PostgreSQL doesn't have `TINYINT`, which would map directly, but `SMALLINT`s are sufficient for uni-directional translation.

A side-effect of this fix is that `AggregatedDialect` is now usable with multiple dialects targeting `jdbc:postgresql`, as `PostgresDialect.getJDBCType` no longer throws (for which reason backporting this fix would be lovely):

https://github.com/apache/spark/blob/1217996f1574f758d8cccc1c4e3846452d24b35b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala#L42

`dialects.flatMap` currently throws on the first attempt to get a JDBC type preventing subsequent dialects in the chain from providing an alternative.

## How was this patch tested?

Unit tests.

Closes apache#24845 from mojodna/postgres-byte-type-mapping.

Authored-by: Seth Fitzsimmons <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants