[SPARK-28097][SQL] Map ByteType to SMALLINT for PostgresDialect #24845

mojodna · 2019-06-12T01:22:52Z

What changes were proposed in this pull request?

PostgreSQL doesn't have TINYINT, which would map directly, but SMALLINTs are sufficient for uni-directional translation.

A side-effect of this fix is that AggregatedDialect is now usable with multiple dialects targeting jdbc:postgresql, as PostgresDialect.getJDBCType no longer throws (for which reason backporting this fix would be lovely):

spark/sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala

Line 42 in 1217996

dialects.flatMap(_.getJDBCType(dt)).headOption

dialects.flatMap currently throws on the first attempt to get a JDBC type, preventing subsequent dialects in the chain from providing an alternative.
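
To illustrate the chain behaviour described above, here is a minimal sketch in plain Scala. It does not use Spark: `DataType`, `Dialect`, and the three dialect objects are simplified, hypothetical stand-ins for Spark's types, and the exception message is only modelled on what a dialect might raise. The one line taken from the source is the `flatMap(...).headOption` expression.

```scala
// Simplified stand-ins for Spark's DataType and JdbcDialect (assumptions, not the real API).
object DialectChainSketch {
  sealed trait DataType
  case object ByteType extends DataType
  case object ShortType extends DataType

  trait Dialect {
    def getJDBCType(dt: DataType): Option[String]
  }

  // Models the behaviour before this PR: an unsupported type raises instead of returning None.
  object ThrowingPostgres extends Dialect {
    def getJDBCType(dt: DataType): Option[String] = dt match {
      case ByteType  => throw new IllegalArgumentException("Unsupported type: ByteType")
      case ShortType => Some("SMALLINT")
    }
  }

  // Models the behaviour after this PR: ByteType is mapped to SMALLINT.
  object FixedPostgres extends Dialect {
    def getJDBCType(dt: DataType): Option[String] = dt match {
      case ByteType  => Some("SMALLINT")
      case ShortType => Some("SMALLINT")
    }
  }

  // A hypothetical user-supplied dialect that only knows about ByteType.
  object CustomDialect extends Dialect {
    def getJDBCType(dt: DataType): Option[String] = dt match {
      case ByteType => Some("SMALLINT")
      case _        => None
    }
  }

  // Same shape as the line quoted from AggregatedDialect: every dialect is asked
  // eagerly, so an exception from the first one aborts the whole lookup.
  def aggregate(dialects: List[Dialect], dt: DataType): Option[String] =
    dialects.flatMap(_.getJDBCType(dt)).headOption
}
```

With `ThrowingPostgres` first in the list, `aggregate` fails before `CustomDialect` is consulted; with `FixedPostgres` the lookup succeeds. Returning `None` for unsupported types would also let later dialects answer, which is the broader point raised in the discussion.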

How was this patch tested?

Unit tests.

dongjoon-hyun

Thank you for making a PR, @mojodna .

In general, PostgreSQL users will not use unsupported types.
I'm wondering whether your actual goal is the side effect you mentioned, that AggregatedDialect becomes usable. Could you describe the use cases a little more?
Also, please create a JIRA issue for this suggestion and put the JIRA ID in the PR title. The PR is valuable, but a JIRA issue also serves as a historical record.

srowen

File a JIRA, yes. You've checked this works on postgres?

sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala

mojodna · 2019-06-18T20:35:29Z

@srowen yes, it works on Postgres.

@dongjoon-hyun https://issues.apache.org/jira/browse/SPARK-28100 describes the underlying problem (i.e. why I can't provide a custom dialect with an alternate mapping).

The actual use-case is writing spatial data from Spark (using Spark JTS) to Postgres using the JDBC sink (where I may need to map sqlType == Types.OTHER && typeName == "geometry" to BinaryType), but that's entirely separate and can be handled by my own dialect.

ByteTypes are in my data model as type discriminators, which is how I stumbled on this.

Some time later, I realized that I can cast my bytes to shorts before handing off to the JDBC sink, so this is more a case of things not working as I'd expect them to.
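
A rough sketch of the cast-before-write workaround mentioned above, assuming Spark SQL is on the classpath. The object and method names (`ByteToShortWorkaround`, `widenBytes`) are hypothetical, and the sketch assumes top-level columns with simple names (no dots or nested fields).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{ByteType, ShortType}

// Hypothetical helper: widens every top-level ByteType column to ShortType
// so the JDBC writer never has to map ByteType itself.
object ByteToShortWorkaround {
  def widenBytes(df: DataFrame): DataFrame =
    df.schema.fields.foldLeft(df) { (acc, field) =>
      if (field.dataType == ByteType) {
        // withColumn with an existing name replaces that column in place.
        acc.withColumn(field.name, col(field.name).cast(ShortType))
      } else {
        acc
      }
    }
}
```

The result can then be handed to the JDBC sink as usual, for example `ByteToShortWorkaround.widenBytes(df).write.jdbc(url, table, props)`, where `url`, `table`, and `props` are placeholders for your own connection settings.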

gatorsmile

Add a test to PostgresIntegrationSuite?

gatorsmile · 2019-06-18T22:00:21Z

ok to test

mojodna · 2019-06-18T23:24:08Z

@gatorsmile PostgresIntegrationSuite seems to just contain read-related tests (JDBC → DF); JDBCSuite.scala tests the mappings when writing (DF → JDBC).

SparkQA · 2019-06-19T01:12:12Z

Test build #106641 has finished for PR 24845 at commit 50cb99a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-06-24T23:04:48Z

@gatorsmile given the comment at #24845 (comment) are you OK with this change?

gatorsmile · 2019-06-25T21:18:42Z

@mojodna @srowen I think we still need an end-to-end test to ensure it works well in PostgreSQL. Is it hard to do it in PostgresIntegrationSuite?

srowen · 2019-07-02T22:39:51Z

@mojodna what do you think about adding a simple additional test here to verify?

mojodna · 2019-07-02T22:50:49Z

I'm swamped for the next couple weeks, but sure thing. Is there a specific test within PostgresIntegrationSuite that I should use as a template?

maropu · 2019-07-03T04:05:10Z

PostgresIntegrationSuite can easily run (DF → JDBC) tests like this:

  test("write byte as smallint") {
    sqlContext.createDataFrame(Seq((1.toByte, 2.toShort)))
      .write.jdbc(jdbcUrl, "byte_to_smallint_test", new Properties)
    val df = sqlContext.read.jdbc(jdbcUrl, "byte_to_smallint_test", new Properties)
    val schema = df.schema
    assert(schema(0).dataType == ShortType)
    assert(schema(1).dataType == ShortType)
    val rows = df.collect()
    assert(rows.length === 1)
    assert(rows(0).getShort(0) === 1)
    assert(rows(0).getShort(1) === 2)
  }

dongjoon-hyun · 2019-07-12T18:09:20Z

Gentle ping, @mojodna .

mojodna · 2019-07-12T18:32:12Z

Thanks @dongjoon-hyun. We just moved and are getting settled in, so sometime next week looks very likely.

dongjoon-hyun · 2019-07-12T23:40:35Z

Thank you, @mojodna .

mojodna · 2019-07-17T19:53:08Z

Updated. Thanks @maropu for doing the hard part!

SparkQA · 2019-07-17T22:01:54Z

Test build #107798 has finished for PR 24845 at commit 67713ee.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

+1, LGTM. Merged to master.
Thank you, @mojodna , @srowen , @gatorsmile , @maropu !

dongjoon-hyun · 2019-07-17T22:12:55Z

Thank you so much for your contribution, @mojodna .
You are added to the Apache Spark contributor group and SPARK-28097 is assigned to you.

## What changes were proposed in this pull request?

PostgreSQL doesn't have `TINYINT`, which would map directly, but `SMALLINT`s are sufficient for uni-directional translation.

A side-effect of this fix is that `AggregatedDialect` is now usable with multiple dialects targeting `jdbc:postgresql`, as `PostgresDialect.getJDBCType` no longer throws (for which reason backporting this fix would be lovely):

https://github.com/apache/spark/blob/1217996f1574f758d8cccc1c4e3846452d24b35b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala#L42

`dialects.flatMap` currently throws on the first attempt to get a JDBC type preventing subsequent dialects in the chain from providing an alternative.

## How was this patch tested?

Unit tests.

Closes apache#24845 from mojodna/postgres-byte-type-mapping.

Authored-by: Seth Fitzsimmons <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

Map ByteType to SMALLINT

40decc6

mojodna changed the title ~~Map ByteType to SMALLINT~~ Map ByteType to SMALLINT when using JDBC with PostgreSQL Jun 12, 2019

dongjoon-hyun added IMPROVEMENT and removed IMPROVEMENT labels Jun 12, 2019

dongjoon-hyun reviewed Jun 15, 2019

srowen reviewed Jun 17, 2019

sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala (outdated)

Merge conditions

50cb99a

mojodna changed the title ~~Map ByteType to SMALLINT when using JDBC with PostgreSQL~~ [SPARK-28097] Map ByteType to SMALLINT when using JDBC with PostgreSQL Jun 18, 2019

srowen approved these changes Jun 18, 2019

gatorsmile reviewed Jun 18, 2019

dongjoon-hyun changed the title ~~[SPARK-28097] Map ByteType to SMALLINT when using JDBC with PostgreSQL~~ [SPARK-28097][SQL] Map ByteType to SMALLINT when using JDBC with PostgreSQL Jul 2, 2019

dongjoon-hyun added the SQL label Jul 2, 2019

Add integration test

67713ee

dongjoon-hyun approved these changes Jul 17, 2019

dongjoon-hyun changed the title ~~[SPARK-28097][SQL] Map ByteType to SMALLINT when using JDBC with PostgreSQL~~ [SPARK-28097][SQL] Map ByteType to SMALLINT for PostgresDialect Jul 17, 2019

dongjoon-hyun closed this in eb5dc74 Jul 17, 2019

6 participants
