Conversation

@klinvill commented Jan 30, 2017

The contribution is my original work and I license the work to the project under the project’s open source license.

Note: the Teradata JDBC connector limits the row size to 64K. The default string data type equivalent I used is a 255 character/byte VARCHAR, so at 255 bytes per string column this effectively limits the maximum number of string columns to roughly 250 when using the Teradata JDBC connector (64K / 255 ≈ 257, less per-row overhead).

What changes were proposed in this pull request?

Added a teradataDialect for JDBC connections to Teradata. The Teradata dialect uses VARCHAR(255) in place of TEXT for string data types, and CHAR(1) in place of BIT(1) for boolean data types.
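For context, a minimal sketch of what such a dialect can look like, pieced together from the snippets quoted later in this thread (the merged code may differ in detail):

  import java.sql.Types

  import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
  import org.apache.spark.sql.types._

  // Sketch of a Teradata dialect, based on the review snippets below.
  case object TeradataDialect extends JdbcDialect {

    // Handle only Teradata JDBC URLs.
    override def canHandle(url: String): Boolean =
      url.startsWith("jdbc:teradata")

    // Teradata has no TEXT or BIT(1), so map Spark's StringType and
    // BooleanType to VARCHAR(255) and CHAR(1) respectively.
    override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
      case StringType => Some(JdbcType("VARCHAR(255)", Types.VARCHAR))
      case BooleanType => Some(JdbcType("CHAR(1)", Types.CHAR))
      case _ => None
    }
  }

A dialect like this gets picked up once registered through JdbcDialects.registerDialect.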

How was this patch tested?

I added two unit tests to double-check that the types get set correctly for a Teradata JDBC URL. I also ran a couple of manual tests to make sure the JDBC connector worked with Teradata and that an error was thrown if a row could potentially exceed 64K (this error comes from the Teradata JDBC connector, not from the Spark code). I did not check how string columns longer than 255 characters are handled.
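A rough sketch of how such a type check can be written (hypothetical test body and URL; the actual tests may be phrased differently):

  import org.apache.spark.sql.jdbc.JdbcDialects
  import org.apache.spark.sql.types.{BooleanType, StringType}

  // Resolve the dialect from a Teradata JDBC URL and confirm the
  // overridden type mappings. The URL is a placeholder.
  val dialect = JdbcDialects.get("jdbc:teradata://localhost/database=test")
  assert(dialect.getJDBCType(StringType)
    .map(_.databaseTypeDefinition).contains("VARCHAR(255)"))
  assert(dialect.getJDBCType(BooleanType)
    .map(_.databaseTypeDefinition).contains("CHAR(1)"))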

@gatorsmile (Member)

Are we able to find a Docker image for Teradata Express? For example, we did that for Postgres.

Otherwise, it is hard for us to verify this.

@klinvill (Author)

Unfortunately I don't think there's a Docker image for Teradata available yet. They do have a VM version and an AMI. Would either of those be sufficient?

@gatorsmile (Member)

I do not have a way to plug a VM into our test framework. How are you testing this?

@klinvill (Author)

I just tested it manually against a Teradata instance I have running. I didn't test it extensively, other than making sure that a write to a Teradata table using a string data type worked correctly for smaller strings (<255 characters).
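For reference, a manual check along those lines might look like the following, assuming a SparkSession named spark (e.g. in spark-shell); the host, database, table, and credentials are placeholders:

  // Write a DataFrame with a short string column to Teradata over JDBC.
  val df = spark.createDataFrame(Seq((1, "short string"))).toDF("id", "note")

  df.write
    .format("jdbc")
    .option("url", "jdbc:teradata://td-host/DATABASE=test")
    .option("dbtable", "test.spark_strings")
    .option("user", "dbc")
    .option("password", "dbc")
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .save()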

case StringType => Some(JdbcType("VARCHAR(255)", java.sql.Types.VARCHAR))
case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
case _ => None
}
@dongjoon-hyun (Member) commented Jan 31, 2017

Hi, @klinvill.
According to the description and initial PR in SPARK-15648, Teradata didn't support LIMIT queries at that time.
Does it support LIMIT now?

@klinvill (Author)

Hi @dongjoon-hyun,
Teradata still doesn't support LIMIT (it uses TOP instead), but the Spark code that originally used LIMIT has been changed to use "WHERE 1=0" instead.

/**
   * Get the SQL query that should be used to find if the given table exists. Dialects can
   * override this method to return a query that works best in a particular database.
   * @param table  The name of the table.
   * @return The SQL query to use for checking the table.
   */
  def getTableExistsQuery(table: String): String = {
    s"SELECT * FROM $table WHERE 1=0"
  }

  /**
   * The SQL query that should be used to discover the schema of a table. It only needs to
   * ensure that the result set has the same schema as the table, such as by calling
   * "SELECT * ...". Dialects can override this method to return a query that works best in a
   * particular database.
   * @param table The name of the table.
   * @return The SQL query to use for discovering the schema.
   */
  @Since("2.1.0")
  def getSchemaQuery(table: String): String = {
    s"SELECT * FROM $table WHERE 1=0"
  }

Member:

+1


package org.apache.spark.sql.jdbc

import java.sql.Types
Member:

A blank line is needed here. You can run the following command to check this and to confirm the fix.

$ dev/lint-scala

@klinvill (Author)

Thanks! Fixed in latest commit.

case StringType => Some(JdbcType("VARCHAR(255)", java.sql.Types.VARCHAR))
case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
case _ => None
}
Member:

Could you verify whether we need to override the following together?

  override def quoteIdentifier(colName: String): String = ...
  override def getTableExistsQuery(table: String): String = ...
  override def isCascadingTruncateTable(): Option[Boolean] = ...

Member:

What about isCascadingTruncateTable? Could you check whether Teradata truncates cascadingly by default for the TRUNCATE TABLE statement?

@klinvill (Author)

quoteIdentifier and getTableExistsQuery will both work for Teradata. Teradata does not cascade by default, but it also doesn't have a TRUNCATE TABLE command (DELETE is used instead), so any commands that use TRUNCATE TABLE will fail.
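Given that, if the dialect were to make the truncate behavior explicit, the override could look like this sketch (not necessarily what was merged):

  import org.apache.spark.sql.jdbc.JdbcDialect

  // Sketch: Teradata has no TRUNCATE TABLE (DELETE is used instead) and
  // does not cascade by default, so a dialect could report that explicitly.
  case object TeradataDialectSketch extends JdbcDialect {
    override def canHandle(url: String): Boolean =
      url.startsWith("jdbc:teradata")
    override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
  }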

@dongjoon-hyun (Member)

BTW, @klinvill.
Do you use a real instance? Could you advise how others like me can verify your PR on Teradata? Could we use Teradata Express or the AWS Marketplace?

@klinvill (Author) commented Feb 1, 2017

@dongjoon-hyun Yup, I was using a real instance for testing. The best way to test without a real instance is probably the Teradata Express VM: http://downloads.teradata.com/download/database/teradata-express-for-vmware-player. You can also build an instance from an AMI, but the AMI is fairly expensive, so I'd recommend the Express VM instead. Unfortunately there's currently no dockerized version available.

@klinvill (Author)

Hi @dongjoon-hyun @gatorsmile, just circling back. Is it going to be impractical to check this PR against a VM rather than a Docker image?

@gatorsmile (Member)

ok to test

@SparkQA commented May 23, 2017

Test build #77257 has finished for PR 16746 at commit 91e12e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

LGTM

@gatorsmile (Member)

Thanks! Merging to master.

@asfgit closed this in 4816c2e on May 23, 2017
@klinvill (Author)

Thanks for the help and review!

lycplus pushed a commit to lycplus/spark that referenced this pull request on May 24, 2017.

Author: Kirby Linvill <[email protected]>
Author: klinvill <[email protected]>

Closes apache#16746 from klinvill/master.