@wangyum
Member

@wangyum wangyum commented Nov 24, 2020

What changes were proposed in this pull request?

We support Hive metastore versions 0.12.0 through 3.1.2, but hive-jdbc only from 0.12.0 through 2.3.7. Using hive-jdbc 3.x throws a TProtocolException:

[root@spark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
Unexpected end of file when reading from HS2 server. The root cause might be too many concurrent connections. Please ask the administrator to check the number of active connections, and adjust hive.server2.thrift.max.worker.threads if applicable.
Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:234)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)

This PR upgrades hive-service-rpc to 3.1.2 to fix this issue.
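
For reference, the message in the trace above is raised by the strict header check in Thrift's binary protocol. Below is a minimal Python sketch of that check, not the actual Thrift or Spark code: `read_message_begin` and `ProtocolError` are illustrative names, while the two constants mirror Thrift's `VERSION_MASK` and `VERSION_1`. It only shows which condition produces the message; it does not model why a hive-jdbc 3.x client ends up tripping it.

```python
import struct

# Constants mirroring Thrift's TBinaryProtocol.
VERSION_MASK = 0xFFFF0000
VERSION_1 = 0x80010000


class ProtocolError(Exception):
    """Illustrative stand-in for TProtocolException."""


def read_message_begin(header: bytes, strict_read: bool = True):
    """Check the first 4 bytes of a message the way a strict reader does.

    Returns the message type for a versioned header, or None for an
    unversioned header when strict_read is False.
    """
    # The first field is a big-endian signed 32-bit integer.
    (size,) = struct.unpack("!i", header[:4])
    if size < 0:
        # High bit set: the upper 16 bits carry the protocol version.
        if (size & VERSION_MASK) != VERSION_1:
            raise ProtocolError("Bad version in readMessageBegin")
        return size & 0xFF  # low byte is the message type
    if strict_read:
        # A non-negative value means no version word was found here.
        raise ProtocolError("Missing version in readMessageBegin, old client?")
    return None  # a lenient reader would go on to parse an unversioned header


# A versioned CALL header (type 1) passes; anything else fails in strict mode.
print(read_message_begin(struct.pack("!I", 0x80010001)))
```

Any bytes that do not start with a version word, for example because the stream is no longer aligned on a message boundary, fail the strict check in the same way.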

Why are the changes needed?

To support hive-jdbc 3.x.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual test:

[root@spark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.051 seconds)
0: jdbc:hive2://localhost:10000/default> insert into t1 values(1);
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.08 seconds)
0: jdbc:hive2://localhost:10000/default> select * from t1;
+-----+
| id  |
+-----+
| 1   |
+-----+
1 row selected (0.605 seconds)
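
If this manual check were scripted, the beeline table output could be parsed and asserted on. The following is a rough sketch under that assumption; `parse_beeline_table` is a hypothetical helper, not part of Spark or Hive, and it assumes cell values never contain a `|` character.

```python
def parse_beeline_table(output: str):
    """Return (header, rows) from beeline's ASCII table output.

    Border lines ("+-----+"), prompts, and status lines such as
    "1 row selected (...)" are ignored; only lines starting with "|" count.
    """
    cells = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in output.splitlines()
        if line.strip().startswith("|")
    ]
    if not cells:
        return [], []
    # The first table line is the header; the rest are data rows.
    return cells[0], cells[1:]


# Output copied from the select statement in the manual test above.
sample = """\
+-----+
| id  |
+-----+
| 1   |
+-----+
1 row selected (0.605 seconds)
"""
header, rows = parse_beeline_table(sample)
print(header, rows)
```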

 <groupId>${hive.group}</groupId>
 <artifactId>hive-service-rpc</artifactId>
-<version>${hive.version}</version>
+<version>3.1.2</version>
Member

Shall we file a JIRA to revert this when we upgrade the Hive version to 3.1.2?

Member Author

ok

Member

@wangyum, did you create a JIRA? I cannot find it yet. If you did, please link it to SPARK-33525.

Member Author

@HyukjinKwon
Member

cc @juliuszsompolski FYI, since you're working in this area.

@SparkQA

SparkQA commented Nov 24, 2020

Test build #131619 has finished for PR 30478 at commit 43d90ca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please

@dongjoon-hyun
Member

The Hive Thrift Server UT failure was caused by a Jenkins server change, and it has been fixed on master.

org.apache.spark.sql.hive.thriftserver.SparkThriftServerProtocolVersionsSuite.HIVE_CLI_SERVICE_PROTOCOL_V1 get binary type

@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented Nov 24, 2020

Test build #131679 has finished for PR 30478 at commit 43d90ca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 24, 2020

Test build #131688 has finished for PR 30478 at commit 43d90ca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Nov 24, 2020

Retest this please

@SparkQA

SparkQA commented Nov 25, 2020

Test build #131702 has finished for PR 30478 at commit 43d90ca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Nov 25, 2020

Test build #131720 has finished for PR 30478 at commit 43d90ca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Nov 25, 2020

retest this please.

@SparkQA

SparkQA commented Nov 25, 2020

Test build #131734 has finished for PR 30478 at commit 43d90ca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Nov 25, 2020

Retest this please

@SparkQA

SparkQA commented Nov 25, 2020

Test build #131755 has finished for PR 30478 at commit 43d90ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @wangyum.

I verified this manually with the given example.

apache-hive-3.1.2-bin:$ bin/beeline -u jdbc:hive2://localhost:10000/default
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/dongjoon/APACHE/hive-release/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/dongjoon/APACHE/hadoop-release/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (4.151 seconds)
0: jdbc:hive2://localhost:10000/default> select * from t1;
+-----+
| id  |
+-----+
+-----+
No rows selected (1.465 seconds)
0: jdbc:hive2://localhost:10000/default>

@wangyum wangyum deleted the SPARK-33525 branch November 25, 2020 21:42
@HyukjinKwon
Member

Nice!
