It is capable of using the following Thrift transports:

* BufferedTransport
* SaslClientTransport
* HTTPClientTransport

As of version 1.0, it supports asynchronous execution of queries. This allows you to submit
a query, disconnect, then reconnect later to check the status and retrieve the results.
This frees systems of the need to keep a persistent TCP connection.

## About Thrift services and transports
### Hiveserver2

[Hiveserver2](https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2)
(the new Thrift interface) can support many concurrent client connections. It is shipped
with Hive 0.10 and later. In Hive 0.10, only BufferedTransport and SaslClientTransport are
supported; starting with Hive 0.12, HTTPClientTransport is also supported.

The transport you choose must match how the Hiveserver2 instance is configured; otherwise you'll get this nasty-looking exception in the logs:

        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)

### Other Hive-compatible services
### Hiveserver

Since Hiveserver has no options, connection code is very simple:

    RBHive.connect('hive.server.address', 10_000) do |connection|
      connection.fetch 'SELECT city, country FROM cities'
    end
    ➔ [{:city => "London", :country => "UK"}, {:city => "Mumbai", :country => "India"}, {:city => "New York", :country => "USA"}]

### Hiveserver2
Hiveserver2 has several options for how it is run. The connection code takes
a hash with these possible parameters:

* `:transport` - one of `:buffered` (BufferedTransport), `:http` (HTTPClientTransport), or `:sasl` (SaslClientTransport)
* `:hive_version` - the number after the period in the Hive version; e.g. `10`, `11`, `12`, `13` or one of
  a set of symbols; see [Hiveserver2 protocol versions](#hiveserver2-protocol-versions) below for details
* `:timeout` - if using BufferedTransport or SaslClientTransport, the socket timeout (in seconds)
* `:sasl_params` - if using SaslClientTransport, a hash of parameters to set up the SASL connection

If you pass either an empty hash or nil in place of the options (or do not supply them), the connection
is attempted with the Hive version set to 0.10, using `:buffered` as the transport, and a timeout of 1800 seconds.

Connecting with the defaults:
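A minimal sketch, reusing the placeholder host and port from the other examples in this README; the query shown is only illustrative:

    RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
      # No options hash is passed, so Hive 0.10, the :buffered transport and a
      # 1800 second timeout apply, as described above.
      connection.fetch('SHOW TABLES')
    end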
Connecting with a specific Hive version (0.12) and using the `:http` transport:

    RBHive.tcli_connect('hive.server.address', 10_000, {:hive_version => 12, :transport => :http}) do |connection|
      connection.fetch('SHOW TABLES')
    end

Connecting with SASL and Kerberos v5:

    RBHive.tcli_connect('hive.hadoop.forward.co.uk', 10_000, {
      :transport => :sasl,
      :sasl_params => {
        :mechanism => 'GSSAPI',
        :remote_host => 'example.com',
        :remote_principal => 'hive/[email protected]'
      }
    }) do |connection|
      connection.fetch("show tables")
    end
#### Hiveserver2 protocol versions

The status of an asynchronously executed query will have one of the following values and meanings:

| :unknown | The query is in an unknown state |
| :pending | The query is ready to run but is not running |

There are also the utility methods `async_is_complete?(handles)`, `async_is_running?(handles)`,
`async_is_failed?(handles)` and `async_is_cancelled?(handles)`.
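As a sketch of the round-trip this enables (the `async_is_*` helpers are the ones named above; `async_execute` and `async_fetch` are assumed names for submitting a query and collecting its results):

    # Submit the query and keep the handles; the connection can then be closed.
    # `async_execute` and `async_fetch` are assumed method names, not confirmed above.
    handles = nil
    RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
      handles = connection.async_execute('SELECT city, country FROM cities')
    end

    # Later, from a completely new connection, poll the status and fetch the results.
    RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
      connection.async_fetch(handles) if connection.async_is_complete?(handles)
    end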
#### `async_cancel(handles)`
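Cancelling takes the same handles object; a minimal sketch, assuming the call is made on the connection like the other async helpers:

    RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
      # Pass the handles returned when the query was submitted.
      connection.async_cancel(handles)
    end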
Once an asynchronous query has finished, its results are retrieved in the same way as with the normal synchronous methods.

#### Hiveserver

    RBHive.connect('hive.server.address', 10_000) do |connection|
      connection.fetch 'SELECT city, country FROM cities'
    end
    ➔ [{:city => "London", :country => "UK"}, {:city => "Mumbai", :country => "India"}, {:city => "New York", :country => "USA"}]
#### Hiveserver2

    RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
      connection.fetch 'SELECT city, country FROM cities'
    end
    ➔ [{:city => "London", :country => "UK"}, {:city => "Mumbai", :country => "India"}, {:city => "New York", :country => "USA"}]

### Executing a query
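A minimal sketch for statements that return no rows, assuming the connection object also exposes an `execute` method alongside `fetch` (the method name is an assumption):

    RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
      # `execute` is assumed; only `fetch` is shown in the sections above.
      connection.execute('DROP TABLE IF EXISTS old_cities')
    end

The `create_table` and `replace_columns` calls below take a table schema object. A minimal sketch of such a definition, assuming a `TableSchema` class with a block DSL of `column` and `partition` helpers (the class name and signatures are assumptions):

    # Column name, type and comment arguments are assumed; adjust to the actual TableSchema API.
    table = TableSchema.new('cities', 'Cities and the countries they are in') do
      column 'city',    :string, 'Name of the city'
      column 'country', :string, 'Name of the country'
      partition 'dated', :string, 'Date the row was loaded'
    end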
Then for Hiveserver:

    RBHive.connect('hive.server.address', 10_000) do |connection|
      connection.create_table(table)
    end

Or Hiveserver2:

    RBHive.tcli_connect('hive.server.address', 10_000) do |connection|
      connection.create_table(table)
    end
### Modifying table schema
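`replace_columns` also takes a schema object; a minimal sketch of an updated definition, reusing the assumed `TableSchema` DSL from above with an extra column:

    # The DSL and type symbols are assumptions carried over from the sketch above.
    table = TableSchema.new('cities', 'Cities and the countries they are in') do
      column 'city',       :string, 'Name of the city'
      column 'country',    :string, 'Name of the country'
      column 'population', :int,    'Most recent population estimate'
    end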
Then for Hiveserver:

    RBHive.connect('hive.server.address') do |connection|
      connection.replace_columns(table)
    end

Or Hiveserver2:

    RBHive.tcli_connect('hive.server.address') do |connection|
      connection.replace_columns(table)
    end
### Setting properties

You can set various properties for Hive tasks, some of which change how they run. Consult the Apache
Hive documentation and Hadoop's documentation for the various properties that can be set.
For example, you can set the map-reduce job's priority with the following:

    connection.set("mapred.job.priority", "VERY_HIGH")
#### Hiveserver

    RBHive.connect('hive.hadoop.forward.co.uk', 10_000) {|connection|
      result = connection.fetch("describe some_table")
      puts result.column_names.inspect
      puts result.first.inspect
    }
#### Hiveserver2

    RBHive.tcli_connect('hive.hadoop.forward.co.uk', 10_000) {|connection|
      result = connection.fetch("describe some_table")
      puts result.column_names.inspect
      puts result.first.inspect
    }