Closed
Changes from 4 commits
49 changes: 48 additions & 1 deletion docs/sql-ref-functions-udf-hive.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,4 +19,51 @@ license: |
limitations under the License.
---

Integration with Hive UDFs/UDAFs/UDTFs
### Description

Spark SQL supports integration of Hive UDFs, UDAFs, and UDTFs. Like Spark UDFs and UDAFs, a Hive UDF takes a single row as input and produces a single row as output, while a Hive UDAF operates on multiple rows and returns a single aggregated row. Hive also supports UDTFs (User Defined Tabular Functions), which take one row as input and return multiple rows as output. To use Hive UDFs/UDAFs/UDTFs, register them in Spark and then call them in Spark SQL queries.
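
The three row contracts described above can be pictured with ordinary Scala collections. This is only an analogy, assuming nothing about the Hive or Spark APIs: the object `RowSemantics` and its methods are hypothetical names made up for illustration, and no Hive class is loaded.

```scala
// Plain-Scala analogy only: no Spark or Hive classes are involved.
// All names here are hypothetical and exist purely for illustration.
object RowSemantics {
  // UDF-like: each input row produces exactly one output row.
  def udfStyle(rows: Seq[Int]): Seq[Int] =
    rows.map(v => math.abs(v))

  // UDAF-like: many input rows collapse into one aggregated value per group.
  def udafStyle(rows: Seq[Int]): Map[Int, Int] =
    rows.groupBy(v => v % 2).map { case (key, values) => key -> values.max }

  // UDTF-like: one input row may expand into several output rows.
  def udtfStyle(rows: Seq[String]): Seq[String] =
    rows.flatMap(row => row.split(",").toSeq)

  def main(args: Array[String]): Unit = {
    println(udfStyle(Seq(-1, 2, -3)))   // same number of rows out as in
    println(udafStyle(Seq(1, 2, 3, 4))) // one value per key % 2 group
    println(udtfStyle(Seq("a,b", "c"))) // two input rows become three
  }
}
```

In these terms, `udfStyle` behaves like `map`, `udafStyle` like a grouped reduction, and `udtfStyle` like `flatMap`. A real Hive function is instead registered with `CREATE TEMPORARY FUNCTION` and invoked from SQL.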

### Examples

<pre><code>
// Register a Hive UDF and use it in Spark SQL
// Scala

Probably, we need ADD JAR for the hive UDF below here.

// Include the JAR file that contains the mytest.hiveUDF implementation
spark.sql("ADD JAR myHiveUDF.jar")
spark.sql("CREATE TEMPORARY FUNCTION testUDF AS 'mytest.hiveUDF'")
spark.sql("SELECT testUDF(value) FROM hiveUDFTestTable")

// Register a Hive UDAF and use it in Spark SQL
// Scala
// include the JAR file containing
// org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax
spark.sql("ADD JAR myHiveUDAF.jar")
spark.sql(
  """
    |CREATE TEMPORARY FUNCTION hive_max
    |AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax'
  """.stripMargin)
spark.sql("SELECT key % 2, hive_max(key) FROM t GROUP BY key % 2")

// Register a Hive UDTF and use it in Spark SQL
// Scala
// GenericUDTFCount2 outputs the number of rows seen, twice.
// The function source code can be found at:
// https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide+UDTF
// include the JAR file containing GenericUDTFCount2 implementation
spark.sql("ADD JAR myHiveUDTF.jar")
spark.sql(
  """
    |CREATE TEMPORARY FUNCTION udtf_count2
    |AS 'org.apache.spark.sql.hive.execution.GenericUDTFCount2'
  """.stripMargin)
spark.sql("SELECT udtf_count2(a) FROM (SELECT 1 AS a)").show

+----+
|col1|
+----+
|   1|
|   1|
+----+

</code></pre>