Commit 1434bde

Add documents about Hive UDFs
1 parent 106eaa9 commit 1434bde

1 file changed

Lines changed: 19 additions & 0 deletions

docs/sql-programming-guide.md

@@ -1903,6 +1903,25 @@ releases of Spark SQL.
Hive can optionally merge the small files into fewer large files to avoid overflowing the HDFS
metadata. Spark SQL does not support that.

**Hive UDF/UDTF/UDAF**

Spark SQL implements the basic functionality of Hive UDFs, UDTFs, and UDAFs, but does not support every API they expose.
Some of these APIs are meaningless in Spark, and the others are rarely used.
Below is a list of the major APIs that Spark SQL does not support:

* `getRequiredJars` and `getRequiredFiles` (`UDF` and `GenericUDF`) are functions to automatically
  include additional resources required by this UDF.
* `initialize(StructObjectInspector)` in `GenericUDTF` is not supported yet. Spark SQL currently uses
  only the deprecated interface `initialize(ObjectInspector[])`.
* `configure` (`GenericUDF`, `GenericUDTF`, and `GenericUDAFEvaluator`) is a function to initialize
  functions with `MapredContext`. However, Spark SQL does not use `MapredContext` internally.
* `close` (`GenericUDF` and `GenericUDAFEvaluator`) is a function to release associated resources.
  Spark SQL does not call this function when tasks finish.
* `reset` (`GenericUDAFEvaluator`) is a function to re-initialize an aggregation so that the same aggregation can be reused.
  Spark SQL currently does not support the reuse of aggregations.
* `getWindowingEvaluator` (`GenericUDAFEvaluator`) is a function to optimize aggregation by evaluating
  an aggregate over a fixed window. Spark SQL does not support this optimization yet.
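
Because `configure` and `close` are not called, a function that should also run on Spark SQL may want to avoid depending on either hook. The following is only a minimal sketch of that idea, not code from Hive or Spark: `LazyLookupFunction` is a hypothetical plain Java class that does not extend any Hive class, and its `configure`, `evaluate`, and `close` methods merely mirror the roles of the Hive hooks listed above. It builds its state lazily on first use and holds nothing that must be released.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hive-style function. It deliberately does not
// extend any Hive class, so the sketch compiles without Hive on the classpath.
class LazyLookupFunction {
    // Built on first use, because the engine may never call configure().
    private Map<String, Integer> table;

    // Counts how often the table was built; for illustration only.
    int initCount = 0;

    // Mirrors the role of a configure hook; assumed to be skipped by the engine.
    void configure() {
        ensureInitialized();
    }

    // Builds the lookup table once; later calls are no-ops.
    private void ensureInitialized() {
        if (table == null) {
            table = new HashMap<>();
            table.put("a", 1);
            table.put("b", 2);
            initCount++;
        }
    }

    // Per-row entry point: safe even if configure() never ran.
    Integer evaluate(String key) {
        ensureInitialized();
        return table.get(key);
    }

    // Mirrors the role of a close hook. Only in-memory state is dropped here,
    // so nothing leaks if the engine never calls it.
    void close() {
        table = null;
    }
}
```

With this shape, calling `evaluate` without a prior `configure` still works, and skipping `close` only delays garbage collection of the in-memory table.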

# Reference

## Data Types
