@maropu
Member

@maropu maropu commented Jul 29, 2017

What changes were proposed in this pull request?

This PR adds code that prints a warning message when the unsupported method public void configure(MapredContext mapredContext) is overridden in a GenericUDF or GenericUDTF. Spark does not call configure internally, so relying on it is error-prone.

How was this patch tested?

Manually checked. With this PR applied, you get the warning message below:

scala> sql("CREATE TEMPORARY FUNCTION test AS 'test.TestUDTF'")
17/07/29 11:57:08 WARN HiveShim: Found an overridden method `configure` in TestUDTF, but Spark does not call the method during initialization because Spark does not use MapredContext inside (See SPARK-21533). So, you might reconsider the implementation of TestUDTF.
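
The kind of check described above can be sketched with plain Java reflection. The PR's actual hasInheritanceOf implementation is not shown in this thread, so this is only an illustration: `BaseFunction` is a hypothetical stand-in for Hive's GenericUDF, and `overridesMethod` and the other names are assumptions made for this example.

```scala
object ConfigureOverrideCheck {
  // Hypothetical stand-in for Hive's GenericUDF: a base class with a no-op `configure`.
  abstract class BaseFunction {
    def configure(context: AnyRef): Unit = ()
  }

  // Inherits the no-op `configure`, so there is nothing to warn about.
  class PlainFunction extends BaseFunction

  // Redeclares `configure`; this is the case a warning would target.
  class ConfiguredFunction extends BaseFunction {
    override def configure(context: AnyRef): Unit = ()
  }

  /**
   * True when `cls` is a subtype of `base` and the public method it exposes under
   * `name` is declared somewhere below `base`. Throws NoSuchMethodException if no
   * public method with that name and signature exists at all.
   */
  def overridesMethod(
      base: Class[_],
      cls: Class[_],
      name: String,
      paramTypes: Class[_]*): Boolean =
    base.isAssignableFrom(cls) &&
      cls.getMethod(name, paramTypes: _*).getDeclaringClass != base
}
```

In this sketch, `getMethod` resolves to the most specific public declaration, so comparing its declaring class with the base class distinguishes an inherited no-op from a user-supplied override.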

@maropu
Member Author

maropu commented Jul 29, 2017

@gatorsmile @hvanhovell What do you think of this? Thanks!

@SparkQA

SparkQA commented Jul 29, 2017

Test build #80035 has finished for PR 18768 at commit ff1b88b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 29, 2017

Test build #80037 has finished for PR 18768 at commit 1752f70.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 29, 2017

Test build #80038 has finished for PR 18768 at commit 2d586a8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Jul 29, 2017

Jenkins, retest this please.

}

def validateHiveUserDefinedFunction(udfClass: Class[_]): Unit = {
  if (hasInheritanceOf[GenericUDF]("configure", udfClass) ||
Member

Since when has the GenericUDF API had a configure method? It seems GenericUDF in 0.10.0 has no such method.

https://hive.apache.org/javadocs/r0.10.0/api/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.html

Member Author

Member

If this runs with a Hive version in which configure is not implemented yet, is hasInheritanceOf safe from NoSuchMethodException?

Member Author

IIUC, Spark always refers to hive-exec-1.2.1.spark2.jar, so it seems we can never hit the exception. Still, I think it is not a bad idea to catch NoSuchMethodException there for readability.

Member

Oh, that's right. As you said, it is still good to catch the exception.
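
A minimal sketch of the guard discussed in this thread, assuming the lookup is done with `Class.getMethod`, which throws NoSuchMethodException when no matching public method exists. `LegacyFunction`, `ModernFunction`, and `findMethod` are hypothetical names for illustration and are not taken from the PR.

```scala
import java.lang.reflect.Method

object SafeMethodLookup {
  // Hypothetical stand-in for an older API class that has no `configure` method.
  class LegacyFunction

  // Hypothetical stand-in for a newer API class that does declare `configure`.
  class ModernFunction {
    def configure(context: AnyRef): Unit = ()
  }

  /** Some(method) if a public method with this name and signature exists, None otherwise. */
  def findMethod(cls: Class[_], name: String, paramTypes: Class[_]*): Option[Method] =
    try {
      Some(cls.getMethod(name, paramTypes: _*))
    } catch {
      // A missing method is treated as "nothing to check" rather than as a failure.
      case _: NoSuchMethodException => None
    }
}
```

Returning an `Option` keeps a missing method from propagating as an exception, so a caller can simply skip the warning when the method does not exist in the loaded Hive version.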

@SparkQA

SparkQA commented Jul 29, 2017

Test build #80040 has finished for PR 18768 at commit 2d586a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

private def isSubClassOf(t: Type, parent: Class[_]): Boolean = t match {
  case cls: Class[_] => parent.isAssignableFrom(cls)
Member

@viirya viirya Jul 30, 2017

Will you ever pass in something that is not a Class[_]? If not, we can simply inline the isAssignableFrom check into hasInheritanceOf. Then we don't need Type at all.

Member Author

ok
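
The simplification suggested here might look like the following sketch, assuming the argument is always a `Class[_]`. The object name is an assumption for this example, and the JDK classes are used only to demonstrate the direction of the check.

```scala
object SubclassCheck {
  // With only Class[_] inputs, no pattern match on java.lang.reflect.Type is needed.
  // parent.isAssignableFrom(cls) holds when cls is parent itself or one of its subtypes.
  def isSubClassOf(cls: Class[_], parent: Class[_]): Boolean =
    parent.isAssignableFrom(cls)
}
```

Note that the receiver of `isAssignableFrom` is the parent: `classOf[Number].isAssignableFrom(classOf[Integer])` is true, while the reverse is false.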

@viirya
Member

viirya commented Jul 30, 2017

Btw, is it possible to add a unit test for it?

@SparkQA

SparkQA commented Jul 30, 2017

Test build #80052 has finished for PR 18768 at commit c9b6080.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 30, 2017

Test build #80053 has finished for PR 18768 at commit 34238ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Is this the only function we miss for different types of Hive UD(A/T)Fs?

@gatorsmile
Member

 * Construct a [[FunctionBuilder]] based on the provided class that represents a function.
 */
private def makeFunctionBuilder(name: String, clazz: Class[_]): FunctionBuilder = {
  validateHiveUserDefinedFunction(clazz)
Member

I am neutral on introducing a warning message for this case. I am not sure how helpful the warning message will be.

Member Author

OK. I think we can revisit this if we get more reports from users.

@maropu
Member Author

maropu commented Jul 31, 2017

@viirya I don't know of a way to add tests for this kind of case, and I think previous PRs similar to this one (which only added warning code) also had no tests. But if we could, we'd better do so...

@gatorsmile I checked these Hive classes and found some other functions that Spark currently ignores (e.g., GenericUDF.getRequiredJars, GenericUDF.close). Also, Spark currently invokes only the deprecated initialize interface in GenericUDTF, so users get stuck if they use the non-deprecated initialize interface (see #18527). As you suggested above, it seems we'd better leave some documentation about these unsupported behaviours.

@gatorsmile
Member

Yes! Thanks for your efforts! Please update the documentation first.

@maropu
Member Author

maropu commented Jul 31, 2017

OK, I'll make a PR later. Thanks!

@maropu
Member Author

maropu commented Aug 1, 2017

I think the documentation about these unsupported functions is enough for users, so I'll close this for now. If we get more feedback from users, we can revisit this. Thanks.

@maropu maropu closed this Aug 1, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request Aug 1, 2017
## What changes were proposed in this pull request?
This PR adds documentation about unsupported functions in Hive UDF/UDTF/UDAF.
This PR relates to apache#18768 and apache#18527.

## How was this patch tested?
N/A

Author: Takeshi Yamamuro <[email protected]>

Closes apache#18792 from maropu/HOTFIX-20170731.