Skip to content

Conversation

@ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Aug 20, 2020

What changes were proposed in this pull request?

Use initialize(StructObjectInspector argOIs) instead initialize(ObjectInspector[] args) in HiveGenericUDTF.

Why are the changes needed?

In our case, we implement a Hive GenericUDTF and override initialize(StructObjectInspector argOIs). Then it's ok to execute with Hive, but failed with Spark SQL. Here is the Spark SQL error msg:

No handler for UDF/UDAF/UDTF 'com.xxxx.xxxUDTF': java.lang.IllegalStateException: 
Should not be called directly Please make sure your function overrides 
`public StructObjectInspector initialize(ObjectInspector[] args)`.

The reason is Spark HiveGenericUDTF call initialize(ObjectInspector[] argOIs) to init a UDTF, but it's a Deprecated method.

    public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException {
        List<? extends StructField> inputFields = argOIs.getAllStructFieldRefs();
        ObjectInspector[] udtfInputOIs = new ObjectInspector[inputFields.size()];

        for(int i = 0; i < inputFields.size(); ++i) {
            udtfInputOIs[i] = ((StructField)inputFields.get(i)).getFieldObjectInspector();
        }

        return this.initialize(udtfInputOIs);
    }

    @Deprecated
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        throw new IllegalStateException("Should not be called directly");
    }

We should use initialize(StructObjectInspector argOIs) to do this so that we can be compatible both of the two method. Same as Hive.

Does this PR introduce any user-facing change?

Yes, fix UDTF initialize method.

How was this patch tested?

manual test and passed HiveUDFDynamicLoadSuite

protected lazy val inputInspectors = children.map(toInspector)
protected lazy val inputInspectors = {
val inspectors = children.map(toInspector)
val fields = inspectors.indices.map(index => s"col$index").asJava
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name is not important so use col0 col1 ...

@SparkQA
Copy link

SparkQA commented Aug 20, 2020

Test build #127692 has finished for PR 29490 at commit c014b75.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

The failed test caused by #18527, I think we should support UTDF UDTFStack2. cc @wangyum

@dlutleixin
Copy link

yes, I have the same problem while using hive UDTF in spark-sql or spark.sql because of not override
public StructObjectInspector initialize(ObjectInspector[] args)
then I override it according to error msg and it works.

@ulysses-you
Copy link
Contributor Author

Then what do you think about this ? cc @dongjoon-hyun @sunchao

@dongjoon-hyun
Copy link
Member

Thank you for pinging me, @ulysses-you .
cc @gatorsmile , too.

@SparkQA
Copy link

SparkQA commented Nov 26, 2020

Test build #131822 has finished for PR 29490 at commit c014b75.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sunchao
Copy link
Member

sunchao commented Dec 30, 2020

oops sorry @ulysses-you just remembered that you pinged me on this PR. This looks mostly good to me exception one question: since the new API is added in 0.13 and Spark still support Hive 0.12, do we need to take care of backward compatibility here?

cc @wangyum

@gatorsmile
Copy link
Member

cc @somani

@ulysses-you
Copy link
Contributor Author

thanks @sunchao . If don't miss something, Hive1.2 has been removed since SPARK-32981 with branch-3.1. Hive0.12 is so far for master branch so we don't need care about compatible with it. cc @dongjoon-hyun isn't it ?

@gatorsmile
Copy link
Member

@sunchao , you think this will be affected by the Hive metastore client versions?

@sunchao
Copy link
Member

sunchao commented Dec 31, 2020

@sunchao , you think this will be affected by the Hive metastore client versions?

This is the part I'm not sure, that is, Spark loading permanent UDF classes from a HMS with version 0.12. But apparently Hive 0.12 doesn't support permanent UDF so seems this is not an issue.

@sunchao
Copy link
Member

sunchao commented Dec 31, 2020

BTW #30665 is trying to solve the same issue and we should consolidate on one PR.

@wangyum
Copy link
Member

wangyum commented Jan 1, 2021

retest this please.

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38167/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38167/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Test build #133578 has finished for PR 29490 at commit c014b75.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Test build #133580 has finished for PR 29490 at commit 05bd795.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38169/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38170/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38170/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38169/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38171/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Test build #133581 has finished for PR 29490 at commit 6286395.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38171/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38172/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Test build #133582 has finished for PR 29490 at commit 361f6ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38172/

@SparkQA
Copy link

SparkQA commented Jan 1, 2021

Test build #133583 has finished for PR 29490 at commit 1ba03cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

The code and PR description look making sense but I think somebody should take a close look ... I will leave it to @sunchao and @wangyum if you guys don't mind.

@sunchao
Copy link
Member

sunchao commented Jan 3, 2021

I think there could be cases where one registers Hive 0.12 UDFs (although not sure how rare this is) in a HMS that Spark talks to, and things may fail when calling the UDFs since they don't have the new initialize method.

@ulysses-you
Copy link
Contributor Author

@sunchao you mean user create a permanent udf which is from Hive0.12 build-in function ? If so I believe it's really rare ..

@sunchao
Copy link
Member

sunchao commented Jan 4, 2021

@ulysses-you nvm please ignore my comment above :) I was thinking the case where Spark somehow loads the Hive 0.12 GenericUDTF class when executing user's custom UDTF but it seems that's not possible.

@wangyum
Copy link
Member

wangyum commented Jan 4, 2021

Do we still need this error message?

if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
s"$noHandlerMsg\nPlease make sure your function overrides " +
"`public StructObjectInspector initialize(ObjectInspector[] args)`."

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Test build #133632 has finished for PR 29490 at commit b739b1a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38221/

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38221/

@wangyum wangyum closed this in 48b9611 Jan 10, 2021
@wangyum
Copy link
Member

wangyum commented Jan 10, 2021

Merged to master.

@ulysses-you
Copy link
Contributor Author

thanks all!

@ulysses-you ulysses-you deleted the SPARK-32668 branch January 11, 2021 11:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants