Add tests for vectorized UDF. #26
As for null-related errors, the reason is Pandas changes dtypes if integral type columns contain |
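The dtype change described here can be reproduced in plain pandas. This is a minimal sketch, assuming the missing value is represented as `None`; the variable names are illustrative and not taken from the PR.

```python
import pandas as pd

# An integral column without missing values keeps an integer dtype.
ints = pd.Series([1, 2, 3], dtype="int64")

# Assumption: the column contains a missing value (None). pandas stores it
# as NaN, which has no integer representation, so the column is upcast
# to float64.
with_null = pd.Series([1, None, 3])

print(ints.dtype)       # int64
print(with_null.dtype)  # float64
```

A test that expects a long column back from such a Series would then see floating-point values instead, which is one plausible way the null-related errors could surface.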
from pyspark.sql.functions import pandas_udf
import pandas as pd
df = self.spark.range(100000)
f0 = pandas_udf(lambda **kwargs: pd.Series(1).repeat(kwargs['size']), LongType())
I assumed that you would pass the size hint via **kwargs.
I'll give it a try and we can see how it turns out, then discuss.
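For reference, the lambda under review can be exercised without Spark. This is a sketch in plain pandas only: how `pandas_udf` would supply the `size` keyword is not shown in this thread, so the direct call below is an assumption for illustration, and `constant_column` is a hypothetical name.

```python
import pandas as pd

# Hypothetical stand-in for the lambda passed to pandas_udf in the test.
def constant_column(**kwargs):
    # Repeat the single value 1 as many times as the size hint requests.
    return pd.Series(1).repeat(kwargs["size"])

# Assumption: the caller passes the batch length as the `size` keyword.
result = constant_column(size=5)
print(len(result))      # 5
print(result.tolist())  # [1, 1, 1, 1, 1]
```

Note that `Series.repeat` repeats the index label along with the value, so every row keeps the label 0; `reset_index(drop=True)` gives a fresh 0..n-1 index if one is needed.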
Thanks @ueshin, this looks good. I'll merge it now and work on the errors.
Thanks for the pointer, I'll take a look at what's not implemented and see if I can help push that along.
Modify test_vectorized_udf_datatype_string not to fail by unrelated error. closes #26
Thanks, merged now. I still need to fix the null-related errors with your suggestion. Would you mind if I used the |
@BryanCutler Sure, go ahead and use it. Thanks!
Added tests from apache#19147 and adjusted.
Currently these tests produce 1 failure and 7 errors.
test_vectorized_udf_datatype_string