[SPARK-24568] Code refactoring for DataType equalsXXX methods #21787
Can one of the admins verify this patch?
HyukjinKwon
left a comment
How much does this deduplicate? It seems like overkill: the result is more complicated and does not reduce the code much. At a rough guess, about 60 lines deleted and 70 lines added.
Thanks @HyukjinKwon! I removed the old commented-out code in a new commit.
Please let me know if I need to change anything to make the code changes more readable and less complex.
But the current code looks less readable and adds more lines. I would rather leave this as it was.
True. Currently we have just 3 variations of comparing two data types for equality. Adding even one more equality function would mean writing the same repetitive code again, which would offset the observed increase in line count.
Adding constant variables and squashing the logic into one function does not look worthwhile and seems like overkill. Less duplication is good, of course, but here it does not justify the added complexity. I would focus on more important stuff. -1 from me.
Closes apache#17422 Closes apache#17619 Closes apache#18034 Closes apache#18229 Closes apache#18268 Closes apache#17973 Closes apache#18125 Closes apache#18918 Closes apache#19274 Closes apache#19456 Closes apache#19510 Closes apache#19420 Closes apache#20090 Closes apache#20177 Closes apache#20304 Closes apache#20319 Closes apache#20543 Closes apache#20437 Closes apache#21261 Closes apache#21726 Closes apache#14653 Closes apache#13143 Closes apache#17894 Closes apache#19758 Closes apache#12951 Closes apache#17092 Closes apache#21240 Closes apache#16910 Closes apache#12904 Closes apache#21731 Closes apache#21095 Added: Closes apache#19233 Closes apache#20100 Closes apache#21453 Closes apache#21455 Closes apache#18477 Added: Closes apache#21812 Closes apache#21787 Author: hyukjinkwon <[email protected]> Closes apache#21781 from HyukjinKwon/closing-prs.
What changes were proposed in this pull request?
Currently, the DataType equals* methods have a lot of code duplication. This PR adds -
Current functionality -
DataTypes are matched on three factors - (fieldName, dataType, nullability)
Current APIs
The same API contracts are maintained; new helper functions equalsDataTypes, isSameFieldName, and isSameNullability are added to give more flexibility and to centralize the code.
Additional private vals are added to set the behavior of the helper functions.
Any combination of the above variables can be used in the future to add new equals* APIs. For instance,
can easily be generated by calling -
New ScalaTest cases are also added to cover APIs that previously lacked unit tests.
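To make the idea of a single flag-driven comparison concrete, here is a minimal, self-contained sketch. It is not the code from this PR and does not use Spark's classes: the toy types (DType, IntT, StrT, ArrT, Field, StructT) and the object name TypeEquality are hypothetical, and although the helper names equalsDataTypes, isSameFieldName, and isSameNullability are taken from the description above, their signatures and the two Boolean flags are assumptions made for illustration.

```scala
// Toy model of a nested type tree. These are NOT Spark's
// org.apache.spark.sql.types classes; they exist only for this sketch.
sealed trait DType
case object IntT extends DType
case object StrT extends DType
final case class ArrT(element: DType, containsNull: Boolean) extends DType
final case class Field(name: String, dataType: DType, nullable: Boolean)
final case class StructT(fields: Seq[Field]) extends DType

object TypeEquality {
  // Helper names follow the PR description; the signatures are assumed.
  def isSameFieldName(a: String, b: String, ignoreCase: Boolean): Boolean =
    if (ignoreCase) a.equalsIgnoreCase(b) else a == b

  def isSameNullability(a: Boolean, b: Boolean, ignoreNullability: Boolean): Boolean =
    ignoreNullability || a == b

  // One recursive comparison over (fieldName, dataType, nullability),
  // with flags selecting which factors are relaxed.
  def equalsDataTypes(
      from: DType,
      to: DType,
      ignoreCase: Boolean,
      ignoreNullability: Boolean): Boolean = (from, to) match {
    case (ArrT(fromElem, fromNull), ArrT(toElem, toNull)) =>
      isSameNullability(fromNull, toNull, ignoreNullability) &&
        equalsDataTypes(fromElem, toElem, ignoreCase, ignoreNullability)
    case (StructT(fromFields), StructT(toFields)) =>
      fromFields.length == toFields.length &&
        fromFields.zip(toFields).forall { case (f, t) =>
          isSameFieldName(f.name, t.name, ignoreCase) &&
            isSameNullability(f.nullable, t.nullable, ignoreNullability) &&
            equalsDataTypes(f.dataType, t.dataType, ignoreCase, ignoreNullability)
        }
    // Leaf types, or mismatched kinds, fall back to plain equality.
    case (f, t) => f == t
  }

  // Each variant is then a one-line combination of flags.
  def equalsIgnoreNullability(a: DType, b: DType): Boolean =
    equalsDataTypes(a, b, ignoreCase = false, ignoreNullability = true)

  def equalsIgnoreCaseAndNullability(a: DType, b: DType): Boolean =
    equalsDataTypes(a, b, ignoreCase = true, ignoreNullability = true)
}
```

With this shape, two structs that differ only in field-name case and nullability, for example StructT(Seq(Field("id", IntT, nullable = false))) and StructT(Seq(Field("ID", IntT, nullable = true))), compare as unequal under equalsIgnoreNullability and as equal under equalsIgnoreCaseAndNullability. The trade-off raised in the review still applies: the flags centralize the logic, but each call site becomes harder to read than a dedicated method.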
How was this patch tested?
The code was tested with the existing ScalaTest suites. New ScalaTest cases were added for more robust testing of this functionality.
All tests were run locally to make sure the change does not affect anything else.
Please review http://spark.apache.org/contributing.html before opening a pull request.