[SPARK-38077][SQL] Fix binary compatibility issue with isDeterministic flag #35378
Changes from all commits
@@ -254,6 +254,20 @@ case class StaticInvoke(
    returnNullable: Boolean = true,
    isDeterministic: Boolean = true) extends InvokeLike {

  // This additional constructor is added to keep binary compatibility after the addition of the
  // above `isDeterministic` parameter. See SPARK-38077 for more detail.
  def this(
Member
Add a comment here explaining why we need this? Without any context, this looks a bit redundant.

Member (Author)
Sure, added comments.
      staticObject: Class[_],
      dataType: DataType,
      functionName: String,
      arguments: Seq[Expression],
      inputTypes: Seq[AbstractDataType],
      propagateNull: Boolean,
      returnNullable: Boolean) = {
    this(staticObject, dataType, functionName, arguments, inputTypes,
      propagateNull, returnNullable, true)
  }

  val objectName = staticObject.getName.stripSuffix("$")
  val cls = if (staticObject.getName == objectName) {
    staticObject
@@ -321,6 +335,20 @@ case class StaticInvoke(
    copy(arguments = newChildren)
}

object StaticInvoke {
  def apply(
Member
As we have the second constructor, do we need this as well?

Member (Author)
The …
      staticObject: Class[_],
      dataType: DataType,
      functionName: String,
      arguments: Seq[Expression],
      inputTypes: Seq[AbstractDataType],
      propagateNull: Boolean,
      returnNullable: Boolean): StaticInvoke = {
    StaticInvoke(staticObject, dataType, functionName, arguments, inputTypes, propagateNull,
      returnNullable, true)
  }
}

/**
 * Calls the specified function on an object, optionally passing arguments. If the `targetObject`
 * expression evaluates to null then null will be returned.
@@ -358,6 +386,20 @@ case class Invoke(
    returnNullable : Boolean = true,
    isDeterministic: Boolean = true) extends InvokeLike {

  // This additional constructor is added to keep binary compatibility after the addition of the
  // above `isDeterministic` parameter. See SPARK-38077 for more detail.
  def this(
      targetObject: Expression,
      functionName: String,
      dataType: DataType,
      arguments: Seq[Expression],
      methodInputTypes: Seq[AbstractDataType],
      propagateNull: Boolean,
      returnNullable : Boolean) = {
    this(targetObject, functionName, dataType, arguments, methodInputTypes, propagateNull,
      returnNullable, true)
  }

  lazy val argClasses = ScalaReflection.expressionJavaClasses(arguments)

  override def nullable: Boolean = targetObject.nullable || needNullCheck || returnNullable
@@ -471,6 +513,19 @@ case class Invoke(
    copy(targetObject = newChildren.head, arguments = newChildren.tail)
}

object Invoke {
  def apply(
      targetObject: Expression,
      functionName: String,
      dataType: DataType,
      arguments: Seq[Expression],
      methodInputTypes: Seq[AbstractDataType],
      propagateNull: Boolean,
      returnNullable : Boolean): Invoke = {
    Invoke(targetObject, functionName, dataType, arguments, methodInputTypes,
      propagateNull, returnNullable, true)
  }
}

object NewInstance {
  def apply(
      cls: Class[_],
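For context on the review question above (why a companion `apply` overload is added on top of the auxiliary constructor): in Scala, a call written as `StaticInvoke(...)` compiles to an invocation of the synthetic companion `apply`, while `new StaticInvoke(...)` targets the constructor, so restoring only one of the two old signatures would still leave some pre-compiled callers broken. A minimal toy sketch of the pattern, using a made-up `Widget` class rather than Spark code:

```scala
// Adding a defaulted parameter to a case class changes the JVM signature of both
// the primary constructor and the synthetic companion `apply`. Restoring the old
// signatures therefore needs an extra constructor AND an extra `apply` overload.
case class Widget(name: String, size: Int, deterministic: Boolean = true) {
  // Overload matching the old (name, size) constructor signature.
  def this(name: String, size: Int) = this(name, size, true)
}

object Widget {
  // Overload matching the old companion `apply` signature, used by callers
  // compiled against `Widget("x", 1)` (i.e. `Widget$.MODULE$.apply(String, int)`).
  def apply(name: String, size: Int): Widget = Widget(name, size, true)
}

object BinaryCompatDemo extends App {
  println(new Widget("a", 1)) // resolves to the two-argument constructor
  println(Widget("b", 2))     // resolves to the two-argument companion apply
}
```

Overload resolution prefers alternatives that need no default arguments, so existing source code keeps compiling unchanged as well, while new code can still pass the extra flag explicitly.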
Hm, technically all expressions under catalyst are private, and we don't maintain binary compatibility here. For the same reason, we don't run MiMa either. I believe downstream projects can work around this by reflection.
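(As a rough, hypothetical sketch of that reflection workaround, not taken from any existing library: a `StaticInvokeCompat` helper that probes the primary constructor, under the assumption that newer Spark versions only append trailing Boolean parameters such as `isDeterministic`.)

```scala
// Hypothetical sketch (not from this PR): construct StaticInvoke reflectively so the
// same library binary works whether this Spark version declares 7 or 8 parameters.
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke
import org.apache.spark.sql.types.{AbstractDataType, DataType}

object StaticInvokeCompat {
  // The primary constructor's arity differs across Spark versions.
  private val ctor = classOf[StaticInvoke].getConstructors.maxBy(_.getParameterCount)

  def apply(
      staticObject: Class[_],
      dataType: DataType,
      functionName: String,
      arguments: Seq[Expression],
      inputTypes: Seq[AbstractDataType],
      propagateNull: Boolean,
      returnNullable: Boolean): StaticInvoke = {
    val base: Seq[AnyRef] = Seq(staticObject, dataType, functionName, arguments, inputTypes,
      Boolean.box(propagateNull), Boolean.box(returnNullable))
    // Pad with `true` for any parameters added after returnNullable (e.g. isDeterministic).
    val padded = base ++ Seq.fill(ctor.getParameterCount - base.size)(java.lang.Boolean.TRUE)
    ctor.newInstance(padded: _*).asInstanceOf[StaticInvoke]
  }
}
```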
We also made this kind of argument change in 3.2.0 (7d8181b) without keeping binary compatibility. I would go -1 for this change - it makes less sense to keep binary compatibility for this argument specifically, in a private package which we documented as such and for which we intentionally skip the binary compatibility check.
The warning says "between minor releases" ;)
This is an internal API, and I think it makes less sense to make changes here just to keep binary compatibility. We should probably mention maintenance releases too - note that these expressions were all explicitly `private[sql]` before (which we removed in SPARK-16813 to make the code easier to debug). Such compatibility has never been guaranteed in history.

One option might be to revert #35243 from branch-3.2, since it is trivial to my knowledge, V2 expressions are still unstable, and it virtually doesn't affect anything by default in Spark 3.2.1.
I'll leave this up to the maintainers to decide whether to revert, keep this change, or break binary compatibility. I'll add the library maintainer context here though (I maintain scalapb and sparksql-scalapb). We currently don't have a way to give users the ability to use custom types with Datasets (such as sealed trait hierarchies). To remedy that, Spark provides `Encoder` and `Decoder`, which I believe are public (?), however implementing them requires `ExpressionEncoder`, which quickly takes you to catalyst expressions to do anything useful (instantiating objects, querying them, etc). Spark currently doesn't provide a general solution in this space, and apparently library maintainers (myself included) dipped into the internals, and end users depend on us for this.

Maintaining compatibility in the Spark/Scala ecosystem is really time consuming for maintainers - see this and this. The need for those versions came from users noticing problems, resulting in debugging by maintainers, and so on. I'd like to ask to avoid/minimize binary breakages between maintenance releases. Breaking binary compatibility on feature releases makes it hard enough. Thank you!
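(To make that concrete, a rough, hypothetical sketch of how an encoder library ends up building catalyst object expressions for a custom type; `MyMessage` and its methods are invented for illustration and this is not sparksql-scalapb code.)

```scala
import org.apache.spark.sql.catalyst.expressions.{BoundReference, Expression}
import org.apache.spark.sql.catalyst.expressions.objects.{Invoke, StaticInvoke}
import org.apache.spark.sql.types.{BinaryType, ObjectType}

// A user-defined type that (de)serializes itself to bytes.
class MyMessage(val payload: Array[Byte]) {
  def toByteArray: Array[Byte] = payload
}
object MyMessage {
  def parseFrom(bytes: Array[Byte]): MyMessage = new MyMessage(bytes)
}

object EncoderSketch {
  private val objType = ObjectType(classOf[MyMessage])

  // Serializer side: call `toByteArray` on the incoming object to get a BinaryType value.
  val serializer: Expression =
    Invoke(BoundReference(0, objType, nullable = true), "toByteArray", BinaryType)

  // Deserializer side: call the static forwarder `MyMessage.parseFrom` on the stored bytes.
  val deserializer: Expression = StaticInvoke(
    classOf[MyMessage],
    objType,
    "parseFrom",
    Seq(BoundReference(0, BinaryType, nullable = true)))
}
```

Both calls above rely on the current `Invoke` and `StaticInvoke` signatures, which is exactly where a changed parameter list breaks pre-compiled libraries.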
I do sympathize with that. In order to address all such problems, expressions for the API (V2 expressions) are under heavy development as a long-run goal. I also agree that it's probably best to avoid changes that unnecessarily break the compatibility of private/internal APIs, e.g., if that does not bring significant dev overhead.

For this PR, it would look awkward and confusing (see the comments in the code) whether developers should keep binary compatibility only in the `StaticInvoke` and `Invoke` expressions or in all the expressions. In addition, we would have to keep adding overloaded constructors, which is not ideal for a private/internal API.

`Encoder` and `Decoder` are indeed public, but `ExpressionEncoder` is currently not (it is under the internal `catalyst` package). We guarantee and maintain binary and backward compatibility, with a binary compatibility check, as documented for the public API, but not for the internal API.
AFAIK this is always case by case. Yes, we don't expect people to rely on private classes such as `Expression`, but the fact is many Spark libraries are already using these private classes.

The ecosystem is very important to Spark and I think we should try our best to fix binary compatibility if it does break downstream libraries. I'm +1 to this PR.
Once we keep this compatibility, we will have to make such exceptions every time downstream projects are broken by using our internal or private code. If this is very significant and a large user group is affected, maybe we should think about making it an exception, but note that this would be an exception to the norm.