@@ -254,6 +254,20 @@ case class StaticInvoke(
returnNullable: Boolean = true,
isDeterministic: Boolean = true) extends InvokeLike {

// This additional constructor is added to keep binary compatibility after the addition of the
[Member]
Hm, technically all expressions under catalyst are private, and we don't maintain binary compatibility here. For the same reason, we don't run MiMa on them either. I believe downstream projects can work around this via reflection.
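The reflection workaround could look roughly like this: a minimal, self-contained sketch (the `Box`/`mkBox` names are hypothetical, not from Spark) in which a library resolves whichever constructor arity the running version provides and fills the trailing defaulted parameter itself.

```scala
object ReflectiveCtor {
  // Stand-in for a catalyst expression whose constructor gained a
  // trailing `Boolean = true` parameter between releases.
  case class Box(v: Int, flag: Boolean = true)

  // Pick the widest constructor at runtime and pad the trailing
  // boolean(s) with TRUE, so the same library jar works against
  // either the old or the new signature.
  def mkBox(v: Int): Box = {
    val ctor = classOf[Box].getConstructors.maxBy(_.getParameterCount)
    val extras =
      Array.fill[AnyRef](ctor.getParameterCount - 1)(java.lang.Boolean.TRUE)
    ctor.newInstance((Int.box(v) +: extras): _*).asInstanceOf[Box]
  }
}
```

This trades a compile-time link against the internal API for a runtime lookup, which is why it survives the kind of signature change discussed here.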

[Member]
We also made this kind of argument change in 3.2.0 (7d8181b) without keeping binary compatibility. I would go -1 on this change: it makes little sense to keep binary compatibility for this specific argument in a private package that we document as internal and for which we intentionally skip the binary compatibility check.

[Comment]
The warning says "between minor releases" ;)

[@HyukjinKwon (Member), Feb 3, 2022]
This is an internal API, and I think it makes little sense to change it just to keep binary compatibility. We should probably mention maintenance releases too - note that these expressions were all explicitly private[sql] before (we removed that at SPARK-16813 to make the code easier to debug). Such compatibility has never been guaranteed in history.

One option might be to revert #35243 from branch-3.2: to my knowledge it is trivial, V2 expressions are still unstable, and it virtually affects nothing by default in Spark 3.2.1.

[Comment]
I'll leave it up to the maintainers to decide whether to revert, keep this change, or break binary compatibility, but I'll add some library-maintainer context here (I maintain scalapb and sparksql-scalapb). We currently don't have a way to let users work with custom types in Datasets (such as sealed trait hierarchies). To remedy that, Spark provides Encoder and Decoder, which I believe are public (?); however, implementing them requires ExpressionEncoder, which quickly takes you into catalyst expressions to do anything useful (instantiating objects, querying them, etc.). Spark currently doesn't provide a general solution in this space, so library maintainers (myself included) dipped into the internals, and end users depend on us for this.

Maintaining compatibility in the Spark/Scala ecosystem is really time-consuming for maintainers - see this and this. The need for those releases came from users noticing problems, resulting in debugging by maintainers, and so on. I'd ask that we avoid or minimize binary breakages between maintenance releases; breaking binary compatibility in feature releases makes it hard enough. Thank you!

[@HyukjinKwon (Member), Feb 4, 2022]
I do sympathize with that. In order to address all such problems, expressions for the API (V2 expressions) are under heavy development as a long-run goal. I also agree that it's probably best to avoid changes that unnecessarily break the compatibility of private/internal APIs, e.g., when avoiding them does not add significant dev overhead.

For this PR, it would look awkward and confusing (see the comments in the code): should developers keep binary compatibility only in StaticInvoke and Invoke, or in all expressions? In addition, we would have to keep adding overloaded constructors, which is not ideal for a private/internal API.

Encoder and Decoder are indeed public, but ExpressionEncoder is currently not (it lives under the internal catalyst package). We guarantee and maintain binary and backward compatibility, with a binary compatibility check, for the public API as documented - but not for internal APIs.
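Since the thread notes that the binary compatibility check (MiMa) is intentionally skipped for catalyst, a downstream library that depends on these internals can at least run the check on its own artifacts. A rough sbt sketch - artifact coordinates and version numbers here are illustrative, not prescriptive:

```scala
// project/plugins.sbt
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.3")

// build.sbt: compare the current build against the last released
// artifact and report binary-incompatible changes.
mimaPreviousArtifacts := Set("org.example" %% "mylib" % "1.0.0")
// Run with: sbt mimaReportBinaryIssues
```

Note this only detects breakage in the library's own API; incompatibilities introduced by Spark's internals still surface only at runtime, which is the crux of this discussion.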

[Contributor]
AFAIK this is always case by case. Yes, we don't expect people to rely on private classes such as Expression, but the fact is many Spark libraries are already using these private classes.

The ecosystem is very important to Spark and I think we should try our best to fix binary compatibility if it does break downstream libraries. I'm +1 to this PR.

[@HyukjinKwon (Member), Feb 7, 2022]
Once we make this kind of compatibility guarantee, we will have to make such exceptions every time a downstream project is broken by its use of our internal or private code. If the impact is very significant and a large user group is affected, maybe we should consider making an exception - but note that it would be an exception to the norm.

// above `isDeterministic` parameter. See SPARK-38077 for more detail.
def this(
[@viirya (Member), Feb 2, 2022]
Could you add a comment here explaining why we need this? Without any context, it looks a bit redundant.

[Member Author]
Sure, added comments.

staticObject: Class[_],
dataType: DataType,
functionName: String,
arguments: Seq[Expression],
inputTypes: Seq[AbstractDataType],
propagateNull: Boolean,
returnNullable: Boolean) = {
this(staticObject, dataType, functionName, arguments, inputTypes,
propagateNull, returnNullable, true)
}

val objectName = staticObject.getName.stripSuffix("$")
val cls = if (staticObject.getName == objectName) {
staticObject
@@ -321,6 +335,20 @@ case class StaticInvoke(
copy(arguments = newChildren)
}

object StaticInvoke {
def apply(
[Member]
Since we have the secondary constructor, do we still need apply?

[Comment]
The `this` constructor ends up as a constructor on the StaticInvoke class itself, while `apply` becomes a method on StaticInvoke$, the class backing StaticInvoke's companion object. Both are needed for binary compatibility: the Scala compiler generates both of them from the primary constructor and its defaults.
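The point about both entry points being emitted can be illustrated with a small stand-in (the `Node`/`CompatDemo` names are hypothetical, not the actual StaticInvoke):

```scala
object CompatDemo {
  // Stand-in for StaticInvoke: the primary constructor gained a
  // defaulted parameter, so both the old constructor and the old
  // companion apply are restored by hand.
  case class Node(name: String, deterministic: Boolean = true) {
    // Matches the pre-change `new Node(name)` bytecode signature.
    def this(name: String) = this(name, true)
  }

  object Node {
    // Matches the pre-change `Node(name)` call, which compiles to
    // Node$.apply(name) on the companion class.
    def apply(name: String): Node = Node(name, deterministic = true)
  }

  // Both old entry points are visible through reflection, which is
  // what linking previously compiled bytecode effectively requires.
  def hasOldSignatures: Boolean = {
    val oldCtor =
      classOf[Node].getConstructors.exists(_.getParameterCount == 1)
    val oldApply = Node.getClass.getMethods
      .exists(m => m.getName == "apply" && m.getParameterCount == 1)
    oldCtor && oldApply
  }
}
```

Calls compiled against the old one-argument shape resolve to the hand-written overloads, while new code picks up the defaulted primary constructor; both construct an identical value.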

staticObject: Class[_],
dataType: DataType,
functionName: String,
arguments: Seq[Expression],
inputTypes: Seq[AbstractDataType],
propagateNull: Boolean,
returnNullable: Boolean): StaticInvoke = {
StaticInvoke(staticObject, dataType, functionName, arguments, inputTypes, propagateNull,
returnNullable, true)
}
}

/**
* Calls the specified function on an object, optionally passing arguments. If the `targetObject`
* expression evaluates to null then null will be returned.
@@ -358,6 +386,20 @@ case class Invoke(
returnNullable : Boolean = true,
isDeterministic: Boolean = true) extends InvokeLike {

// This additional constructor is added to keep binary compatibility after the addition of the
// above `isDeterministic` parameter. See SPARK-38077 for more detail.
def this(
targetObject: Expression,
functionName: String,
dataType: DataType,
arguments: Seq[Expression],
methodInputTypes: Seq[AbstractDataType],
propagateNull: Boolean,
returnNullable : Boolean) = {
this(targetObject, functionName, dataType, arguments, methodInputTypes, propagateNull,
returnNullable, true)
}

lazy val argClasses = ScalaReflection.expressionJavaClasses(arguments)

override def nullable: Boolean = targetObject.nullable || needNullCheck || returnNullable
@@ -471,6 +513,19 @@ case class Invoke(
copy(targetObject = newChildren.head, arguments = newChildren.tail)
}

object Invoke {
def apply(
targetObject: Expression,
functionName: String,
dataType: DataType,
arguments: Seq[Expression],
methodInputTypes: Seq[AbstractDataType],
propagateNull: Boolean,
returnNullable : Boolean): Invoke = {
Invoke(targetObject, functionName, dataType, arguments, methodInputTypes,
propagateNull, returnNullable, true)
}
}
object NewInstance {
def apply(
cls: Class[_],