-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: support logical plan ScalarSubquery protobuf serde #18167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| pub fn parse_expr( | ||
| proto: &protobuf::LogicalExprNode, | ||
| registry: &dyn FunctionRegistry, | ||
| ctx: &TaskContext, | ||
| codec: &dyn LogicalExtensionCodec, | ||
| ) -> Result<Expr, Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Expr::ScalarSubquery essentially contains a LogicalPlan, in order to parse this logical plan we need to use LogicalPlanNode::try_into_logical_plan which needs a TaskContext. So had to expand this API to required a TaskContext instead of simply dyn FunctionRegistry (which TaskContext implements). This propagates in many other places.
| ExprType::ScalarSubquery(scalar_subquery) => { | ||
| let subquery = scalar_subquery | ||
| .subquery | ||
| .as_ref() | ||
| .ok_or_else(|| Error::required("subquery"))?; | ||
| Ok(Expr::ScalarSubquery(Subquery { | ||
| subquery: Arc::new(subquery.try_into_logical_plan(ctx, codec)?), | ||
| outer_ref_columns: parse_exprs( | ||
| &scalar_subquery.outer_ref_columns, | ||
| ctx, | ||
| codec, | ||
| )?, | ||
| spans: Spans::new(), | ||
| })) | ||
| } | ||
| ExprType::OuterReferenceColumn(OuterReferenceColumn { field, column }) => { | ||
| let column = column.to_owned().ok_or_else(|| Error::required("column"))?; | ||
| let field = field.as_ref().required("field")?; | ||
| Ok(Expr::OuterReferenceColumn(Arc::new(field), column.into())) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual implementation for deserializing
| Expr::ScalarSubquery(Subquery { | ||
| subquery, | ||
| outer_ref_columns, | ||
| spans: _, | ||
| }) => protobuf::LogicalExprNode { | ||
| expr_type: Some(ExprType::ScalarSubquery(Box::new( | ||
| protobuf::ScalarSubquery { | ||
| subquery: Some(Box::new( | ||
| protobuf::LogicalPlanNode::try_from_logical_plan(subquery, codec) | ||
| .map_err(|e| Error::General(format!("Proto serialization error: Failed to serialize Scalar Subquery: {e}")))?, | ||
| )), | ||
| outer_ref_columns: serialize_exprs(outer_ref_columns, codec)?, | ||
| }, | ||
| ))), | ||
| }, | ||
| Expr::OuterReferenceColumn(field, column) => { | ||
| protobuf::LogicalExprNode { | ||
| expr_type: Some(ExprType::OuterReferenceColumn(OuterReferenceColumn{ | ||
| field: Some(field.as_ref().try_into()?), column: Some(column.into()) | ||
| })) | ||
| } | ||
| } | ||
| Expr::InSubquery(_) | Expr::Exists { .. } => { | ||
| return Err(Error::NotImplemented("Proto serialization error: Expr::InSubquery(_) | Expr::Exists { .. } not supported".to_string())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual implementation for serializing
| protobuf::SelectionNode { | ||
| input: Some(Box::new(input)), | ||
| expr: Some(serialize_expr( | ||
| expr: Some(Box::new(serialize_expr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes to the datafusion.proto seems to also affect some other nodes during the generation of the proto code, probably because of the enum becoming too big so it now boxes seemingly unrelated nodes.
| bytes: &[u8], | ||
| registry: &dyn FunctionRegistry, | ||
| ) -> Result<Self>; | ||
| fn from_bytes_with_ctx(bytes: &[u8], ctx: &TaskContext) -> Result<Self>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking change to API here since we need a TaskContext because of that parse_expr change; I figured it'd make sense to also rename the method to be more accurate.
| /// [`from_bytes_with_registry`]: Self::from_bytes_with_registry | ||
| fn from_bytes(bytes: &[u8]) -> Result<Self> { | ||
| Self::from_bytes_with_registry(bytes, ®istry::NoRegistry {}) | ||
| Self::from_bytes_with_ctx(bytes, &TaskContext::default()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default TaskContext will have an empty function registry just like NoRegistry so this is equivalent; I also deleted NoRegistry entirely as it wasn't used anywhere else after it was removed here.
| // Copied from from_bytes_with_ctx below but with placeholder registry instead of | ||
| // default. | ||
| { | ||
| let bytes: &[u8] = &bytes; | ||
| let protobuf = protobuf::LogicalExprNode::decode(bytes).map_err(|e| { | ||
| plan_datafusion_err!("Error decoding expr as protobuf: {e}") | ||
| })?; | ||
|
|
||
| let extension_codec = PlaceholderLogicalExtensionCodec {}; | ||
| logical_plan::from_proto::parse_expr( | ||
| &protobuf, | ||
| &TaskContext::default(), | ||
| &extension_codec, | ||
| ) | ||
| .map_err(|e| plan_datafusion_err!("Error parsing protobuf into Expr: {e}"))?; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit ugly, but it was because since we're now requiring TaskContext instead of dyn FunctionRegistry, there was no easy way within TaskContext to just pass through arbitrary UDFs/UDAFs/UDWFs like what PlaceHolderRegistry was meant to do. Instead, we achieve the same behaviour via the codec. So I'm copying the from_bytes_with_ctx code into here to customize the codec we pass in.
|
I plan to review this, hopefully this weekend |
|
I'll close this PR as I don't think it's important enough to warrant reviewer capacity.
I think the main point highlighted by this PR is the inability of the current logical serde proto API to handle nested plans; otherwise the changes are rather elementary. Perhaps we can revisit this PR more easily after addressing #18477 |
|
Thanks for the exploration @Jefffrey and sorry I didn't give a good review. I agree with your conclusion that the takeaway here is that this seems like it should be easy but ends up being hard. |
Which issue does this PR close?
datafusion-proto#2640Rationale for this change
Be able to serialize/deserialize logical plans containing scalar subqueries & outer column references in proto.
What changes are included in this PR?
ScalarSubqueryandOuterReferenceColumnvariants toLogicalExprNodeindatafusion.protoparse_expr()to accept a&TaskContextinstead of a&dyn FunctionRegistry, making accompanying plumbing changesSerializeable::from_bytes_with_registrytrait method toSerializeable::from_bytes_with_ctxsince the&dyn FunctionRegistryparameter was replaced with&TaskContextlike aboveAre these changes tested?
Added roundtrip test for a query with a scalar subquery and an outer reference column.
Are there any user-facing changes?
Yes, see points 2 & 3 in the above section