Skip to content

Conversation

@wiedld
Copy link
Contributor

@wiedld wiedld commented Aug 31, 2024

Which issue does this PR close?

Closes #12119

Rationale for this change

We plan to change Datafusion to use BinaryVIew and Utf8View by default, however, the rest of the arrow ecosystem may not be ready for these data types. As such, we want to provide the option to cast the query output to a non-view type.

Because either Utf8 or LargeUtf8 can represented as a Utf8View, the cast on output converts to the larger type. Same with BinaryView to LargeBinary.

What changes are included in this PR?

During type coercion, after the plan has been coerced -- then based upon the final output type determine if an additional cast is needed.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.
I made one existing API public (no longer crate private), and we added the optimizer config option expand_views_at_output.

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Aug 31, 2024
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 31, 2024
@wiedld wiedld marked this pull request as ready for review September 4, 2024 20:53
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wiedld -- this looks about perfect. I think it needs one more test, but otherwise it looks great

I had some small code improvement suggestions too but I don't think they are necessary

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wiedld -- looks great to me

let if_not_coerced = "Projection: a\n Sort: a ASC NULLS FIRST\n Projection: a\n EmptyRelation";
do_not_coerce_on_output(plan.clone(), if_not_coerced)?;
// Plan B: coerce requested: Utf8View => LargeUtf8 only on outermost
let if_coerced = "Projection: CAST(a AS LargeUtf8)\n Sort: a ASC NULLS FIRST\n Projection: a\n EmptyRelation";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb alamb merged commit 391f5cb into apache:main Sep 9, 2024
@alamb
Copy link
Contributor

alamb commented Sep 9, 2024

🚀

@alamb alamb deleted the 12119/coerce-on-output branch September 9, 2024 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate documentation Improvements or additions to documentation logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Convert Utf8View/BinaryView --> Utf8 / Binary at output

2 participants