Skip to content

Conversation

@jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Jun 22, 2025

Which issue does this PR close?

Rationale for this change

Allows Datafusion plans with the Duration type to be converted to Substrait plans.

What changes are included in this PR?

This commit adds support for Arrow Duration types in Substrait plans. Substrait has no equivalent to the Duration type, which only includes time-based information (i.e. some multiple of nanoseconds). However, the Substrait Interval Day type is very similar, it stores day and time-based information. This commit converts Arrow Duration types into Substrait Interval Day types, with a Duration specific type variation reference, so that it can round trip back to a Duration.

An alternative approach would be to use a new Substrait user defined type.

Are these changes tested?

Yes

Are there any user-facing changes?

Substrait now support Duration types

@github-actions github-actions bot added the substrait Changes to the substrait crate label Jun 22, 2025
This commit adds support for Arrow Duration types in Substrait plans.
Substrait has no equivalent to the Duration type, which only includes
time-based information (i.e. some multiple of nanoseconds). However,
the Substrait Interval Day type is very similar, it stores day and
time-based information. This commit converts Arrow Duration types into
Substrait Interval Day types, with a Duration specific type variation
reference, so that it can round trip back to a Duration.

An alternative approach would be to use a new Substrait user defined
type.

Resolves apache#16285
@jkosh44 jkosh44 force-pushed the substrait-duration branch from b30e0de to 8047a3e Compare June 22, 2025 19:25
@jkosh44 jkosh44 marked this pull request as ready for review June 22, 2025 19:25
@alamb
Copy link
Contributor

alamb commented Jun 23, 2025

Thank you @jkosh44 🙏

@gabotechs is there any chance you have time to review this PR?

@gabotechs
Copy link
Contributor

Sure! I'll get it done in a couple of days. Thanks for submitting this!

Copy link
Contributor

@gabotechs gabotechs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice, +1

Comment on lines 58 to 59
pub const DEFAULT_INTERVAL_DAY_TYPE_VARIATION: u32 = 0;
pub const DURATION_INTERVAL_DAY_TYPE_VARIATION: u32 = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea using the type variation capabilities, this looks good to me 👍.

Nit 1: maybe prefixing this constants with _REF as all the others?
Nit 2: maybe adding a comment explaining that the default variation maps to DataType::Interval(DayTime) and the other maps to DataType::Duration?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review @gabotechs -- I'll wait to see if @jkosh44 would like to make the suggested changes.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Jun 24, 2025

Just made the suggested updates.

I don't 100% know how the substrait plans are used, but I am slightly worried about one thing with this approach. Would it be possible to construct a substrait plan that converts the Duration to an IntervalDay, execute some operation that adds some days to the interval, then attempts and fails to convert it back to a duration?

@alamb
Copy link
Contributor

alamb commented Jun 25, 2025

I don't 100% know how the substrait plans are used, but I am slightly worried about one thing with this approach. Would it be possible to construct a substrait plan that converts the Duration to an IntervalDay, execute some operation that adds some days to the interval, then attempts and fails to convert it back to a duration?

I think you can always go back from Duration --> IntervalDay

But in any case, I don't think this would be a substrait specific difference -- it would be in the DataFusion engine itself

@alamb alamb merged commit dd2de91 into apache:main Jun 25, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented Jun 25, 2025

Thanks again @jkosh44 and @gabotechs

@jkosh44 jkosh44 deleted the substrait-duration branch June 25, 2025 23:14
@jkosh44
Copy link
Contributor Author

jkosh44 commented Jun 28, 2025

@gabotechs This response from Substrait makes me a little nervous about this approach: substrait-io/substrait#822 (comment)

Duration doesn't really map 1:1 with IntervalDay. Duration has a single logical component for time, but IntervalDay has two logical components, time and days.

@gabotechs
Copy link
Contributor

Duration doesn't really map 1:1 with IntervalDay. Duration has a single logical component for time, but IntervalDay has two logical components, time and days.

🤔 it still does not sound too bad to me.

If you want to play it safe, one thing we can do is to take other types that are failing, like Float16, attempt to ship a solution based on Substrait UDTs, and if it turns out to be clean then port the same approach it to Duration/IntervalDay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[substrait] [sqllogictest] Unsupported cast type: Duration

3 participants