@Blizzara Blizzara commented Jul 22, 2024

Which issue does this PR close?

Closes #12074

Rationale for this change

DF was using Substrait type_variations on a Substrait Timestamp to indicate whether a timestamp is in seconds/millis/micros/nanos. While that works within DF, it is not interoperable with other systems. Substrait has since introduced a PrecisionTimestamp type that includes the precision as a first-class option, so we should use that instead.

What changes are included in this PR?

  • Bump substrait to latest and fix breaks
  • Support consuming PrecisionTimestamp and PrecisionTimestampTz types in addition to the deprecated Timestamp type
  • Produce PrecisionTimestamp and PrecisionTimestampTz types
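
As a rough illustration of the consume/produce mapping described above, here is a minimal sketch. It assumes that the precision on a PrecisionTimestamp is the number of fractional-second decimal digits (0, 3, 6, 9 for seconds, millis, micros, nanos). The `TimeUnit` enum and both helper functions are hypothetical stand-ins written for this example; they are not the code in this PR.

```rust
/// Stand-in for an Arrow-style time unit (hypothetical, defined here so the
/// sketch is self-contained).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Consumer side: map a precision (fractional-second digits) to a time unit.
/// Precisions other than 0, 3, 6 and 9 are rejected in this sketch.
pub fn precision_to_time_unit(precision: i32) -> Result<TimeUnit, String> {
    match precision {
        0 => Ok(TimeUnit::Second),
        3 => Ok(TimeUnit::Millisecond),
        6 => Ok(TimeUnit::Microsecond),
        9 => Ok(TimeUnit::Nanosecond),
        other => Err(format!("unsupported timestamp precision: {}", other)),
    }
}

/// Producer side: the inverse mapping, from time unit to precision.
pub fn time_unit_to_precision(unit: TimeUnit) -> i32 {
    match unit {
        TimeUnit::Second => 0,
        TimeUnit::Millisecond => 3,
        TimeUnit::Microsecond => 6,
        TimeUnit::Nanosecond => 9,
    }
}
```

Because the precision travels with the type itself, another Substrait consumer could recover the unit without knowing DF's type-variation numbering.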

Are these changes tested?

Yes, with unit tests

Are there any user-facing changes?

@github-actions github-actions bot added the substrait Changes to the substrait crate label Jul 22, 2024
@Blizzara Blizzara force-pushed the avo/precision-timestamp branch from 49d90c8 to 350e00f Compare July 24, 2024 08:13
@Blizzara Blizzara force-pushed the avo/precision-timestamp branch from 350e00f to e8753f6 Compare August 20, 2024 11:16
r#type::Kind::IntervalYear(_) => {
Ok(DataType::Interval(IntervalUnit::YearMonth))
}
r#type::Kind::IntervalDay(_) => Ok(DataType::Interval(IntervalUnit::DayTime)),
Contributor Author

this was just cleanup - we don't check type variations for types where we don't use them

})) => {
// DF only supports millisecond precision, so we lose the micros here
ScalarValue::new_interval_dt(*days, (seconds * 1000) + (microseconds / 1000))
// DF only supports millisecond precision, so for any more granular type we lose precision
Contributor Author

these changes were needed as part of bumping substrait (substrait-io/substrait#665)
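
To make the precision loss mentioned in the diff comment concrete, here is a small sketch of scaling a sub-second count down (or up) to milliseconds. The function names and the `(seconds, subseconds, precision)` shape are assumptions made for this example, not the Substrait or DataFusion API; `precision` is taken to mean the number of fractional-second decimal digits.

```rust
/// Hypothetical helper: convert `subseconds`, expressed with `precision`
/// fractional-second digits, to whole milliseconds.
/// For precision > 3 the integer division truncates toward zero, which is
/// where sub-millisecond detail is lost. Precisions above 9 are not handled.
pub fn subseconds_to_millis(subseconds: i64, precision: u32) -> Option<i64> {
    if precision <= 3 {
        // Coarser than (or equal to) millis: scale up, guarding overflow.
        subseconds.checked_mul(10_i64.pow(3 - precision))
    } else if precision <= 9 {
        // Finer than millis: scale down and drop the remainder.
        Some(subseconds / 10_i64.pow(precision - 3))
    } else {
        None
    }
}

/// Hypothetical helper: total milliseconds for the time-of-day part of an
/// interval, combining whole seconds with the scaled sub-second part.
pub fn day_time_millis(seconds: i64, subseconds: i64, precision: u32) -> Option<i64> {
    let sub_ms = subseconds_to_millis(subseconds, precision)?;
    seconds.checked_mul(1000)?.checked_add(sub_ms)
}
```

With precision 6 this reduces to the `(seconds * 1000) + (microseconds / 1000)` arithmetic shown in the diff above.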

kind: Some(r#type::Kind::IntervalDay(r#type::IntervalDay {
type_variation_reference: DEFAULT_TYPE_VARIATION_REF,
nullability,
precision: Some(3), // DayTime precision is always milliseconds
Contributor Author

@Blizzara Blizzara Aug 20, 2024

required due to bumping substrait substrait-io/substrait#665

@Blizzara Blizzara force-pushed the avo/precision-timestamp branch from e8753f6 to a639feb Compare August 20, 2024 13:41
substrait = { version = "0.36.0", features = ["serde"] }
pbjson-types = "0.7"
prost = "0.13"
substrait = { version = "0.41", features = ["serde"] }
Contributor Author

@Blizzara Blizzara Aug 20, 2024

need a bump for precision timestamp types to have precision, and their value as i64 instead of u64

pbjson and prost need to be bumped to match substrait's deps

Contributor

Thanks @Blizzara

I also have a PR prepared to update a bunch of these dependencies as well queued up for the arrow release next week: #12032

join_rel::JoinType::Anti => Ok(JoinType::LeftAnti),
join_rel::JoinType::Semi => Ok(JoinType::LeftSemi),
join_rel::JoinType::LeftAnti => Ok(JoinType::LeftAnti),
join_rel::JoinType::LeftSemi => Ok(JoinType::LeftSemi),
Contributor Author

needed due to bumping substrait - I think this is just a compile-time break tho, the actual protobuf values stay the same
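
A small sketch of why a variant rename like this can be a source-level break only: if the numeric tag behind each variant is unchanged, a value encoded under the old names decodes to the matching new name. The two enums and the numeric values below are illustrative assumptions for this example and are not taken from the Substrait protobuf definitions.

```rust
/// Illustrative stand-in for the join type enum before the bump
/// (variant numbers are assumed, not copied from the real proto).
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OldJoinType {
    Semi = 5,
    Anti = 6,
}

/// Illustrative stand-in for the enum after the bump: same numbers, new names.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NewJoinType {
    LeftSemi = 5,
    LeftAnti = 6,
}

/// Decode a wire tag using the new names (hypothetical helper).
pub fn decode_new(tag: i32) -> Option<NewJoinType> {
    match tag {
        5 => Some(NewJoinType::LeftSemi),
        6 => Some(NewJoinType::LeftAnti),
        _ => None,
    }
}
```

Under that assumption, a plan written with `Semi` still reads back as `LeftSemi`; only code that names the variants has to change.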

@Blizzara Blizzara force-pushed the avo/precision-timestamp branch from a639feb to 5347f30 Compare August 20, 2024 14:50
@Blizzara Blizzara marked this pull request as ready for review August 20, 2024 14:55
Contributor

@alamb alamb left a comment

Thanks @Blizzara -- this change makes sense to me.

pub const DEFAULT_TYPE_VARIATION_REF: u32 = 0;
pub const UNSIGNED_INTEGER_TYPE_VARIATION_REF: u32 = 1;

#[deprecated(since = "41.0.0", note = "Use `PrecisionTimestamp(Tz)` type instead")]
Contributor

Since we have already released 41 (https://crates.io/crates/datafusion/41.0.0), I think we should probably label this with the next version. For example:

Suggested change
- #[deprecated(since = "41.0.0", note = "Use `PrecisionTimestamp(Tz)` type instead")]
+ #[deprecated(since = "42.0.0", note = "Use `PrecisionTimestamp(Tz)` type instead")]

Contributor Author

Hah yeah, this PR was a long time in the making 😅 fixed in a05637b
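
For readers unfamiliar with the attribute under discussion, here is a minimal sketch of how `#[deprecated]` behaves on a constant. The constant name and value are hypothetical placeholders, not the identifiers touched by this PR; only the attribute syntax mirrors the suggested change.

```rust
/// Hypothetical legacy constant, marked deprecated as of the next release.
/// Any use outside an `#[allow(deprecated)]` scope triggers a compiler warning
/// that includes the `note` text.
#[deprecated(since = "42.0.0", note = "Use `PrecisionTimestamp(Tz)` type instead")]
pub const LEGACY_TIMESTAMP_VARIATION_REF: u32 = 0;

/// Code that must keep reading the legacy value (for example, a consumer that
/// still accepts old plans) can silence the lint locally.
#[allow(deprecated)]
pub fn legacy_timestamp_variation() -> u32 {
    LEGACY_TIMESTAMP_VARIATION_REF
}
```

The `since` value is informational only, which is why labeling it with the release that will first ship the deprecation matters for users reading the warning.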

@alamb alamb merged commit 89cb6a2 into apache:main Aug 22, 2024
Contributor

alamb commented Aug 22, 2024

Thanks again @Blizzara

@Blizzara Blizzara deleted the avo/precision-timestamp branch August 22, 2024 11:55