-
Notifications
You must be signed in to change notification settings - Fork 189
feat: add precision to IntervalDay and new IntervalCompound type #665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
53d48dc
bdc9122
1185a9e
edfa922
f882802
a1b7641
d02ab82
3175d00
090b43f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,7 +24,6 @@ Simple type classes are those that don't support any form of configuration. For | |
| | date | A date within [1000-01-01..9999-12-31]. | `int32` days since `1970-01-01` | ||
| | time | A time since the beginning of any day. Range of [0..86,399,999,999] microseconds; leap seconds need not be supported. | `int64` microseconds past midnight | ||
| | interval_year | Interval year to month. Supports a range of [-10,000..10,000] years with month precision (= [-120,000..120,000] months). Usually stored as separate integers for years and months, but only the total number of months is significant, i.e. `1y 0m` is considered equal to `0y 12m` or `1001y -12000m`. | `int32` years and `int32` months, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `-10000y 200000m` is **not** allowed) | ||
| | interval_day | Interval day to second. Supports a range of [-3,650,000..3,650,000] days with microsecond precision (= [-315,360,000,000,000,000..315,360,000,000,000,000] microseconds). Usually stored as separate integers for various components, but only the total number of microseconds is significant, i.e. `1d 0s` is considered equal to `0d 86400s`. | `int32` days, `int32` seconds, and `int32` microseconds, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `3650001d -86400s 0us` is **not** allowed) | ||
| | uuid | A universally-unique identifier composed of 128 bits. Typically presented to users in the following hexadecimal format: `c48ffa9e-64f4-44cb-ae47-152b4e60e77b`. Any 128-bit value is allowed, without specific adherence to RFC4122. | 16-byte `binary` | ||
|
|
||
| ## Compound Types | ||
|
|
@@ -43,6 +42,8 @@ Compound type classes are type classes that need to be configured by means of a | |
| | MAP<K, V> | An unordered list of type K keys with type V values. Keys may be repeated. While the key type could be nullable, keys may not be null. | `repeated KeyValue` (in turn two `Literal`s), all key types matching K and all value types matching V | ||
| | PRECISIONTIMESTAMP<P> | A timestamp with fractional second precision (P, number of digits) 0 <= P <= 9. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. | `int64` seconds, milliseconds, microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 (in an unspecified timezone) | ||
| | PRECISIONTIMESTAMPTZ<P> | A timezone-aware timestamp, with fractional second precision (P, number of digits) 0 <= P <= 9. Similar to aware datetime in Python. | `int64` seconds, milliseconds, microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 UTC | ||
| | INTERVAL_DAY<P> | Interval day to second. Supports a range of [-3,650,000..3,650,000] days with fractional second precision (P, number of digits) 0 <= P <= 9. Usually stored as separate integers for various components, but only the total number of fractional seconds is significant, i.e. `1d 0s` is considered equal to `0d 86400s`. | `int32` days, `int32` seconds, and `int64` fractional seconds, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. `3650001d -86400s 0us` is **not** allowed) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Now that this is a parameterised type, do we need to update all the function extensions that refer to it as
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. I'm inclined to do in a follow-along once this is merged. |
||
| | INTERVAL_COMPOUND<P> | A compound interval type that is composed of elements of the underlying elements and rules of both interval_month and interval_day to express arbitrary durations across multiple grains. Substrait gives no definition for the conversion of values between independent grains (e.g. months to days). | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
What is a grain in this context?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. year/month vs day/second. there are no consistent reliable translations between each. |
||
|
|
||
| ## User-Defined Types | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially this is struct<iyear, iday>. To specify it in text one would need to use the nested struct syntax. Would this be better flattened with the fields from both types included at the top level?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about the JSON representation or something else? I wasn't really worried about the json representation. Felt like this more clearly communicated the nature as opposed to field copying. It's a weak preference though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both JSON and the Substrait Text format (which was what I was thinking about since it'll require implementation) would be affected. I prefer struct<iyear, iday> to the compound approach. I would also classify my preference as weak.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the substrait text format might have the fields be flattened. I don't think it is critical for it to follow the same structure as the protobufs as long as lossible translation from one to the other is possible, right?