-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: Support defining custom MetricValues in PhysicalPlans #16195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
201ce87 to
7d35c02
Compare
LiaCastaneda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was passing by to take a look, this lgtm 👍
| write!(f, "{timestamp}") | ||
| } | ||
| Self::Custom { name, value } => { | ||
| write!(f, "name:{name} {value:?}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also require a Display for Custom?
|
|
||
| custom_val.aggregate(&other_custom_val); | ||
|
|
||
| if let MetricValue::Custom { value, .. } = custom_val { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this can also be written as
if let MetricValue::Custom { value, .. } = custom_val {
let counter = value
.as_any()
.downcast_ref::()
.expect("Expected CustomCounter");
assert_eq!(counter.count.load(Ordering::Relaxed), 31);
}. since custom_val will always be MetricValue::Custom
b2f0c90 to
0942254
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice initiative! I imagine that this would be generally very useful. It would be nice to have a clear API as small an well documented as possible. I think the proposal here goes in the good direction (I'll try a couple of things, but for now I have no other idea better than this)
| Self::Custom { name, value } => { | ||
| panic!("MetricValue::as_usize isn't supported for custom metric values. ({name}: {value:?})") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe some CustomMetricValue implementations would like to implement as_usize(&self)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the possibility to implement it with a default that still throws 👍
3cbdc68 to
72bdb36
Compare
|
Thank you @sfluor for this PR @gabotechs / @LiaCastaneda please ping me when you think this PR is ready for a review / merge Thank you for the help getting it ready |
gabotechs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// ``` | ||
| /// | ||
| /// [`MetricValue::Custom`]: super::MetricValue::Custom | ||
| pub trait CustomMetricValue: Display + Debug + Send + Sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this new type of metric is more meaty than the other ones, and we can expect people to come here looking at the docs, what do you think about factoring it out to its own file?
metrics
mod.rs
baseline.rs
builder.rs
+ custom.rs
value.rs
This file is also dangerously approaching the 1000 LOC mark, so it will play in maintainability's favor
| .map(|nanos| nanos as usize) | ||
| .unwrap_or(0), | ||
| Self::Custom { name, value } => { | ||
| value.as_usize().unwrap_or_else(|| panic!("MetricValue::as_usize isn't supported for custom metric values. ({name}: {value:?})")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually panicking is avoided as much as possible, but it seems like it's not the only place where this can happen, and I have no better alternative without changing several function signatures. I think it's acceptable, but I'd weight other contributors opinion on this if any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use .unwrap_or(0) ?
edit: I guess its unconventional given that StartTimestamp & EndTimestamp defaults to 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the panic should be removed good as it leaves a tricky to use correctly API: if users don't implement CustomMetric::as_usize then their query may panic.
One way to avoid the panic would be to cange the signature of as_usize to return usize rather than Option<usize> and provide a default value -- maybe 0
In addition to returning usize we could also avoid providing a custom implementation of as_usizeand force the implementers to decide explicitly
bd5e3f6 to
676cda0
Compare
|
I have addressed the remaining comments cc @LiaCastaneda / @gabotechs / @alamb |
gabotechs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I think this one is ready for a review cc @alamb
| if name != other_name { | ||
| panic!( | ||
| "Unsupported metric aggregation between {name} and {other_name}" | ||
| ) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other metrics do not seem to be applying this restriction, for being consistent with other metrics, maybe we can relax this restriction here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
676cda0 to
74e51f1
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @sfluor @gabotechs and @LiaCastaneda -- I think this is quite close
The only thing I think we should fix prior to merge is remove the panic.
Another thing that would be great would be a more integration style test that showed using a CustomMetric in a overall SQL query plan somehow (maybe we could wrap an existing ExecutionPlan with one that captured more metrics or something. This could be done as a follow on PR when someone has the time (we can file a follow on ticket)
Also, The PR description still says
Are there any user-facing changes?
There's one breaking change which is that MetricValue isn't PartialEq anymore (to avoid having this constraint on CustomMetric values).
Which I don't think is accurate anymore so perhaps we can update that
| .map(|nanos| nanos as usize) | ||
| .unwrap_or(0), | ||
| Self::Custom { name, value } => { | ||
| value.as_usize().unwrap_or_else(|| panic!("MetricValue::as_usize isn't supported for custom metric values. ({name}: {value:?})")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the panic should be removed good as it leaves a tricky to use correctly API: if users don't implement CustomMetric::as_usize then their query may panic.
One way to avoid the panic would be to cange the signature of as_usize to return usize rather than Option<usize> and provide a default value -- maybe 0
In addition to returning usize we could also avoid providing a custom implementation of as_usizeand force the implementers to decide explicitly
74e51f1 to
e14e66f
Compare
|
Thanks for the review @alamb ! I removed the panic and went with a default value of 0 |
See this issue: apache#16044 The MetricValue enum currently exposes only single-value statistics: counts, gauges, timers, timestamps, and a few hard-coded variants such as SpillCount or OutputRows. However there's often the need for custom metrics when using custom PhysicalPlans. At Datadog for instance we had the need for tracking the distribution of latencies of the sub-queries issued by a given phyiscal plan to be able to pin-point outliers. Similarly tracking the topN slowest sub-query is something that has been quite useful to help us debug slow queries. This PR allows each user to define their own MetricValue types as long as they are aggregatable. A very basic example is included in the PR using a custom counter.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks everyone -- this looks great to me
e14e66f to
b3e396b
Compare
|
There was one remaining test failing, I fixed it and updated the PR to the latest branch. Should it target |
The |
yeah, let's target the main branch |
|
Let's wait to merge this PR until we ship DataFusion 48 to limit the breaking changes I think we'll be able to merge this in the next few days |
|
We have now made the release-48 branch so what is merged into main will be released as part of DataFusion 49.0.0 |
|
🚀 |
…6195) See this issue: apache#16044 The MetricValue enum currently exposes only single-value statistics: counts, gauges, timers, timestamps, and a few hard-coded variants such as SpillCount or OutputRows. However there's often the need for custom metrics when using custom PhysicalPlans. At Datadog for instance we had the need for tracking the distribution of latencies of the sub-queries issued by a given phyiscal plan to be able to pin-point outliers. Similarly tracking the topN slowest sub-query is something that has been quite useful to help us debug slow queries. This PR allows each user to define their own MetricValue types as long as they are aggregatable. A very basic example is included in the PR using a custom counter.
|
|
||
| /// Compares this value with another custom value. | ||
| fn is_eq(&self, other: &Arc<dyn CustomMetricValue>) -> bool; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very simple trait with great example in the comment. I also like the test. Thanks @sfluor
…6195) See this issue: apache#16044 The MetricValue enum currently exposes only single-value statistics: counts, gauges, timers, timestamps, and a few hard-coded variants such as SpillCount or OutputRows. However there's often the need for custom metrics when using custom PhysicalPlans. At Datadog for instance we had the need for tracking the distribution of latencies of the sub-queries issued by a given phyiscal plan to be able to pin-point outliers. Similarly tracking the topN slowest sub-query is something that has been quite useful to help us debug slow queries. This PR allows each user to define their own MetricValue types as long as they are aggregatable. A very basic example is included in the PR using a custom counter. (cherry picked from commit fbafea4)
Which issue does this PR close?
Rationale for this change
See this issue: #16044
The MetricValue enum currently exposes only single-value statistics: counts, gauges, timers, timestamps, and a few hard-coded variants such as SpillCount or OutputRows.
However there's often the need for custom metrics when using custom PhysicalPlans. At Datadog for instance we had the need for tracking the distribution of latencies of the sub-queries issued by a given phyiscal plan to be able to pin-point outliers.
Similarly tracking the topN slowest sub-query is something that has been quite useful to help us debug slow queries.
This PR allows each user to define their own MetricValue types as long as they are aggregatable. A very basic example is included in the PR using a custom counter.
What changes are included in this PR?
Includes a new enum-type in
MetricValueto define custom aggregatable metric values.Are these changes tested?
I added two basic tests to show the usage and ensure it is working as expected for a basic custom counter type.
Are there any user-facing changes?
None