-
-
Notifications
You must be signed in to change notification settings - Fork 777
Additional code metrics instrumentation #4310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Now we also track: 1. Number of rules processed by rules engine (counter + timer) 2. How long it took to process rule for each unique trigger instance.
| trigger_instance, trigger_constants.TRIGGER_INSTANCE_PROCESSING) | ||
| self.rules_engine.handle_trigger_instance(trigger_instance) | ||
|
|
||
| with CounterWithTimer(key="st2.rule.processed"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bigmstone I'm open to a better name, something which would also make it consistent with action runner metric names.
I couldn't come up with anything better :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could also just call it st2.trigger_instance.processed, but this could also be a bit deceiving, imo, because trigger instances can also come in through the API and can be handled by other services.
At some point we might also care about total trigger instances (also the ones which are / will be processed elsewhere), but we definitely care specifically about trigger instances processed by the rules engine so the metrics key should convey that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably create a function to generate these namings in a standard way. Have a few input params so it's not so subjective, but like you I don't have many strong convictions here. I think it mainly just matters that it's consistent.
|
@bigmstone While working on that, I noticed we are missing "Gauge" metric type (so we can measure also total number of requests and similar and not just requests / second). I will add that in this PR. |
NOTE: Prometheus driver conflats counter and gauge atm a bit, we should eventually sort this out so using different drivers won't result in using different metrics type and different representation and visualization.
services with various metrics.
development environment.
bigmstone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
just cause additional confusion and harder to actually find out the actual metric name.
|
may be:
|
|
@dzimine Thanks for the feedback. Those are good metrics and most of them are tracked already now via this PR (docs at https://github.com/StackStorm/st2docs/pull/787/files?short_path=fa1dde0#diff-fa1dde031ec91f548c4d4c3a722ad980).
We also have that now, but some of those are scoped just to a rule reference and not trigger instance. I will also make sure we have all of that data on per trigger instance basis (because yes, that's important, trigger instance + rule combo could give us a clue on what is going on - e.g. is it something with trigger instance payload in combination with some rule criteria which is slow, etc.). On a related note - talked about this with @bigmstone on Slack yesterday. There is also a lot of "derived" metrics we could add to StackStorm, but, imo, that would add a lot of overhead and it's not necessary when those metrics can be derived using other existing metrics inside the monitoring visualization tool (e.g. execution status one for queue size, etc.). |
statsd counters are of a special type which is aggregated, sampled and calculated into rate so decreasing those will result in invalid / unexpected values. Decreasing them would only make sense if statsd wouldn't do any processing on them and treat them as raw values (e.g. gauges).
groups metrics based on the type so the suffix is redundant.
…ckStorm/st2 into rules_engine_metrics_instrumentation
This option can specify an optional prefix which is prepended to each metric key / name. This comes handy when you want to use the same statsd or other backend instance for multiple environments (each environment would specify a different prefix).
before metric name.
This pull request adds some additional metrics instrumentation to various services.
Rules Engine
st2api, st2auth, st2stream
Other Changes
TODO