Additional code metrics instrumentation #4310

Kami · 2018-08-17T11:16:01Z

This pull request adds some additional metrics instrumentation to various services.

Rules Engine

Track how many rules (trigger instances) are processed by the rules engine and how long each processing run took (counter + timer).
Track how long the rules engine took to process each unique trigger instance (timer only; a counter makes no sense since each TriggerInstance is unique and is only ever processed once).
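To illustrate the counter + timer idea, here is a minimal sketch of a context manager that increments a counter on entry and records the elapsed time on exit. It is not the st2 implementation: the `InMemoryMetrics` backend and the `backend` argument are hypothetical and exist only so the example is self-contained. Only the `CounterWithTimer` name and the `st2.rule.processed` key come from the diff in this PR.

```python
import time
from collections import defaultdict


class InMemoryMetrics:
    """Hypothetical in-memory backend, used only for this illustration."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def inc_counter(self, key, amount=1):
        self.counters[key] += amount

    def record_time(self, key, seconds):
        self.timings[key].append(seconds)


class CounterWithTimer:
    """Increment a counter on entry and record the elapsed time on exit."""

    def __init__(self, key, backend):
        self.key = key
        self.backend = backend  # assumption: the real class takes no backend

    def __enter__(self):
        self.backend.inc_counter(self.key)
        self._start = time.monotonic()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.backend.record_time(self.key, time.monotonic() - self._start)
        return False  # never swallow exceptions raised by the wrapped block


backend = InMemoryMetrics()
for _ in range(3):
    with CounterWithTimer(key="st2.rule.processed", backend=backend):
        pass  # stand-in for handling one trigger instance
```

After the loop, the counter for `st2.rule.processed` holds 3 and three timings have been recorded, one per wrapped block.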

st2api, st2auth, st2stream

Number of requests / second + request processing timing info
Counters for various request and response related info:
- request method
- request path
- response status code
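The request and response counters listed above could be collected by a small WSGI middleware along these lines. This is a sketch under assumptions, not the middleware added in this PR: the class name, the `counters` mapping and the metric key names are all hypothetical, and timing is left out for brevity.

```python
from collections import Counter


class RequestMetricsMiddleware:
    """Hypothetical WSGI middleware counting method, path and status code."""

    def __init__(self, app, counters):
        self.app = app
        self.counters = counters

    def __call__(self, environ, start_response):
        # Key names below are made up for this example.
        self.counters["request.total"] += 1
        self.counters["request.method.%s" % environ["REQUEST_METHOD"]] += 1
        self.counters["request.path.%s" % environ["PATH_INFO"]] += 1

        def counting_start_response(status, headers, exc_info=None):
            # The WSGI status string looks like "200 OK"; keep only the code.
            status_code = status.split(" ", 1)[0]
            self.counters["response.status.%s" % status_code] += 1
            return start_response(status, headers, exc_info)

        return self.app(environ, counting_start_response)


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


counters = Counter()
app = RequestMetricsMiddleware(demo_app, counters)
body = app(
    {"REQUEST_METHOD": "GET", "PATH_INFO": "/v1/rules"},
    lambda status, headers, exc_info=None: None,
)
```

A real implementation would likely normalize the path before using it in a metric key, since raw paths contain slashes and can have unbounded cardinality.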

Other Changes

Add EchoMetricsDriver, which prints all metrics information to the console to make debugging easier.
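As a rough idea of what an echo-style driver does, the sketch below writes every metric call to a stream instead of sending it to a backend. The class name, method names and output format are assumptions made for illustration and may not match the driver added in this PR.

```python
import io
import sys


class EchoDriver:
    """Hypothetical driver that prints each metric call instead of sending it."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stdout

    def inc_counter(self, key, amount=1):
        self._echo("counter", key, amount)

    def time(self, key, seconds):
        self._echo("timer", key, seconds)

    def set_gauge(self, key, value):
        self._echo("gauge", key, value)

    def _echo(self, metric_type, key, value):
        self.stream.write("[metrics] %s %s=%s\n" % (metric_type, key, value))


# Capture the output in a buffer here; by default it goes to stdout.
buffer = io.StringIO()
driver = EchoDriver(stream=buffer)
driver.inc_counter("st2.rule.processed")
driver.set_gauge("example.gauge", 7)
echoed_lines = buffer.getvalue().splitlines()
```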

TODO

We need documentation on the exposed metrics, explaining what each one represents.

Kami · 2018-08-17T11:18:00Z

st2reactor/st2reactor/rules/worker.py

                trigger_instance, trigger_constants.TRIGGER_INSTANCE_PROCESSING)
-            self.rules_engine.handle_trigger_instance(trigger_instance)
+
+            with CounterWithTimer(key="st2.rule.processed"):


@bigmstone I'm open to a better name, something which would also make it consistent with action runner metric names.

I couldn't come up with anything better :/

Perhaps we could also just call it st2.trigger_instance.processed, but this could also be a bit deceiving, imo, because trigger instances can also come in through the API and can be handled by other services.

At some point we might also care about total trigger instances (also the ones which are / will be processed elsewhere), but we definitely care specifically about trigger instances processed by the rules engine so the metrics key should convey that.

bigmstone · 2018-08-17

We should probably create a function to generate these namings in a standard way. Have a few input params so it's not so subjective, but like you I don't have many strong convictions here. I think it mainly just matters that it's consistent.

Kami · 2018-08-17T11:32:27Z

@bigmstone While working on that, I noticed we are missing the "Gauge" metric type (so we can also measure the total number of requests and similar, not just requests / second).

I will add that in this PR.
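For context on how the metric types differ on the wire, the statsd line protocol tags each value with a type code: `c` for counters, `ms` for timers and `g` for gauges. The helper below is a small sketch of that format; the function name and the example keys are hypothetical and not taken from the st2 code.

```python
# Type codes used by the statsd line protocol.
STATSD_TYPE_CODES = {"counter": "c", "timer": "ms", "gauge": "g"}


def format_statsd_line(key, value, metric_type):
    """Format one metric as '<key>:<value>|<type code>'."""
    return "%s:%s|%s" % (key, value, STATSD_TYPE_CODES[metric_type])


lines = [
    # A counter increment, which statsd aggregates into a rate.
    format_statsd_line("example.requests", 1, "counter"),
    # A gauge, which carries an absolute value such as a running total.
    format_statsd_line("example.requests.total", 1200, "gauge"),
]
```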


bigmstone

LGTM

dzimine · 2018-08-20T16:17:33Z

maybe:

do we already have number of actions / sec - scheduled, executed, action queue size, time per action?
on rules - it's interesting to add color to rule processing:
- trigger instances, rule evaluations (per trigger), rule invocations (per trigger)

Kami · 2018-08-21T08:23:44Z

@dzimine Thanks for the feedback.

Those are good metrics and most of them are tracked already now via this PR (docs at https://github.com/StackStorm/st2docs/pull/787/files?short_path=fa1dde0#diff-fa1dde031ec91f548c4d4c3a722ad980).

> do we already have number of actions / sec - scheduled, executed, action queue size, time per action?

Yes, via st2.action.executions and st2.action.executions.<execution status>. We don't have a dedicated queue size metric, but the queue size can be inferred from the execution status metrics (running for executions in progress; delayed and requested for executions waiting to run, i.e. the queue size).
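The inference described above can be sketched as a sum over the "waiting" statuses. This assumes the per-status values are already available as a plain mapping of current values; the function name and the sample numbers are hypothetical, while the `st2.action.executions.<execution status>` key pattern comes from the comment itself.

```python
# Statuses that represent executions waiting to run, per the comment above.
WAITING_STATUSES = ("requested", "delayed")


def infer_queue_size(status_values):
    """Sum the current values of the per-status metrics for waiting executions."""
    return sum(
        status_values.get("st2.action.executions.%s" % status, 0)
        for status in WAITING_STATUSES
    )


# Made-up sample values, as a monitoring tool might report them.
sample = {
    "st2.action.executions.running": 4,
    "st2.action.executions.requested": 7,
    "st2.action.executions.delayed": 2,
    "st2.action.executions.succeeded": 120,
}
queue_size = infer_queue_size(sample)
```

In practice this sum would more likely be expressed as a query in the monitoring or visualization tool rather than in application code.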

> on rules - it's interesting to add color to rule processing: trigger instances, rule evaluations (per trigger), rule invocations (per trigger)

We also have that now, but some of those are scoped just to a rule reference and not trigger instance. I will also make sure we have all of that data on per trigger instance basis (because yes, that's important, trigger instance + rule combo could give us a clue on what is going on - e.g. is it something with trigger instance payload in combination with some rule criteria which is slow, etc.).

On a related note - talked about this with @bigmstone on Slack yesterday. There are also a lot of "derived" metrics we could add to StackStorm, but, imo, that would add a lot of overhead and it's not necessary when those metrics can be derived from other existing metrics inside the monitoring visualization tool (e.g. the execution status one for queue size, etc.).

statsd counters are of a special type which is aggregated, sampled and calculated into a rate, so decreasing them will result in invalid / unexpected values. Decreasing them would only make sense if statsd didn't do any processing on them and treated them as raw values (e.g. gauges).
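A toy model may help show why decrementing a counter is misleading. The aggregator below is a deliberately simplified stand-in for statsd (it ignores sampling and only mimics the per-flush rate calculation and reset); the class and key names are hypothetical.

```python
from collections import defaultdict


class ToyStatsdAggregator:
    """Simplified model: counters become per-second rates and are reset on
    every flush, while gauges keep their last raw value."""

    def __init__(self, flush_interval_seconds=10):
        self.flush_interval_seconds = flush_interval_seconds
        self._counters = defaultdict(int)
        self._gauges = {}

    def counter(self, key, delta=1):
        self._counters[key] += delta

    def gauge(self, key, value):
        self._gauges[key] = value

    def flush(self):
        rates = {
            key: total / self.flush_interval_seconds
            for key, total in self._counters.items()
        }
        self._counters.clear()  # counters start from zero in the next interval
        return rates, dict(self._gauges)


aggregator = ToyStatsdAggregator(flush_interval_seconds=10)
# Five executions start and five finish within the same flush interval.
for _ in range(5):
    aggregator.counter("example.executions.running", 1)
for _ in range(5):
    aggregator.counter("example.executions.running", -1)
# A gauge simply reports the current raw value.
aggregator.gauge("example.executions.running.current", 3)

first_rates, first_gauges = aggregator.flush()
second_rates, second_gauges = aggregator.flush()
```

In this model the increments and decrements cancel out, so the reported rate for the interval is zero even though ten events happened, while the gauge keeps reporting its raw value across flushes.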


Kami added 2 commits August 17, 2018 12:54

Fix typo.

0245a18

Add missing __all__, instrument rules engine with additional metrics.

0a975aa

Now we also track: 1. Number of rules processed by rules engine (counter + timer) 2. How long it took to process rule for each unique trigger instance.

Kami requested a review from bigmstone August 17, 2018 11:16

Use consistent metric name.

9b5a5af

Kami added 5 commits August 17, 2018 13:33

Add missing __all__.

4d1b8f7

Add support for "Gauge" metric type to our metrics drivers and code.

20134d8

NOTE: Prometheus driver conflates counter and gauge atm a bit, we should eventually sort this out so using different drivers won't result in using different metrics type and different representation and visualization.

Add new instrumentation middleware which allows us to instrument our API

0b4a4e5

services with various metrics.

Add new instrumentation middleware to all the API services.

5d78d67

Add new echo metrics driver which prints out metric calls and use it in

b483f75

development environment.

Kami added this to the 2.9.0 milestone Aug 17, 2018

Kami added performance visibility labels Aug 17, 2018

Kami added 4 commits August 17, 2018 16:23

Also track total number of the incoming requests.

24f5d1a

Fix lint.

b16fb32

Add tests for new gauge methods.

c759b63

Use echo driver by default in dev environments.

179768e

bigmstone approved these changes Aug 17, 2018

Kami mentioned this pull request Aug 20, 2018

Add some documentation on metrics and instrumentation StackStorm/st2docs#787

Merged

Kami added 5 commits August 20, 2018 15:52

Use consistent metric names, add some additional instrumentation.

8d207ea

Use consistent method names.

9f8cb89

Fix method arguments.

27e85dc

Get rid of format_metric_key() function calls which provide no value and

478744e

just cause additional confusion and harder to actually find out the actual metric name.

Fix typo.

4086b5c

Kami added 2 commits August 21, 2018 10:23

Merge branch 'master' into rules_engine_metrics_instrumentation

387cf92

Reduce code duplication.

55b16f9

Kami added 21 commits August 21, 2018 11:36

Remove _counter and _timer suffixes since statsd already correctly

74efba2

groups metrics based on the type so the suffix is redundant.

Merge branch 'rules_engine_metrics_instrumentation' of github.com:Sta…

b014d7d

…ckStorm/st2 into rules_engine_metrics_instrumentation

Remove unused module.

8354248

Remove unused driver for now since it's just causing confusion.

82772cc

Fix metric name.

e2883d4

Update affected tests.

0110721

Update changelog.

efb7856

Add sample statsd config.

ebc1245

Add sample metrics configs for statsd config and carbon cache.

cacaa42

Fix file extension.

7d04e4e

Add new metrics.prefix config option.

ee8996f

This option can specify an optional prefix which is prepended to each metric key / name. This comes in handy when you want to use the same statsd or other backend instance for multiple environments (each environment would specify a different prefix).

Add changelog entry.

eb2646d

Remove unused code.

6732471

Add a comment.

1170280

Make metric key generation more robust, include prefix after "st2" and

f4b3ca3

before metric name.
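The key layout described by this commit (prefix after "st2" and before the metric name) could look roughly like the sketch below. The `build_metric_key` function and its arguments are hypothetical and are not the helper used in the st2 code base; the sketch assumes the metric name is passed without the leading "st2" part.

```python
def build_metric_key(name, prefix=None):
    """Return 'st2.<prefix>.<name>', or 'st2.<name>' when no prefix is set."""
    parts = ["st2"]
    if prefix:
        # Strip stray dots so a prefix like "staging." does not produce "..".
        parts.append(prefix.strip("."))
    parts.append(name.strip("."))
    return ".".join(parts)


default_key = build_metric_key("rule.processed")
staging_key = build_metric_key("rule.processed", prefix="staging")
```

With this layout, two environments sharing one statsd instance would emit `st2.staging.rule.processed` and, for example, `st2.prod.rule.processed`, so their metrics stay separate.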

Update affected code and tests, add new tests.

505106e

Add missing module.

4fb318e

Fix typo.

d4c3aaf

Re-gen sample config.

fb54c2a

Re-generate sample config.

1f30852

Kami merged commit 289d9e3 into master Aug 22, 2018

Kami deleted the rules_engine_metrics_instrumentation branch August 22, 2018 15:42

Kami mentioned this pull request Aug 23, 2018

Additional metrics instrumentation for various services and code paths #4314

Open

4 tasks

cognifloyd mentioned this pull request Apr 9, 2021

Drop unused dependencies #5228

Merged
