feat: implement a Keptn metrics provider#1630
Conversation
docs/gitbook/usage/metrics.md
Outdated
| query: | | ||
| keptnmetric/my-namespace/response-time |
There was a problem hiding this comment.
| query: | | |
| keptnmetric/my-namespace/response-time | |
| query: "keptnmetric/my-namespace/response-time" |
I would make this a one-liner to suggest you can't have more than one object reference in there. I would also add an example for a real Istio metric like Istio 404 rate https://docs.flagger.app/usage/metrics#prometheus
pkg/controller/controller.go
Outdated
|
|
||
| import ( | ||
| "fmt" | ||
| "k8s.io/client-go/rest" |
There was a problem hiding this comment.
Please sort the imports into groups, run make fmt
202e636 to
7e567f8
Compare
pkg/metrics/providers/keptn.go
Outdated
|
|
||
| // delete the created analysis at the end of the function | ||
| defer func() { | ||
| _ = k.client. |
There was a problem hiding this comment.
instead of ignoring the error, we should log it.
| } | ||
| return &KeptnProvider{ | ||
| client: client, | ||
| analysisTimeout: 10 * time.Second, |
There was a problem hiding this comment.
has this been chosen randomly? how does this play with the Canary analysis's interval?
There was a problem hiding this comment.
This is the maximum time the provider will wait for the result of a Keptn Analysis to be available - I added this to prevent the provider from blocking the flagger controller indefinitely in case the created Analysis cannot be completed.
I chose this value as this should be sufficient in most cases for the created Keptn Analysis to be reconciled by the Keptn Analysis controller, however I can also adjust this If you'd like, or if there are some guidelines on how long a provider might take to retrieve a value.
069ebd3 to
8b93127
Compare
Thanks for the review @aryan9600! I just addressed your comments and adapted the code accordingly - If there are some further questions feel free to reach out :) |
aryan9600
left a comment
There was a problem hiding this comment.
took a second pass, its looking good, just had some minor nits.
can the docs be improved a bit more? a few suggestions:
- the
queryfield inMetricTemplateexample mentions the timeframe and arguments as well - make it clearer that resource name is actually that of the
AnalysisDefinitionand not the to be createdAnalysis. - an example around
Analysis, which shows a sampleAnalysisDefinitionand what theAnalysisobject generated by Flagger would look like
pkg/metrics/providers/keptn.go
Outdated
| } | ||
|
|
||
| var analysisResource = schema.GroupVersionResource{ | ||
| Group: "metrics.keptn.sh", |
There was a problem hiding this comment.
lets define a constant for this as well
pkg/metrics/providers/keptn.go
Outdated
| } | ||
| } | ||
| } | ||
| return 0, fmt.Errorf("could not retrieve KeptnMetric - no value found int resource %s/%s", queryObj.Namespace, queryObj.ResourceName) |
There was a problem hiding this comment.
| return 0, fmt.Errorf("could not retrieve KeptnMetric - no value found int resource %s/%s", queryObj.Namespace, queryObj.ResourceName) | |
| return 0, fmt.Errorf("could not retrieve KeptnMetric - no value found in resource %s/%s", queryObj.Namespace, queryObj.ResourceName) |
pkg/metrics/providers/keptn.go
Outdated
| select { | ||
| case <-ctx.Done(): | ||
| return 0, fmt.Errorf("encountered timeout while waiting for Keptn Analysis %s/%s to be finished", obj.Namespace, obj.ResourceName) | ||
| case <-time.After(500 * time.Millisecond): |
There was a problem hiding this comment.
| case <-time.After(500 * time.Millisecond): | |
| case <-time.After(time.Second): |
as mentioned in the above comment
pkg/metrics/providers/keptn_test.go
Outdated
| require.NotNil(t, provider) | ||
|
|
||
| isOnline, err := provider.IsOnline() | ||
| require.Nil(t, err) |
There was a problem hiding this comment.
| require.Nil(t, err) | |
| require.NoError(t, err) |
pkg/metrics/providers/keptn_test.go
Outdated
| func TestNewKeptnProviderNoKubeConfig(t *testing.T) { | ||
| provider, err := NewKeptnProvider(nil) | ||
|
|
||
| require.NotNil(t, err) |
There was a problem hiding this comment.
| require.NotNil(t, err) | |
| require.Error(t, err) |
pkg/metrics/providers/keptn_test.go
Outdated
| require.True(t, isOnline) | ||
| } | ||
|
|
||
| func TestNewKeptnProviderNoKubeConfig(t *testing.T) { |
There was a problem hiding this comment.
| func TestNewKeptnProviderNoKubeConfig(t *testing.T) { | |
| func TestNewKeptnProvider_NoKubeConfig(t *testing.T) { |
pkg/metrics/providers/keptn_test.go
Outdated
| require.Nil(t, provider) | ||
| } | ||
|
|
||
| func TestKeptnProvider_RunQueryKeptnMetric(t *testing.T) { |
There was a problem hiding this comment.
| func TestKeptnProvider_RunQueryKeptnMetric(t *testing.T) { | |
| func TestKeptnProvider_RunQuery_KeptnMetric(t *testing.T) { |
pkg/metrics/providers/keptn_test.go
Outdated
| tests := []struct { | ||
| name string | ||
| setupClient func() *fake.FakeDynamicClient | ||
| args args |
There was a problem hiding this comment.
why not just put query string here?
pkg/metrics/providers/keptn_test.go
Outdated
| tests := []struct { | ||
| name string | ||
| setupClient func() *fake.FakeDynamicClient | ||
| // verificationFunc() will run in a separate go routing |
There was a problem hiding this comment.
| // verificationFunc() will run in a separate go routing | |
| // verificationFunc() will run in a separate go routine |
pkg/metrics/providers/keptn_test.go
Outdated
| // verificationFunc() will run in a separate go routing | ||
| // and check if the expected resources are created | ||
| verificationFunc func(fakeClient *fake.FakeDynamicClient) error | ||
| args args |
|
@aryan9600 could you please push your suggestions on top of this PR. I see that @bacherfl has enabled us to push commit. |
|
Hi @stefanprodan @aryan9600 sorry for the late reply - I had a busy schedule recently. I'm not sure if i can get to incorporate the suggested changes this week, but I may have some time to do so next week - otherwise, feel free to add directly to the PR, as @stefanprodan suggested |
c559cdf to
29c8bf5
Compare
|
PR updated, the docs could still use a bit more info around |
|
@aryan9600 please sync the CRDs, the helm chart has the old one. |
Add a Keptn metrics provider for two resources: * KeptnMetric: Verify the value of a single metric. * Analysis (via AnalysisDefinition): Run a Keptn analysis over an interval validating SLOs. Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
29c8bf5 to
ce976e2
Compare
|
kuma e2e test failure is unrelated, happening because the tarball has moved to another location. i think we should not let it block this PR. we can fix and bump the kuma version later. |
stefanprodan
left a comment
There was a problem hiding this comment.
LGTM
Thanks @bacherfl and @aryan9600
Closes #1627
This PR adds the implementation of a
keptnmetrics providerThe main difference to other providers in Flagger is that this one is using a Kubernetes client to interact with Keptn resources (
KeptnMetricsandAnalyses). Therefore, in addition to the code inkeptn.go, theClusterRoleneeded to be adapted to be able to access the required Keptn resources.The provider supports the use of the following resources:
KeptnMetric: This resource represents the current value of a certain metric. The metric is retrieved by Keptn from one of the following data sources:prometheus,thanos,cortex,dynatrace,datadog.Analysis: This Keptn resource can be used to apply a more complex grading logic over a set of different metrics. More information about this concept can be found in the Keptn Docs