
Knative Serving with Istio fails, net-istio controller is OOMKilled  #12778

@skonto

Description

What version of Knative?

1.3.0

Expected Behavior

Knative Serving with Istio should install as expected.

Actual Behavior

The net-istio-controller pod is OOMKilled.

Steps to Reproduce the Problem

A similar issue has previously been reported against the client-go library here.

  • Install Knative Serving with Istio on minikube, following the steps here.

  • Generate a dummy secret payload of ~0.5 MiB:

$ head -c 500K </dev/urandom > secret.txt

Run the following script to create the namespaces and dummy secrets:

#!/bin/bash
# Create 100 test namespaces.
for i in {1..100}; do
  echo "Creating namespace test-$i"
  oc create namespace test-$i
done

# Create 50 copies of the secret in each namespace.
for i in {1..100}; do
  for j in {1..50}; do
    oc create secret generic test-secret-$j --from-file=./secret.txt -n test-$i
  done
done
  • Check the net-istio-controller pod:
$ kubectl get pods -n knative-serving
NAME                                    READY   STATUS      RESTARTS   AGE
activator-54b76b65dc-nq4z6              1/1     Running     0          8m37s
autoscaler-bbd99dfbb-jdrv8              1/1     Running     0          8m36s
controller-64769f58d6-447r4             1/1     Running     0          8m36s
domain-mapping-846667c656-7b84v         1/1     Running     0          8m36s
domainmapping-webhook-f44f4785b-kft95   1/1     Running     0          8m35s
net-istio-controller-8d456687b-hq95g    0/1     OOMKilled   1          33s
net-istio-webhook-6d98796d6-r46jk       1/1     Running     0          8m6s
webhook-56c48c5b66-m9bf6                1/1     Running     0          8m34s
  • The controller pod gets stuck here:
{"severity":"INFO","timestamp":"2022-03-24T10:43:27.745116111Z","logger":"net-istio-controller","caller":"metrics/metrics_worker.go:76","message":"Flushing the existing exporter before setting up the new exporter.","commit":"c096fb6"}
{"severity":"INFO","timestamp":"2022-03-24T10:43:27.745358289Z","logger":"net-istio-controller","caller":"metrics/prometheus_exporter.go:51","message":"Created Prometheus exporter with config: &{knative.dev/net-istio net_istio_controller prometheus 5000000000 <nil>  false 9090 0.0.0.0}. Start the server for Prometheus exporter.","commit":"c096fb6"}
{"severity":"INFO","timestamp":"2022-03-24T10:43:27.745437056Z","logger":"net-istio-controller","caller":"metrics/metrics_worker.go:91","message":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/net-istio net_istio_controller prometheus 5000000000 <nil>  false 9090 0.0.0.0}","commit":"c096fb6"}
{"level":"info","ts":1648118607.8376493,"logger":"fallback","caller":"injection/injection.go:61","msg":"Starting informers..."}

$ kubectl get secrets -A | wc -l
1080

Check the allocations:

[pprof allocation profile: the secret informer's cache dominates memory]

Clearly the secret lister's cache consumes almost all of the memory.

Note: This is a simple reproducer, but downstream a customer also hits this on a large cluster with many more secrets of various sizes. The problem is not restricted to secrets, since other resource types are listed the same way. In addition, net-kourier is expected to suffer as well, as it uses similar logic.

Potential solution:

Increasing the memory limit of the net-istio-controller pod does fix the problem, but this is unsatisfying: the total secret size is unpredictable, and the resulting limit may not be acceptable. On the customer's cluster the memory limit had to be raised to 2Gi before the pod could start normally.

A better option would be to cache only the secrets we care about.
We get the secret lister here. It is derived from a secret SharedInformer here.
On that informer we call Get. The informer is initialized from context here, which uses this informer factory.
The factory, initialized here, only applies namespace filtering. If we use WithTweakListOptions (essentially the K8s API ListOptions) we should be able to filter by label, so that only matching secrets are fetched and stored in the cache.

Labels: kind/bug, triage/accepted
