Description
What version of Knative?
1.3.0
Expected Behavior
Knative Serving with Istio should install as expected.
Actual Behavior
The net-istio-controller pod is OOMKilled.
Steps to Reproduce the Problem
A similar issue was previously reported against the client-go library here.
- Install Knative Serving on minikube with Istio following the steps here.
- Generate a dummy secret payload of ~0.5 MB:
$ head -c 500K </dev/urandom > secret.txt
- Run the following script to create some dummy secrets:
#!/bin/bash
# Create 100 test namespaces, each holding 50 copies of the dummy secret.
for i in {1..100}; do
  echo "Creating namespace test-$i"
  kubectl create namespace "test-$i"
done

for i in {1..100}; do
  for j in {1..50}; do
    kubectl create secret generic "test-secret-$j" --from-file=./secret.txt -n "test-$i"
  done
done
- Check the net-istio-controller pod:
$ kubectl get pods -n knative-serving
NAME READY STATUS RESTARTS AGE
activator-54b76b65dc-nq4z6 1/1 Running 0 8m37s
autoscaler-bbd99dfbb-jdrv8 1/1 Running 0 8m36s
controller-64769f58d6-447r4 1/1 Running 0 8m36s
domain-mapping-846667c656-7b84v 1/1 Running 0 8m36s
domainmapping-webhook-f44f4785b-kft95 1/1 Running 0 8m35s
net-istio-controller-8d456687b-hq95g 0/1 OOMKilled 1 33s
net-istio-webhook-6d98796d6-r46jk 1/1 Running 0 8m6s
webhook-56c48c5b66-m9bf6 1/1 Running 0 8m34s
- The net-istio-controller pod gets stuck here:
{"severity":"INFO","timestamp":"2022-03-24T10:43:27.745116111Z","logger":"net-istio-controller","caller":"metrics/metrics_worker.go:76","message":"Flushing the existing exporter before setting up the new exporter.","commit":"c096fb6"}
{"severity":"INFO","timestamp":"2022-03-24T10:43:27.745358289Z","logger":"net-istio-controller","caller":"metrics/prometheus_exporter.go:51","message":"Created Prometheus exporter with config: &{knative.dev/net-istio net_istio_controller prometheus 5000000000 <nil> false 9090 0.0.0.0}. Start the server for Prometheus exporter.","commit":"c096fb6"}
{"severity":"INFO","timestamp":"2022-03-24T10:43:27.745437056Z","logger":"net-istio-controller","caller":"metrics/metrics_worker.go:91","message":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/net-istio net_istio_controller prometheus 5000000000 <nil> false 9090 0.0.0.0}","commit":"c096fb6"}
{"level":"info","ts":1648118607.8376493,"logger":"fallback","caller":"injection/injection.go:61","msg":"Starting informers..."}
$ kubectl get secrets -A | wc -l
1080
- Check allocations: the heap profile (image not reproduced here) shows the secret lister consuming nearly all of the memory.
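A quick back-of-the-envelope check (my own arithmetic, not from the report) shows why the default limits are blown: with the observed secret count and the ~0.5 MB dummy payload, the informer cache alone needs on the order of half a gigabyte.

```go
package main

import "fmt"

// Rough informer-cache footprint for this reproducer (a sketch only;
// ignores base64 inflation of Secret data and object-metadata overhead).
const (
	observedSecrets = 1080 // from `kubectl get secrets -A | wc -l`
	payloadMB       = 0.5  // size of each dummy secret payload
)

func main() {
	fmt.Println(observedSecrets*payloadMB, "MB") // prints: 540 MB
}
```

This is consistent with the 2Gi limit that was ultimately needed on the customer cluster once runtime and per-object overhead are added.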
Note: this is a simple reproducer, but downstream a customer also hits this in a large cluster with many more secrets of various sizes. The problem is also not restricted to secrets, since other resource types are listed the same way. In addition, net-kourier is expected to suffer as well, as it uses similar logic.
Potential solution:
Increasing the memory limits of the net-istio-controller pod does fix the problem, but it is not satisfying: the required amount is unpredictable and may not be acceptable. On the customer side the memory limit had to be set to 2Gi before the pod could start normally.
A better option would be to cache only the secrets we actually care about.
We get the secret lister here. It is derived from a secret SharedInformer here. We call Get on that informer, which is initialized from context here using this informer factory. The factory initialized here only applies namespace filtering. If we also use WithTweakListOptions (which essentially sets the Kubernetes API ListOptions) we can filter, e.g. by label, what is fetched before it is stored in the cache.
