
Knative Serving with Istio fails, net-istio controller is OOMKilled  #12778

@skonto

Description

What version of Knative?

1.3.0

Expected Behavior

Knative Serving with Istio should install as expected.

Actual Behavior

The net-istio-controller pod is OOMKilled.

Steps to Reproduce the Problem

A similar issue has previously been reported against the client-go library here.

  • Install Knative Serving with Istio on minikube, following the steps here.

  • Generate a dummy secret payload of ~0.5 MiB:

$ head -c 500K </dev/urandom > secret.txt

Run the following script to create the namespaces and dummy secrets:

#!/bin/bash
# Create 100 test namespaces.
for i in {1..100}; do
  echo "Creating namespace test-$i"
  oc create namespace test-$i
done

# Create 50 copies of the secret in each namespace.
for i in {1..100}; do
  for j in {1..50}; do
    oc create secret generic test-secret-$j --from-file=./secret.txt -n test-$i
  done
done
  • Check the net-istio-controller pod:
$ kubectl get pods -n knative-serving
NAME                                    READY   STATUS      RESTARTS   AGE
activator-54b76b65dc-nq4z6              1/1     Running     0          8m37s
autoscaler-bbd99dfbb-jdrv8              1/1     Running     0          8m36s
controller-64769f58d6-447r4             1/1     Running     0          8m36s
domain-mapping-846667c656-7b84v         1/1     Running     0          8m36s
domainmapping-webhook-f44f4785b-kft95   1/1     Running     0          8m35s
net-istio-controller-8d456687b-hq95g    0/1     OOMKilled   1          33s
net-istio-webhook-6d98796d6-r46jk       1/1     Running     0          8m6s
webhook-56c48c5b66-m9bf6                1/1     Running     0          8m34s
  • The controller pod gets stuck here:
{"severity":"INFO","timestamp":"2022-03-24T10:43:27.745116111Z","logger":"net-istio-controller","caller":"metrics/metrics_worker.go:76","message":"Flushing the existing exporter before setting up the new exporter.","commit":"c096fb6"}
{"severity":"INFO","timestamp":"2022-03-24T10:43:27.745358289Z","logger":"net-istio-controller","caller":"metrics/prometheus_exporter.go:51","message":"Created Prometheus exporter with config: &{knative.dev/net-istio net_istio_controller prometheus 5000000000 <nil>  false 9090 0.0.0.0}. Start the server for Prometheus exporter.","commit":"c096fb6"}
{"severity":"INFO","timestamp":"2022-03-24T10:43:27.745437056Z","logger":"net-istio-controller","caller":"metrics/metrics_worker.go:91","message":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/net-istio net_istio_controller prometheus 5000000000 <nil>  false 9090 0.0.0.0}","commit":"c096fb6"}
{"level":"info","ts":1648118607.8376493,"logger":"fallback","caller":"injection/injection.go:61","msg":"Starting informers..."}

$ kubectl get secrets -A | wc -l
1080

Check the allocations:

[pprof allocation profile: the secret informer's cache dominates memory]

Clearly the secret lister's cache consumes almost all of the memory.

Note: This is a simple reproducer, but downstream a customer also hits this on a large cluster with many more secrets of various sizes. The problem is not restricted to secrets, since other resource types are listed the same way. In addition, net-kourier is expected to suffer as well, as it uses similar logic.

Potential solution:

Increasing the memory limit of the net-istio-controller pod does fix the problem, but this is unsatisfying: the total secret size is unpredictable, and the resulting limit may not be acceptable. On the customer's cluster the memory limit had to be raised to 2Gi before the pod could start normally.

A better option would be to cache only the secrets we care about.
We get the secret lister here. It is derived from a secret SharedInformer here.
On that informer we call Get. The informer is initialized from context here, which uses this informer factory.
The factory, initialized here, only applies namespace filtering. If we use WithTweakListOptions (essentially the K8s API ListOptions) we should be able to filter by label, so that only matching secrets are fetched and stored in the cache.

Labels: kind/bug, triage/accepted
