Vector making api requests to Kubernetes API server without using resource_version #16797

@jeremy-mi-rh

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

This issue is split out from #16753.

Context

We use Vector to deliver kubernetes_logs to our Kafka cluster, where they are later processed and ingested into Humio. Vector is deployed as a daemon set in our Kubernetes clusters (each with >1000 nodes running).

We recently had an outage in one of our Kubernetes clusters (with ~1100 nodes running). The etcd leader node failed, which set off a cascading failure in which pods made 1000x more API calls to our API server and eventually brought the Kubernetes control plane down entirely.

During remediation, we identified Vector as one of the candidates that was hammering the API server. Shutting down Vector along with a few other daemon sets eventually reduced the traffic on the control plane components, which allowed the etcd nodes to recover.

Issue: resource_version not set when making API requests to Kube API server

According to #7943, resource_version was set to 0 in Vector 0.18 - 0.20. PR #11714 adopted kube-rs and dropped the change made in #9974. Looking at the audit logs from kube-api-server, we don't see resource_version set in the request URL, which makes us wonder whether this is a regression.
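To make the difference concrete, here is a minimal Python sketch of the two request URIs. It is only an illustration, not Vector or kube-rs code: the helper name `node_list_uri` is hypothetical, and the query parameter is spelled `resourceVersion` as the Kubernetes API expects. As I understand the Kubernetes API documentation, a list request with `resourceVersion=0` may be served from the API server's cache, while a list request without the parameter requires a consistent read backed by etcd.

```python
from urllib.parse import urlencode


def node_list_uri(node_name, resource_version=None):
    """Build a node list URI filtered by node name (hypothetical helper).

    When resource_version is None the parameter is omitted entirely,
    which is the shape seen in the audit log in this issue.
    """
    params = {"fieldSelector": f"metadata.name={node_name}"}
    if resource_version is not None:
        params["resourceVersion"] = resource_version
    return "/api/v1/nodes?" + urlencode(params)


# Shape currently observed: no resourceVersion in the query string.
print(node_list_uri("ip-10-x-x-x.ec2.internal"))
# Shape we would expect if resource_version were set to 0.
print(node_list_uri("ip-10-x-x-x.ec2.internal", "0"))
```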

Sample request in audit logs:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Request","auditID":"xxx","stage":"ResponseComplete","requestURI":"/api/v1/nodes?\u0026fieldSelector=metadata.name%3Dip-10-x-x-x.ec2.internal","verb":"list","user":{"username":"system:serviceaccount:vector:vector-agent","uid":"xxx","groups":["system:serviceaccounts","system:serviceaccounts:vector","system:authenticated"]},"sourceIPs":["10.x.x.x"],"objectRef":{"resource":"nodes","name":"ip-10-x-x-x.ec2.internal","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2023-03-08T02:29:15.992746Z","stageTimestamp":"2023-03-08T02:29:16.534351Z","annotations":{"authentication.k8s.io/legacy-token":"system:serviceaccount:vector:vector-agent","authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"default-reader-role-binding\" of ClusterRole \"cluster-read-all\" to Group \"system:authenticated\""}}
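For anyone who wants to repeat this check against their own audit logs, the following is a rough Python sketch of how list requests without a `resourceVersion` query parameter could be flagged. It assumes one JSON audit event per line with `verb` and `requestURI` fields, as in the sample above; the function name `list_without_resource_version` is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit


def list_without_resource_version(audit_line):
    """Return True if the audit event is a list request whose URI
    carries no resourceVersion query parameter."""
    event = json.loads(audit_line)
    if event.get("verb") != "list":
        return False
    query = urlsplit(event.get("requestURI", "")).query
    # parse_qs skips empty fields, so a leading "&" (as in the sample) is harmless.
    return "resourceVersion" not in parse_qs(query)


# Trimmed-down event with the same verb and requestURI shape as the sample.
sample = json.dumps({
    "verb": "list",
    "requestURI": "/api/v1/nodes?&fieldSelector=metadata.name%3Dip-10-x-x-x.ec2.internal",
})
print(list_without_resource_version(sample))
```

Filtering the flagged events by `user.username` should then show which service accounts are issuing such requests.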

Version

vector 0.27.0 (x86_64-unknown-linux-gnu 5623d1e 2023-01-18)

References

#7943
#16753
