watcherStream.buf in apiserver can grow indefinitely #138217

@mengqiy

What happened?

The watcherStream.buf in the apiserver can grow without bound. When this happens, the ConsistentReadFromCache feature waits 3s for the cache to catch up and then delegates the request to etcd.
One scenario in which this can happen: high pod revision churn (pod mutations) combined with a high number of watch requests.
There are no metrics or logs that indicate the backlog in this buffer.

Enabling the ConcurrentWatchObjectDecode feature gate doesn't help much.

What did you expect to happen?

  1. We may want a queue depth metric for this buffer, but I'm not sure that is practical since the buffer lives in vendored etcd library code.
  2. If a metric is not possible, we should at least add a log line that shows the queue depth of the buffer.
  3. We may want to take some action when the buffer gets too big, e.g. restart this etcd watch.
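
To make the three proposals above concrete, here is a minimal sketch of an unbounded FIFO that exposes its depth and calls a hook once a threshold is crossed. This is a simplified model, not the etcd client code: `depthBuffer`, `threshold`, and `onExceed` are hypothetical names, and the hook stands in for whatever a real change would do (update a gauge, write a log line, or restart the watch).

```go
package main

import "fmt"

// depthBuffer is a hypothetical stand-in for an unbounded FIFO such as
// watcherStream.buf. It is NOT the etcd client implementation.
type depthBuffer struct {
	items     []string
	threshold int             // hypothetical depth above which we report
	onExceed  func(depth int) // hypothetical hook: set a gauge, log, or restart the watch
}

// push appends an event and reports the depth if it exceeds the threshold.
func (b *depthBuffer) push(ev string) {
	b.items = append(b.items, ev)
	if len(b.items) > b.threshold && b.onExceed != nil {
		b.onExceed(len(b.items))
	}
}

// pop removes and returns the oldest event, if any.
func (b *depthBuffer) pop() (string, bool) {
	if len(b.items) == 0 {
		return "", false
	}
	ev := b.items[0]
	b.items = b.items[1:]
	return ev, true
}

// depth is the value a queue depth metric or log line would expose.
func (b *depthBuffer) depth() int { return len(b.items) }

func main() {
	b := &depthBuffer{threshold: 3, onExceed: func(d int) {
		fmt.Printf("buffer depth %d exceeds threshold\n", d)
	}}
	// The producer outpaces the consumer: five pushes, one pop.
	for i := 0; i < 5; i++ {
		b.push(fmt.Sprintf("event-%d", i))
	}
	b.pop()
	fmt.Println("current depth:", b.depth())
}
```

Checking the depth on the push path keeps the cost to one length comparison per event. Whether a hook like this can be added without patching the vendored library is the open question raised in item 1.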

How can we reproduce it (as minimally and precisely as possible)?

This may not be the minimal reproduction, but these are the steps I use:

  1. Generate pod mutation load: about 150 pod revisions/sec, where each pod is about 50KB.
  2. Generate watch request load: 40 pod watch requests/sec.
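
For a rough sense of scale, the load above can be turned into numbers. The sketch below computes the event payload rate from the stated figures and shows how a backlog would grow if events were consumed more slowly than they arrive. The drain rate of 100 events/sec is purely a hypothetical value for illustration (it was not measured), and 50KB is taken as 50,000 bytes.

```go
package main

import "fmt"

// ingressBytesPerSec returns the event payload rate for a given
// revision rate and object size.
func ingressBytesPerSec(revisionsPerSec, objectBytes int) int {
	return revisionsPerSec * objectBytes
}

// backlogAfter returns how many events are queued after `seconds`
// when events arrive at producePerSec and are consumed at drainPerSec.
// The result is never negative.
func backlogAfter(producePerSec, drainPerSec, seconds int) int {
	growth := (producePerSec - drainPerSec) * seconds
	if growth < 0 {
		return 0
	}
	return growth
}

func main() {
	const revisionsPerSec = 150 // from the reproduction steps
	const objectBytes = 50_000  // about 50KB per pod, assumed to be 50,000 bytes
	const drainPerSec = 100     // HYPOTHETICAL consumer rate, not measured

	fmt.Println("payload bytes/sec:", ingressBytesPerSec(revisionsPerSec, objectBytes))

	queued := backlogAfter(revisionsPerSec, drainPerSec, 60)
	fmt.Println("events queued after 60s:", queued)
	fmt.Println("bytes queued after 60s:", queued*objectBytes)
}
```

With these assumptions, any sustained gap between the arrival rate and the drain rate makes the backlog grow linearly with time, which is why an unbounded buffer with no depth signal is hard to diagnose.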

Anything else we need to know?

No response

Kubernetes version

1.34
But I strongly suspect newer versions behave the same way.

Cloud provider

EKS

Labels

kind/bug, needs-sig, needs-triage