Connection limit for thick daemon #1458

juliusmh wants to merge 2 commits into k8snetworkplumbingwg:master from
Conversation
Signed-off-by: Julius Hinze <[email protected]>
db9bce5 to eb7efb0
SchSeba left a comment
Is it possible to add a test for this one?
Maybe set the number really low, just to be sure we get a retry from the CRI system to start the pod again?
Also, a general question: can you run pprof to see which part of the code is using the most memory? Potentially we can improve that area of the code.
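(For reference, a generic way to expose Go's heap profiler in a long-running daemon so `go tool pprof` can inspect memory; this sketch uses only the standard library's `net/http/pprof` and is not something multus necessarily wires up today.)

```go
package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Expose the profiler on localhost only, then inspect the heap with:
	//   go tool pprof http://127.0.0.1:6060/debug/pprof/heap
	go http.ListenAndServe("127.0.0.1:6060", nil)

	select {} // stand-in for the daemon's real work
}
```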
Walkthrough

This PR implements connection limiting for the multus daemon's CNI server to control concurrent request handling during pod burst scenarios. It adds a configurable `connectionLimit` option to the daemon configuration.

Changes
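For context, a hypothetical `daemon-config.json` fragment showing where the new key sits. The `connectionLimit` key name comes from this PR's thick-daemonset template; the surrounding keys are illustrative only and may differ from an actual deployment.

```json
{
    "logLevel": "verbose",
    "socketDir": "/run/multus",
    "connectionLimit": 1
}
```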
Sequence Diagram

```mermaid
sequenceDiagram
participant Pod1 as Pod 1
participant Pod2 as Pod 2
participant Pod3 as Pod 3
participant LimitListener
participant Semaphore as Semaphore (capacity: 1)
participant Daemon as CNI Daemon
Pod1->>LimitListener: connect
LimitListener->>Semaphore: acquire()
Semaphore-->>LimitListener: acquired (slot 0 occupied)
LimitListener->>Daemon: Accept()
Daemon-->>Pod1: connection established
Pod2->>LimitListener: connect
LimitListener->>Semaphore: acquire()
Note over Semaphore: blocked (at capacity)
Pod3->>LimitListener: connect
LimitListener->>Semaphore: acquire()
Note over Semaphore: blocked (at capacity)
Pod1->>LimitListener: close connection
LimitListener->>Semaphore: release()
Semaphore-->>LimitListener: slot available
LimitListener->>Semaphore: acquire() [Pod2]
Semaphore-->>LimitListener: acquired
LimitListener->>Daemon: Accept() [Pod2]
Daemon-->>Pod2: connection established
```
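A self-contained sketch (not from the PR) of the accept-gating the diagram describes, using the same `netutil.LimitListener` helper the daemon vendors; the address, timings, and client loop are arbitrary.

```go
package main

import (
	"fmt"
	"net"
	"time"

	"golang.org/x/net/netutil"
)

func main() {
	inner, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	ln := netutil.LimitListener(inner, 1) // capacity 1, as in the diagram

	go func() {
		for {
			conn, err := ln.Accept() // blocks while one connection is open
			if err != nil {
				return
			}
			fmt.Println("accepted", conn.RemoteAddr())
			go func(c net.Conn) {
				time.Sleep(time.Second) // simulate CNI work
				c.Close()               // releases the slot, unblocking Accept
			}(conn)
		}
	}()

	// Three "pods" dial concurrently; they are accepted one at a time.
	for i := 0; i < 3; i++ {
		go net.Dial("tcp", ln.Addr().String())
	}
	time.Sleep(4 * time.Second)
}
```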
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Thanks for the comment @SchSeba, I added some tests but I'm not well versed in this repository.
Actionable comments posted: 0
🧹 Nitpick comments (6)
e2e/templates/multus-daemonset-thick.yml.j2 (1)
100-101: Clarify whether `connectionLimit: 1` should be global for all thick‑mode e2e runs

Hard‑coding `connectionLimit: 1` in the shared thick DaemonSet config means every thick‑mode e2e job now runs with single‑connection concurrency, not just the connection‑limit test. That's safe but might unnecessarily slow other tests and hides behavior under higher limits.

Consider either:

- Making the limit configurable (e.g., via a Jinja/env parameter) and defaulting to "no limit" for general tests while overriding to `1` only in the connection‑limit scenario, or
- Adding a brief comment in this ConfigMap explaining that `1` is intentionally low to exercise the limit behavior in e2e.

.github/workflows/kind-e2e.yml (1)
88-90: Scope the "Test connection limit" step to thick mode (optional)

Right now this step runs for every matrix entry, including non‑thick manifests where `connectionLimit` isn't configured. That's functionally fine but makes the step less clearly tied to the thick‑daemon feature and adds a bit of redundant test time.

Consider either:

- Adding an `if: ${{ matrix.multus-manifest == 'multus-daemonset-thick.yml' }}` guard (like the subdirectory chaining tests), or
- Renaming the step to reflect that it's a generic "many pods" sanity test for non‑thick runs, if you want the broader coverage.
pkg/server/types.go (1)
73-83: Document `ConnectionLimit` semantics in `ControllerNetConf`

The new `ConnectionLimit *int` field is a clean way to plumb the option through config, and using a pointer keeps it backward compatible. Right now, its behavior (nil or ≤0 means "no limit", positive values cap concurrent connections) is only implicit from the daemon logic.

Consider adding a short comment on the field or in the struct doc to spell this out for users of the API, e.g., "maximum concurrent CNI server connections; nil or ≤0 disables limiting".
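A sketch of what that doc comment could look like; the field name and pointer type come from the diff, while the json tag and placement within the struct are assumptions:

```go
type ControllerNetConf struct {
	// ... existing fields elided ...

	// ConnectionLimit caps the number of concurrent connections the CNI
	// server will accept. nil or a value <= 0 disables limiting.
	ConnectionLimit *int `json:"connectionLimit,omitempty"`
}
```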
cmd/multus-daemon/main.go (1)
32-33: Connection limit wiring looks good; consider validating non‑positive values

The listener wrapping with:

```go
if limit := daemonConfig.ConnectionLimit; limit != nil && *limit > 0 {
	logging.Debugf("connection limit: %d", *limit)
	l = netutil.LimitListener(l, *limit)
}
```

cleanly keeps existing behavior for legacy configs and only applies `LimitListener` when explicitly set to a positive value.

Two small follow‑ups to consider:

- Treat explicitly configured `0` or negative values as misconfiguration (log a warning or error) instead of silently behaving as "no limit", so bad config is visible.
- Ensure the user‑facing config docs mention that only positive values enable limiting and that nil/0 (depending on how you want to treat 0) results in no concurrency cap.
Also applies to: 171-174
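A sketch of the first follow‑up, reusing the names from the snippet above; `logging.Errorf` is assumed to exist alongside `Debugf`:

```go
if limit := daemonConfig.ConnectionLimit; limit != nil {
	if *limit > 0 {
		logging.Debugf("connection limit: %d", *limit)
		l = netutil.LimitListener(l, *limit)
	} else {
		// Surface the misconfiguration instead of silently running unlimited.
		logging.Errorf("ignoring non-positive connectionLimit %d; running without a connection cap", *limit)
	}
}
```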
e2e/test-connection-limit.sh (1)
1-10: Consider adding a trap for cleanup (and optionally stricter shell flags)

The script correctly exercises the "many pods" scenario and cleans up on the happy path. If `kubectl create` or `kubectl wait` fails, though, `set -o errexit` will exit before the delete runs, leaving resources around until the cluster is torn down.

Optionally, you could:

- Add a simple trap to always attempt cleanup:

```sh
trap 'kubectl delete -f yamls/many-pods.yml >/dev/null 2>&1 || true' EXIT
```

- And, if you want more robust scripting, add `set -o nounset -o pipefail` in line with other e2e scripts.
set -o nounset -o pipefailin line with other e2e scripts.Not required for correctness, but it tightens test hygiene.
e2e/templates/many-pods.yml.j2 (1)
1-23: Verify whether the test pod really needs `privileged: true`

This Deployment is only running `sleep`, so it may not need full `privileged` privileges. If there's no dependency on host networking features or special capabilities, consider tightening the securityContext (dropping `privileged: true` or replacing it with narrower capabilities) to keep the e2e fixtures as minimal‑privilege as possible.

If other tests or the CNI setup rely on this being privileged, keeping it as‑is is fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- `.github/workflows/kind-e2e.yml` (1 hunks)
- `cmd/multus-daemon/main.go` (2 hunks)
- `e2e/templates/many-pods.yml.j2` (1 hunks)
- `e2e/templates/multus-daemonset-thick.yml.j2` (1 hunks)
- `e2e/test-connection-limit.sh` (1 hunks)
- `pkg/server/types.go` (1 hunks)
- `vendor/golang.org/x/net/netutil/listen.go` (1 hunks)
- `vendor/modules.txt` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cmd/multus-daemon/main.go (2)
pkg/logging/logging.go (1)
Debugf (126-128)
vendor/golang.org/x/net/netutil/listen.go (1)
LimitListener (16-22)
🔇 Additional comments (2)
vendor/modules.txt (1)
204-217: Vendored `golang.org/x/net/netutil` entry is consistent

The new `golang.org/x/net/netutil` line fits correctly under the existing `golang.org/x/net` module section and matches the added vendored package and import usage.

vendor/golang.org/x/net/netutil/listen.go (1)
1-87: Vendored `LimitListener` implementation looks standard

This `netutil.LimitListener` implementation matches the usual upstream pattern (semaphore‑based cap, done channel, wrapped Conn releasing on Close). Keeping it as a straight vendored copy is good for future upstream syncs; I wouldn't tweak behavior here unless you discover a specific bug and can upstream the fix.
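For readers unfamiliar with that pattern, a minimal self-contained sketch of the semaphore-based cap (simplified: it omits the vendored copy's done channel for listener shutdown, and `Limit` is a hypothetical name):

```go
package netlimit

import (
	"net"
	"sync"
)

// limitListener gates Accept with a buffered channel acting as a counting semaphore.
type limitListener struct {
	net.Listener
	sem chan struct{} // buffered to n; one slot per open connection
}

func Limit(l net.Listener, n int) net.Listener {
	return &limitListener{Listener: l, sem: make(chan struct{}, n)}
}

func (l *limitListener) Accept() (net.Conn, error) {
	l.sem <- struct{}{} // acquire; blocks once n connections are open
	c, err := l.Listener.Accept()
	if err != nil {
		<-l.sem // release the slot on Accept failure
		return nil, err
	}
	return &limitConn{Conn: c, release: func() { <-l.sem }}, nil
}

// limitConn releases its semaphore slot exactly once, on Close.
type limitConn struct {
	net.Conn
	once    sync.Once
	release func()
}

func (c *limitConn) Close() error {
	err := c.Conn.Close()
	c.once.Do(c.release)
	return err
}
```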
Closes: #1346
Summary by CodeRabbit

Release Notes

New Features
- Configurable connection limit for the thick daemon's CNI server, capping concurrent connections during pod bursts.

Tests
- New e2e test that launches many pods against a low connection limit to verify pods still come up.