Connection limit for thick daemon. #1458

Open

juliusmh wants to merge 2 commits into k8snetworkplumbingwg:master from juliusmh:jmh/limit_listener

Conversation


juliusmh commented Nov 6, 2025

Closes: #1346

Summary by CodeRabbit

Release Notes

  • New Features
    • Added a configurable connection limit for the CNI server to constrain the maximum number of concurrent connections.
  • Tests
    • Added end-to-end test scenarios for the connection limit functionality.



coveralls commented Nov 13, 2025

Coverage Status

coverage: 49.68% (-0.04%) from 49.721% when pulling eb7efb0 on juliusmh:jmh/limit_listener into f42e0bd on k8snetworkplumbingwg:master.

juliusmh requested a review from karampok on November 26, 2025 at 11:21
SchSeba (Contributor) left a comment

Is it possible to add a test for this one? Maybe set the limit really low, just to make sure we get a retry from the CRI system to start the pod again.

Also, a general question: can you run pprof to see what part of the code is consuming memory? Potentially we can improve that area of the code.


coderabbitai bot commented Dec 8, 2025

Walkthrough

This PR implements connection limiting for the multus daemon's CNI server to control concurrent request handling during pod burst scenarios. It adds a configurable connectionLimit parameter to the daemon configuration, applies a listener wrapper to enforce the limit, provides test infrastructure with template manifests and an e2e test script, and vendors the required golang.org/x/net/netutil package.
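
As a rough sketch of the mechanism (not the PR's actual code; the socket path, handler, and limit value below are placeholders), the limiting amounts to wrapping the server's listener with netutil.LimitListener before serving:

package main

import (
    "log"
    "net"
    "net/http"

    "golang.org/x/net/netutil"
)

func main() {
    // Placeholder socket path and handler; the real daemon wires up its own
    // HTTP mux and Unix socket location.
    const socketPath = "/tmp/example-cni-server.sock"
    mux := http.NewServeMux()

    l, err := net.Listen("unix", socketPath)
    if err != nil {
        log.Fatalf("listen: %v", err)
    }

    // Cap concurrent connections: Accept blocks once the limit is reached
    // and resumes as earlier connections are closed.
    const connectionLimit = 1
    limited := netutil.LimitListener(l, connectionLimit)

    if err := http.Serve(limited, mux); err != nil {
        log.Fatalf("serve: %v", err)
    }
}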

Changes

  • Daemon Configuration & Types (cmd/multus-daemon/main.go, pkg/server/types.go): Adds a ConnectionLimit field to the daemon config struct and applies the netutil.LimitListener wrapper to the CNI server listener when configured; imports golang.org/x/net/netutil.
  • E2E Test Infrastructure (.github/workflows/kind-e2e.yml, e2e/test-connection-limit.sh, e2e/templates/many-pods.yml.j2, e2e/templates/multus-daemonset-thick.yml.j2): Adds a workflow step to run the connection limit test, a shell script that creates a multi-pod deployment, waits for readiness, and cleans up, a Jinja2 template for a 6-replica CentOS Deployment, and a connectionLimit: 1 setting in the daemon ConfigMap.
  • Vendor Dependencies (vendor/golang.org/x/net/netutil/listen.go, vendor/modules.txt): Vendors the golang.org/x/net/netutil package, whose LimitListener wrapper caps concurrent connections with a semaphore channel, and adds the module entry to the vendor manifest.

Sequence Diagram

sequenceDiagram
    participant Pod1 as Pod 1
    participant Pod2 as Pod 2
    participant Pod3 as Pod 3
    participant LimitListener
    participant Semaphore as Semaphore (capacity: 1)
    participant Daemon as CNI Daemon

    Pod1->>LimitListener: connect
    LimitListener->>Semaphore: acquire()
    Semaphore-->>LimitListener: acquired (slot 0 occupied)
    LimitListener->>Daemon: Accept()
    Daemon-->>Pod1: connection established

    Pod2->>LimitListener: connect
    LimitListener->>Semaphore: acquire()
    Note over Semaphore: blocked (at capacity)
    
    Pod3->>LimitListener: connect
    LimitListener->>Semaphore: acquire()
    Note over Semaphore: blocked (at capacity)

    Pod1->>LimitListener: close connection
    LimitListener->>Semaphore: release()
    Semaphore-->>LimitListener: slot available
    
    LimitListener->>Semaphore: acquire() [Pod2]
    Semaphore-->>LimitListener: acquired
    LimitListener->>Daemon: Accept() [Pod2]
    Daemon-->>Pod2: connection established

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • vendor/golang.org/x/net/netutil/listen.go: Semaphore-based connection wrapping and cleanup logic requires careful verification of race conditions, proper channel handling, and connection lifecycle management.
  • cmd/multus-daemon/main.go: Configuration integration and conditional listener wrapping; verify config parsing and logging.
  • e2e/test-connection-limit.sh: Shell script correctness, resource cleanup, and timeout handling.
  • Config propagation: Ensure ConnectionLimit field is correctly threaded through daemon startup with proper defaults and validation.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title 'Connection limit for thick daemon' clearly and specifically describes the main change, adding connection limit functionality to the multus daemon in thick mode, which aligns with the primary objective.
  • Linked Issues Check (✅ Passed): The PR implements the feature requested in issue #1346: a configurable connectionLimit option to constrain parallel CNI requests, including configuration support, server-side listener wrapping, e2e tests, and deployment templates with the feature enabled.
  • Out of Scope Changes Check (✅ Passed): All changes relate directly to the connection limit feature: the config struct extension, daemon logic to apply the limit, the vendored limiting library, test infrastructure, and e2e validation. No unrelated changes detected.

juliusmh (Author) commented Dec 8, 2025

Thanks for the comment, @SchSeba. I added some tests, but I'm not well versed in this repository.


coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
e2e/templates/multus-daemonset-thick.yml.j2 (1)

100-101: Clarify whether connectionLimit: 1 should be global for all thick‑mode e2e runs

Hard‑coding connectionLimit: 1 in the shared thick DaemonSet config means every thick‑mode e2e job now runs with single‑connection concurrency, not just the connection‑limit test. That’s safe, but it might unnecessarily slow other tests, and it hides behavior under higher limits.

Consider either:

  • Making the limit configurable (e.g., via a Jinja/env parameter) and defaulting to “no limit” for general tests while overriding to 1 only in the connection‑limit scenario, or
  • Adding a brief comment in this ConfigMap explaining that 1 is intentionally low to exercise the limit behavior in e2e.
.github/workflows/kind-e2e.yml (1)

88-90: Scope the “Test connection limit” step to thick mode (optional)

Right now this step runs for every matrix entry, including non‑thick manifests where connectionLimit isn’t configured. That’s functionally fine but makes the step less clearly tied to the thick‑daemon feature and adds a bit of redundant test time.

Consider either:

  • Adding an if: ${{ matrix.multus-manifest == 'multus-daemonset-thick.yml' }} guard (like the subdirectory chaining tests), or
  • Renaming the step to reflect that it’s a generic “many pods” sanity test for non‑thick runs, if you want the broader coverage.
pkg/server/types.go (1)

73-83: Document ConnectionLimit semantics in ControllerNetConf

The new ConnectionLimit *int field is a clean way to plumb the option through config, and using a pointer keeps it backward compatible. Right now, its behavior (nil or ≤0 means “no limit”, positive values cap concurrent connections) is only implicit from the daemon logic.

Consider adding a short comment on the field or in the struct doc to spell this out for users of the API, e.g., “maximum concurrent CNI server connections; nil or ≤0 disables limiting”.
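
A minimal sketch of what that could look like (the surrounding fields and the exact json tag are illustrative, not taken from the actual struct):

// ControllerNetConf sketch; only ConnectionLimit reflects this PR, the
// other contents and the tag details are placeholders.
type ControllerNetConf struct {
    // ...existing fields...

    // ConnectionLimit caps the number of concurrent CNI server
    // connections. nil or a value <= 0 disables limiting.
    ConnectionLimit *int `json:"connectionLimit,omitempty"`
}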

cmd/multus-daemon/main.go (1)

32-33: Connection limit wiring looks good; consider validating non‑positive values

The listener wrapping with:

if limit := daemonConfig.ConnectionLimit; limit != nil && *limit > 0 {
    logging.Debugf("connection limit: %d", *limit)
    l = netutil.LimitListener(l, *limit)
}

cleanly keeps existing behavior for legacy configs and only applies LimitListener when explicitly set to a positive value.

Two small follow‑ups to consider:

  • Treat explicitly configured 0 or negative values as misconfiguration (log a warning or error) instead of silently behaving as “no limit”, so bad config is visible.
  • Ensure the user‑facing config docs mention that only positive values enable limiting and that nil/0 (depending on how you want to treat 0) results in no concurrency cap.

Also applies to: 171-174
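
A hedged sketch of what that validation could look like, assuming the pkg/logging package exposes an Errorf helper alongside the Debugf shown above:

// Sketch only: Errorf is assumed to exist in pkg/logging; only Debugf is
// confirmed in this review.
if limit := daemonConfig.ConnectionLimit; limit != nil {
    switch {
    case *limit > 0:
        logging.Debugf("connection limit: %d", *limit)
        l = netutil.LimitListener(l, *limit)
    default:
        // Surface the misconfiguration instead of silently running unlimited.
        logging.Errorf("ignoring non-positive connectionLimit %d; no connection limit applied", *limit)
    }
}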

e2e/test-connection-limit.sh (1)

1-10: Consider adding a trap for cleanup (and optionally stricter shell flags)

The script correctly exercises the “many pods” scenario and cleans up on the happy path. If kubectl create or kubectl wait fails, though, set -o errexit will exit before the delete runs, leaving resources around until the cluster is torn down.

Optionally, you could:

  • Add a simple trap to always attempt cleanup:
trap 'kubectl delete -f yamls/many-pods.yml >/dev/null 2>&1 || true' EXIT
  • And, if you want more robust scripting, add set -o nounset -o pipefail in line with other e2e scripts.

Not required for correctness, but it tightens test hygiene.

e2e/templates/many-pods.yml.j2 (1)

1-23: Verify whether the test pod really needs privileged: true

This Deployment only runs sleep, so it may not need to be privileged. If there’s no dependency on host networking features or special capabilities, consider tightening the securityContext (dropping privileged: true or replacing it with narrower capabilities) to keep the e2e fixtures as minimal‑privilege as possible.

If other tests or the CNI setup rely on this being privileged, keeping it as‑is is fine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f42e0bd and 95c131e.

📒 Files selected for processing (8)
  • .github/workflows/kind-e2e.yml (1 hunks)
  • cmd/multus-daemon/main.go (2 hunks)
  • e2e/templates/many-pods.yml.j2 (1 hunks)
  • e2e/templates/multus-daemonset-thick.yml.j2 (1 hunks)
  • e2e/test-connection-limit.sh (1 hunks)
  • pkg/server/types.go (1 hunks)
  • vendor/golang.org/x/net/netutil/listen.go (1 hunks)
  • vendor/modules.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cmd/multus-daemon/main.go (2)
pkg/logging/logging.go (1)
  • Debugf (126-128)
vendor/golang.org/x/net/netutil/listen.go (1)
  • LimitListener (16-22)
🔇 Additional comments (2)
vendor/modules.txt (1)

204-217: Vendored golang.org/x/net/netutil entry is consistent

The new golang.org/x/net/netutil line fits correctly under the existing golang.org/x/net module section and matches the added vendored package and import usage.

vendor/golang.org/x/net/netutil/listen.go (1)

1-87: Vendored LimitListener implementation looks standard

This netutil.LimitListener implementation matches the usual upstream pattern (semaphore‑based cap, done channel, wrapped Conn releasing on Close). Keeping it as a straight vendored copy is good for future upstream syncs; I wouldn’t tweak behavior here unless you discover a specific bug and can upstream the fix.
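
For readers who haven't seen the pattern, here is a simplified sketch of a semaphore-based limiter in the same spirit (it is not the vendored file verbatim and omits the done channel the real implementation uses to unblock Accept when the listener is closed):

// Package netutilsketch is an illustrative sketch, not the vendored code.
package netutilsketch

import (
    "net"
    "sync"
)

// limitListener caps concurrent connections using a buffered channel as a
// counting semaphore.
type limitListener struct {
    net.Listener
    sem chan struct{}
}

// LimitListener wraps l so that at most n connections are active at once.
func LimitListener(l net.Listener, n int) net.Listener {
    return &limitListener{Listener: l, sem: make(chan struct{}, n)}
}

func (l *limitListener) Accept() (net.Conn, error) {
    l.sem <- struct{}{} // blocks while n connections are in flight
    c, err := l.Listener.Accept()
    if err != nil {
        <-l.sem
        return nil, err
    }
    return &limitConn{Conn: c, release: func() { <-l.sem }}, nil
}

// limitConn frees its semaphore slot exactly once when closed.
type limitConn struct {
    net.Conn
    release     func()
    releaseOnce sync.Once
}

func (c *limitConn) Close() error {
    err := c.Conn.Close()
    c.releaseOnce.Do(c.release)
    return err
}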



Development

Successfully merging this pull request may close these issues.

[OOMKilled] High memory consumption

4 participants