
fix(s3stream): reduce CPU overhead in AsyncNetworkBandwidthLimiter when traffic hits limit #3343

Open
JUSTMEETPATEL wants to merge 1 commit into AutoMQ:main from JUSTMEETPATEL:fix/async-network-bandwidth-limiter-cpu-overhead

@JUSTMEETPATEL
Motivation

Resolves #2052

When network traffic reaches the configured bandwidth limit, AsyncNetworkBandwidthLimiter consumes excessive CPU due to wasteful wake-up cycles in its consumer thread. The flame graphs in the issue show this overhead clearly.

Root Cause

Two problems in the signaling logic create a hot wake-up loop under sustained load:

  1. consume() calls condition.signalAll() on every enqueue, even when availableTokens <= 0. Each signal wakes the run() thread, which acquires the lock, finds that ableToConsume() returns false because the tokens are exhausted, and immediately blocks again. Under sustained traffic at the limit, every incoming request repeats this wasteful cycle: wake -> lock -> check -> sleep.

  2. refillToken() uses signalAll() instead of signal(). Only the single run() thread ever waits on this condition variable, so signalAll() is unnecessary overhead.
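
The wake-up cycle in item 1 can be reproduced in isolation. The following is a minimal sketch, not the AutoMQ implementation: WakeupLoopSketch, countWastedWakeups, and awaitConsumerParked are hypothetical names, and the token bucket is assumed to stay empty for the whole run. It counts how often a single waiting thread is woken only to find that it cannot consume.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the limiter's signaling logic; not the AutoMQ class.
final class WakeupLoopSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition condition = lock.newCondition();
    private final Queue<Integer> queue = new ArrayDeque<>();
    private long availableTokens = 0;  // assumption: bucket stays exhausted
    private int wastedWakeups = 0;     // guarded by lock
    private boolean stopped = false;   // guarded by lock

    private boolean ableToConsume() {
        return !queue.isEmpty() && availableTokens > 0;
    }

    // Consumer loop: block, then count every wake-up that finds nothing to do.
    private void run() {
        lock.lock();
        try {
            while (!stopped) {
                condition.awaitUninterruptibly();
                if (!stopped && !ableToConsume()) {
                    wastedWakeups++;
                }
            }
        } finally {
            lock.unlock();
        }
    }

    // Spin until the consumer thread is parked on the condition.
    private void awaitConsumerParked() {
        while (true) {
            lock.lock();
            try {
                if (lock.hasWaiters(condition)) {
                    return;
                }
            } finally {
                lock.unlock();
            }
            Thread.onSpinWait();
        }
    }

    // Enqueue `enqueues` items with an empty bucket, optionally signaling each time.
    static int countWastedWakeups(boolean signalOnEnqueue, int enqueues) {
        WakeupLoopSketch s = new WakeupLoopSketch();
        Thread consumer = new Thread(s::run, "run-sketch");
        consumer.start();
        s.awaitConsumerParked();
        for (int i = 0; i < enqueues; i++) {
            s.lock.lock();
            try {
                s.queue.add(i);
                if (signalOnEnqueue) {
                    s.condition.signalAll(); // the pre-change behavior
                }
            } finally {
                s.lock.unlock();
            }
            s.awaitConsumerParked();
        }
        s.lock.lock();
        try {
            s.stopped = true;
            s.condition.signalAll(); // final wake-up so the thread can exit
        } finally {
            s.lock.unlock();
        }
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return s.wastedWakeups;
    }

    public static void main(String[] args) {
        System.out.println("signal on enqueue: " + countWastedWakeups(true, 50));
        System.out.println("no signal on enqueue: " + countWastedWakeups(false, 50));
    }
}
```

With signaling on enqueue, each request costs the waiting thread one wake-up that accomplishes nothing; without it, the thread stays parked until something can actually change the outcome.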

Changes

AsyncNetworkBandwidthLimiter.java (1 file changed, 1 insertion, 2 deletions):

  • consume(): Removed condition.signalAll() after enqueueing a BucketItem. The run() thread cannot drain the queue without available tokens, so waking it serves no purpose. When tokens are available and the queue is empty, consume() takes the fast-path and completes the future inline without queuing, so run() never needs a signal in that path either.

  • refillToken(): Replaced signalAll() with signal(). Only one thread (run()) waits on this condition, making signalAll() unnecessary.
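
The resulting signaling scheme can be sketched as follows. This is an illustration under stated assumptions rather than the code in this PR: SignalOnRefillSketch, Request, stop(), and demo() are hypothetical, refillToken() takes an explicit amount here, and details of the real class such as priorities and token caps are omitted.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified limiter showing only the post-change signaling.
final class SignalOnRefillSketch {
    private static final class Request {
        final long size;
        final CompletableFuture<Void> future = new CompletableFuture<>();
        Request(long size) { this.size = size; }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition condition = lock.newCondition();
    private final Queue<Request> queue = new ArrayDeque<>();
    private long availableTokens = 0;
    private boolean stopped = false;

    CompletableFuture<Void> consume(long size) {
        lock.lock();
        try {
            if (availableTokens > 0 && queue.isEmpty()) {
                availableTokens -= size; // fast path: complete inline
                return CompletableFuture.completedFuture(null);
            }
            Request request = new Request(size);
            queue.add(request);          // no signal: run() cannot progress without tokens
            return request.future;
        } finally {
            lock.unlock();
        }
    }

    void refillToken(long tokens) {
        lock.lock();
        try {
            availableTokens += tokens;
            condition.signal();          // a single waiter, so signal() is enough
        } finally {
            lock.unlock();
        }
    }

    void stop() {
        lock.lock();
        try {
            stopped = true;
            condition.signal();
        } finally {
            lock.unlock();
        }
    }

    private boolean ableToConsume() {
        return !queue.isEmpty() && availableTokens > 0;
    }

    void run() {
        lock.lock();
        try {
            while (!stopped) {
                while (!stopped && !ableToConsume()) {
                    condition.awaitUninterruptibly();
                }
                if (stopped) {
                    break;
                }
                Request request = queue.poll();
                availableTokens -= request.size;
                request.future.complete(null);
            }
        } finally {
            lock.unlock();
        }
    }

    // Returns true if a queued request stays pending until a refill and the
    // fast path then completes inline.
    static boolean demo() {
        SignalOnRefillSketch limiter = new SignalOnRefillSketch();
        Thread runner = new Thread(limiter::run, "run-sketch");
        runner.setDaemon(true);
        runner.start();
        try {
            CompletableFuture<Void> queued = limiter.consume(10);
            boolean pendingBeforeRefill = !queued.isDone();
            limiter.refillToken(15);
            queued.get(5, TimeUnit.SECONDS);
            boolean fastPath = limiter.consume(5).isDone();
            return pendingBeforeRefill && fastPath;
        } catch (Exception e) {
            return false;
        } finally {
            limiter.stop();
        }
    }
}
```

In this sketch no wake-up is lost: an item is enqueued only when tokens are exhausted (the next refillToken() signals) or when the queue is already non-empty with tokens available, in which case run() is not waiting, because it checks ableToConsume() under the same lock before blocking.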

Verification

All 10 existing tests in AsyncNetworkBandwidthLimiterTest pass:

  • testByPassConsume / testByPassConsume2
  • testThrottleConsume / testThrottleConsume2 / testThrottleConsume3 / testThrottleConsume4 / testThrottleConsume5
  • testThrottleConsumeWithPriority / testThrottleConsumeWithPriority1
  • testThrottleConsumeWithLargeChunk

…en traffic hits limit (AutoMQ#2052)

Remove unnecessary condition.signalAll() from consume() and replace
signalAll() with signal() in refillToken() to eliminate wasteful
wake-ups of the run() thread under sustained load.

When network traffic hits the configured bandwidth limit, the run()
thread was being woken on every incoming consume() call via
condition.signalAll(), even though availableTokens <= 0 meant it
could not make progress. This caused a hot cycle of: wake -> acquire
lock -> check ableToConsume() (false) -> block -> wake again, burning
CPU on lock contention and context switches.

Changes:
- consume(): Remove condition.signalAll() after enqueueing. The run()
  thread cannot drain the queue without tokens, so waking it is
  wasteful. When tokens are available and the queue is empty, consume()
  takes the fast-path and completes inline without queuing, so run()
  never needs a signal in that case either.
- refillToken(): Replace signalAll() with signal(). Only the single
  run() thread waits on this condition, making signalAll() unnecessary.
Copilot AI review requested due to automatic review settings May 2, 2026 10:47
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Copilot AI left a comment

Pull request overview

This PR addresses excessive CPU usage in AsyncNetworkBandwidthLimiter when bandwidth is saturated by reducing unnecessary wake-ups of the consumer (run()) thread.

Changes:

  • Replace condition.signalAll() with condition.signal() in refillToken() since only the run() thread waits on the condition.
  • Remove condition.signalAll() from consume() after enqueueing, avoiding a hot wake/sleep loop when availableTokens <= 0.

Development

Successfully merging this pull request may close these issues.

Excessive CPU Overhead in AsyncNetworkBandwidthLimiter When Network Traffic Hits the Limit

3 participants