fix(s3stream): reduce CPU overhead in AsyncNetworkBandwidthLimiter when traffic hits limit#3343
Open
JUSTMEETPATEL wants to merge 1 commit intoAutoMQ:mainfrom
Conversation
…en traffic hits limit (AutoMQ#2052) Remove unnecessary condition.signalAll() from consume() and replace signalAll() with signal() in refillToken() to eliminate wasteful wake-ups of the run() thread under sustained load. When network traffic hits the configured bandwidth limit, the run() thread was being woken on every incoming consume() call via condition.signalAll(), even though availableTokens <= 0 meant it could not make progress. This caused a hot cycle of: wake -> acquire lock -> check ableToConsume() (false) -> block -> wake again, burning CPU on lock contention and context switches. Changes: - consume(): Remove condition.signalAll() after enqueueing. The run() thread cannot drain the queue without tokens, so waking it is wasteful. When tokens are available and the queue is empty, consume() takes the fast-path and completes inline without queuing, so run() never needs a signal in that case either. - refillToken(): Replace signalAll() with signal(). Only the single run() thread waits on this condition, making signalAll() unnecessary.
|
|
Contributor
There was a problem hiding this comment.
Pull request overview
This PR addresses excessive CPU usage in AsyncNetworkBandwidthLimiter when bandwidth is saturated by reducing unnecessary wake-ups of the consumer (run()) thread.
Changes:
- Replace
condition.signalAll()withcondition.signal()inrefillToken()since only therun()thread waits on the condition. - Remove
condition.signalAll()fromconsume()after enqueueing, avoiding a hot wake/sleep loop whenavailableTokens <= 0.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Resolves #2052
When network traffic reaches the configured bandwidth limit,
AsyncNetworkBandwidthLimiterconsumes excessive CPU due to wasteful wake-up cycles in its consumer thread. The flame graphs in the issue show this overhead clearly.Root Cause
Two problems in the signaling logic create a hot wake-up loop under sustained load:
consume()callscondition.signalAll()on every enqueue (even whenavailableTokens <= 0). This wakes therun()thread, which acquires the lock, checksableToConsume()(returnsfalsesince tokens are exhausted), and immediately blocks again. Under sustained traffic at the limit, every incoming request triggers this wasteful cycle: wake -> lock -> check -> sleep -> wake -> lock -> check -> sleep.refillToken()usessignalAll()instead ofsignal(). Only the singlerun()thread ever waits on this condition variable, sosignalAll()is unnecessary overhead.Changes
AsyncNetworkBandwidthLimiter.java(1 file changed, 1 insertion, 2 deletions):consume(): Removedcondition.signalAll()after enqueueing aBucketItem. Therun()thread cannot drain the queue without available tokens, so waking it serves no purpose. When tokens are available and the queue is empty,consume()takes the fast-path and completes the future inline without queuing, sorun()never needs a signal in that path either.refillToken(): ReplacedsignalAll()withsignal(). Only one thread (run()) waits on this condition, makingsignalAll()unnecessary.Verification
All 10 existing tests in
AsyncNetworkBandwidthLimiterTestpass:testByPassConsume/testByPassConsume2testThrottleConsume/testThrottleConsume2/testThrottleConsume3/testThrottleConsume4/testThrottleConsume5testThrottleConsumeWithPriority/testThrottleConsumeWithPriority1testThrottleConsumeWithLargeChunk