Fix IteratorThreadPoolManager thread leak by tracking and shutting down scheduled executors#3626
Open
t-h-i-n-k-er wants to merge 4 commits into
Conversation
IteratorThreadPoolManager's constructor creates three
ScheduledExecutorService instances (two via
Executors.newSingleThreadScheduledExecutor() and one via
Executors.newScheduledThreadPool(int) inside createExecutorService,
called twice) to run periodic monitoring tasks. The references to
these scheduled executors are discarded immediately: each is created
inline as the receiver of scheduleWithFixedDelay(...) and never stored
in a field. The non-daemon monitoring threads they spawn therefore
keep running indefinitely and there is no way to shut them down; they
leak across query lifecycle cycles and across test harness runs that
bring up and tear down a fresh tablet-server in-process.
This commit adds IteratorThreadPoolManagerThreadLeakTest which fails
on the current code:
- scheduledExecutorsFieldExists: asserts the class declares a
scheduledExecutors field of type List so the manager retains
references to the ScheduledExecutorService instances it creates.
Fails because no such field exists.
- shutdownMethodExists: asserts the class declares a shutdown method
so callers can release the thread pools and scheduled executors.
Fails because no such method exists.
- scheduledExecutorsAreRetainedAndShutDown: constructs a manager,
asserts the scheduledExecutors list is populated, calls shutdown,
and asserts every retained scheduled executor and every tracked
ThreadPoolExecutor is shut down. Fails because the
scheduledExecutors field and shutdown method do not exist.
Refs NationalSecurityAgency#3601
…wn scheduled executors
Track every ScheduledExecutorService created by the manager in a new
scheduledExecutors list instead of discarding their references. Add a
shutdown(IteratorEnvironment) method that shuts down every
ThreadPoolExecutor in threadPools, every ScheduledExecutorService in
scheduledExecutors, and invalidates the ivaratorFutures cache. Callers
that manage the lifecycle of the tablet-server side query machinery
(for example test harnesses that bring up and tear down a fresh
tablet-server in-process) can now invoke shutdown to release the
periodic monitoring threads spawned in the constructor instead of
leaking them across lifecycle cycles.
The test added in the previous commit now passes:
- scheduledExecutorsFieldExists: the scheduledExecutors field is now
declared as a List<ScheduledExecutorService>.
- shutdownMethodExists: the shutdown(IteratorEnvironment) method is
now declared.
- scheduledExecutorsAreRetainedAndShutDown: after construction the
scheduledExecutors list contains every scheduled executor (>= 3
entries: two single-thread monitors and one thread-pool monitor
per createExecutorService call), and after shutdown every retained
scheduled executor and every tracked ThreadPoolExecutor is shut
down and terminates within 5 seconds.
Refs NationalSecurityAgency#3601
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
IteratorThreadPoolManager's constructor creates three ScheduledExecutorService instances (two via Executors.newSingleThreadScheduledExecutor() and one via Executors.newScheduledThreadPool(int) inside createExecutorService, called twice) to run periodic monitoring tasks. The references to these scheduled executors are discarded immediately — each is created inline as the receiver of scheduleWithFixedDelay(...) and never stored in a field. The non-daemon monitoring threads they spawn therefore keep running indefinitely and there is no way to shut them down; they leak across query lifecycle cycles and across test-harness runs that bring up and tear down a fresh tablet-server in-process.
What this PR does:
Two commits, in the order requested on #3602 — a test that demonstrates the failure condition, and then the commit to fix it:
1: Add failing test demonstrating IteratorThreadPoolManager thread leak — adds IteratorThreadPoolManagerThreadLeakTest which fails on the current code:
scheduledExecutorsFieldExists | asserts the class declares a scheduledExecutors field of type List so the manager retains references to the ScheduledExecutorService instances it creates. Fails because no such field exists.
shutdownMethodExists | asserts the class declares a shutdown method so callers can release the thread pools and scheduled executors. Fails because no such method exists.
scheduledExecutorsAreRetainedAndShutDown | constructs a manager via reflection, asserts the scheduledExecutors list is populated (≥ 3 entries), calls shutdown, and asserts every retained scheduled executor and every tracked ThreadPoolExecutor is shut down and terminates within 5 seconds. Fails because the scheduledExecutors field and shutdown method do not exist.
Verified locally: 3 tests fail on unfixed code, 3 tests pass on fixed code.
2: Fix IteratorThreadPoolManager thread leak by tracking and shutting down scheduled executors — tracks every ScheduledExecutorService created by the manager in a new scheduledExecutors list instead of discarding their references, and adds a shutdown(IteratorEnvironment) method that shuts down every ThreadPoolExecutor in threadPools, every ScheduledExecutorService in scheduledExecutors, and invalidates the ivaratorFutures cache. All tests now pass.
Why shutdownNow() rather than shutdown()?
The monitoring tasks scheduled on these executors run indefinitely (1 s / 10 s / 60 s fixed-delay loops). shutdown() would wait for the current iteration to finish, which is fine for the periodic monitors but provides no bounded guarantee; shutdownNow() cancels the periodic task immediately and returns the list of never-started tasks (which we discard, since each task is a stateless lambda). This matches the existing pattern used for the ThreadPoolExecutor instances in threadPools (which also use shutdownNow() semantics implicitly via core-thread-timeout + abandonment).