Skip to content

Fix IteratorThreadPoolManager thread leak by tracking and shutting down scheduled executors#3626

Open
t-h-i-n-k-er wants to merge 4 commits into
NationalSecurityAgency:integrationfrom
t-h-i-n-k-er:pr2-iterator-thread-pool-manager-fix
Open

Fix IteratorThreadPoolManager thread leak by tracking and shutting down scheduled executors#3626
t-h-i-n-k-er wants to merge 4 commits into
NationalSecurityAgency:integrationfrom
t-h-i-n-k-er:pr2-iterator-thread-pool-manager-fix

Conversation

@t-h-i-n-k-er

Copy link
Copy Markdown

Summary:

IteratorThreadPoolManager's constructor creates three ScheduledExecutorService instances (two via Executors.newSingleThreadScheduledExecutor() and one via Executors.newScheduledThreadPool(int) inside createExecutorService, called twice) to run periodic monitoring tasks. The references to these scheduled executors are discarded immediately — each is created inline as the receiver of scheduleWithFixedDelay(...) and never stored in a field. The non-daemon monitoring threads they spawn therefore keep running indefinitely and there is no way to shut them down; they leak across query lifecycle cycles and across test-harness runs that bring up and tear down a fresh tablet-server in-process.

What this PR does:

Two commits, in the order requested on #3602 — a test that demonstrates the failure condition, and then the commit to fix it:

1: Add failing test demonstrating IteratorThreadPoolManager thread leak — adds IteratorThreadPoolManagerThreadLeakTest which fails on the current code:

  • scheduledExecutorsFieldExists | asserts the class declares a scheduledExecutors field of type List so the manager retains references to the ScheduledExecutorService instances it creates. Fails because no such field exists.

  • shutdownMethodExists | asserts the class declares a shutdown method so callers can release the thread pools and scheduled executors. Fails because no such method exists.

  • scheduledExecutorsAreRetainedAndShutDown | constructs a manager via reflection, asserts the scheduledExecutors list is populated (≥ 3 entries), calls shutdown, and asserts every retained scheduled executor and every tracked ThreadPoolExecutor is shut down and terminates within 5 seconds. Fails because the scheduledExecutors field and shutdown method do not exist.

Verified locally: 3 tests fail on unfixed code, 3 tests pass on fixed code.

2: Fix IteratorThreadPoolManager thread leak by tracking and shutting down scheduled executors — tracks every ScheduledExecutorService created by the manager in a new scheduledExecutors list instead of discarding their references, and adds a shutdown(IteratorEnvironment) method that shuts down every ThreadPoolExecutor in threadPools, every ScheduledExecutorService in scheduledExecutors, and invalidates the ivaratorFutures cache. All tests now pass.

Why shutdownNow() rather than shutdown()?

The monitoring tasks scheduled on these executors run indefinitely (1 s / 10 s / 60 s fixed-delay loops). shutdown() would wait for the current iteration to finish, which is fine for the periodic monitors but provides no bounded guarantee; shutdownNow() cancels the periodic task immediately and returns the list of never-started tasks (which we discard, since each task is a stateless lambda). This matches the existing pattern used for the ThreadPoolExecutor instances in threadPools (which also use shutdownNow() semantics implicitly via core-thread-timeout + abandonment).

t-h-i-n-k-er and others added 4 commits June 21, 2026 23:11
IteratorThreadPoolManager's constructor creates three
ScheduledExecutorService instances (two via
Executors.newSingleThreadScheduledExecutor() and one via
Executors.newScheduledThreadPool(int) inside createExecutorService,
called twice) to run periodic monitoring tasks. The references to
these scheduled executors are discarded immediately: each is created
inline as the receiver of scheduleWithFixedDelay(...) and never stored
in a field. The non-daemon monitoring threads they spawn therefore
keep running indefinitely and there is no way to shut them down; they
leak across query lifecycle cycles and across test harness runs that
bring up and tear down a fresh tablet-server in-process.

This commit adds IteratorThreadPoolManagerThreadLeakTest which fails
on the current code:

  - scheduledExecutorsFieldExists: asserts the class declares a
    scheduledExecutors field of type List so the manager retains
    references to the ScheduledExecutorService instances it creates.
    Fails because no such field exists.

  - shutdownMethodExists: asserts the class declares a shutdown method
    so callers can release the thread pools and scheduled executors.
    Fails because no such method exists.

  - scheduledExecutorsAreRetainedAndShutDown: constructs a manager,
    asserts the scheduledExecutors list is populated, calls shutdown,
    and asserts every retained scheduled executor and every tracked
    ThreadPoolExecutor is shut down. Fails because the
    scheduledExecutors field and shutdown method do not exist.

Refs NationalSecurityAgency#3601
…wn scheduled executors

Track every ScheduledExecutorService created by the manager in a new
scheduledExecutors list instead of discarding their references. Add a
shutdown(IteratorEnvironment) method that shuts down every
ThreadPoolExecutor in threadPools, every ScheduledExecutorService in
scheduledExecutors, and invalidates the ivaratorFutures cache. Callers
that manage the lifecycle of the tablet-server side query machinery
(for example test harnesses that bring up and tear down a fresh
tablet-server in-process) can now invoke shutdown to release the
periodic monitoring threads spawned in the constructor instead of
leaking them across lifecycle cycles.

The test added in the previous commit now passes:

  - scheduledExecutorsFieldExists: the scheduledExecutors field is now
    declared as a List<ScheduledExecutorService>.
  - shutdownMethodExists: the shutdown(IteratorEnvironment) method is
    now declared.
  - scheduledExecutorsAreRetainedAndShutDown: after construction the
    scheduledExecutors list contains every scheduled executor (>= 3
    entries: two single-thread monitors and one thread-pool monitor
    per createExecutorService call), and after shutdown every retained
    scheduled executor and every tracked ThreadPoolExecutor is shut
    down and terminates within 5 seconds.

Refs NationalSecurityAgency#3601
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant