
7773: io: always cleanup AsyncFd registration list on deregister#33

Open
martin-augment wants to merge 8 commits into master from pr-7773-2025-12-15-09-47-21

Conversation

@martin-augment
Owner

7773: To review by AI

@coderabbitai

coderabbitai bot commented Dec 15, 2025

Walkthrough

This change introduces test-only accessors for IO registration counts across the runtime, exposed through tokio::runtime::Handle and internal driver APIs when the "full" feature is enabled. The accessors allow test code to track pending and total IO registrations. Additionally, error handling in the deregistration path is improved to ensure cleanup steps execute before error propagation. A new regression test replaces an existing test, using registration count tracking to detect potential memory leaks during AsyncFd lifecycle operations with repeated open/close cycles.


@claude

claude bot commented Dec 15, 2025

Code Review for PR #33: io: always cleanup AsyncFd registration list on deregister

Summary

This PR addresses issue tokio-rs#7563, fixing a memory leak in AsyncFd when file descriptors are closed before the AsyncFd is dropped. The core fix ensures that cleanup of internal registration structures happens even when OS-level deregistration fails.


✅ Positive Aspects

  1. Critical Bug Fix: The change correctly addresses a real memory leak scenario where OS deregistration can fail (e.g., when fd is already closed), but internal cleanup must still occur.

  2. Correct Error Handling Order: The logic is sound:

    • Attempt OS deregistration (store result)
    • Always perform internal cleanup
    • Return OS error after cleanup

    This ensures resources are freed even on failure.

  3. Comprehensive Test Coverage: The new test memory_leak_when_fd_closed_before_drop is excellent:

    • Reproduces the exact failure scenario
    • Uses realistic socket pairs
    • Has clear assertions with helpful error messages
    • Tests over multiple iterations to detect leaks
  4. Test Infrastructure: The addition of test-only introspection methods (io_pending_registration_count, io_total_registration_count) is appropriate and properly gated with #[cfg(feature = "full")], #[doc(hidden)], and #[allow(unreachable_pub)].


🔍 Issues & Concerns

HIGH PRIORITY - Inefficient Implementation

Location: tokio/src/runtime/io/registration_set.rs:63-80

The total_registration_count method has a serious performance issue:

pub(super) fn total_registration_count(&self, synced: &mut Synced) -> usize {
    let mut items = Vec::new();
    while let Some(item) = synced.registrations.pop_back() {
        items.push(item);
    }
    let count = items.len();
    for item in items.into_iter().rev() {
        synced.registrations.push_front(item);
    }
    count
}

Problems:

  1. O(n) time complexity: Drains and rebuilds entire list just to count
  2. Memory allocation: Creates temporary Vec
  3. Lock contention: Holds mutex while doing expensive operations
  4. Not actually safe: Despite the comment, this modifies the list structure temporarily

Better approach: Use synced.registrations.len() if LinkedList supports it, or iterate without removal:

pub(super) fn total_registration_count(&self, synced: &Synced) -> usize {
    synced.registrations.iter().count()
}

If the LinkedList type doesn't provide these, consider tracking count separately with an atomic counter.

MEDIUM PRIORITY - Code Quality

  1. Comment style inconsistency (tokio/src/runtime/io/driver.rs:299-300):

    // Deregister the source with the OS poller **first**
    // Cleanup ALWAYS happens

    The comment says "first" but that's now misleading - deregistration is attempted first, but cleanup happens regardless. Consider:

    // Attempt OS deregistration first, but cleanup always happens regardless of result
  2. Inconsistent formatting (tokio/src/runtime/io/driver.rs:332):
    Missing space after comment marker:

    /// Returns the total number of registrations in the main list (test-only)
    

    Should match line 327's style with "test-only, not part of public API".

  3. Comment style - Multiple locations have "TEST PURPOSE RELATED TO PR io: always cleanup AsyncFd registration list on deregister tokio-rs/tokio#7773", which should reference the underlying issue (tokio-rs#7563) rather than the PR itself.

LOW PRIORITY - Minor Improvements

  1. Test assertion message (tokio/tests/io_async_fd.rs:1013):
    The assertion message is excellent but could clarify the tolerance:

    assert!(
        final_count <= initial_count + 2 && max_count_seen <= initial_count + 2,
        "Memory leak detected: final count {} (initial: {}), max seen: {}. \
         Expected at most {} (initial + 2 for tolerance), but {} objects leaked.",
        final_count, initial_count, max_count_seen, 
        initial_count + 2, max_count_seen.saturating_sub(initial_count)
    );
  2. Magic number: The +2 tolerance in the test should be documented - why is 2 acceptable? Is this for concurrent background tasks?


🔒 Security Analysis

No security concerns identified. The change:

  • Reduces resource leaks (improves DoS resistance)
  • Maintains proper error propagation
  • Uses unsafe code appropriately (closing fd via libc in test)
  • Test code properly uses std::mem::forget to leak fd ownership intentionally

⚡ Performance Considerations

  1. Production code: The fix has zero performance overhead - just reordered existing operations
  2. Test code: The total_registration_count method is inefficient (see HIGH PRIORITY above) but only used in tests
  3. Test runtime: The test does 30 iterations with yields/sleeps - acceptable for regression test

📋 Recommendations

Must fix before merge:

  1. Implement total_registration_count efficiently without draining/rebuilding the list

Should fix before merge:

  1. Update comments to reference issue tokio-rs#7563 ("The AsyncFd in Tokio has a serious memory leak issue") rather than PR tokio-rs#7773 ("io: always cleanup AsyncFd registration list on deregister")
  2. Clarify the "first" comment about deregistration order
  3. Make comment style consistent

Nice to have:

  1. Document why +2 tolerance is acceptable in the test
  2. Improve test assertion message clarity

🎯 Verdict

The core fix is correct and solves a real problem, but the test infrastructure implementation needs improvement before merge. The inefficient total_registration_count method should be fixed - while it's test-only code, it sets a poor precedent and could cause issues if the registration list grows large in future test scenarios.

The bug fix logic itself (storing error, cleaning up, then returning error) is exactly right and should be merged once the test infrastructure is optimized.

initial_count + ITERATIONS,
max_count_seen.saturating_sub(initial_count)
);
}

Bug: Test for try_with_interest accidentally deleted

The try_with_interest test was completely replaced by the new memory_leak_when_fd_closed_before_drop test rather than adding the new test alongside it. The deleted test verified important API behavior - that when try_with_interest fails, the error correctly returns the original Arc source via into_parts(), allowing callers to recover their resource. This appears to be accidental since the new test serves a completely different purpose (testing memory leak fixes in tokio-rs#7563).


Owner Author


value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The old test's purpose was unrelated to the functionality covered by the new test, so it should be restored. That prevents introducing regressions related to AsyncFd::try_with_interest().

    // Restore items in reverse order (since we popped from back)
    for item in items.into_iter().rev() {
        synced.registrations.push_front(item);
    }

Bug: List restoration reverses registration order

The total_registration_count function corrupts the registrations list by reversing its order. Items are popped from the back (yielding [C, B, A]), then the iterator is reversed (giving [A, B, C]), and each is pushed to the front — resulting in [C, B, A] instead of the original [A, B, C]. Either push_front without .rev() or push_back with .rev() would preserve the correct order. Though marked test-only, this still corrupts runtime state during tests.


Owner Author


value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! Popping items from the back and then pushing the reversed sequence to the front leaves the list in the reverse of the order it had before counting.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tokio/tests/io_async_fd.rs (1)

956-959: Consider removing redundant imports.

AsRawFd, RawFd, Arc, and AsyncFd are already imported at the file level (lines 4, 7, 20). Only Handle is a new import needed here.

-    use std::os::unix::io::{AsRawFd, RawFd};
-    use std::sync::Arc;
-    use tokio::io::unix::AsyncFd;
     use tokio::runtime::Handle;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 897dc05 and 0a0f94e.

📒 Files selected for processing (4)
  • tokio/src/runtime/handle.rs (1 hunks)
  • tokio/src/runtime/io/driver.rs (3 hunks)
  • tokio/src/runtime/io/registration_set.rs (1 hunks)
  • tokio/tests/io_async_fd.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tokio/src/runtime/io/registration_set.rs (1)
tokio/src/runtime/io/driver.rs (1)
  • total_registration_count (335-338)
tokio/tests/io_async_fd.rs (3)
tokio/src/io/async_fd.rs (3)
  • as_raw_fd (928-930)
  • new (226-231)
  • drop (948-950)
tokio/src/io/poll_evented.rs (2)
  • new (89-91)
  • drop (291-296)
tokio/src/runtime/io/scheduled_io.rs (2)
  • drop (374-376)
  • drop (569-578)
tokio/src/runtime/io/driver.rs (1)
tokio/src/runtime/io/registration_set.rs (1)
  • total_registration_count (63-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
🔇 Additional comments (5)
tokio/src/runtime/handle.rs (1)

491-507: LGTM!

The test-only accessors are appropriately gated with #[cfg(feature = "full")], hidden from docs, and cleanly delegate to the underlying IO driver. This provides the necessary observability for the regression test without polluting the public API.

tokio/src/runtime/io/driver.rs (2)

298-313: LGTM! This is the core fix for the memory leak.

The change correctly ensures cleanup always executes before error propagation. Previously, if registry.deregister(source) failed (e.g., when the fd was already closed), the early ? return would skip the internal cleanup, leaking ScheduledIo objects and leaving metrics inconsistent.


323-339: LGTM!

Test-only accessors are properly gated and follow the established pattern. The lock acquisition for total_registration_count is necessary given the implementation in registration_set.rs.

tokio/src/runtime/io/registration_set.rs (1)

56-60: LGTM!

Clean implementation that mirrors the existing needs_release() pattern.

tokio/tests/io_async_fd.rs (1)

950-1022: LGTM! Well-designed regression test.

The test correctly reproduces the memory leak scenario by closing the fd before dropping AsyncFd, then verifying that registration counts remain bounded. The assertion message provides good debugging context.

Comment on lines +61 to +81
/// TEST PURPOSE RELATED TO PR #7773
#[cfg(feature = "full")]
pub(super) fn total_registration_count(&self, synced: &mut Synced) -> usize {
    // Count by temporarily draining the list, then restoring it
    // This is safe for test purposes
    let mut items = Vec::new();

    // Drain all items
    while let Some(item) = synced.registrations.pop_back() {
        items.push(item);
    }

    let count = items.len();

    // Restore items in reverse order (since we popped from back)
    for item in items.into_iter().rev() {
        synced.registrations.push_front(item);
    }

    count
}

⚠️ Potential issue | 🟡 Minor

List order is reversed after counting.

The drain-and-restore logic reverses the list order. After popping from back into a Vec and then iterating in reverse while pushing to front, the final order is the reverse of the original.

While this may not affect test correctness (since you're only counting), the comment "Restore items in reverse order" is misleading—it describes the iteration order, not the restoration goal.

Consider using a simple traversal if the LinkedList supports iteration, or fix the restoration logic:

-        // Restore items in reverse order (since we popped from back)
-        for item in items.into_iter().rev() {
-            synced.registrations.push_front(item);
+        // Restore items in original order
+        for item in items.into_iter() {
+            synced.registrations.push_back(item);
         }

Alternatively, if order truly doesn't matter for tests, update the comment to clarify that order is not preserved.



Owner Author


value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! Popping items from the back and then pushing the reversed sequence to the front leaves the list in the reverse of the order it had before counting.


@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted. Overall the cleanup-on-deregister change makes sense; I left a couple notes to keep the test instrumentation from perturbing runtime state and to ensure the regression test reliably exercises the failing path.


    let count = items.len();

    // Restore items in reverse order (since we popped from back)
    for item in items.into_iter().rev() {

This drain/restore logic appears to reverse synced.registrations (collecting via pop_back, then iterating rev() and push_front). Since this mutates driver state even in a test-only helper, it could subtly change behavior after calling the counter.


Owner Author


value:useful; category:bug; feedback:The Augment AI reviewer is correct! Popping items from the back and then pushing the reversed sequence to the front leaves the list in the reverse of the order it had before counting.

let async_fd = AsyncFd::new(ArcFd(afd.clone())).unwrap();

unsafe {
    libc::close(raw_fd);

The return value of libc::close is ignored here; if close fails (or is interrupted), the fd may remain open and the test may not reliably exercise the intended deregister-failure path.


Owner Author


value:good-to-have; category:bug; feedback:The Augment AI reviewer is correct! The system call's result should be checked before continuing with the rest of the test, which currently assumes the close always succeeds.

@martin-augment
Owner Author

HIGH PRIORITY - Inefficient Implementation

Location: tokio/src/runtime/io/registration_set.rs:63-80

The total_registration_count method has a serious performance issue:

pub(super) fn total_registration_count(&self, synced: &mut Synced) -> usize {
    let mut items = Vec::new();
    while let Some(item) = synced.registrations.pop_back() {
        items.push(item);
    }
    let count = items.len();
    for item in items.into_iter().rev() {
        synced.registrations.push_front(item);
    }
    count
}

Problems:

  1. O(n) time complexity: Drains and rebuilds entire list just to count
  2. Memory allocation: Creates temporary Vec
  3. Lock contention: Holds mutex while doing expensive operations
  4. Not actually safe: Despite the comment, this modifies the list structure temporarily

Better approach: Use synced.registrations.len() if LinkedList supports it, or iterate without removal:

pub(super) fn total_registration_count(&self, synced: &Synced) -> usize {
    synced.registrations.iter().count()
}

If the LinkedList type doesn't provide these, consider tracking count separately with an atomic counter.

value:good-but-wont-fix; category:bug; feedback:The Claude AI reviewer is correct that the implementation is not optimal! But since it is used only for the integration tests it is OK-ish. It could be improved later if this method is needed for the non-test part of the codebase.

@martin-augment
Owner Author

  1. Comment style inconsistency (tokio/src/runtime/io/driver.rs:299-300):
    // Deregister the source with the OS poller **first**
    // Cleanup ALWAYS happens
    The comment says "first" but that's now misleading - deregistration is attempted first, but cleanup happens regardless. Consider:
    // Attempt OS deregistration first, but cleanup always happens regardless of result

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct that the comments could be made clearer by explaining that the deregistration result is ignored until the cleanup is finished.

@martin-augment
Owner Author

2. Inconsistent formatting (tokio/src/runtime/io/driver.rs:332):
Missing space after comment marker:
/// Returns the total number of registrations in the main list (test-only)
Should match line 327's style with "test-only, not part of public API".

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct: the docstring of this method is not consistent with the other newly introduced methods that are visible only to the integration tests.

@martin-augment
Owner Author

martin-augment commented Dec 15, 2025

956-959: Consider removing redundant imports.

AsRawFd, RawFd, Arc, and AsyncFd are already imported at the file level (lines 4, 7, 20). Only Handle is a new import needed here.

-    use std::os::unix::io::{AsRawFd, RawFd};
-    use std::sync::Arc;
-    use tokio::io::unix::AsyncFd;
     use tokio::runtime::Handle;

value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! These dependencies are already imported globally for the whole IT test and there is no need to re-import them just for this particular test case.
