
7773: io: always cleanup AsyncFd registration list on deregister#33

Open
martin-augment wants to merge 8 commits into master from pr-7773-2025-12-15-09-47-21

Conversation

@martin-augment
Owner

7773: To review by AI

@coderabbitai

coderabbitai bot commented Dec 15, 2025

Walkthrough

This change introduces test-only accessors for IO registration counts across the runtime, exposed through tokio::runtime::Handle and internal driver APIs when the "full" feature is enabled. The accessors allow test code to track pending and total IO registrations. Additionally, error handling in the deregistration path is improved to ensure cleanup steps execute before error propagation. A new regression test replaces an existing test, using registration count tracking to detect potential memory leaks during AsyncFd lifecycle operations with repeated open/close cycles.


@claude

claude bot commented Dec 15, 2025

Code Review for PR #33: io: always cleanup AsyncFd registration list on deregister

Summary

This PR addresses issue tokio-rs#7563, fixing a memory leak in AsyncFd when file descriptors are closed before the AsyncFd is dropped. The core fix ensures that cleanup of internal registration structures happens even when OS-level deregistration fails.


✅ Positive Aspects

  1. Critical Bug Fix: The change correctly addresses a real memory leak scenario where OS deregistration can fail (e.g., when fd is already closed), but internal cleanup must still occur.

  2. Correct Error Handling Order: The logic is sound:

    • Attempt OS deregistration (store result)
    • Always perform internal cleanup
    • Return OS error after cleanup

    This ensures resources are freed even on failure.

  3. Comprehensive Test Coverage: The new test memory_leak_when_fd_closed_before_drop is excellent:

    • Reproduces the exact failure scenario
    • Uses realistic socket pairs
    • Has clear assertions with helpful error messages
    • Tests over multiple iterations to detect leaks
  4. Test Infrastructure: The addition of test-only introspection methods (io_pending_registration_count, io_total_registration_count) is appropriate and properly gated with #[cfg(feature = "full")], #[doc(hidden)], and #[allow(unreachable_pub)].


🔍 Issues & Concerns

HIGH PRIORITY - Inefficient Implementation

Location: tokio/src/runtime/io/registration_set.rs:63-80

The total_registration_count method has a serious performance issue:

pub(super) fn total_registration_count(&self, synced: &mut Synced) -> usize {
    let mut items = Vec::new();
    while let Some(item) = synced.registrations.pop_back() {
        items.push(item);
    }
    let count = items.len();
    for item in items.into_iter().rev() {
        synced.registrations.push_front(item);
    }
    count
}

Problems:

  1. O(n) time complexity: Drains and rebuilds entire list just to count
  2. Memory allocation: Creates temporary Vec
  3. Lock contention: Holds mutex while doing expensive operations
  4. Not actually safe: Despite the comment, this modifies the list structure temporarily

Better approach: Use synced.registrations.len() if LinkedList supports it, or iterate without removal:

pub(super) fn total_registration_count(&self, synced: &Synced) -> usize {
    synced.registrations.iter().count()
}

If the LinkedList type doesn't provide these, consider tracking count separately with an atomic counter.

MEDIUM PRIORITY - Code Quality

  1. Comment style inconsistency (tokio/src/runtime/io/driver.rs:299-300):

    // Deregister the source with the OS poller **first**
    // Cleanup ALWAYS happens

    The comment says "first" but that's now misleading - deregistration is attempted first, but cleanup happens regardless. Consider:

    // Attempt OS deregistration first, but cleanup always happens regardless of result
  2. Inconsistent formatting (tokio/src/runtime/io/driver.rs:332):
    Missing space after comment marker:

    /// Returns the total number of registrations in the main list (test-only)
    

    Should match line 327's style with "test-only, not part of public API".

  3. Comment style - Multiple locations have "TEST PURPOSE RELATED TO PR io: always cleanup AsyncFd registration list on deregister tokio-rs/tokio#7773", which should reference the underlying issue (tokio-rs#7563) rather than the PR itself.

LOW PRIORITY - Minor Improvements

  1. Test assertion message (tokio/tests/io_async_fd.rs:1013):
    The assertion message is excellent but could clarify the tolerance:

    assert!(
        final_count <= initial_count + 2 && max_count_seen <= initial_count + 2,
        "Memory leak detected: final count {} (initial: {}), max seen: {}. \
         Expected at most {} (initial + 2 for tolerance), but {} objects leaked.",
        final_count, initial_count, max_count_seen, 
        initial_count + 2, max_count_seen.saturating_sub(initial_count)
    );
  2. Magic number: The +2 tolerance in the test should be documented - why is 2 acceptable? Is this for concurrent background tasks?


🔒 Security Analysis

No security concerns identified. The change:

  • Reduces resource leaks (improves DoS resistance)
  • Maintains proper error propagation
  • Uses unsafe code appropriately (closing fd via libc in test)
  • Test code properly uses std::mem::forget to leak fd ownership intentionally

⚡ Performance Considerations

  1. Production code: The fix has zero performance overhead - just reordered existing operations
  2. Test code: The total_registration_count method is inefficient (see HIGH PRIORITY above) but only used in tests
  3. Test runtime: The test does 30 iterations with yields/sleeps - acceptable for regression test

📋 Recommendations

Must fix before merge:

  1. Implement total_registration_count efficiently without draining/rebuilding the list

Should fix before merge:

  1. Update comments to reference issue tokio-rs#7563 ("The AsyncFd in Tokio has a serious memory leak issue") rather than PR tokio-rs#7773 ("io: always cleanup AsyncFd registration list on deregister")
  2. Clarify the "first" comment about deregistration order
  3. Make comment style consistent

Nice to have:

  1. Document why +2 tolerance is acceptable in the test
  2. Improve test assertion message clarity

🎯 Verdict

The core fix is correct and solves a real problem, but the test infrastructure implementation needs improvement before merge. The inefficient total_registration_count method should be fixed - while it's test-only code, it sets a poor precedent and could cause issues if the registration list grows large in future test scenarios.

The bug fix logic itself (storing error, cleaning up, then returning error) is exactly right and should be merged once the test infrastructure is optimized.

initial_count + ITERATIONS,
max_count_seen.saturating_sub(initial_count)
);
}

Bug: Test for try_with_interest accidentally deleted

The try_with_interest test was completely replaced by the new memory_leak_when_fd_closed_before_drop test rather than adding the new test alongside it. The deleted test verified important API behavior - that when try_with_interest fails, the error correctly returns the original Arc source via into_parts(), allowing callers to recover their resource. This appears to be accidental since the new test serves a completely different purpose (testing memory leak fixes in tokio-rs#7563).


Owner Author


value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The old test's purpose was unrelated to the functionality covered by the new test, so it should be restored. That prevents introducing regressions related to AsyncFd::try_with_interest().

    // Restore items in reverse order (since we popped from back)
    for item in items.into_iter().rev() {
        synced.registrations.push_front(item);
    }

Bug: List restoration reverses registration order

The total_registration_count function corrupts the registrations list by reversing its order. Items are popped from the back (yielding [C, B, A]), then the iterator is reversed (giving [A, B, C]), and each is pushed to the front — resulting in [C, B, A] instead of the original [A, B, C]. Either push_front without .rev() or push_back with .rev() would preserve the correct order. Though marked test-only, this still corrupts runtime state during tests.


Owner Author


value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! Popping items from the back and then pushing the reversed sequence to the front leaves the list in the reverse of the order it had before counting.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tokio/tests/io_async_fd.rs (1)

956-959: Consider removing redundant imports.

AsRawFd, RawFd, Arc, and AsyncFd are already imported at the file level (lines 4, 7, 20). Only Handle is a new import needed here.

-    use std::os::unix::io::{AsRawFd, RawFd};
-    use std::sync::Arc;
-    use tokio::io::unix::AsyncFd;
     use tokio::runtime::Handle;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 897dc05 and 0a0f94e.

📒 Files selected for processing (4)
  • tokio/src/runtime/handle.rs (1 hunks)
  • tokio/src/runtime/io/driver.rs (3 hunks)
  • tokio/src/runtime/io/registration_set.rs (1 hunks)
  • tokio/tests/io_async_fd.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tokio/src/runtime/io/registration_set.rs (1)
tokio/src/runtime/io/driver.rs (1)
  • total_registration_count (335-338)
tokio/tests/io_async_fd.rs (3)
tokio/src/io/async_fd.rs (3)
  • as_raw_fd (928-930)
  • new (226-231)
  • drop (948-950)
tokio/src/io/poll_evented.rs (2)
  • new (89-91)
  • drop (291-296)
tokio/src/runtime/io/scheduled_io.rs (2)
  • drop (374-376)
  • drop (569-578)
tokio/src/runtime/io/driver.rs (1)
tokio/src/runtime/io/registration_set.rs (1)
  • total_registration_count (63-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
🔇 Additional comments (5)
tokio/src/runtime/handle.rs (1)

491-507: LGTM!

The test-only accessors are appropriately gated with #[cfg(feature = "full")], hidden from docs, and cleanly delegate to the underlying IO driver. This provides the necessary observability for the regression test without polluting the public API.

tokio/src/runtime/io/driver.rs (2)

298-313: LGTM! This is the core fix for the memory leak.

The change correctly ensures cleanup always executes before error propagation. Previously, if registry.deregister(source) failed (e.g., when the fd was already closed), the early ? return would skip the internal cleanup, leaking ScheduledIo objects and leaving metrics inconsistent.


323-339: LGTM!

Test-only accessors are properly gated and follow the established pattern. The lock acquisition for total_registration_count is necessary given the implementation in registration_set.rs.

tokio/src/runtime/io/registration_set.rs (1)

56-60: LGTM!

Clean implementation that mirrors the existing needs_release() pattern.

tokio/tests/io_async_fd.rs (1)

950-1022: LGTM! Well-designed regression test.

The test correctly reproduces the memory leak scenario by closing the fd before dropping AsyncFd, then verifying that registration counts remain bounded. The assertion message provides good debugging context.

Comment on lines +61 to +81
/// TEST PURPOSE RELATED TO PR #7773
#[cfg(feature = "full")]
pub(super) fn total_registration_count(&self, synced: &mut Synced) -> usize {
    // Count by temporarily draining the list, then restoring it
    // This is safe for test purposes
    let mut items = Vec::new();

    // Drain all items
    while let Some(item) = synced.registrations.pop_back() {
        items.push(item);
    }

    let count = items.len();

    // Restore items in reverse order (since we popped from back)
    for item in items.into_iter().rev() {
        synced.registrations.push_front(item);
    }

    count
}

⚠️ Potential issue | 🟡 Minor

List order is reversed after counting.

The drain-and-restore logic reverses the list order. After popping from back into a Vec and then iterating in reverse while pushing to front, the final order is the reverse of the original.

While this may not affect test correctness (since you're only counting), the comment "Restore items in reverse order" is misleading—it describes the iteration order, not the restoration goal.

Consider using a simple traversal if the LinkedList supports iteration, or fix the restoration logic:

-        // Restore items in reverse order (since we popped from back)
-        for item in items.into_iter().rev() {
-            synced.registrations.push_front(item);
+        // Restore items in original order
+        for item in items.into_iter() {
+            synced.registrations.push_back(item);
         }

Alternatively, if order truly doesn't matter for tests, update the comment to clarify that order is not preserved.



Owner Author


value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! Popping items from the back and then pushing the reversed sequence to the front leaves the list in the reverse of the order it had before counting.


@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted. Overall the cleanup-on-deregister change makes sense; I left a couple notes to keep the test instrumentation from perturbing runtime state and to ensure the regression test reliably exercises the failing path.


    let count = items.len();

    // Restore items in reverse order (since we popped from back)
    for item in items.into_iter().rev() {

This drain/restore logic appears to reverse synced.registrations (collecting via pop_back, then iterating rev() and push_front). Since this mutates driver state even in a test-only helper, it could subtly change behavior after calling the counter.


Owner Author


value:useful; category:bug; feedback:The Augment AI reviewer is correct! Popping items from the back and then pushing the reversed sequence to the front leaves the list in the reverse of the order it had before counting.

let async_fd = AsyncFd::new(ArcFd(afd.clone())).unwrap();

unsafe {
    libc::close(raw_fd);

The return value of libc::close is ignored here; if close fails (or is interrupted), the fd may remain open and the test may not reliably exercise the intended deregister-failure path.


Owner Author


value:good-to-have; category:bug; feedback:The Augment AI reviewer is correct! The system call's result should be checked before continuing with the rest of the test, which currently assumes the close always succeeds.

@martin-augment
Owner Author

HIGH PRIORITY - Inefficient Implementation

Location: tokio/src/runtime/io/registration_set.rs:63-80

The total_registration_count method has a serious performance issue:

pub(super) fn total_registration_count(&self, synced: &mut Synced) -> usize {
    let mut items = Vec::new();
    while let Some(item) = synced.registrations.pop_back() {
        items.push(item);
    }
    let count = items.len();
    for item in items.into_iter().rev() {
        synced.registrations.push_front(item);
    }
    count
}

Problems:

  1. O(n) time complexity: Drains and rebuilds entire list just to count
  2. Memory allocation: Creates temporary Vec
  3. Lock contention: Holds mutex while doing expensive operations
  4. Not actually safe: Despite the comment, this modifies the list structure temporarily

Better approach: Use synced.registrations.len() if LinkedList supports it, or iterate without removal:

pub(super) fn total_registration_count(&self, synced: &Synced) -> usize {
    synced.registrations.iter().count()
}

If the LinkedList type doesn't provide these, consider tracking count separately with an atomic counter.

value:good-but-wont-fix; category:bug; feedback:The Claude AI reviewer is correct that the implementation is not optimal! But since it is used only for the integration tests it is OK-ish. It could be improved later if this method is needed for the non-test part of the codebase.

@martin-augment
Owner Author

  1. Comment style inconsistency (tokio/src/runtime/io/driver.rs:299-300):
    // Deregister the source with the OS poller **first**
    // Cleanup ALWAYS happens
    The comment says "first" but that's now misleading - deregistration is attempted first, but cleanup happens regardless. Consider:
    // Attempt OS deregistration first, but cleanup always happens regardless of result

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct that the comments could be made clearer by explaining that the deregistration result is ignored until the cleanup is finished.

@martin-augment
Owner Author

2. Inconsistent formatting (tokio/src/runtime/io/driver.rs:332):
Missing space after comment marker:
/// Returns the total number of registrations in the main list (test-only)
Should match line 327's style with "test-only, not part of public API".

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct: the docstring of this method is not consistent with the other newly introduced methods that are visible only to the integration tests.

@martin-augment
Owner Author

martin-augment commented Dec 15, 2025

956-959: Consider removing redundant imports.

AsRawFd, RawFd, Arc, and AsyncFd are already imported at the file level (lines 4, 7, 20). Only Handle is a new import needed here.

-    use std::os::unix::io::{AsRawFd, RawFd};
-    use std::sync::Arc;
-    use tokio::io::unix::AsyncFd;
     use tokio::runtime::Handle;

value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! These dependencies are already imported globally for the whole IT test and there is no need to re-import them just for this particular test case.
