Skip to content

time: avoid traversing entries in the time wheel twice#6718

Open
wathenjiang wants to merge 11 commits intotokio-rs:masterfrom
wathenjiang:fix-sharded-timer
Open

time: avoid traversing entries in the time wheel twice#6718
wathenjiang wants to merge 11 commits intotokio-rs:masterfrom
wathenjiang:fix-sharded-timer

Conversation

@wathenjiang
Copy link
Copy Markdown
Contributor

@wathenjiang wathenjiang commented Jul 24, 2024

This is for reverting the cached_when in #6584

In the extend_expiration method, we will not necessarily re-move the timer entry, but its state can be modified.

So we use the cached_when to calculate which slot the timer entry is currently stored in.

@github-actions github-actions bot added the R-loom-time-driver Run loom time driver tests on this PR label Jul 24, 2024
@wathenjiang wathenjiang added A-tokio Area: The main tokio crate M-time Module: tokio/time labels Jul 24, 2024
@wathenjiang wathenjiang changed the title time: revert the cached_when in TimerShared time: avoid traversing entries in the time wheel twice Jul 24, 2024
@Darksonn
Copy link
Copy Markdown
Member

What has changed since the original PR? Can you add a test for the previous bug? We probably won't merge this until we have #6717, but it still makes sense to add a dedicated test for the problem.

@wathenjiang
Copy link
Copy Markdown
Contributor Author

What has changed since the original PR? Can you add a test for the previous bug? We probably won't merge this until we have #6717, but it still makes sense to add a dedicated test for the problem.

  • The new added test is reset_timer_and_drop.
  • The change from the original PR can be seen in cd70a24

}
}
}
lock.occupied_bit_maintain(&expiration);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code doesn't call this. Why is it necessary now?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I add occupied_bit_maintain now is that in the past, we would use take_entries to take all items from the linked list at once and then update the occupied bit. But now, we need to traverse the items one by one, which means we need to update the occupied bit after the traversal is completed. I think the difference here is that in the current version, the linked list is not necessarily empty, so I add this method.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be accurate to say that this is an operation that must be called when we are done with list? Would it make sense for it to be part of the destructor of EntryWaitersList?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be called before drop(lock) in if !waker_list.can_push() {?

Copy link
Copy Markdown
Contributor Author

@wathenjiang wathenjiang Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling occupied_bit_maintain before drop(lock) in if !waker_list.can_push() { makes sense to me. That is a good idea.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about set_elapsed?

I would really like to see a loom test where a timer is inserted during the if !waker_list.can_push() { ... } block. It's hard for me to tell whether it's correct.

I think you can do the loom test along these lines:

  1. Start a runtime with paused time.
  2. Create and poll 32 timers with the timer.
  3. Spawn a thread. The background thread will create a timer, register it, and wait for it to complete.
  4. In parallel with that wait for the 32 timers in order.
  5. Join the background thread.

Spawning the thread after registering the 32 timers should reduce the size of the loom test, since the region with more than 1 thread will be shorter. As long as the if !waker_list.can_push() { ... } block happens in the more-than-1-thread region of the test, it should work.

}
}
}
Err(state) if state == STATE_DEREGISTERED => {}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can STATE_DEREGISTERED actually happen?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I think it will not happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-tokio Area: The main tokio crate M-time Module: tokio/time R-loom-time-driver Run loom time driver tests on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants