-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Fix EventPipeSession::WriteEventSuspend #45672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should take a dependency on the Pause/Resume APIs that were added. The backing value is explicitly not protected behind any locks and I'm not sure we should invest in enhancing it. It was only added for the GC gen0 promotion analysis work that went in right at the end of 5.
runtime/src/coreclr/src/vm/eventpipe.cpp
Lines 537 to 571 in fd094a9
| s_config.Disable(*pSession, pEventPipeProviderCallbackDataQueue); | |
| pSession->Disable(); // WriteAllBuffersToFile, and remove providers. | |
| // Do rundown before fully stopping the session unless rundown wasn't requested | |
| if (pSession->RundownRequested() && s_CanStartThreads) | |
| { | |
| pSession->EnableRundown(); // Set Rundown provider. | |
| EventPipeThread *const pEventPipeThread = EventPipeThread::GetOrCreate(); | |
| if (pEventPipeThread != nullptr) | |
| { | |
| pEventPipeThread->SetAsRundownThread(pSession); | |
| { | |
| s_config.Enable(*pSession, pEventPipeProviderCallbackDataQueue); | |
| { | |
| pSession->ExecuteRundown(); | |
| } | |
| s_config.Disable(*pSession, pEventPipeProviderCallbackDataQueue); | |
| } | |
| pEventPipeThread->SetAsRundownThread(nullptr); | |
| } | |
| else | |
| { | |
| _ASSERTE(!"Failed to get or create the EventPipeThread for rundown events."); | |
| } | |
| } | |
| s_allowWrite &= ~(pSession->GetMask()); | |
| // Remove the session from the array before calling SuspendWriteEvent. This way | |
| // we can guarantee that either the event write got the pointer and will complete | |
| // the write successfully, or it gets null and will bail. | |
| _ASSERTE(s_pSessions[pSession->GetIndex()] == pSession); | |
| s_pSessions[pSession->GetIndex()].Store(nullptr); |
Shouldn't the session already be disabled by the above logic (EventPipe::DisableInternal) before we reach this code path? We remove all the providers from the session, so new event writes shouldn't be hitting this code path right? By the time we've reached where the assertion was happening, new threads shouldn't be writing to this session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I checked again and it seems like the problem is happening because SetSessionWriteInProgress is set regardless of whether we actually write to the session here:
for (uint32_t i = 0; i < MaxNumberOfSessions; ++i)
{
if ((s_allowWrite & ((uint64_t)1 << i)) == 0)
continue;
// Now that we know this session is probably live we pay the perf cost of the memory barriers
// Setting this flag lets a thread trying to do a concurrent disable that it is not safe to delete
// session ID i. The if check above also ensures that once the session is unpublished this thread
// will eventually stop ever storing ID i into the WriteInProgress flag. This is important to
// guarantee termination of the YIELD_WHILE loop in SuspendWriteEvents.
pEventPipeThread->SetSessionWriteInProgress(i);
{
EventPipeSession *const pSession = s_pSessions[i].Load();
// Disable is allowed to set s_pSessions[i] = NULL at any time and that may have occured in between
// the check and the load
if (pSession != nullptr)
pSession->WriteEvent(
pThread,
event,
payload,
pActivityId,
pRelatedActivityId,
pEventThread,
pStack);
}
// Do not reference pSession past this point, we are signaling Disable() that it is safe to
// delete it
pEventPipeThread->SetSessionWriteInProgress(UINT32_MAX);
Let me follow up with a different fix to address the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, isn't the session disabled before we call SuspendWriteEvent? It seems like the real issue here is that we shouldn't be creating new threads for a disabled session
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I thought about this for a while and I believe we still have a race that looks like this:
-
<Thread 1> Calls EventPipe::WriteEvent and is pre-emptied here before it calls
SetSessionWriteInProgress. -
<Thread 2> Calls EventPipeSession::SuspendWriteEvent.
-
<Thread 2> Skips through the YIELD_WHILE for Thread 1 because
SetSessionWriteInProgresshasn't been called by Thread 1. -
<Thread 1> Wakes up and calls
SetSessionWriteInProgress -
<Thread 1> Gets preemptied before
SetSessionWriteInProgress(UINT32_MAX) is called -
<Thread 2> Calls EventPipeBufferManager::SuspendWriteEvent() which sees that Thread 1's
SessionWriteinProgressis stillsessionIndex.
Adding the YIELD_WHILE in EventPipeBufferManager::SuspendWriteEvent() should prevent this situation from happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make some time tonight to dig in and actually code review tonight
|
// Auto-generated message 69e114c which was merged 12/7 removed the intermediate src/coreclr/src/ folder. This PR needs to be updated as it touches files in that directory which causes conflicts. To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others. |
44f8b53 to
a907fb5
Compare
josalem
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this offline already. I'm convinced this will alleviate the issue connected to this PR. I would prefer if we didn't have a yield inside that spin lock, but the yield should only ever be a few CPU cycles in the rare case the predicate isn't already false (should being the operative term). I'll approve this for now, but let's hold out for @davmason to also think through it to make sure we didn't miss something.
|
I think the fix is functionally correct, but I don't like adding another loop. Conceptually EventPipeSession::SuspendWriteEvent already loops through all active threads (even though there is a race and it can miss some), doing it again here seems like a band-aid fix at best. Do we need the assert at all? Looking at the order of operations (here and here), the session has to be disabled to call SuspendWriteEvent, and the null pointer will be stored in s_pSessions before we call SuspendWriteEvent. Once s_pSessions has the null pointer stored, the check here will bail out and never call in to pSession->WriteEvent. So even though the thread is marked as "write in progress" it can't actually write any events, and won't allocate any buffers or anything. It seems like it should be safe to get rid of the assert and not add the YIELD_WHILE. |
|
@davmason ok - I’ll just remove the assert altogether and not add the yield while loop. Thanks |
|
Hello @sywhang! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
davmason
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Found from a test failure: #44447
It is possible for pThread to get a SessionWriteInProgress if it was created between EventPipeSession::SuspendWriteEvent and EventPipeBufferManager::SuspendWriteEvent because in EventPipe::WriteEventInternal we still call SessionWriteInProgress before checking if the session is enabled.
That thread will then get new EventPipeThread whose state does not indicate that the session is disabled, which means this thread will think it's still under writing.
The fix is to just remove this ASSERTE since this is not a valid assertion under this circumstance.
Note this won't result in a segfault in release build either, since pSession is null by the time the preempted thread returns to write an event after setting the SessionWriteInProgress.