Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 15 additions & 4 deletions src/coreclr/vm/threadsuspend.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1983,9 +1983,14 @@ CONTEXT* AllocateOSContextHelper(BYTE** contextBuffer)
pfnInitializeContext2(NULL, context, NULL, &contextSize, xStateCompactionMask) :
InitializeContext(NULL, context, NULL, &contextSize);

// The following assert is valid, but gets triggered in some Win7 runs with no impact on functionality.
// commenting this out to reduce noise, as long as Win7 is supported.
// _ASSERTE(!success && GetLastError() == ERROR_INSUFFICIENT_BUFFER);
// Spec mentions that we may get a different error (it was observed on Windows7).
// In such case the contextSize is undefined.
if (success || GetLastError() != ERROR_INSUFFICIENT_BUFFER)
{
STRESS_LOG2(LF_SYNC, LL_INFO1000, "AllocateOSContextHelper: Unexpected result from InitializeContext (success: %d, error: %d).\n",
success, GetLastError());
return NULL;
}

// So now allocate a buffer of that size and call InitializeContext again
BYTE* buffer = new (nothrow)BYTE[contextSize];
Expand Down Expand Up @@ -2896,9 +2901,15 @@ BOOL Thread::RedirectThreadAtHandledJITCase(PFN_REDIRECTTARGET pTgt)
if (!pCtx)
{
pCtx = m_pSavedRedirectContext = ThreadStore::GrabOSContext(&m_pOSContextBuffer);
_ASSERTE(GetSavedRedirectContext() != NULL);
}

// We may not have a preallocated context. Could be short on memory when we tried to preallocate.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The crash was hit in the CI by a lot of tests. It sounds very unlikely we would be running out of memory that often.

Isn't the actual problem a race condition somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern as well. The situation should not be fatal, but should not be common either. These tests do not look like something that should run close to OOM.
I am not sure if this was always happening and handled or it is something new.

I could not reproduce this on purpose. I've spent quite some time trying. Maybe having fewer resources on lab machines is a factor.
I think the fix makes sense one way or another, but why we need it more than I'd expect is still a question.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unable to reproduce this so far. Did we see this on anything other than Windows7/x86?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have seen it on Win7/x86 only

Copy link
Member Author

@VSadov VSadov Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call InitializeContext with 0 for context buffer size to get the actual buffer size. The expected result is to get ERROR_INSUFFICIENT_BUFFER and to get the size required.
Rarely, for unknown reasons we get a different error. We have disabled an assert because of this.

// The following assert is valid, but gets triggered in some Win7 runs with no impact on functionality.

I wonder if in such case we also get invaid size (like 0) and thus the allocation fails.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec apparently mentions that:

 If the function fails with an error other than ERROR_INSUFFICIENT_BUFFER, the contents of ContextLength are undefined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just do not know what can possibly go wrong with a call like InitializeContext(NULL, context, NULL, &contextSize); - that is the same use as in the example.

Copy link
Member Author

@VSadov VSadov Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not even try to allocate when InitializeContext returns unexpected results.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps that is the root cause, but the reason why the call may fail on Win7/x86 is still unclear.

// We cannot allocate here since we have a thread stopped in a random place, possibly holding locks
// that we would need while allocating.
// Other ways and attempts at suspending may yet succeed, but this redirection cannot continue.
if (!pCtx)
return (FALSE);

//////////////////////////////////////
// Get and save the thread's context
BOOL bRes = true;
Expand Down