Debugger hangs due to runtime deadlock. HashMap takes ThreadStore out of order #123650

@noahfalk

Description

The debugger is blocked in mscordbi code at:

      [Inline Frame] mscordbi.dll!SafeWaitForSingleObject(CordbProcess *) Line 257  C++
>     mscordbi.dll!CordbProcess::StopInternal(unsigned long dwTimeout, VMPTR_Base<AppDomain,void> pAppDomainToken) Line 3611  C++
      [Inline Frame] mscordbi.dll!StopContinueHolder::Init(CordbProcess *) Line 208 C++
      mscordbi.dll!CordbModule::GetFunctionFromToken(unsigned int token, ICorDebugFunction * * ppFunction) Line 1387    C++

mscordbi is waiting for the debuggee to respond, which it never will because the debuggee is deadlocked.

Thread 0x9314 in the debuggee (trying to acquire DebuggerController lock):

      [Inline Frame] coreclr.dll!CrstBase::AcquireLock(CrstBase *) Line 174   C++
      [Inline Frame] coreclr.dll!FunctionBase<CrstBase *,&CrstBase::AcquireLock,&CrstBase::ReleaseLock>::DoAcquire() Line 694 C++
      [Inline Frame] coreclr.dll!BaseHolder<CrstBase *,FunctionBase<CrstBase *,&CrstBase::AcquireLock,&CrstBase::ReleaseLock>,0,&CompareDefault<CrstBase *>>::Acquire() Line 266    C++
      [Inline Frame] coreclr.dll!BaseHolder<CrstBase *,FunctionBase<CrstBase *,&CrstBase::AcquireLock,&CrstBase::ReleaseLock>,0,&CompareDefault<CrstBase *>>::{ctor}(CrstBase *) Line 233 C++
      [Inline Frame] coreclr.dll!Holder<CrstBase *,&CrstBase::AcquireLock,&CrstBase::ReleaseLock,0,&CompareDefault<CrstBase *>,1>::{ctor}(CrstBase *) Line 729    C++
>     coreclr.dll!DebuggerController::DispatchPatchOrSingleStep(Thread * thread, _CONTEXT * context, const unsigned char * address, SCAN_TRIGGER which, DebuggerSteppingInfo * pDebuggerSteppingInfo) Line 3063   C++
      coreclr.dll!DebuggerController::DispatchNativeException(_EXCEPTION_RECORD * pException, _CONTEXT * pContext, unsigned long dwCode, Thread * pCurThread, DebuggerSteppingInfo * pDebuggerSteppingInfo) Line 4618   C++
      coreclr.dll!Debugger::FirstChanceNativeException(_EXCEPTION_RECORD * exception, _CONTEXT * context, unsigned long code, Thread * thread, int fIsVEH) Line 5470    C++
      coreclr.dll!IsDebuggerFault(_EXCEPTION_RECORD * pExceptionRecord, _CONTEXT * pContext, unsigned long exceptionCode, Thread * pThread) Line 5811 C++

And the DebuggerController lock is held by thread 0x1E38 which is here:

      [Inline Frame] coreclr.dll!CLREventBase::Wait(unsigned long) Line 412   C++
      coreclr.dll!Thread::WaitSuspendEventsHelper() Line 4458     C++
      coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2185     C++
      [Inline Frame] coreclr.dll!Thread::DisablePreemptiveGC() Line 1288      C++
      [Inline Frame] coreclr.dll!GCHolderBase::EnterInternalCoop_HackNoThread(bool) Line 4619   C++
      [Inline Frame] coreclr.dll!GCCoopHackNoThread::{ctor}(bool) Line 4834   C++
      [Inline Frame] coreclr.dll!HashMap::LookupValue(unsigned __int64) Line 552    C++
      [Inline Frame] coreclr.dll!PtrHashMap::LookupValue(unsigned __int64 key, void *) Line 607 C++
      [Inline Frame] coreclr.dll!ReadyToRunInfo::GetMethodDescForEntryPointInNativeImage(unsigned __int64) Line 376     C++
      [Inline Frame] coreclr.dll!ReadyToRunInfo::GetMethodDescForEntryPoint(unsigned __int64 entryPoint) Line 104 C++
      coreclr.dll!ReadyToRunJitManager::JitCodeToMethodInfo(RangeSection * pRangeSection, unsigned __int64 currentPC, MethodDesc * * ppMethodDesc, EECodeInfo * pCodeInfo) Line 6451      C++
      coreclr.dll!EECodeInfo::Init(unsigned __int64 codeAddress, ExecutionManager::ScanFlag scanFlag) Line 15025  C++
      [Inline Frame] coreclr.dll!EECodeInfo::{ctor}(unsigned __int64) Line 2877     C++
      [Inline Frame] coreclr.dll!ExecutionManager::GetCodeStartAddress(unsigned __int64) Line 4980    C++
      coreclr.dll!EEDbgInterfaceImpl::GetNativeCodeStartAddress(unsigned __int64 address) Line 416    C++
      coreclr.dll!Debugger::GetJitInfoWorker(MethodDesc * fd, const unsigned char * pbAddr, DebuggerMethodInfo * * pMethInfo) Line 2711   C++
      [Inline Frame] coreclr.dll!Debugger::GetJitInfo(MethodDesc *) Line 2650 C++
      coreclr.dll!DebuggerController::BindPatch(DebuggerControllerPatch * patch, MethodDesc * pMD, const unsigned char *) Line 1432 C++
      coreclr.dll!DebuggerController::AddBindAndActivatePatchForMethodDesc(MethodDesc * fd, DebuggerJitInfo * dji, unsigned __int64 nativeOffset, DebuggerPatchKind kind, FramePointer fp, AppDomain * pAppDomain) Line 2242  C++
      coreclr.dll!DebuggerController::AddBindAndActivateILReplicaPatch(DebuggerControllerPatch * primary, DebuggerJitInfo * dji) Line 2006      C++
      coreclr.dll!Debugger::MapPatchToDJI(DebuggerControllerPatch * dcp, DebuggerJitInfo * djiTo) Line 4963 C++
      coreclr.dll!Debugger::MapAndBindFunctionPatches(DebuggerJitInfo * djiNew, MethodDesc * fd, const unsigned char *) Line 4879   C++
      coreclr.dll!Debugger::JITComplete(NativeCodeVersion nativeCodeVersion, unsigned __int64 newAddress) Line 2525     C++

Reproduction Steps

We don't have a reliable repro. With the right timing you would need to:

  1. Debug an app with a managed debugger and set a breakpoint on a ReadyToRun method that hasn't executed yet.
  2. Be in the middle of stepping through code on debuggee thread A at the moment debuggee thread B first executes the ReadyToRun method with the breakpoint.

Expected behavior

Debugger hits the breakpoint as expected.

Actual behavior

Debugger hangs.

Regression?

The root cause seems to have existed for a while, but it's possible that other code changes have altered the timing and therefore the likelihood of hitting it.

Known Workarounds

None so far.

Configuration

The issue was spotted on .NET 10, Windows x64, but code analysis suggests the race condition could probably be hit on other .NET versions, OSes, and architectures if timed properly.

Other information

The root cause appears to be bad lock ordering in the HashMap code, which does a GC mode transition where it isn't always safe to do so. The referenced line in that code:

      DISABLED(GC_NOTRIGGER); // This is not a bug, we cannot decide, since the function ptr called may be either.

Fixing that root cause has a partial implementation here: #123492
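
To illustrate the kind of ordering problem described above, here is a minimal, generic lock-rank check in C++. It is not CoreCLR code: `Rank`, `RankTracker`, and both rank names are hypothetical stand-ins, and treating the GC mode transition as something that must be ordered before the DebuggerController lock is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical ranks for illustration; a lower value must be acquired first.
enum class Rank : int {
    SuspendGate = 1,     // stand-in for a GC mode transition that can block on suspension
    ControllerLock = 2,  // stand-in for the DebuggerController lock
};

// Tracks the ranks held by one thread and rejects out-of-order acquisitions.
class RankTracker {
public:
    // Returns true and records the rank if the acquisition respects the order.
    bool TryEnter(Rank r) {
        if (!held_.empty() &&
            static_cast<int>(r) < static_cast<int>(held_.back())) {
            return false;  // would take a lower rank while holding a higher one
        }
        held_.push_back(r);
        return true;
    }

    // Releases the most recently acquired rank.
    void Leave() {
        assert(!held_.empty());
        held_.pop_back();
    }

private:
    std::vector<Rank> held_;
};

// Models the pattern in the second stack: controller lock first, then the transition.
bool OutOfOrderIsRejected() {
    RankTracker t;
    bool tookController = t.TryEnter(Rank::ControllerLock);
    bool tookGate = t.TryEnter(Rank::SuspendGate);  // out of order
    return tookController && !tookGate;
}

// Models the safe pattern: transition first, then the controller lock.
bool InOrderIsAccepted() {
    RankTracker t;
    return t.TryEnter(Rank::SuspendGate) && t.TryEnter(Rank::ControllerLock);
}
```

A check of this shape only flags the ordering; it says nothing about how the real runtime should resolve it.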

However, reworking the HashMap memory cleanup is probably not low-risk enough to backport to prior .NET releases, so we'd also like to make a more targeted fix. A likely small fix for this case is to avoid having DebuggerController::BindPatch call GetJitInfo() as much as possible. The caller of BindPatch, AddBindAndActivatePatchForMethodDesc, already knows the DebuggerJitInfo* and could pass that pointer to BindPatch. This would avoid BindPatch looking it up again, which is what goes down the problematic code path.
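
A rough sketch of the shape of that targeted fix, using simplified stand-ins rather than the real runtime types. Every name here (`MethodDescSketch`, `JitInfoSketch`, `LookupJitInfoSketch`, `BindPatchSketch`, `g_lookupCount`) is hypothetical, and the actual BindPatch signature and behavior may differ.

```cpp
#include <cassert>

// Simplified, hypothetical stand-ins; the real CoreCLR types are far richer.
struct MethodDescSketch {};
struct JitInfoSketch {};

static int g_lookupCount = 0;  // how many times the lookup path ran

// Stand-in for the lookup that BindPatch reaches via GetJitInfo().
JitInfoSketch* LookupJitInfoSketch(MethodDescSketch*) {
    static JitInfoSketch info;
    ++g_lookupCount;
    return &info;
}

// Hypothetical BindPatch shape: prefer the caller's pointer and
// fall back to the lookup only when the caller has none.
bool BindPatchSketch(MethodDescSketch* md, JitInfoSketch* knownInfo) {
    JitInfoSketch* info =
        (knownInfo != nullptr) ? knownInfo : LookupJitInfoSketch(md);
    return info != nullptr;
}
```

When the caller passes the pointer it already holds, the lookup path is never entered; passing `nullptr` preserves the existing behavior for callers that do not have one.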
