
SynchronizationLockException in DefaultInstanceProfileAWSCredentials.GetCredentialsAsync() #4199

@msab-john


Describe the bug

The call to ExitWriteLock is completed on a thread other than the one that acquired the write lock.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

Correct synchronization

Current Behavior

We get a SynchronizationLockException with the message "The write lock is being released without being held."

Reproduction Steps

See the stack trace below. We observe the following error:

System.Threading.SynchronizationLockException: The write lock is being released without being held.
   at System.Threading.ReaderWriterLockSlim.ExitWriteLock()
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
   at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.BaseAuthResolverHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.S3.Internal.AmazonS3ExceptionHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.MetricsHandler.InvokeAsync[T](IExecutionContext executionContext)
   at MSAB.Amazon.AmazonSimpleStorageService.OpenReadAsync(StoragePath path, CancellationToken cancellationToken) in C:\BuildAgent\work\426bfb8899671fab\packages\storage-server\aws\AmazonSimpleStorageService.cs:line 155
   at MSAB.BlockListFileStream.OpenAsync(IStorageReader storage, StorageBucketName bucket, BinaryValue hash, CancellationToken cancellationToken) in C:\BuildAgent\work\426bfb8899671fab\packages\storage-server\BlockStream.cs:line 9
   at MSAB.FileStreamingService.OpenFile(IAsyncStreamReader`1 requestStream, IServerStreamWriter`1 responseStream, ServerCallContext ctx) in C:\BuildAgent\work\426bfb8899671fab\packages\storage-server\FileStreamingService.cs:line 124
   at MSAB.FileStreamingService.OpenFile(IAsyncStreamReader`1 requestStream, IServerStreamWriter`1 responseStream, ServerCallContext ctx) in C:\BuildAgent\work\426bfb8899671fab\packages\storage-server\FileStreamingService.cs:line 80
   at MSAB.FileStreamingService.OpenFile(IAsyncStreamReader`1 requestStream, IServerStreamWriter`1 responseStream, ServerCallContext ctx) in C:\BuildAgent\work\426bfb8899671fab\packages\storage-server\FileStreamingService.cs:line 297
   at Grpc.Shared.Server.DuplexStreamingServerMethodInvoker`3.Invoke(HttpContext httpContext, ServerCallContext serverCallContext, IAsyncStreamReader`1 requestStream, IServerStreamWriter`1 responseStream)
   at Grpc.Shared.Server.DuplexStreamingServerMethodInvoker`3.Invoke(HttpContext httpContext, ServerCallContext serverCallContext, IAsyncStreamReader`1 requestStream, IServerStreamWriter`1 responseStream)
   at Grpc.AspNetCore.Server.Internal.CallHandlers.DuplexStreamingServerCallHandler`3.HandleCallAsyncCore(HttpContext httpContext, HttpContextServerCallContext serverCallContext)
   at Grpc.AspNetCore.Server.Internal.CallHandlers.ServerCallHandlerBase`3.<HandleCallAsync>g__AwaitHandleCall|8_0(HttpContextServerCallContext serverCallContext, Method`2 method, Task handleCall)

If we look at the code we can observe that it looks logically correct.

public override async Task<ImmutableCredentials> GetCredentialsAsync()
{
    CheckIsIMDSEnabled();
    ImmutableCredentials credentials = null;
    // Try to acquire read lock. The thread would be blocked if another thread has write lock.
    if (_credentialsLock.TryEnterReadLock(_credentialsLockTimeout))
    {
        try
        {
            if (null != _lastRetrievedCredentials)
            {
                if (_lastRetrievedCredentials.IsExpiredWithin(TimeSpan.Zero) &&
                    !_imdsRefreshFailed)
                {
                    // this is the first failure - immediately try to renew
                    _imdsRefreshFailed = true;
                    _lastRetrievedCredentials = await FetchCredentialsAsync().ConfigureAwait(false);
                }
                // if credentials are expired, we'll still return them, but log a message about
                // them being expired.
                if (_lastRetrievedCredentials.IsExpiredWithin(TimeSpan.Zero))
                {
                    _logger.InfoFormat(_usingExpiredCredentialsFromIMDS);
                }
                else
                {
                    _imdsRefreshFailed = false;
                }
                return _lastRetrievedCredentials?.Credentials.Copy();
            }
        }
        finally
        {
            _credentialsLock.ExitReadLock();
        }
    }
    // If there's no credentials cached, hit IMDS directly. Try to acquire write lock.
    if (_credentialsLock.TryEnterWriteLock(_credentialsLockTimeout))
    {
        try
        {
            // Check for last retrieved credentials again in case other thread might have already fetched it.
            if (null == _lastRetrievedCredentials)
            {
                _lastRetrievedCredentials = await FetchCredentialsAsync().ConfigureAwait(false);
            }
            if (_lastRetrievedCredentials.IsExpiredWithin(TimeSpan.Zero) &&
                !_imdsRefreshFailed)
            {
                // this is the first failure - immediately try to renew
                _imdsRefreshFailed = true;
                _lastRetrievedCredentials = await FetchCredentialsAsync().ConfigureAwait(false);
            }
            // if credentials are expired, we'll still return them, but log a message about
            // them being expired.
            if (_lastRetrievedCredentials.IsExpiredWithin(TimeSpan.Zero))
            {
                _logger.InfoFormat(_usingExpiredCredentialsFromIMDS);
            }
            else
            {
                _imdsRefreshFailed = false;
            }
            credentials = _lastRetrievedCredentials.Credentials?.Copy();
        }
        finally
        {
            _credentialsLock.ExitWriteLock();
        }
    }
    if (credentials == null)
    {
        throw new AmazonServiceException(FailedToGetCredentialsMessage);
    }
    return credentials;
}

The error is thrown by the ExitWriteLock call in the finally block, because the following async call is awaited between TryEnterWriteLock and ExitWriteLock:

_lastRetrievedCredentials = await FetchCredentialsAsync().ConfigureAwait(false);

The root cause is that multiple threads are involved: after the await, the continuation can resume on a different thread than the one that acquired the write lock, and that other thread is the one that tries to release it.

Internally, ReaderWriterLockSlim uses thread-local storage (TLS) to track enter/leave counts for read and write locks.

https://github.com/dotnet/runtime/blob/304e5a8e52b80f2c7d466c42f5bac8191ad78f71/src/libraries/System.Private.CoreLib/src/System/Threading/ReaderWriterLockSlim.cs#L85-L87

That is why this code doesn't work: ReaderWriterLockSlim is tied to the acquiring thread, so you cannot hold it across an await like this and expect it to work.
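The thread affinity described above can be shown without the SDK. The following is a minimal sketch, not the SDK's code: it acquires the write lock on one thread and calls ExitWriteLock on a second, explicitly created thread, which stands in for an await continuation that resumes on a different thread-pool thread. The class and method names (LockAffinityDemo, ReleaseOnOtherThreadThrows) are made up for illustration.

```csharp
using System;
using System.Threading;

public static class LockAffinityDemo
{
    // Returns true if releasing the write lock from a non-owning thread
    // throws SynchronizationLockException.
    public static bool ReleaseOnOtherThreadThrows()
    {
        var rwLock = new ReaderWriterLockSlim();
        rwLock.EnterWriteLock(); // acquired on the calling thread

        bool threw = false;
        // A dedicated thread simulates the continuation after an await
        // that resumed somewhere other than the acquiring thread.
        var other = new Thread(() =>
        {
            try
            {
                rwLock.ExitWriteLock();
            }
            catch (SynchronizationLockException)
            {
                threw = true;
            }
        });
        other.Start();
        other.Join();

        // The failed exit leaves the lock held, so the owner still releases it.
        rwLock.ExitWriteLock();
        rwLock.Dispose();
        return threw;
    }
}
```

Calling LockAffinityDemo.ReleaseOnOtherThreadThrows() should return true. In the async method above the same thing happens only when the continuation lands on a different thread, which would explain why the failure is intermittent.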

Possible Solution

Revert to 3.7.401.2, or use something like SemaphoreSlim, which is not tied to the acquiring thread and so doesn't have the same issue, or consider some other synchronization mechanism.
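As a rough illustration of the SemaphoreSlim suggestion, here is a minimal sketch of an async-safe "fetch once, then cache" pattern. It is not the SDK's implementation and leaves out the expiry and refresh-failure handling of the real method. AsyncCachedValue, GetAsync, and the fetch delegate are hypothetical names introduced only for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class AsyncCachedValue<T> where T : class
{
    // SemaphoreSlim has no thread affinity: Release may run on any thread,
    // so it is safe to hold across an await.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly Func<Task<T>> _fetch;
    private T _cached;

    public AsyncCachedValue(Func<Task<T>> fetch)
    {
        _fetch = fetch ?? throw new ArgumentNullException(nameof(fetch));
    }

    public async Task<T> GetAsync(TimeSpan timeout)
    {
        // Fast path: return the cached value without taking the gate.
        var cached = Volatile.Read(ref _cached);
        if (cached != null)
        {
            return cached;
        }

        if (!await _gate.WaitAsync(timeout).ConfigureAwait(false))
        {
            throw new TimeoutException("Timed out waiting for the fetch gate.");
        }

        try
        {
            // Re-check in case another caller fetched while we were waiting.
            cached = Volatile.Read(ref _cached);
            if (cached == null)
            {
                cached = await _fetch().ConfigureAwait(false);
                Volatile.Write(ref _cached, cached);
            }
            return cached;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

The trade-off is that SemaphoreSlim gives mutual exclusion only, with no separate reader and writer modes, so this sketch keeps readers off the gate by checking the cached reference first.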

Additional Information/Context

No response

AWS .NET SDK and/or Package version used

AWSSDK.S3 4.0.2.1

Targeted .NET Platform

.NET 8

Operating System and version

Windows 11, AmazonLinux
