-
Notifications
You must be signed in to change notification settings - Fork 749
Fix race condition in concurrent DeploymentState access causing intermittent AzureDeployerTests failures #11974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…failures Co-authored-by: radical <[email protected]>
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 11974Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 11974" |
1 similar comment
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 11974Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 11974" |
Co-authored-by: radical <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a race condition in JsonExtensions.Prop() that was causing intermittent failures in AzureDeployerTests when multiple Azure Bicep resources were provisioned concurrently. The race condition occurred when multiple threads tried to access the same JSON property simultaneously, leading to InvalidOperationException or KeyNotFoundException.
Key changes:
- Added thread-safe locking mechanism to
JsonExtensions.Prop()method - Simplified the get-or-create logic by replacing
TryAddwith lock-protectedAdd - Added comprehensive unit tests to verify thread-safety and concurrent access behavior
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/Aspire.Hosting.Azure/Provisioning/JsonExtensions.cs |
Added lock-based synchronization to prevent race conditions in concurrent property access |
tests/Aspire.Hosting.Azure.Tests/JsonExtensionsTests.cs |
Added unit tests including concurrent access verification and edge case coverage |
| if (node is not null) | ||
| // Lock on the JsonObject to ensure thread-safe access when multiple | ||
| // bicep resources are being provisioned in parallel | ||
| lock (jsonObj) |
Copilot
AI
Oct 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Locking on the JsonObject instance itself can be problematic as it's a public object that external code might also lock on, potentially causing deadlocks. Consider using a private static readonly object or ConcurrentDictionary for thread-safe operations instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this. I think thread safety would be better added to prevent multiple threads from using the JSON at once, rather than making Prop method itself safe.
Making Prop safe against itself just kicks the problem down to the next thing that breaks when multiple threads are reading/writing JSON at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot Address the feedback here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this. I think thread safety would be better added to prevent multiple threads from using the JSON at once, rather than making Prop method itself safe.
Making Prop safe against itself just kicks the problem down to the next thing that breaks when multiple threads are reading/writing JSON at the same time.
@copilot consider this feedback. Look at how the json objects are being used and from where. And how we can make it thread safe for writes, and reads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this. I think thread safety would be better added to prevent multiple threads from using the JSON at once, rather than making Prop method itself safe.
Making Prop safe against itself just kicks the problem down to the next thing that breaks when multiple threads are reading/writing JSON at the same time.
@copilot This feedback has not been correctly addressed. Look at how the json objects are being used and from where. And how we can make it thread safe for writes, and reads. And then fix that. Add relevant tests too.
JamesNK
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #11974 (comment)
…bject Co-authored-by: radical <[email protected]>
Changed to use Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
@copilot This feedback has not been correctly addressed. Reconsider it. Look at how the json objects are being used and from where. And how we can make it thread safe for writes, and reads. And then fix that. Add relevant tests too. |
Address feedback to provide thread-safety at the ProvisioningContext level rather than in the Prop method itself. This ensures all DeploymentState access is synchronized when multiple resources are provisioned in parallel. - Added WithDeploymentState methods to ProvisioningContext for thread-safe access - Updated BicepProvisioner to use WithDeploymentState for all state modifications - Reverted JsonExtensions.Prop to simpler implementation without locking - Updated tests: removed concurrent test from JsonExtensionsTests - Added comprehensive thread-safety tests to ProvisioningContextTests Co-authored-by: radical <[email protected]>
Reconsidered the approach. Thread-safety is now provided at the ProvisioningContext level using Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
Issues is fixed and it's blocking PRs!
| // Note: This method should only be called from within a synchronized context | ||
| // (e.g., within ProvisioningContext.WithDeploymentState) when multiple threads | ||
| // may be accessing the same JsonObject concurrently | ||
| node = new JsonObject(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot undo these changes since the core issue is fixed in WithDeploymentState
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted JsonExtensions.cs to its original implementation. The thread-safety is now solely handled by WithDeploymentState at the ProvisioningContext level. Commit: (see latest)
As suggested by @davidfowl, revert changes to JsonExtensions.cs since the core thread-safety issue is now properly handled by WithDeploymentState at the ProvisioningContext level. The original TryAdd logic with retry is restored. Co-authored-by: davidfowl <[email protected]>
|
Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
Fix race condition in concurrent DeploymentState access causing intermittent AzureDeployerTests failures
Root Cause Analysis
The tests fail intermittently due to a race condition when multiple Azure Bicep resources are provisioned in parallel (via
Task.WhenAllinAzureEnvironmentResource.ProvisionAzureBicepResourcesAsync). All provisioning tasks share the sameDeploymentStateJsonObject throughProvisioningContextand access it concurrently without synchronization.The race condition occurs when:
jsonObj[key]and getnullInvalidOperationException: Collection was modifiedorKeyNotFoundExceptionImplementation Plan
Changes Made
Modified
src/Aspire.Hosting.Azure/Provisioning/ProvisioningContext.cs_deploymentStateLockobjectWithDeploymentState(Action<JsonObject>)method for synchronized state accessWithDeploymentState<T>(Func<JsonObject, T>)method for synchronized state readsModified
src/Aspire.Hosting.Azure/Provisioning/Provisioners/BicepProvisioner.cscontext.WithDeploymentState()src/Aspire.Hosting.Azure/Provisioning/JsonExtensions.csUpdated
tests/Aspire.Hosting.Azure.Tests/JsonExtensionsTests.csUpdated
tests/Aspire.Hosting.Azure.Tests/ProvisioningContextTests.csWithDeploymentState_ConcurrentAccess_IsThreadSafetestWithDeploymentState_Action_ExecutesSuccessfullytestWithDeploymentState_Func_ReturnsValuetestWithDeploymentState_ConcurrentReadsAndWrites_MaintainsConsistencytestTest Results
All tests pass consistently:
Design Rationale
The approach is superior because:
This addresses the feedback that making Prop thread-safe "just kicks the problem down to the next thing that breaks when multiple threads are reading/writing JSON at the same time."
Original prompt
This section details on the original issue you should resolve
<issue_title>[Failing test]: Aspire.Hosting.Azure.Tests.AzureDeployerTests fail on Azure DevOps:
the given key 'AZURE_CONTAINER_REGISTRY_NAME' was not present in the dictionary.</issue_title><issue_description>### Is there an existing issue for this?
Build information
Build:
Build error leg or test failing:
Aspire.Hosting.Azure.Tests.AzureDeployerTests
Pull Request: N/A
Fill in the error message template
Error Message
Multiple tests in
Aspire.Hosting.Azure.Tests.AzureDeployerTestsare failing exclusively on Azure DevOps (passing on GitHub Actions workflows) withKeyNotFoundExceptionwhen trying to access Azure resource configuration keys.Primary Failures (KeyNotFoundException)
All 7 tests fail with the same root cause - missing Azure resource configuration keys:
DeployAsync_WithMultipleComputeEnvironments_Works(line 370) - Missing:AZURE_CONTAINER_REGISTRY_NAMEDeployAsync_WithContainer_Works(line 188) - Missing:AZURE_CONTAINER_REGISTRY_NAMEDeployAsync_WithProjectResource_Works(line 291) - Missing:AZURE_CONTAINER_REGISTRY_NAMEDeployAsync_WithDockerfile_Works(line 235) - Missing:AZURE_CONTAINER_REGISTRY_NAMEDeployAsync_WithOnlyAzureResources_PrintsDashboardUrl(line 639) - Missing:AZURE_CONTAINER_APPS_ENVIRONMENT_DEFAULT_DOMAINDeployAsync_WithSingleRedisCache_CallsDeployingComputeResources(line 589) - Missing:AZURE_CONTAINER_REGISTRY_NAMEDeployAsync_WithAzureFunctionsProject_Works(line 891) - Missing:AZURE_CONTAINER_REGISTRY_NAMESecondary Failures (Missing Project Files)
2 additional tests failed due to missing test project files:
PublishAsAzureAppServiceWebsite_ThrowsIfNoEnvironmentMultipleAzureAppServiceEnvironmentsSupportedStack Trace
Similar stack traces occur for other failing tests at different line numbers (188, 235, 291, 589, 639, 891).
Notable Environment Differences (AzDO vs GitHub Actions)
Could not harvest all abandoned containers {"Error": "exec: \"docker\": executable file not found in %PATH%"}Other info
Test Context and Environment
The tests are using mock Azure provisioning with test credentials:
info: Aspire.Hosting.Azure.Provisioning.Internal.PublishModeProvisioningContextProvider[0]
Default subscription: Test Subscription (/subscriptions/12345678-1234-1234-1234-123456789012)
info: Aspire.Hosting.Azure.Provisioning.Internal.PublishModeProvisioningContextProvider[0]
Tenant: 87654321-4321-4321-4321-210987654321
info: Aspire.Hosting.Azure.Provisioning.Internal.PublishModeProvisioningContextProvider[0]
Using existing resource group test-rg.
Analysis
Root Cause: The tests expect Azure resource configuration keys (primarily
AZURE_CONTAINER_REGISTRY_NAMEandAZURE_CONTAINER_APPS_ENVIRONMENT_DEFAULT_DOMAIN) to be present in a dictionary during the deployment provisioning phase, but these keys are not being populated in the AzDO/Helix environment.Report
Summary
<agent_instructions>This seems to be a race condition. It fails only some times. Debug the test, and figure out how to reproduce the issue. Once you can reproduce, figure out how to fix the problem.</agent_instructions>
Comments on the Issue (you are @copilot in this section)
@radical cc @mitchdenny @eerhardt @captainsafia The fact that these ...Fixes #11728
💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.