-
Notifications
You must be signed in to change notification settings - Fork 526
[Internal] Per Partition Automatic Failover: Fixes Metadata Requests Retry Policy #4205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Internal] Per Partition Automatic Failover: Fixes Metadata Requests Retry Policy #4205
Conversation
…gateway timeouts.
Microsoft.Azure.Cosmos/src/HttpClient/HttpTimeoutPolicyDefault.cs
Outdated
Show resolved
Hide resolved
Microsoft.Azure.Cosmos/src/HttpClient/HttpTimeoutPolicyDefault.cs
Outdated
Show resolved
Hide resolved
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for few small issues
…quests_on_gateway_timeouts
Microsoft.Azure.Cosmos/src/HttpClient/HttpTimeoutPolicyMetadataRead.cs
Outdated
Show resolved
Hide resolved
Microsoft.Azure.Cosmos/src/HttpClient/HttpTimeoutPolicyMetadataRead.cs
Outdated
Show resolved
Hide resolved
ealsur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting on the policy behavior that marks the region as unavailable on 503s. We explicitly document publicly we do not do this: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/troubleshoot-sdk-availability#transient-connectivity-issues-on-tcp-protocol
ealsur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unblocking the PR as changes are coming
…ation index. Addressed review comments.
…quests_on_gateway_timeouts
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now
Microsoft.Azure.Cosmos/src/MetadataRequestThrottleRetryPolicy.cs
Outdated
Show resolved
Hide resolved
2f427e3
xinlian12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - Thanks
Pull Request Template
Description
Background:
The Cosmos .NET V3 SDK should attempt to retry on to another region for fetching the collection information (Read Collection call) or partition key ranges information (Get PkRanges call), if the master partition of the primary region is in complete quorum loss. However, this is not happening today reason being the request to the routing gateway takes more than
65seconds to respond back, thus timing out the SDK request. The SDK makes3retries, each of which times out within65seconds. Today, our .NET v3 SDK doesn't retry on gateway timeouts (on TaskCancelled exceptions), thus if the metadata information is not retrieved, then the SDK is stuck to get initialized.Account Setup For 3 regions : Create a cosmos account with 3 regions, P1 (Write), P2 (Read) and P3 (Read). The PPAF configuration from the BE is to failover to P2, in case P1 is unavailable.
Scenario: While creating the cosmos client, in the application preferred region, provide P1, P2 and P3 as preferred regions.
Before initializing the cosmos client, use the service fabric commands to trigger a "full quorum loss" on the master partition.
Current Behavior: The SDK keeps retrying on the region P1 for reading the collection information and times out eventually. To understand this better, take a look at the below diagnostics:
Diagnostics Snippet - Scenario: Master Partitions are in Complete Quorum Loss.
{ "name": "Read Collection", "duration in milliseconds": 196034.3648, "data": { "Client Side Request Stats": { "Id": "AggregatedClientSideRequestStatistics", "ContactedReplicas": [], "RegionsContacted": [], "FailedReplicas": [], "AddressResolutionStatistics": [], "StoreResponseStatistics": [], "HttpResponseStats": [ { "StartTimeUTC": "2023-10-20T21:57:22.9028944Z", "DurationInMs": 65009.841, "RequestUri": "https://dkundatest-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments", "ResourceType": "Collection", "HttpMethod": "GET", "ActivityId": "e0029295-d533-4e5f-8d1d-9f0957abe01f", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2023-10-20T21:58:27.9128502Z", "DurationInMs": 65008.3658, "RequestUri": "https://dkundatest-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments", "ResourceType": "Collection", "HttpMethod": "GET", "ActivityId": "e0029295-d533-4e5f-8d1d-9f0957abe01f", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2023-10-20T21:59:33.9271150Z", "DurationInMs": 65010.0114, "RequestUri": "https://dkundatest-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments", "ResourceType": "Collection", "HttpMethod": "GET", "ActivityId": "e0029295-d533-4e5f-8d1d-9f0957abe01f", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." } ] } } }Expected Behavior/ Acceptance Criteria:
Ideally, the above setup should have worked and the SDK should have retried to the region P2 to get the collection information from the gateway. Note that this behavior is expected irrespective to the fact that per partition automatic failover is enabled or not.
Scope:
CosmosClientset inDirectmode.CosmosClient, when all the metadata information are needed to be fetched and cached.High Level Changes:
TaskCancelledExceptionwhich is eventually wrapped in aCosmosExceptionfrom theCosmosHttpClient. The idea is to extend the retry policy to retry on CosmosExceptions.MetadataRequestThrottleRetryPolicy, which is a wrapper around theRequestThrottleRetryPolicyand particularly handles all of the metadata requests. The purpose is to mark an endpoint unavailable for read, when a gateway timeout occurs, so that the next retry could happen on another region.Design Approach:
Current Flow:
sequenceDiagram participant J as ClientRetryPolicy <br> [v3 Code] participant A as ConsistencyWriter <br> [Direct Code] participant B as AddressSelector <br> [Direct Code] participant C as GlobalAddressResolver <br> [v3 Code] participant D as AddressResolver <br> [v3 Code] participant E as PartitionKeyRangeCache <br> [v3 Code] participant K as ResourceThrottleRetryPolicy <br> [v3 Code] participant F as GatewayStoreModel <br> [v3 Code] participant G as GatewayAddressCache <br> [v3 Code] participant H as GatewayStoreClient <br> [v3 Code] participant I as CosmosHttpClient <br> [v3 Code] J-->>A: 1. DocumentService<br>Request loop Retry Iterations with force-refresh flag = false/ true A->>B: 2. ResolveAddressesAsync() B->>C: 3. ResolveAsync() C->>D: 4. ResolveAsync() D->>D: 5. ResolveAddressesAndIdentityAsync() D->>E: 6. TryLookupAsync() E->>E: 7. GetRoutingMapFor<br>CollectionAsync() E->>E: 8. ExecutePartitionKey<br>RangeReadChangeFeedAsync() E->>F: 9. ProcessMessageAsync() F->>H: 10. InvokeAsync() H->>I: 11. SendHttpAsync() Note over K: PartitionKeyRangeCache uses the <br> ResourceThrottleRetryPolicy to <br> retry the GET PkRanges call I-->>K: 12. CosmosException with 503 on timeout K->>K: 13. ShouldRetryAsync() <br> = RetryResult.NoRetry() end K-->>J: 14. CosmosException with 503 on timeout J->>J: 15. RetryAfter <br> (TimeSpan.Zero) D->>G: 16. TryGetAddressesAsync() G->>I: 17. GetAsync()Sample Diagnostics with Current Flow:
Scenario: Master Partitions are in Complete Quorum Loss.
{ "Summary": { "GatewayCalls": { "(0, 0)": 9, "(200, 0)": 1 } }, "name": "CreateItemAsync", "start datetime": "2023-11-09T23:51:23.579Z", "duration in milliseconds": 168399.8519, "data": { "Client Configuration": { "Client Created Time Utc": "2023-11-09T23:51:23.5574825Z", "MachineId": "vmId:9e487c80-ca5c-4cca-9076-1210bb14fa05", "VM Region": "eastus", "NumberOfClientsCreated": 1, "NumberOfActiveClients": 1, "ConnectionMode": "Direct", "User Agent": "cosmos-netstandard-sdk/3.36.0|1|X64|Microsoft Windows 10.0.22631|.NET 6.0.24|L|", "ConnectionConfig": { "gw": "(cps:50, urto:5, p:False, httpf: False)", "rntbd": "(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)", "other": "(ed:False, be:False)" }, "ConsistencyConfig": "(consistency: NotSet, prgns:[North Central US, East Asia], apprgn: )", "ProcessorCount": 16 } }, "children": [ { "name": "ItemSerialize", "duration in milliseconds": 3.7082 }, { "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler", "duration in milliseconds": 168385.8862, "children": [ { "name": "Waiting for Initialization of client to complete", "duration in milliseconds": 5895.6745 }, { "name": "Get Collection Cache", "duration in milliseconds": 0.006 }, { "name": "Read Collection", "duration in milliseconds": 19066.7017, "data": { "Client Side Request Stats": { "Id": "AggregatedClientSideRequestStatistics", "ContactedReplicas": [], "RegionsContacted": [], "FailedReplicas": [], "AddressResolutionStatistics": [], "StoreResponseStatistics": [], "HttpResponseStats": [ { "StartTimeUTC": "2023-11-09T23:51:29.5225663Z", "DurationInMs": 3007.5937, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments", "ResourceType": "Collection", "HttpMethod": "GET", "ActivityId": "f1f6707f-a56f-4489-8831-df39c8c2ec8d", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2023-11-09T23:51:32.5303753Z", "DurationInMs": 5025.083, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments", "ResourceType": "Collection", "HttpMethod": "GET", "ActivityId": "f1f6707f-a56f-4489-8831-df39c8c2ec8d", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2023-11-09T23:51:38.5623660Z", "DurationInMs": 10015.7329, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments", "ResourceType": "Collection", "HttpMethod": "GET", "ActivityId": "f1f6707f-a56f-4489-8831-df39c8c2ec8d", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." } ] } } }, { "name": "Read Collection", "duration in milliseconds": 233.1066, "data": { "Client Side Request Stats": { "Id": "AggregatedClientSideRequestStatistics", "ContactedReplicas": [], "RegionsContacted": [], "FailedReplicas": [], "AddressResolutionStatistics": [], "StoreResponseStatistics": [], "HttpResponseStats": [ { "StartTimeUTC": "2023-11-09T23:51:48.5879667Z", "DurationInMs": 211.8006, "RequestUri": "https://dkppaf-eastasia.documents-test.windows-int.net/dbs/comments/colls/comments", "ResourceType": "Collection", "HttpMethod": "GET", "ActivityId": "63fe7148-3895-44b1-9f51-ed0c97b5b9a8", "StatusCode": "OK" } ] } } }, { "name": "Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler", "duration in milliseconds": 143152.8688, "data": { "System Info": { "systemHistory": [ { "dateUtc": "2023-11-09T23:53:18.6722453Z", "cpu": 2.282, "memory": 47082876, "threadInfo": { "isThreadStarving": "False", "threadWaitIntervalInMs": 0.0768, "availableThreads": 32765, "minThreads": 16, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 }, { "dateUtc": "2023-11-09T23:53:28.6792048Z", "cpu": 2.108, "memory": 47083920, "threadInfo": { "isThreadStarving": "False", "threadWaitIntervalInMs": 0.0628, "availableThreads": 32765, "minThreads": 16, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 }, { "dateUtc": "2023-11-09T23:53:38.6920165Z", "cpu": 2.264, "memory": 47087508, "threadInfo": { "isThreadStarving": "False", "threadWaitIntervalInMs": 0.2903, "availableThreads": 32765, "minThreads": 16, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 }, { "dateUtc": "2023-11-09T23:53:48.7054952Z", "cpu": 3.256, "memory": 47122948, "threadInfo": { "isThreadStarving": "False", "threadWaitIntervalInMs": 0.062, "availableThreads": 32765, "minThreads": 16, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 }, { "dateUtc": "2023-11-09T23:53:58.7185744Z", "cpu": 3.471, "memory": 47119588, "threadInfo": { "isThreadStarving": "False", "threadWaitIntervalInMs": 0.0841, "availableThreads": 32765, "minThreads": 16, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 }, { "dateUtc": "2023-11-09T23:54:08.7322754Z", "cpu": 2.887, "memory": 47115352, "threadInfo": { "isThreadStarving": "False", "threadWaitIntervalInMs": 0.055, "availableThreads": 32765, "minThreads": 16, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 } ] } }, "children": [ { "name": "Microsoft.Azure.Cosmos.Handlers.TelemetryHandler", "duration in milliseconds": 143152.268, "children": [ { "name": "Microsoft.Azure.Cosmos.Handlers.RetryHandler", "duration in milliseconds": 143151.2067, "children": [ { "name": "Microsoft.Azure.Cosmos.Handlers.RouterHandler", "duration in milliseconds": 71588.955, "children": [ { "name": "Microsoft.Azure.Cosmos.Handlers.TransportHandler", "duration in milliseconds": 71588.4398, "children": [ { "name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request", "duration in milliseconds": 71585.6993, "data": { "Client Side Request Stats": { "Id": "AggregatedClientSideRequestStatistics", "ContactedReplicas": [], "RegionsContacted": [], "FailedReplicas": [], "AddressResolutionStatistics": [], "StoreResponseStatistics": [], "HttpResponseStats": [ { "StartTimeUTC": "2023-11-09T23:51:48.8523547Z", "DurationInMs": 513.076, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "3f88c75b-1699-4df7-84fb-c32f18295fc8", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2023-11-09T23:51:49.3655972Z", "DurationInMs": 5005.4728, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "3f88c75b-1699-4df7-84fb-c32f18295fc8", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2023-11-09T23:51:55.3768757Z", "DurationInMs": 65019.4496, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "3f88c75b-1699-4df7-84fb-c32f18295fc8", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." } ] } } } ] } ] }, { "name": "Microsoft.Azure.Cosmos.Handlers.RouterHandler", "duration in milliseconds": 71558.0089, "children": [ { "name": "Microsoft.Azure.Cosmos.Handlers.TransportHandler", "duration in milliseconds": 71558.0034, "children": [ { "name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request", "duration in milliseconds": 71557.9089, "data": { "Client Side Request Stats": { "Id": "AggregatedClientSideRequestStatistics", "ContactedReplicas": [], "RegionsContacted": [], "FailedReplicas": [], "AddressResolutionStatistics": [], "StoreResponseStatistics": [], "HttpResponseStats": [ { "StartTimeUTC": "2023-11-09T23:53:00.4191476Z", "DurationInMs": 520.8107, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "39c6f277-ca6e-4bf4-962d-d74d3662f832", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2023-11-09T23:53:00.9400252Z", "DurationInMs": 5014.2824, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "39c6f277-ca6e-4bf4-962d-d74d3662f832", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2023-11-09T23:53:06.9615907Z", "DurationInMs": 65012.6398, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "39c6f277-ca6e-4bf4-962d-d74d3662f832", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." } ] } } } ] } ] } ] } ] } ] } ] } ] }Proposed Flow:
sequenceDiagram participant J as ClientRetryPolicy <br> [v3 Code] participant A as ConsistencyWriter <br> [Direct Code] participant B as AddressSelector <br> [Direct Code] participant C as GlobalAddressResolver <br> [v3 Code] participant D as AddressResolver <br> [v3 Code] participant E as PartitionKeyRangeCache <br> [v3 Code] participant K as MetadataRequestThrottleRetryPolicy <br> [v3 Code] participant F as GatewayStoreModel <br> [v3 Code] participant G as GatewayAddressCache <br> [v3 Code] participant H as GatewayStoreClient <br> [v3 Code] participant I as CosmosHttpClient <br> [v3 Code] participant L as GlobalEndpointManager <br> [v3 Code] participant M as TransportClient <br> [Direct Code] J-->>A: 1. DocumentService<br>Request loop Retry Iterations with force-refresh flag = false/ true A->>B: 2. ResolveAddressesAsync() B->>C: 3. ResolveAsync() C->>D: 4. ResolveAsync() D->>D: 5. ResolveAddressesAndIdentityAsync() D->>E: 6. TryLookupAsync() E->>E: 7. GetRoutingMapFor<br>CollectionAsync() E->>E: 8. ExecutePartitionKey<br>RangeReadChangeFeedAsync() E->>K: 9. OnBeforeSendRequest() K->>K: 10. Sets requestContext.RouteToLocation() <br> using current location index K->>L: 11. ResolveServiceEndpoint() L-->>K: 12. Resolve and save current endpoint E->>F: 13. ProcessMessageAsync(requestContext) F->>H: 14. InvokeAsync() H->>I: 15. SendHttpAsync() Note over K: PartitionKeyRangeCache uses the <br> MetadataRequestThrottleRetryPolicy <br> to retry the GET PkRanges call. <br> OperationType = ReadFeed. <br> ResourceType = PartitionKeyRange I-->>K: 16. CosmosException with 503 I-->>D: 22. Successful Response in second attempt. K->>K: 17. ShouldRetryAsync() kicks in. K->>K: 18. IncrementRetryIndexOnService<br>UnavailableForMetadataRead().<br> Increments the next location index. K->>K: 19. Retry the request <br> on the next location index. end K-->>J: 20. CosmosException with 503 J->>J: 21. RetryAfter <br> (TimeSpan.Zero) D->>G: 23. TryGetAddressesAsync() G->>I: 24. GetAsync() I-->>G: 25. Gets Address Information G-->>D: 26. Gets Address Information D-->>A: 27. Gets PerProtocolPartitionAddressInformation A->>M: 28. InvokeResourceOperationAsync(primaryUri)Complete Diagnostics After the Above Design Changes:
Scenario: Master Partitions are in Complete Quorum Loss.
{ "Summary": { "DirectCalls": { "(201, 0)": 1, "(204, 0)": 2 }, "GatewayCalls": { "(0, 0)": 7, "(200, 0)": 3, "(304, 0)": 1 } }, "name": "CreateItemAsync", "start datetime": "2024-01-09T00:07:55.674Z", "duration in milliseconds": 52766.2797, "data": { "Client Configuration": { "Client Created Time Utc": "2024-01-09T00:07:55.6480544Z", "MachineId": "hashedMachineName:e04049ca-e421-555f-21ff-492417ecaf52", "NumberOfClientsCreated": 1, "NumberOfActiveClients": 1, "ConnectionMode": "Direct", "User Agent": "cosmos-netstandard-sdk/3.37.1|1|X64|Microsoft Windows 10.0.22635|.NET 6.0.25|L|", "ConnectionConfig": { "gw": "(cps:50, urto:500, p:False, httpf: False)", "rntbd": "(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)", "other": "(ed:False, be:False)" }, "ConsistencyConfig": "(consistency: NotSet, prgns:[North Central US, East Asia], apprgn: )", "ProcessorCount": 8 } }, "children": [ { "name": "ItemSerialize", "duration in milliseconds": 4.4378 }, { "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler", "duration in milliseconds": 52747.9847, "children": [ { "name": "Waiting for Initialization of client to complete", "duration in milliseconds": 5756.9445 }, { "name": "Get Collection Cache", "duration in milliseconds": 0.0043 }, { "name": "Read Collection", "duration in milliseconds": 15549.0829, "data": { "Client Side Request Stats": { "Id": "AggregatedClientSideRequestStatistics", "ContactedReplicas": [], "RegionsContacted": [], "FailedReplicas": [], "AddressResolutionStatistics": [], "StoreResponseStatistics": [], "HttpResponseStats": [ { "StartTimeUTC": "2024-01-09T00:08:01.4894987Z", "DurationInMs": 503.7343, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments", "ResourceType": "Collection", "HttpMethod": "GET", "ActivityId": "465bce3f-a83e-4ba2-9f2a-8d21ae3bcc91", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2024-01-09T00:08:01.9935045Z", "DurationInMs": 5021.4947, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments", "ResourceType": "Collection", "HttpMethod": "GET", "ActivityId": "465bce3f-a83e-4ba2-9f2a-8d21ae3bcc91", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2024-01-09T00:08:08.0141416Z", "DurationInMs": 8980.8771, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments", "ResourceType": "Collection", "HttpMethod": "GET", "ActivityId": "465bce3f-a83e-4ba2-9f2a-8d21ae3bcc91", "StatusCode": "OK" } ] } } }, { "name": "Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler", "duration in milliseconds": 31401.0146, "data": { "System Info": { "systemHistory": [ { "dateUtc": "2024-01-09T00:08:00.6690003Z", "cpu": 3.483, "memory": 26931268, "threadInfo": { "isThreadStarving": "no info", "availableThreads": 32765, "minThreads": 8, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 }, { "dateUtc": "2024-01-09T00:08:10.6887491Z", "cpu": 3.466, "memory": 26936596, "threadInfo": { "isThreadStarving": "False", "threadWaitIntervalInMs": 0.2319, "availableThreads": 32765, "minThreads": 8, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 }, { "dateUtc": "2024-01-09T00:08:20.7009190Z", "cpu": 2.91, "memory": 26936124, "threadInfo": { "isThreadStarving": "False", "threadWaitIntervalInMs": 0.0865, "availableThreads": 32765, "minThreads": 8, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 }, { "dateUtc": "2024-01-09T00:08:30.7059013Z", "cpu": 5.189, "memory": 27004888, "threadInfo": { "isThreadStarving": "False", "threadWaitIntervalInMs": 0.2459, "availableThreads": 32765, "minThreads": 8, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 }, { "dateUtc": "2024-01-09T00:08:40.7178070Z", "cpu": 3.105, "memory": 26952512, "threadInfo": { "isThreadStarving": "False", "threadWaitIntervalInMs": 0.1425, "availableThreads": 32765, "minThreads": 8, "maxThreads": 32767 }, "numberOfOpenTcpConnection": 0 } ] } }, "children": [ { "name": "Microsoft.Azure.Cosmos.Handlers.TelemetryHandler", "duration in milliseconds": 31400.1408, "children": [ { "name": "Microsoft.Azure.Cosmos.Handlers.RetryHandler", "duration in milliseconds": 31398.8421, "children": [ { "name": "Microsoft.Azure.Cosmos.Handlers.RouterHandler", "duration in milliseconds": 31396.233, "children": [ { "name": "Microsoft.Azure.Cosmos.Handlers.TransportHandler", "duration in milliseconds": 31395.5602, "children": [ { "name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request", "duration in milliseconds": 31392.7365, "data": { "Client Side Request Stats": { "Id": "AggregatedClientSideRequestStatistics", "ContactedReplicas": [ { "Count": 3, "Uri": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14300/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859690p/" }, { "Count": 1, "Uri": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14001/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859692s/" }, { "Count": 1, "Uri": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14003/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533713419672s/" }, { "Count": 1, "Uri": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14302/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859691s/" } ], "RegionsContacted": [], "FailedReplicas": [], "AddressResolutionStatistics": [ { "StartTimeUTC": "2024-01-09T00:08:34.0155575Z", "EndTimeUTC": "2024-01-09T00:08:47.6666311Z", "TargetEndpoint": "https://dkppaf-northcentralus.documents-test.windows-int.net//addresses/?$resolveFor=dbs%2fUD9beQ%3d%3d%2fcolls%2fUD9beckK-kw%3d%2fdocs&$filter=protocol eq rntbd&$partitionKeyRangeIds=0" } ], "StoreResponseStatistics": [ { "ResponseTimeUTC": "2024-01-09T00:08:48.2157795Z", "ResourceType": "Document", "OperationType": "Create", "LocationEndpoint": "https://dkppaf-northcentralus.documents-test.windows-int.net/", "StoreResult": { "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4", "StatusCode": "Created", "SubStatusCode": "Unknown", "LSN": 11465, "PartitionKeyRangeId": "0", "GlobalCommittedLSN": 11464, "ItemLSN": -1, "UsingLocalLSN": false, "QuorumAckedLSN": 11464, "SessionToken": "-1#11465", "CurrentWriteQuorum": 3, "CurrentReplicaSetSize": 4, "NumberOfReadRegions": 2, "IsValid": true, "StorePhysicalAddress": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14300/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859690p/", "RequestCharge": 7.05, "RetryAfterInMs": null, "BELatencyInMs": "5.163", "ReplicaHealthStatuses": [ "(port: 14300 | status: Connected | lkt: 1/9/2024 12:08:48 AM)" ], "transportRequestTimeline": { "requestTimeline": [ { "event": "Created", "startTimeUtc": "2024-01-09T00:08:47.6949569Z", "durationInMs": 7.3881 }, { "event": "ChannelAcquisitionStarted", "startTimeUtc": "2024-01-09T00:08:47.7023450Z", "durationInMs": 374.6118 }, { "event": "Pipelined", "startTimeUtc": "2024-01-09T00:08:48.0769568Z", "durationInMs": 26.7 }, { "event": "Transit Time", "startTimeUtc": "2024-01-09T00:08:48.1036568Z", "durationInMs": 91.3586 }, { "event": "Received", "startTimeUtc": "2024-01-09T00:08:48.1950154Z", "durationInMs": 12.1271 }, { "event": "Completed", "startTimeUtc": "2024-01-09T00:08:48.2071425Z", "durationInMs": 0 } ], "serviceEndpointStats": { "inflightRequests": 1, "openConnections": 1 }, "connectionStats": { "waitforConnectionInit": "True", "callsPendingReceive": 0, "lastSendAttempt": "2024-01-09T00:08:47.9935715Z", "lastSend": "2024-01-09T00:08:47.9945759Z", "lastReceive": "2024-01-09T00:08:48.0668450Z" }, "requestSizeInBytes": 1176, "requestBodySizeInBytes": 727, "responseMetadataSizeInBytes": 450, "responseBodySizeInBytes": 935 }, "TransportException": null } }, { "ResponseTimeUTC": "2024-01-09T00:08:48.3154370Z", "ResourceType": "Collection", "OperationType": "Head", "LocationEndpoint": "https://dkppaf-northcentralus.documents-test.windows-int.net/", "StoreResult": { "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4", "StatusCode": "NoContent", "SubStatusCode": "Unknown", "LSN": 11465, "PartitionKeyRangeId": "0", "GlobalCommittedLSN": 11464, "ItemLSN": 1, "UsingLocalLSN": false, "QuorumAckedLSN": 11465, "SessionToken": "-1#11465", "CurrentWriteQuorum": 3, "CurrentReplicaSetSize": 4, "NumberOfReadRegions": 2, "IsValid": true, "StorePhysicalAddress": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14300/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859690p/", "RequestCharge": 0, "RetryAfterInMs": null, "BELatencyInMs": "0.403", "ReplicaHealthStatuses": [ "(port: 14300 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)", "(port: 14001 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)", "(port: 14003 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)", "(port: 14302 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)" ], "transportRequestTimeline": { "requestTimeline": [ { "event": "Created", "startTimeUtc": "2024-01-09T00:08:48.2418862Z", "durationInMs": 0.0984 }, { "event": "ChannelAcquisitionStarted", "startTimeUtc": "2024-01-09T00:08:48.2419846Z", "durationInMs": 0.3812 }, { "event": "Pipelined", "startTimeUtc": "2024-01-09T00:08:48.2423658Z", "durationInMs": 4.8247 }, { "event": "Transit Time", "startTimeUtc": "2024-01-09T00:08:48.2471905Z", "durationInMs": 67.4608 }, { "event": "Received", "startTimeUtc": "2024-01-09T00:08:48.3146513Z", "durationInMs": 0.1506 }, { "event": "Completed", "startTimeUtc": "2024-01-09T00:08:48.3148019Z", "durationInMs": 0 } ], "serviceEndpointStats": { "inflightRequests": 1, "openConnections": 1 }, "connectionStats": { "waitforConnectionInit": "False", "callsPendingReceive": 0, "lastSendAttempt": "2024-01-09T00:08:48.1032923Z", "lastSend": "2024-01-09T00:08:48.1036567Z", "lastReceive": "2024-01-09T00:08:48.1913953Z" }, "requestSizeInBytes": 424, "responseMetadataSizeInBytes": 295 }, "TransportException": null } }, { "ResponseTimeUTC": "2024-01-09T00:08:48.4046844Z", "ResourceType": "Collection", "OperationType": "Head", "LocationEndpoint": "https://dkppaf-northcentralus.documents-test.windows-int.net/", "StoreResult": { "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4", "StatusCode": "NoContent", "SubStatusCode": "Unknown", "LSN": 11465, "PartitionKeyRangeId": "0", "GlobalCommittedLSN": 11465, "ItemLSN": 1, "UsingLocalLSN": false, "QuorumAckedLSN": 11465, "SessionToken": "-1#11465", "CurrentWriteQuorum": 3, "CurrentReplicaSetSize": 4, "NumberOfReadRegions": 2, "IsValid": true, "StorePhysicalAddress": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14300/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859690p/", "RequestCharge": 0, "RetryAfterInMs": null, "BELatencyInMs": "0.177", "ReplicaHealthStatuses": [ "(port: 14300 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)", "(port: 14001 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)", "(port: 14003 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)", "(port: 14302 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)" ], "transportRequestTimeline": { "requestTimeline": [ { "event": "Created", "startTimeUtc": "2024-01-09T00:08:48.3379270Z", "durationInMs": 0.0068 }, { "event": "ChannelAcquisitionStarted", "startTimeUtc": "2024-01-09T00:08:48.3379338Z", "durationInMs": 0.0075 }, { "event": "Pipelined", "startTimeUtc": "2024-01-09T00:08:48.3379413Z", "durationInMs": 0.2067 }, { "event": "Transit Time", "startTimeUtc": "2024-01-09T00:08:48.3381480Z", "durationInMs": 66.3662 }, { "event": "Received", "startTimeUtc": "2024-01-09T00:08:48.4045142Z", "durationInMs": 0.1483 }, { "event": "Completed", "startTimeUtc": "2024-01-09T00:08:48.4046625Z", "durationInMs": 0 } ], "serviceEndpointStats": { "inflightRequests": 1, "openConnections": 1 }, "connectionStats": { "waitforConnectionInit": "False", "callsPendingReceive": 0, "lastSendAttempt": "2024-01-09T00:08:48.2470408Z", "lastSend": "2024-01-09T00:08:48.2471906Z", "lastReceive": "2024-01-09T00:08:48.3146214Z" }, "requestSizeInBytes": 424, "responseMetadataSizeInBytes": 295 }, "TransportException": null } } ], "HttpResponseStats": [ { "StartTimeUTC": "2024-01-09T00:08:17.0721705Z", "DurationInMs": 519.3992, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "7f8bcf09-0acc-41b9-aa31-2c3c6193041b", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2024-01-09T00:08:17.5916268Z", "DurationInMs": 5015.3219, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "7f8bcf09-0acc-41b9-aa31-2c3c6193041b", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2024-01-09T00:08:23.6230150Z", "DurationInMs": 10013.8951, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "7f8bcf09-0acc-41b9-aa31-2c3c6193041b", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2024-01-09T00:08:33.6579613Z", "DurationInMs": 165.3261, "RequestUri": "https://dkppaf-eastasia.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "e9ad8c72-40b6-4528-bd30-732f45ca474e", "StatusCode": "OK" }, { "StartTimeUTC": "2024-01-09T00:08:33.8354862Z", "DurationInMs": 157.063, "RequestUri": "https://dkppaf-eastasia.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges", "ResourceType": "PartitionKeyRange", "HttpMethod": "GET", "ActivityId": "bda47b29-44e7-4340-b4a2-670868a72096", "StatusCode": "NotModified", "ReasonPhrase": "Not Modified" }, { "StartTimeUTC": "2024-01-09T00:08:34.0172521Z", "DurationInMs": 503.2223, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net//addresses/?$resolveFor=dbs%2fUD9beQ%3d%3d%2fcolls%2fUD9beckK-kw%3d%2fdocs&$filter=protocol eq rntbd&$partitionKeyRangeIds=0", "ResourceType": "Document", "HttpMethod": "GET", "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2024-01-09T00:08:34.5205260Z", "DurationInMs": 5012.3242, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net//addresses/?$resolveFor=dbs%2fUD9beQ%3d%3d%2fcolls%2fUD9beckK-kw%3d%2fdocs&$filter=protocol eq rntbd&$partitionKeyRangeIds=0", "ResourceType": "Document", "HttpMethod": "GET", "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4", "ExceptionType": "System.Threading.Tasks.TaskCanceledException", "ExceptionMessage": "The operation was canceled." }, { "StartTimeUTC": "2024-01-09T00:08:40.5337065Z", "DurationInMs": 7131.9352, "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net//addresses/?$resolveFor=dbs%2fUD9beQ%3d%3d%2fcolls%2fUD9beckK-kw%3d%2fdocs&$filter=protocol eq rntbd&$partitionKeyRangeIds=0", "ResourceType": "Document", "HttpMethod": "GET", "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4", "StatusCode": "OK" } ] } } } ] } ] } ] } ] } ] } ] }, { "name": "Response Serialization", "duration in milliseconds": 1.1941 } ] }Type of change
Please delete options that are not relevant.
Closing issues
To automatically close an issue: closes #4181