forked from Azure-Samples/art-voice-agent-accelerator
Open
Description
Summary
- Cosmos dependencies: audioagentcollection (76 calls, 14 failures, p95 60s) and users (14 calls, 6 failures, p95 120s). Failures correlate with identity issues and long SDK timeouts.
Severity & SLA risk
- sev3 (elevated latency and intermittent failures)
Detection
- App Insights dependency telemetry shows the failure and latency metrics listed in the summary.
- Exceptions include upsert failures tied to prior token issues.
Impacted components
- Cosmos DB client configuration; partitioning/RU provisioning.
Suspected cause
- Long default timeouts and retry budgets; possibly hot partition/RU throttling. Confidence: medium.
Recommended actions
- Set a bounded requestTimeout (e.g., 5–10s) and an overall retry policy
- Enable fast-fail on cancellation from upstream
- Verify partition keys and RU/s; increase or autoscale where needed
- Add diagnostics capturing RU charge/throttle codes
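The first two actions can be sketched independently of any SDK. The wrapper below is a minimal illustration, not the Cosmos SDK's own API: `call_with_budget`, `RetryBudgetExceeded`, and all parameter names and defaults are hypothetical. It assumes the Cosmos operation is exposed as an async callable, caps each attempt with a timeout, caps all attempts with one overall deadline, and lets an upstream cancellation propagate immediately instead of being retried.

```python
import asyncio
import time


class RetryBudgetExceeded(Exception):
    """Raised when the attempts or the overall time budget run out."""


async def call_with_budget(op, *, attempt_timeout=5.0, total_budget=10.0,
                           max_attempts=3, backoff=0.2):
    """Await op() with a per-attempt timeout and an overall deadline.

    `op` is a zero-argument callable returning a fresh awaitable per attempt
    (hypothetical stand-in for a Cosmos read or upsert).
    """
    deadline = time.monotonic() + total_budget
    last_error = None
    for attempt in range(1, max_attempts + 1):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Never wait longer than what is left of the overall budget.
            return await asyncio.wait_for(
                op(), timeout=min(attempt_timeout, remaining))
        except asyncio.TimeoutError as exc:
            # Only timeouts are retried here. asyncio.CancelledError is not
            # caught, so cancellation from upstream fails fast.
            last_error = exc
        if attempt < max_attempts:
            pause = min(backoff * attempt, max(0.0, deadline - time.monotonic()))
            await asyncio.sleep(pause)
    raise RetryBudgetExceeded(
        f"gave up within {total_budget}s after up to {max_attempts} attempts"
    ) from last_error
```

With the defaults shown, a call can take at most about 10s in total, rather than the 60–120s seen in the dependency telemetry. Which errors besides timeouts should be retried (for example throttling responses) is left open here and depends on the SDK version in use.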
Acceptance criteria
- p95 < 400ms, p99 < 1s on key operations
- Dependency failure rate <1%
- No prolonged 60–120s Cosmos calls
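One way to make these criteria checkable is a small script over exported dependency durations. This is a sketch under assumptions: the function names and the input shape (a list of durations in milliseconds plus a failure count) are hypothetical, and it uses the nearest-rank percentile, which may differ slightly from how App Insights aggregates percentiles.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def check_acceptance(durations_ms, failures, *, p95_ms=400, p99_ms=1000,
                     max_failure_rate=0.01, max_call_ms=60_000):
    """Evaluate each acceptance criterion; returns a dict of booleans."""
    total = len(durations_ms)
    return {
        "p95_ok": percentile(durations_ms, 95) < p95_ms,
        "p99_ok": percentile(durations_ms, 99) < p99_ms,
        "failure_rate_ok": failures / total < max_failure_rate,
        "no_prolonged_calls": max(durations_ms) < max_call_ms,
    }
```

For scale, the counts in the summary (14 failures in 76 calls, and 6 in 14) correspond to failure rates of roughly 18% and 43%, against a target of under 1%.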
Follow-ups
- Add load test to validate RU/partitioning
Missing info
- Database/account names, SDK version in use