Syncinc to Azure Samples #95

Merged
JinLee794 merged 342 commits into Azure-Samples:main from AIappsGBBFactory:main
Jan 12, 2026

Conversation

@pablosalvador10
Contributor

This pull request introduces significant enhancements to the Azure Real-Time (ART) Agent Accelerator, focusing on expanding the agent builder API, improving load testing flexibility, and updating documentation and changelogs for the upcoming 2.0.0-beta.1 release. The most notable changes include new endpoints for listing available voices and model deployments, extended model configuration options, improved load testing make targets, and updated API documentation tags.

Agent Builder API Enhancements:

  • Added a new /models endpoint to the agent builder API, allowing clients to fetch a real-time list of available OpenAI model deployments from Azure AI Foundry, including fallback logic if the Azure client is unavailable.
  • Enhanced the /voices endpoint to optionally fetch live TTS voice data from Azure Speech Service, with improved error handling and categorization, and included cache/source information in the response.
  • Extended the ModelConfigSchema with new parameters for endpoint preference, verbosity, advanced sampling, reasoning effort, and response formatting, supporting richer configuration for the Responses API and future models.
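
The fallback behaviour described for the /models endpoint can be sketched roughly as follows. This is a minimal illustration, not the accelerator's code: `list_model_deployments`, `FALLBACK_DEPLOYMENTS`, the placeholder deployment names, and the `source` values are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical static list returned when the live lookup is unavailable.
FALLBACK_DEPLOYMENTS = ["example-deployment-a", "example-deployment-b"]


def list_model_deployments(fetch_live: Optional[Callable[[], list]]) -> dict:
    """Return deployments plus a `source` field saying where they came from.

    `fetch_live` stands in for a call to the Azure client; pass None to
    simulate the client not being configured at all.
    """
    if fetch_live is not None:
        try:
            return {"models": fetch_live(), "source": "live"}
        except Exception:
            # Any failure of the live lookup falls through to the static list.
            pass
    return {"models": list(FALLBACK_DEPLOYMENTS), "source": "fallback"}
```

Reporting the source alongside the data lets a client tell live results apart from the static fallback.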

Load Testing and Makefile Improvements:

  • Refactored Makefile targets for load testing: added the PIPELINE parameter to allow selection between orchestration modes (e.g., cascade or voicelive), renamed targets for clarity, and updated help text and usage instructions.

API Documentation and Tag Updates:

  • Updated API documentation tags to better reflect current endpoints, removing outdated categories (e.g., "ACS Media Session", "Telemetry") and adding new ones such as "Scenario Builder" and "Session Management".

Changelog and Release Notes:

  • Added a comprehensive changelog entry for version 2.0.0-beta.1, summarizing major features (Scenario Builder UI, VoiceHandler refactoring, evaluation framework), enhancements, bug fixes, infrastructure changes, and removals.

JinLee794 and others added 30 commits November 16, 2025 17:23
…-svc-terraform

adding ECS Terraform + updated unit tests
Updated the version of the text-embedding-3-large model.
… to derive WS URL

- Added optional `label` parameter to `make_status_envelope` function in `envelopes.py` to allow custom labels in status messages.
- Updated `entrypoint.sh` to derive WebSocket URL from `BACKEND_URL` or use `WS_URL` if provided, replacing placeholders in frontend assets.
- Upgraded `js-yaml` and `vite` dependencies in `package.json` and `package-lock.json`.
- Enhanced `App.jsx` to format event type labels and summarize event data for better user experience.
- Introduced new demo scenarios in `DemoScenariosWidget.jsx` to showcase Microsoft Copilot Studio integration and ACS call routing.
- Added tests for call transfer events in `test_acs_events_handlers.py` to ensure correct envelope broadcasting for transfer accepted and failed events.
- Created a new Jupyter notebook for custom speech model demonstration in `12-custom-speech-model.ipynb`.
- Updated Terraform parameters to include a new text embedding model in `main.tfvars.dev.json`.
- Added new utility functions for formatting event types and summarizing event data in App.jsx.
- Improved ChatBubble component to display event messages with formatted labels and timestamps.
- Updated DemoScenariosWidget to include new scenarios and enhanced filtering options based on tags.
- Introduced websocket URL derivation in postprovision.sh for better backend integration.
- Added tests for call transfer events in test_acs_events_handlers.py to ensure proper envelope broadcasting.
- Updated package.json to include js-yaml and upgraded vite version.
feat: Enhance event handling and UI components
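
The WebSocket URL derivation mentioned above for `entrypoint.sh` (use `WS_URL` if provided, otherwise derive from `BACKEND_URL`) could look like this in Python. The real logic lives in a shell script; the function name and the http/https to ws/wss mapping are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def derive_ws_url(backend_url: str, ws_url: Optional[str] = None) -> str:
    """Prefer an explicit WS_URL; otherwise derive one from BACKEND_URL."""
    if ws_url:
        return ws_url
    parts = urlsplit(backend_url)
    # Assumed mapping: https -> wss, anything else -> ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))
```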
…nteractions

- Implement distributed session bus using Redis for cross-replica session routing in connection manager.
- Add methods for publishing session envelopes to Redis channels.
- Introduce confirmation context for call center transfers to ensure explicit user consent.
- Update PayPal agent templates to clarify authentication and routing guidelines.
- Enhance real-time voice app to manage relay WebSocket connections and handle session updates more effectively.
- Improve error handling and logging for distributed session delivery and Redis interactions.
- Refactor session envelope handling in frontend to accommodate new event types and improve user experience.
…and improve PayPal agent handoff scenarios
- Added test audio file to samples/hello_world and samples/labs/dev directories.
- Updated audio agent notebook to reflect changes in available audio devices, including new device names and configurations.
- Fixed kernel execution count and improved error handling in the notebook.
- Introduced several new PDF documents related to credit card products, including BankAmericard, Customized Cash Rewards, Elite, Premium Rewards, Travel Rewards, and Unlimited Cash Rewards.
- Added corresponding images for credit card products.
JinLee794 and others added 23 commits December 25, 2025 14:33
…erarchy (#14)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed
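
To illustrate the decorator-based approach that replaces LatencyTool, here is a standard-library stand-in. It does not use OpenTelemetry, and the signature of the project's `@trace_speech` is not shown in this PR, so the parameters, the `RECORDED_SPANS` list, and the `synthesize` function below are hypothetical.

```python
import functools
import time
from typing import Any, Callable

# Stand-in for a span exporter: each finished call appends one record here.
RECORDED_SPANS: list = []


def trace_speech(operation: str, **attributes: Any) -> Callable:
    """Record a span-like dict (name, attributes, duration) for each call."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on error, so failed calls are timed too.
                RECORDED_SPANS.append(
                    {
                        "name": operation,
                        "attributes": dict(attributes),
                        "duration_ms": (time.perf_counter() - start) * 1000.0,
                    }
                )

        return wrapper

    return decorator


@trace_speech("tts.synthesize", voice="example-voice", language="en-US")
def synthesize(text: str) -> bytes:
    # Placeholder for a real TTS call.
    return text.encode("utf-8")
```

Because timing lives in the decorator, call sites no longer need to carry a latency tool object around, which is consistent with the removal of the `latency_tool` field listed above.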

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
…erarchy (#15)

…erarchy (#13)

…erarchy (#12)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.
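
A rough sketch of the dual model configuration and endpoint routing described above. The field and method names come from the commit message, but the alias mapping (`media` to cascade, `realtime` to voicelive), the `endpoint_preference` values, and the "auto" heuristic are assumptions rather than the project's actual behaviour.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed alias mapping; the commit only lists the four accepted names.
_MODE_ALIASES = {
    "cascade": "cascade",
    "media": "cascade",
    "voicelive": "voicelive",
    "realtime": "voicelive",
}


@dataclass
class ModelConfig:
    deployment: str
    endpoint_preference: str = "auto"  # assumed values: "auto", "chat", "responses"
    reasoning_effort: Optional[str] = None


@dataclass
class AgentConfig:
    cascade_model: ModelConfig
    voicelive_model: ModelConfig

    def get_model_for_mode(self, mode: str) -> ModelConfig:
        """Resolve a mode name or alias to the matching model configuration."""
        canonical = _MODE_ALIASES.get(mode.lower())
        if canonical is None:
            raise ValueError(f"unknown mode: {mode!r}")
        return self.cascade_model if canonical == "cascade" else self.voicelive_model


def should_use_responses_endpoint(model: ModelConfig) -> bool:
    """Choose between /responses (True) and /chat/completions (False)."""
    if model.endpoint_preference == "responses":
        return True
    if model.endpoint_preference == "chat":
        return False
    # "auto" (assumed heuristic): use /responses when a Responses-only field is set.
    return model.reasoning_effort is not None
```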

* fix: update project version to 2.0.0-beta in pyproject.toml

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: integrate Responses API in orchestrators and add optimizations

**Cascade Orchestrator:**
- Update model selection to use agent.get_model_for_mode('cascade')
- Integrate Responses API routing based on endpoint_preference
- Add error handling for unsupported parameters
- Extract TTS processing into separate tts_processor module

**VoiceLive Orchestrator:**
- Update to use agent.get_model_for_mode('voicelive')
- Add registry cleanup to prevent unbounded growth
- Improve memory management and stale orchestrator cleanup
- Extract DTMF processing into separate dtmf_processor module

**Tests:**
- Add test_cascade_orchestrator_entry_points
- Add test_cascade_llm_processing
- Add test_dtmf_processor

Depends on: PR #1 (Responses API Infrastructure)
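
One way to keep an orchestrator registry from growing without bound is to evict entries that have been idle too long. The sketch below assumes an idle-time policy; the class name, the policy, and the injectable clock are illustrative and not taken from the VoiceLive orchestrator.

```python
import time
from typing import Any, Callable, Optional


class OrchestratorRegistry:
    """Maps session IDs to orchestrators and evicts entries idle for too long."""

    def __init__(self, max_idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_idle = max_idle_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict = {}  # session_id -> (orchestrator, last_seen)

    def register(self, session_id: str, orchestrator: Any) -> None:
        self._entries[session_id] = (orchestrator, self._clock())

    def get(self, session_id: str) -> Optional[Any]:
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        orchestrator, _ = entry
        # Touch the entry so active sessions are never treated as stale.
        self._entries[session_id] = (orchestrator, self._clock())
        return orchestrator

    def cleanup_stale(self) -> int:
        """Remove entries idle longer than the limit; return how many were removed."""
        now = self._clock()
        stale = [sid for sid, (_, seen) in self._entries.items()
                 if now - seen > self._max_idle]
        for sid in stale:
            del self._entries[sid]
        return len(stale)

    def __len__(self) -> int:
        return len(self._entries)
```

A periodic task or a hook on session teardown could call `cleanup_stale()`; which of the two the project uses is not stated in the commit message.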

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: add evaluation framework and frontend UI for Responses API

**Evaluation Framework:**
- Add EventRecorder with git commit SHA tracking
- Add API-aware scoring with budget adjustments for verbosity
- Add scenario runner for automated testing
- Add CLI for running evaluations
- Add validate_phases.py for phase-based validation
- Add wrappers for endpoint detection

**Frontend UI:**
- Add cascade_model and voicelive_model selectors in Agent Builder
- Add Responses API endpoint preference dropdown
- Add conditional fields for verbosity, reasoning_effort, etc.
- Update ScenarioBuilder with model configuration options
- Display API version fields

**Documentation:**
- Add docs/testing/model-evaluation.md
- Add evaluation playground Jupyter notebook

Depends on: PR #1 (Responses API Infrastructure)

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
Major refactoring of voice processing architecture:

Core Voice Changes:
- Implement new VoiceHandler as primary entry point for voice sessions
- Delete deprecated speech_cascade/tts.py (652 lines removed)
- Consolidate TTS functionality into voice/tts/playback.py
- Enhance CascadeOrchestrator with improved turn management
- Add VoiceSessionContext for clean dependency injection

API & Integration:
- Migrate /api/v1/browser/conversation to VoiceHandler
- Migrate /api/v1/media/stream to VoiceHandler
- Create MediaHandler→VoiceHandler compatibility alias
- Update media_handler.py for backward compatibility

Infrastructure:
- Improve telemetry with Azure-style span naming
- Enhance ACS helpers with better session management
- Update session terminator for lifecycle management
- Add orchestration improvements for unified agents

Configuration & Samples:
- Update auth agent and insurance scenario configs
- Add handoff tool enhancements with context variables
- Update gpt_flow sample for new patterns

Frontend:
- Refactor App.jsx for improved voice handling UI

Testing & Documentation:
- Add test_voice_handler_compat.py for backward compatibility
- Add MEDIAHANDLER_MIGRATION.md tracking document

This change maintains full backward compatibility while establishing
the foundation for cleaner voice processing patterns going forward.
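
The MediaHandler→VoiceHandler compatibility alias mentioned above can be as simple as binding the old name to the new class. The constructor shown here is a placeholder, not the real VoiceHandler signature.

```python
class VoiceHandler:
    """Placeholder for the new entry point; the real constructor differs."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id


# Compatibility alias: existing imports of MediaHandler resolve to the same class,
# so isinstance checks and subclassing keep working during the migration.
MediaHandler = VoiceHandler
```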

Closes #[TBD]
…ripts (#20)

- Updated logging functions in preflight-checks.sh, ssl-preprovision.sh, sync-appconfig.sh, postprovision.sh, and preprovision.sh for consistent output formatting.
- Improved user prompts for SSL certificate configuration and Azure Entra group creation in ssl-preprovision.sh and postprovision.sh.
- Added color-coded success, warning, and error messages for better visibility.
- Modified the handling of environment variables in postprovision.sh to ensure updates are made without overwriting existing values.
- Updated Terraform configurations to manage app configuration and cognitive account settings with soft delete options.

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
* docs: add comprehensive voice processing architecture documentation

Add complete documentation for the voice processing architecture:

New Documentation:
- docs/architecture/voice/README.md - Comprehensive voice architecture guide
  * VoiceHandler overview and usage patterns
  * TTS playback and text processing
  * Speech cascade pipeline documentation
  * Audio specifications for browser and ACS transports
  * Testing guidelines with actual test file references
  * Troubleshooting guide for common issues

- apps/artagent/backend/voice/README.md - Developer quick reference
  * Directory structure and module organization
  * Quick start examples
  * Common tasks and patterns
  * File location guide
  * Testing commands

Documentation Updates:
- docs/mkdocs.yml - Add voice architecture to navigation
- docs/operations/troubleshooting.md - Add voice-specific troubleshooting

Key Improvements:
- Fixed mkdocs formatting for proper list rendering
- Updated all test references to match actual test files:
  * test_voice_handler_components.py
  * test_voice_handler_compat.py
  * test_cascade_orchestrator_entry_points.py
  * test_cascade_llm_processing.py
- Verified all script references (quick_test.sh, test_orchestrator.py)
- Added prerequisites for running tests with dev dependencies
- Included both basic and advanced testing examples

All file paths and examples have been verified against the actual codebase.

Related to #[TBD]

* Add custom styles for Flowy flowchart integration with agent blocks

* feat: Enhance output port visibility logic in ScenarioGraphCanvas

* feat: Add expandable full prompt view for source agent in HandoffEditorDialog

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
- Removed info-level logging for ACS configuration details to reduce verbosity.
- Changed some logging statements to debug level for better log management.
- Updated peer.service attribute in telemetry to use "azure-communication-services".
- Introduced a new orchestration.yaml file defining a default customer service scenario with multiple agents and handoff configurations.

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
…cements, fixes, and infrastructure changes
…scripts

- Added mkdocs-mermaid-zoom to pyproject.toml and uv.lock for enhanced diagram support in documentation.
- Enhanced locustfile.acs_media.py with rate limit detection and error handling improvements.
- Introduced locustfile.browser_conversation.py for testing browser-based voice conversation endpoints.
- Improved metrics naming conventions for clarity in load testing results.
Collaborator

@JinLee794 JinLee794 left a comment

LGTM

@JinLee794 JinLee794 merged commit 97737dd into Azure-Samples:main Jan 12, 2026
7 of 8 checks passed
JinLee794 added a commit to AIappsGBBFactory/art-voice-agent-accelerator that referenced this pull request Jan 20, 2026
#29)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* Syncinc to Azure Samples  (#95)

* Delete samples/labs/dev/leadership_phrases.txt

* Update version and SKU name in staging params

* Change version for text-embedding-3-large model

Updated the version of the text-embedding-3-large model.

* Update main.tfvars.staging.json

* Update communication.tf

* feat: Enhance status envelope with optional label and update frontend to derive WS URL

- Added optional `label` parameter to `make_status_envelope` function in `envelopes.py` to allow custom labels in status messages.
- Updated `entrypoint.sh` to derive WebSocket URL from `BACKEND_URL` or use `WS_URL` if provided, replacing placeholders in frontend assets.
- Upgraded `js-yaml` and `vite` dependencies in `package.json` and `package-lock.json`.
- Enhanced `App.jsx` to format event type labels and summarize event data for better user experience.
- Introduced new demo scenarios in `DemoScenariosWidget.jsx` to showcase Microsoft Copilot Studio integration and ACS call routing.
- Added tests for call transfer events in `test_acs_events_handlers.py` to ensure correct envelope broadcasting for transfer accepted and failed events.
- Created a new Jupyter notebook for custom speech model demonstration in `12-custom-speech-model.ipynb`.
- Updated Terraform parameters to include a new text embedding model in `main.tfvars.dev.json`.

* refactor: Comment out unused email communication service domain resource

* refactor: Comment out unused Azure email communication service resources

* feat: Enhance event handling and UI components

- Added new utility functions for formatting event types and summarizing event data in App.jsx.
- Improved ChatBubble component to display event messages with formatted labels and timestamps.
- Updated DemoScenariosWidget to include new scenarios and enhanced filtering options based on tags.
- Introduced websocket URL derivation in postprovision.sh for better backend integration.
- Added tests for call transfer events in test_acs_events_handlers.py to ensure proper envelope broadcasting.
- Updated package.json to include js-yaml and upgraded vite version.

* add value

* feat: Enhance distributed session handling and improve PayPal agent interactions

- Implement distributed session bus using Redis for cross-replica session routing in connection manager.
- Add methods for publishing session envelopes to Redis channels.
- Introduce confirmation context for call center transfers to ensure explicit user consent.
- Update PayPal agent templates to clarify authentication and routing guidelines.
- Enhance real-time voice app to manage relay WebSocket connections and handle session updates more effectively.
- Improve error handling and logging for distributed session delivery and Redis interactions.
- Refactor session envelope handling in frontend to accommodate new event types and improve user experience.

* feat: Enhance status tone metadata and improve chat bubble styling

* feat: Implement background task handling for MFA delivery and improve greeting messages in handoff processes

* feat: Enhance call escalation process with detailed transfer context and improve PayPal agent handoff scenarios

* feat: Implement retry mechanism for browser session ID resolution in media streaming

* feat: Enhance session management and greeting handling across various components

* fixing session mapping for acs calls

* add value

* add value

* adding test file

* Adding agents and templates for credit card recommendation and fee dispute agents

* add value

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* add value

* add value

* Implement Azure Voice Live service integration and enhance Terraform configurations for voice model deployments

* add value

* Add Azure Voice Live model configuration and outputs

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* remove sensitive contact information and unused transfer agency client data

* feat: Introduce Agent Consolidation Plan with YAML-driven architecture

- Added a comprehensive proposal for consolidating agent architecture in `apps/rtagent/backend/src/agents/`.
- Established key goals including single source of truth for agent definitions, auto-discovery, and unified tool registry.
- Analyzed current architecture and identified pain points such as manual handoff registration and duplicate tool registries.
- Proposed a new solution architecture featuring enhanced YAML schema, auto-discovery engine, and unified tool registry.
- Detailed implementation roadmap divided into phases for gradual migration and integration.
- Included backward compatibility strategy to ensure existing agents function without modification.
- Provided extensive documentation on YAML schema, CLI tool usage, and migration checklist.

* Refactor speech cascade handler and routing for browser communication

- Updated speech cascade handler to prioritize `on_greeting` callback over `on_tts_request` for greeting events.
- Added `queue_user_text` method to `SpeechCascadeHandler` for queuing user text input.
- Changed routing from `/realtime` to `/browser` for browser communication endpoints.
- Modified orchestration logic to ensure TTS responses are sent with blocking behavior to prevent overlap.
- Introduced WebSocket helper functions for better organization and clarity in messaging.
- Enhanced connection manager to handle Redis pubsub reconnections on credential expiration.
- Updated frontend components to reflect routing changes for browser communication.
- Adjusted tests to align with the new browser routing and functionality.
- Commented out live metrics enabling condition in telemetry configuration for future consideration.

* feat(telemetry): add decorators for tracing LLM, dependency, speech, and ACS calls

- Introduced decorators for tracing LLM, dependency, speech, and ACS calls with OpenTelemetry instrumentation.
- Implemented a context manager for tracking conversation turns with detailed metrics.
- Added helper functions for recording GenAI and speech metrics.
- Enhanced span attributes for Azure Application Insights visualization.

* Remove telemetry configuration module (telemetry_config_v2.py) to streamline codebase and eliminate unused functionality.

* feat: Enhance telemetry and tracing for CosmosDB and latency tool

- Added OpenTelemetry tracing to CosmosDB operations with a decorator for latency tracking.
- Integrated tracing spans in the LatencyTool for better observability in Application Insights.
- Updated telemetry configuration to suppress noisy logs and added new attributes for speech cascade metrics.
- Created unit tests for SessionAgentManager, covering configuration management, override resolution, handoff management, and persistence.
- Removed outdated endpoints review document.

* feat: Add useBackendHealth hook for backend health checks and integrate with readiness, agents, and health endpoints

test: Implement integration tests for VoiceLive Session Agent Manager, covering agent resolution, handoff mapping, and runtime modifications

* WARNING!!!! MAJOR REFACTOR COMMIT

- Removed the VoiceLive SDK integration module from the backend.
- Added a new AgentTopologyPanel component to the frontend for displaying agent inventory and connections.
- Integrated the AgentTopologyPanel into the main application layout.
- Updated the BackendIndicator to include agent count and selection functionality.
- Enhanced the ConversationControls with a fixed view switcher for better accessibility.
- Improved the useBackendHealth hook to handle various agent data structures.
- Updated styles for better responsiveness and visual consistency across components.
- Modified utility functions to format agent inventory data correctly.
- Adjusted import paths in orchestrators and tests to reflect the new backend structure.

* feat: Enhance agent handoff process and response handling; refactor UI components for improved usability

* feat: Update change notes for v2/speech-orchestration-and-monitoring branch; highlight major features, improvements, and new agents

* refactor: Remove Unified Agent Configuration Module; streamline agent management and improve code organization

* feat: Enhance ProfileDetailsPanel with resizable functionality and UI improvements

- Added resizable panel feature to ProfileDetailsPanel, allowing users to adjust width dynamically.
- Updated panel styling for improved aesthetics, including a gradient background and adjusted borders.
- Enhanced scrollbar visibility and overflow handling for better user experience.

refactor: Simplify GraphListView filter logic

- Removed default selection logic for filters in GraphListView, allowing users to start with no filters applied.
- Cleaned up useEffect dependencies for better performance and clarity.

docs: Introduce Backend Voice & Agents Architecture documentation

- Added comprehensive documentation outlining the architecture of backend voice and agent modules.
- Detailed separation of concerns between voice transport and agent business logic.
- Included data flow diagrams and module responsibilities for clarity.

docs: Create Handoff Logic Inventory for better understanding of handoff processes

- Documented the handoff logic across backend voice and agent modules.
- Established a single source of truth for handoff mappings and protocols.
- Summarized cleanup phases and their impact on the codebase.

fix: Update logging to safely handle span names

- Modified TraceLogFilter to safely retrieve span names, preventing attribute errors with NonRecordingSpan.
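
A minimal sketch of the defensive pattern this fix describes: read the span name with `getattr` so that span objects without a `name` attribute do not raise. The class name and the injected `get_current_span` callable are hypothetical; the project's TraceLogFilter is not shown in this PR.

```python
import logging
from typing import Any, Callable


class SafeSpanNameFilter(logging.Filter):
    """Attach `span_name` to each record without assuming the span has a name."""

    def __init__(self, get_current_span: Callable[[], Any]) -> None:
        super().__init__()
        # Injected so the sketch does not depend on a tracing library.
        self._get_current_span = get_current_span

    def filter(self, record: logging.LogRecord) -> bool:
        span = self._get_current_span()
        # getattr with a default avoids AttributeError on spans lacking `name`.
        record.span_name = getattr(span, "name", None) or "-"
        return True  # never drop the record; only enrich it
```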

fix: Adjust telemetry configuration to capture all loggers

- Changed logger_name default to an empty string in TelemetryConfig to capture all loggers.

* feat: Implement context-aware greeting rendering in VoiceLive agent; enhance session management and logging

* feat: Refactor agent configuration and voice handling; streamline agent switching and TTS integration

* feat: Enhance Agent Details Panel and Session Management

- Added sessionAgentConfig prop to AgentDetailsPanel for dynamic agent configuration display.
- Implemented logic to show agent name, description, tools, and model/voice details based on session configuration.
- Introduced a new PanelCard in AgentDetailsPanel to display session agent configuration, including model, voice, and prompt preview.
- Updated App component to fetch session agent configuration on agent panel visibility and manage agent creation/updating.
- Added validation for TTS client initialization in dedicated_tts_pool.py to ensure clients are ready before use.
- Enhanced on_demand_pool.py to validate cached resources and remove invalid ones.
- Improved error logging in text_to_speech.py to include detailed initialization failure information and added is_ready property for synthesizer readiness check.

* Refactor code structure for improved readability and maintainability

* feat: Enhance MemoManager with background persistence and lifecycle management

- Added support for background persistence in MemoManager, allowing non-blocking state saving to Redis.
- Implemented task deduplication to cancel previous persistence tasks when a new one is initiated.
- Removed unused auto-refresh functionality and related attributes from MemoManager.
- Updated tests to verify new persistence behavior and ensure proper task management.
- Enhanced error handling and logging for background persistence operations.
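
The deduplication described above can be sketched roughly as follows. `BackgroundPersister` and its `save` callback are hypothetical names used for illustration; the real MemoManager API and its Redis calls are not shown in this log.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


class BackgroundPersister:
    """Hypothetical helper: keeps at most one in-flight persistence task."""

    def __init__(self, save: Callable[[dict], Awaitable[None]]):
        self._save = save  # e.g. a coroutine that writes state to Redis (assumption)
        self._task: Optional[asyncio.Task] = None

    def persist(self, state: dict) -> asyncio.Task:
        # Deduplicate: a newer snapshot supersedes any save still in flight.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        # Copy the state so later mutations do not race with the write.
        self._task = asyncio.create_task(self._run(dict(state)))
        return self._task

    async def _run(self, snapshot: dict) -> None:
        try:
            await self._save(snapshot)
        except asyncio.CancelledError:
            raise  # expected when a newer persist() call supersedes this one
        except Exception:
            # Log and swallow so a failed write never breaks the caller.
            logger.exception("background persistence failed")
```

Callers invoke `persist(state)` from inside a running event loop and continue without awaiting the returned task, which is what keeps the save non-blocking.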

* feat: Add Connection Warmup Analysis document for Azure Speech & OpenAI optimization

* feat(session): enhance session ID management and URL parameter support

- Added `pickSessionIdFromUrl` function to extract session ID from URL parameters.
- Updated `getOrCreateSessionId` to allow session ID restoration from URL.
- Refactored `setSessionId` for better logging and session management.
- Improved `createNewSessionId` to utilize `setSessionId`.

docs(api): restructure API documentation for clarity and completeness

- Organized API endpoints into categories: Health & Monitoring, Call Management, Media Streaming, Browser Conversations, Session Metrics, Agent Builder, Demo Environment, and TTS Health.
- Added detailed descriptions and examples for each endpoint.
- Included new sections for interactive API documentation and WebSocket endpoints.

docs(api-reference): update WebSocket message types and endpoint details

- Clarified message types for incoming audio data and control messages.
- Updated WebSocket endpoint URLs and query parameters for browser conversations and dashboard relay.

docs(architecture): refine agent architecture diagrams for clarity

- Adjusted diagrams to improve readability and understanding of the agent framework and orchestration.

fix(architecture): correct orchestration mode comparison table

- Updated ratings for Azure Speech voices and simplicity of setup in the orchestration comparison table.

docs(getting-started): add demo guide and enhance onboarding experience

- Introduced a new demo guide to facilitate user onboarding and provide structured paths for different user levels.
- Enhanced the getting started guide with tips and recommended paths for new users.

feat(aoai): implement OpenAI connection warmup to reduce latency

- Added `warm_openai_connection` function to pre-establish OpenAI connection and reduce cold-start latency on first call.

feat(speech): implement token warmup for Speech API to minimize latency

- Added `warm_token` method in `SpeechTokenManager` to pre-fetch tokens during startup, reducing latency on first API call.

* feat(healthcare): Implement Nurse Triage Agent with symptom assessment and routing capabilities

- Introduced a comprehensive voice agent for healthcare triage.
- Added agent configuration and prompt templates for patient interaction.
- Developed healthcare tools for patient verification, clinical knowledge search, and symptom urgency assessment.
- Integrated routing logic for scheduling appointments and emergency transfers.
- Enhanced documentation with demo scenarios and testing instructions.

* feat: Implement logging utility and session management

- Added a logger utility to manage console logging levels and filtering.
- Created session management functions to handle session IDs, including retrieval from URL and session creation.
- Developed styles for the frontend components to ensure consistent UI design.
- Configured Vite for the frontend build process with proper asset handling and environment variable support.
- Introduced scripts for starting the backend and frontend development servers, including Azure Dev Tunnel hosting.

* feat: Simplify agent handoff process by refining context management and removing redundant data collection

* feat: Enhance agent handoff process by managing conversation history and user context

* feat: Enhance message handling by persisting tool calls and results as JSON for conversation continuity

* feat: Implement silent handoff protocol across agents to enhance user experience and streamline transitions

* feat: Add Azure App Configuration module with RBAC and Key Vault integration

- Implemented main resource for Azure App Configuration in Terraform.
- Added outputs for App Configuration details including ID, name, and endpoint.
- Defined variables for App Configuration module, including identity and Key Vault integration.
- Updated main Terraform outputs to include App Configuration details.
- Enhanced error handling in Azure OpenAI client for missing endpoint configuration.
- Improved Redis manager to handle port configuration with better error messaging.
- Updated requirements to include Azure App Configuration SDKs.

* first code clean up

* enabling oidc

* Refactor code structure and remove redundant sections for improved readability and maintainability

* add value

* add value

* feat: Add managed certificate and domain registration modules

- Introduced `managed-cert-example.bicep` for example usage of managed certificate deployment.
- Created `managed-cert.bicep` to handle App Service Domain registration and managed SSL certificate generation.
- Implemented `role-assignment.bicep` for managing role assignments with support for built-in and custom roles.
- Added `windows-vm.bicep` for deploying a Windows VM as a jumphost with necessary networking components.
- Developed `peer-virtual-networks.bicep` for establishing peering between virtual networks.
- Implemented `private-dns-zone.bicep` for creating and linking private DNS zones to virtual networks.
- Created `private-endpoint.bicep` for deploying private endpoints with DNS zone integration.
- Added `vnet.bicep` for creating virtual networks with associated subnets and network security groups.
- Updated `types.bicep` with new types for model deployment, role assignments, and network configurations.
- Developed `secret.bicep` for managing secrets in Azure Key Vault.
- Created `network.bicep` for orchestrating network resources including virtual networks and subnets.

* fix: Update default location parameter in create_storage function for clarity

* feat: Extract AZURE_LOCATION from environment-specific tfvars file if not set

* feat: Implement location resolution with fallback chain in preprovision script

* fix: Update Dockerfile to install runtime dependencies and mitigate vulnerabilities

* chore: Update CHANGELOG for version 1.5.0 release and remove changenotes.md; enable remote builds in azure.yaml; enhance terraform initialization script with location prompts

* feat: Update launch configuration and scripts to use virtual environment with uv; enhance README for deployment clarity

* further deployment cleanup, docs update/tweaks, adding more todos

* removing unused dependency in src/herlpers.py

* refactor: Update architecture diagram in README for clarity and consistency in orchestration modes

* add value

* Refactor Terraform configuration:
- Update main.tf to adjust foundry account and project naming conventions.
- Remove feature flags and keys from appconfig module as they are now managed externally.
- Clean up variables.tf by removing unused variables and updating descriptions.
- Delete provider configuration file as it is no longer needed.
- Change default application name from "rtaudioagent" to "artagent" and adjust related settings.
- Modify connection settings and pool sizes for improved performance.

* feat: Enhance Azure Voice Live integration and refactor configuration management

* last changes

* feat: Add app configuration bootstrap to initialize environment variables

* Enhance configuration loading with .env.local support and update documentation

* fix voicelive output attributes

* add

* Refactor agent paths and update documentation for agent discovery and configuration

* Add Insurance Voice Agent Scenario documentation and update navigation

- Introduced a comprehensive guide for the Insurance Customer Service Scenario, detailing the security-focused multi-agent voice system for claims processing, fraud detection, and policy management.
- Updated mkdocs.yml to include the new Insurance documentation in the Industry Solutions section.

* Add integration proposal for Spec-Driven Development methodology in ARTVoice

* add value

* Enhance Terraform configuration and scripts for Voice Live integration

- Update Dockerfile to install dependencies and set up virtual environment.
- Modify initialize-terraform.sh and local-dev-setup.sh for improved script handling.
- Refactor sync-appconfig.sh to streamline key-value imports and feature flag management.
- Add provider.conf.json generation for remote state backend configuration.
- Update main.tf and outputs.tf to support new Voice Live model deployments.
- Introduce voice_live_location and voice_live_model_deployments variables in variables.tf.

* feat: Add Concierge agent configuration and prompts for banking scenarios

- Introduced a new YAML configuration for the Concierge agent, defining its voice, model, session, and tool configurations.
- Created a comprehensive prompt file for the Concierge agent, detailing voice and language settings, identity and trust guidelines, and operational modes.
- Implemented scenario orchestration analysis to address issues with agent initialization and fallback logic, ensuring the correct agent is set for banking scenarios.
- Renamed orchestration.yaml to scenario.yaml for consistency in scenario loading.
- Updated default start agent to BankingConcierge and added validation for agent existence at startup.

* feat: Enhance scenario loading to support orchestration.yaml naming convention

* feat: Implement scenario-based handoff map resolution for orchestrator configuration

* cicd test for azd deploy

* feat: Update audio handling and documentation dependencies for improved installation and error handling

* feat: Refactor app configuration handling to prioritize .env.local overrides and improve environment variable management

* feat: Revise documentation deployment workflow to enhance dependency management and streamline build process

* modified docs workflow

* feat: Add site_dir configuration to mkdocs.yml for improved site structure

* feat: Allow mkdocs build to proceed with warnings by removing --strict flag

* fix: Update health check endpoint in postprovision script to use correct API path

* refactor: Remove outdated AZD deployment workflow and update documentation links for clarity

* fix: Ensure principal_id logging does not fail and handle local_state retrieval correctly

* refactor: Simplify state key handling in provider configuration by using environment name

* fix: Skip null values when loading static parameters from tfvars file to use Terraform defaults

* fix: Use coalesce function for location assignment in storage account resource

* refactor: Remove unused backend API public URL variable and related validation

* refactor: Remove unused backend API public URL and source phone number from environment parameter files

* improvements flow

* fix: Implement auto-selection and timeout for user input in setup scripts

* add value

* fix: Update naming conventions for foundry account and project variables in locals

* fix: Update name from rtaudioagent to artaudioagent in environment parameter files

* fix: Update name from rtaudioagent to artaudioagent in environment parameter files

* fix: Update documentation URLs to reflect new repository location

* feat: Enhance API documentation and tagging for better clarity and organization

* docs: Update documentation links and improve clarity across various guides

* refactor: replace deploy-azd workflow with reusable template and remove redundant summary job

- Updated the deployment workflow name to "Deploy to Azure".
- Replaced the usage of the old deploy-azd.yml with a new reusable template _template-deploy-azd.yml.
- Removed the deployment summary job and its associated steps to streamline the workflow.

* fix: Add run-name to the Azure deployment workflow for better clarity

* fix: Update condition for output extraction in deployment workflow

* fix: Update GitHub token to use secrets for enhanced security

* feat: Add optional GitHub PAT secret and enhance environment variable handling for Azure deployment

* Add resource group (rg) as an env var set at the GitHub environment level

* fix: Add emoji to workflow names for better visibility

* feat: Update documentation workflow name and enhance README with deployment badges

* fix: Update README layout and enhance navigation links for better user experience

* fix: Restore header for ARTVoice Accelerator Framework in README

* add value

* fix: Update README layout for improved clarity and navigation

* Enhance provisioning scripts and documentation

- Updated postprovision.sh to clarify phone number provisioning steps and added guidance for obtaining a phone number via Azure Portal.
- Modified preprovision.sh to include preflight checks for tools, authentication, and providers before proceeding with provisioning.
- Added jq as a prerequisite in the getting-started documentation and provided installation instructions for various platforms.
- Created a new TODO-deployfixes.md file to document common issues encountered during deployment sessions, including resolutions for Docker errors, jq installation, and subscription registration.
- Expanded troubleshooting.md with detailed solutions for common deployment and provisioning issues, including authentication mismatches, Docker errors, jq command not found, and ACS phone number prompts.
- Updated variables.tf to improve the description of the voice_live_location variable, including a link to supported Azure regions.

* feat: Update branch triggers in workflow to include feat/troubleshooting-enhancements

* fix(ci): simplify test-azd-hooks workflow tests and run in parallel

- Remove fragile grep-based function extraction that caused syntax errors
- Run lint, linux, macos, windows tests in parallel (no dependencies)
- Trigger on all pushes to main/staging (remove path filters for push)
- Simplify backend configuration test to avoid function sourcing issues

* feat: Add troubleshooting steps for "bad interpreter" errors and enhance post-provisioning instructions for phone number configuration

* feat: Add preprovision hook execution to Linux, macOS, and Windows test jobs in CI workflow

* feat: Enhance AZD hook testing with postprovision execution and Azure CLI setup

* feat: Update test job names for clarity and enhance preflight checks for CI mode

* feat: Update preflight checks to conditionally include Docker in CI mode and log its status

* feat: Add Dev Container testing for AZD hooks with environment validation and summary reporting

* feat: Enhance deployment scripts with pre/post-provisioning hooks and Azure CLI extension checks

* feat: Add troubleshooting guidance for MkDocs module errors and update dev dependencies in uv.lock

* feat: Update Azure deployment workflows and normalize container memory formats

* feat: Add troubleshooting guidance for Terraform state lock errors and provide remote/local fix options

* feat: Remove outdated troubleshooting documentation for deployment issues

* Apply suggestion from @Copilot

Co-authored-by: Copilot

* Apply suggestion from @Copilot

Co-authored-by: Copilot

* Update .github/workflows/test-azd-hooks.yml

Co-authored-by: Copilot

* feat: Implement TTS Streaming Latency Analysis and Optimization Plan

- Added a comprehensive document outlining the critical latency issues in TTS playback within the Speech Cascade architecture.
- Identified root causes including processing loop deadlock, sentence buffering delays, queue-based event processing, and full synthesis before streaming.
- Proposed a multi-phase optimization strategy to address identified issues, including:
  - Phase 0: Fix processing loop deadlock by creating a dedicated TTS processing task.
  - Phase 1: Reduce sentence buffer threshold for earlier TTS chunk dispatch.
  - Phase 2: Implement parallel TTS prefetching to synthesize the next sentence while streaming.
  - Phase 3: Enable streaming TTS synthesis to stream audio while synthesizing.
  - Phase 4: Achieve full pipeline parallelism for LLM to TTS to WebSocket streaming.
- Created a detailed test implementation plan with metrics and success criteria to validate improvements.
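
As an illustration of Phase 2 only, prefetching can be sketched with plain `asyncio`. The function name `stream_with_prefetch` and the `synthesize` callback are assumptions for this example, not the plan's actual implementation.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable, List


async def stream_with_prefetch(
    sentences: Iterable[str],
    synthesize: Callable[[str], Awaitable[bytes]],  # hypothetical TTS call
) -> AsyncIterator[bytes]:
    """Yield audio per sentence while the next sentence is already synthesizing."""
    it = iter(sentences)
    first = next(it, None)
    if first is None:
        return
    tasks: List[asyncio.Task] = []
    try:
        pending = asyncio.create_task(synthesize(first))
        tasks.append(pending)
        for nxt in it:
            # Start the next synthesis before handing out the current audio.
            upcoming = asyncio.create_task(synthesize(nxt))
            tasks.append(upcoming)
            yield await pending
            pending = upcoming
        yield await pending
    finally:
        # Cancel anything still running if the consumer stops early (e.g. barge-in).
        for task in tasks:
            task.cancel()
```

Only one sentence is prefetched at a time, which bounds wasted synthesis work if playback is interrupted.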

test: Add unit tests for HandoffService

- Created unit tests for the HandoffService, covering handoff detection, target resolution, and handoff resolution methods.
- Implemented tests for greeting selection and context building to ensure proper functionality.
- Added tests for the HandoffResolution dataclass to verify properties and default values.

* feat: Add Scenario Builder component and integrate with RealTimeVoiceApp

- Introduced ScenarioBuilder component for visual orchestration of agent flows.
- Implemented drag-and-drop functionality for agents and handoff configuration.
- Added buttons in RealTimeVoiceApp for accessing Agent and Scenario Builders.
- Enhanced state management for agent scenarios, including creation and updates.
- Integrated new handoff editor for configuring agent interactions.

* Refactor code structure for improved readability and maintainability

* Add error handling for Redis connection issues and implement unit tests for HandoffService

- Enhanced AzureRedisManager to handle RedisClusterException and OSError during client connection attempts.
- Introduced comprehensive unit tests for HandoffService, covering handoff detection, target resolution, handoff resolution, greeting selection, and context building.
- Added tests for HandoffResolution dataclass to ensure correct property behavior and default values.

* Enhance LiveOrchestrator to handle context-only session updates without UI broadcasts

* Refactor LiveOrchestrator to prevent duplicate UI updates by omitting redundant session_updated broadcasts during context-only updates.

* Refactor environment variable assignment in deploy workflow for clarity

* Refactor tests and dependencies following module renaming and API changes

- Removed pytest-twisted from dev dependencies in pyproject.toml and uv.lock.
- Updated conftest.py to mock configuration and Azure OpenAI client for tests.
- Skipped tests in test_acs_media_lifecycle.py, test_acs_media_lifecycle_memory.py, and test_acs_simple.py due to dependencies on removed/renamed modules.
- Adjusted imports in test_artagent_wshelpers.py for orchestrator path change.
- Skipped tests in test_call_transfer_service.py due to API changes in toolstore.
- Updated datetime usage in test_demo_env_phrase_bias.py to use UTC.
- Modified websocket endpoint assertions in test_realtime.py to reflect new paths.
- Added new test file test_voice_handler_components.py for voice handler components.

* Add comprehensive tests for VoiceLive handler and orchestrator memory management

- Implement tests to verify cleanup functionality in LiveOrchestrator.
- Ensure proper registration and unregistration of orchestrators in the registry.
- Test background task tracking and cleanup mechanisms.
- Validate greeting task cancellation during orchestrator cleanup.
- Introduce memory leak detection tests to prevent unbounded growth in orchestrator registry.
- Verify user message history deque is properly bounded and cleared on cleanup.
- Add scenario update tests to ensure correct agent management during updates.
- Optimize hot path functions to ensure non-blocking behavior during network calls.

* feat: Enhance AgentBuilder with consistent field names and improved UI elements

* Refactor logging levels from info to debug in connection manager, warmable pool, Redis manager, speech auth manager, speech recognizer, and text-to-speech modules for improved log verbosity control. Remove outdated greeting context tests and add comprehensive scenario orchestration contract tests to ensure functional contracts are preserved during refactoring. Update session agent manager tests to use set comparison for agent listing to avoid dict ordering issues.

* feat: Add predefined handoff condition patterns to enhance scenario orchestration

* add value

* feat(metrics): Introduce shared metrics factory for lazy initialization

- Added `metrics_factory.py` to provide a common infrastructure for OpenTelemetry metrics.
- Implemented `LazyMeter`, `LazyHistogram`, and `LazyCounter` for lazy initialization of metrics.
- Updated `speech_cascade/metrics.py` to utilize the new shared metrics factory, simplifying metric initialization.
- Refactored `voicelive/metrics.py` to use the shared factory for consistent metric handling.
- Enhanced orchestrator classes in `speech_cascade/orchestrator.py` and `voicelive/orchestrator.py` to cache orchestrator configurations, improving performance and reducing redundant calls.
- Introduced utility functions for building common metric attributes, ensuring consistency across metrics.
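
A rough sketch of the lazy-initialization idea, assuming the wrappers defer instrument creation until first use. `Lazy` and `LazyCounterSketch` are illustrative names; the contents of `metrics_factory.py` are not shown in this log.

```python
import threading
from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Build a value on first access, then reuse it (thread-safe)."""

    def __init__(self, factory: Callable[[], T]):
        self._factory = factory
        self._value: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        if self._value is None:
            with self._lock:
                if self._value is None:  # re-check after acquiring the lock
                    self._value = self._factory()
        return self._value


class LazyCounterSketch:
    """Hypothetical counter wrapper; `create` would build the real instrument."""

    def __init__(self, create: Callable[[], Any]):
        self._inner = Lazy(create)

    def add(self, amount: int, attributes: Optional[dict] = None) -> None:
        # The underlying counter is created only when the first value is recorded.
        self._inner.get().add(amount, attributes or {})
```

Deferring creation this way lets modules declare metrics at import time without requiring the telemetry provider to be configured yet.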

* feat: Consolidate handoff logic into a unified HandoffService for consistent behavior across orchestrators and enhance documentation

* fix: Simplify environment determination logic in deployment workflow

* add value

* feat: Add user flow screenshots and enhance documentation for guided agent setup

* feat: Enhance scenario testing instructions for clarity and user guidance

* fix: Correct image paths in quickstart guide for accurate rendering

* feat: Add initial agent builder and template selection screenshots to quickstart guide

* feat: Add demo profile creation steps and related images to quickstart guide

* feat: Implement EasyAuth configuration script and integrate into post-provisioning process

* refactor: Remove backend IP restrictions configuration and related outputs

* Added non-qualifying rush response to ensure clear model behavior

* updated order so confirmation statement is in the correct spot

* add value

* add value

* chore: Remove unused workflow images for demo profiles

* fix: Update demo profile creation images in quickstart guide

* fix: Update home screen image in quickstart guide

* fix: Update home screen and scenario images in quickstart guide

* add value

* add value

* add value

* add value

* add value

* add

* add value

* art

* add opentelemetry import for tracing support in TTS module

* refactor: update LiveOrchestrator to enhance user message history management and improve handoff context

* Refactor TTS Playback and Voice Handling

- Consolidated TTS playback logic into a unified class for speech cascade.
- Removed deprecated VoiceSessionContext and related compatibility shims.
- Enhanced error handling during tool initialization and event handler registration.
- Updated model configuration handling in UnifiedAgent to prioritize mode-specific settings.
- Improved logging for TTS synthesis and streaming processes.
- Added new handoff tool registration for dynamic routing.

* refactor: streamline EasyAuth enabling process in CI mode and improve interactive prompts

* refactor: enhance EasyAuth interactive prompts and streamline user choices

* refactor: enhance run-name logic for Azure deployment workflow

* fix: update environment logic for pull_request events in Azure deployment workflow

* refactor: update preprovision hook execution and streamline backend configuration

* feat: add context variable support for handoffs and enhance UI for variable mapping

* feat: enhance TTS processing by adding text sanitization and sentence boundary detection (#11)

Co-authored-by: Jin Lee (HLS US SE)

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#14)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE)

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#15)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE)

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#13)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE)

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#12)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE)

* feat: Responses API Infrastructure & Dual Model Configuration (#16)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.
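
For illustration, the per-mode model lookup might look like the sketch below. The pairing of aliases ('media' with cascade, 'realtime' with voicelive), the shared `model` fallback, and the use of plain strings for deployments are all assumptions; `AgentConfigSketch` is not the project's `AgentConfig`.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed alias mapping: the commit lists the four names but not how they pair up.
_MODE_ALIASES = {
    "cascade": "cascade",
    "media": "cascade",
    "voicelive": "voicelive",
    "realtime": "voicelive",
}


@dataclass
class AgentConfigSketch:
    model: Optional[str] = None  # shared fallback (assumed)
    cascade_model: Optional[str] = None
    voicelive_model: Optional[str] = None

    def get_model_for_mode(self, mode: str) -> Optional[str]:
        canonical = _MODE_ALIASES.get(mode.strip().lower())
        if canonical is None:
            raise ValueError(f"unknown orchestration mode: {mode!r}")
        specific = self.cascade_model if canonical == "cascade" else self.voicelive_model
        # Prefer the mode-specific model, then fall back to the shared one.
        return specific or self.model
```

Resolving aliases in one table keeps the orchestrators free of mode-string comparisons.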

* fix: update project version to 2.0.0-beta in pyproject.toml

---------

Co-authored-by: Jin Lee (HLS US SE)

* feat: Orchestrator Integration + Optimizations (#17)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.
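
The routing between `/chat/completions` and `/responses` could look roughly like the sketch below. The `ModelSettings` class, the `"auto"`/`"chat"`/`"responses"` preference values, and the rule that Responses-only fields trigger `/responses` in auto mode are assumptions; only the field names `endpoint_preference`, `verbosity`, and `reasoning_effort` come from this commit message.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ModelSettings:
    """Hypothetical subset of the model configuration."""
    deployment_id: str
    endpoint_preference: str = "auto"   # assumed values: auto | chat | responses
    verbosity: Optional[str] = None
    reasoning_effort: Optional[str] = None


def should_use_responses_endpoint(settings: ModelSettings) -> bool:
    """Decide which endpoint to call; an explicit preference always wins."""
    preference = settings.endpoint_preference.lower()
    if preference == "responses":
        return True
    if preference == "chat":
        return False
    # Assumed "auto" rule: Responses-specific fields imply /responses.
    return settings.verbosity is not None or settings.reasoning_effort is not None


def route_request(
    settings: ModelSettings, messages: List[Dict[str, str]]
) -> Tuple[str, Dict[str, Any]]:
    """Return the endpoint path and the base parameters for that endpoint."""
    if should_use_responses_endpoint(settings):
        return "/responses", {"model": settings.deployment_id, "input": messages}
    return "/chat/completions", {
        "model": settings.deployment_id,
        "messages": messages,
    }
```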

* feat: integrate Responses API in orchestrators and add optimizations

**Cascade Orchestrator:**
- Update model selection to use agent.get_model_for_mode('cascade')
- Integrate Responses API routing based on endpoint_preference
- Add error handling for unsupported parameters
- Extract TTS processing into separate tts_processor module

**VoiceLive Orchestrator:**
- Update to use agent.get_model_for_mode('voicelive')
- Add registry cleanup to prevent unbounded growth
- Improve memory management and stale orchestrator cleanup
- Extract DTMF processing into separate dtmf_processor module

**Tests:**
- Add test_cascade_orchestrator_entry_points
- Add test_cascade_llm_processing
- Add test_dtmf_processor

Depends on: PR #1 (Responses API Infrastructure)
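
One way to keep an orchestrator registry from growing without bound is to timestamp entries and evict stale ones. The sketch below illustrates that idea only; the `OrchestratorRegistry` name, the TTL policy, and the injectable clock are hypothetical and not taken from the repository.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class OrchestratorRegistry:
    """Hypothetical session-to-orchestrator registry with TTL eviction."""

    def __init__(
        self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def register(self, session_id: str, orchestrator: object) -> None:
        self._entries[session_id] = (self._clock(), orchestrator)

    def get(self, session_id: str) -> Optional[object]:
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        # Touch the entry so active sessions are not treated as stale.
        self._entries[session_id] = (self._clock(), entry[1])
        return entry[1]

    def cleanup_stale(self) -> int:
        """Remove entries not touched within the TTL; return how many."""
        now = self._clock()
        stale = [
            sid for sid, (seen, _) in self._entries.items() if now - seen > self._ttl
        ]
        for sid in stale:
            del self._entries[sid]
        return len(stale)
```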

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: Evaluation Framework + Frontend UI (#18)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.

* feat: add evaluation framework and frontend UI for Responses API

**Evaluation Framework:**
- Add EventRecorder with git commit SHA tracking
- Add API-aware scoring with budget adjustments for verbosity
- Add scenario runner for automated testing
- Add CLI for running evaluations
- Add validate_phases.py for phase-based validation
- Add wrappers for endpoint detection

**Frontend UI:**
- Add cascade_model and voicelive_model selectors in Agent Builder
- Add Responses API endpoint preference dropdown
- Add conditional fields for verbosity, reasoning_effort, etc.
- Update ScenarioBuilder with model configuration options
- Display API version fields

**Documentation:**
- Add docs/testing/model-evaluation.md
- Add evaluation playground Jupyter notebook

Depends on: PR #1 (Responses API Infrastructure)

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Move lifecycle management logic into a dedicated structure to keep main.py clean (#19)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: voice handler refactoring and MediaHandler migration

Major refactoring of voice processing architecture:

Core Voice Changes:
- Implement new VoiceHandler as primary entry point for voice sessions
- Delete deprecated speech_cascade/tts.py (652 lines removed)
- Consolidate TTS functionality into voice/tts/playback.py
- Enhance CascadeOrchestrator with improved turn management
- Add VoiceSessionContext for clean dependency injection

API & Integration:
- Migrate /api/v1/browser/conversation to VoiceHandler
- Migrate /api/v1/media/stream to VoiceHandler
- Create MediaHandler→VoiceHandler compatibility alias
- Update media_handler.py for backward compatibility

Infrastructure:
- Improve telemetry with Azure-style span naming
- Enhance ACS helpers with better session management
- Update session terminator for lifecycle management
- Add orchestration improvements for unified agents

Configuration & Samples:
- Update auth agent and insurance scenario configs
- Add handoff tool enhancements with context variables
- Update gpt_flow sample for new patterns

Frontend:
- Refactor App.jsx for improved voice handling UI

Testing & Documentation:
- Add test_voice_handler_compat.py for backward compatibility
- Add MEDIAHANDLER_MIGRATION.md tracking document

This change maintains full backward compatibility while establishing
the foundation for cleaner voice processing patterns going forward.

Closes #[TBD]

* Enhance logging and user prompts in preflight and pre-provisioning scripts (#20)

- Updated logging functions in preflight-checks.sh, ssl-preprovision.sh, sync-appconfig.sh, postprovision.sh, and preprovision.sh for consistent output formatting.
- Improved user prompts for SSL certificate configuration and Azure Entra group creation in ssl-preprovision.sh and postprovision.sh.
- Added color-coded success, warning, and error messages for better visibility.
- Modified the handling of environment variables in postprovision.sh to ensure updates are made without overwriting existing values.
- Updated Terraform configurations to manage app configuration and cognitive account settings with soft delete options.

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: voice handler refactoring and MediaHandler migration (#21)

Major refactoring of voice processing architecture:

Core Voice Changes:
- Implement new VoiceHandler as primary entry point for voice sessions
- Delete deprecated speech_cascade/tts.py (652 lines removed)
- Consolidate TTS functionality into voice/tts/playback.py
- Enhance CascadeOrchestrator with improved turn management
- Add VoiceSessionContext for clean dependency injection

API & Integration:
- Migrate /api/v1/browser/conversation to VoiceHandler
- Migrate /api/v1/media/stream to VoiceHandler
- Create MediaHandler→VoiceHandler compatibility alias
- Update media_handler.py for backward compatibility

Infrastructure:
- Improve telemetry with Azure-style span naming
- Enhance ACS helpers with better session management
- Update session terminator for lifecycle management
- Add orchestration improvements for unified agents

Configuration & Samples:
- Update auth agent and insurance scenario configs
- Add handoff tool enhancements with context variables
- Update gpt_flow sample for new patterns

Frontend:
- Refactor App.jsx for improved voice handling UI

Testing & Documentation:
- Add test_voice_handler_compat.py for backward compatibility
- Add MEDIAHANDLER_MIGRATION.md tracking document

This change maintains full backward compatibility while establishing
the foundation for cleaner voice processing patterns going forward.

Closes #[TBD]

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Enhance the ScenarioBuilder with Flowy (#22)

* docs: add comprehensive voice processing architecture documentation

Add complete documentation for the voice processing architecture:

New Documentation:
- docs/architecture/voice/README.md - Comprehensive voice architecture guide
  * VoiceHandler overview and usage patterns
  * TTS playback and text processing
  * Speech cascade pipeline documentation
  * Audio specifications for browser and ACS transports
  * Testing guidelines with actual test file references
  * Troubleshooting guide for common issues

- apps/artagent/backend/voice/README.md - Developer quick reference
  * Directory structure and module organization
  * Quick start examples
  * Common tasks and patterns
  * File location guide
  * Testing commands

Documentation Updates:
- docs/mkdocs.yml - Add voice architecture to navigation
- docs/operations/troubleshooting.md - Add voice-specific troubleshooting

Key Improvements:
- Fixed mkdocs formatting for proper list rendering
- Updated all test references to match actual test files:
  * test_voice_handler_components.py
  * test_voice_handler_compat.py
  * test_cascade_orchestrator_entry_points.py
  * test_cascade_llm_processing.py
- Verified all script references (quick_test.sh, test_orchestrator.py)
- Added prerequisites for running tests with dev dependencies
- Included both basic and advanced testing examples

All file paths and examples have been verified against the actual codebase.

Related to #[TBD]

* Add custom styles for Flowy flowchart integration with agent blocks

* feat: Enhance output port visibility logic in ScenarioGraphCanvas

* feat: Add expandable full prompt view for source agent in HandoffEditorDialog

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Refactor ACS logging and add default orchestration scenario

- Removed info-level logging for ACS configuration details to reduce verbosity.
- Changed some logging statements to debug level for better log management.
- Updated peer.service attribute in telemetry to use "azure-communication-services".
- Introduced a new orchestration.yaml file defining a default customer service scenario with multiple agents and handoff configurations.

* Refactor ACS logging and add default orchestration scenario (#23)

- Removed info-level logging for ACS configuration details to reduce verbosity.
- Changed some logging statements to debug level for better log management.
- Updated peer.service attribute in telemetry to use "azure-communication-services".
- Introduced a new orchestration.yaml file defining a default customer service scenario with multiple agents and handoff configurations.

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Enhance logging functions to use log_plain for consistency and clarity in local development setup script

* Disable view toggle buttons for chat/graph/timeline in ConversationControls

* Add panning functionality to ScenarioGraphCanvas and reset button

* Update CHANGELOG.md for 2.0.0-beta.1 release: add new features, enhancements, fixes, and infrastructure changes

* feat: Add mkdocs-mermaid-zoom dependency and update locust load test scripts

- Added mkdocs-mermaid-zoom to pyproject.toml and uv.lock for enhanced diagram support in documentation.
- Enhanced locustfile.acs_media.py with rate limit detection and error handling improvements.
- Introduced locustfile.browser_conversation.py for testing browser-based voice conversation endpoints.
- Improved metrics naming conventions for clarity in load testing results.

* feat: Update Voice Live readiness status to use event envelope format

---------

Co-authored-by: Jin Lee <[email protected]>
Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
Co-authored-by: Anna Quincy <[email protected]>
Co-authored-by: Jin Lee <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Consolidate infrastructure resource documentation into infra/README.md (#26)

* Initial plan

* Add comprehensive infrastructure resources documentation with private networking links

Co-authored-by: JinLee794 <[email protected]>

* Consolidate infrastructure documentation into infra/README.md

Co-authored-by: JinLee794 <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: JinLee794 <[email protected]>

* enhancement: infra docs readme update (#100)

* Update version and SKU name in staging params

* Change version for text-embedding-3-large model

Updated the version of the text-embedding-3-large model.

* Update main.tfvars.staging.json

* Update communication.tf

* feat: Enhance status envelope with optional label and update frontend to derive WS URL

- Added optional `label` parameter to `make_status_envelope` function in `envelopes.py` to allow custom labels in status messages.
- Updated `entrypoint.sh` to derive WebSocket URL from `BACKEND_URL` or use `WS_URL` if provided, replacing placeholders in frontend assets.
- Upgraded `js-yaml` and `vite` dependencies in `package.json` and `package-lock.json`.
- Enhanced `App.jsx` to format event type labels and summarize event data for better user experience.
- Introduced new demo scenarios in `DemoScenariosWidget.jsx` to showcase Microsoft Copilot Studio integration and ACS call routing.
- Added tests for call transfer events in `test_acs_events_handlers.py` to ensure correct envelope broadcasting for transfer accepted and failed events.
- Created a new Jupyter notebook for custom speech model demonstration in `12-custom-speech-model.ipynb`.
- Updated Terraform parameters to include a new text embedding model in `main.tfvars.dev.json`.
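
The WebSocket URL derivation described for `entrypoint.sh` can be sketched as follows. The script itself is shell; this Python version only illustrates the logic, and the function name and the exact scheme-swapping rule are assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def derive_ws_url(backend_url: str, ws_url: Optional[str] = None) -> str:
    """Prefer an explicit WS_URL; otherwise derive it from BACKEND_URL."""
    if ws_url:
        return ws_url
    parts = urlsplit(backend_url)
    # Assumed mapping: http -> ws, https -> wss.
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"Unsupported BACKEND_URL scheme: {parts.scheme!r}")
    return urlunsplit((scheme, parts.netloc, parts.path.rstrip("/"), "", ""))
```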

* refactor: Comment out unused email communication service domain resource

* refactor: Comment out unused Azure email communication service resources

* feat: Enhance event handling and UI components

- Added new utility functions for formatting event types and summarizing event data in App.jsx.
- Improved ChatBubble component to display event messages with formatted labels and timestamps.
- Updated DemoScenariosWidget to include new scenarios and enhanced filtering options based on tags.
- Introduced websocket URL derivation in postprovision.sh for better backend integration.
- Added tests for call transfer events in test_acs_events_handlers.py to ensure proper envelope broadcasting.
- Updated package.json to include js-yaml and upgraded vite version.

* add value

* feat: Enhance distributed session handling and improve PayPal agent interactions

- Implement distributed session bus using Redis for cross-replica session routing in connection manager.
- Add methods for publishing session envelopes to Redis channels.
- Introduce confirmation context for call center transfers to ensure explicit user consent.
- Update PayPal agent templates to clarify authentication and routing guidelines.
- Enhance real-time voice app to manage relay WebSocket connections and handle session updates more effectively.
- Improve error handling and logging for distributed session delivery and Redis interactions.
- Refactor session envelope handling in frontend to accommodate new event types and improve user experience.

* feat: Enhance status tone metadata and improve chat bubble styling

* feat: Implement background task handling for MFA delivery and improve greeting messages in handoff processes

* feat: Enhance call escalation process with detailed transfer context and improve PayPal agent handoff scenarios

* feat: Implement retry mechanism for browser session ID resolution in media streaming

* feat: Enhance session management and greeting handling across various components

* fixing session mapping for acs calls

* add value

* add value

* adding test file

* Adding agents and templates for credit card recommendation and fee dispute agents

* add value

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* add value

* add value

* Implement Azure Voice Live service integration and enhance Terraform configurations for voice model deployments

* add value

* Add Azure Voice Live model configuration and outputs

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* remove sensitive contact information and unused transfer agency client data

* feat: Introduce Agent Consolidation Plan with YAML-driven architecture

- Added a comprehensive proposal for consolidating agent architecture in `apps/rtagent/backend/src/agents/`.
- Established key goals including single source of truth for agent definitions, auto-discovery, and unified tool registry.
- Analyzed current architecture and identified pain points such as manual handoff registration and duplicate tool registries.
- Proposed a new solution architecture featuring enhanced YAML schema, auto-discovery engine, and unified tool registry.
- Detailed implementation roadmap divided into phases for gradual migration and integration.
- Included backward compatibility strategy to ensure existing agents function without modification.
- Provided extensive documentation on YAML schema, CLI tool usage, and migration checklist.

* Refactor speech cascade handler and routing for browser communication

- Updated speech cascade handler to prioritize `on_greeting` callback over `on_tts_request` for greeting events.
- Added `queue_user_text` method to `SpeechCascadeHandler` for queuing user text input.
- Changed routing from `/realtime` to `/browser` for browser communication endpoints.
- Modified orchestration logic to ensure TTS responses are sent with blocking behavior to prevent overlap.
- Introduced WebSocket helper functions for better organization and clarity in messaging.
- Enhanced connection manager to handle Redis pubsub reconnections on credential expiration.
- Updated frontend components to reflect routing changes for browser communication.
- Adjusted tests to align with the new browser routing and functionality.
- Commented out live metrics enabling condition in telemetry configuration for future consideration.

* feat(telemetry): add decorators for tracing LLM, dependency, speech, and ACS calls

- Introduced decorators for OpenTelemetry instrumentation of LLM, dependency, speech, and ACS calls.
- Implemented a context manager for tracking conversation turns with detailed metrics.
- Added helper functions for recording GenAI and speech metrics.
- Enhanced span attributes for Azure Application Insights visualization.

* Remove telemetry configuration module (telemetry_config_v2.py) to streamline codebase and eliminate unused functionality.

* feat: Enhance telemetry and tracing for CosmosDB and latency tool

- Added OpenTelemetry tracing to CosmosDB operations with a decorator for latency tracking.
- Integrated tracing spans in the LatencyTool for better observability in Application Insights.
- Updated telemetry configuration to suppress noisy logs and added new attributes for speech cascade metrics.
- Created unit tests for SessionAgentManager, covering configuration management, override resolution, handoff management, and persistence.
- Removed outdated endpoints review document.
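
A latency-tracking decorator of the kind mentioned above might be structured like this. To stay dependency-free, the sketch appends to an in-memory list where a real implementation would open an OpenTelemetry span; `trace_latency` and `RECORDED` are hypothetical names.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Stand-in for a span exporter; a real version would use OpenTelemetry spans.
RECORDED: List[Dict[str, Any]] = []


def trace_latency(operation: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Record duration and outcome of each call under the given operation name."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                RECORDED.append(
                    {
                        "operation": operation,
                        "status": status,
                        "duration_ms": (time.perf_counter() - start) * 1000.0,
                    }
                )

        return wrapper

    return decorator
```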

* feat: Add useBackendHealth hook for backend health checks and integrate with readiness, agents, and health endpoints

test: Implement integration tests for VoiceLive Session Agent Manager, covering agent resolution, handoff mapping, and runtime modifications

* WARNING!!!! MAJOR REFACTOR COMMIT

- Removed the VoiceLive SDK integration module from the backend.
- Added a new AgentTopologyPanel component to the frontend for displaying agent inventory and connections.
- Integrated the AgentTopologyPanel into the main application layout.
- Updated the BackendIndicator to include agent count and selection functionality.
- Enhanced the ConversationControls with a fixed view switcher for better accessibility.
- Improved the useBackendHealth hook to handle various agent data structures.
- Updated styles for better responsiveness and visual consistency across components.
- Modified utility functions to format agent inventory data correctly.
- Adjusted import paths in orchestrators and tests to reflect the new backend structure.

* feat: Enhance agent handoff process and response handling; refactor UI components for improved usability

* feat: Update change notes for v2/speech-orchestration-and-monitoring branch; highlight major features, improvements, and new agents

* refactor: Remove Unified Agent Configuration Module; streamline agent management and improve code organization

* feat: Enhance ProfileDetailsPanel with resizable functionality and UI improvements

- Added resizable panel feature to ProfileDetailsPanel, allowing users to adjust width dynamically.
- Updated panel styling for improved aesthetics, including a gradient background and adjusted borders.
- Enhanced scrollbar visibility and overflow handling for better user experience.

refactor: Simplify GraphListView filter logic

- Removed default selection logic for filters in GraphListView, allowing users to start with no filters applied.
- Cleaned up useEffect dependencies for better performance and clarity.

docs: Introduce Backend Voice & Agents Architecture documentation

- Added comprehensive documentation outlining the architecture of backend voice and agent modules.
- Detailed separation of concerns between voice transport and agent business logic.
- Included data flow diagrams and module responsibilities for clarity.

docs: Create Handoff Logic Inventory for better understanding of handoff processes

- Documented the handoff logic across backend voice and agent modules.
- Established a single source of truth for handoff mappings and protocols.
- Summarized cleanup phases and their impact on the codebase.

fix: Update logging to safely handle span names

- Modified TraceLogFilter to safely retrieve span names, preventing attribute errors with NonRecordingSpan.
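
A sketch of the safe lookup, assuming the filter attaches a span name to each log record. The constructor argument and the `span_name` record attribute are hypothetical; the point is that `getattr` with a default avoids an `AttributeError` when the current span exposes no `name`, as with a non-recording span.

```python
import logging
from typing import Any, Callable


class TraceLogFilter(logging.Filter):
    """Attach the current span name to log records without assuming it exists."""

    def __init__(self, get_current_span: Callable[[], Any]) -> None:
        super().__init__()
        # Injected so the sketch runs without a tracing library installed.
        self._get_current_span = get_current_span

    def filter(self, record: logging.LogRecord) -> bool:
        span = self._get_current_span()
        # Spans without a name attribute fall back to a placeholder.
        record.span_name = getattr(span, "name", None) or "-"
        return True
```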

fix: Adjust telemetry configuration to capture all loggers

- Changed logger_name default to an empty string in TelemetryConfig to capture all loggers.

* feat: Implement context-aware greeting rendering in VoiceLive agent; enhance session management and logging

* feat: Refactor agent configuration and voice handling; streamline agent switching and TTS integration

* feat: Enhance Agent Details Panel and Session Management

- Added sessionAgentConfig prop to AgentDetailsPanel for dynamic agent configuration display.
- Implemented logic to show agent name, description, tools, and model/voice details based on session configuration.
- Introduced a new PanelCard in AgentDetailsPanel to display session agent configuration, including model, voice, and prompt preview.
- Updated App component to fetch session agent configuration on agent panel visibility and manage agent creation/updating.
- Added validation for TTS client initialization in dedicated_tts_pool.py to ensure clients are ready before use.
- Enhanced on_demand_pool.py to validate cached resources and remove invalid ones.
- Improved error logging in text_to_speech.py to include detailed initialization failure information and added is_ready property for synthesizer readiness check.

* Refactor code structure for improved readability and maintainability

* feat: Enhance MemoManager with background persistence and lifecycle management

- Added support for background persistence in MemoManager, allowing non-blocking state saving to Redis.
- Implemented task deduplication to cancel previous persistence tasks when a new one is initiated.
- Removed unused auto-refresh functionality and related attributes from MemoManager.
- Updated tests to verify new persistence behavior and ensure proper task management.
- Enhanced error handling and logging for background persistence operations.
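
The deduplicated background persistence described above can be illustrated with `asyncio`: starting a new save cancels any save still in flight, so only the latest state is written. `BackgroundPersister` and the injected `save` coroutine are hypothetical stand-ins for MemoManager and its Redis write.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, Optional

logger = logging.getLogger(__name__)


class BackgroundPersister:
    """Hypothetical helper: non-blocking saves where the newest one wins."""

    def __init__(self, save: Callable[[Dict[str, Any]], Awaitable[None]]) -> None:
        self._save = save
        self._task: Optional[asyncio.Task] = None

    def persist_background(self, state: Dict[str, Any]) -> asyncio.Task:
        """Schedule a save and cancel the previous one if still pending."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        # Snapshot the state so later mutations do not affect this save.
        self._task = asyncio.create_task(self._run(dict(state)))
        return self._task

    async def _run(self, snapshot: Dict[str, Any]) -> None:
        try:
            await self._save(snapshot)
        except asyncio.CancelledError:
            raise  # superseded by a newer save
        except Exception:
            logger.exception("Background persistence failed")
```

`persist_background` must be called from within a running event loop, since it relies on `asyncio.create_task`.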

* feat: Add Connection Warmup Analysis document for Azure Speech & OpenAI optimization

* feat(session): enhance sess…
JinLee794 added a commit to AIappsGBBFactory/art-voice-agent-accelerator that referenced this pull request Jan 22, 2026
…oyment fixes/enhancements (#30)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* Syncinc to Azure Samples  (#95)

* Delete samples/labs/dev/leadership_phrases.txt

* Update version and SKU name in staging params

* Change version for text-embedding-3-large model

Updated the version of the text-embedding-3-large model.

* Update main.tfvars.staging.json

* Update communication.tf

* feat: Enhance status envelope with optional label and update frontend to derive WS URL

- Added optional `label` parameter to `make_status_envelope` function in `envelopes.py` to allow custom labels in status messages.
- Updated `entrypoint.sh` to derive WebSocket URL from `BACKEND_URL` or use `WS_URL` if provided, replacing placeholders in frontend assets.
- Upgraded `js-yaml` and `vite` dependencies in `package.json` and `package-lock.json`.
- Enhanced `App.jsx` to format event type labels and summarize event data for better user experience.
- Introduced new demo scenarios in `DemoScenariosWidget.jsx` to showcase Microsoft Copilot Studio integration and ACS call routing.
- Added tests for call transfer events in `test_acs_events_handlers.py` to ensure correct envelope broadcasting for transfer accepted and failed events.
- Created a new Jupyter notebook for custom speech model demonstration in `12-custom-speech-model.ipynb`.
- Updated Terraform parameters to include a new text embedding model in `main.tfvars.dev.json`.

* refactor: Comment out unused email communication service domain resource

* refactor: Comment out unused Azure email communication service resources

* feat: Enhance event handling and UI components

- Added new utility functions for formatting event types and summarizing event data in App.jsx.
- Improved ChatBubble component to display event messages with formatted labels and timestamps.
- Updated DemoScenariosWidget to include new scenarios and enhanced filtering options based on tags.
- Introduced websocket URL derivation in postprovision.sh for better backend integration.
- Added tests for call transfer events in test_acs_events_handlers.py to ensure proper envelope broadcasting.
- Updated package.json to include js-yaml and upgraded vite version.

* add value

* feat: Enhance distributed session handling and improve PayPal agent interactions

- Implement distributed session bus using Redis for cross-replica session routing in connection manager.
- Add methods for publishing session envelopes to Redis channels.
- Introduce confirmation context for call center transfers to ensure explicit user consent.
- Update PayPal agent templates to clarify authentication and routing guidelines.
- Enhance real-time voice app to manage relay WebSocket connections and handle session updates more effectively.
- Improve error handling and logging for distributed session delivery and Redis interactions.
- Refactor session envelope handling in frontend to accommodate new event types and improve user experience.

* feat: Enhance status tone metadata and improve chat bubble styling

* feat: Implement background task handling for MFA delivery and improve greeting messages in handoff processes

* feat: Enhance call escalation process with detailed transfer context and improve PayPal agent handoff scenarios

* feat: Implement retry mechanism for browser session ID resolution in media streaming

* feat: Enhance session management and greeting handling across various components

* fixing session mapping for acs calls

* add value

* add value

* adding test file

* Adding agents and templates for credit card recommendation and fee dispute agents

* add value

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* add value

* add value

* Implement Azure Voice Live service integration and enhance Terraform configurations for voice model deployments

* add value

* Add Azure Voice Live model configuration and outputs

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* remove sensitive contact information and unused transfer agency client data

* feat: Introduce Agent Consolidation Plan with YAML-driven architecture

- Added a comprehensive proposal for consolidating agent architecture in `apps/rtagent/backend/src/agents/`.
- Established key goals including single source of truth for agent definitions, auto-discovery, and unified tool registry.
- Analyzed current architecture and identified pain points such as manual handoff registration and duplicate tool registries.
- Proposed a new solution architecture featuring enhanced YAML schema, auto-discovery engine, and unified tool registry.
- Detailed implementation roadmap divided into phases for gradual migration and integration.
- Included backward compatibility strategy to ensure existing agents function without modification.
- Provided extensive documentation on YAML schema, CLI tool usage, and migration checklist.

* Refactor speech cascade handler and routing for browser communication

- Updated speech cascade handler to prioritize `on_greeting` callback over `on_tts_request` for greeting events.
- Added `queue_user_text` method to `SpeechCascadeHandler` for queuing user text input.
- Changed routing from `/realtime` to `/browser` for browser communication endpoints.
- Modified orchestration logic to ensure TTS responses are sent with blocking behavior to prevent overlap.
- Introduced WebSocket helper functions for better organization and clarity in messaging.
- Enhanced connection manager to handle Redis pubsub reconnections on credential expiration.
- Updated frontend components to reflect routing changes for browser communication.
- Adjusted tests to align with the new browser routing and functionality.
- Commented out live metrics enabling condition in telemetry configuration for future consideration.

* feat(telemetry): add decorators for tracing LLM, dependency, speech, and ACS calls

- Introduced decorators for OpenTelemetry instrumentation of LLM, dependency, speech, and ACS calls.
- Implemented a context manager for tracking conversation turns with detailed metrics.
- Added helper functions for recording GenAI and speech metrics.
- Enhanced span attributes for Azure Application Insights visualization.

* Remove telemetry configuration module (telemetry_config_v2.py) to streamline codebase and eliminate unused functionality.

* feat: Enhance telemetry and tracing for CosmosDB and latency tool

- Added OpenTelemetry tracing to CosmosDB operations with a decorator for latency tracking.
- Integrated tracing spans in the LatencyTool for better observability in Application Insights.
- Updated telemetry configuration to suppress noisy logs and added new attributes for speech cascade metrics.
- Created unit tests for SessionAgentManager, covering configuration management, override resolution, handoff management, and persistence.
- Removed outdated endpoints review document.

* feat: Add useBackendHealth hook for backend health checks and integrate with readiness, agents, and health endpoints

test: Implement integration tests for VoiceLive Session Agent Manager, covering agent resolution, handoff mapping, and runtime modifications

* WARNING!!!! MAJOR REFACTOR COMMIT

- Removed the VoiceLive SDK integration module from the backend.
- Added a new AgentTopologyPanel component to the frontend for displaying agent inventory and connections.
- Integrated the AgentTopologyPanel into the main application layout.
- Updated the BackendIndicator to include agent count and selection functionality.
- Enhanced the ConversationControls with a fixed view switcher for better accessibility.
- Improved the useBackendHealth hook to handle various agent data structures.
- Updated styles for better responsiveness and visual consistency across components.
- Modified utility functions to format agent inventory data correctly.
- Adjusted import paths in orchestrators and tests to reflect the new backend structure.

* feat: Enhance agent handoff process and response handling; refactor UI components for improved usability

* feat: Update change notes for v2/speech-orchestration-and-monitoring branch; highlight major features, improvements, and new agents

* refactor: Remove Unified Agent Configuration Module; streamline agent management and improve code organization

* feat: Enhance ProfileDetailsPanel with resizable functionality and UI improvements

- Added resizable panel feature to ProfileDetailsPanel, allowing users to adjust width dynamically.
- Updated panel styling for improved aesthetics, including a gradient background and adjusted borders.
- Enhanced scrollbar visibility and overflow handling for better user experience.

refactor: Simplify GraphListView filter logic

- Removed default selection logic for filters in GraphListView, allowing users to start with no filters applied.
- Cleaned up useEffect dependencies for better performance and clarity.

docs: Introduce Backend Voice & Agents Architecture documentation

- Added comprehensive documentation outlining the architecture of backend voice and agent modules.
- Detailed separation of concerns between voice transport and agent business logic.
- Included data flow diagrams and module responsibilities for clarity.

docs: Create Handoff Logic Inventory for better understanding of handoff processes

- Documented the handoff logic across backend voice and agent modules.
- Established a single source of truth for handoff mappings and protocols.
- Summarized cleanup phases and their impact on the codebase.

fix: Update logging to safely handle span names

- Modified TraceLogFilter to safely retrieve span names, preventing attribute errors with NonRecordingSpan.
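
The safe lookup described above can be sketched as follows. This is a minimal illustration, not the project's `TraceLogFilter`: the class `SafeSpanNameFilter`, the stand-in `_NoNameSpan`, and the `get_span` argument are hypothetical names, and the sketch assumes the filter only needs a `name` attribute that some span types do not expose.

```python
import logging


class _NoNameSpan:
    """Stand-in for a non-recording span that exposes no `name` attribute."""


class SafeSpanNameFilter(logging.Filter):
    """Hypothetical filter: attaches a span name to each record without raising."""

    def __init__(self, get_span):
        super().__init__()
        self._get_span = get_span  # callable returning the current span (assumed)

    def filter(self, record):
        span = self._get_span()
        # getattr with a default avoids AttributeError on spans without `name`.
        record.span_name = getattr(span, "name", None) or "-"
        return True


record = logging.LogRecord("demo", logging.INFO, "demo.py", 0, "hello", None, None)
SafeSpanNameFilter(lambda: _NoNameSpan()).filter(record)
print(record.span_name)
```

Falling back to a placeholder keeps log formatting stable when no recording span is active.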

fix: Adjust telemetry configuration to capture all loggers

- Changed logger_name default to an empty string in TelemetryConfig to capture all loggers.

* feat: Implement context-aware greeting rendering in VoiceLive agent; enhance session management and logging

* feat: Refactor agent configuration and voice handling; streamline agent switching and TTS integration

* feat: Enhance Agent Details Panel and Session Management

- Added sessionAgentConfig prop to AgentDetailsPanel for dynamic agent configuration display.
- Implemented logic to show agent name, description, tools, and model/voice details based on session configuration.
- Introduced a new PanelCard in AgentDetailsPanel to display session agent configuration, including model, voice, and prompt preview.
- Updated App component to fetch session agent configuration on agent panel visibility and manage agent creation/updating.
- Added validation for TTS client initialization in dedicated_tts_pool.py to ensure clients are ready before use.
- Enhanced on_demand_pool.py to validate cached resources and remove invalid ones.
- Improved error logging in text_to_speech.py to include detailed initialization failure information and added is_ready property for synthesizer readiness check.

* Refactor code structure for improved readability and maintainability

* feat: Enhance MemoManager with background persistence and lifecycle management

- Added support for background persistence in MemoManager, allowing non-blocking state saving to Redis.
- Implemented task deduplication to cancel previous persistence tasks when a new one is initiated.
- Removed unused auto-refresh functionality and related attributes from MemoManager.
- Updated tests to verify new persistence behavior and ensure proper task management.
- Enhanced error handling and logging for background persistence operations.
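
The task deduplication described above can be sketched with plain `asyncio`. This is an illustration under assumptions, not the `MemoManager` code: `BackgroundPersister`, `persist_in_background`, and the `save` callable are hypothetical, and a short sleep stands in for the Redis write.

```python
import asyncio


class BackgroundPersister:
    """Hypothetical helper: keeps at most one in-flight persistence task."""

    def __init__(self, save):
        self._save = save   # async callable that writes state (assumed)
        self._task = None

    def persist_in_background(self, state):
        # Cancel the previous task so only the newest snapshot is written.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        # Copy the state so later mutations do not leak into the pending write.
        self._task = asyncio.create_task(self._save(dict(state)))
        return self._task


async def demo():
    written = []

    async def slow_save(state):
        await asyncio.sleep(0.01)  # stands in for a Redis round trip
        written.append(state)

    persister = BackgroundPersister(slow_save)
    first = persister.persist_in_background({"turn": 1})
    second = persister.persist_in_background({"turn": 2})
    await asyncio.gather(first, second, return_exceptions=True)
    return written, first.cancelled()


written, first_cancelled = asyncio.run(demo())
```

Only the second snapshot reaches the store, because the first task is cancelled before it completes.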

* feat: Add Connection Warmup Analysis document for Azure Speech & OpenAI optimization

* feat(session): enhance session ID management and URL parameter support

- Added `pickSessionIdFromUrl` function to extract session ID from URL parameters.
- Updated `getOrCreateSessionId` to allow session ID restoration from URL.
- Refactored `setSessionId` for better logging and session management.
- Improved `createNewSessionId` to utilize `setSessionId`.

docs(api): restructure API documentation for clarity and completeness

- Organized API endpoints into categories: Health & Monitoring, Call Management, Media Streaming, Browser Conversations, Session Metrics, Agent Builder, Demo Environment, and TTS Health.
- Added detailed descriptions and examples for each endpoint.
- Included new sections for interactive API documentation and WebSocket endpoints.

docs(api-reference): update WebSocket message types and endpoint details

- Clarified message types for incoming audio data and control messages.
- Updated WebSocket endpoint URLs and query parameters for browser conversations and dashboard relay.

docs(architecture): refine agent architecture diagrams for clarity

- Adjusted diagrams to improve readability and understanding of the agent framework and orchestration.

fix(architecture): correct orchestration mode comparison table

- Updated ratings for Azure Speech voices and simplicity of setup in the orchestration comparison table.

docs(getting-started): add demo guide and enhance onboarding experience

- Introduced a new demo guide to facilitate user onboarding and provide structured paths for different user levels.
- Enhanced the getting started guide with tips and recommended paths for new users.

feat(aoai): implement OpenAI connection warmup to reduce latency

- Added `warm_openai_connection` function to pre-establish OpenAI connection and reduce cold-start latency on first call.

feat(speech): implement token warmup for Speech API to minimize latency

- Added `warm_token` method in `SpeechTokenManager` to pre-fetch tokens during startup, reducing latency on first API call.
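
A minimal sketch of the token warmup idea, assuming a cache that fetches once at startup and reuses the token until it expires. `TokenCache`, its `warm` and `get` methods, the `fetch` callable, and the TTL value are all hypothetical and do not mirror `SpeechTokenManager`.

```python
import time


class TokenCache:
    """Hypothetical cache: fetches a token once and reuses it until it expires."""

    def __init__(self, fetch, ttl_seconds=600.0):
        self._fetch = fetch          # callable returning a token string (assumed)
        self._ttl = ttl_seconds      # assumed lifetime, not a service guarantee
        self._token = None
        self._expires_at = 0.0

    def warm(self):
        """Fetch eagerly at startup so the first request skips the round trip."""
        self._token = self._fetch()
        self._expires_at = time.monotonic() + self._ttl

    def get(self):
        if self._token is None or time.monotonic() >= self._expires_at:
            self.warm()
        return self._token


calls = []


def fake_fetch():
    calls.append(1)
    return "token-%d" % len(calls)


cache = TokenCache(fake_fetch)
cache.warm()               # done during application startup
first_token = cache.get()  # served from the cache, no second fetch
```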

* feat(healthcare): Implement Nurse Triage Agent with symptom assessment and routing capabilities

- Introduced a comprehensive voice agent for healthcare triage.
- Added agent configuration and prompt templates for patient interaction.
- Developed healthcare tools for patient verification, clinical knowledge search, and symptom urgency assessment.
- Integrated routing logic for scheduling appointments and emergency transfers.
- Enhanced documentation with demo scenarios and testing instructions.

* feat: Implement logging utility and session management

- Added a logger utility to manage console logging levels and filtering.
- Created session management functions to handle session IDs, including retrieval from URL and session creation.
- Developed styles for the frontend components to ensure consistent UI design.
- Configured Vite for the frontend build process with proper asset handling and environment variable support.
- Introduced scripts for starting the backend and frontend development servers, including Azure Dev Tunnel hosting.

* feat: Simplify agent handoff process by refining context management and removing redundant data collection

* feat: Enhance agent handoff process by managing conversation history and user context

* feat: Enhance message handling by persisting tool calls and results as JSON for conversation continuity

* feat: Implement silent handoff protocol across agents to enhance user experience and streamline transitions

* feat: Add Azure App Configuration module with RBAC and Key Vault integration

- Implemented main resource for Azure App Configuration in Terraform.
- Added outputs for App Configuration details including ID, name, and endpoint.
- Defined variables for App Configuration module, including identity and Key Vault integration.
- Updated main Terraform outputs to include App Configuration details.
- Enhanced error handling in Azure OpenAI client for missing endpoint configuration.
- Improved Redis manager to handle port configuration with better error messaging.
- Updated requirements to include Azure App Configuration SDKs.

* first code clean up

* enabling oidc

* Refactor code structure and remove redundant sections for improved readability and maintainability

* add value

* add value

* feat: Add managed certificate and domain registration modules

- Introduced `managed-cert-example.bicep` for example usage of managed certificate deployment.
- Created `managed-cert.bicep` to handle App Service Domain registration and managed SSL certificate generation.
- Implemented `role-assignment.bicep` for managing role assignments with support for built-in and custom roles.
- Added `windows-vm.bicep` for deploying a Windows VM as a jumphost with necessary networking components.
- Developed `peer-virtual-networks.bicep` for establishing peering between virtual networks.
- Implemented `private-dns-zone.bicep` for creating and linking private DNS zones to virtual networks.
- Created `private-endpoint.bicep` for deploying private endpoints with DNS zone integration.
- Added `vnet.bicep` for creating virtual networks with associated subnets and network security groups.
- Updated `types.bicep` with new types for model deployment, role assignments, and network configurations.
- Developed `secret.bicep` for managing secrets in Azure Key Vault.
- Created `network.bicep` for orchestrating network resources including virtual networks and subnets.

* fix: Update default location parameter in create_storage function for clarity

* feat: Extract AZURE_LOCATION from environment-specific tfvars file if not set

* feat: Implement location resolution with fallback chain in preprovision script

* fix: Update Dockerfile to install runtime dependencies and mitigate vulnerabilities

* chore: Update CHANGELOG for version 1.5.0 release and remove changenotes.md; enable remote builds in azure.yaml; enhance terraform initialization script with location prompts

* feat: Update launch configuration and scripts to use virtual environment with uv; enhance README for deployment clarity

* further deployment cleanup, docs update/tweaks, adding more todos

* removing unused dependency in src/herlpers.py

* refactor: Update architecture diagram in README for clarity and consistency in orchestration modes

* add value

* Refactor Terraform configuration:

- Update main.tf to adjust foundry account and project naming conventions.
- Remove feature flags and keys from appconfig module as they are now managed externally.
- Clean up variables.tf by removing unused variables and updating descriptions.
- Delete provider configuration file as it is no longer needed.
- Change default application name from "rtaudioagent" to "artagent" and adjust related settings.
- Modify connection settings and pool sizes for improved performance.

* feat: Enhance Azure Voice Live integration and refactor configuration management

* last changes

* feat: Add app configuration bootstrap to initialize environment variables

* Enhance configuration loading with .env.local support and update documentation

* fix voicelive output attributes

* add

* Refactor agent paths and update documentation for agent discovery and configuration

* Add Insurance Voice Agent Scenario documentation and update navigation

- Introduced a comprehensive guide for the Insurance Customer Service Scenario, detailing the security-focused multi-agent voice system for claims processing, fraud detection, and policy management.
- Updated mkdocs.yml to include the new Insurance documentation in the Industry Solutions section.

* Add integration proposal for Spec-Driven Development methodology in ARTVoice

* add value

* Enhance Terraform configuration and scripts for Voice Live integration

- Update Dockerfile to install dependencies and set up virtual environment.
- Modify initialize-terraform.sh and local-dev-setup.sh for improved script handling.
- Refactor sync-appconfig.sh to streamline key-value imports and feature flag management.
- Add provider.conf.json generation for remote state backend configuration.
- Update main.tf and outputs.tf to support new Voice Live model deployments.
- Introduce voice_live_location and voice_live_model_deployments variables in variables.tf.

* feat: Add Concierge agent configuration and prompts for banking scenarios

- Introduced a new YAML configuration for the Concierge agent, defining its voice, model, session, and tool configurations.
- Created a comprehensive prompt file for the Concierge agent, detailing voice and language settings, identity and trust guidelines, and operational modes.
- Implemented scenario orchestration analysis to address issues with agent initialization and fallback logic, ensuring the correct agent is set for banking scenarios.
- Renamed orchestration.yaml to scenario.yaml for consistency in scenario loading.
- Updated default start agent to BankingConcierge and added validation for agent existence at startup.

* feat: Enhance scenario loading to support orchestration.yaml naming convention

* feat: Implement scenario-based handoff map resolution for orchestrator configuration

* cicd test for azd deploy

* feat: Update audio handling and documentation dependencies for improved installation and error handling

* feat: Refactor app configuration handling to prioritize .env.local overrides and improve environment variable management

* feat: Revise documentation deployment workflow to enhance dependency management and streamline build process

* modified docs workflow

* feat: Add site_dir configuration to mkdocs.yml for improved site structure

* feat: Allow mkdocs build to proceed with warnings by removing --strict flag

* fix: Update health check endpoint in postprovision script to use correct API path

* refactor: Remove outdated AZD deployment workflow and update documentation links for clarity

* fix: Ensure principal_id logging does not fail and handle local_state retrieval correctly

* refactor: Simplify state key handling in provider configuration by using environment name

* fix: Skip null values when loading static parameters from tfvars file to use Terraform defaults
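
The null-skipping rule can be illustrated in a few lines. This sketch assumes a JSON-format tfvars file and a hypothetical `load_static_params` helper; the actual fix may live in a shell script and look different.

```python
import json


def load_static_params(tfvars_json_text):
    """Return only parameters with concrete values; nulls fall back to Terraform defaults."""
    raw = json.loads(tfvars_json_text)
    # Compare against None explicitly so falsy values such as 0 or "" are kept.
    return {key: value for key, value in raw.items() if value is not None}


params = load_static_params('{"location": "eastus2", "name": null, "replicas": 0}')
print(params)
```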

* fix: Use coalesce function for location assignment in storage account resource

* refactor: Remove unused backend API public URL variable and related validation

* refactor: Remove unused backend API public URL and source phone number from environment parameter files

* improvements flow

* fix: Implement auto-selection and timeout for user input in setup scripts

* add value

* fix: Update naming conventions for foundry account and project variables in locals

* fix: Update name from rtaudioagent to artaudioagent in environment parameter files

* fix: Update name from rtaudioagent to artaudioagent in environment parameter files

* fix: Update documentation URLs to reflect new repository location

* feat: Enhance API documentation and tagging for better clarity and organization

* docs: Update documentation links and improve clarity across various guides

* refactor: replace deploy-azd workflow with reusable template and remove redundant summary job

- Updated the deployment workflow name to "Deploy to Azure".
- Replaced the usage of the old deploy-azd.yml with a new reusable template _template-deploy-azd.yml.
- Removed the deployment summary job and its associated steps to streamline the workflow.

* fix: Add run-name to the Azure deployment workflow for better clarity

* fix: Update condition for output extraction in deployment workflow

* fix: Update GitHub token to use secrets for enhanced security

* feat: Add optional GitHub PAT secret and enhance environment variable handling for Azure deployment

* Add the resource group (rg) as an environment variable set at the GitHub environment level

* fix: Add emoji to workflow names for better visibility

* feat: Update documentation workflow name and enhance README with deployment badges

* fix: Update README layout and enhance navigation links for better user experience

* fix: Restore header for ARTVoice Accelerator Framework in README

* add value

* fix: Update README layout for improved clarity and navigation

* Enhance provisioning scripts and documentation

- Updated postprovision.sh to clarify phone number provisioning steps and added guidance for obtaining a phone number via Azure Portal.
- Modified preprovision.sh to include preflight checks for tools, authentication, and providers before proceeding with provisioning.
- Added jq as a prerequisite in the getting-started documentation and provided installation instructions for various platforms.
- Created a new TODO-deployfixes.md file to document common issues encountered during deployment sessions, including resolutions for Docker errors, jq installation, and subscription registration.
- Expanded troubleshooting.md with detailed solutions for common deployment and provisioning issues, including authentication mismatches, Docker errors, jq command not found, and ACS phone number prompts.
- Updated variables.tf to improve the description of the voice_live_location variable, including a link to supported Azure regions.

* feat: Update branch triggers in workflow to include feat/troubleshooting-enhancements

* fix(ci): simplify test-azd-hooks workflow tests and run in parallel

- Remove fragile grep-based function extraction that caused syntax errors
- Run lint, linux, macos, windows tests in parallel (no dependencies)
- Trigger on all pushes to main/staging (remove path filters for push)
- Simplify backend configuration test to avoid function sourcing issues

* feat: Add troubleshooting steps for "bad interpreter" errors and enhance post-provisioning instructions for phone number configuration

* feat: Add preprovision hook execution to Linux, macOS, and Windows test jobs in CI workflow

* feat: Enhance AZD hook testing with postprovision execution and Azure CLI setup

* feat: Update test job names for clarity and enhance preflight checks for CI mode

* feat: Update preflight checks to conditionally include Docker in CI mode and log its status

* feat: Add Dev Container testing for AZD hooks with environment validation and summary reporting

* feat: Enhance deployment scripts with pre/post-provisioning hooks and Azure CLI extension checks

* feat: Add troubleshooting guidance for MkDocs module errors and update dev dependencies in uv.lock

* feat: Update Azure deployment workflows and normalize container memory formats

* feat: Add troubleshooting guidance for Terraform state lock errors and provide remote/local fix options

* feat: Remove outdated troubleshooting documentation for deployment issues

* Apply suggestion from @Copilot

Co-authored-by: Copilot

* Apply suggestion from @Copilot

Co-authored-by: Copilot

* Update .github/workflows/test-azd-hooks.yml

Co-authored-by: Copilot

* feat: Implement TTS Streaming Latency Analysis and Optimization Plan

- Added a comprehensive document outlining the critical latency issues in TTS playback within the Speech Cascade architecture.
- Identified root causes including processing loop deadlock, sentence buffering delays, queue-based event processing, and full synthesis before streaming.
- Proposed a multi-phase optimization strategy to address identified issues, including:
  - Phase 0: Fix processing loop deadlock by creating a dedicated TTS processing task.
  - Phase 1: Reduce sentence buffer threshold for earlier TTS chunk dispatch.
  - Phase 2: Implement parallel TTS prefetching to synthesize the next sentence while streaming.
  - Phase 3: Enable streaming TTS synthesis to stream audio while synthesizing.
  - Phase 4: Achieve full pipeline parallelism for LLM to TTS to WebSocket streaming.
- Created a detailed test implementation plan with metrics and success criteria to validate improvements.
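
Phase 2 (prefetching the next sentence while the current one streams) can be sketched with `asyncio`. This is an illustration under assumptions: `speak_with_prefetch`, `synthesize`, and `stream` are hypothetical names, and the sleeps stand in for real synthesis and playback.

```python
import asyncio


async def speak_with_prefetch(sentences, synthesize, stream):
    """Start synthesizing sentence i+1 before streaming sentence i."""
    if not sentences:
        return
    pending = asyncio.create_task(synthesize(sentences[0]))
    for upcoming in list(sentences[1:]) + [None]:
        audio = await pending
        if upcoming is not None:
            # Kick off the next synthesis; it runs while the stream below awaits.
            pending = asyncio.create_task(synthesize(upcoming))
        await stream(audio)


async def demo():
    events = []

    async def synthesize(text):
        events.append(("synth_start", text))
        await asyncio.sleep(0.01)  # stands in for a TTS request
        return text.upper()

    async def stream(audio):
        events.append(("stream", audio))
        await asyncio.sleep(0.01)  # stands in for sending audio frames

    await speak_with_prefetch(["hi", "there"], synthesize, stream)
    return events


events = asyncio.run(demo())
```

Playback order is preserved because each sentence is still streamed in sequence; only the synthesis of the following sentence overlaps with it.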

test: Add unit tests for HandoffService

- Created unit tests for the HandoffService, covering handoff detection, target resolution, and handoff resolution methods.
- Implemented tests for greeting selection and context building to ensure proper functionality.
- Added tests for the HandoffResolution dataclass to verify properties and default values.

* feat: Add Scenario Builder component and integrate with RealTimeVoiceApp

- Introduced ScenarioBuilder component for visual orchestration of agent flows.
- Implemented drag-and-drop functionality for agents and handoff configuration.
- Added buttons in RealTimeVoiceApp for accessing Agent and Scenario Builders.
- Enhanced state management for agent scenarios, including creation and updates.
- Integrated new handoff editor for configuring agent interactions.

* Refactor code structure for improved readability and maintainability

* Add error handling for Redis connection issues and implement unit tests for HandoffService

- Enhanced AzureRedisManager to handle RedisClusterException and OSError during client connection attempts.
- Introduced comprehensive unit tests for HandoffService, covering handoff detection, target resolution, handoff resolution, greeting selection, and context building.
- Added tests for HandoffResolution dataclass to ensure correct property behavior and default values.

* Enhance LiveOrchestrator to handle context-only session updates without UI broadcasts

* Refactor LiveOrchestrator to prevent duplicate UI updates by omitting redundant session_updated broadcasts during context-only updates.

* Refactor environment variable assignment in deploy workflow for clarity

* Refactor tests and dependencies following module renaming and API changes

- Removed pytest-twisted from dev dependencies in pyproject.toml and uv.lock.
- Updated conftest.py to mock configuration and Azure OpenAI client for tests.
- Skipped tests in test_acs_media_lifecycle.py, test_acs_media_lifecycle_memory.py, and test_acs_simple.py due to dependencies on removed/renamed modules.
- Adjusted imports in test_artagent_wshelpers.py for orchestrator path change.
- Skipped tests in test_call_transfer_service.py due to API changes in toolstore.
- Updated datetime usage in test_demo_env_phrase_bias.py to use UTC.
- Modified websocket endpoint assertions in test_realtime.py to reflect new paths.
- Added new test file test_voice_handler_components.py for voice handler components.

* Add comprehensive tests for VoiceLive handler and orchestrator memory management

- Implement tests to verify cleanup functionality in LiveOrchestrator.
- Ensure proper registration and unregistration of orchestrators in the registry.
- Test background task tracking and cleanup mechanisms.
- Validate greeting task cancellation during orchestrator cleanup.
- Introduce memory leak detection tests to prevent unbounded growth in orchestrator registry.
- Verify user message history deque is properly bounded and cleared on cleanup.
- Add scenario update tests to ensure correct agent management during updates.
- Optimize hot path functions to ensure non-blocking behavior during network calls.
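
The bounded history that these tests verify can be sketched with `collections.deque`. `MessageHistory` and its method names are hypothetical, and the limit of three is chosen only for the example.

```python
from collections import deque


class MessageHistory:
    """Hypothetical bounded history: old entries drop off automatically."""

    def __init__(self, max_messages=5):
        self._messages = deque(maxlen=max_messages)

    def add(self, text):
        self._messages.append(text)

    def snapshot(self):
        return list(self._messages)

    def cleanup(self):
        # Release references so a finished session does not pin memory.
        self._messages.clear()


history = MessageHistory(max_messages=3)
for i in range(5):
    history.add(f"msg-{i}")
recent = history.snapshot()
history.cleanup()
```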

* feat: Enhance AgentBuilder with consistent field names and improved UI elements

* Refactor logging levels from info to debug in connection manager, warmable pool, Redis manager, speech auth manager, speech recognizer, and text-to-speech modules for improved log verbosity control

- Remove outdated greeting context tests and add comprehensive scenario orchestration contract tests to ensure functional contracts are preserved during refactoring.
- Update session agent manager tests to use set comparison for agent listing to avoid dict ordering issues.

* feat: Add predefined handoff condition patterns to enhance scenario orchestration

* add value

* feat(metrics): Introduce shared metrics factory for lazy initialization

- Added `metrics_factory.py` to provide a common infrastructure for OpenTelemetry metrics.
- Implemented `LazyMeter`, `LazyHistogram`, and `LazyCounter` for lazy initialization of metrics.
- Updated `speech_cascade/metrics.py` to utilize the new shared metrics factory, simplifying metric initialization.
- Refactored `voicelive/metrics.py` to use the shared factory for consistent metric handling.
- Enhanced orchestrator classes in `speech_cascade/orchestrator.py` and `voicelive/orchestrator.py` to cache orchestrator configurations, improving performance and reducing redundant calls.
- Introduced utility functions for building common metric attributes, ensuring consistency across metrics.
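
The lazy-initialization pattern can be sketched without any telemetry dependency. `LazyInstrument` and `FakeHistogram` are hypothetical stand-ins, not the `LazyHistogram` or `LazyCounter` classes from `metrics_factory.py`; the sketch only shows deferring instrument creation until the first measurement.

```python
class LazyInstrument:
    """Hypothetical wrapper that creates the underlying instrument on first use."""

    def __init__(self, factory):
        self._factory = factory   # zero-argument callable building the instrument
        self._instrument = None

    def record(self, value, attributes=None):
        if self._instrument is None:
            # Deferred until the first measurement, after telemetry is configured.
            self._instrument = self._factory()
        self._instrument.record(value, attributes or {})


class FakeHistogram:
    """Stand-in for a real histogram so the sketch runs without dependencies."""

    created = 0

    def __init__(self):
        FakeHistogram.created += 1
        self.points = []

    def record(self, value, attributes):
        self.points.append((value, attributes))


latency = LazyInstrument(FakeHistogram)
created_before_use = FakeHistogram.created
latency.record(120.5, {"stage": "tts"})
latency.record(98.0)
created_after_use = FakeHistogram.created
```

Declaring instruments at import time stays cheap this way, and the real instrument is built exactly once.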

* feat: Consolidate handoff logic into a unified HandoffService for consistent behavior across orchestrators and enhance documentation

* fix: Simplify environment determination logic in deployment workflow

* add value

* feat: Add user flow screenshots and enhance documentation for guided agent setup

* feat: Enhance scenario testing instructions for clarity and user guidance

* fix: Correct image paths in quickstart guide for accurate rendering

* feat: Add initial agent builder and template selection screenshots to quickstart guide

* feat: Add demo profile creation steps and related images to quickstart guide

* feat: Implement EasyAuth configuration script and integrate into post-provisioning process

* refactor: Remove backend IP restrictions configuration and related outputs

* Added non-qualifying rush response to ensure clear model behavior

* updated order so confirmation statement is in the correct spot

* add value

* add value

* chore: Remove unused workflow images for demo profiles

* fix: Update demo profile creation images in quickstart guide

* fix: Update home screen image in quickstart guide

* fix: Update home screen and scenario images in quickstart guide

* add value

* add value

* add value

* add value

* add value

* add

* add value

* art

* add opentelemetry import for tracing support in TTS module

* refactor: update LiveOrchestrator to enhance user message history management and improve handoff context

* Refactor TTS Playback and Voice Handling

- Consolidated TTS playback logic into a unified class for speech cascade.
- Removed deprecated VoiceSessionContext and related compatibility shims.
- Enhanced error handling during tool initialization and event handler registration.
- Updated model configuration handling in UnifiedAgent to prioritize mode-specific settings.
- Improved logging for TTS synthesis and streaming processes.
- Added new handoff tool registration for dynamic routing.

* refactor: streamline EasyAuth enabling process in CI mode and improve interactive prompts

* refactor: enhance EasyAuth interactive prompts and streamline user choices

* refactor: enhance run-name logic for Azure deployment workflow

* fix: update environment logic for pull_request events in Azure deployment workflow

* refactor: update preprovision hook execution and streamline backend configuration

* feat: add context variable support for handoffs and enhance UI for variable mapping

* feat: enhance TTS processing by adding text sanitization and sentence boundary detection (#11)

Co-authored-by: Jin Lee (HLS US SE)
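
A rough sketch of text sanitization followed by sentence boundary detection, assuming simple regular expressions are enough for illustration. `sanitize_for_tts` and `split_sentences` are hypothetical names, and the rules here (dropping markdown markers, splitting after `.`, `!`, or `?`) are deliberately simplified compared with what a production TTS pipeline would need.

```python
import re

_MARKDOWN = re.compile(r"[*_`#]+")            # markers a voice should not read aloud
_WHITESPACE = re.compile(r"\s+")
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # whitespace that follows end punctuation


def sanitize_for_tts(text):
    """Strip markdown markers and collapse whitespace."""
    text = _MARKDOWN.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def split_sentences(text):
    """Return sanitized sentences suitable for chunked synthesis."""
    return [s for s in _SENTENCE_END.split(sanitize_for_tts(text)) if s]


sentences = split_sentences("**Hello** there!  Your balance is `42`. Anything else?")
print(sentences)
```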

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#14)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE)

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#15)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE)

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#13)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE)

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#12)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE)

* feat: Responses API Infrastructure & Dual Model Configuration (#16)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses
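
The routing between the two endpoints can be sketched as below. The rule itself is an assumption: the source names `_should_use_responses_endpoint()` and `endpoint_preference` but does not state the accepted values or the decision logic, so the `"auto"`, `"chat"`, and `"responses"` values, the `ModelSettings` class, and the reasoning-based fallback are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSettings:
    # Assumed values: "auto", "chat", "responses".
    endpoint_preference: str = "auto"
    reasoning_effort: Optional[str] = None


def should_use_responses_endpoint(settings):
    """Hypothetical rule: honor an explicit preference, else infer from options."""
    if settings.endpoint_preference == "responses":
        return True
    if settings.endpoint_preference == "chat":
        return False
    # "auto": assume reasoning-specific options imply the Responses API.
    return settings.reasoning_effort is not None


def endpoint_path(settings):
    if should_use_responses_endpoint(settings):
        return "/responses"
    return "/chat/completions"


default_path = endpoint_path(ModelSettings())
reasoning_path = endpoint_path(ModelSettings(reasoning_effort="low"))
forced_chat_path = endpoint_path(
    ModelSettings(endpoint_preference="chat", reasoning_effort="low")
)
```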

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API
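
Mode-specific model selection with aliases might look like the sketch below. The alias grouping (`media` with `cascade`, `realtime` with `voicelive`) and the fallback to a shared `model` field are assumptions made for illustration; `AgentModels` is a hypothetical stand-in for `AgentConfig`.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed alias grouping; the source lists the four names without pairing them.
_MODE_ALIASES = {
    "cascade": "cascade",
    "media": "cascade",
    "voicelive": "voicelive",
    "realtime": "voicelive",
}


@dataclass
class AgentModels:
    model: str                           # shared default (assumed field)
    cascade_model: Optional[str] = None
    voicelive_model: Optional[str] = None

    def get_model_for_mode(self, mode):
        canonical = _MODE_ALIASES.get(mode.lower())
        if canonical is None:
            raise ValueError(f"unknown mode: {mode!r}")
        if canonical == "cascade":
            specific = self.cascade_model
        else:
            specific = self.voicelive_model
        # Fall back to the shared default when no mode-specific model is set.
        return specific or self.model


agent = AgentModels(model="default-model", voicelive_model="realtime-model")
cascade_choice = agent.get_model_for_mode("media")
voicelive_choice = agent.get_model_for_mode("realtime")
```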

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.

* fix: update project version to 2.0.0-beta in pyproject.toml

---------

Co-authored-by: Jin Lee (HLS US SE)

* feat: Orchestrator Integration + Optimizations (#17)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.

* feat: integrate Responses API in orchestrators and add optimizations

**Cascade Orchestrator:**
- Update model selection to use agent.get_model_for_mode('cascade')
- Integrate Responses API routing based on endpoint_preference
- Add error handling for unsupported parameters
- Extract TTS processing into separate tts_processor module

**VoiceLive Orchestrator:**
- Update to use agent.get_model_for_mode('voicelive')
- Add registry cleanup to prevent unbounded growth
- Improve memory management and stale orchestrator cleanup
- Extract DTMF processing into separate dtmf_processor module
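
One way to keep an orchestrator registry from growing without bound is to evict entries that have been idle too long. The sketch below assumes an idle-time policy; the class name, the `max_idle_seconds` parameter, and the method names are hypothetical.

```python
import time
from typing import Optional


class OrchestratorRegistry:
    """Tracks orchestrators per session and evicts idle ones (illustrative only)."""

    def __init__(self, max_idle_seconds: float = 300.0) -> None:
        self._max_idle = max_idle_seconds
        self._entries: dict[str, tuple[object, float]] = {}

    def register(self, session_id: str, orchestrator: object, now: Optional[float] = None) -> None:
        # Store the orchestrator together with its last-seen timestamp.
        seen = time.monotonic() if now is None else now
        self._entries[session_id] = (orchestrator, seen)

    def cleanup_stale(self, now: Optional[float] = None) -> int:
        """Remove entries idle for longer than max_idle_seconds; return how many were removed."""
        current = time.monotonic() if now is None else now
        stale = [sid for sid, (_, seen) in self._entries.items() if current - seen > self._max_idle]
        for sid in stale:
            del self._entries[sid]
        return len(stale)

    def __len__(self) -> int:
        return len(self._entries)
```

The `now` parameter exists only so the eviction rule can be exercised without waiting on the clock.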

**Tests:**
- Add test_cascade_orchestrator_entry_points
- Add test_cascade_llm_processing
- Add test_dtmf_processor

Depends on: PR #1 (Responses API Infrastructure)

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: Evaluation Framework + Frontend UI (#18)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.

* feat: add evaluation framework and frontend UI for Responses API

**Evaluation Framework:**
- Add EventRecorder with git commit SHA tracking
- Add API-aware scoring with budget adjustments for verbosity
- Add scenario runner for automated testing
- Add CLI for running evaluations
- Add validate_phases.py for phase-based validation
- Add wrappers for endpoint detection
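
Tagging every recorded event with the current git commit SHA could be sketched as follows. Only the `EventRecorder` name comes from the commit; the methods, the event fields, and the "unknown" fallback are assumptions.

```python
import json
import subprocess
import time
from typing import Any


def current_commit_sha() -> str:
    """Return the HEAD commit SHA, or 'unknown' outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True, timeout=5,
        )
        return result.stdout.strip()
    except (OSError, subprocess.SubprocessError):
        return "unknown"


class EventRecorder:
    """Collects evaluation events, each stamped with the commit under test."""

    def __init__(self) -> None:
        self.commit_sha = current_commit_sha()
        self.events: list[dict[str, Any]] = []

    def record(self, event_type: str, **payload: Any) -> dict[str, Any]:
        event = {"type": event_type, "commit_sha": self.commit_sha, "ts": time.time(), **payload}
        self.events.append(event)
        return event

    def to_jsonl(self) -> str:
        # One JSON object per line, convenient for later comparison across runs.
        return "\n".join(json.dumps(event) for event in self.events)
```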

**Frontend UI:**
- Add cascade_model and voicelive_model selectors in Agent Builder
- Add Responses API endpoint preference dropdown
- Add conditional fields for verbosity, reasoning_effort, etc.
- Update ScenarioBuilder with model configuration options
- Display API version fields

**Documentation:**
- Add docs/testing/model-evaluation.md
- Add evaluation playground Jupyter notebook

Depends on: PR #1 (Responses API Infrastructure)

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Move lifecycle management logic into a dedicated structure to keep main.py clean (#19)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: voice handler refactoring and MediaHandler migration

Major refactoring of voice processing architecture:

Core Voice Changes:
- Implement new VoiceHandler as primary entry point for voice sessions
- Delete deprecated speech_cascade/tts.py (652 lines removed)
- Consolidate TTS functionality into voice/tts/playback.py
- Enhance CascadeOrchestrator with improved turn management
- Add VoiceSessionContext for clean dependency injection

API & Integration:
- Migrate /api/v1/browser/conversation to VoiceHandler
- Migrate /api/v1/media/stream to VoiceHandler
- Create MediaHandler→VoiceHandler compatibility alias
- Update media_handler.py for backward compatibility
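
A compatibility alias of this kind can be as small as a module-level rebinding. The sketch below uses a stand-in `VoiceHandler` with a hypothetical constructor; the real class is not shown in this log.

```python
class VoiceHandler:
    """Stand-in for the new entry point (constructor arguments are hypothetical)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id


# The old name resolves to the same class, so existing imports and
# isinstance checks keep working while callers migrate.
MediaHandler = VoiceHandler
```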

Infrastructure:
- Improve telemetry with Azure-style span naming
- Enhance ACS helpers with better session management
- Update session terminator for lifecycle management
- Add orchestration improvements for unified agents

Configuration & Samples:
- Update auth agent and insurance scenario configs
- Add handoff tool enhancements with context variables
- Update gpt_flow sample for new patterns

Frontend:
- Refactor App.jsx for improved voice handling UI

Testing & Documentation:
- Add test_voice_handler_compat.py for backward compatibility
- Add MEDIAHANDLER_MIGRATION.md tracking document

This change maintains full backward compatibility while establishing
the foundation for cleaner voice processing patterns going forward.

Closes #[TBD]

* Enhance logging and user prompts in preflight and pre-provisioning scripts (#20)

- Updated logging functions in preflight-checks.sh, ssl-preprovision.sh, sync-appconfig.sh, postprovision.sh, and preprovision.sh for consistent output formatting.
- Improved user prompts for SSL certificate configuration and Azure Entra group creation in ssl-preprovision.sh and postprovision.sh.
- Added color-coded success, warning, and error messages for better visibility.
- Modified the handling of environment variables in postprovision.sh to ensure updates are made without overwriting existing values.
- Updated Terraform configurations to manage app configuration and cognitive account settings with soft delete options.

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: voice handler refactoring and MediaHandler migration (#21)

Major refactoring of voice processing architecture:

Core Voice Changes:
- Implement new VoiceHandler as primary entry point for voice sessions
- Delete deprecated speech_cascade/tts.py (652 lines removed)
- Consolidate TTS functionality into voice/tts/playback.py
- Enhance CascadeOrchestrator with improved turn management
- Add VoiceSessionContext for clean dependency injection

API & Integration:
- Migrate /api/v1/browser/conversation to VoiceHandler
- Migrate /api/v1/media/stream to VoiceHandler
- Create MediaHandler→VoiceHandler compatibility alias
- Update media_handler.py for backward compatibility

Infrastructure:
- Improve telemetry with Azure-style span naming
- Enhance ACS helpers with better session management
- Update session terminator for lifecycle management
- Add orchestration improvements for unified agents

Configuration & Samples:
- Update auth agent and insurance scenario configs
- Add handoff tool enhancements with context variables
- Update gpt_flow sample for new patterns

Frontend:
- Refactor App.jsx for improved voice handling UI

Testing & Documentation:
- Add test_voice_handler_compat.py for backward compatibility
- Add MEDIAHANDLER_MIGRATION.md tracking document

This change maintains full backward compatibility while establishing
the foundation for cleaner voice processing patterns going forward.

Closes #[TBD]

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Enhance the ScenarioBuilder with Flowy (#22)

* docs: add comprehensive voice processing architecture documentation

Add complete documentation for the voice processing architecture:

New Documentation:
- docs/architecture/voice/README.md - Comprehensive voice architecture guide
  * VoiceHandler overview and usage patterns
  * TTS playback and text processing
  * Speech cascade pipeline documentation
  * Audio specifications for browser and ACS transports
  * Testing guidelines with actual test file references
  * Troubleshooting guide for common issues

- apps/artagent/backend/voice/README.md - Developer quick reference
  * Directory structure and module organization
  * Quick start examples
  * Common tasks and patterns
  * File location guide
  * Testing commands

Documentation Updates:
- docs/mkdocs.yml - Add voice architecture to navigation
- docs/operations/troubleshooting.md - Add voice-specific troubleshooting

Key Improvements:
- Fixed mkdocs formatting for proper list rendering
- Updated all test references to match actual test files:
  * test_voice_handler_components.py
  * test_voice_handler_compat.py
  * test_cascade_orchestrator_entry_points.py
  * test_cascade_llm_processing.py
- Verified all script references (quick_test.sh, test_orchestrator.py)
- Added prerequisites for running tests with dev dependencies
- Included both basic and advanced testing examples

All file paths and examples have been verified against the actual codebase.

Related to #[TBD]

* Add custom styles for Flowy flowchart integration with agent blocks

* feat: Enhance output port visibility logic in ScenarioGraphCanvas

* feat: Add expandable full prompt view for source agent in HandoffEditorDialog

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Refactor ACS logging and add default orchestration scenario

- Removed info-level logging for ACS configuration details to reduce verbosity.
- Changed some logging statements to debug level for better log management.
- Updated peer.service attribute in telemetry to use "azure-communication-services".
- Introduced a new orchestration.yaml file defining a default customer service scenario with multiple agents and handoff configurations.

* Refactor ACS logging and add default orchestration scenario (#23)

- Removed info-level logging for ACS configuration details to reduce verbosity.
- Changed some logging statements to debug level for better log management.
- Updated peer.service attribute in telemetry to use "azure-communication-services".
- Introduced a new orchestration.yaml file defining a default customer service scenario with multiple agents and handoff configurations.

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Enhance logging functions to use log_plain for consistency and clarity in local development setup script

* Disable view toggle buttons for chat/graph/timeline in ConversationControls

* Add panning functionality to ScenarioGraphCanvas and reset button

* Update CHANGELOG.md for 2.0.0-beta.1 release: add new features, enhancements, fixes, and infrastructure changes

* feat: Add mkdocs-mermaid-zoom dependency and update locust load test scripts

- Added mkdocs-mermaid-zoom to pyproject.toml and uv.lock for enhanced diagram support in documentation.
- Enhanced locustfile.acs_media.py with rate limit detection and error handling improvements.
- Introduced locustfile.browser_conversation.py for testing browser-based voice conversation endpoints.
- Improved metrics naming conventions for clarity in load testing results.

* feat: Update Voice Live readiness status to use event envelope format

---------

Co-authored-by: Jin Lee <[email protected]>
Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
Co-authored-by: Anna Quincy <[email protected]>
Co-authored-by: Jin Lee <[email protected]>
Co-authored-by: Copilot <[email protected]>

* enhancement: infra docs readme update (#100)

* Update version and SKU name in staging params

* Change version for text-embedding-3-large model

Updated the version of the text-embedding-3-large model.

* Update main.tfvars.staging.json

* Update communication.tf

* feat: Enhance status envelope with optional label and update frontend to derive WS URL

- Added optional `label` parameter to `make_status_envelope` function in `envelopes.py` to allow custom labels in status messages.
- Updated `entrypoint.sh` to derive WebSocket URL from `BACKEND_URL` or use `WS_URL` if provided, replacing placeholders in frontend assets.
- Upgraded `js-yaml` and `vite` dependencies in `package.json` and `package-lock.json`.
- Enhanced `App.jsx` to format event type labels and summarize event data for better user experience.
- Introduced new demo scenarios in `DemoScenariosWidget.jsx` to showcase Microsoft Copilot Studio integration and ACS call routing.
- Added tests for call transfer events in `test_acs_events_handlers.py` to ensure correct envelope broadcasting for transfer accepted and failed events.
- Created a new Jupyter notebook for custom speech model demonstration in `12-custom-speech-model.ipynb`.
- Updated Terraform parameters to include a new text embedding model in `main.tfvars.dev.json`.
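
The optional `label` parameter might be handled as in the sketch below. Only the function name and the `label` parameter come from the commit; the remaining arguments and the envelope layout are assumed for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def make_status_envelope(
    message: str,
    *,
    session_id: Optional[str] = None,
    label: Optional[str] = None,
) -> dict[str, Any]:
    """Build a status envelope; the label is included only when provided."""
    payload: dict[str, Any] = {"message": message}
    if label is not None:
        payload["label"] = label
    return {
        "type": "status",
        "session_id": session_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
```

Omitting the key when no label is given keeps existing consumers, which never expected a label, unaffected.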

* refactor: Comment out unused email communication service domain resource

* refactor: Comment out unused Azure email communication service resources

* feat: Enhance event handling and UI components

- Added new utility functions for formatting event types and summarizing event data in App.jsx.
- Improved ChatBubble component to display event messages with formatted labels and timestamps.
- Updated DemoScenariosWidget to include new scenarios and enhanced filtering options based on tags.
- Introduced websocket URL derivation in postprovision.sh for better backend integration.
- Added tests for call transfer events in test_acs_events_handlers.py to ensure proper envelope broadcasting.
- Updated package.json to include js-yaml and upgraded vite version.

* add value

* feat: Enhance distributed session handling and improve PayPal agent interactions

- Implement distributed session bus using Redis for cross-replica session routing in connection manager.
- Add methods for publishing session envelopes to Redis channels.
- Introduce confirmation context for call center transfers to ensure explicit user consent.
- Update PayPal agent templates to clarify authentication and routing guidelines.
- Enhance real-time voice app to manage relay WebSocket connections and handle session updates more effectively.
- Improve error handling and logging for distributed session delivery and Redis interactions.
- Refactor session envelope handling in frontend to accommodate new event types and improve user experience.
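
Cross-replica routing over Redis pub/sub can be sketched as a per-session channel plus a JSON-encoded envelope. The channel naming scheme and the `SessionBus` class are hypothetical; the publisher is injected so the sketch runs without a Redis server.

```python
import json
from typing import Any, Callable


def session_channel(session_id: str, prefix: str = "session") -> str:
    """Assumed naming scheme: one pub/sub channel per session."""
    return f"{prefix}:{session_id}"


class SessionBus:
    """Publishes session envelopes so whichever replica owns the socket can deliver them."""

    def __init__(self, publish: Callable[[str, str], Any]) -> None:
        # publish(channel, message) is injected by the caller.
        self._publish = publish

    def publish_envelope(self, session_id: str, envelope: dict[str, Any]) -> None:
        self._publish(session_channel(session_id), json.dumps(envelope))
```

With redis-py, the bound method `client.publish` of a `redis.Redis` instance has the `(channel, message)` shape expected here.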

* feat: Enhance status tone metadata and improve chat bubble styling

* feat: Implement background task handling for MFA delivery and improve greeting messages in handoff processes

* feat: Enhance call escalation process with detailed transfer context and improve PayPal agent handoff scenarios

* feat: Implement retry mechanism for browser session ID resolution in media streaming

* feat: Enhance session management and greeting handling across various components

* Fix session mapping for ACS calls

* add value

* add value

* adding test file

* Adding agents and templates for credit card recommendation and fee dispute agents

* add value

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* add value

* add value

* Implement Azure Voice Live service integration and enhance Terraform configurations for voice model deployments

* add value

* Add Azure Voice Live model configuration and outputs

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* remove sensitive contact information and unused transfer agency client data

* feat: Introduce Agent Consolidation Plan with YAML-driven architecture

- Added a comprehensive proposal for consolidating agent architecture in `apps/rtagent/backend/src/agents/`.
- Established key goals including single source of truth for agent definitions, auto-discovery, and unified tool registry.
- Analyzed current architecture and identified pain points such as manual handoff registration and duplicate tool registries.
- Proposed a new solution architecture featuring enhanced YAML schema, auto-discovery engine, and unified tool registry.
- Detailed implementation roadmap divided into phases for gradual migration and integration.
- Included backward compatibility strategy to ensure existing agents function without modification.
- Provided extensive documentation on YAML schema, CLI tool usage, and migration checklist.

* Refactor speech cascade handler and routing for browser communication

- Updated speech cascade handler to prioritize `on_greeting` callback over `on_tts_request` for greeting events.
- Added `queue_user_text` method to `SpeechCascadeHandler` for queuing user text input.
- Changed routing from `/realtime` to `/browser` for browser communication endpoints.
- Modified orchestration logic to ensure TTS responses are sent with blocking behavior to prevent overlap.
- Introduced WebSocket helper functions for better organization and clarity in messaging.
- Enhanced connection manager to handle Redis pubsub reconnections on credential expiration.
- Updated frontend components to reflect routing changes for browser communication.
- Adjusted tests to align with the new browser routing and functionality.
- Commented out live metrics enabling condition in telemetry configuration for future consideration.
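
The callback priority for greeting events could be expressed as below. The dispatcher class and its return values are hypothetical; only the callback names `on_greeting` and `on_tts_request` come from the commit.

```python
from typing import Callable, Optional

TextCallback = Callable[[str], None]


class GreetingDispatcher:
    """Routes a greeting to on_greeting when set, else falls back to on_tts_request."""

    def __init__(
        self,
        on_greeting: Optional[TextCallback] = None,
        on_tts_request: Optional[TextCallback] = None,
    ) -> None:
        self.on_greeting = on_greeting
        self.on_tts_request = on_tts_request

    def handle_greeting(self, text: str) -> str:
        """Invoke the highest-priority callback and report which one ran."""
        if self.on_greeting is not None:
            self.on_greeting(text)
            return "on_greeting"
        if self.on_tts_request is not None:
            self.on_tts_request(text)
            return "on_tts_request"
        return "dropped"
```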

* feat(telemetry): add decorators for tracing LLM, dependency, speech, and ACS calls

- Introduced decorators for OpenTelemetry instrumentation of LLM, dependency, speech, and ACS calls.
- Implemented a context manager for tracking conversation turns with detailed metrics.
- Added helper functions for recording GenAI and speech metrics.
- Enhanced span attributes for Azure Application Insights visualization.
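
The general shape of a tracing decorator is sketched below using only the standard library. It is a stand-in, not the commit's code: the real decorators emit OpenTelemetry spans, whereas this version appends span-like records to a list, and the names `traced` and `RECORDED_SPANS` are hypothetical.

```python
import functools
import time
from typing import Any, Callable

# Stand-in for a span exporter: each call appends one record here.
RECORDED_SPANS: list[dict[str, Any]] = []


def traced(kind: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Wrap a function so each call records its name, status, and duration."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on both success and failure, like closing a span.
                RECORDED_SPANS.append({
                    "name": f"{kind}.{func.__name__}",
                    "status": status,
                    "duration_ms": (time.perf_counter() - start) * 1000.0,
                })

        return wrapper

    return decorator
```

A function decorated with `@traced("llm")` and named `complete` would record a span named `llm.complete` on every call.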

* Remove telemetry configuration module (telemetry_config_v2.py) to streamline codebase and eliminate unused functionality.

* feat: Enhance telemetry and tracing for CosmosDB and latency tool

- Added OpenTelemetry tracing to CosmosDB operations with a decorator for latency tracking.
- Integrated tracing spans in the LatencyTool for better observability in Application Insights.
- Updated telemetry configuration to suppress noisy logs and added new attributes for speech cascade metrics.
- Created unit tests for SessionAgentManager, covering configuration management, override resolution, handoff management, and persistence.
- Removed outdated endpoints review document.

* feat: Add useBackendHealth hook for backend health checks and integrate with readiness, agents, and health endpoints

test: Implement integration tests for VoiceLive Session Agent Manager, covering agent resolution, handoff mapping, and runtime modifications

* WARNING!!!! MAJOR REFACTOR COMMIT

- Removed the VoiceLive SDK integration module from the backend.
- Added a new AgentTopologyPanel component to the frontend for displaying agent inventory and connections.
- Integrated the AgentTopologyPanel into the main application layout.
- Updated the BackendIndicator to include agent count and selection functionality.
- Enhanced the ConversationControls with a fixed view switcher for better accessibility.
- Improved the useBackendHealth hook to handle various agent data structures.
- Updated styles for better responsiveness and visual consistency across components.
- Modified utility functions to format agent inventory data correctly.
- Adjusted import paths in orchestrators and tests to reflect the new backend structure.

* feat: Enhance agent handoff process and response handling; refactor UI components for improved usability

* feat: Update change notes for v2/speech-orchestration-and-monitoring branch; highlight major features, improvements, and new agents

* refactor: Remove Unified Agent Configuration Module; streamline agent management and improve code organization

* feat: Enhance ProfileDetailsPanel with resizable functionality and UI improvements

- Added resizable panel feature to ProfileDetailsPanel, allowing users to adjust width dynamically.
- Updated panel styling for improved aesthetics, including a gradient background and adjusted borders.
- Enhanced scrollbar visibility and overflow handling for better user experience.

refactor: Simplify GraphListView filter logic

- Removed default selection logic for filters in GraphListView, allowing users to start with no filters applied.
- Cleaned up useEffect dependencies for better performance and clarity.

docs: Introduce Backend Voice & Agents Architecture documentation

- Added comprehensive documentation outlining the architecture of backend voice and agent modules.
- Detailed separation of concerns between voice transport and agent business logic.
- Included data flow diagrams and module responsibilities for clarity.

docs: Create Handoff Logic Inventory for better understanding of handoff processes

- Documented the handoff logic across backend voice and agent modules.
- Established a single source of truth for handoff mappings and protocols.
- Summarized cleanup phases and their impact on the codebase.

fix: Update logging to safely handle span names

- Modified TraceLogFilter to safely retrieve span names, preventing attribute errors with NonRecordingSpan.
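
Safe span-name retrieval in a logging filter might look like the sketch below. The injected `get_current_span` callable, the `span_name` record attribute, and the "-" placeholder are assumptions; the point is only that `getattr` with a default avoids the attribute error on span objects that carry no name.

```python
import logging
from typing import Any, Callable


class TraceLogFilter(logging.Filter):
    """Attaches the active span name to each log record without assuming it exists."""

    def __init__(self, get_current_span: Callable[[], Any]) -> None:
        super().__init__()
        # Injected so the sketch does not depend on a tracing library.
        self._get_current_span = get_current_span

    def filter(self, record: logging.LogRecord) -> bool:
        span = self._get_current_span()
        # getattr with a default never raises, even when the span has no name.
        record.span_name = getattr(span, "name", None) or "-"
        return True
```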

fix: Adjust telemetry configuration to capture all loggers

- Changed logger_name default to an empty string in TelemetryConfig to capture all loggers.

* feat: Implement context-aware greeting rendering in VoiceLive agent; enhance session management and logging

* feat: Refactor agent configuration and voice handling; streamline agent switching and TTS integration

* feat: Enhance Agent Details Panel and Session Management

- Added sessionAgentConfig prop to AgentDetailsPanel for dynamic agent configuration display.
- Implemented logic to show agent name, description, tools, and model/voice details based on session configuration.
- Introduced a new PanelCard in AgentDetailsPanel to display session agent configuration, including model, voice, and prompt preview.
- Updated App component to fetch session agent configuration on agent panel visibility and manage agent creation/updating.
- Added validation for TTS client initialization in dedicated_tts_pool.py to ensure clients are ready before use.
- Enhanced on_demand_pool.py to validate cached resources and remove invalid ones.
- Improved error logging in text_to_speech.py to include detailed initialization failure information and added is_ready property for synthesizer readiness check.

* Refactor code structure for improved readability and maintainability

* feat: Enhance MemoManager with background persistence and lifecycle management

- Added support for background persistence in MemoManager, allowing non-blocking state saving to Redis.
- Implemented task deduplication to cancel previous persistence tasks when a new one is initiated.
- Removed unused auto-refresh functionality and related attributes from MemoManager.
- Updated tests to verify new persistence behavior and ensure proper task management.
- Enhanced error handling and logging for background persistence operations.
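
Task deduplication for background persistence can be sketched with asyncio as follows: each new request cancels the still-pending previous save, so only the latest state is written. The `BackgroundPersister` class and the injected `save` coroutine are hypothetical and stand in for MemoManager and its Redis write.

```python
import asyncio
from typing import Any, Callable, Coroutine, Optional


class BackgroundPersister:
    """Schedules non-blocking saves and keeps at most one in flight."""

    def __init__(self, save: Callable[[dict[str, Any]], Coroutine[Any, Any, None]]) -> None:
        self._save = save
        self._task: Optional[asyncio.Task] = None

    def persist_in_background(self, state: dict[str, Any]) -> asyncio.Task:
        """Cancel any unfinished save, then schedule a save of a state snapshot."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        # Copy the state so later mutations do not race with the pending save.
        self._task = asyncio.create_task(self._save(dict(state)))
        return self._task
```

`persist_in_background` must be called from within a running event loop, since `asyncio.create_task` requires one.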

* feat: Add Connection Warmup Analysis document for Azure Speech & OpenAI optimization

* feat(session): enhance session ID management and URL parameter support

- Added `pickSessionIdFromUrl` function to extract session ID from URL parameters.
- Updated `getOrCreateSessionId` to allow session ID restoration from URL.
- Refactored `setSessionId` for better logging and session management.
- Improved `createNewSessionId` to utilize `setSessionId`.

docs(api): restructure API documentation for clarity and completeness

- Organized API endpoints into categories: Health & Monitoring, Call Management, Media Streaming, Browser Conversations, Session Metrics, …
JinLee794 added a commit to AIappsGBBFactory/art-voice-agent-accelerator that referenced this pull request Jan 26, 2026
…t overview (#40)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* Syncinc to Azure Samples  (#95)

* Delete samples/labs/dev/leadership_phrases.txt

* Update version and SKU name in staging params

* Change version for text-embedding-3-large model

Updated the version of the text-embedding-3-large model.

* Update main.tfvars.staging.json

* Update communication.tf

* feat: Enhance status envelope with optional label and update frontend to derive WS URL

- Added optional `label` parameter to `make_status_envelope` function in `envelopes.py` to allow custom labels in status messages.
- Updated `entrypoint.sh` to derive WebSocket URL from `BACKEND_URL` or use `WS_URL` if provided, replacing placeholders in frontend assets.
- Upgraded `js-yaml` and `vite` dependencies in `package.json` and `package-lock.json`.
- Enhanced `App.jsx` to format event type labels and summarize event data for better user experience.
- Introduced new demo scenarios in `DemoScenariosWidget.jsx` to showcase Microsoft Copilot Studio integration and ACS call routing.
- Added tests for call transfer events in `test_acs_events_handlers.py` to ensure correct envelope broadcasting for transfer accepted and failed events.
- Created a new Jupyter notebook for custom speech model demonstration in `12-custom-speech-model.ipynb`.
- Updated Terraform parameters to include a new text embedding model in `main.tfvars.dev.json`.

* refactor: Comment out unused email communication service domain resource

* refactor: Comment out unused Azure email communication service resources

* feat: Enhance event handling and UI components

- Added new utility functions for formatting event types and summarizing event data in App.jsx.
- Improved ChatBubble component to display event messages with formatted labels and timestamps.
- Updated DemoScenariosWidget to include new scenarios and enhanced filtering options based on tags.
- Introduced websocket URL derivation in postprovision.sh for better backend integration.
- Added tests for call transfer events in test_acs_events_handlers.py to ensure proper envelope broadcasting.
- Updated package.json to include js-yaml and upgraded vite version.

* add value

* feat: Enhance distributed session handling and improve PayPal agent interactions

- Implement distributed session bus using Redis for cross-replica session routing in connection manager.
- Add methods for publishing session envelopes to Redis channels.
- Introduce confirmation context for call center transfers to ensure explicit user consent.
- Update PayPal agent templates to clarify authentication and routing guidelines.
- Enhance real-time voice app to manage relay WebSocket connections and handle session updates more effectively.
- Improve error handling and logging for distributed session delivery and Redis interactions.
- Refactor session envelope handling in frontend to accommodate new event types and improve user experience.

* feat: Enhance status tone metadata and improve chat bubble styling

* feat: Implement background task handling for MFA delivery and improve greeting messages in handoff processes

* feat: Enhance call escalation process with detailed transfer context and improve PayPal agent handoff scenarios

* feat: Implement retry mechanism for browser session ID resolution in media streaming

* feat: Enhance session management and greeting handling across various components

* Fix session mapping for ACS calls

* add value

* add value

* adding test file

* Adding agents and templates for credit card recommendation and fee dispute agents

* add value

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* add value

* add value

* Implement Azure Voice Live service integration and enhance Terraform configurations for voice model deployments

* add value

* Add Azure Voice Live model configuration and outputs

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* remove sensitive contact information and unused transfer agency client data

* feat: Introduce Agent Consolidation Plan with YAML-driven architecture

- Added a comprehensive proposal for consolidating agent architecture in `apps/rtagent/backend/src/agents/`.
- Established key goals including single source of truth for agent definitions, auto-discovery, and unified tool registry.
- Analyzed current architecture and identified pain points such as manual handoff registration and duplicate tool registries.
- Proposed a new solution architecture featuring enhanced YAML schema, auto-discovery engine, and unified tool registry.
- Detailed implementation roadmap divided into phases for gradual migration and integration.
- Included backward compatibility strategy to ensure existing agents function without modification.
- Provided extensive documentation on YAML schema, CLI tool usage, and migration checklist.

* Refactor speech cascade handler and routing for browser communication

- Updated speech cascade handler to prioritize `on_greeting` callback over `on_tts_request` for greeting events.
- Added `queue_user_text` method to `SpeechCascadeHandler` for queuing user text input.
- Changed routing from `/realtime` to `/browser` for browser communication endpoints.
- Modified orchestration logic to ensure TTS responses are sent with blocking behavior to prevent overlap.
- Introduced WebSocket helper functions for better organization and clarity in messaging.
- Enhanced connection manager to handle Redis pubsub reconnections on credential expiration.
- Updated frontend components to reflect routing changes for browser communication.
- Adjusted tests to align with the new browser routing and functionality.
- Commented out live metrics enabling condition in telemetry configuration for future consideration.

* feat(telemetry): add decorators for tracing LLM, dependency, speech, and ACS calls

- Introduced decorators for OpenTelemetry instrumentation of LLM, dependency, speech, and ACS calls.
- Implemented a context manager for tracking conversation turns with detailed metrics.
- Added helper functions for recording GenAI and speech metrics.
- Enhanced span attributes for Azure Application Insights visualization.

* Remove telemetry configuration module (telemetry_config_v2.py) to streamline codebase and eliminate unused functionality.

* feat: Enhance telemetry and tracing for CosmosDB and latency tool

- Added OpenTelemetry tracing to CosmosDB operations with a decorator for latency tracking.
- Integrated tracing spans in the LatencyTool for better observability in Application Insights.
- Updated telemetry configuration to suppress noisy logs and added new attributes for speech cascade metrics.
- Created unit tests for SessionAgentManager, covering configuration management, override resolution, handoff management, and persistence.
- Removed outdated endpoints review document.

* feat: Add useBackendHealth hook for backend health checks and integrate with readiness, agents, and health endpoints

test: Implement integration tests for VoiceLive Session Agent Manager, covering agent resolution, handoff mapping, and runtime modifications

* WARNING!!!! MAJOR REFACTOR COMMIT

- Removed the VoiceLive SDK integration module from the backend.
- Added a new AgentTopologyPanel component to the frontend for displaying agent inventory and connections.
- Integrated the AgentTopologyPanel into the main application layout.
- Updated the BackendIndicator to include agent count and selection functionality.
- Enhanced the ConversationControls with a fixed view switcher for better accessibility.
- Improved the useBackendHealth hook to handle various agent data structures.
- Updated styles for better responsiveness and visual consistency across components.
- Modified utility functions to format agent inventory data correctly.
- Adjusted import paths in orchestrators and tests to reflect the new backend structure.

* feat: Enhance agent handoff process and response handling; refactor UI components for improved usability

* feat: Update change notes for v2/speech-orchestration-and-monitoring branch; highlight major features, improvements, and new agents

* refactor: Remove Unified Agent Configuration Module; streamline agent management and improve code organization

* feat: Enhance ProfileDetailsPanel with resizable functionality and UI improvements

- Added resizable panel feature to ProfileDetailsPanel, allowing users to adjust width dynamically.
- Updated panel styling for improved aesthetics, including a gradient background and adjusted borders.
- Enhanced scrollbar visibility and overflow handling for better user experience.

refactor: Simplify GraphListView filter logic

- Removed default selection logic for filters in GraphListView, allowing users to start with no filters applied.
- Cleaned up useEffect dependencies for better performance and clarity.

docs: Introduce Backend Voice & Agents Architecture documentation

- Added comprehensive documentation outlining the architecture of backend voice and agent modules.
- Detailed separation of concerns between voice transport and agent business logic.
- Included data flow diagrams and module responsibilities for clarity.

docs: Create Handoff Logic Inventory for better understanding of handoff processes

- Documented the handoff logic across backend voice and agent modules.
- Established a single source of truth for handoff mappings and protocols.
- Summarized cleanup phases and their impact on the codebase.

fix: Update logging to safely handle span names

- Modified TraceLogFilter to safely retrieve span names, preventing attribute errors with NonRecordingSpan.
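
A minimal sketch of the defensive lookup described above. The filter body, the `span_name` record attribute, the `get_current_span` hook, and both span classes are assumptions made for illustration; only the `TraceLogFilter` name and the `NonRecordingSpan` failure mode come from the commit message.

```python
import logging


class TraceLogFilter(logging.Filter):
    """Attach the current span name to each log record, if one is available."""

    def __init__(self, get_current_span):
        super().__init__()
        # Hypothetical hook: a callable returning the active span object.
        self._get_current_span = get_current_span

    def filter(self, record: logging.LogRecord) -> bool:
        span = self._get_current_span()
        # Non-recording spans may not expose `name`, so fall back to a
        # placeholder instead of raising AttributeError.
        record.span_name = getattr(span, "name", None) or "-"
        return True


class RecordingSpan:
    """Stand-in for a span that carries a name."""
    name = "voice_session"


class NonRecordingSpan:
    """Stand-in for a span type that has no `name` attribute."""


def span_name_for(span) -> str:
    """Run one log record through the filter and return the attached name."""
    record = logging.LogRecord("demo", logging.INFO, "demo.py", 0, "msg", None, None)
    TraceLogFilter(lambda: span).filter(record)
    return record.span_name
```

Using `getattr` with a default keeps the filter from ever failing a log call, which matters because a logging filter runs on every record.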

fix: Adjust telemetry configuration to capture all loggers

- Changed logger_name default to an empty string in TelemetryConfig to capture all loggers.

* feat: Implement context-aware greeting rendering in VoiceLive agent; enhance session management and logging

* feat: Refactor agent configuration and voice handling; streamline agent switching and TTS integration

* feat: Enhance Agent Details Panel and Session Management

- Added sessionAgentConfig prop to AgentDetailsPanel for dynamic agent configuration display.
- Implemented logic to show agent name, description, tools, and model/voice details based on session configuration.
- Introduced a new PanelCard in AgentDetailsPanel to display session agent configuration, including model, voice, and prompt preview.
- Updated App component to fetch session agent configuration on agent panel visibility and manage agent creation/updating.
- Added validation for TTS client initialization in dedicated_tts_pool.py to ensure clients are ready before use.
- Enhanced on_demand_pool.py to validate cached resources and remove invalid ones.
- Improved error logging in text_to_speech.py to include detailed initialization failure information and added is_ready property for synthesizer readiness check.

* Refactor code structure for improved readability and maintainability

* feat: Enhance MemoManager with background persistence and lifecycle management

- Added support for background persistence in MemoManager, allowing non-blocking state saving to Redis.
- Implemented task deduplication to cancel previous persistence tasks when a new one is initiated.
- Removed unused auto-refresh functionality and related attributes from MemoManager.
- Updated tests to verify new persistence behavior and ensure proper task management.
- Enhanced error handling and logging for background persistence operations.
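
The non-blocking save with task deduplication could look roughly like the sketch below. `BackgroundPersister`, `persist_in_background`, and the `save` callable are hypothetical names; the real MemoManager API may differ.

```python
import asyncio
from typing import Optional


class BackgroundPersister:
    """Hypothetical helper: keeps at most one in-flight persistence task."""

    def __init__(self, save):
        self._save = save  # async callable that writes state (e.g. to Redis)
        self._task: Optional[asyncio.Task] = None

    def persist_in_background(self, state: dict) -> asyncio.Task:
        # Deduplicate: a newer snapshot supersedes any save still in flight.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.create_task(self._run(dict(state)))
        return self._task

    async def _run(self, snapshot: dict) -> None:
        try:
            await self._save(snapshot)
        except asyncio.CancelledError:
            raise  # superseded by a newer save; not an error
        except Exception as exc:
            # Background failures are reported rather than raised to the caller.
            print(f"background persistence failed: {exc!r}")


async def demo_persist() -> list:
    saved = []

    async def slow_save(snapshot: dict) -> None:
        await asyncio.sleep(0.01)  # simulated network round trip
        saved.append(snapshot)

    persister = BackgroundPersister(slow_save)
    first = persister.persist_in_background({"turn": 1})
    await asyncio.sleep(0)  # let the first task start
    second = persister.persist_in_background({"turn": 2})
    await asyncio.gather(first, second, return_exceptions=True)
    return saved
```

In `demo_persist`, the first save is cancelled while it is still waiting, so only the newest snapshot is written.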

* feat: Add Connection Warmup Analysis document for Azure Speech & OpenAI optimization

* feat(session): enhance session ID management and URL parameter support

- Added `pickSessionIdFromUrl` function to extract session ID from URL parameters.
- Updated `getOrCreateSessionId` to allow session ID restoration from URL.
- Refactored `setSessionId` for better logging and session management.
- Improved `createNewSessionId` to utilize `setSessionId`.

docs(api): restructure API documentation for clarity and completeness

- Organized API endpoints into categories: Health & Monitoring, Call Management, Media Streaming, Browser Conversations, Session Metrics, Agent Builder, Demo Environment, and TTS Health.
- Added detailed descriptions and examples for each endpoint.
- Included new sections for interactive API documentation and WebSocket endpoints.

docs(api-reference): update WebSocket message types and endpoint details

- Clarified message types for incoming audio data and control messages.
- Updated WebSocket endpoint URLs and query parameters for browser conversations and dashboard relay.

docs(architecture): refine agent architecture diagrams for clarity

- Adjusted diagrams to improve readability and understanding of the agent framework and orchestration.

fix(architecture): correct orchestration mode comparison table

- Updated ratings for Azure Speech voices and simplicity of setup in the orchestration comparison table.

docs(getting-started): add demo guide and enhance onboarding experience

- Introduced a new demo guide to facilitate user onboarding and provide structured paths for different user levels.
- Enhanced the getting started guide with tips and recommended paths for new users.

feat(aoai): implement OpenAI connection warmup to reduce latency

- Added `warm_openai_connection` function to pre-establish OpenAI connection and reduce cold-start latency on first call.

feat(speech): implement token warmup for Speech API to minimize latency

- Added `warm_token` method in `SpeechTokenManager` to pre-fetch tokens during startup, reducing latency on first API call.
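
As an illustration of the warmup idea, the sketch below pre-fetches a token at startup so the first request hits the cache. Only the `SpeechTokenManager` and `warm_token` names come from the commit message; the constructor, the `fetch_token` callable, and the TTL value are assumptions.

```python
import time


class SpeechTokenManager:
    """Caches an auth token; the fetcher and TTL here are illustrative."""

    def __init__(self, fetch_token, ttl_seconds: float = 540.0):
        self._fetch_token = fetch_token  # hypothetical callable returning a token string
        self._ttl = ttl_seconds
        self._token = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        # Refresh only when the cache is empty or expired.
        if self._token is None or time.monotonic() >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = time.monotonic() + self._ttl
        return self._token

    def warm_token(self) -> bool:
        """Pre-fetch at startup so the first request finds a cached token."""
        try:
            self.get_token()
            return True
        except Exception:
            # Warmup is best-effort: a failure here should not block startup.
            return False
```

Returning a boolean instead of raising lets the application start even if the token service is briefly unavailable; the next `get_token` call simply retries.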

* feat(healthcare): Implement Nurse Triage Agent with symptom assessment and routing capabilities

- Introduced a comprehensive voice agent for healthcare triage.
- Added agent configuration and prompt templates for patient interaction.
- Developed healthcare tools for patient verification, clinical knowledge search, and symptom urgency assessment.
- Integrated routing logic for scheduling appointments and emergency transfers.
- Enhanced documentation with demo scenarios and testing instructions.

* feat: Implement logging utility and session management

- Added a logger utility to manage console logging levels and filtering.
- Created session management functions to handle session IDs, including retrieval from URL and session creation.
- Developed styles for the frontend components to ensure consistent UI design.
- Configured Vite for the frontend build process with proper asset handling and environment variable support.
- Introduced scripts for starting the backend and frontend development servers, including Azure Dev Tunnel hosting.

* feat: Simplify agent handoff process by refining context management and removing redundant data collection

* feat: Enhance agent handoff process by managing conversation history and user context

* feat: Enhance message handling by persisting tool calls and results as JSON for conversation continuity

* feat: Implement silent handoff protocol across agents to enhance user experience and streamline transitions

* feat: Add Azure App Configuration module with RBAC and Key Vault integration

- Implemented main resource for Azure App Configuration in Terraform.
- Added outputs for App Configuration details including ID, name, and endpoint.
- Defined variables for App Configuration module, including identity and Key Vault integration.
- Updated main Terraform outputs to include App Configuration details.
- Enhanced error handling in Azure OpenAI client for missing endpoint configuration.
- Improved Redis manager to handle port configuration with better error messaging.
- Updated requirements to include Azure App Configuration SDKs.

* first code clean up

* enabling oidc

* Refactor code structure and remove redundant sections for improved readability and maintainability

* add value

* add value

* feat: Add managed certificate and domain registration modules

- Introduced `managed-cert-example.bicep` for example usage of managed certificate deployment.
- Created `managed-cert.bicep` to handle App Service Domain registration and managed SSL certificate generation.
- Implemented `role-assignment.bicep` for managing role assignments with support for built-in and custom roles.
- Added `windows-vm.bicep` for deploying a Windows VM as a jumphost with necessary networking components.
- Developed `peer-virtual-networks.bicep` for establishing peering between virtual networks.
- Implemented `private-dns-zone.bicep` for creating and linking private DNS zones to virtual networks.
- Created `private-endpoint.bicep` for deploying private endpoints with DNS zone integration.
- Added `vnet.bicep` for creating virtual networks with associated subnets and network security groups.
- Updated `types.bicep` with new types for model deployment, role assignments, and network configurations.
- Developed `secret.bicep` for managing secrets in Azure Key Vault.
- Created `network.bicep` for orchestrating network resources including virtual networks and subnets.

* fix: Update default location parameter in create_storage function for clarity

* feat: Extract AZURE_LOCATION from environment-specific tfvars file if not set

* feat: Implement location resolution with fallback chain in preprovision script
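
The fallback chain can be pictured as follows. This is a Python sketch of logic that lives in a shell script in the repository; the `location` key name, the default region, and both function names are assumptions, while the environment-variable-first order follows the commit messages above.

```python
import re
from typing import Mapping, Optional


def read_location_from_tfvars(text: str) -> Optional[str]:
    """Extract `location = "..."` from tfvars-style text, if present."""
    match = re.search(r'^\s*location\s*=\s*"([^"]+)"', text, flags=re.MULTILINE)
    return match.group(1) if match else None


def resolve_location(env: Mapping[str, str], tfvars_text: str, default: str = "eastus2") -> str:
    """Resolve the region: environment variable, then tfvars, then a default."""
    # The default region is an assumed placeholder, not the project's value.
    return env.get("AZURE_LOCATION") or read_location_from_tfvars(tfvars_text) or default
```

For example, `resolve_location({}, 'location = "eastus"')` falls through the empty environment and picks the value from the tfvars text.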

* fix: Update Dockerfile to install runtime dependencies and mitigate vulnerabilities

* chore: Update CHANGELOG for version 1.5.0 release and remove changenotes.md; enable remote builds in azure.yaml; enhance terraform initialization script with location prompts

* feat: Update launch configuration and scripts to use virtual environment with uv; enhance README for deployment clarity

* further deployment cleanup, docs update/tweaks, adding more todos

* removing unused dependency in src/helpers.py

* refactor: Update architecture diagram in README for clarity and consistency in orchestration modes

* add value

* Refactor Terraform configuration:
- Update main.tf to adjust foundry account and project naming conventions.
- Remove feature flags and keys from appconfig module as they are now managed externally.
- Clean up variables.tf by removing unused variables and updating descriptions.
- Delete provider configuration file as it is no longer needed.
- Change default application name from "rtaudioagent" to "artagent" and adjust related settings.
- Modify connection settings and pool sizes for improved performance.

* feat: Enhance Azure Voice Live integration and refactor configuration management

* last changes

* feat: Add app configuration bootstrap to initialize environment variables

* Enhance configuration loading with .env.local support and update documentation

* fix voicelive output attributes

* add

* Refactor agent paths and update documentation for agent discovery and configuration

* Add Insurance Voice Agent Scenario documentation and update navigation

- Introduced a comprehensive guide for the Insurance Customer Service Scenario, detailing the security-focused multi-agent voice system for claims processing, fraud detection, and policy management.
- Updated mkdocs.yml to include the new Insurance documentation in the Industry Solutions section.

* Add integration proposal for Spec-Driven Development methodology in ARTVoice

* add value

* Enhance Terraform configuration and scripts for Voice Live integration

- Update Dockerfile to install dependencies and set up virtual environment.
- Modify initialize-terraform.sh and local-dev-setup.sh for improved script handling.
- Refactor sync-appconfig.sh to streamline key-value imports and feature flag management.
- Add provider.conf.json generation for remote state backend configuration.
- Update main.tf and outputs.tf to support new Voice Live model deployments.
- Introduce voice_live_location and voice_live_model_deployments variables in variables.tf.

* feat: Add Concierge agent configuration and prompts for banking scenarios

- Introduced a new YAML configuration for the Concierge agent, defining its voice, model, session, and tool configurations.
- Created a comprehensive prompt file for the Concierge agent, detailing voice and language settings, identity and trust guidelines, and operational modes.
- Implemented scenario orchestration analysis to address issues with agent initialization and fallback logic, ensuring the correct agent is set for banking scenarios.
- Renamed orchestration.yaml to scenario.yaml for consistency in scenario loading.
- Updated default start agent to BankingConcierge and added validation for agent existence at startup.

* feat: Enhance scenario loading to support orchestration.yaml naming convention

* feat: Implement scenario-based handoff map resolution for orchestrator configuration

* cicd test for azd deploy

* feat: Update audio handling and documentation dependencies for improved installation and error handling

* feat: Refactor app configuration handling to prioritize .env.local overrides and improve environment variable management

* feat: Revise documentation deployment workflow to enhance dependency management and streamline build process

* modified docs workflow

* feat: Add site_dir configuration to mkdocs.yml for improved site structure

* feat: Allow mkdocs build to proceed with warnings by removing --strict flag

* fix: Update health check endpoint in postprovision script to use correct API path

* refactor: Remove outdated AZD deployment workflow and update documentation links for clarity

* fix: Ensure principal_id logging does not fail and handle local_state retrieval correctly

* refactor: Simplify state key handling in provider configuration by using environment name

* fix: Skip null values when loading static parameters from tfvars file to use Terraform defaults

* fix: Use coalesce function for location assignment in storage account resource

* refactor: Remove unused backend API public URL variable and related validation

* refactor: Remove unused backend API public URL and source phone number from environment parameter files

* improvements flow

* fix: Implement auto-selection and timeout for user input in setup scripts

* add value

* fix: Update naming conventions for foundry account and project variables in locals

* fix: Update name from rtaudioagent to artaudioagent in environment parameter files

* fix: Update documentation URLs to reflect new repository location

* feat: Enhance API documentation and tagging for better clarity and organization

* docs: Update documentation links and improve clarity across various guides

* refactor: replace deploy-azd workflow with reusable template and remove redundant summary job

- Updated the deployment workflow name to "Deploy to Azure".
- Replaced the usage of the old deploy-azd.yml with a new reusable template _template-deploy-azd.yml.
- Removed the deployment summary job and its associated steps to streamline the workflow.

* fix: Add run-name to the Azure deployment workflow for better clarity

* fix: Update condition for output extraction in deployment workflow

* fix: Update GitHub token to use secrets for enhanced security

* feat: Add optional GitHub PAT secret and enhance environment variable handling for Azure deployment

* adding resource group (rg) as an env var set at the GitHub environment level

* fix: Add emoji to workflow names for better visibility

* feat: Update documentation workflow name and enhance README with deployment badges

* fix: Update README layout and enhance navigation links for better user experience

* fix: Restore header for ARTVoice Accelerator Framework in README

* add value

* fix: Update README layout for improved clarity and navigation

* Enhance provisioning scripts and documentation

- Updated postprovision.sh to clarify phone number provisioning steps and added guidance for obtaining a phone number via Azure Portal.
- Modified preprovision.sh to include preflight checks for tools, authentication, and providers before proceeding with provisioning.
- Added jq as a prerequisite in the getting-started documentation and provided installation instructions for various platforms.
- Created a new TODO-deployfixes.md file to document common issues encountered during deployment sessions, including resolutions for Docker errors, jq installation, and subscription registration.
- Expanded troubleshooting.md with detailed solutions for common deployment and provisioning issues, including authentication mismatches, Docker errors, jq command not found, and ACS phone number prompts.
- Updated variables.tf to improve the description of the voice_live_location variable, including a link to supported Azure regions.

* feat: Update branch triggers in workflow to include feat/troubleshooting-enhancements

* fix(ci): simplify test-azd-hooks workflow tests and run in parallel

- Remove fragile grep-based function extraction that caused syntax errors
- Run lint, linux, macos, windows tests in parallel (no dependencies)
- Trigger on all pushes to main/staging (remove path filters for push)
- Simplify backend configuration test to avoid function sourcing issues

* feat: Add troubleshooting steps for "bad interpreter" errors and enhance post-provisioning instructions for phone number configuration

* feat: Add preprovision hook execution to Linux, macOS, and Windows test jobs in CI workflow

* feat: Enhance AZD hook testing with postprovision execution and Azure CLI setup

* feat: Update test job names for clarity and enhance preflight checks for CI mode

* feat: Update preflight checks to conditionally include Docker in CI mode and log its status

* feat: Add Dev Container testing for AZD hooks with environment validation and summary reporting

* feat: Enhance deployment scripts with pre/post-provisioning hooks and Azure CLI extension checks

* feat: Add troubleshooting guidance for MkDocs module errors and update dev dependencies in uv.lock

* feat: Update Azure deployment workflows and normalize container memory formats

* feat: Add troubleshooting guidance for Terraform state lock errors and provide remote/local fix options

* feat: Remove outdated troubleshooting documentation for deployment issues

* Apply suggestion from @Copilot

Co-authored-by: Copilot <[email protected]>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <[email protected]>

* Update .github/workflows/test-azd-hooks.yml

Co-authored-by: Copilot <[email protected]>

* feat: Implement TTS Streaming Latency Analysis and Optimization Plan

- Added a comprehensive document outlining the critical latency issues in TTS playback within the Speech Cascade architecture.
- Identified root causes including processing loop deadlock, sentence buffering delays, queue-based event processing, and full synthesis before streaming.
- Proposed a multi-phase optimization strategy to address identified issues, including:
  - Phase 0: Fix processing loop deadlock by creating a dedicated TTS processing task.
  - Phase 1: Reduce sentence buffer threshold for earlier TTS chunk dispatch.
  - Phase 2: Implement parallel TTS prefetching to synthesize the next sentence while streaming.
  - Phase 3: Enable streaming TTS synthesis to stream audio while synthesizing.
  - Phase 4: Achieve full pipeline parallelism for LLM to TTS to WebSocket streaming.
- Created a detailed test implementation plan with metrics and success criteria to validate improvements.
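
Phase 2 (parallel prefetching) can be sketched with plain asyncio as below. The function names and the `synthesize`/`stream` callables are hypothetical stand-ins for the real TTS and WebSocket code.

```python
import asyncio
from typing import Awaitable, Callable, List


async def speak_with_prefetch(
    sentences: List[str],
    synthesize: Callable[[str], Awaitable[bytes]],
    stream: Callable[[bytes], Awaitable[None]],
) -> None:
    """Stream sentence N while sentence N+1 is being synthesized."""
    if not sentences:
        return
    next_audio = asyncio.create_task(synthesize(sentences[0]))
    for i in range(len(sentences)):
        audio = await next_audio
        if i + 1 < len(sentences):
            # Schedule synthesis of the next sentence before streaming this one.
            next_audio = asyncio.create_task(synthesize(sentences[i + 1]))
        await stream(audio)


async def demo_prefetch() -> List[str]:
    events: List[str] = []

    async def synthesize(text: str) -> bytes:
        events.append(f"synth-start:{text}")
        await asyncio.sleep(0.01)  # simulated synthesis time
        return text.encode()

    async def stream(audio: bytes) -> None:
        events.append(f"stream-start:{audio.decode()}")
        await asyncio.sleep(0.01)  # simulated playback time
        events.append(f"stream-end:{audio.decode()}")

    await speak_with_prefetch(["a", "b"], synthesize, stream)
    return events
```

In the demo, synthesis of the second sentence begins while the first is still streaming, which is the overlap the plan is after.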

test: Add unit tests for HandoffService

- Created unit tests for the HandoffService, covering handoff detection, target resolution, and handoff resolution methods.
- Implemented tests for greeting selection and context building to ensure proper functionality.
- Added tests for the HandoffResolution dataclass to verify properties and default values.

* feat: Add Scenario Builder component and integrate with RealTimeVoiceApp

- Introduced ScenarioBuilder component for visual orchestration of agent flows.
- Implemented drag-and-drop functionality for agents and handoff configuration.
- Added buttons in RealTimeVoiceApp for accessing Agent and Scenario Builders.
- Enhanced state management for agent scenarios, including creation and updates.
- Integrated new handoff editor for configuring agent interactions.

* Refactor code structure for improved readability and maintainability

* Add error handling for Redis connection issues and implement unit tests for HandoffService

- Enhanced AzureRedisManager to handle RedisClusterException and OSError during client connection attempts.
- Introduced comprehensive unit tests for HandoffService, covering handoff detection, target resolution, handoff resolution, greeting selection, and context building.
- Added tests for HandoffResolution dataclass to ensure correct property behavior and default values.

* Enhance LiveOrchestrator to handle context-only session updates without UI broadcasts

* Refactor LiveOrchestrator to prevent duplicate UI updates by omitting redundant session_updated broadcasts during context-only updates.

* Refactor environment variable assignment in deploy workflow for clarity

* Refactor tests and dependencies following module renaming and API changes

- Removed pytest-twisted from dev dependencies in pyproject.toml and uv.lock.
- Updated conftest.py to mock configuration and Azure OpenAI client for tests.
- Skipped tests in test_acs_media_lifecycle.py, test_acs_media_lifecycle_memory.py, and test_acs_simple.py due to dependencies on removed/renamed modules.
- Adjusted imports in test_artagent_wshelpers.py for orchestrator path change.
- Skipped tests in test_call_transfer_service.py due to API changes in toolstore.
- Updated datetime usage in test_demo_env_phrase_bias.py to use UTC.
- Modified websocket endpoint assertions in test_realtime.py to reflect new paths.
- Added new test file test_voice_handler_components.py for voice handler components.

* Add comprehensive tests for VoiceLive handler and orchestrator memory management

- Implement tests to verify cleanup functionality in LiveOrchestrator.
- Ensure proper registration and unregistration of orchestrators in the registry.
- Test background task tracking and cleanup mechanisms.
- Validate greeting task cancellation during orchestrator cleanup.
- Introduce memory leak detection tests to prevent unbounded growth in orchestrator registry.
- Verify user message history deque is properly bounded and cleared on cleanup.
- Add scenario update tests to ensure correct agent management during updates.
- Optimize hot path functions to ensure non-blocking behavior during network calls.
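
The bounded history that these tests check for can be illustrated with `collections.deque`. The class name, method names, and the default limit are assumptions for this sketch.

```python
from collections import deque


class UserMessageHistory:
    """Keeps only the most recent messages so memory stays bounded."""

    def __init__(self, max_messages: int = 5):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._messages = deque(maxlen=max_messages)

    def add(self, text: str) -> None:
        self._messages.append(text)

    def snapshot(self) -> list:
        return list(self._messages)

    def cleanup(self) -> None:
        # Release all retained messages when the orchestrator shuts down.
        self._messages.clear()
```

A test for unbounded growth then only needs to add more messages than the limit and check that the snapshot length never exceeds it.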

* feat: Enhance AgentBuilder with consistent field names and improved UI elements

* Refactor logging levels from info to debug in connection manager, warmable pool, Redis manager, speech auth manager, speech recognizer, and text-to-speech modules for improved log verbosity control

- Remove outdated greeting context tests and add comprehensive scenario orchestration contract tests to ensure functional contracts are preserved during refactoring.
- Update session agent manager tests to use set comparison for agent listing to avoid dict ordering issues.

* feat: Add predefined handoff condition patterns to enhance scenario orchestration

* add value

* feat(metrics): Introduce shared metrics factory for lazy initialization

- Added `metrics_factory.py` to provide a common infrastructure for OpenTelemetry metrics.
- Implemented `LazyMeter`, `LazyHistogram`, and `LazyCounter` for lazy initialization of metrics.
- Updated `speech_cascade/metrics.py` to utilize the new shared metrics factory, simplifying metric initialization.
- Refactored `voicelive/metrics.py` to use the shared factory for consistent metric handling.
- Enhanced orchestrator classes in `speech_cascade/orchestrator.py` and `voicelive/orchestrator.py` to cache orchestrator configurations, improving performance and reducing redundant calls.
- Introduced utility functions for building common metric attributes, ensuring consistency across metrics.
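
A rough sketch of the lazy-initialization pattern follows. Only the `LazyCounter` name appears in the commit message; the constructor signature, the `create_counter` factory, and `InMemoryCounter` are assumptions, and a real implementation would obtain its instrument from an OpenTelemetry meter.

```python
from typing import Callable, Optional


class LazyCounter:
    """Defers creating the underlying counter until the first add()."""

    def __init__(self, create_counter: Callable[[], object]):
        # Hypothetical factory; in practice this would ask a meter for a counter.
        self._create_counter = create_counter
        self._counter = None

    def add(self, amount: int, attributes: Optional[dict] = None) -> None:
        if self._counter is None:
            self._counter = self._create_counter()
        self._counter.add(amount, attributes or {})


class InMemoryCounter:
    """Stand-in for a real metrics instrument, used only for this sketch."""

    def __init__(self):
        self.total = 0

    def add(self, amount: int, attributes: dict) -> None:
        self.total += amount
```

Deferring creation means a module can declare its metrics at import time without requiring the telemetry provider to be configured yet.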

* feat: Consolidate handoff logic into a unified HandoffService for consistent behavior across orchestrators and enhance documentation

* fix: Simplify environment determination logic in deployment workflow

* add value

* feat: Add user flow screenshots and enhance documentation for guided agent setup

* feat: Enhance scenario testing instructions for clarity and user guidance

* fix: Correct image paths in quickstart guide for accurate rendering

* feat: Add initial agent builder and template selection screenshots to quickstart guide

* feat: Add demo profile creation steps and related images to quickstart guide

* feat: Implement EasyAuth configuration script and integrate into post-provisioning process

* refactor: Remove backend IP restrictions configuration and related outputs

* Added non-qualifying rush response to ensure clear model behavior

* updated order so confirmation statement is in the correct spot

* add value

* add value

* chore: Remove unused workflow images for demo profiles

* fix: Update demo profile creation images in quickstart guide

* fix: Update home screen image in quickstart guide

* fix: Update home screen and scenario images in quickstart guide

* add value

* add value

* add value

* add value

* add value

* add

* add value

* art

* add opentelemetry import for tracing support in TTS module

* refactor: update LiveOrchestrator to enhance user message history management and improve handoff context

* Refactor TTS Playback and Voice Handling

- Consolidated TTS playback logic into a unified class for speech cascade.
- Removed deprecated VoiceSessionContext and related compatibility shims.
- Enhanced error handling during tool initialization and event handler registration.
- Updated model configuration handling in UnifiedAgent to prioritize mode-specific settings.
- Improved logging for TTS synthesis and streaming processes.
- Added new handoff tool registration for dynamic routing.

* refactor: streamline EasyAuth enabling process in CI mode and improve interactive prompts

* refactor: enhance EasyAuth interactive prompts and streamline user choices

* refactor: enhance run-name logic for Azure deployment workflow

* fix: update environment logic for pull_request events in Azure deployment workflow

* refactor: update preprovision hook execution and streamline backend configuration

* feat: add context variable support for handoffs and enhance UI for variable mapping

* feat: enhance TTS processing by adding text sanitization and sentence boundary detection (#11)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
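
For illustration, text sanitization and sentence boundary detection might be approached as in this sketch. The function names and the regular expressions are assumptions, not the rules used in the repository.

```python
import re
from typing import List, Tuple

_MARKDOWN = re.compile(r"[*`#]+")
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sanitize_for_tts(text: str) -> str:
    """Drop markdown markers and collapse whitespace before synthesis."""
    return re.sub(r"\s+", " ", _MARKDOWN.sub("", text)).strip()


def split_complete_sentences(buffer: str) -> Tuple[List[str], str]:
    """Return (complete sentences, trailing remainder still being generated)."""
    parts = _SENTENCE_END.split(buffer)
    # The last part has no confirmed boundary after it, so keep it buffered.
    return parts[:-1], parts[-1]
```

Keeping the trailing fragment in the buffer lets a streaming caller dispatch finished sentences to TTS while the LLM is still producing the rest.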

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#14)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API version to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#15)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#13)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#12)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: Responses API Infrastructure & Dual Model Configuration (#16)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses
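
A minimal sketch of how the routing between `/chat/completions` and `/responses` might look. The `ModelConfig` class, the `"auto"`/`"chat"`/`"responses"` values, and the rule used in auto mode are assumptions for illustration; the commit does not spell out the actual logic of `_should_use_responses_endpoint()`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    """Hypothetical subset of the model configuration fields."""
    deployment_id: str
    endpoint_preference: str = "auto"  # assumed values: "auto", "chat", "responses"
    verbosity: Optional[str] = None
    reasoning_effort: Optional[str] = None


def should_use_responses_endpoint(cfg: ModelConfig) -> bool:
    """Decide which endpoint to call (assumed rules, not the repository's)."""
    if cfg.endpoint_preference == "responses":
        return True
    if cfg.endpoint_preference == "chat":
        return False
    # "auto": pick /responses only when a Responses-only option is set.
    return cfg.verbosity is not None or cfg.reasoning_effort is not None


def endpoint_path(cfg: ModelConfig) -> str:
    """Map the routing decision to a URL path."""
    return "/responses" if should_use_responses_endpoint(cfg) else "/chat/completions"
```

An explicit preference always wins in this sketch, so a misbehaving deployment can be pinned to one endpoint without touching the auto rule.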

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API
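
The mode lookup described above could be sketched as follows. Only the field names `cascade_model` and `voicelive_model`, the method name `get_model_for_mode()`, and the four mode strings come from the commit; pairing `media` with `cascade` and `realtime` with `voicelive`, and falling back to a shared `model` field, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed alias pairing; the commit only lists the four accepted names.
_MODE_ALIASES = {
    "cascade": "cascade",
    "media": "cascade",
    "voicelive": "voicelive",
    "realtime": "voicelive",
}


@dataclass
class AgentConfig:
    model: str  # hypothetical shared default
    cascade_model: Optional[str] = None
    voicelive_model: Optional[str] = None

    def get_model_for_mode(self, mode: str) -> str:
        """Return the mode-specific model, or the shared default if unset."""
        canonical = _MODE_ALIASES.get(mode.strip().lower())
        if canonical is None:
            raise ValueError(f"unknown orchestration mode: {mode!r}")
        specific = self.cascade_model if canonical == "cascade" else self.voicelive_model
        return specific or self.model
```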

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.

* fix: update project version to 2.0.0-beta in pyproject.toml

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: Orchestrator Integration + Optimizations (#17)

* feat: integrate Responses API in orchestrators and add optimizations

**Cascade Orchestrator:**
- Update model selection to use agent.get_model_for_mode('cascade')
- Integrate Responses API routing based on endpoint_preference
- Add error handling for unsupported parameters
- Extract TTS processing into separate tts_processor module

**VoiceLive Orchestrator:**
- Update to use agent.get_model_for_mode('voicelive')
- Add registry cleanup to prevent unbounded growth
- Improve memory management and stale orchestrator cleanup
- Extract DTMF processing into separate dtmf_processor module
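
One way to keep an orchestrator registry from growing without bound is to timestamp each entry and evict idle ones. The sketch below assumes an idle-timeout policy; the class name, method names, and timeout are hypothetical and not taken from the repository.

```python
import time
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class OrchestratorRegistry(Generic[T]):
    """Session-keyed registry that evicts entries idle longer than a limit."""

    def __init__(self, max_idle_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._items: Dict[str, Tuple[T, float]] = {}
        self._max_idle = max_idle_seconds
        self._clock = clock

    def register(self, session_id: str, orchestrator: T) -> None:
        self._items[session_id] = (orchestrator, self._clock())

    def get(self, session_id: str) -> Optional[T]:
        entry = self._items.get(session_id)
        if entry is None:
            return None
        # Refresh the timestamp so active sessions are never evicted.
        self._items[session_id] = (entry[0], self._clock())
        return entry[0]

    def cleanup_stale(self) -> int:
        """Remove idle entries and return how many were dropped."""
        now = self._clock()
        stale = [sid for sid, (_, seen) in self._items.items()
                 if now - seen > self._max_idle]
        for sid in stale:
            del self._items[sid]
        return len(stale)

    def __len__(self) -> int:
        return len(self._items)
```

Injecting the clock keeps the eviction rule testable without sleeping.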

**Tests:**
- Add test_cascade_orchestrator_entry_points
- Add test_cascade_llm_processing
- Add test_dtmf_processor

Depends on: PR #1 (Responses API Infrastructure)

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: Evaluation Framework + Frontend UI (#18)

* feat: add evaluation framework and frontend UI for Responses API

**Evaluation Framework:**
- Add EventRecorder with git commit SHA tracking
- Add API-aware scoring with budget adjustments for verbosity
- Add scenario runner for automated testing
- Add CLI for running evaluations
- Add validate_phases.py for phase-based validation
- Add wrappers for endpoint detection

**Frontend UI:**
- Add cascade_model and voicelive_model selectors in Agent Builder
- Add Responses API endpoint preference dropdown
- Add conditional fields for verbosity, reasoning_effort, etc.
- Update ScenarioBuilder with model configuration options
- Display API version fields

**Documentation:**
- Add docs/testing/model-evaluation.md
- Add evaluation playground Jupyter notebook

Depends on: PR #1 (Responses API Infrastructure)

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Move lifecycle management logic into a dedicated structure to keep main.py clean (#19)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: voice handler refactoring and MediaHandler migration

Major refactoring of voice processing architecture:

Core Voice Changes:
- Implement new VoiceHandler as primary entry point for voice sessions
- Delete deprecated speech_cascade/tts.py (652 lines removed)
- Consolidate TTS functionality into voice/tts/playback.py
- Enhance CascadeOrchestrator with improved turn management
- Add VoiceSessionContext for clean dependency injection

API & Integration:
- Migrate /api/v1/browser/conversation to VoiceHandler
- Migrate /api/v1/media/stream to VoiceHandler
- Create MediaHandler→VoiceHandler compatibility alias
- Update media_handler.py for backward compatibility
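
A compatibility alias of this kind can be as small as `MediaHandler = VoiceHandler`. The sketch below shows a slightly stricter variant that also emits a `DeprecationWarning`; the constructor signature is hypothetical, and the repository may use the plain assignment instead.

```python
import warnings


class VoiceHandler:
    """Stand-in for the new entry point (constructor shape is assumed)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id


class MediaHandler(VoiceHandler):
    """Deprecated name kept so existing imports keep working."""

    def __init__(self, *args, **kwargs) -> None:
        warnings.warn(
            "MediaHandler is deprecated; use VoiceHandler instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this shim
        )
        super().__init__(*args, **kwargs)
```

Because the shim subclasses `VoiceHandler`, `isinstance` checks against either name continue to pass during the migration.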

Infrastructure:
- Improve telemetry with Azure-style span naming
- Enhance ACS helpers with better session management
- Update session terminator for lifecycle management
- Add orchestration improvements for unified agents

Configuration & Samples:
- Update auth agent and insurance scenario configs
- Add handoff tool enhancements with context variables
- Update gpt_flow sample for new patterns

Frontend:
- Refactor App.jsx for improved voice handling UI

Testing & Documentation:
- Add test_voice_handler_compat.py for backward compatibility
- Add MEDIAHANDLER_MIGRATION.md tracking document

This change maintains full backward compatibility while establishing
the foundation for cleaner voice processing patterns going forward.

Closes #[TBD]

* Enhance logging and user prompts in preflight and pre-provisioning scripts (#20)

- Updated logging functions in preflight-checks.sh, ssl-preprovision.sh, sync-appconfig.sh, postprovision.sh, and preprovision.sh for consistent output formatting.
- Improved user prompts for SSL certificate configuration and Azure Entra group creation in ssl-preprovision.sh and postprovision.sh.
- Added color-coded success, warning, and error messages for better visibility.
- Modified the handling of environment variables in postprovision.sh to ensure updates are made without overwriting existing values.
- Updated Terraform configurations to manage app configuration and cognitive account settings with soft delete options.

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: voice handler refactoring and MediaHandler migration (#21)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Enhance the ScenarioBuilder with Flowy (#22)

* docs: add comprehensive voice processing architecture documentation

Add complete documentation for the voice processing architecture:

New Documentation:
- docs/architecture/voice/README.md - Comprehensive voice architecture guide
  * VoiceHandler overview and usage patterns
  * TTS playback and text processing
  * Speech cascade pipeline documentation
  * Audio specifications for browser and ACS transports
  * Testing guidelines with actual test file references
  * Troubleshooting guide for common issues

- apps/artagent/backend/voice/README.md - Developer quick reference
  * Directory structure and module organization
  * Quick start examples
  * Common tasks and patterns
  * File location guide
  * Testing commands

Documentation Updates:
- docs/mkdocs.yml - Add voice architecture to navigation
- docs/operations/troubleshooting.md - Add voice-specific troubleshooting

Key Improvements:
- Fixed mkdocs formatting for proper list rendering
- Updated all test references to match actual test files:
  * test_voice_handler_components.py
  * test_voice_handler_compat.py
  * test_cascade_orchestrator_entry_points.py
  * test_cascade_llm_processing.py
- Verified all script references (quick_test.sh, test_orchestrator.py)
- Added prerequisites for running tests with dev dependencies
- Included both basic and advanced testing examples

All file paths and examples have been verified against the actual codebase.

Related to #[TBD]

* Add custom styles for Flowy flowchart integration with agent blocks

* feat: Enhance output port visibility logic in ScenarioGraphCanvas

* feat: Add expandable full prompt view for source agent in HandoffEditorDialog

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Refactor ACS logging and add default orchestration scenario

- Removed info-level logging for ACS configuration details to reduce verbosity.
- Changed some logging statements to debug level for better log management.
- Updated peer.service attribute in telemetry to use "azure-communication-services".
- Introduced a new orchestration.yaml file defining a default customer service scenario with multiple agents and handoff configurations.

* Refactor ACS logging and add default orchestration scenario (#23)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Enhance logging functions to use log_plain for consistency and clarity in local development setup script

* Disable view toggle buttons for chat/graph/timeline in ConversationControls

* Add panning functionality to ScenarioGraphCanvas and reset button

* Update CHANGELOG.md for 2.0.0-beta.1 release: add new features, enhancements, fixes, and infrastructure changes

* feat: Add mkdocs-mermaid-zoom dependency and update locust load test scripts

- Added mkdocs-mermaid-zoom to pyproject.toml and uv.lock for enhanced diagram support in documentation.
- Enhanced locustfile.acs_media.py with rate limit detection and error handling improvements.
- Introduced locustfile.browser_conversation.py for testing browser-based voice conversation endpoints.
- Improved metrics naming conventions for clarity in load testing results.

* feat: Update Voice Live readiness status to use event envelope format

---------

Co-authored-by: Jin Lee <[email protected]>
Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
Co-authored-by: Anna Quincy <[email protected]>
Co-authored-by: Jin Lee <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Consolidate infrastructure resource documentation into infra/README.md (#26)

* Initial plan

* Add comprehensive infrastructure resources documentation with private networking links

Co-authored-by: JinLee794 <[email protected]>

* Consolidate infrastructure documentation into infra/README.md

Co-authored-by: JinLee794 <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: JinLee794 <[email protected]>

* enhancement: infra docs readme update (#100)

* Update version and SKU name in staging params

* Change version for text-embedding-3-large model

Updated the version of the text-embedding-3-large model.

* Update main.tfvars.staging.json

* Update communication.tf

* feat: Enhance status envelope with optional label and update frontend to derive WS URL

- Added optional `label` parameter to `make_status_envelope` function in `envelopes.py` to allow custom labels in status messages.
- Updated `entrypoint.sh` to derive WebSocket URL from `BACKEND_URL` or use `WS_URL` if provided, replacing placeholders in frontend assets.
- Upgraded `js-yaml` and `vite` dependencies in `package.json` and `package-lock.json`.
- Enhanced `App.jsx` to format event type labels and summarize event data for better user experience.
- Introduced new demo scenarios in `DemoScenariosWidget.jsx` to showcase Microsoft Copilot Studio integration and ACS call routing.
- Added tests for call transfer events in `test_acs_events_handlers.py` to ensure correct envelope broadcasting for transfer accepted and failed events.
- Created a new Jupyter notebook for custom speech model demonstration in `12-custom-speech-model.ipynb`.
- Updated Terraform parameters to include a new text embedding model in `main.tfvars.dev.json`.
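
To illustrate the optional `label` parameter, here is a stand-in for `make_status_envelope`. Only the function name and the `label` parameter come from the commit; the envelope fields (`type`, `session_id`, `ts`, `payload`) and the other parameters are assumptions and may not match `envelopes.py`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def make_status_envelope(
    message: str,
    *,
    session_id: Optional[str] = None,
    label: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a status envelope; `label` is included only when provided."""
    payload: Dict[str, Any] = {"message": message}
    if label is not None:
        payload["label"] = label  # custom label shown by the frontend
    return {
        "type": "status",  # assumed envelope type name
        "session_id": session_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
```

Leaving the key out when no label is given keeps existing consumers unaffected.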

* refactor: Comment out unused email communication service domain resource

* refactor: Comment out unused Azure email communication service resources

* feat: Enhance event handling and UI components

- Added new utility functions for formatting event types and summarizing event data in App.jsx.
- Improved ChatBubble component to display event messages with formatted labels and timestamps.
- Updated DemoScenariosWidget to include new scenarios and enhanced filtering options based on tags.
- Introduced websocket URL derivation in postprovision.sh for better backend integration.
- Added tests for call transfer events in test_acs_events_handlers.py to ensure proper envelope broadcasting.
- Updated package.json to include js-yaml and upgraded vite version.
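
The WebSocket URL derivation happens in shell scripts (`entrypoint.sh`, `postprovision.sh`); the same rule is sketched here in Python for clarity. The function name is hypothetical, and dropping the trailing slash is an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def derive_ws_url(backend_url: str, ws_url: Optional[str] = None) -> str:
    """Return WS_URL if set, otherwise map the backend URL's scheme to ws/wss."""
    if ws_url:
        return ws_url  # an explicit override always wins
    parts = urlsplit(backend_url)
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"unsupported backend URL scheme: {parts.scheme!r}")
    # Keep host and path, drop query/fragment and any trailing slash.
    return urlunsplit((scheme, parts.netloc, parts.path.rstrip("/"), "", ""))
```

Mapping `https` to `wss` matters in the browser: a page served over HTTPS cannot open an insecure `ws://` connection.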

* add value

* feat: Enhance distributed session handling and improve PayPal agent interactions

- Implement distributed session bus using Redis for cross-replica session routing in connection manager.
- Add methods for publishing session envelopes to Redis channels.
- Introduce confirmation context for call center transfers to ensure explicit user consent.
- Update PayPal agent templates to clarify authentication and routing guidelines.
- Enhance real-time voice app to manage relay WebSocket connections and handle session updates more effectively.
- Improve error handling and logging for distributed session delivery and Redis interactions.
- Refactor session envelope handling in frontend to accommodate new event types and improve user experience.
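
The cross-replica routing idea can be sketched without a Redis server by substituting a tiny in-memory pub/sub. Everything here is illustrative: the `session:<id>` channel name, the `Replica` class, and `InMemoryPubSub` are assumptions, and a real deployment would publish through a Redis client instead.

```python
import json
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List


class InMemoryPubSub:
    """Stand-in for Redis pub/sub: publish returns the number of receivers."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        handlers = self._subs.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


def session_channel(session_id: str) -> str:
    return f"session:{session_id}"  # hypothetical channel naming


class Replica:
    """One backend instance holding some sessions' WebSocket connections."""

    def __init__(self, name: str, bus: InMemoryPubSub) -> None:
        self.name = name
        self.bus = bus
        self.local_sessions: Dict[str, List[Dict[str, Any]]] = {}

    def attach_session(self, session_id: str) -> None:
        self.local_sessions[session_id] = []
        self.bus.subscribe(
            session_channel(session_id),
            lambda raw, sid=session_id: self.local_sessions[sid].append(json.loads(raw)),
        )

    def send(self, session_id: str, envelope: Dict[str, Any]) -> bool:
        # Deliver locally when this replica owns the session ...
        if session_id in self.local_sessions:
            self.local_sessions[session_id].append(envelope)
            return True
        # ... otherwise publish so the owning replica can deliver it.
        return self.bus.publish(session_channel(session_id), json.dumps(envelope)) > 0
```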

* feat: Enhance status tone metadata and improve chat bubble styling

* feat: Implement background task handling for MFA delivery and improve greeting messages in handoff processes

* feat: Enhance call escalation process with detailed transfer context and improve PayPal agent handoff scenarios

* feat: Implement retry mechanism for browser session ID resolution in media streaming
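
A retry of this kind might look like the sketch below, which polls an async lookup with exponential backoff until a session ID appears. The function name, attempt count, and delays are assumptions; the commit does not describe the actual policy.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def resolve_with_retry(
    lookup: Callable[[], Awaitable[Optional[str]]],
    *,
    attempts: int = 5,
    base_delay: float = 0.05,
) -> Optional[str]:
    """Call `lookup` until it returns a value or attempts run out."""
    for attempt in range(attempts):
        value = await lookup()
        if value is not None:
            return value
        if attempt < attempts - 1:
            # Back off exponentially between tries: base, 2x, 4x, ...
            await asyncio.sleep(base_delay * (2 ** attempt))
    return None
```

Returning `None` instead of raising lets the caller decide whether a missing mapping is fatal for the media stream.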

* feat: Enhance session management and greeting handling across various components

* Fix session mapping for ACS calls

* add value

* add value

* adding test file

* Adding agents and templates for credit card recommendation and fee dispute agents

* add value

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* add value

* add value

* Implement Azure Voice Live service integration and enhance Terraform configurations for voice model deployments

* add value

* Add Azure Voice Live model configuration and outputs

* fixing voicelive chat sequence on the ui

* remove sensitive contact information and unused transfer agency client data

* feat: Introduce Agent Consolidation Plan with YAML-driven architecture

- Added a comprehensive proposal for consolidating agent architecture in `apps/rtagent/backend/src/agents/`.
- Established key goals including single source of truth for agent definitions, auto-discovery, and unified tool registry.
- Analyzed current architecture and identified pain points such as manual handoff registration and duplicate tool registries.
- Proposed a new solution architecture featuring enhanced YAML schema, auto-discovery engine, and unified tool registry.
- Detailed implementation roadmap divided into phases for gradual migration and integration.
- Included backward compatibility strategy to ensure existing agents function without modification.
- Provided extensive documentation on YAML schema, CLI tool usage, and migration checklist.

* Refactor speech cascade handler and routing for browser communication

- Updated speech cascade handler to prioritize `on_greeting` callback over `on_tts_request` for greeting events.
- Added `queue_user_text` method to `SpeechCascadeHandler` for queuing user text input.
- Changed routing from `/realtime` to `/browser` for browser communication endpoints.
- Modified orchestration logic to ensure TTS responses are sent with blocking behavior to prevent overlap.
- Introduced WebSocket helper functions for better organization and clarity in messaging.
- Enhanced connection manager to handle Redis pubsub reconnections on credential expiration.
- Updated frontend components to reflect routing changes for browser communication.
- Adjusted tests to align with the new browser routing and functionality.
- Commented out live metrics enabling condition in telemetry configuration for future consideration.
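
The callback priority from the first bullet above can be sketched as a small dispatcher: a greeting goes to `on_greeting` when one is registered and falls back to `on_tts_request` otherwise. The class and method names are hypothetical; only the two callback names come from the commit.

```python
from typing import Callable, Optional

TextCallback = Callable[[str], None]


class GreetingDispatcher:
    """Route greeting text to the most specific callback available."""

    def __init__(
        self,
        on_tts_request: Optional[TextCallback] = None,
        on_greeting: Optional[TextCallback] = None,
    ) -> None:
        self.on_tts_request = on_tts_request
        self.on_greeting = on_greeting

    def emit_greeting(self, text: str) -> str:
        """Dispatch a greeting and report which path handled it."""
        if self.on_greeting is not None:
            self.on_greeting(text)  # preferred path for greeting events
            return "on_greeting"
        if self.on_tts_request is not None:
            self.on_tts_request(text)  # generic TTS fallback
            return "on_tts_request"
        return "dropped"
```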

* feat(telemetry): add decorators for tracing LLM, dependency, speech, and ACS calls

- Introduced decorators for tracing LLM, dependency, speech, and ACS calls with OpenTelemetry instrumentation.
- Implemented a context manager for tracking conversation turns with detailed metrics.
- Added helper functions for recording GenAI and speech metrics.
- Enhanced span attributes for Azure Application Insights visualization.

* Remove telemetry configuration module (telemetry_config_v2.py) to streamline codebase and eliminate unused functionality.

* feat: Enhance telemetry and tracing for CosmosDB and latency tool

- Added OpenTelemetry tracing to CosmosDB operations with a decorator for latency tracking.
- Integrated tracing spans in the LatencyTool for better observability in Application Insights.
- Updated telemetry configuration to suppress noisy logs and added new attributes for speech cascade metrics.
- Created unit tests for SessionAgentManager, covering configuration management, override resolution, handoff management, and persistence.
- Removed outdated endpoints review document.

* feat: Add useBackendHealth hook for backend health checks and integrate with readiness, agents, and health endpoints

test: Implement integration tests for VoiceLive Session Agent Manager, covering agent resolution, handoff mapping, and runtime modifications

* WARNING!!!! MAJOR REFACTOR COMMIT

- Removed the VoiceLive SDK integration module from the backend.
- Added a new AgentTopologyPanel component to the frontend for displaying agent inventory and connections.
- Integrated the AgentTopologyPanel into the main application layout.
- Updated the BackendIndicator to include agent count and selection functionality.
- Enhanced the ConversationControls with a fixed view switcher for better accessibility.
- Improved the useBackendHealth hook to handle various agent data structures.
- Updated styles for better responsiveness and visual consistency across components.
- Modified utility functions to format agent inventory data correctly.
- Adjusted import paths in orchestrators and tests to reflect the new backend structure.

* feat: Enhance agent handoff process and response handling; refactor UI components for improved usability

* feat: Update change notes for v2/speech-orchestration-and-monitoring branch; highlight major features, improvements, and new agents

* refactor: Remove Unified Agent Configuration Module; streamline agent management and improve code organization

* feat: Enhance ProfileDetailsPanel with resizable functionality and UI improvements

- Added resizable panel feature to ProfileDetailsPanel, allowing users to adjust width dynamically.
- Updated panel styling for improved aesthetics, including a gradient background and adjusted borders.
- Enhanced scrollbar visibility and overflow handling for better user experience.

refactor: Simplify GraphListView filter logic

- Removed default selection logic for filters in GraphListView, allowing users to start with no filters applied.
- Cleaned up useEffect dependencies for better performance and clarity.

docs: Introduce Backend Voice & Agents Architecture documentation

- Added comprehensive documentation outlining the architecture of backend voice and agent modules.
- Detailed separation of concerns between voice transport and agent business logic.
- Included data flow diagrams and module responsibilities for clarity.

docs: Create Handoff Logic Inventory for better understanding of handoff processes

- Documented the handoff logic across backend voice and agent modules.
- Established a single source of truth for handoff mappings and protocols.
- Summarized cleanup phases and their impact on the codebase.

fix: Update logging to safely handle span names

- Modified TraceLogFilter to safely retrieve span names, preventing attribute errors with NonRecordingSpan.
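
A safe lookup of this kind usually comes down to `getattr` with a default. The sketch below is a stand-in for `TraceLogFilter`: the injected `get_span` callable and the `span_name` record attribute are assumptions made so the example runs without OpenTelemetry installed.

```python
import logging
from typing import Any, Callable


class TraceLogFilter(logging.Filter):
    """Attach the current span's name to each record without raising."""

    def __init__(self, get_span: Callable[[], Any]) -> None:
        super().__init__()
        # In the real filter this would be the tracer's current-span lookup.
        self._get_span = get_span

    def filter(self, record: logging.LogRecord) -> bool:
        span = self._get_span()
        # Non-recording spans expose no `name`, so never access it directly.
        record.span_name = getattr(span, "name", None) or "-"
        return True  # never drop the record, only enrich it
```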

fix: Adjust telemetry configuration to capture all loggers

- Changed logger_name default to an empty string in TelemetryConfig to capture all loggers.

* feat: Implement context-aware greeting rendering in VoiceLive agent; enhance session management and logging

* feat: Refactor agent configuration and voice handling; streamline agent switching and TTS integration

* feat: Enhance Agent Details Panel and Session Management

- Added sessionAgentConfig prop to AgentDetailsPanel for dynamic agent configuration display.
- Implemented logic to show agent name, description, tools, and model/voice details based on session configuration.
- Introduced a new PanelCard in AgentDetailsPanel to display session agent configuration, including model, voice, and prompt preview.
- Updated App component to fetch session agent configuration on agent panel visibility and manage agent creation/updating.
- Added validation for TTS client initialization in dedicated_tts_pool.py to ensure clients are ready before use.
- Enhanced on_demand_pool.py to validate cached resources and remove invalid ones.
- Improved error logging in text_to_speech.py to include detailed initialization failure information and added is_ready property for synthesizer readiness check.

* Refactor code structure for improved readability and maintainability

* feat: Enhance MemoManager with background persistence and lifecycle management

- Added support for background persistence in MemoManager, allowing non-blocking state saving to Redis.
- Implemented task deduplication to cancel previous persistence tasks when a new one is initiated.
- Removed unused auto-refresh functionality and related attributes from MemoManager.
- Updated tests to verify new persistence behavior and ensure proper task management.
- Enhanced error handling and logging for background persistence operations.
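
The background persistence with task deduplication described above could be sketched with `asyncio` as follows. The class name `BackgroundPersister` and the injected `save` coroutine are hypothetical; the real MemoManager writes to Redis and its method names are not given in the commit.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, Optional

logger = logging.getLogger(__name__)


class BackgroundPersister:
    """Save state without blocking, keeping at most one save in flight."""

    def __init__(self, save: Callable[[Dict[str, Any]], Awaitable[None]]) -> None:
        self._save = save
        self._task: Optional[asyncio.Task] = None

    def persist_background(self, state: Dict[str, Any]) -> asyncio.Task:
        # Deduplicate: a newer snapshot supersedes any save still running.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        snapshot = dict(state)  # shallow copy so later mutations are not saved
        self._task = asyncio.create_task(self._run(snapshot))
        return self._task

    async def _run(self, snapshot: Dict[str, Any]) -> None:
        try:
            await self._save(snapshot)
        except asyncio.CancelledError:
            raise  # let cancellation propagate so the task ends as cancelled
        except Exception:
            logger.exception("background persistence failed")
```

Cancelling the older task means only the newest snapshot is written when saves are requested faster than they complete.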

* feat: Add Connection Warmup Analysis document for Azure Speech & OpenAI optimization

* feat(session): enhan…
JinLee794 added a commit to AIappsGBBFactory/art-voice-agent-accelerator that referenced this pull request Jan 26, 2026
…, evaluation framework (#41)

* cleanup legacy media handler, minor test and deploy cleanup (#25)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* refactor: remove legacy Redis management classes and related files

- Deleted AzureRedisManager, AsyncAzureRedisManager, RedisKeyManager, and associated models.
- Removed unused Redis interaction logic to streamline the codebase.
- Updated tests to reflect changes in the VoiceHandler module, removing deprecated MediaHandler alias.
- Ensured compatibility with the new voice module structure.

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
Co-authored-by: Pablo Salvador Lopez <[email protected]>

* feat: enhance azd environment variable handling with error checks and… (#29)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* Syncing to Azure Samples (#95)

* Delete samples/labs/dev/leadership_phrases.txt

* Update version and SKU name in staging params

* Change version for text-embedding-3-large model

Updated the version of the text-embedding-3-large model.

* Update main.tfvars.staging.json

* Update communication.tf

* feat: Enhance status envelope with optional label and update frontend to derive WS URL

- Added optional `label` parameter to `make_status_envelope` function in `envelopes.py` to allow custom labels in status messages.
- Updated `entrypoint.sh` to derive WebSocket URL from `BACKEND_URL` or use `WS_URL` if provided, replacing placeholders in frontend assets.
- Upgraded `js-yaml` and `vite` dependencies in `package.json` and `package-lock.json`.
- Enhanced `App.jsx` to format event type labels and summarize event data for better user experience.
- Introduced new demo scenarios in `DemoScenariosWidget.jsx` to showcase Microsoft Copilot Studio integration and ACS call routing.
- Added tests for call transfer events in `test_acs_events_handlers.py` to ensure correct envelope broadcasting for transfer accepted and failed events.
- Created a new Jupyter notebook for custom speech model demonstration in `12-custom-speech-model.ipynb`.
- Updated Terraform parameters to include a new text embedding model in `main.tfvars.dev.json`.

* refactor: Comment out unused email communication service domain resource

* refactor: Comment out unused Azure email communication service resources

* feat: Enhance event handling and UI components

- Added new utility functions for formatting event types and summarizing event data in App.jsx.
- Improved ChatBubble component to display event messages with formatted labels and timestamps.
- Updated DemoScenariosWidget to include new scenarios and enhanced filtering options based on tags.
- Introduced websocket URL derivation in postprovision.sh for better backend integration.
- Added tests for call transfer events in test_acs_events_handlers.py to ensure proper envelope broadcasting.
- Updated package.json to include js-yaml and upgraded vite version.

* add value

* feat: Enhance distributed session handling and improve PayPal agent interactions

- Implement distributed session bus using Redis for cross-replica session routing in connection manager.
- Add methods for publishing session envelopes to Redis channels.
- Introduce confirmation context for call center transfers to ensure explicit user consent.
- Update PayPal agent templates to clarify authentication and routing guidelines.
- Enhance real-time voice app to manage relay WebSocket connections and handle session updates more effectively.
- Improve error handling and logging for distributed session delivery and Redis interactions.
- Refactor session envelope handling in frontend to accommodate new event types and improve user experience.

* feat: Enhance status tone metadata and improve chat bubble styling

* feat: Implement background task handling for MFA delivery and improve greeting messages in handoff processes

* feat: Enhance call escalation process with detailed transfer context and improve PayPal agent handoff scenarios

* feat: Implement retry mechanism for browser session ID resolution in media streaming

* feat: Enhance session management and greeting handling across various components

* Fix session mapping for ACS calls

* add value

* add value

* adding test file

* Adding agents and templates for credit card recommendation and fee dispute agents

* add value

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* add value

* add value

* Implement Azure Voice Live service integration and enhance Terraform configurations for voice model deployments

* add value

* Add Azure Voice Live model configuration and outputs

* fixing voicelive chat sequence on the ui

* remove sensitive contact information and unused transfer agency client data

* feat: Introduce Agent Consolidation Plan with YAML-driven architecture

- Added a comprehensive proposal for consolidating agent architecture in `apps/rtagent/backend/src/agents/`.
- Established key goals including single source of truth for agent definitions, auto-discovery, and unified tool registry.
- Analyzed current architecture and identified pain points such as manual handoff registration and duplicate tool registries.
- Proposed a new solution architecture featuring enhanced YAML schema, auto-discovery engine, and unified tool registry.
- Detailed implementation roadmap divided into phases for gradual migration and integration.
- Included backward compatibility strategy to ensure existing agents function without modification.
- Provided extensive documentation on YAML schema, CLI tool usage, and migration checklist.

* Refactor speech cascade handler and routing for browser communication

- Updated speech cascade handler to prioritize `on_greeting` callback over `on_tts_request` for greeting events.
- Added `queue_user_text` method to `SpeechCascadeHandler` for queuing user text input.
- Changed routing from `/realtime` to `/browser` for browser communication endpoints.
- Modified orchestration logic to ensure TTS responses are sent with blocking behavior to prevent overlap.
- Introduced WebSocket helper functions for better organization and clarity in messaging.
- Enhanced connection manager to handle Redis pubsub reconnections on credential expiration.
- Updated frontend components to reflect routing changes for browser communication.
- Adjusted tests to align with the new browser routing and functionality.
- Commented out live metrics enabling condition in telemetry configuration for future consideration.

* feat(telemetry): add decorators for tracing LLM, dependency, speech, and ACS calls

- Introduced decorators for tracing LLM, dependency, speech, and ACS calls with OpenTelemetry instrumentation.
- Implemented a context manager for tracking conversation turns with detailed metrics.
- Added helper functions for recording GenAI and speech metrics.
- Enhanced span attributes for Azure Application Insights visualization.
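
A minimal sketch of the decorator pattern described above, using only the standard library as a stand-in for OpenTelemetry spans. `trace_call`, `RECORDED_SPANS`, and the attribute names are hypothetical and are not taken from the repository.

```python
import functools
import time

# Hypothetical in-memory sink standing in for an OpenTelemetry exporter.
RECORDED_SPANS = []


def trace_call(span_name, **attributes):
    """Decorator factory: record a span-like dict around each call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            span = {"name": span_name, "attributes": dict(attributes), "status": "ok"}
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Mark the span as failed, then let the error propagate.
                span["status"] = "error"
                span["attributes"]["error.type"] = type(exc).__name__
                raise
            finally:
                # Duration is recorded for both successful and failed calls.
                span["duration_ms"] = (time.perf_counter() - start) * 1000.0
                RECORDED_SPANS.append(span)

        return wrapper

    return decorator


@trace_call("llm.chat", model="example-model")
def fake_llm_call(prompt):
    """Placeholder for a real LLM request."""
    return prompt.upper()
```

In a real setup the dict would presumably be replaced by a span from a tracer, with the same try/except/finally shape keeping error status and latency attached to every call.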

* Remove telemetry configuration module (telemetry_config_v2.py) to streamline codebase and eliminate unused functionality.

* feat: Enhance telemetry and tracing for CosmosDB and latency tool

- Added OpenTelemetry tracing to CosmosDB operations with a decorator for latency tracking.
- Integrated tracing spans in the LatencyTool for better observability in Application Insights.
- Updated telemetry configuration to suppress noisy logs and added new attributes for speech cascade metrics.
- Created unit tests for SessionAgentManager, covering configuration management, override resolution, handoff management, and persistence.
- Removed outdated endpoints review document.

* feat: Add useBackendHealth hook for backend health checks and integrate with readiness, agents, and health endpoints

test: Implement integration tests for VoiceLive Session Agent Manager, covering agent resolution, handoff mapping, and runtime modifications

* WARNING!!!! MAJOR REFACTOR COMMIT

- Removed the VoiceLive SDK integration module from the backend.
- Added a new AgentTopologyPanel component to the frontend for displaying agent inventory and connections.
- Integrated the AgentTopologyPanel into the main application layout.
- Updated the BackendIndicator to include agent count and selection functionality.
- Enhanced the ConversationControls with a fixed view switcher for better accessibility.
- Improved the useBackendHealth hook to handle various agent data structures.
- Updated styles for better responsiveness and visual consistency across components.
- Modified utility functions to format agent inventory data correctly.
- Adjusted import paths in orchestrators and tests to reflect the new backend structure.

* feat: Enhance agent handoff process and response handling; refactor UI components for improved usability

* feat: Update change notes for v2/speech-orchestration-and-monitoring branch; highlight major features, improvements, and new agents

* refactor: Remove Unified Agent Configuration Module; streamline agent management and improve code organization

* feat: Enhance ProfileDetailsPanel with resizable functionality and UI improvements

- Added resizable panel feature to ProfileDetailsPanel, allowing users to adjust width dynamically.
- Updated panel styling for improved aesthetics, including a gradient background and adjusted borders.
- Enhanced scrollbar visibility and overflow handling for better user experience.

refactor: Simplify GraphListView filter logic

- Removed default selection logic for filters in GraphListView, allowing users to start with no filters applied.
- Cleaned up useEffect dependencies for better performance and clarity.

docs: Introduce Backend Voice & Agents Architecture documentation

- Added comprehensive documentation outlining the architecture of backend voice and agent modules.
- Detailed separation of concerns between voice transport and agent business logic.
- Included data flow diagrams and module responsibilities for clarity.

docs: Create Handoff Logic Inventory for better understanding of handoff processes

- Documented the handoff logic across backend voice and agent modules.
- Established a single source of truth for handoff mappings and protocols.
- Summarized cleanup phases and their impact on the codebase.

fix: Update logging to safely handle span names

- Modified TraceLogFilter to safely retrieve span names, preventing attribute errors with NonRecordingSpan.

fix: Adjust telemetry configuration to capture all loggers

- Changed logger_name default to an empty string in TelemetryConfig to capture all loggers.

* feat: Implement context-aware greeting rendering in VoiceLive agent; enhance session management and logging

* feat: Refactor agent configuration and voice handling; streamline agent switching and TTS integration

* feat: Enhance Agent Details Panel and Session Management

- Added sessionAgentConfig prop to AgentDetailsPanel for dynamic agent configuration display.
- Implemented logic to show agent name, description, tools, and model/voice details based on session configuration.
- Introduced a new PanelCard in AgentDetailsPanel to display session agent configuration, including model, voice, and prompt preview.
- Updated App component to fetch session agent configuration on agent panel visibility and manage agent creation/updating.
- Added validation for TTS client initialization in dedicated_tts_pool.py to ensure clients are ready before use.
- Enhanced on_demand_pool.py to validate cached resources and remove invalid ones.
- Improved error logging in text_to_speech.py to include detailed initialization failure information and added is_ready property for synthesizer readiness check.

* Refactor code structure for improved readability and maintainability

* feat: Enhance MemoManager with background persistence and lifecycle management

- Added support for background persistence in MemoManager, allowing non-blocking state saving to Redis.
- Implemented task deduplication to cancel previous persistence tasks when a new one is initiated.
- Removed unused auto-refresh functionality and related attributes from MemoManager.
- Updated tests to verify new persistence behavior and ensure proper task management.
- Enhanced error handling and logging for background persistence operations.
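
The background persistence and task deduplication described above could be sketched roughly as follows. `BackgroundPersister` and `slow_save` are hypothetical names for illustration. The actual MemoManager writes to Redis, which is simulated here with a short sleep.

```python
import asyncio


class BackgroundPersister:
    """Hypothetical helper: keeps at most one in-flight persistence task."""

    def __init__(self, save_fn):
        self._save_fn = save_fn  # async callable that writes a state snapshot
        self._task = None

    def persist_in_background(self, state):
        # A newer state supersedes an unfinished save, so cancel the old task.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        # Copy the state so later mutations do not leak into the pending save.
        self._task = asyncio.create_task(self._run(dict(state)))
        return self._task

    async def _run(self, snapshot):
        try:
            await self._save_fn(snapshot)
        except asyncio.CancelledError:
            raise  # cancellation is expected during deduplication
        except Exception as exc:
            # Errors are logged rather than raised into the caller's hot path.
            print(f"background persistence failed: {exc!r}")


async def demo():
    saved = []

    async def slow_save(snapshot):
        await asyncio.sleep(0.05)  # simulated network round trip
        saved.append(snapshot)

    persister = BackgroundPersister(slow_save)
    first = persister.persist_in_background({"turn": 1})
    await asyncio.sleep(0)  # let the first task start
    second = persister.persist_in_background({"turn": 2})
    await asyncio.gather(first, second, return_exceptions=True)
    return saved, first.cancelled()
```

Running `asyncio.run(demo())` in this sketch stores only the second snapshot, since the first save is cancelled before it completes.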

* feat: Add Connection Warmup Analysis document for Azure Speech & OpenAI optimization

* feat(session): enhance session ID management and URL parameter support

- Added `pickSessionIdFromUrl` function to extract session ID from URL parameters.
- Updated `getOrCreateSessionId` to allow session ID restoration from URL.
- Refactored `setSessionId` for better logging and session management.
- Improved `createNewSessionId` to utilize `setSessionId`.

docs(api): restructure API documentation for clarity and completeness

- Organized API endpoints into categories: Health & Monitoring, Call Management, Media Streaming, Browser Conversations, Session Metrics, Agent Builder, Demo Environment, and TTS Health.
- Added detailed descriptions and examples for each endpoint.
- Included new sections for interactive API documentation and WebSocket endpoints.

docs(api-reference): update WebSocket message types and endpoint details

- Clarified message types for incoming audio data and control messages.
- Updated WebSocket endpoint URLs and query parameters for browser conversations and dashboard relay.

docs(architecture): refine agent architecture diagrams for clarity

- Adjusted diagrams to improve readability and understanding of the agent framework and orchestration.

fix(architecture): correct orchestration mode comparison table

- Updated ratings for Azure Speech voices and simplicity of setup in the orchestration comparison table.

docs(getting-started): add demo guide and enhance onboarding experience

- Introduced a new demo guide to facilitate user onboarding and provide structured paths for different user levels.
- Enhanced the getting started guide with tips and recommended paths for new users.

feat(aoai): implement OpenAI connection warmup to reduce latency

- Added `warm_openai_connection` function to pre-establish OpenAI connection and reduce cold-start latency on first call.

feat(speech): implement token warmup for Speech API to minimize latency

- Added `warm_token` method in `SpeechTokenManager` to pre-fetch tokens during startup, reducing latency on first API call.
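
As an illustration of the warmup idea, the sketch below pre-fetches a token at startup so that the first request can reuse it. `WarmableTokenCache`, `fetch_fn`, and the refresh margin are assumptions made for this example and do not reflect the actual `SpeechTokenManager` implementation.

```python
import time


class WarmableTokenCache:
    """Hypothetical cache; fetch_fn returns (token, lifetime_seconds)."""

    def __init__(self, fetch_fn, refresh_margin=60.0, clock=time.monotonic):
        self._fetch_fn = fetch_fn
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def warm(self):
        """Call once at startup so the first request skips the fetch."""
        self._refresh()

    def get(self):
        # Refresh only when the token is missing or close to expiry.
        if self._token is None or self._clock() >= self._expires_at - self._refresh_margin:
            self._refresh()
        return self._token

    def _refresh(self):
        token, lifetime = self._fetch_fn()
        self._token = token
        self._expires_at = self._clock() + lifetime
```

The cost of the first fetch moves from the first user-facing call to application startup, which is the same trade the warmup commits describe.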

* feat(healthcare): Implement Nurse Triage Agent with symptom assessment and routing capabilities

- Introduced a comprehensive voice agent for healthcare triage.
- Added agent configuration and prompt templates for patient interaction.
- Developed healthcare tools for patient verification, clinical knowledge search, and symptom urgency assessment.
- Integrated routing logic for scheduling appointments and emergency transfers.
- Enhanced documentation with demo scenarios and testing instructions.

* feat: Implement logging utility and session management

- Added a logger utility to manage console logging levels and filtering.
- Created session management functions to handle session IDs, including retrieval from URL and session creation.
- Developed styles for the frontend components to ensure consistent UI design.
- Configured Vite for the frontend build process with proper asset handling and environment variable support.
- Introduced scripts for starting the backend and frontend development servers, including Azure Dev Tunnel hosting.

* feat: Simplify agent handoff process by refining context management and removing redundant data collection

* feat: Enhance agent handoff process by managing conversation history and user context

* feat: Enhance message handling by persisting tool calls and results as JSON for conversation continuity

* feat: Implement silent handoff protocol across agents to enhance user experience and streamline transitions

* feat: Add Azure App Configuration module with RBAC and Key Vault integration

- Implemented main resource for Azure App Configuration in Terraform.
- Added outputs for App Configuration details including ID, name, and endpoint.
- Defined variables for App Configuration module, including identity and Key Vault integration.
- Updated main Terraform outputs to include App Configuration details.
- Enhanced error handling in Azure OpenAI client for missing endpoint configuration.
- Improved Redis manager to handle port configuration with better error messaging.
- Updated requirements to include Azure App Configuration SDKs.

* first code clean up

* enabling oidc

* Refactor code structure and remove redundant sections for improved readability and maintainability

* add value

* add value

* feat: Add managed certificate and domain registration modules

- Introduced `managed-cert-example.bicep` for example usage of managed certificate deployment.
- Created `managed-cert.bicep` to handle App Service Domain registration and managed SSL certificate generation.
- Implemented `role-assignment.bicep` for managing role assignments with support for built-in and custom roles.
- Added `windows-vm.bicep` for deploying a Windows VM as a jumphost with necessary networking components.
- Developed `peer-virtual-networks.bicep` for establishing peering between virtual networks.
- Implemented `private-dns-zone.bicep` for creating and linking private DNS zones to virtual networks.
- Created `private-endpoint.bicep` for deploying private endpoints with DNS zone integration.
- Added `vnet.bicep` for creating virtual networks with associated subnets and network security groups.
- Updated `types.bicep` with new types for model deployment, role assignments, and network configurations.
- Developed `secret.bicep` for managing secrets in Azure Key Vault.
- Created `network.bicep` for orchestrating network resources including virtual networks and subnets.

* fix: Update default location parameter in create_storage function for clarity

* feat: Extract AZURE_LOCATION from environment-specific tfvars file if not set

* feat: Implement location resolution with fallback chain in preprovision script
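
A fallback chain of this kind might look like the sketch below. The preprovision script itself is a shell script. The order of sources, the `location` tfvars key, and the `eastus2` default are assumptions made for illustration only.

```python
import os


def resolve_location(env=None, tfvars=None, default="eastus2"):
    """Return (location, source) from the first source that has a value."""
    env = os.environ if env is None else env
    tfvars = tfvars or {}
    candidates = [
        ("environment", env.get("AZURE_LOCATION")),
        ("tfvars", tfvars.get("location")),  # hypothetical key name
        ("default", default),                # hypothetical default region
    ]
    for source, value in candidates:
        if value:  # skips None and empty strings
            return value, source
    raise ValueError("no Azure location could be resolved")
```

Returning the source alongside the value makes it easy to log which step of the chain supplied the location.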

* fix: Update Dockerfile to install runtime dependencies and mitigate vulnerabilities

* chore: Update CHANGELOG for version 1.5.0 release and remove changenotes.md; enable remote builds in azure.yaml; enhance terraform initialization script with location prompts

* feat: Update launch configuration and scripts to use virtual environment with uv; enhance README for deployment clarity

* further deployment cleanup, docs update/tweaks, adding more todos

* removing unused dependency in src/herlpers.py

* refactor: Update architecture diagram in README for clarity and consistency in orchestration modes

* add value

* Refactor Terraform configuration:

- Update main.tf to adjust foundry account and project naming conventions.
- Remove feature flags and keys from appconfig module as they are now managed externally.
- Clean up variables.tf by removing unused variables and updating descriptions.
- Delete provider configuration file as it is no longer needed.
- Change default application name from "rtaudioagent" to "artagent" and adjust related settings.
- Modify connection settings and pool sizes for improved performance.

* feat: Enhance Azure Voice Live integration and refactor configuration management

* last changes

* feat: Add app configuration bootstrap to initialize environment variables

* Enhance configuration loading with .env.local support and update documentation

* fix voicelive output attributes

* add

* Refactor agent paths and update documentation for agent discovery and configuration

* Add Insurance Voice Agent Scenario documentation and update navigation

- Introduced a comprehensive guide for the Insurance Customer Service Scenario, detailing the security-focused multi-agent voice system for claims processing, fraud detection, and policy management.
- Updated mkdocs.yml to include the new Insurance documentation in the Industry Solutions section.

* Add integration proposal for Spec-Driven Development methodology in ARTVoice

* add value

* Enhance Terraform configuration and scripts for Voice Live integration

- Update Dockerfile to install dependencies and set up virtual environment.
- Modify initialize-terraform.sh and local-dev-setup.sh for improved script handling.
- Refactor sync-appconfig.sh to streamline key-value imports and feature flag management.
- Add provider.conf.json generation for remote state backend configuration.
- Update main.tf and outputs.tf to support new Voice Live model deployments.
- Introduce voice_live_location and voice_live_model_deployments variables in variables.tf.

* feat: Add Concierge agent configuration and prompts for banking scenarios

- Introduced a new YAML configuration for the Concierge agent, defining its voice, model, session, and tool configurations.
- Created a comprehensive prompt file for the Concierge agent, detailing voice and language settings, identity and trust guidelines, and operational modes.
- Implemented scenario orchestration analysis to address issues with agent initialization and fallback logic, ensuring the correct agent is set for banking scenarios.
- Renamed orchestration.yaml to scenario.yaml for consistency in scenario loading.
- Updated default start agent to BankingConcierge and added validation for agent existence at startup.

* feat: Enhance scenario loading to support orchestration.yaml naming convention

* feat: Implement scenario-based handoff map resolution for orchestrator configuration

* cicd test for azd deploy

* feat: Update audio handling and documentation dependencies for improved installation and error handling

* feat: Refactor app configuration handling to prioritize .env.local overrides and improve environment variable management

* feat: Revise documentation deployment workflow to enhance dependency management and streamline build process

* modified docs workflow

* feat: Add site_dir configuration to mkdocs.yml for improved site structure

* feat: Allow mkdocs build to proceed with warnings by removing --strict flag

* fix: Update health check endpoint in postprovision script to use correct API path

* refactor: Remove outdated AZD deployment workflow and update documentation links for clarity

* fix: Ensure principal_id logging does not fail and handle local_state retrieval correctly

* refactor: Simplify state key handling in provider configuration by using environment name

* fix: Skip null values when loading static parameters from tfvars file to use Terraform defaults

* fix: Use coalesce function for location assignment in storage account resource

* refactor: Remove unused backend API public URL variable and related validation

* refactor: Remove unused backend API public URL and source phone number from environment parameter files

* improvements flow

* fix: Implement auto-selection and timeout for user input in setup scripts

* add value

* fix: Update naming conventions for foundry account and project variables in locals

* fix: Update name from rtaudioagent to artaudioagent in environment parameter files

* fix: Update name from rtaudioagent to artaudioagent in environment parameter files

* fix: Update documentation URLs to reflect new repository location

* feat: Enhance API documentation and tagging for better clarity and organization

* docs: Update documentation links and improve clarity across various guides

* refactor: replace deploy-azd workflow with reusable template and remove redundant summary job

- Updated the deployment workflow name to "Deploy to Azure".
- Replaced the usage of the old deploy-azd.yml with a new reusable template _template-deploy-azd.yml.
- Removed the deployment summary job and its associated steps to streamline the workflow.

* fix: Add run-name to the Azure deployment workflow for better clarity

* fix: Update condition for output extraction in deployment workflow

* fix: Update GitHub token to use secrets for enhanced security

* feat: Add optional GitHub PAT secret and enhance environment variable handling for Azure deployment

* adding resource group (rg) as an environment variable set at the GitHub environment level

* fix: Add emoji to workflow names for better visibility

* feat: Update documentation workflow name and enhance README with deployment badges

* fix: Update README layout and enhance navigation links for better user experience

* fix: Restore header for ARTVoice Accelerator Framework in README

* add value

* fix: Update README layout for improved clarity and navigation

* Enhance provisioning scripts and documentation

- Updated postprovision.sh to clarify phone number provisioning steps and added guidance for obtaining a phone number via Azure Portal.
- Modified preprovision.sh to include preflight checks for tools, authentication, and providers before proceeding with provisioning.
- Added jq as a prerequisite in the getting-started documentation and provided installation instructions for various platforms.
- Created a new TODO-deployfixes.md file to document common issues encountered during deployment sessions, including resolutions for Docker errors, jq installation, and subscription registration.
- Expanded troubleshooting.md with detailed solutions for common deployment and provisioning issues, including authentication mismatches, Docker errors, jq command not found, and ACS phone number prompts.
- Updated variables.tf to improve the description of the voice_live_location variable, including a link to supported Azure regions.

* feat: Update branch triggers in workflow to include feat/troubleshooting-enhancements

* fix(ci): simplify test-azd-hooks workflow tests and run in parallel

- Remove fragile grep-based function extraction that caused syntax errors
- Run lint, linux, macos, windows tests in parallel (no dependencies)
- Trigger on all pushes to main/staging (remove path filters for push)
- Simplify backend configuration test to avoid function sourcing issues

* feat: Add troubleshooting steps for "bad interpreter" errors and enhance post-provisioning instructions for phone number configuration

* feat: Add preprovision hook execution to Linux, macOS, and Windows test jobs in CI workflow

* feat: Enhance AZD hook testing with postprovision execution and Azure CLI setup

* feat: Update test job names for clarity and enhance preflight checks for CI mode

* feat: Update preflight checks to conditionally include Docker in CI mode and log its status

* feat: Add Dev Container testing for AZD hooks with environment validation and summary reporting

* feat: Enhance deployment scripts with pre/post-provisioning hooks and Azure CLI extension checks

* feat: Add troubleshooting guidance for MkDocs module errors and update dev dependencies in uv.lock

* feat: Update Azure deployment workflows and normalize container memory formats

* feat: Add troubleshooting guidance for Terraform state lock errors and provide remote/local fix options

* feat: Remove outdated troubleshooting documentation for deployment issues

* Apply suggestion from @Copilot

Co-authored-by: Copilot <[email protected]>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <[email protected]>

* Update .github/workflows/test-azd-hooks.yml

Co-authored-by: Copilot <[email protected]>

* feat: Implement TTS Streaming Latency Analysis and Optimization Plan

- Added a comprehensive document outlining the critical latency issues in TTS playback within the Speech Cascade architecture.
- Identified root causes including processing loop deadlock, sentence buffering delays, queue-based event processing, and full synthesis before streaming.
- Proposed a multi-phase optimization strategy to address identified issues, including:
  - Phase 0: Fix processing loop deadlock by creating a dedicated TTS processing task.
  - Phase 1: Reduce sentence buffer threshold for earlier TTS chunk dispatch.
  - Phase 2: Implement parallel TTS prefetching to synthesize the next sentence while streaming.
  - Phase 3: Enable streaming TTS synthesis to stream audio while synthesizing.
  - Phase 4: Achieve full pipeline parallelism for LLM to TTS to WebSocket streaming.
- Created a detailed test implementation plan with metrics and success criteria to validate improvements.
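
Phase 2 above (prefetching the next sentence while the current one streams) can be sketched with `asyncio` as follows. `stream_with_prefetch`, `synthesize`, and `send` are hypothetical stand-ins. The real synthesis and WebSocket calls are simulated with sleeps.

```python
import asyncio


async def stream_with_prefetch(sentences, synthesize, send):
    """Synthesize sentence i+1 while the audio for sentence i is being sent."""
    if not sentences:
        return
    next_audio = asyncio.create_task(synthesize(sentences[0]))
    for index in range(len(sentences)):
        audio = await next_audio
        if index + 1 < len(sentences):
            # Start the next synthesis before streaming the current audio.
            next_audio = asyncio.create_task(synthesize(sentences[index + 1]))
        await send(audio)


async def demo():
    events = []

    async def synthesize(text):
        events.append(f"synth-start:{text}")
        await asyncio.sleep(0.01)  # simulated TTS latency
        return f"audio({text})"

    async def send(audio):
        events.append(f"send-start:{audio}")
        await asyncio.sleep(0.03)  # simulated playback or network time
        events.append(f"send-end:{audio}")

    await stream_with_prefetch(["one", "two"], synthesize, send)
    return events
```

In the event log returned by `demo()`, synthesis of the second sentence starts before the first sentence finishes sending, which is the overlap the plan aims for. This sketch does not cancel a pending prefetch if `send` fails.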

test: Add unit tests for HandoffService

- Created unit tests for the HandoffService, covering handoff detection, target resolution, and handoff resolution methods.
- Implemented tests for greeting selection and context building to ensure proper functionality.
- Added tests for the HandoffResolution dataclass to verify properties and default values.

* feat: Add Scenario Builder component and integrate with RealTimeVoiceApp

- Introduced ScenarioBuilder component for visual orchestration of agent flows.
- Implemented drag-and-drop functionality for agents and handoff configuration.
- Added buttons in RealTimeVoiceApp for accessing Agent and Scenario Builders.
- Enhanced state management for agent scenarios, including creation and updates.
- Integrated new handoff editor for configuring agent interactions.

* Refactor code structure for improved readability and maintainability

* Add error handling for Redis connection issues and implement unit tests for HandoffService

- Enhanced AzureRedisManager to handle RedisClusterException and OSError during client connection attempts.
- Introduced comprehensive unit tests for HandoffService, covering handoff detection, target resolution, handoff resolution, greeting selection, and context building.
- Added tests for HandoffResolution dataclass to ensure correct property behavior and default values.

* Enhance LiveOrchestrator to handle context-only session updates without UI broadcasts

* Refactor LiveOrchestrator to prevent duplicate UI updates by omitting redundant session_updated broadcasts during context-only updates.

* Refactor environment variable assignment in deploy workflow for clarity

* Refactor tests and dependencies following module renaming and API changes

- Removed pytest-twisted from dev dependencies in pyproject.toml and uv.lock.
- Updated conftest.py to mock configuration and Azure OpenAI client for tests.
- Skipped tests in test_acs_media_lifecycle.py, test_acs_media_lifecycle_memory.py, and test_acs_simple.py due to dependencies on removed/renamed modules.
- Adjusted imports in test_artagent_wshelpers.py for orchestrator path change.
- Skipped tests in test_call_transfer_service.py due to API changes in toolstore.
- Updated datetime usage in test_demo_env_phrase_bias.py to use UTC.
- Modified websocket endpoint assertions in test_realtime.py to reflect new paths.
- Added new test file test_voice_handler_components.py for voice handler components.

* Add comprehensive tests for VoiceLive handler and orchestrator memory management

- Implement tests to verify cleanup functionality in LiveOrchestrator.
- Ensure proper registration and unregistration of orchestrators in the registry.
- Test background task tracking and cleanup mechanisms.
- Validate greeting task cancellation during orchestrator cleanup.
- Introduce memory leak detection tests to prevent unbounded growth in orchestrator registry.
- Verify user message history deque is properly bounded and cleared on cleanup.
- Add scenario update tests to ensure correct agent management during updates.
- Optimize hot path functions to ensure non-blocking behavior during network calls.
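
The bounded user message history that these tests check could be modelled as in the sketch below. `UserMessageHistory` and its default size are hypothetical and not taken from `LiveOrchestrator`.

```python
from collections import deque


class UserMessageHistory:
    """Hypothetical bounded history: old entries drop off automatically."""

    def __init__(self, max_items=50):
        # deque(maxlen=...) discards the oldest item once the limit is reached.
        self._items = deque(maxlen=max_items)

    def append(self, text):
        self._items.append(text)

    def clear(self):
        """Release all entries, for example during orchestrator cleanup."""
        self._items.clear()

    def snapshot(self):
        return list(self._items)

    def __len__(self):
        return len(self._items)
```

A test for unbounded growth can then append more items than the limit and assert that the length never exceeds it.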

* feat: Enhance AgentBuilder with consistent field names and improved UI elements

* Refactor logging levels from info to debug in connection manager, warmable pool, Redis manager, speech auth manager, speech recognizer, and text-to-speech modules for improved log verbosity control. Remove outdated greeting context tests and add comprehensive scenario orchestration contract tests to ensure functional contracts are preserved during refactoring. Update session agent manager tests to use set comparison for agent listing to avoid dict ordering issues.

* feat: Add predefined handoff condition patterns to enhance scenario orchestration

* add value

* feat(metrics): Introduce shared metrics factory for lazy initialization

- Added `metrics_factory.py` to provide a common infrastructure for OpenTelemetry metrics.
- Implemented `LazyMeter`, `LazyHistogram`, and `LazyCounter` for lazy initialization of metrics.
- Updated `speech_cascade/metrics.py` to utilize the new shared metrics factory, simplifying metric initialization.
- Refactored `voicelive/metrics.py` to use the shared factory for consistent metric handling.
- Enhanced orchestrator classes in `speech_cascade/orchestrator.py` and `voicelive/orchestrator.py` to cache orchestrator configurations, improving performance and reducing redundant calls.
- Introduced utility functions for building common metric attributes, ensuring consistency across metrics.
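
Lazy initialization of a metric instrument might follow the pattern below. `LazyInstrument` and `FakeCounter` are hypothetical and only illustrate the idea. They are not the `LazyCounter` or `LazyHistogram` classes from `metrics_factory.py`.

```python
import threading


class LazyInstrument:
    """Hypothetical wrapper: builds the real instrument on first use."""

    def __init__(self, factory):
        self._factory = factory  # zero-argument callable creating the instrument
        self._instrument = None
        self._lock = threading.Lock()

    def _get(self):
        # Double-checked locking so the factory runs at most once.
        if self._instrument is None:
            with self._lock:
                if self._instrument is None:
                    self._instrument = self._factory()
        return self._instrument

    def add(self, amount, attributes=None):
        self._get().add(amount, attributes or {})


class FakeCounter:
    """Stand-in for a real metrics counter."""

    def __init__(self):
        self.total = 0

    def add(self, amount, attributes):
        self.total += amount
```

Deferring creation this way lets modules declare metrics at import time without requiring the metrics provider to be configured yet.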

* feat: Consolidate handoff logic into a unified HandoffService for consistent behavior across orchestrators and enhance documentation

* fix: Simplify environment determination logic in deployment workflow

* add value

* feat: Add user flow screenshots and enhance documentation for guided agent setup

* feat: Enhance scenario testing instructions for clarity and user guidance

* fix: Correct image paths in quickstart guide for accurate rendering

* feat: Add initial agent builder and template selection screenshots to quickstart guide

* feat: Add demo profile creation steps and related images to quickstart guide

* feat: Implement EasyAuth configuration script and integrate into post-provisioning process

* refactor: Remove backend IP restrictions configuration and related outputs

* Added non-qualifying rush response to ensure clear model behavior

* updated order so confirmation statement is in the correct spot

* add value

* add value

* chore: Remove unused workflow images for demo profiles

* fix: Update demo profile creation images in quickstart guide

* fix: Update home screen image in quickstart guide

* fix: Update home screen and scenario images in quickstart guide

* add value

* add value

* add value

* add value

* add value

* add

* add value

* art

* add opentelemetry import for tracing support in TTS module

* refactor: update LiveOrchestrator to enhance user message history management and improve handoff context

* Refactor TTS Playback and Voice Handling

- Consolidated TTS playback logic into a unified class for speech cascade.
- Removed deprecated VoiceSessionContext and related compatibility shims.
- Enhanced error handling during tool initialization and event handler registration.
- Updated model configuration handling in UnifiedAgent to prioritize mode-specific settings.
- Improved logging for TTS synthesis and streaming processes.
- Added new handoff tool registration for dynamic routing.

* refactor: streamline EasyAuth enabling process in CI mode and improve interactive prompts

* refactor: enhance EasyAuth interactive prompts and streamline user choices

* refactor: enhance run-name logic for Azure deployment workflow

* fix: update environment logic for pull_request events in Azure deployment workflow

* refactor: update preprovision hook execution and streamline backend configuration

* feat: add context variable support for handoffs and enhance UI for variable mapping

* feat: enhance TTS processing by adding text sanitization and sentence boundary detection (#11)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
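
A rough sketch of text sanitization and sentence boundary detection for streamed TTS input is shown below. The function names and regular expressions are assumptions for illustration. The actual rules in the repository may differ, and this naive splitter does not handle abbreviations such as "Dr.".

```python
import re

_MARKDOWN_NOISE = re.compile(r"[*_`#]+")     # characters a TTS voice should not read
_WHITESPACE = re.compile(r"\s+")
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # whitespace that follows ., ! or ?


def sanitize_for_tts(text):
    """Strip markdown markers and collapse runs of whitespace."""
    text = _MARKDOWN_NOISE.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def split_complete_sentences(buffer):
    """Return (complete_sentences, remainder) for a streaming text buffer."""
    parts = _SENTENCE_END.split(buffer)
    # The last part may still be growing, so keep it as the remainder.
    remainder = parts.pop()
    sentences = [part for part in parts if part]
    return sentences, remainder
```

Complete sentences can be dispatched to synthesis immediately, while the remainder waits for more LLM output.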

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#14)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#15)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#13)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat(telemetry): consolidate to OpenTelemetry and establish proper hierarchy (#12)

Infrastructure Changes:
- Delete 6 obsolete latency_tool implementations (~2200 lines)
- Install SessionContextSpanProcessor for automatic session correlation
- Replace LatencyTool with @trace_speech decorators in legacy paths
- Remove latency_tool field from VoiceSessionContext

Speech Services & Dependencies:
- Add @trace_speech for STT partial/final transcripts with attributes
- Add TTS attributes: voice, output_format, language, audio_size_bytes
- Standardize ACS and Redis span attributes with OTel conventions
- Add voice_session root SERVER span in media/browser endpoints

Orchestrator & Token Tracking:
- Add tool execution and agent handoff observability spans
- Fix token tracking to use actual API usage data (not estimates)
- Update Azure OpenAI API to 2024-10-01-preview
- Add session metadata timestamps to MemoManager

Benefits:
- Single source of truth (ConversationTurnSpan + OTel)
- Complete E2E traces in Application Insights
- Accurate cost tracking and token visibility
- ~2300 lines of dead code removed

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: Responses API Infrastructure & Dual Model Configuration (#16)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses
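
The routing between `/chat/completions` and `/responses` described above could look roughly like the sketch below. This is not the PR's implementation: the `ModelConfig` class, the `"auto"`/`"chat"`/`"responses"` values for `endpoint_preference`, and the "auto" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    """Illustrative subset of the fields named in the commit message."""

    endpoint_preference: str = "auto"  # assumed values: "auto", "chat", "responses"
    verbosity: Optional[str] = None
    reasoning_effort: Optional[str] = None


def should_use_responses_endpoint(cfg: ModelConfig) -> bool:
    """Decide which endpoint to call (assumed rule, not the PR's actual logic)."""
    if cfg.endpoint_preference == "responses":
        return True
    if cfg.endpoint_preference == "chat":
        return False
    # "auto": assume Responses-only parameters imply the /responses endpoint.
    return cfg.verbosity is not None or cfg.reasoning_effort is not None


def endpoint_path(cfg: ModelConfig) -> str:
    """Map the routing decision to the endpoint path named in the commit."""
    return "/responses" if should_use_responses_endpoint(cfg) else "/chat/completions"
```

An explicit preference wins over inference in this sketch, so a caller can pin an agent to one endpoint regardless of which optional parameters are set.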

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API
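
As a rough illustration of the alias handling in `get_model_for_mode()`, the sketch below resolves a mode name to a per-mode model with a shared fallback. The pairing of `'media'` with cascade and `'realtime'` with voicelive, the `model` fallback field, and the error on unknown modes are assumptions; only the field and method names come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed pairing: 'media' aliases 'cascade', 'realtime' aliases 'voicelive'.
_MODE_ALIASES = {
    "cascade": "cascade",
    "media": "cascade",
    "voicelive": "voicelive",
    "realtime": "voicelive",
}


@dataclass
class AgentConfig:
    model: str  # hypothetical shared default deployment name
    cascade_model: Optional[str] = None
    voicelive_model: Optional[str] = None

    def get_model_for_mode(self, mode: str) -> str:
        """Return the mode-specific model, or the shared default if unset."""
        canonical = _MODE_ALIASES.get(mode.lower())
        if canonical is None:
            raise ValueError(f"unknown orchestration mode: {mode!r}")
        override = self.cascade_model if canonical == "cascade" else self.voicelive_model
        return override or self.model
```

For example, `AgentConfig(model="base", cascade_model="fast").get_model_for_mode("media")` resolves through the alias table to the cascade override.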

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.

* fix: update project version to 2.0.0-beta in pyproject.toml

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: Orchestrator Integration + Optimizations (#17)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.

* feat: integrate Responses API in orchestrators and add optimizations

**Cascade Orchestrator:**
- Update model selection to use agent.get_model_for_mode('cascade')
- Integrate Responses API routing based on endpoint_preference
- Add error handling for unsupported parameters
- Extract TTS processing into separate tts_processor module

**VoiceLive Orchestrator:**
- Update to use agent.get_model_for_mode('voicelive')
- Add registry cleanup to prevent unbounded growth
- Improve memory management and stale orchestrator cleanup
- Extract DTMF processing into separate dtmf_processor module
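
The registry cleanup mentioned above is not detailed in the commit message. One common way to keep such a registry bounded is to evict entries that have been idle past a time-to-live; the sketch below assumes that approach, and the class name, method names, and default TTL are hypothetical.

```python
import time
from typing import Dict, Optional


class OrchestratorRegistry:
    """Hypothetical registry that evicts entries idle longer than `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        self._last_seen: Dict[str, float] = {}

    def touch(self, session_id: str, now: Optional[float] = None) -> None:
        """Register a session or refresh its last-activity timestamp."""
        self._last_seen[session_id] = time.monotonic() if now is None else now

    def cleanup_stale(self, now: Optional[float] = None) -> int:
        """Remove sessions idle past the TTL and return how many were evicted."""
        current = time.monotonic() if now is None else now
        stale = [sid for sid, seen in self._last_seen.items() if current - seen > self._ttl]
        for sid in stale:
            del self._last_seen[sid]
        return len(stale)

    def __len__(self) -> int:
        return len(self._last_seen)
```

The optional `now` argument exists only so the eviction rule can be exercised deterministically in tests; production callers would rely on the monotonic clock.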

**Tests:**
- Add test_cascade_orchestrator_entry_points
- Add test_cascade_llm_processing
- Add test_dtmf_processor

Depends on: PR #1 (Responses API Infrastructure)

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: Evaluation Framework + Frontend UI (#18)

* feat: enhance azd environment variable handling with error checks and local state support

* fix: update foundry account and project naming conventions for consistency

* feat: add Responses API infrastructure and dual model configuration

**Infrastructure Changes:**
- Add UnifiedResponse dataclass for dual endpoint support
- Implement _should_use_responses_endpoint() routing logic
- Add _prepare_responses_params() and _prepare_chat_params() methods
- Update generate_response() to route between /chat/completions and /responses

**Model Configuration:**
- Add cascade_model and voicelive_model fields to AgentConfig
- Add get_model_for_mode() with support for 'cascade', 'media', 'voicelive', 'realtime' aliases
- Add Responses API fields: endpoint_preference, verbosity, min_p, typical_p, reasoning_effort, include_reasoning, max_completion_tokens
- Update ModelConfigSchema in agent_builder API

**Tests:**
- Add test_generate_response_respects_responses_config
- Add test_generate_response_respects_chat_config
- Add TestUnifiedAgentGetModelForMode test suite

This PR provides the foundation for Responses API support without changing orchestrator behavior.

* feat: add evaluation framework and frontend UI for Responses API

**Evaluation Framework:**
- Add EventRecorder with git commit SHA tracking
- Add API-aware scoring with budget adjustments for verbosity
- Add scenario runner for automated testing
- Add CLI for running evaluations
- Add validate_phases.py for phase-based validation
- Add wrappers for endpoint detection

**Frontend UI:**
- Add cascade_model and voicelive_model selectors in Agent Builder
- Add Responses API endpoint preference dropdown
- Add conditional fields for verbosity, reasoning_effort, etc.
- Update ScenarioBuilder with model configuration options
- Display API version fields

**Documentation:**
- Add docs/testing/model-evaluation.md
- Add evaluation playground Jupyter notebook

Depends on: PR #1 (Responses API Infrastructure)

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Cleaning up lifecycle management logic into dedicated structure, keep main.py clean (#19)

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: voice handler refactoring and MediaHandler migration

Major refactoring of voice processing architecture:

Core Voice Changes:
- Implement new VoiceHandler as primary entry point for voice sessions
- Delete deprecated speech_cascade/tts.py (652 lines removed)
- Consolidate TTS functionality into voice/tts/playback.py
- Enhance CascadeOrchestrator with improved turn management
- Add VoiceSessionContext for clean dependency injection

API & Integration:
- Migrate /api/v1/browser/conversation to VoiceHandler
- Migrate /api/v1/media/stream to VoiceHandler
- Create MediaHandler→VoiceHandler compatibility alias
- Update media_handler.py for backward compatibility

Infrastructure:
- Improve telemetry with Azure-style span naming
- Enhance ACS helpers with better session management
- Update session terminator for lifecycle management
- Add orchestration improvements for unified agents

Configuration & Samples:
- Update auth agent and insurance scenario configs
- Add handoff tool enhancements with context variables
- Update gpt_flow sample for new patterns

Frontend:
- Refactor App.jsx for improved voice handling UI

Testing & Documentation:
- Add test_voice_handler_compat.py for backward compatibility
- Add MEDIAHANDLER_MIGRATION.md tracking document

This change maintains full backward compatibility while establishing
the foundation for cleaner voice processing patterns going forward.

Closes #[TBD]

* Enhance logging and user prompts in preflight and pre-provisioning scripts (#20)

- Updated logging functions in preflight-checks.sh, ssl-preprovision.sh, sync-appconfig.sh, postprovision.sh, and preprovision.sh for consistent output formatting.
- Improved user prompts for SSL certificate configuration and Azure Entra group creation in ssl-preprovision.sh and postprovision.sh.
- Added color-coded success, warning, and error messages for better visibility.
- Modified the handling of environment variables in postprovision.sh to ensure updates are made without overwriting existing values.
- Updated Terraform configurations to manage app configuration and cognitive account settings with soft delete options.

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* feat: voice handler refactoring and MediaHandler migration (#21)

Major refactoring of voice processing architecture:

Core Voice Changes:
- Implement new VoiceHandler as primary entry point for voice sessions
- Delete deprecated speech_cascade/tts.py (652 lines removed)
- Consolidate TTS functionality into voice/tts/playback.py
- Enhance CascadeOrchestrator with improved turn management
- Add VoiceSessionContext for clean dependency injection

API & Integration:
- Migrate /api/v1/browser/conversation to VoiceHandler
- Migrate /api/v1/media/stream to VoiceHandler
- Create MediaHandler→VoiceHandler compatibility alias
- Update media_handler.py for backward compatibility

Infrastructure:
- Improve telemetry with Azure-style span naming
- Enhance ACS helpers with better session management
- Update session terminator for lifecycle management
- Add orchestration improvements for unified agents

Configuration & Samples:
- Update auth agent and insurance scenario configs
- Add handoff tool enhancements with context variables
- Update gpt_flow sample for new patterns

Frontend:
- Refactor App.jsx for improved voice handling UI

Testing & Documentation:
- Add test_voice_handler_compat.py for backward compatibility
- Add MEDIAHANDLER_MIGRATION.md tracking document

This change maintains full backward compatibility while establishing
the foundation for cleaner voice processing patterns going forward.

Closes #[TBD]

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Enhance the ScenarioBuilder with Flowy (#22)

* docs: add comprehensive voice processing architecture documentation

Add complete documentation for the voice processing architecture:

New Documentation:
- docs/architecture/voice/README.md - Comprehensive voice architecture guide
  * VoiceHandler overview and usage patterns
  * TTS playback and text processing
  * Speech cascade pipeline documentation
  * Audio specifications for browser and ACS transports
  * Testing guidelines with actual test file references
  * Troubleshooting guide for common issues

- apps/artagent/backend/voice/README.md - Developer quick reference
  * Directory structure and module organization
  * Quick start examples
  * Common tasks and patterns
  * File location guide
  * Testing commands

Documentation Updates:
- docs/mkdocs.yml - Add voice architecture to navigation
- docs/operations/troubleshooting.md - Add voice-specific troubleshooting

Key Improvements:
- Fixed mkdocs formatting for proper list rendering
- Updated all test references to match actual test files:
  * test_voice_handler_components.py
  * test_voice_handler_compat.py
  * test_cascade_orchestrator_entry_points.py
  * test_cascade_llm_processing.py
- Verified all script references (quick_test.sh, test_orchestrator.py)
- Added prerequisites for running tests with dev dependencies
- Included both basic and advanced testing examples

All file paths and examples have been verified against the actual codebase.

Related to #[TBD]

* Add custom styles for Flowy flowchart integration with agent blocks

* feat: Enhance output port visibility logic in ScenarioGraphCanvas

* feat: Add expandable full prompt view for source agent in HandoffEditorDialog

---------

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Refactor ACS logging and add default orchestration scenario

- Removed info-level logging for ACS configuration details to reduce verbosity.
- Changed some logging statements to debug level for better log management.
- Updated peer.service attribute in telemetry to use "azure-communication-services".
- Introduced a new orchestration.yaml file defining a default customer service scenario with multiple agents and handoff configurations.

* Refactor ACS logging and add default orchestration scenario (#23)

- Removed info-level logging for ACS configuration details to reduce verbosity.
- Changed some logging statements to debug level for better log management.
- Updated peer.service attribute in telemetry to use "azure-communication-services".
- Introduced a new orchestration.yaml file defining a default customer service scenario with multiple agents and handoff configurations.

Co-authored-by: Jin Lee (HLS US SE) <[email protected]>

* Enhance logging functions to use log_plain for consistency and clarity in local development setup script

* Disable view toggle buttons for chat/graph/timeline in ConversationControls

* Add panning functionality to ScenarioGraphCanvas and reset button

* Update CHANGELOG.md for 2.0.0-beta.1 release: add new features, enhancements, fixes, and infrastructure changes

* feat: Add mkdocs-mermaid-zoom dependency and update locust load test scripts

- Added mkdocs-mermaid-zoom to pyproject.toml and uv.lock for enhanced diagram support in documentation.
- Enhanced locustfile.acs_media.py with rate limit detection and error handling improvements.
- Introduced locustfile.browser_conversation.py for testing browser-based voice conversation endpoints.
- Improved metrics naming conventions for clarity in load testing results.

* feat: Update Voice Live readiness status to use event envelope format

---------

Co-authored-by: Jin Lee <[email protected]>
Co-authored-by: Jin Lee (HLS US SE) <[email protected]>
Co-authored-by: Anna Quincy <[email protected]>
Co-authored-by: Jin Lee <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Consolidate infrastructure resource documentation into infra/README.md (#26)

* Initial plan

* Add comprehensive infrastructure resources documentation with private networking links

Co-authored-by: JinLee794 <[email protected]>

* Consolidate infrastructure documentation into infra/README.md

Co-authored-by: JinLee794 <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: JinLee794 <[email protected]>

* enhancement: infra docs readme update (#100)

* Update version and SKU name in staging params

* Change version for text-embedding-3-large model

Updated the version of the text-embedding-3-large model.

* Update main.tfvars.staging.json

* Update communication.tf

* feat: Enhance status envelope with optional label and update frontend to derive WS URL

- Added optional `label` parameter to `make_status_envelope` function in `envelopes.py` to allow custom labels in status messages.
- Updated `entrypoint.sh` to derive WebSocket URL from `BACKEND_URL` or use `WS_URL` if provided, replacing placeholders in frontend assets.
- Upgraded `js-yaml` and `vite` dependencies in `package.json` and `package-lock.json`.
- Enhanced `App.jsx` to format event type labels and summarize event data for better user experience.
- Introduced new demo scenarios in `DemoScenariosWidget.jsx` to showcase Microsoft Copilot Studio integration and ACS call routing.
- Added tests for call transfer events in `test_acs_events_handlers.py` to ensure correct envelope broadcasting for transfer accepted and failed events.
- Created a new Jupyter notebook for custom speech model demonstration in `12-custom-speech-model.ipynb`.
- Updated Terraform parameters to include a new text embedding model in `main.tfvars.dev.json`.
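
The WebSocket URL derivation in `entrypoint.sh` is a shell step. The Python sketch below only illustrates the rule as described (use `WS_URL` if provided, otherwise derive it from `BACKEND_URL`). The function name and the http-to-ws scheme mapping are assumptions about how the derivation works:

```python
from typing import Mapping
from urllib.parse import urlsplit, urlunsplit


def derive_ws_url(env: Mapping[str, str]) -> str:
    """Return WS_URL if set, else BACKEND_URL with http(s) swapped for ws(s)."""
    explicit = env.get("WS_URL", "").strip()
    if explicit:
        return explicit
    parts = urlsplit(env["BACKEND_URL"])
    # Assumed mapping: http -> ws, https -> wss; other schemes pass through.
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))
```

Passing the environment as a mapping (for instance `os.environ`) keeps the function easy to test without touching process state.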

* refactor: Comment out unused email communication service domain resource

* refactor: Comment out unused Azure email communication service resources

* feat: Enhance event handling and UI components

- Added new utility functions for formatting event types and summarizing event data in App.jsx.
- Improved ChatBubble component to display event messages with formatted labels and timestamps.
- Updated DemoScenariosWidget to include new scenarios and enhanced filtering options based on tags.
- Introduced websocket URL derivation in postprovision.sh for better backend integration.
- Added tests for call transfer events in test_acs_events_handlers.py to ensure proper envelope broadcasting.
- Updated package.json to include js-yaml and upgraded vite version.

* add value

* feat: Enhance distributed session handling and improve PayPal agent interactions

- Implement distributed session bus using Redis for cross-replica session routing in connection manager.
- Add methods for publishing session envelopes to Redis channels.
- Introduce confirmation context for call center transfers to ensure explicit user consent.
- Update PayPal agent templates to clarify authentication and routing guidelines.
- Enhance real-time voice app to manage relay WebSocket connections and handle session updates more effectively.
- Improve error handling and logging for distributed session delivery and Redis interactions.
- Refactor session envelope handling in frontend to accommodate new event types and improve user experience.

* feat: Enhance status tone metadata and improve chat bubble styling

* feat: Implement background task handling for MFA delivery and improve greeting messages in handoff processes

* feat: Enhance call escalation process with detailed transfer context and improve PayPal agent handoff scenarios

* feat: Implement retry mechanism for browser session ID resolution in media streaming

* feat: Enhance session management and greeting handling across various components

* fixing session mapping for acs calls

* add value

* add value

* adding test file

* Adding agents and templates for credit card recommendation and fee dispute agents

* add value

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* Enhance audio transcription settings across agents and adjust logging levels for better debugging

* add value

* add value

* Implement Azure Voice Live service integration and enhance Terraform configurations for voice model deployments

* add value

* Add Azure Voice Live model configuration and outputs

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* fixing voicelive chat sequence on the ui

* remove sensitive contact information and unused transfer agency client data

* feat: Introduce Agent Consolidation Plan with YAML-driven architecture

- Added a comprehensive proposal for consolidating agent architecture in `apps/rtagent/backend/src/agents/`.
- Established key goals including single source of truth for agent definitions, auto-discovery, and unified tool registry.
- Analyzed current architecture and identified pain points such as manual handoff registration and duplicate tool registries.
- Proposed a new solution architecture featuring enhanced YAML schema, auto-discovery engine, and unified tool registry.
- Detailed implementation roadmap divided into phases for gradual migration and integration.
- Included backward compatibility strategy to ensure existing agents function without modification.
- Provided extensive documentation on YAML schema, CLI tool usage, and migration checklist.

* Refactor speech cascade handler and routing for browser communication

- Updated speech cascade handler to prioritize `on_greeting` callback over `on_tts_request` for greeting events.
- Added `queue_user_text` method to `SpeechCascadeHandler` for queuing user text input.
- Changed routing from `/realtime` to `/browser` for browser communication endpoints.
- Modified orchestration logic to ensure TTS responses are sent with blocking behavior to prevent overlap.
- Introduced WebSocket helper functions for better organization and clarity in messaging.
- Enhanced connection manager to handle Redis pubsub reconnections on credential expiration.
- Updated frontend components to reflect routing changes for browser communication.
- Adjusted tests to align with the new browser routing and functionality.
- Commented out live metrics enabling condition in telemetry configuration for future consideration.

* feat(telemetry): add decorators for tracing LLM, dependency, speech, and ACS calls

- Introduced decorators for tracing LLM, dependency, speech, and ACS calls with OpenTelemetry instrumentation.
- Implemented a context manager for tracking conversation turns with detailed metrics.
- Added helper functions for recording GenAI and speech metrics.
- Enhanced span attributes for Azure Application Insights visualization.

* Remove telemetry configuration module (telemetry_config_v2.py) to streamline codebase and eliminate unused functionality.

* feat: Enhance telemetry and tracing for CosmosDB and latency tool

- Added OpenTelemetry tracing to CosmosDB operations with a decorator for latency tracking.
- Integrated tracing spans in the LatencyTool for better observability in Application Insights.
- Updated telemetry configuration to suppress noisy logs and added new attributes for speech cascade metrics.
- Created unit tests for SessionAgentManager, covering configuration management, override resolution, handoff management, and persistence.
- Removed outdated endpoints review document.

* feat: Add useBackendHealth hook for backend health checks and integrate with readiness, agents, and health endpoints

test: Implement integration tests for VoiceLive Session Agent Manager, covering agent resolution, handoff mapping, and runtime modifications

* WARNING!!!! MAJOR REFACTOR COMMIT

- Removed the VoiceLive SDK integration module from the backend.
- Added a new AgentTopologyPanel component to the frontend for displaying agent inventory and connections.
- Integrated the AgentTopologyPanel into the main application layout.
- Updated the BackendIndicator to include agent count and selection functionality.
- Enhanced the ConversationControls with a fixed view switcher for better accessibility.
- Improved the useBackendHealth hook to handle various agent data structures.
- Updated styles for better responsiveness and visual consistency across components.
- Modified utility functions to format agent inventory data correctly.
- Adjusted import paths in orchestrators and tests to reflect the new backend structure.

* feat: Enhance agent handoff process and response handling; refactor UI components for improved usability

* feat: Update change notes for v2/speech-orchestration-and-monitoring branch; highlight major features, improvements, and new agents

* refactor: Remove Unified Agent Configuration Module; streamline agent management and improve code organization

* feat: Enhance ProfileDetailsPanel with resizable functionality and UI improvements

- Added resizable panel feature to ProfileDetailsPanel, allowing users to adjust width dynamically.
- Updated panel styling for improved aesthetics, including a gradient background and adjusted borders.
- Enhanced scrollbar visibility and overflow handling for better user experience.

refactor: Simplify GraphListView filter logic

- Removed default selection logic for filters in GraphListView, allowing users to start with no filters applied.
- Cleaned up useEffect dependencies for better performance and clarity.

docs: Introduce Backend Voice & Agents Architecture documentation

- Added comprehensive documentation outlining the architecture of backend voice and agent modules.
- Detailed separation of concerns between voice transport and agent business logic.
- Included data flow diagrams and module responsibilities for clarity.

docs: Create Handoff Logic Inventory for better understanding of handoff processes

- Documented the handoff logic across backend voice and agent modules.
- Established a single source of truth for handoff mappings and protocols.
- Summarized cleanup phases and their impact on the codebase.

fix: Update logging to safely handle span names

- Modified TraceLogFilter to safely retrieve span names, preventing attribute errors with NonRecordingSpan.
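
A minimal sketch of the safe-retrieval pattern described above, assuming the filter reads the span name by attribute. The class name `SafeSpanNameFilter`, the injected `get_current_span` callable, and the `"-"` placeholder are hypothetical. The real `TraceLogFilter` is not shown in this PR description:

```python
import logging
from typing import Any, Callable


class SafeSpanNameFilter(logging.Filter):
    """Hypothetical filter that attaches a span name without assuming it exists."""

    def __init__(self, get_current_span: Callable[[], Any]) -> None:
        super().__init__()
        # Injected so the sketch stays free of tracing-library imports.
        self._get_current_span = get_current_span

    def filter(self, record: logging.LogRecord) -> bool:
        span = self._get_current_span()
        # Some span objects expose no `name`; getattr avoids AttributeError.
        record.span_name = getattr(span, "name", None) or "-"
        return True
```

Returning `True` unconditionally means the filter only enriches records and never drops them, so a missing span name cannot suppress a log line.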

fix: Adjust telemetry configuration to capture all loggers

- Changed logger_name default to an empty string in TelemetryConfig to capture all loggers.

* feat: Implement context-aware greeting rendering in VoiceLive agent; enhance session management and logging

* feat: Refactor agent configuration and voice handling; streamline agent switching and TTS integration

* feat: Enhance Agent Details Panel and Session Management

- Added sessionAgentConfig prop to AgentDetailsPanel for dynamic agent configuration display.
- Implemented logic to show agent name, description, tools, and model/voice details based on session configuration.
- Introduced a new PanelCard in AgentDetailsPanel to display session agent configuration, including model, voice, and prompt preview.
- Updated App component to fetch session agent configuration on agent panel visibility and manage agent creation/updating.
- Added validation for TTS client initialization in dedicated_tts_pool.py to ensure clients are ready before use.
- Enhanced on_demand_pool.py to validate cached resources an…