azure cosmos db
10 TopicsBuilding Enterprise Voice-Enabled AI Agents with Azure Voice Live API
The sample application covered in this post demonstrates two approaches in an end-to-end solution that includes product search, order management, automated shipment creation, intelligent analytics, and comprehensive business intelligence through Microsoft Fabric integration. Use Case Scenario: Retail Fashion Agent Core Business Capabilities: Product Discovery and Ordering: Natural language product search across fashion categories (Winter wear, Active wear, etc.) and order placement. REST APIs hosted in Azure Function Apps provide this functionality and a Swagger definition is configured in the Application for tool action. Automated Fulfillment: Integration with Azure Logic Apps for shipment creation in Azure SQL Database Policy Support: Vector-powered QnA for returns, payment issues, and customer policies. Azure AI Search & File Search capabilities are used for this requirement. Conversation Analytics: AI-powered analysis using GPT-4o for sentiment scoring and performance evaluation. The Application captures the entire conversation between the customer and Agent and sends them to an Agent running in Azure Logic Apps to perform call quality assessment, before storing the results in Azure CosmosDB. When during the voice call the customer indicates that the conversation can be concluded, the Agent autonomously sends the conversation history to the Azure Logic App to perform quality assessment. Advanced Analytics Pipeline: Real-time Data Mirroring: Automatic synchronization from Azure Cosmos DB to Microsoft Fabric OneLake Business Intelligence: Custom Data Agents in Fabric for trend analysis and insights Executive Dashboards: Power BI reports for comprehensive performance monitoring Technical Architecture Overview The solution presents two approaches, each optimized for different enterprise scenarios: 🎯Approach 1: Direct Model Integration with GPT-Realtime Architecture Components This approach provides direct integration with Azure Voice Live API using GPT-Realtime model for immediate speech-to-speech conversational experiences without intermediate text processing. The Application connects to the Voice Live API uses a Web socket connection. The semantics of this API are similar to the one used when connecting to the GPT-Realtime API directly. The Voice Live API provides additional configurability, like the choice of a custom Voice from Azure Speech Services, options for echo cancellation, noise reduction and plugging an Avatar integration. Core Technical Stack: GPT-Realtime Model: Direct audio-to-audio processing Azure Speech Voice: High-quality TTS synthesis (en-IN-AartiIndicNeural) WebSocket Communication: Real-time bidirectional audio streaming Voice Activity Detection: Server-side VAD for natural conversation flow Client-Side Function Calling: Full control over tool execution logic Key Session Configuration The Direct Model Integration uses the session configuration below: session_config = { "input_audio_sampling_rate": 24000, "instructions": system_instructions, "turn_detection": { "type": "server_vad", "threshold": 0.5, "prefix_padding_ms": 300, "silence_duration_ms": 500, }, "tools": tools_list, "tool_choice": "auto", "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"}, "input_audio_echo_cancellation": {"type": "server_echo_cancellation"}, "voice": { "name": "en-IN-AartiIndicNeural", "type": "azure-standard", "temperature": 0.8, }, "input_audio_transcription": {"model": "whisper-1"}, } Configuration Highlights: 24kHz Audio Sampling: High-quality audio processing for natural speech Server VAD: Optimized threshold (0.5) with 300ms padding for natural conversation flow Azure Deep Noise Suppression: Advanced noise reduction for clear audio Indic Voice Support: en-IN-AartiIndicNeural for localized customer experience Whisper-1 Transcription: Accurate speech recognition for conversation logging Connecting to the Azure Voice Live API The voicelive_modelclient.py demonstrates advanced WebSocket handling for real-time audio streaming: def get_websocket_url(self, access_token: str) -> str: """Generate WebSocket URL for Voice Live API.""" azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://") return ( f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}" f"&model={model_name}" f"&agent-access-token={access_token}" ) async def connect(self): if self.is_connected(): # raise Exception("Already connected") self.log("Already connected") # Get access token access_token = self.get_azure_token() # Build WebSocket URL and headers ws_url = self.get_websocket_url(access_token) self.ws = await websockets.connect( ws_url, additional_headers={ "Authorization": f"Bearer {self.get_azure_token()}", "x-ms-client-request-id": str(uuid.uuid4()), }, ) print(f"Connected to Azure Voice Live API....") asyncio.create_task(self.receive()) await self.update_session() Function Calling Implementation The Direct Model Integration provides client-side function execution with complete control: tools_list = [ { "type": "function", "name": "perform_search_based_qna", "description": "call this function to respond to the user query on Contoso retail policies, procedures and general QnA", "parameters": { "type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"], }, }, { "type": "function", "name": "create_delivery_order", "description": "call this function to create a delivery order based on order id and destination location", "parameters": { "type": "object", "properties": { "order_id": {"type": "string"}, "destination": {"type": "string"}, }, "required": ["order_id", "destination"], }, }, { "type": "function", "name": "perform_call_log_analysis", "description": "call this function to analyze call log based on input call log conversation text", "parameters": { "type": "object", "properties": { "call_log": {"type": "string"}, }, "required": ["call_log"], }, }, { "type": "function", "name": "search_products_by_category", "description": "call this function to search for products by category", "parameters": { "type": "object", "properties": { "category": {"type": "string"}, }, "required": ["category"], }, }, { "type": "function", "name": "order_products", "description": "call this function to order products by product id and quantity", "parameters": { "type": "object", "properties": { "product_id": {"type": "string"}, "quantity": {"type": "integer"}, }, "required": ["product_id", "quantity"], }, } ] 🤖 Approach 2: Azure AI Foundry Agent Integration Architecture Components This approach leverages existing Azure AI Foundry Service Agents, providing enterprise-grade voice capabilities as a clean wrapper over pre-configured agents. It does not entail any code changes to the Agent itself to voice enable it. Core Technical Stack: Azure Fast Transcript: Advanced multi-language speech-to-text processing Azure AI Foundry Agent: Pre-configured Agent with autonomous capabilities GPT-4o-mini Model: Agent-configured model for text processing Neural Voice Synthesis: Indic language optimized TTS Semantic VAD: Azure semantic voice activity detection Session Configuration The Agent Integration approach uses advanced semantic voice activity detection: session_config = { "input_audio_sampling_rate": 24000, "turn_detection": { "type": "azure_semantic_vad", "threshold": 0.3, "prefix_padding_ms": 200, "silence_duration_ms": 200, "remove_filler_words": False, "end_of_utterance_detection": { "model": "semantic_detection_v1", "threshold": 0.01, "timeout": 2, }, }, "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"}, "input_audio_echo_cancellation": {"type": "server_echo_cancellation"}, "voice": { "name": "en-IN-AartiIndicNeural", "type": "azure-standard", "temperature": 0.8, }, "input_audio_transcription": {"model": "azure-speech", "language": "en-IN, hi-IN"}, } Key Differentiators: Semantic VAD: Intelligent voice activity detection with utterance prediction Multi-language Support: Azure Speech with en-IN and hi-IN language support End-of-Utterance Detection: AI-powered conversation turn management Filler Word Handling: Configurable processing of conversational fillers Agent Integration Code The voicelive_client.py demonstrates seamless integration with Azure AI Foundry Agents. Notice that we need to provide the Azure AI Foundry Project Name and an ID of the Agent in it. We do not need to pass the model's name here, since the Agent is already configured with one. def get_websocket_url(self, access_token: str) -> str: """Generate WebSocket URL for Voice Live API.""" azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://") return ( f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}" f"&agent-project-name={project_name}&agent-id={agent_id}" f"&agent-access-token={access_token}" ) async def connect(self): """Connects the client using a WS Connection to the Realtime API.""" if self.is_connected(): # raise Exception("Already connected") self.log("Already connected") # Get access token access_token = self.get_azure_token() # Build WebSocket URL and headers ws_url = self.get_websocket_url(access_token) self.ws = await websockets.connect( ws_url, additional_headers={ "Authorization": f"Bearer {self.get_azure_token()}", "x-ms-client-request-id": str(uuid.uuid4()), }, ) print(f"Connected to Azure Voice Live API....") asyncio.create_task(self.receive()) await self.update_session() Advanced Analytics Pipeline GPT-4o Powered Call Analysis The solution implements conversation analytics using Azure Logic Apps with GPT-4o: { "functions": [ { "name": "evaluate_call_log", "description": "Evaluate call log for Contoso Retail customer service call", "parameters": { "properties": { "call_reason": { "description": "Categorized call reason from 50+ predefined scenarios", "type": "string" }, "customer_satisfaction": { "description": "Overall satisfaction assessment", "type": "string" }, "customer_sentiment": { "description": "Emotional tone analysis", "type": "string" }, "call_rating": { "description": "Numerical rating (1-5 scale)", "type": "number" }, "call_rating_justification": { "description": "Detailed reasoning for rating", "type": "string" } } } } ] } Microsoft Fabric Integration The analytics pipeline extends into Microsoft Fabric for enterprise business intelligence: Fabric Integration Features: Real-time Data Mirroring: Cosmos DB to OneLake synchronization Custom Data Agents: Business-specific analytics agents in Fabric Copilot Integration: Natural language business intelligence queries Power BI Dashboards: Interactive reports and executive summaries Artefacts for reference The source code of the solution is available in the GitHub Repo here. An article on this topic is published on LinkedIn here A video recording of the demonstration of this App is available below: Part1 - walkthrough of the Agent configuration in Azure AI Foundry - here Part2 - demonstration of the Application that integrates with the Azure Voice Live API - here Part 3 - demonstration of the Microsoft Fabric Integration, Data Agents, Copilot in Fabric and Power BI for insights and analysis - here Conclusion Azure Voice Live API enables enterprises to build sophisticated voice-enabled AI assistants using two distinct architectural approaches. The Direct Model Integration provides ultra-low latency for real-time applications, while the Azure AI Foundry Agent Integration offers enterprise-grade governance and autonomous operation. Both approaches deliver the same comprehensive business capabilities: Natural voice interactions with advanced VAD and noise suppression Complete retail workflow automation from inquiry to fulfillment AI-powered conversation analytics with sentiment scoring Enterprise business intelligence through Microsoft Fabric integration The choice between approaches depends on your specific requirements: Choose Direct Model Integration for custom function calling and minimal latency Choose Azure AI Foundry Agent Integration for enterprise governance and existing investments544Views1like0CommentsCreating Intelligent Video Summaries and Avatar Videos with Azure AI Services
Unlock the true value of your organization’s video content! In this post, I share how we built an end-to-end AI video analytics platform using Microsoft Azure. Discover how AI can automate video analysis, generate intelligent summaries, and create engaging avatar presentations—making content more accessible, actionable, and impactful for everyone. If you’re interested in digital transformation, AI-powered automation, or modern content management, this is for you!711Views5likes1CommentAzure APIM Cost Rate Limiting with Cosmos & Flex Functions
Azure API Management (APIM) provides built-in rate limiting policies, but implementing sophisticated Dollar cost quota management for Azure OpenAI services requires a more tailored approach. This solution combines Azure Functions, Cosmos DB, and stored procedures to implement cost-based quota management with automatic renewal periods. Architecture Client → APIM (with RateLimitConfig) → Azure Function Proxy → Azure OpenAI ↓ Cosmos DB (quota tracking) Technical Implementation 1. Rate Limit Configuration in APIM The rate limiting configuration is injected into the request body by APIM using a policy fragment. Here's an example for a basic $5 quota: <set-variable name="rateLimitConfig" value="@{ var productId = context.Product.Id; var config = new JObject(); config["counterKey"] = productId; config["quota"] = 5; return config.ToString(); }" /> <include-fragment fragment-id="RateLimitConfig" /> For more advanced scenarios, you can customize token costs. Here's an example for a $10 quota with custom token pricing: <set-variable name="rateLimitConfig" value="@{ var productId = context.Product.Id; var config = new JObject(); config["counterKey"] = productId; config["startDate"] = "2025-03-02T00:00:00Z"; config["renewal_period"] = 86400; config["explicitEndDate"] = null; config["quota"] = 10; config["input_cost_per_token"] = 0.00003; config["output_cost_per_token"] = 0.00006; return config.ToString(); }" /> <include-fragment fragment-id="RateLimitConfig" /> Flexible Counter Keys The counterKey parameter is highly flexible and can be set to any unique identifier that makes sense for your rate limiting strategy: Product ID: Limit all users of a specific APIM product (e.g., "starter", "professional") User ID: Apply individual limits per user Subscription ID: Track usage at the subscription level Custom combinations: Combine identifiers for granular control (e.g., "product_starter_user_12345") Rate Limit Configuration Parameters Parameter Description Example Value Required counterKey Unique identifier for tracking quota usage "starter10" or "user_12345" Yes quota Maximum cost allowed in the renewal period 10 Yes startDate When the quota period begins. If not provided, the system uses the time when the policy is first applied "2025-03-02T00:00:00Z" No renewal_period Seconds until quota resets (86400 = daily). If not provided, no automatic reset occurs 86400 No endDate Optional end date for the quota period null or "2025-12-31T23:59:59Z" No input_cost_per_token Custom cost per input token 0.00003 No output_cost_per_token Custom cost per output token 0.00006 No Scheduling and Time Windows The time-based parameters work together to create flexible quota schedules: If the current date falls outside the range defined by startDate and endDate , requests will be rejected with an error The renewal window begins either on the specified startDate or when the policy is first applied The renewal_period determines how frequently the accumulated cost resets to zero Without a renewal_period , the quota accumulates indefinitely until the endDate is reached 2. Quota Checking and Cost Tracking The Azure Function performs two key operations: Pre-request quota check: Before processing each request, it verifies if the user has exceeded their quota Post-request cost tracking: After a successful request, it calculates the cost and updates the accumulated usage Cost Calculation For cost calculation, the system uses: Custom pricing: If input_cost_per_token and output_cost_per_token are provided in the rate limit config LiteLLM pricing: If custom pricing is not specified, the system falls back to LiteLLM's model prices for accurate cost estimation based on the model being used The function returns appropriate HTTP status codes and headers: HTTP 429 (Too Many Requests) when quota is exceeded Response headers with usage information: x-counter-key: starter5 x-accumulated-cost: 5.000915 x-quota: 5 3. Cosmos DB for State Management Cosmos DB maintains the quota state with documents that track: { "id": "starter5", "counterKey": "starter5", "accumulatedCost": 5.000915, "startDate": "2025-03-02T00:00:00.000Z", "renewalPeriod": 86400, "renewalStart": 1741132800000, "endDate": null, "quota": 5 } A stored procedure handles atomic updates to ensure accurate tracking, including: Adding costs to the accumulated total Automatically resetting costs when the renewal period is reached Updating quota values when configuration changes Benefits Fine-grained Cost Control: Track actual API usage costs rather than just request counts Flexible Quotas: Set daily, weekly, or monthly quotas with automatic renewal Transparent Usage: Response headers provide real-time quota usage information Product Differentiation: Different APIM products can have different quota levels Custom Pricing: Override default token costs for special pricing tiers Flexible Tracking: Use any identifier as the counter key for versatile quota management Time-based Scheduling: Define active periods and automatic reset windows for quota management Getting Started Deploy the Azure Function with Cosmos DB integration Configure APIM policies to include rate limit configuration Set up different product policies for various quota levels For a detailed implementation, visit our GitHub repository. Demo Video: https://www.youtube.com/watch?v=vMX86_XpSAo Tags: #AzureOpenAI #APIM #CosmosDB #RateLimiting #Serverless467Views0likes2CommentsBuilt-in Enterprise Readiness with Azure AI Agent Service
Ensure enterprise-grade security and compliance with Private Network Isolation (BYO VNet) in Azure AI Agent Service. This feature allows AI agents to operate within a private, isolated network, giving organizations full control over data and networking configurations. Learn how Private Network Isolation enhances security, scalability, and compliance for mission-critical AI workloads.3.3KViews3likes0CommentsIntroducing Azure AI Agent Service
Introducing Azure AI Agent Service at Microsoft Ignite 2024 Discover how Azure AI Agent Service is revolutionizing the development and deployment of AI agents. This service empowers developers to build, deploy, and scale high-quality AI agents tailored to business needs within hours. With features like rapid development, extensive data connections, flexible model selection, and enterprise-grade security, Azure AI Agent Service sets a new standard in AI automation84KViews10likes8CommentsSix reasons why startups and at-scale cloud native companies build their GenAI Apps with Azure
Azure has evolved as a platform of choice for many startups including Perplexity and Moveworks, as well as at-scale companies today. Here are six reasons why we see companies of all sizes building their GenAI apps on Azure OpenAI Service.3.1KViews2likes0CommentsThe Azure Multimodal AI & LLM Processing Solution Accelerator
The Azure Multimodal AI & LLM Processing Accelerator is your one-stop-shop for all backend AI+LLM processing use cases like content summarization, extraction, classification and enrichment. This single accelerator supports all types of input data (text, documents, audio, image, video etc) and combines the best of Azure AI Services and LLMs to achieve reliable, consistent and scalable automation of tasks.7.1KViews2likes0CommentsAzure Cosmos DB Conf 2024: A Deep Dive for Developers and AI Enthusiasts
The world of data is constantly evolving, and developers need powerful tools to keep pace. Enter Azure Cosmos DB, a globally distributed NoSQL database built for modern app development. This year’s Azure Cosmos DB Conf, happening virtually on April 16th, 2024, at 9AM – 12PM PT, is set to offer a wealth of insights and learning opportunities for anyone interested in Azure Cosmos DB.2.6KViews1like0Comments