Building Secure AI Chat Systems: Part 2 - Securing Your Architecture from Storage to Network
In Part 1 of this series, we tackled the critical challenge of protecting the LLM itself from malicious inputs. We implemented three essential security layers using Azure AI services: harmful content detection with Azure Content Safety, PII protection with Azure Text Analytics, and prompt injection prevention with Prompt Shields. These guardrails ensure that your AI model doesn't process harmful requests or leak sensitive information through cleverly crafted prompts.

But even with a perfectly secured LLM, your entire AI chat system can still be compromised through architectural vulnerabilities. The WotNot incident wasn't about prompt injection: it was 346,000 files sitting in an unsecured cloud storage bucket. Likewise, the OmniGPT breach exposed 34 million lines of conversation logs through backend database security failures. The global average cost of a data breach is now $4.44 million, and it takes organizations an average of 241 days to identify and contain an active breach. That's eight months where attackers have free rein in your systems. The financial cost is one thing, but the reputational damage and loss of customer trust are irreversible.

This article focuses on the architectural security concerns I mentioned at the end of Part 1: the infrastructure that stores your chat histories, the networks that connect your services, and the databases that power your vector searches. We'll examine real-world breaches that happened in 2024 and 2025, understand exactly what went wrong, and implement Azure solutions that would have prevented them. By the end of this article, you'll have a production-ready, secure architecture for your AI chat system that addresses the most common (and most devastating) security failures we're seeing in the wild. Let's start with the most fundamental question: where is your data, and who can access it?

1. Preventing Exposed Storage with Network Isolation

The Problem: When Your Database Is One Google Search Away

Let me paint you a picture of what happened with two incidents in 2024-2025. WotNot AI Chatbot left 346,000 files completely exposed in an unsecured cloud storage bucket: passports, medical records, sensitive customer data, all accessible to anyone on the internet without even a password. Security researchers who discovered it tried for over two months to get the company to fix it. In May 2025, Canva Creators' data was exposed through an unsecured Chroma vector database operated by an AI chatbot company. The database contained 341 collections of documents, including survey responses from 571 Canva Creators with email addresses, countries of residence, and comprehensive feedback. This marked the first reported data leak involving a vector database.

The common thread? Public internet accessibility. These databases and storage accounts were accessible from anywhere in the world. No VPN required. No private network. Just a URL and you were in. Think about your current architecture. If someone found your Cosmos DB connection string or your Azure Storage account name, what's stopping them from accessing it? If your answer is "just the access key" or "firewall rules," you're one leaked credential away from being in the headlines.

What to do: Azure Private Link + Network Isolation

The most effective way to prevent public exposure is simple: remove public internet access entirely. This is where Azure Private Link becomes your architectural foundation.
With Azure Private Link, you can create a private endpoint inside your Azure Virtual Network (VNet) that becomes the exclusive gateway to your Azure services. Your Cosmos DB accounts, Storage Accounts, Azure OpenAI Service, and other resources are completely removed from the public internet; they only respond to requests originating from within your VNet. Even if someone obtains your connection strings or access keys, they cannot use them without first gaining access to your private network.

Implementation Overview: To implement Private Link for your AI chat system, you'll need to:
1. Create an Azure Virtual Network (VNet) to host your private endpoints and application resources
2. Configure private endpoints for each service (Cosmos DB, Storage, Azure OpenAI, Key Vault)
3. Set up private DNS zones to automatically resolve service URLs to private IPs within your VNet
4. Disable public network access on all your Azure resources
5. Deploy your application inside the VNet using Azure App Service with VNet integration, Azure Container Apps, or Azure Kubernetes Service
6. Verify isolation by attempting to access resources from outside the VNet (the attempt should fail); a quick check is sketched below

You can configure this through the Azure Portal, Azure CLI, ARM templates, or infrastructure-as-code tools like Terraform. The Azure documentation provides step-by-step guides for each service type.

Figure 1: Private Link Architecture for AI Chat Systems. Private endpoints ensure all data access occurs within the Azure Virtual Network, blocking public internet access to databases, storage, and AI services.
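As a quick sanity check for step 6, you can confirm that service hostnames resolve to private addresses from inside the VNet and not from outside it. This is a minimal sketch using only the Python standard library; the account hostname is a hypothetical placeholder.

```python
import ipaddress
import socket

def resolves_to_private(hostname: str) -> bool:
    """Return True if the hostname resolves to a private (RFC 1918) address."""
    ip = socket.gethostbyname(hostname)
    is_private = ipaddress.ip_address(ip).is_private
    print(f"{hostname} -> {ip} ({'private' if is_private else 'PUBLIC'})")
    return is_private

# Inside the VNet, the private DNS zone should return a private IP;
# outside the VNet, resolution should fail or connections should be refused.
resolves_to_private("contoso-chat.documents.azure.com")  # hypothetical account name
```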
2. Protecting Conversation Data with Encryption at Rest

The Problem: When Backend Databases Become Treasure Troves

Network isolation solves the problem of external access, but what happens when attackers breach your perimeter through other means? What if a malicious insider gains access? What if there's a misconfiguration in your cloud environment? The data sitting in your databases becomes the ultimate prize.

In February 2025, OmniGPT suffered a catastrophic breach in which attackers accessed the backend database and extracted personal data from 30,000 users, including emails, phone numbers, API keys, and over 34 million lines of conversation logs. The exposed data included links to uploaded files containing sensitive credentials, billing details, and API keys. These weren't prompt injection attacks. These weren't DDoS incidents. These were failures to encrypt sensitive data at rest. When attackers accessed the storage layer, they found everything in readable format: a goldmine of personal information, conversations, and credentials.

Think about the conversations your AI chat system stores. Customer support queries that might include account numbers. Healthcare chatbots discussing symptoms and medications. HR assistants processing employee grievances. If someone gained unauthorized (or even authorized) access to your database today, would they be reading plaintext conversations?

What to do: Azure Cosmos DB with Customer-Managed Keys

The fundamental defense against data exposure is encryption at rest: ensuring that data stored on disk is encrypted and unreadable without the proper decryption keys. Even if attackers gain physical or logical access to your database files, the data remains protected as long as they don't have access to the encryption keys. But who controls those keys? With platform-managed encryption (the default in most cloud services), the cloud provider manages the encryption keys. While this protects against many threats, it doesn't protect against insider threats at the provider level, compromised provider credentials, or certain compliance scenarios where you must prove complete key control.

Customer-Managed Keys (CMK) solve this by giving you complete ownership and control of the encryption keys. You generate, store, and manage the keys in your own key vault. The cloud service can only decrypt your data by requesting access to your keys, access that you control and can revoke at any time. If your keys are deleted or access is revoked, even the cloud provider cannot decrypt your data.

Azure makes this straightforward with Azure Key Vault integrated with Azure Cosmos DB. The architecture uses "envelope encryption": your data is encrypted with a Data Encryption Key (DEK), and that DEK is itself encrypted with your Key Encryption Key (KEK) stored in Key Vault. This provides layered security where, even if the database is compromised, the data remains encrypted with keys only you control.

While we covered PII detection and redaction using Azure Text Analytics in Part 1, which prevents sensitive data from being stored in the first place, encryption at rest with Customer-Managed Keys provides an additional, powerful layer of protection. In fact, many compliance frameworks such as HIPAA, PCI-DSS, and certain government regulations explicitly require customer-controlled encryption for data at rest, making CMK not just a best practice but often a mandatory requirement for regulated industries.

Implementation Overview: To implement Customer-Managed Keys for your chat history and vector storage:
1. Create an Azure Key Vault with purge protection and soft delete enabled (required for CMK)
2. Generate or import your encryption key in Key Vault (2048-bit RSA or 256-bit AES keys)
3. Grant Cosmos DB access to Key Vault using a system-assigned or user-assigned managed identity
4. Enable CMK on Cosmos DB by specifying your Key Vault key URI during account creation or update
5. Configure the same for Azure Storage if you're storing embeddings or documents in Blob Storage
6. Set up key rotation policies to automatically rotate keys on a schedule (recommended: every 90 days)
7. Monitor key usage through Azure Monitor and set up alerts for unauthorized access attempts

Figure 2: Envelope Encryption with Customer-Managed Keys. User conversations are encrypted using a two-layer approach: (1) the AI Chat App sends plaintext messages to Cosmos DB, (2) Cosmos DB authenticates to Key Vault using Managed Identity to retrieve the Key Encryption Key (KEK), (3) data is encrypted with a Data Encryption Key (DEK), (4) the DEK itself is encrypted with the KEK before storage. This ensures data remains encrypted even if the database is compromised, as decryption requires access to keys stored in your Key Vault.

For AI chat systems in regulated industries (healthcare, finance, government), Customer-Managed Keys should be your baseline. The operational overhead is minimal with proper automation, and the compliance benefits are substantial. The entire process can be automated using Azure CLI, PowerShell, or infrastructure-as-code tools. Note that for existing Cosmos DB accounts, enabling CMK requires creating a new account and migrating data.
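To illustrate step 2, here is a minimal sketch that creates a 2048-bit RSA key with the azure-keyvault-keys SDK. The vault name is a hypothetical placeholder, and the vault itself must already exist with purge protection and soft delete enabled; the resulting key URI is what you supply to Cosmos DB when enabling CMK.

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient

credential = DefaultAzureCredential()
key_client = KeyClient(
    vault_url="https://contoso-chat-kv.vault.azure.net",  # hypothetical vault
    credential=credential,
)

# Create the 2048-bit RSA Key Encryption Key (KEK) used for envelope encryption
kek = key_client.create_rsa_key("cosmos-cmk", size=2048)
print(kek.id)  # pass this key URI to Cosmos DB when enabling CMK
```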
3. Securing Vector Databases and Preventing Data Leakage

The Problem: Vector Embeddings Are Data Too

Vector databases are the backbone of modern RAG (Retrieval-Augmented Generation) systems. They store embeddings (mathematical representations of your documents, conversations, and knowledge base) that allow your AI to retrieve relevant context for every user query. But here's what most developers don't realize: those vectors aren't just abstract numbers. They contain your actual data.

A critical oversight in AI chat architectures is treating vector databases, or in our case Cosmos DB collections storing embeddings, as less sensitive than traditional data stores. Whether you're using a dedicated vector database or storing embeddings in Cosmos DB alongside your chat history, these mathematical representations need the same rigorous security controls as the original text.

In documented cases, shared vector databases inadvertently mixed data between two corporate clients. One client's proprietary information began surfacing in response to the other client's queries, creating a serious confidentiality breach in what was supposed to be a multi-tenant system. Even more concerning are embedding inversion attacks, where adversaries exploit weaknesses to reconstruct original source data from its vector representation, effectively reverse-engineering your documents from the mathematical embeddings.

Think about what's in your vector storage right now. Customer support conversations. Internal company documents. Product specifications. Medical records. Legal documents. If you're running a multi-tenant system, are you absolutely certain that Company A can't retrieve Company B's data? Can you guarantee that embeddings can't be reverse-engineered to expose the original text?

What to do: Azure Cosmos DB for MongoDB with Logical Partitioning and RBAC

The security of vector databases requires a multi-layered approach that addresses both storage isolation and access control. Azure Cosmos DB for MongoDB provides native support for vector search while offering enterprise-grade security features specifically designed for multi-tenant architectures. Logical partitioning creates strict data boundaries within your database by organizing data into isolated partitions based on a partition key (such as tenant_id or user_id). When combined with Role-Based Access Control (RBAC), you create a security model where users and applications can only access their designated partitions, even if they somehow gain broader database access.

Implementation Overview: To implement secure multi-tenant vector storage with Cosmos DB (a query sketch follows this list):
1. Enable MongoDB RBAC on your Cosmos DB account using the EnableMongoRoleBasedAccessControl capability
2. Design your partition key strategy based on tenant_id, user_id, or organization_id for maximum isolation
3. Create collections with partition keys that enforce tenant boundaries at the storage level
4. Define custom RBAC roles that grant access only to specific databases and partition key ranges
5. Create user accounts per tenant or service principal with assigned roles limiting their scope
6. Implement partition-aware queries in your application that always include the partition key filter
7. Enable diagnostic logging to track all vector retrieval operations with user identity
8. Configure cross-region replication for high availability while maintaining partition isolation
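Here is a minimal sketch of a partition-aware vector query with pymongo (step 6), assuming the Cosmos DB for MongoDB vCore vector search syntax; the connection string, collection names, and field names are placeholders. The post-search tenant filter is illustrative only: the hard boundary should still come from RBAC roles scoped to each tenant's partition.

```python
from pymongo import MongoClient

client = MongoClient("<cosmos-mongo-connection-string>")  # placeholder
embeddings = client["chat"]["embeddings"]

def tenant_vector_search(tenant_id: str, query_vector: list, k: int = 5):
    """Vector search that never returns results outside the caller's partition."""
    pipeline = [
        {"$search": {"cosmosSearch": {"vector": query_vector,
                                      "path": "embedding",
                                      "k": k}}},
        # Partition-aware filter: drop anything outside the tenant's partition
        {"$match": {"tenant_id": tenant_id}},
    ]
    return list(embeddings.aggregate(pipeline))
```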
Figure 3: Multi-Tenant Data Isolation with Partition Keys and RBAC. Azure Cosmos DB enforces tenant isolation through logical partitioning and Role-Based Access Control (RBAC). Each tenant's data is stored in separate partitions (Partition A, B, C) based on the partition key (tenant_id). RBAC acts as a security gateway, validating every query to ensure users can only access their designated partition. Attempts to access other tenants' partitions are blocked at the RBAC layer, preventing cross-tenant data leakage in multi-tenant AI chat systems.

Azure provides comprehensive documentation and CLI tools for configuring RBAC roles and partition strategies. The key is to design your partition scheme before loading data, as changing partition keys requires data migration.

Beyond partitioning and RBAC, implement these AI-specific security measures:
- Validate embedding sources: authenticate and continuously audit external data sources before vectorizing to prevent poisoned embeddings
- Implement similarity search thresholds: set minimum similarity scores to prevent irrelevant cross-context retrieval
- Use metadata filtering: add security labels (classification levels, access groups) to vector metadata and enforce filtering
- Monitor retrieval patterns: alert on unusual patterns, such as one tenant making queries that correlate with another tenant's data
- Separate vector databases per sensitivity level: keep highly confidential vectors (PII, PHI) in dedicated databases with stricter controls
- Hash document identifiers: use hashed references instead of plaintext IDs in vector metadata to prevent enumeration attacks

For production AI chat systems handling multiple customers or sensitive data, Cosmos DB with partition-based RBAC should be your baseline. The combination of storage-level isolation and access control provides defense in depth that application-layer filtering alone cannot match.

Bonus: Secure Logging and Monitoring for AI Chat Systems

During development, we habitually log everything: full request payloads, user inputs, model responses, stack traces. It's essential for debugging. But when your AI chat system goes to production and starts handling real user conversations, those same logging practices become a liability. Think about what flows through your AI chat system: customer support conversations containing account numbers, healthcare queries discussing medical conditions, HR chatbots processing employee complaints, financial assistants handling transaction details. If you're logging full conversations for debugging, you're creating a secondary repository of sensitive data that's often less protected than your primary database.

The average breach takes 241 days to identify and contain. During that time, attackers often exfiltrate not just production databases but also log files and monitoring data, places where developers never expected sensitive information to end up. The question becomes: how do you maintain observability and debuggability without creating a security nightmare?

The Solution: Structured Logging with PII Redaction and Azure Monitor

The key is to log metadata, not content. You need enough information to trace issues and understand system behavior without storing the actual sensitive conversations. Azure Monitor with Application Insights provides enterprise-grade logging infrastructure with built-in features for sanitizing sensitive data. Combined with proper application-level controls, you can maintain full observability while protecting user privacy.
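As a minimal illustration of metadata-only logging, the sketch below hashes identifiers and records only counts, timings, and names. The field names are hypothetical, and in production the record would flow to Application Insights rather than a local logger.

```python
import hashlib
import json
import logging
import time

logger = logging.getLogger("chat.telemetry")

def log_chat_turn(user_id: str, session_id: str, model: str,
                  tokens_in: int, tokens_out: int, latency_ms: float) -> None:
    """Log what you need to trace a request -- never the conversation content."""
    record = {
        "ts": time.time(),
        "user": hashlib.sha256(user_id.encode()).hexdigest()[:16],      # anonymized
        "session": hashlib.sha256(session_id.encode()).hexdigest()[:16],
        "model": model,
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "latency_ms": latency_ms,
    }
    logger.info(json.dumps(record))
```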
What to Log in Production AI Chat Systems:

| DO Log | DON'T Log |
| --- | --- |
| Request timestamps and duration | Full user messages or prompts |
| User IDs (hashed or anonymized) | Complete model responses |
| Session IDs (hashed) | Raw embeddings or vectors |
| Model names and versions used | Personally identifiable information (PII) |
| Token counts (input/output) | Retrieved document content |
| Embedding dimensions and similarity scores | Database connection strings or API keys |
| Retrieved document IDs (not content) | Complete stack traces that might contain data |
| Error codes and exception types | |
| Performance metrics (latency, throughput) | |
| RBAC decisions (access granted/denied) | |
| Partition keys accessed | |
| Rate limiting triggers | |

Final Remarks: Building Compliant, Secure AI Systems

Throughout this two-part series, we've addressed the complete security spectrum for AI chat systems, from protecting the LLM itself to securing the underlying infrastructure. But there's a broader context that makes all of this critical: compliance and regulatory requirements.

AI chat systems operate within an increasingly complex regulatory landscape. The EU AI Act, which entered into force on August 1, 2024, became the first comprehensive AI regulation from a major regulator, assigning applications to risk categories, with high-risk systems subject to specific legal requirements. The NIS2 Directive further requires that AI model endpoints, APIs, and data pipelines be protected to prevent breaches and ensure secure deployment.

Beyond AI-specific regulations, chat systems must comply with established data protection frameworks depending on their use case. GDPR mandates data minimization, user rights to erasure and data portability, 72-hour breach notification, and EU data residency for systems serving European users. Healthcare chatbots must meet HIPAA requirements, including encryption, access controls, 6-year audit log retention, and Business Associate Agreements. Systems processing payment information fall under PCI-DSS, requiring cardholder data isolation, encryption, role-based access controls, and regular security testing. B2B SaaS platforms typically need SOC 2 Type II compliance, demonstrating security controls over data availability, confidentiality, continuous monitoring, and incident response procedures.

Azure's architecture directly supports these compliance requirements through its built-in capabilities. Private Link enables data residency by keeping traffic within specified Azure regions while supporting network isolation requirements. Customer-Managed Keys provide the encryption controls and key ownership mandated by HIPAA and PCI-DSS. Cosmos DB's partition-based RBAC creates the access controls and audit trails required across all frameworks. Azure Monitor and diagnostic logging satisfy audit and monitoring requirements, while Azure Policy and Microsoft Purview automate compliance enforcement and reporting. The platform's certifications and compliance offerings (including HIPAA, PCI-DSS, SOC 2, and GDPR attestations) provide the documentation and third-party validation that auditors require, significantly reducing the operational burden of maintaining compliance.
Further Resources:
- Azure Private Link Documentation
- Azure Cosmos DB Customer-Managed Keys
- Azure Key Vault Overview
- Azure Cosmos DB Role-Based Access Control
- Azure Monitor and Application Insights
- Azure Policy for Compliance
- Microsoft Purview Data Governance
- Azure Security Benchmark

Stay secure, stay compliant, and build responsibly.

Building Enterprise Voice-Enabled AI Agents with Azure Voice Live API
The sample application covered in this post demonstrates two approaches in an end-to-end solution that includes product search, order management, automated shipment creation, intelligent analytics, and comprehensive business intelligence through Microsoft Fabric integration.

Use Case Scenario: Retail Fashion Agent

Core Business Capabilities:
- Product Discovery and Ordering: natural language product search across fashion categories (winter wear, active wear, etc.) and order placement. REST APIs hosted in Azure Function Apps provide this functionality, and a Swagger definition is configured in the application for tool action.
- Automated Fulfillment: integration with Azure Logic Apps for shipment creation in Azure SQL Database.
- Policy Support: vector-powered QnA for returns, payment issues, and customer policies. Azure AI Search and File Search capabilities are used for this requirement.
- Conversation Analytics: AI-powered analysis using GPT-4o for sentiment scoring and performance evaluation. The application captures the entire conversation between the customer and the agent and sends it to an agent running in Azure Logic Apps for call quality assessment before storing the results in Azure Cosmos DB. When, during the voice call, the customer indicates that the conversation can be concluded, the agent autonomously sends the conversation history to the Azure Logic App to perform the quality assessment.

Advanced Analytics Pipeline:
- Real-time Data Mirroring: automatic synchronization from Azure Cosmos DB to Microsoft Fabric OneLake
- Business Intelligence: custom Data Agents in Fabric for trend analysis and insights
- Executive Dashboards: Power BI reports for comprehensive performance monitoring

Technical Architecture Overview

The solution presents two approaches, each optimized for different enterprise scenarios.

🎯 Approach 1: Direct Model Integration with GPT-Realtime

Architecture Components

This approach provides direct integration with Azure Voice Live API using the GPT-Realtime model for immediate speech-to-speech conversational experiences without intermediate text processing. The application connects to the Voice Live API over a WebSocket connection. The semantics of this API are similar to those used when connecting to the GPT-Realtime API directly. The Voice Live API provides additional configurability, such as the choice of a custom voice from Azure Speech Services, options for echo cancellation and noise reduction, and plugging in an Avatar integration.
Core Technical Stack:
- GPT-Realtime Model: direct audio-to-audio processing
- Azure Speech Voice: high-quality TTS synthesis (en-IN-AartiIndicNeural)
- WebSocket Communication: real-time bidirectional audio streaming
- Voice Activity Detection: server-side VAD for natural conversation flow
- Client-Side Function Calling: full control over tool execution logic

Key Session Configuration

The Direct Model Integration uses the session configuration below:

```python
session_config = {
    "input_audio_sampling_rate": 24000,
    "instructions": system_instructions,
    "turn_detection": {
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500,
    },
    "tools": tools_list,
    "tool_choice": "auto",
    "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
    "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
    "voice": {
        "name": "en-IN-AartiIndicNeural",
        "type": "azure-standard",
        "temperature": 0.8,
    },
    "input_audio_transcription": {"model": "whisper-1"},
}
```

Configuration Highlights:
- 24 kHz audio sampling: high-quality audio processing for natural speech
- Server VAD: optimized threshold (0.5) with 300 ms padding for natural conversation flow
- Azure Deep Noise Suppression: advanced noise reduction for clear audio
- Indic voice support: en-IN-AartiIndicNeural for a localized customer experience
- Whisper-1 transcription: accurate speech recognition for conversation logging

Connecting to the Azure Voice Live API

The voicelive_modelclient.py demonstrates advanced WebSocket handling for real-time audio streaming:

```python
def get_websocket_url(self, access_token: str) -> str:
    """Generate WebSocket URL for Voice Live API."""
    azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://")
    return (
        f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}"
        f"&model={model_name}"
        f"&agent-access-token={access_token}"
    )

async def connect(self):
    if self.is_connected():
        # raise Exception("Already connected")
        self.log("Already connected")

    # Get access token
    access_token = self.get_azure_token()

    # Build WebSocket URL and headers
    ws_url = self.get_websocket_url(access_token)
    self.ws = await websockets.connect(
        ws_url,
        additional_headers={
            "Authorization": f"Bearer {self.get_azure_token()}",
            "x-ms-client-request-id": str(uuid.uuid4()),
        },
    )
    print("Connected to Azure Voice Live API....")
    asyncio.create_task(self.receive())
    await self.update_session()
```

Function Calling Implementation

The Direct Model Integration provides client-side function execution with complete control:

```python
tools_list = [
    {
        "type": "function",
        "name": "perform_search_based_qna",
        "description": "call this function to respond to the user query on Contoso retail policies, procedures and general QnA",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "type": "function",
        "name": "create_delivery_order",
        "description": "call this function to create a delivery order based on order id and destination location",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "destination": {"type": "string"},
            },
            "required": ["order_id", "destination"],
        },
    },
    {
        "type": "function",
        "name": "perform_call_log_analysis",
        "description": "call this function to analyze call log based on input call log conversation text",
        "parameters": {
            "type": "object",
            "properties": {
                "call_log": {"type": "string"},
            },
            "required": ["call_log"],
        },
    },
    {
        "type": "function",
        "name": "search_products_by_category",
        "description": "call this function to search for products by category",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {"type": "string"},
            },
            "required": ["category"],
        },
    },
    {
        "type": "function",
        "name": "order_products",
        "description": "call this function to order products by product id and quantity",
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string"},
                "quantity": {"type": "integer"},
            },
            "required": ["product_id", "quantity"],
        },
    },
]
```
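With client-side function calling, the application itself executes each tool the model requests and returns the output over the same socket. Below is one plausible dispatch loop, assuming Realtime-style event shapes (response.function_call_arguments.done, conversation.item.create, response.create); the stub handlers and the send callable are illustrative, not code from the repository.

```python
import json
from typing import Any, Awaitable, Callable

# Stub implementations standing in for the real tool functions above
TOOL_HANDLERS: dict = {
    "search_products_by_category": lambda a: {"category": a["category"], "products": []},
    "order_products": lambda a: {"status": "ordered", **a},
}

async def handle_event(event: dict, send: Callable[[dict], Awaitable[None]]) -> None:
    """Execute a requested tool, return its output, and let the model continue."""
    if event.get("type") == "response.function_call_arguments.done":
        args = json.loads(event["arguments"])
        result = TOOL_HANDLERS[event["name"]](args)
        await send({
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": json.dumps(result),
            },
        })
        await send({"type": "response.create"})  # ask the model to resume speaking
```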
category", "parameters": { "type": "object", "properties": { "category": {"type": "string"}, }, "required": ["category"], }, }, { "type": "function", "name": "order_products", "description": "call this function to order products by product id and quantity", "parameters": { "type": "object", "properties": { "product_id": {"type": "string"}, "quantity": {"type": "integer"}, }, "required": ["product_id", "quantity"], }, } ] 🤖 Approach 2: Azure AI Foundry Agent Integration Architecture Components This approach leverages existing Azure AI Foundry Service Agents, providing enterprise-grade voice capabilities as a clean wrapper over pre-configured agents. It does not entail any code changes to the Agent itself to voice enable it. Core Technical Stack: Azure Fast Transcript: Advanced multi-language speech-to-text processing Azure AI Foundry Agent: Pre-configured Agent with autonomous capabilities GPT-4o-mini Model: Agent-configured model for text processing Neural Voice Synthesis: Indic language optimized TTS Semantic VAD: Azure semantic voice activity detection Session Configuration The Agent Integration approach uses advanced semantic voice activity detection: session_config = { "input_audio_sampling_rate": 24000, "turn_detection": { "type": "azure_semantic_vad", "threshold": 0.3, "prefix_padding_ms": 200, "silence_duration_ms": 200, "remove_filler_words": False, "end_of_utterance_detection": { "model": "semantic_detection_v1", "threshold": 0.01, "timeout": 2, }, }, "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"}, "input_audio_echo_cancellation": {"type": "server_echo_cancellation"}, "voice": { "name": "en-IN-AartiIndicNeural", "type": "azure-standard", "temperature": 0.8, }, "input_audio_transcription": {"model": "azure-speech", "language": "en-IN, hi-IN"}, } Key Differentiators: Semantic VAD: Intelligent voice activity detection with utterance prediction Multi-language Support: Azure Speech with en-IN and hi-IN language support End-of-Utterance Detection: AI-powered conversation turn management Filler Word Handling: Configurable processing of conversational fillers Agent Integration Code The voicelive_client.py demonstrates seamless integration with Azure AI Foundry Agents. Notice that we need to provide the Azure AI Foundry Project Name and an ID of the Agent in it. We do not need to pass the model's name here, since the Agent is already configured with one. 
```python
def get_websocket_url(self, access_token: str) -> str:
    """Generate WebSocket URL for Voice Live API."""
    azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://")
    return (
        f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}"
        f"&agent-project-name={project_name}&agent-id={agent_id}"
        f"&agent-access-token={access_token}"
    )

async def connect(self):
    """Connects the client using a WS Connection to the Realtime API."""
    if self.is_connected():
        # raise Exception("Already connected")
        self.log("Already connected")

    # Get access token
    access_token = self.get_azure_token()

    # Build WebSocket URL and headers
    ws_url = self.get_websocket_url(access_token)
    self.ws = await websockets.connect(
        ws_url,
        additional_headers={
            "Authorization": f"Bearer {self.get_azure_token()}",
            "x-ms-client-request-id": str(uuid.uuid4()),
        },
    )
    print("Connected to Azure Voice Live API....")
    asyncio.create_task(self.receive())
    await self.update_session()
```

Advanced Analytics Pipeline

GPT-4o Powered Call Analysis

The solution implements conversation analytics using Azure Logic Apps with GPT-4o:

```json
{
  "functions": [
    {
      "name": "evaluate_call_log",
      "description": "Evaluate call log for Contoso Retail customer service call",
      "parameters": {
        "properties": {
          "call_reason": {
            "description": "Categorized call reason from 50+ predefined scenarios",
            "type": "string"
          },
          "customer_satisfaction": {
            "description": "Overall satisfaction assessment",
            "type": "string"
          },
          "customer_sentiment": {
            "description": "Emotional tone analysis",
            "type": "string"
          },
          "call_rating": {
            "description": "Numerical rating (1-5 scale)",
            "type": "number"
          },
          "call_rating_justification": {
            "description": "Detailed reasoning for rating",
            "type": "string"
          }
        }
      }
    }
  ]
}
```

Microsoft Fabric Integration

The analytics pipeline extends into Microsoft Fabric for enterprise business intelligence. Fabric integration features:
- Real-time Data Mirroring: Cosmos DB to OneLake synchronization
- Custom Data Agents: business-specific analytics agents in Fabric
- Copilot Integration: natural language business intelligence queries
- Power BI Dashboards: interactive reports and executive summaries

Artefacts for reference:
- The source code of the solution is available in the GitHub repo here.
- An article on this topic is published on LinkedIn here.
- A video recording of the demonstration of this app is available below:
  - Part 1: walkthrough of the agent configuration in Azure AI Foundry - here
  - Part 2: demonstration of the application that integrates with the Azure Voice Live API - here
  - Part 3: demonstration of the Microsoft Fabric integration, Data Agents, Copilot in Fabric, and Power BI for insights and analysis - here

Conclusion

Azure Voice Live API enables enterprises to build sophisticated voice-enabled AI assistants using two distinct architectural approaches. The Direct Model Integration provides ultra-low latency for real-time applications, while the Azure AI Foundry Agent Integration offers enterprise-grade governance and autonomous operation.
Both approaches deliver the same comprehensive business capabilities:
- Natural voice interactions with advanced VAD and noise suppression
- Complete retail workflow automation from inquiry to fulfillment
- AI-powered conversation analytics with sentiment scoring
- Enterprise business intelligence through Microsoft Fabric integration

The choice between approaches depends on your specific requirements:
- Choose Direct Model Integration for custom function calling and minimal latency
- Choose Azure AI Foundry Agent Integration for enterprise governance and existing investments

Creating Intelligent Video Summaries and Avatar Videos with Azure AI Services
Unlock the true value of your organization's video content! In this post, I share how we built an end-to-end AI video analytics platform using Microsoft Azure. Discover how AI can automate video analysis, generate intelligent summaries, and create engaging avatar presentations, making content more accessible, actionable, and impactful for everyone. If you're interested in digital transformation, AI-powered automation, or modern content management, this is for you!

Building an AI-Powered ESG Consultant Using Azure AI Services: A Case Study
In today's corporate landscape, Environmental, Social, and Governance (ESG) compliance has become increasingly important to stakeholders. To address the challenges of analyzing vast amounts of ESG data efficiently, a comprehensive AI-powered solution called ESGai has been developed. This blog explores how Azure AI services were leveraged to create a sophisticated ESG consultant for publicly listed companies.

https://youtu.be/5-oBdge6Q78?si=Vb9aHx79xk3VGYAh

The Challenge: Making Sense of Complex ESG Data

Organizations face significant challenges when analyzing ESG compliance data. Manual analysis is time-consuming, prone to errors, and difficult to scale. ESGai was designed to address these pain points by creating an AI-powered virtual consultant that provides detailed insights based on publicly available ESG data.

Solution Architecture: The Three-Agent System

ESGai implements a sophisticated three-agent architecture, all powered by Azure's AI capabilities (a sketch of the flow follows this list):
1. Manager Agent: breaks down complex user queries into manageable sub-questions containing specific keywords that facilitate vector search retrieval. The system prompt includes generalized document headers from the vector database for context.
2. Worker Agent: processes the sub-questions generated by the Manager, connects to the vector database to retrieve relevant text chunks, and answers the sub-questions. Results are stored in Cosmos DB for later use.
3. Director Agent: consolidates the answers from the Worker agent into a comprehensive final response tailored specifically to the user's original query.

It's important to note that while conceptually there are three agents, the Worker is actually a single agent that gets called multiple times, once for each sub-question generated by the Manager.
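A minimal sketch of that flow, with every dependency injected as a callable; all names here are illustrative rather than taken from the ESGai codebase:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EsgPipeline:
    decompose: Callable[[str], List[str]]         # Manager: query -> sub-questions
    retrieve: Callable[[str], List[str]]          # vector search -> relevant chunks
    answer: Callable[[str, List[str]], str]       # Worker: sub-question + chunks -> answer
    persist: Callable[[str, str], None]           # Cosmos DB write of intermediate results
    consolidate: Callable[[str, List[str]], str]  # Director: query + answers -> final reply

    def run(self, user_query: str) -> str:
        sub_questions = self.decompose(user_query)
        answers: List[str] = []
        # The single Worker agent is invoked once per sub-question
        for q in sub_questions:
            a = self.answer(q, self.retrieve(q))
            self.persist(q, a)
            answers.append(a)
        return self.consolidate(user_query, answers)
```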
Current Implementation State

The current MVP implementation has several limitations that are planned for expansion:
- Limited company coverage: the vector database currently stores data for only 2 companies, with 3 documents per company (Sustainability Report, XBRL, and BRSR).
- Single model deployment: only one GPT-4o model is currently deployed to handle all agent functions.
- Basic storage structure: the Blob container has a simple structure with a single directory. While Azure Blob Storage doesn't natively support hierarchical folders, the team plans to implement virtual folders in the future.
- Free tier limitations: due to funding constraints, the AI Search service is using the free tier, which limits vector data storage to 50 MB.
- Simplified vector database: the current index stores all 6 files (3 documents × 2 companies) in a single vector database without filtering capabilities or schema definition.

Azure Services Powering ESGai

The implementation of ESGai leverages multiple Azure services for a robust and scalable architecture:
- Azure AI Services: provides pre-built APIs, SDKs, and services that incorporate AI capabilities without requiring extensive machine-learning expertise, including access to 62 pre-trained models for chat completions through the AI Foundry portal.
- Azure OpenAI: hosts the GPT-4o model for generating responses and the Ada embedding model for vectorization. The service combines OpenAI's advanced language models with Azure's security and enterprise features.
- Azure AI Foundry: serves as an integrated platform for developing, deploying, and governing generative AI applications. It offers a centralized management centre that consolidates subscription information, connected resources, access privileges, and usage quotas.
- Azure AI Search (formerly Cognitive Search): provides both full-text and vector search capabilities using the OpenAI ada-002 embedding model for vectorization. It's configured with hybrid search algorithms (BM25 RRF) for optimal chunk ranking.
- Azure Storage Services: utilizes Blob Storage for storing PDFs, Business Responsibility Sustainability Reports (BRSRs), and other essential documents. It integrates seamlessly with AI Search using indexers to track database changes.
- Cosmos DB: employs MongoDB APIs within Cosmos DB as a NoSQL database for storing chat history between agents and users.
- Azure App Services: hosts the web application using a B3-tier plan optimized for cost efficiency, with GitHub Actions integrated for continuous deployment.

Project Evolution: From Concept to Deployment

The development of ESGai followed a structured approach through several phases.

Phase 1: Data Cleaning
- Extracted specific KPIs from XML/XBRL datasets and BRSR reports containing ESG data for 1,000 listed companies
- Cleaned and standardized data to ensure consistency and accuracy

Phase 2: RAG Framework Development
- Implemented Retrieval-Augmented Generation (RAG) to enhance responses by dynamically fetching relevant information
- Created a workflow that includes query processing, data retrieval, and response generation

Phase 3: Initial Deployment
- Deployed models locally using Docker and n8n automation tools for testing
- Identified the need for more scalable web services

Phase 4: Transition to Azure Services
- Migrated automation workflows from n8n to Azure AI Foundry services
- Leveraged Azure's comprehensive suite of AI services, storage solutions, and app hosting capabilities

Technical Implementation Details

Model Configurations: The GPT model is configured with:
- Model version: 2024-11-20
- Temperature: 0.7
- Max response tokens: 800
- Past messages: 10
- Top-p: 0.95
- Frequency/presence penalties: 0

The embedding model uses OpenAI text-embedding-ada-002 with 1536 dimensions and hybrid semantic search (BM25 RRF) algorithms.
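Expressed as a chat-completion call with the openai Python SDK, that configuration would look roughly like the sketch below; the endpoint, API key, deployment name, and message history are placeholders, not the project's actual code.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<api-key>",                                        # placeholder
    api_version="2024-10-21",
)

history = []  # prior turns; only the last 10 are sent ("Past messages: 10")
question = "Summarize this company's Scope 2 emissions disclosures."

response = client.chat.completions.create(
    model="gpt-4o",                 # deployment of the 2024-11-20 model version
    messages=history[-10:] + [{"role": "user", "content": question}],
    temperature=0.7,
    max_tokens=800,                 # "Max response tokens: 800"
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
)
print(response.choices[0].message.content)
```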
Cost Analysis and Efficiency

A detailed cost breakdown per user query reveals:
- App Server: $390-400
- AI Search: $5 per query
- RAG query processing: $4.76 per query

Agent-specific costs:
- Manager: $0.05 (30 input tokens, 210 output tokens)
- Worker: $3.71 (1,500 input tokens, 1,500 output tokens)
- Director: $1.00 (600 input tokens, 600 output tokens)

Challenges and Solutions

The team faced several challenges during implementation:
- Quota limitations: initial deployments encountered token quota restrictions, which were resolved through Azure support requests (typically granted within 24 hours).
- Cost optimization: high costs associated with vectorization required careful monitoring. The team addressed this by shutting down unused services and deploying on services with free tiers.
- Integration issues: GitHub Actions raised errors during deployment, which were resolved using GitHub's App Service Build Service.
- Azure UI complexity: the team noted that Azure AI service naming conventions were sometimes confusing, as the same name is used for both parent and child resources.
- Free tier constraints: the AI Search free tier's 50 MB limit on vector data storage restricts the amount of company information that can be included in the current implementation.

Future Roadmap

The current implementation is an MVP with several areas for expansion:
- Expand the database to include more publicly available sustainability reports beyond the current two companies
- Optimize token usage by refining query handling processes
- Research alternative embedding models to reduce costs while maintaining accuracy
- Implement a more structured storage system with virtual folders in Blob Storage
- Upgrade from the free tier of AI Search to support larger data volumes
- Develop a proper schema for the vector database to enable filtering and more targeted searches
- Scale to multiple GPT model deployments for improved performance and redundancy

Conclusion

ESGai demonstrates how advanced AI techniques like Retrieval-Augmented Generation can transform data-intensive domains such as ESG consulting. By leveraging Azure's comprehensive suite of AI services alongside a robust agent-based architecture, this solution provides users with actionable insights while maintaining scalability and cost efficiency.

https://youtu.be/5-oBdge6Q78?si=Vb9aHx79xk3VGYAh

Azure APIM Cost Rate Limiting with Cosmos & Flex Functions
Azure API Management (APIM) provides built-in rate limiting policies, but implementing sophisticated dollar-cost quota management for Azure OpenAI services requires a more tailored approach. This solution combines Azure Functions, Cosmos DB, and stored procedures to implement cost-based quota management with automatic renewal periods.

Architecture

```
Client → APIM (with RateLimitConfig) → Azure Function Proxy → Azure OpenAI
                                              ↓
                                   Cosmos DB (quota tracking)
```

Technical Implementation

1. Rate Limit Configuration in APIM

The rate limiting configuration is injected into the request body by APIM using a policy fragment. Here's an example for a basic $5 quota:

```xml
<set-variable name="rateLimitConfig" value="@{
    var productId = context.Product.Id;
    var config = new JObject();
    config["counterKey"] = productId;
    config["quota"] = 5;
    return config.ToString();
}" />
<include-fragment fragment-id="RateLimitConfig" />
```

For more advanced scenarios, you can customize token costs. Here's an example for a $10 quota with custom token pricing:

```xml
<set-variable name="rateLimitConfig" value="@{
    var productId = context.Product.Id;
    var config = new JObject();
    config["counterKey"] = productId;
    config["startDate"] = "2025-03-02T00:00:00Z";
    config["renewal_period"] = 86400;
    config["explicitEndDate"] = null;
    config["quota"] = 10;
    config["input_cost_per_token"] = 0.00003;
    config["output_cost_per_token"] = 0.00006;
    return config.ToString();
}" />
<include-fragment fragment-id="RateLimitConfig" />
```

Flexible Counter Keys

The counterKey parameter is highly flexible and can be set to any unique identifier that makes sense for your rate limiting strategy:
- Product ID: limit all users of a specific APIM product (e.g., "starter", "professional")
- User ID: apply individual limits per user
- Subscription ID: track usage at the subscription level
- Custom combinations: combine identifiers for granular control (e.g., "product_starter_user_12345")

Rate Limit Configuration Parameters

| Parameter | Description | Example Value | Required |
| --- | --- | --- | --- |
| counterKey | Unique identifier for tracking quota usage | "starter10" or "user_12345" | Yes |
| quota | Maximum cost allowed in the renewal period | 10 | Yes |
| startDate | When the quota period begins. If not provided, the system uses the time when the policy is first applied | "2025-03-02T00:00:00Z" | No |
| renewal_period | Seconds until quota resets (86400 = daily). If not provided, no automatic reset occurs | 86400 | No |
| endDate | Optional end date for the quota period | null or "2025-12-31T23:59:59Z" | No |
| input_cost_per_token | Custom cost per input token | 0.00003 | No |
| output_cost_per_token | Custom cost per output token | 0.00006 | No |

Scheduling and Time Windows

The time-based parameters work together to create flexible quota schedules:
- If the current date falls outside the range defined by startDate and endDate, requests will be rejected with an error
- The renewal window begins either on the specified startDate or when the policy is first applied
- The renewal_period determines how frequently the accumulated cost resets to zero
- Without a renewal_period, the quota accumulates indefinitely until the endDate is reached

2. Quota Checking and Cost Tracking

The Azure Function performs two key operations:
- Pre-request quota check: before processing each request, it verifies whether the user has exceeded their quota
- Post-request cost tracking: after a successful request, it calculates the cost and updates the accumulated usage

Cost Calculation

For cost calculation, the system uses:
- Custom pricing: if input_cost_per_token and output_cost_per_token are provided in the rate limit config
- LiteLLM pricing: if custom pricing is not specified, the system falls back to LiteLLM's model prices for accurate cost estimation based on the model being used

The function returns appropriate HTTP status codes and headers: HTTP 429 (Too Many Requests) when the quota is exceeded, plus response headers with usage information:

```
x-counter-key: starter5
x-accumulated-cost: 5.000915
x-quota: 5
```
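A simplified sketch of that check-then-charge logic appears below. Only the shape of the calculation and the 429 response mirror the description above; the function and parameter names are hypothetical stand-ins for the repository's actual code.

```python
import azure.functions as func

def request_cost(input_tokens: int, output_tokens: int, cfg: dict) -> float:
    """Compute the dollar cost of one call from the rate limit config."""
    # The real function falls back to LiteLLM model pricing when these are unset
    in_price = cfg.get("input_cost_per_token", 0.00003)
    out_price = cfg.get("output_cost_per_token", 0.00006)
    return input_tokens * in_price + output_tokens * out_price

def check_quota(counter: dict, cfg: dict):
    """Return a 429 response if the accumulated cost has reached the quota."""
    if counter["accumulatedCost"] >= cfg["quota"]:
        return func.HttpResponse(
            "Quota exceeded",
            status_code=429,
            headers={
                "x-counter-key": cfg["counterKey"],
                "x-accumulated-cost": str(counter["accumulatedCost"]),
                "x-quota": str(cfg["quota"]),
            },
        )
    return None  # proceed: forward the request, then charge the cost afterwards
```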
3. Cosmos DB for State Management

Cosmos DB maintains the quota state with documents that track the accumulated cost, the quota, and the renewal window:

```json
{
  "id": "starter5",
  "counterKey": "starter5",
  "accumulatedCost": 5.000915,
  "startDate": "2025-03-02T00:00:00.000Z",
  "renewalPeriod": 86400,
  "renewalStart": 1741132800000,
  "endDate": null,
  "quota": 5
}
```

A stored procedure handles atomic updates to ensure accurate tracking, including:
- Adding costs to the accumulated total
- Automatically resetting costs when the renewal period is reached
- Updating quota values when the configuration changes
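For reference, reading one of these counter documents from Python with the azure-cosmos SDK looks roughly like this; the account URI, key, and database/container names are placeholders:

```python
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com", credential="<account-key>")
container = client.get_database_client("ratelimits").get_container_client("counters")

# In this design the document id doubles as the partition key
doc = container.read_item(item="starter5", partition_key="starter5")
print(doc["accumulatedCost"], "of", doc["quota"], "spent")
```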
Benefits
- Fine-grained cost control: track actual API usage costs rather than just request counts
- Flexible quotas: set daily, weekly, or monthly quotas with automatic renewal
- Transparent usage: response headers provide real-time quota usage information
- Product differentiation: different APIM products can have different quota levels
- Custom pricing: override default token costs for special pricing tiers
- Flexible tracking: use any identifier as the counter key for versatile quota management
- Time-based scheduling: define active periods and automatic reset windows for quota management

Getting Started
1. Deploy the Azure Function with Cosmos DB integration
2. Configure APIM policies to include the rate limit configuration
3. Set up different product policies for various quota levels

For a detailed implementation, visit our GitHub repository. Demo video: https://www.youtube.com/watch?v=vMX86_XpSAo

Tags: #AzureOpenAI #APIM #CosmosDB #RateLimiting #Serverless

Built-in Enterprise Readiness with Azure AI Agent Service

Ensure enterprise-grade security and compliance with Private Network Isolation (BYO VNet) in Azure AI Agent Service. This feature allows AI agents to operate within a private, isolated network, giving organizations full control over data and networking configurations. Learn how Private Network Isolation enhances security, scalability, and compliance for mission-critical AI workloads.

Introducing Azure AI Agent Service
Introduced at Microsoft Ignite 2024, Azure AI Agent Service is revolutionizing the development and deployment of AI agents. This service empowers developers to build, deploy, and scale high-quality AI agents tailored to business needs within hours. With features like rapid development, extensive data connections, flexible model selection, and enterprise-grade security, Azure AI Agent Service sets a new standard in AI automation.

Adopting Hybrid Search with Azure Cosmos DB
In an era where data accessibility and retrieval are crucial, Azure Cosmos DB introduces Hybrid Search, a cutting-edge feature that merges the capabilities of Vector Search and Full-Text Search. This integration enhances search relevance by combining semantic understanding with traditional keyword-based methods, making it ideal for diverse applications such as e-commerce, content management, and AI-driven chatbots. The blog provides a comprehensive guide to enabling and configuring Hybrid Search within Azure Cosmos DB, detailing the processes for setting up Vector Search and Full-Text Search. It also explores the underlying mechanics of Hybrid Search, which uses Reciprocal Rank Fusion (RRF) to combine multiple scoring functions for more accurate search results. Additionally, practical use cases and a step-by-step project example demonstrate how to implement an enterprise knowledge management system using Nest.js integrated with Azure Cosmos DB's Hybrid Search capabilities. This powerful combination offers developers and businesses the tools needed to create sophisticated, efficient, and intelligent search experiences within their applications.
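For intuition, Reciprocal Rank Fusion itself is a small algorithm: each result list contributes 1/(k + rank) per document, and the fused order sorts by the summed scores. A minimal sketch follows (the k = 60 default and the document IDs are illustrative, not values from the service):

```python
def rrf(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists; k dampens the dominance of top-ranked items."""
    scores: dict = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a vector-search ranking with a full-text (keyword) ranking
print(rrf([["docA", "docB", "docC"], ["docB", "docD", "docA"]]))
```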
[pt2] Choosing the right Data Storage Source (Under Preview) for Azure AI Search

This blog introduces new preview data sources for Azure AI Search, including Fabric OneLake Files, Azure Cosmos DB for Gremlin, Azure Cosmos DB for MongoDB, SharePoint, and Azure Files. Each data source supports incremental indexing, metadata extraction, and AI enrichment, making Azure AI Search more powerful for enterprise search applications.