artificial intelligence
Azure OpenAI Landing Zone reference architecture
In this article, delve into the synergy of Azure Landing Zones and Azure OpenAI Service, building a secure and scalable AI environment. Unpack the Azure OpenAI Landing Zone architecture, which integrates numerous Azure services for optimal AI workloads. Explore robust security measures and the significance of monitoring for operational success. This journey of deploying Azure OpenAI evolves alongside Azure's continual innovation.

Security Best Practices for GenAI Applications (OpenAI) in Azure
This article presents an in-depth guide on security best practices for GenAI applications that use LLM models within the Azure platform. Aimed at developers and system administrators, it explores the essentials for maintaining the confidentiality, integrity, and availability of LLMs such as Azure OpenAI. It delves into practical measures for addressing security challenges, including data breaches, misuse of AI, and regulatory compliance, while also emphasizing the role of a shared responsibility model in cloud security. The guide provides a comprehensive roadmap for implementing layered security strategies, encryption protocols, access controls, and monitoring practices to ensure the robust security of LLM applications in Azure.

Empowering AI: Building and Deploying Azure AI Landing Zones with Terraform
Discover the power of deploying Azure AI Landing Zones with Terraform. Explore best practices, secure connectivity, and streamlined access to Azure AI services. Learn to create a strong cloud foundation, optimize performance, and ensure governance for your AI solutions. Join us on this practical journey to harness the true capabilities of AI.

Demystifying Azure OpenAI Networking for Secure Chatbot Deployment
Embark on a technical exploration of Azure's networking features for building secure chatbots. In this article, we'll dive deep into the practical aspects of Azure's networking capabilities and their crucial role in ensuring the security of your OpenAI deployments. With real-world use cases and step-by-step instructions, you'll gain practical insights into optimizing Azure and OpenAI for your projects.

AI Studio End-to-End Baseline Reference Implementation
Discover the Future of AI Deployment with Azure AI Studio's Baseline Reference Implementation
Azure AI Studio is reshaping the landscape of cloud AI integration with its commitment to operational excellence and strategic alignment with core business objectives. We are thrilled to introduce Azure AI Studio's end-to-end baseline reference implementation—a streamlined architecture crafted for seamless, scalable, and secure AI cloud deployments. Embark on a journey to deploy sophisticated AI workloads with confidence, supported by Azure AI Studio's robust baseline architecture. Whether it's hosting interactive AI playgrounds, constructing complex AI workflows with Promptflow, or ensuring resilient and secure deployments within Azure's managed network environment, this implementation is your blueprint for success. Embrace a new era of AI innovation where security and scalability converge with organizational compliance and governance. Join us in deploying tomorrow's AI solutions, today.

Azure OpenAI chat baseline architecture in an Azure landing zone
Unlock the potential of AI in the cloud. Our Azure OpenAI Chat Baseline Architecture provides the blueprint you need to transition from testing to a full production environment, all within an Azure landing zone. Ensure security, scalability, and governance are part of your AI strategy. Get started with our concise guide and embrace the future of AI on Azure.

Advanced RAG Solution Accelerator
Overview

What is RAG and Why Advanced RAG?
Retrieval-Augmented Generation (RAG) is a natural language processing technique that combines the strengths of retrieval-based and generation-based models. It uses search algorithms to retrieve relevant data from external sources such as databases, knowledge bases, document corpora, and web pages. This retrieved data, known as "grounding information," is then fed into a large language model (LLM) to generate more accurate, relevant, and up-to-date outputs.
Figure 1: High-level Retrieval-Augmented Generation flow

Usage Patterns
Here are some horizontal use cases where customers have used Retrieval-Augmented Generation based systems:
Conversational Search and Insights: Summarize large volumes of information for easier consumption and communication.
Content Generation: Tailor interactions with individualized information to produce personalized output and recommendations.
AI Assistant, Q&A, and Decisioning: Analyze and interpret data to uncover patterns, identify trends, gain valuable insights, and answer questions.
Below are a few examples of vertical use cases where Retrieval-Augmented Generation has been beneficial:
Public Sector Knowledge Base: A government agency needs a system to provide citizens with information about public services, such as permits, licenses, and local regulations.
Compliance Document Retrieval: A regulatory body must assist organizations in understanding compliance requirements through a database of guidelines and policies.
Healthcare Patient Education: A health department aims to provide patients with educational resources about common conditions and treatments.

Challenges with Baseline RAG:
Ability to cover complex data: RAG over plain text content works well. However, when the content becomes more complex, such as financial reports with images, complex tables, and document sections spanning multiple pages, parsing and indexing it is not straightforward.
Context Window Limitations: As the dataset scales, the performance of RAG systems can degrade, particularly due to the "lost in the middle" phenomenon, making it challenging to retrieve specific information from large datasets.
Search Limitations: Although search technology has advanced to support vector-based queries, searching over vector embeddings alone may not be sufficient for achieving high accuracy.
Groundedness: When the search context is insufficient, RAG systems can generate incorrect or misleading information that is not grounded in the customer's data. Careful evaluations may be necessary to catch and fix these cases.
Latency and User Experience: Balancing performance and latency is crucial, as high latency can negatively impact the user experience. Optimizing this balance is a key challenge.
Quality Improvement Levers: Identifying and effectively using the right levers for quality improvements, such as accuracy, latency, and relevance, can be difficult.

Advanced RAG aims to address the challenges of Baseline RAG by incorporating advanced techniques for ingestion, formatting, and intent extraction from both structured and unstructured data. It provides an improved baseline architecture to build a more scalable solution that meets the accuracy and performance requirements of the business. By implementing advanced methodologies in data ingestion, search optimization, and evaluation, Advanced RAG enhances the overall effectiveness and reliability of Retrieval-Augmented Generation systems.
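For orientation, here is a minimal sketch of the baseline retrieve-then-generate flow from Figure 1, assuming the azure-search-documents and openai Python SDKs; the index name, field name, and deployment name are illustrative placeholders rather than part of the accelerator:

```python
# Minimal baseline RAG sketch: retrieve grounding text, then generate a grounded answer.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="financial-reports",                 # hypothetical index name
    credential=AzureKeyCredential("<search-key>"),
)
aoai_client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)

def answer(question: str, top_k: int = 5) -> str:
    # 1. Retrieve grounding information from the search index.
    results = search_client.search(search_text=question, top=top_k)
    context = "\n\n".join(doc["content"] for doc in results)  # assumes a "content" field

    # 2. Generate an answer constrained to the retrieved context.
    response = aoai_client.chat.completions.create(
        model="gpt-4o",  # your deployment name
        messages=[
            {"role": "system", "content": "Answer only from the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```

The advanced techniques described in the rest of this article (better parsing, chunking, query rewriting, filtering, and evaluation) layer on top of this basic loop.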
These advanced methodologies ensure that the business value of RAG systems is maximized, aligning technological capabilities with business needs.

RAG Quality Improvement

Background
Our implementation uses default configurations from document indexing services to ingest financial data. We use Azure AI Search for indexing, and the content is also vectorized in the index. The search index covered a few years of financial reports for the company. Once the RAG solution was implemented, overall accuracy was measured using the GPT similarity metric, which evaluates the similarity between user-provided ground truth answers and the model's predicted answers on a scale of 1 to 5, where 5 means the system produced answers that perfectly matched the ground truth.

Accuracy Improvement Efforts
To improve the accuracy of the Retrieval-Augmented Generation (RAG) system, several strategies were implemented, grouped under three categories: ingestion improvements, search improvements, and improvements in tooling and evaluation.

Ingestion Improvements:
Improve Parsing: Efforts were made to minimize information loss during ingestion by handling data in images and complex tables. Image descriptions were generated, and various techniques were used to handle complex tables, including converting them into Markdown, HTML, and paragraph formats to ensure accurate parsing of tabular data.
Information in images: The image below shows the performance of Microsoft stock compared to the rest of the market (S&P 500 and NASDAQ). Efficient parsing techniques can eliminate the need for additional tables and supporting text content by extracting key insights from images and storing them as text.
Figure 2: Example of information in images
Complex Tables: The image below shows an example of financial data represented in a complex table structure in the financial report. In this particular example, the table contains multiple sub-columns (years) within a top-level column, along with rows spanning multiple lines.
Figure 3: Example of a complex table in financial reports
Optimal Chunk Size: The impact of chunk size on search results was analyzed. Parsed content was split into paragraphs, and a small percentage of these paragraphs were used to generate questions. Custom scripts created a question-to-paragraph mapping dataset. Different indexes with varying chunk sizes (e.g., 3K and 8K characters) were created, and search results were evaluated for different values of top_k.
Recall Values with Different Chunk Sizes: The image and table below show recall values for different top_k search results on indexes with different chunk sizes. For example, with a chunk size of 3K characters, the recall was 78.5% for top_k = 7 and 91% for top_k = 25.
Figure 4: Recall on different chunk sizes

Table 1: Search recall on different chunk sizes
Recall        Chunk size 8K (chars)   Chunk size 3K (chars)
top_k = 7     69%                     78.5%
top_k = 25    76%                     91%

Based on these recall values, chunks of 3K characters appear to work best for our content, and a top_k of 25 retrieves most of the relevant search results.
Index Enrichment: Additional metadata was added to chunks during ingestion to aid retrieval. This included metadata in additional fields used during search (such as headings and section topics) and other fields used for filtering (such as report year).

Search Improvements:
Pre-processing of User Input: Techniques such as rephrasing and query expansion were used to enhance the quality of user input.
Advanced Search Features: Vector, semantic, and hybrid search features were used to increase the number of relevant results.
Filtering and Reranking: Filters were dynamically extracted from user queries, and search results were reranked to improve relevance.
Example of a rephrased user query: Below is an example of a user prompt that is rephrased and fanned out into several smaller (more focused) search queries, produced by GPT-4o using a custom prompt.
User Query: Can you explain the difference between the gross profit for Microsoft in 2023 and 2024?
Rephraser Output:

Evaluation and Tooling:
Standardizing Datasets: As the project progressed, we soon had too many datasets, which led to inconsistent ways of measuring the quality of the bot response. To resolve that issue, we standardized our datasets and used AML to store, document, and version them. When any updates were made to a dataset (say, some inconsistency was found in the golden dataset and that user prompt was excluded from accuracy computation), the dataset was updated and a new version created. This way everyone was using a known dataset for evaluations.
Standardizing Accuracy Calculation: To calculate the accuracy of the bot's answers, a similarity score is used: a rating between 1 and 5 based on how similar the bot's answer is to the golden dataset. Initially, the similarity score metric included in Prompt Flow was used, but we soon realized that the produced scores did not make it easy to understand why certain answers were scored the way they were. So the team created its own prompt and calibrated it against human evaluations. The tuned prompt was then used in Prompt Flow to run evaluations. Along with scoring the bot's result, the prompt also provides the reason it gave that score, which is useful in analyzing the results. The following image shows a snippet of that prompt:
Figure 5: Custom prompt for scoring bot response
Automating Accuracy Calculations: Tools were also developed to automate the generation of predictions and the evaluation of accuracy in a more repeatable and consistent way. More details on analysis can be found in the Evaluation Tool section.
Analyzing problematic queries: Running evaluations and looking only at the overall or average score was not enough to analyze the cause of issues. So we took a first pass at categorizing the user queries into buckets. These categories became:
Queries that are a direct hit on some content in the report, such as revenue for a year
Queries where we need to perform some calculations, such as gearing ratio
Queries that compare and contrast across some KPI, such as the largest two segments by revenue
Open-ended queries where we need to perform analysis, such as why and what
Later, an LLM was used to auto-categorize ground truth questions as the set of ground truth questions was updated. Once the questions were categorized, evaluations were broken down by these categories to ease analysis and understanding of problematic queries. The figure below shows a snapshot of the spread of user queries in the ground truth across these categories:
Figure 6: Spread of user prompts by category
The figure below shows a snapshot of similarity scores across these categories:
Figure 7: Average similarity score by category
Later, another category was added (difficulty level) with the values below. The final results were reported across these categories.
Easy: The search context had the answer the user was looking for.
Medium: There was no direct hit, and some calculations were required to get to the final result.
Hard: The question required some analysis to be performed on the retrieved or calculated data, as a financial analyst would.

Results after the Accuracy Improvement Efforts
After multiple iterations of accuracy improvement efforts and stabilizing the solution, the overall accuracy of the system came to around 4.3, making the solution more acceptable to the end user. The solution was also scaled up to cover content across multiple years, with over 15 financial reports and roughly 1300 pages in total. Another important metric is the pass rate by question type (the percentage of answers scored a 4 or a 5), which ensures the copilot was consistently passing these ground truths. The table below lists the pass rate by difficulty:

Table 2: Accuracy improvement analysis by difficulty
Difficulty    Easy    Medium    Hard
Pass Rate     95%     79%       72%

Solution architecture
The RAG solution is designed to handle various tasks using a robust and scalable architecture. The architecture includes the following key aspects:

Security
User Authentication: The solution uses Microsoft Entra ID for user authentication, ensuring secure access to the system.
Network Security: All runtime components are locked behind a Virtual Network (VNet) to ensure that traffic does not traverse public networks. This enhances security by isolating the components from external threats.
Managed Identities: The solution leverages managed identities where possible to simplify the management of secrets and credentials. This reduces the risk of credential exposure and makes it easier to manage access to Azure resources.

Composability
Modular Design: The solution is broken down into smaller, well-defined core microservices and skills that act as plug-and-play components. This modular design allows you to use existing services or bring in new ones to meet your specific needs.
Core Microservices: Backend services handle different aspects of the solution, such as session management, data processing, runtime configuration, and orchestration.
Skills: Specialized services provide specific capabilities, such as cognitive search and image processing. These skills can be easily integrated or replaced as needed.

Iterability
Configuration Service: The solution includes a configuration service that allows you to create runtime configurations for each microservice. This enables you to make changes, such as updating prompts or search indexes, without redeploying the entire solution.
Per-User Prompt Configuration: The configuration service can be used to apply different configurations for each user prompt, allowing for rapid experimentation and iteration. This flexibility helps to quickly adapt to changing requirements and improve the overall system.
Testing and Evaluation: The solution also includes the ability to run dummy/simulated conversations as nightly runs, end-to-end integration tests on demand, and an evaluation tool to perform end-to-end evaluation of the solution.

Logging and Instrumentation
Application Insights: The solution integrates with Azure Application Insights in Azure Monitor for logging and instrumentation, making it easy to debug by reviewing logs.
Traceability: One can easily trace what is happening in the backend using the conversation_id and dialog_id (unique GUIDs generated by the frontend) for each user session and interaction. This helps in identifying and resolving issues quickly.
Figure 8: Solution Architecture

Before exploring the data flow, we begin with the ingestion process, which is crucial for preparing the solution. This involves creating and populating the search index with relevant content (the corpus). Detailed instructions on parsing, chunking, and indexing can be found in the Solution capabilities section of the document.

User Query Processing Flow
User Authentication: Users interact with the bot via a web application and must authenticate using Microsoft Entra ID to ensure secure access.
User Interaction: Once authenticated, users can submit requests through text or voice. The web app establishes a WebSocket connection with the backend session manager. For voice interactions, Microsoft Speech Services are used for live transcription; the web app requests a speech token from the backend, which is then used in the Speech SDK for transcription.
Token Management: The backend retrieves secrets from Key Vault to generate tokens necessary for front-end operations.
Transcription and Submission: After transcription, the web app submits the transcribed text to the backend.
Session Management: The session manager assigns a unique connection ID to each WebSocket connection to identify clients. User prompts are then pushed into a message queue, implemented using Azure Cache for Redis.
Orchestrator: The orchestrator plays a critical role in managing the flow of information. It reads the user query from the message queue and performs several actions:
Plan & Execute: It identifies the required actions based on the user query and context.
Permissions: It checks user permissions using Role-Based Access Control (RBAC) or custom permissions on the content. NOTE: The current implementation does not perform this check; however, the orchestrator could easily be updated to do so.
Invoke Actions: It triggers the appropriate actions, such as invoking Azure AI Search to retrieve relevant information.
Azure AI Search: The orchestrator interacts with Azure AI Search to query the unstructured knowledge base. This involves searching through financial reports or other content to find the information the user requested.
Status & Response: The orchestrator processes the search results and formulates a response. It updates the queue with the status and the final response, which includes any necessary predictions or additional information.
Session Manager: The response from the orchestrator is sent back to the session manager. This component is responsible for maintaining the session's integrity and ensuring that each client receives the correct response. It uses the unique connection ID to route the response back to the appropriate client.
Web App: The web app receives the response from the session manager and delivers the bot's response back to the user, completing the interaction cycle. This response can be in text and/or speech format, depending on the user's initial input method.
Update History: On successful completion of the bot response, the session manager updates the user profile and conversation history in the storage component. This includes details about user intents and entities, ensuring that the system can provide personalized and context-aware responses in future interactions.
Developer Logs / Instrumentation: Throughout the process, logs and instrumentation data are collected. These logs are essential for monitoring and debugging the system, as well as for enhancing its performance and reliability.
Evaluations and Quality Enhancements: The collected data, along with golden datasets and manual feedback, is used for ongoing evaluations and quality enhancements. Tools like Azure AI Foundry and VS Code, along with the configuration service, are used to test the bots and to develop and evaluate different prompts and models.
Monitoring and Reporting: The system is continuously monitored using Azure Monitor and other analytics tools. Power BI dashboards provide insights into system performance, user interactions, and other key metrics. This ensures that the solution remains responsive and effective over time.

Solution capabilities
The solution supports the following capabilities:

Document Ingestion Pipeline
Document ingestion in a Retrieval-Augmented Generation (RAG) application is a critical process that ensures efficient and accurate retrieval of information. Currently, the ingestion service supports the following scenarios:
Large financial documents containing complex tables, graphs, charts, and other figures
Large retail product catalogs containing images and descriptions
The overall process can be broken down into three primary stages:
Document Loading: The Document Loader is the first stage in the document ingestion pipeline. Its primary function is to load documents into memory and extract text and metadata. The loader can be configured to use either the Azure AI Document Intelligence service or LangChain with Azure AI Document Intelligence for text extraction.
Document Parsing: The Document Parser is the second stage in the document ingestion pipeline. Its role is to process the loaded text and metadata, splitting the document into manageable chunks and cleaning the text for indexing. Chunking can be either fixed-size with overlap or layout-based, where an LLM decides whether certain paragraphs should be kept together. This solution used layout-based chunking, and sections and subsections were extracted and maintained as metadata for the chunked paragraphs.
Document Indexing: The Document Indexer is the final stage in the document ingestion pipeline. Its purpose is to upload the parsed chunks into a search index, enabling efficient retrieval based on user queries. Additional metadata produced during parsing (section and subsection names and titles) is passed along with the text to be indexed. The main content and certain metadata fields are also stored as vectors to enable better retrieval.
Figure 9: Indexing by document

Search
Once the ingestion pipeline has executed successfully, resulting in a valid, queryable search index, the Search service can be configured and integrated into the end-to-end RAG application. The Search service exposes an API that enables users to query a search index in Azure AI Search. It processes natural language queries, applies requested filters, and invokes search requests against the preconfigured search configuration using the Azure AI Search SDK.
Search Index Configuration: The search index configuration defines the schema and the type of search to apply, including simple text search, vector search, hybrid search, and hybrid search with additional semantic understanding. This is done as part of index creation and document ingestion.
User Query: The process starts with a user query, a natural language input from the user.
Query Embeddings Generation: Using an LLM, the query is vectorized so hybrid search can be performed on the user query.
Search Filter Generation: From the user query, filters based on criteria such as equality, range conditions, and substring matches are generated to refine the search results.
Search Invocation: The search service constructs a query using the embedding and filters, sends it to Azure AI Search via the Azure AI Search SDK, and receives the search results.
Pruning: Pruning refines the results further to ensure relevance, based on additional semantic filtering and ranking.
Search Results: The final output represents the items from the search index that best match the user's query, after all filters and pruning have been applied.

Query Preprocessing
One of the first steps when we receive a chat message is preprocessing, to make sure we get better search results that enable the RAG system to answer the question accurately. We perform the following steps as part of preprocessing:
Message Rephrasing: When the chatbot receives a new message, we rephrase the message based on the chat history, because the new message may depend on previous context. For example, when we ask, "Which team won the Premier League in 2023?" and then ask the follow-up question "What about the following year?", we need to rephrase the follow-up into "Which team won the Premier League in 2024?"
Fanout: If the query asks about a value that does not exist directly in the indexed documents, it may be derivable from simpler data that does exist there. For example, if the indexed documents are financial reports and the query asks about the gross profit margin, searching for "gross profit margin" may not return a result. However, the gross profit margin can be calculated from Revenue and Cost of Goods Sold (COGS), both of which exist in the indexed documents. If we break the original question about gross profit margin down into sub-questions for Revenue and COGS, the model can calculate the gross profit margin from those values. Also check out the new service Rewrite queries with semantic ranker in Azure AI Search (Preview).

AI Skills
To ensure modularity and ease of maintenance, our solution designates any service capable of providing data as a "skill." This approach allows for seamless plug-and-play integration of components. For instance, Azure AI Search is treated as a skill within our architecture. Should the solution require additional data sources, such as a relational database, these can be encapsulated within an API and similarly integrated as skills. Wrapping content providers as skills serves two primary purposes (a small illustrative sketch follows this list):
Enhanced Logging and Debugging: Skills can be configured to incorporate logging and instrumentation fields, ensuring that all generated logs include relevant context. This uniformity greatly facilitates efficient debugging by providing comprehensive log insights.
Dynamic Configuration: Skills can leverage the configuration service to expose runtime configurations. This flexibility is particularly beneficial during evaluations, allowing for adjustments such as modifying the number of top-k results or switching to a different search index to accommodate improvements in data ingestion.
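As an illustration of the skill pattern (not the accelerator's actual code), a minimal skill wrapper might look like the sketch below; the configuration client, field names, and defaults are assumptions made for the example:

```python
# Illustrative "skill" wrapper: contextual logging plus runtime configuration overrides.
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("skills")

@dataclass
class SearchSkillConfig:
    index_name: str = "financial-reports"   # hypothetical defaults
    top_k: int = 25

class SearchSkill:
    def __init__(self, search_client, config_service):
        self.search_client = search_client
        self.config_service = config_service  # hypothetical configuration-service client

    def run(self, query: str, conversation_id: str, dialog_id: str,
            config_override_id: Optional[str] = None) -> list:
        # Resolve runtime configuration: use an override when the payload carries one.
        config = SearchSkillConfig()
        if config_override_id:
            override = self.config_service.get(config_override_id)  # assumed API
            config = SearchSkillConfig(**override)

        # Every log entry carries conversation/dialog context for traceability.
        logger.info(
            "search skill invoked",
            extra={"conversation_id": conversation_id, "dialog_id": dialog_id,
                   "index": config.index_name, "top_k": config.top_k},
        )
        results = self.search_client.search(search_text=query, top=config.top_k)
        return [dict(r) for r in results]
```

A new data source (for example, a relational database) would follow the same shape: accept the query and the tracing identifiers, honor configuration overrides, and return results in a common format so the orchestrator can treat all skills uniformly.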
By adopting this skill-based approach, the architecture remains adaptable and scalable, supporting ongoing enhancements and diverse data integration.

Sharing Intermediate Results
Sharing intermediate results from the RAG process gives the user visibility into what is happening once a query is sent to the bot. This is especially useful when the query takes a long time to return. It also shows how the query was broken down into smaller queries, so if something goes wrong (especially for harder queries), the user has the ability to rephrase and get a better response. Once the user sends the query to the bot, the orchestrator emits intermediate updates like "Searching for ...", "Retrieved XX results..." before the final answer is delivered.
Figure 10: Messaging Framework
Architecture to support this:
WebSocket connection (Client <> Session Manager): When the client connects to the session manager, a persistent WebSocket connection is created, and all communication between the client and session manager is handled through this connection. This also allows multiple messages from the client to be queued up. The session manager listens to the incoming messages and queues them in a message queue; requests are then handled one by one. Meanwhile, intermediate messages and final answers for previously submitted messages are sent asynchronously back to the client.
Message Queue (Session Manager <> Orchestrator): Once the session manager receives a request, it is enqueued into a task queue. Since there can be multiple orchestrator instances running in the cluster, the task queue ensures that only one instance receives a particular request. The orchestrator then begins the RAG process. As the RAG process continues, the orchestrator sends intermediate messages by publishing them to a message queue. All instances of the session manager subscribe to this message queue, and the instance handling the client relevant to the incoming message forwards it to the client.

Runtime Configuration
The runtime configuration service enhances the architecture's dynamism and flexibility. It enables core services and AI skills to decouple and parameterize various components, such as prompts, search data settings, and operational parameters. These services can easily override default configurations with new versions at runtime, allowing for dynamic behavior adjustments during operation.
Figure 11: Runtime Configuration
Core Services and AI Skills: define unique identifiers for their individual configurations. At runtime, they check whether the payload contains a configuration override. If it does, they attempt to retrieve it from the cache; when it is not present in cache memory (i.e., a first-time fetch), they read it from the configuration service and save it in the cache for future references.
Configuration Service: facilitates Create, Read, and Delete operations for configurations. It validates the incoming config against a Pydantic model and generates a unique version for the configuration upon successful save.
Cosmos DB: persists each new config version.
Redis: a high-availability memory store for storing and quickly retrieving configurations for subsequent queries.

Evaluation Tool
Improving the accuracy of a RAG-based solution is a continuous process: experiment with different changes, run predictions with those changes (running user queries through the bot), evaluate the bot's results against the ground truth, analyze the issues, and repeat.
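A minimal sketch of that loop is shown below, assuming a hypothetical bot endpoint and an Azure OpenAI deployment acting as the judge; the judge prompt here is illustrative and is not the team's calibrated prompt:

```python
# Illustrative predict-then-evaluate loop over a golden dataset.
import json
import requests
from openai import AzureOpenAI

judge = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)

JUDGE_PROMPT = (
    "Rate how similar the predicted answer is to the ground truth on a 1-5 scale "
    '(5 = perfect match). Return JSON: {"score": <int>, "reason": "<why>"}.'
)

def predict(question: str) -> str:
    # Hypothetical bot endpoint exposed by the session manager / orchestrator.
    resp = requests.post("https://<bot-endpoint>/chat", json={"query": question}, timeout=120)
    return resp.json()["answer"]

def score(question: str, ground_truth: str, predicted: str) -> dict:
    resp = judge.chat.completions.create(
        model="gpt-4o",                           # your judge deployment name
        response_format={"type": "json_object"},  # ask for machine-readable output
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"Question: {question}\n"
                                        f"Ground truth: {ground_truth}\n"
                                        f"Predicted: {predicted}"},
        ],
    )
    return json.loads(resp.choices[0].message.content)

golden = [{"question": "What was the gross profit in 2024?", "answer": "..."}]  # illustrative
scores = [score(g["question"], g["answer"], predict(g["question"])) for g in golden]
print(sum(s["score"] for s in scores) / len(scores))
```

Breaking the scores down by query category and difficulty, as described above, is what turns this raw loop into actionable analysis.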
This iterative process required a consistent way of evaluating the end-to-end results. Initially the team did the evaluation and scoring of the results manually, but as the search index grew (a few thousand financial reports were ingested) and the golden dataset grew, doing it manually became very time-consuming. So the team developed a custom prompt and used an LLM to do the scoring. The prompt was calibrated against the human scores. Once the prompt was stabilized, the Evaluation tool was built to do two things:
For each golden question, call the bot endpoint and generate the prediction (bot answer).
Then take the ground truth and predicted results, run the evaluation on them, and produce metrics.

Implementation Guide
Please refer to the GitHub repo.

Additional Resources
Get started on Azure AI Foundry
Evaluation of generative AI applications
Generate adversarial simulations for safety evaluation
Generate synthetic data and simulate non-adversarial tasks
AI architecture guidance to build AI workloads on Azure
Responsible AI Tools and Practices

Securely Integrating Azure API Management with Azure OpenAI via Application Gateway
Introduction
As organizations increasingly integrate AI into their applications, securing access to Azure OpenAI services becomes a critical priority. By default, Azure OpenAI can be exposed over the public internet, posing potential security risks. To mitigate these risks, enterprises often restrict OpenAI access using Private Endpoints, ensuring that traffic remains within their Azure Virtual Network (VNET) and preventing direct internet exposure.
However, restricting OpenAI to a private endpoint introduces challenges when external applications, such as those hosted in AWS or on-premises environments, need to securely interact with OpenAI APIs. This is where Azure API Management (APIM) plays a crucial role. By deploying APIM within an internal VNET, it acts as a secure proxy between external applications and the OpenAI service, allowing controlled access while keeping OpenAI private.
To further enhance security and accessibility, Azure Application Gateway (App Gateway) can be placed in front of APIM. This setup enables secure, policy-driven access by managing traffic flow, applying Web Application Firewall (WAF) rules, and enforcing SSL termination if needed.

What This Blog Covers
This blog provides a technical deep dive into setting up a fully secure architecture that integrates Azure OpenAI with APIM, Private Endpoints, and Application Gateway. Specifically, we will walk through:
Configuring Azure OpenAI with a Private Endpoint to restrict public access and ensure communication remains within a secure network.
Deploying APIM in an Internal VNET, allowing it to securely communicate with OpenAI while being inaccessible from the public internet.
Setting up Application Gateway to expose APIM securely, allowing controlled external access with enhanced security.
Configuring VNET, Subnets, and Network Security Groups (NSGs) to enforce network segmentation, traffic control, and security best practices.
By the end of this guide, you will have a production-ready, enterprise-grade setup that ensures:
End-to-end private connectivity for Azure OpenAI through APIM.
Secure external access via Application Gateway while keeping OpenAI hidden from the internet.
Granular network control using VNET, Subnets, and NSGs.
This architecture provides a scalable and secure solution for enterprises needing to expose OpenAI securely without compromising privacy, performance, or compliance.

Prerequisites
Before diving into the integration of Azure API Management (APIM) with Azure OpenAI in a secure, private setup, ensure you have the following in place:
1. Azure Subscription & Required Permissions
An active Azure Subscription with the ability to create resources.
Contributor or Owner access to deploy Virtual Networks (VNETs), Subnets, Network Security Groups (NSGs), Private Endpoints, APIM, and Application Gateway.
2. Networking Setup Knowledge
Familiarity with Azure Virtual Network (VNET) concepts, Subnets, and NSGs is helpful, as we will be designing a controlled network environment.
3. Required Azure Services
The following services are needed for this integration:
Azure Virtual Network (VNET) – To establish a private, secure network.
Subnets & NSGs – For network segmentation and traffic control.
Azure OpenAI Service – Deployed in a region that supports private endpoints.
Azure API Management (APIM) – Deployed in an Internal VNET mode to act as a secure API proxy.
Azure Private Endpoint – To restrict Azure OpenAI access to a private network.
Azure Application Gateway – To expose APIM securely with load balancing and optional Web Application Firewall (WAF).
4. Networking and DNS Requirements
Private DNS Zone: Required to resolve private endpoints within the VNET.
Custom DNS Configuration: If using a custom DNS server, ensure proper forwarding rules are in place.
Firewall/NSG Rules: Ensure necessary inbound and outbound rules allow communication between services.
5. Azure CLI or PowerShell (Optional, but Recommended)
Azure CLI (az commands) or Azure PowerShell for efficient resource deployment.
Once you have these prerequisites in place, we can proceed with designing the secure architecture for integrating Azure OpenAI with APIM using Private Endpoints and Application Gateway.

Architecture Overview
The architecture ensures secure and private connectivity between external users and Azure OpenAI while preventing direct public access to OpenAI's APIs. It uses Azure API Management (APIM) in an Internal VNET, an Azure Private Endpoint for OpenAI, and an Application Gateway for controlled public exposure.

Key Components & Flow
User Requests
External users access the API via a public endpoint exposed by Azure Application Gateway. The request passes through App Gateway before reaching APIM, ensuring security and traffic control.
Azure API Management (APIM) – Internal VNET Mode
APIM is deployed in Internal VNET mode, meaning it does not have a public endpoint. APIM serves as a proxy between external applications and Azure OpenAI, ensuring request validation, rate limiting, and security enforcement. The Management Plane of APIM still requires a public IP for admin operations, but the Data Plane (API traffic) remains fully private.
Azure Private Endpoint for OpenAI
APIM cannot access Azure OpenAI publicly since OpenAI is secured with a Private Endpoint. A Private Endpoint allows APIM to securely connect to Azure OpenAI within the same VNET, preventing internet exposure. This ensures that only APIM within the internal network can send requests to OpenAI.
Managed Identity Authentication
APIM uses a Managed Identity to authenticate securely with Azure OpenAI. This eliminates the need for hardcoded API keys and improves security by using Azure Role-Based Access Control (RBAC).
Application Gateway for External Access
Since APIM is not publicly accessible, an Azure Application Gateway (App Gateway) is placed in front of it. App Gateway acts as a reverse proxy that securely exposes APIM to the public while enforcing:
SSL termination for secure HTTPS connections.
Web Application Firewall (WAF) for protection against threats.
Load balancing if multiple APIM instances exist.
Network Segmentation & Security
VNET & Subnets: APIM, OpenAI Private Endpoint, and App Gateway are deployed in separate subnets within an Azure Virtual Network (VNET).
NSGs (Network Security Groups): Strict inbound and outbound rules ensure that only allowed traffic flows between components.
Private DNS: Required to resolve Private Endpoint addresses inside the VNET.
Security Enhancements
No direct internet access to Azure OpenAI, ensuring full privacy.
Controlled API exposure via App Gateway, securing public requests.
Managed Identity for authentication, eliminating hardcoded credentials.
Private Endpoint enforcement, blocking unwanted access from external sources.
This architecture ensures that Azure OpenAI remains secure, APIM acts as a controlled gateway, and external users can access APIs safely through App Gateway.
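For a sense of what a consumer of this architecture ends up calling, the sketch below shows a request flowing through App Gateway to APIM and on to the private OpenAI endpoint. It assumes the Azure OpenAI API has been imported into APIM with its default path and that APIM accepts its subscription key in the api-key header; the hostname, deployment name, and key are placeholders:

```python
# Illustrative client call: App Gateway (public) -> APIM (internal VNET) -> Azure OpenAI (private endpoint).
import requests

APP_GATEWAY_HOST = "https://api.contoso.com"   # placeholder public hostname on App Gateway
DEPLOYMENT = "gpt-4o"                          # placeholder Azure OpenAI deployment name

response = requests.post(
    f"{APP_GATEWAY_HOST}/openai/deployments/{DEPLOYMENT}/chat/completions",
    params={"api-version": "2024-02-01"},
    # Depending on how the API is configured in APIM, the subscription key header may be
    # "api-key" (common for OpenAI-compatible imports) or "Ocp-Apim-Subscription-Key".
    headers={"api-key": "<apim-subscription-key>"},
    json={"messages": [{"role": "user", "content": "Hello"}]},
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```

Note that the caller never sees the Azure OpenAI key; APIM authenticates to OpenAI with its managed identity, and the client only holds an APIM subscription key.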
Azure CLI Script for VNet, Subnets, and NSG Configuration

# Variables
RESOURCE_GROUP="apim-openai-rg"
LOCATION="eastus"
VNET_NAME="apim-vnet"
VNET_ADDRESS_PREFIX="10.0.0.0/16"

# Subnets
APP_GATEWAY_SUBNET="app-gateway-subnet"
APP_GATEWAY_SUBNET_PREFIX="10.0.1.0/24"
APIM_SUBNET="apim-subnet"
APIM_SUBNET_PREFIX="10.0.2.0/24"
OPENAI_PE_SUBNET="openai-pe-subnet"
OPENAI_PE_SUBNET_PREFIX="10.0.3.0/24"

# NSGs
APP_GATEWAY_NSG="app-gateway-nsg"
APIM_NSG="apim-nsg"
OPENAI_PE_NSG="openai-pe-nsg"

# Step 1: Create Resource Group
az group create --name $RESOURCE_GROUP --location $LOCATION

# Step 2: Create Virtual Network
az network vnet create \
  --resource-group $RESOURCE_GROUP \
  --name $VNET_NAME \
  --address-prefix $VNET_ADDRESS_PREFIX \
  --subnet-name $APP_GATEWAY_SUBNET \
  --subnet-prefix $APP_GATEWAY_SUBNET_PREFIX

# Step 3: Create Additional Subnets (APIM & OpenAI Private Endpoint)
az network vnet subnet create \
  --resource-group $RESOURCE_GROUP \
  --vnet-name $VNET_NAME \
  --name $APIM_SUBNET \
  --address-prefix $APIM_SUBNET_PREFIX

az network vnet subnet create \
  --resource-group $RESOURCE_GROUP \
  --vnet-name $VNET_NAME \
  --name $OPENAI_PE_SUBNET \
  --address-prefix $OPENAI_PE_SUBNET_PREFIX

# Step 4: Create NSGs
az network nsg create --resource-group $RESOURCE_GROUP --name $APP_GATEWAY_NSG
az network nsg create --resource-group $RESOURCE_GROUP --name $APIM_NSG
az network nsg create --resource-group $RESOURCE_GROUP --name $OPENAI_PE_NSG

# Step 5: Add NSG Rules for APIM (Allow 3443 for APIM Internal VNet)
az network nsg rule create \
  --resource-group $RESOURCE_GROUP \
  --nsg-name $APIM_NSG \
  --name AllowAPIMInbound3443 \
  --priority 120 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes ApiManagement \
  --destination-address-prefixes VirtualNetwork \
  --destination-port-ranges 3443

# Step 6: Associate NSGs with Subnets
az network vnet subnet update \
  --resource-group $RESOURCE_GROUP \
  --vnet-name $VNET_NAME \
  --name $APP_GATEWAY_SUBNET \
  --network-security-group $APP_GATEWAY_NSG

az network vnet subnet update \
  --resource-group $RESOURCE_GROUP \
  --vnet-name $VNET_NAME \
  --name $APIM_SUBNET \
  --network-security-group $APIM_NSG

az network vnet subnet update \
  --resource-group $RESOURCE_GROUP \
  --vnet-name $VNET_NAME \
  --name $OPENAI_PE_SUBNET \
  --network-security-group $OPENAI_PE_NSG

# Step 7: Configure Service Endpoints for APIM Subnet
az network vnet subnet update \
  --resource-group $RESOURCE_GROUP \
  --vnet-name $VNET_NAME \
  --name $APIM_SUBNET \
  --service-endpoints Microsoft.EventHub Microsoft.KeyVault Microsoft.ServiceBus Microsoft.Sql Microsoft.Storage Microsoft.AzureActiveDirectory Microsoft.CognitiveServices Microsoft.Web

Creating an Azure OpenAI resource with a private endpoint

# Create an Azure OpenAI Resource
az cognitiveservices account create \
  --name $AOAI_NAME \
  --resource-group $RESOURCE_GROUP \
  --kind OpenAI \
  --sku S0 \
  --location $LOCATION \
  --yes \
  --custom-domain $AOAI_NAME

# Create a Private Endpoint
az network private-endpoint create \
  --name $PRIVATE_ENDPOINT_NAME \
  --resource-group $RESOURCE_GROUP \
  --vnet-name $VNET_NAME \
  --subnet $SUBNET_NAME \
  --private-connection-resource-id $(az cognitiveservices account show --name $AOAI_NAME --resource-group $RESOURCE_GROUP --query id -o tsv) \
  --group-id account \
  --connection-name "${PRIVATE_ENDPOINT_NAME}-connection"

# Create a Private DNS Zone
az network private-dns zone create \
  --resource-group $RESOURCE_GROUP \
  --name $PRIVATE_DNS_ZONE_NAME

# Link Private DNS Zone to VNet
az network private-dns link vnet create \
  --resource-group $RESOURCE_GROUP \
  --zone-name $PRIVATE_DNS_ZONE_NAME \
  --name "myDNSLink" \
  --virtual-network $VNET_NAME \
  --registration-enabled false

# Retrieve the Private IP Address from the Private Endpoint
PRIVATE_IP=$(az network private-endpoint show \
  --name $PRIVATE_ENDPOINT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "customDnsConfigs[0].ipAddresses[0]" -o tsv)

# Create a DNS Record for Azure OpenAI
az network private-dns record-set a add-record \
  --resource-group $RESOURCE_GROUP \
  --zone-name $PRIVATE_DNS_ZONE_NAME \
  --record-set-name $AOAI_NAME \
  --ipv4-address $PRIVATE_IP

# Disable Public Network Access
az cognitiveservices account update \
  --name $AOAI_NAME \
  --resource-group $RESOURCE_GROUP \
  --public-network-access Disabled

Provisioning the Azure APIM instance to an internal VNet
Please follow the link to provision: Deploy Azure API Management instance to internal VNet | Microsoft Learn

Create an API for AOAI in APIM
Please follow the link: Import an Azure OpenAI API as REST API - Azure API Management | Microsoft Learn

Configure Azure Application Gateway with Azure APIM
Please follow the link: Use API Management in a virtual network with Azure Application Gateway - Azure API Management | Microsoft Learn

Conclusion
Securing Azure OpenAI with private endpoints, APIM, and Application Gateway ensures a robust, enterprise-grade architecture that balances security, accessibility, and performance. By leveraging private endpoints, Azure OpenAI remains shielded from public exposure, while APIM acts as a controlled gateway for managing external API access. The addition of Application Gateway provides an extra security layer with SSL termination, WAF protection, and traffic management.
With this setup, organizations can:
✔ Ensure end-to-end private connectivity for Azure OpenAI.
✔ Enable secure external access via APIM and Application Gateway.
✔ Enforce strict network segmentation with VNETs, Subnets, NSGs, and Private DNS.
✔ Strengthen security with Managed Identity authentication and controlled API exposure.
By following this guide, you now have a scalable, production-ready solution to securely integrate Azure OpenAI with external applications, whether they reside in AWS, on-premises, or other cloud environments. Implement these best practices to maintain compliance, minimize security risks, and enhance the reliability of your AI-powered applications.

Azure AI Foundry, GitHub Copilot, Fabric and more to Analyze usage stats from Utility Invoices
Overview
With the introduction of Azure AI Foundry, integrating various AI services to streamline the development and deployment of agentic AI workflow solutions (multi-modal, multi-model, dynamic and interactive agents, and so on) has become more efficient. The platform offers a range of AI services, including Document Intelligence for extracting data from documents, natural language processing, robust machine learning capabilities, and more. Microsoft Fabric further enhances this ecosystem by providing robust data storage, analytics, and data science tools, enabling seamless data management and analysis. Additionally, Copilot and GitHub Copilot assist developers by offering AI-powered code suggestions and automating repetitive coding tasks, significantly boosting productivity and efficiency.

Objectives
In this use case, we will take a year's worth of monthly electricity bills from the utility's website and analyze them using Azure AI services within Azure AI Foundry. Electricity bills are simply an easy starting point; the same approach could be applied to other formats such as W-2, I-9, 1099, ISO, or EHR documents. By leveraging the Foundry's workflow capabilities, we will streamline the development stages step by step. Initially, we will use Document Intelligence to extract key data such as usage in kilowatt-hours (kWh), billed consumption, and other necessary information from each PDF file. This data will then be stored in Microsoft Fabric, where we will utilize its analytics and data science capabilities to process and analyze the information. We will also add a few processing steps in Azure Functions, built with the help of GitHub Copilot in VS Code. Finally, we will create a Power BI dashboard in Fabric to visually display the analysis, providing insights into electricity usage trends and billing patterns over the year.

Utility Invoice sample

Building the solution
Depicted in the picture are the key Azure and Copilot services we will use to build the solution.

Set up Azure AI Foundry
Create a new project in Azure AI Foundry. Add Document Intelligence to your project. You can do this directly within the Foundry portal.

Extract documents through Doc Intel
Download the PDF files of the power bills and upload them to Azure Blob storage. I used Document Intelligence Studio to create a new project and train custom models using the files from the Blob storage. Next, in your Azure AI Foundry project, add the Document Intelligence resource by providing the Endpoint URL and Keys.

Data Extraction
Use Azure Document Intelligence to extract the required information from the PDF files. From the resource page of the Document Intelligence service in the portal, copy the Endpoint URL and Keys; we will need these to connect the application to the Document Intelligence API. Next, integrate Document Intelligence with the project: in the Azure AI Foundry project, add the Document Intelligence resource by providing the Endpoint URL and Keys, and configure the settings as needed to start extracting data from the PDF documents. We can stay within the Azure AI Foundry portal for most of these steps, but for more advanced configurations, we might need to use Document Intelligence Studio.

GitHub Copilot in VS Code for Azure Functions
For processing portions of the output from Document Intelligence, what better way to create the Azure Function than in VS Code, especially with the help of GitHub Copilot. Let's start by installing the Azure Functions extension in VS Code, then create a new function project; a sketch of the kind of extraction code the function will work with follows below.
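As a point of reference for the kind of code involved, here is a hedged sketch of extracting fields from one bill with the Document Intelligence (Form Recognizer) Python SDK; the custom model ID and field names are hypothetical and would match whatever you labeled in Document Intelligence Studio:

```python
# Illustrative extraction of utility-bill fields with a trained custom model.
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient(
    endpoint="https://<your-doc-intel-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<doc-intel-key>"),
)

with open("power_bill_jan.pdf", "rb") as f:
    poller = client.begin_analyze_document("utility-bill-model", document=f)  # hypothetical model ID
result = poller.result()

for doc in result.documents:
    usage = doc.fields.get("UsageKWh")            # hypothetical field names
    billed = doc.fields.get("BilledAmount")
    print(usage.value if usage else None, billed.value if billed else None)
```

The Azure Function would run this kind of extraction (or receive its JSON output) and pass the values on for storage in Fabric, which is where GitHub Copilot's suggestions come in handy.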
GitHub Copilot can assist in writing the code to process the JSON received. Additionally, we can get Copilot to help generate unit tests to ensure the function works correctly, and we can ask Copilot to explain the code and the tests it generates. Finally, we seamlessly integrate the generated code and unit tests into the Functions app code file, all within VS Code. Notice how we can prompt GitHub Copilot all the way from step 1 of creating the workspace, to inserting the generated code into the Python file for the Azure Function, to testing it, and finally to deploying the Function.

Store and Analyze information in Fabric
There are many options for storing and analyzing JSON data in Fabric: Lakehouse, Data Warehouse, SQL Database, and Power BI Datamart. As our dataset is small, let's choose either SQL DB or PBI Datamart. PBI Datamart is great for smaller datasets and direct integration with PBI for dashboarding, while SQL DB is good for moderate data volumes and supports transactional and analytical workloads. To insert the JSON values produced by the Azure Functions app (whether called from Logic Apps or directly from AI Foundry through API calls) into Fabric, let's explore two approaches: using the REST API, and using Functions with Azure SQL DB.
Using REST API – Fabric provides APIs that we can call directly from our Function; an HTTP client in the Function's Python code sends POST requests to the Fabric API endpoints with our JSON data.
Using Functions with Azure SQL DB – we can connect to the database directly from our Function, using a SQL client in the Function to execute SQL INSERT statements that add records to the database.
While we are at it, we could even get GitHub Copilot to write up the unit tests. Here's a sample:

Visualization in Fabric Power BI
Let's start with creating visualizations in Fabric using the web version of Power BI for our report, UtilitiesBillAnalysisDashboard. You could use the PBI Desktop version too. Open the PBI Service and navigate to the workspace where you want to create your report. Click on "New" and select "Dataset" to add a new data source. Choose "SQL Server" from the list of data sources and enter "UtilityBillsServer" as the server name and "UtilityBillsDB" as the DB name to establish the connection. Once connected, navigate to the Navigator pane where we can select the table "tblElectricity" and the columns. I've shown these in the pictures below. For a clustered column (or bar) chart, choose the columns that contain our categorical data (e.g., month, year) and numerical data (e.g., kWh usage, billed amounts). After loading the data into PBI, drag the desired fields into the Values and Axis areas of the clustered column chart visualization. Customize the chart by adjusting the formatting options to enhance readability and insights. We now visualize our data in PBI within Fabric. We may need to apply a custom sort to the Month column; let's do this in the Data view. Select the table and create a new column with the following formula. This will create a custom sort column that we will use as 'Sum of MonthNumber' in ascending order.
Other visualization possibilities:

Other Possibilities
Agents with Custom Copilot Studio
Next, you could leverage a custom Copilot to provide personalized energy usage recommendations based on historical data. Start by integrating the Copilot with your existing data pipeline in Azure AI Foundry.
The Copilot can analyze electricity consumption patterns stored in your Fabric SQL DB and use ML models to identify optimization opportunities. For instance, it could suggest energy-efficient appliances, optimal usage times, or tips to reduce consumption. These recommendations can be visualized in PBI where users can track progress over time. To implement this, you would need to set up an API endpoint for the Copilot to access the data, train the ML models using Python in VS Code (let GitHub Copilot help you here… you will love it), and deploy the models to Azure using CLI / PowerShell / Bicep / Terraform / ARM or the Azure portal. Finally, connect the Copilot to PBI to visualize the personalized recommendations.
Additionally, you could explore using Azure AI Agents for automated anomaly detection and alerts. This agent could monitor electricity bill data for unusual patterns and send notifications when anomalies are detected. Yet another idea would be to implement predictive maintenance for electrical systems, where an AI agent uses predictive analytics to forecast maintenance needs based on the data collected, helping to reduce downtime and improve system reliability.

Summary
We have built a solution that leveraged the seamless integration of pioneering AI technologies with Microsoft's end-to-end platform. By leveraging Azure AI Foundry, we have developed a solution that uses Document Intelligence to scan electricity bills, stores the data in Fabric SQL DB, and processes it with Python in Azure Functions in VS Code, assisted by GitHub Copilot. The resulting insights are visualized in Power BI within Fabric. Additionally, we explored potential enhancements using Azure AI Agents and Custom Copilots, showcasing the ease of implementation and the transformative possibilities. Finally, speaking of possibilities – With Gen AI, the only limit is our imagination!

Additional resources
Explore Azure AI Foundry
Start using the Azure AI Foundry SDK
Review the Azure AI Foundry documentation and Call Azure Logic Apps as functions using Azure OpenAI Assistants
Take the Azure AI Learn courses
Learn more about Azure AI Services
Document Intelligence: Azure AI Doc Intel
GitHub Copilot examples: What can GitHub Copilot do – Examples
Explore Microsoft Fabric: Microsoft Fabric Documentation
See what you can connect with Azure Logic Apps: Azure Logic Apps Connectors

About the Author
Pradyumna (Prad) Harish is a Technology leader in the GSI Partner Organization at Microsoft. He has 26 years of experience in Product Engineering, Partner Development, Presales, and Delivery. Responsible for revenue growth through Cloud, AI, Cognitive Services, ML, Data & Analytics, Integration, DevOps, Open Source Software, Enterprise Architecture, IoT, Digital strategies and other innovative areas for business generation and transformation; achieving revenue targets via extensive experience in managing global functions, global accounts, products, and solution architects across over 26 countries.