azure ai
30 TopicsBuilding Secure AI Chat Systems: Part 2 - Securing Your Architecture from Storage to Network
In Part 1 of this series, we tackled the critical challenge of protecting the LLM itself from malicious inputs. We implemented three essential security layers using Azure AI services: harmful content detection with Azure Content Safety, PII protection with Azure Text Analytics, and prompt injection prevention with Prompt Shields. These guardrails ensure that your AI model doesn't process harmful requests or leak sensitive information through cleverly crafted prompts. But even with a perfectly secured LLM, your entire AI chat system can still be compromised through architectural vulnerabilities. For example, the WotNot incident wasn't about prompt injection—it was 346,000 files sitting in an unsecured cloud storage bucket. Likewise the OmniGPT breach with 34 million lines of conversation logs due to backend database security failures. The global average cost of a data breach is now $4.44 million, and it takes organizations an average of 241 days to identify and contain an active breach. That's eight months where attackers have free reign in your systems. The financial cost is one thing, but the reputational damage and loss of customer is irreversible. This article focuses on the architectural security concerns I mentioned at the end of Part 1—the infrastructure that stores your chat histories, the networks that connect your services, and the databases that power your vector searches. We'll examine real-world breaches that happened in 2024 and 2025, understand exactly what went wrong, and implement Azure solutions that would have prevented them. By the end of this article, you'll have a production-ready, secure architecture for your AI chat system that addresses the most common—and most devastating—security failures we're seeing in the wild. Let's start with the most fundamental question: where is your data, and who can access it? 1. Preventing Exposed Storage with Network Isolation The Problem: When Your Database Is One Google Search Away Let me paint you a picture of what happened with two incidents in 2024-2025: WotNot AI Chatbot left 346,000 files completely exposed in an unsecured cloud storage bucket—passports, medical records, sensitive customer data, all accessible to anyone on the internet without even a password. Security researchers who discovered it tried for over two months to get the company to fix it. In May 2025, Canva Creators' data was exposed through an unsecured Chroma vector database operated by an AI chatbot company. The database contained 341 collections of documents including survey responses from 571 Canva Creators with email addresses, countries of residence, and comprehensive feedback. This marked the first reported data leak involving a vector database. The common thread? Public internet accessibility. These databases and storage accounts were accessible from anywhere in the world. No VPN required. No private network. Just a URL and you were in. Think about your current architecture. If someone found your Cosmos DB connection string or your Azure Storage account name, what's stopping them from accessing it? If your answer is "just the access key" or "firewall rules," you're one leaked credential away from being in the headlines. So what to do: Azure Private Link + Network Isolation The most effective way to prevent public exposure is simple: remove public internet access entirely. This is where Azure Private Link becomes your architectural foundation. With Azure Private Link, you can create a private endpoint inside your Azure Virtual Network (VNet) that becomes the exclusive gateway to your Azure services. Your Cosmos DB, Storage Accounts, Azure OpenAI Service, and other resources are completely removed from the public internet—they only respond to requests originating from within your VNet. Even if someone obtains your connection strings or access keys, they cannot use them without first gaining access to your private network. Implementation Overview: To implement Private Link for your AI chat system, you'll need to: Create an Azure Virtual Network (VNet) to host your private endpoints and application resources Configure private endpoints for each service (Cosmos DB, Storage, Azure OpenAI, Key Vault) Set up private DNS zones to automatically resolve service URLs to private IPs within your VNet Disable public network access on all your Azure resources Deploy your application inside the VNet using Azure App Service with VNet integration, Azure Container Apps, or Azure Kubernetes Service Verify isolation by attempting to access resources from outside the VNet (should fail) You can configure this through the Azure Portal, Azure CLI, ARM templates, or infrastructure-as-code tools like Terraform. The Azure documentation provides step-by-step guides for each service type. Figure 1: Private Link Architecture for AI Chat Systems Private endpoints ensure all data access occurs within the Azure Virtual Network, blocking public internet access to databases, storage, and AI services. 2. Protecting Conversation Data with Encryption at Rest The Problem: When Backend Databases Become Treasure Troves Network isolation solves the problem of external access, but what happens when attackers breach your perimeter through other means? What if a malicious insider gains access? What if there's a misconfiguration in your cloud environment? The data sitting in your databases becomes the ultimate prize. In February 2025, OmniGPT suffered a catastrophic breach where attackers accessed the backend database and extracted personal data from 30,000 users including emails, phone numbers, API keys, and over 34 million lines of conversation logs. The exposed data included links to uploaded files containing sensitive credentials, billing details, and API keys. These weren't prompt injection attacks. These weren't DDoS incidents. These were failures to encrypt sensitive data at rest. When attackers accessed the storage layer, they found everything in readable format—a goldmine of personal information, conversations, and credentials. Think about the conversations your AI chat system stores. Customer support queries that might include account numbers. Healthcare chatbots discussing symptoms and medications. HR assistants processing employee grievances. If someone gained unauthorized (or even authorized) access to your database today, would they be reading plaintext conversations? What to do: Azure Cosmos DB with Customer-Managed Keys The fundamental defense against data exposure is encryption at rest—ensuring that data stored on disk is encrypted and unreadable without the proper decryption keys. Even if attackers gain physical or logical access to your database files, the data remains protected as long as they don't have access to the encryption keys. But who controls those keys? With platform-managed encryption (the default in most cloud services), the cloud provider manages the encryption keys. While this protects against many threats, it doesn't protect against insider threats at the provider level, compromised provider credentials, or certain compliance scenarios where you must prove complete key control. Customer-Managed Keys (CMK) solve this by giving you complete ownership and control of the encryption keys. You generate, store, and manage the keys in your own key vault. The cloud service can only decrypt your data by requesting access to your keys—access that you control and can revoke at any time. If your keys are deleted or access is revoked, even the cloud provider cannot decrypt your data. Azure makes this easy with Azure Key Vault integrated with Azure Cosmos DB. The architecture uses "envelope encryption" where your data is encrypted with a Data Encryption Key (DEK), and that DEK is itself encrypted with your Key Encryption Key (KEK) stored in Key Vault. This provides layered security where even if the database is compromised, the data remains encrypted with keys only you control. While we covered PII detection and redaction using Azure Text Analytics in Part 1—which prevents sensitive data from being stored in the first place—encryption at rest with Customer-Managed Keys provides an additional, powerful layer of protection. In fact, many compliance frameworks like HIPAA, PCI-DSS, and certain government regulations explicitly require customer-controlled encryption for data at rest, making CMK not just a best practice but often a mandatory requirement for regulated industries. Implementation Overview: To implement Customer-Managed Keys for your chat history and vector storage: Create an Azure Key Vault with purge protection and soft delete enabled (required for CMK) Generate or import your encryption key in Key Vault (2048-bit RSA or 256-bit AES keys) Grant Cosmos DB access to Key Vault using a system-assigned or user-assigned managed identity Enable CMK on Cosmos DB by specifying your Key Vault key URI during account creation or update Configure the same for Azure Storage if you're storing embeddings or documents in Blob Storage Set up key rotation policies to automatically rotate keys on a schedule (recommended: every 90 days) Monitor key usage through Azure Monitor and set up alerts for unauthorized access attempts Figure 2: Envelope Encryption with Customer-Managed Keys User conversations are encrypted using a two-layer approach: (1) The AI Chat App sends plaintext messages to Cosmos DB, (2) Cosmos DB authenticates to Key Vault using Managed Identity to retrieve the Key Encryption Key (KEK), (3) Data is encrypted with a Data Encryption Key (DEK), (4) The DEK itself is encrypted with the KEK before storage. This ensures data remains encrypted even if the database is compromised, as decryption requires access to keys stored in your Key Vault. For AI chat systems in regulated industries (healthcare, finance, government), Customer-Managed Keys should be your baseline. The operational overhead is minimal with proper automation, and the compliance benefits are substantial. The entire process can be automated using Azure CLI, PowerShell, or infrastructure-as-code tools. For existing Cosmos DB accounts, enabling CMK requires creating a new account and migrating data. 3. Securing Vector Databases and Preventing Data Leakage The Problem: Vector Embeddings Are Data Too Vector databases are the backbone of modern RAG (Retrieval-Augmented Generation) systems. They store embeddings—mathematical representations of your documents, conversations, and knowledge base—that allow your AI to retrieve relevant context for every user query. But here's what most developers don't realize: those vectors aren't just abstract numbers. They contain your actual data. A critical oversight in AI chat architectures is treating vector databases—or in our case, Cosmos DB collections storing embeddings—as less sensitive than traditional data stores. Whether you're using a dedicated vector database or storing embeddings in Cosmos DB alongside your chat history, these mathematical representations need the same rigorous security controls as the original text. In documented cases, shared vector databases inadvertently mixed data between two corporate clients. One client's proprietary information began surfacing in response to the other client's queries, creating a serious confidentiality breach in what was supposed to be a multi-tenant system. Even more concerning are embedding inversion attacks, where adversaries exploit weaknesses to reconstruct original source data from its vector representation—effectively reverse-engineering your documents from the mathematical embeddings. Think about what's in your vector storage right now. Customer support conversations. Internal company documents. Product specifications. Medical records. Legal documents. If you're running a multi-tenant system, are you absolutely certain that Company A can't retrieve Company B's data? Can you guarantee that embeddings can't be reverse-engineered to expose the original text? What to do: Azure Cosmos DB for MongoDB with Logical Partitioning and RBAC The security of vector databases requires a multi-layered approach that addresses both storage isolation and access control. Azure Cosmos DB for MongoDB provides native support for vector search while offering enterprise-grade security features specifically designed for multi-tenant architectures. Logical partitioning creates strict data boundaries within your database by organizing data into isolated partitions based on a partition key (like tenant_id or user_id). When combined with Role-Based Access Control (RBAC), you create a security model where users and applications can only access their designated partitions—even if they somehow gain broader database access. Implementation Overview: To implement secure multi-tenant vector storage with Cosmos DB: Enable MongoDB RBAC on your Cosmos DB account using the EnableMongoRoleBasedAccessControl capability Design your partition key strategy based on tenant_id, user_id, or organization_id for maximum isolation Create collections with partition keys that enforce tenant boundaries at the storage level Define custom RBAC roles that grant access only to specific databases and partition key ranges Create user accounts per tenant or service principal with assigned roles limiting their scope Implement partition-aware queries in your application to always include the partition key filter Enable diagnostic logging to track all vector retrieval operations with user identity Configure cross-region replication for high availability while maintaining partition isolation Figure 3: Multi-Tenant Data Isolation with Partition Keys and RBAC Azure Cosmos DB enforces tenant isolation through logical partitioning and Role-Based Access Control (RBAC). Each tenant's data is stored in separate partitions (Partition A, B, C) based on the partition key (tenant_id). RBAC acts as a security gateway, validating every query to ensure users can only access their designated partition. Attempts to access other tenants' partitions are blocked at the RBAC layer, preventing cross-tenant data leakage in multi-tenant AI chat systems. Azure provides comprehensive documentation and CLI tools for configuring RBAC roles and partition strategies. The key is to design your partition scheme before loading data, as changing partition keys requires data migration. Beyond partitioning and RBAC, implement these AI-specific security measures: Validate embedding sources: Authenticate and continuously audit external data sources before vectorizing to prevent poisoned embeddings Implement similarity search thresholds: Set minimum similarity scores to prevent irrelevant cross-context retrieval Use metadata filtering: Add security labels (classification levels, access groups) to vector metadata and enforce filtering Monitor retrieval patterns: Alert on unusual patterns like one tenant making queries that correlate with another tenant's data Separate vector databases per sensitivity level: Keep highly confidential vectors (PII, PHI) in dedicated databases with stricter controls Hash document identifiers: Use hashed references instead of plaintext IDs in vector metadata to prevent enumeration attacks For production AI chat systems handling multiple customers or sensitive data, Cosmos DB with partition-based RBAC should be your baseline. The combination of storage-level isolation and access control provides defense in depth that application-layer filtering alone cannot match. Bonus: Secure Logging and Monitoring for AI Chat Systems During development, we habitually log everything—full request payloads, user inputs, model responses, stack traces. It's essential for debugging. But when your AI chat system goes to production and starts handling real user conversations, those same logging practices become a liability. Think about what flows through your AI chat system: customer support conversations containing account numbers, healthcare queries discussing medical conditions, HR chatbots processing employee complaints, financial assistants handling transaction details. If you're logging full conversations for debugging, you're creating a secondary repository of sensitive data that's often less protected than your primary database. The average breach takes 241 days to identify and contain. During that time, attackers often exfiltrate not just production databases, but also log files and monitoring data—places where developers never expected sensitive information to end up. The question becomes: how do you maintain observability and debuggability without creating a security nightmare? The Solution: Structured Logging with PII Redaction and Azure Monitor The key is to log metadata, not content. You need enough information to trace issues and understand system behavior without storing the actual sensitive conversations. Azure Monitor with Application Insights provides enterprise-grade logging infrastructure with built-in features for sanitizing sensitive data. Combined with proper application-level controls, you can maintain full observability while protecting user privacy. What to Log in Production AI Chat Systems: DO Log DON'T Log Request timestamps and duration Full user messages or prompts User IDs (hashed or anonymized) Complete model responses Session IDs (hashed) Raw embeddings or vectors Model names and versions used Personally identifiable information (PII) Token counts (input/output) Retrieved document content Embedding dimensions and similarity scores Database connection strings or API keys Retrieved document IDs (not content) Complete stack traces that might contain data Error codes and exception types Performance metrics (latency, throughput) RBAC decisions (access granted/denied) Partition keys accessed Rate limiting triggers Final Remarks: Building Compliant, Secure AI Systems Throughout this two-part series, we've addressed the complete security spectrum for AI chat systems—from protecting the LLM itself to securing the underlying infrastructure. But there's a broader context that makes all of this critical: compliance and regulatory requirements. AI chat systems operate within an increasingly complex regulatory landscape. The EU AI Act, which entered force on August 1, 2024, became the first comprehensive AI regulation by a major regulator, assigning applications to risk categories with high-risk systems subject to specific legal requirements. The NIS2 Directive further requires that AI model endpoints, APIs, and data pipelines be protected to prevent breaches and ensure secure deployment. Beyond AI-specific regulations, chat systems must comply with established data protection frameworks depending on their use case. GDPR mandates data minimization, user rights to erasure and data portability, 72-hour breach notification, and EU data residency for systems serving European users. Healthcare chatbots must meet HIPAA requirements including encryption, access controls, 6-year audit log retention, and Business Associate Agreements. Systems processing payment information fall under PCI-DSS, requiring cardholder data isolation, encryption, role-based access controls, and regular security testing. B2B SaaS platforms typically need SOC 2 Type II compliance, demonstrating security controls over data availability, confidentiality, continuous monitoring, and incident response procedures. Azure's architecture directly supports these compliance requirements through its built-in capabilities. Private Link enables data residency by keeping traffic within specified Azure regions while supporting network isolation requirements. Customer-Managed Keys provide the encryption controls and key ownership mandated by HIPAA and PCI-DSS. Cosmos DB's partition-based RBAC creates the access controls and audit trails required across all frameworks. Azure Monitor and diagnostic logging satisfy audit and monitoring requirements, while Azure Policy and Microsoft Purview automate compliance enforcement and reporting. The platform's certifications and compliance offerings (including HIPAA, PCI-DSS, SOC 2, and GDPR attestations) provide the documentation and third-party validation that auditors require, significantly reducing the operational burden of maintaining compliance. Further Resources: Azure Private Link Documentation Azure Cosmos DB Customer-Managed Keys Azure Key Vault Overview Azure Cosmos DB Role-Based Access Control Azure Monitor and Application Insights Azure Policy for Compliance Microsoft Purview Data Governance Azure Security Benchmark Stay secure, stay compliant, and build responsibly.244Views0likes0CommentsBuilding a Multi-Agent System with Azure AI Agent Service: Campus Event Management
Personal Background My name is Peace Silly. I studied French and Spanish at the University of Oxford, where I developed a strong interest in how language is structured and interpreted. That curiosity about syntax and meaning eventually led me to computer science, which I came to see as another language built on logic and structure. In the academic year 2024–2025, I completed the MSc Computer Science at University College London, where I developed this project as part of my Master’s thesis. Project Introduction Can large-scale event management be handled through a simple chat interface? This was the question that guided my Master’s thesis project at UCL. As part of the Industry Exchange Network (IXN) and in collaboration with Microsoft, I set out to explore how conversational interfaces and autonomous AI agents could simplify one of the most underestimated coordination challenges in campus life: managing events across multiple departments, societies, and facilities. At large universities, event management is rarely straightforward. Rooms are shared between academic timetables, student societies, and one-off events. A single lecture theatre might host a departmental seminar in the morning, a society meeting in the afternoon, and a careers talk in the evening, each relying on different systems, staff, and communication chains. Double bookings, last-minute cancellations, and maintenance issues are common, and coordinating changes often means long email threads, manual spreadsheets, and frustrated users. These inefficiencies do more than waste time; they directly affect how a campus functions day to day. When venues are unavailable or notifications fail to reach the right people, even small scheduling errors can ripple across entire departments. A smarter, more adaptive approach was needed, one that could manage complex workflows autonomously while remaining intuitive and human for end users. The result was the Event Management Multi-Agent System, a cloud-based platform where staff and students can query events, book rooms, and reschedule activities simply by chatting. Behind the scenes, a network of Azure-powered AI agents collaborates to handle scheduling, communication, and maintenance in real time, working together to keep the campus running smoothly. The user scenario shown in the figure below exemplifies the vision that guided the development of this multi-agent system. Starting with Microsoft Learning Resources I began my journey with Microsoft’s tutorial Build Your First Agent with Azure AI Foundry which introduced the fundamentals of the Azure AI Agent Service and provided an ideal foundation for experimentation. Within a few weeks, using the Azure Foundry environment, I extended those foundations into a fully functional multi-agent system. Azure Foundry’s visual interface was an invaluable learning space. It allowed me to deploy, test, and adjust model parameters such as temperature, system prompts, and function calling while observing how each change influenced the agents’ reasoning and collaboration. Through these experiments, I developed a strong conceptual understanding of orchestration and coordination before moving to the command line for more complex development later. When development issues inevitably arose, I relied on the Discord support community and the GitHub forum for troubleshooting. These communities were instrumental in addressing configuration issues and providing practical examples, ensuring that each agent performed reliably within the shared-thread framework. This early engagement with Microsoft’s learning materials not only accelerated my technical progress but also shaped how I approached experimentation, debugging, and iteration. It transformed a steep learning curve into a structured, hands-on process that mirrored professional software development practice. A Decentralised Team of AI Agents The system’s intelligence is distributed across three specialised agents, powered by OpenAI’s GPT-4.1 models through Azure OpenAI Service. They each perform a distinct role within the event management workflow: Scheduling Agent – interprets natural language requests, checks room availability, and allocates suitable venues. Communications Agent – notifies stakeholders when events are booked, modified, or cancelled. Maintenance Agent – monitors room readiness, posts fault reports when venues become unavailable, and triggers rescheduling when needed. Each agent operates independently but communicates through a shared thread, a transparent message log that serves as the coordination backbone. This thread acts as a persistent state space where agents post updates, react to changes, and maintain a record of every decision. For example, when a maintenance fault is detected, the Maintenance Agent logs the issue, the Scheduling Agent identifies an alternative venue, and the Communications Agent automatically notifies attendees. These interactions happen autonomously, with each agent responding to the evolving context recorded in the shared thread. Interfaces and Backend The system was designed with both developer-focused and user-facing interfaces, supporting rapid iteration and intuitive interaction. The Terminal Interface Initially, the agents were deployed and tested through a terminal interface, which provided a controlled environment for debugging and verifying logic step by step. This setup allowed quick testing of individual agents and observation of their interactions within the shared thread. The Chat Interface As the project evolved, I introduced a lightweight chat interface to make the system accessible to staff and students. This interface allows users to book rooms, query events, and reschedule activities using plain language. Recognising that some users might still want to see what happens behind the scenes, I added an optional toggle that reveals the intermediate steps of agent reasoning. This transparency feature proved valuable for debugging and for more technical users who wanted to understand how the agents collaborated. When a user interacts with the chat interface, they are effectively communicating with the Scheduling Agent, which acts as the primary entry point. The Scheduling Agent interprets natural-language commands such as “Book the Engineering Auditorium for Friday at 2 PM” or “Reschedule the robotics demo to another room.” It then coordinates with the Maintenance and Communications Agents to complete the process. Behind the scenes, the chat interface connects to a FastAPI backend responsible for core logic and data access. A Flask + HTMX layer handles lightweight rendering and interactivity, while the Azure AI Agent Service manages orchestration and shared-thread coordination. This combination enables seamless agent communication and reliable task execution without exposing any of the underlying complexity to the end user. Automated Notifications and Fault Detection Once an event is scheduled, the Scheduling Agent posts the confirmation to the shared thread. The Communications Agent, which subscribes to thread updates, automatically sends notifications to all relevant stakeholders by email. This ensures that every participant stays informed without any manual follow-up. The Maintenance Agent runs routine availability checks. If a fault is detected, it logs the issue to the shared thread, prompting the Scheduling Agent to find an alternative room. The Communications Agent then notifies attendees of the change, ensuring minimal disruption to ongoing events. Testing and Evaluation The system underwent several layers of testing to validate both functional and non-functional requirements. Unit and Integration Tests Backend reliability was evaluated through unit and integration tests to ensure that room allocation, conflict detection, and database operations behaved as intended. Automated test scripts verified end-to-end workflows for event creation, modification, and cancellation across all agents. Integration results confirmed that the shared-thread orchestration functioned correctly, with all test cases passing consistently. However, coverage analysis revealed that approximately 60% of the codebase was tested, leaving some areas such as Azure service integration and error-handling paths outside automated validation. These trade-offs were deliberate, balancing test depth with project scope and the constraints of mocking live dependencies. Azure AI Evaluation While functional testing confirmed correctness, it did not capture the agents’ reasoning or language quality. To assess this, I used Azure AI Evaluation, which measures conversational performance across metrics such as relevance, coherence, fluency, and groundedness. The results showed high scores in relevance (4.33) and groundedness (4.67), confirming the agents’ ability to generate accurate and context-aware responses. However, slightly lower fluency scores and weaker performance in multi-turn tasks revealed a retrieval–execution gap typical in task-oriented dialogue systems. Limitations and Insights The evaluation also surfaced several key limitations: Synthetic data: All tests were conducted with simulated datasets rather than live campus systems, limiting generalisability. Scalability: A non-functional requirement in the form of horizontal scalability was not tested. The architecture supports scaling conceptually but requires validation under heavier load. Despite these constraints, the testing process confirmed that the system was both technically reliable and linguistically robust, capable of autonomous coordination under normal conditions. The results provided a realistic picture of what worked well and what future iterations should focus on improving. Impact and Future Work This project demonstrates how conversational AI and multi-agent orchestration can streamline real operational processes. By combining Azure AI Agent Services with modular design principles, the system automates scheduling, communication, and maintenance while keeping the user experience simple and intuitive. The architecture also establishes a foundation for future extensions: Predictive maintenance to anticipate venue faults before they occur. Microsoft Teams integration for seamless in-chat scheduling. Scalability testing and real-user trials to validate performance at institutional scale. Beyond its technical results, the project underscores the potential of multi-agent systems in real-world coordination tasks. It illustrates how modularity, transparency, and intelligent orchestration can make everyday workflows more efficient and human-centred. Acknowledgements What began with a simple Microsoft tutorial evolved into a working prototype that reimagines how campuses could manage their daily operations through conversation and collaboration. This was both a challenging and rewarding journey, and I am deeply grateful to Professor Graham Roberts (UCL) and Professor Lee Stott (Microsoft) for their guidance, feedback, and support throughout the project.271Views2likes0CommentsModel Mondays S2E01 Recap: Advanced Reasoning Session
About Model Mondays Want to know what Reasoning models are and how you can build advanced reasoning scenarios like a Deep Research agent using Azure AI Foundry? Check out this recap from Model Mondays Season 2 Ep 1. Model Mondays is a weekly series to help you build your model IQ in three steps: 1. Catch the 5-min Highlights on Monday, to get up to speed on model news 2. Catch the 15-min Spotlight on Monday, for a deep-dive into a model or tool 3. Catch the 30-min AMA on Friday, for a Q&A session with subject matter experts Want to follow along? Register Here- to watch upcoming livestreams for Season 2 Visit The Forum- to see the full AMA schedule for Season 2 Register Here - to join the AMA on Friday Jun 20 Spotlight On: Advanced Reasoning This week, the Model Mondays spotlight was on Advanced Reasoning with subject matter expert Marlene Mhangami. In this blog post, I'll talk about my five takeaways from this episode: Why Are Reasoning Models Important? What Is an Advanced Reasoning Scenario? How Can I Get Started with Reasoning Models ? Spotlight: My Aha Moment Highlights: What’s New in Azure AI 1. Why Are Reasoning Models Important? In today's fast-evolving AI landscape, it's no longer enough for models to just complete text or summarize content. We need AI that can: Understand multi-step tasks Make decisions based on logic Plan sequences of actions or queries Connect context across turns Reasoning models are large language models (LLMs) trained with reinforcement learning techniques to "think" before they answer. Rather than simply generating a response based on probability, these models follow an internal thought process producing a chain of reasoning before responding. This makes them ideal for complex problem-solving tasks. And they’re the foundation of building intelligent, context-aware agents. They enable next-gen AI workflows in everything from customer support to legal research and healthcare diagnostics. Reason: They allow AI to go beyond surface-level response and deliver solutions that reflect understanding, not just language patterning. 2. What does Advanced Reasoning involve? An advanced reasoning scenario is one where a model: Breaks a complex prompt into smaller steps Retrieves relevant external data Uses logic to connect dots Outputs a structured, reasoned answer Example: A user asks: What are the financial and operational risks of expanding a startup to Southeast Asia in 2025? This is the kind of question that requires extensive research and analysis. A reasoning model might tackle this by: Retrieving reports on Southeast Asia market conditions Breaking down risks into financial, political, and operational buckets Cross-referencing data with recent trends Returning a reasoned, multi-part answer 3. How Can I Get Started with Reasoning Models? To get started, you need to visit a catalog that has examples of these models. Try the GitHub Models Marketplace and look for the reasoning category in the filter. Try the Azure AI Foundry model catalog and look for reasoning models by name. Example: The o-series of models from Azure Open AI The DeepSeek-R1 models The Grok 3 models The Phi-4 reasoning models Next, you can use SDKs or Playground for exploring the model capabiliies. 1. Try Lab 331 - for a beginner-friendly guide. 2. Try Lab 333 - for an advanced project. 3. Try the GitHub Model Playground - to compare reasoning and GPT models. 4. Try the Deep Research Agent using LangChain - sample as a great starting project. Have questions or comments? Join the Friday AMA on Azure AI Foundry Discord: 4. Spotlight: My Aha Moment Before this session, I thought reasoning meant longer or more detailed responses. But this session helped me realize that reasoning means structured thinking — models now plan, retrieve, and respond with logic. This inspired me to think about building AI agents that go beyond chat and actually assist users like a teammate. It also made me want to dive deeper into LangChain + Azure AI workflows to build mini-agents for real-world use. 5. Highlights: What’s New in Azure AI Here’s what’s new in the Azure AI Foundry: Direct From Azure Models - Try hosted models like OpenAI GPT on PTU plans SORA Video Playground - Generate video from prompts via SORA models Grok 3 Models - Now available for secure, scalable LLM experiences DeepSeek R1-0528 - A reasoning-optimized, Microsoft-tuned open-source model These are all available in the Azure Model Catalog and can be tried with your Azure account. Did You Know? Your first step is to find the right model for your task. But what if you could have the model automatically selected for you_ based on the prompt you provide? That's the magic of Model Router a deployable AI chat model that dynamically selects the best LLM based on your prompt. Instead of choosing one model manually, the Router makes that choice in real time. Currently, this works with a fixed set of Azure OpenAI models, including a reasoning model option. Keep an eye on the documentation for more updates. Why it’s powerful: Saves cost by switching between models based on complexity Optimizes performance by selecting the right model for the task Lets you test and compare model outputs quickly Try it out in Azure AI Foundry or read more in the Model Catalog Coming Up Next Next week, we dive into Model Context Protocol, an open protocol that empowers agentic AI applications by making it easier to discover and integrate knowledge and action tools with your model choices. Register Here to get reminded - and join us live on Monday! Join The Community Great devs don't build alone! In a fast-pased developer ecosystem, there's no time to hunt for help. That's why we have the Azure AI Developer Community. Join us today and let's journey together! Join the Discord - for real-time chats, events & learning Explore the Forum - for AMA recaps, Q&A, and help! About Me. I'm Sharda, a Gold Microsoft Learn Student Ambassador interested in cloud and AI. Find me on Github, Dev.to,, Tech Community and Linkedin. In this blog series I have summarizef my takeaways from this week's Model Mondays livestream .395Views0likes0CommentsModel Mondays S2:E4 Understanding AI Developer Experiences with Leo Yao
This week in Model Mondays, we put the spotlight on the AI Toolkit for Visual Studio Code - and explore the tools and workflows that make building generative AI apps and agents easier for developers. Read on for my recap. This post was generated with AI help and human revision & review. To learn more about our motivation and workflows, please refer to this document in our website. About Model Mondays Model Mondays is a weekly series designed to help you grow your Azure AI Foundry Model IQ step by step. Each week includes: 5-Minute Highlights – Quick news and updates about Azure AI models and tools on Monday 15-Minute Spotlight – Deep dive into a key model, protocol, or feature on Monday 30-Minute AMA on Friday – Live Q&A with subject matter experts from the Monday livestream If you're looking to grow your skills with the latest in AI model development, this series is a great place to begin. Useful links: Register for upcoming livestreams Watch past episodes Join the AMA on AI Developer Experiences Visit the Model Mondays forum Spotlight On: AI Developer Experiences 1. What is this topic and why is it important? AI Developer Experiences focus on making the process of building, testing, and deploying AI models as efficient as possible. With the right tools—such as the AI Toolkit and Azure AI Foundry extensions for Visual Studio Code—developers can eliminate unnecessary friction and focus on innovation. This is essential for accelerating the real-world impact of generative AI. 2. What is one key takeaway from the episode? The integration of Azure AI Foundry with Visual Studio Code allows developers to manage models, run experiments, and deploy applications directly from their preferred development environment. This unified workflow enhances productivity and simplifies the AI development lifecycle. 3. How can I get started? Here are a few resources to explore: Install the AI Toolkit for VS Code Explore Azure AI Foundry Documentation Join the Microsoft Tech Community to follow and contribute to discussions 4. What’s New in Azure AI Foundry? Azure AI Foundry continues to evolve to meet developer needs with more power, flexibility, and productivity. Here are some of the latest updates highlighted in this week’s episode: AI Toolkit for Visual Studio Code Now with deeper integration, allowing developers to manage models, run experiments, and deploy applications directly within their editor—streamlining the entire workflow. Prompt Shields Enhanced security capabilities designed to protect generative AI applications from prompt injection and unsafe content, improving reliability in production environments. Model Router A new intelligent routing system that dynamically directs model requests to the most suitable model available—enhancing performance and efficiency at scale. Expanded Model Catalog The catalog now includes more open-source and proprietary models, featuring the latest from Hugging Face, OpenAI, and other leading providers. Improved Documentation and Sample Projects Newly added guides and ready-to-use examples to help developers get started faster, understand workflows, and build confidently. My A-Ha Moment Before watching this episode, setting up an AI development environment always felt like a challenge. There were so many moving parts—configurations, integrations, and dependencies—that it was hard to know where to begin. Seeing the AI Toolkit in action inside Visual Studio Code changed everything for me. It was a realization moment: “That’s it? I can explore models, test prompts, and deploy apps—without ever leaving my editor?” This episode made it clear that building with AI doesn’t have to be complex or intimidating. With the right tools, experimentation becomes faster and far more enjoyable. Now, I’m genuinely excited to build, test, and explore new generative AI solutions because the process finally feels accessible. Coming Up Next Week In the next episode, we’ll be exploring Fine-Tuning and Distillation with Dave Voutila. This session will focus on how to adapt Azure OpenAI models to your unique use cases and apply best practices for efficient knowledge transfer. Register here to reserve your spot and be part of the conversation. Join the Community Building in AI is better when we do it together. That’s why the Azure AI Developer Community exists—to support your journey and provide resources every step of the way. Join the Discord for real-time discussions, events, and peer learning Explore the Forum to catch up on AMAs, ask questions, and connect with other developers About Me I'm Sharda, a Gold Microsoft Learn Student Ambassador passionate about cloud technologies and artificial intelligence. I enjoy learning, building, and helping others grow in tech. Connect with me: LinkedIn GitHub Dev.to Microsoft Tech Community244Views0likes0CommentsPower Up Your Open WebUI with Azure AI Speech: Quick STT & TTS Integration
Introduction Ever found yourself wishing your web interface could really talk and listen back to you? With a few clicks (and a bit of code), you can turn your plain Open WebUI into a full-on voice assistant. In this post, you’ll see how to spin up an Azure Speech resource, hook it into your frontend, and watch as user speech transforms into text and your app’s responses leap off the screen in a human-like voice. By the end of this guide, you’ll have a voice-enabled web UI that actually converses with users, opening the door to hands-free controls, better accessibility, and a genuinely richer user experience. Ready to make your web app speak? Let’s dive in. Why Azure AI Speech? We use Azure AI Speech service in Open Web UI to enable voice interactions directly within web applications. This allows users to: Speak commands or input instead of typing, making the interface more accessible and user-friendly. Hear responses or information read aloud, which improves usability for people with visual impairments or those who prefer audio. Provide a more natural and hands-free experience especially on devices like smartphones or tablets. In short, integrating Azure AI Speech service into Open Web UI helps make web apps smarter, more interactive, and easier to use by adding speech recognition and voice output features. If you haven’t hosted Open WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Open WebUI deployed already. Learn More about OpenWeb UI here. Deploy Azure AI Speech service in Azure. Navigate to the Azure Portal and search for Azure AI Speech on the Azure portal search bar. Create a new Speech Service by filling up the fields in the resource creation page. Click on “Create” to finalize the setup. After the resource has been deployed, click on “View resource” button and you should be redirected to the Azure AI Speech service page. The page should display the API Keys and Endpoints for Azure AI Speech services, which you can use in Open Web UI. Settings things up in Open Web UI Speech to Text settings (STT) Head to the Open Web UI Admin page > Settings > Audio. Paste the API Key obtained from the Azure AI Speech service page into the API key field below. Unless you use different Azure Region, or want to change the default configurations for the STT settings, leave all settings to blank. Text to Speech settings (TTS) Now, let's proceed with configuring the TTS Settings on OpenWeb UI by toggling the TTS Engine to Azure AI Speech option. Again, paste the API Key obtained from Azure AI Speech service page and leave all settings to blank. You can change the TTS Voice from the dropdown selection in the TTS settings as depicted in the image below: Click Save to reflect the change. Expected Result Now, let’s test if everything works well. Open a new chat / temporary chat on Open Web UI and click on the Call / Record button. The STT Engine (Azure AI Speech) should identify your voice and provide a response based on the voice input. To test the TTS feature, click on the Read Aloud (Speaker Icon) under any response from Open Web UI. The TTS Engine should reflect Azure AI Speech service! Conclusion And that’s a wrap! You’ve just given your Open WebUI the gift of capturing user speech, turning it into text, and then talking right back with Azure’s neural voices. Along the way you saw how easy it is to spin up a Speech resource in the Azure portal, wire up real-time transcription in the browser, and pipe responses through the TTS engine. From here, it’s all about experimentation. Try swapping in different neural voices or dialing in new languages. Tweak how you start and stop listening, play with silence detection, or add custom pronunciation tweaks for those tricky product names. Before you know it, your interface will feel less like a web page and more like a conversation partner.1.1KViews2likes1CommentCreate Stunning AI Videos with Sora on Azure AI Foundry!
Special credit to Rory Preddy for creating the GitHub resource that enable us to learn more about Azure Sora. Reach him out on LinkedIn to say thanks. Introduction Artificial Intelligence (AI) is revolutionizing content creation, and video generation is at the forefront of this transformation. OpenAI's Sora, a groundbreaking text-to-video model, allows creators to generate high-quality videos from simple text prompts. When paired with the powerful infrastructure of Azure AI Foundry, you can harness Sora's capabilities with scalability and efficiency, whether on a local machine or a remote setup. In this blog post, I’ll walk you through the process of generating AI videos using Sora on Azure AI Foundry. We’ll cover the setup for both local and remote environments. Requirements: Azure AI Foundry with sora model access A Linux Machine/VM. Make sure that the machine already has the package below: Java JRE 17 (Recommended) OR later Maven Step Zero – Deploying the Azure Sora model on AI Foundry Navigate to the Azure AI Foundry portal and head to the “Models + Endpoints” section (found on the left side of the Azure AI Foundry portal) > Click on the “Deploy Model” button > “Deploy base model” > Search for Sora > Click on “Confirm”. Give a deployment name and specify the Deployment type > Click “Deploy” to finalize the configuration. You should receive an API endpoint and Key after successful deploying Sora on Azure AI Foundry. Store these in a safe place because we will be using them in the next steps. Step one – Setting up the Sora Video Generator in the local/remote machine. Clone the roryp/sora repository on your machine by running the command below: git clone https://github.com/roryp/sora.git cd sora Then, edit the application.properties file in the src/main/resources/ folder to include your Azure OpenAI Credentials. Change the configuration below: azure.openai.endpoint=https://your-openai-resource.cognitiveservices.azure.com azure.openai.api-key=your_api_key_here If port 8080 is used for another application, and you want to change the port for which the web app will run, change the “server.port” configuration to include the desired port. Allow appropriate permissions to run the “mvnw” script file. chmod +x mvnw Run the application ./mvnw spring-boot:run Open your browser and type in your localhost/remote host IP (format: [host-ip:port]) in the browser search bar. If you are running a remote host, please do not forget to update your firewall/NSG to allow inbound connection to the configured port. You should see the web app to generate video with Sora AI using the API provided on Azure AI Foundry. Now, let’s generate a video with Sora Video Generator. Enter a prompt in the first text field, choose the video pixel resolution, and set the video duration. (Due to technical limitation, Sora can only generate video of a maximum of 20 seconds). Click on the “Generate video” button to proceed. The cost to generate the video should be displayed below the “Generate Video” button, for transparency purposes. You can click on the “View Breakdown” button to learn more about the cost breakdown. The video should be ready to download after a maximum of 5 minutes. You can check the status of the video by clicking on the “Check Status” button on the web app. The web app will inform you once the download is ready and the page should refresh every 10 seconds to fetch real-time update from Sora. Once it is ready, click on the “Download Video” button to download the video. Conclusion Generating AI videos with Sora on Azure AI Foundry is a game-changer for content creators, marketers, and developers. By following the steps outlined in this guide, you can set up your environment, integrate Sora, and start creating stunning AI-generated videos. Experiment with different prompts, optimize your workflow, and let your imagination run wild! Have you tried generating AI videos with Sora or Azure AI Foundry? Share your experiences or questions in the comments below. Don’t forget to subscribe for more AI and cloud computing tutorials!1.1KViews0likes3CommentsDeploy Open Web UI on Azure VM via Docker: A Step-by-Step Guide with Custom Domain Setup.
Introductions Open Web UI (often referred to as "Ollama Web UI" in the context of LLM frameworks like Ollama) is an open-source, self-hostable interface designed to simplify interactions with large language models (LLMs) such as GPT-4, Llama 3, Mistral, and others. It provides a user-friendly, browser-based environment for deploying, managing, and experimenting with AI models, making advanced language model capabilities accessible to developers, researchers, and enthusiasts without requiring deep technical expertise. This article will delve into the step-by-step configurations on hosting OpenWeb UI on Azure. Requirements: Azure Portal Account - For students you can claim $USD100 Azure Cloud credits from this URL. Azure Virtual Machine - with a Linux of any distributions installed. Domain Name and Domain Host Caddy Open WebUI Image Step One: Deploy a Linux – Ubuntu VM from Azure Portal Search and Click on “Virtual Machine” on the Azure portal search bar and create a new VM by clicking on the “+ Create” button > “Azure Virtual Machine”. Fill out the form and select any Linux Distribution image – In this demo, we will deploy Open WebUI on Ubuntu Pro 24.04. Click “Review + Create” > “Create” to create the Virtual Machine. Tips: If you plan to locally download and host open source AI models via Open on your VM, you could save time by increasing the size of the OS disk / attach a large disk to the VM. You may also need a higher performance VM specification since large resources are needed to run the Large Language Model (LLM) locally. Once the VM has been successfully created, click on the “Go to resource” button. You will be redirected to the VM’s overview page. Jot down the public IP Address and access the VM using the ssh credentials you have setup just now. Step Two: Deploy the Open WebUI on the VM via Docker Once you are logged into the VM via SSH, run the Docker Command below: docker run -d --name open-webui --network=host --add-host=host.docker.internal:host-gateway -e PORT=8080 -v open-webui:/app/backend/data --restart always ghcr.io/open-webui/open-webui:dev This Docker command will download the Open WebUI Image into the VM and will listen for Open Web UI traffic on port 8080. Wait for a few minutes and the Web UI should be up and running. If you had setup an inbound Network Security Group on Azure to allow port 8080 on your VM from the public Internet, you can access them by typing into the browser: [PUBLIC_IP_ADDRESS]:8080 Step Three: Setup custom domain using Caddy Now, we can setup a reverse proxy to map a custom domain to [PUBLIC_IP_ADDRESS]:8080 using Caddy. The reason why Caddy is useful here is because they provide automated HTTPS solutions – you don’t have to worry about expiring SSL certificate anymore, and it’s free! You must download all Caddy’s dependencies and set up the requirements to install it using this command: sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list sudo apt update && sudo apt install caddy Once Caddy is installed, edit Caddy’s configuration file at: /etc/caddy/Caddyfile , delete everything else in the file and add the following lines: yourdomainname.com { reverse_proxy localhost:8080 } Restart Caddy using this command: sudo systemctl restart caddy Next, create an A record on your DNS Host and point them to the public IP of the server. Step Four: Update the Network Security Group (NSG) To allow public access into the VM via HTTPS, you need to ensure the NSG/Firewall of the VM allow for port 80 and 443. Let’s add these rules into Azure by heading to the VM resources page you created for Open WebUI. Under the “Networking” Section > “Network Settings” > “+ Create port rule” > “Inbound port rule” On the “Destination port ranges” field, type in 443 and Click “Add”. Repeat these steps with port 80. Additionally, to enhance security, you should avoid external users from directly interacting with Open Web UI’s port - port 8080. You should add an inbound deny rule to that port. With that, you should be able to access the Open Web UI from the domain name you setup earlier. Conclusion And just like that, you’ve turned a blank Azure VM into a sleek, secure home for your Open Web UI, no magic required! By combining Docker’s simplicity with Caddy’s “set it and forget it” HTTPS magic, you’ve not only made your app accessible via a custom domain but also locked down security by closing off risky ports and keeping traffic encrypted. Azure’s cloud muscle handles the heavy lifting, while you get to enjoy the perks of a pro setup without the headache. If you are interested in using AI models deployed on Azure AI Foundry on OpenWeb UI via API, kindly read my other article: Step-by-step: Integrate Ollama Web UI to use Azure Open AI API with LiteLLM Proxy3.5KViews1like1CommentLearn How to Build Smarter AI Agents with Microsoft’s MCP Resources Hub
If you've been curious about how to build your own AI agents that can talk to APIs, connect with tools like databases, or even follow documentation you're in the right place. Microsoft has created something called MCP, which stands for Model‑Context‑Protocol. And to help you learn it step by step, they’ve made an amazing MCP Resources Hub on GitHub. In this blog, I’ll Walk you through what MCP is, why it matters, and how to use this hub to get started, even if you're new to AI development. What is MCP (Model‑Context‑Protocol)? Think of MCP like a communication bridge between your AI model and the outside world. Normally, when we chat with AI (like ChatGPT), it only knows what’s in its training data. But with MCP, you can give your AI real-time context from: APIs Documents Databases Websites This makes your AI agent smarter and more useful just like a real developer who looks up things online, checks documentation, and queries databases. What’s Inside the MCP Resources Hub? The MCP Resources Hub is a collection of everything you need to learn MCP: Videos Blogs Code examples Here are some beginner-friendly videos that explain MCP: Title What You'll Learn VS Code Agent Mode Just Changed Everything See how VS Code and MCP build an app with AI connecting to a database and following docs. The Future of AI in VS Code Learn how MCP makes GitHub Copilot smarter with real-time tools. Build MCP Servers using Azure Functions Host your own MCP servers using Azure in C#, .NET, or TypeScript. Use APIs as Tools with MCP See how to use APIs as tools inside your AI agent. Blazor Chat App with MCP + Aspire Create a chat app powered by MCP in .NET Aspire Tip: Start with the VS Code videos if you’re just beginning. Blogs Deep Dives and How-To Guides Microsoft has also written blogs that explain MCP concepts in detail. Some of the best ones include: Build AI agent tools using remote MCP with Azure Functions: Learn how to deploy MCP servers remotely using Azure. Create an MCP Server with Azure AI Agent Service : Enables Developers to create an agent with Azure AI Agent Service and uses the model context protocol (MCP) for consumption of the agents in compatible clients (VS Code, Cursor, Claude Desktop). Vibe coding with GitHub Copilot: Agent mode and MCP support: MCP allows you to equip agent mode with the context and capabilities it needs to help you, like a USB port for intelligence. When you enter a chat prompt in agent mode within VS Code, the model can use different tools to handle tasks like understanding database schema or querying the web. Enhancing AI Integrations with MCP and Azure API Management Enhance AI integrations using MCP and Azure API Management Understanding and Mitigating Security Risks in MCP Implementations Overview of security risks and mitigation strategies for MCP implementations Protecting Against Indirect Injection Attacks in MCP Strategies to prevent indirect injection attacks in MCP implementations Microsoft Copilot Studio MCP Announcement of the Microsoft Copilot Studio MCP lab Getting started with MCP for Beginners 9 part course on MCP Client and Servers Code Repositories Try it Yourself Want to build something with MCP? Microsoft has shared open-source sample code in Python, .NET, and TypeScript: Repo Name Language Description Azure-Samples/remote-mcp-apim-functions-python Python Recommended for Secure remote hosting Sample Python Azure Functions demonstrating remote MCP integration with Azure API Management Azure-Samples/remote-mcp-functions-python Python Sample Python Azure Functions demonstrating remote MCP integration Azure-Samples/remote-mcp-functions-dotnet C# Sample .NET Azure Functions demonstrating remote MCP integration Azure-Samples/remote-mcp-functions-typescript TypeScript Sample TypeScript Azure Functions demonstrating remote MCP integration Microsoft Copilot Studio MCP TypeScript Microsoft Copilot Studio MCP lab You can clone the repo, open it in VS Code, and follow the instructions to run your own MCP server. Using MCP with the AI Toolkit in Visual Studio Code To make your MCP journey even easier, Microsoft provides the AI Toolkit for Visual Studio Code. This toolkit includes: A built-in model catalog Tools to help you deploy and run models locally Seamless integration with MCP agent tools You can install the AI Toolkit extension from the Visual Studio Code Marketplace. Once installed, it helps you: Discover and select models quickly Connect those models to MCP agents Develop and test AI workflows locally before deploying to the cloud You can explore the full documentation here: Overview of the AI Toolkit for Visual Studio Code – Microsoft Learn This is perfect for developers who want to test things on their own system without needing a cloud setup right away. Why Should You Care About MCP? Because MCP: Makes your AI tools more powerful by giving them real-time knowledge Works with GitHub Copilot, Azure, and VS Code tools you may already use Is open-source and beginner-friendly with lots of tutorials and sample code It’s the future of AI development connecting models to the real world. Final Thoughts If you're learning AI or building software agents, don’t miss this valuable MCP Resources Hub. It’s like a starter kit for building smart, connected agents with Microsoft tools. Try one video or repo today. Experiment. Learn by doing and start your journey with the MCP for Beginners curricula.3KViews2likes2CommentsStep-by-step: Integrate Ollama Web UI to use Azure Open AI API with LiteLLM Proxy
Introductions Ollama WebUI is a streamlined interface for deploying and interacting with open-source large language models (LLMs) like Llama 3 and Mistral, enabling users to manage models, test them via a ChatGPT-like chat environment, and integrate them into applications through Ollama’s local API. While it excels for self-hosted models on platforms like Azure VMs, it does not natively support Azure OpenAI API endpoints—OpenAI’s proprietary models (e.g., GPT-4) remain accessible only through OpenAI’s managed API. However, tools like LiteLLM bridge this gap, allowing developers to combine Ollama-hosted models with OpenAI’s API in hybrid workflows, while maintaining compliance and cost-efficiency. This setup empowers users to leverage both self-managed open-source models and cloud-based AI services. Problem Statement As of February 2025, Ollama WebUI, still do not support Azure Open AI API. The Ollama Web UI only support self-hosted Ollama API and managed OpenAI API service (PaaS). This will be an issue if users want to use Open AI models they already deployed on Azure AI Foundry. Objective To integrate Azure OpenAI API via LiteLLM proxy into with Ollama Web UI. LiteLLM translates Azure AI API requests into OpenAI-style requests on Ollama Web UI allowing users to use OpenAI models deployed on Azure AI Foundry. If you haven’t hosted Ollama WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Ollama WebUI deployed already. Step 1: Deploy OpenAI models on Azure Foundry. If you haven’t created an Azure AI Hub already, search for Azure AI Foundry on Azure, and click on the “+ Create” button > Hub. Fill out all the empty fields with the appropriate configuration and click on “Create”. After the Azure AI Hub is successfully deployed, click on the deployed resources and launch the Azure AI Foundry service. To deploy new models on Azure AI Foundry, find the “Models + Endpoints” section on the left hand side and click on “+ Deploy Model” button > “Deploy base model” A popup will appear, and you can choose which models to deploy on Azure AI Foundry. Please note that the o-series models are only available to select customers at the moment. You can request access to the o-series models by completing this request access form, and wait until Microsoft approves the access request. Click on “Confirm” and another popup will emerge. Now name the deployment and click on “Deploy” to deploy the model. Wait a few moments for the model to deploy. Once it successfully deployed, please save the “Target URI” and the API Key. Step 2: Deploy LiteLLM Proxy via Docker Container Before pulling the LiteLLM Image into the host environment, create a file named “litellm_config.yaml” and list down the models you deployed on Azure AI Foundry, along with the API endpoints and keys. Replace "API_Endpoint" and "API_Key" with “Target URI” and “Key” found from Azure AI Foundry respectively. Template for the “litellm_config.yaml” file. model_list: - model_name: [model_name] litellm_params: model: azure/[model_name_on_azure] api_base: "[API_ENDPOINT/Target_URI]" api_key: "[API_Key]" api_version: "[API_Version]" Tips: You can find the API version info at the end of the Target URI of the model's endpoint: Sample Endpoint - https://example.openai.azure.com/openai/deployments/o1-mini/chat/completions?api-version=2024-08-01-preview Run the docker command below to start LiteLLM Proxy with the correct settings: docker run -d \ -v $(pwd)/litellm_config.yaml:/app/config.yaml \ -p 4000:4000 \ --name litellm-proxy-v1 \ --restart always \ ghcr.io/berriai/litellm:main-latest \ --config /app/config.yaml --detailed_debug Make sure to run the docker command inside the directory where you created the “litellm_config.yaml” file just now. The port used to listen for LiteLLM Proxy traffic is port 4000. Now that LiteLLM proxy had been deployed on port 4000, lets change the OpenAI API settings on Ollama WebUI. Navigate to Ollama WebUI’s Admin Panel settings > Settings > Connections > Under the OpenAI API section, write http://127.0.0.1:4000 as the API endpoint and set any key (You must write anything to make it work!). Click on “Save” button to reflect the changes. Refresh the browser and you should be able to see the AI models deployed on the Azure AI Foundry listed in the Ollama WebUI. Now let’s test the chat completion + Web Search capability using the "o1-mini" model on Ollama WebUI. Conclusion Hosting Ollama WebUI on an Azure VM and integrating it with OpenAI’s API via LiteLLM offers a powerful, flexible approach to AI deployment, combining the cost-efficiency of open-source models with the advanced capabilities of managed cloud services. While Ollama itself doesn’t support Azure OpenAI endpoints, the hybrid architecture empowers IT teams to balance data privacy (via self-hosted models on Azure AI Foundry) and cutting-edge performance (using Azure OpenAI API), all within Azure’s scalable ecosystem. This guide covers every step required to deploy your OpenAI models on Azure AI Foundry, set up the required resources, deploy LiteLLM Proxy on your host machine and configure Ollama WebUI to support Azure AI endpoints. You can test and improve your AI model even more with the Ollama WebUI interface with Web Search, Text-to-Image Generation, etc. all in one place.10KViews1like4CommentsMicrosoft AI Agents Learn Live Starting 15th April
Join us for an exciting Learn Live webinar where we dive into the fundamentals of using Azure AI Foundry and AI Agents. The series is to help you build powerful Agent applications. This learn live series will help you understand the AI agents, including when to use them and how to build them, using Azure AI Agent Service and Semantic Kernel Agent Framework. By the end of this learning series, you will have the skills needed to develop AI agents on Azure. This sessions will introduce you to AI agents, the next frontier in intelligent applications and explore how they can be developed and deployed on Microsoft Azure. Through this webinar, you'll gain essential skills to begin creating agents with the Azure AI Agent Service. We'll also discuss how to take your agents to the next level by integrating custom tools, allowing you to extend their capabilities beyond built-in functionalities to better meet your specific needs. Don't miss this opportunity to gain hands-on knowledge and insights from experts in the field. Register now and start your journey into building intelligent agents on Azure Register NOW Learn Live: Master the Skills to Create AI Agents | Microsoft Reactor Plan and Prepare to Develop AI Solution on Azure Microsoft Azure offers multiple services that enable developers to build amazing AI-powered solutions. Proper planning and preparation involves identifying the services you'll use and creating an optimal working environment for your development team. Learning objectives By the end of this module, you'll be able to: Identify common AI capabilities that you can implement in applications Describe Azure AI Services and considerations for using them Describe Azure AI Foundry and considerations for using it Identify appropriate developer tools and SDKs for an AI project Describe considerations for responsible AI Format: Livestream Topic: Core AI Language: English Details Fundamentals of AI agents on Azure AI agents represent the next generation of intelligent applications. Learn how they can be developed and used on Microsoft Azure. Learning objectives By the end of this module, you'll be able to: Describe core concepts related to AI agents Describe options for agent development Create and test an agent in the Azure AI Foundry portal Format: Livestream Topic: Core AI Language: English Details Develop an AI agent with Azure AI Agent Service This module provides engineers with the skills to begin building agents with Azure AI Agent Service. Learning objectives By the end of this module, you'll be able to: Describe the purpose of AI agents Explain the key features of Azure AI Agent Service Build an agent using the Azure AI Agent Service Integrate an agent in the Azure AI Agent Service into your own application Format: Livestream Topic: Core AI Language: English Details Integrate custom tools into your agent Built-in tools are useful, but they may not meet all your needs. In this module, learn how to extend the capabilities of your agent by integrating custom tools for your agent to use. Learning objectives By the end of this module, you'll be able to: Describe the benefits of using custom tools with your agent. Explore the different options for custom tools. Build an agent that integrates custom tools using the Azure AI Agent Service. Format: Livestream Topic: Core AI Language: English Details Develop an AI agent with Semantic Kernel - Training | Microsoft Learn By the end of this module, you'll be able to: Use Semantic Kernel to connect to an Azure AI Foundry project Create Azure AI Agent Service agents using the Semantic Kernel SDK Integrate plugin functions with your AI agent Develop an AI agent with Semantic Kernel Format: Livestream Topic: Core AI Language: English Details Details Orchestrate a multi-agent solution using Semantic Kernel Learn how to use the Semantic Kernel SDK to develop your own AI agents that can collaborate for a multi-agent solution. Learning objectives By the end of this module, you'll be able to: Build AI agents using the Semantic Kernel SDK Develop multi-agent solutions Create custom selection and termination strategies for agent collaboration Format: Livestream Topic: Core AI Language: English Details1.3KViews3likes0Comments