ai foundry
6 TopicsEdge AI for Beginners : Getting Started with Foundry Local
In Module 08 of the EdgeAI for Beginners course, Microsoft introduces Foundry Local a toolkit that helps you deploy and test Small Language Models (SLMs) completely offline. In this blog, I’ll share how I installed Foundry Local, ran the Phi-3.5-mini model on my windows laptop, and what I learned through the process. What Is Foundry Local? Foundry Local allows developers to run AI models locally on their own hardware. It supports text generation, summarization, and code completion — all without sending data to the cloud. Unlike cloud-based systems, everything happens on your computer, so your data never leaves your device. Prerequisites Before starting, make sure you have: Windows 10 or 11 Python 3.10 or newer Git Internet connection (for the first-time model download) Foundry Local installed Step 1 — Verify Installation After installing Foundry Local, open Command Prompt and type: foundry --version If you see a version number, Foundry Local is installed correctly. Step 2 — Start the Service Start the Foundry Local service using: foundry service start You should see a confirmation message that the service is running. Step 3 — List Available Models To view the models supported by your system, run: foundry model list You’ll get a list of locally available SLMs. Here’s what I saw on my machine: Note: Model availability depends on your device’s hardware. For most laptops, phi-3.5-mini works smoothly on CPU. Step 4 — Run the Phi-3.5 Model Now let’s start chatting with the model: foundry model run phi-3.5-mini-instruct-generic-cpu:1 Once it loads, you’ll enter an interactive chat mode. Try a simple prompt: Hello! What can you do? The model replies instantly — right from your laptop, no cloud needed. To exit, type: /exit How It Works Foundry Local loads the model weights from your device and performs inference locally.This means text generation happens using your CPU (or GPU, if available). The result: complete privacy, no internet dependency, and instant responses. Benefits for Students For students beginning their journey in AI, Foundry Local offers several key advantages: No need for high-end GPUs or expensive cloud subscriptions. Easy setup for experimenting with multiple models. Perfect for class assignments, AI workshops, and offline learning sessions. Promotes a deeper understanding of model behavior by allowing step-by-step local interaction. These factors make Foundry Local a practical choice for learning environments, especially in universities and research institutions where accessibility and affordability are important. Why Use Foundry Local Running models locally offers several practical benefits compared to using AI Foundry in the cloud. With Foundry Local, you do not need an internet connection, and all computations happen on your personal machine. This makes it faster for small models and more private since your data never leaves your device. In contrast, AI Foundry runs entirely on the cloud, requiring internet access and charging based on usage. For students and developers, Foundry Local is ideal for quick experiments, offline testing, and understanding how models behave in real-time. On the other hand, AI Foundry is better suited for large-scale or production-level scenarios where models need to be deployed at scale. In summary, Foundry Local provides a flexible and affordable environment for hands-on learning, especially when working with smaller models such as Phi-3, Qwen2.5, or TinyLlama. It allows you to experiment freely, learn efficiently, and better understand the fundamentals of Edge AI development. Optional: Restart Later Next time you open your laptop, you don’t have to reinstall anything. Just run these two commands again: foundry service start foundry model run phi-3.5-mini-instruct-generic-cpu:1 What I Learned Following the EdgeAI for Beginners Study Guide helped me understand: How edge AI applications work How small models like Phi 3.5 can run on a local machine How to test prompts and build chat apps with zero cloud usage Conclusion Running the Phi-3.5-mini model locally with Foundry Localgave me hands-on insight into edge AI. It’s an easy, private, and cost-free way to explore generative AI development. If you’re new to Edge AI, start with the EdgeAI for Beginners course and follow its Study Guide to get comfortable with local inference and small language models. Resources: EdgeAI for Beginners GitHub Repo Foundry Local Official Site Phi Model Link235Views0likes0CommentsBuilding a Multi-Agent System with Azure AI Agent Service: Campus Event Management
Personal Background My name is Peace Silly. I studied French and Spanish at the University of Oxford, where I developed a strong interest in how language is structured and interpreted. That curiosity about syntax and meaning eventually led me to computer science, which I came to see as another language built on logic and structure. In the academic year 2024–2025, I completed the MSc Computer Science at University College London, where I developed this project as part of my Master’s thesis. Project Introduction Can large-scale event management be handled through a simple chat interface? This was the question that guided my Master’s thesis project at UCL. As part of the Industry Exchange Network (IXN) and in collaboration with Microsoft, I set out to explore how conversational interfaces and autonomous AI agents could simplify one of the most underestimated coordination challenges in campus life: managing events across multiple departments, societies, and facilities. At large universities, event management is rarely straightforward. Rooms are shared between academic timetables, student societies, and one-off events. A single lecture theatre might host a departmental seminar in the morning, a society meeting in the afternoon, and a careers talk in the evening, each relying on different systems, staff, and communication chains. Double bookings, last-minute cancellations, and maintenance issues are common, and coordinating changes often means long email threads, manual spreadsheets, and frustrated users. These inefficiencies do more than waste time; they directly affect how a campus functions day to day. When venues are unavailable or notifications fail to reach the right people, even small scheduling errors can ripple across entire departments. A smarter, more adaptive approach was needed, one that could manage complex workflows autonomously while remaining intuitive and human for end users. The result was the Event Management Multi-Agent System, a cloud-based platform where staff and students can query events, book rooms, and reschedule activities simply by chatting. Behind the scenes, a network of Azure-powered AI agents collaborates to handle scheduling, communication, and maintenance in real time, working together to keep the campus running smoothly. The user scenario shown in the figure below exemplifies the vision that guided the development of this multi-agent system. Starting with Microsoft Learning Resources I began my journey with Microsoft’s tutorial Build Your First Agent with Azure AI Foundry which introduced the fundamentals of the Azure AI Agent Service and provided an ideal foundation for experimentation. Within a few weeks, using the Azure Foundry environment, I extended those foundations into a fully functional multi-agent system. Azure Foundry’s visual interface was an invaluable learning space. It allowed me to deploy, test, and adjust model parameters such as temperature, system prompts, and function calling while observing how each change influenced the agents’ reasoning and collaboration. Through these experiments, I developed a strong conceptual understanding of orchestration and coordination before moving to the command line for more complex development later. When development issues inevitably arose, I relied on the Discord support community and the GitHub forum for troubleshooting. These communities were instrumental in addressing configuration issues and providing practical examples, ensuring that each agent performed reliably within the shared-thread framework. This early engagement with Microsoft’s learning materials not only accelerated my technical progress but also shaped how I approached experimentation, debugging, and iteration. It transformed a steep learning curve into a structured, hands-on process that mirrored professional software development practice. A Decentralised Team of AI Agents The system’s intelligence is distributed across three specialised agents, powered by OpenAI’s GPT-4.1 models through Azure OpenAI Service. They each perform a distinct role within the event management workflow: Scheduling Agent – interprets natural language requests, checks room availability, and allocates suitable venues. Communications Agent – notifies stakeholders when events are booked, modified, or cancelled. Maintenance Agent – monitors room readiness, posts fault reports when venues become unavailable, and triggers rescheduling when needed. Each agent operates independently but communicates through a shared thread, a transparent message log that serves as the coordination backbone. This thread acts as a persistent state space where agents post updates, react to changes, and maintain a record of every decision. For example, when a maintenance fault is detected, the Maintenance Agent logs the issue, the Scheduling Agent identifies an alternative venue, and the Communications Agent automatically notifies attendees. These interactions happen autonomously, with each agent responding to the evolving context recorded in the shared thread. Interfaces and Backend The system was designed with both developer-focused and user-facing interfaces, supporting rapid iteration and intuitive interaction. The Terminal Interface Initially, the agents were deployed and tested through a terminal interface, which provided a controlled environment for debugging and verifying logic step by step. This setup allowed quick testing of individual agents and observation of their interactions within the shared thread. The Chat Interface As the project evolved, I introduced a lightweight chat interface to make the system accessible to staff and students. This interface allows users to book rooms, query events, and reschedule activities using plain language. Recognising that some users might still want to see what happens behind the scenes, I added an optional toggle that reveals the intermediate steps of agent reasoning. This transparency feature proved valuable for debugging and for more technical users who wanted to understand how the agents collaborated. When a user interacts with the chat interface, they are effectively communicating with the Scheduling Agent, which acts as the primary entry point. The Scheduling Agent interprets natural-language commands such as “Book the Engineering Auditorium for Friday at 2 PM” or “Reschedule the robotics demo to another room.” It then coordinates with the Maintenance and Communications Agents to complete the process. Behind the scenes, the chat interface connects to a FastAPI backend responsible for core logic and data access. A Flask + HTMX layer handles lightweight rendering and interactivity, while the Azure AI Agent Service manages orchestration and shared-thread coordination. This combination enables seamless agent communication and reliable task execution without exposing any of the underlying complexity to the end user. Automated Notifications and Fault Detection Once an event is scheduled, the Scheduling Agent posts the confirmation to the shared thread. The Communications Agent, which subscribes to thread updates, automatically sends notifications to all relevant stakeholders by email. This ensures that every participant stays informed without any manual follow-up. The Maintenance Agent runs routine availability checks. If a fault is detected, it logs the issue to the shared thread, prompting the Scheduling Agent to find an alternative room. The Communications Agent then notifies attendees of the change, ensuring minimal disruption to ongoing events. Testing and Evaluation The system underwent several layers of testing to validate both functional and non-functional requirements. Unit and Integration Tests Backend reliability was evaluated through unit and integration tests to ensure that room allocation, conflict detection, and database operations behaved as intended. Automated test scripts verified end-to-end workflows for event creation, modification, and cancellation across all agents. Integration results confirmed that the shared-thread orchestration functioned correctly, with all test cases passing consistently. However, coverage analysis revealed that approximately 60% of the codebase was tested, leaving some areas such as Azure service integration and error-handling paths outside automated validation. These trade-offs were deliberate, balancing test depth with project scope and the constraints of mocking live dependencies. Azure AI Evaluation While functional testing confirmed correctness, it did not capture the agents’ reasoning or language quality. To assess this, I used Azure AI Evaluation, which measures conversational performance across metrics such as relevance, coherence, fluency, and groundedness. The results showed high scores in relevance (4.33) and groundedness (4.67), confirming the agents’ ability to generate accurate and context-aware responses. However, slightly lower fluency scores and weaker performance in multi-turn tasks revealed a retrieval–execution gap typical in task-oriented dialogue systems. Limitations and Insights The evaluation also surfaced several key limitations: Synthetic data: All tests were conducted with simulated datasets rather than live campus systems, limiting generalisability. Scalability: A non-functional requirement in the form of horizontal scalability was not tested. The architecture supports scaling conceptually but requires validation under heavier load. Despite these constraints, the testing process confirmed that the system was both technically reliable and linguistically robust, capable of autonomous coordination under normal conditions. The results provided a realistic picture of what worked well and what future iterations should focus on improving. Impact and Future Work This project demonstrates how conversational AI and multi-agent orchestration can streamline real operational processes. By combining Azure AI Agent Services with modular design principles, the system automates scheduling, communication, and maintenance while keeping the user experience simple and intuitive. The architecture also establishes a foundation for future extensions: Predictive maintenance to anticipate venue faults before they occur. Microsoft Teams integration for seamless in-chat scheduling. Scalability testing and real-user trials to validate performance at institutional scale. Beyond its technical results, the project underscores the potential of multi-agent systems in real-world coordination tasks. It illustrates how modularity, transparency, and intelligent orchestration can make everyday workflows more efficient and human-centred. Acknowledgements What began with a simple Microsoft tutorial evolved into a working prototype that reimagines how campuses could manage their daily operations through conversation and collaboration. This was both a challenging and rewarding journey, and I am deeply grateful to Professor Graham Roberts (UCL) and Professor Lee Stott (Microsoft) for their guidance, feedback, and support throughout the project.214Views1like0CommentsAI Agents: Key Principles and Guidelines - Part 3
This blog post, the third in a series on AI agents, focuses on user-centric design principles for building effective and trustworthy agentic systems. Drawing from the "Agentic Design Patterns" section of Microsoft's "AI Agents for Beginners" GitHub repository, the post outlines key principles categorized by Agent (Space), Agent (Time), and Agent (Core). These principles emphasize connection, accessibility, leveraging historical context, adapting to future needs, and establishing trust through transparency and control. Practical implementation guidelines are provided, along with a travel agent example to illustrate how these principles can be applied in real-world scenarios. The post also links to additional resources and previous installments in the series for a comprehensive learning experience.2.7KViews1like0CommentsUnleashing the Power of AI Agents: Transforming Business Operations
Let "Get Started with AI Agents," in this short blog I want explore the evolution, capabilities, and applications of AI agents, highlighting their potential to enhance productivity and efficiency. We take a peak into the challenges of developing AI agents and introduce powerful tools like Azure AI Foundry and Azure AI Agent Service that empower developers to build, deploy, and scale AI agents securely and efficiently. In today's rapidly evolving technological landscape, the integration of AI agents into business processes is becoming increasingly essential. Lets delve into the transformative potential of AI agents and how they can revolutionize various aspects of our operations. We begin by exploring the evolution of LLM-based solutions, tracing the journey from no agents to sophisticated multi-agent systems. This progression highlights the growing complexity and capabilities of AI agents, which are now poised to handle wide-scope, complex use cases requiring diverse skills. Lets now look at agentic AI capabilities. AI agents can significantly enhance employee productivity and process efficiency, making our operations faster and more effective. Lets examine the key applications of AI agents across industries, such as travel booking and expense management, employee onboarding, personalized customer support, and data analytics and reporting. However, developing AI agents is not without its challenges. Some of the primary considerations, including tool integration, interoperability, scalability, real-time processing, maintenance, flexibility, error handling, and security. These challenges underscore the need for robust platforms that enable rapid development and secure deployment of AI agents. To this end, we introduce Azure AI Foundry and Azure AI Agent Service. These tools empower developers to build, deploy, and scale AI agents securely and efficiently. Azure AI Foundry offers a comprehensive suite of tools, including model catalogs, content safety features, and machine learning capabilities. The Azure AI Agent Service, currently in public preview, provides flexible model selection, extensive data connections, enterprise-grade security, and rapid development and automation capabilities. When building multi agent or agentic based systems there is a huge importance of multi-agent orchestration. Tools like AutoGen and Semantic Kernel facilitate the orchestration of multi-agent systems, enabling seamless integration and collaboration between different AI agents. In conclusion, the transformative potential of AI agents in driving productivity, efficiency, and innovation. By leveraging the capabilities of Azure AI Foundry and Azure AI Agent Service, we can overcome the challenges of AI agent development and unlock new opportunities for growth and success. Resources Azure AI Discord - https://aka.ms/AzureAI/Discord Global AI community - https://globalai.community Generative AI for beginners – https://aka.ms/genai-beginners AI Agents for beginners - https://aka.ms/ai-agents-beginners Attend one of the Global AI Bootcamp near you - https://globalai.community/bootcamp/ Build AI Tour open content - https://aka.ms/aitour/repos Build your first Agent with Azure AI Agent Service - Slide deck and code - https://github.com/microsoft/aitour-build-your-first-agent-with-azure-ai-agent-service1.2KViews2likes0CommentsUsing Advanced Reasoning Model on EdgeAI Part 1 - Quantization, Conversion, Performance
DeepSeek-R1 is very popular, and it can achieve the same capabilities as OpenAI o1 in advanced reasoning. Microsoft has also added DeepSeek-R1 models to Azure AI Foundry and GitHub Models. We can compare DeepSeek-R1 ith other available models through GitHub Models Playground Note This series revolves around deployment of SLMs to Edge Devices 'Edge AI' we will focus on the deployment advanced reasoning models, with different application scenarios. You can learn more in the following session AI Tour BRK453. In this experiement we want to deploy advanced reasoning models to the edge, so that they can run on edge devices with limited computing power and offline environments. At this time, the recommendation is to use the traditional ONNX model . We can use Microsoft Olive to convert the DeepSeek-R1 Distrill model. Getting started with Microsoft Olive is very straightforward. Install the Microsoft Olive library through the command line and Python 3.10+ (recommended) pip install olive-ai The DeepSeek-R1 Distrill model series has different parameters such as 1.5B, 7B, 8B, 14B, 32B, 70B, etc. This article is mainly based on the 1.5B, 7B, and 14B models (so a Small Language Model). CPU Inference Let's discuss 1.5B and 7B, which are models with lower parameter. We can directly use the CPU as computing for inference to test the effect (hardware environment Azure DevBox, AMD EPYC 7763 64-Core + 64GB Memory + 2T SSD) Quantization conversion olive auto-opt --model_name_or_path <Your DeepSeek-R1-Distill-Qwen-1.5B/7B local location> --output_path <Your Convert ONNX INT4 Model local location> --device cpu --provider CPUExecutionProvider --precision int4 --use_model_builder --log_level 1 You can download it directly from my Hugging face Repo (Note: This model is for testing and has not been fully tested by AI Content Safety or provided as an Offical Model) DeepSeek-R1-Distill-Qwen-1.5B-ONNX-INT4-CPU DeepSeek-R1-Distill-Qwen-7B-ONNX-INT4-CPU Running with ONNX Runtime GenAI Install ONNX Runtime GenAI and ONNX Runtime CPU support libraries pip install onnxruntime-genai pip install onnxruntime Sample Code https://github.com/kinfey/EdgeAIForAdvancedReasoning/blob/main/notebook/demo-1.5b.ipynb https://github.com/kinfey/EdgeAIForAdvancedReasoning/blob/main/notebook/demo-7b.ipynb Performance comparison 1.5B vs 7B We compare two different inference scenarios explain 1+1=2 1.5B quantized ONNX model memory occupied, time consumption and number of tokens generated: 7B quantized ONNX model memory occupied, time consumption and number of tokens generated 2. Find all pairwise different isomorphism groups with order 147 and no elements with order 49 1.5B quantized ONNX model memory occupied, time consumption and number of tokens generated: 7B quantized ONNX model memory occupied, time consumption and number of tokens generated Results of the numbers Through the test, we can see that the 1.5B model of DeepSeek is more suitable for use on CPU inference and can be deployed on traditional PCs or IoT devices. As for 7B, although it has better inference, it is not very effective on CPU operation. GPU Inference It is ideal if we have a GPU on the edge device. We can quantize and convert it to an ONNX model for CPU inference through Microsoft Olive. Of course, it can also be converted to a model for GPU inference. Here I take the 14B DeepSeek-R1-Distill-Qwen-14B as an example and make an inference comparison with Microsoft's Phi-4-14B Quantization conversion olive auto-opt --model_name_or_path <Your Phi-4-14B or DeepSeek-R1-Distill-Qwen-14B local path > --output_path <Your converted Phi-4-14B or DeepSeek-R1-Distill-Qwen-14B local path > --device gpu --provider CUDAExecutionProvider --precision int4 --use_model_builder --log_level 1 You can download it directly from my Hugging face Repo (Note: This model is for testing and has not been fully tested by AI Content Safety and not an Official Model) DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU Phi-4-14B-ONNX-INT4-GPU Running with ONNX Runtime GenAI CUDA Install ONNX Runtime GenAI and ONNX Runtime GPU support libraries pip install onnxruntime-genai-cuda pip install onnxruntime-gpu Compare the results in the GPU environment with Gradio It is recommended to use a GPU with more than 8G memory To increase the comparison of the results, we compare it with Phi-4-14B-ONNX-INT4-GPU and DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU to see the different results. We also show we use OpenAI o1-mini (it is recommended to use o1-mini through GitHub Models), Sample Code https://github.com/kinfey/EdgeAIForAdvancedReasoning/blob/main/notebook/Performance_AdvancedReasoning_ONNX_CPU.ipynb You can test any prompt on Gradio to compare the results of Phi-4-14B-ONNX-INT4-GPU, DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU and OpenAI o1 mini. DeepSeek-R1 reduces the cost of inference models and produces more instructive results on professional problems, but Phi-4-14B also has advantages in reasoning and uses lower computing power to complete inference. As for OpenAI o1 mini, it is more comprehensive and can touch all problems. If you want to deploy to Edge Device, Phi-4-14B and quantized DeepSeek-R1 are good choices for you. This blog is just a simple test and the first in this series. Please share your feedback and continue the discussion in the Microsoft AI Discord Channel. Feel free to me a message or comment. We look forward to sharing more around the opportunity of EdgeAI and more content in this series. Resource DeepSeek-R1 in GitHub Models https://github.com/marketplace/models/azureml-deepseek/DeepSeek-R1 DeepSeek-R1 in Azure AI Foundry https://ai.azure.com/explore/models/DeepSeek-R1/version/1/registry/azureml-deepseek Phi-4-14B in Hugging face https://huggingface.co/microsoft/phi-4 Learn about Microsoft Olive https://github.com/microsoft/olive Learn about ONNX Runtime GenAI https://github.com/microsoft/onnxruntime-genai Microsoft AI Discord Channel BRK453 Exploring cutting-edge models: LLMs, SLMs, local development and more https://aka.ms/aitour/brk453960Views0likes0CommentsTiny But Mighty: Unleashing the Power of Small Language Models 🚀
While Large Language Models (LLMs) like GPT-4 dominate headlines with their extensive capabilities, they often come at the cost of high computational requirements and complexity. For developers and organizations looking to implement AI solutions on edge devices or with limited resources, Small Language Models (SLMs) are emerging as a practical alternative. SLMs are not just "smaller" versions of their larger counterparts—they're designed to be faster, more efficient, and adaptable for specific tasks. With fewer parameters and lower computational needs, SLMs open the door to deploying AI on mobile devices, IoT systems, and edge environments without compromising performance. What You Stand to Learn 🧠 Introduction to Microsoft's AI Ecosystem Discover Microsoft's end-to-end AI development tools, from Azure AI Services to ONNX Runtime, enabling efficient and secure deployment of AI models across cloud and edge environments. The Advantages of SLMs over LLMs SLMs are game-changers for edge AI applications, providing faster training and inference times, reduced energy costs, and scalability across diverse devices. Hands-On with Phi-3 and ONNX Runtime Experience live demonstrations of SLMs in action with tools like Phi-3 and ONNX Runtime, showcasing how to fine-tune and deploy models on mobile devices, IoT, and hybrid cloud environments. Responsible AI Practices Understand how to safeguard your AI applications with Microsoft's Responsible AI toolkit, ensuring ethical and trustworthy deployments. Watch the Full Session 👨💻 📅 Date: December 12, 2024 ⏰ Time: 4 PM GMT | 5 PM CEST | 8 AM PT | 11 AM ET | 7 PM EAT A session packed with live demos, practical examples, and Q&A opportunities. Register NOW | Events | Microsoft Reactor Agenda 🔍 Introduction (5 min) A brief overview of the session and its focus on SLMs and LLMs. Microsoft AI Tooling (5 min) Explore the latest tools like Azure AI Services, Azure Machine Learning, and Responsible AI Tooling. How to Choose the Right Model (10 min) Key considerations such as performance, customizability, and ethical implications. Comparing SLMs vs LLMs (10 min) The strengths, weaknesses, and best use cases for both Small and Large Language Models. Deploying Models at the Edge (10 min) Insights into optimizing AI for mobile, IoT, and edge devices. Q&A Addressing participant questions about AI development and deployment.450Views2likes0Comments