azure
3403 TopicsUnleashing the Power of Model Context Protocol (MCP): A Game-Changer in AI Integration
Artificial Intelligence is evolving rapidly, and one of the most pressing challenges is enabling AI models to interact effectively with external tools, data sources, and APIs. The Model Context Protocol (MCP) solves this problem by acting as a bridge between AI models and external services, creating a standardized communication framework that enhances tool integration, accessibility, and AI reasoning capabilities. What is Model Context Protocol (MCP)? MCP is a protocol designed to enable AI models, such as Azure OpenAI models, to interact seamlessly with external tools and services. Think of MCP as a universal USB-C connector for AI, allowing language models to fetch information, interact with APIs, and execute tasks beyond their built-in knowledge. Key Features of MCP Standardized Communication – MCP provides a structured way for AI models to interact with various tools. Tool Access & Expansion – AI assistants can now utilize external tools for real-time insights. Secure & Scalable – Enables safe and scalable integration with enterprise applications. Multi-Modal Integration – Supports STDIO, SSE (Server-Sent Events), and WebSocket communication methods. MCP Architecture & How It Works MCP follows a client-server architecture that allows AI models to interact with external tools efficiently. Here’s how it works: Components of MCP MCP Host – The AI model (e.g., Azure OpenAI GPT) requesting data or actions. MCP Client – An intermediary service that forwards the AI model's requests to MCP servers. MCP Server – Lightweight applications that expose specific capabilities (APIs, databases, files, etc.). Data Sources – Various backend systems, including local storage, cloud databases, and external APIs. Data Flow in MCP The AI model sends a request (e.g., "fetch user profile data"). The MCP client forwards the request to the appropriate MCP server. The MCP server retrieves the required data from a database or API. The response is sent back to the AI model via the MCP client. Integrating MCP with Azure OpenAI Services Microsoft has integrated MCP with Azure OpenAI Services, allowing GPT models to interact with external services and fetch live data. This means AI models are no longer limited to static knowledge but can access real-time information. Benefits of Azure OpenAI Services + MCP Integration ✔ Real-time Data Fetching – AI assistants can retrieve fresh information from APIs, databases, and internal systems. ✔ Contextual AI Responses – Enhances AI responses by providing accurate, up-to-date information. ✔ Enterprise-Ready – Secure and scalable for business applications, including finance, healthcare, and retail. Hands-On Tools for MCP Implementation To implement MCP effectively, Microsoft provides two powerful tools: Semantic Workbench and AI Gateway. Microsoft Semantic Workbench A development environment for prototyping AI-powered assistants and integrating MCP-based functionalities. Features: Build and test multi-agent AI assistants. Configure settings and interactions between AI models and external tools. Supports GitHub Codespaces for cloud-based development. Explore Semantic Workbench Workbench interface examples Microsoft AI Gateway A plug-and-play interface that allows developers to experiment with MCP using Azure API Management. Features: Credential Manager – Securely handle API credentials. Live Experimentation – Test AI model interactions with external tools. Pre-built Labs – Hands-on learning for developers. Explore AI Gateway Setting Up MCP with Azure OpenAI Services Step 1: Create a Virtual Environment First, create a virtual environment using Python: python -m venv .venv Activate the environment: # Windows venv\Scripts\activate # MacOS/Linux source .venv/bin/activate Step 2: Install Required Libraries Create a requirements.txt file and add the following dependencies: langchain-mcp-adapters langgraph langchain-openai Then, install the required libraries: pip install -r requirements.txt Step 3: Set Up OpenAI API Key Ensure you have your OpenAI API key set up: # Windows setx OPENAI_API_KEY "<your_api_key> # MacOS/Linux export OPENAI_API_KEY=<your_api_key> Building an MCP Server This server performs basic mathematical operations like addition and multiplication. Create the Server File First, create a new Python file: touch math_server.py Then, implement the server: from mcp.server.fastmcp import FastMCP # Initialize the server mcp = FastMCP("Math") MCP.tool() def add(a: int, b: int) -> int: return a + b MCP.tool() def multiply(a: int, b: int) -> int: return a * b if __name__ == "__main__": mcp.run(transport="stdio") Your MCP server is now ready to run. Building an MCP Client This client connects to the MCP server and interacts with it. Create the Client File First, create a new file: touch client.py Then, implement the client: import asyncio from mcp import ClientSession, StdioServerParameters from langchain_openai import ChatOpenAI from mcp.client.stdio import stdio_client # Define server parameters server_params = StdioServerParameters( command="python", args=["math_server.py"], ) # Define the model model = ChatOpenAI(model="gpt-4o") async def run_agent(): async with stdio_client(server_params) as (read, write): async with ClientSession(read, write) as session: await session.initialize() tools = await load_mcp_tools(session) agent = create_react_agent(model, tools) agent_response = await agent.ainvoke({"messages": "what's (4 + 6) x 14?"}) return agent_response["messages"][3].content if __name__ == "__main__": result = asyncio.run(run_agent()) print(result) Your client is now set up and ready to interact with the MCP server. Running the MCP Server and Client Step 1: Start the MCP Server Open a terminal and run: python math_server.py This starts the MCP server, making it available for client connections. Step 2: Run the MCP Client In another terminal, run: python client.py Expected Output 140 This means the AI agent correctly computed (4 + 6) x 14 using both the MCP server and GPT-4o. Conclusion Integrating MCP with Azure OpenAI Services enables AI applications to securely interact with external tools, enhancing functionality beyond text-based responses. With standardized communication and improved AI capabilities, developers can build smarter and more interactive AI-powered solutions. By following this guide, you can set up an MCP server and client, unlocking the full potential of AI with structured external interactions. Next Steps: Explore more MCP tools and integrations. Extend your MCP setup to work with additional APIs. Deploy your solution in a cloud environment for broader accessibility. For further details, visit the GitHub repository for MCP integration examples and best practices. MCP GitHub Repository MCP Documentation Semantic Workbench AI Gateway MCP Video Walkthrough MCP Blog MCP Github End to End Demo62KViews11likes6CommentsUnderstanding Azure OpenAI Service Quotas and Limits: A Beginner-Friendly Guide
Azure OpenAI Service allows developers, researchers, and students to integrate powerful AI models like GPT-4, GPT-3.5, and DALL·E into their applications. But with great power comes great responsibility and limits. Before you dive into building your next AI-powered solution, it's crucial to understand how quotas and limits work in the Azure OpenAI ecosystem. This guide is designed to help students and beginners easily understand the concept of quotas, limits, and how to manage them effectively. What Are Quotas and Limits? Think of Azure's quotas as your "AI data pack." It defines how much you can use the service. Meanwhile, limits are hard boundaries set by Azure to ensure fair use and system stability. Quota The maximum number of resources (e.g., tokens, requests) allocated to your Azure subscription. Limit The technical cap imposed by Azure on specific resources (e.g., number of files, deployments). Key Metrics: TPM & RPM Tokens Per Minute (TPM) TPM refers to how many tokens you can use per minute across all your requests in each region. A token is a chunk of text. For example, the word "Hello" is 1 token, but "Understanding" might be 2 tokens. Each model has its own default TPM. Example: GPT-4 might allow 240,000 tokens per minute. You can split this quota across multiple deployments. Requests Per Minute (RPM) RPM defines how many API requests you can make every minute. For instance, GPT-3.5-turbo might allow 350 RPM. DALL·E image generation models might allow 6 RPM. Deployment, File, and Training Limits Here are some standard limits imposed on your OpenAI resource: Resource Type Limit Standard model deployments 32 Fine-tuned model deployments 5 Training jobs 100 total per resource (1 active at a time) Fine-tuning files 50 files (total size: 1 GB) Max prompt tokens per request Varies by model (e.g., 4096 tokens for GPT-3.5) How to View and Manage Your Quota Step-by-Step: Go to the Azure Portal. Navigate to your Azure OpenAI resource. Click on "Usage + quotas" in the left-hand menu. You will see TPM, RPM, and current usage status. To Request More Quota: In the same "Usage + quotas" panel, click on "Request quota increase". Fill in the form: Select the region. Choose the model family (e.g., GPT-4, GPT-3.5). Enter the desired TPM and RPM values. Submit and wait for Azure to review and approve. What is Dynamic Quota? Sometimes, Azure gives you extra quota based on demand and availability. “Dynamic quota” is not guaranteed and may increase or decrease. Useful for short-term spikes but should not be relied on for production apps. Example: During weekends, your GPT-3.5 TPM may temporarily increase if there's less traffic in your region. Best Practices for Students Monitor Regularly: Use the Azure Portal to keep an eye on your usage. Batch Requests: Combine multiple tasks in one API call to save tokens. Start Small: Begin with GPT-3.5 before requesting GPT-4 access. Plan Ahead: If you're preparing a demo or a project, request quota in advance. Handle Limits Gracefully: Code should manage 429 Too Many Requests errors. Quick Resources Azure OpenAI Quotas and Limits How to Request Quota in Azure Join the Conversation on Azure AI Foundry Discussions! Have ideas, questions, or insights about AI? Don't keep them to yourself! Share your thoughts, engage with experts, and connect with a community that’s shaping the future of artificial intelligence. 🧠✨ 👉 Click here to join the discussion!2.8KViews0likes0CommentsGetting Started with the AI Toolkit: A Beginner’s Guide with Demos and Resources
If you're curious about building AI solutions but don’t know where to start, Microsoft’s AI Toolkit is a great place to begin. Whether you’re a student, developer, or just someone exploring AI for the first time, this toolkit helps you build real-world solutions using Microsoft’s powerful AI services. In this blog, I’ll Walk you through what the AI Toolkit is, how you can get started, and where you can find helpful demos and ready-to-use code samples. What is the AI Toolkit? The AI Toolkit is a collection of tools, templates, and sample apps that make it easier to build AI-powered applications and copilots using Microsoft Azure. With the AI Toolkit, you can: Build intelligent apps without needing deep AI expertise. Use templates and guides that show you how everything works. Quickly prototype and deploy apps with natural language, speech, search, and more. Watch the AI Toolkit in Action Microsoft has created a video playlist that covers the AI Toolkit and shows you how to build apps step-by-step. You can watch the full playlist here: It is especially useful for developers who want to bring AI into their projects, but also for beginners who want to learn by doing. AI Toolkit Playlist – https://aka.ms/AIToolkit/videos These videos help you understand the flow of building AI agents, using Azure OpenAI, and other cognitive services in a hands-on way. Explore Sample Projects on GitHub Microsoft also provides a public GitHub repository where you can find real code examples built using the AI Toolkit. Here’s the GitHub repo: AI Toolkit Samples – https://github.com/Azure-Samples/AI_Toolkit_Samples This repository includes: Sample apps using Azure AI services like OpenAI, Cognitive Search, and Speech. Instructions to deploy apps using Azure. Code that you can clone, test, and build on top of. You don’t have to start from scratch just open the code, understand the structure, and make small edits to experiment. How to Get Started Here’s a simple path if you’re just starting: Watch 2 or 3 videos from the AI Toolkit Playlist. Go to the GitHub repository and try running one of the examples. Make small changes to the code (like updating the prompt or output). Try deploying the solution on Azure by following the guide in the repo. Keep building and learning. Why This Toolkit is Worth Exploring As someone who is also learning and experimenting, I found this toolkit to be: Easy to understand, even for beginners. Focused on real-world applications, not just theory. Helpful for building responsible AI solutions with good documentation. It gives a complete picture — from writing code to deploying apps. Final Thoughts The AI Toolkit helps you start your journey in AI without feeling overwhelmed. It provides real code, real use cases, and practical demos. With the support of Microsoft Learn and Azure samples, you can go from learning to building in no time. If you’re serious about building with AI, this is a resource worth exploring. Continue the discussion in the Azure AI Foundry Discord community at Https://aka.ms/AI/discord Join the Azure AI Foundry Discord Server! References AI Toolkit Playlist (YouTube) https://aka.ms/AIToolkit/videos AI Toolkit GitHub Repository https://github.com/Azure-Samples/AI_Toolkit_Samples Microsoft Learn: AI Toolkit Documentation https://learn.microsoft.com/en-us/azure/ai-services/toolkit/ Azure AI Services https://azure.microsoft.com/en-us/products/ai-services/1.8KViews0likes0CommentsModel Mondays S2E11: Exploring Speech AI in Azure AI Foundry
1. Weekly Highlights This week’s top news in the Azure AI ecosystem included: Lakuna — Copilot Studio Agent for Product Teams: A hackathon project built with Copilot Studio and Azure AI Foundry, Lakuna analyzes your requirements and docs to surface hidden assumptions, helping teams reflect, test, and reduce bias in product planning. Azure ND H200 v5 VMs for AI: Azure Machine Learning introduced ND H200 v5 VMs, featuring NVIDIA H200 GPUs (over 1TB GPU memory per VM!) for massive models, bigger context windows, and ultra-fast throughput. Agent Factory Blog Series: The next wave of agentic AI is about extensibility: plug your agents into hundreds of APIs and services using Model Connector Protocol (MCP) for portable, reusable tool integrations. GPT-5 Tool Calling on Azure AI Foundry: GPT-5 models now support free-form tool calling—no more rigid JSON! Output SQL, Python, configs, and more in your preferred format for natural, flexible workflows. Microsoft a Leader in 2025 Gartner Magic Quadrant: Azure was again named a leader for Cloud Native Application Platforms—validating its end-to-end runway for AI, microservices, DevOps, and more. 2. Spotlight On: Azure AI Foundry Speech Playground The main segment featured a live demo of the new Azure AI Speech Playground (now part of Foundry), showing how developers can experiment with and deploy cutting-edge voice, transcription, and avatar capabilities. Key Features & Demos: Speech Recognition (Speech-to-Text): Try real-time transcription directly in the playground—recognizing natural speech, pauses, accents, and domain terms. Batch and Fast transcription options for large files and blob storage. Custom Speech: Fine-tune models for your industry, vocabulary, and noise conditions. Text to Speech (TTS): Instantly convert text into natural, expressive audio in 150+ languages with 600+ neural voices. Demo: Listen to pre-built voices, explore whispering, cheerful, angry, and more styles. Custom Neural Voice: Clone and train your own professional or personal voice (with strict Responsible AI controls). Avatars & Video Translation: Bring your apps to life with prebuilt avatars and video translation, which syncs voice-overs to speakers in multilingual videos. Voice Live API: Voice Live API (Preview) integrates all premium speech capabilities with large language models, enabling real-time, proactive voice agents and chatbots. Demo: Language learning agent with voice, avatars, and proactive engagement. One-click code export for deployment in your IDE. 3. Customer Story: Hilo Health This week’s customer spotlight featured Helo Health—a healthcare technology company using Azure AI to boost efficiency for doctors, staff, and patients. How Hilo Uses Azure AI: Document Management: Automates fax/document filing, splits multi-page faxes by patient, reduces staff effort and errors using Azure Computer Vision and Document Intelligence. Ambient Listening: Ambient clinical note transcription captures doctor-patient conversations and summarizes them for easy EHR documentation. Genie AI Contact Center: Agentic voice assistants handle patient calls, book appointments, answer billing/refill questions, escalate to humans, and assist human agents—using Azure Communication Services, Azure Functions, FastAPI (community), and Azure OpenAI. Conversational Campaigns: Outbound reminders, procedure preps, and follow-ups all handled by voice AI—freeing up human staff. Impact: Hilo reaches 16,000+ physician practices and 180,000 providers, automates millions of communications, and processes $2B+ in payments annually—demonstrating how multimodal AI transforms patient journeys from first call to post-visit care. 4. Key Takeaways Here’s what you need to know from S2E11: Speech AI is Accessible: The Azure AI Foundry Speech Playground makes experimenting with voice recognition, TTS, and avatars easy for everyone. From Playground to Production: Fine-tune, export code, and deploy speech models in your own apps with Azure Speech Service. Responsible AI Built-In: Custom Neural Voice and avatars require application and approval, ensuring ethical, secure use. Agentic AI Everywhere: Voice Live API brings real-time, multimodal voice agents to any workflow. Healthcare Example: Hilo’s use of Azure AI shows the real-world impact of speech and agentic AI, from patient intake to after-visit care. Join the Community: Keep learning and building—join the Discord and Forum. Sharda's Tips: How I Wrote This Blog I organize key moments from each episode, highlight product demos and customer stories, and use GitHub Copilot for structure. For this recap, I tested the Speech Playground myself, explored the docs, and summarized answers to common developer questions on security, dialects, and deployment. Here’s my favorite Copilot prompt this week: "Generate a technical blog post for Model Mondays S2E11 based on the transcript and episode details. Focus on Azure Speech Playground, TTS, avatars, Voice Live API, and healthcare use cases. Add practical links for developers and students!" Coming Up Next Week Next week: Observability! Learn how to monitor, evaluate, and debug your AI models and workflows using Azure and OpenAI tools. Register For The Livestream – Sep 1, 2025 Register For The AMA – Sep 5, 2025 Ask Questions & View Recaps – Discussion Forum About Model Mondays Model Mondays is your weekly Azure AI learning series: 5-Minute Highlights: Latest AI news and product updates 15-Minute Spotlight: Demos and deep dives with product teams 30-Minute AMA Fridays: Ask anything in Discord or the forum Start building: Register For Livestreams Watch Past Replays Register For AMA Recap Past AMAs Join The Community Don’t build alone! The Azure AI Developer Community is here for real-time chats, events, and support: Join the Discord Explore the Forum About Me I'm Sharda, a Gold Microsoft Learn Student Ambassador focused on cloud and AI. Find me on GitHub, Dev.to, Tech Community, and LinkedIn. In this blog series, I share takeaways from each week’s Model Mondays livestream.335Views0likes0CommentsModel Mondays S2E12: Models & Observability
1. Weekly Highlights This week’s top news in the Azure AI ecosystem included: GPT Real Time (GA): Azure AI Foundry now offers GPT Real Time (GA)—lifelike voices, improved instruction following, audio fidelity, and function calling, with support for image context and lower pricing. Read the announcement and check out the model card for more details. Azure AI Translator API (Public Preview): Choose between fast Neural Machine Translation (NMT) or nuanced LLM-powered translations, with real-time flexibility for multilingual workflows. Read the announcement then check out the Azure AI Translator documentation for more details. Azure AI Foundry Agents Learning Plan: Build agents with autonomous goal pursuit, memory, collaboration, and deep fine-tuning (SFT, RFT, DPO) - on Azure AI Foundry. Read the announcement what Agentic AI involves - then follow this comprehensive learning plan with step-by-step guidance. CalcLM Agent Grid (Azure AI Foundry Labs): Project CalcLM: Agent Grid is a prototype and open-source experiment that illustrates how agents might live in a grid-like surface (like Excel). It's formula-first and lightweight - defining agentic workflows like calculations. Try the prototype and visit Foundry Labs to learn more. Agent Factory Blog: Observability in Agentic AI: Agentic AI tools and workflows are gaining rapid adoption in the enterprise. But delivering safe, reliable and performant agents requires foundation support for Observability. Read the 6-part Agent Factory series and check out the Top 5 agent observability best practices for reliable AI blog post for more details. 2. Spotlight On: Observability in Azure AI Foundry This week’s spotlight featured a deep dive and demo by Han Che (Senior PM, Core AI/ Microsoft ), showing observability end-to-end for agent workflows. Why Observability? Ensures AI quality, performance, and safety throughout the development lifecycle. Enables monitoring, root cause analysis, optimization, and governance for agents and models. Key Features & Demos: Development Lifecycle: Leaderboard: Pick the best model for your agent with real-time evaluation. Playground: Chat and prototype agents, view instant quality and safety metrics. Evaluators: Assess quality, risk, safety, intent resolution, tool accuracy, code vulnerability, and custom metrics. Governance: Integrate with partners like Cradle AI and SideDot for policy mapping and evidence archiving. Red Teaming Agent: Automatically test for vulnerabilities and unsafe behavior. CI/CD Integration: Automate evaluation in GitHub Actions and Azure DevOps pipelines. Azure DevOps GitHub Actions Monitoring Dashboard: Resource usage, application analytics, input/output tokens, request latency, cost breakdown, and evaluation scores. Azure Cost Management SDKs & Local Evaluation: Run evaluations locally or in the cloud with the Azure AI Evaluation SDK. Demo Highlights: Chat with a travel planning agent, view run metrics and tool usage. Drill into run details, debugging, and real-time safety/quality scores. Configure and run large-scale agent evaluations in CI/CD pipelines. Compare agents, review statistical analysis, and monitor in production dashboards 3. Customer Story: Saifr Saifr is a RegTech company that uses artificial intelligence to streamline compliance for marketing, communications, and creative teams in regulated industries. Incubated at Fidelity Labs (Fidelity Investments’ innovation arm), Saifr helps enterprises create, review, and approve content that meets regulatory standards—faster and with less manual effort. What Saifr Offers AI-Powered Compliance: Saifr’s platform leverages proprietary AI models trained on decades of regulatory expertise to automatically detect potential compliance risks in text, images, audio, and video. Automated Guardrails: The solution flags risky or non-compliant language, suggests compliant alternatives, and provides explanations—all in real time. Workflow Integration: Saifr seamlessly integrates with enterprise content creation and approval workflows, including cloud platforms and agentic AI systems like Azure AI Foundry. Multimodal Support: Goes beyond text to check images, videos, and audio for compliance risks, supporting modern marketing and communications teams. 4. Key Takeaways Observability is Essential: Azure AI Foundry offers complete monitoring, evaluation, tracing, and governance for agentic AI—making production safe, reliable, and compliant. Built-In Evaluation and Red Teaming: Use leaderboards, evaluators, and red teaming agents to assess and continuously improve model safety and quality. CI/CD and Dashboard Integration: Automate evaluations in GitHub Actions or Azure DevOps, then monitor and optimize agents in production with detailed dashboards. Compliance Made Easy: Safer’s agents and models help financial services and regulated industries proactively meet compliance standards for content and communications. Sharda's Tips: How I Wrote This Blog I focus on organizing highlights, summarizing customer stories, and linking to official Microsoft docs and real working resources. For this recap, I explored the Azure AI Foundry Observability docs, tested CI/CD pipeline integration, and watched the customer demo to share best practices for regulated industries. Here’s my Copilot prompt for this episode: "Generate a technical blog post for Model Mondays S2E12 based on the transcript and episode details. Focus on observability, agent dashboards, CI/CD, compliance, and customer stories. Add correct, working Microsoft links!" Coming Up Next Week Next week: Open Source Models! Join us for the final episode with Hugging Face VP of Product, live demos, and open model workflows. Register For The Livestream – Sep 15, 2025 About Model Mondays Model Mondays is your weekly Azure AI learning series: 5-Minute Highlights: Latest AI news and product updates 15-Minute Spotlight: Demos and deep dives with product teams 30-Minute AMA Fridays: Ask anything in Discord or the forum Start building: Watch Past Replays Register For AMA Recap Past AMAs Join The Community Don’t build alone! The Azure AI Developer Community is here for real-time chats, events, and support: Join the Discord Explore the Forum About Me I'm Sharda, a Gold Microsoft Learn Student Ambassador focused on cloud and AI. Find me on GitHub, Dev.to, Tech Community, and LinkedIn. In this blog series, I share takeaways from each week’s Model Mondays livestream.269Views0likes0CommentsModel Mondays S2E13: Open Source Models (Hugging Face)
1. Weekly Highlights 1. Weekly Highlights Here are the key updates we covered in the Season 2 finale: O1 Mini Reinforcement Fine-Tuning (GA): Fine-tune models with as few as ~100 samples using built-in Python code graders. Azure Live Interpreter API (Preview): Real-time speech-to-speech translation supporting 76 input languages and 143 locales with near human-level latency. Agent Factory – Part 5: Connecting agents using open standards like MCP (Model Context Protocol) and A2A (Agent-to-Agent protocol). Ask Ralph by Ralph Lauren: A retail example of agentic AI for conversational styling assistance, built on Azure OpenAI and Foundry’s agentic toolset. VS Code August Release: Brings auto-model selection, stronger safety guards for sensitive edits, and improved agent workflows through new agents.md support. 2. Spotlight – Open Source Models in Azure AI Foundry Guest: Jeff Boudier, VP of Product at Hugging Face Jeff showcased the deep integration between the Hugging Face community and Azure AI Foundry, where developers can access over 10 000 open-source models across multiple modalities—LLMs, speech recognition, computer vision, and even specialized domains like protein modeling and robotics. Demo Highlights Discover models through Azure AI Foundry’s task-based catalog filters. Deploy directly from Hugging Face Hub to Azure with one-click deployment. Explore Use Cases such as multilingual speech recognition and vision-language-action models for robotics. Jeff also highlighted notable models, including: SmoLM3 – a 3 B-parameter model with hybrid reasoning capabilities Qwen 3 Coder – a mixture-of-experts model optimized for coding tasks Parakeet ASR – multilingual speech recognition Microsoft Research protein-modeling collection MAGMA – a vision-language-action model for robotics Integration extends beyond deployment to programmatic access through the Azure CLI and Python SDKs, plus local development via new VS Code extensions. 3. Customer Story – DraftWise (BUILD 2025 Segment) The finale featured a customer spotlight on DraftWise, where CEO James Ding shared how the company accelerates contract drafting with Azure AI Foundry. Problem Legal contract drafting is time-consuming and error-prone. Solution DraftWise uses Azure AI Foundry to fine-tune Hugging Face language models on legal data, generating contract drafts and redline suggestions. Impact Faster drafting cycles and higher consistency Easy model management and deployment with Foundry’s secure workflows Transparent evaluation for legal compliance 4. Community Story – Hugging Face & Microsoft The episode also celebrated the ongoing collaboration between Hugging Face and Microsoft and the impact of open-source AI on the global developer ecosystem. Community Benefits Access to State-of-the-Art Models without licensing barriers Transparent Performance through public leaderboards and benchmarks Rapid Innovation as improvements and bug fixes spread quickly Education & Empowerment via tutorials, docs, and active forums Responsible AI Practices encouraged through community oversight 5. Key Takeaways Open Source AI Is Here to Stay Azure AI Foundry and Hugging Face make deploying, fine-tuning, and benchmarking open models easier than ever. Community Drives Innovation: Collaboration accelerates progress, improves transparency, and makes AI accessible to everyone. Responsible AI and Transparency: Open-source models come with clear documentation, licensing, and community-driven best practices. Easy Deployment & Customization: Azure AI Foundry lets you deploy, automate, and customize open models from a single, unified platform. Learn, Build, Share: The open-model ecosystem is a great place for students, developers, and researchers to learn, build, and share their work. Sharda's Tips: How I Wrote This Blog For this final recap, I focused on capturing the energy of the open source AI movement and the practical impact of Hugging Face and Azure AI Foundry collaboration. I watched the livestream, took notes on the demos and interviews, and linked directly to official resources for models, docs, and community sites. Here’s my Copilot prompt for this episode: "Generate a technical blog post for Model Mondays S2E13 based on the transcript and episode details. Focus on open source models, Hugging Face, Azure AI Foundry, and community workflows. Include practical links and actionable insights for developers and students! Learn & Connect Explore Open Models in Azure AI Foundry Hugging Face Leaderboard Responsible AI in Azure Machine Learning Llama-3 by Meta Hugging Face Community Azure AI Documentation About Model Mondays Model Mondays is your weekly Azure AI learning series: 5-Minute Highlights: Latest AI news and product updates 15-Minute Spotlight: Demos and deep dives with product teams 30-Minute AMA Fridays: Ask anything in Discord or the forum Start building: Watch Past Replays Register For AMA Recap Past AMAs Join The Community Don’t build alone! The Azure AI Developer Community is here for real-time chats, events, and support: Join the Discord Explore the Forum About Me I'm Sharda, a Gold Microsoft Learn Student Ambassador focused on cloud and AI. Find me on GitHub, Dev.to, Tech Community, and LinkedIn. In this blog series, I share takeaways from each week’s Model Mondays livestream.349Views0likes0CommentsEdge AI for Beginners : Getting Started with Foundry Local
In Module 08 of the EdgeAI for Beginners course, Microsoft introduces Foundry Local a toolkit that helps you deploy and test Small Language Models (SLMs) completely offline. In this blog, I’ll share how I installed Foundry Local, ran the Phi-3.5-mini model on my windows laptop, and what I learned through the process. What Is Foundry Local? Foundry Local allows developers to run AI models locally on their own hardware. It supports text generation, summarization, and code completion — all without sending data to the cloud. Unlike cloud-based systems, everything happens on your computer, so your data never leaves your device. Prerequisites Before starting, make sure you have: Windows 10 or 11 Python 3.10 or newer Git Internet connection (for the first-time model download) Foundry Local installed Step 1 — Verify Installation After installing Foundry Local, open Command Prompt and type: foundry --version If you see a version number, Foundry Local is installed correctly. Step 2 — Start the Service Start the Foundry Local service using: foundry service start You should see a confirmation message that the service is running. Step 3 — List Available Models To view the models supported by your system, run: foundry model list You’ll get a list of locally available SLMs. Here’s what I saw on my machine: Note: Model availability depends on your device’s hardware. For most laptops, phi-3.5-mini works smoothly on CPU. Step 4 — Run the Phi-3.5 Model Now let’s start chatting with the model: foundry model run phi-3.5-mini-instruct-generic-cpu:1 Once it loads, you’ll enter an interactive chat mode. Try a simple prompt: Hello! What can you do? The model replies instantly — right from your laptop, no cloud needed. To exit, type: /exit How It Works Foundry Local loads the model weights from your device and performs inference locally.This means text generation happens using your CPU (or GPU, if available). The result: complete privacy, no internet dependency, and instant responses. Benefits for Students For students beginning their journey in AI, Foundry Local offers several key advantages: No need for high-end GPUs or expensive cloud subscriptions. Easy setup for experimenting with multiple models. Perfect for class assignments, AI workshops, and offline learning sessions. Promotes a deeper understanding of model behavior by allowing step-by-step local interaction. These factors make Foundry Local a practical choice for learning environments, especially in universities and research institutions where accessibility and affordability are important. Why Use Foundry Local Running models locally offers several practical benefits compared to using AI Foundry in the cloud. With Foundry Local, you do not need an internet connection, and all computations happen on your personal machine. This makes it faster for small models and more private since your data never leaves your device. In contrast, AI Foundry runs entirely on the cloud, requiring internet access and charging based on usage. For students and developers, Foundry Local is ideal for quick experiments, offline testing, and understanding how models behave in real-time. On the other hand, AI Foundry is better suited for large-scale or production-level scenarios where models need to be deployed at scale. In summary, Foundry Local provides a flexible and affordable environment for hands-on learning, especially when working with smaller models such as Phi-3, Qwen2.5, or TinyLlama. It allows you to experiment freely, learn efficiently, and better understand the fundamentals of Edge AI development. Optional: Restart Later Next time you open your laptop, you don’t have to reinstall anything. Just run these two commands again: foundry service start foundry model run phi-3.5-mini-instruct-generic-cpu:1 What I Learned Following the EdgeAI for Beginners Study Guide helped me understand: How edge AI applications work How small models like Phi 3.5 can run on a local machine How to test prompts and build chat apps with zero cloud usage Conclusion Running the Phi-3.5-mini model locally with Foundry Localgave me hands-on insight into edge AI. It’s an easy, private, and cost-free way to explore generative AI development. If you’re new to Edge AI, start with the EdgeAI for Beginners course and follow its Study Guide to get comfortable with local inference and small language models. Resources: EdgeAI for Beginners GitHub Repo Foundry Local Official Site Phi Model Link917Views1like0CommentsAzure AI Model Inference API
The Azure AI Model Inference API provides a unified interface for developers to interact with various foundational models deployed in Azure AI Studio. This API allows developers to generate predictions from multiple models without changing their underlying code. By providing a consistent set of capabilities, the API simplifies the process of integrating and switching between different models, enabling seamless model selection based on task requirements.4.5KViews0likes2CommentsModel Mondays S2:E7 · AI-Assisted Azure Development
Welcome to Episode 7! This week, we explore how AI is transforming Azure development. We’ll break down two key tools—Azure MCP Server and GitHub Copilot for Azure—and see how they make working with Azure resources easier for everyone. We’ll also look at a real customer story from SightMachine, showing how AI streamlines manufacturing operations.383Views0likes0CommentsHow to use Instance Mix with Azure Virtual Machine Scale Sets
When a Virtual Machine Scale Set needs to scale out, it normally uses a single VM size (also known as SKU). Whilst that is simple, it can also become a constraint. For example the scale out might be limited if that specific size has limited regional capacity, quota pressure, or a price profile that is not ideal for every scale-out event. Instance Mix for Azure Virtual Machine Scale Sets allows you to configure your scale out with more options by define multiple compatible VM sizes in a single scale set. When scaling out, Azure can choose from your list during provisioning based on the allocation strategy you select. What is Instance Mix? Instance Mix is a Virtual Machine Scale Sets capability that lets you specify up to five VM sizes for a single scale set that uses Flexible orchestration mode. Instead of setting the scale set to a single size such as Standard_D2s_v5 , you set the scale set SKU name to Mix and define the real VM sizes in the skuProfile . At provisioning time, Azure uses the skuProfile.vmSizes list and the selected allocation strategy to decide which VM size to deploy. The high-level model looks like this: { "sku": { "name": "Mix", "capacity": 2 }, "properties": { "skuProfile": { "vmSizes": [ { "name": "Standard_D2s_v5" }, { "name": "Standard_D2as_v5" }, { "name": "Standard_D2s_v4" } ], "allocationStrategy": "CapacityOptimized" } } } You can use this if your workload can run on more than one VM size and you want to improve provisioning success, optimize for cost, or align allocation with reservations or savings plans. When should you use Instance Mix? Instance Mix is a good fit when: Your application can run correctly on several similar VM sizes. You want scale-out to have more than one capacity option in a region. You run Spot VMs and want Azure to prefer lower-priced capacity when available. You have reservations for specific sizes, or want to favor VM sizes with better savings-plan economics, and need predictable priority. You want one scale set instead of several separate scale sets for equivalent worker capacity. It is usually best for stateless or horizontally scalable workloads, such as web front ends, API tiers, queue workers, batch workers, and other scale-out services where each instance performs the same role. Avoid using Instance Mix as a way to combine very different machines in one pool. For example, mixing a small general-purpose VM with a much larger memory-optimized VM can make capacity planning, load distribution, and performance troubleshooting harder. Best practice is to use sizes with similar vCPU and memory for balanced load distribution and similar VM types for consistent performance. Key requirements and limits Before you deploy, check these requirements: Requirement Detail Orchestration mode Instance Mix is available only for Virtual Machine Scale Sets using Flexible orchestration mode. Number of VM sizes You can specify up to five VM sizes. VM family support The Instance Mix skuProfile supports A, B, D, E, and F VM families. Architecture Do not mix CPU architectures. For example, do not mix Arm64 and x64 sizes in the same Instance Mix. Storage interface Do not mix incompatible storage interfaces such as SCSI and NVMe. Premium storage capability Do not mix VM SKUs that use premium storage with SKUs that do not. Security profile The selected sizes must use a compatible security profile. Local disk configuration The selected sizes must have a consistent local disk configuration. Quota You must already have quota for the VM sizes you include. Instance Mix does not request quota for you. Unsupported scenarios Instance Mix currently does not support Standby Pools, Azure Dedicated Host, Proximity Placement Groups, on-demand capacity reservations, or diffDiskSettings on the OS disk. Microsoft Learn currently states that all public Azure regions support Instance Mix. VM size availability and quota still vary by region and subscription, so you should check the exact sizes you plan to use before deploying. Choose an allocation strategy Instance Mix supports three allocation strategies. Strategy Best for Behavior LowestPrice Cost-sensitive and fault-tolerant workloads, especially Spot Azure prefers the lowest-priced VM sizes in the list while considering available capacity. It deploys as many of the lowest-priced VMs as capacity allows before moving to higher-priced sizes, so higher-cost sizes may be selected to secure capacity. This is the default strategy if you do not specify one. CapacityOptimized Production workloads where provisioning success is more important than the lowest possible price Azure prioritizes VM sizes that have the highest likelihood of available capacity in the target region. Cost is not considered, and higher-cost sizes may be selected to secure capacity. Prioritized Predictable ordering, reservations, or savings plan alignment Azure follows user-defined rank values on VM sizes when provided, subject to quota and regional capacity. Lower rank values have higher priority. This strategy is currently documented as preview. Rank note: Use rank only with Prioritized . Ranks are optional with Prioritized ; when specified, lower numbers have higher priority, and rank values can be duplicated or non-sequential. Omit ranks for LowestPrice and CapacityOptimized . For many operations teams, CapacityOptimized is a practical starting point for production scale sets because it is designed to improve the chance of successful provisioning. For Spot-heavy or highly cost-sensitive workloads, LowestPrice may be a better first choice. If you have reservations for specific sizes, or want to favor VM sizes with better savings-plan economics, use Prioritized and rank those reservation-backed or preferred sizes first. Walkthru Prereqs The commands below are Bash-specific. Use Azure Cloud Shell in Bash mode or another Bash environment with Azure CLI installed. You need: An Azure subscription. Permission to create resource groups, networking, and Virtual Machine Scale Sets. Azure CLI version 2.66.0 or later. An SSH-capable client if you plan to connect to the Linux instances. Check your Azure CLI version: az version --query '"azure-cli"' --output tsv Sign in and select the subscription you want to use: az login az account set --subscription "<subscription-id-or-name>" Set variables for the example: RG="rg-vmss-instancemix-demo" LOCATION="australiaeast" VMSS="vmss-instancemix-demo" ADMIN_USER="azureuser" You can change LOCATION to another public Azure region if you don't feel like sharing a datacenter rack with drop bears and the occasional kangaroo. Keep the VM sizes in the examples compatible if you change regions or families. Step 1: Check VM size availability in the region Before creating the scale set, check that the candidate VM sizes are available in your target region and subscription. az vm list-skus \ --location "$LOCATION" \ --resource-type virtualMachines \ --size Standard_D \ --all \ --output table For this walkthrough, we will use these D-series sizes: Standard_D2s_v5 Standard_D2as_v5 Standard_D2s_v4 In the az vm list-skus output, check the Restrictions column. If a size is restricted in your region or subscription, choose another compatible size or deploy in a different region. Step 2: Check quota Instance Mix does not automatically request quota. If one size in the mix lacks quota, Azure can try another size from the list that has quota. If none of the eligible sizes has enough quota, deployment or scale-out can fail. Check current regional VM quota and usage: az vm list-usage \ --location "$LOCATION" \ --output table Look for: Total Regional vCPUs The family quota rows that match your selected VM sizes, such as DSv5, DASv5, or DSv4 family vCPUs If quota is too low, either request a quota increase or choose sizes that have available quota in the target region. Step 3: Create a resource group Create a resource group for the demo: az group create \ --name "$RG" \ --location "$LOCATION" Step 4: Create a VM Scale Set with Instance Mix The key settings are: --orchestration-mode Flexible --vm-sku Mix --skuprofile-vmsizes --skuprofile-allocation-strategy Create the scale set: az vmss create \ --resource-group "$RG" \ --name "$VMSS" \ --location "$LOCATION" \ --orchestration-mode Flexible \ --image Ubuntu2204 \ --vm-sku Mix \ --skuprofile-vmsizes Standard_D2s_v5 Standard_D2as_v5 Standard_D2s_v4 \ --skuprofile-allocation-strategy CapacityOptimized \ --instance-count 2 \ --admin-username "$ADMIN_USER" \ --generate-ssh-keys This creates a Flexible orchestration scale set with two instances. Because the scale set uses Instance Mix, the scale set SKU name is Mix , and the allowed VM sizes are stored in skuProfile.vmSizes . Do not expect every VM size in the list to appear immediately. Azure may deploy all initial instances using the same VM size if that satisfies the selected allocation strategy and capacity is available. The point of Instance Mix is to give Azure more eligible choices during provisioning and scaling. Step 5: Verify the Instance Mix configuration View the complete skuProfile : az vmss show \ --resource-group "$RG" \ --name "$VMSS" \ --query "skuProfile" \ --output jsonc Example output: { "allocationStrategy": "CapacityOptimized", "vmSizes": [ { "name": "Standard_D2s_v5" }, { "name": "Standard_D2as_v5" }, { "name": "Standard_D2s_v4" } ] } View only the configured VM sizes: az vmss show \ --resource-group "$RG" \ --name "$VMSS" \ --query "skuProfile.vmSizes[].name" \ --output tsv View only the allocation strategy: az vmss show \ --resource-group "$RG" \ --name "$VMSS" \ --query "skuProfile.allocationStrategy" \ --output tsv List the current instances and their VM sizes. Because Flexible orchestration instances are standard Azure VMs, use az vm list for full instance details and filter by the scale set resource ID: VMSS_ID=$(az vmss show \ --resource-group "$RG" \ --name "$VMSS" \ --query id \ --output tsv) az vm list \ --resource-group "$RG" \ --query "[?virtualMachineScaleSet.id=='$VMSS_ID'].{Name:name, VMSize:hardwareProfile.vmSize, ProvisioningState:provisioningState}" \ --output table Step 6: Scale out and observe allocation Scale the set from two instances to five: az vmss scale \ --resource-group "$RG" \ --name "$VMSS" \ --new-capacity 5 List the instances again: VMSS_ID=$(az vmss show \ --resource-group "$RG" \ --name "$VMSS" \ --query id \ --output tsv) az vm list \ --resource-group "$RG" \ --query "[?virtualMachineScaleSet.id=='$VMSS_ID'].{Name:name, VMSize:hardwareProfile.vmSize, ProvisioningState:provisioningState}" \ --output table You may see one VM size or several VM sizes. The result depends on allocation strategy, quota, regional capacity, and the sizes in your skuProfile . Step 7: Add autoscale Instance Mix does not require autoscale, but it becomes especially useful when a workload scales out under demand. Autoscale asks the scale set to add instances; Instance Mix gives Azure multiple eligible VM sizes to use for those new instances. Create an autoscale profile: az monitor autoscale create \ --resource-group "$RG" \ --resource "$VMSS" \ --resource-type Microsoft.Compute/virtualMachineScaleSets \ --name "vmss-autoscale" \ --min-count 2 \ --max-count 10 \ --count 2 Create a scale-out rule that adds two instances when average CPU is greater than 70 percent for five minutes: az monitor autoscale rule create \ --resource-group "$RG" \ --autoscale-name "vmss-autoscale" \ --condition "Percentage CPU > 70 avg 5m" \ --scale out 2 Create a scale-in rule that removes one instance when average CPU is less than 30 percent for five minutes: az monitor autoscale rule create \ --resource-group "$RG" \ --autoscale-name "vmss-autoscale" \ --condition "Percentage CPU < 30 avg 5m" \ --scale in 1 Tune these thresholds for your workload. Scale out aggressively enough to protect availability, and scale in conservatively enough to avoid unnecessary churn. Step 8: Update the VM sizes in the mix You can change the VM sizes in an existing Instance Mix configuration. The important detail is that updating --skuprofile-vmsizes replaces the full list. It does not add or remove a single size incrementally. For example, this command replaces the mix with four sizes: az vmss update \ --resource-group "$RG" \ --name "$VMSS" \ --skuprofile-vmsizes Standard_D2s_v5 Standard_D2as_v5 Standard_D2s_v4 Standard_D2as_v4 If you want to remove a size, specify the full list of sizes you want to keep. If you want to add a size, specify the full existing list plus the new size. Verify the new list: az vmss show \ --resource-group "$RG" \ --name "$VMSS" \ --query "skuProfile.vmSizes[].name" \ --output tsv Step 9: Change the allocation strategy You can also update the allocation strategy. For example, this changes the walkthrough scale set from its original CapacityOptimized strategy to LowestPrice . Use this as a real update example, not as a blanket production recommendation: az vmss update \ --resource-group "$RG" \ --name "$VMSS" \ --set skuProfile.allocationStrategy=LowestPrice When you change the allocation strategy, existing VMs are not immediately reshaped. The new strategy takes effect after the scale set scales in or out. If you change from Prioritized to another allocation strategy, Microsoft Learn notes that you must first nullify the priority ranks associated with the VM sizes. Step 10: Enable Instance Mix on an existing scale set You can enable Instance Mix on a separate existing scale set if it uses Flexible orchestration mode, does not already use Instance Mix, and the selected VM sizes are compatible. Do not run this procedure against the $VMSS scale set created earlier, because that scale set is already Instance Mix-enabled. The required settings are: Set sku.name to Mix . Set sku.tier explicitly to null unless you have confirmed it is already null or absent. Provide at least one VM size in skuProfile.vmSizes . Provide an allocation strategy, or let Azure use the default lowest-price strategy. Set a separate variable for the existing scale set you want to convert: EXISTING_VMSS="my-existing-flex-vmss" Example: az vmss update \ --resource-group "$RG" \ --name "$EXISTING_VMSS" \ --set sku.name=Mix sku.tier=null \ --skuprofile-vmsizes Standard_D2as_v4 Standard_D2s_v5 Standard_D2as_v5 \ --set skuProfile.allocationStrategy=CapacityOptimized As with other Instance Mix updates, existing VM instances are not necessarily changed immediately. The configuration is used for subsequent scale actions. Portal procedure If you prefer the Azure portal: Go to Virtual machine scale sets. Select Create. On the Basics tab, choose the subscription, resource group, scale set name, region, image, and administrator settings. Set Orchestration mode to Flexible. In the Size section, choose Select up to 5 sizes. Select up to five compatible VM sizes. Choose the Allocation strategy. If you choose Prioritized (preview), a Rank size section appears below Allocation strategy. Select Rank priority to open the prioritization blade and order the VM sizes based on your preferred priority. Complete the remaining tabs for networking, management, health, scaling, and advanced settings. Select Review + create, then create the scale set. After deployment, open the scale set and check the Overview blade. In the Properties section, check Size for the configured VM sizes and Management for the allocation strategy. Operational tips Use similar sizes for predictable application behavior For web, API, and worker tiers, choose sizes with similar vCPU and memory. This makes autoscale behavior easier to reason about and helps load distribution remain balanced. Do not rely on Instance Mix as a quota workaround Instance Mix can use another listed size if one size lacks quota, but it does not request quota for you. Check quota before production deployment, especially if your autoscale maximum is high. Align reservations and savings plans with Prioritized Reserved instance pricing and savings-plan discounts can apply with Instance Mix. If you want to consume reservation-backed sizes first, or favor sizes with better savings-plan economics, use the Prioritized strategy and rank those sizes first. Use Spot Priority Mix for Spot plus Standard Instance Mix can be used with both Spot and Standard VMs. The Instance Mix FAQ says to use Spot Priority Mix when you need a defined split between Spot and Standard capacity. Caveat: Instance Mix supports B-family sizes for regular capacity, but Azure Spot VMs do not support B-series or promo versions of any size. If your scale set will use Spot capacity, choose Spot-supported sizes for the mix. Be careful with unsupported properties If a deployment template includes unsupported properties such as Azure Dedicated Host, on-demand capacity reservation, Standby Pools, Proximity Placement Groups, or OS disk diffDiskSettings , remove those settings before using Instance Mix. Troubleshooting common deployment errors Error code Meaning Fix SkuProfileAllocationStrategyInvalid The allocation strategy is not valid. Use CapacityOptimized , Prioritized , or LowestPrice for the allocation strategy. SkuProfileVMSizesCannotBeNullOrEmpty No VM sizes were provided. Add at least one VM size to skuProfile.vmSizes . SkuProfileHasTooManyVMSizesInRequest More than five VM sizes were specified. Reduce the list to no more than five VM sizes. SkuProfileVMSizesCannotHaveDuplicates The same VM size appears more than once. Remove duplicate VM sizes. SkuNameMustBeMixIfSkuProfileIsSpecified A skuProfile was provided, but sku.name is not Mix . Set sku.name to Mix . SkuTierMustNotBeSetIfSkuProfileIsSpecified sku.tier is set when using Instance Mix. Set sku.tier to null or omit it. SkuProfileScenarioNotSupported The template includes a property not supported with Instance Mix. Remove unsupported properties such as host groups, capacity reservations, or standby pool settings. Instance Mix gives your operations team a practical way to make scale sets more flexible. By defining several compatible VM sizes and selecting the right allocation strategy, you can improve scale-out success, give Azure more capacity choices, and better align VM allocation with cost or reservation goals. For most production workloads, start with similar VM sizes, verify regional availability and quota, use Flexible orchestration mode, and test scale-out behavior before relying on the configuration in a critical environment. Further reading on Microsoft Learn: Use multiple Virtual Machine sizes with instance mix Create a scale set using instance mix Orchestration modes for Virtual Machine Scale Sets Update instance mix settings on an existing scale set Instance Mix FAQ and troubleshooting128Views0likes0Comments