best practices
81 TopicsDemystifying GitHub Copilot Security Controls: easing concerns for organizational adoption
At a recent developer conference, I delivered a session on Legacy Code Rescue using GitHub Copilot App Modernization. Throughout the day, conversations with developers revealed a clear divide: some have fully embraced Agentic AI in their daily coding, while others remain cautious. Often, this hesitation isn't due to reluctance but stems from organizational concerns around security and regulatory compliance. Having witnessed similar patterns during past technology shifts, I understand how these barriers can slow adoption. In this blog, I'll demystify the most common security concerns about GitHub Copilot and explain how its built-in features address them, empowering organizations to confidently modernize their development workflows. GitHub Copilot Model Training A common question I received at the conference was whether GitHub uses your code as training data for GitHub Copilot. I always direct customers to the GitHub Copilot Trust Center for clarity, but the answer is straightforward: “No. GitHub uses neither Copilot Business nor Enterprise data to train the GitHub model.” Notice this restriction also applies to third-party models as well (e.g. Anthropic, Google). GitHub Copilot Intellectual Property indemnification policy A frequent concern I hear is, since GitHub Copilot’s underlying models are trained on sources that include public code, it might simply “copy and paste” code from those sources. Let’s clarify how this actually works: Does GitHub Copilot “copy/paste”? “The AI models that create Copilot’s suggestions may be trained on public code, but do not contain any code. When they generate a suggestion, they are not “copying and pasting” from any codebase.” To provide an additional layer of protection, GitHub Copilot includes a “duplicate detection filter”. This feature helps prevent suggestions that closely match public code from being surfaced. (Note: This duplicate detection currently does not apply to the Copilot coding agent.) More importantly, customers are protected by an Intellectual Property indemnification policy. This means that if you receive an unmodified suggestion from GitHub Copilot and face a copyright claim as a result, Microsoft will defend you in court. GitHub Copilot Data Retention Another frequent question I hear concerns GitHub Copilot’s data retention policies. For organizations on GitHub Copilot Business and Enterprise plans, retention practices depend on how and where the service is accessed from: Access through IDE for Chat and Code Completions: Prompts and Suggestions: Not retained. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. Other GitHub Copilot access and use: Prompts and Suggestions: Retained for 28 days. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. For Copilot Coding Agent, session logs are retained for the life of the account in order to provide the service. Excluding content from GitHub Copilot To prevent GitHub Copilot from indexing sensitive files, you can configure content exclusions at the repository or organization level. In VS Code, use the .copilotignore file to exclude files client-side. Note that files listed in .gitignore are not indexed by default but may still be referenced if open or explicitly referenced (unless they’re excluded through .copilotignore or content exclusions). The life cycle of a GitHub Copilot code suggestion Here are the key protections at each stage of the life cycle of a GitHub Copilot code suggestion: In the IDE: Content exclusions prevent files, folders, or patterns from being included. GitHub proxy (pre-model safety): Prompts go through a GitHub proxy hosted in Microsoft Azure for pre-inference checks: screening for toxic or inappropriate language, relevance, and hacking attempts/jailbreak-style prompts before reaching the model. Model response: With the public code filter enabled, some suggestions are suppressed. The vulnerability protection feature blocks insecure coding patterns like hardcoded credentials or SQL injections in real time. Disable access to GitHub Copilot Free Due to the varying policies associated with GitHub Copilot Free, it is crucial for organizations to ensure it is disabled both in the IDE and on GitHub.com. Since not all IDEs currently offer a built-in option to disable Copilot Free, the most reliable method to prevent both accidental and intentional access is to implement firewall rule changes, as outlined in the official documentation. Agent Mode Allow List Accidental file system deletion by Agentic AI assistants can happen. With GitHub Copilot agent mode, the "Terminal auto approve” setting in VS Code can be used to prevent this. This setting can be managed centrally using a VS Code policy. MCP registry Organizations often want to restrict access to allow only trusted MCP servers. GitHub now offers an MCP registry feature for this purpose. This feature isn’t available in all IDEs and clients yet, but it's being developed. Compliance Certifications The GitHub Copilot Trust Center page lists GitHub Copilot's broad compliance credentials, surpassing many competitors in financial, security, privacy, cloud, and industry coverage. SOC 1 Type 2: Assurance over internal controls for financial reporting. SOC 2 Type 2: In-depth report covering Security, Availability, Processing Integrity, Confidentiality, and Privacy over time. SOC 3: General-use version of SOC 2 with broad executive-level assurance. ISO/IEC 27001:2013: Certification for a formal Information Security Management System (ISMS), based on risk management controls. CSA STAR Level 2: Includes a third-party attestation combining ISO 27001 or SOC 2 with additional cloud control matrix (CCM) requirements. TISAX: Trusted Information Security Assessment Exchange, covering automotive-sector security standards. In summary, while the adoption of AI tools like GitHub Copilot in software development can raise important questions around security, privacy, and compliance, it’s clear that existing safeguards in place help address these concerns. By understanding the safeguards, configurable controls, and robust compliance certifications offered, organizations and developers alike can feel more confident in embracing GitHub Copilot to accelerate innovation while maintaining trust and peace of mind.On‑Device AI with Windows AI Foundry
From “waiting” to “instant”- without sending data away AI is everywhere, but speed, privacy, and reliability are critical. Users expect instant answers without compromise. On-device AI makes that possible: fast, private and available, even when the network isn’t - empowering apps to deliver seamless experiences. Imagine an intelligent assistant that works in seconds, without sending a text to the cloud. This approach brings speed and data control to the places that need it most; while still letting you tap into cloud power when it makes sense. Windows AI Foundry: A Local Home for Models Windows AI Foundry is a developer toolkit that makes it simple to run AI models directly on Windows devices. It uses ONNX Runtime under the hood and can leverage CPU, GPU (via DirectML), or NPU acceleration, without requiring you to manage those details. The principle is straightforward: Keep the model and the data on the same device. Inference becomes faster, and data stays local by default unless you explicitly choose to use the cloud. Foundry Local Foundry Local is the engine that powers this experience. Think of it as local AI runtime - fast, private, and easy to integrate into an app. Why Adopt On‑Device AI? Faster, more responsive apps: Local inference often reduces perceived latency and improves user experience. Privacy‑first by design: Keep sensitive data on the device; avoid cloud round trips unless the user opts in. Offline capability: An app can provide AI features even without a network connection. Cost control: Reduce cloud compute and data costs for common, high‑volume tasks. This approach is especially useful in regulated industries, field‑work tools, and any app where users expect quick, on‑device responses. Hybrid Pattern for Real Apps On-device AI doesn’t replace the cloud, it complements it. Here’s how: Standalone On‑Device: Quick, private actions like document summarization, local search, and offline assistants. Cloud‑Enhanced (Optional): Large-context models, up-to-date knowledge, or heavy multimodal workloads. Design an app to keep data local by default and surface cloud options transparently with user consent and clear disclosures. Windows AI Foundry supports hybrid workflows: Use Foundry Local for real-time inference. Sync with Azure AI services for model updates, telemetry, and advanced analytics. Implement fallback strategies for resource-intensive scenarios. Application Workflow Code Example 1. Only On-Device: Tries Foundry Local first, falls back to ONNX if foundry_runtime.check_foundry_available(): # Use on-device Foundry Local models try: answer = foundry_runtime.run_inference(question, context) return answer, source="Foundry Local (On-Device)" except Exception as e: logger.warning(f"Foundry failed: {e}, trying ONNX...") if onnx_model.is_loaded(): # Fallback to local BERT ONNX model try: answer = bert_model.get_answer(question, context) return answer, source="BERT ONNX (On-Device)" except Exception as e: logger.warning(f"ONNX failed: {e}") return "Error: No local AI available" 2. Hybrid approach: On-device first, cloud as last resort def get_answer(question, context): """ Priority order: 1. Foundry Local (best: advanced + private) 2. ONNX Runtime (good: fast + private) 3. Cloud API (fallback: requires internet, less private) # in case of Hybrid approach, based on real-time scenario """ if foundry_runtime.check_foundry_available(): # Use on-device Foundry Local models try: answer = foundry_runtime.run_inference(question, context) return answer, source="Foundry Local (On-Device)" except Exception as e: logger.warning(f"Foundry failed: {e}, trying ONNX...") if onnx_model.is_loaded(): # Fallback to local BERT ONNX model try: answer = bert_model.get_answer(question, context) return answer, source="BERT ONNX (On-Device)" except Exception as e: logger.warning(f"ONNX failed: {e}, trying cloud...") # Last resort: Cloud API (requires internet) if network_available(): try: import requests response = requests.post( '{BASE_URL_AI_CHAT_COMPLETION}', headers={'Authorization': f'Bearer {API_KEY}'}, json={ 'model': '{MODEL_NAME}', 'messages': [{ 'role': 'user', 'content': f'Context: {context}\n\nQuestion: {question}' }] }, timeout=10 ) answer = response.json()['choices'][0]['message']['content'] return answer, source="Cloud API (Online)" except Exception as e: return "Error: No AI runtime available", source="Failed" else: return "Error: No internet and no local AI available", source="Offline" Demo Project Output: Foundry Local answering context-based questions offline : The Foundry Local engine ran the Phi-4-mini model offline and retrieved context-based data. : The Foundry Local engine ran the Phi-4-mini model offline and mentioned that there is no answer. Practical Use Cases Privacy-First Reading Assistant: Summarize documents locally without sending text to the cloud. Healthcare Apps: Analyze medical data on-device for compliance. Financial Tools: Risk scoring without exposing sensitive financial data. IoT & Edge Devices: Real-time anomaly detection without network dependency. Conclusion On-device AI isn’t just a trend - it’s a shift toward smarter, faster, and more secure applications. With Windows AI Foundry and Foundry Local, developers can deliver experiences that respect user specific data, reduce latency, and work even when connectivity fails. By combining local inference with optional cloud enhancements, you get the best of both worlds: instant performance and scalable intelligence. Whether you’re creating document summarizers, offline assistants, or compliance-ready solutions, this approach ensures your apps stay responsive, reliable, and user-centric. References Get started with Foundry Local - Foundry Local | Microsoft Learn What is Windows AI Foundry? | Microsoft Learn https://devblogs.microsoft.com/foundry/unlock-instant-on-device-ai-with-foundry-local/LangChain v1 is now generally available!
Today LangChain v1 officially launches and marks a new era for the popular AI agent library. The new version ushers in a more streamlined, and extensible foundation for building agentic LLM applications. In this post we'll breakdown what’s new, what changed, and what “general availability” means in practice. Join Microsoft Developer Advocates, Marlene Mhangami and Yohan Lasorsa, to see live demos of the new API and find out more about what JavaScript and Python developers need to know about v1. Register for this event here. Why v1? The Motivation Behind the Redesign The number of abstractions in LangChain had grown over the years to include chains, agents, tools, wrappers, prompt helpers and more, which, while powerful, introduced complexity and fragmentation. As model APIs evolve (multimodal inputs, richer structured output, tool-calling semantics), LangChain needed a cleaner, more consistent core to ensure production ready stability. In v1: All existing chains and agent abstractions in the old LangChain are deprecated; they are replaced by a single high-level agent abstraction built on LangGraph internals. LangGraph becomes the foundational runtime for durable, stateful, orchestrated execution. LangChain now emphasizes being the “fast path to agents” that doesn’t hide but builds upon LangGraph. The internal message format has been upgraded to support standard content blocks (e.g. text, reasoning, citations, tool calls) across model providers, decoupling “content” from raw strings. Namespace cleanup: the langchain package now focuses tightly on core abstractions (agents, models, messages, tools), while legacy patterns are moved into langchain-classic (or equivalents). What’s New & Noteworthy for Developers Here are key changes developers should pay attention to: 1. create_agent becomes the default API The create_agent function is now the idiomatic way to spin up agents in v1. It replaces older constructs (e.g. create_react_agent) with a clearer, more modular API. You can also now compose middleware around model calls, tool calls, before/after hooks, error handling, etc. 2. Standard content blocks & normalized message model One of LangChain's greatest stregnth's is it's model agnosticism. Content blocks move to standardize all outputs, so developers know exactly what to expect regardless of the model they are using. Responses from models are no longer opaque strings. Instead, they carry structured `content_blocks` which classify parts of the output (e.g. “text”, “reasoning”, “citation”, “tool_call”). 3. Multimodal and richer model inputs / outputs LangChain continues to support more than just text-based interactions, but in a more comprehensive way in v1. Models can accept and return files, images, video, etc., and the message format reflects this flexibility. This upgrade prepares us well for the next generation of models with mixed modalities (vision, audio, etc.). 4. Middleware hooks Because create_agent is designed as a pluggable pipeline, developers can now inject logic before/after model calls, before tool calls and more. New middleware such as 'human in the loop' and 'summarization' middleware have been added. This is a feature of the new package that I am most excited about it! Even with the simplified agents API, this option provides more room to customize workflows! Developers can try pre-built middleware or make their own. 5. Simplified, leaner namespace Many formerly top-level modules or helper classes have been removed or relocated to langchain-classic (or similarly stamped “legacy”) to declutter the main API surface. A migration guide is available to help projects transition from v0 to v1. While v1 is now the main line, older v0 is still documented and maintained for compatibility. What “General Availability” Means (and Doesn’t) v1 is production-ready, after testing the alpha version. The stable v0 release line remains supported for those unwilling or unable to migrate immediately. Breaking changes in public APIs will be accompanied by version bumps (i.e. minor version increments) and deprecation notices. The roadmap anticipates minor versions every 2–3 months (with patch releases more frequently). Because the field of LLM applications is evolving rapidly, the team expects continued iterations in v1—even in GA mode—with users encouraged to surface feedback, file issues, and adopt the migration path. (This is in line with the philosophy stated in docs.) Developer Callouts & Suggested Steps Some things we recommend for developers to do to get started with v1: Try the new API Now! LangChain Azure AI and Azure OpenAI have migrated to LangChain v1 and are ready to test! Learn more about using LangChain and Azure AI: Python: https://docs.langchain.com/oss/python/integrations/providers/azure_ai JavaScript: https://docs.langchain.com/oss/javascript/integrations/providers/microsoft Join us for a Live Stream on Wednesday 22 October 2025 Join Microsoft Developer Advocates Marlene Mhangami and Yohan Lasorsa for a livestream this Wednesday to see live demos and find out more about what JavaScript and Python developers need to know about v1. Register for this event here.1.1KViews0likes0CommentsUsing Keycloak with Azure AD to integrate AKS Cluster authentication process
Integrating Azure Kubernetes Service (AKS) with Keycloak through Azure Active Directory (Azure AD) as an intermediary leverages Azure AD’s support for OpenID Connect (OIDC) to handle authentication and authorization. This integration enhances security, streamlines user management, and simplifies the authentication process for users accessing the AKS cluster.AMA Spotlight: Build Smarter with Azure Developer CLI 'AZD'
Weekly AMA 'Ask Me Anything': Build Smarter with Azure Developer CLI Calling all AI engineers, developers, and builders of the future, this is your backstage pass to the tools shaping scalable, agentic AI deployments. Join Kristen Womack, Product Manager for the Azure Developer CLI (azd) Developer CLI (azd), and the engineering team behind azd for a live Ask Me Anything session every Thursday at 12:30pm PT in the Azure AI Foundry Discord. Whether you're: 🧠 Orchestrating multi-agent systems 📦 Deploying LLM-powered apps with Azure AI Foundry 🔐 Navigating least-privilege infrastructure setups 🛠️ Debugging and optimizing reproducible workflows …this AMA is your chance to connect directly with the team building the CLI that powers it all. 💡 Why Join? Real-time answers from the azd engineers and product team Deployment walkthroughs for Foundry templates, from chatbots to document processors Tips for CI/CD, debugging, and reproducibility in enterprise environments Community-first mindset: bring your feedback, challenges, and ideas Kristen Womack brings deep insight into developer experience and product strategy; this is a rare opportunity to learn from the source and shape the future of AI tooling. 🔧 Get Ready Before you join: Install azd 👉 Install Guide Explore Kristen’s work 👉 www.kristenwomack.io Join the Discord 👉 Azure AI Foundry Community 🗓️ Weekly Schedule 🕧 Thursdays at 12:30pm PT 📍 Azure AI Foundry Discord Bring your questions. Bring your curiosity. Build with the best. Additional resources: check out the AZD for Beginners course https://aka.ms/azd-for-beginnersAMA: Azure AI Foundry Voice Live API: Build Smarter, Faster Voice Agents
Join us LIVE in the Azure AI Foundry Discord on the 14th October, 2025, 10am PT to learn more about Voice Live API Voice is no longer a novelty, it's the next-gen interface between humans and machines. From automotive assistants to educational tutors, voice-driven agents are reshaping how we interact with technology. But building seamless, real-time voice experiences has often meant stitching together a patchwork of services: STT, GenAI, TTS, avatars, and more. Until now. Introducing Azure AI Foundry Voice Live API Launched into general availability on October 1, 2025, the Azure AI Foundry Voice Live API is a game-changer for developers building voice-enabled agents. It unifies the entire voice stack—speech-to-text, generative AI, text-to-speech, avatars, and conversational enhancements, into a single, streamlined interface. That means: ⚡ Lower latency 🧠 Smarter interactions 🛠️ Simplified development 📈 Scalable deployment Whether you're prototyping a voice bot for customer support or deploying a full-stack assistant in production, Voice Live API accelerates your journey from idea to impact. Ask Me Anything: Deep Dive with the CoreAI Speech Team Join us for a live AMA session where you can engage directly with the engineers behind the API: 🗓️ Date: 14th Oct 2025 🕒 Time: 10am PT 📍 Location: https://aka.ms/foundry/discord See the EVENTS 🎤 Speakers: Qinying Liao, Principal Program Manager, CoreAI Speech Jan Gorgen, Senior Program Manager, CoreAI Speech They’ll walk through real-world use cases, demo the API in action, and answer your toughest questions, from latency optimization to avatar integration. Who Should Attend? This AMA is designed for: AI engineers building multimodal agents Developers integrating voice into enterprise workflows Researchers exploring conversational UX Foundry users looking to scale voice prototypes Why It Matters Voice Live API isn’t just another endpoint, it’s a foundation for building natural, responsive, and production-ready voice agents. With Azure AI Foundry’s orchestration and deployment tools, you can: Skip the glue code Focus on experience design Deploy with confidence across platforms Bring Your Questions Curious about latency benchmarks? Want to know how avatars sync with TTS? Wondering how to integrate with your existing Foundry workflows? This is your chance to ask the team directly.From Cloud to Chip: Building Smarter AI at the Edge with Windows AI PCs
As AI engineers, we’ve spent years optimizing models for the cloud, scaling inference, wrangling latency, and chasing compute across clusters. But the frontier is shifting. With the rise of Windows AI PCs and powerful local accelerators, the edge is no longer a constraint it’s now a canvas. Whether you're deploying vision models to industrial cameras, optimizing speech interfaces for offline assistants, or building privacy-preserving apps for healthcare, Edge AI is where real-world intelligence meets real-time performance. Why Edge AI, Why Now? Edge AI isn’t just about running models locally, it’s about rethinking the entire lifecycle: - Latency: Decisions in milliseconds, not round-trips to the cloud. - Privacy: Sensitive data stays on-device, enabling HIPAA/GDPR compliance. - Resilience: Offline-first apps that don’t break when the network does. - Cost: Reduced cloud compute and bandwidth overhead. With Windows AI PCs powered by Intel and Qualcomm NPUs and tools like ONNX Runtime, DirectML, and Olive, developers can now optimize and deploy models with unprecedented efficiency. What You’ll Learn in Edge AI for Beginners The Edge AI for Beginners curriculum is a hands-on, open-source guide designed for engineers ready to move from theory to deployment. Multi-Language Support This content is available in over 48 languages, so you can read and study in your native language. What You'll Master This course takes you from fundamental concepts to production-ready implementations, covering: Small Language Models (SLMs) optimized for edge deployment Hardware-aware optimization across diverse platforms Real-time inference with privacy-preserving capabilities Production deployment strategies for enterprise applications Why EdgeAI Matters Edge AI represents a paradigm shift that addresses critical modern challenges: Privacy & Security: Process sensitive data locally without cloud exposure Real-time Performance: Eliminate network latency for time-critical applications Cost Efficiency: Reduce bandwidth and cloud computing expenses Resilient Operations: Maintain functionality during network outages Regulatory Compliance: Meet data sovereignty requirements Edge AI Edge AI refers to running AI algorithms and language models locally on hardware, close to where data is generated without relying on cloud resources for inference. It reduces latency, enhances privacy, and enables real-time decision-making. Core Principles: On-device inference: AI models run on edge devices (phones, routers, microcontrollers, industrial PCs) Offline capability: Functions without persistent internet connectivity Low latency: Immediate responses suited for real-time systems Data sovereignty: Keeps sensitive data local, improving security and compliance Small Language Models (SLMs) SLMs like Phi-4, Mistral-7B, Qwen and Gemma are optimized versions of larger LLMs, trained or distilled for: Reduced memory footprint: Efficient use of limited edge device memory Lower compute demand: Optimized for CPU and edge GPU performance Faster startup times: Quick initialization for responsive applications They unlock powerful NLP capabilities while meeting the constraints of: Embedded systems: IoT devices and industrial controllers Mobile devices: Smartphones and tablets with offline capabilities IoT Devices: Sensors and smart devices with limited resources Edge servers: Local processing units with limited GPU resources Personal Computers: Desktop and laptop deployment scenarios Course Modules & Navigation Course duration. 10 hours of content Module Topic Focus Area Key Content Level Duration 📖 00 Introduction to EdgeAI Foundation & Context EdgeAI Overview • Industry Applications • SLM Introduction • Learning Objectives Beginner 1-2 hrs 📚 01 EdgeAI Fundamentals Cloud vs Edge AI comparison EdgeAI Fundamentals • Real World Case Studies • Implementation Guide • Edge Deployment Beginner 3-4 hrs 🧠 02 SLM Model Foundations Model families & architecture Phi Family • Qwen Family • Gemma Family • BitNET • μModel • Phi-Silica Beginner 4-5 hrs 🚀 03 SLM Deployment Practice Local & cloud deployment Advanced Learning • Local Environment • Cloud Deployment Intermediate 4-5 hrs ⚙️ 04 Model Optimization Toolkit Cross-platform optimization Introduction • Llama.cpp • Microsoft Olive • OpenVINO • Apple MLX • Workflow Synthesis Intermediate 5-6 hrs 🔧 05 SLMOps Production Production operations SLMOps Introduction • Model Distillation • Fine-tuning • Production Deployment Advanced 5-6 hrs 🤖 06 AI Agents & Function Calling Agent frameworks & MCP Agent Introduction • Function Calling • Model Context Protocol Advanced 4-5 hrs 💻 07 Platform Implementation Cross-platform samples AI Toolkit • Foundry Local • Windows Development Advanced 3-4 hrs 🏭 08 Foundry Local Toolkit Production-ready samples Sample applications (see details below) Expert 8-10 hrs Each module includes Jupyter notebooks, code samples, and deployment walkthroughs, perfect for engineers who learn by doing. Developer Highlights - 🔧 Olive: Microsoft's optimization toolchain for quantization, pruning, and acceleration. - 🧩 ONNX Runtime: Cross-platform inference engine with support for CPU, GPU, and NPU. - 🎮 DirectML: GPU-accelerated ML API for Windows, ideal for gaming and real-time apps. - 🖥️ Windows AI PCs: Devices with built-in NPUs for low-power, high-performance inference. Local AI: Beyond the Edge Local AI isn’t just about inference, it’s about autonomy. Imagine agents that: - Learn from local context - Adapt to user behavior - Respect privacy by design With tools like Agent Framework, Azure AI Foundry and Windows Copilot Studio, and Foundry Local developers can orchestrate local agents that blend LLMs, sensors, and user preferences, all without cloud dependency. Try It Yourself Ready to get started? Clone the Edge AI for Beginners GitHub repo, run the notebooks, and deploy your first model to a Windows AI PC or IoT devices Whether you're building smart kiosks, offline assistants, or industrial monitors, this curriculum gives you the scaffolding to go from prototype to production.Essential Microsoft Resources for MVPs & the Tech Community from the AI Tour
Unlock the power of Microsoft AI with redeliverable technical presentations, hands-on workshops, and open-source curriculum from the Microsoft AI Tour! Whether you’re a Microsoft MVP, Developer, or IT Professional, these expertly crafted resources empower you to teach, train, and lead AI adoption in your community. Explore top breakout sessions covering GitHub Copilot, Azure AI, Generative AI, and security best practices—designed to simplify AI integration and accelerate digital transformation. Dive into interactive workshops that provide real-world applications of AI technologies. Take it a step further with Microsoft’s Open-Source AI Curriculum, offering beginner-friendly courses on AI, Machine Learning, Data Science, Cybersecurity, and GitHub Copilot—perfect for upskilling teams and fostering innovation. Don’t just learn—lead. Access these resources, host impactful training sessions, and drive AI adoption in your organization. Start sharing today! Explore now: Microsoft AI Tour Resources.Practical ways to use AI in your Data Science and ML journey
Are you ready to take your learning from “just reading” to “actively doing”? We are launching a series of interactive events designed to help you bridge the gap between theory and practice, connect with peers, and learn directly from experts, all in a vibrant, supportive Discord community. The series will guide you on practical ways to use AI in your learning journey with Data Science and Machine Learning. Join the series on Discord at: https://aka.ms/ds4beginners/discord Why You Should Join Guided Pathways: Move seamlessly from the popular Data Science for Beginners and Machine Learning for Beginners curriculums to interact with fellow learners in the Azure AI Foundry Discord community. We’ve designed study plans, office hours sessions, and showcases so you can learn, ask questions, and get feedback in real time. Practical AI Skills: See how Artificial Intelligence can supercharge your learning. . Community & Collaboration: Find peer learners with similar goals, collaborate on projects, and get support from experienced moderators and MVPs. Machine Learning AMA 1 Theme: Learning Machine Learning with AI, responsibly. Date: Tuesday, 23 September (10AM PST | 1PM ET | 5PM GMT | 8PM EAT) What to Expect: Curriculum overview, hands-on AI demos, critique and correct a simple classifier, and peer collaboration. Link: https://aka.ms/learnwithai/ml Data Science AMA 2 Theme: Learning Data Science with AI, responsibly. Date: Tuesday, 30 September (10AM PST | 1PM ET | 5PM GMT | 8PM EAT) What to Expect: Curriculum overview, integrating AI into your learning, live Q&A, and practical tips for responsible AI use. Link: https://aka.ms/learnwithai/ds Showcase 1: GitHub Copilot for Data Science Theme: Hands-on AI Agents for Data Scientists Date: Thursday, 25 September (10AM PST | 1PM ET | 5PM GMT | 8PM EAT) What to Expect: Learn to use Copilot’s advanced features in Python and Jupyter Notebooks, customize agents, and extend your analyses. Link: https://aka.ms/learnwithai/agents How to Get Involved Join the Discord Channels: https://aka.ms/ds4beginners/discord https://aka.ms/ml4beginners/discord Prepare: Make sure you have a GitHub account, VS Code, Data Wrangler, ready for hands-on sessions. What Makes This Program Different? Discord first learning: Experience higher engagement and deeper connections. Curriculum-anchored activities: Every event is designed to build on what you’ve learned in the open-source curriculums. AI-powered study plans and resources: Get starter notebooks, prompt cards, and recap templates to accelerate your progress. Resources Hands-on workshop: https://aka.ms/ds-agents Data Science for beginner's curriculum: https://aka.ms/ds4beginners Machine Learning for beginner's curriculum: https://aka.ms/ml4beginners Ready to transform your learning journey? Join us for these exciting events and join us in the Azure AI Foundry Discord community. Whether you’re just starting out or looking to deepen your skills, there’s a place for you here!Build Multi‑Agent AI Systems with Microsoft
Like many of you I have been on a journey to build AI systems where multiple agents (AI models with tools and autonomy) collaborate to solve complex tasks. In this post, I want to share the engineering challenges we faced, the architecture we designed with Azure AI Foundry, and the lessons learned along the way. Our goal is to empower AI engineers and developers to leverage multi-agent systems for real-world applications, with the benefit of Microsoft’s tools, research insights, and enterprise-grade platform. Why Multi‑Agent Systems? The Need for AI Teamwork Building a single AI agent to perform a task is often straightforward. However, many real-world processes are too complex for one agent alone. Tasks like in-depth research, enterprise workflow automation, or multi-step customer service involve context switching and specialized knowledge that overwhelm a lone chatbot. Multi-agent systems address this by distributing work across specialized agents while maintaining coordination. This approach brings several advantages: Scalability: Workloads can be split among agents, enabling horizontal scaling as tasks or data increase. More agents can handle more subtasks in parallel, avoiding bottlenecks. Specialisation: Each agent can be fine-tuned for a specific role or domain (e.g. research, summarisation, data extraction), which improves performance and maintainability. No single model has to be a master of all trades. Flexibility: Modular agents can be reused in different workflows or recombined to create new capabilities. It’s easy to extend the system by adding or swapping an agent without redesigning everything. Robustness: If one agent fails or underperforms, others can pick up the slack. Decoupling tasks means the overall system can tolerate faults better than a monolithic agent. This mirrors how human teams work: we achieve more by dividing and conquering complex problems. In fact, internal experiments and industry reports have shown that groups of AI agents can significantly outperform a single powerful model on complex, open-ended tasks. For example, Anthropic found a multi-agent system (Claude agents working together) answered 90% more queries correctly than a single-agent approach in one evaluation. The ability to operate in parallel is key – our experience likewise showed that multiple agents exploring different aspects of a problem can cover far more ground, albeit with increased resource usage. Challenge: A downside of multi-agent setups is they consume more resources (more model calls, more tokens) than single-agent runs. In Anthropic’s research, multi-agent systems used ~15× the tokens of a single chat session. We’ve observed similarly that letting agents think and interact in depth pays off in better results, but at a cost. Ensuring the task’s value justifies the cost is important when choosing a multi-agent solution. Designing the Architecture: Orchestration via a Lead Agent To harness these benefits, we designed a multi-agent architecture built around an orchestrator-worker pattern – very similar to Anthropic’s “lead agent and subagents” approach. In Azure AI Foundry (our enterprise AI platform), this takes shape as Connected Agents: a mechanism where a main agent can spawn and coordinate child agents to handle sub-tasks. The main agent is the brain of the operation, responsible for understanding the user’s request, breaking it into parts, and delegating those parts to the appropriate specialist agents. Each agent in the system is defined with three core components: Instructions (prompt/policy): defining the agent’s goal, role, and constraints (its “game plan”). Model: an LLM that powers the agent’s reasoning and dialogue (e.g. GPT-4 or other models available in Foundry). Tools: external capabilities the agent can invoke to get information or take actions (e.g. web search, databases, APIs). By composing agents with different instructions and tools, we create a team where each agent has a clear role. The main agent’s role is orchestration; the sub-agents focus on specific tasks. This separation of concerns makes the system easier to understand and debug, and prevents any single context window from becoming overloaded. How it works (overview): When a user query comes in, the lead agent analyzes the request and devises a plan. It may decide that multiple pieces of information or steps are needed. The lead agent then spins up subordinate agents in parallel to gather or compute those pieces]. Each sub-agent operates with its own context window and tools, exploring one aspect of the task. They report their findings back to the lead agent, which integrates the results and decides if more exploration is required. The loop continues until the lead agent is satisfied that it can produce a final answer, at which point it consolidates everything and returns the result to the user. This orchestrator/sub-agent pattern is powerful because it lets complex tasks be solved through natural language delegation rather than hard-coded logic. Notably, the main agent doesn’t need an if/else tree written by us to decide which sub-agent handles what; it uses the language model’s reasoning to route tasks. In Azure AI Foundry’s Connected Agents, the primary agent simply says (in effect) “You, Agent A, do X; You, Agent B, do Y,” and the platform handles the rest—no custom orchestration code needed. This drastically simplified our development: we focus on crafting the right prompts and agent designs, and let the AI figure out the coordination. Example: Sales Assistant with Specialist Agents To make this concrete, imagine a Sales Preparation Assistant that helps a sales team research a client before a meeting. Instead of trying to cram all knowledge and skills into one model, we give the assistant a team of four sub-agents, each an expert in a different area. The main agent (“Sales Assistant”) will ask each specialist for input and then compile a briefing. Agent Role Purpose & Task Example Tools/Models Used Market Research Agent Gathers industry trends and news related to the client’s sector. Bing Web Search, internal news API Competitive Analysis Agent Finds information on the client’s competitors and market position. Web Search, Company DB Customer Insights Agent Summarises the client’s history and interactions (from CRM data). Azure Cognitive Search on CRM, GPT-4 Financial Analysis Agent Reviews the client’s financial data and recent performance. Finance database query tool, Excel APIs Main Sales Assistant Orchestrator that delegates to the above agents, then synthesises a final report for the sales team. GPT-4 (with instructions to compile and format results) In this scenario, the Main Sales Assistant agent would ask each sub-agent to report on their specialty (market news, competition, CRM insights, finances). Rather than one AI trying to do it all (and possibly missing nuances), we have focused mini-AIs each doing a thorough job in parallel. This approach was shown to reduce the overall time required and improve the quality of the final output. In early trials, such multi-agent setups often succeed where single agents fall short – for instance, finding all relevant facts across disparate sources and preparing a comprehensive briefing more quickly. Development is easier too: if tomorrow we need to add a “Regulatory Compliance Agent” for a new client requirement, we can plug it in without retraining or heavily modifying the others. Orchestration under the hood: Azure AI Foundry’s Agent Service provides the runtime that makes all this work reliably. It manages the message passing between the main agent and sub-agents, ensures each tool invocation is executed (with retries on failure), and keeps a structured log of the entire multi-agent conversation (we call it a thread). This means developers don’t have to manually implement how agents call each other or share data; the platform handles those mechanics. Foundry also supports true agent-to-agent messaging if agents need to talk directly, but often a hierarchical pattern (through the main agent) suffices for task delegation. Tools, Knowledge, and the Model Context Protocol (MCP) For agents to be effective, especially in enterprise scenarios, they must integrate with external knowledge sources and services – no single LLM knows everything or can perform all actions. Microsoft’s approach emphasizes a rich tool integration layer. In our system, tools range from web search and databases to APIs for taking real actions (sending emails, executing workflows, etc.) Equipping agents with the right tools extends their capabilities dramatically: an agent can retrieve up-to-date info, pull data behind corporate firewalls, or trigger business processes. One key innovation is the Model-Context Protocol (MCP), which Foundry uses to manage tools. MCP provides a structured way for agents to discover and use tools dynamically at runtime. Traditionally, if you wanted your AI agent to use a new tool, you might have to hard-code that tool’s API and update the agent’s code or prompt. With MCP, tools are defined on a central tool server (with descriptions and endpoints), and agents can query this server to see what tools are available. The agent’s SDK then generates the necessary code “stubs” to call the tool on the fly. This means: Easier maintenance: You can add, update, or remove tools in one place (the MCP registry) without changing the agent’s code. When the Finance database API updates, just update its MCP entry; all agents automatically get the new version next time they run. Dynamic adaptability: Agents can choose tools based on context. For example, a research agent might discover that a new MarketAnalysisAPI tool is available and start using it for a finance query, whereas previously it only had a generic web search. Separation of concerns: Those building AI agents can rely on domain experts to maintain the tool definitions, while they focus on the agent logic. Agents treat tools in a uniform way, as functions they can call. In practice, tool selection became a critical part of our agent design. A lesson we learned is that giving agents access to the right tool, with a clear description, can make or break their performance. If a tool’s description is vague or overlapping with another, the agent might choose the wrong approach and wander down a blind alley. For instance, we saw cases where an agent would stubbornly query an internal knowledge base for information that actually only existed on the web, simply because the tool prompt made the web search sound less relevant. We addressed this by carefully curating tool descriptions and even building an internal tool-testing agent that automatically tries out tools and suggests better descriptions for them. Ensuring each tool had a distinct purpose and clear usage guidance dramatically improved our agents’ success rate in choosing the optimal tool for a given job. Finally, multi-modal support is worth noting. Some tasks involve not just text, but images or other media. Our multi-agent architecture, especially with Azure AI Foundry, can incorporate vision-capable models as agents or tools. For example, an “Image Analysis Agent” could be part of a team, or an agent might call a vision API tool. The Telco customer service demo (using Foundry + OpenAI Agent SDK) featured an agent that could handle image uploads (like an ID document) by invoking an image-processing function. The orchestration framework doesn’t fundamentally change with multi-modality, it simply treats the vision model as another specialist agent or tool in the conversation. The ability to plug in different AI skills (text, vision, search, etc.) under a unified agent system is a big advantage of Microsoft’s approach: the agent team becomes cross-functional, each member with their own modality or expertise, collectively solving richer tasks than any single foundation model could. Reliability, Safety, and Enterprise-Grade Engineering While the basic idea of agents chatting and calling tools is elegant, productionizing this system for enterprise use brought serious engineering challenges. We needed our multi-agent system to be reliable, controllable, and secure. Here are the key areas we focused on and how we addressed them: Observability and Debugging Multi-agent chains can be complex and non-deterministic each run might involve different paths as agents make choices. Early on, we realized that treating the system as a black box was untenable. Developers and operators must be able to observe what each agent is “thinking” and doing, or else diagnosing issues would be impossible. Azure AI Foundry’s Agent Service was built with full conversation traceability in mind. Every message between agents (and to the user), every tool invocation and result, is captured in a structured thread log. We integrated this with Azure Application Insights telemetry, so one can monitor performance, latencies, errors, and even token consumption of agents in real time. This tracing proved invaluable. For example, when a complex workflow wasn’t producing the expected outcome, we could replay the entire agent conversation step by step to see where things went awry. In one instance, we found that two sub-agents were given slightly overlapping responsibilities, causing them to waste time retrieving nearly identical information. The logs and message transcripts made this immediately clear, guiding us to tighten the role definitions. Moreover, because the system logs are structured (not just free-form text), we could build automatic analysis tools like checking how often an agent hits a retry or how many cycles a conversation goes through before completion – to spot anomalies. This kind of observability was something the open-source community also highlighted as crucial; in fact, Sematic Kernel, AutoGen frameworks introduce metrics tracking and message tracing for exactly this reason. We also developed visual debugging tools. One example is the AutoGen Studio (a low-code interface from Microsoft Research) which allows developers to visually inspect agent interactions in real time, pause agents, or adjust their behavior on the fly. This interactive approach accelerates the prompt-engineering loop: one can watch agents argue or collaborate live, and intervene if needed. Such capabilities turned out to be vital for understanding emergent behaviors in multi-agent setups. Coordination Complexity and State Management As more agents come into play, keeping them coordinated and preserving shared context is hard. Early versions of agents would sometimes spawn excessive numbers of agents or get stuck in loops. For instance, one of our prototypes (before we applied strict limits) ended up in a degenerate state where two agents kept handing control back and forth without making progress. This taught us to implement guardrails and smarter orchestration policies. In Azure AI Foundry, beyond the simple connected-agent pattern, we introduced a more structured orchestration capability called Multi-Agent Workflows. This lets developers explicitly define states, transitions, and triggers in a workflow that involves multiple agents. It’s like flowcharting the high-level process that the agents should follow, including how they pass data around. We use this for long-running or highly critical processes where you want extra determinism for example, an onboarding workflow might have clearly defined phases (Verification → Provisioning → Notification) each handled by different agents, and you want to ensure the process doesn’t derail. The workflow engine enforces that the system moves to the next state only when all agents in the current state have completed and certain conditions (triggers) are met. It also provides persistence: if the process needs to wait (say, for an external event or simply because it’s a lengthy task), the state is saved and can be resumed later without losing context. These workflow features were a response to reliability needs, they give fine-grained control and error recovery in multi-agent systems that operate over extended periods. In practice, we learned to use the simpler Connected Agents approach for quick, on-the-fly delegations (it’s amazingly capable with minimal setup), and reserve Workflow Orchestration for scenarios where we must guarantee a robust sequence over minutes, hours, or days. By having both options, we can strike a balance between flexibility and control as needed. Trust, Safety, and Governance When you let AI agents act autonomously (especially if they can use tools that modify data or interact with the real world), safety is paramount. From day one, our design included enterprise-grade safety measures: Content Filtering and Policy Enforcement: All AI outputs go through content filters to catch disallowed content or potential prompt injection attacks. The Foundry Agent Service has integrated guardrails so that even if an agent tries something risky (e.g., a tool returns a sensitive info that should not be shown), policies can prevent misuse or leakage. For example, we configured financial analysis agents with rules not to output certain PII or to stop if they detect a regulatory compliance issue, handing off to a human instead. Identity and Access Control: Agents operate with identities managed via Microsoft Entra ID (Azure AD). This means every action an agent takes can be attributed and audited. Role-Based Access Control (RBAC) is enforced: an agent only has access to the data and APIs its role permits. If an agent’s credentials are compromised or misused, Azure’s standard auditing can alert us. Essentially, agents are first-class service principals in our cloud stack. Network Isolation and Compliance: For enterprise deployments, Azure AI Foundry allows agents to run in isolated networks (so they can’t arbitrarily call external services unless allowed) and to use customer-managed storage and search indices. This addresses the data governance aspect, we can ensure an agent looking up internal documents only sees what it’s supposed to, and all data stays within compliant boundaries. Auditability: As mentioned earlier, every decision an agent makes (every tool it calls, every answer it gives) is recorded. This is crucial for trust, if a multi-agent system is making business decisions, we need to be able to explain and justify those decisions later. By retaining the full reasoning trace and sources, we make the system’s work transparent and auditable. In fact, our “Deep Research” agents output not just answers but also a log of how they arrived at that answer, including citations to source material for each claim. This level of detail is a must-have in regulated industries or any high-stakes use case. Overall, baking in trust and safety by design was a non-negotiable requirement. It does introduce some overhead – e.g., being strict about content filtering can sometimes stop an agent from a creative solution until we refine its prompt or the filter thresholds, but it’s worth it for the confidence it gives to deploy these agents at scale. Performance and Cost Considerations We touched on the resource cost of multi-agent systems. Another challenge was ensuring the system runs efficiently. Without care, adding agents can linearly increase cost and latency. We mitigated this in a few ways: Parallelism: We make agents run concurrently wherever possible. Our lead agents typically fire off multiple sub-agents at once rather than sequentially waiting for one then starting the next. Also, our agents themselves can issue parallel tool calls. In fact, we enabled some of our retrieval agents to batch multiple search queries and send them all at the same time. Anthropic reported that this kind of parallelism cut their research task times by up to 90%, and we’ve observed similar dramatic speed-ups. By doing in 1 minute what a single agent might take 10 minutes to do step-by-step, we make the approach far more practical. Of course, the flip side is hitting many APIs and LLM endpoints concurrently can spike usage costs; we carefully monitor usage and recommend multi-agent mode only when needed for the problem complexity. Scaling rules and agent limits: One lesson learned was to prevent “agent sprawl.” We devised guidelines (and encoded some in prompts) about how many sub-agents to use for a given task complexity. For simple fact queries, the main agent is encouraged to handle it alone or with at most one helper; for moderately complex tasks, maybe spin up 2–3; only truly complex projects get a dozen specialists. This avoids the situation where an overzealous orchestrator might launch an army of agents and overkill the problem. These limits were informed by experimentation and echo the principle of scaling effort to the problem size. Model selection: Multi-agent systems don’t always need the largest model for every agent. We often use a mix of model sizes to optimize cost. For instance, a straightforward data extraction agent might be powered by a cheaper GPT-3.5, while the synthesis agent uses GPT-4 for the final answer quality. Foundry makes it easy to deploy a range of model endpoints (including open-source Llama-based models) and each agent can pick the one best suited. We learned that using an expensive model for a simple sub-task is wasteful; a smaller model with the right tools can do the job just as well. This mix-and-match approach helped keep our compute costs in check without sacrificing outcome quality. Lessons Learned and Best Practices Building these multi-agent systems was an iterative learning process. Here are some of the key lessons and best practices that emerged, which we believe will be useful to anyone developing their own: Let’s expand on a couple of these points: Prompt engineering for multi-agent is different: We quickly discovered that writing prompts for a team of agents is an order of magnitude more complex than for a single chatbot. Not only do you have to get each agent’s behavior right, you must shape how they interact. One principle that served us well was: “Think like your agents.” When debugging, we’d often step through the conversation from each agent’s perspective, almost role-playing as them, to see why they might be doing something silly. If an agent was repeating another’s results, maybe our instructions were too vague and they didn’t realise that sub-task was already covered. The fix would be to clarify the division of labour in the lead agent’s prompt or introduce an ordering (e.g., Agent B only runs after Agent A’s info is in, etc.). Another principle: teach the orchestrator to delegate effectively. The main agent’s prompt now includes explicit guidance on how to break down tasks and how to phrase sub-agent assignments with plenty of detail. We learned that if the lead just says “Research topic X” to two different agents, they might both do the same thing. Now, the lead agent provides distinct objectives and context to each sub-agent (e.g., focus one on recent news, another on historical data, etc.). This reduced redundancy and missed coverage dramatically. Let the AI help improve itself: One delightful surprise was that large models can be quite good at analyzing and refining their own strategies when asked. We sometimes gave an agent a chance to critique its output or plan, essentially a self-reflection step. In other cases we had a “judge” agent evaluate the final answers against criteria (accuracy, completeness, etc.) These evaluations not only gave us a score for benchmarking changes, but the judge’s feedback (being an LLM) often highlighted exactly where an agent went off track or missed something. In a sense, we used one AI to tell us how to make another AI better. This kind of meta-prompting and self-correction became a powerful tool in our development cycle, allowing faster iteration without full human-in-the-loop at every turn. Know when to simplify: Not every problem needs a fleet of agents. A big lesson was to use the simplest approach that works. If a single agent with a smart prompt can handle a task reliably, that’s fine! We reserved multi-agent mode for when there was clear added value e.g., problems requiring parallel exploration, different expertise, or lengthy reasoning that benefits from splitting into parts. This discipline kept our systems leaner and easier to maintain. It also helped us explain the value to stakeholders: we could justify the complexity by pointing to concrete gains (like a task that went from 2 hours by a single high-end model to 10 minutes by a team of agents with better results). Conclusion and Next Steps Multi-agent AI systems have moved from intriguing research demos to practical, production-ready solutions. Our journey involved close collaboration between teams such as those who built open-source frameworks like AutoGen to experiment with multi-agent interactions) and the Azure AI product teams (who turned these concepts into the robust Azure AI Foundry Agent Service). Along the way, we learned how to orchestrate LLMs at scale, how to keep them in check, and how to squeeze the most value out of agent collaboration. Today, Azure AI Foundry’s Agents platform provides a unified environment to develop, test, and deploy multi-agent systems, complete with the orchestration, observability, and safety features to make them enterprise-ready. The public preview of features like Connected Agents and Deep Research (which is essentially an advanced research agent that uses the web + analysis in a multi-step process) is already enabling customers to build “AI teams” that tackle complex workflows. This is just the beginning. We’re continuing to improve the platform with feedback from developers: upcoming releases will further tighten integration with the broader Azure ecosystem (for example, more seamless use of Azure Cognitive Search, Excel as a tool, etc.), expand the library of pre-built agent templates in the Agent Catalog (so you can start with a solid example for common scenarios), and introduce more advanced coordination patterns inspired by real-world use cases. If you’re an AI engineer or developer eager to explore multi-agent systems, now is a great time to dive in. Here are some resources to get you started: Microsoft AI Agents for Beginners - Learn all about AI Agents with this FREE curricula Azure AI Foundry Documentation – Learn more about the Agent Service and how to configure agents, tools, and workflows. Microsoft Learn Modules – step-by-step tutorial to build a connected multi-agent solution (for example, a ticket triage system) using Azure AI Foundry Agent Service. This will walk you through setting up agents and using the SDK. Microsoft MCP for Beginners: Integrating MCP Tools – Another tutorial focused on the Model Context Protocol, showing how to enable dynamic tool discovery for your agents. Azure AI Foundry Agent Catalog – Browse a growing collection of open-sourced agent examples contributed by Microsoft and partners, covering scenarios from content compliance to manufacturing optimization. These samples are great starting points to see how multi-agent code is structured in real projects. Multi-agent systems represent a significant shift in how we conceptualise AI solutions: from single brilliant assistants to teams of specialised agents working in concert. The engineering journey hasn’t been easy we navigated challenges in coordination, built new tooling for control, and refined prompts endlessly. But the end result is a new class of AI applications that are more powerful, resilient, and tunable. We hope the insights shared here help you in your own journey to build with AI agents. We’re excited to see what you will create with these technologies. As we continue to push the frontier of agentic AI (both in research and in Azure), one thing is clear: many minds – human or AI – are often better than one. Happy building! Userful References Introducing Multi-Agent Orchestration in Foundry Agent Service – Build ... Building a multimodal, multi-agent system using Azure AI Agent Service ... How we built our multi-agent research system \ Anthropic What is Azure AI Foundry Agent Service? - Azure AI Foundry Multi-Agent Systems and MCP Tools Integration with Azure AI Foundry ... Introducing Deep Research in Azure AI Foundry Agent Service AutoGen v0.4: Advancing the development of agentic AI systems