Implementing the A2A Protocol in .NET: A Practical Guide
As AI systems mature into multi-agent ecosystems, the need for agents to communicate reliably and securely has become fundamental. Traditionally, agents built on different frameworks, such as Semantic Kernel, LangChain, custom orchestrators, or enterprise APIs, do not share a common communication model. This creates brittle integrations, duplicated logic, and siloed intelligence. The Agent-to-Agent (A2A) protocol addresses this gap by defining a universal, vendor-neutral standard for structured agent interoperability.

A2A establishes a common language for agents, built on familiar web primitives: JSON-RPC 2.0 for messaging and HTTPS for transport. Each agent exposes a machine-readable Agent Card describing its capabilities, supported input/output modes, and authentication requirements. Interactions are modeled as Tasks, which support synchronous, streaming, and long-running workflows. Messages exchanged within a task contain Parts (text, structured data, files, or streams) that allow agents to collaborate without exposing internal implementation details.

By standardizing discovery, communication, authentication, and task orchestration, A2A enables organizations to build composable AI architectures. Specialized agents can coordinate deep reasoning, planning, data retrieval, or business automation regardless of their underlying frameworks or hosting environments. This modularity, combined with industry adoption and Linux Foundation governance, positions A2A as a foundational protocol for interoperable AI systems.

A2A in .NET: Implementation Guide

Prerequisites
• .NET 8 SDK
• Visual Studio 2022 (17.8+)
• The A2A and A2A.AspNetCore packages
• curl or Postman (optional, for direct endpoint testing)

The open-source A2A project provides a full-featured .NET SDK, enabling developers to build and host A2A agents using ASP.NET Core or to integrate with other agents as a client. Two packages, A2A and A2A.AspNetCore, power the experience. The SDK offers:
• A2AClient: call remote agents
• TaskManager: manage incoming tasks and message routing
• AgentCard / Message / Task models: strongly typed protocol objects
• MapA2A(): ASP.NET Core routing integration that auto-generates the protocol endpoints
This allows you to expose an A2A-compliant agent with minimal boilerplate.

Project Setup
Create two separate projects:
• CurrencyAgentService → an ASP.NET Core web project that hosts the agent
• A2AClient → a console app that discovers the agent card and sends a message
Install the packages listed in the prerequisites in both projects.

Building a Simple A2A Agent (Currency Agent Example)
Below is a minimal Currency Agent implemented in ASP.NET Core. It responds by converting amounts between currencies.

Step 1: In the CurrencyAgentService project, create the CurrencyAgentImplementation class to implement the A2A agent. The class contains the logic for:
a) describing itself (the agent "card" metadata),
b) processing incoming text messages such as "100 USD to EUR", and
c) returning a single text response with the conversion.
The AttachTo(ITaskManager taskManager) method hooks two delegates on the provided taskManager:
a) OnAgentCardQuery → GetAgentCardAsync: returns agent metadata.
b) OnMessageReceived → ProcessMessageAsync: handles incoming messages and produces a response.

Step 2: In Program.cs of the Currency Agent solution, create a TaskManager, attach the agent to it, and expose the A2A endpoint. A sketch of both steps is shown below.
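The two steps above are described in prose only; the sketch below shows one way they might look in code. The class, method, and delegate names (CurrencyAgentImplementation, AttachTo, OnAgentCardQuery, OnMessageReceived, GetAgentCardAsync, ProcessMessageAsync, MapA2A) come from the description above, but the exact delegate signatures, model properties, and conversion logic are assumptions; check the A2A and A2A.AspNetCore package documentation for the precise shapes.

```csharp
// CurrencyAgentImplementation.cs -- minimal sketch; delegate signatures are assumed
using A2A;

public class CurrencyAgentImplementation
{
    public void AttachTo(ITaskManager taskManager)
    {
        taskManager.OnAgentCardQuery = GetAgentCardAsync;     // a) agent "card" metadata
        taskManager.OnMessageReceived = ProcessMessageAsync;  // b) incoming message handling
    }

    private Task<AgentCard> GetAgentCardAsync(string agentUrl, CancellationToken ct) =>
        Task.FromResult(new AgentCard
        {
            Name = "Currency Agent",
            Description = "Converts amounts between currencies, e.g. \"100 USD to EUR\".",
            Url = agentUrl
        });

    private Task<AgentMessage> ProcessMessageAsync(MessageSendParams request, CancellationToken ct)
    {
        // Read the incoming text (e.g. "100 USD to EUR") and return a single text response.
        var query = request.Message.Parts.OfType<TextPart>().First().Text;
        var reply = ConvertCurrency(query); // hypothetical helper holding the conversion logic
        return Task.FromResult(new AgentMessage
        {
            Role = MessageRole.Agent,
            MessageId = Guid.NewGuid().ToString(),
            Parts = [new TextPart { Text = reply }]
        });
    }

    private static string ConvertCurrency(string query) =>
        $"Conversion result for '{query}'"; // placeholder; real logic would parse and convert
}
```

```csharp
// Program.cs -- create a TaskManager, attach the agent, and expose the A2A endpoint
using A2A;
using A2A.AspNetCore;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

var taskManager = new TaskManager();
new CurrencyAgentImplementation().AttachTo(taskManager);

app.MapA2A(taskManager, "/agent"); // auto-generates the A2A protocol endpoints

app.Run();
```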
Typical flow:
• GET /agent → the A2A host asks OnAgentCardQuery → returns the card.
• POST /agent with a text message → the A2A host calls OnMessageReceived → returns the conversion text.
All fully A2A-compliant.

Calling an A2A Agent from .NET
To interact with any A2A-compliant agent from .NET, the client follows a predictable sequence: identify where the agent lives, discover its capabilities through the Agent Card, initialize a correctly configured A2AClient, construct a well-formed message, send it asynchronously, and finally interpret the structured response. This keeps your client aligned with the agent's advertised contract and resilient as capabilities evolve. Below are the steps implemented to call the A2A agent from the A2A client; a consolidated sketch follows the list.

1. Identify the agent endpoint.
Why: You need a stable base URL to resolve the agent's metadata and send messages.
What: Construct a Uri pointing to the agent service, e.g., https://localhost:7009/agent.

2. Discover agent capabilities via the Agent Card.
Why: Agent Cards provide a contract: name, description, the final URL to call, and features (such as streaming). This decouples your client from hard-coded assumptions and enables dynamic capability checks.
What: Use A2ACardResolver with the endpoint Uri, then call GetAgentCardAsync() to obtain an AgentCard.

3. Initialize the A2AClient with the resolved URL.
Why: The client encapsulates transport details and ensures messages are sent to the correct agent endpoint, which may differ from the discovery URL.
What: Create the A2AClient using new Uri(currencyCard.Url) from the Agent Card for correctness.

4. Construct a well-formed agent request message.
Why: Agents typically require structured messages for roles, traceability, and multi-part inputs. A unique message ID supports deduplication and logging.
What: Build an AgentMessage:
• Role = MessageRole.User clarifies intent.
• MessageId = Guid.NewGuid().ToString() ensures uniqueness.
• Parts contains the content; for simple queries, a single TextPart with the prompt (e.g., "100 USD to EUR").

5. Package and send the message.
Why: MessageSendParams can carry the message plus any optional settings (e.g., streaming flags or context). Using a dedicated params object keeps the API extensible.
What: Wrap the AgentMessage in MessageSendParams and call SendMessageAsync(...) on the A2AClient. Await the asynchronous response to avoid blocking and to stay scalable.

6. Interpret the agent response.
Why: Agents can return multiple Parts (text, data, attachments). Extracting the appropriate part avoids assumptions and keeps your client robust.
What: Cast the result to AgentMessage, then read the first TextPart's Text for the conversion result in this scenario.
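A consolidated console-client sketch of the six steps above might look like this. The type names come from the steps themselves; the exact constructor and method signatures are assumptions to verify against the SDK.

```csharp
// A2AClient console app -- sketch of the discovery-and-send sequence described above
using A2A;

// 1. Identify the agent endpoint
var agentEndpoint = new Uri("https://localhost:7009/agent");

// 2. Discover the Agent Card
var resolver = new A2ACardResolver(agentEndpoint);
AgentCard currencyCard = await resolver.GetAgentCardAsync();

// 3. Initialize the client with the URL advertised on the card
var client = new A2AClient(new Uri(currencyCard.Url));

// 4. Construct a well-formed message
var message = new AgentMessage
{
    Role = MessageRole.User,
    MessageId = Guid.NewGuid().ToString(),
    Parts = [new TextPart { Text = "100 USD to EUR" }]
};

// 5. Package and send it
var response = await client.SendMessageAsync(new MessageSendParams { Message = message });

// 6. Interpret the response: read the first TextPart of the returned AgentMessage
if (response is AgentMessage reply)
{
    Console.WriteLine(reply.Parts.OfType<TextPart>().First().Text);
}
```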
Best Practices

1. Keep Agents Focused and Single-Purpose
Design each agent around a clear, narrow capability (e.g., currency conversion, scheduling, document summarization). Single-responsibility agents are easier to reason about, scale, and test, especially when they become part of larger multi-agent workflows.

2. Maintain Accurate and Helpful Agent Cards
The Agent Card is the first interaction point for any client. Ensure it accurately reflects:
• supported input/output formats
• streaming capabilities
• authentication requirements (if any)
• version information
A clean and honest card helps clients integrate reliably without guesswork.

3. Prefer Structured Inputs and Outputs
Although A2A supports plain text, using structured payloads through DataPart objects significantly improves consistency. JSON inputs and outputs reduce ambiguity, eliminate prompt-engineering edge cases, and make agent behavior more deterministic, especially when interacting with other automated agents.

4. Use Meaningful Task States
Treat A2A Tasks as proper state machines. Transition through states intentionally (Submitted → Working → Completed, or Working → InputRequired → Completed). This gives clients clarity on progress, makes long-running operations manageable, and enables more sophisticated control flows.

5. Provide Helpful Error Messages
Make use of A2A and JSON-RPC error codes such as -32602 (invalid input) or -32603 (internal error), and include additional context in the error payload (see the example after this list). Avoid opaque messages; error details should guide the client toward recovery or correction.

6. Keep Agents Stateless Where Possible
Stateless agents are easier to scale and less prone to hidden failures. When state is necessary, ensure it is stored externally or passed through messages or task contexts. For local POCs, in-memory state is acceptable, but design with future statelessness in mind.

7. Validate Input Strictly
Do not assume incoming messages are well-formed. Validate fields, formats, and required parameters before processing. For example, a currency conversion agent should confirm both currencies exist and that the value is numeric before attempting a conversion.

8. Design for Streaming Even if Disabled
Streaming is optional, but it's a powerful pattern for agents that perform progressive reasoning or long computations. Structuring your logic so it can later emit partial TextPart updates makes it easy to upgrade from synchronous to streaming workflows.

9. Include Traceability Metadata
Embed and log identifiers such as TaskId, MessageId, and timestamps. These become crucial for debugging multi-agent scenarios, improving observability, and correlating distributed workflows, especially once multiple agents collaborate.

10. Offer Clear Guidance When Input Is Missing
Instead of returning a generic failure, consider shifting the task to InputRequired and explaining what the client should provide. This improves usability and makes your agent self-documenting for new consumers.
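As referenced in best practice 5, a JSON-RPC 2.0 error response for a malformed conversion request might look like the following. The payload is illustrative; the contents of the data field are entirely up to your agent.

```json
{
  "jsonrpc": "2.0",
  "id": "42",
  "error": {
    "code": -32602,
    "message": "Invalid params: unknown currency code 'USDD'",
    "data": {
      "supportedCurrencies": ["USD", "EUR", "GBP"],
      "hint": "Use the format '<amount> <from> to <to>', e.g. '100 USD to EUR'."
    }
  }
}
```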
AI Toolkit Extension Pack for Visual Studio Code: Ignite 2025 Update

Unlock the Latest Agentic App Capabilities
The Ignite 2025 update delivers a major leap forward for the AI Toolkit extension pack in VS Code, introducing a unified, end-to-end environment for building, visualizing, and deploying agentic applications to Microsoft Foundry, along with the addition of Anthropic's frontier Claude models in the Model Catalog. This release enables developers to build and debug locally in VS Code, then deploy to the cloud with a single click. Seamlessly switch between VS Code and the Foundry portal for visualization, orchestration, and evaluation, creating a smooth round-trip workflow that accelerates innovation and delivers a truly unified AI development experience. Download the AI Toolkit (http://aka.ms/aitoolkit) today and start building next-generation agentic apps in VS Code!

What Can You Do with the AI Toolkit Extension Pack?

Access Anthropic models in the Model Catalog
Following the Microsoft, NVIDIA, and Anthropic strategic partnership announcement, Anthropic's frontier Claude models, including Claude Sonnet 4.5, Claude Opus 4.1, and Claude Haiku 4.5, are now integrated into the AI Toolkit, providing even more choice and flexibility when building intelligent applications and AI agents.

Build AI Agents Using GitHub Copilot
Scaffold agent applications using best-practice patterns, tool-calling examples, tracing hooks, and test scaffolds, all powered by Copilot and aligned with the Microsoft Agent Framework. Generate agent code in Python or .NET, giving you flexibility to target your preferred runtime.

Build and Customize YAML Workflows
Design YAML-based workflows in the Foundry portal, then continue editing and testing directly in VS Code. To customize a YAML-based workflow, instantly convert it to Agent Framework code using GitHub Copilot, upgrading from declarative design to code-first customization without starting from scratch.

Visualize Multi-Agent Workflows
Explore your code-based agent workflows with an interactive graph visualizer that reveals each component and how they connect. Watch in real time as each node lights up while your agent runs. Use the visualizer to understand and debug complex agent graphs, making iteration fast and intuitive.

Experiment, Debug, and Evaluate Locally
Use the Hosted Agents Playground to quickly interact with your agents on your development machine. Leverage local tracing support to debug reasoning steps, tool calls, and latency hotspots so you can quickly diagnose and fix issues. Define metrics, tasks, and datasets for agent evaluation, then implement metrics using the Foundry Evaluation SDK and orchestrate evaluation runs with the help of Copilot.

Seamless Integration Across Environments
Jump from the Foundry portal to VS Code for the Web for a development environment in your preferred code editor setting. Open YAML workflows, playgrounds, and agent templates directly in VS Code for editing and deployment.

How to Get Started
• Install the AI Toolkit extension pack from the VS Code marketplace.
• Check out the documentation.
• Get started with building workflows with Microsoft Foundry in VS Code:
1. Work with hosted (pro-code) agent workflows in VS Code.
2. Work with declarative (low-code) agent workflows in VS Code.

Feedback & Support
Try out the extensions and let us know what you think! File issues or feedback on our GitHub repos for the Foundry extension and the AI Toolkit extension. Your input helps us make continuous improvements.
Understanding Small Language Models

Small Language Models (SLMs) bring AI from the cloud to your device. Unlike Large Language Models, which require massive compute and energy, SLMs run locally, offering speed, privacy, and efficiency. They are ideal for edge applications such as mobile, robotics, and IoT.
Why your LLM-powered app needs concurrency

As part of the Python advocacy team, I help maintain several open-source sample AI applications, like our popular RAG chat demo. Through that work, I've learned a lot about what makes LLM-powered apps feel fast, reliable, and responsive. One of the most important lessons: use an asynchronous backend framework. Concurrency is critical for LLM apps, which often juggle multiple API calls, database queries, and user requests at the same time. Without async, your app may spend most of its time waiting, blocking one user's request while another sits idle.

The need for concurrency
Why? Let's imagine we're using a synchronous framework like Flask. We deploy it to a server with Gunicorn and several workers. One worker receives a POST request to the "/chat" endpoint, which in turn calls the Azure OpenAI Chat Completions API. That API call can take several seconds to complete, and during that time the worker is completely tied up, unable to handle any other requests. We could scale out by adding more CPU cores, workers, or threads, but that's often wasteful and expensive. Without concurrency, each request must be handled serially.

When your app relies on long, blocking I/O operations, like model calls, database queries, or external API lookups, a better approach is to use an asynchronous framework. With async I/O, the Python runtime can pause a coroutine that's waiting for a slow response and switch to handling another incoming request in the meantime. With concurrency, your workers stay busy and can handle new requests while others are waiting.

Asynchronous Python backends
In the Python ecosystem, there are several asynchronous backend frameworks to choose from:
• Quart: the asynchronous version of Flask
• FastAPI: an API-centric, async-only framework (built on Starlette)
• Litestar: a batteries-included async framework (also built on Starlette)
• Django: not async by default, but includes support for asynchronous views
All of these can be good options depending on your project's needs. I've written more about the decision-making process in another blog post.

As an example, let's see what changes when we port a Flask app to a Quart app. First, our handlers now have async in front, signifying that they return a Python coroutine instead of a normal function:

async def chat_handler():
    request_message = (await request.get_json())["message"]

When deploying these apps, I often still use the Gunicorn production web server, but with the Uvicorn worker, which is designed for Python ASGI applications. Alternatively, you can run Uvicorn or Hypercorn directly as standalone servers.

Asynchronous API calls
To fully benefit from moving to an asynchronous framework, your app's API calls also need to be asynchronous. That way, whenever a worker is waiting for an external response, it can pause that coroutine and start handling another incoming request. Let's see what that looks like when using the official OpenAI Python SDK. First, we initialize the async version of the OpenAI client:

openai_client = openai.AsyncOpenAI(
    base_url=os.environ["AZURE_OPENAI_ENDPOINT"] + "/openai/v1",
    api_key=token_provider
)

Then, whenever we make API calls with methods on that client, we await their results:

chat_coroutine = await openai_client.chat.completions.create(
    model=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"],
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": request_message},
    ],
    stream=True,
)
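Putting those pieces together, a minimal Quart app with a single /chat endpoint might look like the sketch below. It reuses the environment variable names from the snippets above, but reads an API key from an assumed AZURE_OPENAI_API_KEY variable for simplicity (the article uses a token_provider instead), and it omits streaming and error handling.

```python
import os

import openai
from quart import Quart, jsonify, request

app = Quart(__name__)

# Async client, initialized once at startup; AZURE_OPENAI_API_KEY is an assumed
# variable name -- the article passes a token_provider here instead.
openai_client = openai.AsyncOpenAI(
    base_url=os.environ["AZURE_OPENAI_ENDPOINT"] + "/openai/v1",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
)


@app.route("/chat", methods=["POST"])
async def chat_handler():
    request_message = (await request.get_json())["message"]
    # While this call is awaited, the event loop can serve other requests.
    chat_completion = await openai_client.chat.completions.create(
        model=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"],
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": request_message},
        ],
    )
    return jsonify({"reply": chat_completion.choices[0].message.content})
```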
For the RAG sample, we also have calls to Azure services like Azure AI Search. To make those asynchronous, we first import the async variants of the credential and client classes from the aio modules:

from azure.identity.aio import DefaultAzureCredential
from azure.search.documents.aio import SearchClient

Then, as with the async OpenAI client, we must await the results of any methods that make network calls:

r = await self.search_client.search(query_text)

By ensuring that every outbound network call is asynchronous, your app can make the most of Python's event loop, handling multiple user sessions and API requests concurrently without wasting worker time waiting on slow responses.
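A fuller sketch of an async Azure AI Search call, using the aio clients imported above, might look like this. The environment variable names for the endpoint and index are placeholders.

```python
import os

from azure.identity.aio import DefaultAzureCredential
from azure.search.documents.aio import SearchClient


async def search_documents(query_text: str) -> list[dict]:
    credential = DefaultAzureCredential()
    # The async context manager closes the underlying HTTP session cleanly.
    async with SearchClient(
        endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],  # placeholder name
        index_name=os.environ["AZURE_SEARCH_INDEX"],   # placeholder name
        credential=credential,
    ) as search_client:
        results = await search_client.search(query_text)
        docs = [doc async for doc in results]  # the paged results are async-iterable
    await credential.close()
    return docs
```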
Sample applications
We've already linked to several of our samples that use async frameworks, but here's a longer list so you can find the one that best fits your tech stack:
• azure-search-openai-demo: RAG with AI Search; backend: Python + Quart; frontend: React
• rag-postgres-openai-python: RAG with PostgreSQL; backend: Python + FastAPI; frontend: React
• openai-chat-app-quickstart: simple chat with Azure OpenAI models; backend: Python + Quart; frontend: plain JS
• openai-chat-backend-fastapi: simple chat with Azure OpenAI models; backend: Python + FastAPI; frontend: plain JS
• deepseek-python: simple chat with Azure AI Foundry models; backend: Python + Quart; frontend: plain JS

JS AI Build-a-thon Setup in 5 Easy Steps

🔥 TL;DR: You're five steps away from an AI adventure. Set up your project repo, follow the quests, build cool stuff, and level up. Everything's automated, community-backed, and designed to help you actually learn AI using the skills you already have. Let's build the future, one quest at a time. 👉 Join the Build-a-thon | Chat on Discord
Getting Started with Azure MCP Server: A Guide for Developers

The world of cloud computing is growing rapidly, and Azure is at the forefront of this innovation. If you're a student developer eager to dive into Azure and learn about the Model Context Protocol (MCP), the Azure MCP Server is your perfect starting point. This tool, currently in Public Preview, empowers AI agents to seamlessly interact with Azure services like Azure Storage, Cosmos DB, and more. Let's explore how you can get started!

🎯 Why Use the Azure MCP Server?
The Azure MCP Server revolutionizes how AI agents and developers interact with Azure services. Here's a glimpse of what it offers:
• Exploration made easy: list storage accounts, databases, resource groups, tables, and more with natural-language commands.
• Advanced operations: manage configurations, query analytics, and execute complex tasks like building Azure applications.
• Streamlined integration: from JSON communication to intelligent auto-completion, the Azure MCP Server ensures smooth operations.
Whether you're querying log analytics or setting up app configurations, this server simplifies everything.

✨ Installation Guide: One-Click and Manual Methods
Prerequisites: Before you begin, ensure the following:
• Install either the Stable or Insiders release of VS Code.
• Add the GitHub Copilot and GitHub Copilot Chat extensions.

Option 1: One-Click Install
You can install the Azure MCP Server in VS Code or VS Code Insiders using NPX. Simply run:

npx -y @azure/mcp@latest server start

Option 2: Manual Install
If you'd prefer manual setup, follow these steps:
1. Create a .vscode/mcp.json file in your VS Code project directory.
2. Add the following configuration:

{
  "servers": {
    "Azure MCP Server": {
      "command": "npx",
      "args": ["-y", "@azure/mcp@latest", "server", "start"]
    }
  }
}

Here is an example of the settings.json file with this configuration. Now, launch GitHub Copilot in Agent Mode to activate the Azure MCP Server.

🚀 Supercharging Azure Development
Once installed, the Azure MCP Server unlocks an array of capabilities:
• Azure Cosmos DB: list, query, and manage databases and containers.
• Azure Storage: query blob containers, metadata, and tables.
• Azure Monitor: use KQL to analyze logs and monitor your resources.
• App Configuration: handle key-value pairs and labeled configurations.
Test prompts like:
• "List my Azure Storage containers"
• "Query my Log Analytics workspace"
• "Show my key-value pairs in App Config"
These commands let your agents harness the power of Azure services effortlessly.

🛡️ Security & Authentication
The Azure MCP Server simplifies authentication using Azure Identity. Your login credentials are handled securely, with support for mechanisms like:
• Visual Studio credentials
• Azure CLI login
• Interactive browser login
For advanced scenarios, enable production credentials with:

export AZURE_MCP_INCLUDE_PRODUCTION_CREDENTIALS=true

Always perform a security review when integrating MCP servers to ensure compliance with regulations and standards.

🌟 Why Join the Azure MCP Community?
As a developer, you're invited to contribute to the Azure MCP Server project. Whether it's fixing bugs, adding features, or enhancing documentation, your contributions are valued. Explore the Contributing Guide for details on getting involved. The Azure MCP Server is your gateway to leveraging Azure services with cutting-edge technology. Dive in, experiment, and bring your projects to life! What Azure project are you excited to build with the MCP Server? Let's brainstorm ideas together!
Engineering a Local-First Agentic Podcast Studio: A Deep Dive into Multi-Agent Orchestration

The transition from standalone Large Language Models (LLMs) to agentic orchestration marks the next frontier in AI development. We are moving away from simple prompt-and-response cycles toward a paradigm where specialized, autonomous units (AI agents) collaborate to solve complex, multi-step problems. As a technology evangelist, my focus is on building these production-grade systems entirely on the edge, ensuring privacy, speed, and cost-efficiency. This technical guide explores the architecture and implementation of The AI Podcast Studio, a project that demonstrates the seamless integration of the Microsoft Agent Framework, local Small Language Models (SLMs), and VibeVoice to automate a complete tech podcast pipeline.

I. The Strategic Intelligence Layer: Why Local-First?
At the core of our studio is a local-first philosophy. While cloud-based LLMs are powerful, they introduce friction in high-frequency, creative pipelines. By using Ollama as a model manager, we run SLMs like Qwen-3-8B directly on user hardware.

1. Architectural Comparison: Local vs. Cloud
Choosing the deployment environment is a fundamental architectural decision. For an agentic podcasting workflow, the edge offers distinct advantages. Comparing local models (e.g., Qwen-3-8B) with cloud models (e.g., GPT-5.2) across four dimensions:
• Latency: local is zero/ultra-low, with instant token generation and no network "jitter"; cloud is variable, dependent on network stability and API traffic.
• Privacy: local offers total sovereignty, since creative data and drafts never leave the device; cloud carries shared risk, since data is processed on third-party servers.
• Cost: local has zero API fees (a one-time hardware investment, then unlimited tokens); cloud is pay-as-you-go, with costs scaling with token count and call frequency.
• Availability: local works offline, so the studio remains functional without an internet connection; cloud is online only and requires a stable, high-speed connection.

2. Reasoning and Tool-Calling on the Edge
To move beyond simple chat, we implement a reasoning mode that uses Chain-of-Thought (CoT) prompting. This allows our local agents to "think" through the podcast structure before writing. Furthermore, we grant them "superpowers" through tool-calling, allowing them to execute Python functions for real-time web searches to gather the latest news.

II. The Orchestration Engine: Microsoft Agent Framework
The true complexity of this project lies in agent orchestration: the coordination of specialized agents to work as a cohesive team. We distinguish between agents, which act as "jazz musicians" making flexible decisions, and workflows, which act as the "orchestra" following a predefined score.

1. Advanced Orchestration Patterns
Drawing from the WorkshopForAgentic architecture, the studio utilizes several sophisticated patterns:
• Sequential: a strict pipeline where the output of the Researcher flows into the Scriptwriter.
• Concurrent (parallel): multiple agents search different news sources simultaneously to speed up data gathering.
• Handoff: an agent dynamically transfers control to another specialist based on the context of the task.
• Magentic-One: a high-level "manager" agent decides which specialist should handle the next task in real time.

III. Implementation: Code Analysis (Workshop Patterns)
To maintain a production-grade codebase, we follow the modular structure found in the WorkshopForAgentic/code directory. This ensures that agents, clients, and workflows are decoupled and maintainable.

1. Configuration: Connecting to Local SLMs
The first step is initializing the local model client using the framework's Ollama integration.
# Based on WorkshopForAgentic/code/config.py
from agent_framework.ollama import OllamaChatClient

# Initialize the local client for Qwen-3-8B
# Standard Ollama endpoint on localhost
chat_client = OllamaChatClient(
    model_id="qwen3:8b",
    endpoint="http://localhost:11434"
)

2. Agent Definition: Specialized Roles
Each agent is a ChatAgent instance defined by its persona and instructions.

# Based on WorkshopForAgentic/code/agents.py
from agent_framework import ChatAgent

# The Researcher Agent: responsible for web discovery
researcher_agent = chat_client.create_agent(
    name="SearchAgent",
    instructions="You are my assistant. Answer the questions based on the search engine.",
    tools=[web_search],
)

# The Scriptwriter Agent: responsible for the conversational narrative
generate_script_agent = chat_client.create_agent(
    name="GenerateScriptAgent",
    instructions="""
    You are my podcast script generation assistant. Please generate a 10-minute Chinese podcast script based on the provided content.
    The podcast script should be co-hosted by Lucy (the host) and Ken (the expert).
    The script content should be generated based on the input, and the final output format should be as follows:
    Speaker 1: ……
    Speaker 2: ……
    Speaker 1: ……
    Speaker 2: ……
    Speaker 1: ……
    Speaker 2: ……
    """
)
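The researcher agent above references a web_search tool that is not shown in this excerpt. In the Agent Framework, a tool can be a plain Python function whose name, signature, and docstring tell the model what it does; the sketch below is a hypothetical stub, not the workshop's actual implementation, which would call a real search API.

```python
# Hypothetical sketch of the web_search tool referenced in agents.py above.
def web_search(query: str) -> str:
    """Search the web for the latest news about the given query and return a text summary."""
    # Placeholder implementation for local testing; swap in a real search API call here.
    return f"(stub) Top search results for: {query}"
```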
3. Workflow Setup: The Sequential Pipeline
For a deterministic production line, we use the WorkflowBuilder to connect our agents.

# Based on WorkshopForAgentic/code/workflow_setup.py
from agent_framework import WorkflowBuilder

# Building the podcast pipeline
search_executor = AgentExecutor(agent=search_agent, id="search_executor")
gen_script_executor = AgentExecutor(agent=gen_script_agent, id="gen_script_executor")
review_executor = ReviewExecutor(id="review_executor", genscript_agent_id="gen_script_executor")

# Build workflow with approval loop
# search_executor -> gen_script_executor -> review_executor
# If not approved, review_executor -> gen_script_executor (loop back)
workflow = (
    WorkflowBuilder()
    .set_start_executor(search_executor)
    .add_edge(search_executor, gen_script_executor)
    .add_edge(gen_script_executor, review_executor)
    .add_edge(review_executor, gen_script_executor)  # Loop back for regeneration
    .build()
)

IV. Multimodal Synthesis: VibeVoice Technology
The "Future Bytes" podcast is brought to life using VibeVoice, a specialized technology from Microsoft Research designed for natural conversational synthesis.
• Conversational rhythm: it automatically handles natural turn-taking and speech cadences.
• High efficiency: by operating at an ultra-low 7.5 Hz frame rate, it significantly reduces the compute power required for high-fidelity audio.
• Scalability: the system supports up to 4 distinct voices and can generate up to 90 minutes of continuous audio.

V. Observability and Debugging: DevUI
Building multi-agent systems requires deep visibility into the agentic "thinking" process. We leverage DevUI, a specialized web interface for testing and tracing:
• Interactive tracing: developers can watch the message flow and tool calling in real time.
• Automatic discovery: DevUI auto-discovers agents defined within the project structure.
• Input auto-generation: the UI generates input fields based on workflow requirements, allowing for rapid iteration.

VI. Technical Requirements for Edge Deployment
Deploying this studio locally requires specific hardware and software configurations to handle simultaneous LLM and TTS inference:
• Software: Python 3.10+, Ollama, and the Microsoft Agent Framework.
• Hardware: 16 GB+ RAM is the minimum requirement; 32 GB is recommended for running multiple agents and VibeVoice concurrently.
• Compute: a modern GPU/NPU (e.g., NVIDIA RTX or Snapdragon X Elite) is essential for smooth inference.

Final Perspective: From Coding to Directing
The AI Podcast Studio represents a significant shift toward agentic content creation. By mastering these orchestration patterns and leveraging local EdgeAI, developers move from simply writing code to directing entire ecosystems of intelligent agents. This local-first model ensures that the future of creativity is private, efficient, and infinitely scalable.

Download the sample here.

Resources
• EdgeAI for Beginners: https://github.com/microsoft/edgeai-for-beginners
• Microsoft Agent Framework: https://github.com/microsoft/agent-framework
• Microsoft Agent Framework Samples: https://github.com/microsoft/agent-framework-samples
On-Device AI with Windows AI Foundry and Foundry Local

From "waiting" to "instant", without sending data away
AI is everywhere, but speed, privacy, and reliability are critical. Users expect instant answers without compromise. On-device AI makes that possible: fast, private, and available even when the network isn't, empowering apps to deliver seamless experiences. Imagine an intelligent assistant that responds in seconds without sending text to the cloud. This approach brings speed and data control to the places that need them most, while still letting you tap into cloud power when it makes sense.

Windows AI Foundry: A Local Home for Models
Windows AI Foundry is a developer toolkit that makes it simple to run AI models directly on Windows devices. It uses ONNX Runtime under the hood and can leverage CPU, GPU (via DirectML), or NPU acceleration, without requiring you to manage those details. The principle is straightforward: keep the model and the data on the same device. Inference becomes faster, and data stays local by default unless you explicitly choose to use the cloud.

Foundry Local
Foundry Local is the engine that powers this experience. Think of it as a local AI runtime: fast, private, and easy to integrate into an app.

Why Adopt On-Device AI?
• Faster, more responsive apps: local inference often reduces perceived latency and improves user experience.
• Privacy-first by design: keep sensitive data on the device; avoid cloud round trips unless the user opts in.
• Offline capability: an app can provide AI features even without a network connection.
• Cost control: reduce cloud compute and data costs for common, high-volume tasks.
This approach is especially useful in regulated industries, field-work tools, and any app where users expect quick, on-device responses.

Hybrid Pattern for Real Apps
On-device AI doesn't replace the cloud; it complements it. Here's how:
• Standalone on-device: quick, private actions like document summarization, local search, and offline assistants.
• Cloud-enhanced (optional): large-context models, up-to-date knowledge, or heavy multimodal workloads.
Design an app to keep data local by default and surface cloud options transparently, with user consent and clear disclosures. Windows AI Foundry supports hybrid workflows:
• Use Foundry Local for real-time inference.
• Sync with Azure AI services for model updates, telemetry, and advanced analytics.
• Implement fallback strategies for resource-intensive scenarios.

Application Workflow Code Examples using Foundry Local

1. On-device only: try Foundry Local first, fall back to ONNX

if foundry_runtime.check_foundry_available():
    # Use on-device Foundry Local models
    try:
        answer = foundry_runtime.run_inference(question, context)
        return answer, "Foundry Local (On-Device)"  # (answer, source)
    except Exception as e:
        logger.warning(f"Foundry failed: {e}, trying ONNX...")

if onnx_model.is_loaded():
    # Fallback to local BERT ONNX model
    try:
        answer = bert_model.get_answer(question, context)
        return answer, "BERT ONNX (On-Device)"
    except Exception as e:
        logger.warning(f"ONNX failed: {e}")

return "Error: No local AI available"
2. Hybrid approach: on-device first, cloud as a last resort

def get_answer(question, context):
    """
    Priority order (for the hybrid approach, chosen per real-time scenario):
    1. Foundry Local (best: advanced + private)
    2. ONNX Runtime (good: fast + private)
    3. Cloud API (fallback: requires internet, less private)
    """
    if foundry_runtime.check_foundry_available():
        # Use on-device Foundry Local models
        try:
            answer = foundry_runtime.run_inference(question, context)
            return answer, "Foundry Local (On-Device)"  # (answer, source)
        except Exception as e:
            logger.warning(f"Foundry failed: {e}, trying ONNX...")

    if onnx_model.is_loaded():
        # Fallback to local BERT ONNX model
        try:
            answer = bert_model.get_answer(question, context)
            return answer, "BERT ONNX (On-Device)"
        except Exception as e:
            logger.warning(f"ONNX failed: {e}, trying cloud...")

    # Last resort: Cloud API (requires internet)
    if network_available():
        try:
            import requests
            response = requests.post(
                '{BASE_URL_AI_CHAT_COMPLETION}',
                headers={'Authorization': f'Bearer {API_KEY}'},
                json={
                    'model': '{MODEL_NAME}',
                    'messages': [{
                        'role': 'user',
                        'content': f'Context: {context}\n\nQuestion: {question}'
                    }]
                },
                timeout=10
            )
            answer = response.json()['choices'][0]['message']['content']
            return answer, "Cloud API (Online)"
        except Exception as e:
            return "Error: No AI runtime available", "Failed"
    else:
        return "Error: No internet and no local AI available", "Offline"

Demo Project Output
Screenshot: the Foundry Local engine ran the Phi-4-mini model offline and retrieved context-based data.
Screenshot: the Foundry Local engine ran the Phi-4-mini model offline and reported that there is no answer.

Practical Use Cases
• Privacy-first reading assistant: summarize documents locally without sending text to the cloud.
• Healthcare apps: analyze medical data on-device for compliance.
• Financial tools: risk scoring without exposing sensitive financial data.
• IoT and edge devices: real-time anomaly detection without network dependency.

Conclusion
On-device AI isn't just a trend; it's a shift toward smarter, faster, and more secure applications. With Windows AI Foundry and Foundry Local, developers can deliver experiences that respect user data, reduce latency, and work even when connectivity fails. By combining local inference with optional cloud enhancements, you get the best of both worlds: instant performance and scalable intelligence. Whether you're creating document summarizers, offline assistants, or compliance-ready solutions, this approach ensures your apps stay responsive, reliable, and user-centric.

References
• Get started with Foundry Local - Foundry Local | Microsoft Learn
• What is Windows AI Foundry? | Microsoft Learn
• https://devblogs.microsoft.com/foundry/unlock-instant-on-device-ai-with-foundry-local/
Building an MCP Server for Microsoft Learn

So why Microsoft Learn? Well, it's a treasure trove of knowledge for developers and IT pros. And because it has a search page with many filters, it lends itself well to being wrapped as an MCP server. I'm talking about this page, of course: Microsoft Learn.

> DISCLAIMER: the article below just shows how easily you can wrap an API and turn it into an MCP server. There is now an official MCP server for Docs and Learn, which you're encouraged to use instead of building one for Docs and Learn yourself; see the official server at https://github.com/MicrosoftDocs/mcp

MCP?
Let's back up the tape a little. MCP, what's that? MCP stands for Model Context Protocol and is a new standard for building AI apps. The idea is that you have features like prompts, tools, and resources that you can use to build AI applications. Because it's a standard, you can easily share these features with others. In fact, you can use MCP servers running locally as well as ones that run at an IP address. Pair that with an agentic client like Visual Studio Code or Claude Desktop and you have built an agent.

How do we build this?
The idea is to "wrap" Microsoft Learn's API for searching training content. This means that we will create an MCP server with tools that we can call. So first off, what parts of the API do we need?
• Free-text search. We definitely want this one, as it allows us to search for training content with keywords.
• Filters. We want to know what filters exist so we can search for content based on them.
• Topic. There are many different topics, like products, roles, and more; let's support this too.

Building the MCP Server
For this, we will use a transport method called SSE, Server-Sent Events. This is a simple way to create a server that people can interact with using HTTP. Here's how the server code will look:

from mcp.server.fastmcp import FastMCP
from starlette.applications import Starlette
from starlette.routing import Mount, Host

mcp = FastMCP("Microsoft Learn Search", "0.1.0")

@mcp.tool()
def learn_filter() -> list[Filter]:
    pass

@mcp.tool()
def free_text(query: str) -> list[Result]:
    pass

@mcp.tool()
def topic_search(category: str, topic: str) -> list[Result]:
    pass

port = 8000

app = Starlette(
    routes=[
        Mount('/', app=mcp.sse_app()),
    ]
)

This code sets up a basic MCP server using FastMCP and Starlette. The learn_filter, free_text, and topic_search functions are placeholders for the actual implementation of the tools that will interact with the Microsoft Learn API. Note the use of the decorator @mcp.tool, which registers these functions as tools in the MCP server. Look also at the result types Filter and Result, which define the structure of the data returned by these tools.

Implementing the Tools
We will implement the tools one by one.

Implementing the learn_filter Tool
Let's start with the learn_filter function. This function will retrieve the available filters from the Microsoft Learn API. Here's how we can implement it:

@mcp.tool()
def learn_filter() -> list[Filter]:
    data = search_learn()
    return data

Next, let's implement search_learn like so:

# search/search.py
import requests
from config import BASE_URL
from utils.models import Filter

def search_learn() -> list[Filter]:
    """
    Top-level search function to fetch learn data from the Microsoft Learn API.
    """
    url = to_url()
    response = requests.get(url)
    if response.status_code == 200:
        data = response.json()
        facets = data["facets"]
        filters = []
        # print(facets.keys())
        for key in facets.keys():
            # print(f"{key}")
            for item in facets[key]:
                filter = Filter(**item)
                filters.append(filter)
                # print(f" - {filter.value} ({filter.count})")
        return filters
    else:
        return None

def to_url():
    return f"{BASE_URL}/api/contentbrowser/search?environment=prod&locale=en-us&facet=roles&facet=levels&facet=products&facet=subjects&facet=resource_type&%24top=30&showHidden=false&fuzzySearch=false"

# config.py
BASE_URL = "https://learn.microsoft.com"

We also need to define the Filter model; we can use Pydantic for this:

# utils/models.py
from pydantic import BaseModel

class Filter(BaseModel):
    type: str
    value: str
    count: int

How do we know what this looks like? Well, we made a request to the API and looked at the response. Here's an excerpt:

{
  "facets": {
    "roles": [
      { "type": "role", "value": "Administrator", "count": 10 },
      { "type": "role", "value": "Developer", "count": 20 }
    ],
    ...
  }
}

Technically, each item has more properties, but we only need the type, value, and count for our purposes.

Implementing the free_text Tool
For this, we will implement the free_text function, like so:

# search/free_text.py
import requests
from utils.models import Result
from config import BASE_URL

def search_free_text(text: str) -> list[Result]:
    url = to_url(text)
    response = requests.get(url)
    results = []
    if response.status_code == 200:
        data = response.json()
        if "results" in data:
            records = len(data["results"])
            print(f"Search results: {records} records found")
            for item in data["results"]:
                result = Result(**item)
                result.url = f"{BASE_URL}{result.url}"
                results.append(result)
    return results

def to_url(text: str):
    return f"{BASE_URL}/api/contentbrowser/search?environment=prod&locale=en-us&terms={text}&facet=roles&facet=levels&facet=products&facet=subjects&facet=resource_type&%24top=30&showHidden=false&fuzzySearch=false"

Here we use a new type Result, which we also need to define. It represents the search results returned by the Microsoft Learn API.
Let's define it:

# utils/models.py
from pydantic import BaseModel

class Result(BaseModel):
    title: str
    url: str
    popularity: float
    summary: str | None = None

Finally, let's wire this up in our MCP server:

@mcp.tool()
def learn_filter() -> list[Filter]:
    data = search_learn()
    return data

@mcp.tool()
def free_text(query: str) -> list[Result]:
    print("LOG: free_text called with query:", query)
    data = search_free_text(query)
    return data

Implementing Search by Topic
Just one more method to implement, the search_topic method. Let's implement it like so:

import requests
from utils.models import Result
from config import BASE_URL

def search_topic(topic: str, category: str = "products") -> list[Result]:
    results = []
    url = to_url(category, topic)
    response = requests.get(url)
    if response.status_code == 200:
        data = response.json()
        for item in data["results"]:
            result = Result(**item)
            result.url = f"{BASE_URL}{result.url}"
            results.append(result)
        return results
    else:
        return []

def to_url(category: str, topic: str):
    return f"{BASE_URL}/api/contentbrowser/search?environment=prod&locale=en-us&facet=roles&facet=levels&facet=products&facet=subjects&facet=resource_type&%24filter=({category}%2Fany(p%3A%20p%20eq%20%27{topic}%27))&%24top=30&showHidden=false&fuzzySearch=false"

Great, this one also uses Result as the return type, so all that's left is to wire it up in our MCP server:

@mcp.tool()
def topic_search(category: str, topic: str) -> list[Result]:
    print("LOG: topic_search called with category:", category, "and topic:", topic)
    data = search_topic(topic, category)
    return data
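Before testing through VS Code, you can sanity-check the search helpers from a plain Python script. A minimal example is below; it assumes the module layout shown above (a search package containing free_text.py, with an __init__.py if needed).

```python
# quick_test.py -- exercise the search helpers outside MCP to verify the Learn API calls
from search.free_text import search_free_text

if __name__ == "__main__":
    for result in search_free_text("javascript")[:5]:
        print(f"{result.title} ({result.popularity:.2f}) -> {result.url}")
```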
That's it; let's move on to testing.

Testing the MCP Server with Visual Studio Code
You can test this with Visual Studio Code and its Agent mode. What you need to do is create a file mcp.json in the .vscode directory and make sure it looks like this:

{
  "inputs": [],
  "servers": {
    "learn": {
      "type": "sse",
      "url": "http://localhost:8000/sse"
    }
  }
}

Note how we point to a server running on port 8000; this is where our MCP server will run.

Installing Dependencies
We will need the following dependencies to run this code:

pip install requests "mcp[cli]"

You're also recommended to create a virtual environment for this project, which you can do with:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Then install the dependencies as shown above.

Running the MCP Server
To run the MCP server, you can use the following command:

uvicorn server:app

Uvicorn looks for a file server.py and an app variable in that file, which is the Starlette app we created earlier.

Try it out
We now have the server running and an entry in Visual Studio Code. Click the play icon just above your entry in Visual Studio Code; this should connect to the MCP server. Try it out by typing in the chat box in the bottom right of your GitHub Copilot extension. For example, type "Do a free text search on JavaScript, use a tool". You should see the response coming back from Microsoft Learn and your MCP server.

Summary
Congrats, if you've done all the steps so far, you've managed to build an MCP server and solve a very practical problem: how to wrap an API for search AND make it "agentic".

Learn more
Check out the repo: MCP Curriculum

How to Use Postgres MCP Server with GitHub Copilot in VS Code

GitHub Copilot has changed how developers write code, but when combined with an MCP (Model Context Protocol) server, it also connects to your services. With it, Copilot can understand your database schema and generate relevant code for your API, data models, or business logic. In this guide, you'll learn how to use the Neon serverless Postgres MCP server with GitHub Copilot in VS Code to build a sample REST API quickly. We'll walk through how to create an Azure Function that fetches data from a Neon database, all with minimal setup and no manual query writing.

From Code Generation to Database Management with GitHub Copilot
AI agents are no longer just helping write code; they're creating and managing databases. When a chatbot logs a customer conversation, a new region spins up in the Azure cloud, or a new user signs up, an AI agent can automatically create a database in seconds. No need to open a dashboard or call an API. This is the next evolution of software development: infrastructure as intent. With tools like database MCP servers, agents can now generate real backend services as easily as they generate code.

GitHub Copilot becomes your full-stack teammate. It can answer database-related questions, fetch your database connection string, update environment variables in your Azure Function, generate SQL queries to populate tables with mock data, and even help you create new databases or tables on the fly. All directly from your editor, with natural-language prompts. Neon has a dedicated MCP server that makes it possible for Copilot to directly understand the structure of your Postgres database. Let's get started with using the Neon MCP server and GitHub Copilot.

What You'll Need
• Node.js (>= v18.0.0) and npm: download from nodejs.org.
• An Azure subscription (create one for free).
• Either the Stable or Insiders release of VS Code.
• The GitHub Copilot, GitHub Copilot for Azure, and GitHub Copilot Chat extensions for VS Code.
• Azure Functions Core Tools (for local testing).

Connect GitHub Copilot to the Neon MCP Server

Create a Neon Database
Visit the Neon on Azure Marketplace page and follow the Create a Neon resource guide to deploy Neon on Azure for your subscription. Neon's free plan is more than enough to build a proof of concept or kick off a new startup project.

Install the Neon MCP Server for VS Code
The Neon MCP Server offers two options for connecting your VS Code MCP client to Neon. We will use the remote hosted MCP server option. This is the simplest setup: no need to install anything locally or configure a Neon API key in your client. Add the following Neon MCP server configuration to your user settings in VS Code:

{
  "mcp": {
    "servers": {
      "Neon": {
        "command": "npx",
        "args": ["-y", "mcp-remote", "https://mcp.neon.tech/sse"]
      }
    }
  }
}

Click Start on the MCP server. A browser window will open with an OAuth prompt; just follow the steps to give your VS Code MCP client access to your Neon account.

Generate an Azure Function REST API using GitHub Copilot
Step 1: Create an empty folder (e.g., mcp-server-vs-code) and open it in VS Code.
Step 2: Open GitHub Copilot Chat in VS Code and switch to Agent mode. You should see the available tools.
Step 3: Ask Copilot something like "Create an Azure function with an HTTP trigger". Copilot will generate a REST API using Azure Functions in JavaScript, with a basic setup to run the functions locally.
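For reference, the kind of HTTP-triggered scaffold Copilot produces with the Azure Functions Node.js v4 programming model looks roughly like this. The file and function names are illustrative; your generated code will vary.

```javascript
// src/functions/httpTrigger1.js -- illustrative scaffold, not Copilot's exact output
const { app } = require('@azure/functions');

app.http('httpTrigger1', {
    methods: ['GET', 'POST'],
    authLevel: 'anonymous',
    handler: async (request, context) => {
        context.log(`Http function processed request for url "${request.url}"`);
        const name = request.query.get('name') || (await request.text()) || 'world';
        return { body: `Hello, ${name}!` };
    }
});
```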
Step 4: Next, ask Copilot to list your existing Neon projects.
Step 5: Try out different prompts to fetch the connection string for the chosen database and set it in the Azure Functions settings. Then ask Copilot to create a sample Customer table, and so on. You can even prompt it to create a new database branch on Neon.
Step 6: Finally, prompt Copilot to update the Azure Functions code to fetch data from the table.

Combine the Azure MCP Server
Neon MCP gives GitHub Copilot full access to your database schema, so it can help you write SQL queries, connect to the database, and build API endpoints. But when you add the Azure MCP Server into the mix, Copilot can also understand your Azure services, like Blob Storage, Queues, and Azure AI. You can run both Neon MCP and Azure MCP at the same time to create a full cloud-native developer experience. For example:
• Use Neon MCP for serverless Postgres with branching and instant scale.
• Use Azure MCP to connect to other Azure services from the same Copilot chat.
Even better: Azure MCP is evolving. In newer versions, you can spin up Azure Functions and other services directly from Copilot chat, without ever leaving your editor. Copilot pulls context from both MCP servers, which means smarter code suggestions and faster development. You can mix and match based on your stack and let Copilot help you build real, working apps in minutes.

Final Thoughts
With GitHub Copilot, the Neon MCP server, and Azure Functions, you're no longer writing backend code line by line; building APIs becomes dramatically faster because you're orchestrating services with intent. This is not the future: it's something you can use today.