agents
127 Topicsđ AI Toolkit for VS Code: January 2026 Update
Happy New Year! đ We are kicking off 2026 with a major set of updates designed to streamline how you build, test, and deploy AI agents. This month, weâve focused on aligning with the latest GitHub Copilot standards, introducing powerful new debugging tools, and enhancing our support for enterprise-grade models via Microsoft Foundry. đĄ From Copilot Instructions to Agent Skills The biggest architectural shift following the latest VS Code Copilot standards, in v0.28.1 is the transition from Copilot Instructions to Copilot Skills. This transition has equipped GitHub Copilot specialized skills on developing AI agents using Microsoft Foundry and Agent Framework in a cost-efficient way. In AI Toolkit, we have migrated our Copilot Tools from the Custom Instructions to Agent Skills. This change allows for a more capable integration within GitHub Copilot Chat. đ Enhanced AIAgentExpert: Our custom agent now has a deeper understanding of workflow code generation and evaluation planning/execution. đ§šAutomatic Migration: When you upgrade to v0.28.1, the toolkit will automatically clean up your old instructions to ensure a seamless transition to the new skills-based framework. đď¸ Major Enhancements to Agent Development Our v0.28.0 milestone release brought significant improvements to how agents are authored and authenticated. đ Anthropic & Entra Auth Support Weâve expanded the Agent Builder and Playground to support Anthropic models using Entra Auth types. This provides enterprise developers with a more secure way to leverage Claude models within the Agent Framework while maintaining strict authentication standards. đ˘ Foundry-First Development We are prioritizing the Microsoft Foundry ecosystem to provide a more robust development experience: Foundry v2: Code generation for agents now defaults to Foundry v2. ⥠Eval Tool: You can now generate evaluation code directly within the toolkit to create and run evaluations in Microsoft Foundry. đ Model Catalog: Weâve optimized the Model Catalog to prioritize Foundry models and improved general loading performance. đď¸ đť Performance and Local Models For developers building on Windows, we continue to optimize the local model experience: Profiling for Windows ML: Version 0.28.0 introduces profiling features for Windows ML-based local models, allowing you to monitor performance and resource utilization directly within VS Code. Platform Optimization: To keep the interface clean, weâve removed the Windows AI API tab from the Model Catalog when running on Linux and macOS platforms. đ Squashing Bugs & Polishing the Experience Codespaces Fix: Resolved a crash occurring when selecting images in the Playground while using GitHub Codespaces. Resource Management: Fixed a delay where newly added models wouldn't immediately appear in the "My Resources" view. Claude Compatibility: Fixed an issue where non-empty content was required for Claude models when used via the AI Toolkit in GitHub Copilot. đ Getting Started Ready to experience the future of AI development? Here's how to get started: đĽ Download: Install the AI Toolkit from the Visual Studio Code Marketplace đ Learn: Explore our comprehensive AI Toolkit Documentation đ Discover: Check out the complete changelog for v0.24.0 We'd love to hear from you! Whether it's a feature request, bug report, or feedback on your experience, join the conversation and contribute directly on our GitHub repository. Happy Coding! đťâ¨Microsoft Frontier Program expands to individual Microsoft subscribers
The Frontier program that gives commercial Microsoft 365 Copilot customers early access to exciting, cutting-edge capabilities is now coming to individuals who have a Microsoft 365 Personal or Family subscription.9.5KViews7likes11CommentsCustomer Story: EPAM's Data-Driven Copilot Adoption Journey
EPAM, a global engineering consulting organization, used Microsoft 365 admin center and Super User reports to drive adoption at scale, using customized reporting for deeper insights and measurable impact. EPAMâs Copilot story is one of proactive measurement, data-driven license management, and continuous improvement â combining early adoption, custom analytics, and Microsoftâs evolving tools to lead in AI-driven enterprise productivity.722Views6likes0CommentsImplementing A2A protocol in NET: A Practical Guide
As AI systems mature into multiâagent ecosystems, the need for agents to communicate reliably and securely has become fundamental. Traditionally, agents built on different frameworks like Semantic Kernel, LangChain, custom orchestrators, or enterprise APIs do not share a common communication model. This creates brittle integrations, duplicate logic, and siloed intelligence. The AgentâtoâAgent Standard (A2AS) addresses this gap by defining a universal, vendorâneutral protocol for structured agent interoperability. A2A establishes a common language for agents, built on familiar web primitives: JSONâRPC 2.0 for messaging and HTTPS for transport. Each agent exposes a machineâreadable Agent Card describing its capabilities, supported input/output modes, and authentication requirements. Interactions are modeled as Tasks, which support synchronous, streaming, and longârunning workflows. Messages exchanged within a task contain Parts; text, structured data, files, or streams, that allow agents to collaborate without exposing internal implementation details. By standardizing discovery, communication, authentication, and task orchestration, A2A enables organizations to build composable AI architectures. Specialized agents can coordinate deep reasoning, planning, data retrieval, or business automation regardless of their underlying frameworks or hosting environments. This modularity, combined with industry adoption and Linux Foundation governance, positions A2A as a foundational protocol for interoperable AI systems. A2AS in .NET â Implementation Guide Prerequisites ⢠.NET 8 SDK ⢠Visual Studio 2022 (17.8+) ⢠A2A and A2A.AspNetCore packages ⢠Curl/Postman (optional, for direct endpoint testing) The openâsource A2A project provides a fullâfeatured .NET SDK, enabling developers to build and host A2A agents using ASP.NET Core or integrate with other agents as a client. Two A2A and A2A.AspNetCore packages power the experience. The SDK offers: A2AClient - to call remote agents TaskManager - to manage incoming tasks & message routing AgentCard / Message / Task models - strongly typed protocol objects MapA2A() - ASP.NET Core router integration that autoâgenerates protocol endpoints This allows you to expose an A2Aâcompliant agent with minimal boilerplate. Project Setup Create two separate projects: CurrencyAgentService â ASP.NET Core web project that hosts the agent A2AClient â Console app that discovers the agent card and sends a message Install the packages from the pre-requisites in the above projects. Building a Simple A2A Agent (Currency Agent Example) Below is a minimal Currency Agent implemented in ASP.NET Core. It responds by converting amounts between currencies. Step 1: In CurrencyAgentService project, create the CurrencyAgentImplementation class to implement the A2A agent. The class contains the logic for the following: a) Describing itself (agent âcardâ metadata). b) Processing the incoming text messages like â100 USD to EURâ. c) Returning a single text response with the conversion. The AttachTo(ITaskManager taskManager) method hooks two delegates on the provided taskManager - a) OnAgentCardQuery â GetAgentCardAsync: returns agent metadata. b) OnMessageReceived â ProcessMessageAsync: handles incoming messages and produces a response. Step 2: In the Program.cs of the Currency Agent Solution, create a TaskManager , and attach the agent to it, and expose the A2A endpoint. Typical flow: GET /agent â A2A host asks OnAgentCardQuery â returns the card POST /agent with a text message â A2A host calls OnMessageReceived â returns the conversion text. All fully A2Aâcompliant. Calling an A2A Agent from .NET To interact with any A2Aâcompliant agent from .NET, the client follows a predictable sequence: identify where the agent lives, discover its capabilities through the Agent Card, initialize a correctly configured A2AClient, construct a wellâformed message, send it asynchronously, and finally interpret the structured response. This ensures your client is fully aligned with the agentâs advertised contract and remains resilient as capabilities evolve. Below are the steps implemented to call the A2A agent from the A2A client: Identify the agent endpoint: Why: You need a stable base URL to resolve the agentâs metadata and send messages. What: Construct a Uri pointing to the agent service, e.g., https://localhost:7009/agent. Discover agent capabilities via an Agent Card. Why: Agent Cards provide a contract: name, description, final URL to call, and features (like streaming). This de-couples your client from hard-coded assumptions and enables dynamic capability checks. What: Use A2ACardResolver with the endpoint Uri, then call GetAgentCardAsync() to obtain an AgentCard. Initialize the A2AClient with the resolved URL. Why: The client encapsulates transport details and ensures messages are sent to the correct agent endpoint, which may differ from the discovery URL. What: Create A2AClient using new Uri (currencyCard.Url) from the Agent Card for correctness. Construct a well-formed agent request message. Why: Agents typically require structured messages for roles, traceability, and multi-part inputs. A unique message ID supports deduplication and logging. What: Build an AgentMessage: ⢠Role = MessageRole.User clarifies intent. ⢠MessageId = Guid.NewGuid().ToString() ensures uniqueness. ⢠Parts contains content; for simple queries, a single TextPart with the prompt (e.g., â100 USD to EURâ). Package and send the message. Why: MessageSendParams can carry the message plus any optional settings (e.g., streaming flags or context). Using a dedicated params object keeps the API extensible. What: Wrap the AgentMessage in MessageSendParams and call SendMessageAsync(...) on the A2AClient. Outcome: Await the asynchronous response to avoid blocking and to stay scalable. Interpret the agent response. Why: Agents can return multiple Parts (text, data, attachments). Extracting the appropriate part avoids assumptions and keeps your client robust. What: Cast to AgentMessage, then read the first TextPartâs Text for the conversion result in this scenario. Best Practices 1. Keep Agents Focused and SingleâPurpose Design each agent around a clear, narrow capability (e.g., currency conversion, scheduling, document summarization). Singleâresponsibility agents are easier to reason about, scale, and test, especially when they become part of larger multiâagent workflows. 2. Maintain Accurate and Helpful Agent Cards The Agent Card is the first interaction point for any client. Ensure it accurately reflects: Supported input/output formats Streaming capabilities Authentication requirements (if any) Version information A clean and honest card helps clients integrate reliably without guesswork. 3. Prefer Structured Inputs and Outputs Although A2A supports plain text, using structured payloads through DataPart objects significantly improves consistency. JSON inputs and outputs reduce ambiguity, eliminate promptâengineering edge cases, and make agent behavior more deterministic especially when interacting with other automated agents. 4. Use Meaningful Task States Treat A2A Tasks as proper state machines. Transition through states intentionally (Submitted â Working â Completed, or Working â InputRequired â Completed). This gives clients clarity on progress, makes longârunning operations manageable, and enables more sophisticated control flows. 5. Provide Helpful Error Messages Make use of A2A and JSONâRPC error codes such as -32602 (invalid input) or -32603 (internal error), and include additional context in the error payload. Avoid opaque messages, error details should guide the client toward recovery or correction. 6. Keep Agents Stateless Where Possible Stateless agents are easier to scale and less prone to hidden failures. When state is necessary, ensure it is stored externally or passed through messages or task contexts. For local POCs, inâmemory state is acceptable, but design with future statelessness in mind. 7. Validate Input Strictly Do not assume incoming messages are wellâformed. Validate fields, formats, and required parameters before processing. For example, a currency conversion agent should confirm both currencies exist and the value is numeric before attempting a conversion. 8. Design for Streaming Even if Disabled Streaming is optional, but itâs a powerful pattern for agents that perform progressive reasoning or long computations. Structuring your logic so it can later emit partial TextPart updates makes it easy to upgrade from synchronous to streaming workflows. 9. Include Traceability Metadata Embed and log identifiers such as TaskId, MessageId, and timestamps. These become crucial for debugging multiâagent scenarios, improving observability, and correlating distributed workflowsâespecially once multiple agents collaborate. 10. Offer Clear Guidance When Input Is Missing Instead of returning a generic failure, consider shifting the task to InputRequired and explaining what the client should provide. This improves usability and makes your agent selfâdocumenting for new consumers.Engineering a Local-First Agentic Podcast Studio: A Deep Dive into Multi-Agent Orchestration
The transition from standalone Large Language Models (LLMs) to Agentic Orchestration marks the next frontier in AI development. We are moving away from simple "prompt-and-response" cycles toward a paradigm where specialized, autonomous unitsâAI Agentsâcollaborate to solve complex, multi-step problems. As a Technology Evangelist, my focus is on building these production-grade systems entirely on the edge, ensuring privacy, speed, and cost-efficiency. This technical guide explores the architecture and implementation of The AI Podcast Studio. This project demonstrates the seamless integration of the Microsoft Agent Framework, Local Small Language Models (SLMs), and VibeVoice to automate a complete tech podcast pipeline. I. The Strategic Intelligence Layer: Why Local-First? At the core of our studio is a Local-First philosophy. While cloud-based LLMs are powerful, they introduce friction in high-frequency, creative pipelines. By using Ollama as a model manager, we run SLMs like Qwen-3-8B directly on user hardware. 1. Architectural Comparison: Local vs. Cloud Choosing the deployment environment is a fundamental architectural decision. For an agentic podcasting workflow, the edge offers distinct advantages: Dimension Local Models (e.g., Qwen-3-8B) Cloud Models (e.g., GPT-5.2) Latency Zero/Ultra-low: Instant token generation without network "jitter". Variable: Dependent on network stability and API traffic. Privacy Total Sovereignty: Creative data and drafts never leave the local device. Shared Risk: Data is processed on third-party servers. Cost Zero API Fees: One-time hardware investment; free to run infinite tokens. Pay-as-you-go: Costs scale with token count and frequency of calls. Availability Offline: The studio remains functional without an internet connection. Online Only: Requires a stable, high-speed connection. 2. Reasoning and Tool-Calling on the Edge To move beyond simple chat, we implement Reasoning Mode, utilizing Chain-of-Thought (CoT) prompting. This allows our local agents to "think" through the podcast structure before writing. Furthermore, we grant them "superpowers" through Tool-Calling, allowing them to execute Python functions for real-time web searches to gather the latest news. II. The Orchestration Engine: Microsoft Agent Framework The true complexity of this project lies in Agent Orchestrationâthe coordination of specialized agents to work as a cohesive team. We distinguish between Agents, who act as "Jazz Musicians" making flexible decisions, and Workflows, which act as the "Orchestra" following a predefined score. 1. Advanced Orchestration Patterns Drawing from the WorkshopForAgentic architecture, the studio utilizes several sophisticated patterns: Sequential: A strict pipeline where the output of the Researcher flows into the Scriptwriter. Concurrent (Parallel): Multiple agents search different news sources simultaneously to speed up data gathering. Handoff: An agent dynamically "transfers" control to another specialist based on the context of the task. Magentic-One: A high-level "Manager" agent decides which specialist should handle the next task in real-time. III. Implementation: Code Analysis (Workshop Patterns) To maintain a production-grade codebase, we follow the modular structure found in the WorkshopForAgentic/code directory. This ensures that agents, clients, and workflows are decoupled and maintainable. 1. Configuration: Connecting to Local SLMs The first step is initializing the local model client using the framework's Ollama integration. # Based on WorkshopForAgentic/code/config.py from agent_framework.ollama import OllamaChatClient # Initialize the local client for Qwen-3-8B # Standard Ollama endpoint on localhost chat_client = OllamaChatClient( model_id="qwen3:8b", endpoint="http://localhost:11434" ) 2. Agent Definition: Specialized Roles Each agent is a ChatAgent instance defined by its persona and instructions. # Based on WorkshopForAgentic/code/agents.py from agent_framework import ChatAgent # The Researcher Agent: Responsible for web discovery researcher_agent = client.create_agent( name="SearchAgent", instructions="You are my assistant. Answer the questions based on the search engine.", tools=[web_search], ) # The Scriptwriter Agent: Responsible for conversational narrative generate_script_agent = client.create_agent( name="GenerateScriptAgent", instructions=""" You are my podcast script generation assistant. Please generate a 10-minute Chinese podcast script based on the provided content. The podcast script should be co-hosted by Lucy (the host) and Ken (the expert). The script content should be generated based on the input, and the final output format should be as follows: Speaker 1: âŚâŚ Speaker 2: âŚâŚ Speaker 1: âŚâŚ Speaker 2: âŚâŚ Speaker 1: âŚâŚ Speaker 2: âŚâŚ """ ) 3. Workflow Setup: The Sequential Pipeline For a deterministic production line, we use the WorkflowBuilder to connect our agents. # Based on WorkshopForAgentic/code/workflow_setup.py from agent_framework import WorkflowBuilder # Building the podcast pipeline search_executor = AgentExecutor(agent=search_agent, id="search_executor") gen_script_executor = AgentExecutor(agent=gen_script_agent, id="gen_script_executor") review_executor = ReviewExecutor(id="review_executor", genscript_agent_id="gen_script_executor") # Build workflow with approval loop # search_executor -> gen_script_executor -> review_executor # If not approved, review_executor -> gen_script_executor (loop back) workflow = ( WorkflowBuilder() .set_start_executor(search_executor) .add_edge(search_executor, gen_script_executor) .add_edge(gen_script_executor, review_executor) .add_edge(review_executor, gen_script_executor) # Loop back for regeneration .build() ) IV. Multimodal Synthesis: VibeVoice Technology The "Future Bytes" podcast is brought to life using VibeVoice, a specialized technology from Microsoft Research designed for natural conversational synthesis. Conversational Rhythm: It automatically handles natural turn-taking and speech cadences. High Efficiency: By operating at an ultra-low 7.5 Hz frame rate, it significantly reduces the compute power required for high-fidelity audio. Scalability: The system supports up to 4 distinct voices and can generate up to 90 minutes of continuous audio. V. Observability and Debugging: DevUI Building multi-agent systems requires deep visibility into the agentic "thinking" process. We leverage DevUI, a specialized web interface for testing and tracing: Interactive Tracing: Developers can watch the message flow and tool-calling in real-time. Automatic Discovery: DevUI auto-discovers agents defined within the project structure. Input Auto-Generation: The UI generates input fields based on workflow requirements, allowing for rapid iteration. VI. Technical Requirements for Edge Deployment Deploying this studio locally requires specific hardware and software configurations to handle simultaneous LLM and TTS inference: Software: Python 3.10+, Ollama, and the Microsoft Agent Framework. Hardware: 16GB+ RAM is the minimum requirement; 32GB is recommended for running multiple agents and VibeVoice concurrently. Compute: A modern GPU/NPU (e.g., NVIDIA RTX or Snapdragon X Elite) is essential for smooth inference. Final Perspective: From Coding to Directing The AI Podcast Studio represents a significant shift toward Agentic Content Creation. By mastering these orchestration patterns and leveraging local EdgeAI, developers move from simply writing code to directing entire ecosystems of intelligent agents. This "local-first" model ensures that the future of creativity is private, efficient, and infinitely scalable. Download sample Here Resource EdgeAI for Beginners - https://github.com/microsoft/edgeai-for-beginners Microsoft Agent Framework - https://github.com/microsoft/agent-framework Microsoft Agent Framework Samples - https://github.com/microsoft/agent-framework-samples7.6KViews3likes0CommentsFrom Cloud to Chip: Building Smarter AI at the Edge with Windows AI PCs
As AI engineers, weâve spent years optimizing models for the cloud, scaling inference, wrangling latency, and chasing compute across clusters. But the frontier is shifting. With the rise of Windows AI PCs and powerful local accelerators, the edge is no longer a constraint itâs now a canvas. Whether you're deploying vision models to industrial cameras, optimizing speech interfaces for offline assistants, or building privacy-preserving apps for healthcare, Edge AI is where real-world intelligence meets real-time performance. Why Edge AI, Why Now? Edge AI isnât just about running models locally, itâs about rethinking the entire lifecycle: - Latency: Decisions in milliseconds, not round-trips to the cloud. - Privacy: Sensitive data stays on-device, enabling HIPAA/GDPR compliance. - Resilience: Offline-first apps that donât break when the network does. - Cost: Reduced cloud compute and bandwidth overhead. With Windows AI PCs powered by Intel and Qualcomm NPUs and tools like ONNX Runtime, DirectML, and Olive, developers can now optimize and deploy models with unprecedented efficiency. What Youâll Learn in Edge AI for Beginners The Edge AI for Beginners curriculum is a hands-on, open-source guide designed for engineers ready to move from theory to deployment. Multi-Language Support This content is available in over 48 languages, so you can read and study in your native language. What You'll Master This course takes you from fundamental concepts to production-ready implementations, covering: Small Language Models (SLMs) optimized for edge deployment Hardware-aware optimization across diverse platforms Real-time inference with privacy-preserving capabilities Production deployment strategies for enterprise applications Why EdgeAI Matters Edge AI represents a paradigm shift that addresses critical modern challenges: Privacy & Security: Process sensitive data locally without cloud exposure Real-time Performance: Eliminate network latency for time-critical applications Cost Efficiency: Reduce bandwidth and cloud computing expenses Resilient Operations: Maintain functionality during network outages Regulatory Compliance: Meet data sovereignty requirements Edge AI Edge AI refers to running AI algorithms and language models locally on hardware, close to where data is generated without relying on cloud resources for inference. It reduces latency, enhances privacy, and enables real-time decision-making. Core Principles: On-device inference: AI models run on edge devices (phones, routers, microcontrollers, industrial PCs) Offline capability: Functions without persistent internet connectivity Low latency: Immediate responses suited for real-time systems Data sovereignty: Keeps sensitive data local, improving security and compliance Small Language Models (SLMs) SLMs like Phi-4, Mistral-7B, Qwen and Gemma are optimized versions of larger LLMs, trained or distilled for: Reduced memory footprint: Efficient use of limited edge device memory Lower compute demand: Optimized for CPU and edge GPU performance Faster startup times: Quick initialization for responsive applications They unlock powerful NLP capabilities while meeting the constraints of: Embedded systems: IoT devices and industrial controllers Mobile devices: Smartphones and tablets with offline capabilities IoT Devices: Sensors and smart devices with limited resources Edge servers: Local processing units with limited GPU resources Personal Computers: Desktop and laptop deployment scenarios Course Modules & Navigation Course duration. 10 hours of content Module Topic Focus Area Key Content Level Duration đ 00 Introduction to EdgeAI Foundation & Context EdgeAI Overview ⢠Industry Applications ⢠SLM Introduction ⢠Learning Objectives Beginner 1-2 hrs đ 01 EdgeAI Fundamentals Cloud vs Edge AI comparison EdgeAI Fundamentals ⢠Real World Case Studies ⢠Implementation Guide ⢠Edge Deployment Beginner 3-4 hrs đ§ 02 SLM Model Foundations Model families & architecture Phi Family ⢠Qwen Family ⢠Gemma Family ⢠BitNET ⢠ΟModel ⢠Phi-Silica Beginner 4-5 hrs đ 03 SLM Deployment Practice Local & cloud deployment Advanced Learning ⢠Local Environment ⢠Cloud Deployment Intermediate 4-5 hrs âď¸ 04 Model Optimization Toolkit Cross-platform optimization Introduction ⢠Llama.cpp ⢠Microsoft Olive ⢠OpenVINO ⢠Apple MLX ⢠Workflow Synthesis Intermediate 5-6 hrs đ§ 05 SLMOps Production Production operations SLMOps Introduction ⢠Model Distillation ⢠Fine-tuning ⢠Production Deployment Advanced 5-6 hrs đ¤ 06 AI Agents & Function Calling Agent frameworks & MCP Agent Introduction ⢠Function Calling ⢠Model Context Protocol Advanced 4-5 hrs đť 07 Platform Implementation Cross-platform samples AI Toolkit ⢠Foundry Local ⢠Windows Development Advanced 3-4 hrs đ 08 Foundry Local Toolkit Production-ready samples Sample applications (see details below) Expert 8-10 hrs Each module includes Jupyter notebooks, code samples, and deployment walkthroughs, perfect for engineers who learn by doing. Developer Highlights - đ§ Olive: Microsoft's optimization toolchain for quantization, pruning, and acceleration. - đ§Š ONNX Runtime: Cross-platform inference engine with support for CPU, GPU, and NPU. - đŽ DirectML: GPU-accelerated ML API for Windows, ideal for gaming and real-time apps. - đĽď¸ Windows AI PCs: Devices with built-in NPUs for low-power, high-performance inference. Local AI: Beyond the Edge Local AI isnât just about inference, itâs about autonomy. Imagine agents that: - Learn from local context - Adapt to user behavior - Respect privacy by design With tools like Agent Framework, Azure AI Foundry and Windows Copilot Studio, and Foundry Local developers can orchestrate local agents that blend LLMs, sensors, and user preferences, all without cloud dependency. Try It Yourself Ready to get started? Clone the Edge AI for Beginners GitHub repo, run the notebooks, and deploy your first model to a Windows AI PC or IoT devices Whether you're building smart kiosks, offline assistants, or industrial monitors, this curriculum gives you the scaffolding to go from prototype to production.