llm
73 TopicsAI Toolkit Extension Pack for Visual Studio Code: Ignite 2025 Update
Unlock the Latest Agentic App Capabilities The Ignite 2025 update delivers a major leap forward for the AI Toolkit extension pack in VS Code, introducing a unified, end-to-end environment for building, visualizing, and deploying agentic applications to Microsoft Foundry, and the addition of Anthropic’s frontier Claude models in the Model Catalog! This release enables developers to build and debug locally in VS Code, then deploy to the cloud with a single click. Seamlessly switch between VS Code and the Foundry portal for visualization, orchestration, and evaluation, creating a smooth roundtrip workflow that accelerates innovation and delivers a truly unified AI development experience. Download the http://aka.ms/aitoolkit today and start building next-generation agentic apps in VS Code! What Can You Do with the AI Toolkit Extension Pack? Access Anthropic models in the Model Catalog Following the Microsoft, NVIDIA and Anthropic strategic partnerships announcement today, we are excited to share that Anthropic’s frontier Claude models including Claude Sonnet 4.5, Claude Opus 4.1, and Claude Haiku 4.5, are now integrated into the AI Toolkit, providing even more choices and flexibility when building intelligent applications and AI agents. Build AI Agents Using GitHub Copilot Scaffold agent applications using best-practice patterns, tool-calling examples, tracing hooks, and test scaffolds, all powered by Copilot and aligned with the Microsoft Agent Framework. Generate agent code in Python or .NET, giving you flexibility to target your preferred runtime. Build and Customize YAML Workflows Design YAML-based workflows in the Foundry portal, then continue editing and testing directly in VS Code. To customize your YAML-based workflows, instantly convert it to Agent Framework code using GitHub Copilot. Upgrade from declarative design to code-first customization without starting from scratch. Visualize Multi-Agent Workflows Envision your code-based agent workflows with an interactive graph visualizer that reveals each component and how they connect Watch in real-time how each node lights up as you run your agent. Use the visualizer to understand and debug complex agent graphs, making iteration fast and intuitive. Experiment, Debug, and Evaluate Locally Use the Hosted Agents Playground to quickly interact with your agents on your development machine. Leverage local tracing support to debug reasoning steps, tool calls, and latency hotspots—so you can quickly diagnose and fix issues. Define metrics, tasks, and datasets for agent evaluation, then implement metrics using the Foundry Evaluation SDK and orchestrate evaluations runs with the help of Copilot. Seamless Integration Across Environments Jump from Foundry Portal to VS Code Web for a development environment in your preferred code editor setting. Open YAML workflows, playgrounds, and agent templates directly in VS Code for editing and deployment. How to Get Started Install the AI Toolkit extension pack from the VS Code marketplace. Check out documentation. Get started with building workflows with Microsoft Foundry in VS Code 1. Work with Hosted (Pro-code) Agent workflows in VS Code 2. Work with Declarative (Low-code) Agent workflows in VS Code Feedback & Support Try out the extensions and let us know what you think! File issues or feedback on our GitHub repo for Foundry extension and AI Toolkit extension. Your input helps us make continuous improvements.3.1KViews4likes0CommentsAgents League: Meet the Winners
Agents League brought together developers from around the world to build AI agents using Microsoft's developer tools. With 100+ submissions across three tracks, choosing winners was genuinely difficult. Today, we're proud to announce the category champions. 🎨 Creative Apps Winner: CodeSonify View project CodeSonify turns source code into music. As a genuinely thoughtful system, its functions become ascending melodies, loops create rhythmic patterns, conditionals trigger chord changes, and bugs produce dissonant sounds. It supports 7 programming languages and 5 musical styles, with each language mapped to its own key signature and code complexity directly driving the tempo. What makes CodeSonify stand out is the depth of execution. CodeSonify team delivered three integrated experiences: a web app with real-time visualization and one-click MIDI export, an MCP server exposing 5 tools inside GitHub Copilot in VS Code Agent Mode, and a diff sonification engine that lets you hear a code review. A clean refactor sounds harmonious. A messy one sounds chaotic. The team even built the MIDI generator from scratch in pure TypeScript with zero external dependencies. Built entirely with GitHub Copilot assistance, this is one of those projects that makes you think about code differently. 🧠 Reasoning Agents Winner: CertPrep Multi-Agent System View project CertPrep Multi-Agent System team built a production-grade 8-agent system for personalized Microsoft certification exam preparation, supporting 9 exam families including AI-102, AZ-204, AZ-305, and more. Each agent has a distinct responsibility: profiling the learner, generating a week-by-week study schedule, curating learning paths, tracking readiness, running mock assessments, and issuing a GO / CONDITIONAL GO / NOT YET booking recommendation. The engineering behind the scene here is impressive. A 3-tier LLM fallback chain ensures the system runs reliably even without Azure credentials, with the full pipeline completing in under 1 second in mock mode. A 17-rule guardrail pipeline validates every agent boundary. Study time allocation uses the Largest Remainder algorithm to guarantee no domain is silently zeroed out. 342 automated tests back it all up. This is what thoughtful multi-agent architecture looks like in practice. 💼 Enterprise Agents Winner: Whatever AI Assistant (WAIA) View project WAIA is a production-ready multi-agent system for Microsoft 365 Copilot Chat and Microsoft Teams. A workflow agent routes queries to specialized HR, IT, or Fallback agents, transparently to the user, handling both RAG-pattern Q&A and action automation — including IT ticket submission via a SharePoint list. Technically, it's a showcase of what serious enterprise agent development looks like: a custom MCP server secured with OAuth Identity Passthrough, streaming responses via the OpenAI Responses API, Adaptive Cards for human-in-the-loop approval flows, a debug mode accessible directly from Teams or Copilot, and full OpenTelemetry integration visible in the Foundry portal. Franck also shipped end-to-end automated Bicep deployment so the solution can land in any Azure environment. It's polished, thoroughly documented, and built to be replicated. Thank you To every developer who submitted and shipped projects during Agents League: thank you 💜 Your creativity and innovation brought Agents League to life! 👉 Browse all submissions on GitHubGitHub Copilot SDK and Hybrid AI in Practice: Automating README to PPT Transformation
Introduction In today's rapidly evolving AI landscape, developers often face a critical choice: should we use powerful cloud-based Large Language Models (LLMs) that require internet connectivity, or lightweight Small Language Models (SLMs) that run locally but have limited capabilities? The answer isn't either-or—it's hybrid models—combining the strengths of both to create AI solutions that are secure, efficient, and powerful. This article explores hybrid model architectures through the lens of GenGitHubRepoPPT, demonstrating how to elegantly combine Microsoft Foundry Local, GitHub Copilot SDK, and other technologies to automatically generate professional PowerPoint presentations from GitHub README files. 1. Hybrid Model Scenarios and Value 1.1 What Are Hybrid Models? Hybrid AI Models strategically combine locally-running Small Language Models (SLMs) with cloud-based Large Language Models (LLMs) within the same application, selecting the most appropriate model for each task based on its unique characteristics. Core Principles: Local Processing for Sensitive Data: Privacy-critical content analysis happens on-device Cloud for Value Creation: Complex reasoning and creative generation leverage cloud power Balancing Cost and Performance: High-frequency, simple tasks run locally to minimize API costs 1.2 Typical Hybrid Model Use Cases Use Case Local SLM Role Cloud LLM Role Value Proposition Intelligent Document Processing Text extraction, structural analysis Content refinement, format conversion Privacy protection + Professional output Code Development Assistant Syntax checking, code completion Complex refactoring, architecture advice Fast response + Deep insights Customer Service Systems Intent recognition, FAQ handling Complex issue resolution Reduced latency + Enhanced quality Content Creation Platforms Keyword extraction, outline generation Article writing, multilingual translation Cost control + Creative assurance 1.3 Why Choose Hybrid Models? Three Core Advantages: Privacy and Security Sensitive data never leaves local devices Compliant with GDPR, HIPAA, and other regulations Ideal for internal corporate documents and personal information Cost Optimization Reduces cloud API call frequency Local models have zero usage fees Predictable operational costs Performance and Reliability Local processing eliminates network latency Partial functionality in offline environments Cloud models ensure high-quality output 2. Core Technology Analysis 2.1 Large Language Models (LLMs): Cloud Intelligence Representatives What are LLMs? Large Language Models are deep learning-based natural language processing models, typically with billions to trillions of parameters. Through training on massive text datasets, they've acquired powerful language understanding and generation capabilities. Representative Models: Claude Sonnet 4.5: Anthropic's flagship model, excelling at long-context processing and complex reasoning GPT-5.2 Series: OpenAI's general-purpose language models Gemini: Google's multimodal large models LLM Advantages: ✅ Exceptional text generation quality ✅ Powerful contextual understanding ✅ Support for complex reasoning tasks ✅ Continuous model updates and optimization Typical Applications: Professional document writing (technical reports, business plans) Code generation and refactoring Multilingual translation Creative content creation 2.2 Small Language Models (SLMs) and Microsoft Foundry Local 2.2.1 SLM Characteristics Small Language Models typically have 1B-7B parameters, designed specifically for resource-constrained environments. Mainstream SLM Model Families: Microsoft Phi Family (Phi Family): Inference-optimized efficient models Alibaba Qwen Family (Qwen Family): Excellent Chinese language capabilities Mistral Series: Outstanding performance with small parameter counts SLM Advantages: ⚡ Low-latency response (millisecond-level) 💰 Zero API costs 🔒 Fully local, data stays on-device 📱 Suitable for edge device deployment 2.2.2 Microsoft Foundry Local: The Foundation of Local AI Foundry Local is Microsoft's local AI runtime tool, enabling developers to easily run SLMs on Windows or macOS devices. Core Features: OpenAI-Compatible API # Using Foundry Local is like using OpenAI API from openai import OpenAI from foundry_local import FoundryLocalManager manager = FoundryLocalManager("qwen2.5-7b-instruct") client = OpenAI( base_url=manager.endpoint, api_key=manager.api_key ) Hardware Acceleration Support CPU: General computing support GPU: NVIDIA, AMD, Intel graphics acceleration NPU: Qualcomm, Intel AI-specific chips Apple Silicon: Neural Engine optimization Based on ONNX Runtime Cross-platform compatibility Highly optimized inference performance Supports model quantization (INT4, INT8) Convenient Model Management # View available models foundry model list # Run a model foundry model run qwen2.5-7b-instruct-generic-cpu:4 # Check running status foundry service ps Foundry Local Application Value: 🎓 Educational Scenarios: Students can learn AI development without cloud subscriptions 🏢 Enterprise Environments: Process sensitive data while maintaining compliance 🧪 R&D Testing: Rapid prototyping without API cost concerns ✈️ Offline Environments: Works on planes, subways, and other no-network scenarios 2.3 GitHub Copilot SDK: The Express Lane from Agent to Business Value 2.3.1 What is GitHub Copilot SDK? GitHub Copilot SDK, released as a technical preview on January 22, 2026, is a game-changer for AI Agent development. Unlike other AI SDKs, Copilot SDK doesn't just provide API calling interfaces—it delivers a complete, production-grade Agent execution engine. Why is it revolutionary? Traditional AI application development requires you to build: ❌ Context management systems (multi-turn conversation state) ❌ Tool orchestration logic (deciding when to call which tool) ❌ Model routing mechanisms (switching between different LLMs) ❌ MCP server integration ❌ Permission and security boundaries ❌ Error handling and retry mechanisms Copilot SDK provides all of this out-of-the-box, letting you focus on business logic rather than underlying infrastructure. 2.3.2 Core Advantages: The Ultra-Short Path from Concept to Code Production-Grade Agent Engine: Battle-Tested Reliability Copilot SDK uses the same Agent core as GitHub Copilot CLI, which means: ✅ Validated in millions of real-world developer scenarios ✅ Capable of handling complex multi-step task orchestration ✅ Automatic task planning and execution ✅ Built-in error recovery mechanisms Real-World Example: In the GenGitHubRepoPPT project, we don't need to hand-write the "how to convert outline to PPT" logic—we simply tell Copilot SDK the goal, and it automatically: Analyzes outline structure Plans slide layouts Calls file creation tools Applies formatting logic Handles multilingual adaptation # Traditional approach: requires hundreds of lines of code for logic def create_ppt_traditional(outline): slides = parse_outline(outline) for slide in slides: layout = determine_layout(slide) content = format_content(slide) apply_styling(content, layout) # ... more manual logic return ppt_file # Copilot SDK approach: focus on business intent session = await client.create_session({ "model": "claude-sonnet-4.5", "streaming": True, "skill_directories": [skills_dir] }) session.send_and_wait({"prompt": prompt}, timeout=600) Custom Skills: Reusable Encapsulation of Business Knowledge This is one of Copilot SDK's most powerful features. In traditional AI development, you need to provide complete prompts and context with every call. Skills allow you to: Define once, reuse forever: # .copilot_skills/ppt/SKILL.md # PowerPoint Generation Expert Skill ## Expertise You are an expert in business presentation design, skilled at transforming technical content into easy-to-understand visual presentations. ## Workflow 1. **Structure Analysis** - Identify outline hierarchy (titles, subtitles, bullet points) - Determine topic and content density for each slide 2. **Layout Selection** - Title slide: Use large title + subtitle layout - Content slides: Choose single/dual column based on bullet count - Technical details: Use code block or table layouts 3. **Visual Optimization** - Apply professional color scheme (corporate blue + accent colors) - Ensure each slide has a visual focal point - Keep bullets to 5-7 items per page 4. **Multilingual Adaptation** - Choose appropriate fonts based on language (Chinese: Microsoft YaHei, English: Calibri) - Adapt text direction and layout conventions ## Output Requirements Generate .pptx files meeting these standards: - 16:9 widescreen ratio - Consistent visual style - Editable content (not images) - File size < 5MB Business Code Generation Capability This is the core value of this project. Unlike generic LLM APIs, Copilot SDK with Skills can generate truly executable business code. Comparison Example: Aspect Generic LLM API Copilot SDK + Skills Task Description Requires detailed prompt engineering Concise business intent suffices Output Quality May need multiple adjustments Professional-grade on first try Code Execution Usually example code Directly generates runnable programs Error Handling Manual implementation required Agent automatically handles and retries Multi-step Tasks Manual orchestration needed Automatic planning and execution Comparison of manual coding workload: Task Manual Coding Copilot SDK Processing logic code ~500 lines ~10 lines configuration Layout templates ~200 lines Declared in Skill Style definitions ~150 lines Declared in Skill Error handling ~100 lines Automatically handled Total ~950 lines ~10 lines + Skill file Tool Calling & MCP Integration: Connecting to the Real World Copilot SDK doesn't just generate code—it can directly execute operations: 🗃️ File System Operations: Create, read, modify files 🌐 Network Requests: Call external APIs 📊 Data Processing: Use pandas, numpy, and other libraries 🔧 Custom Tools: Integrate your business logic 3. GenGitHubRepoPPT Case Study 3.1 Project Overview GenGitHubRepoPPT is an innovative hybrid AI solution that combines local AI models with cloud-based AI agents to automatically generate professional PowerPoint presentations from GitHub repository README files in under 5 minutes. Technical Architecture: 3.2 Why Adopt a Hybrid Model? Stage 1: Local SLM Processes Sensitive Data Task: Analyze GitHub README, extract key information, generate structured outline Reasons for choosing Qwen-2.5-7B + Foundry Local: Privacy Protection README may contain internal project information Local processing ensures data doesn't leave the device Complies with data compliance requirements Cost Effectiveness Each analysis processes thousands of tokens Cloud API costs are significant in high-frequency scenarios Local models have zero additional fees Performance Qwen-2.5-7B excels at text analysis tasks Outstanding Chinese support Acceptable CPU inference latency (typically 2-3 seconds) Stage 2: Cloud LLM + Copilot SDK Creates Business Value Task: Create well-formatted PowerPoint files based on outline Reasons for choosing Claude Sonnet 4.5 + Copilot SDK: Automated Business Code Generation Traditional approach pain points: Need to hand-write 500+ lines of code for PPT layout logic Require deep knowledge of python-pptx library APIs Style and formatting code is error-prone Multilingual support requires additional conditional logic Copilot SDK solution: Declare business rules and best practices through Skills Agent automatically generates and executes required code Zero-code implementation of complex layout logic Development time reduced from 2-3 days to 2-3 hours Ultra-Short Path from Intent to Execution Comparison: Different ways to implement "Generate professional PPT" 3. Production-Grade Reliability and Quality Assurance Battle-tested Agent engine: Uses the same core as GitHub Copilot CLI Validated in millions of real-world scenarios Automatically handles edge cases and errors Consistent output quality: Professional standards ensured through Skills Automatic validation of generated files Built-in retry and error recovery mechanisms 4. Rapid Iteration and Optimization Capability Scenario: Client requests PPT style adjustment The GitHub Repo https://github.com/kinfey/GenGitHubRepoPPT 4. Summary 4.1 Core Value of Hybrid Models + Copilot SDK The GenGitHubRepoPPT project demonstrates how combining hybrid models with Copilot SDK creates a new paradigm for AI application development. Privacy and Cost Balance The hybrid approach allows sensitive README analysis to happen locally using Qwen-2.5-7B, ensuring data never leaves the device while incurring zero API costs. Meanwhile, the value-creating work—generating professional PowerPoint presentations—leverages Claude Sonnet 4.5 through Copilot SDK, delivering quality that justifies the per-use cost. From Code to Intent Traditional AI development required writing hundreds of lines of code to handle PPT generation logic, layout selection, style application, and error handling. With Copilot SDK and Skills, developers describe what they want in natural language, and the Agent automatically generates and executes the necessary code. What once took 3-5 days now takes 3-4 hours, with 95% less code to maintain. Automated Business Code Generation Copilot SDK doesn't just provide code examples—it generates complete, executable business logic. When you request a multilingual PPT, the Agent understands the requirement, selects appropriate fonts, generates the implementation code, executes it with error handling, validates the output, and returns a ready-to-use file. Developers focus on business intent rather than implementation details. 4.2 Technology Trends The Shift to Intent-Driven Development We're witnessing a fundamental change in how developers work. Rather than mastering every programming language detail and framework API, developers are increasingly defining what they want through declarative Skills. Copilot SDK represents this future: you describe capabilities in natural language, and AI Agents handle the code generation and execution automatically. Edge AI and Cloud AI Integration The evolution from pure cloud LLMs (powerful but privacy-concerning) to pure local SLMs (private but limited) has led to today's hybrid architectures. GenGitHubRepoPPT exemplifies this trend: local models handle data analysis and structuring, while cloud models tackle complex reasoning and professional output generation. This combination delivers fast, secure, and professional results. Democratization of Agent Development Copilot SDK dramatically lowers the barrier to building AI applications. Senior engineers see 10-20x productivity gains. Mid-level engineers can now build sophisticated agents that were previously beyond their reach. Even junior engineers and business experts can participate by writing Skills that capture domain knowledge without deep technical expertise. The future isn't about whether we can build AI applications—it's about how quickly we can turn ideas into reality. References Projects and Code GenGitHubRepoPPT GitHub Repository - Case study project Microsoft Foundry Local - Local AI runtime GitHub Copilot SDK - Agent development SDK Copilot SDK Getting Started Tutorial - Official quick start Deep Dive: Copilot SDK Build an Agent into Any App with GitHub Copilot SDK - Official announcement GitHub Copilot SDK Cookbook - Practical examples Copilot CLI Official Documentation - CLI tool documentation Learning Resources Edge AI for Beginners - Edge AI introductory course Azure AI Foundry Documentation - Azure AI documentation GitHub Copilot Extensions Guide - Extension development guide1.8KViews3likes2CommentsHow to use any Python AI agent framework with free GitHub Models
I ❤️ when companies offer free tiers for developer services, since it gives everyone a way to learn new technologies without breaking the bank. Free tiers are especially important for students and people between jobs, when the desire to learn is high but the available cash is low. That's why I'm such a fan of GitHub Models: free, high-quality generative AI models available to anyone with a GitHub account. The available models include the latest OpenAI LLMs (like o3-mini), LLMs from the research community (like Phi and Llama), LLMs from other popular providers (like Mistral and Jamba), multimodal models (like gpt-4o and llama-vision-instruct) and even a few embedding models (from OpenAI and Cohere). With access to such a range of models, you can prototype complex multi-model workflows to improve your productivity or heck, just make something fun for yourself. 🤗 To use GitHub Models, you can start off in no-code mode: open the playground for a model, send a few requests, tweak the parameters, and check out the answers. When you're ready to write code, select "Use this model". A screen will pop up where you can select a programming language (Python/JavaScript/C#/Java/REST) and select an SDK (which varies depending on model). Then you'll get instructions and code for that model, language, and SDK. But here's what's really cool about GitHub Models: you can use them with all the popular Python AI frameworks, even if the framework has no specific integration with GitHub Models. How is that possible? The vast majority of Python AI frameworks support the OpenAI Chat Completions API, since that API became a defacto standard supported by many LLM API providers besides OpenAI itself. GitHub Models also provide OpenAI-compatible endpoints for chat completion models. Therefore, any Python AI framework that supports OpenAI-like models can be used with GitHub Models as well. 🎉 To prove it, I've made a new repository with examples from eight different Python AI agent packages, all working with GitHub Models: python-ai-agent-frameworks-demos. There are examples for AutoGen, LangGraph, Llamaindex, OpenAI Agents SDK, OpenAI standard SDK, PydanticAI, Semantic Kernel, and SmolAgents. You can open that repository in GitHub Codespaces, install the packages, and get the examples running immediately. Now let's walk through the API connection code for GitHub Models for each framework. Even if I missed your favorite framework, I hope my tips here will help you connect any framework to GitHub Models. OpenAI I'll start with openai , the package that started it all! import openai client = openai.OpenAI( api_key=os.environ["GITHUB_TOKEN"], base_url="https://models.inference.ai.azure.com") The code above demonstrates the two key parameters we'll need to configure for all frameworks: api_key : When using OpenAI.com, you pass your OpenAI API key here. When using GitHub Models, you pass in a Personal Access Token (PAT). If you open the repository (or any repository) in GitHub Codespaces, a PAT is already stored in the GITHUB_TOKEN environment variable. However, if you're working locally with GitHub Models, you'll need to generate a PAT yourself and store it. PATs expire after a while, so you need to generate new PATs every so often. base_url : This parameter tells the OpenAI client to send all requests to "https://models.inference.ai.azure.com" instead of the OpenAI.com API servers. That's the domain that hosts the OpenAI-compatible endpoint for GitHub Models, so you'll always pass that domain as the base URL. If we're working with the new openai-agents SDK, we use very similar code, but we must use the AsyncOpenAI client from openai instead. Lately, Python AI packages are defaulting to async, because it's so much better for performance. import agents import openai client = openai.AsyncOpenAI( base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"]) model = agents.OpenAIChatCompletionsModel( model="gpt-4o", openai_client=client) spanish_agent = agents.Agent( name="Spanish agent", instructions="You only speak Spanish.", model=model) PydanticAI Now let's look at all of the packages that make it really easy for us, by allowing us to directly bring in an instance of either OpenAI or AsyncOpenAI . For PydanticAI, we configure an AsyncOpenAI client, then construct an OpenAIModel object from PydanticAI, and pass that model to the agent: import openai import pydantic_ai import pydantic_ai.models.openai client = openai.AsyncOpenAI( api_key=os.environ["GITHUB_TOKEN"], base_url="https://models.inference.ai.azure.com") model = pydantic_ai.models.openai.OpenAIModel( "gpt-4o", provider=OpenAIProvider(openai_client=client)) spanish_agent = pydantic_ai.Agent( model, system_prompt="You only speak Spanish.") Semantic Kernel For Semantic Kernel, the code is very similar. We configure an AsyncOpenAI client, then construct an OpenAIChatCompletion object from Semantic Kernel, and add that object to the kernel. import openai import semantic_kernel.connectors.ai.open_ai import semantic_kernel.agents chat_client = openai.AsyncOpenAI( api_key=os.environ["GITHUB_TOKEN"], base_url="https://models.inference.ai.azure.com") chat = semantic_kernel.connectors.ai.open_ai.OpenAIChatCompletion( ai_model_id="gpt-4o", async_client=chat_client) kernel.add_service(chat) spanish_agent = semantic_kernel.agents.ChatCompletionAgent( kernel=kernel, name="Spanish agent" instructions="You only speak Spanish") AutoGen Next, we'll check out a few frameworks that have their own wrapper of the OpenAI clients, so we won't be using any classes from openai directly. For AutoGen, we configure both the OpenAI parameters and the model name in the same object, then pass that to each agent: import autogen_ext.models.openai import autogen_agentchat.agents client = autogen_ext.models.openai.OpenAIChatCompletionClient( model="gpt-4o", api_key=os.environ["GITHUB_TOKEN"], base_url="https://models.inference.ai.azure.com") spanish_agent = autogen_agentchat.agents.AssistantAgent( "spanish_agent", model_client=client, system_message="You only speak Spanish") LangGraph For LangGraph, we configure a very similar object, which even has the same parameter names: import langchain_openai import langgraph.graph model = langchain_openai.ChatOpenAI( model="gpt-4o", api_key=os.environ["GITHUB_TOKEN"], base_url="https://models.inference.ai.azure.com", ) def call_model(state): messages = state["messages"] response = model.invoke(messages) return {"messages": [response]} workflow = langgraph.graph.StateGraph(MessagesState) workflow.add_node("agent", call_model) SmolAgents Once again, for SmolAgents, we configure a similar object, though with slightly different parameter names: import smolagents model = smolagents.OpenAIServerModel( model_id="gpt-4o", api_key=os.environ["GITHUB_TOKEN"], api_base="https://models.inference.ai.azure.com") agent = smolagents.CodeAgent(model=model) Llamaindex I saved Llamaindex for last, as it is the most different. The llama-index package has a different constructor for OpenAI.com versus OpenAI-like servers, so I opted to use that OpenAILike constructor instead. However, I also needed an embeddings model for my example, and the package doesn't have an OpenAIEmbeddingsLike constructor, so I used the standard OpenAIEmbedding constructor. import llama_index.embeddings.openai import llama_index.llms.openai_like import llama_index.core.agent.workflow Settings.llm = llama_index.llms.openai_like.OpenAILike( model="gpt-4o", api_key=os.environ["GITHUB_TOKEN"], api_base="https://models.inference.ai.azure.com", is_chat_model=True) Settings.embed_model = llama_index.embeddings.openai.OpenAIEmbedding( model="text-embedding-3-small", api_key=os.environ["GITHUB_TOKEN"], api_base="https://models.inference.ai.azure.com") agent = llama_index.core.agent.workflow.ReActAgent( tools=query_engine_tools, llm=Settings.llm) Choose your models wisely! In all of the examples above, I specified the gpt-4o model. The gpt-4o model is a great choice for agents because it supports function calling, and many agent frameworks only work (or work best) with models that natively support function calling. Fortunately, GitHub Models includes multiple models that support function calling, at least in my basic experiments: gpt-4o gpt-4o-mini o3-mini AI21-Jamba-1.5-Large AI21-Jamba-1.5-Mini Codestral-2501 Cohere-command-r Ministral-3B Mistral-Large-2411 Mistral-Nemo Mistral-small You might find that some models work better than others, especially if you're using agents with multiple tools. With GitHub Models, it's very easy to experiment and see for yourself, by simply changing the model name and re-running the code. Join the AI Agents Hackathon We are currently running a free virtual hackathon from April 8th - 30th, to challenge developers to create agentic applications using Microsoft technologies. You could build an agent entirely using GitHub Models and submit it to the hackathon for a chance to win amazing prizes! You can also join our 30+ streams about building AI agents, including a stream all about prototyping with GitHub Models. Learn more and register at https://aka.ms/agentshack2.9KViews3likes0CommentsIntroducing PII Shield: A Privacy Proxy for Every LLM Call
Why do we need a utility like PII Shield? Let’s start with an uncomfortable truth: modern AI systems are swimming in sensitive data - often without us fully realizing the risks we’re taking. The data is sensitive. Names, emails, phone numbers, credit cards, IBANs, SSNs, Aadhaar, PAN, Driving Licenses, UPI IDs, addresses - all of them frequently appear in user prompts, RAG documents, and tool inputs. Prompts leaving trust boundary: Even when the model lives in your own subscription - Azure OpenAI, a Foundry deployment, a self-hosted OSS model - the prompt still travels through SDKs, gateways, observability stacks etc. Before it gets to LLM, the inference layer often logs, evaluates, or fine-tunes against what it sees. Any of those hops could be a potential exposure of raw PII if not handled well. Changing regulatory asks: EU AI Act, India's DPDP Act, CCPA, HIPAA, PCI-DSS and a dozen sectoral rules now explicitly cover AI-driven processing. “Not knowing that your data may get stored elsewhere” is not an option anymore. The ad-hoc fixes don't scale. Most teams start with a regex or two. Then they add another team, another use case, another language, another ID format — and the regex file becomes a tar pit nobody wants to touch. The damage from getting this wrong is asymmetric. When controls hold and decisions are sound, nothing remarkable happens. The AI feature launches, threats are contained, and the system does what it was designed to do. But when security assumptions fail, the consequences compound fast. A single misstep can trigger public disclosures, regulatory scrutiny, financial penalties and more. That imbalance is why security can’t be treated as a finishing step. We need a single, well-tested, observable layer that sits between every AI application in the organisation and every model it talks to - and makes sure raw PII never crosses that boundary. Introducing PII Shield "PII Shield" is an intelligent anonymization layer that sits between your AI application and the LLM. It detects PII, applies a configurable privacy action per entity type, and - when you want it to - reverses the anonymization after the LLM has done its work. At a glance: REST API: /anonymize, /deanonymize and /apps - built with FastAPI. PII detection via Microsoft Presidio, with a pluggable NLP backend (spaCy, ONNX, Stanza, HuggingFace Transformers). Custom recognisers for India specific identifiers: Aadhaar, PAN, Driving Licence (all 36 states), PIN code, UPI ID, Indian phone numbers — plus 100+ Presidio defaults. Per-entity strategies: replace, hash (SHA-256), encrypt (PQC ML-KEM-768 + AES-256-GCM) or fake Multi-tenant: register applications via POST /apps, configure strategies per app, route via the X-App-Id header. Reversible by design: the "Anonymise → LLM → De-anonymise" sandwich pattern means the LLM only sees anonymized values (e.g. {{PERSON_1}} and {{EMAIL_ADDRESS_2}} etc.), never the real values. Library mode: pip install pii_shield and use the engine directly for batch jobs, no FastAPI required. Observability built in - Open Telemetry traces/metrics/logs, dual-backend (OTLP for local Grafana LGTM, Azure Monitor for production), pre-built Grafana dashboards Before we go deep into how this works, let's understand a few key concepts that this utility is centered around. Entities and recognisers An entity is a single, contiguous piece of PII that PII Shield treats as one unit of redaction - for example PERSON, EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, IBAN_CODE, LOCATION, IN_AADHAAR, IN_PAN, IN_DRIVING_LICENSE, IN_UPI_ID. Each entity has: a type (the label above) that determines which placeholder format and which anonymisation strategy applies. a span in the original text - start and end character offsets - so the anonymiser can replace exactly the right slice without disturbing the surrounding text. a confidence score (0.0–1.0) reported by whichever recogniser found it, which the downstream code uses to drop low-quality hits or break ties between overlapping matches. a stable identity within the request - two occurrences of "Rahul Sharma" are the same entity instance, so they collapse to a single {{PERSON_1}} placeholder rather than {{PERSON_1}} and {{PERSON_2}}. That property is what makes coreference survive the round-trip through the LLM. A recogniser is the component that actually finds entities of a given type in text. PII Shield uses three recognizers, and they typically run together: NLP recognisers rely on the language model loaded by the Presidio Analyzer (spaCy / ONNX / Stanza / Transformers). They're the right tool for entities that have no fixed shape such as PERSON, LOCATION, ORGANIZATION, NRP where context and grammar matter more than format. E.g. "Rahul met Priya in Bengaluru" can't be detected with regex; the model must understand that those tokens are people and a place. Pattern recognisers (extension of the PatternRecognizer Class) are small Python classes that combine three things: one or more regex patterns (e.g. the 12-digit Aadhaar layout, the AAAAA9999A PAN structure, a state-prefixed Driving Licence number); a list of context keywords that boost confidence when they appear nearby (aadhaar, uidai, licence, dl, pan, card, dob); and a base confidence score plus an optional validator function for IDs with a checksum (Aadhaar's Verhoeff digit, PAN's structural rules, IBAN's mod-97). The regex says "this could be an Aadhaar"; the context boost says "and the word 'aadhaar' is three tokens to the left, so I'm now very sure" and the validator says "the checksum agrees, accept it." That layering is what keeps false positives in check on noisy CRM notes and chat transcripts. Hybrid / deny-list recognisers sit on top, for things like an internal project codename list or a customer-id format that's specific to your business. When the Analyzer runs, every recogniser scores the same span independently; overlapping hits are reconciled (highest-confidence wins, longer spans absorb shorter ones), and a final list of (entity type, start, end, score) tuples is what the anonymiser sees. Adding a new recognizer is easy with the YAML configuration approach. Refer to the below document for details how-to-add-a-recognizer The Sandwich Pattern We call it a sandwich pattern because two thin layers of PII Shield wrap around the LLM call: a) the top slice (anonymise) replaces every detected entity with a stable, opaque placeholder before the prompt ever leaves your trust boundary, b) the bottom slice (de-anonymise) puts the original values back when the response returns. The LLM in the middle is just the filling - it reasons over {{PERSON_1}} and {{EMAIL_ADDRESS_2}} instead of "Rahul Sharma" and "rahul@example.com", and it has no idea that anything was ever swapped. This is what makes the pattern so easy to retrofit: your application keeps its existing prompts, the LLM keeps its existing API, and the only "PII Shield" component that knows the real values is the very short-lived mapping. Coreference still works end-to-end because {{PERSON_1}} is the same person everywhere in the prompt - and, almost always, everywhere in the response too. Encrypted entities use ML-KEM-768 + AES-256-GCM by default — a NIST-standardised post-quantum KEM. High level design The diagram below depicts key components of “PII Shield”. User / Client App: The end user (or a downstream service) who initiates the request that eventually reaches an AI workload. They are unaware of the redaction layer; from their perspective the AI app responds with full, restored PII. AI Application (Agent / RAG / Chat): The customer's GenAI workload (chatbot, agentic flow, RAG pipeline). It owns the business logic but does not call the LLM directly with raw user data. Instead, it routes every prompt through PII Shield first and only forwards the anonymized result. This keeps the AI app free of redaction logic and lets the same shield be reused across multiple workloads. PII Shield (FastAPI): The central, stateless service. Every request flows through this single boundary, which makes it the natural place to enforce policy, rate-limit, audit, and instrument. Internally it is composed of several cooperating layers: REST API - the public surface: POST /anonymize_unique (detect + redact in one call, returning a session ID), POST /deanonymize (restore using that session ID), plus /apps and /apps/{id}/config for tenant onboarding and per-app strategy configuration. Health, metrics, and OpenAPI docs are exposed alongside. Presidio Engine: the detection-and-redaction core, built on Microsoft Presidio and split into two stages - Analyze: runs the configured NLP backend (spaCy for fastest CPU inference, ONNX for quantized speed, Stanza for higher recall, or Hugging Face Transformers for state-of-the-art models like IndicNER) and PII Shield's own custom recognizers (Aadhaar, PAN, IFSC, UPI, IN_BANK_ACCOUNT, IN_PHONE, IN_PIN_CODE, CKYC, PRAN, APAAR, GeoCoordinate, NaturalDate, etc.). Each detected span carries a confidence score and is disambiguated using context-keyword boosting. Anonymizer (Operators): applies the per-entity strategy chosen by the app config: replace (placeholder tokens like {{IN_AADHAAR_1}} for the LLM-sandwich pattern), hash (irreversible SHA-256, for analytics on hashed values), encrypt (reversible AES-GCM or post-quantum Kyber for compliance-sensitive flows), or fake (format-preserving synthetic values from Faker for safe demo / synthetic-test datasets). State Stores: there are two logical stores, both backed by Redis but with different lifetimes and access patterns: Session Mapping Store: short-lived UUID→entity-map records that power the sandwich pattern (anonymize → call LLM → de-anonymize the response back to original values). The default TTL is intentionally short (seconds-to-minutes) so plaintext mappings don't linger. App Registry: long-lived per-app configuration: entity-type strategies, allow-lists, entity-type-allow-lists, encryption-key references, and a curated allow-list of fictional brand names. This is what makes the platform multi-tenant with per-tenant policy. Telemetry: Uses OpenTelemetry SDK, and auto-instruments every stage of the pipeline: span per request, metrics for entity-type counts/anonymization latency/cache hit rate, and structured logs with trace correlation. The same OTLP stream can be split between Azure Monitor and a self-hosted Grafana stack. LLM (Azure OpenAI / OSS): The downstream model. PII Shield's contract guarantees it only ever receives anonymized text; raw PII never crosses the trust boundary into the model provider's tenancy. This satisfies most data-residency, DPDP, and GDPR concerns by construction rather than by audit after the fact. Azure Monitor / OTLP → Grafana: This is where OpenTelemetry data lands. Operators get pre-built Grafana dashboards (request volume by entity type, p95/p99 latency per stage, per-app usage, error breakdowns) so production issues — a misbehaving recognizer, a slow NLP backend, a noisy app — are diagnosed from a single pane. Playground & Admin UI: Two Streamlit front-ends come bundled with the platform. The Playground lets developers paste arbitrary text, switch NLP backends, and instantly see what gets detected and how it is redacted — invaluable for tuning recognizers and demos. The Admin UI lets operators register applications, set per-entity strategies, manage allow-lists, and view live entity stats. Both call the same REST API, so anything the UI does is reproducible from a script or CI pipeline. Request flow at a glance - A user types a message into the AI application — say, a chatbot that helps customers with banking queries. Before the application sends anything to the language model, it hands the raw text over to PII Shield through the /anonymize_unique endpoint. PII Shield runs the text through its detection pipeline, swaps every piece of personal information for a stable placeholder token, and returns two things to the AI app: the redacted text and a short-lived session ID. The application then calls the LLM with this safe, placeholder-laden prompt — the model never sees a real Aadhaar number, account number, or phone number. Once the LLM responds, the AI app posts that response back to PII Shield's /deanonymize endpoint along with the same session ID it received earlier. PII Shield looks up the session, restores every placeholder in the LLM's reply with the original value, and hands the fully restored text back to the application, which delivers it to the user. From the user's perspective the conversation feels completely natural — they see their own details, in their own words, exactly as they'd expect. Behind the scenes, the session mapping that made the round-trip possible is automatically dropped from Redis the moment its TTL expires, so plaintext PII never lingers in the system longer than it has to. In parallel, every stage of this flow emits OpenTelemetry traces, metrics, and logs to the observability stack, giving operators a clear, real-time view of what was detected, how long each step took, and how the system is behaving in production. The entire set-up could be deployed on premises as well as on any of the clouds by replacing components with corresponding cloud offerings. Below is an example in Azure - Container apps is used to host the REST services, Azure Cache for Redis is for the state store, Application Insights, Log Ananlutics and Azure Managed Grafana are used for telemetry. Please refer to the codebase here for the implementation and deployment scripts (local & Azure). code repository pii-shield How does it work? The sandwich pattern explained earlier has three phrases: Anonymize LLM Processing De-anonymize Anonymize When the app sends POST /anonymize with "Call Rahul Sharma at rahul@example.com about Aadhaar 2345 6789 0123.", PII Shield does five things in order: Resolve config. If an X-App-Id header is present, the app's per-entity strategy map is loaded from Redis (with a short in-process cache to keep the hot path fast). If absent, global defaults apply. Detect. The Presidio Analyzer runs the configured NLP backend (spaCy / ONNX / Transformers) plus all built-in and custom pattern recognisers. The output is a list of spans — (entity_type, start, end, score, text) tuples — possibly overlapping, possibly with low-confidence noise. Post-analyse. Adjacent locations and PIN codes get merged into a single ADDRESS, overlapping detections are deduped (longest wins, ties broken by score), context keywords boost confidence, and entries below threshold are dropped. This is the difference between "raw Presidio output" and "something safe to anonymise". Assign unique placeholders. For each distinct PII value, a numbered placeholder is minted: {{PERSON_1}}, {{PERSON_2}}, … The numbering is per entity type, per request, and ordering is stable — the same person mentioned three times in the input gets the same {{PERSON_1}} everywhere. This is what lets the LLM reason about coreference correctly. Apply operators. For each entity, the per-app strategy decides what to write into the output: replace → the placeholder above (reversible; mapping kept). hash → {ENTITY}_<sha256-prefix>> (one-way; not added to entity_mapping — there's nothing to restore). encrypt → {{{ENTITY}_ENC_<base64-ciphertext>}} using ML-KEM-768 + AES-256-GCM (reversible without keeping a mapping; the ciphertext is self-contained). fake → a synthetic value from the locale-aware faker (irreversible; useful for demos and synthetic test data). The reversible mapping {placeholder → original} is then saved to the session store (in-memory dict keyed by a fresh UUID by default; swappable for Redis or a database). The response carries the id, the anonymized_text, and entity_mapping for inspection: { "id": "a1b2c3d4-...", "anonymized_text": "Call {{PERSON_1}} at {{EMAIL_ADDRESS_1}} about Aadhaar {{IN_AADHAAR_1}}.", "entity_mapping": { "{{PERSON_1}}": "Rahul Sharma", "{{EMAIL_ADDRESS_1}}": "rahul@example.com", "{{IN_AADHAAR_1}}": "2345 6789 0123" } } LLM Processing The app sends the anonymised text to the LLM exactly as it would have sent the original. The model treats {{PERSON_1}}, {{IN_AADHAAR_1}}, etc. as opaque proper nouns and reasons over them naturally. Because each distinct value got a stable, unique placeholder, the model doesn't lose coreference: "Tell {{PERSON_1}} that their Aadhaar {{IN_AADHAAR_1}} has been verified" still makes perfect sense. The response usually preserves the placeholders verbatim — sometimes the LLM rewrites half the sentence, but as long as the placeholders are still in there somewhere, de-anonymisation will work. De-anonymize The app calls POST /deanonymize with the session id and the LLM's response. PII Shield does three things: Fetch the mapping for that id from the session store (single dict lookup; sub-microsecond). Replace placeholders in the text — longest placeholder first. This matters: a naïve replacement could let {{PERSON_1}} clobber the inner part of {{PERSON_10}}. Sorting by length descending is a small detail with big consequences. Decrypt encrypted entities in-place. Encrypted entities don't appear in the session mapping (the ciphertext is the placeholder), so they're handled separately: any {{ENTITY_ENC_…}} token in the text is base64-decoded and decrypted with the configured backend (PQC by default; Fernet auto-detected for backwards compatibility). Importantly, the input text to /deanonymize does not have to match what /anonymize produced. The LLM may have rewritten everything around the placeholders - added text, translated, summarised - and PII Shield will still cleanly restore the PII wherever the placeholders appear. Only the placeholders trigger replacement; everything else passes through untouched. The library mode The application can also be used as a library if you do not want to deploy it as a service for batch use cases. Here is an example: from pii_shield import PiiShieldEngine engine = PiiShieldEngine() # load NLP model once, reuse result = engine.anonymize("Rahul's Aadhaar is 2345 6789 0123") # "<PERSON_1>'s Aadhaar is <IN_AADHAAR_1>" restored = engine.deanonymize(result.anonymized_text, result.entity_mapping) The BatchProcessor currently handles CSVs and DataFrames with thread/process parallelism - useful for ad-hoc file processing and batch offline pipelines. What's next PII Shield is the foundation. Where it really starts to pay off is when you wire it into the rest of your AI stack so that no developer can accidentally bypass it. There are two more blogs in the series: PII Shield as middleware in Microsoft Agent Framework Agents are versatile. They invoke tools, fan out to sub-agents, cache intermediate context, and call LLMs from places you didn't expect. The next post will show how to plug PII Shield in as a middleware in Microsoft Agent Framework — automatically anonymising every prompt going out to a model and de-anonymising every response coming back, transparently to the agent author. No more "did we remember to scrub this one?". PII Shield + Azure API Management (APIM) For organisations standardising on APIM as their LLM gateway, the natural place for PII Shield is inside the policy pipeline. The follow-up post will cover integrating PII Shield with APIM so that every request to an LLM backend — regardless of which app or team made it — is intercepted, anonymised, forwarded, and de-anonymised on the way back. One policy, organisation-wide privacy.If You're Building AI on Azure, ECS 2026 is Where You Need to Be
Let me be direct: there's a lot of noise in the conference calendar. Generic cloud events. Vendor showcases dressed up as technical content. Sessions that look great on paper but leave you with nothing you can actually ship on Monday. ECS 2026 isn't that. As someone who will be on stage at Cologne this May, I can tell you the European Collaboration Summit combined with the European AI & Cloud Summit and European Biz Apps Summit is one of the few events I've seen where engineers leave with real, production-applicable knowledge. Three days. Three summits. 3,000+ attendees. One of the largest Microsoft-focused events in Europe, and it keeps getting better. If you're building AI systems on Azure, designing cloud-native architectures, or trying to figure out how to take your AI experiments to production — this is where the conversation is happening. What ECS 2026 Actually Is ECS 2026 runs May 5–7 at Confex in Cologne, Germany. It brings together three co-located summits under one roof: European Collaboration Summit — Microsoft 365, Teams, Copilot, and governance European AI & Cloud Summit — Azure architecture, AI agents, cloud security, responsible AI European BizApps Summit — Power Platform, Microsoft Fabric, Dynamics For Azure engineers and AI developers, the European AI & Cloud Summit is your primary destination. But don't ignore the overlap, some of the most interesting AI conversations happen at the intersection of collaboration tooling and cloud infrastructure. The scale matters here: 3,000+ attendees, 100+ sessions, multiple deep-dive tracks, and a speaker lineup that includes Microsoft executives, Regional Directors, and MVPs who have built, broken, and rebuilt production systems. The Azure + AI Track - What's Actually On the Agenda The AI & Cloud Summit agenda is built around real technical depth. Not "intro to AI" content, actual architecture decisions, patterns that work, and lessons from things that didn't. Here's what you can expect: AI Agents and Agentic Systems This is where the energy is right now, and ECS is leaning in. Expect sessions covering how to design agent workflows, chain reasoning steps, handle memory and state, and integrate with Azure AI services. Marco Casalaina, VP of Products for Azure AI at Microsoft, is speaking if you want to understand the direction of the Azure AI platform from the people building it, this is a direct line. Azure Architecture at Scale Cloud-native patterns, microservices, containers, and the architectural decisions that determine whether your system holds up under real load. These sessions go beyond theory you'll hear from engineers who've shipped these designs at enterprise scale. Observability, DevOps, and Production AI Getting AI to production is harder than the demos suggest. Sessions here cover monitoring AI systems, integrating LLMs into CI/CD pipelines, and building the operational practices that keep AI in production reliable and governable. Cloud Security and Compliance Security isn't optional when you're putting AI in front of users or connecting it to enterprise data. Tracks cover identity, access patterns, responsible AI governance, and how to design systems that satisfy compliance requirements without becoming unmaintainable. Pre-Conference Deep Dives One underrated part of ECS: the pre-conference workshops. These are extended, hands-on sessions typically 3–6 hours that let you go deep on a single topic with an expert. Think of them as intensive short courses where you can actually work through the material, not just watch slides. If you're newer to a particular area of Azure AI, or you want to build fluency in a specific pattern before the main conference sessions, these are worth the early travel. The Speaker Quality Is Different Here The ECS speaker roster includes Microsoft executives, Microsoft MVPs, and Regional Directors, people who have real accountability for the products and patterns they're presenting. You'll hear from over 20 Microsoft speakers: Marco Casalaina — VP of Products, Azure AI at Microsoft Adam Harmetz — VP of Product at Microsoft, Enterprise Agent And dozens of MVPs and Regional Directors who are in the field every day, solving the same problems you are. These aren't keynote-only speakers — they're in the session rooms, at the hallway track, available for real conversations. The Hallway Track Is Not a Cliché I know "networking" sounds like a corporate afterthought. At ECS it genuinely isn't. When you put 3,000 practitioners, engineers, architects, DevOps leads, security specialists in one venue for three days, the conversations between sessions are often more valuable than the sessions themselves. You get candid answers to "how are you actually handling X in production?" that you won't find in documentation. The European Microsoft community is tight-knit and collaborative. ECS is where that community concentrates. Why This Matters Right Now We're in a period where AI development is moving fast but the engineering discipline around it is still maturing. Most teams are figuring out: How to move from AI prototype to production system How to instrument and observe AI behaviour reliably How to design agent systems that don't become unmaintainable How to satisfy security and compliance requirements in AI-integrated architectures ECS 2026 is one of the few places where you can get direct answers to these questions from people who've solved them — not theoretically, but in production, on Azure, in the last 12 months. If you go, you'll come back with practical patterns you can apply immediately. That's the bar I hold events to. ECS consistently clears it. Register and Explore the Agenda Register for ECS 2026: ecs.events Explore the AI & Cloud Summit agenda: cloudsummit.eu/en/agenda Dates: May 5–7, 2026 | Location: Confex, Cologne, Germany Early registration is worth it the pre-conference workshops fill up. And if you're coming, find me, I'll be the one talking too much about AI agents and Azure deployments. See you in Cologne.From Terminal to Autonomous Coding: Mastering GitHub Copilot CLI ACP Server
Introduction The rise of AI-powered development is no longer just about autocomplete—it’s about autonomous agents that can think, act, and collaborate. At the center of this transformation is the Agent Client Protocol (ACP) and its integration with GitHub Copilot CLI. If you’ve ever wanted to: Integrate Copilot into your own tools Build custom AI-driven developer workflows Orchestrate coding agents in CI/CD Then understanding the GitHub Copilot CLI ACP Server is a game-changer. This article will take you from zero to advanced, covering concepts, architecture, setup, and real-world use cases. What Is Agent Client Protocol (ACP)? The Agent Client Protocol (ACP) is an open standard designed to connect clients (like IDEs or tools) with AI agents in a consistent and interoperable way. Why ACP Exists Before ACP: Every IDE needed custom integration for each AI agent Every agent needed custom APIs per editor ACP solves this by introducing a universal communication layer. Key Idea “Any editor can talk to any agent.” Core Capabilities ACP enables: Standardized messaging between client and agent Streaming responses Tool execution with permissions Session lifecycle management Multi-agent coordination This makes ACP a foundation layer for the agentic developer ecosystem. What Is GitHub Copilot CLI ACP Server? GitHub Copilot CLI can run as an ACP-compatible server, exposing its AI capabilities programmatically. 👉 In simple terms: It turns Copilot into a backend AI agent service that any tool can connect to. According to GitHub Docs: Copilot CLI can run in ACP mode using a flag It supports standardized communication via ACP It enables integration with IDEs, pipelines, and custom tools Architecture: How ACP + Copilot CLI Works Components Component Role Client Sends prompts, receives responses ACP Protocol Standard communication layer Copilot CLI AI agent executing tasks System Files, repos, tools Getting Started (Beginner Level) Install GitHub Copilot CLI Ensure: Copilot subscription is active CLI installed and authenticated Start ACP Server Default (stdio mode – recommended) TCP Mode (for remote systems) stdio: Best for IDE integration TCP: Best for distributed systems Connect Using ACP Client (Example) Using TypeScript SDK: You: Start Copilot as a process Create streams Initialize connection Send prompt Receive streaming response ACP uses: NDJSON streams over stdin/stdout Event-driven communication ACP Workflow Explained A typical flow looks like this: Step-by-step lifecycle Initialize connection Create session Send prompt Agent processes task Streaming updates returned Optional tool execution (with permissions) Session ends ACP supports: Text + multimodal inputs Incremental responses Cancellation and control Real-World Use Cases IDE Integration (Custom Editors) Build your own AI-powered editor: Connect via ACP Send code context Receive suggestions CI/CD Automation Imagine: Use ACP to: Auto-fix bugs Generate tests Refactor code Multi-Agent Systems ACP enables: Copilot + other agents working together Task delegation Workflow orchestration Custom Developer Tools Examples: AI code review dashboards Internal dev assistants ChatOps integrations Advanced Concepts Session Management ACP allows: Isolated sessions Custom working directories Context persistence Streaming Responses Instead of waiting: Receive responses in chunks Build real-time UIs Permission Handling ACP includes: Tool execution approvals Security boundaries Controlled automation Extensibility ACP supports: Multiple SDKs (TypeScript, Python, Rust, Kotlin) Custom clients Future protocol evolution ACP vs Traditional Integration Feature Traditional APIs ACP Integration Custom per tool Standardized Streaming Limited Native Multi-agent Hard Built-in Extensibility Low High Interoperability Poor Excellent Why ACP + Copilot CLI Is a Big Deal This combination unlocks: ✅ Platform-level AI integration No more vendor lock-in per editor ✅ True agentic workflows Agents don’t just suggest—they act ✅ Ecosystem growth Any tool can plug into Copilot Challenges & Considerations ACP is still in public preview Requires understanding of: Streams Async communication Debugging agent workflows can be complex Future of Developer Experience ACP represents a shift toward: “AI-native development platforms” Future possibilities: Fully autonomous CI/CD pipelines Cross-agent collaboration Self-healing codebases Final Thoughts The GitHub Copilot CLI ACP Server is not just a feature—it’s a foundation for the next generation of software development. If you are: A developer → build smarter tools A tech lead → design AI-driven workflows A CTO aspirant → understand this deeply Then ACP is something you must master early. Quick Summary ACP = Standard protocol for AI agents Copilot CLI = Can run as ACP server Enables = IDEs, CI/CD, multi-agent systems Key power = interoperability + automationVectorless Reasoning-Based RAG: A New Approach to Retrieval-Augmented Generation
Introduction Retrieval-Augmented Generation (RAG) has become a widely adopted architecture for building AI applications that combine Large Language Models (LLMs) with external knowledge sources. Traditional RAG pipelines rely heavily on vector embeddings and similarity search to retrieve relevant documents. While this works well for many scenarios, it introduces challenges such as: Requires chunking documents into small segments Important context can be split across chunks Embedding generation and vector databases add infrastructure complexity A new paradigm called Vectorless Reasoning-Based RAG is emerging to address these challenges. One framework enabling this approach is PageIndex, an open-source document indexing system that organizes documents into a hierarchical tree structure and allows Large Language Models (LLMs) to perform reasoning-based retrieval over that structure. Vectorless Reasoning-Based RAG Instead of vectors, this approach uses structured document navigation. User Query ->Document Tree Structure ->LLM Reasoning ->Relevant Nodes Retrieved ->LLM Generates Answer This mimics how humans read documents: Look at the table of contents Identify relevant sections Read the relevant content Answer the question Core features No Vector Database: It relies on document structure and LLM reasoning for retrieval. It does not depend on vector similarity search. No Chunking: Documents are not split into artificial chunks. Instead, they are organized using their natural structure, such as pages and sections. Human-like Retrieval: The system mimics how human experts read documents. It navigates through sections and extracts information from relevant parts. Better Explainability and Traceability: Retrieval is based on reasoning. The results can be traced back to specific pages and sections. This makes the process easier to interpret. It avoids opaque and approximate vector search, often called “vibe retrieval.” When to Use Vectorless RAG Vectorless RAG works best when: Data is structured or semi-structured Documents have clear metadata Knowledge sources are well organized Queries require reasoning rather than semantic similarity Examples: enterprise knowledge bases internal documentation systems compliance and policy search healthcare documentation financial reporting Implementing Vectorless RAG with Azure AI Foundry Step 1 : Install Pageindex using pip command, from pageindex import PageIndexClient import pageindex.utils as utils # Get your PageIndex API key from https://dash.pageindex.ai/api-keys PAGEINDEX_API_KEY = "YOUR_PAGEINDEX_API_KEY" pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY) Step 2 : Set up your LLM Example using Azure OpenAI: from openai import AsyncAzureOpenAI client = AsyncAzureOpenAI( api_key=AZURE_OPENAI_API_KEY, azure_endpoint=AZURE_OPENAI_ENDPOINT, api_version=AZURE_OPENAI_API_VERSION ) async def call_llm(prompt, temperature=0): response = await client.chat.completions.create( model=AZURE_DEPLOYMENT_NAME, messages=[{"role": "user", "content": prompt}], temperature=temperature ) return response.choices[0].message.content.strip() Step 3: Page Tree Generation import os, requests pdf_url = "https://arxiv.org/pdf/2501.12948.pdf" //give the pdf url for tree generation, here given one for example pdf_path = os.path.join("../data", pdf_url.split('/')[-1]) os.makedirs(os.path.dirname(pdf_path), exist_ok=True) response = requests.get(pdf_url) with open(pdf_path, "wb") as f: f.write(response.content) print(f"Downloaded {pdf_url}") doc_id = pi_client.submit_document(pdf_path)["doc_id"] print('Document Submitted:', doc_id) Step 4 : Print the generated pageindex tree structure if pi_client.is_retrieval_ready(doc_id): tree = pi_client.get_tree(doc_id, node_summary=True)['result'] print('Simplified Tree Structure of the Document:') utils.print_tree(tree) else: print("Processing document, please try again later...") Step 5 : Use LLM for tree search and identify nodes that might contain relevant context import json query = "What are the conclusions in this document?" tree_without_text = utils.remove_fields(tree.copy(), fields=['text']) search_prompt = f""" You are given a question and a tree structure of a document. Each node contains a node id, node title, and a corresponding summary. Your task is to find all nodes that are likely to contain the answer to the question. Question: {query} Document tree structure: {json.dumps(tree_without_text, indent=2)} Please reply in the following JSON format: {{ "thinking": "<Your thinking process on which nodes are relevant to the question>", "node_list": ["node_id_1", "node_id_2", ..., "node_id_n"] }} Directly return the final JSON structure. Do not output anything else. """ tree_search_result = await call_llm(search_prompt) Step 6 : Print retrieved nodes and reasoning process node_map = utils.create_node_mapping(tree) tree_search_result_json = json.loads(tree_search_result) print('Reasoning Process:') utils.print_wrapped(tree_search_result_json['thinking']) print('\nRetrieved Nodes:') for node_id in tree_search_result_json["node_list"]: node = node_map[node_id] print(f"Node ID: {node['node_id']}\t Page: {node['page_index']}\t Title: {node['title']}") Step 7: Answer generation node_list = json.loads(tree_search_result)["node_list"] relevant_content = "\n\n".join(node_map[node_id]["text"] for node_id in node_list) print('Retrieved Context:\n') utils.print_wrapped(relevant_content[:1000] + '...') answer_prompt = f""" Answer the question based on the context: Question: {query} Context: {relevant_content} Provide a clear, concise answer based only on the context provided. """ print('Generated Answer:\n') answer = await call_llm(answer_prompt) utils.print_wrapped(answer) When to Use Each Approach Both vector-based RAG and vectorless RAG have their strengths. Choosing the right approach depends on the nature of the documents and the type of retrieval required. When to Use Vector Database–Based RAG Vector-based retrieval works best when dealing with large collections of unrelated or loosely structured documents. In such cases, semantic similarity is often sufficient to identify relevant information quickly. Use vector RAG when: Searching across many independent documents Semantic similarity is sufficient to locate relevant content Real-time retrieval is required over very large datasets Common use cases include: Customer support knowledge bases Conversational chatbots Product and content search systems When to Use Vectorless RAG Vectorless approaches such as PageIndex are better suited for long, structured documents where understanding the logical organization of the content is important. Use vectorless RAG when: Documents contain clear hierarchical structure Logical reasoning across sections is required High retrieval accuracy is critical Typical examples include: Financial filings and regulatory reports Legal documents and contracts Technical manuals and documentation Academic and research papers In these scenarios, navigating the document structure allows the system to identify the exact section that logically contains the answer, rather than relying only on semantic similarity. Conclusion Vector databases significantly advanced RAG architectures by enabling scalable semantic search across large datasets. However, they are not the optimal solution for every type of document. Vectorless approaches such as PageIndex introduce a different philosophy: instead of retrieving text that is merely semantically similar, they retrieve text that is logically relevant by reasoning over the structure of the document. As RAG architectures continue to evolve, the future will likely combine the strengths of both approaches. Hybrid systems that integrate vector search for broad retrieval and reasoning-based navigation for precision may offer the best balance of scalability and accuracy for enterprise AI applications.9.1KViews2likes0CommentsLearn how to build agents and workflows in Python
We just concluded Python + Agents, a six-part livestream series where we explored the foundational concepts behind building AI agents in Python using the Microsoft Agent Framework: Using agents with tools, MCP servers, and subagents Adding context to agents with database calls and long-term memory with Redis or Mem0 Monitoring using OpenTelemetry and evaluating quality with the Azure AI Evaluation SDK AI-driven workflows with conditional branching, structured outputs, and multi-agent orchestration Adding human-in-the-loop with tool approval and checkpoints All of the materials from our series are available for you to keep learning from, and linked below: Video recordings of each stream Powerpoint slides that you can use for reviewing or even teaching the material to your own community Open-source code samples you can run yourself using frontier LLMs from GitHub Models or Microsoft Foundry Models Spanish speaker? Check out the Spanish version of the series. 🙋🏽♂️ Have follow up questions? Join the weekly Python+AI office hours on Foundry Discord or the weekly Agent Framework office hours. Building your first agent in Python 📺 Watch YouTube recording In the first session of our Python + Agents series, we'll kick things off with the fundamentals: what AI agents are, how they work, and how to build your first one using the Microsoft Agent Framework. We'll start with the core anatomy of an agent, then walk through how tool calling works in practice—beginning with a single tool, expanding to multiple tools, and finally connecting to tools exposed through local MCP servers. We'll conclude with the supervisor agent pattern, where a single supervisor agent coordinates subtasks across multiple subagents, by treating each agent as a tool. Along the way, we'll share tips for debugging and inspecting agents, like using the DevUI interface from Microsoft Agent Framework for interacting with agent prototypes. 🖼️ Slides for this session 💻 Code repository with examples: python-agentframework-demos 📝 Write-up for this session Adding context and memory to agents 📺 Watch YouTube recording In the second session of our Python + Agents series, we'll extend agents built with the Microsoft Agent Framework by adding two essential capabilities: context and memory. We'll begin with context, commonly known as Retrieval‑Augmented Generation (RAG), and show how agents can ground their responses using knowledge retrieved from local data sources such as SQLite or PostgreSQL. This enables agents to provide accurate, domain‑specific answers based on real information rather than model hallucination. Next, we'll explore memory—both short‑term, thread‑level context and long‑term, persistent memory. You'll see how agents can store and recall information using solutions like Redis or open‑source libraries such as Mem0, enabling them to remember previous interactions, user preferences, and evolving tasks across sessions. By the end, you'll understand how to build agents that are not only capable but context‑aware and memory‑efficient, resulting in richer, more personalized user experiences. 🖼️ Slides for this session 💻 Code repository with examples: python-agentframework-demos 📝 Write-up for this session Monitoring and evaluating agents 📺 Watch YouTube recording In the third session of our Python + Agents series, we'll focus on two essential components of building reliable agents: observability and evaluation. We'll begin with observability, using OpenTelemetry to capture traces, metrics, and logs from agent actions. You'll learn how to instrument your agents and use a local Aspire dashboard to identify slowdowns and failures. From there, we'll explore how to evaluate agent behavior using the Azure AI Evaluation SDK. You'll see how to define evaluation criteria, run automated assessments over a set of tasks, and analyze the results to measure accuracy, helpfulness, and task success. By the end of the session, you'll have practical tools and workflows for monitoring, measuring, and improving your agents—so they're not just functional, but dependable and verifiably effective. 🖼️ Slides for this session 💻 Code repository with examples: python-agentframework-demos 📝 Write-up for this session Building your first AI-driven workflows 📺 Watch YouTube recording In Session 4 of our Python + Agents series, we'll explore the foundations of building AI‑driven workflows using the Microsoft Agent Framework: defining workflow steps, connecting them, passing data between them, and introducing simple ways to guide the path a workflow takes. We'll begin with a conceptual overview of workflows and walk through their core components: executors, edges, and events. You'll learn how workflows can be composed of simple Python functions or powered by full AI agents when a step requires model‑driven behavior. From there, we'll dig into conditional branching, showing how workflows can follow different paths depending on model outputs, intermediate results, or lightweight decision functions. We'll introduce structured outputs as a way to make branching more reliable and easier to maintain—avoiding vague string checks and ensuring that workflow decisions are based on clear, typed data. We'll discover how the DevUI interface makes it easier to develop workflows by visualizing the workflow graph and surfacing the streaming events during a workflow's execution. Finally, we'll dive into an E2E demo application that uses workflows inside a user-facing application with a frontend and backend. 🖼️ Slides for this session 💻 Code repository with examples: python-agentframework-demos 📝 Write-up for this session Orchestrating advanced multi-agent workflows 📺 Watch YouTube recording In Session 5 of our Python + Agents series, we'll go beyond workflow fundamentals and explore how to orchestrate advanced, multi‑agent workflows using the Microsoft Agent Framework. This session focuses on patterns that coordinate multiple steps or multiple agents at once, enabling more powerful and flexible AI‑driven systems. We'll begin by comparing sequential vs. concurrent execution, then dive into techniques for running workflow steps in parallel. You'll learn how fan‑out and fan‑in edges enable multiple branches to run at the same time, how to aggregate their results, and how concurrency allows workflows to scale across tasks efficiently. From there, we'll introduce two multi‑agent orchestration approaches that are built into the framework. We'll start with handoff, where control moves entirely from one agent to another based on workflow logic, which is useful for routing tasks to the right agent as the workflow progresses. We'll then look at Magentic, a planning‑oriented supervisor that generates a high‑level plan for completing a task and delegates portions of that plan to other agents. Finally, we'll wrap up with a demo of an E2E application that showcases a concurrent multi-agent workflow in action. 🖼️ Slides for this session 💻 Code repository with examples: python-agentframework-demos 📝 Write-up for this session Adding a human in the loop to agentic workflows 📺 Watch YouTube recording In the final session of our Python + Agents series, we'll explore how to incorporate human‑in‑the‑loop (HITL) interactions into agentic workflows using the Microsoft Agent Framework. This session focuses on adding points where a workflow can pause, request input or approval from a user, and then resume once the human has responded. HITL is especially important because LLMs can produce uncertain or inconsistent outputs, and human checkpoints provide an added layer of accuracy and oversight. We'll begin with the framework's requests‑and‑responses model, which provides a structured way for workflows to ask questions, collect human input, and continue execution with that data. We'll move onto tool approval, one of the most frequent reasons an agent requests input from a human, and see how workflows can surface pending tool calls for approval or rejection. Next, we'll cover checkpoints and resuming, which allow workflows to pause and be restarted later. This is especially important for HITL scenarios where the human may not be available immediately. We'll walk through examples that demonstrate how checkpoints store progress, how resuming picks up the workflow state, and how this mechanism supports longer‑running or multi‑step review cycles. This session brings together everything from the series—agents, workflows, branching, orchestration—and shows how to integrate humans thoughtfully into AI‑driven processes, especially when reliability and judgment matter most. 🖼️ Slides for this session 💻 Code repository with examples: python-agentframework-demos 📝 Write-up for this session3.5KViews2likes0CommentsGiving Your AI Agents Reliable Skills with the Agent Skills SDK
AI agents are becoming increasingly capable, but they often do not have the context they need to do real work reliably. Your agent can reason well, but it does not actually know how to do the specific things your team needs it to do. For example, it cannot follow your company's incident response playbook, it does not know your escalation policy, and it has no idea how to page the on-call engineer at 3 AM. There are many ways to close this gap, from RAG to custom tool implementations. Agent Skills is one approach that stands out because it is designed around portability and progressive disclosure, keeping context window usage minimal while giving agents access to deep expertise on demand. What is Agent Skills? Agent Skills is an open format for giving agents new capabilities and expertise. The format was originally developed by Anthropic and released as an open standard. It is now supported by a growing list of agent products including Claude Code, VS Code, GitHub, OpenAI Codex, Cursor, Gemini CLI, and many others. As defined in the spec, a skill is a folder on disk containing a SKILL.md file with metadata and instructions, plus optional scripts, references, and assets: incident-response/ SKILL.md # Required: instructions + metadata references/ # Optional: additional documentation severity-levels.md escalation-policy.md scripts/ # Optional: executable code page-oncall.sh assets/ # Optional: templates, diagrams, data files The SKILL.md file has YAML frontmatter with a name and description (so agents know when the skill is relevant), followed by markdown instructions that tell the agent how to perform the task. The format is intentionally simple: self-documenting, extensible, and portable. What makes this design practical is progressive disclosure. The spec is built around the idea that agents should not load everything at once. It works in three stages: Discovery: At startup, agents load only the name and description of each available skill, just enough to know when it might be relevant. Activation: When a task matches a skill's description, the agent reads the full SKILL.md instructions into context. Execution: The agent follows the instructions, optionally loading referenced files or executing bundled scripts as needed. This keeps agents fast while giving them access to deep context on demand. The format is well-designed and widely adopted, but if you want to use skills from your own agents, there is a gap between the spec and a working implementation. The Agent Skills SDK Conceptually, a skill is more than a folder. It is a unit of expertise: a name, a description, a body of instructions, and a set of supporting resources. The file layout is one way to represent that, but there is nothing about the concept that requires a filesystem. The Agent Skills SDK is an open-source Python library built around that idea, treating skills as abstract units of expertise that can be stored anywhere and consumed by any agent framework. It does this by addressing two challenges that come up when you try to use the format from your own agents. The first is where skills live. The spec defines skills as folders on disk, and the tools that support the format today all assume skills are local files. Files are inherently portable, and that is one of the format's strengths. But in the real world, not every team can or wants to serve skills from the filesystem. Maybe your team keeps them in an S3 bucket. Maybe they are in Azure Blob Storage behind your CDN. Maybe they live in a database alongside the rest of your application data. At the moment, if your skills are not on the local filesystem, you are on your own. The SDK changes where skills are served from, not how they are authored. The content and format stay the same regardless of the storage backend, so skills remain portable across providers. The second is how agents consume them. The spec defines the progressive disclosure pattern but actually implementing it in your agent requires real work. You need to figure out how to validate skills against the spec, generate a catalog for the system prompt, expose the right tools for on-demand content retrieval, and handle the back-and-forth of the agent requesting metadata, then the body, then individual references or scripts. That is a lot of plumbing regardless of where the skills are stored, and the work multiplies if you want to support more than one agent framework. The SDK solves both by separating where skills come from (providers) from how agents use them (integrations), so you can mix and match freely. Load skills from the filesystem today, move them to an HTTP server tomorrow, swap in a custom database provider next month, and your agent code does not change at all. How the SDK works The SDK is a set of Python packages organized around two ideas: storage-agnostic providers and progressive disclosure. The provider abstraction means your skills can live anywhere. The SDK ships with providers for the local filesystem and static HTTP servers, but the SkillProvider interface is simple enough that you can write your own in a few methods. A Cosmos DB provider, a Git provider, a SharePoint provider, whatever makes sense for your team. The rest of the SDK does not care where the data comes from. On top of that, the SDK implements the progressive disclosure pattern from the spec as a set of tools that any LLM agent can use. At startup, the SDK generates a skills catalog containing each skill's name and description. Your agent injects this catalog into its system prompt so it knows what is available. Then, during a conversation, the agent calls tools to retrieve content on demand, following the same discovery-activation-execution flow the spec describes. Here is the flow in practice: You register skills from any source (local files, an HTTP server, your own database). The SDK generates a catalog and tool usage instructions, which you inject into the system prompt. The agent calls tools to retrieve content on demand. This matters because context windows are finite. An incident response skill might have a main body, three reference documents, two scripts, and a flowchart. The agent should not load all of that upfront. It should read the body first, then pull the escalation policy only when the conversation actually gets to escalation. A quick example Here is what it looks like in practice. Start by loading a skill from the filesystem: from pathlib import Path from agentskills_core import SkillRegistry from agentskills_fs import LocalFileSystemSkillProvider provider = LocalFileSystemSkillProvider(Path("my-skills")) registry = SkillRegistry() await registry.register("incident-response", provider) Now wire it into a LangChain agent: from langchain.agents import create_agent from agentskills_langchain import get_tools, get_tools_usage_instructions tools = get_tools(registry) skills_catalog = await registry.get_skills_catalog(format="xml") tool_usage_instructions = get_tools_usage_instructions() system_prompt = ( "You are an SRE assistant. Use the available skill tools to look up " "incident response procedures, severity definitions, and escalation " "policies. Always cite which reference document you used.\n\n" f"{skills_catalog}\n\n" f"{tool_usage_instructions}" ) agent = create_agent( llm, tools, system_prompt=system_prompt, ) That is it. The agent now knows what skills are available and has tools to fetch their content. When a user asks "How do I handle a SEV1 incident?", the agent will call get_skill_body to read the instructions, then get_skill_reference to pull the severity levels document, all without you writing any of that retrieval logic. The same pattern works with Microsoft Agent Framework: from agentskills_agentframework import get_tools, get_tools_usage_instructions tools = get_tools(registry) skills_catalog = await registry.get_skills_catalog(format="xml") tool_usage_instructions = get_tools_usage_instructions() system_prompt = ( "You are an SRE assistant. Use the available skill tools to look up " "incident response procedures, severity definitions, and escalation " "policies. Always cite which reference document you used.\n\n" f"{skills_catalog}\n\n" f"{tool_usage_instructions}" ) agent = Agent( client=client, instructions=system_prompt, tools=tools, ) What is in the SDK The SDK is split into small, composable packages so you only install what you need: agentskills-core handles registration, validation, the skills catalog, and the progressive disclosure API. It also defines the SkillProvider interface that all providers implement. agentskills-fs and agentskills-http are the two built-in providers. The filesystem provider loads skills from local directories. The HTTP provider loads them from any static file host: S3, Azure Blob Storage, GitHub Pages, a CDN, or anything that serves files over HTTP. agentskills-langchain and agentskills-agentframework generate framework-native tools and tool usage instructions from a skill registry. agentskills-mcp-server spins up an MCP server that exposes skill tool access and usage as tools and resources, so any MCP-compatible client can use them. Because providers and integrations are separate packages, you can combine them however you want. Use the filesystem provider during development, switch to the HTTP provider in production, or write a custom provider that reads skills from your own database. The integration layer does not need to know or care. Where to go from here The full source, working examples, and detailed API docs are on GitHub: github.com/pratikxpanda/agentskills-sdk The repo includes end-to-end examples for both LangChain and Microsoft Agent Framework, covering filesystem providers, HTTP providers, and MCP. There is also a sample incident-response skill you can use to try things out. A proposal to contribute this SDK to the official agentskills repository has been submitted. If you find it useful, feel free to show your support on the GitHub issue. To learn more about the Agent Skills format itself: What are skills? covers the format and why it matters. Specification is the complete format reference for SKILL.md files. Integrate skills explains how to add skills support to your agent. Example skills on GitHub are a good starting point for writing your own. The SDK is MIT licensed and contributions are welcome. If you have questions or ideas, post a question here or open an issue on the repo.