github
97 Topics🏆 Agents League Winner Spotlight – Reasoning Agents Track
Agents League was designed to showcase what agentic AI can look like when developers move beyond single‑prompt interactions and start building systems that plan, reason, verify, and collaborate. Across three competitive tracks—Creative Apps, Reasoning Agents, and Enterprise Agents—participants had two weeks to design and ship real AI agents using production‑ready Microsoft and GitHub tools, supported by live coding battles, community AMAs, and async builds on GitHub. Today, we’re excited to spotlight the winning project for the Reasoning Agents track, built on Microsoft Foundry: CertPrep Multi‑Agent System — Personalised Microsoft Exam Preparation by Athiq Ahmed. The Reasoning Agents Challenge Scenario The goal of the Reasoning Agents track challenge was to design a multi‑agent system capable of effectively assisting students in preparing for Microsoft certification exams. Participants were asked to build an agentic workflow that could understand certification syllabi, generate personalized study plans, assess learner readiness, and continuously adapt based on performance and feedback. The suggested reference architecture modeled a realistic learning journey: starting from free‑form student input, a sequence of specialized reasoning agents collaboratively curated Microsoft Learn resources, produced structured study plans with timelines and milestones, and maintained learner engagement through reminders. Once preparation was complete, the system shifted into an assessment phase to evaluate readiness and either recommend the appropriate Microsoft certification exam or loop back into targeted remediation—emphasizing reasoning, decision‑making, and human‑in‑the‑loop validation at every step. All details are available here: agentsleague/starter-kits/2-reasoning-agents at main · microsoft/agentsleague. The Winning Project: CertPrep Multi‑Agent System The CertPrep Multi‑Agent System is an AI solution for personalized Microsoft certification exam preparation, supporting nine certification exam families. At a high level, the system turns free‑form learner input into a structured certification plan, measurable progress signals, and actionable recommendations—demonstrating exactly the kind of reasoned orchestration this track was designed to surface. Inside the Multi‑Agent Architecture At its core, the system is designed as a multi‑agent pipeline that combines sequential reasoning, parallel execution, and human‑in‑the‑loop gates, with traceability and responsible AI guardrails. The solution is composed of eight specialized reasoning agents, each focused on a specific stage of the learning journey: LearnerProfilingAgent – Converts free‑text background information into a structured learner profile using Microsoft Foundry SDK (with deterministic fallbacks). StudyPlanAgent – Generates a week‑by‑week study plan using a constrained allocation algorithm to respect the learner’s available time. LearningPathCuratorAgent – Maps exam domains to curated Microsoft Learn resources with trusted URLs and estimated effort. ProgressAgent – Computes a weighted readiness score based on domain coverage, time utilization, and practice performance. AssessmentAgent – Generates and evaluates domain‑proportional mock exams. CertificationRecommendationAgent – Issues a clear “GO / CONDITIONAL GO / NOT YET” decision with remediation steps and next‑cert suggestions. Throughout the pipeline, a 17‑rule Guardrails Pipeline enforces validation checks at every agent boundary, and two explicit human‑in‑the‑loop gates ensure that decisions are made only when sufficient learner confirmation or data is present. CertPrep leverages Microsoft Foundry Agent Service and related tooling to run this reasoning pipeline reliably and observably: Managed agents via Foundry SDK Structured JSON outputs using GPT‑4o (JSON mode) with conservative temperature settings Guardrails enforced through Azure Content Safety Parallel agent fan‑out using concurrent execution Typed contracts with Pydantic for every agent boundary AI-assisted development with GitHub Copilot, used throughout for code generation, refactoring, and test scaffolding Notably, the full pipeline is designed to run in under one second in mock mode, enabling reliable demos without live credentials. User Experience: From Onboarding to Exam Readiness Beyond its backend architecture, CertPrep places strong emphasis on clarity, transparency, and user trust through a well‑structured front‑end experience. The application is built with Streamlit and organized as a 7‑tab interactive interface, guiding learners step‑by‑step through their preparation journey. From a user’s perspective, the flow looks like this: Profile & Goals Input Learners start by describing their background, experience level, and certification goals in natural language. The system immediately reflects how this input is interpreted by displaying the structured learner profile produced by the profiling agent. Learning Path & Study Plan Visualization Once generated, the study plan is presented using visual aids such as Gantt‑style timelines and domain breakdowns, making it easy to understand weekly milestones, expected effort, and progress over time. Progress Tracking & Readiness Scoring As learners move forward, the UI surfaces an exam‑weighted readiness score, combining domain coverage, study plan adherence, and assessment performance—helping users understand why the system considers them ready (or not yet). Assessments and Feedback Practice assessments are generated dynamically, and results are reported alongside actionable feedback rather than just raw scores. Transparent Recommendations Final recommendations are presented clearly, supported by reasoning traces and visual summaries, reinforcing trust and explainability in the agent’s decision‑making. The UI also includes an Admin Dashboard and demo‑friendly modes, enabling judges, reviewers, or instructors to inspect reasoning traces, switch between live and mock execution, and demonstrate the system reliably without external dependencies. Why This Project Stood Out This project embodies the spirit of the Reasoning Agents track in several ways: ✅ Clear separation of reasoning roles, instead of prompt‑heavy monoliths ✅ Deterministic fallbacks and guardrails, critical for educational and decision‑support systems ✅ Observable, debuggable workflows, aligned with Foundry’s production goals ✅ Explainable outputs, surfaced directly in the UX It demonstrates how agentic patterns translate cleanly into maintainable architectures when supported by the right platform abstractions. Try It Yourself Explore the project, architecture, and demo here: 🔗 GitHub Issue (full project details): https://github.com/microsoft/agentsleague/issues/76 🎥 Demo video: https://www.youtube.com/watch?v=okWcFnQoBsE 🌐 Live app (mock data): https://agentsleague.streamlit.app/Supercharge Your Dev Workflows with GitHub Copilot Custom Skills
The Problem Every team has those repetitive, multi-step workflows that eat up time: Running a sequence of CLI commands, parsing output, and generating a report Querying multiple APIs, correlating data, and summarizing findings Executing test suites, analyzing failures, and producing actionable insights You've probably documented these in a wiki or a runbook. But every time, you still manually copy-paste commands, tweak parameters, and stitch results together. What if your AI coding assistant could do all of that — triggered by a single natural language request? That's exactly what GitHub Copilot Custom Skills enable. What Are Custom Skills? A skill is a folder containing a SKILL.md file (instructions for the AI), plus optional scripts, templates, and reference docs. When you ask Copilot something that matches the skill's description, it loads the instructions and executes the workflow autonomously. Think of it as giving your AI assistant a runbook it can actually execute, not just read. Without Skills With Skills Read the wiki for the procedure Copilot loads the procedure automatically Copy-paste 5 CLI commands Copilot runs the full pipeline Manually parse JSON output Script generates a formatted HTML report 15-30 minutes of manual work One natural language request, ~2 minutes How It Works The key insight: the skill file is the contract between you and the AI. You describe what to do and how, and Copilot handles the orchestration. Prerequisites Requirement Details VS Code Latest stable release GitHub Copilot Active Copilot subscription (Individual, Business, or Enterprise) Agent mode Select "Agent" mode in the Copilot Chat panel (the default in recent versions) Runtime tools Whatever your scripts need — Python, Node.js, .NET CLI, az CLI, etc. Note: Agent Skills follow an open standard — they work across VS Code, GitHub Copilot CLI, and GitHub Copilot coding agent. No additional extensions or cloud services are required for the skill system itself. Anatomy of a Skill .github/skills/my-skill/ ├── SKILL.md # Instructions (required) └── references/ ├── resources/ │ ├── run.py # Automation script │ ├── query-template.sql # Reusable query template │ └── config.yaml # Static configuration └── reports/ └── report_template.html # Output template The SKILL.md File Every skill has the same structure: --- name: my-skill description: 'What this does and when to use it. Include trigger phrases so Copilot knows when to load it. USE FOR: specific task A, task B. Trigger phrases: "keyword1", "keyword2".' argument-hint: 'What inputs the user should provide.' --- # My Skill ## When to Use - Situation A - Situation B ## Quick Start \```powershell cd .github/skills/my-skill/references/resources py run.py <arg1> <arg2> \``` ## What It Does | Step | Action | Purpose | |------|--------|---------| | 1 | Fetch data from source | Gather raw input | | 2 | Process and transform | Apply business logic | | 3 | Generate report | Produce actionable output | ## Output Description of what the user gets back. Key Design Principles Description is discovery. The description field is the only thing Copilot reads to decide whether to load your skill. Pack it with trigger phrases and keywords. Progressive loading. Copilot reads only name + description (~100 tokens) for all skills. It loads the full SKILL.md body only for matched skills. Reference files load only when the procedure references them. Self-contained procedures. Include everything the AI needs to execute — exact commands, parameter formats, file paths. Don't assume prior knowledge. Scripts do the heavy lifting. The AI orchestrates; your scripts execute. This keeps the workflow deterministic and reproducible. Example: Build a Deployment Health Check Skill Let's build a skill that checks the health of a deployment by querying an API, comparing against expected baselines, and generating a summary. Step 1 — Create the folder structure .github/skills/deployment-health/ ├── SKILL.md └── references/ └── resources/ ├── check_health.py └── endpoints.yaml Step 2 — Write the SKILL.md --- name: deployment-health description: 'Check deployment health across environments. Queries health endpoints, compares response times against baselines, and flags degraded services. USE FOR: deployment validation, health check, post-deploy verification, service status. Trigger phrases: "check deployment health", "is the deployment healthy", "post-deploy check", "service health".' argument-hint: 'Provide the environment name (e.g., staging, production).' --- # Deployment Health Check ## When to Use - After deploying to any environment - During incident triage to check service status - Scheduled spot checks ## Quick Start \```bash cd .github/skills/deployment-health/references/resources python check_health.py <environment> \``` ## What It Does 1. Loads endpoint definitions from `endpoints.yaml` 2. Calls each endpoint, records response time and status code 3. Compares against baseline thresholds 4. Generates an HTML report with pass/fail status ## Output HTML report at `references/reports/health_<environment>_<date>.html` Step 3 — Write the script # check_health.py import sys, yaml, requests, time, json from datetime import datetime def main(): env = sys.argv[1] with open("endpoints.yaml") as f: config = yaml.safe_load(f) results = [] for ep in config["endpoints"]: url = ep["url"].replace("{env}", env) start = time.time() resp = requests.get(url, timeout=10) elapsed = time.time() - start results.append({ "service": ep["name"], "status": resp.status_code, "latency_ms": round(elapsed * 1000), "threshold_ms": ep["threshold_ms"], "healthy": resp.status_code == 200 and elapsed * 1000 < ep["threshold_ms"] }) healthy = sum(1 for r in results if r["healthy"]) print(f"Health check: {healthy}/{len(results)} services healthy") # ... generate HTML report ... if __name__ == "__main__": main() Step 4 — Use it Just ask Copilot in agent mode: "Check deployment health for staging" Copilot will: Match against the skill description Load the SKILL.md instructions Run python check_health.py staging Open the generated report Summarize findings in chat More Skill Ideas Skills aren't limited to any specific domain. Here are patterns that work well: Skill What It Automates Test Regression Analyzer Run tests, parse failures, compare against last known-good run, generate diff report API Contract Checker Compare Open API specs between branches, flag breaking changes Security Scan Reporter Run SAST/DAST tools, correlate findings, produce prioritized report Cost Analysis Query cloud billing APIs, compare costs across periods, flag anomalies Release Notes Generator Parse git log between tags, categorize changes, generate changelog Infrastructure Drift Detector Compare live infra state vs IaC templates, flag drift Log Pattern Analyzer Query log aggregation systems, identify anomaly patterns, summarize Performance Bench marker Run benchmarks, compare against baselines, flag regressions Dependency Auditor Scan dependencies, check for vulnerabilities and outdated packages The pattern is always the same: instructions (SKILL.md) + automation script + output template. Tips for Writing Effective Skills Do Front-load the description with keywords — this is how Copilot discovers your skill Include exact commands — cd path/to/dir && python script.py <args> Document input/output clearly — what goes in, what comes out Use tables for multi-step procedures — easier for the AI to follow Include time zone conversion notes if dealing with timestamps Bundle HTML report templates — rich output beats plain text Don't Don't use vague descriptions — "A useful skill" won't trigger on anything Don't assume context — include all paths, env vars, and prerequisites Don't put everything in SKILL.md — use references/ for large files Don't hardcode secrets — use environment variables or Azure Key Vault Don't skip error guidance — tell the AI what common errors look like and how to fix them Skill Locations Skills can live at project or personal level: Location Scope Shared with team? .github/skills/<name>/ Project Yes (via source control) .agents/skills/<name>/ Project Yes (via source control) .claude/skills/<name>/ Project Yes (via source control) ~/.copilot/skills/<name>/ Personal No ~/.agents/skills/<name>/ Personal No ~/.claude/skills/<name>/ Personal No Project-level skills are committed to your repo and shared with the team. Personal skills are yours and roam with your VS Code settings sync. You can also configure additional skill locations via the chat.skillsLocations VS Code setting. How Skills Fit in the Copilot Customization Stack Skills are one of several customization primitives. Here's when to use what: Primitive Use When Workspace Instructions (.github/copilot-instructions.md) Always-on rules: coding standards, naming conventions, architectural guidelines File Instructions (.github/instructions/*.instructions.md) Rules scoped to specific file patterns (e.g., all *.test.ts files) Prompts (.github/prompts/*.prompt.md) Single-shot tasks with parameterized inputs Skills (.github/skills/<name>/SKILL.md) Multi-step workflows with bundled scripts and templates Custom Agents (.github/agents/*.agent.md) Isolated subagents with restricted tool access or multi-stage pipelines Hooks (.github/hooks/*.json) Deterministic shell commands at agent lifecycle events (auto-format, block tools) Plugins Installable skill bundles from the community (awesome-copilot) Slash Commands & Quick Creation Skills automatically appear as slash commands in chat. Type / to see all available skills. You can also pass context after the command: /deployment-health staging /webapp-testing for the login page Want to create a skill fast? Type /create-skill in chat and describe what you need. Copilot will ask clarifying questions and generate the SKILL.md with proper frontmatter and directory structure. You can also extract a skill from an ongoing conversation: after debugging a complex issue, ask "create a skill from how we just debugged that" to capture the multi-step procedure as a reusable skill. Controlling When Skills Load Use frontmatter properties to fine-tune skill availability: Configuration Slash command? Auto-loaded? Use case Default (both omitted) Yes Yes General-purpose skills user-invocable: false No Yes Background knowledge the model loads when relevant disable-model-invocation: true Yes No Skills you only want to run on demand Both set No No Disabled skills The Open Standard Agent Skills follow an open standard that works across multiple AI agents: GitHub Copilot in VS Code — chat and agent mode GitHub Copilot CLI — terminal workflows GitHub Copilot coding agent — automated coding tasks Claude Code, Gemini CLI — compatible agents via .claude/skills/ and .agents/skills/ Skills you write once are portable across all these tools. Getting Started Create .github/skills/<your-skill>/SKILL.md in your repo Write a keyword-rich description in the YAML frontmatter Add your procedure and reference scripts Open VS Code, switch to Agent mode, and ask Copilot to do the task Watch it discover your skill, load the instructions, and execute Or skip the manual setup — type /create-skill in chat and describe what you need. That's it. No extension to install. No config file to update. No deployment pipeline. Just markdown and scripts, version-controlled in your repo. Custom Skills turn your documented procedures into executable AI workflows. Start with your most painful manual task, wrap it in a SKILL.md, and let Copilot handle the rest. Further Reading: Official Agent Skills docs Community skills & plugins (awesome-copilot) Anthropic reference skillsFrom CI/CD to Continuous AI: The Future of GitHub Automation
Introduction For over a decade, CI/CD (Continuous Integration and Continuous Deployment) has been the backbone of modern software engineering. It helped teams move from manual, error-prone deployments to automated, reliable pipelines. But today, we are standing at the edge of another transformation—one that is far more powerful. Welcome to the era of Continuous AI. This new paradigm is not just about automating pipelines—it’s about building self-improving, intelligent systems that can analyze, decide, and act with minimal human intervention. With the emergence of AI-powered workflows inside GitHub, automation is evolving from rule-based execution to context-aware decision-making. This article explores: What Continuous AI is How it differs from CI/CD Real-world use cases Architecture patterns Challenges and best practices What the future holds for engineering teams The Evolution: From CI to CI/CD to Continuous AI 1. Continuous Integration (CI) Developers merge code frequently Automated builds and tests validate changes Goal: Catch issues early 2. Continuous Deployment (CD) Code automatically deployed to production Reduced manual intervention Goal: Faster delivery 3. Continuous AI (The Next Step) Systems don’t just execute—they think and improve AI agents analyze code, detect issues, suggest fixes, and even implement them Goal: Autonomous software evolution What is Continuous AI? Continuous AI is a model where: Software systems continuously improve themselves using AI-driven insights and automated actions. Instead of static pipelines, you get: Intelligent workflows Context-aware automation Self-healing repositories Autonomous decision-making systems Key Characteristics Feature CI/CD Continuous AI Execution Rule-based AI-driven Flexibility Low High Decision-making Predefined Dynamic Learning None Continuous Output Build & deploy Improve & optimize Why Continuous AI Matters Traditional automation has limitations: It cannot adapt to new patterns It cannot reason about code quality It cannot proactively improve systems Continuous AI solves these problems by introducing: Context awareness Learning from past data Proactive optimization This leads to: Faster development cycles Higher code quality Reduced operational overhead Smarter engineering teams Core Components of Continuous AI in GitHub 1. AI Agents AI agents act as autonomous workers inside your repository. They can: Review pull requests Suggest improvements Generate tests Fix bugs 2. Agentic Workflows Unlike YAML pipelines, these workflows: Are written in natural language or simplified formats Use AI to interpret intent Adapt based on context 3. Event-Driven Intelligence Workflows trigger on events like: Pull request creation Issue updates Failed builds But instead of just reacting, they: Analyze the situation Decide the best course of action 4. Feedback Loops Continuous AI systems improve over time using: Past PR data Test failures Deployment outcomes CI/CD vs Continuous AI: A Deep Comparison Traditional CI/CD Pipeline Developer pushes code Pipeline runs tests Build is generated Code is deployed ➡️ Everything is predefined and static Continuous AI Workflow Developer creates PR AI agent reviews code Suggests improvements Generates missing tests Fixes minor issues automatically Learns from feedback ➡️ Dynamic, intelligent, and evolving Real-World Use Cases 1. Automated Pull Request Reviews AI agents can: Detect code smells Suggest optimizations Ensure coding standards 2. Self-Healing Repositories Automatically fix failing builds Update dependencies Resolve merge conflicts 3. Intelligent Test Generation Generate test cases based on code changes Improve coverage over time 4. Issue Triage Automation Categorize issues Assign priorities Route to correct teams 5. Documentation Automation Auto-generate README updates Keep documentation in sync with code Architecture of Continuous AI Systems A typical architecture includes: Layer 1: Event Sources GitHub events (PRs, commits, issues) Layer 2: AI Decision Engine LLM-based agents Context analysis Task planning Layer 3: Action Layer GitHub Actions Scripts Automation tools Layer 4: Feedback Loop Logs Metrics Model improvement Multi-Agent Systems: The Next Level Continuous AI becomes more powerful when multiple agents collaborate. Example Setup: Code Review Agent → Reviews PRs Test Agent → Generates tests Security Agent → Scans vulnerabilities Docs Agent → Updates documentation These agents: Communicate with each other Share context Coordinate tasks ➡️ This creates a virtual AI engineering team Benefits for Engineering Teams 1. Increased Productivity Developers spend less time on repetitive tasks. 2. Better Code Quality Continuous improvements ensure cleaner codebases. 3. Faster Time-to-Market Automation reduces bottlenecks. 4. Reduced Burnout Engineers focus on innovation instead of maintenance. Challenges and Risks 1. Over-Automation Too much automation can reduce human oversight. 2. Security Concerns AI workflows may misuse permissions if not controlled. 3. Trust Issues Teams may hesitate to rely on AI decisions. 4. Cost of AI Operations Running AI agents continuously can increase costs. Best Practices for Implementing Continuous AI 1. Start Small Begin with: PR review automation Test generation 2. Human-in-the-Loop Ensure: Critical decisions require approval 3. Use Least Privilege Restrict workflow permissions. 4. Monitor and Measure Track: Accuracy Impact Cost 5. Build Feedback Loops Continuously improve models and workflows. Future of GitHub Automation The future is heading toward: Fully autonomous repositories AI-driven engineering teams Continuous optimization of software systems We may soon see: Repos that refactor themselves Systems that predict failures before they occur AI architects designing system improvements Conclusion CI/CD transformed how we build and deliver software. But Continuous AI is set to transform how software evolves. It moves us from: “Automating tasks” → “Automating intelligence” For engineering leaders, this is not just a technical shift—it’s a strategic advantage. Early adopters of Continuous AI will build faster, smarter, and more resilient systems. The question is no longer: “Should we adopt AI in our workflows?” But: “How fast can we transition to Continuous AI?”Demystifying GitHub Copilot Security Controls: easing concerns for organizational adoption
At a recent developer conference, I delivered a session on Legacy Code Rescue using GitHub Copilot App Modernization. Throughout the day, conversations with developers revealed a clear divide: some have fully embraced Agentic AI in their daily coding, while others remain cautious. Often, this hesitation isn't due to reluctance but stems from organizational concerns around security and regulatory compliance. Having witnessed similar patterns during past technology shifts, I understand how these barriers can slow adoption. In this blog, I'll demystify the most common security concerns about GitHub Copilot and explain how its built-in features address them, empowering organizations to confidently modernize their development workflows. GitHub Copilot Model Training A common question I received at the conference was whether GitHub uses your code as training data for GitHub Copilot. I always direct customers to the GitHub Copilot Trust Center for clarity, but the answer is straightforward: “No. GitHub uses neither Copilot Business nor Enterprise data to train the GitHub model.” Notice this restriction also applies to third-party models as well (e.g. Anthropic, Google). GitHub Copilot Intellectual Property indemnification policy A frequent concern I hear is, since GitHub Copilot’s underlying models are trained on sources that include public code, it might simply “copy and paste” code from those sources. Let’s clarify how this actually works: Does GitHub Copilot “copy/paste”? “The AI models that create Copilot’s suggestions may be trained on public code, but do not contain any code. When they generate a suggestion, they are not “copying and pasting” from any codebase.” To provide an additional layer of protection, GitHub Copilot includes a “duplicate detection filter”. This feature helps prevent suggestions that closely match public code from being surfaced. (Note: This duplicate detection currently does not apply to the Copilot coding agent.) More importantly, customers are protected by an Intellectual Property indemnification policy. This means that if you receive an unmodified suggestion from GitHub Copilot and face a copyright claim as a result, Microsoft will defend you in court. GitHub Copilot Data Retention Another frequent question I hear concerns GitHub Copilot’s data retention policies. For organizations on GitHub Copilot Business and Enterprise plans, retention practices depend on how and where the service is accessed from: Access through IDE for Chat and Code Completions: Prompts and Suggestions: Not retained. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. Other GitHub Copilot access and use: Prompts and Suggestions: Retained for 28 days. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. For Copilot Coding Agent, session logs are retained for the life of the account in order to provide the service. Excluding content from GitHub Copilot To prevent GitHub Copilot from indexing sensitive files, you can configure content exclusions at the repository or organization level. In VS Code, use the .copilotignore file to exclude files client-side. Note that files listed in .gitignore are not indexed by default but may still be referenced if open or explicitly referenced (unless they’re excluded through .copilotignore or content exclusions). The life cycle of a GitHub Copilot code suggestion Here are the key protections at each stage of the life cycle of a GitHub Copilot code suggestion: In the IDE: Content exclusions prevent files, folders, or patterns from being included. GitHub proxy (pre-model safety): Prompts go through a GitHub proxy hosted in Microsoft Azure for pre-inference checks: screening for toxic or inappropriate language, relevance, and hacking attempts/jailbreak-style prompts before reaching the model. Model response: With the public code filter enabled, some suggestions are suppressed. The vulnerability protection feature blocks insecure coding patterns like hardcoded credentials or SQL injections in real time. Disable access to GitHub Copilot Free Due to the varying policies associated with GitHub Copilot Free, it is crucial for organizations to ensure it is disabled both in the IDE and on GitHub.com. Since not all IDEs currently offer a built-in option to disable Copilot Free, the most reliable method to prevent both accidental and intentional access is to implement firewall rule changes, as outlined in the official documentation. Agent Mode Allow List Accidental file system deletion by Agentic AI assistants can happen. With GitHub Copilot agent mode, the "Terminal auto approve” setting in VS Code can be used to prevent this. This setting can be managed centrally using a VS Code policy. MCP registry Organizations often want to restrict access to allow only trusted MCP servers. GitHub now offers an MCP registry feature for this purpose. This feature isn’t available in all IDEs and clients yet, but it's being developed. Compliance Certifications The GitHub Copilot Trust Center page lists GitHub Copilot's broad compliance credentials, surpassing many competitors in financial, security, privacy, cloud, and industry coverage. SOC 1 Type 2: Assurance over internal controls for financial reporting. SOC 2 Type 2: In-depth report covering Security, Availability, Processing Integrity, Confidentiality, and Privacy over time. SOC 3: General-use version of SOC 2 with broad executive-level assurance. ISO/IEC 27001:2013: Certification for a formal Information Security Management System (ISMS), based on risk management controls. CSA STAR Level 2: Includes a third-party attestation combining ISO 27001 or SOC 2 with additional cloud control matrix (CCM) requirements. TISAX: Trusted Information Security Assessment Exchange, covering automotive-sector security standards. In summary, while the adoption of AI tools like GitHub Copilot in software development can raise important questions around security, privacy, and compliance, it’s clear that existing safeguards in place help address these concerns. By understanding the safeguards, configurable controls, and robust compliance certifications offered, organizations and developers alike can feel more confident in embracing GitHub Copilot to accelerate innovation while maintaining trust and peace of mind.GitHub Copilot SDK and Hybrid AI in Practice: Automating README to PPT Transformation
Introduction In today's rapidly evolving AI landscape, developers often face a critical choice: should we use powerful cloud-based Large Language Models (LLMs) that require internet connectivity, or lightweight Small Language Models (SLMs) that run locally but have limited capabilities? The answer isn't either-or—it's hybrid models—combining the strengths of both to create AI solutions that are secure, efficient, and powerful. This article explores hybrid model architectures through the lens of GenGitHubRepoPPT, demonstrating how to elegantly combine Microsoft Foundry Local, GitHub Copilot SDK, and other technologies to automatically generate professional PowerPoint presentations from GitHub README files. 1. Hybrid Model Scenarios and Value 1.1 What Are Hybrid Models? Hybrid AI Models strategically combine locally-running Small Language Models (SLMs) with cloud-based Large Language Models (LLMs) within the same application, selecting the most appropriate model for each task based on its unique characteristics. Core Principles: Local Processing for Sensitive Data: Privacy-critical content analysis happens on-device Cloud for Value Creation: Complex reasoning and creative generation leverage cloud power Balancing Cost and Performance: High-frequency, simple tasks run locally to minimize API costs 1.2 Typical Hybrid Model Use Cases Use Case Local SLM Role Cloud LLM Role Value Proposition Intelligent Document Processing Text extraction, structural analysis Content refinement, format conversion Privacy protection + Professional output Code Development Assistant Syntax checking, code completion Complex refactoring, architecture advice Fast response + Deep insights Customer Service Systems Intent recognition, FAQ handling Complex issue resolution Reduced latency + Enhanced quality Content Creation Platforms Keyword extraction, outline generation Article writing, multilingual translation Cost control + Creative assurance 1.3 Why Choose Hybrid Models? Three Core Advantages: Privacy and Security Sensitive data never leaves local devices Compliant with GDPR, HIPAA, and other regulations Ideal for internal corporate documents and personal information Cost Optimization Reduces cloud API call frequency Local models have zero usage fees Predictable operational costs Performance and Reliability Local processing eliminates network latency Partial functionality in offline environments Cloud models ensure high-quality output 2. Core Technology Analysis 2.1 Large Language Models (LLMs): Cloud Intelligence Representatives What are LLMs? Large Language Models are deep learning-based natural language processing models, typically with billions to trillions of parameters. Through training on massive text datasets, they've acquired powerful language understanding and generation capabilities. Representative Models: Claude Sonnet 4.5: Anthropic's flagship model, excelling at long-context processing and complex reasoning GPT-5.2 Series: OpenAI's general-purpose language models Gemini: Google's multimodal large models LLM Advantages: ✅ Exceptional text generation quality ✅ Powerful contextual understanding ✅ Support for complex reasoning tasks ✅ Continuous model updates and optimization Typical Applications: Professional document writing (technical reports, business plans) Code generation and refactoring Multilingual translation Creative content creation 2.2 Small Language Models (SLMs) and Microsoft Foundry Local 2.2.1 SLM Characteristics Small Language Models typically have 1B-7B parameters, designed specifically for resource-constrained environments. Mainstream SLM Model Families: Microsoft Phi Family (Phi Family): Inference-optimized efficient models Alibaba Qwen Family (Qwen Family): Excellent Chinese language capabilities Mistral Series: Outstanding performance with small parameter counts SLM Advantages: ⚡ Low-latency response (millisecond-level) 💰 Zero API costs 🔒 Fully local, data stays on-device 📱 Suitable for edge device deployment 2.2.2 Microsoft Foundry Local: The Foundation of Local AI Foundry Local is Microsoft's local AI runtime tool, enabling developers to easily run SLMs on Windows or macOS devices. Core Features: OpenAI-Compatible API # Using Foundry Local is like using OpenAI API from openai import OpenAI from foundry_local import FoundryLocalManager manager = FoundryLocalManager("qwen2.5-7b-instruct") client = OpenAI( base_url=manager.endpoint, api_key=manager.api_key ) Hardware Acceleration Support CPU: General computing support GPU: NVIDIA, AMD, Intel graphics acceleration NPU: Qualcomm, Intel AI-specific chips Apple Silicon: Neural Engine optimization Based on ONNX Runtime Cross-platform compatibility Highly optimized inference performance Supports model quantization (INT4, INT8) Convenient Model Management # View available models foundry model list # Run a model foundry model run qwen2.5-7b-instruct-generic-cpu:4 # Check running status foundry service ps Foundry Local Application Value: 🎓 Educational Scenarios: Students can learn AI development without cloud subscriptions 🏢 Enterprise Environments: Process sensitive data while maintaining compliance 🧪 R&D Testing: Rapid prototyping without API cost concerns ✈️ Offline Environments: Works on planes, subways, and other no-network scenarios 2.3 GitHub Copilot SDK: The Express Lane from Agent to Business Value 2.3.1 What is GitHub Copilot SDK? GitHub Copilot SDK, released as a technical preview on January 22, 2026, is a game-changer for AI Agent development. Unlike other AI SDKs, Copilot SDK doesn't just provide API calling interfaces—it delivers a complete, production-grade Agent execution engine. Why is it revolutionary? Traditional AI application development requires you to build: ❌ Context management systems (multi-turn conversation state) ❌ Tool orchestration logic (deciding when to call which tool) ❌ Model routing mechanisms (switching between different LLMs) ❌ MCP server integration ❌ Permission and security boundaries ❌ Error handling and retry mechanisms Copilot SDK provides all of this out-of-the-box, letting you focus on business logic rather than underlying infrastructure. 2.3.2 Core Advantages: The Ultra-Short Path from Concept to Code Production-Grade Agent Engine: Battle-Tested Reliability Copilot SDK uses the same Agent core as GitHub Copilot CLI, which means: ✅ Validated in millions of real-world developer scenarios ✅ Capable of handling complex multi-step task orchestration ✅ Automatic task planning and execution ✅ Built-in error recovery mechanisms Real-World Example: In the GenGitHubRepoPPT project, we don't need to hand-write the "how to convert outline to PPT" logic—we simply tell Copilot SDK the goal, and it automatically: Analyzes outline structure Plans slide layouts Calls file creation tools Applies formatting logic Handles multilingual adaptation # Traditional approach: requires hundreds of lines of code for logic def create_ppt_traditional(outline): slides = parse_outline(outline) for slide in slides: layout = determine_layout(slide) content = format_content(slide) apply_styling(content, layout) # ... more manual logic return ppt_file # Copilot SDK approach: focus on business intent session = await client.create_session({ "model": "claude-sonnet-4.5", "streaming": True, "skill_directories": [skills_dir] }) session.send_and_wait({"prompt": prompt}, timeout=600) Custom Skills: Reusable Encapsulation of Business Knowledge This is one of Copilot SDK's most powerful features. In traditional AI development, you need to provide complete prompts and context with every call. Skills allow you to: Define once, reuse forever: # .copilot_skills/ppt/SKILL.md # PowerPoint Generation Expert Skill ## Expertise You are an expert in business presentation design, skilled at transforming technical content into easy-to-understand visual presentations. ## Workflow 1. **Structure Analysis** - Identify outline hierarchy (titles, subtitles, bullet points) - Determine topic and content density for each slide 2. **Layout Selection** - Title slide: Use large title + subtitle layout - Content slides: Choose single/dual column based on bullet count - Technical details: Use code block or table layouts 3. **Visual Optimization** - Apply professional color scheme (corporate blue + accent colors) - Ensure each slide has a visual focal point - Keep bullets to 5-7 items per page 4. **Multilingual Adaptation** - Choose appropriate fonts based on language (Chinese: Microsoft YaHei, English: Calibri) - Adapt text direction and layout conventions ## Output Requirements Generate .pptx files meeting these standards: - 16:9 widescreen ratio - Consistent visual style - Editable content (not images) - File size < 5MB Business Code Generation Capability This is the core value of this project. Unlike generic LLM APIs, Copilot SDK with Skills can generate truly executable business code. Comparison Example: Aspect Generic LLM API Copilot SDK + Skills Task Description Requires detailed prompt engineering Concise business intent suffices Output Quality May need multiple adjustments Professional-grade on first try Code Execution Usually example code Directly generates runnable programs Error Handling Manual implementation required Agent automatically handles and retries Multi-step Tasks Manual orchestration needed Automatic planning and execution Comparison of manual coding workload: Task Manual Coding Copilot SDK Processing logic code ~500 lines ~10 lines configuration Layout templates ~200 lines Declared in Skill Style definitions ~150 lines Declared in Skill Error handling ~100 lines Automatically handled Total ~950 lines ~10 lines + Skill file Tool Calling & MCP Integration: Connecting to the Real World Copilot SDK doesn't just generate code—it can directly execute operations: 🗃️ File System Operations: Create, read, modify files 🌐 Network Requests: Call external APIs 📊 Data Processing: Use pandas, numpy, and other libraries 🔧 Custom Tools: Integrate your business logic 3. GenGitHubRepoPPT Case Study 3.1 Project Overview GenGitHubRepoPPT is an innovative hybrid AI solution that combines local AI models with cloud-based AI agents to automatically generate professional PowerPoint presentations from GitHub repository README files in under 5 minutes. Technical Architecture: 3.2 Why Adopt a Hybrid Model? Stage 1: Local SLM Processes Sensitive Data Task: Analyze GitHub README, extract key information, generate structured outline Reasons for choosing Qwen-2.5-7B + Foundry Local: Privacy Protection README may contain internal project information Local processing ensures data doesn't leave the device Complies with data compliance requirements Cost Effectiveness Each analysis processes thousands of tokens Cloud API costs are significant in high-frequency scenarios Local models have zero additional fees Performance Qwen-2.5-7B excels at text analysis tasks Outstanding Chinese support Acceptable CPU inference latency (typically 2-3 seconds) Stage 2: Cloud LLM + Copilot SDK Creates Business Value Task: Create well-formatted PowerPoint files based on outline Reasons for choosing Claude Sonnet 4.5 + Copilot SDK: Automated Business Code Generation Traditional approach pain points: Need to hand-write 500+ lines of code for PPT layout logic Require deep knowledge of python-pptx library APIs Style and formatting code is error-prone Multilingual support requires additional conditional logic Copilot SDK solution: Declare business rules and best practices through Skills Agent automatically generates and executes required code Zero-code implementation of complex layout logic Development time reduced from 2-3 days to 2-3 hours Ultra-Short Path from Intent to Execution Comparison: Different ways to implement "Generate professional PPT" 3. Production-Grade Reliability and Quality Assurance Battle-tested Agent engine: Uses the same core as GitHub Copilot CLI Validated in millions of real-world scenarios Automatically handles edge cases and errors Consistent output quality: Professional standards ensured through Skills Automatic validation of generated files Built-in retry and error recovery mechanisms 4. Rapid Iteration and Optimization Capability Scenario: Client requests PPT style adjustment The GitHub Repo https://github.com/kinfey/GenGitHubRepoPPT 4. Summary 4.1 Core Value of Hybrid Models + Copilot SDK The GenGitHubRepoPPT project demonstrates how combining hybrid models with Copilot SDK creates a new paradigm for AI application development. Privacy and Cost Balance The hybrid approach allows sensitive README analysis to happen locally using Qwen-2.5-7B, ensuring data never leaves the device while incurring zero API costs. Meanwhile, the value-creating work—generating professional PowerPoint presentations—leverages Claude Sonnet 4.5 through Copilot SDK, delivering quality that justifies the per-use cost. From Code to Intent Traditional AI development required writing hundreds of lines of code to handle PPT generation logic, layout selection, style application, and error handling. With Copilot SDK and Skills, developers describe what they want in natural language, and the Agent automatically generates and executes the necessary code. What once took 3-5 days now takes 3-4 hours, with 95% less code to maintain. Automated Business Code Generation Copilot SDK doesn't just provide code examples—it generates complete, executable business logic. When you request a multilingual PPT, the Agent understands the requirement, selects appropriate fonts, generates the implementation code, executes it with error handling, validates the output, and returns a ready-to-use file. Developers focus on business intent rather than implementation details. 4.2 Technology Trends The Shift to Intent-Driven Development We're witnessing a fundamental change in how developers work. Rather than mastering every programming language detail and framework API, developers are increasingly defining what they want through declarative Skills. Copilot SDK represents this future: you describe capabilities in natural language, and AI Agents handle the code generation and execution automatically. Edge AI and Cloud AI Integration The evolution from pure cloud LLMs (powerful but privacy-concerning) to pure local SLMs (private but limited) has led to today's hybrid architectures. GenGitHubRepoPPT exemplifies this trend: local models handle data analysis and structuring, while cloud models tackle complex reasoning and professional output generation. This combination delivers fast, secure, and professional results. Democratization of Agent Development Copilot SDK dramatically lowers the barrier to building AI applications. Senior engineers see 10-20x productivity gains. Mid-level engineers can now build sophisticated agents that were previously beyond their reach. Even junior engineers and business experts can participate by writing Skills that capture domain knowledge without deep technical expertise. The future isn't about whether we can build AI applications—it's about how quickly we can turn ideas into reality. References Projects and Code GenGitHubRepoPPT GitHub Repository - Case study project Microsoft Foundry Local - Local AI runtime GitHub Copilot SDK - Agent development SDK Copilot SDK Getting Started Tutorial - Official quick start Deep Dive: Copilot SDK Build an Agent into Any App with GitHub Copilot SDK - Official announcement GitHub Copilot SDK Cookbook - Practical examples Copilot CLI Official Documentation - CLI tool documentation Learning Resources Edge AI for Beginners - Edge AI introductory course Azure AI Foundry Documentation - Azure AI documentation GitHub Copilot Extensions Guide - Extension development guide1.6KViews3likes2CommentsAgents League: Meet the Winners
Agents League brought together developers from around the world to build AI agents using Microsoft's developer tools. With 100+ submissions across three tracks, choosing winners was genuinely difficult. Today, we're proud to announce the category champions. 🎨 Creative Apps Winner: CodeSonify View project CodeSonify turns source code into music. As a genuinely thoughtful system, its functions become ascending melodies, loops create rhythmic patterns, conditionals trigger chord changes, and bugs produce dissonant sounds. It supports 7 programming languages and 5 musical styles, with each language mapped to its own key signature and code complexity directly driving the tempo. What makes CodeSonify stand out is the depth of execution. CodeSonify team delivered three integrated experiences: a web app with real-time visualization and one-click MIDI export, an MCP server exposing 5 tools inside GitHub Copilot in VS Code Agent Mode, and a diff sonification engine that lets you hear a code review. A clean refactor sounds harmonious. A messy one sounds chaotic. The team even built the MIDI generator from scratch in pure TypeScript with zero external dependencies. Built entirely with GitHub Copilot assistance, this is one of those projects that makes you think about code differently. 🧠 Reasoning Agents Winner: CertPrep Multi-Agent System View project CertPrep Multi-Agent System team built a production-grade 8-agent system for personalized Microsoft certification exam preparation, supporting 9 exam families including AI-102, AZ-204, AZ-305, and more. Each agent has a distinct responsibility: profiling the learner, generating a week-by-week study schedule, curating learning paths, tracking readiness, running mock assessments, and issuing a GO / CONDITIONAL GO / NOT YET booking recommendation. The engineering behind the scene here is impressive. A 3-tier LLM fallback chain ensures the system runs reliably even without Azure credentials, with the full pipeline completing in under 1 second in mock mode. A 17-rule guardrail pipeline validates every agent boundary. Study time allocation uses the Largest Remainder algorithm to guarantee no domain is silently zeroed out. 342 automated tests back it all up. This is what thoughtful multi-agent architecture looks like in practice. 💼 Enterprise Agents Winner: Whatever AI Assistant (WAIA) View project WAIA is a production-ready multi-agent system for Microsoft 365 Copilot Chat and Microsoft Teams. A workflow agent routes queries to specialized HR, IT, or Fallback agents, transparently to the user, handling both RAG-pattern Q&A and action automation — including IT ticket submission via a SharePoint list. Technically, it's a showcase of what serious enterprise agent development looks like: a custom MCP server secured with OAuth Identity Passthrough, streaming responses via the OpenAI Responses API, Adaptive Cards for human-in-the-loop approval flows, a debug mode accessible directly from Teams or Copilot, and full OpenTelemetry integration visible in the Foundry portal. Franck also shipped end-to-end automated Bicep deployment so the solution can land in any Azure environment. It's polished, thoroughly documented, and built to be replicated. Thank you To every developer who submitted and shipped projects during Agents League: thank you 💜 Your creativity and innovation brought Agents League to life! 👉 Browse all submissions on GitHubFrom Prototype to Production: Building a Hosted Agent with AI Toolkit & Microsoft Foundry
From Prototype to Production: Building a Hosted Agent with AI Toolkit & Microsoft Foundry Agentic AI is no longer a future concept — it’s quickly becoming the backbone of intelligent, action-oriented applications. But while it’s easy to prototype an AI agent, taking it all the way to production requires much more than a clever prompt. In this blog post - and the accompanying video tutorial - we walk through the end-to-end journey of an AI engineer building, testing, and operationalizing a hosted AI agent using AI Toolkit in Visual Studio Code and Microsoft Foundry. The goal is to show not just how to build an agent, but how to do it in a way that’s scalable, testable, and production ready. The scenario: a retail agent for sales and inventory insights To make things concrete, the demo uses a fictional DIY and home‑improvement retailer called Zava. The objective is to build an AI agent that can assist the internal team in: Analyzing sales data (e.g. reason over a product catalog, identify top‑selling categories, etc.) Managing inventory (e.g. Detect products running low on stock, trigger restock actions, etc.) Chapter 1 (min 00:00 – 01:20): Model selection with GitHub Copilot and AI Toolkit The journey starts in Visual Studio Code, using GitHub Copilot together with the AI Toolkit. Instead of picking a model arbitrarily, we: Describe the business scenario in natural language Ask Copilot to perform a comparative analysis between two candidate models Define explicit evaluation criteria (reasoning quality, tool support, suitability for analytics) Copilot leverages AI Toolkit skills to explain why one model is a better fit than the other — turning model selection into a transparent, repeatable decision. To go deeper, we explore the AI Toolkit Model Catalog, which lets you: Browse hundreds of models Filter by hosting platform (GitHub, Microsoft Foundry, local) Filter by publisher (open‑source and proprietary) Once the right model is identified, we deploy it to Microsoft Foundry with a single click and validate it with test prompts. Chapter 2 (min 01:20 – 02:48): Rapid agent prototyping with Agent Builder UI With the model ready, it’s time to build the agent. Using the Agent Builder UI, we configure: The agent’s identity (name, role, responsibilities) Instructions that define tone, behavior, and scope The model the agent runs on The tools and data sources it can access For this scenario, we add: File search, grounded on uploaded sales logs and a product catalog Code interpreter, enabling the agent to compute metrics, generate charts, and write reports We can then test the agent in the right-side playground by asking business questions like: “What were the top three selling categories in 2025?” The response is not generic — it’s grounded in the retailer’s data, and you can inspect which tools and data were used to produce the answer. The Agent Builder also provides local evaluation and tracing functionalities. Chapter 3 (min 02:48 – 04:04): From UI prototype to hosted agent code UI-based prototyping is powerful, but real solutions often require custom logic. This is where we transition from prototype to production by using a built-in workflow to migrate from UI to a hosted agent template The result is a production-ready scaffold that includes: Agent code (built with Microsoft Agent Framework; you can choose between Python or C#) A YAML-based agent definition Container configuration files From here, we extend the agent with custom functions — for example, to create and manage restock orders. GitHub Copilot helps accelerate this step by adapting the template to the Zava business scenario. Chapter 4 (min 04:04 – 05:12): Local debugging and cloud deployment Before deploying, we test the agent locally: Ask it to identify products running out of stock Trigger a restock action using the custom function Debug the full tool‑calling flow end to end Once validated, we deploy the agent to Microsoft Foundry. By deploying the agent to the Cloud, we don’t just get compute power, but a whole set of built-in features to operationalize our solution and maintain it in production. Chapter 5 (min 05:12 – 08:04): Evaluation, safety, and monitoring in Foundry Production readiness doesn’t stop at deployment. In the Foundry portal, we explore: Evaluation runs, using both real and synthetic datasets LLM‑based judges that score responses across multiple metrics, with explanations Red teaming, where an adversarial agent probes for unsafe or undesired behavior Monitoring dashboards, tracking usage, latency, regressions, and cost across the agent fleet These capabilities make it possible to move from ad‑hoc testing to continuous quality and safety assessment. Why this workflow matters This end-to-end flow demonstrates a key idea: Agentic AI isn’t just about building agents — it’s about operating them responsibly at scale. By combining AI Toolkit in VS Code with Microsoft Foundry, you get: A smooth developer experience Clear separation between experimentation and production Built‑in evaluation, safety, and observability Resources Demo Sample: GitHub Repo Foundry tutorials: Inside Microsoft Foundry - YouTubePhi-4-Reasoning-Vision-15B: Use Cases In-Depth
Phi-4-Reasoning-vision-15B is Microsoft's latest vision reasoning model released on Microsoft Foundry. It combines high-resolution visual perception with selective, task-aware reasoning, making it the first model in the Phi-4 family to simultaneously achieve both "seeing clearly" and "thinking deeply" as a small language model (SLM). Traditional vision models only perform passive perception — recognizing "what's in" an image. Phi-4-Reasoning-Vision-15B goes further by performing structured, multi-step reasoning: understanding visual structure in images, connecting it with textual context, and reaching actionable conclusions. This enables developers to build intelligent applications ranging from chart analysis to GUI automation. Core Design Features 2.1 Selective Reasoning The model's most critical design feature is its hybrid reasoning behavior. It can switch between "reasoning mode" and "non-reasoning mode" based on the prompt: When deep reasoning is needed (e.g., math problems, logical analysis) → Multi-step reasoning chain is activated When fast perception is sufficient (e.g., OCR, element localization) → Direct output with reduced latency 2.2 Three Thinking Modes (from Notebook Examples) Developers can precisely control reasoning behavior via the thinking_mode parameter: Mode Trigger Description Best For hybrid (Mixed) Default Model autonomously decides whether deep reasoning is needed General use, balancing speed and accuracy think (Deep Thinking) Appends <think> token Forces full reasoning chain Complex math / science / logic problems nothink (Fast Response) Appends <nothink> token Skips reasoning chain, outputs directly Low-latency perception tasks, simple Q&A The corresponding code implementation: def run_inference(processor, model, prompt, image, thinking_mode="hybrid"): ## FORM MESSAGE AND LOAD IMAGE messages = [ { "role": "user", "content": prompt, } ] ## PROCESS INPUTS prompt = processor.tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True, return_dict=False, ) if thinking_mode == "think": prompt = str(prompt) + "<think>" elif thinking_mode == "nothink": prompt = str(prompt) + "<|dummy_84|>" print(f"Prompt: {prompt}") inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device) ## GENERATE RESPONSE output_ids = model.generate( **inputs, max_new_tokens=1024, temperature=None, top_p=None, do_sample=False, use_cache=False, ) ## DECODE RESPONSE sequence_length = inputs["input_ids"].shape[1] sequence_length -= 1 if thinking_mode == "think" else 0 # remove the extra token for nothink mode new_output_ids = output_ids[:, sequence_length:] model_output = processor.batch_decode( new_output_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False )[0] return model_output This design allows developers to dynamically balance latency and accuracy at runtime — essential for real-time interactive applications. Key Use Cases Use Case 1: GUI Agents (Computer Use Agents) This is one of the model's most important application areas.The model receives a screenshot and a natural language instruction, then outputs the normalized bounding box coordinates for the target UI element. The Notebook also provides a plot_boxes() visualization function that compares model predictions (red box) against ground truth annotations (green box). Real-World Example — E-Commerce Shopping Agent: As described in the official documentation, in retail scenarios the model serves as the perception layer for computer-use agents: Screen comprehension: Identifies products, prices, filters, promotions, buttons, and cart states Grounded output: Produces actionable coordinates for upstream agent models (e.g., Fara-7B) to execute clicks, scrolls, and other interactions Real-time decision support: Compact model size and low-latency inference, suitable for navigating dense product listings and comparing options Use Case 2: Mathematical and Scientific Visual Reasoning Typical applications: Interpreting geometric figures and function graphs for problem-solving Analyzing scientific experiment diagrams and data charts Education: Students photograph and upload problems; the model shows the complete reasoning process and solution steps Use Case 3: Document, Chart, and Table Understanding Typical applications: IT Operations: Interpreting monitoring dashboards, performance charts, and incident reports to assist diagnosis and decision-making Financial Analysis: Extracting metrics from report screenshots and interpreting trends Enterprise Report Automation: Processing scanned documents and tables to generate structured summaries Samples 1. Using Phi-4-Reasoning-Vision-15B to detect jaywalking Go to - Sample Code 2. Using Phi-4-Reasoning-Vision-15B to math Go to - Sample Code 3. Using Phi-4-Reasoning-Vision-15B for GUI Agent Go to - Sample Code Model Comparison at a Glance Below is a comparison of Phi-4-Reasoning-Vision-15B against comparable models on key tasks: No Thinking Mode Thinking Mode Phi-4-Reasoning-Vision-15B shows clear advantages in math reasoning and GUI grounding tasks while remaining competitive in general multimodal understanding. Summary Phi-4-Reasoning-Vision-15B represents a significant milestone for small vision reasoning models: Sees clearly: High-resolution visual perception supporting documents, charts, UI screenshots, and more Thinks deeply: Selective multi-step reasoning chains that rival larger models on complex tasks Runs fast: 15B parameters + NoThink mode, suitable for real-time interactive applications Adapts flexibly: Three thinking modes switchable on the fly, letting developers dynamically balance accuracy and latency at runtime Whether building e-commerce shopping agents, IT operations assistants, or educational tutoring tools, this model provides a complete capability chain from "seeing" to "understanding" to "acting." Resources 1. Read official Blog - Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model 2. Learn more about Phi-4-reasoning-vision in Huggingface - https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B 3. Learn more about Microsoft Phi Family - Microsoft Phi CookBook679Views0likes0CommentsGiving Your AI Agents Reliable Skills with the Agent Skills SDK
AI agents are becoming increasingly capable, but they often do not have the context they need to do real work reliably. Your agent can reason well, but it does not actually know how to do the specific things your team needs it to do. For example, it cannot follow your company's incident response playbook, it does not know your escalation policy, and it has no idea how to page the on-call engineer at 3 AM. There are many ways to close this gap, from RAG to custom tool implementations. Agent Skills is one approach that stands out because it is designed around portability and progressive disclosure, keeping context window usage minimal while giving agents access to deep expertise on demand. What is Agent Skills? Agent Skills is an open format for giving agents new capabilities and expertise. The format was originally developed by Anthropic and released as an open standard. It is now supported by a growing list of agent products including Claude Code, VS Code, GitHub, OpenAI Codex, Cursor, Gemini CLI, and many others. As defined in the spec, a skill is a folder on disk containing a SKILL.md file with metadata and instructions, plus optional scripts, references, and assets: incident-response/ SKILL.md # Required: instructions + metadata references/ # Optional: additional documentation severity-levels.md escalation-policy.md scripts/ # Optional: executable code page-oncall.sh assets/ # Optional: templates, diagrams, data files The SKILL.md file has YAML frontmatter with a name and description (so agents know when the skill is relevant), followed by markdown instructions that tell the agent how to perform the task. The format is intentionally simple: self-documenting, extensible, and portable. What makes this design practical is progressive disclosure. The spec is built around the idea that agents should not load everything at once. It works in three stages: Discovery: At startup, agents load only the name and description of each available skill, just enough to know when it might be relevant. Activation: When a task matches a skill's description, the agent reads the full SKILL.md instructions into context. Execution: The agent follows the instructions, optionally loading referenced files or executing bundled scripts as needed. This keeps agents fast while giving them access to deep context on demand. The format is well-designed and widely adopted, but if you want to use skills from your own agents, there is a gap between the spec and a working implementation. The Agent Skills SDK Conceptually, a skill is more than a folder. It is a unit of expertise: a name, a description, a body of instructions, and a set of supporting resources. The file layout is one way to represent that, but there is nothing about the concept that requires a filesystem. The Agent Skills SDK is an open-source Python library built around that idea, treating skills as abstract units of expertise that can be stored anywhere and consumed by any agent framework. It does this by addressing two challenges that come up when you try to use the format from your own agents. The first is where skills live. The spec defines skills as folders on disk, and the tools that support the format today all assume skills are local files. Files are inherently portable, and that is one of the format's strengths. But in the real world, not every team can or wants to serve skills from the filesystem. Maybe your team keeps them in an S3 bucket. Maybe they are in Azure Blob Storage behind your CDN. Maybe they live in a database alongside the rest of your application data. At the moment, if your skills are not on the local filesystem, you are on your own. The SDK changes where skills are served from, not how they are authored. The content and format stay the same regardless of the storage backend, so skills remain portable across providers. The second is how agents consume them. The spec defines the progressive disclosure pattern but actually implementing it in your agent requires real work. You need to figure out how to validate skills against the spec, generate a catalog for the system prompt, expose the right tools for on-demand content retrieval, and handle the back-and-forth of the agent requesting metadata, then the body, then individual references or scripts. That is a lot of plumbing regardless of where the skills are stored, and the work multiplies if you want to support more than one agent framework. The SDK solves both by separating where skills come from (providers) from how agents use them (integrations), so you can mix and match freely. Load skills from the filesystem today, move them to an HTTP server tomorrow, swap in a custom database provider next month, and your agent code does not change at all. How the SDK works The SDK is a set of Python packages organized around two ideas: storage-agnostic providers and progressive disclosure. The provider abstraction means your skills can live anywhere. The SDK ships with providers for the local filesystem and static HTTP servers, but the SkillProvider interface is simple enough that you can write your own in a few methods. A Cosmos DB provider, a Git provider, a SharePoint provider, whatever makes sense for your team. The rest of the SDK does not care where the data comes from. On top of that, the SDK implements the progressive disclosure pattern from the spec as a set of tools that any LLM agent can use. At startup, the SDK generates a skills catalog containing each skill's name and description. Your agent injects this catalog into its system prompt so it knows what is available. Then, during a conversation, the agent calls tools to retrieve content on demand, following the same discovery-activation-execution flow the spec describes. Here is the flow in practice: You register skills from any source (local files, an HTTP server, your own database). The SDK generates a catalog and tool usage instructions, which you inject into the system prompt. The agent calls tools to retrieve content on demand. This matters because context windows are finite. An incident response skill might have a main body, three reference documents, two scripts, and a flowchart. The agent should not load all of that upfront. It should read the body first, then pull the escalation policy only when the conversation actually gets to escalation. A quick example Here is what it looks like in practice. Start by loading a skill from the filesystem: from pathlib import Path from agentskills_core import SkillRegistry from agentskills_fs import LocalFileSystemSkillProvider provider = LocalFileSystemSkillProvider(Path("my-skills")) registry = SkillRegistry() await registry.register("incident-response", provider) Now wire it into a LangChain agent: from langchain.agents import create_agent from agentskills_langchain import get_tools, get_tools_usage_instructions tools = get_tools(registry) skills_catalog = await registry.get_skills_catalog(format="xml") tool_usage_instructions = get_tools_usage_instructions() system_prompt = ( "You are an SRE assistant. Use the available skill tools to look up " "incident response procedures, severity definitions, and escalation " "policies. Always cite which reference document you used.\n\n" f"{skills_catalog}\n\n" f"{tool_usage_instructions}" ) agent = create_agent( llm, tools, system_prompt=system_prompt, ) That is it. The agent now knows what skills are available and has tools to fetch their content. When a user asks "How do I handle a SEV1 incident?", the agent will call get_skill_body to read the instructions, then get_skill_reference to pull the severity levels document, all without you writing any of that retrieval logic. The same pattern works with Microsoft Agent Framework: from agentskills_agentframework import get_tools, get_tools_usage_instructions tools = get_tools(registry) skills_catalog = await registry.get_skills_catalog(format="xml") tool_usage_instructions = get_tools_usage_instructions() system_prompt = ( "You are an SRE assistant. Use the available skill tools to look up " "incident response procedures, severity definitions, and escalation " "policies. Always cite which reference document you used.\n\n" f"{skills_catalog}\n\n" f"{tool_usage_instructions}" ) agent = Agent( client=client, instructions=system_prompt, tools=tools, ) What is in the SDK The SDK is split into small, composable packages so you only install what you need: agentskills-core handles registration, validation, the skills catalog, and the progressive disclosure API. It also defines the SkillProvider interface that all providers implement. agentskills-fs and agentskills-http are the two built-in providers. The filesystem provider loads skills from local directories. The HTTP provider loads them from any static file host: S3, Azure Blob Storage, GitHub Pages, a CDN, or anything that serves files over HTTP. agentskills-langchain and agentskills-agentframework generate framework-native tools and tool usage instructions from a skill registry. agentskills-mcp-server spins up an MCP server that exposes skill tool access and usage as tools and resources, so any MCP-compatible client can use them. Because providers and integrations are separate packages, you can combine them however you want. Use the filesystem provider during development, switch to the HTTP provider in production, or write a custom provider that reads skills from your own database. The integration layer does not need to know or care. Where to go from here The full source, working examples, and detailed API docs are on GitHub: github.com/pratikxpanda/agentskills-sdk The repo includes end-to-end examples for both LangChain and Microsoft Agent Framework, covering filesystem providers, HTTP providers, and MCP. There is also a sample incident-response skill you can use to try things out. A proposal to contribute this SDK to the official agentskills repository has been submitted. If you find it useful, feel free to show your support on the GitHub issue. To learn more about the Agent Skills format itself: What are skills? covers the format and why it matters. Specification is the complete format reference for SKILL.md files. Integrate skills explains how to add skills support to your agent. Example skills on GitHub are a good starting point for writing your own. The SDK is MIT licensed and contributions are welcome. If you have questions or ideas, post a question here or open an issue on the repo.Building a Dual Sidecar Pod: Combining GitHub Copilot SDK with Skill Server on Kubernetes
Why the Sidecar Pattern? In Kubernetes, a Pod is the smallest deployable unit — a single Pod can contain multiple containers that share the same network namespace and storage volumes. The Sidecar pattern places auxiliary containers alongside the main application container within the same Pod. These Sidecar containers extend or enhance the main container's functionality without modifying it. 💡 Beginner Tip: If you're new to Kubernetes, think of a Pod as a shared office — everyone in the room (containers) has their own desk (process), but they share the same network (IP address), the same file cabinet (storage volumes), and can communicate without leaving the room (localhost communication). The Sidecar pattern is not a new concept. As early as 2015, the official Kubernetes blog described this pattern in a post about Composite Containers. Service mesh projects like Envoy, Istio, and Linkerd extensively use Sidecar containers for traffic management, observability, and security policies. In the AI application space, we are now exploring how to apply this proven pattern to new scenarios. Why does this matter? There are three fundamental reasons: 1. Separation of Concerns Each container in a Pod has a single, well-defined responsibility. The main application container doesn't need to know how AI content is generated or how skills are managed — it only serves the results. This separation allows each component to be independently tested, debugged, and replaced, aligning with the Unix philosophy of "do one thing well." In practice, this means: the frontend team can iterate on Nginx configuration without affecting AI logic; AI engineers can upgrade the Copilot SDK version without touching skill management code; and operations staff can adjust skill configurations without notifying the development team. 2. Shared Localhost Network All containers in a Pod share the same network namespace, with the same 127.0.0.1. This means communication between Sidecars is just a simple localhost HTTP call — no service discovery, no DNS resolution, no cross-node network hops. From a performance perspective, localhost communication traverses the kernel's loopback interface, with latency typically in the microsecond range. In contrast, cross-Pod ClusterIP Service calls require routing through kube-proxy's iptables/IPVS rules, with latency typically in the millisecond range. For AI agent scenarios that require frequent interaction, this difference is meaningful. From a security perspective, localhost communication doesn't traverse any network interface, making it inherently immune to eavesdropping by other Pods in the cluster. Unless a Service is explicitly configured, Sidecar ports are not exposed outside the Pod. 3. Efficient Data Transfer via Shared Volumes Kubernetes emptyDir volumes allow containers within the same Pod to share files on disk. Once a Sidecar writes a file, the main container can immediately read and serve it — no message queues, no additional API calls, no databases. This is ideal for workflows where one container produces artifacts (such as generated blog posts) and another consumes them. ⚠️ Technical Precision Note: "Efficient" here means eliminating the overhead of network serialization/deserialization and message middleware. However, emptyDir fundamentally relies on standard file system I/O (disk read/write or tmpfs) and is not equivalent to OS-level "Zero-Copy" (such as the sendfile() system call or DMA direct memory access). For blog content generation — a file-level data transfer use case — filesystem sharing is already highly efficient and sufficiently simple. In the gh-cli-blog-agent project, we take this pattern to its fullest extent by using two Sidecars within a single Pod: A Note on Kubernetes Native Sidecar Containers It is worth noting that Kubernetes 1.28 (August 2023) introduced native Sidecar container support via KEP-753, which reached GA (General Availability) in Kubernetes 1.33 (April 2025). Native Sidecars are implemented by setting restartPolicy: Always on initContainers, providing capabilities that the traditional approach lacks: Deterministic startup order: init containers start in declaration order; main containers only start after Sidecar containers are ready Non-blocking Pod termination: Sidecars are automatically cleaned up after main containers exit, preventing Jobs/CronJobs from being stuck Probe support: Sidecars can be configured with startup, readiness, and liveness probes to signal their operational state This project currently uses the traditional approach of deploying Sidecars as regular containers, with application-level health check polling (wait_for_skill_server) to handle startup dependencies. This approach is compatible with all Kubernetes versions (1.24+), making it suitable for scenarios requiring broad compatibility. If your cluster version is ≥ 1.29 (or ≥ 1.33 for GA stability), we strongly recommend migrating to native Sidecars for platform-level startup order guarantees and more graceful lifecycle management. Migration example: # Native Sidecar syntax (Kubernetes 1.29+) initContainers: - name: skill-server image: blog-agent-skill restartPolicy: Always # Key: marks this as a Sidecar ports: - containerPort: 8002 startupProbe: # Platform-level startup readiness signal httpGet: path: /health port: 8002 periodSeconds: 2 failureThreshold: 30 - name: copilot-agent image: blog-agent-copilot restartPolicy: Always ports: - containerPort: 8001 containers: - name: blog-app # Main container starts last; Sidecars are ready image: blog-agent-main ports: - containerPort: 80 Architecture Overview The deployment defines three containers and three volumes: Container Image Port Role blog-app blog-agent-main 80 Nginx — serves Web UI and reverse proxies to Sidecars copilot-agent blog-agent-copilot 8001 FastAPI — AI blog generation powered by GitHub Copilot SDK skill-server blog-agent-skill 8002 FastAPI — skill file management and synchronization Volume Type Purpose blog-data emptyDir Copilot agent writes generated blogs; Nginx serves them skills-shared emptyDir Skill server writes skill files; Copilot agent reads them skills-source ConfigMap Kubernetes-managed skill definition files (read-only) 💡 Design Insight: The three-volume design embodies the "least privilege" principle — blog-data is shared only between the Copilot agent (write) and Nginx (read); skills-shared is shared only between the skill server (write) and the Copilot agent (read). skills-source provides read-only skill definition sources via ConfigMap, forming a unidirectional data flow: ConfigMap → skill-server → shared volume → copilot-agent. The Kubernetes deployment YAML clearly describes this structure: volumes: - name: blog-data emptyDir: sizeLimit: 256Mi # Production best practice: always set sizeLimit to prevent disk exhaustion - name: skills-shared emptyDir: sizeLimit: 64Mi # Skill files are typically small - name: skills-source configMap: name: blog-agent-skill ⚠️ Production Recommendation: The original configuration used emptyDir: {} without a sizeLimit. In production, an unrestricted emptyDir can grow indefinitely until it exhausts the node's disk space, triggering a node-level DiskPressure condition and causing other Pods to be evicted. Always setting a reasonable sizeLimit for emptyDir is part of the Kubernetes security baseline. Community tools like Kyverno can enforce this practice at the cluster level. Nginx reverse proxies route requests to Sidecars via localhost: # Reverse proxy to copilot-agent sidecar (localhost:8001 within the same Pod) location /agent/ { proxy_pass http://127.0.0.1:8001/; proxy_set_header Host $host; proxy_set_header X-Request-ID $request_id; # Enables cross-container request tracing proxy_read_timeout 600s; # AI generation may take a while } # Reverse proxy to skill-server sidecar (localhost:8002 within the same Pod) location /skill/ { proxy_pass http://127.0.0.1:8002/; proxy_set_header Host $host; } Since all three containers share the same network namespace, 127.0.0.1:8001 and 127.0.0.1:8002 are directly accessible — no ClusterIP Service is needed for intra-Pod communication. This is a core feature of the Kubernetes Pod networking model: all containers within the same Pod share a single network namespace, including IP address and port space. Advantage 1: GitHub Copilot SDK as a Sidecar Encapsulating the GitHub Copilot SDK as a Sidecar, rather than embedding it in the main application, provides several architectural advantages. Understanding the GitHub Copilot SDK Architecture Before diving deeper, let's understand how the GitHub Copilot SDK works. The SDK entered technical preview in January 2026, exposing the production-grade agent runtime behind GitHub Copilot CLI as a programmable SDK supporting Python, TypeScript, Go, and .NET. The SDK's communication architecture is as follows: The SDK client communicates with a locally running Copilot CLI process via the JSON-RPC protocol. The CLI handles model routing, authentication management, MCP server integration, and other low-level details. This means you don't need to build your own planner, tool loop, and runtime — these are all provided by an engine that has been battle-tested in production at GitHub's scale. The benefit of encapsulating this SDK in a Sidecar container is: containerization isolates the CLI process's dependencies and runtime environment, preventing dependency conflicts with the main application or other components. Cross-Platform Node.js Installation in the Container A notable implementation detail is how Node.js (required by the Copilot CLI) is installed inside the container. Rather than relying on third-party APT repositories like NodeSource — which can introduce DNS resolution failures and GPG key management issues in restricted network environments — the Dockerfile downloads the official Node.js binary directly from nodejs.org with automatic architecture detection: # Install Node.js 20+ (official binary, no NodeSource APT repo needed) ARG NODE_VERSION=20.20.0 RUN DPKG_ARCH=$(dpkg --print-architecture) \ && case "${DPKG_ARCH}" in amd64) ARCH=x64;; arm64) ARCH=arm64;; armhf) ARCH=armv7l;; *) ARCH=${DPKG_ARCH};; esac \ && curl -fsSL "https://nodejs.org/dist/v${NODE_VERSION}/node-v${NODE_VERSION}-linux-${ARCH}.tar.xz" -o node.tar.xz \ && tar -xJf node.tar.xz -C /usr/local --strip-components=1 --no-same-owner \ && rm -f node.tar.xz The case statement maps Debian's architecture identifiers (amd64, arm64, armhf) to Node.js's naming convention (x64, arm64, armv7l). This ensures the same Dockerfile works seamlessly on both linux/amd64 (Intel/AMD) and linux/arm64 (Apple Silicon, AWS Graviton) build platforms — an important consideration given the growing adoption of ARM-based infrastructure. Independent Lifecycle and Resource Management The Copilot agent is the most resource-intensive component — it needs to run the Copilot CLI process, manage JSON-RPC communication, and handle streaming responses. By isolating it in its own container, we can assign dedicated CPU and memory limits without affecting the lightweight Nginx container: # copilot-agent: needs more resources for AI inference coordination resources: requests: cpu: 250m memory: 512Mi limits: cpu: "1" memory: 2Gi # blog-app: lightweight Nginx with minimal resource needs resources: requests: cpu: 50m memory: 64Mi limits: cpu: 200m memory: 128Mi This resource isolation delivers two key benefits: Fault isolation: If the Copilot agent crashes due to a timeout or memory spike (OOMKilled), Kubernetes only restarts that container — the Nginx frontend continues running and serving previously generated content. Users see "generation feature temporarily unavailable" rather than "entire site is down." Fine-grained resource scheduling: The Kubernetes scheduler selects nodes based on the sum of Pod-level resource requests. Distributing resource requests across containers allows kubelet to more precisely track each component's actual resource consumption, helping HPA (Horizontal Pod Autoscaler) make better scaling decisions. Graceful Startup Coordination In a multi-Sidecar Pod, regular containers start concurrently (note: this is precisely one of the issues that native Sidecars, discussed earlier, can solve). The Copilot agent handles this through application-level startup dependency checks — it waits for the skill server to become healthy before initializing the CopilotClient: async def wait_for_skill_server(url: str, retries: int = 30, delay: float = 2.0): """Wait for the skill-server sidecar to become healthy. In traditional Sidecar deployments (regular containers), containers start concurrently with no guaranteed startup order. This function implements application-level readiness waiting. If using Kubernetes native Sidecars (initContainers + restartPolicy: Always), the platform guarantees Sidecars start before main containers, which can simplify this logic. """ async with httpx.AsyncClient() as client: for i in range(retries): try: resp = await client.get(f"{url}/health", timeout=5.0) if resp.status_code == 200: logger.info(f"Skill server is healthy at {url}") return True except Exception: pass logger.info(f"Waiting for skill server... ({i + 1}/{retries})") await asyncio.sleep(delay) raise RuntimeError(f"Skill server at {url} did not become healthy") This pattern is critical in traditional Sidecar architectures: you cannot assume startup order, so explicit readiness checks are necessary. The wait_for_skill_server function polls http://127.0.0.1:8002/health at 2-second intervals up to 30 times (maximum total wait of 60 seconds) — simple, effective, and resilient. 💡 Comparison: With native Sidecars, the skill-server would be declared as an initContainer with a startupProbe. Kubernetes would ensure the skill-server is ready before starting the copilot-agent. In that case, wait_for_skill_server could be simplified to a single health check confirmation rather than a retry loop. SDK Configuration via Environment Variables All Copilot SDK configuration is passed through Kubernetes-native primitives, reflecting the 12-Factor App principle of externalized configuration: env: - name: SKILL_SERVER_URL value: "http://127.0.0.1:8002" - name: SKILLS_DIR value: "/skills-shared/blog/SKILL.md" - name: COPILOT_GITHUB_TOKEN valueFrom: secretKeyRef: name: blog-agent-secret key: copilot-github-token Key design decisions explained: COPILOT_GITHUB_TOKEN is stored in a Kubernetes Secret — never baked into images or passed as build arguments. Using the GitHub Copilot SDK requires a valid GitHub Copilot subscription (unless using BYOK mode, i.e., Bring Your Own Key), making secure management of this token critical. SKILLS_DIR points to skill files synchronized to a shared volume by the other Sidecar. This means the Copilot agent container image is completely stateless and can be reused across different skill configurations. SKILL_SERVER_URL uses 127.0.0.1 instead of a service name — since this is intra-Pod communication, DNS resolution is unnecessary. 🔐 Production Security Tip: For stricter security requirements, consider using External Secrets Operator to sync Secrets from AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault, rather than managing them directly in Kubernetes. Native Kubernetes Secrets are only Base64-encoded by default, not encrypted at rest (unless Encryption at Rest is enabled). CopilotClient Sessions and Skill Integration The core of the Copilot Sidecar lies in how it creates sessions with skill directories. When a blog generation request is received, it creates a session with access to skill definitions: session = await copilot_client.create_session({ "model": "claude-sonnet-4-5-20250929", "streaming": True, "skill_directories": [SKILLS_DIR] }) The skill_directories parameter points to files on the shared volume — files placed there by the skill-server sidecar. This is the handoff point: the skill server manages which skills are available, and the Copilot agent consumes them. Neither container needs to know about the other's internal implementation — they are coupled only through the filesystem as an implicit contract. 💡 About Copilot SDK Skills: The GitHub Copilot SDK allows you to define custom Agents, Skills, and Tools. Skills are essentially instruction sets written in Markdown format (typically named SKILL.md) that define the agent's behavior, constraints, and workflows in a specific domain. This is consistent with the .copilot_skills/ directory mechanism in GitHub Copilot CLI. File-Based Output to Shared Volumes Generated blog posts are written to the blog-data shared volume, which is simultaneously mounted in the Nginx container: BLOG_DIR = os.path.join(WORK_DIR, "blog") # ... # Blog saved as blog-YYYY-MM-DD.md # Nginx can serve it immediately from /blog/ without any restart The Nginx configuration auto-indexes this directory: location /blog/ { alias /usr/share/nginx/html/blog/; autoindex on; } The moment the Copilot agent writes a file, it's immediately accessible through the Nginx Web UI. No API calls, no database writes, no cache invalidation — just a shared filesystem. This file-based data transfer has an additional benefit: natural persistence and auditability. Each blog exists as an independent Markdown file with a date-timestamp in its name, making it easy to trace generation history. (Note, however, that emptyDir lifecycle is tied to the Pod — data is lost when the Pod is recreated. For persistence needs, see the "Production Recommendations" section below.) Advantage 2: Skill Server as a Sidecar The skill server is the second Sidecar — a lightweight FastAPI service responsible for managing the skill definitions used by the Copilot agent. Separating skill management into its own container offers clear advantages. Decoupled Skill Lifecycle Skill definitions are stored in a Kubernetes ConfigMap: apiVersion: v1 kind: ConfigMap metadata: name: blog-agent-skill data: SKILL.md: | # Blog Generator Skill Instructions You are a professional technical evangelist... ## Key Requirements 1. Outline generation 2. Mandatory online research (DeepSearch) 3. Technical evangelist perspective ... ConfigMaps can be updated independently of any container image. When you run kubectl apply to update a ConfigMap, Kubernetes synchronizes the change to the volumes mounted in the Pod. ⚠️ Important Detail: ConfigMap volume updates do not take effect immediately. The kubelet detects ConfigMap changes through periodic synchronization, with a default sync period controlled by --sync-frequency (default: 1 minute), plus the ConfigMap cache TTL. The actual propagation delay can be 1–2 minutes. If immediate effect is needed, you must actively call the /sync endpoint to trigger a file synchronization: def sync_skills(): """Copy skill files from ConfigMap source to the shared volume.""" source = Path(SKILLS_SOURCE_DIR) dest = Path(SKILLS_SHARED_DIR) / "blog" dest.mkdir(parents=True, exist_ok=True) synced = 0 for skill_file in source.iterdir(): if skill_file.is_file(): target = dest / skill_file.name shutil.copy2(str(skill_file), str(target)) synced += 1 return synced This design means: updating AI behavior requires no container image rebuilds or redeployments. You simply update the ConfigMap, trigger a sync, and the agent's behavior changes. This is a tremendous operational advantage for iterating on prompts and skills in production. 💡 Advanced Thought: Why not mount the ConfigMap directly to the copilot-agent's SKILLS_DIR path? While technically feasible, introducing the skill-server as an intermediary provides the triple value of validation, API access, and extensibility (see "Why Not Embed Skills in the Copilot Agent" below). Minimal Resource Footprint The skill server does one thing — serve and sync files. Its resource requirements reflect this: resources: requests: cpu: 50m memory: 64Mi limits: cpu: 200m memory: 256Mi Compared to the Copilot agent's 2Gi memory limit, the skill server costs a fraction of the resources. This is the beauty of the Sidecar pattern — you can add lightweight containers for auxiliary functionality without significantly increasing the Pod's total resource consumption. REST API for Skill Introspection The skill server provides a simple REST API that allows external systems or operators to query available skills: .get("/skills") async def list_skills(): """List all available skills.""" source = Path(SKILLS_SOURCE_DIR) skills = [] for f in sorted(source.iterdir()): if f.is_file(): skills.append({ "name": f.stem, "filename": f.name, "size": f.stat().st_size, "url": f"/skill/{f.name}", }) return {"skills": skills, "total": len(skills)} @app.get("/skill/{filename}") async def get_skill(filename: str): """Get skill content by filename.""" file_path = Path(SKILLS_SOURCE_DIR) / filename if not file_path.exists() or not file_path.is_file(): raise HTTPException(status_code=404, detail=f"Skill '{filename}' not found") return {"filename": filename, "content": file_path.read_text(encoding="utf-8")} This API serves multiple purposes: Debugging: Verify which skills are currently loaded without needing to kubectl exec into the container, significantly lowering the troubleshooting barrier. Monitoring: External tools can poll /skills to ensure the expected skill set is deployed. Combined with Prometheus Blackbox Exporter, you can implement configuration drift detection. Extensibility: Future systems can dynamically register or update skills via the API, providing a foundation for A/B testing different prompt strategies. Why Not Embed Skills in the Copilot Agent? Mounting the ConfigMap directly into the Copilot agent container seems simpler. But separating it into a dedicated Sidecar has the following advantages: Validation layer: The skill server can validate skill file format and content before synchronization, preventing invalid skill definitions from causing Copilot SDK runtime errors. API access: Skills become queryable and manageable through a REST interface, supporting operational automation. Independent evolution of logic: If skill management becomes more complex (e.g., dynamic skill registration, version management, prompt A/B testing, role-based skill distribution), the skill server can evolve independently without affecting the Copilot agent. Clear data flow: ConfigMap → skill-server → shared volume → copilot-agent. Each arrow is an explicit, observable step. When something goes wrong, you can pinpoint exactly which stage failed. 💡 Architectural Trade-off: For small-scale deployments or PoC (Proof of Concept) work, directly mounting the ConfigMap to the Copilot agent is a perfectly reasonable choice — fewer components means lower operational overhead. The Sidecar approach's value becomes fully apparent in medium-to-large-scale production environments. Architectural decisions should always align with team size, operational maturity, and business requirements. End-to-End Workflow Here is the complete data flow when a user requests a blog post generation: Every step uses intra-Pod communication — localhost HTTP calls or shared filesystem reads. No external network calls are needed between components. The only external dependency is the Copilot SDK's connection to GitHub authentication services and AI model endpoints via the Copilot CLI. The Kubernetes Service exposes three ports for external access: ports: - name: http # Nginx UI + reverse proxy port: 80 nodePort: 30081 - name: agent-api # Direct access to Copilot Agent port: 8001 nodePort: 30082 - name: skill-api # Direct access to Skill Server port: 8002 nodePort: 30083 ⚠️ Security Warning: In production, it is not recommended to directly expose the agent-api and skill-api ports via NodePort. These two APIs should only be accessible through the Nginx reverse proxy (/agent/ and /skill/ paths), with authentication and rate limiting configured at the Nginx layer. Directly exposing Sidecar ports bypasses the reverse proxy's security controls. Recommended configuration: # Production recommended: only expose the Nginx port ports: - name: http port: 80 targetPort: 80 # Combine with NetworkPolicy to restrict inter-Pod communication Production Recommendations and Architecture Extensions When moving this architecture from a development/demo environment to production, the following areas deserve attention: Cross-Platform Build and Deployment The project's Makefile auto-detects the host architecture to select the appropriate Docker build platform, eliminating the need for manual configuration: ARCH := $(shell uname -m) ifeq ($(ARCH),x86_64) DOCKER_PLATFORM ?= linux/amd64 else ifeq ($(ARCH),aarch64) DOCKER_PLATFORM ?= linux/arm64 else ifeq ($(ARCH),arm64) DOCKER_PLATFORM ?= linux/arm64 else DOCKER_PLATFORM ?= linux/amd64 endif Both macOS and Linux are supported as development environments with dedicated tool installation targets: # macOS (via Homebrew) make install-tools-macos # Linux (downloads official binaries to /usr/local/bin) make install-tools-linux The Linux installation target downloads kubectl and kind binaries directly from upstream release URLs with architecture-aware selection, avoiding dependency on any package manager beyond curl and sudo. This makes the setup portable across different Linux distributions (Ubuntu, Debian, Fedora, etc.). Health Checks and Probe Configuration Configure complete probes for each container to ensure Kubernetes can properly manage container lifecycles: # copilot-agent probe example livenessProbe: httpGet: path: /health port: 8001 initialDelaySeconds: 10 periodSeconds: 30 timeoutSeconds: 5 readinessProbe: httpGet: path: /health port: 8001 periodSeconds: 10 startupProbe: # AI agent startup may be slow httpGet: path: /health port: 8001 periodSeconds: 5 failureThreshold: 30 # Allow up to 150 seconds for startup Data Persistence The emptyDir lifecycle is tied to the Pod. If generated blogs need to survive Pod recreation, consider these approaches: PersistentVolumeClaim (PVC): Replace the blog-data volume with a PVC; data persists independently of Pod lifecycle Object storage upload: After the Copilot agent generates a blog, asynchronously upload to S3/Azure Blob/GCS Git repository push: Automatically commit and push generated Markdown files to a Git repository for versioned management Security Hardening # Set security context for each container securityContext: runAsNonRoot: true runAsUser: 1000 readOnlyRootFilesystem: true # Only write through emptyDir allowPrivilegeEscalation: false capabilities: drop: ["ALL"] Observability Extensions The Sidecar pattern is naturally suited for adding observability components. You can add a third (or fourth) Sidecar to the same Pod for log collection, metrics export, or distributed tracing: Horizontal Scaling Strategy Since containers within a Pod scale together, HPA scaling granularity is at the Pod level. This means: If the Copilot agent is the bottleneck, scaling Pod replicas also scales Nginx and skill-server (minimal waste since they are lightweight) If skill management becomes compute-intensive in the future, consider splitting the skill-server from a Sidecar into an independent Deployment + ClusterIP Service for independent scaling Evolution Path from Sidecar to Microservices The dual Sidecar architecture provides a clear path for future migration to microservices: Each migration step only requires changing the communication method (localhost → Service DNS); business logic remains unchanged. This is the architectural flexibility that good separation of concerns provides. sample code - https://github.com/kinfey/Multi-AI-Agents-Cloud-Native/tree/main/code/GitHubCopilotSideCar Summary The dual Sidecar pattern in this project demonstrates a clean cloud-native AI application architecture: Main container (Nginx) stays lean and focused — it only serves HTML and proxies requests. It knows nothing about AI or skills. Sidecar 1 (Copilot Agent) encapsulates all AI logic. It uses the GitHub Copilot SDK, manages sessions, and generates content. Its only coupling to the rest of the Pod is through environment variables and shared volumes. The container image is built with cross-platform support — Node.js is installed from official binaries with automatic architecture detection, ensuring the same Dockerfile works on both amd64 and arm64 platforms. Sidecar 2 (Skill Server) provides a dedicated management layer for AI skill definitions. It bridges Kubernetes-native configuration (ConfigMap) with the Copilot SDK's runtime needs. This separation gives you independent deployability, isolated failure domains, and — most importantly — the ability to change AI behavior (skills, prompts, models) without rebuilding any container images. The Sidecar pattern is more than an architectural curiosity; it is a practical approach to composing AI services in Kubernetes, allowing each component to evolve at its own pace. With cross-platform build support (macOS and Linux, amd64 and arm64), Kubernetes native Sidecars reaching GA in 1.33, and AI development tools like the GitHub Copilot SDK maturing, we anticipate this "AI agent + Sidecar" combination pattern will see validation and adoption in more production environments. References GitHub Copilot SDK Repository — Official SDK supporting Python/TypeScript/Go/.NET KEP-753: Sidecar Containers — Kubernetes native Sidecar container proposal Kubernetes v1.33 Release: Sidecar Containers GA — Sidecar container GA announcement The Distributed System Toolkit: Patterns for Composite Containers — Classic early Kubernetes article on the Sidecar pattern 12-Factor App: Config — Externalized configuration principles