Agents League: Two Weeks, Three Tracks, One Challenge
We're inviting all developers to join Agents League, running February 16-27. It's a two-week challenge where you'll build AI agents using production-ready tools, learn from live coding sessions, and get feedback directly from Microsoft product teams. We've put together starter kits for each track to help you get up and running quickly; they also include requirements and guidelines. Whether you want to explore what GitHub Copilot can do beyond autocomplete, build reasoning agents on Microsoft Foundry, or create enterprise integrations for Microsoft 365 Copilot, we have a track for you. Important: Register first to be eligible for prizes and your digital badge. Without registration, you won't qualify for awards or receive a badge when you submit. What Is Agents League? It's a 2-week competition that combines learning with building: 📽️ Live coding battles – Watch product teams, MVPs, and community members tackle challenges in real time on Microsoft Reactor 💻 Async challenges – Build at your own pace, on your schedule 💬 Discord community – Connect with other participants, join AMAs, and get help when you need it 🏆 Prizes – $500 per track winner, plus GitHub Copilot Pro subscriptions for top picks The Three Tracks 🎨 Creative Apps – Build with GitHub Copilot (Chat, CLI, or SDK) 🧠 Reasoning Agents – Build with Microsoft Foundry 💼 Enterprise Agents – Build with M365 Agents Toolkit (or Copilot Studio) More details on each track below, or jump straight to the starter kits. The Schedule Agents League starts on February 16th and runs through February 27th. Over those two weeks, we host live battles on Reactor and AMA sessions on Discord. Week 1: Live Battles (Feb 17-19) We're kicking off with live coding battles streamed on Microsoft Reactor. Watch experienced developers compete in real time, explaining their approach and architectural decisions as they go. Tue Feb 17, 9 AM PT – 🎨 Creative Apps battle Wed Feb 18, 9 AM PT – 🧠 Reasoning Agents battle Thu Feb 19, 9 AM PT – 💼 Enterprise Agents battle All sessions are recorded, so you can watch on your own schedule. Week 2: Build + AMAs (Feb 24-26) This is your time to build and ask questions on Discord. The async format means you work when it suits you: evenings, weekends, whatever fits your schedule. We're also hosting AMAs on Discord where you can ask questions directly to Microsoft experts and product teams: Tue Feb 24, 9 AM PT – 🎨 Creative Apps AMA Wed Feb 25, 9 AM PT – 🧠 Reasoning Agents AMA Thu Feb 26, 9 AM PT – 💼 Enterprise Agents AMA Bring your questions, get help when you're stuck, and share what you're building with the community. Pick Your Track We've created a starter kit for each track with setup guides, project ideas, and example scenarios to help you get started quickly. 🎨 Creative Apps Tool: GitHub Copilot (Chat, CLI, or SDK) Build innovative, imaginative applications that showcase the potential of AI-assisted development. All application types are welcome: web apps, CLI tools, games, mobile apps, desktop applications, and more. The starter kit walks you through GitHub Copilot's different modes and provides prompting tips to get the best results. View the Creative Apps starter kit. 🧠 Reasoning Agents Tool: Microsoft Foundry (UI or SDK) and/or Microsoft Agent Framework Build a multi-agent system that leverages advanced reasoning capabilities to solve complex problems. This track focuses on agents that can plan, reason through multi-step problems, and collaborate.
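To give a flavor of the reasoning patterns this track is about, here is a rough JavaScript sketch of a planner-executor loop. It is only an illustration: the callModel helper is a placeholder for whatever chat endpoint you deploy, and none of this code comes from the starter kits.

// Minimal planner-executor sketch (illustration only).
// callModel is a placeholder: wire it to the chat endpoint you deploy
// (Microsoft Foundry, Azure OpenAI, or any OpenAI-compatible API).
async function callModel(systemPrompt, userPrompt) {
  throw new Error("Connect this to your model endpoint");
}

async function runPlannerExecutor(goal) {
  // Ask a "planner" model for an ordered list of steps as JSON.
  const planText = await callModel(
    "You are a planner. Reply with a JSON array of short step descriptions.",
    `Goal: ${goal}`
  );
  const steps = JSON.parse(planText); // assumes the model returned valid JSON

  // Execute each step with an "executor" model, feeding results forward.
  const results = [];
  for (const step of steps) {
    const output = await callModel(
      "You are an executor. Complete the step and report the result.",
      `Step: ${step}\nPrevious results: ${JSON.stringify(results)}`
    );
    results.push({ step, output });
  }
  return results;
}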
The starter kit includes architecture patterns, reasoning strategies (planner-executor, critic/verifier, self-reflection), and integration guides for tools and MCP servers. View the Reasoning Agents starter kit. 💼 Enterprise Agents Tool: M365 Agents Toolkit or Copilot Studio Create intelligent agents that extend Microsoft 365 Copilot to address real-world enterprise scenarios. Your agent must work on Microsoft 365 Copilot Chat. Bonus points for: MCP server integration, OAuth security, Adaptive Cards UI, connected agents (multi-agent architecture). View the Enterprise Agents starter kit. Prizes & Recognition To be eligible for prizes and your digital badge, you must register before submitting your project. Category Winners ($500 each): 🎨 Creative Apps winner 🧠 Reasoning Agents winner 💼 Enterprise Agents winner GitHub Copilot Pro subscriptions: Community Favorite (voted by participants on Discord) Product Team Picks (selected by Microsoft product teams) Everyone who registers and submits a project wins: A digital badge to showcase their participation. Beyond the prizes, every participant gets feedback from the teams who built these tools, a valuable opportunity to learn and improve your approach to AI agent development. How to Get Started Register first – This is required to be eligible for prizes and to receive your digital badge. Without registration, your submission won't qualify for awards or a badge. Pick a track – Choose one track. Explore the starter kits to help you decide. Watch the battles – See how experienced developers approach these challenges. Great for learning even if you're still deciding whether to compete. Build your project – You have until Feb 27. Work on your own schedule. Submit via GitHub – Open an issue using the project submission template. Join us on Discord – Get help, share your progress, and vote for your favorite projects on Discord. Links Register: https://aka.ms/agentsleague/register Starter Kits: https://github.com/microsoft/agentsleague/starter-kits Discord: https://aka.ms/agentsleague/discord Live Battles: https://aka.ms/agentsleague/battles Submit Project: Project submission template
From Zero to 16 Games in 2 Hours: Teaching Prompt Engineering to Students with GitHub Copilot CLI Introduction What happens when you give a room full of 14-year-olds access to AI-powered development tools and challenge them to build games? You might expect chaos, confusion, or at best, a few half-working prototypes. Instead, we witnessed something remarkable: 16 fully functional HTML5 games created in under two hours, all from students with varying programming experience. This wasn't magic, it was the power of GitHub Copilot CLI combined with effective prompt engineering. By teaching students to communicate clearly with AI, we transformed a traditional coding workshop into a rapid prototyping session that exceeded everyone's expectations. The secret weapon? A technique called "one-shot prompting" that enables anyone to generate complete, working applications from a single, well-crafted prompt. In this article, we'll explore how we structured this workshop using CopilotCLI-OneShotPromptGameDev, a methodology designed to teach prompt engineering fundamentals while producing tangible, exciting results. Whether you're an educator planning STEM workshops, a developer exploring AI-assisted coding, or simply curious about how young people can leverage AI tools effectively, this guide provides a practical blueprint you can replicate. What is GitHub Copilot CLI? GitHub Copilot CLI extends the familiar Copilot experience beyond your code editor into the command line. While Copilot in VS Code suggests code completions as you type, Copilot CLI allows you to have conversational interactions with AI directly in your terminal. You describe what you want to accomplish in natural language, and the AI responds with shell commands, explanations, or in our case, complete code files. This terminal-based approach offers several advantages for learning and rapid prototyping. Students don't need to configure complex IDE settings or navigate unfamiliar interfaces. They simply type their request, review the AI's output, and iterate. The command line provides a transparent view of exactly what's happening, no hidden abstractions or magical "autocomplete" that obscures the learning process. For our workshop, Copilot CLI served as a bridge between students' creative ideas and working code. They could describe a game concept in plain English, watch the AI generate HTML, CSS, and JavaScript, then immediately test the result in a browser. This rapid feedback loop kept engagement high and made the connection between language and code tangible. Installing GitHub Copilot CLI Setting up Copilot CLI requires a few straightforward steps. Before the workshop, we ensured all machines were pre-configured, but students also learned the installation process as part of understanding how developer tools work. First, you'll need Node.js installed on your system. Copilot CLI runs as a Node package, so this is a prerequisite: # Check if Node.js is installed node --version # If not installed, download from https://nodejs.org/ # Or use a package manager: # Windows (winget) winget install OpenJS.NodeJS.LTS # macOS (Homebrew) brew install node # Linux (apt) sudo apt install nodejs npm These commands verify your Node.js installation or guide you through installing it using your operating system's preferred package manager. 
Next, install the GitHub CLI, which provides the foundation for Copilot CLI: # Windows winget install GitHub.cli # macOS brew install gh # Linux sudo apt install gh This installs the GitHub command-line interface, which handles authentication and provides the framework for Copilot integration. With GitHub CLI installed, authenticate with your GitHub account: gh auth login This command initiates an interactive authentication flow that connects your terminal to your GitHub account, enabling access to Copilot features. Finally, install the Copilot CLI extension: gh extension install github/gh-copilot This adds Copilot capabilities to your GitHub CLI installation, enabling the conversational AI features we'll use for game development. Verify the installation by running: gh copilot --help If you see the help output with available commands, you're ready to start prompting. The entire setup takes about 5-10 minutes on a fresh machine, making it practical for classroom environments. Understanding One-Shot Prompting Traditional programming education follows an incremental approach: learn syntax, understand concepts, build small programs, gradually tackle larger projects. This method is thorough but slow. One-shot prompting inverts this modelâyou start with the complete vision and let AI handle the implementation details. A one-shot prompt provides the AI with all the context it needs to generate a complete, working solution in a single response. Instead of iteratively refining code through multiple exchanges, you craft one comprehensive prompt that specifies requirements, constraints, styling preferences, and technical specifications. The AI then produces complete, functional code. This approach teaches a crucial skill: clear communication of technical requirements. Students must think through their entire game concept before typing. What does the game look like? How does the player interact with it? What happens when they win or lose? By forcing this upfront thinking, one-shot prompting develops the same analytical skills that professional developers use when writing specifications or planning architectures. The technique also demonstrates a powerful principle: with sufficient context, AI can handle implementation complexity while humans focus on creativity and design. Students learned they could create sophisticated games without memorizing JavaScript syntaxâthey just needed to describe their vision clearly enough for the AI to understand. Crafting Effective Prompts for Game Development The difference between a vague prompt and an effective one-shot prompt is the difference between frustration and success. We taught students a structured approach to prompt construction that consistently produced working games. Start with the game type and core mechanic. Don't just say "make a game"âspecify what kind: Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids. This opening establishes the fundamental gameplay loop: control a spaceship, avoid obstacles. The AI now has a clear mental model to work from. Add visual and interaction details. Games are visual experiences, so specify how things should look and respond: Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids. The spaceship should be a blue triangle at the bottom of the screen, controlled by left and right arrow keys. Asteroids are brown circles that fall from the top at random positions and increasing speeds. 
These additions provide concrete visual targets and define the input mechanism. The AI can now generate specific CSS colors and event handlers. Define win/lose conditions and scoring: Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids. The spaceship should be a blue triangle at the bottom of the screen, controlled by left and right arrow keys. Asteroids are brown circles that fall from the top at random positions and increasing speeds. Display a score that increases every second the player survives. The game ends when an asteroid hits the spaceship, showing a "Game Over" screen with the final score and a "Play Again" button. This complete prompt now specifies the entire game loop: gameplay, scoring, losing, and restarting. The AI has everything needed to generate a fully playable game. The formula students learned: Game Type + Visual Description + Controls + Rules + Win/Lose + Score = Complete Game Prompt. Running the Workshop: Structure and Approach Our two-hour workshop followed a carefully designed structure that balanced instruction with hands-on creation. We partnered with University College London and gave students access to GitHub Education resources designed specifically for classroom settings, including student accounts with Copilot access and tools like VSCode, Azure for Students, and VSCode for Education for schools. The first 20 minutes covered fundamentals: what is AI, how does Copilot work, and why does prompt quality matter? We demonstrated this with a live example, showing how "make a game" produces confused output while a detailed prompt generates playable code. This contrast immediately captured students' attention: they could see the direct relationship between their words and the AI's output. The next 15 minutes focused on the prompt formula. We broke down several example prompts, highlighting each component: game type, visuals, controls, rules, scoring. Students practiced identifying these elements in prompts before writing their own. This analysis phase prepared them to construct effective prompts independently. The remaining 85 minutes were dedicated to creation. Students worked individually or in pairs, brainstorming game concepts, writing prompts, generating code, testing in browsers, and iterating. Instructors circulated to help debug prompts (not code, an important distinction) and encourage experimentation. We deliberately avoided teaching JavaScript syntax. When students encountered bugs, we guided them to refine their prompts rather than manually fix code. This maintained focus on the core skill: communicating with AI effectively. Surprisingly, this approach resulted in fewer bugs overall because students learned to be more precise in their initial descriptions. Student Projects: The Games They Created The diversity of games produced in 85 minutes of building time amazed everyone present. Students didn't just follow a template; they invented entirely new concepts and successfully communicated them to Copilot CLI. One student created a "Fruit Ninja" clone where players clicked falling fruit to slice it before it hit the ground. Another built a typing speed game that challenged players to correctly type increasingly difficult words against a countdown timer. A pair of collaborators produced a two-player tank battle where each player controlled their tank with different keyboard keys.
Several students explored educational games: a math challenge where players solve equations to destroy incoming meteors, a geography quiz with animated maps, and a vocabulary builder where correct definitions unlock new levels. These projects demonstrated that one-shot prompting isn't limited to entertainment; students naturally gravitated toward useful applications. The most complex project was a procedurally generated maze game with fog-of-war mechanics. The student spent extra time on their prompt, specifying exactly how visibility should work around the player character. Their detailed approach paid off with a surprisingly sophisticated result that would typically require hours of manual coding. By the session's end, we had 16 complete, playable HTML5 games. Every student who participated produced something they could share with friends and family, a tangible achievement that transformed an abstract "coding workshop" into a genuine creative accomplishment. Key Benefits of Copilot CLI for Rapid Prototyping Our workshop revealed several advantages that make Copilot CLI particularly valuable for rapid prototyping scenarios, whether in educational settings or professional development. Speed of iteration fundamentally changes what's possible. Traditional game development requires hours to produce even simple prototypes. With Copilot CLI, students went from concept to playable game in minutes. This compressed timeline enables experimentation: if your first idea doesn't work, try another. This psychological freedom to fail fast and try again proved more valuable than any technical instruction. Accessibility removes barriers to entry. Students with no prior coding experience produced results comparable to those who had taken programming classes. The playing field leveled because success depended on creativity and communication rather than memorized syntax. This democratization of development opens doors for students who might otherwise feel excluded from technical fields. Focus on design over implementation teaches transferable skills. Whether students eventually become programmers, designers, product managers, or pursue entirely different careers, the ability to clearly specify requirements and think through complete systems applies universally. They learned to think like system designers, not just coders. The feedback loop keeps engagement high. Seeing your words transform into working software within seconds creates an addictive cycle of creation and testing. Students who typically struggle with attention during lectures remained focused throughout the building session. The immediate gratification of seeing their games work motivated continuous refinement. Debugging through prompts teaches root cause analysis. When games didn't work as expected, students had to analyze what they'd asked for versus what they received. This comparison exercise developed critical thinking about specifications, a skill that serves developers throughout their careers. Tips for Educators: Running Your Own Workshop If you're planning to replicate this workshop, several lessons from our experience will help ensure success. Pre-configure machines whenever possible. While installation is straightforward, classroom time is precious. Having Copilot CLI ready on all devices lets you dive into content immediately. If pre-configuration isn't possible, allocate the first 15-20 minutes specifically for setup and troubleshoot as a group. Prepare example prompts across difficulty levels.
Some students will grasp one-shot prompting immediately; others will need more scaffolding. Having templates ranging from simple ("Create Pong") to complex (the spaceship example above) lets you meet students where they are. Emphasize that "prompt debugging" is the goal. When students ask for help fixing broken code, redirect them to examine their prompt. What did they ask for? What did they get? Where's the gap? This redirection reinforces the workshop's core learning objective and builds self-sufficiency. Celebrate and share widely. Build in time at the end for students to demonstrate their games. This showcase moment validates their work and often inspires classmates to try new approaches in future sessions. Consider creating a shared folder or simple website where all games can be accessed after the workshop. Access GitHub Education resources at education.github.com before your workshop. The GitHub Education program provides free access to developer tools for students and educators, including Copilot. The resources there include curriculum materials, teaching guides, and community support that can enhance your workshop. Beyond Games: Where This Leads The techniques students learned extend far beyond game development. One-shot prompting with Copilot CLI works for any development task: creating web pages, building utilities, generating data processing scripts, or prototyping application interfaces. The fundamental skill, communicating requirements clearly to AI applies wherever AI-assisted development tools are used. Several students have continued exploring after the workshop. Some discovered they enjoy the creative aspects of game design and are learning traditional programming to gain more control. Others found that prompt engineering itself interests them, they're exploring how different phrasings affect AI outputs across various domains. For professional developers, the workshop's lessons apply directly to working with Copilot, ChatGPT, and other AI coding assistants. The ability to craft precise, complete prompts determines whether these tools save time or create confusion. Investing in prompt engineering skills yields returns across every AI-assisted workflow. Key Takeaways Clear prompts produce working code: The one-shot prompting formula (Game Type + Visuals + Controls + Rules + Win/Lose + Score) reliably generates playable games from single prompts Copilot CLI democratizes development: Students with no coding experience created functional applications by focusing on communication rather than syntax Rapid iteration enables experimentation: Minutes-per-prototype timelines encourage creative risk-taking and learning from failures Prompt debugging builds analytical skills: Comparing intended versus actual results teaches specification writing and root cause analysis Sixteen games in two hours is achievable: With proper structure and preparation, young students can produce impressive results using AI-assisted development Conclusion and Next Steps Our workshop demonstrated that AI-assisted development tools like GitHub Copilot CLI aren't just productivity boosters for experienced programmers, they're powerful educational instruments that make software creation accessible to beginners. By focusing on prompt engineering rather than traditional syntax instruction, we enabled 14-year-old students to produce complete, functional games in a fraction of the time traditional methods would require. The sixteen games created during those two hours represent more than just workshop outputs. 
They represent a shift in how we might teach technical creativity: start with vision, communicate clearly, iterate quickly. Whether students pursue programming careers or not, they've gained experience in thinking systematically about requirements and translating ideas into specifications that produce real results. To explore this approach yourself, visit the CopilotCLI-OneShotPromptGameDev repository for prompt templates, workshop materials, and example games. For educational resources and student access to GitHub tools including Copilot, explore GitHub Education. And most importantly, start experimenting. Write a prompt, generate some code, and see what you can create in the next few minutes. Resources CopilotCLI-OneShotPromptGameDev Repository - Workshop materials, prompt templates, and example games GitHub Education - Free developer tools and resources for students and educators GitHub Copilot CLI Documentation - Official installation and usage guide GitHub CLI - Foundation tool required for Copilot CLI GitHub Copilot - Overview of Copilot features and pricing

Now in Foundry: Qwen3-Coder-Next, Qwen3-ASR-1.7B, Z-Image
This week's spotlight features three models that demonstrate enterprise-grade AI across the full scope of modalities. From low latency coding agents to state-of-the-art multilingual speech recognition and foundation-quality image generation, these models showcase the breadth of innovation happening in open-source AI. Each model balances performance with practical deployment considerations, making them viable for production systems while pushing the boundaries of what's possible in their respective domains. This week's Model Mondays edition highlights Qwen3-Coder-Next, an 80B MoE model that activates only 3B parameters while delivering coding agent capabilities with 256k context; Qwen3-ASR-1.7B, which achieves state-of-the-art accuracy across 52 languages and dialects; and Z-Image from Tongyi-MAI, an undistilled text-to-image foundation model with full Classifier-Free Guidance support for professional creative workflows. Models of the week Qwen: Qwen3-Coder-Next Model Specs Parameters / size: 80B total (3B activated) Context length: 262,144 tokens Primary task: Text generation (coding agents, tool use) Why it's interesting Extreme efficiency: Activates only 3B of 80B parameters while delivering performance comparable to models with 10-20x more active parameters, making advanced coding agents viable for local deployment on consumer hardware Built for agentic workflows: Excels at long-horizon reasoning, complex tool usage, and recovering from execution failures, critical capabilities for autonomous development that go beyond simple code completion Benchmarks: Competitive performance with significantly larger models on SWE-bench and coding benchmarks (Technical Report) Try it
Use Case | Prompt Pattern
Code generation with tool use | Provide task context, available tools, and execution environment details
Long-context refactoring | Include full codebase context within the 256k window with specific refactoring goals
Autonomous debugging | Present error logs, stack traces, and relevant code with failure recovery instructions
Multi-file code synthesis | Describe architecture requirements and file structure expectations
Financial services sample prompt: You are a coding agent for a fintech platform. Implement a transaction reconciliation service that processes batches of transactions, detects discrepancies between internal records and bank statements, and generates audit reports. Use the provided database connection tool, logging utility, and alert system. Handle edge cases including partial matches, timing differences, and duplicate transactions. Include unit tests with 90%+ coverage. Qwen: Qwen3-ASR-1.7B Model Specs Parameters / size: 1.7B Context length: 256 tokens (default), configurable up to 4096 Primary task: Automatic speech recognition (multilingual) Why it's interesting All-in-one multilingual capability: Single 1.7B model handles language identification plus speech recognition for 30 languages, 22 Chinese dialects, and English accents from multiple regions, eliminating the need to manage separate models per language Specialized audio versatility: Transcribes not just clean speech but singing voice, songs with background music, and extended audio files, expanding use cases beyond traditional ASR to entertainment and media workflows State-of-the-art accuracy: Outperforms GPT-4o, Gemini-2.5, and Whisper-large-v3 across multiple benchmarks.
English: Tedlium 4.50 WER vs 7.69/6.15/6.84; Chinese: WenetSpeech 4.97/5.88 WER vs 15.30/14.43/9.86 (Technical Paper) Language ID included: 97.9% average accuracy across benchmark datasets for automatic language identification, eliminating the need for separate language detection pipelines Try it
Use Case | Prompt Pattern
Multilingual transcription | Send audio files via API with automatic language detection
Call center analytics | Process customer service recordings to extract transcripts and identify languages
Content moderation | Transcribe user-generated audio content across multiple languages
Meeting transcription | Convert multilingual meeting recordings to text for documentation
Customer support sample prompt: Deploy Qwen3-ASR-1.7B to a Microsoft Foundry endpoint and transcribe multilingual customer service calls. Send audio files via API to automatically detect the language (from 52 supported options including 30 languages and 22 Chinese dialects) and generate accurate transcripts. Process calls from customers speaking English, Spanish, Mandarin, Cantonese, Arabic, French, and other languages without managing separate models per language. Use transcripts for quality assurance, compliance monitoring, and customer sentiment analysis. Tongyi-MAI: Z-Image Model Specs Parameters / size: 6B Context length: N/A (text-to-image) Primary task: Text-to-image generation Why it's interesting Undistilled foundation model: Full-capacity base without distillation preserves complete training signal with Classifier-Free Guidance support (a technique that improves prompt adherence and output quality), enabling complex prompt engineering and negative prompting that distilled models cannot achieve High output diversity: Generates distinct character identities in multi-person scenes with varied compositions, facial features, and lighting, critical for creative applications requiring visual variety rather than consistency Aesthetic versatility: Handles diverse visual styles from hyper-realistic photography to anime and stylized illustrations within a single model, supporting resolutions from 512×512 to 2048×2048 at any aspect ratio with 28-50 inference steps (Technical Paper) Try it E-commerce sample prompt: Professional product photography of a modern ergonomic office chair in a bright Scandinavian-style home office. Natural window lighting from left, clean white desk with laptop and succulent plant, light oak hardwood floor. Chair positioned at 45-degree angle showing design details. Photorealistic, commercial photography, sharp focus, 85mm lens, f/2.8, soft shadows. Getting started You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation.
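Once a model is deployed, calling it from application code is straightforward. Here is a minimal JavaScript sketch that assumes the managed endpoint exposes an OpenAI-compatible chat completions route; the endpoint URL, deployment name, and API key are placeholders you would replace with the values shown for your own deployment in the portal.

// Minimal sketch of calling a model deployed to a managed endpoint.
// Assumes an OpenAI-compatible /chat/completions route; URL, model name,
// and key below are placeholders, not real values.
const ENDPOINT = process.env.FOUNDRY_ENDPOINT; // e.g. https://<your-endpoint>/chat/completions
const API_KEY = process.env.FOUNDRY_API_KEY;

async function chat(prompt) {
  const response = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${API_KEY}`, // auth scheme may differ per deployment
    },
    body: JSON.stringify({
      model: "qwen3-coder-next", // placeholder deployment name
      messages: [{ role: "user", content: prompt }],
      max_tokens: 512,
    }),
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content;
}

chat("Summarize what a transaction reconciliation service does.").then(console.log);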
Follow along with the Model Mondays series and access the GitHub repository to stay up to date on the latest: Read the Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry
Build an AI-Powered Space Invaders Game: Integrating LLMs into HTML5 Games with Microsoft Foundry Local Introduction What if your game could talk back to you? Imagine playing Space Invaders while an AI commander taunts you during battle, delivers personalized mission briefings, and provides real-time feedback based on your performance. This isn't science fiction; it's something you can build today using HTML, JavaScript, and a locally-running AI model. In this tutorial, we'll explore how to create an HTML5 game with integrated Large Language Model (LLM) features using Microsoft Foundry Local. You'll learn how to combine classic game development with modern AI capabilities, all running entirely on your own machine: no cloud services, no API costs, no internet connection required during gameplay. We'll be working with the Space Invaders - AI Commander Edition project, which demonstrates exactly how to architect games that leverage local AI. Whether you're a student learning game development, exploring AI integration patterns, or building your portfolio, this guide provides practical, hands-on experience with technologies that are reshaping how we build interactive applications. What You'll Learn By the end of this tutorial, you'll understand how to combine traditional web development with local AI inference. These skills transfer directly to building chatbots, interactive tutorials, AI-enhanced productivity tools, and any application where you want intelligent, context-aware responses. Set up Microsoft Foundry Local for running AI models on your machine Understand the architecture of games that integrate LLM features Use GitHub Copilot CLI to accelerate your development workflow Implement AI-powered game features like dynamic commentary and adaptive feedback Extend the project with your own creative AI features Why Local AI for Games? Before diving into the code, let's understand why running AI locally matters for game development. Traditional cloud-based AI services have limitations that make them impractical for real-time gaming experiences. Latency is the first challenge. Cloud API calls typically take 500ms to several seconds, an eternity in a game running at 60 frames per second. Local inference can respond in tens of milliseconds, enabling AI responses that feel instantaneous and natural. When an enemy ship appears, your AI commander can taunt you immediately, not three seconds later. Cost is another consideration. Cloud AI services charge per token, which adds up quickly when generating dynamic content during gameplay. Local models have zero per-use cost: once installed, they run entirely on your hardware. This frees you to experiment without worrying about API bills. Privacy and offline capability complete the picture. Local AI keeps all data on your machine, perfect for games that might handle player information. And since nothing requires internet connectivity, your game works anywhere: on planes, in areas with poor connectivity, or simply when you want to play without network access. Understanding Microsoft Foundry Local Microsoft Foundry Local is a runtime that enables you to run small language models (SLMs) directly on your computer. It's designed for developers who want to integrate AI capabilities into applications without requiring cloud infrastructure. Think of it as having a miniature AI assistant living on your laptop. Foundry Local handles the complex work of loading AI models, managing memory, and processing inference requests through a simple API.
You send text prompts, and it returns AI-generated responses, all happening locally on your CPU or GPU. The models are optimized to run efficiently on consumer hardware, so you don't need a supercomputer. For our Space Invaders game, Foundry Local powers the "AI Commander" feature. During gameplay, the game sends context about what's happening, your score, accuracy, current level, enemies remaining and receives back contextual commentary, taunts, and encouragement. The result feels like playing alongside an AI companion who actually understands the game. Setting Up Your Development Environment Let's get your machine ready for AI-powered game development. We'll install Foundry Local, clone the project, and verify everything works. The entire setup takes about 10-15 minutes. Step 1: Install Microsoft Foundry Local Foundry Local installation varies by operating system. Open your terminal and run the appropriate command: # Windows (using winget) winget install Microsoft.FoundryLocal # macOS (using Homebrew) brew install microsoft/foundrylocal/foundrylocal These commands download and install the Foundry Local runtime along with a default small language model. The installation includes everything needed to run AI inference locally. Verify the installation by running: foundry --version If you see a version number, Foundry Local is ready. If you encounter errors, ensure you have administrator/sudo privileges and that your package manager is up to date. Step 2: Install Node.js (If Not Already Installed) Our game's AI features require a small Node.js server to communicate between the browser and Foundry Local. Check if Node.js is installed: node --version If you see a version number (v16 or higher recommended), you're set. Otherwise, install Node.js: # Windows winget install OpenJS.NodeJS.LTS # macOS brew install node # Linux sudo apt install nodejs npm Node.js provides the JavaScript runtime that powers our proxy server, bridging browser code with the local AI model. Step 3: Clone the Project Get the Space Invaders project onto your machine: git clone https://github.com/leestott/Spaceinvaders-FoundryLocal.git cd Spaceinvaders-FoundryLocal This downloads all game files, including the HTML interface, game logic, AI integration module, and server code. Step 4: Install Dependencies and Start the Server Install the Node.js packages and launch the AI-enabled server: npm install npm start The first command downloads required packages (primarily for the proxy server). The second starts the server, which listens for AI requests from the game. You should see output indicating the server is running on port 3001. Step 5: Play the Game Open your browser and navigate to: http://localhost:3001 You should see Space Invaders with "AI: ONLINE" displayed in the game HUD, indicating that AI features are active. Use arrow keys or A/D to move, SPACE to fire, and P to pause. The AI Commander will start providing commentary as you play! Understanding the Project Architecture Now that the game is running, let's explore how the different pieces fit together. Understanding this architecture will help you modify the game and apply these patterns to your own projects. 
The project follows a clean separation of concerns, with each file handling a specific responsibility:
Spaceinvaders-FoundryLocal/
├── index.html    # Main game page and UI structure
├── styles.css    # Retro arcade visual styling
├── game.js       # Core game logic and rendering
├── llm.js        # AI integration module
├── sound.js      # Web Audio API sound effects
├── server.js     # Node.js proxy for Foundry Local
└── package.json  # Project configuration
index.html: Defines the game canvas and UI elements. It's the entry point that loads all other modules. game.js: Contains the game loop, physics, collision detection, scoring, and rendering logic. This is the heart of the game. llm.js: Handles all communication with the AI backend. It formats game state into prompts and processes AI responses. server.js: A lightweight Express server that proxies requests between the browser and Foundry Local. sound.js: Synthesizes retro sound effects using the Web Audio API (no audio files needed!). How the AI Integration Works The magic of the AI Commander happens through a simple but powerful pattern. Let's trace the flow from gameplay event to AI response. When something interesting happens in the game (you clear a wave, achieve a combo, or lose a life), the game logic in game.js triggers an AI request. This request includes context about the current game state: your score, accuracy percentage, current level, lives remaining, and what just happened. The llm.js module formats this context into a prompt. For example, when you clear a wave with 85% accuracy, it might construct: You are an AI Commander in a Space Invaders game. The player just cleared wave 3 with 85% accuracy. Score: 12,500. Lives: 3. Provide a brief, enthusiastic comment (1-2 sentences). This prompt travels to server.js, which forwards it to Foundry Local. The AI model processes the prompt and generates a response like: "Impressive accuracy, pilot! Wave 3 didn't stand a chance. Keep that trigger finger sharp!" The response flows back through the server to the browser, where llm.js passes it to the game. The game displays the message in the HUD, creating the illusion of playing alongside an AI companion. This entire round trip typically completes in 50-200 milliseconds, fast enough to feel responsive without interrupting gameplay. Using GitHub Copilot CLI to Explore and Modify the Code GitHub Copilot CLI accelerates your development workflow by letting you ask questions and generate code directly in your terminal. Let's use it to understand and extend the Space Invaders project. Installing Copilot CLI If you haven't installed Copilot CLI yet, here's the quick setup: # Install GitHub CLI winget install GitHub.cli # Windows brew install gh # macOS # Authenticate with GitHub gh auth login # Add Copilot extension gh extension install github/gh-copilot # Verify installation gh copilot --help With Copilot CLI ready, you can interact with AI directly from your terminal while working on the project. Exploring Code with Copilot CLI Use Copilot to understand unfamiliar code. Navigate to the project directory and try: gh copilot explain "How does llm.js communicate with the server?" Copilot analyzes the code and explains the communication pattern, helping you understand the architecture without reading every line manually. You can also ask about specific functions: gh copilot explain "What does the generateEnemyTaunt function do?" This accelerates onboarding to unfamiliar codebases, a valuable skill when working with open source projects or joining teams.
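To make the round trip described in "How the AI Integration Works" concrete, here is a simplified browser-side sketch of the request helper. The /api/commentary route and the field names are assumptions for illustration; the project's actual llm.js may be structured differently.

// Simplified, illustrative version of the browser-side AI request.
// The /api/commentary route and response shape are assumed, not the
// project's exact API.
async function requestCommentary(eventType, gameState) {
  const prompt =
    `You are an AI Commander in a Space Invaders game. ` +
    `Event: ${eventType}. Score: ${gameState.score}, ` +
    `accuracy: ${gameState.accuracy}%, lives: ${gameState.lives}. ` +
    `Reply with one or two enthusiastic sentences.`;

  try {
    const res = await fetch("http://localhost:3001/api/commentary", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt }),
    });
    const { text } = await res.json();
    return text; // display this in the game HUD
  } catch (err) {
    console.warn("AI commentary unavailable:", err);
    return null; // the game keeps running even if the AI is offline
  }
}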
Generating New Features Want to add a new AI feature? Ask Copilot to help generate the code: gh copilot suggest "Create a function that asks the AI to generate a mission briefing at the start of each level, including the level number and a random mission objective" Copilot generates starter code that you can customize and integrate. This combination of AI-powered development tools and AI-integrated gameplay demonstrates how LLMs are transforming both how we build games and how games behave. Customizing the AI Commander The default AI Commander provides generic gaming commentary, but you can customize its personality and responses. Open llm.js to find the prompt templates that control AI behavior. Changing the AI's Personality The system prompt defines who the AI "is." Find the base prompt and modify it: // Original const systemPrompt = "You are an AI Commander in a Space Invaders game."; // Customized - Drill Sergeant personality const systemPrompt = `You are Sergeant Blaster, a gruff but encouraging drill sergeant commanding space cadets. Use military terminology, call the player "cadet," and be tough but fair.`; // Customized - Supportive Coach personality const systemPrompt = `You are Coach Nova, a supportive and enthusiastic gaming coach. Use encouraging language, celebrate small victories, and provide gentle guidance when players struggle.`; These personality changes dramatically alter the game's feel without changing any gameplay code. It's a powerful example of how AI can add variety to games with minimal development effort. Adding New Commentary Triggers Currently the AI responds to wave completions and game events. You can add new triggers in game.js : // Add AI commentary when player achieves a kill streak if (killStreak >= 5 && !streakCommentPending) { requestAIComment('killStreak', { count: killStreak }); streakCommentPending = true; } // Add AI reaction when player narrowly avoids death if (nearMissOccurred) { requestAIComment('nearMiss', { livesRemaining: lives }); } Each new trigger point adds another opportunity for the AI to engage with the player, making the experience more dynamic and personalized. Understanding the Game Features Beyond AI integration, the Space Invaders project demonstrates solid game development patterns worth studying. Let's explore the key features. Power-Up System The game includes eight different power-ups, each with unique effects: SPREAD (Orange): Fires three projectiles in a spread pattern LASER (Red): Powerful beam with high damage RAPID (Yellow): Dramatically increased fire rate MISSILE (Purple): Homing projectiles that track enemies SHIELD (Blue): Grants an extra life EXTRA LIFE (Green): Grants two extra lives BOMB (Red): Destroys all enemies on screen BONUS (Gold): Random score bonus between 250-750 points Power-ups demonstrate state management, tracking which power-up is active, applying its effects to player actions, and handling timeouts. Study the power-up code in game.js to understand how temporary state modifications work. Leaderboard System The game persists high scores using the browser's localStorage API: // Saving scores localStorage.setItem('spaceInvadersScores', JSON.stringify(scores)); // Loading scores const savedScores = localStorage.getItem('spaceInvadersScores'); const scores = savedScores ? JSON.parse(savedScores) : []; This pattern works for any data you want to persist between sessionsâgame progress, user preferences, or accumulated statistics. It's a simple but powerful technique for web games. 
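Building on that localStorage pattern, a small helper that records a score and keeps only the ten highest entries could look like this; it is a sketch, not the project's exact implementation.

// Illustrative leaderboard helper built on the localStorage pattern above.
// Keeps the ten highest scores; not the project's exact code.
const STORAGE_KEY = "spaceInvadersScores";

function addScore(name, score) {
  const saved = localStorage.getItem(STORAGE_KEY);
  const scores = saved ? JSON.parse(saved) : [];

  scores.push({ name, score, date: new Date().toISOString() });
  scores.sort((a, b) => b.score - a.score); // highest first
  const topTen = scores.slice(0, 10);

  localStorage.setItem(STORAGE_KEY, JSON.stringify(topTen));
  return topTen;
}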
Sound Synthesis Rather than loading audio files, the game synthesizes retro sound effects using the Web Audio API in sound.js . This approach has several benefits: no external assets to load, smaller project size, and complete control over sound parameters. Examine how oscillators and gain nodes combine to create laser sounds, explosions, and victory fanfares. This knowledge transfers directly to any web project requiring audio feedback. Extending the Project: Ideas for Students Ready to make the project your own? Here are ideas ranging from beginner-friendly to challenging, each teaching valuable skills. Beginner: Customize Visual Theme Modify styles.css to create a new visual theme. Try changing the color scheme from green to blue, or create a "sunset" theme with orange and purple gradients. This builds CSS skills while making the game feel fresh. Intermediate: Add New Enemy Types Create a new enemy class in game.js with different movement patterns. Perhaps enemies that move in sine waves, or boss enemies that take multiple hits. This teaches object-oriented programming and game physics. Intermediate: Expand AI Interactions Add new AI features like: Pre-game mission briefings that set up the story Dynamic difficulty hints when players struggle Post-game performance analysis and improvement suggestions AI-generated names for enemy waves Advanced: Multiplayer Commentary Modify the game for two-player support and have the AI provide play-by-play commentary comparing both players' performance. This combines game networking concepts with advanced AI prompting. Advanced: Voice Integration Use the Web Speech API to speak the AI Commander's responses aloud. This creates a more immersive experience and demonstrates browser speech synthesis capabilities. Troubleshooting Common Issues If something isn't working, here are solutions to common problems. "AI: OFFLINE" Displayed in Game This means the game can't connect to the AI server. Check that: The server is running ( npm start shows no errors) You're accessing the game via http://localhost:3001 , not directly opening the HTML file Foundry Local is installed correctly ( foundry --version works) Server Won't Start If npm start fails: Ensure you ran npm install first Check that port 3001 isn't already in use by another application Verify Node.js is installed ( node --version ) AI Responses Are Slow Local AI performance depends on your hardware. If responses feel sluggish: Close other resource-intensive applications Ensure your laptop is plugged in (battery mode may throttle CPU) Consider that first requests may be slower as the model loads Key Takeaways Local AI enables real-time game features: Microsoft Foundry Local provides fast, free, private AI inference perfect for gaming applications Clean architecture matters: Separating game logic, AI integration, and server code makes projects maintainable and extensible AI personality is prompt-driven: Changing a few lines of prompt text completely transforms how the AI interacts with players Copilot CLI accelerates learning: Use it to explore unfamiliar code and generate new features quickly The patterns transfer everywhere: Skills from this project apply to chatbots, assistants, educational tools, and any AI-integrated application Conclusion and Next Steps You've now seen how to integrate AI capabilities into a browser-based game using Microsoft Foundry Local. 
The Space Invaders project demonstrates that modern AI features don't require cloud services or complex infrastructure; they can run entirely on your laptop, responding in milliseconds. More importantly, you've learned patterns that extend far beyond gaming. The architecture of sending context to an AI, receiving generated responses, and integrating them into user experiences applies to countless applications: customer support bots, educational tutors, creative writing tools, and accessibility features. Your next step is experimentation. Clone the repository, modify the AI's personality, add new commentary triggers, or build an entirely new game using these patterns. The combination of GitHub Copilot CLI for development assistance and Foundry Local for runtime AI gives you powerful tools to bring intelligent applications to life. Start playing, start coding, and discover what you can create when your games can think. Resources Space Invaders - AI Commander Edition Repository - Full source code and documentation Play Space Invaders Online - Try the basic version without AI features Microsoft Foundry Local Documentation - Official installation and API guide GitHub Copilot CLI Documentation - Installation and usage guide GitHub Education - Free developer tools for students Web Audio API Documentation - Learn about browser sound synthesis Canvas API Documentation - Master HTML5 game rendering

Choosing the Right Model in GitHub Copilot: A Practical Guide for Developers
AI-assisted development has grown far beyond simple code suggestions. GitHub Copilot now supports multiple AI models, each optimized for different workflows, from quick edits to deep debugging to multi-step agentic tasks that generate or modify code across your entire repository. As developers, this flexibility is powerful… but only if we know how to choose the right model at the right time. In this guide, I'll break down: Why model selection matters The four major categories of development tasks A simplified, developer-friendly model comparison table Enterprise considerations and practical tips This is written from the perspective of real-world customer conversations, GitHub Copilot demos, and enterprise adoption journeys. Why Model Selection Matters GitHub Copilot isn't tied to a single model. Instead, it offers a range of models, each with different strengths: Some are optimized for speed Others are optimized for reasoning depth Some are built for agentic workflows Choosing the right model can dramatically improve: The quality of the output The speed of your workflow The accuracy of Copilot's reasoning The effectiveness of Agents and Plan Mode Your usage efficiency under enterprise quotas Model selection is now a core part of modern software development, just like choosing the right library, framework, or cloud service. The Four Task Categories (and which Model Fits) To simplify model selection, I group tasks into four categories. Each category aligns naturally with specific types of models. 1. Everyday Development Tasks Examples: Writing new functions Improving readability Generating tests Creating documentation Best fit: General-purpose coding models (e.g., GPT-4.1, GPT-5-mini, Claude Sonnet) These models offer the best balance between speed and quality. 2. Fast, Lightweight Edits Examples: Quick explanations JSON/YAML transformations Small refactors Regex generation Short Q&A tasks Best fit: Lightweight models (e.g., Claude Haiku 4.5) These models give near-instant responses and keep you "in flow." 3. Complex Debugging & Deep Reasoning Examples: Analyzing unfamiliar code Debugging tricky production issues Architecture decisions Multi-step reasoning Performance analysis Best fit: Deep reasoning models (e.g., GPT-5, GPT-5.1, GPT-5.2, Claude Opus) These models handle large context, produce structured reasoning, and give the most reliable insights for complex engineering tasks. 4. Multi-step Agentic Development Examples: Repo-wide refactors Migrating a codebase Scaffolding entire features Implementing multi-file plans in Agent Mode Automated workflows (Plan → Execute → Modify) Best fit: Agent-capable models (e.g., GPT-5.1-Codex-Max, GPT-5.2-Codex) These models are ideal when you need Copilot to execute multi-step tasks across your repository. GitHub Copilot Models - Developer Friendly Comparison The set of models you can choose from depends on your Copilot subscription, and the available options may evolve over time. Each model also has its own premium request multiplier, which reflects the compute resources it requires. If you're using a paid Copilot plan, the multiplier determines how many premium requests are deducted whenever that model is used.
Model Category | Example Models (Premium request multiplier for paid plans) | What they're best at | When to use them
Fast Lightweight Models | Claude Haiku 4.5, Gemini 3 Flash (0.33x); Grok Code Fast 1 (0.25x) | Low latency, quick responses | Small edits, Q&A, simple code tasks
General-Purpose Coding Models | GPT-4.1, GPT-5-mini (0x); GPT-5-Codex, Claude Sonnet 4.5 (1x) | Reliable day-to-day development | Writing functions, small tests, documentation
Deep Reasoning Models | GPT-5.1 Codex Mini (0.33x); GPT-5, GPT-5.1, GPT-5.1 Codex, GPT-5.2, Claude Sonnet 4.0, Gemini 2.5 Pro, Gemini 3 Pro (1x); Claude Opus 4.5 (3x) | Complex reasoning and debugging | Architecture work, deep bug diagnosis
Agentic / Multi-step Models | GPT-5.1-Codex-Max, GPT-5.2-Codex (1x) | Planning + execution workflows | Repo-wide changes, feature scaffolding
Enterprise Considerations For organizations using Copilot Enterprise or Business: Admins can control which models employees can use Model selection may be restricted due to security, regulation, or data governance You may see fewer available models depending on your organization's Copilot policies Using "Auto" Model Selection in GitHub Copilot GitHub Copilot's Auto model selection automatically chooses the best available model for your prompts, reducing the mental load of picking a model and helping you avoid rate-limiting. When enabled, Copilot prioritizes model availability and selects from a rotating set of eligible models such as GPT-4.1, GPT-5 mini, GPT-5.2-Codex, Claude Haiku 4.5, and Claude Sonnet 4.5 while respecting your subscription level and any administrator-imposed restrictions. Auto also excludes models blocked by policies, models with premium multipliers greater than 1, and models unavailable in your plan. For paid plans, Auto provides an additional benefit: a 10% discount on premium request multipliers when used in Copilot Chat. Overall, Auto offers a balanced, optimized experience by dynamically selecting a performant and cost-efficient model without requiring developers to switch models manually. Read more about Auto model selection here - About Copilot auto model selection - GitHub Docs Final Thoughts GitHub Copilot is becoming a core part of developer workflows. Choosing the right model can dramatically improve your productivity, the accuracy of Copilot's responses, your experience with multi-step agentic tasks, and your ability to navigate complex codebases. Whether you're building features, debugging complex issues, or orchestrating repo-wide changes, picking the right model helps you get the best out of GitHub Copilot. References and Further Reading To explore each model further, visit the GitHub Copilot model comparison documentation or try switching models in Copilot Chat to see how they impact your workflow. AI model comparison - GitHub Docs Requests in GitHub Copilot - GitHub Docs About Copilot auto model selection - GitHub Docs

Demystifying GitHub Copilot Security Controls: easing concerns for organizational adoption
At a recent developer conference, I delivered a session on Legacy Code Rescue using GitHub Copilot App Modernization. Throughout the day, conversations with developers revealed a clear divide: some have fully embraced Agentic AI in their daily coding, while others remain cautious. Often, this hesitation isn't due to reluctance but stems from organizational concerns around security and regulatory compliance. Having witnessed similar patterns during past technology shifts, I understand how these barriers can slow adoption. In this blog, I'll demystify the most common security concerns about GitHub Copilot and explain how its built-in features address them, empowering organizations to confidently modernize their development workflows. GitHub Copilot Model Training A common question I received at the conference was whether GitHub uses your code as training data for GitHub Copilot. I always direct customers to the GitHub Copilot Trust Center for clarity, but the answer is straightforward: "No. GitHub uses neither Copilot Business nor Enterprise data to train the GitHub model." Note that this restriction applies to third-party models as well (e.g., Anthropic, Google). GitHub Copilot Intellectual Property indemnification policy A frequent concern I hear is that, since GitHub Copilot's underlying models are trained on sources that include public code, it might simply "copy and paste" code from those sources. Let's clarify how this actually works: Does GitHub Copilot "copy/paste"? "The AI models that create Copilot's suggestions may be trained on public code, but do not contain any code. When they generate a suggestion, they are not 'copying and pasting' from any codebase." To provide an additional layer of protection, GitHub Copilot includes a "duplicate detection filter". This feature helps prevent suggestions that closely match public code from being surfaced. (Note: This duplicate detection currently does not apply to the Copilot coding agent.) More importantly, customers are protected by an Intellectual Property indemnification policy. This means that if you receive an unmodified suggestion from GitHub Copilot and face a copyright claim as a result, Microsoft will defend you in court. GitHub Copilot Data Retention Another frequent question I hear concerns GitHub Copilot's data retention policies. For organizations on GitHub Copilot Business and Enterprise plans, retention practices depend on how and where the service is accessed from: Access through IDE for Chat and Code Completions: Prompts and Suggestions: Not retained. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. Other GitHub Copilot access and use: Prompts and Suggestions: Retained for 28 days. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. For Copilot Coding Agent, session logs are retained for the life of the account in order to provide the service. Excluding content from GitHub Copilot To prevent GitHub Copilot from indexing sensitive files, you can configure content exclusions at the repository or organization level. In VS Code, use the .copilotignore file to exclude files client-side. Note that files listed in .gitignore are not indexed by default but may still be referenced if open or explicitly referenced (unless they're excluded through .copilotignore or content exclusions).
The life cycle of a GitHub Copilot code suggestion
Here are the key protections at each stage of the life cycle of a GitHub Copilot code suggestion:
- In the IDE: Content exclusions prevent files, folders, or patterns from being included.
- GitHub proxy (pre-model safety): Prompts go through a GitHub proxy hosted in Microsoft Azure for pre-inference checks, screening for toxic or inappropriate language, relevance, and hacking attempts or jailbreak-style prompts before they reach the model.
- Model response: With the public code filter enabled, some suggestions are suppressed. The vulnerability protection feature blocks insecure coding patterns such as hardcoded credentials or SQL injection in real time.

Disable access to GitHub Copilot Free
Due to the varying policies associated with GitHub Copilot Free, it is crucial for organizations to ensure it is disabled both in the IDE and on GitHub.com. Since not all IDEs currently offer a built-in option to disable Copilot Free, the most reliable way to prevent both accidental and intentional access is to implement firewall rule changes, as outlined in the official documentation.

Agent Mode Allow List
Accidental file system deletion by agentic AI assistants can happen. With GitHub Copilot agent mode, the "Terminal auto approve" setting in VS Code can be used to prevent this. This setting can be managed centrally using a VS Code policy.

MCP registry
Organizations often want to restrict access to allow only trusted MCP servers. GitHub now offers an MCP registry feature for this purpose. The feature isn't available in all IDEs and clients yet, but support is expanding.

Compliance Certifications
The GitHub Copilot Trust Center lists GitHub Copilot's broad compliance credentials, surpassing many competitors in financial, security, privacy, cloud, and industry coverage:
- SOC 1 Type 2: Assurance over internal controls for financial reporting.
- SOC 2 Type 2: In-depth report covering Security, Availability, Processing Integrity, Confidentiality, and Privacy over time.
- SOC 3: General-use version of SOC 2 with broad executive-level assurance.
- ISO/IEC 27001:2013: Certification for a formal Information Security Management System (ISMS), based on risk management controls.
- CSA STAR Level 2: A third-party attestation combining ISO 27001 or SOC 2 with additional Cloud Controls Matrix (CCM) requirements.
- TISAX: Trusted Information Security Assessment Exchange, covering automotive-sector security standards.

In summary, while the adoption of AI tools like GitHub Copilot in software development raises important questions around security, privacy, and compliance, the safeguards in place go a long way toward addressing these concerns. By understanding the safeguards, configurable controls, and robust compliance certifications offered, organizations and developers alike can feel more confident embracing GitHub Copilot to accelerate innovation while maintaining trust and peace of mind.

GitHub Copilot SDK and Hybrid AI in Practice: Automating README to PPT Transformation
Introduction In today's rapidly evolving AI landscape, developers often face a critical choice: should we use powerful cloud-based Large Language Models (LLMs) that require internet connectivity, or lightweight Small Language Models (SLMs) that run locally but have limited capabilities? The answer isn't either-orâit's hybrid modelsâcombining the strengths of both to create AI solutions that are secure, efficient, and powerful. This article explores hybrid model architectures through the lens of GenGitHubRepoPPT, demonstrating how to elegantly combine Microsoft Foundry Local, GitHub Copilot SDK, and other technologies to automatically generate professional PowerPoint presentations from GitHub README files. 1. Hybrid Model Scenarios and Value 1.1 What Are Hybrid Models? Hybrid AI Models strategically combine locally-running Small Language Models (SLMs) with cloud-based Large Language Models (LLMs) within the same application, selecting the most appropriate model for each task based on its unique characteristics. Core Principles: Local Processing for Sensitive Data: Privacy-critical content analysis happens on-device Cloud for Value Creation: Complex reasoning and creative generation leverage cloud power Balancing Cost and Performance: High-frequency, simple tasks run locally to minimize API costs 1.2 Typical Hybrid Model Use Cases Use Case Local SLM Role Cloud LLM Role Value Proposition Intelligent Document Processing Text extraction, structural analysis Content refinement, format conversion Privacy protection + Professional output Code Development Assistant Syntax checking, code completion Complex refactoring, architecture advice Fast response + Deep insights Customer Service Systems Intent recognition, FAQ handling Complex issue resolution Reduced latency + Enhanced quality Content Creation Platforms Keyword extraction, outline generation Article writing, multilingual translation Cost control + Creative assurance 1.3 Why Choose Hybrid Models? Three Core Advantages: Privacy and Security Sensitive data never leaves local devices Compliant with GDPR, HIPAA, and other regulations Ideal for internal corporate documents and personal information Cost Optimization Reduces cloud API call frequency Local models have zero usage fees Predictable operational costs Performance and Reliability Local processing eliminates network latency Partial functionality in offline environments Cloud models ensure high-quality output 2. Core Technology Analysis 2.1 Large Language Models (LLMs): Cloud Intelligence Representatives What are LLMs? Large Language Models are deep learning-based natural language processing models, typically with billions to trillions of parameters. Through training on massive text datasets, they've acquired powerful language understanding and generation capabilities. 
Representative Models: Claude Sonnet 4.5: Anthropic's flagship model, excelling at long-context processing and complex reasoning GPT-5.2 Series: OpenAI's general-purpose language models Gemini: Google's multimodal large models LLM Advantages: â Exceptional text generation quality â Powerful contextual understanding â Support for complex reasoning tasks â Continuous model updates and optimization Typical Applications: Professional document writing (technical reports, business plans) Code generation and refactoring Multilingual translation Creative content creation 2.2 Small Language Models (SLMs) and Microsoft Foundry Local 2.2.1 SLM Characteristics Small Language Models typically have 1B-7B parameters, designed specifically for resource-constrained environments. Mainstream SLM Model Families: Microsoft Phi Family (Phi Family): Inference-optimized efficient models Alibaba Qwen Family (Qwen Family): Excellent Chinese language capabilities Mistral Series: Outstanding performance with small parameter counts SLM Advantages: ⥠Low-latency response (millisecond-level) đ° Zero API costs đ Fully local, data stays on-device đą Suitable for edge device deployment 2.2.2 Microsoft Foundry Local: The Foundation of Local AI Foundry Local is Microsoft's local AI runtime tool, enabling developers to easily run SLMs on Windows or macOS devices. Core Features: OpenAI-Compatible API # Using Foundry Local is like using OpenAI API from openai import OpenAI from foundry_local import FoundryLocalManager manager = FoundryLocalManager("qwen2.5-7b-instruct") client = OpenAI( base_url=manager.endpoint, api_key=manager.api_key ) Hardware Acceleration Support CPU: General computing support GPU: NVIDIA, AMD, Intel graphics acceleration NPU: Qualcomm, Intel AI-specific chips Apple Silicon: Neural Engine optimization Based on ONNX Runtime Cross-platform compatibility Highly optimized inference performance Supports model quantization (INT4, INT8) Convenient Model Management # View available models foundry model list # Run a model foundry model run qwen2.5-7b-instruct-generic-cpu:4 # Check running status foundry service ps Foundry Local Application Value: đ Educational Scenarios: Students can learn AI development without cloud subscriptions đ˘ Enterprise Environments: Process sensitive data while maintaining compliance đ§Ş R&D Testing: Rapid prototyping without API cost concerns âď¸ Offline Environments: Works on planes, subways, and other no-network scenarios 2.3 GitHub Copilot SDK: The Express Lane from Agent to Business Value 2.3.1 What is GitHub Copilot SDK? GitHub Copilot SDK, released as a technical preview on January 22, 2026, is a game-changer for AI Agent development. Unlike other AI SDKs, Copilot SDK doesn't just provide API calling interfacesâit delivers a complete, production-grade Agent execution engine. Why is it revolutionary? Traditional AI application development requires you to build: â Context management systems (multi-turn conversation state) â Tool orchestration logic (deciding when to call which tool) â Model routing mechanisms (switching between different LLMs) â MCP server integration â Permission and security boundaries â Error handling and retry mechanisms Copilot SDK provides all of this out-of-the-box, letting you focus on business logic rather than underlying infrastructure. 
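In practice, that can come down to a few lines of code. The sketch below is illustrative only: the names (CopilotClient, create_session, send_and_wait) follow the project snippets shown later in this article, the SDK is still a technical preview so the exact API may differ, and the model name and skill path are placeholders.

```python
# Minimal sketch of driving the Copilot SDK agent engine from Python.
# Method names mirror the snippets in this article; treat them as assumptions.
import asyncio
from copilot import CopilotClient  # GitHub Copilot SDK (technical preview)

async def main():
    client = CopilotClient()
    await client.start()

    # Create a session: pick a model, enable streaming, and point the agent
    # at a directory of Skill files that encode your domain knowledge.
    session = await client.create_session({
        "model": "claude-sonnet-4.5",
        "streaming": True,
        "skill_directories": ["./.copilot_skills/ppt"],  # hypothetical path
    })

    # Hand the agent a business goal and let it plan, call tools, and execute.
    await session.send_and_wait(
        {"prompt": "Turn README.md into a 10-slide overview deck"},
        timeout=600,
    )

asyncio.run(main())
```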
2.3.2 Core Advantages: The Ultra-Short Path from Concept to Code Production-Grade Agent Engine: Battle-Tested Reliability Copilot SDK uses the same Agent core as GitHub Copilot CLI, which means: â Validated in millions of real-world developer scenarios â Capable of handling complex multi-step task orchestration â Automatic task planning and execution â Built-in error recovery mechanisms Real-World Example: In the GenGitHubRepoPPT project, we don't need to hand-write the "how to convert outline to PPT" logicâwe simply tell Copilot SDK the goal, and it automatically: Analyzes outline structure Plans slide layouts Calls file creation tools Applies formatting logic Handles multilingual adaptation # Traditional approach: requires hundreds of lines of code for logic def create_ppt_traditional(outline): slides = parse_outline(outline) for slide in slides: layout = determine_layout(slide) content = format_content(slide) apply_styling(content, layout) # ... more manual logic return ppt_file # Copilot SDK approach: focus on business intent session = await client.create_session({ "model": "claude-sonnet-4.5", "streaming": True, "skill_directories": [skills_dir] }) session.send_and_wait({"prompt": prompt}, timeout=600) Custom Skills: Reusable Encapsulation of Business Knowledge This is one of Copilot SDK's most powerful features. In traditional AI development, you need to provide complete prompts and context with every call. Skills allow you to: Define once, reuse forever: # .copilot_skills/ppt/SKILL.md # PowerPoint Generation Expert Skill ## Expertise You are an expert in business presentation design, skilled at transforming technical content into easy-to-understand visual presentations. ## Workflow 1. **Structure Analysis** - Identify outline hierarchy (titles, subtitles, bullet points) - Determine topic and content density for each slide 2. **Layout Selection** - Title slide: Use large title + subtitle layout - Content slides: Choose single/dual column based on bullet count - Technical details: Use code block or table layouts 3. **Visual Optimization** - Apply professional color scheme (corporate blue + accent colors) - Ensure each slide has a visual focal point - Keep bullets to 5-7 items per page 4. **Multilingual Adaptation** - Choose appropriate fonts based on language (Chinese: Microsoft YaHei, English: Calibri) - Adapt text direction and layout conventions ## Output Requirements Generate .pptx files meeting these standards: - 16:9 widescreen ratio - Consistent visual style - Editable content (not images) - File size < 5MB Business Code Generation Capability This is the core value of this project. Unlike generic LLM APIs, Copilot SDK with Skills can generate truly executable business code. 
Comparison Example: Aspect Generic LLM API Copilot SDK + Skills Task Description Requires detailed prompt engineering Concise business intent suffices Output Quality May need multiple adjustments Professional-grade on first try Code Execution Usually example code Directly generates runnable programs Error Handling Manual implementation required Agent automatically handles and retries Multi-step Tasks Manual orchestration needed Automatic planning and execution Comparison of manual coding workload: Task Manual Coding Copilot SDK Processing logic code ~500 lines ~10 lines configuration Layout templates ~200 lines Declared in Skill Style definitions ~150 lines Declared in Skill Error handling ~100 lines Automatically handled Total ~950 lines ~10 lines + Skill file Tool Calling & MCP Integration: Connecting to the Real World Copilot SDK doesn't just generate codeâit can directly execute operations: đď¸ File System Operations: Create, read, modify files đ Network Requests: Call external APIs đ Data Processing: Use pandas, numpy, and other libraries đ§ Custom Tools: Integrate your business logic 3. GenGitHubRepoPPT Case Study 3.1 Project Overview GenGitHubRepoPPT is an innovative hybrid AI solution that combines local AI models with cloud-based AI agents to automatically generate professional PowerPoint presentations from GitHub repository README files in under 5 minutes. Technical Architecture: 3.2 Why Adopt a Hybrid Model? Stage 1: Local SLM Processes Sensitive Data Task: Analyze GitHub README, extract key information, generate structured outline Reasons for choosing Qwen-2.5-7B + Foundry Local: Privacy Protection README may contain internal project information Local processing ensures data doesn't leave the device Complies with data compliance requirements Cost Effectiveness Each analysis processes thousands of tokens Cloud API costs are significant in high-frequency scenarios Local models have zero additional fees Performance Qwen-2.5-7B excels at text analysis tasks Outstanding Chinese support Acceptable CPU inference latency (typically 2-3 seconds) Stage 2: Cloud LLM + Copilot SDK Creates Business Value Task: Create well-formatted PowerPoint files based on outline Reasons for choosing Claude Sonnet 4.5 + Copilot SDK: Automated Business Code Generation Traditional approach pain points: Need to hand-write 500+ lines of code for PPT layout logic Require deep knowledge of python-pptx library APIs Style and formatting code is error-prone Multilingual support requires additional conditional logic Copilot SDK solution: Declare business rules and best practices through Skills Agent automatically generates and executes required code Zero-code implementation of complex layout logic Development time reduced from 2-3 days to 2-3 hours Ultra-Short Path from Intent to Execution Comparison: Different ways to implement "Generate professional PPT" 3. Production-Grade Reliability and Quality Assurance Battle-tested Agent engine: Uses the same core as GitHub Copilot CLI Validated in millions of real-world scenarios Automatically handles edge cases and errors Consistent output quality: Professional standards ensured through Skills Automatic validation of generated files Built-in retry and error recovery mechanisms 4. Rapid Iteration and Optimization Capability Scenario: Client requests PPT style adjustment The GitHub Repo https://github.com/kinfey/GenGitHubRepoPPT 4. 
Summary

4.1 Core Value of Hybrid Models + Copilot SDK
The GenGitHubRepoPPT project demonstrates how combining hybrid models with the Copilot SDK creates a new paradigm for AI application development.

Privacy and Cost Balance
The hybrid approach allows sensitive README analysis to happen locally using Qwen-2.5-7B, ensuring data never leaves the device while incurring zero API costs. Meanwhile, the value-creating work of generating professional PowerPoint presentations leverages Claude Sonnet 4.5 through the Copilot SDK, delivering quality that justifies the per-use cost.

From Code to Intent
Traditional AI development required writing hundreds of lines of code to handle PPT generation logic, layout selection, style application, and error handling. With the Copilot SDK and Skills, developers describe what they want in natural language, and the agent automatically generates and executes the necessary code. What once took 3-5 days now takes 3-4 hours, with 95% less code to maintain.

Automated Business Code Generation
The Copilot SDK doesn't just provide code examples: it generates complete, executable business logic. When you request a multilingual PPT, the agent understands the requirement, selects appropriate fonts, generates the implementation code, executes it with error handling, validates the output, and returns a ready-to-use file. Developers focus on business intent rather than implementation details.

4.2 Technology Trends

The Shift to Intent-Driven Development
We're witnessing a fundamental change in how developers work. Rather than mastering every programming language detail and framework API, developers are increasingly defining what they want through declarative Skills. The Copilot SDK represents this future: you describe capabilities in natural language, and AI agents handle the code generation and execution automatically.

Edge AI and Cloud AI Integration
The evolution from pure cloud LLMs (powerful but privacy-sensitive) to pure local SLMs (private but limited) has led to today's hybrid architectures. GenGitHubRepoPPT exemplifies this trend: local models handle data analysis and structuring, while cloud models tackle complex reasoning and professional output generation. This combination delivers fast, secure, and professional results.

Democratization of Agent Development
The Copilot SDK dramatically lowers the barrier to building AI applications. Senior engineers see 10-20x productivity gains. Mid-level engineers can now build sophisticated agents that were previously beyond their reach. Even junior engineers and business experts can participate by writing Skills that capture domain knowledge without deep technical expertise.
The future isn't about whether we can build AI applications; it's about how quickly we can turn ideas into reality.

References

Projects and Code
- GenGitHubRepoPPT GitHub Repository - Case study project
- Microsoft Foundry Local - Local AI runtime
- GitHub Copilot SDK - Agent development SDK
- Copilot SDK Getting Started Tutorial - Official quick start

Deep Dive: Copilot SDK
- Build an Agent into Any App with GitHub Copilot SDK - Official announcement
- GitHub Copilot SDK Cookbook - Practical examples
- Copilot CLI Official Documentation - CLI tool documentation

Learning Resources
- Edge AI for Beginners - Edge AI introductory course
- Azure AI Foundry Documentation - Azure AI documentation
- GitHub Copilot Extensions Guide - Extension development guide
Rethinking Documentation Translation: Treating Translations as Versioned Software Assets This article is written from the perspective of maintaining large, open-source documentation repositories in the Microsoft ecosystem. I am the maintainer of Co-op Translator, an open-source tool for automating multilingual documentation translation, used across multiple large documentation repositories, including Microsoftâs For Beginners series. In large documentation repositories, translation problems rarely fail loudly. They fail quietly, and they accumulate over time. Recently, we made a fundamental design decision in how Co-op Translator handles translations. Translations are treated as versioned software assets, not static outputs. This article explains why we reached that conclusion, and what this perspective enables for teams maintaining large, fast-moving documentation repositories. When translations quietly become a liability In most documentation projects, translations are treated as finished outputs. Once a file is translated, it is assumed to remain valid until someone explicitly notices a problem. But documentation rarely stands still. Text changes. Code examples evolve. Screenshots are replaced. Notebooks are updated to reflect new behavior. The problem is that these changes are often invisible in translated content. A translation may still read fluently, while the information it contains is already out of date. At that point, the issue is no longer about translation quality. It becomes a maintenance problem. Reframing the question Most translation workflows implicitly ask: Is this translation correct? In practice, maintainers struggle with a different question: Is this translation still synchronized with the current source? This distinction matters. A translation can be correct and still be out of sync. Once we acknowledged this, it became clear that treating translations as static content was no longer sufficient. The design decision: translations as versioned assets Starting with Co-op Translator 0.16.2, we made a deliberate design decision: Translations are treated as versioned software assets. This applies not only to Markdown files, but also to images, notebooks, and any other translated artifacts. Translated content is not just text. It is an artifact generated from a specific version of a source. To make this abstraction operational rather than theoretical, we did not invent a new mechanism. Instead, we looked to systems that already solve a similar problem: pip, poetry, and npm. These tools are designed to track artifacts as their sources evolve. We applied the same thinking to translated content. Closer to dependency management than translation jobs The closest analogy is software dependency management. When a dependency becomes outdated: it is not suddenly âwrong,â it is simply no longer aligned with the current version. Translations behave the same way. When the source document changes: the translated file does not immediately become incorrect, it becomes out of sync with its source version. This framing shifts the problem away from translation output and toward state and synchronization. Why file-level versioning matters Many translation systems operate at the string or segment level. That model works well for UI text and relatively stable resources. Documentation is different. A Markdown file is an artifact. A screenshot is an artifact. A notebook is an artifact. They are consumed as units, not as isolated strings. 
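To make that concrete, here is a purely hypothetical sketch of what per-file, language-scoped state could look like. The actual Co-op Translator format differs, but it tracks the same three things discussed below: the source version, the translated artifact, and its synchronization status.

```json
{
  "_note": "Hypothetical example - not the actual Co-op Translator schema",
  "language": "ko",
  "artifacts": {
    "docs/getting-started.md": {
      "source_version": "a1b2c3d",
      "translated_file": "translations/ko/docs/getting-started.md",
      "status": "in_sync"
    },
    "docs/images/architecture.png": {
      "source_version": "e4f5a6b",
      "translated_file": "translated_images/architecture.ko.png",
      "status": "out_of_sync"
    }
  }
}
```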
Managing translation state at the file level allows maintainers to reason about translations using the same mental model they already apply to other repository assets.

What changed in practice

From embedded markers to explicit state
Previously, translation metadata lived inside translated files as embedded comments or markers. This approach had clear limitations: translation state was fragmented, difficult to inspect globally, and easy to miss as repositories grew.
We moved to language-scoped JSON state files that explicitly track:
- the source version,
- the translated artifact, and
- its synchronization status.
Translation state is no longer hidden inside content. It is a first-class, inspectable part of the repository.

Extending the model to images and notebooks
The same model now applies consistently to:
- translated images,
- localized notebooks, and
- other non-text artifacts.
If an image changes in the source language, the translated image becomes out of sync. If a notebook is updated, its translated versions are evaluated against the new source version.
The format does not matter. The lifecycle does. Once translations are treated as versioned assets, the system remains consistent across all content types.

What this enables
This design enables:
- Explicit drift detection: see which translations are out of sync without guessing.
- Consistent maintenance signals: text, images, and notebooks follow the same rules.
- Clear responsibility boundaries: the system reports state; humans decide action.
- Scalability for fast-moving repositories: translation maintenance becomes observable, not reactive.
In large documentation sets, this difference determines whether translation maintenance is sustainable at all.

What this is not
This system does not:
- judge translation quality,
- determine semantic correctness, or
- auto-approve content.
It answers one question only: is this translated artifact synchronized with its source version?

Who this is for
This approach is designed for teams that:
- maintain multilingual documentation,
- update content frequently, and
- need confidence in what is actually up to date.
When documentation evolves faster than translations, treating translations as versioned assets becomes a necessity, not an optimization.

Closing thought
Once translations are modeled as software assets, long-standing ambiguities disappear. State becomes visible. Maintenance becomes manageable. And translations fit naturally into existing software workflows.
At that point, the question is no longer whether translation drift exists, but: can you see it?

Reference
Co-op Translator repository: https://github.com/Azure/co-op-translator

The Perfect Fusion of GitHub Copilot SDK and Cloud Native
In today's rapidly evolving AI landscape, we've witnessed the transformation from simple chatbots to sophisticated agent systems. As a developer and technology evangelist, I've observed an emerging trendâit's not about making AI omnipotent, but about enabling each AI Agent to achieve excellence in specific domains. Today, I want to share an exciting technology stack: GitHub Copilot SDK (a development toolkit that embeds production-grade agent engines into any application) + Agent-to-Agent (A2A) Protocol (a communication standard enabling standardized agent collaboration) + Cloud Native Deployment (the infrastructure foundation for production systems). Together, these three components enable us to build truly collaborative multi-agent systems. 1. From AI Assistants to Agent Engines: Redefining Capability Boundaries Traditional AI assistants often pursue "omnipotence"âattempting to answer any question you throw at them. However, in real production environments, this approach faces serious challenges: Inconsistent Quality: A single model trying to write code, perform data analysis, and generate creative content struggles to achieve professional standards in each domain Context Pollution: Mixing prompts from different tasks leads to unstable model outputs Difficult Optimization: Adjusting prompts for one task type may negatively impact performance on others High Development Barrier: Building agents from scratch requires handling planning, tool orchestration, context management, and other complex logic GitHub proposed a revolutionary approachâinstead of forcing developers to build agent frameworks from scratch, provide a production-tested, programmable agent engine. This is the core value of the GitHub Copilot SDK. Evolution from Copilot CLI to SDK GitHub Copilot CLI is a powerful command-line tool that can: Plan projects and features Modify files and execute commands Use custom agents Delegate tasks to cloud execution Integrate with MCP servers The GitHub Copilot SDK extracts the agentic core behind Copilot CLI and offers it as a programmable layer for any application. This means: You're no longer confined to terminal environments You can embed this agent engine into GUI applications, web services, and automation scripts You gain access to the same execution engine validated by millions of users Just like in the real world, we don't expect one person to be a doctor, lawyer, and engineer simultaneously. Instead, we provide professional tools and platforms that enable professionals to excel in their respective domains. 2. GitHub Copilot SDK: Embedding Copilot CLI's Agentic Core into Any App Before diving into multi-agent systems, we need to understand a key technology: GitHub Copilot SDK. What is GitHub Copilot SDK? GitHub Copilot SDK (now in technical preview) is a programmable agent execution platform. It allows developers to embed the production-tested agentic core from GitHub Copilot CLI directly into any application. Simply put, the SDK provides: Out-of-the-box Agent Loop: No need to build planners, tool orchestration, or context management from scratch Multi-model Support: Choose different AI models (like GPT-4, Claude Sonnet) for different task phases Tool and Command Integration: Built-in file editing, command execution, and MCP server integration capabilities Streaming Real-time Responses: Support for progress updates on long-running tasks Multi-language Support: SDKs available for Node.js, Python, Go, and .NET Why is the SDK Critical for Building Agents? 
Building an agentic workflow from scratch is extremely difficult. You need to handle: Context management across multiple conversation turns Orchestration of tools and commands Routing between different models MCP server integration Permission control, safety boundaries, and error handling GitHub Copilot SDK abstracts away all this underlying complexity. You only need to focus on: Defining agent professional capabilities (through Skill files) Providing domain-specific tools and constraints Implementing business logic SDK Usage Examples Python Example (from actual project implementation): from copilot import CopilotClient # Initialize client copilot_client = CopilotClient() await copilot_client.start() # Create session and load Skill session = await copilot_client.create_session({ "model": "claude-sonnet-4.5", "streaming": True, "skill_directories": ["/path/to/skills/blog/SKILL.md"] }) # Send task await session.send_and_wait({ "prompt": "Write a technical blog about multi-agent systems" }, timeout=600) Skill System: Professionalizing Agents While the SDK provides a powerful execution engine, how do we make agents perform professionally in specific domains? The answer is Skill files. A Skill file is a standardized capability definition containing: Capability Declaration: Explicitly tells the system "what I can do" (e.g., blog generation, PPT creation) Domain Knowledge: Preset best practices, standards, and terminology guidelines Workflow: Defines the complete execution path from input to output Output Standards: Ensures generated content meets format and quality requirements Through the combination of Skill files + SDK, we can build truly professional agents rather than generic "jack-of-all-trades assistants." 3. A2A Protocol: Enabling Seamless Agent Collaboration Once we have professional agents, the next challenge is: how do we make them work together? This is the core problem the Agent-to-Agent (A2A) Protocol aims to solve. Three Core Mechanisms of A2A Protocol 1. Agent Discovery (Service Discovery) Each agent exposes its capability card through the standardized /.well-known/agent-card.json endpoint, acting like a business card that tells other agents "what I can do": { "name": "blog_agent", "description": "Blog generation with DeepSearch", "primaryKeywords": ["blog", "article", "write"], "skills": [{ "id": "blog_generation", "tags": ["blog", "writing"], "examples": ["Write a blog about..."] }], "capabilities": { "streaming": true } } 2. Intelligent Routing The Orchestrator matches tasks with agent capabilities through scoring. The project's routing algorithm implements keyword matching and exclusion detection: Positive Matching: If a task contains an agent's primaryKeywords, score +0.5 Negative Exclusion: If a task contains other agents' keywords, score -0.3 This way, when users say "write a blog about cloud native," the system automatically selects the Blog Agent; when they say "create a tech presentation PPT," it routes to the PPT Agent. 3. SSE Streaming (Real-time Streaming) For time-consuming tasks (like generating a 5000-word blog), A2A uses Server-Sent Events to push real-time progress, allowing users to see the agent working instead of just waiting. This is crucial for user experience. 4. Cloud Native Deployment: Making Agent Systems Production-Ready Even the most powerful technology is just a toy if it can't be deployed to production environments. This project demonstrates a complete deployment of a multi-agent system to a cloud-native platform (Azure Container Apps). 
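Before looking at the deployment side, here is a simplified sketch of the keyword-based routing score described in section 3. The project's orchestrator may implement the details differently (for example, how exclusions are counted or whether scores are clamped), so treat this as an illustration of the idea rather than the actual code.

```python
# Simplified sketch of the keyword-routing score described in section 3:
# +0.5 for each of an agent's own primaryKeywords found in the task,
# -0.3 when the task mentions another agent's keywords. Scores are clamped
# at zero here purely so the example matches the numbers quoted above.

def route_task(task: str, agent_cards: dict[str, dict]) -> str:
    """Return the name of the agent whose capability card best matches the task."""
    task_lower = task.lower()
    scores: dict[str, float] = {}

    for name, card in agent_cards.items():
        score = 0.0
        # Positive matching against this agent's own keywords.
        for keyword in card["primaryKeywords"]:
            if keyword in task_lower:
                score += 0.5
        # Negative exclusion: penalize if the task mentions other agents' keywords.
        for other_name, other_card in agent_cards.items():
            if other_name != name and any(
                keyword in task_lower for keyword in other_card["primaryKeywords"]
            ):
                score -= 0.3
        scores[name] = max(score, 0.0)

    return max(scores, key=scores.get)


agents = {
    "blog_agent": {"primaryKeywords": ["blog", "article", "write"]},
    "ppt_agent": {"primaryKeywords": ["ppt", "presentation", "slides"]},
}
print(route_task("write a blog about cloud native", agents))  # -> blog_agent
```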
Why Choose Cloud Native? Elastic Scaling: When blog generation requests surge, the Blog Agent can auto-scale; it scales down to zero during idle times to save costs Independent Evolution: Each agent has its own Docker image and deployment pipeline; updating the Blog Agent doesn't affect the PPT Agent Fault Isolation: If one agent crashes, it won't bring down the entire system; the Orchestrator automatically degrades Global Distribution: Through Azure Container Apps, agents can be deployed across multiple global regions to reduce latency Container Deployment Essentials Each agent in the project has a standardized Dockerfile: FROM python:3.12-slim WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . EXPOSE 8001 CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"] Combined with the deploy-to-aca.sh script, one-click deployment to Azure: # Build and push image az acr build --registry myregistry --image blog-agent:latest . # Deploy to Container Apps az containerapp create \ --name blog-agent \ --resource-group my-rg \ --environment my-env \ --image myregistry.azurecr.io/blog-agent:latest \ --secrets github-token=$COPILOT_TOKEN \ --env-vars COPILOT_GITHUB_TOKEN=secretref:github-token 5. Real-World Results: From "Works" to "Works Well" Let's see how this system performs in real scenarios. Suppose a user initiates a request: "Write a technical blog about Kubernetes multi-tenancy security, including code examples and best practices" System Execution Flow: Orchestrator receives the request and scans all agents' capability cards Keyword matching: "write" + "blog" â Blog Agent scores 1.0, PPT Agent scores 0.0 Routes to Blog Agent, loads technical writing Skill Blog Agent initiates DeepSearch to collect latest K8s security materials SSE real-time push: "Collecting materials..." â "Generating outline..." â "Writing content..." Returns complete blog after 5 minutes, including code highlighting, citation sources, and best practices summary Compared to traditional "omnipotent" AI assistants, this system's advantages: â Professionalism: Blog Agent trained with technical writing Skills produces content with clear structure, accurate terminology, and executable code â Visibility: Users see progress throughout, knowing what the AI is doing â Extensibility: Adding new agents (video script, data analysis) in the future requires no changes to existing architecture 6. Key Technical Challenges and Solutions Challenge 1: Inaccurate Agent Capability Descriptions Leading to Routing Errors Solution: Define clear primaryKeywords and examples in Agent Cards Implement exclusion detection mechanism to prevent tasks from being routed to unsuitable agents Challenge 2: Poor User Experience for Long-Running Tasks Solution: Fully adopt SSE streaming, pushing working/completed/error status in real-time Display progress hints in status messages so users know what the system is doing Challenge 3: Sensitive Information Leakage Risk Solution: Use Azure Key Vault or Container Apps Secrets to manage GitHub Tokens Inject via environment variables, never hardcode in code or images Check required environment variables in deployment scripts to prevent configuration errors 7. Future Outlook: SDK-Driven Multi-Agent Ecosystem This project is just the beginning. 
As GitHub Copilot SDK and A2A Protocol mature, we can build richer agent ecosystems: Actual SDK Application Scenarios According to GitHub's official blog, development teams have already used the Copilot SDK to build: YouTube Chapter Generator: Automatically generates timestamped chapter markers for videos Custom Agent GUIs: Visual agent interfaces for specific business scenarios Speech-to-Command Workflows: Control desktop applications through voice AI Battle Games: Interactive competitive experiences with AI Intelligent Summary Tools: Automatic extraction and summarization of key information Multi-Agent System Evolution Directions đŞ Agent Marketplace: Developers can publish specialized agents (legal documents, medical reports, etc.) that plug-and-play via A2A protocol đ Cascade Orchestration: Orchestrator automatically breaks down complex tasks, calling multiple agents collaboratively (e.g., "write blog + generate images + create PPT") đ Cross-Platform Interoperability: Based on A2A standards, agents developed by different companies can call each other, breaking down data silos âď¸ Automated Workflows: Delegate routine repetitive work to agent chains, letting humans focus on creative work đŻ Vertical Domain Specialization: Combined with Skill files, build high-precision agents in professional fields like finance, healthcare, and legal Core Value of the SDK The significance of GitHub Copilot SDK lies in: it empowers every developer to become a builder of agent systems. You don't need deep learning experts, you don't need to implement agent frameworks yourself, and you don't even need to manage GPU clusters. You only need to: Install the SDK (npm install github/copilot-sdk) Define your business logic and tools Write Skill files describing professional capabilities Call the SDK's execution engine And you can build production-grade intelligent agent applications. Summary: From Demo to Production GitHub Copilot SDK + A2A + Cloud Native isn't three independent technology stacks, but a complete methodology: GitHub Copilot SDK provides an out-of-the-box agent execution engineâhandling planning, tool orchestration, context management, and other underlying complexity Skill files enable agents with domain-specific professional capabilitiesâdefining best practices, workflows, and output standards A2A Protocol enables standardized communication and collaboration between agentsâimplementing service discovery, intelligent routing, and streaming Cloud Native makes the entire system production-readyâcontainerization, elastic scaling, fault isolation For developers, this means we no longer need to build agent frameworks from scratch or struggle with the black magic of prompt engineering. We only need to: Use GitHub Copilot SDK to obtain a production-grade agent execution engine Write domain-specific Skill files to define professional capabilities Follow A2A protocol to implement standard interfaces between agents Deploy to cloud platforms through containerization And we can build AI Agent systems that are truly usable, well-designed, and production-ready. đ Start Building Complete project code is open source: https://github.com/kinfey/Multi-AI-Agents-Cloud-Native/tree/main/code/GitHubCopilotAgents_A2A Follow the README guide and deploy your first Multi-Agent system in 30 minutes! 
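As one last illustration of the agent discovery mechanism from section 3, here is a hypothetical sketch of how an agent might expose its capability card over HTTP. The field names follow the example card shown earlier, and the web framework is assumed from the project's uvicorn-based Dockerfile; the actual implementation may differ.

```python
# Hypothetical sketch: serving an A2A capability card at the well-known path.
from fastapi import FastAPI

app = FastAPI()

AGENT_CARD = {
    "name": "blog_agent",
    "description": "Blog generation with DeepSearch",
    "primaryKeywords": ["blog", "article", "write"],
    "capabilities": {"streaming": True},
}

@app.get("/.well-known/agent-card.json")
def agent_card() -> dict:
    # The orchestrator (or other agents) fetches this card to discover capabilities.
    return AGENT_CARD
```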
References
- GitHub Copilot SDK Official Announcement - Build an agent into any app with the GitHub Copilot SDK
- GitHub Copilot SDK Repository - github.com/github/copilot-sdk
- A2A Protocol Official Specification - a2a-protocol.org/latest/
- Project Source Code - Multi-AI-Agents-Cloud-Native
- Azure Container Apps Documentation - learn.microsoft.com/azure/container-apps

What is trending in Hugging Face on Microsoft Foundry? Feb 2, 2026
Openâsource AI is moving fast, with important breakthroughs in reasoning, agentic systems, multimodality, and efficiency emerging every day. Hugging Face has been a leading platform where researchers, startups, and developers share and discover new models. Microsoft Foundry brings these trending Hugging Face models into a productionâready experience, where developers can explore, evaluate, and deploy them within their Azure environment. Our weekly Model Mondayâs series highlights Hugging Face models available in Foundry, focusing on what matters most to developers: why a model is interesting, where it fits, and how to put it to work quickly. This weekâs Model Mondays edition highlights three Hugging Face models, including a powerful Mixture-of-Experts model from Z. AI designed for lightweight deployment, Metaâs unified foundation model for image and video segmentation, and MiniMaxâs latest open-source agentic model optimized for complex workflows. Models of the week Z.AIâs GLM-4.7-flash Model Basics Model name: zai-org/GLM-4.7-Flash Parameters / size: 30B total -3B active Default settings: 131,072 max new tokens Primary task: Agentic, Reasoning and Coding Why this model matters Why itâs interesting: It utilizes a Mixture-of-Experts (MoE) architecture (30B total parameters and 3B active parameters) to offer a new option for lightweight deployment. It demonstrates strong performance on logic and reasoning benchmarks, outperforming similar sized models like gpt-oss-20b on AIME 25 and GPQA benchmarks. It supports advanced inference features like "Preserved Thinking" mode for multi-turn agentic tasks. Bestâfit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications. Whatâs notable: From the Foundry catalog, users can deploy on a A100 instance or unsloth/GLM-4.7-Flash-GGUF on a CPU. ource SOTA scores among models of comparable size. Additionally, compared to similarly sized models, GLM-4.7-Flash demonstrates superior frontend and backend development capabilities. Click to see more: https://docs.z.ai Try it Use case Bestâpractice prompt pattern Agentic coding (multiâstep repo work, debugging, refactoring) Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and stepâbyâstep execution, then a single consolidated result. Longâcontext agent workflows (local or lowâcost autonomous agents) Call out longâhorizon consistency and context preservation. Instruct the model to retain earlier assumptions and decisions across turns. Now that you know GLMâ4.7âFlash works best when you give it a clear goal and let it reason through a bounded task, hereâs an example prompt that a product or engineering team might use to identify risks and propose mitigations: You are a software reliability analyst for a midâscale SaaS platform. Review recent incident reports, production logs, and customer issues to uncover edgeâcase failures outside normal usage (e.g., rare inputs, boundary conditions, timing/concurrency issues, config drift, or unexpected feature interactions). Prioritize lowâfrequency, highâimpact risks that standard testing misses. Recommend minimal, lowâcost fixes (validation, guardrails, fallback logic, or documentation). Deliver a concise executive summary with sections: Observed Edge Cases, Root Causes, User Impact, Recommended Lightweight Fixes, and Validation Steps. 
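Once GLM-4.7-Flash is deployed to a managed endpoint in Foundry, a prompt like the one above can be sent from a few lines of Python. The sketch below uses the azure-ai-inference client; the endpoint URL, key handling, and generation settings are placeholders to adapt to your own deployment.

```python
# Rough sketch: calling a GLM-4.7-Flash deployment on a Foundry managed endpoint.
# Endpoint URL and key are placeholders; use the values from your deployment.
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["FOUNDRY_ENDPOINT"],  # e.g. your deployment's inference URL
    credential=AzureKeyCredential(os.environ["FOUNDRY_API_KEY"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a software reliability analyst for a mid-scale SaaS platform."),
        UserMessage(content="Review the attached incident reports and propose lightweight fixes "
                            "for low-frequency, high-impact edge cases."),
    ],
    temperature=0.3,
    max_tokens=2048,
)

print(response.choices[0].message.content)
```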
Meta's Segment Anything 3 (SAM3) Model Basics Model name: facebook/sam3 Parameters / size: 0.9B Primary task: Mask Generation, Promptable Concept Segmentation (PCS) Why this model matters Why itâs interesting: It handles a vastly larger set of open-vocabulary prompts than SAM 2, and unifies image and video segmentation capabilities. It includes a "SAM 3 Tracker" mode that acts as a drop-in replacement for SAM 2 workflows with improved performance. Bestâfit use cases: Open-vocabulary object detection, video object tracking, and automatic mask generation Whatâs notable: Introduces Promptable Concept Segmentation (PCS), allowing users to find all matching objects (e.g., "dial") via text prompt rather than just single instances. Try it This model enables users to identify specific objects within video footage and isolate them over extended periods. With just one line of code, it is possible to detect multiple similar objects simultaneously. The accompanying GIF demonstrates how SAM3 efficiently highlights players wearing white on the field as they appear and disappear from view. Additional examples are available at the following repository: https://github.com/facebookresearch/sam3/blob/main/assets/player.gif Use case Bestâpractice prompt pattern Agentic coding (multiâstep repo work, debugging, refactoring) Treat SAMâŻ3 as a concept detector, not an interactive click tool. Use short, concrete nounâphrase concept prompts instead of describing the scene or asking questions. Example prompt: âyellow school busâ or âshipping containersâ. Avoid verbs or full sentences. Video segmentation + object tracking Specify the same concept prompt once, then apply it across the video sequence. Do not restate the prompt per frame. Let the model maintain identity continuity. Example: âperson wearing a red jerseyâ. Hardâtoâname or visually subtle objects Use exemplarâbased prompts (image region or box) when text alone is ambiguous. Optionally combine positive and negative exemplars to refine the concept. Avoid overâconstraining with long descriptions. Using the GIF above as a leading example, here is a prompt that shows how SAMâŻ3 turns raw sports footage into structured, reusable data. By identifying and tracking players based on visual concepts like jersey color so that sports leagues can turn tracked data into interactive experiences where automated player identification can relay stats, fun facts, etc when built into a larger application. Here is a prompt that will allow you to start identifying specific players across video: Act as a sports analytics operator analyzing football match footage. Segment and track all football players wearing blue jerseys across the video. Generate pixelâaccurate segmentation masks for each player and assign persistent instance IDs that remain stable during camera movement, zoom, and player occlusion. Exclude referees, opposing team jerseys, sidelines, and crowd. Output frameâlevel masks and tracking metadata suitable for overlays, player statistics, and downstream analytics pipelines. MiniMax AI's MiniMax-M2.1 Model Basics Model name: MiniMaxAI/MiniMax-M2.1 Parameters / size: 229B-10B Active Default settings: 200,000 max new tokens Primary task: Agentic and Coding Why this model matters Why itâs interesting: It is optimized for robustness in coding, tool use, and long-horizon planning, outperforming Claude Sonnet 4.5 in multilingual scenarios. It excels in full-stack application development, capable of architecting apps "from zero to oneâ. 
Where previous coding models focused on Python optimization, M2.1 brings enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages. The model delivers exceptional stability across a range of coding agent frameworks.
Best-fit use cases: Full-stack application development, multi-step agentic coding, and long-horizon tool-using workflows.
What's notable: The release of open-source weights for M2.1 delivers a massive leap over M2 on software engineering leaderboards. https://www.minimax.io/

Try it

| Use case | Best-practice prompt pattern |
| --- | --- |
| End-to-end agentic coding (multi-file edits, run-fix loops) | Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step-by-step execution, then a single consolidated result. |
| Long-horizon tool-using agents (shell, browser, Python) | Explicitly request stepwise planning and sequential tool use. M2.1's interleaved thinking and improved instruction-constraint handling are designed for complex, multi-step analytical tasks that require evidence tracking and coherent synthesis, not conversational back-and-forth. |
| Long-context reasoning & analysis (large documents / logs) | Declare the scope and desired output structure up front. MiniMax-M2.1 performs best when the objective and final artifact are clear, allowing it to manage long context and maintain coherence. |

Because MiniMax-M2.1 is designed to act as a long-horizon analytical agent, it shines when you give it a clear end goal and let it work through large volumes of information. Here is a prompt a risk or compliance team could use in practice:

You are a financial risk analysis agent. Analyze the following transaction logs and compliance policy documents to identify potential regulatory violations and systemic risk patterns. Plan your approach before executing. Work through the data step by step, referencing evidence where relevant. Deliver a final report with the following sections: Key Risk Patterns Identified, Supporting Evidence, Potential Regulatory Impact, Recommended Mitigations. Your response should be a complete, executive-ready report, not a conversational draft.

Getting started
You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model, then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using the Microsoft Foundry documentation.
Follow along with the Model Mondays series and its GitHub repo to stay up to date on the latest models.
- Read the Hugging Face on Azure docs
- Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
- Explore models in Microsoft Foundry