Building a Dual Sidecar Pod: Combining GitHub Copilot SDK with Skill Server on Kubernetes
Why the Sidecar Pattern?

In Kubernetes, a Pod is the smallest deployable unit: a single Pod can contain multiple containers that share the same network namespace and storage volumes. The Sidecar pattern places auxiliary containers alongside the main application container within the same Pod. These Sidecar containers extend or enhance the main container's functionality without modifying it.

💡 Beginner Tip: If you're new to Kubernetes, think of a Pod as a shared office. Everyone in the room (containers) has their own desk (process), but they share the same network (IP address), the same file cabinet (storage volumes), and can communicate without leaving the room (localhost communication).

The Sidecar pattern is not a new concept. As early as 2015, the official Kubernetes blog described this pattern in a post about Composite Containers. Service mesh projects like Envoy, Istio, and Linkerd extensively use Sidecar containers for traffic management, observability, and security policies. In the AI application space, we are now exploring how to apply this proven pattern to new scenarios.

Why does this matter? There are three fundamental reasons:

1. Separation of Concerns

Each container in a Pod has a single, well-defined responsibility. The main application container doesn't need to know how AI content is generated or how skills are managed; it only serves the results. This separation allows each component to be independently tested, debugged, and replaced, aligning with the Unix philosophy of "do one thing well."

In practice, this means: the frontend team can iterate on Nginx configuration without affecting AI logic; AI engineers can upgrade the Copilot SDK version without touching skill management code; and operations staff can adjust skill configurations without notifying the development team.

2. Shared Localhost Network

All containers in a Pod share the same network namespace, with the same 127.0.0.1.
This means communication between Sidecars is just a simple localhost HTTP call: no service discovery, no DNS resolution, no cross-node network hops.

From a performance perspective, localhost communication traverses the kernel's loopback interface, with latency typically in the microsecond range. In contrast, cross-Pod ClusterIP Service calls are routed through kube-proxy's iptables/IPVS rules and may cross nodes, so their latency is higher and can reach the millisecond range. For AI agent scenarios that require frequent interaction, this difference is meaningful.

From a security perspective, loopback traffic never leaves the Pod's network namespace, so other Pods in the cluster cannot observe it. Note that a Sidecar listening on 0.0.0.0 is still reachable from inside the cluster through the Pod IP, even without a Service; bind Sidecars to 127.0.0.1 or apply a NetworkPolicy if their ports must stay private to the Pod.

3. Efficient Data Transfer via Shared Volumes

Kubernetes emptyDir volumes allow containers within the same Pod to share files on disk. Once a Sidecar writes a file, the main container can immediately read and serve it: no message queues, no additional API calls, no databases. This is ideal for workflows where one container produces artifacts (such as generated blog posts) and another consumes them.

⚠️ Technical Precision Note: "Efficient" here means eliminating the overhead of network serialization/deserialization and message middleware. However, emptyDir fundamentally relies on standard file system I/O (disk read/write or tmpfs) and is not equivalent to OS-level "Zero-Copy" (such as the sendfile() system call or DMA direct memory access). For blog content generation, a file-level data transfer use case, filesystem sharing is already highly efficient and sufficiently simple.
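One practical detail of handing files over through a shared volume is that a reader may open a file while the writer is still producing it. The following is a minimal sketch of one common way to avoid that, assuming both containers mount the same directory: write to a temporary file in that directory and then rename it into place. The function name `publish_atomically` and the `.tmp-` prefix are hypothetical and not taken from the project.

```python
import os
import tempfile
from pathlib import Path


def publish_atomically(shared_dir: Path, name: str, content: str) -> Path:
    """Write a file so that readers never observe a partially written copy."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the destination directory so the final rename
    # stays on one filesystem, where os.replace is atomic on POSIX systems.
    fd, tmp_path = tempfile.mkstemp(dir=shared_dir, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        # mkstemp creates the file with mode 0600; relax it so a reader
        # running under a different UID (for example Nginx) can open it.
        os.chmod(tmp_path, 0o644)
        final_path = shared_dir / name
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


if __name__ == "__main__":
    # A temporary directory stands in for the emptyDir mount point.
    with tempfile.TemporaryDirectory() as volume:
        path = publish_atomically(Path(volume) / "blog", "blog-2026-01-01.md", "# Hello\n")
        print(path.read_text(encoding="utf-8"))
```

With this approach the consumer either sees the previous version of the file or the complete new one, which matters when a directory listing is served while generation is still in progress.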
In the gh-cli-blog-agent project, we take this pattern to its fullest extent by using two Sidecars within a single Pod: a Copilot agent and a skill server.

A Note on Kubernetes Native Sidecar Containers

It is worth noting that Kubernetes 1.28 (August 2023) introduced native Sidecar container support via KEP-753, which reached GA (General Availability) in Kubernetes 1.33 (April 2025). Native Sidecars are implemented by setting restartPolicy: Always on initContainers, providing capabilities that the traditional approach lacks:

- Deterministic startup order: init containers start in declaration order; main containers only start after the Sidecar containers have started (and passed their startup probe, if one is defined)
- Non-blocking Pod termination: Sidecars are automatically cleaned up after main containers exit, preventing Jobs/CronJobs from being stuck
- Probe support: Sidecars can be configured with startup, readiness, and liveness probes to signal their operational state

This project currently uses the traditional approach of deploying Sidecars as regular containers, with application-level health check polling (wait_for_skill_server) to handle startup dependencies. This approach is compatible with all Kubernetes versions (1.24+), making it suitable for scenarios requiring broad compatibility.

If your cluster version is ≥ 1.29 (or ≥ 1.33 for GA stability), we strongly recommend migrating to native Sidecars for platform-level startup order guarantees and more graceful lifecycle management.
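The version thresholds above can be written down as a small check, for instance in a deployment script that decides which manifest variant to apply. This is only an illustrative sketch: the function names are hypothetical, and the stages reflect the upstream feature history (alpha in 1.28 behind the SidecarContainers feature gate, beta and enabled by default from 1.29, GA in 1.33).

```python
import re


def parse_version(git_version: str) -> tuple:
    """Extract (major, minor) from strings such as 'v1.30.2' or '1.29.4-gke.100'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def native_sidecar_status(major: int, minor: int) -> str:
    """Return the maturity of native Sidecar support for a cluster version."""
    version = (major, minor)
    if version >= (1, 33):
        return "ga"
    if version >= (1, 29):
        return "beta"         # feature gate enabled by default
    if version == (1, 28):
        return "alpha"        # requires enabling the SidecarContainers gate
    return "unavailable"      # fall back to regular containers plus polling


if __name__ == "__main__":
    for reported in ("v1.27.9", "v1.28.3", "v1.30.2", "v1.33.0"):
        print(reported, native_sidecar_status(*parse_version(reported)))
```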
Migration example:

    # Native Sidecar syntax (Kubernetes 1.29+)
    initContainers:
      - name: skill-server
        image: blog-agent-skill
        restartPolicy: Always      # Key: marks this as a Sidecar
        ports:
          - containerPort: 8002
        startupProbe:              # Platform-level startup readiness signal
          httpGet:
            path: /health
            port: 8002
          periodSeconds: 2
          failureThreshold: 30
      - name: copilot-agent
        image: blog-agent-copilot
        restartPolicy: Always
        ports:
          - containerPort: 8001
    containers:
      - name: blog-app             # Main container starts last; Sidecars are ready
        image: blog-agent-main
        ports:
          - containerPort: 80

Architecture Overview

The deployment defines three containers and three volumes:

| Container | Image | Port | Role |
|---|---|---|---|
| blog-app | blog-agent-main | 80 | Nginx: serves Web UI and reverse proxies to Sidecars |
| copilot-agent | blog-agent-copilot | 8001 | FastAPI: AI blog generation powered by GitHub Copilot SDK |
| skill-server | blog-agent-skill | 8002 | FastAPI: skill file management and synchronization |

| Volume | Type | Purpose |
|---|---|---|
| blog-data | emptyDir | Copilot agent writes generated blogs; Nginx serves them |
| skills-shared | emptyDir | Skill server writes skill files; Copilot agent reads them |
| skills-source | ConfigMap | Kubernetes-managed skill definition files (read-only) |

💡 Design Insight: The three-volume design embodies the "least privilege" principle. blog-data is shared only between the Copilot agent (write) and Nginx (read); skills-shared is shared only between the skill server (write) and the Copilot agent (read). skills-source provides read-only skill definition sources via ConfigMap, forming a unidirectional data flow: ConfigMap → skill-server → shared volume → copilot-agent.
The Kubernetes deployment YAML clearly describes this structure:

    volumes:
      - name: blog-data
        emptyDir:
          sizeLimit: 256Mi   # Production best practice: always set sizeLimit to prevent disk exhaustion
      - name: skills-shared
        emptyDir:
          sizeLimit: 64Mi    # Skill files are typically small
      - name: skills-source
        configMap:
          name: blog-agent-skill

⚠️ Production Recommendation: The original configuration used emptyDir: {} without a sizeLimit. In production, an unrestricted emptyDir can grow until it exhausts the node's disk space, triggering a node-level DiskPressure condition and causing other Pods to be evicted. Setting a reasonable sizeLimit on every emptyDir is a widely recommended hardening practice, and community tools like Kyverno can enforce it at the cluster level.

Nginx reverse-proxies requests to the Sidecars via localhost:

    # Reverse proxy to copilot-agent sidecar (localhost:8001 within the same Pod)
    location /agent/ {
        proxy_pass http://127.0.0.1:8001/;
        proxy_set_header Host $host;
        proxy_set_header X-Request-ID $request_id;  # Enables cross-container request tracing
        proxy_read_timeout 600s;                    # AI generation may take a while
    }

    # Reverse proxy to skill-server sidecar (localhost:8002 within the same Pod)
    location /skill/ {
        proxy_pass http://127.0.0.1:8002/;
        proxy_set_header Host $host;
    }

Since all three containers share the same network namespace, 127.0.0.1:8001 and 127.0.0.1:8002 are directly accessible, and no ClusterIP Service is needed for intra-Pod communication. This is a core feature of the Kubernetes Pod networking model: all containers within the same Pod share a single network namespace, including IP address and port space.

Advantage 1: GitHub Copilot SDK as a Sidecar

Encapsulating the GitHub Copilot SDK as a Sidecar, rather than embedding it in the main application, provides several architectural advantages.

Understanding the GitHub Copilot SDK Architecture

Before diving deeper, let's understand how the GitHub Copilot SDK works.
The SDK entered technical preview in January 2026, exposing the production-grade agent runtime behind GitHub Copilot CLI as a programmable SDK supporting Python, TypeScript, Go, and .NET.

The SDK's communication architecture is as follows: the SDK client communicates with a locally running Copilot CLI process via the JSON-RPC protocol. The CLI handles model routing, authentication management, MCP server integration, and other low-level details. This means you don't need to build your own planner, tool loop, and runtime; these are all provided by an engine that has been battle-tested in production at GitHub's scale.

The benefit of encapsulating this SDK in a Sidecar container is that containerization isolates the CLI process's dependencies and runtime environment, preventing dependency conflicts with the main application or other components.

Cross-Platform Node.js Installation in the Container

A notable implementation detail is how Node.js (required by the Copilot CLI) is installed inside the container. Rather than relying on third-party APT repositories like NodeSource, which can introduce DNS resolution failures and GPG key management issues in restricted network environments, the Dockerfile downloads the official Node.js binary directly from nodejs.org with automatic architecture detection:

    # Install Node.js 20+ (official binary, no NodeSource APT repo needed)
    ARG NODE_VERSION=20.20.0
    RUN DPKG_ARCH=$(dpkg --print-architecture) \
        && case "${DPKG_ARCH}" in amd64) ARCH=x64;; arm64) ARCH=arm64;; armhf) ARCH=armv7l;; *) ARCH=${DPKG_ARCH};; esac \
        && curl -fsSL "https://nodejs.org/dist/v${NODE_VERSION}/node-v${NODE_VERSION}-linux-${ARCH}.tar.xz" -o node.tar.xz \
        && tar -xJf node.tar.xz -C /usr/local --strip-components=1 --no-same-owner \
        && rm -f node.tar.xz

The case statement maps Debian's architecture identifiers (amd64, arm64, armhf) to Node.js's naming convention (x64, arm64, armv7l).
This ensures the same Dockerfile works on both linux/amd64 (Intel/AMD) and linux/arm64 (Apple Silicon, AWS Graviton) build platforms, an important consideration given the growing adoption of ARM-based infrastructure.

Independent Lifecycle and Resource Management

The Copilot agent is the most resource-intensive component: it needs to run the Copilot CLI process, manage JSON-RPC communication, and handle streaming responses. By isolating it in its own container, we can assign dedicated CPU and memory limits without affecting the lightweight Nginx container:

    # copilot-agent: needs more resources for AI inference coordination
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 2Gi

    # blog-app: lightweight Nginx with minimal resource needs
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 128Mi

This resource isolation delivers two key benefits:

- Fault isolation: If the Copilot agent crashes due to a timeout or memory spike (OOMKilled), Kubernetes only restarts that container; the Nginx frontend continues running and serving previously generated content. Users see "generation feature temporarily unavailable" rather than "entire site is down."
- Fine-grained resource scheduling: The Kubernetes scheduler selects nodes based on the sum of Pod-level resource requests. Distributing resource requests across containers allows kubelet to more precisely track each component's actual resource consumption, helping HPA (Horizontal Pod Autoscaler) make better scaling decisions.

Graceful Startup Coordination

In a multi-Sidecar Pod, regular containers start concurrently (note: this is precisely one of the issues that native Sidecars, discussed earlier, can solve).
The Copilot agent handles this through application-level startup dependency checks: it waits for the skill server to become healthy before initializing the CopilotClient:

    import asyncio
    import logging

    import httpx

    logger = logging.getLogger(__name__)


    async def wait_for_skill_server(url: str, retries: int = 30, delay: float = 2.0):
        """Wait for the skill-server sidecar to become healthy.

        In traditional Sidecar deployments (regular containers), containers start
        concurrently with no guaranteed startup order. This function implements
        application-level readiness waiting.

        If using Kubernetes native Sidecars (initContainers + restartPolicy: Always),
        the platform guarantees Sidecars start before main containers, which can
        simplify this logic.
        """
        async with httpx.AsyncClient() as client:
            for i in range(retries):
                try:
                    resp = await client.get(f"{url}/health", timeout=5.0)
                    if resp.status_code == 200:
                        logger.info(f"Skill server is healthy at {url}")
                        return True
                except httpx.HTTPError:
                    # Connection refused or timed out: the server is not up yet.
                    pass
                logger.info(f"Waiting for skill server... ({i + 1}/{retries})")
                await asyncio.sleep(delay)
        raise RuntimeError(f"Skill server at {url} did not become healthy")

This pattern is critical in traditional Sidecar architectures: you cannot assume startup order, so explicit readiness checks are necessary. The wait_for_skill_server function polls http://127.0.0.1:8002/health up to 30 times with a 2-second pause between attempts. That is 60 seconds of waiting between attempts, plus up to 5 seconds of request timeout per attempt in the worst case. The approach is simple, effective, and resilient.

💡 Comparison: With native Sidecars, the skill-server would be declared as an initContainer with a startupProbe. Kubernetes would ensure the skill-server is ready before starting the copilot-agent. In that case, wait_for_skill_server could be simplified to a single health check confirmation rather than a retry loop.
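Because each attempt can also spend time in its own request timeout, a retry count alone does not bound the total wait. The sketch below shows one possible variant that enforces an overall deadline instead. It is not the project's code: `wait_until_healthy` and its parameters are hypothetical, and the health check is injected as an async callable so the loop can be exercised without a running server (in the agent it would wrap the HTTP GET to /health).

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_healthy(
    probe: Callable[[], Awaitable[bool]],
    deadline_s: float = 60.0,
    delay_s: float = 2.0,
    probe_timeout_s: float = 5.0,
) -> int:
    """Call probe until it returns True; return the number of attempts made.

    Raises TimeoutError once deadline_s seconds have elapsed in total.
    """
    deadline = time.monotonic() + deadline_s
    attempts = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"not healthy after {attempts} attempts in {deadline_s}s")
        attempts += 1
        try:
            # Never let a single probe run past the overall deadline.
            healthy = await asyncio.wait_for(probe(), timeout=min(probe_timeout_s, remaining))
        except Exception:
            # Treat connection errors and probe timeouts as "not ready yet".
            healthy = False
        if healthy:
            return attempts
        await asyncio.sleep(min(delay_s, max(0.0, deadline - time.monotonic())))


if __name__ == "__main__":
    state = {"calls": 0}

    async def fake_probe() -> bool:
        # Simulated server that becomes healthy on the third check.
        state["calls"] += 1
        return state["calls"] >= 3

    print(asyncio.run(wait_until_healthy(fake_probe, deadline_s=5.0, delay_s=0.1)))
```

Injecting the probe keeps the waiting policy separate from the HTTP client, so the same loop could also serve as the single confirmation check mentioned above by passing a short deadline.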
SDK Configuration via Environment Variables

All Copilot SDK configuration is passed through Kubernetes-native primitives, reflecting the 12-Factor App principle of externalized configuration:

    env:
      - name: SKILL_SERVER_URL
        value: "http://127.0.0.1:8002"
      - name: SKILLS_DIR
        value: "/skills-shared/blog/SKILL.md"
      - name: COPILOT_GITHUB_TOKEN
        valueFrom:
          secretKeyRef:
            name: blog-agent-secret
            key: copilot-github-token

Key design decisions explained:

- COPILOT_GITHUB_TOKEN is stored in a Kubernetes Secret, never baked into images or passed as build arguments. Using the GitHub Copilot SDK requires a valid GitHub Copilot subscription (unless using BYOK mode, i.e., Bring Your Own Key), making secure management of this token critical.
- SKILLS_DIR points to skill files synchronized to a shared volume by the other Sidecar. This means the Copilot agent container image is completely stateless and can be reused across different skill configurations.
- SKILL_SERVER_URL uses 127.0.0.1 instead of a service name. Since this is intra-Pod communication, DNS resolution is unnecessary.

🔐 Production Security Tip: For stricter security requirements, consider using External Secrets Operator to sync Secrets from AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault, rather than managing them directly in Kubernetes. Native Kubernetes Secrets are only Base64-encoded by default, not encrypted at rest (unless Encryption at Rest is enabled).

CopilotClient Sessions and Skill Integration

The core of the Copilot Sidecar lies in how it creates sessions with skill directories. When a blog generation request is received, it creates a session with access to skill definitions:

    session = await copilot_client.create_session({
        "model": "claude-sonnet-4-5-20250929",
        "streaming": True,
        "skill_directories": [SKILLS_DIR]
    })

The skill_directories parameter points to files on the shared volume, files placed there by the skill-server sidecar.
This is the handoff point: the skill server manages which skills are available, and the Copilot agent consumes them. Neither container needs to know about the other's internal implementation; they are coupled only through the filesystem as an implicit contract.

💡 About Copilot SDK Skills: The GitHub Copilot SDK allows you to define custom Agents, Skills, and Tools. Skills are essentially instruction sets written in Markdown format (typically named SKILL.md) that define the agent's behavior, constraints, and workflows in a specific domain. This is consistent with the .copilot_skills/ directory mechanism in GitHub Copilot CLI.

File-Based Output to Shared Volumes

Generated blog posts are written to the blog-data shared volume, which is simultaneously mounted in the Nginx container:

    BLOG_DIR = os.path.join(WORK_DIR, "blog")
    # ...
    # Blog saved as blog-YYYY-MM-DD.md
    # Nginx can serve it immediately from /blog/ without any restart

The Nginx configuration auto-indexes this directory:

    location /blog/ {
        alias /usr/share/nginx/html/blog/;
        autoindex on;
    }

The moment the Copilot agent writes a file, it's immediately accessible through the Nginx Web UI. No API calls, no database writes, no cache invalidation: just a shared filesystem.

This file-based data transfer has an additional benefit: auditability. Each blog exists as an independent Markdown file with a date stamp in its name, making it easy to trace generation history. (Note, however, that emptyDir lifecycle is tied to the Pod, so data is lost when the Pod is recreated. For persistence needs, see the "Production Recommendations" section below.)

Advantage 2: Skill Server as a Sidecar

The skill server is the second Sidecar: a lightweight FastAPI service responsible for managing the skill definitions used by the Copilot agent. Separating skill management into its own container offers clear advantages.
Decoupled Skill Lifecycle

Skill definitions are stored in a Kubernetes ConfigMap:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: blog-agent-skill
    data:
      SKILL.md: |
        # Blog Generator Skill Instructions
        You are a professional technical evangelist...
        ## Key Requirements
        1. Outline generation
        2. Mandatory online research (DeepSearch)
        3. Technical evangelist perspective
        ...

ConfigMaps can be updated independently of any container image. When you run kubectl apply to update a ConfigMap, Kubernetes synchronizes the change to the volumes mounted in the Pod.

⚠️ Important Detail: ConfigMap volume updates do not take effect immediately. The kubelet detects ConfigMap changes through periodic synchronization, with a sync period controlled by --sync-frequency (default: 1 minute), plus the ConfigMap cache TTL, so the actual propagation delay can be 1–2 minutes. In addition, the copy on the shared volume only changes when the skill server copies the files again, so once the mounted files have been refreshed you must call the /sync endpoint to trigger a file synchronization:

    import shutil
    from pathlib import Path

    # SKILLS_SOURCE_DIR and SKILLS_SHARED_DIR are module-level settings
    # defined elsewhere in the skill server.


    def sync_skills():
        """Copy skill files from ConfigMap source to the shared volume."""
        source = Path(SKILLS_SOURCE_DIR)
        dest = Path(SKILLS_SHARED_DIR) / "blog"
        dest.mkdir(parents=True, exist_ok=True)
        synced = 0
        for skill_file in source.iterdir():
            if skill_file.is_file():
                target = dest / skill_file.name
                shutil.copy2(str(skill_file), str(target))
                synced += 1
        return synced

This design means that updating AI behavior requires no container image rebuilds or redeployments. You simply update the ConfigMap, trigger a sync, and the agent's behavior changes. This is a tremendous operational advantage for iterating on prompts and skills in production.

💡 Advanced Thought: Why not mount the ConfigMap directly to the copilot-agent's SKILLS_DIR path? While technically feasible, introducing the skill-server as an intermediary provides the triple value of validation, API access, and extensibility (see "Why Not Embed Skills in the Copilot Agent" below).

Minimal Resource Footprint

The skill server does one thing: serve and sync files.
Its resource requirements reflect this:

    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi

Compared to the Copilot agent's 2Gi memory limit, the skill server costs a fraction of the resources. This is the beauty of the Sidecar pattern: you can add lightweight containers for auxiliary functionality without significantly increasing the Pod's total resource consumption.

REST API for Skill Introspection

The skill server provides a simple REST API that allows external systems or operators to query available skills:

    @app.get("/skills")
    async def list_skills():
        """List all available skills."""
        source = Path(SKILLS_SOURCE_DIR)
        skills = []
        for f in sorted(source.iterdir()):
            if f.is_file():
                skills.append({
                    "name": f.stem,
                    "filename": f.name,
                    "size": f.stat().st_size,
                    "url": f"/skill/{f.name}",
                })
        return {"skills": skills, "total": len(skills)}


    @app.get("/skill/{filename}")
    async def get_skill(filename: str):
        """Get skill content by filename."""
        file_path = Path(SKILLS_SOURCE_DIR) / filename
        if not file_path.exists() or not file_path.is_file():
            raise HTTPException(status_code=404, detail=f"Skill '{filename}' not found")
        return {"filename": filename, "content": file_path.read_text(encoding="utf-8")}

This API serves multiple purposes:

- Debugging: Verify which skills are currently loaded without needing to kubectl exec into the container, significantly lowering the troubleshooting barrier.
- Monitoring: External tools can poll /skills to ensure the expected skill set is deployed. Combined with Prometheus Blackbox Exporter, you can implement configuration drift detection.
- Extensibility: Future systems can dynamically register or update skills via the API, providing a foundation for A/B testing different prompt strategies.

Why Not Embed Skills in the Copilot Agent?

Mounting the ConfigMap directly into the Copilot agent container seems simpler.
But separating it into a dedicated Sidecar has the following advantages:

- Validation layer: The skill server can validate skill file format and content before synchronization, preventing invalid skill definitions from causing Copilot SDK runtime errors.
- API access: Skills become queryable and manageable through a REST interface, supporting operational automation.
- Independent evolution of logic: If skill management becomes more complex (e.g., dynamic skill registration, version management, prompt A/B testing, role-based skill distribution), the skill server can evolve independently without affecting the Copilot agent.
- Clear data flow: ConfigMap → skill-server → shared volume → copilot-agent. Each arrow is an explicit, observable step. When something goes wrong, you can pinpoint exactly which stage failed.

💡 Architectural Trade-off: For small-scale deployments or PoC (Proof of Concept) work, directly mounting the ConfigMap to the Copilot agent is a perfectly reasonable choice, since fewer components means lower operational overhead. The Sidecar approach's value becomes fully apparent in medium-to-large-scale production environments. Architectural decisions should always align with team size, operational maturity, and business requirements.

End-to-End Workflow

When a user requests a blog post, the data flows through the Pod as follows:

1. The browser sends the request to Nginx on port 80, which proxies /agent/ to the Copilot agent at 127.0.0.1:8001.
2. The Copilot agent creates a session that reads the skill files the skill server synchronized to the skills-shared volume.
3. The agent generates the post through the Copilot SDK and writes it to the blog-data volume.
4. Nginx serves the new file from /blog/.

Every step uses intra-Pod communication: localhost HTTP calls or shared filesystem reads. No external network calls are needed between components. The only external dependency is the Copilot SDK's connection to GitHub authentication services and AI model endpoints via the Copilot CLI.
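The ConfigMap → skill-server → shared volume → copilot-agent flow, together with the validation layer described earlier, can be simulated on a local filesystem. The sketch below uses temporary directories in place of the three volumes and a stub in place of the Copilot SDK call. The function names and the validation rule (a skill must be non-empty Markdown that starts with a heading) are assumptions made for illustration; only the blog-YYYY-MM-DD.md naming and the blog/ subdirectory come from the article.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def validate_skill(path: Path) -> bool:
    """Hypothetical rule: a skill is valid if it starts with a Markdown heading."""
    return path.read_text(encoding="utf-8").strip().startswith("#")


def sync_valid_skills(source: Path, shared: Path) -> list:
    """Skill-server step: copy only valid skill files to the shared volume."""
    dest = shared / "blog"
    dest.mkdir(parents=True, exist_ok=True)
    synced = []
    for skill_file in sorted(source.iterdir()):
        if skill_file.is_file() and validate_skill(skill_file):
            shutil.copy2(skill_file, dest / skill_file.name)
            synced.append(skill_file.name)
    return synced


def generate_blog(shared: Path, blog_dir: Path, topic: str, today: date) -> Path:
    """Agent step (stub): read the skill, then write a dated Markdown file."""
    skill = (shared / "blog" / "SKILL.md").read_text(encoding="utf-8")
    blog_dir.mkdir(parents=True, exist_ok=True)
    out = blog_dir / f"blog-{today.isoformat()}.md"
    out.write_text(f"# {topic}\n\nSkill instructions used: {len(skill)} characters\n",
                   encoding="utf-8")
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root_name:
        root = Path(root_name)
        source = root / "skills-source"          # stands in for the ConfigMap mount
        source.mkdir()
        (source / "SKILL.md").write_text("# Blog Generator Skill\n", encoding="utf-8")
        (source / "broken.md").write_text("", encoding="utf-8")   # rejected by validation
        print(sync_valid_skills(source, root / "skills-shared"))
        post = generate_blog(root / "skills-shared", root / "blog-data",
                             "Sidecars", date.today())
        print(post.name)
```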
The Kubernetes Service exposes three ports for external access:

    ports:
      - name: http        # Nginx UI + reverse proxy
        port: 80
        nodePort: 30081
      - name: agent-api   # Direct access to Copilot Agent
        port: 8001
        nodePort: 30082
      - name: skill-api   # Direct access to Skill Server
        port: 8002
        nodePort: 30083

⚠️ Security Warning: In production, exposing the agent-api and skill-api ports directly via NodePort is not recommended. These two APIs should only be accessible through the Nginx reverse proxy (/agent/ and /skill/ paths), with authentication and rate limiting configured at the Nginx layer. Directly exposing Sidecar ports bypasses the reverse proxy's security controls. Recommended configuration:

    # Production recommended: only expose the Nginx port
    ports:
      - name: http
        port: 80
        targetPort: 80
    # Combine with NetworkPolicy to restrict inter-Pod communication

Production Recommendations and Architecture Extensions

When moving this architecture from a development/demo environment to production, the following areas deserve attention:

Cross-Platform Build and Deployment

The project's Makefile auto-detects the host architecture to select the appropriate Docker build platform, eliminating the need for manual configuration:

    ARCH := $(shell uname -m)
    ifeq ($(ARCH),x86_64)
      DOCKER_PLATFORM ?= linux/amd64
    else ifeq ($(ARCH),aarch64)
      DOCKER_PLATFORM ?= linux/arm64
    else ifeq ($(ARCH),arm64)
      DOCKER_PLATFORM ?= linux/arm64
    else
      DOCKER_PLATFORM ?= linux/amd64
    endif

Both macOS and Linux are supported as development environments with dedicated tool installation targets:

    # macOS (via Homebrew)
    make install-tools-macos

    # Linux (downloads official binaries to /usr/local/bin)
    make install-tools-linux

The Linux installation target downloads kubectl and kind binaries directly from upstream release URLs with architecture-aware selection, avoiding dependency on any package manager beyond curl and sudo. This makes the setup portable across different Linux distributions (Ubuntu, Debian, Fedora, etc.).
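The architecture mapping in the Makefile can also be expressed in Python, for example in a helper script that assembles a docker build command. This is a sketch under that assumption rather than part of the project; `docker_platform` is a hypothetical name, and the table mirrors the three cases and the linux/amd64 fallback shown in the Makefile.

```python
import platform
from typing import Optional

# Same cases as the Makefile: uname -m value -> Docker build platform.
_PLATFORMS = {
    "x86_64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
}


def docker_platform(machine: Optional[str] = None, default: str = "linux/amd64") -> str:
    """Map a machine identifier to a Docker platform string.

    When machine is None, the identifier of the current host is used.
    Unknown identifiers fall back to default, as the Makefile does.
    """
    if machine is None:
        machine = platform.machine()
    return _PLATFORMS.get(machine.lower(), default)


if __name__ == "__main__":
    print(docker_platform())
```

Lower-casing the identifier is a small extension over the Makefile, so that a host reporting the name in upper case still resolves to a known platform.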
Health Checks and Probe Configuration

Configure complete probes for each container to ensure Kubernetes can properly manage container lifecycles:

    # copilot-agent probe example
    livenessProbe:
      httpGet:
        path: /health
        port: 8001
      initialDelaySeconds: 10
      periodSeconds: 30
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /health
        port: 8001
      periodSeconds: 10
    startupProbe:            # AI agent startup may be slow
      httpGet:
        path: /health
        port: 8001
      periodSeconds: 5
      failureThreshold: 30   # Allow up to 150 seconds for startup

Data Persistence

The emptyDir lifecycle is tied to the Pod. If generated blogs need to survive Pod recreation, consider these approaches:

- PersistentVolumeClaim (PVC): Replace the blog-data volume with a PVC; data persists independently of Pod lifecycle
- Object storage upload: After the Copilot agent generates a blog, asynchronously upload to S3/Azure Blob/GCS
- Git repository push: Automatically commit and push generated Markdown files to a Git repository for versioned management

Security Hardening

    # Set security context for each container
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      readOnlyRootFilesystem: true   # Only write through emptyDir
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]

Observability Extensions

The Sidecar pattern is naturally suited for adding observability components. You can add a third (or fourth) Sidecar to the same Pod for log collection, metrics export, or distributed tracing.

Horizontal Scaling Strategy

Since containers within a Pod scale together, HPA scaling granularity is at the Pod level.
This means:

- If the Copilot agent is the bottleneck, scaling Pod replicas also scales Nginx and skill-server (minimal waste since they are lightweight)
- If skill management becomes compute-intensive in the future, consider splitting the skill-server from a Sidecar into an independent Deployment + ClusterIP Service for independent scaling

Evolution Path from Sidecar to Microservices

The dual Sidecar architecture provides a clear path for future migration to microservices. Each migration step only requires changing the communication method (localhost → Service DNS); business logic remains unchanged. This is the architectural flexibility that good separation of concerns provides.

Sample code: https://github.com/kinfey/Multi-AI-Agents-Cloud-Native/tree/main/code/GitHubCopilotSideCar

Summary

The dual Sidecar pattern in this project demonstrates a clean cloud-native AI application architecture:

- Main container (Nginx) stays lean and focused: it only serves HTML and proxies requests. It knows nothing about AI or skills.
- Sidecar 1 (Copilot Agent) encapsulates all AI logic. It uses the GitHub Copilot SDK, manages sessions, and generates content. Its only coupling to the rest of the Pod is through environment variables and shared volumes. The container image is built with cross-platform support: Node.js is installed from official binaries with automatic architecture detection, ensuring the same Dockerfile works on both amd64 and arm64 platforms.
- Sidecar 2 (Skill Server) provides a dedicated management layer for AI skill definitions. It bridges Kubernetes-native configuration (ConfigMap) with the Copilot SDK's runtime needs.

This separation gives you independent deployability, isolated failure domains, and, most importantly, the ability to change AI behavior (skills, prompts, models) without rebuilding any container images.
The Sidecar pattern is more than an architectural curiosity; it is a practical approach to composing AI services in Kubernetes, allowing each component to evolve at its own pace. With cross-platform build support (macOS and Linux, amd64 and arm64), Kubernetes native Sidecars reaching GA in 1.33, and AI development tools like the GitHub Copilot SDK maturing, we anticipate this "AI agent + Sidecar" combination pattern will see validation and adoption in more production environments.

References

- GitHub Copilot SDK Repository: Official SDK supporting Python/TypeScript/Go/.NET
- KEP-753: Sidecar Containers: Kubernetes native Sidecar container proposal
- Kubernetes v1.33 Release: Sidecar Containers GA: Sidecar container GA announcement
- The Distributed System Toolkit: Patterns for Composite Containers: Classic early Kubernetes article on the Sidecar pattern
- 12-Factor App: Config: Externalized configuration principles

New Azure Open AI models bring fast, expressive, and real-time AI experiences in Microsoft Foundry
Modern AI applications, whether voice-first experiences or large software systems, rarely fit into a single prompt. Real work unfolds over time: maintaining context, following instructions, invoking tools, and adapting as requirements evolve. When these foundations break down through latency spikes, instruction drift, or unreliable tool calls, both user conversations and developer workflows are impacted. OpenAI's latest models address this shared challenge by prioritizing continuity and reliability across real-time interaction and long-running engineering tasks.

Starting today, GPT-Realtime-1.5, GPT-Audio-1.5, and GPT-5.3-Codex are rolling out into Microsoft Foundry. Together, these models reflect the growing needs of the modern developer and move the needle from short, stateless interactions toward AI systems that can reason, act, and collaborate over time.

GPT-5.3-Codex at a glance

GPT-5.3-Codex brings together advanced coding capability with broader reasoning and professional problem solving in a single model built for real engineering work. It unifies the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2 in one system. This shifts the experience from optimizing isolated outputs to supporting longer-running development efforts, where repositories are large, changes span multiple steps, and requirements aren't always fully specified at the start.

What's improved

- According to OpenAI, the model executes 25% faster than its predecessors, so developers can accelerate development of new applications.
- Built for long-running tasks that involve research, tool use, and complex, multi-step execution while maintaining context.
- Mid-task steerability and frequent updates allow developers to redirect and collaborate with the model as it works without losing context.
- Stronger computer-use capabilities allow developers to execute across the full spectrum of technical work.
Common use cases

Developers and teams can apply GPT-5.3-Codex across a wide range of scenarios, including:

- Refactoring and modernizing large or legacy applications
- Performing multi-step migrations or upgrades
- Running agentic developer workflows that span analysis, implementation, testing, and remediation
- Automating code reviews, test generation, and defect detection
- Supporting development in security-sensitive or regulated environments

Pricing

| Model | Input Price/1M Tokens | Cached Input Price/1M Tokens | Output Price/1M Tokens |
|---|---|---|---|
| GPT-5.3-Codex | $1.75 | $0.175 | $14.00 |

GPT-Realtime-1.5 and GPT-Audio-1.5 at a glance

The models deliver measurable gains in reasoning and speech understanding for real-time voice interactions on Microsoft Foundry. In OpenAI's evaluations, they show a +5% lift on Big Bench Audio (reasoning), a +10.23% improvement in alphanumeric transcription, and a +7% gain in instruction following, while maintaining low-latency performance.

What's improved

- More natural-sounding speech: Audio output is smoother and more conversational, with improved pacing and prosody.
- Higher audio quality: Clearer, more consistent audio output across supported voices.
- Improved instruction following: Better alignment with developer-provided system and user instructions during live interactions.
- Function calling support: Enables structured, tool-driven interactions within real-time audio flows.
Common use cases Developers are using GPT-Realtime-1.5 and GPT-Audio-1.5 for scenarios where low‑latency voice interaction is essential, including: Conversational voice agents for customer support or internal help desks Voice‑enabled assistants embedded in applications or devices Live voice interfaces for kiosks, demos, and interactive experiences Hands‑free workflows where audio input and output replace keyboard interaction Pricing

| Model | Text Input | Text Cached Input | Text Output | Audio Input | Audio Cached Input | Audio Output | Image Input | Image Cached Input | Image Output |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-Realtime-1.5 | $4.00 | $0.04 | $16.00 | $32.00 | $0.40 | $64.00 | $4.00 | $0.04 | $16.00 |
| GPT-Audio-1.5 | $2.50 | n/a | $10.00 | $32.00 | n/a | $64.00 | $2.50 | n/a | $10.00 |

Getting started in Microsoft Foundry Start building in Microsoft Foundry, evaluate performance, and explore Azure OpenAI models today. Foundry brings evaluation, deployment, and governance into a single workflow, helping teams progress from experiments to scalable applications while maintaining security and operational controls.

Agentic Code Fixing with GitHub Copilot SDK and Foundry Local
Introduction AI-powered coding assistants have transformed how developers write and review code. But most of these tools require sending your source code to cloud services, a non-starter for teams working with proprietary codebases, air-gapped environments, or strict compliance requirements. What if you could have an intelligent coding agent that finds bugs, fixes them, runs your tests, and produces PR-ready summaries, all without a single byte leaving your machine? The Local Repo Patch Agent demonstrates exactly this. By combining the GitHub Copilot SDK for agent orchestration with Foundry Local for on-device inference, this project creates a fully autonomous coding workflow that operates entirely on your hardware. The agent scans your repository, identifies bugs and code smells, applies fixes, verifies them through your test suite, and generates a comprehensive summary of all changes, completely offline and secure. This article explores the architecture behind this integration, walks through the key implementation patterns, and shows you how to run the agent yourself. Whether you're building internal developer tools, exploring agentic workflows, or simply curious about what's possible when you combine GitHub's SDK with local AI, this project provides a production-ready foundation to build upon. Why Local AI Matters for Code Analysis Cloud-based AI coding tools have proven their value—GitHub Copilot has fundamentally changed how millions of developers work. But certain scenarios demand local-first approaches where code never leaves the organisation's network. 
Consider these real-world constraints that teams face daily: Regulatory compliance: Financial services, healthcare, and government projects often prohibit sending source code to external services, even for analysis Intellectual property protection: Proprietary algorithms and trade secrets can't risk exposure through cloud API calls Air-gapped environments: Secure facilities and classified projects have no internet connectivity whatsoever Latency requirements: Real-time code analysis in IDEs benefits from zero network roundtrip Cost control: High-volume code analysis without per-token API charges The Local Repo Patch Agent addresses all these scenarios. By running the AI model on-device through Foundry Local and using the GitHub Copilot SDK for orchestration, you get the intelligence of agentic coding workflows with complete data sovereignty. The architecture proves that "local-first" doesn't mean "capability-limited." The Technology Stack Two core technologies make this architecture possible, working together through a clever integration called BYOK (Bring Your Own Key). Understanding how they complement each other reveals the elegance of the design. GitHub Copilot SDK The GitHub Copilot SDK provides the agent runtime, the scaffolding that handles planning, tool invocation, streaming responses, and the orchestration loop that makes agentic behaviour possible. Rather than managing raw LLM API calls, developers define tools (functions the agent can call) and system prompts, and the SDK handles everything else. 
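To make "defining a tool" concrete, here is a minimal sketch of what a tool definition could look like. The `ToolDefinition` interface, its `handler` field, and the in-memory `files` map are assumptions made for illustration only; they are not the Copilot SDK's actual types, so consult the SDK documentation for the real shape.

```typescript
// Hypothetical tool shape for illustration; the real SDK types may differ.
interface ToolDefinition<Args> {
  name: string;
  description: string;
  // JSON Schema describing the arguments the model is allowed to pass
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
  // Function the runtime would call when the model requests this tool
  handler: (args: Args) => Promise<string>;
}

// In-memory stand-in for a repository so the sketch runs without disk access.
const files = new Map<string, string>([
  ["src/account.js", "function calculateInterest(principal, annualRate) {}"],
  ["src/utils.js", "export function formatCurrency(amount) {}"],
]);

const readFileTool: ToolDefinition<{ path: string }> = {
  name: "read_file",
  description: "Return the contents of a file in the repository",
  parameters: {
    type: "object",
    properties: {
      path: { type: "string", description: "File path relative to repo root" },
    },
    required: ["path"],
  },
  handler: async ({ path }) => {
    const content = files.get(path);
    if (content === undefined) {
      throw new Error(`File not found: ${path}`);
    }
    return content;
  },
};
```

The point of the shape is the split of responsibilities: the schema tells the model what it may ask for, and the handler is ordinary application code that you control and can test on its own.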
Key capabilities the SDK brings to this project: Session management: Maintains conversation context across multiple agent interactions Tool orchestration: Automatically invokes defined tools when the model requests them Streaming support: Real-time response streaming for responsive user interfaces Provider abstraction: Works with any OpenAI-compatible API through the BYOK configuration Foundry Local Foundry Local brings Azure AI Foundry's model catalog to your local machine. It automatically selects the best available hardware acceleration (GPU, NPU, or CPU) and exposes models through an OpenAI-compatible API on localhost. Models run entirely on-device with no telemetry or data transmission. For this project, Foundry Local provides: On-device inference: All AI processing happens locally, ensuring complete data privacy Dynamic port allocation: The SDK auto-detects the Foundry Local endpoint, eliminating configuration hassle Model flexibility: Swap between models like qwen2.5-coder-1.5b, phi-3-mini, or larger variants based on your hardware OpenAI API compatibility: Standard API format means the GitHub Copilot SDK works without modification The BYOK Integration The entire connection between the GitHub Copilot SDK and Foundry Local happens through a single configuration object. This BYOK (Bring Your Own Key) pattern tells the SDK to route all inference requests to your local model instead of cloud services:

    const session = await client.createSession({
      model: modelId,
      provider: {
        type: "openai",           // Foundry Local speaks OpenAI's API format
        baseUrl: proxyBaseUrl,    // Streaming proxy → Foundry Local
        apiKey: manager.apiKey,
        wireApi: "completions",   // Chat Completions API
      },
      streaming: true,
      tools: [ /* your defined tools */ ],
    });

This configuration is the key insight: with one config object, you've redirected an entire agent framework to run on local hardware. No code changes to the SDK, no special adapters, just standard OpenAI-compatible API communication.
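Because the whole redirection hinges on `baseUrl`, one option is to build the provider object through a small helper that refuses anything other than a loopback address. The sketch below is an assumption-laden illustration: `buildLocalProvider` and `ProviderConfig` are hypothetical names, the field names are copied from the configuration above, and the example URL simply reuses the proxy's default port.

```typescript
// Field names mirror the provider object shown above; the type itself is hypothetical.
interface ProviderConfig {
  type: "openai";
  baseUrl: string;
  apiKey: string;
  wireApi: "completions";
}

// Hostnames treated as "this machine". WHATWG URL keeps the brackets on IPv6 hosts.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// Hypothetical helper: build the BYOK provider config, rejecting remote endpoints.
function buildLocalProvider(baseUrl: string, apiKey: string): ProviderConfig {
  const url = new URL(baseUrl); // throws a TypeError on malformed input
  if (!LOOPBACK_HOSTS.has(url.hostname)) {
    throw new Error(`Refusing non-local inference endpoint: ${url.hostname}`);
  }
  return { type: "openai", baseUrl, apiKey, wireApi: "completions" };
}

// Example value only: the streaming proxy's default port with an assumed /v1 path.
const provider = buildLocalProvider("http://localhost:8765/v1", "local-key");
```

A guard like this fails closed: a mistyped or copied cloud URL raises an error instead of silently sending source code off the machine.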
Architecture Overview The Local Repo Patch Agent implements a layered architecture where each component has a clear responsibility. Understanding this flow helps when extending or debugging the system. ┌─────────────────────────────────────────────────────────┐ │ Your Terminal / Web UI │ │ npm run demo / npm run ui │ └──────────────┬──────────────────────────────────────────┘ │ ┌──────────────▼──────────────────────────────────────────┐ │ src/agent.ts (this project) │ │ │ │ ┌────────────────────────────┐ ┌──────────────────┐ │ │ │ GitHub Copilot SDK │ │ Agent Tools │ │ │ │ (CopilotClient) │ │ list_files │ │ │ │ BYOK → Foundry │ │ read_file │ │ │ └────────┬───────────────────┘ │ write_file │ │ │ │ │ run_command │ │ └────────────┼───────────────────────┴──────────────────┘ │ │ │ │ JSON-RPC │ ┌────────────▼─────────────────────────────────────────────┐ │ GitHub Copilot CLI (server mode) │ │ Agent orchestration layer │ └────────────┬─────────────────────────────────────────────┘ │ POST /v1/chat/completions (BYOK) ┌────────────▼─────────────────────────────────────────────┐ │ Foundry Local (on-device inference) │ │ Model: qwen2.5-coder-1.5b via ONNX Runtime │ │ Endpoint: auto-detected (dynamic port) │ └───────────────────────────────────────────────────────────┘ The data flow works as follows: your terminal or web browser sends a request to the agent application. The agent uses the GitHub Copilot SDK to manage the conversation, which communicates with the Copilot CLI running in server mode. The CLI, configured with BYOK, sends inference requests to Foundry Local running on localhost. Responses flow back up the same path, with tool invocations happening in the agent.ts layer. The Four-Phase Workflow The agent operates through a structured four-phase loop, each phase building on the previous one's output. This decomposition transforms what would be an overwhelming single prompt into manageable, verifiable steps. 
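The idea of phases feeding one another can be sketched as a simple sequential pipeline. The phase names (PLAN, EDIT, VERIFY, SUMMARY) are the ones this article walks through; the `Phase` interface, the `runPhases` helper, and the stub phases are hypothetical and stand in for whatever the project's `agent.ts` actually does.

```typescript
type PhaseName = "PLAN" | "EDIT" | "VERIFY" | "SUMMARY";

// Hypothetical phase contract: receive the previous phase's output, return your own.
interface Phase {
  name: PhaseName;
  run: (previousOutput: string) => Promise<string>;
}

interface PhaseResult {
  name: PhaseName;
  output: string;
  durationMs: number;
}

// Run phases in order, carrying each output forward and recording timings.
async function runPhases(phases: Phase[], initialInput = ""): Promise<PhaseResult[]> {
  const results: PhaseResult[] = [];
  let carry = initialInput;
  for (const phase of phases) {
    const started = Date.now();
    carry = await phase.run(carry); // a thrown error aborts the remaining phases
    results.push({ name: phase.name, output: carry, durationMs: Date.now() - started });
  }
  return results;
}

// Stub phases so the sketch runs without a model; real phases would call the agent.
const stubPhases: Phase[] = [
  { name: "PLAN", run: async () => "1. fix calculateInterest" },
  { name: "EDIT", run: async (plan) => `applied: ${plan}` },
  { name: "VERIFY", run: async (edits) => `${edits} | tests: 9/9 passing` },
  { name: "SUMMARY", run: async (verified) => `## Summary\n${verified}` },
];
```

Keeping each phase behind the same small interface is what makes the steps individually verifiable: any phase can be replaced with a stub, timed, or retried without touching the others.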
Phase 1: PLAN The planning phase scans the repository and produces a numbered fix plan. The agent reads every source and test file, identifies potential issues, and outputs specific tasks to address: // Phase 1 system prompt excerpt const planPrompt = ` You are a code analysis agent. Scan the repository and identify: 1. Bugs that cause test failures 2. Code smells and duplication 3. Style inconsistencies Output a numbered list of fixes, ordered by priority. Each item should specify: file path, line numbers, issue type, and proposed fix. `; The tools available during this phase are list_files and read_file —the agent explores the codebase without modifying anything. This read-only constraint prevents accidental changes before the plan is established. Phase 2: EDIT With a plan in hand, the edit phase applies each fix by rewriting affected files. The agent receives the plan from Phase 1 and systematically addresses each item: // Phase 2 adds the write_file tool const editTools = [ { name: "write_file", description: "Write content to a file, creating or overwriting it", parameters: { type: "object", properties: { path: { type: "string", description: "File path relative to repo root" }, content: { type: "string", description: "Complete file contents" } }, required: ["path", "content"] } } ]; The write_file tool is sandboxed to the demo-repo directory, path traversal attempts are blocked, preventing the agent from modifying files outside the designated workspace. Phase 3: VERIFY After making changes, the verification phase runs the project's test suite to confirm fixes work correctly. 
If tests fail, the agent attempts to diagnose and repair the issue: // Phase 3 adds run_command with an allowlist const allowedCommands = ["npm test", "npm run lint", "npm run build"]; const runCommandTool = { name: "run_command", description: "Execute a shell command (npm test, npm run lint, npm run build only)", execute: async (command: string) => { if (!allowedCommands.includes(command)) { throw new Error(`Command not allowed: ${command}`); } // Execute and return stdout/stderr } }; The command allowlist is a critical security measure. The agent can only run explicitly permitted commands—no arbitrary shell execution, no data exfiltration, no system modification. Phase 4: SUMMARY The final phase produces a PR-style Markdown report documenting all changes. This summary includes what was changed, why each change was necessary, test results, and recommended follow-up actions: ## Summary of Changes ### Bug Fix: calculateInterest() in account.js - **Issue**: Division instead of multiplication caused incorrect interest calculations - **Fix**: Changed `principal / annualRate` to `principal * (annualRate / 100)` - **Tests**: 3 previously failing tests now pass ### Refactor: Duplicate formatCurrency() removed - **Issue**: Identical function existed in account.js and transaction.js - **Fix**: Both files now import from utils.js - **Impact**: Reduced code duplication, single source of truth ### Test Results - **Before**: 6/9 passing - **After**: 9/9 passing This structured output makes code review straightforward, reviewers can quickly understand what changed and why without digging through diffs. The Demo Repository: Intentional Bugs The project includes a demo-repo directory containing a small banking utility library with intentional problems for the agent to find and fix. This provides a controlled environment to demonstrate the agent's capabilities. 
Bug 1: Calculation Error in calculateInterest() The account.js file contains a calculation bug that causes test failures: // BUG: should be principal * (annualRate / 100) function calculateInterest(principal, annualRate) { return principal / annualRate; // Division instead of multiplication! } This bug causes 3 of 9 tests to fail. The agent identifies it during the PLAN phase by correlating test failures with the implementation, then fixes it during EDIT. Bug 2: Code Duplication The formatCurrency() function is copy-pasted in both account.js and transaction.js, even though a canonical version exists in utils.js. This duplication creates maintenance burden and potential inconsistency: // In account.js (duplicated) function formatCurrency(amount) { return '$' + amount.toFixed(2); } // In transaction.js (also duplicated) function formatCurrency(amount) { return '$' + amount.toFixed(2); } // In utils.js (canonical, but unused) export function formatCurrency(amount) { return '$' + amount.toFixed(2); } The agent identifies this duplication during planning and refactors both files to import from utils.js, eliminating redundancy. Handling Foundry Local Streaming Quirks One technical challenge the project solves is Foundry Local's behaviour with streaming requests. As of version 0.5, Foundry Local can hang on stream: true requests. The project includes a streaming proxy that works around this limitation transparently. 
The Streaming Proxy The streaming-proxy.ts file implements a lightweight HTTP proxy that converts streaming requests to non-streaming, then re-encodes the single response as SSE (Server-Sent Events) chunks, the format the OpenAI SDK expects:

    // streaming-proxy.ts simplified logic
    async function handleRequest(req: Request): Promise<Response> {
      const body = await req.json();
      // If it's a streaming chat completion, convert to non-streaming
      if (body.stream === true && req.url.includes('/chat/completions')) {
        body.stream = false;
        const response = await fetch(foundryEndpoint, {
          method: 'POST',
          body: JSON.stringify(body),
          headers: { 'Content-Type': 'application/json' }
        });
        const data = await response.json();
        // Re-encode as SSE stream for the SDK
        return createSSEResponse(data);
      }
      // Non-streaming and non-chat requests pass through unchanged.
      // req.json() already consumed the request body, so re-serialize it.
      return fetch(foundryEndpoint, {
        method: req.method,
        body: JSON.stringify(body),
        headers: { 'Content-Type': 'application/json' }
      });
    }

This proxy runs on port 8765 by default and sits between the GitHub Copilot SDK and Foundry Local. The SDK thinks it's talking to a streaming-capable endpoint, while the actual inference happens non-streaming. The conversion is transparent; no changes are needed to the SDK configuration. Text-Based Tool Call Detection Small on-device models like qwen2.5-coder-1.5b sometimes output tool calls as JSON text rather than using OpenAI-style function calling. The SDK won't fire tool.execution_start events for these text-based calls, so the agent includes a regex-based detector:

    // Pattern to detect tool calls in model output
    const toolCallPattern = /\{[\s\S]*"name":\s*"(list_files|read_file|write_file|run_command)"[\s\S]*\}/;

    function detectToolCall(text: string): ToolCall | null {
      const match = text.match(toolCallPattern);
      if (match) {
        try {
          return JSON.parse(match[0]);
        } catch {
          return null;
        }
      }
      return null;
    }

This fallback ensures tool calls are captured regardless of whether the model uses native function calling or text output, keeping the dashboard's tool call counter and CLI log accurate.
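One limitation of the pattern above is that it is greedy: it matches from the first `{` to the last `}` in the text, so stray braces before or after the JSON make `JSON.parse` fail and the call is missed. A possible alternative, sketched here under the assumption that tool calls arrive as a single JSON object with a `name` field, is to scan for balanced braces and try each candidate. `detectToolCallBalanced`, `findObjectEnd`, and `DetectedToolCall` are hypothetical names, not part of the project.

```typescript
const KNOWN_TOOLS = new Set(["list_files", "read_file", "write_file", "run_command"]);

interface DetectedToolCall {
  name: string;
  arguments?: unknown;
}

// Return the index just past the object opening at `start`, or -1 if it never closes.
function findObjectEnd(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Braces inside JSON strings must not affect nesting depth.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}

// Try every "{" as a candidate start and return the first valid, known tool call.
function detectToolCallBalanced(text: string): DetectedToolCall | null {
  for (let start = text.indexOf("{"); start !== -1; start = text.indexOf("{", start + 1)) {
    const end = findObjectEnd(text, start);
    if (end === -1) continue;
    try {
      const parsed = JSON.parse(text.slice(start, end));
      if (parsed && typeof parsed.name === "string" && KNOWN_TOOLS.has(parsed.name)) {
        return parsed as DetectedToolCall;
      }
    } catch {
      // Not valid JSON at this position; keep scanning.
    }
  }
  return null;
}

const sample =
  'I will use {curly} braces. {"name": "read_file", "arguments": {"path": "src/account.js"}} Done {ok}';
const detected = detectToolCallBalanced(sample);
```

The trade-off is a little more code in exchange for tolerance of chatty model output; unknown tool names are still ignored, which keeps the detector aligned with the agent's allowlisted tools.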
Security Considerations Running an AI agent that can read and write files and execute commands requires careful security design. The Local Repo Patch Agent implements multiple layers of protection: 100% local execution: No code, prompts, or responses leave your machine—complete data sovereignty Command allowlist: The agent can only run npm test , npm run lint , and npm run build —no arbitrary shell commands Path sandboxing: File tools are locked to the demo-repo/ directory; path traversal attempts like ../../../etc/passwd are rejected File size limits: The read_file tool rejects files over 256 KB, preventing memory exhaustion attacks Recursion limits: Directory listing caps at 20 levels deep, preventing infinite traversal These constraints demonstrate responsible AI agent design. The agent has enough capability to do useful work but not enough to cause harm. When extending this project for your own use cases, maintain similar principles, grant minimum necessary permissions, validate all inputs, and fail closed on unexpected conditions. Running the Agent Getting the Local Repo Patch Agent running on your machine takes about five minutes. The project includes setup scripts that handle prerequisites automatically. 
Prerequisites Before running the setup, ensure you have: Node.js 18 or higher: Download from nodejs.org (LTS version recommended) Foundry Local: Install via winget install Microsoft.FoundryLocal (Windows) or brew install foundrylocal (macOS) GitHub Copilot CLI: Follow the GitHub Copilot CLI install guide Verify your installations: node --version # Should print v18.x.x or higher foundry --version copilot --version One-Command Setup The easiest path uses the provided setup scripts that install dependencies, start Foundry Local, and download the AI model: # Clone the repository git clone https://github.com/leestott/copilotsdk_foundrylocal.git cd copilotsdk_foundrylocal # Windows (PowerShell) .\setup.ps1 # macOS / Linux chmod +x setup.sh ./setup.sh When setup completes, you'll see: ━━━ Setup complete! ━━━ You're ready to go. Run one of these commands: npm run demo CLI agent (terminal output) npm run ui Web dashboard (http://localhost:3000) Manual Setup If you prefer step-by-step control: # Install npm packages npm install cd demo-repo && npm install --ignore-scripts && cd .. # Start Foundry Local and download the model foundry service start foundry model run qwen2.5-coder-1.5b # Copy environment configuration cp .env.example .env # Run the agent npm run demo The first model download takes a few minutes depending on your connection. After that, the model runs from cache with no internet required. Using the Web Dashboard For a visual experience with real-time streaming, launch the web UI: npm run ui Open http://localhost:3000 in your browser. 
The dashboard provides: Phase progress sidebar: Visual indication of which phase is running, completed, or errored Live streaming output: Model responses appear in real-time via WebSocket Tool call log: Every tool invocation logged with phase context Phase timing table: Performance metrics showing how long each phase took Environment info: Current model, endpoint, and repository path at a glance Configuration Options The agent supports several environment variables for customisation. Edit the .env file or set them directly: Variable Default Description FOUNDRY_LOCAL_ENDPOINT auto-detected Override the Foundry Local API endpoint FOUNDRY_LOCAL_API_KEY auto-detected Override the API key FOUNDRY_MODEL qwen2.5-coder-1.5b Which model to use from the Foundry Local catalog FOUNDRY_TIMEOUT_MS 180000 (3 min) How long each agent phase can run before timing out FOUNDRY_NO_PROXY — Set to 1 to disable the streaming proxy PORT 3000 Port for the web dashboard Using Different Models To try a different model from the Foundry Local catalog: # Use phi-3-mini instead FOUNDRY_MODEL=phi-3-mini npm run demo # Use a larger model for higher quality (requires more RAM/VRAM) FOUNDRY_MODEL=qwen2.5-7b npm run demo Adjusting for Slower Hardware If you're running on CPU-only or limited hardware, increase the timeout to give the model more time per phase: # 5 minutes per phase instead of 3 FOUNDRY_TIMEOUT_MS=300000 npm run demo Troubleshooting Common Issues When things don't work as expected, these solutions address the most common problems: Problem Solution foundry: command not found Install Foundry Local—see Prerequisites section copilot: command not found Install GitHub Copilot CLI—see Prerequisites section Agent times out on every phase Increase FOUNDRY_TIMEOUT_MS (e.g., 300000 for 5 min). CPU-only machines are slower. Port 3000 already in use Set PORT=3001 npm run ui Model download is slow First download can take 5-10 min. Subsequent runs use the cache. 
Cannot find module errors Run npm install again, then cd demo-repo && npm install --ignore-scripts Tests still fail after agent runs The agent edits files in demo-repo/. Reset with git checkout demo-repo/ and run again. PowerShell blocks setup.ps1 Run Set-ExecutionPolicy -Scope Process Bypass first, then .\setup.ps1 Diagnostic Test Scripts The src/tests/ folder contains standalone scripts for debugging SDK and Foundry Local integration issues. These are invaluable when things go wrong: # Debug-level SDK event logging npx tsx src/tests/test-debug.ts # Test non-streaming inference (bypasses streaming proxy) npx tsx src/tests/test-nostream.ts # Raw fetch to Foundry Local (bypasses SDK entirely) npx tsx src/tests/test-stream-direct.ts # Start the traffic-inspection proxy npx tsx src/tests/test-proxy.ts These scripts isolate different layers of the stack, helping identify whether issues lie in Foundry Local, the streaming proxy, the SDK, or your application code. Key Takeaways BYOK enables local-first AI: A single configuration object redirects the entire GitHub Copilot SDK to use on-device inference through Foundry Local Phased workflows improve reliability: Breaking complex tasks into PLAN → EDIT → VERIFY → SUMMARY phases makes agent behaviour predictable and debuggable Security requires intentional design: Allowlists, sandboxing, and size limits constrain agent capabilities to safe operations Local models have quirks: The streaming proxy and text-based tool detection demonstrate how to work around on-device model limitations Real-time feedback matters: The web dashboard with WebSocket streaming makes agent progress visible and builds trust in the system The architecture is extensible: Add new tools, change models, or modify phases to adapt the agent to your specific needs Conclusion and Next Steps The Local Repo Patch Agent proves that sophisticated agentic coding workflows don't require cloud infrastructure. 
By combining the GitHub Copilot SDK's orchestration capabilities with Foundry Local's on-device inference, you get intelligent code analysis that respects data sovereignty completely. The patterns demonstrated here (BYOK integration, phased execution, security sandboxing, and streaming workarounds) transfer directly to production systems. Consider extending this foundation with: Custom tool sets: Add database queries, API calls to internal services, or integration with your CI/CD pipeline Multiple repository support: Scan and fix issues across an entire codebase or monorepo Different model sizes: Use smaller models for quick scans, larger ones for complex refactoring Human-in-the-loop approval: Add review steps before applying fixes to production code Integration with Git workflows: Automatically create branches and PRs from agent-generated fixes Clone the repository, run through the demo, and start building your own local-first AI coding tools. The future of developer AI isn't just cloud; it's intelligent systems that run wherever your code lives. Resources Local Repo Patch Agent Repository – Full source code with setup scripts and documentation Foundry Local – Official site for on-device AI inference Foundry Local GitHub Repository – Installation instructions and CLI reference Foundry Local Get Started Guide – Official Microsoft Learn documentation Foundry Local SDK Reference – Python and JavaScript SDK documentation GitHub Copilot SDK – Official SDK repository GitHub Copilot SDK BYOK Documentation – Bring Your Own Key integration guide GitHub Copilot SDK Getting Started – SDK setup and first agent tutorial Microsoft Sample: Copilot SDK + Foundry Local – Official integration sample from Microsoft

Agents League: Two Weeks, Three Tracks, One Challenge
We're inviting all developers to join Agents League, running February 16-27. It's a two-week challenge where you'll build AI agents using production-ready tools, learn from live coding sessions, and get feedback directly from Microsoft product teams. We've put together starter kits for each track to help you get up and running quickly; they also include requirements and guidelines. Whether you want to explore what GitHub Copilot can do beyond autocomplete, build reasoning agents on Microsoft Foundry, or create enterprise integrations for Microsoft 365 Copilot, we have a track for you. Important: Register first to be eligible for prizes and your digital badge. Without registration, you won't qualify for awards or receive a badge when you submit. What Is Agents League? It's a 2-week competition that combines learning with building: 📽️ Live coding battles – Watch Product teams, MVPs and community members tackle challenges in real-time on Microsoft Reactor 💻 Async challenges – Build at your own pace, on your schedule 💬 Discord community – Connect with other participants, join AMAs, and get help when you need it 🏆 Prizes – $500 per track winner, plus GitHub Copilot Pro subscriptions for top picks The Three Tracks 🎨 Creative Apps – Build with GitHub Copilot (Chat, CLI, or SDK) 🧠 Reasoning Agents – Build with Microsoft Foundry 💼 Enterprise Agents – Build with M365 Agents Toolkit (or Copilot Studio) More details on each track below, or jump straight to the starter kits. The Schedule Agents League starts on February 16th and runs through February 27th. Over the two weeks, we'll host live battles on Reactor and AMA sessions on Discord. Week 1: Live Battles (Feb 17-19) We're kicking off with live coding battles streamed on Microsoft Reactor. Watch experienced developers compete in real-time, explaining their approach and architectural decisions as they go.
Tue Feb 17, 9 AM PT — 🎨 Creative Apps battle Wed Feb 18, 9 AM PT — 🧠 Reasoning Agents battle Thu Feb 19, 9 AM PT — 💼 Enterprise Agents battle All sessions are recorded, so you can watch on your own schedule. Week 2: Build + AMAs (Feb 24-26) This is your time to build and ask questions on Discord. The async format means you work when it suits you, evenings, weekends, whatever fits your schedule. We're also hosting AMAs on Discord where you can ask questions directly to Microsoft experts and product teams: Tue Feb 24, 9 AM PT — 🎨 Creative Apps AMA Wed Feb 25, 9 AM PT — 🧠 Reasoning Agents AMA Thu Feb 26, 9 AM PT — 💼 Enterprise Agents AMA Bring your questions, get help when you're stuck, and share what you're building with the community. Pick Your Track We've created a starter kit for each track with setup guides, project ideas, and example scenarios to help you get started quickly. 🎨 Creative Apps Tool: GitHub Copilot (Chat, CLI, or SDK) Build innovative, imaginative applications that showcase the potential of AI-assisted development. All application types are welcome, web apps, CLI tools, games, mobile apps, desktop applications, and more. The starter kit walks you through GitHub Copilot's different modes and provides prompting tips to get the best results. View the Creative Apps starter kit. 🧠 Reasoning Agents Tool: Microsoft Foundry (UI or SDK) and/or Microsoft Agent Framework Build a multi-agent system that leverages advanced reasoning capabilities to solve complex problems. This track focuses on agents that can plan, reason through multi-step problems, and collaborate. The starter kit includes architecture patterns, reasoning strategies (planner-executor, critic/verifier, self-reflection), and integration guides for tools and MCP servers. View the Reasoning Agents starter kit. 💼 Enterprise Agents Tool: M365 Agents Toolkit or Copilot Studio Create intelligent agents that extend Microsoft 365 Copilot to address real-world enterprise scenarios. 
Your agent must work on Microsoft 365 Copilot Chat. Bonus points for: MCP server integration, OAuth security, Adaptive Cards UI, connected agents (multi-agent architecture). View the Enterprise Agents starter kit. Prizes & Recognition To be eligible for prizes and your digital badge, you must register before submitting your project. Category Winners ($500 each): 🎨 Creative Apps winner 🧠 Reasoning Agents winner 💼 Enterprise Agents winner GitHub Copilot Pro subscriptions: Community Favorite (voted by participants on Discord) Product Team Picks (selected by Microsoft product teams) Everyone who registers and submits a project wins: A digital badge to showcase their participation. Beyond the prizes, every participant gets feedback from the teams who built these tools, a valuable opportunity to learn and improve your approach to AI agent development. How to Get Started Register first – This is required to be eligible for prizes and to receive your digital badge. Without registration, your submission won't qualify for awards or a badge. Pick a track – Choose one track. Explore the starter kits to help you decide. Watch the battles – See how experienced developers approach these challenges. Great for learning even if you're still deciding whether to compete. Build your project – You have until Feb 27. Work on your own schedule. Submit via GitHub – Open an issue using the project submission template. Join us on Discord – Get help, share your progress, and vote for your favorite projects on Discord. Links Register: https://aka.ms/agentsleague/register Starter Kits: https://github.com/microsoft/agentsleague/starter-kits Discord: https://aka.ms/agentsleague/discord Live Battles: https://aka.ms/agentsleague/battles Submit Project: Project submission template

From Zero to 16 Games in 2 Hours
From Zero to 16 Games in 2 Hours: Teaching Prompt Engineering to Students with GitHub Copilot CLI Introduction What happens when you give a room full of 14-year-olds access to AI-powered development tools and challenge them to build games? You might expect chaos, confusion, or at best, a few half-working prototypes. Instead, we witnessed something remarkable: 16 fully functional HTML5 games created in under two hours, all from students with varying programming experience. This wasn't magic, it was the power of GitHub Copilot CLI combined with effective prompt engineering. By teaching students to communicate clearly with AI, we transformed a traditional coding workshop into a rapid prototyping session that exceeded everyone's expectations. The secret weapon? A technique called "one-shot prompting" that enables anyone to generate complete, working applications from a single, well-crafted prompt. In this article, we'll explore how we structured this workshop using CopilotCLI-OneShotPromptGameDev, a methodology designed to teach prompt engineering fundamentals while producing tangible, exciting results. Whether you're an educator planning STEM workshops, a developer exploring AI-assisted coding, or simply curious about how young people can leverage AI tools effectively, this guide provides a practical blueprint you can replicate. What is GitHub Copilot CLI? GitHub Copilot CLI extends the familiar Copilot experience beyond your code editor into the command line. While Copilot in VS Code suggests code completions as you type, Copilot CLI allows you to have conversational interactions with AI directly in your terminal. You describe what you want to accomplish in natural language, and the AI responds with shell commands, explanations, or in our case, complete code files. This terminal-based approach offers several advantages for learning and rapid prototyping. Students don't need to configure complex IDE settings or navigate unfamiliar interfaces. 
They simply type their request, review the AI's output, and iterate. The command line provides a transparent view of exactly what's happening, no hidden abstractions or magical "autocomplete" that obscures the learning process. For our workshop, Copilot CLI served as a bridge between students' creative ideas and working code. They could describe a game concept in plain English, watch the AI generate HTML, CSS, and JavaScript, then immediately test the result in a browser. This rapid feedback loop kept engagement high and made the connection between language and code tangible. Installing GitHub Copilot CLI Setting up Copilot CLI requires a few straightforward steps. Before the workshop, we ensured all machines were pre-configured, but students also learned the installation process as part of understanding how developer tools work. First, you'll need Node.js installed on your system. Copilot CLI runs as a Node package, so this is a prerequisite: # Check if Node.js is installed node --version # If not installed, download from https://nodejs.org/ # Or use a package manager: # Windows (winget) winget install OpenJS.NodeJS.LTS # macOS (Homebrew) brew install node # Linux (apt) sudo apt install nodejs npm These commands verify your Node.js installation or guide you through installing it using your operating system's preferred package manager. Next, install the GitHub CLI, which provides the foundation for Copilot CLI: # Windows winget install GitHub.cli # macOS brew install gh # Linux sudo apt install gh This installs the GitHub command-line interface, which handles authentication and provides the framework for Copilot integration. With GitHub CLI installed, authenticate with your GitHub account: gh auth login This command initiates an interactive authentication flow that connects your terminal to your GitHub account, enabling access to Copilot features. 
Finally, install the Copilot CLI extension: gh extension install github/gh-copilot This adds Copilot capabilities to your GitHub CLI installation, enabling the conversational AI features we'll use for game development. Verify the installation by running: gh copilot --help If you see the help output with available commands, you're ready to start prompting. The entire setup takes about 5-10 minutes on a fresh machine, making it practical for classroom environments. Understanding One-Shot Prompting Traditional programming education follows an incremental approach: learn syntax, understand concepts, build small programs, gradually tackle larger projects. This method is thorough but slow. One-shot prompting inverts this model—you start with the complete vision and let AI handle the implementation details. A one-shot prompt provides the AI with all the context it needs to generate a complete, working solution in a single response. Instead of iteratively refining code through multiple exchanges, you craft one comprehensive prompt that specifies requirements, constraints, styling preferences, and technical specifications. The AI then produces complete, functional code. This approach teaches a crucial skill: clear communication of technical requirements. Students must think through their entire game concept before typing. What does the game look like? How does the player interact with it? What happens when they win or lose? By forcing this upfront thinking, one-shot prompting develops the same analytical skills that professional developers use when writing specifications or planning architectures. The technique also demonstrates a powerful principle: with sufficient context, AI can handle implementation complexity while humans focus on creativity and design. Students learned they could create sophisticated games without memorizing JavaScript syntax—they just needed to describe their vision clearly enough for the AI to understand. 
Crafting Effective Prompts for Game Development The difference between a vague prompt and an effective one-shot prompt is the difference between frustration and success. We taught students a structured approach to prompt construction that consistently produced working games. Start with the game type and core mechanic. Don't just say "make a game"—specify what kind: Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids. This opening establishes the fundamental gameplay loop: control a spaceship, avoid obstacles. The AI now has a clear mental model to work from. Add visual and interaction details. Games are visual experiences, so specify how things should look and respond: Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids. The spaceship should be a blue triangle at the bottom of the screen, controlled by left and right arrow keys. Asteroids are brown circles that fall from the top at random positions and increasing speeds. These additions provide concrete visual targets and define the input mechanism. The AI can now generate specific CSS colors and event handlers. Define win/lose conditions and scoring: Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids. The spaceship should be a blue triangle at the bottom of the screen, controlled by left and right arrow keys. Asteroids are brown circles that fall from the top at random positions and increasing speeds. Display a score that increases every second the player survives. The game ends when an asteroid hits the spaceship, showing a "Game Over" screen with the final score and a "Play Again" button. This complete prompt now specifies the entire game loop: gameplay, scoring, losing, and restarting. The AI has everything needed to generate a fully playable game. 
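The layered prompt built up above can also be assembled mechanically from its parts. The following is a minimal sketch, not part of the workshop materials: `buildGamePrompt` and its field names are hypothetical, chosen only to mirror the components students were taught (game type, visuals, controls, rules, win/lose, score). It refuses to produce a prompt if any component is missing, which is the same check instructors did by eye.

```javascript
// Hypothetical helper (not from the workshop repository): assembles a
// one-shot prompt from the components taught in the workshop.
function buildGamePrompt({ gameType, visuals, controls, rules, winLose, score }) {
  const parts = { gameType, visuals, controls, rules, winLose, score };

  // Collect the names of any components that were left out or left blank.
  const missing = Object.entries(parts)
    .filter(([, text]) => typeof text !== "string" || text.trim() === "")
    .map(([name]) => name);
  if (missing.length > 0) {
    throw new Error(`Prompt is missing: ${missing.join(", ")}`);
  }

  // Normalize each component to end with exactly one period, then join.
  return Object.values(parts)
    .map((text) => text.trim().replace(/[.!?]*$/, "") + ".")
    .join(" ");
}

const spaceshipPrompt = buildGamePrompt({
  gameType: "Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids",
  visuals: "The spaceship should be a blue triangle at the bottom of the screen",
  controls: "It is controlled by the left and right arrow keys",
  rules: "Asteroids are brown circles that fall from the top at random positions and increasing speeds",
  winLose: 'The game ends when an asteroid hits the spaceship, showing a "Game Over" screen with the final score and a "Play Again" button',
  score: "Display a score that increases every second the player survives",
});

console.log(spaceshipPrompt);
```

Calling the helper with only a game type (for example `buildGamePrompt({ gameType: "Create Pong" })`) throws an error listing the five missing components, which makes the gap between a vague prompt and a complete one explicit.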
The formula students learned: Game Type + Visual Description + Controls + Rules + Win/Lose + Score = Complete Game Prompt.

Running the Workshop: Structure and Approach

Our two-hour workshop followed a carefully designed structure that balanced instruction with hands-on creation. We partnered with University College London, and students used GitHub Education to access resources designed for classroom settings, including student accounts with Copilot access and tools such as VS Code, Azure for Students, and VS Code for Education for schools.

The first 20 minutes covered fundamentals: what is AI, how does Copilot work, and why does prompt quality matter? We demonstrated this with a live example, showing how "make a game" produces confused output while a detailed prompt generates playable code. This contrast immediately captured students' attention: they could see the direct relationship between their words and the AI's output.

The next 15 minutes focused on the prompt formula. We broke down several example prompts, highlighting each component: game type, visuals, controls, rules, scoring. Students practiced identifying these elements in prompts before writing their own. This analysis phase prepared them to construct effective prompts independently.

The remaining 85 minutes were dedicated to creation. Students worked individually or in pairs, brainstorming game concepts, writing prompts, generating code, testing in browsers, and iterating. Instructors circulated to help debug prompts (not code, an important distinction) and encourage experimentation.

We deliberately avoided teaching JavaScript syntax. When students encountered bugs, we guided them to refine their prompts rather than manually fix code. This maintained focus on the core skill: communicating with AI effectively. Surprisingly, this approach resulted in fewer bugs overall because students learned to be more precise in their initial descriptions.
Student Projects: The Games They Created

The diversity of games produced in 85 minutes of building time amazed everyone present. Students didn't just follow a template; they invented entirely new concepts and successfully communicated them to Copilot CLI.

One student created a "Fruit Ninja" clone where players clicked falling fruit to slice it before it hit the ground. Another built a typing speed game that challenged players to correctly type increasingly difficult words against a countdown timer. A pair of collaborators produced a two-player tank battle where each player controlled their tank with different keyboard keys.

Several students explored educational games: a math challenge where players solve equations to destroy incoming meteors, a geography quiz with animated maps, and a vocabulary builder where correct definitions unlock new levels. These projects demonstrated that one-shot prompting isn't limited to entertainment; students naturally gravitated toward useful applications.

The most complex project was a procedurally generated maze game with fog-of-war mechanics. The student spent extra time on their prompt, specifying exactly how visibility should work around the player character. Their detailed approach paid off with a surprisingly sophisticated result that would typically require hours of manual coding.

By the session's end, we had 16 complete, playable HTML5 games. Every student who participated produced something they could share with friends and family, a tangible achievement that transformed an abstract "coding workshop" into a genuine creative accomplishment.

Key Benefits of Copilot CLI for Rapid Prototyping

Our workshop revealed several advantages that make Copilot CLI particularly valuable for rapid prototyping scenarios, whether in educational settings or professional development.

Speed of iteration fundamentally changes what's possible. Traditional game development requires hours to produce even simple prototypes.
With Copilot CLI, students went from concept to playable game in minutes. This compressed timeline enables experimentation: if your first idea doesn't work, try another. This psychological freedom to fail fast and try again proved more valuable than any technical instruction.

Accessibility removes barriers to entry. Students with no prior coding experience produced results comparable to those who had taken programming classes. The playing field leveled because success depended on creativity and communication rather than memorized syntax. This democratization of development opens doors for students who might otherwise feel excluded from technical fields.

Focus on design over implementation teaches transferable skills. Whether students eventually become programmers, designers, product managers, or pursue entirely different careers, the ability to clearly specify requirements and think through complete systems applies universally. They learned to think like system designers, not just coders.

The feedback loop keeps engagement high. Seeing your words transform into working software within seconds creates an addictive cycle of creation and testing. Students who typically struggle with attention during lectures remained focused throughout the building session. The immediate gratification of seeing their games work motivated continuous refinement.

Debugging through prompts teaches root cause analysis. When games didn't work as expected, students had to analyze what they'd asked for versus what they received. This comparison exercise developed critical thinking about specifications, a skill that serves developers throughout their careers.

Tips for Educators: Running Your Own Workshop

If you're planning to replicate this workshop, several lessons from our experience will help ensure success.

Pre-configure machines whenever possible. While installation is straightforward, classroom time is precious.
Having Copilot CLI ready on all devices lets you dive into content immediately. If pre-configuration isn't possible, allocate the first 15-20 minutes specifically for setup and troubleshoot as a group.

Prepare example prompts across difficulty levels. Some students will grasp one-shot prompting immediately; others will need more scaffolding. Having templates ranging from simple ("Create Pong") to complex (the spaceship example above) lets you meet students where they are.

Emphasize that "prompt debugging" is the goal. When students ask for help fixing broken code, redirect them to examine their prompt. What did they ask for? What did they get? Where's the gap? This redirection reinforces the workshop's core learning objective and builds self-sufficiency.

Celebrate and share widely. Build in time at the end for students to demonstrate their games. This showcase moment validates their work and often inspires classmates to try new approaches in future sessions. Consider creating a shared folder or simple website where all games can be accessed after the workshop.

Access GitHub Education resources at education.github.com before your workshop. The GitHub Education program provides free access to developer tools for students and educators, including Copilot. The resources there include curriculum materials, teaching guides, and community support that can enhance your workshop.

Beyond Games: Where This Leads

The techniques students learned extend far beyond game development. One-shot prompting with Copilot CLI works for any development task: creating web pages, building utilities, generating data processing scripts, or prototyping application interfaces. The fundamental skill, communicating requirements clearly to AI, applies wherever AI-assisted development tools are used.

Several students have continued exploring after the workshop. Some discovered they enjoy the creative aspects of game design and are learning traditional programming to gain more control.
Others found that prompt engineering itself interests them; they're exploring how different phrasings affect AI outputs across various domains.

For professional developers, the workshop's lessons apply directly to working with Copilot, ChatGPT, and other AI coding assistants. The ability to craft precise, complete prompts determines whether these tools save time or create confusion. Investing in prompt engineering skills yields returns across every AI-assisted workflow.

Key Takeaways

Clear prompts produce working code: The one-shot prompting formula (Game Type + Visuals + Controls + Rules + Win/Lose + Score) reliably generates playable games from single prompts
Copilot CLI democratizes development: Students with no coding experience created functional applications by focusing on communication rather than syntax
Rapid iteration enables experimentation: Minutes-per-prototype timelines encourage creative risk-taking and learning from failures
Prompt debugging builds analytical skills: Comparing intended versus actual results teaches specification writing and root cause analysis
Sixteen games in two hours is achievable: With proper structure and preparation, young students can produce impressive results using AI-assisted development

Conclusion and Next Steps

Our workshop demonstrated that AI-assisted development tools like GitHub Copilot CLI aren't just productivity boosters for experienced programmers; they're powerful educational instruments that make software creation accessible to beginners. By focusing on prompt engineering rather than traditional syntax instruction, we enabled 14-year-old students to produce complete, functional games in a fraction of the time traditional methods would require.

The sixteen games created during those two hours represent more than just workshop outputs. They represent a shift in how we might teach technical creativity: start with vision, communicate clearly, iterate quickly.
Whether students pursue programming careers or not, they've gained experience in thinking systematically about requirements and translating ideas into specifications that produce real results.

To explore this approach yourself, visit the CopilotCLI-OneShotPromptGameDev repository for prompt templates, workshop materials, and example games. For educational resources and student access to GitHub tools including Copilot, explore GitHub Education. And most importantly, start experimenting. Write a prompt, generate some code, and see what you can create in the next few minutes.

Resources

CopilotCLI-OneShotPromptGameDev Repository - Workshop materials, prompt templates, and example games
GitHub Education - Free developer tools and resources for students and educators
GitHub Copilot CLI Documentation - Official installation and usage guide
GitHub CLI - Foundation tool required for Copilot CLI
GitHub Copilot - Overview of Copilot features and pricing

Now in Foundry: Qwen3-Coder-Next, Qwen3-ASR-1.7B, Z-Image
This week's spotlight features three models that demonstrate enterprise-grade AI across the full scope of modalities. From low-latency coding agents to state-of-the-art multilingual speech recognition and foundation-quality image generation, these models showcase the breadth of innovation happening in open-source AI. Each model balances performance with practical deployment considerations, making them viable for production systems while pushing the boundaries of what's possible in their respective domains.

This week's Model Mondays edition highlights Qwen3-Coder-Next, an 80B MoE model that activates only 3B parameters while delivering coding agent capabilities with 256k context; Qwen3-ASR-1.7B, which achieves state-of-the-art accuracy across 52 languages and dialects; and Z-Image from Tongyi-MAI, an undistilled text-to-image foundation model with full Classifier-Free Guidance support for professional creative workflows.

Models of the week

Qwen: Qwen3-Coder-Next

Model Specs
Parameters / size: 80B total (3B activated)
Context length: 262,144 tokens
Primary task: Text generation (coding agents, tool use)

Why it's interesting
Extreme efficiency: Activates only 3B of 80B parameters while delivering performance comparable to models with 10-20x more active parameters, making advanced coding agents viable for local deployment on consumer hardware
Built for agentic workflows: Excels at long-horizon reasoning, complex tool usage, and recovering from execution failures, a critical capability for autonomous development that goes beyond simple code completion
Benchmarks: Competitive performance with significantly larger models on SWE-bench and coding benchmarks (Technical Report)

Try it
Use Case | Prompt Pattern
Code generation with tool use | Provide task context, available tools, and execution environment details
Long-context refactoring | Include full codebase context within 256k window with specific refactoring goals
Autonomous debugging | Present error logs, stack traces, and relevant code with failure recovery instructions
Multi-file code synthesis | Describe architecture requirements and file structure expectations

Financial services sample prompt: You are a coding agent for a fintech platform. Implement a transaction reconciliation service that processes batches of transactions, detects discrepancies between internal records and bank statements, and generates audit reports. Use the provided database connection tool, logging utility, and alert system. Handle edge cases including partial matches, timing differences, and duplicate transactions. Include unit tests with 90%+ coverage.

Qwen: Qwen3-ASR-1.7B

Model Specs
Parameters / size: 1.7B
Context length: 256 tokens (default), configurable up to 4096
Primary task: Automatic speech recognition (multilingual)

Why it's interesting
All-in-one multilingual capability: Single 1.7B model handles language identification plus speech recognition for 30 languages, 22 Chinese dialects, and English accents from multiple regions, eliminating the need to manage separate models per language
Specialized audio versatility: Transcribes not just clean speech but singing voice, songs with background music, and extended audio files, expanding use cases beyond traditional ASR to entertainment and media workflows
State-of-the-art accuracy: Outperforms GPT-4o, Gemini-2.5, and Whisper-large-v3 across multiple benchmarks. English: Tedlium 4.50 WER vs 7.69/6.15/6.84; Chinese: WenetSpeech 4.97/5.88 WER vs 15.30/14.43/9.86 (Technical Paper)
Language ID included: 97.9% average accuracy across benchmark datasets for automatic language identification, eliminating the need for separate language detection pipelines

Try it
Use Case | Prompt Pattern
Multilingual transcription | Send audio files via API with automatic language detection
Call center analytics | Process customer service recordings to extract transcripts and identify languages
Content moderation | Transcribe user-generated audio content across multiple languages
Meeting transcription | Convert multilingual meeting recordings to text for documentation

Customer support sample prompt: Deploy Qwen3-ASR-1.7B to a Microsoft Foundry endpoint and transcribe multilingual customer service calls. Send audio files via API to automatically detect the language (from 52 supported options including 30 languages and 22 Chinese dialects) and generate accurate transcripts. Process calls from customers speaking English, Spanish, Mandarin, Cantonese, Arabic, French, and other languages without managing separate models per language. Use transcripts for quality assurance, compliance monitoring, and customer sentiment analysis.
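The accuracy figures quoted for Qwen3-ASR-1.7B are word error rates (WER), where lower is better. As a rough illustration of what the metric measures, the sketch below computes WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name and sample transcripts are made up for illustration, and published benchmarks typically apply their own text normalization first, so this will not reproduce the reported numbers.

```javascript
// Illustrative only: word error rate as word-level Levenshtein distance
// divided by the reference length. Not the benchmark's scoring code.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.trim().toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) {
    throw new Error("Reference transcript must not be empty");
  }

  // prev[j] holds the edit distance between the reference words processed
  // so far and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion: reference word missing from hypothesis
        curr[j - 1] + 1,    // insertion: extra word in hypothesis
        prev[j - 1] + cost  // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One substitution ("my" -> "the") and one deletion ("today") over 6 words.
const wer = wordErrorRate(
  "please cancel my last order today",
  "please cancel the last order"
);
console.log(`${(wer * 100).toFixed(2)}% WER`); // prints "33.33% WER"
```

Read this way, a reported 4.50 WER corresponds to roughly 4 or 5 word-level errors per 100 reference words, assuming the figure is a percentage as is conventional.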
Tongyi-MAI: Z-Image

Model Specs
Parameters / size: 6B
Context length: N/A (text-to-image)
Primary task: Text-to-image generation

Why it's interesting
Undistilled foundation model: Full-capacity base without distillation preserves complete training signal with Classifier-Free Guidance support (a technique that improves prompt adherence and output quality), enabling complex prompt engineering and negative prompting that distilled models cannot achieve
High output diversity: Generates distinct character identities in multi-person scenes with varied compositions, facial features, and lighting, critical for creative applications requiring visual variety rather than consistency
Aesthetic versatility: Handles diverse visual styles from hyper-realistic photography to anime and stylized illustrations within a single model, supporting resolutions from 512×512 to 2048×2048 at any aspect ratio with 28-50 inference steps (Technical Paper)

Try it
E-commerce sample prompt: Professional product photography of a modern ergonomic office chair in a bright Scandinavian-style home office. Natural window lighting from left, clean white desk with laptop and succulent plant, light oak hardwood floor. Chair positioned at 45-degree angle showing design details. Photorealistic, commercial photography, sharp focus, 85mm lens, f/2.8, soft shadows.

Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub.
First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using the Microsoft Foundry documentation.

Follow along with the Model Mondays series and access the GitHub to stay up to date on the latest
Read Hugging Face on Azure docs
Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
Explore models in Microsoft Foundry
Build an AI-Powered Space Invaders Game: Integrating LLMs into HTML5 Games with Microsoft Foundry Local

Introduction

What if your game could talk back to you? Imagine playing Space Invaders while an AI commander taunts you during battle, delivers personalized mission briefings, and provides real-time feedback based on your performance. This isn't science fiction; it's something you can build today using HTML, JavaScript, and a locally-running AI model.

In this tutorial, we'll explore how to create an HTML5 game with integrated Large Language Model (LLM) features using Microsoft Foundry Local. You'll learn how to combine classic game development with modern AI capabilities, all running entirely on your own machine: no cloud services, no API costs, no internet connection required during gameplay.

We'll be working with the Space Invaders - AI Commander Edition project, which demonstrates exactly how to architect games that leverage local AI. Whether you're a student learning game development, exploring AI integration patterns, or building your portfolio, this guide provides practical, hands-on experience with technologies that are reshaping how we build interactive applications.

What You'll Learn

By the end of this tutorial, you'll understand how to combine traditional web development with local AI inference. These skills transfer directly to building chatbots, interactive tutorials, AI-enhanced productivity tools, and any application where you want intelligent, context-aware responses.

Set up Microsoft Foundry Local for running AI models on your machine
Understand the architecture of games that integrate LLM features
Use GitHub Copilot CLI to accelerate your development workflow
Implement AI-powered game features like dynamic commentary and adaptive feedback
Extend the project with your own creative AI features

Why Local AI for Games?

Before diving into the code, let's understand why running AI locally matters for game development.
Traditional cloud-based AI services have limitations that make them impractical for real-time gaming experiences. Latency is the first challenge. Cloud API calls typically take 500ms to several seconds, an eternity in a game running at 60 frames per second. Local inference can respond in tens of milliseconds, enabling AI responses that feel instantaneous and natural. When an enemy ship appears, your AI commander can taunt you immediately, not three seconds later. Cost is another consideration. Cloud AI services charge per token, which adds up quickly when generating dynamic content during gameplay. Local models have zero per-use cost, once installed, they run entirely on your hardware. This frees you to experiment without worrying about API bills. Privacy and offline capability complete the picture. Local AI keeps all data on your machine, perfect for games that might handle player information. And since nothing requires internet connectivity, your game works anywhere, on planes, in areas with poor connectivity, or simply when you want to play without network access. Understanding Microsoft Foundry Local Microsoft Foundry Local is a runtime that enables you to run small language models (SLMs) directly on your computer. It's designed for developers who want to integrate AI capabilities into applications without requiring cloud infrastructure. Think of it as having a miniature AI assistant living on your laptop. Foundry Local handles the complex work of loading AI models, managing memory, and processing inference requests through a simple API. You send text prompts, and it returns AI-generated responses, all happening locally on your CPU or GPU. The models are optimized to run efficiently on consumer hardware, so you don't need a supercomputer. For our Space Invaders game, Foundry Local powers the "AI Commander" feature. 
During gameplay, the game sends context about what's happening (your score, accuracy, current level, and enemies remaining) and receives back contextual commentary, taunts, and encouragement. The result feels like playing alongside an AI companion who actually understands the game.

Setting Up Your Development Environment

Let's get your machine ready for AI-powered game development. We'll install Foundry Local, clone the project, and verify everything works. The entire setup takes about 10-15 minutes.

Step 1: Install Microsoft Foundry Local

Foundry Local installation varies by operating system. Open your terminal and run the appropriate command:

# Windows (using winget)
winget install Microsoft.FoundryLocal
# macOS (using Homebrew)
brew install microsoft/foundrylocal/foundrylocal

These commands download and install the Foundry Local runtime along with a default small language model. The installation includes everything needed to run AI inference locally. Verify the installation by running:

foundry --version

If you see a version number, Foundry Local is ready. If you encounter errors, ensure you have administrator/sudo privileges and that your package manager is up to date.

Step 2: Install Node.js (If Not Already Installed)

Our game's AI features require a small Node.js server to communicate between the browser and Foundry Local. Check if Node.js is installed:

node --version

If you see a version number (v16 or higher recommended), you're set. Otherwise, install Node.js:

# Windows
winget install OpenJS.NodeJS.LTS
# macOS
brew install node
# Linux
sudo apt install nodejs npm

Node.js provides the JavaScript runtime that powers our proxy server, bridging browser code with the local AI model.

Step 3: Clone the Project

Get the Space Invaders project onto your machine:

git clone https://github.com/leestott/Spaceinvaders-FoundryLocal.git
cd Spaceinvaders-FoundryLocal

This downloads all game files, including the HTML interface, game logic, AI integration module, and server code.
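Before installing dependencies, it can be worth confirming that the Node.js recommendation from Step 2 (v16 or higher) is actually met. The snippet below is an optional preflight sketch, not a file from the project: `parseMajor`, `meetsMinimum`, and the `MIN_MAJOR` constant are hypothetical names, and the threshold simply mirrors the recommendation in the text.

```javascript
// Optional preflight check (not part of the project): is the running
// Node.js at least the recommended major version?
const MIN_MAJOR = 16; // mirrors the "v16 or higher recommended" note

function parseMajor(version) {
  // Accepts "v18.19.0" (as printed by `node --version`)
  // or "18.19.0" (as found in process.versions.node).
  const match = /^v?(\d+)\./.exec(version);
  if (!match) {
    throw new Error(`Unrecognized version string: ${version}`);
  }
  return Number(match[1]);
}

function meetsMinimum(version, minMajor = MIN_MAJOR) {
  return parseMajor(version) >= minMajor;
}

const current = process.versions.node;
console.log(
  meetsMinimum(current)
    ? `Node.js ${current} is new enough`
    : `Node.js ${current} is older than v${MIN_MAJOR}; consider upgrading`
);
```

Saved under any file name and run with `node`, this prints a one-line verdict for the installed runtime.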
Step 4: Install Dependencies and Start the Server Install the Node.js packages and launch the AI-enabled server: npm install npm start The first command downloads required packages (primarily for the proxy server). The second starts the server, which listens for AI requests from the game. You should see output indicating the server is running on port 3001. Step 5: Play the Game Open your browser and navigate to: http://localhost:3001 You should see Space Invaders with "AI: ONLINE" displayed in the game HUD, indicating that AI features are active. Use arrow keys or A/D to move, SPACE to fire, and P to pause. The AI Commander will start providing commentary as you play! Understanding the Project Architecture Now that the game is running, let's explore how the different pieces fit together. Understanding this architecture will help you modify the game and apply these patterns to your own projects. The project follows a clean separation of concerns, with each file handling a specific responsibility: Spaceinvaders-FoundryLocal/ ├── index.html # Main game page and UI structure ├── styles.css # Retro arcade visual styling ├── game.js # Core game logic and rendering ├── llm.js # AI integration module ├── sound.js # Web Audio API sound effects ├── server.js # Node.js proxy for Foundry Local └── package.json # Project configuration index.html: Defines the game canvas and UI elements. It's the entry point that loads all other modules. game.js: Contains the game loop, physics, collision detection, scoring, and rendering logic. This is the heart of the game. llm.js: Handles all communication with the AI backend. It formats game state into prompts and processes AI responses. server.js: A lightweight Express server that proxies requests between the browser and Foundry Local. sound.js: Synthesizes retro sound effects using the Web Audio API—no audio files needed! How the AI Integration Works The magic of the AI Commander happens through a simple but powerful pattern. 
Let's trace the flow from gameplay event to AI response. When something interesting happens in the game, you clear a wave, achieve a combo, or lose a life, the game logic in game.js triggers an AI request. This request includes context about the current game state: your score, accuracy percentage, current level, lives remaining, and what just happened. The llm.js module formats this context into a prompt. For example, when you clear a wave with 85% accuracy, it might construct: You are an AI Commander in a Space Invaders game. The player just cleared wave 3 with 85% accuracy. Score: 12,500. Lives: 3. Provide a brief, enthusiastic comment (1-2 sentences). This prompt travels to server.js , which forwards it to Foundry Local. The AI model processes the prompt and generates a response like: "Impressive accuracy, pilot! Wave 3 didn't stand a chance. Keep that trigger finger sharp!" The response flows back through the server to the browser, where llm.js passes it to the game. The game displays the message in the HUD, creating the illusion of playing alongside an AI companion. This entire round trip typically completes in 50-200 milliseconds, fast enough to feel responsive without interrupting gameplay. Using GitHub Copilot CLI to Explore and Modify the Code GitHub Copilot CLI accelerates your development workflow by letting you ask questions and generate code directly in your terminal. Let's use it to understand and extend the Space Invaders project. Installing Copilot CLI If you haven't installed Copilot CLI yet, here's the quick setup: # Install GitHub CLI winget install GitHub.cli # Windows brew install gh # macOS # Authenticate with GitHub gh auth login # Add Copilot extension gh extension install github/gh-copilot # Verify installation gh copilot --help With Copilot CLI ready, you can interact with AI directly from your terminal while working on the project. Exploring Code with Copilot CLI Use Copilot to understand unfamiliar code. 
Navigate to the project directory and try: gh copilot explain "How does llm.js communicate with the server?" Copilot analyzes the code and explains the communication pattern, helping you understand the architecture without reading every line manually. You can also ask about specific functions: gh copilot explain "What does the generateEnemyTaunt function do?" This accelerates onboarding to unfamiliar codebases, a valuable skill when working with open source projects or joining teams. Generating New Features Want to add a new AI feature? Ask Copilot to help generate the code: gh copilot suggest "Create a function that asks the AI to generate a mission briefing at the start of each level, including the level number and a random mission objective" Copilot generates starter code that you can customize and integrate. This combination of AI-powered development tools and AI-integrated gameplay demonstrates how LLMs are transforming both how we build games and how games behave. Customizing the AI Commander The default AI Commander provides generic gaming commentary, but you can customize its personality and responses. Open llm.js to find the prompt templates that control AI behavior. Changing the AI's Personality The system prompt defines who the AI "is." Find the base prompt and modify it: // Original const systemPrompt = "You are an AI Commander in a Space Invaders game."; // Customized - Drill Sergeant personality const systemPrompt = `You are Sergeant Blaster, a gruff but encouraging drill sergeant commanding space cadets. Use military terminology, call the player "cadet," and be tough but fair.`; // Customized - Supportive Coach personality const systemPrompt = `You are Coach Nova, a supportive and enthusiastic gaming coach. Use encouraging language, celebrate small victories, and provide gentle guidance when players struggle.`; These personality changes dramatically alter the game's feel without changing any gameplay code. 
It's a powerful example of how AI can add variety to games with minimal development effort.

Adding New Commentary Triggers

Currently the AI responds to wave completions and game events. You can add new triggers in game.js:

    // Add AI commentary when player achieves a kill streak
    if (killStreak >= 5 && !streakCommentPending) {
      requestAIComment('killStreak', { count: killStreak });
      streakCommentPending = true;
    }

    // Add AI reaction when player narrowly avoids death
    if (nearMissOccurred) {
      requestAIComment('nearMiss', { livesRemaining: lives });
    }

Each new trigger point adds another opportunity for the AI to engage with the player, making the experience more dynamic and personalized.

Understanding the Game Features

Beyond AI integration, the Space Invaders project demonstrates solid game development patterns worth studying. Let's explore the key features.

Power-Up System

The game includes eight different power-ups, each with unique effects:
- SPREAD (Orange): Fires three projectiles in a spread pattern
- LASER (Red): Powerful beam with high damage
- RAPID (Yellow): Dramatically increased fire rate
- MISSILE (Purple): Homing projectiles that track enemies
- SHIELD (Blue): Grants an extra life
- EXTRA LIFE (Green): Grants two extra lives
- BOMB (Red): Destroys all enemies on screen
- BONUS (Gold): Random score bonus between 250-750 points

Power-ups demonstrate state management: tracking which power-up is active, applying its effects to player actions, and handling timeouts. Study the power-up code in game.js to understand how temporary state modifications work.

Leaderboard System

The game persists high scores using the browser's localStorage API:

    // Saving scores
    localStorage.setItem('spaceInvadersScores', JSON.stringify(scores));

    // Loading scores
    const savedScores = localStorage.getItem('spaceInvadersScores');
    const scores = savedScores ? JSON.parse(savedScores) : [];

This pattern works for any data you want to persist between sessions: game progress, user preferences, or accumulated statistics. It's a simple but powerful technique for web games.

Sound Synthesis

Rather than loading audio files, the game synthesizes retro sound effects using the Web Audio API in sound.js. This approach has several benefits: no external assets to load, smaller project size, and complete control over sound parameters. Examine how oscillators and gain nodes combine to create laser sounds, explosions, and victory fanfares. This knowledge transfers directly to any web project requiring audio feedback.

Extending the Project: Ideas for Students

Ready to make the project your own? Here are ideas ranging from beginner-friendly to challenging, each teaching valuable skills.

Beginner: Customize Visual Theme

Modify styles.css to create a new visual theme. Try changing the color scheme from green to blue, or create a "sunset" theme with orange and purple gradients. This builds CSS skills while making the game feel fresh.

Intermediate: Add New Enemy Types

Create a new enemy class in game.js with different movement patterns. Perhaps enemies that move in sine waves, or boss enemies that take multiple hits. This teaches object-oriented programming and game physics.

Intermediate: Expand AI Interactions

Add new AI features like:
- Pre-game mission briefings that set up the story
- Dynamic difficulty hints when players struggle
- Post-game performance analysis and improvement suggestions
- AI-generated names for enemy waves

Advanced: Multiplayer Commentary

Modify the game for two-player support and have the AI provide play-by-play commentary comparing both players' performance. This combines game networking concepts with advanced AI prompting.

Advanced: Voice Integration

Use the Web Speech API to speak the AI Commander's responses aloud. This creates a more immersive experience and demonstrates browser speech synthesis capabilities.
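As a starting point for the voice integration idea, here is a minimal sketch built on the browser's SpeechSynthesis interface. The function name speakCommanderMessage and the rate and pitch values are assumptions made for illustration; they are not taken from the project. The speech objects are passed as parameters so the function simply returns false where speech synthesis is unavailable, such as outside a browser.

```javascript
// Hypothetical helper (not part of the project): speak an AI Commander message.
// `synth` and `Utterance` default to the browser globals; they are parameters
// so the function degrades gracefully where speech synthesis is missing.
function speakCommanderMessage(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  if (!synth || !Utterance || !text) {
    return false; // nothing to speak, or no speech support
  }
  synth.cancel(); // drop any queued message so commentary stays current
  const utterance = new Utterance(text);
  utterance.rate = 1.1;  // assumed value: slightly faster than the default of 1
  utterance.pitch = 0.9; // assumed value: slightly lower than the default of 1
  synth.speak(utterance);
  return true;
}
```

One way to use it would be to call speakCommanderMessage with the same text the game already shows in the HUD. Cancelling the queue first is a design choice: in a fast game, a stale comment is probably worse than a skipped one.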
Troubleshooting Common Issues If something isn't working, here are solutions to common problems. "AI: OFFLINE" Displayed in Game This means the game can't connect to the AI server. Check that: The server is running ( npm start shows no errors) You're accessing the game via http://localhost:3001 , not directly opening the HTML file Foundry Local is installed correctly ( foundry --version works) Server Won't Start If npm start fails: Ensure you ran npm install first Check that port 3001 isn't already in use by another application Verify Node.js is installed ( node --version ) AI Responses Are Slow Local AI performance depends on your hardware. If responses feel sluggish: Close other resource-intensive applications Ensure your laptop is plugged in (battery mode may throttle CPU) Consider that first requests may be slower as the model loads Key Takeaways Local AI enables real-time game features: Microsoft Foundry Local provides fast, free, private AI inference perfect for gaming applications Clean architecture matters: Separating game logic, AI integration, and server code makes projects maintainable and extensible AI personality is prompt-driven: Changing a few lines of prompt text completely transforms how the AI interacts with players Copilot CLI accelerates learning: Use it to explore unfamiliar code and generate new features quickly The patterns transfer everywhere: Skills from this project apply to chatbots, assistants, educational tools, and any AI-integrated application Conclusion and Next Steps You've now seen how to integrate AI capabilities into a browser-based game using Microsoft Foundry Local. The Space Invaders project demonstrates that modern AI features don't require cloud services or complex infrastructure, they can run entirely on your laptop, responding in milliseconds. More importantly, you've learned patterns that extend far beyond gaming. 
The architecture of sending context to an AI, receiving generated responses, and integrating them into user experiences applies to countless applications: customer support bots, educational tutors, creative writing tools, and accessibility features.

Your next step is experimentation. Clone the repository, modify the AI's personality, add new commentary triggers, or build an entirely new game using these patterns. The combination of GitHub Copilot CLI for development assistance and Foundry Local for runtime AI gives you powerful tools to bring intelligent applications to life. Start playing, start coding, and discover what you can create when your games can think.

Resources

- Space Invaders - AI Commander Edition Repository - Full source code and documentation
- Play Space Invaders Online - Try the basic version without AI features
- Microsoft Foundry Local Documentation - Official installation and API guide
- GitHub Copilot CLI Documentation - Installation and usage guide
- GitHub Education - Free developer tools for students
- Web Audio API Documentation - Learn about browser sound synthesis
- Canvas API Documentation - Master HTML5 game rendering

Choosing the Right Model in GitHub Copilot: A Practical Guide for Developers
AI-assisted development has grown far beyond simple code suggestions. GitHub Copilot now supports multiple AI models, each optimized for different workflows, from quick edits to deep debugging to multi-step agentic tasks that generate or modify code across your entire repository. As developers, this flexibility is powerful… but only if we know how to choose the right model at the right time.

In this guide, I’ll break down:
- Why model selection matters
- The four major categories of development tasks
- A simplified, developer-friendly model comparison table
- Enterprise considerations and practical tips

This is written from the perspective of real-world customer conversations, GitHub Copilot demos, and enterprise adoption journeys.

Why Model Selection Matters

GitHub Copilot isn’t tied to a single model. Instead, it offers a range of models, each with different strengths:
- Some are optimized for speed
- Others are optimized for reasoning depth
- Some are built for agentic workflows

Choosing the right model can dramatically improve:
- The quality of the output
- The speed of your workflow
- The accuracy of Copilot’s reasoning
- The effectiveness of Agents and Plan Mode
- Your usage efficiency under enterprise quotas

Model selection is now a core part of modern software development, just like choosing the right library, framework, or cloud service.

The Four Task Categories (and which Model Fits)

To simplify model selection, I group tasks into four categories. Each category aligns naturally with specific types of models.

1. Everyday Development Tasks

Examples:
- Writing new functions
- Improving readability
- Generating tests
- Creating documentation

Best fit: General-purpose coding models (e.g., GPT‑4.1, GPT‑5‑mini, Claude Sonnet). These models offer the best balance between speed and quality.

2. Fast, Lightweight Edits

Examples:
- Quick explanations
- JSON/YAML transformations
- Small refactors
- Regex generation
- Short Q&A tasks

Best fit: Lightweight models (e.g., Claude Haiku 4.5). These models give near-instant responses and keep you “in flow.”

3. Complex Debugging & Deep Reasoning

Examples:
- Analyzing unfamiliar code
- Debugging tricky production issues
- Architecture decisions
- Multi-step reasoning
- Performance analysis

Best fit: Deep reasoning models (e.g., GPT‑5, GPT‑5.1, GPT‑5.2, Claude Opus). These models handle large context, produce structured reasoning, and give the most reliable insights for complex engineering tasks.

4. Multi-step Agentic Development

Examples:
- Repo-wide refactors
- Migrating a codebase
- Scaffolding entire features
- Implementing multi-file plans in Agent Mode
- Automated workflows (Plan → Execute → Modify)

Best fit: Agent-capable models (e.g., GPT‑5.1‑Codex‑Max, GPT‑5.2‑Codex). These models are ideal when you need Copilot to execute multi-step tasks across your repository.

GitHub Copilot Models - Developer Friendly Comparison

The set of models you can choose from depends on your Copilot subscription, and the available options may evolve over time. Each model also has its own premium request multiplier, which reflects the compute resources it requires. If you're using a paid Copilot plan, the multiplier determines how many premium requests are deducted whenever that model is used.
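To make the multiplier arithmetic concrete, here is a minimal sketch. The multiplier values are copied from the comparison table in this guide and may change over time; the names MULTIPLIERS and premiumRequestsUsed are hypothetical and are not part of any Copilot API.

```javascript
// Multipliers as listed in this guide's comparison table (subject to change).
const MULTIPLIERS = new Map([
  ["GPT-4.1", 0],
  ["Grok Code Fast 1", 0.25],
  ["Claude Sonnet 4.5", 1],
  ["Claude Opus 4.5", 3],
]);

// Hypothetical helper: premium requests deducted for `prompts` prompts on one model.
function premiumRequestsUsed(model, prompts) {
  if (!MULTIPLIERS.has(model)) {
    throw new Error(`Unknown model: ${model}`);
  }
  return MULTIPLIERS.get(model) * prompts;
}

console.log(premiumRequestsUsed("Claude Opus 4.5", 10)); // 30
console.log(premiumRequestsUsed("GPT-4.1", 10));         // 0
```

Under these figures, ten prompts on a 3x model deduct 30 premium requests, while the same ten prompts on a 0x model deduct none on a paid plan.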
| Model Category | Example Models (premium request multiplier for paid plans) | What They’re Best At | When to Use Them |
| --- | --- | --- | --- |
| Fast Lightweight Models | Claude Haiku 4.5, Gemini 3 Flash (0.33x); Grok Code Fast 1 (0.25x) | Low latency, quick responses | Small edits, Q&A, simple code tasks |
| General-Purpose Coding Models | GPT‑4.1, GPT‑5‑mini (0x); GPT-5-Codex, Claude Sonnet 4.5 (1x) | Reliable day‑to‑day development | Writing functions, small tests, documentation |
| Deep Reasoning Models | GPT-5.1 Codex Mini (0.33x); GPT‑5, GPT‑5.1, GPT-5.1 Codex, GPT‑5.2, Claude Sonnet 4.0, Gemini 2.5 Pro, Gemini 3 Pro (1x); Claude Opus 4.5 (3x) | Complex reasoning and debugging | Architecture work, deep bug diagnosis |
| Agentic / Multi-step Models | GPT‑5.1‑Codex‑Max, GPT‑5.2‑Codex (1x) | Planning + execution workflows | Repo-wide changes, feature scaffolding |

Enterprise Considerations

For organizations using Copilot Enterprise or Business:
- Admins can control which models employees can use
- Model selection may be restricted due to security, regulation, or data governance
- You may see fewer available models depending on your organization’s Copilot policies

Using "Auto" Model Selection in GitHub Copilot

GitHub Copilot’s Auto model selection automatically chooses the best available model for your prompts, reducing the mental load of picking a model and helping you avoid rate‑limiting. When enabled, Copilot prioritizes model availability and selects from a rotating set of eligible models such as GPT‑4.1, GPT‑5 mini, GPT‑5.2‑Codex, Claude Haiku 4.5, and Claude Sonnet 4.5 while respecting your subscription level and any administrator‑imposed restrictions. Auto also excludes models blocked by policies, models with premium multipliers greater than 1, and models unavailable in your plan. For paid plans, Auto provides an additional benefit: a 10% discount on premium request multipliers when used in Copilot Chat.
Overall, Auto offers a balanced, optimized experience by dynamically selecting a performant and cost‑efficient model without requiring developers to switch models manually. Read more about Auto model selection here: About Copilot auto model selection - GitHub Docs.

Final Thoughts

GitHub Copilot is becoming a core part of developer workflows. Choosing the right model can dramatically improve your productivity, the accuracy of Copilot’s responses, your experience with multi-step agentic tasks, and your ability to navigate complex codebases. Whether you’re building features, debugging complex issues, or orchestrating repo-wide changes, picking the right model helps you get the best out of GitHub Copilot.

References and Further Reading

To explore each model further, visit the GitHub Copilot model comparison documentation or try switching models in Copilot Chat to see how they impact your workflow.
- AI model comparison - GitHub Docs
- Requests in GitHub Copilot - GitHub Docs
- About Copilot auto model selection - GitHub Docs

Demystifying GitHub Copilot Security Controls: easing concerns for organizational adoption
At a recent developer conference, I delivered a session on Legacy Code Rescue using GitHub Copilot App Modernization. Throughout the day, conversations with developers revealed a clear divide: some have fully embraced Agentic AI in their daily coding, while others remain cautious. Often, this hesitation isn't due to reluctance but stems from organizational concerns around security and regulatory compliance. Having witnessed similar patterns during past technology shifts, I understand how these barriers can slow adoption. In this blog, I'll demystify the most common security concerns about GitHub Copilot and explain how its built-in features address them, empowering organizations to confidently modernize their development workflows. GitHub Copilot Model Training A common question I received at the conference was whether GitHub uses your code as training data for GitHub Copilot. I always direct customers to the GitHub Copilot Trust Center for clarity, but the answer is straightforward: “No. GitHub uses neither Copilot Business nor Enterprise data to train the GitHub model.” Notice this restriction also applies to third-party models as well (e.g. Anthropic, Google). GitHub Copilot Intellectual Property indemnification policy A frequent concern I hear is, since GitHub Copilot’s underlying models are trained on sources that include public code, it might simply “copy and paste” code from those sources. Let’s clarify how this actually works: Does GitHub Copilot “copy/paste”? “The AI models that create Copilot’s suggestions may be trained on public code, but do not contain any code. When they generate a suggestion, they are not “copying and pasting” from any codebase.” To provide an additional layer of protection, GitHub Copilot includes a “duplicate detection filter”. This feature helps prevent suggestions that closely match public code from being surfaced. (Note: This duplicate detection currently does not apply to the Copilot coding agent.) 
More importantly, customers are protected by an Intellectual Property indemnification policy. This means that if you receive an unmodified suggestion from GitHub Copilot and face a copyright claim as a result, Microsoft will defend you in court. GitHub Copilot Data Retention Another frequent question I hear concerns GitHub Copilot’s data retention policies. For organizations on GitHub Copilot Business and Enterprise plans, retention practices depend on how and where the service is accessed from: Access through IDE for Chat and Code Completions: Prompts and Suggestions: Not retained. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. Other GitHub Copilot access and use: Prompts and Suggestions: Retained for 28 days. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. For Copilot Coding Agent, session logs are retained for the life of the account in order to provide the service. Excluding content from GitHub Copilot To prevent GitHub Copilot from indexing sensitive files, you can configure content exclusions at the repository or organization level. In VS Code, use the .copilotignore file to exclude files client-side. Note that files listed in .gitignore are not indexed by default but may still be referenced if open or explicitly referenced (unless they’re excluded through .copilotignore or content exclusions). The life cycle of a GitHub Copilot code suggestion Here are the key protections at each stage of the life cycle of a GitHub Copilot code suggestion: In the IDE: Content exclusions prevent files, folders, or patterns from being included. GitHub proxy (pre-model safety): Prompts go through a GitHub proxy hosted in Microsoft Azure for pre-inference checks: screening for toxic or inappropriate language, relevance, and hacking attempts/jailbreak-style prompts before reaching the model. 
Model response: With the public code filter enabled, some suggestions are suppressed. The vulnerability protection feature blocks insecure coding patterns like hardcoded credentials or SQL injections in real time. Disable access to GitHub Copilot Free Due to the varying policies associated with GitHub Copilot Free, it is crucial for organizations to ensure it is disabled both in the IDE and on GitHub.com. Since not all IDEs currently offer a built-in option to disable Copilot Free, the most reliable method to prevent both accidental and intentional access is to implement firewall rule changes, as outlined in the official documentation. Agent Mode Allow List Accidental file system deletion by Agentic AI assistants can happen. With GitHub Copilot agent mode, the "Terminal auto approve” setting in VS Code can be used to prevent this. This setting can be managed centrally using a VS Code policy. MCP registry Organizations often want to restrict access to allow only trusted MCP servers. GitHub now offers an MCP registry feature for this purpose. This feature isn’t available in all IDEs and clients yet, but it's being developed. Compliance Certifications The GitHub Copilot Trust Center page lists GitHub Copilot's broad compliance credentials, surpassing many competitors in financial, security, privacy, cloud, and industry coverage. SOC 1 Type 2: Assurance over internal controls for financial reporting. SOC 2 Type 2: In-depth report covering Security, Availability, Processing Integrity, Confidentiality, and Privacy over time. SOC 3: General-use version of SOC 2 with broad executive-level assurance. ISO/IEC 27001:2013: Certification for a formal Information Security Management System (ISMS), based on risk management controls. CSA STAR Level 2: Includes a third-party attestation combining ISO 27001 or SOC 2 with additional cloud control matrix (CCM) requirements. TISAX: Trusted Information Security Assessment Exchange, covering automotive-sector security standards. 
In summary, while the adoption of AI tools like GitHub Copilot in software development can raise important questions around security, privacy, and compliance, the safeguards already in place help address these concerns. By understanding the safeguards, configurable controls, and robust compliance certifications offered, organizations and developers alike can feel more confident in embracing GitHub Copilot to accelerate innovation while maintaining trust and peace of mind.

GitHub Copilot SDK and Hybrid AI in Practice: Automating README to PPT Transformation
Introduction

In today's rapidly evolving AI landscape, developers often face a critical choice: should we use powerful cloud-based Large Language Models (LLMs) that require internet connectivity, or lightweight Small Language Models (SLMs) that run locally but have limited capabilities? The answer isn't either-or. It's hybrid models, which combine the strengths of both to create AI solutions that are secure, efficient, and powerful.

This article explores hybrid model architectures through the lens of GenGitHubRepoPPT, demonstrating how to elegantly combine Microsoft Foundry Local, GitHub Copilot SDK, and other technologies to automatically generate professional PowerPoint presentations from GitHub README files.

1. Hybrid Model Scenarios and Value

1.1 What Are Hybrid Models?

Hybrid AI Models strategically combine locally-running Small Language Models (SLMs) with cloud-based Large Language Models (LLMs) within the same application, selecting the most appropriate model for each task based on its unique characteristics.

Core Principles:
- Local Processing for Sensitive Data: Privacy-critical content analysis happens on-device
- Cloud for Value Creation: Complex reasoning and creative generation leverage cloud power
- Balancing Cost and Performance: High-frequency, simple tasks run locally to minimize API costs

1.2 Typical Hybrid Model Use Cases

| Use Case | Local SLM Role | Cloud LLM Role | Value Proposition |
| --- | --- | --- | --- |
| Intelligent Document Processing | Text extraction, structural analysis | Content refinement, format conversion | Privacy protection + Professional output |
| Code Development Assistant | Syntax checking, code completion | Complex refactoring, architecture advice | Fast response + Deep insights |
| Customer Service Systems | Intent recognition, FAQ handling | Complex issue resolution | Reduced latency + Enhanced quality |
| Content Creation Platforms | Keyword extraction, outline generation | Article writing, multilingual translation | Cost control + Creative assurance |

1.3 Why Choose Hybrid Models?
Three Core Advantages: Privacy and Security Sensitive data never leaves local devices Compliant with GDPR, HIPAA, and other regulations Ideal for internal corporate documents and personal information Cost Optimization Reduces cloud API call frequency Local models have zero usage fees Predictable operational costs Performance and Reliability Local processing eliminates network latency Partial functionality in offline environments Cloud models ensure high-quality output 2. Core Technology Analysis 2.1 Large Language Models (LLMs): Cloud Intelligence Representatives What are LLMs? Large Language Models are deep learning-based natural language processing models, typically with billions to trillions of parameters. Through training on massive text datasets, they've acquired powerful language understanding and generation capabilities. Representative Models: Claude Sonnet 4.5: Anthropic's flagship model, excelling at long-context processing and complex reasoning GPT-5.2 Series: OpenAI's general-purpose language models Gemini: Google's multimodal large models LLM Advantages: ✅ Exceptional text generation quality ✅ Powerful contextual understanding ✅ Support for complex reasoning tasks ✅ Continuous model updates and optimization Typical Applications: Professional document writing (technical reports, business plans) Code generation and refactoring Multilingual translation Creative content creation 2.2 Small Language Models (SLMs) and Microsoft Foundry Local 2.2.1 SLM Characteristics Small Language Models typically have 1B-7B parameters, designed specifically for resource-constrained environments. 
Mainstream SLM Model Families:
- Microsoft Phi Family: Inference-optimized efficient models
- Alibaba Qwen Family: Excellent Chinese language capabilities
- Mistral Series: Outstanding performance with small parameter counts

SLM Advantages:
⚡ Low-latency response (millisecond-level)
💰 Zero API costs
🔒 Fully local, data stays on-device
📱 Suitable for edge device deployment

2.2.2 Microsoft Foundry Local: The Foundation of Local AI

Foundry Local is Microsoft's local AI runtime tool, enabling developers to easily run SLMs on Windows or macOS devices.

Core Features:

OpenAI-Compatible API

    # Using Foundry Local is like using OpenAI API
    from openai import OpenAI
    from foundry_local import FoundryLocalManager

    manager = FoundryLocalManager("qwen2.5-7b-instruct")
    client = OpenAI(
        base_url=manager.endpoint,
        api_key=manager.api_key
    )

Hardware Acceleration Support
- CPU: General computing support
- GPU: NVIDIA, AMD, Intel graphics acceleration
- NPU: Qualcomm, Intel AI-specific chips
- Apple Silicon: Neural Engine optimization

Based on ONNX Runtime
- Cross-platform compatibility
- Highly optimized inference performance
- Supports model quantization (INT4, INT8)

Convenient Model Management

    # View available models
    foundry model list

    # Run a model
    foundry model run qwen2.5-7b-instruct-generic-cpu:4

    # Check running status
    foundry service ps

Foundry Local Application Value:
🎓 Educational Scenarios: Students can learn AI development without cloud subscriptions
🏢 Enterprise Environments: Process sensitive data while maintaining compliance
🧪 R&D Testing: Rapid prototyping without API cost concerns
✈️ Offline Environments: Works on planes, subways, and other no-network scenarios

2.3 GitHub Copilot SDK: The Express Lane from Agent to Business Value

2.3.1 What is GitHub Copilot SDK?

GitHub Copilot SDK, released as a technical preview on January 22, 2026, is a game-changer for AI Agent development.
Unlike other AI SDKs, Copilot SDK doesn't just provide API calling interfaces; it delivers a complete, production-grade Agent execution engine.

Why is it revolutionary? Traditional AI application development requires you to build:
❌ Context management systems (multi-turn conversation state)
❌ Tool orchestration logic (deciding when to call which tool)
❌ Model routing mechanisms (switching between different LLMs)
❌ MCP server integration
❌ Permission and security boundaries
❌ Error handling and retry mechanisms

Copilot SDK provides all of this out-of-the-box, letting you focus on business logic rather than underlying infrastructure.

2.3.2 Core Advantages: The Ultra-Short Path from Concept to Code

Production-Grade Agent Engine: Battle-Tested Reliability

Copilot SDK uses the same Agent core as GitHub Copilot CLI, which means:
✅ Validated in millions of real-world developer scenarios
✅ Capable of handling complex multi-step task orchestration
✅ Automatic task planning and execution
✅ Built-in error recovery mechanisms

Real-World Example: In the GenGitHubRepoPPT project, we don't need to hand-write the "how to convert outline to PPT" logic. We simply tell Copilot SDK the goal, and it automatically:
- Analyzes outline structure
- Plans slide layouts
- Calls file creation tools
- Applies formatting logic
- Handles multilingual adaptation

    # Traditional approach: requires hundreds of lines of code for logic
    # (illustrative pseudocode; the helper functions are not defined here)
    def create_ppt_traditional(outline):
        slides = parse_outline(outline)
        for slide in slides:
            layout = determine_layout(slide)
            content = format_content(slide)
            apply_styling(content, layout)
            # ... more manual logic
        return ppt_file

    # Copilot SDK approach: focus on business intent
    session = await client.create_session({
        "model": "claude-sonnet-4.5",
        "streaming": True,
        "skill_directories": [skills_dir]
    })
    # send_and_wait is awaited, like create_session above
    await session.send_and_wait({"prompt": prompt}, timeout=600)

Custom Skills: Reusable Encapsulation of Business Knowledge

This is one of Copilot SDK's most powerful features.
In traditional AI development, you need to provide complete prompts and context with every call. Skills allow you to: Define once, reuse forever: # .copilot_skills/ppt/SKILL.md # PowerPoint Generation Expert Skill ## Expertise You are an expert in business presentation design, skilled at transforming technical content into easy-to-understand visual presentations. ## Workflow 1. **Structure Analysis** - Identify outline hierarchy (titles, subtitles, bullet points) - Determine topic and content density for each slide 2. **Layout Selection** - Title slide: Use large title + subtitle layout - Content slides: Choose single/dual column based on bullet count - Technical details: Use code block or table layouts 3. **Visual Optimization** - Apply professional color scheme (corporate blue + accent colors) - Ensure each slide has a visual focal point - Keep bullets to 5-7 items per page 4. **Multilingual Adaptation** - Choose appropriate fonts based on language (Chinese: Microsoft YaHei, English: Calibri) - Adapt text direction and layout conventions ## Output Requirements Generate .pptx files meeting these standards: - 16:9 widescreen ratio - Consistent visual style - Editable content (not images) - File size < 5MB Business Code Generation Capability This is the core value of this project. Unlike generic LLM APIs, Copilot SDK with Skills can generate truly executable business code. 
Comparison Example:

| Aspect | Generic LLM API | Copilot SDK + Skills |
| --- | --- | --- |
| Task Description | Requires detailed prompt engineering | Concise business intent suffices |
| Output Quality | May need multiple adjustments | Professional-grade on first try |
| Code Execution | Usually example code | Directly generates runnable programs |
| Error Handling | Manual implementation required | Agent automatically handles and retries |
| Multi-step Tasks | Manual orchestration needed | Automatic planning and execution |

Comparison of manual coding workload:

| Task | Manual Coding | Copilot SDK |
| --- | --- | --- |
| Processing logic code | ~500 lines | ~10 lines configuration |
| Layout templates | ~200 lines | Declared in Skill |
| Style definitions | ~150 lines | Declared in Skill |
| Error handling | ~100 lines | Automatically handled |
| Total | ~950 lines | ~10 lines + Skill file |

Tool Calling & MCP Integration: Connecting to the Real World

Copilot SDK doesn't just generate code; it can directly execute operations:
🗃️ File System Operations: Create, read, modify files
🌐 Network Requests: Call external APIs
📊 Data Processing: Use pandas, numpy, and other libraries
🔧 Custom Tools: Integrate your business logic

3. GenGitHubRepoPPT Case Study

3.1 Project Overview

GenGitHubRepoPPT is an innovative hybrid AI solution that combines local AI models with cloud-based AI agents to automatically generate professional PowerPoint presentations from GitHub repository README files in under 5 minutes.

[Figure: Technical Architecture]

3.2 Why Adopt a Hybrid Model?
Stage 1: Local SLM Processes Sensitive Data Task: Analyze GitHub README, extract key information, generate structured outline Reasons for choosing Qwen-2.5-7B + Foundry Local: Privacy Protection README may contain internal project information Local processing ensures data doesn't leave the device Complies with data compliance requirements Cost Effectiveness Each analysis processes thousands of tokens Cloud API costs are significant in high-frequency scenarios Local models have zero additional fees Performance Qwen-2.5-7B excels at text analysis tasks Outstanding Chinese support Acceptable CPU inference latency (typically 2-3 seconds) Stage 2: Cloud LLM + Copilot SDK Creates Business Value Task: Create well-formatted PowerPoint files based on outline Reasons for choosing Claude Sonnet 4.5 + Copilot SDK: Automated Business Code Generation Traditional approach pain points: Need to hand-write 500+ lines of code for PPT layout logic Require deep knowledge of python-pptx library APIs Style and formatting code is error-prone Multilingual support requires additional conditional logic Copilot SDK solution: Declare business rules and best practices through Skills Agent automatically generates and executes required code Zero-code implementation of complex layout logic Development time reduced from 2-3 days to 2-3 hours Ultra-Short Path from Intent to Execution Comparison: Different ways to implement "Generate professional PPT" 3. Production-Grade Reliability and Quality Assurance Battle-tested Agent engine: Uses the same core as GitHub Copilot CLI Validated in millions of real-world scenarios Automatically handles edge cases and errors Consistent output quality: Professional standards ensured through Skills Automatic validation of generated files Built-in retry and error recovery mechanisms 4. Rapid Iteration and Optimization Capability Scenario: Client requests PPT style adjustment The GitHub Repo https://github.com/kinfey/GenGitHubRepoPPT 4. 
Summary 4.1 Core Value of Hybrid Models + Copilot SDK The GenGitHubRepoPPT project demonstrates how combining hybrid models with Copilot SDK creates a new paradigm for AI application development. Privacy and Cost Balance The hybrid approach allows sensitive README analysis to happen locally using Qwen-2.5-7B, ensuring data never leaves the device while incurring zero API costs. Meanwhile, the value-creating work—generating professional PowerPoint presentations—leverages Claude Sonnet 4.5 through Copilot SDK, delivering quality that justifies the per-use cost. From Code to Intent Traditional AI development required writing hundreds of lines of code to handle PPT generation logic, layout selection, style application, and error handling. With Copilot SDK and Skills, developers describe what they want in natural language, and the Agent automatically generates and executes the necessary code. What once took 3-5 days now takes 3-4 hours, with 95% less code to maintain. Automated Business Code Generation Copilot SDK doesn't just provide code examples—it generates complete, executable business logic. When you request a multilingual PPT, the Agent understands the requirement, selects appropriate fonts, generates the implementation code, executes it with error handling, validates the output, and returns a ready-to-use file. Developers focus on business intent rather than implementation details. 4.2 Technology Trends The Shift to Intent-Driven Development We're witnessing a fundamental change in how developers work. Rather than mastering every programming language detail and framework API, developers are increasingly defining what they want through declarative Skills. Copilot SDK represents this future: you describe capabilities in natural language, and AI Agents handle the code generation and execution automatically. 
Edge AI and Cloud AI Integration The evolution from pure cloud LLMs (powerful but privacy-concerning) to pure local SLMs (private but limited) has led to today's hybrid architectures. GenGitHubRepoPPT exemplifies this trend: local models handle data analysis and structuring, while cloud models tackle complex reasoning and professional output generation. This combination delivers fast, secure, and professional results. Democratization of Agent Development Copilot SDK dramatically lowers the barrier to building AI applications. Senior engineers see 10-20x productivity gains. Mid-level engineers can now build sophisticated agents that were previously beyond their reach. Even junior engineers and business experts can participate by writing Skills that capture domain knowledge without deep technical expertise. The future isn't about whether we can build AI applications—it's about how quickly we can turn ideas into reality. References Projects and Code GenGitHubRepoPPT GitHub Repository - Case study project Microsoft Foundry Local - Local AI runtime GitHub Copilot SDK - Agent development SDK Copilot SDK Getting Started Tutorial - Official quick start Deep Dive: Copilot SDK Build an Agent into Any App with GitHub Copilot SDK - Official announcement GitHub Copilot SDK Cookbook - Practical examples Copilot CLI Official Documentation - CLI tool documentation Learning Resources Edge AI for Beginners - Edge AI introductory course Azure AI Foundry Documentation - Azure AI documentation GitHub Copilot Extensions Guide - Extension development guide