From AI Suggestions to Autonomous CRM Actions in Dynamics 365
Modern CRM AI solutions often stop at case summarization, but real transformation requires more. This blog introduces a CRM Copilot Agent Accelerator built on Microsoft Power Platform, designed to evolve AI from simple insights to predictive intelligence and ultimately to autonomous actions. By combining Dynamics 365, Dataverse, Power Automate, and AI Builder, and extending capabilities through modular add-on packs, this approach enables organizations to reduce manual effort, improve decision-making, and scale service operations efficiently without additional Copilot licensing.

File share migrations simplified with Azure Copilot Migration Agent
Building on our earlier announcement of discovery and assessment support for SMB and NFS file shares in Azure Migrate, we are extending the experience to support end-to-end file share migrations within the same workflow. With Azure Copilot Migration Agent, customers can move from discovery and assessment to migration through a single guided experience in Azure Migrate. By bringing planning and execution together, the agent helps organizations streamline migration activity, reduce handoffs, and maintain continuity across stages.

Overview

Since the release of file share discovery and assessment in Azure Migrate earlier this year, customers have indicated that while visibility into their file share estate improved, the transition to execution remained fragmented. In many cases, teams still had to work across separate workflows for inventory, readiness planning, and migration, increasing operational friction and the risk of losing context between stages.

Azure Copilot Migration Agent helps address this gap by bringing discovery, assessment, planning, and execution into a single guided journey. Azure Migrate provides visibility and recommendations, while Azure Storage Mover supports execution in a connected, agentic experience. The result is a more consistent migration path that reduces complexity, preserves context, and helps teams move file shares to Azure with greater operational confidence.

Customer Value

This update streamlines the migration journey by connecting each stage of the process and reducing operational overhead. Natural language guidance helps teams start and manage migration activities much faster, often in hours or days instead of weeks. The experience supports the following scenarios:

- End-to-end discovery, assessment, and migration for on-premises Windows and Linux file shares (SMB) to Azure Files.
- Discovery and assessment for on-premises Windows and Linux file shares (NFS).
- Data transfers from one Azure Blob container to another container.
Design principles

The experience preserves continuity across inventory, readiness insights, and execution planning; enables direct movement of validated shares when heavyweight orchestration is unnecessary; maintains approval and sequencing controls; and supports the file and object movement patterns commonly required in production environments.

Getting Started with Storage Migration in Azure Copilot Migration Agent (ACMA)

1. Launch Azure Migrate: sign in to the Azure portal and open Azure Migrate.
2. From the Getting Started page, open Azure Copilot Migration Agent, then select or create an Azure Migrate project.
3. Describe the migration in natural language. The agent detects storage migration intent, assists with storage migration planning, and routes execution requests.

Example scenarios and prompts

1. Migration of on-premises Windows Server data over SMB to Azure Files
2. Prompt: Help me transfer data from one Azure blob container to another blob container

Call to action

The storage-integrated capability is launching in Limited Preview at Microsoft Build. Sign up for the Preview here. For questions, contact storagemigrationcopilotagent@microsoft.com.

Learn More

- File share discovery and assessment in Azure Migrate
- Azure Copilot Migration Agent
- Azure Storage Mover

Token economics–driven architecture: hybrid models, AI Runway, AKS Kata MicroVM, MCP
1. The moment the bill arrived

For most of 2024 and 2025, "Agents" were a demo word. In 2026 they are a line item on the cloud invoice. Every major model provider (OpenAI, Anthropic, Google, Mistral, DeepSeek, and even the in-cluster open-weights serving stacks) now bills by the token. Input tokens, output tokens, cached tokens, reasoning tokens, tool-call tokens. The unit price has come down. The number of tokens an autonomous agent burns through has gone up by an order of magnitude.

The slide deck I keep coming back to is module 02 of the Enterprise Agent Workshop: Token Economics and Cost Control. The short version: an agentic system is not a chat app. A chat app emits one model call per user turn. An agent emits a model call to plan, another to pick a tool, another to interpret the tool result, another to decide the next step, and another to summarize, and then it loops. Multiply by tools that themselves invoke models. Multiply again by retries and reflection. The bill is no longer "what does the model cost per million tokens." The bill is "what does my architecture cost per user request."

This post is about an architecture that answers that question on purpose, and that does it without giving up the security properties an enterprise actually needs.

The blueprint lives in this repo, BYOT_Dev: a four-agent SDLC tower (Requirements → Code → Test → Deploy) running on AKS, each agent boxed inside its own Kata MicroVM, each one exposing tools to GitHub Copilot Chat over the Model Context Protocol, and all of them sharing a single on-cluster small-language-model endpoint served by AI Runway.

2. Why agentic workloads inflate the token bill

Three forces compound:

- Autonomy multiplies call count. A user typing "build me a URL shortener" produces one prompt at the IDE. By the time a 4-agent pipeline has clarified requirements, generated code, written tests, and produced a Kubernetes manifest, you have spent 30–200 model calls, most of them invisible to the user.
- Reasoning eats output tokens. Modern reasoning models think before they speak. That hidden chain-of-thought is billed. A 5-line answer might charge you for 3,000 reasoning tokens.
- Context inflation. Every tool result is re-injected into the next call. A 50 KB code review answer becomes the context of the next refactor turn. Costs grow super-linearly with conversation depth.

You can't out-prompt-engineer this. The only durable mitigation is architectural, and it has three levers:

| Lever | What it means in practice |
| --- | --- |
| Model tiering | Use a small, cheap model for narrow tasks; reserve the frontier model for orchestration and judgement. |
| Placement tiering | Place each model where it's cheapest to run: on-cluster CPU for tiny SLMs, on-cluster GPU for mid-size models, cloud APIs for frontier reasoning. |
| Protocol tiering | Use a standard like MCP so the expensive orchestrator can hand off subtasks to the cheap workers without lock-in. |

The architecture this post describes pulls all three levers at once.

3. The mental model: frontier brain, small-model hands

Look at this picture:

    spec:
      image: ghcr.io/kaito-project/aikit/llama3.2:1b
      model: { id: "kaito/llama3.2-1b", source: huggingface }
      engine: { type: llamacpp }
      provider:
        name: kaito
        overrides:
          resource:
            instanceType: Standard_D4s_v3
            preferredNodes: ["aks-nodepool1-21523631-vmss000001"]
      nodeSelector: { agentpool: nodepool1 }
      resources: { cpu: "2", memory: "4Gi" }
      scaling: { replicas: 1 }

AI Runway then takes care of:

- selecting the engine (llamacpp for CPU, vllm or dynamo for GPU);
- selecting the provider (kaito today, others coming);
- pulling the model image from the AIKit catalog;
- exposing an OpenAI-compatible Service at http://llama3-2-1b-cpu.airunway-models.svc:80/v1.

A note on the CPU-only example. This repo deliberately uses CPU + Llama-3.2-1B to prove the architecture can run on the cheapest node SKU available. In production you should not assume CPU is always right.
The right answer is scenario-driven:

| Scenario | Suggested placement |
| --- | --- |
| High-volume, narrow, latency-tolerant task (e.g. "expand a requirement into bullet points") | On-cluster CPU SLM (1B–3B), which is what this repo demonstrates |
| Code generation, refactoring, multi-file reasoning | On-cluster GPU mid-model (7B–14B) via KAITO vllm, on an AKS GPU pool that auto-scales from zero |
| Privacy-sensitive enterprise data, must not leave the cluster | On-cluster GPU, possibly with confidential compute |
| Frontier reasoning, planning, judging tool output | The Copilot seat's already-included frontier model, called sparingly via MCP, not a second pay-per-token endpoint you have to provision |

AI Runway makes that choice a YAML edit, not a refactor. The point of the abstraction is optionality: the right to change your mind about token economics quarter by quarter without rewriting agents.

5. Hybrid scaling: all inference on AKS, planning on the Copilot tokens you already pay for

The single biggest token-economics mistake an enterprise can make right now is treating model placement as a binary: "all in the cluster" or "all on a pay-per-token cloud API." Real workloads are neither. The pattern that actually saves money has two ingredients, and both of them are already on your invoice:

- AKS that you already provisioned. A small CPU node pool for the steady-state workload, plus a GPU node pool that scales from zero when the small pool can't keep up. Same cluster, same Kata isolation, one invoice line.
- The Copilot seat the developer already pays for. Copilot Chat's frontier model has its own token allowance baked into the seat. Use that allowance, not a separately provisioned cloud inference endpoint, to do the planning that drives the cheap AKS workers via MCP.

That is the whole "hybrid." No external Foundry endpoint, no second per-token meter for inference. Just AKS capacity that grows when you need it + a frontier brain you already pay for.
The agent traffic split is roughly:

- ~85% of agent calls are short, narrow, predictable: "expand this requirement", "format this YAML", "summarize this diff". A 1B–3B model on a CPU node answers these in seconds; the bill is the node, not the token.
- ~15% are heavier: multi-file refactors, long-context reasoning, the 400-line FastAPI generation. They need a 7B–14B model on a GPU.
- Planning and judgement on top of all of it are done by the Copilot seat's frontier model, which the user is paying for whether you build BYOT or not.

Lever A: a GPU node pool on the same AKS cluster, scaled 0 → N

Keep the always-on tiny-cpu ModelDeployment for steady state. Add a second AI Runway ModelDeployment for the mid model on a GPU node pool that is created at size zero and managed by the AKS Cluster Autoscaler:

    az aks nodepool add \
      --cluster-name $CLUSTER --resource-group $RG \
      --name gpupool \
      --node-vm-size Standard_NC24ads_A100_v4 \
      --node-count 0 --min-count 0 --max-count 4 \
      --enable-cluster-autoscaler \
      --node-taints sku=gpu:NoSchedule \
      --workload-runtime KataVmIsolation

    # airunway/modeldeployment-mid-gpu.yaml (sketch)
    spec:
      image: ghcr.io/kaito-project/aikit/qwen2.5:7b
      engine: { type: vllm }
      provider:
        name: kaito
        overrides:
          resource:
            instanceType: Standard_NC24ads_A100_v4
      nodeSelector: { agentpool: gpupool }
      tolerations: [{ key: sku, operator: Equal, value: gpu, effect: NoSchedule }]
      resources: { cpu: "4", memory: "32Gi", nvidia.com/gpu: "1" }
      scaling: { replicas: 0, maxReplicas: 4 }

The key trick is replicas: 0 plus an autoscaler min-count 0. When nobody is asking the mid model anything, no GPU node is running and no GPU node is billed. The first request causes AI Runway to scale to 1, which triggers the Cluster Autoscaler to provision a GPU node, which gets scheduled with Kata Pod Sandboxing intact. When traffic dies down, both the replica and the node go back to zero. All of this is inside AKS: the agents never leave the cluster to find a GPU.
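To make the scale-to-zero billing argument concrete, here is a minimal Python sketch that estimates how many minutes a GPU node would be billed for a given request pattern. It rests on a simplified model that is not taken from the repo: each request keeps the node alive for a fixed cooldown window, overlapping windows merge, and provisioning delay is ignored. The function name `billed_gpu_minutes` and the 10-minute cooldown are hypothetical.

```python
from typing import Iterable


def billed_gpu_minutes(request_minutes: Iterable[float], cooldown: float = 10.0) -> float:
    """Estimate billed GPU node minutes under a simplified scale-to-zero model.

    Assumptions (illustrative, not from the repo): a request arriving at
    minute t keeps the node up until t + cooldown; overlapping windows are
    merged; node provisioning time is ignored.
    """
    total = 0.0
    window_start = window_end = None
    for t in sorted(request_minutes):
        if window_end is None or t > window_end:
            # Node had scaled to zero: close the previous window, open a new one.
            if window_end is not None:
                total += window_end - window_start
            window_start, window_end = t, t + cooldown
        else:
            # Node is still up: extend the current window.
            window_end = max(window_end, t + cooldown)
    if window_end is not None:
        total += window_end - window_start
    return total


if __name__ == "__main__":
    # Two bursts of heavy requests in an 8-hour day (minutes since start).
    bursts = [5, 7, 12, 240, 241]
    print(billed_gpu_minutes(bursts))  # 28.0 billed minutes instead of 480
```

Under these assumptions the node is billed only around the two bursts, which is the effect the replicas: 0 and min-count 0 settings are meant to produce. Real billing also includes node startup and the autoscaler's scale-down delay, so treat the result as a lower bound.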
Lever B: reuse the Copilot frontier tokens you already pay for

This is the lever most token-cost writeups miss. Every developer using BYOT already has a Copilot seat. That seat carries a frontier-model token allowance which Copilot Chat consumes the moment the user types into the chat. The orchestration loop in docs/workflow.md ("plan which tool to call next, read the tool's output, summarize the result") is paid out of that allowance, not out of a new inference endpoint you provision. This means:

- You do not stand up a separate cloud OpenAI / Foundry deployment for "the smart model." The smart model is already on the user's screen.
- You do not put a per-token meter on the agent-to-frontier path. The frontier is upstream of your agents: it calls them via MCP, not the other way around.
- The only per-token spend the architecture introduces is what Copilot itself charges against the seat, which is independent of how many BYOT agents you stand up.

The net effect: the parts of the workload that are expensive per token (planning, judgement) run on tokens the company already buys; the parts that are cheap to compute (long-form generation) run on AKS compute you already pay for as node hours.

How the agents pick between the two AKS backends

The Agent Framework client in https://github.com/kinfey/Multi-AI-Agents-Cloud-Native/blob/main/code/BYOT_Dev/agents/app/airunway_client.py takes its base_url and model from the ConfigMap. Three strategies, in increasing sophistication:

1. Per-role static binding. byot-requirements and byot-test (cheap, narrow) get AIRUNWAY_BASE_URL=tiny-cpu. byot-code and byot-deploy (heavier generation) get AIRUNWAY_BASE_URL=mid-gpu. One ConfigMap, one rollout.
2. Try-then-scale-up inside the tool. Each tool tries tiny-cpu first; if the answer is too short, fails a quality check, or times out, it retries against mid-gpu. The small model handles the easy 85%; the GPU pool handles only the 15% that actually needed it.
3. AI Gateway in front of both.
Put Azure API Management as an AI Gateway in front of the two AI Runway services. The agent talks to one URL; the gateway does semantic caching, token budgeting, and load-aware routing between tiny-cpu and mid-gpu. Both backends remain in your AKS; the gateway only routes.

A back-of-envelope token saving

Assume one Copilot Chat session through the BYOT tower fires 30 model calls at the lower agents. If those 30 went to an external frontier API at, say, $5 / million output tokens with an average 2 K output per call, that is $0.30 / session in additional lower-tier model spend, stacked on top of what Copilot Chat already charges the seat for planning. With the hybrid AKS + seat-tokens pattern:

- 25–26 calls (~85%) → tiny-cpu on a CPU node that is already running for the always-on agents → ≈ $0 marginal
- 4–5 calls (~15%) → mid-gpu, billed as GPU node hours only while AI Runway has scaled up, and amortised across every concurrent BYOT user that lands on the same node → ≈ $0.02–0.05
- Planning / judgement → already inside the Copilot seat allowance the developer is paying for → $0 additional

A $0.30-per-session pay-per-token outcome collapses toward ≈ $0.02–0.05 of pure AKS compute, and the GPU bill returns to zero when nobody is asking hard questions. That is the lever. The reason it works is that AI Runway gives the agents a single in-cluster front door, AKS gives the cluster elastic GPU capacity it doesn't pay for while idle, and Copilot Chat brings its own pre-paid frontier brain.

6. Kata MicroVM: the hardware-level helmet for agentic code

Cost is one half of the agentic-workload problem. The other half is what happens inside the box you put the agent in. Earlier this year I published Giving the Copilot SDK Agent a "hardware-level helmet" using Kata microVM on AKS. The argument, compressed: a traditional container is an apartment with a shared roof, the host Linux kernel. For a hand-written service the tenant is predictable.
For an agent, the tenant is the model, deciding at runtime which shell command to run, which file to read, which npx package to install. That's a new threat model. Container namespaces aren't sized for it. You want a dedicated guest kernel per Pod: a microVM.

Kata Containers is the integration layer that gives Kubernetes microVMs. AKS ships it as Pod Sandboxing with the kata-vm-isolation RuntimeClass on top of Hyper-V, created automatically when the node pool is provisioned with --workload-runtime KataVmIsolation. In BYOT_Dev every agent Pod sets:

    spec:
      runtimeClassName: kata-vm-isolation
      containers:
        - name: agent
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            capabilities: { drop: ["ALL"] }
            seccompProfile: { type: RuntimeDefault }

…and AKS does the rest. The Pod boots a real Hyper-V microVM, with its own guest kernel, before the container even starts. Verifying it is one command:

    kubectl -n agents exec deploy/byot-requirements -- uname -r
    # compare with the kernel on the node: if they differ, the microVM is confirmed

The repository also pins one agent per node via podAntiAffinity on kubernetes.io/hostname, so the four agents live on four physically distinct Kata hosts. A model escape in one cannot reach the others through a shared host kernel, because there is no shared host kernel.

The connection to token economics is this: the moment you trust a cheap on-cluster model to run agent loops on real customer code, the security envelope has to be stronger than a normal container, not weaker. Kata is the thing that makes "cheap" and "safe" not a trade-off. And because AKS Pod Sandboxing applies the same way to the CPU pool, the GPU pool, and any future node pool you add for burst, the hybrid placement story above does not weaken the isolation story: every Pod, on every tier, still boots its own guest kernel.

7. MCP: how GitHub Copilot Chat actually drives this tower

The final piece is the protocol.
The agents inside the Kata MicroVMs are useless unless something can call them. The "something" the user already has open is GitHub Copilot Chat in VS Code. The Model Context Protocol is the standard Copilot Chat (and almost every other serious agentic IDE) speaks to remote tool servers. In this repo each role exposes its tools via FastMCP over Streamable HTTP; see agents/app/main.py and the per-role tool sets in agents/app/roles/.

Service exposure is a small but important detail. The repo uses type: LoadBalancer for each role's Service (see k8s/services.yaml) because:

- kubectl port-forward does not work against Kata Pods (the listener lives inside the microVM, not in the host sandbox netns);
- kubectl proxy works but pins Copilot to localhost and requires a long-running local process;
- a LoadBalancer gives each agent a public Azure IP the IDE can hit directly.

Once the four LoadBalancer IPs are in .vscode/mcp.json, Copilot Chat in agent mode sees four MCP servers (byot-requirements, byot-code, byot-test, byot-deploy) and the user can simply say:

"Use the byot tower to take this idea (a URL shortener with click analytics) from requirements through deployment."

What happens under the covers (docs/workflow.md):

1. Copilot's frontier model plans the sequence. Frontier tokens spent: small, but smart.
2. It calls byot-requirements.gather_requirements({"idea": "URL shortener…"}) over MCP. No frontier tokens; the cluster-side Llama-3.2-1B does the work.
3. It calls byot-code.implement_from_requirements({...}). Same: cluster-side small model.
4. It calls byot-test.generate_test_plan({...}). Same.
5. It calls byot-deploy.generate_k8s_manifest({...}). Same.
6. Copilot's frontier model reads the four results and presents a coherent summary to the user. Frontier tokens spent: small.

The expensive model decided what to do five times. The cheap model did the actual long-form generation four times.
That is the token-economics win, and the only reason it's possible without lock-in is that MCP is an open standard.

8. Reading the architecture as a budget statement

Translate the picture into a unit-cost table, now with the hybrid tiers explicit:

| Layer | Where the cost lives | What controls it |
| --- | --- | --- |
| User input + IDE planning | Copilot seat (per-user subscription) | Already paid, flat rate |
| Frontier orchestration tokens | Copilot seat token allowance: already included, used for MCP planning, no separate endpoint | Number of agent rounds Copilot does |
| Tool-call traffic | Azure LoadBalancer egress | Negligible at this scale |
| tiny-cpu inference (steady state, ~85%) | AKS CPU node hours (1× D4s_v3 in this demo) | Replicas, model size, batch size |
| mid-gpu inference (autoscaled, ~15%) | AKS GPU node hours on the same cluster, only while replicas > 0 | Cluster Autoscaler / Karpenter min=0 max=N, scale-to-zero |
| Hardware isolation | AKS Pod Sandboxing (Kata), same node hours | Whether you turn it on (you should) |
| Provider swap-out | AI Runway YAML | A kubectl apply |

Three things to notice.

First, most of the per-request variable cost has moved from a token meter to a node meter. CPU hours are easier to forecast, easier to chargeback, and easier to cap than per-call token spend. You know how many D4s_v3 cores you're paying for; you do not know in advance how many tokens a frontier model will decide it needs.

Second, GPU capacity is no longer a fixed bet, and it never leaves AKS. The GPU node pool sits at zero nodes until AI Runway needs it, and when it does, it scales up inside the same cluster under the same Kata RuntimeClass: no second region, no second tenancy, no second per-token bill.

Third, the frontier brain is reused, not re-bought. The planning and judgement that drives the whole tower runs on the Copilot seat token allowance the developer already pays for. There is no separate "smart model" cloud endpoint provisioned by BYOT, so there is no second per-token meter to babysit.
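The back-of-envelope figures from section 5 can be checked with a few lines of Python. This is a sketch under the post's own assumptions (30 calls, about 2 K output tokens per call, $5 per million output tokens, an 85/15 split). The function names are hypothetical, and the GPU cost per session is deliberately left out because it depends on node pricing and on how many users share the node.

```python
def pay_per_token_cost(calls: int, output_tokens_per_call: int, usd_per_million: float) -> float:
    """Cost of sending every lower-agent call to an external pay-per-token API."""
    return calls * output_tokens_per_call * usd_per_million / 1_000_000


def split_calls(calls: int, cpu_share: float = 0.85) -> tuple[float, float]:
    """Expected number of calls handled by tiny-cpu and by mid-gpu."""
    cpu_calls = calls * cpu_share
    return cpu_calls, calls - cpu_calls


if __name__ == "__main__":
    baseline = pay_per_token_cost(calls=30, output_tokens_per_call=2_000, usd_per_million=5.0)
    cpu_calls, gpu_calls = split_calls(30)
    print(f"baseline per session: ${baseline:.2f}")
    print(f"tiny-cpu calls: {cpu_calls:.1f}, mid-gpu calls: {gpu_calls:.1f}")
```

The split of 25.5 and 4.5 expected calls is where the "25–26" and "4–5" ranges in the estimate come from. Changing `cpu_share` or the per-million price shows how sensitive the $0.30 baseline is to those two inputs.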
And because Kata Pod Sandboxing is included in AKS and applies the same way on the CPU pool and the GPU pool, the security cost on top of the compute cost is zero. That is what makes this architecture cost-aware and elastically-scalable and safety-aware at the same time. Those three used to be a trade-off. They no longer are.

9. Six commands, end-to-end

For completeness, the repo's run order (README.md):

    # 0. one-time prereqs: az login, kubectl, helm, docker, aks-preview
    az login
    # 1. provision AKS with Kata + ACR + AzureLinux
    bash infra/01-create-aks-kata.sh
    # 2. install AI Runway controller + KAITO provider (pinned to v0.5.0)
    bash infra/02-install-airunway.sh
    # 3. deploy Llama-3.2-1B on CPU via AI Runway ModelDeployment
    bash infra/03-deploy-qwen.sh
    # 4. build & push the single agent image to ACR
    bash infra/04-build-push-agents.sh
    # 5. deploy the 4 Kata-isolated MCP agents
    bash infra/05-deploy-agents.sh
    # 6. print the public MCP endpoints for GitHub Copilot
    bash infra/06-show-mcp-endpoints.sh

Drop the printed IPs into .vscode/mcp.json, open Copilot Chat, and you have a fully working, hardware-isolated, cost-aware agentic tower talking to a small model on a CPU node, driven by the frontier model the user is already paying a seat for. Add the mid-gpu ModelDeployment on a scale-to-zero GPU node pool alongside it whenever your traffic justifies the next tier; the agents and the Copilot integration don't change, and nothing leaves AKS.

10. Wrapping up: the through-line

Let me trace it one more time:

- Token economics is the new SLO. Agentic workloads multiply model calls; every call has a price. Architecture, not prompts, is what bends the curve.
- Tier your models, tier your placement. Frontier reasoning at the top; small models for the bulk work; on-cluster CPU for steady state and on-cluster GPU for the heavy 15%.
- Mix AKS compute with the Copilot tokens you already pay for. Don't add a second pay-per-token cloud endpoint for inference.
The heavy compute belongs on an AKS GPU pool that scales from zero; the planning belongs on the Copilot seat allowance the developer already has. That combination both saves tokens (no new per-token meter) and scales elastically (the cluster grows only when AI Runway asks).

- AI Runway makes placement a YAML edit. Today's CPU Llama is tomorrow's GPU Qwen on the same cluster. Same agent code.
- Kata MicroVM is non-negotiable for agentic code. The tenant is the model. The roof must be your own. AKS Pod Sandboxing makes it turnkey, and it applies the same way on the CPU pool and the GPU pool.
- MCP is the bridge. GitHub Copilot Chat is already an MCP client. Expose the cheap workers as MCP tools and the frontier brain calls them, burning the seat tokens, not new tokens.

The reference build is in this repo. Six commands, four agents, one tiny CPU model, full microVM isolation, real Copilot Chat integration, and a hybrid scaling path you can layer on without changing the agents or leaving AKS.

In the agentic era, a container is not just a box for your application; it is a box for uncertainty and for tokens. The microVM hardens the box; AI Runway lets you slide the model in and out of the box, between CPU and GPU nodes in the same cluster, without rewriting anything; MCP lets the user's expensive IDE drive the cheap box from the outside on tokens already on its tab.

That is the through-line. Build the tower. Watch the bill.

Further reading

- Sample Code
- Giving the Copilot SDK Agent a "hardware-level helmet" using Kata microVM on AKS
- AI Runway and KAITO
- Kata Containers · AKS Pod Sandboxing
- Model Context Protocol · GitHub Copilot Chat MCP support

Building a GitHub Copilot Agent Usage Dashboard
Introduction

When working with organisations that are creating GitHub Copilot custom agents, it becomes important to know how widely their community is taking up those agents. Some questions that quickly emerge are "how well are we actually using it?" and "which agents are getting used and which have not had much traction?". Native metrics provide high-level insights into adoption, but they lack the depth needed to answer more granular questions, such as which agent workflows are most used, or how behaviour evolves over time.

In this post, I'll walk through how to build an enterprise-grade GitHub Copilot usage dashboard that captures detailed telemetry from VS Code using OpenTelemetry, processes it in Azure Monitor, and visualises insights in Grafana, all using a reproducible, infrastructure-as-code approach. The dashboard can be made available to anyone that needs it.

Architecture

VS Code can be configured to emit metrics using OpenTelemetry as a standard. This is a configuration item in VS Code, and you essentially point it to an OpenTelemetry Collector. The collector is an endpoint that can consume the telemetry. In this implementation, it is a container image hosted in Azure. I have chosen Azure Container Apps (ACA) for this purpose as it is an easy-to-use managed environment, but it could also run in Azure Kubernetes Service (AKS) with a little more effort. There is a prebuilt OpenTelemetry Collector image for this, and it has been adapted to inject configuration that sends the telemetry to Azure Application Insights. For defining and hosting the dashboard, I have chosen another Azure managed service, Azure Managed Grafana.

Sample Dashboard

The sample dashboard contains a collection of visualisations derived from the collected data in Application Insights. Azure Managed Grafana allows you to author these dashboards visually, or they can be implemented as a JSON file and adapted from there.
Note that the telemetry generated by VS Code gives the location of the users (city, region and country), but does not include any personally-identifying information (PII) and so cannot be used to track individuals. As I understand it, this is by design. Managed Grafana has its own permission structure, which may then be used to give users access to the dashboard.

Implementation Details

There is a GitHub repo, Copilot Usage Dashboard, that contains details of how to implement this, together with instructions for creation either by "click-ops" 🙂 or via Terraform. So I suggest you follow the link to my repo to look at the details. In summary, the following need to exist in Azure:

- Azure Container App (ACA) that hosts the collector; this needs to have public ingress
- Azure Container Registry (ACR) that hosts the docker image that is customised via the Dockerfile
- Key Vault that hosts the Application Insights connection string that ACA references
- Application Insights; this needs to be created with a flag to allow it to work with Grafana data
- Log Analytics Workspace that works with Application Insights
- Azure Managed Grafana to host the Grafana dashboard

The main thing to bear in mind is that VS Code needs to be configured to emit OpenTelemetry:

    {
      "github.copilot.nextEditSuggestions.enabled": true,
      "github.copilot.chat.otel.enabled": true,
      "github.copilot.chat.otel.exporterType": "otlp-http",
      "github.copilot.chat.otel.otlpEndpoint": "https://<fqdn>"
    }

where the FQDN is the host name of the public ingress to the Azure Container App.

There is a Dockerfile in this repo that just injects the correct configuration file into the OpenTelemetry collector image. It is this configuration file that tells the collector to emit to Application Insights.
It is of the form below:

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
      attributes:
        actions:
          - key: environment
            value: "prod"
            action: upsert
    exporters:
      azuremonitor:
        connection_string: "${APPLICATIONINSIGHTS_CONNECTION_STRING}"
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch, attributes]
          exporters: [azuremonitor, debug]
        metrics:
          receivers: [otlp]
          processors: [batch, attributes]
          exporters: [azuremonitor]

As can be seen above, there is a placeholder for the Application Insights connection string. In the ACA configuration this is an environment variable that points to a secret held in Key Vault.

If all is well, VS Code will emit telemetry to the container image running in ACA, which will use its configuration to send it to Application Insights. The Grafana dashboard then uses this data.

Troubleshooting

The GitHub repo goes into the detail of troubleshooting, but the overall steps are:

- If there is no data in Grafana, check that Grafana has access to Application Insights.
- Check whether telemetry is being pushed into Application Insights by looking at the logs and at the contents of the Dependencies table. If there is telemetry there, then the problem is Grafana permissions. If not, look to ACA.
- Look at the ACA logs to see whether it is healthy and whether any logs are being received.
- Use a curl request to send a fake log to ACA (a sample is in the repo) to see if the ACA is accepting logs.
- Check that the connection to Application Insights is correct and is being pulled from Key Vault, or replace the environment variable value with the connection string directly.
- If all is good so far, then it may be that the configuration in VS Code is not correct or not in the correct place.

Hopefully the more detailed steps will resolve any issues quickly.
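As an alternative to curl for the "send fake telemetry" step, the sketch below builds a minimal OTLP/HTTP JSON payload in Python and posts it to the collector. It is not the sample from the repo. It sends a single span instead of a log record, on the assumption that the collector only accepts the signals that have pipelines in the configuration shown above (traces and metrics). The service name, span name, and function names are hypothetical; `/v1/traces` is the standard OTLP/HTTP path for spans.

```python
import json
import secrets
import time
import urllib.request


def build_test_span_payload(service_name: str, span_name: str, now_ns: int) -> dict:
    """Build a minimal OTLP/HTTP JSON payload containing a single span."""
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now_ns - 1_000_000),
                    "endTimeUnixNano": str(now_ns),
                }],
            }],
        }]
    }


def send_test_span(base_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to <base_url>/v1/traces and return the HTTP status code."""
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    payload = build_test_span_payload("copilot-dashboard-smoke-test", "fake-span", time.time_ns())
    print(json.dumps(payload, indent=2))
    # Uncomment and substitute the ACA ingress FQDN to actually send it:
    # print(send_test_span("https://<fqdn>", payload))
```

A 2xx status from `send_test_span` suggests the ingress and the OTLP receiver are reachable; whether the span then appears in Application Insights depends on the exporter and the connection string, which is the next item in the checklist.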
Further thoughts and enhancements

This implementation attempts to build a dashboard showing GitHub Copilot agent usage using a standard set of security controls, but more may be needed. Here is a list of possible enhancements:

- A more refined dashboard. This should be easy, as there are samples for all sorts of visualisations, and a few of these may allow more focus on agent and model usage.
- The ACA-hosted OpenTelemetry collector has a public-facing ingress. This may need to be locked down at the network level by address restriction or by a non-public ingress. Care would need to be taken to make sure that it is then reachable by the intended VS Code user audience.
- The ACA collector endpoint is not authenticated in and of itself. Authentication could be added at the container level by putting an authenticating proxy in the Dockerfile, or at the ACA ingress level. Some investigation would be needed to see how the VS Code configuration could work with this, and that may largely dictate what form the authentication can take.
- How the VS Code configuration changes can be automated for a user base has not been investigated as part of this work. It is assumed that an organisation may be able to roll out these changes using its application deployment automation.

Summary

This approach provides a means by which an organisation can track the usage of GitHub Copilot agents (and their models), which is not provided by GitHub Enterprise dashboards. It provides insights into the take-up of custom agents and their underlying models, allowing an organisation to test whether its investments in custom agents are being used effectively. Additionally, the dashboards themselves can easily be rolled out to a wider community than the GitHub Enterprise one.

Agents That Build Agents: A SKILL-first Blueprint with MS Agent Framework & Foundry
The single insight that changes everything Most "build an AI agent" tutorials collapse two completely different jobs into one tangled mess: the job of building an agent (writing the code, defining its tools, evaluating it, packaging it), and the job of running an agent (planning, reasoning, calling tools, remembering users, delivering outcomes). Once you separate them, modern agent development becomes a clean two-layer architecture: A Coding Agent sits on top — that's how you produce an agent. A Runtime Agent sits below — that's the agent your business operates. Microsoft Agent Framework is the SDK that ties them together; Microsoft Foundry is the platform both layers publish to and run on. But the secret ingredient — the thing that turns a generic Copilot into a domain-aware engineer — is the SKILL. SKILL is what the Coding Agent reads before writing a single line. It's how requirements become artifacts that actually match your framework, your conventions, and your fixtures. This post walks the entire two-layer architecture, in the order you should learn it — with SKILL as the star of Layer 1. We ground every concept in ZavaShop, a fictional global e-commerce company with 5 fulfillment centers, dozens of suppliers, and a CEO who wants one live dashboard for all of it. Both Python and .NET (C#) are first-class — pick the language your team will run in production. LAYER 1 — The Coding Agent (Build Time) The Coding Agent is not the agent your customer talks to. It's the agent that constructs the agent your customer talks to. Its output is a bundle of artifacts — code, agent definitions, workflows, skills, connectors, evals, tests, configs, docs — that flow through validation and into Foundry. Build time has five movements. Movement 1 — Requirements & Planning Before the Coding Agent writes a single line, you owe it three things: A real business pain. Not "let's build an agent." 
Rather: "Mei, the supervisor at Seattle DC, gets interrupted 60 times a day by stock-level questions." A list of acceptance criteria. What does "done" look like? "Agent answers stock questions for SKUs in our 10-SKU catalog. P95 latency under 4s. Wrong-tool rate under 5% on the eval set." The fixtures it'll run on. Real or realistic data — warehouses, SKUs, POs, customers — so the Coding Agent isn't reasoning about a vacuum. ZavaShop context. The workshop ships workshop/data/ — 5 warehouses, 10 SKUs, 6 POs, 8 suppliers, 5 contracts, 4 customers (3 VIP), 6 orders, 5 carriers, 4 open exceptions. Every artifact the Coding Agent generates is anchored to this shared fixture set, so numbers stay consistent across the entire system. Movement 2 — The Coding Agent + its SKILL (the star of build time) This is the movement most teams skip — and it's the one that decides whether your build-time output is professional code or "ChatGPT-shaped" code. What a Coding Agent actually is The Coding Agent is GitHub Copilot Chat in Agent Mode, configured with a domain-aware agent definition. In the ZavaShop workshop, it lives at .github/agents/zavashop-coding-agent.agent.md and is activated from the VS Code Agent picker. You start each session with one plain sentence: "I'm working on the inventory agent in Python — wire up stock and PO lookups against the fixtures, plus a HostedMCPTool for the warehouse handbook." Notice what's not in that sentence: no library names, no class names, no file paths. The Coding Agent has to fill all of that in. The mechanism it uses is the SKILL. What a SKILL is A SKILL is a structured contract that teaches the Coding Agent how to write code in your framework, your conventions, and your domain. It is the most important file in the entire build-time layer — without it, GitHub Copilot is a fluent generalist; with it, it becomes a domain-aware specialist that writes code your tech leads would have written. 
Conceptually, a SKILL contains: Section Purpose Scope & when to use "Use this SKILL for building agents on Foundry / Azure AI — tools, MCP, Toolbox, Skills, Memory, Threads" Framework idioms The exact way to construct AzureAIAgentClient, register function tools, wire HostedMCPTool, create a Thread Code patterns Reference snippets the Coding Agent imitates — naming, import order, error handling, type hints Fixture/data contract How to load workshop/data/, which loaders exist (find_stock, find_po, etc.), where to add sys.path Anti-patterns What not to do — don't hardcode the model name, don't write inline mock dicts, don't bypass the data loader Acceptance heuristics How to map a LAB's acceptance criteria to runnable checks (eval rows, smoke tests) A SKILL is versioned with the codebase. When the framework releases a new idiom, you update the SKILL once; every agent built afterwards picks it up automatically. This is the single biggest reason convention drift disappears. The six SKILLs in the ZavaShop workshop The workshop ships six SKILLs — three for each language track — and they cover three orthogonal capability surfaces: Track SKILL Use it for 🐍 Python agent-framework-azure-ai-py Single agent on Foundry: tools, MCP, Toolbox, Skills, Memory, Threads 🐍 Python agent-framework-workflows-py Multi-agent workflows: WorkflowBuilder, executors, HITL, Checkpoint 🐍 Python agent-framework-agui-py AG-UI server + client: SSE, frontend/backend tools, shared state, HITL 🟦 .NET agent-framework-azure-ai-csharp Same as the Python azure-ai SKILL, for C# 🟦 .NET agent-framework-workflows-csharp Same as the Python workflows SKILL, for C# 🟦 .NET agent-framework-agui-csharp AG-UI in ASP.NET Core: MapAGUI, AGUIChatClient, HITL How the Coding Agent uses SKILL The Coding Agent's loop is SKILL-first, code-second: The discipline is captured in the workshop's one mantra: "Read the SKILL first." It is not optional. Skip it and you're back to generic Copilot output. 
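To make the "acceptance heuristics" row concrete, here is a minimal sketch of how the Movement 1 criteria (P95 latency under 4s, wrong-tool rate under 5%) could become a runnable check. The `EvalResult` shape and the function names are assumptions for illustration. The workshop's actual eval harness and the schema of eval_queries.jsonl are not shown in this post.

```python
import math
from dataclasses import dataclass


@dataclass
class EvalResult:
    """One scored eval row (hypothetical shape)."""
    expected_tool: str
    called_tool: str
    latency_seconds: float


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one latency sample")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def wrong_tool_rate(results: list[EvalResult]) -> float:
    """Fraction of rows where the agent called a tool other than the expected one."""
    wrong = sum(1 for r in results if r.called_tool != r.expected_tool)
    return wrong / len(results)


def meets_acceptance(
    results: list[EvalResult],
    max_p95_seconds: float = 4.0,
    max_wrong_tool_rate: float = 0.05,
) -> bool:
    """True only when both thresholds are strictly satisfied ("under", not "at")."""
    latency_ok = p95([r.latency_seconds for r in results]) < max_p95_seconds
    tools_ok = wrong_tool_rate(results) < max_wrong_tool_rate
    return latency_ok and tools_ok
```

Keeping the thresholds as parameters means the same check can be reused when a LAB states different acceptance criteria.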
Movement 3 — A worked example: from a single sentence to a runnable agent
Let's trace what happens when you sit down in front of the Coding Agent and say: "I'm working on the inventory agent in Python — wire up stock and PO lookups against the fixtures, plus a HostedMCPTool for the warehouse handbook."

Step 1 — The Coding Agent routes
The Coding Agent's definition (.github/agents/zavashop-coding-agent.agent.md) contains a routing table:
It loads both files into context before doing anything else.

Step 2 — The Coding Agent plans
Using the SKILL's patterns and the LAB's acceptance criteria, it generates a plan:
Every item in the plan traces back to either the SKILL (how) or the LAB README (what). No invention, no guessing.

Step 3 — The Coding Agent generates
Now, and only now, it writes code. Because the SKILL specified the patterns, what comes out is shaped like the rest of your codebase:

    # scripts/inventory_agent.py
    import asyncio
    import os
    import sys
    from pathlib import Path

    # Pattern from SKILL: add data folder to sys.path, import named loaders
    sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "data"))
    from zava_data import find_stock, find_po  # noqa: E402

    from agent_framework import HostedMCPTool
    from agent_framework_azure_ai import AzureAIAgentClient
    # Async credential variant, since it is passed as async_credential below
    from azure.identity.aio import DefaultAzureCredential


    def stock_lookup(sku: str, warehouse: str) -> dict:
        """Return on-hand quantity for a SKU at a warehouse."""
        return find_stock(sku, warehouse)


    def po_lookup(po_id: str) -> dict:
        """Return the status of a purchase order."""
        return find_po(po_id)


    async def main() -> None:
        client = AzureAIAgentClient(
            project_endpoint=os.environ["FOUNDRY_PROJECT_ENDPOINT"],
            model_deployment_name=os.environ["FOUNDRY_MODEL"],  # gpt-5.5 from env
            async_credential=DefaultAzureCredential(),
        )
        agent = client.create_agent(
            name="Zara",
            instructions=(
                "You are Zara, the warehouse assistant for Seattle DC (SEA-01). "
                "Use the tools when asked about stock or POs. "
                "Answer briefly."
            ),
            tools=[
                stock_lookup,
                po_lookup,
                HostedMCPTool(
                    name="warehouse-docs",
                    url=os.environ["WAREHOUSE_DOCS_MCP_URL"],
                ),
            ],
        )
        thread = agent.get_new_thread()
        print(await agent.run("How many SKU-7421 are at SEA-01?", thread=thread))


    if __name__ == "__main__":
        asyncio.run(main())

Notice the things the SKILL enforced without you having to ask: env-var-driven config, named function tools with docstrings, the sys.path data-loader pattern, HostedMCPTool placed alongside function tools, and a Thread for multi-turn.

Step 4 — The Coding Agent validates
The SKILL also told it how to validate. The Coding Agent runs:
- a smoke test against fixtures (SKU-7421 @ SEA-01 → 312),
- the eval set (eval_queries.jsonl): was the right tool called? did the answer contain the expected fact?
- a red-team probe round.
It reports back: "3/3 acceptance criteria pass. Eval score 5/5. Red-team: no successful prompt injections."

Step 5 — Done
What landed in your repo is not just a script. It's an artifact bundle (code + agent definition + tools + eval rows + a one-page README) that matches the way your team writes agents. That bundle is what flows into the next three movements.

Movement 4 — Agent Artifacts (the outputs)
A well-instructed Coding Agent produces eight kinds of artifact. Together they make up "an agent" in the deployable sense:

Artifact | What it is | Why it matters
Source code | The Agent / Workflow program | Versioned, reviewable, diffable
Agent definitions | Name, instructions, tool list | The "personality", independently editable
Workflows | WorkflowBuilder graphs | Multi-agent orchestration as code
Skills | Named, packaged behaviors | Reusable capabilities: one Skill, many agents
Connectors | MCP servers, Toolbox registrations | Where the agent reaches into the world
Evals | eval_queries.jsonl and harness | Regression target for every prompt change
Tests & configs | Unit tests, .env schema, deployment manifests | Reproducibility
Documentation | READMEs, runbooks | The agent your future self can operate

Don't confuse two senses of "skill" here.
A SKILL file (uppercase, in .github/skills/) instructs the Coding Agent at build time. An Agent Skill (a Foundry concept) is a named runtime capability the Runtime Agent calls. Both names are deliberate — Layer 1's SKILL produces, among other artifacts, Layer 2's Skills. Movement 5 — Validation Before any artifact reaches Foundry, four gates run: Tests — unit + integration. Did find_stock("SKU-7421", "SEA-01") return 312, the value in the fixture? Lint & types — ruff/mypy on Python, dotnet build warnings on .NET. The model has to read these signatures; sloppy ones cause real bugs. Evaluation — run the eval set. Did the right tool get called? Did the answer contain the expected fact? You need a score, not a vibe. Red-Team probes — adversarial inputs that try to drift the agent off topic or extract another customer's data. The Foundry red-team SDK ships a battery of these. Evangelist takeaway. "We built an agent" is not a deliverable. "We built an agent and here is its pass rate on a versioned eval set, plus a red-team report" is a deliverable. Validation belongs at build time, not "we'll add it later." Movement 6 — Publish & Deploy When validation is green, the Coding Agent's outputs flow into Foundry and Azure: Push to Microsoft Foundry — agent definitions, Skills, Toolbox tools, and custom evals register against your Foundry project. They are now governed, versioned, and observable. Deploy to Azure — the runtime host (AG-UI server, workflow worker, Teams app, API surface) ships to your Azure target (App Service, Container Apps, AKS, Functions). Same env vars drive local dev and cloud. The same artifact set deploys to dev, staging, and production. There is no "production-only" code in your agent. LAYER 2 — The Runtime Agent (Runtime) Now the agent is live. Every conversation, every action against your data, every memory it writes — that's Layer 2. Five concerns define it. 
Concern 1 — Users & Channels A Runtime Agent reaches users through the channels they already use: Microsoft Teams — the agent shows up where work already happens. Outlook — triage, reply, summarize, schedule. Custom web / mobile / voice — built on AG-UI, which ships a React client covering streaming text, frontend tools, backend tools, shared state, generative UI, predictive updates, HITL prompts. The channel is a deployment choice, not an architectural choice. The same agent definition can surface in Teams and on a React dashboard. ZavaShop context. Mei's agent shows up in Teams. The CEO's control tower is a React app on top of AG-UI. The agent definition behind both is the same artifact set the Coding Agent produced. Concern 2 — The Runtime Agent itself The Runtime Agent is the loop you've heard about a thousand times — now it's a concrete piece of architecture: AIAgent = model + instructions + tools + thread Inside the loop: The model plans & reasons about the next step. It calls tools through MCP, Toolbox, or local functions. It reads & writes memory. It streams output back to the channel. 
# Python — the runtime shape (exactly what the Coding Agent produced) agent = client.create_agent( name="Zara", instructions="You are Zara, the warehouse assistant for Seattle DC.", tools=[stock_lookup, po_lookup, warehouse_docs_mcp], ) Concern 3 — Tools & Integrations (the runtime capability surface) At runtime, a Runtime Agent reaches the outside world through four kinds of capabilities — and which one to use is a real engineering decision: Capability Lives in Use when Function tool The agent's own process Local code: a calculation, a DB query, a fixture lookup MCP tool An external MCP server The capability is owned by another system, exposed via MCP Toolbox tool The Foundry project (server-side, tenant-wide) Capability is shared by multiple agents, must be governed Agent Skill The Foundry project A combination of tools + policy as one named capability Mental progression: You don't have to start with Toolbox — but the moment a second agent touches the same domain, migrate. ZavaShop context. Local fixtures → function tools. The warehouse handbook → MCP. Supplier-portal connectors shared by procurement, fulfillment, and finance → Toolbox tools. "Validate-PO-against-contract" → an Agent Skill. Concern 4 — Memory & State State at runtime comes in two flavors: Thread = state inside one conversation thread = agent.get_new_thread() await agent.run("Look up PO-1043.", thread=thread) await agent.run("And its supplier?", thread=thread) # knows which PO Memory = state across conversations Foundry Memory is durable, retrievable knowledge about a user — VIP status, packaging preferences, delivery windows. Memory holds stable preferences and facts, not chat transcripts. ZavaShop context. Customer service agent Aria remembers across sessions that C-204 is VIP, prefers no cardboard, and wants 6–8pm delivery. Concern 5 — Actions & Outcomes Real systems take actions that change state and produce outcomes other systems observe: Trigger events — kick off a workflow, page a human. 
Generate outputs — write a PO, draft an email, push to a record. Notify channels — send back to Teams, update a dashboard, hit a webhook. Observability — every action streams to Application Insights / Azure Monitor. This is also where Workflows live. WorkflowBuilder is Agent Framework's orchestration primitive: Three workflow features matter most: Reuse, don't rebuild — tools written at build time are workflow nodes at runtime. Human-in-the-Loop (HITL) — pauses, asks a human, resumes from the exact step. Checkpointing — workflows survive process restarts. ZavaShop context. Fulfillment director Diego's team handles a $10K+ exception every day. Before: an email chain across 5 teams. After: a WorkflowBuilder graph with one HITL approval and full audit trail. Cross-cutting: the shared services that make this safe Both layers sit on top of platform services non-negotiable for enterprise deployment: Service What it does for your agents Microsoft Entra ID Who is the user? Who is the agent? Managed identity for tool calls Microsoft Defender for Cloud Threat detection across the agent's compute + data plane Microsoft Sentinel SIEM — correlate agent actions with security signals Azure Key Vault Secrets, keys, connection strings — never in code, never in .env checked to git Azure Monitor / App Insights Every agent turn, every tool call, every workflow step — observable and queryable Azure Policy & governance Guardrails on what can be deployed where, by whom Skip this row and you have a demo that has not yet failed. 
Mapping the ZavaShop workshop to the architecture Layer 1 artifacts shipped in the repo: .github/agents/zavashop-coding-agent.agent.md — the Coding Agent definition .github/skills/agent-framework-{azure-ai,workflows,agui}-{py,csharp}/ — the six SKILLs workshop/data/ — shared fixtures every artifact grounds in Per-lab READMEs + eval_queries.jsonl — Layer 1 validation inputs Layer 2 artifacts produced over the course of the workshop: A single agent (Zara) — function tools + HostedMCPTool + Thread A procurement agent (Pierre) — Toolbox + Agent Skills + approval policy A customer-service agent (Aria) — Foundry Memory + Evaluation + Red-Team A multi-agent fulfillment workflow (Diego) — WorkflowBuilder + HITL + Checkpoint An AG-UI control tower for the CEO — covering all 7 AG-UI features Same model across the stack — gpt-5.5 on Foundry + text-embedding-3-small. Change one env var, run the same artifact in the other language. Three habits that separate strong agent engineers Read the SKILL first. Make it ritual. The Coding Agent does it automatically; you should do it manually when reviewing the agent's output. Treat tools as a public API. Names, signatures, docstrings, return shapes — they are how the model sees your system at runtime. Refactor them like any other API. Measure before you tune. A prompt change without an eval delta is a vibe. With one, it's engineering. 
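The second habit above ("treat tools as a public API") can be made tangible with a small sketch of how a function's name, signature and docstring might be turned into the description a model reads. This is an illustration only. It is not how Agent Framework builds tool schemas internally, and `describe_tool`, `JSON_TYPES` and the stub body are assumptions.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python annotations to JSON-schema-style type names.
JSON_TYPES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    dict: "object",
    list: "array",
}


def describe_tool(func) -> dict:
    """Derive a tool description from a function's name, signature and docstring."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    parameters = inspect.signature(func).parameters
    properties = {
        name: {"type": JSON_TYPES.get(hints.get(name), "string")}
        for name in parameters
    }
    required = [
        name
        for name, parameter in parameters.items()
        if parameter.default is inspect.Parameter.empty
    ]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def stock_lookup(sku: str, warehouse: str) -> dict:
    """Return on-hand quantity for a SKU at a warehouse."""
    return {}  # stub body; only the signature and docstring matter here
```

Renaming a parameter or deleting the docstring changes this description, which is why such edits deserve the same review as any other API change.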
Getting started in 60 seconds

    git clone https://github.com/microsoft/Learn-Microsoft-Agent-Framework-with-Foundry-ZavaShop-Supply-Chain-Workshop
    cd Learn-Microsoft-Agent-Framework-with-Foundry-ZavaShop-Supply-Chain-Workshop

    # Foundry prereqs: gpt-5.5 + text-embedding-3-small deployed in your Foundry project
    az login --use-device-code

    # Python track
    python -m venv .venv && source .venv/bin/activate
    pip install agent-framework agent-framework-azure-ai agent-framework-ag-ui \
        azure-identity python-dotenv fastapi "uvicorn[standard]"

    # .NET track
    dotnet --version   # ≥ 10.0.100

    # .env at repo root
    cat > .env <<EOF
    FOUNDRY_PROJECT_ENDPOINT=https://<your-project>.services.ai.azure.com/api/projects/<project-name>
    FOUNDRY_MODEL=gpt-5.5
    AZURE_OPENAI_EMBEDDING_MODEL=text-embedding-3-small
    AGUI_SERVER_URL=http://127.0.0.1:5100/
    AG_UI_API_KEY=zava-control-tower-demo-key
    EOF

    # In VS Code → Copilot Chat → Agent Mode → pick zavashop-coding-agent
    # Then say: "I'm working on the inventory agent in Python — meet Mei."

The one mantra: "Read the SKILL first."

Closing thought
Modern agent development is not one job — it's two. The Coding Agent designs and builds; the Runtime Agent operates and delivers. Microsoft Agent Framework is the SDK that makes both layers feel like the same conceptual model. Microsoft Foundry is the platform both layers publish to and run on. And the engine that turns a generic Copilot into a domain-aware engineer — that takes a sentence-long requirement and lands a runnable, validated, deployable artifact — is the SKILL. Write a good SKILL once, and every agent built afterwards inherits your team's taste, your fixtures, your patterns, your discipline. The ZavaShop workshop is the smallest end-to-end example I can give you that actually exercises both layers, with six SKILLs ready to read. Walk it once, and the next time someone asks "how do we build agents in our org?", you won't be pointing at a tutorial — you'll be pointing at an architecture.
👉 Start with the workshop on GitHub

Moving Beyond Prompts: A Practical Introduction to Spec-Driven Development
In the last year, many of us have started writing code differently. We describe what we want, let AI generate an answer, review it, tweak the prompt, and try again. This loop—prompt, retry, adjust—has quietly become part of our daily workflow. At first, it feels incredibly productive. But as the complexity of the task increases, something changes. The iteration cycle becomes longer, outputs become inconsistent, and the effort shifts from solving the problem to refining the prompt. This is where a subtle but important shift in approach can help: moving from prompt-driven development to spec-driven development. The Problem: Prompt → Retry → Guess Most AI-assisted workflows today look something like this: Write a prompt describing the task Review the generated output Adjust the prompt Repeat until it looks acceptable In practice, this often simplifies to: Prompt → Retry → Guess Figure: Prompt-driven vs spec-driven workflow comparison For simple tasks, this works well. But for anything involving multiple inputs, constraints, or edge cases, the process can become unpredictable. In my experience, the challenge is not the model—it is the lack of structure in how we describe the problem. A Shift in Thinking: From Prompts to Specifications Instead of asking AI to “figure it out,” spec-driven development introduces a simple idea: Define the problem clearly before asking for a solution. A specification (spec) is not a long document—it is a structured way of describing: Inputs Outputs Constraints Edge cases When this structure is provided upfront, the interaction changes significantly. Rather than iterating on vague prompts, you are guiding the system with a clear contract. What This Looks Like in Practice Let’s take a simple example: an order summary API (for example, a backend service hosted on Azure App Service). 
Without a Spec (Typical Prompt) “Write an API that returns order details for a user.” A model can generate something reasonable, but in practice, the responses often vary: Field names may be inconsistent Pagination may be missing Edge cases (no orders, large datasets) may not be handled Structure may change across iterations Example response (typical output): { "userId": 123, "orders": [ { "id": 1, "amount": 250 } ] } With a Spec (Structured Input) Now consider providing a simple specification: Specification: Input: userId page pageSize Output: userId orders[] orderId totalAmount orderDate pagination page pageSize totalRecords Constraints: Default pageSize = 10 Return empty list if no orders Handle large datasets efficiently Example response (based on the spec): { "userId": 123, "orders": [ { "orderId": 1, "totalAmount": 250, "orderDate": "2024-01-10" } ], "pagination": { "page": 1, "pageSize": 10, "totalRecords": 50 } } Why This Tends to Work The difference here is not just stylistic—it is structural. An unstructured prompt leaves room for interpretation. A spec reduces ambiguity by defining expectations explicitly. In practice, I have observed that providing structured inputs like this often leads to the following: More consistent field naming Better handling of edge cases Reduced need for repeated prompt refinement Rather than relying on trial-and-error, the interaction becomes more predictable and aligned with expectations. Applying This to Existing Code (Refactor Scenario) This approach becomes even more useful when applied to existing code. Instead of asking: “Fix the bug in the Auth controller” You can define expected behavior: Input validation rules Response formats Error handling Authorization behavior The task then becomes aligning the implementation with the defined spec. This shifts the interaction from guesswork to validation—comparing current behavior with intended behavior. 
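The "validation" idea can be sketched in a few lines of Python: encode the order summary spec from earlier as required field names, then compare any response against it. This is a minimal sketch under stated assumptions. `spec_violations` and the two constants are hypothetical names, and only field presence is checked, not types or the pagination defaults.

```python
# Field names taken from the order summary specification above.
REQUIRED_ORDER_FIELDS = ("orderId", "totalAmount", "orderDate")
REQUIRED_PAGINATION_FIELDS = ("page", "pageSize", "totalRecords")


def spec_violations(response: dict) -> list[str]:
    """Return a list of ways the response departs from the spec (empty if none)."""
    problems = []
    if "userId" not in response:
        problems.append("missing userId")

    orders = response.get("orders")
    if not isinstance(orders, list):
        problems.append("orders must be a list (empty when the user has no orders)")
        orders = []
    for index, order in enumerate(orders):
        for field in REQUIRED_ORDER_FIELDS:
            if field not in order:
                problems.append(f"orders[{index}] missing {field}")

    pagination = response.get("pagination")
    if not isinstance(pagination, dict):
        problems.append("missing pagination")
    else:
        for field in REQUIRED_PAGINATION_FIELDS:
            if field not in pagination:
                problems.append(f"pagination missing {field}")
    return problems
```

Run against the two example responses above, the unspecified one is flagged for its renamed order fields and its missing pagination block, while the spec-based one passes with no violations.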
Example Comparison (Auth Scenario) Without Spec (Typical Prompt) “Fix the login issue in Auth controller” Possible outcomes include: Partial validation added Inconsistent error responses No clear handling of repeated failed attempts With Spec (Defined Behavior) Spec defines: Validate username and password Return consistent error responses Lock account after 5 failed attempts Do not expose internal errors Resulting behavior: Input validation is consistently applied Error responses follow a defined structure Edge cases like account lockout are handled explicitly This mirrors the same pattern seen in the API example—moving from ambiguity to clearly defined behavior. A Practical Way to Start You do not need new tools or frameworks to try this. A simple workflow that has worked well in practice: Ask – Describe the problem (prompt, discussion, or notes) Write a spec – Define inputs, outputs, constraints Refine – Remove ambiguity Generate – Use the spec as input Validate – Compare output with the spec This adds a small upfront step, but it often reduces back-and-forth iterations later. The Practical Challenge One important point to note: Writing a good spec requires understanding the problem. Spec-driven development does not eliminate complexity—it surfaces it earlier. In many cases, the hardest part is not writing code, but clearly defining: What the system should do What it should not do How it should behave under edge conditions This is also why specs evolve over time. They do not need to be perfect upfront. They improve as your understanding improves. Where This Approach Helps From what I have seen, this approach is most useful in scenarios where the problem involves multiple inputs, defined contracts, or structured outputs such as APIs, schema-driven systems, or refactoring existing code where consistency matters. 
Where It May Not Be Necessary
For simpler tasks such as small scripts, minor UI changes, or quick experiments, a detailed specification may not add much value. In those cases, a straightforward prompt is often sufficient.

A Note on Tools
Tools like GitHub Copilot, Azure AI Studio, and AI-assisted workflows in Visual Studio Code tend to be more effective when given clear, structured inputs. Spec-driven development is not tied to any specific tool. It is a way of thinking about how we interact with these systems more effectively.

References
https://github.com/features/copilot
https://platform.openai.com/docs/guides/prompt-engineering
https://github.com/github/spec-kit
Amplifier - Modular AI Agent Framework - Amplifier

Final Thoughts
Many discussions around AI-assisted development focus on what tools can do. This approach focuses on something slightly different: how developers can structure problems more effectively before implementation. In my experience, moving from prompts to specs does not eliminate iteration, but it makes that iteration more predictable and purposeful.

Building an Enterprise Knowledge Copilot with Foundry IQ and Agentic Retrieval on Azure AI
Every enterprise has the same problem: knowledge scattered across SharePoint, file shares, wikis, and email. This article walks through building a knowledge copilot that unifies that data behind a single conversational interface — using Microsoft's Foundry IQ knowledge bases and the agentic retrieval engine in Azure AI Search. The Problem: Fragmented Knowledge, Fragmented Answers Enterprise AI projects today share a common pain point. Each new agent or copilot that needs to answer questions from company data must rebuild its own retrieval pipeline from scratch — data connections, chunking logic, embeddings, routing, permissions — all duplicated project after project. The result is a tangle of fragmented, siloed pipelines that are expensive to maintain and inconsistent in quality. Consider a field technician troubleshooting equipment. The answer might span a vendor manual stored in OneLake, a company repair policy on SharePoint, and a public electrical standard on the web. Traditional single-index RAG cannot orchestrate across those sources in one pass. The technician waits, the issue escalates, and productivity drops. Foundry IQ, announced in public preview in November 2025, addresses this directly. It provides a unified knowledge layer for agents — a single endpoint that replaces per-project RAG pipelines with a reusable, topic-centric knowledge base that any number of agents can consume. What Is Foundry IQ? Foundry IQ introduces four capabilities built on top of Azure AI Search: Knowledge Bases — Reusable, topic-centric collections (e.g., "employee policies," "product documentation") available directly in the Foundry portal. Rather than wiring retrieval logic into every agent, you define a knowledge base once and ground multiple agents through a single API. Indexed and Federated Knowledge Sources — A knowledge base can draw from Azure Blob Storage, OneLake, SharePoint, Azure AI Search indexes, the web, and MCP servers (MCP in private preview). 
Developers do not need to manage different retrieval strategies per source; the knowledge base presents a unified endpoint. Agentic Retrieval Engine — A self-reflective query engine that uses AI to plan, search, and synthesize answers with configurable retrieval reasoning effort. Enterprise-Grade Security — Document-level access control and alignment with existing permissions models. Microsoft Purview sensitivity labels are respected through the indexing and retrieval pipeline, so classified content remains governed as it flows into knowledge bases. For indexed sources, Foundry IQ automatically manages the full indexing pipeline: content is ingested, chunked, vectorized, and prepared for hybrid retrieval. When Azure Content Understanding is enabled, complex documents gain layout-aware enrichment — tables, figures, and headers are extracted and structured without extra engineering work. How Agentic Retrieval Works Single-shot RAG — one query, one index, one pass — breaks down when questions are ambiguous, multi-hop, or span several data silos. Foundry IQ's agentic retrieval engine treats retrieval as a multi-step reasoning task rather than a keyword lookup: Plan — The engine analyzes the conversation and decomposes the query into focused sub-queries, deciding which knowledge sources to consult. Search — Sub-queries run concurrently against selected sources using keyword, vector, or hybrid techniques. Rank — Semantic reranking identifies the most relevant results. Reflect — If the information gathered is insufficient, the engine iterates — issuing follow-up queries autonomously. Synthesize — Results are unified into a natural-language answer with source references. Developers control this behaviour through a high-level retrieval reasoning effort setting. Lower effort suits fast, lightweight lookups; higher effort enables iterative search and richer planning across the entire data estate. 
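The five steps above can be illustrated with a toy, in-memory version of the loop. This is a conceptual sketch only, not Foundry IQ's engine or API. The sources, the keyword-overlap scoring and every function name are assumptions, and the "reflect" step is reduced to relaxing the match threshold when a sub-query returns nothing.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str
    text: str
    score: float


# Hypothetical in-memory stand-ins for real knowledge sources.
SOURCES = {
    "vendor_manual": [
        "Reset the breaker before replacing the fuse.",
        "Torque the housing bolts to 40 Nm.",
    ],
    "repair_policy": ["Repairs above 500 dollars require supervisor approval."],
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def plan(question: str) -> list[str]:
    """Plan: split a compound question into focused sub-queries."""
    parts = [p.strip() for p in re.split(r"\band\b|\?", question) if p.strip()]
    return parts or [question]


def search(sub_query: str, min_overlap: int) -> list[Hit]:
    """Search: keyword overlap against every source."""
    query_tokens = tokens(sub_query)
    if not query_tokens:
        return []
    hits = []
    for source, documents in SOURCES.items():
        for document in documents:
            overlap = len(query_tokens & tokens(document))
            if overlap >= min_overlap:
                hits.append(Hit(source, document, overlap / len(query_tokens)))
    return hits


def rank(hits: list[Hit], top_k: int = 3) -> list[Hit]:
    """Rank: keep the highest-scoring results."""
    return sorted(hits, key=lambda hit: hit.score, reverse=True)[:top_k]


def retrieve(question: str, max_rounds: int = 2) -> list[Hit]:
    results = []
    for sub_query in plan(question):
        min_overlap = 2
        hits: list[Hit] = []
        for _ in range(max_rounds):
            hits = rank(search(sub_query, min_overlap))
            if hits:
                break
            min_overlap = 1  # Reflect: nothing found, so retry with a looser match
        results.extend(hits)
    return results


def synthesize(hits: list[Hit]) -> str:
    """Synthesize: join the evidence, citing each source in brackets."""
    return " ".join(f"{hit.text} [{hit.source}]" for hit in hits)
```

Calling `retrieve("How do I replace the fuse and who approves repairs?")` plans two sub-queries. The second finds nothing at the stricter threshold, so the loop retries it once with the looser one, which is a much-simplified analogue of the iterate-on-insufficient-results behaviour described above.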
Real-world impact: AT&T integrated Azure AI Search and retrieval-augmented generation into its multi-agent framework, reducing customer resolution times by 33 percent, cutting average handle time by nearly 10 percent, and scaling 71 AI solutions to 100,000 employees. Ontario Power Generation used agentic retrieval to sift through over 40 years of nuclear operating experience, enabling data-driven decision-making and helping new staff learn from decades of institutional knowledge. Architecture Overview Step-by-Step: Setting Up the Knowledge Copilot Provision Resources You need an Azure AI Search service (Basic tier or above), a Microsoft Foundry project, an embedding model deployment (e.g., text-embedding-3-large), and an LLM deployment (e.g., gpt-4.1) for query planning and answer generation. .NET 8 or later is required for the C# SDK. Create a Knowledge Base in Azure AI Search Using the Azure.Search.Documents preview SDK, define an index, a knowledge source pointing to your data, and a knowledge base with OutputMode set to AnswerSynthesis for natural-language answers with citations. 
The following C# snippet (adapted from the official Azure AI Search quickstart) shows the knowledge base creation:

    using Azure;
    using Azure.Identity;
    using Azure.Search.Documents.Indexes;

    var searchEndpoint = "https://<your-service>.search.windows.net";
    var aoaiEndpoint = "https://<your-resource>.openai.azure.com/";

    var indexClient = new SearchIndexClient(
        new Uri(searchEndpoint), new DefaultAzureCredential());

    // Configure the LLM for query planning and answer synthesis
    var openAiParameters = new AzureOpenAIVectorizerParameters
    {
        ResourceUri = new Uri(aoaiEndpoint),
        DeploymentName = "gpt-4.1",
        ModelName = "gpt-4.1"
    };
    var model = new KnowledgeBaseAzureOpenAIModel(openAiParameters);

    // Create the knowledge base with answer synthesis enabled
    var knowledgeBase = new KnowledgeBase("<knowledge-base-name>")
    {
        OutputMode = KnowledgeBaseOutputMode.AnswerSynthesis,
        AnswerInstructions = "Provide a concise answer based on the retrieved documents.",
        Models = { model }
    };

    await indexClient.CreateOrUpdateKnowledgeBaseAsync(knowledgeBase);

Connect an Agent to the Knowledge Base via MCP
Each knowledge base exposes a Model Context Protocol (MCP) endpoint that MCP-compatible agents can call. The Foundry IQ-specific agent SDK currently offers full code samples for Python and REST API, but you can use the general-purpose MCP tooling in C# to achieve the same connection.
The following pattern is drawn from the official Microsoft Learn documentation on MCP tools with Foundry Agents:

    using Azure.AI.Projects;
    using Azure.Identity;

    var endpoint = "https://<your-resource>.services.ai.azure.com/api/projects/<your-project>";
    var model = "gpt-4.1-mini";

    // Point the MCP tool at the knowledge base's MCP endpoint
    var mcpTool = new MCPToolDefinition(
        serverLabel: "enterprise_kb",
        serverUrl: "https://<search-service>.search.windows.net"
            + "/knowledgebases/<kb-name>/mcp?api-version=2025-11-01-preview");
    mcpTool.AllowedTools.Add("knowledge_base_retrieve");

    // Create the agent with the MCP tool attached
    var projectClient = new AIProjectClient(new Uri(endpoint), new DefaultAzureCredential());
    var agentVersion = await projectClient.AgentAdministrationClient
        .CreateAgentVersionAsync(
            "enterprise-copilot",
            new ProjectsAgentVersionCreationOptions(
                new DeclarativeAgentDefinition(model)
                {
                    Instructions = "You are a company knowledge assistant. "
                        + "Always search the knowledge base before answering. "
                        + "If the knowledge base has no answer, say so clearly.",
                    Tools = { mcpTool }
                }));

The agent instructions are critical: explicitly requiring the agent to use the knowledge base prevents it from answering purely from the LLM's training data.

Query the Copilot

Once the agent is published, your application layer simply sends user questions via the Azure AI Projects SDK or REST API. The agent autonomously invokes the knowledge base tool, retrieves grounded context, and returns an answer with citations referencing the original documents.

Trade-offs and Considerations

| Dimension | Detail |
|-----------|--------|
| Maturity | Foundry IQ is in public preview and is not recommended for production workloads without accepting preview SLA terms. |
| Cost | Agentic retrieval has two billing streams: token-based billing from Azure AI Search for retrieval, and billing from Azure OpenAI for query planning and answer synthesis. |
| Latency vs. Quality | Higher retrieval reasoning effort produces better answers but adds latency due to iterative search. For sub-second lookups, use minimal effort; for complex multi-hop questions, use medium. |
| C# SDK Coverage | The Foundry IQ–specific agent connection SDK currently supports Python and REST API. C# support is available for the underlying agentic retrieval queries and for general MCP tool integration. |
| Security | Document-level ACLs from SharePoint are enforced at query time. For per-user authorization in Foundry Agent Service, the current preview does not support per-request MCP headers; use the Azure OpenAI Responses API as an alternative. |

Key Takeaways

Foundry IQ transforms enterprise RAG from a bespoke, per-project exercise into a managed, reusable knowledge layer. You define a knowledge base once, connect it to your data sources, and any number of agents or apps can consume it. The agentic retrieval engine handles query planning, multi-source search, semantic reranking, and iterative refinement, capabilities that previously required significant custom engineering. For .NET developers, the Azure AI Search C# SDK and the MCP tooling in the Agent Framework provide the building blocks to integrate this into your applications today.

References:
- What is Foundry IQ?
- Create a knowledge base in Azure AI Search
- Foundry IQ: Unlocking ubiquitous knowledge for agents

If You're Building AI on Azure, ECS 2026 is Where You Need to Be
Let me be direct: there's a lot of noise in the conference calendar. Generic cloud events. Vendor showcases dressed up as technical content. Sessions that look great on paper but leave you with nothing you can actually ship on Monday. ECS 2026 isn't that.

As someone who will be on stage in Cologne this May, I can tell you that the European Collaboration Summit, combined with the European AI & Cloud Summit and the European BizApps Summit, is one of the few events I've seen where engineers leave with real, production-applicable knowledge. Three days. Three summits. 3,000+ attendees. It is one of the largest Microsoft-focused events in Europe, and it keeps getting better.

If you're building AI systems on Azure, designing cloud-native architectures, or trying to figure out how to take your AI experiments to production, this is where the conversation is happening.

What ECS 2026 Actually Is

ECS 2026 runs May 5–7 at Confex in Cologne, Germany. It brings together three co-located summits under one roof:

- European Collaboration Summit: Microsoft 365, Teams, Copilot, and governance
- European AI & Cloud Summit: Azure architecture, AI agents, cloud security, responsible AI
- European BizApps Summit: Power Platform, Microsoft Fabric, Dynamics

For Azure engineers and AI developers, the European AI & Cloud Summit is your primary destination. But don't ignore the overlap: some of the most interesting AI conversations happen at the intersection of collaboration tooling and cloud infrastructure.

The scale matters here: 3,000+ attendees, 100+ sessions, multiple deep-dive tracks, and a speaker lineup that includes Microsoft executives, Regional Directors, and MVPs who have built, broken, and rebuilt production systems.

The Azure + AI Track: What's Actually On the Agenda

The AI & Cloud Summit agenda is built around real technical depth. Not "intro to AI" content, but actual architecture decisions, patterns that work, and lessons from things that didn't.
Here's what you can expect:

AI Agents and Agentic Systems

This is where the energy is right now, and ECS is leaning in. Expect sessions covering how to design agent workflows, chain reasoning steps, handle memory and state, and integrate with Azure AI services. Marco Casalaina, VP of Products for Azure AI at Microsoft, is speaking. If you want to understand the direction of the Azure AI platform from the people building it, this is a direct line.

Azure Architecture at Scale

Cloud-native patterns, microservices, containers, and the architectural decisions that determine whether your system holds up under real load. These sessions go beyond theory: you'll hear from engineers who've shipped these designs at enterprise scale.

Observability, DevOps, and Production AI

Getting AI to production is harder than the demos suggest. Sessions here cover monitoring AI systems, integrating LLMs into CI/CD pipelines, and building the operational practices that keep AI in production reliable and governable.

Cloud Security and Compliance

Security isn't optional when you're putting AI in front of users or connecting it to enterprise data. Tracks cover identity, access patterns, responsible AI governance, and how to design systems that satisfy compliance requirements without becoming unmaintainable.

Pre-Conference Deep Dives

One underrated part of ECS: the pre-conference workshops. These are extended, hands-on sessions, typically 3–6 hours, that let you go deep on a single topic with an expert. Think of them as intensive short courses where you can actually work through the material, not just watch slides. If you're newer to a particular area of Azure AI, or you want to build fluency in a specific pattern before the main conference sessions, these are worth the early travel.

The Speaker Quality Is Different Here

The ECS speaker roster includes Microsoft executives, Microsoft MVPs, and Regional Directors: people who have real accountability for the products and patterns they're presenting.
You'll hear from over 20 Microsoft speakers:

- Marco Casalaina, VP of Products, Azure AI at Microsoft
- Adam Harmetz, VP of Product at Microsoft, Enterprise Agent

And dozens of MVPs and Regional Directors who are in the field every day, solving the same problems you are. These aren't keynote-only speakers. They're in the session rooms and at the hallway track, available for real conversations.

The Hallway Track Is Not a Cliché

I know "networking" sounds like a corporate afterthought. At ECS it genuinely isn't. When you put 3,000 practitioners (engineers, architects, DevOps leads, security specialists) in one venue for three days, the conversations between sessions are often more valuable than the sessions themselves. You get candid answers to "how are you actually handling X in production?" that you won't find in documentation. The European Microsoft community is tight-knit and collaborative. ECS is where that community concentrates.

Why This Matters Right Now

We're in a period where AI development is moving fast but the engineering discipline around it is still maturing. Most teams are figuring out:

- How to move from AI prototype to production system
- How to instrument and observe AI behaviour reliably
- How to design agent systems that don't become unmaintainable
- How to satisfy security and compliance requirements in AI-integrated architectures

ECS 2026 is one of the few places where you can get direct answers to these questions from people who've solved them, not theoretically, but in production, on Azure, in the last 12 months. If you go, you'll come back with practical patterns you can apply immediately. That's the bar I hold events to. ECS consistently clears it.

Register and Explore the Agenda

- Register for ECS 2026: ecs.events
- Explore the AI & Cloud Summit agenda: cloudsummit.eu/en/agenda
- Dates: May 5–7, 2026 | Location: Confex, Cologne, Germany

Early registration is worth it: the pre-conference workshops fill up.
And if you're coming, find me. I'll be the one talking too much about AI agents and Azure deployments. See you in Cologne.

Supercharge Your Dev Workflows with GitHub Copilot Custom Skills
The Problem

Every team has those repetitive, multi-step workflows that eat up time:

- Running a sequence of CLI commands, parsing output, and generating a report
- Querying multiple APIs, correlating data, and summarizing findings
- Executing test suites, analyzing failures, and producing actionable insights

You've probably documented these in a wiki or a runbook. But every time, you still manually copy-paste commands, tweak parameters, and stitch results together. What if your AI coding assistant could do all of that, triggered by a single natural language request? That's exactly what GitHub Copilot Custom Skills enable.

What Are Custom Skills?

A skill is a folder containing a SKILL.md file (instructions for the AI), plus optional scripts, templates, and reference docs. When you ask Copilot something that matches the skill's description, it loads the instructions and executes the workflow autonomously. Think of it as giving your AI assistant a runbook it can actually execute, not just read.

| Without Skills | With Skills |
|----------------|-------------|
| Read the wiki for the procedure | Copilot loads the procedure automatically |
| Copy-paste 5 CLI commands | Copilot runs the full pipeline |
| Manually parse JSON output | Script generates a formatted HTML report |
| 15-30 minutes of manual work | One natural language request, ~2 minutes |

How It Works

The key insight: the skill file is the contract between you and the AI. You describe what to do and how, and Copilot handles the orchestration.

Prerequisites

| Requirement | Details |
|-------------|---------|
| VS Code | Latest stable release |
| GitHub Copilot | Active Copilot subscription (Individual, Business, or Enterprise) |
| Agent mode | Select "Agent" mode in the Copilot Chat panel (the default in recent versions) |
| Runtime tools | Whatever your scripts need: Python, Node.js, .NET CLI, az CLI, etc. |

Note: Agent Skills follow an open standard. They work across VS Code, GitHub Copilot CLI, and GitHub Copilot coding agent. No additional extensions or cloud services are required for the skill system itself.
Anatomy of a Skill

    .github/skills/my-skill/
    ├── SKILL.md                      # Instructions (required)
    └── references/
        ├── resources/
        │   ├── run.py                # Automation script
        │   ├── query-template.sql    # Reusable query template
        │   └── config.yaml           # Static configuration
        └── reports/
            └── report_template.html  # Output template

The SKILL.md File

Every skill has the same structure:

    ---
    name: my-skill
    description: 'What this does and when to use it. Include trigger phrases so Copilot knows when to load it. USE FOR: specific task A, task B. Trigger phrases: "keyword1", "keyword2".'
    argument-hint: 'What inputs the user should provide.'
    ---

    # My Skill

    ## When to Use
    - Situation A
    - Situation B

    ## Quick Start
    \```powershell
    cd .github/skills/my-skill/references/resources
    py run.py <arg1> <arg2>
    \```

    ## What It Does
    | Step | Action | Purpose |
    |------|--------|---------|
    | 1 | Fetch data from source | Gather raw input |
    | 2 | Process and transform | Apply business logic |
    | 3 | Generate report | Produce actionable output |

    ## Output
    Description of what the user gets back.

Key Design Principles

- Description is discovery. The description field is the only thing Copilot reads to decide whether to load your skill. Pack it with trigger phrases and keywords.
- Progressive loading. Copilot reads only name + description (~100 tokens) for all skills. It loads the full SKILL.md body only for matched skills. Reference files load only when the procedure references them.
- Self-contained procedures. Include everything the AI needs to execute: exact commands, parameter formats, file paths. Don't assume prior knowledge.
- Scripts do the heavy lifting. The AI orchestrates; your scripts execute. This keeps the workflow deterministic and reproducible.

Example: Build a Deployment Health Check Skill

Let's build a skill that checks the health of a deployment by querying an API, comparing against expected baselines, and generating a summary.
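Before writing the skill itself, it may help to have a quick way to sanity-check frontmatter, since the description is the only thing read during discovery. The following is a minimal sketch, not how Copilot itself parses skills: it assumes each frontmatter field sits on a single `key: value` line as in the template above, and the helper names `read_frontmatter` and `lint_skill` are hypothetical.

```python
import re


def read_frontmatter(text):
    """Parse single-line `key: value` pairs between the leading '---' markers."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing marker reached
        key, sep, value = line.partition(":")
        if sep:
            # Drop surrounding whitespace and quotes from the value
            fields[key.strip()] = value.strip().strip("'\"")
    raise ValueError("frontmatter block is not closed with '---'")


def lint_skill(text):
    """Return a list of problems found in the frontmatter (empty if none)."""
    fields = read_frontmatter(text)
    problems = []
    for key in ("name", "description"):
        if not fields.get(key):
            problems.append(f"missing '{key}'")
    description = fields.get("description", "")
    # Heuristic only: the article recommends listing trigger phrases explicitly
    if description and not re.search(r"trigger phrases?", description, re.IGNORECASE):
        problems.append("description lists no trigger phrases")
    return problems


if __name__ == "__main__":
    sample = "---\nname: demo\ndescription: 'A useful skill'\n---\n# Demo\n"
    print(lint_skill(sample))
```

A check like this could run in CI or a pre-commit hook so that a vague description is caught before the skill is shared with the team.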
Step 1: Create the folder structure

    .github/skills/deployment-health/
    ├── SKILL.md
    └── references/
        └── resources/
            ├── check_health.py
            └── endpoints.yaml

Step 2: Write the SKILL.md

    ---
    name: deployment-health
    description: 'Check deployment health across environments. Queries health endpoints, compares response times against baselines, and flags degraded services. USE FOR: deployment validation, health check, post-deploy verification, service status. Trigger phrases: "check deployment health", "is the deployment healthy", "post-deploy check", "service health".'
    argument-hint: 'Provide the environment name (e.g., staging, production).'
    ---

    # Deployment Health Check

    ## When to Use
    - After deploying to any environment
    - During incident triage to check service status
    - Scheduled spot checks

    ## Quick Start
    \```bash
    cd .github/skills/deployment-health/references/resources
    python check_health.py <environment>
    \```

    ## What It Does
    1. Loads endpoint definitions from `endpoints.yaml`
    2. Calls each endpoint, records response time and status code
    3. Compares against baseline thresholds
    4. Generates an HTML report with pass/fail status

    ## Output
    HTML report at `references/reports/health_<environment>_<date>.html`

Step 3: Write the script

    # check_health.py
    import sys, yaml, requests, time, json
    from datetime import datetime

    def main():
        # Fail with a usage hint instead of an IndexError when no environment is given
        if len(sys.argv) != 2:
            sys.exit("usage: python check_health.py <environment>")
        env = sys.argv[1]
        with open("endpoints.yaml") as f:
            config = yaml.safe_load(f)

        results = []
        for ep in config["endpoints"]:
            url = ep["url"].replace("{env}", env)
            start = time.time()
            try:
                status = requests.get(url, timeout=10).status_code
            except requests.RequestException:
                status = None  # unreachable or timed out: counted as unhealthy
            elapsed = time.time() - start
            results.append({
                "service": ep["name"],
                "status": status,
                "latency_ms": round(elapsed * 1000),
                "threshold_ms": ep["threshold_ms"],
                "healthy": status == 200 and elapsed * 1000 < ep["threshold_ms"]
            })

        healthy = sum(1 for r in results if r["healthy"])
        print(f"Health check: {healthy}/{len(results)} services healthy")
        # ... generate HTML report ...

    if __name__ == "__main__":
        main()

Step 4: Use it

Just ask Copilot in agent mode: "Check deployment health for staging"

Copilot will:

1. Match against the skill description
2. Load the SKILL.md instructions
3. Run python check_health.py staging
4. Open the generated report
5. Summarize findings in chat

More Skill Ideas

Skills aren't limited to any specific domain. Here are patterns that work well:

| Skill | What It Automates |
|-------|-------------------|
| Test Regression Analyzer | Run tests, parse failures, compare against last known-good run, generate diff report |
| API Contract Checker | Compare OpenAPI specs between branches, flag breaking changes |
| Security Scan Reporter | Run SAST/DAST tools, correlate findings, produce prioritized report |
| Cost Analysis | Query cloud billing APIs, compare costs across periods, flag anomalies |
| Release Notes Generator | Parse git log between tags, categorize changes, generate changelog |
| Infrastructure Drift Detector | Compare live infra state vs IaC templates, flag drift |
| Log Pattern Analyzer | Query log aggregation systems, identify anomaly patterns, summarize |
| Performance Benchmarker | Run benchmarks, compare against baselines, flag regressions |
| Dependency Auditor | Scan dependencies, check for vulnerabilities and outdated packages |

The pattern is always the same: instructions (SKILL.md) + automation script + output template.
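The script in Step 3 leaves report generation elided, so the third piece of that pattern (the output) is never shown. As one possible way to fill it in, here is a minimal sketch that renders result dictionaries into an HTML table. It assumes each result carries the keys used in check_health.py (service, status, latency_ms, threshold_ms, healthy); the function name `render_report` and the markup are hypothetical and not part of the original skill.

```python
from html import escape


def render_report(env, results):
    """Return an HTML health report for one environment (hypothetical helper)."""
    rows = []
    for r in results:
        verdict = "PASS" if r["healthy"] else "FAIL"
        rows.append(
            "<tr>"
            f"<td>{escape(str(r['service']))}</td>"  # escape names taken from config
            f"<td>{r['status']}</td>"
            f"<td>{r['latency_ms']} / {r['threshold_ms']} ms</td>"
            f"<td>{verdict}</td>"
            "</tr>"
        )
    healthy = sum(1 for r in results if r["healthy"])
    return (
        "<html><body>"
        f"<h1>Health check: {escape(env)}</h1>"
        f"<p>{healthy}/{len(results)} services healthy</p>"
        "<table><tr><th>Service</th><th>Status</th>"
        "<th>Latency / threshold</th><th>Result</th></tr>"
        + "".join(rows)
        + "</table></body></html>"
    )


if __name__ == "__main__":
    sample = [
        {"service": "api", "status": 200, "latency_ms": 120,
         "threshold_ms": 500, "healthy": True},
        {"service": "auth", "status": 503, "latency_ms": 900,
         "threshold_ms": 500, "healthy": False},
    ]
    print(render_report("staging", sample))
```

Keeping the rendering in a plain function, separate from the network calls, means the report can be checked with canned results and swapped for a bundled HTML template later without touching the health-check logic.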
Tips for Writing Effective Skills

Do

- Front-load the description with keywords: this is how Copilot discovers your skill
- Include exact commands, e.g. cd path/to/dir && python script.py <args>
- Document input/output clearly: what goes in, what comes out
- Use tables for multi-step procedures, which are easier for the AI to follow
- Include time zone conversion notes if dealing with timestamps
- Bundle HTML report templates: rich output beats plain text

Don't

- Don't use vague descriptions: "A useful skill" won't trigger on anything
- Don't assume context: include all paths, env vars, and prerequisites
- Don't put everything in SKILL.md: use references/ for large files
- Don't hardcode secrets: use environment variables or Azure Key Vault
- Don't skip error guidance: tell the AI what common errors look like and how to fix them

Skill Locations

Skills can live at project or personal level:

| Location | Scope | Shared with team? |
|----------|-------|-------------------|
| .github/skills/<name>/ | Project | Yes (via source control) |
| .agents/skills/<name>/ | Project | Yes (via source control) |
| .claude/skills/<name>/ | Project | Yes (via source control) |
| ~/.copilot/skills/<name>/ | Personal | No |
| ~/.agents/skills/<name>/ | Personal | No |
| ~/.claude/skills/<name>/ | Personal | No |

Project-level skills are committed to your repo and shared with the team. Personal skills are yours and roam with your VS Code settings sync. You can also configure additional skill locations via the chat.skillsLocations VS Code setting.

How Skills Fit in the Copilot Customization Stack

Skills are one of several customization primitives.
Here's when to use what:

| Primitive | Use When |
|-----------|----------|
| Workspace Instructions (.github/copilot-instructions.md) | Always-on rules: coding standards, naming conventions, architectural guidelines |
| File Instructions (.github/instructions/*.instructions.md) | Rules scoped to specific file patterns (e.g., all *.test.ts files) |
| Prompts (.github/prompts/*.prompt.md) | Single-shot tasks with parameterized inputs |
| Skills (.github/skills/<name>/SKILL.md) | Multi-step workflows with bundled scripts and templates |
| Custom Agents (.github/agents/*.agent.md) | Isolated subagents with restricted tool access or multi-stage pipelines |
| Hooks (.github/hooks/*.json) | Deterministic shell commands at agent lifecycle events (auto-format, block tools) |
| Plugins | Installable skill bundles from the community (awesome-copilot) |

Slash Commands & Quick Creation

Skills automatically appear as slash commands in chat. Type / to see all available skills. You can also pass context after the command:

    /deployment-health staging
    /webapp-testing for the login page

Want to create a skill fast? Type /create-skill in chat and describe what you need. Copilot will ask clarifying questions and generate the SKILL.md with proper frontmatter and directory structure. You can also extract a skill from an ongoing conversation: after debugging a complex issue, ask "create a skill from how we just debugged that" to capture the multi-step procedure as a reusable skill.

Controlling When Skills Load

Use frontmatter properties to fine-tune skill availability:

| Configuration | Slash command? | Auto-loaded? | Use case |
|---------------|----------------|--------------|----------|
| Default (both omitted) | Yes | Yes | General-purpose skills |
| user-invocable: false | No | Yes | Background knowledge the model loads when relevant |
| disable-model-invocation: true | Yes | No | Skills you only want to run on demand |
| Both set | No | No | Disabled skills |

The Open Standard

Agent Skills follow an open standard that works across multiple AI agents:

- GitHub Copilot in VS Code: chat and agent mode
- GitHub Copilot CLI: terminal workflows
- GitHub Copilot coding agent: automated coding tasks
- Claude Code, Gemini CLI: compatible agents via .claude/skills/ and .agents/skills/

Skills you write once are portable across all these tools.

Getting Started

1. Create .github/skills/<your-skill>/SKILL.md in your repo
2. Write a keyword-rich description in the YAML frontmatter
3. Add your procedure and reference scripts
4. Open VS Code, switch to Agent mode, and ask Copilot to do the task
5. Watch it discover your skill, load the instructions, and execute

Or skip the manual setup: type /create-skill in chat and describe what you need.

That's it. No extension to install. No config file to update. No deployment pipeline. Just markdown and scripts, version-controlled in your repo.

Custom Skills turn your documented procedures into executable AI workflows. Start with your most painful manual task, wrap it in a SKILL.md, and let Copilot handle the rest.

Further Reading:
- Official Agent Skills docs
- Community skills & plugins (awesome-copilot)
- Anthropic reference skills

Demystifying GitHub Copilot Security Controls: easing concerns for organizational adoption
At a recent developer conference, I delivered a session on Legacy Code Rescue using GitHub Copilot App Modernization. Throughout the day, conversations with developers revealed a clear divide: some have fully embraced Agentic AI in their daily coding, while others remain cautious. Often, this hesitation isn't due to reluctance but stems from organizational concerns around security and regulatory compliance. Having witnessed similar patterns during past technology shifts, I understand how these barriers can slow adoption. In this blog, I'll demystify the most common security concerns about GitHub Copilot and explain how its built-in features address them, empowering organizations to confidently modernize their development workflows.

GitHub Copilot Model Training

A common question I received at the conference was whether GitHub uses your code as training data for GitHub Copilot. I always direct customers to the GitHub Copilot Trust Center for clarity, but the answer is straightforward: “No. GitHub uses neither Copilot Business nor Enterprise data to train the GitHub model.” Note that this restriction also applies to third-party models (e.g. Anthropic, Google).

GitHub Copilot Intellectual Property indemnification policy

A frequent concern I hear is that, since GitHub Copilot’s underlying models are trained on sources that include public code, it might simply “copy and paste” code from those sources. Let’s clarify how this actually works.

Does GitHub Copilot “copy/paste”? “The AI models that create Copilot’s suggestions may be trained on public code, but do not contain any code. When they generate a suggestion, they are not “copying and pasting” from any codebase.”

To provide an additional layer of protection, GitHub Copilot includes a “duplicate detection filter”. This feature helps prevent suggestions that closely match public code from being surfaced. (Note: This duplicate detection currently does not apply to the Copilot coding agent.)
More importantly, customers are protected by an Intellectual Property indemnification policy. This means that if you receive an unmodified suggestion from GitHub Copilot and face a copyright claim as a result, Microsoft will defend you in court.

GitHub Copilot Data Retention

Another frequent question I hear concerns GitHub Copilot’s data retention policies. For organizations on GitHub Copilot Business and Enterprise plans, retention practices depend on how and where the service is accessed from:

Access through IDE for Chat and Code Completions:
- Prompts and Suggestions: Not retained.
- User Engagement Data: Kept for two years.
- Feedback Data: Stored for as long as needed for its intended purpose.

Other GitHub Copilot access and use:
- Prompts and Suggestions: Retained for 28 days.
- User Engagement Data: Kept for two years.
- Feedback Data: Stored for as long as needed for its intended purpose.

For Copilot Coding Agent, session logs are retained for the life of the account in order to provide the service.

Excluding content from GitHub Copilot

To prevent GitHub Copilot from indexing sensitive files, you can configure content exclusions at the repository or organization level. In VS Code, use the .copilotignore file to exclude files client-side. Note that files listed in .gitignore are not indexed by default but may still be referenced if open or explicitly referenced (unless they’re excluded through .copilotignore or content exclusions).

The life cycle of a GitHub Copilot code suggestion

Here are the key protections at each stage of the life cycle of a GitHub Copilot code suggestion:

- In the IDE: Content exclusions prevent files, folders, or patterns from being included.
- GitHub proxy (pre-model safety): Prompts go through a GitHub proxy hosted in Microsoft Azure for pre-inference checks: screening for toxic or inappropriate language, relevance, and hacking attempts/jailbreak-style prompts before reaching the model.
- Model response: With the public code filter enabled, some suggestions are suppressed. The vulnerability protection feature blocks insecure coding patterns like hardcoded credentials or SQL injections in real time.

Disable access to GitHub Copilot Free

Due to the varying policies associated with GitHub Copilot Free, it is crucial for organizations to ensure it is disabled both in the IDE and on GitHub.com. Since not all IDEs currently offer a built-in option to disable Copilot Free, the most reliable method to prevent both accidental and intentional access is to implement firewall rule changes, as outlined in the official documentation.

Agent Mode Allow List

Accidental file system deletion by Agentic AI assistants can happen. With GitHub Copilot agent mode, the “Terminal auto approve” setting in VS Code can be used to prevent this. This setting can be managed centrally using a VS Code policy.

MCP registry

Organizations often want to restrict access to allow only trusted MCP servers. GitHub now offers an MCP registry feature for this purpose. This feature isn’t available in all IDEs and clients yet, but it's being developed.

Compliance Certifications

The GitHub Copilot Trust Center page lists GitHub Copilot's broad compliance credentials, surpassing many competitors in financial, security, privacy, cloud, and industry coverage.

- SOC 1 Type 2: Assurance over internal controls for financial reporting.
- SOC 2 Type 2: In-depth report covering Security, Availability, Processing Integrity, Confidentiality, and Privacy over time.
- SOC 3: General-use version of SOC 2 with broad executive-level assurance.
- ISO/IEC 27001:2013: Certification for a formal Information Security Management System (ISMS), based on risk management controls.
- CSA STAR Level 2: Includes a third-party attestation combining ISO 27001 or SOC 2 with additional cloud control matrix (CCM) requirements.
- TISAX: Trusted Information Security Assessment Exchange, covering automotive-sector security standards.
In summary, while the adoption of AI tools like GitHub Copilot in software development can raise important questions around security, privacy, and compliance, the safeguards already in place help address these concerns. By understanding the built-in protections, configurable controls, and compliance certifications on offer, organizations and developers alike can feel more confident in embracing GitHub Copilot to accelerate innovation while maintaining trust and peace of mind.