GitHub Copilot App Canvas Is a Runtime
There is a quiet shift happening in how we build software with AI. We are moving from writing static code to orchestrating living systems where developers and AI agents co-create, observe, and evolve a solution in real time. This post is a working theory of what GitHub Copilot App Canvas is actually for, grounded in a real, runnable demo you can clone today: leestott/agent-runtime-canvas.

The Agent Runtime canvas open beside the chat: control bar, activity spotlight, requirement & constraints, and the live agent roster.

The headline claim, which the rest of this post defends with code: Traditional UIs are for using software. Canvas is for shaping software while it runs.

1. The misconception worth getting out of the way

The first instinct most engineers have when they see Canvas is to build a UI with it: a dashboard, a DevOps board, an admin panel. That is the wrong mental model, and it leads to disappointment. A Kanban board rendered in Canvas is just a worse version of a tool that already exists. Canvas is not where your users live. It is where your system becomes visible to you and to the AI while you are still figuring it out.

The distinction matters:

- You don't build Canvas instead of your UI. You use Canvas to figure out, test, and evolve the UI and the system before and during building it.
- Canvas solves problems your final UI should never try to solve in a visible way: agent coordination, intermediate state, test validation, failure propagation. These are observability concerns, not end-user features.
- Canvas is intended for test validation and the implementation of agent-driven solutions, not for shipping a production control panel.

A useful analogy: Figma is Human-to-Human, where one person designs a static artifact for another person to read. Canvas is Human-to-AI-to-System, a shared surface where a human, an AI agent, and a running system all act on the same live model. Figma shows you a picture of the software. Canvas is a runtime where things actually execute.

2. 
The positioning, stated plainly

Here is the thesis the demo is built to prove:

Canvas redefines software development by shifting from writing static code to orchestrating living systems, where developers and AI co-create, observe, and evolve solutions in real time. Instead of building UIs for users, we build interactive environments for agents, turning debugging, testing, and execution into a continuous, visual feedback loop that accelerates innovation and brings ideas to production faster than ever.

Read that again with the demo in mind, because the demo is not a slide; it is a working Copilot CLI extension that renders exactly this loop.

3. What we built: the Agent Runtime canvas

agent-runtime-canvas is a GitHub Copilot CLI canvas extension called Agent Runtime. It turns Canvas into a runtime observability and control plane for a multi-agent software system that is being designed, tested, and evolved in real time.

The canvas renders a single living SystemModel that both humans and the AI agent edit at the same time. The agent drives it through five canvas actions; the human drives it through panel controls. Every change streams to the iframe over Server-Sent Events (SSE), so the system visibly evolves through interaction.

The seven panels: a system you can watch think

| Panel | What it makes observable |
| --- | --- |
| Requirement & constraints | The feature under design plus editable policies and constraints |
| Agents | Active agents, their responsibilities, and live state (idle / working / done / error / blocked) |
| Task Flow | The dependency graph of tasks across agents, with live status |
| Artifacts | The intermediate outputs each task emits |
| Validation | Test cases, pass/fail, expected vs. actual, and the reasoning behind each verdict |
| Live State | The shared memory objects the agents read and write, directly human-editable |
| Timeline | A change-over-time log, including before→after state diffs |

None of these are things you would put in front of an end user. 
All of them are things you desperately want to see while you and an AI are co-designing an agentic system.

The five agent actions

The AI co-creates and evolves the system by calling five actions, declared in the canvas extension:

| Action | Effect |
| --- | --- |
| decompose_system | Break a requirement into collaborating agents + a task-flow graph |
| execute_workflow | Coordinate agents to advance tasks (step / run / pause / resume / reset) |
| validate_output | Run evaluation tests, return structured pass/fail + reasoning |
| update_system_design | Modify architecture/logic: requirement, constraints, agents, tasks |
| track_state | Persist/update a shared state object, recording the diff on the timeline |

The critical detail is that human controls and agent actions funnel through the exact same store. There is no separate "AI view" and "human view": one model, two kinds of participant.

4. How it actually works (the parts that matter)

The extension is deliberately small and dependency-free. It uses only Node's built-in modules plus github/copilot-sdk, which the CLI auto-resolves. Three files do the work:

    .github/extensions/agent-runtime/
      extension.mjs   # wiring: loopback HTTP server, SSE, /control, 5 canvas actions
      store.mjs       # durable SystemModel + execution engine + validation
      ui.mjs          # iframe renderer (system view, validation, state, timeline)

One shared model, broadcast on every mutation

The heart of the demo is the SystemStore. It is an EventEmitter: every mutation bumps a version, appends a timeline entry, persists to disk, and broadcasts a fresh snapshot to all connected panels. This is the single line that makes "humans and AI edit the same live system" true rather than aspirational:

    // store.mjs: every change is versioned, logged, persisted, and broadcast.
    _commit(eventType, summary, detail) {
      this.model.version += 1;
      this.model.updatedAt = now();
      if (eventType) {
        this.model.timeline.unshift({
          id: uid("ev"),
          ts: now(),
          type: eventType,
          summary,
          detail: detail || null,
        });
        // Keep only the 200 most recent timeline entries.
        this.model.timeline = this.model.timeline.slice(0, 200);
      }
      this._queueSave();                // best-effort JSON persistence under ~/.copilot
      this.emit("change", this.model);  // fan out to every SSE client
      return this.model;
    }

The agent action and the human button hit the same method

In extension.mjs, the canvas action handler and the iframe's /control POST both call store.execute(...). That symmetry is the whole point: neither the human nor the AI is privileged.

    // extension.mjs: a human control POST maps onto the same store method
    // the AI agent calls through the execute_workflow canvas action.
    function applyControl(store, body) {
      switch (body.action) {
        case "execute": return store.execute(body.mode || "step", body);
        case "validate": return store.validate(body.tests);
        case "decompose": return store.decompose(body.requirement, body);
        case "inject_failure": return store.injectFailure(body.taskKey);
        case "edit_state": return store.editState(body.key, body.value);
        // ...requirement, constraints, clear_failures, update_design
      }
    }

Execution you can watch one task at a time

The engine advances the task graph through a visible begin→dwell→finish lifecycle so the active agent is always observable. A ready task is one whose dependencies are all done:

    // store.mjs: the scheduler only starts a task when its deps are satisfied.
    _readyTask() {
      return this.model.tasks.find(
        (t) =>
          t.status === "pending" &&
          t.deps.every((d) => {
            const dep = this.model.tasks.find((x) => x.id === d);
            return dep && dep.status === "done";
          }),
      );
    }

When a task finishes, its agent emits an artifact and writes to shared state; when a dependency fails, the engine walks the graph to a fixpoint and marks every downstream task blocked.
That is failure propagation you can see: exactly the kind of thing a production UI would (correctly) hide, and exactly the kind of thing you need exposed while designing the system.

Validation as a first-class, re-runnable citizen

The default evaluation suite asserts properties of the running system, not of static code. Every test returns an expected value, an actual value, and a human-readable reason:

    // store.mjs: tests assert properties of the live system model.
    _defaultTests() {
      const t = (name, target, assertion) => ({ id: uid("test"), name, target, assertion });
      return [
        t("All tasks reach a terminal state", "tasks", "no_pending"),
        t("No tasks failed", "tasks", "none_failed"),
        t("Every completed task emitted an artifact", "artifacts", "artifact_per_done"),
        t("Design state populated before build", "state", "design_before_build"),
        t("Decision recorded by Reviewer", "state", "has_decision"),
      ];
    }

This is the "continuous, visual feedback loop" from the thesis, made concrete: decompose → execute → validate → redesign → re-validate, with the Timeline recording every before→after transition.

5. Run it yourself

You need a GitHub Copilot CLI / app with canvas support (the canvas-renderer capability) and this repo opened as your workspace. There is no npm install: the SDK is auto-resolved and the extension uses only built-in Node modules.

Clone and open the workspace.

    git clone https://github.com/leestott/agent-runtime-canvas.git
    cd agent-runtime-canvas

The extension auto-discovers from .github/extensions/agent-runtime/.

Open the canvas with a requirement. Ask Copilot:

    Open the Agent Runtime canvas with the requirement "Add CSV export to the reports page".

Walk the loop. Decompose into five agents and a six-task graph, press Run ▶, watch the spotlight track the active agent, press Run tests ✓ for 5/5 green, then Inject failure ⚡ to watch downstream tasks go blocked and validation drop to 4/5, and recover.
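The downstream blocking you see after Inject failure can be sketched in a few lines. This is a minimal illustration, not the code in store.mjs: `propagateBlocked`, the sample task list, and the `"failed"` status name are hypothetical, and the sketch only assumes tasks shaped like the ones `_readyTask()` reads (`id`, `status`, `deps`).

```javascript
// Hypothetical sketch: mark every task downstream of a failure as "blocked".
// Assumes tasks look like { id, status, deps }; "failed" is an assumed status name.
function propagateBlocked(tasks) {
  const byId = new Map(tasks.map((t) => [t.id, t]));
  let changed = true;
  // Repeat full passes until one pass changes nothing (a fixpoint).
  while (changed) {
    changed = false;
    for (const task of tasks) {
      if (task.status !== "pending") continue;
      const stuck = task.deps.some((d) => {
        const dep = byId.get(d);
        return dep && (dep.status === "failed" || dep.status === "blocked");
      });
      if (stuck) {
        task.status = "blocked";
        changed = true;
      }
    }
  }
  return tasks;
}

const tasks = [
  { id: "design", status: "done", deps: [] },
  { id: "build", status: "failed", deps: ["design"] },
  { id: "test", status: "pending", deps: ["build"] },
  { id: "review", status: "pending", deps: ["test"] },
  { id: "docs", status: "pending", deps: ["design"] },
];

propagateBlocked(tasks);
console.log(tasks.map((t) => `${t.id}:${t.status}`).join(" "));
// prints "design:done build:failed test:blocked review:blocked docs:pending"
```

Note that `docs` stays pending because it depends only on `design`; looping to a fixpoint is what lets `review` be blocked through `test` regardless of the order tasks appear in the list.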
State persists per documentId under ~/.copilot/extensions/agent-runtime/artifacts/ , so a reload resumes exactly where you left off. The companion demoscript.md in the repo gives you a tight, timed walkthrough. 6. Why this is an observability story Once you accept that Canvas is a runtime rather than a UI, the most compelling use case becomes observability of agentic systems. Agentic software is notoriously hard to debug: the interesting behavior lives in intermediate state, coordination order, and the moments where one agent's failure cascades into another's. A production UI is designed to hide all of that. A Canvas is designed to surface it, temporarily, while you are shaping the system — and then get out of the way. This reframes Canvas alongside the broader Microsoft and GitHub agent tooling story. As teams adopt the GitHub Copilot SDK and patterns like the open Model Context Protocol to wire agents into real systems, the gap is rarely "can the agent act?" it is "can a human see what the agent did, judge it, and steer it?" Canvas is a candidate answer to that second question. When you take agents toward production on Azure with services like Microsoft Foundry, the same instinct applies: build the evaluation and observability loop first, and let it shape the system before you commit a single end-user pixel. 7. The open question: why can't Canvas be multi-user? There is an obvious next frontier, and it is worth stating as an honest open question rather than a finished feature. Everything that makes Canvas valuable also makes it a natural collaborative surface: It is a shared space. It is visual. It is collaborative. Multiple participants — human and AI — interact with the same surface. If Figma earned its place by making Human-to-Human design multiplayer, the provocative question is whether a project- or repo-scoped Canvas can make Human-to-AI-to-System development multiplayer too: several engineers and several agents shaping one running system on one surface. 
The demo here is single-user by design, but its architecture — one shared store, versioned, broadcast to every subscriber — is already the shape you would need. That is a genuine research direction, and worth experimenting with as licensing and access broaden. 8. Honest limitations In the spirit of building credibility rather than hype: This is a demonstration. The decomposition, artifacts, and state are synthesized to make the runtime loop legible — it models an agentic system rather than running arbitrary production agents. It is single-user and single-machine. The loopback HTTP server and per-document store are local by design; multi-user is an aspiration, not a shipped capability. Access is gated. Canvas support requires a Copilot CLI/app build with the canvas-renderer capability. Licensing and preview access are the biggest practical blockers to wider experimentation today. Persistence is best-effort. State is written to a local JSON artifact; treat it as demo durability, not a database. Key takeaways Don't build a UI in Canvas. Use Canvas to shape, test, and evolve a system — and the UI — while it runs. Traditional UIs are for using software; Canvas is for shaping software while it runs. Canvas is Human-to-AI-to-System, a runtime where things execute — not a static design surface. Its strongest use case is observability and validation of agentic systems: surface the intermediate state your production UI should hide. The shared-model architecture — one versioned store broadcast to every participant — is what makes human + AI co-editing real, and what hints at a multi-user future. Next steps Clone and run the demo: github.com/leestott/agent-runtime-canvas. Read the extension source under .github/extensions/agent-runtime/ — start with store.mjs . Explore the building blocks: the GitHub Copilot SDK, the Model Context Protocol, and Microsoft Foundry for taking agentic systems toward production. 
Try the multi-user thought experiment: fork the store, add a second subscriber, and ask what changes when two humans and two agents share one surface.

Mind the Specs: Grading formal specifications and KPIs as artefacts for LLM-driven code generation
Large language models now write code straight from a prompt, but the specification in between is never checked, and a model asked to judge its own work brings the same blind spots to the review. We built a pipeline that lifts a plain-language requirements bundle into two graded specifications (a formal Alloy model and a set of numerical KPI targets), scores both before a single line of code is written, and hands the graded result to the code generator. It starts from GitHub Spec Kit and the Azure Well-Architected Framework. Here is what we built, and what we learned from running it at scale. The problem Writing software used to be four separate activities: gathering requirements, writing a specification, verifying it, and implementing it. A language model collapses all four into a single step. Two of those activities used to give us a quality signal before any code existed: a formal specification you could inspect, and measurable targets an implementation had to hit. The prompt-to-code loop inherits neither. There is no externally observable signal, before a line of code is written, that the requirements a model received are even well-formed enough to drive a correct implementation. You might think the model could just check its own work. It cannot do so reliably. Ask a language model to check the logic it just wrote: not only will it bring the same blind spot to the review, but its stochastic nature will make it produce different answers on each run. A SAT solver does not behave this way. Its verdict is deterministic: the same specification produces the same verdict every time. The thing that historically kept formal specification out of everyday development was never its rigour, it was the cost of writing the specification by hand. And that is exactly the step a language model can now do. What we built We built an agentic pipeline that sits between the requirements and the generated code. 
In plain terms it takes the requirements once, turns them into two things that can be checked by a machine: a precise description of rules that the system must obey, and a set of measurable targets that the system must hit. These artefacts are both graded, and are handed to the code generator. We split the work in two and gave each half to the tool that is good at it. The language model does the creative part, turning messy prose into formal structure. Deterministic checks, not the model's own opinion, grade what it produces. From a single Spec Kit artefacts bundle the pipeline builds two graded specifications before any code exists, and then carries both into code generation. Since these grades are computed deterministically rather than just generated, you can actually trust them. The input is a GitHub Spec Kit bundle. Spec Kit is an open-source, specification-first toolkit: instead of prompting for code directly, you describe what you want to build, and it produces a set of structured artefacts, a feature specification, a data model, and a set of API contracts. Our pipeline reads that bundle and turns it into the two graded specifications in parallel. overview. Spec Kit artefacts on the left. The Alloy lifter (with SAT solver and the attack step) and the KPI agent run in parallel. Their graded outputs are merged into a verification report that feeds the guided code generator. A dashed baseline path feeds the goal alone to the generator for comparison. Lift the requirements into a formal model The first half is structural. An Alloy lifter translates the requirements into a formal model written in Alloy, a specification language whose rules a SAT solver can check exhaustively, and whose verdict is deterministic, so the grade never depends on asking an LLM what it thinks. 
A banking requirement like "zero balance discrepancies" becomes a precise, checkable rule: the money leaving one account and the money arriving in another must always add up to the balances you started with, so a transfer can never quietly create or destroy money. The solver searches for any scenario that would break the rule. We modified Spec Kit's templates to force the model to output functional requirements and their corresponding Alloy code blocks in a structured format. Against the stock templates, that change alone nearly doubled the Alloy code compilation rate, jumping from 40 to 74 percent. A machine-written specification cannot be trusted, though, so the lifter does more than write it: it attacks it. Each load-bearing rule is deliberately broken by clearing its body and injecting a clause that forces a violation and the solver is re-run on the broken model. If the solver fails after this mutation, the original rule genuinely caught the violation it was meant to catch. If it still passes, the rule never really constrained anything on its own. Mutation testing usually grades a test suite against a specification that is assumed correct; here the roles are reversed, and the specification itself is on trial. Turn the requirements into measurable targets The second half is measurable. A KPI agent takes the same Spec Kit bundle, retrieves the most relevant principles from the Azure Well-Architected Framework, and derives numerical targets in the Goal-Question-Metric style. Each target carries an explicit threshold, a direction, and a measurement method, the kind of target a monitoring tool could actually track. Where earlier automated approaches stopped at describing quality in words, this half emits the actual numbers an implementation has to satisfy. And the knowledge base is a setting, not a fixture: swapping the Well-Architected Framework for ISO 25010, the NIST Cybersecurity Framework, or Google's SRE workbook requires zero changes to the underlying code. 
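To make the shape of a KPI target with a threshold, a direction, and a measurement method concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than the pipeline's actual schema: the field names, the two sample targets and their numbers, the measurements, and the `meetsTarget` helper.

```javascript
// Hypothetical KPI targets: names and numbers are invented for illustration.
const targets = [
  {
    name: "p95 transfer latency (ms)",
    threshold: 300,
    direction: "at_most", // lower is better
    method: "load test, 95th percentile",
  },
  {
    name: "monthly availability (%)",
    threshold: 99.9,
    direction: "at_least", // higher is better
    method: "uptime probe over 30 days",
  },
];

// Compare one measured value against its target, respecting the direction.
function meetsTarget(target, measured) {
  switch (target.direction) {
    case "at_most":
      return measured <= target.threshold;
    case "at_least":
      return measured >= target.threshold;
    default:
      throw new Error(`unknown direction: ${target.direction}`);
  }
}

// Assumed measurements, as a monitoring tool might report them.
const measurements = {
  "p95 transfer latency (ms)": 280,
  "monthly availability (%)": 99.5,
};

const report = targets.map((t) => ({
  name: t.name,
  pass: meetsTarget(t, measurements[t.name]),
}));
console.log(report);
```

Carrying the direction explicitly is the point of the sketch: the same comparison code can grade a latency ceiling and an availability floor, which prose descriptions of quality cannot do.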
Review the report before any code Both graded halves merge into one human-readable verification report: the patterns the model applied, which rules passed, the counterexamples the solver found, the attack results, and the KPI threshold table. A developer reads it first and can see exactly where the specification is weak: a rule that passed for the wrong reason, or a requirement that nothing covers. After revising the specification, they re-run the lifting phase. Because the process is cached, re-runs are cheap, allowing the developer to loop until the report looks perfect, all before any code exists. The work shifts from reviewing generated code after the fact to curating a specification and reading a report before anything is built. Carry the graded context into code generation Only then does the report do its real job. In the guided pipeline, the merged report becomes the context handed to a code generator, which is asked to implement each rule, requirement, and KPI threshold and to leave markers tracing the code back to them. A baseline generator gets only the plain-language goal. Same generator, same settings; the only difference is whether it can see the graded specification. Feeding graded artefacts, rather than raw prose, into code generation is the piece that ties the whole pipeline together. So three choices separate this from simply asking a model for a spec: the specification is attacked rather than trusted, the targets are numbers rather than prose, and what reaches the code generator is graded evidence rather than raw text. How we tested it We ran the pipeline at scale: 270 Alloy lifts and 1,930 KPI records, across three application domains chosen to differ sharply (banking, software-as-a-service, and healthcare), three levels of requirement detail, four knowledge bases, and three model tiers, with ten runs of each combination so a real effect could be told apart from noise. 
For the code-generation half, we generated two codes for each case, once with the graded report as context and once from the plain-language goal alone, and compared the two. What we found First, the foundation: the specifications proved gradeable. The rubric cleanly separated sound specifications from degenerate ones. Because it returned the same verdict run after run, the grades are reliable enough to act on. The three key observations are as follows: The model matters more than the prompt Of the two knobs a practitioner controls, the model you choose and the amount of detail you write, the model dominated by roughly nine to one. A weak model could not be rescued by richer requirements. But you do not need the most expensive one: a mid-tier model delivered about 98 percent of the best model's quality at under a third of the cost and about half the time. The cheapest tier was a false economy, producing a model the analyser could even load only 23 percent of the time. More detail can backfire More requirements are not always better. Sparse and standard requirements scored the same, but over-specified requirements collapsed: KPI quality fell from about 0.89 to about 0.73, and the effect held across all four knowledge bases. Pile in too much numerical detail and the pipeline starts echoing the numbers it was handed instead of deriving sound ones, which is the opposite of what more detail is supposed to buy. Graded context produces far better code This is the payoff, and it is the point of the whole pipeline. Across all nine combinations of domain and detail, code generated with the graded verification context scored about 8 out of 10, against about 1 out of 10 for the same generator given only the plain-language goal. The guided code carried the traceability back to each requirement, the named rules, and the structural patterns that a bare prompt gives us no way to know about. 
This part of the study is a single run per combination, so we report the size and the consistency of the gap rather than a precise average, but the gap was large and it held in every case. What this means for you Four things to take from our study into your own work: Write requirements at a standard, middle level of detail. Not sparse, and not exhaustively numerical. The middle is the sweet spot on both halves of the specification. Reach for a capable mid-tier model before you invest in heavy prompt engineering. Model choice moves quality more than requirement detail does, and the mid tier is the value leader. Give the code generator externally graded context instead of letting it specify for itself. That is where most of the quality gain came from. Treat the knowledge base as a setting worth tuning, not a fixed ingredient. Each is a recommendation that data supports under the conditions we tested, not a universal law. The limit Every grade measures structure, not meaning. A high score says the specification is well-formed, discriminating, and stable. It does not say whether the invariants are the right ones, or the thresholds are the right ones for your deployment. A specification can be perfectly well-formed and still describe the wrong system. That judgement stays with a human, which is where we think it belongs. The pipeline is built to make that judgement efficient by moving it earlier, to curating the specification and reading the report, rather than to remove it. Generated code should not be shipped end to end without human validation. Try it The full pipeline, every input, and the artefacts behind every figure are in the project repository. 
If you want the Microsoft tools it builds on, start here:

- Project repository: https://github.com/RadaanMadhan/Specification-Led-Development
- GitHub Spec Kit: https://github.com/github/spec-kit
- Azure Well-Architected Framework: https://learn.microsoft.com/en-us/azure/well-architected/

If you'd like to explore the work in more detail, we've included the full technical report in the project repository, covering the related work, methodology, pipeline design, experimental setup, and extended results.

About the team

This project was carried out by six students at Imperial College London: Leon Hausmann, Charlotte Maxwell, Radaan Madhan, Keshav Das, Anson Huang, and Ander Cobo, in collaboration with Microsoft and supervised by Lee Stott (Microsoft) and Max Cattafi (Imperial College London).

Pair Programming and test-driven development with Visual Studio Live Share and GitHub Copilot
Pair programming and TDD are valuable software development techniques that can enhance the development process, and GitHub Copilot can help teams effectively utilize these practices to produce high-quality code.

Make Your Copilot Credits Count: A Student's Guide to Smarter AI Usage
If you're a student enrolled in GitHub Education, you already have something most developers pay for: free access to GitHub Copilot and its premium features. That's incredible. But here's the thing: free access doesn't mean unlimited usage, and not all AI interactions cost the same. Every chat message, every agent task, every model call consumes something called AI Credits, and knowing how they work will help you use Copilot smarter, produce better code, and build the kind of disciplined AI habits that professional developers are only just starting to learn.

This post is inspired by a fantastic deep-dive from my colleague, developer advocate Bruno: "GitHub Copilot and Tokens: How to Keep Using AI Without Burning Your Budget". We've taken those professional lessons and tailored them specifically for students, because your learning environment, your assignments, and your goals are different from those of a seasoned engineer at a tech company.

TL;DR: Use autocomplete before chat. Choose the right model. Keep context small. Start fresh chats often. Plan before you build. These habits will make you a better developer and stretch your credits further.

What Are AI Credits and Why Do They Matter?

When you interact with GitHub Copilot through chat, agent mode, or inline edits, the model processes tokens. Tokens are small chunks of text (roughly 3–4 characters each). Every interaction consumes:

- Input tokens: everything sent to the model (your message, attached files, chat history, instructions)
- Output tokens: everything the model generates back to you
- Cached tokens: context the model reuses from previous turns (cheaper)

These tokens are converted to AI Credits, where 1 AI Credit = $0.01 USD. Different models have very different token costs: a lightweight model like GPT-5 mini charges $0.25 per million input tokens, while a powerful model like GPT-5.5 charges $5.00 per million input tokens (20x more expensive).
Using the wrong model for a simple task is like taking a taxi to a destination that's a 5-minute walk. See the official pricing table: GitHub Copilot Models and Pricing.

Figure 1: The four cost tiers of Copilot interactions. Autocomplete and Next Edit Suggestions are free: they do not consume AI Credits on paid plans.

Strategy 1: Tab Before Chat (The Free Tier is Powerful)

Here is the single most impactful habit you can build: always try autocomplete before opening chat. According to GitHub's official billing documentation, code completions and Next Edit Suggestions are not billed as AI Credits on paid plans. That means every time you press Tab to accept an inline suggestion, you are getting AI assistance for free.

Use autocomplete (Tab) for:

- Completing a line or a simple function
- Generating repetitive boilerplate (constructors, properties, getters/setters)
- Completing a repeated pattern you've started
- Writing obvious next lines like console.log, imports, or variable declarations
- Adjusting variable names inline

Only move to Inline Edit (Ctrl+I / Cmd+I) when autocomplete isn't enough for a local change. Only open a Chat window when you need genuine reasoning: an explanation, a plan, or a multi-step solution.

As Bruno puts it: "The most expensive model in the world should not be helping you write public string Name { get; set; }. That's what Tab is for. And coffee."

Strategy 2: Choose the Right Model for the Job

GitHub Copilot gives you access to models from OpenAI, Anthropic, and Google, each at different price points and capability levels. The key insight from VS Code's official Copilot usage guide is: reserve powerful reasoning models for tasks that genuinely need them.
| Your Task | Recommended Model Tier | Example Models |
| --- | --- | --- |
| Simple question or boilerplate | Lightweight | GPT-5 mini, Gemini 3 Flash |
| Code explanation or basic docs | Lightweight | GPT-5 mini, GPT-5.4 nano |
| Writing tests or debugging a single function | Medium / Versatile | Claude Haiku 4.5, GPT-5.4 |
| Multi-file refactor or code review | Medium / Versatile | Claude Sonnet 4.6, GPT-5.4 |
| Complex system design or architecture | Powerful | Claude Opus 4.7, GPT-5.5 |
| Long agentic workflows | Powerful (scoped!) | Claude Opus 4.8, GPT-5.5 |
| Not sure what you need | Auto (recommended default) | Copilot selects for you |

GitHub Copilot's Auto Model Selection feature automatically chooses a model based on task complexity, availability, and policies. For most students, Auto should be your default; only switch manually when you have a specific reason. And when the complex task is done, switch back to Auto or a lighter model.

Strategy 3: Context is Currency (Smaller is Smarter)

Here's the counterintuitive truth that surprises most developers: the expensive part of a prompt is usually not the question you type; it's everything surrounding it. The tokens Copilot sends with each request include:

- All your previous chat messages in the session
- Every file you have open or attached
- Workspace search results Copilot pulled in
- Build output, terminal logs, or diff content
- Responses from any MCP (Model Context Protocol) servers you have enabled
- Your custom instructions file (.github/copilot-instructions.md)

A single question inside a conversation with 80 messages, 12 open files, and 3 tool call results can cost significantly more than the same question asked fresh in a new chat with one relevant file attached.

Figure 2: The same task asked two ways. Scope your prompts to save credits and often get better answers.
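The arithmetic behind that comparison can be sketched with the two input prices quoted earlier ($0.25 and $5.00 per million input tokens) and the 1 AI Credit = $0.01 conversion. The token counts below are invented for illustration, output and cached tokens are ignored, and `inputCredits` is a hypothetical helper rather than any Copilot API.

```javascript
// 1 AI Credit = $0.01 USD, as stated in the pricing section.
const USD_PER_CREDIT = 0.01;

// Credits consumed by the input side of one request (hypothetical helper).
function inputCredits(inputTokens, usdPerMillionTokens) {
  const usd = (inputTokens / 1_000_000) * usdPerMillionTokens;
  return usd / USD_PER_CREDIT;
}

// Assumed context sizes, not measurements.
const freshChat = 4_000;     // question plus one relevant file
const bloatedChat = 120_000; // long history, many files, tool output

const prices = [
  ["lightweight", 0.25], // USD per million input tokens
  ["powerful", 5.0],
];

for (const [label, price] of prices) {
  const fresh = inputCredits(freshChat, price).toFixed(2);
  const bloated = inputCredits(bloatedChat, price).toFixed(2);
  console.log(`${label}: fresh=${fresh} credits, bloated=${bloated} credits`);
}
// prints "lightweight: fresh=0.10 credits, bloated=3.00 credits"
// prints "powerful: fresh=2.00 credits, bloated=60.00 credits"
```

Under these assumed sizes, context and model choice multiply: the same question ranges from a tenth of a credit to sixty credits on input alone.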
Practical rules for context management: Attach only 2–3 relevant files — not your entire project Don't ask Copilot to analyse the whole repo when you only need changes in one module Paste only the first relevant error from a log, not 2,000 lines of output Remove timestamps and duplicate stack traces from pasted logs State the expected output format explicitly so the model stops early Use /compact in VS Code Chat to summarise a long conversation without losing key context Use /fork to explore an alternative direction without polluting the main conversation Strategy 4: Start Fresh Chats When You Change Tasks This is one of the simplest optimisations and one of the most ignored. The VS Code Copilot usage guide is explicit about it: when a conversation grows, it carries context from all previous messages. If you switch to an unrelated task in the same session, the model still processes that irrelevant history and you pay for it in credits. Bad pattern: Chat session: - "Help me fix the JWT bug in auth.ts" [10 messages] - "Now write unit tests for my sorting algorithm" [still in same chat!] - "Can you generate the README for my project?" [still in same chat!] - "Now debug this CSS layout issue..." [still in same chat!] Smart pattern: Chat 1: "Fix JWT bug in auth.ts" - DONE, close chat. Chat 2: "Write unit tests for sorting algorithm" - DONE, close chat. Chat 3: "Generate README for project" - fresh context, fresh cost. New task = new chat. Your human brain benefits too — focused sessions produce better outcomes than sprawling multi-topic conversations. Strategy 5: Plan Before You Build Use Agent Mode Wisely Agent mode is one of the most powerful Copilot features for students working on larger assignments — it can create files, run terminal commands, edit across multiple files, and execute tests. But agent mode also carries the highest token cost, because it loops: it plans, acts, observes tool output, then plans again. 
The VS Code documentation recommends separating planning from implementation to reduce rework and back-and-forth. Here's a phased approach that saves credits and produces better results:

Figure 3: The credit-smart workflow. Always try the cheaper option first, escalate only when needed.

Phase 1: Plan (lightweight model, low cost)
I need to add user authentication to my Express app. Before writing any code, give me a step-by-step plan covering which files to create, which packages to install, and what tests to write. Do not write code yet.

Phase 2: Scoped Implementation (one feature at a time)
Using the plan we agreed, implement only Step 1: create src/middleware/auth.ts with JWT validation. Do not modify any other files yet.

Phase 3: Validate
Run the existing tests in tests/auth.test.ts and report the results. Fix only test failures related to the new auth middleware.

Phase 4: Cleanup
The implementation is complete. Update README.md with setup instructions for the auth module. Keep it under 200 words.

Each phase is small, scoped, and verifiable. You can stop at any phase, check the result, and only continue when you're satisfied. This dramatically reduces expensive re-runs where the agent reverses its own changes.

Strategy 6: Review Your MCP Servers and Custom Instructions

MCP Servers
MCP (Model Context Protocol) servers let Copilot connect to external tools: databases, GitHub issues, Jira, Slack, browser automation, and more. Each enabled server expands what the agent can do, but also adds to the context the model must consider, which increases token usage. For students, a practical rule: only enable MCP servers relevant to your current project. If you're working on a simple Python web app, you probably don't need browser automation, a Kubernetes connector, and a Slack integration all active at the same time. See the VS Code MCP servers documentation for how to enable, disable, and configure them.
Custom Instructions A .github/copilot-instructions.md file in your repository lets you give Copilot standing instructions — coding standards, testing commands, architecture conventions. This is a fantastic feature. But that file is included in every prompt's context, so a bloated instructions file costs credits on every single interaction. A good custom instructions file is: Short — under 200 words for a student project Specific to this repository's real conventions Clear about test commands (e.g., npm test , pytest ) Free of generic advice that applies to every codebase on earth Example of a good student instructions file: # Copilot Instructions for MyWebApp Language: TypeScript (strict mode) Framework: Express.js with Prisma ORM Tests: Run with `npm test` (Jest) Lint: Run with `npm run lint` (ESLint + Prettier) Conventions: - Use async/await, not callbacks - Validate all request inputs with Zod - Keep controllers thin; put logic in service files - Write a test for every new public function That's it. Short, actionable, and genuinely useful — not a 500-line manifesto. Strategy 7: Use Traditional Tools First AI is excellent for reasoning, explaining, planning, and connecting ideas. It is not the right tool for every job. Before reaching for Copilot chat, ask yourself whether a traditional tool can answer your question faster, cheaper, and more reliably: Compiler / type-checker — to find type errors (TypeScript, mypy) Linter — to find style and logic issues (ESLint, Pylint, Checkstyle) Formatter — to fix formatting (Prettier, Black, gofmt) Test runner — to confirm whether your code works (Jest, pytest, JUnit) Debugger — to step through execution and inspect state Docs / Stack Overflow — for well-documented APIs and common patterns If your linter tells you there's a missing import, fix it directly — don't ask Copilot to analyse your code to find it. Let deterministic tools do deterministic work, and let AI do the reasoning where it genuinely adds value. 
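In the spirit of letting deterministic tools do deterministic work, the 200-word guideline for the instructions file can itself be checked with a few lines of Python. This is a minimal sketch: the `check_instructions` helper is hypothetical, the limit is this post's suggestion rather than anything Copilot enforces, and words are simply counted by splitting on whitespace.

```python
from pathlib import Path

WORD_LIMIT = 200  # guideline from this post, not a Copilot-enforced limit


def check_instructions(text: str, limit: int = WORD_LIMIT) -> tuple[int, bool]:
    """Return (word_count, within_limit) for an instructions file's text."""
    words = len(text.split())
    return words, words <= limit


if __name__ == "__main__":
    path = Path(".github/copilot-instructions.md")
    if path.exists():
        count, ok = check_instructions(path.read_text(encoding="utf-8"))
        print(f"{path}: {count} words ({'ok' if ok else 'consider trimming'})")
    else:
        print(f"{path} not found")
```

Run it from the repository root; it only reports the count and leaves any trimming to you.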
Your GitHub Education Benefits: What You Get If you haven't already, apply for GitHub Education with your school email address. Once verified, you receive: Free GitHub Copilot including premium features — see how to enable Copilot as a student Free GitHub Codespaces — 180 core hours per month, equivalent to GitHub Pro (great for browser-based coding with Copilot built in) GitHub Student Developer Pack — free access to dozens of professional tools from GitHub's partners, including cloud credits, domains, and IDEs GitHub Classroom — your instructors can manage assignments and provide feedback GitHub Community Exchange — discover and contribute to student-built projects Campus Experts program — become a student leader in your tech community These benefits are designed to give you real-world tools in an educational setting. Copilot is the standout feature — it's the same tool professional developers use every day. Using it wisely during your studies means you'll arrive in the workforce already ahead of the curve. Pre-Prompt Checklist for Students Before you fire off your next Copilot prompt, run through this checklist. It takes 10 seconds and can save significant credits — and more importantly, it builds the mental habits of a professional AI user. Figure 4: Two-column checklist covering what to check before opening chat and when writing your prompt. Before you open chat: ☐ Can Tab / autocomplete solve this? ☐ Is inline edit (Ctrl+I) enough for this local change? ☐ Can a linter, compiler, or test runner answer this? ☐ Is this a different task from my last message? If so, start a new chat. ☐ Am I on Auto model selection (or the right tier for this task)? ☐ Should I ask for a plan before asking for code? ☐ Do I have MCP servers enabled that I don't need right now? ☐ Is my copilot-instructions.md file concise and current? 
When writing your prompt:
☐ Attach only 2–3 relevant files, not the whole project
☐ Paste only the first relevant error from any logs
☐ Define the files to change, the goal, and any files not to touch
☐ Ask for a plan before implementation on complex tasks
☐ Remove timestamps and duplicate stack traces from pasted logs
☐ State the expected output format and length
☐ Use /compact if the session is getting long
☐ Use /fork to explore alternatives without polluting the main thread

A Note on Responsible AI Use in Education

Using Copilot smartly is not just about saving credits; it's about developing genuine skills. When you ask Copilot to write all your code without understanding it, you lose the learning opportunity the assignment was designed to create. When you review and understand every suggestion Copilot makes, you learn faster, build better instincts, and can confidently explain your own work.

Best practices for academic integrity with AI tools:
- Understand before you accept: never paste code you can't explain
- Use Copilot to learn, not to skip learning: ask it to explain the code it generates
- Follow your institution's AI policy: many universities have specific guidance on AI use in assessments
- Treat Copilot as a senior pair-programmer, not an answer machine: question its suggestions, push back, iterate
- Verify facts and documentation links: AI can hallucinate; always check official sources

GitHub Education exists to give you real professional tools while you learn. The goal is for you to graduate with genuine skills, a real portfolio, and the confidence that comes from building things yourself, with AI as your collaborator, not your ghostwriter.
Key Takeaways
- Tab first: autocomplete and Next Edit Suggestions are free; use them for everything small
- Auto model by default: only switch to a powerful model when you have a clear reason
- Context is cost: fewer files, fewer messages, fewer tools = fewer tokens
- New task = new chat: don't carry stale context into unrelated work
- Plan before you build: a 10-message plan session is cheaper than 50 messages of rework
- Keep instructions short: your copilot-instructions.md runs on every prompt
- Use traditional tools first: linters and compilers are free, fast, and deterministic
- Understand your code: Copilot is a collaborator, not a replacement for learning

Resources and Next Steps
- GitHub Education: apply for your free student benefits
- GitHub Student Developer Pack: explore free tools for students
- Enable GitHub Copilot as a student
- GitHub Copilot: Models and Pricing (understand exactly what each model costs)
- Auto Model Selection in GitHub Copilot
- VS Code: Optimising GitHub Copilot Usage (the official guide that inspired many of these tips)
- Managing MCP Servers in VS Code
- El Bruno: GitHub Copilot and Tokens (the original professional perspective)
- GitHub Education Community Discussions: connect with students and educators worldwide

This post draws on insights from El Bruno's developer blog and best practices from GitHub Education. All pricing figures are sourced from the official GitHub Copilot billing documentation and are correct as of June 2026.

Step-by-Step: Setting Up GitHub Student and GitHub Copilot as an Authenticated Student Developer
To become an authenticated GitHub Student Developer, follow these steps: create a GitHub account, verify your student status through a school email (or contact GitHub support), sign up for the Student Developer Pack, connect to Copilot, and activate the GitHub Student Developer Pack benefits. The GitHub Student Developer Pack includes hundreds of free software offers and other benefits such as Azure credit, Codespaces, a student gallery, the Campus Experts program, and a learning lab. Copilot provides autocomplete-style suggestions from AI as you code. The Visual Studio Marketplace offers both GitHub Copilot, for autocomplete-style suggestions, and GitHub Copilot Labs, a companion extension with experimental features.

Embracing Responsible AI: A Comprehensive Guide and Call to Action
In an age where artificial intelligence (AI) is becoming increasingly integrated into our daily lives, the need for responsible AI practices has never been more critical. From healthcare to finance, AI systems influence decisions affecting millions of people. As developers, organizations, and users, we are responsible for ensuring that these technologies are designed, deployed, and evaluated ethically. This blog will delve into the principles of responsible AI, the importance of assessing generative AI applications, and provide a call to action to engage with the Microsoft Learn Module on responsible AI evaluations. What is Responsible AI? Responsible AI encompasses a set of principles and practices aimed at ensuring that AI technologies are developed and used in ways that are ethical, fair, and accountable. Here are the core principles that define responsible AI: Fairness AI systems must be designed to avoid bias and discrimination. This means ensuring that the data used to train these systems is representative and that the algorithms do not favor one group over another. Fairness is crucial in applications like hiring, lending, and law enforcement, where biased AI can lead to significant societal harm. Transparency Transparency involves making AI systems understandable to users and stakeholders. This includes providing clear explanations of how AI models make decisions and what data they use. Transparency builds trust and allows users to challenge or question AI decisions when necessary. Accountability Developers and organizations must be held accountable for the outcomes of their AI systems. This includes establishing clear lines of responsibility for AI decisions and ensuring that there are mechanisms in place to address any negative consequences that arise from AI use. Privacy AI systems often rely on vast amounts of data, raising concerns about user privacy. 
Responsible AI practices involve implementing robust data protection measures, ensuring compliance with regulations like GDPR, and being transparent about how user data is collected, stored, and used. The Importance of Evaluating Generative AI Applications Generative AI, which includes technologies that can create text, images, music, and more, presents unique challenges and opportunities. Evaluating these applications is essential for several reasons: Quality Assessment Evaluating the output quality of generative AI applications is crucial to ensure that they meet user expectations and ethical standards. Poor-quality outputs can lead to misinformation, misrepresentation, and a loss of trust in AI technologies. Custom Evaluators Learning to create and use custom evaluators allows developers to tailor assessments to specific applications and contexts. This flexibility is vital in ensuring that the evaluation process aligns with the intended use of the AI system. Synthetic Datasets Generative AI can be used to create synthetic datasets, which can help in training AI models while addressing privacy concerns and data scarcity. Evaluating these synthetic datasets is essential to ensure they are representative and do not introduce bias. Call to Action: Engage with the Microsoft Learn Module To deepen your understanding of responsible AI and enhance your skills in evaluating generative AI applications, I encourage you to explore the Microsoft Learn Module available at this link. What You Will Learn: Concepts and Methodologies: The module covers essential frameworks for evaluating generative AI, including best practices and methodologies that can be applied across various domains. Hands-On Exercises: Engage in practical, code-first exercises that simulate real-world scenarios. These exercises will help you apply the concepts learned tangibly, reinforcing your understanding. Prerequisites: An Azure subscription (you can create one for free). 
Basic familiarity with Azure and Python programming. Tools like Docker and Visual Studio Code for local development.

Why This Matters
By participating in this module, you are not just enhancing your skills; you are contributing to a broader movement towards responsible AI. As AI technologies continue to evolve, the demand for professionals who understand and prioritize ethical considerations will only grow. Your engagement in this learning journey can help shape the future of AI, ensuring it serves humanity positively and equitably.

Conclusion
As we navigate the complexities of AI technology, we must prioritize responsible AI practices. By engaging with educational resources like the Microsoft Learn Module on responsible AI evaluations, we can equip ourselves with the knowledge and skills necessary to create AI systems that are not only innovative but also ethical and responsible. Join the movement towards responsible AI today! Take the first step by exploring the Microsoft Learn Module and become an advocate for ethical AI practices in your community and beyond. Together, we can ensure that AI serves as a force for good in our society.

References
- Evaluate generative AI applications: https://learn.microsoft.com/en-us/training/paths/evaluate-generative-ai-apps/?wt.mc_id=studentamb_263805
- Azure Subscription for Students: https://azure.microsoft.com/en-us/free/students/?wt.mc_id=studentamb_263805
- Visual Studio Code: https://code.visualstudio.com/?wt.mc_id=studentamb_263805

Student Devs: Build AI Agents, Compete for $55K in Prizes
🎮 AI Skills Fest • June 4–14, 2026 • Free to Enter
$55K Prize Pool • 3 Challenge Tracks • 10 Days of Hacking

Whether you're a first-year CS student or a final-year senior with a portfolio full of projects, Agents League is the best way to gain hands-on experience with agentic AI this summer, and to walk away with real skills employers are hiring for right now.

What You'll Actually Learn
Forget passive tutorials. Agents League is project-based learning at full speed. By the end of the hackathon, you'll have built a working AI agent and gained practical experience with the tools shaping the future of software development.

🤖 AI-Assisted Development: Use GitHub Copilot to accelerate your coding workflow, from scaffolding to debugging, the way professional developers do today.
🧩 Multi-Step Reasoning: Build agents with Microsoft Foundry that can plan, reason, and execute complex tasks, the core of agentic AI.
🏢 Enterprise AI Patterns: Learn to build production-ready agents that integrate with Microsoft 365 and Copilot Studio, skills that translate directly to industry jobs.
🔧 Prompt Engineering: Design effective prompts and orchestration flows that make AI agents reliable and useful in the real world.
📦 GitHub Workflows: Submit your project through GitHub, practising version control, README writing, and open-source collaboration.
🎯 Competitive Problem-Solving: Work under real constraints with deadlines, judging criteria, and peer competition, just like industry hackathons and sprints.

Pick Your Track (or Try All Three)
Agents League has three challenge tracks, each using different Microsoft AI tools. Choose based on your interests or stretch yourself by competing in multiple tracks.

Track 01. Creative Apps
Build an innovative application with AI-assisted development. This track rewards creativity: dream big and let GitHub Copilot help you bring ideas to life faster than ever. Tool: GitHub Copilot

Track 02.
Reasoning Agents
Create intelligent agents that solve complex problems through multi-step reasoning. Think: agents that can research, plan, and act. This is the cutting edge of AI. Tool: Microsoft Foundry

Track 03. Enterprise Agents
Build knowledge agents that integrate with Microsoft 365 Copilot. Learn how businesses are deploying AI today and add enterprise AI to your skillset. Tool: Copilot Studio • M365

Opportunities You Won't Want to Miss
Agents League isn't just a competition; it's a launchpad. Here's what's in it for you beyond the code:

💰 Win from a $55,000 USD Prize Pool: Prizes are awarded across all three tracks; smaller teams and solo hackers have a real shot.
📺 Watch Live Coding Battles at Microsoft Reactor: See industry experts go head-to-head building AI agents live. Learn advanced techniques you can apply immediately to your own project.
🎓 Free Learning Resources on Microsoft Learn: Access curated learning paths and the AI Skills Navigator, structured content designed to get you from zero to submission-ready.
🌍 Join a Global Developer Community: Connect with thousands of developers on the Agents League Discord. Find teammates, ask questions, and build your professional network.
📂 Build Your Portfolio with a Real Project: Every submission lives on GitHub. Walk away with a polished, public project that demonstrates your AI skills to future employers and grad schools.
🏆 Gain Recognition from Microsoft and the Community: Top projects get visibility across the Microsoft developer ecosystem. Stand out from the crowd in internship and job applications.

Key Dates to Remember

| Event | Date |
|---|---|
| Hacking Period Opens | June 4, 2026 |
| Registration Deadline | June 12, 2026, 12:00 PM PT |
| Submission Deadline | June 14, 2026, 11:59 PM PT |

How to Get Started (Right Now)
You don't have to wait until June 4th to start preparing. Here's your pre-hackathon game plan: Register for the hackathon; it's free and open to everyone. Pick a track that matches your interests or curiosity.
Explore the learning resources on Microsoft Learn and the AI Skills Navigator. Join the Discord community to find teammates and get early tips. Watch the Reactor event series for live coding battles and expert walkthroughs. Set up your GitHub repo and start experimenting before the hacking window opens.

Helpful Links
- Register for Agents League: free entry, sign up now
- Microsoft Reactor Events: live coding battles & workshops
- AI Skills Fest: the broader event
- Microsoft Learn: free learning paths

The Arena Awaits 🏆
Ten days. Three tracks. $55K in prizes. Whether you go solo or squad up, this is your chance to build something real with AI and have a blast doing it.

Agents League is part of AI Skills Fest and is open to the public at no cost. Review the Hackathon Rules and Regulations and the Microsoft Event Code of Conduct before participating.

Integrating Microsoft Foundry with OpenClaw: Step by Step Model Configuration
Step 1: Deploying Models on Microsoft Foundry
Let us kick things off in the Azure portal. To get our OpenClaw agent thinking like a genius, we need to deploy our models in Microsoft Foundry. For this guide, we are going to focus on deploying gpt-5.2-codex on Microsoft Foundry for use with OpenClaw. Navigate to your AI Hub, head over to the model catalog, choose the model you wish to use with OpenClaw, and hit deploy. Once your deployment is successful, head to the endpoints section.

Important: Grab your Endpoint URL and your API Keys right now and save them in a secure note. We will need these exact values to connect OpenClaw in a few minutes.

Step 2: Installing and Initializing OpenClaw
Next up, we need to get OpenClaw running on your machine. Open up your terminal and run the official installation script:

    curl -fsSL https://openclaw.ai/install.sh | bash

The wizard will walk you through a few prompts. Here is exactly how to answer them to link up with our Azure setup:

- First Page (Model Selection): Choose "Skip for now".
- Second Page (Provider): Select azure-openai-responses.
- Model Selection: Select gpt-5.2-codex. For now, only the models listed in the picture below (hosted on Microsoft Foundry) are available to be used with OpenClaw.

Follow the rest of the standard prompts to finish the initial setup.

Step 3: Editing the OpenClaw Configuration File
Now for the fun part. We need to manually configure OpenClaw to talk to Microsoft Foundry. Open your configuration file located at ~/.openclaw/openclaw.json in your favorite text editor.
Replace the contents of the models and agents sections with the following code block:

    {
      "models": {
        "providers": {
          "azure-openai-responses": {
            "baseUrl": "https://<YOUR_RESOURCE_NAME>.openai.azure.com/openai/v1",
            "apiKey": "<YOUR_AZURE_OPENAI_API_KEY>",
            "api": "openai-responses",
            "authHeader": false,
            "headers": { "api-key": "<YOUR_AZURE_OPENAI_API_KEY>" },
            "models": [
              {
                "id": "gpt-5.2-codex",
                "name": "GPT-5.2-Codex (Azure)",
                "reasoning": true,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 400000,
                "maxTokens": 16384,
                "compat": { "supportsStore": false }
              },
              {
                "id": "gpt-5.2",
                "name": "GPT-5.2 (Azure)",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 272000,
                "maxTokens": 16384,
                "compat": { "supportsStore": false }
              }
            ]
          }
        }
      },
      "agents": {
        "defaults": {
          "model": { "primary": "azure-openai-responses/gpt-5.2-codex" },
          "models": { "azure-openai-responses/gpt-5.2-codex": {} },
          "workspace": "/home/<USERNAME>/.openclaw/workspace",
          "compaction": { "mode": "safeguard" },
          "maxConcurrent": 4,
          "subagents": { "maxConcurrent": 8 }
        }
      }
    }

You will notice a few placeholders in that JSON. Here is exactly what you need to swap out:

| Placeholder Variable | What It Is | Where to Find It |
|---|---|---|
| <YOUR_RESOURCE_NAME> | The unique name of your Azure OpenAI resource. | Found in your Azure Portal under the Azure OpenAI resource overview. |
| <YOUR_AZURE_OPENAI_API_KEY> | The secret key required to authenticate your requests. | Found in Microsoft Foundry under your project endpoints or Azure Portal keys section. |
| <USERNAME> | Your local computer's user profile name. | Open your terminal and type whoami to find this. |

Step 4: Restart the Gateway
After saving the configuration file, you must restart the OpenClaw gateway for the new Foundry settings to take effect.
Run this simple command: openclaw gateway restart Configuration Notes & Deep Dive If you are curious about why we configured the JSON that way, here is a quick breakdown of the technical details. Authentication Differences Azure OpenAI uses the api-key HTTP header for authentication. This is entirely different from the standard OpenAI Authorization: Bearer header. Our configuration file addresses this in two ways: Setting "authHeader": false completely disables the default Bearer header. Adding "headers": { "api-key": "<key>" } forces OpenClaw to send the API key via Azure's native header format. Important Note: Your API key must appear in both the apiKey field AND the headers.api-key field within the JSON for this to work correctly. The Base URL Azure OpenAI's v1-compatible endpoint follows this specific format: https://<your_resource_name>.openai.azure.com/openai/v1 The beautiful thing about this v1 endpoint is that it is largely compatible with the standard OpenAI API and does not require you to manually pass an api-version query parameter. Model Compatibility Settings "compat": { "supportsStore": false } disables the store parameter since Azure OpenAI does not currently support it. "reasoning": true enables the thinking mode for GPT-5.2-Codex. This supports low, medium, high, and xhigh levels. "reasoning": false is set for GPT-5.2 because it is a standard, non-reasoning model. Model Specifications & Cost Tracking If you want OpenClaw to accurately track your token usage costs, you can update the cost fields from 0 to the current Azure pricing. 
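The authentication and base URL notes above can be checked mechanically before restarting the gateway. This is a minimal sketch, not an OpenClaw feature: `check_provider` is a hypothetical helper that inspects only the fields shown in this post's JSON, and the sample values are dummies.

```python
import json


def check_provider(provider: dict) -> list[str]:
    """Return a list of problems found in one provider entry."""
    problems = []
    api_key = provider.get("apiKey", "")
    header_key = provider.get("headers", {}).get("api-key", "")
    base_url = provider.get("baseUrl", "")

    if "<" in api_key or "<" in base_url:
        problems.append("placeholder values have not been replaced")
    if api_key != header_key:
        problems.append("apiKey and headers.api-key must match")
    if provider.get("authHeader") is not False:
        problems.append("authHeader should be false for Azure's api-key header")
    if not base_url.rstrip("/").endswith("/openai/v1"):
        problems.append("baseUrl should end with /openai/v1")
    return problems


# Dummy provider entry for illustration; not real credentials.
sample = json.loads("""
{
  "baseUrl": "https://example-resource.openai.azure.com/openai/v1",
  "apiKey": "dummy-key",
  "authHeader": false,
  "headers": { "api-key": "dummy-key" }
}
""")
print(check_provider(sample) or "no problems found")
```

To check a real setup, and assuming the file is plain JSON, load ~/.openclaw/openclaw.json and pass the entry at models → providers → azure-openai-responses to the helper.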
Here are the specs and costs for the models we just deployed:

Model Specifications

| Model | Context Window | Max Output Tokens | Image Input | Reasoning |
|---|---|---|---|---|
| gpt-5.2-codex | 400,000 tokens | 16,384 tokens | Yes | Yes |
| gpt-5.2 | 272,000 tokens | 16,384 tokens | Yes | No |

Current Cost (Adjust in JSON)

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cached Input (per 1M tokens) |
|---|---|---|---|
| gpt-5.2-codex | $1.75 | $14.00 | $0.175 |
| gpt-5.2 | $2.00 | $8.00 | $0.50 |

Conclusion
And there you have it! You have successfully bridged the gap between the enterprise-grade infrastructure of Microsoft Foundry and the local autonomy of OpenClaw. By following these steps, you are not just running a chatbot; you are running a sophisticated agent capable of reasoning, coding, and executing tasks with the full power of GPT-5.2-Codex behind it. The combination of Azure's reliability and OpenClaw's flexibility opens up a world of possibilities. Whether you are building an automated DevOps assistant, a research agent, or just exploring the bleeding edge of AI, you now have a robust foundation to build upon. Now it is time to let your agent loose on some real tasks. Go forth, experiment with different system prompts, and see what you can build. If you run into any interesting edge cases or come up with a unique configuration, let me know in the comments below. Happy coding!

From Zero to 16 Games in 2 Hours
From Zero to 16 Games in 2 Hours: Teaching Prompt Engineering to Students with GitHub Copilot CLI Introduction What happens when you give a room full of 14-year-olds access to AI-powered development tools and challenge them to build games? You might expect chaos, confusion, or at best, a few half-working prototypes. Instead, we witnessed something remarkable: 16 fully functional HTML5 games created in under two hours, all from students with varying programming experience. This wasn't magic, it was the power of GitHub Copilot CLI combined with effective prompt engineering. By teaching students to communicate clearly with AI, we transformed a traditional coding workshop into a rapid prototyping session that exceeded everyone's expectations. The secret weapon? A technique called "one-shot prompting" that enables anyone to generate complete, working applications from a single, well-crafted prompt. In this article, we'll explore how we structured this workshop using CopilotCLI-OneShotPromptGameDev, a methodology designed to teach prompt engineering fundamentals while producing tangible, exciting results. Whether you're an educator planning STEM workshops, a developer exploring AI-assisted coding, or simply curious about how young people can leverage AI tools effectively, this guide provides a practical blueprint you can replicate. What is GitHub Copilot CLI? GitHub Copilot CLI extends the familiar Copilot experience beyond your code editor into the command line. While Copilot in VS Code suggests code completions as you type, Copilot CLI allows you to have conversational interactions with AI directly in your terminal. You describe what you want to accomplish in natural language, and the AI responds with shell commands, explanations, or in our case, complete code files. This terminal-based approach offers several advantages for learning and rapid prototyping. Students don't need to configure complex IDE settings or navigate unfamiliar interfaces. 
They simply type their request, review the AI's output, and iterate. The command line provides a transparent view of exactly what's happening, with no hidden abstractions or magical "autocomplete" that obscures the learning process. For our workshop, Copilot CLI served as a bridge between students' creative ideas and working code. They could describe a game concept in plain English, watch the AI generate HTML, CSS, and JavaScript, then immediately test the result in a browser. This rapid feedback loop kept engagement high and made the connection between language and code tangible.

Installing GitHub Copilot CLI
Setting up Copilot CLI requires a few straightforward steps. Before the workshop, we ensured all machines were pre-configured, but students also learned the installation process as part of understanding how developer tools work. First, you'll need Node.js installed on your system. Copilot CLI runs as a Node package, so this is a prerequisite:

    # Check if Node.js is installed
    node --version

    # If not installed, download from https://nodejs.org/
    # Or use a package manager:

    # Windows (winget)
    winget install OpenJS.NodeJS.LTS

    # macOS (Homebrew)
    brew install node

    # Linux (apt)
    sudo apt install nodejs npm

These commands verify your Node.js installation or guide you through installing it using your operating system's preferred package manager. Next, install the GitHub CLI, which provides the foundation for Copilot CLI:

    # Windows
    winget install GitHub.cli

    # macOS
    brew install gh

    # Linux
    sudo apt install gh

This installs the GitHub command-line interface, which handles authentication and provides the framework for Copilot integration. With GitHub CLI installed, authenticate with your GitHub account:

    gh auth login

This command initiates an interactive authentication flow that connects your terminal to your GitHub account, enabling access to Copilot features.
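On a room full of workshop machines, checking these prerequisites by hand gets tedious. Here is a small sketch using Python's standard `shutil.which`; the `find_tools` helper is hypothetical, and it only confirms that the commands are on PATH, not that `gh` is authenticated.

```python
import shutil
from typing import Optional


def find_tools(names: list[str]) -> dict[str, Optional[str]]:
    """Map each command name to its path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(["node", "npm", "gh"]).items():
        print(f"{name:5} {'found at ' + path if path else 'MISSING'}")
```

Any tool reported as MISSING points back to the matching install command above.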
Finally, install the Copilot CLI extension: gh extension install github/gh-copilot This adds Copilot capabilities to your GitHub CLI installation, enabling the conversational AI features we'll use for game development. Verify the installation by running: gh copilot --help If you see the help output with available commands, you're ready to start prompting. The entire setup takes about 5-10 minutes on a fresh machine, making it practical for classroom environments. Understanding One-Shot Prompting Traditional programming education follows an incremental approach: learn syntax, understand concepts, build small programs, gradually tackle larger projects. This method is thorough but slow. One-shot prompting inverts this model—you start with the complete vision and let AI handle the implementation details. A one-shot prompt provides the AI with all the context it needs to generate a complete, working solution in a single response. Instead of iteratively refining code through multiple exchanges, you craft one comprehensive prompt that specifies requirements, constraints, styling preferences, and technical specifications. The AI then produces complete, functional code. This approach teaches a crucial skill: clear communication of technical requirements. Students must think through their entire game concept before typing. What does the game look like? How does the player interact with it? What happens when they win or lose? By forcing this upfront thinking, one-shot prompting develops the same analytical skills that professional developers use when writing specifications or planning architectures. The technique also demonstrates a powerful principle: with sufficient context, AI can handle implementation complexity while humans focus on creativity and design. Students learned they could create sophisticated games without memorizing JavaScript syntax—they just needed to describe their vision clearly enough for the AI to understand. 
Crafting Effective Prompts for Game Development The difference between a vague prompt and an effective one-shot prompt is the difference between frustration and success. We taught students a structured approach to prompt construction that consistently produced working games. Start with the game type and core mechanic. Don't just say "make a game"—specify what kind: Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids. This opening establishes the fundamental gameplay loop: control a spaceship, avoid obstacles. The AI now has a clear mental model to work from. Add visual and interaction details. Games are visual experiences, so specify how things should look and respond: Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids. The spaceship should be a blue triangle at the bottom of the screen, controlled by left and right arrow keys. Asteroids are brown circles that fall from the top at random positions and increasing speeds. These additions provide concrete visual targets and define the input mechanism. The AI can now generate specific CSS colors and event handlers. Define win/lose conditions and scoring: Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids. The spaceship should be a blue triangle at the bottom of the screen, controlled by left and right arrow keys. Asteroids are brown circles that fall from the top at random positions and increasing speeds. Display a score that increases every second the player survives. The game ends when an asteroid hits the spaceship, showing a "Game Over" screen with the final score and a "Play Again" button. This complete prompt now specifies the entire game loop: gameplay, scoring, losing, and restarting. The AI has everything needed to generate a fully playable game. 
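The layered prompt above can also be assembled from its parts, which makes it harder to forget one of them. The following is a minimal sketch, assuming a hypothetical `GamePrompt` container whose field names are chosen here for illustration; it is not taken from the workshop repository, and the wording of the `controls` part is slightly rephrased so each part stands as its own sentence.

```python
from dataclasses import dataclass, fields


@dataclass
class GamePrompt:
    """Hypothetical container for the pieces of a one-shot game prompt."""

    game_type: str
    visuals: str
    controls: str
    rules: str
    score: str
    win_lose: str

    def render(self) -> str:
        """Join all parts into one prompt, refusing to skip any of them."""
        parts = []
        for field in fields(self):
            text = getattr(self, field.name).strip()
            if not text:
                raise ValueError(f"missing prompt part: {field.name}")
            # End each part with a period so the parts read as sentences.
            parts.append(text if text.endswith(".") else text + ".")
        return " ".join(parts)


spaceship = GamePrompt(
    game_type="Create a complete HTML5 game where the player controls a spaceship that must dodge falling asteroids",
    visuals="The spaceship should be a blue triangle at the bottom of the screen",
    controls="It is controlled by the left and right arrow keys",
    rules="Asteroids are brown circles that fall from the top at random positions and increasing speeds",
    score="Display a score that increases every second the player survives",
    win_lose='The game ends when an asteroid hits the spaceship, showing a "Game Over" screen with the final score and a "Play Again" button',
)
print(spaceship.render())
```

Raising an error on an empty part mirrors the teaching point: a prompt that omits the controls or the losing condition is the kind of vague request that tends to produce confusing output.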
The formula students learned: Game Type + Visual Description + Controls + Rules + Win/Lose + Score = Complete Game Prompt.

Running the Workshop: Structure and Approach

Our two-hour workshop followed a carefully designed structure that balanced instruction with hands-on creation. We partnered with University College London and gave students access to GitHub Education, which provides resources specifically designed for classroom settings, including student accounts with Copilot access and tools such as VS Code, Azure for Students, and VS Code for Education for schools.

The first 20 minutes covered fundamentals: what is AI, how does Copilot work, and why does prompt quality matter? We demonstrated this with a live example, showing how "make a game" produces confused output while a detailed prompt generates playable code. This contrast immediately captured students' attention: they could see the direct relationship between their words and the AI's output.

The next 15 minutes focused on the prompt formula. We broke down several example prompts, highlighting each component: game type, visuals, controls, rules, scoring. Students practiced identifying these elements in prompts before writing their own. This analysis phase prepared them to construct effective prompts independently.

The remaining 85 minutes were dedicated to creation. Students worked individually or in pairs, brainstorming game concepts, writing prompts, generating code, testing in browsers, and iterating. Instructors circulated to help debug prompts (not code, an important distinction) and encourage experimentation.

We deliberately avoided teaching JavaScript syntax. When students encountered bugs, we guided them to refine their prompts rather than manually fix code. This maintained focus on the core skill: communicating with AI effectively. Surprisingly, this approach resulted in fewer bugs overall because students learned to be more precise in their initial descriptions.
Student Projects: The Games They Created

The diversity of games produced in 85 minutes of building time amazed everyone present. Students didn't just follow a template; they invented entirely new concepts and successfully communicated them to Copilot CLI.

One student created a "Fruit Ninja" clone where players clicked falling fruit to slice it before it hit the ground. Another built a typing speed game that challenged players to correctly type increasingly difficult words against a countdown timer. A pair of collaborators produced a two-player tank battle where each player controlled their tank with different keyboard keys.

Several students explored educational games: a math challenge where players solve equations to destroy incoming meteors, a geography quiz with animated maps, and a vocabulary builder where correct definitions unlock new levels. These projects demonstrated that one-shot prompting isn't limited to entertainment; students naturally gravitated toward useful applications.

The most complex project was a procedurally generated maze game with fog-of-war mechanics. The student spent extra time on their prompt, specifying exactly how visibility should work around the player character. Their detailed approach paid off with a surprisingly sophisticated result that would typically require hours of manual coding.

By the session's end, we had 16 complete, playable HTML5 games. Every student who participated produced something they could share with friends and family, a tangible achievement that transformed an abstract "coding workshop" into a genuine creative accomplishment.

Key Benefits of Copilot CLI for Rapid Prototyping

Our workshop revealed several advantages that make Copilot CLI particularly valuable for rapid prototyping scenarios, whether in educational settings or professional development.

Speed of iteration fundamentally changes what's possible. Traditional game development requires hours to produce even simple prototypes.
With Copilot CLI, students went from concept to playable game in minutes. This compressed timeline enables experimentation: if your first idea doesn't work, try another. This psychological freedom to fail fast and try again proved more valuable than any technical instruction.

Accessibility removes barriers to entry. Students with no prior coding experience produced results comparable to those who had taken programming classes. The playing field leveled because success depended on creativity and communication rather than memorized syntax. This democratization of development opens doors for students who might otherwise feel excluded from technical fields.

Focus on design over implementation teaches transferable skills. Whether students eventually become programmers, designers, product managers, or pursue entirely different careers, the ability to clearly specify requirements and think through complete systems applies universally. They learned to think like system designers, not just coders.

The feedback loop keeps engagement high. Seeing your words transform into working software within seconds creates an addictive cycle of creation and testing. Students who typically struggle with attention during lectures remained focused throughout the building session. The immediate gratification of seeing their games work motivated continuous refinement.

Debugging through prompts teaches root cause analysis. When games didn't work as expected, students had to analyze what they'd asked for versus what they received. This comparison exercise developed critical thinking about specifications, a skill that serves developers throughout their careers.

Tips for Educators: Running Your Own Workshop

If you're planning to replicate this workshop, several lessons from our experience will help ensure success.

Pre-configure machines whenever possible. While installation is straightforward, classroom time is precious.
Having Copilot CLI ready on all devices lets you dive into content immediately. If pre-configuration isn't possible, allocate the first 15-20 minutes specifically for setup and troubleshoot as a group.

Prepare example prompts across difficulty levels. Some students will grasp one-shot prompting immediately; others will need more scaffolding. Having templates ranging from simple ("Create Pong") to complex (the spaceship example above) lets you meet students where they are.

Emphasize that "prompt debugging" is the goal. When students ask for help fixing broken code, redirect them to examine their prompt. What did they ask for? What did they get? Where's the gap? This redirection reinforces the workshop's core learning objective and builds self-sufficiency.

Celebrate and share widely. Build in time at the end for students to demonstrate their games. This showcase moment validates their work and often inspires classmates to try new approaches in future sessions. Consider creating a shared folder or simple website where all games can be accessed after the workshop.

Access GitHub Education resources at education.github.com before your workshop. The GitHub Education program provides free access to developer tools for students and educators, including Copilot. The resources there include curriculum materials, teaching guides, and community support that can enhance your workshop.

Beyond Games: Where This Leads

The techniques students learned extend far beyond game development. One-shot prompting with Copilot CLI works for any development task: creating web pages, building utilities, generating data processing scripts, or prototyping application interfaces. The fundamental skill, communicating requirements clearly to AI, applies wherever AI-assisted development tools are used.

Several students have continued exploring after the workshop. Some discovered they enjoy the creative aspects of game design and are learning traditional programming to gain more control.
Others found that prompt engineering itself interests them; they're exploring how different phrasings affect AI outputs across various domains.

For professional developers, the workshop's lessons apply directly to working with Copilot, ChatGPT, and other AI coding assistants. The ability to craft precise, complete prompts determines whether these tools save time or create confusion. Investing in prompt engineering skills yields returns across every AI-assisted workflow.

Key Takeaways

Clear prompts produce working code: The one-shot prompting formula (Game Type + Visuals + Controls + Rules + Win/Lose + Score) reliably generates playable games from single prompts
Copilot CLI democratizes development: Students with no coding experience created functional applications by focusing on communication rather than syntax
Rapid iteration enables experimentation: Minutes-per-prototype timelines encourage creative risk-taking and learning from failures
Prompt debugging builds analytical skills: Comparing intended versus actual results teaches specification writing and root cause analysis
Sixteen games in two hours is achievable: With proper structure and preparation, young students can produce impressive results using AI-assisted development

Conclusion and Next Steps

Our workshop demonstrated that AI-assisted development tools like GitHub Copilot CLI aren't just productivity boosters for experienced programmers; they're powerful educational instruments that make software creation accessible to beginners. By focusing on prompt engineering rather than traditional syntax instruction, we enabled 14-year-old students to produce complete, functional games in a fraction of the time traditional methods would require.

The sixteen games created during those two hours represent more than just workshop outputs. They represent a shift in how we might teach technical creativity: start with vision, communicate clearly, iterate quickly.
Whether students pursue programming careers or not, they've gained experience in thinking systematically about requirements and translating ideas into specifications that produce real results.

To explore this approach yourself, visit the CopilotCLI-OneShotPromptGameDev repository for prompt templates, workshop materials, and example games. For educational resources and student access to GitHub tools including Copilot, explore GitHub Education. And most importantly, start experimenting. Write a prompt, generate some code, and see what you can create in the next few minutes.

Resources

CopilotCLI-OneShotPromptGameDev Repository - Workshop materials, prompt templates, and example games
GitHub Education - Free developer tools and resources for students and educators
GitHub Copilot CLI Documentation - Official installation and usage guide
GitHub CLI - Foundation tool required for Copilot CLI
GitHub Copilot - Overview of Copilot features and pricing

Choosing the Right Intelligence Layer for Your Application
Introduction

One of the most common questions developers ask when planning AI-powered applications is: "Should I use the GitHub Copilot SDK or the Microsoft Agent Framework?" It's a natural question: both technologies let you add an intelligence layer to your apps, both come from Microsoft's ecosystem, and both deal with AI agents. But they solve fundamentally different problems, and understanding where each excels will save you weeks of architectural missteps.

The short answer is this: the Copilot SDK puts Copilot inside your app, while the Agent Framework lets you build your app out of agents. They're complementary, not competing. In fact, the most interesting applications use both: the Agent Framework as the system architecture and the Copilot SDK as a powerful execution engine within it.

This article breaks down each technology's purpose, architecture, and ideal use cases. We'll walk through concrete scenarios, examine a real-world project that combines both, and give you a decision framework for your own applications. Whether you're building developer tools, enterprise workflows, or data analysis pipelines, you'll leave with a clear understanding of which tool belongs where in your stack.

The Core Distinction: Embedding Intelligence vs Building With Intelligence

Before comparing features, it helps to understand the fundamental design philosophy behind each technology. They approach the concept of "adding AI to your application" from opposite directions.

The GitHub Copilot SDK exposes the same agentic runtime that powers Copilot CLI as a programmable library. When you use it, you're embedding a production-tested agent, complete with planning, tool invocation, file editing, and command execution, directly into your application. You don't build the orchestration logic yourself. Instead, you delegate tasks to Copilot's agent loop and receive results. Think of it as hiring a highly capable contractor: you describe the job, and the contractor figures out the steps.
The Microsoft Agent Framework is a framework for building, orchestrating, and hosting your own agents. You explicitly model agents, workflows, state, memory, hand-offs, and human-in-the-loop interactions. You control the orchestration, policies, deployment, and observability. Think of it as designing the company that employs those contractors: you define the roles, processes, escalation paths, and quality controls.

This distinction has profound implications for what you build and how you build it.

GitHub Copilot SDK: When Your App Wants Copilot-Style Intelligence

The GitHub Copilot SDK is the right choice when you want to embed agentic behavior into an existing application without building your own planning or orchestration layer. It's optimized for developer workflows and task automation scenarios where you need an AI agent to do things (edit files, run commands, generate code, interact with tools) reliably and quickly.

What You Get Out of the Box

The SDK communicates with the Copilot CLI server via JSON-RPC, managing the CLI process lifecycle automatically. This means your application inherits capabilities that have been battle-tested across millions of Copilot CLI users:

Planning and execution: The agent analyzes tasks, breaks them into steps, and executes them autonomously
Built-in tool support: File system operations, Git operations, web requests, and shell command execution work out of the box
MCP (Model Context Protocol) integration: Connect to any MCP server to extend the agent's capabilities with custom data sources and tools
Multi-language support: Available as SDKs for Python, TypeScript/Node.js, Go, and .NET
Custom tool definitions: Define your own tools and constrain which tools the agent can access
BYOK (Bring Your Own Key): Use your own API keys from OpenAI, Azure AI Foundry, or Anthropic instead of GitHub authentication

Architecture

The SDK's architecture is deliberately simple.
Your application communicates with the Copilot CLI running in server mode:

    Your Application
           ↓
       SDK Client
           ↓ JSON-RPC
    Copilot CLI (server mode)

The SDK manages the CLI process lifecycle automatically. You can also connect to an external CLI server if you need more control over the deployment. This simplicity is intentional: it keeps the integration surface small so you can focus on your application logic rather than agent infrastructure.

Ideal Use Cases for the Copilot SDK

The Copilot SDK shines in scenarios where you need a competent agent to execute tasks on behalf of users. These include:

AI-powered developer tools: IDEs, CLIs, internal developer portals, and code review tools that need to understand, generate, or modify code
"Do the task for me" agents: Applications where users describe what they want (edit these files, run this analysis, generate a pull request) and the agent handles execution
Rapid prototyping with agentic behavior: When you need to ship an intelligent feature quickly without building a custom planning or orchestration system
Internal tools that interact with codebases: Build tools that explore repositories, generate documentation, run migrations, or automate repetitive development tasks

A practical example: imagine building an internal CLI that lets engineers say "set up a new microservice with our standard boilerplate, CI pipeline, and monitoring configuration." The Copilot SDK agent would plan the file creation, scaffold the code, configure the pipeline YAML, and even run initial tests, all without you writing orchestration logic.

Microsoft Agent Framework: When Your App Is the Intelligence System

The Microsoft Agent Framework is the right choice when you need to build a system of agents that collaborate, maintain state, follow business processes, and operate with enterprise-grade governance. It's designed for long-running, multi-agent workflows where you need fine-grained control over every aspect of orchestration.
What You Get Out of the Box

The Agent Framework provides a comprehensive foundation for building sophisticated agent systems in both Python and .NET:

Graph-based workflows: Connect agents and deterministic functions using data flows with streaming, checkpointing, human-in-the-loop, and time-travel capabilities
Multi-agent orchestration: Define how agents collaborate, hand off tasks, escalate decisions, and share state
Durability and checkpoints: Workflows can pause, resume, and recover from failures, essential for business-critical processes
Human-in-the-loop: Built-in support for approval gates, review steps, and human override points
Observability: OpenTelemetry integration for distributed tracing, monitoring, and debugging across agent boundaries
Multiple agent providers: Use Azure OpenAI, OpenAI, and other LLM providers as the intelligence behind your agents
DevUI: An interactive developer UI for testing, debugging, and visualizing workflow execution

Architecture

The Agent Framework gives you explicit control over the agent topology. You define agents, connect them in workflows, and manage the flow of data between them:

    ┌─────────────┐     ┌──────────────┐     ┌──────────────┐
    │   Agent A   │────▶│   Agent B    │────▶│   Agent C    │
    │  (Planner)  │     │  (Executor)  │     │  (Reviewer)  │
    └─────────────┘     └──────────────┘     └──────────────┘
        Define              Execute              Validate
       strategy              tasks                output

Each agent has its own instructions, tools, memory, and state. The framework manages communication between agents, handles failures, and provides visibility into what's happening at every step. This explicitness is what makes it suitable for enterprise applications where auditability and control are non-negotiable.

Ideal Use Cases for the Agent Framework

The Agent Framework excels in scenarios where you need a system of coordinated agents operating under business rules.
These include: Multi-agent business workflows: Customer support pipelines, research workflows, operational processes, and data transformation pipelines where different agents handle different responsibilities Systems requiring durability: Workflows that run for hours or days, need checkpoints, can survive restarts, and maintain state across sessions Governance-heavy applications: Processes requiring approval gates, audit trails, role-based access, and compliance documentation Agent collaboration patterns: Applications where agents need to negotiate, escalate, debate, or refine outputs iteratively before producing a final result Enterprise data pipelines: Complex data processing workflows where AI agents analyze, transform, and validate data through multiple stages A practical example: an enterprise customer support system where a triage agent classifies incoming tickets, a research agent gathers relevant documentation and past solutions, a response agent drafts replies, and a quality agent reviews responses before they reach the customer, with a human escalation path when confidence is low. Side-by-Side Comparison To make the distinction concrete, here's how the two technologies compare across key dimensions that matter when choosing an intelligence layer for your application. 
| Dimension | GitHub Copilot SDK | Microsoft Agent Framework |
| --- | --- | --- |
| Primary purpose | Embed Copilot's agent runtime into your app | Build and orchestrate your own agent systems |
| Orchestration | Handled by Copilot's agent loop; you delegate | You define it explicitly: agents, workflows, state, hand-offs |
| Agent count | Typically single agent per session | Multi-agent systems with agent-to-agent communication |
| State management | Session-scoped, managed by the SDK | Durable state with checkpointing, time-travel, persistence |
| Human-in-the-loop | Basic: user confirms actions | Rich approval gates, review steps, escalation paths |
| Observability | Session logs and tool call traces | Full OpenTelemetry, distributed tracing, DevUI |
| Best for | Developer tools, task automation, code-centric workflows | Enterprise workflows, multi-agent systems, business processes |
| Languages | Python, TypeScript, Go, .NET | Python, .NET |
| Learning curve | Low: install, configure, delegate tasks | Moderate: design agents, workflows, state, and policies |
| Maturity | Technical Preview | Preview with active development, 7k+ stars, 100+ contributors |

Real-World Example: Both Working Together

The most compelling applications don't choose between these technologies; they combine them. A perfect demonstration of this complementary relationship is the Agentic House project by my colleague Anthony Shaw, which uses an Agent Framework workflow to orchestrate three agents, one of which is powered by the GitHub Copilot SDK.

The Problem

Agentic House lets users ask natural language questions about their Home Assistant smart home data. Questions like "what time of day is my phone normally fully charged?" or "is there a correlation between when the back door is open and the temperature in my office?" require exploring available data, writing analysis code, and producing visual results, a multi-step process that no single agent can handle well alone.
The Architecture

The project implements a three-agent pipeline using the Agent Framework for orchestration:

    ┌─────────────┐     ┌──────────────┐     ┌──────────────┐
    │   Planner   │────▶│    Coder     │────▶│   Reviewer   │
    │  (GPT-4.1)  │     │  (Copilot)   │     │  (GPT-4.1)   │
    └─────────────┘     └──────────────┘     └──────────────┘
         Plan              Notebook             Approve/
       analysis           generation             Reject

Planner Agent: Takes a natural language question and creates a structured analysis plan: which Home Assistant entities to query, what visualizations to create, what hypotheses to test. This agent uses GPT-4.1 through Azure AI Foundry or GitHub Models.
Coder Agent: Uses the GitHub Copilot SDK to generate a complete Jupyter notebook that fetches data from the Home Assistant REST API via MCP, performs the analysis, and creates visualizations. The Copilot agent is constrained to only use specific tools, demonstrating how the SDK supports tool restriction.
Reviewer Agent: Acts as a security gatekeeper, reviewing the generated notebook to ensure it only reads and displays data. It rejects notebooks that attempt to modify Home Assistant state, import dangerous modules, make external network requests, or contain obfuscated code.

Why This Architecture Works

This design demonstrates several principles about when to use which technology:

Agent Framework provides the workflow: The sequential pipeline with planning, execution, and review is a classic Agent Framework pattern. Each agent has a clear role, and the framework manages the flow between them.
Copilot SDK provides the coding execution: The Coder agent leverages Copilot's battle-tested ability to generate code, work with files, and use MCP tools. Building a custom code generation agent from scratch would take significantly longer and produce less reliable results.
Tool constraints demonstrate responsible AI: The Copilot SDK agent is constrained to only specific tools, showing how you can embed powerful agentic behavior while maintaining security boundaries.
Standalone agents handle planning and review: The Planner and Reviewer use simpler LLM-based agents; they don't need Copilot's code execution capabilities, just good reasoning.

While the Home Assistant data is a fun demonstration, the pattern is designed for something much more significant: applying AI agents for complex research against private data sources. The same architecture could analyze internal databases, proprietary datasets, or sensitive business metrics.

Decision Framework: Which Should You Use?

When deciding between the Copilot SDK and the Agent Framework, or both, consider these questions about your application.

Start with the Copilot SDK if:

You need a single agent to execute tasks autonomously (code generation, file editing, command execution)
Your application is developer-facing or code-centric
You want to ship agentic features quickly without building orchestration infrastructure
The tasks are session-scoped: they start and complete within a single interaction
You want to leverage Copilot's existing tool ecosystem and MCP integration

Start with the Agent Framework if:

You need multiple agents collaborating with different roles and responsibilities
Your workflows are long-running, require checkpoints, or need to survive restarts
You need human-in-the-loop approvals, escalation paths, or governance controls
Observability and auditability are requirements (regulated industries, enterprise compliance)
You're building a platform where the agents themselves are the product

Use both together if:

You need a multi-agent workflow where at least one agent requires strong code execution capabilities
You want Agent Framework's orchestration with Copilot's battle-tested agent runtime as one of the execution engines
Your system involves planning, coding, and review stages that benefit from different agent architectures
You're building research or analysis tools that combine AI reasoning with code generation

Getting Started

Both technologies are straightforward
to install and start experimenting with. Here's how to get each running in minutes.

GitHub Copilot SDK Quick Start

Install the SDK for your preferred language:

    # Python
    pip install github-copilot-sdk

    # TypeScript / Node.js
    npm install @github/copilot-sdk

    # .NET
    dotnet add package GitHub.Copilot.SDK

    # Go
    go get github.com/github/copilot-sdk/go

The SDK requires the Copilot CLI to be installed and authenticated. Follow the Copilot CLI installation guide to set that up. A GitHub Copilot subscription is required for standard usage, though BYOK mode allows you to use your own API keys without GitHub authentication.

Microsoft Agent Framework Quick Start

Install the framework:

    # Python
    pip install agent-framework --pre

    # .NET
    dotnet add package Microsoft.Agents.AI

The Agent Framework supports multiple LLM providers including Azure OpenAI and OpenAI directly. Check the quick start tutorial for a complete walkthrough of building your first agent.

Try the Combined Approach

To see both technologies working together, clone the Agentic House project:

    git clone https://github.com/tonybaloney/agentic-house.git
    cd agentic-house
    uv sync

You'll need a Home Assistant instance, the Copilot CLI authenticated, and either a GitHub token or Azure AI Foundry endpoint. The project's README walks through the full setup, and the architecture provides an excellent template for building your own multi-agent systems with embedded Copilot capabilities.
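Before cloning the project, it can help to see the planner, coder, and reviewer pattern in isolation. The sketch below is framework-free Python and deliberately does not use the Agent Framework or Copilot SDK APIs: `plan`, `write_notebook`, `review`, `run_pipeline`, and the `FORBIDDEN_SNIPPETS` list are all hypothetical stand-ins invented for illustration, and a real reviewer would need far more than substring matching.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical markers a reviewer might treat as unsafe; illustrative only.
FORBIDDEN_SNIPPETS = ("import subprocess", "os.system", "requests.post")


@dataclass
class Review:
    approved: bool
    reasons: List[str]


def plan(question: str) -> List[str]:
    """Stand-in for the Planner agent: turn a question into analysis steps."""
    return [
        f"Identify entities relevant to: {question}",
        "Fetch read-only history for those entities",
        "Plot the result",
    ]


def write_notebook(steps: List[str]) -> str:
    """Stand-in for the Coder agent: emit notebook source for the plan."""
    return "\n".join(f"# Step {i}: {step}" for i, step in enumerate(steps, start=1))


def review(notebook: str) -> Review:
    """Stand-in for the Reviewer agent: reject source with forbidden snippets."""
    reasons = [snippet for snippet in FORBIDDEN_SNIPPETS if snippet in notebook]
    return Review(approved=not reasons, reasons=reasons)


def run_pipeline(
    question: str,
    coder: Callable[[List[str]], str] = write_notebook,
) -> Review:
    """Run the three stages in sequence: plan, generate, then review."""
    return review(coder(plan(question)))


result = run_pipeline("What time of day is my phone normally fully charged?")
print("approved" if result.approved else f"rejected: {result.reasons}")
```

Passing the coder in as a parameter reflects the point made earlier in the article: the orchestration stays the same while one stage is swapped for a more capable execution engine.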
Key Takeaways

Copilot SDK = "Put Copilot inside my app": Embed a production-tested agentic runtime with planning, tool execution, file edits, and MCP support directly into your application
Agent Framework = "Build my app out of agents": Design, orchestrate, and host multi-agent systems with explicit workflows, durable state, and enterprise governance
They're complementary, not competing: The Copilot SDK can act as a powerful execution engine inside Agent Framework workflows, as demonstrated by the Agentic House project
Choose based on your orchestration needs: If you need one agent executing tasks, start with the Copilot SDK. If you need coordinated agents with business logic, start with the Agent Framework
The real power is in combination: The most sophisticated applications use Agent Framework for workflow orchestration and the Copilot SDK for high-leverage task execution within those workflows

Conclusion and Next Steps

The question isn't really "Copilot SDK or Agent Framework?" It's "where does each fit in my architecture?" Understanding this distinction unlocks a powerful design pattern: use the Agent Framework to model your business processes as agent workflows, and use the Copilot SDK wherever you need a highly capable agent that can plan, code, and execute autonomously.

Start by identifying your application's needs. If you're building a developer tool that needs to understand and modify code, the Copilot SDK gets you there fast. If you're building an enterprise system where multiple AI agents need to collaborate under governance constraints, the Agent Framework provides the architecture. And if you need both, as most ambitious applications do, now you know how they fit together.

The AI development ecosystem is moving rapidly. Both technologies are in active development with growing communities and expanding capabilities.
The architectural patterns you learn today (embedding intelligent agents, orchestrating multi-agent workflows, combining execution engines with orchestration frameworks) will remain valuable regardless of how the specific tools evolve.

Resources

GitHub Copilot SDK Repository – SDKs for Python, TypeScript, Go, and .NET with documentation and examples
Microsoft Agent Framework Repository – Framework source, samples, and workflow examples for Python and .NET
Agentic House – Real-world example combining Agent Framework with Copilot SDK for smart home data analysis
Agent Framework Documentation – Official Microsoft Learn documentation with tutorials and user guides
Copilot CLI Installation Guide – Setup instructions for the CLI that powers the Copilot SDK
Copilot SDK Getting Started Guide – Step-by-step tutorial for SDK integration
Copilot SDK Cookbook – Practical recipes for common tasks across all supported languages