copilot app
1 TopicGitHub Copilot App Canvas Is a Runtime
There is a quiet shift happening in how we build software with AI. We are moving from writing static code to orchestrating living systems where developers and AI agents co-create, observe, and evolve a solution in real time. This post is a working theory of what GitHub Copilot App Canvas is actually for, grounded in a real, runnable demo you can clone today: leestott/agent-runtime-canvas. The Agent Runtime canvas open beside the chat — control bar, activity spotlight, requirement & constraints, and the live agent roster. The headline claim, which the rest of this post defends with code: Traditional UIs are for using software. Canvas is for shaping software while it runs. 1. The misconception worth getting out of the way The first instinct most engineers have when they see Canvas is to build a UI with it a dashboard, a DevOps board, an admin panel. That is the wrong mental model, and it leads to disappointment. A Kanban board rendered in Canvas is just a worse version of a tool that already exists. Canvas is not where your users live. It is where your system becomes visible to you and to the AI while you are still figuring it out. The distinction matters: You don't build Canvas instead of your UI. You use Canvas to figure out, test, and evolve the UI and the system before and during building it. Canvas solves problems your final UI should never try to solve in a visible way agent coordination, intermediate state, test validation, failure propagation. These are observability concerns, not end-user features. Canvas is intended for test validation and the implementation of agent-driven solutions not for shipping a production control panel. A useful analogy: Figma is Human-to-Human one person designs a static artifact for another person to read. Canvas is Human-to-AI-to-System a shared surface where a human, an AI agent, and a running system all act on the same live model. Figma shows you a picture of the software. Canvas is a runtime where things actually execute. 2. The positioning, stated plainly Here is the thesis the demo is built to prove: Canvas redefines software development by shifting from writing static code to orchestrating living systems, where developers and AI co-create, observe, and evolve solutions in real time. Instead of building UIs for users, we build interactive environments for agents — turning debugging, testing, and execution into a continuous, visual feedback loop that accelerates innovation and brings ideas to production faster than ever. Read that again with the demo in mind, because the demo is not a slide, it is a working Copilot CLI extension that renders exactly this loop. 3. What we built: the Agent Runtime canvas agent-runtime-canvas is a GitHub Copilot CLI canvas extension called Agent Runtime. It turns Canvas into a runtime observability and control plane for a multi-agent software system that is being designed, tested, and evolved in real time. The canvas renders a single living SystemModel that both humans and the AI agent edit at the same time. The agent drives it through five canvas actions; the human drives it through panel controls. Every change streams to the iframe over Server-Sent Events (SSE), so the system visibly evolves through interaction. The seven panels: a system you can watch think Panel What it makes observable Requirement & constraints The feature under design plus editable policies and constraints Agents Active agents, their responsibilities, and live state (idle / working / done / error / blocked) Task Flow The dependency graph of tasks across agents, with live status Artifacts The intermediate outputs each task emits Validation Test cases, pass/fail, expected vs. actual, and the reasoning behind each verdict Live State The shared memory objects the agents read and write — directly human-editable Timeline A change-over-time log, including before→after state diffs None of these are things you would put in front of an end user. All of them are things you desperately want to see while you and an AI are co-designing an agentic system. The five agent actions The AI co-creates and evolves the system by calling five actions, declared in the canvas extension: Action Effect decompose_system Break a requirement into collaborating agents + a task-flow graph execute_workflow Coordinate agents to advance tasks ( step / run / pause / resume / reset ) validate_output Run evaluation tests, return structured pass/fail + reasoning update_system_design Modify architecture/logic: requirement, constraints, agents, tasks track_state Persist/update a shared state object, recording the diff on the timeline The critical detail is that human controls and agent actions funnel through the exact same store. There is no separate "AI view" and "human view" — one model, two kinds of participant. 4. How it actually works (the parts that matter) The extension is deliberately small and dependency-free. It uses only Node's built-in modules plus github/copilot-sdk , which the CLI auto-resolves. Three files do the work: .github/extensions/agent-runtime/ extension.mjs # wiring: loopback HTTP server, SSE, /control, 5 canvas actions store.mjs # durable SystemModel + execution engine + validation ui.mjs # iframe renderer (system view, validation, state, timeline) One shared model, broadcast on every mutation The heart of the demo is the SystemStore . It is an EventEmitter : every mutation bumps a version, appends a timeline entry, persists to disk, and broadcasts a fresh snapshot to all connected panels. This is the single line that makes "humans and AI edit the same live system" true rather than aspirational: // store.mjs — every change is versioned, logged, persisted, and broadcast. _commit(eventType, summary, detail) { this.model.version += 1; this.model.updatedAt = now(); if (eventType) { this.model.timeline.unshift({ id: uid("ev"), ts: now(), type: eventType, summary, detail: detail || null, }); this.model.timeline = this.model.timeline.slice(0, 200); } this._queueSave(); // best-effort JSON persistence under ~/.copilot this.emit("change", this.model); // fan out to every SSE client return this.model; } The agent action and the human button hit the same method In extension.mjs , the canvas action handler and the iframe's /control POST both call store.execute(...) . That symmetry is the whole point — neither the human nor the AI is privileged: // extension.mjs — a human control POST maps onto the same store method // the AI agent calls through the execute_workflow canvas action. function applyControl(store, body) { switch (body.action) { case "execute": return store.execute(body.mode || "step", body); case "validate": return store.validate(body.tests); case "decompose":return store.decompose(body.requirement, body); case "inject_failure": return store.injectFailure(body.taskKey); case "edit_state": return store.editState(body.key, body.value); // ...requirement, constraints, clear_failures, update_design } } Execution you can watch one task at a time The engine advances the task graph through a visible begin→dwell→finish lifecycle so the active agent is always observable. A ready task is one whose dependencies are all done : // store.mjs — the scheduler only starts a task when its deps are satisfied. _readyTask() { return this.model.tasks.find( (t) => t.status === "pending" && t.deps.every((d) => { const dep = this.model.tasks.find((x) => x.id === d); return dep && dep.status === "done"; }), ); } When a task finishes, its agent emits an artifact and writes to shared state; when a dependency fails, the engine walks the graph to a fixpoint and marks every downstream task blocked . That is failure propagation you can see — exactly the kind of thing a production UI would (correctly) hide, and exactly the kind of thing you need exposed while designing the system. Validation as a first-class, re-runnable citizen The default evaluation suite asserts properties of the running system, not of static code — every test returns an expected value, an actual value, and a human -readable reason: // store.mjs — tests assert properties of the live system model. _defaultTests() { const t = (name, target, assertion) => ({ id: uid("test"), name, target, assertion }); return [ t("All tasks reach a terminal state", "tasks", "no_pending"), t("No tasks failed", "tasks", "none_failed"), t("Every completed task emitted an artifact", "artifacts", "artifact_per_done"), t("Design state populated before build", "state", "design_before_build"), t("Decision recorded by Reviewer", "state", "has_decision"), ]; } This is the "continuous, visual feedback loop" from the thesis, made concrete: decompose → execute → validate → redesign → re-validate, with the Timeline recording every before→after transition. 5. Run it yourself You need a GitHub Copilot CLI / app with canvas support (the canvas-renderer capability) and this repo opened as your workspace. There is no npm install the SDK is auto-resolved and the extension uses only built-in Node modules. Clone and open the workspace. git clone https://github.com/leestott/agent-runtime-canvas.git cd agent-runtime-canvas The extension auto-discovers from .github/extensions/agent-runtime/ . Open the canvas with a requirement. Ask Copilot: Open the Agent Runtime canvas with the requirement "Add CSV export to the reports page". Walk the loop. Decompose into five agents and a six-task graph, press Run ▶, watch the spotlight track the active agent, press Run tests ✓ for 5/5 green, then Inject failure ⚡ to watch downstream tasks go blocked and validation drop to 4/5 — and recover. State persists per documentId under ~/.copilot/extensions/agent-runtime/artifacts/ , so a reload resumes exactly where you left off. The companion demoscript.md in the repo gives you a tight, timed walkthrough. 6. Why this is an observability story Once you accept that Canvas is a runtime rather than a UI, the most compelling use case becomes observability of agentic systems. Agentic software is notoriously hard to debug: the interesting behavior lives in intermediate state, coordination order, and the moments where one agent's failure cascades into another's. A production UI is designed to hide all of that. A Canvas is designed to surface it, temporarily, while you are shaping the system — and then get out of the way. This reframes Canvas alongside the broader Microsoft and GitHub agent tooling story. As teams adopt the GitHub Copilot SDK and patterns like the open Model Context Protocol to wire agents into real systems, the gap is rarely "can the agent act?" it is "can a human see what the agent did, judge it, and steer it?" Canvas is a candidate answer to that second question. When you take agents toward production on Azure with services like Microsoft Foundry, the same instinct applies: build the evaluation and observability loop first, and let it shape the system before you commit a single end-user pixel. 7. The open question: why can't Canvas be multi-user? There is an obvious next frontier, and it is worth stating as an honest open question rather than a finished feature. Everything that makes Canvas valuable also makes it a natural collaborative surface: It is a shared space. It is visual. It is collaborative. Multiple participants — human and AI — interact with the same surface. If Figma earned its place by making Human-to-Human design multiplayer, the provocative question is whether a project- or repo-scoped Canvas can make Human-to-AI-to-System development multiplayer too: several engineers and several agents shaping one running system on one surface. The demo here is single-user by design, but its architecture — one shared store, versioned, broadcast to every subscriber — is already the shape you would need. That is a genuine research direction, and worth experimenting with as licensing and access broaden. 8. Honest limitations In the spirit of building credibility rather than hype: This is a demonstration. The decomposition, artifacts, and state are synthesized to make the runtime loop legible — it models an agentic system rather than running arbitrary production agents. It is single-user and single-machine. The loopback HTTP server and per-document store are local by design; multi-user is an aspiration, not a shipped capability. Access is gated. Canvas support requires a Copilot CLI/app build with the canvas-renderer capability. Licensing and preview access are the biggest practical blockers to wider experimentation today. Persistence is best-effort. State is written to a local JSON artifact; treat it as demo durability, not a database. Key takeaways Don't build a UI in Canvas. Use Canvas to shape, test, and evolve a system — and the UI — while it runs. Traditional UIs are for using software; Canvas is for shaping software while it runs. Canvas is Human-to-AI-to-System, a runtime where things execute — not a static design surface. Its strongest use case is observability and validation of agentic systems: surface the intermediate state your production UI should hide. The shared-model architecture — one versioned store broadcast to every participant — is what makes human + AI co-editing real, and what hints at a multi-user future. Next steps Clone and run the demo: github.com/leestott/agent-runtime-canvas. Read the extension source under .github/extensions/agent-runtime/ — start with store.mjs . Explore the building blocks: the GitHub Copilot SDK, the Model Context Protocol, and Microsoft Foundry for taking agentic systems toward production. Try the multi-user thought experiment: fork the store, add a second subscriber, and ask what changes when two humans and two agents share one surface.138Views0likes0Comments