How to build long-running MCP tools on Azure Functions
Recently, a customer building servers with the Azure Functions MCP extension reached out and asked: How do I handle tools that take longer than the client is willing to wait? This becomes especially relevant when tool calls move beyond simple request/response into multi-step workflows and long-running operations. At the same time, MCP is evolving to address exactly this: the Tasks extension, introduced in the 2026-07-28 release candidate, defines a standard way to model long-running work.

In this post, we'll walk through how to build long-running MCP tools on Azure Functions using Durable Functions, a framework for authoring stateful, long-running workflows as ordinary code, with checkpointing, scaling, and recovery handled automatically.

MCP tools today

Today, MCP tools are fundamentally request/response:
- the client issues a tools/call
- the server returns a result

This works well for fast operations, but breaks down when:
- workflows take minutes
- execution depends on multiple steps
- latency is unpredictable

In practice, clients enforce their own tool-call timeouts. These aren't standardized by the MCP spec and vary per client, but they're often in the ~30–60 second range. If a tool exceeds that window:
- the client times out
- the agent observes a failed call
- the underlying work may still be running

So the core issue is that synchronous tool calls don't naturally model long-running work.

The MCP Tasks extension

The Tasks extension is designed to address this.
With the extension, a server can respond to a tools/call with an asynchronous task handle instead of a final result, and the client drives the lifecycle from there:
- tasks/get: poll the task's status
- tasks/update: submit input back to the server if the task reaches input_required
- tasks/cancel: cancel an in-flight task

A task carries a status ("working", "input_required", "completed", "failed", or "cancelled") and, on completion, the final result. Task creation is server-directed: the client advertises support by including the extension in its per-request capabilities, and the server decides per request whether to return a task. A server won't return a task to a client that hasn't advertised support.

It's important to note that Tasks relies on ecosystem support. Clients must advertise the extension, and MCP SDKs must implement the task lifecycle, before servers can use it. So while Tasks is now a defined extension, broad client and SDK support is still in progress.

Implement long-running tasks with Durable Functions today

Until the Tasks extension is broadly supported across clients, we need a pattern that works with existing request/response clients and supports long-running execution. The following samples show how, using Durable Functions:
- Python
- .NET

The long-running work in this sample mines a short chain of blocks. Each block requires solving a computational puzzle where the system keeps trying different inputs until it finds one that produces a result matching a specific pattern (for example, starting with a certain number of zeros). Because this involves lots of trial and error, it naturally takes time, making it a good example of a long-running workflow.
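The puzzle described above is a proof-of-work search. As a rough illustration only, here is a minimal sketch that assumes SHA-256 and a "leading zeros in the hex digest" target. The sample's actual hash function, block fields, and difficulty are not stated in this post, so `mine_block`, `mine_chain`, and the field names below are hypothetical.

```python
import hashlib


def mine_block(index: int, prev_hash: str, data: str, difficulty: int = 3) -> dict:
    """Try successive nonces until the SHA-256 hex digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        payload = f"{index}|{prev_hash}|{data}|{nonce}".encode()
        digest = hashlib.sha256(payload).hexdigest()
        if digest.startswith(prefix):
            return {
                "index": index,
                "prev_hash": prev_hash,
                "data": data,
                "nonce": nonce,
                "hash": digest,
            }
        nonce += 1  # trial and error: the only way forward is the next guess


def mine_chain(length: int, difficulty: int = 3) -> list[dict]:
    """Mine `length` blocks, each one linked to the hash of the block before it."""
    chain = []
    prev_hash = "0" * 64  # placeholder hash for the first block (an assumption)
    for i in range(length):
        block = mine_block(i, prev_hash, f"block-{i}", difficulty)
        chain.append(block)
        prev_hash = block["hash"]
    return chain


if __name__ == "__main__":
    for block in mine_chain(3, difficulty=3):
        print(block["index"], block["nonce"], block["hash"][:16])
```

Each extra leading zero multiplies the expected number of attempts by 16, which is why run time is hard to predict and can easily exceed a client's tool-call timeout.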
The server in the sample exposes two tools:

start_mining
- Starts a Durable Functions orchestration to mine the blocks
- Waits briefly (within a configurable budget)
- Returns the result inline if the work completed within the budget, or returns a workflow_id if it is still running

get_mining_result
- Takes the workflow_id
- Returns the current state, e.g. "completed", "running", "failed", or "not_found"

To ensure that the agent calls the tools in the right order, workflow_id is a required parameter of get_mining_result, so the agent can't poll without starting a mining run first. Also, the "running" response carries a poll_after_seconds and a next instruction, prompting the agent to poll again if the work is not done rather than give up or assume completion. Even so, the poll path still relies on the agent correctly remembering, and not hallucinating, the workflow_id it was handed. If it garbles or invents an id, the poll lands on the wrong instance or none at all (which is why get_mining_result returns "not_found" rather than guessing).

What changes with the Tasks extension

Once the Tasks extension is fully implemented across clients and SDKs, the model becomes simpler and more reliable: the server returns a Task handle, the client manages the polling and lifecycle calls, and the SDK tracks execution state. This removes a key limitation of today's solution, which requires the agent to remember and correctly pass identifiers like workflow_id.

Call to action

Try out the sample and let us know whether it addresses your MCP needs around long-running or workflow-type tools!

GitHub Copilot App - Canvas Is Not a UI Builder
What if your development environment didn't just help you write code, but helped you observe, steer, and evolve a living system while it runs? That's the shift GitHub Copilot App Canvas represents. Canvas redefines how developers interact with agent-driven software: not by building traditional user interfaces, but by creating interactive environments where humans and AI co-create, test, and iterate in real time. This post walks through a real Canvas extension we built, a Multi-Agent Dev Canvas that demonstrates how Canvas becomes a runtime observability and control plane for an agent-driven system. We'll cover why Canvas exists, how it differs from traditional UI development, and how you can use it to accelerate the design-test-evolve loop for any multi-agent application. The Misconception: "Canvas Is for Building UIs" The first instinct many developers have when they see Canvas is to treat it like a UI framework, a place to build dashboards, boards, or user-facing applications. That's not what Canvas is for. Here's the distinction that matters: Traditional UIs are for using software. They serve end-users who interact with a finished product. Canvas is for shaping software while it runs. It serves developers and AI agents who are actively building, testing, and evolving a system. Canvas solves problems your final UI should never try to solve in a visible way. It's the observability layer, the control plane, the validation surface — all the things you need during development that disappear before production. Think of it this way: you wouldn't ship your debugger to users, but you absolutely need it while building. What We Built: A Multi-Agent Dev Canvas To demonstrate Canvas as a development runtime, we built a Multi-Agent Dev Canvas, a standalone GitHub Copilot Canvas extension (this repo, copilot-canvas-runtime) that treats an entire multi-agent system as a living, observable environment. 
The same pattern applies to any agent-driven system built on services such as Microsoft Foundry. The Multi-Agent Dev Canvas: a runtime observability and control plane where developers and AI agents collaborate to design, test, and evolve an agent-driven system in real time. The canvas provides four integrated panels: System View: See Your Agents Working Five specialised agents are displayed as live cards with real-time status indicators. Each card shows the agent's name, responsibility, current status (idle, running, done, or error), task count, and last action taken. When an agent is active, its card pulses blue. When it fails, it glows red. You see the system breathe. decompose_system — Breaks requirements into agent tasks execute_workflow — Coordinates agents to perform tasks validate_output — Runs evaluation tests and returns structured results update_system_design — Modifies architecture based on feedback track_state — Persists and updates system state over time Task Flows: Watch Work Move Through the Pipeline Below the agents, a flow graph visualises how tasks route between agents. When you decompose a system requirement like "Build an AI-powered code review agent," the canvas shows five components (pr-ingestion, code-analysis, feedback-generator, learning-loop, notification-service) flowing from the decomposer to the executor and designer agents. Each flow carries a status badge, pending, pass, or fail. Validation Panel: Continuous Testing, Not Afterthought Testing The validation panel displays structured test results with pass/fail badges and reasoning. 
When you run validation, each test case evaluates against specific criteria:
- ✅ "PR ingestion handles large diffs" - Meets criteria: process diffs over 5,000 lines without timeout
- ❌ "Feedback is actionable" - Failed: does not satisfy criteria that each suggestion includes a code fix
- ✅ "Learning loop converges" - Meets criteria: accept rate improves over 10 iterations
- ✅ "Notifications are non-blocking" - Meets criteria: delivery latency under 500ms

This isn't a test runner you invoke separately; it's a validation surface embedded in the development loop. You see failures the moment they happen, in context, alongside the agents and flows that produced them.

Live State Timeline: Every Mutation, Visible

The right panel tracks every state change with timestamps. Decomposition events, workflow executions, validation runs, and failure injections all appear chronologically. This is the system's memory, visible to both the human developer and the AI agents working alongside them.

Canvas as a Runtime: The Key Capabilities

What makes Canvas a runtime rather than a display layer is that the agent can act through it. The canvas exposes seven agent-callable actions:

Action | What It Does
decompose_system | Accept requirements and components, generate task flows, update the system design
execute_workflow | Run pending tasks through the agent pipeline, produce artifacts
validate_output | Evaluate test cases against criteria, return structured pass/fail with reasoning
update_system_design | Modify the architecture description, constraints, or component list live
track_state | Read the full system state: agents, flows, validations, history, artifacts
inject_failure | Force an agent into an error state to test system adaptation
pause_resume | Toggle execution on and off

The human developer can click Decompose, Execute, or Validate directly in the canvas. The AI agent can invoke the same actions programmatically.
Both parties operate on the same surface, the same state, the same system; that's what makes Canvas collaborative in a way traditional tooling is not.

Why This Matters: Canvas vs. Figma vs. Traditional UIs

It helps to position Canvas against tools developers already know:
- Figma is Human-to-Human collaboration on design. Multiple people interact with the same visual surface, but nothing executes. It's a design tool.
- Traditional UIs are Human-to-System. Users interact with finished software through a polished interface.
- Canvas is Human-to-AI-to-System. It's a shared space where things actually execute. The developer steers, the AI acts, and the system evolves, all visible, all in real time.

Canvas is collaborative in the Figma sense: it's a shared space, it's visual, and multiple participants interact with the same surface. But unlike Figma, the participants include AI agents, and the surface isn't a mockup; it's a live system.

How the Extension Works: Under the Hood

A Canvas extension is a standard GitHub Copilot CLI extension, a single extension.mjs file that speaks JSON-RPC over stdio. The key components:

1. State Management

Each canvas instance maintains its own system state: agents, task flows, validations, a state history timeline, artifacts, and the current system design. State is held in-memory per instance and pushed to the iframe via Server-Sent Events whenever it changes.

    function createInitialState() {
      return {
        agents: [
          { id: "decomposer", name: "decompose_system", status: "idle", responsibility: "Break requirements into agent tasks" },
          { id: "executor", name: "execute_workflow", status: "idle", responsibility: "Coordinate agents to perform tasks" },
          // ... three more agents
        ],
        taskFlows: [],
        validations: [],
        stateHistory: [],
        artifacts: [],
        systemDesign: { description: "", constraints: [], components: [] },
        execution: { paused: false, stepCount: 0 },
      };
    }

2. Real-Time Updates via Server-Sent Events

The canvas runs a loopback HTTP server per instance.
The iframe connects to an /events endpoint and receives state updates as they happen, with no polling and no websocket complexity.

    if (req.url === "/events") {
      res.writeHead(200, {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
      });
      clients.add(res);
      // Push current state immediately on connect
      res.write(`data: ${JSON.stringify(getState(instanceId))}\n\n`);
    }

3. Dual Interaction Model

Every action is available through two paths. The human clicks a button in the iframe, which POSTs to the local server. The AI agent calls invoke_canvas_action through the SDK. Both paths mutate the same state and trigger the same SSE broadcast. Neither is privileged over the other.

4. Canvas Declaration

The canvas registers with the Copilot SDK using createCanvas, declaring its identity, description, and all agent-callable actions with JSON Schema validation on inputs:

    createCanvas({
      id: "multi-agent-dev",
      displayName: "Multi-Agent Dev Canvas",
      description: "Runtime observability and control plane for multi-agent development",
      actions: [
        {
          name: "decompose_system",
          description: "Break requirements into agent tasks",
          inputSchema: {
            type: "object",
            properties: {
              requirements: { type: "string" },
              components: { type: "array", items: { type: "string" } },
            },
            required: ["requirements"],
          },
          handler: async (ctx) => { /* ... */ },
        },
        // ... six more actions
      ],
      open: async (ctx) => { /* start server, return URL */ },
      onClose: async (ctx) => { /* clean up */ },
    });

Scenarios This Enables

The Multi-Agent Dev Canvas supports four development scenarios that would be impossible with traditional tooling:

1. End-to-End Feature Design

Tell the agent "Build an AI-powered code review system." Watch it decompose the requirement into five components, route tasks to specialist agents, execute the workflow, and validate the outputs, all visible in real time. Iterate by modifying constraints or components and re-running.

2.
Live Agent Collaboration Observation See how agents hand off work to each other. The flow graph shows which agent produced what, which tasks are pending, and where bottlenecks form. This is the kind of observability you need when debugging multi-agent orchestration but would never expose in a production UI. 3. Fault Injection and Adaptation Testing Use inject_failure to force an agent into an error state. Watch how the system responds. Does the orchestrator recover? Do downstream tasks fail gracefully? This chaos-engineering approach, applied during development, visible in real time, catches integration failures before they reach production. 4. Validation-Driven Iteration Define test criteria, run validation, see which tests fail, update the system design, re-run. The validation panel isn't a separate CI pipeline, it's embedded in the development surface, creating a continuous feedback loop between design decisions and their measurable outcomes. Getting Started: Build Your Own Canvas Extension To create a Canvas extension in your own project: Read the SDK docs — Run extensions_manage({ operation: "guide" }) in GitHub Copilot CLI to get the canonical documentation paths. Scaffold — Run extensions_manage({ operation: "scaffold", kind: "canvas", name: "my-canvas", location: "project" }) to generate the boilerplate. Implement — Edit extension.mjs with your canvas logic: state model, actions, renderer HTML, and SSE updates. Reload — Run extensions_reload to activate your changes. Drive — Open with open_canvas , invoke actions with invoke_canvas_action , and iterate. The canvas extension lives in .github/extensions/your-canvas/extension.mjs for project-scoped extensions, or in your user extensions directory for personal use. No package.json needed, the github/copilot-sdk import is auto-resolved. Key Takeaways Canvas is a development runtime, not a UI framework. 
You don't build Canvas instead of your UI, you use Canvas to figure out, test, and evolve the UI and system before and during building it. Canvas solves problems your final UI should never expose. Agent observability, fault injection, live state mutation, validation feedback loops, these are development concerns, not user concerns. Canvas is Human-to-AI-to-System collaboration. Both the developer and the AI agent operate on the same surface, the same state, the same running system. It's Figma-like collaboration, but with AI agents, and things actually execute. Canvas turns debugging, testing, and execution into a continuous visual feedback loop. Instead of switching between an editor, a terminal, a test runner, and a monitoring dashboard, you have one surface where the system lives and evolves. Canvas extensions are lightweight. A single extension.mjs file, no dependencies, loopback HTTP server with SSE, the infrastructure gets out of the way so you can focus on the system you're building. The Bigger Picture Canvas redefines software development by shifting from writing static code to orchestrating living systems. Developers and AI co-create, observe, and evolve solutions in real time. Instead of building UIs for users, we build interactive environments for agents, turning debugging, testing, and execution into a continuous, visual feedback loop that accelerates innovation and brings ideas to production faster than ever. The Multi-Agent Dev Canvas we built here is one example. The pattern applies anywhere you're building agent-driven systems: AI orchestration, workflow automation, data pipelines, autonomous services. Anywhere you need to see, steer, and validate a complex system as it runs, that's where Canvas belongs. 
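The dual interaction model described earlier (a human click and an agent call mutating one shared state, with each change pushed out as a Server-Sent Events frame) can be sketched in a language-neutral way. The Python below is an illustration of the pattern, not the extension's code: `CanvasState`, `apply`, and `sse_frame` are hypothetical names, and only the `data: <json>` frame followed by a blank line mirrors the snippet shown in the post.

```python
import json
from typing import Callable


def sse_frame(payload: dict) -> str:
    """Serialize a payload as one Server-Sent Events message."""
    return f"data: {json.dumps(payload)}\n\n"


class CanvasState:
    """In-memory state for one canvas instance, with connected subscribers."""

    def __init__(self) -> None:
        self.state = {"paused": False, "history": []}
        self.subscribers: list[Callable[[str], None]] = []

    def subscribe(self, send: Callable[[str], None]) -> None:
        self.subscribers.append(send)
        send(sse_frame(self.state))  # push current state immediately on connect

    def apply(self, action: str, source: str) -> dict:
        """Single entry point used by both the human path and the agent path."""
        if action == "pause_resume":
            self.state["paused"] = not self.state["paused"]
        self.state["history"].append({"action": action, "source": source})
        frame = sse_frame(self.state)
        for send in self.subscribers:
            send(frame)  # same broadcast regardless of who triggered it
        return self.state


if __name__ == "__main__":
    canvas = CanvasState()
    canvas.subscribe(lambda frame: print(frame, end=""))
    canvas.apply("pause_resume", source="human")  # e.g. a button click
    canvas.apply("pause_resume", source="agent")  # e.g. an agent-invoked action
```

Routing both callers through one `apply` function is what keeps neither path privileged: there is no second code path whose state could drift from what the iframe shows.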
Resources
- copilot-canvas-runtime: this repository, containing the Multi-Agent Dev Canvas extension, scenario, and demo prompt
- GitHub Copilot Documentation: official documentation for GitHub Copilot features
- Microsoft Foundry Documentation: build and deploy AI agents with Microsoft Foundry

MCP for Beginners: Why Every AI Engineer and Developer Should Learn the Model Context Protocol
If you have spent any time building with large language models in the last year, you have hit the same wall everyone hits: your model is brilliant at reasoning but blind to the real world. It cannot read your database, call your internal API, search your documents, or trigger a deployment unless you hand-write glue code for every single integration. The Model Context Protocol (MCP) exists to tear that wall down, and Microsoft's open-source MCP for Beginners curriculum (reachable via the short link https://aka.ms/mcp-for-beginners) is the most complete, hands-on way to learn it.

This post explains what MCP is, walks through the latest updates to the course, shows real code, and makes the case for why MCP belongs on your learning roadmap right now, whether you are an AI engineer shipping agents to production, a developer wiring tools into Copilot, or a student trying to build a standout portfolio project.

What is MCP, and why does it matter?

Think of MCP as a universal translator for AI applications. Just as a USB-C port lets you connect any peripheral to any laptop without a custom cable per device, MCP lets an AI model connect to any tool or data source through one standardized protocol. The course uses exactly this analogy, and it holds up well.

Before MCP, integrations were an M × N problem: every one of your M AI applications needed bespoke code to talk to each of your N tools. MCP turns that into an M + N problem. Build a tool once as an MCP server, and any MCP-compatible client (Claude Desktop, VS Code, Cursor, GitHub Copilot, and many others) can use it immediately.

The protocol is built on a clean client–server model with a small set of primitives:
- Tools: functions the model can call (query a database, send an email, run code).
- Resources: data the server exposes for context (files, records, documents).
- Prompts: reusable, parameterized prompt templates.
- Sampling: a server asking the client's LLM to generate a completion, enabling collaborative workflows.
- Elicitation: a server requesting structured input from the user mid-task.
- Roots: boundaries that tell a server which directories or resources it is allowed to operate on.

Communication runs over JSON-RPC, with transports for local processes (stdio) and remote servers (streamable HTTP). That standardization is the whole point: write to the spec, and you interoperate with the entire ecosystem.

What's new: the latest updates to the course

The MCP for Beginners curriculum is actively maintained, and the public changelog reads like a release log for a living product. Here are the most important recent changes, drawn directly from that changelog.

1. Aligned to MCP Specification

The biggest update: the entire curriculum has been validated against the current MCP Specification 2025-11-25 and the latest official SDKs. Stale references to older spec revisions (2025-03-26 and 2025-06-18) were corrected across the security, transport, real-time search, sampling, and stdio-server modules, with links repointed to the canonical modelcontextprotocol.io spec paths.

A gap analysis confirmed the course already covers every primitive introduced or expanded in the latest spec:
- Sampling: covered in lesson 3.14 and Advanced Topics.
- Elicitation (including URL mode): in Core Concepts and Protocol Features.
- Roots: in the Introduction, Core Concepts, and Root Contexts.
- Tasks (experimental, long-running operations): in Core Concepts and Protocol Features.
- Tool Annotations (readOnlyHint / destructiveHint): in Core Concepts and Protocol Features.

2. Samples validated against current SDKs

Code that does not run is worse than no code at all, so the maintainers re-validated the core samples:
- TypeScript: @modelcontextprotocol/sdk resolved to 1.29.0; a tsc --noEmit type-check passed with no errors, so the McpServer and StdioServerTransport APIs remain valid.
- Python: validated in an isolated virtual environment with mcp[cli] (1.27.2); FastMCP.list_tools() correctly returned the sample add and subtract tools.

SDK version pins across labs were bumped (for example mcp>=1.26.0) and lockfiles regenerated so every sample tracks the current release.

3. A serious security pass

Security is treated as a first-class concern, not an afterthought. A full audit across every dependency manifest and the sample source code was run, and npm audit now reports 0 vulnerabilities in every audited directory. Highlights:
- Transitive npm advisories (in the MCP Inspector dev tool, the OpenAI client, and the SDK) were remediated by bumping @modelcontextprotocol/inspector to 0.22.0 and pinning a patched shell-quote.
- A real code-level command-injection fix (OWASP A03): an open_in_vscode tool that used subprocess.run(..., shell=True) was rewritten to launch the resolved executable directly with no shell, closing a metacharacter-injection vector.
- Python dependencies were audited with pip-audit, and a vulnerable transitive werkzeug was pinned to a patched >=3.1.6.

For anyone learning to ship agents, this is gold: the course demonstrates the whole secure-development loop, not just the happy path.

4. New lessons and a growing curriculum

The curriculum keeps expanding with practical, modern lessons:
- 5.17 Adversarial Multi-Agent Reasoning: two agents argue opposite sides of a question using shared MCP tools (web_search + run_python), judged by a third agent. Includes a Mermaid architecture diagram, orchestrators in Python, TypeScript, and C#, and use cases like hallucination detection, threat modeling, and API design review.
- 3.12 MCP Hosts: configuration for Claude Desktop, VS Code, Cursor, Cline, and Windsurf, with JSON templates and a transport comparison table.
- 3.13 MCP Inspector: a debugging guide for testing tools, resources, and prompts.
- 4.1 Pagination: cursor-based pagination patterns in Python, TypeScript, and Java.
- 5.16 Protocol Features: progress notifications, request cancellation, resource templates, and lifecycle management.

5. Microsoft product rebranding

Content was updated to reflect Microsoft's rebranding: Azure AI Foundry → Microsoft Foundry, and the AI Toolkit (AITK) → Microsoft Foundry Toolkit Extension for VS Code. If you have seen older tutorials referencing the previous names, the curriculum is now current.

Your first MCP server: see how little code it takes

The course's "first server" lesson builds a simple calculator. Here is the shape of a minimal MCP server in Python using FastMCP, which mirrors the validated sample in the repo. Notice how the protocol plumbing disappears: you just decorate functions.

    # server.py: a minimal MCP server with two tools
    from mcp.server.fastmcp import FastMCP

    # Name your server; this identifies it to MCP clients
    mcp = FastMCP("Calculator")

    @mcp.tool()
    def add(a: int, b: int) -> int:
        """Add two numbers and return the result."""
        return a + b

    @mcp.tool()
    def subtract(a: int, b: int) -> int:
        """Subtract b from a and return the result."""
        return a - b

    if __name__ == "__main__":
        # Run over stdio so local hosts (VS Code, Claude Desktop) can connect
        mcp.run()

The same idea in TypeScript, using the official SDK validated at version 1.29.0:

    // server.ts: minimal MCP server in TypeScript
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";

    const server = new McpServer({ name: "Calculator", version: "1.0.0" });

    // Register a tool with a typed input schema
    server.tool(
      "add",
      { a: z.number(), b: z.number() },
      async ({ a, b }) => ({
        content: [{ type: "text", text: String(a + b) }],
      })
    );

    // Connect over stdio and start listening
    const transport = new StdioServerTransport();
    await server.connect(transport);

That is a complete, runnable server.
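To make the wire format behind those servers more concrete, the exchange for a single call to the add tool can be sketched with the standard library alone. This is an illustration under assumptions: the `tools/call` method name and the `content` result shape come from the text and the TypeScript sample above, while `make_tool_call` and `handle_tool_call` are hypothetical helpers and the toy dispatcher is not how the SDKs are implemented.

```python
import json


def make_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 request a client would send to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def handle_tool_call(raw: str) -> str:
    """Toy dispatcher standing in for a server (hypothetical, for illustration)."""
    request = json.loads(raw)
    tools = {
        "add": lambda a, b: a + b,
        "subtract": lambda a, b: a - b,
    }
    params = request["params"]
    value = tools[params["name"]](**params["arguments"])
    # The result echoes the request id and wraps the value as text content
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": str(value)}]},
    })


if __name__ == "__main__":
    request = make_tool_call(1, "add", {"a": 2, "b": 3})
    print(request)
    print(handle_tool_call(request))
```

In a real server the SDK handles this framing, which is why the samples above only contain decorated functions and a transport.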
The docstrings and schemas matter: MCP exposes them to the model so it knows when and how to call each tool. Clear descriptions are effectively prompt engineering for your tools. A common pitfall is leaving them vague, which leads to the model misusing or ignoring the tool.

Connecting it in VS Code

Once your server runs, an MCP host connects to it. A typical VS Code / host configuration looks like this:

    {
      "servers": {
        "calculator": {
          "command": "python",
          "args": ["server.py"]
        }
      }
    }

Lesson 3.12 (MCP Hosts) covers the equivalent JSON for Claude Desktop, Cursor, Cline, and Windsurf, and lesson 3.13 shows how to use the MCP Inspector to test your tools before wiring them into a host, which is the single best debugging habit you can build early.

How the course is structured

The curriculum is organized as a progressive journey with hands-on code in C#, Java, JavaScript, Python, Rust, and TypeScript. It is grouped into phases:
- Foundations (Modules 0–2): Introduction, Core Concepts, and Security.
- Building (Module 3): Getting Started, with 15 lessons covering your first server and client, LLM clients, VS Code integration, stdio and HTTP streaming, testing, deployment, auth, hosts, the Inspector, sampling, and MCP Apps.
- Growing (Modules 4–5): Practical Implementation and Advanced Topics, with 17 advanced lessons including Azure integration, OAuth2, Entra ID auth, scaling, multi-modality, context engineering, custom transports, and adversarial multi-agent reasoning.
- Mastery (Modules 6–11): Community Contributions, Lessons from Early Adoption, Best Practices, Case Studies, a Microsoft Foundry Toolkit workshop, and an end-to-end 13-lab PostgreSQL capstone.

That final module is the standout for portfolio building: a complete, production-flavored path that takes you from architecture and row-level security through database design, a FastMCP server, semantic search with pgvector and Azure OpenAI, testing, Docker deployment to Azure Container Apps, and monitoring with Application Insights.
Why developers should learn MCP now

For AI engineers

MCP is becoming the default integration layer for agents. Instead of re-implementing tool calling for every framework, you write to one open protocol and your tools work everywhere. The advanced modules (sampling, roots, elicitation, scaling, routing, and adversarial multi-agent patterns) are exactly the techniques you need to move agents from demo to production.

For developers

MCP is already wired into tools you use daily: VS Code, GitHub Copilot, Claude Desktop, Cursor, and more. Learning to build an MCP server means you can expose your systems (internal APIs, databases, CI/CD) to AI assistants safely. The security-first approach in the course (OAuth2, Entra ID, RBAC, dependency auditing) teaches you to do this the right way from day one.

For students

MCP is a rare opportunity to learn a technology while it is still early, with a free, beginner-friendly, Microsoft-maintained curriculum and code in six languages. The 13-lab capstone alone is a genuine portfolio project. And with content translated into 50+ languages, the barrier to entry is low no matter where you are.

Responsible and secure by design

A recurring theme worth calling out: the course does not treat security and governance as optional extras. It models real practices you should carry into your own work:
- Least privilege via roots: constrain what a server can touch.
- Tool annotations: mark tools readOnlyHint or destructiveHint so clients can warn users before destructive actions.
- No shells for user input: the command-injection fix is a textbook example of why you never pass untrusted input through a shell.
- Dependency hygiene: audit with npm audit and pip-audit, and pin patched releases.
- Proper auth: dedicated lessons on OAuth2 and Microsoft Entra ID.

Key takeaways
- MCP standardizes how AI connects to tools and data, turning a combinatorial integration problem into a simple, reusable one.
- The course is current, validated against MCP Specification 2025-11-25 with SDKs at TypeScript 1.29.0 and Python mcp 1.27.2.
- Samples actually run, and the repo demonstrates a full secure-development loop with 0 reported vulnerabilities after auditing.
- It is broad and deep: from a 10-line calculator server to a 13-lab production capstone, in six languages.
- It is the fastest credible path to MCP fluency for AI engineers, developers, and students alike.

Get started today
- Open the course: https://aka.ms/mcp-for-beginners (redirects to the GitHub repository).
- Fork and clone it. Use a sparse checkout to skip translations for a faster download:

    git clone --filter=blob:none --sparse https://github.com/microsoft/mcp-for-beginners.git
    cd mcp-for-beginners
    git sparse-checkout set --no-cone "/*" "!translations" "!translated_images"

- Build your first server with lesson 3.1 in your language of choice.
- Debug it with the MCP Inspector, then connect it in VS Code.
- Go deep with the 13-lab database capstone, and read the official spec at modelcontextprotocol.io.
- Track what's new in the changelog and join the community discussions.

MCP is quietly becoming the connective tissue of the AI ecosystem. The earlier you learn it, the more leverage you will have, and Microsoft's MCP for Beginners is the clearest on-ramp available. Star the repo, build a server this week, and start connecting your AI to the world.

How are you handling performance review data across Teams, Outlook, and SharePoint?
We just wrapped up our mid-year review cycle and it was a mess. Managers had to go through old Teams chat messages to find feedback they gave, or go through our SharePoint documents to see where everyone's goals were. Do you think there is a way I can use Copilot inside Microsoft Teams to somehow parse through all of this before reviews? I saw this app performance 365 or something like that in the app store, but I don't think it's an official Microsoft app. Any ideas?

How Many Copies of Each Layer Does Your Container Registry Actually Need?
Authors: Payal Mahesh and Vicky Lin Azure Container Registry team: Jeanine Burke and Johnson Shi Introduction It's Monday morning. You spin up a fresh 1,000-node AKS cluster for a big training run or a fleet-wide rollout. Every node reaches for the same large container image at the same instant. What actually happens in the next ten minutes - and whether your pods reach Ready in 9 minutes or 14 - turns out to depend on a single number you've probably never thought about: how many copies of each image layer exist behind your registry. At the surface, you see a single capacity number for your registry size - but behind that abstraction, Azure Container Registry maintains copies of your layer data to optimize pull performance. That number of copies directly determines the read throughput available per layer. Each copy can serve requests independently, so distributing the layer across storage allows it to be read in parallel. More copies mean more independent readers - and higher aggregate throughput when thousands of nodes pull at once. The intuitive answer is that more is better: add copies, get faster pulls. When we actually tested it at 1,000-node scale, the truth turned out to be more interesting: A few extra copies helped a little. A moderate number helped a lot, and eliminated storage throttling entirely. A large number helped no more than the moderate one. A huge number actually made pulls slower again. Think of it like opening checkout lanes at a grocery store. Opening a few more lanes when the store is slammed cuts the line dramatically. Past a certain point, though, extra lanes barely help, because by then it's the customers, not the cashiers, who are the bottleneck. And open too many? Now the staff is spread thin and tripping over each other, and the line moves worse than it did at the sweet spot. 
This post walks through what we measured, why the curve bends where it does, and what we're building next so finding that sweet spot isn't something anyone has to do by hand. Key Takeaways There's a sweet spot, not a slope. Adding copies per layer cut pod-startup P99 by 27% and raised P50 per-node egress throughput by 244%, but only up to a point. Past that, the returns vanish, and far past it, latency actually regresses. Storage throttling is the real enemy. The win comes from spreading load across enough storage backends that no single backend gets pinned at its egress ceiling. Once throttling is gone, more copies stop helping. Storage scale alone has a ceiling. Even at the sweet spot, the per-backend egress limit caps total throughput. The next jump in performance has to come from somewhere else, which is exactly what we're building (see What's Next). This isn't something customers should need to manage. We're building a proactive, on-demand storage scaling capability that automatically grows the footprint before throttling happens and shrinks it back when the burst is over. A quick bit of background Within a region, the layer data behind your container images is backed by Azure storage. The number of copies ACR maintains per layer determines how many independent storage backends a concurrent-pull workload can spread its reads across. That's what matters, because each backend has a finite egress ceiling. Once concurrent reads against one backend get close to that ceiling, requests start getting throttled, and your pulls slow down in proportion. The principle is simple: more copies per layer means more backends serving the same data, which means more total egress headroom and fewer throttled requests. What we wanted data on was how many, and where it stops helping. 
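The background above can be sketched as a small capacity model. This is a minimal illustration, not how ACR sizes storage: the demand, ceiling, and skew figures are hypothetical, and the `skew` parameter (how much more than an even share the busiest backend receives) is an assumption introduced here to show why the hottest backend matters more than the average.

```python
from math import ceil


def hottest_backend_load(total_demand: float, copies: int, skew: float = 1.0) -> float:
    """Egress load on the busiest backend.

    skew is the ratio of the hottest backend's share to a perfectly even
    share (1.0 means reads are spread evenly across all copies).
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return total_demand * skew / copies


def min_copies_without_throttling(total_demand: float, ceiling: float, skew: float = 1.0) -> int:
    """Smallest copy count that keeps the hottest backend at or below its ceiling."""
    return ceil(total_demand * skew / ceiling)


# Hypothetical numbers, chosen only for illustration (not measured values).
TOTAL_DEMAND_GBPS = 400.0  # aggregate pull demand from the whole fleet
CEILING_GBPS = 50.0        # assumed per-backend egress ceiling
SKEW = 1.5                 # hottest backend takes 1.5x an even share

for copies in (4, 8, 12, 16):
    load = hottest_backend_load(TOTAL_DEMAND_GBPS, copies, SKEW)
    status = "throttled" if load > CEILING_GBPS else "ok"
    print(f"{copies:>2} copies: hottest backend at {load:.1f} Gbps ({status})")

print("copies needed:", min_copies_without_throttling(TOTAL_DEMAND_GBPS, CEILING_GBPS, SKEW))
```

With these assumed inputs, 8 copies would be enough under a perfectly even spread, but the skewed hottest backend still exceeds its ceiling until 12 copies. The model only covers the throttling side of the curve; it says nothing about the cost of fanning reads across very many backends.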
How we tested We ran a controlled series of large-scale pull tests against ACR Premium on a roughly 1,000-node cluster, with every node pulling the same large image cold at the same time (no local cache on any node). The only thing we changed between runs was the number of per-layer copies behind a single registry endpoint. Everything else, including rate limits, the image, node count, and concurrency, stayed constant. For each run we measured pod-startup latency (P50/P90/P99), end-to-end storage read latency, egress throughput distributions (P50-P99.9), and storage throttling events. Pod-startup latency is our headline metric, because it's the one number that reflects the actual customer experience no matter where the bottleneck happens to be. Per-node egress throughput matters too, though. It tells you directly how much pull bandwidth ACR delivers to your fleet, and it's usually what customers have in mind when they ask how much faster extra copies will make their pulls. We report egress as a distribution rather than a single average, since per-request and per-time-window views can tell very different stories about the same set of pulls. These are observations from a single controlled environment, not a service guarantee. Absolute numbers will move with image size, node count, layer composition, network topology, and concurrency. What we found We tested five configurations, sweeping from a low baseline number of per-layer copies up to a very high one. We name them by relative copy count rather than exact instance counts: Baseline: the lowest level, our reference point. Low: a modest step up from Baseline. Mid: a meaningful step up from Low. Higher: a further step up from Mid. Very high: the largest configuration we tested, well above Higher. Here are the numbers. All percent changes are relative to Baseline. 
| Configuration | Pod startup P50 | Pod startup P90 | Pod startup P99 | Storage throttling events | Peak per-backend egress |
| --- | --- | --- | --- | --- | --- |
| Baseline (fewest copies) | 9m 36s | 11m 0s | 14m 16s | Many; all top backends above the egress ceiling | Highest |
| Low | 9m 27s (−2%) | 10m 14s (−7%) | 12m 59s (−9%) | Some; one backend still above the ceiling | High |
| Mid | 9m 25s (−2%) | 9m 45s (−11%) | 10m 22s (−27%) | Zero | Below the ceiling |
| Higher | 9m 20s (−3%) | 9m 37s (−13%) | 10m 22s (−27%) | Zero | Well below the ceiling |
| Very high | 9m 28s (−1%) | 10m 31s (−4%) | 13m 48s (−3%) | Zero | Lowest |

Look at the P99 pod-startup column from top to bottom: 14m 16s, 12m 59s, 10m 22s, 10m 22s, 13m 48s. It improves, flattens out, then climbs back up. Three things explain that shape:

1. The win: Throttling falls off a cliff at the Mid configuration

As we added copies per layer, per-backend egress fell and storage-side throttling decreased. At the Mid configuration, throttling errors hit zero, and they stayed at zero for every configuration above it. The upside isn't just that the errors went away, though. It's raw pull bandwidth. At the Mid sweet spot, the typical node saw its P50 egress throughput jump 244% over Baseline. With load spread across enough copies, each node pulled its layers off storage much faster, not just without stalling.

For a workload owner, that's the difference between watching pods come up in a steady stream and watching them stall for tens of seconds at a time while throttling clears. Same image, same node count, same registry, very different experience. To put it in concrete terms: if your team runs a daily AI training kickoff that needs all 1,000 nodes pulling before the job can start, this is the difference between starting on time and starting four minutes late every day. Over a quarter of training runs, that adds up.

2. The surprise: more copies made pulls slower

This is the finding that genuinely surprised us.
Going from Higher to Very high, the largest configuration we tested, cost us 3 minutes and 26 seconds at P99: 10m 22s climbing back up to 13m 48s. That gave back almost the entire benefit we'd built up over the previous four configurations. Tail storage-read latency at Very high actually came out worse than Baseline. The Very high run is where the wheels came off, and the reason is the trade-off underneath. Once storage throttling is gone, more copies stop buying you anything, and the cost of fanning reads across that many backends starts to take over. The throughput distribution shows it clearly. P50 and P75 throughput had been climbing steadily and getting smoother through Mid and Higher, then dropped sharply at Very high while the peak P99/P99.9 spikes came back. Spread the same load across too many backends and it fragments into smaller, less consistent bursts. The takeaway is that "more is better" stops being true past the sweet spot, and the failure mode is quiet. You won't see throttling errors. You'll just see your pulls get slower. 3. What we didn't expect: at few copies, the hottest backend is what hurts you At the lowest copy counts, pull traffic wasn't spread evenly across the underlying storage footprint. Some backends absorbed far more traffic than others. As we added copies, that distribution evened out and the hottest backends cooled down. The implication is sharp. You can saturate the busiest backend, and trigger throttling, even when the total headroom across all your backends is large in aggregate. What matters is the load on the hottest backend, not the average. That's exactly the failure mode that demand-driven, proactive scaling (described below) is meant to head off before it happens. So how should you think about this? You don't size copies yourself; ACR manages the storage footprint behind your registry. Still, it helps to understand what moves the sweet spot, because the shape of your own workload is what decides where it lands. 
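As a cross-check on the P99 figures quoted above, the percent changes and the Higher to Very high regression can be re-derived from the reported durations. This is a minimal sketch: the helper names `to_seconds` and `percent_change` are introduced here for illustration, and the only data used is the P99 column from the results table.

```python
def to_seconds(text: str) -> int:
    """Parse a duration written like '14m 16s' into whole seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def percent_change(value: int, baseline: int) -> int:
    """Change relative to baseline, rounded to a whole percent."""
    return round((value - baseline) / baseline * 100)


# P99 pod-startup latency per configuration, copied from the results table.
P99 = {
    "Baseline": "14m 16s",
    "Low": "12m 59s",
    "Mid": "10m 22s",
    "Higher": "10m 22s",
    "Very high": "13m 48s",
}

baseline = to_seconds(P99["Baseline"])
for name, duration in P99.items():
    print(f"{name:<10} {duration:>8}  {percent_change(to_seconds(duration), baseline):+d}%")

# Regression when going from Higher to Very high, in minutes and seconds.
regression = to_seconds(P99["Very high"]) - to_seconds(P99["Higher"])
print(f"Higher -> Very high: +{regression // 60}m {regression % 60}s")
```

The recomputed values line up with the table (−9%, −27%, −27%, −3%) and with the 3 minutes 26 seconds given back at the largest configuration.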
The bigger your worst-case concurrent burst (more nodes, larger images, higher concurrency), the more copies per layer it takes to keep pulls off the throttling ceiling, and the further out the sweet spot sits. Smaller workloads may already be sitting on the flat part of the curve. One thing is worth saying plainly. The storage footprint underneath is managed by ACR and shared across many registries, so there's no fixed, private storage budget that maps one-to-one to your workload. The sweet spot isn't a number you compute and provision; it's a behavior the platform has to land on for you, which is exactly why we're moving toward demand-driven scaling that handles it automatically. That's what brings us to what we're building next. What's next: proactive, on-demand storage scaling and a caching layer The fixed-copy tests above answer the question "how many should the ACR system provision?" but they assume a single, static answer. Real workloads aren't static. A 1,000-node burst happens at deploy time, not at 3 a.m. on a Tuesday. And no matter how many copies are provisioned, the per-backend storage ceiling still bounds peak deliverable throughput. So we're investing along two complementary directions. 1. Proactive, demand-driven storage scaling We're building a capability that adjusts the number of per-layer copies automatically based on real-time pull demand: Proactive, not reactive. The system scales the storage footprint before concurrent pull pressure pushes any single backend near the throttling threshold, so throttling is prevented before it forms rather than cleaned up after the fact. On-demand scale-out. The footprint expands automatically as sustained pull demand grows. Scale-in when demand subsides. The footprint contracts so you're not paying for steady-state capacity you only needed during a burst. Tiering for cold content. 
Long-tail, rarely-pulled content can sit on colder storage, so the redundant footprint of frequently-pulled content doesn't pay full hot-storage cost everywhere.

The benefit to customers is straightforward: smoother pulls under burst, higher delivered throughput on average, no permanent over-provisioning, and no manual re-tuning as workloads grow.

2. A caching layer to absorb burst beyond the storage ceiling

Even a perfectly scaled storage footprint runs into the per-backend egress ceiling at extreme scale. To push past it, we're investing in a caching layer in the registry service that absorbs burst traffic before it ever reaches storage. A pull surge that hits the same set of layers, which is the common case for fleet-wide deployments, can be served largely from cache. That takes a lot of load off any single storage backend and complements the storage scaling above.

We'll share results from this work in follow-up posts. If you have questions about scaling ACR for your workload, or about how we measure storage performance, reach out on the Azure Container Registry GitHub repository.

Note: All results in this post are based on controlled internal testing configurations and are intended to illustrate general scaling behavior rather than prescribe exact configurations.

Microsoft Leads a New Era of Software Supply Chain Transparency
Microsoft announces the general availability of Microsoft’s Signing Transparency (MST), a first-of-its-kind capability that brings unprecedented visibility and trust to our software supply chain. With this release, Microsoft is leading the industry by recording the builds of critical cloud services in a publicly readable, verifiable ledger that complies with the SCITT (Supply Chain Integrity, Transparency, and Trust) standard. This means every production software build for in-scope services, such as Azure Attestation, Azure Managed HSM (Hardware Security Module), Azure confidential ledger, and Microsoft Signing Transparency itself (with others to follow over time), is now logged in an immutable, tamper-evident record. Only builds that are in the MST ledger are deployed to production; this gives customers confidence that the supply chain for these critical services can be audited at any time.

Notably, the MST ledger is fully open source and built to align with the emerging IETF SCITT standard. By embracing SCITT’s principles and open protocols, Microsoft ensures that MST not only secures our own ecosystem but also contributes to a broader industry movement toward standardized supply chain transparency. The open-source MST ledger serves as a verifiable trust anchor that any organization or researcher can inspect, audit, or even integrate with their own tooling. MST itself meets the highest levels of transparency: it is backed by a tamper-proof confidential ledger, is open source, and is independently verified. Specifically, we are making the foundation of our trust model transparent and accessible to everyone, reinforcing that trust must be earned through proof, not just promises.

This launch marks a major milestone in our commitment to Zero Trust principles, extending “never trust, always verify” all the way into the build itself. Building on a public preview introduced late last year, MST’s general availability delivers verifiable transparency at the software level.
It transforms traditional code signing with an additive trust layer that is accessible via an open verification model. Every new software update is accompanied by a publicly auditable proof of integrity, enabling security teams to proactively confirm that each update is authentic and unaltered. To help organizations get the most out of this capability, we are also introducing a free tool to explore the contents – Ledger Explorer – an offline tool that allows security teams to examine MST ledger entries, verify cryptographic proofs, and even validate the ledger’s integrity independently. This tool, combined with MST’s open design, ensures that every Microsoft customer – and the broader community – can hold us accountable in real time for the software we run on their behalf. Key Benefits of Microsoft’s Signing Transparency (MST) Verified Code Integrity – Every software release is cryptographically logged in MST’s ledgers. This makes each build tamper-evident and traceable. If an attacker attempts to inject malicious code or sign an unauthorized update, it will be evident through the well-defined validation step built into the SCITT standard. Organizations gain the assurance that code integrity can be independently confirmed at any time. Independent Verification & Zero Trust – MST enables customers and auditors to verify software authenticity on their own, without having to solely rely on vendor attestations. For each update, Microsoft provides a transparency “receipt” (proof of logging) that you can use to prove the update was officially published and unaltered. This fosters a “don’t just trust, verify” approach, empowering security teams to double-check everything running in their environment aligns with what Microsoft intended. Audit-Trail & Compliance – The transparency ledger creates a permanent, auditable timeline of code deployments. Every entry is a record of what was released and when, backed by cryptographic proofs. 
This simplifies compliance reporting and accelerates forensic analysis. In the event of an incident, you can quickly audit the ledger to see if any unexpected code was introduced. For highly regulated industries, MST offers concrete evidence of software integrity and policy compliance over time.

Leadership & Open Standards – We are delivering real transparency now, encouraging a future where all critical software is released with verifiable integrity. MST’s open source implementation and SCITT-compliant design exemplify our commitment to openness and collaboration. We believe widespread adoption of these standards will strengthen supply chain security for everyone, making trust verification a universal practice.

Next Steps

Microsoft’s Signing Transparency is more than a new security feature; it is a step forward in trust technology. As threats grow more sophisticated, we must evolve the way we assure our customers about the software they depend on. With MST now generally available, we are leading by example: proving that it is possible to open up the traditionally opaque process of software deployment and turn it into a source of strength and trust, empowering each person with verifiable transparency. We invite the industry to join us on this journey and get started by reading the documentation and exploring Ledger Explorer today! Together, by embracing transparency and open standards, we can turn “trust but verify” from a slogan into an everyday reality for digital infrastructure.

Windows 365 and developer environments: how do you balance security and productivity?
Hi everyone, I’d like to raise a topic that we are currently struggling with, and I suspect many other organizations are facing the same challenge. We are in the process of establishing a Windows 365–based development environment, where developers work in Cloud PCs. This is largely driven by: a BYOD strategy security requirements (no sensitive code on unmanaged devices) the need for standardization However, this quickly becomes complex in practice. The core challenge We are trying to balance three competing priorities: 1. Security requirements No sensitive code on local devices Minimal attack surface Zero Trust principles and Conditional Access Full traceability of identity and actions 2. Developer needs Local admin rights to be able to do their work Freedom to install tools, SDKs, and runtimes Flexibility without constant blocking Fast iteration cycles The reality is that if it takes too long to get access or permissions, it breaks the developer workflow. 3. IT and governance Standardization of environments Manageability and patching License and cost control Compliance and auditability The practical dilemma Developers want to be local admins on their machines Security teams prefer: Just-In-Time access (PIM), or No admin privileges at all In practice: PIM tends not to work well for developers It introduces too much friction It disrupts flow and often leads to workarounds What we are currently exploring We are testing a model where: Developers work in Windows 365 Cloud PCs They use their regular corporate identity (Entra ID) Isolation is achieved through the environment, not separate accounts Developers have local admin rights within the Cloud PC However, this raises a new question: How do we secure an environment where the user is an admin? Questions to the community I would really appreciate insights from others who have been through similar scenarios: 1. Identity vs privilege Do you use the same identity for everything, or separate user/admin accounts? 
How far do you take identity separation?

2. Local admin rights
- Do you allow developers to have local admin rights?
- Is it permanent or Just-In-Time?
- If JIT, how do you make it work without impacting productivity?

3. Cloud-based development environments
If you are using Windows 365, Dev Box, or AVD:
- Has this made it easier to relax restrictions?
- Or are you facing the same challenges, just in the cloud?

4. Guardrails instead of restrictions
Instead of trying to prevent everything:
- EDR / endpoint protection
- Conditional Access
- Network isolation
- Monitoring and detection
Has anyone successfully shifted from strict control to strong guardrails and detection?

Current reflection

I am starting to think that focusing on secure, isolated environments for development may be more effective than trying to tightly control every individual action. In other words:
- secure the platform
- not every single user behavior
But this is far from straightforward.

Purpose of this discussion

The goal is to find a realistic blueprint that:
- maintains high developer productivity
- meets security requirements
- minimizes friction in day-to-day work
Not something theoretically perfect, but something that actually works.

If you have experience in this area, I would really value your input:
- what has worked well
- what has not worked
- key design decisions you would recommend

Thanks in advance.

Azure Function App — Queue-Based Architecture for Long-Running Sync Jobs
The Problem: HTTP Triggers and Long-Running Jobs Don't Mix

Here's a situation you've probably run into: you have a job that needs to loop over dozens of Azure resources, call APIs, and do real work. You wrap it in an HTTP-triggered Azure Function so it can be called on demand. It works great until, after a few minutes, the caller gets a 504 Gateway Timeout.

The 230-second limit is enforced by the platform load balancer (it is the default idle timeout of Azure Load Balancer). It cannot be overridden by app settings or host configuration. Any HTTP trigger that runs longer than about 3 minutes 50 seconds will time out for the caller.

In our case, the job iterates over 30+ Azure subscriptions. For each one it switches context, lists resources, and triggers image imports. Total runtime: anywhere from 2 to 10 minutes depending on how many ACRs need updating. That is often well over the limit.

The Solution: Decouple Request from Execution via a Queue

The fix is clean once you see it: the HTTP trigger shouldn't do the work. It should just accept the work and hand it off. That's what a queue is for. The flow splits into two independent phases:

Request phase: The HTTP trigger validates the caller (JWT + app role check), packages the job parameters into a queue message, and returns 202 Accepted. This takes under 3 seconds.

Execution phase: A Queue Trigger picks up the message and runs the actual sync. No HTTP connection is involved, so there's no HTTP timeout. On a Dedicated (P-series) plan, the function timeout defaults to 30 minutes and can be raised to unlimited through functionTimeout in host.json.

| Approach | What the caller gets | Result |
| --- | --- | --- |
| HTTP trigger → run sync inline | Waits for the full job to complete | 504 TIMEOUT after 230 seconds |
| HTTP trigger → Queue → Queue Trigger | 202 Accepted immediately | NO TIMEOUT: job runs as long as needed |

There's an added bonus, reliability in Azure Queue Storage: Azure Storage Queues give you automatic retry out of the box.
If the job crashes halfway through, the message becomes visible again after a visibility timeout and the Queue Trigger picks it up for a retry, up to 5 attempts before the message is moved to the poison queue. No retry logic to write.

Locking Down the Endpoint

Since the HTTP trigger is the public entry point, it needs solid auth. We layer two things:

⭐ Use EasyAuth for the "is this a real Entra ID token?" check, and a custom App Role for the "is this person allowed to trigger syncs?" check. These are independent concerns and should stay that way.

| Layer | What it does | How |
| --- | --- | --- |
| EasyAuth (Entra ID) | Rejects requests without a valid Entra ID Bearer token, before your code even runs | Configured at the Function App level via the Authentication blade |
| App Role check | Validates that the token contains the SyncJob.Execute role; only assigned users/SPs can trigger the job | Decoded in the function code from the JWT roles claim |
| Managed Identity | Authenticates the Function App to Azure APIs (no credentials in code) | Connect-AzAccount -Identity, with the identity assigned via RBAC |

One gotcha worth knowing: when using v2 tokens (which is the default with modern App Registrations), the aud claim in the token is the raw App ID GUID, not the api:// prefixed URI. You need to explicitly add both forms to your allowedAudiences in EasyAuth, otherwise valid tokens get rejected.
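The App Role check and the audience gotcha above can be sketched as follows. This is a hedged illustration in Python rather than the post's PowerShell: it assumes EasyAuth has already verified the token signature, so the payload is only decoded, never validated. The function names, the all-zero app ID, and the in-code audience comparison are hypothetical; in the post, allowedAudiences is an EasyAuth setting rather than application code.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the claims of a JWT WITHOUT verifying its signature.

    Only safe behind a layer (such as EasyAuth) that already validated it.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def is_authorized(claims: dict, app_id: str, required_role: str = "SyncJob.Execute") -> bool:
    """Accept both audience forms (raw GUID and api:// URI) and require the app role."""
    allowed_audiences = {app_id, f"api://{app_id}"}
    return claims.get("aud") in allowed_audiences and required_role in claims.get("roles", [])


def make_unsigned_token(claims: dict) -> str:
    """Build a fake, unsigned token for local experiments only (hypothetical helper)."""
    def b64url(data: dict) -> str:
        raw = json.dumps(data).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

    return f"{b64url({'alg': 'none'})}.{b64url(claims)}.signature"


APP_ID = "00000000-0000-0000-0000-000000000000"  # placeholder, not a real app ID
token = make_unsigned_token({"aud": APP_ID, "roles": ["SyncJob.Execute"]})
print(is_authorized(decode_jwt_payload(token), APP_ID))
```

Keeping the role check separate from token validation mirrors the split described above: one layer answers whether the token is genuine, the other whether the caller may trigger syncs.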
    APP_ID="<your-app-id>"
    TENANT_ID="<your-tenant-id>"
    FUNCTION_APP_URL="https://<your-function-app>.azurewebsites.net"

    # Interactive login (device code flow; works from any terminal)
    az login --tenant "${TENANT_ID}" \
      --scope "api://${APP_ID}/.default" \
      --use-device-code

    TOKEN=$(az account get-access-token \
      --scope "api://${APP_ID}/.default" \
      --query accessToken -o tsv)

    # Trigger the sync; returns 202 immediately
    curl -s -X POST "${FUNCTION_APP_URL}/api/SyncContainerRegistryHttpTrigger" \
      -H "Authorization: Bearer ${TOKEN}" \
      -H "Content-Type: application/json"

Passing Parameters Through the Queue

One nice property of this pattern: the queue message is just JSON, so you can pass whatever parameters the job needs. In our case, we pass a subscriptionFilter wildcard so callers can target a subset of subscriptions without touching any code. The parameter travels the full chain: HTTP body → queue message → Queue Trigger → PowerShell script parameter. Here's how each step handles it.

Step 1: The HTTP Trigger reads the body and enqueues the message using the Push-OutputBinding cmdlet. Azure Functions wires the binding to the queue automatically, so no SDK call is needed:

    param($Request, $TriggerMetadata)

    # ... decode the JWT, check role assignment

    # Parsed JSON body of the request (null when the caller sends no body)
    $body = $Request.Body

    $queuePayload = @{
        triggeredBy        = $decoded.Payload.upn ?? $decoded.Payload.oid
        triggeredAt        = (Get-Date -Format 'o')
        subscriptionFilter = if ($body.subscriptionFilter) { $body.subscriptionFilter } else { "*" }
    } | ConvertTo-Json -Compress

    Push-OutputBinding -Name QueueMessage -Value $queuePayload

    Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
        StatusCode = [System.Net.HttpStatusCode]::Accepted
        Body       = @{ message = "Sync job queued. Check Azure Monitor logs for execution status." }
    })

⭐ Push-OutputBinding is how Azure Functions PowerShell workers write to output bindings (queues, blobs, HTTP responses…).
The binding name QueueMessage maps to the queue defined in function.json, and the runtime handles serialisation and delivery.

Step 2: The Queue Trigger sets the filter and dot-sources the sync script, which reads the variable from the shared scope:

    param($QueueItem, $TriggerMetadata)

    Write-Host "Triggered SyncContainerRegistry via Storage Queue. Payload: $QueueItem"

    # Dot-sourcing runs run.ps1 in this scope, so it can read $SubscriptionFilter directly
    $SubscriptionFilter = if ($QueueItem.subscriptionFilter) { $QueueItem.subscriptionFilter } else { "*" }
    . "$PSScriptRoot/../SyncContainerRegistry/run.ps1"

Step 3: The long-running job applies the filter, defaulting to all subscriptions when none was set:

    param($Timer)

    if (-not $SubscriptionFilter) { $SubscriptionFilter = "*" }

    $subscriptions = Get-AzSubscription | Where-Object { $_.Name -like $SubscriptionFilter }

    foreach ($subscription in $subscriptions) {
        Set-AzContext -SubscriptionId $subscription.Id | Out-Null
        # ... do the work
    }

Targeting a subset of subscriptions

    # Sync all subscriptions (default: omit the body)
    curl -s -X POST "${FUNCTION_APP_URL}/api/SyncContainerRegistryHttpTrigger" \
      -H "Authorization: Bearer ${TOKEN}" \
      -H "Content-Type: application/json"

    # Sync only subscriptions matching a pattern
    curl -s -X POST "${FUNCTION_APP_URL}/api/SyncContainerRegistryHttpTrigger" \
      -H "Authorization: Bearer ${TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{"subscriptionFilter": "*project-alpha*"}'

⭐ PowerShell's -like operator uses * as a wildcard anywhere in the string. The pattern *project-alpha* matches sub-mycompany-project-alpha-prd, sub-mycompany-project-alpha-dev, etc. A pattern without a leading * only matches from the start of the string, so keep this in mind when naming subscriptions.

Pushing a Message Directly via PowerShell

You can also push a message straight to the queue without going through the HTTP trigger. This is useful for testing, scripting, or bypassing the auth layer in a controlled environment.
    Connect-AzAccount  # or -Identity for a Managed Identity context

    $storageAccount = "<your-storage-account>"
    $queueName      = "sync-job-queue"

    # Build the payload, same shape the HTTP trigger produces
    $payload = @{
        triggeredBy        = $env:USERNAME
        triggeredAt        = (Get-Date -Format 'o')
        subscriptionFilter = "*project-alpha*"  # or "*" for all
    } | ConvertTo-Json -Compress

    # Get a queue client via the connected account (no key needed)
    $ctx   = New-AzStorageContext -StorageAccountName $storageAccount -UseConnectedAccount
    $queue = Get-AzStorageQueue -Name $queueName -Context $ctx
    $queue.QueueClient.SendMessage($payload)

⭐ -UseConnectedAccount authenticates via the current Connect-AzAccount session. No storage key is required, as long as your identity has the Storage Queue Data Message Sender role on the storage account.

The Queue Message

The HTTP trigger packages the caller identity and filter into a simple JSON payload before enqueuing. The Queue Trigger reads it back as a deserialised PowerShell object, so no manual JSON parsing is needed.

    {
      "triggeredBy": "user@company.com",
      "triggeredAt": "2026-06-01T11:03:55.570+02:00",
      "subscriptionFilter": "*project-alpha*"
    }

Design Decisions at a Glance

| Decision | Choice | Why |
| --- | --- | --- |
| Async execution | Azure Storage Queue | HTTP trigger has a hard 230s timeout. The sync job takes 2–10 minutes. The queue decouples acceptance from execution and gives us retry for free. |
| Authentication | EasyAuth + App Role | No credentials in code. Access is controlled via Entra ID app roles, revocable per user without touching infrastructure. |
| Azure identity | Managed Identity | No secrets to rotate or store. The Function App authenticates to Azure APIs using its platform-assigned identity. |
| Job parameter | Wildcard filter via queue payload | Lets callers target any subscription subset without code changes. The filter travels through the queue; the Queue Trigger just passes it along. |
| Hosting plan | Dedicated (P-series) | Consumption plan caps function execution at 10 minutes. A Dedicated plan can be configured with no execution time limit, which is essential when the job can run longer. |

See you in the Cloud
Jamesdld

Struggling to get managers to actually use 1:1 meeting agendas in Teams
We've been trying to get our managers to run structured 1:1s with their direct reports using Teams. Right now they just hop on a call with no agenda and wing it. HR wants there to be a documented agenda, talking points from both sides, and some kind of record of what was discussed. We tried using Loop components and OneNote but managers find it clunky to set up every time and most of them just stopped doing it after a few weeks. Is there a better way to handle recurring 1:1 meeting agendas directly in Teams?

Communities tab in Teams
Hi there,

I've recently had the Communities tab pop up in my Teams alongside the Teams and Channels tab. No one else in my organisation can see this yet and we aren't sure why. I know it's being rolled out on a timeline, but I was also wondering if it might be because I'm the only one in the org who has a Microsoft Viva Employee Communications and Communities licence. Does anyone have any insights into this? We'd like to make a bit of a rollout plan once this appears in our colleagues' Teams setups.