Platform Improvements for Python AI Apps on Azure App Service
Overview

Azure App Service (Linux) is a fully managed PaaS offering that supports a broad range of languages, including Python, Node.js, .NET, PHP, and Java. Developers can push source code or deploy a pre-built artifact; the platform handles the rest, including dependency installation, application containerization, and running the application at cloud scale. More customers are building intelligent applications using Azure AI Foundry and other AI services, and Python has become a language of choice for these workloads. The performance and reliability of the Python deployment pipeline directly shape the developer's experience on the platform, so we looked across the deployment path for opportunities to reduce latency and improve reliability. The first set of changes has reduced Python deployment latency on Azure App Service Linux by approximately 30%. This is the first step in a broader effort to make the platform better suited for AI application development, but the gains resulting from this effort will benefit all apps on the platform. Let's look at the details.

Where Deployment Time Was Going

Python web application deployments on Azure App Service Linux rely on Oryx, the platform's open-source build system, to produce runnable artifacts during remote builds. Platform telemetry showed that around 70% of Python app deployments use remote builds, and the majority of those resolve dependencies via requirements.txt using pip install. To understand where time was going, we profiled a stress workload: a 7.5 GB PyTorch application. Most production builds are smaller, but stress-testing a dependency-heavy application made the pipeline bottlenecks clear.

When a Python app is deployed via remote build, the build container in Kudu (the App Service deployment service) runs Oryx to:
1. Extract the uploaded source code.
2. Create a Python virtual environment.
3. Install dependencies via pip install; 4.35 min (~34% of build time).
4. Copy files to a staging directory; 0.98 min (~8%).
5. Compress via tar + gzip into an archive; 7.53 min (~58%).
6. Write the archive to /home (Azure Storage SMB mount).

The app container then extracts this archive to the local disk on every cold start.

Why the Archive-Based Approach?

The /home directory is backed by an Azure Storage SMB mount, where small-file I/O is comparatively expensive. Python dependencies are file-heavy: virtual environments commonly contain tens of thousands of files, and dependency-heavy ML applications can exceed 200,000 files. Writing those files individually over SMB would be prohibitively slow. Instead, the pipeline builds on the container's local filesystem, writes a single compressed archive over SMB, and the app container extracts it locally on startup for efficient module loading.

Key insight: Compression was the single largest phase at 58% of build time, longer than installing the packages themselves.

What We Changed

Zstandard Compression (Replacing gzip)

Standard gzip compression is single-threaded. In our benchmark, compression accounted for 58% of total build time, making it the dominant bottleneck. Because the archive is also decompressed during container startup, decompression time affects runtime startup latency as well. We evaluated three compression algorithms: gzip, LZ4, and Zstandard (zstd).
The following results are averaged across multiple deployments of a 7.5 GB Python application with PyTorch and additional ML packages:

Metric               gzip       LZ4        zstd
Compression time     7.53 min   1.20 min   1.18 min
Decompression time   2.80 min   1.18 min   1.07 min
Archive size         4.0 GB     5.0 GB     4.8 GB

Both zstd and LZ4 were more than 6× faster than gzip for compression and more than 2× faster for decompression. We selected zstd for the following reasons:
- Comparable speed to LZ4, with smaller archive sizes (4.8 GB vs. 5.0 GB).
- Mature ecosystem: zstd is specified in RFC 8878 (published in 2021) and ships with many common Linux distributions.
- Native tar support: tar -I zstd works out of the box; no extra packages required.

Result: Compression time dropped from 7.53 min → 1.18 min (6.4× faster). Decompression improved from 2.80 min → 1.07 min (2.6× faster), directly reducing cold-start latency.

Faster Package Installation with uv

pip is implemented in Python and has historically optimized for compatibility over maximum parallelism. In dependency-heavy workloads, package download, resolution, and installation can become a major part of deployment time. In our 7.5 GB PyTorch benchmark, package installation accounted for ~34% of total build time (4.35 min out of 12.86 min). We introduced uv, a Python package manager written in Rust, as the primary installer for compatible requirements.txt deployments. Its uv pip install interface works with standard pip workflows.

Fallback strategy: Compatibility remains the priority. When uv cannot handle a deployment, the platform retries with pip, preserving the behavior customers already depend on.

Cache behavior: Package caches remain local to the build container. When the same app is deployed again before the Kudu (build) container is recycled, both pip and uv can reuse cached packages and avoid repeated downloads.

Result: Package installation time dropped from 4.35 min → 1.50 min (3× faster).

Reducing File Copy Overhead

A file copy showed up in two places. First, before compression, the build process copied the entire build directory (application code plus Python packages) to a staging location. This existed historically as a safety measure: creating a clean snapshot before tar reads the file tree. But the cost was steep given the large number of files inherent in Python dependencies. The fix was straightforward: create the tar archive directly from the build directory, skipping the intermediate copy entirely.

Second, for pre-built deployment scenarios, we replaced the legacy Kudu sync path with Linux-native rsync. That gave us a better-optimized tool for large Linux file trees and reduced the overhead of moving files into the final deployment location. Because this path is used beyond Python, the improvement benefits pre-built apps across the broader App Service Linux ecosystem.

Result: Eliminated the 0.98-minute staging copy (8% of build time), reduced temporary disk usage, and improved the remaining file sync path.

Pre-Built Python Wheels Cache

We added a complementary optimization: a read-only cache of pre-built wheels for commonly used Python packages, selected using platform telemetry. The cache is mounted into the Kudu build container at runtime for Python workloads, allowing the installer to use local wheel artifacts before downloading packages externally. When a matching wheel is available, the installer uses it directly, avoiding a network fetch for that package. Cache misses fall back to the upstream registry (e.g., PyPI) as usual. The cache is managed by the platform and kept up to date, so supported Python builds can use it without any app change.
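To make the fallback strategy concrete, here is a minimal sketch of the "uv first, pip on failure" pattern. This is an illustration only, not the actual Oryx build logic; the function name, flags, and error handling are assumptions.

import subprocess

def install_requirements(requirements_path: str, venv_python: str) -> None:
    """Install dependencies with uv, falling back to pip on any failure."""
    uv_cmd = ["uv", "pip", "install", "-r", requirements_path,
              "--python", venv_python]
    pip_cmd = [venv_python, "-m", "pip", "install", "-r", requirements_path]

    try:
        # Fast path: uv resolves and installs packages in parallel.
        subprocess.run(uv_cmd, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Compatibility path: retry with pip, preserving existing behavior.
        subprocess.run(pip_cmd, check=True)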
Combined Results

Controlled Benchmark (PyTorch 7.5 GB, P1mv3 App Service Tier)

The following benchmark was measured on the P1mv3 App Service tier. Values in the "After" column reflect the optimized pipeline with zstd compression, uv package installation, direct tar creation, and the pre-built wheels cache enabled together.

Phase                  Before      After       Improvement
Package installation   4.35 min    1.50 min    ~3× faster
File copy              0.98 min    0 min       Eliminated
Compression            7.53 min    1.18 min    ~6× faster
Total build time       12.86 min   ~2.68 min   ~79% reduction

Production Fleet (All Python Linux Web Apps)

Production telemetry across Python deployments shows the impact of these changes: deployment latency decreased by approximately 30% after the rollout. The controlled benchmark shows a larger improvement (~79%) because it exercises a dependency-heavy workload where package installation, file copy, and compression dominate total build time. Typical production apps are smaller and spend proportionally less time in those phases.

Beyond Faster Builds: Reliability and Runtime Performance

Faster builds only help when deployment requests reliably reach a worker that is ready to build. We updated the primary deployment clients (Azure CLI, GitHub Actions, and Azure DevOps Pipelines) to warm up Kudu before initiating deployments. Clients now issue a lightweight health-check request to the Kudu endpoint, helping ensure the deployment container is running and ready before the deployment begins. Clients also preserve affinity to the warmed-up worker using the ARR affinity cookie returned by the first request. This increases the chance that the deployment uses a worker with Kudu already running and local package caches already available from recent deployments. Together, these client-side changes reduced deployment failures from transient infrastructure issues and helped the pipeline optimizations reach the build phase reliably.

Result: Deployment failures caused by cold-start errors (502, 503, 499) dropped by ~30%.

We also improved the default runtime configuration for Python apps using the platform-provided Gunicorn startup path. Previously, the platform defaulted to a single worker, leaving most CPU cores idle. Now, it follows Gunicorn's recommended worker formula, fully utilizing available cores on multi-core SKUs and delivering higher request throughput out of the box (a minimal sketch of this formula appears at the end of this post):

workers = (2 × NUM_CORES) + 1

Key Takeaways

- Measure before optimizing: Platform telemetry showed that remote builds and requirements.txt-based installs were the dominant Python deployment paths, which helped us focus on changes that would benefit the most customers.
- Compression was the biggest bottleneck: In the dependency-heavy benchmark, archive compression took longer than package installation. Replacing gzip with zstd reduced both build time and cold-start extraction time.
- File count matters: Python virtual environments can contain tens of thousands of files, and AI workloads can contain many more. Reducing unnecessary file copies and using Linux-native file sync helped lower overhead.
- Compatibility needs a fallback path: Introducing uv improved the common path, while falling back to pip preserved compatibility for apps that depend on existing Python packaging behavior.
- Deployment reliability is part of performance: Faster builds only help if deployment requests consistently reach a ready worker. Warm-up and worker affinity made the optimized path more reliable for customers.
- Beyond deployment: Runtime defaults, such as Gunicorn worker configuration, also affect how production apps perform once deployment is complete.

Together, these changes made Python deployments faster and more reliable while preserving compatibility through safe fallbacks. We will continue improving the platform to make Azure App Service faster, more reliable, and better suited for AI application development.
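As referenced above, here is a minimal sketch of the Gunicorn worker formula as it might appear in a gunicorn.conf.py. The file layout and port are assumptions for illustration, not the platform's actual startup code.

# gunicorn.conf.py: minimal sketch of the (2 x cores) + 1 worker formula
import multiprocessing

# Detect the cores available to the instance SKU.
cores = multiprocessing.cpu_count()

# Gunicorn's recommended default: (2 x NUM_CORES) + 1 workers.
workers = (2 * cores) + 1

# Bind to the port your app is expected to listen on (8000 is an assumption;
# match it to your startup command).
bind = "0.0.0.0:8000"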
Give Your AI Agent Eyes: Browser-Harness Meets Playwright Workspaces Remote Browsers

What happens when you hand a coding agent a real browser — not a mock, not an API wrapper, but a full Chromium instance running in the cloud? It fills forms for you. It does research for you. It navigates JavaScript-heavy SPAs that would make any REST-based scraper weep. And it does it across 10+ parallel sessions without touching your local machine. This is the story of combining two tools that were built for different worlds — and discovering they're a perfect fit.

The Problem

Today's coding agents — Codex, Claude Code, Copilot — are extraordinary at reading and writing code. But ask one to check product availability on a website, and it hits a wall. Modern websites are JavaScript-rendered, authentication-gated, geolocation-aware, and hostile to simple HTTP requests. The agent needs a real browser. Not requests.get(). Not a headless puppeteer script you wrote last Tuesday. A browser that renders CSS, executes JavaScript, handles cookies, and lets the agent see what a human would see.

Enter Browser-Harness

Browser-harness is an open-source tool that gives AI agents direct control over a Chrome browser via the Chrome DevTools Protocol (CDP). It exposes a clean Python API:

● agent: wants to upload a file
● agent-workspace/agent_helpers.py → helper missing
● agent writes it
  agent_helpers.py
  + custom helper
✓ file uploaded

One websocket to Chrome, nothing between. The agent writes what's missing during execution. The harness improves itself every run. But there's a catch. Where does this browser run?

The Infrastructure Gap

If the browser runs locally, you've got problems:
- Your machine is busy. Running Chrome while the agent works eats RAM and CPU.
- No parallelism. One browser per machine. Want to scrape 10 sites simultaneously? Buy 10 machines.
- No consistency. Different OS, different Chrome versions, different results.
- No isolation. Letting the agent run amok on autopilot with your local browser can be risky; it can reuse your creds, stored cookies, and sessions.
- No observability. The agent is clicking around in a browser you can't see.

What you really want is a browser that runs somewhere else — managed, scalable, observable — and your agent just connects to it over a WebSocket.

Enter Playwright Workspaces

Playwright Workspaces provides exactly this: remote browser endpoints on Azure. You make an HTTP request, a Chromium instance spins up in the cloud, and you get back a WebSocket URL (wss://...) to connect via CDP. The key insight: browser-harness speaks CDP. Playwright Workspaces serves CDP. They snap together like LEGO.

Your Agent → browser-harness → CDP WebSocket → Playwright Workspaces → Cloud Chromium

No local Chrome needed. No browser installation. No display server. Just a WebSocket connection to a fully managed browser.

The Two-Step Connection Flow

Connecting them is surprisingly simple:

Step 1: Provision a remote browser

def get_connect_options(os_name="linux", run_id=str(uuid.uuid4())) -> tuple[str, dict[str, str]]:
    service_url = os.getenv("PLAYWRIGHT_SERVICE_URL")
    service_access_token = os.getenv("PLAYWRIGHT_SERVICE_ACCESS_TOKEN")
    headers = {"Authorization": f"Bearer {service_access_token}"}
    service_run_id = os.getenv("PLAYWRIGHT_SERVICE_RUN_ID")
    ws_endpoint = f"{service_url}?os={os_name}&runId={service_run_id}&api-version=2025-09-01"
    return ws_endpoint, headers

Step 2: Point browser-harness at it

export BU_CDP_WS="${session_url}"
browser-harness -c "print(page_info())"
# → {'url': 'about:blank', 'title': '', 'w': 780, 'h': 441}

That's it. Your agent now controls a cloud browser.
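Putting the two steps together in one place, here is a minimal sketch, assuming the get_connect_options() helper above and that browser-harness reads the endpoint from BU_CDP_WS as shown. The subprocess call and environment handling are illustrative, not the sample's exact wiring.

import os
import subprocess

# Step 1: provision a remote Chromium session (helper defined above).
# Auth header handling is omitted in this sketch.
ws_endpoint, _headers = get_connect_options(os_name="linux")

# Step 2: hand the CDP WebSocket to browser-harness via its environment variable
# and run a quick probe command against the cloud browser.
env = {**os.environ, "BU_CDP_WS": ws_endpoint}
subprocess.run(["browser-harness", "-c", "print(page_info())"], env=env, check=True)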
What This Unlocks: A Real-World Demo

We gave a coding agent this prompt: "Go to Website1, search for gifts under ₹500 for 10-year-old kids. Must be useful, reusable (not single-use). Delivery in Bengaluru within 3 days. Must have 5 pieces available."

Here's what the agent did — autonomously, with no human intervention:
1. Provisioned a remote Chromium browser via Playwright Workspaces
2. Connected browser-harness to the cloud browser over WebSocket
3. Navigated to FirstCry.com
4. Set delivery location to Bengaluru (pincode 560001)
5. Searched for kids' gifts
6. Applied filters — price ₹0–250 and ₹250–500 via JavaScript DOM interaction
7. Browsed products, rejecting single-use items (greeting cards) in favor of reusable ones (stainless steel water bottles)
8. Checked delivery dates — rejected items with 6-day delivery, found ones with Next Day Delivery
9. Verified stock availability — confirmed ADD TO CART was active with no stock warnings
10. Took screenshots at every step for audit and debugging

Result: Found the Brand A 600 Stainless Steel Water Bottle at ₹444.69 with next-day delivery to Bengaluru. All criteria met. The entire workflow ran on a remote browser in Azure — the local machine never launched Chrome.

The Power of Remote Endpoints

Why does running browsers remotely change everything?

1. Massive Parallelism
Spin up multiple remote browsers and work in parallel. Each gets its own isolated Chromium instance. No resource contention, no port conflicts. (See the sketch at the end of this post.)

2. Zero Local Dependencies
No Chrome installation. No chromedriver version mismatches. No --no-sandbox hacks. The browser is a managed service — you just connect to it.

3. Geographic Flexibility
Remote browsers run in Azure data centers. Need to see what a website looks like from East US? Or Southeast Asia? Pick your region. The browser's IP and geolocation are in the cloud, not on your laptop.

4. Ephemeral & Secure
Each browser session is isolated and destroyed when the WebSocket closes. No leftover cookies, no persistent state leaking between runs. Every session starts clean.

The Bigger Picture

We're at an inflection point. AI agents are moving from code generation to code execution — and execution means interacting with the real world. Browsers are the universal interface to that world. The combination of browser-harness (agent-to-browser control) and Playwright Workspaces (managed remote browsers) creates a powerful primitive: give any AI agent a browser, anywhere, on demand.

Get Started

The full sample — including the playwright_service_client.py helper, setup prompts, and environment templates — is available here:
📦 playwright-workspaces/samples/browser-harness

Resources:
- Playwright Workspaces Documentation
- Browser-Harness GitHub
- Create a Playwright Workspace
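As referenced under "Massive Parallelism" above, here is a hedged sketch of provisioning several remote sessions at once with Playwright's Python client. The thread-based approach, URLs, and helper name are assumptions, not the sample's code.

# Sketch: provision N remote browser sessions in parallel and run a task in each.
# Assumes the get_connect_options() helper from the post; error handling omitted.
from concurrent.futures import ThreadPoolExecutor
from playwright.sync_api import sync_playwright

URLS = ["https://example.com", "https://example.org", "https://example.net"]

def check_title(url: str) -> str:
    ws_endpoint, headers = get_connect_options(os_name="linux")
    with sync_playwright() as p:
        # Each call gets its own isolated cloud Chromium instance.
        browser = p.chromium.connect_over_cdp(ws_endpoint, headers=headers)
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
        return f"{url}: {title}"

with ThreadPoolExecutor(max_workers=len(URLS)) as pool:
    for result in pool.map(check_title, URLS):
        print(result)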
Securing Your AI Agents Before They Ship: Red Teaming with Microsoft PyRIT You wouldn't ship a web app without running OWASP ZAP or Snyk. So why are AI agents going to production without a single security scan? Prompt injection, data leakage, system prompt theft — the OWASP Top 10 for LLM Applications reads like a checklist of things most teams haven't tested for. PyRIT is Microsoft's open-source answer: an automation framework battle-tested on 100+ products including Copilot. But here's the catch — PyRIT is a research library. To make it work in a real engineering workflow, you need to wrap it. This post shows you how. In this post: Why AI red teaming is fundamentally different from traditional security testing What PyRIT gives you out of the box How to build a thin wrapper that turns PyRIT into a config-driven, pipeline-ready scanner When and how to plug it into your CI/CD workflow Customizing every step for your threat model 🛡️ Why AI Red Teaming Is Different If you're building agentic AI — systems that reason, call tools, and take actions — you already know that traditional security testing doesn't cut it. Microsoft's AI Red Team learned this the hard way after red-teaming 100+ generative AI products. Three things make AI red teaming unique: You're testing two risk surfaces at once — security vulnerabilities (prompt injection, data exfiltration) *and* responsible AI harms (bias, toxicity, manipulation). Traditional pen testers focus on one. Outputs are probabilistic — the same prompt can produce different responses across runs. You can't just assert on a fixed output. You need automated scoring at scale. Every architecture is different — standalone chatbots, RAG pipelines, multi-agent workflows, tool-calling agents. A single test harness has to flex across all of them. The OWASP LLM Top 10 (2025) gives us the taxonomy — prompt injection, sensitive information disclosure, excessive agency, system prompt leakage, data poisoning, supply chain risks, improper output handling, embedding weaknesses, misinformation, and unbounded consumption. Every AI agent you deploy is exposed to all ten. The question is whether *you* discover the gaps or your users do. 🔧 What PyRIT Gives You PyRIT (Python Risk Identification Tool) started as internal scripts at Microsoft in 2022. Today it's a 3,800-star, MIT-licensed framework with 129 contributors and a published paper. "We were able to pick a harm category, generate several thousand malicious prompts, and use PyRIT's scoring engine to evaluate the output from the Copilot system — all in the matter of hours instead of weeks." — Microsoft Security Blog The building blocks: 53+ datasets — AIRT, HarmBench, AdvBench, XSTest, and more. Curated adversarial prompts covering content harms, jailbreaks, data exfiltration, and social bias. 70+ prompt converters — Base64, ROT13, Leetspeak, Unicode confusables, LLM-powered rephrasing, translation, multimodal injection. They stack — a prompt can be translated, then Base64-encoded, then embedded in an image. 6 attack strategies — from simple `PromptSendingAttack` (single-turn) to `CrescendoAttack` (gradual escalation), `TreeOfAttacksWithPruning` (TAP), and multi-turn dialogue attacks. 20+ scorers — LLM-as-judge, Azure AI Content Safety, true/false classifiers, Likert scales. 10+ targets — OpenAI, Azure, HuggingFace, HTTP endpoints, Playwright, WebSockets. This is powerful — PyRIT gives you the components — datasets, converters, attack strategies, scorers — but not the glue. 
You still need something that loads a config, wires the right components together, runs attacks, scores the results, and tells your pipeline pass or fail. That's what a wrapper does. 🏗️ Building an Enterprise Wrapper The idea is simple: take PyRIT's primitives and compose them into an opinionated, config-driven pipeline that any developer can run with a single command. Below is given the idea on how we can create the wrapper around PyRIT to make it useful for agentic ai security testing, but this is not limited. The Flow Everything starts with a YAML config and ends with a pass/fail exit code: The key insight: every step in this pipeline is configurable through YAML, not code. Switching attack strategies, adding converters, adjusting thresholds — it's all one config edit away. Project Structure At POC level, the wrapper is lean — an orchestrator that stitches PyRIT's components together: The heart of it is `runner.py` — a single orchestrator that: Reads the YAML config Dynamically imports your agent and wraps it as a PyRIT target Loads datasets (built-in + your custom prompts) Runs attacks in escalating phases — plain → encoded → semantic → multi-turn Scores every response using LLM-as-judge Maps findings to OWASP LLM Top 10 categories Generates reports (JSON, Markdown) Evaluates the release gate — pass or fail That's it. One file doing the orchestration, a handful of helpers, and a YAML config. You can start here and grow the structure as your needs evolve. The Configuration A single YAML file drives everything: scan: name: "my-agent-security-scan" # Point to your agent — any Python callable target: module: "my_agent.main" function: "handle_prompt" # Attack strategies to run strategies: - name: "prompt_sending" # Single-turn baseline - name: "crescendo" # Gradual escalation max_turns: 5 # Prompt transformations — they stack automatically converters: - name: "Base64Converter" - name: "LeetspeakConverter" - name: "TranslationConverter" # Adversarial prompt sets datasets: - "airt_illegal" # Built-in: Microsoft AI Red Team - "harmbench" # Built-in: harm evaluation benchmark - path: "datasets/custom_prompts.yaml" # Release gate thresholds gate: fail_on: critical_findings: 0 # Zero tolerance for critical high_findings: 3 max_score: 0.3 # Defense rate must exceed 70% Point the target at your agent, pick your strategies and datasets, set your thresholds — run. Teams can start scanning their agents in an afternoon, not weeks. 🔄 Plugging Into Your Pipeline Since the wrapper is a pip-installable package(we can use setup tools or poetry to build and make it pip installable), integrating it into any CI/CD system is straightforward — `pip install`, then call the CLI. No custom actions or marketplace extensions needed. The key decision is when to run scans. Not every merge needs a full red team pass. Here's what works in practice: The idea is that developers can optionally run quick scans locally as a fast feedback loop, while full scans are manually triggered or approval-gated — the tech lead or architect decides when it's worth running a comprehensive assessment based on the nature of the changes. Since it's just a CLI, integration is the same everywhere — GitHub Actions, Azure DevOps, Jenkins, or a shell script. Install the package, call `pyrit-scan run`, check the exit code. ⚙️ Customization Without Forking The whole point of a wrapper is that teams customize behavior through configuration — not by modifying framework code. 
What to Customize | How | Example
Which agent to test | Point target.module + target.function in YAML to any Python callable | Your chatbot, RAG pipeline, or multi-agent workflow
Attack strategies | Add/remove entries under strategies in YAML | Start with prompt_sending, add crescendo when ready
Prompt transformations | List converters in YAML — they stack automatically | Base64 → Leetspeak → Translation = multi-phase evasion
Datasets | Use built-in (53+) or add custom YAML prompt files | HIPAA prompts, financial compliance scenarios
Scoring thresholds | Set per-OWASP-category thresholds in gate.fail_on | Zero tolerance for data leakage (LLM02), relaxed for misinformation (LLM09)
Report formats | List formats in reporting.formats | JSON for automation, PDF for compliance, JUnit for dashboards
New attack classes | Register via custom_attacks in YAML — module + class name | No framework code change, no PR needed

🎯 Start Red Teaming Today

AI red teaming isn't a nice-to-have anymore. If you're shipping agentic AI — systems that call tools, access data, and take actions on behalf of users — you need automated security testing in your pipeline. PyRIT gives you the primitives. A thin wrapper gives you the automation. Together, they turn AI security from a one-off exercise into a continuous, measurable practice. The pattern: YAML config → wrap your agent → run attacks → score → map to OWASP → gate the release (a sketch of the gate step follows below). Build it once. Run it on every release. Sleep better.

Resources
- PyRIT on GitHub — source code, docs, and community
- PyRIT Documentation — getting started guides and API reference
- OWASP LLM Top 10 (2025) — the industry standard risk taxonomy
- Microsoft AI Red Team Hub — threat models, bug bars, and best practices
- 3 Takeaways from Red Teaming 100 Products — lessons learned at scale
- PyRIT Launch Blog — origin story and key design decisions
- PyRIT Paper (arXiv) — the academic paper
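As referenced above, here is a hedged sketch of the final "gate the release" step. This is hypothetical wrapper code, not part of PyRIT's API: it reads the gate thresholds from the YAML shown earlier and turns scored findings into a pass/fail exit code that a CI/CD pipeline can act on.

import sys
import yaml

def evaluate_gate(config_path: str, findings: list[dict]) -> int:
    """Return 0 (pass) or 1 (fail) based on gate.fail_on thresholds."""
    with open(config_path) as f:
        gate = yaml.safe_load(f)["scan"]["gate"]["fail_on"]

    critical = sum(1 for item in findings if item["severity"] == "critical")
    high = sum(1 for item in findings if item["severity"] == "high")
    worst_score = max((item["score"] for item in findings), default=0.0)

    failed = (
        critical > gate["critical_findings"]
        or high > gate["high_findings"]
        or worst_score > gate["max_score"]
    )
    print(f"critical={critical} high={high} worst_score={worst_score:.2f} "
          f"gate={'FAIL' if failed else 'PASS'}")
    return 1 if failed else 0

if __name__ == "__main__":
    # Findings would come from the scoring step; hard-coded here for illustration.
    sys.exit(evaluate_gate("pyrit-scan.yaml", findings=[
        {"severity": "high", "score": 0.42, "owasp": "LLM01"},
    ]))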
Microsoft 365 multi-agent workflow with Microsoft Agent Framework

Learn how to design and run a multi-agent workflow with Microsoft Agent Framework: from building a coordinated set of specialized agents and tools, to hosting and deploying them with Azure AI Foundry, and finally exposing the same workflow to users in Microsoft 365 (Teams or Copilot). This walkthrough demonstrates a practical end-to-end pattern for orchestrating agents, adding tools, and packaging the solution for real-world applications.

Azure Functions Ignite 2025 Update
Azure Functions is redefining event-driven applications and high-scale APIs in 2025, accelerating innovation for developers building the next generation of intelligent, resilient, and scalable workloads. This year, our focus has been on empowering AI and agentic scenarios: remote MCP server hosting, bulletproofing agents with Durable Functions, and first-class support for critical technologies like OpenTelemetry, .NET 10 and Aspire. With major advances in serverless Flex Consumption, enhanced performance, security, and deployment fundamentals across Elastic Premium and Flex, Azure Functions is the platform of choice for building modern, enterprise-grade solutions. Remote MCP Model Context Protocol (MCP) has taken the world by storm, offering an agent a mechanism to discover and work deeply with the capabilities and context of tools. When you want to expose MCP/tools to your enterprise or the world securely, we recommend you think deeply about building remote MCP servers that are designed to run securely at scale. Azure Functions is uniquely optimized to run your MCP servers at scale, offering serverless and highly scalable features of Flex Consumption plan, plus two flexible programming model options discussed below. All come together using the hardened Functions service plus new authentication modes for Entra and OAuth using Built-in authentication. Remote MCP Triggers and Bindings Extension GA Back in April, we shared a new extension that allows you to author MCP servers using functions with the MCP tool trigger. That MCP extension is now generally available, with support for C#(.NET), Java, JavaScript (Node.js), Python, and Typescript (Node.js). The MCP tool trigger allows you to focus on what matters most: the logic of the tool you want to expose to agents. Functions will take care of all the protocol and server logistics, with the ability to scale out to support as many sessions as you want to throw at it. [Function(nameof(GetSnippet))] public object GetSnippet( [McpToolTrigger(GetSnippetToolName, GetSnippetToolDescription)] ToolInvocationContext context, [BlobInput(BlobPath)] string snippetContent ) { return snippetContent; } New: Self-hosted MCP Server (Preview) If you’ve built servers with official MCP SDKs and want to run them as remote cloud‑scale servers without re‑writing any code, this public preview is for you. You can now self‑host your MCP server on Azure Functions—keep your existing Python, TypeScript, .NET, or Java code and get rapid 0 to N scaling, built-in server authentication and authorization, consumption-based billing, and more from the underlying Azure Functions service. This feature complements the Azure Functions MCP extension for building MCP servers using the Functions programming model (triggers & bindings). Pick the path that fits your scenario—build with the extension or standard MCP SDKs. Either way you benefit from the same scalable, secure, and serverless platform. Use the official MCP SDKs: # MCP.tool() async def get_alerts(state: str) -> str: """Get weather alerts for a US state. Args: state: Two-letter US state code (e.g. CA, NY) """ url = f"{NWS_API_BASE}/alerts/active/area/{state}" data = await make_nws_request(url) if not data or "features" not in data: return "Unable to fetch alerts or no alerts found." if not data["features"]: return "No active alerts for this state." 
alerts = [format_alert(feature) for feature in data["features"]] return "\n---\n".join(alerts) Use Azure Functions Flex Consumption Plan's serverless compute using Custom Handlers in host.json: { "version": "2.0", "configurationProfile": "mcp-custom-handler", "customHandler": { "description": { "defaultExecutablePath": "python", "arguments": ["weather.py"] }, "http": { "DefaultAuthorizationLevel": "anonymous" }, "port": "8000" } } Learn more about MCPTrigger and self-hosted MCP servers at https://aka.ms/remote-mcp Built-in MCP server authorization (Preview) The built-in authentication and authorization feature can now be used for MCP server authorization, using a new preview option. You can quickly define identity-based access control for your MCP servers with Microsoft Entra ID or other OpenID Connect providers. Learn more at https://aka.ms/functions-mcp-server-authorization. Better together with Foundry agents Microsoft Foundry is the starting point for building intelligent agents, and Azure Functions is the natural next step for extending those agents with remote MCP tools. Running your tools on Functions gives you clean separation of concerns, reuse across multiple agents, and strong security isolation. And with built-in authorization, Functions enables enterprise-ready authentication patterns, from calling downstream services with the agent’s identity to operating on behalf of end users with their delegated permissions. Build your first remote MCP server and connect it to your Foundry agent at https://aka.ms/foundry-functions-mcp-tutorial. Agents Microsoft Agent Framework 2.0 (Public Preview Refresh) We’re excited about the preview refresh 2.0 release of Microsoft Agent Framework that builds on battle hardened work from Semantic Kernel and AutoGen. Agent Framework is an outstanding solution for building multi-agent orchestrations that are both simple and powerful. Azure Functions is a strong fit to host Agent Framework with the service’s extreme scale, serverless billing, and enterprise grade features like VNET networking and built-in auth. Durable Task Extension for Microsoft Agent Framework (Preview) The durable task extension for Microsoft Agent Framework transforms how you build production-ready, resilient and scalable AI agents by bringing the proven durable execution (survives crashes and restarts) and distributed execution (runs across multiple instances) capabilities of Azure Durable Functions directly into the Microsoft Agent Framework. Combined with Azure Functions for hosting and event-driven execution, you can now deploy stateful, resilient AI agents that automatically handle session management, failure recovery, and scaling, freeing you to focus entirely on your agent logic. Key features of the durable task extension include: Serverless Hosting: Deploy agents on Azure Functions with auto-scaling from thousands of instances to zero, while retaining full control in a serverless architecture. 
Automatic Session Management: Agents maintain persistent sessions with full conversation context that survives process crashes, restarts, and distributed execution across instances Deterministic Multi-Agent Orchestrations: Coordinate specialized durable agents with predictable, repeatable, code-driven execution patterns Human-in-the-Loop with Serverless Cost Savings: Pause for human input without consuming compute resources or incurring costs Built-in Observability with Durable Task Scheduler: Deep visibility into agent operations and orchestrations through the Durable Task Scheduler UI dashboard Create a durable agent: endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini") # Create an AI agent following the standard Microsoft Agent Framework pattern agent = AzureOpenAIChatClient( endpoint=endpoint, deployment_name=deployment_name, credential=AzureCliCredential() ).create_agent( instructions="""You are a professional content writer who creates engaging, well-structured documents for any given topic. When given a topic, you will: 1. Research the topic using the web search tool 2. Generate an outline for the document 3. Write a compelling document with proper formatting 4. Include relevant examples and citations""", name="DocumentPublisher", tools=[ AIFunctionFactory.Create(search_web), AIFunctionFactory.Create(generate_outline) ] ) # Configure the function app to host the agent with durable session management app = AgentFunctionApp(agents=[agent]) app.run() Durable Task Scheduler dashboard for agent and agent workflow observability and debugging For more information on the durable task extension for Agent Framework, see the announcement: https://aka.ms/durable-extension-for-af-blog. Flex Consumption Updates As you know, Flex Consumption means serverless without compromise. It combines elastic scale and pay‑for‑what‑you‑use pricing with the controls you expect: per‑instance concurrency, longer executions, VNet/private networking, and Always Ready instances to minimize cold starts. Since launching GA at Ignite 2024 last year, Flex Consumption has had tremendous growth with over 1.5 billion function executions per day and nearly 40 thousand apps. Here’s what’s new for Ignite 2025: 512 MB instance size (GA). Right‑size lighter workloads, scale farther within default quota. Availability Zones (GA). Distribute instances across zones. Rolling updates (Public Preview). Unlock zero-downtime deployments of code or config by setting a single configuration. See below for more information. Even more improvements including: new diagnostic settingsto route logs/metrics, use Key Vault App Config references, new regions, and Custom Handler support. To get started, review Flex Consumption samples, or dive into the documentation to see how Flex can support your workloads. Migrating to Azure Functions Flex Consumption Migrating to Flex Consumption is simple with our step-by-step guides and agentic tools. Move your Azure Functions apps or AWS Lambda workloads, update your code and configuration, and take advantage of new automation tools. With Linux Consumption retiring, now is the time to switch. 
For more information, see: Migrate Consumption plan apps to the Flex Consumption plan Migrate AWS Lambda workloads to Azure Functions Durable Functions Durable Functions introduces powerful new features to help you build resilient, production-ready workflows: Distributed Tracing: lets you track requests across components and systems, giving you deep visibility into orchestration and activities with support for App Insights and OpenTelemetry. Extended Sessions support in .NET isolated: improves performance by caching orchestrations in memory, ideal for fast sequential activities and large fan-out/fan-in patterns. Orchestration versioning (public preview): enables zero-downtime deployments and backward compatibility, so you can safely roll out changes without disrupting in-flight workflows Durable Task Scheduler Updates Durable Task Scheduler Dedicated SKU (GA): Now generally available, the Dedicated SKU offers advanced orchestration for complex workflows and intelligent apps. It provides predictable pricing for steady workloads, automatic checkpointing, state protection, and advanced monitoring for resilient, reliable execution. Durable Task Scheduler Consumption SKU (Public Preview): The new Consumption SKU brings serverless, pay-as-you-go orchestration to dynamic and variable workloads. It delivers the same orchestration capabilities with flexible billing, making it easy to scale intelligent applications as needed. For more information see: https://aka.ms/dts-ga-blog OpenTelemetry support in GA Azure Functions OpenTelemetry is now generally available, bringing unified, production-ready observability to serverless applications. Developers can now export logs, traces, and metrics using open standards—enabling consistent monitoring and troubleshooting across every workload. Key capabilities include: Unified observability: Standardize logs, traces, and metrics across all your serverless workloads for consistent monitoring and troubleshooting. Vendor-neutral telemetry: Integrate seamlessly with Azure Monitor or any OpenTelemetry-compliant backend, ensuring flexibility and choice. Broad language support: Works with .NET (isolated), Java, JavaScript, Python, PowerShell, and TypeScript. Start using OpenTelemetry in Azure Functions today to unlock standards-based observability for your apps. For step-by-step guidance on enabling OpenTelemetry and configuring exporters for your preferred backend, see the documentation. Deployment with Rolling Updates (Preview) Achieving zero-downtime deployments has never been easier. The Flex Consumption plan now offers rolling updates as a site update strategy. Set a single property, and all future code deployments and configuration changes will be released with zero-downtime. Instead of restarting all instances at once, the platform now drains existing instances in batches while scaling out the latest version to match real-time demand. This ensures uninterrupted in-flight executions and resilient throughput across your HTTP, non-HTTP, and Durable workloads – even during intensive scale-out scenarios. Rolling updates are now in public preview. Learn more at https://aka.ms/functions/rolling-updates. Secure Identity and Networking Everywhere By Design Security and trust are paramount. Azure Functions incorporates proven best practices by design, with full support for managed identity—eliminating secrets and simplifying secure authentication and authorization. 
Flex Consumption and other plans offer enterprise-grade networking features like VNETs, private endpoints, and NAT gateways for deep protection. The Azure Portal streamlines secure function creation, and updated scenarios and samples showcase these identity and networking capabilities in action. Built-in authentication (discussed above) enables inbound client traffic to use identity as well. Check out our updated Functions Scenarios page with quickstarts or our secure samples gallery to see these identity and networking best practices in action.

.NET 10

Azure Functions now supports .NET 10, bringing in a great suite of new features and performance benefits for your code. .NET 10 is supported on the isolated worker model, and it's available for all plan types except Linux Consumption. As a reminder, support ends for the legacy in-process model on November 10, 2026, and the in-process model is not being updated with .NET 10. To stay supported and take advantage of the latest features, migrate to the isolated worker model.

Aspire

Aspire is an opinionated stack that simplifies development of distributed applications in the cloud. The Azure Functions integration for Aspire enables you to develop, debug, and orchestrate an Azure Functions .NET project as part of an Aspire solution. Aspire publish deploys your functions directly to Azure Functions on Azure Container Apps. Aspire 13 includes an updated preview version of the Functions integration that acts as a release candidate with go-live support. The package will be moved to GA quality with Aspire 13.1.

Java 25, Node.js 24

Azure Functions now supports Java 25 and Node.js 24 in preview. You can now develop functions using these versions locally and deploy them to Azure Functions plans. Learn how to upgrade your apps to these versions here.

In Summary

Ready to build what's next? Update your Azure Functions Core Tools today and explore the latest samples and quickstarts to unlock new capabilities for your scenarios. The guided quickstarts run and deploy in under 5 minutes, and incorporate best practices—from architecture to security to deployment. We've made it easier than ever to scaffold, deploy, and scale real-world solutions with confidence. The future of intelligent, scalable, and secure applications starts now—jump in and see what you can create!

Give your Foundry Agent Custom Tools with MCP Servers on Azure Functions
This blog post is for developers who have an MCP server deployed to Azure Functions and want to connect it to Microsoft Foundry agents. It walks through why you'd want to do this, the different authentication options available, and how to get your agent calling your MCP tools. Connect your MCP server on Azure Functions to Foundry Agent If you've been following along with this blog series, you know that Azure Functions is a great place to host remote MCP servers. You get scalable infrastructure, built-in auth, and serverless billing. All the good stuff. But hosting an MCP server is only half the picture. The real value comes when something actually uses those tools. Microsoft Foundry lets you build AI agents that can reason, plan, and take actions. By connecting your MCP server to an agent, you're giving it access to your custom tools, whether that's querying a database, calling an API, or running some business logic. The agent discovers your tools, decides when to call them, and uses the results to respond to the user. Why connect MCP servers to Foundry agents? You might already have an MCP server that works great with VS Code, VS, Cursor, or other MCP clients. Connecting that same server to a Foundry agent means you can reuse those tools in a completely different context, i.e. in an enterprise AI agent that your team or customers interact with. No need to rebuild anything. Your MCP server stays the same; you're just adding another consumer. Prerequisites Before proceeding, make sure you have the following: 1. An MCP server deployed to Azure Functions. If you don't have one yet, you can deploy one quickly by following one of the samples: Python TypeScript .NET 2. A Foundry project with a deployed model and a Foundry agent Authentication options Depending on where you are in development, you can pick what makes sense and upgrade later. Here's a summary: Method Description When to use Key-based (default) Agent authenticates by passing a shared function access key in the request header. This method is the default authentication for HTTP endpoints in Functions. Development, or when Entra auth isn't required. Microsoft Entra Agent authenticates using either its own identity (agent identity) or the shared identity of the Foundry project (project managed identity). Use agent identity for production scenarios, but limit shared identity to development. OAuth identity passthrough Agent prompts users to sign in and authorize access, using the provided token to authenticate. Production, when each user must authenticate individually. Unauthenticated Agent makes unauthenticated calls. Development only, or tools that access only public information. Connect your MCP server to your Foundry agent If your server uses key-based auth or is unauthenticated, it should be relatively straightforward to set up the connection from a Foundry agent. The Microsoft Entra and OAuth identity passthrough are options that require extra steps to set up. Check out detailed step-by-step instructions for each authentication method. At a high level, the process looks like this: Enable built-in MCP authentication : When you deploy a server to Azure Functions, key-based auth is the default. You'll need to disable that and enable built-in MCP auth instead. If you deployed one of the sample servers in the Prerequisite section, this step is already done for you. 
Get your MCP server endpoint URL: For MCP extension-based servers, it's https://<FUNCTION_APP_NAME>.azurewebsites.net/runtime/webhooks/mcp

Get your credentials based on your chosen auth method: a managed identity configuration or OAuth credentials.

Add the MCP server as a tool in the Foundry portal by navigating to your agent, adding a new MCP tool, and providing the endpoint and credentials.

Microsoft Entra connection required fields
OAuth Identity required fields

Once the server is configured as a tool, test it in the Agent Builder playground by sending a prompt that triggers one of your MCP tools.

Closing thoughts

What I find exciting about this is the composability. You build your MCP server once and it works everywhere: VS Code, VS, Cursor, ChatGPT, and now Foundry agents. The MCP protocol is becoming the universal interface for tool use in AI, and Azure Functions makes it easy to host these servers at scale and with security. Are you building agents with Foundry? Have you connected your MCP servers to other clients? I'd love to hear what tools you're exposing and how you're using them. Share your thoughts with us!

What's next

In the next blog post, we'll go deeper into other MCP topics and cover new MCP features and developments in Azure Functions. Stay tuned!

MCP Apps on Azure Functions: Quick Start with TypeScript
Azure Functions makes hosting MCP apps simple: build locally, create a secure endpoint, and deploy fast with Azure Developer CLI (azd). This guide shows you how using a weather app example. What Are MCP Apps? MCP Apps let MCP servers return interactive HTML interfaces such as data visualizations, forms, dashboards that render directly inside MCP-compatible hosts (Visual Studio Code Copilot, Claude, ChatGPT, etc.). Learn more about MCP Apps in the official documentation. Having an interactive UI removes many restrictions that plain texts have, such as if your scenario has: Interactive Data: Replacing lists with clickable maps or charts for deep exploration. Complex Setup: Use one-page forms instead of long, back-and-forth questioning. Rich Media: Embed native viewers to pan, zoom, or rotate 3D models and documents. Live Updates: Maintain real-time dashboards that refresh without new prompts. Workflow Management: Handle multi-step tasks like approvals with navigation buttons and persistent state. MCP App Hosting as a Feature Azure Functions provides an easy abstraction to help you build MCP servers without having to learn the nitty-gritty of the MCP protocol. When hosting your MCP App on Functions, you get: MCP tools (server logic): Handle client requests, call backend services, return structured data - Azure Functions manages the MCP protocol details for you MCP resources (UI payloads such as app widgets): Serve interactive HTML, JSON documents, or formatted content - just focus on your UI logic Secure HTTPS access: Built-in authentication using Azure Functions keys, plus built-in MCP authentication with OAuth support for enterprise-grade security Easy deployment with Bicep and azd: Infrastructure as Code for reliable deployments Local development: Test and debug locally before deploying Auto-scaling: Azure Functions handles scaling, retries, and monitoring automatically The weather app in this repo is an example of this feature, not the only use case. Architecture Overview Example: The classic Weather App The sample implementation includes: A GetWeather MCP tool that fetches weather by location (calls Open-Meteo geocoding and forecast APIs) A Weather Widget MCP resource that serves interactive HTML/JS code (runs in the client; fetches data via GetWeather tool) A TypeScript service layer that abstracts API calls and data transformation (runs on the server) Bidirectional communication: client-side UI calls server-side tools, receives data, renders locally Local and remote testing flow for MCP clients (via MCP Inspector, VS Code, or custom clients) How UI Rendering Works in MCP Apps In the Weather App example: Azure Functions serves getWeatherWidget as a resource → returns weather-app.ts compiled to HTML/JS Client renders the Weather Widget UI User interacts with the widget or requests are made internally The widget calls the getWeather tool → server processes and returns weather data The widget renders the weather data on the client side This architecture keeps the UI responsive locally while using server-side logic and data on demand. Quick Start Checkout repository: https://github.com/Azure-Samples/remote-mcp-functions-typescript Run locally: npm install npm run build func start Local endpoint: http://0.0.0.0:7071/runtime/webhooks/mcp Deploy to Azure: azd provision azd deploy Remote endpoint: https://.azurewebsites.net/runtime/webhooks/mcp TypeScript MCP Tools Snippet (Get Weather service) In Azure Functions, you define MCP tools using app.mcpTool(). 
The toolName and description tell clients what this tool does, toolProperties defines the input arguments (like location as a string), and handler points to your function that processes the request.

app.mcpTool("getWeather", {
  toolName: "GetWeather",
  description: "Returns current weather for a location via Open-Meteo.",
  toolProperties: {
    location: arg.string().describe("City name to check weather for")
  },
  handler: getWeather,
});

Resource Trigger Snippet (Weather App Hook)

MCP resources are defined using app.mcpResource(). The uri is how clients reference this resource, resourceName and description provide metadata, mimeType tells clients what type of content to expect, and handler is your function that returns the actual content (like HTML for a widget).

app.mcpResource("getWeatherWidget", {
  uri: "ui://weather/index.html",
  resourceName: "Weather Widget",
  description: "Interactive weather display for MCP Apps",
  mimeType: "text/html;profile=mcp-app",
  handler: getWeatherWidget,
});

Sample repos and references
- Complete sample repository with TypeScript implementation: https://github.com/Azure-Samples/remote-mcp-functions-typescript
- Official MCP extension documentation: https://learn.microsoft.com/azure/azure-functions/functions-bindings-mcp?pivots=programming-language-typescript
- Java sample: https://github.com/Azure-Samples/remote-mcp-functions-java
- .NET sample: https://github.com/Azure-Samples/remote-mcp-functions-dotnet
- Python sample: https://github.com/Azure-Samples/remote-mcp-functions-python
- MCP Inspector: https://github.com/modelcontextprotocol/inspector

Final Takeaway

MCP Apps are just MCP servers, but they represent a paradigm shift by transforming the AI from a text-based chatbot into a functional interface. Instead of forcing users to navigate complex tasks through back-and-forth conversations, these apps embed interactive UIs and tools directly into the chat, significantly improving the user experience and the usefulness of MCP servers. Azure Functions allows developers to quickly build and host an MCP app by providing an easy abstraction and deployment experience. The platform also provides built-in features to secure and scale your MCP apps, plus a serverless pricing model so you can just focus on the business logic.

Bring Your Own Model (BYOM) for Azure AI Applications using Azure Machine Learning
Modern AI-powered applications running on Azure increasingly require flexibility in model choice. While managed model catalogs accelerate time to value, real-world enterprise applications often need to: Host open‑source or fine‑tuned models Deploy domain‑specific or regulated models inside a tenant boundary Maintain tight control over runtime environments and versions Integrate AI inference into existing application architectures This is where Bring Your Own Model (BYOM) becomes a core architectural capability, not just an AI feature. In this post, we’ll walk through a production-ready BYOM pattern for Azure applications, using: Azure Machine Learning as the model lifecycle and inference platform Azure-hosted applications (and optionally Microsoft Foundry) as the orchestration layer The focus is on building scalable, governable AI-powered apps on Azure, not platform lock‑in. We use SmolLM‑135M as a reference model. The same pattern applies to any open‑source or proprietary model. Reference Architecture: Azure BYOM for AI Applications At a high level, the responsibilities are clearly separated: Azure Layer Responsibility Azure Application Layer API, app logic, orchestration, agent logic Azure Machine Learning Model registration, environments, scalable inference Azure Identity & Networking Authentication, RBAC, private endpoints Key principle: Applications orchestrate. Azure ML executes the model. This keeps AI workloads modular, auditable, and production-safe. BYOM Workflow Overview Provision Azure Machine Learning Create Azure ML compute Author code in an Azure ML notebook Download and package the model Register the model Define a reproducible inference environment Implement scoring logic Deploy a managed online endpoint Use the endpoint from Microsoft Foundry Step 1: Provision Azure Machine Learning An Azure ML workspace is the governance boundary for BYOM: Model versioning and lineage Environment definitions Secure endpoint hosting Auditability Choose region carefully for latency, data residency, and networking. Step 2: Create Azure ML Compute (Compute Instance) Create a Compute Instance in Azure ML Studio. Why this matters: Managed Jupyter environment Identity integrated (no secrets in notebooks) Ideal for model packaging and testing - Enable auto‑shutdown for cost control - CPU is sufficient for most development workflows Step 3: Create an Azure ML Notebook Open Azure ML Studio → Notebooks Create a new Python notebook Select the Python SDK v2 kernel This notebook will handle the entire BYOM lifecycle. Step 4: Connect to the Azure ML Workspace # Import Azure ML SDK client from azure.ai.ml import MLClient # Import identity library for secure authentication from azure.identity import DefaultAzureCredential # Define workspace details subscription_id = "<SUBSCRIPTION_ID>" resource_group = "<RESOURCE_GROUP>" workspace_name = "<WORKSPACE_NAME>" # Create MLClient using Microsoft Entra ID # No keys or secrets are embedded in code ml_client = MLClient( DefaultAzureCredential(), subscription_id, resource_group, workspace_name ) The code above uses enterprise identity and aligns with zero‑trust practices. 
Step 5: Download and Package Model Artifacts from transformers import AutoModelForCausalLM, AutoTokenizer import os # Hugging Face model identifier model_id = "HuggingFaceTB/SmolLM-135M" # Local directory where model artifacts will be stored model_dir = "smollm_135m" os.makedirs(model_dir, exist_ok=True) # Download model weights model = AutoModelForCausalLM.from_pretrained(model_id) # Download tokenizer tokenizer = AutoTokenizer.from_pretrained(model_id) # Save artifacts locally model.save_pretrained(model_dir) tokenizer.save_pretrained(model_dir) 🔹 Open‑source or proprietary models follow the same packaging pattern 🔹 Azure ML treats all registered models identically Step 6: Register the Model in Azure ML Register the packaged artifacts as a custom model asset. Optionally, developers can: Enables version tracking Supports rolling upgrades Integrates with CI/CD pipelines This is the foundation for repeatable inference deployments. from azure.ai.ml.entities import Model # Create a model asset in Azure ML registered_model = Model( path=model_dir, name="SmolLM-135M", description="BYOM model for Microsoft Foundry extensibility", type="custom_model" ) # Register (or update) the model ml_client.models.create_or_update(registered_model) Step 7: Define a Reproducible Inference Environment name: dev-hf-base channels: - conda-forge dependencies: - python=3.12 - numpy=2.3.1 - pip=25.1.1 - scipy=1.16.1 - pip: - azureml-inference-server-http==1.4.1 - inference-schema[numpy-support] - accelerate==1.10.0 - einops==0.8.1 - torch==2.0.0 - transformers==4.55.2 ⚠️ Environment management is the hardest part of BYOM ✅ Treat environment changes like code changes BYOM Inference Patterns The same model can expose multiple behaviors. Pattern 1: Text Generation Endpoint This is the most common pattern for AI-powered applications: REST-based text generation Stateless inference Horizontal scaling through Azure ML managed endpoints Ideal for: Copilots Chat APIs Summarization or content generation services Scoring Script (score.py) import os import json import torch from transformers import AutoTokenizer, AutoModelForCausalLM def init(): """ Called once when the container starts. Loads the model and tokenizer into memory. """ global model, tokenizer # Azure ML injects model path at runtime model_dir = os.getenv("AZUREML_MODEL_DIR") tokenizer = AutoTokenizer.from_pretrained(model_dir) model = AutoModelForCausalLM.from_pretrained(model_dir) model.eval() def run(raw_data): """ Called for each inference request. Expects JSON input with a 'prompt' field. """ data = json.loads(raw_data) prompt = data.get("prompt", "") # Tokenize input text inputs = tokenizer(prompt, return_tensors="pt") # Generate text without tracking gradients with torch.no_grad(): outputs = model.generate(**inputs, max_new_tokens=100) # Decode output tokens into text response_text = tokenizer.decode(outputs[0], skip_special_tokens=True) return {"response": response_text} Example Request { "prompt": "Summarize the BYOM pattern in one sentence." } Example Response { "response": "Bring Your Own Model (BYOM) allows organizations to extend Microsoft Foundry with custom models hosted on Azure Machine Learning while maintaining enterprise governance and scalability." } Pattern 2: Predictive / Token Rank Analysis The same model can expose non-generative behaviors, such as: Token likelihood analysis Ranking or scoring Model introspection services This enables AI-backed analytics capabilities, not just chat. 
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    class PredictiveAnalysisModel:
        """
        Computes the rank of each token based on the model's
        next-token probability distribution.
        """

        def __init__(self, model, tokenizer):
            self.model = model
            self.tokenizer = tokenizer
            self.model.eval()

        def analyze(self, text):
            tokens = self.tokenizer.tokenize(text)
            token_ids = self.tokenizer.convert_tokens_to_ids(tokens)

            # Start with BOS token
            input_sequence = [self.tokenizer.bos_token_id, *token_ids]

            results = []
            for i in range(len(token_ids)):
                context = input_sequence[: i + 1]
                model_input = torch.tensor([context])

                with torch.no_grad():
                    outputs = self.model(model_input)

                logits = outputs.logits[0, -1]
                sorted_indices = torch.argsort(logits, descending=True)

                actual_token = token_ids[i]
                rank = (sorted_indices == actual_token).nonzero(as_tuple=True)[0].item()

                results.append({
                    "token": tokens[i],
                    "rank": rank
                })

            return results

        @classmethod
        def from_disk(cls, model_path):
            model = AutoModelForCausalLM.from_pretrained(model_path)
            tokenizer = AutoTokenizer.from_pretrained(model_path)
            return cls(model, tokenizer)

Scoring Script (score.py)

    import os
    import json
    from predictive_analysis import PredictiveAnalysisModel

    def init():
        """
        Loads the predictive analysis model from disk.
        """
        global model
        model_dir = os.getenv("AZUREML_MODEL_DIR")
        model = PredictiveAnalysisModel.from_disk(model_dir)

    def run(raw_data):
        """
        Expects JSON input with a 'text' field and returns token ranks.
        """
        data = json.loads(raw_data)
        text = data.get("text", "")
        return {
            "token_ranks": model.analyze(text)
        }

Example Request

    {
      "text": "This is a test."
    }

Example Response

    {
      "token_ranks": [
        { "token": "This", "rank": 518 },
        { "token": " is", "rank": 2 },
        { "token": " a", "rank": 0 },
        { "token": " test", "rank": 33 },
        { "token": ".", "rank": 77 }
      ]
    }

Consuming the BYOM Endpoint from Azure Applications

Azure ML endpoints are external inference services consumed by apps.

Option A: Application-Controlled Invocation

- The app calls the Azure ML endpoint directly
- IAM, networking, and retries are controlled by the app
- Recommended for most production systems

    import requests
    import os

    AML_ENDPOINT = os.environ["AML_ENDPOINT"]
    AML_KEY = os.environ["AML_KEY"]

    headers = {
        "Authorization": f"Bearer {AML_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "prompt": "Summarize BYOM in one sentence."
    }

    response = requests.post(AML_ENDPOINT, json=payload, headers=headers)
    print(response.json())

Option B: Tool-Based Invocation

- Expose the ML endpoint as an OpenAPI interface
- Allow higher-level orchestration layers (such as agents) to invoke it dynamically

Both patterns integrate cleanly with Azure App Service, Container Apps, Functions, and Kubernetes-based apps.

Operational Considerations

- Dependency management is ongoing work
- Model upgrades require redeployment
- Private networking must be planned early
- Use managed Foundry models where possible
- Use BYOM when business or regulatory needs require it

Security and Governance by Default

BYOM on Azure ML integrates natively with Azure platform controls:

- Entra ID & managed identity
- RBAC-based permissions
- Private networking and VNET isolation
- Centralized logging and diagnostics

This makes BYOM suitable for regulated industries and production‑critical AI workloads.

When Should You Use BYOM?

BYOM is the right choice when:

- You need model choice independence
- You want to deploy open‑source or proprietary LLMs
- You require enterprise‑grade controls
- You are building AI APIs, agents, or copilots at scale

For experimentation, higher‑level tooling may be faster.
For production, BYOM provides the control and durability enterprises require.

Conclusion

Azure applications increasingly depend on AI, but models should not dictate architecture. With Azure Machine Learning as the execution layer and Azure applications as the orchestration layer, organizations can:

- Combine managed and custom models
- Enforce security and compliance
- Scale AI workloads reliably
- Avoid platform and vendor lock-in

Bring Your Own Model (BYOM) is no longer a niche requirement. It is a foundational pattern for enterprise AI platforms. Azure Machine Learning enables BYOM across open‑source models, fine‑tuned variants, and proprietary LLMs, allowing organizations to innovate without being locked into a single model provider.

You build the application. Azure delivers the platform. You own the model. That is the essence of BYOM on Azure.

Continued Investment in Azure App Service
This blog was originally published to the App Service team blog.

Recent Investments

Premium v4 (Pv4)

Azure App Service Premium v4 delivers higher performance and scalability on newer Azure infrastructure while preserving the fully managed PaaS experience developers rely on. Premium v4 offers expanded CPU and memory options, improved price-performance, and continued support for App Service capabilities such as deployment slots, integrated monitoring, and availability zone resiliency. These improvements help teams modernize and scale demanding workloads without taking on additional operational complexity.

App Service Managed Instance

App Service Managed Instance extends the App Service model to support Windows web applications that require deeper environment control. It enables plan-level isolation, optional private networking, and operating system customization while retaining managed scaling, patching, identity, and diagnostics. Managed Instance is designed to reduce migration friction for existing applications, allowing teams to move to a modern PaaS environment without code changes.

Faster Runtime and Language Support

Azure App Service continues to invest in keeping pace with modern application stacks. Regular updates across .NET, Node.js, Python, Java, and PHP help developers adopt new language versions and runtime improvements without managing underlying infrastructure.

Reliability and Availability Improvements

Ongoing investments in platform reliability and resiliency strengthen production confidence. Expanded Availability Zone support and related infrastructure improvements help applications achieve higher availability with more flexible configuration options as workloads scale.

Deployment Workflow Enhancements

Deployment workflows across Azure App Service continue to evolve, with ongoing improvements to GitHub Actions, Azure DevOps, and platform tooling. These enhancements reduce friction from build to production while preserving the managed App Service experience.

A Platform That Grows With You

These recent investments reflect a consistent direction for Azure App Service: active development focused on performance, reliability, and developer productivity. Improvements to runtimes, infrastructure, availability, and deployment workflows are designed to work together, so applications benefit from platform progress without needing to re-architect or change operating models.

The recent General Availability of Aspire on Azure App Service is another example of this direction. Developers building distributed .NET applications can now use the Aspire AppHost model to define, orchestrate, and deploy their services directly to App Service, bringing a code-first development experience to a fully managed platform.

We are also seeing many customers build and run AI-powered applications on Azure App Service, integrating models, agents, and intelligent features directly into their web apps and APIs. App Service continues to evolve to support these scenarios, providing a managed, scalable foundation that works seamlessly with Azure's broader AI services and tooling.

Whether you are modernizing with Premium v4, migrating existing workloads using App Service Managed Instance, or running production applications at scale, including AI-enabled workloads, Azure App Service provides a predictable and transparent foundation that evolves alongside your applications.
Azure App Service continues to focus on long-term value through sustained investment in a managed platform developers can rely on as requirements grow, change, and increasingly incorporate AI.

Get Started

Ready to build on Azure App Service? Here are some resources to help you get started:

- Create your first web app — Deploy a web app in minutes using the Azure portal, CLI, or VS Code.
- App Service documentation — Explore guides, tutorials, and reference for the full platform.
- Aspire on Azure App Service — Now generally available. Deploy distributed .NET applications to App Service using the Aspire AppHost model.
- Pricing and plans — Compare tiers including Premium v4 and find the right fit for your workload.
- App Service on Azure Architecture Center — Reference architectures and best practices for production deployments.

Take Control of Every Message: Partial Failure Handling for Service Bus Triggers in Azure Functions
The Problem: All-or-Nothing Batch Processing in Azure Service Bus

Azure Service Bus is one of the most widely used messaging services for building event-driven applications on Azure. When you use Azure Functions with a Service Bus trigger in batch mode, your function receives multiple messages at once for efficient, high-throughput processing. But what happens when one message in the batch fails?

Your function receives a batch of 50 Service Bus messages. 49 process perfectly. 1 fails. What happens? In the default model, the entire batch fails. All 50 messages go back on the queue and get reprocessed, including the 49 that already succeeded. This leads to:

- Duplicate processing — messages that were already handled successfully get processed again
- Wasted compute — you pay for re-executing work that already completed
- Infinite retry loops — if that one "poison" message keeps failing, it blocks the entire batch indefinitely
- Idempotency burden — your downstream systems must handle duplicates gracefully, adding complexity to every consumer

This is the classic all-or-nothing batch failure problem. Azure Functions solves it with per-message settlement.

The Solution: Per-Message Settlement for Azure Service Bus

Azure Functions gives you direct control over how each individual message is settled in real time, as you process it. Instead of treating the batch as all-or-nothing, you settle each message independently based on its processing outcome. With Service Bus message settlement actions in Azure Functions, you can:

- Complete — remove the message from the queue (successfully processed)
- Abandon — release the lock so the message returns to the queue for retry, optionally modifying application properties
- Dead-letter — move the message to the dead-letter queue (poison message handling)
- Defer — keep the message in the queue but make it only retrievable by sequence number

This means in a batch of 50 messages, you can:

- Complete 47 that processed successfully
- Abandon 2 that hit a transient error (with updated retry metadata)
- Dead-letter 1 that is malformed and will never succeed

All in a single function invocation. No reprocessing of successful messages. No building failure response objects. No all-or-nothing.

Why This Matters

1. Eliminates Duplicate Processing

When you complete messages individually, successfully processed messages are immediately removed from the queue. There's no chance of them being redelivered, even if other messages in the same batch fail.

2. Enables Granular Error Handling

Different failures deserve different treatments. A malformed message should be dead-lettered immediately. A message that failed due to a transient database timeout should be abandoned for retry. A message that requires manual intervention should be deferred. Per-message settlement gives you this granularity.

3. Implements Exponential Backoff Without External Infrastructure

By combining abandon with modified application properties, you can track retry counts per message and implement exponential backoff patterns directly in your function code, no additional queues or Durable Functions required.

4. Reduces Cost

You stop paying for redundant re-execution of already-successful work. In high-throughput systems processing millions of messages, this can be a material cost reduction.

5. Simplifies Idempotency Requirements

When successful messages are never redelivered, your downstream systems don't need to guard against duplicates as aggressively. This reduces architectural complexity and potential for bugs.
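To make the granular handling concrete before the full language samples below, here is a minimal Python sketch of the per-message decision logic. It is a sketch under stated assumptions: it assumes the settlement actions object exposes an abandon() method alongside the complete() and dead_letter() methods shown later, that received messages surface the Service Bus SDK's delivery_count property, and that the retry threshold and handle_order helper are purely illustrative.

    # Hypothetical sketch: pick a settlement action per message based on the failure type.
    # Assumes abandon() exists on the actions object and delivery_count on the message.
    import json
    import logging

    MAX_DELIVERIES = 5  # illustrative retry threshold

    def settle_batch(messages, message_actions):
        for message in messages:
            try:
                order = json.loads(message.body)
                handle_order(order)                    # your business logic
                message_actions.complete(message)      # success: remove from the queue
            except ValueError:
                # Malformed payload will never succeed: dead-letter immediately
                message_actions.dead_letter(message)
            except Exception as exc:
                if message.delivery_count >= MAX_DELIVERIES:
                    # Give up after repeated transient failures
                    message_actions.dead_letter(message)
                else:
                    # Transient error: release the lock so Service Bus redelivers it
                    logging.warning("Retrying %s: %s", message.message_id, exc)
                    message_actions.abandon(message)

    def handle_order(order):
        logging.info("Processing order %s", order.get("id"))

The same shape applies in every language stack: a loop, a try/except per message, and a settlement call that matches the outcome.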
Before: One Message = One Function Invocation

Before batch support, there was no cardinality option; Azure Functions processed each Service Bus message as a separate function invocation. If your queue had 50 messages, the runtime spun up 50 individual executions.

Single-Message Processing (The Old Way)

    import { app, InvocationContext } from '@azure/functions';

    async function processOrderMessage(
      message: unknown, // ← One message at a time, no batch
      context: InvocationContext
    ): Promise<void> {
      try {
        const order = message as Order;
        await processOrder(order);
      } catch (error) {
        context.error('Failed to process message:', error);
        // Message auto-completes by default.
        throw error;
      }
    }

    app.serviceBusQueue('processOrder', {
      connection: 'ServiceBusConnection',
      queueName: 'orders-queue',
      handler: processOrderMessage,
    });

What this cost you, with 50 messages on the queue:

- Function invocations: 50 separate invocations (old) vs. 1 invocation (batch + settlement)
- Connection overhead: 50 separate DB/API connections vs. 1 connection, reused across the batch
- Compute cost: 50× invocation overhead vs. 1× invocation overhead
- Settlement control: binary (throw or don't) vs. 4 actions per message

Every message paid the full price of a function invocation: startup, connection setup, teardown. At scale (millions of messages/day), this was a significant cost and latency penalty. And when a message failed, your only option was to throw (retry the whole message) or swallow the error (lose it silently).

Code Examples

Let's see how this looks across all three major Azure Functions language stacks.

Node.js (TypeScript with @azure/functions-extensions-servicebus)

    import '@azure/functions-extensions-servicebus';
    import { app, InvocationContext } from '@azure/functions';
    import {
      ServiceBusMessageContext,
      messageBodyAsJson,
    } from '@azure/functions-extensions-servicebus';

    interface Order {
      id: string;
      product: string;
      amount: number;
    }

    export async function processOrderBatch(
      sbContext: ServiceBusMessageContext,
      context: InvocationContext
    ): Promise<void> {
      const { messages, actions } = sbContext;

      for (const message of messages) {
        try {
          const order = messageBodyAsJson<Order>(message);
          await processOrder(order);
          await actions.complete(message);   // ✅ Done
        } catch (error) {
          context.error(`Failed ${message.messageId}:`, error);
          await actions.deadletter(message); // ☠️ Poison
        }
      }
    }

    app.serviceBusQueue('processOrderBatch', {
      connection: 'ServiceBusConnection',
      queueName: 'orders-queue',
      sdkBinding: true,
      autoCompleteMessages: false,
      cardinality: 'many',
      handler: processOrderBatch,
    });

Key points:

- Enable sdkBinding: true and autoCompleteMessages: false to gain manual settlement control
- ServiceBusMessageContext provides both the messages array and the actions object
- Settlement actions: complete(), abandon(), deadletter(), defer()
- Application properties can be passed to abandon() for retry tracking
- Built-in helpers like messageBodyAsJson<T>() handle Buffer-to-object parsing

Full sample: serviceBusSampleWithComplete

Python (V2 Programming Model)

    import json
    import logging
    from typing import List

    import azure.functions as func
    import azurefunctions.extensions.bindings.servicebus as servicebus

    app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

    @app.service_bus_queue_trigger(arg_name="messages",
                                   queue_name="orders-queue",
                                   connection="SERVICEBUS_CONNECTION",
                                   auto_complete_messages=False,
                                   cardinality="many")
    def process_order_batch(messages: List[servicebus.ServiceBusReceivedMessage],
                            message_actions: servicebus.ServiceBusMessageActions):
        for message in messages:
            try:
                order = json.loads(message.body)
                process_order(order)
                message_actions.complete(message)     # ✅ Done
            except Exception as e:
                logging.error(f"Failed {message.message_id}: {e}")
                message_actions.dead_letter(message)  # ☠️ Poison

    def process_order(order):
        logging.info(f"Processing order: {order['id']}")

Key points:

- Uses azurefunctions.extensions.bindings.servicebus for SDK-type bindings with ServiceBusReceivedMessage
- Supports both queue and topic triggers with cardinality="many" for batch processing
- Each message exposes SDK properties like body, enqueued_time_utc, lock_token, message_id, and sequence_number

Full sample: servicebus_samples_settlement

.NET (C# Isolated Worker)

    using Azure.Messaging.ServiceBus;
    using Microsoft.Azure.Functions.Worker;

    public class ServiceBusBatchProcessor(ILogger<ServiceBusBatchProcessor> logger)
    {
        [Function(nameof(ProcessOrderBatch))]
        public async Task ProcessOrderBatch(
            [ServiceBusTrigger("orders-queue", Connection = "ServiceBusConnection")]
            ServiceBusReceivedMessage[] messages,
            ServiceBusMessageActions messageActions)
        {
            foreach (var message in messages)
            {
                try
                {
                    var order = message.Body.ToObjectFromJson<Order>();
                    await ProcessOrder(order);
                    await messageActions.CompleteMessageAsync(message);   // ✅ Done
                }
                catch (Exception ex)
                {
                    logger.LogError(ex, "Failed {MessageId}", message.MessageId);
                    await messageActions.DeadLetterMessageAsync(message); // ☠️ Poison
                }
            }
        }

        private Task ProcessOrder(Order order) => Task.CompletedTask;
    }

    public record Order(string Id, string Product, decimal Amount);

Key points:

- Inject ServiceBusMessageActions directly alongside the message array
- Each message is individually settled with CompleteMessageAsync, DeadLetterMessageAsync, or AbandonMessageAsync
- Application properties can be modified on abandon to track retry metadata

Full sample: ServiceBusReceivedMessageFunctions.cs
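One related configuration note: the number of messages delivered per invocation is governed by the Service Bus extension settings in host.json rather than by the function code. A hedged example, assuming the v5+ Service Bus extension; the values shown are illustrative only:

    {
      "version": "2.0",
      "extensions": {
        "serviceBus": {
          "prefetchCount": 100,
          "maxMessageBatchSize": 64
        }
      }
    }

Per-function options such as autoCompleteMessages and cardinality appear in the trigger configuration, as the Node.js and Python samples above show.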