azure functions
Performance Tuning and Scaling Optimization for Large-Scale Azure Workloads
Summary As cloud-native systems scale, performance challenges rarely stem from a single bottleneck. Instead, they emerge from the interaction between compute, orchestration, and data layers under load. This article captures a practical optimization journey of a high-volume Azure-based workload and highlights how controlled scaling, improved orchestration design, and proactive database maintenance can significantly outperform brute-force scaling. Introduction Distributed systems are often designed with the assumption that scaling out will solve performance issues. However, for orchestration-heavy and database-intensive workloads, this approach can introduce more problems than it solves. In this scenario, the system processed millions of transactional records through Azure Functions, Durable Functions, messaging pipelines, APIs, and SQL databases. As the workload grew, the platform began experiencing: CPU and memory spikes Slower SQL queries Service Bus throttling Increased retries and execution delays What stood out was that these issues were not due to insufficient resources, but due to inefficient execution patterns at scale. The optimization effort therefore focused on controlling how the system scaled and executed, rather than simply increasing capacity. Understanding Workload Behavior A critical early step was identifying the nature of the workload—specifically, whether it was CPU-heavy or data-heavy. Rethinking Scaling: More Is Not Always Better One of the most important lessons was that scaling out aggressively can degrade performance. As more function instances processed messages in parallel: Database calls increased sharply API traffic surged Lock contention intensified Retry rates increased This created a cascading effect where retries amplified load, further slowing down the system. 
To address this, scaling was intentionally controlled using: Concurrency limits on function execution Batch-based processing instead of full parallel fan-out Small delays to smooth traffic spikes Chunking of large datasets into manageable units This shift from maximum parallelism to controlled throughput significantly improved system stability. Compute Optimization: CPU and Memory After stabilizing scaling behavior, the next step was optimizing compute usage. CPU Optimization CPU spikes were largely caused by excessive parallel execution and orchestration overhead. Improvements included: Breaking large workloads into smaller units Reducing unnecessary fan-outs of processes Limiting concurrent executions This resulted in more predictable CPU usage and improved execution consistency. Memory Optimization Memory pressure was primarily driven by large payloads and batch processing. Optimizations focused on: Processing data in smaller chunks Avoiding large in-memory payloads and memory leaks Reducing orchestration state size These changes improved system reliability and reduced execution failures under load. Scaling Approaches: Practical Trade-Offs Both vertical and horizontal scaling were used, but with careful consideration. Scale Up (Vertical Scaling) Quick to implement No architectural changes required Useful for immediate stabilization However, it had cost and scalability limits. Scale Out (Horizontal Scaling) Better suited for long-term scalability Enables workload distribution But without control, it can: Increase database contention Amplify retries Introduce instability Key Insight The most effective approach was not choosing one over the other but combining both with strict control over concurrency and execution patterns. Durable Functions: Orchestration Optimization Durable Functions were central to the system, making orchestration design a key factor in performance. 
Challenges Observed The initial design relied heavily on nested sub-orchestrators, which introduced: High orchestration overhead Increased replay and persistence operations Slower execution at scale Key Improvements Refactoring unnecessary sub-orchestrators into Activity Functions simplified execution and improved throughput. The benefits included: Reduced orchestration latency Faster execution cycles Lower infrastructure cost Note: However, sub-orchestrators remain the right choice when the design requires composing multiple dependent steps, managing scoped retry/error logic, or isolating orchestration history. The decision should be driven by the complexity and reuse requirements of each workflow segment and not applied as a blanket rule. Improved Retry Strategy Retry behavior was also optimized by redefining execution boundaries. Previously: One activity processed multiple records A single failure triggered a retry of the entire batch After optimization: One activity handled one logical unit of work This enabled: Granular retries Better failure isolation Reduced duplicate processing Database Hygiene: A Critical Foundation The database emerged as a major bottleneck due to fragmentation and stale statistics caused by continuous high-volume operations. Issues Identified Fragmented indexes Inefficient query plans Increased query execution time Optimization Approach A proactive maintenance strategy was implemented using scheduled jobs to: Update statistics regularly Rebuild indexes Maintain query performance consistency Controlled Database Load For heavy long-running workloads in multi-tenant architecture, execution of DB intensive process was intentionally run in singleton fashion at a tenant level to reduce contention. 
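As a rough illustration of the tenant-level singleton idea, the sketch below serializes heavy work per tenant inside a single process. `TenantGate` and `RunExclusiveAsync` are hypothetical names, not part of the original system, and an in-memory gate like this only holds within one instance. Across scaled-out instances a distributed mechanism would be needed, for example a Durable Functions orchestration started with a deterministic per-tenant instance ID.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: allows at most one heavy operation per tenant at a time
// within ONE process. It does not coordinate across multiple instances.
public sealed class TenantGate
{
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _gates = new();

    public async Task<T> RunExclusiveAsync<T>(string tenantId, Func<Task<T>> work)
    {
        // One semaphore per tenant, so different tenants never block each other.
        SemaphoreSlim gate = _gates.GetOrAdd(tenantId, _ => new SemaphoreSlim(1, 1));

        await gate.WaitAsync();
        try
        {
            return await work();
        }
        finally
        {
            // Always release, even if the work throws.
            gate.Release();
        }
    }
}
```

A caller would wrap the database-intensive step, for example `await gate.RunExclusiveAsync(tenantId, () => RebuildTenantDataAsync(tenantId))`, where `RebuildTenantDataAsync` stands in for whatever heavy operation the workload runs.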
Running these processes as a tenant-level singleton: Prevented concurrent heavy operations Improved overall system stability Delivered more predictable throughput Observability: Finding the Real Problem A major challenge during optimization was distinguishing between symptoms and root causes. For example: Slow APIs were often caused by database contention High retries were triggered by upstream throttling Orchestration delays originated from downstream dependencies To address this, end-to-end observability was established using: Application-level tracing Load testing correlations Cross-service telemetry analysis This enabled accurate root cause identification and prevented misdirected optimization efforts. Key Takeaways Some key principles emerged from this optimization journey: Scaling more does not always mean performing better Controlled parallelism is more effective than unrestricted concurrency Orchestration design directly impacts system performance Database maintenance must be proactive Retry strategies should align with logical units of work Observability is essential for correct diagnosis Conclusion Performance tuning in distributed systems is less about adding resources and more about using them efficiently. By focusing on controlled scaling, simplifying orchestration, maintaining database health, and improving observability, the system achieved higher throughput, lower cost, and significantly improved stability. These lessons are broadly applicable to any Azure-based system handling large-scale, orchestration-heavy workloads and can help teams design more predictable and resilient architectures.

Why Does Azure App Service Return HTTP 404?
When an application deployed to Azure App Service suddenly starts returning HTTP 404 (Not Found), it can be confusing, especially when:
- The deployment completed successfully
- The App Service shows as Running
- No obvious errors appear in the portal

This behaviour is more common than it appears and is often linked to routing, configuration, or platform-specific behaviour.

In this article, I'll walk through real-world reasons why Azure App Service can return HTTP 404 errors. The goal is to help you systematically isolate the root cause, whether it's application-level, configuration-related, or platform-specific.

What Does HTTP 404 Mean in Azure App Service?
An HTTP 404 response from Azure App Service means that the incoming request successfully reached Azure App Service, but neither the platform nor the application could locate the requested resource.
This distinction is important. Unlike connectivity or DNS issues, a 404 confirms that:
- DNS resolution worked
- The request hit the App Service front end
- The failure happened after request routing

1. Incorrect Application URL or Route
This is the most common cause of 404 errors.
Typical scenarios:
- Accessing the root URL (https://<app>.azurewebsites.net) for a Web API that exposes only API routes
- Missing route prefixes such as /api or /v1, or missing controller/action name segments
- Case sensitivity mismatches on Linux App Service
Example: https://myapp.azurewebsites.net returns 404, but https://myapp.azurewebsites.net/weatherforecast works as expected.
✅ Tip: Always validate your routing locally and confirm the exact same path is being accessed in Azure.

2. Application Appears Running, but Startup Failed Partially
It is possible for an App Service to show Running even when the application failed to initialize fully.
Common causes:
- Missing or incorrect environment variables
- Invalid connection strings
- Exceptions thrown during Program.cs / Startup.cs
- Dependency initialization failures at startup
In such scenarios, the app may start the host process but fail to register routes, resulting in 404 responses instead of 500 errors.
✅ Where to check:
- Application logs
- Deployment logs
- Kudu → LogFiles

3. Static Files Not Found or Not Being Served
For applications hosting static content (HTML, JavaScript, images, JSON files), a 404 can occur even when the files exist.
Common reasons:
- Files not deployed to the expected directory (wwwroot, /home/site/wwwroot)
- Missing or unsupported MIME type configuration (commonly seen with .json)
- Static file middleware not enabled in ASP.NET Core applications
✅ Quick validation: Deploy a simple test.html to wwwroot and try accessing it directly.

4. Windows vs Linux App Service Differences
Behaviour can differ significantly between Windows App Service and Linux App Service.
Common pitfalls on Linux:
- Case-sensitive file paths (Index.html ≠ index.html)
- Missing or incorrect startup command
- Differences in request routing handled by Nginx
✅ Tip: If the app works on Windows App Service but fails on Linux, always recheck file casing and startup configuration first.

5. Custom Domain and Networking Configuration Issues
In some cases, requests reach the App Service but fail due to domain or network constraints.
Possible causes:
- Incorrect custom domain binding
✅ Isolation step: Always test using the default *.azurewebsites.net hostname to check whether the issue is domain-specific.

6. Health Checks or Monitoring Probes Targeting Invalid Paths
Seeing periodic 404 entries in logs (every few minutes) is often a sign of misconfigured probes.
Typical scenarios:
- App Service Health Check configured with a non-existent endpoint
- External monitoring tools probing /health or other paths that do not exist
✅ Fix: Ensure the health check path maps to a valid endpoint implemented by the application.
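For an ASP.NET Core app, one way to give the probe a valid target is the built-in health checks middleware. This is a minimal sketch rather than a recommended production setup: the `/healthz` path is an assumption for illustration, and it has to match whatever path is configured in the App Service Health check setting.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// Register the health check service. No custom checks are added in this sketch,
// so the endpoint reports healthy whenever the app is able to serve requests.
builder.Services.AddHealthChecks();

var app = builder.Build();

// "/healthz" is an assumed path; configure the same path as the App Service
// Health check path so that probes stop producing 404 entries.
app.MapHealthChecks("/healthz");

app.Run();
```

With this in place, a request to `/healthz` is answered by the application itself, so periodic probe traffic no longer shows up as 404 in the logs.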
7. Missing or Corrupted Deployment Artifacts
Even when deployments report success, application files may not be where the runtime expects them.
Commonly observed with:
- Zip deployments
- WEBSITE_RUN_FROM_PACKAGE misconfigurations
- Partial or interrupted deployments
✅ Verify using Kudu: Browse /home/site/wwwroot and check that the files are present.

Quick Troubleshooting Checklist
If your Azure App Service is returning HTTP 404:
- Verify the exact URL and route
- Test a static file (for example, /hostingstart.html)
- Review startup and application logs
- Inspect deployed artifacts via Kudu
- Validate Windows vs Linux behaviour differences
- Review networking, authentication, and health check settings

8. Application Gateway in Front of App Service
If you have Application Gateway in front of App Service, check the rewrite rules so that requests are sent to the correct path.

Final Thoughts
HTTP 404 errors on Azure App Service are rarely random. In most cases, they point to:
- Routing mismatches
- Startup or configuration failures
- Platform-specific behavior differences
By breaking the investigation into platform → configuration → application, you can systematically narrow down the root cause and resolve the issue. Happy debugging 🚀

How to Troubleshoot Azure Functions Service Bus Trigger Issues
Overview
Azure Functions integrates with Azure Service Bus via triggers and bindings, allowing you to build event-driven applications that react to queue and topic messages. The Service Bus trigger uses PeekLock mode to receive messages, automatically manages message locks, and completes or abandons messages based on function execution results.

When this integration encounters problems, you may see one or more of these symptoms:
- Messages accumulate in the queue or topic subscription and are not processed
- Functions execute but messages end up in the dead-letter queue (DLQ)
- MessageLockLostException or ServiceBusException errors in Application Insights
- Messages are processed multiple times (duplicate processing)
- The function app shows connection failures or AMQP errors in logs
- Trigger scaling does not work as expected: too few or too many instances
- Session-enabled queues stop processing after a period of time

This blog walks you through how the Service Bus trigger works internally, what can go wrong, and, most importantly, how to systematically diagnose and resolve these failures.

Understanding How the Service Bus Trigger Works
Before diving into troubleshooting, it is important to understand how the Service Bus trigger processes messages.

Message Processing Flow
Service Bus Namespace (Queue or Topic/Subscription)
→ Functions runtime discovers serviceBusTrigger binding
→ ServiceBusProcessor created (PeekLock mode)
→ Message received → Lock acquired
→ Function invoked with message payload
   → Function succeeds → Message Completed ✓
   → Function fails → Message Abandoned → Redelivered
      → Max delivery count reached → Dead-Letter Queue

The Functions runtime uses the Azure.Messaging.ServiceBus SDK under the hood. It creates a ServiceBusProcessor (or ServiceBusSessionProcessor for session-enabled entities) that manages the message receive loop, lock renewal, and concurrency.

Key Concepts

| Concept | Description |
|---|---|
| PeekLock | The default receive mode. The message is locked for a duration (default 30 seconds at the entity level) and must be completed or abandoned. |
| Auto-Complete | By default (autoCompleteMessages: true), the runtime calls Complete on success and Abandon on failure. You can disable this to handle settlement in your own code. |
| Lock Renewal | If function execution takes longer than the lock duration, the runtime automatically renews the lock up to maxAutoLockRenewalDuration (default 5 minutes). |
| Concurrency | maxConcurrentCalls (default 16) controls how many messages are processed in parallel per instance. On multi-core plans, this is multiplied by the core count. |
| Prefetch | prefetchCount (default 0) controls how many messages are pre-fetched from the broker to improve throughput. |
| Dead-Letter Queue | Messages that exceed the maximum delivery count (set on the Service Bus entity, default 10) are moved to the DLQ instead of being redelivered. |

host.json Configuration Reference
All Service Bus trigger settings are configured under the extensions.serviceBus section of host.json:

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "clientRetryOptions": {
        "mode": "exponential",
        "tryTimeout": "00:01:00",
        "delay": "00:00:00.80",
        "maxDelay": "00:01:00",
        "maxRetries": 3
      },
      "prefetchCount": 0,
      "transportType": "amqpWebSockets",
      "webProxy": "https://proxyserver:8080",
      "autoCompleteMessages": true,
      "maxAutoLockRenewalDuration": "00:05:00",
      "maxConcurrentCalls": 16,
      "maxConcurrentSessions": 8,
      "maxMessageBatchSize": 1000,
      "minMessageBatchSize": 1,
      "maxBatchWaitTime": "00:00:30",
      "sessionIdleTimeout": "00:01:00",
      "enableCrossEntityTransactions": false
    }
  }
}

Note: The clientRetryOptions settings apply only to interactions with the Service Bus service. They do not affect retries of function executions. For function-level retries, see Azure Functions error handling and retries.
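To get a feel for what the clientRetryOptions values imply for wait time between attempts, the sketch below computes a capped exponential schedule from a base delay, a maximum delay, and a retry count. It is an illustration only: `RetryDelaySketch` is a hypothetical helper, and the Service Bus SDK's actual retry policy may add jitter and differ in detail.

```csharp
using System;
using System.Collections.Generic;

public static class RetryDelaySketch
{
    // Illustrative schedule: baseDelay * 2^(attempt - 1), never exceeding maxDelay.
    // This is NOT the SDK's exact algorithm; it only shows how the settings interact.
    public static IReadOnlyList<TimeSpan> CappedExponentialDelays(
        TimeSpan baseDelay, TimeSpan maxDelay, int maxRetries)
    {
        var delays = new List<TimeSpan>(maxRetries);
        for (int attempt = 1; attempt <= maxRetries; attempt++)
        {
            double seconds = baseDelay.TotalSeconds * Math.Pow(2, attempt - 1);
            delays.Add(TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds)));
        }
        return delays;
    }
}
```

With the sample values above (delay of 0.8 seconds, maxDelay of one minute, maxRetries of 3), this schedule comes out at roughly 0.8, 1.6, and 3.2 seconds, so the cap only starts to matter with a larger base delay or more retries. Each attempt is additionally bounded by tryTimeout.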
Issue Categories

| Category | Typical Symptoms | Root Cause Area |
|---|---|---|
| Connection | AMQP errors, timeout, function not triggering | Connection string, network, firewall |
| Authentication | 401/403 errors, unauthorized access | Managed identity, RBAC, SAS policy |
| Message Lock | MessageLockLostException, duplicate processing | Long-running functions, lock duration mismatch |
| Dead-Letter | Messages going to DLQ unexpectedly | Function exceptions, max delivery count |
| Scaling | Messages accumulating, underscaling | Target-based scaling, host settings |
| Configuration | Trigger not firing, entity not found | host.json, app settings, binding attributes |
| Session | Session processing stops | Session lock, idle timeout, concurrency |
| Networking | Timeout in VNet-integrated apps | NSG, private endpoints, DNS |

Common Causes and Solutions

1. Connection String or Configuration Errors
Symptoms:
- Function does not trigger at all
- Error: "MessagingEntityNotFoundException" (queue or topic not found)
- Error: "No connection string configured for the Service Bus trigger"
- Error referencing an invalid or missing app setting

Why This Happens: The Service Bus trigger requires a valid connection to your Service Bus namespace. By default, it looks for an app setting named AzureWebJobsServiceBus. If you specify a custom Connection property on the trigger attribute, the runtime looks for that named setting instead. If the connection string is missing, invalid, or points to the wrong namespace, the trigger cannot create a ServiceBusProcessor and messages will not be processed.
How to Verify: Check your trigger attribute for the Connection property value:[ServiceBusTrigger("myqueue", Connection = "ServiceBusConnection")] Navigate to your Function App → Settings → Configuration → Application settings Verify the connection setting exists and is correctly named For connection string–based connections, confirm the value contains a valid endpoint, SharedAccessKeyName, and SharedAccessKey For managed identity connections, confirm <CONNECTION_NAME>__fullyQualifiedNamespace is set to <your-namespace>.servicebus.windows.net Solution: Set the correct connection string or managed identity configuration in Application Settings Verify the queue or topic name in the trigger attribute matches the actual entity name in your Service Bus namespace (names are case-sensitive) If using managed identity, ensure the __fullyQualifiedNamespace suffix is used (with double underscores): { "ServiceBusConnection__fullyQualifiedNamespace": "myservicebus.servicebus.windows.net" } Ref: Service Bus trigger — Connections 2. Authentication and Authorization Failures (RBAC / SAS) Symptoms: Error: "Unauthorized access. 'Listen' claim(s) are required to perform this operation." Error: "AuthorizationFailedException" or "UnauthorizedException" Error: "Attempted to perform an unauthorized operation." 401 or 403 errors in Application Insights Why This Happens: The Service Bus trigger requires Listen permission on the queue or subscription. If you are using a Shared Access Signature (SAS) policy that does not include the Listen claim, or a managed identity without the correct RBAC role, the runtime cannot receive messages. For managed identity connections, the identity must be assigned the Azure Service Bus Data Receiver role (or Azure Service Bus Data Owner) at the appropriate scope. For topic subscriptions, the role assignment must have effective scope over the subscription resource, not just the topic. 
How to Verify: For SAS-based connections: Go to your Service Bus namespace → Shared access policies Confirm the policy used in your connection string has the Listen claim If your function also sends messages (output binding), the policy needs Send as well For managed identity: Go to your Service Bus namespace → Access control (IAM) → Role assignments Verify your Function App's managed identity has Azure Service Bus Data Receiver For topic triggers, verify the role is assigned at the subscription level (not just the topic) Solution: For SAS: Use a policy that has the required claims, or create a new policy with Listen (and Send if needed) For managed identity: Assign the correct role. Use the Azure CLI if the portal does not expose the subscription resource as a scope: Ref: Grant permission to the identity 3. Message Lock Lost Exceptions Symptoms: Error: "MessageLockLostException: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue." Messages are processed but then redelivered (duplicate processing) Messages eventually end up in the dead-letter queue after repeated failures Why This Happens: When the Service Bus trigger receives a message in PeekLock mode, it acquires a lock for a duration configured on the Service Bus entity (default 30 seconds). The Functions runtime automatically renews this lock while your function is executing, up to the maxAutoLockRenewalDuration (default 5 minutes). A MessageLockLostException occurs when: Function execution exceeds maxAutoLockRenewalDuration — If your function takes longer than 5 minutes (the default), the lock renewal stops and the lock expires. The message becomes available for redelivery. Lock renewal fails due to a transient error — A network blip or Service Bus throttling can prevent a renewal request from succeeding. 
- The entity's lock duration is very short: If the lock duration on the queue or subscription is set lower than the time between renewal attempts, the lock may expire between renewals.
- Batch processing with long execution times: Automatic lock renewal is not supported for batch-triggered functions, so maxAutoLockRenewalDuration does not protect a batch. The whole batch must be processed within the entity-level lock duration.

How to Verify:
- Check Application Insights for MessageLockLostException entries and note the function execution duration
- Compare the execution duration against your maxAutoLockRenewalDuration setting
- Check the lock duration on your Service Bus entity: go to Service Bus namespace → Queue or Topic/Subscription → Properties and note the Lock duration value

Solution:
- Increase maxAutoLockRenewalDuration in host.json to exceed your longest expected function execution time:

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "maxAutoLockRenewalDuration": "00:10:00"
    }
  }
}

- Increase the entity's lock duration on the Service Bus queue or subscription (maximum 5 minutes) to provide a larger window between renewal attempts
- Optimize function execution time. If your function is doing heavy processing, consider offloading work to a Durable Functions orchestration, using a queue-based load leveling pattern, or breaking long operations into smaller units
- For batch functions, reduce maxMessageBatchSize so that each batch completes within the entity's lock duration, since automatic lock renewal does not apply to batches

Important: maxAutoLockRenewalDuration only applies to single-message functions. For batch functions, the message lock is governed by the entity-level lock duration setting.

4.
Messages Going to the Dead-Letter Queue
Symptoms:
- Messages appear in the dead-letter queue (DLQ) instead of being processed
- The DeadLetterReason on the dead-lettered message shows MaxDeliveryCountExceeded
- Function logs show repeated exceptions for the same message
- Some messages process successfully while others consistently fail

Why This Happens: When a function throws an unhandled exception, the runtime calls Abandon on the message (when autoCompleteMessages is true). The message is returned to the queue and its DeliveryCount is incremented. Once the delivery count reaches the entity's Max delivery count (default 10), the message is automatically moved to the DLQ by Service Bus.
Common reasons messages repeatedly fail:
- Poison messages with malformed or unexpected content
- Transient dependency failures (database, external API) that affect all retries
- Deserialization errors when the message body does not match the expected type
- Application bugs triggered by specific message content

How to Verify:
- Check the dead-letter queue using Service Bus Explorer (Azure Portal → Service Bus namespace → Queue → Service Bus Explorer → Dead-letter tab)
- Inspect the DeadLetterReason and DeadLetterErrorDescription properties on the dead-lettered messages
- Check Application Insights for exceptions correlated with the message IDs
- Review the DeliveryCount on the messages. If it equals the max delivery count, the message was redelivered until it was dead-lettered

Solution:
- Fix the root cause: Examine the dead-lettered messages and the corresponding exceptions to identify why processing fails
- Add error handling: Implement try-catch logic and decide whether to complete, dead-letter, or abandon the message explicitly:

// _logger and ProcessAsync are members of the containing function class.
[Function(nameof(ProcessMessage))]
public async Task ProcessMessage(
    [ServiceBusTrigger("myqueue", Connection = "ServiceBusConnection", AutoCompleteMessages = false)]
    ServiceBusReceivedMessage message,
    ServiceBusMessageActions messageActions)
{
    try
    {
        // Process the message, then settle it explicitly
        await ProcessAsync(message);
        await messageActions.CompleteMessageAsync(message);
    }
    catch (InvalidDataException)
    {
        // Poison message: send to the DLQ with a reason.
        // Named arguments avoid binding the strings to the wrong optional parameter.
        await messageActions.DeadLetterMessageAsync(
            message,
            deadLetterReason: "InvalidData",
            deadLetterErrorDescription: "Message body could not be deserialized.");
    }
    catch (Exception ex)
    {
        // Transient failure: abandon so the message is redelivered
        _logger.LogError(ex, "Processing failed, abandoning message {MessageId}", message.MessageId);
        await messageActions.AbandonMessageAsync(message);
    }
}

- Increase max delivery count on the Service Bus entity if you need more retry attempts before dead-lettering
- Process the DLQ: Set up a separate function or process to monitor and handle dead-lettered messages

Tip: Use ServiceBusMessageActions together with AutoCompleteMessages = false, so that the runtime does not also try to complete a message your code has already settled.

5. Duplicate Message Processing
Symptoms:
- Business logic executes more than once for the same message
- Database records or downstream operations are duplicated
- Logs show the same MessageId processed by multiple instances or multiple times on the same instance

Why This Happens: Duplicate processing can occur in several scenarios:
- Message lock lost: If the lock expires (see Issue 3), the message becomes available and is picked up again, either by the same or a different instance
- Function timeout: If the function exceeds the functionTimeout in host.json (default 5 minutes for Consumption, 30 minutes for Premium/Dedicated), the runtime cancels the invocation but the message may have already been partially processed
- Instance restarts: If the Function App instance restarts or is scaled down during processing, in-flight messages are abandoned and redelivered
- At-least-once delivery: Service Bus guarantees at-least-once delivery.
In rare cases, a message may be delivered more than once even without lock expiration How to Verify: Search Application Insights for the same MessageId appearing in multiple invocations: traces | where message has "MessageId" | summarize count() by tostring(customDimensions["MessageId"]) | where count_ > 1 Check if MessageLockLostException precedes the duplicate invocation Review functionTimeout settings in host.json Solution: Make your function idempotent — Design processing logic so that executing it multiple times with the same message produces the same result. Common patterns: Use the MessageId as a deduplication key Use upserts instead of inserts in your database Check for an existing record before processing Enable duplicate detection on the Service Bus entity: Set requiresDuplicateDetection: true when creating the queue or topic Configure duplicateDetectionHistoryTimeWindow (default 10 minutes) Address lock expiration — Follow the guidance in Issue 3 to prevent lock-related redelivery Use sessions for ordered, exactly-once-per-session processing when your business logic requires it 6. Scaling Issues — Messages Accumulating in the Queue Symptoms: Message count in the queue or subscription grows steadily Only one or a few instances are running despite a large backlog Target-based scaling does not appear to be working Messages are processed very slowly Why This Happens: Azure Functions uses target-based scaling for Service Bus triggers on Consumption, Elastic Premium and Flex Consumption plan. The scale controller monitors the entity's message count and active message count to decide how many instances to allocate. Scaling issues can arise from: maxConcurrentCalls is too low — Each instance processes at most maxConcurrentCalls messages concurrently. If this is set to 1 and messages take 1 second each, a single instance can only process ~60 messages/minute. 
functionTimeout or long processing — If each message takes a long time, fewer messages are processed per instance and scale-out is needed. Consumption plan cold start — New instances take time to spin up and establish connections. Premium plan with VNET_ROUTE_ALL — VNet integration can slow cold starts due to DNS resolution and private endpoint setup. Batch size misconfigured — For batch-triggered functions, a very large maxMessageBatchSize with long processing per message can bottleneck throughput. How to Verify: Check the active message count on your Service Bus entity over time Review the instance count in Metrics → Function App → Instance Count Check Application Insights for function invocation durations Verify maxConcurrentCalls and other settings in host.json Solution: Increase maxConcurrentCalls if your function can safely handle more parallelism: { "version": "2.0", "extensions": { "serviceBus": { "maxConcurrentCalls": 32 } } } Use prefetchCount to reduce latency by pre-fetching messages from the broker: { "version": "2.0", "extensions": { "serviceBus": { "prefetchCount": 32 } } } Use batched functions for high-throughput scenarios — process multiple messages per invocation: [Function(nameof(ProcessBatch))] public void ProcessBatch( [ServiceBusTrigger("myqueue", Connection = "ServiceBusConnection", IsBatched = true)] ServiceBusReceivedMessage[] messages) { foreach (var message in messages) { // Process each message } } Optimize function execution time — Reduce the per-message processing duration to allow higher throughput per instance For Premium plans, consider setting FUNCTIONS_WORKER_PROCESS_COUNT to use multiple language worker processes per instance for out-of-process language workers 7. 
Session-Enabled Queue or Subscription Issues Symptoms: Session processing stops after some time Error: "SessionLockLostException" Only some sessions are being processed while others are idle Sessions appear "stuck" and messages accumulate Why This Happens: When IsSessionsEnabled = true on the trigger, the runtime creates a ServiceBusSessionProcessor. This processor acquires a session lock, processes messages for that session, and then moves to the next session. Issues can arise from: maxConcurrentSessions is too low — The default is 8. If you have many active sessions, some will wait for a processor to become available. sessionIdleTimeout is too short — When no messages arrive for a session within this timeout, the session is released. If messages arrive slightly after the timeout, a new session lock must be acquired, adding latency. Long-running session processing — If processing a message within a session takes longer than the session lock duration, a SessionLockLostException occurs. Single-threaded per session — Within a session, messages are processed sequentially (FIFO). If one message in a session takes very long, it blocks subsequent messages in that session. How to Verify: Check Application Insights for SessionLockLostException Review the maxConcurrentSessions and sessionIdleTimeout settings in host.json Monitor the number of active sessions on your Service Bus entity Solution: Increase maxConcurrentSessions to process more sessions in parallel: { "version": "2.0", "extensions": { "serviceBus": { "maxConcurrentSessions": 32, "sessionIdleTimeout": "00:02:00" } } } Increase maxAutoLockRenewalDuration to prevent session lock expiration during long-running processing Optimize per-message processing time within sessions Review your session design — If you have a very large number of sessions with low message volume per session, consider whether sessions are the right pattern for your use case 8. 
AMQP Connection and Network Errors Symptoms: Error: "An AMQP error occurred (condition: 'amqp:link:detach-forced')." Error: "ServiceBusCommunicationException" or "SocketException" Error: "The link 'xxx' is force detached... due to broker shutting down" Intermittent connection drops and slow reconnects Trigger stops firing after a period of working correctly Why This Happens: The Service Bus trigger communicates with the Service Bus namespace over AMQP (TCP port 5671/5672). Connection issues can occur when: Network firewall blocks AMQP ports — Corporate firewalls or NSGs may block the required ports VNet integration without proper routing — Missing service endpoints, private endpoints, or DNS configuration Service Bus namespace throttling — Exceeding the messaging units for your tier causes throttling responses Idle connection timeout — Long-idle connections may be terminated by intermediate network devices Service Bus service maintenance — Broker restarts or failovers can force-detach links How to Verify: Check Application Insights for ServiceBusCommunicationException or AMQP-related errors Test connectivity from your Function App's network context: For VNet-integrated apps: use Diagnose and solve problems → Network Troubleshooter Test DNS resolution for <namespace>.servicebus.windows.net Test TCP connectivity on port 5671 Check Service Bus namespace metrics for throttling (ThrottledRequests metric) Review NSG rules on the Function App's subnet Solution: Allow AMQP traffic — Ensure ports 5671 and 5672 are open outbound in your NSG/firewall rules. Alternatively, switch to WebSockets: { "version": "2.0", "extensions": { "serviceBus": { "transportType": "amqpWebSockets" } } } Using amqpWebSockets routes traffic over port 443, which is more likely to be allowed by corporate firewalls. 
Configure private endpoints for VNet-integrated apps: Create a private endpoint for your Service Bus namespace Configure private DNS zone privatelink.servicebus.windows.net Ensure DNS zone is linked to your VNet Scale up the Service Bus tier if throttling is the issue — check the namespace's messaging units and consider upgrading from Basic to Standard or Premium Configure retry options in host.json for transient failures: { "version": "2.0", "extensions": { "serviceBus": { "clientRetryOptions": { "mode": "exponential", "maxRetries": 5, "delay": "00:00:01", "maxDelay": "00:01:00", "tryTimeout": "00:02:00" } } } } 9. Extension Bundle or NuGet Package Version Mismatch Symptoms: Error: "The 'serviceBusTrigger' binding type is not registered" Error: "Microsoft.Azure.WebJobs.Host: Error indexing method..." Function works locally but fails in Azure Missing features (e.g., ServiceBusMessageActions, IsBatched) that should be available Why This Happens: The Service Bus trigger implementation lives in the extension package. For non-compiled languages (Node.js, Python, PowerShell, Java) it is delivered via extension bundles. For compiled .NET apps, it comes from NuGet packages. If the version is outdated or mismatched, trigger types may not be registered or newer features may be unavailable. 
App Type Package Source .NET Isolated Microsoft.Azure.Functions.Worker.Extensions.ServiceBus (NuGet) .NET In-Process Microsoft.Azure.WebJobs.Extensions.ServiceBus (NuGet) Node.js, Python, Java, PowerShell Extension bundle in host.json How to Verify: For .NET apps: Check the version of the Service Bus extension NuGet package in your .csproj file For non-.NET apps: Check the extensionBundle version range in host.json Compare against the latest available versions on NuGet Solution: For .NET Isolated apps, update to the latest extension: <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.ServiceBus" Version="5.22.0" /> For non-.NET apps, ensure your extension bundle is current: { "version": "2.0", "extensionBundle": { "id": "Microsoft.Azure.Functions.ExtensionBundle", "version": "[4.*, 5.0.0)" } } For features like ServiceBusMessageActions and IsBatched, ensure you are on extension version 5.14.1 or later 10. Function Timeout Causing Message Redelivery Symptoms: Function execution is cancelled mid-processing CancellationToken is triggered before function completes Messages are redelivered and may eventually end up in the DLQ Application Insights shows FunctionTimeoutException Why This Happens: Azure Functions enforces a maximum execution timeout per invocation. The default depends on your hosting plan: Ref: Function app timeout duration Plan Default Timeout Maximum Timeout Consumption 5 minutes 10 minutes Flex Consumption 30 minutes Unlimited Premium 30 minutes Unlimited Dedicated (App Service) 30 minutes Unlimited If your Service Bus-triggered function exceeds this timeout, the runtime cancels the invocation. The message is abandoned and redelivered by Service Bus. 
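The plan limits in the table above can be turned into a quick check before changing host.json. The following is a minimal sketch in Python, not part of the Functions tooling: `PLAN_LIMITS`, `parse_timespan`, and `check_function_timeout` are hypothetical names, the limits are copied from the table above, and only the simple `hh:mm:ss` form is handled.

```python
from datetime import timedelta
from typing import Optional

# Default and maximum timeouts per plan, copied from the table above.
# None means "unlimited".
PLAN_LIMITS = {
    "Consumption": (timedelta(minutes=5), timedelta(minutes=10)),
    "Flex Consumption": (timedelta(minutes=30), None),
    "Premium": (timedelta(minutes=30), None),
    "Dedicated": (timedelta(minutes=30), None),
}


def parse_timespan(value: str) -> timedelta:
    """Parse a simple 'hh:mm:ss' string such as '00:10:00'."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def check_function_timeout(plan: str, function_timeout: Optional[str]) -> str:
    """Compare a functionTimeout value (or None if unset) with the plan limits."""
    default, maximum = PLAN_LIMITS[plan]
    if function_timeout is None:
        return f"functionTimeout not set; the {plan} default of {default} applies"
    requested = parse_timespan(function_timeout)
    if maximum is not None and requested > maximum:
        return f"{function_timeout} exceeds the {plan} maximum of {maximum}"
    return f"{function_timeout} is within the {plan} limits"
```

For example, `check_function_timeout("Consumption", "00:15:00")` reports that the value exceeds the plan maximum, which is the situation where upgrading the plan or offloading the work is the only real fix.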
How to Verify: Check Application Insights for FunctionTimeoutException Review function execution durations in Application Insights: requests | where name == "ProcessMessage" | summarize avg(duration), max(duration), percentile(duration, 95) by bin(timestamp, 1h) Check the functionTimeout setting in host.json Solution: Increase functionTimeout in host.json (within plan limits): { "version": "2.0", "functionTimeout": "00:10:00" } Upgrade your plan if you need longer execution times — Premium and Dedicated plans support unlimited timeout Optimize processing — Offload long-running work to Durable Functions, or use the claim-check pattern to move heavy payloads out of the message Use the CancellationToken to gracefully handle timeout and avoid partial processing: [Function(nameof(ProcessMessage))] public async Task ProcessMessage( [ServiceBusTrigger("myqueue", Connection = "ServiceBusConnection")] ServiceBusReceivedMessage message, CancellationToken cancellationToken) { await DoWorkAsync(message, cancellationToken); } Using Diagnose and Solve Problems The Azure Portal provides built-in diagnostics for Service Bus integration issues. How to Access: Navigate to your Function App in the Azure Portal Select Diagnose and solve problems from the left menu Search for relevant detectors: Detector What It Checks Function App Down or Reporting Errors Overall app health, host status, crash history Functions Configurations Check host.json and app settings validation Messaging Function Trigger Failure Helps troubleshoot messaging function trigger failures Network Troubleshooter VNet, private endpoint, and access restriction diagnostics These detectors run automated checks and provide targeted recommendations. Quick Troubleshooting Checklist Use this checklist to systematically diagnose Service Bus trigger issues: [ ] Connection: Is the Service Bus connection string or managed identity configuration set correctly in Application Settings? 
[ ] Entity name: Does the queue/topic/subscription name in the trigger attribute match the actual Service Bus entity?
[ ] RBAC: For managed identity, does the Function App have the Azure Service Bus Data Receiver role?
[ ] Extension version: Is the Service Bus extension (NuGet or extension bundle) up to date?
[ ] host.json: Is the serviceBus section configured correctly under extensions?
[ ] Message locks: Is maxAutoLockRenewalDuration sufficient for your function's execution time?
[ ] Dead-letter queue: Are messages accumulating in the DLQ? Check DeadLetterReason.
[ ] Function timeout: Is your function completing within the plan's timeout limit?
[ ] Network: For VNet-integrated apps, can the app reach the Service Bus namespace on the required ports?
[ ] Scaling: Are enough instances allocated? Check instance count vs. message backlog.
[ ] Exceptions: Check Application Insights for the first and most frequent exceptions.
[ ] Diagnose and Solve: Have you run the built-in detectors in the Azure Portal?

Conclusion

Azure Functions Service Bus trigger issues span a wide range, from simple connection misconfigurations to complex message lock timing problems. The key to efficient troubleshooting is a systematic approach.

Key Takeaways:
Start with the basics: Verify connection settings, entity names, and permissions first. Most issues are configuration-related.
Understand the lock lifecycle: maxAutoLockRenewalDuration, entity lock duration, and function execution time must be tuned in concert to prevent MessageLockLostException and duplicate processing.
Design for at-least-once delivery: Make your functions idempotent. Service Bus guarantees at-least-once, not exactly-once.
Use ServiceBusMessageActions for control: Disable autoCompleteMessages and settle messages explicitly for production-grade error handling.
Monitor the dead-letter queue: DLQ messages are a direct signal that something is failing. Inspect them regularly.
Tune concurrency for throughput: maxConcurrentCalls, prefetchCount, and batching settings significantly impact throughput.
Apply one fix at a time: Change one setting, restart, and recheck. Avoid multiple simultaneous changes that obscure which fix resolved the issue.

If you continue to experience issues after following these steps, consider opening a support ticket with Microsoft Azure Support, providing:
Function App name and resource group
Timestamp of when the issue started
Application Insights exceptions and traces around the failure time
Service Bus entity configuration (lock duration, max delivery count, sessions)
host.json serviceBus configuration
Recent deployment or configuration changes
Networking configuration details (if VNet-integrated)

References
Azure Service Bus trigger for Azure Functions
Azure Service Bus bindings: host.json settings
Azure Functions error handling and retries
Target-based scaling for Service Bus
Service Bus PeekLock behavior
Azure Functions networking options
Azure Functions diagnostics
Troubleshoot Azure Functions
Service Bus dead-letter queues
Azure Service Bus RBAC roles

Have questions or feedback? Leave a comment below.

How to Troubleshoot Azure Functions Host Startup Issue
Overview Azure Functions is a powerful serverless compute service that enables you to run event-driven code without managing infrastructure. When you deploy a Function App, the Azure Functions host is the runtime process responsible for discovering your functions, loading extensions and bindings, connecting to storage, and starting trigger listeners. A host startup issue occurs when the Functions runtime fails to initialize and cannot reach a healthy Running state. When this happens, you may see one or more of these symptoms: "Function host is not running" error in the Azure Portal Functions are not visible in the Functions blade Triggers stop firing — HTTP functions return 503, timer/queue functions are silent The portal shows Error state or no response on the host status endpoint Application Insights logs show repeated startup exceptions followed by restarts Log Stream shows a restart loop or no output at all This issue can be frustrating, especially when a deployment appeared to succeed and your code works correctly on your local machine. In this blog, we will explore how the host starts up, what can go wrong, and — most importantly — how to systematically diagnose and resolve startup failures. Understanding How the Host Starts Up Before diving into troubleshooting, it is important to understand the startup sequence. 
The Functions host executes the following steps each time the runtime initializes:

Host Startup Sequence

ASP.NET Core Startup
→ Register WebHost services (DI, secrets, diagnostics, middleware)
→ WebJobsScriptHostService.StartAsync()
→ Check file system (run-from-package validation)
→ Build inner ScriptHost
→ ScriptHost.InitializeAsync()
→ PreInitialize (validate settings, file system)
→ Load function metadata (function.json / decorators)
→ Load extensions and bindings (extension bundles / NuGet)
→ Create function descriptors and register triggers
→ Start trigger listeners
→ State = Running ✓ Complete

Source Code: Azure/azure-functions-host

If any step in this sequence fails, the host enters an Error state and attempts to restart with exponential backoff (starting at 1 second, up to 2 minutes between attempts). After repeated failures, the platform may report an application-level failure.

Host States

The Functions host can be in any of the following states:

State | Meaning
Default | Host has not yet been created
Starting | Host is in the process of starting
Initialized | Functions indexed, listeners not yet running
Running | Fully running: triggers active, functions discoverable
Error | Host encountered an error and will attempt restart
Stopping | Host is shutting down
Stopped | Host is stopped
Offline | Host is offline (app_offline.htm is present)

Only when the host reaches the Running state are functions visible in the portal and triggers active. The Error state triggers an automatic restart loop.
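To get a feel for how long a failing host can spend in the restart loop, the backoff described above can be sketched as follows. This is an illustration only: the doubling factor is an assumption (the text above states only the 1 second start and the 2 minute cap), and `restart_delays` is a hypothetical helper, not host code.

```python
from typing import List


def restart_delays(attempts: int, start: float = 1.0, cap: float = 120.0,
                   factor: float = 2.0) -> List[float]:
    """Return the wait in seconds before each restart attempt.

    The delay grows by `factor` after every attempt (assumed to be 2)
    and never exceeds `cap`.
    """
    delays = []
    delay = start
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


print(restart_delays(9))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 120.0, 120.0]
```

Under these assumptions the cap is reached at the eighth attempt, so after a fix is applied it can take up to two minutes before the host even tries again. Restarting the app manually avoids waiting out the current delay.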
Key Settings That Affect Startup Setting Purpose Impact If Wrong FUNCTIONS_EXTENSION_VERSION Specifies the runtime version (e.g., ~4) Host throws startup error if missing or invalid FUNCTIONS_WORKER_RUNTIME Specifies the language runtime (e.g., dotnet-isolated, node, python) Host cannot load the correct worker process AzureWebJobsStorage Connection string for the required storage account Host cannot store keys, coordinate triggers, or maintain state WEBSITE_RUN_FROM_PACKAGE Controls how deployment packages are loaded Host shuts down if package is inaccessible or corrupted WEBSITE_CONTENTAZUREFILECONNECTIONSTRING Storage connection for content share (Consumption/Premium) Host cannot access function code WEBSITE_CONTENTSHARE File share name for function content Host cannot locate function files Startup Failure Categories Category Examples Typical Symptom Configuration Missing/invalid app settings, bad host.json Host enters Error state immediately Storage AzureWebJobsStorage unreachable, expired SAS token, firewall Host fails repeatedly, storage-related exceptions Extensions/Bindings Missing extension bundle, version mismatch, load failure Host errors during extension loading phase Deployment/Packaging Corrupted zip, wrong package structure, missing files Host starts but finds no functions, or fails to load assemblies Code/Startup DI exception, external startup error, assembly conflict Host errors during initialization with code-specific exception Runtime/Worker Wrong worker runtime, language mismatch, gRPC failure Host cannot establish worker channel Networking VNet blocks outbound, DNS failure, private endpoint misconfigured Host cannot reach storage/dependencies at startup Platform Resource exhaustion, app_offline.htm, platform issue Host enters Offline state or is killed before startup completes Common Causes and Solutions 1. 
Missing or Invalid FUNCTIONS_EXTENSION_VERSION Symptoms: Host immediately fails to start Error message: "Invalid site extension configuration. Please update the App Setting 'FUNCTIONS_EXTENSION_VERSION' to a valid value (e.g. ~4)." Repeated restart loops in Application Insights Why This Happens: The FUNCTIONS_EXTENSION_VERSION app setting tells the platform which version of the Functions runtime to load. When your app runs as a hosted site extension (the normal case in Azure), this setting is validated as one of the first steps in ScriptHost.PreInitialize(). If it is missing, empty, or set to an unrecognized value, the host throws a HostInitializationException and cannot proceed. How to Verify: Navigate to your Function App in the Azure Portal Go to Settings → Configuration → Application settings Look for FUNCTIONS_EXTENSION_VERSION Confirm it is set to a valid value: ~4 (recommended), ~3 (legacy), or a specific version Solution: Set FUNCTIONS_EXTENSION_VERSION to ~4 (or the appropriate version for your app) If the setting was recently changed or removed, restore it Save and restart the Function App Ref: FUNCTIONS_EXTENSION_VERSION 2. Missing or Mismatched FUNCTIONS_WORKER_RUNTIME Symptoms: Error: "The 'FUNCTIONS_WORKER_RUNTIME' setting is required..." (diagnostic code AZFD0011) Error: "The 'FUNCTIONS_WORKER_RUNTIME' is set to 'X', which does not match the worker runtime metadata..." (diagnostic code AZFD0013) Host enters Error state after loading function metadata Why This Happens: The FUNCTIONS_WORKER_RUNTIME setting controls which language worker process the host launches (e.g., dotnet-isolated, node, python, java, powershell). During initialization, the host validates that this setting matches the actual function metadata discovered in your deployment. A mismatch — for example, deploying a Python app but having FUNCTIONS_WORKER_RUNTIME=node — results in a HostInitializationException. 
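The two settings discussed so far lend themselves to a quick offline sanity check. The sketch below is a hedged illustration, not the host's own validation: `check_core_settings` is a hypothetical helper, the runtime list contains only the values named in this article, and the version pattern (a `~N` value or a dotted version number) is an assumption about what counts as well-formed. It cannot detect a mismatch with the deployed code (AZFD0013), only missing or unfamiliar values.

```python
import re
from typing import Dict, List

# Worker runtime values named in this article; other values may exist.
KNOWN_WORKER_RUNTIMES = {
    "dotnet", "dotnet-isolated", "node", "python", "java", "powershell",
}

# Assumed shape of a well-formed value: "~4" style or a dotted version number.
VERSION_PATTERN = re.compile(r"~\d+|\d+(\.\d+){1,3}")


def check_core_settings(settings: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing obvious."""
    problems = []

    version = settings.get("FUNCTIONS_EXTENSION_VERSION", "").strip()
    if not version:
        problems.append("FUNCTIONS_EXTENSION_VERSION is missing or empty")
    elif not VERSION_PATTERN.fullmatch(version):
        problems.append(f"FUNCTIONS_EXTENSION_VERSION looks invalid: '{version}'")

    runtime = settings.get("FUNCTIONS_WORKER_RUNTIME", "").strip()
    if not runtime:
        problems.append("FUNCTIONS_WORKER_RUNTIME is missing (see AZFD0011)")
    elif runtime not in KNOWN_WORKER_RUNTIMES:
        problems.append(f"FUNCTIONS_WORKER_RUNTIME has an unfamiliar value: '{runtime}'")

    return problems
```

The input dictionary could be built from exported app settings, for example the name/value pairs returned by `az functionapp config appsettings list`.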
How to Verify: Check the app setting value in Portal → Configuration Compare against your actual project type: C# in-process: dotnet C# isolated: dotnet-isolated Node.js: node Python: python Java: java PowerShell: powershell Solution: Set FUNCTIONS_WORKER_RUNTIME to the correct value matching your function code If you recently migrated language models (e.g., in-process to isolated), update the setting accordingly Save and restart Ref: FUNCTIONS_WORKER_RUNTIME 3. Storage Account Connectivity Issues (AzureWebJobsStorage) Symptoms: Host fails to start and cannot recover Errors related to Blob storage connectivity "Unable to get function keys" or secret management errors Health check returns Unhealthy Why This Happens: The Functions host requires a valid and reachable storage account for: Storing function keys and secrets Coordinating distributed triggers (e.g., timer triggers, queue listeners) Maintaining internal state and lock management Hosting the content share for Consumption and Premium plans The host runs a background health check (WebJobsStorageHealthCheck) every 30 seconds that verifies Blob storage connectivity. If the storage account is unreachable — due to a wrong connection string, rotated keys, firewall restrictions, deleted account, or expired SAS token — the host will fail to initialize properly. How to Verify: Check your Application Settings for these storage-related values: Setting Required For AzureWebJobsStorage All plans — primary storage connection WEBSITE_CONTENTAZUREFILECONNECTIONSTRING Consumption and Premium plans — content share WEBSITE_CONTENTSHARE Consumption and Premium plans — file share name You can also verify storage connectivity using the host status endpoint. 
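One of the failure modes listed above, an expired SAS token, can be checked without touching the storage account. The sketch below assumes the connection string carries the token in a `SharedAccessSignature=` segment and that the token's `se` parameter holds the expiry time, as in standard Azure Storage SAS tokens. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional
from urllib.parse import parse_qs


def parse_connection_string(value: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs; values may themselves contain '='."""
    parts = {}
    for segment in value.split(";"):
        if "=" in segment:
            key, _, val = segment.partition("=")
            parts[key.strip()] = val.strip()
    return parts


def sas_expiry(connection_string: str) -> Optional[datetime]:
    """Return the SAS expiry time, or None if no SAS token is present."""
    sas = parse_connection_string(connection_string).get("SharedAccessSignature")
    if not sas:
        return None
    expiry_values = parse_qs(sas.lstrip("?")).get("se")
    if not expiry_values:
        return None
    expiry = datetime.fromisoformat(expiry_values[0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Date-only expiry values carry no offset; treat them as UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def is_expired(connection_string: str, now: Optional[datetime] = None) -> bool:
    """True if the connection string uses a SAS token that has already expired."""
    expiry = sas_expiry(connection_string)
    if expiry is None:
        return False  # no SAS token found, so nothing can expire here
    return expiry <= (now or datetime.now(timezone.utc))
```

A `False` result only rules out SAS expiry. Rotated account keys, firewall rules, and a deleted account still need the checks described in this section.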
Solution: Verify the storage account exists — check the Azure Portal to confirm it has not been deleted or disabled Check for rotated keys — if storage keys were recently regenerated, update the connection string: Get the new connection string from the Storage Account → Access keys blade Update AzureWebJobsStorage in your Function App settings Check storage firewall rules: Go to Storage Account → Networking Ensure the Function App has access (public endpoint, service endpoint, or private endpoint depending on your architecture) For SAS-token-based connections — verify the token has not expired (diagnostic code AZFD0006) For VNet-integrated apps: Ensure service endpoints or private endpoints are configured for the storage account Verify DNS resolution works for *.blob.core.windows.net, *.queue.core.windows.net, *.table.core.windows.net, and *.file.core.windows.net For detailed guidance, see Storage considerations for Azure Functions. 4. Invalid host.json Configuration Symptoms: Error: "The host.json file is missing the required 'version' property." (diagnostic code AZFD0009) Error: "'X' is an invalid value for host.json 'version' property." JSON deserialization failures in logs Host enters a special HandlingConfigurationParsingError mode Why This Happens: The host.json file is parsed early in the startup sequence. If it is missing the required "version": "2.0" property, contains invalid JSON syntax, or has unrecognized configuration values, the host throws a HostConfigurationException. The host then restarts in a degraded mode that skips host.json parsing — the admin APIs remain functional for diagnostics, but functions will not load. 
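The two conditions described above, valid JSON and a "version" of "2.0", can be reproduced locally before deploying. This is a minimal sketch with a hypothetical `check_host_json` helper. Python's `json` module is strict, so it may reject content (such as comments) that the host's own parser tolerates. Treat a failure as a prompt to look closer rather than as proof the host will fail.

```python
import json
from typing import List


def check_host_json(text: str) -> List[str]:
    """Return problems found in host.json content; empty means it looks fine."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    if "version" not in config:
        return ["missing required 'version' property (see AZFD0009)"]
    if config["version"] != "2.0":
        return [f"'{config['version']}' is not the expected 'version' value \"2.0\""]
    return []
```

Running it against the file in a build pipeline, for example `check_host_json(open("host.json").read())`, catches a trailing comma or a dropped version property before the package is published.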
How to Verify: Check your host.json in the deployment: Windows plans: Use Kudu → Debug Console → Navigate to site/wwwroot/host.json Linux/Flex Consumption: Use SSH or Azure CLI Validate that the file: Is valid JSON (use a JSON validator) Contains the required "version": "2.0" property Does not have unrecognized or misspelled configuration keys Minimal valid host.json: { "version": "2.0" } Typical host.json with extension bundle: { "version": "2.0", "extensionBundle": { "id": "Microsoft.Azure.Functions.ExtensionBundle", "version": "[4.*, 5.0.0)" }, "logging": { "applicationInsights": { "samplingSettings": { "isEnabled": true, "excludedTypes": "Request" } } } } Solution: Fix any JSON syntax errors Ensure "version": "2.0" is present Remove or correct any unrecognized configuration keys Redeploy or edit the file directly via Kudu (Windows plans) Ref: host.json 5. Extension Bundle or Binding Load Failures Symptoms: Host fails to start with extension-related errors in logs Error: "Referenced bundle X of version Y does not meet the required minimum version..." Error: "One or more loaded extensions do not meet the minimum requirements..." Errors referencing ScriptStartUpErrorLoadingExtensionBundle or ScriptStartUpUnableToLoadExtension Works locally but fails in Azure Why This Happens: Azure Functions uses extension bundles to provide trigger and binding implementations (Service Bus, Event Hubs, Cosmos DB, etc.). During startup, the ScriptStartupTypeLocator loads extension assemblies from either the bundle path or the bin folder. If the bundle is missing, the version is incompatible, an assembly fails to load, or the type does not implement the expected interfaces, the host throws a HostInitializationException. 
How to Verify: Check host.json for the extensionBundle configuration Verify the version range is compatible with your runtime version For compiled C# apps that don't use bundles, verify all required NuGet packages are present and compatible Solution: Ensure extensionBundle is configured in host.json: { "version": "2.0", "extensionBundle": { "id": "Microsoft.Azure.Functions.ExtensionBundle", "version": "[4.*, 5.0.0)" } } Use the correct version range for your runtime: Functions v4: [4.*, 5.0.0) For compiled .NET apps using explicit extensions: Verify all extension NuGet packages are up to date Ensure extensions.json is present in the bin folder after build Check for assembly version conflicts in the build output 6. Deployment Package Issues (WEBSITE_RUN_FROM_PACKAGE) Symptoms: Host shuts down immediately after startup Error: "Shutting down host due to presence of FAILED TO INITIALIZE RUN FROM PACKAGE.txt" Functions were visible before but disappeared after deployment "No functions found" in the portal Read-only file system errors in logs Why This Happens: When WEBSITE_RUN_FROM_PACKAGE is configured, the Functions host runs directly from a deployment package (ZIP file). During startup, the host checks the file system for failure markers. If the file FAILED TO INITIALIZE RUN FROM PACKAGE.txt is found, the host immediately shuts down the application — this is a fatal, non-recoverable error that requires redeployment. Other common package issues include an inaccessible URL, an expired SAS token, files nested in a subfolder instead of the ZIP root, or a corrupted package. 
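The "files nested in a subfolder" problem mentioned above is easy to check before a package is uploaded. The sketch below uses Python's standard `zipfile` module. `check_package_layout` is a hypothetical helper, and it looks only for host.json at the ZIP root, which is the one file the text above says must be there.

```python
import io
import zipfile
from typing import List


def check_package_layout(zip_bytes: bytes) -> List[str]:
    """Return layout problems for a deployment ZIP; empty means host.json is at the root."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    except zipfile.BadZipFile:
        return ["package is not a readable ZIP file (possibly corrupted)"]

    with archive:
        names = archive.namelist()

    if "host.json" in names:
        return []
    nested = [name for name in names if name.endswith("/host.json")]
    if nested:
        return [f"host.json found in a subfolder ({nested[0]}); files should be at the ZIP root"]
    return ["host.json not found in the package"]
```

A nested host.json usually means the parent folder was zipped instead of its contents. Re-creating the archive from inside the publish folder fixes the layout.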
WEBSITE_RUN_FROM_PACKAGE Values: Value Behavior 1 Runs from a local package in d:\home\data\SitePackages (Windows) or /home/data/SitePackages (Linux) <URL> Runs from a remote package at the specified URL (required for Linux Consumption) Not set Traditional deployment — files extracted to wwwroot How to Verify: Check WEBSITE_RUN_FROM_PACKAGE in Application Settings If value is 1: Go to Kudu → Debug Console Navigate to d:\home\data\SitePackages Verify a .zip file exists and packagename.txt points to it If value is a URL: Try accessing the URL directly — it should download the ZIP Check for expired SAS tokens (403 response) or missing blobs (404 response) Verify package contents: Download and extract the ZIP Confirm host.json and function files are at the root level, not in a nested subfolder Common Issues: Problem Symptom Fix Expired SAS token Package URL returns 403 Generate new SAS with longer expiry Package URL not accessible Package URL returns 404 Verify blob exists and URL is correct Wrong package structure Files in subfolder Ensure files are at ZIP root Corrupted package Host startup errors Redeploy with a fresh package Storage firewall blocking Timeout errors Allow Function App access to storage Solution: Redeploy your Function App using your preferred deployment method If using URL-based packages, regenerate the SAS token or use managed identity-based access If the failure marker file exists, redeployment will overwrite it Restart the Function App after fixing: Ref: WEBSITE_RUN_FROM_PACKAGE 7. 
Code-Level Startup Exceptions (DI and External Startup) Symptoms: Host Error state with application-specific exception in logs Error: "Error configuring services in an external startup class" (diagnostic code AZFD0005) Dependency injection failures (InvalidOperationException, TypeLoadException) Errors in Program.cs or Startup.cs of your application Assembly binding or version conflict exceptions Why This Happens: For isolated worker (.NET) apps, your Program.cs runs custom startup code before the worker connects to the host. For in-process (.NET) apps, custom IWebJobsStartup implementations run during host initialization. If this code throws — for example, a missing dependency, a failed external service connection, or a type load error — the host catches the exception and enters an Error state with a HostInitializationException. How to Verify: Check Application Insights Exceptions table for the specific exception type and stack trace Look for errors containing AZFD0005 (external startup error) Review your Program.cs / Startup.cs for: Service registrations that depend on external resources (databases, APIs, Key Vault) Missing NuGet packages or assembly version mismatches Configuration values that may differ between local and Azure environments Solution: Fix the exception identified in logs — the stack trace usually points directly to the failing code Ensure all required environment variables and connection strings are set in Application Settings For assembly conflicts, check that all NuGet package versions are compatible and aligned Consider making external-service connections resilient by deferring initialization or adding retry logic Test startup locally with the same environment variables as Azure 8. Language Worker Channel Failure Symptoms: Error: "Failed to start Language Worker Channel for language: {runtime}" Error: "Failed to start Rpc Server. Check if your app is hitting connection limits." 
Host starts but cannot communicate with the language worker process Timeout errors during worker initialization Why This Happens: For out-of-process languages (Node.js, Python, Java, PowerShell, .NET Isolated), the Functions host communicates with a separate worker process over gRPC. If the host cannot start the gRPC server, or the worker process fails to launch or connect, the host throws a HostInitializationException. Common causes include: Port conflicts Missing language runtime or incorrect version Worker process crashes on startup Resource exhaustion (memory, file handles) How to Verify: Check Application Insights for gRPC or worker-related errors Verify the correct language runtime version is installed: For Node.js: Check WEBSITE_NODE_DEFAULT_VERSION For Python: Check the Python version in Configuration → General settings For Java: Check FUNCTIONS_WORKER_JAVA_LOAD_APP_LIBS and Java version For .NET Isolated: Check target framework in the deployed assemblies Check if the Function App is hitting plan resource limits Solution: Ensure the correct language runtime version is configured For Linux Consumption, verify the correct runtime stack is selected in Configuration → General settings If resource limits are suspected, consider scaling up to a higher plan tier Restart the Function App to clear temporary port or resource issues 9. 
Networking Blocking Required Dependencies Symptoms: Host fails to start in VNet-integrated apps Timeout errors connecting to storage or other Azure services Works without VNet integration, fails with it enabled DNS resolution failures in logs NSG or firewall-related errors Why This Happens: During startup, the Functions host must reach several external endpoints: Azure Storage (Blob, Queue, Table, File) — for keys, triggers, and state Extension bundle CDN — to download extension bundles (first run or cold start) Azure Key Vault — if Key Vault references are used in app settings Application Insights — for telemetry (non-blocking, but can delay if timing out) If VNet integration, NSG rules, forced tunneling, or a firewall blocks these outbound connections, the host cannot complete startup. How to Verify: Check if the Function App has VNet integration enabled (Networking blade) Review NSG rules on the integrated subnet — ensure outbound to Azure services is allowed For apps with forced tunneling, verify the firewall/NVA allows required endpoints Check DNS resolution for storage endpoints from within the VNet context Solution: Add NSG rules or firewall rules to allow outbound traffic to the required endpoints Configure service endpoints or private endpoints for storage on the integrated subnet Ensure DNS resolution works for all required endpoints For private DNS zones, ensure proper zone links and records exist for storage See Azure Functions networking options for detailed configuration guidance. 10. app_offline.htm Causing Offline State Symptoms: Host status shows Offline All requests return an offline page Portal shows the app is running but functions return errors Why This Happens: If a file named app_offline.htm exists in the function app's script root directory, the host detects it during startup and enters the Offline state. 
Some deployment tools create this file during deployment to gracefully take the app offline, and it should be removed automatically when deployment completes. If it is left behind (for example, due to a failed deployment), the host remains offline.

How to Verify:
Windows plans: Go to Kudu → Debug Console → Navigate to site/wwwroot and look for app_offline.htm
Linux: Use SSH or Azure CLI to check for the file

Solution:
Delete app_offline.htm from the app's root directory
The host will automatically detect the deletion and restart into a normal state
If the file reappears after deletion, investigate your deployment pipeline; it may be creating the file but failing to remove it

Using Diagnose and Solve Problems

The Azure Portal provides built-in diagnostics specifically designed for Functions host startup issues.

How to Access:
Navigate to your Function App in the Azure Portal
Select Diagnose and solve problems from the left menu
Search for relevant detectors:

Detector | What It Checks
Function App Down or Reporting Errors | Overall app health, host status, crash history
Function App Startup Issue | Specific startup failure analysis, configuration validation
Functions Configurations Check | host.json and app settings validation
Functions Deployment | Recent deployment status and potential issues
Network Troubleshooter | VNet, private endpoint, and access restriction diagnostics

These detectors run automated checks against your Function App and provide targeted recommendations. The detectors often identify the root cause faster than manual investigation.

Verifying Host Status via REST API

You can check the host status programmatically to determine the current state and any reported errors.

Get Host Status:
curl "https://<app>.azurewebsites.net/admin/host/status?code=<master-key>"

See Admin API for details.
The state field is the single most important indicator:

State | Action
Running | Host is healthy; investigate function-level issues
Error | Host startup failed; check the errors array for root cause
Offline | app_offline.htm present; check deployment state
No response / timeout | Host cannot serve requests; check platform health and networking

List Functions (verify discovery):
curl "https://<app>.azurewebsites.net/admin/functions?code=<master-key>"

Quick Troubleshooting Checklist

Use this checklist to systematically diagnose host startup issues:
[ ] Host status: Check /admin/host/status. Is the state Running, Error, or Offline?
[ ] First error: Check Application Insights Exceptions or Log Stream. What is the first exception after the latest restart?
[ ] FUNCTIONS_EXTENSION_VERSION: Is it set to a valid value (e.g., ~4)?
[ ] FUNCTIONS_WORKER_RUNTIME: Is it set correctly and does it match the deployed code?
[ ] AzureWebJobsStorage: Is the connection string valid? Is the storage account reachable from the app's network context?
[ ] host.json: Does it exist, contain valid JSON, and include "version": "2.0"?
[ ] Extension bundle: Is extensionBundle configured with a compatible version range?
[ ] Package deployment: If using WEBSITE_RUN_FROM_PACKAGE, is the package accessible and correctly structured?
[ ] Startup code: For .NET apps, does Program.cs / startup code throw during DI registration?
[ ] Networking: If VNet-integrated, can the app reach storage, Key Vault, and extension CDN endpoints?
[ ] Offline file: Is app_offline.htm present in the root directory?
[ ] Diagnose and Solve: Have you run the Function App Startup Issue detector in the Azure Portal?
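The first checklist item, reading the state from /admin/host/status, can be scripted so the next step is suggested automatically. The sketch below takes the raw response body (for example, the output of the curl command above) and relies only on the `state` field and `errors` array mentioned in the text. Nothing else about the payload shape is assumed, and `triage` is a hypothetical helper.

```python
import json
from typing import Optional

# Next step for each state, paraphrasing the state/action table in this article.
NEXT_STEP = {
    "Running": "Host is healthy; investigate function-level issues",
    "Error": "Host startup failed; check the errors array for the root cause",
    "Offline": "app_offline.htm is present; check deployment state",
}


def triage(status_body: Optional[str]) -> str:
    """Map a host status response body to a suggested next step."""
    if not status_body:
        return "No response; check platform health and networking"

    status = json.loads(status_body)
    state = status.get("state", "")
    advice = NEXT_STEP.get(state, f"Unhandled state '{state}'; wait and re-check")

    errors = status.get("errors") or []
    if errors:
        # Surface only the first error, since later ones are often a cascade.
        advice += f" (first error: {errors[0]})"
    return advice
```

Transient states such as Starting fall through to the "wait and re-check" branch, which matches the advice to change one thing, restart, and look again.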
Diagnostic Event Codes Reference
When reviewing logs, look for these Azure Functions diagnostic codes that are related to startup failures:

Code | Name | Meaning
AZFD0005 | External Startup Error | Error in a custom IWebJobsStartup class
AZFD0006 | SAS Token Expiring | AzureWebJobsStorage SAS token is expiring or expired
AZFD0009 | Unable to Parse host.json | host.json file is missing or has invalid content
AZFD0011 | Missing FUNCTIONS_WORKER_RUNTIME | The required worker runtime setting is not configured
AZFD0013 | Worker Runtime Mismatch | FUNCTIONS_WORKER_RUNTIME does not match deployed function metadata

These codes appear in Application Insights traces and diagnostic event logs. See: Diagnostic Events

Conclusion
Azure Functions host startup failures can be caused by a wide range of issues, from a simple missing app setting to complex networking misconfigurations. The key to efficient troubleshooting is a systematic approach.

Key Takeaways:
Always check host status first: the /admin/host/status endpoint tells you the current state and any errors
Find the first error, not the cascade: look for the initial exception after the most recent restart
Validate configuration: FUNCTIONS_EXTENSION_VERSION, FUNCTIONS_WORKER_RUNTIME, and AzureWebJobsStorage are the three settings that cause the most startup failures
Check host.json: a missing version property or invalid JSON is a common and easily fixable cause
Verify deployment artifacts: ensure your package is complete, correctly structured, and accessible
Use built-in diagnostics: the Diagnose and Solve Problems detectors are purpose-built for these issues
Apply one fix at a time: change one setting, restart, and recheck. Avoid multiple simultaneous changes that obscure which fix resolved the issue

If you continue to experience startup issues after following these steps, consider opening a support ticket with Microsoft Azure Support, providing:
Function App name and resource group
Timestamp of when the issue started
Host status endpoint response (copy the full JSON)
The first exception from Application Insights or Log Stream
Recent deployment or configuration changes
Networking configuration details (if VNet-integrated)

References
Azure Functions host.json reference
Azure Functions app settings reference
Azure Functions deployment technologies
Storage considerations for Azure Functions
Azure Functions networking options
Azure Functions diagnostics
Azure Functions Admin API (host status)
Run your functions from a package file
Troubleshoot Azure Functions

Have questions or feedback? Leave a comment below.

Network Connectivity Check APIs for Logic App Standard
Introduction
When your Logic App Standard is integrated with a virtual network (VNET), you can use these APIs to troubleshoot connectivity issues to downstream resources like SQL databases, Storage Accounts, Service Bus, Key Vault, and more. The checks run directly from the worker hosting your Logic App, so the results reflect the actual network path your workflows use.

API Overview
API | HTTP Method | Route Suffix | Purpose
ConnectivityCheck | POST | /connectivityCheck | Validates end-to-end connectivity to an Azure resource (SQL, Key Vault, Storage, Service Bus, etc.)
DnsCheck | POST | /dnsCheck | Performs DNS resolution for a hostname
TcpPingCheck | POST | /tcpPingCheck | Performs a TCP ping to a host and port

How to Call
Using Azure API Playground
Sign in with your Azure account: https://portal.azure.com/#view/Microsoft_Azure_Resources/ArmPlayground.ReactView
Use the POST method with the URLs below. Instead of the API Playground, you can also use PowerShell or az rest.

URL Pattern
Production slot:
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Web/sites/{logicAppName}/connectivityCheck?api-version=2026-03-01-preview
Deployment slot:
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Web/sites/{logicAppName}/slots/{slotName}/connectivityCheck?api-version=2026-03-01-preview
Replace connectivityCheck with dnsCheck or tcpPingCheck as needed. All request bodies must be JSON.

1. ConnectivityCheck
Tests end-to-end connectivity from your Logic App to an Azure resource. This validates DNS, TCP, and authentication in a single call.
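To avoid hand-editing the long management URLs, the URL pattern above can be generated. This is a small Python sketch that only assembles the string from the pattern shown in this post; the function name build_check_url and its parameter names are hypothetical, and the api-version value is copied from the examples here rather than verified independently.

```python
from typing import Optional

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2026-03-01-preview"  # value used in this post's examples
CHECKS = ("connectivityCheck", "dnsCheck", "tcpPingCheck")


def build_check_url(
    subscription_id: str,
    resource_group: str,
    logic_app_name: str,
    check: str = "connectivityCheck",
    slot_name: Optional[str] = None,
) -> str:
    """Build the POST URL for one of the three check APIs (production or slot)."""
    if check not in CHECKS:
        raise ValueError(f"check must be one of {CHECKS}, got {check!r}")
    path = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Web/sites/{logic_app_name}"
    )
    if slot_name:
        # Deployment slots add a /slots/{slotName} segment before the route suffix.
        path += f"/slots/{slot_name}"
    return f"{ARM_ENDPOINT}{path}/{check}?api-version={API_VERSION}"
```

The resulting URL can then be passed to whichever client you prefer, for example az rest with --method post, --url, and --body, together with a JSON request body.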
Supported Provider Types
ProviderType | Use For
KeyVault | Azure Key Vault
SQL | Azure SQL Database / SQL Server
ServiceBus | Azure Service Bus
EventHubs | Azure Event Hubs
BlobStorage | Azure Blob Storage
FileShare | Azure File Share (tests port 443 only; see the Port 445 limitation)
QueueStorage | Azure Queue Storage
TableStorage | Azure Table Storage
Web | Any HTTP/HTTPS endpoint

Credential Types
CredentialType | When to Use
ConnectionString | You have a connection string to provide directly
Authentication | You have an endpoint URL with username and password
CredentialReference | You want to reference an existing connection string or app setting by name
AppSetting | You want to reference an app setting configured on the Logic App
ManagedIdentity | Your Logic App uses Managed Identity to authenticate

Sample Request: Connection String (SQL Database)
POST https://management.azure.com/subscriptions/{subId}/resourceGroups/{rg}/providers/Microsoft.Web/sites/{logicAppName}/connectivityCheck?api-version=2026-03-01-preview
Content-Type: application/json

{
  "properties": {
    "providerType": "SQL",
    "credentials": {
      "credentialType": "ConnectionString",
      "connectionString": "Server=tcp:myserver.database.windows.net,1433;Database=mydb;User ID=myuser;Password=mypassword;Encrypt=True;TrustServerCertificate=False;"
    },
    "resourceMetadata": { "entityName": "" }
  }
}

Sample Request: App Setting Reference (Service Bus)
Use this when your connection string is stored in an app setting on the Logic App (e.g., ServiceBusConnection).

{
  "properties": {
    "providerType": "ServiceBus",
    "credentials": {
      "credentialType": "AppSetting",
      "appSetting": "ServiceBusConnection"
    },
    "resourceMetadata": { "entityName": "myqueue" }
  }
}

Sample Request: Managed Identity (Blob Storage)
Use this when your Logic App authenticates using Managed Identity.
{
  "properties": {
    "providerType": "BlobStorage",
    "credentials": {
      "credentialType": "ManagedIdentity",
      "managedIdentity": {
        "targetResourceUrl": "https://mystorageaccount.blob.core.windows.net",
        "clientId": ""
      }
    },
    "resourceMetadata": { "entityName": "" }
  }
}

Tip: Leave clientId empty to use the system-assigned managed identity. Provide a client ID to use a specific user-assigned managed identity.

2. DnsCheck
Tests whether a hostname can be resolved from your Logic App's worker. This is useful for verifying that private DNS zones and private endpoints are configured correctly.

Sample Request
POST https://management.azure.com/subscriptions/{subId}/resourceGroups/{rg}/providers/Microsoft.Web/sites/{logicAppName}/dnsCheck?api-version=2026-03-01-preview
Content-Type: application/json

{ "properties": { "dnsName": "myserver.database.windows.net" } }

3. TcpPingCheck
Tests whether a TCP connection can be established from your Logic App to a specific host and port. This is useful for checking if a port is open and reachable through your VNET.

Sample Request
POST https://management.azure.com/subscriptions/{subId}/resourceGroups/{rg}/providers/Microsoft.Web/sites/{logicAppName}/tcpPingCheck?api-version=2026-03-01-preview
Content-Type: application/json

{ "properties": { "host": "myserver.database.windows.net", "port": "1433" } }

Port 445 (SMB / Azure File Share): Known Limitation
Port 445 cannot be reliably tested using TcpPingCheck or ConnectivityCheck with the FileShare provider type.

Restricted Outgoing Ports
Regardless of address, applications cannot connect anywhere using ports 445, 137, 138, and 139. In other words, even when connecting to a non-private IP address or to an address inside a virtual network, connections to ports 445, 137, 138, and 139 are not permitted.

Using an AI Agent to Troubleshoot and Fix Azure Function App Issues
TOC
Preparation
Troubleshooting Workflow
Conclusion

Preparation

Topic: Required tools
AI agent: for example, Copilot CLI / OpenCode / Hermes / OpenClaw, etc. In this example, we use Copilot CLI.
Model access: for example, Anthropic Claude Opus.
Relevant skills: this example does not use skills, but using relevant skills can speed up troubleshooting.

Topic: Organizational compliance
Enterprise-level projects are sensitive, so you must confirm with the appropriate stakeholders before using an AI agent on them. Enterprise environments may also have strict standards for AI agent usage.

Topic: Network limitations
If the process involves restarting the Function App container or changing settings that trigger a restart, communication between the user and the agent may be interrupted, and you will need to use /resume.
If the agent needs internet access for investigation, the app must have outbound connectivity.
If the Kudu container cannot be used because of network issues, this type of investigation cannot be carried out.

Topic: Permission limitations
If you are using Azure blessed images, according to the official documentation, the containers use the fixed password Docker!. However, if you are using a custom container, you will need to provide an additional login method.
For resources the agent does not already have permission to investigate, you will need to enable the system-assigned managed identity (SAMI) and assign the appropriate RBAC roles.

Troubleshooting Workflow
Let’s use a classic case where an HTTP trigger cannot be tested from the Azure Portal. When clicking Test/Run in the Azure Portal, an error message appears. At the same time, however, the home page does not show any abnormal status.

At this point, we first obtain the Function App’s SAMI and assign it the Owner role for the entire resource group. This is only for demonstration purposes.
In practice, you should follow the principle of least privilege and scope permissions down to only the specific resources and operations that are actually required.

Next, go to the Kudu container, which is the always-on maintenance container dedicated to the app. Install and enable Copilot CLI. Then we can describe the problem we are encountering.

After the agent processes the issue and interacts with you further, it can generate a reasonable investigation report. In this example, it appears that the Function App’s Storage Account access key had been rotated previously, but the Function App had not updated the corresponding environment variable.

Once we understand the issue, we could perform the follow-up actions ourselves. However, to demonstrate the agent’s capabilities, you can also allow it to fix the problem directly, provided that you have granted the corresponding permissions through SAMI. During the process, the container restart will disconnect the session, so you will need to return to the Kudu container and resume the previous session so it can continue.

Finally, it will inform you that the issue has been fixed, and then you can validate the result. This is the validation result, and it looks like the repair was successful.

Conclusion
After each repair, we can even extract the experience from that case into a skill and store it in a Storage Account for future reuse. In this way, we can not only reduce the agent’s initial investigation time for similar issues, but also save tokens. This makes both time and cost management more efficient.

Azure Functions Ignite 2025 Update
Azure Functions is redefining event-driven applications and high-scale APIs in 2025, accelerating innovation for developers building the next generation of intelligent, resilient, and scalable workloads. This year, our focus has been on empowering AI and agentic scenarios: remote MCP server hosting, bulletproofing agents with Durable Functions, and first-class support for critical technologies like OpenTelemetry, .NET 10 and Aspire. With major advances in serverless Flex Consumption, enhanced performance, security, and deployment fundamentals across Elastic Premium and Flex, Azure Functions is the platform of choice for building modern, enterprise-grade solutions. Remote MCP Model Context Protocol (MCP) has taken the world by storm, offering an agent a mechanism to discover and work deeply with the capabilities and context of tools. When you want to expose MCP/tools to your enterprise or the world securely, we recommend you think deeply about building remote MCP servers that are designed to run securely at scale. Azure Functions is uniquely optimized to run your MCP servers at scale, offering serverless and highly scalable features of Flex Consumption plan, plus two flexible programming model options discussed below. All come together using the hardened Functions service plus new authentication modes for Entra and OAuth using Built-in authentication. Remote MCP Triggers and Bindings Extension GA Back in April, we shared a new extension that allows you to author MCP servers using functions with the MCP tool trigger. That MCP extension is now generally available, with support for C#(.NET), Java, JavaScript (Node.js), Python, and Typescript (Node.js). The MCP tool trigger allows you to focus on what matters most: the logic of the tool you want to expose to agents. Functions will take care of all the protocol and server logistics, with the ability to scale out to support as many sessions as you want to throw at it. 
[Function(nameof(GetSnippet))]
public object GetSnippet(
    [McpToolTrigger(GetSnippetToolName, GetSnippetToolDescription)] ToolInvocationContext context,
    [BlobInput(BlobPath)] string snippetContent
)
{
    return snippetContent;
}

New: Self-hosted MCP Server (Preview)
If you’ve built servers with official MCP SDKs and want to run them as remote cloud-scale servers without rewriting any code, this public preview is for you. You can now self-host your MCP server on Azure Functions: keep your existing Python, TypeScript, .NET, or Java code and get rapid 0 to N scaling, built-in server authentication and authorization, consumption-based billing, and more from the underlying Azure Functions service. This feature complements the Azure Functions MCP extension for building MCP servers using the Functions programming model (triggers & bindings). Pick the path that fits your scenario: build with the extension or with standard MCP SDKs. Either way you benefit from the same scalable, secure, and serverless platform.

Use the official MCP SDKs:

# Excerpt: mcp, NWS_API_BASE, make_nws_request, and format_alert are defined elsewhere in the server module.
@mcp.tool()
async def get_alerts(state: str) -> str:
    """Get weather alerts for a US state.

    Args:
        state: Two-letter US state code (e.g. CA, NY)
    """
    url = f"{NWS_API_BASE}/alerts/active/area/{state}"
    data = await make_nws_request(url)

    if not data or "features" not in data:
        return "Unable to fetch alerts or no alerts found."

    if not data["features"]:
        return "No active alerts for this state."

    alerts = [format_alert(feature) for feature in data["features"]]
    return "\n---\n".join(alerts)

Use Azure Functions Flex Consumption Plan's serverless compute using Custom Handlers in host.json:

{
  "version": "2.0",
  "configurationProfile": "mcp-custom-handler",
  "customHandler": {
    "description": {
      "defaultExecutablePath": "python",
      "arguments": ["weather.py"]
    },
    "http": { "DefaultAuthorizationLevel": "anonymous" },
    "port": "8000"
  }
}

Learn more about MCPTrigger and self-hosted MCP servers at https://aka.ms/remote-mcp

Built-in MCP server authorization (Preview)
The built-in authentication and authorization feature can now be used for MCP server authorization, using a new preview option. You can quickly define identity-based access control for your MCP servers with Microsoft Entra ID or other OpenID Connect providers. Learn more at https://aka.ms/functions-mcp-server-authorization.

Better together with Foundry agents
Microsoft Foundry is the starting point for building intelligent agents, and Azure Functions is the natural next step for extending those agents with remote MCP tools. Running your tools on Functions gives you clean separation of concerns, reuse across multiple agents, and strong security isolation. And with built-in authorization, Functions enables enterprise-ready authentication patterns, from calling downstream services with the agent’s identity to operating on behalf of end users with their delegated permissions. Build your first remote MCP server and connect it to your Foundry agent at https://aka.ms/foundry-functions-mcp-tutorial.

Agents

Microsoft Agent Framework 2.0 (Public Preview Refresh)
We’re excited about the preview refresh 2.0 release of Microsoft Agent Framework, which builds on battle-hardened work from Semantic Kernel and AutoGen. Agent Framework is an outstanding solution for building multi-agent orchestrations that are both simple and powerful.
Azure Functions is a strong fit to host Agent Framework with the service’s extreme scale, serverless billing, and enterprise grade features like VNET networking and built-in auth. Durable Task Extension for Microsoft Agent Framework (Preview) The durable task extension for Microsoft Agent Framework transforms how you build production-ready, resilient and scalable AI agents by bringing the proven durable execution (survives crashes and restarts) and distributed execution (runs across multiple instances) capabilities of Azure Durable Functions directly into the Microsoft Agent Framework. Combined with Azure Functions for hosting and event-driven execution, you can now deploy stateful, resilient AI agents that automatically handle session management, failure recovery, and scaling, freeing you to focus entirely on your agent logic. Key features of the durable task extension include: Serverless Hosting: Deploy agents on Azure Functions with auto-scaling from thousands of instances to zero, while retaining full control in a serverless architecture. 
Automatic Session Management: Agents maintain persistent sessions with full conversation context that survives process crashes, restarts, and distributed execution across instances
Deterministic Multi-Agent Orchestrations: Coordinate specialized durable agents with predictable, repeatable, code-driven execution patterns
Human-in-the-Loop with Serverless Cost Savings: Pause for human input without consuming compute resources or incurring costs
Built-in Observability with Durable Task Scheduler: Deep visibility into agent operations and orchestrations through the Durable Task Scheduler UI dashboard

Create a durable agent:

endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini")

# Create an AI agent following the standard Microsoft Agent Framework pattern
agent = AzureOpenAIChatClient(
    endpoint=endpoint,
    deployment_name=deployment_name,
    credential=AzureCliCredential()
).create_agent(
    instructions="""You are a professional content writer who creates engaging,
    well-structured documents for any given topic. When given a topic, you will:
    1. Research the topic using the web search tool
    2. Generate an outline for the document
    3. Write a compelling document with proper formatting
    4. Include relevant examples and citations""",
    name="DocumentPublisher",
    tools=[
        AIFunctionFactory.Create(search_web),
        AIFunctionFactory.Create(generate_outline)
    ]
)

# Configure the function app to host the agent with durable session management
app = AgentFunctionApp(agents=[agent])
app.run()

Durable Task Scheduler dashboard for agent and agent workflow observability and debugging

For more information on the durable task extension for Agent Framework, see the announcement: https://aka.ms/durable-extension-for-af-blog.

Flex Consumption Updates
As you know, Flex Consumption means serverless without compromise.
It combines elastic scale and pay-for-what-you-use pricing with the controls you expect: per-instance concurrency, longer executions, VNet/private networking, and Always Ready instances to minimize cold starts. Since launching GA at Ignite 2024 last year, Flex Consumption has had tremendous growth with over 1.5 billion function executions per day and nearly 40 thousand apps.

Here’s what’s new for Ignite 2025:
512 MB instance size (GA). Right-size lighter workloads, scale farther within default quota.
Availability Zones (GA). Distribute instances across zones.
Rolling updates (Public Preview). Unlock zero-downtime deployments of code or config by setting a single configuration. See below for more information.
Even more improvements, including new diagnostic settings to route logs/metrics, Key Vault and App Config references, new regions, and Custom Handler support.

To get started, review Flex Consumption samples, or dive into the documentation to see how Flex can support your workloads.

Migrating to Azure Functions Flex Consumption
Migrating to Flex Consumption is simple with our step-by-step guides and agentic tools. Move your Azure Functions apps or AWS Lambda workloads, update your code and configuration, and take advantage of new automation tools. With Linux Consumption retiring, now is the time to switch. For more information, see:
Migrate Consumption plan apps to the Flex Consumption plan
Migrate AWS Lambda workloads to Azure Functions

Durable Functions
Durable Functions introduces powerful new features to help you build resilient, production-ready workflows:
Distributed Tracing: lets you track requests across components and systems, giving you deep visibility into orchestration and activities with support for App Insights and OpenTelemetry.
Extended Sessions support in .NET isolated: improves performance by caching orchestrations in memory, ideal for fast sequential activities and large fan-out/fan-in patterns.
Orchestration versioning (public preview): enables zero-downtime deployments and backward compatibility, so you can safely roll out changes without disrupting in-flight workflows Durable Task Scheduler Updates Durable Task Scheduler Dedicated SKU (GA): Now generally available, the Dedicated SKU offers advanced orchestration for complex workflows and intelligent apps. It provides predictable pricing for steady workloads, automatic checkpointing, state protection, and advanced monitoring for resilient, reliable execution. Durable Task Scheduler Consumption SKU (Public Preview): The new Consumption SKU brings serverless, pay-as-you-go orchestration to dynamic and variable workloads. It delivers the same orchestration capabilities with flexible billing, making it easy to scale intelligent applications as needed. For more information see: https://aka.ms/dts-ga-blog OpenTelemetry support in GA Azure Functions OpenTelemetry is now generally available, bringing unified, production-ready observability to serverless applications. Developers can now export logs, traces, and metrics using open standards—enabling consistent monitoring and troubleshooting across every workload. Key capabilities include: Unified observability: Standardize logs, traces, and metrics across all your serverless workloads for consistent monitoring and troubleshooting. Vendor-neutral telemetry: Integrate seamlessly with Azure Monitor or any OpenTelemetry-compliant backend, ensuring flexibility and choice. Broad language support: Works with .NET (isolated), Java, JavaScript, Python, PowerShell, and TypeScript. Start using OpenTelemetry in Azure Functions today to unlock standards-based observability for your apps. For step-by-step guidance on enabling OpenTelemetry and configuring exporters for your preferred backend, see the documentation. Deployment with Rolling Updates (Preview) Achieving zero-downtime deployments has never been easier. 
The Flex Consumption plan now offers rolling updates as a site update strategy. Set a single property, and all future code deployments and configuration changes will be released with zero-downtime. Instead of restarting all instances at once, the platform now drains existing instances in batches while scaling out the latest version to match real-time demand. This ensures uninterrupted in-flight executions and resilient throughput across your HTTP, non-HTTP, and Durable workloads – even during intensive scale-out scenarios. Rolling updates are now in public preview. Learn more at https://aka.ms/functions/rolling-updates. Secure Identity and Networking Everywhere By Design Security and trust are paramount. Azure Functions incorporates proven best practices by design, with full support for managed identity—eliminating secrets and simplifying secure authentication and authorization. Flex Consumption and other plans offer enterprise-grade networking features like VNETs, private endpoints, and NAT gateways for deep protection. The Azure Portal streamlines secure function creation, and updated scenarios and samples showcase these identity and networking capabilities in action. Built-in authentication (discussed above) enables inbound client traffic to use identity as well. Check out our updated Functions Scenarios page with quickstarts or our secure samples gallery to see these identity and networking best practices in action. .NET 10 Azure Functions now supports .NET 10, bringing in a great suite of new features and performance benefits for your code. .NET 10 is supported on the isolated worker model, and it’s available for all plan types except Linux Consumption. As a reminder, support ends for the legacy in-process model on November 10, 2026, and the in-process model is not being updated with .NET 10. To stay supported and take advantage of the latest features, migrate to the isolated worker model. 
Aspire
Aspire is an opinionated stack that simplifies development of distributed applications in the cloud. The Azure Functions integration for Aspire enables you to develop, debug, and orchestrate an Azure Functions .NET project as part of an Aspire solution. Aspire publish deploys your functions directly to Azure Functions on Azure Container Apps. Aspire 13 includes an updated preview version of the Functions integration that acts as a release candidate with go-live support. The package will be moved to GA quality with Aspire 13.1.

Java 25, Node.js 24
Azure Functions now supports Java 25 and Node.js 24 in preview. You can now develop functions using these versions locally and deploy them to Azure Functions plans. Learn how to upgrade your apps to these versions here.

In Summary
Ready to build what’s next? Update your Azure Functions Core Tools today and explore the latest samples and quickstarts to unlock new capabilities for your scenarios. The guided quickstarts run and deploy in under 5 minutes, and incorporate best practices, from architecture to security to deployment. We’ve made it easier than ever to scaffold, deploy, and scale real-world solutions with confidence. The future of intelligent, scalable, and secure applications starts now. Jump in and see what you can create!

Announcing General Availability: Azure Logic Apps Standard Custom Code with .NET 8
We’re excited to announce the General Availability (GA) of Custom Code support in Azure Logic Apps Standard with .NET 8. This release marks a significant step forward in enabling developers to build more powerful, flexible, and maintainable integration workflows using familiar .NET tools and practices. With this capability, developers can now embed custom .NET 8 code directly within their Logic Apps Standard workflows. This unlocks advanced logic scenarios, promotes code reuse, and allows seamless integration with existing .NET libraries and services, making it easier than ever to build enterprise-grade solutions on Azure.

What’s New in GA
This GA release introduces several key enhancements that improve the development experience and expand the capabilities of custom code in Logic Apps:

Bring Your Own Packages
Developers can now include and manage their own NuGet packages within custom code projects without having to resolve conflicts with the dependencies used by the language worker host. The update loads the assembly dependencies of the custom code project into a separate assembly context, allowing you to bring any .NET 8 compatible version of the dependent assemblies that your project needs. There are only three exceptions to this rule:
Microsoft.Extensions.Logging.Abstractions
Microsoft.Extensions.DependencyInjection.Abstractions
Microsoft.Azure.Functions.Extensions.Workflows.Abstractions

Dependency Injection Native Support
Custom code now supports native Dependency Injection (DI), enabling better separation of concerns and more testable, maintainable code. This aligns with modern .NET development patterns and simplifies service management within your custom logic.
To enable Dependency Injection, developers will need to provide a StartupConfiguration class, defining the list of dependencies:

using Microsoft.Azure.Functions.Extensions.Workflows;
using Microsoft.Extensions.DependencyInjection;

public class StartupConfiguration : IConfigureStartup
{
    /// <summary>
    /// Configures services for the Azure Functions application.
    /// </summary>
    /// <param name="services">The service collection to configure.</param>
    public void Configure(IServiceCollection services)
    {
        // Register the routing and discount services with dependency injection
        services.AddSingleton<IRoutingService, OrderRoutingService>();
        services.AddSingleton<IDiscountService, DiscountService>();
    }
}

You will also need to receive those registered services through your custom code class constructor:

using Microsoft.Extensions.Logging;

public class MySampleFunction
{
    private readonly ILogger<MySampleFunction> logger;
    private readonly IRoutingService routingService;
    private readonly IDiscountService discountService;

    // The services registered in StartupConfiguration are injected here
    public MySampleFunction(ILoggerFactory loggerFactory, IRoutingService routingService, IDiscountService discountService)
    {
        this.logger = loggerFactory.CreateLogger<MySampleFunction>();
        this.routingService = routingService;
        this.discountService = discountService;
    }

    // your function logic here
}

Improved Authoring Experience
The development experience has been significantly enhanced with improved tooling and templates. Whether you're using Visual Studio or Visual Studio Code, you’ll benefit from streamlined scaffolding, local debugging, and deployment workflows that make building and managing custom code faster and more intuitive. The following user experience improvements were added:
Local functions metadata is kept between VS Code sessions, so you don't receive validation errors when editing workflows that depend on the local functions.
Projects are also built when the designer starts, so you don't have to manually update references.
New context menu gestures, allowing you to create new local functions or build your functions project directly from the explorer area.
Unified debugging experience, making it easier for you to debug. There is now a single task for debugging custom code and logic apps, which makes starting a new debug session as easy as pressing F5.

Learn More
To get started with custom code in Azure Logic Apps Standard, visit the official Microsoft Learn documentation: Create and run custom code in Azure Logic Apps Standard
You can also find example code for dependency injection at wsilveiranz/CustomCode-Dependency-Injection

Give your Foundry Agent Custom Tools with MCP Servers on Azure Functions
This blog post is for developers who have an MCP server deployed to Azure Functions and want to connect it to Microsoft Foundry agents. It walks through why you'd want to do this, the different authentication options available, and how to get your agent calling your MCP tools. Connect your MCP server on Azure Functions to Foundry Agent If you've been following along with this blog series, you know that Azure Functions is a great place to host remote MCP servers. You get scalable infrastructure, built-in auth, and serverless billing. All the good stuff. But hosting an MCP server is only half the picture. The real value comes when something actually uses those tools. Microsoft Foundry lets you build AI agents that can reason, plan, and take actions. By connecting your MCP server to an agent, you're giving it access to your custom tools, whether that's querying a database, calling an API, or running some business logic. The agent discovers your tools, decides when to call them, and uses the results to respond to the user. Why connect MCP servers to Foundry agents? You might already have an MCP server that works great with VS Code, VS, Cursor, or other MCP clients. Connecting that same server to a Foundry agent means you can reuse those tools in a completely different context, i.e. in an enterprise AI agent that your team or customers interact with. No need to rebuild anything. Your MCP server stays the same; you're just adding another consumer. Prerequisites Before proceeding, make sure you have the following: 1. An MCP server deployed to Azure Functions. If you don't have one yet, you can deploy one quickly by following one of the samples: Python TypeScript .NET 2. A Foundry project with a deployed model and a Foundry agent Authentication options Depending on where you are in development, you can pick what makes sense and upgrade later. 
Here's a summary:

Method: Key-based (default)
Description: Agent authenticates by passing a shared function access key in the request header. This method is the default authentication for HTTP endpoints in Functions.
When to use: Development, or when Entra auth isn't required.

Method: Microsoft Entra
Description: Agent authenticates using either its own identity (agent identity) or the shared identity of the Foundry project (project managed identity).
When to use: Use agent identity for production scenarios, but limit shared identity to development.

Method: OAuth identity passthrough
Description: Agent prompts users to sign in and authorize access, using the provided token to authenticate.
When to use: Production, when each user must authenticate individually.

Method: Unauthenticated
Description: Agent makes unauthenticated calls.
When to use: Development only, or tools that access only public information.

Connect your MCP server to your Foundry agent
If your server uses key-based auth or is unauthenticated, it should be relatively straightforward to set up the connection from a Foundry agent. The Microsoft Entra and OAuth identity passthrough options require extra steps to set up. Check out detailed step-by-step instructions for each authentication method. At a high level, the process looks like this:
1. Enable built-in MCP authentication: When you deploy a server to Azure Functions, key-based auth is the default. You'll need to disable that and enable built-in MCP auth instead. If you deployed one of the sample servers in the Prerequisites section, this step is already done for you.
2. Get your MCP server endpoint URL: For MCP extension-based servers, it's https://<FUNCTION_APP_NAME>.azurewebsites.net/runtime/webhooks/mcp
3. Get your credentials based on your chosen auth method: a managed identity configuration or OAuth credentials.
4. Add the MCP server as a tool in the Foundry portal by navigating to your agent, adding a new MCP tool, and providing the endpoint and credentials.
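To make the endpoint format and the key-based option above a bit more concrete, here is a minimal Python sketch that builds the MCP endpoint URL for an extension-based server and the request header for key-based auth. The function names `mcp_endpoint` and `key_auth_headers` are hypothetical helpers made up for illustration. The post doesn't name the header; `x-functions-key` is assumed here because it is the standard header Azure Functions accepts for access keys. The app name and key values are placeholders, not real resources.

```python
# Path taken from the endpoint format quoted in the post.
MCP_WEBHOOK_PATH = "/runtime/webhooks/mcp"


def mcp_endpoint(function_app_name: str) -> str:
    """Build the MCP endpoint URL for an MCP extension-based server."""
    name = function_app_name.strip()
    if not name:
        raise ValueError("function app name must not be empty")
    return f"https://{name}.azurewebsites.net{MCP_WEBHOOK_PATH}"


def key_auth_headers(access_key: str) -> dict[str, str]:
    """Build the header for key-based auth.

    Assumption: the key travels in the standard Azure Functions
    'x-functions-key' header; the post does not name the header.
    """
    if not access_key:
        raise ValueError("access key must not be empty")
    return {"x-functions-key": access_key}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(mcp_endpoint("my-mcp-app"))
    print(key_auth_headers("<FUNCTION_ACCESS_KEY>"))
```

These are the same two values you would paste into the Foundry portal when adding the MCP tool with key-based auth. For Microsoft Entra or OAuth identity passthrough, the endpoint stays the same, but the credentials come from the identity configuration rather than a shared key.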
[Screenshot: Microsoft Entra connection required fields]
[Screenshot: OAuth Identity required fields]
Once the server is configured as a tool, test it in the Agent Builder playground by sending a prompt that triggers one of your MCP tools.
Closing thoughts
What I find exciting about this is the composability. You build your MCP server once and it works everywhere: VS Code, VS, Cursor, ChatGPT, and now Foundry agents. The MCP protocol is becoming the universal interface for tool use in AI, and Azure Functions makes it easy to host these servers securely and at scale. Are you building agents with Foundry? Have you connected your MCP servers to other clients? I'd love to hear what tools you're exposing and how you're using them. Share your thoughts with us!
What's next
In the next blog post, we'll go deeper into other MCP topics and cover new MCP features and developments in Azure Functions. Stay tuned!