Logic Apps Standard Hosting & Performance Tips
Published Oct 17 2023 05:36 PM
Microsoft
Blog Post Contributors: Ben Gimblett (Engineer), Rohitha Hewawasam (Principal Software Engineering Manager), Fraser Molyneux (Cloud Solution Architect), Harold Campos (Principal Program Manager)



Background

 

As Logic Apps Standard capabilities continue to grow, we are seeing both net new customers and existing Logic Apps Consumption customers moving their workflows to the Standard tier to benefit from the new features and functionality it provides. See here for more info.
 
There's more than one reason for a customer to look toward Logic Apps Standard. These include leveraging the features available on the Azure Web Apps and Azure Functions platform (App Insights integration, Virtual Network integration and "in-app" / "built-in" connectors, to name three) and different choices of dedicated hosting over which the customer has full control. In addition, with the announcement that the Logic Apps Integration Service Environment (ISE) will reach end of life in August 2024 (see here for more detail), we're seeing more customers moving their ISE workloads to Logic Apps Standard. Most commonly this is to ASEv3 hosted App Service Plans, but this is not the only hosting choice.
 
With the increased choice of hosting, our customers often come to us with similar questions:
 
How do I make the right hosting choice? How many hosting plans might I need? How do I distribute "apps" and "workflows" on a hosting plan? What are the tips and tricks I need to know if I want to tune my Logic Apps performance?
 
This blog post is aimed at new customers to Logic Apps Standard and existing customers migrating to Logic Apps Standard. The goal is to provide additional information to help customers understand better the hosting choices, suggested best practices and the settings they need to be aware of which impact how the workflow (or workflows) will be executed at runtime.
We intend to follow up on this post with further posts that cover more integration scenarios (we'll focus on Service Bus and IBM MQ as an example here) and take a deeper dive into monitoring in the future.

Connectors

 
We cannot have a conversation about Logic Apps without first discussing connectors. This post is not intended to be a deep dive on connectors, but for the purposes of 'level-setting' let's quickly summarise what connectors are, the different types available with Logic Apps Standard, and how those types compare.
 
To keep things simple I'm going to group connectors into just two types: "Managed" connectors, which include the "ISE" labelled connectors used in an Integration Service Environment (ISE), and the newer "built-in" or "in-app" (labelled) connectors that are only available for Logic Apps Standard. Understanding the difference between these connector types is a good place to start.


"Managed" Connectors

 
There is a huge choice of Managed Connectors available and these can be utilised from any version of Logic Apps (be it the multi-tenant "Consumption" or the newer single-tenant Logic Apps "Standard"). There are also different categories of Managed Connector; see here for more information. Keep in mind that it isn't just Logic Apps that use Managed Connectors. Other services, for example the Power Platform, do too.
 
Managed Connectors are essentially APIs which are hosted in Azure. Each Connector "API" understands how to communicate with the service it's intended to integrate with. It's a big value proposition of Logic Apps all up: the ability to compose-in complex integrations simply. Managed connectors provide that value by abstracting away the complex details often required for an integration behind a simple-to-use interface.
 
However, most Managed Connectors also have a limitation. Because they're hosted elsewhere in Azure they can only receive traffic through a public endpoint (an FQDN that resolves to an Azure DC public IP) and they cannot be joined directly to a customer virtual network. At the same time, not all services you want to integrate with are hosted in Azure. In the past, there have usually been two choices for overcoming these networking limitations:
 
  • The first is to use an on-premises data gateway - a component that can facilitate communication between on premises and Azure without the need for a direct link or a public inbound port to be opened through the firewall.
  • The second was for customers using an Integration Service Environment (ISE). This worked because the ISE was part of your virtual network, and a category of Managed connector (labelled "ISE") was deployed with it, alongside the Logic Apps host compute, in one of the adjacent subnets. These connectors were only accessible from within the customer's virtual network (customer network and firewall rules notwithstanding), and they could reach both virtual network resources and, where permitted, any resources in a linked network (for example on premises via a VPN or ExpressRoute).
 
Generally speaking, the main challenges presented by Managed Connectors are the additional latency caused by the extra network hops (between the Logic App and the Connector, and between the Connector and the service being integrated with) and the configuration of network resources (such as firewalls) that those hops require. Where the service to be integrated with is on premises there can be additional configuration and operations overhead required to facilitate connectivity.


Which brings us to today: Logic Apps Standard and a better way of solving the problem

Logic Apps Standard "in-app" or "built-in" Connectors

Note: The connections for these connectors are referred to as "service provider" [model] connections. See here for more details.

Built-in (or in-app) Connectors are only for use in Logic Apps Standard. Unlike Managed Connectors, these connectors are implemented as Azure Functions custom connectors (the Logic Apps Standard runtime itself is implemented via an Azure Functions custom extension). See here to learn more.
 
Note: There's nothing stopping you from building your own custom built-in connectors if your requirements call for it. See here to learn more.
 
This change in Connector architecture has a couple of immediate benefits making in-app connectors the recommended choice:
 
  1. This new type of connector can benefit from the hosting plan network integration: either the Virtual Network associated with an ASEv3 hosted App Service Plan, or the virtual network you integrate with if you're using a Logic Apps Standard "WS" plan (the default hosting type). If you're confused at this point, don't worry, we'll come back to hosting plans shortly.
  2. As the point above implies, the Connector is running within the Logic App (it runs on the same underlying Virtual Machine "worker" as the Logic App runtime) and so there is no additional network hop between the Logic App and connector and no protocol break. Furthermore, as the trigger part of the connector is in effect running "in process" with the workflow at runtime, performance of the trigger is increased too. It also means that for connectors to Azure resources such as Service Bus you can take advantage of private endpoints within the virtual network.
Today you can expect parity between "ISE" labelled connectors and "built-in" connectors - which is important news for customers making the migration journey from ISE.

 

Hosting Plan Alternatives

 

The two fully supported and generally available (GA) hosting options for Logic Apps Standard today are:
  1. Logic Apps Standard running on a Workflow Standard (WS) hosting plan. This is the workflow version of the Functions Elastic Premium hosting plan and is limited to Windows right now.

  2. Logic Apps Standard running on Isolated (Iv2) Windows App Service Plans on App Service Environment v3 (ASEv3)

 
Both the above alternatives are "single-tenant" meaning that the compute that runs the Logic Apps Host (and your workflows) is clean and provisioned for you alone. However, there are some important differences that may influence your selection:

Run on a Default WS Plan

 
  • Is capable of scaling automatically based on events (this is useful for Logic Apps with event based triggers)
  • The maximum scale out is 20 instances in some regions, and up to 100 in others.  See here for the region list
  • Scale out time is very fast and an "always ready" pool can be pre-allocated (note you pay for pre-allocated instances)
  • Scale up to 4 cores and 14 GB per worker instance; see here for more details.
  • Today: requires a subnet per plan for Virtual Network Integration and one for private endpoints (the private endpoint subnet can be shared between plans and other Azure resources)
 

Run on an Isolated-v2 App Service Plan (App Service Environment v3)

 
Side note: The naming is a little confusing here - the 3rd version of ASE has larger plans than the prior versions, hence "Isolated v2" or "Iv2".
  • You control the plan scale out yourself based on the metrics available through the configuration of "Inflate" and "deflate" rules. CPU is one of the first metrics to look at for Logic Apps workloads
  • Maximum scale out is 100 instances per plan, with 200 overall for the ASEv3; see here for more details
  • Scale out time (adding a worker or instance) is not as fast as the WS plan
  • Scale up vertically to 64 cores and 256 GB RAM per instance
  • Turn-key Virtual Network configuration (comparatively simpler than WS/EP plans), with fewer subnets required
  • Today this is the only hosting option to support the File System Connector (however, support for WS plans is planned soon); see here for more information.
 

To summarize

 
  • If you have event based Logic Apps and you have "bursty" loads where rapid scale-out is more important than vertical sizing, particularly for message or event triggered workflows, then WS plans are a good choice.
  • If you have higher core/memory requirements, a more level load, or you require the higher level of overall isolation (for example you're working on an application which will be included in a compliance audit such as PCI-DSS) then ASEv3 + Iv2 plans might be a better choice. Why more isolation? One of the main differences is that HTTP "front ends" are shared for multi-tenant web apps (including WS plans) but they are "single-tenant" for ASEv3, meaning it should be easier to have a "single tenant" conversation during a compliance audit if you're using ASEv3.
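The two bullets above can be written down as a rough rule of thumb. This is only a sketch of the summary in this post, not an official sizing tool: the `Workload` fields and the `suggest_hosting` function are names I have made up for illustration, and the 4 core ceiling is simply the WS scale-up limit quoted earlier.

```python
from dataclasses import dataclass

# Taken from the WS plan notes above: scale up to 4 cores per worker.
WS_MAX_CORES = 4


@dataclass
class Workload:
    """Hypothetical description of a workload, for illustration only."""
    bursty: bool                 # load arrives in sudden peaks
    cores_needed: int            # largest worker size the workflows need
    needs_full_isolation: bool   # e.g. in scope for a PCI-DSS style audit


def suggest_hosting(workload: Workload) -> str:
    """Return the plan type the summary bullets point towards."""
    if workload.needs_full_isolation:
        return "ASEv3 + Iv2"
    if workload.cores_needed > WS_MAX_CORES:
        return "ASEv3 + Iv2"
    if workload.bursty:
        # Rapid, event-driven scale-out matters more than vertical size.
        return "WS"
    # A level load with modest sizing leans towards ASEv3 in the summary.
    return "ASEv3 + Iv2"


print(suggest_hosting(Workload(bursty=True, cores_needed=2, needs_full_isolation=False)))
```

Treat the result as a starting point for testing rather than a decision; cost, networking and the connectors you need will also influence the choice.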

Density

 

But hang on, that's all great, but what about density? How many Logic Apps should I run on each plan? Is there any "rule of thumb" or best practice here I can follow? Those are the questions.
 
Frustratingly there's no one size fits all answer...
 
That's simply because Logic Apps workflows are a blank canvas. Customers are free to create on that canvas any kind of integration workflow, from simple to complex. For some customers the data payload will be small, for others large. For some customers there's a requirement for high throughput (concurrency), for others not so much. For some customers more memory is important; memory pressure typically can't be solved by scale-out, unlike, for example, CPU constraints. And so on.
 
We can provide some advice on how to approach reaching the combination of workflows and plans that meets your requirements.

A step back, what are "Logic Apps" in Logic Apps Standard and how do workflows fit into this:
 
Hosting Plans for Logic Apps Standard are "containers" (OK, an overloaded term): I mean a plan is a home for your workflow definition JSON files. (Previously a workflow was a standalone resource in Azure.)
 
The "Logic App" application you create on a WS Plan or App Service Plan (ASP) represents the "process" and hosts the runtime. By default that means the runtime is hosted, along with its connectors, as a Functions "Extension bundle" and, as this is on Windows, runs in a w3wp host. This is why you don't see a "bin" folder when developing in VS Code or exploring the file structure in Kudu. If you are digging around in Kudu then, depending on your workflow, you may notice that the w3wp process has a child Node process. This only happens if you are using the inline code (JavaScript) action; here the Node process is spun up as a "Language worker" and is responsible for executing the JavaScript in your inline code action. In the future other language choices will be offered beyond JS; for example, an inline dotnet code action would require a dotnet language worker.
 
You can have more than one "Logic App" application on each plan (up to the total limits allowed). If you're familiar with Azure Functions then you can draw some comparisons:
  • Logic App is equivalent to a Function App
    • A Logic App contains one or more workflows
    • A Function App contains one or more functions
 
As your plan scales out you end up with a "process" per "Logic App" per VM worker (compute instance), and each VM worker added by the scale-out is an identical replica.
Each of your "Logic App" applications can contain one or more "workflow" files. As each host process comes up (workflow JSON files are loaded when the process starts) the Logic Apps runtime will interpret each of the workflows it finds within the project folder (deployed within WWWROOT) and set up the trigger and subsequent actions.
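To make the "project folder" idea a little more concrete, here is a small sketch that lists the workflows in a local copy of a Logic App project. It assumes the usual project layout as I understand it, where each workflow lives in its own folder containing a `workflow.json` file with a top-level `kind` property; the `list_workflows` function name is my own.

```python
import json
from pathlib import Path


def list_workflows(project_root: str) -> dict[str, str]:
    """Map each workflow folder name to its kind (Stateful or Stateless).

    Assumes one folder per workflow, each holding a workflow.json file.
    """
    workflows = {}
    for definition in sorted(Path(project_root).glob("*/workflow.json")):
        with definition.open(encoding="utf-8") as handle:
            document = json.load(handle)
        # Fall back to "unknown" rather than guessing when kind is absent.
        workflows[definition.parent.name] = document.get("kind", "unknown")
    return workflows
```

Calling `list_workflows("path/to/your/project")` gives a quick inventory of how many workflows a single host process has to load, and how many of them are stateful.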

It follows therefore:
 
  • The more workflows you have in a given "Logic App" and the more complex those workflows are, then the more work that process needs to do at runtime.
  • The more "Logic Apps" you have on a hosting plan then the more processes there are on each worker VM to compete for the compute resources available (determined by the pricing tier or vertical size you assign, literally the number of cores and amount of memory).
Remember that an instance of the Logic Apps runtime is hosted as an extension to the Functions runtime; each process represents the runtime and its configured triggers. This means triggers run per "Logic App" app deployed, per scaled out worker VM.
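The arithmetic behind those two points is simple enough to sketch. The function names and example numbers below are mine and purely illustrative; the only rule encoded is the one described above, one host process (and one set of triggers) per "Logic App" per worker VM.

```python
def host_processes(logic_apps: int, workers: int) -> int:
    """One runtime host process per Logic App on every scaled-out worker."""
    return logic_apps * workers


def trigger_instances(triggers_per_app: list[int], workers: int) -> int:
    """Every app's triggers are started again on each worker VM."""
    return sum(triggers_per_app) * workers


# Example: 3 Logic Apps with 2, 1 and 5 workflows (one trigger each),
# on a plan that has scaled out to 4 workers.
print(host_processes(3, 4))               # 12 processes across the plan
print(trigger_instances([2, 1, 5], 4))    # 32 trigger instances
```

The per-worker figure (3 processes sharing one VM's cores and memory in this example) is the number to keep in mind when choosing the vertical size.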
 
And there's another Logic Apps Standard feature that impacts how you should approach density...

Stateful / Stateless workflows

 
A new feature that came with Logic Apps Standard is the ability to define workflows that are stateless (previously you could only create stateful), a decision you can now make when you first create the workflow.
 
There is a difference worth noting here from a runtime perspective, beyond the obvious implied by the naming; for a stateful workflow the trigger and any instance of a workflow action could run on any VM Worker (when you're scaled out). To make that easier to understand here is a really simple example:
 
  • Using the Azure Portal I create a new Logic App Standard. I choose a WS2 Plan (2 cores) which is the recommended minimum.
  • I keep the scale-out defaults, a minimum of one instance (this is required so there's always a runtime instance hosting a trigger to handle incoming messages/events or timers) and allow the plan to scale up to a maximum of 20 instances (VM workers).
  • When the Logic App and associated "plan" resources are deployed I add a new stateful workflow called "workflow1".
  • On that workflow I add a "When an HTTP request is received" trigger (this is "in-app").
  • Next I add a JavaScript inline code action to extract some values from the request body or do some simple processing.
  • Finally I add an HTTP Response action to send some derived output values back to the caller with an "OK" status.
A really simple stateful workflow.
 
Now let's suppose I am going to be receiving a relatively high number of HTTP requests to my new workflow once it goes live, in excess of 200 requests per second. What's happening with my workflow at runtime, and when the plan scales out?
 
The first thing that happens is the workflow is interpreted so the runtime can figure out the trigger required and the actions required and what order they need to run in (all of this is of course described in the workflow JSON file).
 
The interesting part is this:
 
Each action on a Logic App workflow (irrespective of stateful or stateless type) is run as a dynamic Function invocation. Initially these are run in-process (the scale out is via the thread pool) but as the message rate increases the plan will begin to scale out the underlying compute (as noted earlier, "how" this happens depends on the plan type). So as the plan scales out and new VM workers are added two things are happening:
  • My workflow gets more trigger instances for each configured workflow trigger (one per "Logic App" per VM worker), which means the plan can serve a higher rate of requests (or events / messages). In this case I can think about it (in effect) as more HTTP endpoints to receive messages. But in other scenarios it could be, for example, additional trigger instances to consume messages or events.
  • I get more processes (due to the compute scale out) to run actions and because I defined a stateful workflow in the simple example above, my Logic Apps actions (facilitated by the linked Az Storage Account) will be scheduled to run async via an event - what this means is that any given action could run on any given VM worker instance as the plan scales out.
In other words:

  • The scale out of VM worker (the plan scale-out) will scale-out the trigger capacity - allowing a higher concurrent rate of Logic App workflow instances to start - thus a higher trigger processing throughput is observed. (With App insights you can write a KQL query to count the request telemetry written for each trigger invocation and bucket that to understand the trigger throughput).
  • For stateful workflows I have more processes which could execute the action instances of any given workflow - in other words for stateful actions, the action could run anywhere on the scaled out plan and this is possible due to the "state" being externalized into Azure Storage and action invocation being event driven for stateful workflows. Note: If I want to get a sense for how fast my Logic Apps are completing I can go back to my App Insights telemetry. If there's a closing action for the workflow (and provided you have error handling which ensures this action is always going to fire eventually) then you can find the telemetry item for that action and do a query to count/bucket this for another view on throughput.
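The "count and bucket" idea works the same way whether you do it in KQL or on exported data. In App Insights it would be something along the lines of `requests | summarize count() by bin(timestamp, 1m)`, plus a filter for your trigger's or closing action's request name, which I have left out because it depends on your workflow. The sketch below shows the same bucketing in plain Python over a list of telemetry timestamps; the function name is my own.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_counts(timestamps, bin_size=timedelta(minutes=1)):
    """Count events per fixed-width time bin, keyed by the bin's start time."""
    epoch = datetime(1970, 1, 1)
    counts = Counter()
    for moment in timestamps:
        # timedelta // timedelta gives the whole number of bins since epoch.
        bin_index = (moment - epoch) // bin_size
        counts[epoch + bin_index * bin_size] += 1
    return dict(sorted(counts.items()))


sample = [
    datetime(2023, 10, 17, 10, 0, 5),
    datetime(2023, 10, 17, 10, 0, 50),
    datetime(2023, 10, 17, 10, 1, 10),
]
for start, total in bucket_counts(sample).items():
    print(start.isoformat(), total)
```

Running the same bucketing over the trigger telemetry and over the closing action telemetry gives you two curves, workflows started and workflows completed, which is a simple way to see whether completions are keeping up with the trigger rate.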
Notes:
 
  • An alternative to counting trigger/action request telemetry for workflow start and end is to use the trace telemetry emitted as described towards the end of the blog post here or metrics here
  • It's a good cloud pattern (and certainly less expensive) to choose as small a compute size as possible (fewest cores, least memory, least expensive), then scale out quickly on demand and scale back quickly to control costs, as opposed to running just a few very large (many cores, a lot of memory) compute instances. This holds true for Logic Apps, as well as other Azure workloads. However, because of the work required within the runtime process it's a good idea to start with at least 2 cores (WS2, or I1v2) to avoid the runtime competing for compute resources under load (which would lead to high CPU and unpredictable performance). If you already know your workflow is going to be doing a lot of work, for example buffering large messages, you may want to start testing on a WS3 (or if using ASEv3 the equivalent Isolated plan) and scale up from there if you need to.
  • There is a counterpoint to the advice above which you should be aware of - there is a cost to the benefit of using storage for the state. It's a similar argument to leveraging external caches - externalizing state carries with it a penalty for serialization and wire costs. The Logic Apps runtime has optimizations in play to make intelligent decisions when scheduling actions to be run, for example when related actions can be handled locally within the same compute instance. If a sequence of two actions can be run on the same compute / worker instance and they're known to be idempotent, then the runtime is able to skip the "in-between" checkpointing to storage, saving on compute and wire costs. For this reason, for some workflows, you may observe better overall performance with a larger compute size, if you have the budget, because the process is able to do more where it needs to.

Just before we leave this section there are a few other things you need to be aware of:

 
  • Stateless Logic Apps don't incur the storage latency, which is the trade-off for giving up the persistence that the storage account facilitates, but as a result they can't be executed in the same way. Therefore, unlike its stateful counterpart, a stateless Logic App has lower latency but is unlikely to achieve the same throughput at scale, because all the actions on the workflow will execute within the same process.
  • An increasingly common pattern we are seeing larger customers adopt for more complex workloads is to mix and match. Perhaps you have a more complex workflow which requires stateful functionality - in this scenario you may also have groups of actions where it's OK for them to be stateless. You can break those actions out into "child" workflows, using the "invoke workflow" action from the parent. This ensures that whilst overall you run stateful, parts of the workflow can be scheduled stateless, reducing overhead on storage and lowering the execution latency for those stateless "child" workflow actions.
  • There is a side point to add here - in my view it's an anti-pattern to have too complex a workflow. It does depend on what the workflow is doing, but there's more chance you introduce a throughput bottleneck and in addition it's harder to maintain and test happy and un-happy paths. The best practice is to break down the workflows into smaller chunks - try and keep them to modelling the "integration-flow" (something which is hard to do in "code") and break out to application code where more complex business logic processing is required.
  • The storage account used for Stateful workflow execution can quickly become a bottleneck in itself at scale (by default this is the storage account represented by 'WebJobsStorage' setting, the plan default storage account).
  • To overcome this it's a good practice to "shard" the storage under your logic apps plans - this is explained in more detail here in Rohitha's blog and is highly recommended. Note: You can get an indicator of storage account throttling in this scenario by monitoring the storage account "transactions" metric and splitting on the "Response type" dimension, if you start to observe a higher percentage of "other" responses (not "success") during a stateful Logic App workflow load test this is indicative of storage account throttling.
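The throttling indicator in the note above is just a percentage, so it is easy to compute from whatever you export from the "transactions" metric. The sketch below assumes you already have a count per "Response type" value; the function names and the 1% threshold are hypothetical and only there to illustrate the check, so pick a threshold from your own load tests.

```python
def non_success_percentage(transactions_by_response: dict[str, int]) -> float:
    """Share of storage transactions whose response type is not 'Success'."""
    total = sum(transactions_by_response.values())
    if total == 0:
        return 0.0
    success = transactions_by_response.get("Success", 0)
    return 100.0 * (total - success) / total


def looks_throttled(transactions_by_response: dict[str, int],
                    threshold_percent: float = 1.0) -> bool:
    """Flag a sample window; the default threshold is an arbitrary example."""
    return non_success_percentage(transactions_by_response) > threshold_percent


window = {"Success": 9500, "Other": 500}
print(non_success_percentage(window))  # 5.0
print(looks_throttled(window))         # True
```

If the percentage climbs as you increase load in a stateful workflow test, that is a hint to look at spreading the load over more storage accounts as described in the linked post.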


Hang on! You didn't really give me an answer for density!

 

That's right, and that's because there is no one size fits all answer. Unfortunately "benchmarks" don't generally offer a great deal of value, because they are rarely representative of what a given customer is trying to do.
 
With that said, hopefully the notes above have helped you understand a bit more about how you can start to reach that decision. Keep the following in mind:
 
  • A hosting plan represents a logical scale unit. Workflows under that plan will share the same scale out settings, the same 'host.json' settings (for example trigger settings) and the same compute resources (vertical size)
    • Therefore, workflows contained by that plan should be a comfortable fit for those overarching settings, for example:
      • Don't have a small plan and later add a workflow that potentially requires a lot of memory (you could scale up the plan, but you're scaling it up for all the Apps and workflows under that plan!)
      • Don't later add a larger workflow to an existing plan where the new workflow has a much higher throughput requirement, particularly if the plan scale-out is constrained - this could lead to the new workflow competing for resources and becoming a "noisy neighbour" impacting the other workflows. In this scenario it may be better to consider breaking this new workflow out onto its own plan.
    • As hinted above, understand each workflow's load requirements. Is the load stable or 'bursty'? Is high throughput required, or low latency? (It is almost impossible to achieve both; usually you need to aim for one or the other.) Test workflows carefully and use the results to guide which workflows may better coexist.
    • Look carefully at the workflow - is the trigger or are the actions (either intentionally or not) going to run sequentially / in series? (For some scenarios, for example sessions / ordered messaging, it has to be this way.) These workflows won't scale out as freely. An example would be a loop construct: by default these usually execute in parallel, but it's possible to force sequential processing. Another example is where "concurrency" is set to on (doing this will throttle back the concurrent execution, which can be an intentional bottleneck).
  • If you need to use stateless workflows, it is perfectly possible to "shard" your workflow over more than one hosting app. Consider a workflow with an HTTP trigger, or a Service Bus Queue trigger (Service Bus Queues default to the "competing consumer" pattern) as examples. In these cases replicating your stateless workflow across two or more apps in a given plan increases the number of processes that workflow can run and scale in per scaled out worker. In effect it increases density. The same would not apply conceptually to stateful workflows because, as explained above, the actions are async and the runtime distributes them across all available workers via the associated Storage Account - so trying to shard them would have limited impact for the bulk of the workflow.
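To get a feel for how much a sequential loop or a low concurrency setting can cost, a back-of-envelope estimate is often enough before you test for real. This sketch assumes every item takes the same time and ignores scheduling and storage overhead, so it is a lower bound at best; the function name and numbers are mine.

```python
import math


def estimated_seconds(items: int, seconds_per_item: float, concurrency: int) -> float:
    """Rough duration of a loop that processes `concurrency` items at a time."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Items are processed in "waves" of at most `concurrency` items each.
    waves = math.ceil(items / concurrency)
    return waves * seconds_per_item


# 100 items at half a second each.
print(estimated_seconds(100, 0.5, concurrency=1))   # 50.0 (forced sequential)
print(estimated_seconds(100, 0.5, concurrency=20))  # 2.5
```

The gap between the two numbers is the throughput you give up when ordering forces you to run in series, which is worth knowing when deciding which workflows can share a plan.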

Hosting plan additional notes:

 
  • It is also possible to run Logic Apps Standard on Kubernetes (preview) see here for more information
  • Logic Apps Standard pricing differs from the previous "Consumption" Logic Apps, and hosting choices impact this; see here for more information
  • Logic Apps Standard projects can be converted as a one-way operation to dotnet (for example you may do this for custom connector development) see here for more information

Workflow tips

 

Now that you hopefully understand the hosting a bit better and have gained some tips and insight into that, let's cover some more tips for a better experience.
  • Probably the most important tip is to use batching on request triggers where available (for example Event Hubs, Service Bus or IBM MQ)

    • When you use batching, which is set for the trigger in the host JSON (exact details depend on the trigger), also be sure to use the trigger "split-on" functionality (this only applies to "stateful" workflows, it doesn't work for "stateless"). This causes the runtime to de-batch for you, meaning that you get a message per workflow instance and you don't need a foreach loop. This is both more performant and makes for a simpler workflow.

    • Avoid explicit trigger concurrency settings unless you have a good reason to throttle back the trigger.

    • Where there's batching there's usually a "prefetch" value. Start with a value which is equal to the batch defined.

    • Test batch sizes. A bigger batch will be more performant for a smaller event/message size. Conversely, if you have larger messages you'll need to test with a smaller batch / prefetch size. For example there's a HUGE difference between receiving ~1KB and receiving ~100KB messages, and trigger tuning values won't be the same in each case.

  • As noted above, check the plan scale-out (and, if using ASPs, that the right metrics are implemented for the inflate and deflate rules, which you need to derive through testing; again you can start with CPU). Throughput is going to be far higher if the runtime is allowed to load level across multiple plan worker VMs. Consider not capping horizontal scale out for "bursty" loads which need to be completed within a constrained time window (in other words where the throughput is a requirement of the overall SLA for the application) but ensure the plan scales back correctly.

    • Keep in mind that trigger settings apply at the host level (will apply to all workflows in the plan) - and there's a trigger instance per "app", per scaled out worker VM at runtime.

  • Enable Application Insights for a deeper view into what's happening with your Logic App

    • Ensure you're using V2 (as of writing today you need to set this manually) see here for more information

    • Keep an eye on the Log Analytics table size for the Application Insights tables. Particularly Request and Trace table. Log Analytics/App Insights are billed per GB ingress. So volume matters over time. It's possible to dial down the rate of data. (Leaving that as a follow-up; as it's beyond the scope of this post). However, you can track cost through the cost analysis view in the backing Log Analytics workspace.

    • The "Live metrics" blade in App Insights has a real time view of the number of plan workers in the top right hand corner ("Servers online"). Right now this is the best / easiest way to check scale-out. You can configure App Insights to collect performance counters and use that data, but it's not real time and it's also additional data ingress to Log Analytics (cost). There is work ongoing to provide a better solution for tracking scale-out going forward.

    • At the time of writing the App Insights App Map and E2E (end to end) views are broken in some use cases for Logic Apps. Under certain circumstances it means you can't follow a workflow execution and its dependencies. This will be fixed soon.

    • If you're unsure of a given trigger config (or version) being used, check App Insights. The LA runtime dumps the default trigger config (and any changes you make as override in 'host.json') as it "sees" it to the trace telemetry emitted to the trace table in the backing Log Analytics workspace to your App Insights resource.

  • Use the async action where workflows have an HTTP dependency. Logic Apps HTTP triggers and actions can work together themselves (or with other services which support the pattern) to provide HTTP polling. This is a lot more scalable and a lot more efficient when integrating with services that will take a reasonable time to respond. Note: There is a limit to this - the async pattern uses polling and in the extreme this isn't scalable due to the high volume of connections - but outside of the edge case it's recommended and helps prevent transient fault issues and connection timeouts.

  • Use the "diagnose and solve problems" plugins as well as App Insights to help triage runtime issues. They are available from the diagnose and solve problems wizard in the portal UI under the deployed Logic App (the same as for App Service apps and Function apps). This can be a very useful set of tools to troubleshoot issues such as connections or SNAT port exhaustion. It will also give you a time boxed summary of CPU and memory consumption and has some Logic Apps specific plugins.

  • Use inline JavaScript or custom code (the Logic Apps team will soon be shipping .NET 6 support, as hinted above) for data manipulation. Don't use Logic App branching or foreach loops to iterate over collections and do data transforms. As noted in the sections above under "hosting" this can lead to a degradation in performance.

  • When using concurrency for controlling downstream throughput (to control backpressure on dependencies), move that logic into a child workflow and set the concurrency on the child workflow instead of the main workflow.

    • You can also control trigger throughput with batching where supported and this can be a better overall option for controlling throughput without much overhead
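For readers who have not met the async HTTP polling pattern mentioned in the tips above, here is a minimal, network-free sketch of it. The service answers the first call with 202 Accepted and a Location header, and the caller polls that location until it gets a final response. `FakeLongRunningService` and `call_with_polling` are stand-ins I have written for illustration; they are not Logic Apps APIs, and the runtime handles this for you when the pattern is enabled.

```python
import time


class FakeLongRunningService:
    """In-memory stand-in for a slow HTTP service (illustration only)."""

    def __init__(self, polls_until_done: int = 3):
        self.polls_until_done = polls_until_done
        self.remaining = {}

    def start(self, payload):
        job_id = f"job-{len(self.remaining) + 1}"
        self.remaining[job_id] = self.polls_until_done
        # 202 Accepted plus a status location to poll.
        return 202, {"Location": f"/status/{job_id}", "Retry-After": "0"}, None

    def status(self, location):
        job_id = location.rsplit("/", 1)[-1]
        self.remaining[job_id] -= 1
        if self.remaining[job_id] > 0:
            return 202, {"Location": location, "Retry-After": "0"}, None
        return 200, {}, {"job": job_id, "result": "done"}


def call_with_polling(service, payload, max_polls=10, sleep=time.sleep):
    """Start the work, then poll the returned location until it completes."""
    status, headers, body = service.start(payload)
    polls = 0
    while status == 202:
        if polls >= max_polls:
            raise TimeoutError("gave up polling")
        sleep(float(headers.get("Retry-After", "1")))
        status, headers, body = service.status(headers["Location"])
        polls += 1
    return status, body, polls


print(call_with_polling(FakeLongRunningService(), {"order": 42}))
```

The point of the pattern is that no connection is held open while the slow work runs, which is why it helps with connection timeouts; the cost is the polling traffic, which is the limit the note above refers to.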


Service Bus and IBM MQ Connector tips

 

  • At the time of writing the concurrency setting for the Service Bus trigger is not fully supported. Changes are planned to be delivered imminently to fix this.
  • To achieve higher performance with the IBM MQ connector we recommend using autocomplete instead of the Browse Lock feature. We recommend the Browse Lock feature for scenarios that need to atomically retrieve and lock messages from a queue in a non-destructive manner. Please read this article for further guidance: Logic Apps Mission Critical Series: "We Speak: IBM MQ (Part 1)".
  • When batching with the IBM MQ connector we recommend smaller batches (i.e. 50 to 200 messages per batch, subject to message size).
  • For the IBM MQ connector, we recommend using multiple workflows to achieve a higher system throughput instead of large batch sizes and one workflow.
  • For the IBM MQ connector, we recommend using stateless workflows for a fast response and stateful workflows for reliability and persistence.
  • Using the Service Bus built-in trigger with autocomplete (letting the trigger settle the lock) is recommended over the "peek lock" version for more complex stateful workflows, particularly when receiving in batches. I say that knowing the peek lock version of the trigger is very popular. Why?
    • Today, additional overhead is required to correlate the explicit settlement actions, and the lock settlements can't be batched, which reduces the trigger throughput. Using the version of the trigger which itself manages the lock settlement for each message (or each message in a batch) as part of the underlying implementation is demonstrably better for throughput due to the reduced overhead.
    • In addition, settling the lock at the trigger means you can side-step the common pattern of having a parallel loop construct (parallel to the main workflow execution) which keeps checking to see if the lock needs renewing. Avoiding this means the workflow should be cleaner and quicker.
    • There is a trade-off here, so think about it like this. If you have a relatively simple process (a good example is a workflow that dequeues a message, enriches it by calling a dependent downstream service, then persists it to storage, which could be a database), it benefits from the pseudo transaction offered by Service Bus and the peek lock. Moreover, it should be relatively low latency, meaning that even a default lock timeout should be more than sufficient for the workflow to complete, even under high load. If you notice re-delivery or 'deadlettering' in this scenario, it's a flag that something else is wrong. If, however, you have a more complex process, it's going to be more efficient performance-wise to have a stateful workflow and let Logic Apps handle the message reliability (this works because of the reliable nature of the Azure Storage provider underpinning the stateful workflow). To do this you must ensure that you have the correct error handling (and compensating logic in your flow) for each step. Doing this carefully ensures that the workflow can "complete" each message and that it is "handled" on both the "Happy" and "Unhappy" paths without any further intervention (for example, manual intervention from a customer support or ops team for a failed workflow).
    • If you are using the version of the Service Bus trigger which autocompletes (settles the lock for you), experiment with smaller batch size/prefetch settings and (once available) the concurrency settings. This may be necessary to throttle back trigger execution and give started workflows a chance to make their way to completion, particularly for very high throughput scenarios at scale and/or where the workflow has high latency (an expected time to complete of minutes rather than seconds). Not doing this could lead to resource contention and to workflows failing to complete within the expected time window.
  • There is currently a limitation for the Service Bus and IBM MQ "built-in" triggers on plans which are allowed to scale in and scale out. The issue occurs for a specific use case: the Logic Apps "Service Bus trigger with peek lock" and "IBM MQ trigger with Browse Lock" with stateful workflows. These triggers must be paired with either a "complete", "abandon" or "Lock renewal" action. The issue occurs because, at the time of writing, the final action (for example "complete") needs to occur on the same process/VM as the trigger and, out of the box (as explained in the sections above), this is not guaranteed for stateful actions. The underlying limitation is one that the Service Bus team are looking at removing as soon as they can, but it's a non-trivial piece of work. For now, a few things are required to work around the issue, and one of those is fixing scale (avoiding scale-in whilst workflows are in progress), so the cost implications need to be considered carefully. For full details see here.
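
As a rough way to reason about the advice above on throttling an autocompleting trigger, the back-of-envelope sketch below applies Little's law (items in flight = arrival rate x average time in the system). It is only an estimate under a steady-state assumption, not something the Logic Apps runtime exposes. The function names and the `in_flight_budget` figure are hypothetical and exist purely for illustration.

```python
def estimated_in_flight(trigger_rate_per_sec: float, workflow_latency_sec: float) -> float:
    """Little's law: average number of workflow runs in progress at steady state."""
    return trigger_rate_per_sec * workflow_latency_sec


def max_sustainable_rate(in_flight_budget: int, workflow_latency_sec: float) -> float:
    """Trigger rate that keeps in-progress runs within a chosen (hypothetical) budget."""
    return in_flight_budget / workflow_latency_sec


# Example: a trigger starting 50 runs per second, each taking about 2 minutes,
# implies roughly 6000 runs in progress at any one time.
runs_in_progress = estimated_in_flight(50, 120)

# If the plan copes comfortably with around 600 concurrent runs (an assumed figure),
# the trigger needs throttling back to about 5 runs per second.
target_rate = max_sustainable_rate(600, 120)
```

The point is that workflows measured in minutes multiply the number of runs in progress very quickly, which is why smaller batch and prefetch settings can matter more for them than for workflows that complete in seconds.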


Summary

 

Remember that development is iterative and, despite the length of this article and the many topics addressed here, there is no "one size fits all" answer. Challenges can most often be addressed by looking at a combination of areas, from architecture and configuration to implementation. Understanding Logic Apps capabilities, optimizing the workflow and tuning the configuration are essential for scaling efficiently.

The key takeaways for approaching performance when developing Logic Apps are:
 
  • Validate your Architecture and Topology against your business requirements. What are you trying to accomplish with the solution?
  • Ensure you understand the observability tools available to diagnose and solve problems through Azure Monitor and Application Insights.
  • You can also leverage these tools to identify bottlenecks and opportunities for improvement.
  • Consider Connector Types, Workflow Types, Hosting/Scaling, Trigger Configuration and Implementation, and validate these against the business requirements and the topics in this blog.


Next time we plan to dive into observability in a bit more detail.

 

Last update:
‎Oct 19 2023 04:52 AM