
Microsoft Developer Community Blog

Improve LLM backend resiliency with load balancer and circuit breaker rules in Azure API Management

Julia_Muiruri
Mar 20, 2025

This article is part of a series on Azure API Management and Generative AI. We believe that adding Azure API Management to your AI projects can help you scale your AI models, secure them, and make them easier to manage.

We previously covered the hidden risks of AI APIs in today's AI-driven technological landscape. In this article, we dive deeper into one of the supported Gen AI capabilities in API Management, which lets your applications fail over to a different Gen AI backend when specified conditions are met.

In Azure API Management, you can register your different LLMs as backends, define load-balancing rules that route requests to prioritized backends, and add automatic circuit breaker rules that protect backends from too many requests.

Without these safeguards, if your Azure OpenAI service fails, users of your application will keep receiving error messages until the backend issue is resolved and the service is ready to serve requests again.

Backend unavailable

Similarly, managing multiple Azure OpenAI resources can be cumbersome: switching between backends requires manually changing the URL in your API settings. This approach is inefficient, does not adapt to changing conditions, and prevents seamless switching to the optimal backend for better performance and reliability.

How load balancing works

First, configure your Azure OpenAI resources as referenceable backends, defining a base URL and assigning a backend ID for each. As an example, let's assume we have three different Azure OpenAI resources as follows:

OpenAI resource definitions for round-robin
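In Bicep, registering the three resources as backends might look roughly like this (a sketch only: the `apim` instance name, API version, and endpoint hostnames are placeholders for your own values):

```bicep
// Reference an existing API Management instance (name is a placeholder).
resource apim 'Microsoft.ApiManagement/service@2023-09-01-preview' existing = {
  name: 'my-apim-instance'
}

// Each Azure OpenAI resource becomes a backend with an ID and a base URL.
resource openai1 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apim
  name: 'openai1'                                  // backend ID
  properties: {
    url: 'https://openai1.openai.azure.com/openai' // base URL (placeholder)
    protocol: 'http'
  }
}

resource openai2 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apim
  name: 'openai2'
  properties: {
    url: 'https://openai2.openai.azure.com/openai'
    protocol: 'http'
  }
}

resource openai3 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apim
  name: 'openai3'
  properties: {
    url: 'https://openai3.openai.azure.com/openai'
    protocol: 'http'
  }
}
```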

To set up load balancing across the backends, you can use one of the supported strategies, or combine two of them, to ensure optimal use of your Azure OpenAI resources.

1. Round Robin 

As the name suggests, API Management will evenly distribute requests to the available backends in the pool.

Round robin graph
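A round-robin setup can be sketched as a backend pool with no priorities or weights (pool and member names are illustrative; the schema follows the API Management backends resource):

```bicep
// A pool backend distributes requests across its member backends.
// With no priority or weight set, distribution is even (round robin).
resource lbPool 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apim
  name: 'openai-lb-pool'
  properties: {
    description: 'Round-robin pool over the three Azure OpenAI backends'
    type: 'Pool'
    pool: {
      services: [
        { id: '/backends/openai1' }
        { id: '/backends/openai2' }
        { id: '/backends/openai3' }
      ]
    }
  }
}
```

Your API then routes to the pool by its backend ID, for example with the `set-backend-service` policy: `<set-backend-service backend-id="openai-lb-pool" />`.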

2. Priority-based

For this approach, you organize multiple backends into priority groups, and API Management assigns requests to these backends in order of priority. Returning to our example, we assign openai1 the top priority (priority 1), openai2 priority 2, and openai3 priority 3.

Priority assigned

This means requests will be forwarded to openai1 (priority 1); if that service is unreachable, calls reroute to openai2 in the next priority group, and so on.

priority-based graph
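The priority assignment above can be expressed in the pool definition by setting a priority per member (names and API version are placeholders, as before):

```bicep
// Lower priority number = higher priority. openai2 receives traffic
// only when openai1 is unavailable, and openai3 only after openai2.
resource priorityPool 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apim
  name: 'openai-priority-pool'
  properties: {
    description: 'Priority-based pool: openai1 first, then openai2, then openai3'
    type: 'Pool'
    pool: {
      services: [
        { id: '/backends/openai1', priority: 1 }
        { id: '/backends/openai2', priority: 2 }
        { id: '/backends/openai3', priority: 3 }
      ]
    }
  }
}
```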

3. Weighted

Here, you assign weights to your backends, and requests will be distributed based on these relative weights.

Weights assigned

For our example above, we want to be even more specific: all requests default to openai1, but if it fails, requests are distributed equally between our priority 2 backends (specified by the 50/50 weight allocation).

Weighted graph
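Combining priorities and weights, the scenario above might be configured along these lines (a sketch with placeholder names; weights are relative within a priority group):

```bicep
// openai1 takes all traffic while healthy. If it fails, openai2 and
// openai3 (both priority 2) split the traffic 50/50 by weight.
resource weightedPool 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apim
  name: 'openai-weighted-pool'
  properties: {
    description: 'Priority 1 default with 50/50 weighted failover group'
    type: 'Pool'
    pool: {
      services: [
        { id: '/backends/openai1', priority: 1 }
        { id: '/backends/openai2', priority: 2, weight: 50 }
        { id: '/backends/openai3', priority: 2, weight: 50 }
      ]
    }
  }
}
```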

Now, configure your circuit breaker rules

The next step is to define rules that listen to events in your API and trip when specified conditions are met. Let's look at the example below to learn how this works.

Circuit breaker configuration
  1. Inside the circuitBreaker property of your backend configuration, you define a rules array that can hold multiple rules
  2. The failure condition defines what must happen for the circuit breaker to trip.
    a. The circuit breaker will trip if there is at least one failure
    b. The number of failures specified in count is monitored within 5-minute intervals
    c. We are looking out for errors that return a status code of 429 (Too Many Requests), though you can define a range of codes here
  3. The circuit will remain tripped for 1 minute, after which it resets and traffic is routed to the backend again
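Put together, a backend with the circuit breaker rule described above might look like this in Bicep (a sketch: the rule name, API version, and hostname are placeholders):

```bicep
resource openai1 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apim
  name: 'openai1'
  properties: {
    url: 'https://openai1.openai.azure.com/openai'
    protocol: 'http'
    circuitBreaker: {
      rules: [                           // (1) array of circuit breaker rules
        {
          name: 'tooManyRequestsRule'
          failureCondition: {            // (2) when should the breaker trip?
            count: 1                     // (2a) at least one failure...
            interval: 'PT5M'             // (2b) ...within a 5-minute window
            statusCodeRanges: [
              { min: 429, max: 429 }     // (2c) watch for 429 Too Many Requests
            ]
          }
          tripDuration: 'PT1M'           // (3) stay tripped for 1 minute
        }
      ]
    }
  }
}
```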

Alright, so what should my next steps be?

This article introduced just one of the many supported Generative AI capabilities in Azure API Management. We have more policies you can use to better manage your AI APIs, covered in other articles in this series. Do check them out.

 

Do you have any resources I can look at in the meantime to learn more?

Absolutely! Check out:

  1. https://learn.microsoft.com/en-us/azure/api-management/set-backend-service-policy
  2. https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep
  3. https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/backend-pool-load-balancing
Updated Mar 28, 2025
Version 2.0