Introduction:
A comprehensive overview of the most frequently used and discussed architecture patterns among our customers in various domains.
1) AOAI with Azure Frontdoor for loadbalancing
Architecture diagram:
Key Highlights:
If you set equal weights for all origins and a high latency sensitivity in Azure Front Door, it will consider all origins that have a latency within the specified range of the fastest origin as eligible for routing traffic. So, all the origins should receive approximately equal amounts of traffic, provided their latencies are within the specified range.
However, it’s important to note that this doesn’t guarantee a perfect round-robin distribution. The actual distribution can vary based on factors like network conditions and changes in latency. If you need strict round-robin load balancing, you might need to consider other services or features that specifically support this method.
Use Postman for testing:
Request 1:
Request 2:
For perfect round robin distribution, you can use Azure Application Gateway with the same health check endpoints.
2) AOAI with APIM
Architecture diagram:
Key highlights:
a) Round Robin load balancing with Retry logic
<policies>
<inbound>
<base />
<cache-lookup-value key="backend-counter" variable-name="backend-counter" />
<choose>
<when condition="@(!context.Variables.ContainsKey("backend-counter"))">
<set-variable name="backend-counter" value="0" />
<cache-store-value key="backend-counter" value="0" duration="100" />
</when>
</choose>
<choose>
<when condition="@(int.Parse((string)context.Variables["backend-counter"]) == 0)">
<set-backend-service base-url="https://aoaipoc.openai.azure.com/" />
<set-variable name="backend-counter" value="1" />
<cache-store-value key="backend-counter" value="1" duration="100" />
</when>
<when condition="@(int.Parse((string)context.Variables["backend-counter"]) == 1)">
<set-backend-service base-url="https://aoaipoc2.openai.azure.com/" />
<set-variable name="backend-counter" value="0" />
<cache-store-value key="backend-counter" value="0" duration="100" />
</when>
</choose>
</inbound>
<backend>
<retry condition="@(context.Response.StatusCode >= 500 || context.Response.StatusCode >= 400)" count="6" interval="10" first-fast-retry="true">
<choose>
<when condition="@((context.Response.StatusCode >= 500 || context.Response.StatusCode >= 400) && (int.Parse((string)context.Variables["backend-counter"])) == 0)">
<set-backend-service base-url="https://aoaipoc.openai.azure.com/" />
<set-variable name="backend-counter" value="1" />
<cache-store-value key="backend-counter" value="1" duration="100" />
</when>
<when condition="@((context.Response.StatusCode >= 500 || context.Response.StatusCode >= 400) && (int.Parse((string)context.Variables["backend-counter"])) == 1)">
<set-backend-service base-url="https://aoaipoc2.openai.azure.com/" />
<set-variable name="backend-counter" value="0" />
<cache-store-value key="backend-counter" value="0" duration="100" />
</when>
</choose>
<forward-request buffer-request-body="true" />
</retry>
</backend>
<outbound>
<base />
</outbound>
<on-error>
<base />
</on-error>
</policies>
Testing on round robin load balancing using APIM :
b) AAD authentication from APIM to Azure OpenAI
Step 1 – Enable Managed Identity in APIM
Step 2 – Provide necessary RBAC:
In the IAM of Azure OpenAI service add the OpenAI user role for the APIM Managed Identity (Managed Identity will have the same name of APIM).
Step 3 - Add the Managed Identity policy in APIM:
<policies>
<inbound>
<base />
<authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
<backend>
<base />
</backend>
<outbound>
<base />
</outbound>
<on-error>
<base />
</on-error>
</policies>
Testing for Managed Identity Policy:
c) Policy to extract callerID (Subject from APIM)
For extracting other details from JWT, refer -
Azure API Management policy expressions | Microsoft Learn
<validate-jwt header-name="Authorization"
failed-validation-httpcode="401"
failed-validation-error-message="Token is invalid"
output-token-variable-name="jwt-token">
<issuers>
<issuer>{{myIssuer}}</issuer>
</issuers>
</validate-jwt>
<!-- Extract the subject and add it to a header -->
<set-header name="caller-objectid" exists-action="override">
<value>@(((Jwt)context.Variables["jwt-token"]).Subject)</value>
</set-header>
d) Logging and Monitoring using APIM:
Use Azure monitor and APIM to enable enhanced logging and monitoring of the published AOAI APIs. Learn more - Tutorial - Monitor published APIs in Azure API Management | Microsoft Learn
Sample log queries for prompt completion:
ApiManagementGatewayLogs
| extend model = tostring(parse_json(BackendResponseBody)['model'])
| extend prompttokens = parse_json(parse_json(BackendResponseBody)['usage'])['prompt_tokens']
| extend completiontokens = parse_json(parse_json(BackendResponseBody)['usage'])['completion_tokens']
| extend responsetext = (parse_json(parse_json(BackendResponseBody)['choices'])[0]['message'])
| extend prompttext = (parse_json(RequestBody)['messages'])
For more queries refer to documentation here: Implement logging and monitoring for Azure OpenAI large language models - Azure Architecture Center ...
e) For advanced logging, more than 8192 bytes refer to the documentation here: openai-python-enterprise-logging/advanced-logging at main · Azure-Samples/openai-python-enterprise-l...
f) For Budgets and cost management using APIM refer this blog - Azure Budgets and Azure OpenAI Cost Management - Microsoft Community Hub
3) AOAI with Frontdoor and APIM multi-region deployment for a full-fledged multi-region availability
Refer to the DR documentation - Deploy Azure API Management instance to multiple Azure regions - Azure API Management | Microsoft Le...
a. In Frontdoor give both APIM regional gateway URLs as backend Origins, example
https://apimname-westeurope-01.regional.azure-api.net & https://apimname-japaneast-01.regional.azure-api.net
b. Configure the API Management regional status endpoints - e.g. https://apimname-westeurope-01.regional.azure-api.net/status-0123456789abcdef
c. Sample policy to be used to make the regional gateways route to respective backends.
<policies>
<inbound>
<base />
<choose>
<when condition="@("West Europe".Equals(context.Deployment.Region, StringComparison.OrdinalIgnoreCase))">
<set-backend-service base-url="http://aoai-backend-westeurope.com/" />
</when>
<when condition="@("Japan East".Equals(context.Deployment.Region, StringComparison.OrdinalIgnoreCase))">
<set-backend-service base-url="http://aoai-backend-japaneast.com/" />
</when>
<otherwise>
<set-backend-service base-url="http://aoai-backend-other.com/" />
</otherwise>
</choose>
</inbound>
<backend>
<base />
</backend>
<outbound>
<base />
</outbound>
<on-error>
<base />
</on-error>
</policies>
In conclusion, this article will be a starting point to implement scalable architecture patterns using Azure OpenAI models with other Azure services. As we continue to explore the potential of AI, we’ll continue to update our patterns and documents, guiding us towards smarter and more efficient systems.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.