With the rise of artificial intelligence (AI), many industries and users are looking for ways to leverage this technology for their own development purposes and use cases. The field is expected to keep growing in the coming years as more companies invest in AI research and development.
Microsoft’s Azure OpenAI Service provides REST API access to OpenAI's powerful language models, including the GPT-4, GPT-3.5-Turbo, and Embeddings model series. With Azure OpenAI, customers get the security capabilities of Microsoft Azure while running the same models as OpenAI, along with private networking, regional availability, and responsible AI content filtering. However, adopting new technology brings its own difficulties and apprehensions, foremost among them security and governance. This is where Azure API Management comes in: it provides a robust solution for hurdles like throttling and monitoring, lets you expose Azure OpenAI endpoints securely so they stay protected, fast, and observable, and offers comprehensive support for the discovery, integration, and use of these APIs by both internal and external users.
This matters because most users are currently looking for a quick and secure way to start using Azure OpenAI. The most popular question we get is, “How can I safely and securely create my own ChatGPT with our own company-specific data?”. Once an MVP is up and running, the next question is, “How can I scale this into production?”. For our use case today, we want to leverage the power of Azure API Management to meet production-ready requirements and provide a secure and reliable solution.
"servers": [
{
"url": "<AZURE_OPENAI_ENDPOINT>/openai",
"variables": {
"endpoint": {
"default": "<AZURE_OPENAI_RESOURCE_NAME>.openai.azure.com"
}
}
}
To keep the Azure OpenAI key out of client hands, an inbound policy can inject it from an API Management named value on every request:
<policies>
    <inbound>
        <base />
        <!-- Override any caller-supplied api-key header with the named value APPSVC_KEY -->
        <set-header name="api-key" exists-action="override">
            <value>{{APPSVC_KEY}}</value>
        </set-header>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
Note: This is a simplified implementation. Named values in API Management come in three types (plain, secret, and Key Vault); ideally you should use the third type, Key Vault, so the secret is sourced externally and can be rotated without touching the policy.
Any AI project needs to be carefully architected. However, given how new Azure OpenAI is, there are a few details we need to take into consideration before building AI projects:
1. Advanced request throttling with Azure API Management:
API throttling, also known as rate limiting, is a mechanism used to control the rate at which clients can make requests to an API. It is implemented to prevent abuse, ensure fair usage, protect server resources, and maintain overall system stability and performance. Throttling restricts the number of API requests that a client can make within a specified time period.
Azure API Management helps with flexible throttling. Flexible throttling allows you to set different throttling limits for different types of clients or requests based on various criteria, such as client identity, subscription level, or API product.
Below is an example of an API management policy in Azure API Management that implements rate limiting for cost control purposes. This policy will limit the number of requests a client can make to your API within a specified time period:
<policies>
    <inbound>
        <base />
        <!-- Limit each caller, keyed here by client IP, to 1000 calls per hour -->
        <rate-limit-by-key calls="1000" renewal-period="3600"
            counter-key="@(context.Request.IpAddress)" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
Explanation of the policy: the rate-limit-by-key policy caps each caller, identified by the counter-key expression (the client IP address in this example), at 1,000 calls per 3,600-second (one-hour) window; once the limit is hit, further requests receive a 429 Too Many Requests response until the window renews.
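If you want limits per subscription rather than per IP address, a minimal sketch keyed on the subscription ID might look like the following (the limits shown are arbitrary placeholders, not recommendations):
<inbound>
    <base />
    <!-- Short-term rate limit: 100 calls per minute per subscription -->
    <rate-limit-by-key calls="100" renewal-period="60"
        counter-key="@(context.Subscription?.Id ?? context.Request.IpAddress)" />
    <!-- Longer-term quota: 10,000 calls per 30 days (2,592,000 seconds) per subscription -->
    <quota-by-key calls="10000" renewal-period="2592000"
        counter-key="@(context.Subscription?.Id ?? context.Request.IpAddress)" />
</inbound>
The null-coalescing fallback to the client IP keeps the policy valid even for anonymous calls that carry no subscription.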
For a more advanced example, check out our Open AI Cost Gateway Pattern, where you can track spending by product for every request and rate limit each product based on its spending limits.
2. Use Application Gateway and Azure API Management together for API Monitoring:
Logging and monitoring are critical aspects of managing and maintaining APIs, ensuring their availability, performance, and security. When integrating Azure OpenAI models into your solutions through Azure API Management and an API gateway, you need to establish robust logging and monitoring practices to gain insights into API usage, detect and troubleshoot issues, and optimize performance. Implementation example:
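As one possible sketch (not the only approach), the log-to-eventhub policy can emit a custom usage record for every call; this assumes you have already created an API Management logger named openai-usage-logger backed by an Azure Event Hub:
<inbound>
    <base />
    <!-- Emit a usage record per request to the (assumed) Event Hub logger "openai-usage-logger" -->
    <log-to-eventhub logger-id="openai-usage-logger">@{
        return new JObject(
            new JProperty("timestamp", DateTime.UtcNow),
            new JProperty("operation", context.Operation.Name),
            new JProperty("subscriptionId", context.Subscription?.Id ?? "anonymous"),
            new JProperty("requestId", context.RequestId)
        ).ToString();
    }</log-to-eventhub>
</inbound>
Pairing a policy like this with API Management's built-in Application Insights diagnostics gives you both request-level records and aggregate dashboards.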
See Tutorial: Monitor published APIs for more detailed information.
3. Use Application Gateway for load balancing:
Load balancing is crucial in Azure OpenAI projects to efficiently distribute incoming API requests across multiple resources, ensuring optimal performance and resource utilization. The challenge lies in managing uneven traffic distribution during peak hours, achieving scalability while avoiding underutilization or overloading of servers, ensuring fault tolerance, and dynamically allocating resources based on fluctuating workloads. Proper load distribution strategies and session persistence play vital roles in maintaining high availability and seamless user experiences.
For this we can use Azure Application Gateway. Application Gateway sits between clients and services and acts as a reverse proxy, routing requests from clients to services. API Management doesn't perform any load balancing itself, so it should be used in conjunction with a load balancer such as Azure Application Gateway, which can route traffic to different endpoints. For more detailed information, check out this documentation.
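Application Gateway handles this distribution at the network layer, so there is no policy snippet that reproduces it exactly. Purely as an illustration of what is possible inside API Management itself, a minimal failover sketch that retries a second Azure OpenAI backend on throttling or server errors could look like this (the secondary backend URL is a hypothetical placeholder):
<backend>
    <!-- Retry once against a secondary (hypothetical) Azure OpenAI backend on 429/5xx -->
    <retry condition="@(context.Response != null && (context.Response.StatusCode == 429 || context.Response.StatusCode >= 500))"
           count="1" interval="1">
        <choose>
            <when condition="@(context.Response != null && (context.Response.StatusCode == 429 || context.Response.StatusCode >= 500))">
                <set-backend-service base-url="https://<SECONDARY_OPENAI_RESOURCE>.openai.azure.com/openai" />
            </when>
        </choose>
        <forward-request buffer-request-body="true" />
    </retry>
</backend>
For true load balancing across instances, though, Application Gateway remains the approach this article recommends.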
If you are looking for a way to create your first Azure OpenAI solution, you might be overwhelmed by the many options and possibilities. A good strategy is to start with a low-risk, high-value use case that targets your internal audience. For instance, one of the common solutions for adoption is the architecture below, which is available through one of our Microsoft Accelerators. It enables you to build an MVP solution that uses Azure OpenAI to create a private search engine and a chat experience, powered by Azure ChatGPT and Azure GPT-4, for your web application or integrated into various channels through our Azure Bot Framework, like Microsoft Teams. The advantage of this architecture is that it lets you use your own data securely within your own Azure tenant and access data from various sources and types. As shown in the green highlighted box, the model is grounded with data from Azure Cognitive Search, Azure SQL Database, specific tabular files, and even the internet via our Azure Bing API.
For more information on the details of the accelerator, you can find our GitHub repo here and the introduction deck here.
As mentioned, the accelerator architecture above is a great place to start building your MVP solution. To make it an enterprise-ready solution and to address some of the challenges mentioned above, like API management and load balancing, post-MVP we would highly recommend adopting the Microsoft reference architecture below. This architecture highlights the key components needed to properly manage our APIs through Azure API Management, utilize Application Gateway for load balancing, and monitor our APIs with Azure Monitor. Overall, the solution enables advanced logging for tracking API usage and performance, plus robust security measures to help protect sensitive data and prevent malicious activity.
Note: Reference to Architecture
Feel free to reach out with any questions about these solutions or maybe even proposals on future blog posts!
A big thank you goes to our specialists in these subject areas; make sure to follow them for any updates or reach out to them directly: