application modernization
Calculating Chargebacks for Business Units/Projects Utilizing a Shared Azure OpenAI Instance
Azure OpenAI Service is at the forefront of technological innovation, offering REST API access to OpenAI's suite of language models, including GPT-4, GPT-3.5-Turbo, and the Embeddings model series.

Enhancing Throughput for Scale

As enterprises deploy OpenAI's powerful language models across various business units, they often require granular control over configuration and performance. To address this need, Azure OpenAI Service offers provisioned throughput, a feature that provides a dedicated connection to OpenAI models with guaranteed performance levels. Throughput is quantified in tokens per second (tokens/sec), allowing organizations to precisely measure and optimize performance for both prompts and completions. The provisioned throughput model provides enhanced management and adaptability for varying workloads, guaranteeing system readiness for spikes in demand. It also ensures a uniform user experience and steady performance for applications that require real-time responses.

Resource Sharing and Chargeback Mechanisms

Large organizations frequently provision a single instance of Azure OpenAI Service that is shared across multiple internal departments. This shared use necessitates an efficient mechanism for allocating costs to each business unit or consumer based on the number of tokens consumed. This article delves into how chargeback is calculated for each business unit based on its token usage.

Leveraging Azure API Management Policies for Token Tracking

Azure API Management (APIM) policies offer a powerful solution for monitoring and logging the token consumption of each internal application. The process can be summarized in the following steps.

Sample code: refer to the GitHub repository private-openai-with-apim-for-chargeback for step-by-step instructions on how to build the solution outlined below.
1. Client Applications Authorize to API Management

To ensure that only legitimate clients can call the Azure OpenAI APIs, each client must first authenticate against Azure Active Directory and then call the APIM endpoint. In this scenario, the API Management service acts on behalf of the backend API, and the calling application requests access to the API Management instance. The scope of the access token is between the calling application and the API Management gateway. In API Management, configure a policy (validate-jwt or validate-azure-ad-token) to validate the token before the gateway passes the request to the backend.

2. APIM Redirects the Request to the OpenAI Service via Private Endpoint

Upon successful verification of the token, APIM routes the request to the Azure OpenAI service's completions endpoint, whose response also includes the prompt and completion token counts.

3. Capture and Log the API Response to Event Hubs

Use the log-to-eventhub policy to capture outgoing responses for logging and analytics purposes. To use this policy, a logger needs to be configured in API Management:

```powershell
# API Management service-specific details
$apimServiceName = "apim-hello-world"
$resourceGroupName = "myResourceGroup"

# Create the logger
$context = New-AzApiManagementContext -ResourceGroupName $resourceGroupName -ServiceName $apimServiceName
New-AzApiManagementLogger -Context $context -LoggerId "OpenAiChargeBackLogger" -Name "ApimEventHub" `
    -ConnectionString "Endpoint=sb://<EventHubsNamespace>.servicebus.windows.net/;SharedAccessKeyName=<KeyName>;SharedAccessKey=<key>" `
    -Description "Event hub logger with connection string"
```

Within the outbound policy section, pull specific data from the body of the response and send it to the previously configured Event Hubs instance.
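As an illustration of the token validation described in step 1, a minimal validate-azure-ad-token inbound policy could look like the following sketch. The tenant ID and client application ID below are placeholder values, not values from this article:

```xml
<inbound>
    <!-- Reject requests whose Azure AD token is missing, expired, or issued
         to an application that is not on the allowed list. -->
    <validate-azure-ad-token tenant-id="{{your-tenant-id}}">
        <client-application-ids>
            <application-id>{{calling-app-client-id}}</application-id>
        </client-application-ids>
    </validate-azure-ad-token>
    <base />
</inbound>
```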
This is not just a simple logging exercise; it is an entry point into a whole ecosystem of real-time analytics and monitoring capabilities:

```xml
<outbound>
    <choose>
        <when condition="@(context.Response.StatusCode == 200)">
            <log-to-eventhub logger-id="OpenAiChargeBackLogger">@{
                var responseBody = context.Response.Body?.As<JObject>(true);
                return new JObject(
                    new JProperty("Timestamp", DateTime.UtcNow.ToString()),
                    new JProperty("ApiOperation", responseBody["object"].ToString()),
                    new JProperty("AppKey", context.Request.Headers.GetValueOrDefault("Ocp-Apim-Subscription-Key", string.Empty)),
                    new JProperty("PromptTokens", responseBody["usage"]["prompt_tokens"].ToString()),
                    new JProperty("CompletionTokens", responseBody["usage"]["completion_tokens"].ToString()),
                    new JProperty("TotalTokens", responseBody["usage"]["total_tokens"].ToString())
                ).ToString();
            }</log-to-eventhub>
        </when>
    </choose>
    <base />
</outbound>
```

Note that the logger-id must match the LoggerId used when the logger was created (OpenAiChargeBackLogger above).

Event Hubs serves as a powerful fulcrum, offering seamless integration with a wide array of Azure and Microsoft services. For example, the logged data can be streamed directly to Azure Stream Analytics for real-time analytics, or to Power BI for real-time dashboards. With Azure Event Grid, the same data can also trigger workflows or automate tasks based on specific conditions in the incoming responses. Moreover, the architecture is extensible to non-Microsoft services: Event Hubs can interact smoothly with external platforms like Apache Spark, allowing you to perform data transformations or feed machine learning models.

4. Data Processing with Azure Functions

An Azure Function is invoked when data is sent to the Event Hubs instance, allowing for bespoke data processing in line with your organization's unique requirements. For instance, this could range from dispatching the data to Azure Monitor, streaming it to Power BI dashboards, or sending detailed consumption reports via Azure Communication Services.
```csharp
[Function("TokenUsageFunction")]
public async Task Run(
    [EventHubTrigger("%EventHubName%", Connection = "EventHubConnection")] string[] openAiTokenResponse)
{
    // Event Hubs messages arrive as an array
    foreach (var tokenData in openAiTokenResponse)
    {
        try
        {
            _logger.LogInformation($"Azure OpenAI Tokens Data Received: {tokenData}");
            var openAiToken = JsonSerializer.Deserialize<OpenAiToken>(tokenData);
            if (openAiToken == null)
            {
                _logger.LogError("Invalid Azure OpenAI token response received. Skipping.");
                continue;
            }
            _telemetryClient.TrackEvent("Azure OpenAI Tokens", openAiToken.ToDictionary());
        }
        catch (Exception e)
        {
            _logger.LogError(e, $"Error occurred when processing token data: {tokenData}");
        }
    }
}
```

In the example above, the Azure Function processes the token response data from Event Hubs and sends it to Application Insights telemetry, and a basic dashboard is configured in Azure displaying the token consumption for each client application. This information can conveniently be used to compute chargeback costs. A sample query used in the dashboard that fetches the tokens consumed by a specific client:

```kusto
customEvents
| where name contains "Azure OpenAI Tokens"
| extend tokenData = parse_json(customDimensions)
| where tokenData.AppKey contains "your-client-key"
| project
    Timestamp = tokenData.Timestamp,
    Stream = tokenData.Stream,
    ApiOperation = tokenData.ApiOperation,
    PromptTokens = tokenData.PromptTokens,
    CompletionTokens = tokenData.CompletionTokens,
    TotalTokens = tokenData.TotalTokens
```

Azure OpenAI Landing Zone reference architecture

A crucial detail to ensure the effectiveness of this approach is to secure the Azure OpenAI service by implementing private endpoints and using managed identities for App Service to authorize access to Azure AI services. This limits access so that only the App Service can communicate with the Azure OpenAI service.
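The per-application token totals surfaced by a query like the one above can then be converted into chargeback amounts by applying per-token prices. A minimal sketch in Python, where the per-1K-token prices are illustrative placeholders rather than actual Azure pricing:

```python
# Illustrative per-1K-token prices. These are placeholders, not real
# Azure OpenAI pricing; substitute your negotiated or published rates.
PRICES = {"gpt-4": {"prompt": 0.03, "completion": 0.06}}

def chargeback(usage_by_app, prices=PRICES):
    """Compute cost per client app from aggregated token counts.

    usage_by_app maps an app key to a list of (model, prompt_tokens,
    completion_tokens) records, e.g. aggregated from Application Insights.
    """
    costs = {}
    for app, records in usage_by_app.items():
        total = 0.0
        for model, prompt_tokens, completion_tokens in records:
            rate = prices[model]
            total += prompt_tokens / 1000 * rate["prompt"]
            total += completion_tokens / 1000 * rate["completion"]
        costs[app] = round(total, 4)
    return costs

usage = {
    "marketing-app": [("gpt-4", 12000, 4000)],
    "support-bot": [("gpt-4", 5000, 2500)],
}
print(chargeback(usage))  # {'marketing-app': 0.6, 'support-bot': 0.3}
```

In practice the usage records would come from the telemetry pipeline described in this article rather than a hard-coded dictionary.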
Failing to do this would render the solution ineffective: anyone who obtained the access key for the OpenAI service could bypass APIM/App Service and call the OpenAI service directly. Refer to the Azure OpenAI Landing Zone reference architecture to build a secure and scalable AI environment.

Additional Considerations

If the client application is external, consider using an Application Gateway in front of Azure APIM. If "stream" is set to true, token counts are not returned in the response. In that case, libraries like tiktoken (Python) or gpt-3-encoder (JavaScript) can be used to programmatically calculate the token counts for the user prompt and completion response for most GPT-3 models. A useful guideline to remember is that in typical English text one token is approximately four characters, or about three-quarters of a word, so 100 tokens are roughly equivalent to 75 words. (P.S. Microsoft does not endorse or guarantee any third-party libraries.) A subscription key or a custom header like app-key can also be used to uniquely identify the client, as the appId in the OAuth token is not very intuitive. Rate limiting can be implemented for incoming requests using OAuth tokens or subscription keys, adding another layer of security and resource management. The solution can also be extended to redirect different clients to different Azure OpenAI instances. For example, some clients can use an Azure OpenAI instance with default quotas, whereas premium clients consume an Azure OpenAI instance with provisioned throughput.

Conclusion

Azure OpenAI Service stands as an indispensable tool for organizations seeking to harness the immense power of language models. With provisioned throughput, clients can define their usage limits in throughput units and freely allocate them to the OpenAI model of their choice.
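For the streaming case noted in the considerations above, the four-characters-per-token guideline can be turned into a quick estimator. This is only a rough approximation under that stated rule of thumb; the function below is a hypothetical helper, not part of tiktoken or any library, and a real tokenizer should be used when exact counts are needed:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token guideline.

    Only an approximation for typical English text; use a real tokenizer
    (e.g. tiktoken for Python) when exact counts are required.
    """
    if not text:
        return 0
    return max(1, round(len(text) / 4))

# ~75 words of English text is roughly 100 tokens under this heuristic.
prompt = "word " * 75  # 75 words of 5 characters each (including the space)
print(estimate_tokens(prompt))
```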
However, the financial commitment can be significant and depends on factors like the chosen model's type, size, and utilization. An effective chargeback system offers several advantages, such as heightened accountability, transparent costing, and judicious use of resources within the organization.

Checklist for Migrating Web Apps to App Service
App Service continues to invest in migration tooling to allow customers to easily migrate their web apps to App Service. The current set of tools enables discovery, assessment, and migration of web apps across various scenarios and scopes: a standalone web app, a single IIS server, or even an entire datacenter.

Extend the capabilities of your AKS deployments with Kubernetes Apps on Azure Marketplace
We’re excited to announce that Kubernetes Apps in the Azure Marketplace are now generally available. Azure Kubernetes Service (AKS) provides a robust and scalable managed Kubernetes platform for organizations running their most mission-critical applications on Azure. With Kubernetes Apps, teams can further extend the capabilities of their AKS deployments with a vibrant ecosystem of tested and transactable third-party solutions from industry-leading partners and popular open-source offerings.

AKS at Build: Enhancing security, reliability, and ease of use for developers and platform teams
At Microsoft Build 2024, we’re releasing a host of new features for Azure Kubernetes Service (AKS) aimed at making Kubernetes adoption easier and more accessible to a greater number of teams.

Top 10 Considerations for running your workload successfully on Azure this Holiday Season
Are you a Microsoft customer running your workload on Azure and preparing for this season's holiday shopping? As the joyful time of year approaches, a key theme is that this holiday season will mean more togetherness, more commerce, and more revelry than last year. Black Friday, Small Business Saturday, and Cyber Monday will test your app's limits, so it's time for your infrastructure and application teams to ensure your platform delivers when it is needed most. Whether it is shopping applications on web and mobile, payment gateways, banking systems supporting payments, inventory systems, or billing systems, everything associated with the shopping season should be prepared to face the load. Application resilience is vital to keep up with the higher customer demand during the holiday season. Here are my top ten considerations for running your workload on Azure to handle this holiday season:

1. Multi-region: Consider running in multiple Azure regions for resilience; check our mission-critical reference architecture. If running active-passive, perform a DR drill in the paired Azure region. If you run in a single region for whatever reason, consider running your workload across multiple availability zones for better resilience.
2. Load test: Perform load testing, preferably with a fully managed service like Azure Load Testing, which makes it easy to generate high-scale load and identify app performance bottlenecks.
3. Right sizing: Test whether the current instance size and count of the Azure components in your solution can handle the load; if not, resize well ahead of time. Baseline your Azure resources to support the peak. Don't rely fully on autoscale, as scaling your infrastructure to meet extreme spikes still takes time.
4. Chaos testing: Test for reliability using Azure Chaos Studio, a method of experimenting with controlled fault injection against your applications. Review the reliability checklist for a more exhaustive set of reliability guidance and best practices under the Well-Architected Framework (WAF).
5. Azure Advisor: Advisor is a personalized cloud consultant that helps you follow best practices to optimize your Azure deployments.
6. Quota: Review the quotas for the resources you are using in your subscriptions. Quotas can be adjustable or non-adjustable, and each subscription has a default value for each quota. You can request an increase for an adjustable quota from the Azure Home > My quotas page, providing an amount or usage percentage and submitting it directly; this is the quickest way to increase quotas.
7. Supported versions: Be on a supported version of each service. For example, if you are running Azure Kubernetes Service, upgrade to a supported AKS version, ideally 1.26 or higher. This ensures Microsoft support teams can help you if you run into issues.
8. Service Health: Configure Service Health advisory and service issue alerts. This is the primary way Azure communicates back to you, and failing to act on key alerts and notifications can be costly.
9. Monitoring for reliability: Get an overall picture of application health. If something fails, you need to know that it failed, when it failed, and why. Azure Monitor is a comprehensive monitoring solution for collecting, analyzing, and responding to monitoring data from your cloud environments. Review reliability across your Azure services via the Reliability Workbook; the key tabs to look at are Availability Zones and Capacity for each service you care about. This is the newest feature that I personally found useful for catching some common mistakes.
10. DDoS: The holiday season is DDoS season as well. Ensure all key security measures are in place, along with DDoS protection.

Obviously, this list is not fully exhaustive, but I've tried to capture the top items to make sure you are set up for success running on Azure this holiday season. Please let me know what you think.

PS: Do check out the 7-part online series "Prepare your applications for the holiday season": https://developer.microsoft.com/en-us/reactor/series/S-1189

Announcing: Contoso Real Estate JavaScript Composable Application Reference Sample
Today we are releasing Azure-Samples/contoso-real-estate as open source, giving JavaScript developers a comprehensive architecture reference sample for building enterprise-grade, cloud-native, and composable solutions on Azure, end-to-end with JavaScript.