Deployed Azure AI chatbot assistant on App Service with the GPT-4 1106-preview model


Hi,

I've deployed an app to Azure App Service. It uses the GPT-4 1106-preview model, which I deployed in Azure OpenAI Studio. When I run the app, chat responses are quite slow: sending "hi" takes about 3-5 seconds to get a reply, a short paragraph takes about 14-26 seconds, and long paragraphs take up to a minute. What could be causing this?
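A likely factor worth checking: the model generates its reply token by token, so total time grows with the length of the answer, while the time until the first token arrives stays comparatively short. The minimal sketch below separates those two numbers, assuming the response is consumed as an iterable of text chunks (for example, the Azure OpenAI Python SDK lets you pass `stream=True` to a chat completion call, and you would extract the text from each chunk before feeding it in). `time_stream` and `fake_stream` are hypothetical helpers for illustration only, not part of any SDK.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_stream(chunks: Iterable[str]) -> Tuple[str, float, float]:
    """Consume a stream of text chunks and return (full_text, ttft_s, total_s).

    ttft_s: seconds until the first chunk arrived (time to first token).
    total_s: seconds until the stream was exhausted.
    """
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in chunks:
        if first is None:
            first = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    if first is None:  # empty stream: no chunk ever arrived
        first = total
    return "".join(parts), first, total


def fake_stream(tokens: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model stream: emits one token every delay_s."""
    for i in range(tokens):
        time.sleep(delay_s)
        yield f"t{i} "


if __name__ == "__main__":
    text, ttft, total = time_stream(fake_stream(10, 0.01))
    print(f"time to first token: {ttft:.3f}s, total: {total:.3f}s")
```

If the real stream shows a short time to first token but a long total time, the wait is dominated by output length. Showing tokens in the UI as they arrive makes the app feel faster, even though total generation time does not change.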

 

Thanks

Hello Emmae,
Please consider these pointers.
• Review and optimize the resource allocation for both your Azure App Service and the GPT-4 model deployment in Azure OpenAI Studio.
• Monitor and analyze performance metrics to identify bottlenecks and areas for improvement.
• Implement caching or pre-fetching mechanisms to reduce repeated processing of similar requests.
• Profile your application code and model inference pipeline to identify any inefficiencies or performance bottlenecks.
• Consider asynchronous processing or batching of requests to optimize resource utilization and reduce response times.
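For the caching pointer, a minimal sketch is shown below, assuming the expensive step is a single function that takes a prompt string and returns the reply. `PromptCache` and the `generate` callable are hypothetical names for illustration. Note that this only helps when users send the same prompt repeatedly; chat turns that carry different conversation history will rarely hit the cache.

```python
from collections import OrderedDict
from typing import Callable


class PromptCache:
    """Small LRU cache mapping a normalized prompt to a previously generated reply."""

    def __init__(self, generate: Callable[[str], str], max_entries: int = 256):
        self._generate = generate  # hypothetical function that actually calls the model
        self._max = max_entries
        self._store: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize case and whitespace so "Hi" and " hi " share one entry.
        return " ".join(prompt.lower().split())

    def reply(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._generate(prompt)
        self._store[key] = answer
        if len(self._store) > self._max:
            self._store.popitem(last=False)  # evict the least recently used entry
        return answer
```

Keeping the cache bounded (here with LRU eviction) stops memory from growing without limit on a long-running App Service instance.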

Hello @Nitinnks, thanks for your response.

I've tried allocating more resources to my App Service, but it doesn't seem to make a difference. The pricing tier for my GPT-4 model deployment in Azure OpenAI Studio is S0 (Standard). Is that currently the highest tier? I can't find a higher tier to upgrade to.

What tool and performance metrics can I use to identify bottlenecks and areas for improvement?

 

Thanks

Hello @emmae157, you can use the following tools for monitoring:

 

  1. Azure Monitor: Azure Monitor provides comprehensive monitoring capabilities for Azure resources, including Azure App Service and Azure OpenAI Studio deployments. It offers features like metrics, logs, alerts, and insights to track performance and identify bottlenecks.

  2. Application Performance Monitoring (APM) Tools: Tools like Application Insights, New Relic, Datadog, or Dynatrace offer deep insights into application performance, including response times, resource utilization, and transaction traces. They can help pinpoint performance issues and optimize resource allocation.
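Whichever tool you use, summarizing request latencies as percentiles (p50, p95) is usually more telling than an average, because a few long-answer requests can pull the mean up. A minimal sketch, assuming you have exported per-request durations in seconds (for example from Application Insights request telemetry or your own timing logs); `latency_summary` is a hypothetical helper built only on Python's standard `statistics` module.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(samples_s: Sequence[float]) -> Dict[str, float]:
    """Summarize request latencies (seconds) as count, mean, p50, p95 and max."""
    if len(samples_s) < 2:
        raise ValueError("need at least two samples")
    # 99 cut points; index 49 is the 50th percentile, index 94 the 95th.
    cuts = statistics.quantiles(samples_s, n=100, method="inclusive")
    return {
        "count": float(len(samples_s)),
        "mean": statistics.fmean(samples_s),
        "p50": cuts[49],
        "p95": cuts[94],
        "max": max(samples_s),
    }
```

Comparing these numbers for the App Service request as a whole against the time spent waiting on the model call shows whether the bottleneck is in your app or in the model deployment.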