Azure App Service Limit (4) - CPU (Windows)
Published Sep 12 2023 09:27 PM
Microsoft

This is the fourth post in a series illustrating Azure App Service limits:

1)  Azure App Service Limit (1) - Remote Storage (Windows) - Microsoft Community Hub  

2)  Azure App Service Limit (2) - Temp File Usage (Windows) - Microsoft Community Hub  

3)  Azure App Service Limit (3) - Connection Limit (TCP Connection, SNAT and TLS Version) - Microsoft C... 

4)  Azure App Service Limit (4) - CPU (Windows) - Microsoft Community Hub  

5)  Azure App Service Limit (5) - Memory (Windows) - Microsoft Community Hub  

 

Besides the platform-side limits on remote storage, temporary folder usage, and connections, there are also machine-level resource limits to consider, such as CPU and memory.

 

High CPU usage is a common performance issue that can impact the responsiveness of a site in Azure App Service. However, as App Service is a Platform-as-a-Service (PaaS) environment, customers have limited access to troubleshoot machine-level resource issues.

 

To address high CPU issues in Azure App Service, this blog aims to clarify how you can check for high CPU usage and provides answers to common customer questions. By following the guidance in the blog, you can gain insights into troubleshooting and resolving high CPU utilization problems, ultimately improving the performance and responsiveness of your App Service.

 

When a machine is in a high CPU state, the site's performance is usually affected. Whatever HTTP status code is returned (500, 502, 400.604, etc.), the average response time of the site will be slower than normal. Before discussing CPU questions, let's look at how to determine whether your site is experiencing slowness. If you are already familiar with this, feel free to skip this section.

 

Where can I check the response time of the App Service?

Besides Application Insights, you can check the following two places:

a. On the Overview blade of the App Service, check the Response Time (average) metric:

Selina_Sun_0-1692061452820.png

 

b. In Diagnose and solve problems => Web App Slow:

Selina_Sun_1-1692061497706.png

The average response time is shown in milliseconds (ms):

Selina_Sun_2-1692061528228.png

 

Once you have confirmed that your site is experiencing performance issues, you can check whether CPU usage is high and explore potential solutions. The rest of this post addresses common questions about identifying and mitigating high CPU issues.

 

1. Where can I check if the machine is experiencing a high CPU issue?

You can check the metric in multiple places, but when investigating a performance issue we suggest App Service Diagnose and solve problems => Availability and Performance => High CPU Analysis, because from there you can easily switch to CPU usage at the process level.

Selina_Sun_0-1692061945371.png

Below is the overall CPU percentage usage at the machine level.

Selina_Sun_1-1692062002920.png

 

2. There are multiple places to check CPU usage, including the following. Which one should I rely on?

  • Application Insights
  • The Overview blade of the App Service plan or the App Service
  • The High CPU Analysis in Diagnose and solve problems

Because these places use different data sources, and their granularity and aggregation algorithms may vary, the metric values can differ slightly in some cases. Overall, however, the data from these sources should be consistent and show a similar CPU usage trend.

 

Furthermore, when referring to the metric data in the service plan or app service overview blade, it is generally recommended to focus on the average percentage value rather than the maximum or minimum CPU values or CPU time. The average percentage value gives a more accurate representation of the overall CPU usage over a specific time interval, providing a better understanding of the resource utilization pattern. By considering the average percentage value, you can monitor the CPU usage pattern and assess whether the utilization is consistently high or experiencing occasional spikes. This helps in determining the appropriate actions for optimizing resource allocation, scaling, or investigating performance bottlenecks.
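
To illustrate why the average is more informative than the maximum, here is a minimal sketch. The sample values are hypothetical one-minute CPU percentages, not real measurements, and `summarize_cpu` is a helper name we made up for this illustration.

```python
from statistics import mean


def summarize_cpu(samples):
    """Return the average, maximum, and minimum of CPU percentage samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return {"avg": mean(samples), "max": max(samples), "min": min(samples)}


# Hypothetical one-minute CPU percentage samples (illustrative only).
short_spike = [20, 22, 95, 21, 19, 23]      # one brief spike
sustained_load = [85, 88, 90, 87, 92, 89]   # consistently high

spike_summary = summarize_cpu(short_spike)
sustained_summary = summarize_cpu(sustained_load)

# The maximum values are close to each other, but the averages clearly
# separate an occasional spike from consistently high utilization.
print(spike_summary)
print(sustained_summary)
```

In this made-up data both series have a similar maximum, so the maximum alone cannot tell a short spike apart from sustained load, while the average can.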

 

3. Why is the CPU usage high while the application is almost idle?

The App Service, being a Platform-as-a-Service (PaaS) cloud environment, not only exposes the application to clients but also runs several processes on the same machine to monitor and ensure the site's health and security. These processes include metric sampling, antivirus, and other system-level tasks. It's important to note that these processes consume system resources, such as CPU and memory.

 

In scenarios where the machine is relatively small in terms of resources, it is more susceptible to impact when new processes start or stop. However, in general, if the CPU and memory usage of the machine remains below 80% or experiences spikes for only a short duration, it should not significantly impact the performance of the app.

 

In our experience, there are two scenarios where high CPU usage may be observed, particularly on small instances:

(1) The plan has a high site density

Since sites within the same plan share the same set of machines, each site has a set of system processes running alongside it. As the number of sites increases within a plan, the count of these system processes also increases. This can lead to a larger number of processes competing for resources, such as kernel mode CPU.

To ensure optimal performance and resource allocation, we recommend keeping a reasonable number of sites in each plan. The recommended site counts are shown below:

Selina_Sun_2-1692062572296.png

 

(2) The CPU spike happened around system maintenance

During system maintenance, all the sites on a particular machine are moved to a new machine. During this migration, the CPU usage of the site's process may spike temporarily, especially if the site is normally idle. If the site count on the machine is high, the CPU spike during startup and shutdown can become more noticeable.

If you suspect that specific system processes are causing issues within your App Service, we recommend reaching out to the Microsoft support team. They can investigate the situation thoroughly and identify potential issues with system processes and application processes.

 

4. What is the threshold of the CPU usage for the App Service?

The threshold for CPU usage in Azure App Service is not fixed and can vary depending on the specific requirements and workload of your application. However, as a general guideline, we recommend keeping CPU usage below 70-80% to ensure optimal performance and responsiveness. You can check the machine's core count on the Scale up blade, as shown below:

Selina_Sun_3-1692062662441.png
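
As a rough illustration of the guideline above (short spikes are usually acceptable, sustained usage above roughly 80% is not), the following sketch flags a series of CPU samples only when it stays above a threshold for several consecutive samples. The function names, the sample data, and the `min_consecutive=5` default are our own assumptions for illustration; they are not platform settings.

```python
def longest_run_above(samples, threshold=80.0):
    """Length of the longest run of consecutive samples above the threshold."""
    longest = current = 0
    for value in samples:
        if value > threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def is_sustained_high(samples, threshold=80.0, min_consecutive=5):
    """True if CPU stayed above the threshold for min_consecutive samples in a row.

    min_consecutive is an assumed value; tune it to your sampling interval.
    """
    return longest_run_above(samples, threshold) >= min_consecutive


# Hypothetical one-minute CPU percentage samples (illustrative only).
brief_spike = [50, 85, 90, 60, 55, 50, 45]
sustained = [82, 85, 90, 88, 91, 86, 70]

print(is_sustained_high(brief_spike))
print(is_sustained_high(sustained))
```

A check like this could sit behind an alert rule so that a two-minute spike during a restart does not page anyone, while several minutes above the threshold does.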

 

5. Where can I check which process caused the high CPU? 

In High CPU Analysis, go to CPU Drill Down and click the View details link. This shows the CPU usage of every site and its Kudu site on each machine, which makes it easy to find the highest consumer.

Selina_Sun_4-1692062752621.png

Selina_Sun_5-1692062774172.png
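
If you copy the drill-down figures out of the portal, finding the top consumer on each machine is a simple per-instance maximum. The sketch below assumes rows of (instance, process name, CPU percent); the instance names, site names, and values are hypothetical, and the row layout is our own choice rather than an export format of the portal.

```python
def top_consumer_per_instance(rows):
    """Map each instance to its (process, cpu_percent) pair with the highest CPU.

    rows: iterable of (instance, process_name, cpu_percent) tuples.
    """
    best = {}
    for instance, process, cpu in rows:
        if instance not in best or cpu > best[instance][1]:
            best[instance] = (process, cpu)
    return best


# Hypothetical drill-down data (illustrative only).
rows = [
    ("instance-A", "site1", 12.0),
    ("instance-A", "site1 (Kudu)", 3.5),
    ("instance-A", "site2", 71.0),
    ("instance-B", "site1", 9.0),
    ("instance-B", "site2", 15.5),
]

top = top_consumer_per_instance(rows)
for instance, (process, cpu) in sorted(top.items()):
    print(f"{instance}: {process} at {cpu}% CPU")
```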

 

6. Why do some machines have high CPU usage while others do not? Is there an issue with the high-CPU machines?

In most cases, all backend machines within Azure App Service are using the same platform version and identical settings, unless they are undergoing upgrade operations. Based on our experiences, performance issues caused directly by the Azure App Service machine itself are exceedingly rare. Therefore, it is recommended to initially focus on narrowing down the issue by examining the configuration settings, such as ARR Affinity, and investigating the application-level factors.

 

By following the diagram in question 7 below, you can troubleshoot and identify configuration or application-related issues that could be affecting performance. This helps isolate and resolve bottlenecks in the configuration or the application.

 

However, if you genuinely suspect that the machine itself is causing the issue, you can consider replacing the machine using the Reboot Worker REST API call. This initiates the migration of your site to a different machine. Nevertheless, it is advisable to thoroughly investigate and rule out configuration and application causes before resorting to machine replacement.
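
For reference, the sketch below only builds the request URL for the Reboot Worker call; it does not send anything. The path reflects our understanding of the App Service Plans "Reboot Worker" operation in the Azure Resource Manager REST API (a POST request), so please verify it against the current REST reference before use. The `api_version` default and all identifiers in the example are placeholders we assumed for illustration.

```python
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"


def reboot_worker_url(subscription_id, resource_group, plan_name, worker_name,
                      api_version="2022-03-01"):
    """Build the URL for the Reboot Worker operation on an App Service plan.

    api_version is an assumed value; check the REST reference for the
    versions that are currently supported.
    """
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.Web/serverfarms/{quote(plan_name, safe='')}"
        f"/workers/{quote(worker_name, safe='')}/reboot"
    )
    return f"{ARM_ENDPOINT}{path}?api-version={api_version}"


# Placeholder identifiers (hypothetical, not real resources).
example_url = reboot_worker_url(
    "00000000-0000-0000-0000-000000000000", "my-rg", "my-plan", "my-worker"
)
print("POST", example_url)
```

The request must be sent as an authenticated POST with an Azure Resource Manager bearer token; how you obtain the token depends on your tooling and is left out here.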

 

7. What should I do if I notice a high CPU issue?

If you encounter a high CPU issue, you can narrow it down by following the diagram below:

Selina_Sun_6-1692062925276.png

If you conclude that the high CPU usage is caused by the application code or WebJob code, several tools are available for further investigation. For example, for .NET and .NET Core applications, you can capture a .NET Profiler Trace or collect a memory dump. Note that a memory dump is a snapshot of process memory at the time of capture, so it is only useful if it is captured while the issue is occurring, for example when CPU usage is at 80% or higher. A dump captured during normal operation with lower CPU usage only shows the call stacks of the running threads and gives no insight into the root cause of the high CPU usage.

Last update: Sep 12 2023 10:25 PM