How to isolate latency issues for an Azure Storage Account
Published Jun 01 2020 03:29 AM
Microsoft

Scenario:

Let’s say your application uses an Azure Storage Account for uploading and downloading blobs, and you are seeing latency in either the upload or the download. There has been no change in your application, yet you are unable to figure out what is causing the issue, and you are unsure whether the latency comes from the application or from the Azure Storage Account.

Now, let’s see how we can isolate the latency issue and understand whether the latency comes from the storage account or from the client application.

Before we look at how to troubleshoot the latency issue, let us understand what Server Latency and E2E Latency are:

  • End-to-end (E2E) latency: It measures the interval from when the Azure Storage Server receives the first packet of the request until the Azure Storage Server receives a client acknowledgment on the last packet of the response. In simpler terms, it is the round trip of an operation: starting at the client application, plus the time taken to process the request at the Storage Server, and then coming back to the client application.
  • Server latency: It measures the interval from when the Azure Storage Server receives the last packet of the request until the first packet of the response is returned from the Azure Storage Server. In simpler terms, it is the time taken by the Storage Server to process a given request.

The first step in identifying any latency issue is to check the metrics available in the Portal for the storage account. Navigate to the storage account -> go to the Metrics tab -> select the metric values for the storage service you are using. For example, I am using the Blob service to select the metric values:
1.png

 

In usual scenarios, there should be little difference between the Server Latency and the E2E Latency. If the E2E latency is significantly higher than the Server Latency, you need to investigate why there is client-side latency.
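As a rough illustration of this comparison, the sketch below subtracts the server latency from the E2E latency to estimate the time spent outside the storage service (network plus client). The function names and the ratio of 2 used as the cut-off for "significantly higher" are assumptions made for illustration; they are not values defined by Azure.

```python
def client_side_latency_ms(e2e_ms: float, server_ms: float) -> float:
    """Estimate the time spent outside the storage service (network + client)."""
    return max(e2e_ms - server_ms, 0.0)


def needs_client_investigation(e2e_ms: float, server_ms: float,
                               ratio_threshold: float = 2.0) -> bool:
    """Flag a sample when E2E latency is much higher than server latency.

    ratio_threshold is a hypothetical cut-off; tune it for your workload.
    """
    if server_ms <= 0:
        # Avoid dividing by zero; any measurable E2E time is then client-side.
        return e2e_ms > 0
    return (e2e_ms / server_ms) >= ratio_threshold


if __name__ == "__main__":
    # Illustrative numbers only, not taken from a real storage account.
    print(client_side_latency_ms(250.0, 20.0))      # 230.0
    print(needs_client_investigation(250.0, 20.0))  # True
    print(needs_client_investigation(22.0, 20.0))   # False
```

Feeding the average values read from the Portal metrics into these two functions gives a quick first indication of which side to investigate.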

If you have Storage Analytics logging enabled on the storage account, you can use these logs to find out whether the latency is server latency or E2E latency. You can learn more about Storage Analytics logging by referring to this link. The logs are saved in the $logs container and can be accessed using the Storage Explorer tool. Please refer to the below screenshot:

2.png

Now, in the Storage Analytics logs, we can see the E2E latency and Server latency values for each operation performed on the storage account. A log sample is shown below:
3.png

 

In the logs, we can see that there is high E2E latency as compared to the Server Latency.
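To pull these two values out of the logs in bulk, a small parser can help. The sketch below assumes log format version 1.0, where entries are semicolon-delimited and the end-to-end latency and server latency (both in milliseconds) are the sixth and seventh fields. The sample entries are made up and truncated for illustration, and `examplestorage` is a hypothetical account name.

```python
import csv
import io

# 0-based positions of the fields used here in a version 1.0 log entry.
OPERATION_TYPE = 2
E2E_LATENCY_MS = 5
SERVER_LATENCY_MS = 6

# Made-up, truncated entries; real entries carry many more fields.
SAMPLE_LOG = (
    '1.0;2020-06-01T03:29:00.0000000Z;PutBlob;Success;201;950;12;authenticated;'
    'examplestorage;examplestorage;blob;'
    '"https://examplestorage.blob.core.windows.net/test/a.bin"\n'
    '1.0;2020-06-01T03:29:05.0000000Z;GetBlob;Success;200;18;15;authenticated;'
    'examplestorage;examplestorage;blob;'
    '"https://examplestorage.blob.core.windows.net/test/a.bin"\n'
)


def parse_latencies(log_text: str):
    """Yield (operation, e2e_ms, server_ms, gap_ms) for each version 1.0 entry."""
    # Fields that may contain semicolons are double-quoted, so use a CSV reader.
    reader = csv.reader(io.StringIO(log_text), delimiter=";", quotechar='"')
    for fields in reader:
        if len(fields) <= SERVER_LATENCY_MS or fields[0] != "1.0":
            continue  # skip blank lines and entries in other log versions
        e2e = int(fields[E2E_LATENCY_MS])
        server = int(fields[SERVER_LATENCY_MS])
        yield fields[OPERATION_TYPE], e2e, server, e2e - server


if __name__ == "__main__":
    for operation, e2e, server, gap in parse_latencies(SAMPLE_LOG):
        print(f"{operation}: e2e={e2e} ms, server={server} ms, gap={gap} ms")
```

In the made-up sample, the PutBlob entry has a gap of 938 ms between E2E and server latency, which is the kind of operation worth investigating on the client or network side.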

Thus, by using the metrics available in the Portal or the Storage Analytics logs, we can understand whether the latency comes from the client side or from the Storage Server side.

Let us assume that there is high E2E latency and low Server Latency. How do we isolate this issue further, i.e. how do we determine whether the latency is caused by the performance of the client application/VM, by the network, or by both?

The common attributes that can be checked to see if there is a performance issue with the client application/VM are:

  1. High CPU on the client: When you are running the application, check if there is a spike in CPU usage on the client machine. If the CPU usage is higher than expected, it will degrade the application's performance and in turn increase the latency.
  2. Low available memory on the client: Check if you are running low on memory on the client machine.
  3. Running out of network bandwidth on the client.
  4. Misconfigured client application.

If you observe any of the above symptoms while running your application, consider increasing your VM’s size. Scale up your VM and check whether the latency persists while uploading/downloading blobs to the storage account.

If you find any performance issues with the application, you can refer to the performance checklist mentioned in the link. Note that the checklist is for the .NET applications.

Latency can also be caused by the distance between your application's location and the Azure datacenter (DC) region. To isolate this, you can follow the steps below:

  1. Test using the Azure Speed latency test given in the link: http://www.azurespeed.com/Azure/Latency
    Select the Azure datacenter that you want to access, and it will show the latency to it. This helps isolate latency caused by the distance between the Azure DC region and your application's location.
  2. Test using different containers/blobs:
    a. Create a new container (with a name that does not follow the same naming convention as the existing containers) in the same storage account, add a new blob to that container, and test accessing that blob. This will help eliminate any storage-level partitioning issues.
    b. Create a new blob in your storage account and attempt to access that blob instead of your existing blob in that storage account.
  3. Test using the Azure Storage team’s reference samples given in the link here.
  4. Create a hosted service in the datacenter where the storage account is, then RDP onto the VM and test throughput from there. Test by browsing to the blob with IE, and with the tools from above. You will most likely achieve very fast downloads, which indicates that the problem is the client’s network connection or application, and not Azure.
  5. Make sure that you are aware of the performance targets and the throughput offered by the storage account. For this, you can refer to the link here.

 

You can also investigate the issue from the network side. A good VM configuration is not enough if the network has low bandwidth: operations performed on the storage account over a low-bandwidth network will show high latency. Thus, we need to check whether low network bandwidth is causing the high E2E latency.
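To get a feel for how much bandwidth alone contributes, the back-of-the-envelope sketch below computes a lower bound on transfer time from the blob size and the available bandwidth. The function name is hypothetical, and the calculation ignores protocol overhead, round trips, and parallelism, so real transfers will differ.

```python
def min_transfer_seconds(size_bytes: int, bandwidth_mbps: float) -> float:
    """Lower bound on transfer time, ignoring protocol overhead and latency."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    bits = size_bytes * 8
    # 1 Mbps is taken as 1,000,000 bits per second.
    return bits / (bandwidth_mbps * 1_000_000)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(round(min_transfer_seconds(one_gib, 100), 1))  # 85.9
    print(round(min_transfer_seconds(one_gib, 10), 1))   # 859.0
```

If the E2E latency you observe for a large blob is close to this lower bound, the network link itself is the likely bottleneck rather than the client VM or the storage service.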

For this, you can make use of various tools like PSPing, TCPing or simply take a traceroute from your client machine to the storage account. Additionally, you can also make use of the AzCopy command to see the throughput you are getting while uploading/downloading the blob from Storage Account.

You can refer to the below steps:

  1. PsPing: It is a small utility that implements ping functionality. It can be used to measure latency and bandwidth. You can execute the command as below:
  • psping -n 300 broderwus.blob.core.windows.net:443 (if using SSL endpoint)
  • psping -n 300 broderwus.blob.core.windows.net:80 (if using non-SSL endpoint) 
    4.png

 

  2. TCP ping: It is a console application that implements “ping” functionality, but it works over a TCP port. This tool can also be used to measure bandwidth or latency. The command is: tcping.exe <yourstorageaccountname>.blob.core.windows.net
    5.png

  3. Tracert command: You can get the trace route from your client machine to the storage account. You can execute the below command to get the details:
    tracert <yourstorageaccountname>.blob.core.windows.net

  4. AzCopy: AzCopy is a command-line tool that helps in transferring data to the storage account. It can also be used to see the throughput that you are getting on the given network. The command to upload data to the storage account is as below:
  • azcopy copy "Your local file path" "https://<yourstorageaccountname>.blob.core.windows.net/test?<SAS_token>" --recursive=true

6.png

  5. Storage Explorer: It is a tool that can be used to perform activities on the storage account, such as uploading blobs, downloading blobs, or assigning ACL permissions. The tool internally calls AzCopy.
    7.png
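If PsPing or TCP ping is not available on the client machine, the TCP-level check from the steps above can be approximated with a short script. This is a minimal sketch, not a replacement for those tools: the function names are hypothetical, and each sample includes DNS resolution as well as the TCP handshake, so the numbers may read slightly higher than what PsPing reports.

```python
import socket
import statistics
import time


def tcp_connect_times_ms(host: str, port: int = 443, count: int = 5,
                         timeout: float = 5.0) -> list:
    """Measure the TCP connect time to host:port in milliseconds, count times."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000.0
        samples.append(elapsed_ms)
    return samples


def summarize(samples: list) -> dict:
    """Return the minimum, average, and maximum of the samples."""
    return {
        "min": min(samples),
        "avg": statistics.mean(samples),
        "max": max(samples),
    }


# Example usage (replace the placeholder with your storage account name):
#   samples = tcp_connect_times_ms("<yourstorageaccountname>.blob.core.windows.net", 443)
#   print(summarize(samples))
```

Running the same script from the client machine and from a VM in the storage account's region gives two sets of numbers to compare when deciding whether the network path is the source of the latency.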

If you observe any issue with the network, consider moving your application closer to the region where the Storage Account is hosted. Let’s say that your application is hosted on an Azure VM in Central US and the storage account it is accessing is in East US. For better network performance, consider placing your application in the same region as the storage account to avoid the network latency.

Hope this article helps you in isolating the latency issue and helps you in taking appropriate steps to mitigate it.

Last update: Sep 15 2020 06:30 AM