Troubleshoot Cloud Service Application Issues with Application Insights – Part 2: Common Scenarios

Published Jun 21 2022 08:04 PM
Microsoft

When we use Azure Cloud Service to host a website or run data processing, it’s always recommended to integrate a logging system to collect more detailed information and log records. Application Insights is the official tool of this kind provided and supported by Azure. In part 1, we learned how to use the basic features of Application Insights with Cloud Service. In this blog (part 2), we’ll talk about some common issues where Application Insights helps and how to troubleshoot them.

 

In this blog, Cloud Service stands for both classic Cloud Service and Cloud Service Extended Support, because Application Insights works on both of them in the same way.

 

This blog will contain the following parts:

  1. The relationship between Windows Azure Diagnostic setting and Application Insights
  2. Common advanced way to use Application Insights with Cloud Service
    • The way to add custom log
    • The way to record the running WorkerRole application as request
    • Check the failed request and related exception of WebRole
    • Check the failed request and related exception of WorkerRole
  3. Common scenarios and related guidelines
    • Monitor the Memory and Request status of the Cloud Service WebRole by Application Insights
    • Monitor the Memory and Request status of the Cloud Service WorkerRole by Application Insights
    • Troubleshoot performance issues such as slow response time
    • Troubleshoot performance issues such as high CPU/Memory of WorkerRole

 

Pre-requisites:

Please follow part 1 of this blog first to get general knowledge about using Application Insights on Cloud Service.

 

The relationship between Windows Azure Diagnostic setting and Application Insights

As explained in part 1, when we enable Application Insights on a Cloud Service project, we must enable the Azure Diagnostic setting at the same time. The reason is that, in addition to the default data, some metrics and logs collected by the Azure Diagnostic setting are also sent to Application Insights.

 

Since this blog focuses on Application Insights, it will not present the Diagnostic setting of Cloud Service in detail. For the definition of the different options in the Diagnostic setting, please check the documentation. Here we’ll only point out one important piece of information which may be confusing:

When the Diagnostic setting is enabled, the performance counter setting works differently on WebRole and WorkerRole:

  1. For WebRole, the following 9 metrics will be collected automatically even if we disable the performance counter in Diagnostic Setting. These 9 metrics are saved in the performanceCounters table of Application Insights. Any additional counter which we select in the performance counter setting of Diagnostic Setting, such as \Process(w3wp)\% Processor Time, is saved into the customMetrics table if it’s enabled.

\Process(??APP_WIN32_PROC??)\% Processor Time
\Memory\Available Bytes
\.NET CLR Exceptions(??APP_CLR_PROC??)\# of Exceps Thrown / sec
\Process(??APP_WIN32_PROC??)\Private Bytes
\Process(??APP_WIN32_PROC??)\IO Data Bytes/sec
\Processor(_Total)\% Processor Time
\ASP.NET Applications(??APP_W3SVC_PROC??)\Requests/sec
\ASP.NET Applications(??APP_W3SVC_PROC??)\Request Execution Time
\ASP.NET Applications(??APP_W3SVC_PROC??)\Requests In Application Queue

 

2. For WorkerRole, if we disable the performance counter in Diagnostic Setting, no performance metrics will be collected automatically and saved in Application Insights.

3. For both WebRole and WorkerRole, one metric is always saved into Application Insights automatically: HeartBeatState. This metric identifies whether the instance is still healthy at server level. It is sent every 15 minutes and saved into the customMetrics table. For all other performance metrics, as with the additional counters of WebRole, we need to enable them manually in the performance counter setting of Diagnostic Setting.

 

Please keep in mind that Application Insights generates a record only when data is actually collected. For example, if we deploy a WebRole but never send a request to it, or if we deploy a WorkerRole whose code never reads from or writes to disk (or does very little IO), it’s possible that Application Insights will not record any information.

 

As shared in part 1, the relationship between the options in the Diagnostic setting and the table names in the Application Insights Logs page is:

Relationship of Diagnostic setting and Application Insights table

 

Common advanced way to use Application Insights with Cloud Service

In this part, we’ll provide several common advanced ways to use Application Insights with Cloud Service. Compared to part 1, we will dig deeper into advanced features such as using the Application Insights SDK in a Cloud Service project to generate or modify the data saved into Application Insights.

 

The way to add custom log

Developers often need to add custom logs in their application. This is also supported by Application Insights.

For that, in addition to the basic setup presented in part 1, we need to install the SDK in the project. For details, please refer to this document.

The instrumentation key is set once in the startup function of the role. For a WebRole this is normally Application_Start() in Global.asax, and for a WorkerRole it is normally OnStart() in WorkerRoleName.cs.

 

In the official document, the recommended code to set the instrumentation key is:

 

TelemetryConfiguration configuration = TelemetryConfiguration.CreateDefault();
configuration.InstrumentationKey = RoleEnvironment.GetConfigurationSettingValue("APPINSIGHTS_INSTRUMENTATIONKEY");

var telemetryClient = new TelemetryClient(configuration);

 

 

This only sets the configuration for one specific TelemetryClient, the object used to communicate with Application Insights and send data. Since we may need to create telemetry clients in different classes, it is easier to set a default configuration, as the official example does.

 

TelemetryConfiguration.Active.InstrumentationKey = RoleEnvironment.GetConfigurationSettingValue("APPINSIGHTS_INSTRUMENTATIONKEY");

 

 

Then, wherever we want to add a custom log, we simply need to:

 

using Microsoft.ApplicationInsights;

TelemetryClient ai = new TelemetryClient();
ai.TrackTrace("The custom log context");

 

 

With the above lines, the information will be logged into the traces table of Application Insights.

Custom trace log in Application Insights

 

Besides trace logs, we can also record handled exceptions in the same way. More details will be explained in the next two parts.

 

ai.TrackException(exception);
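
Both calls can also carry extra context. The following is a minimal sketch, assuming the Application Insights SDK for .NET is installed as described above; the class name CustomLogExample, the method ProcessOrder and the property key orderId are hypothetical and only for illustration. It uses the TrackTrace overload which takes a severity level and a property dictionary, and passes the same dictionary to TrackException, so that both records can be filtered by the same value in the customDimensions column.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

// Hypothetical helper class, for illustration only.
public static class CustomLogExample
{
    // "ai" is a TelemetryClient created after the instrumentation key was set at startup.
    public static void ProcessOrder(TelemetryClient ai, string orderId)
    {
        // Hypothetical property; custom properties appear in the customDimensions column.
        var properties = new Dictionary<string, string> { { "orderId", orderId } };

        ai.TrackTrace("Start processing order", SeverityLevel.Information, properties);
        try
        {
            // ... application logic goes here (placeholder) ...
        }
        catch (Exception ex)
        {
            // Handled exceptions are recorded only when we track them explicitly.
            ai.TrackException(ex, properties);
        }
    }
}
```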

 

 

The way to record the running WorkerRole application as request

By design of Cloud Service, each request of a WebRole is automatically marked with a unique ID to identify the correlation. WorkerRole has no such system. But it’s possible to treat each run of the WorkerRole application as a request and record this request into Application Insights. This simplifies checking the working status of the application in the WorkerRole.

 

For more details about ai, please refer to the previous part, The way to add custom log.

 

The following is my WorkerRole application as an example. This worker role adds trace logs into Application Insights every 30 seconds. But a cycle will not always succeed, because I use a bool variable, select, which flips on every loop so that the Run function hits a handled exception in every second loop. Each trace log recorded into Application Insights contains the timestamp and a random GUID used as correlation ID to identify the relationship between the request record and other records. Every loop is considered a request, so it generates a request record with the start timestamp, the duration, the success status, the response code (200 for success and 500 for exception) and the correlation ID.

 

For the request record, only Duration and Success status are required according to the document. All the other information can be removed. The reasons why I put it into the request record are:

  • Start timestamp and response code make it look like a real request, and different response codes, for example 400 and 500 for failed requests, help when we want to identify different failure reasons.
  • If your application is single-threaded, the ID might not be so important because we can track the trace logs, exceptions and requests simply by timestamp. But if your application is multi-threaded, there will be trace logs, exceptions and request records from different threads at the same moment, and it will no longer be possible to track them by timestamp. A correlation ID which is used through all steps then becomes very important. According to the document, the ID of a request should be globally unique. To make the example fully rigorous, we would add a function to verify that a newly generated random GUID is not already used by any request record in the same Application Insights resource.

 

using Microsoft.WindowsAzure.ServiceRuntime;
using System;
using System.Diagnostics;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.ApplicationInsights.DataContracts;

namespace WorkerRole1
{
    public class WorkerRole : RoleEntryPoint
    {
        private TelemetryClient ai = new TelemetryClient();
        private bool select = true;
        private int a = 0;
        private int b;
        private volatile bool onStopCalled = false;
        private volatile bool returnedFromRunMethod = false;
        private Stopwatch requestTimer;
        private bool requestResult;

        public override void Run()
        {
            ai.TrackTrace("WorkerRole1 is running AI");

            var request = new RequestTelemetry();

            while (true)
            {
                request.Name = "A test request";
                request.Id = Guid.NewGuid().ToString();
                request.StartTime = DateTimeOffset.UtcNow;

                ai.TrackTrace("New cycle. AI " + DateTimeOffset.UtcNow.ToString() + " " + request.Id);

                requestTimer = Stopwatch.StartNew();

                try
                {
                    if (onStopCalled == true)
                    {
                        ai.TrackTrace("Onstopcalled WorkerRole AI");
                        returnedFromRunMethod = true;
                        return;
                    }

                    if (select == true)
                    {
                        select = false;
                        b = 100 / a;
                    }
                    else
                    {
                        select = true;
                        b = 100 / 10;
                    }
                    ai.TrackTrace("normal WorkerRole AI " + DateTimeOffset.UtcNow.ToString() + " " + request.Id);
                    requestResult = true;
                }
                catch (Exception ex)
                {
                    ai.TrackTrace("Exception WorkerRole AI " + DateTimeOffset.UtcNow.ToString() + " " + request.Id);
                    ai.TrackException(ex, new System.Collections.Generic.Dictionary<string, string>() { { "id", request.Id } });
                    requestResult = false;
                }

                request.Success = requestResult;
                request.Duration = requestTimer.Elapsed;
                request.ResponseCode = requestResult ? "200" : "500";
                ai.TrackRequest(request);

                System.Threading.Thread.Sleep(30*1000);
            }
        }

        public override bool OnStart()
        {
            TelemetryConfiguration.Active.InstrumentationKey = RoleEnvironment.GetConfigurationSettingValue("APPINSIGHTS_INSTRUMENTATIONKEY");

            bool result = base.OnStart();

            return result;
        }

        public override void OnStop()
        {
            onStopCalled = true;

            while (returnedFromRunMethod == false)
            {
                System.Threading.Thread.Sleep(1000);
            }
        }

    }
}

 

 

P.S. We can use custom telemetry to do the same thing, but that requires much more configuration and more concepts to understand, so I don’t use it here. For more information, please refer to the official example.

 

Please pay attention to the specific lines in the above example project which are necessary to record a request into Application Insights (line numbers count from the first using line and include blank lines):

  1. Lines 4 to 6 import the Application Insights SDK.
  2. Line 12 defines a private TelemetryClient.
  3. Line 25 creates a RequestTelemetry. Once it’s created, all the changes are saved into this RequestTelemetry, and the SDK saves this RequestTelemetry into Application Insights.
  4. Lines 29 to 31 configure the Name, Id and StartTime properties of the request.
  5. Lines 66 to 69 set the Success, Duration and ResponseCode properties of the request, then save it into Application Insights.

 

Besides the above necessary steps, please also pay attention to how we save the custom trace log and exception, such as in lines 61 and 62. The unique ID will be very helpful for tracking the request workflow in Application Insights if your application is multi-threaded.

 

In this way, we can track the operation result of a WorkerRole application in Application Insights as easily as the request result of a WebRole application.

Failures page in Application Insights
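
As an alternative sketch, the SDK also provides StartOperation<RequestTelemetry>, which measures the duration and sends the request record when the returned holder is disposed. This is not what the example above uses; the class name WorkerCycle and the method RunOnce are hypothetical, and the sketch assumes an SDK version that includes StartOperation and the default telemetry initializers, so that records tracked inside the using block share the operation ID of the request.

```csharp
using System;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

// Hypothetical helper class, for illustration only.
public static class WorkerCycle
{
    // Runs one unit of work and records it as a request in Application Insights.
    public static void RunOnce(TelemetryClient ai, Action work)
    {
        using (IOperationHolder<RequestTelemetry> operation =
            ai.StartOperation<RequestTelemetry>("A test request"))
        {
            try
            {
                work();
                operation.Telemetry.Success = true;
                operation.Telemetry.ResponseCode = "200";
            }
            catch (Exception ex)
            {
                // The exception is handled, so it must be tracked explicitly.
                ai.TrackException(ex);
                operation.Telemetry.Success = false;
                operation.Telemetry.ResponseCode = "500";
            }
        } // Disposing the holder stops the operation and sends the request record.
    }
}
```

With this approach we do not set Id, StartTime or Duration ourselves, at the cost of relying on the correlation behaviour of the SDK instead of our own GUID.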

 

Check the failed request and related exception of WebRole

Usually, when we need to check an exception of a WebRole, there are requests which were sent to and handled by IIS. For such failed requests, unhandled exceptions are collected into the exceptions table automatically, and handled exceptions (those caught in a try block) are collected there as well when we call ai.TrackException. (For more details about ai, please refer to the previous part, The way to add custom log)

Handled exception example

 

P.S. For the example exception in the screenshot, without ai.TrackException in line 47 the exception would still be handled, but it would not be recorded into Application Insights.

 

There are two possible ways to find the exception record of a failed request.

 

One way is explained in “How to check the performance summary of Cloud Service WebRole” in part 1. We only need to find the failed request in the Operations tab of the Failures page by adjusting the time range and selecting the corresponding operation. After clicking on the operation name, the failed requests with a specific exception type or response code are listed automatically and we can check the details.

Failed operations in Application Insights - Failures page

 

The other way is to use the Logs page of Application Insights. This way is more complicated, but it allows more custom filters to look for specific types of exceptions and provides more details which are not displayed by the first way.

 

By design of Cloud Service, each request of a WebRole is automatically marked with a unique ID to identify the correlation. We only need to know how to find it in Application Insights.

 

What we need to do is:

  1. Locate the failed request in the requests table and record its id

requests
| where resultCode == "500"

Id in requests table in Application Insights Logs

  2. Find the exception with the same id in the exceptions table

exceptions
| where operation_ParentId == "8d1adf11abf73c42"

Exceptions of failed requests in Application Insights Logs

 

Tracking exceptions from a failed request in this way is very helpful when we troubleshoot an intermittent failure, since the exception record contains the complete call stack of that request.

 

Check the failed request and related exception of WorkerRole

Since an unhandled exception in a WorkerRole may bring the whole application down, we consider that all exceptions in a WorkerRole should be handled, which means caught in a try block. As with WebRole, we need to use ai.TrackException to record handled exceptions into Application Insights. (For more details about ai, please refer to the previous part, The way to add custom log)

 

For exceptions in a WorkerRole, the way to check them is quite similar to that of a WebRole. The only difference is that there isn’t a built-in system to capture, or rather to record, the exceptions automatically, so some additional code is needed for that.

 

Here we need to distinguish several possible situations:

  1. Our WorkerRole doesn’t include a system of recording custom requests (refer to the previous part, The way to record the running WorkerRole application as request). In this case the only data which we can use to relate an exception record to the real operation in the application is the timestamp.

In this situation, the Failures page can still be used, but we need to switch to the Exceptions tab and check the timestamp by ourselves.

Exceptions in Application Insights Failures page

The Logs page can also be used to check the exact data. The following is an example query to check exceptions within a specific time range.

exceptions
| where timestamp between (datetime(2022-05-11 00:00) .. datetime(2022-05-13 00:00))

  2. Our WorkerRole includes a system of recording custom requests with a custom ID, but the ID is not included in the exception record. This is the same as situation 1.
  3. Our WorkerRole includes a system of recording custom requests with a custom ID, and the ID is included in the exception record, as in line 62 of the example in the previous part The way to record the running WorkerRole application as request. This is the same as the situation of WebRole. We’ll be able to use both the Failures page and the Logs page to find the related requests and exceptions. The queries used in the Logs page will be like:

requests
| where success == False

Failed requests in Application Insights Logs

exceptions
| where * contains "ade4308c-28cb-4aca-bda1-0ba7c32b8c36"

Exception in Application Insights Logs with custom ID

 

 

Common scenarios and related guidelines

In this part, we’ll go through several frequently asked scenarios and the related guidelines on how to use Application Insights to meet the requirements.

 

Monitor the Memory and Request status of the Cloud Service WebRole by Application Insights

A frequently asked question is how to monitor the memory of a Cloud Service. For WebRole, users also often want to monitor the request status, including the number of total requests, failed requests, exceptions etc. The question is reasonable because in the default Metrics page of Cloud Service, the only collected data are CPU percentage, disk read/write and network in/out.

 

Default Metrics page of Cloud Service

This requirement is met by the most basic usage of Application Insights. All we need to do is enable Application Insights on the role from which we want to collect metrics. We do not need any additional configuration in the diagnostic setting. The automatically collected data contains everything needed for the memory usage and request status of a WebRole.

 

To see the collected data, it’s recommended to use the Metrics page of Application Insights. With Application Insights standard metrics selected as Metric Namespace, we can find Available memory under the Server part. We can also find Server requests under the Server part, and Failed requests and Exceptions under the Failures part, or use some other metric to monitor the request status.

 

WebRole Memory metrics chart

WebRole Exceptions metrics chart

WebRole Server requests metrics chart

 

After checking the metrics, if we need more detailed information, such as which kind of exceptions the application is returning, we can switch to the corresponding page, such as the Failures or Performance page.

 

 

Monitor the Memory and Request status of the Cloud Service WorkerRole by Application Insights

Similar to WebRole, we can also monitor the memory and request status of a Cloud Service WorkerRole. But there are some additional limitations:

  1. For WorkerRole, the memory metrics are not collected automatically. To monitor the memory status, we need to enable \Memory\Available MBytes in Performance Counters of Diagnostic Setting. The collected data will be in the customMetrics table of the Logs page.

Performance Counter in Diagnostic setting

Custom performance counter metrics data in Logs

  2. To view the metrics chart of the collected memory data, we can switch to the Metrics page of Application Insights, select Log-based metrics in Metric Namespace and \Memory\Available MBytes under CUSTOM in Metric. The chart of the available memory for the selected time range will be displayed.

Metric option of custom Memory metrics chart

Custom Memory metrics chart

 

P.S. Please pay attention to the following 2 points:

  • A dotted line in the chart means that there is no data point for some time grains in that range, so the chart can only connect the neighbouring known points. From the Logs, we can see that the memory data is collected about every 3 minutes. In the chart above, the time range is set to Last hour, so the time grain of the chart is shorter than 3 minutes and some grains contain no data. Thus, the line is dotted.
  • A value such as 2.5B in the chart does not mean 2.5 bytes, but 2.5 billion. 2.5 billion bytes is roughly 2.5 GB, so we can read the chart almost as if its unit were GB.

 

Troubleshoot performance issues such as slow response time

This is also a frequently asked question. For example, when our Cloud Service WebRole receives a request, it needs to get some data from a remote server, such as SQL Database, then render the data into a web page and return it to the user. Imagine that this process is much slower than expected but still successful. It’s reasonable that we want to clarify whether most of the time is spent on the communication with SQL Database or on the processing inside the Cloud Service. For that we need to add custom logs to record the timestamp of each step, such as the start of the process, the start of the communication with SQL Database, the end of the communication with SQL Database and the end of generating the web page.

 

The above is only one possible scenario as an example. The custom log system needs to be designed by the developers for their own scenarios. In this blog, we’ll only provide a few tips on how to design such a custom log:

  1. For both WorkerRole and WebRole, please check the previous part The way to add custom log to save custom trace logs into Application Insights. It’s recommended to save a trace log at the start of every processing step. For example, in the above scenario, we can add trace logs at the following points:
    • When the WebRole receives the request
    • When the WebRole starts the communication with SQL server
    • When the WebRole receives the data returned by SQL server and starts generating the web page
    • When the WebRole has generated the web page and returns it to the user
  2. If the main process is an application in a WorkerRole, please check the previous part The way to record the running WorkerRole application as request to add a custom correlation ID into the custom request record and exception record.
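
The tips above can be sketched as a small helper. This is only an illustration under assumptions: the class name StepLogger and the property names id and elapsedMs are hypothetical, and the helper relies on the TrackTrace overload which accepts a severity level and a property dictionary. Each call writes one trace containing the correlation ID and the milliseconds elapsed since the helper was created.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

// Hypothetical helper class, for illustration only.
public sealed class StepLogger
{
    private readonly TelemetryClient ai;
    private readonly string correlationId;
    // Started when the helper is created, i.e. at the start of the tracked process.
    private readonly Stopwatch sinceStart = Stopwatch.StartNew();

    public StepLogger(TelemetryClient ai, string correlationId)
    {
        this.ai = ai;
        this.correlationId = correlationId;
    }

    // Writes one trace per step, tagged with the correlation ID and elapsed time.
    public void Step(string stepName)
    {
        var properties = new Dictionary<string, string>
        {
            { "id", correlationId },
            { "elapsedMs", sinceStart.ElapsedMilliseconds.ToString() }
        };

        // The ID is also part of the message so that a text search on it finds the trace.
        ai.TrackTrace(stepName + " " + correlationId, SeverityLevel.Information, properties);
    }
}
```

For the SQL Database scenario, we could create one StepLogger per request, for example new StepLogger(ai, Guid.NewGuid().ToString()), and call Step("Request received"), Step("SQL query started"), Step("SQL query finished") and Step("Page returned") at the four points listed above. The step names are only examples.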

 

Once the system is online, we can check the requests in the Performance page of Application Insights and focus on the request durations as follows:

  1. Select the specific operation which we want to check (optional)
  2. Zoom the duration distribution chart into the longest durations
  3. Click on Drill into x Samples
  4. Click on one request as an example and get the built-in or custom ID of this request

Application Insights Performance page

 

Application Insights Failed Request detail page

 

If the system is not very complicated, the time spent in the different steps will be displayed in the End-to-end transaction chart. If the system is complicated, or if we’re using a custom ID which cannot be displayed in the chart, we can get all related trace logs containing the same correlation ID with the following query:

traces
| where * contains "ade4308c-28cb-4aca-bda1-0ba7c32b8c36"

In this way, we can calculate the difference between the timestamps of consecutive trace logs to get the time spent in every step.

Custom trace logs in Logs page

 

Troubleshoot performance issues such as high CPU/Memory of WorkerRole

Sometimes we also need to identify issues such as a WorkerRole consuming very high CPU/Memory. From outside of the Cloud Service, we can only observe that the WorkerRole is consuming much CPU/Memory, but we cannot know what exactly is happening in the instance.

 

To troubleshoot this kind of issue, we mainly work in two steps:

  • We need to add a custom log to track every step of the WorkerRole application. This is very important because with it we can verify that the application is still running well and compare the time spent in each step with the normal situation. This helps us identify whether the application is affected by the high CPU/Memory issue. For how to add a custom log system, please refer to the previous part The way to add custom log.
  • We may also need to capture a dump file. But since this blog is mainly about the usage of Application Insights, we’ll only give some simple ideas:
    • We can RDP into the instance with high CPU/Memory and verify which process is consuming most of the CPU/Memory. If it’s WaWorkerHost, then the application itself is consuming the CPU/Memory.
    • If the instance has high CPU/Memory and the application is slow but not crashed, we can RDP into the instance and capture a dump file. For more details about how to capture the dump file, please refer to this document. For example, the following command captures a dump file when the CPU consumed by WaWorkerHost stays above 85% for at least 3 seconds. Up to 5 dump files will be captured and saved into the c:\procdumps directory.

              procdump.exe -accepteula -c 85 -s 3 -n 5 WaWorkerHost.exe c:\procdumps

    • In the diagnostic setting page of Cloud Service, we can also configure crash dump files to be generated automatically. For more details, please refer to this document.

Dump file setting in Diagnostic Setting

 

Last update:
‎Jun 18 2022 06:12 PM