When we use Azure Cloud Service to host a website or run data processing, it’s recommended to integrate a custom logging system to collect more detailed information and log records. Application Insights is such a tool, officially provided and supported by Azure. In part 1, we learned how to use the basic features of Application Insights with Cloud Service. In this blog (part 2), we’ll talk about some common issues where Application Insights helps, and how to troubleshoot them.
In this blog, Cloud Service stands for both classic Cloud Service and Cloud Service Extended Support, because Application Insights works on both of them in the same way.
This blog will contain the following parts:
Please follow part 1 of this blog first to get general knowledge about using Application Insights on Cloud Service.
As explained in part 1, when we enable Application Insights on a Cloud Service project, we must enable the Azure Diagnostic setting at the same time. The reason is that, in addition to the default data, some metrics data and logs collected by the Azure Diagnostic setting are also sent to Application Insights.
Since this blog focuses on Application Insights, we will not present the Diagnostic setting of Cloud Service in detail. For the definitions of the different options in the Diagnostic setting, please check the document. Here we’ll only point out one important piece of information which may be confusing:
When the Diagnostic setting is enabled, the performance counter setting works differently on WebRole and WorkerRole:
1. For WebRole, even if we disable the performance counter in Diagnostic Setting, the following performance counters are still collected automatically and saved in Application Insights:
\Process(??APP_WIN32_PROC??)\% Processor Time
\Memory\Available Bytes
\.NET CLR Exceptions(??APP_CLR_PROC??)\# of Exceps Thrown / sec
\Process(??APP_WIN32_PROC??)\Private Bytes
\Process(??APP_WIN32_PROC??)\IO Data Bytes/sec
\Processor(_Total)\% Processor Time
\ASP.NET Applications(??APP_W3SVC_PROC??)\Requests/sec
\ASP.NET Applications(??APP_W3SVC_PROC??)\Request Execution Time
\ASP.NET Applications(??APP_W3SVC_PROC??)\Requests In Application Queue
2. For WorkerRole, if we disable the performance counter in Diagnostic Setting, no performance metrics data will be automatically collected and saved in Application Insights.
3. For both WebRole and WorkerRole, one kind of metric data is always saved into Application Insights automatically: HeartBeatState. This metric identifies whether the instance is still healthy at server level. It is triggered every 15 minutes and saved into the customMetrics table. For all the other performance metric data, like WebRole, we need to manually enable it in the performance counter of Diagnostic Setting.
Please keep in mind that Application Insights generates records only when data is actually collected. For example, if we deploy a WebRole but never send a request to it, or if we deploy a WorkerRole whose code does not read/write data from/to disk at all, or the amount of data I/O is quite low, it’s possible that Application Insights will not record any information.
As shared in part 1, the relationship between the options in the Diagnostic setting and the table names on the Application Insights Logs page is:
In this part, we’ll present several common advanced ways to use Application Insights with Cloud Service. Compared to part 1, we will dig deeper into advanced features, such as using the Azure Application Insights SDK in a Cloud Service project to generate or modify the data saved into Application Insights.
It’s common for developers to add custom logs in their application. Application Insights also supports this.
For that, compared to the basic setup presented in part 1, we additionally need to install the SDK in the project. For details, please refer to this document.
The startup function of a WebRole is normally Application_Start() in Global.asax, and that of a WorkerRole is normally OnStart() in WorkerRoleName.cs.
In the official document, the recommended code to set the instrumentation key is:
TelemetryConfiguration configuration = TelemetryConfiguration.CreateDefault();
configuration.InstrumentationKey = RoleEnvironment.GetConfigurationSettingValue("APPINSIGHTS_INSTRUMENTATIONKEY");
var telemetryClient = new TelemetryClient(configuration);
This approach only sets the configuration for one specific telemetryClient, the object used to communicate with Application Insights and send data. Since we may need to create telemetry clients in different classes, it is easier to set a default configuration, as the official example does:
TelemetryConfiguration.Active.InstrumentationKey = RoleEnvironment.GetConfigurationSettingValue("APPINSIGHTS_INSTRUMENTATIONKEY");
Then, wherever we want to add a custom log, we simply need:
using Microsoft.ApplicationInsights;
TelemetryClient ai = new TelemetryClient();
ai.TrackTrace("The custom log context");
With the above lines, the information is logged into the traces table of Application Insights.
Besides trace logs, we can also record handled exceptions this way. More details are explained in the next two parts.
ai.TrackException(exception);
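Putting the two calls above together, a handled exception can be logged as in the following minimal sketch. It assumes the default instrumentation key was already set at startup as shown earlier. The class name CustomLogSketch, the method SafeDivide and the property names "dividend" and "divisor" are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

// Hypothetical example class, not part of the original project.
public static class CustomLogSketch
{
    // Assumes TelemetryConfiguration.Active.InstrumentationKey was set at startup.
    private static readonly TelemetryClient ai = new TelemetryClient();

    public static int SafeDivide(int dividend, int divisor)
    {
        try
        {
            // Goes to the traces table.
            ai.TrackTrace("SafeDivide started", SeverityLevel.Information);
            return dividend / divisor;
        }
        catch (DivideByZeroException ex)
        {
            // Goes to the exceptions table; the custom properties are optional.
            ai.TrackException(ex, new Dictionary<string, string>
            {
                { "dividend", dividend.ToString() },
                { "divisor", divisor.ToString() }
            });
            return 0;
        }
    }
}
```

Without the TrackException call in the catch block, the exception would be handled silently and nothing would be sent to Application Insights.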
By design of Cloud Service, each WebRole request is automatically marked with a unique ID to identify the correlation. WorkerRole has no such system. But it’s possible to treat each unit of work of the WorkerRole application as a request and record this request into Application Insights. This simplifies checking the working status of the application in WorkerRole.
For more details about ai, please refer to the previous part, The way to add custom log.
The following is my WorkerRole application as an example. This worker role keeps adding trace logs into Application Insights every 30 seconds. But a cycle will not always succeed, because I use a bool variable, select, which flips in every loop so that the Run function hits a handled exception in every second loop. Each trace log recorded into Application Insights contains the timestamp and a random GUID used as correlation ID to identify the relationship between the request record and other records. Every loop is considered a request, so it generates a request record with the start timestamp, the duration, the success status, the response code (200 for success and 500 for exception) and the correlation ID.
For the recorded request, only Duration and Success status are required according to the document. All the other information can be removed. The reason why I put it into the request record is:
using Microsoft.WindowsAzure.ServiceRuntime;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.ApplicationInsights.DataContracts;
namespace WorkerRole1
{
    public class WorkerRole : RoleEntryPoint
    {
        private TelemetryClient ai = new TelemetryClient();
        private bool select = true;
        private int a = 0;
        private int b;
        private volatile bool onStopCalled = false;
        private volatile bool returnedFromRunMethod = false;
        private Stopwatch requestTimer;
        private bool requestResult;
        public override void Run()
        {
            ai.TrackTrace("WorkerRole1 is running AI");
            while (true)
            {
                // Every cycle is recorded as one request, so create a fresh RequestTelemetry per cycle.
                var request = new RequestTelemetry();
                request.Name = "A test request";
                // Random GUID used as correlation ID between the request, traces and exceptions.
                request.Id = Guid.NewGuid().ToString();
                request.StartTime = DateTimeOffset.UtcNow;
                ai.TrackTrace("New cycle. AI " + DateTimeOffset.UtcNow.ToString() + " " + request.Id);
                requestTimer = Stopwatch.StartNew();
                try
                {
                    if (onStopCalled == true)
                    {
                        ai.TrackTrace("Onstopcalled WorkerRole AI");
                        returnedFromRunMethod = true;
                        return;
                    }
                    if (select == true)
                    {
                        select = false;
                        // a is 0, so this throws DivideByZeroException on purpose.
                        b = 100 / a;
                    }
                    else
                    {
                        select = true;
                        b = 100 / 10;
                    }
                    ai.TrackTrace("normal WorkerRole AI " + DateTimeOffset.UtcNow.ToString() + " " + request.Id);
                    requestResult = true;
                }
                catch (Exception ex)
                {
                    ai.TrackTrace("Exception WorkerRole AI " + DateTimeOffset.UtcNow.ToString() + " " + request.Id);
                    // Attach the correlation ID so the exception can be matched to its request.
                    ai.TrackException(ex, new Dictionary<string, string>() { { "id", request.Id } });
                    requestResult = false;
                }
                // Duration and Success are the required fields of the request record.
                request.Success = requestResult;
                request.Duration = requestTimer.Elapsed;
                request.ResponseCode = requestResult ? "200" : "500";
                ai.TrackRequest(request);
                System.Threading.Thread.Sleep(30 * 1000);
            }
        }
        public override bool OnStart()
        {
            // Default instrumentation key used by every TelemetryClient created without explicit configuration.
            TelemetryConfiguration.Active.InstrumentationKey = RoleEnvironment.GetConfigurationSettingValue("APPINSIGHTS_INSTRUMENTATIONKEY");
            bool result = base.OnStart();
            return result;
        }
        public override void OnStop()
        {
            // Signal Run() to exit, then wait until it has returned.
            onStopCalled = true;
            while (returnedFromRunMethod == false)
            {
                System.Threading.Thread.Sleep(1000);
            }
        }
    }
}
P.S. We can use custom telemetry to do the same thing, but there will be many more configurations and more concepts to understand, so I don’t use it here. For more information, please kindly refer to the official example.
Please pay attention to the specific lines in the above example project which are necessary to record request into Application Insight:
Besides the above necessary steps, please also pay attention to how we save the custom trace log and exception, such as the ai.TrackTrace and ai.TrackException calls that include request.Id. This unique ID is very helpful for tracking the request workflow in Application Insights if your application is multi-threaded.
With the above approach, we can track the operation results of a WorkerRole application in Application Insights just like the request results of a WebRole application.
Usually, when we need to check the exceptions of a WebRole, there must be requests sent to and handled by IIS. For such failed requests, unhandled exceptions, as well as handled exceptions (caught in a try/catch block) that are reported with ai.TrackException, are collected into the exceptions table. (For more details about ai, please refer to the previous part, The way to add custom log)
P.S. For the example exception in the screenshot, if there isn’t an ai.TrackException call in line 47, the exception is still considered a handled exception, but it will not be recorded into Application Insights.
There are two possible ways to find the exception record by a failed request record.
One way is explained in “How to check the performance summary of Cloud Service WebRole” in part 1. We only need to find the failed request on the Operations page by adjusting the time range and selecting the corresponding operation. By clicking on the operation name, the failed requests with a specific exception type or response code are listed automatically and we can check the details.
The other way is using the Logs page of Application Insights. This way is more complicated, but it allows more custom filters to look for specific types of exception, and provides more details which are not displayed by the first way.
By design of Cloud Service, the request of WebRole is automatically marked with unique ID to identify the correlation. We only need to know how we can find them in Application Insight.
What we need to do is:
requests
| where resultCode == "500"
exceptions
| where operation_ParentId == "8d1adf11abf73c42"
Tracking exceptions based on a failed request is very helpful when we troubleshoot an intermittent failure, since the exception record contains the complete call stack of that request.
Since an unhandled exception in a WorkerRole may bring the whole application down, we consider that all exceptions in a WorkerRole should be handled, which means wrapped in a try/catch block. As with WebRole, for handled exceptions we need to use ai.TrackException to record them into Application Insights. (For more details about ai, please refer to the previous part, The way to add custom log)
For exceptions in WorkerRole, the way to check them is quite similar to WebRole. The only difference is that there isn’t a built-in system to capture, that is, to record, the exceptions automatically, so some additional code is needed.
Here we need to talk about the multiple possible situations:
In this situation, checking the Failures page is still possible, but we need to switch to the Exceptions page and check the timestamp ourselves.
The Logs page can also be used to check accurate data. The following is an example query to check exceptions within a specific time range.
exceptions
| where timestamp between (datetime(2022-05-11 00:00) .. datetime(2022-05-13 00:00))
requests
| where success == False
exceptions
| where * contains "ade4308c-28cb-4aca-bda1-0ba7c32b8c36"
In this part, we’ll cover several commonly asked scenarios and guidelines on how to use Application Insights to meet the requirements.
A frequently asked question is how to monitor the memory of a Cloud Service. For WebRole, we can also monitor the request status, including the number of total requests, failed requests, exceptions etc. The question is reasonable because on the default metrics page of Cloud Service, the only collected data are CPU percentage, disk read/write and network in/out.
Meeting this requirement is the most basic usage of Application Insights. We only need to enable Application Insights on the role from which we want to collect metrics data. No additional configuration in the Diagnostic setting is needed. The automatically collected data contains everything needed for memory usage and request status of a WebRole.
To see the collected data, it’s recommended to use the Metrics page of Application Insights. With Application Insights standard metrics as the Metric Namespace, we can find Available memory under the Server part for the memory. We can also find Server requests under the Server part, Failed requests and Exceptions under the Failure part, or other metric types to monitor the request status.
After checking the metrics data, if we need more detailed information, such as which kind of exceptions the application is returning, we can switch to the corresponding page, such as the Failures or Performance page.
Similar to WebRole, we can also monitor the memory and request status of the Cloud Service. But there will be some additional limitations:
P.S. Please pay attention to the following 2 points:
This is also a commonly asked question. For example, when our Cloud Service WebRole receives a request, it needs to get some data from a remote server, such as SQL Database, then render the data into a web page and return it to the user. Imagine that this process is much slower than expected but still successful. It’s reasonable that we want to clarify whether most of the time is spent on the communication with SQL Database or on the processing inside the Cloud Service. For that, we need to add some custom logs to record the timestamp of each step, such as the start of the process, the start of the communication with SQL Database, the end of the communication with SQL Database, the end of generating the web page etc.
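As a rough sketch of such step logging, the helper below tags every step message with a UTC timestamp and one shared correlation ID. StepLogger and its members are hypothetical names, not part of the Application Insights SDK. The message sink is passed in as a delegate, so in a role it can forward to ai.TrackTrace, while the sketch itself stays runnable without the SDK.

```csharp
using System;

// Hypothetical helper for illustration only.
public class StepLogger
{
    private readonly Action<string> sink;

    // One correlation ID shared by all steps of the same request.
    public string CorrelationId { get; private set; }

    public StepLogger(Action<string> sink)
    {
        if (sink == null) throw new ArgumentNullException("sink");
        this.sink = sink;
        CorrelationId = Guid.NewGuid().ToString();
    }

    // Emits "<step name> <ISO 8601 UTC timestamp> <correlation ID>".
    public void Step(string stepName)
    {
        sink(stepName + " " + DateTimeOffset.UtcNow.ToString("o") + " " + CorrelationId);
    }
}

public static class StepLoggerDemo
{
    public static void Run()
    {
        // In a role, the sink could be: message => ai.TrackTrace(message)
        var log = new StepLogger(Console.WriteLine);
        log.Step("Request started");
        log.Step("SQL call started");
        log.Step("SQL call finished");
        log.Step("Page generated");
    }
}
```

Because every message carries the same correlation ID, all steps of one request can later be found together in the traces table.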
The above is only one possible scenario as an example. The custom log system needs to be designed by the developers for each scenario. In this blog, we’ll only provide a few tips about how to design this kind of custom log:
Once the system is online, we can check the requests in the Performance page of Application Insights and focus on the request durations by following:
If the system is not very complicated, the time spent in the different steps is displayed in the End-to-end transaction chart. If the system is complicated, or we’re using a custom ID that prevents the data from being displayed in the chart, we can get all related trace logs containing the same correlation ID with the following query:
traces
| where * contains "ade4308c-28cb-4aca-bda1-0ba7c32b8c36"
This way, we can calculate the time difference between consecutive trace logs to get the time spent in every step.
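The calculation itself is simple. The following is a minimal sketch, assuming the timestamp and message columns of the query result have been exported into a list. The class TraceStepDurations and the sample messages are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class TraceStepDurations
{
    // Returns the elapsed time between each pair of consecutive trace records,
    // labelled "<earlier message> -> <later message>".
    public static List<KeyValuePair<string, TimeSpan>> Compute(
        IEnumerable<KeyValuePair<DateTimeOffset, string>> traces)
    {
        // Query results are not guaranteed to be sorted, so order by timestamp first.
        var ordered = traces.OrderBy(t => t.Key).ToList();
        var steps = new List<KeyValuePair<string, TimeSpan>>();
        for (int i = 1; i < ordered.Count; i++)
        {
            steps.Add(new KeyValuePair<string, TimeSpan>(
                ordered[i - 1].Value + " -> " + ordered[i].Value,
                ordered[i].Key - ordered[i - 1].Key));
        }
        return steps;
    }
}

public static class TraceStepDurationsDemo
{
    public static void Run()
    {
        var start = new DateTimeOffset(2022, 5, 11, 0, 0, 0, TimeSpan.Zero);
        var traces = new List<KeyValuePair<DateTimeOffset, string>>
        {
            new KeyValuePair<DateTimeOffset, string>(start, "Request started"),
            new KeyValuePair<DateTimeOffset, string>(start.AddSeconds(2), "SQL call finished"),
            new KeyValuePair<DateTimeOffset, string>(start.AddSeconds(5), "Page generated")
        };
        foreach (var step in TraceStepDurations.Compute(traces))
        {
            Console.WriteLine(step.Key + ": " + step.Value.TotalSeconds + " s");
        }
    }
}
```

The step with the largest duration is the one worth investigating first.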
Sometimes we also need to identify issues such as a WorkerRole consuming very high CPU/Memory. What we can observe from outside the Cloud Service is that the WorkerRole is consuming a lot of CPU/Memory, but we cannot know what exactly is happening inside the instance.
To troubleshoot this kind of issue, we can mainly proceed in two steps:
procdump.exe -accepteula -c 85 -s 3 -n 5 WaWorkerHost.exe c:\procdumps