Optimizing Cost using the Azure Monitor OpenTelemetry Distro
Published Apr 17 2024 11:00 AM
Microsoft

Joe Beernink, Cijo Thomas, and Vishwesh Bankwar contributed to this blog.

 

When Willow, a global technology company, decided to use Azure Monitor, they leveraged the .NET Azure Monitor OpenTelemetry Distro. The “Distro” is a data collection library which powers Azure Monitor Application Insights and simplifies OpenTelemetry on Azure.

 

Joe Beernink, Principal Software Engineer at Willow, shares more, “We wanted to use the latest OpenTelemetry libraries to be well-positioned for the future, and we like the flexibility of Azure Monitor’s consumption-based pricing so we can optimize for our scenarios and our observability can scale up as needed.”

 

However, immediately after Willow deployed the Azure Monitor OpenTelemetry Distro, their Azure costs spiked, and they approached the Azure Monitor team for strategies to optimize cost. By following the strategies below, they were able to reduce costs by over 90%!

 

Identify cost drivers

First, establish your cost baseline to learn which factors are driving up your cost. To do this, select Usage and estimated costs in the Log Analytics workspace associated with your Application Insights resource. Any table whose name starts with App is used to power Application Insights.


Willow discovered that most of their costs were driven by three tables:

  • ContainerAppConsoleLogs_CL
  • AppTraces (which maps to OpenTelemetry Logs due to historical reasons)
  • AppRequests (which maps to OpenTelemetry server spans)

With the cost drivers identified, Willow was able to target their cost reduction techniques.

 

Disable Console Logging Provider in production

A new ASP.NET Core web app adds a Console Logging Provider by default. Willow accidentally left it on in production, and it quickly drove up costs: Azure Container Apps exported the console logs to the ContainerAppConsoleLogs_CL table in Log Analytics, adding millions of rows per day. Disabling the Console Logging Provider does not impact Azure Monitor Application Insights, and it reduced Willow's observability costs by a stunning 87%! Similarly, customers can opt in to an OpenTelemetry Console Exporter, and leaving it on in production can also drive up costs.
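
One way to switch the console provider off without touching code is through configuration. The fragment below is a minimal sketch, not Willow's actual setup: it assumes the app loads an appsettings.Production.json file in production and relies on the built-in "Console" provider alias. Setting that provider's default level to "None" silences console output while leaving the other logging providers untouched.

```json
{
  "Logging": {
    "Console": {
      "LogLevel": {
        "Default": "None"
      }
    }
  }
}
```

Because the rule is scoped to the "Console" provider, development environments that do not load this file keep their console output.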

 

Collect fewer logs

The default “Information” log level can be chatty and therefore costly, which showed up as high ingestion volume in the AppTraces table. Configuring the application to collect only logs at “Warning” level and higher for the Azure and Microsoft log categories reduced Willow’s observability costs by another 44%!

Here’s how they configured their appsettings.json file:

 

  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Azure": "Warning",
      "Microsoft": "Warning"
    }
  },

 

If more selective log filtering is required, customers can add filtering rules using ILogger. It’s also possible to use an OpenTelemetry log filtering processor, though it’s typically more expensive in terms of CPU consumption.
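
As an illustration of an ILogger filtering rule, the sketch below limits what a single log category sends through the OpenTelemetry logging provider. It is a hedged example rather than Willow's configuration: the category name "Willow.Billing" is hypothetical, and the program assumes the Azure.Monitor.OpenTelemetry.AspNetCore package is referenced and that the connection string is supplied through the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable.

```csharp
using Azure.Monitor.OpenTelemetry.AspNetCore;
using OpenTelemetry.Logs;

var builder = WebApplication.CreateBuilder(args);

// Register the Azure Monitor OpenTelemetry Distro.
builder.Services.AddOpenTelemetry().UseAzureMonitor();

// Hypothetical category: only Error and above from "Willow.Billing" reach the
// OpenTelemetry logging provider. Rules for other providers are unaffected.
builder.Logging.AddFilter<OpenTelemetryLoggerProvider>("Willow.Billing", LogLevel.Error);

var app = builder.Build();
app.Run();
```

Because the rule is applied by the logging framework before a record reaches the provider, filtered logs are never created for export, which is why this approach tends to be cheaper than dropping them later in a processor.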

 

Filter out noisy traces

Next, Willow realized that their health checks were generating lots of requests that weren’t worth the additional cost. This was noticeable in the high ingestion volume in the AppRequests table. To filter them out, they configured their AspNetCoreInstrumentation to filter out requests from the healthz, readyz, and livez endpoints.

 

Here’s the code they added to their application startup class:

 

builder.AddAspNetCoreInstrumentation(options => options.Filter = httpContext =>
{
    // Returning false drops the request span, so requests to the
    // healthz, readyz, and livez endpoints are not collected.
    var path = httpContext.Request.Path.ToString();
    return
        !path.Contains("healthz", StringComparison.OrdinalIgnoreCase) &&
        !path.Contains("readyz", StringComparison.OrdinalIgnoreCase) &&
        !path.Contains("livez", StringComparison.OrdinalIgnoreCase);
});

 

Combined with the other changes, this decreased the monthly Log Analytics charges by a total of 92%, which for a startup, or any company, has a serious positive impact on the bottom line!

 

Use metrics for alerting

Willow needed to slice and alert on customer ID, a custom business attribute. Since they are a B2B company that serves a select set of customers, cardinality limits were not a concern.

 

Initially, they could not figure out how to add custom attributes to ASP.NET Core instrumentation metrics, so they were forced to send 100% of OpenTelemetry traces for accurate alerting.

 

However, when they discovered that .NET 8+ natively supports enriching ASP.NET Core request metrics with "tags", they were able to change their alerts to be based on OpenTelemetry metrics and break their overreliance on traces.

Here’s how they added metric tags:

 

using Microsoft.AspNetCore.Http.Features;

public static class IApplicationBuilderExtensions
{
    public static IApplicationBuilder UseWillowContext(this IApplicationBuilder app, IConfiguration configuration)
    {
        // IHttpMetricsTagsFeature is only available in .NET 8 and later.
#if NET8_0_OR_GREATER
        var willowContext = configuration.GetSection("WillowContext").Get<WillowContextOptions>();
        if (willowContext != null)
        {
            app.Use(async (context, next) =>
            {
                var tagsFeature = context.Features.Get<IHttpMetricsTagsFeature>();

                if (tagsFeature != null)
                {
                    // Add each configured value as a tag on the ASP.NET Core request metrics.
                    foreach (var val in willowContext.Values)
                    {
                        tagsFeature.Tags.Add(val);
                    }
                }
                await next.Invoke();
            });
        }
#endif
        return app;
    }
}

 

In the future, Willow could further reduce costs by optimizing sampling on OpenTelemetry logs and traces without any impact on metric accuracy. As Willow scales to higher traffic, they have a great foundation to manage costs even as the number of requests grows. Among other things, this would reduce the volume sent to the AppMetrics table. This underscores the value of alerting on metrics from Azure Monitor’s time series database as opposed to log-based metrics from Log Analytics.

 

Select a cheaper log storage tier

Willow explored whether they could switch to the Basic Log tier for their AppTraces table in Log Analytics (which maps to OpenTelemetry Logs due to historical reasons). While this could have reduced their per-GB cost for logs to about one-fifth of the default Analytics tier, Willow decided now was not the time to make a change, because they wanted a longer storage configuration only offered by the Analytics tier. Willow is aware that Azure Monitor is looking to further stratify its storage tiers to meet customer demands, and plans to follow Azure Monitor on Twitter and Azure Updates for the latest announcements.

 

Introducing trace-based log sampling

The most recent release of the Azure Monitor OpenTelemetry Distro includes Trace-based Log Sampling. This means that the Azure Monitor Sampler will sample out logs associated with traces that are sampled out, further reducing costs. Any OpenTelemetry logs not in the context of a trace will automatically be sampled in to ensure nothing critical is sampled out. This improves sampling effectiveness and offers yet another capability to reduce costs.
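
The sampling rate itself is set when the Distro is registered. The sketch below is illustrative only: the 10% ratio is an arbitrary example value rather than a recommendation, and it assumes the Azure.Monitor.OpenTelemetry.AspNetCore package with the connection string supplied through the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable. On a Distro version that includes trace-based log sampling, logs tied to sampled-out traces would be dropped along with them.

```csharp
using Azure.Monitor.OpenTelemetry.AspNetCore;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
{
    // Illustrative value: keep roughly 10% of traces (the default is 1.0, i.e. 100%).
    options.SamplingRatio = 0.1F;
});

var app = builder.Build();
app.Run();
```

Lower ratios reduce ingestion volume in the trace-related tables, so alerts that need exact counts are better based on metrics, as described in the metrics section above.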

 

Next Steps

Enable Application Insights via the Azure Monitor OpenTelemetry Distro and get the full benefits of Azure Monitor’s consumption-based pricing model. Consider the tips offered in this blog to reduce your observability spend and set yourself up for long-term success!

 

Last update: Apr 16 2024 01:00 PM