Improving the tools for monitoring Log Analytics workspace health

Published Jul 31 2022 03:34 AM
Microsoft

The Log Analytics product team identified two key indicators of a workspace’s health: ingestion latency and query success percentage.

 

Ingestion latency signals measure the time it takes for an event to be reported, processed and become available for search in your logs data store. You can read more about how we calculate the latency and what can influence data latency time.
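
As a rough illustration, the latency of a single record can be thought of as the gap between the moment the event was generated and the moment it became available for search. The sketch below assumes each record carries both timestamps; the names `time_generated` and `ingestion_time` are hypothetical stand-ins for this example, not a description of how the service computes its signal.

```python
from datetime import datetime, timedelta, timezone


def ingestion_latency(time_generated: datetime, ingestion_time: datetime) -> timedelta:
    """Return how long a record took to become searchable after it was generated."""
    return ingestion_time - time_generated


def average_latency(records: list[tuple[datetime, datetime]]) -> timedelta:
    """Average latency over (time_generated, ingestion_time) pairs."""
    if not records:
        raise ValueError("no records to average")
    total = sum((ingestion_latency(g, i) for g, i in records), timedelta())
    return total / len(records)


# Two hypothetical records generated at the same time, searchable 2 and 4 minutes later.
generated = datetime(2022, 7, 1, 12, 0, tzinfo=timezone.utc)
samples = [
    (generated, generated + timedelta(minutes=2)),
    (generated, generated + timedelta(minutes=4)),
]
print(average_latency(samples))  # 0:03:00
```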

 

Query success percentage measures the share of queries that return a response other than HTTP 5XX, that is, queries that either complete successfully or fail with user-side errors. This number does not include queries initiated by Azure log search alerts or Azure Sentinel. This signal will be available via Resource health in the near future.
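
To make the definition concrete, here is a minimal sketch that computes a success percentage from a list of HTTP status codes, counting every non-5XX response as a success. The function name and the input shape are assumptions made for illustration; the service's own calculation may differ in detail.

```python
def query_success_percentage(status_codes: list[int]) -> float:
    """Percentage of queries whose HTTP status is outside the 5XX range."""
    if not status_codes:
        raise ValueError("no queries in the sample")
    # User-side failures (4XX) still count as successes for this indicator.
    succeeded = sum(1 for code in status_codes if not 500 <= code <= 599)
    return 100.0 * succeeded / len(status_codes)


# Two successful queries, one user-side error, one server-side error.
print(query_success_percentage([200, 200, 400, 503]))  # 75.0
```

Note that the 400 response does not lower the percentage, since only server-side failures count against the workspace.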

 

We’ve worked to enhance these workspace health indicators with features we released in the past year. We improved the workspace operation logs, where you can find information about issues related to log parsing, limitations reached and general data-related issues. You can also create alerts on these logs to get notifications on potential data loss events. 

 

We created a workspace status indication based on the data in the workspace operation logs table. In the workspace overview blade, you’ll have an indication of your overall workspace state. We’ll show warnings for issues of concern and errors for critical matters that need your attention.
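
The idea of deriving one overall state from many operation log entries can be sketched as a simple reduction: any error makes the workspace state an error, otherwise any warning makes it a warning. The level strings and the "Healthy" label below are hypothetical choices for this example, not the portal's actual values.

```python
def workspace_status(levels: list[str]) -> str:
    """Reduce operation-log severity levels to a single overall status."""
    normalized = {level.strip().lower() for level in levels}
    if "error" in normalized:
        return "Error"      # critical matters that need attention
    if "warning" in normalized:
        return "Warning"    # issues of concern
    return "Healthy"        # hypothetical label for "nothing to report"


print(workspace_status(["Warning", "Error", "Warning"]))  # Error
```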


The Workspace insights blade provides a unified view of your workspace usage, performance, health, agents, queries, and change log. This can help you understand the overall state of the workspace, its performance, ingestion spikes or drops, latency, and your queries' performance.

 

You can view a workspace’s resource health in several places in the Azure portal:

1) From the Monitor service menu, select Service health > Resource health and filter for the Log Analytics resource type.
2) From the Log Analytics workspace screen, select Resource health.
3) From the Log Analytics workspace screen, select Insights and select the Health tab.

 

We’re now happy to announce the release of two more resource health reports.

 

The report released today covers ingestion latency issues, and it shows three states:

Available: No workspace latency issues detected in the specified timeframe.

Degraded: Estimated ingestion latency of more than one hour for more than 15 minutes. We’re actively working to mitigate this incident.

Unknown: We are currently unable to determine the health of this workspace, or no data was ingested to this workspace in over 24 hours.
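
The three states above can be read as a small decision rule. The following sketch applies the thresholds from the descriptions (one hour of latency, sustained for more than 15 minutes, and 24 hours without data) to a series of latency estimates. The sampling model, the function name, and its parameters are assumptions for illustration only; the post does not describe how the service evaluates these conditions internally.

```python
from datetime import datetime, timedelta
from typing import Optional

LATENCY_THRESHOLD = timedelta(hours=1)   # "more than one hour"
SUSTAINED_FOR = timedelta(minutes=15)    # "for more than 15 minutes"
NO_DATA_AFTER = timedelta(hours=24)      # "no data ... in over 24 hours"


def latency_health(
    samples: list[tuple[datetime, timedelta]],
    last_ingested: Optional[datetime],
    now: datetime,
) -> str:
    """Classify health from (timestamp, estimated_latency) samples sorted by timestamp."""
    if last_ingested is None or now - last_ingested > NO_DATA_AFTER:
        return "Unknown"
    breach_start = None
    for timestamp, latency in samples:
        if latency > LATENCY_THRESHOLD:
            if breach_start is None:
                breach_start = timestamp  # start of a continuous high-latency run
            if timestamp - breach_start > SUSTAINED_FOR:
                return "Degraded"
        else:
            breach_start = None  # latency recovered, so the run is reset
    return "Available"
```

In this sketch a brief latency spike that recovers within 15 minutes leaves the state as Available, which matches the "for more than 15 minutes" wording of the Degraded description.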

 

Moving forward, Resource health will also support signals for drops in the query success rate.

 

We recommend setting up alerts on the workspace resource health signals following the steps in this article. 

Providing visibility on the health of your observability service is a focus area that will get further investment. Additional capabilities will be added in the future.  

 

Related articles:

Resource Health overview

Resource types and health checks in Azure resource health

Log analytics ingestion latency overview

Azure Monitor Status

 

Last update: Jul 07 2022 12:45 AM