The Log Analytics product team identified two important indicators of a workspace’s health: ingestion latency and query success percentage.
Ingestion latency signals measure the time it takes for an event to be reported, processed and become available for search in your logs data store. You can read more about how we calculate the latency and what can influence data latency time.
Query success percentage measures the share of queries that return a non-5XX HTTP response, for example queries that complete successfully or fail with user-side errors. This number does not include queries initiated by Azure log search alerts or Azure Sentinel. This signal will be available via Resource health in the near future.
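To make the definition concrete, here is a minimal sketch of how such a percentage could be computed from a list of HTTP status codes. It is an illustration only, not the service's actual calculation: the function name `query_success_percentage` and the input shape are assumptions, and the input is assumed to already exclude queries initiated by alerts or Azure Sentinel.

```python
from typing import Iterable, Optional


def query_success_percentage(status_codes: Iterable[int]) -> Optional[float]:
    """Return the percentage of queries whose HTTP status is not 5XX.

    Hypothetical helper: user-side failures (for example 4XX) count as
    successes, because only server-side 5XX responses count against the signal.
    """
    codes = list(status_codes)
    if not codes:
        # No queries in the window, so the percentage is undefined.
        return None
    succeeded = sum(1 for code in codes if not 500 <= code <= 599)
    return 100.0 * succeeded / len(codes)
```

For example, a window with the status codes 200, 400, 500, and 200 has three non-5XX responses out of four, which gives 75 percent.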
We’ve worked to enhance these workspace health indicators with features we released in the past year. We improved the workspace operation logs, where you can find information about issues related to log parsing, limitations reached and general data-related issues. You can also create alerts on these logs to get notifications on potential data loss events.
We created a workspace status indication based on the data in the workspace operation logs table. In the workspace overview blade, you’ll have an indication of your overall workspace state. We’ll show warnings for issues of concern and errors for critical matters that need your attention.
The Workspace insights blade provides a unified view of your workspace usage, performance, health, agents, queries, and change log. This can help you understand the overall state of the workspace, its performance, ingestion spikes or drops, latency, and your queries' performance.
You can view a workspace’s resource health in several places in the Azure portal:
1) From the Monitor service menu, select Service health > Resource health and filter for the Log Analytics resource type.
2) From the Log Analytics workspace screen, select Resource health.
3) From the Log Analytics workspace screen, select Insights and select the health tab.
We’re now happy to announce the release of two more resource health reports.
The report released today covers ingestion latency issues, and it shows three states:
Available – No workspace latency issues detected in the specified timeframe.
Degraded – Estimated ingestion latency of more than one hour for more than 15 minutes. We’re actively working to mitigate this incident.
Unknown – We are currently unable to determine the health of this workspace, or no data was ingested to this workspace in over 24 hours.
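The three states above can be sketched as a small classification rule. This is an illustration under stated assumptions, not the service's implementation: the function `classify_health`, the sample format (a time-sorted list of timestamp and estimated-latency pairs), and the use of the last sample's timestamp as a stand-in for "last data ingested" are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Thresholds taken from the state descriptions above.
LATENCY_THRESHOLD = timedelta(hours=1)
SUSTAINED_FOR = timedelta(minutes=15)
NO_DATA_WINDOW = timedelta(hours=24)


def classify_health(samples: List[Tuple[datetime, timedelta]], now: datetime) -> str:
    """Classify a workspace from (timestamp, estimated latency) samples.

    Assumes samples are sorted by timestamp in ascending order.
    """
    # Unknown: nothing to evaluate, or no data seen in over 24 hours.
    if not samples or now - samples[-1][0] > NO_DATA_WINDOW:
        return "Unknown"

    breach_start = None
    for timestamp, latency in samples:
        if latency > LATENCY_THRESHOLD:
            if breach_start is None:
                breach_start = timestamp
            # Degraded: latency stayed above one hour for more than 15 minutes.
            if timestamp - breach_start > SUSTAINED_FOR:
                return "Degraded"
        else:
            # Latency recovered, so the sustained-breach clock resets.
            breach_start = None
    return "Available"
```

In this sketch a short spike that recovers within 15 minutes leaves the workspace Available, while any sustained breach inside the evaluated timeframe marks it Degraded.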
Moving forward, Resource health will support signals for drops in the query success rate.
We recommend setting up alerts on the workspace resource health signals following the steps in this article.
Providing visibility on the health of your observability service is a focus area that will get further investment. Additional capabilities will be added in the future.