SomeZnimav
Occasional Reader
Jun 03, 2026

SentinelHealth: Scheduled Rule Retry Logging Does Not Match Docs

## Objective

I am working on a health-check architecture for Microsoft Sentinel analytics rules. The goal is to build a set of monitoring queries and approaches that cover rule execution failures, configuration issues (entity mapping, partial success), rule audit tracking, and auto-disabled rule detection.

## My Current Approach

So far I have built monitoring for the following areas using the SentinelHealth and SentinelAudit tables:

- Scheduled rule window failures (retry exhaustion)

- NRT rule execution delays (cumulative delay over 25 minutes)

- Partial success and configuration issues (entity mapping drops, alert size limits, semantic errors) with transient error codes filtered out

- Auto-disabled rules detection

- Rule disable/delete audit tracking via SentinelAudit + AzureActivity

## The Issue: Scheduled Rule Retry Logging

The documentation at https://learn.microsoft.com/en-us/azure/sentinel/monitor-analytics-rule-integrity#scheduled-rules states that when a scheduled rule fails, it is retried 5 more times on the same window (6 total attempts). It also provides this query to detect completely skipped windows:

```kql
_SentinelHealth()
| where SentinelResourceType == @"Analytics Rule"
| where SentinelResourceKind == "Scheduled"
| where Status != "Success"
| extend startTime = tostring(ExtendedProperties["QueryStartTimeUTC"])
| summarize failuresByStartTime = count() by startTime, SentinelResourceId
| where failuresByStartTime == 6
| summarize count() by SentinelResourceId
```

This query assumes that each retry attempt is logged as a separate event in SentinelHealth, all sharing the same QueryStartTimeUTC. You would then count 6 failure records per startTime to identify a fully skipped window.

However, in practice I am seeing different behavior. I ran a diagnostic query with a 90-day lookback (480 non-success events total, 73 unique rules). Every unique (SentinelResourceName, startTime) combination had exactly one record. No window had more than one non-success event, so no grouping of retries was observed at all.
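
For reference, a diagnostic of this kind can be sketched as follows. This is an illustrative reconstruction, not necessarily the exact query that was run: the explicit 90-day filter and the column names `recordsPerWindow` and `windows` are assumptions chosen for the example. It counts non-success records per rule and window, then shows how many windows ended up with each record count.

```kql
// Sketch: distribution of non-success records per (rule, window).
// recordsPerWindow and windows are illustrative column names.
_SentinelHealth()
| where TimeGenerated > ago(90d)
| where SentinelResourceType == "Analytics Rule"
| where SentinelResourceKind == "Scheduled"
| where Status != "Success"
| extend startTime = tostring(ExtendedProperties["QueryStartTimeUTC"])
// One row per rule and query window, with the number of non-success records.
| summarize recordsPerWindow = count() by SentinelResourceName, startTime
// Collapse to a histogram: how many windows have 1, 2, ... records.
| summarize windows = count() by recordsPerWindow
| order by recordsPerWindow asc
```

If retries were logged as separate events, some rows would show a `recordsPerWindow` value greater than 1. In the dataset described above, the only row would be the one where `recordsPerWindow` is 1.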

I then found an actual failed-window event that confirms this. Here is the record:

- Rule: Port scan detected (ASIM Network Session schema)

- Status: Failure

- Description: "Rule's scheduled run at 06/01/2026 10:43:55 failed after numerous attempts. It will be re-executed over the next scheduled time."

- Issue Code: SemanticErrorInQuery

- Only 1 SentinelHealth record exists for this failed window


The Description field says "failed after numerous attempts" which indicates the retries happened internally, but only one consolidated Failure event was written to SentinelHealth after all retries were exhausted. The individual retry attempts do not appear as separate records.

This means the failuresByStartTime == 6 query from the documentation would never match this pattern, because there is only 1 record per failed window, not 6.
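
The mismatch can be reproduced without a workspace. The sketch below feeds a single made-up record through the same grouping logic as the documented query. The table name `SampleHealth`, the resource ID `sample-rule-id`, and the `QueryStartTimeUTC` value are all hypothetical; only the Description text is taken from the record above.

```kql
// Made-up sample: one consolidated Failure record for one failed window.
let SampleHealth = datatable(
    SentinelResourceId: string,
    Status: string,
    Description: string,
    ExtendedProperties: dynamic)
[
    "sample-rule-id", "Failure",
    "Rule's scheduled run at 06/01/2026 10:43:55 failed after numerous attempts. It will be re-executed over the next scheduled time.",
    dynamic({"QueryStartTimeUTC": "2026-06-01T10:38:55Z"})  // hypothetical value
];
// Same grouping as the documented query, but the final filter is turned
// into a flag so the result stays visible.
SampleHealth
| where Status != "Success"
| extend startTime = tostring(ExtendedProperties["QueryStartTimeUTC"])
| summarize failuresByStartTime = count() by startTime, SentinelResourceId
| extend matchesDocumentedFilter = (failuresByStartTime == 6)
```

The result is a single row with `failuresByStartTime` equal to 1 and `matchesDocumentedFilter` equal to false, so the documented `== 6` filter would drop this window.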


## Why This Matters

Yes, completely skipped windows are rare. In my 90-day dataset, most failures were permanent types (SemanticErrorInQuery, QueryGeneralError) that would not benefit from retries anyway. But skipped windows do still happen, and if a tenant hits a transient issue that causes a higher rate of failed windows, the documented query would silently return nothing.

For my health checks I have rewritten the detection to simply look for Status == "Failure" with Description containing "failed after numerous attempts" which matches the actual consolidated event Sentinel writes.
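
A minimal sketch of such a detection is below. It is not the exact production query: the column names `records`, `consolidated`, `lastSeen`, and `reasons` are illustrative, and it additionally keeps the documented six-record pattern as a fallback in case some failure types do log each retry separately. It also assumes the Description wording stays stable, which as far as I know is not a documented contract.

```kql
// Sketch: flag a scheduled-rule window as fully failed if either
//  (a) a consolidated "failed after numerous attempts" event exists, or
//  (b) six or more non-success records share the same window (documented pattern).
_SentinelHealth()
| where SentinelResourceType == "Analytics Rule"
| where SentinelResourceKind == "Scheduled"
| where Status != "Success"
| extend startTime = tostring(ExtendedProperties["QueryStartTimeUTC"])
| summarize
    records = count(),
    consolidated = countif(Status == "Failure"
        and Description contains "failed after numerous attempts"),
    lastSeen = max(TimeGenerated),
    reasons = make_set(Reason)
    by SentinelResourceId, SentinelResourceName, startTime
| where consolidated > 0 or records >= 6
| order by lastSeen desc
```

Matching on free text is brittle, so a change in the Description wording would need a corresponding update to the `contains` filter.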

 

## Questions

  1. Is the documented failuresByStartTime == 6 query still accurate? Or has the retry logging behavior changed to write a single consolidated event per failed window?
  2. Are there specific failure types or conditions where individual retries are logged as separate events? Perhaps transient failures behave differently from permanent ones in this regard?
  3. For anyone else building health monitoring on SentinelHealth - am I missing any important use cases beyond what I described above?


Any clarification would be appreciated.
