SentinelHealth: Scheduled Rule Retry Logging Does Not Match Docs

Question

## Objective

I am working on a health checks architecture for Microsoft Sentinel analytic rules. The goal is to build a set of monitoring queries/approaches that cover rule execution failures, configuration issues (entity mapping, partial success), rule audit tracking, and auto-disabled rule detection.

## My Current Approach

So far I have built monitoring for the following areas using the SentinelHealth and SentinelAudit tables:

- Scheduled rule window failures (retry exhaustion)
- NRT rule execution delays (cumulative delay over 25 minutes)
- Partial success and configuration issues (entity mapping drops, alert size limits, semantic errors) with transient error codes filtered out
- Auto-disabled rules detection
- Rule disable/delete audit tracking via SentinelAudit + AzActivity

## The Issue: Scheduled Rule Retry Logging

The documentation at https://learn.microsoft.com/en-us/azure/sentinel/monitor-analytics-rule-integrity#scheduled-rules states that when a scheduled rule fails, it is retried 5 more times on the same window (6 total attempts). It also provides this query to detect completely skipped windows:

```kql
_SentinelHealth()
| where SentinelResourceType == @"Analytics Rule"
| where SentinelResourceKind == "Scheduled"
| where Status != "Success"
| extend startTime = tostring(ExtendedProperties["QueryStartTimeUTC"])
| summarize failuresByStartTime = count() by startTime, SentinelResourceId
| where failuresByStartTime == 6
| summarize count() by SentinelResourceId
```

This query assumes that each retry attempt is logged as a separate event in SentinelHealth, all sharing the same QueryStartTimeUTC. You would then count 6 failure records per startTime to identify a fully skipped window.

However, in practice I am seeing different behavior. I ran a diagnostic query with a 90-day lookback (480 non-success events total, 73 unique rules). Every single event had a count of 1 per unique (SentinelResourceName, startTime) combination. No grouping of retries was observed at all.

I then found an actual failed-window event that confirms this. Here is the record:

- Rule: Port scan detected (ASIM Network Session schema)
- Status: Failure
- Description: "Rule's scheduled run at 06/01/2026 10:43:55 failed after numerous attempts. It will be re-executed over the next scheduled time."
- Issue Code: SemanticErrorInQuery
- Only 1 SentinelHealth record exists for this failed window

The Description field says "failed after numerous attempts" which indicates the retries happened internally, but only one consolidated Failure event was written to SentinelHealth after all retries were exhausted. The individual retry attempts do not appear as separate records.

This means the failuresByStartTime == 6 query from the documentation would never match this pattern, because there is only 1 record per failed window, not 6.

## Why This Matters

Yes, completely skipped windows are rare. In my 90-day dataset most failures were permanent types (SemanticErrorInQuery, QueryGeneralError) that would not benefit from retries anyway. But they still happen, and if a tenant experiences a transient issue that causes a higher rate of failed windows, the documented query would silently return nothing.

For my health checks I have rewritten the detection to simply look for Status == "Failure" with Description containing "failed after numerous attempts" which matches the actual consolidated event Sentinel writes.

## Questions

- Is the documented failuresByStartTime == 6 query still accurate? Or has the retry logging behavior changed to write a single consolidated event per failed window?
- Are there specific failure types or conditions where individual retries are logged as separate events? Perhaps transient failures behave differently from permanent ones in this regard?
- For anyone else building health monitoring on SentinelHealth - am I missing any important use cases beyond what I described above?

Any clarification would be appreciated.
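
For readers who want to reproduce the comparison, the diagnostic described above could be sketched roughly as follows. This is a minimal sketch, not the exact query that was run: the 90-day lookback mirrors the text, while the column names `recordsPerWindow` and `windows` are illustrative assumptions.

```kql
// Sketch: how many non-success SentinelHealth records exist per rule window?
// If retries were logged individually, some windows should show 6 records.
SentinelHealth
| where TimeGenerated > ago(90d)
| where SentinelResourceType == "Analytics Rule"
| where SentinelResourceKind == "Scheduled"
| where Status != "Success"
| extend startTime = tostring(ExtendedProperties["QueryStartTimeUTC"])
| summarize recordsPerWindow = count() by SentinelResourceName, startTime
// Distribution: number of windows for each record count (1, 2, ... 6)
| summarize windows = count() by recordsPerWindow
| order by recordsPerWindow asc
```

The rewritten detection could then look something like the query below, assuming the consolidated event always carries the phrase quoted in the Description above. The 7-day lookback and the output column names are arbitrary choices for illustration.

```kql
// Sketch: failed windows based on the single consolidated Failure event,
// assuming the Description wording observed above stays stable.
SentinelHealth
| where TimeGenerated > ago(7d)   // lookback chosen for illustration only
| where SentinelResourceType == "Analytics Rule"
| where SentinelResourceKind == "Scheduled"
| where Status == "Failure"
| where Description contains "failed after numerous attempts"
| extend startTime = tostring(ExtendedProperties["QueryStartTimeUTC"])
| summarize FailedWindows = dcount(startTime), LastFailure = max(TimeGenerated)
    by SentinelResourceName, SentinelResourceId
| order by FailedWindows desc
```

Matching on free-text Description is fragile if the message wording changes, so it may be worth keeping the documented count-based query running alongside it until the behavior is confirmed.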

jamony · Answer

Hi, this is the sort of Sentinel issue where I would compare docs against real telemetry very carefully before building monitoring around it.

I would validate:

1. Rule type and schedule.

2. Whether the rule actually failed, retried, or skipped.

3. Whether health monitoring is enabled and the SentinelHealth table is actually receiving records for the rule.

4. Time range and ingestion delay.

5. Whether the workspace is on the latest unified portal behavior.

If the docs say a retry should be logged but you cannot reproduce that, I would file feedback on the docs and include a minimal repro rule. For monitoring, build a fallback query around rule last-run status if retry events are not reliable.
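
A fallback along those lines might look like the sketch below, which takes the most recent SentinelHealth record per scheduled rule and surfaces rules whose latest run was not successful. The 1-day lookback and the `LastRun` column name are assumptions for illustration; rules that run less often than the lookback, or that wrote no health records at all, will not appear and need a separate check.

```kql
// Sketch: last-run status per scheduled rule, independent of retry logging.
SentinelHealth
| where TimeGenerated > ago(1d)   // assumed lookback; widen for infrequent rules
| where SentinelResourceType == "Analytics Rule"
| where SentinelResourceKind == "Scheduled"
// Keep only the latest health record for each rule
| summarize arg_max(TimeGenerated, Status, Description)
    by SentinelResourceId, SentinelResourceName
| where Status != "Success"
| project LastRun = TimeGenerated, SentinelResourceName, Status, Description, SentinelResourceId
```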
