log analytics
Kerberos and the End of RC4: Protocol Hardening and Preparing for CVE‑2026‑20833
CVE-2026-20833 addresses the continued use of the RC4‑HMAC algorithm within the Kerberos protocol in Active Directory environments. Although RC4 has been retained for many years for compatibility with legacy systems, it is now considered cryptographically weak and unsuitable for modern authentication scenarios. As part of the security evolution of Kerberos, Microsoft has initiated a process of progressive protocol hardening, whose objective is to eliminate RC4 as an implicit fallback, establishing AES128 and AES256 as the default and recommended algorithms.

This change should not be treated as optional or merely preventive. It represents a structural change in Kerberos behavior that will be progressively enforced through Windows security updates, culminating in a model where RC4 will no longer be implicitly accepted by the KDC. If Active Directory environments maintain service accounts, applications, or systems dependent on RC4, authentication failures may occur after the application of the updates planned for 2026, especially during the enforcement phases introduced starting in April and finalized in July 2026. For this reason, it is essential that organizations proactively identify and eliminate RC4 dependencies, ensuring that accounts, services, and applications are properly configured to use AES128 or AES256 before the definitive changes to Kerberos protocol behavior take effect.

Official Microsoft References

- CVE-2026-25177 - Security Update Guide - Microsoft - Active Directory Domain Services Elevation of Privilege Vulnerability
- Microsoft Support – How to manage Kerberos KDC usage of RC4 for service account ticket issuance changes related to CVE-2026-20833 (KB 5073381)
- Microsoft Learn – Detect and Remediate RC4 Usage in Kerberos
- AskDS – What is going on with RC4 in Kerberos?
- Beyond RC4 for Windows authentication | Microsoft Windows Server Blog
- So, you think you’re ready for enforcing AES for Kerberos? | Microsoft Community Hub

Risk Associated with the Vulnerability

When RC4 is used in Kerberos tickets, an authenticated attacker can request Service Tickets (TGS) for valid SPNs, capture these tickets, and perform offline brute-force attacks, particularly Kerberoasting scenarios, with the goal of recovering service account passwords. Compared to AES, RC4 allows significantly faster cracking, especially for older accounts or accounts with weak passwords.

Technical Overview of the Exploitation

In simplified terms, the exploitation flow occurs as follows:

1. The attacker requests a TGS for a valid SPN.
2. The KDC issues the ticket using RC4, when that algorithm is still accepted.
3. The ticket is captured and analyzed offline.
4. The service account password is recovered.
5. The compromised account is used for lateral movement or privilege escalation.

Official Timeline Defined by Microsoft

Important clarification on enforcement behavior: explicit account encryption type configurations continue to be honored even during enforcement mode. The Kerberos hardening associated with CVE‑2026‑20833 focuses on changing the default behavior of the KDC, enforcing AES-only encryption for TGS ticket issuance when no explicit configuration exists. This approach follows the same enforcement model previously applied to Kerberos session keys in earlier security updates (for example, KB5021131 related to CVE‑2022‑37966), representing another step in the progressive removal of RC4 as an implicit fallback.
January 2026 – Audit Phase

Starting in January 2026, Microsoft initiated the Audit Phase related to changes in RC4 usage within Kerberos, as described in the official guidance associated with CVE-2026-20833. The primary objective of this phase is to allow organizations to identify existing RC4 dependencies before enforcement changes are applied in later phases. During this phase, no functional breakage is expected, as RC4 is still permitted by the KDC. However, additional auditing mechanisms were introduced, providing greater visibility into how Kerberos tickets are issued in the environment.

Analysis is primarily based on the following events recorded in the Security Log of Domain Controllers:

- Event ID 4768 – Kerberos Authentication Service (AS request / Ticket Granting Ticket)
- Event ID 4769 – Kerberos Service Ticket Operations (Ticket Granting Service – TGS)
- Additional events related to the KDCSVC service

These events allow identification of:

- the account that requested authentication
- the requested service or SPN
- the source host of the request
- the encryption algorithm used for the ticket and session key

This information is critical for detecting scenarios where RC4 is still being implicitly used, enabling operations teams to plan remediation ahead of the enforcement phase.

If these events are not being logged on Domain Controllers, it is necessary to verify whether Kerberos auditing is properly enabled. For Kerberos authentication events to be recorded in the Security Log, the corresponding audit policies must be configured. The minimum recommended configuration is to enable Success auditing for the following subcategories:

- Kerberos Authentication Service
- Kerberos Service Ticket Operations

Verification can be performed directly on a Domain Controller using the following commands:

auditpol /get /subcategory:"Kerberos Service Ticket Operations"
auditpol /get /subcategory:"Kerberos Authentication Service"

In enterprise environments, the recommended approach is to apply this configuration via Group Policy, ensuring consistency across all Domain Controllers. The corresponding policy can be found at: Computer Configuration - Policies - Windows Settings - Security Settings - Advanced Audit Policy Configuration - Audit Policies - Account Logon

Once enabled, these audits record events 4768 and 4769 in the Domain Controllers’ Security Log, allowing analysis tools—such as inventory scripts or SIEM/Log Analytics queries—to accurately identify where RC4 is still present in the Kerberos authentication flow.
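If the Domain Controller security events are forwarded to a Sentinel-enabled Log Analytics workspace, a query along the following lines can surface RC4 ticket issuance during the Audit Phase. This is a minimal sketch assuming the SecurityEvent table populated by the Azure Monitor Agent; verify the column names against your connector before relying on it.

SecurityEvent
| where EventID == 4769
| where TicketEncryptionType == "0x17" // 0x17 = RC4-HMAC
| summarize Requests = count() by ServiceName, Account, IpAddress
| order by Requests desc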
April 2026 – Enforcement with Manual Rollback

With the April 2026 update, the KDC begins operating in AES-only mode (0x18) when the msDS-SupportedEncryptionTypes attribute is not defined. This means RC4 is no longer accepted as an implicit fallback. During this phase, applications, accounts, or computers that still implicitly depend on RC4 may start failing. Manual rollback remains possible via explicit configuration of the attribute in Active Directory.

July 2026 – Final Enforcement

Starting in July 2026, audit mode and rollback options are removed. RC4 will only function if explicitly configured—a practice that is strongly discouraged. This represents the point of no return in the hardening process.

Official Monitoring Approach

Microsoft provides official scripts in the repository: https://github.com/microsoft/Kerberos-Crypto/tree/main/scripts

The two primary scripts used in this analysis are described below.

Get-KerbEncryptionUsage.ps1

The Get-KerbEncryptionUsage.ps1 script, provided by Microsoft in the Kerberos‑Crypto repository, is designed to identify how Kerberos tickets are issued in the environment by analyzing authentication events recorded on Domain Controllers. Data collection is primarily based on:

- Event ID 4768 – Kerberos Authentication Service (AS‑REQ / TGT issuance)
- Event ID 4769 – Kerberos Service Ticket Operations (TGS issuance)

From these events, the script extracts and consolidates several relevant fields for authentication flow analysis:

- Time – when the authentication occurred
- Requestor – IP address or host that initiated the request
- Source – account that requested the ticket
- Target – requested service or SPN
- Type – operation type (AS or TGS)
- Ticket – algorithm used to encrypt the ticket
- SessionKey – algorithm used to protect the session key

Based on these fields, it becomes possible to objectively identify which algorithms are being used in the environment, both for ticket issuance and session establishment. This visibility is essential for detecting RC4 dependencies in the Kerberos authentication flow, enabling precise identification of which clients, services, or accounts still rely on this legacy algorithm.

Example usage:

.\Get-KerbEncryptionUsage.ps1 -Encryption RC4 -Searchscope AllKdcs | Export-Csv -Path .\KerbUsage_RC4_All_ThisDC.csv -NoTypeInformation -Encoding UTF8

Data Consolidation and Analysis

In enterprise environments, where event volumes may be high, it is recommended to consolidate script results into analytical tools such as Power BI to facilitate visualization and investigation. The presented image illustrates an example dashboard built from collected results, enabling visibility into:

- Total events analyzed
- Number of Domain Controllers involved
- Number of requesting clients (Requestors)
- Most frequently involved services or SPNs (Targets)
- Temporal distribution of events
- RC4 usage scenarios (Ticket, SessionKey, or both)

This type of visualization enables rapid identification of RC4 usage patterns, remediation prioritization, and progress tracking as dependencies are eliminated. Additionally, dashboards help answer key operational questions, such as:

- Which services still depend on RC4
- Which clients are negotiating RC4 for sessions
- Which Domain Controllers are issuing these tickets
- Whether RC4 usage is decreasing over time

This combined automated collection + analytical visualization approach is the recommended strategy to prepare environments for the Microsoft changes related to CVE‑2026‑20833 and the progressive removal of RC4 in Kerberos.

Visualizing Results with Power BI

To facilitate analysis and monitoring of RC4 usage in Kerberos, it is recommended to consolidate script results into a Power BI analytical dashboard.

1. Install Power BI Desktop. Download and install Power BI Desktop from the official Microsoft website.

2. Execute data collection. After running the Get-KerbEncryptionUsage.ps1 script, save the generated CSV file to the following directory: C:\Temp\Kerberos_KDC_usage_of_RC4_Logs\KerbEncryptionUsage_RC4.csv

3. Open the dashboard in Power BI. Open the file RC4-KerbEncryptionUsage-Dashboards.pbix using Power BI Desktop. If you are interested, please leave a comment on this post with your email address, and I will be happy to share it with you.
4. Update the data source. If the CSV file is located in a different directory, it will be necessary to adjust the data source path in Power BI. As illustrated, the dashboard uses a parameter named CsvFilePath, which defines the path to the collected CSV file. To adjust it:

- Open Transform Data in Power BI.
- Locate the CsvFilePath parameter in the list of Queries.
- Update the value to the directory where the CSV file was saved.
- Click Refresh Preview or Refresh to update the data.
- Click Home → Close & Apply.

This approach allows rapid identification of RC4 dependencies, prioritization of remediation actions, and tracking of progress throughout the elimination process.

List-AccountKeys.ps1

This script is used to identify which long-term keys are present on user, computer, and service accounts, enabling verification of whether RC4 is still required or whether AES128/AES256 keys are already available.

Interpreting Observed Scenarios

Microsoft recommends analyzing RC4 usage by jointly considering two key fields present in Kerberos events:

- Ticket Encryption Type
- Session Encryption Type

Each combination represents a distinct Kerberos behavior, indicating the source of the issue, risk level, and remediation point in the environment.

In addition to events 4768 and 4769, updates released starting January 13, 2026, introduce new Kdcsvc events in the System Event Log that assist in identifying RC4 dependencies ahead of enforcement. These events include:

- Event ID 201 – RC4 usage detected because the client advertises only RC4 and the service does not have msDS-SupportedEncryptionTypes defined.
- Event ID 202 – RC4 usage detected because the service account does not have AES keys and the msDS-SupportedEncryptionTypes attribute is not defined.
- Event ID 203 – RC4 usage blocked (enforcement phase) because the client advertises only RC4 and the service does not have msDS-SupportedEncryptionTypes defined.
- Event ID 204 – RC4 usage blocked (enforcement phase) because the service account does not have AES keys and msDS-SupportedEncryptionTypes is not defined.
- Event ID 205 – Detection of explicit enablement of insecure algorithms (such as RC4) in the domain policy DefaultDomainSupportedEncTypes.
- Event ID 206 – RC4 usage detected because the service accepts only AES, but the client does not advertise AES support.
- Event ID 207 – RC4 usage detected because the service is configured for AES, but the service account does not have AES keys.
- Event ID 208 – RC4 usage blocked (enforcement phase) because the service accepts only AES and the client does not advertise AES support.
- Event ID 209 – RC4 usage blocked (enforcement phase) because the service accepts only AES, but the service account does not have AES keys.

https://support.microsoft.com/en-gb/topic/how-to-manage-kerberos-kdc-usage-of-rc4-for-service-account-ticket-issuance-changes-related-to-cve-2026-20833-1ebcda33-720a-4da8-93c1-b0496e1910dc

They indicate situations where RC4 usage will be blocked in future phases, allowing early detection of configuration issues in clients, services, or accounts. These events are logged under:

Log: System
Source: Kdcsvc
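To inventory these events during the Audit Phase, a small PowerShell sketch like the following can be run on each Domain Controller. The provider name 'Kdcsvc' is taken from the event source listed above; confirm it on your DCs before scheduling the collection.

# Pull the Kdcsvc audit/enforcement events (IDs 201-209) from the System log
Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'Kdcsvc'; Id = @(201..209) } -MaxEvents 1000 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, MachineName, Message |
    Export-Csv .\Kdcsvc_RC4_Audit.csv -NoTypeInformation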
Below are the primary scenarios observed during the analysis of Kerberos authentication behavior, highlighting how RC4 usage manifests across different ticket and session encryption combinations. Each scenario represents a distinct risk profile and indicates specific remediation actions required to ensure compliance with the upcoming enforcement phases.

Scenario A – RC4 / RC4

In this scenario, both the Kerberos ticket and the session key are issued using RC4. This is the worst possible scenario from a security and compatibility perspective, as it indicates full and explicit dependence on RC4 in the authentication flow. This condition significantly increases exposure to Kerberoasting attacks, since RC4‑encrypted tickets can be subjected to offline brute-force attacks to recover service account passwords. In addition, environments remaining in this state have a high probability of authentication failure after the April 2026 updates, when RC4 will no longer be accepted as an implicit fallback by the KDC.

Events Associated with This Scenario

During the Audit Phase, this scenario is typically associated with:

Event ID 201 – Kdcsvc. Indicates that:
- the client advertises only RC4
- the service does not have msDS-SupportedEncryptionTypes defined
- the Domain Controller does not have DefaultDomainSupportedEncTypes defined

This means RC4 is being used implicitly. This event indicates that the authentication will fail during the enforcement phase.

Event ID 202 – Kdcsvc. Indicates that:
- the service account does not have AES keys
- the service does not have msDS-SupportedEncryptionTypes defined

This typically occurs when:
- legacy accounts have never had their passwords reset
- only RC4 keys exist in Active Directory

Possible Causes

Common causes include:
- the originating client (Requestor) advertises only RC4
- the target service (Target) is not explicitly configured to support AES
- the account has only legacy RC4 keys
- the msDS-SupportedEncryptionTypes attribute is not defined

Recommended Actions

To remediate this scenario:

1. Correctly identify the object involved in the authentication flow, typically: a service account (SPN), a computer account, or a Domain Controller computer object.
2. Verify whether the object has AES keys available using analysis tools or scripts such as List-AccountKeys.ps1.
3. If AES keys are not present, reset the account password, forcing generation of modern cryptographic keys (AES128 and AES256).
4. Explicitly define the msDS-SupportedEncryptionTypes attribute to enable AES support. Recommended value for modern environments: 0x18 (AES128 + AES256) = 24.

As illustrated below, this configuration can be applied directly to the msDS-SupportedEncryptionTypes attribute in Active Directory. AES can also be enabled via Active Directory Users and Computers by explicitly selecting:

- This account supports Kerberos AES 128 bit encryption
- This account supports Kerberos AES 256 bit encryption

These options ensure that new Kerberos tickets are issued using AES algorithms instead of RC4.

Temporary RC4 Usage (Controlled Rollback)

In transitional scenarios—during migration or troubleshooting—it may be acceptable to temporarily use: 0x1C (RC4 + AES) = 28. This configuration allows the object to accept both RC4 and AES simultaneously, functioning as a controlled rollback while legacy dependencies are identified and corrected. However, the final objective must be to fully eliminate RC4 before the final enforcement phase in July 2026, ensuring the environment operates exclusively with AES128 and AES256.
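The attribute can also be set with the ActiveDirectory PowerShell module. A minimal sketch follows; the account name is hypothetical, and -KerberosEncryptionType writes the msDS-SupportedEncryptionTypes value for you.

# Preferred: let the module compute msDS-SupportedEncryptionTypes (0x18 = 24)
Set-ADUser -Identity 'svc-app01' -KerberosEncryptionType AES128,AES256

# Equivalent explicit attribute write (works for any object type)
Set-ADObject -Identity (Get-ADUser 'svc-app01').DistinguishedName -Replace @{'msDS-SupportedEncryptionTypes' = 24}

# Controlled rollback during migration only: RC4 + AES (0x1C = 28)
# Set-ADObject -Identity $dn -Replace @{'msDS-SupportedEncryptionTypes' = 28}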
Scenario B – AES / RC4

In this case, the ticket is protected with AES, but the session is still negotiated using RC4. This typically indicates a client limitation, legacy configuration, or restricted advertisement of supported algorithms.

Events Associated with This Scenario

During the Audit Phase, this scenario may generate:

Event ID 206. Indicates that:
- the service accepts only AES
- the client does not advertise AES in the Advertised Etypes

In this case, the client is the issue.

Recommended Action

- Investigate the Requestor.
- Validate operating system, client type, and advertised algorithms.
- Review legacy GPOs, hardening configurations, or settings that still force RC4.
- For Linux clients or third‑party applications, review krb5.conf, keytabs, and Kerberos libraries.

Scenario C – RC4 / AES

Here, the session already uses AES, but the ticket is still issued using RC4. This indicates an implicit RC4 dependency on the Target or KDC side, and the environment may fail once enforcement begins.

Events Associated with This Scenario

This scenario may generate:

Event ID 205. Indicates that the domain has explicit insecure algorithm configuration in DefaultDomainSupportedEncTypes. This means RC4 is explicitly allowed at the domain level.

Recommended Action

- Correct the Target object.
- Explicitly define msDS-SupportedEncryptionTypes with 0x18 = 24.
- Revalidate new ticket issuance to confirm full migration to AES / AES.

Conclusion

CVE‑2026‑20833 represents a structural change in Kerberos behavior within Active Directory environments. Proper monitoring is essential before April 2026, and the msDS-SupportedEncryptionTypes attribute becomes the primary control point for service accounts, computer accounts, and Domain Controllers. July 2026 represents the final enforcement point, after which there will be no implicit rollback to RC4.
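Since the attribute is the primary control point, a quick inventory of SPN-bearing accounts where it is still unset, or where it still includes the RC4 bit (0x4), is a useful pre-enforcement check. A sketch using the ActiveDirectory module:

Get-ADObject -LDAPFilter '(servicePrincipalName=*)' -Properties msDS-SupportedEncryptionTypes |
    Where-Object { -not $_.'msDS-SupportedEncryptionTypes' -or ($_.'msDS-SupportedEncryptionTypes' -band 0x4) } |
    Select-Object Name, ObjectClass, 'msDS-SupportedEncryptionTypes'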
Splitting single-tenant Microsoft Defender XDR Sentinel logs in multiple company scenarios

This article describes a simple yet effective solution to the problem of segregating Microsoft Defender XDR and Entra ID Sentinel log ingestion in a single-tenant, multi-company scenario, leveraging Log Analytics workspace transformations and a few simple KQL query statements.
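As a rough illustration of the technique, an ingestion-time (workspace) transformation is just a KQL statement over the reserved source table, defined in the transformKql property of a data collection rule. One plausible shape, assuming companies can be distinguished by UPN suffix, is shown below; the column and domain names are illustrative rather than taken from the article, and not every table supports transformations, so check the supported-tables list first.

// transformKql for a workspace that should only retain Company A's rows
source
| where AccountUpn endswith "@companya.com"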
Part 3: Unified Security Intelligence - Orchestrating GenAI Threat Detection with Microsoft Sentinel

Why Sentinel for GenAI Security Observability?

Before diving into detection rules, let's address why Microsoft Sentinel is uniquely positioned for GenAI security operations—especially compared to traditional or non-native SIEMs.

Native Azure Integration: Zero ETL Overhead

The problem with external SIEMs: to monitor your GenAI workloads with a third-party SIEM, you need to:

- Configure log forwarding from Log Analytics to external systems
- Set up data connectors or agents for Azure OpenAI audit logs
- Create custom parsers for Azure-specific log schemas
- Maintain authentication and network connectivity between Azure and your SIEM
- Pay data egress costs for logs leaving Azure

The Sentinel advantage: your logs are already in Azure. Sentinel connects directly to:

- Log Analytics workspace - where your Container Insights logs already flow
- Azure OpenAI audit logs - native access without configuration
- Azure AD sign-in logs - instant correlation with identity events
- Defender for Cloud alerts - platform-level AI threat detection included
- Threat intelligence feeds - Microsoft's global threat data built-in
- Microsoft Defender XDR - AI-driven cybersecurity that unifies threat detection and response across endpoints, email, identities, cloud apps and Sentinel

There's no data movement, no ETL pipelines, and no latency from log shipping. Your GenAI security data is queryable in real-time.

KQL: Built for Complex Correlation at Scale

Why this matters for GenAI: detecting sophisticated AI attacks requires correlating:

- Application logs (your code from Part 2)
- Azure OpenAI service logs (API calls, token usage, throttling)
- Identity signals (who authenticated, from where)
- Threat intelligence (known malicious IPs)
- Defender for Cloud alerts (platform-level anomalies)

KQL's advantage: Kusto Query Language is designed for this. You can:

- Join across multiple data sources in a single query
- Parse nested JSON (like your structured logs) natively
- Use time-series analysis functions for anomaly detection and behavior patterns
- Aggregate millions of events in seconds
- Extract entities (users, IPs, sessions) automatically for investigation graphs

Example: correlating your app logs with Azure AD sign-ins and Defender alerts takes 10 lines of KQL. In a traditional SIEM, this might require custom scripts, data normalization, and significantly slower performance.

User Security Context Flows Natively

Remember the user_security_context you pass in extra_body from Part 2? That context:

- Automatically appears in Azure OpenAI's audit logs
- Flows into Defender for Cloud AI alerts
- Is queryable in Sentinel without custom parsing
- Maps to the same identity schema as Azure AD logs

With external SIEMs, you'd need to:

- Extract user context from your application logs
- Separately ingest Azure OpenAI logs
- Write correlation logic to match them
- Maintain entity resolution across different data sources

With Sentinel, it just works. The end_user_id, source_ip, and application_name are already normalized across Azure services.

Built-In AI Threat Detection

Sentinel includes pre-built detections for cloud and AI workloads:

- Azure OpenAI anomalous access patterns (out of the box)
- Unusual token consumption (built-in analytics templates)
- Geographic anomalies (using Azure's global IP intelligence)
- Impossible travel detection (cross-referencing sign-ins with AI API calls)
- Microsoft Defender XDR (correlation with endpoint, email, cloud app signals)

These aren't generic "high volume" alerts—they're tuned for Azure AI services by Microsoft's security research team. You can use them as-is or customize them with your application-specific context.
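For reference, attaching that user security context in application code looks roughly like this. This is a sketch assuming the openai Python SDK with an AzureOpenAI client; the field names inside user_security_context mirror the Part 2 log fields and are illustrative here, so check the Defender for Cloud documentation for the exact schema.

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # hypothetical endpoint
    api_key="<key-from-key-vault>",
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4-turbo",  # your deployment name
    messages=[{"role": "user", "content": "Hello"}],
    # extra_body fields are passed through to the service verbatim
    extra_body={
        "user_security_context": {
            "end_user_id": "user_550e8400",
            "source_ip": "203.0.113.42",
            "application_name": "AOAI-Customer-Support-Bot",
        }
    },
)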
Entity Behavior Analytics (UEBA)

Sentinel's UEBA automatically builds baselines for:

- Normal request volumes per user
- Typical request patterns per application
- Expected geographic access locations
- Standard model usage patterns

Then it surfaces anomalies:

- "User_12345 normally makes 10 requests/day, suddenly made 500 in an hour"
- "Application_A typically uses GPT-3.5, suddenly switched to GPT-4 exclusively"
- "User authenticated from Seattle, made AI requests from Moscow 10 minutes later"

This behavior modeling happens automatically—no custom ML model training required. Traditional SIEMs would require you to build this logic yourself.

The Bottom Line

For GenAI security on Azure:

- Sentinel reduces time-to-detection because data is already there
- Correlation is simpler because everything speaks the same language
- Investigation is faster because entities are automatically linked
- Cost is lower because you're not paying data egress fees
- Maintenance is minimal because connectors are native

If your GenAI workloads are on Azure, using anything other than Sentinel means fighting against the platform instead of leveraging it.

From Logs to Intelligence: The Complete Picture

Your structured logs from Part 2 are flowing into Log Analytics. Here's what they look like:

{
  "timestamp": "2025-10-21T14:32:17.234Z",
  "level": "INFO",
  "message": "LLM Request Received",
  "request_id": "a7c3e9f1-4b2d-4a8e-9c1f-3e5d7a9b2c4f",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "prompt_hash": "d3b07384d113edec49eaa6238ad5ff00",
  "security_check_passed": "PASS",
  "source_ip": "203.0.113.42",
  "end_user_id": "user_550e8400",
  "application_name": "AOAI-Customer-Support-Bot",
  "model_deployment": "gpt-4-turbo"
}

These logs are in the ContainerLogV2 table, since our application “AOAI-Customer-Support-Bot” is running on Azure Kubernetes Service (AKS).

Steps to Setup AKS to stream logs to Sentinel/Log Analytics

1. From the Azure portal, navigate to your AKS cluster, then to Monitoring -> Insights.
2. Select Monitor Settings.
3. Under Container Logs, select the Sentinel-enabled Log Analytics workspace.
4. Select Logs and events.
5. Check the ‘Enable ContainerLogV2’ and ‘Enable Syslog collection’ options.

More details can be found at this link: Kubernetes monitoring in Azure Monitor - Azure Monitor | Microsoft Learn
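Once enabled, a quick sanity check confirms the logs are arriving in the workspace (ingestion can lag by a few minutes after configuration):

ContainerLogV2
| where TimeGenerated > ago(15m)
| where isnotempty(LogMessage)
| take 10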
Critical Analytics Rules: What to Detect and Why

Rule 1: Prompt Injection Attack Detection

Why it matters: prompt injection is the GenAI equivalent of SQL injection. Attackers try to manipulate the model by overriding system instructions. Multiple attempts indicate intentional malicious behavior.

What to detect: 3+ prompt injection attempts from the same IP (the sample below uses a one-day bin; tighten it to 10 minutes for faster-burning alerts).

let timeframe = 1d;
let threshold = 3;
AlertEvidence
| where TimeGenerated >= ago(timeframe) and EntityType == "Ip"
| where DetectionSource == "Microsoft Defender for AI Services"
| where Title contains "jailbreak" or Title contains "prompt injection"
| summarize count() by bin(TimeGenerated, 1d), RemoteIP
| where count_ >= threshold

What the SOC sees:

- User identity attempting injection
- Source IP and geographic location
- Sample prompts for investigation
- Frequency indicating automation vs. manual attempts

Severity: High (these are actual attempts to bypass security)

Rule 2: Content Safety Filter Violations

Why it matters: when Azure AI Content Safety blocks a request, it means harmful content (violence, hate speech, etc.) was detected. Multiple violations indicate intentional abuse or a compromised account.

What to detect: users with 3+ content safety violations in a 1 hour block during a 24 hour time period.

let timeframe = 1d;
let threshold = 3;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.end_user_id)
| where LogMessage.security_check_passed == "FAIL"
| extend source_ip = tostring(LogMessage.source_ip)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend session_id = tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| summarize count() by bin(TimeGenerated, 1h), source_ip, end_user_id, session_id, Computer, application_name, security_check_passed
| where count_ >= threshold

What the SOC sees:

- Severity based on violation count
- Time span showing if it's persistent vs. isolated
- Prompt samples (first 80 chars) for context
- Session ID for conversation history review

Severity: High (these are actual harmful content attempts)

Rule 3: Rate Limit Abuse

Why it matters: persistent rate limit violations indicate automated attacks, credential stuffing, or attempts to overwhelm the system. Legitimate users who hit rate limits don't retry 10+ times in minutes.

What to detect: callers repeatedly throttled by the service — 5+ rate-limit errors (HTTP 429) within the hourly bin.

let timeframe = 1h;
let threshold = 5;
AzureDiagnostics
| where TimeGenerated >= ago(timeframe)
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where OperationName == "Completions" or OperationName contains "ChatCompletions"
| extend tokensUsed = todouble(parse_json(properties_s).usage.total_tokens)
| summarize totalTokens = sum(tokensUsed), requests = count(), rateLimitErrors = countif(httpstatuscode_s == "429") by bin(TimeGenerated, 1h)
| where rateLimitErrors >= threshold

What the SOC sees:

- Whether it's a bot (immediate retries) or human (gradual retries)
- Duration of attack
- Which application is targeted
- Correlation with other security events from same user/IP

Severity: Medium (nuisance attack, possible reconnaissance)

Rule 4: Anomalous Source IP for User

Why it matters: a user suddenly accessing from a new country or VPN could indicate account compromise. This is especially critical for privileged accounts or after-hours access.

What to detect: a user accessing from an IP never seen in the last 7 days.
let lookback = 7d;
let recent = 1h;
let baseline = IdentityLogonEvents
| where Timestamp between (ago(lookback + recent) .. ago(recent))
| where isnotempty(IPAddress)
| summarize knownIPs = make_set(IPAddress) by AccountUpn;
ContainerLogV2
| where TimeGenerated >= ago(recent)
| where isnotempty(LogMessage.source_ip)
| extend source_ip = tostring(LogMessage.source_ip)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend session_id = tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| extend full_prompt_sample = tostring(LogMessage.full_prompt_sample)
| lookup baseline on $left.end_user_id == $right.AccountUpn
| where isnull(knownIPs) or not(set_has_element(knownIPs, source_ip))
| project TimeGenerated, source_ip, end_user_id, session_id, Computer, application_name, security_check_passed, full_prompt_sample

What the SOC sees:

- User identity and new IP address
- Geographic location change
- Whether suspicious prompts accompanied the new IP
- Timing (after-hours access is higher risk)

Severity: Medium (environment compromise, reconnaissance)

Rule 5: Coordinated Attack - Same Prompt from Multiple Users

Why it matters: when 5+ users send identical prompts, it indicates a bot network, credential stuffing, or an organized attack campaign. This is not normal user behavior.

What to detect: the same prompt hash from 5+ different users within 1 hour.

let timeframe = 1h;
let threshold = 5;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.prompt_hash)
| where isnotempty(LogMessage.end_user_id)
| extend source_ip = tostring(LogMessage.source_ip)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend prompt_hash = tostring(LogMessage.prompt_hash)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| project TimeGenerated, prompt_hash, source_ip, end_user_id, application_name, security_check_passed
| summarize DistinctUsers = dcount(end_user_id), Attempts = count(), Users = make_set(end_user_id, 100), IpAddress = make_set(source_ip, 100) by prompt_hash, bin(TimeGenerated, 1h)
| where DistinctUsers >= threshold

What the SOC sees:

- Attack pattern (single attacker with stolen accounts vs. botnet)
- List of compromised user accounts
- Source IPs for blocking
- Prompt sample to understand attack goal

Severity: High (indicates organized attack)

Rule 6: Malicious model detected

Why it matters: model serialization attacks can lead to serious compromise. When Defender for Cloud Model Scanning identifies issues with a custom or open-source model that is part of an Azure ML Workspace, Registry, or hosted in Foundry, that may or may not be a user oversight.

What to detect: model scan results from Defender for Cloud and whether the model is being actively used.

What the SOC sees:

- Malicious model
- Applications leveraging the model
- Source IPs and users that accessed the model

Severity: Medium (can be user oversight)
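The article does not ship a query for Rule 6, so here is one possible starting point. It is only a sketch: the title filter and evidence fields are assumptions, and the actual detection source for model-scan findings may differ in your tenant.

// Surface alerts that reference model scanning and pull the related resource/IP evidence
AlertInfo
| where Title has_any ("malicious model", "model scan")
| join kind=inner (
    AlertEvidence
    | project AlertId, CloudResource, RemoteIP
) on AlertId
| project TimeGenerated, AlertId, Title, Severity, CloudResource, RemoteIP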
Advanced Correlation: Connecting the Dots

The power of Sentinel is correlating your application logs with other security signals. Here are the most valuable correlations:

Correlation 1: Failed GenAI Requests + Failed Sign-Ins = Compromised Account

Why: an account showing both authentication failures and malicious AI prompts within a 1 hour timeframe is likely compromised.

let timeframe = 1h;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.source_ip)
| extend source_ip = tostring(LogMessage.source_ip)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend session_id = tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| extend full_prompt_sample = tostring(LogMessage.full_prompt_sample)
| extend prompt_hash = tostring(LogMessage.prompt_hash)
| extend message = tostring(LogMessage.message)
| where security_check_passed == "FAIL" or message contains "WARNING"
| join kind=inner (
    SigninLogs
    | where ResultType != 0 // 0 means success, non-zero indicates failure
    | project TimeGenerated, UserPrincipalName, ResultType, ResultDescription, IPAddress, Location, AppDisplayName
) on $left.end_user_id == $right.UserPrincipalName
| project TimeGenerated, source_ip, end_user_id, application_name, full_prompt_sample, prompt_hash, message, security_check_passed

Severity: High (high probability of compromise)

Correlation 2: Application Logs + Defender for Cloud AI Alerts

Why: Defender for Cloud AI Threat Protection detects platform-level threats (unusual API patterns, data exfiltration attempts). When both your code and the platform flag the same user, confidence is very high.

let timeframe = 1h;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.source_ip)
| extend source_ip = tostring(LogMessage.source_ip)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend session_id = tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| extend full_prompt_sample = tostring(LogMessage.full_prompt_sample)
| extend message = tostring(LogMessage.message)
| where security_check_passed == "FAIL" or message contains "WARNING"
| join kind=inner (
    AlertEvidence
    | where TimeGenerated >= ago(timeframe) and AdditionalFields.Asset == "true"
    | where DetectionSource == "Microsoft Defender for AI Services"
    | project TimeGenerated, Title, CloudResource
) on $left.application_name == $right.CloudResource
| project TimeGenerated, application_name, end_user_id, source_ip, Title

Severity: Critical (multi-layer detection)

Correlation 3: Source IP + Threat Intelligence Feeds

Why: if requests come from known malicious IPs (C2 servers, VPN exit nodes used in attacks), treat them as high priority even if behavior seems normal.
// This rule correlates GenAI app activity with the Microsoft Threat Intelligence feed available in Sentinel and Microsoft XDR for malicious IP IOCs
let timeframe = 10m;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.source_ip)
| extend source_ip = tostring(LogMessage.source_ip)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend session_id = tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| extend full_prompt_sample = tostring(LogMessage.full_prompt_sample)
| join kind=inner (
    ThreatIntelIndicators
    | where IsActive == true
    | where ObservableKey startswith "ipv4-addr" or ObservableKey startswith "network-traffic"
    | project IndicatorIP = ObservableValue
) on $left.source_ip == $right.IndicatorIP
| project TimeGenerated, source_ip, end_user_id, application_name, full_prompt_sample, security_check_passed

Severity: High (known bad actor)

Workbooks: What Your SOC Needs to See

Executive Dashboard: GenAI Security Health

Purpose: leadership wants to know "Are we secure?" Answer with metrics.

Key visualizations:

Security Status Tiles (24 hours)
- Total Requests
- Success Rate
- Blocked Threats (self-detected + Content Safety + Threat Protection for AI)
- Rate Limit Violations
- Model Security Score (Red Team evaluation status of currently deployed model)

ContainerLogV2
| where TimeGenerated > ago(1d)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| summarize SuccessCount = countif(security_check_passed == "PASS"), FailedCount = countif(security_check_passed == "FAIL") by bin(TimeGenerated, 1h)
| extend TotalRequests = SuccessCount + FailedCount
| extend SuccessRate = todouble(SuccessCount) / todouble(TotalRequests) * 100
| order by SuccessRate

1. Trend Chart: Pass vs. Fail Over Time
- Shows if attack volume is increasing
- Identifies attack time windows
- Validates that defenses are working

ContainerLogV2
| where TimeGenerated > ago(14d)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| summarize SuccessCount = countif(security_check_passed == "PASS"), FailedCount = countif(security_check_passed == "FAIL") by bin(TimeGenerated, 1d)
| render timechart

2. Top Users by Security Events

Bar chart of users with most failures:

ContainerLogV2
| where TimeGenerated > ago(1d)
| where isnotempty(LogMessage.end_user_id)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| where security_check_passed == "FAIL"
| summarize FailureCount = count() by end_user_id
| top 20 by FailureCount
| render barchart

Applications with most failures:

ContainerLogV2
| where TimeGenerated > ago(1d)
| where isnotempty(LogMessage.application_name)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| where security_check_passed == "FAIL"
| summarize FailureCount = count() by application_name
| top 20 by FailureCount
| render barchart
3. Geographic Threat Map
- Where are attacks originating?
- Useful for geo-blocking decisions

ContainerLogV2
| where TimeGenerated > ago(1d)
| where isnotempty(LogMessage.source_ip)
| extend application_name = tostring(LogMessage.application_name)
| extend source_ip = tostring(LogMessage.source_ip)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| where security_check_passed == "FAIL"
| extend GeoInfo = geo_info_from_ip_address(source_ip)
| project source_ip, Country = GeoInfo.country, City = GeoInfo.city

Analyst Deep-Dive: User Behavior Analysis

Purpose: a SOC analyst investigating a specific user or session.

Key components:

1. User Activity Timeline

Every request from the user in time order:

ContainerLogV2
| where isnotempty(LogMessage.end_user_id)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend session_id = tostring(LogMessage.session_id)
| extend source_ip = tostring(LogMessage.source_ip)
| extend application_name = tostring(LogMessage.application_name)
| project TimeGenerated, source_ip, end_user_id, session_id, Computer, application_name, LogMessage.request_id, LogMessage.message, LogMessage.full_prompt_sample
| order by end_user_id, TimeGenerated

Color-coded by security status:

AlertInfo
| where DetectionSource == "Microsoft Defender for AI Services"
| project TimeGenerated, AlertId, Title, Category, Severity, SeverityColor = case(
    Severity == "High", "🔴 High",
    Severity == "Medium", "🟠 Medium",
    Severity == "Low", "🟢 Low",
    "⚪ Unknown")

2. Session Analysis Table

All sessions for the user:

ContainerLogV2
| where TimeGenerated > ago(1d)
| where isnotempty(LogMessage.end_user_id)
| extend end_user_id = tostring(LogMessage.end_user_id)
| where end_user_id == "<username>" // Replace with actual username
| extend application_name = tostring(LogMessage.application_name)
| extend source_ip = tostring(LogMessage.source_ip)
| extend session_id = tostring(LogMessage.session_id)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| project TimeGenerated, session_id, end_user_id, application_name, security_check_passed

Failed requests per session:

ContainerLogV2
| where TimeGenerated > ago(1d)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| where security_check_passed == "FAIL"
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend session_id = tostring(LogMessage.session_id)
| summarize Failed_Sessions = count() by end_user_id, session_id
| order by Failed_Sessions

Session duration:

ContainerLogV2
| where TimeGenerated > ago(1d)
| where isnotempty(LogMessage.session_id)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| where security_check_passed == "PASS"
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend session_id = tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend source_ip = tostring(LogMessage.source_ip)
| summarize Start = min(TimeGenerated), End = max(TimeGenerated), count() by end_user_id, session_id, source_ip, application_name
| extend DurationSeconds = datetime_diff("second", End, Start)
3. Prompt Pattern Detection
- Unique prompts by hash
- Frequency of each pattern
- Detect if user is fuzzing/testing boundaries

Sample query for user investigation:

ContainerLogV2
| where TimeGenerated > ago(14d)
| where isnotempty(LogMessage.prompt_hash)
| where isnotempty(LogMessage.full_prompt_sample)
| extend prompt_hash = tostring(LogMessage.prompt_hash)
| extend full_prompt_sample = tostring(LogMessage.full_prompt_sample)
| extend application_name = tostring(LogMessage.application_name)
| summarize count() by prompt_hash, full_prompt_sample
| order by count_

Threat Hunting Dashboard: Proactive Detection

Purpose: find threats before they trigger alerts.

Key queries:

1. Suspicious Keywords in Prompts (e.g. Ignore, Disregard, system prompt, instructions, DAN, jailbreak, pretend, roleplay)

let suspicious_prompts = externaldata (content_policy:int, content_policy_name:string, q_id:int, question:string)
[ @"https://raw.githubusercontent.com/verazuo/jailbreak_llms/refs/heads/main/data/forbidden_question/forbidden_question_set.csv"]
with (format="csv", has_header_row=true, ignoreFirstRecord=true);
ContainerLogV2
| where TimeGenerated > ago(14d)
| where isnotempty(LogMessage.full_prompt_sample)
| extend full_prompt_sample = tostring(LogMessage.full_prompt_sample)
| where full_prompt_sample in ((suspicious_prompts | project question))
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend session_id = tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend source_ip = tostring(LogMessage.source_ip)
| project TimeGenerated, session_id, end_user_id, source_ip, application_name, full_prompt_sample

2. High-Volume Anomalies

Too many requests from a single IP or user (assuming Foundry projects are configured to use Azure AD and not API keys):

// 50+ requests in 1 hour
let timeframe = 1h;
let threshold = 50;
AzureDiagnostics
| where TimeGenerated >= ago(timeframe)
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where OperationName == "Completions" or OperationName contains "ChatCompletions"
| extend tokensUsed = todouble(parse_json(properties_s).usage.total_tokens)
| summarize totalTokens = sum(tokensUsed), requests = count() by bin(TimeGenerated, 1h), CallerIPAddress
| where requests >= threshold
3. Rare Failures (Novel Attack Detection)

Rare failures might indicate zero-day prompts or new attack techniques.

// 10 or more failures in 24 hours
ContainerLogV2
| where TimeGenerated >= ago(24h)
| where isnotempty(LogMessage.security_check_passed)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| where security_check_passed == "FAIL"
| extend application_name = tostring(LogMessage.application_name)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend source_ip = tostring(LogMessage.source_ip)
| summarize FailedAttempts = count(), FirstAttempt = min(TimeGenerated), LastAttempt = max(TimeGenerated) by application_name
| extend DurationHours = datetime_diff('hour', LastAttempt, FirstAttempt)
| where FailedAttempts >= 10
| project application_name, FirstAttempt, LastAttempt, DurationHours, FailedAttempts

Measuring Success: Security Operations Metrics

Key Performance Indicators

Mean Time to Detect (MTTD):

let AppLog = ContainerLogV2
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| extend session_id = tostring(LogMessage.session_id)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend source_ip = tostring(LogMessage.source_ip)
| where security_check_passed == "FAIL"
| summarize FirstLogTime = min(TimeGenerated) by application_name, session_id, end_user_id, source_ip;
let Alert = AlertEvidence
| where DetectionSource == "Microsoft Defender for AI Services"
| extend end_user_id = tostring(AdditionalFields.AadUserId)
| extend source_ip = RemoteIP
| extend application_name = CloudResource
| summarize FirstAlertTime = min(TimeGenerated) by AlertId, Title, application_name, end_user_id, source_ip;
AppLog
| join kind=inner (Alert) on application_name, end_user_id, source_ip
| extend DetectionDelayMinutes = datetime_diff('minute', FirstAlertTime, FirstLogTime)
| summarize MTTD_Minutes = round(avg(DetectionDelayMinutes), 2) by AlertId, Title

Target: <= 15 minutes from first malicious activity to alert

Mean Time to Respond (MTTR):

SecurityIncident
| where Status in ("New", "Active")
| where CreatedTime >= ago(14d)
| extend ResponseDelay = datetime_diff('minute', LastActivityTime, FirstActivityTime)
| summarize MTTR_Minutes = round(avg(ResponseDelay), 2) by CreatedTime, IncidentNumber
| order by CreatedTime, IncidentNumber asc

Target: < 4 hours from alert to remediation

Threat Detection Rate:

ContainerLogV2
| where TimeGenerated > ago(1d)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| summarize SuccessCount = countif(security_check_passed == "PASS"), FailedCount = countif(security_check_passed == "FAIL") by bin(TimeGenerated, 1h)
| extend TotalRequests = SuccessCount + FailedCount
| extend SuccessRate = todouble(SuccessCount) / todouble(TotalRequests) * 100
| order by SuccessRate

Context: a 1-3% failure rate is typical for production systems (most traffic is legitimate).

What You've Built

By implementing the logging from Part 2 and the analytics rules in this post, your SOC now has:

✅ Real-time threat detection - alerts fire within minutes of malicious activity
✅ User attribution - every incident has identity, IP, and application context
✅ Pattern recognition - detect both volume-based and behavior-based attacks
✅ Correlation across layers - application logs + platform alerts + identity signals
✅ Proactive hunting - dashboards for finding threats before they trigger rules
✅ Executive visibility - metrics showing program effectiveness
Key Takeaways

1. GenAI threats need GenAI-specific analytics - generic rules miss context like prompt injection, content safety violations, and session-based attacks.
2. Correlation is critical - the most sophisticated attacks span multiple signals. Correlating app logs with identity and platform alerts catches what individual rules miss.
3. User context from Part 2 pays off - end_user_id, source_ip, and session_id enable investigation and response at scale.
4. Prompt hashing enables pattern detection - detect repeated attacks without storing sensitive prompt content.
5. Workbooks serve different audiences - executives want metrics; analysts want investigation tools; hunters want anomaly detection.
6. Start with high-fidelity rules - Content Safety violations and rate limit abuse have very low false positive rates. Add behavioral rules after establishing baselines.

What's Next: Closing the Loop

You've now built detection and visibility. In Part 4, we'll close the security operations loop with:

Part 4: Platform Integration and Automated Response
- Building SOAR playbooks for automated incident response
- Implementing automated key rotation with Azure Key Vault
- Blocking identities in Entra
- Creating feedback loops from incidents to code improvements

The journey from blind spot to full security operations capability is almost complete.

Previous:
Part 1: Securing GenAI Workloads in Azure: A Complete Guide to Monitoring and Threat Protection - AIO11Y | Microsoft Community Hub
Part 2: Building Security Observability Into Your Code - Defensive Programming for Azure OpenAI | Microsoft Community Hub

Next: Part 4: Platform Integration and Automated Response (Coming soon)
PAAS resource metrics using Azure Data Collection Rule to Log Analytics Workspace

Hi Team, I want to build a use case that pulls Azure PaaS resource metrics using an Azure DCR and pushes those metrics to a Log Analytics workspace, which will eventually stream the data to Azure Event Hub, with Azure Postgres as the final destination to store all resource metrics in a centralized table and build KPIs and dashboards so clients can better utilize their resources. I have not used the diagnostic settings option, since it has its cons: each resource's settings must be enabled manually, and the information extracted from diagnostic settings is limited. But while implementing, I saw multiple articles stating that DCR is not used for pulling PaaS metrics and is only compatible with VM metrics. I want to understand: is it possible to use DCR for PaaS metrics? Thanks in advance for any inputs.
Azure Active Directory | Workbooks | Sign-In Analysis (Preview: AAD & AD FS)

This workbook will help you analyze your organization's sign-ins for both Azure AD and AD FS. It covers General Analysis and Error Analysis.

General Analysis:
- Sign-in Activity Summary
- Sign-in Analysis by Location
- Sign-in Analysis by Device

Error Analysis:
- Sign-in Activity Summary
- Top Sign-In Errors by User or IP
Creating alerts for custom errors with auditing and Log Analytics

Errors are an inherent part of any application. As a database professional managing an Azure SQL Managed Instance, you may be interested in understanding when specific errors occur and how to leverage user-generated errors to respond swiftly when particular scenarios arise. In this post we will see how to set up alerts for scenarios like query blocking and long-running open transactions. This extends what is described in How to setup alerts for deadlocks using Log Analytics | Microsoft Community Hub.

Step 1 - Setup Log Analytics and auditing

Follow steps 1, 2, 3 and 4 described on How to setup alerts for deadlocks using Log Analytics | Microsoft Community Hub.

Step 2 - Create a table to save the details of the blocking chain / long open transaction

Getting an alert by itself is not useful if you don't have a way of getting details for later analysis. Create the table on a database of your choice.

If you are interested in blocking:

CREATE TABLE [dbo].[blocking_report] (
    [DateTime] [datetime] NULL,
    [HeadBlocker] [varchar](1) NOT NULL,
    [SessionID] [smallint] NOT NULL,
    [Login] [nvarchar](128) NOT NULL,
    [Database] [nvarchar](128) NULL,
    [BlockedBy] [smallint] NULL,
    [OpenTransactions] [int] NULL,
    [Status] [nvarchar](30) NOT NULL,
    [WaitType] [nvarchar](60) NULL,
    [WaitTime_ms] [bigint] NULL,
    [WaitResource] [nvarchar](256) NULL,
    [WaitResourceDesc] [nvarchar](3072) NULL,
    [Command] [nvarchar](32) NULL,
    [Application] [nvarchar](128) NULL,
    [TotalCPU_ms] [int] NOT NULL,
    [TotalPhysicalIO_MB] [bigint] NULL,
    [MemoryUse_KB] [int] NULL,
    [LoginTime] [datetime] NOT NULL,
    [LastRequestStartTime] [datetime] NOT NULL,
    [HostName] [nvarchar](128) NULL,
    [QueryHash] [binary](8) NULL,
    [BlockerQuery_or_MostRecentQuery] [nvarchar](max) NULL
)

If you are interested in open transactions:

CREATE TABLE [dbo].[opentransactions](
    [CapturedTime] [datetime] NOT NULL,
    [tran_elapsed_time_seconds] [int] NULL,
    [transaction_begin_time] [datetime] NOT NULL,
    [session_id] [int] NOT NULL,
    [database_name] [nvarchar](128) NULL,
    [open_transaction_count] [int] NOT NULL,
    [host_name] [nvarchar](128) NULL,
    [program_name] [nvarchar](128) NULL,
    [login_name] [nvarchar](128) NULL,
    [status] [nvarchar](30) NULL,
    [text] [nvarchar](max) NULL
)

Step 3 - Create a SQL Agent job

Schedule the two SQL Agent jobs using the queries below. You might be interested in a schedule that runs every X seconds/minutes. Describing very briefly what each query does:

- if a query is being blocked for more than 60 seconds, save the blocking chain to a table and raise an error
- if a transaction has been open for more than 120 seconds, save the query details to a table and raise an error

Make the necessary adjustments according to your goals.
Query for blocking:

IF (
    SELECT count(*)
    FROM sys.dm_exec_requests
    WHERE wait_type LIKE 'LCK%' AND wait_time > 60000
) > 0 -- checks for queries waiting to obtain a lock for more than 60 seconds
BEGIN
    INSERT INTO database_name.dbo.blocking_report -- make sure that you change the database_name value
    SELECT (SELECT getdate())
        ,[HeadBlocker] = CASE WHEN r2.session_id IS NOT NULL AND (r.blocking_session_id = 0 OR r.session_id IS NULL) THEN '1' ELSE '' END
        ,[SessionID] = s.session_id
        ,[Login] = s.login_name
        ,[Database] = db_name(p.dbid)
        ,[BlockedBy] = w.blocking_session_id
        ,[OpenTransactions] = r.open_transaction_count
        ,[Status] = s.STATUS
        ,[WaitType] = w.wait_type
        ,[WaitTime_ms] = w.wait_duration_ms
        ,[WaitResource] = r.wait_resource
        ,[WaitResourceDesc] = w.resource_description
        ,[Command] = r.command
        ,[Application] = s.program_name
        ,[TotalCPU_ms] = s.cpu_time
        ,[TotalPhysicalIO_MB] = (s.reads + s.writes) * 8 / 1024
        ,[MemoryUse_KB] = s.memory_usage * 8192 / 1024
        ,[LoginTime] = s.login_time
        ,[LastRequestStartTime] = s.last_request_start_time
        ,[HostName] = s.host_name
        ,[QueryHash] = r.query_hash
        ,[BlockerQuery_or_MostRecentQuery] = txt.TEXT
    FROM sys.dm_exec_sessions s
    LEFT OUTER JOIN sys.dm_exec_connections c ON (s.session_id = c.session_id)
    LEFT OUTER JOIN sys.dm_exec_requests r ON (s.session_id = r.session_id)
    LEFT OUTER JOIN sys.dm_os_tasks t ON (r.session_id = t.session_id AND r.request_id = t.request_id)
    LEFT OUTER JOIN (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY waiting_task_address ORDER BY wait_duration_ms DESC) AS row_num
        FROM sys.dm_os_waiting_tasks
    ) w ON (t.task_address = w.waiting_task_address) AND w.row_num = 1
    LEFT OUTER JOIN sys.dm_exec_requests r2 ON (s.session_id = r2.blocking_session_id)
    LEFT OUTER JOIN sys.sysprocesses p ON (s.session_id = p.spid)
    OUTER APPLY sys.dm_exec_sql_text(ISNULL(r.[sql_handle], c.most_recent_sql_handle)) AS txt
    WHERE s.is_user_process = 1
        AND (r2.session_id IS NOT NULL AND (r.blocking_session_id = 0 OR r.session_id IS NULL))
        OR blocked > 0
    ORDER BY [HeadBlocker] DESC, s.session_id;

    THROW 50000, 'There are queries being blocked for more than 60 seconds', 1;
END

Query for open transactions:

DECLARE @count INT = (
    SELECT count(*)
    FROM sys.dm_tran_active_transactions at
    INNER JOIN sys.dm_tran_session_transactions st ON st.transaction_id = at.transaction_id
    LEFT OUTER JOIN sys.dm_exec_sessions sess ON st.session_id = sess.session_id
    LEFT OUTER JOIN sys.dm_exec_connections conn ON conn.session_id = sess.session_id
    OUTER APPLY sys.dm_exec_sql_text(conn.most_recent_sql_handle) AS txt
    WHERE DATEDIFF(SECOND, transaction_begin_time, GETDATE()) > 120 -- 120 seconds
)
IF @count > 0
BEGIN
    INSERT INTO database_name.dbo.opentransactions -- change database_name to where the table was created
    SELECT GETDATE() AS CapturedTime
        ,DATEDIFF(SECOND, transaction_begin_time, GETDATE()) AS tran_elapsed_time_seconds
        ,at.transaction_begin_time
        ,st.session_id
        ,DB_NAME(sess.database_id) AS database_name
        ,st.open_transaction_count
        ,sess.host_name
        ,sess.program_name
        ,sess.login_name
        ,sess.STATUS
        ,txt.TEXT
    FROM sys.dm_tran_active_transactions at
    INNER JOIN sys.dm_tran_session_transactions st ON st.transaction_id = at.transaction_id
    LEFT OUTER JOIN sys.dm_exec_sessions sess ON st.session_id = sess.session_id
    LEFT OUTER JOIN sys.dm_exec_connections conn ON conn.session_id = sess.session_id
    OUTER APPLY sys.dm_exec_sql_text(conn.most_recent_sql_handle) AS txt
    WHERE DATEDIFF(SECOND, transaction_begin_time, GETDATE()) > 120 -- 120 seconds
    ORDER BY tran_elapsed_time_seconds DESC;
    THROW 50000, 'There are open transactions for more than 120 seconds', 1;
END

Step 4 - Create the alert based on a Log Analytics query

Just as described on steps 6 and 7 of How to setup alerts for deadlocks using Log Analytics | Microsoft Community Hub, you can create the two separate alerts using the queries below.

For blocking:

AzureDiagnostics
| where TimeGenerated > ago(15m) // last 15 minutes
| where LogicalServerName_s == "server_name" // server name
| where Category == "SQLSecurityAuditEvents"
| where additional_information_s contains "There are queries being blocked for more"
| project TimeGenerated, LogicalServerName_s, database_name_s, additional_information_s

For open transactions:

AzureDiagnostics
| where TimeGenerated > ago(15m) // last 15 minutes
| where LogicalServerName_s == "server_name" // server name
| where Category == "SQLSecurityAuditEvents"
| where additional_information_s contains "There are open transactions for more than"
| project TimeGenerated, LogicalServerName_s, database_name_s, additional_information_s
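Step 3 assumes the checks run on a schedule. For completeness, here is a minimal sketch of registering one of the checks as a SQL Agent job that runs every 5 minutes; the job name and the wrapper procedure usp_check_blocking are hypothetical placeholders for the blocking query above.

USE msdb;
EXEC dbo.sp_add_job @job_name = N'Monitor - Blocking check';
EXEC dbo.sp_add_jobstep @job_name = N'Monitor - Blocking check',
    @step_name = N'Check blocking', @subsystem = N'TSQL',
    @database_name = N'master', @command = N'EXEC dbo.usp_check_blocking;';
EXEC dbo.sp_add_jobschedule @job_name = N'Monitor - Blocking check',
    @name = N'Every 5 minutes', @freq_type = 4, @freq_interval = 1,
    @freq_subday_type = 4, @freq_subday_interval = 5;
EXEC dbo.sp_add_jobserver @job_name = N'Monitor - Blocking check';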
How to setup alerts for deadlocks using Log Analytics

Managed Instance diagnostic events do not support sending deadlock information to Log Analytics. However, through auditing, it's possible to query failed queries along with their reported error messages, although the deadlock XML itself is not included. We will see how we can send this information to Log Analytics and set up an alert for when a deadlock occurs.

Step 1 - Deploy Log Analytics

Create a Log Analytics workspace if you currently don't have one: Create a Log Analytics workspace

Step 2 - Add diagnostic setting

On the Azure portal, open the Diagnostic settings of your Azure SQL Managed Instance and choose Add diagnostic setting. Select SQL Security Audit Event and choose your Log Analytics workspace as the destination.

Step 3 - Create a server audit on the Azure SQL Managed Instance

Run the query below on the Managed Instance. Rename the server audit and server audit specification to a name of your choice.

CREATE SERVER AUDIT [audittest] TO EXTERNAL_MONITOR
GO

-- we are adding login auditing, but only BATCH_COMPLETED_GROUP is necessary for query execution
CREATE SERVER AUDIT SPECIFICATION audit_server
FOR SERVER AUDIT audittest
    ADD (SUCCESSFUL_LOGIN_GROUP),
    ADD (BATCH_COMPLETED_GROUP),
    ADD (FAILED_LOGIN_GROUP)
WITH (STATE = ON)
GO

ALTER SERVER AUDIT [audittest] WITH (STATE = ON)
GO

Step 4 - Check events on Log Analytics

It may take some time for records to begin appearing in Log Analytics. Open your Log Analytics workspace and choose Logs. To verify that data is being ingested, run the following query and wait until you start getting the first results. Make sure that you change servername to your Azure SQL Managed Instance name.

AzureDiagnostics
| where LogicalServerName_s == "servername"
| where Category == "SQLSecurityAuditEvents"
| take 10

Step 5 - (Optional) Create a deadlock event for testing

Create a deadlock scenario so you can see a record in Log Analytics. For example: open SSMS and a new query window under the context of a user database (you can create a test database just for this test). Create a table in the user database and insert 10 records:

create table tb1 (id int identity(1,1) primary key clustered, col1 varchar(30))
go
insert into tb1 values ('aaaaaaa')
go 10

You can close the query window or reuse it for the next step.

Open a new query window (or reuse the first one) and run the following, leaving the window open after executing:

begin transaction
update tb1 set col1 = 'bbbb' where id = 1

Open a second query window and run the following, leaving the window open after executing:

begin transaction
update tb1 set col1 = 'bbbb' where id = 2

Go back to the first query window and run (the query will be blocked and stay executing):

update tb1 set col1 = 'bbbb' where id = 2

Go back to the second query window and run (this transaction will be the deadlock victim):

update tb1 set col1 = 'bbbb' where id = 1

You can rollback and close all windows after the deadlock exception.
Step 6 - (Optional) Check the deadlock exception on Log Analytics

Note: the record can take some minutes to appear in Log Analytics. Use the query below to obtain the deadlock events for the last hour (we are looking for error 1205). Make sure that you change servername to your Azure SQL Managed Instance name.

AzureDiagnostics
| where TimeGenerated > ago(1h)
| where LogicalServerName_s == "servername"
| where Category == "SQLSecurityAuditEvents"
| where succeeded_s == "false"
| where additional_information_s contains "Err 1205, Level 13"

Step 7 - Use the query to create an alert

Use the same query to create an alert in Azure Log Analytics. Make sure that you change servername to your Azure SQL Managed Instance name. The query checks for deadlocks that occurred in the previous hour.

AzureDiagnostics
| where TimeGenerated > ago(1h)
| where LogicalServerName_s == "servername"
| where Category == "SQLSecurityAuditEvents"
| where succeeded_s == "false"
| where additional_information_s contains "Err 1205, Level 13"

Run the query, click on New alert rule, and create the alert with the desired settings.
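Before deciding on alert thresholds, it can help to see how frequently deadlocks actually occur and in which databases. A minimal sketch along the same lines, assuming database_name_s is populated for these audit records as it is in the blocking alert queries earlier on this page:

AzureDiagnostics
| where TimeGenerated > ago(7d) //look back one week
| where LogicalServerName_s == "servername" //server name
| where Category == "SQLSecurityAuditEvents"
| where succeeded_s == "false"
| where additional_information_s contains "Err 1205, Level 13"
| summarize Deadlocks = count() by bin(TimeGenerated, 1h), database_name_s
| sort by TimeGenerated desc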
Optimizing Microsoft Sentinel: Resolving AMA-Induced Syslog & CEF Duplicates

2) Recommended Solutions

When collecting both Syslog and CEF logs from the same Linux collector using the Azure Monitor Agent (AMA) in Microsoft Sentinel, duplicate log entries can occur. These duplicates arise because the same event may be ingested through both the Syslog and CEF pipelines, leading to redundancy in the Log Analytics Workspace (LAW).

The following solutions aim to eliminate or reduce duplicate log ingestion, ensuring that:
- CEF events are parsed correctly and only once.
- Syslog data remains clean and non-redundant.
- Storage and analytics efficiency is improved.
- Alerting and incident investigation are not skewed by duplicate entries.

Each option provides a different strategy based on your environment's flexibility and configuration capabilities, from facility-level separation, to ingestion-time filtering, to daemon-side log routing.

Option 1: Facility Separation (Preferred)

Configure devices to emit CEF logs on a dedicated facility (for example, 'local4'), and adjust the Data Collection Rules (DCRs) so that the CEF stream includes only that facility, while the Syslog stream excludes it. This ensures CEF events are parsed once into 'CommonSecurityLog' and never land in 'Syslog'.

CEF via AMA DCR (include only CEF facility):

{
  "properties": {
    "dataSources": {
      "syslog": [
        {
          "streams": ["Microsoft-CommonSecurityLog"],
          "facilityNames": ["local4"],
          "logLevels": ["*"],
          "name": "cefDataSource"
        }
      ]
    },
    "dataFlows": [
      {
        "streams": ["Microsoft-CommonSecurityLog"],
        "destinations": ["laDest"]
      }
    ]
  }
}

Syslog via AMA DCR (exclude CEF facility):

{
  "properties": {
    "dataSources": {
      "syslog": [
        {
          "streams": ["Microsoft-Syslog"],
          "facilityNames": [
            "auth","authpriv","cron","daemon","kern","mail",
            "syslog","user","local0","local1","local2","local3",
            "local5","local6","local7"
          ],
          "logLevels": ["*"],
          "name": "syslogDataSource"
        }
      ]
    },
    "dataFlows": [
      {
        "streams": ["Microsoft-Syslog"],
        "destinations": ["laDest"]
      }
    ]
  }
}

Option 2: Ingest-time Transform (Drop CEF from Syslog)

If facility separation is not feasible, apply a transformation to the Syslog stream in the DCR so that any CEF-formatted messages are dropped during ingestion.

Syslog stream transformKql:

{
  "properties": {
    "dataFlows": [
      {
        "streams": ["Microsoft-Syslog"],
        "transformKql": "source | where not(SyslogMessage startswith 'CEF:')",
        "destinations": ["laDest"]
      }
    ]
  }
}

Option 3: Daemon-side Filtering/Rewriting (rsyslog/syslog-ng)

Filter or rewrite CEF messages before AMA sees them. For example, route CEF messages to a dedicated facility using syslog-ng and stop further processing:

# Match CEF
filter f_cef { message("^CEF:"); };

# Send CEF to local5 and stop further processing
log {
    source(s_src);
    filter(f_cef);
    rewrite { set_facility(local5); };
    destination(d_azure_mdsd);
    flags(final);
};

3) Verification Steps with KQL Queries

Detect CEF messages that leaked into Syslog:

Syslog
| where TimeGenerated > ago(1d)
| where SyslogMessage startswith "CEF:"
| summarize count() by Computer
| order by count_ desc

Estimate duplicate count across Syslog and CommonSecurityLog:

let sys = Syslog
| where TimeGenerated > ago(1d)
| where SyslogMessage startswith "CEF:"
| extend key = hash_sha256(SyslogMessage);
let cef = CommonSecurityLog
| where TimeGenerated > ago(1d)
| extend key = hash_sha256(RawEvent);
cef
| join kind=innerunique (sys) on key
| summarize duplicates = count()

Note: You should identify the RawEvent that might be causing the duplicates.
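To follow up on that note, the same join can project the matching rows instead of counting them, so the duplicated raw events can be inspected directly. A minimal sketch reusing the hashing approach from the verification query above (the projected columns are standard CommonSecurityLog fields; adjust as needed):

let sys = Syslog
| where TimeGenerated > ago(1d)
| where SyslogMessage startswith "CEF:"
| extend key = hash_sha256(SyslogMessage);
let cef = CommonSecurityLog
| where TimeGenerated > ago(1d)
| extend key = hash_sha256(RawEvent);
cef
| join kind=innerunique (sys) on key
| project TimeGenerated, Computer, DeviceVendor, DeviceProduct, RawEvent //sample of duplicated events
| take 20

DeviceVendor and DeviceProduct help attribute the duplicates to a specific appliance before deciding which facility it should be moved to.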
3.1) Duplicate Detection Query Explained

This query helps quantify duplicate ingestion when both the Syslog and CEF connectors ingest the same events. It works as follows:
- Build the Syslog set (sys): filter the 'Syslog' table for the last day and keep only messages that start with 'CEF:'. Compute a SHA-256 hash of the entire message as a stable join key ("key").
- Build the CEF set (cef): filter the 'CommonSecurityLog' table for the last day and compute a SHA-256 hash of the 'RawEvent' field as the same-style join key.
- Join on the key: use 'join kind=innerunique' to find messages that exist in both sets (i.e., duplicates).
- Summarize: count the number of matching rows to get a duplicate total.

4) Common Pitfalls
- Overlapping DCRs applied to the same collector VM, causing overlapping facilities/severities.
- CEF and Syslog using the same facility on sources, leading to ingestion on both streams.
- rsyslog/syslog-ng filters placed after AMA's own configuration include (ensure your custom rules run before '10-azuremonitoragent.conf').

5) References
- Microsoft Learn: Ingest syslog and CEF messages to Microsoft Sentinel with AMA (https://learn.microsoft.com/en-us/azure/sentinel/connect-cef-syslog-ama)
Hunting for MFA manipulations in Entra ID tenants using KQL

The following article, Hunting for MFA manipulations in Entra ID tenants using KQL, proved to be an invaluable resource in my search for an automated way to notify users of MFA modifications. I've adapted the KQL query to function within Defender Advanced Hunting or Azure Entra; my objective is to establish an alert that directly e-mails the affected user, informing them of the MFA change and advising them to contact security if they did not initiate it. While the query runs correctly under Defender Advanced Hunting, I'm currently unable to create a workable custom alert because no "ReportId" is being captured. Despite consulting with Copilot, Gemini, CDW Support, and Microsoft Support, no workable solution has been found. Any insight would be greatly appreciated - Thank You!

//Advanced Hunting query to parse modified:
//StrongAuthenticationUserDetails (SAUD)
//StrongAuthenticationMethod (SAM)
let SearchWindow = 1h;
let AuthenticationMethods = dynamic(["TwoWayVoiceMobile","TwoWaySms","TwoWayVoiceOffice","TwoWayVoiceOtherMobile","TwoWaySmsOtherMobile","OneWaySms","PhoneAppNotification","PhoneAppOTP"]);
let AuthenticationMethodChanges = CloudAppEvents
| where ActionType == "Update user." and RawEventData contains "StrongAuthenticationMethod"
| extend Target = tostring(RawEventData.ObjectId)
| extend Actor = tostring(RawEventData.UserId)
| mv-expand ModifiedProperties = parse_json(RawEventData.ModifiedProperties)
| where ModifiedProperties.Name == "StrongAuthenticationMethod"
| project Timestamp, Actor, Target, ModifiedProperties, RawEventData, ReportId;
let OldValues = AuthenticationMethodChanges
| extend OldValue = parse_json(tostring(ModifiedProperties.OldValue))
| mv-apply OldValue on (
    extend Old_MethodType = tostring(OldValue.MethodType), Old_Default = tostring(OldValue.Default)
    | sort by Old_MethodType
    );
let NewValues = AuthenticationMethodChanges
| extend NewValue = parse_json(tostring(ModifiedProperties.NewValue))
| mv-apply NewValue on (
    extend New_MethodType = tostring(NewValue.MethodType), New_Default = tostring(NewValue.Default)
    | sort by New_MethodType
    );
let RemovedMethods = AuthenticationMethodChanges
| join kind=inner OldValues on ReportId
| join kind=leftouter NewValues on ReportId, $left.Old_MethodType == $right.New_MethodType
| where Old_MethodType != New_MethodType
| extend Action = strcat("Removed (", AuthenticationMethods[toint(Old_MethodType)], ") from Authentication Methods.")
| extend ChangedValue = "Method Removed";
let AddedMethods = AuthenticationMethodChanges
| join kind=inner NewValues on ReportId
| join kind=leftouter OldValues on ReportId, $left.New_MethodType == $right.Old_MethodType
| where Old_MethodType != New_MethodType
| extend Action = strcat("Added (", AuthenticationMethods[toint(New_MethodType)], ") as Authentication Method.")
| extend ChangedValue = "Method Added";
let DefaultMethodChanges = AuthenticationMethodChanges
| join kind=inner OldValues on ReportId
| join kind=inner NewValues on ReportId
| where Old_Default != New_Default and Old_MethodType == New_MethodType and New_Default == "true"
| join kind=inner OldValues on ReportId
| where Old_Default1 == "true" and Old_MethodType1 != New_MethodType
| extend Old_MethodType = Old_MethodType1
| extend Action = strcat("Default Authentication Method was changed to (", AuthenticationMethods[toint(New_MethodType)], ").")
| extend ChangedValue = "Default Method";
let AuthenticationMethodReport = union RemovedMethods, AddedMethods, DefaultMethodChanges
| project
    Timestamp,
    Action,
    Actor,
    Target,
    ChangedValue,
    OldValue = case(isempty(Old_MethodType), "", strcat(Old_MethodType, ": ", AuthenticationMethods[toint(Old_MethodType)])),
    NewValue = case(isempty(New_MethodType), "", strcat(New_MethodType, ": ", AuthenticationMethods[toint(New_MethodType)]));
let AuthenticationDetailsChanges = CloudAppEvents
| where ActionType == "Update user." and RawEventData contains "StrongAuthenticationUserDetails"
| extend Target = tostring(RawEventData.ObjectId)
| extend Actor = tostring(RawEventData.UserId)
| extend ReportId = tostring(RawEventData.ReportId)
| mv-expand ModifiedProperties = parse_json(RawEventData.ModifiedProperties)
| where ModifiedProperties.Name == "StrongAuthenticationUserDetails"
| extend NewValue = parse_json(replace_string(replace_string(tostring(ModifiedProperties.NewValue), "[", ""), "]", ""))
| extend OldValue = parse_json(replace_string(replace_string(tostring(ModifiedProperties.OldValue), "[", ""), "]", ""))
| mv-expand NewValue
| mv-expand OldValue
| where (tostring(bag_keys(OldValue)) == tostring(bag_keys(NewValue)))
    or (isempty(OldValue) and tostring(NewValue) !contains ":null")
    or (isempty(NewValue) and tostring(OldValue) !contains ":null")
| extend ChangedValue = tostring(bag_keys(NewValue)[0])
| extend OldValue = tostring(parse_json(OldValue)[ChangedValue])
| extend NewValue = tostring(parse_json(NewValue)[ChangedValue])
| extend OldValue = case(ChangedValue == "PhoneNumber" or ChangedValue == "AlternativePhoneNumber", replace_strings(OldValue, dynamic([' ','(',')']), dynamic(['','',''])), OldValue)
| extend NewValue = case(ChangedValue == "PhoneNumber" or ChangedValue == "AlternativePhoneNumber", replace_strings(NewValue, dynamic([' ','(',')']), dynamic(['','',''])), NewValue)
| where tostring(OldValue) != tostring(NewValue)
| extend Action = case(
    isempty(OldValue), strcat("Added new ", ChangedValue, " to Strong Authentication."),
    isempty(NewValue), strcat("Removed existing ", ChangedValue, " from Strong Authentication."),
    strcat("Changed ", ChangedValue, " in Strong Authentication."));
union AuthenticationMethodReport, AuthenticationDetailsChanges
| extend AccountUpn = Target
| where Timestamp > ago(SearchWindow)
//| summarize count() by Timestamp, Action, Actor, Target, ChangedValue, OldValue, NewValue, ReportId, AccountDisplayName, AccountId, AccountUpn
| summarize arg_max(Timestamp, *) by Action
| project Timestamp, Action, Actor, Target, ChangedValue, OldValue, NewValue, ReportId, AccountDisplayName, AccountId, AccountUpn
| sort by Timestamp desc
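For reference, Defender XDR custom detection rules require the query results to include Timestamp, ReportId, and at least one impacted-asset column (such as AccountUpn). The sketch below, assuming the let statements above are unchanged, is what I use to check whether rows reach the final projection with an empty ReportId:

union AuthenticationMethodReport, AuthenticationDetailsChanges
| extend AccountUpn = Target
| where Timestamp > ago(SearchWindow)
| summarize arg_max(Timestamp, *) by Action
| where isempty(ReportId) //rows surfacing here would prevent a custom detection from triggering
| project Timestamp, Action, Actor, Target, ReportId, AccountUpn

If every row lands here, the value is being lost upstream, for example in one of the joins or in the union.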