KQL
426 Topics

Understand New Sentinel Pricing Model with Sentinel Data Lake Tier
Introduction to Sentinel and Its New Pricing Model

Microsoft Sentinel is a cloud-native Security Information and Event Management (SIEM) and Security Orchestration, Automation, and Response (SOAR) platform that collects, analyzes, and correlates security data from across your environment to detect threats and automate response.

Traditionally, Sentinel stored all ingested data in the Analytics tier (Log Analytics workspace), which is powerful but expensive for high-volume logs. To reduce cost and let customers retain all security data without compromise, Microsoft introduced a dual-tier pricing model consisting of the Analytics tier and the Data Lake tier. The Analytics tier continues to support fast, real-time querying and analytics for core security scenarios, while the new Data Lake tier provides very low-cost storage for long-term retention and high-volume datasets. Customers can now choose where each data type lands (Analytics for high-value detections and investigations, Data Lake for large or archival data types), which lets organizations significantly lower cost while still retaining all their security data for analytics, compliance, and hunting.

The following flow diagram depicts the new Sentinel pricing model:

Let's walk through the new pricing model using the following scenarios:
- Scenario 1A (Pay-As-You-Go)
- Scenario 1B (Usage Commitment)
- Scenario 2 (Data Lake Tier Only)

Scenario 1A (Pay-As-You-Go)

Requirement
Suppose you need to ingest 10 GB of data per day and retain that data for 2 years. However, you will only frequently use, query, and analyze the data for the first 6 months.

Solution
To optimize cost, you can ingest the data into the Analytics tier and retain it there for the first 6 months, when active querying and investigation happen. The remaining 18 months of retention can then be shifted to the Data Lake tier, which provides low-cost storage for compliance and auditing needs.
However, querying and analytics on Data Lake tier data are charged separately; this is shown as Compute (D) in the pricing flow diagram.

Pricing Flow / Notes
- The first 10 GB/day ingested into the Analytics tier is free for 31 days under the Analytics logs plan.
- All data ingested into the Analytics tier is automatically mirrored to the Data Lake tier at no additional ingestion or retention cost.
- For the first 6 months, you pay only for Analytics tier ingestion and retention, excluding any free capacity.
- For the next 18 months, you pay only for Data Lake tier retention, which is significantly cheaper.

Azure Pricing Calculator Equivalent
Assuming no data is queried or analyzed during the 18-month Data Lake tier retention period:

Although the Analytics tier retention is set to 6 months, the first 3 months of retention fall under the free retention limit, so retention charges apply only to the remaining 3 months of the Analytics retention window. The Azure Pricing Calculator adjusts accordingly.

Scenario 1B (Usage Commitment)

Now suppose you are ingesting 100 GB per day. Under the same Pay-As-You-Go pricing model described above, your estimated cost would be approximately $15,204 per month. You can reduce this cost by choosing a Commitment Tier, where Analytics tier ingestion is billed at a discounted rate. Note that the discount applies only to Analytics tier ingestion; it does not apply to Analytics tier retention costs or to any Data Lake tier-related charges. Please refer to the pricing flow and the equivalent pricing calculator results shown below.

Monthly cost savings: $15,204 - $11,184 = $4,020 per month

What happens if your usage reaches 150 GB per day? Will the additional 50 GB be billed at the Pay-As-You-Go rate? No. The entire 150 GB/day is still billed at the discounted rate associated with the 100 GB/day commitment tier bucket.
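The Scenario 1B savings figure can be reproduced directly in a Log Analytics query window. This is only a minimal sketch: the two monthly amounts are the estimates quoted above, and the column names (PayAsYouGoUsd, CommitmentTierUsd, MonthlySavingsUsd, SavingsPercent) are made up for illustration. Actual prices vary by region and change over time, so substitute values from the Azure Pricing Calculator.

```kusto
// Monthly estimates quoted above for 100 GB/day (illustrative values, not a price list).
print PayAsYouGoUsd = 15204.0, CommitmentTierUsd = 11184.0
// Absolute saving from moving Analytics tier ingestion to the commitment tier.
| extend MonthlySavingsUsd = PayAsYouGoUsd - CommitmentTierUsd
// The same saving expressed as a percentage of the Pay-As-You-Go estimate.
| extend SavingsPercent = round(100.0 * MonthlySavingsUsd / PayAsYouGoUsd, 1)
```

With the quoted figures, MonthlySavingsUsd comes out at 4,020, matching the calculation above.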
Azure Pricing Calculator Equivalent (100 GB/day)

Azure Pricing Calculator Equivalent (150 GB/day)

Scenario 2 (Data Lake Tier Only)

Requirement
Suppose you need to store certain audit or compliance logs amounting to 10 GB per day. These logs are not used for querying, analytics, or investigations on a regular basis, but must be retained for 2 years under your organization's compliance or forensic policies.

Solution
Since these logs are not actively analyzed, avoid ingesting them into the Analytics tier, which is more expensive and optimized for active querying. Instead, send them directly to the Data Lake tier, where they can be retained cost-effectively for future audit, compliance, or forensic needs.

Pricing Flow
- Because the data is ingested directly into the Data Lake tier, you pay both ingestion and retention costs there for the entire 2-year period.
- If you later need to perform advanced analytics, querying, or search, you incur additional compute charges based on actual usage.
- Even with occasional compute charges, the cost remains significantly lower than storing the same data in the Analytics tier.

Realized Savings

Scenario | Cost per Month
Scenario 1: 10 GB/day in Analytics tier | $1,520.40
Scenario 2: 10 GB/day directly into Data Lake tier | $202.20 (without compute); $257.20 (with sample compute price)

Savings with no compute activity: $1,520.40 - $202.20 = $1,318.20 per month
Savings with some compute activity (sample value): $1,520.40 - $257.20 = $1,263.20 per month

Azure calculator equivalent without compute

Azure calculator equivalent with sample compute

Conclusion
The combination of the Analytics tier and the Data Lake tier in Microsoft Sentinel enables organizations to optimize cost based on how their security data is used.
High-value logs that require frequent querying, real-time analytics, and investigation can be stored in the Analytics tier, which provides powerful search performance and built-in detection capabilities. At the same time, large-volume or infrequently accessed logs (such as audit, compliance, or long-term retention data) can be directed to the Data Lake tier, which offers dramatically lower storage and ingestion costs. Because all Analytics tier data is automatically mirrored to the Data Lake tier at no extra cost, customers can use the Analytics tier only for the period in which they actively query data, and rely on the Data Lake tier for the remaining retention. This tiered model allows different scenarios (active investigation, archival storage, compliance retention, or large-scale telemetry ingestion) to be handled at the most cost-effective layer, delivering substantial savings without sacrificing visibility, retention, or future analytical capabilities.

94 Views | 0 likes | 0 Comments

Need to create monitoring queries to track the health status of data connectors
I'm working with Microsoft Sentinel and need to create monitoring queries to track the health status of data connectors. Specifically, I want to:
- Identify unhealthy or disconnected data connectors
- Determine when a data connector last lost connection
- Get historical connection status information

What I'm looking for:
- A KQL query that can be run in the Sentinel workspace to check connector status, OR a PowerShell script/command that can retrieve this information
- Ideally, something that can be automated for regular monitoring

Approaches I have looked at:
- Looking at the SentinelHealth table, but I am unsure about the exact schema, connector names, etc.
- Checking whether there are specific tables that track connector status changes
- Using Azure Resource Graph or the management APIs

I've tried multiple approaches (KQL, PowerShell, Resource Graph), but I cannot get the information I'm looking for. Please assist. For example, I have seen this Microsoft docs page: https://learn.microsoft.com/en-us/azure/sentinel/monitor-data-connector-health#supported-data-connectors. However, I would like my query to report data such as:
- When did each table last ingest data?
- How much data has been ingested by specific tables and connectors?
- Which connectors are currently connected?
- What is the health of my connectors?

Please help.

265 Views | 2 likes | 3 Comments

Sentinel Data Connector: Google Workspace (G Suite) (using Azure Functions)
I'm encountering a problem when attempting to run the GWorkspace_Report workbook in Azure Sentinel. The query throws this error related to the union operator:

    'union' operator: Failed to resolve table expression named 'GWorkspace_ReportsAPI_gcp_CL'

I've double-checked, and the GoogleWorkspaceReports connector is installed and updated to version 3.0.2. Has anyone seen this, or does anyone know what might be causing the table GWorkspace_ReportsAPI_gcp_CL to be unresolved? Thanks!

149 Views | 0 likes | 2 Comments

Device Tables are not ingesting tables for an org's workspace
Device tables are not ingesting data for an org's workspace. I can confirm that all devices are enrolled and onboarded to MDE (Microsoft Defender for Endpoint). I placed an EICAR file on one of the machines, which brought an alert through to Sentinel; however, this did not populate any of the device-related tables.

Workspace I am targeting

Workspace from another org with tables enabled and ingesting data

The Microsoft Defender XDR connector shows as connected, but the tables do not seem to be ingesting data. I run the following:

    DeviceEvents
    | where TimeGenerated > ago(15m)
    | top 20 by TimeGenerated

    DeviceProcessEvents
    | where TimeGenerated > ago(15m)
    | top 20 by TimeGenerated

I receive no results:

    No results found from the specified time range
    Try selecting another time range

Please assist, as I cannot work out where this is failing.

91 Views | 1 like | 1 Comment

Optimizing Microsoft Sentinel: Resolving AMA-Induced Syslog & CEF Duplicates
2) Recommended Solutions

When collecting both Syslog and CEF logs from the same Linux collector using the Azure Monitor Agent (AMA) in Microsoft Sentinel, duplicate log entries can occur. These duplicates arise because the same event may be ingested through both the Syslog and CEF pipelines, leading to redundancy in the Log Analytics Workspace (LAW).

The following solutions aim to eliminate or reduce duplicate log ingestion, ensuring that:
- CEF events are parsed correctly and only once.
- Syslog data remains clean and non-redundant.
- Storage and analytics efficiency is improved.
- Alerting and incident investigation are not skewed by duplicate entries.

Each option provides a different strategy based on your environment's flexibility and configuration capabilities: facility-level separation, ingestion-time filtering, or daemon-side log routing.

Option 1: Facility Separation (Preferred)
Configure devices to emit CEF logs on a dedicated facility (for example, 'local4'), and adjust the Data Collection Rules (DCRs) so that the CEF stream includes only that facility, while the Syslog stream excludes it. This ensures CEF events are parsed once into 'CommonSecurityLog' and never land in 'Syslog'.
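Before dedicating a facility to CEF, it can help to see which facilities currently carry CEF-formatted messages on each collector. The query below is a minimal sketch that assumes the standard 'Syslog' table with its 'Facility', 'Computer', and 'SyslogMessage' columns; the one-day lookback and the 'Events' column name are arbitrary choices for illustration.

```kusto
// List the facilities on which CEF-formatted messages currently arrive in Syslog.
Syslog
| where TimeGenerated > ago(1d)               // lookback window is an arbitrary choice
| where SyslogMessage startswith "CEF:"       // same CEF marker used elsewhere in this article
| summarize Events = count() by Computer, Facility
| order by Events desc
```

Any facility that shows up here is a candidate to move to the dedicated CEF facility or to exclude from the Syslog DCR.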
CEF via AMA DCR (include only CEF facility):

    {
      "properties": {
        "dataSources": {
          "syslog": [
            {
              "streams": ["Microsoft-CommonSecurityLog"],
              "facilityNames": ["local4"],
              "logLevels": ["*"],
              "name": "cefDataSource"
            }
          ]
        },
        "dataFlows": [
          {
            "streams": ["Microsoft-CommonSecurityLog"],
            "destinations": ["laDest"]
          }
        ]
      }
    }

Syslog via AMA DCR (exclude CEF facility):

    {
      "properties": {
        "dataSources": {
          "syslog": [
            {
              "streams": ["Microsoft-Syslog"],
              "facilityNames": [
                "auth", "authpriv", "cron", "daemon", "kern", "mail",
                "syslog", "user", "local0", "local1", "local2", "local3",
                "local5", "local6", "local7"
              ],
              "logLevels": ["*"],
              "name": "syslogDataSource"
            }
          ]
        },
        "dataFlows": [
          {
            "streams": ["Microsoft-Syslog"],
            "destinations": ["laDest"]
          }
        ]
      }
    }

Option 2: Ingest-time Transform (Drop CEF from Syslog)
If facility separation is not feasible, apply a transformation to the Syslog stream in the DCR so that any CEF-formatted messages are dropped during ingestion.

Syslog stream transformKql:

    {
      "properties": {
        "dataFlows": [
          {
            "streams": ["Microsoft-Syslog"],
            "transformKql": "source | where not(SyslogMessage startswith 'CEF:')",
            "destinations": ["laDest"]
          }
        ]
      }
    }

Option 3: Daemon-side Filtering/Rewriting (rsyslog/syslog-ng)
Filter or rewrite CEF messages before AMA sees them.
For example, route CEF messages to a dedicated facility using syslog-ng and stop further processing:

    # Match CEF
    filter f_cef { message("^CEF:"); };

    # Send CEF to local5 and stop further processing
    log {
        source(s_src);
        filter(f_cef);
        rewrite { set_facility(local5); };
        destination(d_azure_mdsd);
        flags(final);
    };

3) Verification Steps with KQL Queries

Detect CEF messages that leaked into Syslog:

    Syslog
    | where TimeGenerated > ago(1d)
    | where SyslogMessage startswith "CEF:"
    | summarize count() by Computer
    | order by count_ desc

Estimate duplicate count across Syslog and CommonSecurityLog:

    let sys = Syslog
        | where TimeGenerated > ago(1d)
        | where SyslogMessage startswith "CEF:"
        | extend key = hash_sha256(SyslogMessage);
    let cef = CommonSecurityLog
        | where TimeGenerated > ago(1d)
        | extend key = hash_sha256(RawEvent);
    cef
    | join kind=innerunique (sys) on key
    | summarize duplicates = count()

Note: You should identify the RawEvent that might be causing the duplicates.

3.1) Duplicate Detection Query Explained

This query helps quantify duplicate ingestion when both the Syslog and CEF connectors ingest the same events. It works as follows:
- Build the Syslog set (sys): Filter the 'Syslog' table for the last day and keep only messages that start with 'CEF:'. Compute a SHA-256 hash of the entire message as a stable join key ("key").
- Build the CEF set (cef): Filter the 'CommonSecurityLog' table for the last day and compute a SHA-256 hash of the 'RawEvent' field as the same-style join key.
- Join on the key: Use 'join kind=innerunique' to find messages that exist in both sets (i.e., duplicates).
- Summarize: Count the number of matching rows to get a duplicate total.

4) Common Pitfalls
- Overlapping DCRs applied to the same collector VM, causing overlapping facilities/severities.
- CEF and Syslog using the same facility on sources, leading to ingestion on both streams.
- rsyslog/syslog-ng filters placed after AMA's own configuration include (ensure your custom rules run before '10-azuremonitoragent.conf').

5) References
- Microsoft Learn: Ingest syslog and CEF messages to Microsoft Sentinel with AMA (https://learn.microsoft.com/en-us/azure/sentinel/connect-cef-syslog-ama)

Microsoft 365 defender alerts not capturing fields (entities) in azure sentinel
We received an alert from Microsoft 365 Defender in Azure Sentinel (A potentially malicious URL click was detected). To investigate this alert we have to check in the 365 Defender portal, because we noticed that the entities (user, host, IP) are not being captured in Sentinel. How can we resolve this issue?

Note: This is not a custom rule.

2.6K Views | 1 like | 3 Comments

Microsoft Sentinel Query History not updating
Hello,

Apologies if this isn't the correct place for this, but I know I will likely retire before I get any traction with Microsoft support. Has anyone experienced issues with their Sentinel Query History not updating with the latest queries?

I run a lot of queries each day, and any time I open a new browser window and go to the Logs tab, the latest query it shows in my history is from 7/29/2025. If I run any new queries in that browser tab, they show in my query history, but the moment I open a new browser tab and access Sentinel logs, they are gone and the latest query shown is again from 7/29/2025. My colleague has the exact same issue, except their latest query date is 8/7/2025.

Yes, I do have the "Save query history" setting set to On. I have toggled it off and back on just to see if it would do anything, but no luck. Does anyone know what could be causing this?

318 Views | 0 likes | 6 Comments

Standard Ontology and SIEM Field Mapping
Hello Community,

We are working on a Microsoft Sentinel → Google Chronicle integration and need to automate the SIEM field mapping process between the two platforms. The challenges are:
- Schema Differences: Sentinel and Chronicle use different naming conventions and field hierarchies.
- Analytics Portability: Without mapping, a Chronicle rule expecting a principal user email won't understand Sentinel's User Principal Name.

Questions:
- Is there an API, PowerShell cmdlet, or Logic App method for mapping Sentinel's fields to Google Chronicle fields?
- Is there any way to do this via automation?

149 Views | 0 likes | 2 Comments

How to exclude IPs & accounts from Analytic Rule, with Watchlist?
We are trying to filter out some false positives from an analytics rule called "Service accounts performing RemotePS". Using automation rules still produces a lot of false mail notifications we don't want, so we would like to try using a watchlist containing the service account and IP combinations we want to exclude. Does anyone know where, and with what syntax, we would need to exclude the items on the specific watchlist?

Query:

    let InteractiveTypes = pack_array( // Declare Interactive logon type names
        'Interactive',
        'CachedInteractive',
        'Unlock',
        'RemoteInteractive',
        'CachedRemoteInteractive',
        'CachedUnlock'
    );
    let WhitelistedCmdlets = pack_array( // List of whitelisted commands that don't provide a lot of value
        'prompt',
        'Out-Default',
        'out-lineoutput',
        'format-default',
        'Set-StrictMode',
        'TabExpansion2'
    );
    let WhitelistedAccounts = pack_array('FakeWhitelistedAccount'); // List of accounts that are known to perform this activity in the environment and can be ignored
    DeviceLogonEvents // Get all logon events...
    | where AccountName !in~ (WhitelistedAccounts) // ...where it is not a whitelisted account...
    | where ActionType == "LogonSuccess" // ...and the logon was successful...
    | where AccountName !contains "$" // ...and not a machine logon.
    | where AccountName !has "winrm va_" // WinRM will have pseudo account names that match this if there is an explicit permission for an admin to run the cmdlet, so assume it is good.
    | extend IsInteractive=(LogonType in (InteractiveTypes)) // Determine if the logon is interactive (True=1,False=0)...
    | summarize HasInteractiveLogon=max(IsInteractive) // ...then bucket and get the maximum interactive value (0 or 1)...
        by AccountName // ... by the AccountNames
    | where HasInteractiveLogon == 0 // ...and filter out all accounts that had an interactive logon.
    // At this point, we have a list of accounts that we believe to be service accounts
    // Now we need to find RemotePS sessions that were spawned by those accounts
    // Note that we look at all powershell cmdlets executed to form a 29-day baseline to evaluate the data on today
    | join kind=rightsemi ( // Start by dropping the account name and only tracking the...
        DeviceEvents // ...
        | where ActionType == 'PowerShellCommand' // ...PowerShell commands seen...
        | where InitiatingProcessFileName =~ 'wsmprovhost.exe' // ...whose parent was wsmprovhost.exe (RemotePS Server)...
        | extend AccountName = InitiatingProcessAccountName // ...and add an AccountName field so the join is easier
    ) on AccountName
    // At this point, we have all of the commands that were ran by service accounts
    | extend Command = tostring(extractjson('$.Command', tostring(AdditionalFields))) // Extract the actual PowerShell command that was executed
    | where Command !in (WhitelistedCmdlets) // Remove any values that match the whitelisted cmdlets
    | summarize (Timestamp, ReportId)=arg_max(TimeGenerated, ReportId), // Then group all of the cmdlets and calculate the min/max times of execution...
        make_set(Command, 100000), count(), min(TimeGenerated) by // ...as well as creating a list of cmdlets ran and the count..
        AccountName, AccountDomain, DeviceName, DeviceId // ...and have the commonality be the account, DeviceName and DeviceId
    // At this point, we have machine-account pairs along with the list of commands run as well as the first/last time the commands were ran
    | order by AccountName asc // Order the final list by AccountName just to make it easier to go through
    | extend HostName = iff(DeviceName has '.', substring(DeviceName, 0, indexof(DeviceName, '.')), DeviceName)
    | extend DnsDomain = iff(DeviceName has '.', substring(DeviceName, indexof(DeviceName, '.') + 1), "")

220 Views | 0 likes | 1 Comment

KQL: setting query time leads to problem in watchlist column projecting
Hello to the community!

I have stumbled upon a very strange issue when using watchlists. I have a watchlist with 2 columns (userPrincipalName, allowedActivity) that I use to whitelist activities. The watchlist is imported using:

    let WhitelistedUsers = _GetWatchlist("testQuery") | project userPrincipalName, allowedActivity;

I then wanted to pin the query to a specific time frame to test it on a given data set:

    set query_now = datetime("1/14/2022, 1:45:46.556 PM");

The problem is that when I set my query to a specific time, I get the following error from the watchlist:

    'project' operator: Failed to resolve scalar expression named 'userPrincipalName'.

Commenting out the set query_now statement solves the project problem (though not my actual problem). I tried setting the time both before and after the watchlist import, but that does not solve the issue. I could not find any posts on the topic (it is quite a specific one), so has anyone observed similar behavior, or does anyone have a possible explanation? I can probably work around set query_now with other functions, but I have gotten used to it and find this behavior extremely strange.

4.4K Views | 0 likes | 5 Comments
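One way to sidestep the interaction described in this post is to leave now() untouched and pin the test window with an ordinary let variable instead. The sketch below rests on several assumptions that are not confirmed in the thread: that the watchlist function itself depends on the current time (so moving query_now to a point before the watchlist existed would leave it empty, with no columns to project), that SigninLogs stands in for whichever table the rule actually queries, and that ReferenceTime and Lookback are hypothetical names with an arbitrary one-day window.

```kusto
// Hypothetical reference time taken from the post; now() itself is left unchanged.
let ReferenceTime = datetime(2022-01-14 13:45:46.556);
// Assumed size of the test window.
let Lookback = 1d;
// Watchlist import exactly as in the post.
let WhitelistedUsers = _GetWatchlist("testQuery")
    | project userPrincipalName, allowedActivity;
// SigninLogs is a placeholder for the table the rule actually queries.
SigninLogs
| where TimeGenerated between ((ReferenceTime - Lookback) .. ReferenceTime)
// Drop rows whose user appears in the watchlist.
| join kind=leftanti (WhitelistedUsers)
    on $left.UserPrincipalName == $right.userPrincipalName
```

Because the time pin lives in an explicit filter on the data table, the watchlist lookup is evaluated against the real current time and should keep its columns.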