Automating Sentinel Triage with Microsoft Security Copilot
🤖🧠🛡️ We're diving deep into the transformative world of AI-driven automation in cybersecurity. This session will explore how Microsoft Security Copilot, integrated with Logic Apps, can supercharge the triage process in Microsoft Sentinel.

💡 What you'll take away:
✔️ Practical applications of AI in triage and incident response
✔️ How to reduce manual effort and operational costs
✔️ Innovative strategies to elevate efficiency in your SOC

Join us as we explore how cutting-edge AI reshapes security operations and empowers teams to focus on what matters most.

🗓️ Date: 29 September 2025
⏰ Time: 17:00 (AEST)
🎙️ Speaker: Anthony Porter
📌 Topic: Automating Sentinel Triage with Microsoft Security Copilot

Optimizing Microsoft Sentinel: Resolving AMA-Induced Syslog & CEF Duplicates
2) Recommended Solutions

When collecting both Syslog and CEF logs from the same Linux collector using the Azure Monitor Agent (AMA) in Microsoft Sentinel, duplicate log entries can occur. These duplicates arise because the same event may be ingested through both the Syslog and CEF pipelines, leading to redundancy in the Log Analytics Workspace (LAW).

The following solutions aim to eliminate or reduce duplicate log ingestion, ensuring that:

- CEF events are parsed correctly and only once.
- Syslog data remains clean and non-redundant.
- Storage and analytics efficiency is improved.
- Alerting and incident investigation are not skewed by duplicate entries.

Each option provides a different strategy based on your environment's flexibility and configuration capabilities, from facility-level separation, to ingestion-time filtering, to daemon-side log routing.

Option 1: Facility Separation (Preferred)

Configure devices to emit CEF logs on a dedicated facility (for example, local4), and adjust the Data Collection Rules (DCRs) so that the CEF stream includes only that facility, while the Syslog stream excludes it. This ensures CEF events are parsed once into CommonSecurityLog and never land in Syslog.

CEF via AMA DCR (include only the CEF facility):

```json
{
  "properties": {
    "dataSources": {
      "syslog": [
        {
          "streams": ["Microsoft-CommonSecurityLog"],
          "facilityNames": ["local4"],
          "logLevels": ["*"],
          "name": "cefDataSource"
        }
      ]
    },
    "dataFlows": [
      {
        "streams": ["Microsoft-CommonSecurityLog"],
        "destinations": ["laDest"]
      }
    ]
  }
}
```

Syslog via AMA DCR (exclude the CEF facility):

```json
{
  "properties": {
    "dataSources": {
      "syslog": [
        {
          "streams": ["Microsoft-Syslog"],
          "facilityNames": [
            "auth", "authpriv", "cron", "daemon", "kern", "mail",
            "syslog", "user", "local0", "local1", "local2", "local3",
            "local5", "local6", "local7"
          ],
          "logLevels": ["*"],
          "name": "syslogDataSource"
        }
      ]
    },
    "dataFlows": [
      {
        "streams": ["Microsoft-Syslog"],
        "destinations": ["laDest"]
      }
    ]
  }
}
```

Option 2: Ingest-time Transform (Drop CEF from Syslog)

If facility separation is not feasible, apply a transformation to the Syslog stream in the DCR so that any CEF-formatted messages are dropped during ingestion.

Syslog stream transformKql:

```json
{
  "properties": {
    "dataFlows": [
      {
        "streams": ["Microsoft-Syslog"],
        "transformKql": "source | where not(SyslogMessage startswith 'CEF:')",
        "destinations": ["laDest"]
      }
    ]
  }
}
```

Option 3: Daemon-side Filtering/Rewriting (rsyslog/syslog-ng)

Filter or rewrite CEF messages before AMA sees them. For example, route CEF messages to a dedicated facility using syslog-ng and stop further processing:

```
# Match CEF
filter f_cef { message("^CEF:"); };

# Send CEF to local5 and stop further processing
log {
    source(s_src);
    filter(f_cef);
    rewrite { set_facility(local5); };
    destination(d_azure_mdsd);
    flags(final);
};
```

3) Verification Steps with KQL Queries

Detect CEF messages that leaked into Syslog:

```
Syslog
| where TimeGenerated > ago(1d)
| where SyslogMessage startswith "CEF:"
| summarize count() by Computer
| order by count_ desc
```

Estimate the duplicate count across Syslog and CommonSecurityLog:

```
let sys = Syslog
| where TimeGenerated > ago(1d)
| where SyslogMessage startswith "CEF:"
| extend key = hash_sha256(SyslogMessage);
let cef = CommonSecurityLog
| where TimeGenerated > ago(1d)
| extend key = hash_sha256(RawEvent);
cef
| join kind=innerunique (sys) on key
| summarize duplicates = count()
```

Note: you should identify which RawEvent values might be causing the duplicates.
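To run these verification checks on a schedule rather than by hand, the same KQL can be executed from Python with the azure-monitor-query SDK. A minimal sketch, assuming the azure-identity and azure-monitor-query packages are installed and that the workspace ID placeholder is replaced with your own:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

# Same leak-detection query as above: CEF messages landing in the Syslog table.
LEAK_QUERY = """
Syslog
| where SyslogMessage startswith "CEF:"
| summarize count() by Computer
| order by count_ desc
"""

credential = DefaultAzureCredential()
client = LogsQueryClient(credential)

response = client.query_workspace(
    workspace_id="<your-workspace-id>",  # placeholder: your LAW workspace ID
    query=LEAK_QUERY,
    timespan=timedelta(days=1),  # same 1d window as the KQL examples
)

for table in response.tables:
    for row in table.rows:
        print(f"{row[0]}: {row[1]} leaked CEF messages")
```

A non-empty result means CEF events are still being duplicated into the Syslog stream and one of the options above has not taken effect.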
3.1) Duplicate Detection Query Explained

This query helps quantify duplicate ingestion when both the Syslog and CEF connectors ingest the same events. It works as follows (a pandas equivalent of the join is sketched after the references below):

- Build the Syslog set (sys): filter the Syslog table for the last day and keep only messages that start with "CEF:". Compute a SHA-256 hash of the entire message as a stable join key ("key").
- Build the CEF set (cef): filter the CommonSecurityLog table for the last day and compute a SHA-256 hash of the RawEvent field as the same-style join key.
- Join on the key: use join kind=innerunique to find messages that exist in both sets (i.e., duplicates).
- Summarize: count the number of matching rows to get a duplicate total.

4) Common Pitfalls

- Multiple DCRs applied to the same collector VM with overlapping facilities/severities.
- CEF and Syslog using the same facility on sources, leading to ingestion on both streams.
- rsyslog/syslog-ng filters placed after AMA's own configuration include (ensure your custom rules run before 10-azuremonitoragent.conf).

5) References

- Microsoft Learn: Ingest syslog and CEF messages to Microsoft Sentinel with AMA (https://learn.microsoft.com/en-us/azure/sentinel/connect-cef-syslog-ama)
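If you export both tables, the same hash-and-join logic is easy to reproduce locally. A minimal pandas sketch of the idea, using toy one-row frames in place of real Syslog and CommonSecurityLog exports:

```python
import hashlib

import pandas as pd

def sha256_key(message: str) -> str:
    """Stable join key, mirroring hash_sha256() in the KQL."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

# Toy frames standing in for Syslog and CommonSecurityLog exports.
sys_df = pd.DataFrame({"SyslogMessage": ["CEF:0|Vendor|Prod|1.0|100|name|5|src=10.0.0.1"]})
cef_df = pd.DataFrame({"RawEvent": ["CEF:0|Vendor|Prod|1.0|100|name|5|src=10.0.0.1"]})

# Hash each raw message into the shared "key" column.
sys_df["key"] = sys_df["SyslogMessage"].map(sha256_key)
cef_df["key"] = cef_df["RawEvent"].map(sha256_key)

# Inner join on the hash: rows appearing in both tables are duplicates.
duplicates = cef_df.merge(sys_df, on="key", how="inner")
print(f"duplicates: {len(duplicates)}")
```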
KQL: setting query time leads to problem in watchlist column projecting

Hello to the community! I have stumbled upon a very strange issue when using watchlists. I have a watchlist with two columns (userPrincipalName, allowedActivity) that I am using to whitelist activities. The watchlist is imported using:

```
let WhitelistedUsers = _GetWatchlist("testQuery")
| project userPrincipalName, allowedActivity;
```

Then I wanted to pin the query to a specific time frame to test it on a given data set:

```
set query_now = datetime("1/14/2022, 1:45:46.556 PM");
```

The problem is that when setting my query to a specific time, I get the following error from the watchlist:

'project' operator: Failed to resolve scalar expression named 'userPrincipalName'.

Commenting out the set query_now statement solves the project error (though not my problem). I tried setting the time both before and after the watchlist import, but that does not fix the issue. I could not find any posts around the topic (quite a specific one), so has anyone observed similar behavior, or does anyone have a possible explanation? I can probably work around set query_now with other functions, but I have gotten used to it and find this behavior extremely strange.
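One possible workaround (a sketch, not a confirmed fix for the underlying behavior) is to skip set query_now and pin the evaluation window with a let-bound datetime instead, filtering explicitly on TimeGenerated. Run from Python via msticpy, with SigninLogs as a purely hypothetical target table and the watchlist/column names taken from the question:

```python
# Minimal msticpy sketch; assumes msticpyconfig.yaml is configured for your
# Sentinel workspace, as in msticpy's standard setup.
from msticpy.common.wsconfig import WorkspaceConfig
from msticpy.data import QueryProvider

qry_prov = QueryProvider("AzureSentinel")
qry_prov.connect(WorkspaceConfig())  # default workspace from msticpyconfig.yaml

# Pin the reference time with a let binding instead of `set query_now`.
# SigninLogs is a hypothetical table chosen for illustration only.
workaround_kql = """
let ref_time = datetime(2022-01-14 13:45:46.556);
let WhitelistedUsers = _GetWatchlist('testQuery')
    | project userPrincipalName, allowedActivity;
SigninLogs
| where TimeGenerated <= ref_time
| join kind=inner (WhitelistedUsers)
    on $left.UserPrincipalName == $right.userPrincipalName
"""
df = qry_prov.exec_query(workaround_kql)
print(df.head())
```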
Sentinel-Threat Intelligence Feeds Integration to Strengthen Threat Detection & Proactive Hunting

Combining threat intelligence feeds is important for detecting threats and identifying Indicators of Compromise (IOCs) in various scenarios. Here are some key situations where this approach is advantageous:

Comprehensive Threat Detection: Integrating multiple threat intelligence feeds can cover a wider range of threats. Different feeds may provide unique insights into malicious activities, IP addresses, domain names, and other IOCs.

Reducing False Positives: Combining feeds helps cross-verify data, decreasing the likelihood of false positives. This ensures that security teams focus on actual threats rather than inaccurate alerts.

Enhanced Contextual Analysis: Multiple feeds can offer richer context around threats, including tactics, techniques, and procedures (TTPs) used by attackers. This helps in understanding the threat landscape better and making informed decisions.

Real-Time Threat Response: Integrating feeds allows for real-time updates on emerging threats. This enables security teams to respond swiftly to new threats and mitigate potential damage.

Proactive Threat Hunting: Threat hunters can use combined feeds to identify patterns and anomalies that might indicate a threat. This proactive approach assists in detecting threats before they can cause significant harm.

Improved Threat Intelligence Sharing: Combining feeds from different sources, such as government agencies, commercial vendors, and open-source communities, enhances the overall quality and reliability of threat intelligence.

Example Query in Microsoft Sentinel

Here's an example of how you might combine two threat intelligence feeds using the coalesce function in KQL. The feeds (illustrative table names) must first be joined so that both sets of columns are available on each row; after the join, duplicated right-side columns get a "1" suffix:

```
ThreatIntelFeed1
| join kind=fullouter (ThreatIntelFeed2) on IndicatorId
| extend CombinedIndicator = coalesce(Indicator, Indicator1)
| extend CombinedDescription = coalesce(Description, Description1)
| project CombinedIndicator, CombinedDescription
```

This query ensures that you obtain the most comprehensive data by taking the first non-null value from either feed. The coalesce function in Kusto Query Language (KQL) evaluates a list of expressions and returns the first non-null (or, for strings, non-empty) expression. This function is particularly useful in Microsoft Sentinel for handling data where some fields might be missing or null.

Syntax:

coalesce(arg, arg_2, [arg_3, ...])

arg: the expression to be evaluated. All arguments must be of the same type, and a maximum of 64 arguments is supported.

Functions of coalesce in Sentinel threat intelligence feeds:

- Handling missing data: fills gaps where data might be missing by providing a fallback value. For example, if one threat intelligence feed lacks an IP address, coalesce can pull it from another feed.
- Data normalization: combines multiple fields into one, ensuring that you always have a value to work with. This is useful when different feeds provide similar data in different fields.
- Simplifying queries: reduces the need for complex conditional logic to handle null values, making queries more readable and maintainable.
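To make the coalesce semantics concrete outside of KQL, here is a small Python sketch of an equivalent helper. The feed records are made up for illustration:

```python
def coalesce(*args):
    """Return the first argument that is neither None nor an empty string,
    mirroring KQL's coalesce() behavior for strings."""
    for arg in args:
        if arg is not None and arg != "":
            return arg
    return None

# Made-up records standing in for matched rows from two feeds.
feed1 = {"Indicator": None, "Description": "C2 server"}
feed2 = {"Indicator": "198.51.100.7", "Description": ""}

combined = {
    "CombinedIndicator": coalesce(feed1["Indicator"], feed2["Indicator"]),
    "CombinedDescription": coalesce(feed1["Description"], feed2["Description"]),
}
print(combined)
# {'CombinedIndicator': '198.51.100.7', 'CombinedDescription': 'C2 server'}
```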
Let's look at a Threat Intelligence analytic rule where the coalesce function is used. The query combines threat intelligence indicators with DNS data to identify potential malicious activity. It ensures that only relevant and recent indicators are considered and matches them with DNS queries to detect suspicious behavior. Let's break down this KQL query step by step:

Define lookback periods:
- dt_lookBack: sets a lookback period of 1 hour for DNS data.
- ioc_lookBack: sets a lookback period of 14 days for threat intelligence indicators.

Extract relevant threat intelligence indicators:
- ThreatIntelligenceIndicator: filters threat intelligence indicators generated within the last 14 days and not expired.
- arg_max(TimeGenerated, *) by IndicatorId: summarizes to get the latest indicator for each IndicatorId.
- Active == true: keeps only active indicators.
- coalesce(NetworkIP, NetworkDestinationIP, NetworkSourceIP, EmailSourceIpAddress, "NO_IP"): combines various IP fields into a single IoC field, defaulting to "NO_IP" if none are present.
- where IoC != "NO_IP": filters out entries without valid IP addresses.

Join with DNS data:
- join kind=innerunique: joins the threat intelligence indicators with DNS data using an inner unique join to keep performance fast and the result set small.
- _Im_Dns(starttime=ago(dt_lookBack)): retrieves DNS data from the last hour.
- where isnotempty(DnsResponseName): filters DNS records with non-empty response names.
- summarize imDns_mintime=min(TimeGenerated), imDns_maxtime=max(TimeGenerated) by SrcIpAddr, DnsQuery, DnsResponseName, Dvc, EventProduct, EventVendor: summarizes DNS data by various fields.
- extract_all(@'(\d+\.\d+\.\d+\.\d+)', DnsResponseName): extracts all IP addresses from the DNS response name.
- mv-expand IoC = addresses to typeof(string): expands the extracted IP addresses into individual rows.

The combined KQL looks like this:

```
let dt_lookBack = 1h;
let ioc_lookBack = 14d;
let IP_TI = ThreatIntelligenceIndicator
| where TimeGenerated >= ago(ioc_lookBack) and ExpirationDateTime > now()
| summarize LatestIndicatorTime = arg_max(TimeGenerated, *) by IndicatorId
| where Active == true
| extend IoC = coalesce(NetworkIP, NetworkDestinationIP, NetworkSourceIP, EmailSourceIpAddress, "NO_IP")
| where IoC != "NO_IP";
IP_TI
| join kind=innerunique // innerunique keeps performance fast and the result set small; one match is enough to flag potential malicious activity for investigation
(
    _Im_Dns(starttime=ago(dt_lookBack))
    | where isnotempty(DnsResponseName)
    | summarize imDns_mintime=min(TimeGenerated), imDns_maxtime=max(TimeGenerated) by SrcIpAddr, DnsQuery, DnsResponseName, Dvc, EventProduct, EventVendor
    | extend addresses = extract_all(@'(\d+\.\d+\.\d+\.\d+)', DnsResponseName)
    | mv-expand IoC = addresses to typeof(string)
) on IoC
```
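The extract_all / mv-expand step is the least obvious part of the query. A small pandas sketch of the same transformation, with toy data and column names taken from the KQL:

```python
import re

import pandas as pd

# Same IPv4 pattern as extract_all() in the KQL.
ipv4 = re.compile(r"\d+\.\d+\.\d+\.\d+")

dns = pd.DataFrame({
    "DnsQuery": ["evil.example.com"],
    "DnsResponseName": ["evil.example.com;93.184.216.34;198.51.100.7"],
})

# extract_all equivalent: a list of matched IPs per row.
dns["addresses"] = dns["DnsResponseName"].map(ipv4.findall)

# mv-expand equivalent: one row per extracted IP, renamed to match the join key.
expanded = dns.explode("addresses").rename(columns={"addresses": "IoC"})
print(expanded[["DnsQuery", "IoC"]])
```

Each DNS record is fanned out into one row per resolved IP, which is what allows the single-column join against the IoC field of the threat intelligence set.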
Summary

This article explores the importance of combining threat intelligence feeds to improve security operations. Key benefits include extending threat coverage, reducing false positives, and enhancing contextual analysis through detailed insights into attackers' tactics and techniques. The integration process also facilitates real-time threat updates and enables better collaboration between different intelligence sources.

An example is provided using KQL (Kusto Query Language) to demonstrate how threat intelligence feeds can be combined effectively within Microsoft Sentinel. The query showcases steps like defining lookback periods, extracting relevant indicators, and correlating them with DNS data through an inner unique join. By leveraging this method, organizations can efficiently identify potential malicious activities and strengthen their threat response capabilities. Integrating threat feeds is not just a technical function but a strategic necessity for fortifying organizations against evolving cyber threats.

Sentinel Notebook: Guided Hunting - Domain Generation Algorithm (DGA) Detection
Overview

This notebook, titled "Guided Hunting - Domain Generation Algorithm (DGA) Detection", provides a framework for investigating anomalous network activity by identifying domains generated by algorithms, which are often used by malware to evade detection. It integrates data from Log Analytics (DeviceNetworkEvents) and employs Python-based tools and libraries such as msticpy, pandas, and scikit-learn for analysis and visualization. DGA detection is crucial for cybersecurity because it helps identify and mitigate threats like botnets and malware that use dynamically generated domains for command-and-control communication, making it a key component of proactive threat hunting and network defense.

Link: https://github.com/GonePhishing402/SentinelNotebooks/blob/main/DGA_Detection_ManagedIdentity.ipynb

What is a Domain Generation Algorithm, and How Do You Detect It?

A Domain Generation Algorithm (DGA) is a technique used by malware to create numerous domain names for communicating with Command and Control (C2) servers, ensuring continued operation even if some domains are blocked. DGAs evade static signature detection by dynamically generating unpredictable domain names, making it hard for traditional security methods to identify and blacklist them. Machine learning models can effectively detect DGAs by analyzing patterns and features in domain names, leveraging techniques like deep learning to adapt to new variants and identify anomalies that static methods may miss.

How to Run the Notebook

Log in with Managed Identity

This notebook requires you to authenticate with a managed identity. The managed identity can be created from the Azure portal and must have the following RBAC roles:

- Sentinel Contributor
- Log Analytics Contributor
- AzureML Data Scientist
- AzureML Compute Operator

Replace [CLIENT_ID] with the client ID of your managed identity. This can be obtained from the Azure portal under Managed Identities -> Select the identity -> Overview.

Note: this notebook will still work if you choose to authenticate as an Azure user via the CLI method instead.

Import Libraries

This code block imports the necessary libraries and assigns the "credential" variable using ManagedIdentityCredential().

Setup msticpyconfig.yaml

This section pulls in the msticpyconfig.yaml for use later in the notebook. Ensure it is set up and in your current working directory before running this notebook.

Setup QueryProvider

The query provider is set up for Azure Sentinel. This does not need to be changed unless you want to use a different query provider from msticpy.

Connect to Sentinel

This code block connects to Sentinel with the managed identity, targeting the workspace specified in your msticpyconfig.yaml. You should see "connected" after running it.

DGA Model Creation

This code block uses CountVectorizer() and MultinomialNB() to create a model called dga_model.joblib and save it to the path specified in the "model_filename" variable. It is important to change this path for your environment. You must give the algorithm data to learn from in order for it to be effective. Download the domain.csv located here and upload it to your current working directory in the Azure Machine Learning workspace: DGA_Detection/data/domain.csv at master · hmaccelerate/DGA_Detection

You must also change line 10 in this code block so that "labeled_domains_df" points to the domain.csv in your environment. A rough sketch of this training step follows below.
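As an illustration of what that model-creation cell does, here is a sketch rather than the notebook's exact code. The "domain" and "isDGA" column names are assumptions (check your copy of domain.csv), and the character n-gram configuration is one reasonable choice, not necessarily the notebook's:

```python
import joblib
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Column names are assumptions; adjust to match your copy of domain.csv.
labeled_domains_df = pd.read_csv("data/domain.csv")
X_train, X_test, y_train, y_test = train_test_split(
    labeled_domains_df["domain"],
    labeled_domains_df["isDGA"],
    test_size=0.2,
    random_state=42,
)

# Character n-grams capture the look of algorithmically generated names
# better than whole-token features do.
model = Pipeline([
    ("vectorizer", CountVectorizer(analyzer="char", ngram_range=(2, 4))),
    ("classifier", MultinomialNB()),
])
model.fit(X_train, y_train)
print(f"Model accuracy: {model.score(X_test, y_test):.3f}")

model_filename = "dga_model.joblib"  # change this path for your environment
joblib.dump(model, model_filename)
```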
Once you run the code block, you should see the model saved along with the model accuracy. This number will vary depending on the data you give it.

Apply dga_model.joblib to Sentinel Data

This code block takes the model generated in the previous block and runs it against the data specified in the "query" variable, which pulls domain names from the DeviceNetworkEvents table of MDE events. parse_json is used in the KQL to extract the appropriate sub-field needed for this search. When the model runs against the data, it tries to determine whether any domains in your environment are associated with domain generation algorithms (DGA). If the "IsDGA" column contains a value of "True", the model has determined that the characteristics of that domain match a DGA.

Output All Results to CSV

This code block outputs all the results above to a CSV called "dgaresults.csv". Change the "output_path" variable to match your environment.

Filter DGA Results to CSV

This code block outputs just the DGA-flagged results to a CSV called "dgaresults2.csv". Change the "output_path" variable to match your environment. A consolidated sketch of these scoring and export steps follows below.

How to Investigate These Results Further

Take the domains flagged as DGA, find the correlating IPs, and check whether they match any threat intelligence. Correlate findings with other security logs, such as firewall or endpoint data, to uncover patterns of malicious behavior. This approach helps pinpoint potential threats and enables proactive mitigation. We can also create logic apps to automate follow-on analysis of these notebooks; this will be covered in a later blog.
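Pulled together, the scoring and export steps described above amount to something like the following sketch. A toy DataFrame stands in for the Sentinel query results, and a boolean IsDGA label is assumed (if your labels are strings, adjust the filter accordingly):

```python
import joblib
import pandas as pd

# Load the model trained earlier in the notebook.
model = joblib.load("dga_model.joblib")

# Toy stand-in for the DataFrame returned by the DeviceNetworkEvents query.
results_df = pd.DataFrame({"domain": ["microsoft.com", "xjw9qplk2v-info.net"]})

# Score each domain; boolean labels are assumed here.
results_df["IsDGA"] = model.predict(results_df["domain"])

output_path = "."  # change to match your environment
results_df.to_csv(f"{output_path}/dgaresults.csv", index=False)  # all results
results_df[results_df["IsDGA"]].to_csv(
    f"{output_path}/dgaresults2.csv", index=False
)  # DGA-flagged results only
```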