machine learning
5 Topics

Sentinel Notebook: Guided Hunting - Domain Generation Algorithm (DGA) Detection
Overview

This notebook, titled "Guided Hunting - Domain Generation Algorithm (DGA) Detection", provides a framework for investigating anomalous network activity by identifying algorithmically generated domains, which malware often uses to evade detection. It integrates data from Log Analytics (DeviceNetworkEvents) and employs Python-based tools and libraries such as "msticpy", "pandas", and "scikit-learn" for analysis and visualization. DGA detection is crucial for cybersecurity because it helps identify and mitigate threats such as botnets and malware that use dynamically generated domains for command-and-control communication, making it a key component of proactive threat hunting and network defense.

Link: https://github.com/GonePhishing402/SentinelNotebooks/blob/main/DGA_Detection_ManagedIdentity.ipynb

What is a Domain Generation Algorithm and How Do You Detect It?

A Domain Generation Algorithm (DGA) is a technique used by malware to create numerous domain names for communicating with command-and-control (C2) servers, ensuring continued operation even if some domains are blocked. DGAs evade static signature detection by dynamically generating unpredictable domain names, making it hard for traditional security methods to identify and blacklist them. Machine learning models can detect DGAs effectively by analyzing patterns and features in domain names, leveraging techniques like deep learning to adapt to new variants and identify anomalies that static methods may miss.

How to Run the Notebook

Log in with Managed Identity

This notebook requires you to authenticate with a managed identity. The managed identity can be created from the Azure portal and must have the following RBAC roles:
- Sentinel Contributor
- Log Analytics Contributor
- AzureML Data Scientist
- AzureML Compute Operator

Replace [CLIENT_ID] with the client ID of your managed identity. This can be obtained from the Azure portal under Managed Identities -> Select the identity -> Overview. Note: this notebook will still work if you choose to authenticate with just an Azure user via the CLI method.

Import Libraries

This code block imports the necessary libraries and assigns the "credential" variable using ManagedIdentityCredential().

Setup msticpyconfig.yaml

This section pulls in the msticpyconfig.yaml for use later in the notebook. Ensure this file is set up and in your current working directory before running the notebook.

Setup QueryProvider

The query provider is set up for Azure Sentinel. This does not need to be changed unless you want to use a different query provider from msticpy.

Connect to Sentinel

This code block connects to Sentinel with the managed identity, using the workspace specified in your msticpyconfig.yaml. You should see "connected" after running this code block. (A combined sketch of these setup steps appears below.)

DGA Model Creation

This code block uses CountVectorizer() and MultinomialNB() to create a model called dga_model.joblib and saves it to the path specified in the "model_filename" variable. It is important to change this path for your environment. You must give the algorithm data to learn from in order for it to be effective. Download the domain.csv located here and upload it to your current working directory in the Azure Machine Learning workspace: DGA_Detection/data/domain.csv at master · hmaccelerate/DGA_Detection. You must also change line 10 in this code block so that "labeled_domains_df" points to the domain.csv in your environment. (A hedged sketch of this training step also follows below.)
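The setup steps above (authentication, library imports, query provider, and connection) can be pulled together into a minimal sketch. This is not the notebook's exact code: the import paths and the way the managed identity credential is wired into msticpy can differ by version, and [CLIENT_ID] is a placeholder for your identity's client ID.

```python
# Minimal sketch, assuming msticpyconfig.yaml sits in the current working directory.
from azure.identity import ManagedIdentityCredential
from msticpy.data import QueryProvider
from msticpy.common.wsconfig import WorkspaceConfig

# Managed identity created in the Azure portal with the RBAC roles listed above
credential = ManagedIdentityCredential(client_id="[CLIENT_ID]")

# Sentinel (Log Analytics) query provider; workspace details come from msticpyconfig.yaml
qry_prov = QueryProvider("AzureSentinel")
qry_prov.connect(WorkspaceConfig())   # should report "connected" on success
```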
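A hedged sketch of the model-creation block described above, assuming domain.csv has "domain" and "label" columns; the column names, paths, and vectorizer settings are assumptions to adjust for your environment:

```python
import pandas as pd
import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Point this at the domain.csv you uploaded to your AML working directory
labeled_domains_df = pd.read_csv("data/domain.csv")

X_train, X_test, y_train, y_test = train_test_split(
    labeled_domains_df["domain"], labeled_domains_df["label"],
    test_size=0.2, random_state=42,
)

# Character n-grams capture the "random-looking" structure typical of DGA domains
dga_model = Pipeline([
    ("vectorizer", CountVectorizer(analyzer="char", ngram_range=(2, 4))),
    ("classifier", MultinomialNB()),
])
dga_model.fit(X_train, y_train)
print(f"Model accuracy: {dga_model.score(X_test, y_test):.3f}")

model_filename = "dga_model.joblib"   # change this path for your environment
joblib.dump(dga_model, model_filename)
```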
Once you run the code block, you should see the model saved and the model accuracy printed. This number will vary depending on the data you give it.

Apply dga_model.joblib to Sentinel Data

This code block takes the model we generated in the previous block and runs it against the data we specified in the "query" variable. It uses domain names from the "DeviceNetworkEvents" table of MDE events; "parse_json" was used in our KQL to extract the appropriate sub-field needed for this search. When the model is run against the data, it tries to determine whether any domains in our environment are associated with domain generation algorithms (DGA). If the "IsDGA" column contains a value of "True", the model has determined that the characteristics of that domain match a DGA. (A sketch of this step is included after this section.)

Output All Results to CSV

This code block outputs all the results above to a CSV called "dgaresults.csv". Change the "output_path" variable to match your environment.

Filter DGA Results to CSV

This code block outputs just the DGA results above to a CSV called "dgaresults2.csv". Change the "output_path" variable to match your environment. (Both export steps are sketched after this section.)

How to Investigate these Results Further

You can take the domains that match DGA and find the correlating IPs to see whether they match any threat intelligence; a possible TILookup sketch follows below. Correlate findings with other security logs, such as firewall or endpoint data, to uncover patterns of malicious behavior. This approach helps pinpoint potential threats and enables proactive mitigation. We can also create logic apps to automate follow-on analysis of these notebooks. This will be covered in a later blog.
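As a rough illustration of the "Apply dga_model.joblib to Sentinel Data" step, the sketch below loads the saved model and scores domains returned from DeviceNetworkEvents. The KQL and field handling here are assumptions (the notebook's own query uses parse_json to pull the domain sub-field), as is the label comparison; adapt them to your data.

```python
import joblib

dga_model = joblib.load("dga_model.joblib")   # path from the training step

# Illustrative query: the notebook's actual KQL extracts the domain with parse_json
query = """
DeviceNetworkEvents
| where isnotempty(RemoteUrl)
| extend Domain = tostring(parse_url(RemoteUrl).Host)
| where isnotempty(Domain)
| distinct Domain
"""
results_df = qry_prov.exec_query(query)       # qry_prov from the connection step above

# Flag rows whose predicted label indicates a DGA; adjust the comparison to whatever
# label values your domain.csv uses (e.g. "dga"/"legit" or 1/0)
results_df["IsDGA"] = dga_model.predict(results_df["Domain"]) == 1
```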
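The two CSV export steps reduce to straightforward pandas calls; the file names follow the description above and the output_path values are environment-specific.

```python
# All scored domains
output_path = "dgaresults.csv"
results_df.to_csv(output_path, index=False)

# Only the rows the model flagged as DGA
output_path = "dgaresults2.csv"
results_df[results_df["IsDGA"]].to_csv(output_path, index=False)
```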
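For the follow-on investigation, one option (not part of the notebook itself) is to run the flagged domains through msticpy's TILookup against whatever threat intelligence providers are configured in msticpyconfig.yaml. The import path assumes msticpy 2.x; older releases expose TILookup from msticpy.sectools.

```python
from msticpy.context import TILookup

ti = TILookup()
dga_domains = results_df.loc[results_df["IsDGA"], "Domain"].unique()

# Look up each flagged domain against the configured TI providers
for domain in dga_domains:
    result = ti.lookup_ioc(observable=domain, ioc_type="dns")
    print(domain, result)
```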
Optimisation For Abnormal Deny Rate for Source IP

Hi, I have recently enabled the "Abnormal Deny Rate for Source IP" alert in Microsoft Sentinel and found it to be quite noisy, generating a large number of alerts, many of which do not appear to be actionable. I understand that adjusting the learning period is one way to reduce this noise. However, I am wondering if there are any other optimisation strategies available that do not involve simply changing the learning window. Has anyone had success with tuning this rule using:
- Threshold-based suppression (e.g. a minimum deny count)?
- Source IP allowlists?
- Frequency filters (e.g. repeated anomalies over multiple intervals)?
- Combining with other signal types before generating alerts?

Open to any suggestions, experiences, or best practices that others may have found effective in reducing false positives while still maintaining visibility into meaningful anomalies. Thanks in advance.
New Blog Post | Introduction to Machine Learning Notebooks in Microsoft Sentinel

Read the full blog post here: Introduction to Machine Learning Notebooks in Microsoft Sentinel

It has never been harder to keep hybrid environments secure. Microsoft's Security Research teams are observing an increasing number and complexity of cybercrimes across all sectors of critical infrastructure, from targeted ransomware attacks to increasing password and phishing campaigns on email, according to the Microsoft Digital Defense Report. The 2022 Cost of Insider Threats report found that threat incidents have risen by over 44% in the last two years, with associated costs exceeding $15.38M per incident per year, up by a third over the preceding years. The report also concluded that there has been a 10.3% increase in the average time taken to contain an incident, from 77 days to 85 days. Advanced tools, techniques, and processes used by threat actor groups allow them to counter obsolete defences and scale their attack campaigns to a broad range of victims, from government organisations to for-profit enterprises.

Original Post: New Blog Post | Introduction to Machine Learning Notebooks in Microsoft Sentinel - Microsoft Tech Community
New Blog Post | Microsoft Sentinel customizable machine learning based anomalies Generally Available

Microsoft Sentinel customizable machine learning based anomalies is Generally Available - Microsoft Tech Community

Security analysts can use anomalies to reduce investigation and hunting time, as well as detect new and emerging threats. Typically, these benefits come at the cost of a high benign positive rate, but Microsoft Sentinel's customizable anomaly models are tuned by our data science team and trained with the data in your Microsoft Sentinel workspace to reduce that noise, providing out-of-the-box value. If security analysts need to tune them further, the process is simple and requires no knowledge of machine learning. Read this blog to find out which capabilities were supported in Public Preview and how to tune anomalies: Democratize Machine Learning with Customizable ML Anomalies - Microsoft Tech Community. In this blog, we will discuss how customizable machine learning based anomalies have improved since Public Preview.

Original Post: New Blog Post | Microsoft Sentinel customizable machine learning based anomalies Generally Available - Microsoft Tech Community