Ingestion
Calculating Data Latency
When using Azure Data Explorer to process near-real-time data, it's often important to understand how quickly or slowly the data arrives in the source table. For this post, we'll assume that our source data has an EventTime field which denotes when the event actually happened on the source entity. The quickest way to determine latency is to look for the latest EventTime and compare it to the current time. If you do this repeatedly, you'll get a rough idea of how often the table is getting updated and how fresh the data is.

```kusto
MyEventData
| summarize max(EventTime)
```

We can do a lot better than that, though. In the background, Kusto keeps track of the time at which every row became ready to be queried. That information is available through the ingestion_time() scalar function. Comparing the ingestion time to the EventTime shows the lag for every row:

```kusto
MyEventData
| project lag = ingestion_time() - EventTime
```

At this point I can run some basic aggregations like min, avg, and max (a quick sketch of such an aggregation appears at the end of this post), but let's do more and build a cumulative distribution function for the latency. This will tell me how much of the data arrives within X minutes of the event time. I'll start by creating a function which calculates the cumulative distribution for a table of two values. This function uses the invoke operator, which receives the source of the invoke as a tabular parameter argument.

```kusto
.create-or-alter function CumulativePercentage(T:(x:real,y:real)) {
    let sum = toreal(toscalar(T | summarize sum(y)));
    T
    | order by x asc
    | summarize x=make_list(x), y=make_list(y/sum * 100)
    | project x = x, y = series_iir(y, dynamic([1]), dynamic([1,-1]))
    | mv-expand x to typeof(real), y to typeof(real)
}
```

Now we need to get our ingestion data into the format that the CumulativePercentage function requires, invoke that function, and render a linechart with the results.

```kusto
MyEventData
| project lag = round((ingestion_time() - EventTime)/1m, 1)
| summarize count() by lag
| project x=toreal(lag), y=toreal(count_)
| invoke CumulativePercentage()
| render linechart
```

Now I can see that if I wait 2.6 minutes, about 48% of the data will have arrived in Kusto. That information is handy if I'm doing manual debugging on logs, setting up a scheduled job to process the data, or monitoring the latency of various data sources.

[Update 3/12/2019] Replaced mvexpand and makelist with the newer/preferred versions: mv-expand and make_list.
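As mentioned above, basic aggregations over the per-row lag are often enough for a first look. Here's a minimal sketch, assuming the same hypothetical MyEventData table and EventTime column used throughout this post:

```kusto
// Minimal sketch: summary statistics of ingestion lag.
// MyEventData and EventTime are the same hypothetical names used above.
MyEventData
| extend lag = ingestion_time() - EventTime
| summarize min(lag), avg(lag), max(lag), percentiles(lag, 50, 90, 99)
```

Percentiles are usually more informative than the maximum here, since a single very late row can skew max(lag) badly.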
Ingesting .CSV log files from Azure Blob Storage into Microsoft Sentinel

Overview:
Organizations generate vast amounts of log data from various applications, services, and systems. These logs are often stored in .CSV (Comma-Separated Values) format in Azure Blob Storage, a scalable cloud-based storage solution. To enhance security monitoring, compliance, and threat detection, it is important to bring this log data into a centralized security tool like Microsoft Sentinel. The main goal is to automatically collect and analyze .CSV log files stored in Azure Blob Storage using Sentinel's advanced analytics and automation capabilities. This enables better visibility into security events and helps in proactive threat management.

Benefits:
- Flexible log ingestion via Logic App: allows ingestion of logs from systems without built-in Sentinel connectors, including custom, third-party, or legacy systems.
- Uses existing storage workflows: reuses the Azure Blob Storage where logs are already being saved, with no need to change current export methods.
- Structured and clean data format: .CSV files offer a structured format that makes mapping and parsing data into Sentinel efficient and reliable.
- Enables custom analysis: once in Sentinel, the data can be queried using Kusto Query Language (KQL) for in-depth analysis and reporting.
- Operational efficiency: reduces manual effort in collecting, uploading, or processing logs, and saves time for IT and security teams by automating the data pipeline.
- Improves threat visibility: ingested data is available in real time, and dashboards and visualizations make it easy to understand what's happening.

Pre-requisites:
- Log Analytics workspace: a configured workspace to receive and analyze the ingested data.
- Blob Storage path: the exact location in Azure Blob Storage where the CSV log files are stored.
- Required roles and permissions:
  - Microsoft Sentinel Contributor: to manage Sentinel resources.
  - Logic App Contributor: to create and manage automation workflows.
  - Access to the storage account: to read and retrieve log files from Blob Storage.

Implementation Steps:
1. Trigger configuration: configure the Logic App trigger to run whenever a new blob is added or an existing one is modified. Select the storage account and container details, then configure the recurrence based on how frequently data is uploaded to the storage account. Choose the authentication type used to connect to the storage account.
2. CSV retrieval: use the Logic App action to retrieve the CSV blob content by specifying the exact file path within the container.
3. CSV parsing: use built-in Logic App actions along with regex to parse the CSV content. Apply the Compose action to split the file contents by new lines, converting them into an array for structured processing. The expression used in the SplitLines Compose action is:
   `split(body('Get_blob_content_(V2)'), decodeUriComponent('%0D%0A'))`
   Refer to the Microsoft documentation on Logic Apps workflow expressions for guidance on writing expressions.
   Remove the last (empty) line from the previous output using another Compose action:
   `take(outputs('SplitLines'), add(length(outputs('SplitLines')), -1))`
   Separate the field names using a further Compose action:
   `split(first(outputs('SplitLines')), ',')`
4. Column mapping: repeat the required expression in a Select action to map each column from the CSV file to its corresponding field in the structured output.
   From: `skip(outputs('RemoveLastLine'), 1)`
   Map:
   - `outputs('SplitFieldName')[0]`: `split(item(), ',')?[0]`
   - `outputs('SplitFieldName')[1]`: `split(item(), ',')?[1]`
5. Data ingestion to Sentinel: use the Microsoft Sentinel connector to ingest the parsed data into the appropriate table. The connection is configured using the workspace ID, shared key, and target table name.

Key Highlights:
- The Logic App is triggered whenever a file is added or modified in the Blob container.
- The CSV content is parsed within the Logic App before being ingested into Sentinel.
- The Microsoft Sentinel connector is used to ingest the parsed data into Sentinel.
- To support dynamic updates, we recommend overwriting the existing CSV file in the storage account.

Outcome:
Log visibility in the Sentinel workspace: once the Logic App is triggered, the custom table is created automatically in Microsoft Sentinel, and logs can be viewed by running a KQL query in the Sentinel workspace (see the sketch after the conclusion below).

Conclusion:
Ingesting .CSV log files from Azure Blob Storage into Microsoft Sentinel is a powerful way to centralize and automate the organization's security monitoring. It enhances visibility, supports compliance, and empowers security teams with timely insights and alerts.
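As a follow-up to the outcome above, here is a minimal KQL sketch of how the ingested rows might be inspected. The table name CustomCsvLogs_CL is hypothetical; custom log tables in Log Analytics typically carry the _CL suffix, so substitute whatever table name you configured in the connector.

```kusto
// Hedged sketch: CustomCsvLogs_CL is a hypothetical custom table name;
// replace it with the target table configured for the Logic App.
// TimeGenerated is stamped on rows at ingestion time.
CustomCsvLogs_CL
| where TimeGenerated > ago(24h)
| summarize IngestedRows = count() by bin(TimeGenerated, 1h)
| render timechart
```

A query like this makes it easy to confirm that the Logic App is firing on schedule and that row counts match what lands in the storage account.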
Kusto .NET and .NET Standard SDKs are public on NuGet.org

We are thrilled to announce that the Kusto .NET SDK has been released to NuGet.org. It includes a .NET Standard SDK that is currently available in preview.

What's new?
The Kusto SDK is now officially available from https://nuget.org. It includes .NET (release) and .NET Standard (preview) packages:
- [Release] Microsoft.Azure.Kusto.Data 4.0.3
- [Release] Microsoft.Azure.Kusto.Ingest 4.0.3
- [Preview] Microsoft.Azure.Kusto.Data.NETStandard 4.0.3
- [Preview] Microsoft.Azure.Kusto.Ingest.NETStandard 4.0.3
- [Release] Microsoft.Azure.Kusto.Management 1.0.7
- [Release] Microsoft.Azure.Kusto.Tools 1.1.5 (LightIngest and Kusto.Cli)

See Quickstart: Ingest data using the Azure Data Explorer .NET Standard SDK for ingestion using the .NET SDK.

What's next?
The .NET Standard SDK released today is a preview version, and we will continue to invest in its quality and reliability in the following versions. Currently the .NET Standard SDKs are published as stand-alone packages; we plan to repackage both flavors into a single package. Stay tuned!
How to Monitor Azure Data Explorer ingestion using diagnostic logs (Preview)

Azure Data Explorer provides diagnostic logs for insight into ingestion successes and failures. You can export these operation logs to Azure Storage, Event Hub, or Log Analytics to monitor ingestion status. Logs sent to Azure Storage and Azure Event Hub can in turn be routed to a table in your Azure Data Explorer cluster for further analysis.
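If the diagnostic logs are routed to a Log Analytics workspace, a query along these lines can summarize ingestion health over time. This is a minimal sketch under assumptions: it assumes the logs land in the AzureDiagnostics table and that the relevant categories are named SucceededIngestion and FailedIngestion; verify the exact table, category, and column names in your own workspace.

```kusto
// Hedged sketch: assumes Azure Data Explorer diagnostic logs are routed to
// a Log Analytics workspace and land in the AzureDiagnostics table.
// Category and column names may differ; check your workspace schema.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.KUSTO"
| where Category in ("SucceededIngestion", "FailedIngestion")
| summarize Operations = count() by Category, bin(TimeGenerated, 1h)
| render timechart
```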