Azure Data Explorer (Kusto)

71 Topics

Most effecient way to identify duplicates in data?
We're moving data analytics towards Kusto and one feature we would like to have is to sanity-check our data and find duplicates of data in a dataset. The problem I want to solve is to make sure we don't accidentally ingest duplicates and then report too high sales numers. Our plan now is to introduce an extra column with a sha1 hash of the row and do something like "summarize count() by sha1 | where count_ > 1" ... but that would need to touch every column in the data set. I realize that's in the nature of the problem, but I'd just like to investigate strategies what would make this as effecient as possible. Strategies I've thought of would be to first limit the dataset to a certain timewindow or perhaps by customerId. I also know about the ingestion tags but I don't want to impact extents too much since this is more of a sanity-check. What other strategies could we use to make this operation as efficient as possible?
abergs
Mar 14, 2019 Place Azure Data Explorer
18KViews
0likes
1Comment
Get a permanent URL for Azure Storage Blob?
I have images in Azure Storage Blob. I am trying to get a permanent URL for the images from Azure Storage Explorer. I can right-click on the actual blob container, select "Get Shared Access Signature" but it has a start and expiration time. I know I can set it to expire in 2050 or something like that, but is there a way to just have a URL that I can use with no start/expire date? The URL I get has the start/expire dates in it as shown below. Looking for a way to avoid that if possible. https://storageaccpimtname.blob.core.windows.net/filesareheresomewhere/" & ThisItem.ItemNumber & ".jpg?st=2019-11-22T18%3A16%3A00Z&se=2051-01-01T07%3A59%3A00Z&sp=rl&sv=2018-03-28&sr=c&sig=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" Apologies if not the right forum. Couldn't find one specific to Azure Storage Blobs.
Ed Hansberry
Nov 22, 2019 Place Azure Data Explorer
14KViews
0likes
2Comments
Harnessing the Power of Left-Anti Joins in the Kusto Query Language
The Kusto query language supports a variety of joins. Left-anti might not be among the most common ones used, but it can be one of the most powerful. The docs state that a left-anti join “returns all records from the left side that do not match any record from the right side.” Let’s walk through two ways that this can be used in your processing pipeline. Late-Arriving Data Let’s say that we have an incoming stream if time-series data that we want to process. We have a function called ProcessData(startTime:datetime, endTime:datetime) that periodically gets executed and written to a table called Output via .set-or-append commands. The function processes data between those two timestamps in the parameters. Since we don’t want to end up with duplicate rows, we can’t rerun with the same time window. We can, however, catch the late arriving data for that time window by implementing ProcessData in such a way that it reprocesses all the data in the previous day and then does a left-anti join against the Output table to only return the results haven’t been recorded yet. Anything new gets written to the Output table by the set-or-append command and the duplicates get thrown away. .create-or-alter function with (folder = "demo", skipvalidation = "true") ProcessData (startTime:datetime, endTime:datetime) { let lookback = 1d; let allData = SourceData | where Timestamp >= startTime - lookback and Timestamp < endTime ; OutputTable | join kind = leftanti (allData) on DeviceId, Timestamp } [Update 2019-02-21] The Kusto docs have a good document on dealing with late arriving data. Changelog Left-anti joins can also be used to create a changelog. Let’s say there is a process that is dumping 500,000 rows of data into a table. Those rows contain information about a set of devices. The table gets dropped and replaced every day. We can make a CreateChangelog() function that gets its results written to the Changelog table via set-or-append commands. We can do a left-anti join with the data we already have in Output and only write the rows that have changed. So the CreateChangelog function body would look something like this: DeviceData | where PreciseTimeStamp >= startTime and PreciseTimeStamp < endTime | project DeviceId, DimensionA | join kind = leftanti( Output | project DeviceId, DimensionA ) on DeviceId | project DeviceId, DimensionA, ProcessedTime=now() Now the Output table has a record of every time that a device was added, removed or modified.
Ben Martens
Feb 05, 2019 Place Azure Data Explorer
13KViews
3likes
0Comments
Kusto - Compare multiple returned values
Hi all, I would like to compare the HTTP 5xx and 2xx codes from the W3CIISLog in Azure Monitor using Kusto. How do you return two or more values and then compare against eachother? For example, I can return all 2xx and 5xx values using: search "W3CIISLog"// | where scStatus startswith "2" or scStatus startswith "5" But then I want what each returns into a variable so I can then compare to eachother. Thanks
Solved
Chris Peacock
Nov 12, 2019 Place Azure Data Explorer
12KViews
0likes
4Comments
How can I summarize by case insensitive?
Currently I'm doing 'summarize by tolower(name)' but I suspect that to be suboptimal
Solved
abergs
Feb 20, 2019 Place Azure Data Explorer
9KViews
0likes
2Comments
Using Azure Data Factory orchestrating Kusto query-ingest
In this blog post, we’ll explore how Azure Data Factory (ADF) can be used for orchestrating large query ingestions. With this approach you will learn, how to split one large query ingests into multiple partitions, orchestrated with ADF.
Hauke
Feb 19, 2023 Place Azure Data Explorer Blog
7.9KViews
3likes
1Comment
Ingesting Data To Azure Data Explorer With C#
So you’ve got a shiny new Azure Data Explorer (Kusto) cluster and you want to put some data into it. The easiest way is to import a CSV with Kusto Explorer, but if you want to do it programmatically, there are some nice APIs to get you started. Think of this post a crib notes for the real documentation which covers all of this in detail. This post will walk through the absolute minimum requirement to ingest data with C#. These steps assume that you have already created a cluster and a database. In the Azure portal, go to the query editor for your database and execute: .create table ingestionTest(a:int, b:int, c:int) Create a new Console app and reference the Microsoft.Azure.Kusto.Ingest NuGet package. See the docs for more help with the package. Go to the C# API samples and copy the code from Ingest From Local File(s) using KustoDirectIngestClient into your new console project. Go the the Overview blade for your cluster on the portal and copy the URI value. Pass that value into the KustoConnectionStringBuilder constructor. (Note, normally you would not use direct ingestion. It can be hard on your cluster and requires you to manage a lot of retry logic, etc on your own. Queued ingestion is preferred. In those cases, your connection string would be the "Data Ingestion URI" value.) Update the Initial Catalog to the name of your database and also set that as the value for the kustoDatabase variable lower in the code. The code sample is configured to use an AAD app to authenticate, but for this example, let's just keep it simple and run with your own AAD account. To do this, simply remove the ApplicationKey and ApplicationClientId fields from the initialize block. All you need to set are the FederatedSecurity and InitialCatalog fields. Update the kustoTable variable to be the name of the table that you created in step 1. Create a sample CSV file that has three integers on each row. Put that file name into the "files" list in the code. Run! Because this is using direct ingestion, you should immediately be able to go the Query blade in the portal and query your table. So if you called your table "ingestionTest", execute a query that is just "ingestionTest". You should see the contents of your data there. In a real production scenario, you would be using queued ingestion and you'd probably want to ingest from a blob or a stream. For further reading, checkout a bigger walkthrough of a scenario like I described above and also best practices for ingestion. Congrats! You have now ingested some data into Azure Data Explorer. Expand that out to a millions rows and witness the power of the Kusto query language!
Ben Martens
Jan 31, 2019 Place Azure Data Explorer
7.9KViews
4likes
0Comments
Summarize dynamic array?
Hi, I apologize for my lack of experience, however this is literally my first time using / learning about Azure Data Explorer. I have this data: | project Data1 = Data[0], Data2 = Data[1], Data3 = Data[2] where Data is in the form of an integer array: (something like [9, 12, 24]) I want to summarize the data to produce an output with the most common number for each index, so if Data was: [9, 12, 24] [17, 12, 37] [9, 17, 37] Then the output of the function would be: Data1 9 Data2 12 Data3 37 Is this possible? Thanks!
bphillips09
Aug 16, 2019 Place Azure Data Explorer
6.6KViews
0likes
8Comments
Summarize // remove duplicate
If I have 10 columns and Ii use summarize on more than few, it complains of too many columns. Also I only want to remove duplicates in 1 column while still retaining all other columns in output. How can I accomplish this.
ratoor
Aug 31, 2021 Place Azure Data Explorer
5.2KViews
0likes
1Comment
How do I use Kusto client to create table?
I am trying to automate provisioning of Kusto cluster. So far I have terraform to create the cluster, create the database. I have yet to find a way to create a table. I went down the path of using python but I ran into an issue (see further below) so switched to java. With Java I am getting an error with the query syntax. Can someone point me to the correct syntax that I should be using to create tables with the client? Java Error Caused by: com.microsoft.azure.kusto.data.exceptions.DataWebException: BadRequest_SyntaxError: Request is invalid and cannot be executed. Syntax error: Query could not be parsed: . Query: '.create table FlowState ( id:string, flow:string, flowRunId:string )' Python Error (Note: I have 'adal' installed.) Traceback (most recent call last): File "table_creation.py", line 1, in <module> from azure.kusto.data.exceptions import KustoServiceError File "/.../python3.6/site-packages/azure_kusto_data-0.0.44-py3.6.egg/azure/kusto/data/__init__.py", line 3, in <module> from .request import KustoClient, KustoConnectionStringBuilder, ClientRequestProperties File "/.../python3.6/site-packages/azure_kusto_data-0.0.44-py3.6.egg/azure/kusto/data/request.py", line 17, in <module> from .security import _AadHelper File "/.../python3.6/site-packages/azure_kusto_data-0.0.44-py3.6.egg/azure/kusto/data/security.py", line 10, in <module> from adal import AuthenticationContext, AdalError ModuleNotFoundError: No module named 'adal'
kusto345
Feb 18, 2020 Place Azure Data Explorer
4.5KViews
0likes
2Comments