Latest Discussions
Welcome to Azure Data Explorer (Kusto) Space
Welcome to the Azure Data Explorer (Kusto) space @ TechCommunity. Join us to share questions, thoughts, or ideas about Kusto and receive answers from the diverse Azure Data Explorer community. Our community is here to assist you with any question or challenge, such as creating a new Data Explorer cluster, database, or table, ingesting data, or performing a complex query.

Learn more about Data Explorer (Kusto):
- Azure Data Explorer Documentation
- Course – Basics of KQL
- Query explorer
- Azure Portal
- User Voice
- End to End Lab
- Azure Data Explorer Blog

Investigate your data with Azure Data Explorer (Kusto). Question, comment, or request? Post it here.

BR, Azure Data Explorer product team
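If you are brand new to Kusto, here is a minimal sketch of the basics the community can help with: creating a table, ingesting a few rows inline, and running a query. The table and column names are made up purely for illustration; run each command separately in the query editor.

// Create a tiny table, push two rows into it, then query it.
.create table SampleEvents (Timestamp:datetime, Level:string, Message:string)

.ingest inline into table SampleEvents <|
2021-01-27T09:00:00Z,Info,"AMA started"
2021-01-27T10:00:00Z,Info,"AMA ended"

SampleEvents
| where Level == "Info"
| summarize count() by Level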
Azure Data Explorer Support for Virtual Network (VNet) Deployment

Azure Data Explorer now supports deploying a cluster into a subnet in your Virtual Network (VNet), enabling you to:
- Enforce Network Security Group (NSG) rules on your ADX cluster traffic.
- Connect your on-premises network to the ADX cluster's subnet.
- Secure your Data Connection sources (EventHub/EventGrid) with service endpoints.

The feature is still in private preview; please fill in this form to deploy an ADX cluster into your VNet.
Announcing an Azure Data Explorer AMA on January 27!

We are very excited to announce the next monthly Azure Data Explorer 'Ask Microsoft Anything' (AMA)! This will be happening concurrently with an Azure Data Explorer overview webinar. Watch the webinar and ask questions in real time here. You can register for the webinar here. The AMA will take place on Wednesday, January 27, 2021 from 9:00 a.m. to 10:00 a.m. PT in the Azure AMA space. Add the event to your calendar and view it in your time zone here. An AMA is a live online event similar to a "YamJam" on Yammer or an "Ask Me Anything" on Reddit. This AMA gives you the opportunity to connect with members of the product engineering team, who will be on hand to answer your questions and listen to feedback.

– EricStarker, Jan 06, 2021
Analyzing Spotify streaming history in Azure Data Explorer (Kusto)

Do you love listening to music? You can use Azure Data Explorer (Kusto) to analyze your Spotify listening history. Here's a look at what the post explores; a small sample query follows the list.
- Getting the source data
- Ingesting the data into Kusto (Azure Data Explorer)
- Analyzing the data
- Artists I'm listening to
- Artists and songs I'm skipping
- Artist analytics
- Streaks / binge listening
- Activity days / hours
- Random facts
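As a taste of the kind of analysis the post walks through, a query along these lines surfaces your most-played artists. This is a minimal sketch: the SpotifyHistory table and its ArtistName/MsPlayed columns are assumed stand-ins for whatever schema you end up ingesting from the Spotify export.

// Top 10 artists by play count, with total listening time in minutes.
SpotifyHistory
| summarize PlayCount = count(), TotalMinutes = sum(MsPlayed) / 60000.0 by ArtistName
| top 10 by PlayCount desc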
Orb - A tool to run Live Site using Azure Data Explorer

Orb (Object-based Resource Browser) is a tool used by several internal Azure engineering and support teams to run Live Site for their services – I'm happy to announce that this tool is now available on https://github.com/microsoft/Orb. Orb enables you to reduce the time it takes to investigate Live Site issues in your services by letting you rapidly browse through information based on an object hierarchy. Object hierarchies are user-defined and backed by Azure Data Explorer (ADX) queries – ADX integration allows rapidly searching for objects and discovering hierarchies dynamically based on event logs from your services. Internal Azure services use Orb to map the layout of Azure data centers and cross-service interactions. For example, a user Virtual Machine (VM) is mapped to a host server. A host server is further mapped to a rack, network switch and so forth. This post provides more information on how our internal services emit these event logs.

Orb also allows streamlined access and discovery of common resources like queries, dashboards, web links and scripts across different teams. Within a large organization, it's easy for these resources to get fragmented across team-specific wiki pages and notebooks. With Orb, the entire organization can have a shared and consistent view of these resources. The resources are organized by Object – think of Objects as directories and resources as files in a directory. In a file system, you'd have directories linked together by metadata stored on disk – in Orb, directories or Objects are linked together dynamically based on ADX queries (or PowerShell scripts).

When a resource under an object is clicked, all necessary context about the object is injected into the resource. For example, if you click a saved web link for a VM object and the URL contains a VM Id, the VM Id is automatically injected into the query. If there were more object properties in the URL that needed to be populated (say, VM region), those would also be added in by Orb. In addition to web URLs, this parameter injection works across different resource types like saved ADX queries, PowerShell scripts/terminals and more.

The combination of the object hierarchy and shared resources is what enables rapid information exploration. When a Live Site incident is raised to our team in Azure Compute, we use Orb to quickly navigate from a VM object to the host server. From there, we can look at all the various host resources and dashboards that determine the host server's health. Since Azure Networking has also modeled their objects in Orb, my team can jump to the network switch object the host is running under and then look at all the shared dashboards that determine network switch health. Without this shared resource model, we might previously have had to engage another engineer on the Networking team to get access to the same information.

Under the covers, Orb uses a git repository to store the shared resources and object hierarchy definitions. This allows organizations to control who has read/write permissions on the shared view, and provides audit and rollback capabilities. Users can update the shared view and submit pull requests from within Orb. Orb is available to download and use for free on https://github.com/microsoft/Orb/releases. Windows is the only supported Operating System at this point.
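To make the object-hierarchy idea concrete, here is a rough sketch of the kind of ADX query that could back a VM-to-host edge. The VMSnapshot table and its columns are borrowed from the object-modeling post below; {VMId} stands in for the parameter Orb injects from the selected object.

// Resolve the host server a given VM maps to, using the latest snapshot for that VM.
VMSnapshot
| where VMId == "{VMId}"
| summarize arg_max(PreciseTimeStamp, HostId) by VMId
| project HostId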
Orb ships with a sample ADX cluster and object definitions. To use Orb with your real service data, take a look at https://github.com/microsoft/Orb/blob/master/dist/resources/app/documentation/onboarding.md

– gauravjag, Jul 16, 2019
Object Modeling in Azure Data Explorer

Over the past few years, Azure Data Explorer has become a critical part of running Live Site for most internal Azure services. This post describes a simple approach to modeling objects in your services using event streams ingested into Azure Data Explorer. The sample streams described below are based on real streams used by teams in Azure Compute (including my team) to monitor and manage all Virtual Machines (VMs) and host servers in Azure. We've built a lot of tooling in Azure Compute around the data modeling and query patterns shared below. If you don't already have a pipeline set up to ingest your service logs into Azure Data Explorer, take a look at the https://docs.microsoft.com/en-us/azure/data-explorer/ingest-data-event-hub guide.

Snapshot Events

Object properties are modeled as columns of a table, and each row is a snapshot representation of that object. Let's look at a sample event schema for a VM.

[https://dataexplorer.azure.com/clusters/orbcluster.westus2/databases/orbtestdb?query=H4sIAAAAAAAAAwvzDc5LLCjOyC/h5apRKEnMTlUw5OXi5QIAfqaqehgAAAA=]

VMSnapshot
| take 1

VMId: 3f576830-2d8f-481b-8448-bca561090831
HostId: a3d1a948-53ae-4175-8030-d7d174c427d1
OsImage: RServer-WS2016
State: Started
Region: westus
PreciseTimeStamp: 2019-05-21T19:25:39Z

Our services emit these snapshots periodically and on every column change – the latter is essential for seeing when things changed without waiting on the snapshot period. We do this for every key object modeled in the system – service endpoints, configuration values, VMs, host servers, etc. This snapshot pattern works well for long-lived objects – short-lived objects like requests typically have their properties emitted as part of the initial request processing. You can also consider pushing immutable properties of an object as a separate stream with a lower snapshot frequency.

Query Pattern 1 – What is the latest state of objects?

In the example above, if you wanted to get the latest state of a single object, this is quite straightforward:

[https://dataexplorer.azure.com/clusters/orbcluster.westus2/databases/orbtestdb?query=H4sIAAAAAAAAAwvzDc5LLCjOyC/h5apRKM9ILUpVCPP1TFGwtVVQMk4zNTezMDbQNUqxSNM1sTBM0rUwMbHQTUpONDUzNLA0sDA2VFKoUSjJL1AwVEiqVAgoSk3OLE4NycxNDS5JzC1QSEktTubl4uUCALB4YctnAAAA]

VMSnapshot
| where VMId == "3f576830-2d8f-481b-8448-bca561090831"
| top 1 by PreciseTimeStamp desc

What if you wanted the latest snapshot for several objects at a time? This is where https://docs.microsoft.com/en-us/azure/kusto/query/arg-max-aggfunction is helpful. arg_max finds the rows corresponding to the expression that needs to be maximized. If there are n VMs in your query, arg_max can be used to maximize PreciseTimeStamp for every unique VMId – the end result will be n rows, where each row corresponds to the latest snapshot for that VM. For example, the query below finds the latest VM state distribution in the region westus.

[https://dataexplorer.azure.com/clusters/orbcluster.westus2/databases/orbtestdb?query=H4sIAAAAAAAAA13LvQ6CMBSG4Z2Ee/jCVIyTO+4OJEYIqznCSdvhtKQ/QYgXL7rp/D7v0HaO5mh8KosXFsOBcWNtvUPToFo4phwrkJtwDTzayL0V7hLJjDNIe3Uy9WeNWYSC3RgU9F3oqf79EYcajxVDe5nwu4w+u6S+daeJsfeyeAOSaydHnAAAAA==]

VMSnapshot
| where Region == "westus" and PreciseTimeStamp > ago(2h)
| summarize arg_max(PreciseTimeStamp, *) by VMId
| summarize count() by State

The other thing to note is that the query looks back 2 hours here – 2x the snapshot interval, which is hourly for this table.
Since the full snapshot event is emitted on every property change for the VM, this query gives you the latest state of every object in Azure Data Explorer, not just what was visible at the hourly snapshot. Also, the query result might include VMs that have already been deleted. To deal with this, we model the deletion itself as a state on the snapshot, so that such results can be filtered out if required.

Since we also model every service configuration value and the whole service itself as an object, we use the pattern above to track each config value and build version as rollouts occur in the fleet. For example, the query below gives us the version distribution of our services running across Azure host servers.

[https://dataexplorer.azure.com/clusters/orbcluster.westus2/databases/orbtestdb?query=H4sIAAAAAAAAA1WMOw7CMAxA90q9g8eCegpYwoYIOzKJ1VrCcRU7CBCHp4KJ+X2Cmkeqd04UCy42q/fdG6yJYOUXAdbpIvgYjpUSG51ZKDrKMsJ2A9cnhHVwyPAf5aSt+PBjX+1EE2sZYdf4lvcqwh7Q5rXruw/WpJfehAAAAA==]

HostServiceSnapshot
| summarize arg_max(PreciseTimeStamp, *) by HostId
| summarize dcount(HostId) by Region, BuildCommitHash

The BuildCommitHash in our snapshots enables us to track every code change committed as it's deployed across the fleet. We have some tooling that takes in a commit, gets the latest snapshots of the service, runs https://git-scm.com/docs/git-merge-base and provides the engineer with the exact spread of host servers that have that fix. We also use similar queries to set up reports using Microsoft Flow to track build versions, A/B testing of config, and more.

Query Pattern 2 – When did a Property Value Change?

Let's say you wanted to know when the version of a host service changed using snapshots – you could do a self-join with the same table, though this makes for a complicated query. This pattern is solved quite cleanly with the https://docs.microsoft.com/en-us/azure/kusto/query/sortoperator and https://docs.microsoft.com/en-us/azure/kusto/query/prevfunction operators. The prev operator allows you to access rows before the current row being processed (from a sorted dataset). To find when a value changed, you can now compare row n to row n-1 quite easily.

[https://dataexplorer.azure.com/clusters/orbcluster.westus2/databases/orbtestdb?query=H4sIAAAAAAAAA12OvQ6CMBDHdxLe4WSShCYiGHBg0QU3E3yBsz1CDbWkLSoJD2+ZDE53ud/9P2ptXUPmJTk1Txxsp10YzPDuyBDUHl4EVBVEmIkUj3nJDhkSy9PiwMpdtmOiEGmR83zvZ7QorTYO7hNcDXFp6SYVNQ7VAGg5LA/0cfQU4JPGXjTUE3dQgWzb7WmUvThrpaSr0XawqWAw9Pq/xwk4M1ICLfaW4l/dleXstfrhN4ZvnNYsDL4465vk9wAAAA==]

HostServiceSnapshot
| where HostId == "a3d1a948-53ae-4175-8030-d7d174c427d1"
| sort by PreciseTimeStamp asc
| extend shouldSelect = iff(BuildCommitHash != prev(BuildCommitHash), true, false)
| where shouldSelect
| project-away shouldSelect

The query above gives you the service changes for one host server. It can be modified to look at all host servers by sorting by HostId and then PreciseTimeStamp. The query below also creates a new column that contains the old value and the new value being transitioned to.
[https://dataexplorer.azure.com/clusters/orbcluster.westus2/databases/orbtestdb?query=H4sIAAAAAAAAA3VQSw6CMBDdN+kdxq4gqUfQhWxwZ4IXKGUiNUJJO6ImHN4KxGjF5bw375OXW08Fut5oLFrV+doSZwN46wjKB+SB3lcSDg618Xg0DRakmg6U168/vBO2FXQO+93VXKrMNo2hXPkaNiOaRHD6oSq/qaxW7QmDTiuPCWdTNqxmo+lMJQghOYvT5q8YluDJaUXJAhmcYL0FISHuGPyFGJveanT4p2iIFAKGkGvPqOlnI/leb1nPGWdPaTLWTH8BAAA=]

HostServiceSnapshot
| sort by HostId, PreciseTimeStamp asc
| extend prevBuildCommitHash = prev(BuildCommitHash)
| extend buildCommitHashChange = case(
    HostId != prev(HostId), "",
    BuildCommitHash != prevBuildCommitHash, strcat(prevBuildCommitHash, " -> ", BuildCommitHash),
    "")
| where buildCommitHashChange != ""
| project PreciseTimeStamp, HostId, buildCommitHashChange

Query Pattern 3 – Event Correlation with Object Properties

Besides these snapshot events, our services push a lot of other traditional log events to Azure Data Explorer. What if you wanted to get the last known property of an object as another log event occurs? For example, let's say you wanted to get a property of the host server right before a Windows event occurs. The simplest option might be to log this property on the Windows event table itself – however, this approach doesn't scale well when an object has hundreds of properties. One of the biggest advantages of having both your object snapshots and log events in Azure Data Explorer is that it enables you to easily combine this information – you can take any log event, find the object associated with that event, pull in properties for that object, and add them to the original event, making it seem like that property was logged natively on the event. This pattern is based on the time window joins described at https://docs.microsoft.com/en-us/azure/kusto/query/join-timewindow. The query below takes all host Windows events and adds the last known BuildCommitHash of the host server to each event. Note that events without a matching snapshot prior to the event are dropped here, though this can be tweaked depending on your use case. The lookback window here is 2 hours, twice the snapshot interval for this object.

[https://dataexplorer.azure.com/clusters/orbcluster.westus2/databases/orbtestdb?query=H4sIAAAAAAAAA41SS2/TQBC+R8p/GHqyi3FoEVxQOJSHWiEeUipxRBvvON7Eu2t213GD+uOZcd3YTo0glyS7s99rvhIDlNbu6uqHMtI2sITL4u18Vh7Pr5Shw9HMAi7TlzT08Nd/3KMJMJ/dA94FNBJulcbPeKBnTpgNRmtlou8OM+WRr1ZB6ApejDCTni1OCAv++ZkEfd7DjBD/B7AfZyt6j3eVGJgJFsKhQptHUgQMdBp3D3l8aymmHVmhqDAPtg7oIGppr60PK3R7leHKiMoXNgDcP3isnN1iFuDbmr8umItSO7WVtBg3EhK4qlUp31utVbgWvhhE1Yc+lUwysheDNT1m93KwQHb3QfkgTIYXhJgJj2TmhByWSzg7SygZnveUVvTqzWtJ+1ss4KuFo9vc1kamcLMx1iGEAj0C6bNO+nQ+G3p/N+F9Ev6IrTwYbCjsUAgDyFUcMNGt8NBgWRLRRAUH3DHDkrmKVsca2xBAdinAGkODaNqblgW4HL6Twdk1BTo8ie7Z8ol8puEfrHxb+wC1R8kqBexFWSMXLVcl94daBMrQqZKgRcgoOCbytdbCqd8IhSK3vqjzvMRdu/tuqcJtfmqqwUhNAucxrA/dTPIkalb2SfH6yWNWWo+k7tFhCqueh6S3qkmqrYiDtfyq0R2gQpdbp5kvZa2PBf9bpSca/QW9FxukZv8BUQH+WZ0EAAA=]

let lookupWindow = 2h;
let lookupBin = lookupWindow / 2.0;
WindowsEvent
| extend TimeKey = range(bin(PreciseTimeStamp - lookupWindow, lookupBin), bin(PreciseTimeStamp + lookupBin, lookupBin), lookupBin)
| mvexpand TimeKey to typeof(datetime)
| join kind= leftouter (
    HostServiceSnapshot
    | project Object1Time = PreciseTimeStamp, HostId, BuildCommitHash, TimeKey = bin(PreciseTimeStamp, lookupBin)
) on HostId, TimeKey
| extend timeDistance1 = case(
    BuildCommitHash == "", totimespan(365d), // No snapshot found. Ignore these records.
    Object1Time > PreciseTimeStamp, totimespan(365d), // Snapshot is newer than the event. Ignore this as well.
    PreciseTimeStamp - Object1Time) // Compute the time distance between the event and the snapshot
| where timeDistance1 != totimespan(365d) // 365d is just used as a sentinel value to filter out invalid matches
| summarize hint.shufflekey = HostId arg_min(timeDistance1, *) by HostId, PreciseTimeStamp // Find the closest snapshot. Shufflekey is used to optimize query performance.
| project PreciseTimeStamp, HostId, BuildCommitHash, Message

This pattern allows most log events to carry only object identifiers and pull in all properties about the object dynamically using snapshots. This is especially useful in cases where the service that emits log events doesn't have all the relevant properties available about the object – a situation that is becoming more prevalent with microservices. Internally, we use similar queries for A/B testing and automated anomaly detection.

In a follow-up post, I'll be sharing more details about a Live Site tool built around these object snapshots that will be available on GitHub soon. Edit: The follow-up post on Orb is here.

– gauravjag, Jul 05, 2019
How to save $$$ by stopping and starting your Azure Data Explorer cluster

Azure Data Explorer is worth your while – now see how you can get the best bang for your buck while using it. Read my new blog post here: https://radennis.github.io/Ravit-Blog/blogs/SaveMoneyUsingFlow.html
Ingesting Data To Azure Data Explorer With C#

So you've got a shiny new Azure Data Explorer (Kusto) cluster and you want to put some data into it. The easiest way is to import a CSV with Kusto Explorer, but if you want to do it programmatically, there are some nice APIs to get you started. Think of this post as crib notes for the real documentation, which covers all of this in detail. This post will walk through the absolute minimum required to ingest data with C#. These steps assume that you have already created a cluster and a database.

1. In the Azure portal, go to the query editor for your database and execute:

.create table ingestionTest(a:int, b:int, c:int)

2. Create a new Console app and reference the Microsoft.Azure.Kusto.Ingest NuGet package. See the docs for more help with the package.
3. Go to the C# API samples and copy the code from "Ingest From Local File(s) using KustoDirectIngestClient" into your new console project.
4. Go to the Overview blade for your cluster in the portal and copy the URI value. Pass that value into the KustoConnectionStringBuilder constructor. (Note: normally you would not use direct ingestion. It can be hard on your cluster and requires you to manage a lot of retry logic, etc. on your own. Queued ingestion is preferred; in that case, your connection string would be the "Data Ingestion URI" value.)
5. Update the Initial Catalog to the name of your database and also set that as the value for the kustoDatabase variable lower in the code.
6. The code sample is configured to use an AAD app to authenticate, but for this example, let's keep it simple and run with your own AAD account. To do this, simply remove the ApplicationKey and ApplicationClientId fields from the initializer block. All you need to set are the FederatedSecurity and InitialCatalog fields.
7. Update the kustoTable variable to be the name of the table that you created in step 1.
8. Create a sample CSV file that has three integers on each row. Put that file name into the "files" list in the code.
9. Run!

Because this is using direct ingestion, you should immediately be able to go to the Query blade in the portal and query your table. So if you called your table "ingestionTest", execute a query that is just "ingestionTest". You should see the contents of your data there. In a real production scenario, you would be using queued ingestion, and you'd probably want to ingest from a blob or a stream. For further reading, check out a bigger walkthrough of a scenario like the one I described above, and also the best practices for ingestion. Congrats! You have now ingested some data into Azure Data Explorer. Expand that out to a million rows and witness the power of the Kusto query language!

– Ben Martens, Jan 31, 2019
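As a quick sanity check on step 9 (a sketch assuming the table name from step 1), a couple of queries in the Query blade confirm the rows landed:

// Row count should match the number of rows in your CSV.
ingestionTest
| count

// Eyeball a few rows.
ingestionTest
| take 10

If you later switch to queued ingestion, the .show ingestion failures command is also useful for diagnosing rows that never arrive.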
Check out what we announced at //Build 2019

Here is a roundup of all the product news at //Build 2019 related to Azure Data Explorer: https://techcommunity.microsoft.com/t5/Azure-Data-Explorer/What-s-new-in-Azure-Data-Explorer-at-Build-2019/ba-p/547984#.XNMXJ1x_lBs.twitter What could we do better or do more of? Tell us in the comments!
Harnessing the Power of Left-Anti Joins in the Kusto Query Language

The Kusto query language supports a variety of joins. Left-anti might not be among the most common ones used, but it can be one of the most powerful. The docs state that a left-anti join "returns all records from the left side that do not match any record from the right side." Let's walk through two ways that this can be used in your processing pipeline.

Late-Arriving Data

Let's say that we have an incoming stream of time-series data that we want to process. We have a function called ProcessData(startTime:datetime, endTime:datetime) that periodically gets executed and written to a table called Output via .set-or-append commands. The function processes data between the two timestamps in its parameters. Since we don't want to end up with duplicate rows, we can't rerun with the same time window. We can, however, catch the late-arriving data for that time window by implementing ProcessData in such a way that it reprocesses all the data in the previous day and then does a left-anti join against the Output table to return only the results that haven't been recorded yet. Anything new gets written to the Output table by the .set-or-append command, and the duplicates get thrown away.

.create-or-alter function with (folder = "demo", skipvalidation = "true") ProcessData (startTime:datetime, endTime:datetime)
{
    let lookback = 1d;
    let allData = SourceData
    | where Timestamp >= startTime - lookback and Timestamp < endTime;
    allData
    | join kind = leftanti (Output) on DeviceId, Timestamp // keep only rows not already recorded in Output
}

[Update 2019-02-21] The Kusto docs have a good document on dealing with late-arriving data.

Changelog

Left-anti joins can also be used to create a changelog. Let's say there is a process that is dumping 500,000 rows of data into a table. Those rows contain information about a set of devices, and the table gets dropped and replaced every day. We can make a CreateChangelog() function that gets its results written to the Changelog table via .set-or-append commands. We can do a left-anti join with the data we already have in Output and write only the rows that have changed. So the CreateChangelog function body would look something like this:

DeviceData
| where PreciseTimeStamp >= startTime and PreciseTimeStamp < endTime
| project DeviceId, DimensionA
| join kind = leftanti (
    Output
    | project DeviceId, DimensionA
) on DeviceId, DimensionA // matching on both columns catches added and modified devices
| project DeviceId, DimensionA, ProcessedTime = now()

Now the Changelog table has a record of every time that a device was added or modified.
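The functions above are driven by .set-or-append commands on a schedule. A minimal sketch of one such run follows; the one-hour window is an assumed cadence, so pick whatever fits your pipeline.

// Append only the rows that the left-anti join lets through.
// .set-or-append also creates the target table on the first run.
.set-or-append Output <|
ProcessData(ago(1h), now())

Because the left-anti join filters out anything already recorded, replaying an overlapping window is safe and effectively idempotent.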