
Azure Data Explorer Blog

Sending Azure Cost Management to Azure Data Explorer/Fabric KQL DB

bwatts670
Microsoft
Sep 12, 2023

Understanding your Azure spend is one of the most important things you do as an Azure customer. Azure Cost Management is built into the platform to provide you with insights, but we live in a world of data, and looking at the Azure Cost Management data in a silo may not meet your organization's needs. In those situations, you can solve that need by putting your Cost Management data into an analytical platform like Azure Data Explorer or a Microsoft Fabric KQL Database. There you can bring in or join additional data that's useful, run ad-hoc queries, and build visualizations that tie it all together.

 

Using the repository below, you'll be able to use Azure Cost Management exports to set up an automated process that ingests the cost data into ADX or a Fabric KQL Database.

 

Components

The system consists of an Azure Data Explorer cluster, an Azure Data Factory, and a Storage Account.

 

An admin using AzCLI, or an Azure Cost Management automatic export, pushes usage data into blob containers in the storage account. Azure Data Factory, via an Event Grid trigger, automatically loads new data into Azure Data Explorer as it arrives in the blob containers.
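
As a rough illustration of what the ADX/KQL Database side needs before ingestion can start, the sketch below creates a destination table and a CSV ingestion mapping. The table name, columns, and mapping name are assumptions for illustration only, not the schema the repository actually deploys.

// Hypothetical destination table for the exported cost data.
.create table UsageCosts (
    UsageDateTime: datetime,
    ResourceId: string,
    MeterId: string,
    CostInBillingCurrency: real,
    Tags: dynamic
)

// CSV ingestion mapping so incoming export columns are mapped by position.
.create table UsageCosts ingestion csv mapping 'UsageCosts_CsvMapping'
'[{"Name":"UsageDateTime","Ordinal":0},{"Name":"ResourceId","Ordinal":1},{"Name":"MeterId","Ordinal":2},{"Name":"CostInBillingCurrency","Ordinal":3},{"Name":"Tags","Ordinal":4}]'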

 

Architecture

 

Installation

There are two methods to deploy this architecture.

  1. Click to implement this using AzCLI (automatic):

  2. Click to implement this using a guided walkthrough (manual):

Sample Results

Power BI Report

 

 

For more information about dealing with duplicates or updating records, click here!

Updated Aug 25, 2023
Version 1.0

6 Comments

  • mco365 thanks for the response. Just to address a couple of items:

    -- ADF is just one option. You could easily implement the same logic with an Azure Function. 

    -- wernerzirkel we aren't deduping in ADF, just dropping extents. That step takes less than a second, so it doesn't really add cost. I looked at MVs to do the dedup but wasn't comfortable that the solution would be 100% accurate. That's the only reason I didn't go that route.

    -- The method being used to delete the records is 100% recommended. It's using tags on extents and then dropping the entire extents. This is very efficient (see the sketch after this reply).

    -- With the current method of exporting the data, each day produces a new CSV that contains month-to-date data, and corrections could have been made to previous days. I believe this is the simplest method. There are certainly other ways of doing this, and at the end of the day, as long as the correct data gets ingested, they are just as correct!

    -- You can definitely put the data in Azure SQL if you prefer. I don't think that's the best analytic store for this data, but it is an option. I will say that the Azure Cost Management team has also decided that ADX is a better store for this data than Azure SQL.

     

    Sorry about the long-winded reply. Just getting around to responding and wanted to answer everything at once!

     

    I'd love to evolve this solution, so anyone interested in contributing is welcome to submit ideas via the GitHub repo!
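
    To illustrate the tag-based extent drop mentioned above, here is a minimal sketch. The table name and tag value are made up for the example and are not the ones the repository uses.

    // Each load is ingested with a drop-by tag identifying the export it came from, e.g.:
    // .ingest into table UsageCosts (...) with (tags = '["drop-by:2023-09"]')
    // Before ingesting a refreshed month-to-date file, drop the extents carrying that tag:
    .drop extents <| .show table UsageCosts extents where tags has 'drop-by:2023-09'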

  • wernerzirkel
    Copper Contributor

    mco365 agreed, Marius. If I had the choice for this use case, I would also look first at a traditional database instead of ingesting this data into Kusto. The reason is simple: I believe Kusto is best in class for massive amounts of log files, IoT, and sensor data. But this use case sounds more like a standard reporting scenario, so Azure SQL DB would be my first idea (assuming we stay in the Microsoft Azure world).

     

    Of course, deduplication will always matter, independent of which database we use. In our Kusto example, I would not want to do deduplication in ADF. As you already mentioned, it would be costly, and assuming our data changes over time, complexity will increase and so will the operations effort.

     

    I would simply ingest all the data into Kusto and then have a materialized view on top of it that does the deduplication. The ingest table's retention policy would be set to only a few days. This solution can be set up in a few minutes with almost no operations effort.
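
    For illustration, a minimal sketch of that setup, with made-up table, view, and column names:

    // Keep the raw ingest table for only a few days.
    .alter-merge table CostRaw policy retention softdelete = 3d

    // Materialized view that deduplicates by keeping one row per logical key.
    .create materialized-view CostDedup on table CostRaw
    {
        CostRaw
        | summarize take_any(*) by UsageDateTime, ResourceId, MeterId
    }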

  • mco365
    Iron Contributor

    Having already implemented something similar, I'm curious about the architectural choices around dealing with duplicates (e.g. assuming you are using MTD exports, this means you need to filter out the prior day's delta of costs, and using ADF will turn out to be costly for processing this data in large organisations). Deleting content in ADX is also not recommended, hence ideally cleaning should be performed as close to the source as possible. One way we addressed this is with WTD export jobs in ACM, and then processing those.

    Also, there are better ways, in my experience, than ingesting all the data, for example using external files. We could go on improving this with native query folding, using functions or lookups instead of pure queries, filtering out unreserved columns, etc.
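
    For reference, the "external files" idea could look roughly like the external table below, which queries the exported CSVs in place instead of ingesting them. Names, schema, and the connection string are placeholders.

    .create external table CostExports (
        UsageDateTime: datetime,
        ResourceId: string,
        MeterId: string,
        CostInBillingCurrency: real
    )
    kind = storage
    dataformat = csv
    (
        h@'https://<storageaccount>.blob.core.windows.net/<container>;impersonate'
    )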

  • wernerzirkel
    Copper Contributor

    Both options are possible: a) create an Event Subscription/Hub to ingest the data, or b) set up a trigger in ADF which starts an ADF pipeline.

  • irwins
    Copper Contributor

    I like your version! I haven't played with Event Hubs. Is it necessary, or can Event Grid trigger the ADF pipeline as well?

  • wernerzirkel
    Copper Contributor

    Here is an additional option for how to implement this with no AzCLI required. It might be more suited to people who are familiar with Event Grid. For me it was quite easy to do, and it requires no maintenance/operations.

     

    1. Create the export

    Go to a resource group you want to analyze.

    Click on Cost analysis and create a cost view you would like to export.

    Click on Download -> Schedule export -> Export to your blob storage.

     

    2. Set up Event Grid and send to Kusto

    Create an Event Subscription that listens to your storage container.

    Send the data to an Event Hub.

    Ingest the data from your Event Hub into a Kusto table.

     

    Note: simply use distinct or a materialized view to filter out duplicates.
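
    For example, a query-time version of that dedup, assuming a hypothetical CostRaw table:

    // Drop exact duplicate rows coming from overlapping export files.
    CostRaw
    | distinct *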