
Azure Data Explorer Blog

Sending Azure Cost Management to Azure Data Explorer/Fabric KQL DB

bwatts670
Microsoft
Sep 12, 2023

Understanding your Azure spend is one of the most important things you do as an Azure customer. Azure Cost Management is built into the platform to provide you with insights, but we live in a world of data, and looking at the Azure Cost Management data in a silo may not meet your organization's needs. In those situations, you can solve that need by putting your Cost Management data into an analytical platform like Azure Data Explorer or a Microsoft Fabric KQL Database. There you can bring in or join additional data that's useful, run ad-hoc queries, and build visualizations that tie it all together.

 

Using the repository below, you'll be able to use Azure Cost Management exports to set up an automated process that ingests the cost data into ADX or a Fabric KQL Database.

 

Components

The system consists of an Azure Data Explorer cluster, an Azure Data Factory, and a Storage Account.

 

An admin using AzCLI, or an Azure Cost Management automatic export, pushes usage data into blob containers in the storage account. Azure Data Factory, via an Event Grid trigger, automatically loads new data into Azure Data Explorer as it arrives in the blob containers.
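
As a rough illustration of what the ADX/KQL Database side needs before ingestion can start, the sketch below creates a destination table and a CSV ingestion mapping. The table name, columns, and mapping name are assumptions for illustration only, not the schema the repository actually deploys.

// Hypothetical destination table for the exported cost data.
.create table UsageCosts (
    UsageDateTime: datetime,
    ResourceId: string,
    MeterId: string,
    CostInBillingCurrency: real,
    Tags: dynamic
)

// CSV ingestion mapping so incoming export columns are mapped by position.
.create table UsageCosts ingestion csv mapping 'UsageCosts_CsvMapping'
'[{"Name":"UsageDateTime","Ordinal":0},{"Name":"ResourceId","Ordinal":1},{"Name":"MeterId","Ordinal":2},{"Name":"CostInBillingCurrency","Ordinal":3},{"Name":"Tags","Ordinal":4}]'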

 

Architecture

 

Installation

There are two methods to deploy this architecture.

  1. Click to implement this using AzCLI (automatic):

  2. Click to implement this using a guided walkthrough (manual):

Sample Results

Power BI Report

 

 

For more information about dealing with duplicates or updating records, click here!

Updated Aug 25, 2023
Version 1.0

6 Comments

  • mco365 thanks for the response. Just to address a couple of items:

    -- ADF is just one option. You could easily implement the same logic with an Azure Function. 

    -- wernerzirkel we aren't deduping in ADF, just dropping extents. That step takes less than a second, so it doesn't really add cost. I looked at MVs to do the dedup but wasn't comfortable that the solution would be 100% accurate. That's the only reason I didn't go that route.

    -- The method being used to delete the records is 100% recommended. It's using tags on extents and then dropping the entire extents. This is very efficient (see the sketch after this reply).

    -- With the current method of exporting the data, each day produces a new CSV that contains month-to-date data, and corrections could have been made to previous days. I believe this is the simplest method. There are certainly other ways of doing this, and at the end of the day, as long as the correct data gets ingested, they are just as correct!

    -- You can definitely put the data in Azure SQL if you prefer. I don't think that's the best analytic store for this data, but it is an option. I will say that the Azure Cost Management team has also decided that ADX is a better store for this data than Azure SQL.

     

    Sorry about the long-winded reply. Just getting around to responding and wanted to answer everything at once!

     

    I'd love to evolve this solution, so anyone interested in contributing is welcome to submit ideas via the GitHub repo!
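
    To illustrate the tag-based extent drop mentioned above, here is a minimal sketch. The table name and tag value are made up for the example and are not the ones the repository uses.

    // Each load is ingested with a drop-by tag identifying the export it came from, e.g.:
    // .ingest into table UsageCosts (...) with (tags = '["drop-by:2023-09"]')
    // Before ingesting a refreshed month-to-date file, drop the extents carrying that tag:
    .drop extents <| .show table UsageCosts extents where tags has 'drop-by:2023-09'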

  • wernerzirkel
    Copper Contributor

    mco365 agreed, Marius. If I had the choice for this use case, I would also look first at a traditional database instead of ingesting this data into Kusto. The reason is simple: I believe Kusto is best in class for massive amounts of log files, IoT, and sensor data. But this use case sounds more like a standard reporting scenario, so Azure SQL DB would be my first idea (assuming we stay in the Microsoft Azure world).

     

    Of course, deduplication will always matter, independent of which database we use. In our Kusto example, I would not want to do deduplication in ADF. As you already mentioned, it would be costly, and assuming our data changes over time, complexity will increase and so will the operations effort.

     

    I would simply ingest all the data into Kusto and then have a materialized view on top of it that does the deduplication. The ingest table's retention policy would be set to only a few days. This solution can be set up in a few minutes with almost no operations effort.
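
    For illustration, a minimal sketch of that setup, with made-up table, view, and column names:

    // Keep the raw ingest table for only a few days.
    .alter-merge table CostRaw policy retention softdelete = 3d

    // Materialized view that deduplicates by keeping one row per logical key.
    .create materialized-view CostDedup on table CostRaw
    {
        CostRaw
        | summarize take_any(*) by UsageDateTime, ResourceId, MeterId
    }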

  • mco365
    Iron Contributor

    Having already implemented something similar, I'm curious about the architectural choices around dealing with duplicates (e.g. assuming you are using MTD exports, this means you need to filter out the prior day's delta of costs, and using ADF will turn out to be costly for processing this data in large organisations). Deleting content in ADX is also not recommended, hence ideally cleaning should be performed as close to the source as possible. One way we addressed this is with WTD export jobs in ACM, and then processing those.

    Also, there are better ways, in my experience, than ingesting all the data, for example using external files. We could go on improving this with native query folding, using functions or lookups instead of pure queries, filtering out unreserved columns, etc.
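
    For reference, the "external files" idea could look roughly like the external table below, which queries the exported CSVs in place instead of ingesting them. Names, schema, and the connection string are placeholders.

    .create external table CostExports (
        UsageDateTime: datetime,
        ResourceId: string,
        MeterId: string,
        CostInBillingCurrency: real
    )
    kind = storage
    dataformat = csv
    (
        h@'https://<storageaccount>.blob.core.windows.net/<container>;impersonate'
    )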

  • wernerzirkel
    Copper Contributor

    Both options are possible: a) create an Event Subscription/Hub to ingest the data, or b) set up a trigger in ADF which starts an ADF pipeline.

  • irwins
    Copper Contributor

    I like your version! I haven't played with Event Hubs. Is it necessary, or can Event Grid trigger the ADF pipeline as well?

  • wernerzirkel
    Copper Contributor

    Here is an additional option for how to implement this with no AzCLI required. It might be more suited to people who are familiar with Event Grid. For me it was quite easy to do, and it requires no maintenance/operations.

     

    1. Create the export

    Go to a resource group you want to analyze.

    Click on Cost analysis and create a cost view you would like to export.

    Click on Download -> Schedule export -> Export to your blob storage.

     

    2. Set up Event Grid and send to Kusto

    Create an Event Subscription that listens to your storage container.

    Send the data to an Event Hub.

    Ingest the data from your Event Hub into a Kusto table.

     

    Note: simply use distinct or a materialized view to filter out duplicates.
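
    For example, a query-time version of that dedup, assuming a hypothetical CostRaw table:

    // Drop exact duplicate rows coming from overlapping export files.
    CostRaw
    | distinct *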