Azure Data Factory Blog

Granular Billing for Azure Data Factory

ChenyeCharlieZhu
Former Employee
Oct 18, 2022

Overview

By default, Azure Data Factory reports charges as a lump sum: at the factory level, we add up the charges across all pipelines in a factory and tell you how much you have spent on those pipelines in total. In many cases, these aggregate numbers suffice. In others, they lack the clarity and transparency we strive to provide customers. For instance, if you run data pipelines for multiple teams, you may want to determine the cost of each pipeline for proper bookkeeping and/or chargebacks.

Azure Data Factory can now help with this through a built-in, per-pipeline detailed billing view. We built the feature on top of the Azure Billing and Cost Analysis platform, so you can keep using the cost and budget management tools you are already familiar with to identify spending trends and spot where overspending might have occurred.

Billing Report Behaviors

When you opt in to the feature, each of your pipelines gets a separate line item. Charges associated with a pipeline are grouped together under the pipeline name, giving you a clear view of the cost of its operations. You can still get the aggregate view for a factory by filtering the charges by factory name in the Azure billing report.
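
If you export the billing report for offline bookkeeping, the per-pipeline line items can be summed per pipeline, rolled back up per factory, or mapped to teams for chargebacks. The following is a minimal Python sketch, assuming each exported row has already been reduced to hypothetical `factory`, `pipeline`, and `cost` fields (the real export's column names may differ), and assuming a hand-maintained, hypothetical `team_of` mapping from pipeline name to owning team:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LineItem:
    # Hypothetical shape of one exported billing row; real column names may differ.
    factory: str
    pipeline: str
    cost: float


def cost_by_pipeline(items):
    """Sum charges per (factory, pipeline) pair."""
    totals = defaultdict(float)
    for item in items:
        totals[(item.factory, item.pipeline)] += item.cost
    return dict(totals)


def cost_by_factory(items):
    """Roll per-pipeline line items back up to one total per factory."""
    totals = defaultdict(float)
    for item in items:
        totals[item.factory] += item.cost
    return dict(totals)


def cost_by_team(items, team_of):
    """Charge each pipeline's cost to its owning team.

    team_of is a hypothetical mapping of pipeline name -> team; pipelines
    missing from it are collected under "unassigned".
    """
    totals = defaultdict(float)
    for item in items:
        totals[team_of.get(item.pipeline, "unassigned")] += item.cost
    return dict(totals)


# Illustrative sample data, not real billing output.
items = [
    LineItem("adf-prod", "IngestSales", 12.50),
    LineItem("adf-prod", "IngestSales", 2.50),
    LineItem("adf-prod", "ExportFinance", 4.00),
    LineItem("adf-prod", "AdHocFix", 0.75),
    LineItem("adf-dev", "IngestSales", 1.25),
]
team_of = {"IngestSales": "Sales", "ExportFinance": "Finance"}

print(cost_by_pipeline(items))
print(cost_by_factory(items))
print(cost_by_team(items, team_of))
```

Keeping the factory name in the per-pipeline key matters because the same pipeline definition can exist in several factories (for example, development and production), and those charges usually belong in different buckets.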

NOTE: There will be one entry for each pipeline in your factory. Be aware that if your factory contains a very large number of pipelines, this may significantly lengthen your billing report.

NOTE: The change only affects how charges are emitted going forward; it does not change past charges. Please allow some time for the change to appear in the billing report: typically, it is reflected within 1 day.

How to Opt-in

You need to opt in to this feature for every factory you want detailed billing for. To turn on per-pipeline detailed billing:

  1. Go to the Azure Data Factory portal.
  2. On the Manage tab, select Factory settings in the General section.
  3. Select Show billing report (preview) by pipeline.
  4. Publish the change.

This setting is not included in the ARM templates exported from your factory, so Continuous Integration and Delivery (CI/CD) deployments will not overwrite a factory's billing behavior. This neat trick lets you set different billing behaviors for development, test, and production environments, even when they share the same pipeline definitions.

Known Limitations

1. Only Azure Data Factory charges are included

Azure Data Factory runs on Azure infrastructure that accrues costs when you deploy new resources, so other infrastructure costs may accrue on top of the Azure Data Factory charges. For instance, when you move data across availability zones, bandwidth charges will apply. These charges are not included in the per-pipeline billing reports.

2. Certain charges are inherently shared at the factory level

These charges are filed under a fallback line item for your factory:

3. Dataflows with Time-to-Live setting

For now, Dataflows running on an Azure Integration Runtime with a Time-to-Live (TTL) setting are filed under a fallback line item for your factory. We are iterating to improve this experience for our users.
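
Because shared charges and TTL Dataflow charges land in a factory-level fallback line item rather than under a pipeline, a chargeback model has to decide how to split that item. One possible policy, which is an assumption on our part and not something Azure Data Factory does for you, is to spread the fallback amount in proportion to each pipeline's direct charges. A minimal Python sketch, where the function name and both inputs are hypothetical and how you identify the fallback rows in an export is up to you:

```python
def allocate_fallback(pipeline_costs, fallback_cost):
    """Split a factory-level fallback charge across pipelines in proportion
    to each pipeline's direct charges (an illustrative policy, not an ADF feature).

    pipeline_costs: dict of pipeline name -> direct cost for the billing period.
    fallback_cost: total of the factory's fallback line item for the same period.
    Returns a dict of pipeline name -> direct cost plus its allocated share.
    """
    direct_total = sum(pipeline_costs.values())
    if direct_total == 0:
        # No direct spend to weight by: split the shared charge evenly instead.
        share = fallback_cost / len(pipeline_costs) if pipeline_costs else 0.0
        return {name: share for name in pipeline_costs}
    return {
        name: cost + fallback_cost * cost / direct_total
        for name, cost in pipeline_costs.items()
    }


# Illustrative numbers only: 40.00 of direct spend plus an 8.00 fallback item.
allocated = allocate_fallback({"IngestSales": 30.0, "ExportFinance": 10.0}, 8.0)
print(allocated)
```

Proportional allocation keeps the factory total unchanged (the allocated amounts still sum to direct spend plus the fallback item); an even split or a fixed per-team overhead are equally valid alternatives, depending on your bookkeeping rules.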

A Special Note to Existing Private Preview Customers

For all existing customers who previously onboarded to the feature's private preview whitelist: we will continue to honor our commitment and allow detailed billing for your factories. However, for the time being, you may notice a discrepancy with the setting: even when the factory setting says billing by factory, you may still see detailed billing in your billing report.

For existing private preview customers, we recommend the following steps:

  1. Turn on detailed billing for the factories you want detailed billing for
  2. Select billing by factory for all other factories
  3. Once steps 1 and 2 are done, contact your account manager to confirm you are ready for GA

We will gradually turn off the feature whitelist as we progress through the Public Preview and GA stages. Going forward, your factory settings will be the single source of control for detailed billing behavior. Thank you so much for your understanding!

Updated Nov 02, 2022
Version 4.0

20 Comments

  • sim__
    Copper Contributor

    The parameters were just an example.
    All Microsoft needs to do is provide a way to differentiate the costs of generic pipelines.

    That's all I want or need 😉


    Just think of this example:
    One generic pipeline runs for 10 internal customers, each at a different frequency.

    Now I have to allocate the costs to the individual internal customers.

    How can I do that? It's not possible right now.

    I think we need a solution for this.

    The solution cannot be to build separate pipelines or to create separate data factories.

  • souravagasti
    Copper Contributor

    I am just pointing out that an overview won't be of much use if lots of generic pipelines are in use. A drill-through link does sound useful. MS could use the customer's data lake to store the details with a set retention period. People really won't mind paying that extra bit for storage.

  • KoenVerbeeck
    Brass Contributor

    A run ID? You mean something that is unique per execution? For larger ADF implementations, that would mean thousands or even hundreds of thousands of rows per month. You're asking for too much detail for a cost overview. The fact that generic pipelines are a best practice doesn't change this. It's still the same object, and you get an overview of the cost for that object.

    What would be useful, though, is a drill-through link to a report where you get the cost per execution for a specific pipeline.

  • souravagasti
    Copper Contributor

    KoenVerbeeck, why can't MS provide the pipeline run ID in the cost sheet? That way we would know exactly how much each run cost. That would be much more useful than segregating by pipeline name alone, as it would also help the dev team identify whether any particular run was costly. Using generic pipelines/datasets/linked services is an industry-wide best practice, so unless this is enabled, it won't be of much use to teams that use generic pipelines.

  • KoenVerbeeck
    Brass Contributor

    sim__ souravagasti, I get your requests and why a split per parameterized pipeline would be useful, but in my opinion it's a bit of an unreasonable request. A parameterized pipeline is about the same as an SSIS package with parameters. In the logging, there's no way to distinguish between the different parameters; you just get logging for the same package, because it's the same object.

    Suppose there's a split per parameter. What if you have hundreds or even thousands of different parameter values? That would mean hundreds of lines on the invoice. In my opinion, using KQL would be a better option for figuring out how long a given pipeline has been running for a given parameter value.

  • sim__
    Copper Contributor

    I agree with souravagasti.
    We have many pipelines that are running for different internal clients (we do this with different pipeline parameters).
    It would be nice if there was a way to split the costs of the general pipelines.
    Is there a solution for this?

  • souravagasti
    Copper Contributor

    It’s a standard best practice to build parameterized pipelines. If the same pipeline has to be used for many loads, will there be a way to distinguish?

  • mark_james
    Copper Contributor

    Hi, when is this feature likely to be available in Synapse, to get a granular billing view of pipelines created within the Synapse environment?

  • KoenVerbeeck
    Brass Contributor

    How long does it take for the pipelines to actually show up in the cost analysis tool?

    EDIT: never mind, it just showed up. Around 24h it seems.