
Getting started with FinOps hubs: Multicloud cost reporting with Azure and Google Cloud

GrahamM
May 27, 2025

[Guest perspective: The following article is a guest post by Graham Murphy. The views and opinions expressed in this article are those of the author. We thank Graham for sharing his perspective.]

Microsoft’s FinOps hubs offer a powerful and trusted foundation for managing, analyzing, and optimizing cloud costs. Built on Azure Data Explorer (ADX) and Azure Data Lake Storage (ADLS), FinOps hubs provide a scalable platform to unify billing data across providers leveraging FOCUS datasets.

 

In this post, we’ll take a hands-on technical walkthrough of how to connect Microsoft FinOps hubs to Google Cloud, enabling you to export and analyze Google Cloud billing data alongside Azure billing data directly within your FinOps hub instance. This walkthrough focuses on using only the hub's storage for accessing data and is designed to get you started with multicloud connection and reporting from FinOps hubs. For large datasets, Azure Data Explorer or Microsoft Fabric is recommended.

With the introduction of the FinOps Open Cost and Usage Specification (FOCUS), normalizing billing data and reporting it through a single-pane-of-glass experience has never been easier.

As a long-time contributor to the FOCUS working group, I’ve spent the past few years helping define standards to make multicloud reporting simpler and more actionable. FOCUS enables side-by-side views of cost and usage data—such as compute hours across cloud providers—helping organizations make better decisions around workload placement and right-sizing. Before FOCUS, this kind of unified analysis was incredibly challenging due to the differing data models used by each provider.

To complete this technical walk through, you’ll need access to both Azure and Google Cloud—and approximately 2 to 3 hours of your time.

Getting started: The basics you’ll need

Before diving in, ensure you have the following prerequisites in place:

  • Access to your Google Cloud billing account.
  • A Google Cloud project with BigQuery and other required APIs enabled, linked to your billing account.
  • All required IAM roles and permissions for working with BigQuery, Cloud Functions, Storage, and billing data (detailed below).
  • Detailed billing export and pricing export configured in Google Cloud.
  • An existing deployment of FinOps hubs.

What you’ll be building

In this walk-through, you’ll set up Google billing exports and an Azure Data Factory pipeline to fetch the exported data, convert it to Parquet, and ingest it into your FinOps hub storage, following the FOCUS 1.0 standard for normalization.

Through the process, you will:

  1. Configure a BigQuery view to convert detailed billing exports into the FOCUS 1.0 schema.
  2. Create a metadata table in BigQuery to track export timestamps to enable incremental data exports reducing file export sizes and avoiding duplicate data.
  3. Set up a GCS bucket to export FOCUS-formatted data in CSV format.
  4. Deploy a Google Cloud Function that performs incremental exports from BigQuery to GCS.
  5. Create a Google Cloud Schedule to automate the export of your billing data to Google Cloud Storage.
  6. Build a pipeline in Azure Data Factory to:
    • Fetch the CSV billing exports from Google Cloud Storage.
    • Convert them to Parquet.
    • Ingest the transformed data into your FinOps hub ingestion container.

Let's get started. We're going to start in Google Cloud before we jump back into Azure to set up the Data Factory pipeline.

Enabling FOCUS exports in Google

Prerequisite: Enable detailed billing & pricing exports

⚠️ You cannot complete this guide without billing data in BigQuery. If you have not enabled detailed billing exports and pricing exports, do so now and come back to the walkthrough in 24 hours once data has landed.

Steps to enable detailed billing exports and pricing exports:

  1. Navigate to Billing > Billing Export.
  2. Enable Detailed Cost Export to BigQuery.
  3. Select the billing project and a dataset. If you have not created a project for billing, do so now.
  4. Enable Pricing Export to the same dataset.

🔊 “This enables daily cost and usage data to be streamed to BigQuery for granular analysis.”

Detailed guidance and information on billing exports in Google can be found here: Google Billing Exports.
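
If you want to confirm the exports are flowing before moving on, a quick query against the detailed export table is enough. Below is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, and table names are placeholders you'll need to replace with your own (detailed exports land in a table named along the lines of gcp_billing_export_resource_v1_<billing account id>).

# Minimal sanity check that detailed billing data is landing in BigQuery.
# Assumes application default credentials; all names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="your-billing-project")

query = """
    SELECT MAX(export_time) AS latest_export, COUNT(*) AS row_count
    FROM `your-billing-project.your_billing_dataset.gcp_billing_export_resource_v1_XXXXXX`
"""

row = list(client.query(query).result())[0]
print(f"Latest export_time: {row.latest_export}, rows so far: {row.row_count}")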

What you'll create:

  • Service category table
  • Metadata table
  • FOCUS View - you must have detailed billing export and pricing exports enabled to create this view
  • Cloud Function
  • Cloud Schedule

What you'll need

  • An active GCP Billing Account (this is your payment method).
  • A GCP Project (new or existing) linked to your GCP billing account.
  • All required APIs enabled.

  • All required IAM roles enabled.

Enable Required APIs:

  • BigQuery API
  • Cloud Billing API
  • Cloud Build API
  • Cloud Scheduler
  • Cloud Functions
  • Cloud Storage
  • Cloud Run Admin
  • Pub/Sub (optional, for event triggers)

Required IAM Roles

Assign the following roles to your user account:

  • roles/billing.viewer or billing.admin
  • roles/bigquery.dataEditor
  • roles/storage.admin
  • roles/cloudfunctions.admin
  • roles/cloudscheduler.admin
  • roles/iam.serviceAccountTokenCreator
  • roles/cloudfunctions.invoker
  • roles/run.admin (if using Cloud Run)
  • roles/editor (project-level Editor)

Create a service account (e.g., svc-bq-focus)

Assign the following roles to your service account:

  • roles/bigquery.dataOwner
  • roles/storage.objectAdmin
  • roles/cloudfunctions.invoker
  • roles/cloudscheduler.admin
  • roles/iam.serviceAccountTokenCreator
  • roles/run.admin

Your default compute service account will also require access to run Cloud Build services. Ensure you grant the Cloud Build role to the default compute service account in your project; it will look something like this:

projectID-compute@developer.gserviceaccount.com

→ roles/cloudbuild.builds.builder

Create FOCUS data structure and View

This section creates two new tables in BigQuery: one for service category mappings and one for metadata related to export times. It's important to create these before the FOCUS view, which extracts billing data in FOCUS format.

Create a service category mapping table

I removed service category from the original Google FOCUS view to reduce the size of the SQL query. To ensure service category is still mapped properly, I created a new service category table and joined it to the FOCUS view.

In this step we will create a new table using open data to map GCP services to a service category. Doing this reduces the size of the SQL query and simplifies management of service category mapping. By leveraging open-source data, we can easily update service category mappings if they ever change or new categories are added, without impacting the FOCUS view query.

Process:
  • Download the latest service_category_mapping.csv from the FOCUS Converter repo
  • Go to BigQuery > Your Dataset > Create Table
  • Upload the CSV
  • Table name: service_category
  • Schema: Auto-detect
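
If you prefer to script this step rather than use the console, here's a minimal sketch using the BigQuery Python client; the project and dataset names and the local CSV path are assumptions you'll need to adjust.

# Load the service category mapping CSV from the FOCUS Converter repo into BigQuery.
# Placeholder project/dataset names; run from the folder containing the downloaded CSV.
from google.cloud import bigquery

client = bigquery.Client(project="your-billing-project")
table_id = "your-billing-project.your_billing_dataset.service_category"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,      # mirrors the "Schema: Auto-detect" console option
    skip_leading_rows=0,  # set to 1 if your copy of the CSV includes a header row
)

with open("service_category_mapping.csv", "rb") as f:
    load_job = client.load_table_from_file(f, table_id, job_config=job_config)

load_job.result()  # wait for the load to finish
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")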

Create a metadata table

This table tracks the last time detailed billing data was added to your detailed billing export. We use it to enable incremental exports of billing data through the FOCUS view, ensuring we only export the latest data rather than everything every time.

Process:
  • Go to BigQuery > Create Table
  • Table name: metadata_focus_export
  • Schema:
    • last_export_time : TIMESTAMP
    • export_message : STRING

    Enter the field name, then choose the field format (do not type the colon; it is only a separator here). A scripted alternative is sketched below.

🔊 “Ensures each export only includes new data since the last timestamp.”
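
If you'd rather create this table with a script, the sketch below uses a simple DDL statement through the BigQuery Python client. The project and dataset names are placeholders, and a job_name column is included as an assumption because the Cloud Function later in this guide filters the metadata table on it.

# Create the export metadata table via DDL. Placeholder project/dataset names.
# A job_name column is included because the Cloud Function later in this guide
# filters the metadata table on it.
from google.cloud import bigquery

client = bigquery.Client(project="your-billing-project")

ddl = """
    CREATE TABLE IF NOT EXISTS `your-billing-project.your_billing_dataset.metadata_focus_export` (
        job_name STRING,
        last_export_time TIMESTAMP,
        export_message STRING
    )
"""
client.query(ddl).result()
print("metadata_focus_export table is ready")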

Create the FOCUS-aligned view

Creating a view in BigQuery allows us to un-nest the detailed billing export tables into the FOCUS 1.0 format. To use Power BI we must un-nest the tables, so this step is required. It also ensures we map the right columns in the Google Cloud detailed billing export to the right FOCUS columns.

⚠️ The FOCUS SQL code provided in this walkthrough has been altered from the original Google-provided code. I believe this new code is better aligned to FOCUS 1.0 than the original; however, it contains some nuances that reflect my personal views. Please evaluate it carefully before using it in a production system and adjust the code to your needs.

Steps:
  • Navigate to BigQuery > New Query
  • Paste and update the FOCUS view SQL query which is provided below
  • Replace:
    • "Your-Detailed-Billing-Export-ID" with your detailed billing export dataset ID and table name
    • "your_pricing_export_id" with your pricing export dataset ID and table name
    • "Your-Service-Category-Id" with the dataset containing the service_category table you created earlier
    • The BillingAccountId and BillingAccountName literals with your own billing account ID and company name
  • FOCUS SQL Query:
WITH usage_export AS ( 
  SELECT
    *,
    (
      SELECT AS STRUCT type, id, full_name, amount, name
      FROM UNNEST(credits)
      LIMIT 1
    ) AS cud,
  FROM
    "Your-Detailed-Billing-Export-ID"
  -- replace with your detailed usage export table path
),
prices AS (
  SELECT
  export_time,
  sku.id AS sku_id,
  sku.description AS sku_description,
  service.id AS service_id,
  service.description AS service_description,
  tier.*
  FROM
    "your_pricing_export_id",
    UNNEST(list_price.tiered_rates) AS tier
)
SELECT
  "111111-222222-333333" AS BillingAccountId,
  "Your Company Name" AS BillingAccountName,
   COALESCE((SELECT SUM(x.amount)
    FROM UNNEST(usage_export.credits) x),0) + cost as BilledCost,
  usage_export.currency AS BillingCurrency,
  DATETIME(PARSE_DATE("%Y%m", invoice.month)) AS BillingPeriodStart,
  DATETIME(DATE_SUB(DATE_ADD(PARSE_DATE("%Y%m", invoice.month), INTERVAL 1 MONTH), INTERVAL 1 DAY)) AS BillingPeriodEnd,
  CASE
  WHEN usage_export.adjustment_info.type IS NOT NULL and usage_export.adjustment_info.type !='' THEN 'Adjustment'
  WHEN usage_export.cud.type = 'PROMOTION' AND usage_export.cost_type = 'regular' AND usage_export.cud.id IS NOT NULL THEN 'Credit'
  WHEN usage_export.sku.description LIKE '%Commitment - 3 years - dollar based VCF%' or usage_export.sku.description LIKE '%Prepay Commitment%' THEN 'Purchase'
  WHEN usage_export.cud.id IS NOT NULL AND usage_export.cud.id != '' THEN 'Credit'
  WHEN usage_export.cost_type = 'regular' THEN 'Usage'
  WHEN usage_export.cost_type = 'tax' THEN 'Tax' 
  WHEN usage_export.cost_type = 'adjustment' THEN 'Adjustment'
  WHEN usage_export.cost_type = 'rounding_error' THEN 'Adjustment'
  ELSE usage_export.cost_type 
  END AS ChargeCategory,
  IF(COALESCE(
    usage_export.adjustment_info.id,
    usage_export.adjustment_info.description,
    usage_export.adjustment_info.type,
    usage_export.adjustment_info.mode)
    IS NOT NULL,
    "correction",
    NULL) AS ChargeClass,
  CASE
  WHEN usage_export.adjustment_info.type IS NOT NULL AND usage_export.adjustment_info.type != '' THEN usage_export.adjustment_info.type
  ELSE usage_export.sku.description
  END AS ChargeDescription,
  CAST(usage_export.usage_start_time AS DATETIME) AS ChargePeriodStart,
  CAST(usage_export.usage_end_time AS DATETIME) AS ChargePeriodEnd,
  CASE usage_export.cud.type
    WHEN "COMMITTED_USAGE_DISCOUNT_DOLLAR_BASE" THEN "Spend"
    WHEN "COMMITTED_USAGE_DISCOUNT" THEN "Usage"
  END AS CommitmentDiscountCategory,
  usage_export.cud.id AS CommitmentDiscountId,
  COALESCE (usage_export.cud.full_name, usage_export.cud.name) AS CommitmentDiscountName,
  usage_export.cud.type AS CommitmentDiscountType,
  CAST(usage_export.usage.amount_in_pricing_units AS numeric) AS ConsumedQuantity,
  usage_export.usage.pricing_unit AS ConsumedUnit,
  -- review
   CAST(
    CASE 
    WHEN usage_export.cost_type = "regular" THEN usage_export.price.effective_price * usage_export.price.pricing_unit_quantity
    ELSE 0
    END AS NUMERIC
    ) AS ContractedCost,
    --
  CAST(usage_export.price.effective_price AS numeric) AS ContractedUnitPrice,
  usage_export.seller_name AS InvoiceIssuerName,
    COALESCE((SELECT SUM(x.amount)
    FROM UNNEST(usage_export.credits) x),0) + cost as EffectiveCost,
  CAST(usage_export.cost_at_list AS numeric) AS ListCost,
  prices.account_currency_amount AS ListUnitPrice,
  IF(
      usage_export.cost_type = "regular",
      IF(
        LOWER(usage_export.sku.description) LIKE "commitment%" OR usage_export.cud IS NOT NULL,
        "committed",
        "standard"),
      null) AS PricingCategory,
  IF(usage_export.cost_type = 'regular', usage_export.price.pricing_unit_quantity, NULL) AS PricingQuantity,
  IF(usage_export.cost_type = 'regular', usage_export.price.unit, NULL) AS PricingUnit,
  'Google' AS ProviderName,
  usage_export.transaction_type AS PublisherName,
  usage_export.location.region AS RegionId,
  usage_export.location.region AS RegionName,
  usage_export.service.id AS ResourceId,
  REGEXP_EXTRACT (usage_export.resource.global_name, r'[^/]+$') AS ResourceName,
  usage_export.sku.description AS ResourceType,
 COALESCE(servicemapping.string_field_1, 'Other') AS ServiceCategory,
  usage_export.service.description AS ServiceName,
  usage_export.sku.id AS SkuId,
  CONCAT("SKU ID:", usage_export.sku.id, ", Price Tier Start Amount: ", price.tier_start_amount) AS SkuPriceId,
  usage_export.project.id AS SubAccountId,
  usage_export.project.name AS SubAccountName,
  (SELECT CONCAT('{', STRING_AGG(FORMAT('%s:%s', kv.key, kv.value), ', '), '}') 
   FROM (
     SELECT key, value FROM UNNEST(usage_export.project.labels)
     UNION ALL
     SELECT key, value FROM UNNEST(usage_export.tags)
     UNION ALL
     SELECT key, value FROM UNNEST(usage_export.labels)
   ) AS kv) AS Tags,
  FORMAT_DATE('%B', PARSE_DATE('%Y%m', invoice.month)) AS Month,
  usage_export.project.name AS x_ResourceGroupName,
  CAST(usage_export.export_time AS TIMESTAMP) AS export_time,
FROM
  usage_export
  LEFT JOIN
  "Your-Service-Category-Id".ServiceCategory AS servicemapping
ON
  usage_export.service.description = servicemapping.string_field_0
LEFT JOIN
  prices
ON
 usage_export.sku.id = prices.sku_id
 AND usage_export.price.tier_start_amount = prices.start_usage_amount
 AND DATE(usage_export.export_time) = DATE(prices.export_time);

 

🔊 "This creates a FOCUS-aligned view of your billing data using FOCUS 1.0 specification, this view does not 100% conform to FOCUS 1.0.

Create GCS Bucket for CSV Exports

Your GCS bucket is where you will place your incremental exports to be transferred to Azure. Once your data is exported to Azure, you may elect to delete the files in the bucket; the metadata table keeps a record of the last export time.

Steps:
  • Go to Cloud Storage > Create Bucket
  • Name: focus-cost-export (or any name you would like)
  • Region: Match your dataset region
  • Storage Class: Nearline (Nearline is cheaper, but Standard will also work just fine)
  • Enable Interoperability settings > create access/secret key
    • The access key and secret are tied to your user account. If you want multiple people to be able to use this, create an access key for your service account instead (recommended for production). For the purposes of this guide, an access key linked to your account is fine.
  • Save the access key and secret to a secure location to use later as part of the ADF pipeline setup.

🔊 “This bucket will store daily CSVs. Consider enabling lifecycle cleanup policies.”
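
If you do want a lifecycle cleanup policy on the bucket, the sketch below adds a simple delete-after-30-days rule using the google-cloud-storage client; the bucket name and retention window are assumptions to adjust.

# Add a lifecycle rule that deletes exported CSVs after 30 days.
# Bucket name and age threshold are placeholders - adjust to your retention needs.
from google.cloud import storage

client = storage.Client(project="your-billing-project")
bucket = client.get_bucket("focus-cost-export")

bucket.add_lifecycle_delete_rule(age=30)  # delete objects older than 30 days
bucket.patch()                            # persist the updated lifecycle configuration

print(list(bucket.lifecycle_rules))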

Create a Cloud Function for incremental export

A Cloud Function is used here to enable incremental exports of your billing data in FOCUS format on a regular schedule. At present there is no known supported on-demand export service from Google, so we came up with this little workaround.

The function evaluates your billing data for an export time equal to or greater than the last time the function ran. To do this, we look at the export_time column and compare it against the last run time recorded in the metadata table. This ensures we only export the most recent billing data, which helps reduce data export costs to Azure.

This process is done through the GCP GUI using an inline editor to create the cloud function in the cloud run service.

Steps:
  1. Go to Cloud Run > Write Function
  2. Select > Use Inline editor to create a function
  3. Service Name: daily_focus_export
  4. Region: the same region as your dataset - in our demo case, us-central1
  5. Use settings:
    • Runtime: Python 3.11 (you cannot use anything later than 3.11)
    • Trigger: Optional
    • Authentication: Require Authentication
    • Billing: Request based
    • Service Scaling: Auto-scaling set to 0
    • Ingress: All
    • Containers: leave all settings as set
    • Volumes: Leave all settings as set
    • Networking: Leave all settings as set
    • Security: Choose the service account you created earlier
    • Save
  6. Create main.py file and requirements.txt files through the inline editor
    • For requirements.txt copy and paste the below:
      • functions-framework==3.*
      • google-cloud-bigquery
      • google-cloud-storage
    • For main.py your function entry point is: export_focus_data
  7. Paste the code below into your main.py inline editor window
import logging
import time
import json
from google.cloud import bigquery, storage

logging.basicConfig(level=logging.INFO)

def export_focus_data(request):
    bq_client = bigquery.Client()
    storage_client = storage.Client()

    project_id = "YOUR-ProjectID"
    dataset = "YOUR-Detailed-Export-Dataset"
    view = "YOUR-FOCUS-View-Name"
    metadata_table = "Your-Metadata-Table"
    job_name = "The name you want to call the job for the export"
    bucket_base = "gs://<your bucketname>/<your foldername>"
    bucket_name = "Your Bucket Name"
    metadata_file_path = "Your-Bucket-name/export_metadata.json"

    try:
        logging.info("🔁 Starting incremental export based on export_time...")

        # Step 1: Get last export_time from metadata
        metadata_query = f"""
            SELECT last_export_time
            FROM `{project_id}.{dataset}.{metadata_table}`
            WHERE job_name = '{job_name}'
            LIMIT 1
        """
        metadata_result = list(bq_client.query(metadata_query).result())
        if not metadata_result:
            return "No metadata row found. Please seed export_metadata.", 400
        last_export_time = metadata_result[0].last_export_time
        logging.info(f"📌 Last export_time from metadata: {last_export_time}")

        # Step 2: Check for new data
        check_data_query = f"""
            SELECT COUNT(*) AS row_count
            FROM `{project_id}.{dataset}.{view}`
            WHERE export_time >= TIMESTAMP('{last_export_time}')
        """
        row_count = list(bq_client.query(check_data_query).result())[0].row_count
        if row_count == 0:
            logging.info("✅ No new data to export.")
            return "No new data to export.", 204

        # Step 3: Get distinct export months
        folder_query = f"""
            SELECT DISTINCT FORMAT_DATETIME('%Y%m', BillingPeriodStart) AS export_month
            FROM `{project_id}.{dataset}.{view}`
            WHERE export_time >= TIMESTAMP('{last_export_time}')
              AND BillingPeriodStart IS NOT NULL
        """
        export_months = [row.export_month for row in bq_client.query(folder_query).result()]
        logging.info(f"📁 Exporting rows from months: {export_months}")

        # Step 4: Export data for each month
        for export_month in export_months:
            export_path = f"{bucket_base}/{export_month}/export_{int(time.time())}_*.csv"
            export_query = f"""
                EXPORT DATA OPTIONS(
                    uri='{export_path}',
                    format='CSV',
                    overwrite=true,
                    header=true,
                    field_delimiter=';'
                ) AS
                SELECT *
                FROM `{project_id}.{dataset}.{view}`
                WHERE export_time >= TIMESTAMP('{last_export_time}')
                  AND FORMAT_DATETIME('%Y%m', BillingPeriodStart) = '{export_month}'
            """
            bq_client.query(export_query).result()

        # Step 5: Get latest export_time from exported rows
        max_export_time_query = f"""
            SELECT MAX(export_time) AS new_export_time
            FROM `{project_id}.{dataset}.{view}`
            WHERE export_time >= TIMESTAMP('{last_export_time}')
        """
        new_export_time = list(bq_client.query(max_export_time_query).result())[0].new_export_time

        # Step 6: Update BigQuery metadata table
        update_query = f"""
            MERGE `{project_id}.{dataset}.{metadata_table}` T
            USING (
                SELECT '{job_name}' AS job_name, TIMESTAMP('{new_export_time}') AS new_export_time
            ) S
            ON T.job_name = S.job_name
            WHEN MATCHED THEN
                UPDATE SET last_export_time = S.new_export_time
            WHEN NOT MATCHED THEN
                INSERT (job_name, last_export_time) VALUES (S.job_name, S.new_export_time)
        """
        bq_client.query(update_query).result()

        # Step 7: Write metadata JSON to GCS
        blob = storage_client.bucket(bucket_name).blob(metadata_file_path)
        blob.upload_from_string(
            json.dumps({"last_export_time": new_export_time.isoformat()}),
            content_type="application/json"
        )
        logging.info(f"📄 Metadata file written to GCS: gs://{bucket_name}/{metadata_file_path}")

        return f"✅ Export complete. Metadata updated to {new_export_time}", 200

    except Exception as e:
        logging.exception("❌ Incremental export failed:")
        return f"Function failed: {str(e)}", 500

Before saving, update the placeholder variables at the top of the function (project_id, dataset, view, metadata_table, job_name, bucket_base, bucket_name, and metadata_file_path).

🔊 “This function exports new billing data based on the last timestamp in your metadata table.”

Test the Cloud Function

  • Deploy and click Test Function
  • Ensure a seed export_metadata.json file has been uploaded to your GCS bucket and a seed row exists in your metadata table (the function will not run without them; see the sketch below). An example seed file:
    • { "last_export_time": "2024-12-31T23:59:59.000000+00:00" }
  • Confirm new CSVs appear in your target folder
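
Here's a minimal sketch for seeding both locations in one go: a row in the BigQuery metadata table (which the Cloud Function reads) and the export_metadata.json object in GCS (which the Data Factory pipeline will read later). The project, dataset, bucket, object path, and job_name values are placeholders and must match what you configured in the function.

# Seed the two places the solution reads the last export time from:
# 1) a row in the BigQuery metadata table (read by the Cloud Function), and
# 2) an export_metadata.json object in GCS (read later by the Data Factory pipeline).
# All names are placeholders; job_name must match the value used in the Cloud Function.
import json
from google.cloud import bigquery, storage

SEED_TIME = "2024-12-31T23:59:59.000000+00:00"

bq = bigquery.Client(project="your-billing-project")
bq.query(f"""
    MERGE `your-billing-project.your_billing_dataset.metadata_focus_export` T
    USING (SELECT 'focus_export' AS job_name) S
    ON T.job_name = S.job_name
    WHEN NOT MATCHED THEN
        INSERT (job_name, last_export_time) VALUES (S.job_name, TIMESTAMP('{SEED_TIME}'))
""").result()

gcs = storage.Client(project="your-billing-project")
blob = gcs.bucket("focus-cost-export").blob("metadata/export_metadata.json")
blob.upload_from_string(
    json.dumps({"last_export_time": SEED_TIME}),
    content_type="application/json",
)
print("Seed metadata written to BigQuery and GCS")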

Automate with Cloud Scheduler

This step sets up an automated schedule to run your Cloud Function on a daily or hourly pattern. You may adjust the frequency to your desired schedule; this demo runs at the same time each day, 11 PM.

Steps:
  • Go to Cloud Scheduler > Create Job
  • Region: Same region as your function - in this demo, us-central1
  • Name: daily-focus-export
  • Frequency: 0 */6 * * * (every 6 hours) or Frequency: 0 23 * * * (daily at 11 PM)
  • Time zone: your desired time zone
  • Target: HTTP
  • Auth: OIDC Token → Use service account
  • Click CREATE

🔊 "This step automates the entire pipeline to run daily and keep downstream platforms updated with the latest billing data."

Wrapping up Google setup

Wow—there was a lot to get through! But now that you’ve successfully enabled FOCUS-formatted exports in Google Cloud, you're ready for the next step: connecting Azure to your Google Cloud Storage bucket and ingesting the data into your FinOps hub instance.

This is where everything comes together—enabling unified, multi-cloud reporting in your Hub across both Azure and Google Cloud billing data.

Let’s dive into building the Data Factory pipeline and the associated datasets needed to fetch, transform, and load your Google billing exports.

Connecting Azure to Google billing data

With your Google Cloud billing data now exporting in FOCUS format, the next step is to bring it into your Azure environment for centralized FinOps reporting. Using Data Factory, we'll build a pipeline to fetch the CSVs from your Google Cloud Storage bucket, convert them to Parquet, and land them in your FinOps Hub storage account.

Azure access and prerequisites

  • Access to the Azure Portal
  • Access to the resource group your hub has been deployed to
  • Contributor access to your resource group
  • At a minimum, Storage Account Contributor and Storage Blob Data Owner roles
  • An existing deployment of the Microsoft FinOps Hub Toolkit

  • Admin rights on Azure Data Factory (or at least contributor role on ADF)

Services and tools required

Make sure the following Azure resources are in place:

  • Azure Data Factory instance
  • A linked Azure Storage Account (this is where FinOps Hub is expecting data)
  • (Optional) Azure Key Vault for secret management

Pipeline Overview

We’ll be creating the following components in Azure Data Factory:

  • GCS Linked Service: Connects ADF to your Google Cloud Storage bucket
  • Azure Blob Linked Service: Connects ADF to your hub’s ingestion container
  • Source Dataset: Reads CSV files from GCS
  • Sink Dataset: Writes Parquet files to Azure Data Lake (or Blob) in the hub’s expected format
  • Pipeline Logic: Orchestrates the copy activity, format conversion, and metadata updates

Secrets and authentication

To connect securely from ADF to GCS, you will need:

  • The access key and secret from GCS Interoperability settings (created earlier)
  • Store them securely in Azure Key Vault or in ADF pipeline parameters (if short-lived)
💡 Best Practice Tip:

Use Azure Key Vault to securely store GCS access credentials and reference them from ADF linked services. This improves security and manageability over hardcoding values in JSON.
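
As a sketch of that approach, the snippet below stores the GCS interoperability (HMAC) credentials as Key Vault secrets using azure-identity and azure-keyvault-secrets; the vault URL and secret names are assumptions, and your ADF linked service would then reference these secrets.

# Store the GCS interoperability (HMAC) credentials in Azure Key Vault so the ADF
# linked service can reference them instead of hardcoded values.
# Vault URL and secret names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url="https://your-key-vault-name.vault.azure.net",
    credential=DefaultAzureCredential(),
)

client.set_secret("gcs-access-key", "<your GCS interoperability access key>")
client.set_secret("gcs-secret-key", "<your GCS interoperability secret>")

print([s.name for s in client.list_properties_of_secrets()])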

Create Google Cloud storage billing data dataset

Now that we're in Azure, it's time to connect to your Google Cloud Storage bucket and begin building your Azure Data Factory pipeline.

Steps
  • Launch Azure Data Factory
  • Log in to the Azure Portal
  • Navigate to your deployed Azure Data Factory instance
  • Click “Author” from the left-hand menu to open the pipeline editor
  • In the ADF Author pane, click the "+" (plus) icon next to Datasets
  • Select “New dataset”
  • Choose Google Cloud Storage as the data source
  • Select CSV as the file format
  • Set the Dataset Name, for example:
    gcs_focus_export_dataset
  • Click “+ New” next to the Linked Service dropdown
  • Enter a name like:
    GCS-Billing-LinkedService
  • Under Authentication Type, choose:
  • Access Key (for Interoperability access key)
  • Or Azure Key Vault (recommended for secure credential storage)
  • Fill in the following fields:
  • Access Key: Your GCS interoperability key
  • Secret: Your GCS interoperability secret
  • Bucket Name: focus-cost-export
  • (Optional) Point to a folder path like: focus-billing-data/
  • Click Test Connection to validate access
  • Click Create
  • Adjust Dataset Properties

Now that the dataset has been created, make the following modifications to ensure proper parsing of the CSVs:

  • Column delimiter: ; (semicolon)
  • Escape character: " (double quote)

You can find these under the “Connection” and “Schema” tabs of the dataset editor.

If your connection fails, you may need to enable public access on your GCS bucket or check firewall restrictions from Azure to the internet.

Create a Google Cloud Storage metadata dataset

To support incremental loading, we need a dedicated dataset to read the export_metadata.json file that your Cloud Function writes to Google Cloud Storage. This dataset will be used by the Lookup activity in your pipeline to get the latest export timestamp from Google.

Steps:
  1. In the Author pane of ADF, click "+" > Dataset > New dataset
  2. Select Google Cloud Storage as the source
  3. Choose JSON as the format
  4. Click Continue
  5. Configure the Dataset
  • Name: gcs_export_metadata_dataset
  • Linked Service: Use your existing GCS linked service
  • File path: e.g., focus-cost-export/metadata/export_metadata.json
  • Import schema: From connection/store (or define manually if needed)
  • File pattern: Single file (not folder)

Create the sink dataset – "ingestion_gcp"

Now that we’ve connected to Google Cloud Storage and defined the source dataset, it’s time to create the sink dataset. This will land your transformed billing data in Parquet format into your Azure FinOps Hub’s ingestion container.

Steps:
  1. In Azure Data Factory, go to the Author section
  2. Under Datasets, click the "+" (plus) icon and select “New dataset”
  3. Choose Azure Data Lake Storage Gen2 (or Blob Storage, depending on your Hub setup)
  4. Select Parquet as the file format
  5. Click Continue
Configure dataset properties
  • Name: ingestion_gcp
  • Linked Service:
    Select your existing linked service that connects to your FinOps Hub’s storage account. (this will have been created when you deployed the hub)
  • File path:
    Point to the container and folder path where you want to store the ingested files (e.g., ingestion-gcp/cost/billing-provider/gcp/focus/)
Optional settings:
  • Compression type: snappy

Once configured, click Publish All again to save your new sink dataset.

Create the sink dataset – "adls_last_import_metadata"

The adls_last_import_metadata dataset (the sink dataset) is the location used to copy the export-time JSON file from Google to Azure. The pipeline uses this location to check the last time a data import ran, by reading the JSON file you previously copied from Google and comparing it to the new JSON file from Google.

Steps:
  1. In Azure Data Factory, go to the Author section
  2. Under Datasets, click the "+" (plus) icon and select “New dataset”
  3. Choose Azure Data Lake Storage Gen2 (or Blob Storage, depending on your Hub setup)
  4. Select JSON as the file format
  5. Click Continue
Configure dataset properties
  • Name: adls_last_import_metadata
  • Linked Service:
    Select your existing linked service that connects to your FinOps Hub’s storage account. (this will have been created when you deployed the hub)
  • File path:
    Point to the container and folder path where you want to store the ingested files (e.g., ingestion/cost/metadata)

Build the ADF pipeline for incremental import

With your datasets created, the final step is building the pipeline logic that orchestrates the data flow. The goal is to only import newly exported billing data from Google, avoiding duplicates by comparing timestamps from both clouds.

In this pipeline, we’ll:

  1. Fetch Google’s export timestamp from the JSON metadata file
  2. Fetch the last successful import time from the Hub’s metadata file in Azure
  3. Compare timestamps to determine whether there is new data to ingest
  4. If new data exists:
    • Run the Copy Activity to fetch GCS CSVs, convert to Parquet, and write to the ingestion_gcp container
    • Write an updated metadata file in the hub, with the latest import timestamp
Pipeline components:

  • Lookup - GCS Metadata: Reads the last export time from Google's metadata JSON in GCS
  • Lookup - Hub Metadata: Reads the last import time from Azure’s metadata JSON in ADLS
  • If Condition: Compares timestamps to decide whether to continue with the copy
  • Copy Data: Transfers files from GCS (CSV) to ADLS (Parquet)
  • Set Variable: Captures the latest import timestamp
  • Web/Copy Activity: Writes the updated import timestamp JSON file to the ingestion_gcp container

What your pipeline will look like:

 

Step-by-Step
  • Create two lookup activities:
    • GCS Metadata
    • Hub Metadata

Lookup1 – GCS Metadata

  • Add an activity called Lookup to your new pipeline
  • Select the Source Dataset: gcs_export_metadata_dataset or whatever you named it earlier

This lookup reads the export_metadata.json file created by your Cloud Function in GCS to get the last available export time.

Configuration view of lookup for GCS metadata file

Lookup2 – Hub Metadata
  • Add a Lookup activity and name it Hub Metadata
  • Select the source dataset: adls_last_import_metadata

This lookup reads the last import time from the hub metadata file to compare it to the last export time from GCS.

Configuration view of Metadata lookup activity

Add conditional logic

In this step we will add the conditional logic.

  • Add activity called If Condition
  • Add the expression below to the condition

Go to the Expression tab and paste the following into Dynamic Content:

@greater(activity('Lookup1').output.firstRow.last_export_time, activity('Lookup2').output.firstRow.last_import_time)

Configuration view of If Condition Activity

Next
  • Add Copy Activity (If Condition = True)
  • Next to the True condition, select Edit and add a Copy Data activity
  • Configure the activity with the following details:
  • Source Dataset: gcs_focus_export_dataset (CSV from GCS)
  • Sink Dataset: ingestion_gcp (Parquet to ADLS)
  • Merge Files: Enabled (reduces file count)
  • Filter Expression: @activity('Lookup1').output.firstRow.last_export_time

Ensure you add the filter expression under "Filter by last modified". This is important: if you do not add it, your pipeline will not function properly.

Finally, we create an activity to update the metadata file in the hub.

Add another copy activity inside your If Condition and ensure it is linked to the previous copy activity; this ensures it runs after the import activity completes.

Copy metadata activity settings:

  • Source: gcs_export_metadata_dataset
  • Sink: adls_last_import_metadata
  • Destination Path: ingestion_gcp/metadata/last_import.json

This step ensures the next run of your pipeline uses the updated import time to decide whether new data exists.

 

Your pipeline now ingests only new billing data from Google and records each successful import to prevent duplicate processing.
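
If you want to kick off or verify a run outside the portal, a rough sketch using the azure-mgmt-datafactory SDK is below; the subscription, resource group, factory, and pipeline names are placeholders for whatever you called yours.

# Trigger the pipeline on demand and check the run status.
# Subscription ID, resource group, factory, and pipeline names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

run = adf.pipelines.create_run(
    resource_group_name="rg-finops-hub",
    factory_name="your-data-factory",
    pipeline_name="gcp-focus-import",
)

status = adf.pipeline_runs.get("rg-finops-hub", "your-data-factory", run.run_id)
print(f"Run {run.run_id} status: {status.status}")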

Wrapping up – A unified FinOps reporting solution

Congratulations — you’ve just built a fully functional multi-cloud FinOps data pipeline using Microsoft’s FinOps Hub Toolkit and Google Cloud billing data, normalized with the FOCUS 1.0 standard.

By following this guide, you’ve:

  • Enabled FOCUS billing exports in Google Cloud using BigQuery, GCS, and Cloud Functions
  • Created normalized, FOCUS-aligned views to unify your GCP billing data
  • Automated metadata tracking to support incremental exports
  • Built an Azure Data Factory pipeline to fetch, transform, and ingest GCP data into your Hub
  • Established a reliable foundation for centralized, multi-cloud cost reporting

This solution brings side-by-side visibility into Azure and Google Cloud costs, enabling informed decision-making, workload optimization, and true multi-cloud FinOps maturity.

Next steps

  • 🔁 Schedule the ADF pipeline to run daily or hourly using triggers
  • 📈 Build Power BI dashboards or use templates from the Microsoft FinOps Toolkit to visualise unified cloud spend
  • 🧠 Extend to AWS by applying the same principles using the AWS FOCUS export and AWS S3 storage

Feedback?

Have questions or want to see more deep dives like this? Let me know — or connect with me if you’re working on FinOps, FOCUS, or multi-cloud reporting. This blog is just the beginning of what's possible.

Updated Jun 26, 2025
Version 3.0

Comment from AnneG:

    Thank you for this very useful post.

    I'm following all the steps and on the GCP part, I was able to finalize everything, except that when I ran the FOCUS SQL Query, the database of my dataset would not be located in the right EU region. There are limitations on some EU regions when you are not on a Multi-region dataset. In my case, I located everything in 1 region in EU and I didn't choose the right one, apparently 🤔