focus

33 Topics

Learning FOCUS: Introducing an open billing data format
Learn about what the FinOps Open Cost and Usage Specification (FOCUS) is and why you should be leveraging it within your FinOps practice.
micflan
Nov 25, 2024 Place FinOps Blog
3.2KViews
11likes
1Comment
News and updates from FinOps X 2024: How Microsoft is empowering organizations
Last year, I shared a broad set of updates that showcased how Microsoft is embracing FinOps practitioners through education, product improvements, and innovative solutions that help organizations achieve more. with AI-powered experiences like Copilot and Microsoft Fabric. Whether you’re an engineer working in the Azure portal or part of a business or finance team collaborating in Microsoft 365 or analyzing data in Power BI, Microsoft Cloud has the tools you need to accelerate business value for your cloud investments.
micflan
Jul 02, 2024 Place FinOps Blog
12KViews
8likes
0Comments
Getting started with FinOps hubs: Multicloud cost reporting with Azure and Google Cloud
Microsoft’s FinOps hubs offer a powerful and trusted foundation for managing, analyzing, and optimizing cloud costs. Built on Azure Data Explorer (ADX) and Azure Data Lake Storage (ADLS), FinOps hubs provide a scalable platform to unify billing data across providers leveraging FOCUS datasets. In this post, we’ll take a hands-on technical walkthrough of how to connect Microsoft FinOps hubs to Google Cloud, enabling you to export and analyze Google Cloud billing data with Azure billing data directly within your FinOps hub instance. This walk through will focus on using only storage in the hub for accessing data and is designed to get you started and understand multicloud connection and reporting from FinOps hubs. For large datasets Azure Data Explorer or Microsoft Fabric is recommended. With the introduction of the FinOps Open Cost and Usage Specification (FOCUS), normalizing billing data and reporting it through a single-pane-of-glass experience has never been easier. As a long-time contributor to the FOCUS working group, I’ve spent the past few years helping define standards to make multicloud reporting simpler and more actionable. FOCUS enables side-by-side views of cost and usage data—such as compute hours across cloud providers—helping organizations make better decisions around workload placement and right-sizing. Before FOCUS, this kind of unified analysis was incredibly challenging due to the differing data models used by each provider. To complete this technical walk through, you’ll need access to both Azure and Google Cloud—and approximately 2 to 3 hours of your time. Getting started: The basics you’ll need Before diving in, ensure you have the following prerequisites in place: ✅ Access to your Google Cloud billing account. ✅ A Google Cloud project with BigQuery and other APIs enabled and linked to your billing account. ✅ All required IAM roles and permissions for working with BigQuery, Cloud Functions, Storage, and billing data (detailed below). ✅ Detailed billing export and pricing export configured in Google Cloud. ✅ An existing deployment of FinOps hubs. What you’ll be building In this walk-through, you’ll set up Google billing exports and an Azure Data Factory pipeline to fetch the exported data, convert to parquet and ingest into your FinOps hub storage, following the FOCUS 1.0 standard for normalization. Through the process you should end up creating: Configure a BigQuery view to convert detailed billing exports into the FOCUS 1.0 schema. Create a metadata table in BigQuery to track export timestamps to enable incremental data exports reducing file export sizes and avoiding duplicate data. Set up a GCS bucket to export FOCUS-formatted data in CSV format. Deploy a Google Cloud Function that performs incremental exports from BigQuery to GCS. Create a Google Cloud Schedule to automate the export of your billing data to Google Cloud Storage. Build a pipeline in Azure Data Factory to: Fetch the CSV billing exports from Google Cloud Storage. Convert them to Parquet. Ingest the transformed data into your FinOps hub ingestion container. Let's get started. We're going to start in Google Cloud before we jump back into Azure to setup the Data Factory pipeline. Enabling FOCUS exports in Google Prerequisite: Enable detailed billing & pricing exports ⚠️ You cannot complete this guide without billing data enabled in BigQuery if you have not enabled detailed billing exports and pricing exports, do this now and come back to the walk-through in 24 hours. Steps to enabled detailed billing exports and pricing exports: Navigate to Billing > Billing Export. Enable Detailed Cost Export to BigQuery. Select the billing project and a dataset - if you have not created a project for billing do so now. Enable Pricing Export to the same dataset. 🔊 “This enables daily cost and usage data to be streamed to BigQuery for granular analysis.” Detailed guidance and information on billing exports in Google can be found here: Google Billing Exports. What you'll create: Service category table Metadata table FOCUS View - you must have detailed billing export and pricing exports enabled to create this view Cloud Function Cloud Schedule What you'll need An active GCP Billing Account (this is your payment method). A GCP Project (new or existing) linked to your GCP billing account. All required APIs enabled. All required IAM roles enabled. Enable Required APIs: BigQuery API Cloud Billing API Cloud Build API Cloud Scheduler Cloud Functions Cloud Storage Cloud Run Admin Pub/Sub (optional, for event triggers) Required IAM Roles Assign the following roles to your user account: roles/billing.viewer or billing.admin roles/bigquery.dataEditor roles/storage.admin roles/cloudfunctions.admin roles/cloudscheduler.admin roles/iam.serviceAccountTokenCreator roles/cloudfunctions.invoker roles/run.admin (if using Cloud Run) roles/project.editor Create a service account (e.g., svc-bq-focus) Assign the following roles to your service account: roles/bigquery.dataOwner roles/storage.objectAdmin roles/cloudfunctions.invoker roles/cloudscheduler.admin roles/serviceAccount.tokenCreator roles/run.admin Your default compute service account will also require access to run cloud build services. Ensure you apply the cloud build role to your default compute service account in your project, it may look like this: projectID-compute@developer.gserviceaccount.com → roles/cloudbuild.builds.builder Create FOCUS data structure and View This section will create two new tables in Big Query, one for service category mappings and one for metadata related to export times. These are important to get in before we create the FOCUS view to extract billing data in FOCUS format. Create a service category mapping table I removed service category from the original google FOCUS view to reduce the size of the SQL Query, therefore, to ensure we mapped Service category properly, I created a new service category table and joined it to the FOCUS view. In this step we will create a new table using open data to map GCP services to service category. Doing this helps reduce the size of the SQL query and simplifies management of Service Category mapping. Leveraging open source data we can easily update service category mappings if they ever change or new categories are added without impacting the FOCUS view query. Process: Download the latest service_category_mapping.csv from the FOCUS Converter repo Go to BigQuery > Your Dataset > Create Table Upload the CSV Table name: service_category Schema: Auto-detect Create a metadata table This table will be used to track the last time detailed billing data was added to your detailed billing export, we use this to enable incremental exports of billing data through the FOCUS view to ensure we only export the latest set of data and not everything all the time. Process: Go to BigQuery > Create Table Table name: metadata_focus_export Schema: Field Name : Format last_export_time: TIMESTAMP export_message: STRING Enter your field name and then choose field format, do not add : 🔊 “Ensures each export only includes new data since the last timestamp.” Create the FOCUS-aligned view Creating a view in BigQuery allows us to un-nest the detailed billing export tables into the format of FOCUS 1.0*. To use Power BI we must un-nest the tables so this step is required. It also ensures we map the right columns in Google Cloud detailed billing export to the right columns in FOCUS. ⚠️ The FOCUS SQL code provided in this walk-through has been altered from the original Google provided code. I believe this new code is better formatted for FOCUS 1.0 than the original however it does contain some nuances that suit my personal views. Please evaluate this carefully before using this in a production system and adjust the code accordingly to your needs. Steps: Navigate to BigQuery > New Query Paste and update the FOCUS view SQL query which is provided below Replace: yourexporttable with detailed export dataset ID and table name that will look like " yourpricingexporttable with pricing export and table name your_billing_dataset with your detailed export dataset ID and table name FOCUS SQL Query: WITH usage_export AS ( SELECT *, ( SELECT AS STRUCT type, id, full_name, amount, name FROM UNNEST(credits) LIMIT 1 ) AS cud, FROM "Your-Detailed-Billing-Export-ID" -- replace with your detailed usage export table path ), prices AS ( SELECT export_time, sku.id AS sku_id, sku.description AS sku_description, service.id AS service_id, service.description AS service_description, tier.* FROM "your_pricing_export_id", UNNEST(list_price.tiered_rates) AS tier ) SELECT "111111-222222-333333" AS BillingAccountId, "Your Company Name" AS BillingAccountName, COALESCE((SELECT SUM(x.amount) FROM UNNEST(usage_export.credits) x),0) + cost as BilledCost, usage_export.currency AS BillingCurrency, DATETIME(PARSE_DATE("%Y%m", invoice.month)) AS BillingPeriodStart, DATETIME(DATE_SUB(DATE_ADD(PARSE_DATE("%Y%m", invoice.month), INTERVAL 1 MONTH), INTERVAL 1 DAY)) AS BillingPeriodEnd, CASE WHEN usage_export.adjustment_info.type IS NOT NULL and usage_export.adjustment_info.type !='' THEN 'Adjustment' WHEN usage_export.cud.type = 'PROMOTION' AND usage_export.cost_type = 'regular' AND usage_export.cud.id IS NOT NULL THEN 'Credit' WHEN usage_export.sku.description LIKE '%Commitment - 3 years - dollar based VCF%' or usage_export.sku.description LIKE '%Prepay Commitment%' THEN 'Purchase' WHEN usage_export.cud.id IS NOT NULL AND usage_export.cud.id != '' THEN 'Credit' WHEN usage_export.cost_type = 'regular' THEN 'Usage' WHEN usage_export.cost_type = 'tax' THEN 'Tax' WHEN usage_export.cost_type = 'adjustment' THEN 'Adjustment' WHEN usage_export.cost_type = 'rounding_error' THEN 'Adjustment' ELSE usage_export.cost_type END AS ChargeCategory, IF(COALESCE( usage_export.adjustment_info.id, usage_export.adjustment_info.description, usage_export.adjustment_info.type, usage_export.adjustment_info.mode) IS NOT NULL, "correction", NULL) AS ChargeClass, CASE WHEN usage_export.adjustment_info.type IS NOT NULL AND usage_export.adjustment_info.type != '' THEN usage_export.adjustment_info.type ELSE usage_export.sku.description END AS ChargeDescription, CAST(usage_export.usage_start_time AS DATETIME) AS ChargePeriodStart, CAST(usage_export.usage_end_time AS DATETIME) AS ChargePeriodEnd, CASE usage_export.cud.type WHEN "COMMITTED_USAGE_DISCOUNT_DOLLAR_BASE" THEN "Spend" WHEN "COMMITTED_USAGE_DISCOUNT" THEN "Usage" END AS CommitmentDiscountCategory, usage_export.cud.id AS CommitmentDiscountId, COALESCE (usage_export.cud.full_name, usage_export.cud.name) AS CommitmentDiscountName, usage_export.cud.type AS CommitmentDiscountType, CAST(usage_export.usage.amount_in_pricing_units AS numeric) AS ConsumedQuantity, usage_export.usage.pricing_unit AS ConsumedUnit, -- review CAST( CASE WHEN usage_export.cost_type = "regular" THEN usage_export.price.effective_price * usage_export.price.pricing_unit_quantity ELSE 0 END AS NUMERIC ) AS ContractedCost, -- CAST(usage_export.price.effective_price AS numeric) AS ContractedUnitPrice, usage_export.seller_name AS InvoiceIssuerName, COALESCE((SELECT SUM(x.amount) FROM UNNEST(usage_export.credits) x),0) + cost as EffectiveCost, CAST(usage_export.cost_at_list AS numeric) AS ListCost, prices.account_currency_amount AS ListUnitPrice, IF( usage_export.cost_type = "regular", IF( LOWER(usage_export.sku.description) LIKE "commitment%" OR usage_export.cud IS NOT NULL, "committed", "standard"), null) AS PricingCategory, IF(usage_export.cost_type = 'regular', usage_export.price.pricing_unit_quantity, NULL) AS PricingQuantity, IF(usage_export.cost_type = 'regular', usage_export.price.unit, NULL) AS PricingUnit, 'Google'AS ProviderName, usage_export.transaction_type AS PublisherName, usage_export.location.region AS RegionId, usage_export.location.region AS RegionName, usage_export.service.id AS ResourceId, REGEXP_EXTRACT (usage_export.resource.global_name, r'[^/]+$') AS ResourceName, usage_export.sku.description AS ResourceType, COALESCE(servicemapping.string_field_1, 'Other') AS ServiceCategory, usage_export.service.description AS ServiceName, usage_export.sku.id AS SkuId, CONCAT("SKU ID:", usage_export.sku.id, ", Price Tier Start Amount: ", price.tier_start_amount) AS SkuPriceId, usage_export.project.id AS SubAccountId, usage_export.project.name AS SubAccountName, (SELECT CONCAT('{', STRING_AGG(FORMAT('%s:%s', kv.key, kv.value), ', '), '}') FROM ( SELECT key, value FROM UNNEST(usage_export.project.labels) UNION ALL SELECT key, value FROM UNNEST(usage_export.tags) UNION ALL SELECT key, value FROM UNNEST(usage_export.labels) ) AS kv) AS Tags, FORMAT_DATE('%B', PARSE_DATE('%Y%m', invoice.month)) AS Month, usage_export.project.name AS x_ResourceGroupName, CAST(usage_export.export_time AS TIMESTAMP) AS export_time, FROM usage_export LEFT JOIN "Your-Service-Category-Id".ServiceCategory AS servicemapping ON usage_export.service.description = servicemapping.string_field_0 LEFT JOIN prices ON usage_export.sku.id = prices.sku_id AND usage_export.price.tier_start_amount = prices.start_usage_amount AND DATE(usage_export.export_time) = DATE(prices.export_time); 🔊 "This creates a FOCUS-aligned view of your billing data using FOCUS 1.0 specification, this view does not 100% conform to FOCUS 1.0. Create GCS Bucket for CSV Exports Your GCS bucket is where you will place your incremental exports to be exported to Azure. Once your data is exported to Azure you may elect to delete the files in the bucket, the metaata table keeps a record of the last export time. Steps: Go to Cloud Storage > Create Bucket Name: focus-cost-export (or any name you would like) Region: Match your dataset region Storage Class: Nearline (cheaper to use Earline but standard will also work just fine) Enable Interoperability settings > create access/secret key The access key and secret are tied to your user account, if you want to be able to use this with multiple people, create an access key for your service account - recommended for Prod, for this guide purpose, an access key linked to your account is fine. Save the access key and secret to a secure location to use later as part of the ADF pipeline setup. 🔊 “This bucket will store daily CSVs. Consider enabling lifecycle cleanup policies.” Create a Cloud Function for incremental export A cloud function is used here to enable incremental exports of your billing data in FOCUS format on a regular schedule. At present there is no known supported on demand export service from Google, so we came up with this little workaround. The function is designed to evaluate your billing data for last export time that is equals to or greater than the last time the function ran. To do this we look at the export_time column and the metadata table for the last time we ran this. This ensure we only export the most recent billing data which aids in reducing data export costs to Azure. This process is done through the GCP GUI using an inline editor to create the cloud function in the cloud run service. Steps: Go to Cloud Run > Write Function Select > Use Inline editor to create a function Service Name: daily_focus_export Region, the same region as your dataset - in our demo case us-central1 Use settings: Runtime: Python 3.11 (you cannot use anything later than 3.11) Trigger: Optional Authentication: Require Authentication Billing: Request based Service Scaling: Auto-scaling set to 0 Ingress: All Containers: leave all settings as set Volumes: Leave all settings as set Networking: Leave all settings as set Security: Choose the service account you created earlier Save Create main.py file and requirements.txt files through the inline editor For requirements.txt copy and paste the below: functions-framework==3.* google-cloud-bigquery Google-cloud-storage For main.py your function entry point is: export_focus_data Paste the code below into your main.py inline editor window import logging import time import json from google.cloud import bigquery, storage logging.basicConfig(level=logging.INFO) def export_focus_data(request): bq_client = bigquery.Client() storage_client = storage.Client() project_id = "YOUR-ProjectID" dataset = "YOUR-Detailed-Export-Dataset" view = "YOUR-FOCUS-View-Name" metadata_table = "Your-Metadata-Table" job_name = "The name you want to call the job for the export" bucket_base = "gs://<your bucketname>/<your foldername>" bucket_name = "Your Bucket Name" metadata_file_path = "Your-Bucket-name/export_metadata.json" try: logging.info("🔁 Starting incremental export based on export_time...") # Step 1: Get last export_time from metadata metadata_query = f""" SELECT last_export_time FROM `{project_id}.{dataset}.{metadata_table}` WHERE job_name = '{job_name}' LIMIT 1 """ metadata_result = list(bq_client.query(metadata_query).result()) if not metadata_result: return "No metadata row found. Please seed export_metadata.", 400 last_export_time = metadata_result[0].last_export_time logging.info(f"📌 Last export_time from metadata: {last_export_time}") # Step 2: Check for new data check_data_query = f""" SELECT COUNT(*) AS row_count FROM `{project_id}.{dataset}.{view}` WHERE export_time >= TIMESTAMP('{last_export_time}') """ row_count = list(bq_client.query(check_data_query).result())[0].row_count if row_count == 0: logging.info("✅ No new data to export.") return "No new data to export.", 204 # Step 3: Get distinct export months folder_query = f""" SELECT DISTINCT FORMAT_DATETIME('%Y%m', BillingPeriodStart) AS export_month FROM `{project_id}.{dataset}.{view}` WHERE export_time >= TIMESTAMP('{last_export_time}') AND BillingPeriodStart IS NOT NULL """ export_months = [row.export_month for row in bq_client.query(folder_query).result()] logging.info(f"📁 Exporting rows from months: {export_months}") # Step 4: Export data for each month for export_month in export_months: export_path = f"{bucket_base}/{export_month}/export_{int(time.time())}_*.csv" export_query = f""" EXPORT DATA OPTIONS( uri='{export_path}', format='CSV', overwrite=true, header=true, field_delimiter=';' ) AS SELECT * FROM `{project_id}.{dataset}.{view}` WHERE export_time >= TIMESTAMP('{last_export_time}') AND FORMAT_DATETIME('%Y%m', BillingPeriodStart) = '{export_month}' """ bq_client.query(export_query).result() # Step 5: Get latest export_time from exported rows max_export_time_query = f""" SELECT MAX(export_time) AS new_export_time FROM `{project_id}.{dataset}.{view}` WHERE export_time >= TIMESTAMP('{last_export_time}') """ new_export_time = list(bq_client.query(max_export_time_query).result())[0].new_export_time # Step 6: Update BigQuery metadata table update_query = f""" MERGE `{project_id}.{dataset}.{metadata_table}` T USING ( SELECT '{job_name}' AS job_name, TIMESTAMP('{new_export_time}') AS new_export_time ) S ON T.job_name = S.job_name WHEN MATCHED THEN UPDATE SET last_export_time = S.new_export_time WHEN NOT MATCHED THEN INSERT (job_name, last_export_time) VALUES (S.job_name, S.new_export_time) """ bq_client.query(update_query).result() # Step 7: Write metadata JSON to GCS blob = storage_client.bucket(bucket_name).blob(metadata_file_path) blob.upload_from_string( json.dumps({"last_export_time": new_export_time.isoformat()}), content_type="application/json" ) logging.info(f"📄 Metadata file written to GCS: gs://{bucket_name}/{metadata_file_path}") return f"✅ Export complete. Metadata updated to {new_export_time}", 200 except Exception as e: logging.exception("❌ Incremental export failed:") return f"Function failed: {str(e)}", 500 Before saving, update variables like project_id, bucket_name, etc. 🔊 “This function exports new billing data based on the last timestamp in your metadata table.” Test the Cloud Function Deploy and click Test Function Ensure a seed export_metadata.json file is uploaded to your GCS bucket (if you have not uploaded a seed export file the function will not run. Example file is below { "last_export_time": "2024-12-31T23:59:59.000000+00:00" } Confirm new CSVs appear in your target folder Automate with Cloud Scheduler This step will set up an automated schedule to run your cloud function on a daily or hourly pattern, you may adjust the frequency to your desired schedule, this demo uses the same time each day at 11pm. Steps: Go to Cloud Scheduler > Create Job Region: Same region as your Function - This Demo us-central1 Name: daily-focus-export Frequency: 0 */6 * * * (every 6 hours) or Frequency: 0 23 * * * (daily at 11 PM) Time zone: your desired time zone Target: HTTP Auth: OIDC Token → Use service account Click CREATE 🔊 "This step automates the entire pipeline to run daily and keep downstream platforms updated with the latest billing data." Wrapping up Google setup Wow—there was a lot to get through! But now that you’ve successfully enabled FOCUS-formatted exports in Google Cloud, you're ready for the next step: connecting Azure to your Google Cloud Storage bucket and ingesting the data into your FinOps hub instance. This is where everything comes together—enabling unified, multi-cloud reporting in your Hub across both Azure and Google Cloud billing data. Let’s dive into building the Data Factory pipeline and the associated datasets needed to fetch, transform, and load your Google billing exports. Connecting Azure to Google billing data With your Google Cloud billing data now exporting in FOCUS format, the next step is to bring it into your Azure environment for centralized FinOps reporting. Using Data Factory, we'll build a pipeline to fetch the CSVs from your Google Cloud Storage bucket, convert them to Parquet, and land them in your FinOps Hub storage account. Azure access and prerequisites Access to the Azure Portal Access to the resource group your hub has been deployed to Contributor access to your resource group At a minimum storage account contributor and storage blob data owner roles An existing deployment of the Microsoft FinOps Hub Toolkit Admin rights on Azure Data Factory (or at least contributor role on ADF) Services and tools required Make sure the following Azure resources are in place: Azure Data Factory instance A linked Azure Storage Account (this is where FinOps Hub is expecting data) (Optional) Azure Key Vault for secret management Pipeline Overview We’ll be creating the following components in Azure Data Factory Component Purpose GCS Linked Service Connects ADF to your Google Cloud Storage bucket Azure Blob Linked Service Connects ADF to your Hub’s ingestion container Source Dataset Reads CSV files from GCS Sink Dataset Writes Parquet files to Azure Data Lake (or Blob) in Hub's expected format Pipeline Logic Orchestrates the copy activity, format conversion, and metadata updates Secrets and authentication To connect securely from ADF to GCS, you will need: The access key and secret from GCS Interoperability settings (created earlier) Store them securely in Azure Key Vault or in ADF pipeline parameters (if short-lived) 💡 Best Practice Tip: Use Azure Key Vault to securely store GCS access credentials and reference them from ADF linked services. This improves security and manageability over hardcoding values in JSON. Create Google Cloud storage billing data dataset Now that we're in Azure, it's time to connect to your Google Cloud Storage bucket and begin building your Azure Data Factory pipeline. Steps Launch Azure Data Factory Log in to the Azure Portal Navigate to your deployed Azure Data Factory instance Click “Author” from the left-hand menu to open the pipeline editor In the ADF Author pane, click the "+" (plus) icon next to Datasets Select “New dataset” Choose Google Cloud Storage as the data source Select CSV as the file format Set the Dataset Name, for example: gcs_focus_export_dataset Click “+ New” next to the Linked Service dropdown Enter a name like: GCS-Billing-LinkedService Under Authentication Type, choose: Access Key (for Interoperability access key) Or Azure Key Vault (recommended for secure credential storage) Fill in the following fields: Access Key: Your GCS interoperability key Secret: Your GCS interoperability secret Bucket Name: focus-cost-export (Optional) Point to a folder path like: focus-billing-data/ Click Test Connection to validate access Click Create Adjust Dataset Properties Now that the dataset has been created, make the following modifications to ensure proper parsing of the CSVs: SettingValue Column delimiter; (semicolon) Escape character" (double quote) You can find these under the “Connection” and “Schema” tabs of the dataset editor. If your connection fails you may need to enable public access on your GCS bucket or check your firewall restrictions from azure to the internet! Create a Google Cloud Storage metadata dataset To support incremental loading, we need a dedicated dataset to read the export_metadata.json file that your Cloud Function writes to Google Cloud Storage. This dataset will be used by the Lookup activity in your pipeline to get the latest export timestamp from Google. Steps: In the Author pane of ADF, click "+" > Dataset > New dataset Select Google Cloud Storage as the source Choose JSON as the format Click Continue Configure the Dataset Setting Value Name gcs_export_metadata_dataset Linked Service Use your existing GCS linked service File path e.g., focus-cost-export/metadata/export_metadata.json Import schema Set to From connection/store or manually define if needed File pattern Set to single file (not folder) Create the sink dataset – "ingestion_gcp" Now that we’ve connected to Google Cloud Storage and defined the source dataset, it’s time to create the sink dataset. This will land your transformed billing data in Parquet format into your Azure FinOps Hub’s ingestion container. Steps: In Azure Data Factory, go to the Author section Under Datasets, click the "+" (plus) icon and select “New dataset” Choose Azure Data Lake Storage Gen2 (or Blob Storage, depending on your Hub setup) Select Parquet as the file format Click Continue Configure dataset properties Name: ingestion_gcp Linked Service: Select your existing linked service that connects to your FinOps Hub’s storage account. (this will have been created when you deployed the hub) File path: Point to the container and folder path where you want to store the ingested files (e.g., ingestion-gcp/cost/billing-provider/gcp/focus/) Optional Settings: Option Recommended Value Compression type snappy Once configured, click Publish All again to save your new sink dataset. Create the sink dataset – "adls_last_import_metadata" The adls_last_import_metadata dataset (sinks dataset) is the location you use to copy the export time json file from google to azure, this location is sued for the pipeline to check the last time the import of data was run by reading the json file you coped from google and comparing it to the new json file from google Steps: In Azure Data Factory, go to the Author section Under Datasets, click the "+" (plus) icon and select “New dataset” Choose Azure Data Lake Storage Gen2 (or Blob Storage, depending on your Hub setup) Select JSON as the file format Click Continue Configure dataset properties Name: adsl_last_import_metadata Linked Service: Select your existing linked service that connects to your FinOps Hub’s storage account. (this will have been created when you deployed the hub) File path: Point to the container and folder path where you want to store the ingested files (e.g., ingestion/cost/metadata) Build the ADF pipeline for incremental import With your datasets created, the final step is building the pipeline logic that orchestrates the data flow. The goal is to only import newly exported billing data from Google, avoiding duplicates by comparing timestamps from both clouds. In this pipeline, we’ll: Fetch Google’s export timestamp from the JSON metadata file Fetch the last successful import time from the Hub’s metadata file in Azure Compare timestamps to determine whether there is new data to ingest If new data exists: Run the Copy Activity to fetch GCS CSVs, convert to Parquet, and write to the ingestion_gcp container Write an updated metadata file in the hub, with the latest import timestamp Pipeline components: Activity Purpose Lookup - GCS Metadata Reads the last export time from Google's metadata JSON in GCS Lookup - Hub Metadata Reads the last import time from Azure’s metadata JSON in ADLS If Condition Compares timestamps to decide whether to continue with copy Copy Data Transfers files from GCS (CSV) to ADLS (Parquet) Set Variable Captures the latest import timestamp Web/Copy Activity Writes updated import timestamp JSON file to ingestion_gcp container What your pipeline will look like: Step-by-Step Create two lookup activities: GCS Metadata Hub Metadata Lookup1 – GCS Metadata Add an activity called Lookup to your new pipeline Select the Source Dataset: gcs_export_metadata_dataset or whatever you named it earlier This lookup reads the the export_metadata.json file created by your Cloud Function in GCS for the last export time available. Configuration view of lookup for GCS metadata file Lookup2 – Hub Metadata Add an activity called lookup and name it Hub Metadata Select the source dataset: adls_last_import_metadata This lookup reads the last import time data from the hub metadata file to compare it to the last export time from GCS Configuration view of Metadata lookup activity Add conditional logic In this step we will add the condition logic Add activity called If Condition Add the expression below to the condition Go to the Expression tab and paste the following into Dynamic Content: @greater(activity('Lookup1').output.last_export_time, activity('Lookup2').output.last_import_time) Configuration view of If Condition Activity Next Add Copy Activity (If Condition = True) next to True condition select edit and add activity Copy Data Configure the activity with the following details Setting Value Source Dataset gcs_focus_export_dataset (CSV from GCS) Sink Dataset ingestion_gcp (Parquet to ADLS) Merge Files Enabled (reduce file count) Filter Expression @activity('Lookup1').output.firstRow.last_export_time Ensure you add filter expression for filter by last modified. This is important, if you do not add the filter by last modified expression your pipeline will not function properly. Finally we create an activity to update the metadata file in the hub Add another copy activity to your if condition and ensure it is linked to the previous copy activity, this ensure it runs after the import activity is completed. Copy metadata activity settings: Source gcs_export_metadata_dataset Sink adls_last_import_dataset Destination Path ingestion_gcp/metadata/last_import.json This step ensures the next run of your pipeline uses the updated import time to decide whether new data exists. Your pipeline now ingests only new billing data from Google and records each successful import to prevent duplicate processing. Wrapping up – A unified FinOps reporting solution Congratulations — you’ve just built a fully functional multi-cloud FinOps data pipeline using Microsoft’s FinOps Hub Toolkit and Google Cloud billing data, normalized with the FOCUS 1.0 standard. By following this guide, you’ve: ✅ Enabled FOCUS billing exports in Google Cloud using BigQuery, GCS, and Cloud Functions ✅ Created normalized, FOCUS-aligned views to unify your GCP billing data ✅ Automated metadata tracking to support incremental exports ✅ Built an Azure Data Factory pipeline to fetch, transform, and ingest GCP data into your Hub ✅ Established a reliable foundation for centralized, multi-cloud cost reporting This solution brings side-by-side visibility into Azure and Google Cloud costs, enabling informed decision-making, workload optimization, and true multi-cloud FinOps maturity. Next steps 🔁 Schedule the ADF pipeline to run daily or hourly using triggers 📈 Build Power BI dashboards or use templates from the Microsoft FinOps Toolkit visualise unified cloud spend 🧠 Extend to AWS by applying the same principles using the AWS FOCUS export and AWS S3 storage Feedback? Have questions or want to see more deep dives like this? Let me know — or connect with me if you’re working on FinOps, FOCUS, or multi-cloud reporting. This blog is just the beginning of what's possible.
GrahamM
May 27, 2025 Place FinOps Blog
3.3KViews
4likes
1Comment
What’s new in FinOps toolkit 0.8 – February 2025
In February, the FinOps toolkit 0.8 introduced a complete refresh of Power BI with a new design, improved performance, and ability to calculate reservation savings; FinOps hubs have a new dashboard and simpler architecture; and much more!
micflan
Mar 12, 2025 Place FinOps Blog
2.2KViews
4likes
0Comments
What’s new in FinOps toolkit 0.4 – July 2024
In July, the FinOps toolkit 0.4 added support for FOCUS 1.0, updated tools and resources to align with the FinOps Framework 2024 updates, introduced a new tool for cloud optimization recommendations called Azure Optimization Engine, and more!
micflan
Jul 23, 2024 Place FinOps Blog
3.9KViews
4likes
1Comment
What’s new in FinOps toolkit 12 – July 2025
This month, you’ll find support for FOCUS 1.2, autostart in FinOps hubs which can reduce your hub costs, a new page in the Cost summary Power BI report, and various small fixes, improvements, and documentation updates across the board. Read on for details.
Michael_Flanakin
Aug 07, 2025 Place FinOps Blog
1.3KViews
3likes
5Comments
Learning FOCUS: Prices + quantities
Learn about how to track and measure usage and purchase amounts and calculate costs in a FOCUS dataset.
micflan
May 28, 2025 Place FinOps Blog
801Views
3likes
0Comments
A journey from Excel to FinOps hubs
When I look back at our early days, it almost seems surreal how our Cloud Economics Practice at Alescent evolved - bootstrapping a 360-degree cloud economic optimization from an internal Excel-based cloud cost analysis service, we have built a state-of-the-art FinOps hubs-based solution with a myopic focus on value realization. In this story, we share our technical evolution and a whole lot of learnings along the way. Humble beginnings: The Excel era About five years ago, in 2020, I was working as a cloud solution architect for my former employer – a company deeply embedded in the Cloud Solution Provider business. We set out to build a FinOps capability within the B2B consulting department, driven by the belief that the CSP should always act in the end-customer's best interest. Our initial approach was simple: we began by examining cloud cost data, focusing primarily on Azure (with a touch of AWS). Back then, our analysis was rooted in Excel. I remember the excitement of uncovering cost patterns in data that spanned thousands – if not millions—of rows. We focused on three key Azure services: Virtual machines Virtual machine scale sets Azure Kubernetes Service Using a combination of 'fgrep' and Linux shell scripting on an Ubuntu shell via Windows Subsystem for Linux, I filtered out the irrelevant data. We were clear that this was only a partial view - capturing just the compute and licensing costs - but it was enough to reveal that many customers were overspending. In fact, our early analyses showed potential savings between 15% and 30% of their total cloud costs. These savings were primarily due to two things: Outdated services: Many customers weren’t leveraging the latest VM versions, missing out on better per-vCPU performance and the opportunity to right-size their configurations. Licensing oversights: For Windows Server, customers often used Azure’s “license included” feature instead of taking advantage of cheaper licensing options through the Azure Hybrid Use Benefit (AHUB) feature. The results were eye-opening. Customers who had migrated from their on-premises data centers in a typical lift-and-shift manner were wasting significant amounts of money on compute services and licensing: a realization that paved the way for more radical changes. The breakthrough: Transitioning to Azure Data Explorer and Kusto Query Language Despite the effectiveness of our Excel-based method, its limitations soon became too great to ignore. Enter Azure Data Explorer (ADX), and its powerful query language (KQL). My colleague Roland, who had a strong background in development and IoT proofs of concept, using ADX and KQL, made the case for transitioning our data analysis to ADX. Initially, I was skeptical. My bash scripts and Excel workflows, while cumbersome, were working. But the simplicity and power of KQL quickly won me over, as – having founded my own company – I quickly had to step up my game from merely providing optimization recommendations to achieving value realizations. From that moment on, we – Roland and I – embarked on a new chapter. Working with ADX revolutionized our process. Suddenly, questions that once took hours to answer in Excel were resolved in minutes with a few lines of KQL. For example, in our working group and steering committee meetings, queries like "what has been the cost trend of our Premium Managed Disks?" or "how significant was the impact of reducing instances in our App Service Plans?" shifted from "I’ll get back to you" to "Let’s check it right now." The ability to produce credible, live data answers on the spot dramatically boosted our credibility with IT leadership – while being able to do complex calculations like the break-even of a reservation or of a commitment tier expanded both the savings potential we could tap into as well as our rapport with the cloud ops engineers supporting our FinOps programs. And all that with data never leaving our customers’ Azure tenant: we deployed ADX into their subscriptions and opened the cloud data providing full visibility, cost transparency and allocation – sometimes for the first time – to all stakeholders in our customers’ project environment. Our customers’ technical teams soon jumped on the KQL train. Not only did our data analysts start exchanging KQL snippets with our internal team, but our customers’ engineers began writing their own queries to generate real-time insights from Azure cost data. This cultural shift from static Excel reports to dynamic, code-driven real-time dashboards revolutionized the way cost optimization was approached. Embracing the FOCUS standard and the FinOps hubs transition The evolution didn’t stop with KQL. Soon after, the FOCUS standard was released, merging EffectiveCost and BilledCost into a single, unified dataset. We adapted our ingestion process to the FOCUS standard—preserving our quality-of-life improvements like calculating "first used" and "last seen" dates for resources and splitting critical tags (such as "environment" or "application") into dedicated columns. This transition significantly improved query performance and enriched our data analysis. In late 2024, we learned about Microsoft’s FinOps toolkit and the FinOps hubs solution was also moving to ADX and KQL. Thanks to Michael Flanakin and his team’s work in mapping different FOCUS versions for backward compatibility, the transition was seamless. We only had to update a few details in our KQL queries – changing table names and converting certain data types - to cut over to FinOps hubs. Our dashboards continued to function without interruption, and our customers barely noticed the switch. In fact, our new ADX & KQL-based FinOps hubs service provided an even more robust foundation for real-time cost management by adding our dashboards and analysis scripts on top of the dashboards provided by FinOps hubs. The impact on our journey and our customers This transformation had a profound impact on our work: Customer success: By transitioning to ADX, we helped customers drastically reduce their cloud costs. Our dynamic dashboards and live queries became a cornerstone of our service, leading to contract renewals and expanded FinOps capabilities. Internal evolution: The success of our ADX implementation not only enhanced our credibility but also drove a cultural shift. Our customers' engineers started interacting with our data in real time, exchanging KQL code snippets, and independently verifying our findings. Strategic partnerships: This journey was not just a technical evolution—it was a turning point that spurred strategic growth. What started as a two-man show with Roland and me has grown to a team of 20 people with skills ranging from FinOps over license and contract optimization to AI development, using of our Value Realization Framework to drive economic optimization inside and around the public cloud. Conclusion Our journey from Excel to FinOps hubs is a testament to the power of innovation and the importance of evolving with the tools at our disposal. What began as a rudimentary Excel analysis transformed into a cutting-edge, real-time data processing platform that reshaped our approach to cloud economic optimization. This evolution not only enhanced our technical capabilities but also revolutionized how we interact with our customers – turning static reports into dynamic, actionable insights and reducing implementation times from months to days. I hope this narrative not only informs but also inspires you to embrace change and continuously seek better solutions. Our journey is a living story, one that continues to evolve, just as our technology and strategies do. Follow-up discussion: I’d love to hear from you all – what additional details, anecdotes, or angles should we explore in future blog posts about our journey? How can we further illustrate the impact of our transition on both our internal processes and our customers’ success?
Stefan_Denk
Apr 15, 2025 Place FinOps Blog
491Views
3likes
0Comments
Managing Azure OpenAI costs with the FinOps toolkit and FOCUS: Turning tokens into unit economics
By Robb Dilallo Introduction As organizations rapidly adopt generative AI, Azure OpenAI usage is growing—and so are the complexities of managing its costs. Unlike traditional cloud services billed per compute hour or storage GB, Azure OpenAI charges based on token usage. For FinOps practitioners, this introduces a new frontier: understanding AI unit economics and managing costs where the consumed unit is a token. This article explains how to leverage the Microsoft FinOps toolkit and the FinOps Open Cost and Usage Specification (FOCUS) to gain visibility, allocate costs, and calculate unit economics for Azure OpenAI workloads. Why Azure OpenAI cost management is different AI services break many traditional cost management assumptions: Billed by token usage (input + output tokens). Model choices matter (e.g., GPT-3.5 vs. GPT-4 Turbo vs. GPT-4o). Prompt engineering impacts cost (longer context = more tokens). Bursty usage patterns complicate forecasting. Without proper visibility and unit cost tracking, it's difficult to optimize spend or align costs to business value. Step 1: Get visibility with the FinOps toolkit The Microsoft FinOps toolkit provides pre-built modules and patterns for analyzing Azure cost data. Key tools include: Microsoft Cost Management exports Export daily usage and cost data in a FOCUS-aligned format. FinOps hubs Infrastructure-as-Code solution to ingest, transform, and serve cost data. Power BI templates Pre-built reports conformed to FOCUS for easy analysis. Pro tip: Start by connecting your Microsoft Cost Management exports to a FinOps hub. Then, use the toolkit’s Power BI FOCUS templates to begin reporting. Learn more about the FinOps toolkit Step 2: Normalize data with FOCUS The FinOps Open Cost and Usage Specification (FOCUS) standardizes billing data across providers—including Azure OpenAI. FOCUS Column Purpose Azure Cost Management Field ServiceName Cloud service (e.g., Azure OpenAI Service) ServiceName ConsumedQuantity Number of tokens consumed Quantity PricingUnit Unit type, should align to "tokens" DistinctUnits BilledCost Actual cost billed CostInBillingCurrency ChargeCategory Identifies consumption vs. reservation ChargeType ResourceId Links to specific deployments or apps ResourceId Tags Maps usage to teams, projects, or environments Tags UsageType / Usage Details Further SKU-level detail Sku Meter Subcategory, Sku Meter Name Why it matters: Azure’s native billing schema can vary across services and time. FOCUS ensures consistency and enables cross-cloud comparisons. Tip: If you use custom deployment IDs or user metadata, apply them as tags to improve allocation and unit economics. Review the FOCUS specification Step 3: Calculate unit economics Unit cost per token = BilledCost ÷ ConsumedQuantity Real-world example: Calculating unit cost in Power BI A recent Power BI report breaks down Azure OpenAI usage by: SKU Meter Category → e.g., Azure OpenAI SKU Meter Subcategory → e.g., gpt 4o 0513 Input global Tokens SKU Meter Name → detailed SKU info (input/output, model version, etc.) GPT Model Usage Type Effective Cost gpt 4o 0513 Input global Tokens Input $292.77 gpt 4o 0513 Output global Tokens Output $23.40 Unit Cost Formula: Unit Cost = EffectiveCost ÷ ConsumedQuantity Power BI Measure Example: Unit Cost = SUM(EffectiveCost) / SUM(ConsumedQuantity) Pro tip: Break out input and output token costs by model version to: Track which workloads are driving spend. Benchmark cost per token across GPT models. Attribute costs back to teams or product features using Tags or ResourceId. Power BI tip: Building a GPT cost breakdown matrix To easily calculate token unit costs by GPT model and usage type, build a Matrix visual in Power BI using this hierarchy: Rows: SKU Meter Category SKU Meter Subcategory SKU Meter Name Values: EffectiveCost (sum) ConsumedQuantity (sum) Unit Cost (calculated measure) Unit Cost = SUM(‘Costs’[EffectiveCost]) / SUM(‘Costs’[ConsumedQuantity]) Hierarchy Example: Azure OpenAI ├── GPT 4o Input global Tokens ├── GPT 4o Output global Tokens ├── GPT 4.5 Input global Tokens └── etc. Power BI Matrix visual showing Azure OpenAI token usage and costs by SKU Meter Category, Subcategory, and Name. This breakdown enables calculation of unit cost per token across GPT models and usage types, supporting FinOps allocation and unit economics analysis. What you can see at the token level Metric Description Data Source Token Volume Total tokens consumed Consumed Quantity Effective Cost Actual billed cost BilledCost / Cost Unit Cost per Token Cost divided by token quantity Effective Unit Price SKU Category & Subcategory Model, version, and token type (input/output) Sku Meter Category, Subcategory, Meter Name Resource Group / Business Unit Logical or organizational grouping Resource Group, Business Unit Application Application or workload responsible for usage Application (tag) This visibility allows teams to: Benchmark cost efficiency across GPT models. Track token costs over time. Allocate AI costs to business units or features. Detect usage anomalies and optimize workload design. Tip: Apply consistent tagging (Cost Center, Application, Environment) to Azure OpenAI resources to enhance allocation and unit economics reporting. How the FinOps Foundation’s AI working group informs this approach The FinOps for AI overview, developed by the FinOps Foundation’s AI working group, highlights unique challenges in managing AI-related cloud costs, including: Complex cost drivers (tokens, models, compute hours, data transfer). Cross-functional collaboration between Finance, Engineering, and ML Ops teams. The importance of tracking AI unit economics to connect spend with value. By combining the FinOps toolkit, FOCUS-conformed data, and Power BI reporting, practitioners can implement many of the AI Working Group’s recommendations: Establish token-level unit cost metrics. Allocate costs to teams, models, and AI features. Detect cost anomalies specific to AI usage patterns. Improve forecasting accuracy despite AI workload variability. Tip: Applying consistent tagging to AI workloads (model version, environment, business unit, and experiment ID) significantly improves cost allocation and reporting maturity. Step 4: Allocate and Report Costs With FOCUS + FinOps toolkit: Allocate costs to teams, projects, or business units using Tags, ResourceId, or custom dimensions. Showback/Chargeback AI usage costs to stakeholders. Detect anomalies using the Toolkit’s patterns or integrate with Azure Monitor. Tagging tip: Add metadata to Azure OpenAI deployments for easier allocation and unit cost reporting. Example: tags: CostCenter: AI-Research Environment: Production Feature: Chatbot Step 5: Iterate Using FinOps Best Practices FinOps capability Relevance Reporting & analytics Visualize token costs and trends Allocation Assign costs to teams or workloads Unit economics Track cost per token or business output Forecasting Predict future AI costs Anomaly management Identify unexpected usage spikes Start small (Crawl), expand as you mature (Walk → Run). Learn about the FinOps Framework Next steps Ready to take control of your Azure OpenAI costs? Deploy the Microsoft FinOps toolkit Start ingesting and analyzing your Azure billing data. Get started Adopt FOCUS Normalize your cost data for clarity and cross-cloud consistency. Explore FOCUS Calculate AI unit economics Track token consumption and unit costs using Power BI. Customize Power BI reports Extend toolkit templates to include token-based unit economics. Join the conversation Share insights or questions with the FinOps community on TechCommunity or in the FinOps Foundation Slack. Advance Your Skills Consider the FinOps Certified FOCUS Analyst certification. Further Reading Managing the cost of AI: Understanding AI workload cost considerations Microsoft FinOps toolkit Learn about FOCUS Microsoft Cost Management + Billing FinOps Foundation Appendix: FOCUS column glossary ConsumedQuantity: The number of tokens or units consumed for a given SKU. This is the key measure of usage. ConsumedUnit: The type of unit being consumed, such as 'tokens', 'GB', or 'vCPU hours'. Often appears as 'Units' in Azure exports for OpenAI workloads. PricingUnit: The unit of measure used for pricing. Should match 'ConsumedUnit', e.g., 'tokens'. EffectiveCost: Final cost after amortization of reservations, discounts, and prepaid credits. Often derived from billing data. BilledCost: The invoiced charge before applying commitment discounts or amortization. PricingQuantity: The volume of usage after applying pricing rules such as tiered or block pricing. Used to calculate cost when multiplied by unit price.
robbdi
May 21, 2025 Place FinOps Blog
2KViews
2likes
1Comment
EA to MCA: How the FinOps toolkit can help you transition smoothly
Make the shift from EA to MCA seamless with the FinOps toolkit—unlock better cost visibility across accounts, quantify savings, and gain actionable insights to optimize your cloud investments.
micflan
May 01, 2025 Place FinOps Blog
868Views
2likes
0Comments