azure databricks
89 Topics
Approaches to Integrating Azure Databricks with Microsoft Fabric: The Better Together Story!
Azure Databricks and Microsoft Fabric can be combined to create a unified and scalable analytics ecosystem. This document outlines eight distinct integration approaches, each accompanied by step-by-step implementation guidance and key design considerations. These methods are not prescriptive; your cloud architecture team can choose the integration strategy that best aligns with your organization’s governance model, workload requirements, and platform preferences. Whether you prioritize centralized orchestration, direct data access, or seamless reporting, the flexibility of these options allows you to tailor the solution to your specific needs.
268 Views, 1 like, 0 Comments
General Availability: Automatic Identity Management (AIM) for Entra ID on Azure Databricks
In February, we announced Automatic Identity Management in public preview and loved hearing your overwhelmingly positive feedback. Prior to public preview, you either had to set up an Entra Enterprise Application or involve an Azure Databricks account admin to import the appropriate groups. This required manual steps, whether that meant adding or removing users as the organization changed, maintaining scripts, or setting up additional Entra or SCIM configuration. Identity management was thus cumbersome and carried management overhead.

Today, we are excited to announce that Automatic Identity Management (AIM) for Entra ID on Azure Databricks is generally available. This means no manual user setup is needed and you can instantly add users to your workspace(s). Users, groups, and service principals from Microsoft Entra ID are automatically available within Azure Databricks, including support for nested groups and dashboards. This native integration is one of the many reasons Databricks runs best on Azure.

Here are some additional ways AIM could benefit you and your organization:

Seamlessly share dashboards
You can share AI/BI dashboards with any user, service principal, or group in Microsoft Entra ID immediately, as these users are automatically added to the Azure Databricks account upon login. Members of Microsoft Entra ID who do not have access to the workspace are granted access to a view-only copy of a dashboard published with embedded credentials. This enables you to share dashboards with users outside your organization, too. To learn more, see share a dashboard.

Updated defaults for new accounts
All new Azure Databricks accounts have AIM enabled, with no opt-in or additional configuration required. For existing accounts, you can enable AIM with a single click in the Account Admin Console. Soon, we will also make this the default for existing accounts.
Automation at scale enabled via APIs
You can also register users, groups, or service principals in Microsoft Entra ID via APIs. Doing this programmatically provides the enterprise scale most of our customers need. You can also enable automation via scripts leveraging these APIs.

Read the Databricks blog here and get started via documentation today!
565 Views, 1 like, 0 Comments
Part 2: Performance Configurations for Connecting PBI to a Private Link ADB Workspace
This blog was written in conjunction with Leo Furlong, Lead Solutions Architect at Databricks.

In Part 1, we discussed networking options for connecting Power BI to an Azure Databricks workspace with a Public Endpoint protected with a workspace IP Access List. In Part 2, we continue our discussion and elaborate on private networking options for an Azure Databricks Private Link workspace. When using Azure Databricks Private Link with the Allow Public Network Access setting set to Disabled, all connections to the workspace must go through Private Endpoints. For one of the private networking options, we’ll also discuss how to configure your On-Premise Data Gateway VM to get good performance.

Connecting Power BI to a Private Link Azure Databricks Workspace

As covered in Part 1, Power BI offers two primary methods for secure connections to data sources with private networking:

1. On-premises data gateway: An application installed on a Virtual Machine that has a direct networking connection to a data source. It allows Power BI to connect to data sources that don’t allow public connections. The general flow of this setup entails:
a. Create or leverage a set of Private Endpoints to the Azure Databricks workspace; both the databricks_ui_api and browser_authentication sub-resources are required
b. Create or leverage a Private DNS Zone for privatelink.azuredatabricks.net
c. Deploy an Azure VM into a VNet/subnet
d. The VM’s VNet/subnet should have access to the Private Endpoints (PEs), either by being in the same VNet or by being peered with the VNet where they reside
e. Install and configure the on-premises data gateway software on the VM
f. Create a connection in the Power BI Service via the Settings -> Manage Connections and Gateways UIs
g. Configure the Semantic Model to use the connection under the Semantic Model’s settings, in the gateway and cloud connections sub-section

2. Virtual Network Data Gateway: A fully managed data gateway that gets created and managed by the Power BI service. Connections work by allowing Power BI to delegate into a VNet for secure connectivity to the data source. The general flow of this setup entails:
a. Create or leverage a set of Private Endpoints (PEs) to the Azure Databricks workspace; both the databricks_ui_api and browser_authentication sub-resources are required
b. Create or leverage a Private DNS Zone for privatelink.azuredatabricks.net
c. Create a subnet in a VNet that has access to the Private Endpoints (PEs), either by being in the same VNet or by being peered with the VNet where they reside. Delegate the subnet to Microsoft.PowerPlatform/vnetaccesslinks
d. Create a virtual network data gateway in the Power BI Service via the Settings -> Manage Connections and Gateways UIs
e. Configure the Semantic Model to use the connection under the Semantic Model’s settings, in the gateway and cloud connections sub-section

The documentation for both options is fairly extensive, and this blog post will not focus on breaking down the configurations further. Instead, this post is about configuring your private connections to get the best Import performance.

On-Premise Data Gateway Performance Testing

In order to provide configuration guidance, a series of Power BI Import tests were performed using various configurations and a testing dataset.

Testing Data

The testing dataset used was a TPC-DS scale factor 10 dataset (you can create your own using this Repo). A scale factor of 10 in TPC-DS generates about 10 gigabytes (GB) of data. The TPC-DS dataset was loaded into Unity Catalog and the primary and foreign keys were created between the tables. A model was then created in the Power BI Service using the Publish to Power BI capabilities in Unity Catalog; the primary and foreign keys were used to automatically create relationships between the tables in the Power BI semantic model.
Here’s an overview of the tables used in this dataset:

Fabric Capacity

An F64 Fabric Capacity was used in the West US region. The F64 was the smallest size available (in terms of RAM) for refreshing the model without getting capacity errors; the compressed Semantic Model size is 5,244 MB.

Azure Databricks SQL Warehouse

An Azure Databricks workspace using Unity Catalog was deployed in the East US 2 and West US regions for the performance tests. A Medium Databricks SQL Warehouse was used. For Imports, generally speaking, the size of the SQL Warehouse isn’t very important. Using an aggressive Auto Stop configuration of 5 minutes is ideal to minimize compute charges (1 minute can be used if the SQL Warehouse is deployed via an API).

Testing Architecture

The following diagram summarizes a simplified Azure networking architecture for the performance tests.

1. A Power BI Semantic Model is connected to a Power BI On-Premise Data Gateway Connection.
2. The On-Premise Data Gateway Connection connects to the Azure Databricks workspace using Private Endpoints.
3. Azure Databricks provisions a Serverless SQL Warehouse in ~5 seconds within the Serverless Data Plane within Azure.
4. SQL queries are executed on the Serverless SQL Warehouse.
5. Unity Catalog gives the Serverless SQL Warehouse a read-only, down-scoped, and pre-signed URL to ADLS.
6. Data is fetched from ADLS and placed on the Azure Databricks workspace’s managed storage account via a capability called Cloud Fetch.
7. Arrow Files are pulled from Cloud Fetch and delivered to the Power BI Service through the Data Gateway.
8. Data in the Semantic Model is compressed and stored in Vertipaq In-Memory storage.

Testing Results

The following grid outlines the scenarios tested and the results for each test. We’ll review the different configurations tested below in specific sections.
Scenario | Gateway Scenario | Avg Refresh Duration (mm:ss)
A | East US 2, Public Endpoint | 17:01
B | West US, Public Endpoint | 12:21
C | West US, Public Endpoint via IP Access List | 15:19
D | West US, E VM Gateway Base | 12:14
E | West US, E VM StreamBeforeRequestCompletes | 07:46
F | West US, E VM StreamBeforeRequestCompletes + Logical Partitions | 07:31
G | West US, E VM Spooler (D) | 12:57
H | West US, E VM Spooler (E) | 13:32
I | West US, D VM Gateway Base | 16:47
J | West US, D VM StreamBeforeRequestCompletes | 12:19
K | West US, PBI Managed Vnet | 27:04

Scenario | VM Configuration
D | Standard E8bds v5 (8 vcpus, 64 GiB memory) [NVMe, Accelerated Networking], C Drive default (Premium SSD LRS 127 GiB)
E | Standard E8bds v5 (8 vcpus, 64 GiB memory) [NVMe, Accelerated Networking], C Drive default (Premium SSD LRS 127 GiB)
F | Standard E8bds v5 (8 vcpus, 64 GiB memory) [NVMe, Accelerated Networking], C Drive default (Premium SSD LRS 127 GiB)
G | Standard E8bds v5 (8 vcpus, 64 GiB memory) [NVMe, Accelerated Networking], D drive
H | Standard E8bds v5 (8 vcpus, 64 GiB memory) [NVMe, Accelerated Networking], E Drive (Premium SSD LRS 600 GiB)
I | Standard D8s v3 (8 vcpus, 32 GiB memory), C Drive default (Premium SSD LRS 127 GiB)
J | Standard D8s v3 (8 vcpus, 32 GiB memory), C Drive default (Premium SSD LRS 127 GiB)

Performance Configurations

1. Regional Alignment

Aligning your Power BI Premium/Fabric Capacity to the same region as your Azure Databricks deployment and your On-Premise Data Gateway VM helps reduce the overall network latency and data transfer duration. It should also eliminate cross-region networking charges. In scenario A, the Azure Databricks deployment was in East US 2 while the Fabric Capacity and On-Premise Data Gateway VM were in West US. The Import processing time when using the public endpoint between the regions was 17:01 minutes.
In scenario B, while still using the public endpoint, there is complete regional alignment in the West US region and the Import times averaged 12:21 minutes, which is a 27.4% decrease.

2. Configure a Gateway Cluster

A Power BI Data Gateway Cluster configuration is highly recommended for Production Power BI environments, but this configuration was not performance tested during this experiment. Data Gateway clusters can help with data refresh redundancy and with the overall volume / throughput of data transfer.

3. VM Family Selection

The Power BI documentation recommends a VM with 8 cores, 8 GB of RAM, and an SSD for the VM used for the On-Premise Data Gateway. The tests here suggest that a VM with good performance characteristics can substantially improve Import times. In scenario D, data gateway tests were run using a Standard E8bds v5 with 8 cores and 64 GB RAM that also included NVMe, Accelerated Networking, and a C drive using a Premium SSD. The import times for this scenario averaged 12:14 minutes, which was slightly faster than the regionally aligned public endpoint test in scenario B. In scenario I, data gateway tests were run using a Standard D8s v3 with 8 cores and 32 GB RAM and a C drive using a Premium SSD. The import times for this scenario averaged 16:47 minutes, noticeably slower than the regionally aligned public endpoint in scenario B: a roughly 36% performance degradation.
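The percentages quoted for scenarios A, B, and I follow directly from the average refresh durations in the results grid. As a minimal sketch (the helper names `to_seconds` and `percent_change` are illustrative and not part of the original tests), the arithmetic can be reproduced like this:

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' duration string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_change(baseline: str, candidate: str) -> float:
    """Percent change of candidate relative to baseline (negative means faster)."""
    base = to_seconds(baseline)
    return (to_seconds(candidate) - base) / base * 100


# Average refresh durations taken from the results grid
durations = {"A": "17:01", "B": "12:21", "I": "16:47"}

a_to_b = percent_change(durations["A"], durations["B"])
b_to_i = percent_change(durations["B"], durations["I"])

print(f"A -> B: {a_to_b:+.1f}%")  # prints "A -> B: -27.4%"
print(f"B -> I: {b_to_i:+.1f}%")  # prints "B -> I: +35.9%"
```

The same helper can be pointed at any other pair of scenarios in the grid, which makes it easier to compare configurations against a common baseline.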
More tests could certainly be done to determine which VM characteristics help the most with Import performance, but it is clear certain features can be helpful, like:
Premium SSDs
Accelerated Networking
NVMe controller
Memory optimized instances

The better E8bds v5 Azure VM costs ~$820 per month in West US at list, while the D8s v3 costs ~$610 per month at list, so the E8bds v5 is roughly 34% more expensive. Even so, this feels like a scenario where you pay the premium to get better performance and optimize cost through Azure VM reservations.

4. StreamBeforeRequestCompletes

By default, the on-premise data gateway spools data to disk before sending it to Power BI. Setting StreamBeforeRequestCompletes to True can significantly improve gateway refresh performance, as it allows data to be streamed directly to the Power BI Service without first being spooled to disk. In scenario E, when StreamBeforeRequestCompletes is set to True and the gateway is restarted, the average Import times improved significantly to 07:46 minutes, which is a 54% improvement compared to scenario A and a 36% improvement over the base VM configuration in scenario D.

5. Spooler Location

As discussed above, when StreamBeforeRequestCompletes is left at its default of False, Power BI spools the data to the data gateway spool directory before sending it to the Power BI Service. In scenarios D, G, and H, StreamBeforeRequestCompletes is False and the Spooler directory has been mapped to the C, D, and E drives respectively, each of which corresponds to an SSD (of varying configuration) on the Azure VMs. In all three scenarios the times are similar: 12:14, 12:57, and 13:32 minutes, respectively. All three tests were performed with SSDs on the E series VM configured with NVMe. Using this configuration mix, it doesn’t appear that the Spooler directory location provides significant performance improvements.
Since the C drive configuration gave the best performance, it seems prudent to keep the C drive default configuration. However, it is possible that the Spooler directory setting might provide more value on a different VM configuration.

6. Logical Partitioning

As outlined in the QuickStart samples guide, logical partitioning can often help with Power BI Import performance, as multiple logical partitions in the Semantic Model can be processed at the same time. In scenario F, logical partitions were created for the inventory and store_sales tables, with 5 partitions each. When combined with the StreamBeforeRequestCompletes setting, the benefit from adding Logical Partitions was negligible (a 15 second improvement) even though the parallelization settings were increased to 30 (Max Parallelism Per Refresh and Data Source Default Max Connections). While logical partitions are usually a very valuable strategy, combining them with StreamBeforeRequestCompletes, the E series VM configurations, and a Fabric F64 capacity yielded diminishing returns. It is probably worth more testing at some point in the future.

Virtual Network Data Gateway Performance Testing

The configuration and performance of a Virtual Network Data Gateway was briefly tested. A Power BI subnet was created in the same VNet as the Azure Databricks workspace and delegated to the Power BI Service. A virtual network data gateway was created in the UI with 2 gateways (12 queries can run in parallel) and assigned to the Semantic Model. In scenario K, an Import test was performed through the Virtual Network Data Gateway that took 27:04 minutes. More time was not spent trying to tune the Virtual Network Data Gateway as it was not the primary focus of this blog post.
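For readers who want to script the StreamBeforeRequestCompletes change from section 4 rather than edit the gateway configuration by hand, here is a minimal sketch. It assumes the setting is stored as a `<setting name="StreamBeforeRequestCompletes">` element with a `<value>` child in the gateway's XML configuration file; the `SAMPLE_CONFIG` string below is a simplified, hypothetical stand-in for that file, and `set_gateway_setting` is an illustrative helper, not a gateway API. Check the actual file name, location, and layout against the on-premises data gateway documentation before applying anything like this to a real VM.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the gateway's XML configuration file.
SAMPLE_CONFIG = """<configuration>
  <applicationSettings>
    <GatewayCoreSettings>
      <setting name="StreamBeforeRequestCompletes" serializeAs="String">
        <value>False</value>
      </setting>
    </GatewayCoreSettings>
  </applicationSettings>
</configuration>"""


def set_gateway_setting(config_xml: str, name: str, value: str) -> str:
    """Return a copy of config_xml with the named setting's <value> replaced."""
    root = ET.fromstring(config_xml)
    matches = [s for s in root.iter("setting") if s.get("name") == name]
    if not matches:
        raise KeyError(f"setting {name!r} not found")
    for setting in matches:
        value_node = setting.find("value")
        if value_node is None:
            # Create the <value> child if the setting had none
            value_node = ET.SubElement(setting, "value")
        value_node.text = value
    return ET.tostring(root, encoding="unicode")


updated = set_gateway_setting(SAMPLE_CONFIG, "StreamBeforeRequestCompletes", "True")
print(updated)
```

As in the tests above, the gateway has to be restarted after the change before the new value takes effect.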
The Best Configuration: Region Alignment + Good VM + StreamBeforeRequestCompletes

While the Import testing performed for this blog post isn’t definitive, it does provide good directional value in forming an opinion on how you can configure your Power BI On-Premise Data Gateway on an Azure Virtual Machine to get good performance. In the tests performed for this blog, an Azure Virtual Machine in the same region as the Azure Databricks Workspace and the Fabric Capacity, with Accelerated Networking, an SSD, NVMe, and memory optimized compute, provided performance that was faster than just using the public endpoint of the Azure Databricks Workspace alone. Using this configuration, we improved our Import performance from 17:01 to 07:46 minutes, which is a 54% performance improvement.
2.8K Views, 1 like, 1 Comment
Closing the loop: Interactive write-back from Power BI to Azure Databricks
This is a collaborative post from Microsoft and Databricks. We thank Toussaint Webb, Product Manager at Databricks, for his contributions.

We're excited to announce that the Azure Databricks connector for Power Platform is now Generally Available. With this integration, organizations can seamlessly build Power Apps, Power Automate flows, and Copilot Studio agents with secure, governed data and no data duplication. A key functionality unlocked by this connector is the ability to write data back from Power BI to Azure Databricks.

Many organizations want to not only analyze data but also act on insights quickly and efficiently. Power BI users, in particular, have been seeking a straightforward way to “close the loop” by writing data back from Power BI into Azure Databricks. That capability is now here: the new Azure Databricks connector for Power Platform brings real-time updates and streamlined operational workflows. With this connector, users can now read from and write to Azure Databricks data warehouses in real time, all from within familiar interfaces, with no custom connectors, no data duplication, and no loss of governance.

How It Works: Write-backs from Power BI through Power Apps

Enabling write-backs from Power BI to Azure Databricks is seamless. Follow these steps:
1. Open Power Apps and create a connection to Azure Databricks (documentation).
2. In Power BI (desktop or service), add a Power Apps visual to your report (purple Power Apps icon).
3. Add data to connect to your Power App via the visualization pane.
4. Create a new Power App directly from the Power BI interface, or choose an existing app to embed.
5. Start writing records to Azure Databricks!

With this integration, users can make real-time updates directly within Power BI using the embedded Power App, instantly writing changes back to Azure Databricks.
Think of all the workflows that this can unlock, such as warehouse managers monitoring performance and flagging issues on the spot, or store owners reviewing and adjusting inventory levels as needed. The seamless connection between Azure Databricks, Power Apps, and Power BI lets you close the loop on critical processes by uniting reporting and action in one place.

Try It Out: Get started with Azure Databricks Power Platform Connector

The Power Platform Connector is now Generally Available for all Azure Databricks customers. Explore more in the deep dive blog here and, to get started, check out our technical documentation. Coming soon, we will add the ability to execute existing Azure Databricks Jobs via Power Automate. If your organization is looking for an even more customizable end-to-end solution, check out Databricks Apps in Azure Databricks! No extra services or licenses required.
2.9K Views, 2 likes, 2 Comments
Supercharge Data Intelligence: Build Teams App with Azure Databricks Genie & Azure AI Agent Service
Introduction

Are you looking to unlock the full potential of your data investments in Azure Databricks while seamlessly harnessing the capabilities of Azure AI? At Microsoft BUILD 2025, we announced the Azure Databricks connector in Azure AI Foundry. This blog post is a follow-up that shows how to take advantage of this feature within Microsoft Teams. We'll guide you through leveraging the integration between Azure Databricks and Azure AI Foundry to build a Python-based Teams app that consumes Genie APIs using the secure On-Behalf-Of (OBO) authentication flow. Whether you are a data engineer, AI developer, or a business user seeking actionable insights, this guide will help you accelerate your journey in data intelligence with modern, secure, and streamlined tools.

You can find the code samples here: AI-Foundry-Connections - Teams chat with Azure Databricks Genie | GitHub (git clone https://github.com/Azure-Samples/AI-Foundry-Connections.git)

Setting the Stage: Key Components

Before we dive into the sample app, let’s quickly establish what each major component brings to the table.

What is AI/BI Genie?
AI/BI Genie is an intelligent agent feature in Azure Databricks, exposing natural language APIs to interact with your data. It empowers users to query, analyze, and visualize data using conversational AI, making advanced analytics accessible to everyone.

Azure AI Foundry
Azure AI Foundry is your control center for building, managing, and deploying AI solutions. It offers a unified platform to orchestrate AI agents, connect to various data sources like Azure Databricks, and manage models, workflows, and governance.

Azure AI Agent Service
Azure AI Agents are modular, reusable, and secure components that interact with data and AI services. They enable you to build multi-agent workflows, integrate with enterprise systems, and deliver contextual, actionable insights to your end-users.
The Sample Solution: Python Teams App for Genie APIs

To help you get hands-on, this blog features a sample Teams app that demonstrates how to:
Connect Teams to Azure Databricks Genie via Azure AI Foundry using OBO (On-Behalf-Of) authentication.
Query and visualize data using Genie APIs and LLMs in a secure, governed manner.
Build and run the app locally using DevTunnel and easily extend it for production deployments.

Purpose: The sample app is designed as a learning tool, enabling developers to explore the integration and build their own enterprise-grade solutions. It shows you how to wire up authentication, connect to Genie, and ask data-driven questions, all from within Teams.

Architecture Overview

Below is a logical architecture that illustrates the flow:

Key Steps from the Sample (posted on GitHub)

Prerequisites:
a. Familiarity with Azure Databricks, DevTunnel, Azure Bot Service, and Teams App setup.
b. An AI Foundry project with Databricks and LLM connection.

Configuration:
a. Set up the DevTunnel for local development.
b. Register and configure your bot and app in Azure.
c. Create a connection in Azure AI Foundry to your Databricks Genie space.
d. Provision storage with public blob access for dynamic chart/image rendering.
e. Populate the `.env` file with all necessary IDs, secrets, and endpoints.

Teams App Manifest:
a. Create and configure the Teams app manifest, specifying endpoints, domains, permissions, and SSO details.

Run and Test:
a. Host the app locally, activate the virtual environment, and run the Python application.
b. Upload your Teams app manifest and test conversational queries directly in the Teams client.

Sample Queries (Try these in the app!):
"Show the top sales reps by pipeline in a pie chart."
"What is the average size of active opportunities?"
"How many opportunities were won or lost in the current fiscal year?"
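To make "consuming Genie APIs" a little more concrete, the sketch below builds (but does not send) a request that starts a Genie conversation with one of the sample queries. This is not the sample app's code: the sample goes through an Azure AI Foundry connection with OBO authentication, whereas this sketch assumes a direct call to the Databricks Genie Conversation REST endpoint with a bearer token. The workspace host, space ID, and token values are hypothetical placeholders, and `build_genie_request` is an illustrative helper.

```python
import json
import urllib.request


def build_genie_request(host: str, space_id: str, token: str, question: str) -> urllib.request.Request:
    """Build a POST request that starts a Genie conversation with one question."""
    # Assumed endpoint shape; verify against the Genie Conversation API reference.
    url = f"https://{host}/api/2.0/genie/spaces/{space_id}/start-conversation"
    body = json.dumps({"content": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_genie_request(
    host="adb-0000000000000000.0.azuredatabricks.net",  # hypothetical workspace host
    space_id="example-space-id",                        # hypothetical Genie space ID
    token="<user-access-token>",                        # placeholder, e.g. a token obtained via OBO
    question="What is the average size of active opportunities?",
)
print(request.get_method(), request.full_url)

# Sending is left out on purpose; with real values it would be:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes the call easy to inspect and unit test before any credentials are involved; the repository README remains the reference for how the sample itself wires up authentication.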
Get Started: Next Steps

To replicate this solution:
Review the sample in AI-Foundry-Connections - Teams chat with Azure Databricks Genie | GitHub (git clone https://github.com/Azure-Samples/AI-Foundry-Connections.git). You'll need to review the README for the step-by-step setup and configuration details!
Explore homework opportunities such as deploying the app on Azure App Service or Kubernetes, and experimenting with additional languages or the M365 Agents Toolkit.

Final Thoughts

Thank you for reading! By integrating Azure Databricks, Azure AI Foundry, and Genie APIs, you can deliver powerful, secure, and collaborative data intelligence experiences right within Microsoft Teams. Ready to go further? Check out the M365 Agents SDK and M365 Agents Toolkit for building advanced AI applications. Experiment, extend, and share your feedback with the community! Happy building! 🚀
648 Views, 0 likes, 0 Comments
Enabling Open Data Sharing of Unity Catalog Assets with Microsoft Purview
As organizations scale their use of Databricks for data and AI workloads, enabling secure and discoverable access to Unity Catalog assets becomes increasingly important. In many cases, users such as analysts or business partners need to query specific datasets but do not have direct access to the Databricks workspace. Databricks Delta Sharing addresses this challenge by allowing secure access to data and AI assets through an open sharing protocol, making it possible to share governed datasets across organizational boundaries. Learn more about sharing data using the Delta Sharing open sharing protocol.

While Delta Sharing provides the access layer, Microsoft Purview plays a critical role in enabling data discovery. Without access to Databricks Catalog Explorer, users may struggle to locate Unity Catalog-managed data. By integrating with Unity Catalog, Microsoft Purview makes metadata for these assets discoverable through its data map. This allows users across the organization to explore available datasets, including those in Unity Catalog, and request access in a secure and governed manner.

Once data is discoverable, Microsoft Purview enables automated access provisioning through its self-service workflows. These workflows can be configured to trigger Databricks REST APIs, allowing Unity Catalog access requests to be processed in an automated fashion. This approach streamlines the approval process, reduces manual overhead, and ensures that access is granted in a secure and auditable way.

Check out this detailed guide I wrote to implement this integration: A Guide to Enabling Open Data Sharing of Databricks Unity Catalog Assets with Microsoft Purview. The guide currently focuses on using Purview workflows, which are part of the classic data catalog experience. Guidance on the upcoming Power Automate connector, which will enable access requests directly from Purview, is expected to be available soon.
387 Views, 0 likes, 1 Comment
Workspace failure
Hi Community, I had my Databricks workspace up and running. It was managed through Terraform, and encryption was enabled through CMK. There were some updates to the code, so I ran terraform plan, and one of the key changes (replace) it showed me was "azurerm_role_assignment.storage_identity_kv_access module.workspace.azurerm_role_assignment.storage_identity_kv_access". The terraform run went on for 30 minutes, the workspace stayed in deployment for a long time, and then it ultimately failed. Since not all the changes had been applied, I reapplied and got this error: "Performing CreateOrUpdate: unexpected status 400 (400 Bad Request ) With error: InvalidEncryptionConfiguration: Configure encryption for workspace at creation is not allowed, configure encryption once workspace is created and key vault access policies are added". I applied again and the terraform run succeeded, but in the Azure portal the workspace is in a failed state. If I go to the Databricks account I can see the workspace as running, and if I go to the workspace I am able to start clusters and execute some queries. I am not able to launch the workspace from the Azure portal, and I am not sure whether there will be other issues due to this. Could anyone help me resolve this issue? Let me know if you need anything further to investigate the issue.
133 Views, 0 likes, 3 Comments
Announcing the Azure Databricks connector in Power Platform
We are ecstatic to announce the public preview of the Azure Databricks Connector for Power Platform. This native connector is specifically for Power Apps, Power Automate, and Copilot Studio within Power Platform and enables seamless, single click connection. With this connector, your organization can build data-driven, intelligent conversational experiences that leverage the full power of your data within Azure Databricks without any additional custom configuration or scripting – it's all fully built in!

The Azure Databricks connector in Power Platform enables you to:
Maintain governance: All access controls for data you set up in Azure Databricks are maintained in Power Platform
Prevent data copy: Read and write to your data without data duplication
Secure your connection: Connect Azure Databricks to Power Platform using Microsoft Entra user-based OAuth or service principals
Have real time updates: Read and write data and see updates in Azure Databricks in near real time
Build agents with context: Build agents with Azure Databricks as grounding knowledge with all the context of your data

Instead of spending time copying or moving data and building custom connections which require additional manual maintenance, you can now seamlessly connect and focus on what matters – getting rich insights from your data – without worrying about security or governance.

Let’s see how this connector can be beneficial across Power Apps, Power Automate, and Copilot Studio:

Azure Databricks Connector for Power Apps – You can seamlessly connect to Azure Databricks from Power Apps to enable read/write access to your data directly within canvas apps, enabling your organization to build data-driven experiences in real time. For example, our retail customers are using this connector to visualize different placements of items within the store and how they impact revenue.
Azure Databricks Connector for Power Automate – You can execute SQL commands against your data within Azure Databricks with the rich context of your business use case. For example, one of our global retail customers is using automated workflows to track safety incidents, which plays a crucial role in keeping employees safe.

Azure Databricks as a Knowledge Source in Copilot Studio – You can add Azure Databricks as a primary knowledge source for your agents, enabling them to understand, reason over, and respond to user prompts based on data from Azure Databricks.

To get started, all you need to do in Power Apps or Power Automate is add a new connection – that's how simple it is! Check out our demo here and get started using our documentation today! This connector is available in all public cloud regions. You can also learn more about customer use cases in this blog. You can also review the connector reference here.
2.9K Views, 2 likes, 2 Comments
Announcing the availability of Azure Databricks connector in Azure AI Foundry
At Microsoft, Databricks Data Intelligence Platform is available as a fully managed, native, first party Data and AI solution called Azure Databricks. This makes Azure the optimal cloud for running Databricks workloads. Because of our unique partnership, we can bring you seamless integrations leveraging the power of the entire Microsoft ecosystem to do more with your data. Azure AI Foundry is an integrated platform for Developers and IT Administrators to design, customize, and manage AI applications and agents.

Today we are excited to announce the public preview of the Azure Databricks connector in Azure AI Foundry. With this launch you can build enterprise-grade AI agents that reason over real-time Azure Databricks data while being governed by Unity Catalog. These agents will also be enriched by the responsible AI capabilities of Azure AI Foundry.

Here are a few ways this can benefit you and your organization:
Native Integration: Connect to Azure Databricks AI/BI Genie from Azure AI Foundry
Contextual Answers: Genie agents provide answers grounded in your unique data
Supports Various LLMs: Secure, authenticated data access
Streamlined Process: Real-time data insights within GenAI apps
Seamless Integration: Simplifies AI agent management with data governance
Multi-Agent workflows: Leverages Azure AI agents and Genie Spaces for faster insights
Enhanced Collaboration: Boosts productivity between business and technical users

To further democratize the use of data to those in your organization who aren't directly interacting with Azure Databricks, you can also take it one step further with Microsoft Teams and AI/BI Genie. AI/BI Genie enables you to get deep insights from your data using your natural language without needing to access Azure Databricks.
Here you see an example of what an agent built in AI Foundry using data from Azure Databricks looks like when it is made available in Microsoft Teams.

We'd love to hear your feedback as you use the Azure Databricks connector in AI Foundry. Try it out today. To help you get started, we’ve put together some samples here. Read more on the Databricks blog, too.

Automating Data Vault processes on Microsoft Fabric with VaultSpeed
This article is authored by Jonas De Keuster from VaultSpeed and co-authored with Michael Olschimke, co-founder and CEO at Scalefree International GmbH, and Trung Ta, senior BI consultant at Scalefree International GmbH. The technical review was done by Ian Clarke and Naveed Hussain, GBBs (Cloud Scale Analytics) for EMEA at Microsoft.

Businesses often struggle to align their understanding of processes and products across disparate systems in corporate operations. In our previous blogs in this series, we explored the advantages of Data Vault as a methodology and why it is increasingly recognized for its automation-friendly approach to modern data warehousing. Data Vault’s modular structure, scalability, and flexibility address the challenges of integrating diverse and evolving data sources. However, the key to successfully implementing a Data Vault lies in automation.

Data Vault’s pattern-based modeling, organized around hubs, links, and satellites, provides a standardized framework well suited to integrating data from horizontally scattered operational source systems. Automation tools like VaultSpeed enhance this methodology by simplifying the generation of Data Vault structures, streamlining workflows, and enabling rapid delivery of analytics-ready data solutions. By leveraging the strengths of Data Vault and VaultSpeed’s automation capabilities, organizations can overcome inefficiencies in traditional ETL processes, enabling scalable and adaptable data integration.

Examples of such operational systems include Microsoft Dynamics 365 for CRM and ERP, SAP for enterprise resource planning, or Salesforce for customer data. Attempts to harmonize this complexity historically relied on pre-built industry data models. However, these models often fell short, requiring significant customization and failing to accommodate unique business processes.
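To make the hub, link, and satellite pattern mentioned above more concrete, here is a minimal Python sketch of how keys for the three structures are often derived in Data Vault 2.0 style models. It is an illustration only and not code generated by VaultSpeed. The MD5 hash, the `||` delimiter, the trim-and-uppercase normalization, and every name in it (`hash_key`, `hash_diff`, `make_hub`, `make_link`, `make_satellite`, and the sample keys) are assumptions chosen for the example.

```python
import hashlib
from dataclasses import dataclass

DELIMITER = "||"  # assumed separator between concatenated values


def _digest(text: str) -> str:
    """Return the MD5 hex digest of a string (32 hex characters)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def hash_key(*business_keys: str) -> str:
    """Hash one or more business keys after trimming and uppercasing them."""
    return _digest(DELIMITER.join(k.strip().upper() for k in business_keys))


def hash_diff(attributes: dict) -> str:
    """Hash descriptive attributes in sorted name order to detect changes."""
    return _digest(DELIMITER.join(f"{k}={attributes[k]}" for k in sorted(attributes)))


@dataclass
class HubRow:            # one row per unique business key
    hub_key: str
    business_key: str
    record_source: str


@dataclass
class LinkRow:           # one row per relationship between hubs
    link_key: str
    hub_keys: tuple
    record_source: str


@dataclass
class SatelliteRow:      # descriptive attributes attached to a hub
    parent_key: str
    hash_diff: str
    attributes: dict
    record_source: str


def make_hub(business_key: str, record_source: str) -> HubRow:
    return HubRow(hash_key(business_key), business_key, record_source)


def make_link(hubs: list, record_source: str) -> LinkRow:
    link_key = hash_key(*(h.business_key for h in hubs))
    return LinkRow(link_key, tuple(h.hub_key for h in hubs), record_source)


def make_satellite(parent: HubRow, attributes: dict, record_source: str) -> SatelliteRow:
    return SatelliteRow(parent.hub_key, hash_diff(attributes), attributes, record_source)


# The same customer arriving from two source systems maps to one hub key.
customer_crm = make_hub(" c-1001 ", "dynamics365")
customer_erp = make_hub("C-1001", "sap")
print(customer_crm.hub_key == customer_erp.hub_key)  # True

order = make_hub("SO-42", "sap")
placed = make_link([customer_erp, order], "sap")
details = make_satellite(customer_erp, {"city": "Leuven", "segment": "B2B"}, "sap")
```

Because the hub key depends only on the normalized business key, records from different systems converge on the same hub row, while the satellite's hash diff changes whenever a descriptive attribute changes. This regularity is what makes the pattern a good fit for template-based automation.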
Different approaches to Data Integration

Industry data models offer a standardized way to structure data, providing a head start for organizations with well-aligned business processes. They work well in stable, regulated environments where consistency is key. However, for organizations dealing with diverse sources and fast-changing requirements, Data Vault offers greater flexibility. Its modular, scalable approach supports evolving data landscapes without the need to reshape existing models. Both approaches aim to streamline integration. Data Vault simply offers more adaptability when complexity and change are the norm. The right approach therefore depends on the use case.

Tackling data complexity with automation

Integrating data from horizontally distributed sources is one of the biggest challenges data engineers face. VaultSpeed addresses this by connecting the physical metadata from source systems with the business's conceptual data model and creating a "town plan" for building a Data Vault model. This town plan serves as the bedrock for automating various data pipeline stages. By aligning the technical and business perspectives on the data, VaultSpeed enables the automated generation of logical and physical data models. This automation streamlines the design process and ensures consistency between the conceptual understanding of the data and its physical implementation.

Furthermore, VaultSpeed's automation extends to the generation of transformation code. This code converts data from its source format into the structure defined by the Data Vault model. Automating this process reduces the potential for errors and accelerates the development of the data integration pipeline.

In addition to data models and transformation code, VaultSpeed also automates workflow orchestration. This involves defining and managing the tasks required to extract, transform, and load data into the Data Vault.
By automating this orchestration, VaultSpeed ensures that the data integration process is executed reliably and efficiently.

How VaultSpeed automates integration

This section examines the detailed steps in the VaultSpeed workflow and how it combines metadata-driven and data-driven modeling approaches to streamline data integration and automate various data pipeline stages.

Harvest metadata: VaultSpeed collects metadata from source systems such as OneLake, Azure SQL, SAP, and Dynamics 365, capturing schema details, relationships, and dependencies.
Align with conceptual models: Using a business’s conceptual data model as a guiding framework, VaultSpeed ensures that physical source metadata is mapped consistently to the target Data Vault structure.
Generate logical and physical models: VaultSpeed leverages its metadata repository and automation templates to produce fully defined logical and physical Data Vault models, including hubs, links, and satellites.
Automate code creation: Once the models are defined, VaultSpeed generates the necessary transformation and workflow code using templates with embedded standards and conventions for Data Vault implementation. This ensures seamless data ingestion, integration, and consistent population of the Data Vault model.

By automating these steps, VaultSpeed eliminates the manual effort required for traditional data modeling and integration, reducing errors and addressing the inefficiencies of data integration using traditional ETL. Because the approach is model-driven, the code is always in sync with the data model.

Unified integration with Microsoft Fabric

Microsoft Fabric offers a robust data ingestion, storage, and analytics ecosystem. VaultSpeed seamlessly embeds within this ecosystem to ensure an efficient and automated data pipeline. Here’s how the process works:

Ingestion (Extract and Load): Tools like ADF, Fivetran, or OneLake replication bring data from various sources into Fabric.
These tools handle the extraction and replication of raw data from operational systems. Microsoft Fabric also supports mirrored databases, enabling near real-time data replication from sources like Azure Cosmos DB, Azure SQL, or application data into the Fabric environment. This ensures data remains synchronized across the ecosystem, providing a consistent foundation for downstream modeling and analytics.
Data Repository or Platform: Microsoft Fabric is the data platform providing the infrastructure for storing, managing, and securing the ingested data. Fabric supports both warehouse and lakehouse experiences, bringing them together under a unified data architecture. This means organizations can combine structured, transactional data with unstructured or semi-structured data in a single platform, eliminating silos and enabling broader analytics use cases.
Modeling and Transformation: VaultSpeed takes over at this stage, leveraging its advanced automation to model and transform data into a Data Vault structure. This includes creating hubs, links, and satellites while ensuring alignment with business taxonomies. Unlike traditional ETL tools, VaultSpeed is not involved in the runtime execution of transformations. Instead, it generates code that runs within Microsoft Fabric. This approach ensures better performance, reduces vendor lock-in, and enhances security since no data flows through VaultSpeed itself. By focusing exclusively on model-driven automation, VaultSpeed enables organizations to maintain full control over their data processing while benefiting from automation efficiencies. Additionally, Fabric's query optimizer manages the transformation workloads automatically, ensuring optimal performance without requiring extensive manual tuning, a key capability in a Data Vault context where performance is critical for handling large volumes of data and complex transformations.
This simplifies operations for data engineers and ensures that query performance remains efficient, even as data volumes and complexity grow.
Consume: The integrated data layer within Microsoft Fabric serves multiple consumption paths. While tools like Power BI enable actionable insights through analytics dashboards, the same data foundation can also drive AI use cases, such as machine learning models or intelligent applications.

By connecting ingestion tools, a unified data platform, and analytics or AI solutions, VaultSpeed ensures a streamlined and integrated workflow that maximizes the value of the Microsoft Fabric ecosystem.

Loading at multiple speeds: real-time Data Vaults with Fabric

Loading data into a Data Vault often requires balancing traditional batch processes with the demands of real-time ingestion within a unified model. Microsoft Fabric’s event-driven tools, such as Data Activator, empower organizations to process data streams in real time while supporting traditional batch loads. VaultSpeed complements these capabilities by ensuring that both modes of ingestion feed seamlessly into the same Data Vault model, eliminating the need for separate architectures like the Lambda pattern.

Key capabilities for real-time Data Vault include:

Event-driven updates: Automatically trigger incremental loads into the Data Vault when changes occur in Azure Cosmos DB, OneLake, or other sources.
Automated workflow orchestration: VaultSpeed’s Flow Management Control (FMC) automates the entire data ingestion, transformation, and loading workflow. This includes handling delta detection, incremental updates, and batch processes, ensuring optimal efficiency regardless of the speed of data arrival. FMC integrates natively with Azure Data Factory (ADF) for seamless orchestration within the Microsoft ecosystem. For more complex or distributed workflows, FMC also supports Apache Airflow, enabling flexibility in managing a wide range of data pipelines.
Seamless integration: Maintain synchronized pipelines for historical and real-time data within the Fabric environment. The FMC intelligently manages multiple data streams, dynamically adjusting to workload demands to support high-volume batch loads and real-time event-driven updates.

These capabilities ensure analytics dashboards reflect the latest data, delivering immediate value to decision-makers.

Automating the gold layer and delivering data products at scale

Power BI is a cornerstone of Microsoft Fabric, and VaultSpeed makes it easier for data modelers to connect the dots. By automating the creation of the gold layer, VaultSpeed enables seamless integration between Data Vaults and Power BI.

Benefits for data teams:

Automated gold layer: VaultSpeed automates the creation of the gold layer, including templates for star schemas, One Big Table (OBT), and other analytics-ready structures. These automated templates allow businesses to generate consistent and scalable presentation layers without manual intervention.
Accelerated time to insight: By reducing manual preparation steps, VaultSpeed enables teams to deliver dashboards and reports quickly, ensuring faster access to actionable insights.
Deliver data products: The ability to automate and standardize star schemas and other presentation models empowers organizations to deliver analytics-ready data products at scale, efficiently meeting the needs of multiple business domains.
Improved data governance: VaultSpeed’s lineage tracking ensures compliance and transparency, providing full traceability from raw data to the presentation layer.
No-code automation: Eliminate the need for custom scripting, freeing up time to focus on innovation and higher-value tasks.

Conclusion

Integrating VaultSpeed and Microsoft Fabric redefines how data modelers and engineers approach Data Vault 2.0.
This partnership unlocks the full potential of modern data ecosystems by automating workflows, enabling real-time insights, and streamlining analytics. If you’re ready to transform your data workflows, VaultSpeed and Microsoft Fabric provide the tools you need to succeed. The next article in this series will focus on the DataOps part of automation.

Further reading

Automating common understanding: Integrating different data source views into one comprehensive perspective
Why Data Vault is the best model for data warehouse automation: Read the eBook
The Elephant in the Fridge by John Giles: A great reference on conceptual data modeling for Data Vault

About VaultSpeed

VaultSpeed empowers enterprises to deliver data products at scale through advanced automation for modern data ecosystems, including data lakehouse, data mesh, and fabric architectures. The no-code platform eliminates nearly all traditional ETL tasks, delivering significant improvements in automation across areas like data modeling, engineering, testing, and deployment. With seamless integration to platforms like Microsoft Fabric or Databricks, VaultSpeed enables organizations to automate the entire software development lifecycle for data products, accelerating delivery from design to deployment. VaultSpeed addresses inefficiencies in traditional data processes, transforming how data engineers and business users collaborate to build flexible, scalable data foundations for AI and analytics.

About the Authors

Jonas De Keuster is VP Product at VaultSpeed. He had close to 10 years of experience as a DWH consultant in various industries like banking, insurance, healthcare, and HR services, before joining the data automation vendor. This background helps him understand current customer needs and engage in conversations with members of the data industry.

Michael Olschimke is co-founder and CEO of Scalefree International GmbH, a European Big Data consulting firm.
The firm empowers clients across all industries to use Data Vault 2.0 and similar Big Data solutions. Michael has trained thousands of industry data warehousing professionals, taught academic classes, and published regularly on these topics.

Trung Ta is a senior BI consultant at Scalefree International GmbH. With over 7 years of experience in data warehousing and BI, he has advised Scalefree’s clients in different industries (banking, insurance, government, etc.) and of various sizes in establishing and maintaining their data architectures. Trung’s expertise lies within Data Vault 2.0 architecture, modeling, and implementation, specifically focusing on data automation tools.