Microsoft Fabric
Overload to Optimal: Tuning Microsoft Fabric Capacity
Co-authored by: Daya Ram, Sr. Cloud Solutions Architect

Optimizing Microsoft Fabric capacity is both a performance and cost exercise. By diagnosing workloads, tuning cluster and Spark settings, and applying data best practices, teams can reduce run times, avoid throttling, and lower total cost of ownership without compromising SLAs. Use Fabric's built-in observability (Monitoring Hub, Capacity Metrics, Spark UI) to identify hot spots and then apply cluster- and data-level remediations. For capacity planning and sizing guidance, see Plan your capacity size.

Options to Diagnose Capacity Issues

1) Monitoring Hub — Start with the Story of the Run

What to use it for: Browse Spark activity across applications (notebooks, Spark Job Definitions, and pipelines); quickly surface long-running or anomalous runs; view read/write bytes, idle time, core allocation, and utilization.

How to use it
- From the Fabric portal, open Monitoring (Monitor Hub).
- Select a Notebook or Spark Job Definition and choose Historical Runs.
- Inspect the Run Duration chart; click a run to see read/write bytes, idle time, core allocation, overall utilization, and other Spark metrics.

What to look for: Use the application detail monitoring guide to review and monitor your application.

2) Capacity Metrics App — Measure the Whole Environment

What to use it for: Review capacity-wide utilization and system events (overloads, queueing); compare utilization across time windows and identify sustained peaks.

How to use it
- Open the Microsoft Fabric Capacity Metrics app for your capacity.
- Review the Compute page (ribbon charts, utilization trends) and the System events tab to see overload or throttling windows.
- Use the Timepoint page to drill into a 30-second interval and see which operations consumed the most compute.

What to look for: Use the troubleshooting guide, Monitor and identify capacity usage, to pinpoint top CU-consuming items.

3) Spark UI — Diagnose at a Deeper Level

Why it matters: Spark UI exposes skew, shuffle, memory pressure, and long stages. Use it after Monitoring Hub and Capacity Metrics to pinpoint the problematic job.

Key tabs to inspect
- Stages: uneven task durations (data skew), heavy shuffle read/write, large input/output volumes.
- Executors: storage memory, task time (GC), shuffle metrics. High GC or frequent spills indicate memory tuning is needed.
- Storage: which RDDs/cached tables occupy memory; any disk spill.
- Jobs: long-running jobs and gaps in the timeline (driver compilation, non-Spark code, driver overload).

What to look for: Data skew, memory pressure, and heavy shuffles can often be addressed by adjusting Apache Spark settings such as spark.ms.autotune.enabled, spark.task.cpus, and spark.sql.shuffle.partitions, set via environment Spark properties or session config.
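Where session-level tuning is appropriate, a minimal sketch of how these properties can be adjusted from a Fabric notebook follows. The values shown are illustrative assumptions, not recommendations, and spark.task.cpus generally needs to be set at the environment or pool level before the session starts.

```python
# Minimal sketch: adjusting Spark properties for the current Fabric notebook session.
# Values are illustrative only; validate the effect against your own workload in Spark UI.

# Let Autotune adjust per-query settings automatically (preview, disabled by default).
spark.conf.set("spark.ms.autotune.enabled", "true")

# Right-size shuffle partitions for the data volume to reduce task overhead or skew.
spark.conf.set("spark.sql.shuffle.partitions", "200")

# Note: spark.task.cpus is a cluster-level property; set it in the environment's
# Spark properties rather than at runtime.

# Confirm the effective session values.
for key in ("spark.ms.autotune.enabled", "spark.sql.shuffle.partitions"):
    print(key, "=", spark.conf.get(key))
```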
Section 2: Remediation and Optimization Suggestions

A) Cluster & Workspace Settings

Runtime & Native Execution Engine (NEE): Use Fabric Runtime 1.3 (Spark 3.5, Delta 3.2) and enable the Native Execution Engine to boost performance; enable it at the environment level under Spark compute → Acceleration.

Starter Pools vs. Custom Pools: Starter Pools are prehydrated, medium-size pools with fast session starts, good for dev/quick runs. Custom Pools let you size nodes, enable autoscale, and use dynamic executors; create them via workspace Spark Settings (requires a capacity admin to enable workspace customization).

High Concurrency Session Sharing: Enable High Concurrency to share Spark sessions across notebooks (and pipelines) to reduce session startup latency and cost; use session tags in pipelines to group notebooks.

Autotune for Spark: Enable Autotune (spark.ms.autotune.enabled = true) to auto-adjust per query: spark.sql.shuffle.partitions, spark.sql.autoBroadcastJoinThreshold, and spark.sql.files.maxPartitionBytes. Autotune is disabled by default and is in preview; enable it per environment or session.

B) Data-Level Best Practices

Microsoft Fabric offers several approaches to maintain optimal file sizes in Delta tables; review the documentation here: Table Compaction - Microsoft Fabric.

Intelligent Cache: Enabled by default (Runtime 1.1/1.2) for Spark pools; caches frequently read files at the node level for Delta/Parquet/CSV, improving subsequent read performance and TCO.

OPTIMIZE & Z-Order: Run OPTIMIZE regularly to rewrite files and improve file layout.

V-Order: V-Order (disabled by default in new workspaces) can accelerate reads for read-heavy workloads; enable it via spark.sql.parquet.vorder.default = true.

Vacuum: Run VACUUM to remove unreferenced files (stale data); the default retention is 7 days. Align retention across OneLake to control storage costs and maintain time travel.

Collaboration & Next Steps

Engage the Data Engineering Team to Define an Optimization Playbook

Start by reviewing capacity sizing guidance and cluster-level optimizations (runtime/NEE, pools, concurrency, Autotune), and then target data improvements (Z-Order, compaction, caching, query refactors).

- Triage: Monitor Hub → Capacity Metrics → Spark UI to map workloads and identify high-impact jobs and workloads causing throttling.
- Schedule: Operationalize maintenance: run OPTIMIZE (full or selective) during off-peak windows; enable Auto Compaction for micro-batch/streaming writes; add VACUUM to your cadence with an agreed retention. Add regular code review sessions to ensure consistent performance patterns.
- Fix: Adjust pool sizing or concurrency; enable Autotune; tune shuffle partitions; refactor problematic queries; re-run compaction.
- Verify: Re-run the job and confirm the improvement, e.g., reduced run time, lower shuffle, improved utilization.
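To make the scheduled maintenance above concrete, here is a minimal sketch of an off-peak maintenance cell a team might run in a Fabric notebook. The table name, Z-Order column, and retention window are placeholder assumptions to adapt to your own Lakehouse.

```python
# Minimal sketch of a scheduled Delta maintenance pass in a Fabric notebook.
# Table name, Z-Order column, and retention window are placeholder assumptions.

table_name = "sales_orders"   # hypothetical Lakehouse table

# Compact small files and co-locate data on a commonly filtered column.
spark.sql(f"OPTIMIZE {table_name} ZORDER BY (order_date)")

# Remove unreferenced files older than the agreed retention
# (168 hours = 7 days, matching the default retention mentioned above).
spark.sql(f"VACUUM {table_name} RETAIN 168 HOURS")

# Optionally enable V-Order for read-heavy workloads (off by default in new workspaces).
spark.conf.set("spark.sql.parquet.vorder.default", "true")
```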
Defining the Raw Data Vault with Artificial Intelligence

This article is authored by Michael Olschimke, co-founder and CEO at Scalefree International GmbH. Technical review by Ian Clarke and Naveed Hussain, GBBs (Cloud Scale Analytics) for EMEA at Microsoft.

The Data Vault concept is used across the industry to build robust and agile data solutions. Traditionally, the definition (and subsequent modelling) of the Raw Data Vault, which captures the unmodified raw data, is done manually. This work demands significant human intervention and expertise. However, with the advent of artificial intelligence (AI), we are witnessing a paradigm shift in how we approach this foundational task. This article explores the transformative potential of leveraging AI to define the Raw Data Vault, demonstrating how intelligent automation can enhance efficiency, accuracy, and scalability, ultimately unlocking new levels of insight and agility for organizations.

Note that this article describes a solution for AI-generated Raw Data Vault models. However, the solution is not limited to Data Vault; it allows the definition of any data-driven, schema-on-read model to integrate independent data sets in an enterprise environment. We discuss this towards the end of this article.

Metadata-Driven Data Warehouse Automation

In the early days of Data Vault, all engineering was done manually: an engineer would analyse the data sources and their datasets, come up with a Raw Data Vault model in an E/R tool or Microsoft Visio, and then develop both the DDL code (CREATE TABLE) and the ELT/ETL code (INSERT INTO statements). However, Data Vault follows many patterns. Hubs look very similar (the difference lies in the business keys) and are loaded similarly. We discussed these patterns in previous articles of this series, for example, when covering the Data Vault model and implementation.

In most projects where Data Vault entities are created and loaded manually, a data engineer eventually develops the idea of creating a metadata-driven Data Vault generator because of these existing patterns. But the effort to build a generator is considerable, and most projects are better off using an off-the-shelf solution such as Vaultspeed. These tools come with a metadata repository and a user interface for setting up the metadata and code templates required to generate the Raw Data Vault (and often subsequent layers). We have discussed Vaultspeed in previous articles of this series.

By applying the code templates to the metadata defined by the user, the actual code for the physical model is generated for a data platform, such as Microsoft Fabric. The code templates define the appearance of hubs, links, and satellites, as well as how they are loaded. The metadata defines which hubs, links, and satellites should exist to capture the incoming data set consistently.

Manual development often introduces mistakes and errors that result in deviations in code quality. By generating the data platform code, deviations from the defined templates are not possible (without manual intervention), thus raising the overall quality. But the major driver for most project teams is to increase productivity: instead of manually developing code, they generate it. Metadata-driven generation of the Raw Data Vault is standard practice in today's projects.
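To illustrate the pattern-based generation described above, here is a minimal, hypothetical sketch of rendering a hub's DDL from user-defined metadata. Real automation tools such as Vaultspeed use their own metadata repositories and template languages, so the structure below is an assumption for illustration only.

```python
# Minimal sketch of metadata-driven generation: rendering a hub DDL from metadata.
# Entity and column names are hypothetical; this is not Vaultspeed's template language.

hub_metadata = {
    "name": "hub_customer",
    "business_keys": ["customer_number"],
}

def render_hub_ddl(hub: dict) -> str:
    """Apply the fixed hub pattern (hash key, load metadata, business keys) to the metadata."""
    key_columns = ",\n    ".join(f"{bk} STRING NOT NULL" for bk in hub["business_keys"])
    return (
        f"CREATE TABLE {hub['name']} (\n"
        f"    hub_hash_key CHAR(32) NOT NULL,\n"
        f"    load_datetime TIMESTAMP NOT NULL,\n"
        f"    record_source STRING NOT NULL,\n"
        f"    {key_columns}\n"
        f")"
    )

print(render_hub_ddl(hub_metadata))
```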
Today's project tasks have therefore changed: while engineers still need to analyse the source data sets and develop a Raw Data Vault model, they no longer create the code (DDL/ELT). Instead, they set up the metadata that represents the Raw Data Vault model in the tool of their choice. Each data warehouse automation tool comes with its specific features, limitations, and metadata formats. The data engineer/modeler must understand how to transfer the Raw Data Vault model into the data warehouse automation tool by correctly setting up the metadata. This is also true for Vaultspeed: the data modeler can set up the metadata either through the user interface or via the SDK.

This is the most labour-intensive task concerning the Raw Data Vault layer. It also requires experts who not only know Data Vault modelling but also know (or can analyse) the source systems' data and understand the selected data warehouse automation solution. Additionally, one Data Vault is often not equal to another: the approach allows a very flexible interpretation of how to model, which also leads to quality issues.

But what if the organization has no access to such experts? What if budgets are limited, time is of the essence, or there are not enough available experts in the field? As Data Vault experts, we can debate the value of Data Vault as much as we want, but if there are no experts capable of modeling it, the debate will remain inconclusive.

And what if this problem is only getting worse? In the past, a few dozen source tables might have been sufficient to be processed by the data platform. Today, several hundred source tables could be considered a medium-sized data platform. Tomorrow, there will be thousands of source tables. The reason? There is not only exponential growth in the volume of data to be produced and processed, but also exponential growth in the complexity of data shape. The source of this exponential growth in data shape is more complex source databases, APIs that produce and deliver semi-structured JSON data, and, ultimately, more complex business processes and an increasing amount of generated and available data that needs to be analysed for meaningful business results.

Generating the Data Vault using Artificial Intelligence

Increasingly, this data is generated using artificial intelligence (AI) and still requires integration, transformation, and analysis. The issue is that the number of data engineers, data modelers, and data scientists is not growing exponentially. Universities around the world only produce a limited number of these roles, and some of us would like to retire one day. Based on our experience, the increase in these roles is linear at best. Even if you argue for exponential growth in these roles, it is evident that there is a growing gap between the increasing data volume and the people who should analyse it.

This gap cannot be closed by humans in the future, even in a world where all kids want to grow up to work in a data role. (Sorry, pilots, police officers, nurses, doctors, and so on: if every kid went into data, there would be no way for you to retire without the whole economy imploding.) Therefore, the only way to close the gap is through the use of artificial intelligence. It is not about reducing the data roles; it's about making them efficient so that they can deal with the growing data shape (and not just the volume).

For a long time, it was common sense in the industry that, if an artificial intelligence could generate or define the Raw Data Vault, it would be an assisting technology.
The AI would make recommendations, for example, which hubs or links to model and which business keys to use. The human data modeler would make the final decision, with input from the AI. But what if the AI made the final decision? What would that look like? What if one could attach data sources to the AI platform, and the AI would analyse the source datasets, come up with a Raw Data Vault model, and load that model into Vaultspeed or another data warehouse automation tool, in effect combining knowledge of the source system's data, Data Vault modelling, and the selected data warehouse automation tool?

These questions were posed by Michael Olschimke, a Data Vault and AI expert, when initially considering the challenge. He researched the distribution of neural networks on massively parallel processing (MPP) clusters to classify unstructured data at Santa Clara University in Silicon Valley. This prior AI research, combined with the knowledge he accumulated in Data Vault, enabled him to build a solution that later became known as Flow.BI.

Flow.BI as a Generative AI to Define the Raw Data Vault

The solution is simple, at least from the outside: attach a few data sources and let the AI do the rest. Flow.BI already supports several data sources, including Microsoft SQL Server and derivatives such as Synapse and Fabric; as long as a JDBC driver is available, Flow.BI should eventually be able to analyse the data source. And the AI doesn't care whether the data originates from a CRM system, such as Microsoft Dynamics, or an e-commerce platform; it's just data. There are no provisions in the code to deal with specific datasets, at least for now.

The goal of Flow.BI is to produce a valid, that is, consistent and integrated, enterprise data model. Typically, this follows a Data Vault design, but it's not limited to that (we'll discuss this later in the article). This is achieved by following a strict data-driven approach that imitates the human data modeler. Flow.BI needs data to make decisions, just like its human counterpart; source entities with no data will be ignored. It only requires some metadata, such as the available entities and their columns. Datatypes are nice to have; primary keys and foreign keys would improve the target model, just like entity and column descriptions. But they are not required to define a valid Raw Data Vault model.

Humans write this text, and as humans, we like to influence the result of the modelling exercise. Flow.BI accommodates this by offering many options for the human data modeler to influence the engine. Some of them are discussed in this article, but there are many more already available and more to come.

Flow.BI's user interface is kept as lean and straightforward as possible: the solution is designed so that the AI takes the lead and models the whole Raw Data Vault. The UI's purpose is to interact with human data modelers, allowing them to influence the results; that is what most screens relate to, along with the configuration of the security system.

A client can have multiple instances, which result in independent Data Vault models. This is particularly useful when dealing with independent data platforms, such as those used by HR, the compliance department, or specific business use cases, or when creating the raw data foundation for data products within a data mesh. In this case, a Flow.BI instance equals a data product.
But don't underestimate the complexity of Flow.BI: the frontend manages a large number of compute clusters that implement scalable agents to work on defining the Raw Data Vault. The platform implements full separation of data and processing, not only by client but also by instance.

Mapping Raw Data to Organizational Ontology

The very first step in the process is to identify the concepts in the attached datasets. For this purpose, there is a concept classifier that analyses the data and recognizes datasets and their classified concepts that it has seen in the past. A common requirement of clients is that they would like to leverage their organizational ontology in this process. While Flow.BI doesn't know a client's ontology, it is possible to override (and in some cases, complete) the concept classifications and refer to concepts from the organizational ontology. By doing so, Flow.BI will integrate the source system's raw data into the organization's ontology. It will not create a logical Data Vault, where the Data Vault model reflects the desired business, but instead model the raw data as the business uses it, and therefore follow the data-driven Data Vault modeling principles that Michael Olschimke has taught to thousands of students over the years at Scalefree.

Flow.BI also allows the definition of a multi-tenant Data Vault model, where source systems either provide multi-tenant data or are assigned to a specific tenant. In both cases, the integrated enterprise data model will be extended to allow queries across multiple tenants or within a single tenant, depending on the information consumer's needs.

Ensuring Security and Privacy

Flow.BI was designed with security and privacy in mind. From a design perspective, this has two aspects:

- Security and privacy in the service itself, to protect client solutions and related assets.
- Security and privacy as an integral part of the defined model, allowing for the effective use of Data Vault's capabilities in addressing security and privacy requirements, such as satellite splits.

While Flow.BI uses a shared architecture, all data and metadata storage and processing are separated by client and instance. However, this is often not sufficient for clients, as they hesitate to share their highly sensitive data with a third party. For this reason, Flow.BI offers two critical features:

- Local data storage: instead of storing client data on Flow.BI infrastructure, the client provides Azure Data Lake Storage to be used for storing the data.
- Local data processing: a Docker container can be deployed into the client's infrastructure to access the client's data sources, extract the data, and process it.

When using both options, only metadata, such as entity and column names, constraints, and descriptions, is shared with Flow.BI. No data is transferred from the client's infrastructure to Flow.BI. The metadata is secured on Flow.BI's premises as if it were actual data: row-level security separates the metadata by instance, and roles and permissions define per client who can access the metadata and what they can do with it.

But security and privacy are not limited to the service itself. The defined model also utilizes the security and privacy features of Data Vault. For example, it enables the classification of source columns based on security and privacy. The user can set up security and privacy classes and apply them on the influence screen.
By doing so, the column classifications are used when defining the Raw Data Vault and can later be used to implement a satellite split in the physical model (if necessary). An upcoming release will include an AI model for classifying columns based on privacy, utilizing data and metadata to automate this task.

Tackling Multilingual Challenges

A common challenge for clients is navigating multilingual data environments. Many data sources use English entity and column names, but some systems use metadata in a different language. Also, the assumption that the data platform should use English metadata is not always correct; especially for government clients, the use of the official language is often mandatory. Both options, translating the source metadata to English (the default within Flow.BI) and translating the defined target model into any target language, are supported by Flow.BI's translations tab on the influence screen.

The tab utilizes an AI translator to automatically translate the incoming table names, column names, and concept names. However, the user can step in and override the translation to adapt it to their needs. All strings of the source metadata and the defined model are passed through the translation module. It is also possible to reuse existing translations for a growing list of popular data sources. This feature enables readable names for satellites and their attributes (as well as hubs and links), resulting in a significantly improved user experience for the defined Raw Data Vault.

Generating the Physical Model

You should have noticed by now that we consistently discuss the defined Raw Data Vault model. Flow.BI does not generate the physical model, that is, the CREATE TABLE and INSERT INTO statements for the Raw Data Vault. Instead, it "just" defines the hubs, links, and satellites required for capturing all incoming data from the attached data sources, including business key selection, satellite splits, and special entity types, such as non-historized links and their satellites, multi-active satellites, hierarchical links, effectivity satellites, and reference tables.

Video on Generating Physical Models

This logical model (not to be confused with "logical Data Vault modelling") is then provided to our growing number of ISV partner solutions that consume our defined model, set up the required metadata in their tool, and generate the physical model. As a result, Flow.BI acts as a team member that analyses your organizational data sources and their data, knows how to model the Raw Data Vault, and knows how to set up the metadata in the tool of your choice.

The metadata provided by Flow.BI can be used to model the landing zone/staging area (either on a data lake or a relational database such as Microsoft Fabric) and the Raw Data Vault in a data-driven Data Vault architecture, which is the recommended practice. With this in mind, Flow.BI is not a competitor to Vaultspeed or your other existing data warehouse automation solution, but a valid extension that integrates with your existing tool stack. This makes it much easier to justify introducing Flow.BI to the project.

Going Beyond Data Vault

Flow.BI is not limited to the definition of Data Vault models. While it has been designed with the Data Vault concepts in mind, a customizable expert system is used to define the Data Vault model. Although the expert system is not yet publicly available, it has already been implemented and is in use for every model generation.
This expert system enables the implementation of alternative data models, provided they adhere to data-driven, schema-on-read principles. Data Vault is one such example, but many others are possible as well:

- Customized Data Vault models
- Inmon-style enterprise models in third normal form (3NF), if no business logic is required
- Kimball-style analytical models with facts and dimensions, again without business logic
- Semi-structured JSON and XML document collections
- Key-value stores
- "One Big Table (OBT)" models
- "Many Big Related Table (MBRT)" models

Okay, we've just invented the MBRT model as we're writing the article, but you get the idea: many large, fully denormalized tables with foreign-key relationships between each other. If you've developed your own data-driven model, please get in touch with us.

About the Authors

Michael Olschimke is co-founder and CEO of Flow.BI, a generative AI that defines integrated enterprise data models, such as (but not limited to) Data Vault. Michael has trained thousands of industry data warehousing professionals, taught academic classes, and published regularly on topics around data platforms, data engineering, and Data Vault. He has over two decades of experience in information technology, with a specialization in business intelligence, artificial intelligence, and data platforms.

The Future of AI: How Lovable.dev and Azure OpenAI Accelerate Apps that Change Lives
Discover how Charles Elwood, a Microsoft AI MVP and TEDx Speaker, leverages Lovable.dev and Azure OpenAI to create impactful AI solutions. From automating expense reports to restoring voices, translating gestures to speech, and visualizing public health data, Charles's innovations are transforming lives and democratizing technology. Follow his journey to learn more about AI for good.

Azure AI Search: Microsoft OneLake integration plus more features now generally available
From ingestion to retrieval, Azure AI Search releases enterprise-grade GA features: new connectors, enrichment skills, vector/semantic capabilities and wizard improvements—enabling smarter agentic systems and scalable RAG experiences.

Explore Microsoft Fabric Data Agent & Azure AI Foundry for agentic solutions
Contributors for this blog post: Jeet J & Ritaja S

Context & Objective

Over the past year, Gen AI apps have expanded significantly across enterprises. The agentic AI era is here, and the Microsoft ecosystem helps enable end-to-end acceleration of agentic AI apps in production. In this blog, we'll cover how both low-code business analysts and pro-code developers can use the Microsoft stack to build reusable agentic apps for their organizations.

Professionals in the Microsoft ecosystem are starting to build advanced agentic generative AI solutions using Microsoft AI Services and Azure AI Foundry, which supports both open-source and industry models. Combined with the advancements in Microsoft Fabric, these tools enable robust, industry-specific applications. This blog post explains how to develop multi-agent solutions for various industries using Azure AI Foundry, Copilot Studio, and Fabric.

Disclaimer: This blog post is for educational purposes only and walks through the process of using the relevant services without a lot of custom code; teams must follow engineering best practices, including development, automated deployment, testing, security, and responsible AI, before any production deployment.

What to expect

In-Focus: Our goal is to help the reader explore specific industry use cases and understand the concept of building multi-agent solutions. In our case, we will focus on an insurance and financial services use case, use Fabric Notebooks to create sample (fake) datasets, utilize a simple click-through workflow to build and configure three agents (both on Fabric and Azure AI Foundry), tie them together, and offer the solution via Teams or Microsoft Copilot using the new M365 Agents Toolkit.

Out-of-Focus: This blog post will not cover the fundamentals of Azure AI Services, Azure AI Foundry, or the various components of Microsoft Fabric. It also won't cover the different ways (low-code or pro-code) to build agents, orchestration frameworks (Semantic Kernel, LangChain, AutoGen, etc.) for orchestrating the agents, or hosting options (Azure App Service Web App, Azure Kubernetes Service, Azure Container Apps, Azure Functions). Please review the pointers listed towards the end to gain a holistic understanding of building and deploying mission-critical generative AI solutions.

Logical Architecture of Multi-Agent Solution utilizing Microsoft Fabric Data Agent and Azure AI Foundry Agent.

Fabric & Azure AI Foundry – Pro-code Agentic path

Prerequisites

a. Access to an Azure tenant & subscription.
b. Work with your Azure tenant administrator to obtain appropriate Azure roles and capacity to provision Azure resources (services) and deploy AI models in certain regions.
c. A paid F2 or higher Fabric capacity resource. Important to note that the Fabric compute capacity can be paused and resumed; pause it if you wish to save costs after your learning.
d. Access the Fabric Admin Portal (Power BI) to enable these settings:
   - Fabric data agent tenant settings is enabled.
   - Copilot tenant switch is enabled.
   - Optional: Cross-geo processing for AI is enabled (depends on your region and data sovereignty requirements).
   - Optional: Cross-geo storing for AI is enabled (depends on your region).
e. At least one of these: Fabric Data Warehouse, Fabric Lakehouse, one or more Power BI semantic models, or a KQL database with data. This blog post will cover how to create sample datasets using Fabric Notebooks.
f. Power BI semantic models via XMLA endpoints tenant switch is enabled for Power BI semantic model data sources.
Walkthrough/Set-up

1. One-time setup for Fabric Workspace and all agents

a) Visit https://app.powerbi.com. Click the "New workspace" button to create a new workspace, give it a name, and ensure that it is tied/associated to the Fabric capacity you have provisioned.
b) Click Workspace settings of the newly created workspace.
c) Review the information in License info. If the workspace isn't associated with your newly created Fabric capacity, please do the proper association (link the Fabric capacity to the workspace) and wait for 5-10 minutes.

2. Create an Insurance Agent

a) Create a new Lakehouse in your Fabric workspace. Change the name to InsuranceLakehouse.
b) Create a new Fabric Notebook, assign a name, and associate the Insurance Lakehouse with it.
c) Add the following PySpark (Python) code snippets in the Notebook (a sketch of what these cells could look like follows this step):
   i) Faker library for Fabric Notebook
   ii) Insurance Sample Dataset in Fabric Notebook
   iii) Run both cells to generate the sample Insurance dataset.
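The two code snippets referenced in step 2c are not reproduced in this text, so here is a minimal sketch of what such cells could look like. The column names, value ranges, and row counts are assumptions chosen to line up with the churn and claims questions used later; it assumes the InsuranceLakehouse is attached as the notebook's default Lakehouse.

```python
# Cell 1 (run as its own cell): install Faker into the notebook session.
#   %pip install faker

# Cell 2: generate a small, fake insurance dataset and save it as Lakehouse tables.
# Column names, ranges, and row counts are illustrative assumptions.
import random
from faker import Faker

fake = Faker()
random.seed(42)
Faker.seed(42)

policy_types = ["Life", "Auto", "Health", "Legal"]

customers = [(i, fake.name(), random.randint(18, 85), fake.job()) for i in range(1, 501)]
policies = [
    (i, random.choice(policy_types), round(random.uniform(200, 5000), 2), random.randint(0, 1))
    for i in range(1, 501)
]
claims = [
    (i, random.randint(1, 500), round(random.uniform(100, 20000), 2),
     fake.date_between(start_date="-2y", end_date="today"), random.choice(["Open", "Closed"]))
    for i in range(1, 2001)
]

spark.createDataFrame(customers, ["customer_id", "name", "age", "occupation"]) \
    .write.mode("overwrite").saveAsTable("Customer")
spark.createDataFrame(policies, ["policy_id", "policy_type", "premium_amount", "churn"]) \
    .write.mode("overwrite").saveAsTable("Policy")
spark.createDataFrame(claims, ["claim_id", "policy_id", "amount", "claim_date", "status"]) \
    .write.mode("overwrite").saveAsTable("Claims")
```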
d) Create a new Fabric data agent, give it a name, and add the data source (InsuranceLakehouse).
   i) Ensure that the Insurance Lakehouse is added as the data source in the Insurance Fabric data agent.
   ii) Click the AI instructions button to paste the sample instructions, and finally the Publish button.
   iii) Paste the sample instructions in the field: "A churn is represented by number 1. Calculate churn rate as: total number of churns (TC) / total count of churn entries (TT). When asked to calculate churn for each policy type, then TT should be the total count of churn entries of that policy type, e.g., Life, Legal."
   iv) Make sure to hit the Publish button.
   v) Capture two values from the URL and store them in a secure/private place. We will use them to configure the knowledge source in the Azure AI Foundry Agent: https://app.powerbi.com/groups/<WORKSPACE-ID>/aiskills/<ARTIFACT-ID>?experience=power-bi
e) Create an Azure AI Agent on Azure AI Foundry and use the Fabric data agent as the knowledge source.
   i) Visit https://ai.azure.com
   ii) Create a new Azure AI Project and deploy the gpt-4o-mini model (as an example model) in a region where the model is available.
   iii) Create a new Azure AI Foundry Agent by clicking the "New agent" button. Give it a name (e.g., AgentforInsurance).
   iv) Paste the sample instructions in the Azure AI Agent as follows: "Use Fabric data agent tool to answer questions related to Insurance data. Fabric data agent as a tool has access to insurance data tables: Claims (amount, date, status), Customer (age, address, occupation, etc.), Policy (premium amount, policy type: life insurance, auto insurance, etc.)"
   v) On the right-hand pane, click the "+Add" button next to Knowledge.
   vi) Choose an existing Fabric data agent connection or click new Connection.
   vii) In the next dialog, plug in the values of the Workspace and Artifact ID you captured above, ensure that "is secret" is checked, give a name to the connection, and hit the Connect button.
   viii) Add the Code Interpreter as a tool in the Azure AI Foundry agent by clicking +Add next to Actions and selecting Code Interpreter.
   ix) Test the agent by clicking the "Try in playground" button.
   x) To test the agent, you can try out these sample questions:
      - What is the churn rate across my different insurance policy types?
      - What's the month over month claims change in % for each insurance type?
      - Show me a graph view of month over month claims change in % for each insurance type for the year 2025 only.
      - Based on month over month claims change for the year 2025, can you show the forecast for the next 3 months in a graph?
f) Exposing the Fabric data agent to end users: we will explore this in the Copilot Studio section.

Fabric & Copilot Studio – Low-code agentic path

Prerequisites

For Copilot Studio, you have three options to work with: a Copilot Studio trial, a Copilot Studio license, or Copilot Studio pay-as-you-go (connected to your Azure billing). Follow the steps here to set up: Copilot Studio licensing - Microsoft Copilot Studio | Microsoft Learn. Once you have Copilot Studio set up, navigate to https://copilotstudio.microsoft.com/ and start creating a new agent: click on "Create" and then "New Agent".

Walkthrough/Set-up

Follow the steps from the "Fabric & Azure AI Foundry" section to create the Fabric Lakehouse and the Fabric data agent. You could create the new agent by describing it step by step in natural language, but for this exercise we will use the "skip to configure" button. Give the agent a name and add a helpful description (suggestion: "Agent that has access to Insurance data and helps with data driven insights from Policy, Claims and Customer information"). Then add the agent instruction prompt: "Answer questions related to Insurance data, you have access to insurance data agent, use the agent to gather insights from insurance Lakehouse containing customer, policy and claim information." Finally, click on "Create". You should now have a setup like the one shown below.

Next, we want to add another agent for this agent to use; in this case this will be our Fabric data agent. Click on "Agents", then click on "Add" and add an agent. From the screen, click on Microsoft Fabric. If you haven't set up a connection to Fabric from Copilot Studio, you will be prompted to create a new connection; follow the prompts to sign in with your user and add a connection. Once that is done, click "Next" to continue. From the Fabric screen, select the appropriate data agent and click "Next". On the next screen, name the agent appropriately, use a friendly description ("Agent that answers questions from the insurance lakehouse knowledge, has access to claims, policies and customer information"), and finally click on "Add Agent". On the "Tools" section, click refresh to make sure the tool description populates.

Finally, go back to Overview and start testing the agent from the side Test Panel. Click on Activity map to see the sequence of events. Type in the following question: "What's the month over month claims change in % for auto insurance?" You can see that the Fabric data agent is called by the Copilot agent in this scenario to answer your question.

Now let's prepare to surface this through Teams. You will need to publish the agent to a channel (in this case, we will use the Teams channel). First, navigate to Channels, click on the Teams and M365 Copilot channel, and click Add. After adding the channel, a pop-up will ask you if you are ready to publish the agent. To view the app in Teams, you need to make sure that you have set up the proper policies in Teams; follow this tutorial: Connect and configure an agent for Teams and Microsoft 365 Copilot - Microsoft Copilot Studio | Microsoft Learn. Now your app is available across Teams. Below is an example of how to use it from Teams; make sure you click "Allow" for the Fabric data agent connection.

Fabric & AI Foundry & Copilot Studio – the end to end

We saw how Fabric data agents can be created and utilized in Copilot Studio in a multi-agent setup. In the future, pro-code and low-code agentic developers are expected to work together to create agentic apps, instead of working in silos.
So, how do we solve the challenge of connecting all the components together in the same technology stack? Let's say a pro-code developer has created a custom agent in AI Foundry. Meanwhile, a low-code business user has put in business context to create another agent that requires access to the agent in AI Foundry. You'll be pleased to know that Copilot Studio and Azure AI Foundry are becoming more integrated to enable complex, custom scenarios; Copilot Studio will soon release the integration to help with this.

Summary

We demonstrated how one can build a Gen AI solution that allows seamless integration between Azure AI Foundry agents and Fabric data agents. We look forward to seeing what innovative solutions you can build by learning and working closely with your Microsoft contacts or your SI partner. This may include, but is not limited to:

- Utilizing a real industry domain to illustrate the concepts of building a simple multi-agent solution.
- Showcasing the value of combining the Fabric data agent and the Azure AI agent.
- Demonstrating how one can publish the conceptual solution to Teams or Copilot using the new M365 Agents Toolkit.

Note that this blog post focused only on the Fabric data agent and Azure AI Foundry Agent Service, but production-ready solutions will need to consider Azure Monitor (for monitoring and observability) and Microsoft Purview (for data governance).

Pointers to Other Learning Resources

Ways to build AI Agents:
- Build agents, your way | Microsoft Developer

Components of Microsoft Fabric:
- What is Microsoft Fabric - Microsoft Fabric | Microsoft Learn

Info on Microsoft Fabric data agent:
- Create a Fabric data agent (preview) - Microsoft Fabric | Microsoft Learn
- Fabric Data Agent Tutorial: Fabric data agent scenario (preview) - Microsoft Fabric | Microsoft Learn
- New in Fabric Data Agent: Data source instructions for smarter, more accurate AI responses | Microsoft Fabric Blog | Microsoft Fabric

Info on Azure AI Services, Models and Azure AI Foundry:
- Azure AI Foundry documentation | Microsoft Learn
- What are Azure AI services? - Azure AI services | Microsoft Learn
- How to use Azure AI services in Azure AI Foundry portal - Azure AI Services | Microsoft Learn
- What is Azure AI Foundry Agent Service? - Azure AI Foundry | Microsoft Learn
- Explore Azure AI Foundry Models - Azure AI Foundry | Microsoft Learn
- What is Azure OpenAI in Azure AI Foundry Models? | Microsoft Learn
- Explore model leaderboards in Azure AI Foundry portal - Azure AI Foundry | Microsoft Learn & Benchmark models in the model leaderboard of Azure AI Foundry portal - Azure AI Foundry | Microsoft Learn
- How to use model router for Azure AI Foundry (preview) | Microsoft Learn
- Observability in Generative AI with Azure AI Foundry - Azure AI Foundry | Microsoft Learn
- Trustworthy AI for Azure AI Foundry - Azure AI Foundry | Microsoft Learn
- Cost Management for Models: Plan to manage costs for Azure AI Foundry Models | Microsoft Learn
- Provisioned Throughput Offering: Provisioned throughput for Azure AI Foundry Models | Microsoft Learn

Extend Azure AI Foundry Agent with Microsoft Fabric:
- Expand Azure AI Agent with New Knowledge Tools: Microsoft Fabric and Tripadvisor | Microsoft Community Hub
- How to use the data agents in Microsoft Fabric with Azure AI Foundry Agent Service - Azure AI Foundry | Microsoft Learn

General guidance on when to adopt, extend and build Copilot experiences:
- Adopt, extend and build Copilot experiences across the Microsoft Cloud | Microsoft Learn

M365 Agents Toolkit:
- Choose the right agent solution to support your use case | Microsoft Learn
- Microsoft 365 Agents Toolkit Overview - Microsoft 365 Developer | Microsoft Learn

GitHub and video to offer an Azure AI Agent inside Teams or Copilot via the M365 Agents Toolkit:
- OfficeDev/microsoft-365-agents-toolkit: Developer tools for building Teams apps
- Deploying your Azure AI Foundry agent to Microsoft 365 Copilot, Microsoft Teams, and beyond

Happy Learning!

Contributors: Jeet J & Ritaja S
Special thanks to reviewers: Joanne W, Amir J & Noah A

Approaches to Integrating Azure Databricks with Microsoft Fabric: The Better Together Story!
Azure Databricks and Microsoft Fabric can be combined to create a unified and scalable analytics ecosystem. This document outlines eight distinct integration approaches, each accompanied by step-by-step implementation guidance and key design considerations. These methods are not prescriptive—your cloud architecture team can choose the integration strategy that best aligns with your organization's governance model, workload requirements and platform preferences. Whether you prioritize centralized orchestration, direct data access, or seamless reporting, the flexibility of these options allows you to tailor the solution to your specific needs.

Acting on Real-Time data using custom actions with Data Activator
Being able to make data-driven decisions and act on real-time data is important to organizations because it enables them to avert crises in systems that monitor product health, or to take other actions based on their requirements. For example, a shipping company may want to monitor its packages and act in real time when the temperature of the packages becomes too high. One way of monitoring and acting on data is to use Data Activator, a no-code experience in Microsoft Fabric for automatically taking action when a condition, such as the package temperature threshold, is detected in the data.

External Data Sharing With Microsoft Fabric
Demand for data for external analytics consumption is growing rapidly. There are many options to share data externally, and the field is very dynamic. One of the most frictionless and easiest onboarding paths for external data sharing, which we will explore here, is with Microsoft Fabric. External data sharing allows users to share data from their tenant with users in another Microsoft Fabric tenant.

Announcing Mirroring for Azure Database for PostgreSQL in Microsoft Fabric for Public Preview
Back at the first European Microsoft Fabric Community Conference in September 2024, we announced our Private Preview program for Mirroring for Azure Database for PostgreSQL in Microsoft Fabric. Today, in conjunction with the 2025 edition of the Microsoft Fabric Community Conference in Las Vegas, we're thrilled to announce our Public Preview milestone, giving customers the ability to leverage friction-free, near-real-time replication from Azure Database for PostgreSQL flexible server to Fabric OneLake in Delta tables, providing a solid foundation for reporting, advanced analytics, AI, and data science on operational data with minimal effort and impact on transactional workloads.

Mirroring is set up from the Fabric Data Warehousing experience by providing the Azure Database for PostgreSQL flexible server and database connection details and selecting what needs to be mirrored into Fabric, either all data or user-selected eligible tables. And, just like that, mirroring is ready to go. Mirroring Azure Database for PostgreSQL flexible server creates an initial snapshot in Fabric OneLake, after which data is kept in sync in near-real time with every transaction.

How mirroring to Fabric works in Azure Database for PostgreSQL flexible server

Fabric mirroring in Azure Database for PostgreSQL flexible server is based on principles such as logical replication and the Change Data Capture (CDC) design pattern. Once Fabric mirroring is established for a database in Azure Database for PostgreSQL flexible server, an initial snapshot is created by a background process for the selected tables to be mirrored. That snapshot is shipped to a Fabric OneLake landing zone in Parquet format. A process running in Fabric, known as the replicator, takes these initial snapshot files and creates tables in Delta format in the Mirrored database artifact. Subsequent changes applied to the selected tables are also captured in the source database and shipped to the OneLake landing zone in batches. Those batches of changes are finally applied to the respective Delta tables in the Mirrored database artifact.

For Fabric mirroring, the CDC pattern is implemented in a proprietary PostgreSQL extension called azure_cdc, which is installed and registered in source databases during the Fabric mirroring enablement workflow. This guided process has a new dedicated page in the Azure portal, sets up all required prerequisites, and offers a simplified experience where you just need to select which databases you want to replicate to Fabric OneLake (the default is up to 3). You can read additional details regarding the server enablement process and other critical configuration and monitoring options on a dedicated page in the Azure Database for PostgreSQL flexible server product documentation.

Explore advanced analytics and data engineering for PostgreSQL in Microsoft Fabric

Once data is in OneLake, mirrored data in the Delta format is ready for immediate consumption across all Fabric experiences and features, such as Power BI with the new Direct Lake mode, Data Warehouse, Data Engineering, Lakehouse, KQL Database, Notebooks, and Copilot, which work instantly. Direct Lake mode is a fast path to load the data from the lake, with groundbreaking semantic model capability for analyzing very large data volumes in Power BI. As Direct Lake mode also supports reading Delta tables right from OneLake, the Mirrored PostgreSQL database is Power BI ready, along with Copilot capabilities.
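As a minimal sketch of what that consumption can look like from a Fabric notebook, the cell below queries mirrored tables through Lakehouse shortcuts with Spark SQL. The schema and table names are hypothetical assumptions; you would first add shortcuts to the mirrored database's tables (or use its SQL analytics endpoint) in your own workspace.

```python
# Minimal sketch: querying mirrored PostgreSQL data from a Fabric notebook.
# Assumes shortcuts to the mirrored tables (orders, customers) were added to the
# attached Lakehouse; names are hypothetical placeholders.

daily_revenue = spark.sql("""
    SELECT c.country,
           CAST(o.order_date AS DATE) AS order_day,
           SUM(o.amount)              AS revenue
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    GROUP  BY c.country, CAST(o.order_date AS DATE)
    ORDER  BY order_day DESC
""")

daily_revenue.show(10, truncate=False)
```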
Data across any mirrored database (whether Azure Database for PostgreSQL, Azure SQL DB, Azure Cosmos DB, or Snowflake) can be cross-joined as well, enabling querying across any database, warehouse, or Lakehouse (including shortcuts to AWS S3, ADLS Gen2, etc.). With the same approach, you can also have multiple PostgreSQL databases from multiple servers mirrored to OneLake, as in a typical SaaS provider scenario where each database belongs to a different tenant, and execute cross-database queries to aggregate and analyze critical business metrics. Data scientists and data engineers can work with the mirrored Azure Database for PostgreSQL data joined with other sources (see this example with Cosmos DB data) that are created as shortcuts in the Lakehouse. Read about the endless possibilities when loading operational databases into OneLake and Microsoft Fabric in the related section of our product documentation here.

Getting started with Mirroring for Azure Database for PostgreSQL in Fabric

To summarize, Mirroring Azure Database for PostgreSQL in Microsoft Fabric plays a crucial role in enabling analytics and driving insights from operational data by ensuring that the most recent data is available for analysis. This allows businesses to make decisions based on the most current situation, rather than relying on outdated information. Improving accuracy also reduces the risk of discrepancies between the source and the replicated data, leading to more accurate analytics and reliable insights. In addition, having the most recent data is essential for predictive analytics and AI models to make accurate predictions and decisions.

To get started and learn more about Mirroring Azure Database for PostgreSQL flexible server in Microsoft Fabric, its prerequisites, setup, FAQs, current limitations, and tutorial, please click here to read all about it, and stay tuned for more updates and new features coming soon. To get more updates on overall Mirroring capabilities in Fabric, please read this other blog post where you will get the latest news.