Approaches to Integrating Azure Databricks with Microsoft Fabric: The Better Together Story!
Azure Databricks and Microsoft Fabric can be combined to create a unified and scalable analytics ecosystem. This document outlines eight distinct integration approaches, each accompanied by step-by-step implementation guidance and key design considerations. These methods are not prescriptive—your cloud architecture team can choose the integration strategy that best aligns with your organization’s governance model, workload requirements, and platform preferences. Whether you prioritize centralized orchestration, direct data access, or seamless reporting, the flexibility of these options allows you to tailor the solution to your specific needs.

Explore Microsoft Fabric Data Agent & Azure AI Foundry for agentic solutions
Contributors for this blogpost: Jeet J & Ritaja S

Context & Objective

Over the past year, Gen AI apps have expanded significantly across enterprises. The agentic AI era is here, and the Microsoft ecosystem helps enable end-to-end acceleration of agentic AI apps in production. In this blog, we'll cover how both low-code business analysts and pro-code developers can use the Microsoft stack to build reusable agentic apps for their organizations. Professionals in the Microsoft ecosystem are starting to build advanced agentic generative AI solutions using Microsoft AI Services and Azure AI Foundry, which supports both open source and industry models. Combined with the advancements in Microsoft Fabric, these tools enable robust, industry-specific applications. This blog post explains how to develop multi-agent solutions for various industries using Azure AI Foundry, Copilot Studio, and Fabric.

Disclaimer: This blogpost is for educational purposes only and walks through the process of using the relevant services without a ton of custom code; teams must follow engineering best practices—including development, automated deployment, testing, security, and responsible AI—before any production deployment.

What to expect

In-Focus: Our goal is to help the reader explore specific industry use cases and understand the concept of building multi-agent solutions. In our case, we will focus on an insurance and financial services use case, use Fabric Notebooks to create sample (fake) datasets, use a simple click-through workflow to build and configure three agents (both on Fabric and Azure AI Foundry), tie them together, and offer the solution via Teams or Microsoft Copilot using the new M365 Agents Toolkit.

Out-of-Focus: This blog post will not cover the fundamentals of Azure AI Services, Azure AI Foundry, or the various components of Microsoft Fabric. It also won’t cover the different ways (low-code or pro-code) to build agents, orchestration frameworks (Semantic Kernel, LangChain, AutoGen, etc.) for orchestrating the agents, or hosting options (Azure App Service – Web App, Azure Kubernetes Service, Azure Container Apps, Azure Functions). Please review the pointers listed towards the end to gain a holistic understanding of building and deploying mission-critical generative AI solutions.

Logical Architecture of Multi-Agent Solution utilizing Microsoft Fabric Data Agent and Azure AI Foundry Agent.

Fabric & Azure AI Foundry – Pro-code Agentic path

Prerequisites
a. Access to an Azure tenant and subscription.
b. Work with your Azure tenant administrator to get the appropriate Azure roles and capacity to provision Azure resources (services) and deploy AI models in certain regions.
c. A paid F2 or higher Fabric capacity resource. Note that the Fabric compute capacity can be paused and resumed; pause it if you wish to save costs after your learning.
d. Access to the Fabric Admin Portal (Power BI) to enable these settings:
   - Fabric data agent tenant settings is enabled.
   - Copilot tenant switch is enabled.
   - Optional: Cross-geo processing for AI is enabled (depends on your region and data sovereignty requirements).
   - Optional: Cross-geo storing for AI is enabled (depends on your region).
e. At least one of these: Fabric Data Warehouse, Fabric Lakehouse, one or more Power BI semantic models, or a KQL database with data. This blog post will cover how to create sample datasets using Fabric Notebooks.
f. Power BI semantic models via XMLA endpoints tenant switch is enabled for Power BI semantic model data sources.
Walkthrough/Set-up

1. One-time setup for the Fabric Workspace and all agents
a) Visit https://app.powerbi.com, click the “New workspace” button to create a new workspace, give it a name, and ensure that it is associated with the Fabric capacity you have provisioned.
b) Click Workspace settings of the newly created workspace.
c) Review the information in License info. If the workspace isn’t associated with your newly created Fabric capacity, make the proper association (link the Fabric capacity to the workspace) and wait for 5-10 minutes.

2. Create an Insurance Agent
a) Create a new Lakehouse in your Fabric workspace. Change the name to InsuranceLakehouse.
b) Create a new Fabric Notebook, assign a name, and associate the InsuranceLakehouse with it.
c) Add the following PySpark (Python) code snippets in the Notebook (see the illustrative sketch following step 2(e) below):
   i) Faker library for Fabric Notebook
   ii) Insurance sample dataset in Fabric Notebook
   iii) Run both cells to generate the sample insurance dataset.
d) Create a new Fabric data agent, give it a name, and add the data source (InsuranceLakehouse).
   i) Ensure that the InsuranceLakehouse is added as the data source in the Insurance Fabric data agent.
   ii) Click the AI instructions button first to paste the sample instructions, and finally the Publish button.
   iii) Paste the sample instructions in the field: A churn is represented by number 1. Calculate churn rate as: total number of churns (TC) / (TT) total count of churn entries. When asked to calculate churn for each policy type then TT should be total count of churn of that policy type, e.g. Life, Legal.
   iv) Make sure to hit the Publish button.
   v) Capture two values from the URL and store them in a secure/private place. We will use them to configure the knowledge source in the Azure AI Foundry Agent. https://app.powerbi.com/groups/<WORKSPACE-ID>/aiskills/<ARTIFACT-ID>?experience=power-bi
e) Create an Azure AI Agent on Azure AI Foundry and use the Fabric data agent as the knowledge source.
   i) Visit https://ai.azure.com
   ii) Create a new Azure AI Project and deploy the gpt-4o-mini model (as an example model) in a region where the model is available.
   iii) Create a new Azure AI Foundry Agent by clicking the “New agent” button. Give it a name (e.g. AgentforInsurance).
   iv) Paste the sample instructions in the Azure AI Agent as follows: Use Fabric data agent tool to answer questions related to Insurance data. Fabric data agent as a tool has access to insurance data tables: Claims (amount, date, status), Customer (age, address, occupation, etc), Policy (premium amount, policy type: life insurance, auto insurance, etc)
   v) On the right-hand pane, click the “+ Add” button next to Knowledge.
   vi) Choose an existing Fabric data agent connection or click new Connection.
   vii) In the next dialog, plug in the values of the Workspace and Artifact IDs you captured above, ensure that “is secret” is checked, give the connection a name, and hit the Connect button.
   viii) Add the Code Interpreter as a tool in the Azure AI Foundry agent by clicking “+ Add” next to Actions and selecting Code Interpreter.
   ix) Test the agent by clicking the “Try in playground” button.
   x) To test the agent, you can try out these sample questions:
      - What is the churn rate across my different insurance policy types?
      - What’s the month over month claims change in % for each insurance type?
      - Show me a graph view of month over month claims change in % for each insurance type for the year 2025 only.
      - Based on month over month claims change for the year 2025, can you show the forecast for the next 3 months in a graph?
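The original post shows the notebook snippets for step 2(c) as screenshots. As a stand-in, here is a minimal sketch, assuming the Faker library has been installed in the notebook session (for example with %pip install faker in a separate cell) and using illustrative column names aligned with the Claims, Customer, and Policy attributes described above; the table name insurance_sample and all field names are assumptions, not the original dataset schema.

import random
from faker import Faker
from pyspark.sql import Row, SparkSession

# In a Fabric notebook a Spark session already exists; this keeps the sketch self-contained.
spark = SparkSession.builder.getOrCreate()
fake = Faker()

policy_types = ["Life", "Auto", "Home", "Legal"]

rows = []
for _ in range(1000):
    rows.append(Row(
        customer_id=str(fake.uuid4()),
        customer_age=random.randint(18, 85),
        occupation=fake.job(),
        policy_id=str(fake.uuid4()),
        policy_type=random.choice(policy_types),
        premium_amount=round(random.uniform(100.0, 5000.0), 2),
        claim_amount=round(random.uniform(0.0, 20000.0), 2),
        claim_date=fake.date_between(start_date="-2y", end_date="today").isoformat(),
        claim_status=random.choice(["Open", "Approved", "Denied"]),
        churn=random.choice([0, 1]),
    ))

# Write the generated rows to the attached InsuranceLakehouse as a managed Delta table.
df = spark.createDataFrame(rows)
df.write.mode("overwrite").saveAsTable("insurance_sample")

The Fabric data agent from step 2(d) can then be pointed at the resulting Lakehouse table as its data source.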
f) Exposing the Fabric data agent to end users: we will explore this in the Copilot Studio section.

Fabric & Copilot Studio – Low-code agentic path

Prerequisites

For Copilot Studio, you have three options to work with: a Copilot Studio trial, a Copilot Studio license, or Copilot Studio pay-as-you-go (connected to your Azure billing). Follow the steps here to set it up: Copilot Studio licensing - Microsoft Copilot Studio | Microsoft Learn. Once you have Copilot Studio set up, navigate to https://copilotstudio.microsoft.com/ and start creating a new agent – click on “Create” and then “New Agent”.

Walkthrough/Set-up:

Follow the steps from the “Fabric & Azure AI Foundry” section to create the Fabric Lakehouse and Fabric data agent. You could create the new agent by describing it step by step in natural language, but for this exercise we will use the “skip to configure” button. Give the agent a name and add a helpful description (suggestion: Agent that has access to Insurance data and helps with data driven insights from Policy, Claims and Customer information). Then add the agent instruction prompt: “Answer questions related to Insurance data, you have access to insurance data agent, use the agent to gather insights from insurance Lakehouse containing customer, policy and claim information.” Finally, click on “Create”. You should now have a setup like below.

Next, we want to add another agent for this agent to use – in this case, our Fabric data agent. Click on “Agents”. Next, click on “Add” and add an agent. From the screen, click on Microsoft Fabric. If you haven’t set up a connection to Fabric from Copilot Studio, you will be prompted to create a new connection; follow the prompts to sign in with your user and add a connection. Once that is done, click “Next” to continue. From the Fabric screen, select the appropriate data agent and click “Next”. On the next screen, name the agent appropriately, use a friendly description (“Agent that answers questions from the insurance lakehouse knowledge, has access to claims, policies and customer information”), and finally click on “Add Agent”. On the “Tools” section, click refresh to make sure the tool description populates.

Finally, go back to the overview and start testing the agent from the side Test Panel. Click on Activity map to see the sequence of events. Type in the following question: “What’s the month over month claims change in % for auto insurance?” You can see that the Fabric data agent is called by the Copilot agent in this scenario to answer your question.

Now let’s prepare to surface this through Teams. You will need to publish the agent to a channel (in this case, we will use the Teams channel). First, navigate to Channels. Click on the Teams and M365 Copilot channel and click Add. After adding the channel, a pop-up will ask you if you are ready to publish the agent. To view the app in Teams, you need to make sure that you have set up the proper policies in Teams. Follow this tutorial: Connect and configure an agent for Teams and Microsoft 365 Copilot - Microsoft Copilot Studio | Microsoft Learn. Now your app is available across Teams. Below is an example of how to use it from Teams – make sure you click on “Allow” for the Fabric data agent connection.

Fabric & AI Foundry & Copilot Studio – the end to end

We saw how Fabric data agents can be created and utilized in Copilot Studio in a multi-agent setup. In the future, pro-code and low-code agentic developers are expected to work together to create agentic apps, instead of in silos.
So, how do we solve the challenge of connecting all the components together in the same technology stack? Let’s say a pro-code developer has created a custom agent in AI Foundry. Meanwhile, a low-code business user has added business context to create another agent that requires access to the agent in AI Foundry. You’ll be pleased to know that Copilot Studio and Azure AI Foundry are becoming more integrated to enable complex, custom scenarios; Copilot Studio will soon release the integration to help with this.

Summary: We demonstrated how one can build a Gen AI solution that allows seamless integration between Azure AI Foundry agents and Fabric data agents. We look forward to seeing what innovative solutions you can build by learning and working closely with your Microsoft contacts or your SI partner. This may include, but is not limited to:
- Utilizing a real industry domain to illustrate the concepts of building a simple multi-agent solution.
- Showcasing the value of combining the Fabric data agent and the Azure AI agent.
- Demonstrating how one can publish the conceptual solution to Teams or Copilot using the new M365 Agents Toolkit.

Note that this blog post focused only on the Fabric data agent and Azure AI Foundry Agent Service, but production-ready solutions will need to consider Azure Monitor (for monitoring and observability) and Microsoft Purview (for data governance).

Pointers to Other Learning Resources

Ways to build AI Agents: Build agents, your way | Microsoft Developer
Components of Microsoft Fabric: What is Microsoft Fabric - Microsoft Fabric | Microsoft Learn
Info on Microsoft Fabric data agent:
- Create a Fabric data agent (preview) - Microsoft Fabric | Microsoft Learn
- Fabric Data Agent Tutorial: Fabric data agent scenario (preview) - Microsoft Fabric | Microsoft Learn
- New in Fabric Data Agent: Data source instructions for smarter, more accurate AI responses | Microsoft Fabric Blog | Microsoft Fabric
Info on Azure AI Services, Models and Azure AI Foundry:
- Azure AI Foundry documentation | Microsoft Learn
- What are Azure AI services? - Azure AI services | Microsoft Learn
- How to use Azure AI services in Azure AI Foundry portal - Azure AI Services | Microsoft Learn
- What is Azure AI Foundry Agent Service? - Azure AI Foundry | Microsoft Learn
- Explore Azure AI Foundry Models - Azure AI Foundry | Microsoft Learn
- What is Azure OpenAI in Azure AI Foundry Models? | Microsoft Learn
- Explore model leaderboards in Azure AI Foundry portal - Azure AI Foundry | Microsoft Learn & Benchmark models in the model leaderboard of Azure AI Foundry portal - Azure AI Foundry | Microsoft Learn
- How to use model router for Azure AI Foundry (preview) | Microsoft Learn
- Observability in Generative AI with Azure AI Foundry - Azure AI Foundry | Microsoft Learn
- Trustworthy AI for Azure AI Foundry - Azure AI Foundry | Microsoft Learn
- Cost Management for Models: Plan to manage costs for Azure AI Foundry Models | Microsoft Learn
- Provisioned Throughput Offering: Provisioned throughput for Azure AI Foundry Models | Microsoft Learn
Extend Azure AI Foundry Agent with Microsoft Fabric:
- Expand Azure AI Agent with New Knowledge Tools: Microsoft Fabric and Tripadvisor | Microsoft Community Hub
- How to use the data agents in Microsoft Fabric with Azure AI Foundry Agent Service - Azure AI Foundry | Microsoft Learn
General guidance on when to adopt, extend and build Copilot experiences: Adopt, extend and build Copilot experiences across the Microsoft Cloud | Microsoft Learn
M365 Agents Toolkit:
- Choose the right agent solution to support your use case | Microsoft Learn
- Microsoft 365 Agents Toolkit Overview - Microsoft 365 Developer | Microsoft Learn
- GitHub and video to offer an Azure AI Agent inside Teams or Copilot via the M365 Agents Toolkit: OfficeDev/microsoft-365-agents-toolkit: Developer tools for building Teams apps & Deploying your Azure AI Foundry agent to Microsoft 365 Copilot, Microsoft Teams, and beyond

Happy Learning!

Contributors: Jeet J & Ritaja S
Special thanks to reviewers: Joanne W, Amir J & Noah A

External Data Sharing With Microsoft Fabric
The demand for data for external analytics consumption is growing rapidly. There are many options to share data externally, and the field is very dynamic. One of the most frictionless options with the easiest onboarding, which we will explore here, is external data sharing with Microsoft Fabric. This feature allows users to share data from their tenant with users in another Microsoft Fabric tenant.

Announcing Mirroring for Azure Database for PostgreSQL in Microsoft Fabric for Public Preview
Back at the first European Microsoft Fabric Community Conference in September 2024, we announced our private preview program for Mirroring for Azure Database for PostgreSQL in Microsoft Fabric. Today, in conjunction with the 2025 edition of the Microsoft Fabric Community Conference in Las Vegas, we're thrilled to announce our public preview milestone, giving customers the ability to leverage friction-free, near-real-time replication from Azure Database for PostgreSQL flexible server to Fabric OneLake in Delta tables, providing a solid foundation for reporting, advanced analytics, AI, and data science on operational data with minimal effort and impact on transactional workloads.

Mirroring is set up from the Fabric Data Warehousing experience by providing the Azure Database for PostgreSQL flexible server and database connection details and selecting what needs to be mirrored into Fabric: either all data or user-selected eligible tables. And, just like that, mirroring is ready to go. Mirroring Azure Database for PostgreSQL flexible server creates an initial snapshot in Fabric OneLake, after which data is kept in sync in near-real time with every transaction.

How mirroring to Fabric works in Azure Database for PostgreSQL flexible server

Fabric mirroring in Azure Database for PostgreSQL flexible server is based on principles such as logical replication and the Change Data Capture (CDC) design pattern. Once Fabric mirroring is established for a database in Azure Database for PostgreSQL flexible server, an initial snapshot is created by a background process for the selected tables to be mirrored. That snapshot is shipped to a Fabric OneLake landing zone in Parquet format. A process running in Fabric, known as the replicator, takes these initial snapshot files and creates tables in Delta format in the mirrored database artifact. Subsequent changes applied to the selected tables are also captured in the source database and shipped to the OneLake landing zone in batches. Those batches of changes are finally applied to the respective Delta tables in the mirrored database artifact.

For Fabric mirroring, the CDC pattern is implemented in a proprietary PostgreSQL extension called azure_cdc, which is installed and registered in source databases during the Fabric mirroring enablement workflow. This guided process has a new dedicated page in the Azure portal, sets up all required prerequisites, and offers a simplified experience where you just need to select which databases you want to replicate to Fabric OneLake (the default is up to 3). You can read additional details regarding the server enablement process and other critical configuration and monitoring options on a dedicated page in the Azure Database for PostgreSQL flexible server product documentation.

Explore advanced analytics and data engineering for PostgreSQL in Microsoft Fabric

Once data is in OneLake, mirrored data in the Delta format is ready for immediate consumption across all Fabric experiences and features, such as Power BI with the new Direct Lake mode, Data Warehouse, Data Engineering, Lakehouse, KQL Database, Notebooks and Copilot, which work instantly. Direct Lake mode is a fast path to load the data from the lake, with groundbreaking semantic model capability for analyzing very large data volumes in Power BI. As Direct Lake mode also supports reading Delta tables right from OneLake, the mirrored PostgreSQL database is Power BI ready along with Copilot capabilities.
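As an illustration of that immediate consumption, here is a minimal sketch of querying mirrored tables from a Fabric notebook with Spark SQL. It assumes the mirrored database's tables have been surfaced in a Lakehouse (for example via shortcuts); the database, table, and column names below are purely hypothetical.

from pyspark.sql import SparkSession

# A Spark session is pre-created in Fabric notebooks; this keeps the sketch self-contained.
spark = SparkSession.builder.getOrCreate()

# Hypothetical mirrored tables: orders and customers from an operational PostgreSQL database.
top_customers = spark.sql("""
    SELECT c.customer_id,
           c.customer_name,
           SUM(o.order_total) AS total_spend
    FROM   mirrored_pg.orders o
    JOIN   mirrored_pg.customers c
           ON o.customer_id = c.customer_id
    GROUP  BY c.customer_id, c.customer_name
    ORDER  BY total_spend DESC
    LIMIT  10
""")

top_customers.show()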
Data across any mirrored database (Azure Database for PostgreSQL, Azure SQL DB, Azure Cosmos DB or Snowflake) can be cross-joined as well, enabling querying across any database, warehouse or Lakehouse (including shortcuts to AWS S3, ADLS Gen2, etc.). With the same approach, you can also have multiple PostgreSQL databases from multiple servers mirrored to OneLake, as in a typical SaaS provider scenario where each database belongs to a different tenant, and execute cross-database queries to aggregate and analyze critical business metrics. Data scientists and data engineers can work with the mirrored Azure Database for PostgreSQL data joined with other sources (see this example with Cosmos DB data) that are created as shortcuts in a Lakehouse. Read about the endless possibilities when loading operational databases into OneLake and Microsoft Fabric in the related section of our product documentation here.

Getting started with Mirroring for Azure Database for PostgreSQL in Fabric

To summarize, Mirroring Azure Database for PostgreSQL in Microsoft Fabric plays a crucial role in enabling analytics and driving insights from operational data by ensuring that the most recent data is available for analysis. This allows businesses to make decisions based on the most current situation, rather than relying on outdated information. Near-real-time replication also reduces the risk of discrepancies between the source and the replicated data, leading to more accurate analytics and reliable insights. In addition, it is essential that predictive analytics and AI models are provided with the most recent data to make accurate predictions and decisions. To get started and learn more about Mirroring Azure Database for PostgreSQL flexible server in Microsoft Fabric, its prerequisites, setup, FAQs, current limitations, and tutorial, please click here to read all about it, and stay tuned for more updates and new features coming soon. To get more updates on overall Mirroring capabilities in Fabric, please read this other blog post where you will get the latest news.

Building Enterprise Voice-Enabled AI Agents with Azure Voice Live API
The sample application covered in this post demonstrates two approaches in an end-to-end solution that includes product search, order management, automated shipment creation, intelligent analytics, and comprehensive business intelligence through Microsoft Fabric integration.

Use Case Scenario: Retail Fashion Agent

Core Business Capabilities:
- Product Discovery and Ordering: Natural language product search across fashion categories (Winter wear, Active wear, etc.) and order placement. REST APIs hosted in Azure Function Apps provide this functionality, and a Swagger definition is configured in the application for tool action.
- Automated Fulfillment: Integration with Azure Logic Apps for shipment creation in Azure SQL Database.
- Policy Support: Vector-powered QnA for returns, payment issues, and customer policies. Azure AI Search & File Search capabilities are used for this requirement.
- Conversation Analytics: AI-powered analysis using GPT-4o for sentiment scoring and performance evaluation. The application captures the entire conversation between the customer and the agent and sends it to an agent running in Azure Logic Apps to perform call quality assessment, before storing the results in Azure Cosmos DB. When the customer indicates during the voice call that the conversation can be concluded, the agent autonomously sends the conversation history to the Azure Logic App to perform quality assessment.

Advanced Analytics Pipeline:
- Real-time Data Mirroring: Automatic synchronization from Azure Cosmos DB to Microsoft Fabric OneLake
- Business Intelligence: Custom Data Agents in Fabric for trend analysis and insights
- Executive Dashboards: Power BI reports for comprehensive performance monitoring

Technical Architecture Overview

The solution presents two approaches, each optimized for different enterprise scenarios:

🎯 Approach 1: Direct Model Integration with GPT-Realtime

Architecture Components

This approach provides direct integration with the Azure Voice Live API using the GPT-Realtime model for immediate speech-to-speech conversational experiences without intermediate text processing. The application connects to the Voice Live API using a WebSocket connection. The semantics of this API are similar to those used when connecting to the GPT-Realtime API directly. The Voice Live API provides additional configurability, such as the choice of a custom voice from Azure Speech Services, options for echo cancellation and noise reduction, and the ability to plug in an Avatar integration.
Core Technical Stack: GPT-Realtime Model: Direct audio-to-audio processing Azure Speech Voice: High-quality TTS synthesis (en-IN-AartiIndicNeural) WebSocket Communication: Real-time bidirectional audio streaming Voice Activity Detection: Server-side VAD for natural conversation flow Client-Side Function Calling: Full control over tool execution logic Key Session Configuration The Direct Model Integration uses the session configuration below: session_config = { "input_audio_sampling_rate": 24000, "instructions": system_instructions, "turn_detection": { "type": "server_vad", "threshold": 0.5, "prefix_padding_ms": 300, "silence_duration_ms": 500, }, "tools": tools_list, "tool_choice": "auto", "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"}, "input_audio_echo_cancellation": {"type": "server_echo_cancellation"}, "voice": { "name": "en-IN-AartiIndicNeural", "type": "azure-standard", "temperature": 0.8, }, "input_audio_transcription": {"model": "whisper-1"}, } Configuration Highlights: 24kHz Audio Sampling: High-quality audio processing for natural speech Server VAD: Optimized threshold (0.5) with 300ms padding for natural conversation flow Azure Deep Noise Suppression: Advanced noise reduction for clear audio Indic Voice Support: en-IN-AartiIndicNeural for localized customer experience Whisper-1 Transcription: Accurate speech recognition for conversation logging Connecting to the Azure Voice Live API The voicelive_modelclient.py demonstrates advanced WebSocket handling for real-time audio streaming: def get_websocket_url(self, access_token: str) -> str: """Generate WebSocket URL for Voice Live API.""" azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://") return ( f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}" f"&model={model_name}" f"&agent-access-token={access_token}" ) async def connect(self): if self.is_connected(): # raise Exception("Already connected") self.log("Already connected") # Get access token access_token = self.get_azure_token() # Build WebSocket URL and headers ws_url = self.get_websocket_url(access_token) self.ws = await websockets.connect( ws_url, additional_headers={ "Authorization": f"Bearer {self.get_azure_token()}", "x-ms-client-request-id": str(uuid.uuid4()), }, ) print(f"Connected to Azure Voice Live API....") asyncio.create_task(self.receive()) await self.update_session() Function Calling Implementation The Direct Model Integration provides client-side function execution with complete control: tools_list = [ { "type": "function", "name": "perform_search_based_qna", "description": "call this function to respond to the user query on Contoso retail policies, procedures and general QnA", "parameters": { "type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"], }, }, { "type": "function", "name": "create_delivery_order", "description": "call this function to create a delivery order based on order id and destination location", "parameters": { "type": "object", "properties": { "order_id": {"type": "string"}, "destination": {"type": "string"}, }, "required": ["order_id", "destination"], }, }, { "type": "function", "name": "perform_call_log_analysis", "description": "call this function to analyze call log based on input call log conversation text", "parameters": { "type": "object", "properties": { "call_log": {"type": "string"}, }, "required": ["call_log"], }, }, { "type": "function", "name": "search_products_by_category", "description": "call this function to search for products by 
category", "parameters": { "type": "object", "properties": { "category": {"type": "string"}, }, "required": ["category"], }, }, { "type": "function", "name": "order_products", "description": "call this function to order products by product id and quantity", "parameters": { "type": "object", "properties": { "product_id": {"type": "string"}, "quantity": {"type": "integer"}, }, "required": ["product_id", "quantity"], }, } ] 🤖 Approach 2: Azure AI Foundry Agent Integration Architecture Components This approach leverages existing Azure AI Foundry Service Agents, providing enterprise-grade voice capabilities as a clean wrapper over pre-configured agents. It does not entail any code changes to the Agent itself to voice enable it. Core Technical Stack: Azure Fast Transcript: Advanced multi-language speech-to-text processing Azure AI Foundry Agent: Pre-configured Agent with autonomous capabilities GPT-4o-mini Model: Agent-configured model for text processing Neural Voice Synthesis: Indic language optimized TTS Semantic VAD: Azure semantic voice activity detection Session Configuration The Agent Integration approach uses advanced semantic voice activity detection: session_config = { "input_audio_sampling_rate": 24000, "turn_detection": { "type": "azure_semantic_vad", "threshold": 0.3, "prefix_padding_ms": 200, "silence_duration_ms": 200, "remove_filler_words": False, "end_of_utterance_detection": { "model": "semantic_detection_v1", "threshold": 0.01, "timeout": 2, }, }, "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"}, "input_audio_echo_cancellation": {"type": "server_echo_cancellation"}, "voice": { "name": "en-IN-AartiIndicNeural", "type": "azure-standard", "temperature": 0.8, }, "input_audio_transcription": {"model": "azure-speech", "language": "en-IN, hi-IN"}, } Key Differentiators: Semantic VAD: Intelligent voice activity detection with utterance prediction Multi-language Support: Azure Speech with en-IN and hi-IN language support End-of-Utterance Detection: AI-powered conversation turn management Filler Word Handling: Configurable processing of conversational fillers Agent Integration Code The voicelive_client.py demonstrates seamless integration with Azure AI Foundry Agents. Notice that we need to provide the Azure AI Foundry Project Name and an ID of the Agent in it. We do not need to pass the model's name here, since the Agent is already configured with one. 
def get_websocket_url(self, access_token: str) -> str: """Generate WebSocket URL for Voice Live API.""" azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://") return ( f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}" f"&agent-project-name={project_name}&agent-id={agent_id}" f"&agent-access-token={access_token}" ) async def connect(self): """Connects the client using a WS Connection to the Realtime API.""" if self.is_connected(): # raise Exception("Already connected") self.log("Already connected") # Get access token access_token = self.get_azure_token() # Build WebSocket URL and headers ws_url = self.get_websocket_url(access_token) self.ws = await websockets.connect( ws_url, additional_headers={ "Authorization": f"Bearer {self.get_azure_token()}", "x-ms-client-request-id": str(uuid.uuid4()), }, ) print(f"Connected to Azure Voice Live API....") asyncio.create_task(self.receive()) await self.update_session() Advanced Analytics Pipeline GPT-4o Powered Call Analysis The solution implements conversation analytics using Azure Logic Apps with GPT-4o: { "functions": [ { "name": "evaluate_call_log", "description": "Evaluate call log for Contoso Retail customer service call", "parameters": { "properties": { "call_reason": { "description": "Categorized call reason from 50+ predefined scenarios", "type": "string" }, "customer_satisfaction": { "description": "Overall satisfaction assessment", "type": "string" }, "customer_sentiment": { "description": "Emotional tone analysis", "type": "string" }, "call_rating": { "description": "Numerical rating (1-5 scale)", "type": "number" }, "call_rating_justification": { "description": "Detailed reasoning for rating", "type": "string" } } } } ] } Microsoft Fabric Integration The analytics pipeline extends into Microsoft Fabric for enterprise business intelligence: Fabric Integration Features: Real-time Data Mirroring: Cosmos DB to OneLake synchronization Custom Data Agents: Business-specific analytics agents in Fabric Copilot Integration: Natural language business intelligence queries Power BI Dashboards: Interactive reports and executive summaries Artefacts for reference The source code of the solution is available in the GitHub Repo here. An article on this topic is published on LinkedIn here A video recording of the demonstration of this App is available below: Part1 - walkthrough of the Agent configuration in Azure AI Foundry - here Part2 - demonstration of the Application that integrates with the Azure Voice Live API - here Part 3 - demonstration of the Microsoft Fabric Integration, Data Agents, Copilot in Fabric and Power BI for insights and analysis - here Conclusion Azure Voice Live API enables enterprises to build sophisticated voice-enabled AI assistants using two distinct architectural approaches. The Direct Model Integration provides ultra-low latency for real-time applications, while the Azure AI Foundry Agent Integration offers enterprise-grade governance and autonomous operation. 
Both approaches deliver the same comprehensive business capabilities:
- Natural voice interactions with advanced VAD and noise suppression
- Complete retail workflow automation from inquiry to fulfillment
- AI-powered conversation analytics with sentiment scoring
- Enterprise business intelligence through Microsoft Fabric integration

The choice between approaches depends on your specific requirements:
- Choose Direct Model Integration for custom function calling and minimal latency
- Choose Azure AI Foundry Agent Integration for enterprise governance and existing investments
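As a companion to the custom function calling path, here is a minimal, hypothetical sketch of how a client might dispatch the tools declared in tools_list when the model requests a function call. It assumes Voice Live follows the GPT-Realtime event semantics mentioned earlier, and that the handlers dictionary wraps the REST APIs and Logic Apps described in this post; none of this is taken from the sample repository.

import json

async def handle_function_call(ws, event, handlers):
    """Run the tool the model asked for and return its output over the WebSocket."""
    name = event["name"]                               # e.g. "search_products_by_category"
    args = json.loads(event.get("arguments") or "{}")  # arguments arrive as a JSON string
    result = handlers[name](**args)                    # call the matching business API wrapper

    # Send the tool output back as a conversation item referencing the original call id.
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": event["call_id"],
            "output": json.dumps(result),
        },
    }))
    # Ask the model to continue the turn, now taking the tool output into account.
    await ws.send(json.dumps({"type": "response.create"}))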
Capacity Template v2 with Microsoft Fabric

1. Capacity Scenario

One of the most common scenarios for Microsoft Graph Data Connect (MGDC) for SharePoint is Capacity. This scenario focuses on identifying which sites and files are using the most storage, along with understanding the distribution of these large sites and files by properties like type and age. The MGDC datasets for this scenario are SharePoint Sites and SharePoint Files. If you’re not familiar with these datasets, you can find details in the schema definitions at https://aka.ms/SharePointDatasets.

To assist you in using these datasets, the team has developed a Capacity Template. Initially published as a template for Azure Synapse, we now have a new Microsoft Fabric template that is simpler and offers more features. This SharePoint Capacity v2 Template, based on Microsoft Fabric, is now publicly available.

2. Instructions

The template comes with a set of detailed instructions at https://aka.ms/fabriccapacitytemplatesteps. These instructions include:
- How to install the Microsoft Fabric and Microsoft Graph Data Connect prerequisites
- How to import the pipeline template from the Microsoft Fabric gallery and set it up
- How to import the Power BI template and configure the data source settings
See below some additional details about the template.

3. Microsoft Fabric Pipeline

After you import the pipeline template, it will look like this:

4. Pipeline in Microsoft Fabric

The Capacity template for Microsoft Fabric includes a few key improvements:
- The new template uses delta datasets to update the SharePoint Sites and SharePoint Files datasets. It keeps track of the last time the datasets were pulled by this pipeline, requesting just what changed since then.
- The new template uses views to do calculations and create new properties like size bands or date bands. In our previous template, this was done in Power Query, when importing into Power BI.
- The new template also uses a view to aggregate file data, grouping the data by file extension.
You can find details on how to find and deploy the Microsoft Fabric template in the instructions (see item 3).

5. Microsoft Fabric Report

The typical result from this solution is a set of Power BI dashboards pulled from the Microsoft Fabric data source. Here are some examples: These dashboards serve as examples or starting points and can be modified as necessary for various visualizations of the data within these datasets. The instructions (see item 3) include details on how to find and deploy a few sample Power BI Capacity templates.

6. Conclusion

I hope this provides a good overview of the Capacity template for Microsoft Fabric. You can read more about Microsoft Graph Data Connect for SharePoint at https://aka.ms/SharePointData. There you will find many details, including a list of datasets available, other common scenarios, and frequently asked questions.

Using Copilot (GPT-5) to turn schema into SQL and write queries
This article shows a few ways you can use Copilot to help with Microsoft Graph Data Connect for SharePoint. This includes turning a schema into a SQL CREATE TABLE statement and summarizing data from that. Make sure to use the Copilot option to use GPT-5 for best results. Scenario 1 This first scenario just builds a CREATE TABLE statement from the dataset schema definition in markdown notation. I provide a few instructions on how to handle the flattening of the objects. Me: Given the schema at this URL: https://github.com/microsoftgraph/dataconnect-solutions/blob/main/Datasets/data-connect-dataset-sharepointsites.md, give me a CREATE TABLE statement for a SQL Server. Just include the single SQL command in the output. Each column should have a comment with a short description. Don't include any information about keys or constraints. This dataset is flattened. One column in the table for every row, except those of type object. Column names to be flattened with a prefix. Use an underline to separate the prefix. Copilot: CREATE TABLE SharePointSites ( ptenant NVARCHAR(255) -- Id of the tenant ,Id NVARCHAR(255) -- GUID of the site ,Url NVARCHAR(2048) -- URL for the site ,ArchiveState NVARCHAR(50) -- Archive state of the site: None, Archiving, Archived, or Reactivating ,SiteArchivedBy NVARCHAR(255) -- Who archived the site, null if not archived ,SiteArchivedTime DATETIME -- When the site was archived, null if not archived ,RootWeb_Id NVARCHAR(255) -- Root web id ,RootWeb_Title NVARCHAR(512) -- Root web title ,RootWeb_WebTemplate NVARCHAR(255) -- Root web template name ,RootWeb_WebTemplateId INT -- Root web template id ,RootWeb_Configuration INT -- Root web template configuration id ,RootWeb_LastItemModifiedDate DATETIME -- Date when an item in the root web was last modified ,WebCount BIGINT -- Number of webs (subsites) in the site ,StorageQuota BIGINT -- Total storage in bytes allowed for this site ,StorageUsed BIGINT -- Total storage in bytes used by this site ,StorageMetrics_MetadataSize BIGINT -- Total metadata size for the site in bytes ,StorageMetrics_TotalFileCount BIGINT -- Total number of files for the site ,StorageMetrics_TotalFileStreamSize BIGINT -- Total size of latest version of files in bytes ,StorageMetrics_AdditionalFileStreamSize BIGINT -- Total size of additional file streams in bytes ,StorageMetrics_TotalSize BIGINT -- Total size of all files for the site in bytes ,GroupId NVARCHAR(255) -- Id of the group associated with this site ,GeoLocation NVARCHAR(50) -- Geographic region where the data is stored ,IsInRecycleBin BIT -- Indicates if the site is in the recycle bin ,RecycleBinItemCount BIGINT -- Number of items in the recycle bin ,RecycleBinItemSize BIGINT -- Size of items in the recycle bin ,SecondStageRecycleBinStorageUsage BIGINT -- Size of items in the second stage recycle bin ,IsTeamsConnectedSite BIT -- Indicates if the site is connected to Teams ,IsTeamsChannelSite BIT -- Indicates if the site is a channel site ,TeamsChannelType NVARCHAR(50) -- Type of channel if site is a Teams channel site ,IsHubSite BIT -- Indicates if the site is associated with a hub site ,HubSiteId NVARCHAR(255) -- Id of the hub site for this site ,IsCommunicationSite BIT -- Indicates if the site is a communication site ,IsOneDrive BIT -- Indicates if the site is a OneDrive ,BlockAccessFromUnmanagedDevices BIT -- Site blocks access from unmanaged devices ,BlockDownloadOfAllFilesOnUnmanagedDevices BIT -- Site blocks download of all files from unmanaged devices 
,BlockDownloadOfViewableFilesOnUnmanagedDevices BIT -- Site blocks download of viewable files from unmanaged devices ,ShareByEmailEnabled BIT -- Site enables share by e-mail ,ShareByLinkEnabled BIT -- Site enables share by link ,IsExternalSharingEnabled BIT -- Indicates if external sharing is enabled ,SiteConnectedToPrivateGroup BIT -- Indicates if site is connected to a Private Group ,Privacy NVARCHAR(50) -- Privacy of the site: Private or Public ,SensitivityLabelInfo_Id NVARCHAR(255) -- Id of the Sensitivity Label for the site ,SensitivityLabelInfo_DisplayName NVARCHAR(255) -- Display name of the Sensitivity Label for the site ,Classification NVARCHAR(255) -- Classification of the site ,IBMode NVARCHAR(50) -- Information Barriers Mode ,IBSegments NVARCHAR(MAX) -- List of organization segments if IB mode is Explicit ,Owner_AadObjectId NVARCHAR(255) -- AAD Object Id of the owner ,Owner_Email NVARCHAR(320) -- Email of the owner ,Owner_UPN NVARCHAR(320) -- User Principal Name of the owner ,Owner_Name NVARCHAR(255) -- Name of the owner ,SecondaryContact_AadObjectId NVARCHAR(255) -- AAD Object Id of the secondary contact ,SecondaryContact_Email NVARCHAR(320) -- Email of the secondary contact ,SecondaryContact_UPN NVARCHAR(320) -- User Principal Name of the secondary contact ,SecondaryContact_Name NVARCHAR(255) -- Name of the secondary contact ,ReadLocked BIT -- Whether the site is locked for read access ,ReadOnly BIT -- Whether the site is in read-only mode ,CreatedTime DATETIME -- When the site was created (UTC) ,LastSecurityModifiedDate DATETIME -- When security on the site was last changed (UTC) ,LastUserAccessDate DATETIME -- Last access by a real user (UTC) ,LastContentChange DATETIME -- When the site contents were last changed (UTC) ,SnapshotDate DATETIME -- When this site information was captured (UTC) ,Operation NVARCHAR(50) -- Extraction mode of this row ); Scenario 2 Now I use the CREATE TABLE as the base to ask for a SELECT statement. Me: Give me a list of sites that have more than 5GB of files. Give me a single SQL statement using the tables described below. <create table statements included here> Copilot: SELECT Id AS SiteId, Url AS SiteUrl, RootWeb_Title AS SiteTitle, StorageMetrics_TotalSize AS TotalSizeInBytes, CAST(StorageMetrics_TotalSize / (1024.0 * 1024 * 1024) AS DECIMAL(10,2)) AS TotalSizeInGB FROM SharePointSites WHERE StorageMetrics_TotalSize > (5 * 1024 * 1024 * 1024) ORDER BY StorageMetrics_TotalSize DESC; Note: Interesting choice of columns and order, without any specific prompting to that effect. Scenario 3 Now for a more challenging scenario using two tables and a more complex requirement. Me: Given the tables below, write a SELECT statement to give me a list of sites with the name of the site, the total storage size and the top 3 file extensions by size. 
<create table statements for Sites and Files included here> Copilot: WITH ExtensionTotals AS ( SELECT f.SiteId, f.Extension, SUM(f.SizeInBytes) AS TotalSize FROM SharePointFiles f WHERE f.Extension IS NOT NULL GROUP BY f.SiteId, f.Extension ), RankedExtensions AS ( SELECT et.SiteId, et.Extension, et.TotalSize, ROW_NUMBER() OVER (PARTITION BY et.SiteId ORDER BY et.TotalSize DESC) AS rn FROM ExtensionTotals et ), TopExtensions AS ( SELECT SiteId, STRING_AGG(Extension + ' (' + CAST(TotalSize AS VARCHAR(20)) + ')', ', ') WITHIN GROUP (ORDER BY TotalSize DESC) AS Top3Extensions FROM RankedExtensions WHERE rn <= 3 GROUP BY SiteId ) SELECT s.RootWeb_Title AS SiteName, s.StorageMetrics_TotalSize AS TotalStorageSize, te.Top3Extensions FROM SharePointSites s LEFT JOIN TopExtensions te ON s.Id = te.SiteId ORDER BY s.StorageMetrics_TotalSize DESC; Notes: Scenario is nicely handled, using both the Files and Sites tables. Complex aggregation is coded correctly. Scenario 4 In this one, I skip the step of using a SQL CREATE TABLE statement and go directly from the markdown for the two schemas (Sites and Files) to a SQL SELECT statement. I also let GPT-5 add his comments and notes this time, to make sure it's thinking correctly... Me: Consider a SQL table for SharePoint Sites based on the schema at https://github.com/microsoftgraph/dataconnect-solutions/blob/main/Datasets/data-connect-dataset-sharepointsites.md - This dataset is flattened. There is one column in the SQL table for every row, except those of type object. Column names are flattened with a prefix. Column names use an underline to separate the prefix. Also consider a SQL table for SharePoint Files based on the schema at https://github.com/microsoftgraph/dataconnect-solutions/blob/main/Datasets/data-connect-dataset-sharepointfiles.md - Use the same assumptions as the previous one. Now write a SELECT statement to give me a list of sites with the name of the site, the total storage size and the top 3 file extensions by size. Copilot: Got it — you want to join the flattened SharePoint Sites table with the flattened SharePoint Files table, aggregate storage usage, and then find the top 3 file extensions by size for each site. Here’s a SQL example that works with the flattened column naming convention you described: WITH FileSizeByExtension AS ( SELECT f.SiteId, f.Extension, SUM(f.SizeInBytes) AS TotalExtensionSize FROM SharePointFiles f GROUP BY f.SiteId, f.Extension ), RankedExtensions AS ( SELECT SiteId, Extension, TotalExtensionSize, ROW_NUMBER() OVER (PARTITION BY SiteId ORDER BY TotalExtensionSize DESC) AS rn FROM FileSizeByExtension ), Top3Extensions AS ( SELECT SiteId, STRING_AGG(Extension, ', ' ORDER BY TotalExtensionSize DESC) AS TopExtensions FROM RankedExtensions WHERE rn <= 3 GROUP BY SiteId ) SELECT s.RootWeb_Title AS SiteName, s.StorageMetrics_TotalSize AS TotalStorageBytes, t.TopExtensions FROM SharePointSites s LEFT JOIN Top3Extensions t ON s.Id = t.SiteId ORDER BY s.StorageMetrics_TotalSize DESC; How it works FileSizeByExtension: Groups files by SiteId and Extension to sum their sizes. RankedExtensions: Uses ROW_NUMBER() to rank extensions by size within each site. Top3Extensions: Filters to the top 3 per site and concatenates them into a comma-separated list. Final SELECT: Joins back to the Sites table to get the site name and total storage size. Notes RootWeb_Title comes from flattening the RootWeb object in the Sites dataset. StorageMetrics_TotalSize is the flattened column for total site storage (excluding recycle bin). 
STRING_AGG works in SQL Server 2017+, Azure SQL, and some other RDBMS. If your SQL dialect doesn’t support it, you’d need a different aggregation method. This query orders sites by total storage size so the largest sites appear first.

Note: Nicely done in one prompt with the proper explanations.

Conclusion

I hope these examples help you see the potential of getting Copilot's help with your queries. This is one more reason to load your MGDC for SharePoint tables directly into Microsoft Fabric, where you can run SQL queries against tables in the OneLake...
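To close the loop on that suggestion, here is a small sketch of running one of the Copilot-generated queries from a Fabric notebook, assuming the SharePointSites dataset has been loaded as a Lakehouse table with the column names used in this article (the table name and load path are assumptions).

from pyspark.sql import SparkSession

# A Spark session is pre-created in Fabric notebooks; this keeps the sketch self-contained.
spark = SparkSession.builder.getOrCreate()

# Sites using more than 5 GB, mirroring the query Copilot produced in Scenario 2.
large_sites = spark.sql("""
    SELECT Id AS SiteId,
           Url AS SiteUrl,
           RootWeb_Title AS SiteTitle,
           StorageMetrics_TotalSize / (1024 * 1024 * 1024) AS TotalSizeInGB
    FROM   SharePointSites
    WHERE  StorageMetrics_TotalSize > 5368709120  -- 5 GB in bytes
    ORDER  BY StorageMetrics_TotalSize DESC
""")

large_sites.show(truncate=False)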
Integrating remote patient monitoring solutions with healthcare data solutions in Microsoft Fabric

Co-Authors: Kemal Kepenek, Mustafa Al-Durra PhD, Matt Dearing, Jason Foerch, Manoj Kumar

Introduction

Remote patient monitoring solutions rely on connected devices, wearable technology, and advanced software platforms to collect and transmit patient health data. They facilitate monitoring of vital signs, chronic conditions, and behavioral patterns. Healthcare data solutions in Microsoft Fabric offers a secure, scalable, and interoperable data platform as part of Microsoft for Healthcare. Such a unified data platform is crucial for integrating disparate data sources and generating actionable health insights.

This article provides a reference architecture and the steps to integrate remote patient monitoring solutions with healthcare data solutions in Fabric. The integration is aimed at satisfying low data resolution use cases. With low data resolution, we address infrequent (hourly, daily, or less) transfer of aggregated or point-in-time-snapshot device data into healthcare data solutions in Fabric to be used in a batch fashion to generate analytical insights. Integration steps for high data resolution use cases, which necessitate high-frequency transfer of highly granular medical device data (for example, data from EKGs or ECGs) to become input to either batch or (near) real-time analytics processing and consumption, are a candidate for a future article.

There are several methods, solutions and partners available in the marketplace today that will allow you to integrate a remote patient monitoring solution with healthcare data solutions in Fabric. In this article, we leveraged the solution from Life365 (a Microsoft partner). The integration approach discussed here is applicable to most remote patient monitoring solutions whose integration logic (code) can be run inside a platform that can programmatically access (for example, through REST API calls) Microsoft Fabric. In our approach, the integration platform chosen is the Function App service within Microsoft Azure.

In the subsequent sections of this article, we cover the integration approach in two phases:
- Interoperability phase, which illustrates how the data from medical devices (used by the remote patient monitoring solution) can be converted into a format suitable for transferring into healthcare data solutions in Fabric.
- Analytical processing and consumption phase, which provides the steps to turn the medical device data into insights that can be easily accessed through Fabric.

Integration Approach

Interoperability Phase

Step 1 of this phase performs the transfer of proprietary device data. As part of this step, datasets are collected from medical devices and transferred (typically, in the form of files) to an integration platform or service. In our reference architecture, the datasets are transferred to the Function App (inside an Azure Resource Group) that is responsible for the integration function. It is important for these datasets to contain information about (at least) three concepts or entities:
- Medical device(s) from which the datasets are collected.
- Patient(s) to whom the datasets belong.
- Reading(s) obtained from the medical device(s) throughout the time that the patients utilize these devices. Medical device readings data may be point-in-time data capture, metrics, measures, calculations, collections, or similar data points.
Information about the entities listed above will be used in the later step of the interoperability phase (discussed below), when we convert this information into resources to be transferred to the second phase that performs analytical processing and consumption.

In step 2, to maintain the mapping between proprietary device data and FHIR® resources, you can use transformation templates, or follow a programmatic approach, to convert datasets received from medical devices into appropriate FHIR® resources. Using the entities mentioned in the previous step, the conversion takes place as follows:
- Medical device information is converted to the Device resource in FHIR®*.
- Patient information is converted to the Patient resource in FHIR®.
- Device reading information is converted to the Observation resource in FHIR®.
* Currently, healthcare data solutions in Fabric supports the FHIR® Release 4 (R4) standard. Consequently, the FHIR® resources that are created as part of this step should follow the same standard.

Transformation and mapping activities are under the purview of each specific remote patient monitoring integration solution and are not reviewed in detail in this article. As an example, we provide below the high-level steps that one of the Microsoft partners (Life365) followed to integrate their remote patient monitoring solution with healthcare data solutions in Fabric:

The Life365 team developed a cloud-based transformation service that translates internal device data into standardized FHIR® (Fast Healthcare Interoperability Resources) Observations to enable compatibility with healthcare data solutions in Microsoft Fabric and other health data ecosystems. This service is implemented in Microsoft Azure Cloud and designed to ingest structured payloads from Life365-connected medical devices—including blood pressure monitors, weight scales, and pulse oximeters—and convert them into FHIR®-compliant formats in real time. When a reading is received:
- The service identifies relevant clinical metrics (e.g., systolic/diastolic blood pressure, heart rate, weight, SpO₂).
- These metrics are mapped to FHIR® Observation resources using industry-standard LOINC codes and units.
- Each Observation is enriched with references to the associated patient and device, formatted in NDJSON to meet the ingestion requirements in healthcare data solutions in Fabric.
- The resulting FHIR®-compliant data is securely transmitted to the Fabric instance using token-based authentication.
This implementation provides a consistent, standards-aligned pathway for Life365 device data to integrate with downstream FHIR®-based platforms while abstracting the proprietary structure of the original device payloads.

For examples from the public domain, you can use the following open-source projects as references:
- https://github.com/microsoft/fit-on-fhir
- https://github.com/microsoft/healthkit-to-fhir
- https://github.com/microsoft/FitbitOnFHIR
- https://github.com/microsoft/FHIR-Converter
Please note that the above open-source repositories might not be up to date. While they may not provide a complete (end-to-end) solution to map medical device data to FHIR®, they may still be helpful as a starting point. If you decide to incorporate them into your remote patient monitoring integration solution, validate their functionality and make necessary changes to meet your solution’s requirements.
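To make the mapping step concrete, here is a minimal, illustrative sketch (not the Life365 implementation) of converting a single blood pressure reading into a FHIR® R4 Observation and appending it to an NDJSON file. The input field names and file path are assumptions; the LOINC codes used are the standard blood pressure panel (85354-9), systolic (8480-6), and diastolic (8462-4) codes.

import json
from datetime import datetime, timezone

def reading_to_observation(reading: dict) -> dict:
    """Map a proprietary blood pressure reading to a FHIR R4 Observation resource."""
    return {
        "resourceType": "Observation",
        "id": reading["reading_id"],
        "meta": {"lastUpdated": datetime.now(timezone.utc).isoformat()},
        "status": "final",
        "code": {
            "coding": [{"system": "http://loinc.org", "code": "85354-9",
                        "display": "Blood pressure panel"}]
        },
        "subject": {"reference": f"Patient/{reading['patient_id']}"},
        "device": {"reference": f"Device/{reading['device_id']}"},
        "effectiveDateTime": reading["measured_at"],
        "component": [
            {
                "code": {"coding": [{"system": "http://loinc.org", "code": "8480-6",
                                     "display": "Systolic blood pressure"}]},
                "valueQuantity": {"value": reading["systolic"], "unit": "mmHg"},
            },
            {
                "code": {"coding": [{"system": "http://loinc.org", "code": "8462-4",
                                     "display": "Diastolic blood pressure"}]},
                "valueQuantity": {"value": reading["diastolic"], "unit": "mmHg"},
            },
        ],
    }

# Append one Observation per line to an NDJSON file (one file per resource type).
sample = {"reading_id": "obs-001", "patient_id": "d3281621-1584-4631-bc82-edcaf49fda96",
          "device_id": "5a934020-c2c4-4e92-a0c5-2116e29e757d",
          "measured_at": "2025-05-15T15:35:04Z", "systolic": 120, "diastolic": 80}

with open("Observation.ndjson", "a", encoding="utf-8") as f:
    f.write(json.dumps(reading_to_observation(sample)) + "\n")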
For the resulting FHIR® resources to be successfully consumed by the analytics processing later (within healthcare data solutions in Fabric), they need to satisfy the requisites listed below.

Each FHIR® resource, in its entirety, needs to be saved as a single row in an NDJSON-formatted file. We recommend creating one NDJSON file per FHIR® resource type. That means creating Device.ndjson, Patient.ndjson, and Observation.ndjson files for the three entities we reviewed above.

Each FHIR® resource needs to have a meta segment populated with a lastUpdated value. As an example:

"meta":{"lastUpdated":"2025-05-15T15:35:04.218Z", "profile":["http://hl7.org/fhir/us/core/StructureDefinition/us-core-documentreference"]}

Cross references between Observation and Patient, as well as between Observation and Device FHIR® resources, need to be represented correctly, either through formal FHIR® identifiers or logical identifiers. As an example, the subject and device attributes of an Observation FHIR® resource need to refer to Patient and Device FHIR® resources, respectively, in this manner:

"subject":{"reference":"Patient/d3281621-1584-4631-bc82-edcaf49fda96"}
"device":{"reference":"Device/5a934020-c2c4-4e92-a0c5-2116e29e757d"}

For the Patient FHIR® resource, if MRN is used as the identifier, it is important to represent the MRN value according to the FHIR® standard. The Patient identifier is a critical attribute that is used to establish cross-FHIR®-resource relationships throughout the analytics processing and consumption phase. We will review that phase later in this article. At a minimum, a Patient identifier, which uses MRN coding as its identifier type, needs to have its value, system, type.coding.system, and type.coding.code (with value “MR”) attributes populated correctly. See an example below. You can also refer to a Patient FHIR® resource example from hl7.org.

"reference": null,
"type": "Patient",
"identifier": {
  "extension": null,
  "use": null,
  "value": "4e7e5bf8-2823-8ec1-fe37-eba9c9d69463",
  "system": "urn:oid:1.2.36.146.595.217.0.1",
  "type": {
    "extension": null,
    "id": null,
    "coding": [
      {
        "extension": null,
        "id": null,
        "system": "http://terminology.hl7.org/CodeSystem/v2-0203",
        "version": null,
        "code": "MR",
        "display": null,
        "userSelected": null
      }
    ],
    "text": null
  },
  ...

With Step 3, to perform the transfer of FHIR® resource NDJSON files to healthcare data solutions in Fabric, ensure that the integration platform (the Azure Function App, in our case) has permission to transfer (upload) files to healthcare data solutions in Fabric.

Find the managed identity or the service principal that the Azure Function App is running under:
- Navigate to the Azure portal and find your Function App within your resource group. In the Function App's navigation pane, under "Settings," select "Identity".
- Identify the Managed Identity (if enabled): If System-assigned managed identity is enabled, you'll see information about the system-assigned managed identity, including its object ID and principal ID. If a User-assigned managed identity is linked, the details of that identity will be displayed. You can also add user-assigned identities here if needed.
- Service Principal (if applicable): If the Function App is configured to use a service principal, you'll need to look for the service principal within Azure Active Directory (a.k.a. Microsoft Entra ID). You can find this by searching for "Enterprise Applications" within Azure Active Directory and looking for the application associated with the Function App.
With step 3, to perform the transfer of the FHIR® resource NDJSON files to healthcare data solutions in Fabric:

Ensure that the integration platform (an Azure Function App, in our case) has permission to transfer (upload) files to healthcare data solutions in Fabric.

Find the managed identity or the service principal that the Azure Function App is running under:

- Navigate to the Azure portal and find your Function App within your resource group. In the Function App's navigation pane, under "Settings," select "Identity".
- Identify the managed identity (if enabled): if a system-assigned managed identity is enabled, you'll see its details, including its object ID and principal ID. If a user-assigned managed identity is linked, the details of that identity are displayed. You can also add user-assigned identities here if needed.
- Service principal (if applicable): if the Function App is configured to use a service principal, look for the service principal in Microsoft Entra ID (formerly Azure Active Directory). You can find it by searching for "Enterprise Applications" and looking for the application associated with the Function App.

Grant the Azure Function App's identity access to upload files:

- Logged into Fabric with an administrator account, navigate to the Fabric workspace where your healthcare data solutions instance is deployed.
- Click the "Manage Access" button on the top right.
- Click "Add People or Groups".
- Add the managed identity or the service principal associated with your Azure Function App with Contributor access by selecting "Contributor" from the dropdown list.

Using a coding environment, as in the Python example below, you can manage OneLake content programmatically. This includes transferring (uploading) the NDJSON-formatted files created earlier to the destination OneLake folder.

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeFileClient

# Replace with your OneLake URI (the OneLake DFS endpoint is typically https://onelake.dfs.fabric.microsoft.com)
onelake_uri = "https://your-account-name.dfs.core.windows.net"

# Replace with the destination path to your file
file_path = "/<full path to destination folder (see below)>/<entity name>.ndjson"

# Get the credential
credential = DefaultAzureCredential()

# Create a DataLakeFileClient
file_client = DataLakeFileClient(
    url=f"{onelake_uri}{file_path}",
    credential=credential
)

# Upload the file
with open("<entity name>.ndjson", "rb") as f:
    file_client.upload_data(f, overwrite=True)

print(f"File uploaded successfully: {file_path}")

The destination OneLake folder to use for the remote patient monitoring integration into healthcare data solutions in Fabric is determined as follows:

- Navigate to the bronze lakehouse created with the healthcare data solutions instance inside the Fabric workspace. The lakehouse is named "healthcare1_msft_bronze". The "healthcare1" segment in the name points to the name of the healthcare data solutions instance deployed in the workspace; you might see a different name in your Fabric workspace, but the rest of the lakehouse name ("_msft_bronze") remains unchanged.
- The unified folder structure of healthcare data solutions is located inside the bronze lakehouse. Within that folder structure, create a subfolder named after the remote patient monitoring solution you are integrating with. This subfolder is referred to as the namespace in the healthcare data solutions documentation and is used to uniquely identify the source of incoming (to-be-uploaded) data.
- The NDJSON files generated during the interoperability phase are transferred (uploaded) into that subfolder.

The full path of the destination OneLake folder to use in your file transfer (upload) code is:

healthcare1_msft_bronze.Lakehouse\Files\Ingest\Clinical\FHIR-NDJSON\<Solution-Name-as-Namespace>

Analytics Processing and Consumption Phase

Step 1 of this phase connects the interoperability phase discussed earlier with the analytics processing and consumption phase. As part of this step, you can simply verify that the NDJSON files have been uploaded to the remote patient monitoring solution subfolder inside the unified folder structure in the bronze lakehouse of healthcare data solutions in Fabric. The path to that subfolder is provided earlier in this article. After the upload of the files has been completed, you are ready to run the data pipeline that performs data ingestion and transformation so that the device readings data can be used for analytics purposes.
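The pipeline can be started from the Fabric workspace as described next. If you prefer to trigger it programmatically (for example, from the same Azure Function App once the upload finishes), a minimal sketch using the Fabric REST job scheduler API is shown below. The workspace and item IDs are placeholders, and you should verify the endpoint and the jobType value against the current Fabric REST API documentation.

import requests
from azure.identity import DefaultAzureCredential

# Placeholder IDs: replace with your workspace ID and the data pipeline item ID.
WORKSPACE_ID = "<fabric-workspace-guid>"
PIPELINE_ITEM_ID = "<data-pipeline-item-guid>"

# Acquire a token for the Fabric REST API using the app's managed identity or developer credentials.
credential = DefaultAzureCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default").token

# Run an on-demand job for the pipeline item; "Pipeline" is the jobType for data pipelines
# at the time of writing (check the Fabric job scheduler API docs for your environment).
url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ITEM_ID}/jobs/instances?jobType=Pipeline"
)
response = requests.post(url, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()

# A 202 response indicates the run was queued; the Location header points to the job instance for status polling.
print(response.status_code, response.headers.get("Location"))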
In the Fabric workspace where the healthcare data solutions instance is deployed, find and open the data pipeline named "healthcare1_msft_omop_analytics". As with the bronze lakehouse name, the "healthcare1" segment in the name of the data pipeline points to the name of the healthcare data solutions instance deployed in the workspace, so you might see a different name depending on your own instance. This data pipeline executes four activities, the first of which copies the transferred files into another subfolder within the unified folder structure so that they can serve as input to the ingestion step. The subsequent pipeline activities then perform steps 2 through 4, as illustrated in the analytics processing and consumption phase diagram further above.

- Step 2 ingests the content of the transferred NDJSON file(s) into the ClinicalFHIR delta table of the bronze lakehouse.
- Step 3 transforms the content of the ClinicalFHIR delta table of the bronze lakehouse into flattened FHIR® data model content inside the silver lakehouse.
- Step 4 transforms the flattened FHIR® content of the silver lakehouse into OMOP data model content inside the gold lakehouse.
- As part of step 5, you can develop your own gold lakehouse(s) by transforming content from the silver lakehouse into the data model(s) best suited to your custom analytics use cases.

Device data, once transformed into a gold lakehouse, can be used for analytics or reporting in several ways, some of which are discussed briefly below.

- In step 6, Power BI reports and dashboards can be built inside Fabric, offering a visual and interactive canvas to analyze the data in detail. (Overview of Power BI - Microsoft Fabric | Microsoft Learn)
- As part of step 7, the Fabric external data sharing feature can be used to grant teams within external organizations (that you collaborate with) access to the data. (External data sharing in Microsoft Fabric - Microsoft Fabric | Microsoft Learn)
- Finally, step 8 enables you to use the discover and build cohorts capability of healthcare data solutions in Fabric. With this capability, you can submit natural language queries to explore the data and build patient cohorts that fit the criteria your use cases are aiming for. (Build patient cohorts with generative AI in discover and build cohorts (preview) - Microsoft Cloud for Healthcare | Microsoft Learn)

Conclusion

When integrated with healthcare data solutions in Fabric, remote patient monitoring solutions offer transformative potential for enhancing patient outcomes, optimizing care coordination, and streamlining healthcare system operations. If your organization would like to explore the next steps in such a journey, please contact your Microsoft account team.

Automating Data Vault processes on Microsoft Fabric with VaultSpeed
This article is authored by Jonas De Keuster from VaultSpeed and co-authored with Michael Olschimke, co-founder and CEO at Scalefree International GmbH, and Trung Ta, a senior BI consultant at Scalefree International GmbH. The technical review was done by Ian Clarke and Naveed Hussain, GBBs (Cloud Scale Analytics) for EMEA at Microsoft.

Businesses often struggle to align their understanding of processes and products across disparate systems in corporate operations. In our previous blogs in this series, we explored the advantages of Data Vault as a methodology and why it is increasingly recognized for its automation-friendly approach to modern data warehousing. Data Vault's modular structure, scalability, and flexibility address the challenges of integrating diverse and evolving data sources. However, the key to successfully implementing a Data Vault lies in automation.

Data Vault's pattern-based modeling - organized around hubs, links, and satellites - provides a standardized framework well suited to integrating data from horizontally scattered operational source systems. Automation tools like VaultSpeed enhance this methodology by simplifying the generation of Data Vault structures, streamlining workflows, and enabling rapid delivery of analytics-ready data solutions. By leveraging the strengths of Data Vault and VaultSpeed's automation capabilities, organizations can overcome inefficiencies in traditional ETL processes and achieve scalable, adaptable data integration.

Examples of such operational systems include Microsoft Dynamics 365 for CRM and ERP, SAP for enterprise resource planning, or Salesforce for customer data. Attempts to harmonize this complexity historically relied on pre-built industry data models. However, these models often fell short, requiring significant customization and failing to accommodate unique business processes.

Different approaches to Data Integration

Industry data models offer a standardized way to structure data, providing a head start for organizations with well-aligned business processes. They work well in stable, regulated environments where consistency is key. However, for organizations dealing with diverse sources and fast-changing requirements, Data Vault offers greater flexibility. Its modular, scalable approach supports evolving data landscapes without the need to reshape existing models. Both approaches aim to streamline integration; Data Vault simply offers more adaptability when complexity and change are the norm. The right choice therefore depends on your use cases.

Tackling data complexity with automation

Integrating data from horizontally distributed sources is one of the biggest challenges data engineers face. VaultSpeed addresses this by connecting the physical metadata from source systems with the business's conceptual data model, creating a "town plan" for building a Data Vault model. This "town plan" serves as the bedrock for automating the various stages of the data pipeline. By aligning the technical and business perspectives of the data, VaultSpeed enables the automated generation of logical and physical data models. This automation streamlines the design process and ensures consistency between the conceptual understanding of the data and its physical implementation. Furthermore, VaultSpeed's automation extends to the generation of transformation code. This code converts data from its source format into the structure defined by the Data Vault model.
Automating this process reduces the potential for errors and accelerates the development of the data integration pipeline. In addition to data models and transformation code, VaultSpeed also automates workflow orchestration. This involves defining and managing the tasks required to extract, transform, and load data into the Data Vault. By automating this orchestration, VaultSpeed ensures that the data integration process is executed reliably and efficiently.

How VaultSpeed automates integration

The following section examines the steps in the VaultSpeed workflow and how it combines metadata-driven and data-driven modeling approaches to streamline data integration and automate the various stages of the data pipeline.

1. Harvest metadata: VaultSpeed collects metadata from source systems such as OneLake, Azure SQL, SAP, and Dynamics 365, capturing schema details, relationships, and dependencies.
2. Align with conceptual models: Using a business's conceptual data model as a guiding framework, VaultSpeed ensures that physical source metadata is mapped consistently to the target Data Vault structure.
3. Generate logical and physical models: VaultSpeed leverages its metadata repository and automation templates to produce fully defined logical and physical Data Vault models, including hubs, links, and satellites.
4. Automate code creation: Once the models are defined, VaultSpeed generates the necessary transformation and workflow code using templates with embedded standards and conventions for Data Vault implementation. This ensures seamless data ingestion, integration, and consistent population of the Data Vault model (a simplified illustration of this kind of generated load logic follows below).

By automating these steps, VaultSpeed eliminates the manual effort required for traditional data modeling and integration, reducing errors and addressing the inefficiencies of data integration using traditional ETL. Thanks to the model-driven approach, the code is always in sync with the data model.
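To give a sense of the pattern-based code this step produces, below is a deliberately simplified hub and satellite load written as Spark SQL in a Fabric notebook. It is a generic sketch under assumed names (a staging table fed from Dynamics 365 and a helper view holding the current satellite rows), not actual VaultSpeed-generated code; VaultSpeed's templates add their own hash-key conventions, delta handling, and error handling beyond what is shown here.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hub load: one row per unique business key, inserted only if it is not yet known.
spark.sql("""
    MERGE INTO hub_customer AS hub
    USING (
        SELECT DISTINCT
            sha2(upper(trim(customer_id)), 256) AS hub_customer_hkey,
            customer_id                         AS customer_bk,
            current_timestamp()                 AS load_dts,
            'dynamics365_crm'                   AS record_source
        FROM stg_dynamics365_customer
    ) AS src
    ON hub.hub_customer_hkey = src.hub_customer_hkey
    WHEN NOT MATCHED THEN INSERT *
""")

# Satellite load: descriptive attributes, inserted only when the attribute hash changes.
# current_sat_customer_details is an assumed helper view exposing the latest satellite row per hub key.
spark.sql("""
    INSERT INTO sat_customer_details
    SELECT
        sha2(upper(trim(s.customer_id)), 256)              AS hub_customer_hkey,
        current_timestamp()                                 AS load_dts,
        sha2(concat_ws('|', s.name, s.email, s.city), 256)  AS hash_diff,
        s.name,
        s.email,
        s.city,
        'dynamics365_crm'                                   AS record_source
    FROM stg_dynamics365_customer s
    LEFT JOIN current_sat_customer_details c
           ON c.hub_customer_hkey = sha2(upper(trim(s.customer_id)), 256)
    WHERE c.hash_diff IS NULL
       OR c.hash_diff <> sha2(concat_ws('|', s.name, s.email, s.city), 256)
""")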
Unified integration with Microsoft Fabric

Microsoft Fabric offers a robust data ingestion, storage, and analytics ecosystem. VaultSpeed embeds seamlessly within this ecosystem to ensure an efficient and automated data pipeline. Here's how the process works:

- Ingestion (Extract and Load): Tools like ADF, Fivetran, or OneLake replication bring data from various sources into Fabric. These tools handle the extraction and replication of raw data from operational systems. Microsoft Fabric also supports mirrored databases, enabling real-time data replication from sources like Cosmos DB, Azure SQL, or application data into the Fabric environment. This keeps data synchronized across the ecosystem, providing a consistent foundation for downstream modeling and analytics.
- Data Repository or Platform: Microsoft Fabric is the data platform providing the infrastructure for storing, managing, and securing the ingested data. Fabric uniquely supports warehouse and lakehouse experiences, bringing them together under a unified data architecture. This means organizations can combine structured, transactional data with unstructured or semi-structured data in a single platform, eliminating silos and enabling broader analytics use cases.
- Modeling and Transformation: VaultSpeed takes over at this stage, leveraging its advanced automation to model and transform data into a Data Vault structure. This includes creating hubs, links, and satellites while ensuring alignment with business taxonomies. Unlike traditional ETL tools, VaultSpeed is not involved in the runtime execution of transformations. Instead, it generates code that runs within Microsoft Fabric. This approach ensures better performance, reduces vendor lock-in, and enhances security, since no data flows through VaultSpeed itself. By focusing exclusively on model-driven automation, VaultSpeed enables organizations to maintain full control over their data processing while benefiting from automation efficiencies. Additionally, Fabric's compute engines manage the transformation workloads automatically, ensuring optimal performance without requiring extensive manual tuning, a key capability in a Data Vault context where performance is critical for handling large volumes of data and complex transformations. This simplifies operations for data engineers and ensures that query performance remains efficient, even as data volumes and complexity grow.
- Consume: The integrated data layer within Microsoft Fabric serves multiple consumption paths. While tools like Power BI enable actionable insights through analytics dashboards, the same data foundation can also drive AI use cases, such as machine learning models or intelligent applications.

By connecting ingestion tools, a unified data platform, and analytics or AI solutions, VaultSpeed ensures a streamlined and integrated workflow that maximizes the value of the Microsoft Fabric ecosystem.

Loading at multiple speeds: real-time Data Vaults with Fabric

Loading data into a Data Vault often requires balancing traditional batch processes with the demands of real-time ingestion within a unified model. Microsoft Fabric's event-driven tools, such as Data Activator, empower organizations to process data streams in real time while supporting traditional batch loads. VaultSpeed complements these capabilities by ensuring that both modes of ingestion feed seamlessly into the same Data Vault model, eliminating the need for separate architectures like the Lambda pattern. Key capabilities for real-time Data Vault include:

- Event-driven updates: Automatically trigger incremental loads into the Data Vault when changes occur in Cosmos DB, OneLake, or other sources.
- Automated workflow orchestration: VaultSpeed's Flow Management Control (FMC) automates the entire data ingestion, transformation, and loading workflow. This includes handling delta detection, incremental updates, and batch processes, ensuring optimal efficiency regardless of the speed of data arrival. FMC integrates natively with Azure Data Factory (ADF) for seamless orchestration within the Microsoft ecosystem. For more complex or distributed workflows, FMC also supports Apache Airflow, enabling flexibility in managing a wide range of data pipelines (a generic orchestration sketch follows this list).
- Seamless integration: Maintain synchronized pipelines for historical and real-time data within the Fabric environment. FMC intelligently manages multiple data streams, dynamically adjusting to workload demands to support high-volume batch loads and real-time event-driven updates.

These capabilities ensure analytics dashboards reflect the latest data, delivering immediate value to decision-makers.
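For orientation, the sketch below shows the general shape of an orchestration DAG for a staged Data Vault load in Apache Airflow: staging first, then hubs and links, then satellites. It is a hand-written illustration with placeholder task callables and an assumed hourly batch cadence, not FMC-generated output; FMC's generated workflows add delta detection, restartability, and dependency management on top of this basic structure.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables; in a real pipeline these would run the staged load logic.
def load_staging(**_): ...
def load_hubs_links(**_): ...
def load_satellites(**_): ...

with DAG(
    dag_id="raw_vault_incremental_load",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@hourly",  # assumed batch cadence; event-driven triggers could start runs instead
    catchup=False,
) as dag:
    staging = PythonOperator(task_id="load_staging", python_callable=load_staging)
    hubs_links = PythonOperator(task_id="load_hubs_and_links", python_callable=load_hubs_links)
    satellites = PythonOperator(task_id="load_satellites", python_callable=load_satellites)

    # Satellites depend on hubs/links, which depend on staging.
    staging >> hubs_links >> satellites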
Automating the gold layer and delivering data products at scale

Power BI is a cornerstone of Microsoft Fabric, and VaultSpeed makes it easier for data modelers to connect the dots. By automating the creation of the gold layer, VaultSpeed enables seamless integration between Data Vaults and Power BI. Benefits for data teams:

- Automated gold layer: VaultSpeed automates the creation of the gold layer, including templates for star schemas, One Big Table (OBT), and other analytics-ready structures. These automated templates allow businesses to generate consistent and scalable presentation layers without manual intervention (a simplified star-schema sketch follows this list).
- Accelerated time to insight: By reducing manual preparation steps, VaultSpeed enables teams to deliver dashboards and reports quickly, ensuring faster access to actionable insights.
- Deliver data products: The ability to automate and standardize star schemas and other presentation models empowers organizations to deliver analytics-ready data products at scale, efficiently meeting the needs of multiple business domains.
- Improved data governance: VaultSpeed's lineage tracking ensures compliance and transparency, providing full traceability from raw data to the presentation layer.
- No-code automation: Eliminate the need for custom scripting, freeing up time to focus on innovation and higher-value tasks.
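As a rough illustration of what a gold-layer star-schema object over Data Vault structures can look like, the sketch below defines a customer dimension view that joins a hub to the latest rows of its satellite. Table, schema, and column names are assumptions carried over from the earlier hub/satellite sketch; this is not a VaultSpeed-generated template.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Build a simple dimension view in an assumed "gold" schema from the hub and the most recent satellite rows.
spark.sql("""
    CREATE OR REPLACE VIEW gold.dim_customer AS
    SELECT
        h.hub_customer_hkey AS customer_key,
        h.customer_bk       AS customer_id,
        s.name,
        s.email,
        s.city
    FROM hub_customer h
    JOIN (
        SELECT *,
               row_number() OVER (PARTITION BY hub_customer_hkey ORDER BY load_dts DESC) AS rn
        FROM sat_customer_details
    ) s
      ON s.hub_customer_hkey = h.hub_customer_hkey
     AND s.rn = 1
""")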
Conclusion

The integration of VaultSpeed and Microsoft Fabric redefines how data modelers and engineers approach Data Vault 2.0. This partnership unlocks the full potential of modern data ecosystems by automating workflows, enabling real-time insights, and streamlining analytics. If you're ready to transform your data workflows, VaultSpeed and Microsoft Fabric provide the tools you need to succeed. The next article in this series will focus on the DataOps side of automation.

Further reading

- Automating common understanding: Integrating different data source views into one comprehensive perspective
- Why Data Vault is the best model for data warehouse automation: Read the eBook
- The Elephant in the Fridge by John Giles: a great reference on conceptual data modeling for Data Vault

About VaultSpeed

VaultSpeed empowers enterprises to deliver data products at scale through advanced automation for modern data ecosystems, including data lakehouse, data mesh, and fabric architectures. The no-code platform eliminates nearly all traditional ETL tasks, delivering significant improvements in automation across areas like data modeling, engineering, testing, and deployment. With seamless integration to platforms like Microsoft Fabric or Databricks, VaultSpeed enables organizations to automate the entire software development lifecycle for data products, accelerating delivery from design to deployment. VaultSpeed addresses inefficiencies in traditional data processes, transforming how data engineers and business users collaborate to build flexible, scalable data foundations for AI and analytics.

About the Authors

Jonas De Keuster is VP Product at VaultSpeed. He had close to 10 years of experience as a DWH consultant in various industries like banking, insurance, healthcare, and HR services before joining the data automation vendor. This background allows him to understand current customer needs and engage in conversations with members of the data industry.

Michael Olschimke is co-founder and CEO of Scalefree International GmbH, a European Big Data consulting firm. The firm empowers clients across all industries to use Data Vault 2.0 and similar Big Data solutions. Michael has trained thousands of industry data warehousing professionals, taught academic classes, and published regularly on these topics.

Trung Ta is a senior BI consultant at Scalefree International GmbH. With over 7 years of experience in data warehousing and BI, he has advised Scalefree's clients in different industries (banking, insurance, government, etc.) and of various sizes in establishing and maintaining their data architectures. Trung's expertise lies within Data Vault 2.0 architecture, modeling, and implementation, with a specific focus on data automation tools.